Why was Intel a no-show on No Execute?

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips,comp.sys.intel

This has been discussed at quite some length in these newsgroups, but now it
looks like the mainstream press are starting to hear about it too. Intel had
to be embarrassed into including NX into its AMD64 implementation.

http://story.news.yahoo.com/news?tmpl=story&cid=1738&ncid=1209&e=7&u=/zd/20040525/tc_zd/127930

There are a few things that this article's writer got wrong, but a few
things he got right.

One thing he got partially wrong was his statement about Intel having no
execute protection in the 16-bit segments. The feature was still there in
the 32-bit segments, Intel never got rid of them. It was stupid OS designers
who decided to ignore the feature that caused this problem.

Yousuf Khan

--
Humans: contact me at ykhan at rogers dot com
Spambots: just reply to this email address ;-)
 
Yousuf Khan wrote:

> One thing he got partially wrong was his statement about Intel
> having no execute protection in the 16-bit segments. The feature
> was still there in the 32-bit segments, Intel never got rid of
> them. It was stupid OS designers who decided to ignore the
> feature that caused this problem.

Are you calling them "stupid" because they opted for paging
instead of segmentation, in an effort to write a portable OS?

Do you think there should be an x86-specific Linux branch,
using segmentation instead of paging?
 
Grumble wrote:

> Yousuf Khan wrote:

>> One thing he got partially wrong was his statement about Intel
>> having no execute protection in the 16-bit segments. The feature
>> was still there in the 32-bit segments, Intel never got rid of
>> them. It was stupid OS designers who decided to ignore the
>> feature that caused this problem.

> Are you calling them "stupid" because they opted for paging
> instead of segmentation, in an effort to write a portable OS?

> Do you think there should be an x86-specific Linux branch,
> using segmentation instead of paging?


I don't think it would be so hard to put all the data in a
data segment, and the code in a code segment, without overlapping
them. It requires the CS: prefix on any loads from the code
segment. Self modifying code is out of style these days,
so that shouldn't be much of a problem.

Now, for things like JIT, where code is constantly being
written while running, some arrangement would need to be made.
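One possible arrangement, sketched here with page permissions rather than segments (a swapped-in technique: POSIX `mmap()`/`mprotect()` on a CPU and OS that honour a no-execute page bit), is to keep a JIT's output buffer writable while code is being generated and then make it read+execute before running it. The helper name `jit_seal_demo` is hypothetical, and the placeholder bytes are never actually run.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy 'len' bytes of generated code into a fresh
 * page while it is writable, then flip the page to read+execute so it is
 * never writable and executable at the same time (W^X).
 * Returns 0 on success, -1 on failure. The bytes are never executed. */
int jit_seal_demo(const unsigned char *code, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len > (size_t)page)
        return -1;

    /* Phase 1: read+write, no execute, while the JIT emits code. */
    void *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memcpy(buf, code, len);

    /* Phase 2: read+execute, no write, before anything jumps into it. */
    int rc = mprotect(buf, (size_t)page, PROT_READ | PROT_EXEC);

    munmap(buf, (size_t)page);
    return rc == 0 ? 0 : -1;
}
```

On x86 hardware without an NX bit, the page tables cannot distinguish "readable" from "executable", which is exactly the gap this thread is about.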

-- glen
 
In comp.arch Grumble <a@b.c> wrote:
> Yousuf Khan wrote:
>
> > One thing he got partially wrong was his statement about Intel
> > having no execute protection in the 16-bit segments. The feature
> > was still there in the 32-bit segments, Intel never got rid of
> > them. It was stupid OS designers who decided to ignore the
> > feature that caused this problem.
>
> Are you calling them "stupid" because they opted for paging
> instead of segmentation, in an effort to write a portable OS?
>
> Do you think there should be an x86-specific Linux branch,
> using segmentation instead of paging?
>

There was one for quite a while for pre-386 modes/machines.

--
Sander

+++ Out of cheese error +++
 
In comp.sys.ibm.pc.hardware.chips glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> I don't think it would be so hard to put all the data in a
> data segment, and the code in a code segment, without overlapping
> them. It requires the CS: prefix on any loads from the code
> segment. Self modifying code is out of style these days,
> so that shouldn't be much of a problem.

That _still_ won't help (never mind interpreted or JIT).

If an attacker can redirect execution by modifying the
return address on the stack, s/he doesn't need their own
executable code. Just point to data like "/bin/sh" and
return to an `exec` syscall.

-- Robert
 
In comp.sys.ibm.pc.hardware.chips Robert Redelmeier <redelm@ev1.net.invalid> wrote:
> In comp.sys.ibm.pc.hardware.chips glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
>> I don't think it would be so hard to put all the data in a
>> data segment, and the code in a code segment, without overlapping
>> them. It requires the CS: prefix on any loads from the code
>> segment. Self modifying code is out of style these days,
>> so that shouldn't be much of a problem.
>
> That _still_ won't help (never mind interpreted or JIT).
>
> If an attacker can redirect execution by modifying the
> return address on the stack, s/he doesn't need their own
> executable code. Just point to data like "/bin/sh" and
> return to an `exec` syscall.

Ah, but you make me think -- all current CPUs have an internal
hardware call/return stack to speed up branch [mis]prediction.

It would be relatively simple to check this hw stack against
the memory stack and generate a fault if return addresses
don't match.

This could be enabled by a bit in the MSR if the OS has support
to handle/log "return addr faults". Most pgms should never
generate a return fault, but a mechanism could be made to
exempt those few that do.

A slightly bigger problem is that the hw stacks are of limited
depth (6?) and it might be possible to flood them out.
But variable stack entry pointers would become more effective.
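A toy software model of the proposed check: a second "shadow" copy of each return address is pushed on CALL (standing in for the CPU's internal return stack) and compared on RET with the address actually popped from the memory stack, which is the copy a buffer overflow can reach. The names `shadow_call`/`shadow_ret` and the fixed capacity are hypothetical, plain integers stand in for addresses, and unlike the real hardware stack this model never drops old entries, so the depth problem mentioned above is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define SHADOW_CAP 64

/* Hypothetical shadow stack: a copy of return addresses the program
 * itself cannot write to (in hardware, the internal return stack). */
typedef struct {
    unsigned long addr[SHADOW_CAP];
    size_t depth;
} shadow_stack;

/* On CALL: remember where we expect to return. Returns false if full. */
bool shadow_call(shadow_stack *s, unsigned long return_addr)
{
    if (s->depth == SHADOW_CAP)
        return false;
    s->addr[s->depth++] = return_addr;
    return true;
}

/* On RET: 'stack_addr' is the address actually popped from the memory
 * stack. Returns true if it matches the shadow copy, false for a
 * "return address fault" (or a return with no matching call). */
bool shadow_ret(shadow_stack *s, unsigned long stack_addr)
{
    if (s->depth == 0)
        return false;
    return s->addr[--s->depth] == stack_addr;
}
```

A `false` from `shadow_ret` marks the point where the OS handler described above would abort, log, or let the process continue.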

-- Robert
 
> It would be relatively simple to check this hw stack against
> the memory stack and generate a fault if return addresses
> don't match.

Look up "call-with-current-continuation" to see why this is not a good idea.
Or maybe just think about how to implement exception handling.


Stefan
 
Grumble <a@b.c> wrote:
> Yousuf Khan wrote:
>
>> One thing he got partially wrong was his statement about Intel
>> having no execute protection in the 16-bit segments. The feature
>> was still there in the 32-bit segments, Intel never got rid of
>> them. It was stupid OS designers who decided to ignore the
>> feature that caused this problem.
>
> Are you calling them "stupid" because they opted for paging
> instead of segmentation, in an effort to write a portable OS?

No, for not opting to use both. There was no mutual exclusivity between
paging and segmentation. Both could be used and complement each other.
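To make the "both at once" point concrete: in 386 protected mode, an offset is first checked against the segment's limit and type, then added to the segment base to form a linear address, and only that linear address goes through paging. The sketch below models just the segment step with a simplified, hypothetical descriptor (real descriptors also carry granularity, privilege level, a readable bit for code segments, and expand-down data segments); the page number is shown only to mark where paging would take over.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of a 386 segment descriptor. */
typedef struct {
    uint32_t base;     /* linear address where the segment starts */
    uint32_t limit;    /* last valid offset (expand-up segment) */
    bool     code;     /* code segment: executable, never writable */
    bool     writable; /* data segment: may be written */
} segment;

typedef enum { ACC_READ, ACC_WRITE, ACC_EXEC } access_kind;

/* Returns true and stores the linear address if the access is allowed;
 * false models a protection fault (#GP, or #SS for the stack segment). */
bool seg_translate(const segment *s, uint32_t offset, access_kind kind,
                   uint32_t *linear)
{
    if (offset > s->limit)
        return false;                   /* beyond the segment limit */
    if (kind == ACC_EXEC && !s->code)
        return false;                   /* data is never executable */
    if (kind == ACC_WRITE && (s->code || !s->writable))
        return false;                   /* code is never writable */
    *linear = s->base + offset;         /* paging translates this next */
    return true;
}

/* Page number the paging unit would then look up (4 KB pages). */
uint32_t page_number(uint32_t linear) { return linear >> 12; }
```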

I think the original OS designers in their haste to port Unix to the new
32-bit Intel chip did a simple cross-compile, and then didn't bother to make
use of any of the Intel-specific features of their architecture. They just
left it at "good enough". Of course, using Intel features would've made them
non-portable, but a lot of stuff gets non-portable at the lowest levels of
the kernel anyways.

> Do you think there should be an x86-specific Linux branch,
> using segmentation instead of paging?

There already was. The original pre-1.0 Linux kernels were using segments
*and* paging. I think with the addition of new people to the development team,
Linux's original purpose changed from being the ultimate Intel OS (Unix
or otherwise) to being a free version of portable Unix.

Yousuf Khan
 
Robert Redelmeier <redelm@ev1.net.invalid> wrote:
> That _still_ won't help (never mind interpreted or JIT).
>
> If an attacker can redirect execution by modifying the
> return address on the stack, s/he doesn't need their own
> executable code. Just point to data like "/bin/sh" and
> return to an `exec` syscall.

How's an attacker to do that, when the code, the stack and the heap
don't even share the same memory addresses?

Yousuf Khan
 
In comp.sys.ibm.pc.hardware.chips Yousuf Khan <news.tally.bbbl67@spamgourmet.com> wrote:
> How's an attacker to do that, when the code, the stack and the heap
> don't even share the same memory addresses?

Easy. Overwrite the stack with crafted input to an unrestricted
input call (getch() is a frequent culprit). This is the basic
buffer overflow.

In the location for the return address (where EBP is usually
pointing), put in a return address that points to a suitably
dangerous part of the existing code. Like an `exec` syscall.
Above this return address, put in data to make that syscall
nefarious.
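The point can be shown without any real exploit using a toy model: a byte array stands in for the stack frame, holding an 8-byte local buffer followed by a "return slot" that selects one of the program's existing routines. An unchecked copy (the role played by an unbounded input call) spills into that slot, so the input decides which existing code runs, and no code is injected anywhere. All names and sizes are hypothetical; a real frame also holds the saved EBP and real addresses.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 8

/* Code that already exists in the "program"; nothing here is injected. */
static const char *say_hello(void) { return "hello"; }
static const char *run_shell(void) { return "exec /bin/sh"; } /* stand-in */

static const char *(*const routines[])(void) = { say_hello, run_shell };
#define N_ROUTINES (sizeof routines / sizeof routines[0])

/* Toy stack frame: bytes 0..7 are the local buffer, byte 8 is the
 * "return slot" (an index into routines[] instead of a real address). */
const char *vulnerable(const unsigned char *input, size_t len)
{
    unsigned char frame[BUF_SIZE + 1];
    frame[BUF_SIZE] = 0;  /* legitimate return: say_hello */

    /* Like gets(): copies all it is given, ignoring BUF_SIZE. It is
     * clamped to the toy frame only so the model itself is valid C. */
    if (len > sizeof frame)
        len = sizeof frame;
    memcpy(frame, input, len);

    /* "Return" through whatever the slot now holds. */
    return routines[frame[BUF_SIZE] % N_ROUTINES]();
}
```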

-- Robert
 
In comp.sys.ibm.pc.hardware.chips Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>> It would be relatively simple to check this hw stack against
>> the memory stack and generate a fault if return addresses
>> don't match.
>
> Lookup "call-with-current-continuation" to see why this is not a good idea.
> Or maybe just think of how to implement exception handling.

Exception handling is easy -- a mismatch produces an MC interrupt.
The kernelspace ISR checks the MSRs, which tell it that a return
addr mismatch occurred. The kernel decides what to do -- abort the proc,
log, or proceed.

Sure it'll be slow, but how often are calls not paired with
returns? `call jtable[eax*4]` is the standard syntax for a
jump table, not `push eax / ret`.
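For comparison, a C dispatch table like the one below is typically compiled to an indirect `call` through a table of addresses, much like `call jtable[eax*4]`, and each such call still ends in an ordinary matching return, so a return-address check would see nothing unusual. The handler names are illustrative, and a compiler is free to generate different code.

```c
/* Each handler is reached by an indirect call and leaves with a normal return. */
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

static int (*const jtable[])(int, int) = { op_add, op_sub, op_mul };

/* Returns -1 for an out-of-range opcode instead of jumping wild. */
int dispatch(unsigned op, int a, int b)
{
    if (op >= sizeof jtable / sizeof jtable[0])
        return -1;
    return jtable[op](a, b);
}
```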

-- Robert
 
Sander Vesik <sander@haldjas.folklore.ee> wrote:
>> Do you think there should be an x86-specific Linux branch,
>> using segmentation instead of paging?
>>
>
> There was one for quite a while for pre-386 modes/machines.

That was Minix. Linux has always been for 386 and later machines only.

Yousuf Khan
 
Robert Redelmeier <redelm@ev1.net.invalid> wrote:
> In comp.sys.ibm.pc.hardware.chips Yousuf Khan
> <news.tally.bbbl67@spamgourmet.com> wrote:
> >> How's an attacker to do that, when the code, the stack and the
> >> heap don't even share the same memory addresses?
>
> Easy. Overwrite the stack with crafted input to an unrestricted
> input call (getch() is a frequent culprit). This is the basic
> buffer overflow.
>
> In the location for the return address (where EBP is usually
> pointing), put in a return address that points to a suitably
> dangerous part of the existing code. Like an `exec` syscall.
> Above this return address, put in data to make that syscall
> nefarious.

Nope, won't work. Segmentation would protect it completely. There is no way
for data written to the heap to touch the data in the stack. Stack segment
and data segment are separate. It's like as if the stack had its own
container, the code has its own, and the data heap its own. What happens in
one container won't even reach the other containers.

Face it, segments were the perfect security mechanism, and systems
developers completely ignored it!

Yousuf Khan
 
Robert Redelmeier wrote:

> Overwrite the stack with crafted input to an unrestricted
> input call (getch() is a frequent culprit).

There is no getch() in ISO C.

fgetc(), getc(), and getchar() return a single character.

Perhaps you meant gets().
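Since `gets()` has no way of knowing the buffer size (it was removed from the language in C11), the usual bounded replacement is `fgets()`, which stores at most `size - 1` characters plus the terminating NUL. A minimal sketch; the helper name `read_line` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Reads at most size-1 characters into buf and strips the newline.
 * Returns buf, or NULL on EOF/error. Unlike gets(), input longer
 * than the buffer is truncated instead of overflowing the stack. */
char *read_line(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';  /* drop trailing newline, if any */
    return buf;
}
```

Any excess input stays in the stream, so a caller that needs whole lines has to keep reading until it sees the newline.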
 
Robert Redelmeier wrote:

> Ah, but you make me think -- all current CPUs have an internal
> hardware call/return stack to speed up branch [mis]prediction.

e.g. the Athlon implements a 12-entry return address stack to
predict return addresses from a near or far call. As CALLs are
fetched, the next EIP is pushed onto the return stack. Subsequent
RETs pop a predicted return address off the top of the stack.

> It would be relatively simple to check this hw stack against
> the memory stack and generate a fault if return addresses
> don't match.

I think you've just killed the performance of recursive functions.

> This could be enabled by a bit in the MSR if the OS has support
> to handle/log "return addr faults". Most pgms should never
> generate a return fault

This is where I think you are wrong.

The K8 has performance counters to measure these events:

88h IC Return stack hit
89h IC Return stack overflow

It would be interesting to take, say, SPEC CPU2000, and count
the number of overflows for each benchmark. I might try.

--
Regards, Grumble
 
Grumble <a@b.c> writes:

>> It would be relatively simple to check this hw stack against
>> the memory stack and generate a fault if return addresses
>> don't match.

>I think you've just killed the performance of recursive functions.

And possibly longjmp()/setcontext() and the like; quite a bit of
additional work is needed to fix all such things (and if you want to
throw in binary compatibility, it's going to be harder still).
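A minimal C illustration of the `longjmp()` case: three nested calls are abandoned by one `longjmp()` back to the `setjmp()` point, so none of their returns ever executes, and a checker expecting one RET per CALL is left holding stale entries. The `depth` counter is a hypothetical stand-in for such a checker.

```c
#include <setjmp.h>

static jmp_buf env;
static int depth;  /* calls entered minus calls that returned normally */

static void level(int n)
{
    depth++;               /* a CALL the checker would record */
    if (n == 0)
        longjmp(env, 1);   /* unwind to setjmp(): no RET executes */
    level(n - 1);
    depth--;               /* matching RET, never reached in this demo */
}

/* Returns how many calls were abandoned without a matching return. */
int unmatched_after_longjmp(int levels)
{
    if (levels <= 0)
        return 0;
    depth = 0;
    if (setjmp(env) == 0)
        level(levels - 1);
    return depth;
}
```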

Casper
 
>>>>> "YK" == Yousuf Khan <news.tally.bbbl67@spamgourmet.com> writes:

YK> That was Minix. Linux has always been for 386 and later machines
YK> only.

I think the ELKS people will be saddened to hear that.


/Benny
 
In comp.sys.ibm.pc.hardware.chips Grumble <a@b.c> wrote:
> I think you've just killed the performance of recursive functions.

I don't think so. For a recursive function there are many
calls, possibly flooding out the hw return stack. But every
call has a return, and that address _is_ correct on both the
hw and memory stacks.

> 88h IC Return stack hit
> 89h IC Return stack overflow
>
> It would be interesting to take, say, SPEC CPU2000, and count
> the number of overflows for each benchmark. I might try.

Excellent! I do not suggest trapping on overflows.
They should only occur in deep recursion, which should not contain
evil getch() calls. Just trap misses.

-- Robert

In comp.sys.ibm.pc.hardware.chips Grumble <a@b.c> wrote:
> There is no getch() in ISO C.
> Perhaps you meant gets().

Thank you for the correction. I do mean gets().
I apologize for any confusion.

-- Robert

In comp.sys.ibm.pc.hardware.chips Yousuf Khan <news.tally.bbbl67@spamgourmet.com> wrote:
> Nope, won't work. Segmentation would protect it completely. There is no way
> for data written to the heap to touch the data in the stack. Stack segment
> and data segment are separate. It's like as if the stack had its own
> container, the code has its own, and the data heap its own. What happens in
> one container won't even reach the other containers.

True in a literal sense.

But C compilers have this habit of allocating local variable
space on the stack. So when `char input[80];` is coded in a
routine, ESP gets decreased by 80 and that array is sitting
just below the return address!

I don't think it's _required_ by any standard that local vars are
allocated on the stack, but it sure makes memory management easy.

AFAIK, only malloc()ed blocks go on the heap; global vars live in
a separate static data area.

-- Robert
 
Benny Amorsen <amorsen@vega.amorsen.dk> wrote:
>>>>>> "YK" == Yousuf Khan <news.tally.bbbl67@spamgourmet.com> writes:
>
>> That was Minix. Linux has always been for 386 and later machines
>> only.
>
> I think the ELKS people will be saddened to hear that.

Well, it never surprises me to find Linux being ported to one thing or
another at some point. I guess the question to ask these days is
whether there is anything Linux hasn't been ported to. Commodore 64? Apple
II?

Yousuf Khan
 
In comp.arch Yousuf Khan <news.tally.bbbl67@spamgourmet.com> wrote:
> Robert Redelmeier <redelm@ev1.net.invalid> wrote:
> > In comp.sys.ibm.pc.hardware.chips Yousuf Khan
> > <news.tally.bbbl67@spamgourmet.com> wrote:
> >> How's an attacker to do that, when the code, the stack and the
> >> heap don't even share the same memory addresses?
> >
> > Easy. Overwrite the stack with crafted input to an unrestricted
> > input call (getch() is a frequent culprit). This is the basic
> > buffer overflow.
> >
> > In the location for the return address (where EBP is usually
> > pointing), put in a return address that points to a suitably
> > dangerous part of the existing code. Like an `exec` syscall.
> > Above this return address, put in data to make that syscall
> > nefarious.
>
> Nope, won't work. Segmentation would protect it completely. There is no way
> for data written to the heap to touch the data in the stack. Stack segment

But procedure local variables (including arrays) don't live in the heap,
they live on the stack.

> and data segment are separate. It's like as if the stack had its own
> container, the code has its own, and the data heap its own. What happens in
> one container won't even reach the other containers.

Doesn't matter. All you need for an exploit is to be able to make *one*
system call. And for that, you don't need to write to the code segment
at all. The stack is enough.

>
> Face it, segments were the perfect security mechanism, and systems
> developers completely ignored it!
>
> Yousuf Khan

--
Sander

+++ Out of cheese error +++
 
Robert Redelmeier wrote:

> In comp.sys.ibm.pc.hardware.chips Grumble wrote:
>
>> I think you've just killed the performance of recursive functions.
>
> I don't think so. For a recursive function there are many
> calls, possibly flooding out the hw return stack. But every
> call has a return, and that address _is_ correct on both the
> hw and memory stacks.

You don't call any other function in your recursive functions? :)

>> 88h IC Return stack hit
>> 89h IC Return stack overflow
>>
>> It would be interesting to take, say, SPEC CPU2000, and count
>> the number of overflows for each benchmark. I might try.
>
> Excellent! I do not suggest trapping out overflows.
> They're to occur on deep recursion which should not contain
> evil getch() calls. Just trap misses.

As far as I can tell, and with the exception of recursive
functions which call no other function, RAS overflow will
cause a RET misprediction.
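Grumble's point can be checked with a small model of a circular return-address stack. The 12-entry size is the Athlon figure quoted above; the wrap-around replacement policy and the made-up addresses are assumptions, and real predictors differ in detail. In this model, pure self-recursion deeper than the stack still predicts every recursive return correctly (those return addresses are all identical), but the final return to the original caller has been overwritten and mispredicts, which is the kind of miss a "trap on mismatch" scheme would fault on in a perfectly legitimate program.

```c
#include <stddef.h>

#define RAS_DEPTH 12  /* Athlon figure quoted above */

/* Circular return-address stack: on overflow the oldest entry is lost. */
typedef struct {
    unsigned long slot[RAS_DEPTH];
    size_t tos;
} ras_t;

static void ras_call(ras_t *r, unsigned long ret_addr)
{
    r->tos = (r->tos + 1) % RAS_DEPTH;
    r->slot[r->tos] = ret_addr;
}

/* Pops a prediction; returns 1 if it differs from the real return address. */
static int ras_ret(ras_t *r, unsigned long actual)
{
    unsigned long predicted = r->slot[r->tos];
    r->tos = (r->tos + RAS_DEPTH - 1) % RAS_DEPTH;
    return predicted != actual;
}

/* main calls f once, then f calls itself (depth - 1) times.
 * Returns the number of mispredicted RETs. Addresses are made up. */
int recursion_mispredicts(int depth)
{
    const unsigned long RET_TO_MAIN = 0x1000, RET_TO_F = 0x2000;
    ras_t r = { {0}, 0 };
    int misses = 0;

    ras_call(&r, RET_TO_MAIN);
    for (int i = 1; i < depth; i++)
        ras_call(&r, RET_TO_F);
    for (int i = 1; i < depth; i++)
        misses += ras_ret(&r, RET_TO_F);
    misses += ras_ret(&r, RET_TO_MAIN);
    return misses;
}
```

If the recursion also calls other functions, their return addresses can be left behind in the wrapped slots, so more of the unwinding returns can mispredict.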
 
Sander Vesik <sander@haldjas.folklore.ee> wrote:
>> and data segment are separate. It's like as if the stack had its own
>> container, the code has its own, and the data heap its own. What
>> happens in one container won't even reach the other containers.
>
> Doesn't matter. All you need for an exploit is to be able to make
> *one* system call. And for that, you don't need to write to the code
> segment at all. The stack is enough.

The only place you can run code is from the code segment. If you insert code
into the stack segment, none of it will be executable. At best it might end
up causing the return address to go to the wrong part of the code segment
and therefore run the program from the wrong point, but more likely the
program will just end up locking up and be shut down by the OS.

Yousuf Khan
 
On Thu, 27 May 2004, Yousuf Khan wrote:

> The only place you can run code is from the code segment. If you insert code
> into the stack segment, none of it will be executable. At best it might end
> up causing the return address to go to the wrong part of the code segment
> and therefore run the program from the wrong point, but more likely the
> program will just end up locking up and be shut down by the OS.

Changing a branch address, plus stack values that get loaded into
argument registers (or just plain stack values on a stack machine),
is enough.

An object dump of a binary with a stack overflow bug reveals the address
of a "system call" instruction, which is enough to know what return
address is needed.

i.e. you don't need new code to execute; you just need to get to
existing insns in the binary with the appropriate state, and that
appropriate state can be set up by overwriting only the stack.

Period.

Peter