Why was Intel a no-show on No Execute?

  • Thread starter: Yousuf Khan
spinlock said:
You posted to COMP.SYS.INTEL, I don't think they make the 29000.
The x86 architecture has one stack.

On the other hand you cross-posted to comp.arch, where folks ponder
such things across multiple architectures. At least the guy actually
specifically stated his sources.
Regardless, no stack can be fragmented. Sorry, they are a singly-linked
list.

No, that's an implementation detail.
No way to delete in the middle.

Forget deletion in the middle, think space allocation if nothing else.

Cheers,
Rupert
 
In comp.sys.ibm.pc.hardware.chips Rob Warnock said:
+---------------
| >And there have certainly been other machines which didn't use single
| >linear stacks, e.g., early IBM S/360 code which statically allocated
| >register save blocks in the callers, which the callees dynamically
| >linked and unlinked (with back-pointers) without moving or "growing"
| >anything. The call stack was thus a singly-linked list of static blocks
| >randomly scattered all over memory. [Yes, this did *not* support
| >recursive routines! At least, not this default linkage.]
|
| Sorry, Rob, but that is completely wrong. Yes, it allowed recursive
| routines, in at least three separate ways, all of which were used.
| There was no requirement for the save areas to be static.
+---------------

Sorry, I guess I misspoke. I didn't mean to imply that there was any
*requirement* for the save areas to be static, only that it was permitted
(if recursion wasn't needed). And for some programmers (at least, around
the shops where I was first exposed to S/360 code) it was their default
coding style, and if you did that, the resulting stack was therefore
non-linear (discontinuous).


-Rob

Not only the default style, but many S/360 shops had some type of
entry/exit macros which very often put the save area at the beginning
of the module. Something like:
entrypt  b     xxx(,15)
savearea ds    18f
         stm   (14,12),12(13)
         <chain save areas etc.>

In fact these macros are still around. I've seen them in newer code.
I'd guess it's a case of leave well enough alone.
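
A rough C analogy of that linkage, as a sketch only: each routine owns one statically allocated save area and chains it behind its caller's area on entry. The names `save_area`, `current_sa`, `sa_enter`, `sa_leave` and `chain_depth` are invented for illustration, and `current_sa` merely stands in for register 13; none of this is the actual macro code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an 18-word save area: only the
   back-pointer to the caller's area is modelled here. */
struct save_area {
    struct save_area *back;   /* caller's save area */
    const char *owner;        /* routine that owns this area */
};

static struct save_area *current_sa = NULL;  /* plays the role of R13 */

/* Entry sequence: link our static area behind the caller's. */
void sa_enter(struct save_area *sa)
{
    sa->back = current_sa;
    current_sa = sa;
}

/* Exit sequence: unlink, making the caller's area current again. */
void sa_leave(void)
{
    assert(current_sa != NULL);
    current_sa = current_sa->back;
}

/* Walk the back-pointers to count the active routines. */
int chain_depth(void)
{
    int n = 0;
    for (struct save_area *p = current_sa; p != NULL; p = p->back)
        n++;
    return n;
}

/* Two routines, each with one static save area inside its own module. */
int leaf(void)
{
    static struct save_area sa = { NULL, "leaf" };
    sa_enter(&sa);
    int depth = chain_depth();
    sa_leave();
    return depth;
}

int outer(void)
{
    static struct save_area sa = { NULL, "outer" };
    sa_enter(&sa);
    int depth = leaf();
    sa_leave();
    return depth;
}
```

In this model the "stack" is just the chain of back-pointers through areas scattered across the modules. Re-entering a routine while its single static area is still linked would overwrite that area's back-pointer, which is why this style only suits non-recursive code.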

Jerry
 
Yousuf said:
Nope, won't work. Segmentation would protect it completely. There is no way
for data written to the heap to touch the data in the stack. Stack segment
and data segment are separate. It's as if the stack had its own
container, the code has its own, and the data heap its own. What happens in
one container won't even reach the other containers.

Face it, segments were the perfect security mechanism, and systems
developers completely ignored it!

They were not, which is one part of why they are not currently used even
though the x86 hardware has them. To separate the data and stack you
need to use long pointers which include the seg info, so a pointer to
integer (let's say) can point to ANY integer. The MS C compiler had a
huge model with separate code, data and stack space, and you could
specify a 2nd data area to use the remaining seg. Like the old FORTRAN
compilers which put things in kommon (sic) to get some extra memory
space (another 8k words on the one I first used).

You can't even get good protection until you go to pointers which have
both an address and a length, and some status bits, so you can pass r/w,
r/o, or executable pointers. In other words, a whole new CPU, which
allows the program to use 'const' for something other than documentation
and compiler hints.

The other reason was that they were too small.
 
Yousuf said:
The only place you can run code is from the code segment. If you insert code
into the stack segment, none of it will be executable. At best it might end
up causing the return address to go to the wrong part of the code segment
and therefore run the program from the wrong point, but more likely the
program will just end up locking up and be shut down by the OS.

And most x86 operating systems have CS and DS refer to the same physical
memory. Because if you don't you need segment prefixes on data fetches,
in pointers, etc. Which kills performance.
 
Benny said:
YK> That was Minix. Linux has always been for 386 and later machines
YK> only.

I think the ELKS people will be saddened to hear that.

There is a patch set to allow use on a non-MMU x86 system. Think
embedded. Can't find the pointer quickly, but there is one. Will run on
an 8086 as I recall.
 
Bill Davidsen said:
They were not, which is one part of why they are not currently used
even though the x86 hardware has them. To separate the data and stack
you need to use long pointers which include the seg info, so a
pointer to integer (let's say) can point to ANY integer.

You didn't need long pointers at all. Stack access instructions were assumed
to be coming from the SS segment. Data access instructions were assumed to
be coming from the DS segment (though you could override with ES, FS, and
GS, as well). And the segments were all cached inside their own registers,
plus there were segment buffer caches (at least in some of the x86 family
members) which stored the most recently used segments inside the processor
so that they wouldn't have to be reloaded from a segment descriptor table in
memory. Even when using a segment register override, if the segment were
already inside the segment register it wasn't that much of a penalty to jump
(just a few privilege checks performed, but no need to read it in completely
from RAM).
The MS C compiler had a huge model with separate code, data and stack
space, and you could specify a 2nd data area to use the remaining seg.
Like the old FORTRAN compilers which put things in kommon (sic) to get
some extra memory space (another 8k words on the one I first used).

Those were mainly for the DOS days, running in Real mode. Once the 32-bit
compilers came out, they got rid of all segment usage.
You can't even get good protection until you go to pointers which have
both an address and a length, and some status bits, so you can pass
r/w, r/o, or executable pointers. In other words, a whole new CPU,
which allows the program to use 'const' for something other than
documentation and compiler hints.

You really have no idea what was in the advanced version of segments,
do you? Everything you were asking for was in Protected mode segments. It
had a start address, a length, and various status bits, including r/w, r/o,
and exec. It was all there in *Protected mode segments*.
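
As a hedged illustration of those fields, here is a minimal sketch that decodes an 8-byte protected-mode code/data segment descriptor into its base, limit and status bits. The type `seg_info` and the functions `decode_descriptor` and `segment_size` are made-up names, and the sketch assumes an ordinary expand-up code or data descriptor (system descriptors and expand-down segments are not handled).

```c
#include <assert.h>
#include <stdint.h>

/* Selected fields of a code/data segment descriptor (hypothetical type). */
typedef struct {
    uint32_t base;           /* 32-bit start address */
    uint32_t limit;          /* raw 20-bit limit field */
    int      page_granular;  /* G bit: limit counts 4 KiB units, not bytes */
    int      is_code;        /* 1 = code (executable), 0 = data */
    int      rw;             /* data: writable; code: readable */
    int      dpl;            /* descriptor privilege level, 0..3 */
    int      present;        /* P bit */
} seg_info;

/* Pull the scattered base/limit pieces and status bits out of the
   64-bit descriptor value. */
seg_info decode_descriptor(uint64_t d)
{
    uint32_t lo = (uint32_t)(d & 0xFFFFFFFFu);
    uint32_t hi = (uint32_t)(d >> 32);
    seg_info s;

    s.base  = (lo >> 16) | ((hi & 0xFFu) << 16) | (hi & 0xFF000000u);
    s.limit = (lo & 0xFFFFu) | (hi & 0x000F0000u);
    s.page_granular = (int)((hi >> 23) & 1u);
    s.present       = (int)((hi >> 15) & 1u);
    s.dpl           = (int)((hi >> 13) & 3u);
    s.is_code       = (int)((hi >> 11) & 1u);
    s.rw            = (int)((hi >> 9) & 1u);
    return s;
}

/* Size in bytes of an expand-up segment: the last valid offset plus one. */
uint64_t segment_size(const seg_info *s)
{
    uint64_t last = s->page_granular
        ? (((uint64_t)s->limit << 12) | 0xFFFu)
        : (uint64_t)s->limit;
    return last + 1;
}
```

With the page-granular bit set and a limit field of 0xFFFFF, `segment_size` comes out at 2^32 bytes. Note that the attributes apply to the segment as a whole: there is one executable bit and one read/write bit per descriptor.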
The other reason was that they were too small.

Each could be up to 2^32 bytes long, meaning 4 GB per segment. How big did you
want them to be?

Yousuf Khan
 
Bill Davidsen said:
And most x86 operating systems have CS and DS refer to the same
physical memory. Because if you don't you need segment prefixes on
data fetches, in pointers, etc. Which kills performance.

I'll tell you this once, and I'll repeat it as many times as you like. You
didn't need segment prefixes in most cases. All instruction accesses
defaulted to using CS. All data segment accesses defaulted to using DS.
String copy defaulted to DS (source) and ES (destination). Stack accesses
defaulted to the SS segment register. Once they were put into the segment
register, it meant all of their privilege checks were already performed, and
they no longer needed to be checked again. Even the occasional time that
you'd want to use segment override prefixes, you'd be using the two extra
segment registers like FS and GS, even those would be cached in a register
with all of their privilege checks already performed.

Yes, most x86 operating systems defaulted to making the CS and DS refer to
the same locations in memory, and that's precisely what I am criticising --
they shouldn't have ever done that. Pointing them to separate physical
locations is what they should've been doing all along. Let's face it, there
is very little reason to read or write data into instruction sections of
memory, and there is very little reason to read instructions from data
sections of memory, so putting them into separate segments would've been
just ideal.

However, I will agree that the exploit where the attacker knows the specific
location in memory of a system function and cleverly overwrites the return
address in the stack segment to point to that address wouldn't have been
preventable by a non-executable stack of any kind.

Yousuf Khan
 
In comp.arch Bill Davidsen said:
There is a patch set to allow use on a non-MMU x86 system. Think
embedded. Can't find the pointer quickly, but there is one. Will run on
an 8086 as I recall.

Uhh... yes there is, and it's called ELKS.
 
You didn't need long pointers at all. Stack access instructions were assumed
to be coming from the SS segment. Data access instructions were assumed to
be coming from the DS segment (though you could override with ES, FS, and
GS, as well).

In C, a "data pointer" might point to an object on the stack. Of course,
static analysis can sometimes figure out whether it's pointing to the stack
or not, but in most cases it can't know, so you need long
pointers everywhere :-(
You really have no idea what was in the advanced version of segments,
do you? Everything you were asking for was in Protected mode segments. It
had a start address, a length, and various status bits, including r/w, r/o,
and exec. It was all there in *Protected mode segments*.

Do they allow mixed ro/rw where some parts are writable and others aren't?
And are you really advocating "one segment per object" ? It's going to be
....interesting... to see how you can make it go fast.

Now compare that with situations where the code is written in a strongly
typed language like ML, Sisal, Ada, Lisp, Modula-3, Java: you get all that
security, super fine-grained and all, with no special extra hardware support
for 2^32 segments. Sure those languages also have their own performance
cost, but you can almost always recover the full performance with careful
coding of the crucial part.
Each could be up to 2^32 bytes long, meaning 4 GB per segment. How big did you
want them to be?

From 1 word to half-the-address-space.


Stefan
 
On Wed, 02 Jun 2004 05:36:57 GMT in comp.arch, "Yousuf Khan" wrote:
You didn't need long pointers at all. Stack access instructions were assumed
to be coming from the SS segment. Data access instructions were assumed to
be coming from the DS segment

And if someone passes you a pointer, how do you know which of those two
it is, unless it is some kind of "long" pointer?
 
Also because segmented memory is a royal PITA.
I'll tell you this once, and I'll repeat it as many times as you like. You
didn't need segment prefixes in most cases. All instruction accesses
defaulted to using CS. All data segment accesses defaulted to using DS.
String copy defaulted to DS (source) and ES (destination). Stack accesses
defaulted to the SS segment register. Once they were put into the segment
register, it meant all of their privilege checks were already performed, and
they no longer needed to be checked again. Even the occasional time that
you'd want to use segment override prefixes, you'd be using the two extra
segment registers like FS and GS, even those would be cached in a register
with all of their privilege checks already performed.

Oh yeah? I have a pointer to something. Where is it? If it's an
automatic variable, it's on the stack, if I malloc'd it, it's on the
heap in the DS, if it's a read only variable it may be in the CS.
Now, how do I tell the hardware, without using segment prefixes, where
the object pointed to is?
I now need 48-bit pointers to refer to a 4GB address space, clearly a
waste of memory.
Yes, most x86 operating systems defaulted to making the CS and DS refer to
the same locations in memory, and that's precisely what I am criticising --
they shouldn't have ever done that. Pointing them to separate physical
locations is what they should've been doing all along. Let's face it, there
is very little reason to read or write data into instruction sections of
memory, and there is very little reason to read instructions from data
sections of memory, so putting them into separate segments would've been
just ideal.

Some compilers, like gcc, put string constants into the code area,
just because it is write protected.
Also what about trampolines, small code chunks placed on the stack, or
in the data area?
However, I will agree that the exploit where the attacker knows the specific
location in memory of a system function and cleverly overwrites the return
address in the stack segment to point to that address wouldn't have been
preventable by a non-executable stack of any kind.
Exactly. There was a long discussion on LKML when the non-executable
stack patches were first proposed. The eventual consensus was that
nx-stack made exploits somewhat more difficult, but not
insurmountable.
 
In comp.sys.ibm.pc.hardware.chips Yousuf Khan said:
You didn't need long pointers at all. Stack access instructions were assumed
to be coming from the SS segment. Data access instructions were assumed to
be coming from the DS segment (though you could override with ES, FS, and
GS, as well). And the segments were all cached inside their own registers,
plus there were segment buffer caches (at least in some of the x86 family
members) which stored the most recently used segments inside the processor
so that they wouldn't have to be reloaded from a segment descriptor table in
memory. Even when using a segment register override, if the segment were
already inside the segment register it wasn't that much of a penalty to jump
(just a few privilege checks performed, but no need to read it in completely
from RAM).


Those were mainly for the DOS days, running in Real mode. Once the 32-bit
compilers came out, they got rid of all segment usage.


You really have no idea what was in the advanced version of segments,
do you? Everything you were asking for was in Protected mode segments. It
had a start address, a length, and various status bits, including r/w, r/o,
and exec. It was all there in *Protected mode segments*.


Each could be up to 2^32 bytes long, meaning 4 GB per segment. How big did you
want them to be?

Not unless the code segment overlaps the data segment. Segmented
addresses are translated to linear addresses, and if paging is on, the
linear address is translated by the mmu to a physical address. The
virtual address space is still only 4GB. Segments only subdivide the
4GB address space.
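
A minimal sketch of that first translation step, assuming a simplified expand-up segment with a byte-granular limit. The `segment` struct and `seg_to_linear` are hypothetical names, and the paging step from linear to physical address is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified segment: a window onto the one 32-bit linear address space. */
typedef struct {
    uint32_t base;    /* linear address where the segment starts */
    uint32_t limit;   /* highest valid offset within the segment */
} segment;

/* Translate segment:offset to a linear address.
   Returns 1 on success and stores the result in *linear.
   Returns 0 on a limit violation, where real hardware would fault. */
int seg_to_linear(const segment *seg, uint32_t offset, uint32_t *linear)
{
    if (offset > seg->limit)
        return 0;
    *linear = seg->base + offset;   /* result is still only 32 bits wide */
    return 1;
}
```

Two segments with different bases map the same offset to different linear addresses, but every result lands in the same 4GB linear space, so adding segments carves that space up rather than enlarging it.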
 
Stefan Monnier said:
In C, a "data pointer" might point to an object on the stack. Of
course, static analysis can sometimes figure out whether it's
pointing to the stack or not, but in most cases it can't know, so you
need long pointers everywhere :-(

Well, I'm sure the C compiler implementation would know when it's accessing
a stack element or a data element, and use the appropriate pointers (i.e.
stack or data). When are the only times you'd be pointing to an element on
the stack? It's usually only done when accessing a passed argument. You
should know which arguments are passed into a function, and which ones are
local to the function.
Do they allow mixed ro/rw where some parts are writable and others
aren't? And are you really advocating "one segment per object" ?
It's going to be ...interesting... to see how you can make it go fast.

No, the segments themselves would be all one attribute. If you wanted more
fine-grained tuning of the memory usage, then you'd have to use the
combination of the segments and pages. Depending on the combination of
privilege levels between the pages and the segments, the system defaults to
the higher security option.
From 1 word to half-the-address-space.

Yup.

Yousuf Khan
 
Well, I'm sure the C compiler implementation would know when it's accessing
a stack element or a data element, and use the appropriate pointers (i.e.
stack or data). When are the only times you'd be pointing to an element on
the stack? It's usually only done when accessing a passed argument. You
should know which arguments are passed into a function, and which ones are
local to the function.

<snarly>
I'm happy to see that all those people spending their career on this
problem are just idiots who should just ask you instead of wasting
everybody's time and money.
</>

This problem is simple if you're allowed to change the language such that
the coder has to put some annotations. But if you stick to C, it's
*very* difficult.
No, the segments themselves would be all one attribute. If you wanted more
fine-grained tuning of the memory usage, then you'd have to use the
combination of the segments and pages.

I know "fine-grained" is a debatable notion, but for me it means something
like word granularity. I'll accept 64-bit or even 128-bit words, but 512 bytes
are not "a word".

How about the other two questions: one segment per object? performance?


Stefan
 
Yousuf Khan said:
Well, I'm sure the C compiler implementation would know when it's accessing
a stack element or a data element, and use the appropriate pointers (i.e.
stack or data). When are the only times you'd be pointing to an element on
the stack? It's usually only done when accessing a passed argument. You
should know which arguments are passed into a function, and which ones are
local to the function.

I wish that were the case.
Unfortunately, you can take the address of
an automatic variable (which is on the stack), and store it in the heap.
So the static compiler cannot tell whether any arbitrary pointer
points to the stack or the heap unless some fancy analysis can prove otherwise.
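
A small sketch of that situation, with made-up names (`struct node`, `read_node`, `demo`, `global_counter`): a heap-allocated node holds a pointer that may refer to either an automatic or a static object, and the code that dereferences it cannot tell which.

```c
#include <assert.h>
#include <stdlib.h>

/* A heap-allocated record holding a pointer to an int that lives elsewhere. */
struct node {
    int *value;
};

int global_counter = 7;   /* lives in the static data area */

/* Reads through the pointer with no knowledge of where the int lives. */
int read_node(const struct node *n)
{
    return *n->value;
}

/* Stores the address of either a stack or a static object in the heap node,
   chosen at run time. Returns -1 if the allocation fails. */
int demo(int use_stack)
{
    int local = 42;                        /* automatic: on the stack */
    struct node *n = malloc(sizeof *n);    /* the node itself is on the heap */
    if (n == NULL)
        return -1;
    n->value = use_stack ? &local : &global_counter;
    int result = read_node(n);
    free(n);
    return result;
}
```

Because the choice depends on a run-time value, `read_node` would need either one flat address space or a pointer that carries its segment along.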

If you like segments that much,
check out guarded pointers if you haven't done so yet.
One relatively recent paper on guarded pointers:

http://www.ai.mit.edu/projects/aries/Documents/Memos/ARIES-02.pdf

Seongbae
 
David Harmon said:
On Wed, 02 Jun 2004 05:36:57 GMT in comp.arch, "Yousuf Khan"


And if someone passes you a pointer, how do you know which of those
two it is, unless it is some kind of "long" pointer?

I'm not going to get into C compiler design issues, if C requires that the
stack and data segments be combined, then let them be combined. That's also
possible, you can still keep the code segment off in a separate segment.

Yousuf Khan
 
Jerry Peters said:
Not unless the code segment overlaps the data segment. Segmented
addresses are translated to linear addresses, and if paging is on, the
linear address is translated by the mmu to a physical address. The
virtual address space is still only 4GB. Segments only subdivide the
4GB address space.


True, but we're not talking about a pure segments-only arrangement. We're
talking about a segmented/paged arrangement. Each segment could have
its own separate page entries in the page directory. Meaning that with
demand-paging only the sections of a segment required to be in memory could
be in memory while the rest remains marked virtual on disk.

Yousuf Khan
 
Yousuf Khan said:
Well, I'm sure the C compiler implementation would know when it's accessing
a stack element or a data element, and use the appropriate pointers (i.e.
stack or data). When are the only times you'd be pointing to an element on
the stack? It's usually only done when accessing a passed argument. You
should know which arguments are passed into a function, and which ones are
local to the function.


No. It's quite common to pass a pointer to a piece of automatic (on
stack) storage to a function accepting pointers to that type of data.
Or a pointer to global, or allocated storage. Unless you pass some
additional data (in x86, a selector, for example), the called routine
has no way to tell which segment to look in. Consider:


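#include <stdlib.h>
#include <string.h>

char a[20];                     /* global: static data area */

void f(void)
{
    char b[20];                 /* automatic: on the stack */
    char *c = malloc(20);       /* allocated: on the heap */
    static char d[20];          /* static local: static data area */

    strcpy(a, "abc");
    strcpy(b, "abc");
    if (c != NULL)
        strcpy(c, "abc");
    strcpy(d, "abc");
    free(c);
}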


What's poor strcpy() to do?
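
To make the question concrete, here is a minimal sketch of what passing that additional data could look like. It simulates segments with plain arrays instead of touching real selectors, and `far_ptr`, `far_deref`, `far_strcpy`, `seg_mem` and the `SEG_*` names are all invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Two simulated segments; the selector values are hypothetical. */
enum { SEG_DATA, SEG_STACK, SEG_COUNT };
enum { SEG_SIZE = 64 };

char seg_mem[SEG_COUNT][SEG_SIZE];

/* A "long" pointer: 16-bit selector plus 32-bit offset, 48 bits of payload. */
typedef struct {
    uint16_t selector;
    uint32_t offset;
} far_ptr;

/* Resolve a far pointer, checking the selector and the segment limit. */
char *far_deref(far_ptr p)
{
    assert(p.selector < SEG_COUNT);
    assert(p.offset < (uint32_t)SEG_SIZE);
    return &seg_mem[p.selector][p.offset];
}

/* strcpy() that works across segments because each argument says
   which segment it refers to. */
far_ptr far_strcpy(far_ptr dst, far_ptr src)
{
    far_ptr d = dst, s = src;
    for (;;) {
        char ch = *far_deref(s);
        *far_deref(d) = ch;
        if (ch == '\0')
            break;
        d.offset++;
        s.offset++;
    }
    return dst;
}
```

Every access now goes through the selector, which is the cost being argued about: wider pointers and a segment lookup on each dereference.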
 
This problem is simple if you're allowed to change the language such that
the coder has to put some annotations. But if you stick to C, it's
*very* difficult.

It is probably equivalent to solving the halting problem...8-).

With C and separate compilation, I see no way this could be done except by
using "long" pointers everywhere.

Jan
 
Well, I'm sure the C compiler implementation would know when it's accessing
a stack element or a data element, and use the appropriate pointers (i.e.

You mean a data element on the stack, on the heap, or in the static data
area (each of which might have its own segment if you are really
paranoid) ?

The answer is: no, of course not.

Only when it is /directly/ accessing that datum does it know for sure.
If you throw lots of compiler technology at the problem you can do
somewhat better but not all that much.

Imagine you are implementing strcpy(), which takes two pointers. Can
strcpy() guess which segments to use? No, of course not. Better pass it
long pointers, then.

Duh.

-Peter
 