Why was Intel a no-show on No Execute?

  • Thread starter: Yousuf Khan
In comp.sys.ibm.pc.hardware.chips, Peter "Firefly" Lund said:
Pointer swizzling.

-Peter

Yeah, just what the world needs, a swapping system for segments. Can
you say _slow_? Given the context of this discussion, about using NX
segments, can you imagine the thrashing with disjoint 4GB code & data
segments? That's what Yousuf's proposing, you realize.

Jerry
 
Yeah, just what the world needs, a swapping system for segments. Can
you say _slow_? Given the context of this discussion, about using NX
segments, can you imagine the thrashing with disjoint 4GB code & data
segments? That's what Yousuf's proposing, you realize.

/I/ know that ;)

-Peter
 
Jerry,

Windows works the same way. Each process has its own
set of page tables. That is how Windows multitasking
works. Part of a process switch is to switch the current set of
page tables. That is how the same virtual address in each process
accesses a different physical address.
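
For anyone following along, here is a toy model of that point. It assumes 4 KB pages and uses plain dictionaries in place of real page tables; the names `translate`, `proc_a`, `proc_b` and the frame numbers are made up for illustration and are not anything Windows actually uses.

```python
PAGE_SHIFT = 12                      # assume 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def translate(page_table, vaddr):
    """Map a virtual address to a physical one via a per-process table."""
    vpn = vaddr >> PAGE_SHIFT        # virtual page number
    frame = page_table[vpn]          # a KeyError stands in for a page fault
    return (frame << PAGE_SHIFT) | (vaddr & PAGE_MASK)

# Hypothetical tables: both processes map virtual page 0x400,
# but to different physical frames.
proc_a = {0x400: 0x1234}
proc_b = {0x400: 0x5678}

vaddr = 0x00400ABC
pa = translate(proc_a, vaddr)
pb = translate(proc_b, vaddr)
print(hex(pa), hex(pb))
```

A process switch in this model is nothing more than picking a different dictionary, which is roughly what reloading the page table base does on real hardware.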
 
Jerry Peters said:
Yeah, just what the world needs, a swapping system for segments. Can
you say _slow_? Given the context of this discussion, about using NX
segments, can you imagine the thrashing with disjoint 4GB code & data
segments? That's what Yousuf's proposing, you realize.

Not necessary, the pages will take care of the pointers.

Yousuf Khan
 
Stephen Sprunk said:
How? LDT/GDT are used to map virtual addresses to linear addresses,
and page tables are used to map linear addresses to physical
addresses. If the linear address space is 32-bit, how can a single
process address more than 4GB in unique virtual address space? My
understanding is that systems over 4GB use different page directories
for each process so that the linear address space can be reused, but
I don't see a similar scheme using selectors within a single process.
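
A rough sketch of the two stages described above, assuming classic non-PAE 4 KB paging (10 + 10 + 12 bits). The function names and the flat segment values are illustrative assumptions only. It shows why extra selectors do not buy extra linear space: every segment's base plus offset is truncated to the same 32-bit linear range.

```python
LINEAR_BITS = 32
LINEAR_MASK = (1 << LINEAR_BITS) - 1

def to_linear(base, limit, offset):
    """Segmentation stage: check the limit, add the base, truncate to 32 bits."""
    if offset > limit:
        raise ValueError("offset beyond segment limit (general protection fault)")
    return (base + offset) & LINEAR_MASK

def split_linear(linear):
    """Paging stage indices: page directory index, page table index, offset."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF

# Hypothetical flat 4 GB code and data segments, both based at zero.
code_seg = (0x00000000, 0xFFFFFFFF)
data_seg = (0x00000000, 0xFFFFFFFF)

lin_code = to_linear(*code_seg, 0x08048000)
lin_data = to_linear(*data_seg, 0x08048000)
print(hex(lin_code), hex(lin_data), split_linear(lin_code))

# A segment with a high base still wraps inside the 32-bit linear space.
wrapped = to_linear(0xFFFFF000, 0xFFFFFFFF, 0x2000)
print(hex(wrapped))
```

Both selectors land on the same linear address, so the page tables cannot tell them apart without help from the OS.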

It's been a while since this stuff has swirled around in my head, I remember
the last time I really looked at it seriously was back in the 386 days. Each
of the segments would have to have different page directories. And yes, I
guess if they have different page directories, then that would mean that the
segments need to occupy different linear addresses. However, the relative
offsets within each segment would still be from 0 to some number.

Yousuf Khan
 
Yousuf Khan said:
It's been a while since this stuff has swirled around in my head, I remember
the last time I really looked at it seriously was back in the 386 days. Each
of the segments would have to have different page directories.

Segments don't have page directories now, and segments didn't have page
directories back on the 386. The page directory base register is a single
global processor register, reloaded either explicitly (load to %cr3, IIRC),
or implicitly through a task gate control transfer between processes
(which is slow, and probably never used in modern OSes).

The only way I know of which could in theory give a single process
more than 32 bits of accessible address space, is the constant making
present / nonpresent of segments, with the kernel fault handler for
nonpresent segments fiddling with the page directory. However, as you
can imagine, that means a user/kernel/user switch whenever a nonpresent
segment is used, accompanied by TLB invalidation stuff on page directory
reloading.
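
To make the cost concrete, here is a toy simulation of that scheme. Everything in it is hypothetical: `ToyCpu`, the segment names and the "pd0"/"pd1" page directory labels are stand-ins, and a real handler would be editing descriptors in the GDT/LDT and loading CR3 rather than updating Python sets.

```python
class ToyCpu:
    def __init__(self, segment_to_pagedir):
        # Hypothetical kernel bookkeeping: which page directory backs each segment.
        self.segment_to_pagedir = segment_to_pagedir
        self.cr3 = None              # currently loaded page directory
        self.present = set()         # segments currently marked present
        self.faults = 0              # segment-not-present faults taken
        self.tlb_flushes = 0         # page directory reloads (each flushes the TLB)

    def access(self, segment):
        if segment not in self.present:
            self._segment_fault(segment)
        return self.cr3

    def _segment_fault(self, segment):
        """Kernel handler: switch page directories, then fix up present bits."""
        self.faults += 1
        wanted = self.segment_to_pagedir[segment]
        if wanted != self.cr3:
            self.cr3 = wanted
            self.tlb_flushes += 1
        # Only segments backed by the loaded page directory may stay present.
        self.present = {s for s, pd in self.segment_to_pagedir.items()
                        if pd == wanted}

cpu = ToyCpu({"code": "pd0", "data_lo": "pd0", "data_hi": "pd1"})
for seg in ["code", "data_lo", "data_hi", "code", "data_hi"]:
    cpu.access(seg)
print(cpu.faults, cpu.tlb_flushes)
```

In this run, four of the five accesses trap into the kernel and reload the page directory, which is the thrashing pattern being described whenever code and data live behind different page directories.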

Is anybody aware of a general purpose OS that did/does such segment twiddling?

best regards
Patrick
 
Patrick Schaaf said:
Segments don't have page directories now, and segments didn't have
page directories back on the 386.

I know that, I didn't mean they were literally linked, I meant it had to be
linked through a software setup inside the kernel.

The only way I know of which could in theory give a single process
more than 32 bits of accessible address space, is the constant making
present / nonpresent of segments, with the kernel fault handler for
nonpresent segments fiddling with the page directory. However, as you
can imagine, that means a user/kernel/user switch whenever a
nonpresent segment is used, accompanied by TLB invalidation stuff on
page directory reloading.

Which is the method I was actually thinking of. However, as you said it is a
slow process. It would be a less slow process if the data and code segments
occupied different locations in linear memory, therefore they could share
the same page table directories without requiring special OS-based software
page table switching techniques.

Is anybody aware of a general purpose OS that did/does such segment
twiddling?

I was thinking the pre-1.0 Linux kernels did it, but back then processes
were limited to 16MB of memory each in Linux, so therefore they could've all
fit into a single page directory, so it wasn't the ideal example.

Yousuf Khan
 
Yousuf Khan said:
Which is the method I was actually thinking of. However, as you said it is a
slow process. It would be a less slow process if the data and code segments
occupied different locations in linear memory, therefore they could share
the same page table directories without requiring special OS-based software
page table switching techniques.

That would have been possible if linear addresses had been expanded to
36-bit along with physical addresses, but they weren't :(

S
 
Jerry Peters wrote:

(snip)
??? You're confused. That's how Linux works now, each process has its
own page tables and hence address space. Oh, except Linux doesn't use
a task gate, since 1) it's slow, & 2) limits you to something like
4000 total processes. This still doesn't get you a _single_ AS that's
greater than 4GB. AFAIK x86 has no facilities like IBM's S/3x0 has to
have access to multiple address spaces at once.

Isn't that what segment selectors are for?

One problem, though, is that segment descriptors, as far
as I know, aren't cached. On S/3x0 TLB entries include the
STO (segment table origin) in such a way that they don't need
to be invalidated on STO change.

Well, that isn't quite right. Using segment selectors
and the invalid bits in the right way should allow the OS
to update page tables as needed such that segment selectors
can be used as address space selectors. There are
compilers that will generate large model 32 bit code.

-- glen
 
Jerry Peters wrote:

(snip)
No, that still doesn't get you an address space > 4GB.

Yes, it does. Virtual address space doesn't mean it is all
addressable in fast RAM at once.

"Each segment could have its own separate page entries in the
page directory." What the heck does this mean? A segment doesn't "have"
any entries in the page directory, the linear address which results
from segmentation gets translated to a physical address using the
page directory & page table entries. A linear address is 32 bits,
that's 4GB of address space.

The OS can update page tables and segment valid bits at the
appropriate time. It might be that no OS does that, but the
virtual address space is still 45 bits (not counting ring
bits or global/local bit).
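
The 45-bit figure can be checked with a little arithmetic. The sketch below uses the standard x86 selector layout (13-bit index, 1 table-indicator bit, 2 privilege bits); the variable names are my own.

```python
SELECTOR_BITS = 16
RPL_BITS = 2         # requested privilege level (the "ring bits")
TI_BITS = 1          # table indicator, GDT vs LDT (the "global/local bit")
INDEX_BITS = SELECTOR_BITS - RPL_BITS - TI_BITS   # descriptor index
OFFSET_BITS = 32     # offset within a segment

virtual_bits = INDEX_BITS + OFFSET_BITS           # index plus offset
with_both_tables = virtual_bits + TI_BITS         # counting GDT and LDT
linear_bits = 32                                  # what paging actually sees

print(INDEX_BITS, virtual_bits, with_both_tables, linear_bits)
```

So the segmented virtual space is much wider than the 32-bit linear space it has to be funneled through, which is where the disagreement in this thread comes from.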

-- glen
 
Jerry Peters wrote:

Yeah, just what the world needs, a swapping system for segments. Can
you say _slow_? Given the context of this discussion, about using NX
segments, can you imagine the thrashing with disjoint 4GB code & data
segments? That's what Yousuf's proposing, you realize.

If it were done some years ago, it might have made sense.
Now there is x86-64 at a reasonable price.

With reasonable locality, a program won't require updating
page tables all that often. It is only important on
machines with more than 4GB of real RAM, otherwise it
is always virtual, anyway.

-- glen
 
Stephen Sprunk wrote:

(snip)
While the physical address space can be 36-bit, isn't the linear address
space still limited to 32-bit? If that's correct, all segments must have
the same base (zero) if they're to have a 4GB limit.

Since most programs don't have 4GB of code, make the code
segment smaller. Segment descriptors have a limit field
that indicates the length of the segment for that reason.
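
As a sketch of how that limit field works: the descriptor holds a 20-bit limit plus a granularity bit, and with G=1 the limit counts 4 KB units. The helper names and the 1 GB code segment below are assumptions for illustration, and only ordinary expand-up segments are modeled.

```python
def effective_limit(limit20, granularity):
    """Expand the 20-bit descriptor limit to the last valid byte offset."""
    if not 0 <= limit20 <= 0xFFFFF:
        raise ValueError("limit field is 20 bits wide")
    if granularity:                      # G=1: limit counts 4 KB units
        return (limit20 << 12) | 0xFFF
    return limit20                       # G=0: limit counts bytes

def fetch_allowed(offset, limit20, granularity):
    """True if an instruction fetch at this offset is inside the segment."""
    return offset <= effective_limit(limit20, granularity)

# Hypothetical 1 GB code segment: fetches above it fault, so anything
# placed higher in the address space cannot be executed through CS.
CODE_LIMIT = 0x3FFFF
print(hex(effective_limit(CODE_LIMIT, True)))
print(fetch_allowed(0x3FFFFFFF, CODE_LIMIT, True))
print(fetch_allowed(0x40000000, CODE_LIMIT, True))
```

With data and stack kept above the code limit, the limit check behaves like a coarse no-execute boundary.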

-- glen
 
Jerry Peters wrote:

(snip)
In a function, automatic local variables are on the stack. I can pass
the address of one of these to any function which is expecting a
pointer of that type, I could also pass an address of a global or an
address returned by malloc, all in the same function call; think
printf and friends and character string arguments. In fact, with a
literal format string and using gcc defaults, I'd also be passing a
pointer in the text area of the program since gcc defaults to putting
ro strings in the code area.

In large model code, the contents of SS or DS would be passed.
Otherwise, SS=DS but DS != CS can still be done.

-- glen
 
In comp.sys.ibm.pc.hardware.chips glen herrmannsfeldt said:
Jerry Peters wrote:

(snip)


Isn't that what segment selectors are for?

One problem, though, is that segment descriptors, as far
as I know, aren't cached. On S/3x0 TLB entries include the
STO (segment table origin) in such a way that they don't need
to be invalidated on STO change.

Well, that isn't quite right. Using segment selectors
and the invalid bits in the right way should allow the OS
to update page tables as needed such that segment selectors
can be used as address space selectors. There are
compilers that will generate large model 32 bit code.

-- glen

Yeah, I thought of that, but the TLB design on the x86 would make
things very slow; every time you switched segments, the TLB would need
to be flushed.

On S/3x0, there are instructions to move to/from a secondary address
space; the OS isn't involved except to set up the bind to the
secondary space. Normal paging occurs in both ASes as needed, of
course.

Actually, I believe segment descriptors are cached; I vaguely remember
some old DOS tricks that essentially used the cached descriptors
from protected mode in real mode.
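
A toy comparison of the two TLB designs mentioned above: one whose entries are keyed by page only and must be flushed on every address space change, and one whose entries also carry an address space tag (as the S/3x0 entries do with the STO). The model assumes an unlimited-capacity TLB, and `run` and the access pattern are invented for illustration.

```python
def run(accesses, tagged):
    """Count TLB misses for a sequence of (address_space, page) accesses."""
    tlb = set()
    current = None
    misses = 0
    for space, page in accesses:
        if space != current:
            current = space
            if not tagged:
                tlb.clear()          # untagged entries are ambiguous, so flush
        key = (space, page) if tagged else page
        if key not in tlb:
            misses += 1
            tlb.add(key)
    return misses

# Hypothetical pattern: alternate between two address spaces on every access.
accesses = [("A", 1), ("B", 1), ("A", 1), ("B", 1), ("A", 2), ("B", 2)]
untagged_misses = run(accesses, tagged=False)
tagged_misses = run(accesses, tagged=True)
print(untagged_misses, tagged_misses)
```

The untagged design misses on every access in this pattern, while the tagged one only misses the first time each (space, page) pair is seen.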
 
In comp.sys.ibm.pc.hardware.chips glen herrmannsfeldt said:
Jerry Peters wrote:

(snip)


In large model code, the contents of SS or DS would be passed.
Otherwise, SS=DS but DS != CS can still be done.

-- glen

In the context of Linux, there is no such thing as "large model code".
There's a flat 4GB AS. According to Linus, gcc doesn't support
segments (from a recent thread on LKML about NX on X64).
 
glen herrmannsfeldt said:
Jerry Peters wrote:

(snip)


Isn't that what segment selectors are for?

Selectors or not, the linear address space is still 32-bit, so a single task
cannot immediately access more than 4GB of RAM no matter how many selectors
it uses.

Well, that isn't quite right. Using segment selectors
and the invalid bits in the right way should allow the OS
to update page tables as needed such that segment selectors
can be used as address space selectors. There are
compilers that will generate large model 32 bit code.

It may be possible to play tricks like this, but a latency in the hundreds
to thousands of cycles for an address space change makes it infeasible.
You'll get a lot more bang for your buck upgrading to a 64-bit system.

S
 
In comp.arch glen herrmannsfeldt said:
Stephen Sprunk wrote:

(snip)


Since most programs don't have 4GB of code, make the code
segment smaller. Segment descriptors have a limit field
that indicates the length of the segment for that reason.

So what happens when I mmap a file and later use mprotect to make
part of it executable?
 
Stephen Sprunk said:
It may be possible to play tricks like this, but a latency in the
hundreds to thousands of cycles for an address space change makes it
infeasible. You'll get a lot more bang for your buck upgrading to a
64-bit system.

Nobody is debating that, the only thing that was debated was whether it was
possible to do. It was just the ultimate limits of the segmented memory
model in an academic exercise.

Yousuf Khan
 
"Each segment could have its own separate page entries in the
page directory." What the heck does this mean?

I think that Yousuf Khan is proposing having a separate
page table for each segment (which I am assuming
is an x86 segment - I'm coming into the middle of this).
That's an architectural change,
but a relatively minor one. If one of the x86
page table formats that produces >32 bit physical
addresses is used, this would allow each of the
2^32 addresses in a segment to map to some subset
of the, say, 2^40 physical addresses.

The architectural change might have been as simple
as having the Page Table Base Register, PTBR/CR3,
point to a table in memory, each entry of which is the
page table base for a segment. (Taken to the extreme,
this is just another form of page table, with segment number
concatenated to the 32 bit virtual address.)

As you might imagine, Intel looked at this.
IIRC Novell seriously wanted it.

But I am rather glad that Intel did not implement
this for the x86. AMD64 is better.

Actually, in some ways I *like* 2D memory.
But it is a greater departure from the conventional
C memory model. It would have caused much more
pain than simply extending the virtual address
the way AMD64 did.
 
glen herrmannsfeldt said:
One problem, though, is that segment descriptors, as far
as I know, aren't cached.

The Intel P5 (original Pentium) had a segment descriptor cache.
Caused minor incompatibilities because it was not snooped;
i.e. it had "TLB semantics".
 