B-Tree implementation

Guest · Aug 28, 2007

I have a need to manage a rather large set of indexed objects. Currently I'm
doing this in memory with a Dictionary<> and persisting to a text file, but
it has an unpleasant habit of using very large amounts of memory (several GB).

So I would like to offload these objects onto disk (access time is less
critical than memory in this case).

In the olden days I would just have picked up an open source B-Tree library
for C++, but I can't find one for C# (not at least, that isn't hidden inside
something terribly complicated!).

I could use SQLExpress, but this seems overkill (and has potentially high
overheads). Also, I could end up with billions of records which would put me
in paid for territory...

Can anyone point me to some way of read write persisting of objects with
keyed access (and ideally some modest amount of caching)...

Thanks

Iain

Alvin Bruney [MVP] · Aug 29, 2007

Separate your storage from your access. It shouldn't matter how it is stored
on disk. What matters is that it should be able to resume it's in-memory
representation when it is loaded into memory. To that end, just create a
class with a data structure that is serializable. Provide serialize and
deserialize methods to retrieve and store the data.

Guest · Aug 29, 2007

Alvin Bruney said:
Separate your storage from your access. It shouldn't matter how it is stored
on disk. What matters is that it should be able to resume it's in-memory
representation when it is loaded into memory. To that end, just create a
class with a data structure that is serializable. Provide serialize and
deserialize methods to retrieve and store the data.

Thanks for your response, Alvin, but I haven't made myself clear.

I have potentially several million records keyed by (say) name. I can't
afford for them to be in memory all at once.

So I want to be able to store each record on disk as it is created and later
(possibly much later) retrieve it by the name I've given it.

Serialization (from what I know) means the whole lot is IN or OUT of memory.
Which is not what I want!

Iain

Vadym Stetsiak · Aug 29, 2007

Hello, Iain!

You can serialize/deserialize a single record and then put that record into
one big file and access it by offset.

With best regards, Vadym Stetsiak.
Blog: http://vadmyst.blogspot.com

You wrote on Wed, 29 Aug 2007 01:00:01 -0700:

I> Thanks for your response, Alvin, but I haven't made myself clear.

I> I have potentially several million records keyed by (say) name. I
I> can't afford for them to be in memory all at once.

I> So I want to be able to store each record on disk as it is created
I> and later (possibly much later) retrieve it by the name I've given
I> it.

I> Serialization (from what I know) means the whole lot is IN or OUT of
I> memory.
I> Which is not what I want!

I> Iain

Guest · Aug 29, 2007

Vadym Stetsiak said:
Hello, Iain!

You can serialize/deserialize a single record and then put that record into
one big file and access it by offset.

Fair enough, but that's the easy part

.

I need to access it by name (and in general the objects won't arrive sorted).

Granted, I could have an in memory dictionary (serialised separately) with
the record objects serialised as you say. Though then, there are
complications which updates of the records (if they increase in size after an
update for example).

Though it's certainly true that this could be a solution.

However, I'd still be interested in any C# B-Tree based object serialisation
code ...

Iain

Kevin Spencer · Aug 29, 2007

Hi Iain,

Since you can use unsafe (c/C++) code in a C# application, you can use
something like the following, which is a tutorial on B-Tree algorithms with
source code available for download:

http://cis.stvincent.edu/html/tutorials/swd/btree/btree.html

--
HTH,

Kevin Spencer
Microsoft MVP

DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net

Guest · Aug 29, 2007

Kevin Spencer said:
Hi Iain,

Since you can use unsafe (c/C++) code in a C# application, you can use
something like the following, which is a tutorial on B-Tree algorithms with
source code available for download:

http://cis.stvincent.edu/html/tutorials/swd/btree/btree.html

Thanks for this.

For some reason I've never used C++ in the managed environment. I've done
lots of unmanaged c++ functionality as activeX called by c# programs, but
never put them in the same place.

This makes a lot of sense (though somehow I doubt it's *quite* as easy as
all that!).

Iain

Kevin Spencer · Aug 30, 2007

Hi lain,

Actually, it IS quite easy. Just use unsafe code blocks:

unsafe
{
// code
}

In the project, allow unsafe code blocks in the compiler. And Bob's your
uncle!

--
HTH,

Kevin Spencer
Microsoft MVP

DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net

B-Tree implementation

Guest

Alvin Bruney [MVP]

Guest

Vadym Stetsiak

Guest

Kevin Spencer

Guest

Kevin Spencer