T
Tommy Vercetti
I have an application that loads a 150 MB text file containing 6.6
million strings. I want to be able to quickly determine run lookups
against this and determine yes or no; is a given string in this collection.
This obviously requires a binary tree (or possibly a hash table). My
first solution was to use STL with std::set<std::string>. However, this
solution took up 600MB and was unacceptable. I revised this by reading
the entire 150MB text file into a single contiguous block, and building
an std::set based on pointers into this block. I used a custom
comparison operator that compares until the end of a line. This worked
and dramatically reduced load time and RAM use. RAM use is now at 350MB.
However, I'd like to reduce this RAM footprint further. Anyone have any
ideas for this kind of task? I'm thinking of implementing a special case
binary tree that is optimized for this specific task. Obviously, I only
need to balance the tree once and I don't need to worry about additions
or removals once everything is built.
Any ideas/suggestions?
thanks!
million strings. I want to be able to quickly determine run lookups
against this and determine yes or no; is a given string in this collection.
This obviously requires a binary tree (or possibly a hash table). My
first solution was to use STL with std::set<std::string>. However, this
solution took up 600MB and was unacceptable. I revised this by reading
the entire 150MB text file into a single contiguous block, and building
an std::set based on pointers into this block. I used a custom
comparison operator that compares until the end of a line. This worked
and dramatically reduced load time and RAM use. RAM use is now at 350MB.
However, I'd like to reduce this RAM footprint further. Anyone have any
ideas for this kind of task? I'm thinking of implementing a special case
binary tree that is optimized for this specific task. Obviously, I only
need to balance the tree once and I don't need to worry about additions
or removals once everything is built.
Any ideas/suggestions?
thanks!