Fastest way to access data from a file.

Ignacio X. Domínguez
Hi. I'm developing a desktop application that needs to store some data in a
local file. Let's say for example that I want to have an address book with
names and phone numbers in a file. I would like to be able to retrieve the
name by searching for a given phone number the fastest I can.

I have considered the possibility of using XmlTextReader with something like:

<list>
<item number="1234567">
<name>John Doe</name>
</item>
<item number="9876*">
<name>Jane Doe</name>
</item>
</list>

or a simple text file with entries separated by special characters:

1234567~John Doe%9876*~Jane Doe%

and then use Split("%".ToCharArray()) to get an array of items, and then
Split("~".ToCharArray()) on every item to get name and number.

This database will have around 200 entries (name and number in this
example), so I would like to know which approach you think will perform
faster and better on most configurations.

Thank you in advance.

Ignacio X. Domínguez
 
I would recommend an XML approach.
If the data file is not expected to be gigantic, I would use a DOM
approach (XmlDocument). Then you can perform XPath evaluations, which are
fast if used efficiently, to find specific records.
The data file also reads well in a text editor, which is good for
debugging and monitoring the data.
 
Hi tocayo :)

The fastest way is to have it in memory: you could create a struct with
two members (name, number) and keep an ArrayList of them. You read them at
the beginning, and after that everything is served from memory.

If for some reason you cannot keep them in memory, the best way would be
a text file, not XML. In the text file you keep one record per line and
divide the fields with a special character. For further performance you
should put the field you will search on first and make it a fixed length;
that way you can use String.Substring instead of String.Split.

Anyway, the best way is to keep them all in memory. Even so, I think
XML is not a very good idea here.

Cheers,
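To illustrate the in-memory suggestion above, here is a minimal sketch (the tab separator, file layout, and class name are assumptions for the example; a Dictionary stands in for the struct-plus-ArrayList of the original suggestion):

```csharp
using System.Collections.Generic;
using System.IO;

class PhoneBook
{
    // number -> name, loaded once at startup; lookups after that never touch the disk
    readonly Dictionary<string, string> entries = new Dictionary<string, string>();

    public PhoneBook(string path)
    {
        foreach (string line in File.ReadAllLines(path))
        {
            // one record per line: number, a tab, then the name
            string[] parts = line.Split('\t');
            if (parts.Length == 2)
                entries[parts[0]] = parts[1];
        }
    }

    // Returns null when the number is not in the book.
    public string GetNameFromNumber(string number)
    {
        string name;
        return entries.TryGetValue(number, out name) ? name : null;
    }
}
```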
 
Hi Dennis,

> If the data file is not expected to be gigantic, I would
> use a DOM approach (XmlDocument).

I would have around 200 entries on average, in some cases maybe twice that
number. Do you think using XmlDocument in this case is better than using
XmlTextReader, or vice versa? Do these load the entire XML file into
memory?

> Then you can perform XPath evaluations, which are fast if used
> efficiently,

What is the most efficient way of using it?


Thanks in advance.
 
Hi,

That should make a small file, so I would concentrate instead on which
method is the most convenient for searches, insertions, etc. in the code.

Either way, you should read all of that data into memory.

cheers,
 
> I would have around 200 entries on average

200 entries is no trouble at all for the DOM to parse very fast.
Actually, you could have a lot more than that using the DOM as well.

> Do these load the entire XML file into memory?

The DOM (XmlDocument) implementation loads the data into memory,
whereas the XmlReader (XmlTextReader) does not.
However, I do not think that is a problem here.

> What is the most efficient way of using it?

Basically, just try to avoid the wildcard (*) in XPath evaluations.


This is my recommendation, assuming the XML looks like this:
<list>
<item number="1234567">
<name>John Doe</name>
</item>
<item number="9876543">
<name>Jane Doe</name>
</item>
</list>


<snippet>
string filename = @"C:\phonenumbers.xml", phonenumber = "1234567";
XmlDocument doc = new XmlDocument();
doc.Load(filename);
// Select the node.
XmlNode resultNode = doc.DocumentElement.SelectSingleNode(
    string.Format("item[@number='{0}']", phonenumber));
// Now we have all information about the item in the resultNode instance.
if (null != resultNode)
    Console.WriteLine("Phone number {0} belongs to {1}!",
        phonenumber, resultNode.SelectSingleNode("name").InnerText);
else
    Console.WriteLine("Could not find phone number {0}.", phonenumber);
</snippet>
 
Hi tocayo (hehehe). Why do you think a simple text file is better than XML
in this case? The problem I see with the text file is the special separator
character, because it somewhat limits what the strings in each entry can
contain (they can't include that character). Therefore, I need to make sure
in code that the strings being stored in the text file never contain it. I
know I can use some obscure character, making it unlikely to be chosen by
any user, but there is still a chance.

What are your thoughts about this?
 
I was thinking of something like this:

<phonebook>
<item number="1234567">John Doe</item>
<item number="7654321">Jane Doe</item>
</phonebook>

I tried the following code and it worked, but I'm not sure it is the most
efficient way:

string GetNameFromNumber(string number)
{
    System.Xml.XmlTextReader XMLReader =
        new System.Xml.XmlTextReader("PhoneBook.xml");
    XMLReader.MoveToContent();
    while (XMLReader.Read())
    {
        XMLReader.MoveToContent();
        string thisnum = XMLReader.GetAttribute("number");
        if (thisnum != null && thisnum == number)
        {
            return XMLReader.ReadString();
        }
    }
    return null;
}

What do you think?

 
Hi,

Ignacio X. Domínguez said:
> Hi tocayo (hehehe). Why do you think a simple text file is better than
> XML in this case?

It's simpler and faster than the XML approach. With XML you have to deal
with building queries, etc., and searching the data is not as easy; you
have to create objects for each query, and so on.

> The problem I see with the text file is the special separator
> character, because it is going to limit somehow what the strings in each
> entry can contain (can't have that char).

Not at all. If you use a comma you will get this problem; if you use a
control character you will not. I have a similar approach in a Pocket PC
app and I use char(20), so I don't have the problem of the separator
character appearing in a field. A simple String.Split and it's done.

Then you can use something like a Hashtable and use the numbers (or names)
as keys for the other value: very easy, efficient, and the code is cleaner.


Cheers,
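A rough sketch of the control-character idea described above (the Save/Load names are illustrative, and a generic Dictionary stands in for the Hashtable; char 20 is the separator mentioned in the post):

```csharp
using System.Collections.Generic;
using System.IO;

static class SeparatorDemo
{
    // A control character (here 0x14, as suggested) is never typed by a user,
    // so it is safe to use as the field separator.
    const char Sep = (char)20;

    public static void Save(string path, IDictionary<string, string> book)
    {
        using (StreamWriter w = new StreamWriter(path))
            foreach (KeyValuePair<string, string> e in book)
                w.WriteLine(e.Key + Sep + e.Value);   // number<SEP>name, one per line
    }

    public static Dictionary<string, string> Load(string path)
    {
        Dictionary<string, string> book = new Dictionary<string, string>();
        foreach (string line in File.ReadAllLines(path))
        {
            string[] f = line.Split(Sep);             // a simple Split and it's done
            if (f.Length == 2)
                book[f[0]] = f[1];
        }
        return book;
    }
}
```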
 
Yes, if you insist on using the XmlReader, that looks good.
Just remember to invoke Close() before returning from the function;
otherwise the file channel remains open.

If it is likely the function will be called frequently, then maybe you
should think of either keeping a file channel open or loading the data at
startup, maybe into an array of a custom struct, for the best performance
possible:

struct Item
{
    string number, name;
}


--
Regards,
Dennis JD Myrén
Oslo Kodebureau
 
I chose XmlReader because it doesn't load the XML into memory. The function
is going to be called VERY frequently, but the file can also be updated
externally. So if I don't invoke Close(), does it keep the file channel
open and reuse it on the next call, or does it leave many open channels?
Thank you SO much for your help.


 
> So if I don't invoke Close(), does it keep the file channel open and
> reuse it on the next call, or does it leave many open channels?

If you are using a local instance, then always call Close() before
returning from the method. The file channel remains open until, hopefully,
the GC disposes of it. When you open the file again, a new channel has to
be established. Always call Close.

> The function is going to be called VERY frequently,
> but the file can also be updated externally.

Then you have no option but to open the file for each request and close it
afterwards. I feel I am repeating myself, but this is another reason for
using the DOM instead. You could have a global instance, which will allow
you to both read and write. The changes will be visible immediately to
anyone who uses the instance. Only once per session does the file need to
be loaded from disk, and only once per session (assuming changes have been
made) does it need to be saved back. If multiple instances of your
application are likely to run, you should use a static instance.

Well, these are just my thoughts.
Of course, your solution will turn out fine anyway.


--
Regards,
Dennis JD Myrén
Oslo Kodebureau
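A sketch of the cached-DOM idea: one long-lived instance that reloads only when the file's timestamp changes. The timestamp check is an assumption added here to cope with the externally-updated file, and the class and member names are illustrative; it assumes the <phonebook><item number="...">Name</item></phonebook> layout from earlier in the thread.

```csharp
using System;
using System.IO;
using System.Xml;

class CachedPhoneBook
{
    readonly string path;
    XmlDocument doc;
    DateTime loadedAt;   // last write time of the copy we currently hold

    public CachedPhoneBook(string path) { this.path = path; }

    XmlDocument Document
    {
        get
        {
            DateTime stamp = File.GetLastWriteTimeUtc(path);
            if (doc == null || stamp != loadedAt)
            {
                // Load from disk only on first use or after an external change.
                doc = new XmlDocument();
                doc.Load(path);
                loadedAt = stamp;
            }
            return doc;
        }
    }

    public string GetNameFromNumber(string number)
    {
        XmlNode node = Document.DocumentElement.SelectSingleNode(
            string.Format("item[@number='{0}']", number));
        return node == null ? null : node.InnerText;
    }
}
```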
 