String Question


Ann Marinas

Happy New Year to all! :D

I am currently developing an application that imports data from a CSV file.
Each comma separates one array item that I need to extract.

My problem is this...

I am encountering strings like the example below:

a, b, c, "d,e,f,g", abcdef

----The data that has double quotes is considered to be one column.

Whenever I use the String.Split(',') method, it keeps splitting on the
commas inside the quotes.

How can I prevent this so that it would treat the commas within the quotes
as a single unit?

I would really appreciate the help!

More Power to all! :D

Ann
 
Rather than parsing the data manually, you might want to use the ODBC driver
for text files. It handles all these little details for you, and allows you
to see the file as if it were a database table (you can execute sql
statements, etc).

And happy new year to you too!

-Rob Teixeira [MVP]
 
If only String.Split used strings instead of chars/char
arrays to specify the delimiter/s!

One solution that springs to mind, while not pretty, is
to just iterate through your array looking for elements
that contain double quotes. Assuming your original
string was correctly formatted such that all opening
double quotes had corresponding closing quotes, it should
work (although might take a while if your CSV strings are
huge). Here's a sample of what I mean (note that I have
not parsed, compiled or tested this code fragment - typos
or syntax errors are free bonuses! Also you could
probably make this more elegant - this is just to
illustrate the concept):

// Requires: using System.Collections; using System.Text;
string origStr = "a, b, c, \"d,e,f,g\", abcdef";
string[] newArray = origStr.Split(',');
ArrayList finalValues = new ArrayList();
bool foundFirstQuote = false, foundSecondQuote = false;
StringBuilder newElem = new StringBuilder();
for (int i = 0; i < newArray.Length; i++)
{
    string elem = newArray[i];
    if ((elem.IndexOf("\"") >= 0) || foundFirstQuote)
    {
        // Our element looks like "a or a", or sits between the two
        if (foundFirstQuote)
        {
            // We are inside the multival column - only an element
            // containing a quote is its last value
            foundSecondQuote = elem.IndexOf("\"") >= 0;
        }
        else
        {
            // We got the first val in the multival column
            // - start building a single string of comma
            // separated values until we hit the next
            // double quote
            newElem = new StringBuilder();
            foundFirstQuote = true;
            // An element such as "d" opens and closes the column itself
            foundSecondQuote = elem.IndexOf("\"") != elem.LastIndexOf("\"");
        }
        if (newElem.Length > 0)
        {
            // Separate the values with commas
            newElem.Append(",");
        }
        // Add the new value to the comma-separated list
        newElem.Append(elem);
        if (foundSecondQuote)
        {
            // We have now processed the last value in
            // the list - add the whole list to our
            // ArrayList.
            finalValues.Add(newElem.ToString());
            foundFirstQuote = false;
            foundSecondQuote = false;
        }
    }
    else
    {
        // This value contained no double quotes - just
        // add it as-is to the ArrayList.
        finalValues.Add(elem);
    }
}

Hope this helps. Sorry there was not a simpler answer!
 
Greetings
If only String.Split used strings instead of chars/char
arrays to specify the delimiter/s!

I hit that split issue recently. I came up with the following hack solution:

string s = "need<br>to<br>split<br>this<br>string<br>into<br>an<br>array<br>";
string[] parts = s.Replace("<br>", "~").Split('~');

What about Regex.Split()?
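
To illustrate the Regex.Split() idea, here is a minimal sketch. The class and method names (SplitSketch, SplitOnString, SplitCsvLine) are made up for this example and do not come from the thread. The first method splits on a multi-character delimiter without the "~" substitution, which would break if the data itself contained "~". The second uses a lookahead to split only on commas that are followed by an even number of double quotes, which assumes every opening quote has a matching closing quote. Later .NET versions also have a String.Split overload that takes a string[] of delimiters, which covers the first case without a regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Split on a multi-character delimiter. Regex.Escape guards against
    // delimiters that contain regex metacharacters.
    public static string[] SplitOnString(string input, string delimiter)
    {
        return Regex.Split(input, Regex.Escape(delimiter));
    }

    // Split on commas that are followed by an even number of double
    // quotes, i.e. commas outside a quoted field.
    // Assumption: quotes in the line are balanced.
    public static string[] SplitCsvLine(string line)
    {
        return Regex.Split(line, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
    }
}
```

With the line from the original question, SplitSketch.SplitCsvLine("a, b, c, \"d,e,f,g\", abcdef") yields five elements, and the fourth keeps its quotes and leading space: ` "d,e,f,g"`. The lookahead rescans the rest of the line for every comma, so this is likely to be slow on very long lines; a single character-by-character pass avoids that.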





Andrew Warren said:
If only String.Split used strings instead of chars/char
arrays to specify the delimiter/s!

 
Hi,
[inline]

Ann Marinas said:
Whenever I use the String.Split(',') method, it keeps splitting on the
commas inside the quotes.

If you want, you could write your own split function like this; it isn't
that complicated:

// Requires: using System.Collections;
public static string[] SmartSplit(string line)
{
    ArrayList parts = new ArrayList();
    int i, iStart = 0;
    bool bQuoted = false;

    for (i = 0; i < line.Length; ++i)
    {
        switch (line[i])
        {
            case ',':
                // Only a comma outside quotes ends a field
                if (!bQuoted)
                {
                    parts.Add(line.Substring(iStart, i - iStart));
                    iStart = i + 1;
                }
                break;
            case '"': // read: single double single quote
                bQuoted = !bQuoted;
                break;
        } // end switch
    } // end for
    // Add the last field (everything after the final unquoted comma)
    parts.Add(line.Substring(iStart, i - iStart));

    return (string[])parts.ToArray(typeof(string));
}


HTH,
greetings
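
SmartSplit above returns each field exactly as it appears in the line, so the quoted column still carries its double quotes and the fields after a ", " keep their leading space. If cleaned-up values are wanted, one possible variation is sketched below. CsvSketch and ParseLine are hypothetical names, not part of the thread. The sketch makes two assumptions that may not match your data: a doubled quote ("") inside a quoted field stands for one literal quote, and whitespace around every field can be trimmed.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvSketch
{
    // Hypothetical helper: splits one CSV line in a single pass,
    // dropping the enclosing quotes and trimming each field.
    public static List<string> ParseLine(string line)
    {
        List<string> fields = new List<string>();
        StringBuilder current = new StringBuilder();
        bool quoted = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                if (quoted && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // "" inside quotes is a literal quote
                    i++;                 // skip the second quote of the pair
                }
                else
                {
                    quoted = !quoted;    // opening or closing quote
                }
            }
            else if (c == ',' && !quoted)
            {
                fields.Add(current.ToString().Trim());
                current.Length = 0;      // start the next field
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString().Trim()); // last field
        return fields;
    }
}
```

For the line in the original question, CsvSketch.ParseLine("a, b, c, \"d,e,f,g\", abcdef") returns the five values a, b, c, d,e,f,g and abcdef. Like SmartSplit, it does not handle quoted fields that span several lines.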
 
Thank you so much for all of your help!

I really do appreciate it! :)

God Bless!

--Ann
 