Input from a CSV file double quotation mark proglem

  • Thread starter Thread starter Gregg
  • Start date Start date
G

Gregg

Hello all,

I have been banging my head over a problem that I am having reading a
comma seperated file (CSV) that can contain from 1 to 10,000 records.
My code snipit is as follows:

**Start code snipit**

Dim strCustFullName as string
Dim strCustAddr1 as string
Dim strCustAddr2 as string
Dim strCustCity as string
Dim strCustState as string
Dim strCustZip as string
Dim strCustLastName as string
Dim strCustFirstName as string

FileOpen(1, filename, OpenMode.Input)

Do While Not EOF(1)
Input(1,strCustFullName)
Input(1,strCustAddr1)
Input(1,strCustAddr2)
Input(1,strCustCity)
Input(1,strCustState)
Input(1,strCustZip)

'I then proceed to open and SQL table and insert the record I have
just
'read.
'I then close the SQL table and loop back to the next record in the
'CSV file

Loop

**End Code Snipit**

My CSV file that I am reading from looks like this:

**Start CSV Example**

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

**End CSV Example**

ok, so what I end up with is:
strCustFullName = LASTNAME,FIRSTNAME
strCustAddr1 =ADDR1
strCustAddr2 = ADDR2
strCustCity = CITY
strCustState = STATE
strCustZip = ZIP

I then use a SPLIT on strCustFullName on the "," and assign the two
values to strCustLastName and strCustFirst Name.

All of this works perfectly....until today. Today, I get a CSV file
that
looks like this:

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME""",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

The second record causes my input loop to die and treats the NICKNAME
as
though it was going in the Addr1 column.

So, how do I handle my input statements to ignore the quotes around
the NICKNAME
or at least treat them like a string and not a field seperator. As a
side note,
yes, I know there are 3 quotes after the NICKNAME. I was told by the
programmer
that passes this file to me that is the proper format and I tested
that by
opening this file with Excel, and Excel read the data correctly and
displayed
it as LASTNAME,FIRSTNAME "NICKNAME" (which is how the data looks from
the
original source)

The full intent of this process is to create an SQL table with the
name of
the CSV file and then populate it with the records within the CSV
file.
Process it for a Crystal Report and then throw the CSV file and the
Table
away. All worked great until I hit this.

I would appreciate any help at all. Thanks so much
Gregg Cliburn
 
You should be able to do this with a Regular Expression. I did a quick
search in Google for the term "CSV REGEX" and came across several regex's.

Here is the pattern and the site it came from.
(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)
http://geekswithblogs.net/mwatson/archive/2004/09/04/10658.aspx

I tested your sample data (in Chris Sells RegEx Designer .NET) and it seemed
to do the job fairly well. I think it will need to be modified slightly,
but, it pulled out the majority of the data correctly.
HTH,
Jared
 
Jared,

Thanks so much for the reply. However, the use of REGEX confuses me.
Can you give me an example of how I would use it to test the
strCustFullName field in my example ignore the quotes around the
nickname, but to throw away the quotes around the whole field.

Thanks for helping out a Dotnet newbie.
Gregg




Jared said:
You should be able to do this with a Regular Expression. I did a quick
search in Google for the term "CSV REGEX" and came across several regex's.

Here is the pattern and the site it came from.
(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)
http://geekswithblogs.net/mwatson/archive/2004/09/04/10658.aspx

I tested your sample data (in Chris Sells RegEx Designer .NET) and it seemed
to do the job fairly well. I think it will need to be modified slightly,
but, it pulled out the majority of the data correctly.
HTH,
Jared


Gregg said:
Hello all,

I have been banging my head over a problem that I am having reading a
comma seperated file (CSV) that can contain from 1 to 10,000 records.
My code snipit is as follows:

**Start code snipit**

Dim strCustFullName as string
Dim strCustAddr1 as string
Dim strCustAddr2 as string
Dim strCustCity as string
Dim strCustState as string
Dim strCustZip as string
Dim strCustLastName as string
Dim strCustFirstName as string

FileOpen(1, filename, OpenMode.Input)

Do While Not EOF(1)
Input(1,strCustFullName)
Input(1,strCustAddr1)
Input(1,strCustAddr2)
Input(1,strCustCity)
Input(1,strCustState)
Input(1,strCustZip)

'I then proceed to open and SQL table and insert the record I have
just
'read.
'I then close the SQL table and loop back to the next record in the
'CSV file

Loop

**End Code Snipit**

My CSV file that I am reading from looks like this:

**Start CSV Example**

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

**End CSV Example**

ok, so what I end up with is:
strCustFullName = LASTNAME,FIRSTNAME
strCustAddr1 =ADDR1
strCustAddr2 = ADDR2
strCustCity = CITY
strCustState = STATE
strCustZip = ZIP

I then use a SPLIT on strCustFullName on the "," and assign the two
values to strCustLastName and strCustFirst Name.

All of this works perfectly....until today. Today, I get a CSV file
that
looks like this:

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME""",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

The second record causes my input loop to die and treats the NICKNAME
as
though it was going in the Addr1 column.

So, how do I handle my input statements to ignore the quotes around
the NICKNAME
or at least treat them like a string and not a field seperator. As a
side note,
yes, I know there are 3 quotes after the NICKNAME. I was told by the
programmer
that passes this file to me that is the proper format and I tested
that by
opening this file with Excel, and Excel read the data correctly and
displayed
it as LASTNAME,FIRSTNAME "NICKNAME" (which is how the data looks from
the
original source)

The full intent of this process is to create an SQL table with the
name of
the CSV file and then populate it with the records within the CSV
file.
Process it for a Crystal Report and then throw the CSV file and the
Table
away. All worked great until I hit this.

I would appreciate any help at all. Thanks so much
Gregg Cliburn
 
I put this small example together for you. It provides no error handling,
you'll have to add it yourself. The input I used was as follows. Notice that
I removed one of the quotes after nickname.
HTH,
Jared

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME"",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME"",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME"",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

Imports System.IO
Imports System.Text.RegularExpressions

Private Sub ParseCSV(byval PathToFile as string)
Dim strCustFullName As String
Dim strCustAddr1 As String
Dim strCustAddr2 As String
Dim strCustCity As String
Dim strCustState As String
Dim strCustZip As String
Dim strCustLastName As String
Dim strCustFirstName As String

Dim File As New FileStream(PathToFile, FileMode.Open)
Dim Reader As New StreamReader(File)
Dim Line As String
Dim Reg As New
System.Text.RegularExpressions.Regex("(?:^|,)(\\\""(?:[^\\\""]+|\\\""\\\"")*\\\""|[^,]*)")
Dim Elements() As String
Do While Reader.Peek >= 0
Line = Reader.ReadLine
Elements = Reg.Split(Line)

strCustLastName =
Elements(1).Remove(Elements(1).IndexOf(ControlChars.Quote), 1)
strCustFirstName =
Elements(3).Remove(Elements(3).LastIndexOf(ControlChars.Quote), 1)
strCustAddr1 = Elements(5)
strCustAddr2 = Elements(7)
strCustCity = Elements(9)
strCustState = Elements(11)
strCustZip = Elements(13)
' Do something with fields here
' "INSERT INTO MYDATABASE VALUES(" & strCustFirstName, strCustLastName,
strCustAddr1, _
' strCustAddr2, strCustCity, strCustZip & ")"
Loop
Reader.Close()
File.Close()
End Sub

Gregg said:
Jared,

Thanks so much for the reply. However, the use of REGEX confuses me.
Can you give me an example of how I would use it to test the
strCustFullName field in my example ignore the quotes around the
nickname, but to throw away the quotes around the whole field.

Thanks for helping out a Dotnet newbie.
Gregg




Jared said:
You should be able to do this with a Regular Expression. I did a quick
search in Google for the term "CSV REGEX" and came across several
regex's.

Here is the pattern and the site it came from.
(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)
http://geekswithblogs.net/mwatson/archive/2004/09/04/10658.aspx

I tested your sample data (in Chris Sells RegEx Designer .NET) and it
seemed
to do the job fairly well. I think it will need to be modified slightly,
but, it pulled out the majority of the data correctly.
HTH,
Jared


Gregg said:
Hello all,

I have been banging my head over a problem that I am having reading a
comma seperated file (CSV) that can contain from 1 to 10,000 records.
My code snipit is as follows:

**Start code snipit**

Dim strCustFullName as string
Dim strCustAddr1 as string
Dim strCustAddr2 as string
Dim strCustCity as string
Dim strCustState as string
Dim strCustZip as string
Dim strCustLastName as string
Dim strCustFirstName as string

FileOpen(1, filename, OpenMode.Input)

Do While Not EOF(1)
Input(1,strCustFullName)
Input(1,strCustAddr1)
Input(1,strCustAddr2)
Input(1,strCustCity)
Input(1,strCustState)
Input(1,strCustZip)

'I then proceed to open and SQL table and insert the record I have
just
'read.
'I then close the SQL table and loop back to the next record in the
'CSV file

Loop

**End Code Snipit**

My CSV file that I am reading from looks like this:

**Start CSV Example**

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

**End CSV Example**

ok, so what I end up with is:
strCustFullName = LASTNAME,FIRSTNAME
strCustAddr1 =ADDR1
strCustAddr2 = ADDR2
strCustCity = CITY
strCustState = STATE
strCustZip = ZIP

I then use a SPLIT on strCustFullName on the "," and assign the two
values to strCustLastName and strCustFirst Name.

All of this works perfectly....until today. Today, I get a CSV file
that
looks like this:

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME""",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

The second record causes my input loop to die and treats the NICKNAME
as
though it was going in the Addr1 column.

So, how do I handle my input statements to ignore the quotes around
the NICKNAME
or at least treat them like a string and not a field seperator. As a
side note,
yes, I know there are 3 quotes after the NICKNAME. I was told by the
programmer
that passes this file to me that is the proper format and I tested
that by
opening this file with Excel, and Excel read the data correctly and
displayed
it as LASTNAME,FIRSTNAME "NICKNAME" (which is how the data looks from
the
original source)

The full intent of this process is to create an SQL table with the
name of
the CSV file and then populate it with the records within the CSV
file.
Process it for a Crystal Report and then throw the CSV file and the
Table
away. All worked great until I hit this.

I would appreciate any help at all. Thanks so much
Gregg Cliburn
 
Jared,

Thank you so much! I truly appreciate it. It seems to work pretty
good, except now I have found that sometimes the CITY column will
appear like this:
"CITY"
and the name might look like the following:
"LASTNAME,FIRSTNAME,MI"

I never realized that people could enter information in so many
different ways!
I have tried to add the

..Remove(Elements(1).IndexOf(ControlChars.Quote), 1)

to each of the split elements, but that doesn't seem to work, probably
because the System.Text.RegularExpressions.Regex("(?:^|,)(\\\""(?:[^\\\""]+|\\\""\\\"")*\\\""|[^,]*)")
would be different. Can I add the additional stuff to check for within
this one line, or do I need to define a different one for each
situation?

Thanks again for all the help, you are definatly showing me the right
direction.
Gregg


Jared said:
I put this small example together for you. It provides no error handling,
you'll have to add it yourself. The input I used was as follows. Notice that
I removed one of the quotes after nickname.
HTH,
Jared

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME"",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME"",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME"",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

Imports System.IO
Imports System.Text.RegularExpressions

Private Sub ParseCSV(byval PathToFile as string)
Dim strCustFullName As String
Dim strCustAddr1 As String
Dim strCustAddr2 As String
Dim strCustCity As String
Dim strCustState As String
Dim strCustZip As String
Dim strCustLastName As String
Dim strCustFirstName As String

Dim File As New FileStream(PathToFile, FileMode.Open)
Dim Reader As New StreamReader(File)
Dim Line As String
Dim Reg As New
System.Text.RegularExpressions.Regex("(?:^|,)(\\\""(?:[^\\\""]+|\\\""\\\"")*\\\""|[^,]*)")
Dim Elements() As String
Do While Reader.Peek >= 0
Line = Reader.ReadLine
Elements = Reg.Split(Line)

strCustLastName =
Elements(1).Remove(Elements(1).IndexOf(ControlChars.Quote), 1)
strCustFirstName =
Elements(3).Remove(Elements(3).LastIndexOf(ControlChars.Quote), 1)
strCustAddr1 = Elements(5)
strCustAddr2 = Elements(7)
strCustCity = Elements(9)
strCustState = Elements(11)
strCustZip = Elements(13)
' Do something with fields here
' "INSERT INTO MYDATABASE VALUES(" & strCustFirstName, strCustLastName,
strCustAddr1, _
' strCustAddr2, strCustCity, strCustZip & ")"
Loop
Reader.Close()
File.Close()
End Sub

Gregg said:
Jared,

Thanks so much for the reply. However, the use of REGEX confuses me.
Can you give me an example of how I would use it to test the
strCustFullName field in my example ignore the quotes around the
nickname, but to throw away the quotes around the whole field.

Thanks for helping out a Dotnet newbie.
Gregg




Jared said:
You should be able to do this with a Regular Expression. I did a quick
search in Google for the term "CSV REGEX" and came across several
regex's.

Here is the pattern and the site it came from.
(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)
http://geekswithblogs.net/mwatson/archive/2004/09/04/10658.aspx

I tested your sample data (in Chris Sells RegEx Designer .NET) and it
seemed
to do the job fairly well. I think it will need to be modified slightly,
but, it pulled out the majority of the data correctly.
HTH,
Jared


Hello all,

I have been banging my head over a problem that I am having reading a
comma seperated file (CSV) that can contain from 1 to 10,000 records.
My code snipit is as follows:

**Start code snipit**

Dim strCustFullName as string
Dim strCustAddr1 as string
Dim strCustAddr2 as string
Dim strCustCity as string
Dim strCustState as string
Dim strCustZip as string
Dim strCustLastName as string
Dim strCustFirstName as string

FileOpen(1, filename, OpenMode.Input)

Do While Not EOF(1)
Input(1,strCustFullName)
Input(1,strCustAddr1)
Input(1,strCustAddr2)
Input(1,strCustCity)
Input(1,strCustState)
Input(1,strCustZip)

'I then proceed to open and SQL table and insert the record I have
just
'read.
'I then close the SQL table and loop back to the next record in the
'CSV file

Loop

**End Code Snipit**

My CSV file that I am reading from looks like this:

**Start CSV Example**

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

**End CSV Example**

ok, so what I end up with is:
strCustFullName = LASTNAME,FIRSTNAME
strCustAddr1 =ADDR1
strCustAddr2 = ADDR2
strCustCity = CITY
strCustState = STATE
strCustZip = ZIP

I then use a SPLIT on strCustFullName on the "," and assign the two
values to strCustLastName and strCustFirst Name.

All of this works perfectly....until today. Today, I get a CSV file
that
looks like this:

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME""",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

The second record causes my input loop to die and treats the NICKNAME
as
though it was going in the Addr1 column.

So, how do I handle my input statements to ignore the quotes around
the NICKNAME
or at least treat them like a string and not a field seperator. As a
side note,
yes, I know there are 3 quotes after the NICKNAME. I was told by the
programmer
that passes this file to me that is the proper format and I tested
that by
opening this file with Excel, and Excel read the data correctly and
displayed
it as LASTNAME,FIRSTNAME "NICKNAME" (which is how the data looks from
the
original source)

The full intent of this process is to create an SQL table with the
name of
the CSV file and then populate it with the records within the CSV
file.
Process it for a Crystal Report and then throw the CSV file and the
Table
away. All worked great until I hit this.

I would appreciate any help at all. Thanks so much
Gregg Cliburn
 
Gregg,
The remove method should work, make sure you are setting value when you
use it.
For instance:

Dim str as string = "test"
' This line does nothing, the return value of the function was never
assigned to anything
str.Remove(0,1)
' the value of str = "test"
' This will work - forgive the explanation if you were doing this
str = str.Remove(0,1)
' the value of str = "est"


I would probably use the replace method vs. the remove in this scenario
Elements(1) = Elements(1).Replace(ControlChars.Quote, ""c)

The following are just suggestions, read them or not, just trying to give
you some extra ideas.

I think you would have better luck if you converted your csv input files
into xml equalivants. As it stands right now, you should be able to publish
a schema document (xsd) for your customers that you can validate the xml
against. This way the customer that supplied you with the file knows exactly
what format you will accept. It also helps to alleviate the problems with
your formatting issues, as long as the document validates you shouldn't have
a problem with your inputs/inserts and you can modify the file (for the most
part) without breaking or modifying your application.

Something like the following files will save you time in the long run,
unfortunately, you will have to rewrite your code to accommodate, at least
provide an overloaded method that accepts an xml file or test on the file
extension. I just typed in a simple xml document structure in Visual Studio,
and let it create the schema document for me, then I modified it ever so
slightly and tested to see if it validated. Now if customers all supplied
you with a document in the customers.xml structure, you can use an
xmlvalidatingreader and validate the document, and if it passes, you can
parse it and input it into your RDBMS.

Something like this doesn't take much effort to learn, there are several
books about xml alone, but, I learned the basics from www.w3schools.com and
the msdn help files and most people know a little about xml, enough to get
you through any immediate problems. Another nice thing about this is that
you can create a typed dataset that will speed up your inserts/updates etc.

Hope this helps,
Jared

' Customers.xml

<?xml version="1.0" encoding="utf-8"?>
<Customers xmlns="http://tempuri.org/Customers.xsd">
<Customer First="FirstName" Last="LastName" NickName="">
<Address>
<Street>
<Line>ADDR1</Line>
<Line>ADDR2</Line>
</Street>
<City>CITY</City>
<State>STATE</State>
<Zip>ZIP</Zip>
</Address>
</Customer>
</Customers>

' Customers.xsd
<?xml version="1.0"?>
<xs:schema id="Customers" targetNamespace="http://tempuri.org/Customers.xsd"
xmlns:mstns="http://tempuri.org/Customers.xsd"
xmlns="http://tempuri.org/Customers.xsd"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
attributeFormDefault="qualified" elementFormDefault="qualified">
<xs:element name="Customers" msdata:IsDataSet="true"
msdata:EnforceConstraints="False">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="Customer">
<xs:complexType>
<xs:sequence>
<xs:element name="Address" minOccurs="1" maxOccurs="1">
<xs:complexType>
<xs:sequence>
<xs:element name="Street" minOccurs="1" maxOccurs="1">
<xs:complexType>
<xs:sequence>
<xs:element name="Line" nillable="true"
minOccurs="1" maxOccurs="unbounded">
<xs:complexType>
<xs:simpleContent
msdata:ColumnName="Line_Text" msdata:Ordinal="0">
<xs:extension base="xs:string">
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="City" type="xs:string" minOccurs="1"
/>
<xs:element name="State" type="xs:string" minOccurs="1"
/>
<xs:element name="Zip" type="xs:string" minOccurs="1" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="First" form="unqualified" type="xs:string"
use="required" />
<xs:attribute name="Last" form="unqualified" type="xs:string"
use="required"/>
<xs:attribute name="NickName" form="unqualified"
type="xs:string" use="optional" />
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:schema>

Gregg said:
Jared,

Thank you so much! I truly appreciate it. It seems to work pretty
good, except now I have found that sometimes the CITY column will
appear like this:
"CITY"
and the name might look like the following:
"LASTNAME,FIRSTNAME,MI"

I never realized that people could enter information in so many
different ways!
I have tried to add the

.Remove(Elements(1).IndexOf(ControlChars.Quote), 1)

to each of the split elements, but that doesn't seem to work, probably
because the
System.Text.RegularExpressions.Regex("(?:^|,)(\\\""(?:[^\\\""]+|\\\""\\\"")*\\\""|[^,]*)")
would be different. Can I add the additional stuff to check for within
this one line, or do I need to define a different one for each
situation?

Thanks again for all the help, you are definatly showing me the right
direction.
Gregg


Jared said:
I put this small example together for you. It provides no error handling,
you'll have to add it yourself. The input I used was as follows. Notice
that
I removed one of the quotes after nickname.
HTH,
Jared

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME"",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME"",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME"",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

Imports System.IO
Imports System.Text.RegularExpressions

Private Sub ParseCSV(byval PathToFile as string)
Dim strCustFullName As String
Dim strCustAddr1 As String
Dim strCustAddr2 As String
Dim strCustCity As String
Dim strCustState As String
Dim strCustZip As String
Dim strCustLastName As String
Dim strCustFirstName As String

Dim File As New FileStream(PathToFile, FileMode.Open)
Dim Reader As New StreamReader(File)
Dim Line As String
Dim Reg As New
System.Text.RegularExpressions.Regex("(?:^|,)(\\\""(?:[^\\\""]+|\\\""\\\"")*\\\""|[^,]*)")
Dim Elements() As String
Do While Reader.Peek >= 0
Line = Reader.ReadLine
Elements = Reg.Split(Line)

strCustLastName =
Elements(1).Remove(Elements(1).IndexOf(ControlChars.Quote), 1)
strCustFirstName =
Elements(3).Remove(Elements(3).LastIndexOf(ControlChars.Quote), 1)
strCustAddr1 = Elements(5)
strCustAddr2 = Elements(7)
strCustCity = Elements(9)
strCustState = Elements(11)
strCustZip = Elements(13)
' Do something with fields here
' "INSERT INTO MYDATABASE VALUES(" & strCustFirstName,
strCustLastName,
strCustAddr1, _
' strCustAddr2, strCustCity, strCustZip & ")"
Loop
Reader.Close()
File.Close()
End Sub

Gregg said:
Jared,

Thanks so much for the reply. However, the use of REGEX confuses me.
Can you give me an example of how I would use it to test the
strCustFullName field in my example ignore the quotes around the
nickname, but to throw away the quotes around the whole field.

Thanks for helping out a Dotnet newbie.
Gregg




You should be able to do this with a Regular Expression. I did a quick
search in Google for the term "CSV REGEX" and came across several
regex's.

Here is the pattern and the site it came from.
(?:^|,)(\\\"(?:[^\\\"]+|\\\"\\\")*\\\"|[^,]*)
http://geekswithblogs.net/mwatson/archive/2004/09/04/10658.aspx

I tested your sample data (in Chris Sells RegEx Designer .NET) and it
seemed
to do the job fairly well. I think it will need to be modified
slightly,
but, it pulled out the majority of the data correctly.
HTH,
Jared


Hello all,

I have been banging my head over a problem that I am having reading
a
comma seperated file (CSV) that can contain from 1 to 10,000
records.
My code snipit is as follows:

**Start code snipit**

Dim strCustFullName as string
Dim strCustAddr1 as string
Dim strCustAddr2 as string
Dim strCustCity as string
Dim strCustState as string
Dim strCustZip as string
Dim strCustLastName as string
Dim strCustFirstName as string

FileOpen(1, filename, OpenMode.Input)

Do While Not EOF(1)
Input(1,strCustFullName)
Input(1,strCustAddr1)
Input(1,strCustAddr2)
Input(1,strCustCity)
Input(1,strCustState)
Input(1,strCustZip)

'I then proceed to open and SQL table and insert the record I have
just
'read.
'I then close the SQL table and loop back to the next record in the
'CSV file

Loop

**End Code Snipit**

My CSV file that I am reading from looks like this:

**Start CSV Example**

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

**End CSV Example**

ok, so what I end up with is:
strCustFullName = LASTNAME,FIRSTNAME
strCustAddr1 =ADDR1
strCustAddr2 = ADDR2
strCustCity = CITY
strCustState = STATE
strCustZip = ZIP

I then use a SPLIT on strCustFullName on the "," and assign the two
values to strCustLastName and strCustFirst Name.

All of this works perfectly....until today. Today, I get a CSV file
that
looks like this:

"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME "NICKNAME""",ADDR1,ADDR2,CITY,STATE,ZIP
"LASTNAME,FIRSTNAME",ADDR1,ADDR2,CITY,STATE,ZIP

The second record causes my input loop to die and treats the
NICKNAME
as
though it was going in the Addr1 column.

So, how do I handle my input statements to ignore the quotes around
the NICKNAME
or at least treat them like a string and not a field seperator. As a
side note,
yes, I know there are 3 quotes after the NICKNAME. I was told by the
programmer
that passes this file to me that is the proper format and I tested
that by
opening this file with Excel, and Excel read the data correctly and
displayed
it as LASTNAME,FIRSTNAME "NICKNAME" (which is how the data looks
from
the
original source)

The full intent of this process is to create an SQL table with the
name of
the CSV file and then populate it with the records within the CSV
file.
Process it for a Crystal Report and then throw the CSV file and the
Table
away. All worked great until I hit this.

I would appreciate any help at all. Thanks so much
Gregg Cliburn
 
Back
Top