Dear All,
Does anyone have a regular expression to parse a comma-delimited line in which some fields are optionally enclosed in string delimiters (text qualifiers)?
I am currently testing the following regular expression, which I found on the internet in a C# solution. It works for almost all of my test cases:
,(?=([^\"]*"[^"]*")*(?![^"]*"))
However, it fails for some of my test cases, and I am having difficulty interpreting it.
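In words, the pattern matches a comma only when the rest of the line after it contains an even number of double quotes; that is how it tells a comma outside "..." from one inside. The sketch below rebuilds the same pattern from commented pieces so each part can be read on its own. The module, constant and function names are made up for illustration. It uses a non-capturing group (?:...) because Regex.Split, unlike the Matches loop in parseCSVLine, copies any captured group text into its result array. Note that the pattern only decides where to split: it never removes the enclosing quotes or turns "" into ", and any comma with an odd number of quotes to its right is not treated as a separator, so a single stray quote (case 20 in the table below) stops the commas before it from splitting.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CsvRegexParts
    ' One balanced pair of double quotes plus the non-quote text before it:
    '   [^"]*  any run of non-quote characters
    '   "      an opening quote
    '   [^"]*  the quoted text
    '   "      the closing quote
    ' ("" inside a VB string literal is a single " character.)
    Private Const QuotePair As String = "[^""]*""[^""]*"""

    ' After the pairs, no further (unpaired) quote may appear before the end of the line.
    Private Const NoMoreQuotes As String = "(?![^""]*"")"

    ' A comma, followed (lookahead, consumes nothing) by zero or more quote pairs and
    ' then no unpaired quote, i.e. an even number of quotes remains to its right.
    ' Equivalent to the posted pattern, but the group is non-capturing: (?:...).
    Private Const SplitPattern As String = "," & "(?=(?:" & QuotePair & ")*" & NoMoreQuotes & ")"

    ' Hypothetical helper: splits a line at every comma the pattern accepts.
    ' Quotes are left in place; this only finds the separators.
    Public Function SplitOnUnquotedCommas(ByVal csvLine As String) As String()
        Return Regex.Split(csvLine, SplitPattern)
    End Function
End Module
```

For example, SplitOnUnquotedCommas("a,""b,c"",d") returns a, "b,c" and d (quotes kept), while for a,b",c it returns a,b" and c, because only one quote follows the first comma.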
The VB.NET function I am using is:
Imports System.Collections
Imports System.Text.RegularExpressions

Public Function parseCSVLine(ByVal sInputString As String) As ArrayList
    ' "" inside a VB string is one quote character; this is equivalent to the pattern above.
    Dim r As New Regex(",(?=([^""]*""[^""]*"")*(?![^""]*""))")
    Dim iStart As Integer = 0
    Dim oArrayList As New ArrayList()
    For Each m As Match In r.Matches(sInputString)
        ' The field runs from just after the previous separator up to this comma.
        oArrayList.Add(sInputString.Substring(iStart, m.Index - iStart))
        iStart = m.Index + 1
    Next
    ' Whatever follows the last separator is the final field.
    oArrayList.Add(sInputString.Substring(iStart))
    Return oArrayList
End Function
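The regex only decides where to split. Even where it splits correctly it leaves the text qualifiers in place, so "a" comes back with its quotes and "a""b" keeps the doubled quote, whereas the expected values in the test table below strip them (cases 2 and 18). One alternative, offered as a sketch rather than a fix to the regex, is to scan the line one character at a time. ParseCsvFields and its rules are assumptions, not a library API: a field is quoted only if its first non-space character is ", "" inside a quoted field becomes ", stray quotes in unquoted fields are kept literally, and unquoted fields are trimmed (to match case 4).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module CsvFieldScanner
    ' Hypothetical helper. Assumed rules:
    '  - a field whose first non-space character is " is quoted: commas inside it are
    '    data and "" inside it stands for one ";
    '  - any other field is taken literally (stray quotes such as a"b are kept) and trimmed.
    Public Function ParseCsvFields(ByVal csvLine As String) As List(Of String)
        Dim fields As New List(Of String)()
        Dim i As Integer = 0
        Do
            Dim start As Integer = i
            ' Skip leading spaces to see whether the field opens with a quote.
            While i < csvLine.Length AndAlso csvLine(i) = " "c
                i += 1
            End While

            If i < csvLine.Length AndAlso csvLine(i) = """"c Then
                ' Quoted field: copy characters until a quote that is not doubled.
                Dim sb As New StringBuilder()
                i += 1
                While i < csvLine.Length
                    If csvLine(i) <> """"c Then
                        sb.Append(csvLine(i))
                        i += 1
                    ElseIf i + 1 < csvLine.Length AndAlso csvLine(i + 1) = """"c Then
                        sb.Append(""""c) ' "" -> one literal quote
                        i += 2
                    Else
                        i += 1           ' closing quote
                        Exit While
                    End If
                End While
                fields.Add(sb.ToString())
                ' Discard anything between the closing quote and the next comma.
                While i < csvLine.Length AndAlso csvLine(i) <> ","c
                    i += 1
                End While
            Else
                ' Unquoted field: everything up to the next comma, trimmed.
                Dim comma As Integer = csvLine.IndexOf(","c, i)
                If comma < 0 Then comma = csvLine.Length
                fields.Add(csvLine.Substring(start, comma - start).Trim())
                i = comma
            End If

            If i >= csvLine.Length Then Exit Do
            i += 1 ' step over the separating comma
        Loop
        Return fields
    End Function
End Module
```

With these rules every row of the table comes out as listed except three: row 15 keeps the spaces inside the quotes (" a , b "), row 23 (unterminated quote) yields a and b,c, and row 26 yields a, bc and e because text after a closing quote is discarded. Those are design choices for malformed input and can be changed in the two inner loops.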
My test cases are as follows:
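#  | CSV                    | Value 1 | Value 2        | Value 3 | Value 4 | Results
---|------------------------|---------|----------------|---------|---------|--------
1  | a,b,c                  | a       | b              | c       |         | P
2  | "a",b,c                | a       | b              | c       |         | P
3  | 'a',b,c                | 'a'     | b              | c       |         | P
4  | a , b , c              | a       | b              | c       |         | P
5  | aa,bb;cc               | aa      | bb;cc          |         |         | P
6  |                        |         |                |         |         | P
7  | a                      | a       |                |         |         | P
8  | ,b,                    |         | b              |         |         | P
9  | ,,c                    |         |                | c       |         | P
10 | ,,                     |         |                |         |         | P
11 | "",b                   |         | b              |         |         | P
12 | " ",b                  | [SPACE] | b              |         |         | P
13 | "a,b"                  | a,b     |                |         |         | P
14 | "a,b",c                | a,b     | c              |         |         | P
15 | " a , b ", c           | a , b   | c              |         |         | P
16 | a b,c                  | a b     | c              |         |         | P
17 | a"b,c                  | a"b     | c              |         |         | P
18 | "a""b",c               | a"b     | c              |         |         | P
19 | a""b,c                 | a""b    | c              |         |         | P
20 | a,b",c                 | a       | b"             | c       |         | O
21 | a,b"",c                | a       | b""            | c       |         | P
22 | a,"B: ""Hi, I'm B""",c | a       | B: "Hi, I'm B" | c       |         | P
23 | a,"b,c                 | a       | "b             | c       |         | O
24 | a,bc"d,e               | a       | bc"d           | e       |         | O
25 | a,bc"d",e              | a       | bc"d"          | e       |         | O
26 | a,"bc"d,e              | a       | "bc"d          | e       |         | O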
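To rerun the table automatically after each change to the pattern, the rows can be kept as data and checked against any parser. Below is a sketch with a handful of rows; the module and function names are hypothetical, and empty expected fields are written out as "". FailingRows compares field count, text and order, and returns the row numbers that differ.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module CsvTestTable
    ' A few rows from the table: (row number, CSV input, expected fields).
    ' "" inside a VB string literal is one quote character.
    Private ReadOnly Rows As New List(Of Tuple(Of Integer, String, String())) From {
        Tuple.Create(1, "a,b,c", New String() {"a", "b", "c"}),
        Tuple.Create(8, ",b,", New String() {"", "b", ""}),
        Tuple.Create(13, """a,b""", New String() {"a,b"}),
        Tuple.Create(18, """a""""b"",c", New String() {"a""b", "c"}),
        Tuple.Create(22, "a,""B: """"Hi, I'm B"""""",c", New String() {"a", "B: ""Hi, I'm B""", "c"})
    }

    ' Runs every row through the given parser and returns the numbers of the rows
    ' whose output differs from the expected fields (same count, text and order).
    Public Function FailingRows(ByVal parser As Func(Of String, String())) As List(Of Integer)
        Dim failures As New List(Of Integer)()
        For Each row As Tuple(Of Integer, String, String()) In Rows
            If Not parser.Invoke(row.Item2).SequenceEqual(row.Item3) Then
                failures.Add(row.Item1)
            End If
        Next
        Return failures
    End Function
End Module
```

As a baseline, FailingRows(Function(s As String) s.Split(","c)) reports rows 13, 18 and 22, since a plain String.Split ignores text qualifiers. To check parseCSVLine, wrap it so it returns a String array, for example Function(s As String) CType(parseCSVLine(s).ToArray(GetType(String)), String()).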
Many thanks,
Wazir