I need help in reading a “messy†CSV file.

  • Thread starter Thread starter Brad
  • Start date Start date
B

Brad

I am trying to read a CSV file that contains some messy data.

Here is an example.

John Olsen,London,A1000
Lars Larson,Oslo,B2000
Sean Penn
,Toronto,A1000
Sally Smith,Paris,B3000

Note how the third record should contain a comma after the word Penn, but
does not. Instead the word Penn is followed with a number of spaces and then
the first character in the next line is a comma.

This file is coming from an outside source, so I cannot control how it is
created. I would like to use VBA to read in this file and create a new file
with the records looking like this.

John Olsen,London,A1000
Lars Larson,Oslo,B2000
Sean Penn,Toronto,A1000
Sally Smith,Bonn,B3000

I could then import this “clean†file into Access.

Does anyone have an example on how to accomplish this?

Thanks in advance for your help.

Brad
 
Brad

It may be a matter of scale ...

If you have hundreds of thousands of records in the CSV file, it's probably
worth your while to come up with a procedure/process to do the cleanup. Or,
if you are just itching to learn coding, and have plenty of time for trial
and error and testing and rebuilding...

Or if you have a few hundred records, it would take considerably less time
to just 'manually' clean them up.

More info, please...

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
Jeff,

Thanks for your reply.

The CSV file comes from an outside source, and there are "messy records"
that are currently fixed manually each morning. The data is then imported
into Excel. I am in the process of building a small Access-based system to
replace the Excel spreadsheets. I would really like to eliminate the need
for the manual step as the data is imported into the new Access-based system.

I was hoping that some other Access user had previously encountered a
similar situation and had an example of how they dealt with it.

Thanks again for your thoughts.

Brad
 
Brad

If your data is guaranteed to be 'comma-separated', then you might be able
to get Access to import as raw data, with one field per comma-separator.
The trick will be (I haven't tested this) if Access treats the tab (?and
CRLF) as part of the 'ending' of one field, and waits for the comma to start
another.

If you try this and it works, you'll have 'raw data' imported into Access.
Then you could use queries to do any additional cleanup you need, or parsing
into more normalized tables.

Give it a try and post back -- you probably won't be the last to need to do
this!

Good luck

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
.... and if that doesn't work, perhaps one of the other newsgroup readers has
done something with text-oriented languages.

Also, have you tried searching on-line?

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
How many rows do you get every morning?

Brad said:
Jeff,

Thanks for your reply.

The CSV file comes from an outside source, and there are "messy records"
that are currently fixed manually each morning. The data is then imported
into Excel. I am in the process of building a small Access-based system
to
replace the Excel spreadsheets. I would really like to eliminate the need
for the manual step as the data is imported into the new Access-based
system.

I was hoping that some other Access user had previously encountered a
similar situation and had an example of how they dealt with it.

Thanks again for your thoughts.

Brad
 
I am trying to read a CSV file that contains some messy data.

Here is an example.

John Olsen,London,A1000
Lars Larson,Oslo,B2000
Sean Penn
,Toronto,A1000
Sally Smith,Paris,B3000

Off the top of my head, I would start with Chuck Grimsby's code that
reads text files. Then because he maintains pointers to line(n) and
line(n-1), you could split the line and check the ubound... if it's
equal to some constant (like 2, because the array is zero-based), just
write the value to the output text file. If not, read the next line
and put the pieces together, then write...
 
Back
Top