Importing large data files

I'm trying to import a CSV file of nearly 3,000,000 KBytes and Access reports
"File .... does not contain data". When I "type" the file from the command
line the data is there and correctly formatted.

Is this size related, and what other options do I have?

Thanks

Neil
 
Hi Neil,

If I'm reading your zeros correctly, it's a 3 GB file, and I've a feeling
that the standard import routine carks it at anything bigger than 2 GB. But
that's academic, because an Access mdb file can't be bigger than 2 GB
anyway.

Most really big files in my limited experience have a great deal of redundant
data in them. It's possible to use text file tools to normalise them into
something less intimidating. And if you're only interested in some of the
data, you can use text file tools to filter the rows and/or fields you need.
Post back here with more information about the structure of the file and
what you need from it.
 
John, thanks for that, pretty much confirms my fears.

The text file does have dozens of fields of null or duff data, which is
really of no interest to me. Trouble is, I don't know how to strip this out
of the text file so that it can be imported. In one of life's cruel twists,
I can filter it out or delete it once I have the data in Access!

Can you recommend a text editor or a different database I can pre-edit the
data with?
 
Neil,

The fact that it's a CSV file may complicate matters. Is it the kind of
CSV file in which text fields are delimited (qualified) by quote marks
and may contain commas?

If not - i.e. if every comma is a field separator - just download the
GNU utilities from http://unxutils.sourceforge.net. They include 'cut',
which can extract specified fields from a delimited or fixed-width file,
and 'grep', which can extract lines that match a pattern and dump the
rest. Most of them have no limit on file size.
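
For instance, something like this (the field numbers, search string and
file names are just examples) would keep the first six fields plus the
30th, then keep only the lines containing a given string - note that
'cut' counts fields from 1:

C:\Temp>cut -d, -f1-6,30 "Input.txt" > "Cut.txt"
C:\Temp>grep "Widget" "Cut.txt" > "Output.txt"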

'cut' doesn't handle CSV files which may have commas in the data. One
easy way to extract fields from these is with Perl (free download from
www.activestate.com among other places): the standard installation
includes the ParseWords module which can handle most CSV files.
Something like this executed at the Windows command prompt will do the
job:

C:\Temp>perl -wlnMText::ParseWords -e"chomp; print join ',',
(parse_line ',', 1, $_ )[0..5,29,40..80]" "Input.txt"

The list in the square brackets specifies which fields you want:
0 is the first field,
0..5 is the first to sixth fields
and so on.

Once you're happy with the output you can either redirect it to a file
(e.g. append > "Output.txt" to the command) or have Perl edit the file
and leave a backup copy of the original by inserting
-i.bak
before the -e.
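
Spelled out, and keeping the made-up file names and field list from
above, the two variants would look like this:

C:\Temp>perl -wlnMText::ParseWords -e"chomp; print join ',',
(parse_line ',', 1, $_ )[0..5,29,40..80]" "Input.txt" > "Output.txt"

C:\Temp>perl -wlnMText::ParseWords -i.bak -e"chomp; print join ',',
(parse_line ',', 1, $_ )[0..5,29,40..80]" "Input.txt"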

If it's a really gnarly file - e.g. with linebreaks and/or escaped quotes
in the data - things get more complicated, but there's probably a free
Perl module available to handle it.
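
For instance, the Text::CSV module (another free download, from CPAN) is
said to cope with quoted fields that contain embedded linebreaks. A rough
sketch, using the same made-up file names and field positions as above:

use Text::CSV;

# binary => 1 lets quoted fields contain linebreaks; with eol set,
# print() adds the line ending for us
my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die Text::CSV->error_diag;

open my $in,  '<', 'Input.txt'  or die $!;
open my $out, '>', 'Output.txt' or die $!;

# getline() reassembles a record even when it spans several lines
while (my $row = $csv->getline($in)) {
    # keep only the wanted fields (counted from 0, as in the one-liner)
    $csv->print($out, [ @{$row}[0..5, 29] ]);
}

close $in;
close $out;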
 
John: thanks for a terrific response, you've clearly thought through the
problem and provided me with some great options. Cheers

N

 
Hello, I'm having the same problem, but in my case even a small CSV file
cannot be imported into Access for some reason!

Help will be much appreciated because it doesn't make any sense at all!
Regards
Dimitris
 