Text File Cleanup Utility

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I have roughly 1500 text files with comma delimited data that first need to
be cleaned up before importing to Access. Does anyone know of a (preferably
freeware) program that will accomplish this?

The areas in need of cleanup are identical in terms of size, only the text
differs by a little between files. What it boils down to is that the first
and last 10 (or so) lines of trash need to be deleted so the file is purely
comma delimited.

Thanks for your help.
 
I usually use Perl (free from www.activeperl.com) for this sort of job,
but commercial tools such as TextPipe Pro (www.datamystic.com) and
Monarch (www.datawatch.com) may be easier to use if you don't enjoy
programming.

If you know how many header and footer lines need to be deleted, the
Perl script I've pasted at the end of this message will do the job. It
chops off a specified number of lines from beginning and end of every
file that matches a wildcard filespec. You'd call it with something like
this:
perl "D:\folder\TopAndTail.pl" "*.txt" 4 5 -b

If you don't know how many lines you need to delete, the general idea is
to use a tool such as Perl (or your favourite programming or scripting
language) to read the file line by line and write each line to a new
file if it matches the pattern of the "data" lines, or if it doesn't
match the pattern of the header/footer lines. For example, if your data
is comma-separated but the header lines don't contain commas, you can
process all the .txt files in a folde with something like this:
perl -i.BAK -pe"print if m/,/" "D:\folder\myfile.txt"


I have roughly 1500 text files with comma delimited data that first need to
be cleaned up before importing to Access. Does anyone know of a (preferably
freeware) program that will accomplish this?

The areas in need of cleanup are identical in terms of size, only the text
differs by a little between files. What it boils down to is that the first
and last 10 (or so) lines of trash need to be deleted so the file is purely
comma delimited.

Thanks for your help.


#start of code==================
#TopAndTail.pl
#Removes header and footer lines from text files.
#John Nurick 2004-05

use strict;
die "\nTopAndTail.pl: removes header and footer lines from text files.
\n\n" .
"Usage: perl TopAndTail.pl filespec hdrlines ftrlines [-b]\n\n" .
" filespec can be a wildcard \n" .
" hdrlines and ftrlines are number of lines to omit.\n" .
" -b is optional switch to leave original file with .BAK
extension.\n\n" .
"NB: slurps each file into memory so not suitable for very large
files."
unless defined $ARGV[2];

my ($hdrlines, $ftrlines) = @ARGV[1 .. 2];
my $iofile; #path and name of input file
my $bakname; #path and name for backup file

undef $/; #slurp entire file

my @files = glob($ARGV[0]); #treat first argument as a wildcard

foreach $iofile (@files) {
open IOFILE, $iofile or die "Couldn't open $iofile for input.\n";
my (@lines) = split "\n", <IOFILE>;
if ($#lines < $hdrlines + $ftrlines) {
warn "$iofile doesn't have enough lines!\n";
close IOFILE;
next;
}
close IOFILE;
if ($ARGV[3] =~ m(^-b)i ) { # -b switch
$bakname = $iofile;
$bakname =~ s/\.[^\.]*$/\.BAK/;
unlink $bakname; #delete old .BAK file if it exists
rename $iofile, $bakname;
}
pop @lines until $lines[-1]; #dump any empty line(s) due to \n at end
of infile
open IOFILE, "> $iofile" or die "Couldn't open $iofile for output.\n";
print IOFILE join "\n", @lines[$hdrlines .. $#lines - $ftrlines];
close IOFILE;
print STDOUT "$iofile\n"; #list files as they are processed
}
#end of code======================================
 
You really don't need to clean up the files if you import them using code
and can "tell" which lines to omit.
Here is an outline:

Public Sub ImportFile(strPath As String)

Dim db As Database, rs As Recordset
Dim sLine As String, sTrimmed As String
Set db = CurrentDb
Set rs = db.OpenRecordset("TableName", dbOpenTable)

Open strPath For Input As #1

'Read a single line from an open sequential file and assign it to a String
variable.
Line Input #1, sLine
'Trim the leading blanks
sTrimmed = LTrim(sLine)

Do While Not EOF(1)
'read the next line of the file
Line Input #1, sLine
sTrimmed = LTrim(sLine)

'manipulate the string if necessary, then add it to the rs table.
If rs.BOF = True Then
rs.AddNew
Else
rs.Edit
End If
rs.Update
Loop
End Sub
 
Back
Top