I usually use Perl (free from www.activeperl.com) for this sort of job,
but commercial tools such as TextPipe Pro (www.datamystic.com) and
Monarch (www.datawatch.com) may be easier to use if you don't enjoy
programming.
If you know how many header and footer lines need to be deleted, the
Perl script I've pasted at the end of this message will do the job. It
chops off a specified number of lines from beginning and end of every
file that matches a wildcard filespec. You'd call it with something like
this:
perl "D:\folder\TopAndTail.pl" "*.txt" 4 5 -b
If you don't know how many lines you need to delete, the general idea is
to use a tool such as Perl (or your favourite programming or scripting
language) to read the file line by line and write each line to a new
file if it matches the pattern of the "data" lines, or if it doesn't
match the pattern of the header/footer lines. For example, if your data
is comma-separated but the header and footer lines don't contain commas,
you can clean up a single file in place with something like this (a sketch
that loops over all the .txt files in a folder follows below):
perl -i.BAK -ne"print if m/,/" "D:\folder\myfile.txt"
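If the one-liner approach gets fiddly (or some header/footer lines do
contain commas and you need a different test), a small script along these
lines will do the whole folder in one go. Treat it as a rough sketch only:
the script name, the folder argument, the "keep lines that contain a comma"
test and the habit of saving each original as filename.txt.BAK are all
assumptions you'd adjust to suit your files.

#start of sketch==================
#KeepDataLines.pl (untested sketch - adjust the test on the "data" lines)
#Keeps only the comma-containing lines in every .txt file in a folder,
#saving each original first with .BAK appended to its name.
use strict;
use warnings;

my $folder = shift or die "Usage: perl KeepDataLines.pl D:\\folder\n";

opendir my $dh, $folder or die "Couldn't open folder $folder.\n";
my @names = grep { /\.txt$/i } readdir $dh;    #every .txt file in the folder
closedir $dh;

foreach my $name (@names) {
    my $file = "$folder/$name";
    my $bak  = "$file.BAK";

    unlink $bak;                        #delete old .BAK file if it exists
    rename $file, $bak or die "Couldn't rename $file to $bak.\n";

    open my $in,  '<', $bak  or die "Couldn't open $bak for input.\n";
    open my $out, '>', $file or die "Couldn't open $file for output.\n";

    while (my $line = <$in>) {
        print $out $line if $line =~ m/,/;    #keep only the "data" lines
    }

    close $in;
    close $out;
    print "$file\n";                    #list files as they are processed
}
#end of sketch======================================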
I have roughly 1500 text files with comma-delimited data that first need to
be cleaned up before importing into Access. Does anyone know of a (preferably
freeware) program that will accomplish this?
The areas in need of cleanup are identical in size; only the text differs
a little between files. What it boils down to is that the first and last 10
(or so) lines of trash need to be deleted so that each file is purely
comma-delimited.
Thanks for your help.
#start of code==================
#TopAndTail.pl
#Removes header and footer lines from text files.
#John Nurick 2004-05
use strict;
die "\nTopAndTail.pl: removes header and footer lines from text files.
\n\n" .
"Usage: perl TopAndTail.pl filespec hdrlines ftrlines [-b]\n\n" .
" filespec can be a wildcard \n" .
" hdrlines and ftrlines are number of lines to omit.\n" .
" -b is optional switch to leave original file with .BAK
extension.\n\n" .
"NB: slurps each file into memory so not suitable for very large
files."
unless defined $ARGV[2];
my ($hdrlines, $ftrlines) = @ARGV[1 .. 2];
my $iofile; #path and name of input file
my $bakname; #path and name for backup file
undef $/; #slurp entire file
my @files = glob($ARGV[0]); #treat first argument as a wildcard
foreach $iofile (@files) {
    open IOFILE, $iofile or die "Couldn't open $iofile for input.\n";
    my @lines = split "\n", <IOFILE>;
    if ($#lines < $hdrlines + $ftrlines) {
        warn "$iofile doesn't have enough lines!\n";
        close IOFILE;
        next;
    }
    close IOFILE;
    if (defined $ARGV[3] and $ARGV[3] =~ m(^-b)i) { # -b switch
        $bakname = $iofile;
        $bakname =~ s/\.[^\.]*$/\.BAK/;
        unlink $bakname;             #delete old .BAK file if it exists
        rename $iofile, $bakname;
    }
    pop @lines until $lines[-1];     #dump any empty line(s) due to \n at end of infile
    open IOFILE, "> $iofile" or die "Couldn't open $iofile for output.\n";
    print IOFILE join "\n", @lines[$hdrlines .. $#lines - $ftrlines];
    close IOFILE;
    print STDOUT "$iofile\n";        #list files as they are processed
}
#end of code======================================
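For the files described above, assuming the junk really is the first and
last 10 lines of each file, the call would be something like

perl "D:\folder\TopAndTail.pl" "D:\folder\*.txt" 10 10 -b

Spot-check a few of the processed files against their .BAK copies before
importing the whole batch into Access.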