batch conversion using vba

  • Thread starter Thread starter cognoscento
  • Start date Start date
C

cognoscento

I've been left in the all-too-common situation of having really ugly
legacy documents that I have to convert into something usable. In this
case, it's a whole mess of Word documents that are basically long long
tables that I need to import into Access.

It's an easy matter to take those Word documents and convert them into
Excel spreadsheets. Unfortunately, the formatting is a mess; some cells
are merged and not all the information for one record is on a single
row. I'll have to do some batch processing to convert the formatting
first. The silver lining is that the formatting seems to be quite
regular so the conversion should require only a minimum of manual
intervention.

So, two questions:

1) Based on what I've said, is that a reasonable workdlow? Word
document ==> Excel Document ==> Processed with a VBA script to
regularize formatting ==> import into Access. I could probably drop the
Excel step, but I want to go over the data afterwards by hand just to
make sure everything's kosher.

2) Could anyone in here suggest any tutorials or sites that might help
educate me so that I can put this script together? I'm not terribly
familiar with VBA.

Cheers.

C.
 
To quite a large extent it's a matter of taste and the tools you're
familiar with, and your proposed workflow looks fine, with one
reservation. My own inclination (though of course I don't know what the
tables look like) would be towards one of the following:

A) Write VBA code that works with the Word Document and Table objects to
extract the data and append it to the Access tables.

B) 1) Export the contents of the tables to text files (maybe
Table|Convert to Text will suffice, more likely a bit of VBA using
Document and Table objects will be needed).
2) Write Perl code that munges these text files into regular
delimited text files that Access can import cleanly and also flags any
problematic records.

If the data in the cells is reasonably clean (basically one cell->one
field in one record) I'd probably go for (A); the more work that has to
be done on the text within individual Word cells, the more likely I'd be
to us (B). For example, it might be necessary to parse an address in one
cell into separate Building/Street/City/State/Zip fields, to regularise
dates that have been entered in different formats, or to deal with
values that need to go in a numeric field in Access but contain text
values such as "n/a" or "-" or "*" or "ditto" in Word. (It doesn't have
to be Perl, of course: any scripting language with good text-handling
facilities would do.)

The reservation I have about going Word->Excel->Access is that importing
from Excel to Access isn't always as simple a process as one might think
it ought to be. IME it's best to create the Access tables first and then
ensure that the column headers in Excel match the Access field names,
and the data matches the Access field types.

These links have lots of links to Access and VBA material:
http://www.mvps.org/access/resources/othersites.htm
http://www.ltcomputerdesigns.com/JCReferences.html
 
Back
Top