How to ignore extra or missing columns?


topmind

I need a way to dynamically process input CSV files with varying
schemas. I have not found a way to get Access to ignore extra or
missing columns. For example, suppose I set up a work table with 30
columns. Sometimes I want it to automatically import a 10-column table
and other times a 50-column table. The first should simply have nulls
in the extra 20 columns, and the second (50 cols) would have the last
20 ignored. Access is too picky about knowing the number of columns in
advance. I live in a dynamic world :-)

Thanks in advance for any solutions proposed.
 
This probably isn't difficult, but it's beyond the capability of the
standard text import mechanism.

The way you describe it, it sounds as if the first few columns define a
record, and the remaining columns contain a variable number of related
items. If so, the best approach would be to parse the CSV file
accordingly and import it into records in two or more related tables.

Otherwise, you can either pre-process the CSV files to give them all the
desired number of fields, or else write VBA code to read them line by
line and append fields to the table as required.

This snippet of VB/VBA code will take a comma-delimited list of values
and return the first NFields of them, padding with empty values if there
aren't enough in the list. If you remove the As String, As Long, etc. it
will also work in VBScript.

Function Make20(Line As String, NFields As Long) As String
    Dim arFields As Variant

    'Limit the split to NFields + 1 parts; any surplus text lands
    'in the last element, which the ReDim below discards.
    arFields = Split(Line, ",", NFields + 1)
    If UBound(arFields) <> NFields - 1 Then
        'Truncate extras, or pad with empty values, to exactly NFields.
        ReDim Preserve arFields(NFields - 1)
    End If
    Make20 = Join(arFields, ",")
End Function
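For comparison, here is the same truncate-or-pad trick sketched in Python (my illustration, not part of the thread; `make_n` mirrors the Make20 function above):

```python
def make_n(line, n_fields):
    """Split a comma-delimited line into exactly n_fields values,
    padding with empty strings or dropping extras as needed."""
    # maxsplit=n_fields keeps any surplus text in one trailing
    # element, which the slice below then discards.
    fields = line.split(",", n_fields)
    fields = fields[:n_fields]                 # drop extras
    fields += [""] * (n_fields - len(fields))  # pad shortfall
    return ",".join(fields)
```

Like the VBA version, this takes no account of quoted fields; it works only on simple comma-delimited data.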
 
John said:
This probably isn't difficult, but it's beyond the capability of the
standard text import mechanism.

The way you describe it, it sounds as if the first few columns define a
record, and the remaining columns contain a variable number of related
items. If so, the best approach would be to parse the CSV file
accordingly and import it into records in two or more related tables.

Not really. A given file will have a fixed number of columns; it is
just that each file may have a different number. And I have no real
control over this variation. It is what is given to me.
Otherwise, you can either pre-process the CSV files to give them all the
desired number of fields, or else write VBA code to read them line by
line and append fields to the table as required.

This snippet of VB/VBA code will take a comma-delimited list of values
and return the first NFields of them, padding with empty values if there
aren't enough in the list. If you remove the As String, As Long, etc. it
will also work in VBScript.

Function Make20(Line As String, NFields As Long) As String
Dim arFields As Variant

arFields = Split(Line, ",", NFields + 1)
If UBound(arFields) <> NFields - 1 Then
ReDim Preserve arFields(NFields - 1)
End If
Make20 = Join(arFields, ",")
End Function

Perhaps. However, some columns contain embedded commas. Example:

"1234","Simpson, OJ","123 Maple St.","CA"
"6544","Gates, Bill","One Microsoft Way","WA"
etc...

Here the name column (second) has an embedded comma. The Split function
will mess that up.
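For what it's worth, a quote-aware split is a solved problem in most scripting languages; here is a sketch in Python using the standard csv module (nothing equivalent ships with VBA's Split):

```python
import csv
import io

def split_quoted(line):
    """Parse one CSV line, honouring quoted fields with embedded commas."""
    # csv.reader treats a double quote as a field delimiter, so the
    # comma inside "Simpson, OJ" does not split the field.
    return next(csv.reader(io.StringIO(line)))
```

Applied to the first sample row above, this yields the four fields 1234, Simpson, OJ, 123 Maple St., and CA, with the embedded comma intact.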

Still, it is something to consider, and at least I now know there is
probably no built-in way to do this without some fairly involved
programming.

Thanks for your feedback and suggestions,

-T-
 
I was afraid this might be a problem. So here's Plan B: build and
execute the SQL statement for an append query that gets data from the
text file and returns the correct number of fields. Note the syntax of
the FROM clause: you'll need to write code to parse the path of your
text file and put the pieces into that format.

1) Open an empty recordset on the file to access the number and names of
its fields, and the same on the table:

Dim rsText As DAO.Recordset, rsTable As DAO.Recordset


Set rsText = CurrentDb.OpenRecordset("SELECT * FROM " _
    & "[Text;HDR=Yes;Database=C:\MyFolder\;].MyFile#txt " _
    & "WHERE FALSE;", dbOpenSnapshot)
'If the text files don't include field names in the first row,
'use HDR=No and field names F1, F2, F3, ...

Set rsTable = CurrentDb.OpenRecordset("SELECT * FROM " _
    & "MyTable WHERE FALSE;", dbOpenSnapshot)

2) You now have two recordsets, each with a Fields collection. Iterate
through the two collections matching the field names and building a SQL
append query to suit. What follows is more pseudocode than air code:

Dim strSQL As String
Dim j As Long, k As Long

strSQL = "INSERT INTO MyTable ("
With rsTable.Fields
    For j = 0 To .Count - 1
        strSQL = strSQL & .Item(j).Name & ", "
    Next
    strSQL = Left(strSQL, Len(strSQL) - 2) & ") " & vbCrLf
End With

strSQL = strSQL & "SELECT "

'Now work out how many fields from the text file need to be
'included in the query.

If rsText.Fields.Count >= rsTable.Fields.Count Then
    k = rsTable.Fields.Count
Else
    k = rsText.Fields.Count
End If

'Continue building the SQL:
With rsText.Fields
    For j = 0 To k - 1
        'concatenate .Item(j).Name and a comma to strSQL
    Next
End With
If k < rsTable.Fields.Count Then
    'The table has more columns than the file: pad with Nulls.
    For j = k To rsTable.Fields.Count - 1
        'concatenate Null and a comma to strSQL
    Next
End If

Remove the terminal comma from strSQL
Concatenate the FROM clause as in the OpenRecordset at the start
CurrentDb.Execute strSQL
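Stripped of the DAO plumbing, the SQL-building logic above can be sketched in Python (a hypothetical helper; `table_fields` and `file_fields` stand in for the two Fields collections, and the FROM clause uses the Jet text-ISAM syntax shown in step 1):

```python
def build_append_sql(table_fields, file_fields, folder, filename,
                     table="MyTable"):
    """Build an INSERT...SELECT that appends a text file to a table,
    padding missing columns with Null and ignoring surplus ones."""
    # Use at most as many file fields as the table has columns.
    k = min(len(table_fields), len(file_fields))
    select_items = file_fields[:k] + ["Null"] * (len(table_fields) - k)
    return (
        "INSERT INTO %s (%s) SELECT %s FROM [Text;HDR=Yes;Database=%s;].%s"
        % (
            table,
            ", ".join(table_fields),
            ", ".join(select_items),
            folder,
            filename.replace(".", "#"),  # Jet writes MyFile.txt as MyFile#txt
        )
    )
```

So a 30-column table fed a 10-column file gets 20 trailing Nulls in the SELECT list, and a 50-column file simply has its last 20 fields left out of it.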




 
Hmmm. Interesting approach.

I wonder if it might not be less coding to simply do a fancier parsing
of the input lines and use a variation of your first approach. I've
done this in other languages... I think. For example, have an
"inQuote" flag to determine whether a comma is inside quotes or outside
as the current character pointer traverses an input line from left to
right.
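That inQuote-flag idea, sketched in Python for illustration (hand-rolled; it assumes quotes appear only as field delimiters, with no escaped quotes inside fields, which a real CSV library would handle):

```python
def parse_line(line):
    """Split on commas, but only those outside double quotes."""
    fields, current, in_quote = [], [], False
    for ch in line:
        if ch == '"':
            in_quote = not in_quote          # toggle on every quote
        elif ch == "," and not in_quote:
            fields.append("".join(current))  # field boundary
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))          # last field has no trailing comma
    return fields
```

The quote characters themselves are dropped from the output, matching what a CSV import would do.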

Well, at least I have more options now to work with. One of them or a
combo should do it, eh?

Thanks for your help! Kudos!

-T-
 
There's always a Perl one-liner:

C:\Temp>perl -wlnMText::ParseWords -e"chomp; print join ',', (parse_line
',', 1, $_ . ',' x 30)[0..29]"
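And, for the record, a rough Python equivalent of that one-liner using the standard csv module (my sketch, not from the thread; the field count of 30 comes from the Perl version, the re-quoting behaviour is my choice):

```python
import csv
import io

def normalize_line(line, n_fields=30):
    """Quote-aware split, then pad or truncate to exactly n_fields,
    re-quoting any field that itself contains a comma."""
    fields = next(csv.reader(io.StringIO(line)))
    fields = fields[:n_fields] + [""] * (n_fields - len(fields))
    buf = io.StringIO()
    # csv.writer re-quotes only fields that need it (QUOTE_MINIMAL).
    csv.writer(buf, lineterminator="").writerow(fields)
    return buf.getvalue()
```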
 