importing text files into access and generating queries and reports

  • Thread starter Thread starter g_1
  • Start date Start date
G

g_1

I have 50 word documents in Word 2007 format. The data in the word
files is in a table of two columns and 10 rows consisting of
paragraphs which is basically a report how a unit in our company has
done for a given year for various parameters. I have such documents
for the past three years so I have 150 such word documents.

I have to analyze how each unit has progressed from the past year and
be able to generate a graph and some queries for that.

I will clarify with an example. I have a MS-Word 2007 File1.docx. It
has data in a table having 2 columns and 10 rows

Column 1 of table(this is not present in the file)
Column 2 of table(this is not present in the file)

First row(this is not present in the file)

Staff Turnover, Reasons for it It is nearly 3% for
this year 2006. It occurred due to difference in pay
and how we can avoid it they were offered
here and other positions they found, less upscale
movement
and career growth. We will provide more training classes for
them to
retain them and so on.

Similarly, Second row

Third row

..
..
..
..
..
Tenth row

I have such 50 files for each unit for this year, previous year and
the year before that.

Is there a way in Access 2007 which allows you to import 3 text files
(converted from Word 2007 by doing save and choosing the option Save
as plain text) for three different years for one unit, convert their
data(which is in a paragraph format as indicated above) to some
graphical format(like pie chart) and provide the ability to run some
queries on them like which unit had maximum(or minimum) revenue, staff
turn over, operating costs?

In my above example, I would want to see, what is the average staff
turn over for the years 2006, 2007, 2008 by running a query?

I realize it might be difficult since there are few numbers and lot of
data is in sentence format.

I also would want to see some graph format showing the turn over rate
by year.

Is there a method for doing this for all multiple units(50 in my case)
at once? That way, I can query staff turn over rate for 2006 which
would list the turn over rate for all units for that year?

Do I need to convert the data into some format first which Access can
use to create a temporary table which can be used to create the graphs
and run queries on?

Can someone please clarify if this can be done? Are there other
options using which might be easier to achieve what I am trying to do?

Thanks a lot.
 
Seems the formatting did not come properly the first time.

I will clarify with an example. I have a MS-Word 2007 File1.docx. It
has data in a table having 2 columns and 10 rows

First row of table(the text "First row of table" is not present in
the file)

Column 1 of table(the text "Column 1 of table" is not present in the
file)
Staff Turnover, Reasons for it

Column 2 of table(this text "Column 2 of table " is not present in
the file)

It is nearly 3% for
this year 2006. It occurred due to difference in pay
and how we can avoid it they were offered
here and other positions they found, less upscale
movement
and career growth. We will provide more training classes for
them to retain them and so on.
Similarly, Second row

Third row
 
Seems the formatting did not come properly the first time.

I will clarify with an example. I have a MS-Word 2007 File1.docx. It
has data in a table having 2 columns and 10 rows

First row of table(the text "First row of table" is not present in
the file)

Column 1 of table(the text "Column 1 of table" is not present in the
file)
Staff Turnover, Reasons for it

Column 2 of table(this text "Column 2 of table " is not present in
the file)

It is nearly 3% for
this year 2006. It occurred due to difference in pay
and how we can avoid it they were offered
here and other positions they found, less upscale
movement
and career growth. We will provide more training classes for
them to retain them and so on.
Similarly, Second row

Third row

.
.
.
.
.
Tenth row

The rest of the description came out fine

This will be a problem however you cut it. Word Tables are very different
kinds of things than Access Tables, and in my experience there's no direct
import.

Your parsing the word table into an external text file changes the problem
space, but doesn't necessairly make it any easier - at least with your posted
examples, it's not at all obvious to me how (in an undifferentiated string of
text) to tell where one value ("Staff Turnover, Reasons for it") ends and
another value ("It is nearly...") begins. There are spaces and newlines
between and within each.

I suspect you'll need some VBA code to parse either the Word document (using
Word automation, which I do *not* know at all well) or the text file, looking
for (say) two consecutive newlines.

Is there any chance that you could export with some other delimiter, perhaps
some character or character string that will never appear in your text - ~!~
say - between the "fields"?
 
John said:
This will be a problem however you cut it. Word Tables are very different
kinds of things than Access Tables, and in my experience there's no direct
import.

Your parsing the word table into an external text file changes the problem
space, but doesn't necessairly make it any easier

Should I use any other format like OpenXML instead of plain text file
to make it easier?
- at least with your posted
examples, it's not at all obvious to me how (in an undifferentiated string of
text) to tell where one value ("Staff Turnover, Reasons for it") ends and
another value ("It is nearly...") begins.

Is there a way the Word table borders could be replaced by something
which could be recognized by Access? Or, should I try exporting the
data
in Word File to a Excel file and then try from there?

There are spaces and newlines
between and within each.
I suspect you'll need some VBA code to parse either the Word document (using
Word automation, which I do *not* know at all well) or the text file, looking
for (say) two consecutive newlines.
We don't want to add VBA code as macros would be disabled in the
document
unless there is no other option.
Is there any chance that you could export with some other delimiter, perhaps
some character or character string that will never appear in your text - ~!~
say - between the "fields"?

How do I do that? Can you please clarify with an example?

Thanks for your reply and time.
 
Should I use any other format like OpenXML instead of plain text file
to make it easier?

That's a lot more hopeful. A2003 can link directly to an XML file; if the XML
correctly parses the columns into fields and the rows into rows, it should be
very straightforward.
Is there a way the Word table borders could be replaced by something
which could be recognized by Access? Or, should I try exporting the
data
in Word File to a Excel file and then try from there?

That's how I've managed to migrate Word tables into Access - using Excel as an
intermediate. It's been a few years though!
We don't want to add VBA code as macros would be disabled in the
document
unless there is no other option.

Well... Access VBA is what I had in mind, not in the document.
How do I do that? Can you please clarify with an example?

I'd look into the XML and Excel options first!
 
I have 50 word documents in Word 2007 format. The data in the word
files is in a table of two columns and 10 rows consisting of
paragraphs which is basically a report how a unit in our company has
done for a given year for various parameters. I have such documents
for the past three years so I have 150 such word documents.

I have to analyze how each unit has progressed from the past year and
be able to generate a graph and some queries for that.

I will clarify with an example. I have a MS-Word 2007 File1.docx. It
has data in a table having 2 columns and 10 rows

Column 1 of table(this is not present in the file)
Column 2 of table(this is not present in the file)

First row(this is not present in the file)

Staff Turnover, Reasons for it It is nearly 3% for
this year 2006. It occurred due to difference in pay
and how we can avoid it they were offered
here and other positions they found, less upscale
movement
and career growth. We will provide more training classes for
them to
retain them and so on.

Similarly, Second row

Third row

.
.
.
.
.
Tenth row

I have such 50 files for each unit for this year, previous year and
the year before that.

Is there a way in Access 2007 which allows you to import 3 text files
(converted from Word 2007 by doing save and choosing the option Save
as plain text) for three different years for one unit, convert their
data(which is in a paragraph format as indicated above) to some
graphical format(like pie chart) and provide the ability to run some
queries on them like which unit had maximum(or minimum) revenue, staff
turn over, operating costs?

In my above example, I would want to see, what is the average staff
turn over for the years 2006, 2007, 2008 by running a query?

I realize it might be difficult since there are few numbers and lot of
data is in sentence format.

I also would want to see some graph format showing the turn over rate
by year.

Is there a method for doing this for all multiple units(50 in my case)
at once? That way, I can query staff turn over rate for 2006 which
would list the turn over rate for all units for that year?

Do I need to convert the data into some format first which Access can
use to create a temporary table which can be used to create the graphs
and run queries on?

Can someone please clarify if this can be done? Are there other
options using which might be easier to achieve what I am trying to do?

Thanks a lot.

IIRC, the Tables Collection in Word is a fairly convenient and
reliable way for Access to retrieve values directly from Word. See:

Returning the contents of each table cell

http://msdn.microsoft.com/en-us/library/bb221277.aspx#Returning2

A dynamically created append query for each Excel table row is
probably better than saving to arrays. The Access table might include
the name of the Word document, perhaps even the Word file's created/
lastmodified time as found by Access' FileDateTime() function. Note
that the code used in Access to automate Excel will possibly require a
reference to the Excel Object Library. Be aware of the difference
between early binding and late binding when automating Excel from
Access. Also, all references to any Excel object should use the
equivalent of a full path name or branch from an object defined using
the equivalent of a full path name so that the Excel object can be
released properly. For example, you might have something like
objXL.ActiveDocument.Tables(strWordTableName).Range instead of
ActiveDocument.Tables(strWordTableName).Range after the MSDN Excel VBA
sample code is moved into Access. Maybe use the Dir() function to
populate an Access form's multiselect listbox of Word documents and
loop through the documents selected in the listbox.

James A. Fortune
(e-mail address removed)
 
That's a lot more hopeful. A2003 can link directly to an XML file; if theXML
correctly parses the columns into fields and the rows into rows, it should be
very straightforward.
I am using Office 2007. So, when I save my Word 2007 file as a Word
Xml document
which has an extension .xml and then try to open it in MS-Excel 2007 I
get three
choices while opening it

1. XML Table
2. As a read only workbook
3. Use the XML Source task pane.

On choosing the first option(XML Table) I get the message
the specified XML source does not refer to a schema. Excel will
create a schema based on the XML source data.
The Excel file then has details like Author of file, name of
file, last saved, no of words, no of characters and so on
about the file, but not the contents which is what I want.

On choosing the second option(As a read only workbook)
I get the message "This file cannot be opened because of
an error. Check to make sure the file is a valid XML file and
the XML syntax is correct. You can also try
recreating the file and closing other applications before
you attempt to open it again." I click OK and the XML
file I tried to open, closes

On selecting the third option(Use the XML Source task pane)
I get the message "The specified XML source does not
refer to a schema. Excel will create a schema based on
the XML source data" I click OK. The XML maps I can
see for the document are for DocumentProperties
which have Tile, Author, no of words and so on.

Am I doing something wrong while saving the Word 2007
file as an XML file or while opening the XML file from
Excel 2007?



That's how I've managed to migrate Word tables into Access - using Excel as an
intermediate. It's been a few years though!
Do you recall how? I am having a hard time now
with Word and Excel 2007
Well... Access VBA is what I had in mind, not in the document.


I'd look into the XML and Excel options first!
Did not work. Can you please advise?
 
IIRC, the Tables Collection in Word is a fairly convenient and
reliable way for Access to retrieve values directly from Word.  See:

Returning the contents of each table cell

http://msdn.microsoft.com/en-us/library/bb221277.aspx#Returning2

Ok, I create an array and populate it as shown in the example
A dynamically created append query for each Excel table row is
probably better than saving to arrays.  

How do I create this dynamic append query?
The Access table might include
the name of the Word document, perhaps even the Word file's created/
lastmodified time as found by Access' FileDateTime() function.  Note
that the code used in Access to automate Excel will possibly require a
reference to the Excel Object Library.

Do I need to do anything to resolve a conflict if the Excel Object
Library? If so, what and how?
 Be aware of the difference
between early binding and late binding when automating Excel from
Access.  

It will not be automating, but one time transfer from Excel
to Access. Do I still need to worry about late and early
binding?
Also, all references to any Excel object should use the
equivalent of a full path name or branch from an object defined using
the equivalent of a full path name so that the Excel object can be
released properly.  For example, you might have something like
objXL.ActiveDocument.Tables(strWordTableName).Range instead of
ActiveDocument.Tables(strWordTableName).Range after the MSDN Excel VBA
sample code is moved into Access.

Do I need to worry about it if it is a one time transfer from
Excel to Access?
 Maybe use the Dir() function to
populate an Access form's multiselect listbox of Word documents and
loop through the documents selected in the listbox.

Can you please provide an example or link to this?

Sorry, if my questions are too naive, but I have not done
this before.


James A. Fortune
(e-mail address removed)

Thanks for your advice and time.
 
See below.

Ok, I create an array and populate it as shown in the example


How do I create this dynamic append query?

Example of a dynamic append query (untested):

Suppose I have a table that I use to collect data:

tblXY
ID AutoNumber
X Double
Y Double

In VBA:

Dim MyDB As DAO.Database
Dim dblX As Double
Dim dblY As Double
Dim boolError As Boolean
Dim N As Integer
....
Set MyDB = CurrentDb
For intI = 1 To N
...
boolError = GetNextXYPair(dblX, dblY)
If boolError = False Then
strSQL = "INSERT INTO tblXY ( X, Y) VALUES (" & CStr(dblX) & ", "
& CStr(dblY) & ");"
MyDB.Execute strSQL, dbFailOnError
DoEvents
End If
..
Next intI
Set MyDB = Nothing
Do I need to do anything to resolve a conflict if the Excel Object
Library? If so, what and how?

You will need the reference or you won't. If you attempt to use
something from the Excel Object Library and the reference to the
Library doesn't exist, then Access will not be able to understand your
reference to the object in your VBA code. You might only need to
reference the Library when using Early Binding. Many developers start
with Early Binding in the early stages of development to provide
Intellisense. After that, they often switch to Late Binding to obtain
other benefits, such as version independence. If you switch to Late
Binding, try removing the reference to the Excel Object Library and
see if you get an error or not.
It will not be automating, but one time transfer from Excel
to Access. Do I still need to worry about late and early
binding?

I consider any use of an Excel.Application Object within Access to be
"automation." Early Binding will use something like:

Dim objXL As Excel.Application

Late Binding:

Dim objXL As Object

Obviously, the Excel Object Library reference is necessary for Early
Binding due to the reference to Excel.Application in the code. For a
one time transfer simply use Early Binding. Late Binding relies on
Access being able to determine the function call's return type. E.g.:

Dim objXL As Object

Set objXL = CreateObject("Excel.Application")

infers the type of objXL from the CreateObject function, but does not
keep you from trying to use a method that Excel.Application does not
support at run-time (having Intellisense with Early Binding is not
just for convenience). There are also other situations where Access
allows less strong data type checks than, say, C#.
Do I need to worry about it if it is a one time transfer from
Excel to Access?

I think so. Even if you can get away with a "relative" path, you know
that the "absolute" path will work in all situations. I never use the
SendObject method for the same reason. Note that I could have used
dblX instead of CStr(dblX) in the code above and relied on implicit
conversion of dblX into a string. Perhaps I was being a little too
fussy with that one.
Can you please provide an example or link to this?

Please try to search this NG or comp.databases.ms-access first. If
you don't find an example, post back and someone should be able to
help. I even posted a link once to a sample database I created that
did almost exactly the same thing, but used *.* instead of *.doc
within the Dir() function.
Sorry, if my questions are too naive, but I have not done
this before.


Thanks for your advice and time.

Your thanks are appreciated.

James A. Fortune
(e-mail address removed)
 
James said:
See below.


How do I create this dynamic append query?

Example of a dynamic append query (untested):

Suppose I have a table that I use to collect data:

tblXY
ID AutoNumber
X Double
Y Double

In VBA:

Dim MyDB As DAO.Database
Dim dblX As Double
Dim dblY As Double
Dim boolError As Boolean
Dim N As Integer
...
Set MyDB = CurrentDb
For intI = 1 To N
...
boolError = GetNextXYPair(dblX, dblY)
If boolError = False Then
strSQL = "INSERT INTO tblXY ( X, Y) VALUES (" & CStr(dblX) & ", "
& CStr(dblY) & ");"
MyDB.Execute strSQL, dbFailOnError
DoEvents
End If
..
Next intI
Set MyDB = Nothing

Thanks, for the query.

You will need the reference or you won't. If you attempt to use
something from the Excel Object Library and the reference to the
Library doesn't exist, then Access will not be able to understand your
reference to the object in your VBA code. You might only need to
reference the Library when using Early Binding. Many developers start
with Early Binding in the early stages of development to provide
Intellisense. After that, they often switch to Late Binding to obtain
other benefits, such as version independence. If you switch to Late
Binding, try removing the reference to the Excel Object Library and
see if you get an error or not.

How do I know if I have the Excel Object Library or not?
I guess my code will fail if I don't have it as the references cannot
be
resolved? If I don't have it, should I get it from
http://www.filebuzz.com/findsoftware/Microsoft_Excel_12_0_Excel_Object_Library/1.html
Please try to search this NG or comp.databases.ms-access first.

A cursory search did not provide me an example which does something
related to my task.
The only searches I found were
http://groups.google.com/group/micr...hl=en&lnk=gst&q=dir+function#252a380585b66d4f

and

http://groups.google.com/group/micr...n&lnk=gst&q=Dir()+function++#b4bc2bf4d68138b2
If
you don't find an example, post back and someone should be able to
help. I even posted a link once to a sample database I created that
did almost exactly the same thing, but used *.* instead of *.doc
within the Dir() function.

Can you please advise now?

Can the Word 2007 files
be saved in some document format which can be imported properly
from Access 2007 as per my need. I tried saving them(Word 2007 files)
as Word XML
document which did not help. Would saving them in ".htm" or other
format help?

Your thanks are appreciated.
I appreciate your advice and aid.
 
How do I know if I have the Excel Object Library or not?
I guess my code will fail if I don't have it as the references cannot
be
resolved? If I don't have it, should I get it fromhttp://www.filebuzz.com/findsoftware/Microsoft_Excel_12_0_Excel_Objec...

In the VBA editor, if you have a Tools... menu look for References...
You should have at least one version of the Excel Object Library
available in the list.
A cursory search did not provide me an example which does something
related to my task.
The only searches I found werehttp://groups.google.com/group/microsoft.public.access.forms/browse_t...

and

http://groups.google.com/group/microsoft.public.access/browse_thread/...


Can you please advise now?

Search harder :-). Try:

https://files.oakland.edu/users/fortune/web/FileTimes.zip

Notes: The example is an A97 database, but should convert without
problems to later versions of Access. Post back if that's not the
case. The link is only expected to be good for about one year.
Oakland U. moved the location of personal webs recently, but said that
support for old links should work correctly for about a year. Perhaps
the link above does not have the same restrictions as the old links.
Can the Word 2007 files
be saved in some document format which can be imported properly
from Access 2007 as per my need. I tried saving them(Word 2007 files)
as Word XML
document which did not help. Would saving them in ".htm" or other
format help?

I haven't used Word 2007 yet so I'll refrain from making
recommendations about the best course of action when ignorance is
present. As a shot in the dark I'll suggest the possibility of
writing some kind of Word macro that prepares the data and exports
it. You could even write out your own XML that way. I vaguely recall
that in earlier versions of Word, I think I had to use the Workspace
object within Word VBA in order to interact with Access data
correctly. I might be able to dig up a small sample. Personally, I
like creating my own graphs from Access into PDF format without using
the Export to PDF add-in for Access 2007. Also, some of the Windows
Presentation Foundation (WPF) dynamic interactive graphing techniques
are quite powerful.

James A. Fortune
(e-mail address removed)

A good code sample is worth a thousand pictures.

Disclaimer: Any programming examples shown are for illustration
purposes only, without warranty either expressed or implied. This
includes, but is not limited to, the implied warranties of
merchantability or fitness for a particular purpose. This post assumes
that you are familiar with the programming language that is being
demonstrated and with the tools that are used to create and to debug
procedures. I might explain the functionality of a particular
procedure, but I am under no obligation to modify these examples to
provide added functionality or to construct procedures to meet your
specific requirements. Any code samples posted contain no known
hidden material defects. However, anyone who uses any code sample
posted does so with the understanding that they are responsible for
any testing of any illustrative code sample for any particular use.
Furthermore, anyone using an illustrative code sample I provide or
code derived from it does so at their own risk.
 
In the VBA editor, if you have a Tools... menu look for References...
You should have at least one version of the Excel Object Library
available in the list.

I'm not sure my advice here applies to Access 2007. Perhaps Google
for how to look at the list of References in Access 2007.

James A. Fortune
(e-mail address removed)
 
Back
Top