use vba to open pdf then save as xml

  • Thread starter Thread starter Peter Jamieson
  • Start date Start date
P

Peter Jamieson

I can open manually my pdf file then choose: File-> Save As-> Tables in
Excel Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract what I
need.
The above works manually and as I now receive a bunch of pdf's every couple
of days it would
be nice to automate in VBA the creation of the xml files from the pdf's if
possible

So far I have the following code that runs in VBA and opens the pdf file but
I am
unable to figure out the next step of how I can now save it as the xml file.
Any help appreciated!
Cheers, Peter

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf", vbNullString,
"C:\", SW_SHOWNORMAL
End Sub
 
Peter said:
I can open manually my pdf file then choose: File-> Save As-> Tables in
Excel Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract
what I need.
The above works manually and as I now receive a bunch of pdf's every
couple of days it would
be nice to automate in VBA the creation of the xml files from the pdf's
if possible

So far I have the following code that runs in VBA and opens the pdf file
but I am
unable to figure out the next step of how I can now save it as the xml
file. Any help appreciated!

Jesus. You need to take Outlook Express out back and shoot it.
Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf",
vbNullString,
"C:\", SW_SHOWNORMAL
End Sub

Do you know anything about programming? The above looks like it was copy-
pasted from elsewhere with little understanding of its purpose. Still...


(Caveat: I'm using Excel 2000, which doesn't support PDF or XML. Some of my
info I had to get off MSDN.)

The easiest way to see how to automate this would be to find the "record
macro" menu item (in 2000, it's Tools -> Macro -> Record New Macro...; no
idea about modern versions but the web says "Developer tab, in the Code
group, click Macros") and then run through the process once. Stop recording
after you've saved the XML file, then open the VBA editor and look through
your recorded code.

Doing that for saving a normal workbook as text gives me this (after I
removed unnecessary comments, named arguments, and unneeded args):
Sub Macro1()
Workbooks.Open "E:\MS download links.xls"
ActiveWorkbook.SaveAs "E:\MS download links.txt", xlText
End Sub

You should be able to do something similar for PDF->XML.

To do all PDFs in a given directory, you could do something like this:
Sub convertPdfToXml()
cd "C:\path\to\the\PDF files"
fPdf = Dir("*.pdf")
While Len(fPdf)
Workbooks.Open fPdf
fXml = Replace(fPdf, ".pdf", ".xlsx", Compare:=vbTextCompare)
ActiveWorkbook.SaveAs fXml, FileFormat:=xlOpenXMLWorkbook
ActiveWorkbook.Close
fPdf = Dir
Wend
End Sub

This *might* work for you IF you do the following:
- fix the path containing the PDF files ("C:\path\to\the\PDF files")
- confirm the "Workbooks.Open" line works as-is
- confirm the extension of the XML files (I used ".xlsx"; change as needed)
- check the "ActiveWorkbook.SaveAs" FileFormat (maybe xlXMLSpreadsheet?)

Since I can't test this, it's up to you... but this is at least a start in
the right direction.
 
Hi Auric__,
Thx for your suggestions.
Unfortunately the: Workbooks.Open fPdf only gets what looks like some
formatting
syntax from the pdf file, none of the text.
I tried this suggestion of yours in Excel 2007 and 2003.

Your other idea to record a macro did not work: the code got to the
directory OK
but could not open the pdf so no progress.
Appreciate you help, Cheers, Peter
 
Peter Jamieson presented the following explanation :
I can open manually my pdf file then choose: File-> Save As-> Tables in Excel
Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract what I
need.
The above works manually and as I now receive a bunch of pdf's every couple
of days it would
be nice to automate in VBA the creation of the xml files from the pdf's if
possible

So far I have the following code that runs in VBA and opens the pdf file but
I am
unable to figure out the next step of how I can now save it as the xml file.
Any help appreciated!
Cheers, Peter

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf", vbNullString,
"C:\", SW_SHOWNORMAL
End Sub

Why do you think Excel can open a PDF file? I'd think you need a
software that is able to convert a PDF to your chosen format if capable
of doing so. The structure of a PDF isn't something Excel or VBA can
handle same as it would a text file!
 
Thx Garry,
Yes I am aware of that and have used various pdf to xl converters but
none of them were able to perform well enough for what I need.
I am not trying to open the pdf in Excel directly, I am trying to use VBA to
manipulate the existing manual conversion process.
When I manually convert the pdf with "Save As Tables in Excel Spreadsheet"
(within the adobe exec) the resultant .xml files are then easily parsed by
my
VBA code, far more accurately than any of the bought or free pdf to excel
converters I have tried.
An alternative is maybe Perl or AutoIt but I was hoping to keep everything
within the Excel/VBA project.
Anyway, thanks for your interest, cheers, Peter
 
Back
Top