Can we read the text contents of a PDF using .NET?

  • Thread starter: Guest

Can we read the text contents of a PDF using .NET?

If it is possible, how do we do it?
 
A PDF is a collection of objects; it's not formatted text.

So you can read the text, though maybe not in a meaningful form.
 
Here is what I do:

''' <summary>
''' Gets the text of a PDF file.
''' Requires pdftotext.exe from http://www.foolabs.com/xpdf
''' </summary>
''' <param name="filename">The path of the PDF file.</param>
''' <returns>The extracted text, or an empty string on failure.</returns>
Public Function getPDFtext(ByVal filename As String) As String
    Dim txtStdout As String = ""

    Try
        Using p As New System.Diagnostics.Process
            ' Location of pdftotext.exe; adjust to where it is installed
            p.StartInfo.FileName = "Asset Search\pdftotext.exe"
            ' Quote the filename so paths with spaces work;
            ' the trailing "-" sends the text to standard output
            p.StartInfo.Arguments = """" & filename & """ -"
            p.StartInfo.UseShellExecute = False
            p.StartInfo.CreateNoWindow = True
            p.StartInfo.RedirectStandardOutput = True

            p.Start()

            ' Get the text from standard output
            Using std_out As IO.StreamReader = p.StandardOutput
                txtStdout = std_out.ReadToEnd()
            End Using

            ' Let the process finish before it is disposed
            p.WaitForExit()
        End Using
    Catch ex As Exception
        MsgBox("Error while extracting PDF text, the error is: " & ex.Message)
    End Try

    Return txtStdout
End Function

I wouldn't use it for anything serious, business-critical, or real-time. For
that you should probably go with a commercial control like
http://www.pdfonline.com/. But for quick-and-dirty text extraction it works
fine for me.

Best Regards,

Chris
 
Just to clarify, the following line should point to the actual pdftotext.exe
program:

p.StartInfo.FileName = "Asset Search\pdftotext.exe" ' <--- location of pdftotext.exe
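
One way to catch a wrong path early is to build it from the application's own folder and check that the file exists before starting the process. This is only a rough sketch: it assumes pdftotext.exe sits in an "Asset Search" subfolder next to the executable, and the names PdfToolLocator and FindPdfToText are made up for illustration.

```vbnet
Imports System
Imports System.IO

Module PdfToolLocator
    ''' <summary>
    ''' Returns the full path to pdftotext.exe, or throws if it is missing.
    ''' Assumption: the tool lives in an "Asset Search" subfolder of the
    ''' application's base directory; change this to match your layout.
    ''' </summary>
    Public Function FindPdfToText() As String
        Dim baseDir As String = AppDomain.CurrentDomain.BaseDirectory
        Dim exePath As String = Path.Combine(baseDir, "Asset Search", "pdftotext.exe")

        ' Fail with a clear message instead of a vague Process.Start error
        If Not File.Exists(exePath) Then
            Throw New FileNotFoundException("pdftotext.exe was not found.", exePath)
        End If

        Return exePath
    End Function
End Module
```

You could then set p.StartInfo.FileName = FindPdfToText() instead of hard-coding the relative path, so the call does not depend on the current working directory.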

Chris
 
If you want to pull the information out in a particular format, my advice is
to purchase an SDK (suggestions: OmniPage or ABBYY). The software will
allow you to extract information or images from a PDF. I have never used
the SDKs themselves, but I have used the software, and it does extremely
well when we use it to extract data from a PDF and export it as an Excel
file.
 
