Creating a C.S.V. - how ?

  • Thread starter: John Fitzsimons
| 0743202236, A MIND AT A TIME - TPB, LIVINE M, $26.95
| 0895948575, A WOMAN'S I CHING - TPB, STEIN Diane, $29.95,
Cousin John ....

Hi Cousin Stanley,

I have at last sorted out my "input" file. It is approaching 200k
lines long.
The following Python program is set up
assuming 4 field entries per book record ....

Well, some records have 4 field entries and some have 5.
You will have to pre-edit the input file
to remove the 5th entry for each book ....

Not practical. That would mean my going through thousands of book
entries one by one and deleting the 5th entry where appropriate.
As an alternative to keep the 5th or i-th field entry,
you could add another line with a unique string
such as $$$$$ or ----- to delimit individual books ....

That wouldn't work either. For the same reason as above. AFTER I have
put each line of book info as one line per ISBN it would however be
easy. But no longer needed !
If you decide to go that way, let me know
and I can change the program to cope ....
Copy the code below, paste into a text editor,
and save as books_commas.py ....
Then .... python books_commas.py some_file_in.txt
The CSV output is written to the file books_out.txt
in the current working directory ....

Okay, going by your previous programming I would expect that to
work, and work quickly. :-)
'''
Module ....... books_commas.py
Purpose ...... Generate Comma Separated Variable Output File
from an input file with individual data items
on separate lines
Assumes an individual book record
has 4 field entries
Code_By ...... Stanley C. Kitching
Code_Date .... 2004-12-04
'''
import sys

path_in = sys.argv[ 1 ]
file_in = open( path_in , 'r' )    # open() replaces the old file() builtin
file_out = open( 'books_out.txt' , 'w' )

list_books = [ ]
this_list = [ ]

for this_line in file_in :
    if this_line.strip() :    # skip blank and whitespace-only lines
        this_list.append( this_line.strip() )
        if len( this_list ) == 4 :    # a full 4-field book record
            list_books.append( this_list )
            this_list = [ ]

for this_book in list_books :
    file_out.write( '%s, %s, %s, %s \n' % ( tuple( this_book ) ) )

file_in.close()
file_out.close()

Currently my input data is something like ;

0876044143

EDGAR CAYCE'S DIET & RECIPE GUIDE

A.R.E. PRESS

$28.95

0876044380

EDGAR CAYCE'S EGYPT

A.R.E. PRESS

$75.00

IND

As you can see the first book is four items and the second one five
items. I HAVE been able to do the job with a NoteTab clip someone
made for me but it takes quite a while to run. At least 15 minutes.

What the clip appears to do is to remove blank lines, make the ISBN
the first item in each line, and replace the line returns with commas.

The above comes out as ;

0876044143,EDGAR CAYCE'S DIET & RECIPE GUIDE,A.R.E. PRESS,$28.95
0876044380,EDGAR CAYCE'S EGYPT,A.R.E. PRESS,$75.00,IND
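
The steps described above (drop blank lines, start a new line at each ISBN, join the rest with commas) could be sketched in Python roughly as follows. This is only a sketch, not the NoteTab clip or anyone's posted program: the function name group_records and the ISBN pattern are assumptions made for illustration, and it assumes every record begins with a 10-character ISBN on its own line, as in the sample. Any other field that happens to look like an ISBN would wrongly start a new record.

```python
import re

# Assumption: a record starts at a line that looks like an ISBN-10
# (nine digits followed by a digit or X), as in the sample data.
ISBN_LINE = re.compile(r'^\d{9}[\dXx]$')

def group_records(lines):
    """Group non-blank lines into records, starting a new one at each ISBN."""
    records = []
    for raw in lines:
        item = raw.strip()
        if not item:
            continue  # drop blank and whitespace-only lines
        if ISBN_LINE.match(item) or not records:
            records.append([item])    # ISBN (or very first item) opens a record
        else:
            records[-1].append(item)  # any other item joins the current record
    return records

sample = """0876044143

EDGAR CAYCE'S DIET & RECIPE GUIDE

A.R.E. PRESS

$28.95

0876044380

EDGAR CAYCE'S EGYPT

A.R.E. PRESS

$75.00

IND
"""

for record in group_records(sample.splitlines()):
    print(','.join(record))
```

Because the grouping keys on the ISBN rather than on a fixed count, records with four or five items are handled the same way, and the input is read in a single pass.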

john -- do you still have that emacs lying around? if yes, then i've
written a small function that does pretty much what you want.

i've tried on the test data you posted and for me it does 200k lines in
about 2 seconds.

i made it separate by tabs, rather than commas, but the change is easy if
necessary.

jack
 
| The source isn't actually on a web page. I think it is SQL.
|
| I don't have direct access to it. A search web page gets it.
| Title by title or author by author.

I think I understand that your current input file
was built up piecewise, little by little, and would
contain only items that you have explicitly selected
from the original source ....

I'll have another look at coping with entries
with differing number of fields and have an idea
that I think might work and be fairly easy to add
to the small Python program that I originally posted ....
 
Hi Jack,

john -- do you still have that emacs lying around?

Yes, but I haven't updated it as I didn't want to lose all those extra
menu items you created for me.
if yes, then i've
written a small function that does pretty much what you want.
i've tried on the test data you posted and for me it does 200k lines in
about 2 seconds.

At least 100 times faster than my script. :-)
i made it separate by tabs, rather than commas, but the change is easy if
necessary.

Well, I am putting this info into Excel. It takes C.S.V. files fine. I
think it will take tab delimitation okay too.

Regards, John.
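
On the commas-versus-tabs question: one way to keep the choice open is to write the records with Python's standard csv module, where the delimiter is a single argument. The sketch below assumes Python 3 and records already grouped into lists of fields; the helper name write_records is made up for illustration. A side benefit is that a title containing a comma gets quoted automatically, which a plain join with commas would not do.

```python
import csv
import io

def write_records(records, out, delimiter=','):
    """Write each record as one delimited line to the file-like object out."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator='\n')
    writer.writerows(records)

records = [
    ['0876044143', "EDGAR CAYCE'S DIET & RECIPE GUIDE", 'A.R.E. PRESS', '$28.95'],
    ['0876044380', "EDGAR CAYCE'S EGYPT", 'A.R.E. PRESS', '$75.00', 'IND'],
]

# Tab-separated output, written to an in-memory buffer for demonstration.
buf = io.StringIO()
write_records(records, buf, delimiter='\t')
print(buf.getvalue(), end='')
```

To write a real file instead of a buffer, open it with open(path, 'w', newline='') as the csv documentation recommends, and pass delimiter=',' for ordinary C.S.V. output.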
 
On Tue, 04 Jan 2005 12:22:23 +1100, John Fitzsimons wrote:

followed up by mail... assuming your old mail address is still good.

jack
the fast gorilla skied over a lazy cloud
 