jack horsfield
| 0743202236, A MIND AT A TIME - TPB, LIVINE M, $26.95
| 0895948575, A WOMAN'S I CHING - TPB, STEIN Diane, $29.95
Cousin John ....
Hi Cousin Stanley,
I have at last sorted out my "input" file. It is approaching 200k
lines long.
The following Python program is set up
assuming 4 field entries per book record ....
Well, some records have 4 field entries and some have 5.
You will have to pre-edit the input file
to remove the 5th entry for each book ....
Not practical. That would mean my going through thousands of book
entries one by one and deleting the 5th entry where appropriate.
As an alternative to keep the 5th or i-th field entry,
you could add another line with a unique string
such as $$$$$ or ----- to delimit individual books ....
That wouldn't work either, for the same reason as above. Once I have
each book's info on one line per ISBN, adding a delimiter would be
easy, but by then it would no longer be needed!
If you decide to go that way, let me know
and I can change the program to cope ....
Copy the code below, paste into a text editor,
and save as books_commas.py ....
Then ....
    python books_commas.py some_file_in.txt
The CSV output is written to the file books_out.txt
in the current working directory ....
Okay, going by your previous programming I would expect that to
work, and work quickly.
'''
    Module ....... books_commas.py
    Purpose ...... Generate Comma Separated Variable Output File
                   from an input file with individual data items
                   on separate lines

                   Assumes an individual book record
                   has 4 field entries

    Code_By ...... Stanley C. Kitching
    Code_Date .... 2004-12-04
'''

import sys

path_in = sys.argv[ 1 ]

# open() replaces the old file() builtin, which no longer exists in Python 3
file_in = open( path_in , 'r' )
file_out = open( 'books_out.txt' , 'w' )

list_books = [ ]
this_list = [ ]

# collect stripped lines into groups of 4 fields, one group per book
for this_line in file_in :
    if this_line != '\t\n' :    # skip every other blank line
        this_list.append( this_line.strip() )
        if len( this_list ) == 4 :
            list_books.append( this_list )
            this_list = [ ]

# write each book as one comma separated line
for this_book in list_books :
    file_out.write( '%s, %s, %s, %s \n' % ( tuple( this_book ) ) )

file_in.close()
file_out.close()
Currently my input data is something like ;
0876044143
EDGAR CAYCE'S DIET & RECIPE GUIDE
A.R.E. PRESS
$28.95
0876044380
EDGAR CAYCE'S EGYPT
A.R.E. PRESS
$75.00
IND
As you can see the first book is four items and the second one five
items. I HAVE been able to do the job with a NoteTab clip someone
made for me but it takes quite a while to run. At least 15 minutes.
What the clip appears to do is to remove blank lines, make the ISBN
the first item in each line, and replace the line returns with commas.
The above comes out as ;
0876044143,EDGAR CAYCE'S DIET & RECIPE GUIDE,A.R.E. PRESS,$28.95
0876044380,EDGAR CAYCE'S EGYPT,A.R.E. PRESS,$75.00,IND
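
The three steps described for the clip (drop blank lines, start a new line at each ISBN, join the items with commas) can be sketched in Python so that 4-item and 5-item records are both handled without pre-editing. This is only a sketch under one assumption: a line made of nine digits plus a digit or X is taken to be an ISBN and so marks the start of a new book. The names ISBN_LINE, group_records and to_csv_lines are made up for illustration and do not come from the program quoted above.

```python
import re

# Assumption: a record starts at a line that looks like a 10-character ISBN
# (nine digits followed by a digit or an X check character).
ISBN_LINE = re.compile(r'^\d{9}[\dXx]$')


def group_records(lines):
    """Yield one list of fields per book; a new book starts at each ISBN line."""
    record = []
    for raw in lines:
        field = raw.strip()
        if not field:                      # skip blank or whitespace-only lines
            continue
        if ISBN_LINE.match(field) and record:
            yield record                   # previous book is complete
            record = []
        record.append(field)
    if record:                             # last book in the input
        yield record


def to_csv_lines(lines, sep=','):
    """Join the fields of each book with sep, one output line per book."""
    return [sep.join(record) for record in group_records(lines)]


sample = [
    "0876044143\n", "\t\n",
    "EDGAR CAYCE'S DIET & RECIPE GUIDE\n", "\t\n",
    "A.R.E. PRESS\n", "\t\n",
    "$28.95\n", "\t\n",
    "0876044380\n", "\t\n",
    "EDGAR CAYCE'S EGYPT\n", "\t\n",
    "A.R.E. PRESS\n", "\t\n",
    "$75.00\n", "\t\n",
    "IND\n",
]

for line in to_csv_lines(sample):
    print(line)
```

Because group_records takes any iterable of lines, an open file object can be passed in directly, so the whole input never has to be held in memory. One caveat: a non-ISBN field that happens to be exactly ten digits would be mistaken for the start of a new book.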
john -- do you still have that emacs lying around? if yes, then i've
written a small function that does pretty much what you want.
i've tried on the test data you posted and for me it does 200k lines in
about 2 seconds.
i made it separate by tabs, rather than commas, but the change is easy if
necessary.
jack
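
On the tabs-versus-commas point: the emacs function itself is not shown here, so what follows is a Python sketch of the same idea and not a translation of it. It uses the standard csv module, where the separator is a single parameter. The helper name write_records is hypothetical.

```python
import csv
import io


def write_records(records, out, delimiter=','):
    """Write each record as one delimited line to the file-like object out."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator='\n')
    writer.writerows(records)


records = [
    ['0876044143', "EDGAR CAYCE'S DIET & RECIPE GUIDE", 'A.R.E. PRESS', '$28.95'],
    ['0876044380', "EDGAR CAYCE'S EGYPT", 'A.R.E. PRESS', '$75.00', 'IND'],
]

buf = io.StringIO()
write_records(records, buf, delimiter='\t')   # tab separated, as in the emacs version
print(buf.getvalue())
```

With delimiter=',' the writer also puts double quotes around any field that itself contains a comma, which plain string joining would not do.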