Creating a C.S.V. - how ?

jack horsfield · Jan 3, 2005

| 0743202236, A MIND AT A TIME - TPB, LIVINE M, $26.95
| 0895948575, A WOMAN'S I CHING - TPB, STEIN Diane, $29.95,

| Cousin John ....

Hi Cousin Stanley,

I have at last sorted out my "input" file. It is approaching 200k
lines long.

| The following Python program is set up
| assuming 4 field entries per book record ....

Well, some records have 4 field entries and some have 5.

| You will have to pre-edit the input file
| to remove the 5th entry for each book ....

Not practical. That would mean my going through thousands of book
entries one by one and deleting the 5th entry where appropriate.

| As an alternative to keep the 5th or i-th field entry,
| you could add another line with a unique string
| such as $$$$$ or ----- to delimit individual books ....

That wouldn't work either. For the same reason as above. AFTER I have
put each line of book info as one line per ISBN it would however be
easy. But no longer needed !

| If you decide to go that way, let me know
| and I can change the program to cope ....
|
| Copy the code below, paste into a text editor,
| and save as books_commas.py ....
|
| Then .... python books_commas.py some_file_in.txt
|
| The CSV output is written to the file books_out.txt
| in the current working directory ....

Okay, going by your previous programming I would expect that to
work, and work quickly.

'''
    Module ....... books_commas.py

    Purpose ...... Generate Comma Separated Variable Output File
                   from an input file with individual data items
                   on separate lines

                   Assumes an individual book record
                   has 4 field entries

    Code_By ...... Stanley C. Kitching
    Code_Date .... 2004-12-04
'''

import sys

path_in = sys.argv[ 1 ]

file_in = open( path_in , 'r' )

file_out = open( 'books_out.txt' , 'w' )

list_books = [ ]

this_list = [ ]

for this_line in file_in :

    if this_line != '\t\n' :     # skip every other blank line

        this_list.append( this_line.strip() )

        if len( this_list ) == 4 :     # a complete 4-field book record

            list_books.append( this_list )

            this_list = [ ]

for this_book in list_books :

    file_out.write( '%s, %s, %s, %s \n' % ( tuple( this_book ) ) )

file_in.close()

file_out.close()

Currently my input data is something like ;

0876044143

EDGAR CAYCE'S DIET & RECIPE GUIDE

A.R.E. PRESS

$28.95

0876044380

EDGAR CAYCE'S EGYPT

A.R.E. PRESS

$75.00

IND

As you can see the first book is four items and the second one five
items. I HAVE been able to do the job with a NoteTab clip someone
made for me but it takes quite a while to run. At least 15 minutes.

What the clip appears to do is to remove blank lines, make the ISBN
the first item in each line, and replace the line returns with commas.

The above comes out as ;

0876044143,EDGAR CAYCE'S DIET & RECIPE GUIDE,A.R.E. PRESS,$28.95
0876044380,EDGAR CAYCE'S EGYPT,A.R.E. PRESS,$75.00,IND
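
The transformation described above can be sketched in Python. This is only an illustration, not the NoteTab clip or the emacs function mentioned in this thread. It assumes that every record begins with a line that looks like an ISBN-10 (nine digits followed by a digit or X), and the names ISBN_LINE and group_records are made up for the example.

```python
import re

# Assumption: an ISBN line is exactly nine digits followed by a digit or X.
ISBN_LINE = re.compile(r'^\d{9}[\dXx]$')

def group_records(lines):
    """Yield one list of fields per book, starting a new book at each ISBN line."""
    record = []
    for raw in lines:
        field = raw.strip()
        if not field:                  # drop blank and whitespace-only lines
            continue
        if ISBN_LINE.match(field) and record:
            yield record               # the previous book is complete
            record = []
        record.append(field)
    if record:                         # emit the final book
        yield record

sample = """0876044143

EDGAR CAYCE'S DIET & RECIPE GUIDE

A.R.E. PRESS

$28.95

0876044380

EDGAR CAYCE'S EGYPT

A.R.E. PRESS

$75.00

IND
"""

for book in group_records(sample.splitlines()):
    print(','.join(book))
```

Because a new record starts whenever an ISBN-like line appears, it does not matter whether a book has four fields or five. A field that happens to consist of ten digits would be mistaken for an ISBN, so the pattern may need tightening for real data.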

john -- do you still have that emacs lying around? if yes, then i've
written a small function that does pretty much what you want.

i've tried on the test data you posted and for me it does 200k lines in
about 2 seconds.

i made it separate by tabs, rather than commas, but the change is easy if
necessary.

jack

Cousin Stanley · Jan 3, 2005

| The source isn't actually on a web page. I think it is SQL.
|
| I don't have direct access to it. A search web page gets it.
| Title by title or author by author.

I think I understand that your current input file
was built up piecewise, little by little, and would
contain only items that you have explicitly selected
from the original source ....

I'll have another look at coping with entries
that have differing numbers of fields. I have an idea
that I think might work and be fairly easy to add
to the small Python program that I originally posted ....

John Fitzsimons · Jan 4, 2005

Hi Jack,

| john -- do you still have that emacs lying around?

Yes, but I haven't updated it as I didn't want to lose all those extra
menu items you created for me.

| if yes, then i've
| written a small function that does pretty much what you want.
|
| i've tried on the test data you posted and for me it does 200k lines in
| about 2 seconds.

At least 100 times faster than my script. :-)

| i made it separate by tabs, rather than commas, but the change is easy if
| necessary.

Well, I am putting this info into Excel. It takes C.S.V. files fine. I
think it will take tab delimitation okay too.
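
For anyone writing the output from Python instead, switching between commas and tabs can be a one-argument change with the standard csv module. The sketch below is an assumption-laden illustration, not code from this thread: write_books is a hypothetical helper, and the two rows are the sample records posted earlier.

```python
import csv
import io

def write_books(books, out, delimiter=','):
    """Write one row per book; csv quotes any field that contains the delimiter."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator='\n')
    writer.writerows(books)

books = [
    ['0876044143', "EDGAR CAYCE'S DIET & RECIPE GUIDE", 'A.R.E. PRESS', '$28.95'],
    ['0876044380', "EDGAR CAYCE'S EGYPT", 'A.R.E. PRESS', '$75.00', 'IND'],
]

comma_out = io.StringIO()
write_books(books, comma_out)                  # comma-separated

tab_out = io.StringIO()
write_books(books, tab_out, delimiter='\t')    # tab-separated

print(comma_out.getvalue())
print(tab_out.getvalue())
```

One reason to prefer the csv module over joining fields by hand is that a title containing a comma is quoted automatically, so it still lands in a single column.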

Regards, John.

jack horsfield · Jan 4, 2005

On Tue, 04 Jan 2005 12:22:23 +1100, John Fitzsimons wrote:

followed up by mail... assuming your old mail address is still good.

jack
the fast gorilla skied over a lazy cloud
