Controlling Text Loading

  • Thread starter: Nigel

Hi All
I have a fixed-width text file which contains three columns of data and up
to 150,000 rows.

I wish to read these into a worksheet, parsing the text into three columns
(by specifying each column width) and putting the data into columns A:C until
I reach 50,000 rows, then loading the next 50,000 rows into columns C:E, the
next into F:H, etc., until all rows have been loaded.

Any help greatly appreciated.

Cheers
Nigel
 
Sorry, the target references should have been A:C, D:F, G:I, etc. No wonder
I need help!!

Nigel said:
I have a fixed width text file which contains three columns of data and up
to 150,000 rows [...]

----== Posted via Newsfeed.Com - Unlimited-Uncensored-Secure Usenet News==----
http://www.newsfeed.com The #1 Newsgroup Service in the World! >100,000 Newsgroups
---= 19 East/West-Coast Specialized Servers - Total Privacy via Encryption
=---
 
Nigel,

I didn't properly test the tab positions. There might be some issues going
from 0-based to 1-based offsets, but I think you should be able to work with
this.
It gets really slow for 150,000 records - my test file was about 7MB. Maybe
another, faster, solution will be posted soon?
There might even be an opportunity for using ODBC against a flat file.

Sub test()
    Const cCol1 = 20, cCol2 = 20, cCol3 = 20, cLimit = 50000
    Dim intFreeFile As Integer, i As Long, j As Long, strTemp As String

    'Generate a test file
    intFreeFile = FreeFile
    Open "c:\t\test.txt" For Output As #intFreeFile
    For i = 1 To 150000
        Print #intFreeFile, "R" & i & "C1"; Tab(cCol1 + 1); "R" & i & "C2"; _
            Tab(cCol1 + cCol2 + 1); "R" & i & "C3"
    Next
    Close #intFreeFile

    'Load the test file into cells, wrapping to the next set of
    'three columns after every cLimit rows
    intFreeFile = FreeFile
    Open "c:\t\test.txt" For Input As #intFreeFile
    i = 1: j = 1
    Do Until EOF(intFreeFile)
        Line Input #intFreeFile, strTemp
        'Mid is 1-based, so each field starts one past the preceding widths
        Cells(i, j).Value = Trim(Mid(strTemp, 1, cCol1))
        Cells(i, j + 1).Value = Trim(Mid(strTemp, cCol1 + 1, cCol2))
        Cells(i, j + 2).Value = Trim(Mid(strTemp, cCol1 + cCol2 + 1, cCol3))
        i = i + 1
        If i > cLimit Then
            i = 1
            j = j + 3
        End If
    Loop
    Close #intFreeFile
End Sub


--
Rob van Gelder - http://www.vangelder.co.nz/excel


Nigel said:
Sorry, the target references should have been A:C, D:F, G:I, etc. No wonder
I need help!! [...]

 
Rob,
Thanks very much. I'll try this, but you have got me thinking about ODBC.
I do not know much about it, other than that I can connect to the file without
loading it into Excel. But do you know if I can search the text file and
extract the relevant bits into Excel?

Cheers
Nigel
 
Why not import it into Access, then use that as the source for a pivot
table?

Or you could use SQL commands to do your search and so forth, returning the
results to Excel.

If the first "column" of data is a unique identifier whose position can be
numerically calculated, you can use a random read directly from the text
file.
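A minimal sketch of that random-read idea, assuming each record is a fixed 60
data characters followed by CRLF - the function name, path, and widths here
are all illustrative, not from the thread:

Function ReadRecord(ByVal strFile As String, ByVal lngRecord As Long) As String
    'Assumed layout: 60 data characters + 2 bytes CRLF per record,
    'so record n starts at byte (n - 1) * 62 + 1
    Const cRecLen = 60, cEOL = 2
    Dim intFreeFile As Integer, strBuf As String
    intFreeFile = FreeFile
    Open strFile For Binary Access Read As #intFreeFile
    strBuf = Space$(cRecLen)                       'Get reads Len(strBuf) bytes
    Get #intFreeFile, (lngRecord - 1) * (cRecLen + cEOL) + 1, strBuf
    Close #intFreeFile
    ReadRecord = strBuf
End Function

Because every record occupies the same number of bytes, the position can be
computed directly and no scanning through the file is needed.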


I guess any answer would depend on specifically what you want to do with the
data.

--
Regards,
Tom Ogilvy


Nigel said:
I have a fixed width text file which contains three columns of data and up
to 150,000 rows [...]

 
You can certainly load this via ODBC using the Microsoft Text Driver.
It wouldn't solve your problem of wrapping after 50,000 records, though.

As Tom suggests, you may want to put the data into Access first before
moving to Excel.
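A rough sketch of the Text Driver approach via ADO. It assumes a Schema.ini
file in the same folder declaring test.txt as Format=FixedLength, with columns
named Col1 to Col3 - the paths, column names, and the search value are all
illustrative, not tested:

Sub QueryTextFile()
    'Late bound, so no ADO reference needs to be set
    Dim cn As Object, rs As Object
    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=c:\t\;"
    'Schema.ini (assumed) maps the fixed widths to named columns
    Set rs = cn.Execute("SELECT * FROM [test.txt] WHERE Col1 = 'R42C1'")
    Range("A1").CopyFromRecordset rs     'dump matching rows to the sheet
    rs.Close: cn.Close
End Sub

The driver does the parsing and filtering, so only matching rows ever reach
the worksheet.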



--
Rob van Gelder - http://www.vangelder.co.nz/excel


Nigel said:
Thanks very much. I'll try this, but you have got me thinking about ODBC. [...]

 
Thanks for the tips. I have used a modified version of the suggestion made
by Rob van Gelder and this is working, but as predicted a bit slow!
Here is some more info.
1. Specifically, the data needs to be searchable; the first column is not
unique and is not at this stage ordered, but it is a fixed width.
2. I do not have direct control over the creation of the source, so any
sorts etc would need to be part of the loading process.
3. The process needs to be stand-alone, allowing others to load the data and
use the Excel functions to search and report the contents. I cannot rely on
them having MS Access on their machines (in fact I know it does not exist on
many target PCs).
4. After loading, the user enters a value to search and match against the
first field as a key (Tom, you kindly gave me some form-control filling tips
to do this), and if the search is successful, a second form control fills with
the second data item for each match - up to ~20 maximum.
5. The user chooses from the second item list and is presented with the third
data item. (If only one item is found in the first search, the data is
presented directly and no further action takes place.)

I have it working using a combination of the text load from Rob (but slow to
get data in) and the search process from you, Tom, modified to suit the data.

Two things I would strive to do:

1. Speed up the data load - read it directly or something else?
2. Speed up the search - it takes 2-3 seconds on my PC to search all 150k
records, but the users are working in a busy operational area and would
probably benefit from more speed. (They may have PCs with fewer resources,
further slowing down the process.)
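For the search side, one option is to pull each 50,000-row key column into a
Variant array and scan it in memory rather than cell by cell. A rough sketch,
with the range size, column positions, and names all illustrative:

Sub FindMatches(ByVal strKey As String)
    Dim varKeys As Variant, lngRow As Long, j As Long
    For j = 1 To 7 Step 3                      'key columns A, D, G
        'One worksheet read per block, then a pure in-memory loop
        varKeys = Cells(1, j).Resize(50000, 1).Value
        For lngRow = 1 To UBound(varKeys, 1)
            If varKeys(lngRow, 1) = strKey Then
                Debug.Print "Match: row " & lngRow & ", column " & j
            End If
        Next
    Next
End Sub

Reading a whole column into an array in one go avoids the per-cell overhead
that usually dominates this kind of search.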

Any thoughts or ideas greatly appreciated.

Cheers
Nigel
PS I am away until Thursday, so will not be able to reply sooner. But I
will. Thanks in advance.
 
Nigel,

In my opinion, it's beyond what Excel is best at (that's probably a
dangerous thing to say in this group). Yes, Excel can do it, but you suffer
on speed and complexity of the system.
You've already hit the 65,536-row limit, so you're wrapping the records,
which really complicates development.

A modern database system can manage your data and do your 150k searches in
sub-second time - plus LOTS more.


That said, I realised after my post that some speed increase could be
achieved using array copies. It's still slow, though.

Sub test()
    Const cCol1 = 20, cCol2 = 20, cCol3 = 20, cLimit = 50000
    Dim intFreeFile As Integer, i As Long, j As Long, strTemp As String
    Dim arr() As String

    'Generate a test file
    intFreeFile = FreeFile
    Open "c:\t\test.txt" For Output As #intFreeFile
    For i = 1 To 150000
        Print #intFreeFile, "R" & i & "C1"; Tab(cCol1 + 1); "R" & i & "C2"; _
            Tab(cCol1 + cCol2 + 1); "R" & i & "C3"
    Next
    Close #intFreeFile

    'Read into an array and write to the sheet one cLimit-row block at a
    'time, so the worksheet is only touched once per block
    intFreeFile = FreeFile
    Open "c:\t\test.txt" For Input As #intFreeFile
    i = 0: j = 1
    ReDim arr(cLimit - 1, 2)
    Do Until EOF(intFreeFile)
        Line Input #intFreeFile, strTemp
        'Mid is 1-based, so each field starts one past the preceding widths
        arr(i, 0) = Trim(Mid(strTemp, 1, cCol1))
        arr(i, 1) = Trim(Mid(strTemp, cCol1 + 1, cCol2))
        arr(i, 2) = Trim(Mid(strTemp, cCol1 + cCol2 + 1, cCol3))
        i = i + 1
        If i = cLimit Then
            Cells(1, j).Resize(cLimit, 3).Value = arr
            i = 0: j = j + 3
            ReDim arr(cLimit - 1, 2)
        End If
    Loop
    'Flush any final partial block
    If i > 0 Then Cells(1, j).Resize(i, 3).Value = arr
    Close #intFreeFile
End Sub


--
Rob van Gelder - http://www.vangelder.co.nz/excel


Nigel said:
Thanks for the tips, I have used a modified version of the suggestion made
by Rob van Gelder and this is working but as predicted a bit slow! [...]

 
Thanks again Rob. I pretty much knew this was really testing the limits of
Excel, and that only minor improvements are likely with such a large data set
to load and search.

Regrettably our business systems do not have the flexibility to add other
applications (without jumping through hoops of fire!!), so we are stuck with
Excel. I have sped up the second stage of the search by storing all record
details found in an array for subsequent processing, but that's about it.

One thought - is Excel faster when it searches / reads numerical data as
opposed to text strings? At present, although the data is mainly numerical,
I have set the cells to text. Also, would it be beneficial to load each
string as a complete string rather than chop it up? But I suspect the gain
is marginal in the overall scheme of things.

Thanks for your continued help

Cheers
Nigel
 
It depends on your operational parameters, but you can do a one-time
conversion to an Access file - the users don't need to have Access on their
machines to use the file. ADO, DAO, ODBC and so forth don't open Access to
get information from an MDB.


--
Regards,
Tom Ogilvy

Nigel said:
Thanks for the tips, I have used a modified version of the suggestion made
by Rob van Gelder and this is working but as predicted a bit slow! [...]

 
And to be clear, Excel doesn't need Access present to retrieve data from an
MDB/Access file.
 
Tom's suggestion is a good one.
If this is going to be a one-off load (from text to Excel), then consider
loading into an MDB file using Access.
Once it's in the MDB, you don't need Access any more - just use the ActiveX
Data Objects library to access the data.
Sorting and searching are fairly easy.


As for your question, I don't know which is faster. You could record the
duration each routine takes, if you're that determined.
This KB article may help:
http://support.microsoft.com/default.aspx?scid=kb;[LN];172338
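A minimal way to record those durations is VBA's Timer function - the routine
being measured here is just a placeholder:

Sub TimeIt()
    Dim sngStart As Single
    sngStart = Timer                 'seconds since midnight
    'Call the routine being measured here, e.g. test
    Debug.Print "Elapsed: " & Format(Timer - sngStart, "0.00") & "s"
End Sub

Run each candidate routine a few times and compare the printed figures,
since a single run can be skewed by whatever else the PC is doing.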


--
Rob van Gelder - http://www.vangelder.co.nz/excel


Nigel said:
Thanks again Rob, I pretty much knew this was really testing the limits of
Excel and that only minor improvements are likely with such a large data set
to load and search. [...]

 
Rob van Gelder said:
load to an MDB file using Access.
Once it's in MDB, you don't need Access any more, just use ActiveX Data
Objects library to access the data.

You don't need the MS Access application at all. You can create a Jet
database using ADOX (Microsoft ActiveX Data Objects Extensions for
Data Definition Language and Security) and load the data using ADO
(Microsoft ActiveX Data Objects).
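A rough sketch of that ADOX/ADO approach, late bound so no references need to
be set - the path, table, and column names are purely illustrative:

Sub CreateJetDb()
    Dim cat As Object, cn As Object
    'ADOX creates the Jet (.mdb) file without Access being installed
    Set cat = CreateObject("ADOX.Catalog")
    cat.Create "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\t\data.mdb"
    'Reuse the catalog's connection for DDL and data loading via ADO
    Set cn = cat.ActiveConnection
    cn.Execute "CREATE TABLE Records " & _
               "(Key1 TEXT(20), Item2 TEXT(20), Item3 TEXT(20))"
    cn.Execute "INSERT INTO Records VALUES ('a', 'b', 'c')"
    cn.Close
End Sub

From there the load loop would issue one INSERT per parsed record (or use a
Recordset with AddNew), and searches become simple SELECT statements.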

--
 
Thanks Tom, I am in the process of redeveloping the application to take
advantage of an MDB file.

Cheers
Nigel
 