1) The client does not know what software is as the client is not very
savvy when it comes to computers. He is not sure, and is not in contact
with the previous programmers that
Yes, and I am assuming that this is where you are supposed to come in, is
it not?
It just seems to me that an inquiry about these files might help you.
Is this a DOS computer, a Windows box, a Linux box? A handheld PDA? Perhaps
some kind of telephone hardware system that spits out data? An old
mainframe?
You're asking for professional help here, and yet the steps being taken to
ascertain which system produces this information are lacking. This info
could very well help you folks eliminate a lot of your data processing and
deciphering of these files.
2) The file is not delimited, nor does it easily delimit in notepad,
wordpad, or textpad. Nor does it work very well using Excel's text to
columns feature, or Excel's import text feature.
Well, I get all kinds of data that Excel can't handle, but when you have
knowledge about the system that produces the information, what seems
insanely complex can actually become quite simple.
And, are you sure about delimiters? I have to assume that you've opened
the file up using a hex editor, and looked for things like a repeating
chr(8), chr(9) etc.?
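If a hex editor is not handy, a few lines of code can do the same hunt for
repeating control characters. This is only a rough sketch: the function name
and the sample bytes are made up for illustration, and it simply counts every
byte outside the printable ASCII range.

```python
from collections import Counter


def control_byte_counts(raw: bytes):
    """Count bytes outside printable ASCII (0x20-0x7E), most frequent first."""
    counts = Counter(b for b in raw if b < 0x20 or b > 0x7E)
    return counts.most_common()


# Made-up test bytes: chr(9) between fields, chr(8) after each record.
sample = b"Smith\tJohn\tCalgary\x08Jones\tAnn\tEdmonton\x08"

for value, count in control_byte_counts(sample):
    print(f"chr({value}) = 0x{value:02X} appears {count} times")
```

On a real file you would read the bytes with open(path, "rb").read() and
look for values that repeat at a regular rhythm.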
For example, here is a dump of a file I get from a mainframe, and while it
looks extremely difficult to parse out, if you know the operating system
and details about the Pick operating system, then deciphering the following
data actually becomes quite easy:
fid: 8100 : 0 8108 0 0 ( 1FA4 : 0 1FAC 0 0 )
000 :....].._18^Engineered Systems^^2960 Lindsay Dr SW^:
050 :^Calgary^Alberta^T3E 6A8^(000)-249-5752^Y^__....i.:
100 :._132^Munro Engineering Inc.^^7 Floor 840 6 Avenue:
150 : S.W.^^Calgary^Alberta^T2P 3E5^(000)-263-0070^Y^__:
200 :....a.._32^Computer Modelling Group^^3512 33 Stree:
250 :t NW^^Calgary^Alberta^T2L 2A6^(000)-282-9286^Y^_..:
300 :..i.._146^Prophet Technologies Ltd.^^115 1144 29 A:
350 :venue N.E.^^Calgary^Alberta^T2E 7P1^(000)-291-9526:
400 :^Y^_...._.._46^Keentech Software^^8340 Addison Dri:
450 :ve SE^^Calgary^Alberta^T2H 1P1^(000)-259-5676^Y^__:
500 :....W.._160^Software Plus Inc.^^600 11012 Macleod :
550 :S^^Calgary^Alberta^^(000)-278-4082^^__....g.._60^R:
600 :etail Business Systems Inc^^1200 736 6 Avenue SW^^:
650 :Calgary^Alberta^T2P 0T7^(000)-233-8705^^__....g.._:
700 :8^Digicore Software Systems Inc.^^Box 8656 Station:
750 : F^^Calgary^Alberta^T2J 5S4^(000)-251-7781^Y^_....:
800 :W.._74^Software Centre^^839 6 Avenue SW^^Calgary^A:
850 :lberta^T2P 0V3^(000)-269-6626^Y^__....c.._88^Bethe:
900 :l Software Inc.^^259 Edgeland Road N.W.^^Calgary^A:
950 :lberta^T3A 2Z2^(000)-239-2214^Y^__....k.._111^Ghos:
While the above is not XML at all, it's actually delimited in the same
fashion, but that 25-year-old mainframe uses custom delimiters (4 of them).
Once you know what the delimiters are, then it's pretty easy to parse out
the above data into records + fields. Excel or MS Access can't even BEGIN
to import the above, yet the data is not that hard to parse out.
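To give a rough idea of how little code that takes once the delimiters are
known, here is a minimal sketch. The byte values 0xFF and 0xFE are
assumptions standing in for the record and field delimiters (check the real
values in a hex editor), the function name is made up, and the sample is
built from the first two records of the dump above with the printable ^
swapped for the assumed field mark.

```python
# Assumed delimiter bytes; confirm the real values in a hex editor.
RECORD_MARK = b"\xff"  # assumption: separates records
FIELD_MARK = b"\xfe"   # assumption: separates fields (shown as ^ in the dump)


def parse_records(raw: bytes, record_mark: bytes = RECORD_MARK,
                  field_mark: bytes = FIELD_MARK) -> list:
    """Split raw bytes into records, then each record into text fields."""
    records = []
    for chunk in raw.split(record_mark):
        if not chunk:
            continue  # skip empty pieces between adjacent record marks
        fields = [f.decode("latin-1") for f in chunk.split(field_mark)]
        records.append(fields)
    return records


# Two records taken from the dump, with ^ replaced by the assumed field mark.
sample = RECORD_MARK.join(
    line.replace("^", "\xfe").encode("latin-1")
    for line in [
        "18^Engineered Systems^^2960 Lindsay Dr SW^^Calgary^Alberta"
        "^T3E 6A8^(000)-249-5752^Y",
        "132^Munro Engineering Inc.^^7 Floor 840 6 Avenue S.W.^^Calgary"
        "^Alberta^T2P 3E5^(000)-263-0070^Y",
    ]
)

for fields in parse_records(sample):
    print(fields[0], fields[1], fields[5])
```

Each record comes back as a plain list of fields, which is easy to write
out as CSV or load into a table from there.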
3) I would post some sample data here but it is confidential information
and I would be compromising my client giving anyone else this information
Ok, that makes sense. However, you don't have to post the actual data if it
is sensitive. You can simply enter some test data into that system, or
simply replace the data in your editor. You can replace all letters (alpha)
with the letter a, and all numbers with 0. In fact you could write a few
lines of code to do this replacing for you, and presto, you have some data
that shows the format, but does not compromise your client's information.
For example, here's the same data from above, and the only thing I've
replaced is the letters and numbers:
aaa: 0000 : 0 0000 0 0 ( 0AA0 : 0 0AAA 0 0 )
000 :....].._00^Aaaaaaaaaa Aaaaaaa^^0000 Aaaaaaa Aa AA^:
000 :^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^__....a.:
000 :._000^Aaaaa Aaaaaaaaaaa Aaa.^^0 Aaaaa 000 0 Aaaaaa:
000 : A.A.^^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^__:
000 :....a.._00^Aaaaaaaa Aaaaaaaaa Aaaaa^^0000 00 Aaaaa:
000 :a AA^^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^_..:
000 :..a.._000^Aaaaaaa Aaaaaaaaaaaa Aaa.^^000 0000 00 A:
000 :aaaaa A.A.^^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000:
000 :^A^_...._.._00^Aaaaaaaa Aaaaaaaa^^0000 Aaaaaaa Aaa:
000 :aa AA^^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^__:
000 :....A.._000^Aaaaaaaa Aaaa Aaa.^^000 00000 Aaaaaaa :
000 :A^^Aaaaaaa^Aaaaaaa^^(000)-000-0000^^__....a.._00^A:
000 :aaaaa Aaaaaaaa Aaaaaaa Aaa^^0000 000 0 Aaaaaa AA^^:
000 :Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^^__....a.._:
000 :0^Aaaaaaaa Aaaaaaaa Aaaaaaa Aaa.^^Aaa 0000 Aaaaaaa:
000 : A^^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^_....:
000 :A.._00^Aaaaaaaa Aaaaaa^^000 0 Aaaaaa AA^^Aaaaaaa^A:
000 :aaaaaa^A0A 0A0^(000)-000-0000^A^__....a.._00^Aaaaa:
000 :a Aaaaaaaa Aaa.^^000 Aaaaaaaa Aaaa A.A.^^Aaaaaaa^A:
000 :aaaaaa^A0A 0A0^(000)-000-0000^A^__....a.._000^Aaaa:
Once again, not a huge deal. And once again, posting this information might
not help anyone here.
However, it sure is a simple way to protect the data.
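The "few lines of code" for this replacing could look something like the
sketch below. The function name is made up, and it assumes the same
convention as the masked dump above: uppercase letters become A, lowercase
letters become a, digits become 0, and everything else (delimiters,
punctuation, spacing) is left alone.

```python
import re


def mask(text: str) -> str:
    """Replace letters and digits so only the layout of the data remains."""
    text = re.sub(r"[A-Z]", "A", text)  # uppercase letters -> A
    text = re.sub(r"[a-z]", "a", text)  # lowercase letters -> a
    return re.sub(r"[0-9]", "0", text)  # digits -> 0


# One line taken from the original dump.
line = "050 :^Calgary^Alberta^T3E 6A8^(000)-249-5752^Y^__....i.:"
print(mask(line))
# prints: 000 :^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^__....a.:
```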
Let's not use every little issue as a roadblock that holds back possible
solutions here. Let's do an "end run" around the problems we encounter and
find a solution! That's how this whole industry works! Most people will
say hey, we have a security issue, and I will say just replace the letters
and move onwards!
You mentioned hoping that somebody might come along and say, hey, I know
what that format is. Every small detail can help here.
4) it is data provided to my client via the internet, but the documents
are not xml or html.
OK, now we're starting to make some progress here. Now we are talking about
potential documents, and perhaps not information produced in a database
format? This kind of information makes it even more imperative to find out
how the data is created. Perhaps this data comes from some type of
document system?
You could save yourself an enormous amount of processing and deciphering
work if you ascertain what document system creates this stuff (and perhaps
it might not help at all).
Perhaps the software that makes this stuff has some kind of free reader, and
you would not need to write any custom code at all?
I most certainly apologize to you if I sound a little bit condescending, but
I guess I reacted that way because it seems so obvious to learn a little
bit more about the system that creates this information.
If at the end of the day you don't have the ability to get additional
information about this system, then you don't have that option. Without
that option, you'll have to try other approaches, and that very well seems
to be what you are doing now.
Sorry I could not be of more help here...