Need help finding software or add on to import strange files

  • Thread starter: notyelf

notyelf

Hello all!

I have a client I just started working for a few weeks ago. He has been
through a few programmers on the project I am working on, so inevitably there
are some oddities I need to work out.

One of those 'oddities' is the software or add-on used to import their flat
files. I have no idea what file types they are, but they do not open properly
in normal text programs or Excel. They come to me as .C08, .E08, .H08, .R08,
.W08, and .DTA.

They have also been presented to me as .RAC, .CLS, .ENT, .WOR, .HOR. As far
as I can tell, the first letters correspond to the previous 5 file formats.
It looks to me like some sort of internal file extension naming system, but I
am unable to find the program that allows me to use these.

Any thoughts as to what these are or how to find out?
 
I assume that the previous developers and IT people have been dealing with
these files for years?

Or, is it just an amazing coincidence that all these weird file extensions
appeared out of nowhere the day you landed on their doorstep?

I guess it seems kind of strange that you are asking here about these files,
but have not asked the people giving you these files what kind of software
or what system they came from. I'm really tempted here to make some kind of
condescending remark about how the company seems to be upholding its
tradition of hiring the wrong people who ask the wrong questions, but let's
just not go there...

In a nutshell, the file extensions can often help you determine what the
actual data inside of the files looks like. However, changing a file
extension obviously does nothing to change the contents inside of those
files.

The first thing I would do is ask what system or what software produced
those files. While SQL Server produces some files with extensions, no one
would try to read those files directly. It is a trivial and easy matter to
connect to SQL Server to read that data (in other words, you don't try to
read the files directly, but you interface to the software that produces
those files, e.g. SQL Server, SAP, whatever). And, more often than not, you
can use that software system, such as SQL Server, to export the data into a
standardized format such as comma delimited or some such text type file
that other applications can easily consume.

Leaving aside where the files came from, the first thing I would do is
simply try to open up one of the files using Notepad, and examine the data
inside. In fact, I'm surprised you have not even posted a few lines of the
data from one of those files, so at least we could try to "guess" what the
data formats look like.

What does the data look like inside of these files? Is it delimited, is it
HTML, is it XML? Are there tabs in it? Is it fixed length? Is it 100% random?

Perhaps it's possible that with a few lines of code you could roll your own
import? It is quite hard to tell right now how difficult of a task has
landed on your lap without at least a little bit more information on your
part.
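
As a rough illustration of the questions above, here is a minimal Python
sketch that summarizes the first few KB of a file: how much of it is
printable text, whether it starts like markup, whether every line has the
same length, and how often some common delimiters occur. The function names
and the list of candidate delimiters are assumptions made for this example,
not anything taken from the files in question.

```python
from collections import Counter

# Candidate delimiters to count; this list is an assumption for illustration.
CANDIDATE_DELIMITERS = [b",", b"\t", b"|", b";"]


def describe_sample(data: bytes) -> dict:
    """Summarize a chunk of raw bytes to hint at the file's layout."""
    # Printable ASCII plus tab, line feed, and carriage return.
    printable = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 13))
    lines = data.splitlines()
    line_lengths = Counter(len(line) for line in lines)
    return {
        "size": len(data),
        "printable_ratio": printable / len(data) if data else 0.0,
        "looks_like_markup": data.lstrip().startswith(b"<"),
        "line_count": len(lines),
        # Every line having the same length suggests fixed-length records.
        "fixed_length": len(lines) > 1 and len(line_lengths) == 1,
        "delimiter_counts": {
            d.decode(): data.count(d) for d in CANDIDATE_DELIMITERS
        },
    }


def describe_file(path: str, limit: int = 4096) -> dict:
    """Read up to `limit` bytes from the file at `path` and summarize them."""
    with open(path, "rb") as handle:
        return describe_sample(handle.read(limit))
```

A low printable ratio would suggest a binary format, in which case a text
editor or Excel was never going to show anything useful.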
 
Albert,

While I agree I should provide more information, I am thinking if you are
having a bad day, perhaps you should take your frustrations to a professional
instead of venting negative and condescending comments that are not useful to
anyone, on a public forum.

1) The client does not know what the software is, as the client is not very
savvy when it comes to computers. He is not sure, and is not in contact with
the previous programmers that

2) The file is not delimited, nor does it easily delimit in notepad,
wordpad, or textpad. Nor does it work very well using Excel's text to columns
feature, or Excel's import text feature.

3) I would post some sample data here, but it is confidential information
and I would be compromising my client by giving anyone else this information

4) it is data provided to my client via the internet, but the documents are
not xml or html.

5) The Access application that I am programming in has several hundred
pages of code, several of which are used in importing these special files.
None of it, however, is explicitly explained in the code, so it would be far
simpler to understand the software being used than to spend days
restructuring the pages of code to work through a simpler text file format.
Not to mention the days it would take to understand exactly how the data is
delimited, since a standard comma, carriage return, or quotation mark
delimiter does not work.

6) I am perfectly aware that it is very likely that the file extensions are
a result of an application's internal naming structure. This is why I asked
the question here, to see if anyone knew by any chance what software
application uses this naming structure, so I might be able to locate the
software and use it! I even said so in my first post...

Hope that helps anyone trying to figure this out with me =)
 
On the whole, Albert was kind enough. His frustrations were more to
do with the fact that he would like to help but that you didn't give
him much to go on. He judged with incomplete information (his
judgement was based on the assumption that your situation was like so
many others...). As to your own personal comment to Albert, you might
want to apologize. You were just as quick to judge with incomplete
information. Albert is an excellent source of information in these
newsgroups. Bear in mind that the unpaid volunteers who provide the
peer support in these Access newsgroups usually confine themselves to
things to do with issues *in Access*. While ExternalData is the
subject of this group, we're not responsible for knowing all of the
file formats and extensions used throughout the world. When exotic
file extensions or formats are in play, the questions have to
come with those extensions and/or formats defined.

Given what you had to say in your last post I suggest that you analyze
the *pages of code* that you have on hand. If it is uncommented then
now is the best time there will ever be to comment it. You will
benefit and so will any succeeding developers. By understanding the
code you will develop an understanding of those exotic formats and
understand the extension coding scheme. It's a terrible way to have
to get the information but that's all you've got unless you can get
the last developer who really understood it all to give you a hand.
I've had many situations like that. Tell the customer what the issues
are and that it will take longer to provide a solution than either of
you thought it would. He'll want to know how long and you'll
just have to tell him that it's impossible to know until you are well
into the project. Don't commit to a deadline that you don't know you
can meet. Better to lose the job than to come to work each day filled
with worry and your gut churning with the knowledge that the schedule
is slipping away.

Post back with specific questions. Hope that Albert will be one of
the responders.

Oh, a great tool to help you document the code is MZTools, available
for free at a site of the same name.

HTH
 
1) The client does not know what the software is, as the client is not very
savvy when it comes to computers. He is not sure, and is not in contact with
the previous programmers that

Yes, and I am assuming that this is where you are supposed to come in then,
is it not?

It just seems to me an inquiry about these files might help you.

Is this a DOS computer, a Windows box, a Linux box? A handheld PDA? Perhaps
some kind of telephone hardware system that spits out data? An old
mainframe?

You're asking for professional help here, and yet the steps being taken to
ascertain the system that produces the information are lacking. This info
could very well help you folks eliminate a lot of your data processing and
deciphering of these files.

2) The file is not delimited, nor does it easily delimit in notepad,
wordpad, or textpad. Nor does it work very well using Excel's text to
columns feature, or Excel's import text feature.

Well, I get all kinds of data that Excel can't handle, but when you have
knowledge about the system that produces the information, what seems
insanely complex can actually become quite simple.

And, are you sure about delimiters? I have to assume that you've opened the
files up using a hex editor, and looked for things like repeating chr(8),
chr(9), etc.?
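
If no hex editor is handy, the same check can be approximated in a few lines
of Python. This is only a sketch: the function names are made up for the
example. It counts the non-printable byte values in a chunk of data (a byte
that repeats at regular intervals is a good delimiter candidate) and prints
the data in a hex-editor style layout.

```python
from collections import Counter


def control_byte_counts(data: bytes) -> list[tuple[int, int]]:
    """Return (byte value, count) pairs for non-printable bytes.

    The most frequent byte values come first.
    """
    counts = Counter(b for b in data if b < 32 or b > 126)
    return counts.most_common()


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable text."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show non-printable bytes as dots, as most hex editors do.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:06X}  {hex_part:<{width * 3}} {text}")
    return "\n".join(rows)
```

For example, `control_byte_counts(open(path, "rb").read(4096))` on one of
the files would list which control characters occur and how often, where
`path` stands for whichever file is being examined.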

For example, here is a dump of a file I get from a mainframe. While it
looks extremely difficult to parse out, if you know details about the Pick
operating system, then deciphering the following data actually becomes
quite easy:


fid: 8100 : 0 8108 0 0 ( 1FA4 : 0 1FAC 0 0 )
000 :....].._18^Engineered Systems^^2960 Lindsay Dr SW^:
050 :^Calgary^Alberta^T3E 6A8^(000)-249-5752^Y^__....i.:
100 :._132^Munro Engineering Inc.^^7 Floor 840 6 Avenue:
150 : S.W.^^Calgary^Alberta^T2P 3E5^(000)-263-0070^Y^__:
200 :....a.._32^Computer Modelling Group^^3512 33 Stree:
250 :t NW^^Calgary^Alberta^T2L 2A6^(000)-282-9286^Y^_..:
300 :..i.._146^Prophet Technologies Ltd.^^115 1144 29 A:
350 :venue N.E.^^Calgary^Alberta^T2E 7P1^(000)-291-9526:
400 :^Y^_...._.._46^Keentech Software^^8340 Addison Dri:
450 :ve SE^^Calgary^Alberta^T2H 1P1^(000)-259-5676^Y^__:
500 :....W.._160^Software Plus Inc.^^600 11012 Macleod :
550 :S^^Calgary^Alberta^^(000)-278-4082^^__....g.._60^R:
600 :etail Business Systems Inc^^1200 736 6 Avenue SW^^:
650 :Calgary^Alberta^T2P 0T7^(000)-233-8705^^__....g.._:
700 :8^Digicore Software Systems Inc.^^Box 8656 Station:
750 : F^^Calgary^Alberta^T2J 5S4^(000)-251-7781^Y^_....:
800 :W.._74^Software Centre^^839 6 Avenue SW^^Calgary^A:
850 :lberta^T2P 0V3^(000)-269-6626^Y^__....c.._88^Bethe:
900 :l Software Inc.^^259 Edgeland Road N.W.^^Calgary^A:
950 :lberta^T3A 2Z2^(000)-239-2214^Y^__....k.._111^Ghos:


While the above is not XML at all, it's actually delimited in the same
fashion, but that 25-year-old mainframe uses custom delimiters (4 of them).

Once you know what the delimiters are, then it's pretty easy to parse out
the above data into records + fields. Excel or MS Access can't even BEGIN to
import the above, yet the data is not that hard to parse out.
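
To show what "records + fields" parsing with custom delimiters can look
like, here is a minimal Python sketch. The three delimiter byte values and
the function name are assumptions chosen for illustration. They are not
taken from the dump above, and the real values depend entirely on the system
that wrote the file.

```python
# Hypothetical delimiter bytes; replace with whatever the real system uses.
RECORD_MARK = b"\xff"  # assumed record terminator
FIELD_MARK = b"\xfe"   # assumed field separator
VALUE_MARK = b"\xfd"   # assumed separator for multiple values in one field


def parse_records(data: bytes, encoding: str = "latin-1") -> list:
    """Split raw bytes into records, fields, and values.

    Returns a list of records; each record is a list of fields; each field
    is a list of one or more string values.
    """
    records = []
    for raw_record in data.split(RECORD_MARK):
        if not raw_record:
            continue  # skip empty chunks, e.g. after a trailing record mark
        fields = [
            [value.decode(encoding) for value in raw_field.split(VALUE_MARK)]
            for raw_field in raw_record.split(FIELD_MARK)
        ]
        records.append(fields)
    return records
```

Once the data is in nested lists like this, writing it out as ordinary comma
delimited text that Access can import is a small extra step.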

3) I would post some sample data here, but it is confidential information
and I would be compromising my client by giving anyone else this information

OK, that makes sense. However, you don't have to post the actual data if it
is sensitive. You can simply enter some test data into that system, or
simply replace the data in your editor. You can replace all letters (alpha)
with the letter a, and all numbers with 0. In fact, you could write a few
lines of code to do this replacing for you, and presto, you have some data
that shows the format, but does not compromise your client's information.
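
Those "few lines of code" could look something like the following Python
sketch. The function name is made up for the example. It keeps upper and
lower case apart (A versus a) and only touches plain ASCII letters and
digits, on the assumption that punctuation and any unusual delimiter
characters should survive so the layout stays visible.

```python
import string


def mask_text(text: str) -> str:
    """Replace ASCII letters with A/a and digits with 0; keep the rest."""
    masked = []
    for ch in text:
        if ch in string.digits:
            masked.append("0")
        elif ch in string.ascii_uppercase:
            masked.append("A")
        elif ch in string.ascii_lowercase:
            masked.append("a")
        else:
            # Punctuation, whitespace, and non-ASCII characters (possible
            # delimiters) are left untouched.
            masked.append(ch)
    return "".join(masked)
```

Even masked data can leak a little (field lengths, record counts), so it is
worth a quick look before posting.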

For example, here's the same data from above, and the only things I've
replaced are the letters and numbers:


aaa: 0000 : 0 0000 0 0 ( 0AA0 : 0 0AAA 0 0 )
000 :....].._00^Aaaaaaaaaa Aaaaaaa^^0000 Aaaaaaa Aa AA^:
000 :^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^__....a.:
000 :._000^Aaaaa Aaaaaaaaaaa Aaa.^^0 Aaaaa 000 0 Aaaaaa:
000 : A.A.^^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^__:
000 :....a.._00^Aaaaaaaa Aaaaaaaaa Aaaaa^^0000 00 Aaaaa:
000 :a AA^^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^_..:
000 :..a.._000^Aaaaaaa Aaaaaaaaaaaa Aaa.^^000 0000 00 A:
000 :aaaaa A.A.^^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000:
000 :^A^_...._.._00^Aaaaaaaa Aaaaaaaa^^0000 Aaaaaaa Aaa:
000 :aa AA^^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^__:
000 :....A.._000^Aaaaaaaa Aaaa Aaa.^^000 00000 Aaaaaaa :
000 :A^^Aaaaaaa^Aaaaaaa^^(000)-000-0000^^__....a.._00^A:
000 :aaaaa Aaaaaaaa Aaaaaaa Aaa^^0000 000 0 Aaaaaa AA^^:
000 :Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^^__....a.._:
000 :0^Aaaaaaaa Aaaaaaaa Aaaaaaa Aaa.^^Aaa 0000 Aaaaaaa:
000 : A^^Aaaaaaa^Aaaaaaa^A0A 0A0^(000)-000-0000^A^_....:
000 :A.._00^Aaaaaaaa Aaaaaa^^000 0 Aaaaaa AA^^Aaaaaaa^A:
000 :aaaaaa^A0A 0A0^(000)-000-0000^A^__....a.._00^Aaaaa:
000 :a Aaaaaaaa Aaa.^^000 Aaaaaaaa Aaaa A.A.^^Aaaaaaa^A:
000 :aaaaaa^A0A 0A0^(000)-000-0000^A^__....a.._000^Aaaa:

Once again, not a huge deal. And once again, posting this information might
not help anyone here.

However, it sure is a simple way to protect the data.

Let's not use every little issue as a roadblock that holds back possible
solutions here. Let's do an "end run" around the problems we encounter and
find a solution! That's how this whole industry works! Most people will
say, hey, we have a security issue, and I will say just replace the letters
and move onwards!

You mentioned hoping that somebody might come along and say, hey, I know
what that format is. Every small detail can help here.

4) it is data provided to my client via the internet, but the documents are
not xml or html.

OK, now we're starting to get some progress here. Now we are talking about
potential documents, and perhaps not information produced in a database
format? This kind of information makes it even more imperative to find out
how the data is created. Perhaps this data comes from some type of
document system?

You could save yourself an enormous amount of processing and deciphering
work if you ascertain what document system creates this stuff (and perhaps
it might not help at all).

Perhaps the software that makes this stuff has some kind of free reader, and
you would not need to write any custom code at all?

I most certainly apologize to you if I sounded a little bit condescending,
but I guess I reacted that way because it seems so obvious to learn a little
bit more about the system that creates this information.

If at the end of the day you don't have the ability to get additional
information about this system, then you don't have that option. Without
that option, you'll have to try other approaches, and that may very well be
what you are doing now.

Sorry I could not be of more help here...
 