Need help with parsing a multilined log file into objects

Paulers

Hello,

I have a log file that contains many multi-line messages. What is the
best approach for extracting data out of each message and populating
object properties, with the objects stored in an ArrayList? I have tried
looping through the log file using regex, if statements and flags to
find the start and end of each message, but I do not see a good point in
this process to create a new instance of my Message object. While
messing around with it I tried creating the new instance in different
places in the loops, but when I try to populate it I can access the
object from within some if statements but not from others. I am wondering
if there is a better approach, perhaps one that is more abstract and
takes advantage of OOP. Has anyone done something like this before? If
so, I'd love to hear about your approach.

thanks
 
Do you have a known delimiter? How do you determine one row from another?
 
What is the message delimiter and what is the field delimiter?

In addition, please post a sample of the logfile and the definition of the
target object.
 
The type of messages I want to extract from the log look like this
(type 101's), but not all messages are in the same format. I would like
to extract the values, populate an object with matching properties for
each message, and store the objects in a collection that I can iterate
through. I just do not know how to get the values for each of these type
101 messages into their own objects.

03:34:06 server12 Trace: [ 384]abox1->bbox1: Control Message (=
Message Type 101); Message Length 921 bytes
Get Call (= Subtype 9); DialogueID: (2007) 000007d7;
SendSeqNo: (1)00000001
Trunk Group ID: (1) 00000001
Trunk Number: 3
Service ID: (0) 00000000
Dialed Number: INTELLI_CARE
ANI: 1111111111
Called Number: 8000025225
DNIS: 8000025225
 
To recap:

Does the data in the log file look EXACTLY like that?

Is there actually a newline between 'Control Message (=' and 'Message
Type'?

Does the timestamp ('03:34:06') mark the start of the message?

Does the timestamp ('03:34:06') mark the start of EVERY message?

Is the timestamp ALWAYS in the SAME format?

Please post the definition of the target object.


 
Yes, I pasted this straight from the log file. All messages begin with a
timestamp. The timestamp is always in the same format:
(\d{2}:\d{2}:\d{2})

I'm sorry, I don't understand what you mean by the definition of the target
object. I basically have a class with properties (setters and getters) that I
would like to populate for each message. Is that what you would like to
see?

thanks for your help! :)
 
I would do something like the following.



1) Read through the entire log file and look for the start
of each section (each group of lines that starts with your time stamp).

2) Read the lines of each section into an array list
(ArrayList1).

3) Add each individual array list to another array list,
ArrayList2 (you now have an array list of array lists).

4) Iterate through each element of ArrayList2, grab the
ArrayList1 it holds, and pass that to another function that parses the data
from that one segment.

5) Each segment can be gone through line by line and
taken apart into its individual fields using a mixture of String.Split,
String.Substring, etc.

6) This parsing function would return a new instance of
your data structure with all fields populated as needed.
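
For what it's worth, here is a rough sketch of those steps. The thread never shows any code, so this is in Python purely for brevity; the same shape should carry over to ArrayList and String.Split. The names `split_into_sections` and `parse_section` are made up for illustration, a plain dict stands in for your own class, and the second message in the sample is invented so there is more than one section to split.

```python
import re

# Assumption from the thread: every message starts with an HH:MM:SS timestamp.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}\b")

def split_into_sections(lines):
    """Steps 1-3: build a list of lists, one inner list per message."""
    sections = []      # plays the role of ArrayList2
    current = None     # plays the role of ArrayList1
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line):
            current = [line]          # a timestamp line opens a new section
            sections.append(current)
        elif current is not None and line.strip():
            current.append(line)      # any other line belongs to the open section
    return sections

def parse_section(section):
    """Steps 5-6: return a NEW record for one section (dict as a stand-in)."""
    record = {"header": section[0]}
    for line in section[1:]:
        label, sep, value = line.partition(":")
        if sep:                       # skip lines that have no "label: value" shape
            record[label.strip()] = value.strip()
    return record

sample = [
    "03:34:06 server12 Trace: [ 384]abox1->bbox1: Control Message (=",
    "Message Type 101); Message Length 921 bytes",
    "Trunk Number: 3",
    "ANI: 1111111111",
    "03:34:07 server12 Trace: another entry (invented for this example)",
    "DNIS: 8000025225",
]

# Step 4: walk the outer list and hand each inner list to the parser.
messages = [parse_section(section) for section in split_into_sections(sample)]
```

Because the new record is created inside `parse_section`, there is exactly one obvious place where each object comes to life, which sidesteps the question of where in the loop to instantiate it.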



Keep in mind that this is simpler as long as each line in each section is
always in the same format and always in the same order. If they are not in
the same order (I tend to think they would be, since this appears to be a CDR
log from a PBX), you would perhaps need to use a regex to match each
line up with the right parsing code.
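
As a sketch of the regex idea, assuming the lines of one message have already been collected: each rule below pairs a pattern with the property it fills, so the order of the lines stops mattering. Python is used only for brevity. The `Message` class, its field names and the `FIELD_RULES` table are hypothetical stand-ins for your own class, and only a handful of the fields from the posted sample are covered.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    """Hypothetical stand-in for the poster's own class."""
    timestamp: str = ""
    message_type: Optional[int] = None
    trunk_number: Optional[int] = None
    dialed_number: str = ""
    ani: str = ""
    dnis: str = ""

# One (pattern, attribute, converter) rule per field; line order is irrelevant.
FIELD_RULES = [
    (re.compile(r"^(\d{2}:\d{2}:\d{2})"), "timestamp", str),
    (re.compile(r"Message Type (\d+)"), "message_type", int),
    (re.compile(r"^Trunk Number:\s*(\d+)"), "trunk_number", int),
    (re.compile(r"^Dialed Number:\s*(\S+)"), "dialed_number", str),
    (re.compile(r"^ANI:\s*(\d+)"), "ani", str),
    (re.compile(r"^DNIS:\s*(\d+)"), "dnis", str),
]

def parse_message(lines):
    """Return a new Message filled from whichever rules match each line."""
    msg = Message()
    for line in lines:
        for pattern, attr, convert in FIELD_RULES:
            match = pattern.search(line)
            if match:
                setattr(msg, attr, convert(match.group(1)))
    return msg

# The sample message exactly as posted in the thread.
lines = [
    "03:34:06 server12 Trace: [ 384]abox1->bbox1: Control Message (=",
    "Message Type 101); Message Length 921 bytes",
    "Get Call (= Subtype 9); DialogueID: (2007) 000007d7;",
    "SendSeqNo: (1)00000001",
    "Trunk Group ID: (1) 00000001",
    "Trunk Number: 3",
    "Service ID: (0) 00000000",
    "Dialed Number: INTELLI_CARE",
    "ANI: 1111111111",
    "Called Number: 8000025225",
    "DNIS: 8000025225",
]
msg = parse_message(lines)
```

Lines that match no rule are simply ignored, so messages in other formats would leave the corresponding properties at their defaults rather than raising an error.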



All this is not hard, just tricky and takes some work.



Good luck. I had to do some of this just a short while ago and it was very
interesting.



 
Yes! - Any instance of that class is your target object.


 
Paulers,

How many people need this information each day? In my opinion, that should
be the basis of your decision.

Whether it is OOP, Poop or whatever is less important.

Just my thought,

Cor


 
Thanks for the wonderful advice, I really appreciate it. I was wondering
if you had any pointers on how to extract just the lines of the
messages that I need so I can add them to the ArrayList. I can loop
through the file and grab all the first lines with the timestamp using a
regular expression, but what is the thought process behind obtaining the
rest of the lines of the message I am parsing without grabbing lines of
the next message?

thanks!
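
One way to think about it, sketched in Python only for brevity: since every message starts with a timestamp, the next timestamp line is what ends the current message. Keep one "current block" variable declared before the loop so every branch can see it, start a fresh block on each timestamp line, and append every other line to whatever block is open. The names `type_101_blocks` and `is_type_101` are made up for this sketch, the middle entry in the sample log is invented, and the type check joins the first two lines on the assumption that the header really does wrap as posted.

```python
import re

# Assumption from the thread: every message starts with an HH:MM:SS timestamp.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}")

def is_type_101(block):
    # Join the first two lines so the check works whether or not the header wraps.
    return "Message Type 101)" in " ".join(block[:2])

def type_101_blocks(lines):
    """Yield the lines of each Type 101 message, one list per message."""
    current = None  # declared outside the loop so every branch can reach it
    for line in lines:
        if TIMESTAMP.match(line):
            # A new timestamp closes the previous message and opens the next.
            if current is not None and is_type_101(current):
                yield current
            current = [line]
        elif current is not None:
            current.append(line)
    # The last message has no following timestamp to close it.
    if current is not None and is_type_101(current):
        yield current

log = [
    "03:34:06 server12 Trace: [ 384]abox1->bbox1: Control Message (=",
    "Message Type 101); Message Length 921 bytes",
    "ANI: 1111111111",
    "03:34:07 server12 Trace: some other kind of entry (invented)",
    "03:34:08 server12 Trace: [ 384]abox1->bbox1: Control Message (=",
    "Message Type 101); Message Length 921 bytes",
    "DNIS: 8000025225",
]
blocks = list(type_101_blocks(log))
```

Each yielded block can then be handed to whatever routine builds one object from one message, so lines from the next message never leak into the current one.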

 