Is this possible with regular expressions?

Kyle

Hello, I am trying to replace all occurrences of (Pattern1) NOT
immediately followed by (Pattern2) with ReplaceText. Is there any way
this can be done with regular expressions? I can't figure it out.

What I've got is a log file where the first characters of every row are
a datetime stamp in a fixed format. However, each line of the log file
contains SQL, and every now and then the SQL itself contains a line
break, which makes programmatically loading the log file into a
database difficult.

So, I'd like to replace all instances of \r\n not immediately followed
by a timestamp with a space.

I can replace all {\r\n followed by a timestamp} with {some other
special character followed by the timestamp}, and then replace all
{\r\n} with {space}, and then replace {special character} with {\r\n}.
But this is wildly inefficient considering these files have millions of
rows and I'd be replacing millions of instances then dozens of
instances then millions of instances, whereas I'd like to just do a
single replacement of dozens of instances.
 
I believe you are looking in the wrong place.

Let me explain.

It appears that you are trying to load a log file into a database table, but
you are having problems because you have line breaks inside the data of the
fields which make it difficult to tell where a LOG LINE ends.

If my assumption is correct, then I would suggest you look into ways of
recognizing a LINE rather than trying to recognize the line terminating
characters.

Think about it: it seems to me that the pattern you are looking for would be
something that describes a line, such as:

TimeStamp<ETC ETC ETC>\r\n

As you can see, ETC may include anything (even linebreaks).

So come up with a regexp that matches that and then play with the multiline
option.
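J.R.'s idea of matching whole records rather than line terminators could be sketched as follows. This is an illustration under assumptions, not J.R.'s code: the timestamp format (yyyy-MM-dd HH:mm:ss) and all names are hypothetical. Multiline makes ^ match at the start of every line, Singleline lets .*? run across embedded line breaks, and the lazy .*? stops at the first CRLF that is followed by another timestamp (or at the end of the text).

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RecordMatcher
    ' Hypothetical timestamp format; replace with the log's real fixed format.
    Private Const TimeStampPattern As String = "\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

    ' One record = a timestamp at the start of a line, then anything (including
    ' line breaks, because of Singleline), matched lazily up to the next CRLF
    ' that is followed by a timestamp, or up to an optional final CRLF at the end.
    Private ReadOnly RecordRegex As New Regex( _
        "^" & TimeStampPattern & ".*?(?=\r\n" & TimeStampPattern & "|(?:\r\n)?\z)", _
        RegexOptions.Multiline Or RegexOptions.Singleline)

    ' Returns each record with its embedded CRLFs replaced by spaces.
    Public Function GetRecords(ByVal text As String) As List(Of String)
        Dim records As New List(Of String)
        For Each m As Match In RecordRegex.Matches(text)
            records.Add(m.Value.Replace(vbCrLf, " "))
        Next
        Return records
    End Function
End Module
```

Because a record can only start with a timestamp at the beginning of a line, and can only end at a CRLF followed by a timestamp, timestamps that appear in the middle of the SQL text do not split a record. A continuation line that itself happens to begin with a timestamp still would.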

I hope that helps.

Regards,

J.R.
 
Thanks. There are timestamps within the lines, so searching for
timestamp{etc}\r\n is not adequate; it really needs to be \r\n{not
timestamp}.

But, due to issues with loading such large files entirely into
memory and operating on an enormous string, I've had to process the
file line by line anyway. This is disappointing from a performance
perspective, but it makes determining whether a line is valid, or an
artificial line break that is really part of the previous line, quite
easy, so the problem is solved.

Thanks for the assistance.
 
There's probably a RegEx that will do the replace for you, but I guess
a more immediate solution could be some simple parsing, something
along the lines of:

<aircode>
Sub SaveLog(ByVal SourceFile As String, _
            ByVal DestFile As String)
    Dim S As New System.Text.StringBuilder
    Dim Out As New System.Text.StringBuilder

    For Each Line As String _
        In System.IO.File.ReadAllLines(SourceFile)

        'The IsTimeStamp function must be coded by you, of course
        If IsTimeStamp(Line) Then

            'A new record starts: flush the previous one, if any
            If S.Length > 0 Then
                Out.AppendLine(S.ToString)
                S = New System.Text.StringBuilder
            End If

        ElseIf S.Length > 0 Then
            'Continuation line: the line break becomes a space
            S.Append(" "c)
        End If
        S.Append(Line)
    Next

    'Flush the last record (skipped for an empty file)
    If S.Length > 0 Then Out.AppendLine(S.ToString)
    System.IO.File.WriteAllText(DestFile, Out.ToString)

End Sub
</aircode>

Of course, you may want to read the text a line at a time from a
TextReader instead of reading all of the source file at once.
Likewise, you may prefer writing the lines one at a time to a
TextWriter instead of accumulating all of them in the Out StringBuilder.
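A streaming variant along those lines might look like the sketch below. It is not Branco's code: it takes a TextReader and TextWriter so the same logic works with files (StreamReader/StreamWriter) or in-memory strings, and the IsTimeStamp shown here is a hypothetical implementation that assumes every record starts with a yyyy-MM-dd HH:mm:ss stamp.

```vbnet
Imports System.Globalization
Imports System.IO
Imports System.Text

Module StreamingLogJoiner
    ' Hypothetical: assumes each record begins with "yyyy-MM-dd HH:mm:ss" (19 chars).
    Public Function IsTimeStamp(ByVal line As String) As Boolean
        Dim stamp As DateTime
        Return line.Length >= 19 AndAlso _
            DateTime.TryParseExact(line.Substring(0, 19), "yyyy-MM-dd HH:mm:ss", _
                CultureInfo.InvariantCulture, DateTimeStyles.None, stamp)
    End Function

    ' Reads one physical line at a time and writes one logical record per line,
    ' so only the current record is held in memory.
    Public Sub JoinRecords(ByVal reader As TextReader, ByVal writer As TextWriter)
        Dim record As New StringBuilder()
        Dim line As String = reader.ReadLine()
        Do While line IsNot Nothing
            If IsTimeStamp(line) Then
                ' A new record starts: flush the previous one, if any.
                If record.Length > 0 Then
                    writer.WriteLine(record.ToString())
                    record.Length = 0
                End If
            ElseIf record.Length > 0 Then
                ' Continuation of the current record: the line break becomes a space.
                record.Append(" "c)
            End If
            record.Append(line)
            line = reader.ReadLine()
        Loop
        ' Flush the last record.
        If record.Length > 0 Then writer.WriteLine(record.ToString())
    End Sub

    ' File-based wrapper with the same parameters as SaveLog.
    Public Sub SaveLog(ByVal sourceFile As String, ByVal destFile As String)
        Using reader As New StreamReader(sourceFile), writer As New StreamWriter(destFile)
            JoinRecords(reader, writer)
        End Using
    End Sub
End Module
```

The output line terminator is whatever the writer's NewLine property is set to (Environment.NewLine by default), so set writer.NewLine = vbCrLf if the database loader expects CRLF regardless of platform.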

HTH.

Regards,

Branco.
 