recommendation for parsing string

  • Thread starter: awrightus

awrightus

Not looking for exact syntax, just a recommendation. Using VB Express
2008. I have a string that's X characters in length (perhaps
2000-4000 characters or so, essentially sentences). I need to insert
a carriage return (vbCrLf) at no more than the 69th character. If the
69th character is a space, then that's the simplest scenario. But if
the 69th character is in the middle of a word, I need to search back
until I find a space and then insert the vbCrLf. I know how to use
substring() to search through the string for spaces, but I'm getting a
little lost on an approach to subsequently inject that carriage return
at every subsequent 69th character (or backing up when it's in the
middle of a word). Just wondering if someone could give me a
suggestion as to the functions I need to be looking at. Thanks for
any insight.
 
Do you want to split the whole string to pieces or just the first 69
characters and ignore the rest?

-Teemu
 
No, I need to break up the entire string.

Steve
 
You could use a regular expression to split the string into lines, then
use a StringBuilder to put them together with line breaks between them:

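// Needs: using System.Text; using System.Text.RegularExpressions;
MatchCollection lines = Regex.Matches(text, @"(.{1,69})(?: |$)");
StringBuilder builder = new StringBuilder();
// Groups[1] is the line without the space that the match ends on
foreach (Match line in lines) builder.AppendLine(line.Groups[1].Value);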
 
That's a very cool solution. Here is the same code in VB:

Dim lines As MatchCollection = Regex.Matches(TextToSplit, _
    "(.{1,69})(?: |$)")
Dim builder As New StringBuilder()
For Each k As Match In lines
    ' Groups(1) leaves out the space that the match ends on
    builder.AppendLine(k.Groups(1).Value)
Next

I wrote some code but it was over 20 lines so I won't post it here. :-)

-Teemu
 
That regular expression couldn't handle situations when there were no spaces
so maybe I can post my code.

Dim TextToSplit As String = _
    My.Computer.FileSystem.ReadAllText("c:\example.txt")
' Flatten existing line breaks: drop LF, turn CR into a space
TextToSplit = TextToSplit.Replace(Chr(10), "")
TextToSplit = TextToSplit.Replace(Chr(13), " ")
Dim Previous As Integer = 0
Dim x As String = ""

' i is the index just past the longest line allowed (Previous + 69)
Dim i As Integer = 69
While i < TextToSplit.Length

    ' Character is space
    If TextToSplit(i) = " "c Then
        x = TextToSplit.Substring(Previous, i - Previous)
        ListBox1.Items.Add(x.Length.ToString + " " + x)
        Previous = i + 1
    Else

        Dim OriginalI As Integer = i
        While True

            i = i - 1

            ' No space found
            If i = Previous Then
                i = OriginalI
                x = TextToSplit.Substring(Previous, i - Previous)
                ListBox1.Items.Add(x.Length.ToString + " " + x)
                Previous = i
                Exit While
            End If

            ' Found space
            If TextToSplit(i) = " "c Then
                x = TextToSplit.Substring(Previous, i - Previous)
                ListBox1.Items.Add(x.Length.ToString + " " + x)
                Previous = i + 1
                Exit While
            End If
        End While
    End If

    ' Measure the next line from where this one ended
    i = Previous + 69
End While

' Whatever is left is the last line
x = TextToSplit.Substring(Previous)
ListBox1.Items.Add(x.Length.ToString + " " + x)

-Teemu
 
I threw this together. Doesn't parse sentences exactly, but separates words...



Public Class Form1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim Words As String = "Now is the time for all good men to come to the aid of their party. "
        Words += Words
        Words += Words
        ' Split on spaces, skipping the empty entry left by the trailing space
        Dim Word() As String = Words.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        Dim Sentence As String = ""
        Dim ctr As Integer = 0 ' length of the line currently being built
        For Each w As String In Word
            If ctr > 0 AndAlso ctr + 1 + w.Length > 69 Then
                ' The word does not fit on this line: start a new one
                Sentence += vbCrLf
                ctr = 0
            ElseIf ctr > 0 Then
                ' The word fits: separate it from the previous word
                Sentence += " "
                ctr += 1
            End If
            ' A single word longer than 69 characters is left unbroken
            Sentence += w
            ctr += w.Length
        Next
        MsgBox(Sentence)
    End Sub
End Class


- David
 
You probably will need Substring() to get chunks of the string and
then look for the space with the LastIndexOf() function.

I know you said you don't need code, but I couldn't resist the fun.
Therefore, here follows a possible solution, which accepts the text to
be broken up, the max line size and a list of acceptable delimiters.
It returns a list of lines, which can then be turned into an array and
joined with String.Join(), using ControlChars.CrLf as "glue".

In your case, it would be called like this:

Dim L As IList(Of String) = Wordwrap(Sample, 69, new Char(){" "c})
Dim R As String = String.Join(ControlChars.CrLf, L.ToArray)

<example>
' Needs Imports System.Collections.Generic and System.Linq (for ToArray)
Function Wordwrap( _
    Text As String, _
    Size As Integer, _
    Delimiters As IList(Of Char) _
) As IList(Of String)

    Dim Delims() As Char = Delimiters.ToArray
    Dim Result As New List(Of String)

    ' Past this position the rest of the text fits on one line
    Dim Max As Integer = Text.Length - Size
    Dim Pos As Integer = 0
    Do While Pos < Max
        Dim Chunk As String = Text.Substring(Pos, Size)
        If Chunk.Length = Size Then
            Dim Break As Integer = Chunk.LastIndexOfAny(Delims) + 1
            If Break > 0 AndAlso Break < Size Then
                'Found one of the delimiters: cut right after it,
                'so the delimiter stays at the end of this line
                Chunk = Chunk.Remove(Break)
            End If
        End If
        Result.Add(Chunk)
        Pos += Chunk.Length
    Loop
    If Pos < Text.Length Then Result.Add(Text.Substring(Pos))
    Return Result
End Function
</example>

Hope this helps.

Regards,

Branco.
 
A string is immutable, so every insert into it creates a new string,
and that takes time.

I would simply take the substrings in a loop and build the long string
with a StringBuilder.

At the end, simply: Dim x = mystringbuilder.ToString

That would most probably give the best performance and the lowest
memory use.
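
A minimal sketch of that StringBuilder loop, for illustration only.
WrapWithBuilder and its parameter names are made up for this example, and
it assumes the text has no line breaks of its own and that a word longer
than the limit may be cut in the middle:

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module WrapSketch

    ' Hypothetical helper (name and signature are assumptions for this sketch).
    ' Breaks Text into lines of at most MaxWidth characters, preferring spaces.
    Function WrapWithBuilder(ByVal Text As String, ByVal MaxWidth As Integer) As String
        If MaxWidth < 1 Then Throw New ArgumentOutOfRangeException("MaxWidth")

        Dim builder As New StringBuilder(Text.Length)
        Dim pos As Integer = 0

        Do While Text.Length - pos > MaxWidth
            ' Search backward from index pos + MaxWidth down to pos, so a space
            ' sitting right after a full-width line still counts as a break.
            Dim breakAt As Integer = Text.LastIndexOf(" "c, pos + MaxWidth, MaxWidth + 1)

            If breakAt > pos Then
                ' Copy up to the space, then skip over it
                builder.Append(Text, pos, breakAt - pos)
                pos = breakAt + 1
            Else
                ' No usable space: hard break in the middle of the word
                builder.Append(Text, pos, MaxWidth)
                pos += MaxWidth
            End If
            builder.Append(ControlChars.CrLf)
        Loop

        ' Whatever is left fits on the last line
        builder.Append(Text, pos, Text.Length - pos)
        Return builder.ToString()
    End Function

End Module
```

It could be called as Dim x As String = WrapWithBuilder(mystring, 69).
Each piece is appended straight from the original string, so no
intermediate substrings are created along the way.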

Cor
 
Teemu said:
That regular expression couldn't handle situations when there were no
spaces

That is true.

This regular expression does:

"(.{1,69})(?: |$)|([^ ]{69})"
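
A sketch of how that pattern could be used from VB, along the lines of
Teemu's translation earlier in the thread. WrapWithRegex is a made-up
name, and the sketch assumes existing line breaks have already been
replaced, since "." does not match a line feed:

```vbnet
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexWrapSketch

    ' Hypothetical helper (name is an assumption for this sketch).
    Function WrapWithRegex(ByVal Text As String) As String
        ' Group 1: up to 69 characters followed by a space or the end of the text.
        ' Group 2: fallback, 69 characters in a row with no space among them.
        Dim pattern As String = "(.{1,69})(?: |$)|([^ ]{69})"
        Dim builder As New StringBuilder()

        For Each m As Match In Regex.Matches(Text, pattern)
            ' Put a line break between lines, but not before the first one
            If builder.Length > 0 Then builder.Append(ControlChars.CrLf)

            If m.Groups(1).Success Then
                ' Normal case: the line without the space it broke on
                builder.Append(m.Groups(1).Value)
            Else
                ' Long run without spaces: take the 69-character slice as is
                builder.Append(m.Groups(2).Value)
            End If
        Next

        Return builder.ToString()
    End Function

End Module
```

Reading the groups instead of the whole match keeps the trailing space
out of each line.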
 