Need to split large text field

  • Thread starter Thread starter Calgarychris
  • Start date Start date
C

Calgarychris

Hi there,

I'm working in a spreadsheet that has a column of data that I need to split
apart. I was going to use text to columns, but unfortunately, the column is
missing delimiting characters that would allow me to split the data nicely.
Using find and replace to add in delimiting characters doesn't work as it
comes up with the error "formula too long". I tried to create a macro that
would add in the delimiting characters, but the results don't appear to be
consistent (although there is no error message). An example of the column is
here:

Lead Role: Bank of ABC(33.33%), DEF Bank(33.33%), Bank of GHI(33.33%)
Coordinator: Bank of GHI(33.33) Security agent: Bank of QRS(33.33) Facility
agent: TUV Bank(33.33)

The number of banks in each category is variable, as is the number of
categories (i.e. "titles") listed. I had thought if I could split the data
so that columns R-?? were lead banks (there would be lots of blanks) and then
columns XX-YY were Coordinator banks etc then I could manipulate the data as
required. There are probably 10-12 "titles"

As I also need to extract the percentage being used in each case, I was
thinking if the macro looped through each row putting that one column into an
array, checking for the various "titles" I needed and outputing the results
onto a seperate sheet, that would work best but I'm terribly unsure of array
code...

Does anyone have any idea of how I might tackle this problem?

Thanks very much!

Chris
 
Hi Chris,

I am not sure if the data example represents only one cell in your data. If
so, then you could use Text to Columns 3 times.

Ensure you have a backup of your data before attempting this.

Initially use Text to columns with right parenthesis bracket as the
delimiter. You will finish up with cells containing something like the
following but they will be across the page not one under the other:-

Lead Role: Bank of ABC(33.33%
, DEF Bank(33.33%
, Bank of GHI(33.33%
Coordinator: Bank of GHI(33.33
Security agent: Bank of QRS(33.33

Insert an additional column after each column of percentage data.

Use Text to columns again on each of the columns but this time use the left
bracket as the delimiter. You now have the percentage data separated into a
separate column something like this:-

Lead Role: Bank of ABC 33.33%

and repeated across the page for other records.

Insert additional columns again but this time to the left of each of the
percentage columns.

Use Text to columns again but this time use the colon as the delimiter and
repeat for each column still to be separated.

Select the data and using Find and Replace, replace the comma and space with
nothing. ie Insert a comma and a space in the Find field and leave the
replace field blank.

If I have not interpreted your request properly, then please provide an
example of how you want the data after it is split. I realize that you need
it across the page but list them one under the other so that it can be
correctly interpreted on the post.
 
I think it would help us, using the example text you posted, if you showed
us how the text would look after being split apart... that is, what parts of
the text go in which columns.
 
Hi OssieMac,

Thanks for the reply - unfortunately the brackets are only in some of the
cells and usually only for some of the "roles" or "titles". The data I
presented is one cell's worth - there's roughly 8K rows. In terms of laying
out the data afterwards, I had envisioned:

First Column
Lead Role: First Bank (30%), Second Bank (20%) etc.

Second Column:
Mandated Arranger: Second Bank, Third Bank, Fourth Bank, etc.

Third Column:
sub-underwriter: Eighth Bank, Ninth Bank etc

The number of columns would represent the total number of roles I have
(roughly 10 - 12 I believe just by eyeballing the data). Each row may or may
not have each role and each row would have a different number of banks in
each role...

I hope I'm clarifying this rather than confusing things even more...

Thanks again,
Chris
 
How does your "envisioned layout" coordinate with the text you posted
originally (you apparently changed terms)? What you need to keep in mind
when asking questions on newsgroups is that we know nothing about your
business model except what you tell us. That means you can't take for
granted that we understand the terms you use and how they relate to each
other. In your text example, you used the term "Coordinator" where I think
you are now using "Mandated Arranger"... does that mean the second term in
front of a colon is not fixed text? If that is the case, will it always be
one word or can it be one or more words? Same question for each term in
front of a colon. I'm assuming that the text does not have "hard returns"
(line feeds) between them, correct? Anything you can tell us about the
"pattern" of the text would be helpful.
 
Sorry Rick, obviously I'm a newb...

Each cell in this column has a variable number of titles (Lead Role,
Coordinator, Bookrunner, Facility agent, Lead Bank, Manager, Senior Manager,
etc etc etc) - they are not always listed in order, but are always spelt the
same. There are no carriage returns in the data

After each title there is a colon and then a list of banks, the names and
quantities of banks are both variable.

What I would like is to put the title and all the banks that fall into that
title into a column to the left of the original column.

So assuming the data is in Column A then column B would look like this:

<Title1>: Bank1, Bank2, Bank3

Column C

<Title2>: Bank1, Bank54, Bank23

Column D

<Title3>: Bank12, Bank20, Bank9

I don't know how to post a sample of the data, or I would...

Thanks
Chris
 
That can be done, although it does not seem to be an efficient way to parse the
data.

But, for the sake of your question, here is one way to split the original text
as you wish, and place the title and list of banks in each adjacent cell.

The routine depends CRITICALLY on the spelling of the various titles being the
same (as you wrote) and also on your having a *LIST* of what they are. (There
are ways to parse out the names without a list, provided they are always
preceded by something like (nn.nn%), but I will leave that for now; as I
suspect there will be changes in your specifications and requirements as we go
along.

In any event, the below macro will take your sample data and parse it out as
you have specified:

To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this Macro (Sub), first select the range of cells you wish to have split
up into adjacent columns. Then <alt-F8> opens the macro dialog box. Select the
macro by name, and <RUN>.

Make sure the line that starts aTitles = is all on one line. Most newsreaders
will split it over two lines.

====================================
Option Explicit
Sub BankInfo()
Dim c As Range, rg As Range
Const MaxNumTitles As Long = 12
Dim aTitles() As Variant
Dim SplitArray() As Variant
Dim aBanks As Variant
Dim re As Object, mc As Object, m As Object
Dim i As Long, j As Long, p As Long
Dim s As String

'note the line below needs to be all on one line
aTitles = VBA.Array("Lead Role", "Coordinator", "Bookrunner", "Facility
agent", "Lead Bank", "Manager", "Senior Manager", "Security Agent", "etc")

Set re = CreateObject("vbscript.regexp")
With re
.Global = True
.IgnoreCase = True
.Pattern = "\b(" & Join(aTitles, "|") & "):"
End With

Set rg = Selection 'or whatever
For Each c In rg
With c
Range(.Offset(0, 1), .Offset(0, MaxNumTitles)).Clear
s = .Value
Set mc = re.Execute(s)
ReDim SplitArray(0 To mc.Count)
i = 0: p = 0
For Each m In mc
SplitArray(i) = Mid(s, p + 1, m.FirstIndex - p)
p = m.FirstIndex + m.Length
i = i + 1
Next m
SplitArray(i) = Mid(s, p + 1)
For i = 1 To mc.Count
With .Offset(0, i)
.Value = mc(i - 1) & " " & SplitArray(i)
.Columns.AutoFit
End With
Next i
End With
Next c
Set re = Nothing
End Sub
=============================
--ron
 
The pattern from your first sample seems to be that between banks you have a
close parens and a comma ")," and inbetween banks and roles, you just have
close parens but no comma ") ", so that would be the key delimiter.

Also, Do you really want these in sequential columns, or would there be a
benefit to having all of the information for a particular job title in the
same column, e.g.:

A B C D
Source Bookrunner Lead Bank Manager
Etc
(long string) B1(x%),B2(x%) B1(x%),B4(x%)
(long string) B6(x%),B7(x%) B6(x%),B8(x%) B7(x%)

If you needed (or would benefit from) these arranged by job title, the
approach might be a little different.
 
If you do want to put each job into it's own column, here are a few formulas
to help you get started. Assumes your data starts in A2 (e.g. A2:A8000)
(1) Put the job titles (e.g., "Lead Role:", etc) in the column headers (B1,
C1, etc)
(2) Find the location that role occurs in your source string (put in cell B2
for now)
=FIND(B1,$A2)
(3) Find the location of the next break (put in cell B3 for now)
=FIND(") ",$A2,B2)
(4) put those strings together (put in cell B4 for now)
=MID($A2,B2,B3)
(5) To save space, combine all of this into one formula in B2:
=MID($A2,FIND(B1,$A2),FIND(") ",$A2,FIND(B1,$A2)))
now you can clear out the other cells we used above- those were just to show
you how it all works. You should be able to autofill this formula across your
entire worksheet.

Again, this puts each role in it's own column, so you would have blanks in
each row when a role isn't in the source data.

HTH,
Keith
 
Sorry Rick, obviously I'm a newb...

Each cell in this column has a variable number of titles (Lead Role,
Coordinator, Bookrunner, Facility agent, Lead Bank, Manager, Senior Manager,
etc etc etc) - they are not always listed in order, but are always spelt the
same. There are no carriage returns in the data

After each title there is a colon and then a list of banks, the names and
quantities of banks are both variable.

What I would like is to put the title and all the banks that fall into that
title into a column to the left of the original column.

So assuming the data is in Column A then column B would look like this:

<Title1>: Bank1, Bank2, Bank3

Column C

<Title2>: Bank1, Bank54, Bank23

Column D

<Title3>: Bank12, Bank20, Bank9

I don't know how to post a sample of the data, or I would...

Thanks
Chris


And here is another routine which does NOT require knowing the names of the
Titles, but does require that each title is preceded by a ")" as in your
examples:

=====================
Option Explicit
Sub BankInfo()
Dim c As Range, rg As Range
Const MaxNumTitles As Long = 12
Dim re As Object, mc As Object, m As Object
Dim i As Long, j As Long
Dim s As String

Set rg = Selection
Set re = CreateObject("vbscript.regexp")
With re
.Global = True
.Pattern = "\b([^):]+:)([\s\S]*?)(?=\b([^):]+:)|$)"
End With

For Each c In rg
With c
s = .Value
If re.Test(s) = True Then
Set mc = re.Execute(s)
i = 0
For Each m In mc
With .Offset(0, i + 1)
.Value = m.SubMatches(0) & " " & _
m.SubMatches(1)
.Columns.AutoFit
End With
i = i + 1
Next m
End If
End With
Next c

Set re = Nothing
End Sub
======================================

--ron
 
Ron,

Thanks very much, this code (whatever it's doing!) works beautifully. I
amended the MaxNumTitles to 14 and added a few titles to the list, but other
than that it produces exactly what I was looking for (any surprise is because
you were able to follow my convoluted question, not the fact that your code
works!). My co-worker tells me that although there are now multiple banks
listed after each title, that we can split that data apart...we'll see :)

Thank you again!
Chris
 
Hi Ker_01,

Unfortunately the brackets around % only appears in some of the cells, not
all. I think what Rick wrote in his previous post has it licked though...

Thanks!

Chris
 
Ron,

Thanks very much, this code (whatever it's doing!) works beautifully. I
amended the MaxNumTitles to 14 and added a few titles to the list, but other
than that it produces exactly what I was looking for (any surprise is because
you were able to follow my convoluted question, not the fact that your code
works!). My co-worker tells me that although there are now multiple banks
listed after each title, that we can split that data apart...we'll see :)

Thank you again!
Chris

I'm glad it's working for you.

Thanks for the feedback.

The code could certainly be modified to split the individual banks (and even
the percentages after), but one would need to know more about the set up of the
input and desired output in order to do that effectively.

Perhaps you might even want to have a separate worksheet for each item.

In any event, I was glad to help. It was an interesting exercise.
--ron
 
Back
Top