VBA for document formatting - HELP.

  • Thread starter Thread starter JRC
  • Start date Start date
J

JRC

Dear all:

I am currently working on a rather large document that I need to
format a certain specific way. The document includes nearly 57,000
lines of plain text that need to be indented according to a simple
rule so that it can then be imported into an application as a tab-
delimited file.

My system is a Mac G5 running the latest version of Leopard (not Snow
Leopard) which is 10.5.8 and I have Microsoft Office 2004 installed
(haven't upgraded to 2008 and will probably wait until next version to
upgrade).

Below is an example of the type of file I now have:

Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249

This is just a small portion of the file I am working with. This is a
"flat" text file but what I need is a "hierarchical" or "tree
structure" file where each of these lines is indented with a number
of
"tab keystrokes" corresponding to its level as depicted by the
alphanumeric code following the term. Example:

Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249

As you will notice each line has been indented using the tab key a
number of times corresponding to the size of the alphanumeric code.
Lines with an alphanumeric code containing only 3 characters stay on
the left most position (no tabs). Example:
Body Regions;A01

Lines with an alphanumeric code containing 7 characters (example:
A01.047) are indented with one single tab keystroke. Example:
Abdomen;A01.047

Lines with an alphanumeric code containing 11 characters (example:
A01.047.025) are indented with two tab keystrokes. Example:
Abdominal Cavity;A01.047.025

Lines with an alphanumeric code containing 15 characters (example:
A01.047.025.600) are indented with three tab keystrokes. Example:
Peritoneum;A01.047.025.600

And so on, and so forth.

Once this step is finished and the file has been edited from a flat
text to one with a hierarchical or tree structure using tab entries
(to derive a tab-delimited file) the second part of the project
involves the removal of certain characters and replacement with
others. Basically I need the format to go from the existing:

Abdominal Cavity;A01.047.025

Where the term is followed by a semicolon and alphanumeric code.

To the following:

Abdominal Cavity [A01.047.025]

Where a space is added after the term, the semicolon is removed and
brackets are placed at the beginning and end of the alphanumeric
code.

Considering the size of my document as I stated (little less than
57,000 entries) I would like to find a way to automate the process.

How can these two processes be accomplished with the use of a macro ?

I have never worked with macros before and any help would be extremely
valuable and appreciated. I have both Excel 04 and Word 04 as part of
the Office 2004 package in case the information is relevant. I am not
sure which of the applications would be better suited for this task
but am open to your suggestions.

Also I would appreciate suggestions on textbooks that would cover VBA
for Excel and Word. I would like to find books that would cover
everything from A to Z but with an approach that makes it easy for a
beginner to understand and work with VBA on Microsoft Office for the
Mac. Any
recommendations will be appreciated.

Thank you in advance for your help.


JRC
 
JRC,

I have assumed that your document is a text file. (If it isn't the code can be corrected to account
for that.)

The macro below will read in a file and add the tabs and add the spaces and brackets as you
described.

Run the macro, select the file with the data, and the macro will put the new file named "Output.txt"
into the same folder.

HTH,
Bernie
MS Excel MVP

Sub AddTabsToFile()
Dim FileName As String
Dim FileNumIn As Integer
Dim FileNumOut As Integer
Dim ResultStr As String
Dim myStr As String
Dim myS As Variant
Dim i As Integer

FileName = Application.GetOpenFilename
If FileName = "" Then End

FileNumIn = FreeFile()
Open FileName For Input As FileNumIn

FileNumOut = FreeFile()
Open "Output.txt" For Output Access Write As FileNumOut

Do While Seek(FileNumIn) <= LOF(FileNumIn)
'Store One Line Of Text From File To Variable
Line Input #FileNumIn, ResultStr
myStr = ""
myS = Split(ResultStr, ";")
For i = 1 To (Len(myS(1)) - 3) / 4
myStr = myStr & vbTab
Next i
myStr = myStr & myS(0) & " [" & myS(1) & "]"
Print #FileNumOut, myStr
Loop

Close FileNumIn
Close FileNumOut
End Sub



JRC said:
Dear all:

I am currently working on a rather large document that I need to
format a certain specific way. The document includes nearly 57,000
lines of plain text that need to be indented according to a simple
rule so that it can then be imported into an application as a tab-
delimited file.

My system is a Mac G5 running the latest version of Leopard (not Snow
Leopard) which is 10.5.8 and I have Microsoft Office 2004 installed
(haven't upgraded to 2008 and will probably wait until next version to
upgrade).

Below is an example of the type of file I now have:

Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249

This is just a small portion of the file I am working with. This is a
"flat" text file but what I need is a "hierarchical" or "tree
structure" file where each of these lines is indented with a number
of
"tab keystrokes" corresponding to its level as depicted by the
alphanumeric code following the term. Example:

Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249

As you will notice each line has been indented using the tab key a
number of times corresponding to the size of the alphanumeric code.
Lines with an alphanumeric code containing only 3 characters stay on
the left most position (no tabs). Example:
Body Regions;A01

Lines with an alphanumeric code containing 7 characters (example:
A01.047) are indented with one single tab keystroke. Example:
Abdomen;A01.047

Lines with an alphanumeric code containing 11 characters (example:
A01.047.025) are indented with two tab keystrokes. Example:
Abdominal Cavity;A01.047.025

Lines with an alphanumeric code containing 15 characters (example:
A01.047.025.600) are indented with three tab keystrokes. Example:
Peritoneum;A01.047.025.600

And so on, and so forth.

Once this step is finished and the file has been edited from a flat
text to one with a hierarchical or tree structure using tab entries
(to derive a tab-delimited file) the second part of the project
involves the removal of certain characters and replacement with
others. Basically I need the format to go from the existing:

Abdominal Cavity;A01.047.025

Where the term is followed by a semicolon and alphanumeric code.

To the following:

Abdominal Cavity [A01.047.025]

Where a space is added after the term, the semicolon is removed and
brackets are placed at the beginning and end of the alphanumeric
code.

Considering the size of my document as I stated (little less than
57,000 entries) I would like to find a way to automate the process.

How can these two processes be accomplished with the use of a macro ?

I have never worked with macros before and any help would be extremely
valuable and appreciated. I have both Excel 04 and Word 04 as part of
the Office 2004 package in case the information is relevant. I am not
sure which of the applications would be better suited for this task
but am open to your suggestions.

Also I would appreciate suggestions on textbooks that would cover VBA
for Excel and Word. I would like to find books that would cover
everything from A to Z but with an approach that makes it easy for a
beginner to understand and work with VBA on Microsoft Office for the
Mac. Any
recommendations will be appreciated.

Thank you in advance for your help.


JRC
 
Assuming data is in column A and starts at row 1:
Sub main()
Dim lastRow As Long
lastRow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
For i = 1 To lastRow
Dim strData As String
strData = Sheet1.Cells(i, 1)
Dim strEnding As String
strEnding = Mid(strData, InStr(1, strData, ";") + 1, Len(strData) - InStr(1,
strData, ";"))
For j = 1 To (Len(strEnding) / 4) + 0.25
strData = " " & strData
Next
Sheet1.Cells(i, 2) = Split(strData, ";")(0) & " [" & strEnding & "]"
Next
End Sub

As far as books go, they are great for reference but i have never read one
that did much for training. Coming on here and reading questions and trying
to build answers is about the best you can get.
--
-John http://www.jmbundy.blogspot.com/
Please rate when your question is answered to help us and others know what
is helpful.


JRC said:
Dear all:

I am currently working on a rather large document that I need to
format a certain specific way. The document includes nearly 57,000
lines of plain text that need to be indented according to a simple
rule so that it can then be imported into an application as a tab-
delimited file.

My system is a Mac G5 running the latest version of Leopard (not Snow
Leopard) which is 10.5.8 and I have Microsoft Office 2004 installed
(haven't upgraded to 2008 and will probably wait until next version to
upgrade).

Below is an example of the type of file I now have:

Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249

This is just a small portion of the file I am working with. This is a
"flat" text file but what I need is a "hierarchical" or "tree
structure" file where each of these lines is indented with a number
of
"tab keystrokes" corresponding to its level as depicted by the
alphanumeric code following the term. Example:

Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249

As you will notice each line has been indented using the tab key a
number of times corresponding to the size of the alphanumeric code.
Lines with an alphanumeric code containing only 3 characters stay on
the left most position (no tabs). Example:
Body Regions;A01

Lines with an alphanumeric code containing 7 characters (example:
A01.047) are indented with one single tab keystroke. Example:
Abdomen;A01.047

Lines with an alphanumeric code containing 11 characters (example:
A01.047.025) are indented with two tab keystrokes. Example:
Abdominal Cavity;A01.047.025

Lines with an alphanumeric code containing 15 characters (example:
A01.047.025.600) are indented with three tab keystrokes. Example:
Peritoneum;A01.047.025.600

And so on, and so forth.

Once this step is finished and the file has been edited from a flat
text to one with a hierarchical or tree structure using tab entries
(to derive a tab-delimited file) the second part of the project
involves the removal of certain characters and replacement with
others. Basically I need the format to go from the existing:

Abdominal Cavity;A01.047.025

Where the term is followed by a semicolon and alphanumeric code.

To the following:

Abdominal Cavity [A01.047.025]

Where a space is added after the term, the semicolon is removed and
brackets are placed at the beginning and end of the alphanumeric
code.

Considering the size of my document as I stated (little less than
57,000 entries) I would like to find a way to automate the process.

How can these two processes be accomplished with the use of a macro ?

I have never worked with macros before and any help would be extremely
valuable and appreciated. I have both Excel 04 and Word 04 as part of
the Office 2004 package in case the information is relevant. I am not
sure which of the applications would be better suited for this task
but am open to your suggestions.

Also I would appreciate suggestions on textbooks that would cover VBA
for Excel and Word. I would like to find books that would cover
everything from A to Z but with an approach that makes it easy for a
beginner to understand and work with VBA on Microsoft Office for the
Mac. Any
recommendations will be appreciated.

Thank you in advance for your help.


JRC
.
 
I think Bernie has provided an excellent code solution for you, tighter than
the one I wrote and was going to present, and gets the job done very well.
But I'll put mine up later as an alternative for 2 reasons: it shows a second
solution for the same problem, and also echos the output to the text file on
an Excel worksheet so you can get an idea of what was actually written (tabs
show up as little squares in the worksheet).

One good book for starting to learn VBA is
Visual Basic Programming for the Absolute Beginner, which you can see on
Amazon at
http://www.amazon.com/Visual-Basic-Programming-Absolute-Beginner/dp/0761535535#noop

I have also tried to put together an introductory booklet for those trying
to learn VBA for Excel, and you're welcome to see if it might not add a
little to other books you find on the subject:
http://www.jlathamsite.com/Teach/VBA/ProgrammingInExcelVBA_AnIntroduction.pdf

OK, the code I promised you:

Sub ReadTextFile()
'reads .txt file and places it into an
'Excel worksheet and writes it to a
'tab delimited file named "tdfFile.csv"
'in the same folder with the source .txt file
'
Const semicolon = ";"
Const tabIndicator = "."

Dim sourceFile As String ' the .txt file to be read
Dim buffNum As Integer
Dim rawData As String ' one line of data from the .txt file
Dim splitText As Variant
Dim tabCount As Integer
Dim LC As Integer
Dim rowCount As Long
Dim outputText As String

Dim delimitedFile As String
Dim tdfBuffNum As Integer

sourceFile = Application.GetOpenFilename
If sourceFile = "False" Then
Exit Sub ' user cancelled
End If
'set up the output file
tdfBuffNum = FreeFile()
delimitedFile = "tdfFile.csv"
delimitedFile = Left(sourceFile, InStrRev(sourceFile, _
Application.PathSeparator)) & delimitedFile
Open delimitedFile For Output As #tdfBuffNum
'clear contents of active sheet just so we
'can show what we're doing on it
ActiveSheet.Cells.ClearContents
Cells(1, 1) = "Output written to: " & delimitedFile
rowCount = 2 ' initialize
'start reading the source file
buffNum = FreeFile()
Open sourceFile For Input As #buffNum
Do While Not EOF(buffNum)
Line Input #buffNum, rawData
'line must have a semicolon in it to use
If InStr(rawData, semicolon) > 0 Then
splitText = Split(rawData, semicolon)
'number of tabs needed depends on
'number of periods in the code part.
'Bernie's method is faster
tabCount = 0
For LC = 1 To Len(splitText(1))
If Mid(splitText(1), LC, 1) = tabIndicator Then
tabCount = tabCount + 1
End If
Next
outputText = String(tabCount, vbTab) & _
splitText(0) & " [" & splitText(1) & "]"
Print #tdfBuffNum, outputText
'echo to worksheet
Cells(rowCount, 1) = outputText
rowCount = rowCount + 1
End If
Loop
Close #buffNum
Close #tdfBuffNum
End Sub





JRC said:
Dear all:

I am currently working on a rather large document that I need to
format a certain specific way. The document includes nearly 57,000
lines of plain text that need to be indented according to a simple
rule so that it can then be imported into an application as a tab-
delimited file.

My system is a Mac G5 running the latest version of Leopard (not Snow
Leopard) which is 10.5.8 and I have Microsoft Office 2004 installed
(haven't upgraded to 2008 and will probably wait until next version to
upgrade).

Below is an example of the type of file I now have:

Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249

This is just a small portion of the file I am working with. This is a
"flat" text file but what I need is a "hierarchical" or "tree
structure" file where each of these lines is indented with a number
of
"tab keystrokes" corresponding to its level as depicted by the
alphanumeric code following the term. Example:

Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249

As you will notice each line has been indented using the tab key a
number of times corresponding to the size of the alphanumeric code.
Lines with an alphanumeric code containing only 3 characters stay on
the left most position (no tabs). Example:
Body Regions;A01

Lines with an alphanumeric code containing 7 characters (example:
A01.047) are indented with one single tab keystroke. Example:
Abdomen;A01.047

Lines with an alphanumeric code containing 11 characters (example:
A01.047.025) are indented with two tab keystrokes. Example:
Abdominal Cavity;A01.047.025

Lines with an alphanumeric code containing 15 characters (example:
A01.047.025.600) are indented with three tab keystrokes. Example:
Peritoneum;A01.047.025.600

And so on, and so forth.

Once this step is finished and the file has been edited from a flat
text to one with a hierarchical or tree structure using tab entries
(to derive a tab-delimited file) the second part of the project
involves the removal of certain characters and replacement with
others. Basically I need the format to go from the existing:

Abdominal Cavity;A01.047.025

Where the term is followed by a semicolon and alphanumeric code.

To the following:

Abdominal Cavity [A01.047.025]

Where a space is added after the term, the semicolon is removed and
brackets are placed at the beginning and end of the alphanumeric
code.

Considering the size of my document as I stated (little less than
57,000 entries) I would like to find a way to automate the process.

How can these two processes be accomplished with the use of a macro ?

I have never worked with macros before and any help would be extremely
valuable and appreciated. I have both Excel 04 and Word 04 as part of
the Office 2004 package in case the information is relevant. I am not
sure which of the applications would be better suited for this task
but am open to your suggestions.

Also I would appreciate suggestions on textbooks that would cover VBA
for Excel and Word. I would like to find books that would cover
everything from A to Z but with an approach that makes it easy for a
beginner to understand and work with VBA on Microsoft Office for the
Mac. Any
recommendations will be appreciated.

Thank you in advance for your help.


JRC
.
 
JRC,

I have assumed that your document is a text file.  (If it isn't the code can be corrected to account
for that.)

The macro below will read in a file and add the tabs and add the spaces and brackets as you
described.

Run the macro, select the file with the data, and the macro will put the new file named "Output.txt"
into the same folder.

HTH,
Bernie
MS Excel MVP

Sub AddTabsToFile()
Dim FileName As String
Dim FileNumIn As Integer
Dim FileNumOut As Integer
Dim ResultStr As String
Dim myStr As String
Dim myS As Variant
Dim i As Integer

FileName = Application.GetOpenFilename
If FileName = "" Then End

FileNumIn = FreeFile()
Open FileName For Input As FileNumIn

FileNumOut = FreeFile()
Open "Output.txt" For Output Access Write As FileNumOut

Do While Seek(FileNumIn) <= LOF(FileNumIn)
   'Store One Line Of Text From File To Variable
   Line Input #FileNumIn, ResultStr
   myStr = ""
   myS = Split(ResultStr, ";")
   For i = 1 To (Len(myS(1)) - 3) / 4
       myStr = myStr & vbTab
   Next i
   myStr = myStr & myS(0) & " [" & myS(1) & "]"
   Print #FileNumOut, myStr
Loop

Close FileNumIn
Close FileNumOut
End Sub




Dear all:
I am currently working on a rather large document that I need to
format a certain specific way. The document includes nearly 57,000
lines of plain text that need to be indented according to a simple
rule so that it can then be imported into an application as a tab-
delimited file.
My system is a Mac G5 running the latest version of Leopard (not Snow
Leopard) which is 10.5.8 and I have Microsoft Office 2004 installed
(haven't upgraded to 2008 and will probably wait until next version to
upgrade).
Below is an example of the type of file I now have:
Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249
This is just a small portion of the file I am working with. This is a
"flat" text file but what I need is a "hierarchical" or "tree
structure" file where each of these lines is indented with a number
of
"tab keystrokes" corresponding to its level as depicted by the
alphanumeric code following the term. Example:
Body Regions;A01
    Abdomen;A01.047
         Abdominal Cavity;A01.047.025
              Peritoneum;A01.047.025.600
                   Douglas' Pouch;A01.047.025.600.225
                   Mesentery;A01.047.025.600.451
                        Mesocolon;A01.047.025.600.451.535
                   Omentum;A01.047.025.600.573
                        Peritoneal Cavity;A01.047.025.600.678
                   Peritoneal Stomata;A01.047.025.600.700
              Retroperitoneal Space;A01.047.025.750
         Abdominal Wall;A01.047.050
         Groin;A01.047.365
         Inguinal Canal;A01.047.412
         Umbilicus;A01.047.849
    Back;A01.176
         Lumbosacral Region;A01.176.519
         Sacrococcygeal Region;A01.176.780
    Breast;A01.236
         Mammary Glands, Human;A01.236.249
As you will notice each line has been indented using the tab key a
number of times corresponding to the size of the alphanumeric code.
Lines with an alphanumeric code containing only 3 characters stay on
the left most position (no tabs). Example:
Body Regions;A01
Lines with an alphanumeric code containing 7 characters (example:
A01.047) are indented with one single tab keystroke. Example:
    Abdomen;A01.047
Lines with an alphanumeric code containing 11 characters (example:
A01.047.025) are indented with two tab keystrokes. Example:
         Abdominal Cavity;A01.047.025
Lines with an alphanumeric code containing 15 characters (example:
A01.047.025.600) are indented with three tab keystrokes. Example:
              Peritoneum;A01.047.025.600
And so on, and so forth.
Once this step is finished and the file has been edited from a flat
text to one with a hierarchical or tree structure using tab entries
(to derive a tab-delimited file) the second part of the project
involves the removal of certain characters and replacement with
others. Basically I need the format to go from the existing:
Abdominal Cavity;A01.047.025
Where the term is followed by a semicolon and alphanumeric code.
To the following:
Abdominal Cavity [A01.047.025]
Where a space is added after the term, the semicolon is removed and
brackets are placed at the beginning and end of the alphanumeric
code.
Considering the size of my document as I stated (little less than
57,000 entries) I would like to find a way to automate the process.
How can these two processes be accomplished with the use of a macro ?
I have never worked with macros before and any help would be extremely
valuable and appreciated. I have both Excel 04 and Word 04 as part of
the Office 2004 package in case the information is relevant. I am not
sure which of the applications would be better suited for this task
but am open to your suggestions.
Also I would appreciate suggestions on textbooks that would cover VBA
for Excel and Word. I would like to find books that would cover
everything  from A to Z but with an approach that makes it easy for a
beginner to understand and work with VBA on Microsoft Office for the
Mac. Any
recommendations will be appreciated.
Thank you in advance for your help.

Hello, Bernie.

Thank you very much for your help and the really detailed post. I have
tried copying and pasting your code into Excel but have not been
successful. As soon as I try to run it Excel gives me a error message
stating there is something wrong with the syntax and highlighting the
first line in yellow. I am not at all experienced with VBA so I am
finding it difficult to locate the reason for the problem.

Do you have any idea as to what is causing the problem ? Just in case
this is helpful I tried the subroutines from John and JLatham but both
caused the same or a similar type of error.

Thank you again,


JRC
 
Assuming data is in column A and starts at row 1:
Sub main()
Dim lastRow As Long
lastRow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
For i = 1 To lastRow
Dim strData As String
strData = Sheet1.Cells(i, 1)
Dim strEnding As String
strEnding = Mid(strData, InStr(1, strData, ";") + 1, Len(strData) - InStr(1,
strData, ";"))
For j = 1 To (Len(strEnding) / 4) + 0.25
strData = "     " & strData
Next
Sheet1.Cells(i, 2) = Split(strData, ";")(0) & " [" & strEnding & "]"
Next
End Sub

As far as books go, they are great for reference but i have never read one
that did much for training. Coming on here and reading questions and trying
to build answers is about the best you can get.
--
-Johnhttp://www.jmbundy.blogspot.com/
Please rate when your question is answered to help us and others know what
is helpful.



JRC said:
Dear all:
I am currently working on a rather large document that I need to
format a certain specific way. The document includes nearly 57,000
lines of plain text that need to be indented according to a simple
rule so that it can then be imported into an application as a tab-
delimited file.
My system is a Mac G5 running the latest version of Leopard (not Snow
Leopard) which is 10.5.8 and I have Microsoft Office 2004 installed
(haven't upgraded to 2008 and will probably wait until next version to
upgrade).
Below is an example of the type of file I now have:
Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249
This is just a small portion of the file I am working with. This is a
"flat" text file but what I need is a "hierarchical" or "tree
structure" file where each of these lines is indented with a number
of
"tab keystrokes" corresponding to its level as depicted by the
alphanumeric code following the term. Example:
Body Regions;A01
     Abdomen;A01.047
          Abdominal Cavity;A01.047.025
               Peritoneum;A01.047.025.600
                    Douglas' Pouch;A01.047.025.600.225
                    Mesentery;A01.047.025.600.451
                         Mesocolon;A01.047.025.600.451.535
                    Omentum;A01.047.025.600.573
                         Peritoneal Cavity;A01.047.025.600.678
                    Peritoneal Stomata;A01.047.025.600.700
               Retroperitoneal Space;A01.047.025.750
          Abdominal Wall;A01.047.050
          Groin;A01.047.365
          Inguinal Canal;A01.047.412
          Umbilicus;A01.047.849
     Back;A01.176
          Lumbosacral Region;A01.176.519
          Sacrococcygeal Region;A01.176.780
     Breast;A01.236
          Mammary Glands, Human;A01.236.249
As you will notice each line has been indented using the tab key a
number of times corresponding to the size of the alphanumeric code.
Lines with an alphanumeric code containing only 3 characters stay on
the left most position (no tabs). Example:
Body Regions;A01
Lines with an alphanumeric code containing 7 characters (example:
A01.047) are indented with one single tab keystroke. Example:
     Abdomen;A01.047
Lines with an alphanumeric code containing 11 characters (example:
A01.047.025) are indented with two tab keystrokes. Example:
          Abdominal Cavity;A01.047.025
Lines with an alphanumeric code containing 15 characters (example:
A01.047.025.600) are indented with three tab keystrokes. Example:
               Peritoneum;A01.047.025.600
And so on, and so forth.
Once this step is finished and the file has been edited from a flat
text to one with a hierarchical or tree structure using tab entries
(to derive a tab-delimited file) the second part of the project
involves the removal of certain characters and replacement with
others. Basically I need the format to go from the existing:
Abdominal Cavity;A01.047.025
Where the term is followed by a semicolon and alphanumeric code.
To the following:
Abdominal Cavity [A01.047.025]
Where a space is added after the term, the semicolon is removed and
brackets are placed at the beginning and end of the alphanumeric
code.
Considering the size of my document as I stated (little less than
57,000 entries) I would like to find a way to automate the process.
How can these two processes be accomplished with the use of a macro ?
I have never worked with macros before and any help would be extremely
valuable and appreciated. I have both Excel 04 and Word 04 as part of
the Office 2004 package in case the information is relevant. I am not
sure which of the applications would be better suited for this task
but am open to your suggestions.
Also I would appreciate suggestions on textbooks that would cover VBA
for Excel and Word. I would like to find books that would cover
everything  from A to Z but with an approach that makes it easy for a
beginner to understand and work with VBA on Microsoft Office for the
Mac. Any
recommendations will be appreciated.
Thank you in advance for your help.

Hello, John.


Thank you very much for your help and really helpful suggestions. As I
mentioned on my post to Bernie I tried the subroutines you posted but
kept getting an error message. I am not at all experienced with VBA
and this is the reason I want to get a book to at least learn how to
do the basics.

As included in my original post I am running Excel 2004 on the Mac and
I am sure that there must be differences between this version and the
Windows versions. Having said that, what version of Windows is the
closest to the Mac version I am running ? I have tried to find a book
at Barnes and Noble that would cover the basics of VBA on the Mac but
haven't found a single one. An Amazon search revealed similar
statistics. I have only found books covering either 2003 or 2007 Excel
with VBA. Which of these two versions of Excel is the closest to the
one I am running ?

Thank you again for your really helpful reply and help.

JRC
 
I think Bernie has provided an excellent code solution for you, tighter than
the one I wrote and was going to present, and gets the job done very well..  
But I'll put mine up later as an alternative for 2 reasons: it shows a second
solution for the same problem, and also echos the output to the text fileon
an Excel worksheet so you can get an idea of what was actually written (tabs
show up as little squares in the worksheet).

One good book for starting to learn VBA is
Visual Basic Programming for the Absolute Beginner, which you can see on
Amazon athttp://www.amazon.com/Visual-Basic-Programming-Absolute-Beginner/dp/0...

I have also tried to put together an introductory booklet for those trying
to learn VBA for Excel, and you're welcome to see if it might not add a
little to other books you find on the subject:http://www.jlathamsite.com/Teach/VBA/ProgrammingInExcelVBA_AnIntroduc...

OK, the code I promised you:

Sub ReadTextFile()
'reads .txt file and places it into an
'Excel worksheet and writes it to a
'tab delimited file named "tdfFile.csv"
'in the same folder with the source .txt file
'
  Const semicolon = ";"
  Const tabIndicator = "."

  Dim sourceFile As String ' the .txt file to be read
  Dim buffNum As Integer
  Dim rawData As String ' one line of data from the .txt file
  Dim splitText As Variant
  Dim tabCount As Integer
  Dim LC As Integer
  Dim rowCount As Long
  Dim outputText As String

  Dim delimitedFile As String
  Dim tdfBuffNum As Integer

  sourceFile = Application.GetOpenFilename
  If sourceFile = "False" Then
    Exit Sub ' user cancelled
  End If
  'set up the output file
  tdfBuffNum = FreeFile()
  delimitedFile = "tdfFile.csv"
  delimitedFile = Left(sourceFile, InStrRev(sourceFile, _
   Application.PathSeparator)) & delimitedFile
  Open delimitedFile For Output As #tdfBuffNum
  'clear contents of active sheet just so we
  'can show what we're doing on it
  ActiveSheet.Cells.ClearContents
  Cells(1, 1) = "Output written to: " & delimitedFile
  rowCount = 2 ' initialize
  'start reading the source file
  buffNum = FreeFile()
  Open sourceFile For Input As #buffNum
  Do While Not EOF(buffNum)
    Line Input #buffNum, rawData
    'line must have a semicolon in it to use
    If InStr(rawData, semicolon) > 0 Then
      splitText = Split(rawData, semicolon)
      'number of tabs needed depends on
      'number of periods in the code part.
      'Bernie's method is faster
      tabCount = 0
      For LC = 1 To Len(splitText(1))
        If Mid(splitText(1), LC, 1) = tabIndicator Then
          tabCount = tabCount + 1
        End If
      Next
      outputText = String(tabCount, vbTab) & _
       splitText(0) & " [" & splitText(1) & "]"
      Print #tdfBuffNum, outputText
      'echo to worksheet
      Cells(rowCount, 1) = outputText
      rowCount = rowCount + 1
    End If
  Loop
  Close #buffNum
  Close #tdfBuffNum
End Sub



JRC said:
Dear all:
I am currently working on a rather large document that I need to
format a certain specific way. The document includes nearly 57,000
lines of plain text that need to be indented according to a simple
rule so that it can then be imported into an application as a tab-
delimited file.
My system is a Mac G5 running the latest version of Leopard (not Snow
Leopard) which is 10.5.8 and I have Microsoft Office 2004 installed
(haven't upgraded to 2008 and will probably wait until next version to
upgrade).
Below is an example of the type of file I now have:
Body Regions;A01
Abdomen;A01.047
Abdominal Cavity;A01.047.025
Peritoneum;A01.047.025.600
Douglas' Pouch;A01.047.025.600.225
Mesentery;A01.047.025.600.451
Mesocolon;A01.047.025.600.451.535
Omentum;A01.047.025.600.573
Peritoneal Cavity;A01.047.025.600.678
Peritoneal Stomata;A01.047.025.600.700
Retroperitoneal Space;A01.047.025.750
Abdominal Wall;A01.047.050
Groin;A01.047.365
Inguinal Canal;A01.047.412
Umbilicus;A01.047.849
Back;A01.176
Lumbosacral Region;A01.176.519
Sacrococcygeal Region;A01.176.780
Breast;A01.236
Mammary Glands, Human;A01.236.249
This is just a small portion of the file I am working with. This is a
"flat" text file but what I need is a "hierarchical" or "tree
structure" file where each of these lines is indented with a number
of
"tab keystrokes" corresponding to its level as depicted by the
alphanumeric code following the term. Example:
Body Regions;A01
     Abdomen;A01.047
          Abdominal Cavity;A01.047.025
               Peritoneum;A01.047.025.600
                    Douglas' Pouch;A01.047.025.600.225
                    Mesentery;A01.047.025.600.451
                         Mesocolon;A01.047.025.600.451.535
                    Omentum;A01.047.025.600.573
                         Peritoneal Cavity;A01.047.025.600.678
                    Peritoneal Stomata;A01.047.025.600.700
               Retroperitoneal Space;A01.047.025.750
          Abdominal Wall;A01.047.050
          Groin;A01.047.365
          Inguinal Canal;A01.047.412
          Umbilicus;A01.047.849
     Back;A01.176
          Lumbosacral Region;A01.176.519
          Sacrococcygeal Region;A01.176.780
     Breast;A01.236
          Mammary Glands, Human;A01.236.249
As you will notice each line has been indented using the tab key a
number of times corresponding to the size of the alphanumeric code.
Lines with an alphanumeric code containing only 3 characters stay on
the left most position (no tabs). Example:
Body Regions;A01
Lines with an alphanumeric code containing 7 characters (example:
A01.047) are indented with one single tab keystroke. Example:
     Abdomen;A01.047
Lines with an alphanumeric code containing 11 characters (example:
A01.047.025) are indented with two tab keystrokes. Example:
          Abdominal Cavity;A01.047.025
Lines with an alphanumeric code containing 15 characters (example:
A01.047.025.600) are indented with three tab keystrokes. Example:
               Peritoneum;A01.047.025.600
And so on, and so forth.
Once this step is finished and the file has been edited from a flat
text to one with a hierarchical or tree structure using tab entries
(to derive a tab-delimited file) the second part of the project
involves the removal of certain characters and replacement with
others. Basically I need the format to go from the existing:
Abdominal Cavity;A01.047.025
Where the term is followed by a semicolon and alphanumeric code.
To the following:
Abdominal Cavity [A01.047.025]
Where a space is added after the term, the semicolon is removed and
brackets are placed at the beginning and end of the alphanumeric
code.
Considering the size of my document as I stated (little less than
57,000 entries) I would like to find a way to automate the process.
How can these two processes be accomplished with the use of a macro ?
I have never worked with macros before and any help would be extremely
valuable and appreciated. I have both Excel 04 and Word 04 as part of
the Office 2004 package in case the information is relevant. I am not
sure which of the applications would be better suited for this task
but am open to your suggestions.
Also I would appreciate suggestions on textbooks that would cover VBA
for Excel and Word. I would like to find books that would cover
everything  from A to Z but with an approach that makes it easy for a
beginner to understand and work with VBA on Microsoft Office for the
Mac. Any
recommendations will be appreciated.
Thank you in advance for your help.

Hello, JLatham.


Thank you very much for your really helpful post and for the really
good suggestions. I have downloaded the document you created and will
study it either tonight or tomorrow night. Also I was very happy with
the book suggestion and will order me a copy as I am sure it will help
me.

Just as I posted to John and Bernie, I am having problems running the
routines you all have offered me. I am sure it has to do with my lack
of experience with VBA. Do you have any ideas as to what might be
causing this ?

Thank you for your really helpful and detailed post.

Best regards,


JRC
 
Back
Top