Separate Numeric and Text data

  • Thread starter Thread starter MG
  • Start date Start date
M

MG

Hi

I have the following data in a spreadsheet, for example
starting from A1:

1
2
3
14a
1ab
123c

I want to be able to separate this data into two
columns. Have tried Text To Columns, and have had no
luck.

There are no separators between the data, and the data is
of varying lengths - any ideas would be most appreciated.
 
B1:

=--LEFT(A1,SUMPRODUCT((LEN(A1)-LEN(SUBSTITUTE(A1,{".",0,1,2,3,4,5,6,7,8,9},"
")))))

C1:

=SUBSTITUTE(A1,B1,"")

The formulas expect input in A1 in the form DigitSequence followed by text,
with no digit in the latter part.
 
This from Jason Morin appears to work, I've no idea how -

=LEFT(A1,SUM(1*NOT(ISERROR(1*(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1))))))&"
"&RIGHT(A1,LEN(A1)-SUM(1*NOT(ISERROR(1*(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1
))))))

This is an array formula, you'll need to use CTRL+Shift+Enter instead of
plain Enter.

From there, you should be able to copy & paste special (values), then
Data -- Text to Columns.

Rgds,
Andy
 
Another way

=--MID(A1,MATCH(FALSE,ISERROR(--MID(A1,ROW(INDIRECT("1:100")),1)),0),100-SUM
(--ISERROR(1*MID(A1,ROW(INDIRECT("1:100")),1))))

entered with ctrl + shift & enter

will work if the numbers always are together, if not post back

--


For everyone's benefit keep the discussion in the newsgroup.

Regards,

Peo Sjoblom
 
Hi,

B1:
=IF(ISTEXT(A1),--LEFT(A1,SUMPRODUCT(COUNTIF(A1,"*"&{".",0,1,2,3,4,5,6,7,8,9}
&"*"))),A1)
C1: =SUBSTITUTE(A1,B1,"")


--
Regards,
Soo Cheon Jheong
Seoul, Korea
_ _
^¢¯^
--
 
Thanks, works well.
-----Original Message-----
B1:

=--LEFT(A1,SUMPRODUCT((LEN(A1)-LEN(SUBSTITUTE(A1, {".",0,1,2,3,4,5,6,7,8,9},"
")))))

C1:

=SUBSTITUTE(A1,B1,"")

The formulas expect input in A1 in the form
DigitSequence followed by text,
 
Thanks, I'll try this.
-----Original Message-----
Another way

=--MID(A1,MATCH(FALSE,ISERROR(--MID(A1,ROW(INDIRECT ("1:100")),1)),0),100-SUM
(--ISERROR(1*MID(A1,ROW(INDIRECT("1:100")),1))))

entered with ctrl + shift & enter

will work if the numbers always are together, if not post back

--


For everyone's benefit keep the discussion in the newsgroup.

Regards,

Peo Sjoblom

appreciated.


.
 
Thanks, I'll try this.
-----Original Message-----
This from Jason Morin appears to work, I've no idea how -

=LEFT(A1,SUM(1*NOT(ISERROR(1*(MID(A1,ROW(INDIRECT ("1:"&LEN(A1))),1))))))&"
(INDIRECT("1:"&LEN(A1))),1
))))))

This is an array formula, you'll need to use CTRL+Shift+Enter instead of
plain Enter.

From there, you should be able to copy & paste special (values), then
Data -- Text to Columns.

Rgds,
Andy


.
 
Not with duplicate digits in the didit sequence...

Try:

1222Saturnus

Excellent, this works well.
 
Thanks Aladin,

B1: =IF(ISTEXT(A1),--LEFT(A1,SUMPRODUCT((LEN(A1)-LEN(SUBSTITUTE(A1,
{".",0,1,2,3,4,5,6,7,8,9},""))))),IF(LEN(A1)>0,A1,""))

or

=IF(A1="","",--LEFT(A1,SUMPRODUCT(11-LEN(SUBSTITUTE(".1234567890",
MID(A1,ROW($1:$100),1),)))))

C1: =SUBSTITUTE(A1,B1,"")



Thanks for your help,
Regards,
Soo Cheon Jheong
Seoul, Korea
_ _
^¢¯^
--
 
Hi
though I prefere Aladin's solution this could be shortened to:
=LEFT(A1,MAX(IF(ISNUMBER(--MID(A1,ROW(INDIRECT("1:1024")),1)),ROW(INDIR
ECT("1:1024")))))
also entered as array formula with CTRL+SHIFT+ENTER

and if you define a name (Insert - Name - Define) called 'Seq' with the
following formula
=ROW(INDIRECT("1:1024")
you could use the formula
=LEFT(A1,MAX(IF(ISNUMBER(--MID(A1,seq,1)),seq)))
(also array entered)

Some additional note:
- will work only if all digits are at the beginning of your string
- to get the text part use (if the above formula is in B1):
=SUBSTITUTE(A1,B1,"")
 
Frank Kabel said:
and if you define a name (Insert - Name - Define) called 'Seq' with the
following formula
=ROW(INDIRECT("1:1024")
you could use the formula
=LEFT(A1,MAX(IF(ISNUMBER(--MID(A1,seq,1)),seq)))
(also array entered)
....

Using such a defined name, you could use the array formula

=LEFT(A4,MAX(ISNUMBER(-MID(A4,1,seq))*seq))

which only assumes that the leftmost longest valid numeric string should be
extracted, and it could be followed by anything else. Note: if you're
coverting strings to numbers immediately inside ISNUMBER (or ISERROR), you
need only '-', not '--'.

While you may prefer Aladin's formula, it chokes when there are trailing
numeric string characters. This is especially nasty when it comes to
periods. If A1 contained '14a.b.c.', Aladin's formula returns '14a.b'. If A1
contained '123-and now for lucky number 7', your formula would return the
entire string, and Aladin's would return '123-'.

Both your formula and Aladin's present some specification questions. If A1
contained '1.0..0abc', should the leftmost 'numeric' portion be '1.0', which
my formula above gives, or should it be '1.0..0' which both your formula and
Aladin's return?
 
Harlan Grove said:
While you may prefer Aladin's formula, it chokes when there are trailing
numeric string characters. This is especially nasty when it comes to
periods. If A1 contained '14a.b.c.', Aladin's formula returns '14a.b'. If A1
contained '123-and now for lucky number 7', your formula would return the
entire string, and Aladin's would return '123-'.

Both your formula and Aladin's present some specification questions. If A1
contained '1.0..0abc', should the leftmost 'numeric' portion be '1.0', which
my formula above gives, or should it be '1.0..0' which both your formula and
Aladin's return?

Well, this time I did not forget to qualify the input conditions in my reply
to the original post:

"The formulas expect input in A1 in the form DigitSequence followed by text,
with no digit in the latter part." I should have added: And no dot in the
text part. The dot is meant to capture a number before text like in 4.2abc.

Under conditions of no need for testing the input whether this meets the
qualification, the formula satisfies output specs like the OP's at a
reasonable cost. BTW, '1.0..0abc' as input would cause an error value with
the formula I posted.
 
Harlan said:
...

Using such a defined name, you could use the array formula

=LEFT(A4,MAX(ISNUMBER(-MID(A4,1,seq))*seq))

which only assumes that the leftmost longest valid numeric string
should be extracted, and it could be followed by anything else. Note:
if you're coverting strings to numbers immediately inside ISNUMBER
(or ISERROR), you need only '-', not '--'.

Nice variation. Also thanks for your note. I'm just so used for using a
double minus (but this would save some processing cycles) :-)


While you may prefer Aladin's formula, it chokes when there are
trailing numeric string characters. This is especially nasty when it
comes to periods. If A1 contained '14a.b.c.', Aladin's formula
returns '14a.b'. If A1 contained '123-and now for lucky number 7',
your formula would return the entire string, and Aladin's would
return '123-'.

o.k. but I think Alading stated these restrictions in his post. If you
this kind of strings it really depends on the specs what the expected
result should be (and then we can argue if Excel is the right tool for
processing such strings)

Both your formula and Aladin's present some specification questions.
If A1 contained '1.0..0abc', should the leftmost 'numeric' portion be
'1.0', which my formula above gives, or should it be '1.0..0' which
both your formula and Aladin's return?

In this case I would prefer a formula error due to wrong entry strings
<vbg>
Frank
 
Aladin Akyurek said:
Well, this time I did not forget to qualify the input conditions
in my reply to the original post:

"The formulas expect input in A1 in the form DigitSequence followed
by text, with no digit in the latter part." I should have added:
And no dot in the text part. The dot is meant to capture a number
before text like in 4.2abc.
....

I understood why you included the dot. What I was pointing out was that
that's rather restrictive (and unnecessarily so) in the general case. There
are more robust ways to parse out the leftmost longest valid numeric
substring than taking the leftmost substring based on counting the number of
'number' characters in the entire string.
 
Harlan,

Hier is another that I like (posted at MrExcel by fairwinds from Sweden)...

=LOOKUP(9.99999999999999E+307,--LEFT(A1,ROW(INDIRECT("1:"&LEN(A1)))))

which also can cope with input like

124AC23

One 'small' issue is when the digit seq in the input contains leading 0's as
in:

0124AC23

The output will be 124 instead of 0124, which appears acceptable to me.

Aladin
 
Aladin Akyurek said:
Hier is another that I like (posted at MrExcel by fairwinds from Sweden)...

=LOOKUP(9.99999999999999E+307,--LEFT(A1,ROW(INDIRECT("1:"&LEN(A1)))))

which also can cope with input like

124AC23

One 'small' issue is when the digit seq in the input contains leading 0's as
in:

0124AC23

The output will be 124 instead of 0124, which appears acceptable to
me.

vs the formula I gave,

=LEFT(A4,MAX(ISNUMBER(-MID(A4,1,seq))*seq))

where seq is defined as =ROW(INDIRECT("1:1024")).

Because the LOOKUP formula above insists on treating numeric
substrings as numbers, if A1 contained 1234567890123456abc0, the
LOOKUP formula would return 1234567890123450 while the LEFT formula,
which deals with text, would return 1234567890123456. Maybe the OP
doesn't need 16 or more digits, but if s/he did, the LOOKUP formula
wouldn't be reliable.

Then there's robustness. If A1 happened to be blank, the LOOKUP
formula returns #REF! while the LEFT formula returns "". If A1
contained "abc", the LOOKUP formula returns #N/A while the LEFT
formula sstill returns "". The LOOKUP formula is eating more nested
function call levels too. Granted the LEFT formula actually involves
more function calls, but the LEFT formula can delegate some of them to
a defined name. The LOOKUP formula, on the other hand, would be fubar
if the ROW call in it were changed to ROW(INDIRECT("1:1024")).
 
Harlan Grove said:
me.

vs the formula I gave,

=LEFT(A4,MAX(ISNUMBER(-MID(A4,1,seq))*seq))

where seq is defined as =ROW(INDIRECT("1:1024")).

Because the LOOKUP formula above insists on treating numeric
substrings as numbers, if A1 contained 1234567890123456abc0, the
LOOKUP formula would return 1234567890123450 while the LEFT formula,
which deals with text, would return 1234567890123456. Maybe the OP
doesn't need 16 or more digits, but if s/he did, the LOOKUP formula
wouldn't be reliable.

Then there's robustness. If A1 happened to be blank, the LOOKUP
formula returns #REF! while the LEFT formula returns "". If A1
contained "abc", the LOOKUP formula returns #N/A while the LEFT
formula sstill returns "". The LOOKUP formula is eating more nested
function call levels too. Granted the LEFT formula actually involves
more function calls, but the LEFT formula can delegate some of them to
a defined name. The LOOKUP formula, on the other hand, would be fubar
if the ROW call in it were changed to ROW(INDIRECT("1:1024")).

And, it doesn't survive an input like

14 Martin Place

and

124FEB1

by running into trouble with bits that can be interpreted as dates
by --LEFT(...) and --RIGHT(...).
 
{=LEFT(A1,MAX(ISNUMBER(-MID(A1,1,Seq))*Seq))}

with A1 housing:

14 Martin Place

returns 14 Mar instead of 14 just like the fairwinds formula.
 
Back
Top