shonend
I am trying to extract a pattern like this:
"SUB: some text LOT: one-word"
To describe it: "SUB" and "LOT" are keywords; I want those words,
everything in between and one word following the "LOT:". Source text
may contain multiple "SUB: ... LOT:" blocks.
For example this is my source text:
SUB: this text I want to extract LOT: 2345 , something in between, new
SUB: again something I want to extract LOT: 2145 and more text here,
the end
When I apply this pattern:
SUB:\s+[^\r\n]+\s+LOT:\s+[^\r\n\s]+
in .NET's Regex.Matches(...), I only get one match:
SUB: this text I want to extract LOT: 2345 , something in between, new
SUB: again something I want to extract LOT: 2145
Obviously, something in this regex tells it to be "greedy", and I need
the partial matches too.
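
To make the behaviour concrete, here is a minimal sketch that reproduces it. It uses Python's re module rather than .NET (an assumption made only for illustration; the quantifiers involved behave the same way in both engines), and it assumes the sample text sits on a single line. The greedy `[^\r\n]+` runs to the end of the line and backtracks only as far as the last "LOT:", while the lazy form `[^\r\n]+?` stops at the first one.

```python
import re

# Sample text from the post, assumed to be on one line.
text = (
    "SUB: this text I want to extract LOT: 2345 , something in between, new "
    "SUB: again something I want to extract LOT: 2145 and more text here, the end"
)

# Original pattern: the greedy [^\r\n]+ backs off only to the LAST " LOT:".
greedy = r"SUB:\s+[^\r\n]+\s+LOT:\s+[^\r\n\s]+"

# Same pattern with a lazy quantifier (+?), which stops at the FIRST " LOT:".
lazy = r"SUB:\s+[^\r\n]+?\s+LOT:\s+[^\r\n\s]+"

greedy_matches = re.findall(greedy, text)
lazy_matches = re.findall(lazy, text)

print(len(greedy_matches))  # one long match spanning both blocks
for m in lazy_matches:
    print(m)
```

A regex search never reports overlapping matches, which is why the long match 3) and the shorter matches 1) and 2) cannot all come back from a single call.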
I thought this pattern would return ALL matches, which are:
1) SUB: this text I want to extract LOT: 2345
2) SUB: again something I want to extract LOT: 2145
3) SUB: this text I want to extract LOT: 2345 , something in between,
new SUB: again something I want to extract LOT: 2145
The last one I don't need of course, but I can handle it - ignore it,
and use only the first two.
So my idea was to modify my pattern to read like this:
give me all matches resembling text between "SUB:" and "LOT:",
including those keywords, plus one word after "LOT:", but (!) the text
between cannot contain "LOT:"
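
One way to express "the text between cannot contain LOT:" is a negative lookahead checked before every character of the in-between text. The sketch below is only an illustration under assumptions: it uses Python's re module instead of .NET (the `(?!...)` construct exists in both), and it treats the sample text as a single line.

```python
import re

# Sample text from the post, assumed to be on one line.
text = (
    "SUB: this text I want to extract LOT: 2345 , something in between, new "
    "SUB: again something I want to extract LOT: 2145 and more text here, the end"
)

# (?:(?!LOT:)[^\r\n])+ consumes one character at a time, but only while
# the text at the current position does not start with "LOT:".
pattern = r"SUB:\s+(?:(?!LOT:)[^\r\n])+\s+LOT:\s+[^\r\n\s]+"

matches = re.findall(pattern, text)
for m in matches:
    print(m)
```

Because the in-between part can never run past a "LOT:", the long match 3) cannot occur, and each "SUB: ... LOT: word" block comes back on its own.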
If I manage to compose such a regex pattern, it would even eliminate
result 3) and return only what I really need. But the problem is how
to define a pattern that excludes a whole word. I tried the "[^ ... ]"
pattern, but that works only for the single characters listed between
the brackets.
For example:
SUB:\s+[^\r\n(LOT]+\s+LOT:\s+[^\r\n\s]+
is not working. I thought that the "( )" brackets would group the
characters and tell the regex not to match the appearance of the whole
word "LOT:". But instead, it rejects any text that contains any of
these characters:
) ( : L T O
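
A small check of that observation, sketched in Python's re module (an assumption for illustration; character classes work the same way in .NET). The class below is a hypothetical one built from the six characters listed above: inside "[^ ... ]" the parentheses are ordinary characters, so the class is just a set of forbidden single characters, not a forbidden word.

```python
import re

# Hypothetical class made of the six listed characters plus \r and \n.
# Inside [...] the parentheses are literal characters, not a group.
char_class = r"[^\r\n(LOT:)]+"

# No forbidden character anywhere: the whole string matches.
plain_ok = re.fullmatch(char_class, "some text") is not None

# A single capital T is enough to break the match, no "LOT:" required.
capital_t_ok = re.fullmatch(char_class, "some Text") is not None

# A parenthesis is rejected as well.
paren_ok = re.fullmatch(char_class, "a (note)") is not None

print(plain_ok, capital_t_ok, paren_ok)
```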
So if you could answer at least one of the following questions, I would
appreciate it very much:
1) generally, how do you compose a regex pattern that does not match
text containing a certain word?
2) if there is no easy solution for 1), or there is a better solution
for the problem I described above, what is it?
Thank you so much!
Shone