Alan
I hope this is a good place to post this; I could not find a
better-suited group.
I need to be able to extract some text from a particular XPS file that
is updated periodically but has a standard format. I have read up on
the XPS file format and understand its structure.
Here's my problem: The UnicodeString in the file has a bunch of words
stuck together. For example, the string:
"1 Here is a string of words strung together. I need to separate them
to extract them...."
would be represented something like this:
<FixedPage . . .
<Glyphs Fill="#ff000000" FontUri="/Documents/1/Resources/Fonts/
C33C1892-4299-487A-9A63-97230919AAA4.odttf"
FontRenderingEmSize="10.5596" StyleSimulations="None" OriginX="38.08"
OriginY="229.12"
Indices="3,27;25,331;39;40;36;39,73;3,27;50,77;53,73;3,27;36;47;44;57;40,855;49;36,250;47,447;20;21,57;20;3;11;21;12,164;23;21;24,57;3,27;11;26;12,361;25;19,57;3,27;11;23;12,200;21;3;11,34;23;12,1080;22,57;28;3,27;11,34;28;12,380;22;22,319;19;3;11;20;12,258;41;3,27;22;16,34;23,377;16,341;20;24,87;11;26;12,214;23;18,27;20,382;28;18;20" UnicodeString="1Hereisastringofwordsstrungtogether.Ineedtoseparatethemtoextractthem...." /. . .
</FixedPage>
In the above, the Indices shown do not correspond to the string; I just
wanted to give you the general idea.
I know the second field of each Indices entry is the AdvanceWidth, but I
have not found an easy (or any) way to determine where to put spaces in
the output string.
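To make the structure concrete, here is a minimal Python sketch of splitting an Indices value into (glyph index, advance width) pairs. It assumes entries are separated by semicolons and fields by commas, as in the sample above. The name parse_indices is hypothetical, and the sketch ignores the cluster-mapping prefix and the offset fields that the XPS specification also allows. It only exposes the data; it does not decide where the spaces belong.

```python
from typing import List, Optional, Tuple


def parse_indices(indices: str) -> List[Tuple[Optional[int], Optional[float]]]:
    """Split an Indices attribute value into (glyph_index, advance_width) pairs.

    Hypothetical helper: handles only the "index[,advance]" form seen in the
    sample, not cluster mappings such as "(1:2)" or the u/v offset fields.
    """
    pairs = []
    for entry in indices.split(";"):
        fields = entry.split(",")
        # The glyph index may be omitted, in which case the field is empty.
        glyph = int(fields[0]) if fields[0].strip() else None
        # The advance width is optional; None means it was not given.
        advance = None
        if len(fields) > 1 and fields[1].strip():
            advance = float(fields[1])
        pairs.append((glyph, advance))
    return pairs


if __name__ == "__main__":
    for glyph, advance in parse_indices("3,27;25,331;39;40;36;39,73"):
        print(glyph, advance)
```

As far as I understand the format, an entry with no explicit advance width falls back to the advance defined in the font, which is why the sketch reports None there rather than guessing a value.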
Can anyone shed some light on this or point me to a good source of
information?
Thanks, Alan