Complex XML transformation: how to do it with better performance?


Guest

The problem is how to achieve the following transformation:

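The source XML contains a large number of repeating structures like the ones below. Each Item
node contains a Person element and an Insurance element; an Insurance element is correlated with a
Person element through the person id (the Insurance ref attribute matches the Person id attribute).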
<Item>
  <Person id="p123" name="someone1"/>
  <Insurance ref="p123" detail="blabla1"/>
</Item>
<Item>
  <Person id="p123" name="someone1"/>
  <Insurance ref="p456" detail="blabla2"/>
</Item>
<Item>
  <Person id="p456" name="someone1"/>
  <Insurance ref="p123" detail="blabla3"/>
</Item>
The goal is to regroup this into a one (Person) to many (Insurance) structure, like
this:
<Item>
  <Person id="p123" name="someone1"/>
  <Insurance ref="p123" detail="blabla1"/>
  <Insurance ref="p123" detail="blabla3"/>
</Item>
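A minimal sketch of the regrouping logic itself, in Python with the standard `xml.etree.ElementTree` module (the thread is about .NET; Python is used here only for illustration). It assumes the Item elements are wrapped in a hypothetical `<Items>` root element, since the fragment above has none, and that an Insurance belongs to the Person whose `id` equals its `ref`. Like any hashtable-based approach, it holds the whole document in memory, so it only illustrates the grouping, not a fix for the memory problem.

```python
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in a hypothetical <Items> root element.
SAMPLE = """<Items>
<Item><Person id="p123" name="someone1"/><Insurance ref="p123" detail="blabla1"/></Item>
<Item><Person id="p123" name="someone1"/><Insurance ref="p456" detail="blabla2"/></Item>
<Item><Person id="p456" name="someone1"/><Insurance ref="p123" detail="blabla3"/></Item>
</Items>"""


def regroup(root):
    """Return a new <Items> tree with one <Item> per Person, holding every
    Insurance whose ref equals that Person's id."""
    persons = {}     # person id -> Person attributes (first occurrence wins)
    insurances = {}  # Insurance ref (= person id) -> list of Insurance attributes
    for item in root.iter("Item"):
        person = item.find("Person")
        if person is not None:
            persons.setdefault(person.get("id"), dict(person.attrib))
        for ins in item.findall("Insurance"):
            insurances.setdefault(ins.get("ref"), []).append(dict(ins.attrib))
    out = ET.Element("Items")
    for pid, pattrs in persons.items():
        item = ET.SubElement(out, "Item")
        ET.SubElement(item, "Person", pattrs)
        for iattrs in insurances.get(pid, []):
            ET.SubElement(item, "Insurance", iattrs)
    return out


result = regroup(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```

Unlike the sample output in the question, which shows only p123, this produces a group for every Person id (p456 gets blabla2). Insurance elements whose `ref` matches no Person are silently dropped.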
My initial idea was to load the source into memory and split it into
Hashtables so that I could regroup easily. However, the source files are
really big (approximately 50 MB each, with 70,000 repeating items), so this
approach obviously consumes too much memory. I have spent a whole day on this
and still cannot figure out a better way, so I would really appreciate
any help.
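One commonly used way to reduce memory use (a swapped-in technique, not one proposed in this thread) is to stream the input instead of building the full document tree: read it with `ElementTree.iterparse`, record each Item's attributes, then clear the processed Items so the tree never grows. A sketch in Python, again assuming a hypothetical `<Items>` root; the function name `regroup_streaming` is illustrative.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input; the <Items> root element is an assumption.
SAMPLE = """<Items>
<Item><Person id="p123" name="someone1"/><Insurance ref="p123" detail="blabla1"/></Item>
<Item><Person id="p123" name="someone1"/><Insurance ref="p456" detail="blabla2"/></Item>
<Item><Person id="p456" name="someone1"/><Insurance ref="p123" detail="blabla3"/></Item>
</Items>"""


def regroup_streaming(source, out):
    """Stream <Item> elements from `source` (a path or binary file object) and
    write the regrouped document to the text stream `out`."""
    persons = {}     # person id -> Person attributes (first occurrence wins)
    insurances = {}  # Insurance ref (= person id) -> list of Insurance attributes
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != "Item":
            continue
        person = elem.find("Person")
        if person is not None:
            persons.setdefault(person.get("id"), dict(person.attrib))
        for ins in elem.findall("Insurance"):
            insurances.setdefault(ins.get("ref"), []).append(dict(ins.attrib))
        root.clear()  # drop processed Items so the in-memory tree stays small
    out.write("<Items>\n")
    for pid, pattrs in persons.items():
        item = ET.Element("Item")
        ET.SubElement(item, "Person", pattrs)
        for iattrs in insurances.get(pid, []):
            ET.SubElement(item, "Insurance", iattrs)
        out.write(ET.tostring(item, encoding="unicode") + "\n")
    out.write("</Items>\n")


buf = io.StringIO()
regroup_streaming(io.BytesIO(SAMPLE.encode("utf-8")), buf)
print(buf.getvalue())
```

The dictionaries still grow with the data, so memory is proportional to the extracted attribute values rather than to a full in-memory document. In .NET the analogous streaming reader is `XmlReader`.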

Thanks in advance
 
Hello tommy,

Why not parse it like a plain text file, using a regular expression (Regexp) pattern to find the data you need?
XML parsers in .NET are not so good at parsing a 50 MB file.
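A rough sketch of the text-based approach suggested here, in Python: scan the file line by line with regular expressions instead of an XML parser, grouping as it goes. The patterns and the function name `regroup_with_regex` are hypothetical; they assume double-quoted attributes in exactly the order shown in the question and no entity references (such as `&amp;`) in the values. Any other valid XML formatting would break them, which is the usual trade-off of regex-based parsing.

```python
import io
import re
from collections import defaultdict

# Hypothetical patterns: they assume double-quoted attributes in exactly the
# order shown in the question and no XML entity references in the values.
PERSON_RE = re.compile(r'<Person\s+id="([^"]*)"\s+name="([^"]*)"')
INSURANCE_RE = re.compile(r'<Insurance\s+ref="([^"]*)"\s+detail="([^"]*)"')


def regroup_with_regex(lines):
    """Scan the source as plain text, one line at a time, and return
    {person id: (name, [insurance details])}."""
    persons = {}                    # person id -> name (first occurrence wins)
    insurances = defaultdict(list)  # Insurance ref (= person id) -> details
    for line in lines:
        for pid, name in PERSON_RE.findall(line):
            persons.setdefault(pid, name)
        for ref, detail in INSURANCE_RE.findall(line):
            insurances[ref].append(detail)
    return {pid: (name, insurances.get(pid, [])) for pid, name in persons.items()}


text = """<Item>
<Person id="p123" name="someone1"/>
<Insurance ref="p123" detail="blabla1"/>
</Item>
<Item>
<Person id="p123" name="someone1"/>
<Insurance ref="p456" detail="blabla2"/>
</Item>
<Item>
<Person id="p456" name="someone1"/>
<Insurance ref="p123" detail="blabla3"/>
</Item>"""

groups = regroup_with_regex(io.StringIO(text))  # an open text file works the same way
for pid, (name, details) in groups.items():
    print(pid, name, details)
```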

---
WBR,
Michael Nemtsev :: blog: http://spaces.msn.com/laflour

"At times one remains faithful to a cause only because its opponents do not
cease to be insipid." (c) Friedrich Nietzsche
 
Thanks a lot, Michael. But I don't know much about regular expressions; could you
please elaborate a little on that?
 