Error in XML parsing schemas with CF SP2?

  • Thread starter: Richard Kucia

Richard Kucia

I have an XML parsing problem that I cannot resolve.

I have 2 files which contain identical data; call them SI (Schema Included)
and SX (Schema eXcluded). Each file contains 40 tables, most of which hold
only a few rows. When I view them in Visual Studio, the files are
perfect. In fact, I originally let Visual Studio load SX and generate the
schema that I then pasted into the SI file. The data preview in Visual
Studio is correct.

I have been testing my application using various combinations of the SI and
SX files and the various XmlReadMode options. The data is loaded into an empty
DataSet with ReadXml. A hand-typed, highly abbreviated portion of the XML
data that demonstrates the problem is:

<platforms>
<platform type="0">
<networks>
<network linktype="0" />
<!-- other network rows, with child tables [rates] and [rate] -->
</networks>
</platform>
<platform type="1">
<networks>
<network linktype="0" />
<!-- other network rows, with child tables [rates] and [rate] -->
</networks>
</platform>
</platforms>

To track down the problem, I wrote a little schema-and-data table dumper. As
expected, tables called [platforms], [platform], [networks] and [network]
are created in the dataset. The dump for [networks] is always correct. It
is:

Table networks: networks_Id platform_Id
Parent of network with networks_Id = networks_Id
0 0
1 1

Here is the dump for [network] on SX with XmlReadMode.Auto. Since SX has no
embedded schema, the schema has to be inferred.

Table network
Parent of rates with network_Id = network_Id

network_Id  linktype  enabled  resource  minport  maxport  portname  desc       devices  networks_Id
0           0         true     2191      0        0        NONE      None       0        0
1           1         true     2192      1        4        COM#      Serial     16       0
2           2         false    2193      1        2        DN#       DeviceNet  64       0
3           3         false    2194      0        0        CB        CanBus     64       0
4           4         false    2195      0        0        PB        ProfiBus   256      0
5           0         true     2191      0        0        NONE      None       0        1
6           1         true     2192      1        8        COM#      Serial     16       1
7           2         false    2193      1        2        DN#       DeviceNet  64       1
8           3         false    2194      0        0        CB        CanBus     64       1
9           4         false    2195      0        0        PB        ProfiBus   256      1

However, here is the dump for [network] on SI with XmlReadMode.Auto. Since the
schema is embedded in the file, that schema should be used and should not need
to be inferred. Remember that Visual Studio generated this schema originally,
and that the data content looks correct within the Visual Studio preview.

Table network
Parent of rates with network_Id = network_Id

linktype  enabled  resource  minport  maxport  portname  desc       devices  network_Id  networks_Id
0         true     2191      0        0        NONE      None       0        0           0
2         false    2193      1        2        DN#       DeviceNet  64       1           0
3         false    2194      0        0        CB        CanBus     64       2           0
4         false    2195      0        0        PB        ProfiBus   256      3           0
0         true     2191      0        0        NONE      None       0        4           1
2         false    2193      1        2        DN#       DeviceNet  64       5           1
3         false    2194      0        0        CB        CanBus     64       6           1
4         false    2195      0        0        PB        ProfiBus   256      7           1

There are 2 problems:

1) OK, this is not a real problem, but it was a surprise. The schemas
reported by the dumper are different. The upper dump shows [network_Id]
as column #0, while that column appears second to last in the lower
dump. It does demonstrate that the schema-handling logic differs between the
two cases.

2) HUGE PROBLEM: the [network] rows for COM# are missing! There should be 10
rows in the table, but only 8 show up. Check the sequence of values in columns
[linktype] and [network_Id]. The XML parser seems to have completely ignored
2 rows!
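
One way to confirm how many [network] rows the file itself holds, independently of the DataSet loader, is to count the elements directly. The sketch below is only an illustration in Python's standard library, not .NET CF code. The sample document is a hypothetical stand-in built from the dumps above (five linktype values per platform), and the helper names count_rows and linktype_histogram are made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for the real file: two platforms, each holding five
# attribute-only <network> rows (linktype 0..4), mirroring the dumps above.
rows = "".join('<network linktype="%d" />' % n for n in range(5))
SAMPLE = (
    "<platforms>"
    '<platform type="0"><networks>' + rows + "</networks></platform>"
    '<platform type="1"><networks>' + rows + "</networks></platform>"
    "</platforms>"
)


def count_rows(xml_text, tag):
    """Count every element named `tag`, at any nesting depth."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter(tag))


def linktype_histogram(xml_text):
    """Tally the linktype attribute over all <network> elements."""
    root = ET.fromstring(xml_text)
    return Counter(el.get("linktype") for el in root.iter("network"))


expected = count_rows(SAMPLE, "network")
histogram = linktype_histogram(SAMPLE)
print(expected, dict(histogram))
```

Comparing the element count with the number of rows in the loaded table would show whether rows were dropped during loading, and the per-value tally shows which linktype values went missing.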

Please tell me I've overlooked something obvious. Thanks.

Richard Kucia
 
Richard,

The first problem is a shortcoming of the inference process. Inference is an
attempt to 'guess' the schema of the XML file using a set of predefined rules.
Like any guess, it can be wrong and might produce some unexpected
results.

If your schema is created via inference, columns are added in the order
in which they are found (this might change in upcoming releases).
In the example below, 'Column2' in 'Table1' will be found first, so it will
have a lower ordinal than 'Column1'.
Moreover, inference might create some hidden columns to account for
parent-child relations between nested tables, so ordinals are quite
unpredictable with inference.

Let's load this sample using inference:

<DS>
<Table1>
<Column2>Col2 Data 0 </Column2>
<Table2>
<Column3>Col4 Data 0 </Column3>
</Table2>
</Table1>
<Table1>
<Column1>Col1 Data 1</Column1>
<Column2>Col2 Data 1</Column2>
<Table2>
<Column3>Col4 Data 1</Column3>
</Table2>
</Table1>
</DS>

Here's the result:

--------------------------- DataSet ----------------------
DataSet: 'DS'
----------------------- Tables -----------------------
DataTable: 'Table1
------------------- Columns ----------------------
Column: 'Column2' 'String'
Column: 'Table1_Id' 'Int32' Unique Autoincrement
Column: 'Column1' 'String'
----------------- Child Tables -------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
----------------- Parent Tables ------------------
--------------------------------------------------
DataTable: 'Table2
------------------- Columns ----------------------
Column: 'Column3' 'String'
Column: 'Table1_Id' 'Int32'
----------------- Child Tables -------------------
----------------- Parent Tables ------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
--------------------------------------------------

We can see that the ordinal of 'Column2' is 0, while the ordinal of 'Column1' is 2!
And we have an extra column, 'Table1_Id', with ordinal 1, added to act as the
primary/foreign key for these nested tables.
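
The ordering in this dump can be reproduced with a toy model of "add columns in the order they are found". The Python sketch below is only an illustration under that assumption; it is not the actual .NET inference algorithm, and the function name infer_columns is hypothetical. It treats a childless element as a column and an element with children as a nested table, which gives the parent a key column the first time a nested table is seen.

```python
import xml.etree.ElementTree as ET

# The sample document from the post above.
SAMPLE = """
<DS>
  <Table1>
    <Column2>Col2 Data 0 </Column2>
    <Table2><Column3>Col4 Data 0 </Column3></Table2>
  </Table1>
  <Table1>
    <Column1>Col1 Data 1</Column1>
    <Column2>Col2 Data 1</Column2>
    <Table2><Column3>Col4 Data 1</Column3></Table2>
  </Table1>
</DS>
"""


def infer_columns(xml_text):
    """Toy model: return {table name: column names in discovery order}."""
    tables = {}

    def add(table, column):
        cols = tables.setdefault(table, [])
        if column not in cols:
            cols.append(column)

    def visit(element):
        tables.setdefault(element.tag, [])
        for child in element:
            if len(child) == 0:
                # Childless element: becomes an ordinary column.
                add(element.tag, child.tag)
            else:
                # Nested table: the parent gets a key column the first time
                # a child table is seen; the child gets the same-named
                # foreign key column after its own columns.
                key = element.tag + "_Id"
                add(element.tag, key)
                visit(child)
                add(child.tag, key)

    for table in ET.fromstring(xml_text):
        visit(table)
    return tables


print(infer_columns(SAMPLE))
```

In this model 'Table1_Id' lands between 'Column2' and 'Column1' simply because the first 'Table1' row contains a nested 'Table2' before any row has shown a 'Column1'.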

Unexpected? Sure. This is a reason why inference should _never_ be used.
Instead, you should design the schema you need so there will be no
surprises.

Let's take a look at the saved schema:

<xs:schema id="DS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xs:element name="DS" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="Table1">
<xs:complexType>
<xs:sequence>
<xs:element name="Column2" type="xs:string" minOccurs="0" />
<xs:element name="Column1" type="xs:string" minOccurs="0" />
<xs:element name="Table2" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="Column3" type="xs:string" minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:schema>

Do you see a 'Table1_Id' column? Nope. It will be created for you because you
have nested tables, but only after 'Column2' and 'Column1' have been
created.
Thus, the ordinals will change:

--------------------------- DataSet ----------------------
DataSet: 'DS'
----------------------- Tables -----------------------
DataTable: 'Table1
------------------- Columns ----------------------
Column: 'Column2' 'String'
Column: 'Column1' 'String'
Column: 'Table1_Id' 'Int32' Unique Autoincrement
----------------- Child Tables -------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
----------------- Parent Tables ------------------
--------------------------------------------------
DataTable: 'Table2
------------------- Columns ----------------------
Column: 'Column3' 'String'
Column: 'Table1_Id' 'Int32'
----------------- Child Tables -------------------
----------------- Parent Tables ------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
--------------------------------------------------

Now the ordinal of 'Column1' is 1; it used to be 2.

How to avoid these problems? It's easy: do not use inference; design the
schemas yourself.

Now, to the real problem... It is a known bug introduced in SP2 along with
performance optimizations. It is fixed in the upcoming V2.

If you have table elements with all columns mapped as attributes (like this
one: <network linktype="0" />) and the table is not a root table, every second
row will be lost.
Possible workarounds:

1. Map at least one column with non-null data as an element. The primary/foreign
key is the best candidate.
2. Do not use nested tables; use related tables instead. This might also save
you some space in the XML file and improve loading performance.

Here's a sample schema/data with related tables:

<?xml version="1.0" encoding="utf-8" ?>
<DS>
<xs:schema id="DS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xs:element name="DS" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="Table1">
<xs:complexType>
<xs:sequence></xs:sequence>
<xs:attribute name="Column1" type="xs:string" />
<xs:attribute name="Column2" type="xs:string" />
<xs:attribute name="PrimaryKey" type="xs:int"/>
</xs:complexType>
</xs:element>
<xs:element name="Table2">
<xs:complexType>
<xs:sequence></xs:sequence>
<xs:attribute name="Column3" type="xs:string" />
<xs:attribute name="ForeignKey" type="xs:int" />
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
<xs:key name="DSKey1" msdata:PrimaryKey="true">
<xs:selector xpath=".//Table1" />
<xs:field xpath="@PrimaryKey" />
</xs:key>
<xs:keyref name="Table1Table2" refer="DSKey1">
<xs:selector xpath=".//Table2" />
<xs:field xpath="@ForeignKey" />
</xs:keyref>
</xs:element>
</xs:schema>
<Table1 Column1="Data11" Column2="Data12" PrimaryKey="1"/>
<Table1 Column1="Data21" Column2="Data22" PrimaryKey="2"/>
<Table2 Column3="Data31" ForeignKey="1"/>
<Table2 Column3="Data32" ForeignKey="2"/>
</DS>
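
An existing file in the nested layout could be converted to this related layout with a short script. The Python sketch below is a minimal illustration, not part of the .NET CF API: the nested input is a hypothetical attribute-only version of the sample, the function name flatten is made up, and the keys are simply assigned in document order starting at 1. It produces only the data rows; the schema still has to be supplied as shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested input: each Table1 row contains its Table2 rows.
NESTED = """
<DS>
  <Table1 Column1="Data11" Column2="Data12">
    <Table2 Column3="Data31" />
  </Table1>
  <Table1 Column1="Data21" Column2="Data22">
    <Table2 Column3="Data32" />
  </Table1>
</DS>
"""


def flatten(xml_text, parent_tag="Table1", child_tag="Table2"):
    """Return a new root in which child rows sit at the top level."""
    source = ET.fromstring(xml_text)
    flat = ET.Element(source.tag)
    children = []
    for key, parent in enumerate(source.findall(parent_tag), start=1):
        # Copy the parent row and give it an explicit primary key.
        attrs = dict(parent.attrib, PrimaryKey=str(key))
        ET.SubElement(flat, parent_tag, attrs)
        for child in parent.findall(child_tag):
            # Remember which parent each child row belonged to.
            children.append(dict(child.attrib, ForeignKey=str(key)))
    # Emit the child rows after all parent rows.
    for attrs in children:
        ET.SubElement(flat, child_tag, attrs)
    return flat


flat_root = flatten(NESTED)
print(ET.tostring(flat_root, encoding="unicode"))
```

Because every row ends up as a direct child of the root element, no attribute-only table remains nested inside another table.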

Best regards,

Ilya

This posting is provided "AS IS" with no warranties, and confers no rights.
--------------------
From: "Richard Kucia" <[email protected]>
Subject: Error in XML parsing schemas with CF SP2?
Date: Thu, 4 Mar 2004 12:12:44 -0500
Newsgroups: microsoft.public.dotnet.framework.compactframework
 
Some really good stuff here.

Thanks, Ilya :)

--
Alex Yakhnin .NET CF MVP
www.intelliprog.com | www.opennetcf.org
 
Ilya,

Thanks for the quick response and detailed explanation.

I'm extremely concerned that MS has released the new SP2 with this data-loss
bug. If I remember correctly, it was a problem with XML that caused the
original SP2 to be recalled. Perhaps I overlooked the new SP2 known-bug list
(is there one?). And what is V2, and when will it appear?

Frankly, there ought to be an emergency fix for an error that's this
serious. When a software bug causes data loss, that deserves immediate
attention. How many applications are out there which were successfully
running on pre-SP2 CF which are now *silently* malfunctioning? If a customer
purchases a PDA with SP2 on it, or a user upgrades to SP2, that PDA's
application suite is suddenly toast.

Please understand that I'm not complaining about my own situation; it's the
worldwide implications for applications that are frightening.

Thanks.
Richard Kucia

"Ilya Tumanov [MS]" said:
Richard,

The first problem is a shortcoming of inference process. Inference is an
attempt to 'guess' schema of the XML file using set of predefined rules.
As any 'guess', this one could be wrong and might produce some unexpected
results.

If your schema is created via inference process, columns are added in order
they've been found (might change in upcoming releases).
In the example below 'Column2' in 'Table1' will be found first, so it will
have lower ordinal than 'Column1'.
Moreover, inference process might create some hidden columns to account for
nested tables parent-child relations, so ordinals are quite unpredictable
with inference.

Let's load this sample using inference:

<DS>
<Table1>
<Column2>Col2 Data 0 </Column2>
<Table2>
<Column3>Col4 Data 0 </Column3>
</Table2>
</Table1>
<Table1>
<Column1>Col1 Data 1</Column1>
<Column2>Col2 Data 1</Column2>
<Table2>
<Column3>Col4 Data 1</Column3>
</Table2>
</Table1>
</DS>

Here's the result:

--------------------------- DataSet ----------------------
DataSet: 'DS'
----------------------- Tables -----------------------
DataTable: 'Table1
------------------- Columns ----------------------
Column: 'Column2' 'String'
Column: 'Table1_Id' 'Int32' Unique Autoincrement
Column: 'Column1' 'String'
----------------- Child Tables -------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
----------------- Parent Tables ------------------
--------------------------------------------------
DataTable: 'Table2
------------------- Columns ----------------------
Column: 'Column3' 'String'
Column: 'Table1_Id' 'Int32'
----------------- Child Tables -------------------
----------------- Parent Tables ------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
--------------------------------------------------

We can see 'Column2' ordinal is zero; 'Column1' ordinal is 2!
And we have an extra column, 'Table1_Id' with ordinal 1 added to act as a
primary/foreign keys in these nested tables.

Unexpected? Sure. This is a reason why inference should _never_ be used.
Instead, you should design the schema you need so there will be no
surprises.

Let's take a look at the saved schema:

<xs:schema id="DS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xs:element name="DS" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="Table1">
<xs:complexType>
<xs:sequence>
<xs:element name="Column2" type="xs:string" minOccurs="0" />
<xs:element name="Column1" type="xs:string" minOccurs="0" />
<xs:element name="Table2" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="Column3" type="xs:string" minOccurs="0" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:schema>

Do you see 'Table1_Id' column? Nope. It will be created for you because you
have nested tables, but it will be done after 'Column2' and 'Column1' are
created.
Thus, ordinals will change:

--------------------------- DataSet ----------------------
DataSet: 'DS'
----------------------- Tables -----------------------
DataTable: 'Table1
------------------- Columns ----------------------
Column: 'Column2' 'String'
Column: 'Column1' 'String'
Column: 'Table1_Id' 'Int32' Unique Autoincrement
----------------- Child Tables -------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
----------------- Parent Tables ------------------
--------------------------------------------------
DataTable: 'Table2
------------------- Columns ----------------------
Column: 'Column3' 'String'
Column: 'Table1_Id' 'Int32'
----------------- Child Tables -------------------
----------------- Parent Tables ------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
--------------------------------------------------

Now 'Column1' ordinal is 1, and it used to be 2.

How to avoid these problems? It's easy.. Do not use inference, design
schemas yourself.

Now, to the real problem... It is a known bug introduced in SP2 with
performance optimizations. It is fixed in upcoming V2.

If you have table elements with all columns mapped as attributes (like this
one: <network linktype=0 />) and a table is not a root table, every second
row will be lost.
Possible workarounds:

1. Map at least one column with not null data as element. Primary/Foreign
key is the best candidate.
2. Do not use nested tables, use related tables. It might also save you
some space in XML file, improve loading performance.

Here's a sample schema/data with related tables:

<?xml version="1.0" encoding="utf-8" ?>
<DS>
<xs:schema id="DS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xs:element name="DS" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="Table1">
<xs:complexType>
<xs:sequence></xs:sequence>
<xs:attribute name="Column1" type="xs:string" />
<xs:attribute name="Column2" type="xs:string" />
<xs:attribute name="PrimaryKey" type="xs:int"/>
</xs:complexType>
</xs:element>
<xs:element name="Table2">
<xs:complexType>
<xs:sequence></xs:sequence>
<xs:attribute name="Colum3" type="xs:string" />
<xs:attribute name="ForeignKey" type="xs:int" />
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
<xs:key name="DSKey1" msdata:PrimaryKey="true">
<xs:selector xpath=".//Table1" />
<xs:field xpath="@PrimaryKey" />
</xs:key>
<xs:keyref name="Table1Table2" refer="DSKey1">
<xs:selector xpath=".//Table2" />
<xs:field xpath="@ForeignKey" />
</xs:keyref>
</xs:element>
</xs:schema>
<Table1 Column1="Data11" Column2="Data12" PrimaryKey="1"/>
<Table1 Column1="Data21" Column2="Data22" PrimaryKey="2"/>
<Table2 Colum3="Data31" ForeignKey="1"/>
<Table2 Colum3="Data32" ForeignKey="2"/>
</DS>

Best regards,

Ilya

This posting is provided "AS IS" with no warranties, and confers no rights.
--------------------
From: "Richard Kucia" <[email protected]>
Subject: Error in XML parsing schemas with CF SP2?
Date: Thu, 4 Mar 2004 12:12:44 -0500
Lines: 92
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2800.1158
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
Message-ID: <[email protected]>
Newsgroups: microsoft.public.dotnet.framework.compactframework
NNTP-Posting-Host: 212.cleveland-11-13rs.oh.dial-access.att.net 12.75.70.212
Path: cpmsftngxa06.phx.gbl!TK2MSFTNGP08.phx.gbl!TK2MSFTNGP11.phx.gbl
Xref: cpmsftngxa06.phx.gbl microsoft.public.dotnet.framework.compactframework:47665
X-Tomcat-NG: microsoft.public.dotnet.framework.compactframework

I have an XML parsing problem that I can not resolve.

I have 2 files which contain identical data; call them SI (Schema Included)
and SX (Schema eXcluded). Each file contains 40 tables, most of which hold
only a few rows. When I view them in the Visual Studio, the files are
perfect. In fact, I originally let Visual Studio load SX and generate the
schema that I then pasted into the SI file. The data preview in Visual
Studio is correct.

I have been testing my application using various combinations of the SI and
SX files and the various ReadMode options. The data is loaded into an empty
dataset with ReadXML. A hand-typed highly abbreviated portion of the XML
data that demonstrates the problem is:

<platforms>
  <platform type="0">
    <networks>
      <network linktype="0" />
      (other network rows, with child tables [rates] and [rate])
    </networks>
  </platform>
  <platform type="1">
    <networks>
      <network linktype="0" />
      (other network rows, with child tables [rates] and [rate])
    </networks>
  </platform>
</platforms>

To track down the problem, I wrote a little schema-and-data table dumper. As
expected, tables called [platforms], [platform], [networks] and [network]
are created in the dataset. The dump for [networks] is always correct. It
is:

Table networks: networks_Id platform_Id
Parent of network with networks_Id = networks_Id
0 0
1 1

Here is the dump for [network] on SX with XmlReadMode.Auto. This means that the
schema needs to be inferred.

Table network:
Parent of rates with network_Id = network_Id
network_Id  linktype  enabled  resource  minport  maxport  portname  desc       devices  networks_Id
0           0         true     2191      0        0        NONE      None       0        0
1           1         true     2192      1        4        COM#      Serial     16       0
2           2         false    2193      1        2        DN#       DeviceNet  64       0
3           3         false    2194      0        0        CB        CanBus     64       0
4           4         false    2195      0        0        PB        ProfiBus   256      0
5           0         true     2191      0        0        NONE      None       0        1
6           1         true     2192      1        8        COM#      Serial     16       1
7           2         false    2193      1        2        DN#       DeviceNet  64       1
8           3         false    2194      0        0        CB        CanBus     64       1
9           4         false    2195      0        0        PB        ProfiBus   256      1

However, here is the dump for [network] on SI with XmlReadMode.Auto. Since the
schema is indeed embedded in the file, that schema should be used and thus
the schema will not need to be inferred. Remember that Visual Studio
generated this schema originally, and that the data content looks correct
within the Visual Studio preview.

Table network:
Parent of rates with network_Id = network_Id
linktype  enabled  resource  minport  maxport  portname  desc       devices  network_Id  networks_Id
0         true     2191      0        0        NONE      None       0        0           0
2         false    2193      1        2        DN#       DeviceNet  64       1           0
3         false    2194      0        0        CB        CanBus     64       2           0
4         false    2195      0        0        PB        ProfiBus   256      3           0
0         true     2191      0        0        NONE      None       0        4           1
2         false    2193      1        2        DN#       DeviceNet  64       5           1
3         false    2194      0        0        CB        CanBus     64       6           1
4         false    2195      0        0        PB        ProfiBus   256      7           1

There are 2 problems:

1) OK, this is not a real problem, but it was a surprise. The schemas
reported by the dumper are different. The upper dump shows a [network_Id]
column as column #0, while that column appears second-last in the lower
dump. It does demonstrate that the schema-handling logic is different in the
two cases.

2) HUGE PROBLEM: the [network] rows for COM# are missing! There are 10 rows
in the table but only 8 show up. Check the sequence of values in columns
[linktype] and [network_Id]. The XML parser seems to have completely ignored
2 rows!

Please tell me I've overlooked something obvious. Thanks.

Richard Kucia
 
Richard,

No, it was not XML; it was SQL CE compatibility. No matter, you're right, it's
quite bad.
I don't remember whether this problem was found after SP2 shipped or the fix
was rejected for some reason.
In any case, I understand your frustration and I'm sorry about this; we'll
do better next time.

As for an emergency fix, we do monitor the magnitude of a problem.
So far it has been quite limited; I believe you're the second one affected.
You see, you need a very special set of circumstances to hit this problem:

1. The table has all columns mapped as attributes, with no columns mapped as
elements or TEXT.
2. The table is a nested table.
3. The table does not have any nested tables of its own.
4. The table has more than one record related to a single record in its
parent table.
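
The four conditions above can be checked against a raw XML file on a desktop machine before the file reaches the device. What follows is only a minimal sketch in Python, not part of the Compact Framework: `at_risk_tables` and the sample data are hypothetical names made up for illustration, and the check looks only at the shape of the XML text, assuming that attributes map to columns and that any element with child elements is a table with nested content.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped like the abbreviated XML in this thread.
SAMPLE = """
<DS>
  <platform type="0">
    <network linktype="0" />
    <network linktype="1" />
  </platform>
  <platform type="1">
    <network linktype="0" />
  </platform>
</DS>
"""

def at_risk_tables(xml_text):
    """Return the element names that match all four conditions listed above."""
    root = ET.fromstring(xml_text)
    risky = set()
    for parent in root.iter():
        # Count the children of this parent row by element name.
        counts = {}
        for child in parent:
            counts[child.tag] = counts.get(child.tag, 0) + 1
        for child in parent:
            attributes_only = (
                len(child.attrib) > 0
                and len(child) == 0                  # no element columns, no nested tables
                and not (child.text or "").strip()   # no TEXT content
            )
            nested = parent is not root              # direct children of the root are root tables
            several_rows = counts[child.tag] > 1     # more than one row under one parent row
            if attributes_only and nested and several_rows:
                risky.add(child.tag)
    return risky

print(sorted(at_risk_tables(SAMPLE)))
```

An element name reported by this check is only a hint that the table deserves a closer look; whether rows are actually dropped still has to be confirmed by loading the file on the device and counting rows.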

At this point, I would say an emergency patch is unlikely, since the problem
is relatively isolated and there are a number of workarounds, from slightly
modifying the data/schema to doing nothing and sticking with SP1 until V2
arrives.

The V2 beta will be available at MDC (http://www.microsoftmdc.com/?id=w51732)
or soon after; the release will follow.

Again, sorry about this. We're trying hard to write bug-free code, but
we're only human…

Best regards,

Ilya

This posting is provided "AS IS" with no warranties, and confers no rights.
--------------------
From: "Richard Kucia" <[email protected]>
References: <[email protected]>
Subject: Re: Error in XML parsing schemas with CF SP2?
Date: Fri, 5 Mar 2004 10:47:01 -0500
Newsgroups: microsoft.public.dotnet.framework.compactframework

Ilya,

Thanks for the quick response and detailed explanation.

I'm extremely concerned that MS has released the new SP2 with this data-loss
bug. If I remember correctly, it was a problem with XML that caused the
original SP2 to be recalled. Perhaps I overlooked the new SP2 known-bug list
(is there one?). And what is V2, and when will it appear?

Frankly, there ought to be an emergency fix for an error that's this
serious. When a software bug causes data loss, that deserves immediate
attention. How many applications are out there which were successfully
running on pre-SP2 CF which are now *silently* malfunctioning? If a customer
purchases a PDA with SP2 on it, or a user upgrades to SP2, that PDA's
application suite is suddenly toast.

Please understand that I'm not complaining about my own situation; it's the
worldwide implications for applications that are frightening.

Thanks.
Richard Kucia

"Ilya Tumanov [MS]" said:
Richard,

The first problem is a shortcoming of the inference process. Inference is an
attempt to 'guess' the schema of the XML file using a set of predefined rules.
Like any guess, it can be wrong and may produce unexpected results.

If your schema is created via the inference process, columns are added in the
order in which they are found (this might change in upcoming releases).
In the example below, 'Column2' in 'Table1' will be found first, so it will
have a lower ordinal than 'Column1'.
Moreover, the inference process might create hidden columns to account for
the parent-child relations of nested tables, so ordinals are quite
unpredictable with inference.

Let's load this sample using inference:

<DS>
  <Table1>
    <Column2>Col2 Data 0 </Column2>
    <Table2>
      <Column3>Col4 Data 0 </Column3>
    </Table2>
  </Table1>
  <Table1>
    <Column1>Col1 Data 1</Column1>
    <Column2>Col2 Data 1</Column2>
    <Table2>
      <Column3>Col4 Data 1</Column3>
    </Table2>
  </Table1>
</DS>

Here's the result:

--------------------------- DataSet ----------------------
DataSet: 'DS'
----------------------- Tables -----------------------
DataTable: 'Table1
------------------- Columns ----------------------
Column: 'Column2' 'String'
Column: 'Table1_Id' 'Int32' Unique Autoincrement
Column: 'Column1' 'String'
----------------- Child Tables -------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
----------------- Parent Tables ------------------
--------------------------------------------------
DataTable: 'Table2
------------------- Columns ----------------------
Column: 'Column3' 'String'
Column: 'Table1_Id' 'Int32'
----------------- Child Tables -------------------
----------------- Parent Tables ------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
--------------------------------------------------

We can see that the ordinal of 'Column2' is 0 and the ordinal of 'Column1'
is 2! And we have an extra column, 'Table1_Id', with ordinal 1, added to act
as the primary/foreign key for these nested tables.
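
To see why "first found, first added" produces this ordering, here is a toy model in Python. It is only a sketch written to reproduce the column order reported in the dump above; it is not the actual inference algorithm used by the framework, and `infer_column_order` is a hypothetical name. It assumes that a child element with its own children or attributes is a nested table, that any other child element is a column, and that the hidden key column is added at the moment the first nested row is encountered.

```python
import xml.etree.ElementTree as ET

# The same sample document that is loaded via inference above.
SAMPLE = """<DS>
  <Table1>
    <Column2>Col2 Data 0</Column2>
    <Table2><Column3>Col4 Data 0</Column3></Table2>
  </Table1>
  <Table1>
    <Column1>Col1 Data 1</Column1>
    <Column2>Col2 Data 1</Column2>
    <Table2><Column3>Col4 Data 1</Column3></Table2>
  </Table1>
</DS>"""

def infer_column_order(xml_text):
    """Toy model: collect each table's columns in the order they are first seen."""
    tables = {}  # table name -> list of column names, in ordinal order

    def add(table, column):
        columns = tables.setdefault(table, [])
        if column not in columns:
            columns.append(column)

    def visit(row):
        tables.setdefault(row.tag, [])
        for name in row.attrib:
            add(row.tag, name)
        for child in row:
            if len(child) or child.attrib:
                # Nested table: walk it, then add the hidden key column
                # to the parent table and to the child table.
                visit(child)
                key = row.tag + "_Id"
                add(row.tag, key)
                add(child.tag, key)
            else:
                # Simple element: an ordinary column of the current table.
                add(row.tag, child.tag)

    for row in ET.fromstring(xml_text):
        visit(row)
    return tables

for table, columns in infer_column_order(SAMPLE).items():
    print(table, columns)
```

In this model, swapping the order of the two Table1 rows in the sample changes the resulting ordinals, which is the kind of surprise described above.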

Unexpected? Sure. This is why inference should _never_ be used.
Instead, you should design the schema you need so there will be no
surprises.

Let's take a look at the saved schema:

<xs:schema id="DS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="DS" msdata:IsDataSet="true">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="Table1">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Column2" type="xs:string" minOccurs="0" />
              <xs:element name="Column1" type="xs:string" minOccurs="0" />
              <xs:element name="Table2" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="Column3" type="xs:string" minOccurs="0" />
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

Do you see a 'Table1_Id' column? Nope. It will be created for you because you
have nested tables, but only after 'Column2' and 'Column1' are created.
Thus, the ordinals will change:

--------------------------- DataSet ----------------------
DataSet: 'DS'
----------------------- Tables -----------------------
DataTable: 'Table1
------------------- Columns ----------------------
Column: 'Column2' 'String'
Column: 'Column1' 'String'
Column: 'Table1_Id' 'Int32' Unique Autoincrement
----------------- Child Tables -------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
----------------- Parent Tables ------------------
--------------------------------------------------
DataTable: 'Table2
------------------- Columns ----------------------
Column: 'Column3' 'String'
Column: 'Table1_Id' 'Int32'
----------------- Child Tables -------------------
----------------- Parent Tables ------------------
Relation 'Table1_Table2': 'Table2'->'Table1, nested
--------------------------------------------------

Now the ordinal of 'Column1' is 1; it used to be 2.

How do you avoid these problems? It's easy: do not use inference; design
the schemas yourself.

Now, to the real problem... It is a known bug, introduced in SP2 along with
performance optimizations. It is fixed in the upcoming V2.

If you have table elements with all columns mapped as attributes (like this
one: <network linktype="0" />) and the table is not a root table, every second
row will be lost.
Possible workarounds:

1. Map at least one column that holds non-null data as an element. The
primary/foreign key is the best candidate.
2. Do not use nested tables; use related tables instead. This might also save
some space in the XML file and improve loading performance.
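
For the first workaround, existing data files would need one attribute moved into a child element. The following is a minimal desktop-side sketch in Python, assuming the data is rewritten offline; `promote_attribute` is a hypothetical helper, not a framework API, and the embedded schema would have to be changed to match (the column declared as an xs:element instead of an xs:attribute), which this sketch does not do.

```python
import xml.etree.ElementTree as ET

def promote_attribute(xml_text, table, column):
    """Move one attribute of every <table> row into a child element of the same name."""
    root = ET.fromstring(xml_text)
    # Materialise the row list first, because the tree is modified in the loop.
    for row in list(root.iter(table)):
        value = row.attrib.pop(column, None)
        if value is not None:
            child = ET.Element(column)
            child.text = value
            row.insert(0, child)  # element-mapped column goes first in the row
    return ET.tostring(root, encoding="unicode")

# Hypothetical data shaped like the [network] rows discussed in this thread.
before = '<DS><platform type="0"><network linktype="0" enabled="true" /></platform></DS>'
print(promote_attribute(before, "network", "linktype"))
```

After the rewrite, each row carries the chosen column as a child element while the remaining columns stay as attributes, so the table no longer consists of attribute-only rows.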

Here's a sample schema/data with related tables:

<?xml version="1.0" encoding="utf-8" ?>
<DS>
  <xs:schema id="DS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
    <xs:element name="DS" msdata:IsDataSet="true">
      <xs:complexType>
        <xs:choice maxOccurs="unbounded">
          <xs:element name="Table1">
            <xs:complexType>
              <xs:sequence></xs:sequence>
              <xs:attribute name="Column1" type="xs:string" />
              <xs:attribute name="Column2" type="xs:string" />
              <xs:attribute name="PrimaryKey" type="xs:int"/>
            </xs:complexType>
          </xs:element>
          <xs:element name="Table2">
            <xs:complexType>
              <xs:sequence></xs:sequence>
              <xs:attribute name="Colum3" type="xs:string" />
              <xs:attribute name="ForeignKey" type="xs:int" />
            </xs:complexType>
          </xs:element>
        </xs:choice>
      </xs:complexType>
      <xs:key name="DSKey1" msdata:PrimaryKey="true">
        <xs:selector xpath=".//Table1" />
        <xs:field xpath="@PrimaryKey" />
      </xs:key>
      <xs:keyref name="Table1Table2" refer="DSKey1">
        <xs:selector xpath=".//Table2" />
        <xs:field xpath="@ForeignKey" />
      </xs:keyref>
    </xs:element>
  </xs:schema>
  <Table1 Column1="Data11" Column2="Data12" PrimaryKey="1"/>
  <Table1 Column1="Data21" Column2="Data22" PrimaryKey="2"/>
  <Table2 Colum3="Data31" ForeignKey="1"/>
  <Table2 Colum3="Data32" ForeignKey="2"/>
</DS>

Best regards,

Ilya

This posting is provided "AS IS" with no warranties, and confers no rights.