Preserve XML whitespace in Excel 2003

  • Thread starter Thread starter ghoyer
  • Start date Start date
G

ghoyer

I have a hand-made schema, which I'm trying to get Excel 2003 to treat
properly.
The problem is that Excel insists on stripping the whitespace from my
strings.
The WC3 solution is to specify xml:space="preserve" like so:

(?xml version="1.0" encoding="UTF-8"?)
(ns1:aSchema xmlns:ns1="uri.mine" xml:space="preserve")
(myElement)
(Display) ,TEST (/Display)
(/myElement)
(/ns1:aSchema)



This however is not a stable XML file for Excel, since it insists on using
it's own prefixes for namespaces -- even reserved ones.
Importing the above and exporting it produces:


(?xml version='1.0' encoding='UTF-8'?)
(ns1:aSchema ns2:space="preserve"
xmlns:ns2="http://www.w3.org/XML/1998/namespace" xmlns:ns1="uri.mine"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance")
(myElement)
(Display) ,TEST (/Display)
(/myElement)
(/ns1:aSchema)



This fails validation miserably -- Python's Expat barfs with

xml.parsers.expat.ExpatError: prefix must not be bound to one of the
reserved namespace names: line 2, column 0


since it's smart enough to know that the namespace Excel prefixed as ns2
should not be re-prefixed.

(and the solution is not to drop namespaces -- I'm using that for version
control. Since I'm confident that the required elements for by XML will
change in the future, the schema is namespaced, so that different versions
can coexist and can be "inherited")

Moreover, the Schema that Excel generates for this XML is a complete disaster:

The XML Map aSchema_Map contains 3 schemas.

(!-- Schema #1 named Schema3 by Excel --)
(?xml version='1.0' encoding='UTF-8'?)
(xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="uri.mine" xmlns:ns0="uri.mine"
xmlns:ns1="http://www.w3.org/XML/1998/namespace")
(xsd:import namespace="http://www.w3.org/XML/1998/namespace")(/xsd:import)
(xsd:import)(/xsd:import)
(xsd:element nillable="true" name="aSchema")
(xsd:complexType)
(xsd:sequence minOccurs="0")
(xsd:element minOccurs="0" ref="myElement")(/xsd:element)
(/xsd:sequence)
(xsd:attribute ref="ns1:space")(/xsd:attribute)
(/xsd:complexType)
(/xsd:element)
(/xsd:schema)

(!-- Schema #2 named Schema1 by Excel --)
(?xml version='1.0' encoding='UTF-8'?)
(xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3.org/XML/1998/namespace"
xmlns:ns0="http://www.w3.org/XML/1998/namespace")
(xsd:attribute name="space" type="xsd:string")(/xsd:attribute)
(/xsd:schema)

(!-- Schema #3 named Schema2 by Excel --)
(?xml version='1.0' encoding='UTF-8'?)
(xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema")
(xsd:element nillable="true" name="myElement")
(xsd:complexType)
(xsd:sequence minOccurs="0")
(xsd:element minOccurs="0" nillable="true" type="xsd:string"
name="Display" form="unqualified")(/xsd:element)
(/xsd:sequence)
(/xsd:complexType)
(/xsd:element)
(/xsd:schema)



WTF? Excel itself barfs trying to deal with this. You can't do this as a
single file. You can import Schema #2 into Excel. This is at best pseudocode.

I can coerce Excel into generating the (namespaceprefix):space="preserve"
from a single schema file:

(?xml version='1.0' encoding='UTF-8'?)
(xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:ns.1="uri.mine.1" targetNamespace="uri.mine.1" version="$Revision: #3
$" elementFormDefault="unqualified" attributeFormDefault="unqualified")
(xsd:import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="http://www.w3.org/2001/xml.xsd"/)

(xsd:element name="myElement" type="ns.1:myType"/)

(xsd:complexType name="myType")
(xsd:sequence)
(xsd:element type="xsd:string" name="Display" /)
(/xsd:sequence)
(xsd:attribute ref="xml:space" use="required" fixed="preserve"/)
(/xsd:complexType)
(/xsd:schema)



but this will still result in the Expat validation failure:

(?xml version='1.0' encoding='UTF-8'?)
(ns1:myElement ns2:space="preserve"
xmlns:ns2="http://www.w3.org/XML/1998/namespace" xmlns:ns1="uri.mine.1")
(Display) ,TEST (/Display)
(/ns1:myElement)


Note also, that this requires a cell mapped to the xml:space attribute,
with the value set to "preserve" in spite of the fact that the schema marks
it "required" and "fixed".

tl;dr Excel 2003 sucks at schemas: what schema can preserve whitespace and
still pass validation?
 
Back
Top