XML Tips

 

09 Jul 2002 01:03

 

Use DTDs (or schemas)

DTDs should be defined for all XML documents.  Writing the DTD up front reduces the number of complete DTD rewrites later and also catches bugs during development.  You must use a DTD if you want your parser to generate a hash table for fast IDREF access, because the parser learns which attributes are of type ID from the DTD declarations.
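
As a minimal sketch of the ID lookup point, assuming the JAXP parser that ships with the JDK, the following parses a document whose internal DTD subset declares an ID attribute and then looks an element up by that ID.  The class name, the element names, and the sample data are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class IdLookupSketch {
    // Hypothetical document: the internal DTD subset declares "id" as type ID.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE parts [\n"
        + "  <!ELEMENT parts (part*)>\n"
        + "  <!ELEMENT part (#PCDATA)>\n"
        + "  <!ATTLIST part id ID #REQUIRED>\n"
        + "]>\n"
        + "<parts><part id=\"p1\">bolt</part><part id=\"p2\">nut</part></parts>";

    // Parses XML held in a string; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // The lookup relies on the ATTLIST declaration above; without it the result is null.
        Element part = doc.getElementById("p2");
        System.out.println(part == null ? "not found" : part.getTextContent());
    }
}
```

If the same document is parsed without the DOCTYPE, getElementById has no way to know that "id" is an ID attribute and returns null.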



Creating XML Documents in Code

  1. Use the DOM.  

    Unfortunately, the DOM Level 1 interface does not let you create a <!DOCTYPE> node, so if you want to create documents with a <!DOCTYPE> you must build the document text with string concatenation and then parse the string.

    A compromise that avoids building the entire document as a string is to build a string consisting of the smallest possible valid document with a DTD reference, parse the string, and then modify the DOM directly.
  2. String concatenation
    • You must "escape" certain characters via string substitution:
      & to &amp;  (replace this one first, so the & in the other replacements is not escaped again)
      < to &lt;
      > to &gt;
      " to &quot;  (attribute values only)
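
The substitutions above can be sketched as a pair of helper methods.  This is an illustration only: the class and method names are hypothetical, and it assumes attribute values are always written inside double quotes.

```java
final class XmlEscaper {
    private XmlEscaper() {}

    // Escapes character data for use as element content.
    static String escapeText(String s) {
        // '&' is replaced first; otherwise the '&' in "&lt;" would be escaped again.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Escapes a value that will be written between double-quote attribute delimiters.
    static String escapeAttribute(String s) {
        return escapeText(s).replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String xml = "<note author=\"" + escapeAttribute("Tom \"T\" & Co") + "\">"
                + escapeText("1 < 2 && 3 > 2") + "</note>";
        // prints <note author="Tom &quot;T&quot; &amp; Co">1 &lt; 2 &amp;&amp; 3 &gt; 2</note>
        System.out.println(xml);
    }
}
```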

It's not easy to determine whether using the DOM is better than string concatenation.  Speed is usually a function of memory allocations and both methods allocate memory heavily.  Strings need a large contiguous chunk of memory while the DOM does not -- this can make a difference performance-wise because the memory allocator doesn't have to find a large free block when using the DOM.  However, the DOM uses more total memory than string concatenation.
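
The compromise described under item 1 (parse the smallest document that carries the DOCTYPE, then extend it through the DOM) might look like the following sketch.  It assumes the JAXP parser in the JDK; the DTD name "order.dtd", the element names, and the class name are hypothetical, and the entity resolver that substitutes an empty DTD exists only so the example runs without that file on disk.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class SeedDocumentSketch {
    // Smallest document that still carries the DOCTYPE ("order.dtd" is hypothetical).
    static final String SEED =
        "<?xml version=\"1.0\"?><!DOCTYPE order SYSTEM \"order.dtd\"><order/>";

    static Document buildOrder(String... skus) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Assumption for this sketch only: answer every external entity request
            // with an empty stream so the parser does not look for order.dtd.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(SEED)));

            // From here on the document is modified through the DOM only.
            Element root = doc.getDocumentElement();
            for (String sku : skus) {
                Element item = doc.createElement("item");
                item.setAttribute("sku", sku);
                root.appendChild(item);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildOrder("A1", "B2");
        System.out.println(doc.getDoctype().getName());
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Values set through the DOM, such as the sku attribute here, are stored as plain text in the tree, so the manual escaping needed for string concatenation does not apply to them.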


Use XPath to Traverse the DOM

XPath can reduce code and bugs and protect code against minor changes to the DTD.  The downside is that XPath expressions must be compiled and evaluated at run time, which is significantly slower than walking the tree with the DOM API.

Here is an example in Java.
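
A minimal sketch, assuming the javax.xml.xpath API that ships with the JDK; the class name, element names, and sample data are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSketch {
    // Hypothetical sample data.
    static final String XML =
        "<order>"
        + "<item sku=\"A1\"><qty>2</qty></item>"
        + "<item sku=\"B2\"><qty>5</qty></item>"
        + "</order>";

    // Returns the sku of every item whose qty is at least minQty.
    static List<String> skusWithQtyAtLeast(String xml, int minQty) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Compilation is the costly step; keep the compiled expression
            // around if the same query runs many times.
            XPathExpression expr =
                xpath.compile("//item[qty >= " + minQty + "]/@sku");
            NodeList hits = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            List<String> skus = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                skus.add(hits.item(i).getNodeValue());
            }
            return skus;
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(skusWithQtyAtLeast(XML, 3));
    }
}
```

Because the expression starts with //item, it keeps working if a later DTD revision wraps the items in another element, which is the kind of minor change that would break hand-written DOM traversal code.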

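If you want MSXML3 to use XPath instead of the older XSL patterns (the MSXML3 default) in methods like selectNodes, you need to set the SelectionLanguage property:

// idom2 must already point to an MSXML3 document object before setProperty is called.
CComPtr<IXMLDOMDocument2> idom2;
CComVariant v(L"XPath");
// Switch selectNodes and selectSingleNode to XPath syntax.
idom2->setProperty(CComBSTR("SelectionLanguage"),v);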


Datatypes

All values in an XML document are just text.  Datetime, binary, and numeric values must be derived by parsing text.

Binary Data

Binary data should be encoded using base64.
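
A short sketch of the round trip, assuming java.util.Base64 (Java 8 or later); the class name and the blob element are hypothetical.  The MIME decoder is used on the way back because it ignores the line breaks that pretty-printed XML may introduce into the text.

```java
import java.util.Base64;

final class Base64Sketch {
    // Encodes raw bytes as base64 text suitable for element content.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decodes base64 text, skipping line separators and other non-alphabet characters.
    static byte[] decode(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = {1, 2, 3};
        String xml = "<blob>" + encode(raw) + "</blob>";
        System.out.println(xml); // prints <blob>AQID</blob>
    }
}
```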

Decimal values

Encode all decimal values with a period representing the decimal point.
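
The usual way this goes wrong is locale-sensitive formatting, which writes a comma in many locales.  The sketch below, with hypothetical class and method names, shows two ways to keep the period regardless of the machine's default locale.

```java
import java.math.BigDecimal;
import java.util.Locale;

final class DecimalSketch {
    // BigDecimal.toPlainString() does not consult the locale and always uses '.'.
    static String format(BigDecimal value) {
        return value.toPlainString();
    }

    // For doubles, pass Locale.ROOT explicitly instead of relying on the default locale.
    static String format(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    // Parsing: the BigDecimal constructor accepts only '.' as the decimal point.
    static BigDecimal parse(String text) {
        return new BigDecimal(text.trim());
    }

    public static void main(String[] args) {
        System.out.println(format(new BigDecimal("1234.50")));
        System.out.println(format(1234.5));
        // For contrast: a German locale writes a comma as the decimal separator.
        System.out.println(String.format(Locale.GERMANY, "%.2f", 1234.5));
    }
}
```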

Date and time

Encode dates and times using ISO 8601:1988, for example 2002-07-09T01:03:00Z.
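
A minimal sketch using java.time (Java 8 or later), with hypothetical class and method names.  It assumes the extended date-time format with an explicit UTC offset is what the document should carry; the java.time formatters follow a later edition of the standard, so check them against your own requirements.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class Iso8601Sketch {
    // Formats a timestamp in the extended form, e.g. 2002-07-09T01:03:00Z.
    static String format(OffsetDateTime t) {
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(t);
    }

    // Parses text in the same form back into a timestamp.
    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text.trim());
    }

    public static void main(String[] args) {
        OffsetDateTime t = OffsetDateTime.of(2002, 7, 9, 1, 3, 0, 0, ZoneOffset.UTC);
        String xml = "<modified>" + format(t) + "</modified>";
        System.out.println(xml);
        System.out.println(parse(format(t)).equals(t));
    }
}
```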