getElementById Pitfalls

Intro

Contrary to popular belief, getElementById() (W3 spec, PHP manual entry) does not, by default, return the element whose attribute is named id. According to the DOM specs, getElementById returns the element that has an attribute declared with the attribute type ID. In HTML (and therefore in all HTML DOM implementations in browsers; SVG is another example), that happens to be the attribute named id. But if you want to use getElementById with an arbitrary XML document, you first have to declare which attribute is the ID. There is also an easy solution thanks to the recent addition of xml:id support in libxml2, at least if you're in control of the XML documents.
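To see the pitfall in action, here is a minimal sketch, assuming PHP with the DOM extension. The root/item document and the id value "a" are hypothetical sample data. The attribute is only named id and nothing declares it as type ID, so the lookup should come back empty:

```php
<?php
// Hypothetical sample document: the attribute is merely *named* id.
// There is no DTD, schema or xml:id, so it is not an attribute of type ID.
$doc = new DOMDocument();
$doc->loadXML('<root><item id="a">Hello</item></root>');

// No ID has been registered for "a", so this prints NULL.
var_dump($doc->getElementById('a'));
```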

The following sections describe different approaches to make getElementById work with your XML documents. The examples are for PHP 5, but the techniques should work with other languages as well.

DTD

You can define your ID attributes with a DTD.

You have to declare the ID attribute for every element type that should have one. Your ID attribute doesn't have to be called id; it can be any valid attribute name. Of course, the DTD can also be external.
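A minimal sketch of the internal-subset variant, assuming PHP's DOM extension on top of libxml2. The element names and the attribute name key are hypothetical; key is used on purpose to show that the ID attribute does not have to be called id. The sketch assumes libxml2 picks up the ATTLIST declaration from the internal subset while parsing, without validateOnParse being switched on:

```php
<?php
// Internal DTD subset declaring the (hypothetical) attribute "key" of
// <item> as type ID. The attribute name is arbitrary.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ATTLIST item key ID #IMPLIED>
]>
<root>
  <item key="first">Hello</item>
  <item key="second">World</item>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// "key" is now the ID attribute, so getElementById() looks it up.
echo $doc->getElementById('second')->textContent, "\n"; // World
```

For an external DTD, the same ATTLIST line would live in the .dtd file referenced by the DOCTYPE, and the parser has to be told to load it.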

Here's how it's defined in the XHTML 1.0 DTD: http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-strict.dtd_coreattrs

XML Schema

Instead of a DTD, you can also use an XML Schema document. I'm no Schema expert, so the following is just a snippet of what it should basically look like. Any additions are welcome.
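A minimal sketch of the schema route, assuming PHP's DOM extension and a hypothetical root/item document. The attribute is given the type xs:ID, and, as the first reader comment below points out, the document has to be validated before the IDs are known; this sketch uses schemaValidateSource() with an inline schema string instead of a schema file:

```php
<?php
// Hypothetical schema: the id attribute of <item> has the type xs:ID.
$xsd = <<<XSD
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="id" type="xs:ID"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

$doc = new DOMDocument();
$doc->loadXML('<root><item id="a">Hello</item><item id="b">World</item></root>');

// The xs:ID attributes are registered as IDs during validation.
if ($doc->schemaValidateSource($xsd)) {
    echo $doc->getElementById('b')->textContent, "\n";
}
```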

Relax NG

In Relax NG, you can also define ID attributes with a data-type expression:

Also untested.
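One possible RELAX NG sketch, assuming PHP's DOM extension and the XML Schema datatype library (which is where the ID type comes from). The root/item names are hypothetical; the grammar/start layout mirrors the reader comment below that reports this approach working with relaxNGValidateSource():

```php
<?php
// Hypothetical RELAX NG grammar: the id attribute of <item> uses the
// ID datatype from the XML Schema datatype library.
$rng = <<<RNG
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="root">
      <oneOrMore>
        <element name="item">
          <attribute name="id"><data type="ID"/></attribute>
          <text/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
RNG;

$doc = new DOMDocument();
$doc->loadXML('<root><item id="a">Hello</item><item id="b">World</item></root>');

// As with XML Schema, the IDs only become known after validation.
if ($doc->relaxNGValidateSource($rng)) {
    echo $doc->getElementById('b')->textContent, "\n";
}
```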

xml:id

As the above approaches all have their drawbacks (they rely on external definitions of what an ID is and also need a validating parser), the W3C came up with another approach: the xml:id attribute. The spec is quite short and worth reading for more background on the whole problem.

As of version 2.6.9, libxml2 (and therefore PHP) supports the xml:id spec.

The ID must be a valid NCName, which means, for example, that the first character can't be a digit.
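A minimal xml:id sketch, assuming PHP's DOM extension built against libxml2 2.6.9 or later. The document is hypothetical; note that the xml prefix is predeclared, so no namespace declaration is needed, and the values "a" and "b" are valid NCNames:

```php
<?php
// Hypothetical document using xml:id; the xml prefix needs no declaration.
$doc = new DOMDocument();
$doc->loadXML('<root><item xml:id="a">Hello</item><item xml:id="b">World</item></root>');

// No DTD or schema needed: libxml2 registers xml:id values as IDs on parse.
echo $doc->getElementById('b')->textContent, "\n"; // World
```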

Addition from discussion on php-xml-dev:

XPath

If you can't change the XML document to use xml:id (or add DTD definitions), or you don't have a validating parser, you can also use XPath to get elements with an attribute named id. This also works with @xml:id instead of @id if you don't have libxml2 2.6.9 or later installed.

Note that this returns the element with an attribute named id, in contrast to the DTD approach above, which returns the element whose attribute is declared as type ID. You should be aware of that difference.
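A minimal XPath sketch, assuming PHP's DOM extension and a hypothetical document with no DTD, schema or xml:id. It also shows the difference described above: XPath finds the element by the attribute name, while getElementById() still finds nothing:

```php
<?php
// Hypothetical sample document; no DTD, schema or xml:id involved.
$doc = new DOMDocument();
$doc->loadXML('<root><item id="a">Hello</item><item id="b">World</item></root>');

$xpath = new DOMXPath($doc);
// Matches elements with an attribute *named* id, whatever its declared type.
// For xml:id attributes, the query would be "//*[@xml:id='b']" instead.
$item = $xpath->query("//*[@id='b']")->item(0);

echo $item->textContent, "\n"; // World

// The attribute was never declared as type ID, so this prints NULL.
var_dump($doc->getElementById('b'));
```

If the value comes from user input, keep in mind that a single quote in it would break the query string.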

Benchmarks

See my slides "XML on Speed" for some benchmarks of XPath vs. getElementById(): http://php5.bitflux.org/xmlonspeed/slide_21.php
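If you want to measure this on your own setup, here is a rough timing sketch, assuming PHP's DOM extension with xml:id support. The generated document, the value i999 and the run count are arbitrary assumptions, and the sketch prints whatever timings your machine produces rather than any reference numbers:

```php
<?php
// Hypothetical test document: 1000 elements carrying both xml:id and id.
$xml = '<root>';
for ($i = 0; $i < 1000; $i++) {
    $xml .= "<item xml:id=\"i$i\" id=\"i$i\"/>";
}
$xml .= '</root>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$runs = 100;

// Repeated lookups through the ID table (registered via xml:id).
$start = microtime(true);
for ($r = 0; $r < $runs; $r++) {
    $byId = $doc->getElementById('i999');
}
$tGetById = microtime(true) - $start;

// The same lookups through an XPath query on the attribute named id.
$start = microtime(true);
for ($r = 0; $r < $runs; $r++) {
    $byXPath = $xpath->query("//*[@id='i999']")->item(0);
}
$tXPath = microtime(true) - $start;

printf("getElementById: %.4f s, XPath: %.4f s (%d runs)\n", $tGetById, $tXPath, $runs);
```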

More Links


  1. Sep 30, 2007

    Anonymous says:

    This page is very helpful, thank you.

    For the schema, you need to run the command:

    $boolResult = $DOMDocument->schemaValidate("/root/too/the.xsd");

  2. Dec 28, 2007

    Anonymous says:

    Thank you very much.

  3. Jun 30, 2008

    Anonymous says:

    Thanks for this article ...

    A nice way to implement the XPath variant is to extend the DOMDocument class and override getElementById.

    It looks something like this:

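    class CustomDOMDocument extends DOMDocument {

        // Find the first element with an attribute *named* id, via XPath.
        // Note: an $id containing a single quote would break this query.
        public function getElementById($id)
        {
            $xpath = new DOMXPath($this);
            $result = $xpath->query("//*[@id='$id']");
            return $result->item(0);
        }
    }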

  4. Jul 28, 2008

    Anonymous says:

    Thanks very much for that XPath-trick!
    Never would have found that by myself.

  5. Oct 01, 2008

    Anonymous says:

    I chose the RelaxNG trick (and wrote the following generic schema):

    $doc = new DOMDocument();
    $doc->load(...);
    
    $rng = '
    <grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
        <start>
            <element>
                <anyName/>
                <ref name="anythingID"/>
            </element>
        </start>
        <define name="anythingID">
            <zeroOrMore>
                <choice>
                    <element>
                        <anyName/>
                        <ref name="anythingID"/>
                    </element>
                    <attribute name="id">
                        <data type="ID"/>
                    </attribute>
                    <zeroOrMore>
                        <attribute><anyName/></attribute>
                    </zeroOrMore>
                    <text/>
                </choice>
            </zeroOrMore>
        </define>
    </grammar>
    ';
    
    
    $doc->relaxNGValidateSource($rng);
    var_dump($doc->getElementById('id1'));
    

    Note that ID values must be valid ones:
      - integers do not work!
      - @see http://www.w3.org/TR/REC-xml/#id
      - => (Letter | '_' | ':') (Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender)*

  6. Oct 04, 2008

    Anonymous says:

    Quite dirty, but it works: declare the id attribute of every element as an ID after loading.

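    $doc->loadHTML($html);

    $elements = $doc->getElementsByTagName('*');

    foreach ($elements as $element) {
        // setIdAttribute() throws a DOMException if the element has no
        // id attribute, so only mark elements that actually have one.
        if ($element->hasAttribute('id')) {
            $element->setIdAttribute('id', true);
        }
    }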

  7. Apr 27, 2009

    Anonymous says:

    Just wanted to say... Thanks

  8. Jun 28, 2009

    Anonymous says:

    Great Article. Thanks)


These projects are supported by Liip AG