5.10. Finding direct children of a node

Another useful technique when parsing XML documents is finding all the direct child elements of a particular element. For instance, in our grammar files, a ref element can have several p elements, each of which can contain many things, including other p elements. We want to find just the p elements that are children of the ref, not p elements that are children of other p elements.

You might think we could simply use getElementsByTagName for this, but we can’t. getElementsByTagName searches recursively and returns a single flat list of every matching element it finds at any depth. Since p elements can contain other p elements, that list would include nested p elements that we don’t want. To find only direct child elements, we’ll need to do it ourselves.
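To make the problem concrete, here is a minimal sketch that runs getElementsByTagName on a tiny made-up document. The XML string is a hypothetical stand-in for a grammar file, not one of the real ones; it just has two p elements directly under ref and a third nested inside the first.

```python
from xml.dom import minidom

# Hypothetical miniature document standing in for a grammar file.
doc = minidom.parseString("<ref><p>a<p>nested</p></p><p>b</p></ref>")
ref = doc.documentElement

# getElementsByTagName searches the whole subtree, not just direct children.
all_p = ref.getElementsByTagName("p")

# Look at each match's parent to see which ones are really children of ref.
parents = [p.parentNode.tagName for p in all_p]
print(len(all_p))  # 3
print(parents)     # ['ref', 'p', 'ref']
```

Three p elements come back, but one of them has a p as its parent rather than the ref. That nested element is exactly the one we want to leave out.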

Example 5.39. Finding direct child elements

    def randomChildElement(self, node):
        choices = [e for e in node.childNodes
                   if e.nodeType == e.ELEMENT_NODE]  # (1) (2) (3)
        chosen = random.choice(choices)              # (4)
        return chosen
1 As we saw in Example 5.9, the childNodes attribute returns a list of all the child nodes of an element.
2 However, as we saw in Example 5.11, the list returned by childNodes contains all different types of nodes, including text nodes. That’s not what we’re looking for here. We only want the children that are elements.
3 Each node has a nodeType attribute, which can be ELEMENT_NODE, TEXT_NODE, COMMENT_NODE, or any number of other values. The complete list of possible values is in the __init__.py file in the xml.dom package. (See Packages for more on packages.) But we’re just interested in nodes that are elements, so we can filter the list to only include those nodes whose nodeType is ELEMENT_NODE.
4 Once we have a list of actual elements, choosing a random one is easy. Python comes with a module called random which includes several useful functions. The random.choice function takes a non-empty sequence and returns one of its items at random; if the list is empty, it raises an IndexError. In this case, the list contains p elements, so chosen is now a p element selected at random from the children of the ref element we were given.
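If you want to try this outside the class, the following is a rough standalone sketch of the same idea. The function name random_child_element, the explicit empty-list check, and the sample XML are all assumptions added for illustration; they are not part of the original method.

```python
import random
from xml.dom import minidom

def random_child_element(node):
    """Return a randomly chosen direct child element of node."""
    # Keep only element nodes; text and comment nodes are skipped.
    choices = [e for e in node.childNodes
               if e.nodeType == e.ELEMENT_NODE]
    if not choices:
        # Illustrative guard: random.choice would raise IndexError here.
        raise ValueError("node has no child elements")
    return random.choice(choices)

# Hypothetical sample document with whitespace, a comment, and a nested p.
doc = minidom.parseString(
    "<ref>\n  <!-- note -->\n  <p>one<p>inner</p></p>\n  <p>two</p>\n</ref>")
ref = doc.documentElement

# childNodes mixes text (3), comment (8), and element (1) nodes.
kinds = [child.nodeType for child in ref.childNodes]
print(kinds)  # [3, 8, 3, 1, 3, 1, 3]

chosen = random_child_element(ref)
print(chosen.tagName, chosen.parentNode is ref)  # p True
```

Whichever p is picked, its parent is always the ref itself, so the nested p is never returned.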