The DOM Level 2 Filter and Iterator interfaces extend the functionality of the DOM to allow simple and efficient traversal of document subtrees, node lists, or the results of queries.
This proposal contains Filter and Iterator interfaces, but no query interfaces. A separate specification will be prepared for query interfaces, which will be query-language independent.
In several popular approaches to software design, iterators are considered a basic building block for building reusable software and software libraries. For instance, they are fundamental to the Design Patterns approach, STL, and the Java libraries. The main advantages of node iterators in the DOM are:
An iterator allows the nodes of a data structure to be returned sequentially. When an iterator is first created, calling nextNode() returns the first node. When no more nodes are present, nextNode() returns a null. It is important to remember that DOM structures may change as a document is loaded - when nextNode() finds no more nodes, it is still quite possible that further nodes may be added in the next instant. Since iterators do not know how to predict the future, there is no way to check whether further nodes may be added at any given time.Mutation and Iterators
Since the DOM permits editing, and an iterator may be active while the data structure it navigates is being edited, an iterator must behave gracefully in the face of change.
Filters allow the user to "filter out" nodes. Each filter contains a user-written function that looks at a node and determines whether or not it should be filtered out. To use a filter, you create an iterator that uses the filter. The iterator applies the filter to each node, and if the filter rejects the node, the iterator skips over the node as though it were not present in the document. Filters are easy to write, since they need not know how to navigate the structure on which they operate, and they can be reused for different kinds of iterators that operate on different data structures.
Let's use a filter to write code to find the named anchors in an HTML document. In HTML, an HREF can refer to any <A> element that has a NAME attribute. The first step is to write a filter that looks at a node and determines whether it is a named anchor:
class NamedAnchorFilter implements NodeFilter { boolean acceptNode(Node n) { if (n instanceof Element) { Element e = n; if (n.getAttribute("NAME") != NULL) { return true; } } return false; } }
To use this filter, create an instance of the filter and create an iterator using it:
NamedAnchorFilter naf; NodeIterator nit = document.createFilteredTreeIterator(naf);
At this point, the iterator will show only the named anchors in the document. Writing equivalent code without filters would be marginally simpler, and no less efficient. The advantage of using filters is that it allows reuse. For instance, if you have another part of your program that needs to find the named anchors in a NodeList, you can use the filter the same way you used it for the document:
NamedAnchorFilter naf; NodeIterator nit = nodelist.createFilteredTreeIterator(naf);
interface NodeIterator { Node nextNode(); void reset(); };
nextNode
Node
in the set being iterated over,
or NULL if there are no more members in that set.
reset
interface NodeFilter { boolean acceptNode(in Node n); };
acceptNode
n |
The node to check to see if it passes the filter or not. |
NodeIterator::nextNode()
method,
FALSE if this node is to be ignored.</
NodeIterators are used to step through a set of nodes, e.g. the set of nodes in a NodeList, the document subtree governed by a particular node, the results of a query, or any other set of nodes. The NodeIterator interface is very simple, containing only two methods.
interface NodeIterator { Node nextNode(); void reset(); };
The nextNode() method returns the next Node. The reset() method sets the internal position back to the start position. When a NodeIterator is first created or reset, the nextNode() method returns the first node. If there are no more nodes, nextNode() returns null.
Any iterator that returns nodes may implement the NodeIterator
interface. Users and vendor libraries may also choose to create
iterators that implement the NodeIterator interface.
Iterators can be used with or without filters, and flags can be set to determine which node types are traversed by an iterator. The factory methods that create iterators also determine the exact behavior of the iterator.
Node has methods that create iterators to traverse the node and its children in document order (depth first, pre-order traversal, which is equivalent to the order in which the start tags occur in the text representation of the document). The following Node methods create iterators:
NodeIterator createTreeIterator(); NodeIterator createSelectiveTreeIterator(int whatToShow); NodeIterator createFilteredTreeIterator(NodeFilter filter); NodeIterator createSelectiveFilteredTreeIterator(int whatToShow, NodeFilter filter);
In the above factories, the "whatToShow" flag is allows parameters to determine whether entities are shown as expanded, and which node types are shown:
public static final int TW_DEFAULT = 0x0022; public static final int TW_ALL = 0xFFFF; public static final int TW_ELEMENT = 0x0002; public static final int TW_PI = 0x0008; public static final int TW_COMMENT = 0x0010; public static final int TW_TEXT = 0x0020; public static final int TW_ENTITYREF = 0x0040;
These flags can be combined using OR:
Node iter=factory.create(root, TW_ELEMENT | TW_PI | TW_COMMENT | TW_EXPANDED);
The default view shows elements and text, but no other nodes (attributes are retrieved from the elements). The constant TW_DEFAULT is a mask that defines this default view.
If TW_ENTITYREF is not set, entities are expanded. If TW_ENTITYREF is set, entity references will be encountered by the iterator. There is no setting that shows both the entity reference and its expansion.
Several people have suggested that the functionality of whatToShow be implemented using filters. We feel that it is better to implement them using iterators, since it makes it possible to provide a more efficient implementation. A filter must examine each node individually; an iterator can make use of internal data structures to examine only those nodes that are desired.
Containers should also be able to create iterators. For instance, NodeList will contain the following method, which creates an iterator to traverse the list:
NodeIterator createListIterator();
In a later version of Level 2, when queries are supported, we will also want factory methods that can issue a query and provide an iterator for the result set. These methods may look something like this:
NodeIterator createTreeQueryIterator(DOMString query); NodeIterator createListQueryIterator(DOMString query);
Filters are simply functions that "filter out" nodes. If an iterator is given a filter, before it returns the next node, it applies the filter. If the filter says to show the node, it returns it; otherwise, the iterator looks for the next node. Node filters implement the NodeFilter interface:
interface NodeFilter { boolean acceptNode( Node n ); }
If acceptNode() returns true, the node is accepted; otherwise, it is rejected.
Filters do not need to know how to iterate, nor do they need to know anything about the data structure that is being iterated. This makes it very easy to write filters, since the only thing they have to know how to do is evaluate a single node. One filter may be used with a number of different kinds of iterators, encouraging code reuse.
The DOM does not specify any concrete NodeFilter classes; NodeFilter
is just an interface that allows users to write their own filters.