In SAXON I use a Java method:
setElementHandler(pattern, handler)
The pattern is usually an element name but it can also be a
simple pattern of the form "parentname/childname". This is,
by serendipity, a tiny subset of the XSL pattern syntax. The
more sophisticated you make the pattern syntax, the more
difficult become the rules for what happens when an element
matches multiple patterns; but one approach would be to use
XSL patterns as defined.
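As a rough illustration (not SAXON's actual code), a pattern of the form "parentname/childname" can be checked against the names of the currently open elements, innermost last. SimplePattern and its methods are hypothetical. The specificity() method shows one possible conflict policy for the multiple-match problem, assumed for this sketch: prefer the pattern with more steps.

```java
import java.util.Deque;
import java.util.Iterator;

// Hypothetical helper, not SAXON's API: matches "name" or "parent/child"
// style patterns against the element names currently open in a serial pass.
final class SimplePattern {
    private final String[] steps;

    SimplePattern(String pattern) {
        this.steps = pattern.split("/");
    }

    // openElements lists the open element names from the root down to the
    // element being tested, which is the last entry.
    boolean matches(Deque<String> openElements) {
        if (openElements.size() < steps.length) {
            return false;
        }
        // Walk from the current element towards the root, comparing one
        // pattern step per level, starting with the last step.
        Iterator<String> it = openElements.descendingIterator();
        for (int i = steps.length - 1; i >= 0; i--) {
            if (!steps[i].equals(it.next())) {
                return false;
            }
        }
        return true;
    }

    // Assumed conflict policy for this sketch: when several patterns match,
    // prefer the one with more steps.
    int specificity() {
        return steps.length;
    }
}
```

With the open elements book, chapter, title, both "title" and "chapter/title" match, and this policy would prefer "chapter/title".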
The "handler" in SAXON is an instance of the class that will
handle the element, but it could equally well be the class itself.
(The advantage of passing an instance is that the same
handler class can then be used for several element types,
customised by some parameter in the constructor).
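A minimal sketch of why passing an instance helps: one handler class, WrapHandler, is registered for two element types with different constructor arguments. ElementHandler, WrapHandler and HandlerRegistry are hypothetical names, and the one-method callback is a simplification, not SAXON's actual callback signature.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified handler interface for this sketch.
interface ElementHandler {
    String startElement(String name);
}

// One handler class, customised per element type through its constructor.
final class WrapHandler implements ElementHandler {
    private final String htmlTag;

    WrapHandler(String htmlTag) {
        this.htmlTag = htmlTag;
    }

    @Override
    public String startElement(String name) {
        return "<" + htmlTag + " class=\"" + name + "\">";
    }
}

// Hypothetical registry standing in for setElementHandler(pattern, handler).
final class HandlerRegistry {
    private final Map<String, ElementHandler> handlers = new HashMap<>();

    void setElementHandler(String pattern, ElementHandler handler) {
        handlers.put(pattern, handler);
    }

    // Returns the handler registered for an element name, or null.
    ElementHandler lookup(String name) {
        return handlers.get(name);
    }
}
```

Registering new WrapHandler("h1") for "title" and new WrapHandler("h2") for "subtitle" reuses the same class for both element types.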
It would be easy to add a mechanism that drives these calls
from a property file or from PIs in the document; the
reverse would be less easy.
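One way the property-file idea might look, sketched under assumptions: each line maps a pattern to a short handler name, and the names are looked up in a table of factories. The file format, the handler names and the classes below are all hypothetical; a fuller version might instead name handler classes in the file and instantiate them by reflection.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical handler type used only for this sketch.
interface Handler {
    String describe();
}

final class NamedHandler implements Handler {
    private final String kind;

    NamedHandler(String kind) {
        this.kind = kind;
    }

    public String describe() {
        return kind + " handler";
    }
}

final class PropertyDrivenSetup {
    // Known handler implementations, keyed by the name used in the file.
    private static final Map<String, Supplier<Handler>> FACTORIES = Map.of(
            "heading", () -> new NamedHandler("heading"),
            "list", () -> new NamedHandler("list"));

    // Reads lines of the form  pattern=handlerName  and returns the
    // pattern-to-handler bindings that setElementHandler() would receive.
    static Map<String, Handler> load(String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Handler> bindings = new HashMap<>();
        for (String pattern : props.stringPropertyNames()) {
            String handlerName = props.getProperty(pattern);
            Supplier<Handler> factory = FACTORIES.get(handlerName);
            if (factory == null) {
                throw new IllegalArgumentException("unknown handler: " + handlerName);
            }
            bindings.put(pattern, factory.get());
        }
        return bindings;
    }
}
```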
> - what are the *non-graphics* methods for an element, e.g.:
> doneParse()/processXML()
> isValid() [i.e. non-XML validation - type, values, etc.]
> write() - recreate XML or other formats
SAXON calls the following element-handler callbacks:
- beforeGroup() before a group of one-or-more consecutive
elements of the same type
- startElement()
- characters()
- endElement()
- afterGroup() after a group of consecutive elements of the
same type
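To make the callback order concrete, here is a hypothetical driver that replays it for a flat run of sibling elements, recording each call as a string. For simplicity it looks ahead in an in-memory list to decide where a group ends; a streaming parser cannot look ahead, so it would have to defer afterGroup() until the next sibling's start tag or the parent's end tag arrives. GroupingDriver and Element are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver that replays the callback order for a flat list of
// sibling elements and records each call as a string.
final class GroupingDriver {
    // A sibling element reduced to its name and text content.
    static final class Element {
        final String name;
        final String text;

        Element(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    static List<String> run(List<Element> siblings) {
        List<String> calls = new ArrayList<>();
        for (int i = 0; i < siblings.size(); i++) {
            Element e = siblings.get(i);
            // A group starts where the previous sibling has a different name...
            boolean first = i == 0 || !siblings.get(i - 1).name.equals(e.name);
            // ...and ends where the next sibling has a different name
            // (this lookahead is what a streaming parser would have to defer).
            boolean last = i == siblings.size() - 1
                    || !siblings.get(i + 1).name.equals(e.name);
            if (first) {
                calls.add("beforeGroup(" + e.name + ")");
            }
            calls.add("startElement(" + e.name + ")");
            calls.add("characters(" + e.text + ")");
            calls.add("endElement(" + e.name + ")");
            if (last) {
                calls.add("afterGroup(" + e.name + ")");
            }
        }
        return calls;
    }
}
```

For siblings item, item, note this yields one beforeGroup()/afterGroup() pair around the two items and another around the note.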
The beforeGroup() and afterGroup() are there because I found
them useful in doing rendition, e.g. generating HTML lists
or tables.
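For instance, a handler for list items might open and close the enclosing HTML list in the group callbacks, so that consecutive items share one <ul>. ListItemHandler is hypothetical, and its no-argument callbacks omit the ElementInfo parameter that SAXON passes.

```java
// Hypothetical handler for list-item elements: the group callbacks open and
// close the surrounding <ul>, so consecutive items share one HTML list.
final class ListItemHandler {
    private final StringBuilder out;

    ListItemHandler(StringBuilder out) {
        this.out = out;
    }

    void beforeGroup() {
        out.append("<ul>\n");
    }

    void startElement() {
        out.append("  <li>");
    }

    void characters(String text) {
        // Escape the characters that are significant in HTML text content.
        out.append(text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
    }

    void endElement() {
        out.append("</li>\n");
    }

    void afterGroup() {
        out.append("</ul>\n");
    }
}
```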
The first parameter to each callback is an ElementInfo object
whose methods the handler can call. The ElementInfo provides:
- navigational methods to determine the context of the
element. These include getParent(), getAncestor(),
getPreviousSibling(), isFirstInGroup(), etc. etc. Arguably
these should all be in the DOM; but the DOM is not very
generous in its provision of "convenience" methods. Also,
SAXON can provide a lot of context information even in a
serial pass that is not building the DOM: it maintains
knowledge of the stack and of "previous siblings" of
elements on the stack, which I find is sufficient context
for many purposes.
- the ability to setUserData() and getUserData() on the
element instance. These will often be used to create
pointers from the "syntactic" (DOM) model of the document to
the "semantic" (business object) model.
- a set of methods to generate output. These rely on the
ability to associate a Writer with either an element type or
with an individual element instance, which provides the
capability to split a document into parts (which can then,
if you like, be recombined in a different sequence).
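The three kinds of facility above can be sketched together for a serial pass, assuming a much-simplified context object that keeps one frame per open element. SerialContext and its methods are hypothetical stand-ins that return element names rather than ElementInfo objects. Two simplifications are deliberate: user data lives only while the element is open, because no tree is kept, and Writers are bound only by element type (binding to an individual instance is omitted).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, much-simplified stand-in for an ElementInfo-style context
// object in a serial pass; not SAXON's actual classes or signatures.
final class SerialContext {
    // One frame per open element.
    private static final class Frame {
        final String name;
        final String previousSibling; // name of the preceding sibling, or null
        String lastClosedChild;       // updated as this element's children end
        Object userData;              // link to a "semantic" object

        Frame(String name, String previousSibling) {
            this.name = name;
            this.previousSibling = previousSibling;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    private final Map<String, Writer> writersByType = new HashMap<>();
    private final Writer defaultWriter;

    SerialContext(Writer defaultWriter) {
        this.defaultWriter = defaultWriter;
    }

    void startElement(String name) {
        Frame parent = stack.peek();
        stack.push(new Frame(name, parent == null ? null : parent.lastClosedChild));
    }

    void endElement() {
        Frame done = stack.pop();
        Frame parent = stack.peek();
        if (parent != null) {
            parent.lastClosedChild = done.name;
        }
    }

    // Navigation, answered from the stack alone without building a DOM.
    String getParent() {
        Iterator<Frame> it = stack.iterator(); // starts at the top of the stack
        it.next();                             // skip the current element
        return it.hasNext() ? it.next().name : null;
    }

    String getPreviousSibling() {
        return stack.peek().previousSibling;
    }

    boolean isFirstInGroup() {
        Frame current = stack.peek();
        return !current.name.equals(current.previousSibling);
    }

    // User data on the current element instance.
    void setUserData(Object data) {
        stack.peek().userData = data;
    }

    Object getUserData() {
        return stack.peek().userData;
    }

    // Output: a Writer bound to the current element's type, else the default.
    void setWriter(String elementName, Writer w) {
        writersByType.put(elementName, w);
    }

    void write(String text) {
        Writer w = writersByType.getOrDefault(stack.peek().name, defaultWriter);
        try {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Sending, say, footnotes to their own Writer and then concatenating that Writer's output before or after the main one is the kind of split-and-recombine the last item describes.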
There isn't a specific isValid(); instead, any of the other
methods can do validation and throw an exception if the
element is invalid.
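A minimal sketch of validation done inside an ordinary callback rather than a separate isValid(): a hypothetical handler for a price element checks the value in characters() and throws a checked exception. PriceHandler and ValidationException are illustrative names, not part of SAXON.

```java
// Hypothetical checked exception for non-XML (business-rule) validation.
final class ValidationException extends Exception {
    ValidationException(String message) {
        super(message);
    }
}

// Hypothetical handler for a price element: it does its normal work in
// characters() and validates the value there.
final class PriceHandler {
    private double total;

    void characters(String text) throws ValidationException {
        double value;
        try {
            value = Double.parseDouble(text.trim());
        } catch (NumberFormatException e) {
            throw new ValidationException("price is not a number: " + text);
        }
        // Written as !(value >= 0) so that NaN is rejected as well.
        if (!(value >= 0)) {
            throw new ValidationException("price must not be negative: " + text);
        }
        total += value;
    }

    double total() {
        return total;
    }
}
```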
One thing that isn't in SAXON but is needed is support for
IDREF or more generally for XPointer chasing within a
document or from one document to another. Don't know if this
is in scope for SOX!
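For the IDREF part only (not XPointer, and not links between documents), a single-document resolver might record IDs and references during the pass and resolve them at the end, since an IDREF can point forward to an element not yet seen. IdRefResolver is hypothetical and tracks element names only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical IDREF resolver for a single document. References are queued
// during the pass and resolved at the end, because an IDREF may point
// forward to an element that has not been seen yet.
final class IdRefResolver {
    private final Map<String, String> elementNameById = new HashMap<>();
    private final List<String[]> pending = new ArrayList<>(); // {referrer, idref}

    // Called for each attribute of type ID.
    void onId(String id, String elementName) {
        if (elementNameById.putIfAbsent(id, elementName) != null) {
            throw new IllegalStateException("duplicate ID: " + id);
        }
    }

    // Called for each attribute of type IDREF.
    void onIdRef(String referrer, String idref) {
        pending.add(new String[] {referrer, idref});
    }

    // Returns one "referrer -> target" entry per reference; a dangling
    // reference raises an exception.
    List<String> resolve() {
        List<String> links = new ArrayList<>();
        for (String[] ref : pending) {
            String target = elementNameById.get(ref[1]);
            if (target == null) {
                throw new IllegalStateException("dangling IDREF: " + ref[1]);
            }
            links.add(ref[0] + " -> " + target);
        }
        return links;
    }
}
```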
Mike Kay