Re: SOX

David Brownell (db@Eng.Sun.COM)
Fri, 02 Oct 1998 11:24:21 -0700


Peter Murray-Rust wrote:
>
> >- Generating customized content. It's no good solving only half
> > the problem, and customization during parsing is "easy" (as
> > suggested by all the results there).
>
> Could you expand on 'customized content'? Does this mean creating
> element-specific storage and additional methods?

There are basically three places content/information shows up: as
it's coming in (during parsing), when it's in memory, and when it's
going back out. A simple test of API sufficiency is how easy it is
to "round-trip" logical information. (I'll exclude DTD contents
from "logical" info, as well as stuff like entity boundaries. If we
assume schemas are happening, that simplifies APIs a lot!)
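
To make the round-trip test concrete, here is a minimal sketch in Python. It assumes the standard library's ElementTree stands in for "the API", and the helper names (logical_info, round_trips) are made up for illustration. ElementTree does not retain the DOCTYPE or entity boundaries, which happens to match the exclusions above.

```python
import xml.etree.ElementTree as ET

def logical_info(elem):
    """Reduce an element to its logical content (hypothetical helper).

    Keeps name, attributes, text, and children with their trailing text;
    surrounding whitespace is ignored.
    """
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        (elem.text or "").strip(),
        [(logical_info(child), (child.tail or "").strip()) for child in elem],
    )

def round_trips(xml_text):
    """Parse, write back out as XML text, re-parse, and compare."""
    first = ET.fromstring(xml_text)
    written = ET.tostring(first, encoding="unicode")
    second = ET.fromstring(written)
    return logical_info(first) == logical_info(second)
```

If round_trips() holds for the documents an application cares about, the API has at least carried the logical information in and back out again.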

Customized content "in-memory" is a broad category: basically
all the representations an application may choose. DOM for this,
a specialized representation for that. Multiple copies of this,
drop that entirely. Merge info from this document with that other
state. In object-oriented terminology, this means setting up the "object
model" for the application; yes, with additional methods. In such
a model, a document may be no more than a temporary artifact, and
the application may have rather strong requirements about how it
stores data represented in XML by elements, text, and so on.

(Clearly SAX does a very nice job of letting applications choose
the most appropriate representation for their data! It may not
be DOM, though for at least the simplest apps it should be.)
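
As a rough illustration of that point, the sketch below uses Python's xml.sax binding to build an application-specific representation directly from parse events, with no document tree in between. The order/item vocabulary, the LineItem class, and load_order are all invented for the example.

```python
import xml.sax
from dataclasses import dataclass

@dataclass
class LineItem:
    """Application object with its own methods (hypothetical)."""
    name: str
    quantity: int
    unit_price: float

    def total(self):
        return self.quantity * self.unit_price

class OrderHandler(xml.sax.ContentHandler):
    """Turns <item qty=".." price="..">name</item> events into LineItems."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None   # (quantity, unit_price) while inside <item>
        self._text = []

    def startElement(self, name, attrs):
        if name == "item":
            # Copy what we need now; the document itself is never stored.
            self._current = (int(attrs["qty"]), float(attrs["price"]))
            self._text = []

    def characters(self, content):
        if self._current is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            quantity, unit_price = self._current
            self.items.append(
                LineItem("".join(self._text).strip(), quantity, unit_price))
            self._current = None

def load_order(xml_text):
    handler = OrderHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.items
```

Here the document really is a temporary artifact: once load_order() returns, only the LineItem objects remain.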

Similarly for customizing the content when it goes back out. I'll
mention two basic approaches that folk seem interested in:

- Writing it back out as XML text. That was what I was
referring to above. Basically, one needs the ability
to transform back from an application-optimized version
in memory to an XML transfer syntax. It may not look
very much like the document did on input; that's often
the point of the processing!!

- Writing it back out in some other format, such as in
the guise of object or relational database entries.
One needs to be able to read this back in, of course.
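
Both approaches can be sketched briefly. The example below assumes a made-up in-memory state (per-customer totals, not a document tree) and writes it out once as XML text through xml.sax.saxutils.XMLGenerator and once as rows in a SQLite table, reading it back each time. The element, table, and function names are illustrative only.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

def write_xml(totals):
    """Emit the application state as XML text via SAX-style events."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("summary", AttributesImpl({}))
    for customer, amount in sorted(totals.items()):
        gen.startElement("total", AttributesImpl({"customer": customer}))
        gen.characters(repr(amount))
        gen.endElement("total")
    gen.endElement("summary")
    gen.endDocument()
    return out.getvalue()

def read_xml(xml_text):
    """Read the XML form back into the same in-memory shape."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {t.get("customer"): float(t.text) for t in root.iter("total")}

def write_rows(totals, conn):
    """Store the same state as relational rows instead of XML."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS totals (customer TEXT PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT OR REPLACE INTO totals VALUES (?, ?)", sorted(totals.items()))
    conn.commit()

def read_rows(conn):
    """Read the rows back into the same in-memory shape."""
    return dict(conn.execute("SELECT customer, amount FROM totals"))
```

Neither output needs to resemble whatever document the totals were originally computed from; what matters is that each form can be read back into the same application state.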

If the focus is on the data (usually a good idea :-) then the
usage of XML and DOM may not always be center stage. When I
talk with folk doing document management systems, they don't
always want XML except as an interchange format; ditto folk who
are working on commerce or workflow apps.

With an XML-focussed hat on for the moment, "customized content"
was intended to address roundtripping information as XML text.
But if done as well as it needs to be, it'll work with systems
that aren't focussed exclusively on XML.

- Dave