Re: Namespaces need versioning

Martin Bryan (mtbryan@sgml.u-net.com)
Tue, 20 Oct 1998 09:05:10 +0100


Charlie Reitzel wrote:

>I sense a great deal of hesitation among industry groups hoping to base
>data exchange standards on XML DTDs due to this issue. Is everybody giving
>up on validation and moving to DCD instead?

Hopefully not: but validation of dynamic DTDs is always going to be a
problem:-(

In an earlier message he said:

>>The fact that the prefix
>> is, for all practical purposes, a public identifier, controlled by the
>> DTD author, and tied to a specific version of the DTD, prevents document
>> authors from changing DTD versions without rewriting their documents.
>> - Not useful.

I would have said very useful. Authors don't want users to change the DTD
after the document instance has been written.

>The difference [from the old method of using PIs] is that for a prefix to
>be set in a DTD it has to be a #FIXED attribute of the element and is,
>therefore, defined by the author of the DTD and *not* the document.

Remember that you only need to use #FIXED if you need to ensure the value
never changes. The alternative is to provide a default value that will be
used unless the author makes a conscious choice to override it.
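
As a rough sketch of the two options (the element name and URL are reused
from the example further down; the two declarations are alternatives, so a
DTD would contain one or the other, not both):

```xml
<!-- Option A: #FIXED. The instance may omit the attribute or repeat this
     exact value; any other value is a validity error. -->
<!ATTLIST old:something
  xmlns:old CDATA #FIXED "http://www.myco.to/DTDs/something-v1.ent">

<!-- Option B: plain default. Applied when the instance omits the attribute,
     but the document author may supply a different value. -->
<!ATTLIST old:something
  xmlns:old CDATA "http://www.myco.to/DTDs/something-v1.ent">
```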

> If the prefix is set by the document author, then
> the exact, version-specific URL must also be included with each instance of
> the element using the prefix.

Why is this necessarily wrong? Once the instance is created the document
type used for its creation should never need to be changed. It is only when
you want to revise the instance and, as part of the revision, make it
conform to a revised version of the DTD, that you need to consider changing
the markup of the instance.

If the version information gets associated with a local namespace
declaration, e.g.

<old:something xmlns:old="http://www.myco.to/DTDs/something-v1.ent">
<new:something xmlns:new="http://www.myco.to/DTDs/something-v2.ent">

it will allow authors to mix two versions of the declarations for the same
thing in a single instance, if they really want to do so, whereas,

<!ELEMENT changes (old:something, new:something)>
<!ELEMENT old:something (model1)>
<!ATTLIST old:something
xmlns:old CDATA #FIXED "http://www.myco.to/DTDs/something-v1.ent">
<!ELEMENT new:something (model2)>
<!ATTLIST new:something
xmlns:new CDATA #FIXED "http://www.myco.to/DTDs/something-v2.ent">

would force them to use the correct models in the correct place.

>In the first case, the prefix must also be version specific and, therefore,
>cannot be used as a mechanism to insulate documents from changes in the DTD
>version.

Again I am lost. By using local namespace declarations in the instance, as
shown above, you can certainly insulate document instances from future
changes in the DTD, which is what I thought was wanted.

> In the second case, when the DTD version changes, every instance
> of its use must be updated to refer to the new version of the DTD fragment.

No: all the existing instances go on referencing the old version of the
DTD. What you need to do is to change the identifier of the updated version
so that it can be differentiated from that of the old version. Why should
existing document instances need to point to the revised DTD until they need
to be revised?

>Remember, the purpose here is to define standard libraries of DTD fragments
>which can be used for industry specific EDI transactions, standard data
>types, and the like. Such DTD fragments will occasionally undergo updates
>which will be, by and large, backward compatible - much like standard
>function libraries. Typically, only a small number of existing elements would
>be "broken" by the new DTD fragment.

Again my question is "why do old transactions need to be validated against
new DTDs?" (New XSL style sheets I understand the need for!)

> The document author simply has to
> change the prefix declaration to point to the new version of the DTD
> fragment and run the documents through the parser and/or application. With
> a well designed and maintained DTD fragment, only a few elements will be
> kicked out.

With the local namespace approach he has the advantage of only having to add
a new xmlns attribute to every element the parser kicked out. I don't see
how any simpler method could be devised.

>With the old spec, the prefix is associated with a URL uniquely within a
>document and cannot mean different things for different elements.

Now we come to the real nub of the problem, which is to stop users applying
the same namespace names twice if validation is to take place. But this
cannot happen because if the user tries to do it the parser will reject the
DTD as it will contain two definitions for the same element. If the element
is one whose role has not changed all you need to do is to remove the
duplicated definition. If it is one that has changed, then to validate it
using the revised definition you must assign a unique namespace qualifier
to the revised definition and use the extended name in all occurrences of
the revised element in the revised instance. It seems to me that exactly the
same problem occurs when you use processing instructions to define
namespaces within a DTD and then fail to qualify subelement definitions
correctly. It's just that you get more chances for it to happen. In what way
do you see the two processes as differing?
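
To make the duplicate-definition case concrete, here is a sketch using the
placeholder content models model1 and model2 from the earlier example. The
unqualified name "something" is assumed purely for illustration, and the two
cases are alternative DTD fragments, not parts of one DTD:

```xml
<!-- Case 1: a validating parser rejects this, because the element type
     "something" is declared twice. -->
<!ELEMENT something (model1)>
<!ELEMENT something (model2)>

<!-- Case 2: accepted, because the revised definition has its own
     qualified name and so no longer collides with the original. -->
<!ELEMENT something (model1)>
<!ELEMENT new:something (model2)>
<!ATTLIST new:something
  xmlns:new CDATA #FIXED "http://www.myco.to/DTDs/something-v2.ent">
```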

> The prefix declaration occurs in the document prolog, possibly in an
> external entity (i.e. DTD fragment).

Or it can occur in the local subset as a set of additional attribute
definitions for the elements affected. (To my mind this is by far the best
place to manage namespaces. Namespaces declared in external entities must
have their versions controlled from the relevant parameter entity
declaration, which is dangerous and prone to error.)
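
A minimal sketch of what that could look like, reusing the element names and
URLs from the earlier example (the external DTD name changes.dtd is
hypothetical, and the element content is only a placeholder):

```xml
<!DOCTYPE changes SYSTEM "changes.dtd" [
  <!-- Namespace versions are pinned here, in the local subset,
       next to the instance that depends on them. -->
  <!ATTLIST old:something
    xmlns:old CDATA #FIXED "http://www.myco.to/DTDs/something-v1.ent">
  <!ATTLIST new:something
    xmlns:new CDATA #FIXED "http://www.myco.to/DTDs/something-v2.ent">
]>
<changes>
  <old:something>...</old:something>
  <new:something>...</new:something>
</changes>
```

Since the local subset is read before the external one, and the first
declaration of an attribute is the one that counts, declarations placed here
take precedence over any for the same attribute in the external DTD.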

> Currently, the NS processing instruction is
> allowed as a custom, non-standard feature and may well not be inter-operable
> with other XML parsers or applications. That said, the processing
> instruction is *required* for validation to work (i.e. for an element
> instance to be correctly associated with its declaration by its qualified
> name).

The whole point is that the revised spec, because it is based solely on
attributes, will be accessible using SGML tools as well as XML tools. The
PI-specific one requires the addition of a specialized PI processor to any tool
that is going to use it.

Martin Bryan
The SGML Centre