RE: DTD Question

Rick Jelliffe (ricko@allette.com.au)
Thu, 22 Oct 1998 16:13:35 +1000


> From: owner-xml-dev@ic.ac.uk [mailto:owner-xml-dev@ic.ac.uk]On Behalf Of
> John Cowan

> Indeed. In fact, the Right Thing turns out to be CommonName and Surname;
> my CommonName is "John Cowan" and my Surname is "Cowan". See the
> X.500 Person schema.

That is probably as good a system as any: trying to break names down into
parts that have international usefulness is a mug's game. Here in Taiwan
they use name-of-family (comes first) and then name-in-family, but people
very commonly adopt a Western name. I had a Japanese friend who had 6
different aliases he used in different contexts (including "Cindy" as in
Cindy Crawford, which was a little perplexing). In some South-East Asian
countries people sometimes have only one name, and don't use name-of-family.
And, of course, in North Europe in times past people derived their surname
from their father's name or their location: probably that still goes on
somewhere in the world. In the West, women often retain their maiden names
for work. Westerners with unhyphenated double-barrelled names often only use
one (e.g. Mr John Price Pontifex might always use Mr John Price except in
official documents).

So consider just allowing:
<!ELEMENT person ( name+ )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name type NMTOKENS "CommonName">
<!-- Example roles are
OfficialName, NickName, FamilyName, Surname, GivenName, Alias,
FirstName, MiddleName, LastName -->
...
<person><name>Rick Jelliffe</name>
<name type="Surname LastName FamilyName">Jelliffe</name>
<name type="GivenName FriendName FirstName">Rick</name>
<name type="OfficialName Fullname">Jelliffe, Richard Alan</name>
<name type="StageName">The King of Bubble and Skank</name>
<name type="NickName">Papa Shakey</name>
<name type="Alias">Brunopoly</name>
<name type="AustralianMistake">Jeff Jolly</name>
<name type="JapaneseMistake">Lick Jellyfish</name>
<name type="TaiwaneseMistake">Rick, Jelly Furry</name>
</person>

I think this example also demonstrates the difference between a DTD and a
schema. The DTD mainly gives information related to markup parsing and
moving constants into a header (e.g. internal entities, FIXED and default
attribute values): in particular, for XML the information it gives is that
the type attribute should be interpreted as a list of tokens. The comment
allows interesting parts of the schema to be written down.
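
As a rough sketch of what "a list of tokens" means to an application, here
is some Python (my choice of language, not anything the DTD implies). It
assumes no DTD is attached to the instance, so the "CommonName" default
from the ATTLIST is applied by hand; the function name names_with_role is
made up for illustration.

```python
import xml.etree.ElementTree as ET

PERSON = """<person><name>Rick Jelliffe</name>
<name type="Surname LastName FamilyName">Jelliffe</name>
<name type="GivenName FriendName FirstName">Rick</name>
<name type="OfficialName Fullname">Jelliffe, Richard Alan</name>
<name type="NickName">Papa Shakey</name>
</person>"""

# Mirrors the ATTLIST default. No DTD is attached to the string above,
# so the default has to be supplied by the application (an assumption).
DEFAULT_ROLE = "CommonName"

def names_with_role(person, role):
    """Return the text of every <name> whose type token list contains role."""
    found = []
    for name in person.findall("name"):
        # NMTOKENS is a whitespace-separated list, so split() recovers the tokens.
        tokens = name.get("type", DEFAULT_ROLE).split()
        if role in tokens:
            found.append(name.text)
    return found

person = ET.fromstring(PERSON)
print(names_with_role(person, "CommonName"))  # ['Rick Jelliffe']
print(names_with_role(person, "LastName"))    # ['Jelliffe']
```

Note that the token comparison is case-sensitive, so "SurName" and
"Surname" would count as different roles.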

But there may always be constraints such as "every person must have an official
name beginning with J and be sorted in alphabetical order" which are not
expressible by DTDs, and could only be expressed in a schema language which
has access to the DOM (e.g. using XLL) and can pattern match and can save
values: in other words, a fairly general purpose expression language.
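
To make that concrete, here is a minimal sketch in Python of checking the
example constraint with an ordinary programming language standing in for
such a schema language. The <people> wrapper element, the sample names and
the function names are all hypothetical, and I read "sorted" as meaning
that persons appear in alphabetical order of their official names.

```python
import xml.etree.ElementTree as ET

def official_name(person):
    """Return the first <name> whose type tokens include OfficialName, else None."""
    for name in person.findall("name"):
        if "OfficialName" in name.get("type", "CommonName").split():
            return name.text
    return None

def check_people(root):
    """Return a list of violations; an empty list means both constraints hold."""
    problems = []
    officials = []
    for index, person in enumerate(root.findall("person"), start=1):
        official = official_name(person)
        if official is None:
            problems.append(f"person {index}: no OfficialName")
            continue
        # Pattern match on the content, which a DTD cannot do.
        if not official.startswith("J"):
            problems.append(f"person {index}: OfficialName does not begin with J")
        officials.append(official)
    # The ordering rule needs values saved from across the whole document.
    if officials != sorted(officials):
        problems.append("persons are not in alphabetical order")
    return problems

# Hypothetical sample documents.
GOOD = ET.fromstring(
    '<people>'
    '<person><name type="OfficialName">Jelliffe, Richard Alan</name></person>'
    '<person><name type="OfficialName">Jones, Mary</name></person>'
    '</people>'
)
BAD = ET.fromstring(
    '<people>'
    '<person><name type="OfficialName">Jones, Mary</name></person>'
    '<person><name type="OfficialName">Adams, Jo</name></person>'
    '</people>'
)

print(check_people(GOOD))  # []
print(check_people(BAD))   # two violations: the J rule and the ordering rule
```

The check needs exactly the three abilities mentioned above: access to the
tree, pattern matching, and saved values.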

In fact, I doubt that the idea of "a schema" for a document is the way to
go. It seems to me that every different use of a document might require some
schema optimized for that use. Data entry requires a lexical validation schema
(e.g. to make sure a date is in a particular format). Text processing
requires another (e.g. that the document is valid against a DTD so that all
required elements are found and in the expected positions, or that the data
has been sorted). Data delivery might require another (e.g. that the
"html:alt" attribute value for an image is generated in the language of the
recipient, or that the bounding box of an image is not greater than the
column width it has to fit into).
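
A small Python sketch of two such use-specific checks over the same
document follows. Everything in it is an assumption made for illustration:
the <article>, <date> and <image> elements, the width attribute, the
YYYY-MM-DD date format and the function names.

```python
import re
import xml.etree.ElementTree as ET

# Assumed lexical format for dates: YYYY-MM-DD.
DATE_FORMAT = re.compile(r"\d{4}-\d{2}-\d{2}")

def check_data_entry(doc):
    """Data-entry check: return the text of every <date> not in the assumed format."""
    return [d.text for d in doc.iter("date")
            if not DATE_FORMAT.fullmatch(d.text or "")]

def check_delivery(doc, column_width):
    """Delivery check: return the src of every <image> wider than the column."""
    return [img.get("src") for img in doc.iter("image")
            if int(img.get("width", "0")) > column_width]

# Hypothetical document, checked two different ways.
DOC = ET.fromstring(
    '<article>'
    '<date>1998-10-22</date><date>22 Oct 1998</date>'
    '<image src="a.gif" width="300"/><image src="b.gif" width="500"/>'
    '</article>'
)

print(check_data_entry(DOC))                  # ['22 Oct 1998']
print(check_delivery(DOC, column_width=400))  # ['b.gif']
```

The same document passes or fails depending on which use it is being
checked for, and the delivery check even depends on a value (the column
width) that lives outside the document.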

Perhaps these can fit into "data typing", "document typing" and "publication
typing", where DCD, DTD and XSL respectively can fit the bill. Too early to say.

Rick Jelliffe