Re: Questions on DCD

Dean Roddey (roddey@us.ibm.com)
Fri, 2 Oct 1998 14:38:28 -0400


>>6) I know this is not going to make me popular, but I think that there
>>are too many datatypes
>
>I do too, and I've warned my co-editors to expect massive amputations
>in the committee process, if DCD ever gets taken up.
>

I think everyone thinks that this is the case. The problem is that if you
have too few, then many applications will require ad hoc and inconsistent
extensions of the mechanism. Too many and you bog down all processors with
having to deal with lots of types that most people don't use (and you still
don't cover all the bases.)

I made a proposal related to the whole type validation mechanism, which
really extends beyond DCD but would address a lot of its issues as well as
allow for much more extensibility. I'd like to post it here for comment
(with some internal information removed) in order to perhaps bash it out as
a general type validation mechanism. However, right now it's in a Notes
database and if I post it here it's going to look so horrible that it will
probably be unreadable.

But, just in case, here it is. I'd appreciate any comments on it (if you
can read it.) Posting stuff to this mailing list from Notes generally seems
to totally destroy its format.

Overview
This document is related to the DCD proposal made to the W3C, and more
specifically the DCD 'constraint mechanism'. DCD provides a means for
indicating constraints on the values of elements and attributes. This
mechanism is provided via the Min/MinExclusive and Max/MaxExclusive
properties, and has 'falls within a single range' semantics. In other words
each element or attribute definition can express a value range (inclusive
or exclusive) within which the value of each instance of that element or
attribute in the target XML file must fit, in order to meet the constraints.
The purpose of this document is to propose an alternative constraint
mechanism which we feel is no more complex and far more flexible than the
one currently proposed.
Just to provide a refresher course for the existing DCD constraint
proposal, here is an example snippet from a DCD which defines an attribute
which is of type int and which has a range of 1 to 10, inclusive.
<AttributeDef Name="Foo" Datatype="int" Min="1" Max="10"/>
The same mechanism applies to element definitions as well. The content of
the min/max values must of course make sense for the declared data type of
the element or attribute.
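
To make the 'falls within a single range' semantics concrete, here is a
minimal sketch in Java. The class name RangeConstraint and its accepts()
method are hypothetical (they are not part of the DCD proposal or of XML4J),
and values are compared as doubles purely for brevity.

```java
// Hypothetical illustration of DCD-style single-range checking.
// Not taken from the DCD proposal or any parser implementation.
final class RangeConstraint {
    private final double min;
    private final double max;
    private final boolean minExclusive;
    private final boolean maxExclusive;

    RangeConstraint(double min, boolean minExclusive,
                    double max, boolean maxExclusive) {
        this.min = min;
        this.minExclusive = minExclusive;
        this.max = max;
        this.maxExclusive = maxExclusive;
    }

    // Returns true when the raw attribute/element text lies in the range.
    // Text that is not a number is treated as failing the constraint.
    boolean accepts(String rawValue) {
        double v;
        try {
            v = Double.parseDouble(rawValue.trim());
        } catch (NumberFormatException e) {
            return false;
        }
        boolean aboveMin = minExclusive ? v > min : v >= min;
        boolean belowMax = maxExclusive ? v < max : v <= max;
        return aboveMin && belowMax;
    }

    public static void main(String[] args) {
        // Mirrors Min="1" Max="10" from the snippet above (both inclusive).
        RangeConstraint foo = new RangeConstraint(1, false, 10, false);
        System.out.println(foo.accepts("10")); // true
        System.out.println(foo.accepts("11")); // false
    }
}
```

Anything beyond one contiguous range (two disjoint ranges, "not a multiple
of 5", and so on) cannot be expressed with these four properties.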
After doing a quick and dirty demonstration program of DCD (based on the
existing functionality in the XML4J parser), the XML Team at JTC-SV would
like to put forward a proposal which we feel makes the constraint mechanism
more useful and extensible, without placing an undue burden on the common
case of a simple document with simple validation needs (i.e. no constraints
required, just structural validation.)
The overall goals of this constraint mechanism are:
Minimum code support requirements in the core parser architecture
  (i.e. minimal cost for those who don't use it)
Reasonable implementation size and complexity
Open endedness and flexibility for the uncommon case and user
Simplicity of understanding and use for the common case and user
We feel that a constraint mechanism is probably achievable which meets
these requirements. As the likely targets of such a proposal, we obviously
do not want to propose something which is not achievable and maintainable
with reasonable effort, so we certainly hope not to contribute to the
growing perception that 'deep thoughts' in the XML world are out of hand,
and real world implementation is suffering for it.
Driving Forces
The primary driving force of this proposal is a belief that the constraint
mechanism currently expressed in the DCD proposal is insufficient to meet
more than a small fraction of the needs of the possibly quite wide target
audience. We understand the reasoning behind this initial proposal, i.e. to
maintain a level of simplicity that would increase the likelihood of
acceptance and implementation; however, we feel that the current mechanism
is sufficiently limited that its implementation might be counterproductive.
The reasoning is that almost any real world application of the technology
would require some amount of manual extension. Such extensions are not
possible within the existing specification, and hence would almost
certainly be implemented in a haphazard way, hindering interoperability of
implementations.
Also, since any such haphazard extensions have the potential of becoming
de facto standards, we would like to avoid having such 'design by
aggregation' imposed upon us by the marketplace. By providing a more
extensible mechanism up front, we would hope to avoid this scenario, since
any reasonable extension of the mechanism could be made without stepping
outside the system provided.
And thirdly, though obviously useful, the limited constraints expressible
in the existing system do not seem sufficient to warrant the effort of
implementing a constraint mechanism in the parser. Such a mechanism is
non-trivial and imposes some minimum of unavoidable overhead on the parser.
For such an effort to be made and such a performance burden to be accepted,
we would very much prefer to achieve more powerful constraint checking for
our buck.
The Basic Concept
Our concept is based loosely upon the existing experience of spreadsheets,
which are probably the prototypical example of simple 'application
development' for the end user. In particular, we borrow the 'function'
concept of the spreadsheet, which provides an easy to understand mechanism
for doing simple arithmetic and logic operations. These functions are in
the form of a simple function call which evaluates its parameters and
returns a boolean pass/fail result.
So, at its simplest level, a constraint expression would look something
like this:
<AttributeDef Name="Foo" Datatype="int" Constraint="IsInRange(1,10)" />
In this scenario, a "Constraint" property is introduced. Its value is a
string which expresses some constraint by way of a 'function syntax'
expression. In this case the function is "IsInRange" and it takes two
values, the minimum and maximum values of the range. All constraints will
be of this form.
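
As an illustration of how small the 'function syntax' is, the following
Java sketch splits a Constraint string into an optional '!' negation flag,
a function name, and its top-level comma separated parameters. The class
ConstraintCall is hypothetical and is not part of the proposal; it also
assumes that quoted parameters do not themselves contain commas or
parentheses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for strings such as "IsInRange(1,10)".
final class ConstraintCall {
    final boolean negated;      // true when the call is prefixed with '!'
    final String name;          // e.g. "IsInRange"
    final List<String> params;  // raw parameter text, trimmed

    private ConstraintCall(boolean negated, String name, List<String> params) {
        this.negated = negated;
        this.name = name;
        this.params = params;
    }

    static ConstraintCall parse(String expr) {
        String s = expr.trim();
        boolean negated = s.startsWith("!");
        if (negated) {
            s = s.substring(1).trim();
        }
        int open = s.indexOf('(');
        if (open <= 0 || !s.endsWith(")")) {
            throw new IllegalArgumentException("Not a function call: " + expr);
        }
        String name = s.substring(0, open).trim();
        String body = s.substring(open + 1, s.length() - 1);

        // Split on commas that are not nested inside another call.
        List<String> params = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                params.add(body.substring(start, i).trim());
                start = i + 1;
            }
        }
        String last = body.substring(start).trim();
        if (!last.isEmpty() || !params.isEmpty()) {
            params.add(last);
        }
        return new ConstraintCall(negated, name, params);
    }

    public static void main(String[] args) {
        ConstraintCall call = ConstraintCall.parse("IsInRange(1,10)");
        System.out.println(call.name + " " + call.params); // IsInRange [1, 10]
    }
}
```

Because nested calls are kept intact as single parameters, the same routine
can be applied again to each parameter of a combining function.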
High Level Implementation
The implementation of this proposed validation scheme is relatively
straightforward. It can be delivered in three conceptual layers, each of
which provides increasing levels of sophistication for increasing levels of
effort and coding skill. These layers will be discussed here in detail, as
well as how those layers can be fit together and 'delivered'.
Intrinsic Functions
At the core of the validation system there will be a set of intrinsic
functions, which are provided with the parser implementation, and which
should be required in any DCD implementation by the specification. This
will ensure interoperability of core validation services. These functions
will be selected for their high 'bang for the overhead buck' appeal, i.e.
they will meet hopefully 90% of the common case needs with minimal overhead
(since they will be packaged with the parser core.)
A likely set of core functions would be:
Name                      Example
IsEqualTo                 Constraint="IsEqualTo(5.0)"
IsGreaterThan             Constraint="IsGreaterThan(&BaseLevel;)"
IsLessThan                Constraint="IsLessThan(25)"
IsInRange                 Constraint="IsInRange(&ValidRange;)"
IsOneOf                   Constraint="IsOneOf(Blue, Red, Pink)"
IsTrue                    Constraint="IsTrue()"
IsFalse                   Constraint="IsFalse()"
IsEven, IsOdd             Constraint="IsEven()"
IsMultipleOf              Constraint="IsMultipleOf(255)"
IsInMultiRange            Constraint="IsInMultiRange(1-10, 90-100)"
IsStrEqualTo              Constraint="!IsStrEqualTo('We the People')"
IsDigit, IsChar, etc...   Constraint="IsHexDigit()"
And, Or, Xor              Constraint="And(IsInRange(1, 90), !IsMultipleOf(5))"

This set of functions should meet the needs of quite a wide range of
applications, though there might be a couple more fundamental ones that
could or should be added. Though the semantics of these are quite obvious,
a little discussion of the finer points is presented before we move on.
First of all, notice how these functions leverage the power of general
entities, by allowing flexible replacement of function parameters. This
capability will provide a lot of power to modify the validation over time
without changing the DCD itself. This is not in and of itself an
improvement over the existing validation scheme, since entity replacement
is inherent to XML; however, the more expressive the validation mechanism,
the more leverage is gained.
Secondly, note the second to the last line, which describes the 'character
type' functions. These can be mapped pretty directly to the language
support for such things, and will provide a nice way to check a lot of
characteristics of single character fields. There are language and locale
issues involved here, which will be discussed at the end of this document.
Also, note the last line which defines some boolean logic functions. These
can be intrinsically handled by the processor itself, and will support much
more complex constraints built from more basic ones. As long as we limit
the nesting to something reasonable such as a single level, the complexity
of these functions will be quite small. They will merely be a recursive
container and invoker of other functions, with a little evaluation of the
boolean results of each one. Though the example shows two parameters, there
is no reason why it cannot easily allow an open ended number of
subexpression parameters.
Negation is implemented by the '!' prefix before a function, as in the last
line where the function checks that the value is both in the range 1..90
inclusive and is not a multiple of 5. This provides a lot of flexibility
and avoids the need for having explicit Not versions of functions, and the
implementation of it is ultra trivial. The IsStrEqualTo() example checks
that the value is not equal to "We the People".
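
The 'container and invoker' idea, together with '!' negation, can be
sketched in Java as follows. Everything here is an assumption made for
illustration: the BoolFunctions class, the use of
java.util.function.Predicate to stand in for a constraint function, and in
particular the reading of Xor over more than two subexpressions as "exactly
one passes", which the text above does not define.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical combinators; each constraint is modelled as a
// Predicate<String> over the raw attribute or element value.
final class BoolFunctions {
    // And: every subexpression must pass.
    @SafeVarargs
    static Predicate<String> and(Predicate<String>... subs) {
        List<Predicate<String>> list = Arrays.asList(subs);
        return v -> list.stream().allMatch(p -> p.test(v));
    }

    // Or: at least one subexpression must pass.
    @SafeVarargs
    static Predicate<String> or(Predicate<String>... subs) {
        List<Predicate<String>> list = Arrays.asList(subs);
        return v -> list.stream().anyMatch(p -> p.test(v));
    }

    // Xor: assumed here to mean exactly one subexpression passes.
    @SafeVarargs
    static Predicate<String> xor(Predicate<String>... subs) {
        List<Predicate<String>> list = Arrays.asList(subs);
        return v -> list.stream().filter(p -> p.test(v)).count() == 1;
    }

    // Two simple stand-ins for intrinsic functions (integer values only;
    // non-numeric input throws NumberFormatException in this sketch).
    static Predicate<String> isInRange(long lo, long hi) {
        return v -> {
            long n = Long.parseLong(v.trim());
            return n >= lo && n <= hi;
        };
    }

    static Predicate<String> isMultipleOf(long m) {
        return v -> Long.parseLong(v.trim()) % m == 0;
    }

    public static void main(String[] args) {
        // And(IsInRange(1, 90), !IsMultipleOf(5)); '!' becomes negate().
        Predicate<String> c = and(isInRange(1, 90), isMultipleOf(5).negate());
        System.out.println(c.test("42")); // true
        System.out.println(c.test("45")); // false
    }
}
```

Negation needs no per-function support at all in this model: it wraps
whatever function follows the '!'.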
The amount of code to implement these intrinsic functions, above and beyond
the basic amount of infrastructure required to support constraint checking
at all, is trivial. Most of them will resolve to single lines of evaluation
code.
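
To give a feel for how little code is involved, here is a Java sketch of a
few of the listed intrinsics as plain static methods. The Intrinsics class
and its method signatures are hypothetical, values are assumed to be
integers where a number is needed, and the "lo-hi" range notation is read
literally from the IsInMultiRange example (so negative bounds are not
handled).

```java
import java.util.Arrays;

// Hypothetical evaluation code for a handful of intrinsic functions.
final class Intrinsics {
    // IsOneOf(Blue, Red, Pink)
    static boolean isOneOf(String value, String... allowed) {
        return Arrays.asList(allowed).contains(value.trim());
    }

    // IsEven() / IsOdd()
    static boolean isEven(String value) {
        return Long.parseLong(value.trim()) % 2 == 0;
    }

    static boolean isOdd(String value) {
        return !isEven(value);
    }

    // IsHexDigit(): a single character that is a digit in radix 16.
    // Character.digit follows Unicode rules, so locale questions remain.
    static boolean isHexDigit(String value) {
        return value.length() == 1
            && Character.digit(value.charAt(0), 16) >= 0;
    }

    // IsInMultiRange(1-10, 90-100): inclusive, non-negative ranges.
    static boolean isInMultiRange(String value, String... ranges) {
        long n = Long.parseLong(value.trim());
        for (String range : ranges) {
            String[] ends = range.trim().split("-");
            long lo = Long.parseLong(ends[0].trim());
            long hi = Long.parseLong(ends[1].trim());
            if (n >= lo && n <= hi) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isOneOf("Red", "Blue", "Red", "Pink"));  // true
        System.out.println(isInMultiRange("50", "1-10", "90-100")); // false
    }
}
```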
To ensure openness, the function mechanism will probably be based on the
namespace proposal as well. So, in reality, the above functions would
actually be part of a "Http://W3C.Constraints/DCDStd" namespace for
instance. This will allow a convenient partitioning of the function
namespace, as well as a very flexible way of providing alternative
processing by just mapping the namespace prefix to another URI that maps to
a different set of functions!
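
The prefix remapping idea can be sketched as a two level lookup: prefix to
namespace URI, then URI to a set of named functions. The
FunctionNamespaces class below is hypothetical, as is the second URI
(http://example.org/constraints/lenient) and the 'lenient' meaning given
to IsTrue; only the first URI comes from the text above. Functions are
modelled as parameterless Predicate<String> checks to keep the sketch
short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical registry: prefix -> namespace URI -> function name.
final class FunctionNamespaces {
    private final Map<String, Map<String, Predicate<String>>> byUri =
        new HashMap<>();
    private final Map<String, String> prefixes = new HashMap<>();

    // Adds a function to the set identified by the given namespace URI.
    void register(String uri, String name, Predicate<String> fn) {
        byUri.computeIfAbsent(uri, k -> new HashMap<>()).put(name, fn);
    }

    // Maps (or remaps) a prefix to a namespace URI.
    void bindPrefix(String prefix, String uri) {
        prefixes.put(prefix, uri);
    }

    // Resolves a name of the form "prefix:FunctionName".
    Predicate<String> resolve(String qualifiedName) {
        int colon = qualifiedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException(
                "Expected prefix:Name, got " + qualifiedName);
        }
        String uri = prefixes.get(qualifiedName.substring(0, colon));
        Map<String, Predicate<String>> fns =
            (uri == null) ? null : byUri.get(uri);
        Predicate<String> fn =
            (fns == null) ? null : fns.get(qualifiedName.substring(colon + 1));
        if (fn == null) {
            throw new IllegalArgumentException(
                "Unknown function: " + qualifiedName);
        }
        return fn;
    }

    public static void main(String[] args) {
        String std = "Http://W3C.Constraints/DCDStd";
        String lenient = "http://example.org/constraints/lenient"; // made up
        FunctionNamespaces ns = new FunctionNamespaces();
        ns.register(std, "IsTrue", v -> v.trim().equals("true"));
        ns.register(lenient, "IsTrue", v -> {
            String s = v.trim().toLowerCase();
            return s.equals("true") || s.equals("yes") || s.equals("1");
        });

        ns.bindPrefix("dcd", std);
        System.out.println(ns.resolve("dcd:IsTrue").test("yes")); // false
        ns.bindPrefix("dcd", lenient); // same prefix, different functions
        System.out.println(ns.resolve("dcd:IsTrue").test("yes")); // true
    }
}
```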
Third Party Functions
The next level of support would be the ability to plug in third party
validation functions. This would open up the system considerably by
providing a well defined delivery mechanism for functions, to which third
parties could write. As long as these functions can be expressed with the
simple function syntax described above, they can be as complex as the
developer wishes them to be and the user is willing to deal with.
Support for third party functions requires a well defined interface to
which they can be developed. This required interface is actually quite
simple and convenient, and will have very few semantic demands to be met.
The very simple semantics ensure that open endedness is not compromised by
the interface. A proposed interface is described below.
Custom Functions
At the upper end of the spectrum are custom applications which would
provide their own functions for doing very domain specific constraint
validation. These could include PIN number validation, database lookup of
names or ids or social security numbers, and on and on. Our proposal
provides a flexible back door for the validation mechanism to accommodate
the most complex imaginable validation, without increasing the overhead of
the common case by a single CPU cycle.
Though there is no limit to the complexity or sophistication that these
custom functions could achieve, there are no implementation issues here
which go beyond those of the third party function development scenario
discussed above, at least from our perspective as the parser provider.
Implementation Details
This section puts forward a specific example implementation that we believe
will meet all of the requirements and fulfill all of the promise of the
proposed system. Example Java implementations are presented, but the
implementation would be easily done in C++ or any other quality object
oriented language.
The Function Interface
A function is represented in the implementation as a simple abstract
interface class. The interface is extremely simple, but allows the system
to manage them and invoke them generically and reasonably efficiently. For
this discussion, the interface is called ValFunction.
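
The interface declaration itself is not reproduced at this point, so the
following is only an inferred sketch: the three method signatures are taken
from the methods that the ValidSalary example class implements, and the
proposal's actual ValFunction may differ. The NonEmpty class is a made-up
implementation that exists only to exercise the interface.

```java
// Inferred from the ValidSalary example; not the authoritative definition.
interface ValFunction {
    // Called with the comma separated function parameters.
    void Parse(String[] astrParams);

    // Returns true when the given value satisfies the constraint.
    boolean bEvaluate(String strValue);

    // Human readable description of the constraint, for error reporting.
    String strConstrainDesc();
}

// Hypothetical function: passes when the value is not blank.
final class NonEmpty implements ValFunction {
    public void Parse(String[] astrParams) {
        if (astrParams.length != 0) {
            throw new IllegalArgumentException("NonEmpty takes no parameters");
        }
    }

    public boolean bEvaluate(String strValue) {
        return !strValue.trim().isEmpty();
    }

    public String strConstrainDesc() {
        return "non-empty";
    }

    public static void main(String[] args) {
        ValFunction fn = new NonEmpty();
        fn.Parse(new String[0]);
        System.out.println(fn.bEvaluate("abc")); // true
        System.out.println(fn.bEvaluate("   ")); // false
    }
}
```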
A concrete implementation of it would look something like this in Java.
This very simple class would allow the functions to be managed and invoked
very simply and easily. Of course this is not a very complex example, and
could be achieved by way of an intrinsic IsLessThan() function, but it
shows how one would implement a simple function class.
class ValidSalary implements ValFunction
{
    // Default ctor only because they are factory created
    ValidSalary()
    {
    }

    // 'Parsing' method
    public void Parse(String[] astrParams)
    {
        // We only take one function param of maximum salary
        if (astrParams.length != 1)
            throw new IllegalArgumentException("ValidSalary takes one parameter");

        // Try to convert to our max salary member
        fMaxSal = Double.parseDouble(astrParams[0]);

        // Format our constraints into the description string
        strDesc = "< " + fMaxSal;
    }

    // Evaluation method
    public boolean bEvaluate(String strValue)
    {
        // Convert the string to a double and compare to max
        double fTmp = Double.parseDouble(strValue);
        return (fTmp < fMaxSal);
    }

    // Reporting method for errors
    public String strConstrainDesc()
    {
        return strDesc;
    }

    // Private data
    private double fMaxSal = 0;
    private String strDesc = "";
}

The constructor is a default since functions will generally be 'factory
created'. However, the factory can certainly invoke them with particular
parameters. More on this below in the "Function Bundle Interface" section.
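
As a rough illustration of 'factory created' functions, the Java sketch
below looks a function up by name, default constructs it, and hands it its
parameters once, after which it can be evaluated any number of times. This
is not the "Function Bundle Interface" itself: FunctionFactory, MaxLength,
and the registration scheme are all hypothetical, and the ValFunction
interface is repeated (as inferred from ValidSalary) only so the sketch is
self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Inferred from the ValidSalary example; the real interface may differ.
interface ValFunction {
    void Parse(String[] astrParams);
    boolean bEvaluate(String strValue);
    String strConstrainDesc();
}

// Hypothetical function: value length must not exceed a maximum.
final class MaxLength implements ValFunction {
    private int max;

    public void Parse(String[] astrParams) {
        if (astrParams.length != 1) {
            throw new IllegalArgumentException("MaxLength takes one parameter");
        }
        max = Integer.parseInt(astrParams[0].trim());
    }

    public boolean bEvaluate(String strValue) {
        return strValue.length() <= max;
    }

    public String strConstrainDesc() {
        return "length <= " + max;
    }
}

// Hypothetical factory keyed by function name.
final class FunctionFactory {
    private final Map<String, Supplier<ValFunction>> makers = new HashMap<>();

    void register(String name, Supplier<ValFunction> maker) {
        makers.put(name, maker);
    }

    ValFunction create(String name, String[] params) {
        Supplier<ValFunction> maker = makers.get(name);
        if (maker == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        ValFunction fn = maker.get(); // default construction
        fn.Parse(params);             // parameters are parsed once
        return fn;
    }

    public static void main(String[] args) {
        FunctionFactory factory = new FunctionFactory();
        factory.register("MaxLength", MaxLength::new);
        ValFunction fn = factory.create("MaxLength", new String[] {"5"});
        System.out.println(fn.bEvaluate("abc"));    // true
        System.out.println(fn.bEvaluate("abcdef")); // false
    }
}
```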
The Parse() method is called once during the evaluation of an element or
attribute which declares a constraint that uses the function. The contents
of the function (the stuff after the function name, i.e. inside the
function's parentheses) are passed to the Parse() method in an array of
strings which represent the comma separated function parameters. The
function will evaluate these parameters, which represent the validity
constraints set up for the element or attribute, and store that information
in some (hopefully) optimal internal format. In the example above, which
validates maximum salaries, it converts the single parameter to a double
and stores that for later use