[Schematron] Checking existence conditioned on a variant type value

Thu Oct 13 12:59:06 EDT 2011

At 2011-10-13 12:30 -0400, Norm Birkett wrote:
>The XML I'm validating represents complex data objects with many variant
>types, the type in each case indicated by the values of certain type
>fields. Suppose, for example, that each of the following is a valid
>document (this is a made-up example that illustrates the relationships,
>not actual data):
>
><Rentable>
>         <RentableType>FurnApt</RentableType>
>         <Rooms>
>                 <Room>
>                         <RoomType>Bedroom</RoomType>
>                         ...
>                 </Room>
>                 ...
>         </Rooms>
>         <Closets>
>                 ...
>         </Closets>
>         <Furnishings>
>                 <Furnishing>
>                         <FurnishingType>Bed</FurnishingType>
>                         <BedSize>Twin</BedSize>
>                         <DefaultRoom>1</DefaultRoom>
>                 </Furnishing>
>         </Furnishings>
></Rentable>
>
><Rentable>
>         <RentableType>UnfurnApt</RentableType>
>         <Rooms>
>         ...
>         </Rooms>
>         <Closets>
>                 ...
>         </Closets>
></Rentable>
>
><Rentable>
>         <RentableType>OfficeSuite</RentableType>
>         ...
></Rentable>
>
><Rentable>
>         <RentableType>StorageCloset</RentableType>
>         ...
></Rentable>
>
>Now suppose I've already run these through RELAX NG validation, so that
>we know the element names are spelled correctly, and that Furnishings
>elements are children of Rentables and not of Rooms, and so forth.
>Suppose further that I've validated all the enumeration types
>(RentableType isn't "Wombat" on any document, FurnishingType is always a
>known furnishing type, etc.)

Fine ... but I think you've drawn up short at this point.

>I next want to use Schematron to validate all the conditional existence
>rules.

Well, for *existence* rules, why not use RELAX NG?  I typically use 
Schematron for *value* rules and a grammar for existence rules.

>Suppose I want to check that Furnishings appear when they should and not
>when they shouldn't. Approach #1:
>
><sch:pattern id="FurnApt">
>         <sch:title>All rules pertaining to Furnished Apartments go
>here.</sch:title>
>         <sch:rule context="Rentable/RentableType">
>                 <sch:assert test="not(./text() = 'FurnApt' and
>not(../Furnishings))">

First of all, throughout your tests I see you use:  ./text()='xxx'

This, to me, is very inappropriate.  You should use .='xxx'.  I 
typically see text() misused this way in XQuery, but it is no more 
appropriate here.  I'm not trying to be critical so please forgive 
my tone, but I'm trying to hammer this home to XML'ers who have not 
recognized the issue before.

In all of my professional XSLT career, I have very rarely had to 
address text nodes in my work.  The problem is that text() is far too 
granular.  If you want to check the value of an element, check the 
element and don't check its text children.  The value of an element 
is, by XPath definition, the concatenation of all of its descendant 
text nodes.  Other markup (child elements, comments, processing 
instructions) may split that text into several nodes.

For example, consider:

    <salutation>Hello <emph>World</emph></salutation>

With <salutation> the current node, "./text()" has the value "Hello " 
while "." has the value "Hello World".

Even if you don't have any embedded elements, consider the following:

   <RentableType>Furn<!--the OCR read this as "m"-->Apt</RentableType>

The XPath expression "./text()" returns two nodes whose values are 
"Furn" and "Apt", neither of which is equal to "FurnApt", whereas 
the XPath expression "." has the string value "FurnApt", which is 
what your Schematron is looking for.
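
As a minimal sketch of that correction, here is the assertion from 
approach #1 with "./text()" replaced by ".".  The sch:schema wrapper 
and the ISO Schematron namespace URI are assumptions added only to 
make the fragment self-contained; the message is borrowed from 
approach #2.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="FurnApt">
    <sch:rule context="Rentable/RentableType">
      <!-- "." compares the element's whole string value, so a comment
           or inline markup inside RentableType does not break the test -->
      <sch:assert test="not(. = 'FurnApt' and not(../Furnishings))">
        A furnished apartment must have furnishings.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```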

Next, your test of "not(../Furnishings)" is entirely acceptable, but 
I would not have gone in that direction.  Mind you, you get the 
benefit of the custom error message where I would merely get a schema 
validation error, but I would have done the following in my RELAX NG:

   element Rentable
     {
       (
         element RentableType { "FurnApt" },
         element Furnishings { ...whatever...}, # mandatory for this type
         element OtherStuff...
       )
      |
       (
         element RentableType { "UnfurnApt" },
         element OtherStuff...
       )
     }

>Approach #2:
>
><sch:pattern id="FurnishingsWrtRentableType">
>         <sch:title>The Furnishings element should exist only for
>appropriate RentableTypes.</sch:title>
>         <sch:rule context="Rentable/Furnishings">
>                 <sch:assert test="not(../RentableType = 'UnfurnApt' or
>../RentableType = 'StorageCloset' or ...)">
>                         Furnishings not permitted for this RentableType.
>                 </sch:assert>
>         </sch:rule>
>         <sch:rule context="Rentable/RentableType">
>                 <sch:assert test="not(./text() = 'FurnApt' and
>not(../Furnishings))">
>                         A furnished apartment must have furnishings.
>                 </sch:assert>
>         </sch:rule>
></sch:pattern>

While technically the same, I think you may miss out on mandatory 
children that are absent, because you can't trigger a context on the 
absent child.  Since you have to use the parent context to test the 
absent mandatory child, you might as well test everything else.
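
A sketch of what testing everything from the parent context might 
look like, assuming the element names from the question.  The list of 
types for which Furnishings is forbidden is incomplete in the 
original ("or ..."), so only the two named types appear here; the 
sch:schema wrapper and namespace URI are assumptions as well.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="FurnishingsWrtRentableType">
    <!-- The context is the parent, which is always present, so a
         missing Furnishings child can still trigger a message -->
    <sch:rule context="Rentable">
      <sch:assert test="not(RentableType = 'FurnApt') or Furnishings">
        A furnished apartment must have furnishings.
      </sch:assert>
      <!-- Only the two types named in the question are listed here -->
      <sch:assert test="not(Furnishings) or
                        not(RentableType = 'UnfurnApt' or
                            RentableType = 'StorageCloset')">
        Furnishings not permitted for this RentableType.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```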

And I think testing from the parent reflects your authors' top-down 
approach as well.  If they worked by starting off with the children 
and then wrapping them with a parent, then approach #2 might be 
acceptable (but for the missing mandatory children noted above), but 
I don't know of anyone who approaches markup this way.

>I'm wondering what reasons there may be for favoring one approach over
>the other. So far, I've come up with the following:
>(1) Approach #1 has advantages if RentableType is the driver of most
>thinking and work on the data--esp. if different people work on
>different RentableTypes.

Sounds good.

>(2) Approach #2 is a bit more data-element-centric, which could be
>helpful given that I want to slurp my Schematron rules into a kind of
>data dictionary (using XSLT or some such tool).

Wellllllll .... make things easier for the user, not for the 
programmer.  The programmer is paid to work harder; the user should 
have things done for them.

>Are there other factors (e.g. performance related) that I should be
>thinking about?

Only the lack of a context to test when that context is absent.

>Meanwhile, when I look at Approach #1, I keep wishing that the context
>attribute of the rule would allow me to specify not only a node, but
>also a condition on that node. (Is there a way to do this that I've
>missed?) That would remove a lot of the awkward and repetitious
>conditionality from the assertions' test attributes.

Ummmmm ... why not use XPath predicates on the context?
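
For instance, a predicate can move the type condition out of the 
test and into the context.  This is only a sketch using the element 
names from the question; the sch:schema wrapper and namespace URI are 
assumptions added to keep the fragment self-contained.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="FurnApt">
    <!-- The predicate limits the rule to furnished apartments, so the
         assertion no longer needs to repeat the type check -->
    <sch:rule context="Rentable[RentableType = 'FurnApt']">
      <sch:assert test="Furnishings">
        A furnished apartment must have furnishings.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

One thing to keep in mind: within a single pattern a node is checked 
only against the first rule whose context matches it, so a more 
general rule for "Rentable" in the same pattern would need to come 
after the predicated ones.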

>Another thought I have, this time when looking at Approach #2, is being
>able to specify a non-existent context for a rule would have certain
>advantages. Obviously such a rule would have to be flagged as only being
>triggered when the context isn't found, and one would have to take care
>to specify the extent of the search (for example if one were getting a
>document consisting of a list of Rentables).

Not sure what you are saying there ... but a rule that triggers on 
the absence of something has to be written for something else that is 
guaranteed to be there.

>But this last is just an aesthetic point, as obviously I could do what I
>needed perfectly well by specifying a different context for the rule
>having to do with non-existence of the Furnishings. And I'm not even
>sure it's a very strong aesthetic point.
>
>Lastly, are there any horribly terrible ways I'm doing things in the
>examples above?

Just that ugly and inappropriate "./text()" stuff ... I hope I've 
explained myself in that regard.

In the XSLT classroom I tell my students "if you think you need to 
write text(), think again because you probably don't *need* to and 
there are better ways to get at what you really need".

>Any much better ways to skin the cat that I'm missing?

Not off the top ... I just think presence testing is grammatical.  I 
use Schematron for value testing.  I see "value testing" as what 
happens once the grammar has confirmed the items are where they are 
supposed to be: do they then have appropriate values in concert with 
the values of other items?  That isn't grammatical.

And value constraints can change in the different *business* 
contexts, while grammar constraints don't change.  For example, in my 
work in OASIS UBL, the structure of an invoice is always the same, 
but if Sally is allowed to pay with a cheque but Harry must pay with 
a certified cheque, I can apply different Schematron-expressed value 
constraints to the two invoice instances that use the same 
standardized structural constraints in the schema.
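
One way this could be sketched is with Schematron phases, one per 
business context, over the same document structure.  Everything 
named here is hypothetical: the Invoice/PaymentMethod element, its 
values, and the phase and pattern ids are illustrative assumptions, 
not actual UBL names, and using phases rather than separate schemas 
is only one possible arrangement.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- Hypothetical phases: pick one at validation time -->
  <sch:phase id="sally">
    <sch:active pattern="cheque-allowed"/>
  </sch:phase>
  <sch:phase id="harry">
    <sch:active pattern="certified-only"/>
  </sch:phase>

  <!-- Hypothetical element name and values, same structure in both -->
  <sch:pattern id="cheque-allowed">
    <sch:rule context="Invoice/PaymentMethod">
      <sch:assert test=". = 'Cheque' or . = 'CertifiedCheque'">
        Payment must be by cheque or certified cheque.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <sch:pattern id="certified-only">
    <sch:rule context="Invoice/PaymentMethod">
      <sch:assert test=". = 'CertifiedCheque'">
        Payment must be by certified cheque.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```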

I hope this helps.

. . . . . . . . . .  Ken

--
Contact us for world-wide XML consulting and instructor-led training
Crane Softwrights Ltd.            http://www.CraneSoftwrights.com/s/
G. Ken Holman                   mailto:gkholman at CraneSoftwrights.com