[Schematron] Checking existence conditioned on a variant type value
G. Ken Holman
gkholman at CraneSoftwrights.com
Thu Oct 13 12:59:06 EDT 2011
At 2011-10-13 12:30 -0400, Norm Birkett wrote:
>The XML I'm validating represents complex data objects with many variant
>types, the type in each case indicated by the values of certain type
>fields. Suppose, for example, that each of the following is a valid
>document (this is a made-up example that illustrates the relationships,
>not actual data):
>
><Rentable>
> <RentableType>FurnApt</RentableType>
> <Rooms>
> <Room>
> <RoomType>Bedroom</RoomType>
> ...
> </Room>
> ...
> </Rooms>
> <Closets>
> ...
> </Closets>
> <Furnishings>
> <Furnishing>
> <FurnishingType>Bed</FurnishingType>
> <BedSize>Twin</BedSize>
> <DefaultRoom>1</DefaultRoom>
> </Furnishing>
> </Furnishings>
></Rentable>
>
><Rentable>
> <RentableType>UnfurnApt</RentableType>
> <Rooms>
> ...
> </Rooms>
> <Closets>
> ...
> </Closets>
></Rentable>
>
><Rentable>
> <RentableType>OfficeSuite</RentableType>
> ...
></Rentable>
>
><Rentable>
> <RentableType>StorageCloset</RentableType>
> ...
></Rentable>
>
>Now suppose I've already run these through RELAX NG validation, so that
>we know the element names are spelled correctly, and that Furnishings
>elements are children of Rentables and not of Rooms, and so forth.
>Suppose further that I've validated all the enumeration types
>(RentableType isn't "Wombat" on any document, FurnishingType is always a
>known furnishing type, etc.)
Fine ... but I think you've drawn up short at this point.
>I next want to use Schematron to validate all the conditional existence
>rules.
Well, for *existence* rules, why not use RELAX NG? I typically use
Schematron for *value* rules and a grammar for existence rules.
>Suppose I want to check that Furnishings appear when they should and not
>when they shouldn't. Approach #1:
>
><sch:pattern id="FurnApt">
> <sch:title>All rules pertaining to Furnished Apartments go
>here.</sch:title>
> <sch:rule context="Rentable/RentableType">
> <sch:assert test="not(./text() = 'FurnApt' and
>not(../Furnishings))">
First of all, throughout your tests I see you use: ./text()='xxx'
This, to me, is very inappropriate. You should use .='xxx'. I
typically see this misused in XQuery, but it is just as
inappropriate here. I'm not trying to be critical so
please forgive my tone, but I'm trying to hammer this home to XML'ers
who have not recognized the issue before.
In all of my professional XSLT career, I have very rarely had to
address text() nodes in my work. The problem is that doing so is far
too granular. If you want to check the value of an element, check the
element and don't check its text children. The value of an element
is, by XPath definition, the concatenation of all of its descendant
text nodes, and other markup may break that text into several nodes.
For example, consider:
<salutation>Hello <emph>World</emph></salutation>
With <salutation> the current node, "./text()" has the value "Hello "
while "." has the value "Hello World".
Even if you don't have any embedded elements, consider the following:
<RentableType>Furn<!--the OCR read this as "m"-->Apt</RentableType>
The XPath expression "./text()" returns two nodes whose values are
"Furn" and "Apt", neither of which is equal to "FurnApt", whereas
the XPath expression "." has the value "FurnApt", which is what your
Schematron is looking for.
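A minimal sketch of the Approach #1 assertion rewritten to compare the
element's value rather than its text() children. The sch:schema wrapper
and its namespace declaration are assumptions added only to make the
fragment self-contained, and the message text is borrowed from
Approach #2:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="FurnApt">
    <sch:rule context="Rentable/RentableType">
      <!-- "." is the string value of RentableType: all of its
           descendant text nodes concatenated, so an embedded comment
           or child element does not split the value being compared -->
      <sch:assert test="not(. = 'FurnApt' and not(../Furnishings))">
        A furnished apartment must have furnishings.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

With the comment-bearing RentableType shown above, this test compares
against "FurnApt", whereas the ./text() form compares against "Furn"
and "Apt" separately.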
Next, your test of "not(../Furnishings)" is entirely acceptable, but
I would not have gone in that direction. Mind you, you get the
benefit of the custom error message where I would merely get a schema
validation error, but I would have done the following in my RELAX NG:
element Rentable
{
  (
    element RentableType { "FurnApt" },
    element Furnishings { ...whatever... },  # mandatory for this type
    element OtherStuff...
  )
  |
  (
    element RentableType { "UnfurnApt" },
    element OtherStuff...
  )
}
>Approach #2:
>
><sch:pattern id="FurnishingsWrtRentableType">
> <sch:title>The Furnishings element should exist only for
>appropriate RentableTypes.</sch:title>
> <sch:rule context="Rentable/Furnishings">
> <sch:assert test="not(../RentableType = 'UnfurnApt' or
>../RentableType = 'StorageCloset' or ...)">
> Furnishings not permitted for this RentableType.
> </sch:assert>
> </sch:rule>
> <sch:rule context="Rentable/RentableType">
> <sch:assert test="not(./text() = 'FurnApt' and
>not(../Furnishings))">
> A furnished apartment must have furnishings.
> </sch:assert>
> </sch:rule>
></sch:pattern>
While technically the same, I think you may miss out on mandatory
children that are absent, because you can't trigger a context on the
absent child. Since you have to use the parent context to test the
absent mandatory child, you might as well test everything else.
And I think testing from the parent reflects your authors' top-down
approach as well. If they worked by starting off with the children and
then wrapping them with a parent, then the child-context approach might
be acceptable (but for the missing mandatory children noted above), but
I don't know of anyone who approaches markup this way.
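A minimal sketch of testing both directions from the parent context,
which is always present, so that an absent Furnishings child can still
be reported. The sch:schema wrapper, the pattern id, and the choice to
list only the two forbidden types named in Approach #2 are assumptions;
the original list of types is elided and would need completing:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="RentableFurnishings">
    <!-- The rule fires on every Rentable, whether or not it has a
         Furnishings child, so absence can be detected here -->
    <sch:rule context="Rentable">
      <!-- Presence: a furnished apartment needs Furnishings -->
      <sch:assert test="not(RentableType = 'FurnApt') or Furnishings">
        A furnished apartment must have furnishings.
      </sch:assert>
      <!-- Absence: only the two types named in the thread are listed;
           further types are an open question -->
      <sch:assert test="not(Furnishings and
                            (RentableType = 'UnfurnApt' or
                             RentableType = 'StorageCloset'))">
        Furnishings not permitted for this RentableType.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```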
>I'm wondering what reasons there may be for favoring one approach over
>the other. So far, I've come up with the following:
>(1) Approach #1 has advantages if RentableType is the driver of most
>thinking and work on the data--esp. if different people work on
>different RentableTypes.
Sounds good.
>(2) Approach #2 is a bit more data-element-centric, which could be
>helpful given that I want to slurp my Schematron rules into a kind of
>data dictionary (using XSLT or some such tool).
Wellllllll .... make things easier for the user, not for the
programmer. The programmer is paid to work harder; the user should
have things done for them.
>Are there other factors (e.g. performance related) that I should be
>thinking about?
Only the lack of a context to test when that context is absent.
>Meanwhile, when I look at Approach #1, I keep wishing that the context
>attribute of the rule would allow me to specify not only a node, but
>also a condition on that node. (Is there a way to do this that I've
>missed?) That would remove a lot of the awkward and repetitious
>conditionality from the assertions' test attributes.
Ummmmm ... why not use XPath predicates on the context?
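A minimal sketch of a rule whose context carries an XPath predicate, so
the condition on RentableType no longer has to be repeated inside each
test. The sch:schema wrapper and namespace declaration are assumptions
added for self-containment; the pattern id and title come from
Approach #1:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="FurnApt">
    <sch:title>All rules pertaining to Furnished Apartments go here.</sch:title>
    <!-- The predicate limits this rule to furnished apartments, so
         the assertion only has to state what such a Rentable needs -->
    <sch:rule context="Rentable[RentableType = 'FurnApt']">
      <sch:assert test="Furnishings">
        A furnished apartment must have furnishings.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Within one pattern a node is handled only by the first rule whose
context matches it, so predicated rules for the other RentableType
values can sit alongside this one as long as their predicates do not
overlap.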
>Another thought I have, this time when looking at Approach #2, is being
>able to specify a non-existent context for a rule would have certain
>advantages. Obviously such a rule would have to be flagged as only being
>triggered when the context isn't found, and one would have to take care
>to specify the extent of the search (for example if one were getting a
>document consisting of a list of Rentables).
Not sure what you are saying there ... but a rule that triggers on
the absence of something has to be written for something else that is
guaranteed to be there.
>But this last is just an aesthetic point, as obviously I could do what I
>needed perfectly well by specifying a different context for the rule
>having to do with non-existence of the Furnishings. And I'm not even
>sure it's a very strong aesthetic point.
>
>Lastly, are there any horribly terrible ways I'm doing things in the
>examples above?
Just that ugly and inappropriate "./text()" stuff ... I hope I've
explained myself in that regard.
In the XSLT classroom I tell my students "if you think you need to
write text(), think again because you probably don't *need* to and
there are better ways to get at what you really need".
>Any much better ways to skin the cat that I'm missing?
Not off the top of my head ... I just think presence testing is
grammatical. I use Schematron for value testing. By "value testing"
I mean that once the grammar has confirmed the items are where they
are supposed to be, do they have appropriate values in concert with
the values of other items? That isn't grammatical.
And value constraints can change in the different *business*
contexts, while grammar constraints don't change. For example, in my
work in OASIS UBL, the structure of an invoice is always the same,
but if Sally is allowed to pay with a cheque but Harry must pay with
a certified cheque, I can apply different Schematron-expressed value
constraints to the two invoice instances that use the same
standardized structural constraints in the schema.
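One way this could be sketched is with Schematron phases, each
activating a different pattern over the same document structure. This
is only an illustration: separate schema files per business context
would work equally well, and the element names (Invoice, PaymentMethod),
the code values, and all ids below are hypothetical and are not taken
from UBL:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- Hypothetical business context: any kind of cheque is accepted -->
  <sch:phase id="AnyCheque">
    <sch:active pattern="ChequeAllowed"/>
  </sch:phase>
  <!-- Hypothetical business context: only a certified cheque is accepted -->
  <sch:phase id="CertifiedOnly">
    <sch:active pattern="CertifiedChequeRequired"/>
  </sch:phase>

  <sch:pattern id="ChequeAllowed">
    <sch:rule context="Invoice/PaymentMethod">
      <sch:assert test=". = 'Cheque' or . = 'CertifiedCheque'">
        Payment must be by cheque or certified cheque.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <sch:pattern id="CertifiedChequeRequired">
    <sch:rule context="Invoice/PaymentMethod">
      <sch:assert test=". = 'CertifiedCheque'">
        Payment must be by certified cheque.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The grammar that validates the invoice structure stays the same in both
cases; only the phase chosen at validation time differs.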
I hope this helps.
. . . . . . . . . . Ken