[Schematron] Progressive validation and termination (CALS tables)

Nigel Whitaker nigel.whitaker at deltaxml.com
Mon Aug 12 14:29:09 EDT 2013


Hello,

Over a year ago I wrote about an issue I was having with progressive validation.

George and Rick provided useful replies which I pondered for far too long - sorry!

Recently, though, the issue was rekindled: we've had enhancement requests from some of our customers and also external requests to get the code finished and released under an Open Source license.

The code I'm talking about is now here:  http://code.google.com/p/cals-table-schematron/

It checks, as far as I'm aware, all of the semantic rules for CALS tables.

George suggested an approach in which each phase includes (repeats) the patterns of the phases before it.
In my code it looks like this:

  <phase id="context">
    <active pattern="p-context"/>
  </phase>
  <phase id="referencing">
    <active pattern="p-context"/>
    <active pattern="p-referencing" />
  </phase>
  <phase id="spansandcolspecs">
    <active pattern="p-context"/>
    <active pattern="p-referencing" />
    <active pattern="p-spansandcolspecs"/>
  </phase>
  <phase id="structure">
    <active pattern="p-context"/>
    <active pattern="p-referencing" />
    <active pattern="p-spansandcolspecs"/>
    <active pattern="p-structure"/>
  </phase>

This works, in that it gets things in the right order: if I specify <schema defaultPhase="structure" …>,
or pass the phase param on the command line, I get the desired order.

I looked at the generated code and found that the skeleton generates the same code as it does for phase=#ALL, which runs all the patterns in document order.

So in both cases I end up with this generated code for running the four patterns (you can get additional code with some skeleton params):

  <!--SCHEMA SETUP-->
  <xsl:template match="/">
    <xsl:apply-templates select="/" mode="M11"/>
    <xsl:apply-templates select="/" mode="M12"/>
    <xsl:apply-templates select="/" mode="M13"/>
    <xsl:apply-templates select="/" mode="M14"/>
  </xsl:template>



The other problem was stopping after a phase fails. The later phases were written on the assumption that the earlier ones passed, for example assuming referential integrity: in CALS, each entry/@colname reference should point to exactly one colspec.

As it currently stands, users can end up seeing errors like this:

"An empty sequence is not allowed as the first argument of cals:colnum()"

cals:colnum() is an internal function used by the structural checks, and I wrote those checks assuming that all references could already be resolved.



I thought about three possible ways of implementing the "stopping after a failed phase" behaviour:

I did consider XProc. It may be possible to define a step that runs each phase with p:xslt (perhaps driven by a 'phases' param/option specifying the ordered list of phases) and looks at the SVRL to check for failures. I didn't go this route because (a) I lacked XProc skills and (b) I was concerned about performance: for 4 phases I couldn't see how to avoid 4 XSLT transforms, and therefore compiling the generated XSLT 4 times and parsing the instance file being checked 4 times.


Another technique I considered, primarily for performance, was to write a multi-phase runner in Java using Saxon s9api. This would compile the XSLT once, load the input XML into an XdmNode using a DocumentBuilder, and run the 4 transformations, checking the XdmNode result of each run with an XPath query on its SVRL.
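The runner idea above can be sketched in Java. This is a minimal sketch that rests on several assumptions. First, it uses the JDK's built-in JAXP APIs (javax.xml.transform, javax.xml.xpath) in place of s9api, so it can only run XSLT 1.0 stylesheets; the output of the XSLT2 skeleton would need Saxon. Second, it assumes that one validation stylesheet has already been generated per phase, for example by running the skeleton once per phase name. Third, the names PhaseRunner, parse and validate are hypothetical. Each phase stylesheet is compiled once into a Templates object, and the instance is parsed once into a DOM that every phase reuses. The phases then run in order until one produces an svrl:failed-assert, and the SVRL of the failing phase is kept.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical multi-phase runner; JAXP stands in for Saxon s9api here.
final class PhaseRunner {
    // True if an SVRL report contains any svrl:failed-assert. Matching on
    // local-name()/namespace-uri() avoids needing a NamespaceContext.
    private static final String HAS_FAILURE =
        "boolean(//*[local-name()='failed-assert' and "
        + "namespace-uri()='http://purl.oclc.org/dsdl/svrl'])";

    private final List<Templates> phases = new ArrayList<>();
    private final XPathExpression hasFailure;

    // Assumption: one generated validation stylesheet per phase, supplied in
    // phase order. Each is compiled once and can be reused for many documents.
    PhaseRunner(List<String> phaseStylesheets) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        for (String xslt : phaseStylesheets) {
            phases.add(factory.newTemplates(new StreamSource(new StringReader(xslt))));
        }
        hasFailure = XPathFactory.newInstance().newXPath().compile(HAS_FAILURE);
    }

    // Parses the instance once; the resulting DOM is shared by every phase.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed for namespace-aware XSLT and XPath
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Runs the phases in order and returns the SVRL report of each phase that
    // ran, stopping after the first report containing a failed assertion,
    // because later phases assume that the earlier ones passed.
    List<Node> validate(Document instance) throws Exception {
        List<Node> reports = new ArrayList<>();
        for (Templates phase : phases) {
            DOMResult result = new DOMResult();
            phase.newTransformer().transform(new DOMSource(instance), result);
            Node svrl = result.getNode();
            reports.add(svrl); // keep the failing phase's SVRL as well
            if ((Boolean) hasFailure.evaluate(svrl, XPathConstants.BOOLEAN)) {
                break;
            }
        }
        return reports;
    }
}
```

With s9api the shape would presumably be the same: one XsltExecutable per phase, the instance loaded once as an XdmNode, and the failed-assert check done with an XPathCompiler. Because the compiled phases are reusable, the compilation cost is paid once per schema rather than once per validated document.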


In the end, the technique we're currently using (in some of our software) involves modifying the XSLT generated by the XSLT2 skeleton, so the example above becomes:


  <xsl:variable name="passed" select="true()" saxon:assignable="yes"/>
  ...
  <xsl:template match="/">
      <saxon:assign name="passed" select="true()"/>
      <xsl:if test="$passed">
        <xsl:apply-templates select="/" mode="M11"/>
      </xsl:if>
      <xsl:if test="$passed">
        <xsl:apply-templates select="/" mode="M12"/>
      </xsl:if>
      <xsl:if test="$passed">
        <xsl:apply-templates select="/" mode="M13"/>
      </xsl:if>
      <xsl:if test="$passed">
        <xsl:apply-templates select="/" mode="M14"/>
      </xsl:if>
   </xsl:template>


The code for each assertion then sets the 'variable' through an added saxon:assign statement, as in this example:

<xsl:template match="*:tgroup" priority="1000" mode="M11">
  <!--ASSERT -->
  <xsl:choose>
    <xsl:when test="count(distinct-values(*:colspec/@colname)) eq count(*:colspec/@colname)"/>
    <xsl:otherwise>
      <svrl:failed-assert xmlns:svrl="http://purl.oclc.org/dsdl/svrl" test="count(distinct-values(*:colspec/@colname)) eq count(*:colspec/@colname)">
        <xsl:attribute name="location">
          <xsl:apply-templates select="." mode="schematron-select-full-path"/>
        </xsl:attribute>
        <svrl:text>The colnames of the colspecs in a tgroup (<xsl:value-of xmlns="http://purl.oclc.org/dsdl/schematron" select="saxon:path()"/>) must be unique CALS-T10R4B</svrl:text>
      </svrl:failed-assert>
      <saxon:assign name="passed" select="false()"/>
    </xsl:otherwise>
  </xsl:choose>
  ...
</xsl:template>


It uses saxon:assign, which bypasses the functional nature of XSLT, and I'm not happy with or proud of that! I think it's also a PE/EE feature of Saxon and won't work in HE.

So while my XSLT hacking works for our purposes, I'm not sure it would be an acceptable contribution to the main skeleton code.

I did try Rick's suggestion of xsl:message/@terminate='yes', but that was equally messy: it terminated the XSLT process, and you didn't necessarily get any SVRL output from the failed phase.


I was wondering:
   - can anyone suggest a better XSLT approach than saxon:assign or xsl:message/@terminate?
   - should I persevere with either the s9api or XProc multi-phase running approaches?

Finally, if any Schematron experts can suggest better ways of doing this, I would appreciate the input (I'm proficient in XPath/XSLT but have limited experience with Schematron).

Thanks,

Nigel





-- 
Nigel Whitaker, Software Architect, DeltaXML Ltd. "Experts in information change"
nigel.whitaker at deltaxml.com   http://www.deltaxml.com   +44 1684 869035 
Registered in England: 02528681 Reg. Office: Monsell House, WR8 0QN, UK
