[Schematron] Help sought: implementation of Character Repertoire in XSLT2 for embedding in schematron

Rick Jelliffe rjelliffe at allette.com.au
Thu Sep 18 11:07:35 EDT 2008


Part of ISO DSDL is the Character Repertoire Description Language (CRepDL).

I have coded up a little implementation which
  1) converts from CRepDL to regular expressions
  2) converts CRepDL embedded in Schematron into extra <sch:assert>
     elements (a Schematron pre-processor)

However, I have run out of time to finish it this month, so I thought I
would post it in case someone else wants to get the regexes working.
(The rest of the code seems to work; it is the regex-generating
templates from line 282 of the stylesheet that need to be corrected.)
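
For anyone picking this up, the first conversion can be sketched roughly
as below. This is not the attached stylesheet. It assumes a flat <union>
whose <char> children each hold an XSD-style regex matching exactly one
character, so the union becomes a plain alternation. The function name
f:union-to-regex and its namespace are made up for illustration, and the
other CRepDL operators are not handled at all.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:c="http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"
    xmlns:f="http://example.org/crepdl-sketch"
    exclude-result-prefixes="xs c f">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical helper, not from the attached code: joins the char
       children of a flat union into one anchored alternation. -->
  <xsl:function name="f:union-to-regex" as="xs:string">
    <xsl:param name="union" as="element(c:union)"/>
    <xsl:sequence
        select="concat('^(', string-join($union/c:char/string(.), '|'), ')*$')"/>
  </xsl:function>

  <!-- Run against a schema with an embedded union that carries an xml:id;
       writes one line per union: the id, then the generated regex. -->
  <xsl:template match="/">
    <xsl:for-each select="//c:union[@xml:id]">
      <xsl:value-of
          select="concat(@xml:id, ': ', f:union-to-regex(.), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The expression is anchored with ^ and $ because matches() in XPath 2.0
succeeds on any substring otherwise. Merging the alternatives into a
single character class would probably perform better, but that means
parsing each <char>, which is exactly where the regex flavour matters.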

The XSD spec defines its own regular expression syntax, based on Perl.
But XSLT2 refers back to the Unicode regexes. The two differ
significantly: for example in the availability of the && operator, the
use of || rather than |, and the ability to nest [ items ].

So if anyone wants a fun weekend job, attached are the code and test
files. What is needed is to figure out what kind of regular expressions
Saxon 9 actually implements, and to generate that. I suspect that every
implementation of XSLT2 or EXSLT will have a different regex library in
practice!
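
One way to find out what a given processor accepts is to feed candidate
expressions to matches() one at a time. A rough sketch follows; the
parameter names regex and sample and the template name probe are
hypothetical, and the default values are only an example (XSD-style
character class subtraction). XSLT 2.0 has no try/catch, so an
expression the processor rejects should stop the run with an invalid
regex error (FORX0002 in Functions and Operators) rather than print
anything, which means each run tests a single candidate.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical parameters: the candidate regex and a string to try it on. -->
  <xsl:param name="regex" as="xs:string" select="'^[a-z-[aeiou]]+$'"/>
  <xsl:param name="sample" as="xs:string" select="'xyz'"/>

  <!-- Reports whether the sample matches; an unsupported regex raises an
       error before any output is produced. -->
  <xsl:template name="probe">
    <xsl:value-of
        select="concat($regex, ' on ', $sample, ': ',
                       matches($sample, $regex), '&#10;')"/>
  </xsl:template>

  <!-- Also usable with any dummy input document. -->
  <xsl:template match="/">
    <xsl:call-template name="probe"/>
  </xsl:template>

</xsl:stylesheet>
```

Running it once each with a nested class, a && intersection, and a
\p{Is...} block escape as the regex parameter would show which of the
two syntaxes the processor follows.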

Here is an example of a Schematron file with an embedded CRepDL schema,
as used by this code.
=====================================================
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:ext="http://www.schematron.com/namespace/extensionsAndExperiments"
    queryBinding="xslt2">
  <sch:title>CRDL Test</sch:title>
  
    <union xmlns="http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0" xml:id="iso8859-6alt">
        <char>\p{IsBasicLatin}</char>
        <char>&#xA0;</char>
        <char>&#xA4;</char>
        <char>&#xAD;</char>
        <char>&#x60C;</char>
        <char>&#x61B;</char>
        <char>&#x61F;</char>
        <char>[&#x621;-&#x63A;]</char>
        <char>[&#x640;-&#x652;]</char>
    </union>
  
  <sch:pattern  id="p1">
     <sch:title>Text</sch:title>
    
     <sch:rule context="/*">
         <sch:assert test="true()" ext:crdl-type="iso8859-6alt">The text is ISO 8859-6</sch:assert>
     </sch:rule>
  </sch:pattern>
 
 </sch:schema>
===============================================
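
As an illustration of the second conversion, the pre-processed rule
might end up looking something like the fragment below. This is a guess
at the shape of the output, not what the attached stylesheet actually
emits: the generated test is hypothetical, reuses the message of the
original assertion, and assumes the processor accepts the XSD-style
\p{IsBasicLatin} block escape. The fragment sits inside the schema
above, so the sch and ext prefixes are declared there.

```xml
<sch:rule context="/*">
    <sch:assert test="true()" ext:crdl-type="iso8859-6alt">The text is ISO 8859-6</sch:assert>
    <!-- Hypothetical generated assertion: the repertoire as an anchored
         alternation over the string value of the context node. -->
    <sch:assert
        test="matches(string(.), '^(\p{IsBasicLatin}|&#xA0;|&#xA4;|&#xAD;|&#x60C;|&#x61B;|&#x61F;|[&#x621;-&#x63A;]|[&#x640;-&#x652;])*$')">The text is ISO 8859-6</sch:assert>
</sch:rule>
```

Note that string(.) on the root element covers descendant text only;
attribute values would need their own test.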

With the properties extension I want to propose for the revised ISO
Schematron, the same schema would look like this:


===============================================
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:ext="http://www.schematron.com/namespace/extensionsAndExperiments"
    queryBinding="xslt2">
  <sch:title>CRDL Test</sch:title>
  
  <sch:pattern  id="p1">
     <sch:title>Text</sch:title>
    
     <sch:rule context="/*">
         <sch:assert test="true()" properties="iso8859-6alt">The text is ISO 8859-6</sch:assert>
     </sch:rule>
  </sch:pattern>
 
 <sch:properties>

   <sch:property id="iso8859-6alt" role="character_repertoire" >
        <union xmlns="http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"   >
            <char>\p{IsBasicLatin}</char>
            <char>&#xA0;</char>
            <char>&#xA4;</char>
            <char>&#xAD;</char>
            <char>&#x60C;</char>
            <char>&#x61B;</char>
            <char>&#x61F;</char>
            <char>[&#x621;-&#x63A;]</char>
            <char>[&#x640;-&#x652;]</char>
        </union>
    </sch:property>

   </sch:properties>
 </sch:schema>

===============================================

Cheers
Rick


-------------- next part --------------
A non-text attachment was scrubbed...
Name: iso_crdl_expand.xsl
Type: text/xml
Size: 16772 bytes
Desc: not available
Url : http://www.eccnet.com/pipermail/schematron/attachments/20080919/6d214ee5/attachment.xsl 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xxx.xml
Type: text/xml
Size: 38 bytes
Desc: not available
Url : http://www.eccnet.com/pipermail/schematron/attachments/20080919/6d214ee5/attachment.xml 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.sch
Type: text/xml
Size: 805 bytes
Desc: not available
Url : http://www.eccnet.com/pipermail/schematron/attachments/20080919/6d214ee5/attachment-0001.xml 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xxx-bad.xml
Type: text/xml
Size: 25 bytes
Desc: not available
Url : http://www.eccnet.com/pipermail/schematron/attachments/20080919/6d214ee5/attachment-0002.xml 

