[Schematron] Help sought: implementation of Character Repertoire in XSLT2 for embedding in schematron
Rick Jelliffe
rjelliffe at allette.com.au
Thu Sep 18 11:07:35 EDT 2008
Part of ISO DSDL is the Character Repertoire Description Language (CRepDL).
I have coded up a little implementation which
1) converts from CRepDL to regular expressions
2) converts from CRepDL embedded in Schematron into extra <sch:assert>
elements (a Schematron pre-processor)
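For anyone who wants to see the shape of step 1 outside XSLT, here is a minimal Python sketch. It is not the attached stylesheet. It assumes each <char> holds a single literal character, a bracketed character class, or an XSD block escape. The names char_to_regex, union_to_regex, in_repertoire and the BLOCK_ESCAPES table are made up for illustration; the table is there only because Python's re module does not understand \p{...}.

```python
import re
import xml.etree.ElementTree as ET

CREPDL_NS = "http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"

# Hypothetical lookup table: Python's re module has no \p{...} escapes,
# so each XSD block escape is rewritten as an explicit range.
BLOCK_ESCAPES = {r"\p{IsBasicLatin}": r"[\x00-\x7F]"}

# A cut-down union; the Arabic characters are written as character references.
SAMPLE = r"""<union xmlns="http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0">
  <char>\p{IsBasicLatin}</char>
  <char>&#x60C;</char>
  <char>[&#x621;-&#x63A;]</char>
</union>"""


def char_to_regex(content):
    """Turn the content of one <char> element into a regex fragment."""
    if content in BLOCK_ESCAPES:
        return BLOCK_ESCAPES[content]
    if len(content) == 1:
        return re.escape(content)  # a single literal character
    return content  # assumed to be a character class already, e.g. [a-z]


def union_to_regex(union):
    """Join the <char> children of a <union> into one alternation."""
    chars = union.findall(f"{{{CREPDL_NS}}}char")
    return "(?:" + "|".join(char_to_regex(c.text) for c in chars) + ")"


def in_repertoire(union, text):
    """True if every character of text is matched by the union."""
    return re.fullmatch(union_to_regex(union) + "*", text) is not None


union = ET.fromstring(SAMPLE)
print(in_repertoire(union, "abc \u0628\u060C"))  # ASCII plus Arabic beh and comma
print(in_repertoire(union, "caf\u00E9"))         # e-acute is outside this union
```

The union becomes a plain alternation, and a string is accepted when it consists of zero or more matches of that alternation. A real converter would emit whatever dialect the target engine accepts rather than Python's.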
However, I have run out of time to finish it this month, so I
thought I would post it in case someone else wants to
get the regexes working. (The rest of the code seems to work; it is the
regex-generating templates from line 282 that need
to be corrected.)
The XSD spec defines a regex dialect based on Perl, but XSLT2 refers back to the
Unicode regexes. The two differ significantly:
for example, in the availability of the && operator, the use of
|| rather than |, and the ability to have nested
[ items ].
So if anyone wants a fun weekend job, attached are the code and test
files. What is needed is to figure out what
kind of regular expressions SAXON 9 actually implements, and to generate
that. I suspect that every
implementation of XSLT2 or EXSLT will have a different regex library in
practice!
Here is an example of a Schematron file with an embedded CRepDL schema as
used by this code.
=====================================================
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
xmlns:ext="http://www.schematron.com/namespace/extensionsAndExperiments"
queryBinding="xslt2" >
<sch:title>CRDL Test</sch:title>
<union xmlns="http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"
xml:id="iso8859-6alt" >
<char>\p{IsBasicLatin}</char>
<char> </char>
<char>¤</char>
<char>­</char>
<char>،</char>
<char>؛</char>
<char>؟</char>
<char>[ء-غ]</char>
<char>[ـ-ْ]</char>
</union>
<sch:pattern id="p1">
<sch:title>Text</sch:title>
<sch:rule context="/*">
<sch:assert test="true()" ext:crdl-type="iso8859-6alt">The text
is ISO 8859-6</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
===============================================
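To make the pre-processing step concrete, here is a rough Python sketch of the kind of expansion I mean. It is an illustration under assumptions, not the attached XSLT. It assumes the generated assertion uses the XPath 2.0 matches() function, that each <char> is already valid in the target regex dialect, and that the pattern is simply the <char> contents joined with |. The function name expand_crdl and the generated message text are invented.

```python
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"
EXT = "http://www.schematron.com/namespace/extensionsAndExperiments"
CREPDL = "http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = r"""<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    xmlns:ext="http://www.schematron.com/namespace/extensionsAndExperiments"
    queryBinding="xslt2">
  <union xmlns="http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"
      xml:id="iso8859-6alt">
    <char>\p{IsBasicLatin}</char>
    <char>&#x60C;</char>
    <char>[&#x621;-&#x63A;]</char>
  </union>
  <sch:pattern id="p1">
    <sch:rule context="/*">
      <sch:assert test="true()" ext:crdl-type="iso8859-6alt">The text
        is ISO 8859-6</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>"""


def expand_crdl(schema):
    """Insert a generated sch:assert after each assert carrying ext:crdl-type."""
    # Index the embedded CRepDL unions by their xml:id.
    unions = {u.get(XML_ID): u for u in schema.iter(f"{{{CREPDL}}}union")}
    for rule in list(schema.iter(f"{{{SCH}}}rule")):
        for child in list(rule):
            name = child.get(f"{{{EXT}}}crdl-type")
            if child.tag != f"{{{SCH}}}assert" or name is None:
                continue
            chars = [c.text for c in unions[name].findall(f"{{{CREPDL}}}char")]
            # Assumes each <char> is already valid in the target regex dialect.
            pattern = "^(" + "|".join(chars) + ")*$"
            extra = ET.Element(f"{{{SCH}}}assert")
            # Apostrophes are doubled so the regex survives as an XPath string literal.
            quoted = pattern.replace("'", "''")
            extra.set("test", f"matches(string(.), '{quoted}')")
            extra.text = f"Only characters from {name} are allowed"
            rule.insert(list(rule).index(child) + 1, extra)
    return schema


schema = expand_crdl(ET.fromstring(SAMPLE))
for a in schema.iter(f"{{{SCH}}}assert"):
    print(a.get("test"))
```

The original assertion is left alone and the generated one is added as its next sibling, so the output is still an ordinary Schematron schema that any XSLT2 implementation can compile.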
With the properties extensions I want to propose for the revised ISO
Schematron, it would look like this:
===============================================
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
xmlns:ext="http://www.schematron.com/namespace/extensionsAndExperiments"
queryBinding="xslt2" >
<sch:title>CRDL Test</sch:title>
<sch:pattern id="p1">
<sch:title>Text</sch:title>
<sch:rule context="/*">
<sch:assert test="true()" properties="iso8859-6alt">The text
is ISO 8859-6</sch:assert>
</sch:rule>
</sch:pattern>
<sch:properties>
<sch:property id="iso8859-6alt" role="character_repertoire" >
<union xmlns="http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0" >
<char>\p{IsBasicLatin}</char>
<char> </char>
<char>¤</char>
<char>­</char>
<char>،</char>
<char>؛</char>
<char>؟</char>
<char>[ء-غ]</char>
<char>[ـ-ْ]</char>
</union>
</sch:property>
</sch:properties>
</sch:schema>
===============================================
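With this layout a pre-processor has to follow the properties reference instead of looking for a sibling <union>. Below is a small Python sketch of that lookup. It assumes that properties holds whitespace-separated ids (only a single id appears in the example above) and that only properties with role="character_repertoire" are of interest. The function name repertoires_for is invented.

```python
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"
CREPDL = "http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0"

SAMPLE = r"""<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern id="p1">
    <sch:rule context="/*">
      <sch:assert test="true()" properties="iso8859-6alt">The text
        is ISO 8859-6</sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:properties>
    <sch:property id="iso8859-6alt" role="character_repertoire">
      <union xmlns="http://purl.oclc.org/dsdl/crepdl/ns/structure/1.0">
        <char>\p{IsBasicLatin}</char>
        <char>&#x60C;</char>
      </union>
    </sch:property>
  </sch:properties>
</sch:schema>"""


def repertoires_for(schema, assertion):
    """Return the CRepDL <union> elements an assert refers to via properties."""
    # Assumption: the attribute is a whitespace-separated list of property ids.
    wanted = assertion.get("properties", "").split()
    found = []
    for prop in schema.iter(f"{{{SCH}}}property"):
        if prop.get("id") in wanted and prop.get("role") == "character_repertoire":
            found.extend(prop.findall(f"{{{CREPDL}}}union"))
    return found


schema = ET.fromstring(SAMPLE)
assertion = next(schema.iter(f"{{{SCH}}}assert"))
unions = repertoires_for(schema, assertion)
print(len(unions), [c.text for c in unions[0]])
```

Filtering on the role attribute keeps other kinds of property out of the way, so only character repertoires get turned into regexes.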
Cheers
Rick
-------------- next part --------------
A non-text attachment was scrubbed...
Name: iso_crdl_expand.xsl
Type: text/xml
Size: 16772 bytes
Desc: not available
Url : http://www.eccnet.com/pipermail/schematron/attachments/20080919/6d214ee5/attachment.xsl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xxx.xml
Type: text/xml
Size: 38 bytes
Desc: not available
Url : http://www.eccnet.com/pipermail/schematron/attachments/20080919/6d214ee5/attachment.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.sch
Type: text/xml
Size: 805 bytes
Desc: not available
Url : http://www.eccnet.com/pipermail/schematron/attachments/20080919/6d214ee5/attachment-0001.xml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xxx-bad.xml
Type: text/xml
Size: 25 bytes
Desc: not available
Url : http://www.eccnet.com/pipermail/schematron/attachments/20080919/6d214ee5/attachment-0002.xml