Attendees
|
Peter Bergström |
EuroSTEP |
peter.bergstrom@eurostep.se |
Sweden |
|
Jim Crawford |
Lockheed Martin |
crawf03@ibm.net |
USA |
|
John Dunford |
NATO CALS |
jdunford@cals.nato.be |
Belgium |
|
Bernd Ingenbleek |
Concad |
bernd@concad.de |
Germany |
|
Eric Lebegue |
ESPRIT CONCEPT |
eric.lebegue@esprico.fr |
France |
|
Gregor Lorenz |
Daimler-Benz AG |
gregor.lorenz@dbag.ulm.daimlerbenz.com |
Germany |
|
Sharon Kemmerer |
National Institute for Standards and Technology |
kemmerer@nist.gov |
USA |
|
Deok-Soo Kim |
Hanyang University |
dskim@email.hanyang.ac.kr |
Korea |
|
W. Eliot Kimber |
ISOGEN |
eliot@isogen.com | |
|
Ming Liang Lu |
Aigis Systems Inc. |
mingllu@aol.com |
USA |
|
Helium Mak |
National Research Council |
helium.mak@nrc.ca |
Canada |
|
Juergen Mohrmann |
Debis Systemhaus |
mohrmann@str.daimler-benz.com |
Germany |
|
Andreas Ort |
IMW,TU-CLAUSTHAL |
ort@imw.tu-clausthal.de |
Germany |
|
Chris Partridge |
REV-ENG Consulting |
chris.partridge@compuserve.com |
UK |
|
Daniel Rivers-Moore |
RivCom |
daniel.rivers-moore@rivcom.com |
UK |
|
Günter Sauter |
Daimler-Benz AG |
gunter.sauter@dbag.ulm.daimlerbenz.com |
Germany |
|
Raimar Scherer |
Technische Universität, Dresden, Institut für Baumechanik und Bauinformatik |
scherer@cib.bau.tu-dresden.de |
Germany |
|
Jürgen Sellentin |
Daimler-Benz AG |
sellenti@informatik.tu-muenchen.de |
Germany |
|
Nigel Shaw |
EuroSTEP |
nigel.shaw@eurostep.co.uk |
UK |
|
Tony Stewart |
RivCom |
tony.stewart@rivcom.com |
USA |
|
Masaru Suzuki |
Nippon CALS Research Partnership (NCALS) |
suzuki@ncals.cif.or.jp |
Japan |
|
Philip Tutton |
Ministry of Defence (UK) |
calsoffice.dpmp@gtnet.gov.uk |
UK |
|
King Yee |
Boeing Aircraft |
king.g.yee@boeing.com |
USA |
|
Bernd Wenzel |
EuroSTEP |
bg_wenzel@compuserve.com |
Germany |
|
Brenda Young |
Boeing Aircraft |
brenda.young@boeing.com |
USA |
|
Dong Jinxiang |
Zhejiang University, China |
djx@csdem.sju.edu.ch |
China |
Hugh Tucker, Chair of T14, was unable to attend the Florence meetings. At his request, Nigel Shaw agreed to chair the meeting, with assistance from Daniel Rivers-Moore. Nigel opened the meeting on Monday, October 20, 1997, following the SC4 plenary. The attendees introduced themselves and the agenda was presented and accepted.
It was agreed that the entire week would be spent working on the WG10 Preliminary Work Item (PWI): "SGML and Industrial Data," which was adopted in San Diego at the urging of T14 and based in large part on work done by T14 over the previous years. Because Nigel Shaw and Daniel Rivers-Moore are the joint project leaders of the PWI as well as members of T14, it was agreed that they would lead the group through an intensive working session on the PWI.
One of the primary areas of interest to the PWI is whether, and to what extent, the linking and addressing capabilities of the HyTime SGML standard can be used to solve the problems involved in linking/assembling human-readable information with/from product data. To this end, Daniel Rivers-Moore had invited Eliot Kimber, one of the authors of the 1997 revision of the HyTime standard and a member of the World Wide Web Consortium Working Group that is developing the XML standard, to join us in Florence. Eliot was only able to attend on Monday and Tuesday. Therefore, the agenda was arranged to take advantage of his presence.
Monday morning started with the opening and presentation of the agenda followed by a session for newcomers and to bring people up-to-date. Monday afternoon consisted of a presentation by Nigel Shaw on a "Beginner’s Guide to EXPRESS," largely to bring Eliot and other attending SGML experts up to date on necessary STEP concepts. This was followed by a presentation by Daniel Rivers-Moore on some of the ways in which the emerging XML standard could be used to build rich views of document components.
Tuesday has traditionally been a "presentation day," with presentations from experts in fields that are of interest to or have direct influence on the work of T14. Patrick Gannon of CommerceNet had submitted a paper on the role of XML in Electronic Commerce. Although Patrick could not attend, his paper was to be presented to the group by Tony Stewart. This was followed by Eliot Kimber spending most of the morning giving a tutorial on HyTime’s linking and addressing mechanisms.
Tuesday afternoon was spent in an open discussion of points arising from the previous presentations by Eliot Kimber, Nigel Shaw and Daniel Rivers-Moore, leading to suggested actions that would need to be taken in order to bring the SGML and STEP standards more closely together.
Wednesday has traditionally been a "working day." Working closely from the text of the "Industrial Requirements" section of the PWI Project Plan document that Daniel Rivers-Moore had circulated before the Florence meetings, the group developed a list of the requirements that would be addressed by the PWI and specific action points for the period before Orlando. Additional requirements submitted by Honeywell, POSC and others were also taken into account.
Thursday was the "summing up" day, devoted to planning the work to be done at and prior to the next meetings. A joint meeting with WG10 was held in the first half of Thursday morning, in which Nigel and Daniel reported on the decisions and action points arising from the previous three days. Then the T14 group spent a session drawing up a formal list of action items for the next four months.
The chairman went through the minutes of the San Diego meeting pointing out the highlights for those who had not attended the meeting.
The minutes were accepted.
PISTEP Funding Mechanism Available
Daniel Rivers-Moore reported that PISTEP has approved an additional item on its 1998 Business Plan, entitled "New Projects", with the SGML and Industrial Data PWI as the first (and so far the only) project coming under that heading. It will therefore be possible for PISTEP to allocate part of its 1998 budget to funding this initiative, though the amount of any such funding is yet to be determined. It will also be possible for PISTEP member companies to use this project as a way of channelling any funding they wish to provide in support of the PWI.
Joint session with WG3/T12 Process Plant
Due to lack of time, the planned joint session became a brief appearance by Daniel Rivers-Moore at the WG3/T12 meeting. He discussed the industry requirements we had so far identified and requested that they supply additional requirements if any exist.
XML-EDI (Patrick Gannon)
Patrick Gannon, of CommerceNet, submitted a paper on the work CommerceNet is doing to use XML for electronic commerce. They anticipate that XML will resolve many current problems, and may supplant EDIFACT as the primary means of performing EDI. However, it is early days yet and all of this remains to be seen.
As Patrick was unable to be in Florence, Tony Stewart read his paper, which consisted of annotated PowerPoint slides. Daniel Rivers-Moore noted that Patrick Gannon currently expects to attend the Orlando meeting, so questions could be directed to him then.
Group consensus: These are interesting developing technologies, but they are not directly relevant to our near-term work. We will "watch this space," but no direct action is required now.
The following action items were agreed during the Florence meetings. A more detailed explanation of each of these items can be found in the Notes and Transcripts that follow. Unless otherwise noted, each task is intended to be done at or before Orlando. [The "Benefits" column contains high-level benefits as perceived by the editor. Many other benefits can be derived from these same actions; see the Transcript for more details.]
|
Task |
Benefit |
Note |
|
Create an SGML Property Set corresponding to the EXPRESS model defined in SDAI, plus possibly some extensions, in order to represent all of EXPRESS in a single SGML model. In later notes we called this: "a HyTime view of EXPRESS-driven data." - Identify aspects of EXPRESS that cannot be modeled in SGML, if any. |
Allow hyperlinks in and between EXPRESS data. (Where "hyperlink" is an SGML term that can encompass both linking and referencing.) |
Primary work to be done by SGML experts, probably from Isogen, with assistance of STEP experts, probably from EuroSTEP. |
|
Create EXPRESS models of the three core SGML structures: Property Set Requirements (section A.4 of the SGML standard); the HyTime Property Set; and the SGML Property Set (at least the abstract part). Work to be done in the sequence listed above.
|
Allow storage of document elements, including links, in EXPRESS-driven databases. This should allow robust management of elements and links. |
Primary work to be done by STEP experts, probably from EuroSTEP, with assistance of SGML experts, probably from Isogen. |
|
Propose a generic approach to handling encoded data (including SGML) within EXPRESS databases. At another point, this action was defined as: Propose a generic mechanism for referencing data objects whose syntax may be different from EXPRESS.
|
Two subtly different tasks (inclusion vs. reference) to accomplish the same goal: allow efficient use of document elements, multiple languages, graphics, and much more in EXPRESS databases. In fact we expect one solution to cover both sub-goals. |
SGML_STRING generalized to all data formats. No resources assigned to this yet. |
|
Investigate the character set issue.
|
Allow international character sets both in the EXPRESS language and in the data/documents being processed or stored. The importance of consistency between EXPRESS and XML in the character sets supported was stressed. |
Phil Spiby to be involved in this, in liaison with the character-set specialists in the XML Working Group. |
|
Issue a ballot comment on Part 41.
|
Broaden the currently proposed solution to the multi-language problem so that it can handle our requirements. |
EuroStep/Concad to draft the comment. Jim Crawford to review and try for US submission as well. |
|
Global addressing and naming mechanisms
|
Links must be able to point to anything anywhere by globally unique name as well as location, and must be maintainable even if either the source or the referenced item is moved or copied. There must also be a way to verify that the thing being pointed to is the thing you actually wanted. |
Much work is being done on this elsewhere, so it’s a question of identifying the best and/or most standard solution. No resources explicitly assigned. |
|
Address EXPRESS’ lack of a generic storage model
|
Required in order to globally address EXPRESS data. |
This is in the "virtual" Part 20, but now needs to be codified. EuroSTEP to take this forward. |
|
Develop an XML interchange format for EXPRESS data
|
Would allow delivery of EXPRESS data to (anticipated) inexpensive XML tools, browsers, etc. Other benefits to be identified in the paper. |
RivCom to work with Henry Thompson (University of Edinburgh). |
|
Update the Industry Requirements section of the Project Plan document
|
Generate funding and momentum; explain the issues. |
Will incorporate feedback received before and during Florence. |
|
Liaise/have joint meeting with the Parametrics group in Orlando and assess the relevance of HyTime/SGML in addressing their concerns. - This needs to be scheduled. |
HyTime offers many mechanisms for linking, referencing, and replacing text with other values that should be useful to Parametrics. |
We are required to solve many of the same underlying problems. |
|
Write memo re: ISO TC10 DIS on administering the management of documents, saying we’ll provide comment on the DIS and perhaps liaison later if required. - Deadline is 12 November! |
Management of documents is also within the scope of the PWI. |
Nigel Shaw and Jim Crawford to do this. |
|
Thank Barrie Reynolds of Honeywell for his submission.
|
||
|
Thank Robert Aydelotte of POSC for his submission.
|
The Notes and Transcripts were compiled from detailed notes taken by Tony Stewart and Peter Bergström. Action points and passages that in retrospect seem especially relevant are shown in bold. However, these are not the only interesting sections. Participants in the meetings characterized them as among the best working meetings many of us had ever attended. If you take the time to read through the sessions, you will learn why.
The notes on Monday’s sessions are the least complete, but this is not a problem because the most meaty work took place from Tuesday onwards.
Monday
Beginner's Guide to EXPRESS (Nigel)
Shaw: What is a product? "Whatever you are interested in." What is NOT a product? (No answer.)
Shaw: Product data exists in many different forms, in many different systems. The description of product data is one up from the implementation forms. A conceptual database is essentially created. "Heterogeneous implementation forms" has been there from the beginning.
EXPRESS: a language for recording and communicating data models in an implementation-independent fashion. Capture everything that is needed, support automatic generation of implementation...
EXPRESS-G Graphical representation
EXPRESS-P processes
EXPRESS-C Object methods
EXPRESS-I Instances (can be done with SGML also)
(Nigel displays a brief example of EXPRESS using dogs and cats to show an entity-attribute modeling approach.)
A discussion of constraints in EXPRESS vs. SGML. EXPRESS is not a programming language because it cannot be RUN: it exists to set constraints. EXPRESS 2 will be RUNnable.
Discussion of the presence of rules in EXPRESS, which does not include the ability to specify when the rules should be imposed. You can't see when and where a rule should be applied.
Discussion of the simple Types in EXPRESS, attributes, and the role of constraints in the language.
EXPRESS is the key to STEP, but is being used beyond, in other standards.
Discussion of the Part 21 file format. The file format can be derived from the base EXPRESS model, so a file format comes as a "freebie." The Entities in the file are not named. They are sequence dependent, which means you must have the schema in order to interpret the file. The file contains the public identifier of the relevant STEP document, but that begs the question of which version of that document should be used for conforming the file.
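The point about sequence dependence can be illustrated with a minimal Python sketch. It is not the actual Part 21 grammar: the DOG entity, its attribute names and the record layout are hypothetical, chosen only to show that positional values cannot be interpreted without the schema that fixes their order.

```python
# Hypothetical schema: attribute names in declaration order, as an EXPRESS
# entity would fix them. Not taken from any real STEP schema.
SCHEMA = {
    "DOG": ["name", "age", "owner"],
}

def interpret(record_type, values, schema=SCHEMA):
    """Pair positional values with the attribute names given by the schema."""
    names = schema[record_type]
    if len(names) != len(values):
        raise ValueError("value count does not match the schema")
    return dict(zip(names, values))

# The record carries only values, in order; it names no attributes itself.
record = ("DOG", ["Rex", 4, "Smith"])
dog = interpret(*record)
print(dog)
```

Without SCHEMA, the list ["Rex", 4, "Smith"] is just three unlabeled values, which is why the version of the schema used to read a file matters.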
SDAI: a standard way to define an API against data from an EXPRESS schema. No matter what language is used, the semantics of the functions are the same.
Discussion: Nigel Shaw and Eliot Kimber discuss whether we’re better off with a file format API like SDAI, or a querying language. This raises the issue of early vs. late bindings. The choice of which is preferable depends on the stability of the model, since SDAI is a programmer’s interface that must be defined early, as opposed to a query language or end-user’s API which allows late bindings.
Introduction to the question: Do we need SGML_STRING?
Shaw: SGML_STRING is a proposed addition to the EXPRESS language which would serve as a bucket to hold SGML data. A storage model for some data that happens to be modeled/written in SGML.
Mak: You might want to store SGML in EXPRESS, but you might just want to make links and store the SGML elsewhere. This is (or should be) an end-user choice.
Rivers-Moore: A document is a presentation of a subset of the information in human-readable form. This means that it is not conceptually different from the rest of the data you are working with.
Shaw: Along those lines, in recent NATO work SGML_STRING has been generalised to cover generalised encodings.
Partridge: A crucial point is that implementation is meant to be a picture of the real world, not just information about information.
Shaw: I should have said that EXPRESS does not enforce any modeling style.
Partridge: The point is to make the language as expressible as possible.
XML Data Schema (paper from Henry Thompson presented by Daniel Rivers-Moore)
Rivers-Moore: Henry Thompson of the University of Edinburgh has sent a contribution to the group, in the form of a document entitled "Specification for XML-Data." This is a work in progress, not a standard, which is available on Microsoft’s web site. The paper includes samples of XML. You can apply the syntax to things outside of books and documents, in this case an order for a bookstore. XML provides a syntax for easily writing data structures.
Rivers-Moore: Think of XML as "simplified SGML," where the document’s text can itself be data. An example shown is a recording’s title that includes the name of the composer. The composer’s name can be identified inside the string within this particular document instance. This is eminently possible in SGML, but difficult or impossible in EXPRESS.
Comments by Shaw, Rivers-Moore and others: The SGML DTD corresponds to an EXPRESS schema, while a document instance corresponds to a Part 21 file. General discussion of the similarities and contrasts.
Rivers-Moore: XML makes the DTD optional, provided that you put in fully-formed tagging so that the schema can be inferred from the document.
Kimber: There are two reasons to have a schema. 1) To have formal declarations. 2) To be able to parse the document. XML imposes constraints so that if the document is well-formed, you can always parse it without a DTD.
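As a small illustration of parsing without a DTD, the following Python sketch uses the standard library's ElementTree parser. The element names are made up for the example; no DTD or schema is supplied anywhere.

```python
import xml.etree.ElementTree as ET

# No DTD is supplied; the element names are invented for illustration.
well_formed = "<order><book><title>Example</title></book></order>"
root = ET.fromstring(well_formed)
titles = [t.text for t in root.iter("title")]

# A document with mismatched tags is not well-formed and is rejected.
broken = "<order><book><title>Example</book></title></order>"
try:
    ET.fromstring(broken)
    parsed_broken = True
except ET.ParseError:
    parsed_broken = False
```

The parser recovers the tree from the tagging alone, but it cannot say whether an order is allowed to contain a book; that is the job of the formal declarations.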
Rivers-Moore: Both XML and SGML insist that there must be a well-formed tree, though its pieces can be sent in separate files which reference each other and are assembled at the receiving end.
General discussion of Typing, which is possible at the Attribute level in SGML but not in the ordinary contents of an SGML Element. This causes a dilemma for the person defining the SGML schema: whether to store SGML data as an attribute in order to be able to Type it, or to store it inside the element where it can’t be Typed but it can be richly tagged.
General discussion of Namespaces. The proposed XML language will handle namespaces via colons, which allow you to indicate the space within which an entity name operates. Shaw explains that STEP has rules to handle most of these issues, but that there is an implied single namespace for the assembled schema. This is seen as a weakness by the group, since it assumes that all referenced items are within the bounds of the assembled schema.
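A rough Python sketch of colon-qualified names follows. It uses the namespace syntax as later standardized for XML, which may differ in detail from the proposal discussed at the meeting; the prefixes and URNs are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define "title"; the prefix says which one is meant.
doc = ET.fromstring(
    '<order xmlns:bk="urn:example:books" xmlns:pt="urn:example:parts">'
    "<bk:title>Guide</bk:title><pt:title>Valve</pt:title></order>"
)

# Prefix-to-namespace mapping used when searching (hypothetical names).
ns = {"bk": "urn:example:books", "pt": "urn:example:parts"}
book_title = doc.findtext("bk:title", namespaces=ns)
part_title = doc.findtext("pt:title", namespaces=ns)
```

The two title elements coexist in one document without clashing, which is what a single implied namespace for an assembled schema does not allow.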
SGML data decomposition (Daniel Rivers-Moore)
General discussion of IDs. IDs are a Type within SGML/XML. They must be unique within a given document (or namespace). One can use IDs within attributes to link elements together, just as a foreign key in a database record allows it to be linked to another record.
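The foreign-key analogy can be sketched in Python as follows. The attribute names id and composer are assumptions for this example; ElementTree itself has no notion of ID typing without a DTD, so the index and the uniqueness check are done by hand.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<catalog>
  <composer id="c1"><name>J. S. Bach</name></composer>
  <recording id="r1" composer="c1"><title>Goldberg Variations</title></recording>
</catalog>
""")

# Index every element that carries an id, like a primary-key lookup table.
with_id = [el for el in doc.iter() if el.get("id") is not None]
by_id = {el.get("id"): el for el in with_id}
if len(by_id) != len(with_id):
    raise ValueError("IDs must be unique within the document")

# Follow the reference from the recording to its composer.
recording = by_id["r1"]
composer = by_id[recording.get("composer")]
composer_name = composer.findtext("name")
```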
Rivers-Moore demonstrates how one can "normalize" SGML data. The format in which most SGML data is stored is a nested tree where each element contains sub-elements down to very atomic levels (e.g. paragraphs, words or even characters) and there are no IDs or, therefore, links based on IDs. At the other end of the spectrum, one can fully normalize a document by decomposing it into semantically meaningful data elements (e.g. names, addresses, part numbers, etc.), assigning an ID to each one, and storing the links between these elements which, when assembled, recreate the document in a human-readable form.
Rivers-Moore demonstrates a tool RivCom has developed that applies XML presentational rules to XML data that has been stored in a highly decomposed form, and by applying different Style Sheets, presents the data in various useful human-readable forms. This shows that the above example about normalizing SGML data is not merely interesting theory, but achievable in current software.
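The idea of assembling different human-readable forms from the same decomposed data can be sketched in a few lines of Python. This is not RivCom's tool: the data values, the IDs and the two "style sheets" are invented, and a real style sheet language is far richer than these lists of literals and references.

```python
# Decomposed facts, each stored once under its own ID (values are invented).
elements = {
    "n1": "Widget",
    "p1": "P-1001",
}

# Each "style sheet" is a sequence of literal text and references to IDs.
STYLES = {
    "label": ["Part ", ("ref", "p1"), ": ", ("ref", "n1")],
    "sentence": ["The ", ("ref", "n1"), " has part number ", ("ref", "p1"), "."],
}

def assemble(style, data=elements):
    """Build a readable string by resolving every reference in the style."""
    out = []
    for piece in STYLES[style]:
        if isinstance(piece, tuple):
            out.append(data[piece[1]])   # follow the link to the stored fact
        else:
            out.append(piece)            # literal connecting text
    return "".join(out)
```

Calling assemble("label") and assemble("sentence") yields two presentations of the same two facts; changing a fact in elements changes both.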
Shaw and Kimber discuss the fact that Rivers-Moore’s demo includes both facts and also the strings that represent the links between the facts. Shaw says that these are part of the "presentation," but wants a better word.
Shaw: Business needs the ability to publish derived documents (e.g. documents assembled from bits of data and text elements), then store these as data. For example, the manual for a product is assembled from bits of information about the product, then shipped (with a part number) as part of the product itself.
Kimber notes that a DTD is a form of syntax constraint declaration, not really a schema. It determines what can and cannot be described within the SGML.
Mak: One point about the comparison made earlier between DTDs and Schemas is that you don’t want to have to create as many schemas as people are creating DTDs.
Kimber: In SGML, it’s really Architectures that correspond to the EXPRESS Schema, not the DTD. An Architecture is the union between the SGML declaration and anything you want to write about it. [Note: Architectures are part of the HyTime standard.]
Kimber: What are the things we can agree on about a document structure within a community?
(He draws a diagram showing the continuum from generic models (e.g. generic data content and metadata), through increasingly specific levels (e.g. header, list, figures), to very specific (maintenance manuals, reference guides, etc.) The different levels are delineated by lines drawn across the diagram.)
Kimber: (Commenting on the diagram.) As the communities of interests narrow, specialization increases. But you can always derive the inheritance path (from the generic to the specific). How should we decide where to specialize? "When you can’t agree any more, it’s time to draw the line (across the diagram) and specialize in order to go on." Each level represents a new architecture; each can be derived from another. In a sense, they are really schemas.
Shaw: Another view of the world is from a database perspective. This has three levels: Views of the information (Views); Conceptual design (the nature of the stuff being stored); and the Storage design (storage mechanism).
Discussion between Kimber and Shaw about whether we need SGML_STRING. Kimber believes that it seems EXPRESS could go lower in its decomposition of what are now STRINGS. Shaw believes that we do need SGML_STRING, but that this conversation will continue on another day.
Tuesday
What HyTime can bring us (Eliot Kimber)
Rivers-Moore: introduces Eliot Kimber, a consultant from Isogen who is one of the authors of the 1997 edition of the HyTime standard and is a member of the working group that is developing XML.
Kimber: This will be a presentation on SGML, HyTime and Data Abstractions; its purpose is to provide enough technical background to this group so that we can have a useful discussion.
Kimber: HyTime is ISO 10744; an application of SGML that addresses the areas of hyperlinking, addressing and multi-media structuring, all using SGML syntax. ISO 10744 also includes annexes that provide various extensions to SGML, the most important of which are the Architectural Forms definition requirements (a general mechanism for having a hierarchy of document type schemas from which you can specialize).
(Concept demonstration. Using the HyBrowse browser, which can be downloaded from the web, Kimber demonstrates using hyperlinks to structure annotations applied to documents. This allows one to apply edits to documents without modifying the documents themselves. Starts by showing original version of a document, plus comments that exist as hyperlinks between the original lines and the suggested change. One can build a view of the document from this data and links: "Here’s the original, and here’s what you should change it to.")
Crawford: What are the rules for hyperlinks?
Kimber: Hyperlinks are strongly typed. The relationship itself is typed, and each end of the link is typed. HyTime imposes no policy on links, and a link needn’t be binary.
Shaw: What makes a link "hyper?"
Kimber: We use the word to describe an arbitrary relationship, in order to distinguish from the LINK feature of SGML. Because I’m using normal SGML syntax, the description of the link can be as rich as I want it to be. Metadata, markup, etc.
Kimber: One thing that distinguishes HyTime links from others is the idea of strong typing, giving presumably rich names to the attributes of the links. I can create arbitrarily complex networks of relationships and manage those using the same technology I already use to manage my SGML data: authoring tools, display tools, etc.
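A minimal Python sketch of a strongly typed, non-binary link follows. It does not use HyTime syntax; the class names, role names and address strings are all hypothetical, chosen to echo the annotation demonstration.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Anchor:
    role: str      # the type of this end of the link
    target: str    # an address of the linked object (format is an assumption)

@dataclass
class HyperLink:
    link_type: str                                  # type of the relationship
    anchors: list = field(default_factory=list)     # any number of ends
    metadata: dict = field(default_factory=dict)    # as rich as needed

    def ends(self, role):
        """Return the targets of all anchors playing the given role."""
        return [a.target for a in self.anchors if a.role == role]

# A three-ended link: an original passage, a suggested change, a reviewer.
annotation = HyperLink(
    link_type="suggested-edit",
    anchors=[
        Anchor("original", "doc1#line12"),
        Anchor("replacement", "comments#c7"),
        Anchor("reviewer", "people#jc"),
    ],
    metadata={"status": "open"},
)
```

Because the link lives outside the documents it connects, the original text is never modified; a view is built by following the typed ends.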
Crawford: Are there tools that allow you to manage your network of links?
Kimber: None that you can buy off the shelf. All the existing SGML document management systems are weak, because they don’t provide facility to manage the relationships.
Crawford: Do you know any industries where they’ve come up with a rich suite of link types?
Kimber: Not really. One customer, the Congressional Info Service, abstracts publications of the US government; essentially creating hyperlinks from documents to their abstracts. Isogen created a system to manage that, but it’s a pretty simple set of relationships. Other research done by the French electric company uses Topic Maps… One of the challenges of HyTime is that it’s complicated, so it has taken time for people to start implementing it on a large scale. It’s a very general mechanism and a complete mechanism, though simple at heart.
Rivers-Moore: Are there any tools under development?
Kimber: No. SGML management systems today are very weak in this area.
Crawford: It’s extremely difficult to manage the links using relational architecture.
Kimber: The scale problem is daunting.
Crawford: The "where used" query in particular will dim the lights on the mainframe.
Kimber: The complexity of the information is unavoidable, so the only way to solve the scale problem is to make the computers faster.
Rivers-Moore: These problems are analogous to the EPISTLE problems. I believe that when we come to work items, the problems will be simple, but on the implementation side the problems will be very real.
Crawford: The issue is to bound the problems. We must stick to a domain like product data and its usages, thus defining our implementation of HyTime for our domain in order to avoid the problems you get by being arbitrary.
Shaw: It depends on the granularity of your information at certain levels. If you split everything up so you’re addressing individual addressable items in your storage mechanism, it can be done.
Introduction to Groves and Addressing
Kimber: (Back to the presentation.) In order to create hyperlinks, you must have a way to address the things being linked. The bulk of HyTime deals with how to address these things. If I’m linking things of different types, I need a sufficiently robust, sufficiently standardized way of addressing them. HyTime, like STEP, was designed for large scale, scalable systems.
Addressing (means) identifying semantic objects within a given storage object. The things being addressed are abstractions in memory, not the source data. This requires a standardized, generic representation of the abstractions: "Groves." Groves are a generic abstract object model optimized for representing parsed SGML documents. Once something (e.g. a parsed document) is in a grove, I can address the objects in a grove using HyTime syntax. Because groves are general, they should be essentially identical to other object property models. It should be relatively easy to define mappings between other existing object models and groves. (Both DSSSL and HyTime define mappings to and operations on groves. Together they provide the capabilities to do anything you would want to do with groves. Anything you want to do with HyTime can be implemented using the DSSSL functionality.)
In SGML, the fundamental unit of storage is an "entity." An SGML entity is an abstract storage object, no relation to "entities" in data modeling. SGML defines two kinds of storage entities: documents and data. (Or, SGML and not-SGML, which from the SGML perspective describes the entire world). Addressing in SGML is always relative to its storage container.
System names and public names: If I only use public names within a document, the system can map them to real storage locations. ISO 9070 defines public identifiers and a registration mechanism for defining registration owner names, essentially identical function to ISO 8824. These provide a level of indirection; universal, including a registered owner name. HyTime’s addressing depends on all the mechanisms already in SGML for describing the location of information.
It is impossible to manage addresses if they are not indirect. The addresses are within documents, and if you cannot change the documents, you cannot change direct addresses. Indirection also allows us to transform one form of address into another form of address. The system can transform direct addresses created by the authors into indirect addresses that are stored and can be maintained.
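The value of indirection can be shown with a short Python sketch. The public name, the storage paths and the dictionary standing in for a catalog are all invented for illustration; real systems resolve public identifiers through their own catalog mechanisms.

```python
# Hypothetical catalog mapping public names to current storage locations.
catalog = {
    "-//Example//DOCUMENT Pump Manual//EN": "/archive/pump-manual.sgml",
}

def resolve(public_name, cat=catalog):
    """Map a public name to wherever the object is stored right now."""
    return cat[public_name]

# Documents store only the public name, never the storage location.
reference_in_document = "-//Example//DOCUMENT Pump Manual//EN"
before = resolve(reference_in_document)

# Moving the file means updating one catalog entry, not every document.
catalog[reference_in_document] = "/vault/manuals/pump.sgml"
after = resolve(reference_in_document)
```

The reference inside the document is unchanged before and after the move, which is what makes the address maintainable.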
HyTime standard available on the Web
www.ornl.gov/sgml/wg8/docs/hytime. (Not sure of the full URL, but if you go to the ‘docs’ level you should be able to find it.) The HyTime syntax isn’t important. What’s important are the underlying semantic mechanisms and the abstractions we depend on, in particular groves.
Groves
A grove is what you get when you parse a document and build an abstraction of it in memory. Roughly equivalent to a "parse tree," though that’s not the complete story. Groves are what make everything in SGML, HyTime, DSSSL etc. work. The parser that creates the grove is the only part of the system that operates on the source data; all the rest operate on the grove.
A grove is a collection of nodes. Each node has a specific node class, a set of properties, and a value, which can also be a list of elements or a string of nodes. Within the grove, properties representing "content" are distinguished from other types of data (i.e. metadata). In sum, you have objects with properties and relationships among the properties.
Groves are "constructed" based on a property set. For a given property set and set of data, there can only be one grove. But with a different property set applied to the same data, you get a different grove.
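A very rough Python sketch of grove construction follows. It is a simplification made for this note, not the SGML property set: the Node class, the three property names and the use of an XML parser as the source are assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Node:
    node_class: str
    properties: dict = field(default_factory=dict)

def build_grove(element, property_set):
    """Build a node tree, keeping only properties named in property_set."""
    props = {}
    if "gi" in property_set:              # generic identifier (element name)
        props["gi"] = element.tag
    if "attributes" in property_set:
        props["attributes"] = dict(element.attrib)
    if "content" in property_set:         # child nodes, in document order
        props["content"] = [build_grove(c, property_set) for c in element]
    return Node("element", props)

source = ET.fromstring('<manual id="m1"><title/><body/></manual>')
full = build_grove(source, {"gi", "attributes", "content"})
names_only = build_grove(source, {"gi", "content"})
```

Building twice with the same property set gives equal groves, while names_only differs from full although both come from the same data.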
The construction process:
Property sets are actually schemas. They are used for defining the object model for SGML and the semantic processing that HyTime does.
A complete grove contains many things you don’t need for a given application. So, you create a "grove plan" which governs what should be included in your view of the grove. The grove plan defines a subset of the functionality of the entire property set. (A grove plan is my "view:" include these properties, exclude those properties.) Different applications can have different views of the data, and can then communicate with each other about their different views, which is very important.
A grove plan can be applied during grove construction, thus affecting the grove itself (as opposed to merely a view of it). So, a data set plus a property set plus a grove plan defines a grove. Once the grove has been constructed, you can use grove plans to govern your views of it.
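The grove plan as a "view" can be sketched in Python as a filter over an already constructed grove. The dictionary representation and the property names are assumptions made for this example; the sketch also assumes that every list-valued property holds child nodes.

```python
def apply_plan(node, plan):
    """Return a copy of node containing only the properties listed in plan."""
    view = {}
    for name, value in node.items():
        if name not in plan:
            continue                      # property excluded by the plan
        if isinstance(value, list):       # assumed to be a list of child nodes
            view[name] = [apply_plan(child, plan) for child in value]
        else:
            view[name] = value
    return view

# A tiny grove with invented content.
grove = {
    "gi": "manual",
    "attributes": {"id": "m1"},
    "content": [
        {"gi": "title", "attributes": {}, "content": []},
    ],
}

# One application's view: structure only, no attributes.
structure_view = apply_plan(grove, {"gi", "content"})
```

The underlying grove is untouched, so another application can apply a different plan to the same grove and see the attributes as well.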
For any common type of data, there is probably an SGML parser already. The real challenge for creating groves is defining the property set.
Shaw: This I find confusing, because in the STEP world we’ve been working on building a common understanding of what the information is, so you can navigate on the conceptual model rather than on the output of the parser.
Kimber: But, you need some kind of parser even then, you just haven’t included it in your purview. My understanding of computer science is that every system in the world does require these elements.
Bergström: But it doesn’t have to be stored in groves.
Kimber: Right, this is just an abstraction of an abstract data model. You can implement this any way you want. But, all SGML tools have to operate on groves, so as an SGML systems integrator I’d like to see my information parsed into groves because then I can integrate everything with everything and then it’s really easy. <g>
Shaw: I don’t think parsing is quite right, because you can navigate rather than parse.
(Morning Break)
Sources of HyTime information:
www.hytime.org HyTime users group
www.drmacro.com/bookrev HyTime book (out of date)
www.ornl.gov/sgml/wg8/docs HyTime standard
www.isogen.com Papers, Software, Adept Editor HyTime scripts etc
Relationship between EXPRESS and Groves
Kimber: Because a grove is a representation of objects and relations, it should be possible to present EXPRESS as a grove, maybe not all of it, but a fairly close subset at worst. It should be possible to define a property set from a given EXPRESS schema. I can then define operations on the grove, that express the semantics of the EXPRESS schema.
Shaw: It would be interesting to express the SGML property set as an EXPRESS schema. Probably possible to define everything in EXPRESS.
Kimber: I would be surprised to find that there is anything in a grove that can’t be expressed in EXPRESS. However, I think it’s likely that there are things you can express in EXPRESS that could not be expressed in a grove.
Wenzel: I’m surprised when you say that EXPRESS could map the whole grove, because EXPRESS doesn’t take the representation of the information into account (much). We need in the STEP world to add representational capabilities to EXPRESS if we want to carry it forward.
Rivers-Moore: One possibility for work to be done is to attempt to build a complete EXPRESS representation of a grove, or vice versa, then see what needs to be done to flesh them out.
Wenzel: You probably cannot map the entire property set into EXPRESS, since there are implementation-specific things added when we go from EXPRESS to implementation forms such as Part 21, Part 22.
Property Sets
Kimber: The SGML property set is really the heart of the language/standard. It contains normalization rules, for normalizing (say) strings for comparison. (The Property Set contains references to the actual SGML standard, by paragraph within volume, which is the normative definition of what SGML is.) This section consists of instructions to whoever is implementing the Grove Constructor program to implement it correctly.
Kimber: You can formally define a namespace in the property set--then hope that the implementations properly use that namespace!
Kimber: The SGML property set is divided into abstract properties and markup properties. Markup properties hold the original strings. You could rebuild the original document, byte for byte, by assembling the markup properties. But normally you ignore them. Once you’ve parsed and stored the abstract contents of a document, you could (ignoring the markup) generate a new SGML document containing the original content, but not necessarily containing the same syntax as the original document (since the markup in the new document is generated). But you could write a query that would traverse any grove that included the appropriate markup properties and regenerate the original document, byte for byte.
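The distinction between abstract and markup properties can be illustrated with a small Python sketch. This is not how any SGML tool stores a grove; the property names `open_markup` and `close_markup` and the node layout are assumptions made for the example.

```python
# Toy sketch: each node keeps abstract content plus the original markup strings.
# Property names (open_markup, close_markup) are hypothetical.

def regenerate(node):
    """Rebuild the source text by concatenating markup properties and data."""
    if isinstance(node, str):          # character data
        return node
    inner = "".join(regenerate(child) for child in node["children"])
    return node["open_markup"] + inner + node["close_markup"]

def abstract_text(node):
    """Return only the content, ignoring markup properties."""
    if isinstance(node, str):
        return node
    return "".join(abstract_text(child) for child in node["children"])

original = '<P ID = "a1" >Part <B>42</b></P>'
grove_root = {
    "open_markup": '<P ID = "a1" >', "close_markup": "</P>",
    "children": ["Part ",
                 {"open_markup": "<B>", "close_markup": "</b>",
                  "children": ["42"]}],
}
```

Traversing with `regenerate` recovers the original string including its odd spacing and mixed-case end tag, while `abstract_text` sees only the content.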
Shaw: In many cases, EXPRESS does not assume a syntax for the implementation, and we don’t care about it, so we don’t include facilities to store and recreate the original syntax.
Kimber: SGML’s reason for being is paranoid distrust of all programmers. As soon as you put your data into an abstraction that can only be accessed by programs, you have given over your ability to access the data to the program and the programmer who created it. SGML therefore allows you (if you wish) always to get back to the original, because you can preserve every character in the original data.
Shaw: STEP also expresses a distrust in one sense, in that we want to get hold of the information in a way that is independent of the programmer.
Sauter: I’m not so interested in whether I can map SGML to EXPRESS and vice-versa, but rather, can I get back to my original data from the documents that reference it.
Kimber: The thing that I can control as the owner of the data is its string representation, and everything else is driven by that.
Shaw: But, a lot of what we’re interested in is not efficiently handled as strings, or it’s not nice to view it as a string.
Kimber: You should be able to define an EXPRESS schema for the grove model, and then use EXPRESS-based tools to manage the data that has been placed in the groves.
Kimber: So, we want to get non-SGML data into our documents in a standardized, non-proprietary way. And, we need to define constraints on the content. HyTime provides some relevant features. First, the addressing mechanism can address anything. Also, HyTime provides lexical typing: string spelling rules. With the combination of lexical types and the ability to get data from a database, we get a lot of constraint definition capabilities.
(draws illustration)
DEFINING CONSTRAINTS
Lexical typing
No algorithms, except string comparison
Regular expressions
Not data typing
Use by reference
value reference facility
property, content and annotation relationships. The first two are in SGML.
(end of illustration)
Kimber: You can use any mechanism you want for lexical typing and queries.
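A minimal sketch of lexical typing by regular expression, written in Python for illustration. The type names and patterns are invented for the example; HyTime defines its own lexical type notation, which is not reproduced here.

```python
import re

# Hypothetical lexical types: each is only a spelling rule for a string.
LEXICAL_TYPES = {
    "part-number": re.compile(r"[A-Z]{2}-\d{4}"),
    "iso-date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

def conforms(lexical_type, value):
    """True if the whole string matches the spelling rule for the type."""
    return LEXICAL_TYPES[lexical_type].fullmatch(value) is not None
```

Because this checks spelling only and is not data typing, a string such as "9999-99-99" conforms to the toy `iso-date` rule even though it is not a real date.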
Kimber: "Use by Reference." This is a general clause. SGML focuses on content and property relationships. HyTime goes further. Using HyTime location addressing, I can say that the content of an element is an item in a database. Using HyTime, you make a grove out of the contents of a database: write a function that runs a query into the database via the API, and returns the value in a form the SGML (HyTime) engine understands. (And note that additional constraints on the data can be placed in the query.) The general mechanism in HyTime is called a Value Reference facility. This is different from saying that I’ve got a content element whose value is something else; instead, we say to use the actual part number (in a database) as the content.
Shaw: But you have to define the query up front. This is early-bound.
Kimber: But, you can define a query that is very generic and will be bound at runtime via parameters or whatever.
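The idea of a generic query bound at run time can be sketched in Python using the standard `sqlite3` module. The table, the `valueref` attribute name, and the function `resolve_value_reference` are assumptions for the example and do not represent the HyTime value reference facility itself.

```python
import sqlite3

# Hypothetical in-memory database standing in for a product data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id TEXT PRIMARY KEY, number TEXT)")
conn.execute("INSERT INTO part VALUES ('p1', 'AB-1234')")

def resolve_value_reference(connection, part_id):
    """Run a parameterized query and return the value to use as content."""
    row = connection.execute(
        "SELECT number FROM part WHERE id = ?", (part_id,)
    ).fetchone()
    return None if row is None else row[0]

# The element carries a reference, not the value; content is filled at run time.
element = {"gi": "partnum", "valueref": "p1"}
content = resolve_value_reference(conn, element["valueref"])
```

The query text is fixed up front, but the part it retrieves is determined by the parameter supplied when the document is processed.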
Crawford: How are links articulated in the groves?
Kimber: (draws picture) Via the HyTime Semantic Grove, which is created by the HyTime engine, and includes as its nodes anchors and pointers to the original data. This grove is also defined via a property set, but in this case the property set relates to semantics rather than syntax (unlike the SGML property set). With the HyTime property set, if I have another data type which includes things from which I can derive the same semantic objects, and I’ve created a grove from it, I can apply HyTime hyperlink processing to that other data set even though it’s not represented in SGML. For example, if I’ve got a CGM grove, I can use that grove as input to the HyTime engine.
Shaw: This raises a further topic for discussion. We could do an EXPRESS model of the HyTime semantic, and an EXPRESS model of the HyTime storage mechanisms.
Kimber: I should be able to add properties to my non-SGML data which are addresses into my SGML data; for example, make a property of a Part the address of its description in a document. (These don’t just have to be strings.) Use the address of the grove node as the address in the document.
Shaw: But even though the above would solve some of the problems, we’d like to be able to store the SGML data itself in STEP databases.
Rivers-Moore: What is the reason we would have to have an SGML-STRING type in EXPRESS--in order to do that?
Shaw: For a start, it would have to contain binary data. We don’t need SGML String; we need an information object storage structure. That would convert the STEP world into an open standards capability in some senses.
Rivers-Moore: We must reach closure on this particular decision this week, as it’s been hanging for several years.
Crawford: I think Nigel is onto it. There are many reasons to open up the STEP storage architecture, including SGML and other things we haven’t thought of yet.
Mak: One thing you need to consider is who’s actually going to use the piece of information: PDM system, STEP system, what? If it’s a PDM environment you probably want it outside of STEP so others can access it. What is the main use of the information?
Crawford: Given that in this particular world we know how to use the information, there’s clear rationale for inclusion in the STEP data. But I’m torn about whether this should be more generalized because we’re in the world of standardization.
Crawford: Nigel commented before about the difference between parsing and navigating. We have the conceptual scheme and the presentation scheme. Something goes in between, though, which is how to instantiate and navigate the information. I don’t see anything in the standards world that helps me on that. What I’ve heard today is wonderful stuff on the delivery side, but how do you manage the links and the webs and dynamically have access to them? I think the grove concept really shines some light on that…
Shaw: Bernd has remarked that there ought to be a Part 20, which would be an abstract representation of the data similar to the grove.
(Lunch break.)
Open Forum for discussion of points arising from Eliot’s presentation
Shaw: We’re increasingly getting into architectures with more than one level of abstraction. A nice way to move between those are these kinds of mapping tools.
Kimber: One nice thing you can do with a style language is to say that the style result is another document.
Kimber: I’m trying to think of the cross-ways these two things could be applied. You could definitely define a property set that would describe an instance of things defined in EXPRESS.
Shaw: But EXPRESS doesn’t go down to the physical level. We have the assumption that there are identifiers, but we don’t get down to the identifiers themselves.
(Rivers-Moore discusses the different kinds of property sets one could define.)
Shaw: The name of the game is the grove constructor, isn’t it.
Shaw: SDAI lets you potentially get little bits of your grove, not the whole thing. But you could construct a mechanism for doing so.
Shaw: The basic working assumption is that we have access to the schema and some instances that correspond to it.
Rivers-Moore: That is, a Part 21 file plus the corresponding schema?
Shaw: Logically speaking, there is a sort of Part 20, but it’s only implicitly inferred from statements in Part 1.
Rivers-Moore: So, one could say that there’s a possible NWI to develop the property set derived from the schema and the Part 21 file.
Shaw: No, you just need the schema. However, there are several addressing problems in EXPRESS: Getting to a starting point in order to walk through the members of a set; Getting inside of a STRING if you encounter one.
(Discussion of the fact that in an SGML grove, the root node corresponds to the storage container. So, the grove root can be considered a "schema_instance" with a nodelist that can have zero or many nodes.)
(Discussions of issues surrounding what is required when you encounter a document instance inside an EXPRESS database.)
Rivers-Moore: Suppose you have a well-formed XML document in a STRING field in the EXPRESS database. When I encounter this, how do I know it’s an SGML string? What seems to me is missing (which Nigel showed this morning) is a pairing of a data field and a second field explaining the encoding of the first field.
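The pairing of a data field with a second field naming its encoding might look like the following Python sketch. The class name `EncodedField` and the encoding labels are hypothetical; nothing here is defined by EXPRESS or STEP.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class EncodedField:
    """Hypothetical pairing of a data field with a field naming its encoding."""
    data: str
    encoding: str   # e.g. "xml" or "plain"; the labels are illustrative

def interpret(field):
    """Dispatch on the declared encoding instead of guessing from the data."""
    if field.encoding == "xml":
        return ET.fromstring(field.data)   # parse as well-formed XML
    return field.data                      # treat as an opaque string

description = EncodedField("<desc>Bracket, <b>steel</b></desc>", "xml")
label = EncodedField("<not markup>", "plain")
```

Without the second field, the string in `label` could not be distinguished from markup by inspection alone.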
(Nigel Shaw draws a chart which starts out as a simple listing of the four levels of the STEP standard relating to the EXPRESS language – EXPRESS instances, Part 21 files, SDAI, and EXPRESS Schemas – but expands during the following conversations as more and more details are added. This is the final appearance of the chart.)
(Note that just one Action Point was derived from this chart. It is inserted in the text in boldface at the point during the conversation below when the need for it became apparent.)
| | property sets | | |
| | Storage model | Syntactic | Semantic |
| Instances of EXPRESS | None | no syntax known | Early and/or late? |
| P21 | "file" name + info obj id | header + sections | same as above + a bit (id's) |
| SDAI | Repository + schema instance | same as first + other extensions | |
| EXPRESS schema | info obj id (8824) | same as above | |
Kimber: In order to reference data in a STEP storage system, I need to reference both the data and the repository. I need a way to say that "this set of objects is an identifiable unit."
Shaw: A "schema instance" in SDAI terms.
Kimber: But the STEP standard does not provide a way to identify the storage object that conforms to the schema. From a HyTime viewpoint, an address always contains two parts: Storage Object and Semantic Object, where the semantic is always contained within the storage.
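The two-part address described here can be sketched in Python. The `Address` type, the repository layout, and the identifiers are invented for illustration and do not correspond to any STEP or HyTime syntax.

```python
from typing import NamedTuple

class Address(NamedTuple):
    """Hypothetical two-part address: which storage object, then what inside it."""
    storage_object: str    # e.g. a name for a schema instance or file
    semantic_object: str   # an identifier meaningful within that storage object

# Toy repository: storage objects map to the semantic objects they contain.
REPOSITORY = {
    "schema-instance-1": {"#12": {"type": "product", "name": "bracket"}},
    "schema-instance-2": {"#12": {"type": "person", "name": "Smith"}},
}

def resolve(address):
    """Resolve the storage object first, then the semantic object inside it."""
    return REPOSITORY[address.storage_object][address.semantic_object]
```

The same semantic identifier `#12` means different things in the two storage objects, which is why the storage part of the address cannot be omitted.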
Shaw: We should continue using the 8824 mechanism to identify the storage instances.
Sauter: The problem is that in EXPRESS the identifiers are explicitly hidden. This is the biggest problem.
Kimber: Getting to the individual objects isn’t necessarily the problem.
Rivers-Moore: Once you have gotten "somewhere," you can navigate by walking the links or whatever. The issue is how to get to the first place.
Kimber: Because EXPRESS lacks an abstract storage model, it doesn’t provide a way to identify schema instances that removes the need for the person making the reference to know anything about how the schema instance is stored.
Shaw: Maybe "accessed" instead of "stored."
Kimber: I know which schema I’m using, and I want to access an object that is an instance of it, but there is no notion of "this schema instance" and therefore nothing for me to give a name to except as provided through a particular API (Part 21).
Sauter: Are you talking about instances of a schema or instances of an entity?
Shaw: It’s a set of instances corresponding to a schema.
Kimber: In other words, for some set of objects that all use the same schema, they represent a set of objects that are instances of the schema.
Shaw: But in SGML you cannot have an empty set, while in STEP it is possible.
Rivers-Moore: Not a problem, because there is an implicit root, the empty set.
(Discussion of how the root can also be considered to be the storage container.)
Rivers-Moore: So because an SGML document has to have a single root element (exactly 1), and because EXPRESS can have 0, 1 or many entities in its instance, we’re saying that that set of instances in EXPRESS will be considered to be an artificial entity that corresponds to the root at the top of the SGML grove tree.
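A short Python sketch of the artificial root just described. The node class name `schema_instance` follows the wording used in the discussion, while the function and dictionary keys are assumptions for the example.

```python
def grove_root(schema_name, instances):
    """Wrap 0, 1, or many entity instances under one synthetic root node."""
    return {"class": "schema_instance",   # hypothetical node class name
            "schema": schema_name,
            "nodelist": list(instances)}

empty = grove_root("toy_schema", [])
many = grove_root("toy_schema", [{"type": "point"}, {"type": "line"}])
```

In both cases there is exactly one root, so the empty set of instances still gives a well-defined place to start navigating.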
Shaw: But what about when a repository contains multiple schemas?
Kimber: Use a hyper-grove, containing one or more groves.
Shaw: But those schema instances may conform to different schemas, but have overlapping data. My schemas conform to a view, and I may have several views of the data… In most cases, the same sub-schema will have been re-used in two different schemas. The same data with the same structure, but it reappears in two schemas and I can navigate to it according to two different routes via the two different schemas and end up at the same place.
Kimber: So (drawing), I have data object A which is a real instance in a repository with type X. That type X may be defined in Schema A and Schema B. Does Type X have to be the same in the two?
Shaw: Let’s assume Yes, but there may be additional sub-types in one of the two.
Kimber: But we don’t care about that, we’re paying attention to just this level. In that sense, then this is just like the SGML view…
Rivers-Moore: It seems to me we’ve introduced a new set of requirements for finding an analogy.
Shaw: Let me put my spin on what I think is happening. As you go from the EXPRESS schema to SDAI, there is a more tangible storage. And I can go one level higher, to a collection of repositories. But we still don’t have a mechanism to get at the bounding repository.
Kimber: Does an object know what schema governs it?
Shaw: Yes, effectively. The EXPRESS language operation returns "type of" which includes the schema in order to fully identify the type.
Kimber: So, for any object there is a relationship to its types which involves its schema.
Rivers-Moore: And might be multiple schemas?
Shaw: Yes.
Kimber: As we get into combining schemas, we get into the area of Architectures.
Shaw: Anything that is a shared or public schema assigned from within the organization has been assigned an identifier. (There’s also the BSU used in PLIB, but I’m not going into that…)
Sauter: What about using the URL?
Shaw: That’s not within the scope of what we can standardize.
Kimber: Can I define relations among objects that are governed by different schemas?
Shaw: No.
Rivers-Moore: In HyTime you can.
Shaw: That doesn’t mean it isn’t desirable. Once we have an addressing mechanism for schemas, we could do this. But I’d rather use another standard to do that if it exists out there.
Rivers-Moore: Here we’re discovering things that can only be done in one or the other of these standards. These are interesting points…
Shaw: We have a world at the moment where the relationships supported by the scheme of things are those defined in the schema. At an instance level we need to be able to break out of that. This is also being addressed by Parametrics…
Kimber: If the schema definition is part of the data, then it makes sense that you’d want to create a property set for it in order to apply your grove tools to the schema.
Shaw: The SDAI document includes the ability to hold an EXPRESS Schema; therefore, EXPRESS already includes an EXPRESS Schema of itself in order to enable this. This would probably be a Semantic property set… If we’re going to be consistent, we end up with a single model that is consistent with the EXPRESS defined in SDAI, plus possibly some extensions. This would be the most useful way (or starting point) for us (if we decide to make such a property set).
Action Point agreed: Create an SGML Property Set corresponding to the EXPRESS model defined in SDAI, plus possibly some extensions, in order to represent all of EXPRESS in a single model.
Shaw: All of the above is for pulling EXPRESS stuff into the SGML world. It might be useful to do the reverse, and bring the SGML and/or HyTime property sets into the EXPRESS world.
Kimber: If I’ve got an EXPRESS model of the property sets, then I can use EXPRESS tools to work with the information that has been placed in my groves.
Shaw: It looks like we need an EXPRESS model based on clause A.4: Property Set Requirements Annex. But should we model the SGML Property Set syntax? It looks like potentially there is some value there.
Kimber: At a minimum you would do the base abstract level, but to have a complete implementation you’d have to also include the markup properties. Question is whether you need to be able to get the same string out?
Shaw: If you’re going to do it, you may as well do the whole thing.
Kimber: If you include the markup properties, they are redundant with the abstract properties.
Shaw: Not from our perspective.
Kimber: For every character in every string, there is a data character object. We have the mechanisms here to handle two quite different levels of granularity.
Shaw: Then there’s the HyTime property set. I know we have to model this, because I want to be able to store and manage those HyTime links.
(Afternoon Break)
HyTime Addressing Concepts
(Because the conversation prior to the break had focussed on addressing issues, Eliot Kimber begins with a further discussion of the HyTime addressing mechanisms.)
Kimber: There are three forms of addressing: name based, structure based and semantic.
Location source: the root of a grove or sub-grove. If you traverse location sources upwards, you end up at the root of the entire grove.
Hyper-grove: a set of groves based on documents that are related to each other, each of which is likely to end up containing pointers to one or more of the others. A sub document is not syntactically a part of the document that references it, so that it must have its own grove.
HyTime has a concept of Bounded Object Set; you can have a set of groves and say that together they represent a set.
(Discussion of how every system has to be bounded somewhere. Yesterday, Jim Crawford suggested that perhaps by limiting the bounds of the problem domain and the range of link types to be used we could arrive at a manageable amount of work for the project, and solutions that can actually be implemented.)
Bergström: One thing we haven’t mentioned is how to transform the data you retrieve from your data system before linking or presenting it.
Kimber: In HyTime terms, in order to retrieve (say) a Real from a database, I have to have a grove with a property whose type is Real to hold the retrieved value.
Bergström: But where does the knowledge of how to transform the number go?
Kimber: In the grove constructor that pulls the value from the database.
Bergström: But I’m looking at the presentation of it.
Kimber: In the DSSSL spec I can format it any way I want, including by interrogating the contents of your other groves (e.g. the HyTime semantic grove) to determine what to do with it.
Sauter: Can you also associate a tool with the data in order to specify how you would view or work with the data in the document?
Kimber: Yes, you can associate tools with the data notation you use to locate the container of the data.
Liu: I recently proposed an NWI for plant operations. How can this technology support real time requirements?
Kimber: In a sense it’s the same picture we’ve been discussing. We are linking a database, with documents, with the application(s) used to access/display the data in the database. There are a couple of ways to generate documents differently on the fly based on real-time operating conditions (such as an abnormal temperature). One can tag the information and have a style sheet that determines what to show based on IF tests, or one can switch between style sheets. Another way is to set up business rules that are linked to presentation rules.
(Discussion of how you can put triggered actions inside a DSSSL style sheet, not just representational changes. There’s nothing inherently dynamic about DSSSL, but it has the ability to call functions, so if you put it in a loop and constantly evaluate the style sheet you can cause it to call other routines.)
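The approach of linking business rules to presentation rules can be sketched as follows. This is Python rather than DSSSL, and the rule names, threshold, and rendering functions are all hypothetical.

```python
def select_style(reading, limit):
    """Choose a presentation rule from a business rule (hypothetical names)."""
    return "alarm" if reading > limit else "normal"

STYLES = {
    "normal": lambda text: text,
    "alarm": lambda text: "*** " + text.upper() + " ***",
}

def render(text, reading, limit=100.0):
    """Apply whichever style the current reading selects."""
    return STYLES[select_style(reading, limit)](text)
```

Calling `render` repeatedly as new readings arrive corresponds to the "put it in a loop" technique mentioned above; nothing in the style rules themselves is dynamic.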
Bergström: Let’s get back to the issues of what models need to be created.
Kimber: Do you have a program to take a DTD and generate an EXPRESS model from it?
Shaw: No, because there are too many decisions that need to be made in the process.
Rivers-Moore: One could standardize the way to do that, but would it be a good idea?
Group consensus: No.
Bergström: I’m concerned about how to go about building the SGML model (property set) based on the EXPRESS model.
(Discussion of resource constraints (or possibly availability of resources) to do these conversions.)
Kimber: I suspect that each side should build the model in the language it understands of the other side’s data. The difficulty is understanding the semantics that you have to express.
Rivers-Moore: Yes. Each side of the world has done its self-modeling in its own language. Now pass those models across to people on the other side so they can use the language they understand to translate and represent the semantics from the other side. With, of course, dialogs about what it means when one side doesn’t think it can understand or translate the other side’s semantics.
Kimber: A tremendously useful exercise. Doing a property set is much more like doing an EXPRESS model than doing a DTD is like doing anything else.
(Discussion of whether it’s more practical for one person to do this kind of work, or a team. An individual would be capable, given the right base knowledge.)
Rivers-Moore: Maybe it makes sense to create paired teams with an expert in the target language doing the primary work, along with a semantic expert from the other side to field immediate questions.
Kimber: Is this proposed work part of the PWI or an exercise for our own edification?
Rivers-Moore and Shaw: The purpose of the PWI includes proofs of concept. We’ve done enough talking to think that doing this exercise will be useful in determining what to standardize and what to build as proofs of concept. Doing this work could be seen either as part of a study or part of a standardization effort. Even if this is a waste of time, learning that fact should be useful.
Kimber: Even if the result doesn’t become part of a standard, it should be useful as a technical report of some kind.
Shaw: In what sequence should the EXPRESS models be built?
Kimber: Start with the A.4 model, then the HyTime model, then the SGML model. But we need all three of them fairly quickly.
Bergström: Do we need a separate property set for each AP?
Shaw: Only with early binding. If late bound, you would define (probably) one property set for all instances of EXPRESS.
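The difference between the two bindings can be illustrated in Python. The class names and the toy schema are assumptions; they are not taken from any AP or from SDAI.

```python
from dataclasses import dataclass, field

# Early binding: one class (or property set) per schema; names are hypothetical.
@dataclass
class CartesianPoint:
    x: float
    y: float

# Late binding: one generic shape covers instances of any schema.
@dataclass
class EntityInstance:
    schema: str
    entity_type: str
    attributes: dict = field(default_factory=dict)

early = CartesianPoint(1.0, 2.0)
late = EntityInstance("toy_geometry", "cartesian_point", {"x": 1.0, "y": 2.0})
```

With the late-bound form, a new AP adds no new classes: its entity types appear only as values of `schema` and `entity_type`.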
(The following chart is drawn on the board, representing ACTION DECISIONS made by the group: Three EXPRESS models will be created, one for each of the three SGML Property Sets in the left-hand column. They should be built in the sequence shown, treated as three steps in a single operation.)
| SGML Property Set | EXPRESS Models | Binding |
| Property Set Requirements (A.4) | Do this 1st | |
| SGML Property Set (the abstract part at least) | Do this 3rd | Late |
| HyTime Property set | Must do 2nd, to be able to do HyTime from EXPRESS | |
Rivers-Moore: The question has been raised that one could imagine an alternate SGML exchange format as an alternative to Part 21. Should we consider whether to do this?
(Rivers-Moore proposes an additional action item to develop an XML-based syntax for an exchange between EXPRESS databases as an alternative to Part 21. [This action item is later changed to a feasibility study and business case for developing the XML syntax.-Ed.])
Rivers-Moore: This would open up all the XML tools that are likely to come into the market soon, and compatibility with other web-based stuff. You could export a STEP database straight to a browser and it would understand what it was receiving.
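What such an export might look like can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names are invented for this example; no XML syntax for EXPRESS data had been defined at the time of this discussion.

```python
import xml.etree.ElementTree as ET

def instances_to_xml(schema_name, instances):
    """Serialize entity instances as XML; element names are illustrative only."""
    root = ET.Element("schema_instance", {"schema": schema_name})
    for inst in instances:
        el = ET.SubElement(root, inst["type"], {"id": inst["id"]})
        for name, value in inst["attributes"].items():
            ET.SubElement(el, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_text = instances_to_xml(
    "toy_geometry",
    [{"type": "cartesian_point", "id": "i1", "attributes": {"x": 1.0, "y": 2.0}}],
)
```

The result is plain well-formed XML, so any generic XML parser can read it back without knowledge of Part 21.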
Bergström: It would be XML, possibly XLL with the links, but not XSL.
Rivers-Moore: But that depends on the specific style. It might say "render this" or "store it" or whatever.
Bergström: Writing useful style sheets is an extra effort, not part of our job as part of the standards world.
Shaw: If I map from my EXPRESS out to XML, I’m going to have a potentially random view. If I then sort of impose the ordering of a document on this, I’m going to have a form of a view as opposed to the stuff itself.
Rivers-Moore: But remember that XML going over the wire doesn’t necessarily have to be shown to the user. It could go to his system…
Rivers-Moore: Peter asked to estimate the amount of time that will be involved. Eliot, how would you estimate this?
Kimber: About a week of one person’s time to do a first draft of the grove property set, then double that estimate. Then probably that same magnitude for each of the others, except that to do the HyTime model you have to understand the semantics which is not trivial.
Rivers-Moore: It seems that the EXPRESS expert who’s going to do the HyTime one will have to spend significant time closely with a HyTime expert. This might be a job for Nigel and Peter with input from Peter Newcomb.
(Rivers-Moore raises the question of whether to have formal links with other standards groups.)
Shaw: There is real potential value, but we don’t need to rush into this right now.
Kimber: The act of formalizing the liaison won’t change the work that happens.
Bergström: We have not resolved the SGML_STRING problem. Do we need it on the board of possible actions?
Shaw: I thought we had decided to go up a level.
Bergström: But what do we have to do about this unnamed thing: Information Object Representation?
Rivers-Moore: I feel we can say that this group does not see a special need for SGML String, but that we can see a very real possibility that STEP will need to handle encodings in general, which would have to go back to WG10. Let’s resume this tomorrow and do it properly.
Wednesday
Review and prioritise "Industry Requirements"
The next portion of the meeting is spent working through the "Industry Requirements" section of the Project Plan that had been distributed in September by Daniel Rivers-Moore. Each paragraph is examined and modified if necessary, and action points are established to create solutions to the proposed requirements.
The text of the document is projected on the screen. Headers below correspond to the sections within the document. Each section below begins with a "Summary" of the primary points agreed-on pertaining to that topic.
Lifecycle Management
Rivers-Moore: What do we mean by lifecycle management? (Reads from the text of his paper.) (Mentions the upcoming submission from Honeywell in which they will take the concept of "components of documents" a long way.) At least a part of this is about being able to deliver documents from components, while another part is about referencing product data within a document. Also, in the case of process control, some of the data you might point to could be changing in real time, thus causing the document (potentially) to reconfigure itself.
Link Management
Summary:
Rivers-Moore: I propose that we add "redesigned" to the first sentence of the Link Management paragraph. (Agreed. He reads the paragraph.) "Just as product components are linked together in complex configurations, which change over time as the product is assembled, used, redesigned, reconfigured, perhaps disassembled and recycled, so document components are linked to each other in ways susceptible of reuse and recycling…" Effective management of the links is the key to the whole thing.
Bergström: Do we need to define what we mean by "link," and how that relates to "relations" in the EXPRESS model? If links are different from EXPRESS relations, this is not right.
Rivers-Moore: In my mind they are the same in principle. When I say "link" I mean a relationship.
Bergström: I understand both sides, but we are talking to EXPRESS people who understand relations as one thing, while on the SGML side they talk about links that are not modeled as such.
Rivers-Moore: Then this is an important point. Maybe the word "link" is misleading for this community?
Bergström: Maybe define what we mean by link management.
Shaw: I see links falling into two categories. There are links provided for in the structural definitions we relate (EXPRESS schema, defined at the type level and captured in the schema). Then there is a second class of relationship that I might want to capture, that says that actually this value came from over there. I didn't predict that at the structural level, but at the instance level I want to add more richness, so I create an instance link that was not defined in my structure. In a sense, in a DTD I can use my attribute and ID references to create a series of links that cut across the predefined structure of the DTD. It’s this characteristic of pre-defined structurally-driven relationships in one category, as opposed to the ad-hoc ones. It’s the second category we have to deal with, because there is no chance of standardizing how people are going to tie their documents to their product models, so we have to define a general way of doing this.
Rivers-Moore: So, to summarize: Is a link a relation in an EXPRESS schema? Yes, that’s one kind. But there will also be unpredictable new links and link types to be handled during the life of the product.
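The two categories of link can be sketched in Python. The schema table, identifiers, and link type names below are hypothetical and serve only to show the distinction.

```python
# Structural links: declared at the type level (here, a toy schema).
SCHEMA_RELATIONS = {("assembly", "component"): "has_part"}

# Ad-hoc links: asserted between instances, with no schema declaration.
adhoc_links = []

def add_adhoc_link(source_id, target_id, link_type):
    """Record an instance-level link that the schema did not predict."""
    adhoc_links.append({"from": source_id, "to": target_id, "type": link_type})

def is_structural(source_type, target_type):
    """True if the schema declares a relation between these two types."""
    return (source_type, target_type) in SCHEMA_RELATIONS

add_adhoc_link("doc7#para3", "part42.mass", "value_came_from")
```

The ad-hoc store needs only a general record shape (source, target, type), which is why it can be defined without knowing in advance how documents will be tied to product models.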
Shaw: Of course you can always define a view of the world that turns those into structural links.
Bergström: I’m just saying that the last two sentences here have that meta-level: they include everything in the word "link" here. I think you can have successful product management without necessarily using HyTime; not everyone will use those techniques.
Rivers-Moore: Yes, you may always find that you want to assert the existence of a relationship that you didn’t define before. It might also be then that over time you want to formalize these relationships and extend the schema.
Shaw: Another way is to put the links at a lower level of detail than what the schema defines. And this will be at a level below what is worth modeling in STEP (e.g. in a description).
Rivers-Moore: But these could be modeled in SGML, if you wished. One needs to be able to formalize things, but we have mechanisms to do that already. You also need to be able to handle the unpredicted or unpredictable, so you have to have a means to handle that. And you also must be able, if you choose, to formalize what was ad-hoc. HyTime provides a mechanism for all of this. And even to do this on a one-off basis, applying a view to something without touching it. But then Eliot admitted that the tools to handle all that are unfinished or broken. So, you need (new) reliable, robust tools for managing all of this flexibility. This was Jim Crawford’s issue.
Rivers-Moore: Pondering that, it occurred to me that these things that HyTime describes to manage all this are themselves things (architectural forms, links, etc.). Applying these is a kind of reconfiguration. STEP has tools to handle reconfigurations and links.
Bergström: That’s also what Eliot said. It’s just that there are no systems to buy that do it.
Rivers-Moore: So it seems we have a candidate for new useful work. This set of problems could possibly be met by a PDM system to do configuration change control (or change management).
Bergström: But it could also be met by EXPRESS models. Isn’t there a Part 44 for change management?
Shaw: Let’s be clear about what we’re trying to do. Link management is a functionality. In the standards paradigm that SC4 operates in, maybe with the exception of PLIB, we tend to work at the information-only level. We do not standardize functionality. So the issue here is the information necessary to support link management, rather than how a system would go ahead and actually do the work. We need to take care to maintain that the information we are dealing with is the information to enable the functionality.
Bergström: So we have to take even more care not to use words like PDM systems, but rather a schema… Neither SGML nor HyTime can do anything. They can just encode things or give structure to them…
Rivers-Moore: But I think both the EXPRESS and SGML worlds are beginning to talk about behavior.
Bergström: But they do not standardize how to handle this, they give us the possibility in a standard way to say what we want to do with the information.
(Discussion in which Bergström and Rivers-Moore explore and agree with the above statement.)
Bergström: Then again, I agree with Daniel that the function can be done in a PDM system. And HyTime is part of the solution to link management.
Shaw: One of the things we could be talking about here is implicit in what we suggested yesterday, which is creating a storage model for HyTime such that a PDM could handle it.
Rivers-Moore: Would this be a "new AP" for link management?
Shaw: What you do to manage things is to bring them into a common environment. It gets difficult if you’ve got multiple paradigms. This suggests that you need to fold the links back in so that you can store them along with the data even though they are logically separate. One way to do that is to take that information and have an EXPRESS model for [the HyTime property set].
(Rivers-Moore notes the above on the board. General agreement.)
Structured text
Summary:
(Rivers-Moore reads the paragraph about structured text.)
Rivers-Moore: The statement that SGML tagged text must occur in an EXPRESS model has been affirmed by this group. (Assent)
Bergström: But yesterday we said "no" to the question of whether we need a separate or new data type for it.
Rivers-Moore on board: "Tagged text in an EXPRESS model: that is a requirement." Question: does this require a new data type? The feeling yesterday was "no," but there is an issue of character sets. Will STRING handle the control characters?
Bergström: You can’t store binary in a string, so you have to know what you’re storing. You can’t store both text and illustrations in a string. In order to store XML, you need to store also the reference data, such as binary pictures.
Rivers-Moore: Yes, that object needs to be available. It seems clear that if it’s a binary object it will be in a binary form.
Bergström: But the problem might be that if you have a string in EXPRESS and you don’t know what type of information is in the string, it may be difficult to understand what it is, so you can misinterpret the string. And therefore we might need a type attribute on the thing. Does that require a new data type?
Rivers-Moore: I like this view a lot. To be able to handle what we are saying we want to put in, we want to be able to put in a string of characters that may not be ASCII characters, and might have pointer to a binary object, so STRING as it stands is not adequate. BINARY would be a better candidate, but you still have the problem of knowing what’s in the field. So you need to be able to specify how to parse the item in the field.
Shaw: That doesn’t need to be a new data type necessarily. I think even if you have this, you have to leave people the freedom to insert binary information into their documents if they wish to. We must allow for the possibility of mixed.
Rivers-Moore: Yes and no. If they want that object to be an SGML document, then SC4 isn’t saying you cannot put the binary object in it, but the rules of SGML say that binary objects must be done by reference. Of course this might change over time, or be handled by users in violation of the rules. How is anyone going to know how to parse it? All we need to say is, here’s an object and here’s how to handle it.
Sauter: This can be specified in the schema using an attribute.
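(Illustrative sketch, not from the meeting: the idea that the content needs an accompanying attribute saying what it is and how to parse it. The class name, field names and notation labels are hypothetical.)

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EncodedObject:
    notation: str            # what the content is, e.g. "SGML" or "TIFF"
    charset: Optional[str]   # character encoding for text; None for binary
    content: bytes           # stored as raw bytes in either case

    def as_text(self) -> str:
        """Decode the content using the declared character set."""
        if self.charset is None:
            raise ValueError(f"{self.notation} content is binary, not text")
        return self.content.decode(self.charset)


# Tagged text containing a non-ASCII character, stored with its declared charset.
sgml = EncodedObject("SGML", "iso-8859-1", "<p>Bergström</p>".encode("iso-8859-1"))

# A binary object: no character set applies (the bytes are a placeholder).
tiff = EncodedObject("TIFF", None, b"II*\x00")
```

(The declaration lives outside the content, so a reader does not have to guess whether the bytes are tagged text or a picture.)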
Shaw: It comes down to enabling again. The reason you can’t stick to ASCII for SGML_STRING is because you can switch encodings of the SGML. Somewhere down in that document there could be a link to a TIF file. Now we have two information objects with a dependency. That dependency, at the level of lifecycle or link management, is an important thing. Now we have something akin to an assembly relationship. There are two links there: a link in the sense of completeness, and a tangible link saying "it goes here." Document dependency is a requirement that may be reflected in links internal to those documents. I don’t believe that this necessarily needs to be understood at a more abstract level, but you do need to be able to record the fact that there is a dependency between these two pieces of information.
Rivers-Moore: (draws document and illustration on board) The SGML declaration identifies the illustration.
Shaw: I don’t want to have to parse the SGML to know that.
Rivers-Moore: Fine. Let’s decompose all these issues. There are actually multiple links involved in including the illustration [along with its text], all of which are included in the SGML document. But somebody had to tell me that the string in my database was SGML. I think we need to say in the EXPRESS model, outside the object, that it is SGML.
Shaw: I need also to know that there is a dependency between the document and the illustration, which does not necessarily imply the direct call from the document to the drawing.
Rivers-Moore: Agreed. But none of this says we need a new data type.
Sauter: Why do you have to specify the dependency?
Shaw: It may or may not be in the SGML, or we may not be working with SGML at all. I need an external way to verify that all the related information has been sent.
Sauter: But if there’s a reference in the first document…
Shaw: But I don’t want to have to read and parse the document in order to find out about the link.
Sauter: But that would mean you have to specify the whole set of links in the SGML file in EXPRESS. Why? There’s an SGML world already, and there’s an EXPRESS world.
Shaw: It’s very necessary. To support any form of the assembly of documents, you need to be able to identify the components from outside of them.
Rivers-Moore: I understand this problem, but the ability to point to information without storing all the dependencies is a useful aspect of the Internet.
Shaw: We’re not saying you have to have this information, just that you can.
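(Illustrative sketch, not from the meeting: Shaw's requirement that dependencies be recorded outside the documents, so that completeness of a delivery can be checked without parsing any of them. The file names and the shape of the dependency table are assumptions.)

```python
# Dependencies recorded externally to the documents (hypothetical identifiers).
DEPENDENCIES = {
    "manual.sgm": {"figure1.tif", "chapter2.sgm"},
    "chapter2.sgm": {"figure2.tif"},
}


def missing_components(root, received, dependencies):
    """Return every item required (directly or indirectly) by root
    that is absent from the received set."""
    required = set()
    stack = [root]
    while stack:
        item = stack.pop()
        if item in required:
            continue  # already visited; also guards against cycles
        required.add(item)
        stack.extend(dependencies.get(item, ()))
    return required - set(received)


delivery = {"manual.sgm", "figure1.tif", "chapter2.sgm"}
missing = missing_components("manual.sgm", delivery, DEPENDENCIES)
```

(The check never opens the documents themselves; as the discussion notes, recording the dependencies is optional rather than mandatory.)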
Sauter: Ok, useful.
Rivers-Moore: So, consensus that we do not need a data type. Do we need a new type of schema?
Shaw: We come back to the question… SGML is interesting, but in the general case here we should look at CGM, IGES, a whole bunch of other information encodings and say what are the characteristics of all of these. We should have one mechanism that deals with encoded information, or perhaps an ASCII specialization on that. Maybe it’s an information object which has some characteristics and some content, and fundamentally we’re going to store it in a computer, as numbers, binary, text, whatever.
Rivers-Moore: Proposed summary: No new data type for tagged text. SGML content can use binary, but you need to be able to assert dependencies between documents and components, and between any encoded object and its encoding. (This applies to all encodings, not just SGML.)
Shaw: It gets worse, because sometimes you need to know both the standard and the encoding…
Rivers-Moore: I think that’s out of scope for the PWI.
Shaw: I disagree strongly. It’s in the scope. This is the information storage abstract level as distinct from the physical level.
Rivers-Moore: OK, done.
(Lunch.)
Ingenbleek: Is there a requirement that there’s a necessity for links into a storage environment, or data model? At the moment we have only whole documents.
Rivers-Moore: That is not the intent, because we should be linking into document components. But we don’t say whether the component is a full file, or just part of one.
Bergström: I think this is addressed by the last point in the text.
Rivers-Moore: Maybe. Let’s address this later.
Ingenbleek: I am interested in finding a solution to a specialization of a generic Part which solves all the problems of referencing other documents, independent of SGML. This is missing from STEP; there is not even a mechanism to reference from STEP to STEP. So I want to see in everything we do, what is the SGML part and what is general in approach.
Bergström: I think it’s in the last thing we discussed before lunch. Do you mean actually pointing, rather than including in the schema?
Rivers-Moore: HyTime is under a standard. Just as STEP is not the totality of SC4, so HyTime is just part of the SGML set of standards. There are other existing and proposed SGML standards, which we could consider under the terms of the PWI as written. This includes HyTime explicitly. And HyTime allows you to link anything. So, within our scope is to think about all those document description languages, then the need to link them can be added explicitly to our requirements, but in terms of how to meet that need, the answer is that it can be done in HyTime.
Bergström: An SGML document typically in practice is a text file and a number of graphics files, so in order to handle SGML we need to handle all kinds of formats anyway.
Ingenbleek: But at the moment we are talking about how to point to the internals of a file. This has not been solved.
Rivers-Moore: HyTime defines a syntax to address anything that can be stored meaningfully in a computer, and link it to anything else that can be stored meaningfully in a computer. Here it is stronger than EXPRESS. So, if we add to the EXPRESS language the work to be done to deal with these links…
(Ingenbleek says he wants to be on the team to do this work.)
Bergström: One of the problems is that you can do such things in EXPRESS but there’s no standardized way to do it. And you can do it in SGML, but if you want to work with both, then you need a formalized way to share the information between them.
Ingenbleek: I’m always thinking of a wider range, not just SGML but EDIFACT etc. The solution should be an explicitly generalized case that works for all other standards in the world.
Rivers-Moore: HyTime has this sort of vast ambition. On another level you could say that STEP has the same aspiration. In a sense, the Grove can be the common language which allows one to address things that are stored anywhere in any format. It does that by defining some very generic things like a node and an arc. If you think in the STEP world, in EXPRESS you can do different kinds of data models, and one of the styles of data modeling which has been advocated is the EPISTLE approach. This also goes back to some very basic generic things.
(Discussion of the process of decomposing both STEP and SGML models into, respectively, an EPISTLE model and the granular SGML used in the RivCom demo shown by Rivers-Moore on the first day, then back up again to another model in order to map across.)
Bergström: Returning to the issue of what encodings we link to. I think we should recognize that some of the encodings we want to link to will need a way of mapping between the languages, while in other cases (e.g. a PDF file) all we need is to be able to point to them. So I think if we can make this more general than having to be SGML it will be very useful.
Ingenbleek: So, this is why it’s important to reject specialized terms like SGML STRING.
(General agreement.)
Rivers-Moore: I feel the SGML_STRING concept has served a useful purpose in focusing attention on these issues…
Bergström: I was a co-author of one of the original white papers, and it was clear to us from the beginning that this was a wider problem. But our task then was how to integrate SGML with STEP, hence the name. But that is history now…
Rivers-Moore: That’s useful, because some of the prior documents make it look like this group’s mission is to get SGML_STRING into the EXPRESS language. But that is not our mission. That was one solution to our mission, but we’ve realized that it is too specific.
(General agreement, yes, we need a more generic solution.)
Bergström: May I suggest that we log this additional requirement that was implicit in our discussions but may not have been explicitly stated: A generic mechanism for referencing data objects whose syntax may be different from EXPRESS. (wording worked out by group) [I.e. SGML_STRING generalized to all data formats.]
Action Point: The group will identify a generic mechanism for referencing data objects whose syntax may be different from EXPRESS.
Parameterization
Summary:
Rivers-Moore: So, let’s move on to Parameterization [in the Requirements document]. Does anyone know anything about the NWI on this subject? (No, partly because Nigel is not in the room) All I can say is that others who should know have felt that HyTime should be able to help solve some of the Parameterization problems. But, I don’t know if anything formal has been done about that.
(Continues reading. Adds "or early binding" to the text of the paragraph.)
Rivers-Moore: For me it’s easy: HyTime does that. I don’t want to go in depth into this unless people here want to. But I do think that we should log that we want the debate to take place in the Parameterization group.
Bergström: Maybe we should say as a comment that in HyTime it’s possible to address things by query (in any query language), and that the DSSSL standard has a query language that is useful in this context and (is designed for and) works well on Groves. So, if we do a property set for EXPRESS it would work fine on that property set too. SDQL, Standard Document Query Language. I agree that we should leave the rest to the Parameterization group.
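(Illustrative sketch, not from the meeting: addressing by query over a tree of generic nodes. This is plain Python, not SDQL or HyTime syntax, and the `Node` structure is only a stand-in for a grove.)

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def query(root, predicate):
    """Yield, in document order, every node for which predicate(node) is true."""
    stack = [root]
    while stack:
        node = stack.pop()
        if predicate(node):
            yield node
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.children))


grove = Node("document", children=[
    Node("step", {"id": "s1"}),
    Node("para"),
    Node("step", {"id": "s2"}),
])

step_ids = [n.properties["id"] for n in query(grove, lambda n: n.kind == "step")]
```

(The link target is described by a condition rather than by a fixed location, which is the sense in which a query can act as an address.)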
Bergström: And I think that the opposite way, we were thinking of making those EXPRESS models, will work the same way but I’m not sure.
Rivers-Moore: Question to study: Would EXPRESS models of HyTime and SGML and Property Set Requirements allow late binding in EXPRESS/SDAI to SGML objects?
Action Point: Determine whether EXPRESS models of HyTime and SGML and Property Set Requirements will allow late binding in EXPRESS/SDAI to SGML objects.
Transclusion
Summary:
(Discussion of words "parameterization" and "transclusion" and whether they are just synonyms.)
Rivers-Moore: I think that transclusion should not be a property of the link. Say there is a link between me, Raimar and a conference that says "he met him there." It could be that I have various other links between all the other people who met each other at the conference. It could be that I have a style sheet that says that whenever I display the conference, I don’t want to see the name of the conference, I want to see a list of the set of people who met each other there. …
Bergström: I’m not so sure about that.
(Rivers-Moore discusses how he took issue on the XML forum with having behavior be an attribute of a link, and explains the XML working group’s reasoning for why it was included in the link language anyway even though style is a separate language. In a nutshell, the working group included transclusion in the link language because it’s something you would really want to do with a link, even though it really belongs elsewhere.)
Group consensus: We’ll still use the word "transclusion" but combine it with "parameterization" in terms of the work of this group, and treat them as the same thing.
Robust global addressing
Summary:
Rivers-Moore: (Mentions the URN (Uniform Resource Name) Internet initiative that attempts to remove the dependence of URLs on physical locations.) This is a standardization initiative in the W3C consortium. We as a group should find out what is happening with URNs. There is also a thing called a Formal Public Identifier (FPI) in SGML which allows a level of indirection in addressing items from documents. And, there is a mechanism in SC4 for naming things in a globally unique way. So I believe there is work to be done in looking at these naming mechanisms and seeing how they should be used. Does anyone here have expertise in this?
(No. Shaw, who might, is not in the room.)
Scherer: In our project we identify things both with an internal number and with a URL for global identification.
Rivers-Moore: But a URL has a problem because it is globally unique, but it refers to a specific physical location rather than the thing at that location. So if the content at that location changes, you have no way of knowing that your link now points to the wrong thing.
Stewart: You need a means both of locating a referenced thing and of verifying that what you found is what you wanted, rather than something else that has been moved to the location.
Rivers-Moore: So we should log as work to be done to investigate URNs, FPIs and whatever SC4 is doing with this same problem. (Nigel knows about the last of these.) What are they for? Are they the same in purpose or in mechanism? What do we do about it?
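(Illustrative sketch, not from the meeting: Stewart's point that a reference needs both a way to locate the target and a way to verify that what was found is what was meant. Using a content digest for the verification is an assumption of this example, not something the group proposed, and it is not how URNs or FPIs work.)

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedReference:
    location: str  # where to look for the target
    sha256: str    # digest of the content the link author intended


def make_reference(location: str, content: bytes) -> VerifiedReference:
    """Record the location together with a digest of the intended content."""
    return VerifiedReference(location, hashlib.sha256(content).hexdigest())


def still_valid(ref: VerifiedReference, fetched: bytes) -> bool:
    """True if the content found at the location is the content intended."""
    return hashlib.sha256(fetched).hexdigest() == ref.sha256


ref = make_reference("http://server.example/specs/valve.sgm", b"<spec>valve v1</spec>")
```

(A digest only detects that the content has changed; it does not help find where the intended content has moved to, which is the problem a location-independent name is meant to address.)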
Storage Management
Summary:
Rivers-Moore: Obviously this is greatly a management issue, but there must be the technical mechanisms for supporting it.
Bergström: It’s also partly what we talked about yesterday, when we said that there are different storage models for different parts of the STEP standard. We need to be able to name them in order to access them. EXPRESS doesn’t name the container, but we need to be able to address this. I think we said this could be the first node of the grove?
Rivers-Moore: We did agree that even in EXPRESS there is a virtual top node which describes the unmentioned storage container. In SGML the root of the tree is the same thing as the total stored object. So in fact there isn’t a discrepancy, except that EXPRESS doesn’t have a mechanism for naming the container. So Nigel was suggesting that if you add to EXPRESS a way of naming the container, then you both have a way of referencing the storage object and you’ve enabled mapping to the HyTime structures.
Bergström: And Nigel mentioned ISO standard 8824, regarding this.
Bergström: There is also the opposite situation (from a distributed repository): that you’d like to be able to store everything in one system, for instance to be able to store SGML in a STEP repository. I think this should be one of the parts of the storage management requirements.
Rivers-Moore: I think yours is a special case of that.
Stewart: I think it’s important to explicitly state what Peter said.
Rivers-Moore: So, either a distributed virtual repository or a centralized database/repository which contains all relevant data objects. (In whatever format.) A further question: can 8824 be used to align the two storage paradigms?
Stewart: You not only need to know how to manage the storage, but also how to know when the links have failed, e.g. the system is broken.
Bergström: The issue is, what information do we need to capture in order to accomplish this.
(Group agrees. But no action point is logged.)
Sharing and exchange
Summary:
Rivers-Moore: It seems to me that the real issue isn’t whether it’s sharing vs. exchange. That’s just a question of real time vs. batch…
Ingenbleek: There is a difference, because Exchange is making a copy of the data while Sharing is providing access/making visible. And, the copies are always different.
Rivers-Moore: I believe (again) that how this is done isn’t in our scope, but making sure it is possible is.
(discussion)
Rivers-Moore: Really, sharing is exchange where the recipient doesn’t store the data. You still are making a transient copy.
Bergström: I see what you mean about all copies being different, but the issue is, where do you draw the line? In all these deliveries and transformations, the information is still transmitted…
(discussion)
Rivers-Moore: I think all we’re saying is that we need to know whether there are any aspects of the information that cannot move between the different worlds (SGML, EXPRESS), which would imply an incomplete mapping, and that we’ve already given ourselves the job of doing this.
Bergström: I agree that there is nothing additional to do on this specific point. But we were discussing yesterday that a Part 21 file does not include all of the data that you have in an instance of an EXPRESS model (e.g. derived data). There might be a need to look at this requirement in the STEP community: are the existing mechanisms robust enough to exchange the information we want to exchange? That might not be the case with Part 21, for example. This might be a driving force for developing an XML format for exchanging data.
Rivers-Moore: Ok, Action item: Is there data loss? In Part 21, yes. In XML, we don’t know. Our work must define whether there is completeness, and where not, identify it.
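(Illustrative sketch, not from the meeting: what checking for data loss in an exchange might look like in the simplest case. The dictionary instance and the export function are toy assumptions and do not represent Part 21 syntax or any real STEP toolkit.)

```python
# Attributes that the toy exchange format writes out.
EXPLICIT = {"width", "height"}


def make_instance(width, height):
    """Toy instance: 'area' is derived from the explicit attributes."""
    return {"width": width, "height": height, "area": width * height}


def export_explicit(instance):
    """Toy exchange: only explicit attributes are written."""
    return {key: value for key, value in instance.items() if key in EXPLICIT}


def lost_in_exchange(instance, exported):
    """Names of attributes present in the instance but absent after export."""
    return sorted(set(instance) - set(exported))


original = make_instance(3, 4)
exported = export_explicit(original)
lost = lost_in_exchange(original, exported)
```

(Whether a lost attribute matters depends on whether the receiver can recompute it; the action item is about identifying such gaps, not about this particular mechanism.)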
Rivers-Moore: I think there is also an action to develop an XML exchange format for STEP. Please make this an action item under this topic. That would be the means of implementing the issues we just discussed.
Ingenbleek: Making copies may influence the robustness of links.
Bergström: Yes, every time you transform data you have to manage the links.
Rivers-Moore: Copies means persistent storage in another location. The link address needs to handle this. For example, if my link address points to the server and I move the information offline, the addressing mechanism must not simply break. (This should be included in the above action item.)
(Discussion of how HTML is broken in exactly this regard.)
Bergström: To clarify: When you make a copy [of a linked set of information], sometimes you want to readdress your link(s) to point to the local copy, but sometimes you don’t because you want always to point to the master original.
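(Illustrative sketch, not from the meeting: the choice Bergström describes when a linked set of information is copied. Some links are re-addressed to the local copy while others keep pointing at the master original. The addresses, the `keep_master` parameter and the function itself are hypothetical.)

```python
def readdress(links, copied, local_prefix, keep_master=frozenset()):
    """Return a new link table after a copy.

    links:        dict of link id -> target address
    copied:       set of target addresses that now also exist locally
    local_prefix: address prefix of the local copies
    keep_master:  link ids that must keep pointing at the original
    """
    result = {}
    for link_id, target in links.items():
        if target in copied and link_id not in keep_master:
            # Re-address to the local copy, keeping the final path segment.
            result[link_id] = local_prefix + target.rsplit("/", 1)[-1]
        else:
            result[link_id] = target
    return result


links = {
    "fig": "http://server.example/docs/figure1.tif",
    "spec": "http://server.example/docs/spec.sgm",
}
copied = set(links.values())
local = readdress(links, copied, "file:///offline/", keep_master={"spec"})
```

(Which links belong in `keep_master` is a policy decision for the person making the copy; the mechanism only needs to make both outcomes possible.)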
Internationalisation
Summary:
Rivers-Moore: Since we’re talking about "the kind of information normally contained in documents," and "human-readable," this implies that it must be able to be written in any of the written languages of the industrialized world. This is being worked on. Should this work item address this?
Bergström: With the solution we’re talking about now, with documents stored as a binary, there’s not a need for EXPRESS to use the same character set as XML or SGML.
Shaw: I disagree. I’d like my information to be compatible. (At some level there is a requirement for compatibility.)
Rivers-Moore: There will be a profusion of cheap parsers soon, maybe free, that can parse Unicode. With XML adopting Unicode there will be much more use of it than now and it will be less expensive. I’m not saying what the right answer is, but rather asking, do we need to align with the XML answer?
Bergström: I understand what Nigel is saying, but in SGML the parser can understand other character sets coming in and still convert them on the fly, provided that you know the character set you’re going into. Sure, everything would be nice if it was harmonized but it’s not a requirement.
Shaw: I’m not on top of this issue. What I am aware of is that there’s not just one character set, even in EXPRESS. The one we use to write it, the one legal for literals, the one legal for instances defined as strings. And for each of those we’ve got an encoding issue too. And I’m not sure what they’ve specified for the use of Unicode in XML. This is a difficult area. I think from our perspective the real action is to assign either a liaison or a watching brief, and the parties involved need to be those with the best understanding of the issue. Perhaps Phil Spiby from our side, and probably Ed Barkmyer as someone for NIST who understands these issues, and then perhaps we need to identify an appropriate source so we can understand at the earliest possible stage the position that XML is taking. If those guys decide there is a clash, we can do something about it. Otherwise, we don’t have a problem. (Ed is at NIST and on the exploders for WG11.)
Mak: Also, in the STEP world there’s a whole subcommittee set up to study character sets.
Shaw: The other name to add is Neal Laurence, who’s looking at it from the issue of Part 21.
Rivers-Moore: (names some of the people from the XML world we might contact. Writes them on the board.) So our action item will be for someone to contact some of these people and set up the watching group.
Ingenbleek: Internationalization is not only about character sets, it’s also about such things as how to represent numbers, dates, currency, etc. Collating series…
Rivers-Moore: So there’s also the semantics of data types to be considered. Can we log this as a question for careful thought?
Ingenbleek: Ok for me.
Document paradigm
Summary:
Rivers-Moore: (Reads the relevant section of the document.) The question is, can you meet all the above requirements and still keep the idea of a document? Or, do we have to create a new paradigm? We should find out what work the DOM (Document Object Model) people are doing, maybe liaise with them, take it into account, who knows.
Rivers-Moore: I’d like to introduce the feedback I received from Barrie Reynolds of Honeywell UK in response to this paper about Requirements. (Puts the Barrie Reynolds paper on the screen.)
Rivers-Moore: For Honeywell, a document is "that which is displayed on the screen." (Discusses the Honeywell list of contexts they want to take into account when assembling the document from various components in response to real-time events.) Apart from the fact that it’s in real time this is very similar to what we’ve already been discussing.
Sellentin: Do they mean they want actively updated elements within the document?
Rivers-Moore: I think so.
Shaw: But none of these examples are, strictly speaking, our problem. They are all variations on the same basic thing, which is, you have some pool of information and a person who needs access to this information. We’re talking about some kind of engine that sits on top of the information, perhaps a control program, that presents information to Fred. We have several variations on this theme: A style sheet/monitor situation in a loop, a query, a control program, whatever. In all of them there is a serious active component. Plus, there is text being generated (including logs, screen displays, etc.) which may come back in and be stored. The information is in our scope, but how the active components work is out of our scope.
Rivers-Moore: So that’s good news, we can tell Honeywell that when the work item is completed there will be standard ways to solve his problem.
Rivers-Moore: Yes, the fact that we’re creating documents on the fly means that we have a requirement to be able to capture and store them as new but relevant documents.
Bergström: Also, in the more traditional case where you produce documents that need to be signed in ink and stored, the electronic equivalent is to have an electronic signature and store it back into the database.
(Rivers-Moore points out Honeywell line about the relevance of the information possibly not being known when the text is entered.)
Shaw: But I’d almost argue that if you change the tagging significantly, you are actually changing the document. A lot of this is about the metadata about the whole set of information, as distinct from the information contained in it.
Rivers-Moore: This is different from Honeywell’s intention. They feel that the author will write small bits of information, such as, how to deal with some switch. That goes in and is tagged as step 17 of some procedure. At some point, context-sensitively you might want to discover when and why this procedure was run. Now that same bit of text is being presented in a different context, as the context in which someone did the procedure.
Shaw: But again I say that it’s the role of the string that is changing, not the value of the string.
Mak: Consider, there might be multiple parts of the document that are dependent on external situations.
Rivers-Moore: Can we take as an example: "Rotate the <part> <angle> degrees to the <direction>." Then there is an external tag, saying this is "A Step." But this identification of the use of the string should not be part of the object, because then the object cannot be re-used.
(Further discussion of these issues. You start with a generic typical instance, against which queries can be run, used for many installations. Then you specialize this for a single plant, where there is one set of right returned values…)
Shaw: I think we’re getting into multiple business issues, including deciding how much to reuse the text. I would argue that what you’ve written is a function, and once the person did the task I no longer have the function, I have an invocation of the function. This is no longer a "step" but rather a performed action.
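(Illustrative sketch, not from the meeting: Shaw's distinction between the reusable "function" and an "invocation" of it, using Rivers-Moore's example sentence. Python's `string.Template` stands in for the tagged text, and the parameter values are made up.)

```python
from string import Template

# The generic, reusable text: a "function" with three parameters.
step = Template("Rotate the $part $angle degrees to the $direction.")

# A performed action: one invocation with concrete values.
performed = step.substitute(part="valve", angle=90, direction="left")
```

(The template itself is unchanged by the substitution, so the same text can still be reused elsewhere, while `performed` is a separate, static record of what was done.)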
Rivers-Moore: I think all the Honeywell person is saying is that he wants the system to be able to handle reusing what starts as a generic function as a static step that has been performed.
Shaw: Yes, you can do these kinds of things, but you’re mixing the worlds of Parametrics and text. We can enable this kind of thing, but we shouldn’t force it on them.
Rivers-Moore: Yes, we got this requirement from Honeywell and we should decide whether to enable it in our model. But we don’t have to decide whether or not it is a good business decision to implement a system this way.
Bergström: What is the action item?
Rivers-Moore: Maybe just to log the fact that there is data modeling work to be done here, but not necessarily to prioritize it within the scope of the PWI.
Bergström: But I think we are starting with the thinking we’ve already done. Nothing prevents people from doing this based on what we’ve already said.
Rivers-Moore: I think all the other tools provide this capability.
Stewart: I don’t think it’s necessary or within our scope.
Shaw: It might be worth thinking through some of the possibilities raised by this example. … I’ve got a style of tagging which identifies what something is, another which identifies a query you can run against an element. These are all styles. What it comes down to is the functionality I can provide within a string that links me back to product data.
(discussion)
Rivers-Moore: But do we want an action point to create a clear conceptual model (per the "Document Paradigms" paragraph)?
(discussion)
Shaw: Have we had any real discussion on metadata? Some of the things Honeywell is asking for are metadata in relation to the content.
Rivers-Moore: I am convinced that in the parameterization issue, the document model issue, and possibly the data/metadata issue (e.g. function calls, reuse) there’s a huge amount of meat. If we attempt to get through it as part of the PWI we won’t deliver NWIs within a year. My personal view is that I agree that we can do a lot of useful work without addressing this. I also believe that if we do our other work, people will address this in all sorts of useful ways, and it might turn out to be a valid work item for standardization. I propose to log it as an interesting and deep area that we will not address early in our task list.
Shaw: (draws on board) The process we’re involved in involves fixing things. Placing rules and constraints on things. We have several levels at which you can do this. Start with three:
Infrastructure (STEP technology, in this case);
Specific Model;
Instance.
There are things you can do at the instance level that you can’t do or control at the model level. Parameterization is at the instance level. It’s ad-hoc relationships between instances. One level we’ve got to address is the Infrastructure: EXPRESS meets HyTime, etc. The DOM model issue is at the Specific Model level. So is most of the rest of STEP, and the EPISTLE style of doing things.
Shaw: We have to decide whether we aim for solutions that operate on these different levels. My feeling is that we have to operate mostly on the Infrastructure and Instance levels. At the model level there’s a great deal of variability, via different modeling styles. But at the Instance level we can derive predictability from the Grove mechanism. And we can make this predictable from the EXPRESS model of the grove. In the meantime, a lot of the SC4 work lies in the middle level, the modeling level. At that point you’ve got to say, what can I achieve in terms of standardization there and how much can I realistically do?
Bergström: But some of the things we’re doing at the lower level might end up being resources or resource models, and that is on the second level.
Shaw: Potentially, but I don’t know. Some of the things we do at the bottom are liable to end up as mechanisms used at the model level, but not necessarily part of it.
Rivers-Moore: I find the three levels very helpful, agree about much of the work being at the bottom layer, but not sure why you’ve put an emphasis on the Instance level (and then repeated it at the bottom level).
Shaw: If you look at Part 21, you’ll find that some things couldn’t be done at the model level and had to be moved to the instance level. We can learn from that.
Rivers-Moore: So if we provide an infrastructure at the bottom that provides all you need to move about the middle layer, we don’t have to build models in this group. We all agree that it is not our job now to draw models here.
Shaw: I think the modeling is quite a distinct problem from the ability to interoperate between the two worlds.
(Juergen Mohrmann enters the room.)
Shaw: You’ve arrived at an opportune time. AP 214 wants to know whether we will provide facilities that let them handle multi-language strings within EXPRESS without having to do modeling to achieve that. My feeling is that we are modeling things in 214 and Part 41 that should not be modeled, if we (in this group) can provide facilities to handle this at the infrastructure level. It’s a question of the tradeoff between where it makes sense to do your modeling using EXPRESS functionality and where you should model the string using SGML, etc.
(Discussion of what would be needed for new equivalent of SGML String.)
Bergström: It looks like we need an action item to deal with the extension to Part 41 required in order to handle SGML.
(Rivers-Moore recaps this morning’s decisions about SGML_STRING.)
Shaw: Unfortunately, we made a decision here that… (goes to the board) The requirement is to be able to put into my Part EXPRESS world the ability either to point to another structure that effectively is a table of Language and String with various options; or, to include tagged text to accomplish the same.
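(Editorial illustration. The two options in the requirement, a separate table of Language and String versus tagged text, can be compared in a minimal Python sketch. The element name "mls", the child element "s" and the "lang" attribute are assumptions made for illustration; no DTD had been agreed at this point in the meeting.)

```python
import xml.etree.ElementTree as ET

# Option 1: a separate structure, effectively a table of language and string.
language_table = {"en": "Pump housing", "de": "Pumpengehäuse"}

# Option 2: tagged text held in a single string.
# Element and attribute names are assumed, not taken from any defined DTD.
tagged_text = (
    "<mls>"
    '<s lang="en">Pump housing</s>'
    '<s lang="de">Pumpengehäuse</s>'
    "</mls>"
)


def parse_tagged(text: str) -> dict[str, str]:
    """Read the tagged form into a language-to-string table."""
    root = ET.fromstring(text)
    return {s.get("lang"): (s.text or "") for s in root.findall("s")}


def to_tagged(table: dict[str, str]) -> str:
    """Write a language-to-string table out as tagged text."""
    root = ET.Element("mls")
    for lang, value in table.items():
        child = ET.SubElement(root, "s", {"lang": lang})
        child.text = value
    return ET.tostring(root, encoding="unicode")
```

(Under these assumptions the two forms carry the same information and convert into one another; the choice is about where the structure lives, in the EXPRESS model or inside the string.)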
(Discussion of possibility of having a default DTD and store XML components in the database.)
Shaw: Part 41, Fundamentals of Product Description and Support. This is the core Part of the STEP model. It defines three types: text, label, identifier. All three are just renames of String. What we are potentially talking about is saying "hang on, we’d like to be able to do something which is either a normal string or maybe an encoded string."
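(Editorial illustration. The idea of a value that is "either a normal string or maybe an encoded string" can be sketched in Python. The three aliases mirror the text, label and identifier types as described above; EncodedString, TextValue and plain_view are hypothetical names, and the fallback behavior for plain-string readers is an assumption, not something the group decided.)

```python
from dataclasses import dataclass
from typing import Union

# As described above, all three Part 41 types are renames of a string type.
Text = str
Label = str
Identifier = str


@dataclass(frozen=True)
class EncodedString:
    """Hypothetical: a string whose content carries markup (e.g. SGML)."""
    encoding: str  # assumed tag naming the markup, e.g. "sgml"
    content: str


# Either a normal string or an encoded string.
TextValue = Union[str, EncodedString]


def plain_view(value: TextValue) -> str:
    """Return something usable by code that only understands plain strings."""
    if isinstance(value, EncodedString):
        # Assumption: older readers receive the raw content, markup included.
        return value.content
    return value
```

(Existing data that uses plain strings remains valid under this sketch, which is the kind of upward compatibility the discussion is concerned with.)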
Mohrmann: All entities are based on these types. We need multi-language capability for Text and Label. We’re very focused on upward compatibility. It is difficult to change these types to allow the encoding we need and keep upward compatibility. I think you need to make a small EXPRESS model like a data container with one or two entities to explain how you would accomplish this. (Goes to board to discuss the timing of pending parts that reference international character sets. If what they have proposed is not good, would this PWI group please put forward an issue that there is a better way to deal with internationalization issues.)
Rivers-Moore: This sounds good to me. But does the comment that needs to be logged have to be fleshed out with a proposed solution? Who will make the comment?
Mohrmann: I see it this way. One way is to say I don’t like this solution because SGML is better. But that’s a negative comment. But you could make a positive comment that you have other requirements that will cause you to propose a bigger (different) solution.
Rivers-Moore: So could Bernd Ingenbleek and Nigel take this forward and make the comment to Part 41 CD Ballot?
(Yes)
Mohrmann: If you have a proposal before Orlando, I’d like some kind of advance review.
Agenda item: In Orlando on the Tuesday, from 1:00 to 3:00 there will be a joint session between T14 and WG12. Subject is to decide how to address the comment that we will have submitted on multi_language_string.
Thursday
Joint session with WG10 (Architecture)
(Discussion of lack of a WG10 quorum. Without one, we can only discuss but not create WG10 action items.)
Wenzel: Proposes generic agenda. (introduction)
Shaw: Notes that T14 will not assign actions until later, so in this meeting we can only deal with WG10-level actions.
Shaw at podium. Gives overview of how we spent the last few days, e.g. the T14 agenda leading up to now. Gives very brief overview of how HyTime works.
Rivers-