SGML Documentation Objects within the STEP Environment

Author: Hugh Tucker



Hugh Tucker, director of Documenta ApS, participates actively in standards activities. At present he is Chairman of TC 184/SC4/WG3/T14 (Product Data Exchange using the STEP/Standard for the Exchange of Product Model Data), as well as a member (representative for Denmark) of ISO/IEC JTC 1 SC18/WG8 (Document Processing and Related Communication - Document Description and Processing Languages).

He has participated in many different European Community Projects, including several EPHOS projects and he is the Editor of the OII Technical Handbook, under the IMPACT 2 programme. Hugh Tucker is an experienced consultant in designing, analyzing, and specifying information system architectures.

He is the principal of Documenta ApS, a limited Danish company registered since 1976 that provides professional services in the area of documentation and information systems based on international standards. Documenta and its daughter company in Sweden, InterDoc Consultants AB, are involved in consulting and SGML software development as well as participating in the SGML/STEP harmonization and integration effort.

Author: Betty Harvey




Betty Harvey is President and principal owner of Electronic Commerce Connection, Inc. Prior experience includes acting as a Management Analyst for the U.S. Immigration and Naturalization Service and as a Scientific and Engineering User Support Specialist at David Taylor Model Basin, NSWC (formerly David Taylor Naval Ship Research & Development Center), where she provided support for a multitude of computers including the Cray XMP. While at David Taylor, Ms. Harvey participated in the development of U.S. DoD CALS standards, including IETMs, SGML and Internet protocols. In 1994, Ms. Harvey was awarded “Employee of the Year, Engineer/Scientist”.

Ms. Harvey reactivated and currently coordinates the Washington, D.C. Area SGML Users Group and acts as Secretary for ISO/TC184/SC4/WG3/T14. This ISO Technical Committee is responsible for the integration of the SGML and STEP standards.


ISO 10303, Standard for the Exchange of Product Model Data (STEP), is being developed by a broad range of industries to provide extensive support for modelling, automated storage schema generation, life-cycle support, plus many more data management facilities. ISO 8879, Standard Generalized Markup Language (SGML), and the SGML family of standards, including HyTime and DSSSL, are used for modelling and encoding the documentation of industrial products, many of which are produced using STEP.

There are technical differences between STEP and SGML as well as differences in their application and spheres of enterprise. For example, STEP is used during the early stages of product development, e.g., design and testing, whereas SGML is more commonly applied during the later stages of a product's life cycle.

This paper discusses the technical differences and problems between the two technologies and outlines some of the identified requirements for harmonizing the two types of data. An approach based on information objects is presented, showing how SGML product documentation information can be incorporated and stored together with STEP information. Using an information object methodology could allow textual data such as designers' and testing notes, method annotations, comments, etc., produced during the beginning of the product development cycle, to be associated and archived with the actual design models.

The definition of an information object is discussed and the distinction is drawn between a perceptual documentation object type and the conceptual information object type needed in modelling STEP data. Implementation suggestions are made along with the practical requirements needed to make information objects effective and useful.

The STEP standard task group, Product Documentation (ISO TC184/SC4/WG3/T14), is currently tasked with creating a methodology for the cooperation of the STEP and SGML standards. Information will be provided about how current corporate initiatives could impact and provide pertinent input to the T14 Working Group.


ISO 10303, Standard for the Exchange of Product Model Data (STEP), is being developed by a broad range of industries to provide extensive support for modelling, automated storage schema generation, life-cycle support, plus many more data management facilities. ISO 8879, Standard Generalized Markup Language (SGML), and the SGML family of standards, including HyTime and DSSSL, are used for modelling and encoding the documentation of products.

STEP and SGML are used in the same industries and are concerned with the same products. However, there are technical differences between the two standards as well as differences in their application and spheres of enterprise. For example, STEP data is developed during the early stages of product development, e.g., design and testing, whereas SGML data is more commonly generated during the later stages of a product's life cycle.

Many industries and individual corporations are currently in the process of defining methodologies for including product documentation in SGML both during the product development cycles, and through the entire life-cycle of the product.

Currently there is an ISO standard task group assigned to look into the possibility of integrating the SGML standard within the STEP environment. As organizations begin to implement STEP, they are starting to see the value of allowing SGML within the STEP environment. They see that they can greatly decrease their time to market by establishing information-capturing mechanisms at the beginning of the product development cycle.

Why Harmonize Industrial Data and Product Documentation?

Organizations that create products also create product documentation. Documentation usually starts being compiled at the time the product is being designed and developed. However, this design and manufacturing documentation is often not used in the final user/operations oriented product documentation. And frequently, it is not even used in the development of this documentation! The reasons for this can be many, but a main factor is that the user documentation is typically developed at the end of the product development cycle. At this time, the engineers who developed the product are usually involved with other product developments and are not available to the technical writers who create the technical documentation. The design and manufacturing documentation is most frequently in data formats that are inaccessible to the technical writers, e.g., CAD/CAM data, parametric models, etc.

It would be a great support to the writers of the final product documentation if they could efficiently access the documentation and notes generated during the design, development and manufacture/implementation of a product. The larger the amount of product documentation that is available from the front-end phases of the product development cycle, the greater the benefit to the final documentation.

The engineer(s) who are developing a product are the most knowledgeable individuals about that product and are in the best position to supply information about the product. Information created by the engineer(s) is more likely to be accurate.

Another aspect is that information about design decisions, testing results, etc. can be required later in a product's life, e.g., by authorities. Harmonizing the data types between the early design documentation and the final user documentation can make archiving and retrieval considerably easier.

It is important to understand at this point that we are not talking about replacing the technical writer. The technical writer will still add value to the documentation. The information will still require expert writers and editors to provide the technical documentation to the end-user in a meaningful and understandable format.

Why Focus on STEP and SGML?

The answer to the question 'Why focus on STEP and SGML?' seems obvious to us. Both STEP and SGML are:

Both standards play different but related roles in information management. Each role is already well defined within the information architecture of an organization.

Corporations today are using STEP within their design, testing, and manufacturing environments and are creating product documentation within an SGML environment. These efforts are usually within two distinct and separate organizational groups within the corporation. The harmonizing of the information types would allow (during the design and manufacturing of the product):

The time and cost to produce the documentation products can be greatly reduced, quality improved, and product information archives made more reliable.

Information objects can be exchanged between documentation products and reused over the life of an individual product in a multitude of ways, such as:

How STEP and SGML Can Work Together

STEP and SGML/HyTime can work very well as companion standards. STEP does a very good job of modeling product data while SGML is foremost for modelling documentation.

STEP is a relatively new ISO standard; its initial parts were published as International Standards in 1994. It is an information exchange standard that allows information to be exchanged between different CAD systems. Initially it was a standard for modelling product or processing geometry. However, as STEP is beginning to be implemented and used in the commercial, government and military areas, organizations are beginning to focus on STEP for its standard modelling capability. STEP uses a modelling language called EXPRESS for creating models. The EXPRESS language allows user-defined entities. Each entity has a unique ID similar to the IDs used within SGML.

Unlike SGML, the STEP standard has no capability for reusing data by referencing or addressing it. For instance, to reuse a circle within an EXPRESS model, the entire contents of the entity that defines the circle must be copied, whereas in SGML an object is reused by referencing it. Figure 1 shows how STEP requires an object to be copied whereas SGML allows the object to be referenced.

Figure 1: SGML Reusable Information Objects within the STEP Architecture.
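The contrast between reuse by copying and reuse by reference can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the dictionaries stand in for an entity instance and for uses of it, and names such as `circle_1`, `objects_by_id` and `resolve` are hypothetical and belong to neither standard.

```python
import copy

# A hypothetical geometric object, standing in for an entity instance.
circle = {"id": "circle_1", "radius": 5.0, "centre": (0.0, 0.0)}

# Reuse by copying: each use carries its own full copy of the content.
copied_uses = [copy.deepcopy(circle) for _ in range(2)]

# Reuse by reference: each use stores only the ID of the shared object.
objects_by_id = {circle["id"]: circle}
referenced_uses = [{"ref": "circle_1"}, {"ref": "circle_1"}]


def resolve(use):
    """Follow an ID reference to the single shared object."""
    return objects_by_id[use["ref"]]


# A later change to the original reaches the referenced uses only;
# the copies keep the old value and must be updated one by one.
circle["radius"] = 7.5
copied_radii = [c["radius"] for c in copied_uses]
referenced_radii = [resolve(u)["radius"] for u in referenced_uses]
```

The point of the sketch is maintenance rather than storage: with references there is one place to change, while copies can drift apart.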

Although the addressing mechanism of SGML cannot be used within the EXPRESS model, if SGML, including its HyTime capability, can be accommodated within the STEP model, the robust addressing capabilities of HyTime become available. SGML and HyTime can be used as enabling technologies to supply technical information concerning the product. HyTime can be used to reference specific objects within a STEP model.

Another advantage to the STEP standard lies in integrating the SGML character-based functionality into STEP. The ability to handle character code sets, support multiple languages, interface to publishing environments, etc. has not been addressed in any way in the STEP standard and is sorely needed.

Technical Distinctions

SGML and STEP, being derived from completely different domains, have different technical structure and composition. However, in some respects they are similar. Both standards are based upon a modelling language: the DTDs of SGML and the EXPRESS schemas of STEP. In the following, we will show that the modelling functionality of the two languages is the key that can allow data types to be intermixed between the two domains. (Actually, we are mainly concerned with making SGML data types compatible with STEP, as SGML can already include STEP data in documents.)

Modelling in STEP and SGML

The EXPRESS modelling language is a development of several generations of Entity-Attribute-Relationship (EAR) or Entity-Relationship modelling languages originating from the work of Bachman [4] and Engles [5] on data modelling. Later work by Chen [6] and many others developed these languages further, for example by allowing propositions about more than two entities to be modelled and by allowing relationships to have attributes.

SGML, on the other hand, was developed from the ideas of structuring documents and has no direct modelling language predecessors. In this respect, SGML is a more specific modelling language than EXPRESS, having been designed to handle character code sets and text strings.

Modelling Distinctions

The distinction between data types in the two modelling languages provides the clue to integration and/or harmonization.

SGML specifically defines the "character", or rather a virtual character code, at the lowest level of the model, and different codes intrinsically have different behaviour. SGML has no way of assigning semantic meaning to any constructs above the character level.

STEP, on the other hand, is quite orthogonal in its approach. There are no possibilities of distinguishing character code types, e.g., it would be very difficult to implement the idea of a tag in STEP. Entities, on the other hand, are semantically encoded or programmed, and it is possible to check entities to assure that they fulfill their semantic role.

It should be noted that even though it would be possible to re-develop the SGML character modelling functions in EXPRESS, doing so would be an exercise in futility.
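The two starting points described in this section can be contrasted in a small Python sketch. It is an illustration only and reproduces neither standard: `split_markup` is a hypothetical, much simplified stand-in for character-level processing in which certain characters act as delimiters, and `Circle` is a hypothetical entity whose check plays the role that a constraint on an entity would play in an EXPRESS model.

```python
from dataclasses import dataclass


def split_markup(text: str):
    """Character-level view: '<' and '>' delimit tags, the rest is data."""
    tags, data, current, in_tag = [], [], "", False
    for ch in text:
        if ch == "<":
            if current:
                data.append(current)
            current, in_tag = "", True
        elif ch == ">" and in_tag:
            tags.append(current)
            current, in_tag = "", False
        else:
            current += ch
    if current:
        data.append(current)
    return tags, data


@dataclass
class Circle:
    """Entity-level view: the value is checked against its semantic role."""
    radius: float

    def __post_init__(self):
        # The rule concerns what a circle is, not which characters encode it.
        if self.radius <= 0:
            raise ValueError("radius must be positive")


tags, data = split_markup("<p>Hello</p>")
```

In the first half the behaviour depends entirely on which character is read; in the second half no character has any special role, but the object itself can be rejected as meaningless.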

Information Objects

The concept of using an information object, or documentation object, to harmonize the integration of data types between STEP and SGML was proposed in earlier papers [3].

First, we must be able to define semantically-meaningful information objects in SGML. To do this we need:

Second, we need to find a method by which information objects can maintain the character-based functionality of SGML.

Third, information objects must be able to be defined as entities in EXPRESS models.

Fourth, the information objects must be able to be mapped to SGML structures in order to be presented.

Conceptual versus Perceptual Objects

A basic problem in modelling documentation has often been the transition from the application specific structures to the presentation structures. The distinction is made here between what we call conceptual information objects and perceptual documentation objects.

We consider perceptual object types (classes) as content models encompassing element structures such as paragraphs, lists, and tables. These objects are omnipresent in our documentation models for the obvious reason that we need to be able to represent the objects in a way that can be used by presentation techniques.

Conceptual objects, on the other hand, are the application-relevant notions representing some meaningful aspect of the model. Conceptual object classes may be, for example, Descriptions, Procedures, Processes, Requirements, etc.

Conceptual objects will contain perceptual objects in order that their content can be mapped to the presentation medium.

Conceptual Information Objects

A conceptual information object represents one idea or concept, or relates to one main point of the product, function, or process that is being described. An information object is a locution (set of words, phrases, sentences, etc.) that has the product model, or some explicit part of the model, as its context.

The information object is a locution of product documentation describing one idea.

Information objects will have an inherent perceptual content model, such as paragraphs, lists, or tables. For example, a procedure could be modelled as a typical conceptual information object. It would likely be defined with a content consisting of the sequence of steps to be performed. The steps could be represented as the element structure numbered-list.
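The procedure example can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not a proposed schema: the class names, the `context` attribute, and the element names `procedure` and `item` are hypothetical; only `numbered-list` is taken from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NumberedList:
    """Perceptual object: an ordered list of items for presentation."""
    items: List[str] = field(default_factory=list)

    def to_markup(self) -> str:
        # Escaping of markup characters is omitted to keep the sketch short.
        body = "".join(f"<item>{text}</item>" for text in self.items)
        return f"<numbered-list>{body}</numbered-list>"


@dataclass
class Procedure:
    """Conceptual object: one procedure, tied to a part of the product."""
    object_id: str
    context: str          # the part of the product model being described
    steps: NumberedList   # perceptual content carried by the object

    def to_markup(self) -> str:
        return (f'<procedure id="{self.object_id}">'
                f"{self.steps.to_markup()}</procedure>")


proc = Procedure(
    object_id="proc-001",
    context="pump-assembly",
    steps=NumberedList(["Close the inlet valve.", "Remove the cover."]),
)
markup = proc.to_markup()
```

The conceptual object carries the meaning (it is a procedure about a particular part), while the perceptual object it contains is what a presentation process actually renders.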

Information Object Classes

Information objects are the instances of product documentation containing the concepts, ideas, instructions, or descriptions of the specific aspects of the product, its functions, operations, etc. Information objects will be assigned and classified in classes. Information object classes will provide a descriptive level of semantics which, by distinguishing between semantic types, will help in establishing guidelines for authors and editors. Information classes can assist in choosing the correct information object to describe the proper type of information, and help with the structuring and definition of the information content and flow in a more logical and systematic manner.

The classes of information objects available should be well defined and semantically meaningful. It is suggested that a set of information object classes be defined for the STEP environment. This set will have common characteristics and be based on "best practice policies" which will allow authors and editors to use information objects in different publications within one application. (In SGML this could be done by defining different DTDs, and in STEP by defining different Application Protocols to span application boundaries, i.e., between APs.)

As explained above, the information object is a key concept for the integration of product structures and publishing structures. Information objects will belong to the product structures as a descriptive information view of the product. Some of the same information objects may be directly associated with publishing structures, i.e., information directly related to publications or other perceptual presentations. Thus, information objects must be able to belong directly to product structures as well as publishing structures, implying that the semantic content of their information class must be meaningful and applicable to both structures.

Information classes will provide several levels of functionality:

  1. Information classes will provide ways of working with and disseminating knowledge about the semantics of the chosen types of information objects.

  2. Information classes will provide the ability to interchange and share information through the acceptance of common information classes. For example, the same information object could be extracted from the product data and embedded into different publishing structures, as long as the publishing structures recognize the same information class. The publishing structures would also be modelled by DTDs, and the information classes would be part of the DTDs sharing the information objects.

  3. The different classes of information and their application within the STEP and SGML models will be distinguishable, supporting the mapping between the models.

  4. The different levels of information can be identified (semantically) and referred to by external processes.

Information object classes can also provide a way of solving intrinsic SGML problems of interchange and re-use of content between DTDs.

Modelling with Information Objects in EXPRESS

When modelling product information with EXPRESS, the conceptual object can be depicted as an entity and enter into relationships with other conceptual objects within the product data domain. In EXPRESS, information objects may be represented by an instance of an 'SGML_string' (see later) created by an author, or may be generated from another representation of a product, such as a representation in a database.

This also opens up the possibility of modelling documents using EXPRESS. This could have application in environments which have investments in PDM, CM, and workflow systems based on STEP data. In this way, the documentation could become just another component of a product and be managed in the same way as the other components.

The use of information objects for product documentation information provides the ability to model product documentation within the product data environment.

Modelling with Information Object Classes in SGML

Grouping information into information objects allows an intuitive and flexible structuring of information. It also allows the re-use of information as the information objects can occur in several concurrent structures. Each structure could be represented by its own DTD but still have the possibility of sharing common information objects. For example, the same information objects could occur in a weekly maintenance manual as well as in the daily job sheets. Information objects could be referred to (linked to) from multiple representations of models.
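The maintenance manual and job sheet example can be sketched in Python. Everything here is a hypothetical illustration: the object IDs, the class names, and the `accepts` sets, which stand in very loosely for the information classes that each structure's DTD would recognize.

```python
# Hypothetical store of information objects, keyed by ID.
information_objects = {
    "proc-lubricate": {"class": "Procedure", "text": "Lubricate the bearing."},
    "warn-hot": {"class": "Warning", "text": "Surface may be hot."},
    "desc-pump": {"class": "Description", "text": "The pump has two stages."},
}

# Two publishing structures, each listing the object classes it accepts
# (standing in for its DTD) and the objects it refers to.
weekly_manual = {
    "accepts": {"Procedure", "Warning", "Description"},
    "refs": ["desc-pump", "warn-hot", "proc-lubricate"],
}
daily_job_sheet = {
    "accepts": {"Procedure", "Warning"},
    "refs": ["warn-hot", "proc-lubricate"],
}


def assemble(structure):
    """Resolve references, rejecting classes the structure does not accept."""
    resolved = []
    for ref in structure["refs"]:
        obj = information_objects[ref]
        if obj["class"] not in structure["accepts"]:
            raise ValueError(f"{ref}: class {obj['class']} not accepted")
        resolved.append(obj["text"])
    return resolved


manual_text = assemble(weekly_manual)
job_sheet_text = assemble(daily_job_sheet)
shared = set(weekly_manual["refs"]) & set(daily_job_sheet["refs"])
```

Both structures draw on the same stored objects, so a correction to `warn-hot` would appear in the manual and the job sheet alike.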

Product Documentation Task Group (ISO/TC184/SC4/WG3/T14)

There are currently two proposals for including SGML information within the STEP architecture. The first allows a new entity type within STEP called the 'SGML_string'. The 'SGML_string' will allow SGML information to be embedded within the STEP architecture. Figure 2 shows how the SGML information can be embedded within a STEP model using the 'SGML_string' concept.[3]

Figure 2: SGML_string Representation

The 'SGML_string' construct is an easy but powerful mechanism for including intelligent information within the STEP environment. Currently text strings are allowed within STEP but there isn't a mechanism for including 'intelligent' text. The 'SGML_string' construct will allow reusable Information Objects to be defined.
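The difference between a plain text string and 'intelligent' text can be sketched in Python. This is not the proposed construct itself: `SgmlString` and `PartEntity` are hypothetical stand-ins, the element names are invented, and the regular expression handles only simple, non-nested elements.

```python
import re
from dataclasses import dataclass


@dataclass
class SgmlString:
    """Hypothetical stand-in for the proposed 'SGML_string' type."""
    markup: str

    def elements(self, name: str):
        """Return the content of each <name>...</name> element (no nesting)."""
        pattern = rf"<{re.escape(name)}>(.*?)</{re.escape(name)}>"
        return re.findall(pattern, self.markup, flags=re.DOTALL)


@dataclass
class PartEntity:
    """Hypothetical product entity carrying both kinds of text."""
    part_id: str
    plain_note: str             # an ordinary string: opaque to processing
    documentation: SgmlString   # marked-up text: structure can be queried


part = PartEntity(
    part_id="valve-17",
    plain_note="Warning: do not overtighten. See test report.",
    documentation=SgmlString(
        "<warning>Do not overtighten.</warning>"
        "<note>See test report.</note>"
    ),
)
warnings = part.documentation.elements("warning")
notes = part.documentation.elements("note")
```

A process can pick the warning out of the marked-up attribute by name, whereas the plain note would have to be interpreted by a human reader.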

Today's information paradigm has shifted from the time when we were only dealing with the information printed on paper. When information is only delivered on paper we think of this information as a document - cover-to-cover.

Today we think of information as it really is, reusable information objects. This means that our information can be taken to a wide range of products and media:


  • paper

  • CD-ROM

  • Internet/Intranet


  • Technical Manuals

  • User Manuals

  • Computer Based Training

  • On-line help

  • And many more....

SGML_String Example

The second method would require a technical documentation Application Protocol (AP). STEP has provided an architecture and methodology for the development of application protocols (APs). [1] An AP is a standards document (a part of ISO 10303) that provides for delivery of information in a well-defined application context. The use of an AP ensures that the information conveyed is that which was intended. An AP is to STEP what a DTD is to SGML. One approach that has been discussed within ISO STEP is to develop an AP specifically for technical documentation.

There is a high probability that both of the above methods will be pursued and that both will be used within the STEP architecture. The first approach will allow information objects, e.g., procedures, tasks, warnings, etc., to be embedded within any AP. The second approach, a technical documentation AP, will define how technical documentation will be organized within the STEP environment. Information from the 'SGML_string' can be used and referenced within the technical documentation AP.

Preliminary Work Items

During the ISO TC184/SC4 Meeting in June, 1997, in San Diego, WG10 submitted a Preliminary Work Item (PWI) which was approved by SC4. The PWI will investigate ways to increase the interoperability between both families of standards (STEP and SGML) for the benefit of product definition, product documentation, and beyond. The PWI is intended to create one or more New Work Items for subjects such as: [2]

The PWI is a major accomplishment for the T14 Working Group and is the first step forward in accomplishing the goal of utilization of SGML within the STEP environment.

Future Issues

As this initiative moves forward, emerging technologies will need to be monitored and, where appropriate, incorporated into the solution. Some of the new technologies that are being watched carefully by the T14 Working Group are the Document Style Semantics and Specification Language (DSSSL) and the eXtensible Markup Language (XML). Specifically, XML looks like it may be a good mechanism for moving forward with the 'SGML_string' construct within the STEP environment.

XML will enable the STEP community to use SGML easily and more efficiently. It eliminates some of the harder to implement aspects of SGML. We believe that XML may be a viable option for the STEP/SGML initiative. XML may be a valid solution for implementing both the 'SGML_string' and information object technologies.

One of the major questions about incorporating SGML into the STEP environment is: "Where will the DTD or DTD fragments (if we are talking about information objects) reside within the 'SGML_string' construct?" With the work that is being done with XML, it may be possible to put the onus of the DTD on the authoring component and eliminate the need for incorporating the DTD into the STEP model. Technically this concept brings its own challenges and concerns, but the problem becomes simpler than trying to have the DTD and SGML declaration reside within the STEP model.
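Why XML could remove the need to carry the DTD can be illustrated with Python's standard library. Well-formed XML can be parsed without reference to a DTD, because every element is explicitly closed. The markup below is a hypothetical example of what the content of an 'SGML_string' might look like if written as XML; the element and attribute names are our own assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical content of an 'SGML_string', written as well-formed XML.
# Every element is explicitly closed, so no DTD is needed to find structure.
sgml_string = (
    '<procedure id="proc-001">'
    "<step>Close the inlet valve.</step>"
    "<step>Remove the cover.</step>"
    "</procedure>"
)

root = ET.fromstring(sgml_string)
steps = [step.text for step in root.findall("step")]
object_class = root.tag      # the element name identifies the object class
object_id = root.get("id")   # the ID remains available for referencing
```

Parsing without a DTD recovers the structure only. Checking that a procedure contains what a procedure should contain would still require the DTD, which in this scenario stays with the authoring component.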


The work currently being done by the T14 group is breaking new ground and is really exciting. The possibility of these two information standards working together will be of great benefit to major industry players.

If you are interested in learning more about the work of the T14 group or becoming a member of the committee, information about the working group can be found at URL:

There is also an exploder list that is available and all are welcome to join. If you are interested in joining the T14 exploder list send an e-mail message to with the word 'SUBSCRIBE' in the subject line. You will be added to the list.


1. Danner, William F., Developing Application Protocols (APs) using the architecture and methods of STEP (STandard for the Exchange of Product Data), Computer Integrated Construction Group, NIST, October 1996

2. SC4 Resolution, Preliminary Work Item Proposal: SGML and Industrial Data, from WG10, June 1997.

3. Reschke, R. and Tucker, H., Configuration Control for Product Documentation, (A Way of Integrating STEP & SGML, 0.4), 31-August-1996

4. Bachman, C.W., Data Structure Diagrams, In: Data Base 1, No. 2, 1969, (Publication of ACM Special Interest Group on Business Data Processing).

5. Engles, R.W., A Tutorial on Data Base Organization. In: Annual Review in Automatic Programming, Vol. 7, Part 1, Pergamon Press, 1972.

6. Chen, P.P., The Entity - Relationship Model - Toward a Unified View of Data, In: ACM TODS, Vol. 1, No. 1, 1976.