<?xml version="1.0"?>
<?xml-stylesheet href="Training-to-slidy.xsl" type="text/xsl"?>
<!--<!DOCTYPE training SYSTEM "../../../SoftQuad/XMetaL%203/Rules/training.dtd">
--><training>
<training-material id="Metadata"> 
  <title>A Scalable XML Approach to Records Archive Metadata</title>
  <presentation-info><conference-name>MarkLogic Summit</conference-name> 
  <date>November 18, 2009</date></presentation-info><authorinfo><author> 
  <name>Betty Harvey</name><company>Electronic Commerce Connection,
  Inc.</company> <e-mail>harvey@eccnet.com</e-mail></author></authorinfo> 
  <introduction id="introduction"> <quoted-block> 
	 <para>To maximize success in service oriented development of applications
		and enterprise wide sharing and reuse initiatives (such as data warehousing), a
		well-scoped and focused function to manage key metadata artifacts is
		cost-effective and critical. </para><quoted-from>Gartner
	 Group</quoted-from></quoted-block> 
	 <para>In response to federal agencies' growing use of information technology
		to conduct business, NARA decided to build the Electronic Records Archives
		(ERA) system. Its main goal is "to preserve electronic records independent of
		the hardware and software that created them." For the ERA system to store,
		preserve, and provide access to electronic records, it must cope with several
		challenges. Metadata will play a key role in the lifecycle of all records in
		ERA. Development of the UML Logical Data Model (LDM) for this metadata is a
		collaborative effort within NARA. </para> 
  </introduction><section> 
  <title>Introduction</title> 
  <introduction id="era-intro"><quoted-block> 
	 <para>ERA is the National Archives and Records Administration's strategic
		initiative to preserve and provide long-term access to uniquely valuable
		electronic records of the U.S. Government, and to transition government-wide
		management of the lifecycle of all records into the realm of
		e-government.</para><quoted-from>http://www.archives.gov/era/</quoted-from></quoted-block>
	 
	 <slide id="acronym"> 
		<title>Acronyms Used in Presentation</title><deflist>
		<termgrp><term>ACE</term><definition>Archival Catalog Entry</definition></termgrp>
		<termgrp><term>ARC</term><definition>Archival Research Catalog</definition></termgrp>
		<termgrp><term>ERA</term><definition>Electronic Records Archives</definition></termgrp>
		<termgrp><term>IOC</term><definition>Initial Operating Capability</definition></termgrp>
		<termgrp><term>LCDRG</term><definition>Lifecycle Data Requirements Guide</definition></termgrp>
		<termgrp><term>LDM</term><definition>Logical Data Model</definition></termgrp>
		<termgrp><term>NARA</term><definition>National Archives and Records Administration</definition></termgrp>
		<termgrp><term>OAIS</term><definition>Open Archival Information System</definition></termgrp>
		<termgrp><term>SGML</term><definition>Standard Generalized Markup Language</definition></termgrp>
		<termgrp><term>UML</term><definition>Unified Modeling Language</definition></termgrp></deflist> 
	 </slide> 
  </introduction> 
  <slide> 
	 <title>Electronic Records Archives</title> 
	 <list role="incremental"> 
		<item>Multi-year initiative.</item> 
		<item>Initial Operating Capability (IOC) - 6/27/08.</item> 
		<item>Project defined in 5 increments, each adding
		  functionality.</item> 
		<item>Currently in increment 3.</item> 
	 </list> 
	 <instructors-notes> 
		<para>The IOC had limited functionality, and only 5 agencies used the
		  system. Some of the initial functionality was deferred in order to reach IOC.
		  </para> 
		<para>The goal is for 30 agencies to be using ERA by the end of 2010 and
		  all agencies by 2012. </para> 
	 </instructors-notes> 
  </slide> 
  <slide> 
	 <title>Business Objectives</title> 
	 <list role="incremental"> 
		<item>Manage the lifecycle of records </item> 
		<item>Provide long-term preservation</item> 
		<item>Provide access to digital objects</item> 
		<item>Manage business forms used by NARA and federal agencies 
		  <list> 
			 <item> Records Schedule</item> 
			 <item>Transfer Request and Transfer Plan </item> 
			 <item> Legal Transfer Instrument</item> 
			 <item>Other less frequently used forms</item> 
		  </list></item> 
	 </list> 
	 <para>The "Preservation Framework" is currently being developed. Metadata
		will accommodate information needed for the preservation framework.</para> 
  </slide> 
  <slide> 
	 <title>Technical ERA Objectives</title> 
	 <list> 
		<item>Develop a modernized system with automatic workflow that can
		  streamline the digital archive business process. 
		  <list> 
			 <item>XML is used for both metadata and business objects</item> 
			 <item>Manage workflow</item> 
			 <item>Create and maintain information (metadata) about digital
				objects 
				<list> 
				  <item>Provide the ability to transform objects</item> 
				  <item>Provide representations of objects</item> 
				</list></item> 
			 <item>Build on a Service-Oriented Architecture (SOA) framework</item> 
		  </list></item> 
	 </list> 
  </slide> 
  <slide> 
	 <title>New Technologies Introduced in Increment 3</title> 
	 <list> 
		<item>XForms for creation of Business Objects (Orbeon)</item> 
		<item>Redesign and enhancement of archival metadata (ACE)</item> 
		<item>XML Repository for Business Objects and Metadata (MarkLogic Server)
		  </item> 
		<item>Enhanced workflow management (Software AG)</item> 
		<item>Search capability</item> 
	 </list><note>NARA's ERA Systems Engineering developed proof-of-concept
	 prototypes before new technologies were recommended for Increment 3 (I3).</note> 
	 <instructors-notes> 
		<para>New technologies were introduced into ERA for various
		  reasons:</para> 
		<list> 
		  <item>JSP pages had thousands of lines of code and were fragile. 
			 <list> 
				<item>Transformation was required to get data into and out of
				  forms.</item> 
			 </list> </item> 
		  <item>The current ACE design and implementation are unsatisfactory. 
			 <list> 
				<item>Not self-describing</item> 
				<item>Relationships are vague</item> 
				<item>Contain minimal data</item> 
			 </list></item> 
		  <item>Simple changes to business objects are difficult and
			 costly.</item> 
		  <item>The current ECM doesn't support XML objects well. 
			 <list> 
				<item>Objects cannot be deleted</item> 
				<item>No search capability</item> 
			 </list></item> 
		  <item>Workflow is currently hard-coded in ERA; workflow changes require
			 major recoding.</item> 
		  <item>Currently no search capability in ERA.</item> 
		</list> 
	 </instructors-notes> 
  </slide></section><section id="metdatastandards"> 
  <title>Archival Catalog Entry (ACE)</title> 
  <introduction id="Meta-intro"> 
	 <para>Organizations understand the importance of metadata, and there are
		numerous metadata standards and implementations. NARA looked at many
		different standards (including its own) before defining the ACE metadata
		structure. This section discusses the ACE concept and vision.</para> 
  </introduction> 
  <slide id="definition2"> 
	 <title>Standard Metadata Vocabularies Analyzed</title> 
	 <para>Metadata is <i>data about data</i>. There are many different metadata
		standards, including:</para> 
	 <list> 
		<item>Dublin Core</item> 
		<item>METS </item> 
		<item>MODS</item> 
		<item>EAD</item> 
		<item><b>PREMIS</b></item> 
	 </list> 
	 <instructors-notes> 
		<para>The I3 ACE is a complete rework of the original ACE developed for
		  IOC. The new ACE builds on the constructs of PREMIS and includes the NARA
		  standard metadata currently used for archival descriptions.</para> 
	 </instructors-notes> 
  </slide> 
  <slide id="slide10"> 
	 <title>LCDRG (Lifecycle Data Requirements Guide)</title> 
	 <list> 
		<item>LCDRG is NARA's standard for lifecycle metadata</item> 
		<item>LCDRG is used for the Archival Research Catalog (ARC)
		  <url>http://www.archives.gov/research/arc/</url></item> 
		<item>Contains metadata elements, as well as authority lists</item> 
		<item>LCDRG is available online at:
		  <url>http://www.archives.gov/research/arc/lifecycle-data-requirements.pdf</url></item>
		
	 </list><figure role="incremental" id="lcdrgfigure"><caption>LCDRG
	 Structure</caption><!--<graphic name="graphics/LCDRG.jpg" width="468" height="338"
	 alt="LCDRG Structure"/><graphic name="graphics/LCDRG1.jpg" width="468"
	 height="338" alt="LCDRG Structure"/><graphic name="graphics/LCDRG2.jpg"
	 width="468" height="338" alt="LCDRG Structure"/>--><graphic
	 name="graphics/LCDRG3.jpg" alt="LCDRG Structure" width="429"
	 height="338"/></figure> 
  </slide> 
  <slide id="assets"> 
	 <title>ERA Assets</title> 
	 <para>All assets in ERA must contain an ACE entry. </para><deflist> 
	 <termgrp><term>Business Objects</term><definition>The documents that control
	 the communications between NARA and contributing entities (government,
	 collection holders, private donations)</definition></termgrp><termgrp> 
	 <term>Government Records</term><definition>Permanent records that are sent
	 periodically to NARA for safekeeping</definition></termgrp><termgrp> 
	 <term>Donations</term><definition>Private donations from organizations and
	 individuals</definition></termgrp></deflist> 
  </slide> 
  <slide id="ARC1"> 
	 <title>Archival Research Catalog</title> <quoted-block> 
	 <para>The Archival Research Catalog (ARC) is the online catalog of NARA's
		nationwide holdings in the Washington, DC area, Regional Archives and
		Presidential
		Libraries.</para><quoted-from>http://www.archives.gov/research/arc/about-arc.html</quoted-from></quoted-block>
	 
	 <list> 
		<header>Information in ARC</header> 
		<item>Archival Descriptions (currently 4 million entries) 
		  <list> 
			 <item>ARC descriptions cover mainly non-electronic records</item> 
			 <item>Intellectual metadata content written by archivists</item> 
			 <item>Representations of paper documents are available via NARA and
				NARA partners, e.g., footnote.com.</item> 
		  </list></item> 
		<item>Authority Lists</item> 
		<item>Thesauri Data 
		  <list> 
			 <item>Geographic Subjects </item> 
			 <item>People </item> 
			 <item>Topical Subjects </item> 
			 <item>Organizations</item> 
		  </list></item> 
	 </list> <figure id="archival-taxonomy-fig"><caption>NARA Archival
	 Hierarchy</caption><graphic name="graphics/arc-data-model.gif"
	 alt="NARA Archival Taxonomy" width="285" height="215"/></figure> 
  </slide> 
  <slide id="ACE"> 
	 <title>ERA Archival Catalog Entry (ACE)</title> <quoted-block> 
	 <para> ERA is the National Archives and Records Administration's strategic
		initiative to preserve and provide long-term access to uniquely valuable
		electronic records of the U.S. Government, and to transition government-wide
		management of the lifecycle of all records into the realm of
		e-government.</para><quoted-from>NARA</quoted-from></quoted-block> 
	 <para>Starts with the OAIS reference model</para> 
	 <para>Based on the PREMIS model</para> 
	 <para>Incorporates LCDRG constructs that can be automatically populated on
		ingest.</para> 
  </slide> 
  <slide id="ArchivalMetadata"> 
	 <title>Archival Metadata</title> <figure id="ArchivalMetadata-fig"> 
	 <caption>Metadata for Increment 3</caption><graphic
	 name="graphics/ArchivalMetadata.jpg" width="664" height="367"
	 alt="Archival Metadata"/></figure> <note>In I3, all current ARC descriptions
	 will receive an ERA ACE metadata description. </note> 
  </slide> 
  <slide id="relationship"> 
	 <title>Asset Relationships</title> <figure id="relationshipsfig"><caption>ERA Object Relationships</caption><graphic
	 name="graphics/relationships.jpg" width="525" height="431"
	 alt="ERA Object Relationships"/></figure> 
  </slide> 
  <slide id="challenge"> 
	 <title>LDM to Physical (XML) Model</title> 
	 <list> 
		<header>Challenges</header> 
		<item>Conversion of the LDM to XML Schema</item> 
		<item>NARA considers the LDM to be the <i>normative</i> version of the data
		  model</item> 
		<item>Keeping the LDM and XML Schema in sync after the initial migration</item> 
	 </list><figure id="snippet"><caption>LDM Snippet</caption><graphic
	 name="graphics/ACE.jpg" width="509" height="326" alt="ACE Model"/></figure> 
	 <figure id="ldm"><caption>Complete LDM Model</caption><graphic
	 name="graphics/DataModel.jpg" width="607" height="313"
	 alt="Data Model"/></figure> 
  </slide></section> <section id="section3"> 
  <title>XML Repository Proof of Concept</title> 
  <introduction id="repositorysection"><quoted-block> 
	 <para>The reason why the universe is eternal is that it does not live for
		itself; it gives life to others as it transforms</para><quoted-from> Lao
	 Tzu</quoted-from></quoted-block> 
	 <para>NARA ERA Systems Engineering continuously analyzes where the ERA
		system is today and where it needs to be when it is fully operational. The
		team evaluates data and software technologies that could improve efficiency
		within the ERA system.</para> 
  </introduction> 
  <slide id="objective"> 
	 <title>Objective</title> 
	 <list> 
		<item>Analyze the feasibility of an XML repository for ACEs and Business
		  Objects</item> 
		<item>Analyze the availability of software for native XML storage and
		  retrieval</item> 
		<item>Standards-based approach</item> 
		<item>Proof of concept was initially performed within NARA Systems
		  Engineering <xref refid="footnote1">*</xref></item> 
	 </list> <note id="footnote1">The proof-of-concept will be
	 demonstrated.</note> 
	 <para>Based on NARA's proof of concept, the ERA systems integrator performed
		its own proof of concept.</para> 
  </slide> 
  <slide id="proof"> 
	 <title>Proof of Concept </title> 
	 <list> 
		<item>Data source - ARC XML files 
		  <list> 
			 <item>4 million ARC records</item> 
			 <item>ARC Descriptions</item> 
			 <item>Thesauri Data</item> 
		  </list></item> 
		<item>MarkLogic Server 
		  <list> 
			 <item>Used Application Builder for the initial application (GOD BLESS
				the MarkLogic Application Builder Team)</item> 
			 <item>Customized the default application using XQuery</item> 
			 <item>Used OxygenXML to edit Application Builder files via WebDAV
				(Love Oxygen - GOD BLESS Oxygen Developers, as well)</item> 
		  </list></item> 
	 </list> 
  </slide> 
  <slide id="proofofconcept"> 
	 <title>What the Proof of Concept Is Not</title> 
	 <list> 
		<item>Not a fully functioning ERA metadata repository.</item> 
		<item>ARC data was used because it was available and was effective in
		  demonstrating functionality.</item> 
		<item>Not an interactive content management system that provides editing
		  and updating of records.</item> 
	 </list> 
  </slide> 
  <slide id="proof1"> 
	 <title>Conclusions</title> 
	 <list> 
		<item>An XML repository is appropriate for metadata.</item> 
		<item>Initial analysis showed that it scales to large
		  applications.</item> 
		<item>Relatively easy to develop and maintain.</item> 
		<item>Robust built-in search capabilities.</item> 
		<item>Easily integrates with other information repositories available
		  online.</item>
		<item>The proof of concept only scratched the surface of MarkLogic's
		  capabilities.</item> 
	 </list> 
  </slide> 
  <slide id="Demo"> 
	 <title>Demo</title> 
	 <para>On-line demonstration.</para> 
  </slide></section><section id="section4"> 
  <title>Demo Screen Shots</title> 
  <introduction id="Introduction"> 
	 <para>The following screenshots demonstrate some of the proof-of-concept
		functionality. They are provided in case Internet access is unavailable
		during the presentation.</para> 
  </introduction> 
  <slide id="homescreen"> 
	 <title>Home Screen</title> 
	 <list> 
		<item>Faceted Navigation</item> 
		<item>Number of objects</item> 
	 </list><figure id="home"><caption>Home Page</caption><graphic
	 name="graphics/HomePage.jpg" alt="Home Page ARC" width="746"
	 height="698"/></figure> 
  </slide> 
  <slide id="link"> 
	 <title>Ability to Incorporate Outside Sources</title><figure id="linkfig"> 
	 <caption>Link to Outside Image Repository</caption><graphic
	 name="graphics/LinkToImages.jpg" alt="Link to Images" width="740"
	 height="717"/></figure> 
  </slide></section>
</training-material></training>

