Introduction

I will state the obvious — XML is good!  However, there are times when you want to display the XML in other formats.  XML has the flexibility to create different types of output. There are instances where displaying data in a spreadsheet is optimal for presentation and analysis of data. MarkLogic is a repository for storing and managing XML documents.  MarkLogic uses standard XQuery for accessing and manipulating XML.  Microsoft Excel does a really nice job of providing a format for performing data analysis.  In this article I am going to show you how to create a tabbed Excel spreadsheet using XQuery (with a MarkLogic extension) and XML stored in MarkLogic.

The Data

For the purpose of this article the following XML data will be used as an example.  This is a simplistic XML but the ideas and coding can be easily modified for your data.  The main sections of the purchase order that we want to focus on is:

  1. vendor
  2. originator
  3. item

We will make a 3-tabbed Excel spreadsheet that contains the tabs VendorOriginator and ItemsOrdered. Below is a sample of the data as it resides in MarkLogic.

Simple XML File

<purchase-order unique-id=”po1″>
<date>2012-01-05</date>
<po-number>3</po-number>
<vendor>
<contact-info>
<name>Acme Products</name>
<address>
<street>1200 Industrial Blvd</street>
<city>Anywhere</city>
<state>CA</state>
<postalcode>99999</postalcode>
</address>
<phone>410-555-1212</phone>
</contact-info>
</vendor>
<originator>
<contact-info>
<name>Porky Pig</name>
<address>
<street>1600 Pig Road</street>
<city>Loonytoon</city>
<state>CA</state>
<postalcode>20089</postalcode>
<phone>301-555-1212</phone>
</address>
</contact-info>
</originator>
<ship-to>
<contact-info>
<name>Wiley Coyote</name>
<address>
<street>24 Bird Place</street>
<city>Looneytoons</city>
<state>CA</state>
<postalcode>99998</postalcode>
</address>
<phone>301-555-1313</phone>
</contact-info>
</ship-to>
<items-ordered>
<item id=”item1″>
<item-number>301</item-number>
<item-name>Big Hammer</item-name>
<description>Hammer used to hit Road Runner over the head.</description>
<qty>1</qty>
<cost>19.95</cost>
</item>
<item unique-id=”item2″>
<item-number>2</item-number>
<item-name>invisible Rope</item-name>
<description>Rope used to trip Road Runner.  Works better when
2 ropes are laid sequentially 10 ft. apart.
</description>
<qty>2</qty>
<cost>5.00</cost>
</item>
</items-ordered>
<delivery-options>
Please do not leave on door step.  Road Runner is very fast
and will retrieve items.
</delivery-options>
</purchase-order>

 

 

The XQuery Organization

It is important to know that the XQuery itself is pretty straight forward. However, creating the Excel 2003 XML needs to be exact and use appropriate Microsoft Excel syntax. There will be 3 Excel Worksheets and we will write an XQuery function that will create each Worksheet for us.
The following diagram shows the organization of the XQuery file:

 

Structure of the Xquery Document
XQuery Organization

We will create an XQuery function that will create a tab for each of the 3 Worksheets we want to create. I will talk more about these functions later.

Understanding Excel 2003 XML Structure

It is import that you understand the structure of the XML that you will be creating.  I prefer creating Excel spreadsheets using 2003 XML because it is pretty straight-forward.  If I am converting from Excel to another XML vocabulary I prefer using Excel 2007.  The structural components of Excel 2007 are:

Microsoft Excel 2003 Structure

<Workbook>
<DocumentProperties/>
<ExcelWorkbook/>
<Styles/>
<Worksheet>
<Table>
<Column/>
<Row>
<Cell>
<Data/>
</Cell>
</Row>
</Table>
</Worksheet>
</Workbook>

 

I will briefly describe each of the components:

  1. <Workbook> – Root element.
  2. <DocumentProperties> – Metadata about the Workbook, i.e., author, date, etc.
  3. <ExceWorkbook> – Window parameters.
  4. <Styles> – Individual styles are contained in this section.  The styles are referenced by other elements.
  5. <Worksheet> – This is where individual tabs are defined.  For this example we will have 3 tabs.
  6. <Table> – Wrapper element for the table. Very similar to CALS model.
  7. <Column> – This provides information about individual columns.  You should have a <Column> element for each column in the Worksheet.
  8. <Row> – The row defines each row in the Worksheet.
  9. <Cell> – The cell provides information about the cell including links to styles.
  10. <Data> – The actual text of the cell is in the <Data> element.

There are many elements inside the spreadsheet that needs to be defined.

Technical Admission:

Providing full-disclosure and admitting that I can be lazy and like to take shortcuts where ever possible.  I create an Excel template of the data I want to capture and perform a ‘Save As’ and save the file as Excel 2003 XML.  This method provides the skeleton for populating your XML into the Excel spreadsheet.  There are some modifications to the XML that you will need to make but for the most part you can take this skeleton at ‘face value’. The Excel spreadsheet I am using for this exercise can be downloaded here.

The XQuery

The body of the XQuery is a simple let statement.  A single variable is created that contains the entire spreadsheet.  On the return statement, the file is written to a writable location.  The location can be either inside MarkLogic or out the file system.  If the report is public I prefer to place the file under the HTTP server in order to provide access to the data immediately.

XQuery Header

In the XQuery header you will need to declare all the Excel spreadsheet namespaces.  You will also need to declare another custom namespace for your functions to reside it.  This custom namespace can use any prefix and URI you desire.  I use the prefix ‘custom’.  Notice in the header we have modified the normal XQuery declaration from “xquery version=”1.0” to “xquery version=”1.0-ml”.  We will be using MarkLogic extensions to write to an external file.

XQuery Header

xquery version “1.0-ml”;
declare namespace ss=”urn:schemas-microsoft-com:office:spreadsheet”;
declare namespace o=”urn:schemas-microsoft-com:office:office”;
declare namespace x=”urn:schemas-microsoft-com:office:excel”;
declare namespace html=”http://www.w3.org/TR/REC-html40″;
declare namespace xdmp=”http://marklogic.com/xdmp”;
declare namespace custom=”urn:eccnet:xquery:functions”;

 

Below is the skeleton of the query:

Begin Excel Spreadsheet

let $excel := 

<Workbook xmlns=”urn:schemas-microsoft-com:office:spreadsheet”
xmlns:o=”urn:schemas-microsoft-com:office:office”
xmlns:x=”urn:schemas-microsoft-com:office:excel”
xmlns:ss=”urn:schemas-microsoft-com:office:spreadsheet”
xmlns:html=”http://www.w3.org/TR/REC-html40″>

 

(:This is the metadata section:)

 

<DocumentProperties xmlns=”urn:schemas-microsoft-com:office:office”>
<Author>Betty</Author>
<LastAuthor> Harvey </LastAuthor>

 

(:You can use a function to output the current date and time.  This isn’t necessary but is a nice touch.:)
<Created>{fn:current-dateTime()}</Created>
<LastSaved>{fn:current-dateTime()}</LastSaved>

<Company>Electronic Commerce Connection, Inc.</Company>
<Version>12.00</Version>
</DocumentProperties>

 

(:Excel spreadsheet set up!:)
<ExcelWorkbook xmlns=”urn:schemas-microsoft-com:office:excel”>
<WindowHeight>14565</WindowHeight>
<WindowWidth>23835</WindowWidth>
<WindowTopX>120</WindowTopX>
<WindowTopY>60</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>

 

(:Styles Section:)
<Styles>

 

(:Default style.  If no style attribute is added to the cells, this default style will be picked up:)
<Style ss:ID=”Default” ss:Name=”Normal”>
<Alignment ss:Vertical=”Bottom”/>
<Borders/>
<Font ss:FontName=”Calibri” x:Family=”Swiss” ss:Size=”11″ ss:Color=”#000000″/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>

 

(:Header Style:)
<Style ss:ID=”s62″>
<Font ss:FontName=”Calibri” x:Family=”Swiss” ss:Size=”11″ ss:Color=”#000000″
ss:Bold=”1″/>
</Style>
</Styles>

 

Example of Worksheet for  Vendor

<Worksheet ss:Name=”Vendor”> 

<Table ss:ExpandedColumnCount=”7″>
<Column ss:AutoFitWidth=”0″ ss:Width=”66.75″/>
<Column ss:AutoFitWidth=”0″ ss:Width=”107.25″/>
<Column ss:AutoFitWidth=”0″ ss:Width=”129″/>
<Column ss:AutoFitWidth=”0″ ss:Width=”100.5″/>
<Column ss:AutoFitWidth=”0″ ss:Width=”104.25″/>
<Column ss:AutoFitWidth=”0″ ss:Width=”96″/>
<Column ss:AutoFitWidth=”0″ ss:Width=”141″/>

 

(:The first row contains header information.  The attribute ‘ss:StyleID’ attribute references the style number in the <Style> element and the ss:ID attribute.:)

 

<Row ss:StyleID=”s62″>
<Cell><Data ss:Type=”String”>ID</Data></Cell>
<Cell><Data ss:Type=”String”>Vendor Name</Data></Cell>
<Cell><Data ss:Type=”String”>Street</Data></Cell>
<Cell><Data ss:Type=”String”>City</Data></Cell>
<Cell><Data ss:Type=”String”>State</Data></Cell>
<Cell><Data ss:Type=”String”>Zip Code</Data></Cell>
<Cell><Data ss:Type=”String”>Phone</Data></Cell>
</Row>

 

(:Call the function ‘createVendor’.  The ‘create Vendor function will populate all the rows.:)

 

{custom:createVendor()}
</Table>
</Worksheet>

 

(:Create the last 2 Excel Tabs.:)

 

<Worksheet ss:Name=”Originator”>

 

{custom:createOriginator()}

</Worksheet>

<Worksheet ss:Name=”Items”>

..

{custom:createItems()}

</Worksheet>

 

 

 

Finalize the Workbook

 

</Workbook>

 

 

XQuery Return and Save the File

return 

xdmp:save(“http://www.eccnet.com/examples/po.xls”, document {$excel} )

 

 

Creating Functions

A custom function should be created for each of the tabs (<Worksheet>) to populate the rows from each of the purchase orders in your data base.  Below is an example of the function for the Vendor.  This function is called from the main XQuery after the header row of the <Worksheet>

declare function custom:createVendor() as node()+ {
for $po in collection()/descendant::purchase-order/vendor
let $po-id := $po/ancestor::purchase-order/po-number/text()
let $name := $po/contact-info/name/text()
let $street := $po/contact-info/address/street/text()
let $city := $po/contact-info/address/city/text()
let $state := $po/contact-info/address/state/text()
let $postalcode := $po/contact-info/address/postalcode/text()
let $phone := $po/contact-info/phone/text() 

return
<Row xmlns:ss=”urn:schemas-microsoft-com:office:spreadsheet”
xmlns=”urn:schemas-microsoft-com:office:spreadsheet”>
<Cell><Data ss:Type=”String”>{$po-id}</Data></Cell>
<Cell><Data ss:Type=”String”>{$name}</Data></Cell>
<Cell><Data ss:Type=”String”>{$street}</Data></Cell>
<Cell><Data ss:Type=”String”>{$city}</Data></Cell>
<Cell><Data ss:Type=”String”>{$state}</Data></Cell>
<Cell><Data ss:Type=”String”>{$postalcode}</Data></Cell>
<Cell><Data ss:Type=”String”>{$phone}</Data></Cell>

</Row>
};

 

Common Excel Spreadsheet Errors

After you have completed the XQuery and create the Excel spreadsheet you may encounter some errors in the spreadsheet because of tagging issues.  If there are errors in the tagging in the spreadsheet Microsoft places a log file in the directory:  C:\Documents and Settings\User\Local Settings\Temporary Internet Files\Content.MSO.  Depending on the Windows version the path to the log file may be different.  Excel will tell you that there is an error and provide the location and name of the file.

ExpandedRowCount Error

If you are using a created Excel spreadsheet as your template, an attribute will be included in the <Table> element that tells Excel how many rows are included in the table.  The ExpandedRowCount is optional.  Delete this attribute when creating the XQuery.

Table Tagging Created by Excel

<Table ss:ExpandedColumnCount=”6″ ss:ExpandedRowCount=”5″ x:FullColumns=”1″
x:FullRows=”1″ ss:DefaultRowHeight=”15″>

 

The above example shows the tagging from Excel.  You can delete all attributes and it won’t affect the spreadsheet.  However, you must delete the ‘ss:ExpandedRowCount’.  You could calculate the number o f rows that are created and populate this attribute with a valid number.  If the number doesn’t reflect accurately the number of rows in the resulting spreadsheet, the log file will display:

XML ERROR in Worksheet Setting
REASON:                Bad Value
FILE:        C:\DOCUME~1\BHarvey\LOCALS~1\Temp\po.xls
GROUP:  Worksheet
TAG:       Table
ATTRIB:  ExpandedRowCount
VALUE:   1

 

Extra Tagging

XQuery will pick up the elements if you do not extract the text node.  For example:

Let $phone := $po/contact-info/phone

Will result in <phone>…</phone>.  Excel will not know how to handle the tagging.  The following error will be included in the log file:

XML ERROR in Workbook Setting
REASON:     Illegal Tag
FILE: C:\DOCUME~1\BHarvey\LOCALS~1\Temp\po.xls
GROUP:      Workbook
TAG:  Row
 

Style References

Columns, rows and cells can reference styles from the <Styles> section of the Worksheet.  If your Xquery references a style id that  doesn’t have a corresponding style you will receive an error and the Excel spreadsheet will not open.  The following error will be included in the log file:

XML ERROR in Table
REASON:     Bad Value
FILE: C:\DOCUME~1\BHarvey\LOCALS~1\Temp\po.xls
GROUP:      Row
TAG:  Cell
ATTRIB:     StyleID
VALUE:      s63

 

Conclusion

There are a lot of different options and functions you can add automatically to your spreadsheet.  Once you have the basic constructs and methodology what you can do with spreadsheets is endless.  In future articles I will delve deeper into the functionality that can be automatically added to your spreadsheet.  This article just scrapes the surface and provides how to create a very basic tabbed Excel spreadsheet. The Xquery file for this article is available at http://www.eccnet.com/examples/po/CreatePOTabbedExcel.xqy.