Technical Overview of the Project

Creating a web site from a text-oriented database

One major factor in the design of this application has been the desire to create a web site which is genuinely live, in that it draws its content directly from a constantly changing database. This has led to the choice of ModesXML as the underlying database engine. ModesXML is a generic database, originally designed for museum object cataloguing, which supports the storage and retrieval of any XML data for which a Document Type Definition (DTD) is available.

There are two principal data sources behind this site. The web pages themselves are derived from a database within which each record contains a page's content in XHTML, embedded within a simple holding structure for the page's metadata. This database plays the role of a content management system within the framework. The pages "know" their place in the site hierarchy (i.e. their parent page), and this information is used to generate the navigation panel dynamically.
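As a rough illustration of how a navigation panel can be generated from parent links alone, here is a minimal Python sketch. It is not the site's actual code: the page identifiers, titles and the `render_nav` function are all invented for the example, and the real page records are XML rather than tuples.

```python
from collections import defaultdict

# Hypothetical page records: (page id, title, parent id).
# A parent id of None marks the top of the site hierarchy.
PAGES = [
    ("home", "Home", None),
    ("poems", "Poems", "home"),
    ("prelude", "The Prelude", "poems"),
    ("about", "About", "home"),
]


def render_nav(pages):
    """Return the navigation panel as a list of indented titles."""
    titles = {page_id: title for page_id, title, _parent in pages}

    # Invert the child -> parent links into parent -> children lists.
    children = defaultdict(list)
    for page_id, _title, parent_id in pages:
        children[parent_id].append(page_id)

    lines = []

    def walk(parent_id, depth):
        # Emit each child, then descend into its own children.
        for page_id in children.get(parent_id, []):
            lines.append("  " * depth + titles[page_id])
            walk(page_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render_nav(PAGES)))
```

Because each page stores only its own parent, adding a page to the database is enough to make it appear in the panel the next time it is generated.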

At various points in the site, links are made from site pages to the Wordsworth Trust's database of manuscript transcriptions and page images. These typically take the form of "pre-defined" searches which retrieve specific sets of data. However, these "pre-defined" searches are always run on the live data. This means that the results returned will develop and change as the Trust adds to and updates this primary database. Meanwhile, the same core data is available both for internal collections management use and for other forms of interpretation and publication.
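The idea of a "pre-defined" search over live data can be sketched in Python as follows. This is only an illustration under stated assumptions: the record fields, the "MS-A" and "MS-B" identifiers and the `make_search` helper are hypothetical, and a plain list stands in for the real database.

```python
# Hypothetical in-memory stand-in for the live transcriptions database.
live_records = [
    {"manuscript": "MS-A", "line": "first line"},
    {"manuscript": "MS-B", "line": "another line"},
]


def make_search(field, value):
    """Return a stored search; the data is read only when the search is run."""

    def run(records):
        return [record for record in records if record.get(field) == value]

    return run


# The search is defined once, for example when a page link is authored.
search_ms_a = make_search("manuscript", "MS-A")

if __name__ == "__main__":
    print(len(search_ms_a(live_records)))
    # A later addition to the database is picked up by the same search.
    live_records.append({"manuscript": "MS-A", "line": "second line"})
    print(len(search_ms_a(live_records)))
```

The point of the design is that only the search criteria are fixed in advance; the result set is computed afresh each time the link is followed.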

The transcriptions database follows the Text Encoding Initiative (TEI) guidelines. Particular attention has been paid to the recording of features which illustrate the process of manuscript creation. It makes extensive use of the TEI's flexible "feature structure" facility to add metadata both to individual lines (and parts of lines) and to page images.

ModesXML has a relatively simple design (for an XML database), in that it indexes, and so supports the retrieval of, "records". Each record contains a single XML element and its subelements. Records can be of any granularity, but keeping the record size small improves the precision of retrieval. With a deeply nested XML application like the TEI, this fragmentation has been achieved by including processing instructions which link a record's root element to its parent record. This technique allows us to record, and so retrieve, individual lines as separate records. When a more complete version of the data is required, an XSLT transform can (and does!) reconstruct the complete TEI document from these individual fragments.
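The reassembly of fragments can be sketched in Python. The site itself uses an XSLT transform for this; the code below is a stand-in using the standard library's `xml.dom.minidom`, and the `parent-record` processing instruction, the record identifiers and the sample markup are all assumptions made for the example, not the actual ModesXML syntax.

```python
from xml.dom import minidom

# Hypothetical fragment records keyed by record id. The "parent-record"
# processing instruction is an invented stand-in for the real linking syntax.
RECORDS = {
    "poem1": "<lg><head>Lines</head></lg>",
    "line1": '<l n="1"><?parent-record poem1?>First line</l>',
    "line2": '<l n="2"><?parent-record poem1?>Second line</l>',
}


def pop_parent_id(root):
    """Remove the linking instruction from root and return the id it names."""
    for node in list(root.childNodes):
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "parent-record"):
            root.removeChild(node)
            return node.data.strip()
    return None


def reassemble(records):
    """Rebuild one document from fragments; children keep their dict order."""
    roots, children, top_id = {}, {}, None
    for record_id, source in records.items():
        root = minidom.parseString(source).documentElement
        parent_id = pop_parent_id(root)
        roots[record_id] = root
        if parent_id is None:
            top_id = record_id  # the record with no parent link is the top
        else:
            children.setdefault(parent_id, []).append(record_id)

    def attach(record_id):
        node = roots[record_id]
        for child_id in children.get(record_id, []):
            child = attach(child_id)  # complete the subtree before copying it
            node.appendChild(node.ownerDocument.importNode(child, True))
        return node

    return attach(top_id).toxml()


if __name__ == "__main__":
    print(reassemble(RECORDS))
```

The sketch relies on the order of the dictionary to sequence the lines; a real implementation would need an explicit ordering, which the source does not describe.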

Populating the databases

Most of the web site pages were originally authored using a word processor. They were then exported to XHTML format using Open Office 2.0's "export as XHTML" facility. This conversion does a good job of retaining the style information from the source document as a set of Cascading Style Sheet (CSS) instructions. This CSS was retained in the ModesXML version of the page, and is used to style the web pages you see. The XHTML version of the page was then processed by a simple XSLT transform which added the top-level structure required by our "web pages" XML application. This revised form was then imported into the ModesXML database.
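The wrapping step can be illustrated with a short Python sketch. The original process used an XSLT transform; this version uses `xml.etree.ElementTree` instead, purely for illustration. The `webpage`, `metadata`, `title`, `parent` and `content` element names are hypothetical, since the actual structure of the "web pages" XML application is not given here.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A tiny stand-in for an exported XHTML page, including its CSS.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Sample</title><style>p { margin: 0 }</style></head>"
    "<body><p>Hello</p></body>"
    "</html>"
)


def wrap_page(xhtml_source, page_id, parent_id):
    """Embed an XHTML page inside a hypothetical holding structure."""
    html = ET.fromstring(xhtml_source)
    # Reuse the XHTML title as page metadata (an assumption for this sketch).
    title = html.findtext(
        "{%s}head/{%s}title" % (XHTML_NS, XHTML_NS), default=""
    )

    page = ET.Element("webpage", {"id": page_id})
    metadata = ET.SubElement(page, "metadata")
    ET.SubElement(metadata, "title").text = title
    ET.SubElement(metadata, "parent").text = parent_id

    # The whole XHTML tree, style element included, is kept unchanged.
    ET.SubElement(page, "content").append(html)
    return page


if __name__ == "__main__":
    wrapped = wrap_page(SAMPLE, "sample", "home")
    print(ET.tostring(wrapped, encoding="unicode"))
```

Keeping the exported XHTML intact inside the wrapper is what allows the CSS produced by the conversion to survive into the stored page.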

Some pages (including this one!) were authored directly within ModesXML. A fallback CSS stylesheet is provided for those pages which were created in this way, and which lack style settings.

Most of the supporting TEI database has been populated using ModesXML directly. ModesXML facilities such as grid-based data entry and the ability to copy whole chunks of data have been used to advantage in this work.