
Searching for XML documents in the database

2005-05-24 / 2005-06-16
Sévigny, Martin (AJLSM, France)

The MICHAEL platform production module lets you search XML documents in a database. There are two ways to search: the simple search, and XQuery searching and reporting.

1) Simple search

The simple search is always available from the viewer, in the search zone at the bottom. The search bar looks like this:

To perform a search, you can specify four pieces of information:

  1. The search zone within the documents, either Fulltext or Identifier. A drop-down list on the left lets you choose the search zone. By default, full-text search is selected.
  2. The words to search for, typed in the text box in the middle of the search bar.
  3. If you type several words, whether you want to retrieve documents containing all of the words or at least one of them. Radio buttons below the search zone drop-down list let you choose: "and" means all the words, "or" means at least one word.
  4. The location, that is, the folders where you want to search. A drop-down list lets you choose between the root of the database and the current folder.

Once you have provided this information, you can run the query by clicking the go button on the right. You will then get search results such as these:

The search results are close to what you see when you are browsing folders. Each document found is on its own row, with an icon to modify the document and another one to download it. You can view the document by clicking its title, or browse to its location (folder) by clicking the folder name just before the document title.

This simple search engine aims to provide a quick way to identify documents to work on. Note that if you know the identifier of a document, you can also use the search bar to find that single document quickly.

Finally, you will never find documents for which you don't have at least read access.
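
The difference between the and and or options can be pictured with a small XQuery sketch. It is only an illustration, not how the simple search engine is implemented: the three doc elements and the word list are made-up sample data, and contains() matches substrings, whereas the real engine presumably works on indexed words.

```xquery
xquery version "1.0";
(: Hypothetical sample data standing in for documents in the database. :)
let $docs := (
  <doc id="a">museum of natural history</doc>,
  <doc id="b">natural park</doc>,
  <doc id="c">city museum</doc>
)
(: The words typed in the search bar. :)
let $words := ("natural", "museum")
return
<matches>
  <and>{
    (: "and": every word must be present in the document :)
    for $d in $docs
    where every $w in $words satisfies contains($d, $w)
    return string($d/@id)
  }</and>
  <or>{
    (: "or": at least one word must be present in the document :)
    for $d in $docs
    where some $w in $words satisfies contains($d, $w)
    return string($d/@id)
  }</or>
</matches>
```

With this sample data, the and element lists only document a, while the or element lists a, b and c.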

2) XQuery searching and reporting

XQuery is a query language (and much more) that is almost standardized by the W3C. In the MICHAEL production module, it lets you find documents using complex queries on both the structure and the content of documents. It can also be used to generate complex reports on your data.

The following information explains how to use the XQuery features of the production module; for more information on XQuery itself, you should read the relevant external documentation.

Run an XQuery

To run an XQuery, you must use the XQuery button in the toolbar at the bottom of pages.

If you intend to run the query on documents within a specific folder, browse to that folder first.

Once you click on this button, you will get a form like this one:

In this form, you can specify the search zone, you can select a predefined query or type a new query, you can select an XSLT transformation to format the results, and you can select the type of output. All these concepts are defined and explained below.

If you want to select a predefined XQuery, this query must be in the /XQuery folder in the database or in the /XQuery sub-folder within your home folder. It must also be in the specific MICHAEL format for XQuery; the simplest way to create one is to use the forms (see below). To select one, use the Predefined query drop-down list at the top of the form.

If you prefer to type a new query, use the next part of the form. First, specify the search zone (folder) with the Path drop-down list. You can select either the root folder or the current folder. In all cases, the actual search zone is the selected folder and all its sub-folders, recursively.

You can type your XQuery in the Search query text box. If you need more space than this simple text box, click on the bigger link and you will then have a full text area such as:

Some XQuery examples are provided later in this document.

The result of an XQuery is generally XML. If you would like to process this XML with an XSLT transformation to create another XML output or HTML, you can provide your own XSLT file by typing its URL in the XSLT transformation text box; this URL must be reachable by the server. You must also select the type of output of your XSLT transformation, either XML or HTML, with the appropriate radio buttons.

If you do not provide an XSLT transformation, you can instead choose the output type from Xdepo navigation, XML Xdepo, XML raw and Xdepo stylesheet.

The Xdepo navigation output type will give standard results to be browsed in HTML, with one row per document found. This may be appropriate for queries resulting in complete documents.

The XML Xdepo output type will give you the XML used by Xdepo to build the navigation. It contains the XML output of your XQuery surrounded by context information.

The XML raw output type will return the exact XML created by your query.

3) Store and reuse an XQuery

The MICHAEL production module provides a user interface to create an XQuery and store it in the database. It can be reused later, for modification or, more probably, execution. This user interface is built on a specific datatype built into the engine: xquery. A related form is also provided. This is why you can create such an XQuery anywhere in the database, using the standard mechanism for adding a document.

To create an XQuery, click on the Add document button in the folder where you want to put your XQuery. Then, in the Choose datatype form, select these values:

Once you click on the Create the document button, you will get the specific form to build an XQuery, such as this one:

This form is very similar to the one used to execute an XQuery (see the previous section). It contains the name of the XQuery, the location where it will be executed, the XQuery itself, and the type of output. The document you create with this form is an XML document containing all the information you provide. You can manage this document within the database like any other document.

The complete name of the XQuery is the name used to select it when you run it. You can provide several names in different languages.

The path is the folder where you want to execute the query. Only documents in this folder and its sub-folders will be considered in the query execution.

You can provide an XSLT transformation to process the output of your XQuery with the Browse... button. This XSLT transformation must be available on your computer so that it can be uploaded to the database. If you provide such a transformation, you must specify the type of output, either XML or HTML.

The Query text area is for the XQuery itself. You can type in the XQuery you want to execute. If your XQuery includes special strings such as param_0:#, param_1:#, etc., they will be replaced at execution time by parameters that the user provides in a special form. This is very useful for creating a basic XQuery that can be reused with specific options. If you use parameters in the XQuery, you must enter the number of parameters when you create it.
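
As an illustration of parameters, the "Records modified since" example given later in this document could be stored with the number of days as its single parameter. This is a sketch under one assumption: that param_0:# is replaced textually by the value the user types, so the value must make sense at the position where the placeholder stands.

```xquery
xquery version "1.0";
(: Stored form of the first line, as typed in the Query text area,
   with the number of parameters set to 1:

     let $days := number(param_0:#)

   Below is the query as it would read once a user has entered 15,
   assuming a plain textual replacement of the placeholder. :)
let $days := number(15)
return
/*/metadata[modification-date[days-from-duration(current-date() - xs:date(xs:dateTime(.))) < $days]]
```

Entering 40 instead of 15 would reproduce the fixed query shown in the examples section.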

Finally, output types have the same meaning as when you execute a query (see above).

Once you click on the Save button, the document will be stored in the database.

4) XQuery examples

Records with comments

This simple example finds records whose comments contain at least one non-whitespace character:

/*/metadata[comment[normalize-space(.)!='']]

This is essentially an XPath expression, but it works with the XQuery engine. If you select Xdepo navigation as the output type, you will get standard search results.
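
Because an XQuery can compute values as well as select nodes, the same predicate can feed a small report. The sketch below counts the matching records instead of listing them. The report element and its attribute name are hypothetical, chosen for this example only, and the XML raw output type is probably the most suitable here, since the result is not a list of documents. Like the query above, it relies on the database to supply the documents being searched.

```xquery
xquery version "1.0";
(: Count the records whose comment holds at least one non-whitespace character.
   The <report> wrapper and its attribute name are illustrative only. :)
<report records-with-comments="{ count(/*/metadata[comment[normalize-space(.) != '']]) }"/>
```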

Records modified since

The following query finds records modified in the last 40 days:

xquery version "1.0";
let $date := number(40)
return
/*/metadata[modification-date[days-from-duration(current-date() - xs:date(xs:dateTime(.))) < $date]]

For instance, if you select XML Xdepo as the output type, you will get this:

<xdepo:results>
  <xdepo:result>
    <metadata>
      <creation-date>2005-06-08T13:46:47</creation-date>
      <modification-date>2005-06-08T13:46:47</modification-date>
      <update>2005-06-08</update>
      <agent code="msevigny"></agent>
      <rights>Licenced under the Creative Commons Licence&#13;
(http://creativecommons.org/licenses/by-nc-sa/2.0/uk/)</rights>
      <language code="en"></language>
      <record-status code="valid"></record-status>
    </metadata>
  </xdepo:result>
  <xdepo:result>
    <metadata>
      <creation-date>2005-06-14T12:41:44</creation-date>
      <modification-date>2005-06-14T12:41:44</modification-date>
      <update>2005-06-14</update>
      <agent code="kfernie"></agent>
      <rights>Licenced under the Creative Commons Licence&#13;
(http://creativecommons.org/licenses/by-nc-sa/2.0/uk/)</rights>
      <language code="en"></language>
      <record-status code="draft"></record-status>
    </metadata>
  </xdepo:result>
</xdepo:results>

This output shows that the MICHAEL production engine outputs a root element named xdepo:results, for general information about the query execution, and one xdepo:result element per result, so that each result can be put in context.
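
The date arithmetic inside the predicate can be checked in isolation. The sketch below is self-contained: it replaces current-date() with a fixed date (2005-06-16, chosen arbitrarily for the example) and uses the first modification-date value from the output above.

```xquery
xquery version "1.0";
(: Fixed "today" so that the result does not depend on when the query runs. :)
let $today := xs:date("2005-06-16")
(: xs:dateTime parses the stored timestamp; xs:date then drops the time part. :)
let $modified := xs:date(xs:dateTime("2005-06-08T13:46:47"))
(: Subtracting two dates yields an xs:dayTimeDuration, here P8D. :)
let $age := $today - $modified
return days-from-duration($age)
```

The query returns 8, which is below the 40-day threshold, so this record is selected.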

List of titles

Here is an example of a query that returns all the digital collection titles, sorted in alphabetical order. The titles are wrapped in a documents element so that the output XML is better structured:

<documents>
{
   for $title in /digital-collection/identification/title
   order by $title
   return $title
}
</documents>

The output could be this one (namespace declarations have been omitted and the list of titles is truncated):

<?xml version="1.0" encoding="UTF-8"?>
<documents>
  <title>ARKive</title>
  <title>Aberdeen Art Gallery and Museums explorer</title>
  <title>About Medway</title>
  <title>Act of Union Virtual Library</title>
  ...
</documents>

We may also use the raw XML output type in order to process the result and get an HTML result. Here is an example XSLT transformation that does this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
    <xsl:template match="documents">
        <html>
            <head>
                <title>List of titles</title>
            </head>
            <body>
                <h1>List of documents with a title in French</h1>
                <ol>
                    <xsl:apply-templates/>
                </ol>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="title">
        <li><xsl:apply-templates/></li>
    </xsl:template>
</xsl:stylesheet>

The results would be something like: