Diffstat (limited to 'doc/src/xml-processing')
-rw-r--r-- | doc/src/xml-processing/xml-patterns.qdoc | 904 | ||||
-rw-r--r-- | doc/src/xml-processing/xml-processing.qdoc | 631 | ||||
-rw-r--r-- | doc/src/xml-processing/xquery-introduction.qdoc | 1023 |
3 files changed, 2558 insertions, 0 deletions
diff --git a/doc/src/xml-processing/xml-patterns.qdoc b/doc/src/xml-processing/xml-patterns.qdoc new file mode 100644 index 0000000..13191dd --- /dev/null +++ b/doc/src/xml-processing/xml-patterns.qdoc @@ -0,0 +1,904 @@ +/**************************************************************************** +** +** Copyright (C) 2009 Nokia Corporation and/or its subsidiary(-ies). +** Contact: Nokia Corporation (qt-info@nokia.com) +** +** This file is part of the documentation of the Qt Toolkit. +** +** $QT_BEGIN_LICENSE:LGPL$ +** No Commercial Usage +** This file contains pre-release code and may not be distributed. +** You may use this file in accordance with the terms and conditions +** contained in the Technology Preview License Agreement accompanying +** this package. +** +** GNU Lesser General Public License Usage +** Alternatively, this file may be used under the terms of the GNU Lesser +** General Public License version 2.1 as published by the Free Software +** Foundation and appearing in the file LICENSE.LGPL included in the +** packaging of this file. Please review the following information to +** ensure the GNU Lesser General Public License version 2.1 requirements +** will be met: http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html. +** +** In addition, as a special exception, Nokia gives you certain +** additional rights. These rights are described in the Nokia Qt LGPL +** Exception version 1.1, included in the file LGPL_EXCEPTION.txt in this +** package. +** +** If you have questions regarding the use of this file, please contact +** Nokia at qt-info@nokia.com. +** +** +** +** +** +** +** +** +** $QT_END_LICENSE$ +** +****************************************************************************/ + +/*! + \page xmlprocessing.html + \title Using XML Technologies + + \previouspage Working with the DOM Tree + \contentspage XML Processing + + \keyword Patternist + + \brief An overview of Qt's support for using XML technologies in + Qt programs. 
+ + \tableofcontents + + \section1 Introduction + + XQuery is a language for traversing XML documents to select and + aggregate items of interest and to transform them for output as + XML or some other format. XPath is the \e{element selection} part + of XQuery. + + The QtXmlPatterns module supports using + \l{http://www.w3.org/TR/xquery} {XQuery 1.0} and + \l{http://www.w3.org/TR/xpath20} {XPath 2.0} in Qt applications, + for querying XML data \e{and} for querying + \l{QAbstractXmlNodeModel} {non-XML data that can be modeled to + look like XML}. The QtXmlPatterns module is included in the \l{Qt + Full Framework Edition}, and the \l{Open Source Versions of Qt}. + Readers who are not familiar with the XQuery/XPath language can read + \l {A Short Path to XQuery} for a brief introduction. + + \section1 Advantages of using QtXmlPatterns and XQuery + + The XQuery/XPath language simplifies data searching and + transformation tasks by eliminating the need for doing a lot of + C++ or Java procedural programming for each new query task. Here + is an XQuery that constructs a bibliography of the contents of a + library: + + \target qtxmlpatterns_example_query + \quotefile snippets/patternist/introductionExample.xq + + First, the query opens a \c{<bibliography>} element in the + output. The + \l{xquery-introduction.html#using-path-expressions-to-match-select-items} + {embedded path expression} then loads the XML document describing + the contents of the library (\c{library.xml}) and begins the + search. For each \c{<book>} element it finds, where the publisher + was Addison-Wesley and the publication year was after 1991, it + creates a new \c{<book>} element in the output as a child of the + open \c{<bibliography>} element. Each new \c{<book>} element gets + the book's title as its contents and the book's publication year + as an attribute. Finally, the \c{<bibliography>} element is + closed. 
+ + The advantages of using QtXmlPatterns and XQuery in your Qt + programs are summarized as follows: + + \list + + \o \bold{Ease of development}: All the C++ programming required to + perform data query tasks can be replaced by a simple XQuery + like the example above. + + \o \bold{Comprehensive functionality}: The + \l{http://www.w3.org/TR/xquery/#id-expressions} {expression + syntax} and rich set of + \l{http://www.w3.org/TR/xpath-functions} {functions and + operators} provided by XQuery are sufficient for performing any + data searching, selecting, and sorting tasks. + + \o \bold{Conformance to standards}: Conformance to all applicable + XML and XQuery standards ensures that QtXmlPatterns can always + process XML documents generated by other conformant + applications, and that XML documents created with QtXmlPatterns + can be processed by other conformant applications. + + \o \bold{Maximal flexibility}: The QtXmlPatterns module can be used + to query XML data \e{and} non-XML data that can be + \l{QAbstractXmlNodeModel} {modeled to look like XML}. + + \endlist + + \section1 Using the QtXmlPatterns module + + There are two ways QtXmlPatterns can be used to evaluate queries. + You can run the query engine in your Qt application using the + QtXmlPatterns C++ API, or you can run the query engine from the + command line using Qt's \c{xmlpatterns} command line utility. + + \section2 Running the query engine from your Qt application + + If we save the example XQuery shown above in a text file (e.g. + \c{myquery.xq}), we can run it from a Qt application using a + standard QtXmlPatterns code sequence: + + \snippet doc/src/snippets/code/src_xmlpatterns_api_qxmlquery.cpp 3 + + First construct a QFile for the text file containing the XQuery + (\c{myquery.xq}). Then create an instance of QXmlQuery and call + its \l{QXmlQuery::}{setQuery()} function to load and parse the + XQuery file. 
Then create an \l{QXmlSerializer} {XML serializer} to + output the query's result set as unformatted XML. Finally, call + the \l{QXmlQuery::}{evaluateTo()} function to evaluate the query + and serialize the results as XML. + + \note If you compile Qt yourself, the QtXmlPatterns module will + \e{not} be built if exceptions are disabled, or if you compile Qt + with a compiler that doesn't support member templates, e.g., MSVC + 6. + + See the QXmlQuery documentation for more information about the + QtXmlPatterns C++ API. + + \section2 Running the query engine from the command line utility + + \e xmlpatterns is a command line utility for running XQueries. It + expects the name of a file containing the XQuery text. + + \snippet doc/src/snippets/code/doc_src_qtxmlpatterns.qdoc 2 + + The XQuery in \c{myQuery.xq} will be evaluated and its output + written to \c stdout. Pass the \c -help switch to get the list of + input flags and their meanings. + + xmlpatterns can be used in scripting. However, the descriptions + and messages it outputs were not meant to be parsed and may be + changed in future releases of Qt. + + \target QtXDM + \section1 The XQuery Data Model + + XQuery represents data items as \e{atomic values} or \e{nodes}. An + atomic value is a value in the domain of one of the + \l{http://www.w3.org/TR/xmlschema-2/#built-in-datatypes} {built-in + datatypes} defined in \l{http://www.w3.org/TR/xmlschema-2} {Part + 2} of the W3C XML Schema. A node is normally an XML element or + attribute, but when non-XML data is \l{QAbstractXmlNodeModel} + {modeled to look like XML}, a node can also represent a non-XML + data item. + + When you run an XQuery using the C++ API in a Qt application, you + will often want to bind program variables to $variables in the + XQuery. After the query is evaluated, you will want to interpret + the sequence of data items in the result set. 
+ + \section2 Binding program variables to XQuery variables + + When you want to run a parameterized XQuery from your Qt + application, you will need to \l{QXmlQuery::bindVariable()} {bind + variables} in your program to $name variables in your XQuery. + + Suppose you want to parameterize the bibliography XQuery in the + example above. You could define variables for the catalog that + contains the library (\c{$file}), the publisher name + (\c{$publisher}), and the year of publication (\c{$year}): + + \target qtxmlpatterns_example_query2 + \quotefile snippets/patternist/introExample2.xq + + Modify the QtXmlPatterns code to use one of the \l{QXmlQuery::} + {bindVariable()} functions to bind a program variable to each + XQuery $variable: + + \snippet doc/src/snippets/code/src_xmlpatterns_api_qxmlquery.cpp 4 + + Each program variable is passed to QtXmlPatterns as a QVariant of + the type of the C++ variable or constant from which it is + constructed. Note that QtXmlPatterns assumes that the type of the + QVariant in the bindVariable() call is the correct type, so the + $variable it is bound to must be used in the XQuery accordingly. + The following table shows how QVariant types are mapped to XQuery + $variable types: + + \table + + \header + \o QVariant type + \o XQuery $variable type + + \row + \o QVariant::LongLong + \o \c xs:integer + + \row + \o QVariant::Int + \o \c xs:integer + + \row + \o QVariant::UInt + \o \c xs:nonNegativeInteger + + \row + \o QVariant::ULongLong + \o \c xs:unsignedLong + + \row + \o QVariant::String + \o \c xs:string + + \row + \o QVariant::Double + \o \c xs:double + + \row + \o QVariant::Bool + \o \c xs:boolean + + \row + \o QVariant::Double + \o \c xs:decimal + + \row + \o QVariant::ByteArray + \o \c xs:base64Binary + + \row + \o QVariant::StringList + \o \c xs:string* + + \row + \o QVariant::Url + \o \c xs:string + + \row + \o QVariant::Date + \o \c xs:date 
+ + \row + \o QVariant::DateTime + \o \c xs:dateTime + + \row + \o QVariant::Time + \o \c xs:time (see \l{Binding To Time}{Binding To + QVariant::Time} below) + + \row + \o QVariantList + \o (see \l{Binding To QVariantList}{Binding To QVariantList} + below) + + \endtable + + A type not shown in the table is not supported and will cause + undefined XQuery behavior or a $variable binding error, depending + on the context in the XQuery where the variable is used. + + \target Binding To Time + \section3 Binding To QVariant::Time + + Because the instance of QTime used in QVariant::Time does not + include a zone offset, an instance of QVariant::Time should not be + bound to an XQuery variable of type \c xs:time, unless the QTime is + UTC. When binding a non-UTC QTime to an XQuery variable, it should + either be passed as a string, or be converted to a QDateTime with an arbitrary + date and then bound to an XQuery variable of type \c xs:dateTime. + + \target Binding To QVariantList + \section3 Binding To QVariantList + + A QVariantList can be bound to an XQuery $variable. All the + \l{QVariant}s in the list must be of the same atomic type, and the + $variable the variant list is bound to must be of that same atomic + type. If the QVariants in the list are not all of the same atomic + type, the XQuery behavior is undefined. + + \section2 Interpreting XQuery results + + When the results of an XQuery are returned in a sequence of \l + {QXmlResultItems} {result items}, atomic values in the sequence + are treated as instances of QVariant. Suppose that instead of + serializing the results of the XQuery as XML, we process the + results programmatically. 
Modify the standard QtXmlPatterns code + sequence to call the overload of QXmlQuery::evaluateTo() that + populates a sequence of \l {QXmlResultItems} {result items} with + the XQuery results: + + \snippet doc/src/snippets/code/src_xmlpatterns_api_qxmlquery.cpp 5 + + Iterate through the \l {QXmlResultItems} {result items} and test + each QXmlItem to see if it is an atomic value or a node. If it is + an atomic value, convert it to a QVariant with \l {QXmlItem::} + {toAtomicValue()} and switch on its \l {QVariant::type()} {variant + type} to handle all the atomic values your XQuery might return. + The following table shows the QVariant type to expect for each + atomic value type (or QXmlName): + + \table + + \header + \o XQuery result item type + \o QVariant type returned + + \row + \o \c xs:QName + \o QXmlName (see \l{Handling QXmlNames}{Handling QXmlNames} + below) + + \row + \o \c xs:integer + \o QVariant::LongLong + + \row + \o \c xs:string + \o QVariant::String + + \row + \o \c xs:string* + \o QVariant::StringList + + \row + \o \c xs:double + \o QVariant::Double + + \row + \o \c xs:float + \o QVariant::Double + + \row + \o \c xs:boolean + \o QVariant::Bool + + \row + \o \c xs:decimal + \o QVariant::Double + + \row + \o \c xs:hexBinary + \o QVariant::ByteArray + + \row + \o \c xs:base64Binary + \o QVariant::ByteArray + + \row + \o \c xs:gYear + \o QVariant::DateTime + + \row + \o \c xs:gYearMonth + \o QVariant::DateTime + + \row + \o \c xs:gMonthDay + \o QVariant::DateTime + + \row + \o \c xs:gDay + \o QVariant::DateTime + + \row + \o \c xs:gMonth + \o QVariant::DateTime + + \row + \o \c xs:anyURI + \o QVariant::Url + + \row + \o \c xs:untypedAtomic + \o QVariant::String + + \row + \o \c xs:ENTITY + \o QVariant::String + + \row + \o \c xs:date + \o QVariant::DateTime + + \row + \o \c xs:dateTime + \o QVariant::DateTime + + \row + \o \c xs:time + \o (see \l{xstime-not-mapped}{No mapping for xs:time} below) + + \endtable + + \target Handling QXmlNames + \section3 
Handling QXmlNames + + If your XQuery can return atomic value items of type \c{xs:QName}, + they will appear in your QXmlResultItems as instances of QXmlName. + Since the QVariant class does not support the QXmlName class + directly, extracting them from QXmlResultItems requires a bit of + sleight-of-hand using the \l{QMetaType} {Qt metatype system}. We + must modify our example to use a couple of template functions, a + friend of QMetaType (qMetaTypeId<T>()) and a friend of QVariant + (qVariantValue<T>()): + + \snippet doc/src/snippets/code/src_xmlpatterns_api_qxmlquery.cpp 6 + + To access the strings in a QXmlName returned by an + \l{QXmlQuery::evaluateTo()} {XQuery evaluation}, the QXmlName must + be accessed with the \l{QXmlNamePool} {name pool} from the + instance of QXmlQuery that was used for the evaluation. + + \target xstime-not-mapped + \section3 No mapping for xs:time + + An instance of \c xs:time can't be represented correctly as an + instance of QVariant::Time, unless the \c xs:time is a UTC time. + This is because \c xs:time has a zone offset (0 for UTC) in addition + to the time value, which the QTime in QVariant::Time does not + have. This means that if an XQuery tries to return an atomic value + of type \c xs:time, an invalid QVariant will be returned. A query + can work around this by converting the \c xs:time either + to an \c xs:dateTime with an arbitrary date, or to an \c xs:string. + + \section1 Using XQuery with Non-XML Data + + Although the XQuery language was designed for querying XML, with + QtXmlPatterns one can use XQuery for querying any data that can + be modeled to look like XML. Non-XML data is modeled to look like + XML by loading it into a custom subclass of QAbstractXmlNodeModel, + where it is then presented to the QtXmlPatterns XQuery engine via + the same API the XQuery engine uses for querying XML. 
+ + When QtXmlPatterns loads and queries XML files and produces XML + output, it can always load the XML data into its default XML node + model, where it can be traversed efficiently. The XQuery below + traverses the product orders found in the XML file \e myOrders.xml + to find all the skin care product orders and output them ordered + by shipping date. + + \quotefile snippets/patternist/introAcneRemover.xq + + QtXmlPatterns can be used out of the box to perform this + query, provided \e myOrders.xml actually contains well-formed XML. It + can be loaded directly into the default XML node model and + traversed. But suppose we want QtXmlPatterns to perform queries on + the hierarchical structure of the local file system. The default + XML node model in QtXmlPatterns is not suitable for navigating the + file system, because there is no XML file to load that contains a + description of it. Such an XML file, if it existed, might look + something like this: + + \quotefile snippets/patternist/introFileHierarchy.xml + + The \l{File System Example}{File System Example} does exactly this. + + There is no such file to load into the default XML node model, but + one can write a subclass of QAbstractXmlNodeModel to represent the + file system. This custom XML node model, once populated with all + the directory and file descriptors obtained directly from the + system, presents the complete file system hierarchy to the query + engine via the same API used by the default XML node model to + present the contents of an XML file. In other words, once the + custom XML node model is populated, it presents the file system to + the query engine as if a description of it had been loaded into + the default XML node model from an XML file like the one shown + above. + + Now we can write an XQuery to find all the XML files and parse + them to find the ones that don't contain well-formed XML. 
+ + \quotefromfile snippets/patternist/introNavigateFS.xq + \skipto <html> + \printuntil + + Without QtXmlPatterns, there is no simple way to solve this kind + of problem. You might do it by writing a C++ program to traverse + the file system, sniff out all the XML files, and submit each one + to an XML parser to test that it contains valid XML. The C++ code + required to write that program will probably be more complex than + the C++ code required to subclass QAbstractXmlNodeModel, but even + if the two are comparable, your custom C++ program can be used + only for that one task, while your custom XML node model can be + used by any XQuery that must navigate the file system. + + The general approach to using XQuery to perform queries on non-XML + data has been a three step process. In the first step, the data is + loaded into a non-XML data model. In the second step, the non-XML + data model is serialized as XML and output to XML (text) files. In + the final step, an XML tool loads the XML files into a second, XML + data model, where the XQueries can be performed. The development + cost of implementing this process is often high, and the three + step system that results is inefficient because the two data + models must be built and maintained separately. + + With QtXmlPatterns, subclassing QAbstractXmlNodeModel eliminates + the transformation required to convert the non-XML data model to + the XML data model, because there is only ever one data model + required. The non-XML data model presents the non-XML data to the + query engine via the XML data model API. Also, since the query + engine uses the API to access the QAbstractXmlNodeModel, the data + model subclass can construct the elements, attributes and other + data on demand, responding to the query's specific requests. This + can greatly improve efficiency, because it means the entire model + might not have to be built. 
For example, in the file system model + above, it is not necessary to build an in-memory representation of + the whole file system. Instead, nodes are + created on demand, and these are likely to be only a small subset of the file + system. + + Examples of other places where XQuery could be used in + QtXmlPatterns to query non-XML data: + + \list + + \o The internal representation for word processor documents + + \o The set of dependencies for a software build system + + \o The hierarchy (or graph) that links a set of HTML documents + from a web crawler + + \o The images and meta-data in an image collection + + \o The set of D-Bus interfaces available in a system + + \o A QObject hierarchy, as seen in the \l{QObject XML Model + Example} {QObject XML Model example}. + + \endlist + + See the QAbstractXmlNodeModel documentation for information about + how to implement custom XML node models. + + \section1 More on using QtXmlPatterns with non-XML Data + + Subclassing QAbstractXmlNodeModel to let the query engine access + non-XML data by the same API it uses for XML is the feature that + enables QtXmlPatterns to query non-XML data with XQuery. It allows + XQuery to be used as a mapping layer between different non-XML + node models or between a non-XML node model and the built-in XML + node model. Once the subclass(es) of QAbstractXmlNodeModel have + been written, XQuery can be used to select a set of elements from + one node model, transform the selected elements, and then write + them out, either as XML using QXmlQuery::evaluateTo() and QXmlSerializer, + or as some other format using a subclass of QAbstractXmlReceiver. + + Consider a word processor application that must import and export + data in several different formats. 
Rather than writing a lot of + C++ code to convert each input format to an intermediate form, and + more C++ code to convert the intermediate form back to each + output format, one can implement a solution based on QtXmlPatterns + that uses simple XQueries to transform each XML or non-XML format + (e.g. MathFormula.xml below) to the intermediate form (e.g. the + DocumentRepresentation node model class below), and more simple + XQueries to transform the intermediate form back to each XML or + non-XML format. + + \image patternist-wordProcessor.png + + Because CSV files are not XML, a subclass of QAbstractXmlNodeModel + is used to present the CSV data to the XQuery engine as if it were + XML. What are not shown are the subclasses of QAbstractXmlReceiver + that would then send the selected elements into the + DocumentRepresentation node model, and the subclasses of + QAbstractXmlNodeModel that would ultimately write the output files + in each format. + + \section1 Security Considerations + + \section2 Code Injection + + XQuery is vulnerable to + \l{http://en.wikipedia.org/wiki/Code_injection} {code injection + attacks} in the same way as the SQL language. If an XQuery is + constructed by concatenating strings, and the strings come from + user input, the constructed XQuery could be malevolent. The best + way to prevent code injection attacks is to not construct XQueries + from user-written strings, but only accept user data input using + QVariant and variable bindings. See QXmlQuery::bindVariable(). + + The articles + \l{http://www.ibm.com/developerworks/xml/library/x-xpathinjection.html} + {Avoid the dangers of XPath injection}, by Robi Sen and + \l{http://www.packetstormsecurity.org/papers/bypass/Blind_XPath_Injection_20040518.pdf} + {Blind XPath Injection}, by Amit Klein, discuss the XQuery code + injection problem in more detail. 
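The difference between concatenating user input into a query and binding it as a variable can be sketched without any Qt code. The two helper functions below are hypothetical and exist only for illustration; the element names follow the bibliography example, and the second query assumes the value is supplied separately (in QtXmlPatterns, through QXmlQuery::bindVariable()).

```cpp
#include <string>

// Hypothetical helper: pastes user input directly into the query text.
// Whatever the user types becomes part of the XQuery source itself.
std::string buildQueryByConcatenation(const std::string &publisher)
{
    return "doc('library.xml')//book[publisher = '" + publisher + "']/title";
}

// Hypothetical helper: the query text is fixed. The user's value is never
// part of the source; it is supplied later as the value of $publisher.
std::string buildQueryWithExternalVariable()
{
    return "declare variable $publisher external;\n"
           "doc('library.xml')//book[publisher = $publisher]/title";
}
```

With the input \c{x' or '1' = '1}, the concatenated query ends in \c{[publisher = 'x' or '1' = '1']}, a predicate that is true for every book. The second query text does not change with the input, so the same string can only ever be compared as data.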
+ + \section2 Denial of Service Attacks + + Applications using QtXmlPatterns are subject to the same + resource limitations as other software. Generally, these cannot + be checked. This means QtXmlPatterns does not prevent rogue + queries from consuming too many resources. For example, a query + could take too much time to execute or try to transfer too much + data. A query could also do too much recursion, which could crash + the system. XQueries can do these things accidentally, but they + can also be done as deliberate denial of service attacks. + + \section1 Features and Conformance + + \section2 XQuery 1.0 + + QtXmlPatterns aims at being a + \l{http://www.w3.org/TR/xquery/#id-xquery-conformance} {conformant + XQuery processor}. It adheres to + \l{http://www.w3.org/TR/xquery/#id-minimal-conformance} {Minimal + Conformance} and supports the + \l{http://www.w3.org/TR/xquery/#id-serialization-feature} + {Serialization Feature} and the + \l{http://www.w3.org/TR/xquery/#id-full-axis-feature} {Full Axis + Feature}. QtXmlPatterns currently passes 97% of the tests in the + \l{http://www.w3.org/XML/Query/test-suite} {XML Query Test Suite}. + Areas where conformance may be questionable and where behavior may + be changed in future releases include: + + \list + + \o Some corner cases involving namespaces and element constructors + are incorrect. + + \o XPath is a subset of XQuery and the implementation of + QtXmlPatterns uses XPath 2.0 with XQuery 1.0. + + \endlist + + The specification discusses conformance further: + \l{http://www.w3.org/TR/xquery/}{XQuery 1.0: An XML Query + Language}. W3C's XQuery testing effort may be of interest as + well: \l{http://www.w3.org/XML/Query/test-suite/}{XML Query Test + Suite}. + + Currently \c fn:collection() does not access any data set, and + there is no API for providing data through the collection. As a + result, evaluating \c fn:collection() returns the empty + sequence. 
We intend to provide functionality for this in a future + release of Qt. + + Only queries encoded in UTF-8 are supported. + + \section2 XSLT 2.0 + + Partial support for XSLT was introduced in Qt 4.5. Future + releases of QtXmlPatterns will aim to support these XSLT + features: + + \list + \o Basic XSLT 2.0 processor + \o Serialization feature + \o Backwards Compatibility feature + \endlist + + For details, see \l{http://www.w3.org/TR/xslt20/#conformance}{XSL + Transformations (XSLT) Version 2.0, 21 Conformance}. + + \note In this release, XSLT support is considered experimental. + + Unsupported or partially supported XSLT features are documented + in the following table. The implementation of XSLT in Qt 4.5 can + be seen as XSLT 1.0 but with the data model of XPath 2.0 and + XSLT 2.0, and using the functionality of XPath 2.0 and + its accompanying function library. When QtXmlPatterns encounters + an unsupported or partially supported feature, it will either report + a syntax error or silently continue, unless otherwise noted in the + table. + + The implementation currently passes 42% of W3C's XSLT test suite, + which focuses on features introduced in XSLT 2.0. + + \table + \header + \o XSL Feature + \o Support Status + \row + \o \c xsl:key and \c fn:key() + \o not supported + \row + \o \c xsl:include + \o not supported + \row + \o \c xsl:import + \o not supported + \row + \o \c xsl:copy + + \o The \c copy-namespaces and \c inherit-namespaces attributes + have no effect. For copied comments, attributes and + processing instructions, the copy has the same node + identity as the original. + + \row + \o \c xsl:copy-of + \o The \c copy-namespaces attribute has no effect. 
+ \row + \o \c fn:format-number() + \o not supported + \row + \o \c xsl:message + \o not supported + \row + \o \c xsl:use-when + \o not supported + \row + \o Tunnel Parameters + \o not supported + \row + \o \c xsl:attribute-set + \o not supported + \row + \o \c xsl:decimal-format + \o not supported + \row + \o \c xsl:fallback + \o not supported + \row + \o \c xsl:apply-imports + \o not supported + \row + \o \c xsl:character-map + \o not supported + \row + \o \c xsl:number + \o not supported + \row + \o \c xsl:namespace-alias + \o not supported + \row + \o \c xsl:output + \o not supported + \row + \o \c xsl:output-character + \o not supported + \row + \o \c xsl:preserve-space + \o not supported + \row + \o \c xsl:result-document + \o not supported + \row + \o Patterns + \o Complex patterns or patterns with predicates have issues. + \row + \o 2.0 Compatibility Mode + + \o Stylesheets are interpreted as XSLT 2.0 stylesheets, even + if the \c version attribute in the XSLT source is + 1.0. In other words, the version attribute is ignored. + + \row + \o Grouping + + \o \c fn:current-group(), \c fn:grouping-key() and \c + xsl:for-each-group. + + \row + \o Regexp elements + \o \c xsl:analyze-string, \c xsl:matching-substring, + \c xsl:non-matching-substring, and \c fn:regex-group() + \row + \o Date & Time formatting + \o \c fn:format-dateTime(), \c fn:format-date() and \c fn:format-time(). + + \row + \o XPath Conformance + \o Since XPath is a subset of XSLT, its issues are in effect too. + \endtable + + The QtXmlPatterns implementation of the XPath Data Model does not + include entities (due to QXmlStreamReader not reporting them). + This means that functions \c unparsed-entity-uri() and \c + unparsed-entity-public-id() always return negatively. + + \section2 XPath 2.0 + + Since XPath 2.0 is a subset of XQuery 1.0, XPath 2.0 is + supported. 
Areas where conformance may be questionable and, + consequently, where behavior may be changed in future releases + include: + + \list + \o Regular expression support is currently not conformant + but follows Qt's QRegExp standard syntax. + + \o Operators for \c xs:time, \c xs:date, and \c xs:dateTime + are incomplete. + + \o Formatting of very large or very small \c xs:double, \c + xs:float, and \c xs:decimal values may be incorrect. + \endlist + + \section2 xml:id + + Processing of XML files supports \c xml:id. This allows elements + that have an attribute named \c xml:id to be looked up efficiently + with the \c fn:id() function. See + \l{http://www.w3.org/TR/xml-id/}{xml:id Version 1.0} for details. + + \section2 XML Schema 1.0 + + There are two ways QtXmlPatterns can be used to validate schemas: + You can use the C++ API in your Qt application using the classes + QXmlSchema and QXmlSchemaValidator, or you can use the command line + utility named xmlpatternsvalidator (located in the "bin" directory + of your Qt build). + + The QtXmlPatterns implementation of XML Schema validation supports + most of version 1.0 of the schema specification. Known problems + of the implementation and areas where conformance may be questionable + are: + + \list + \o Large \c minOccurs or \c maxOccurs values or deeply nested ones + require a huge amount of memory, which might cause the system to freeze. + Such a schema should be rewritten to use \c unbounded as the value instead + of large numbers. This restriction will hopefully be fixed in a later release. + \o Comparison of very small or very large floating point values might lead to + wrong results in some cases. However, such numbers should not be relevant + for day-to-day usage. + \o Regular expression support is currently not conformant but follows + Qt's QRegExp standard syntax. + \o Identity constraint checks cannot use the values of default or fixed + attribute definitions. 
+ \endlist + + \section2 Resource Loading + + When QtXmlPatterns loads an XML resource, e.g., using the + \c fn:doc() function, the following schemes are supported: + + \table + \header + \o Scheme Name + \o Description + \row + \o \c file + \o Local files. + \row + \o \c data + + \o The bytes are encoded in the URI itself, e.g., \c + data:application/xml,%3Ce%2F%3E is \c <e/>. + + \row + \o \c ftp + \o Resources retrieved via FTP. + \row + \o \c http + \o Resources retrieved via HTTP. + \row + \o \c https + \o Resources retrieved via HTTPS. This will succeed if no SSL + errors are encountered. + \row + \o \c qrc + \o Qt Resource files. Expressing it as an empty scheme, :/..., + is not supported. + + \endtable + + \section2 XML + + XML 1.0 and XML Namespaces 1.0 are supported, as opposed to the + 1.1 versions. When a string is passed to a query as a QString, + the characters must be XML 1.0 characters. Otherwise, the behavior + is undefined. This is not checked. + + URIs are first passed to QAbstractUriResolver. Check + QXmlQuery::setUriResolver() for possible rewrites. +*/ + +/*! + \namespace QPatternist + \brief The QPatternist namespace contains classes and functions required by the QtXmlPatterns module. + \internal +*/ diff --git a/doc/src/xml-processing/xml-processing.qdoc b/doc/src/xml-processing/xml-processing.qdoc new file mode 100644 index 0000000..f675356 --- /dev/null +++ b/doc/src/xml-processing/xml-processing.qdoc @@ -0,0 +1,631 @@ +/**************************************************************************** +** +** Copyright (C) 2009 Nokia Corporation and/or its subsidiary(-ies). +** Contact: Nokia Corporation (qt-info@nokia.com) +** +** This file is part of the documentation of the Qt Toolkit. +** +** $QT_BEGIN_LICENSE:LGPL$ +** No Commercial Usage +** This file contains pre-release code and may not be distributed. 
+** You may use this file in accordance with the terms and conditions +** contained in the Technology Preview License Agreement accompanying +** this package. +** +** GNU Lesser General Public License Usage +** Alternatively, this file may be used under the terms of the GNU Lesser +** General Public License version 2.1 as published by the Free Software +** Foundation and appearing in the file LICENSE.LGPL included in the +** packaging of this file. Please review the following information to +** ensure the GNU Lesser General Public License version 2.1 requirements +** will be met: http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html. +** +** In addition, as a special exception, Nokia gives you certain +** additional rights. These rights are described in the Nokia Qt LGPL +** Exception version 1.1, included in the file LGPL_EXCEPTION.txt in this +** package. +** +** If you have questions regarding the use of this file, please contact +** Nokia at qt-info@nokia.com. +** +** +** +** +** +** +** +** +** $QT_END_LICENSE$ +** +****************************************************************************/ + +/*! + \group xml-tools + \title XML Classes + + \brief Classes that support XML, via, for example DOM and SAX. + + These classes are relevant to XML users. + + \generatelist{related} +*/ + +/*! + \page xml-processing.html + \title XML Processing + \brief An Overview of the XML processing facilities in Qt. + + In addition to core XML support, classes for higher level querying + and manipulation of XML data are provided by the QtXmlPatterns + module. In the QtSvg module, the QSvgRenderer and QSvgGenerator + classes can read and write a subset of SVG, an XML-based file + format. Qt also provides helper functions that may be useful to + those working with XML and XHTML: see Qt::escape() and + Qt::convertFromPlainText(). 
+ + \section1 Topics: + + \list + \o \l {Classes for XML Processing} + \o \l {An Introduction to Namespaces} + \o \l {XML Streaming} + \o \l {The SAX Interface} + \o \l {Working with the DOM Tree} + \o \l {Using XML Technologies}{XQuery/XPath and XML Schema} + \list + \o \l{A Short Path to XQuery} + \endlist + \endlist + + \section1 Classes for XML Processing + + These classes are relevant to XML users. + + \annotatedlist xml-tools +*/ + +/*! + \page xml-namespaces.html + \title An Introduction to Namespaces + \target namespaces + + \contentspage XML Processing + \nextpage XML Streaming + + Parts of the Qt XML module documentation assume that you are familiar + with XML namespaces. Here we present a brief introduction; skip to + \link #namespacesConventions Qt XML documentation conventions \endlink + if you already know this material. + + Namespaces are a concept introduced into XML to allow a more modular + design. With their help data processing software can easily resolve + naming conflicts in XML documents. + + Consider the following example: + + \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 6 + + Here we find three different uses of the name \e title. If you wish to + process this document you will encounter problems because each of the + \e titles should be displayed in a different manner -- even though + they have the same name. + + The solution would be to have some means of identifying the first + occurrence of \e title as the title of a book, i.e. to use the \e + title element of a book namespace to distinguish it from, for example, + the chapter title, e.g.: + \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 7 + + \e book in this case is a \e prefix denoting the namespace. + + Before we can apply a namespace to element or attribute names we must + declare it. + + Namespaces are URIs like \e http://www.example.com/fnord/book/. This + does not mean that data must be available at this address; the URI is + simply used to provide a unique name. 
+ + We declare namespaces in the same way as attributes; strictly speaking + they \e are attributes. To make for example \e + http://www.example.com/fnord/ the document's default XML namespace \e + xmlns we write + + \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 8 + + To distinguish the \e http://www.example.com/fnord/book/ namespace from + the default, we must supply it with a prefix: + + \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 9 + + A namespace that is declared like this can be applied to element and + attribute names by prepending the appropriate prefix and a ":" + delimiter. We have already seen this with the \e book:title element. + + Element names without a prefix belong to the default namespace. This + rule does not apply to attributes: an attribute without a prefix does + not belong to any of the declared XML namespaces at all. Attributes + always belong to the "traditional" namespace of the element in which + they appear. A "traditional" namespace is not an XML namespace, it + simply means that all attribute names belonging to one element must be + different. Later we will see how to assign an XML namespace to an + attribute. + + Due to the fact that attributes without prefixes are not in any XML + namespace there is no collision between the attribute \e title (that + belongs to the \e author element) and for example the \e title element + within a \e chapter. + + Let's clarify this with an example: + \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 10 + + Within the \e document element we have two namespaces declared. The + default namespace \e http://www.example.com/fnord/ applies to the \e + book element, the \e chapter element, the appropriate \e title element + and of course to \e document itself. + + The \e book:author and \e book:title elements belong to the namespace + with the URI \e http://www.example.com/fnord/book/. + + The two \e book:author attributes \e title and \e name have no XML + namespace assigned. 
They are only members of the "traditional" + namespace of the element \e book:author, meaning that for example two + \e title attributes in \e book:author are forbidden. + + In the above example we circumvent the last rule by adding a \e title + attribute from the \e http://www.example.com/fnord/ namespace to \e + book:author: the \e fnord:title comes from the namespace with the + prefix \e fnord that is declared in the \e book:author element. + + Clearly the \e fnord namespace has the same namespace URI as the + default namespace. So why didn't we simply use the default namespace + we'd already declared? The answer is quite complex: + \list + \o attributes without a prefix don't belong to any XML namespace at + all, not even to the default namespace; + \o additionally, omitting the prefix would lead to a \e title-title clash; + \o writing it as \e xmlns:title would declare a new namespace with the + prefix \e title instead of applying the default \e xmlns namespace. + \endlist + + With the Qt XML classes elements and attributes can be accessed in two + ways: either by referring to their qualified names consisting of the + namespace prefix and the "real" name (or \e local name) or by the + combination of local name and namespace URI. + + More information on XML namespaces can be found at + \l http://www.w3.org/TR/REC-xml-names/. + + \target namespacesConventions + \section1 Conventions Used in the Qt XML Documentation + + The following terms are used to distinguish the parts of names within + the context of namespaces: + \list + \o The \e {qualified name} + is the name as it appears in the document. (In the above example \e + book:title is a qualified name.) + \o A \e {namespace prefix} in a qualified name + is the part to the left of the ":". (\e book is the namespace prefix in + \e book:title.) + \o The \e {local part} of a name (also referred to as the \e {local + name}) appears to the right of the ":". (Thus \e title is the + local part of \e book:title.)
+ \o The \e {namespace URI} ("Uniform Resource Identifier") is a unique + identifier for a namespace. It looks like a URL + (e.g. \e http://www.example.com/fnord/ ) but does not require + data to be accessible by the given protocol at the named address. + \endlist + + Elements without a ":" (like \e chapter in the example) do not have a + namespace prefix. In this case the local part and the qualified name + are identical (i.e. \e chapter). + + \sa {DOM Bookmarks Example}, {SAX Bookmarks Example} +*/ + +/*! + \page xml-streaming.html + \title XML Streaming + + \previouspage An Introduction to Namespaces + \contentspage XML Processing + \nextpage The SAX Interface + + Since version 4.3, Qt provides two new classes for reading and + writing XML: QXmlStreamReader and QXmlStreamWriter. + + A stream reader reports an XML document as a stream + of tokens. This differs from SAX as SAX applications provide handlers to + receive XML events from the parser whereas the QXmlStreamReader drives the + loop, pulling tokens from the reader when they are needed. + This pulling approach makes it possible to build recursive descent parsers, + allowing XML parsing code to be split into different methods or classes. + + QXmlStreamReader is a well-formed XML 1.0 parser that excludes external + parsed entities. Hence, data provided by the stream reader adheres to the + W3C's criteria for well-formed XML, as long as no error occurs. Otherwise, + functions such as \l{QXmlStreamReader::atEnd()}{atEnd()}, + \l{QXmlStreamReader::error()}{error()} and \l{QXmlStreamReader::hasError()} + {hasError()} can be used to check and view the errors. + + An example of a QXmlStreamReader implementation is the \c XbelReader in + \l{QXmlStream Bookmarks Example}, which is a subclass of QXmlStreamReader.
+ The constructor takes \a treeWidget as a parameter and the class has Xbel-specific + functions: + + \snippet examples/xml/streambookmarks/xbelreader.h 1 + + \dots + \snippet examples/xml/streambookmarks/xbelreader.h 2 + \dots + + The \c read() function accepts a QIODevice and sets it with + \l{QXmlStreamReader::setDevice()}{setDevice()}. The + \l{QXmlStreamReader::raiseError()}{raiseError()} function is used to + display a custom error message, indicating that the file's version + is incorrect. + + \snippet examples/xml/streambookmarks/xbelreader.cpp 1 + + The counterpart to QXmlStreamReader is QXmlStreamWriter, which provides an XML + writer with a simple streaming API. QXmlStreamWriter operates on a + QIODevice and has specialised functions for all XML tokens or events you + want to write, such as \l{QXmlStreamWriter::writeDTD()}{writeDTD()}, + \l{QXmlStreamWriter::writeCharacters()}{writeCharacters()}, + \l{QXmlStreamWriter::writeComment()}{writeComment()} and so on. + + To write an XML document with QXmlStreamWriter, you start a document with the + \l{QXmlStreamWriter::writeStartDocument()}{writeStartDocument()} function + and end it with \l{QXmlStreamWriter::writeEndDocument()} + {writeEndDocument()}, which implicitly closes all remaining open tags. + Element tags are opened with \l{QXmlStreamWriter::writeStartElement()} + {writeStartElement()} and followed by + \l{QXmlStreamWriter::writeAttribute()}{writeAttribute()} or + \l{QXmlStreamWriter::writeAttributes()}{writeAttributes()}, + element content, and then \l{QXmlStreamWriter::writeEndElement()} + {writeEndElement()}. Also, \l{QXmlStreamWriter::writeEmptyElement()} + {writeEmptyElement()} can be used to write empty elements. + + Element content comprises characters, entity references or nested elements.
+ Content can be written with \l{QXmlStreamWriter::writeCharacters()} + {writeCharacters()}, a function that also takes care of escaping all + forbidden characters and character sequences, + \l{QXmlStreamWriter::writeEntityReference()}{writeEntityReference()}, + or subsequent calls to \l{QXmlStreamWriter::writeStartElement()} + {writeStartElement()}. + + The \c XbelWriter class from \l{QXmlStream Bookmarks Example} is a subclass + of QXmlStreamWriter. Its \c writeFile() function illustrates the core + functions of QXmlStreamWriter mentioned above: + + \snippet examples/xml/streambookmarks/xbelwriter.cpp 1 +*/ + +/*! + \page xml-sax.html + \title The SAX Interface + + \previouspage XML Streaming + \contentspage XML Processing + \nextpage Working with the DOM Tree + + SAX is an event-based standard interface for XML parsers. + The Qt interface follows the design of the SAX2 Java implementation. + Its naming scheme was adapted to fit the Qt naming conventions. + Details on SAX2 can be found at \l{http://www.saxproject.org}. + + Support for SAX2 filters and the reader factory are under + development. The Qt implementation does not include the SAX1 + compatibility classes present in the Java interface. + + \section1 Introduction to SAX2 + + The SAX2 interface is an event-driven mechanism to provide the user with + document information. An "event" in this context means something + reported by the parser, for example, it has encountered a start tag, + or an end tag, etc. + + To make it less abstract consider the following example: + \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 3 + + Whilst reading (a SAX2 parser is usually referred to as "reader") + the above document three events would be triggered: + \list 1 + \o A start tag occurs (\c{<quote>}). + \o Character data (i.e. text) is found, "A quotation.". + \o An end tag is parsed (\c{</quote>}).
+ \endlist + + Each time such an event occurs the parser reports it; you can set up + event handlers to respond to these events. + + Whilst this is a fast and simple approach to read XML documents, + manipulation is difficult because data is not stored, simply handled + and discarded serially. The \l{Working with the DOM Tree}{DOM interface} + reads in and stores the whole document in a tree structure; + this takes more memory, but makes it easier to manipulate the + document's structure. + + The Qt XML module provides an abstract class, \l QXmlReader, that + defines the interface for potential SAX2 readers. Qt includes a reader + implementation, \l QXmlSimpleReader, that is easy to adapt through + subclassing. + + The reader reports parsing events through special handler classes: + \table + \header \o Handler class \o Description + \row \o \l QXmlContentHandler + \o Reports events related to the content of a document (e.g. the start tag + or characters). + \row \o \l QXmlDTDHandler + \o Reports events related to the DTD (e.g. notation declarations). + \row \o \l QXmlErrorHandler + \o Reports errors or warnings that occurred during parsing. + \row \o \l QXmlEntityResolver + \o Reports external entities during parsing and allows users to resolve + external entities themselves instead of leaving it to the reader. + \row \o \l QXmlDeclHandler + \o Reports further DTD related events (e.g. attribute declarations). + \row \o \l QXmlLexicalHandler + \o Reports events related to the lexical structure of the + document (the beginning of the DTD, comments etc.). + \endtable + + These classes are abstract classes describing the interface. The \l + QXmlDefaultHandler class provides a "do nothing" default + implementation for all of them. Therefore users only need to reimplement + the QXmlDefaultHandler functions they are interested in. + + To read input XML data a special class \l QXmlInputSource is used.
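The "do nothing" default handler pattern described above can be sketched in plain C++. This is only an illustration of the idea, not Qt code: the names \c DefaultHandler, \c TextCollector and \c replayQuoteEvents are hypothetical, and the three events are replayed by hand in the order given for the quotation example instead of coming from a real reader.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a SAX-style handler; NOT a Qt class.
// Every callback has a do-nothing default that returns true ("keep parsing").
class DefaultHandler
{
public:
    virtual ~DefaultHandler() = default;
    virtual bool startElement(const std::string & /*name*/) { return true; }
    virtual bool characters(const std::string & /*text*/) { return true; }
    virtual bool endElement(const std::string & /*name*/) { return true; }
};

// Reimplements only the callback it is interested in; the start and end
// tag events fall through to the do-nothing defaults.
class TextCollector : public DefaultHandler
{
public:
    bool characters(const std::string &text) override
    {
        collected.push_back(text);
        return true;
    }

    std::vector<std::string> collected;
};

// Hypothetical driver: replays the three events listed for the quotation
// example (start tag, character data, end tag), as a reader would report them.
void replayQuoteEvents(DefaultHandler &handler)
{
    handler.startElement("quote");
    handler.characters("A quotation.");
    handler.endElement("quote");
}
```

Passing a \c TextCollector to \c replayQuoteEvents() leaves a single entry, the quotation text, in \c collected. With the Qt classes, the corresponding subclass would derive from QXmlDefaultHandler and be registered with the reader.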
+ + Apart from those already mentioned, the following SAX2 support classes + provide additional useful functionality: + \table + \header \o Class \o Description + \row \o \l QXmlAttributes + \o Used to pass attributes in a start element event. + \row \o \l QXmlLocator + \o Used to obtain the actual parsing position of an event. + \row \o \l QXmlNamespaceSupport + \o Used to implement namespace support for a reader. Note that + namespaces do not change the parsing behavior. They are only + reported through the handler. + \endtable + + The \l{SAX Bookmarks example} illustrates how to subclass + QXmlDefaultHandler to read an XML bookmark file (XBEL) and + how to generate XML by hand. + + \section1 SAX2 Features + + The behavior of an XML reader depends on its support for certain + optional features. For example, a reader may have the feature "report + attributes used for namespace declarations and prefixes along with + the local name of a tag". Like every other feature this has a unique + name represented by a URI: it is called + \e http://xml.org/sax/features/namespace-prefixes. + + The Qt SAX2 implementation can report whether the reader has + particular functionality using the QXmlReader::hasFeature() + function. Available features can be tested with QXmlReader::feature(), + and switched on or off using QXmlReader::setFeature(). + + Consider the example + \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 4 + A reader that does not support the \e + http://xml.org/sax/features/namespace-prefixes feature would report + the element name \e document but not its attributes \e xmlns:book and + \e xmlns with their values. A reader with the feature \e + http://xml.org/sax/features/namespace-prefixes reports the namespace + attributes if the \link QXmlReader::feature() feature\endlink is + switched on. 
+ + Other features include \e http://xml.org/sax/features/namespaces + (namespace processing, implies \e + http://xml.org/sax/features/namespace-prefixes) and \e + http://xml.org/sax/features/validation (the ability to report + validation errors). + + Whilst SAX2 leaves it to the user to define and implement whatever + features are required, support for \e + http://xml.org/sax/features/namespaces (and thus \e + http://xml.org/sax/features/namespace-prefixes) is mandatory. + The \l QXmlSimpleReader implementation of \l QXmlReader + supports them, and can do namespace processing. + + \l QXmlSimpleReader is not validating, so it + does not support \e http://xml.org/sax/features/validation. + + \section1 Namespace Support via Features + + As we have seen in the previous section, we can configure the + behavior of the reader when it comes to namespace + processing. This is done by setting and unsetting the + \e http://xml.org/sax/features/namespaces and + \e http://xml.org/sax/features/namespace-prefixes features. + + They influence the reporting behavior in the following way: + \list 1 + \o Namespace prefixes and local parts of elements and attributes can + be reported. + \o The qualified names of elements and attributes are reported. + \o \l QXmlContentHandler::startPrefixMapping() and \l + QXmlContentHandler::endPrefixMapping() are called by the reader. + \o Attributes that declare namespaces (i.e. the attribute \e xmlns and + attributes starting with \e{xmlns:}) are reported. + \endlist + + Consider the following element: + \snippet doc/src/snippets/code/doc_src_qtxml.qdoc 5 + With \e http://xml.org/sax/features/namespace-prefixes set to true + the reader will report four attributes; but with the \e + namespace-prefixes feature set to false only three, with the \e + xmlns:fnord attribute defining a namespace being "invisible" to the + reader.
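The effect of the \e namespace-prefixes feature on attribute reporting can be modelled with a short filter. This is a sketch in plain C++, not the QXmlSimpleReader implementation; \c isNamespaceDeclaration and \c reportedAttributes are hypothetical helper names, and the attribute names used with it are assumed, since the snippet's element is not reproduced here.

```cpp
#include <string>
#include <vector>

// True for attributes that declare namespaces: the attribute "xmlns"
// itself and any attribute whose name starts with "xmlns:".
bool isNamespaceDeclaration(const std::string &qualifiedName)
{
    return qualifiedName == "xmlns"
        || qualifiedName.compare(0, 6, "xmlns:") == 0;
}

// Hypothetical helper, not part of Qt: returns the attribute names that
// would be reported for the given namespace-prefixes setting.
std::vector<std::string> reportedAttributes(const std::vector<std::string> &names,
                                            bool namespacePrefixes)
{
    std::vector<std::string> reported;
    for (const std::string &name : names) {
        // Namespace declarations are only visible when the feature is on.
        if (namespacePrefixes || !isNamespaceDeclaration(name))
            reported.push_back(name);
    }
    return reported;
}
```

For an element carrying the (assumed) attributes \c xmlns:fnord, \c title, \c fnord:title and \c name, the helper returns all four names when \c namespacePrefixes is true and only the last three when it is false, which matches the counts given above.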
+ + The \e http://xml.org/sax/features/namespaces feature is responsible + for reporting local names, namespace prefixes and URIs. With \e + http://xml.org/sax/features/namespaces set to true the parser will + report \e title as the local name of the \e fnord:title attribute, \e + fnord being the namespace prefix and \e http://example.com/fnord/ as + the namespace URI. When \e http://xml.org/sax/features/namespaces is + false none of them are reported. + + In the current implementation the Qt XML classes follow the definition + that the prefix \e xmlns itself isn't associated with any namespace at all + (see \link http://www.w3.org/TR/1999/REC-xml-names-19990114/#ns-using + http://www.w3.org/TR/1999/REC-xml-names-19990114/#ns-using \endlink). + Therefore even with \e http://xml.org/sax/features/namespaces and + \e http://xml.org/sax/features/namespace-prefixes both set to true + the reader won't return either a local name, a namespace prefix or + a namespace URI for \e xmlns:fnord. + + This might be changed in the future following the W3C suggestion + \link http://www.w3.org/2000/xmlns/ http://www.w3.org/2000/xmlns/ \endlink + to associate \e xmlns with the namespace \e http://www.w3.org/2000/xmlns. + + As the SAX2 standard suggests, \l QXmlSimpleReader defaults to having + \e http://xml.org/sax/features/namespaces set to true and + \e http://xml.org/sax/features/namespace-prefixes set to false. + When changing this behavior using \l QXmlSimpleReader::setFeature() + note that the combination of both features set to + false is illegal. 
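The rules for the \e namespaces feature, the default settings and the illegal combination can be put together in a small model. Again this is a plain C++ sketch under stated assumptions, not Qt API: \c FeatureSettings, \c NameReport and \c reportAttributeName are hypothetical names, and the prefix-to-URI map stands in for the namespace declarations that a real reader would collect while parsing.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of the two feature flags. The default values mirror
// the QXmlSimpleReader defaults described above.
struct FeatureSettings
{
    bool namespaces = true;
    bool namespacePrefixes = false;
};

// What the model reports for one attribute name. A field stays empty
// when the corresponding piece of information is not reported.
struct NameReport
{
    std::string localName;
    std::string prefix;
    std::string namespaceUri;
};

NameReport reportAttributeName(const std::string &qualifiedName,
                               const std::map<std::string, std::string> &prefixToUri,
                               const FeatureSettings &settings)
{
    if (!settings.namespaces && !settings.namespacePrefixes)
        throw std::invalid_argument("both features set to false is illegal");

    NameReport report;
    if (!settings.namespaces)
        return report; // local name, prefix and URI are not reported

    const std::string::size_type colon = qualifiedName.find(':');
    if (colon == std::string::npos) {
        // An attribute without a prefix belongs to no XML namespace.
        report.localName = qualifiedName;
        return report;
    }

    report.prefix = qualifiedName.substr(0, colon);
    report.localName = qualifiedName.substr(colon + 1);
    const auto it = prefixToUri.find(report.prefix);
    if (it != prefixToUri.end())
        report.namespaceUri = it->second;
    return report;
}
```

With the default settings and a map from \c fnord to \c http://example.com/fnord/, the name \c fnord:title yields the local name \c title, the prefix \c fnord and that URI. With \e namespaces switched off and \e namespace-prefixes switched on, all three fields stay empty.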
+ + \section2 Summary + + \l QXmlSimpleReader implements the following behavior: + + \table + \header \o (namespaces, namespace-prefixes) + \o Namespace prefix and local part + \o Qualified names + \o Prefix mapping + \o xmlns attributes + \row \o (true, false) \o Yes \o Yes* \o Yes \o No + \row \o (true, true) \o Yes \o Yes \o Yes \o Yes + \row \o (false, true) \o No* \o Yes \o No* \o Yes + \row \o (false, false) \i41 Illegal + \endtable + + The behavior of the entries marked with an asterisk (*) is not specified by SAX. + + \section1 Properties + + Properties are a more general concept. They have a unique name, + represented as a URI, but their value is \c void*. Thus nearly + anything can be used as a property value. This concept involves some + danger, though: there is no means of ensuring type-safety; the user + must take care that they pass the right type. Properties are + useful if a reader supports special handler classes. + + The URIs used for features and properties often look like URLs, e.g. + \c http://xml.org/sax/features/namespaces. This does not mean that the + data required is at this address. It is simply a way of defining + unique names. + + Anyone can define and use new SAX2 properties for their readers. + Property support is not mandatory. + + To set or query properties the following functions are provided: \l + QXmlReader::setProperty(), \l QXmlReader::property() and \l + QXmlReader::hasProperty(). +*/ + +/*! + \page xml-dom.html + \title Working with the DOM Tree + \target dom + + \previouspage The SAX Interface + \contentspage XML Processing + \nextpage {Using XML Technologies}{XQuery/XPath and XML Schema} + + DOM Level 2 is a W3C Recommendation for XML interfaces that maps the + constituents of an XML document to a tree structure. The specification + of DOM Level 2 can be found at \l{http://www.w3.org/DOM/}.
+ + \target domIntro + \section1 Introduction to DOM + + DOM provides an interface to access and change the content and + structure of an XML file. It makes a hierarchical view of the document + (a tree view). Thus -- in contrast to the SAX2 interface -- an object + model of the document is resident in memory after parsing which makes + manipulation easy. + + All DOM nodes in the document tree are subclasses of \l QDomNode. The + document itself is represented as a \l QDomDocument object. + + Here are the available node classes and their potential child classes: + + \list + \o \l QDomDocument: Possible children are + \list + \o \l QDomElement (at most one) + \o \l QDomProcessingInstruction + \o \l QDomComment + \o \l QDomDocumentType + \endlist + \o \l QDomDocumentFragment: Possible children are + \list + \o \l QDomElement + \o \l QDomProcessingInstruction + \o \l QDomComment + \o \l QDomText + \o \l QDomCDATASection + \o \l QDomEntityReference + \endlist + \o \l QDomDocumentType: No children + \o \l QDomEntityReference: Possible children are + \list + \o \l QDomElement + \o \l QDomProcessingInstruction + \o \l QDomComment + \o \l QDomText + \o \l QDomCDATASection + \o \l QDomEntityReference + \endlist + \o \l QDomElement: Possible children are + \list + \o \l QDomElement + \o \l QDomText + \o \l QDomComment + \o \l QDomProcessingInstruction + \o \l QDomCDATASection + \o \l QDomEntityReference + \endlist + \o \l QDomAttr: Possible children are + \list + \o \l QDomText + \o \l QDomEntityReference + \endlist + \o \l QDomProcessingInstruction: No children + \o \l QDomComment: No children + \o \l QDomText: No children + \o \l QDomCDATASection: No children + \o \l QDomEntity: Possible children are + \list + \o \l QDomElement + \o \l QDomProcessingInstruction + \o \l QDomComment + \o \l QDomText + \o \l QDomCDATASection + \o \l QDomEntityReference + \endlist + \o \l QDomNotation: No children + \endlist + + With \l QDomNodeList and \l QDomNamedNodeMap two collection 
classes + are provided: \l QDomNodeList is a list of nodes, + and \l QDomNamedNodeMap is used to handle unordered sets of nodes + (often used for attributes). + + The \l QDomImplementation class allows the user to query features of the + DOM implementation. + + To get started please refer to the \l QDomDocument documentation. + You might also want to take a look at the \l{DOM Bookmarks example}, + which illustrates how to read and write an XML bookmark file (XBEL) + using DOM. +*/ diff --git a/doc/src/xml-processing/xquery-introduction.qdoc b/doc/src/xml-processing/xquery-introduction.qdoc new file mode 100644 index 0000000..7e65b7b --- /dev/null +++ b/doc/src/xml-processing/xquery-introduction.qdoc @@ -0,0 +1,1023 @@ +/**************************************************************************** +** +** Copyright (C) 2009 Nokia Corporation and/or its subsidiary(-ies). +** Contact: Nokia Corporation (qt-info@nokia.com) +** +** This file is part of the documentation of the Qt Toolkit. +** +** $QT_BEGIN_LICENSE:LGPL$ +** No Commercial Usage +** This file contains pre-release code and may not be distributed. +** You may use this file in accordance with the terms and conditions +** contained in the Technology Preview License Agreement accompanying +** this package. +** +** GNU Lesser General Public License Usage +** Alternatively, this file may be used under the terms of the GNU Lesser +** General Public License version 2.1 as published by the Free Software +** Foundation and appearing in the file LICENSE.LGPL included in the +** packaging of this file. Please review the following information to +** ensure the GNU Lesser General Public License version 2.1 requirements +** will be met: http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html. +** +** In addition, as a special exception, Nokia gives you certain +** additional rights. These rights are described in the Nokia Qt LGPL +** Exception version 1.1, included in the file LGPL_EXCEPTION.txt in this +** package. 
+** +** If you have questions regarding the use of this file, please contact +** Nokia at qt-info@nokia.com. +** +** +** +** +** +** +** +** +** $QT_END_LICENSE$ +** +****************************************************************************/ + +/*! + \page xquery-introduction.html + \title A Short Path to XQuery + + \startpage Using XML Technologies + \target XQuery-introduction + +XQuery is a language for querying XML data or non-XML data that can be +modeled as XML. XQuery is specified by the \l{http://www.w3.org}{W3C}. + +\tableofcontents + +\section1 Introduction + +Where Java and C++ are \e{statement-based} languages, the XQuery +language is \e{expression-based}. The simplest XQuery expression is an +XML element constructor: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 20 + +This \c{<recipe/>} element is an XQuery expression that forms a +complete XQuery. In fact, this XQuery doesn't actually query +anything. It just creates an empty \c{<recipe/>} element in the +output. But \l{Constructing Elements} {constructing new elements in an +XQuery} is often necessary. + +An XQuery expression can also be enclosed in curly braces and embedded +in another XQuery expression. This XQuery has a document expression +embedded in a node expression: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 21 + +It creates a new \c{<html>} element in the output and sets its \c{id} +attribute to be the \c{id} attribute from an \c{<html>} element in the +\c{other.html} file. + +\section1 Using Path Expressions To Match & Select Items + +In C++ and Java, we write nested \c{for} loops and recursive functions +to traverse XML trees in search of elements of interest. In XQuery, we +write these iterative and recursive algorithms with \e{path +expressions}. + +A path expression looks somewhat like a typical \e{file pathname} for +locating a file in a hierarchical file system. It is a sequence of one +or more \e{steps} separated by slash '/' or double slash '//'. 
+Although path expressions are used for traversing XML trees, not file +systems, in QtXmlPatterns we can model a file system to look like an +XML tree, so in QtXmlPatterns we can use XQuery to traverse a file +system. See the \l {File System Example} {file system example}. + +Think of a path expression as an algorithm for traversing an XML tree +to find and collect items of interest. This algorithm is evaluated by +evaluating each step moving from left to right through the sequence. A +step is evaluated with a set of input items (nodes and atomic values), +sometimes called the \e focus. The step is evaluated for each item in +the focus. These evaluations produce a new set of items, called the \e +result, which then becomes the focus that is passed to the next step. +Evaluation of the final step produces the final result, which is the +result of the XQuery. The items in the result set are presented in +\l{http://www.w3.org/TR/xquery/#id-document-order} {document order} +and without duplicates. + +With QtXmlPatterns, a standard way to present the initial focus to a +query is to call QXmlQuery::setFocus(). Another common way is to let +the XQuery itself create the initial focus by using the first step of +the path expression to call the XQuery \c{doc()} function. The +\c{doc()} function loads an XML document and returns the \e {document +node}. Note that the document node is \e{not} the same as the +\e{document element}. The \e{document node} is a node constructed in +memory, when the document is loaded. It represents the entire XML +document, not the document element. The \e{document element} is the +single, top-level XML element in the file. The \c{doc()} function +returns the document node, which becomes the singleton node in the +initial focus set. The document node will have one child node, and +that child node will represent the document element.
Consider the +following XQuery: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 18 + +The \c{doc()} function loads the file \l{cookbook.xml} and returns the +document node. The document node then becomes the focus for the next +step \c{//recipe}. Here the double slash means select all \c{<recipe>} +elements found below the document node, regardless of where they +appear in the document tree. The query selects all \c{<recipe>} +elements in the cookbook. See \l{Running The Cookbook Examples} for +instructions on how to run this query (and most of the ones that +follow) from the command line. + +Conceptually, evaluation of the steps of a path expression is similar +to iterating through the same number of nested \e{for} loops. Consider +the following XQuery, which builds on the previous one: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 19 + +This XQuery is a single path expression composed of three steps. The +first step creates the initial focus by calling the \c{doc()} +function. We can paraphrase what the query engine does at each step: + +\list 1 + \o for each node in the initial focus (the document node)... + \o for each descendant node that is a \c{<recipe>} element... + \o collect the child nodes that are \c{<title>} elements. +\endlist + +Again the double slash means select all the \c{<recipe>} elements in the +document. The single slash before the \c{<title>} element means select +only those \c{<title>} elements that are \e{child} elements of a +\c{<recipe>} element (i.e. not grandchildren, etc). The XQuery evaluates +to a final result set containing the \c{<title>} element of each +\c{<recipe>} element in the cookbook. + +\section2 Axis Steps + +The most common kind of path step is called an \e{axis step}, which +tells the query engine which way to navigate from the context node, +and which test to perform when it encounters nodes along the way. An +axis step has two parts, an \e{axis specifier}, and a \e{node test}. 
+Conceptually, evaluation of an axis step proceeds as follows: For each +node in the focus set, the query engine navigates out from the node +along the specified axis and applies the node test to each node it +encounters. The nodes selected by the node test are collected in the +result set, which becomes the focus set for the next step. + +In the example XQuery above, the second and third steps are both axis +steps. Both apply the \c{element(name)} node test to nodes encountered +while traversing along some axis. But in this example, the two axis +steps are written in a \l{Shorthand Form} {shorthand form}, where the +axis specifier and the node test are not written explicitly but are +implied. XQueries are normally written in this shorthand form, but +they can also be written in the longhand form. If we rewrite the +XQuery in the longhand form, it looks like this: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 22 + +The two axis steps have been expanded. The first step (\c{//recipe}) +has been rewritten as \c{/descendant-or-self::element(recipe)}, where +\c{descendant-or-self::} is the axis specifier and \c{element(recipe)} +is the node test. The second step (\c{title}) has been rewritten as +\c{/child::element(title)}, where \c{child::} is the axis specifier +and \c{element(title)} is the node test. The output of the expanded +XQuery will be exactly the same as the output of the shorthand form. + +To create an axis step, concatenate an axis specifier and a node +test. The following sections list the axis specifiers and node tests +that are available. + +\section2 Axis Specifiers + +An axis specifier defines the direction you want the query engine to +take, when it navigates away from the context node. QtXmlPatterns +supports the following axes. + +\table +\header + \o Axis Specifier + \o refers to the axis containing... 
+ \row + \o \c{self::} + \o the context node itself + \row + \o \c{attribute::} + \o all attribute nodes of the context node + \row + \o \c{child::} + \o all child nodes of the context node (not attributes) + \row + \o \c{descendant::} + \o all descendants of the context node (children, grandchildren, etc) + \row + \o \c{descendant-or-self::} + \o all nodes in \c{descendant} + \c{self} + \row + \o \c{parent::} + \o the parent node of the context node, or empty if there is no parent + \row + \o \c{ancestor::} + \o all ancestors of the context node (parent, grandparent, etc) + \row + \o \c{ancestor-or-self::} + \o all nodes in \c{ancestor} + \c{self} + \row + \o \c{following::} + \o all nodes in the tree containing the context node, \e not + including \c{descendant}, \e and that follow the context node + in the document + \row + \o \c{preceding::} + \o all nodes in the tree containing the context node, \e not + including \c{ancestor}, \e and that precede the context node in + the document + \row + \o \c{following-sibling::} + \o all children of the context node's \c{parent} that follow the + context node in the document + \row + \o \c{preceding-sibling::} + \o all children of the context node's \c{parent} that precede the + context node in the document +\endtable + +\section2 Node Tests + +A node test is a conditional expression that must be true for a node +if the node is to be selected by the axis step. The conditional +expression can test just the \e kind of node, or it can test the \e +kind of node and the \e name of the node. The XQuery specification for +\l{http://www.w3.org/TR/xquery/#node-tests} {node tests} also defines +a third condition, the node's \e {Schema Type}, but schema type tests +are not supported in QtXmlPatterns. + +QtXmlPatterns supports the following node tests. The tests that have a +\c{name} parameter test the node's name in addition to its \e{kind} +and are often called the \l{Name Tests}.
+ +\table +\header + \o Node Test + \o matches all... + \row + \o \c{node()} + \o nodes of any kind + \row + \o \c{text()} + \o text nodes + \row + \o \c{comment()} + \o comment nodes + \row + \o \c{element()} + \o element nodes (same as star: *) + \row + \o \c{element(name)} + \o element nodes named \c{name} + \row + \o \c{attribute()} + \o attribute nodes + \row + \o \c{attribute(name)} + \o attribute nodes named \c{name} + \row + \o \c{processing-instruction()} + \o processing-instructions + \row + \o \c{processing-instruction(name)} + \o processing-instructions named \c{name} + \row + \o \c{document-node()} + \o document nodes (there is only one) + \row + \o \c{document-node(element(name))} + \o document node with document element \c{name} +\endtable + +\target Shorthand Form +\section2 Shorthand Form + +Writing axis steps using the longhand form with axis specifiers and +node tests is semantically clear but syntactically verbose. The +shorthand form is easy to learn and, once you learn it, just as easy +to read. In the shorthand form, the axis specifier and node test are +implied by the syntax. XQueries are normally written in the shorthand +form. Here is a table of some frequently used shorthand forms: + +\table +\header + \o Shorthand syntax + \o Short for... + \o matches all... 
+ \row + \o \c{name} + \o \c{child::element(name)} + \o child nodes that are \c{name} elements + + \row + \o \c{*} + \o \c{child::element()} + \o child nodes that are elements (\c{node()} matches + \e all child nodes) + + \row + \o \c{..} + \o \c{parent::node()} + \o parent nodes (there is only one) + + \row + \o \c{@*} + \o \c{attribute::attribute()} + \o attribute nodes + + \row + \o \c{@name} + \o \c{attribute::attribute(name)} + \o \c{name} attributes + + \row + \o \c{//} + \o \c{descendant-or-self::node()} + \o descendant nodes (when used instead of '/') + +\endtable + +The \l{http://www.w3.org/TR/xquery/}{XQuery language specification} +has a more detailed section on the shorthand form, which it calls the +\l{http://www.w3.org/TR/xquery/#abbrev} {abbreviated syntax}. More +examples of path expressions written in the shorthand form are found +there. There is also a section listing examples of path expressions +written in the \l{http://www.w3.org/TR/xquery/#unabbrev} {longhand +form}. + +\target Name Tests +\section2 Name Tests + +The name tests are the \l{Node Tests} that have the \c{name} +parameter. A name test must match the node \e name in addition to the +node \e kind. We have already seen name tests used: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 19 + +In this path expression, both \c{recipe} and \c{title} are name tests +written in the shorthand form. XQuery resolves these names +(\l{http://www.w3.org/TR/xquery/#id-basics}{QNames}) to their expanded +form using whatever +\l{http://www.w3.org/TR/xquery/#dt-namespace-declaration} {namespace +declarations} it knows about. Resolving a name to its expanded form +means replacing its namespace prefix, if one is present (there aren't +any present in the example), with a namespace URI. The expanded name +then consists of the namespace URI and the local name.
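As a rough illustration of what resolving a name to its expanded form means, the sketch below builds a QName from a prefixed name and then takes the expanded name apart. It is not one of the cookbook examples. The prefix \c{c} and the URI \c{http://cookbook/namespace} are assumptions chosen only for illustration, and the sketch assumes the query processor provides the standard \c{namespace-uri-from-QName()} and \c{local-name-from-QName()} functions:

```xquery
(: Hypothetical prefix and namespace URI, declared only for this sketch. :)
declare namespace c = "http://cookbook/namespace";

(: The prefix in the string literal is resolved against the
   namespace declarations the query knows about. :)
let $name := xs:QName("c:recipe")
return (namespace-uri-from-QName($name),  (: the namespace URI part :)
        local-name-from-QName($name))     (: the local name part :)
```

The two items returned should be the namespace URI and the local name, which together make up the expanded name. The prefix itself is only a stand-in for the URI and plays no part in matching.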
+ +But the names in the example above don't have namespace prefixes, +because we didn't include a namespace declaration in our +\c{cookbook.xml} file. However, we will often use XQuery to query XML +documents that use namespaces. Forgetting to declare the correct +namespace(s) in an XQuery is a common cause of XQuery failures. Let's +add a \e{default} namespace to \c{cookbook.xml} now. Change the +\e{document element} in \c{cookbook.xml} from: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 23 + +to... + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 24 + +This is called a \e{default namespace} declaration because it doesn't +include a namespace prefix. By including this default namespace +declaration in the document element, we mean that all unprefixed +\e{element} names in the document, including the document element +itself (\c{cookbook}), are automatically in the default namespace +\c{http://cookbook/namespace}. Note that unprefixed \e{attribute} +names are not affected by the default namespace declaration. They are +always considered to be in \e{no namespace}. Note also that the URL +we choose as our namespace URI need not refer to an actual location, +and doesn't refer to one in this case. But click on +\l{http://www.w3.org/XML/1998/namespace}, for example, which is the +namespace URI for elements and attributes prefixed with \c{xml:}. + +Now when we try to run the previous XQuery example, no output is +produced! The path expression no longer matches anything in the +cookbook file because our XQuery doesn't yet know about the namespace +declaration we added to the cookbook document. There are two ways we +can declare the namespace in the XQuery. We can give it a \e{namespace +prefix} (e.g. 
\c{c} for cookbook) and prefix each name test with the +namespace prefix: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 3 + +Or we can declare the namespace to be the \e{default element +namespace}, and then we can still run the original XQuery: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 4 + +Both methods will work and produce the same output, all the +\c{<title>} elements: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 5 + +But note how the output is slightly different from the output we saw +before we added the default namespace declaration to the cookbook file. +QtXmlPatterns automatically includes the correct namespace attribute +in each \c{<title>} element in the output. When QtXmlPatterns loads a +document and expands a QName, it creates an instance of QXmlName, +which retains the namespace prefix along with the namespace URI and +the local name. See QXmlName for further details. + +One thing to keep in mind from this namespace discussion, whether you +run XQueries in a Qt program using QtXmlPatterns, or you run them from +the command line using xmlpatterns, is that if you don't get the +output you expect, it might be because the data you are querying uses +namespaces, but you didn't declare those namespaces in your XQuery. + +\section3 Wildcards in Name Tests + +The wildcard \c{'*'} can be used in a name test. To find all the +attributes in the cookbook but select only the ones in the \c{xml} +namespace, use the \c{xml:} namespace prefix but replace the +\e{local name} (the attribute name) with the wildcard: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 7 + +Oops! If you save this XQuery in \c{file.xq} and run it through +\c{xmlpatterns}, it doesn't work. You get an error message instead, +something like this: \e{Error SENR0001 in file:///...file.xq, at line +1, column 1: Attribute xml:id can't be serialized because it appears +at the top level.} The XQuery actually ran correctly. 
It selected a +bunch of \c{xml:id} attributes and put them in the result set. But +then \c{xmlpatterns} sent the result set to a \l{QXmlSerializer} +{serializer}, which tried to output it as well-formed XML. Since the +result set contains only attributes and attributes alone are not +well-formed XML, the \l{QXmlSerializer} {serializer} reports a +\l{http://www.w3.org/TR/2005/WD-xslt-xquery-serialization-20050915/#id-errors} +{serialization error}. + +Fear not. XQuery can do more than just find and select elements and +attributes. It can \l{Constructing Elements} {construct new ones on +the fly} as well, which is what we need to do here if we want +\c{xmlpatterns} to let us see the attributes we selected. The example +above and the ones below are revisited in the \l{Constructing +Elements} section. You can jump ahead to see the modified examples +now, and then come back, or you can press on from here. + +To find all the \c{name} attributes in the cookbook and select them +all regardless of their namespace, replace the namespace prefix with +the wildcard and write \c{name} (the attribute name) as the local +name: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 8 + +To find and select all the attributes of the \e{document element} in +the cookbook, replace the entire name test with the wildcard: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 9 + +\section1 Using Predicates In Path Expressions + +Predicates can be used to further filter the nodes selected by a path +expression. A predicate is an expression in square brackets ('[' and +']') that either returns a boolean value or a number. A predicate can +appear at the end of any path step in a path expression. The predicate +is applied to each node in the focus set. If a node passes the +filter, the node is included in the result set. The query below +selects the recipe element that has the \c{<title>} element +\c{"Hard-Boiled Eggs"}. 
+ +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 10 + +The dot expression ('.') can be used in predicates and path +expressions to refer to the current context node. The following query +uses the dot expression to refer to the current \c{<method>} element. +The query selects the empty \c{<method>} elements from the cookbook. + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 11 + +Note that passing the dot expression to the +\l{http://www.w3.org/TR/xpath-functions/#func-string-length} +{string-length()} function is optional. When +\l{http://www.w3.org/TR/xpath-functions/#func-string-length} +{string-length()} is called with no parameter, the context node is +assumed: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 12 + +Actually, selecting an empty \c{<method>} element might not be very +useful by itself. It doesn't tell you which recipe has the empty +method: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 31 + +\target Empty Method Not Robust +What you probably want to see instead are the \c{<recipe>} elements that +have empty \c{<method>} elements: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 32 + +The predicate uses the +\l{http://www.w3.org/TR/xpath-functions/#func-string-length} +{string-length()} function to test the length of each \c{<method>} +element in each \c{<recipe>} element found by the node test. If a +\c{<method>} contains no text, the predicate evaluates to \c{true} and +the \c{<recipe>} element is selected. If the method contains some +text, the predicate evaluates to \c{false}, and the \c{<recipe>} +element is discarded. The output is the entire recipe that has no +instructions for preparation: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 33 + +The astute reader will have noticed that this use of +\c{string-length()} to find an empty element is unreliable. It works +in this case, because the method element is written as \c{<method/>}, +guaranteeing that its string length will be 0. 
It will still work if +the method element is written as \c{<method></method>}, but it will +fail if there is any whitespace between the opening and ending +\c{<method>} tags. A more robust way to find the recipes with empty +methods is presented in the section on \l{Boolean Predicates}. + +There are many more functions and operators defined for XQuery and +XPath. They are all \l{http://www.w3.org/TR/xpath-functions} +{documented here}. + +\section2 Positional Predicates + +Predicates are often used to filter items based on their position in +a sequence. For path expressions processing items loaded from XML +documents, the normal sequence is +\l{http://www.w3.org/TR/xquery/#id-document-order} {document order}. +This query returns the second \c{<recipe>} element in the +\c{cookbook.xml} file: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 13 + +The other frequently used positional function is +\l{http://www.w3.org/TR/xpath-functions/#func-last} {last()}, which +returns the numeric position of the last item in the focus set. Stated +another way, \l{http://www.w3.org/TR/xpath-functions/#func-last} +{last()} returns the size of the focus set. This query returns the +last recipe in the cookbook: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 16 + +And this query returns the next to last \c{<recipe>}: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 17 + +\section2 Boolean Predicates + +The other kind of predicate evaluates to \e true or \e false. A +boolean predicate takes the value of its expression and determines its +\e{effective boolean value} according to the following rules: + +\list + \o An expression that evaluates to a single node is \c{true}. + + \o An expression that evaluates to a string is \c{false} if the + string is empty and \c{true} if the string is not empty. + + \o An expression that evaluates to a boolean value (i.e. type + \c{xs:boolean}) is that value. + + \o If the expression evaluates to anything else, it's an error + (e.g. 
type \c{xs:date}). + +\endlist + +We have already seen some boolean predicates in use. Earlier, we saw +a \e{not so robust} way to find the \l{Empty Method Not Robust} +{recipes that have no instructions}. \c{[string-length(method) = 0]} +is a boolean predicate that would fail in the example if the empty +method element were written with both opening and closing tags and +there were whitespace between the tags. Here is a more robust way that +uses a different boolean predicate. + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 34 + +This one uses the +\l{http://www.w3.org/TR/xpath-functions/#func-empty} {empty()} +function to test whether the method contains any steps. If the method +contains no steps, then \c{empty(step)} will return \c{true}, and +hence the predicate will evaluate to \c{true}. + +But even that version isn't foolproof. Suppose the method does contain +steps, but all the steps themselves are empty. That's still a case of +a recipe with no instructions that won't be detected. There is a +better way: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 35 + +This version uses the +\l{http://www.w3.org/TR/xpath-functions/#func-not} {not()} and +\l{http://www.w3.org/TR/xpath-functions/#func-normalize-space} +{normalize-space()} functions. \c{normalize-space(method)} returns +the contents of the method element as a string, but with all the +whitespace normalized, i.e., the string value of each \c{<step>} +element will have its whitespace normalized, and then all the +normalized step values will be concatenated. If that string is empty, +then \c{not()} returns \c{true} and the predicate is \c{true}. + +We can also use the +\l{http://www.w3.org/TR/xpath-functions/#func-position} {position()} +function in a comparison to inspect positions with conditional logic.
The +\l{http://www.w3.org/TR/xpath-functions/#func-position} {position()} +function returns the position index of the current context item in the +sequence of items: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 14 + +Note that the first position in the sequence is position 1, not 0. We +can also select \e{all} the recipes after the first one: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 15 + +\target Constructing Elements +\section1 Constructing Elements + +In the section about \l{Wildcards in Name Tests} {using wildcards in +name tests}, we saw three simple example XQueries, each of which +selected a different list of XML attributes from the cookbook. We +couldn't use \c{xmlpatterns} to run these queries, however, because +\c{xmlpatterns} sends the XQuery results to a \l{QXmlSerializer} +{serializer}, which expects to serialize the results as well-formed +XML. Since a list of XML attributes by itself is not well-formed XML, +the serializer reported an error for each XQuery. + +Since an attribute must appear in an element, for each attribute in +the result set, we must create an XML element. We can do that using a +\l{http://www.w3.org/TR/xquery/#id-for-let} {\e{for} clause} with a +\l{http://www.w3.org/TR/xquery/#id-variables} {bound variable}, and a +\l{http://www.w3.org/TR/xquery/#id-orderby-return} {\e{return} +clause} with an element constructor: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 25 + +The \e{for} clause produces a sequence of attribute nodes from the result +of the path expression. Each attribute node in the sequence is bound +to the variable \c{$i}. The \e{return} clause then constructs a \c{<p>} +element around the attribute node. Here is the output: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 28 + +The output contains one \c{<p>} element for each \c{xml:id} attribute +in the cookbook. 
Note that XQuery puts each attribute in the right +place in its \c{<p>} element, despite the fact that in the \e{return} +clause, the \c{$i} variable is positioned as if it is meant to become +\c{<p>} element content. + +The other two examples from the \l{Wildcards in Name Tests} {wildcard} +section can be rewritten the same way. Here is the XQuery that selects +all the \c{name} attributes, regardless of namespace: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 26 + +And here is its output: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 29 + +And here is the XQuery that selects all the attributes from the +\e{document element}: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 27 + +And here is its output: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 30 + +\section2 Element Constructors are Expressions + +Because node constructors are expressions, they can be used in +XQueries wherever expressions are allowed. + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 40 + +If \c{cookbook.xml} is loaded without error, a \c{<resept>} element +(Norwegian word for recipe) is constructed for each \c{<recipe>} +element in the cookbook, and the child nodes of the \c{<recipe>} are +copied into the \c{<resept>} element. But if the cookbook document +doesn't exist or does not contain well-formed XML, a single +\c{<resept>} element is constructed containing an error message. + +\section1 Constructing Atomic Values + +XQuery also has atomic values. An atomic value is a value in the value +space of one of the built-in datatypes in the \l +{http://www.w3.org/TR/xmlschema-2} {XML Schema language}. These +\e{atomic types} have built-in operators for doing arithmetic, +comparisons, and for converting values to other atomic types. See the +\l {http://www.w3.org/TR/xmlschema-2/#built-in-datatypes} {Built-in +Datatype Hierarchy} for the entire tree of built-in, primitive and +derived atomic types.
\note Click on a data type in the tree for its +detailed specification. + +To construct an atomic value as element content, enclose an expression +in curly braces and embed it in the element constructor: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 36 + +Sending this XQuery through xmlpatterns produces: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 37 + +To compute the value of an attribute, enclose the expression in +curly braces and embed it in the attribute value: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 38 + +Sending this XQuery through xmlpatterns produces: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 39 + +\section1 Running The Cookbook Examples + +Most of the XQuery examples in this document refer to the cookbook +written in XML shown below. Save it as \c{cookbook.xml}. In the same +directory, save one of the cookbook XQuery examples in a \c{.xq} file +(e.g. \c{file.xq}). Run the XQuery using Qt's command line utility: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 6 + +\section2 cookbook.xml + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 100 + +\section1 Further Reading + +There is much more to the XQuery language than we have presented in +this short introduction. We will be adding more here in later +releases. In the meantime, playing with the \c{xmlpatterns} utility +and making modifications to the XQuery examples provided here will be +quite informative. An XQuery textbook will be a good investment.
+ +You can also ask questions on XQuery mail lists: + +\list +\o +\l{http://qt.nokia.com/lists/qt-interest/}{qt-interest} +\o +\l{http://www.x-query.com/mailman/listinfo/talk}{talk at x-query.com}. +\endlist + +\l{http://www.functx.com/functx/}{FunctX} has a collection of XQuery +functions that can be both useful and educational. + +This introduction contains many links to the specifications, which, of course, +are the ultimate source of information about XQuery. They can be a bit +difficult, though, so consider investing in a textbook: + +\list + + \o \l{http://www.w3.org/TR/xquery/}{XQuery 1.0: An XML Query + Language} - the main source for syntax and semantics. + + \o \l{http://www.w3.org/TR/xpath-functions/}{XQuery 1.0 and XPath + 2.0 Functions and Operators} - the builtin functions and operators. + +\endlist + +\section1 FAQ + +The answers to these frequently asked questions explain the causes of +several common mistakes that most beginners make. Reading through the +answers ahead of time might save you a lot of head scratching. + +\section2 Why didn't my path expression match anything? + +The most common cause of this bug is failure to declare one or more +namespaces in your XQuery. Consider the following query for selecting +all the examples in an XHTML document: + +\quotefile snippets/patternist/simpleHTML.xq + +It won't match anything because \c{index.html} is an XHTML file, and +all XHTML files declare the default namespace +\c{"http://www.w3.org/1999/xhtml"} in their top (\c{<html>}) element. +But the query doesn't declare this namespace, so the path expression +expands \c{html} to \c{{}html} and tries to match that expanded name. +But the actual expanded name is +\c{{http://www.w3.org/1999/xhtml}html}. One possible fix is to declare the +correct default namespace in the XQuery: + +\quotefile snippets/patternist/simpleXHTML.xq + +Another common cause of this bug is to confuse the \e{document node} +with the top element node. They are different. 
This query won't match +anything: + +\quotefile snippets/patternist/docPlainHTML.xq + +The \c{doc()} function returns the \e{document node}, not the top +element node (\c{<html>}). Don't forget to match the top element node +in the path expression: + +\quotefile snippets/patternist/docPlainHTML2.xq + +\section2 What if my input namespace is different from my output namespace? + +Just remember to declare both namespaces in your XQuery and use them +properly. Consider the following query, which is meant to generate +XHTML output from XML input: + +\quotefile snippets/patternist/embedDataInXHTML.xq + +We want the \c{<html>}, \c{<body>}, and \c{<p>} nodes we create in the +output to be in the standard XHTML namespace, so we declare the +default namespace to be \c{http://www.w3.org/1999/xhtml}. That's +correct for the output, but that same default namespace will also be +applied to the node names in the path expression we're trying to match +in the input (\c{/tests/test[@status = "failure"]}), which is wrong, +because the namespace used in \c{testResult.xml} is perhaps in the +empty namespace. So we must declare that namespace too, with a +namespace prefix, and then use the prefix with the node names in +the path expression. This one will probably work better: + +\quotefile snippets/patternist/embedDataInXHTML2.xq + +\section2 Why doesn't my return clause work? + +Recall that XQuery is an \e{expression-based} language, not +\e{statement-based}. Because an XQuery is a lot of expressions, +understanding XQuery expression precedence is very important. +Consider the following query: + +\quotefile snippets/patternist/forClause2.xq + +It looks ok, but it isn't. It is supposed to be a FLWOR expression +comprising a \e{for} clause and a \e{return} clause, but it isn't just +that. 
It \e{has} a FLWOR expression, certainly (with the \e{for} and +\e{return} clauses), but it \e{also} has an arithmetic expression +(\e{+ $d}) dangling at the end because we didn't enclose the return +expression in parentheses. + +Using parentheses to establish precedence is more important in XQuery +than in other languages, because XQuery is \e{expression-based}. In +this case, without parentheses enclosing \c{$i + $d}, the return +clause only returns \c{$i}. The \c{+$d} will have the result of the +FLWOR expression as its left operand. And, since the scope of variable +\c{$d} ends at the end of the \e{return} clause, a variable out of +scope error will be reported. Correct these problems by using +parentheses. + +\quotefile snippets/patternist/forClause.xq + +\section2 Why didn't my expression get evaluated? + +You probably misplaced some curly braces. When you want an expression +evaluated inside an element constructor, enclose the expression in +curly braces. Without the curly braces, the expression will be +interpreted as text. Here is a \c{sum()} expression used in an \c{<e>} +element. The table shows cases where the curly braces are missing, +misplaced, and placed correctly: + +\table +\header + \o element constructor with expression... + \o evaluates to... + \row + \o <e>sum((1, 2, 3))</e> + \o <e>sum((1, 2, 3))</e> + \row + \o <e>sum({(1, 2, 3)})</e> + \o <e>sum(1 2 3)</e> + \row + \o <e>{sum((1, 2, 3))}</e> + \o <e>6</e> +\endtable + +\section2 My predicate is correct, so why doesn't it select the right stuff? + +Either you put your predicate in the wrong place in your path +expression, or you forgot to add some parentheses. Consider this +input file \c{doc.txt}: + +\quotefile snippets/patternist/doc.txt + +Suppose you want the first \c{<span>} element of every \c{<p>} +element.
Apply a position filter (\c{[1]}) to the \c{/span} path step: + +\quotefile snippets/patternist/filterOnStep.xq + +Applying the \c{[1]} filter to the \c{/span} step returns the first +\c{<span>} element of each \c{<p>} element: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 41 + +\note: You can write the same query this way: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 44 + +Or you can reduce it right down to this: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 45 + +On the other hand, suppose you really want only one \c{<span>} +element, the first one in the document (i.e., you only want the first +\c{<span>} element in the first \c{<p>} element). Then you have to do +more filtering. There are two ways you can do it. You can apply the +\c{[1]} filter in the same place as above but enclose the path +expression in parentheses: + +\quotefile snippets/patternist/filterOnPath.xq + +Or you can apply a second position filter (\c{[1]} again) to the +\c{/p} path step: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 43 + +Either way the query will return only the first \c{<span>} element in +the document: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 42 + +\section2 Why doesn't my FLWOR behave as expected? + +The quick answer is you probably expected your XQuery FLWOR to behave +just like a C++ \e{for} loop. But they aren't the same. Consider a +simple example: + +\quotefile snippets/patternist/letOrderBy.xq + +This query evaluates to \e{4 -4 -2 2 -8 8}. The \e{for} clause does +set up a \e{for} loop style iteration, which does evaluate the rest of +the FLWOR multiple times, one time for each value returned by the +\e{in} expression. That much is similar to the C++ \e{for} loop. + +But consider the \e{return} clause. In C++ if you hit a \e{return} +statement, you break out of the \e{for} loop and return from the +function with one value. Not so in XQuery. 
The \e{return} clause is +the last clause of the FLWOR, and it means: \e{Append the return value +to the result list and then begin the next iteration of the FLWOR}. +When the \e{for} clause's \e{in} expression no longer returns a value, +the entire result list is returned. + +Next, consider the \e{order by} clause. It doesn't do any sorting on +each iteration of the FLWOR. It just evaluates its expression on each +iteration (\c{$a} in this case) to get an ordering value to map to the +result item from each iteration. These ordering values are kept in a +parallel list. The result list is sorted at the end using the parallel +list of ordering values. + +The last difference to note here is that the \e{let} clause does +\e{not} set up an iteration through a sequence of values like the +\e{for} clause does. The \e{let} clause isn't a sort of nested loop. +It isn't a loop at all. It is just a variable binding. On each +iteration, it binds the \e{entire} sequence of values on the right to +the variable on the left. In the example above, it binds (4 -4) to +\c{$b} on the first iteration, (-2 2) on the second iteration, and (-8 +8) on the third iteration. So the following query doesn't iterate +through anything, and doesn't do any ordering: + +\quotefile snippets/patternist/invalidLetOrderBy.xq + +It binds the entire sequence (2, 3, 1) to \c{$i} one time only; the +\e{order by} clause only has one thing to order and hence does +nothing, and the query evaluates to 2 3 1, the sequence assigned to +\c{$i}. + +\note We didn't include a \e{where} clause in the example. The +\e{where} clause is for filtering results. + +\section2 Why are my elements created in the wrong order? + +The short answer is your elements are \e{not} created in the wrong +order, because when appearing as operands to a path expression, +there is no correct order. 
Consider the following query, +which again uses the input file \c{doc.txt}: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 46 + +The query finds all the \c{<p>} elements in the file. For each \c{<p>} +element, it builds a \c{<p>} element in the output containing the +concatenated contents of all the \c{<p>} element's child \c{<span>} +elements. Running the query through \c{xmlpatterns} might produce the +following output, which is not sorted in the expected order. + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 47 + +You can use a \e{for} loop to ensure that the order of +the result set corresponds to the order of the input sequence: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 48 + +This version produces the same result set but in the expected order: + +\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 49 + +\section2 Why can't I use \c{true} and \c{false} in my XQuery? + +You can, but not by just using the names \c{true} and \c{false} +directly, because they are \l{Name Tests} {name tests} although they look +like boolean constants. The simple way to create the boolean values is +to use the builtin functions \c{true()} and \c{false()} wherever +you want to use \c{true} and \c{false}. The other way is to invoke the +boolean constructor: + +\quotefile snippets/patternist/xsBooleanTrue.xq + +*/ |