Diffstat (limited to 'doc/extsearch.doc')
-rw-r--r-- | doc/extsearch.doc | 320 |
1 files changed, 320 insertions, 0 deletions
diff --git a/doc/extsearch.doc b/doc/extsearch.doc
new file mode 100644
index 0000000..a86d1db
--- /dev/null
+++ b/doc/extsearch.doc
@@ -0,0 +1,320 @@
+/******************************************************************************
+ *
+ * Copyright (C) 1997-2012 by Dimitri van Heesch.
+ *
+ * Permission to use, copy, modify, and distribute this software and its
+ * documentation under the terms of the GNU General Public License is hereby
+ * granted. No representations are made about the suitability of this software
+ * for any purpose. It is provided "as is" without express or implied warranty.
+ * See the GNU General Public License for more details.
+ *
+ * Documents produced by Doxygen are derivative works derived from the
+ * input used in their production; they are not affected by this license.
+ *
+ */
+/*! \page extsearch External Indexing and Searching
+
+\section extsearch_intro Introduction
+
+Since release 1.8.3, doxygen can search through its HTML output using
+an external indexing tool and search engine.
+This has several advantages:
+- For large projects it can offer significant performance advantages over
+  doxygen's built-in search engine, which uses a rather simple indexing
+  algorithm.
+- It allows combining the search data of multiple projects into one index,
+  enabling a global search across multiple doxygen projects.
+- It allows adding additional data to the search index, e.g. other web pages
+  not produced by doxygen.
+- The search engine needs to run on a web server, but clients can still browse
+  the web pages locally.
+
+So that not everyone has to write their own indexer and search
+engine, doxygen provides an example tool for each task: `doxyindexer`
+for indexing the data and `doxysearch.cgi` for searching through the index.
+
+The data flow is shown in the following diagram:
+\dot
+digraph Flow {
+  edge [fontname="helvetica",fontsize="10pt"];
+  node [shape=ellipse,fontname="helvetica",fontsize="10pt"];
+  doxygen;
+  doxyindexer;
+  doxysearch [label="doxysearch.cgi"];
+  browser [label="HTML page\nin browser"];
+  node [shape=note];
+  searchdata [label="searchdata.xml"];
+  searchindex [label="doxysearch.db"];
+
+  doxygen -> searchdata [label=" writes"];
+  searchdata -> doxyindexer [label=" reads"];
+  doxyindexer -> searchindex [label=" writes"];
+  searchindex -> doxysearch [label=" reads"];
+  doxysearch -> browser [label=" get results "];
+  browser -> doxysearch [label=" query "];
+}
+\enddot
+
+- `doxygen` produces the raw search data.
+- `doxyindexer` indexes the data into a search database `doxysearch.db`.
+- When a user performs a search from a doxygen-generated HTML page,
+  the CGI binary `doxysearch.cgi` is invoked.
+- `doxysearch.cgi` performs a query on the database and returns
+  the results.
+- The browser shows the search results.
+
+\section extsearch_config Configuring
+
+The first step is to make the search engine available via a web server.
+If you use `doxysearch.cgi`, this means making the
+<a href="http://en.wikipedia.org/wiki/Common_Gateway_Interface">CGI</a> binary
+available from the web server (i.e. it must be possible to run it from a
+browser via a URL starting with http:).
+
+How to set up a web server is outside the scope of this document,
+but if, for instance, you have Apache installed, you could simply copy the
+`doxysearch.cgi` file from doxygen's `bin` directory to the `cgi-bin` directory of the
+Apache web server. Read the <a href="http://httpd.apache.org/docs/2.2/howto/cgi.html">Apache documentation</a> for details.
+
+To test whether `doxysearch.cgi` is accessible, start your web browser and
+point it to the URL of the binary with `?test` appended:
+
+    http://yoursite.com/path/to/cgi/doxysearch.cgi?test
+
+You should get the following message:
+
+    Test failed: cannot find search index doxysearch.db
+
+If you use Internet Explorer, you may be prompted to download a file,
+which will then contain this message.
+
+Since no `doxysearch.db` has been created or installed yet, the test is expected
+to fail for this reason. How to correct this is discussed in the next section.
+
+Before continuing with the next section, add the above
+URL (without the `?test` part) to the `SEARCHENGINE_URL` tag in
+doxygen's configuration file:
+
+    SEARCHENGINE_URL = http://yoursite.com/path/to/cgi/doxysearch.cgi
+
+\subsection extsearch_single Single project index
+
+To use the external search option, make sure the following options are enabled
+in doxygen's configuration file:
+
+    SEARCHENGINE = YES
+    SERVER_BASED_SEARCH = YES
+    EXTERNAL_SEARCH = YES
+
+This will make doxygen generate a file called `searchdata.xml` in the output
+directory (configured with \ref cfg_output_directory "OUTPUT_DIRECTORY").
+You can change the file name (and location) with the
+\ref cfg_searchdata_file "SEARCHDATA_FILE" option.
+
+The next step is to put the raw search data into an index for efficient
+searching. You can use `doxyindexer` for this. Simply run it from the command
+line:
+
+    doxyindexer searchdata.xml
+
+This will create a directory called `doxysearch.db` with some files in it.
+By default the directory is created in the directory from which `doxyindexer`
+was started, but you can choose a different location using the `-o` option.
+
+Copy the `doxysearch.db` directory to the directory where
+`doxysearch.cgi` is located and rerun the browser test by pointing
+the browser to
+
+    http://yoursite.com/path/to/cgi/doxysearch.cgi?test
+
+You should now get the following message:
+
+    Test successful.
+
+Now you should be able to search for words and symbols from the HTML output.
+
+\subsection extsearch_multi Multi project index
+
+If you have two doxygen projects A and B, where B depends on A via a
+tag file, i.e. the configuration of project A says:
+
+    GENERATE_TAGFILES = A.tag
+
+and the configuration of project B has its dependency on A configured as
+follows:
+
+    TAGFILES = ../project_A/A.tag=../../project_A/html
+
+then it may be desirable to allow searching for words in both projects.
+
+To make this possible, all that is needed is to combine the search data
+of both projects into one index, i.e. run
+
+    doxyindexer project_A/searchdata.xml project_B/searchdata.xml
+
+and then copy the resulting `doxysearch.db` to the directory where the
+`doxysearch.cgi` used by project B is located.
+
+If you also want to link to search results in project B
+from the search page of project A (or, in general,
+between two projects that are otherwise unrelated),
+you need to give doxygen some additional information so that it can create
+the right links. This is what the
+\ref cfg_extra_search_mappings "EXTRA_SEARCH_MAPPINGS" option is for.
+
+Each project needs to define a tag file, so in the above example
+involving projects A and B, project B should also define one:
+
+    GENERATE_TAGFILES = B.tag
+
+Project A can then define the mapping as follows:
+
+    EXTRA_SEARCH_MAPPINGS = B.tag=../../project_B/html
+
+With this addition, projects A and B can share the same search database.
+
+@note The mapping defined by `EXTRA_SEARCH_MAPPINGS` is treated as an
+extension of the mappings already defined by `TAGFILES`. If the same
+tag file is mentioned in both options, the one in `TAGFILES` is used.
+
+\section extsearch_update Updating the index
+
+When you modify the source code, you should re-run doxygen to get up-to-date
+documentation again. When using external searching, you also need to update the
+search index by re-running `doxyindexer`. You could wrap the calls to doxygen
+and `doxyindexer` together in a script to make this process easier.
+
+\section extsearch_api Programming interface
+
+The previous sections assumed that you use the tools `doxyindexer`
+and `doxysearch.cgi` to do the indexing and searching, but you could also
+write your own indexing and search tools if you like.
+
+For this, three interfaces are important:
+- The format of the input for the indexing tool.
+- The format of the input for the search engine.
+- The format of the output of the search engine.
+
+The next subsections describe these interfaces in more detail.
+
+\subsection extsearch_api_index Indexer input format
+
+The search data produced by doxygen follows the
+<a href="http://wiki.apache.org/solr/UpdateXmlMessages">Solr XML index message</a>
+format.
+
+The input for the indexer is an XML file, which consists of one `<add>` tag containing
+multiple `<doc>` tags, which in turn contain multiple `<field>` tags.
+
+Here is an example of one doc node, which contains the search data and metadata for
+one method:
+
+    <add>
+    ...
+    <doc>
+      <field name="type">function</field>
+      <field name="name">QXmlReader::setDTDHandler</field>
+      <field name="args">(QXmlDTDHandler *handler)=0</field>
+      <field name="tag">qtools.tag</field>
+      <field name="url">de/df6/class_q_xml_reader.html#a0b24b1fe26a4c32a8032d68ee14d5dba</field>
+      <field name="keywords">setDTDHandler QXmlReader::setDTDHandler QXmlReader</field>
+      <field name="text">Sets the DTD handler to handler DTDHandler()</field>
+    </doc>
+    ...
+    </add>
+
+Each field has a name. The following field names are supported:
+- *type*: the type of the search entry; can be one of: source, function, slot,
+  signal, variable, typedef, enum, enumvalue, property, event, related,
+  friend, define, file, namespace, group, package, page, dir.
+- *name*: the name of the search entry; for a method this is the qualified name of the method,
+  for a class it is the name of the class, etc.
+- *args*: the parameter list (for functions or methods).
+- *tag*: the name of the tag file used for this project.
+- *url*: the (relative) URL to the HTML documentation for this entry.
+- *keywords*: important words that are representative of the entry. When searching for such
+  a keyword, this entry should get a higher rank in the search results.
+- *text*: the documentation associated with the item. Note that only words are present, no markup.
+
+@note Due to the potentially large size of the XML file, it is recommended to use a
+<a href="http://en.wikipedia.org/wiki/Simple_API_for_XML">SAX-based parser</a> to process it.
+
+\subsection extsearch_api_search_in Search URL format
+
+When the search engine is invoked from a doxygen-generated HTML page, a number of parameters are
+passed to it via the <a href="http://en.wikipedia.org/wiki/Query_string">query string</a>.
+
+The following fields are passed:
+- *q*: the query text as entered by the user.
+- *n*: the number of search results requested.
+- *p*: the number of the search page for which to return the results, counting from 0. Each page has *n* values.
+- *cb*: the name of the callback function, used for JSON with padding; see the next section.
+
+From the complete list of search results, the results with (zero-based) indices `n*p` up to and
+including `n*(p+1)-1` should be returned.
+
+Here is an example of what a query looks like:
+
+    http://yoursite.com/path/to/cgi/doxysearch.cgi?q=list&n=20&p=1&cb=dummy
+
+It represents a query for the word 'list' (`q=list`) requesting 20 search results (`n=20`),
+starting with result number 20 (`p=1`) and using the callback 'dummy' (`cb=dummy`).
+
+@note The values are <a href="http://en.wikipedia.org/wiki/Percent-encoding">URL encoded</a>, so they
+have to be decoded before they can be used.
+
+\subsection extsearch_api_search_out Search results format
+
+When invoked as shown in the previous subsection, the search engine should reply with
+the results. The format of the reply is
+<a href="http://en.wikipedia.org/wiki/JSONP">JSON with padding</a>, which is basically
+a JavaScript object wrapped in a function call. The name of the function should be the name of
+the callback (as passed with the *cb* field in the query).
+
+For the example query shown in the previous subsection, the main structure of the reply should
+look as follows:
+
+    dummy({
+      "hits":179,
+      "first":20,
+      "count":20,
+      "page":1,
+      "pages":9,
+      "query": "list",
+      "items":[
+      ...
+     ]})
+
+The fields have the following meaning:
+- *hits*: the total number of search results (could be more than was requested).
+- *first*: the index of the first result returned: \f$\min(n \cdot p,\mbox{\em hits})\f$.
+- *count*: the actual number of results returned: \f$\min(n,\mbox{\em hits}-\mbox{\em first})\f$.
+- *page*: the page number of the result: \f$p\f$.
+- *pages*: the total number of pages: \f$\lceil\frac{\mbox{\em hits}}{n}\rceil\f$.
+- *query*: the query text (as passed with the *q* field).
+- *items*: an array containing the search data per result.
+
+Here is an example of what an element of the *items* array should look like:
+
+    {"type": "function",
+     "name": "QDir::entryInfoList(const QString &nameFilter, int filterSpec=DefaultFilter, int sortSpec=DefaultSort) const",
+     "tag": "qtools.tag",
+     "url": "d5/d8d/class_q_dir.html#a9439ea6b331957f38dbad981c4d050ef",
+     "fragments":[
+       "Returns a <span class=\"hl\">list</span> of QFileInfo objects for all files and directories...",
+       "... pointer to a QFileInfoList The <span class=\"hl\">list</span> is owned by the QDir object...",
+       "... to keep the entries of the <span class=\"hl\">list</span> after a subsequent call to this..."
+     ]
+    },
+
+The fields for such an item have the following meaning:
+- *type*: the type of the item, as found in the field with name "type" in the raw search data.
+- *name*: the name of the item, including the parameter list, as found in the fields with
+  name "name" and "args" in the raw search data.
+- *tag*: the name of the tag file, as found in the field with name "tag" in the raw search data.
+- *url*: the (relative) URL to the documentation, as found in the field with name "url"
+  in the raw search data.
+- *fragments*: an array with zero or more fragments of text containing words that were searched for.
+  These words should be wrapped in `<span class="hl">` and `</span>` tags to highlight them
+  in the output.
+
+*/
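If you write your own search back end, the query decoding, paging rules and JSONP reply described in the search URL and search results formats above can be sketched in Python. This is a minimal, hypothetical sketch and not how `doxysearch.cgi` is implemented. The following are all assumptions of this example:

- the function names `parse_search_query` and `build_reply`;
- the default values used when `n`, `p` or `cb` are missing;
- the restriction of callback names to plain JavaScript identifiers.

The actual searching, which produces the ranked list of item dictionaries, is left out.

```python
import json
import re
from urllib.parse import parse_qs

# Assumption of this sketch (not part of the documented protocol): only plain
# JavaScript identifiers are accepted as callback names, so the reply cannot
# be abused to inject arbitrary script into the calling page.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def parse_search_query(query_string):
    """Decode the q, n, p and cb parameters of a raw query string.

    parse_qs() percent-decodes the values (and turns '+' into a space).
    The defaults used for missing parameters are hypothetical.
    """
    params = parse_qs(query_string)
    q = params.get("q", [""])[0]
    n = int(params.get("n", ["20"])[0])
    p = int(params.get("p", ["0"])[0])
    cb = params.get("cb", ["callback"])[0]
    return q, n, p, cb


def build_reply(query, n, p, callback, results):
    """Return page p (n results per page) of `results` as a JSONP string.

    `results` is the complete, already ranked list of item dictionaries
    (type, name, tag, url, fragments); producing it is outside this sketch.
    """
    if n <= 0 or p < 0:
        raise ValueError("n must be positive and p must not be negative")
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name: %r" % callback)
    hits = len(results)
    first = min(n * p, hits)      # index of the first result on this page
    count = min(n, hits - first)  # number of results actually returned
    pages = (hits + n - 1) // n   # ceil(hits / n) using integer arithmetic
    reply = {
        "hits": hits,
        "first": first,
        "count": count,
        "page": p,
        "pages": pages,
        "query": query,
        "items": results[first:first + count],
    }
    # JSON with padding: the JSON object wrapped in a call to the callback.
    return "%s(%s)" % (callback, json.dumps(reply))


if __name__ == "__main__":
    # Hypothetical result list standing in for real search hits.
    fake_items = [{"type": "function", "name": "f%d" % i} for i in range(179)]
    q, n, p, cb = parse_search_query("q=list&n=20&p=1&cb=dummy")
    print(build_reply(q, n, p, cb, fake_items)[:80])
```

With 179 hits and the example query `q=list&n=20&p=1&cb=dummy`, this reproduces the `first`, `count`, `page` and `pages` values of the example reply above. The callback check is a defensive choice. The reply is executed as script by the calling page, so echoing an unchecked *cb* value would let a crafted URL inject code.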