Diffstat (limited to 'doc/extsearch.doc')
-rw-r--r-- doc/extsearch.doc 320
1 files changed, 320 insertions, 0 deletions
diff --git a/doc/extsearch.doc b/doc/extsearch.doc
new file mode 100644
index 0000000..a86d1db
--- /dev/null
+++ b/doc/extsearch.doc
@@ -0,0 +1,320 @@
+/******************************************************************************
+ *
+ * Copyright (C) 1997-2012 by Dimitri van Heesch.
+ *
+ * Permission to use, copy, modify, and distribute this software and its
+ * documentation under the terms of the GNU General Public License is hereby
+ * granted. No representations are made about the suitability of this software
+ * for any purpose. It is provided "as is" without express or implied warranty.
+ * See the GNU General Public License for more details.
+ *
+ * Documents produced by Doxygen are derivative works derived from the
+ * input used in their production; they are not affected by this license.
+ *
+ */
+/*! \page extsearch External Indexing and Searching
+
+\section extsearch_intro Introduction
+
+With release 1.8.3, doxygen provides the ability to search through HTML using
+an external indexing tool and search engine.
+This has several advantages:
+- For large projects it can have significant performance advantages over
+ doxygen's built-in search engine, as doxygen uses a rather simple indexing
+ algorithm.
+- It allows combining the search data of multiple projects into one index,
+ allowing a global search across multiple doxygen projects.
+- It allows adding extra data to the search index, e.g. other web pages
+ not produced by doxygen.
+- The search engine needs to run on a web server, but clients can still browse
+ the web pages locally.
+
+To spare users from having to write their own indexer and search
+engine, doxygen provides an example tool for each action: `doxyindexer`
+for indexing the data and `doxysearch.cgi` for searching through the index.
+
+The data flow is shown in the following diagram:
+\dot
+digraph Flow {
+ edge [fontname="helvetica",fontsize="10pt"];
+ node [shape=ellipse,fontname="helvetica",fontsize="10pt"];
+ doxygen;
+ doxyindexer;
+ doxysearch [label="doxysearch.cgi"];
+ browser [label="HTML page\nin browser"];
+ node [shape=note];
+ searchdata [label="searchdata.xml"];
+ searchindex [label="doxysearch.db"];
+
+ doxygen -> searchdata [label=" writes"];
+ searchdata -> doxyindexer [label=" reads"];
+ doxyindexer -> searchindex [label=" writes"];
+ searchindex -> doxysearch [label=" reads"];
+ doxysearch -> browser [label=" get results "];
+ browser -> doxysearch [label=" query "];
+}
+\enddot
+
+- `doxygen` produces the raw search data
+- `doxyindexer` indexes the data into a search database `doxysearch.db`
+- when a user performs a search from a doxygen generated HTML page,
+ the CGI binary `doxysearch.cgi` will be invoked.
+- the `doxysearch.cgi` tool will perform a query on the database and return
+ the results.
+- The browser will show the search results.
+
+\section extsearch_config Configuring
+
+The first step is to make the search engine available via a web server.
+If you use `doxysearch.cgi` this means making the
+<a href="http://en.wikipedia.org/wiki/Common_Gateway_Interface">CGI</a> binary
+available from the web server (i.e. you must be able to run it from a
+browser via a URL starting with http:).
+
+How to set up a web server is outside the scope of this document,
+but if, for instance, you have Apache installed, you could simply copy the
+`doxysearch.cgi` file from doxygen's `bin` dir to the `cgi-bin` of the
+Apache web server. Read the <a href="http://httpd.apache.org/docs/2.2/howto/cgi.html">Apache documentation</a> for details.
+
+To test whether `doxysearch.cgi` is accessible, start your web browser,
+point it to the URL of the binary, and add `?test` at the end:
+
+ http://yoursite.com/path/to/cgi/doxysearch.cgi?test
+
+You should get the following message:
+
+ Test failed: cannot find search index doxysearch.db
+
+If you use Internet Explorer you may be prompted to download a file,
+which will then contain this message.
+
+Since we have not yet created or installed a `doxysearch.db`, it is fine for the
+test to fail for this reason. How to correct this is discussed in the next section.
+
+Before continuing with the next section, add the above
+URL (without the `?test` part) to the `SEARCHENGINE_URL` tag in
+doxygen's configuration file:
+
+ SEARCHENGINE_URL = http://yoursite.com/path/to/cgi/doxysearch.cgi
+
+\subsection extsearch_single Single project index
+
+To use the external search option, make sure the following options are enabled
+in doxygen's configuration file:
+
+ SEARCHENGINE = YES
+ SERVER_BASED_SEARCH = YES
+ EXTERNAL_SEARCH = YES
+
+This will make doxygen generate a file called `searchdata.xml` in the output
+directory (configured with \ref cfg_output_directory "OUTPUT_DIRECTORY").
+You can change the file name (and location) with the
+\ref cfg_searchdata_file "SEARCHDATA_FILE" option.
+
+The next step is to put the raw search data into an index for efficient
+searching. You can use `doxyindexer` for this. Simply run it from the command
+line:
+
+ doxyindexer searchdata.xml
+
+This will create a directory called `doxysearch.db` with some files in it.
+By default the directory will be created at the location from which doxyindexer
+was started, but you can change the directory using the `-o` option.
+
+Copy the `doxysearch.db` directory to the directory where
+`doxysearch.cgi` is located and rerun the browser test by pointing
+the browser to
+
+ http://yoursite.com/path/to/cgi/doxysearch.cgi?test
+
+You should now get the following message:
+
+ Test successful.
+
+Now you should be able to search for words and symbols from the HTML output.
+
+\subsection extsearch_multi Multi project index
+
+In case you have two doxygen projects A and B where B depends on A via a
+tag file, i.e. the configuration of project A says:
+
+ GENERATE_TAGFILE = A.tag
+
+and the configuration of project B has its dependency on A configured as
+follows:
+
+ TAGFILES = ../project_A/A.tag=../../project_A/html
+
+then it may be desirable to allow searching for words in both projects.
+
+To make this possible all that is needed is to combine the search data
+for both projects into one index, i.e. run
+
+ doxyindexer project_A/searchdata.xml project_B/searchdata.xml
+
+and then copy the resulting `doxysearch.db` to the directory where the
+`doxysearch.cgi` used by project B is located.
+
+In case you also want to link to search results in project B
+from the search page of project A (or in general
+between two projects that are otherwise unrelated),
+you need to give some additional information in order for doxygen to make
+the right links. This is what the
+\ref cfg_extra_search_mappings "EXTRA_SEARCH_MAPPINGS" option is for.
+
+Each project needs to have a tag file defined, i.e. in the above example
+involving projects A and B, project B should also define a tag file:
+
+ GENERATE_TAGFILE = B.tag
+
+then project A can define the mapping as follows:
+
+ EXTRA_SEARCH_MAPPINGS = B.tag=../../project_B/html
+
+With this addition, projects A and B can share the same search database.
+
+@note The mapping defined by `EXTRA_SEARCH_MAPPINGS` is treated as an
+extension of the mappings already defined by `TAGFILES`. In case the same
+tag file is mentioned in both options, the one in `TAGFILES` is used.
+
+\section extsearch_update Updating the index
+
+When you modify the source code, you should re-run doxygen to get up-to-date
+documentation again. When using external searching you also need to update the
+search index by re-running `doxyindexer`. You could wrap the call to doxygen
+and doxyindexer together in a script to make this process easier.
+
+\section extsearch_api Programming interface
+
+Previous sections have assumed you use the tools `doxyindexer`
+and `doxysearch.cgi` to do the indexing and searching, but you could also
+write your own index and search tools if you like.
+
+For this, three interfaces are important:
+- The format of the input for the index tool.
+- The format of the input for the search engine.
+- The format of the output of the search engine.
+
+The next subsections describe these interfaces in more detail.
+
+\subsection extsearch_api_index Indexer input format
+
+The search data produced by doxygen follows the
+<a href="http://wiki.apache.org/solr/UpdateXmlMessages">Solr XML index message</a>
+format.
+
+The input for the indexer is an XML file, which consists of one `<add>` tag containing
+multiple `<doc>` tags, which in turn contain multiple `<field>` tags.
+
+Here is an example of one doc node, which contains the search data and meta data for
+one method:
+
+ <add>
+ ...
+ <doc>
+ <field name="type">function</field>
+ <field name="name">QXmlReader::setDTDHandler</field>
+ <field name="args">(QXmlDTDHandler *handler)=0</field>
+ <field name="tag">qtools.tag</field>
+ <field name="url">de/df6/class_q_xml_reader.html#a0b24b1fe26a4c32a8032d68ee14d5dba</field>
+ <field name="keywords">setDTDHandler QXmlReader::setDTDHandler QXmlReader</field>
+ <field name="text">Sets the DTD handler to handler DTDHandler()</field>
+ </doc>
+ ...
+ </add>
+
+Each field has a name. The following field names are supported:
+- *type*: the type of the search entry; can be one of: source, function, slot,
+ signal, variable, typedef, enum, enumvalue, property, event, related,
+ friend, define, file, namespace, group, package, page, dir
+- *name*: the name of the search entry; for a method this is the qualified name of the method,
+ for a class it is the name of the class, etc.
+- *args*: the parameter list (in case of functions or methods)
+- *tag*: the name of the tag file used for this project.
+- *url*: the (relative) URL to the HTML documentation for this entry.
+- *keywords*: important words that are representative for the entry. When searching for such
+ a keyword, this entry should get a higher rank in the search results.
+- *text*: the documentation associated with the item. Note that only words are present, no markup.
+
+@note Due to the potentially large size of the XML file, it is recommended to use a
+<a href="http://en.wikipedia.org/wiki/Simple_API_for_XML">SAX based parser</a> to process it.
+
+\subsection extsearch_api_search_in Search URL format
+
+When the search engine is invoked from a doxygen generated HTML page, a number of parameters are
+passed to it via the <a href="http://en.wikipedia.org/wiki/Query_string">query string</a>.
+
+The following fields are passed:
+- *q*: the query text as entered by the user
+- *n*: the number of search results requested.
+- *p*: the number of the search page for which to return the results, counting from 0. Each page has *n* results.
+- *cb*: the name of the callback function, used for JSON with padding, see the next section.
+
+From the complete list of search results, the results with (zero-based) index `n*p` up to and including `n*(p+1)-1` should be returned.
+
+Here is an example of what a query looks like:
+
+ http://yoursite.com/path/to/cgi/doxysearch.cgi?q=list&n=20&p=1&cb=dummy
+
+It represents a query for the word 'list' (`q=list`) requesting 20 search results (`n=20`),
+starting with result number 20 (`p=1`) and using callback 'dummy' (`cb=dummy`).
+
+
+@note The values are <a href="http://en.wikipedia.org/wiki/Percent-encoding">URL encoded</a> so they
+have to be decoded before they can be used.
+
+\subsection extsearch_api_search_out Search results format
+
+When invoking the search engine as shown in the previous subsection, it should reply with
+the results. The format of the reply is
+<a href="http://en.wikipedia.org/wiki/JSONP">JSON with padding</a>, which is basically
+a JavaScript object wrapped in a function call. The name of the function should be the name of
+the callback (as passed with the *cb* field in the query).
+
+With the example query shown in the previous subsection, the main structure of the reply should
+look as follows:
+
+ dummy({
+ "hits":179,
+ "first":20,
+ "count":20,
+ "page":1,
+ "pages":9,
+ "query": "list",
+ "items":[
+ ...
+ ]})
+
+The fields have the following meaning:
+- *hits*: the total number of search results (could be more than was requested).
+- *first*: the index of the first result returned: \f$\min(n*p,\mbox{\em hits})\f$.
+- *count*: the actual number of results returned: \f$\min(n,\mbox{\em hits}-\mbox{\em first})\f$.
+- *page*: the page number of the result: \f$p\f$.
+- *pages*: the total number of pages: \f$\lceil\frac{\mbox{\em hits}}{n}\rceil\f$.
+- *query*: the query text, as passed with the *q* field in the query.
+- *items*: an array containing the search data per result.
+
+Here is an example of what an element of the *items* array should look like:
+
+ {"type": "function",
+ "name": "QDir::entryInfoList(const QString &nameFilter, int filterSpec=DefaultFilter, int sortSpec=DefaultSort) const",
+ "tag": "qtools.tag",
+ "url": "d5/d8d/class_q_dir.html#a9439ea6b331957f38dbad981c4d050ef",
+ "fragments":[
+ "Returns a <span class=\"hl\">list</span> of QFileInfo objects for all files and directories...",
+ "... pointer to a QFileInfoList The <span class=\"hl\">list</span> is owned by the QDir object...",
+ "... to keep the entries of the <span class=\"hl\">list</span> after a subsequent call to this..."
+ ]
+ },
+
+The fields for such an item have the following meaning:
+- *type*: the type of the item, as found in the field with name "type" in the raw search data.
+- *name*: the name of the item, including the parameter list, as found in the fields with
+ name "name" and "args" in the raw search data.
+- *tag*: the name of the tag file, as found in the field with name "tag" in the raw search data.
+- *url*: the (relative) URL to the documentation, as found in the field with name "url"
+ in the raw search data.
+- *fragments*: an array with 0 or more fragments of text containing words that have been searched for.
+ These words should be wrapped in `<span class="hl">` and `</span>` tags to highlight them
+ in the output.
+
+*/
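
The search interface described in the "Search URL format" and "Search results format" subsections of the page above can be illustrated with a small sketch. It is not how `doxysearch.cgi` is implemented; it only shows, under stated assumptions, how a custom search engine might decode the query string, apply the paging formulas, and wrap the reply in the callback. The function name `build_reply`, the pre-ranked `all_hits` list, and the default values used when a parameter is missing are hypothetical and not taken from doxygen.

```python
import json
import math
from urllib.parse import parse_qs


def build_reply(query_string, all_hits):
    """Build a JSONP reply for a query string with the fields q, n, p and cb.

    all_hits is a hypothetical, already ranked list of item dicts, each
    holding "type", "name", "tag", "url" and "fragments".
    """
    # parse_qs also undoes the URL (percent) encoding of the values.
    params = parse_qs(query_string)
    # The fallback values below are assumptions made for this sketch only.
    q = params.get("q", [""])[0]
    n = int(params.get("n", ["20"])[0])
    p = int(params.get("p", ["0"])[0])
    cb = params.get("cb", ["dummy"])[0]

    hits = len(all_hits)
    first = min(n * p, hits)        # index of the first result returned
    count = min(n, hits - first)    # number of results actually returned
    pages = math.ceil(hits / n) if n > 0 else 0

    reply = {
        "hits": hits,
        "first": first,
        "count": count,
        "page": p,
        "pages": pages,
        "query": q,
        "items": all_hits[first:first + count],
    }
    # JSON with padding: the JSON object wrapped in a call to the callback.
    return "%s(%s)" % (cb, json.dumps(reply))
```

With the example query from the page (`q=list&n=20&p=1&cb=dummy`) and a list of 179 hits, this sketch produces `first` 20, `count` 20, `page` 1 and `pages` 9, which agrees with the sample reply shown in the page.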