diff options
author | William Joye <wjoye@cfa.harvard.edu> | 2016-10-25 20:57:49 (GMT) |
---|---|---|
committer | William Joye <wjoye@cfa.harvard.edu> | 2016-10-25 20:57:49 (GMT) |
commit | d1c4bf158203c4e8ec29fdeb83fd311e36320885 (patch) | |
tree | 15874534e282f67505ce4af5ba805a1ff70ec43e /funtools/doc/idx.html | |
parent | e19a18e035dc4d0e8e215f9b452bb9ef6f58b9d7 (diff) | |
parent | 339420dd5dd874c41f6bab5808291fb4036dd022 (diff) | |
download | blt-d1c4bf158203c4e8ec29fdeb83fd311e36320885.zip blt-d1c4bf158203c4e8ec29fdeb83fd311e36320885.tar.gz blt-d1c4bf158203c4e8ec29fdeb83fd311e36320885.tar.bz2 |
Merge commit '339420dd5dd874c41f6bab5808291fb4036dd022' as 'funtools'
Diffstat (limited to 'funtools/doc/idx.html')
-rw-r--r-- | funtools/doc/idx.html | 216 |
1 files changed, 216 insertions, 0 deletions
diff --git a/funtools/doc/idx.html b/funtools/doc/idx.html new file mode 100644 index 0000000..a08b10b --- /dev/null +++ b/funtools/doc/idx.html @@ -0,0 +1,216 @@ +<!-- =defdoc funidx funidx n --> +<HTML> +<HEAD> +<TITLE>Table Filtering with Indexes</TITLE> +</HEAD> +<BODY> + +<!-- =section funidx NAME --> +<H2><A NAME="funidx">Funidx: Using Indexes to Filter Rows in a Table</A></H2> + +<!-- =section funidx SYNOPSIS --> +<H2>Summary</H2> +<P> +This document contains a summary of the user interface for +filtering rows in binary tables with indexes. + +<!-- =section funidx DESCRIPTION --> +<H2>Description</H2> +<P> +Funtools <A HREF="./filters.html">Table Filtering</A> allows rows in a +table to be selected based on the values of one or more columns in the +row. Because the actual filter code is compiled on the fly, it is very +efficient. However, for very large files (hundreds of Mb or larger), +evaluating the filter expression on each row can take a long time. Therefore, +funtools supports index files for columns, which are used automatically during +filtering to reduce dramatically the number of row evaluations performed. +The speed increase for indexed filtering can be an order of magnitude or +more, depending on the size of the file. + +<P> +The <A HREF="./programs.html#funindex">funindex</A> program creates an +index on one or more columns in a binary table. For example, to create an index +for the column pi in the file huge.fits, use: +<PRE> + funindex huge.fits pi +</PRE> +This will create an index named huge_pi.idx. + +<P> +When a filter expression is initialized for row evaluation, funtools +looks for an index file for each column in the filter expression. If +found, and if the file modification date of the index file is later +than that of the data file, then the index will be used to reduce the +number of rows that are evaluated in the filter. When +<A HREF="./regions.html">Spatial Region Filtering</A> is part of the +expression, the columns associated with the region are checked for index +files. + +<P> +If an index file is not available for a given column, then in general, +all rows must be checked when that column is part of a filter +expression. This is not true, however, when a non-indexed column is +part of an AND expression. In this case, only the rows that pass the +other part of the AND expression need to be checked. Thus, in some cases, +filtering speed can increase significantly even if all columns are not +indexed. + +<P> +Also note that certain types of filter expression syntax cannot make +use of indices. For example, calling functions with column names as +arguments implies that all rows must be checked against the function +value. Once again, however, if this function is part of an AND +expression, then a significant improvement in speed still is possible +if the other part of the AND expression is indexed. + +<P> +For example, note below the dramatic speedup in searching a 1 Gb +file using an AND filter, even when one of the columns (pha) has no +index: + +<PRE> + time fundisp \ + huge.fits'[idx_activate=0,idx_debug=1,pha=2348&&cir 4000 4000 1]' \ + "x y pha" + x y pha + ---------- ----------- ---------- + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 42.36u 13.07s 6:42.89 13.7% + + time fundisp \ + huge.fits'[idx_activate=1,idx_debug=1,pha=2348&&cir 4000 4000 1]' \ + "x y pha" + x y pha + ---------- ----------- ---------- + idxeq: [INDEF] + idxand sort: x[ROW 8037025:8070128] y[ROW 5757665:5792352] + idxand(1): INDEF [IDX_OR_SORT] + idxall(1): [IDX_OR_SORT] + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 3999.48 4000.47 2348 + 1.55u 0.37s 1:19.80 2.4% +</PRE> + +When all columns are indexed, the increase in speed can be even more dramatic: +<PRE> + time fundisp \ + huge.fits'[idx_activate=0,idx_debug=1,pi=770&&cir 4000 4000 1]' \ + "x y pi" + x y pi + ---------- ----------- ---------- + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 42.60u 12.63s 7:28.63 12.3% + + time fundisp \ + huge.fits'[idx_activate=1,idx_debug=1,pi=770&&cir 4000 4000 1]' \ + "x y pi" + x y pi + ---------- ----------- ---------- + idxeq: pi start=9473025,stop=9492240 => pi[ROW 9473025:9492240] + idxand sort: x[ROW 8037025:8070128] y[ROW 5757665:5792352] + idxor sort/merge: pi[ROW 9473025:9492240] [IDX_OR_SORT] + idxmerge(5): [IDX_OR_SORT] pi[ROW] + idxall(1): [IDX_OR_SORT] + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 3999.48 4000.47 770 + 1.67u 0.30s 0:24.76 7.9% +</PRE> + +<P> +The miracle of indexed filtering (and indeed, of any indexing) is the +speed of the binary search on the index, which is of order log2(n) +instead of n. (The funtools binary search method is taken from +http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary, to whom +grateful acknowledgement is made.) This means that the larger the +file, the better the performance. Conversely, it also means that for +small files, using an index (and the overhead involved) can slow +filtering down somewhat. Our tests indicate that on a file containing +a few tens of thousands of rows, indexed filtering can be 10 to 20 +percent slower than non-indexed filtering. Of course, your mileage +will vary with conditions (disk access speed, amount of available +memory, process load, etc.) + +<p> +Any problem encountered during index processing will result in +indexing being turned off, and replaced by filtering all rows. You can turn +filtering off manually by setting the idx_activate variable to 0 (in a filter +expression) or the FILTER_IDX_ACTIVATE environment variable to 0 (in the global +environment). Debugging output showing how the indexes are being processed can +be displayed to stderr by setting the idx_debug variable to 1 (in a filter +expression) or the FILTER_IDX_DEBUG environment variable to 1 (in the global +environment). + +<p> +Currently, indexed filtering only works with FITS binary tables and raw +event files. It does not work with text files. This restriction might be +removed in a future release. + +<!-- =section funidx SEE ALSO --> +<!-- =text See funtools(n) for a list of Funtools help pages --> +<!-- =stop --> + +<P> +<A HREF="./help.html">Go to Funtools Help Index</A> + +<H5>Last updated: August 3, 2007</H5> + +</BODY> +</HTML> |