summaryrefslogtreecommitdiffstats
path: root/funtools/doc/pod/funindexes.pod
diff options
context:
space:
mode:
Diffstat (limited to 'funtools/doc/pod/funindexes.pod')
-rw-r--r--funtools/doc/pod/funindexes.pod215
1 files changed, 0 insertions, 215 deletions
diff --git a/funtools/doc/pod/funindexes.pod b/funtools/doc/pod/funindexes.pod
deleted file mode 100644
index e4cabaa..0000000
--- a/funtools/doc/pod/funindexes.pod
+++ /dev/null
@@ -1,215 +0,0 @@
-=pod
-
-=head1 NAME
-
-
-
-B<Funindexes: Using Indexes to Filtering Rows in a Table>
-
-
-
-=head1 SYNOPSIS
-
-
-
-
-
-This document contains a summary of the user interface for
-filtering rows in binary tables with indexes.
-
-
-
-=head1 DESCRIPTION
-
-
-
-
-
-Funtools Table Filtering allows rows in a
-table to be selected based on the values of one or more columns in the
-row. Because the actual filter code is compiled on the fly, it is very
-efficient. For very large files (hundreds of Mb or larger), however,
-evaluating the filter expression on each row can take a long time. Therefore,
-funtools supports index files for columns, which are used automatically during
-filtering to reduce dramatically the number of row evaluations performed.
-The speed increase for indexed filtering can be an order of magnitude or
-more, depending on the size of the file.
-
-
-The funindex program creates a
-index on column in a binary table. For example, to create an index
-for the column pi in the file huge.fits, use:
-
- funindex huge.fits pi
-
-This will create an index named huge_pi.idx.
-
-
-When a filter expression is initialized for row evaluation, funtools
-looks for an index file for each column in the filter expression. If
-found, and if the file modification date of the index file is later
-than that of the data file, then the index will be used to reduce the
-number of rows that are evaluated in the filter. When <A
-HREF="./regions.html">Spatial Region Filtering is part of the
-expression, the columns associated with the region checked for index
-files.
-
-
-If an index file is not available for a given column, then in general,
-all rows must be checked when that column is part of a filter
-expression. This is not true, however, when a non-indexed column is
-part of an AND expression. In this case, only the rows that pass the
-other part of the AND expression need to be checked. Thus, in some cases,
-filtering speed can increase significantly even if all columns are not
-indexed.
-
-
-Also note that certain types of filter expression syntax cannot make
-use of indices. For example, calling functions with column names as
-arguments implies that all rows must be checked against the function
-value. Once again, however, if this function is part of an AND
-expression, then a significant improvement in speed still is possible
-if the other part of the AND expression is indexed.
-
-
-As an example, note below the dramatic speedup in searching a 1 Gb
-file using an AND filter, even when one of the columns (pha) has no
-index:
-
-
-bynars-16: time fundisp huge.fits'[idx_use=0,idx_debug=1,pha=2348&&cir 4000 4000 1]' "x y pha"
- x y pha
------------ ----------- ----------
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
-42.36u 13.07s 6:42.89 13.7%
-
-bynars-17: time fundisp huge.fits'[idx_use=1,idx_debug=1,pha=2348&&cir 4000 4000 1]' "x y pha"
- x y pha
------------ ----------- ----------
-idxeq: [INDEF]
-idxand sort: x[ROW 8037025:8070128] y[ROW 5757665:5792352]
-idxand(1): INDEF [IDX_OR_SORT]
-idxall(1): [IDX_OR_SORT]
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
- 3999.48 4000.47 2348
-1.55u 0.37s 1:19.80 2.4%
-
-
-When all columns are indexed, the increase in speed can be even more dramatic:
-
-bynars-20: time fundisp huge.fits'[idx_use=0,idx_debug=1,pi=770&&cir 4000 4000 1]' "x y pi"
- x y pi
------------ ----------- ----------
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
-42.60u 12.63s 7:28.63 12.3%
-
-bynars-21: time fundisp huge.fits'[idx_use=1,idx_debug=1,pi=770&&cir 4000 4000 1]' "x y pi"
- x y pi
------------ ----------- ----------
-idxeq: pi start=9473025,stop=9492240 => pi[ROW 9473025:9492240]
-idxand sort: x[ROW 8037025:8070128] y[ROW 5757665:5792352]
-idxor sort/merge: pi[ROW 9473025:9492240] [IDX_OR_SORT]
-idxmerge(5): [IDX_OR_SORT] pi[ROW]
-idxall(1): [IDX_OR_SORT]
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
- 3999.48 4000.47 770
-1.67u 0.30s 0:24.76 7.9%
-
-
-
-The miracle of indexed filtering (and indeed, of any indexing) is due
-to the speed of the binary search on the index, which is of order
-log2(n) instead of n. (The funtools binary search method is taken from
-http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary, to whom
-grateful acknowledgement is made.) This means that the larger the
-file, the better the performance. Conversely, it also means that
-for small files, using an index (and the overhead involved) can slow
-filtering down somewhat. Our tests indicate that on a file containing
-a few tens of thousands of rows, indexed filtering can be 10-20
-percent slower. Of course, your mileage will vary with conditions
-(disk access speed, amount of available memory, process load, etc.)
-
-
-Any problem encountered during index processing is supposed to result in
-indexing being turned off, replaced by filtering all rows. You can turn
-filtering off manually by setting the idx_use variable to 0 (in a filter
-expression) or the FILTER_IDX_USE environment variable to 0 (in the global
-environment). Debugging output showing how the indexes are being processed can
-be displayed to stderr by setting the idx_debug variable to 1 (in a filter
-expression) or the FILTER_IDX_DEBUG environment variable to 1 (in the global
-environment).
-
-
-Currently, indexed filtering only works with FITS binary tables and raw
-event files. It does not work with text files. This restriction might be
-removed in a future release.
-
-
-
-=head1 SEE ALSO
-
-
-
-See funtools(n) for a list of Funtools help pages
-
-
-
-=cut