diff options
Diffstat (limited to 'funtools/doc/pod/funidx.pod')
-rw-r--r-- | funtools/doc/pod/funidx.pod | 224 |
1 files changed, 0 insertions, 224 deletions
diff --git a/funtools/doc/pod/funidx.pod b/funtools/doc/pod/funidx.pod deleted file mode 100644 index 37bb712..0000000 --- a/funtools/doc/pod/funidx.pod +++ /dev/null @@ -1,224 +0,0 @@ -=pod - -=head1 NAME - - - -B<Funidx: Using Indexes to Filter Rows in a Table> - - - -=head1 SYNOPSIS - - - - - -This document contains a summary of the user interface for -filtering rows in binary tables with indexes. - - - -=head1 DESCRIPTION - - - - - -Funtools Table Filtering allows rows in a -table to be selected based on the values of one or more columns in the -row. Because the actual filter code is compiled on the fly, it is very -efficient. However, for very large files (hundreds of Mb or larger), -evaluating the filter expression on each row can take a long time. Therefore, -funtools supports index files for columns, which are used automatically during -filtering to reduce dramatically the number of row evaluations performed. -The speed increase for indexed filtering can be an order of magnitude or -more, depending on the size of the file. - - -The funindex program creates an -index on one or more columns in a binary table. For example, to create an index -for the column pi in the file huge.fits, use: - - funindex huge.fits pi - -This will create an index named huge_pi.idx. - - -When a filter expression is initialized for row evaluation, funtools -looks for an index file for each column in the filter expression. If -found, and if the file modification date of the index file is later -than that of the data file, then the index will be used to reduce the -number of rows that are evaluated in the filter. When -Spatial Region Filtering is part of the -expression, the columns associated with the region are checked for index -files. - - -If an index file is not available for a given column, then in general, -all rows must be checked when that column is part of a filter -expression. This is not true, however, when a non-indexed column is -part of an AND expression. In this case, only the rows that pass the -other part of the AND expression need to be checked. Thus, in some cases, -filtering speed can increase significantly even if all columns are not -indexed. - - -Also note that certain types of filter expression syntax cannot make -use of indices. For example, calling functions with column names as -arguments implies that all rows must be checked against the function -value. Once again, however, if this function is part of an AND -expression, then a significant improvement in speed still is possible -if the other part of the AND expression is indexed. - - -For example, note below the dramatic speedup in searching a 1 Gb -file using an AND filter, even when one of the columns (pha) has no -index: - - - time fundisp \ - huge.fits'[idx_activate=0,idx_debug=1,pha=2348&&cir 4000 4000 1]' \ - "x y pha" - x y pha - ---------- ----------- ---------- - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 42.36u 13.07s 6:42.89 13.7% - - time fundisp \ - huge.fits'[idx_activate=1,idx_debug=1,pha=2348&&cir 4000 4000 1]' \ - "x y pha" - x y pha - ---------- ----------- ---------- - idxeq: [INDEF] - idxand sort: x[ROW 8037025:8070128] y[ROW 5757665:5792352] - idxand(1): INDEF [IDX_OR_SORT] - idxall(1): [IDX_OR_SORT] - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 3999.48 4000.47 2348 - 1.55u 0.37s 1:19.80 2.4% - - -When all columns are indexed, the increase in speed can be even more dramatic: - - time fundisp \ - huge.fits'[idx_activate=0,idx_debug=1,pi=770&&cir 4000 4000 1]' \ - "x y pi" - x y pi - ---------- ----------- ---------- - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 42.60u 12.63s 7:28.63 12.3% - - time fundisp \ - huge.fits'[idx_activate=1,idx_debug=1,pi=770&&cir 4000 4000 1]' \ - "x y pi" - x y pi - ---------- ----------- ---------- - idxeq: pi start=9473025,stop=9492240 => pi[ROW 9473025:9492240] - idxand sort: x[ROW 8037025:8070128] y[ROW 5757665:5792352] - idxor sort/merge: pi[ROW 9473025:9492240] [IDX_OR_SORT] - idxmerge(5): [IDX_OR_SORT] pi[ROW] - idxall(1): [IDX_OR_SORT] - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 3999.48 4000.47 770 - 1.67u 0.30s 0:24.76 7.9% - - - -The miracle of indexed filtering (and indeed, of any indexing) is the -speed of the binary search on the index, which is of order log2(n) -instead of n. (The funtools binary search method is taken from -http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary, to whom -grateful acknowledgement is made.) This means that the larger the -file, the better the performance. Conversely, it also means that for -small files, using an index (and the overhead involved) can slow -filtering down somewhat. Our tests indicate that on a file containing -a few tens of thousands of rows, indexed filtering can be 10 to 20 -percent slower than non-indexed filtering. Of course, your mileage -will vary with conditions (disk access speed, amount of available -memory, process load, etc.) - - -Any problem encountered during index processing will result in -indexing being turned off, and replaced by filtering all rows. You can turn -filtering off manually by setting the idx_activate variable to 0 (in a filter -expression) or the FILTER_IDX_ACTIVATE environment variable to 0 (in the global -environment). Debugging output showing how the indexes are being processed can -be displayed to stderr by setting the idx_debug variable to 1 (in a filter -expression) or the FILTER_IDX_DEBUG environment variable to 1 (in the global -environment). - - -Currently, indexed filtering only works with FITS binary tables and raw -event files. It does not work with text files. This restriction might be -removed in a future release. - - - -=head1 SEE ALSO - - - -See funtools(n) for a list of Funtools help pages - - - -=cut |