summaryrefslogtreecommitdiffstats
path: root/funtools/doc/pod/funcalc.pod
diff options
context:
space:
mode:
Diffstat (limited to 'funtools/doc/pod/funcalc.pod')
-rw-r--r--funtools/doc/pod/funcalc.pod572
1 files changed, 572 insertions, 0 deletions
diff --git a/funtools/doc/pod/funcalc.pod b/funtools/doc/pod/funcalc.pod
new file mode 100644
index 0000000..f63cfc8
--- /dev/null
+++ b/funtools/doc/pod/funcalc.pod
@@ -0,0 +1,572 @@
+=pod
+
+=head1 NAME
+
+
+
+B<funcalc - Funtools calculator (for binary tables)>
+
+
+=head1 SYNOPSIS
+
+
+
+
+
+funcalc [-n] [-a argstr] [-e expr] [-f file] [-l link] [-p prog] <iname> [oname [columns]]
+
+
+
+
+
+=head1 OPTIONS
+
+
+
+
+
+ -a argstr # user arguments to pass to the compiled program
+ -e expr # funcalc expression
+ -f file # file containing funcalc expression
+ -l libs # libs to add to link command
+ -n # output generated code instead of compiling and executing
+ -p prog # generate named program, no execution
+ -u # die if any variable is undeclared (don't auto-declare)
+
+
+
+
+=head1 DESCRIPTION
+
+
+
+
+B<funcalc> is a calculator program that allows arbitrary
+expressions to be constructed, compiled, and executed on columns in a
+Funtools table (FITS binary table or raw event file). It works by
+integrating user-supplied expression(s) into a template C program,
+then compiling and executing the program. B<funcalc> expressions
+are C statements, although some important simplifications (such
+as automatic declaration of variables) are supported.
+
+
+B<funcalc> expressions can be specified in three ways: on the
+command line using the B<-e [expression]> switch, in a file using
+the B<-f [file]> switch, or from stdin (if neither B<-e> nor
+B<-f> is specified). Of course a file containing B<funcalc>
+expressions can be read from stdin.
+
+
+Each invocation of B<funcalc> requires an input Funtools table
+file to be specified as the first command line argument. The output
+Funtools table file is the second optional argument. It is needed only
+if an output FITS file is being created (i.e., in cases where the
+B<funcalc> expression only prints values, no output file is
+needed). If input and output file are both specified, a third optional
+argument can specify the list of columns to activate (using
+FunColumnActivate()). Note
+that B<funcalc> determines whether or not to generate code for
+writing an output file based on the presence or absence of an
+output file argument.
+
+
+A B<funcalc> expression executes on each row of a table and
+consists of one or more C statements that operate on the columns of
+that row (possibly using temporary variables). Within an expression,
+reference is made to a column of the B<current> row using the C
+struct syntax B<cur->[colname]>, e.g. cur->x, cur->pha, etc.
+Local scalar variables can be defined using C declarations at very the
+beginning of the expression, or else they can be defined automatically
+by B<funcalc> (to be of type double). Thus, for example, a swap of
+columns x and y in a table can be performed using either of the
+following equivalent B<funcalc> expressions:
+
+
+ double temp;
+ temp = cur->x;
+ cur->x = cur->y;
+ cur->y = temp;
+
+
+or:
+
+
+ temp = cur->x;
+ cur->x = cur->y;
+ cur->y = temp;
+
+
+When this expression is executed using a command such as:
+
+ funcalc -f swap.expr itest.ev otest.ev
+
+the resulting file will have values of the x and y columns swapped.
+
+
+By default, the data type of the variable for a column is the same as
+the data type of the column as stored in the file. This can be changed
+by appending ":[dtype]" to the first reference to that column. In the
+example above, to force x and y to be output as doubles, specify the
+type 'D' explicitly:
+
+ temp = cur->x:D;
+ cur->x = cur->y:D;
+ cur->y = temp;
+
+
+Data type specifiers follow standard FITS table syntax for defining
+columns using TFORM:
+
+
+=over 4
+
+
+
+
+=item *
+
+A: ASCII characters
+
+
+=item *
+
+B: unsigned 8-bit char
+
+
+=item *
+
+I: signed 16-bit int
+
+
+=item *
+
+U: unsigned 16-bit int (not standard FITS)
+
+
+=item *
+
+J: signed 32-bit int
+
+
+=item *
+
+V: unsigned 32-bit int (not standard FITS)
+
+
+=item *
+
+E: 32-bit float
+
+
+=item *
+
+D: 64-bit float
+
+
+=item *
+
+X: bits (treated as an array of chars)
+
+
+=back
+
+
+Note that only the first reference to a column should contain the
+explicit data type specifier.
+
+
+Of course, it is important to handle the data type of the columns
+correctly. One of the most frequent cause of error in B<funcalc>
+programming is the implicit use of the wrong data type for a column in
+expression. For example, the calculation:
+
+ dx = (cur->x - cur->y)/(cur->x + cur->y);
+
+usually needs to be performed using floating point arithmetic. In
+cases where the x and y columns are integers, this can be done by
+reading the columns as doubles using an explicit type specification:
+
+ dx = (cur->x:D - cur->y:D)/(cur->x + cur->y);
+
+
+Alternatively, it can be done using C type-casting in the expression:
+
+ dx = ((double)cur->x - (double)cur->y)/((double)cur->x + (double)cur->y);
+
+
+
+In addition to accessing columns in the current row, reference also
+can be made to the B<previous> row using B<prev->[colname]>,
+and to the B<next> row using B<next->[colname]>. Note that if
+B<prev->[colname]> is specified in the B<funcalc>
+expression, the very first row is not processed. If
+B<next->[colname]> is specified in the B<funcalc>
+expression, the very last row is not processed. In this way,
+B<prev> and B<next> are guaranteed always to point to valid
+rows. For example, to print out the values of the current x column
+and the previous y column, use the C fprintf function in a
+B<funcalc> expression:
+
+ fprintf(stdout, "%d %d\n", cur->x, prev->y);
+
+
+
+New columns can be specified using the same B<cur->[colname]>
+syntax by appending the column type (and optional tlmin/tlmax/binsiz
+specifiers), separated by colons. For example, cur->avg:D will define
+a new column of type double. Type specifiers are the same those
+used above to specify new data types for existing columns.
+
+
+For example, to create and output a new column that is the average value of the
+x and y columns, a new "avg" column can be defined:
+
+ cur->avg:D = (cur->x + cur->y)/2.0
+
+Note that the final ';' is not required for single-line expressions.
+
+
+As with FITS TFORM data type specification, the column data type
+specifier can be preceded by a numeric count to define an array, e.g.,
+"10I" means a vector of 10 short ints, "2E" means two single precision
+floats, etc. A new column only needs to be defined once in a
+B<funcalc> expression, after which it can be used without
+re-specifying the type. This includes reference to elements of a
+column array:
+
+
+ cur->avg[0]:2D = (cur->x + cur->y)/2.0;
+ cur->avg[1] = (cur->x - cur->y)/2.0;
+
+
+
+The 'X' (bits) data type is treated as a char array of dimension
+(numeric_count/8), i.e., 16X is processed as a 2-byte char array. Each
+8-bit array element is accessed separately:
+
+ cur->stat[0]:16X = 1;
+ cur->stat[1] = 2;
+
+Here, a 16-bit column is created with the MSB is set to 1 and the LSB set to 2.
+
+
+By default, all processed rows are written to the specified output
+file. If you want to skip writing certain rows, simply execute the C
+"continue" statement at the end of the B<funcalc> expression,
+since the writing of the row is performed immediately after the
+expression is executed. For example, to skip writing rows whose
+average is the same as the current x value:
+
+
+ cur->avg[0]:2D = (cur->x + cur->y)/2.0;
+ cur->avg[1] = (cur->x - cur->y)/2.0;
+ if( cur->avg[0] == cur->x )
+ continue;
+
+
+
+If no output file argument is specified on the B<funcalc> command
+line, no output file is opened and no rows are written. This is useful
+in expressions that simply print output results instead of generating
+a new file:
+
+ fpv = (cur->av3:D-cur->av1:D)/(cur->av1+cur->av2:D+cur->av3);
+ fbv = cur->av2/(cur->av1+cur->av2+cur->av3);
+ fpu = ((double)cur->au3-cur->au1)/((double)cur->au1+cur->au2+cur->au3);
+ fbu = cur->au2/(double)(cur->au1+cur->au2+cur->au3);
+ fprintf(stdout, "%f\t%f\t%f\t%f\n", fpv, fbv, fpu, fbu);
+
+In the above example, we use both explicit type specification
+(for "av" columns) and type casting (for "au" columns) to ensure that
+all operations are performed in double precision.
+
+
+When an output file is specified, the selected input table is
+processed and output rows are copied to the output file. Note that
+the output file can be specified as "stdout" in order to write the
+output rows to the standard output. If the output file argument is
+passed, an optional third argument also can be passed to specify which
+columns to process.
+
+
+In a FITS binary table, it sometimes is desirable to copy all of the
+other FITS extensions to the output file as well. This can be done by
+appending a '+' sign to the name of the extension in the input file
+name. See B<funtable> for a related example.
+
+
+B<funcalc> works by integrating the user-specified expression
+into a template C program called tabcalc.c.
+The completed program then is compiled and executed. Variable
+declarations that begin the B<funcalc> expression are placed in
+the local declaration section of the template main program. All other
+lines are placed in the template main program's inner processing
+loop. Other details of program generation are handled
+automatically. For example, column specifiers are analyzed to build a
+C struct for processing rows, which is passed to
+FunColumnSelect() and used
+in FunTableRowGet(). If
+an unknown variable is used in the expression, resulting in a
+compilation error, the program build is retried after defining the
+unknown variable to be of type double.
+
+
+Normally, B<funcalc> expression code is added to
+B<funcalc> row processing loop. It is possible to add code
+to other parts of the program by placing this code inside
+special directives of the form:
+
+ [directive name]
+ ... code goes here ...
+ end
+
+
+The directives are:
+
+
+=over 4
+
+
+
+
+=item *
+
+B<global> add code and declarations in global space, before the main routine.
+
+
+
+=item *
+
+B<local> add declarations (and code) just after the local declarations in
+main
+
+
+
+=item *
+
+B<before> add code just before entering the main row processing loop
+
+
+
+=item *
+
+B<after> add code just after exiting the main row processing loop
+
+
+=back
+
+
+
+Thus, the following B<funcalc> expression will declare global
+variables and make subroutine calls just before and just after the
+main processing loop:
+
+ global
+ double v1, v2;
+ double init(void);
+ double finish(double v);
+ end
+ before
+ v1 = init();
+ end
+ ... process rows, with calculations using v1 ...
+ after
+ v2 = finish(v1);
+ if( v2 < 0.0 ){
+ fprintf(stderr, "processing failed %g -> %g\n", v1, v2);
+ exit(1);
+ }
+ end
+
+Routines such as init() and finish() above are passed to the generated
+program for linking using the B<-l [link directives ...]>
+switch. The string specified by this switch will be added to the link
+line used to build the program (before the funtools library). For
+example, assuming that init() and finish() are in the library
+libmysubs.a in the /opt/special/lib directory, use:
+
+ funcalc -l "-L/opt/special/lib -lmysubs" ...
+
+
+
+User arguments can be passed to a compiled funcalc program using a string
+argument to the "-a" switch. The string should contain all of the
+user arguments. For example, to pass the integers 1 and 2, use:
+
+ funcalc -a "1 2" ...
+
+The arguments are stored in an internal array and are accessed as
+strings via the ARGV(n) macro. For example, consider the following
+expression:
+
+ local
+ int pmin, pmax;
+ end
+
+ before
+ pmin=atoi(ARGV(0));
+ pmax=atoi(ARGV(1));
+ end
+
+ if( (cur->pha >= pmin) && (cur->pha <= pmax) )
+ fprintf(stderr, "%d %d %d\n", cur->x, cur->y, cur->pha);
+
+This expression will print out x, y, and pha values for all rows in which
+the pha value is between the two user-input values:
+
+ funcalc -a '1 12' -f foo snr.ev'[cir 512 512 .1]'
+ 512 512 6
+ 512 512 8
+ 512 512 5
+ 512 512 5
+ 512 512 8
+
+ funcalc -a '5 6' -f foo snr.ev'[cir 512 512 .1]'
+ 512 512 6
+ 512 512 5
+ 512 512 5
+
+
+
+Note that it is the user's responsibility to ensure that the correct
+number of arguments are passed. The ARGV(n) macro returns a NULL if a
+requested argument is outside the limits of the actual number of args,
+usually resulting in a SEGV if processed blindly. To check the
+argument count, use the ARGC macro:
+
+ local
+ long int seed=1;
+ double limit=0.8;
+ end
+
+ before
+ if( ARGC >= 1 ) seed = atol(ARGV(0));
+ if( ARGC >= 2 ) limit = atof(ARGV(1));
+ srand48(seed);
+ end
+
+ if ( drand48() > limit ) continue;
+
+
+
+The macro WRITE_ROW expands to the FunTableRowPut() call that writes
+the current row. It can be used to write the row more than once. In
+addition, the macro NROW expands to the row number currently being
+processed. Use of these two macros is shown in the following example:
+
+ if( cur->pha:I == cur->pi:I ) continue;
+ a = cur->pha;
+ cur->pha = cur->pi;
+ cur->pi = a;
+ cur->AVG:E = (cur->pha+cur->pi)/2.0;
+ cur->NR:I = NROW;
+ if( NROW < 10 ) WRITE_ROW;
+
+
+
+If the B<-p [prog]> switch is specified, the expression is not
+executed. Rather, the generated executable is saved with the specified
+program name for later use.
+
+
+If the B<-n> switch is specified, the expression is not
+executed. Rather, the generated code is written to stdout. This is
+especially useful if you want to generate a skeleton file and add your
+own code, or if you need to check compilation errors. Note that the
+comment at the start of the output gives the compiler command needed
+to build the program on that platform. (The command can change from
+platform to platform because of the use of different libraries,
+compiler switches, etc.)
+
+
+As mentioned previously, B<funcalc> will declare a scalar
+variable automatically (as a double) if that variable has been used
+but not declared. This facility is implemented using a sed script
+named funcalc.sed, which processes the
+compiler output to sense an undeclared variable error. This script
+has been seeded with the appropriate error information for gcc, and for
+cc on Solaris, DecAlpha, and SGI platforms. If you find that automatic
+declaration of scalars is not working on your platform, check this sed
+script; it might be necessary to add to or edit some of the error
+messages it senses.
+
+
+In order to keep the lexical analysis of B<funcalc> expressions
+(reasonably) simple, we chose to accept some limitations on how
+accurately C comments, spaces, and new-lines are placed in the
+generated program. In particular, comments associated with local
+variables declared at the beginning of an expression (i.e., not in a
+B<local...end> block) will usually end up in the inner loop, not
+with the local declarations:
+
+ /* this comment will end up in the wrong place (i.e, inner loop) */
+ double a; /* also in wrong place */
+ /* this will be in the the right place (inner loop) */
+ if( cur->x:D == cur->y:D ) continue; /* also in right place */
+ a = cur->x;
+ cur->x = cur->y;
+ cur->y = a;
+ cur->avg:E = (cur->x+cur->y)/2.0;
+
+Similarly, spaces and new-lines sometimes are omitted or added in a
+seemingly arbitrary manner. Of course, none of these stylistic
+blemishes affect the correctness of the generated code.
+
+
+Because B<funcalc> must analyze the user expression using the data
+file(s) passed on the command line, the input file(s) must be opened
+and read twice: once during program generation and once during
+execution. As a result, it is not possible to use stdin for the
+input file: B<funcalc> cannot be used as a filter. We will
+consider removing this restriction at a later time.
+
+
+Along with C comments, B<funcalc> expressions can have one-line
+internal comments that are not passed on to the generated C
+program. These internal comment start with the B<#> character and
+continue up to the new-line:
+
+ double a; # this is not passed to the generated C file
+ # nor is this
+ a = cur->x;
+ cur->x = cur->y;
+ cur->y = a;
+ /* this comment is passed to the C file */
+ cur->avg:E = (cur->x+cur->y)/2.0;
+
+
+
+As previously mentioned, input columns normally are identified by
+their being used within the inner event loop. There are rare cases
+where you might want to read a column and process it outside the main
+loop. For example, qsort might use a column in its sort comparison
+routine that is not processed inside the inner loop (and therefore not
+implicitly specified as a column to be read). To ensure that such a
+column is read by the event loop, use the B<explicit> keyword.
+The arguments to this keyword specify columns that should be read into
+the input record structure even though they are not mentioned in the
+inner loop. For example:
+
+ explicit pi pha
+
+will ensure that the pi and pha columns are read for each row,
+even if they are not processed in the inner event loop. The B<explicit>
+statement can be placed anywhere.
+
+
+Finally, note that B<funcalc> currently works on expressions
+involving FITS binary tables and raw event files. We will consider
+adding support for image expressions at a later point, if there is
+demand for such support from the community.
+
+
+
+
+=head1 SEE ALSO
+
+
+
+See funtools(n) for a list of Funtools help pages
+
+
+=cut