diff options
Diffstat (limited to 'doc/pod/funcalc.pod')
-rw-r--r-- | doc/pod/funcalc.pod | 572 |
1 files changed, 572 insertions, 0 deletions
diff --git a/doc/pod/funcalc.pod b/doc/pod/funcalc.pod new file mode 100644 index 0000000..f63cfc8 --- /dev/null +++ b/doc/pod/funcalc.pod @@ -0,0 +1,572 @@ +=pod + +=head1 NAME + + + +B<funcalc - Funtools calculator (for binary tables)> + + +=head1 SYNOPSIS + + + + + +funcalc [-n] [-a argstr] [-e expr] [-f file] [-l link] [-p prog] <iname> [oname [columns]] + + + + + +=head1 OPTIONS + + + + + + -a argstr # user arguments to pass to the compiled program + -e expr # funcalc expression + -f file # file containing funcalc expression + -l libs # libs to add to link command + -n # output generated code instead of compiling and executing + -p prog # generate named program, no execution + -u # die if any variable is undeclared (don't auto-declare) + + + + +=head1 DESCRIPTION + + + + +B<funcalc> is a calculator program that allows arbitrary +expressions to be constructed, compiled, and executed on columns in a +Funtools table (FITS binary table or raw event file). It works by +integrating user-supplied expression(s) into a template C program, +then compiling and executing the program. B<funcalc> expressions +are C statements, although some important simplifications (such +as automatic declaration of variables) are supported. + + +B<funcalc> expressions can be specified in three ways: on the +command line using the B<-e [expression]> switch, in a file using +the B<-f [file]> switch, or from stdin (if neither B<-e> nor +B<-f> is specified). Of course a file containing B<funcalc> +expressions can be read from stdin. + + +Each invocation of B<funcalc> requires an input Funtools table +file to be specified as the first command line argument. The output +Funtools table file is the second optional argument. It is needed only +if an output FITS file is being created (i.e., in cases where the +B<funcalc> expression only prints values, no output file is +needed). If input and output file are both specified, a third optional +argument can specify the list of columns to activate (using +FunColumnActivate()). Note +that B<funcalc> determines whether or not to generate code for +writing an output file based on the presence or absence of an +output file argument. + + +A B<funcalc> expression executes on each row of a table and +consists of one or more C statements that operate on the columns of +that row (possibly using temporary variables). Within an expression, +reference is made to a column of the B<current> row using the C +struct syntax B<cur->[colname]>, e.g. cur->x, cur->pha, etc. +Local scalar variables can be defined using C declarations at very the +beginning of the expression, or else they can be defined automatically +by B<funcalc> (to be of type double). Thus, for example, a swap of +columns x and y in a table can be performed using either of the +following equivalent B<funcalc> expressions: + + + double temp; + temp = cur->x; + cur->x = cur->y; + cur->y = temp; + + +or: + + + temp = cur->x; + cur->x = cur->y; + cur->y = temp; + + +When this expression is executed using a command such as: + + funcalc -f swap.expr itest.ev otest.ev + +the resulting file will have values of the x and y columns swapped. + + +By default, the data type of the variable for a column is the same as +the data type of the column as stored in the file. This can be changed +by appending ":[dtype]" to the first reference to that column. In the +example above, to force x and y to be output as doubles, specify the +type 'D' explicitly: + + temp = cur->x:D; + cur->x = cur->y:D; + cur->y = temp; + + +Data type specifiers follow standard FITS table syntax for defining +columns using TFORM: + + +=over 4 + + + + +=item * + +A: ASCII characters + + +=item * + +B: unsigned 8-bit char + + +=item * + +I: signed 16-bit int + + +=item * + +U: unsigned 16-bit int (not standard FITS) + + +=item * + +J: signed 32-bit int + + +=item * + +V: unsigned 32-bit int (not standard FITS) + + +=item * + +E: 32-bit float + + +=item * + +D: 64-bit float + + +=item * + +X: bits (treated as an array of chars) + + +=back + + +Note that only the first reference to a column should contain the +explicit data type specifier. + + +Of course, it is important to handle the data type of the columns +correctly. One of the most frequent cause of error in B<funcalc> +programming is the implicit use of the wrong data type for a column in +expression. For example, the calculation: + + dx = (cur->x - cur->y)/(cur->x + cur->y); + +usually needs to be performed using floating point arithmetic. In +cases where the x and y columns are integers, this can be done by +reading the columns as doubles using an explicit type specification: + + dx = (cur->x:D - cur->y:D)/(cur->x + cur->y); + + +Alternatively, it can be done using C type-casting in the expression: + + dx = ((double)cur->x - (double)cur->y)/((double)cur->x + (double)cur->y); + + + +In addition to accessing columns in the current row, reference also +can be made to the B<previous> row using B<prev->[colname]>, +and to the B<next> row using B<next->[colname]>. Note that if +B<prev->[colname]> is specified in the B<funcalc> +expression, the very first row is not processed. If +B<next->[colname]> is specified in the B<funcalc> +expression, the very last row is not processed. In this way, +B<prev> and B<next> are guaranteed always to point to valid +rows. For example, to print out the values of the current x column +and the previous y column, use the C fprintf function in a +B<funcalc> expression: + + fprintf(stdout, "%d %d\n", cur->x, prev->y); + + + +New columns can be specified using the same B<cur->[colname]> +syntax by appending the column type (and optional tlmin/tlmax/binsiz +specifiers), separated by colons. For example, cur->avg:D will define +a new column of type double. Type specifiers are the same those +used above to specify new data types for existing columns. + + +For example, to create and output a new column that is the average value of the +x and y columns, a new "avg" column can be defined: + + cur->avg:D = (cur->x + cur->y)/2.0 + +Note that the final ';' is not required for single-line expressions. + + +As with FITS TFORM data type specification, the column data type +specifier can be preceded by a numeric count to define an array, e.g., +"10I" means a vector of 10 short ints, "2E" means two single precision +floats, etc. A new column only needs to be defined once in a +B<funcalc> expression, after which it can be used without +re-specifying the type. This includes reference to elements of a +column array: + + + cur->avg[0]:2D = (cur->x + cur->y)/2.0; + cur->avg[1] = (cur->x - cur->y)/2.0; + + + +The 'X' (bits) data type is treated as a char array of dimension +(numeric_count/8), i.e., 16X is processed as a 2-byte char array. Each +8-bit array element is accessed separately: + + cur->stat[0]:16X = 1; + cur->stat[1] = 2; + +Here, a 16-bit column is created with the MSB is set to 1 and the LSB set to 2. + + +By default, all processed rows are written to the specified output +file. If you want to skip writing certain rows, simply execute the C +"continue" statement at the end of the B<funcalc> expression, +since the writing of the row is performed immediately after the +expression is executed. For example, to skip writing rows whose +average is the same as the current x value: + + + cur->avg[0]:2D = (cur->x + cur->y)/2.0; + cur->avg[1] = (cur->x - cur->y)/2.0; + if( cur->avg[0] == cur->x ) + continue; + + + +If no output file argument is specified on the B<funcalc> command +line, no output file is opened and no rows are written. This is useful +in expressions that simply print output results instead of generating +a new file: + + fpv = (cur->av3:D-cur->av1:D)/(cur->av1+cur->av2:D+cur->av3); + fbv = cur->av2/(cur->av1+cur->av2+cur->av3); + fpu = ((double)cur->au3-cur->au1)/((double)cur->au1+cur->au2+cur->au3); + fbu = cur->au2/(double)(cur->au1+cur->au2+cur->au3); + fprintf(stdout, "%f\t%f\t%f\t%f\n", fpv, fbv, fpu, fbu); + +In the above example, we use both explicit type specification +(for "av" columns) and type casting (for "au" columns) to ensure that +all operations are performed in double precision. + + +When an output file is specified, the selected input table is +processed and output rows are copied to the output file. Note that +the output file can be specified as "stdout" in order to write the +output rows to the standard output. If the output file argument is +passed, an optional third argument also can be passed to specify which +columns to process. + + +In a FITS binary table, it sometimes is desirable to copy all of the +other FITS extensions to the output file as well. This can be done by +appending a '+' sign to the name of the extension in the input file +name. See B<funtable> for a related example. + + +B<funcalc> works by integrating the user-specified expression +into a template C program called tabcalc.c. +The completed program then is compiled and executed. Variable +declarations that begin the B<funcalc> expression are placed in +the local declaration section of the template main program. All other +lines are placed in the template main program's inner processing +loop. Other details of program generation are handled +automatically. For example, column specifiers are analyzed to build a +C struct for processing rows, which is passed to +FunColumnSelect() and used +in FunTableRowGet(). If +an unknown variable is used in the expression, resulting in a +compilation error, the program build is retried after defining the +unknown variable to be of type double. + + +Normally, B<funcalc> expression code is added to +B<funcalc> row processing loop. It is possible to add code +to other parts of the program by placing this code inside +special directives of the form: + + [directive name] + ... code goes here ... + end + + +The directives are: + + +=over 4 + + + + +=item * + +B<global> add code and declarations in global space, before the main routine. + + + +=item * + +B<local> add declarations (and code) just after the local declarations in +main + + + +=item * + +B<before> add code just before entering the main row processing loop + + + +=item * + +B<after> add code just after exiting the main row processing loop + + +=back + + + +Thus, the following B<funcalc> expression will declare global +variables and make subroutine calls just before and just after the +main processing loop: + + global + double v1, v2; + double init(void); + double finish(double v); + end + before + v1 = init(); + end + ... process rows, with calculations using v1 ... + after + v2 = finish(v1); + if( v2 < 0.0 ){ + fprintf(stderr, "processing failed %g -> %g\n", v1, v2); + exit(1); + } + end + +Routines such as init() and finish() above are passed to the generated +program for linking using the B<-l [link directives ...]> +switch. The string specified by this switch will be added to the link +line used to build the program (before the funtools library). For +example, assuming that init() and finish() are in the library +libmysubs.a in the /opt/special/lib directory, use: + + funcalc -l "-L/opt/special/lib -lmysubs" ... + + + +User arguments can be passed to a compiled funcalc program using a string +argument to the "-a" switch. The string should contain all of the +user arguments. For example, to pass the integers 1 and 2, use: + + funcalc -a "1 2" ... + +The arguments are stored in an internal array and are accessed as +strings via the ARGV(n) macro. For example, consider the following +expression: + + local + int pmin, pmax; + end + + before + pmin=atoi(ARGV(0)); + pmax=atoi(ARGV(1)); + end + + if( (cur->pha >= pmin) && (cur->pha <= pmax) ) + fprintf(stderr, "%d %d %d\n", cur->x, cur->y, cur->pha); + +This expression will print out x, y, and pha values for all rows in which +the pha value is between the two user-input values: + + funcalc -a '1 12' -f foo snr.ev'[cir 512 512 .1]' + 512 512 6 + 512 512 8 + 512 512 5 + 512 512 5 + 512 512 8 + + funcalc -a '5 6' -f foo snr.ev'[cir 512 512 .1]' + 512 512 6 + 512 512 5 + 512 512 5 + + + +Note that it is the user's responsibility to ensure that the correct +number of arguments are passed. The ARGV(n) macro returns a NULL if a +requested argument is outside the limits of the actual number of args, +usually resulting in a SEGV if processed blindly. To check the +argument count, use the ARGC macro: + + local + long int seed=1; + double limit=0.8; + end + + before + if( ARGC >= 1 ) seed = atol(ARGV(0)); + if( ARGC >= 2 ) limit = atof(ARGV(1)); + srand48(seed); + end + + if ( drand48() > limit ) continue; + + + +The macro WRITE_ROW expands to the FunTableRowPut() call that writes +the current row. It can be used to write the row more than once. In +addition, the macro NROW expands to the row number currently being +processed. Use of these two macros is shown in the following example: + + if( cur->pha:I == cur->pi:I ) continue; + a = cur->pha; + cur->pha = cur->pi; + cur->pi = a; + cur->AVG:E = (cur->pha+cur->pi)/2.0; + cur->NR:I = NROW; + if( NROW < 10 ) WRITE_ROW; + + + +If the B<-p [prog]> switch is specified, the expression is not +executed. Rather, the generated executable is saved with the specified +program name for later use. + + +If the B<-n> switch is specified, the expression is not +executed. Rather, the generated code is written to stdout. This is +especially useful if you want to generate a skeleton file and add your +own code, or if you need to check compilation errors. Note that the +comment at the start of the output gives the compiler command needed +to build the program on that platform. (The command can change from +platform to platform because of the use of different libraries, +compiler switches, etc.) + + +As mentioned previously, B<funcalc> will declare a scalar +variable automatically (as a double) if that variable has been used +but not declared. This facility is implemented using a sed script +named funcalc.sed, which processes the +compiler output to sense an undeclared variable error. This script +has been seeded with the appropriate error information for gcc, and for +cc on Solaris, DecAlpha, and SGI platforms. If you find that automatic +declaration of scalars is not working on your platform, check this sed +script; it might be necessary to add to or edit some of the error +messages it senses. + + +In order to keep the lexical analysis of B<funcalc> expressions +(reasonably) simple, we chose to accept some limitations on how +accurately C comments, spaces, and new-lines are placed in the +generated program. In particular, comments associated with local +variables declared at the beginning of an expression (i.e., not in a +B<local...end> block) will usually end up in the inner loop, not +with the local declarations: + + /* this comment will end up in the wrong place (i.e, inner loop) */ + double a; /* also in wrong place */ + /* this will be in the the right place (inner loop) */ + if( cur->x:D == cur->y:D ) continue; /* also in right place */ + a = cur->x; + cur->x = cur->y; + cur->y = a; + cur->avg:E = (cur->x+cur->y)/2.0; + +Similarly, spaces and new-lines sometimes are omitted or added in a +seemingly arbitrary manner. Of course, none of these stylistic +blemishes affect the correctness of the generated code. + + +Because B<funcalc> must analyze the user expression using the data +file(s) passed on the command line, the input file(s) must be opened +and read twice: once during program generation and once during +execution. As a result, it is not possible to use stdin for the +input file: B<funcalc> cannot be used as a filter. We will +consider removing this restriction at a later time. + + +Along with C comments, B<funcalc> expressions can have one-line +internal comments that are not passed on to the generated C +program. These internal comment start with the B<#> character and +continue up to the new-line: + + double a; # this is not passed to the generated C file + # nor is this + a = cur->x; + cur->x = cur->y; + cur->y = a; + /* this comment is passed to the C file */ + cur->avg:E = (cur->x+cur->y)/2.0; + + + +As previously mentioned, input columns normally are identified by +their being used within the inner event loop. There are rare cases +where you might want to read a column and process it outside the main +loop. For example, qsort might use a column in its sort comparison +routine that is not processed inside the inner loop (and therefore not +implicitly specified as a column to be read). To ensure that such a +column is read by the event loop, use the B<explicit> keyword. +The arguments to this keyword specify columns that should be read into +the input record structure even though they are not mentioned in the +inner loop. For example: + + explicit pi pha + +will ensure that the pi and pha columns are read for each row, +even if they are not processed in the inner event loop. The B<explicit> +statement can be placed anywhere. + + +Finally, note that B<funcalc> currently works on expressions +involving FITS binary tables and raw event files. We will consider +adding support for image expressions at a later point, if there is +demand for such support from the community. + + + + +=head1 SEE ALSO + + + +See funtools(n) for a list of Funtools help pages + + +=cut |