summaryrefslogtreecommitdiffstats
path: root/funtools/doc/pod/funcolumnselect.pod
diff options
context:
space:
mode:
Diffstat (limited to 'funtools/doc/pod/funcolumnselect.pod')
-rw-r--r--funtools/doc/pod/funcolumnselect.pod686
1 files changed, 686 insertions, 0 deletions
diff --git a/funtools/doc/pod/funcolumnselect.pod b/funtools/doc/pod/funcolumnselect.pod
new file mode 100644
index 0000000..ce65d38
--- /dev/null
+++ b/funtools/doc/pod/funcolumnselect.pod
@@ -0,0 +1,686 @@
+=pod
+
+=head1 NAME
+
+
+
+B<FunColumnSelect - select Funtools columns>
+
+
+=head1 SYNOPSIS
+
+
+
+
+
+ #include <funtools.h>
+
+ int FunColumnSelect(Fun fun, int size, char *plist,
+ char *name1, char *type1, char *mode1, int offset1,
+ char *name2, char *type2, char *mode2, int offset2,
+ ...,
+ NULL)
+
+ int FunColumnSelectArr(Fun fun, int size, char *plist,
+ char **names, char **types, char **modes,
+ int *offsets, int nargs);
+
+
+
+
+
+=head1 DESCRIPTION
+
+
+
+The B<FunColumnSelect()> routine is used to select the columns
+from a Funtools binary table extension or raw event file for
+processing. This routine allows you to specify how columns in a file
+are to be read into a user record structure or written from a user
+record structure to an output FITS file.
+
+
+The first argument is the Fun handle associated with this set of
+columns. The second argument specifies the size of the user record
+structure into which columns will be read. Typically, the sizeof()
+macro is used to specify the size of a record structure. The third
+argument allows you to specify keyword directives for the selection
+and is described in more detail below.
+
+
+Following the first three required arguments is a variable length list of
+column specifications. Each column specification will consist of four
+arguments:
+
+
+=over 4
+
+
+
+
+=item *
+
+B<name>: the name of the column
+
+
+
+=item *
+
+B<type>: the data type of the column as it will be stored in
+the user record struct (not the data type of the input file). The
+following basic data types are recognized:
+
+
+=over 4
+
+
+
+
+=item *
+
+A: ASCII characters
+
+
+=item *
+
+B: unsigned 8-bit char
+
+
+=item *
+
+I: signed 16-bit int
+
+
+=item *
+
+U: unsigned 16-bit int (not standard FITS)
+
+
+=item *
+
+J: signed 32-bit int
+
+
+=item *
+
+V: unsigned 32-bit int (not standard FITS)
+
+
+=item *
+
+E: 32-bit float
+
+
+=item *
+
+D: 64-bit float
+
+
+=back
+
+
+The syntax used is similar to that which defines the TFORM parameter
+in FITS binary tables. That is, a numeric repeat value can precede
+the type character, so that "10I" means a vector of 10 short ints, "E"
+means a single precision float, etc. Note that the column value from
+the input file will be converted to the specified data type as the
+data is read by
+FunTableRowGet().
+
+
+[ A short digression regarding bit-fields: Special attention is
+required when reading or writing the FITS bit-field type
+("X"). Bit-fields almost always have a numeric repeat character
+preceding the 'X' specification. Usually this value is a multiple of 8
+so that bit-fields fit into an integral number of bytes. For all
+cases, the byte size of the bit-field B is (N+7)/8, where N is the
+numeric repeat character.
+
+
+A bit-field is most easily declared in the user struct as an array of
+type char of size B as defined above. In this case, bytes are simply
+moved from the file to the user space. If, instead, a short or int
+scalar or array is used, then the algorithm for reading the bit-field
+into the user space depends on the size of the data type used along
+with the value of the repeat character. That is, if the user data
+size is equal to the byte size of the bit-field, then the data is
+simply moved (possibly with endian-based byte-swapping) from one to
+the other. If, on the other hand, the data storage is larger than the
+bit-field size, then a data type cast conversion is performed to move
+parts of the bit-field into elements of the array. Examples will help
+make this clear:
+
+
+
+=over 4
+
+
+
+
+=item *
+
+If the file contains a 16X bit-field and user space specifies a 2B
+char array[2], then the bit-field is moved directly into the char array.
+
+
+
+=item *
+
+If the file contains a 16X bit-field and user space specifies a 1I
+scalar short int, then the bit-field is moved directly into the short int.
+
+
+
+=item *
+
+If the file contains a 16X bit-field and user space specifies a 1J
+scalar int, then the bit-field is type-cast to unsigned int before
+being moved (use of unsigned avoids possible sign extension).
+
+
+
+=item *
+
+If the file contains a 16X bit-field and user space specifies a 2J
+int array[2], then the bit-field is handled as 2 chars, each of which
+are type-cast to unsigned int before being moved (use of unsigned
+avoids possible sign extension).
+
+
+
+=item *
+
+If the file contains a 16X bit-field and user space specifies a 1B
+char, then the bit-field is treated as a char, i.e., truncation will
+occur.
+
+
+
+=item *
+
+If the file contains a 16X bit-field and user space specifies a 4J
+int array[4], then the results are undetermined.
+
+
+
+=back
+
+
+For all user data types larger than char, the bit-field is byte-swapped
+as necessary to convert to native format, so that bits in the
+resulting data in user space can be tested, masked, etc. in the same
+way regardless of platform.]
+
+
+In addition to setting data type and size, the B<type>
+specification allows a few ancillary parameters to be set, using the
+full syntax for B<type>:
+
+ [@][n]<type>[[['B']poff]][:[tlmin[:tlmax[:binsiz]]]]
+
+
+
+The special character "@" can be prepended to this specification to
+indicated that the data element is a pointer in the user record,
+rather than an array stored within the record.
+
+
+The [n] value is an integer that specifies the
+number of elements that are in this column (default is 1). TLMIN,
+TLMAX, and BINSIZ values also can be specified for this column after
+the type, separated by colons. If only one such number is specified,
+it is assumed to be TLMAX, and TLMIN and BINSIZ are set to 1.
+
+
+The [poff] value can be used to specify the offset into an
+array. By default, this offset value is set to zero and the data
+specified starts at the beginning of the array. The offset usually
+is specified in terms of the data type of the column. Thus an offset
+specification of [5] means a 20-byte offset if the data type is a
+32-bit integer, and a 40-byte offset for a double. If you want to
+specify a byte offset instead of an offset tied to the column data type,
+precede the offset value with 'B', e.g. [B6] means a 6-bye offset,
+regardless of the column data type.
+
+The [poff] is especially useful in conjunction with the pointer @
+specification, since it allows the data element to anywhere stored
+anywhere in the allocated array. For example, a specification such as
+"@I[2]" specifies the third (i.e., starting from 0) element in the
+array pointed to by the pointer value. A value of "@2I[4]" specifies
+the fifth and sixth values in the array. For example, consider the
+following specification:
+
+ typedef struct EvStruct{
+ short x[4], *atp;
+ } *Event, EventRec;
+ /* set up the (hardwired) columns */
+ FunColumnSelect( fun, sizeof(EventRec), NULL,
+ "2i", "2I ", "w", FUN_OFFSET(Event, x),
+ "2i2", "2I[2]", "w", FUN_OFFSET(Event, x),
+ "at2p", "@2I", "w", FUN_OFFSET(Event, atp),
+ "at2p4", "@2I[4]", "w", FUN_OFFSET(Event, atp),
+ "atp9", "@I[9]", "w", FUN_OFFSET(Event, atp),
+ "atb20", "@I[B20]", "w", FUN_OFFSET(Event, atb),
+ NULL);
+
+Here we have specified the following columns:
+
+
+=over 4
+
+
+
+
+=item *
+
+2i: two short ints in an array which is stored as part the
+record
+
+
+=item *
+
+2i2: the 3rd and 4th elements of an array which is stored
+as part of the record
+
+
+=item *
+
+an array of at least 10 elements, not stored in the record but
+allocated elsewhere, and used by three different columns:
+
+
+=over 4
+
+
+
+
+=item *
+
+at2p: 2 short ints which are the first 2 elements of the allocated array
+
+
+=item *
+
+at2p4: 2 short ints which are the 5th and 6th elements of
+the allocated array
+
+
+=item *
+
+atp9: a short int which is the 10th element of the allocated array
+
+
+=back
+
+
+
+
+=item *
+
+atb20: a short int which is at byte offset 20 of another allocated array
+
+
+=back
+
+
+In this way, several columns can be specified, all of which are in a
+single array. B<NB>: it is the programmer's responsibility to
+ensure that specification of a positive value for poff does not point
+past the end of valid data.
+
+
+
+=item *
+
+B<read/write mode>: "r" means that the column is read from an
+input file into user space by
+FunTableRowGet(), "w" means that
+the column is written to an output file. Both can specified at the same
+time.
+
+
+
+=item *
+
+B<offset>: the offset into the user data to store
+this column. Typically, the macro FUN_OFFSET(recname, colname) is used
+to define the offset into a record structure.
+
+
+=back
+
+
+
+
+When all column arguments have been specified, a final NULL argument
+must added to signal the column selection list.
+
+
+As an alternative to the varargs
+FunColumnSelect()
+routine, a non-varargs routine called
+FunColumnSelectArr()
+also is available. The first three arguments (fun, size, plist) of this
+routine are the same as in
+FunColumnSelect().
+Instead of a variable
+argument list, however,
+FunColumnSelectArr()
+takes 5 additional arguments. The first 4 arrays arguments contain the
+names, types, modes, and offsets, respectively, of the columns being
+selected. The final argument is the number of columns that are
+contained in these arrays. It is the user's responsibility to free
+string space allocated in these arrays.
+
+
+Consider the following example:
+
+ typedef struct evstruct{
+ int status;
+ float pi, pha, *phas;
+ double energy;
+ } *Ev, EvRec;
+
+ FunColumnSelect(fun, sizeof(EvRec), NULL,
+ "status", "J", "r", FUN_OFFSET(Ev, status),
+ "pi", "E", "r", FUN_OFFSET(Ev, pi),
+ "pha", "E", "r", FUN_OFFSET(Ev, pha),
+ "phas", "@9E", "r", FUN_OFFSET(Ev, phas),
+ NULL);
+
+
+Each time a row is read into the Ev struct, the "status" column is
+converted to an int data type (regardless of its data type in the
+file) and stored in the status value of the struct. Similarly, "pi"
+and "pha", and the phas vector are all stored as floats. Note that the
+"@" sign indicates that the "phas" vector is a pointer to a 9 element
+array, rather than an array allocated in the struct itself. The row
+record can then be processed as required:
+
+ /* get rows -- let routine allocate the row array */
+ while( (ebuf = (Ev)FunTableRowGet(fun, NULL, MAXROW, NULL, &got)) ){
+ /* process all rows */
+ for(i=0; i<got; i++){
+ /* point to the i'th row */
+ ev = ebuf+i;
+ ev->pi = (ev->pi+.5);
+ ev->pha = (ev->pi-.5);
+ }
+
+
+
+FunColumnSelect()
+can also be called to define "writable" columns in order to generate a FITS
+Binary Table, without reference to any input columns. For
+example, the following will generate a 4-column FITS binary table when
+FunTableRowPut() is used to
+write Ev records:
+
+
+ typedef struct evstruct{
+ int status;
+ float pi, pha
+ double energy;
+ } *Ev, EvRec;
+
+ FunColumnSelect(fun, sizeof(EvRec), NULL,
+ "status", "J", "w", FUN_OFFSET(Ev, status),
+ "pi", "E", "w", FUN_OFFSET(Ev, pi),
+ "pha", "E", "w", FUN_OFFSET(Ev, pha),
+ "energy", "D", "w", FUN_OFFSET(Ev, energy),
+ NULL);
+
+All columns are declared to be write-only, so presumably the column
+data is being generated or read from some other source.
+
+
+In addition,
+FunColumnSelect()
+can be called to define B<both> "readable" and "writable" columns.
+In this case, the "read" columns
+are associated with an input file, while the "write" columns are
+associated with the output file. Of course, columns can be specified as both
+"readable" and "writable", in which case they are read from input
+and (possibly modified data values are) written to the output.
+The
+FunColumnSelect()
+call itself is made by passing the input Funtools handle, and it is
+assumed that the output file has been opened using this input handle
+as its
+Funtools reference handle.
+
+
+Consider the following example:
+
+ typedef struct evstruct{
+ int status;
+ float pi, pha, *phas;
+ double energy;
+ } *Ev, EvRec;
+
+ FunColumnSelect(fun, sizeof(EvRec), NULL,
+ "status", "J", "r", FUN_OFFSET(Ev, status),
+ "pi", "E", "rw", FUN_OFFSET(Ev, pi),
+ "pha", "E", "rw", FUN_OFFSET(Ev, pha),
+ "phas", "@9E", "rw", FUN_OFFSET(Ev, phas),
+ "energy", "D", "w", FUN_OFFSET(Ev, energy),
+ NULL);
+
+As in the "read" example above, each time an row is read into the Ev
+struct, the "status" column is converted to an int data type
+(regardless of its data type in the file) and stored in the status
+value of the struct. Similarly, "pi" and "pha", and the phas vector
+are all stored as floats. Since the "pi", "pha", and "phas" variables
+are declared as "writable" as well as "readable", they also will be
+written to the output file. Note, however, that the "status" variable
+is declared as "readable" only, and hence it will not be written to
+an output file. Finally, the "energy" column is declared as
+"writable" only, meaning it will not be read from the input file. In
+this case, it can be assumed that "energy" will be calculated in the
+program before being output along with the other values.
+
+
+In these simple cases, only the columns specified as "writable" will
+be output using
+FunTableRowPut(). However,
+it often is the case that you want to merge the user columns back in
+with the input columns, even in cases where not all of the input
+column names are explicitly read or even known. For this important
+case, the B<merge=[type]> keyword is provided in the plist string.
+
+
+The B<merge=[type]> keyword tells Funtools to merge the columns from
+the input file with user columns on output. It is normally used when
+an input and output file are opened and the input file provides the
+Funtools reference handle
+for the output file. In this case, each time
+FunTableRowGet() is called, the
+raw input rows are saved in a special buffer. If
+FunTableRowPut() then is called
+(before another call to
+FunTableRowGet()), the contents
+of the raw input rows are merged with the user rows according to the
+value of B<type> as follows:
+
+
+
+=over 4
+
+
+
+
+=item *
+
+B<update>: add new user columns, and update value of existing ones (maintaining the input data type)
+
+
+
+=item *
+
+B<replace>: add new user columns, and replace the data type
+and value of existing ones. (Note that if tlmin/tlmax values are not
+specified in the replacing column, but are specified in the original
+column being replaced, then the original tlmin/tlmax values are used
+in the replacing column.)
+
+
+
+=item *
+
+B<append>: only add new columns, do not "replace" or "update" existing ones
+
+
+=back
+
+
+
+
+Consider the example above. If B<merge=update> is specified in the
+plist string, then "energy" will be added to the input columns, and
+the values of "pi", "pha", and "phas" will be taken from the user
+space (i.e., the values will be updated from the original values, if
+they were changed by the program). The data type for "pi", "pha", and
+"phas" will be the same as in the original file. If
+B<merge=replace> is specified, both the data type and value of
+these three input columns will be changed to the data type and value
+in the user structure. If B<merge=append> is specified, none of
+these three columns will be updated, and only the "energy" column will
+be added. Note that in all cases, "status" will be written from the
+input data, not from the user record, since it was specified as read-only.
+
+
+Standard applications will call
+FunColumnSelect()
+to define user columns. However, if this routine is not called, the
+default behavior is to transfer all input columns into user space. For
+this purpose a default record structure is defined such that each data
+element is properly aligned on a valid data type boundary. This
+mechanism is used by programs such as fundisp and funtable to process
+columns without needing to know the specific names of those columns.
+It is not anticipated that users will need such capabilities (contact
+us if you do!)
+
+
+By default, FunColumnSelect()
+reads/writes rows to/from an "array of structs", where each struct contains
+the column values for a single row of the table. This means that the
+returned values for a given column are not contiguous. You can
+set up the IO to return a "struct of arrays" so that each of the
+returned columns are contiguous by specifying B<org=structofarrays>
+(abbreviation: B<org=soa>) in the plist.
+(The default case is B<org=arrayofstructs> or B<org=aos>.)
+
+
+For example, the default setup to retrieve rows from a table would be
+to define a record structure for a single event and then call
+ FunColumnSelect()
+as follows:
+
+ typedef struct evstruct{
+ short region;
+ double x, y;
+ int pi, pha;
+ double time;
+ } *Ev, EvRec;
+
+ got = FunColumnSelect(fun, sizeof(EvRec), NULL,
+ "x", "D:10:10", mode, FUN_OFFSET(Ev, x),
+ "y", "D:10:10", mode, FUN_OFFSET(Ev, y),
+ "pi", "J", mode, FUN_OFFSET(Ev, pi),
+ "pha", "J", mode, FUN_OFFSET(Ev, pha),
+ "time", "1D", mode, FUN_OFFSET(Ev, time),
+ NULL);
+
+Subsequently, each call to
+FunTableRowGet()
+will return an array of structs, one for each returned row. If instead you
+wanted to read columns into contiguous arrays, you specify B<org=soa>:
+
+ typedef struct aevstruct{
+ short region[MAXROW];
+ double x[MAXROW], y[MAXROW];
+ int pi[MAXROW], pha[MAXROW];
+ double time[MAXROW];
+ } *AEv, AEvRec;
+
+ got = FunColumnSelect(fun, sizeof(AEvRec), "org=soa",
+ "x", "D:10:10", mode, FUN_OFFSET(AEv, x),
+ "y", "D:10:10", mode, FUN_OFFSET(AEv, y),
+ "pi", "J", mode, FUN_OFFSET(AEv, pi),
+ "pha", "J", mode, FUN_OFFSET(AEv, pha),
+ "time", "1D", mode, FUN_OFFSET(AEv, time),
+ NULL);
+
+Note that the only modification to the call is in the plist string.
+
+
+Of course, instead of using staticly allocated arrays, you also can specify
+dynamically allocated pointers:
+
+ /* pointers to arrays of columns (used in struct of arrays) */
+ typedef struct pevstruct{
+ short *region;
+ double *x, *y;
+ int *pi, *pha;
+ double *time;
+ } *PEv, PEvRec;
+
+ got = FunColumnSelect(fun, sizeof(PEvRec), "org=structofarrays",
+ "$region", "@I", mode, FUN_OFFSET(PEv, region),
+ "x", "@D:10:10", mode, FUN_OFFSET(PEv, x),
+ "y", "@D:10:10", mode, FUN_OFFSET(PEv, y),
+ "pi", "@J", mode, FUN_OFFSET(PEv, pi),
+ "pha", "@J", mode, FUN_OFFSET(PEv, pha),
+ "time", "@1D", mode, FUN_OFFSET(PEv, time),
+ NULL);
+
+Here, the actual storage space is either allocated by the user or by the
+FunColumnSelect() call).
+
+
+In all of the above cases, the same call is made to retrieve rows, e.g.:
+
+ buf = (void *)FunTableRowGet(fun, NULL, MAXROW, NULL, &got);
+
+However, the individual data elements are accessed differently.
+For the default case of an "array of structs", the
+individual row records are accessed using:
+
+ for(i=0; i<got; i++){
+ ev = (Ev)buf+i;
+ fprintf(stdout, "%.2f\t%.2f\t%d\t%d\t%.4f\t%.4f\t%21.8f\n",
+ ev->x, ev->y, ev->pi, ev->pha, ev->dx, ev->dy, ev->time);
+ }
+
+For a struct of arrays or a struct of array pointers, we have a single struct
+through which we access individual columns and rows using:
+
+ aev = (AEv)buf;
+ for(i=0; i<got; i++){
+ fprintf(stdout, "%.2f\t%.2f\t%d\t%d\t%.4f\t%.4f\t%21.8f\n",
+ aev->x[i], aev->y[i], aev->pi[i], aev->pha[i],
+ aev->dx[i], aev->dy[i], aev->time[i]);
+ }
+
+Support for struct of arrays in the
+FunTableRowPut()
+call is handled analogously.
+
+
+See the evread example code
+and
+evmerge example code
+for working examples of how
+FunColumnSelect() is used.
+
+
+
+
+=head1 SEE ALSO
+
+
+
+See funtools(n) for a list of Funtools help pages
+
+
+=cut