1 files changed, 564 insertions, 0 deletions
diff --git a/doc/text.html b/doc/text.html
new file mode 100644
index 0000000..f4fdac8
--- /dev/null
+++ b/doc/text.html
@@ -0,0 +1,564 @@
+<!-- =defdoc funtext funtext n -->
+<HTML>
+<HEAD>
+<TITLE>Column-based Text Files</TITLE>
+</HEAD>
+<BODY>
+
+<!-- =section funtext NAME -->
+<H2><A NAME="funtext">Funtext: Support for Column-based Text Files</A></H2>
+
+<!-- =section funtext SYNOPSIS -->
+<H2>Summary</H2>
+<P>
+This document contains a summary of the options for processing column-based
+text files.
+
+<!-- =section funtext DESCRIPTION -->
+<H2>Description</H2>
+
+<P>
+Funtools will automatically sense and process "standard"
+column-based text files as if they were FITS binary tables without any
+change in Funtools syntax. In particular, you can filter text files
+using the same syntax as FITS binary tables:
+<pre>
+  fundisp foo.txt'[cir 512 512 .1]'
+  fundisp -T foo.txt > foo.rdb
+  funtable foo.txt'[pha=1:10,cir 512 512 10]' foo.fits
+</pre>
+
+<P>
+The first example displays a filtered selection of a text file.  The
+second example converts a text file to an RDB file.  The third example
+converts a filtered selection of a text file to a FITS binary table.
+
+<P>
+Text files can also be used in Funtools image programs. In this case,
+you must provide binning parameters (as with raw event files), using
+the bincols keyword specifier:
+
+<pre>
+  bincols=([xname[:tlmin[:tlmax:[binsiz]]]],[yname[:tlmin[:tlmax[:binsiz]]]
+</pre>
+
+For example:
+<pre>
+  funcnts foo'[bincols=(x:1024,y:1024)]' "ann 512 512 0 10 n=10"
+</pre>
+
+<H2>Standard Text Files</H2>
+
+<P>
+Standard text files have the following characteristics:
+
+<ul>
+<li> Optional comment lines start with #
+<li> Optional blank lines are considered comments
+<li> An optional table header consists of the following (in order):
+<ul>
+     <li> a single line of alpha-numeric column names
+     <li> an optional line of unit strings containing the same number of cols
+     <li> an optional line of dashes containing the same number of cols
+</ul>
+<li> Data lines follow the optional header and (for the present) consist of
+     the same number of columns as the header.
+<li> Standard delimiters such as space, tab, comma, semi-colon, and bar.
+</ul>
+
+<P>
+Examples:
+
+<pre>
+  # rdb file
+  foo1	foo2	foo3	foos
+  ----	----	----	----
+  1	2.2	3	xxxx
+  10	20.2	30	yyyy
+
+  # multiple consecutive whitespace and dashes
+  foo1   foo2    foo3 foos
+  ---    ----    ---- ----
+     1    2.2    3    xxxx
+    10   20.2    30   yyyy
+
+  # comma delims and blank lines
+  foo1,foo2,foo3,foos
+
+  1,2.2,3,xxxx
+  10,20.2,30,yyyy
+
+  # bar delims with null values
+  foo1|foo2|foo3|foos
+  1||3|xxxx
+  10|20.2||yyyy
+
+  # header-less data
+  1     2.2   3 xxxx
+  10    20.2 30 yyyy
+</pre>
+
+<P>
+The default set of token delimiters consists of spaces, tabs, commas,
+semi-colons, and vertical bars. Several parsers are used
+simultaneously to analyze a line of text in different ways.  One way
+of analyzing a line is to allow a combination of spaces, tabs, and
+commas to be squashed into a single delimiter (no null values between
+consecutive delimiters). Another way is to allow tab, semi-colon, and
+vertical bar delimiters to support null values, i.e. two consecutive
+delimiters implies a null value (e.g. RDB file).  A successful parser
+is one which returns a consistent number of columns for all rows, with
+each column having a consistent data type.  More than one parser can
+be successful. For now, it is assumed that successful parsers all
+return the same tokens for a given line. (Theoretically, there are
+pathological cases, which will be taken care of as needed). Bad parsers
+are discarded on the fly.
+
+<P>
+If the header does not exist, then names "col1", "col2", etc.  are
+assigned to the columns to allow filtering.  Furthermore, data types
+for each column are determined by the data types found in the columns
+of the first data line, and can be one of the following: string, int,
+and double. Thus, all of the above examples return the following
+display:
+<pre>
+  fundisp foo'[foo1>5]'
+        FOO1                  FOO2       FOO3         FOOS
+  ---------- --------------------- ---------- ------------
+          10           20.20000000         30         yyyy
+</pre>
+
+<H2>Comments Convert to Header Params</H2>
+
+<P>
+Comments which precede data rows are converted into header parameters and
+will be written out as such using funimage or funhead. Two styles of comments
+are recognized:
+
+<P>
+1. FITS-style comments have an equal sign "=" between the keyword and
+value and an optional slash "/" to signify a comment. The strict FITS
+rules on column positions are not enforced. In addition, strings only
+need to be quoted if they contain whitespace. For example, the following
+are valid FITS-style comments:
+
+<pre>
+  # fits0 = 100
+  # fits1 = /usr/local/bin
+  # fits2 = "/usr/local/bin /opt/local/bin"
+  # fits3c = /usr/local/bin /opt/local/bin /usr/bin
+  # fits4c = "/usr/local/bin /opt/local/bin" / path dir
+</pre>
+
+Note that the fits3c comment is not quoted and therefore its value is the
+single token "/usr/local/bin" and the comment is "opt/local/bin /usr/bin".
+This is different from the quoted comment in fits4c.
+
+<P>
+2. Free-form comments can have an optional colon separator between the
+keyword and value. In the absence of quote, all tokens after the
+keyword are part of the value, i.e. no comment is allowed. If a string
+is quoted, then slash "/" after the string will signify a comment.
+For example:
+
+<pre>
+  # com1 /usr/local/bin
+  # com2 "/usr/local/bin /opt/local/bin"
+  # com3 /usr/local/bin /opt/local/bin /usr/bin
+  # com4c "/usr/local/bin /opt/local/bin" / path dir
+  
+  # com11: /usr/local/bin
+  # com12: "/usr/local/bin /opt/local/bin"
+  # com13: /usr/local/bin /opt/local/bin /usr/bin
+  # com14c: "/usr/local/bin /opt/local/bin" / path dir
+</pre>
+  
+<P>
+Note that com3 and com13 are not quoted, so the whole string is part of
+the value, while comz4c and com14c are quoted and have comments following
+the values.
+
+<P>
+Some text files have column name and data type information in the header.
+You can specify the format of column information contained in the
+header using the "hcolfmt=" specification. See below for a detailed
+description.
+
+<H2>Multiple Tables in a Single File</H2>
+
+<P> 
+Multiple tables are supported in a single file. If an RDB-style file
+is sensed, then a ^L (vertical tab) will signify end of
+table. Otherwise, an end of table is sensed when a new header (i.e.,
+all alphanumeric columns) is found. (Note that this heuristic does not
+work for single column tables where the column type is ASCII and the
+table that follows also has only one column.) You also can specify
+characters that signal an end of table condition using the <b>eot=</b>
+keyword. See below for details.
+
+<P>
+You can access the nth table (starting from 1) in a multi-table file
+by enclosing the table number in brackets, as with a FITS extension:
+
+<pre>
+  fundisp foo'[2]'
+</pre>
+The above example will display the second table in the file.
+(Index values start at 1 in oder to maintain logical compatibility
+with FITS files, where extension numbers also start at 1).
+
+
+<H2>TEXT() Specifier</H2>
+
+<P>
+As with ARRAY() and EVENTS() specifiers for raw image arrays and raw
+event lists respectively, you can use TEXT() on text files to pass
+key=value options to the parsers. An empty set of keywords is
+equivalent to not having TEXT() at all, that is:
+
+<pre>
+  fundisp foo
+  fundisp foo'[TEXT()]'
+</pre>
+
+are equivalent. A multi-table index number is placed before the TEXT()
+specifier as the first token, when indexing into a multi-table:
+
+  fundisp foo'[2,TEXT(...)]'
+
+<P>
+The filter specification is placed after the TEXT() specifier, separated
+by a comma, or in an entirely separate bracket:
+
+<pre>
+  fundisp foo'[TEXT(...),circle 512 512 .1]'
+  fundisp foo'[2,TEXT(...)][circle 512 512 .1]'
+</pre>
+
+<H2>Text() Keyword Options</H2>
+
+<P> 
+The following is a list of keywords that can be used within the TEXT()
+specifier (the first three are the most important):
+
+<DL>
+
+<P>
+<DT> delims="[delims]"
+<DD>Specify token delimiters for this file. Only a single parser having these
+delimiters will be used to process the file.
+<pre>
+  fundisp foo.fits'[TEXT(delims="!")]'
+  fundisp foo.fits'[TEXT(delims="\t%")]'
+</pre>
+
+<P>
+<DT> comchars="[comchars]"
+<DD> Specify comment characters. You must include "\n" to allow blank lines.
+These comment characters will be used for all standard parsers (unless delims
+are also specified).
+<pre>
+  fundisp foo.fits'[TEXT(comchars="!\n")]'
+</pre>
+
+<P>
+<DT> cols="[name1:type1 ...]"
+<DD> Specify names and data type of columns. This overrides header
+names and/or data types in the first data row or default names and
+data types for header-less tables.
+<pre>
+  fundisp foo.fits'[TEXT(cols="x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e")]'
+</pre>
+<P>
+If the column specifier is the only keyword, then the cols= is not
+required (in analogy with EVENTS()):
+<pre>
+  fundisp foo.fits'[TEXT(x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e)]'
+</pre>
+Of course, an index is allowed in this case:
+<pre>
+  fundisp foo.fits'[2,TEXT(x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e)]'
+</pre>
+
+<P>
+<DT> eot="[eot delim]"
+<DD> Specify end of table string specifier for multi-table files. RDB
+files support ^L. The end of table specifier is a string and the whole
+string must be found alone on a line to signify EOT. For example:
+<pre>
+  fundisp foo.fits'[TEXT(eot="END")]' 
+</pre>
+will end the table when a line contains "END" is found. Multiple lines
+are supported, so that:
+<pre>
+  fundisp foo.fits'[TEXT(eot="END\nGAME")]'
+</pre>
+will end the table when a line contains "END" followed by a line
+containing "GAME".
+<P>
+In the absence of an EOT delimiter, a new table will be sensed when a new
+header (all alphanumeric columns) is found.
+
+<P>
+<DT> null1="[datatype]"
+<DD> Specify data type of a single null value in row 1.
+Since column data types are determined by the first row, a null value
+in that row will result in an error and a request to specify names and
+data types using cols=. If you only have a one null in row 1, you don't
+need to specify all names and columns. Instead, use null1="type" to
+specify its data type.
+
+<P>
+<DT> alen=[n]
+<DD>Specify size in bytes for ASCII type columns.
+FITS binary tables only support fixed length ASCII columns, so a
+size value must be specified. The default is 16 bytes.
+
+<P>
+<DT> nullvalues=["true"|"false"]
+<DD>Specify whether to expect null values.
+Give the parsers a hint as to whether null values should be allowed. The
+default is to try to determine this from the data.
+
+<P>
+<DT> whitespace=["true"|"false"]
+<DD> Specify whether surrounding white space should be kept as part of
+string tokens.  By default surrounding white space is removed from
+tokens.
+
+<P>
+<DT> header=["true"|"false"]
+<DD>Specify whether to require a header.  This is needed by tables
+containing all string columns (and with no row containing dashes), in
+order to be able to tell whether the first row is a header or part of
+the data. The default is false, meaning that the first row will be
+data. If a row dashes are present, the previous row is considered the
+column name row.
+
+<P>
+<DT> units=["true"|"false"]
+<DD>Specify whether to require a units line.
+Give the parsers a hint as to whether a row specifying units should be
+allowed. The default is to try to determine this from the data.
+
+<P>
+<DT> i2f=["true"|"false"]
+<DD>Specify whether to allow int to float conversions.
+If a column in row 1 contains an integer value, the data type for that
+column will be set to int. If a subsequent row contains a float in
+that same column, an error will be signaled. This flag specifies that,
+instead of an error, the float should be silently truncated to
+int. Usually, you will want an error to be signaled, so that you can
+specify the data type using cols= (or by changing the value of
+the column in row 1).
+
+<P>
+<DT> comeot=["true"|"false"|0|1|2]
+<DD>Specify whether comment signifies end of table.
+If comeot is 0 or false, then comments do not signify end of table and
+can be interspersed with data rows. If the value is true or 1 (the
+default for standard parsers), then non-blank lines (e.g. lines
+beginning with '#') signify end of table but blanks are allowed
+between rows. If the value is 2, then all comments, including blank
+lines, signify end of table.
+
+<P>
+<DT> lazyeot=["true"|"false"]
+<DD>Specify whether "lazy" end of table should be permitted (default is
+true for standard formats, except rdb format where explicit ^L is required
+between tables). A lazy EOT can occur when a new table starts directly
+after an old one, with no special EOT delimiter. A check for this EOT
+condition is begun when a given row contains all string tokens. If, in
+addition, there is a mismatch between the number of tokens in the
+previous row and this row, or a mismatch between the number of string
+tokens in the prev row and this row, a new table is assumed to have
+been started. For example:
+<PRE>
+  ival1 sval3
+  ----- -----
+  1     two
+  3     four
+
+  jval1 jval2 tval3
+  ----- ----- ------
+  10    20    thirty
+  40    50    sixty
+</PRE>
+Here the line "jval1 ..." contains all string tokens.  In addition,
+the number of tokens in this line (3) differs from the number of
+tokens in the previous line (2). Therefore a new table is assumed
+to have started. Similarly:
+<PRE>
+  ival1 ival2 sval3
+  ----- ----- -----
+  1     2     three
+  4     5     six
+
+  jval1 jval2 tval3
+  ----- ----- ------
+  10    20    thirty
+  40    50    sixty
+</PRE>
+Again, the line "jval1 ..." contains all string tokens. The number of
+string tokens in the previous row (1) differs from the number of
+tokens in the current row(3). We therefore assume a new table as been
+started. This lazy EOT test is not performed if lazyeot is explicitly
+set to false.
+
+<P>
+<DT> hcolfmt=[header column format]
+<DD> Some text files have column name and data type information in the header.
+For example, VizieR catalogs have headers containing both column names
+and data types:
+<PRE>
+  #Column e_Kmag  (F6.3)  ?(k_msigcom) K total magnitude uncertainty (4)  [ucd=ERROR]
+  #Column Rflg    (A3)    (rd_flg) Source of JHK default mag (6)  [ucd=REFER_CODE]
+  #Column Xflg    (I1)    [0,2] (gal_contam) Extended source contamination (10) [ucd=CODE_MISC]
+</PRE>
+
+while Sextractor files have headers containing column names alone:
+
+<PRE>
+  #   1 X_IMAGE         Object position along x                         [pixel]
+  #   2 Y_IMAGE         Object position along y                         [pixel]
+  #   3 ALPHA_J2000     Right ascension of barycenter (J2000)           [deg]
+  #   4 DELTA_J2000     Declination of barycenter (J2000)               [deg]
+</PRE>
+The hcolfmt specification allows you to describe which header lines
+contain column name and data type information. It consists of a string 
+defining the format of the column line, using "$col" (or "$name") to
+specify placement of the column name, "$fmt" to specify placement of the
+data format, and "$skip" to specify tokens to ignore. You also can
+specify tokens explicitly (or, for those users familiar with how
+sscanf works, you can specify scanf skip specifiers using "%*").
+For example, the VizieR hcolfmt above might be specified in several ways:
+<PRE>
+  Column $col ($fmt)    # explicit specification of "Column" string
+  $skip  $col ($fmt)    # skip one token
+  %*s $col  ($fmt)      # skip one string (using scanf format)
+</PRE>
+while the Sextractor format might be specified using:
+<PRE>
+  $skip $col            # skip one token  
+  %*d $col              # skip one int (using scanf format)
+</PRE>
+You must ensure that the hcolfmt statement only senses actual column
+definitions, with no false positives or negatives.  For example, the
+first Sextractor specification, "$skip $col", will consider any header
+line containing two tokens to be a column name specifier, while the
+second one, "%*d $col", requires an integer to be the first token. In
+general, it is preferable to specify formats as explicitly as
+possible.
+
+<P>
+Note that the VizieR-style header info is sensed automatically by the
+funtools standard VizieR-like parser, using the hcolfmt "Column $col
+($fmt)".  There is no need for explicit use of hcolfmt in this case.
+
+<P>
+<DT> debug=["true"|"false"]
+<DD>Display debugging information during parsing.
+
+</DL>
+
+<H2>Environment Variables</H2>
+
+<P> 
+Environment variables are defined to allow many of these TEXT() values to be
+set without having to include them in TEXT() every time a file is processed:
+
+<pre>
+  keyword       environment variable
+  -------       --------------------
+  delims        TEXT_DELIMS
+  comchars      TEXT_COMCHARS
+  cols          TEXT_COLUMNS
+  eot           TEXT_EOT
+  null1         TEXT_NULL1
+  alen          TEXT_ALEN
+  bincols       TEXT_BINCOLS
+  hcolfmt       TEXT_HCOLFMT
+</pre>
+
+<H2>Restrictions and Problems</H2>
+
+<P> 
+As with raw event files, the '+' (copy extensions) specifier is not
+supported for programs such as funtable.
+
+<P>
+String to int and int to string data conversions are allowed by the
+text parsers. This is done more by force of circumstance than by
+conviction: these transitions often happens with VizieR catalogs,
+which we want to support fully. One consequence of allowing these
+transitions is that the text parsers can get confused by columns which
+contain a valid integer in the first row and then switch to a
+string. Consider the following table:
+<PRE>
+  xxx   yyy     zzz
+  ----  ----    ----
+  111   aaa     bbb
+  ccc   222     ddd
+</PRE>
+The xxx column has an integer value in row one a string in row two,
+while the yyy column has the reverse. The parser will erroneously
+treat the first column as having data type int:
+<PRE>
+  fundisp foo.tab
+         XXX          YYY          ZZZ
+  ---------- ------------ ------------
+         111        'aaa'        'bbb'
+  1667457792        '222'        'ddd'
+</PRE>
+while the second column is processed correctly. This situation can be avoided
+in any number of ways, all of which force the data type of the first column
+to be a string. For example, you can edit the file and explicitly quote the
+first row of the column:
+<PRE>
+  xxx   yyy     zzz
+  ----  ----    ----
+  "111" aaa     bbb
+  ccc   222     ddd
+
+  [sh] fundisp foo.tab
+           XXX          YYY          ZZZ
+  ------------ ------------ ------------
+         '111'        'aaa'        'bbb'
+         'ccc'        '222'        'ddd'
+</PRE>
+You can edit the file and explicitly set the data type of the first column:
+<PRE>
+  xxx:3A   yyy  zzz
+  ------   ---- ----
+  111      aaa  bbb
+  ccc      222  ddd
+
+  [sh] fundisp foo.tab
+           XXX          YYY          ZZZ
+  ------------ ------------ ------------
+         '111'        'aaa'        'bbb'
+         'ccc'        '222'        'ddd'
+</PRE>
+You also can explicitly set the column names and data types of all columns,
+without editing the file:
+<PRE>
+  [sh] fundisp foo.tab'[TEXT(xxx:3A,yyy:3A,zzz:3a)]'
+           XXX          YYY          ZZZ
+  ------------ ------------ ------------
+         '111'        'aaa'        'bbb'
+         'ccc'        '222'        'ddd'
+</PRE>
+The issue of data type transitions (which to allow and which to disallow)
+is still under discussion.
+
+<!-- =section funtext SEE ALSO -->
+<!-- =text See funtools(n) for a list of Funtools help pages -->
+<!-- =stop -->
+
+<P>
+<A HREF="./help.html">Go to Funtools Help Index</A>
+
+<H5>Last updated: August 3, 2007</H5>
+
+</BODY>
+</HTML>