summaryrefslogtreecommitdiffstats
path: root/funtools/doc/text.html
diff options
context:
space:
mode:
authorWilliam Joye <wjoye@cfa.harvard.edu>2016-10-26 21:13:00 (GMT)
committerWilliam Joye <wjoye@cfa.harvard.edu>2016-10-26 21:13:00 (GMT)
commitda2e3d212171bbe64c1af39114fd067308656990 (patch)
tree9601f7ed15fa1394762124630c12a792bc073ec2 /funtools/doc/text.html
parent76b109ad6d97d19ab835596dc70149ef379f3733 (diff)
downloadblt-da2e3d212171bbe64c1af39114fd067308656990.zip
blt-da2e3d212171bbe64c1af39114fd067308656990.tar.gz
blt-da2e3d212171bbe64c1af39114fd067308656990.tar.bz2
rm funtools for update
Diffstat (limited to 'funtools/doc/text.html')
-rw-r--r--funtools/doc/text.html564
1 files changed, 0 insertions, 564 deletions
diff --git a/funtools/doc/text.html b/funtools/doc/text.html
deleted file mode 100644
index f4fdac8..0000000
--- a/funtools/doc/text.html
+++ /dev/null
@@ -1,564 +0,0 @@
-<!-- =defdoc funtext funtext n -->
-<HTML>
-<HEAD>
-<TITLE>Column-based Text Files</TITLE>
-</HEAD>
-<BODY>
-
-<!-- =section funtext NAME -->
-<H2><A NAME="funtext">Funtext: Support for Column-based Text Files</A></H2>
-
-<!-- =section funtext SYNOPSIS -->
-<H2>Summary</H2>
-<P>
-This document contains a summary of the options for processing column-based
-text files.
-
-<!-- =section funtext DESCRIPTION -->
-<H2>Description</H2>
-
-<P>
-Funtools will automatically sense and process "standard"
-column-based text files as if they were FITS binary tables without any
-change in Funtools syntax. In particular, you can filter text files
-using the same syntax as FITS binary tables:
-<pre>
- fundisp foo.txt'[cir 512 512 .1]'
- fundisp -T foo.txt > foo.rdb
- funtable foo.txt'[pha=1:10,cir 512 512 10]' foo.fits
-</pre>
-
-<P>
-The first example displays a filtered selection of a text file. The
-second example converts a text file to an RDB file. The third example
-converts a filtered selection of a text file to a FITS binary table.
-
-<P>
-Text files can also be used in Funtools image programs. In this case,
-you must provide binning parameters (as with raw event files), using
-the bincols keyword specifier:
-
-<pre>
- bincols=([xname[:tlmin[:tlmax:[binsiz]]]],[yname[:tlmin[:tlmax[:binsiz]]]
-</pre>
-
-For example:
-<pre>
- funcnts foo'[bincols=(x:1024,y:1024)]' "ann 512 512 0 10 n=10"
-</pre>
-
-<H2>Standard Text Files</H2>
-
-<P>
-Standard text files have the following characteristics:
-
-<ul>
-<li> Optional comment lines start with #
-<li> Optional blank lines are considered comments
-<li> An optional table header consists of the following (in order):
-<ul>
- <li> a single line of alpha-numeric column names
- <li> an optional line of unit strings containing the same number of cols
- <li> an optional line of dashes containing the same number of cols
-</ul>
-<li> Data lines follow the optional header and (for the present) consist of
- the same number of columns as the header.
-<li> Standard delimiters such as space, tab, comma, semi-colon, and bar.
-</ul>
-
-<P>
-Examples:
-
-<pre>
- # rdb file
- foo1 foo2 foo3 foos
- ---- ---- ---- ----
- 1 2.2 3 xxxx
- 10 20.2 30 yyyy
-
- # multiple consecutive whitespace and dashes
- foo1 foo2 foo3 foos
- --- ---- ---- ----
- 1 2.2 3 xxxx
- 10 20.2 30 yyyy
-
- # comma delims and blank lines
- foo1,foo2,foo3,foos
-
- 1,2.2,3,xxxx
- 10,20.2,30,yyyy
-
- # bar delims with null values
- foo1|foo2|foo3|foos
- 1||3|xxxx
- 10|20.2||yyyy
-
- # header-less data
- 1 2.2 3 xxxx
- 10 20.2 30 yyyy
-</pre>
-
-<P>
-The default set of token delimiters consists of spaces, tabs, commas,
-semi-colons, and vertical bars. Several parsers are used
-simultaneously to analyze a line of text in different ways. One way
-of analyzing a line is to allow a combination of spaces, tabs, and
-commas to be squashed into a single delimiter (no null values between
-consecutive delimiters). Another way is to allow tab, semi-colon, and
-vertical bar delimiters to support null values, i.e. two consecutive
-delimiters implies a null value (e.g. RDB file). A successful parser
-is one which returns a consistent number of columns for all rows, with
-each column having a consistent data type. More than one parser can
-be successful. For now, it is assumed that successful parsers all
-return the same tokens for a given line. (Theoretically, there are
-pathological cases, which will be taken care of as needed). Bad parsers
-are discarded on the fly.
-
-<P>
-If the header does not exist, then names "col1", "col2", etc. are
-assigned to the columns to allow filtering. Furthermore, data types
-for each column are determined by the data types found in the columns
-of the first data line, and can be one of the following: string, int,
-and double. Thus, all of the above examples return the following
-display:
-<pre>
- fundisp foo'[foo1>5]'
- FOO1 FOO2 FOO3 FOOS
- ---------- --------------------- ---------- ------------
- 10 20.20000000 30 yyyy
-</pre>
-
-<H2>Comments Convert to Header Params</H2>
-
-<P>
-Comments which precede data rows are converted into header parameters and
-will be written out as such using funimage or funhead. Two styles of comments
-are recognized:
-
-<P>
-1. FITS-style comments have an equal sign "=" between the keyword and
-value and an optional slash "/" to signify a comment. The strict FITS
-rules on column positions are not enforced. In addition, strings only
-need to be quoted if they contain whitespace. For example, the following
-are valid FITS-style comments:
-
-<pre>
- # fits0 = 100
- # fits1 = /usr/local/bin
- # fits2 = "/usr/local/bin /opt/local/bin"
- # fits3c = /usr/local/bin /opt/local/bin /usr/bin
- # fits4c = "/usr/local/bin /opt/local/bin" / path dir
-</pre>
-
-Note that the fits3c comment is not quoted and therefore its value is the
-single token "/usr/local/bin" and the comment is "opt/local/bin /usr/bin".
-This is different from the quoted comment in fits4c.
-
-<P>
-2. Free-form comments can have an optional colon separator between the
-keyword and value. In the absence of quote, all tokens after the
-keyword are part of the value, i.e. no comment is allowed. If a string
-is quoted, then slash "/" after the string will signify a comment.
-For example:
-
-<pre>
- # com1 /usr/local/bin
- # com2 "/usr/local/bin /opt/local/bin"
- # com3 /usr/local/bin /opt/local/bin /usr/bin
- # com4c "/usr/local/bin /opt/local/bin" / path dir
-
- # com11: /usr/local/bin
- # com12: "/usr/local/bin /opt/local/bin"
- # com13: /usr/local/bin /opt/local/bin /usr/bin
- # com14c: "/usr/local/bin /opt/local/bin" / path dir
-</pre>
-
-<P>
-Note that com3 and com13 are not quoted, so the whole string is part of
-the value, while comz4c and com14c are quoted and have comments following
-the values.
-
-<P>
-Some text files have column name and data type information in the header.
-You can specify the format of column information contained in the
-header using the "hcolfmt=" specification. See below for a detailed
-description.
-
-<H2>Multiple Tables in a Single File</H2>
-
-<P>
-Multiple tables are supported in a single file. If an RDB-style file
-is sensed, then a ^L (vertical tab) will signify end of
-table. Otherwise, an end of table is sensed when a new header (i.e.,
-all alphanumeric columns) is found. (Note that this heuristic does not
-work for single column tables where the column type is ASCII and the
-table that follows also has only one column.) You also can specify
-characters that signal an end of table condition using the <b>eot=</b>
-keyword. See below for details.
-
-<P>
-You can access the nth table (starting from 1) in a multi-table file
-by enclosing the table number in brackets, as with a FITS extension:
-
-<pre>
- fundisp foo'[2]'
-</pre>
-The above example will display the second table in the file.
-(Index values start at 1 in oder to maintain logical compatibility
-with FITS files, where extension numbers also start at 1).
-
-
-<H2>TEXT() Specifier</H2>
-
-<P>
-As with ARRAY() and EVENTS() specifiers for raw image arrays and raw
-event lists respectively, you can use TEXT() on text files to pass
-key=value options to the parsers. An empty set of keywords is
-equivalent to not having TEXT() at all, that is:
-
-<pre>
- fundisp foo
- fundisp foo'[TEXT()]'
-</pre>
-
-are equivalent. A multi-table index number is placed before the TEXT()
-specifier as the first token, when indexing into a multi-table:
-
- fundisp foo'[2,TEXT(...)]'
-
-<P>
-The filter specification is placed after the TEXT() specifier, separated
-by a comma, or in an entirely separate bracket:
-
-<pre>
- fundisp foo'[TEXT(...),circle 512 512 .1]'
- fundisp foo'[2,TEXT(...)][circle 512 512 .1]'
-</pre>
-
-<H2>Text() Keyword Options</H2>
-
-<P>
-The following is a list of keywords that can be used within the TEXT()
-specifier (the first three are the most important):
-
-<DL>
-
-<P>
-<DT> delims="[delims]"
-<DD>Specify token delimiters for this file. Only a single parser having these
-delimiters will be used to process the file.
-<pre>
- fundisp foo.fits'[TEXT(delims="!")]'
- fundisp foo.fits'[TEXT(delims="\t%")]'
-</pre>
-
-<P>
-<DT> comchars="[comchars]"
-<DD> Specify comment characters. You must include "\n" to allow blank lines.
-These comment characters will be used for all standard parsers (unless delims
-are also specified).
-<pre>
- fundisp foo.fits'[TEXT(comchars="!\n")]'
-</pre>
-
-<P>
-<DT> cols="[name1:type1 ...]"
-<DD> Specify names and data type of columns. This overrides header
-names and/or data types in the first data row or default names and
-data types for header-less tables.
-<pre>
- fundisp foo.fits'[TEXT(cols="x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e")]'
-</pre>
-<P>
-If the column specifier is the only keyword, then the cols= is not
-required (in analogy with EVENTS()):
-<pre>
- fundisp foo.fits'[TEXT(x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e)]'
-</pre>
-Of course, an index is allowed in this case:
-<pre>
- fundisp foo.fits'[2,TEXT(x:I,y:I,pha:I,pi:I,time:D,dx:E,dy:e)]'
-</pre>
-
-<P>
-<DT> eot="[eot delim]"
-<DD> Specify end of table string specifier for multi-table files. RDB
-files support ^L. The end of table specifier is a string and the whole
-string must be found alone on a line to signify EOT. For example:
-<pre>
- fundisp foo.fits'[TEXT(eot="END")]'
-</pre>
-will end the table when a line contains "END" is found. Multiple lines
-are supported, so that:
-<pre>
- fundisp foo.fits'[TEXT(eot="END\nGAME")]'
-</pre>
-will end the table when a line contains "END" followed by a line
-containing "GAME".
-<P>
-In the absence of an EOT delimiter, a new table will be sensed when a new
-header (all alphanumeric columns) is found.
-
-<P>
-<DT> null1="[datatype]"
-<DD> Specify data type of a single null value in row 1.
-Since column data types are determined by the first row, a null value
-in that row will result in an error and a request to specify names and
-data types using cols=. If you only have a one null in row 1, you don't
-need to specify all names and columns. Instead, use null1="type" to
-specify its data type.
-
-<P>
-<DT> alen=[n]
-<DD>Specify size in bytes for ASCII type columns.
-FITS binary tables only support fixed length ASCII columns, so a
-size value must be specified. The default is 16 bytes.
-
-<P>
-<DT> nullvalues=["true"|"false"]
-<DD>Specify whether to expect null values.
-Give the parsers a hint as to whether null values should be allowed. The
-default is to try to determine this from the data.
-
-<P>
-<DT> whitespace=["true"|"false"]
-<DD> Specify whether surrounding white space should be kept as part of
-string tokens. By default surrounding white space is removed from
-tokens.
-
-<P>
-<DT> header=["true"|"false"]
-<DD>Specify whether to require a header. This is needed by tables
-containing all string columns (and with no row containing dashes), in
-order to be able to tell whether the first row is a header or part of
-the data. The default is false, meaning that the first row will be
-data. If a row dashes are present, the previous row is considered the
-column name row.
-
-<P>
-<DT> units=["true"|"false"]
-<DD>Specify whether to require a units line.
-Give the parsers a hint as to whether a row specifying units should be
-allowed. The default is to try to determine this from the data.
-
-<P>
-<DT> i2f=["true"|"false"]
-<DD>Specify whether to allow int to float conversions.
-If a column in row 1 contains an integer value, the data type for that
-column will be set to int. If a subsequent row contains a float in
-that same column, an error will be signaled. This flag specifies that,
-instead of an error, the float should be silently truncated to
-int. Usually, you will want an error to be signaled, so that you can
-specify the data type using cols= (or by changing the value of
-the column in row 1).
-
-<P>
-<DT> comeot=["true"|"false"|0|1|2]
-<DD>Specify whether comment signifies end of table.
-If comeot is 0 or false, then comments do not signify end of table and
-can be interspersed with data rows. If the value is true or 1 (the
-default for standard parsers), then non-blank lines (e.g. lines
-beginning with '#') signify end of table but blanks are allowed
-between rows. If the value is 2, then all comments, including blank
-lines, signify end of table.
-
-<P>
-<DT> lazyeot=["true"|"false"]
-<DD>Specify whether "lazy" end of table should be permitted (default is
-true for standard formats, except rdb format where explicit ^L is required
-between tables). A lazy EOT can occur when a new table starts directly
-after an old one, with no special EOT delimiter. A check for this EOT
-condition is begun when a given row contains all string tokens. If, in
-addition, there is a mismatch between the number of tokens in the
-previous row and this row, or a mismatch between the number of string
-tokens in the prev row and this row, a new table is assumed to have
-been started. For example:
-<PRE>
- ival1 sval3
- ----- -----
- 1 two
- 3 four
-
- jval1 jval2 tval3
- ----- ----- ------
- 10 20 thirty
- 40 50 sixty
-</PRE>
-Here the line "jval1 ..." contains all string tokens. In addition,
-the number of tokens in this line (3) differs from the number of
-tokens in the previous line (2). Therefore a new table is assumed
-to have started. Similarly:
-<PRE>
- ival1 ival2 sval3
- ----- ----- -----
- 1 2 three
- 4 5 six
-
- jval1 jval2 tval3
- ----- ----- ------
- 10 20 thirty
- 40 50 sixty
-</PRE>
-Again, the line "jval1 ..." contains all string tokens. The number of
-string tokens in the previous row (1) differs from the number of
-tokens in the current row(3). We therefore assume a new table as been
-started. This lazy EOT test is not performed if lazyeot is explicitly
-set to false.
-
-<P>
-<DT> hcolfmt=[header column format]
-<DD> Some text files have column name and data type information in the header.
-For example, VizieR catalogs have headers containing both column names
-and data types:
-<PRE>
- #Column e_Kmag (F6.3) ?(k_msigcom) K total magnitude uncertainty (4) [ucd=ERROR]
- #Column Rflg (A3) (rd_flg) Source of JHK default mag (6) [ucd=REFER_CODE]
- #Column Xflg (I1) [0,2] (gal_contam) Extended source contamination (10) [ucd=CODE_MISC]
-</PRE>
-
-while Sextractor files have headers containing column names alone:
-
-<PRE>
- # 1 X_IMAGE Object position along x [pixel]
- # 2 Y_IMAGE Object position along y [pixel]
- # 3 ALPHA_J2000 Right ascension of barycenter (J2000) [deg]
- # 4 DELTA_J2000 Declination of barycenter (J2000) [deg]
-</PRE>
-The hcolfmt specification allows you to describe which header lines
-contain column name and data type information. It consists of a string
-defining the format of the column line, using "$col" (or "$name") to
-specify placement of the column name, "$fmt" to specify placement of the
-data format, and "$skip" to specify tokens to ignore. You also can
-specify tokens explicitly (or, for those users familiar with how
-sscanf works, you can specify scanf skip specifiers using "%*").
-For example, the VizieR hcolfmt above might be specified in several ways:
-<PRE>
- Column $col ($fmt) # explicit specification of "Column" string
- $skip $col ($fmt) # skip one token
- %*s $col ($fmt) # skip one string (using scanf format)
-</PRE>
-while the Sextractor format might be specified using:
-<PRE>
- $skip $col # skip one token
- %*d $col # skip one int (using scanf format)
-</PRE>
-You must ensure that the hcolfmt statement only senses actual column
-definitions, with no false positives or negatives. For example, the
-first Sextractor specification, "$skip $col", will consider any header
-line containing two tokens to be a column name specifier, while the
-second one, "%*d $col", requires an integer to be the first token. In
-general, it is preferable to specify formats as explicitly as
-possible.
-
-<P>
-Note that the VizieR-style header info is sensed automatically by the
-funtools standard VizieR-like parser, using the hcolfmt "Column $col
-($fmt)". There is no need for explicit use of hcolfmt in this case.
-
-<P>
-<DT> debug=["true"|"false"]
-<DD>Display debugging information during parsing.
-
-</DL>
-
-<H2>Environment Variables</H2>
-
-<P>
-Environment variables are defined to allow many of these TEXT() values to be
-set without having to include them in TEXT() every time a file is processed:
-
-<pre>
- keyword environment variable
- ------- --------------------
- delims TEXT_DELIMS
- comchars TEXT_COMCHARS
- cols TEXT_COLUMNS
- eot TEXT_EOT
- null1 TEXT_NULL1
- alen TEXT_ALEN
- bincols TEXT_BINCOLS
- hcolfmt TEXT_HCOLFMT
-</pre>
-
-<H2>Restrictions and Problems</H2>
-
-<P>
-As with raw event files, the '+' (copy extensions) specifier is not
-supported for programs such as funtable.
-
-<P>
-String to int and int to string data conversions are allowed by the
-text parsers. This is done more by force of circumstance than by
-conviction: these transitions often happens with VizieR catalogs,
-which we want to support fully. One consequence of allowing these
-transitions is that the text parsers can get confused by columns which
-contain a valid integer in the first row and then switch to a
-string. Consider the following table:
-<PRE>
- xxx yyy zzz
- ---- ---- ----
- 111 aaa bbb
- ccc 222 ddd
-</PRE>
-The xxx column has an integer value in row one a string in row two,
-while the yyy column has the reverse. The parser will erroneously
-treat the first column as having data type int:
-<PRE>
- fundisp foo.tab
- XXX YYY ZZZ
- ---------- ------------ ------------
- 111 'aaa' 'bbb'
- 1667457792 '222' 'ddd'
-</PRE>
-while the second column is processed correctly. This situation can be avoided
-in any number of ways, all of which force the data type of the first column
-to be a string. For example, you can edit the file and explicitly quote the
-first row of the column:
-<PRE>
- xxx yyy zzz
- ---- ---- ----
- "111" aaa bbb
- ccc 222 ddd
-
- [sh] fundisp foo.tab
- XXX YYY ZZZ
- ------------ ------------ ------------
- '111' 'aaa' 'bbb'
- 'ccc' '222' 'ddd'
-</PRE>
-You can edit the file and explicitly set the data type of the first column:
-<PRE>
- xxx:3A yyy zzz
- ------ ---- ----
- 111 aaa bbb
- ccc 222 ddd
-
- [sh] fundisp foo.tab
- XXX YYY ZZZ
- ------------ ------------ ------------
- '111' 'aaa' 'bbb'
- 'ccc' '222' 'ddd'
-</PRE>
-You also can explicitly set the column names and data types of all columns,
-without editing the file:
-<PRE>
- [sh] fundisp foo.tab'[TEXT(xxx:3A,yyy:3A,zzz:3a)]'
- XXX YYY ZZZ
- ------------ ------------ ------------
- '111' 'aaa' 'bbb'
- 'ccc' '222' 'ddd'
-</PRE>
-The issue of data type transitions (which to allow and which to disallow)
-is still under discussion.
-
-<!-- =section funtext SEE ALSO -->
-<!-- =text See funtools(n) for a list of Funtools help pages -->
-<!-- =stop -->
-
-<P>
-<A HREF="./help.html">Go to Funtools Help Index</A>
-
-<H5>Last updated: August 3, 2007</H5>
-
-</BODY>
-</HTML>