<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>Ragged Arrays</title>
  </head>

  <body>
    <h1>Ragged Arrays</h1>

<table border=1>
<tr><th align=left>
<font color=red>
The H5RA interface is strictly experimental at this time;
the interface may change dramatically, or support for ragged arrays
may be unavailable in future releases.  As a result, future releases
may be unable to retrieve data stored with this interface.
<p><center>Use these functions at your own risk!<br>
Do not create any archives using this interface!</center>
</font>
</th></tr>
</table>

    <h2>1. Introduction</h2>

    <p><b>Ragged arrays should be considered alpha quality. They were
	added to HDF5 to satisfy the needs of the ASCI/DMF vector
	bundle project; the interface and storage methods are likely
	to change in the future in ways that are not backward
	compatible.</b>

    <p>A two-dimensional ragged array has been added to the library,
      built on top of existing functionality.  A ragged array is a
      one-dimensional array of <em>rows</em> where the length of any
      row is independent of the lengths of the other rows.  The
      number of rows and the length of each row can be changed at any
      time, except that the current version does not support
      truncating an array by removing rows.  All elements of the
      ragged array have the same data type and, as with datasets, the
      data is type-converted between memory buffers and files.

    <p>The current implementation works best when most of the rows are
      approximately the same length, since a two-dimensional dataset
      can be created to hold a nominal number of elements from each
      row, with any additional elements stored in a separate dataset
      which implements a heap.

    <p>A ragged array is a composite object implemented as a group
      with three datasets.  The name of the group is the name of the
      ragged array. The <em>raw</em> dataset is a two-dimensional
      array that contains the first <em>N</em> elements of each row,
      where <em>N</em> is determined by the application when the array
      is created.  If most rows have fewer than <em>N</em> elements,
      much of the <em>raw</em> dataset is unused space and internal
      fragmentation may be quite bad.

    <p>The <em>over</em> dataset is a one-dimensional array that
      contains elements from each row that don't fit in the
      <em>raw</em> dataset.

    <p>The <em>meta</em> dataset maintains information about each row
      such as the number of elements in the row, the location of the
      overflow elements in the <em>over</em> dataset (if any), and the 
      amount of space reserved in <em>over</em> for the row.  The
      <em>meta</em> dataset has one entry per row and is where most of 
      the storage overhead is concentrated when rows are relatively
      short.
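
    <p>The division of a row between the three component datasets can
      be sketched in plain C. This is only an in-memory model under
      stated assumptions, not the library's storage code: the names
      <code>ragged_t</code>, <code>meta_t</code>, <code>ragged_put</code>,
      <code>NOMINAL</code> and <code>OVER_CAP</code> are hypothetical,
      the element type is fixed to <code>int</code>, and each row is
      assumed to be written only once so that overflow space is simply
      appended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NROWS    3   /* rows in this toy array (assumption) */
#define NOMINAL  4   /* N: elements of each row held in "raw" */
#define OVER_CAP 64  /* capacity of the toy "over" heap (assumption) */

/* One "meta" entry per row, mirroring the fields described above. */
typedef struct {
    size_t nelmts;    /* number of elements in the row */
    size_t over_off;  /* location of the overflow elements in "over" */
    size_t over_len;  /* space reserved in "over" for the row */
} meta_t;

typedef struct {
    int    raw[NROWS][NOMINAL]; /* first N elements of every row */
    int    over[OVER_CAP];      /* elements that do not fit in "raw" */
    size_t over_used;           /* next free slot in "over" */
    meta_t meta[NROWS];
} ragged_t;

/* Store n elements as row `row`; returns 0 on success, -1 on failure. */
static int ragged_put(ragged_t *ra, size_t row, const int *data, size_t n)
{
    assert(ra != NULL && data != NULL);

    size_t in_raw = n < NOMINAL ? n : NOMINAL; /* part that fits in "raw" */
    size_t extra  = n - in_raw;                /* part that overflows */

    if (row >= NROWS || ra->over_used + extra > OVER_CAP)
        return -1;

    /* Unused slots of a short row stay zero: this is the internal
       fragmentation mentioned in the text. */
    memset(ra->raw[row], 0, sizeof ra->raw[row]);
    memcpy(ra->raw[row], data, in_raw * sizeof *data);
    memcpy(ra->over + ra->over_used, data + in_raw, extra * sizeof *data);

    ra->meta[row].nelmts   = n;
    ra->meta[row].over_off = ra->over_used;
    ra->meta[row].over_len = extra;
    ra->over_used += extra;
    return 0;
}
```

    <p>In this model a row of six elements with <code>NOMINAL</code>
      set to 4 places four elements in <em>raw</em> and two in
      <em>over</em>, while a row of two elements leaves two
      <em>raw</em> slots unused.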

    <h2>2. Opening and Closing</h2>

    <dl>
      <dt><code>hid_t H5RAcreate (hid_t <em>location</em>, const char
	  *<em>name</em>, hid_t <em>type</em>, hid_t
	  <em>plist</em>)</code>
      <dd>This function creates a new ragged array by creating the
	group with the specified name and populating it with the
	component datasets (which should not be accessed
	independently). The dataset creation property list
	<em>plist</em> defines the width of the <em>raw</em> dataset;
	a nominal row is considered to be the width of a chunk.  The
	<em>type</em> argument defines the data type which will be
	stored in the file. A negative value is returned if the array
	cannot be created.

	<br><br>
      <dt><code>hid_t H5RAopen (hid_t <em>location</em>, const char
	  *<em>name</em>)</code>
      <dd>This function opens a ragged array by opening the specified
	group and the component datasets (which should not be accessed
	independently).  A negative value is returned if the array cannot
	be opened.

	<br><br>
      <dt><code>herr_t H5RAclose (hid_t <em>array</em>)</code>
      <dd>All ragged arrays should be closed by calling this
	function.  The group and component datasets will be closed
	automatically by the library.
    </dl>

    <h2>3. Reading and Writing</h2>

    <p>To be as efficient as possible, the ragged array layer
      operates on sets of contiguous rows, and it is to the
      application's advantage to perform I/O on as many rows at a time
      as possible.  These functions take a starting row number and the
      number of rows on which to operate.
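
    <p>Both functions in this section take a per-row <em>size</em>
      array and a per-row <em>buf</em> array. As a rough illustration
      of preparing those arguments for one batched call, the sketch
      below fills them from rows that the application keeps packed
      back to back in memory. The helper <code>build_row_args</code>
      and the type <code>hsize_sketch_t</code> (a stand-in for
      <code>hsize_t</code>, so the example compiles without the HDF5
      headers) are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for hsize_t (assumption, so no HDF5 header is needed). */
typedef unsigned long long hsize_sketch_t;

/*
 * Fill size[i] and buf[i] for nrows rows stored back to back in
 * `packed`, where lengths[i] is the element count of row i.
 * Index 0 of both arrays corresponds to the first row of the batch.
 * Returns the total number of elements covered.
 */
static size_t build_row_args(int *packed, const size_t *lengths, size_t nrows,
                             hsize_sketch_t size[], void *buf[])
{
    assert(packed != NULL && lengths != NULL);

    size_t off = 0;
    for (size_t i = 0; i < nrows; i++) {
        size[i] = lengths[i];   /* elements in this row */
        buf[i]  = packed + off; /* where this row's data starts */
        off    += lengths[i];
    }
    return off;
}
```

    <p>The resulting arrays could then be handed to a single call
      covering all <em>nrows</em> rows rather than to one call per row.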

    <dl>
      <dt><code>herr_t H5RAwrite (hid_t <em>array_id</em>, hssize_t
	  <em>start_row</em>, hsize_t <em>nrows</em>, hid_t
	  <em>type</em>, hsize_t <em>size</em>[], void
	  *<em>buf</em>[])</code>
      <dd>A set of ragged array rows beginning at <em>start_row</em>
	and continuing for <em>nrows</em> is written to the file,
	converting the memory data type <em>type</em> to the file data
	type which was defined when the array was created.  The number 
	of elements to write from each row is specified in the
	<em>size</em> array and the data for each row is pointed to
	from the <em>buf</em> array.  The <em>size</em> and
	<em>buf</em> arrays are indexed so that their first element
	corresponds to the first row on which to operate.

	<br><br>
      <dt><code>herr_t H5RAread (hid_t <em>array_id</em>, hssize_t
	  <em>start_row</em>, hsize_t <em>nrows</em>, hid_t
	  <em>type</em>, hsize_t <em>size</em>[], void
	  *<em>buf</em>[])</code>
      <dd>A set of ragged array rows beginning at <em>start_row</em>
	and continuing for <em>nrows</em> is read from the file,
	converting from the file data type which was defined when the
	array was created to the memory data type <em>type</em>. The
	number of elements to read from each row is specified in the
	<em>size</em> array and the buffers in which to place the
	results are pointed to by the <em>buf</em> array.  On return,
	the <em>size</em> array will contain the actual size of each
	row, which may be different from the requested size.  When the
	requested size is smaller than the actual size, the row will be
	truncated; otherwise the remainder of the output buffer will
	be zero filled.  If a pointer in the <em>buf</em> array is
	null, the library will ignore the corresponding
	<em>size</em> value and allocate a buffer large enough to hold
	the entire row. On failure, this function returns a negative
	value and <em>buf</em> contains the original input values.
    </dl>
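
    <p>The per-row behaviour described for <code>H5RAread</code>
      (truncation, zero fill, allocation for a null buffer, and the
      actual size reported back) can be modelled for a single row as
      follows. This is a sketch of the documented semantics only, with
      an <code>int</code> element type assumed; the function
      <code>read_row_sketch</code> is hypothetical and does not call or
      reproduce the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy one stored row (`row`, `actual` elements) into *buf.
 * On entry *size is the requested element count; on return it is the
 * actual row length. If *buf is NULL the requested size is ignored and
 * a buffer large enough for the whole row is allocated (the caller
 * frees it). Returns 0 on success, -1 on failure with *buf unchanged.
 */
static int read_row_sketch(const int *row, size_t actual,
                           int **buf, size_t *size)
{
    assert(row != NULL && buf != NULL && size != NULL);

    size_t requested = *size;

    if (*buf == NULL) {
        int *fresh = malloc((actual ? actual : 1) * sizeof *fresh);
        if (fresh == NULL)
            return -1;          /* *buf keeps its original (null) value */
        *buf = fresh;
        requested = actual;     /* read the entire row */
    }

    /* Truncate when the request is smaller than the row. */
    size_t ncopy = requested < actual ? requested : actual;
    memcpy(*buf, row, ncopy * sizeof **buf);

    /* Zero fill the remainder when the request is larger than the row. */
    if (requested > actual)
        memset(*buf + actual, 0, (requested - actual) * sizeof **buf);

    *size = actual;             /* report the actual row length */
    return 0;
}
```

    <p>A caller that compares the returned size with the size it
      requested can tell whether a row was truncated and, if so, retry
      with a larger buffer.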
      
<!--
    <hr>
    <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
-->
<!-- Created: Wed Aug 26 14:10:32 EDT 1998 -->
<!-- hhmts start -->
<!--
Last modified: Fri Aug 28 14:27:19 EDT 1998
-->
<!-- hhmts end -->

<hr>
<address>
<a href="mailto:hdfhelp@ncsa.uiuc.edu">HDF Help Desk</a>
</address>

Last modified:  21 October 1998
                                        
  </body>
</html>