<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>The Raw Data I/O Pipeline</title>
  </head>

  <body>
    <h1>The Raw Data I/O Pipeline</h1>

    <p>The HDF5 raw data pipeline is a complicated beast that handles
      all aspects of raw data storage and transfer of that data
      between the file and the application.  Data can be stored
      contiguously (internal or external), in variable-size external
      segments, or regularly chunked; it can be sparse, extendible,
      and/or compressible. Data transfers must be able to convert from
      one data space to another, convert from one number type to
      another, and perform partial I/O operations. Furthermore,
      applications will expect their common usage of the pipeline to
      perform well.

    <p>To accomplish these goals, the pipeline has been designed in a
      modular way so no single subroutine is overly complicated and so
      functionality can be inserted easily at the appropriate
      locations in the pipeline.  A general pipeline was developed and
      then certain paths through the pipeline were optimized for
      performance.

    <p>We describe only the file-to-memory side of the pipeline since
      the memory-to-file side is a mirror image. We also assume that a
      proper hyperslab of a simple data space is being read from the
      file into a proper hyperslab of a simple data space in memory,
      and that the data type is a compound type which may require
      various number conversions on its members.
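
    <p>As a concrete instance of these assumptions, the sketch below
      reads a hyperslab of a compound-type dataset into the matching
      hyperslab of an application array.  The member names, extents,
      and offsets are hypothetical, and the calls use the current
      HDF5 selection API (the 1998 library spelled some of these
      differently):

    <pre>
/* Sketch: read a 4x4 hyperslab of a compound dataset into the same
 * 4x4 region of a 10x10 array of point_t in memory.  The pipeline
 * converts each member from the file's compound type to this one. */
#include "hdf5.h"

typedef struct {
    int    a;
    double b;
} point_t;

static herr_t
read_slab(hid_t dset, point_t buf[10][10])
{
    hsize_t start[2] = {2, 2};     /* hyperslab offset           */
    hsize_t count[2] = {4, 4};     /* hyperslab size             */
    hsize_t mdims[2] = {10, 10};   /* extent of the memory array */
    herr_t  status;

    /* Compound memory type mirroring point_t. */
    hid_t mtype = H5Tcreate(H5T_COMPOUND, sizeof(point_t));
    H5Tinsert(mtype, "a", HOFFSET(point_t, a), H5T_NATIVE_INT);
    H5Tinsert(mtype, "b", HOFFSET(point_t, b), H5T_NATIVE_DOUBLE);

    /* Select the same-shaped hyperslab in the file and in memory. */
    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t mspace = H5Screate_simple(2, mdims, NULL);
    H5Sselect_hyperslab(mspace, H5S_SELECT_SET, start, NULL, count, NULL);

    status = H5Dread(dset, mtype, mspace, fspace, H5P_DEFAULT, buf);

    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Tclose(mtype);
    return status;
}
    </pre>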

      <img alt="Figure 1" src="pipe1.gif">

    <p>The diagrams should be read from the top down. Line A
      in the figure above shows that <code>H5Dread()</code> copies
      data from a hyperslab of a file dataset to a hyperslab of an
      application buffer by calling <code>H5D_read()</code>.
      <code>H5D_read()</code> in turn calls, in a loop,
      <code>H5S_simp_fgath()</code>, <code>H5T_conv_struct()</code>,
      and <code>H5S_simp_mscat()</code>. A temporary buffer, TCONV, is
      loaded with data points from the file, then data type conversion
      is performed on the temporary buffer, and finally data points
      are scattered out to application memory. Thus, data type
      conversion is an in-place operation and data space conversion
      consists of two steps. An additional temporary buffer, BKG, is
      large enough to hold <em>N</em> instances of the destination
      data type, where <em>N</em> is the number of data points that
      fit in the TCONV buffer (which is sized for whichever of the
      source or destination data points is larger).
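
    <p>In outline, the loop strip-mines the request through the TCONV
      buffer.  The sketch below is a simplified stand-in for the loop
      in <code>H5D_read()</code>; the gather, convert, and scatter
      functions are hypothetical placeholders for
      <code>H5S_simp_fgath()</code>, <code>H5T_conv_struct()</code>,
      and <code>H5S_simp_mscat()</code>, which take far more
      arguments:

    <pre>
#include &lt;stddef.h&gt;

/* Hypothetical stand-ins for the pipeline's gather, convert, and
 * scatter stages. */
void gather_from_file(void *tconv, size_t n);            /* file to TCONV */
void convert_in_place(void *tconv, void *bkg, size_t n); /* within TCONV  */
void scatter_to_memory(const void *tconv, size_t n);     /* TCONV to app  */

void
strip_mine(size_t total_points, size_t tconv_capacity,
           void *tconv_buf, void *bkg_buf)
{
    size_t done = 0;

    while (done != total_points) {
        size_t n = total_points - done;
        if (n > tconv_capacity)
            n = tconv_capacity;       /* at most one TCONV load per pass */

        gather_from_file(tconv_buf, n);
        convert_in_place(tconv_buf, bkg_buf, n); /* in-place conversion */
        scatter_to_memory(tconv_buf, n);
        done += n;
    }
}
    </pre>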

    <p>The application sets an upper limit for the size of the TCONV
      buffer and optionally supplies a buffer. If no buffer is
      supplied then one is created with <code>malloc()</code> when the
      pipeline is executed (and only when necessary) and freed when
      the pipeline exits.  The size of the BKG buffer depends on the
      size of the TCONV buffer, and if the application supplies a BKG
      buffer it should be at least as large as the TCONV buffer.  The
      default size for these buffers is one megabyte, but a buffer
      might not be used to full capacity if its size is not an
      integer multiple of the data point size: the larger of the
      source and destination sizes for the TCONV buffer, and the
      destination size alone for the BKG buffer.
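
    <p>In the library's public API this limit and the optional
      buffers are supplied through the dataset transfer property list
      with <code>H5Pset_buffer()</code>.  A minimal sketch, with an
      arbitrary example size:

    <pre>
#include "hdf5.h"

/* Sketch: cap the conversion buffers at 256 kB.  tconv and bkg must
 * each be at least bufsize bytes, or NULL; passing NULL makes the
 * library malloc() that buffer on demand and free it when the
 * pipeline exits, as described above. */
hid_t
make_xfer_plist(void *tconv, void *bkg)
{
    size_t bufsize = 256 * 1024;   /* example limit, not the 1 MB default */

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_buffer(dxpl, bufsize, tconv, bkg);
    return dxpl;   /* pass as the transfer property list to H5Dread() */
}
    </pre>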



    <p>Occasionally the destination data points will be partially
      initialized and the <code>H5Dread()</code> operation should not
      clobber those values.  For instance, the destination type might
      be a struct with members <code>a</code> and <code>b</code> where
      <code>a</code> is already initialized and we're reading
      <code>b</code> from the file.  An extra line, G, is added to the
      pipeline to provide the type conversion functions with the
      existing data.
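
    <p>Using the struct from this example, an application would
      express such a read roughly as follows.  The memory type names
      only <code>b</code>, and <code>H5Pset_preserve()</code> is the
      public call that enables the background path; <code>dset</code>
      is assumed to be an open dataset whose compound file type
      contains both members:

    <pre>
/* Sketch: "a" is already initialized in buf[]; read only "b" without
 * clobbering "a".  H5Pset_preserve() makes the pipeline load the
 * existing destination data into the BKG buffer (Line G) before
 * conversion. */
#include "hdf5.h"

typedef struct {
    int    a;     /* already initialized by the application */
    double b;     /* to be read from the file               */
} pair_t;

herr_t
read_b_only(hid_t dset, pair_t *buf)
{
    hid_t mtype = H5Tcreate(H5T_COMPOUND, sizeof(pair_t));
    H5Tinsert(mtype, "b", HOFFSET(pair_t, b), H5T_NATIVE_DOUBLE);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_preserve(dxpl, 1);      /* don't clobber unread members */

    herr_t status = H5Dread(dset, mtype, H5S_ALL, H5S_ALL, dxpl, buf);

    H5Pclose(dxpl);
    H5Tclose(mtype);
    return status;
}
    </pre>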

      <img alt="Figure 2" src="pipe2.gif">

    <p>It will most likely be quite common that no data type
      conversion is necessary.  In such cases a temporary buffer for
      data type conversion is not needed and data space conversion
      can happen in a single step. In fact, when the source and
      destination data are both contiguous (which is not the case in
      the figure) the loop degenerates to a single iteration.


      <img alt="Figure 3" src="pipe3.gif">

    <p>So far we've looked only at internal contiguous storage, but by
      replacing Line B in Figures 1 and 2 and Line A in Figure 3 with
      Figure 4 the pipeline is able to handle regularly chunked
      objects. Line B of Figure 4 is executed once for each chunk
      that contains data to be read; the chunk's address is found by
      looking up a multi-dimensional key in a chunk B-tree, which has
      one entry per chunk.
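
    <p>For reference, a dataset takes this path when it is created
      with a chunked layout.  A minimal sketch, assuming an open file
      handle <code>file</code> and using the current
      <code>H5Dcreate2()</code> signature:

    <pre>
/* Sketch: a 1000x1000 dataset stored as 100x100 chunks, so the chunk
 * B-tree has one entry for each of the 100 chunks.  A read touching
 * several chunks executes Line B of Figure 4 once per chunk. */
#include "hdf5.h"

hid_t
make_chunked(hid_t file)
{
    hsize_t dims[2]   = {1000, 1000};
    hsize_t chunks[2] = {100, 100};

    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunks);

    hid_t dset = H5Dcreate2(file, "/chunked", H5T_NATIVE_INT,
                            space, H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}
    </pre>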

      <img alt="Figure 4" src="pipe4.gif">

    <p>If a single chunk is requested and the destination buffer is
      the same size/shape as the chunk, then the CHUNK buffer is
      bypassed and the destination buffer is used instead as shown in
      Figure 5.

      <img alt="Figure 5" src="pipe5.gif">

    <hr>
    <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Tue Mar 17 11:13:35 EST 1998 -->
<!-- hhmts start -->
Last modified: Wed Mar 18 10:38:30 EST 1998
<!-- hhmts end -->
  </body>
</html>