<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>Testing the chunked layout of HDF5</title>
  </head>

  <body>
    <h1>Testing the chunked layout of HDF5</h1>

    <p>These are the results of a study of the chunked layout policy
      in HDF5. A 1000 by 1000 block of integers was written repeatedly
      to a file dataset, extending the dataset with each write, to
      create, in the end, a 5000 by 5000 array of 4-byte integers for
      a total data storage size of 100 million bytes.

    <p>
      <center>
	<img alt="Order that data was written" src="study_p1.gif">
	<br><b>Fig 1: Write-order of Output Blocks</b>
      </center>
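
    <p>For concreteness, the write phase looks roughly like the sketch
      below.  It uses the HDF5 1.8+ C API (H5Dset_extent replaces the
      older H5Dextend; the study itself predates this API), and the
      file and dataset names are illustrative, not from the study.

    <pre>
/* Sketch of the write phase: 1000 by 1000 blocks written in the
 * order of Fig 1, extending the dataset as we go.  File and dataset
 * names are hypothetical. */
#include "hdf5.h"

#define BLK  1000               /* output block edge, in elements */
#define NBLK 5                  /* 5 x 5 blocks -> 5000 x 5000    */

int main(void)
{
    static int buf[BLK][BLK];                 /* one output block */
    hsize_t    dims[2]    = {BLK, BLK};
    hsize_t    maxdims[2] = {H5S_UNLIMITED, H5S_UNLIMITED};
    hsize_t    chunk[2]   = {500, 500};       /* the 500 by 500 case */
    hsize_t    start[2], count[2] = {BLK, BLK};
    hid_t      file, space, dcpl, dset, mspace, fspace;
    int        i, j;

    file  = H5Fcreate("study.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    space = H5Screate_simple(2, dims, maxdims);
    dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);             /* request chunked layout */
    dset  = H5Dcreate(file, "array", H5T_NATIVE_INT, space,
                      H5P_DEFAULT, dcpl, H5P_DEFAULT);
    mspace = H5Screate_simple(2, count, NULL);

    for (i = 0; i &lt; NBLK; i++) {
        for (j = 0; j &lt; NBLK; j++) {
            /* Grow the dataset just enough to cover the new block. */
            hsize_t newdims[2] = {(hsize_t)(i + 1) * BLK,
                                  (hsize_t)(i ? NBLK : j + 1) * BLK};
            H5Dset_extent(dset, newdims);
            fspace = H5Dget_space(dset);      /* refresh after extending */
            start[0] = (hsize_t)i * BLK;
            start[1] = (hsize_t)j * BLK;
            H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL,
                                count, NULL);
            H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace,
                     H5P_DEFAULT, buf);
            H5Sclose(fspace);
        }
    }
    H5Sclose(mspace); H5Sclose(space);
    H5Pclose(dcpl); H5Dclose(dset); H5Fclose(file);
    return 0;
}
    </pre>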

    <p>After the array was written, it was read back in blocks of 500
      by 500 elements in row-major order (that is, the top-left
      quadrant of output block one, then the top-right quadrant of
      output block one, then the top-left quadrant of output block
      two, etc.).
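
    <p>A sketch of the read-back phase, under the same assumptions
      (names match the hypothetical write sketch above):

    <pre>
/* Read the array back in 500 by 500 blocks, row-major order. */
#include "hdf5.h"

#define RD 500                  /* read block edge, in elements */
#define N  5000                 /* final array edge             */

int main(void)
{
    static int buf[RD][RD];
    hsize_t    start[2], count[2] = {RD, RD};
    hid_t      file, dset, fspace, mspace;
    hsize_t    i, j;

    file   = H5Fopen("study.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    dset   = H5Dopen(file, "array", H5P_DEFAULT);
    mspace = H5Screate_simple(2, count, NULL);
    fspace = H5Dget_space(dset);

    for (i = 0; i &lt; N; i += RD) {
        for (j = 0; j &lt; N; j += RD) {
            start[0] = i;
            start[1] = j;
            H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL,
                                count, NULL);
            H5Dread(dset, H5T_NATIVE_INT, mspace, fspace,
                    H5P_DEFAULT, buf);
        }
    }
    H5Sclose(mspace); H5Sclose(fspace);
    H5Dclose(dset); H5Fclose(file);
    return 0;
}
    </pre>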

    <p>I tried to answer two questions:
    <ul>
      <li>How does the storage overhead change as the chunk size
	changes?
      <li>What does the disk seek pattern look like as the chunk size
	changes?
    </ul>

    <p>I started with chunk sizes that were multiples of the read
      block size, that is, k*(500, 500).

    <p>
      <center>
	<table border>
	  <caption align=bottom>
	    <b>Table 1: Total File Overhead</b>
	  </caption>
	  <tr>
	    <th>Chunk Size (elements)</th>
	    <th>Meta Data Overhead (ppm)</th>
	    <th>Raw Data Overhead (ppm)</th>
	  </tr>

	  <tr align=center>
	    <td>500 by 500</td>
	    <td>85.84</td>
	    <td>0.00</td>
	  </tr>
	  <tr align=center>
	    <td>1000 by 1000</td>
	    <td>23.08</td>
	    <td>0.00</td>
	  </tr>
	  <tr align=center>
	    <td>5000 by 1000</td>
	    <td>23.08</td>
	    <td>0.00</td>
	  </tr>
	  <tr align=center>
	    <td>250 by 250</td>
	    <td>253.30</td>
	    <td>0.00</td>
	  </tr>
	  <tr align=center>
	    <td>499 by 499</td>
	    <td>85.84</td>
	    <td>205164.84</td>
	  </tr>
	</table>
      </center>
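
    <p>The figures appear to be parts per million of the 100 million
      bytes of raw data, so they convert directly to bytes (the 499 by
      499 raw data figure matches the 20,516,484 extra bytes reported
      below):

    <pre>
    500 by 500 meta data:      85.84 ppm of 100,000,000 =      8,584 bytes
    250 by 250 meta data:     253.30 ppm of 100,000,000 =     25,330 bytes
    499 by 499 raw data:  205,164.84 ppm of 100,000,000 = 20,516,484 bytes
    </pre>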

    <hr>
    <p>
      <center>
	<img alt="500x500" src="study_500x500.gif">
	<br><b>Fig 2: Chunk size is 500x500</b>
      </center>

    <p>The first half of Figure 2 shows output to the file while the
      second half shows input.  Each dot represents a file-level I/O
      request and the lines that connect the dots are for visual
      clarity. The size of the request is not indicated in the
      graph. The output block size is four times the chunk size which
      results in four file-level write requests per block for a total
      of 100 requests. Since file space for the chunks was allocated
      in output order, and the input block size is 1/4 the output
      block size, the input shows a staircase effect.  Each input
      request results in one file-level read request. The downward
      spike at about the 60-millionth byte is probably the result of a
      cache miss for the B-tree and the downward spike at the end is
      probably a cache flush or file boot block update.

    <hr>
    <p>
      <center>
	<img alt="1000x1000" src="study_1000x1000.gif">
	<br><b>Fig 3: Chunk size is 1000x1000</b>
      </center>

    <p>In this test I increased the chunk size to match the output
      block size and one can see from the first half of the graph that
      25 file-level write requests were issued, one for each output
      block.  The read half of the test shows that four times as much
      data was read as was written.  This results from the fact
      that HDF5 must read the entire chunk for any request that falls
      within that chunk, which is done because (1) if the data is
      compressed the entire chunk must be decompressed, and (2) the
      library assumes that the chunk size was chosen to optimize disk
      performance.
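
    <p>The four-fold factor follows directly from the chunk
      arithmetic:

    <pre>
    requested per read:   500 x  500 x 4 bytes =   1,000,000 bytes
    chunk actually read: 1000 x 1000 x 4 bytes =   4,000,000 bytes
    100 read requests    x 4,000,000 bytes     = 400,000,000 bytes read
    versus 100,000,000 bytes written, a 4-fold increase
    </pre>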

    <hr>
    <p>
      <center>
	<img alt="5000x1000" src="study_5000x1000.gif">
	<br><b>Fig 4: Chunk size is 5000x1000</b>
      </center>

    <p>Increasing the chunk size further results in even worse
      performance since both the read and write halves of the test
      re-read and re-write vast amounts of data.  This demonstrates
      that chunk sizes should not be much larger than the typical
      partial I/O request.

    <hr>
    <p>
      <center>
	<img alt="250x250" src="study_250x250.gif">
	<br><b>Fig 5: Chunk size is 250x250</b>
      </center>

    <p>If the chunk size is decreased then the amount of data
      transferred between the disk and the library is optimal for the
      no-caching case, but the amount of meta data required to describe
      the chunk locations increases to 253 parts per million.  One can
      also see that the final downward spike contains more file-level
      write requests as the meta data is flushed to disk just before
      the file is closed.

    <hr>
    <p>
      <center>
	<img alt="499x499" src="study_499x499.gif">
	<br><b>Fig 6: Chunk size is 499x499</b>
      </center>

    <p>This test shows the result of choosing a chunk size which is
      close to the I/O block size.  Because the total size of the
      array isn't a multiple of the chunk size, the library allocates
      an extra zone of chunks around the top and right edges of the
      array which are only partially filled.  This results in
      20,516,484 extra bytes of storage, a 20.5% increase in the total
      raw data storage size.  But the amount of meta data overhead is
      the same as for the 500 by 500 test.  In addition, the mismatch
      causes entire chunks to be read in order to update a few
      elements along the edge of a chunk, which results in a 3.6-fold
      increase in the amount of data transferred.
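
    <p>The extra raw data storage follows directly from the chunk
      arithmetic:

    <pre>
    ceil(5000 / 499)          = 11 chunks per dimension
    11 x 499                  = 5,489 elements allocated per dimension
    5,489 x 5,489 x 4 bytes   = 120,516,484 bytes allocated
    120,516,484 - 100,000,000 =  20,516,484 bytes of overhead (20.5%)
    </pre>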

    <hr>
    <address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Fri Jan 30 21:04:49 EST 1998 -->
<!-- hhmts start -->
Last modified: Fri Jan 30 23:51:31 EST 1998
<!-- hhmts end -->
  </body>
</html>