<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>Testing the chunked layout of HDF5</title>
</head>
<body>
<h1>Testing the chunked layout of HDF5</h1>
<p>This page presents the results of studying the chunked layout
policy in HDF5. A 1000 by 1000 block of 4-byte integers was
written 25 times to a file dataset, extending the dataset with
each write, to create in the end a 5000 by 5000 array with a
total raw data storage size of 100 million bytes.
<p>
<center>
<img alt="Order that data was written" src="study_p1.gif">
<br><b>Fig 1: Write-order of Output Blocks</b>
</center>
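<p>The original test driver is not reproduced here, but the
following sketch, written against the present-day HDF5 C API
rather than the 1998 one, shows the general shape of the write
phase; the file name, dataset name, and row-major write order
are assumptions made for illustration. A chunked, extendible
dataset is created and each 1000 by 1000 output block is written
by extending the dataset and selecting the corresponding
hyperslab.
<pre>
#include &lt;hdf5.h&gt;

#define BLOCK 1000                  /* output block edge, in elements */
#define NBLK  5                     /* 5 x 5 blocks give 5000 x 5000  */

int
main(void)
{
    hsize_t    dims[2]    = {0, 0};                     /* current extent */
    hsize_t    maxdims[2] = {H5S_UNLIMITED, H5S_UNLIMITED};
    hsize_t    chunk[2]   = {500, 500};                 /* chunk size under test */
    hsize_t    count[2]   = {BLOCK, BLOCK}, start[2];
    static int block[BLOCK][BLOCK];                     /* one output block; contents
                                                         * are irrelevant to the study */
    hid_t      file, space, mspace, fspace, dcpl, dset;
    int        i, j;

    file  = H5Fcreate("study.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    space = H5Screate_simple(2, dims, maxdims);
    dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);                       /* chunked layout */
    dset  = H5Dcreate2(file, "array", H5T_NATIVE_INT, space,
                       H5P_DEFAULT, dcpl, H5P_DEFAULT);
    mspace = H5Screate_simple(2, count, NULL);

    for (i = 0; i &lt; NBLK; i++) {
        for (j = 0; j &lt; NBLK; j++) {
            /* Grow the dataset just enough to hold the new block */
            if ((hsize_t)(i + 1) * BLOCK &gt; dims[0]) dims[0] = (hsize_t)(i + 1) * BLOCK;
            if ((hsize_t)(j + 1) * BLOCK &gt; dims[1]) dims[1] = (hsize_t)(j + 1) * BLOCK;
            H5Dset_extent(dset, dims);

            /* Select the block's location in the file and write it */
            fspace   = H5Dget_space(dset);
            start[0] = (hsize_t)i * BLOCK;
            start[1] = (hsize_t)j * BLOCK;
            H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
            H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, block);
            H5Sclose(fspace);
        }
    }

    H5Sclose(mspace);
    H5Sclose(space);
    H5Pclose(dcpl);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}
</pre>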
<p>After the array was written, it was read back in blocks of
500 by 500 elements in row-major order (that is, the top-left
quadrant of output block one, then the top-right quadrant of
output block one, then the top-left quadrant of output block
two, and so on).
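<p>The read phase, again sketched with the same assumed file and
dataset names, is simply a row-major sweep over 500 by 500 tiles
of the array, selecting one hyperslab per tile:
<pre>
#include &lt;hdf5.h&gt;

#define RBLK 500                    /* read block edge, in elements */
#define N    5000                   /* full array edge              */

int
main(void)
{
    static int buf[RBLK][RBLK];
    hsize_t    start[2], count[2] = {RBLK, RBLK};
    hsize_t    i, j;
    hid_t      file, dset, fspace, mspace;

    file   = H5Fopen("study.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    dset   = H5Dopen2(file, "array", H5P_DEFAULT);
    fspace = H5Dget_space(dset);
    mspace = H5Screate_simple(2, count, NULL);

    /* Ideally one file-level read request per tile; what actually
     * happens depends on the chunk size, as the graphs below show. */
    for (i = 0; i &lt; N; i += RBLK) {
        for (j = 0; j &lt; N; j += RBLK) {
            start[0] = i;
            start[1] = j;
            H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
            H5Dread(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);
        }
    }

    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}
</pre>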
<p>I tried to answer two questions:
<ul>
<li>How does the storage overhead change as the chunk size
changes?
<li>What does the disk seek pattern look like as the chunk size
changes?
</ul>
<p>I started with chunk sizes that were multiples of the read
block size, that is, k*(500, 500).
<p>
<center>
<table border>
<caption align=bottom>
<b>Table 1: Total File Overhead</b>
</caption>
<tr>
<th>Chunk Size (elements)</th>
<th>Meta Data Overhead (ppm)</th>
<th>Raw Data Overhead (ppm)</th>
</tr>
<tr align=center>
<td>500 by 500</td>
<td>85.84</td>
<td>0.00</td>
</tr>
<tr align=center>
<td>1000 by 1000</td>
<td>23.08</td>
<td>0.00</td>
</tr>
<tr align=center>
<td>5000 by 1000</td>
<td>23.08</td>
<td>0.00</td>
</tr>
<tr align=center>
<td>250 by 250</td>
<td>253.30</td>
<td>0.00</td>
</tr>
<tr align=center>
<td>499 by 499</td>
<td>85.84</td>
<td>205164.84</td>
</tr>
</table>
</center>
<hr>
<p>
<center>
<img alt="500x500" src="study_500x500.gif">
<br><b>Fig 2: Chunk size is 500x500</b>
</center>
<p>The first half of Figure 2 shows output to the file while the
second half shows input. Each dot represents a file-level I/O
request and the lines that connect the dots are for visual
clarity. The size of the request is not indicated in the
graph. The output block size is four times the chunk size, which
results in four file-level write requests per block for a total
of 100 requests. Since file space for the chunks was allocated
in output order, and the input block size is 1/4 the output
block size, the input shows a staircase effect. Each input
request results in one file-level read request. The downward
spike at about the 60-millionth byte is probably the result of a
cache miss for the B-tree, and the downward spike at the end is
probably a cache flush or file boot block update.
<hr>
<p>
<center>
<img alt="1000x1000" src="study_1000x1000.gif">
<br><b>Fig 3: Chunk size is 1000x1000</b>
</center>
<p>In this test I increased the chunk size to match the output
block size, and one can see from the first half of the graph
that 25 file-level write requests were issued, one for each
output block. The read half of the test shows that four times as
much data was read as was written. This happens because HDF5
must read the entire chunk for any request that falls within
that chunk, which is done because (1) if the data is compressed,
the entire chunk must be decompressed, and (2) the library
assumes that the chunk size was chosen to optimize disk
performance.
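<p>The compression point can be made concrete with a small
fragment. The study itself did not enable compression; the
fragment below only illustrates why the chunk is the unit of I/O
once a filter such as deflate is attached to the dataset
creation property list (the chunk dimensions are the same
assumed values as in the write sketch above).
<pre>
/* Compression is applied one chunk at a time, so any partial read of a
 * compressed dataset must fetch and decompress whole chunks. */
hsize_t chunk[2] = {500, 500};
hid_t   dcpl     = H5Pcreate(H5P_DATASET_CREATE);

H5Pset_chunk(dcpl, 2, chunk);       /* compression requires a chunked layout */
H5Pset_deflate(dcpl, 6);            /* gzip level 6, an arbitrary choice      */

/* pass dcpl to H5Dcreate2() exactly as in the write sketch */
</pre>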
<hr>
<p>
<center>
<img alt="5000x1000" src="study_5000x1000.gif">
<br><b>Fig 4: Chunk size is 5000x1000</b>
</center>
<p>Increasing the chunk size further results in even worse
performance, since both the read and write halves of the test
are re-reading and re-writing vast amounts of data. This shows
that chunk sizes should not be much larger than the typical
partial I/O request.
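<p>Stated as code, the rule of thumb amounts to clamping the
chunk dimensions by the dimensions of the typical partial I/O
request. The helper below is hypothetical, not an HDF5 call:
<pre>
/* Hypothetical helper: make sure no chunk dimension exceeds the typical
 * partial I/O request, per the conclusion above. */
static void
clamp_chunk(int ndims, hsize_t chunk[], const hsize_t typical_io[])
{
    int i;

    for (i = 0; i &lt; ndims; i++)
        if (chunk[i] &gt; typical_io[i])
            chunk[i] = typical_io[i];
}
</pre>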
<hr>
<p>
<center>
<img alt="250x250" src="study_250x250.gif">
<br><b>Fig 5: Chunk size is 250x250</b>
</center>
<p>If the chunk size is decreased then the amount of data
transferred between the disk and the library is optimal even
without caching, but the amount of meta data required to
describe the chunk locations increases to about 250 parts per
million. One can also see that the final downward spike contains
more file-level write requests as the meta data is flushed to
disk just before the file is closed.
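<p>To put the parts-per-million figures in perspective, the
short program below (written for this note, not part of the test
driver) converts the meta data overhead from Table 1 back into
bytes and divides by the number of chunks; the cost works out to
roughly 60 to 90 bytes of B-tree meta data per chunk, so the
total grows with the number of chunks.
<pre>
#include &lt;stdio.h&gt;

int
main(void)
{
    const double raw_bytes = 100e6;                     /* total raw data */
    struct { const char *chunk; long nchunks; double ppm; } row[] = {
        {"500 by 500",   100,  85.84},
        {"1000 by 1000",  25,  23.08},
        {"250 by 250",   400, 253.30},
    };
    int i;

    for (i = 0; i &lt; 3; i++) {
        double meta = row[i].ppm * raw_bytes / 1e6;     /* ppm -&gt; bytes */
        printf("%-14s %4ld chunks %8.0f bytes of meta data (%.0f per chunk)\n",
               row[i].chunk, row[i].nchunks, meta, meta / row[i].nchunks);
    }
    return 0;
}
</pre>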
<hr>
<p>
<center>
<img alt="499x499" src="study_499x499.gif">
<br><b>Fig 6: Chunk size is 499x499</b>
</center>
<p>This test shows the result of choosing a chunk size which is
close to the I/O block size. Because the total size of the
array isn't a multiple of the chunk size, the library allocates
an extra zone of chunks around the top and right edges of the
array which are only partially filled. This results in
20,516,484 extra bytes of storage, a 20% increase in the total
raw data storage size, while the amount of meta data overhead is
the same as for the 500 by 500 test. In addition, the mismatch
causes entire chunks to be read in order to update a few
elements along the edge of a chunk, which results in a 3.6-fold
increase in the amount of data transferred.
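<p>The arithmetic behind the 20,516,484-byte figure is easy to
check: the 5000 by 5000 array needs ceil(5000/499) = 11 chunks
of 499 by 499 elements in each dimension, and every chunk
occupies its full size in the file.
<pre>
#include &lt;stdio.h&gt;

int
main(void)
{
    const long n = 5000, c = 499, elem = 4;     /* array edge, chunk edge, sizeof(int) */
    long nchunks = (n + c - 1) / c;             /* 11 chunks per dimension             */
    long stored  = nchunks * nchunks * c * c * elem;
    long logical = n * n * elem;

    printf("%ld bytes stored for %ld bytes of data: %ld extra (%.1f%%)\n",
           stored, logical, stored - logical,
           100.0 * (stored - logical) / logical);
    return 0;
}
</pre>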
<hr>
<address><a href="mailto:matzke@llnl.gov">Robb Matzke</a></address>
<!-- Created: Fri Jan 30 21:04:49 EST 1998 -->
<!-- hhmts start -->
Last modified: Fri Jan 30 23:51:31 EST 1998
<!-- hhmts end -->
</body>
</html>