-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
180 lines (143 loc) · 8.39 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
s3backer - FUSE-based single file backing store via Amazon S3
Overview
s3backer is a filesystem that contains a single file backed by the Amazon
Simple Storage Service (Amazon S3). As a filesystem, it is very simple:
it provides a single normal file having a fixed size. Underneath, the
file is divided up into blocks, and the content of each block is stored
in a unique Amazon S3 object. In other words, what s3backer provides is
really more like an S3-backed virtual hard disk device, rather than a
filesystem.
In typical usage, a `normal' filesystem is mounted on top of the file
exported by the s3backer filesystem using a loopback mount (or disk image
mount on Mac OS X).
This arrangement has several benefits compared to more complete S3
filesystem implementations:
o By not attempting to implement a complete filesystem, which is a com-
plex undertaking and difficult to get right, s3backer can stay very
lightweight and simple. Only three HTTP operations are used: GET,
PUT, and DELETE. All of the experience and knowledge about how to
properly implement filesystems that already exists can be reused.
o By utilizing existing filesystems, you get full UNIX filesystem
semantics. Subtle bugs or missing functionality relating to hard
links, extended attributes, POSIX locking, etc. are avoided.
o The gap between normal filesystem semantics and Amazon S3 ``eventual
consistency'' is more easily and simply solved when one can interpret
S3 objects as simple device blocks rather than filesystem objects
(see below).
o When storing your data on Amazon S3 servers, which are not under your
control, the ability to encrypt and authenticate data becomes a crit-
ical issue. s3backer supports secure encryption and authentication.
Alternately, the encryption capability built into the Linux loopback
device can be used.
o Since S3 data is accessed over the network, local caching is also
very important for performance reasons. Since s3backer presents the
equivalent of a virtual hard disk to the kernel, most of the filesys-
tem caching can be done where it should be: in the kernel, via the
kernel's page cache. However s3backer also includes its own internal
block cache for increased performance, using asynchronous worker
threads to take advantage of the parallelism inherent in the network.
Consistency Guarantees
Amazon S3 makes relatively weak guarantees relating to the timing and
consistency of reads vs. writes (collectively known as ``eventual consis-
tency''). s3backer includes logic and configuration parameters to work
around these limitations, allowing the user to guarantee consistency to
whatever level desired, up to and including 100% detection and avoidance
of incorrect data. These are:
1. s3backer enforces a minimum delay between consecutive PUT or DELETE
operations on the same block. This ensures that Amazon S3 doesn't
receive these operations out of order.
2. s3backer maintains an internal block MD5 checksum cache, which
enables automatic detection and rejection of `stale' blocks returned
by GET operations.
This logic is configured by the following command line options:
--md5CacheSize, --md5CacheTime, and --minWriteDelay.
Zeroed Block Optimization
As a simple optimization, s3backer does not store blocks containing all
zeroes; instead, they are simply deleted. Conversely, reads of non-exis-
tent blocks will contain all zeroes. In other words, the backed file is
always maximally sparse.
As a result, blocks do not need to be created before being used and no
special initialization is necessary when creating a new filesystem.
When the --listBlocks flag is given, s3backer will list all existing
blocks at startup so it knows ahead of time exactly which blocks are
empty.
File and Block Size Auto-Detection
As a convenience, whenever the first block of the backed file is written,
s3backer includes as meta-data (in the ``x-amz-meta-s3backer-filesize''
header) the total size of the file. Along with the size of the block
itself, this value can be checked and/or auto-detected later when the
filesystem is remounted, eliminating the need for the --blockSize or
--size flags to be explicitly provided and avoiding accidental mis-inter-
pretation of an existing filesystem.
Block Cache
s3backer includes support for an internal block cache to increase perfor-
mance. The block cache cache is completely separate from the MD5 cache
which only stores MD5 checksums transiently and whose sole purpose is to
mitigate ``eventual consistency''. The block cache is a traditional
cache containing cached data blocks. When full, clean blocks are evicted
as necessary in LRU order.
Reads of cached blocks will return immediately with no network traffic.
Writes to the cache also return immediately and trigger an asynchronous
write operation to the network via a separate worker thread. Because the
kernel typically writes blocks through FUSE filesystems one at a time,
performing writes asynchronously allows s3backer to take advantage of the
parallelism inherent in the network, vastly improving write performance.
The block cache can be configured to store the cached data in a local
file instead of in memory. This permits larger cache sizes and allows
s3backer to reload cached data after a restart. Reloaded data is veri-
fied before reuse.
The block cache is configured by the following command line options:
--blockCacheFile, --blockCacheNoVerify, --blockCacheSize,
--blockCacheSync, --blockCacheThreads, --blockCacheTimeout, and
--blockCacheWriteDelay.
Read Ahead
s3backer implements a simple read-ahead algorithm in the block cache.
When a configurable number of blocks are read in order, block cache
worker threads are awoken to begin reading subsequent blocks into the
block cache. Read ahead continues as long as the kernel continues read-
ing blocks sequentially. The kernel typically requests blocks one at a
time, so having multiple worker threads already reading the next few
blocks improves read performance by taking advantage of the parallelism
inherent in the network.
Note that the kernel implements a read ahead algorithm as well; its
behavior should be taken into consideration. By default, s3backer passes
the -o max_readahead=0 option to FUSE.
Read ahead is configured by the --readAhead and --readAheadTrigger com-
mand line options.
Encryption and Authentication
s3backer supports encryption via the --encrypt, --password, and
--passwordFile flags. When encryption is enabled, SHA1 HMAC authentica-
tion is also automatically enabled, and s3backer rejects any blocks that
are not properly encrypted and signed.
Encrypting at the s3backer layer is preferable to encrypting at an upper
layer (e.g., at the loopback device layer), because if the data s3backer
sees is already encrypted it can't optimize away zeroed blocks or do
meaningful compression.
Read-Only Access
An Amazon S3 account is not required in order to use s3backer. The
filesystem must already exist and have S3 objects with ACL's configured
for public read access (see --accessType below); users should perform the
looback mount with the read-only flag (see mount(8)) and provide the
--readOnly flag to s3backer. This mode of operation facilitates the cre-
ation of public, read-only filesystems.
Simultaneous Mounts
Although it functions over the network, the s3backer filesystem is not a
distributed filesystem and does not support simultaneous read/write
mounts. (This is not something you would normally do with a hard-disk
partition either.) s3backer does not detect this situation; it is up to
the user to ensure that it doesn't happen.
Statistics File
s3backer populates the filesystem with a human-readable statistics file.
See --statsFilename below.
Logging
In normal operation s3backer will log via syslog(3). When run with the
-d or -f flags, s3backer will log to standard error.
------------------------------------------------------------------------
Home page: http://s3backer.googlecode.com/
------------------------------------------------------------------------
See INSTALL for installation instructions. After installing, see the s3backer(1)
man page for how to run it.
See COPYING for license.
See CHANGES for change history.
Enjoy!
$Id$