Comments (3)
Some thoughts on these ideas:
It makes sense that (a) the initial --listBlocks operation and (b) maintaining
a zero block bitmap should be separate functions (with the former implying the
latter). However, maintaining a zero block bitmap must be optional, because
(in the current implementation) it costs 1 bit per block, and this could be
too big for memory with some combinations of small block size and large
filesystem. So I think the right answer is to add a --zeroTrack flag (or
whatever), which is implicitly enabled by --listBlocks.
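To make the memory cost concrete: at 1 bit per block, a 1 TiB filesystem with
4 KiB blocks has 2^28 blocks and needs a 32 MiB bitmap, while the same
filesystem with 512-byte blocks needs 256 MiB. A minimal sketch of such a
bitmap in C (illustrative only; this is not s3backer's actual data structure):

    #include <stdint.h>
    #include <stdlib.h>

    /* One bit per block: bit set means "known to be zero". */
    struct zero_bitmap {
        uint8_t *bits;
        uintmax_t num_blocks;
    };

    static int
    zbm_init(struct zero_bitmap *zbm, uintmax_t num_blocks)
    {
        zbm->num_blocks = num_blocks;
        zbm->bits = calloc((num_blocks + 7) / 8, 1);    /* ~num_blocks/8 bytes */
        return zbm->bits != NULL ? 0 : -1;
    }

    static void
    zbm_set(struct zero_bitmap *zbm, uintmax_t block, int is_zero)
    {
        if (is_zero)
            zbm->bits[block / 8] |= (uint8_t)(1 << (block % 8));
        else
            zbm->bits[block / 8] &= (uint8_t)~(1 << (block % 8));
    }

    static int
    zbm_test(const struct zero_bitmap *zbm, uintmax_t block)
    {
        return (zbm->bits[block / 8] >> (block % 8)) & 1;
    }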
Performing --listBlocks in the background is a good improvement; however,
there is an implicit race condition which must be handled, namely that a
previously zero block gets written right at the same time the --listBlocks
response returns saying the block is zero. Unless the cache is enabled (which
we can't assume), you need to somehow track this scenario. E.g., a simple way
to do this is to track the highest block number written.
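For concreteness, here is one conservative reading of that suggestion,
sketched in C (the names are hypothetical, not s3backer's actual symbols):
every write bumps a high-water mark, and a survey report that a block is zero
is trusted only for blocks above the mark, since no write can have touched
those.

    #include <pthread.h>
    #include <stdint.h>

    static pthread_mutex_t survey_mutex = PTHREAD_MUTEX_INITIALIZER;
    static uintmax_t highest_written;   /* highest block written during survey */
    static int any_written;             /* nonzero once any write has occurred */

    /* Called from the write path for every block written while the survey runs. */
    static void
    note_block_written(uintmax_t block)
    {
        pthread_mutex_lock(&survey_mutex);
        if (!any_written || block > highest_written)
            highest_written = block;
        any_written = 1;
        pthread_mutex_unlock(&survey_mutex);
    }

    /* Called for each survey result claiming a block is zero; returns nonzero
     * only when no write can possibly have hit that block in the meantime. */
    static int
    zero_report_trustworthy(uintmax_t block)
    {
        int ok;

        pthread_mutex_lock(&survey_mutex);
        ok = !any_written || block > highest_written;
        pthread_mutex_unlock(&survey_mutex);
        return ok;
    }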
Regarding changes #3 through #5, this is more troubling to me. You are
changing the fundamental assumption that S3 is authoritative and that
s3backer's job is only to access and cache the data from S3. Having said
that, fortunately there is already an easy way to do what you want, i.e.,
assert the local cache as authoritative: simply always specify the
--blockCacheNoVerify flag. It seems that this flag would give you the same
net effect you're looking for.
Original comment by [email protected] on 24 Oct 2010 at 8:04
- Changed state: Accepted
- Added labels: Type-Enhancement
- Removed labels: Type-Defect
Hi Archie,
I hadn't thought of the memory problem with tracking zero blocks on really
large filesystems. You're obviously correct about that. I've attached a new
patch with the --zeroTrack flag you suggested.
On the subject of the race condition with doing --listBlocks in the background,
my thinking was to grab the HTTP I/O mutex while fetching and parsing each page
of the list, and to yield to the other block I/O threads between pages. Do you
think that's sufficient to avoid a race condition?
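For concreteness, the survey loop I have in mind is shaped roughly like this
(just a sketch; fetch_and_parse_next_page and struct survey_state are made-up
names, not the actual patch):

    #include <pthread.h>

    struct survey_state {
        pthread_mutex_t http_mutex;     /* the shared HTTP I/O mutex */
        /* ... pagination cursor, zero bitmap, etc. ... */
    };

    /* Fetch and parse one page of the block list; return nonzero when done.
     * Hypothetical helper, shown for illustration only. */
    extern int fetch_and_parse_next_page(struct survey_state *state);

    static void *
    list_blocks_survey(void *arg)
    {
        struct survey_state *const state = arg;
        int done;

        do {
            pthread_mutex_lock(&state->http_mutex);   /* exclude block I/O */
            done = fetch_and_parse_next_page(state);  /* one page of the list */
            pthread_mutex_unlock(&state->http_mutex); /* yield to I/O threads */
        } while (!done);
        return NULL;
    }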
On the subject of refreshing lost blocks, I think that "the fundamental
assumption that S3 is authoritative" is one model for the use of s3backer, but
it doesn't need to be the only model that makes sense for the use of the tool.
If someone is only using an s3backer filesystem from a single host, then it is
perfectly reasonable for the cache to be considered authoritative for that
filesystem on that host. I wouldn't make this the default behavior, but I think
it's good and useful to make it an option.
Let me explain a bit more about how I am using s3backer and why this makes
sense to me (and I think it will to others as well). I already keep a
complete backup of all of my data on a separate SATA disk that's used for
nothing but backups. The S3 backup is therefore intended to be used only for
disaster recovery, e.g., if God forbid my house burns down and both my primary
and backup SATA drive are lost.
Therefore, if I ever actually do need to use the S3 backup to restore data,
it'll be because the s3backer cache on my local computer is no longer
accessible, and all of the data in the S3 filesystem *will* need to be correct
and complete.
The problem is that, with the current architecture, even with
--blockCacheNoVerify set, there's no guarantee that that will be the case. S3's
SLA explicitly states that files can be lost, especially in RRS (which is very
attractive since it's 33% cheaper than standard storage). Once I've written a
backup file to the S3 filesystem, there's no reason why I would ever look at
that file's blocks again until/unless I have to restore it. So there is a
significant risk of blocks disappearing from the S3 filesystem and me not
knowing about them until I actually have to use the data to do a restore.
--blockCacheNoVerify doesn't solve the problem because it only "papers over"
missing blocks that are available in the local cache. It doesn't repair the
damage (i.e., it doesn't put a missing block back onto S3 when it finds one).
That means that if I ever have to remove my cache, the data is lost forever;
and if I ever have to use my backup for its actual purpose, i.e., disaster
recovery when my primary computer is completely toasted, it won't help,
because there won't be a local cache to get missing blocks from.
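Concretely, the kind of repair pass I have in mind would look roughly like
this (just a sketch; the helper names here are invented for illustration and
are not from the attached patch):

    #include <stddef.h>
    #include <stdint.h>

    struct cache_entry {
        uintmax_t block_num;
        const void *data;
        size_t len;
    };

    /* Hypothetical helpers: iterate the local cache, test for a block's
     * existence on S3 (e.g. via an HTTP HEAD), and (re)upload a block. */
    extern int cache_next_entry(struct cache_entry *entry);  /* 0 when exhausted */
    extern int s3_block_exists(uintmax_t block_num);
    extern int s3_write_block(uintmax_t block_num, const void *data, size_t len);

    /* Walk the cache and re-upload any block that S3 has lost; returns the
     * number of blocks repaired, or -1 on upload failure. */
    static int
    repair_missing_blocks(void)
    {
        struct cache_entry entry;
        int repaired = 0;

        while (cache_next_entry(&entry)) {
            if (!s3_block_exists(entry.block_num)) {
                if (s3_write_block(entry.block_num, entry.data, entry.len) != 0)
                    return -1;
                repaired++;
            }
        }
        return repaired;
    }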
In short, fixing missing blocks from the cache doesn't make sense when an S3
filesystem might be used from multiple locations, but it makes a lot of sense
when it's only intended to be used from one place at a time.
Please let me know your thoughts.
(Would it be better to have this discussion on the s3backer-devel list rather
than here?)
Original comment by [email protected] on 25 Oct 2010 at 12:15
Attachments:
Fixed in 034fff3.