
Comments (3)

GoogleCodeExporter commented on June 7, 2024
Some thoughts on these ideas:

It makes sense that (a) the initial --listBlocks operation and (b) maintaining 
a zero block bitmap be separate functions (with the former implying the latter).

However, maintaining a zero block bitmap must be optional, because (in the current 
implementation) it costs 1 bit per block, and that could be too big for memory 
in some combinations of small block size and large filesystem. So I think the 
right answer is to add a --zeroTrack flag (or whatever) which is implicitly 
enabled by --listBlocks.
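
For example (just a back-of-the-envelope illustration of the bitmap cost, not code 
from s3backer):

    #include <stdio.h>
    #include <stdint.h>

    /* Illustration only: the zero block bitmap costs 1 bit per block. */
    int
    main(void)
    {
        const uint64_t fs_size = 1ULL << 40;                      /* 1 TiB filesystem */
        const uint64_t block_sizes[] = { 4096, 65536, 1048576 };  /* 4 KiB, 64 KiB, 1 MiB */

        for (int i = 0; i < 3; i++) {
            const uint64_t num_blocks = fs_size / block_sizes[i];
            const uint64_t bitmap_bytes = (num_blocks + 7) / 8;
            printf("block size %8ju -> %ju blocks -> bitmap of %ju bytes\n",
              (uintmax_t)block_sizes[i], (uintmax_t)num_blocks, (uintmax_t)bitmap_bytes);
        }
        return 0;
    }

With 4 KiB blocks a 1 TiB filesystem already needs a 32 MiB bitmap, and the cost 
grows linearly with filesystem size.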

Performing --listBlocks in the background is a good improvement; however, there 
is an inherent race condition which must be handled, namely a previously zero 
block being written right at the same time the --listBlocks response returns 
saying the block is zero. Unless the cache is enabled (which we can't assume), 
you need to somehow track this scenario. E.g., a simple way to do this is to 
track the highest block number written.
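
For instance (a rough sketch of that watermark idea, not s3backer's actual code; 
all of the names are made up):

    #include <pthread.h>
    #include <stdint.h>

    /* Sketch: ignore stale "block is zero" results from a background
     * --listBlocks scan for any block at or below the highest block
     * number written since the scan started. */
    static pthread_mutex_t watermark_mutex = PTHREAD_MUTEX_INITIALIZER;
    static uint64_t highest_block_written;      /* highest block written so far */
    static int any_block_written;               /* zero until the first write */

    /* Called on every block write. */
    void
    note_block_written(uint64_t block_num)
    {
        pthread_mutex_lock(&watermark_mutex);
        if (!any_block_written || block_num > highest_block_written)
            highest_block_written = block_num;
        any_block_written = 1;
        pthread_mutex_unlock(&watermark_mutex);
    }

    /* Called when the background listing reports block_num as zero.
     * Returns 1 only if it is safe to mark the block zero in the bitmap. */
    int
    safe_to_mark_zero(uint64_t block_num)
    {
        int safe;

        pthread_mutex_lock(&watermark_mutex);
        safe = !any_block_written || block_num > highest_block_written;
        pthread_mutex_unlock(&watermark_mutex);
        return safe;
    }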

Regarding changes #3 through #5, this is more troubling to me. You are changing 
the fundamental assumption that S3 is authoritative and s3backer's job is only 
to access and cache the data from S3. Having said that, fortunately there is 
already an easy way to do what you want, i.e., assert the local cache as 
authoritative, which is simply to always specify the `--blockCacheNoVerify` 
flag. It seems that this flag would give you the same net effect you're looking 
for.


Original comment by [email protected] on 24 Oct 2010 at 8:04

  • Changed state: Accepted
  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect


GoogleCodeExporter commented on June 7, 2024
Hi Archie,

I hadn't thought of the memory problem with tracking zero blocks on really 
large filesystems. You're obviously correct about that. I've attached a new 
patch with the --zeroTrack flag you suggested.

On the subject of the race condition with doing --listBlocks in the background, 
my thinking was to grab the HTTP I/O mutex while fetching and parsing each page 
of the list, and to yield to the other block I/O threads between pages. Do you 
think that's sufficient to avoid a race condition?
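
To spell out what I mean (just a sketch; the function names here are made up and 
don't correspond to the real source):

    #include <pthread.h>
    #include <sched.h>

    /* Sketch: hold the shared HTTP I/O mutex while fetching and parsing one
     * page of the S3 object listing, then release it and yield so the other
     * block I/O threads can run before the next page is fetched. */
    extern pthread_mutex_t http_io_mutex;           /* shared with block I/O threads */
    extern int fetch_and_parse_next_page(void);     /* returns 0 when no more pages */

    void
    list_blocks_in_background(void)
    {
        int more_pages;

        do {
            pthread_mutex_lock(&http_io_mutex);
            more_pages = fetch_and_parse_next_page();
            pthread_mutex_unlock(&http_io_mutex);
            sched_yield();                          /* let waiting threads grab the mutex */
        } while (more_pages);
    }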

On the subject of refreshing lost blocks, I think that "the fundamental 
assumption that S3 is authoritative" is one model for using s3backer, but it 
doesn't need to be the only model that makes sense for the tool. If someone is 
only using an s3backer filesystem from a single host, then it is perfectly 
reasonable for the cache to be considered authoritative for that filesystem on 
that host. I wouldn't make this the default behavior, but I think it's a good 
and useful option.

Let me explain a bit more about how I am using s3backer, which should show why 
this makes sense to me (and I think it will to others as well). I already keep 
a complete backup of all of my data on a separate SATA disk that's used for 
nothing but backups. The S3 backup is therefore intended to be used only for 
disaster recovery, e.g., if, God forbid, my house burns down and both my 
primary and backup SATA drives are lost.

Therefore, if I ever actually do need to use the S3 backup to restore data, 
it'll be because the s3backer cache on my local computer is no longer 
accessible, and all of the data in the S3 filesystem *will* need to be correct 
and complete.

The problem is that, with the current architecture, even with 
--blockCacheNoVerify set, there's no guarantee that that will be the case. S3's 
SLA explicitly states that files can be lost, especially in RRS (Reduced 
Redundancy Storage, which is very attractive since it's 33% cheaper than 
standard storage). Once I've written a backup file to the S3 filesystem, 
there's no reason why I would ever look at that file's blocks again 
until/unless I have to restore it. So there is a significant risk of blocks 
disappearing from the S3 filesystem without my knowing about it until I 
actually have to use the data to do a restore.

--blockCacheNoVerify doesn't solve the problem, because it only "papers over" 
missing blocks that happen to be available in the local cache. It doesn't 
repair the damage (i.e., it doesn't put a missing block back onto S3 when it 
finds one). That means that if I ever have to remove my cache, the data is lost 
forever; and if I ever have to use my backup for what it's actually intended 
for, i.e., disaster recovery when my primary computer is completely toasted, it 
won't help, because there won't be a local cache to get the missing blocks from.
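
What I'd want instead is something along these lines (purely a hypothetical 
sketch of the behavior I'm describing; none of these functions exist in 
s3backer as-is):

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical "cache is authoritative" repair: if S3 says a block is
     * missing but the local cache still has it, serve it from the cache and
     * write it back to S3 instead of just papering over the miss. */
    extern int  s3_read_block(uint64_t block_num, void *buf, size_t size);
    extern int  s3_write_block(uint64_t block_num, const void *buf, size_t size);
    extern int  cache_contains(uint64_t block_num);
    extern void cache_read_block(uint64_t block_num, void *buf, size_t size);

    int
    read_block_with_repair(uint64_t block_num, void *buf, size_t block_size)
    {
        int r = s3_read_block(block_num, buf, block_size);

        if (r == ENOENT && cache_contains(block_num)) {
            cache_read_block(block_num, buf, block_size);     /* serve from cache */
            (void)s3_write_block(block_num, buf, block_size); /* repair the S3 copy */
            r = 0;
        }
        return r;
    }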

In short, fixing missing blocks from the cache doesn't make sense when an S3 
filesystem might be used from multiple locations, but it makes a lot of sense 
when it's only intended to be used from one place at a time.

Please let me know your thoughts.

(Would it be better to have this discussion on the s3backer-devel list rather 
than here?)

Original comment by [email protected] on 25 Oct 2010 at 12:15

Attachments:


archiecobbs commented on June 7, 2024

Fixed in 034fff3.

