denismo / dynamofs

Linux FUSE file system implementation with AWS DynamoDB as the storage

License: GNU General Public License v3.0

Shell 0.52% Python 99.48%

dynamofs's People

Forkers

lunastorm arielsalvo jamiebegin igroff

dynamofs's Issues

Implement S3 block storage

DynamoFS storage is quite expensive, while S3 is reasonably priced.
S3 allows parallel uploads, which can provide faster data transfers than DynamoFS's fastest option.
Also, storing large files (GBs) as 64k blocks is very inefficient.

This feature will store data in S3:

One S3 file per file-system file, with the same name and path (in some bucket)
Multiple sequential writes result in multiple parts of a multipart upload
The multipart upload is completed when the file is closed after writing
The file cannot be written to again after the initial open, due to the immutable (write-once) nature of S3
While the file is being written, its size and attributes are updated immediately, but the file is kept write-locked until it is closed (because it cannot be read from S3 until all multipart uploads finish)
After the write, any number of reads can be executed as usual
Reads are blocked at the storage level, via a spin-lock, while the file is being assembled from parts (S3 eventual consistency)

#6 chown 07 conflicts with chown 05

This is to do with permitPrivilegedOnly. On the one hand, one test executes chown as an unprivileged user without changing ownership and expects EPERM. On the other hand, another test expects not to get EPERM if ownership does not change.

Move metadata from first block to the master record

The primary advantage is that there are fewer blocks to read/write for any operation, which makes consistency easier to maintain. It will also be faster, as only one read is necessary for traversals; for read/write operations the number of reads/writes won't change.

Read should handle gracefully when there are missing blocks to be retrieved

By definition, truncate can set the file size to be larger than before. The new bytes are filled with zeros. In our implementation we don't allocate the missing blocks, so we need to make sure that read treats missing blocks as filled with zeros.
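
The read behaviour described above could look roughly like the following. This is a minimal sketch, not the project's code: `blocks` is a hypothetical mapping of block number to bytes standing in for the storage layer, and the 64k block size is taken from the S3 issue above.

```python
BLOCK_SIZE = 64 * 1024  # assumed block size


def read_range(blocks, file_size, offset, length, block_size=BLOCK_SIZE):
    """Read up to `length` bytes at `offset`, treating absent blocks as zeros.

    `blocks` is a hypothetical dict of block number -> bytes.
    """
    end = min(offset + length, file_size)  # never read past the file size
    if offset >= end:
        return b""
    out = bytearray()
    pos = offset
    while pos < end:
        block_no, start = divmod(pos, block_size)
        take = min(block_size - start, end - pos)
        data = blocks.get(block_no, b"")  # missing block -> empty
        chunk = data[start:start + take]
        # Pad short or missing blocks with zeros.
        out += chunk + b"\x00" * (take - len(chunk))
        pos += take
    return bytes(out)
```

With only block 0 stored and the file size extended by truncate, a read that crosses into the unallocated region returns the stored bytes followed by zeros.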

File modification time is not updated on write

Move file while performing one of the simple update operations

If we are moving a file while it is being updated by chmod, chown etc - a simple attribute update - then it'll create 2 copies of the record, as chmod's save will restore the deleted record.

Enable multi-threading in Fuse

At the moment Fuse is configured to make all requests from one thread. This prevents some concurrency issues in the FS implementation. Once those issues are resolved (e.g. #34) we can enable multi-threaded access.

Typo in readme

The readme says "execited" instead of "executed" in the "Highly Concurrent" section.

Write performance

At the moment the write performance is very slow. Copying a file of 90K takes about 30 seconds. This is due to the multiple write requests being performed by fuse library with 4K blocks. Every request requires read/write against Dynamo which is very slow.
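
One possible mitigation is to coalesce the small Fuse writes in memory and send whole blocks to storage on flush. The sketch below is only an illustration under stated assumptions: `flush_block` is a hypothetical callback that persists one block, and the buffer assumes writes into a new file (it does not merge with block contents already in storage).

```python
class WriteBuffer:
    """Collects small writes and flushes them as whole blocks."""

    def __init__(self, flush_block, block_size=64 * 1024):
        self._flush_block = flush_block  # hypothetical callback(block_no, data)
        self._block_size = block_size
        self._pending = {}  # block number -> bytearray

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            block_no, start = divmod(offset + pos, self._block_size)
            take = min(self._block_size - start, len(data) - pos)
            buf = self._pending.setdefault(block_no, bytearray())
            if len(buf) < start + take:
                # Grow the buffer so the slice assignment below fits.
                buf.extend(b"\x00" * (start + take - len(buf)))
            buf[start:start + take] = data[pos:pos + take]
            pos += take
        return len(data)

    def flush(self):
        # One storage request per dirty block instead of one per 4K write.
        for block_no in sorted(self._pending):
            self._flush_block(block_no, bytes(self._pending[block_no]))
        self._pending.clear()
```

Calling `flush` from the Fuse `flush`/`release` handlers would be the natural place, though that wiring is not shown here.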

Invalidate block cache

At the moment the blocks are cached indefinitely which will cause memory overflow. Need to implement purging of blocks out of cache.
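
A bounded least-recently-used cache is one common way to purge blocks. The following is a sketch with a hypothetical `BlockCache` class, not the existing cache; it assumes Python 3 for `OrderedDict.move_to_end`.

```python
from collections import OrderedDict


class BlockCache:
    """Keeps at most `max_blocks` entries, evicting the least recently used."""

    def __init__(self, max_blocks):
        self._max = max_blocks
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, block):
        self._items[key] = block
        self._items.move_to_end(key)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # drop the oldest entry

    def invalidate(self, key):
        self._items.pop(key, None)

    def __len__(self):
        return len(self._items)
```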

Operations running with user in multiple groups can fail due to inability to retrieve user groups by FS

For example, chown only allows changing to groups that the user belongs to. A user in multiple groups can change between them. However, the fstest will fail (chown/00.t, 36) because the FS can only get the number of the first group.
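
On Linux, one workaround is to read the supplementary groups of the calling process from `/proc/<pid>/status`, using the pid that Fuse reports for the request. The sketch below assumes Linux and that the calling pid is available (fusepy exposes it through `fuse_get_context()`); the function names are hypothetical.

```python
def parse_groups(status_text):
    """Extract supplementary group IDs from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return [int(g) for g in line.split()[1:]]
    return []


def groups_of_pid(pid):
    """Return the supplementary groups of a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_groups(f.read())
```

The process may exit before the file is read, so a real caller would need to handle the resulting `IOError`/`OSError`.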

Test with new version of FSTest

All previous tests were done with the version pjd-fstest-20080816.tgz, released on August 16, 2008.

There is a newer version which has better coverage: pjd-fstest-20090130-RC.tgz, released on January 30, 2009.

Boto 2.9.8 does not know about ap-southeast-2 region for DynamoDB

getconf NAME_MAX does not work on the mounted Dynamo FS

There is a corresponding issue against fusepy. This is just a reminder that once that issue is fixed, something needs to be done here as well.

Implement access call

Review the need for block cache

With the new design and improved concurrency properties we may not need the block cache anymore. Previously the first block held metadata, which was one of the reasons to cache. Also, there were no concurrency checks.

Support for big files

At the moment the size of the file is supposedly stored in a 32-bit integer.
Also, a big file will have many blocks, but there is no pagination in the Dynamo requests.

Modify the operations which perform +1 to perform "increment" update

There are several operations (mostly rename, link) which update the block holding the link count. This is done using +1 in code, which may leave the blocks inconsistent if there is a concurrent update. Instead, they should issue an "increment" operation, which can be executed in parallel without conflict.
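
DynamoDB's UpdateItem supports an `ADD` action that increments a numeric attribute on the server side. The sketch below only builds the request parameters in the shape used by the boto3 low-level client; the project itself uses Boto 2, so this is an illustration rather than a drop-in change, and the table, key, and attribute names are all hypothetical.

```python
def increment_link_count_request(table, hash_key, range_key, delta=1):
    """Build UpdateItem parameters that atomically add `delta` to a counter.

    Attribute names ("path", "name", "st_nlink") are assumptions made for
    this example only.
    """
    return {
        "TableName": table,
        "Key": {"path": {"S": hash_key}, "name": {"S": range_key}},
        # ADD is applied by DynamoDB itself, so concurrent increments
        # do not overwrite each other the way read-modify-write does.
        "UpdateExpression": "ADD #links :delta",
        "ExpressionAttributeNames": {"#links": "st_nlink"},
        "ExpressionAttributeValues": {":delta": {"N": str(delta)}},
    }
```

With boto3 this would be sent as `boto3.client("dynamodb").update_item(**request)`; a negative `delta` decrements the counter.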

LockManager resilience

Sometimes there are Fuse calls for release/lock against the file that we never saw before. LockManager fails in that case as there is no fileLock entry in dict. Need to either ignore failure (and create) or do nothing.

Also need to think about automatic purge of old fileLock entries.
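
Both points could be handled along these lines. This is a simplified sketch that tracks only a holder count per path, not the real LockManager; the class layout and the `max_idle` parameter are assumptions.

```python
import time


class LockManager:
    """Tracks lock holders per path and tolerates unknown paths."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # path -> {"holders": int, "last_used": float}

    def _entry(self, path):
        # Create the entry on demand instead of failing with KeyError.
        entry = self._entries.setdefault(path, {"holders": 0, "last_used": 0.0})
        entry["last_used"] = self._clock()
        return entry

    def lock(self, path):
        self._entry(path)["holders"] += 1

    def release(self, path):
        entry = self._entry(path)
        entry["holders"] = max(0, entry["holders"] - 1)

    def purge(self, max_idle):
        """Drop entries with no holders that were idle longer than max_idle."""
        now = self._clock()
        stale = [p for p, e in self._entries.items()
                 if e["holders"] == 0 and now - e["last_used"] > max_idle]
        for p in stale:
            del self._entries[p]
        return len(stale)
```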

Fix failing fstest

More than half of the tests are failing. The major areas are rename, open, mkdir.

Error while creating symbolic links

For some reason the operation succeeds, but Fuse reports an error:

root@ubuntu-VirtualBox:/mnt/pc/dynamo-fuse# ln -s /mnt/dynamo/aaa /mnt/dynamo/s_aaa
fuse: bad error value: 8
ln: failed to create symbolic link `/mnt/dynamo/s_aaa': Numerical result out of range

Add an ability to automatically create the Dynamo table

At the moment the table has to be pre-created manually, and it must have particular structure. It would be better if the application created (if absent) the table by itself with the right structure.

Consider using MVCC for concurrency

MVCC makes it possible to implement reads/writes without read/write locks, which is tempting as those locks mean additional round trips to Dynamo.

GT comparison on block number is done on strings

In truncate, there is a query for the tail blocks which uses GT(str(lastBlock)). As the range key is a String, this will not produce the correct result. One workaround would be to format all numbers with leading zeros up to the limit on the number of blocks.
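
The problem and the workaround can be shown in a few lines. The width of 10 digits is an assumption for the example; the real limit would follow from the maximum number of blocks.

```python
BLOCK_KEY_WIDTH = 10  # assumed width, enough for block numbers below 10**10


def block_key(block_no, width=BLOCK_KEY_WIDTH):
    """Format a block number so that string order matches numeric order."""
    if block_no < 0 or block_no >= 10 ** width:
        raise ValueError("block number out of range: %r" % (block_no,))
    return str(block_no).zfill(width)


# Plain str() compares character by character, so "10" sorts before "9".
plain_is_wrong = not (str(10) > str(9))
# Zero-padded keys compare the same way the numbers do.
padded_is_right = block_key(10) > block_key(9)
```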

Support standard AWS credentials configuration mechanisms

At the moment only the simplest mechanism via environment variables is supported. This needs to be extended to support all AWS mechanisms.

Implement support for Java file locks

This is imagined to be a common use-case

Document the design

Test file locks on links and symlinks

Cache some records

It is obvious from the logs that Fuse performs some operations several times in a row. For example, getattr is called multiple times for the same file.

This presents significant opportunities for optimisation by caching, especially for any simple records like directory or file descriptor (but not data).
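
A short-lived cache keyed by path would absorb the repeated getattr calls. The sketch below is one possible shape: `AttrCache` and the `load` callback are hypothetical, and the TTL is a trade-off, since other clients may change the record in Dynamo while it is cached.

```python
import time


class AttrCache:
    """Caches small records (e.g. getattr results) for a short time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # path -> (stored_at, attrs)

    def get(self, path, load):
        now = self._clock()
        hit = self._entries.get(path)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        attrs = load(path)  # hypothetical loader that reads from storage
        self._entries[path] = (now, attrs)
        return attrs

    def invalidate(self, path):
        # Call after any local write to the record (chmod, chown, write...).
        self._entries.pop(path, None)
```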

Add version tracking and concurrency checks

This will introduce a version field which is incremented every time a write is made. It will be verified against the current version to ensure that no concurrent modification happened. If one did, we need to figure out whether we can write an automatic transaction retry mechanism.
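
The version check and a simple retry loop might look like this. It is a sketch against an in-memory stand-in for the table; `VersionedStore`, `ConflictError`, and `update_with_retry` are hypothetical names, and in Dynamo the check would be a conditional write rather than a Python comparison.

```python
class ConflictError(Exception):
    """Raised when the stored version differs from the expected one."""


class VersionedStore:
    """In-memory stand-in for a table with a version field per record."""

    def __init__(self):
        self._records = {}  # key -> (version, value)

    def get(self, key):
        return self._records.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        current, _ = self._records.get(key, (0, None))
        if current != expected_version:
            raise ConflictError(key)
        self._records[key] = (current + 1, value)


def update_with_retry(store, key, mutate, attempts=3):
    """Re-read and re-apply `mutate` when a concurrent write is detected."""
    for _ in range(attempts):
        version, value = store.get(key)
        try:
            store.put_if_version(key, version, mutate(value))
            return version + 1
        except ConflictError:
            continue
    raise ConflictError(key)
```

Automatic retry is only safe when `mutate` can be re-applied to the fresh value, which holds for attribute updates but may not hold for every operation.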

Implement support for xattrs

Active development?

Is this project still under active development, or has it been abandoned?

Documentation lists python-fuse

The documentation currently lists python-fuse as a requirement, but the python-fuse package is not available under pip (RH6 based). Is this something that is specific to an OS distro? The documentation references yum, so the assumption is that it is available for Red Hat based OSes, but the libraries do not exist. Also, what versions of Python are required to run the software?

Hard links don't reference count the blocks

When hard links are created they just inherit the blockId of the source. If the source is deleted, the block is deleted as well, even though there may still be hard links to it.
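
Reference counting would keep the block alive until the last name is removed. The sketch below uses in-memory dicts as a stand-in for the table; the class and field names are hypothetical.

```python
class LinkedBlocks:
    """Names point at blocks; a block is deleted only with its last link."""

    def __init__(self):
        self.names = {}   # path -> block id
        self.blocks = {}  # block id -> {"data": bytes, "links": int}

    def create(self, path, block_id, data):
        self.blocks[block_id] = {"data": data, "links": 1}
        self.names[path] = block_id

    def link(self, source, target):
        block_id = self.names[source]
        self.blocks[block_id]["links"] += 1
        self.names[target] = block_id

    def unlink(self, path):
        block_id = self.names.pop(path)
        block = self.blocks[block_id]
        block["links"] -= 1
        if block["links"] == 0:
            del self.blocks[block_id]  # last name gone, free the data
```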

Implement file locking

At the moment there is no file locking mechanism

Hardlink of a node (e.g. fifo)

Rename (mv) is not working

Throwing exception:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/fuse.py", line 448, in _wrapper
    return func(*args, **kwargs) or 0
  File "/usr/local/lib/python2.7/dist-packages/fuse.py", line 488, in rename
    new.decode(self.encoding))
  File "dynamofs.py", line 51, in __call__
    ret = getattr(self, op)(path, *args)
  File "dynamofs.py", line 166, in rename
    item.hash_key = new
AttributeError: can't set attribute

Lock recovery

If a client system which took a lock on a file dies, no other system will be able to lock its files. Worse, if the system restarts, the in-memory lockId will be gone and the system itself won't be able to unlock.

Lock recovery needs to be implemented: if a system is identified as dead, its locks should be released. This can be done by some other system, based on a "last client check-in" time.

Clients check in every now and then, recording the time in some object.
If a lock has been held for a long time, other clients can read the check-in object and compare it with the lock time (which needs to be added as well). If the last check-in is too far past the expected time, another client can declare that system dead and release its locks. Other clients would then only need to check the system status.
This also depends on #41 for better error handling when the client simply times out: if it is unable to lock/unlock on Dynamo, that should not break the client. Unlock should succeed in most cases.
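
The dead-client check could be expressed as below. This is a sketch only: the data shapes (`locks`, `check_ins`), the check-in interval, and the number of missed check-ins tolerated are all assumptions, and clock skew between clients is ignored.

```python
def is_client_dead(last_check_in, now, check_in_interval, missed_allowed=3):
    """A client is presumed dead after missing several expected check-ins."""
    return now - last_check_in > check_in_interval * missed_allowed


def stale_locks(locks, check_ins, now, check_in_interval, missed_allowed=3):
    """Return the paths whose lock owner appears to be dead.

    locks:     path -> (client_id, lock_time)
    check_ins: client_id -> time of last check-in
    A client that never checked in is judged by its lock time instead.
    """
    result = []
    for path, (client_id, lock_time) in locks.items():
        last_seen = check_ins.get(client_id, lock_time)
        if is_client_dead(last_seen, now, check_in_interval, missed_allowed):
            result.append(path)
    return sorted(result)
```

Releasing the returned locks should itself be a conditional write, so that two recovering clients do not both act on the same lock.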

Implement mknod

Move/delete file being written/read

Linux filesystems allow files which are being written/read to be moved or even deleted without causing errors. This is probably because an open file creates a hidden hard link to the inode (not persisted), which keeps the inode alive while the file name is removed.

We need to implement this in some way, preferably without locking, as I believe this mechanism is used by log rotators/archivers.