Git Product home page Git Product logo

delugefs's Introduction

delugefs

Overview DelugeFS is a distributed filesystem I implemented as a proof of concept of several ideas I've had over the past few years.

Key features include:

a shared-nothing architecture (meaning there is no master node controlling everything - all peers are equal and there is no single point of failure) auto-discovery of peers able to utilize highly heterogeneous computational resources (ie: a random bunch of disks stuck in a random number of machines) Key insights this FS proves:

Efficient distribution of large blocks of immutable data without centralized control is a solved problem. I use libtorrent. Efficient sharing of small quantities of mutable data (without a central server) is a solved problem. I use Mercurial, but Git would have been equally useful. Automatic discovery of peers on a local network is a solved problem. Zeroconf/Bonjour is well established. Emulating a filesystem is a solved problem with FUSE. All of these projects have Python bindings! The key node in a distributed filesystem is the disk, not the machine. Everything above the disk is network topology. Current Status HIGHLY EXPERIMENTAL! -- PROOF OF CONCEPT ONLY -- DO NOT USE FOR ANY CRITICAL DATA AT THIS POINT!

I'm using it as personal media center storage spanning three disks on two machines. It works well so far, but it still very early in development.

Speed:

I/O across the FUSE boundary is CPU limited. Max observed is ~10MB/s. I suspect this is a limitation of the Python FUSE bindings I'm using. I/O between nodes is limited by the disk read/write speeds. I've observed >70MB/s sustained on my home network. Known Issues Files over ~4GB are not stored (and their zero-length stubs cannot be deleted). I believe this is due to an int vs. long incompatibility with libtorrent, but I haven't confirmed. Basic Algorithm To start up:

Filesystem is started given a volume id, a storage location, and a mount point. Filesystem searches for local peers. Filesystem either pulls from our clones other peer's Hg repositories. Filesystem looks for any files it has locally (complete or not), and starts seeding them. To write a file:

Filesystem client opens a file and attempts to write. Filesystem returns a handle to a local temporary file. Client writes to file and then closes it. Filesystem create a torrent of that file (containing metadata about the file along with secure hashes of its contents) and commits it to a local Hg repository. Filesystem contacts local peers and sends them a pull request. Other peers pull the file metadata update. To read a file:

If filesystem already has a copy of the file requested it returns the data directly. Filesystem loads the torrent file and starts searching for a peer with the file data via Bittorrents distributed hash table (DHT) peer discovery mechanism. Filesystem waits for the blocks covering the read request to become available, and then returns the data to the filesystem client. To replicate a file:

All peers participate in the bittorrent swarms associated with each file they have local copies of. If a peer notices the swarm size falls below a threshold, it will send out clone requests to a subset of its peers until the swarm size increases. Example Usage The first time the first node is ever brought up:

server1$ ./delugefs.py --create --id bigstore
--root /mnt/disk1/.bigstoredb
--mount ~/bigstore All future invocations would omit the "--create".

To bring up an additional node on a different disk on the same machine:

server1$ ./delugefs.py --id bigstore
--root /mnt/disk2/.bigstoredb (note the lack of the optional mount point)

To bring up an additional node on a different machine:

server2$ ./delugefs.py --id bigstore
--root /mnt/disk3/.bigstoredb
--mount ~/bigstore That's all there is to it!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.