BlobIt

BlobIt is a distributed binary large object (BLOB) store built upon Apache BookKeeper.

Overview

BlobIt stores BLOBs (binary large objects) in buckets; a bucket is like a namespace. Multitenancy is fundamental in the BlobIt architecture, and each tenant is expected to use its own bucket.

Data is stored on an Apache BookKeeper cluster, and this automagically enables BlobIt to scale horizontally: the more Bookies you have, the more data you will be able to store.

BlobIt needs a metadata service in order to store references to the data; it ships by default with HerdDB, which is also built upon BookKeeper.

Architectural overview

BlobIt is designed for performance, and especially for low latency in this scenario: a writer stores one BLOB and readers immediately read that BLOB (usually from different machines). This is the most common path in EmailSuccess, for which BlobIt is the core datastore. BLOBs are expected to be retained for a couple of weeks, not for the very long term, but nothing in the design of BlobIt prevents you from storing data for years.

BlobIt clients talk directly to Bookies for both reads and writes; this way we directly exploit all of the BookKeeper optimizations on the write and read paths. This architecture is totally decentralized: there is no BlobIt server. You can use the convenience BlobIt service binaries, which are simply a pre-packaged bundle able to run ZooKeeper, BookKeeper, HerdDB and a REST API.

You can see BlobIt as a simple extension to BookKeeper, with a metadata layer that makes it simple to:

  • reference data using a user-supplied name (in the form of bucketId/name)
  • organize data efficiently in BookKeeper, and allow deletion of BLOBs.

Writes

Batches of BLOBs are stored in BookKeeper ledgers (using the WriteHandleAdv API); we store more than one BLOB inside one BookKeeper ledger. BlobIt will collect unused ledgers and delete them.

When a writer stores a BLOB it immediately receives a unique ID for the blob; this ID is unique in the whole cluster, not only within the scope of the bucket. Such an ID is a "smart id": it contains all of the information needed to retrieve the data without using the metadata service.

Such an ID contains information like:

  • the ID of the ledger (64 bit)
  • the first entry id (64 bit)
  • the number of entries (32 bit)
  • the size of the entry (64 bit)

With such information it is possible to read the whole BLOB or even only parts of it. An object is immutable and cannot be modified.
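
For illustration, a minimal sketch of how such a composed ID can be modeled; the layout below simply mirrors the fields listed above and is not BlobIt's actual encoding:

// Hypothetical model of a "smart id"; the real BlobIt encoding may differ.
public final class BlobId {
    public final long ledgerId;     // ID of the BookKeeper ledger (64 bit)
    public final long firstEntryId; // first entry id inside the ledger (64 bit)
    public final int numEntries;    // number of entries making up the BLOB (32 bit)
    public final long size;         // size of the entry/object (64 bit)

    public BlobId(long ledgerId, long firstEntryId, int numEntries, long size) {
        this.ledgerId = ledgerId;
        this.firstEntryId = firstEntryId;
        this.numEntries = numEntries;
        this.size = size;
    }

    // With these fields a reader can fetch all entries
    // [firstEntryId, firstEntryId + numEntries) from ledgerId,
    // or only the entries covering a requested byte range.
}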

The client can assign a custom name, unique inside the context of the bucket; readers will be able to access the object using this key. You can assign the same key to another object: this way you can effectively replace an object, even though objects themselves are immutable.

If you use custom keys, both the writer and the reader have to perform an additional RPC to the metadata service.
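
A sketch of the named-object flow (put with a name is shown in the Java client example below; the getByName lookup method is an assumption here, check the actual BucketHandle API):

BucketHandle bucket = manager.getBucket("test");
// writing with a custom name costs one extra RPC to the metadata service
String id = bucket.put("mails/12345", data).get();
// a reader that only knows the custom name needs a metadata lookup as well
byte[] content = bucket.getByName("mails/12345").get(); // getByName assumed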

Write path

The BookKeeper client stores data in immutable ledgers and performs writes to a quorum of Bookies, which are pure data storage nodes. Each ledger is written to several Bookies, and all the information needed for data retrieval is stored on ZooKeeper. ZooKeeper also stores the data needed for Bookie discovery.

So the normal write flow is:

  • create a new ledger:
    • choose a set of available Bookies using ZooKeeper
    • write new ledger metadata to ZooKeeper
  • write each part of the BLOB directly to the Bookies
  • record on the metadata service (HerdDB) which entries of the ledger contain the data:
    • perform an RPC to the leader of the bucket's HerdDB tablespace to write the metadata
    • the database will in turn perform a write on BookKeeper (again to a quorum of Bookies)

The metadata service is decentralized: each bucket has a dedicated tablespace on HerdDB, and its leader and ledgers are independent from those of other buckets' tablespaces; this way the system scales horizontally with the number of buckets.
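
To make the BookKeeper side of this flow concrete, here is a minimal sketch of writing entries with the WriteAdvHandle API; this illustrates the underlying mechanism only, it is not BlobIt's actual internal code, and the ZooKeeper URI is a placeholder:

import org.apache.bookkeeper.client.api.BookKeeper;
import org.apache.bookkeeper.client.api.DigestType;
import org.apache.bookkeeper.client.api.WriteAdvHandle;
import org.apache.bookkeeper.conf.ClientConfiguration;

ClientConfiguration conf = new ClientConfiguration()
        .setMetadataServiceUri("zk+hierarchical://localhost:2181/ledgers");
try (BookKeeper bk = BookKeeper.newBuilder(conf).build()) {
    // Creating the ledger selects an ensemble of Bookies via ZooKeeper
    // and writes the new ledger metadata to ZooKeeper.
    WriteAdvHandle handle = bk.newCreateLedgerOp()
            .withEnsembleSize(3)
            .withWriteQuorumSize(2)
            .withAckQuorumSize(2)
            .withDigestType(DigestType.CRC32C)
            .withPassword(new byte[0])
            .makeAdv()   // the Adv variant lets the client assign entry ids
            .execute()
            .get();
    // Each part of the BLOB is written directly to the Bookies.
    handle.writeAsync(0, "blob-part-0".getBytes()).get();
    handle.writeAsync(1, "blob-part-1".getBytes()).get();
    handle.close();
}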

Reads

BlobIt clients read data directly from Bookies, because the objectId contains all of the information needed to access the data. In case of lookup by custom key, an additional lookup on the metadata service is needed.

The reader can read the full BLOB or only parts of it. BlobIt supports a streaming API for reads, suitable for very large objects: as soon as data comes from BookKeeper it is streamed to the application. Think about an HTTP service that retrieves an object and serves it directly to the client.
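
A hedged sketch of that HTTP scenario; the download(...) signature and the httpResponse object are assumptions, verify them against the actual BucketHandle API:

// Sketch: stream an object straight into an HTTP response as entries arrive.
BucketHandle bucket = manager.getBucket("test");
OutputStream out = httpResponse.getOutputStream(); // hypothetical servlet response
bucket.download(objectId,
        len -> httpResponse.setContentLengthLong(len), // called once the size is known
        out,
        0,    // offset: start of the object
        -1)   // length: -1 = whole object (assumed convention)
      .get();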

Buckets and data locality

You can use buckets in order to store data near the writer or the reader. BlobIt is able to use one HerdDB tablespace per bucket; this way all of the metadata of the bucket will be handled using the placement policies configured in the system.

This is very important, because each bucket will be able to survive and work independently from the others.

A typical scenario is to move readers, writers, the primary copy of metadata and the data next to each other, and to keep replicas on other machines/racks. For both the metadata service (HerdDB) and the data service (BookKeeper), replicas will be activated immediately as soon as the reference machines are no longer available, without any service outage.

Deleting data

Data deletion is the trickiest part: because BlobIt stores more than one BLOB inside the same ledger, a ledger can be deleted only when there is no live BLOB stored in it. We have a garbage collector system which performs maintenance of the bucket and deletes data from BookKeeper when it is no longer needed. Bookies in turn will do their own garbage collection, depending on their configuration. So disk space won't be reclaimed as soon as a BLOB is deleted.

BlobIt garbage collection is totally decentralized: any client can run the procedure, and it runs per bucket. Here too it is expected that services which operate on a bucket are co-located and take care of running the garbage collection in a timely manner. Usually it makes sense to run the GC of a bucket after deleting a batch of BLOBs of the same bucket, as in the sketch below.
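
A minimal sketch of that pattern (the gc() call appears in the project's own code; the delete call taking the object id is an assumption consistent with the rest of the API):

BucketHandle bucket = manager.getBucket("test");
// delete a batch of BLOBs of the same bucket...
for (String objectId : idsToDelete) {
    bucket.delete(objectId).get(); // delete(objectId) assumed per the BucketHandle API
}
// ...then run the decentralized GC for that bucket: unused ledgers are collected
// here, actual disk space is reclaimed later by the Bookies' own garbage collection.
bucket.gc();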

Java Client example

A typical writer looks like this:

import herddb.jdbc.HerdDBDataSource;
import org.blobit.core.api.BucketConfiguration;
import org.blobit.core.api.BucketHandle;
import org.blobit.core.api.Configuration;
import org.blobit.core.api.ObjectManager;
import org.blobit.core.api.ObjectManagerFactory;

String BUCKET_ID = "test";
byte[] TEST_DATA = "foo".getBytes();
HerdDBDataSource datasource = new HerdDBDataSource();
datasource.setUrl("jdbc:herddb:localhost");
Configuration configuration
                = new Configuration()
                    .setType(Configuration.TYPE_BOOKKEEPER)
                    .setConcurrentWriters(10)
                    .setUseTablespaces(true)
                    .setZookeeperUrl("localhost:2181"); // the tests use env.getAddress()
try (ObjectManager manager = ObjectManagerFactory.createObjectManager(configuration, datasource)) {
    // create the bucket
    manager.createBucket(BUCKET_ID, BUCKET_ID, BucketConfiguration.DEFAULT).get();

    BucketHandle bucket = manager.getBucket(BUCKET_ID);
    // put(name, data): a null name means no custom key; the returned id is the "smart id"
    String id = bucket.put(null, TEST_DATA).get();
}

A typical reader looks like this:

String BUCKET_ID = "test";
HerdDBDataSource datasource = new HerdDBDataSource();
datasource.setUrl("jdbc:herddb:localhost");
Configuration configuration
                = new Configuration()
                    .setType(Configuration.TYPE_BOOKKEEPER)
                    .setConcurrentWriters(10)
                    .setUseTablespaces(true)
                    .setZookeeperUrl("localhost:2181"); // the tests use env.getAddress()
try (ObjectManager manager = ObjectManagerFactory.createObjectManager(configuration, datasource)) {
    BucketHandle bucket = manager.getBucket(BUCKET_ID);

    // "id" is the smart id previously returned by the writer's put()
    byte[] data = bucket.get(id).get();
}

Most of the APIs are asynchronous and are based on CompletableFuture.
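
For example, a writer can compose on the returned promise instead of blocking on get(); this is a sketch only, and the id and future fields on PutPromise are assumptions to verify against the actual promise classes:

PutPromise put = bucket.put(null, payload);
// The "smart id" is computed client side, so it may be available before
// the write has been acknowledged (assumption; verify on PutPromise).
String id = put.id;
put.future.thenRun(() -> System.out.println("BLOB " + id + " is durable"));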

REST API

We provide a REST service which is (almost) compatible with the OpenStack Swift API. This service is still in ALPHA phase and it is currently used to perform benchmarks and comparisons with other products.

Security

As BlobIt is mostly a layer on top of BookKeeper, ZooKeeper and HerdDB, all of the security aspects are handled directly by the low-level clients. Our suggestion is to enable SASL/Kerberos authentication on all of these services. There is no support for other security features, because we expect the client application to be in charge of handling the semantics of buckets/blobs.

Getting in touch

Feel free to create issues in order to interact with the community.

Documentation will come soon; meanwhile, start with the examples inside the test cases in order to better understand how it works.

Please let us know if you are trying out this project; we will be happy to hear about your use case and to help you.

License

BlobIt is released under the Apache 2 license.

blobit's Issues

BKLedgerClosedException: Attempt to write to a closed ledger

In our blobit integration (in EmailSuccess.com), we encountered the exception "Attempt to write to a closed ledger" for a few seconds.
Can you help us fix this problem?

Caused by: org.blobit.core.api.ObjectManagerRuntimeException: org.blobit.core.api.ObjectManagerException: org.apache.bookkeeper.client.BKException$BKLedgerClosedException: Attempt to write to a closed ledger
        at org.blobit.core.cluster.BucketWriter.lambda$writeBlob$0(BucketWriter.java:237)
        at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
        at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
        at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
        ... 3 more
Caused by: org.blobit.core.api.ObjectManagerException: org.apache.bookkeeper.client.BKException$BKLedgerClosedException: Attempt to write to a closed ledger
        ... 7 more
Caused by: org.apache.bookkeeper.client.BKException$BKLedgerClosedException: Attempt to write to a closed ledger
        at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83)
        at org.apache.bookkeeper.client.SyncCallbackUtils$SyncAddCallback.addComplete(SyncCallbackUtils.java:251)
        at org.apache.bookkeeper.client.AsyncCallback$AddCallback.addCompleteWithLatency(AsyncCallback.java:91)
        at org.apache.bookkeeper.client.LedgerHandleAdv$1.safeRun(LedgerHandleAdv.java:249)
        at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        ... 1 more

CLI Command to describe a blob

It would be useful, for testing, to have a new CLI command that, given a BLOB, describes it:

  • Basic info (size, number of entries, entry size...)
  • BK location info (which bookies are storing the blob segments??)

License

Add a license reference in the pom, like:

  <licenses>
    <license>
      <name>Apache License, Version 2.0</name>
      <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
      <distribution>repo</distribution>
    </license>
  </licenses>

Support write with offset and length

Support a method on ObjectManager to write only a portion of a given byte array:

On ObjectManager:

public Future<String> put(String bucketId, byte[] data, int offset, int len);

Can BlobIt be used on Linux, and is any CLI available to use it?

Hello,
I'm looking to use this application to upload and download packages as part of an application development.
Does this application have APIs?
Does this application have a CLI? If yes, please provide commands to install and use it.
Can this application be installed and run on Linux (Ubuntu, CentOS)?

It would be helpful to know the above.

Thanks

Add an API to list named Blobs

An API to list named blobs, that is blobs with a custom name, would be very useful.

This API will use HerdDB SQL queries, so we can have a SQL LIKE syntax.

Bucket.listObject(ObjectFilter filter, Consumer<ObjectMetadata> sink)

with filters built like ObjectFilter.nameLike("%myname%")

We will use this API on the CLI

Error while sending reply message to PooledUnsafeDirectByteBuf(freed) while talking to null

20-03-17-12-02-15       java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error java.util.concurrent.ExecutionException: java.lang.Exception: herddb.network.netty.NettyChannel$3@5bdb86aa: error while sending reply message to PooledUnsafeDirectByteBuf(freed) while talking to null
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error java.util.concurrent.ExecutionException: java.lang.Exception: herddb.network.netty.NettyChannel$3@5bdb86aa: error while sending reply message to PooledUnsafeDirectByteBuf(freed) while talking to null
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at magnews.gateway.archiver.Archiver.run(Archiver.java:319)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:830)
Caused by: java.lang.RuntimeException: Error java.util.concurrent.ExecutionException: java.lang.Exception: herddb.network.netty.NettyChannel$3@5bdb86aa: error while sending reply message to PooledUnsafeDirectByteBuf(freed) while talking to null
        at herddb.network.Channel.sendMessageWithPduReply(Channel.java:95)
        at herddb.client.RoutedClientSideConnection.prepareQuery(RoutedClientSideConnection.java:347)
        at herddb.client.RoutedClientSideConnection.executeScan(RoutedClientSideConnection.java:699)
        at herddb.client.HDBConnection.executeScan(HDBConnection.java:345)
        at herddb.jdbc.HerdDBPreparedStatement.executeQuery(HerdDBPreparedStatement.java:74)
        at org.blobit.core.cluster.HerdDBMetadataStorageManager.listDeletableLedgers(HerdDBMetadataStorageManager.java:306)
        at org.blobit.core.cluster.ClusterObjectManager.gcBucket(ClusterObjectManager.java:430)
        at org.blobit.core.cluster.ClusterObjectManager.access$000(ClusterObjectManager.java:64)
        at org.blobit.core.cluster.ClusterObjectManager$BucketHandleImpl.gc(ClusterObjectManager.java:96)
        at magnews.gateway.store.blobit.BlobitMessageStore.endQueueCleanups(BlobitMessageStore.java:199)
        at magnews.gateway.archiver.Archiver$QueueArchiverTask.run(Archiver.java:283)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        ... 3 more
Caused by: java.lang.Exception: herddb.network.netty.NettyChannel$3@5bdb86aa: error while sending reply message to PooledUnsafeDirectByteBuf(freed)
        at herddb.network.netty.NettyChannel$3.messageSent(NettyChannel.java:208)
        at herddb.network.netty.NettyChannel$1.operationComplete(NettyChannel.java:142)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:502)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:476)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:415)
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:540)
        at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:533)
        at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:114)
        at io.netty.util.concurrent.PromiseCombiner.tryPromise(PromiseCombiner.java:170)
        at io.netty.util.concurrent.PromiseCombiner.finish(PromiseCombiner.java:159)
        at io.netty.handler.codec.MessageToMessageEncoder.writePromiseCombiner(MessageToMessageEncoder.java:139)
        at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:119)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708)
        at io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56)
        at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102)
        at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149)
        at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:405)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:338)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        ... 1 more
Caused by: io.netty.channel.ExtendedClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)

Cannot perform release on 0.3 due to error in NamesAPITest.java:[150,30] no suitable method found for equals(byte[],int,int,byte[],int,int)

[INFO] [ERROR] /Users/enrico.olivelli/dev/blobit/target/checkout/blobit-core/src/test/java/org/blobit/core/common/NamesAPITest.java:[150,30] no suitable method found for equals(byte[],int,int,byte[],int,int)
[INFO] method java.lang.Object.equals(java.lang.Object) is not applicable
[INFO] (actual and formal argument lists differ in length)
[INFO] method java.util.Arrays.equals(long[],long[]) is not applicable
[INFO] (actual and formal argument lists differ in length)
[INFO] method java.util.Arrays.equals(int[],int[]) is not applicable
[INFO] (actual and formal argument lists differ in length)
[INFO] method java.util.Arrays.equals(short[],short[]) is not applicable
[INFO] (actual and formal argument lists differ in length)
[INFO] method java.util.Arrays.equals(char[],char[]) is not applicable
[INFO] (actual and formal argument lists differ in length)
[INFO] method java.util.Arrays.equals(byte[],byte[]) is not applicable
[INFO] (actual and formal argument lists differ in length)
[INFO] method java.util.Arrays.equals(boolean[],boolean[]) is not applicable
[INFO] (actual and formal argument lists differ in length)
[INFO] method java.util.Arrays.equals(double[],double[]) is not applicable
[INFO] (actual and formal argument lists differ in length)
[INFO] method java.util.Arrays.equals(float[],float[]) is not applicable
[INFO] (actual and formal argument lists differ in length)
[INFO] method java.util.Arrays.equals(java.lang.Object[],java.lang.Object[]) is not applicable
[INFO] (actual and formal argument lists differ in length)

Bookie - Write ledger header

We need to write an entry to the ledger before adding it to the list of active ledgers to work around a possible fault case in BookKeeper when there is no entry on the Bookie and one Bookie goes down.

See diennea/herddb#572 for more information

BlobIt filesystem

We can build a layer on top of BlobIt in order to create a virtual file system, with these features:

  • a FS-tree-like structure (introduce the concept of "directory")
    -- directory listing
    -- create directory
    -- remove directory
  • information about file size, disk size (virtual and real), possibly per directory (without needing a full scan)
  • fileExists
  • getFileInfo (type, size, lastmodification)
  • setLastModified
  • renameFile into same directory
  • renameFile into other directory (move)
  • delete file
  • read file (partial reads, offset+length)
  • overwrite file
  • truncate file to length (pre-allocate??)
  • write a file (full write of the whole blob)
  • streaming write without knowing the length in advance (not yet supported by BlobIt)
  • streaming write in "append mode" of a byte[] (something like 'concat', not yet supported by BlobIt)
    -- streaming write in "append mode" without knowing the length in advance (not yet supported by BlobIt)

We expect that one Filesystem resides on a single Bucket.
We can add one or more metadata tables in order to store directory metadata; this will ease lookups and queries.

No need for ACLs, file locks or other POSIX features
