
DataTanker

Embedded persistent key-value store for .NET. Pure C# code, B+Tree and RadixTree access methods, MIT License.

Features

  • simple and lightweight
  • fast enough
  • variable length values
  • concurrent access
  • atomic operations
  • user-defined serialization routines

Performance

I have not performed a detailed comparison with competitors, since benchmarks published by competing developers are usually biased and misleading. However, here are some performance figures. To check them yourself, run the Performance.exe test utility.

  • sequentially insert and read 1 000 000 integer keys with 20-byte values - 16 sec
  • insert 100 000 integer keys with large values (from 200 to 5000 bytes) - 12 sec
  • perform 1 000 000 random operations on a set of 100 000 keys - 8 sec

Feedback

Any feedback is welcome. Feel free to ask a question, report a bug, or request a feature.

Usage

    [Serializable]
    class Employee
    {
        public int Id { get; set; }

        public string FirstName { get; set; }
        public string LastName { get; set; }

        public decimal Salary { get; set; }

        public override string ToString()
        {
            return $"Id: {Id}, Name: {FirstName} {LastName}, Salary {Salary}";
        }
    }

    class Program
    {
        public static void SaveEmployee(StorageFactory factory, Employee emp)
        {
            // create storage with integer keys and Employee values
            using (var storage = factory.CreateBPlusTreeStorage<int, Employee>(BPlusTreeStorageSettings.Default(sizeof(int))))
            {
                storage.OpenOrCreate(Directory.GetCurrentDirectory());
                storage.Set(emp.Id, emp);
            }
        }

        public static Employee ReadEmployee(StorageFactory factory, int id)
        {
            // open the same storage and fetch the value by key
            using (var storage = factory.CreateBPlusTreeStorage<int, Employee>(BPlusTreeStorageSettings.Default(sizeof(int))))
            {
                storage.OpenOrCreate(Directory.GetCurrentDirectory());
                return storage.Get(id);
            }
        }

        static void Main(string[] args)
        {
            var emp1 = new Employee { Id = 1, FirstName = "John", LastName = "Smith", Salary = new decimal(100000) };
            var emp2 = new Employee { Id = 2, FirstName = "Steve", LastName = "Barret", Salary = new decimal(150000) };

            var factory = new StorageFactory();
            SaveEmployee(factory, emp1);
            SaveEmployee(factory, emp2);

            var emp3 = ReadEmployee(factory, 1);
            var emp4 = ReadEmployee(factory, 2);

            Console.WriteLine(emp3);
            Console.WriteLine(emp4);
        }
    }

Where should I use it?

Use it when persistent data storage is required, but the file system alone is not suitable and a full DBMS has too much overhead. For example: query caches, message queues, large amounts of temporary data, etc.

Access methods

DataTanker currently supports two access methods: B+Tree and RadixTree. The B+Tree has a good fill factor and shows the best performance on small keys without hierarchy. The RadixTree is well suited for storing long hierarchical keys such as file paths.

Keys

Each existing key corresponds to exactly one value. Keys of common types have built-in serializers, so you don't have to write them yourself. The types with built-in key serializers are: Int32, Int64, UInt32, UInt64, Double, Single, DateTime, String.

To use any other key type, just implement serialize/deserialize routines for it (the Create storage section below shows how they are passed as delegates). Other features depend on the access method.
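
For example, a storage with Guid keys could be created by supplying key serialization delegates, mirroring the Create storage example below. This is only a minimal sketch: the Guid encoding and the 16-byte maximum key size argument are assumptions based on the Default(sizeof(int)) call used elsewhere in this document, not documented behavior.

    var factory = new StorageFactory();

    // sketch: Guid keys serialized with Guid.ToByteArray() (16 bytes)
    using (var storage = factory.CreateBPlusTreeStorage<Guid, string>(
                key => key.ToByteArray(),                 // key serialization
                bytes => new Guid(bytes),                 // key deserialization
                value => Encoding.UTF8.GetBytes(value),   // value serialization
                bytes => Encoding.UTF8.GetString(bytes),  // value deserialization
                BPlusTreeStorageSettings.Default(16)))    // assumed: maximum key size in bytes
    {
        storage.OpenOrCreate(Directory.GetCurrentDirectory());
        storage.Set(Guid.NewGuid(), "some value");
    }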

B+Tree storage

The binary representation of a key has a size limit that depends on the selected page size: each index page must have room for at least three keys in binary form. This requirement follows from the B+Tree properties. Keys must implement the IComparable interface.

RadixTree storage

Keys have no practical size limit, and implementations of the IComparable and IEquatable interfaces are not required. However, the key set is ordered by comparing the byte sequences of the serialized keys, so care must be taken with serialization.
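
For instance, if numeric keys should keep their natural order under byte-wise comparison, a big-endian encoding does the job, while BitConverter.GetBytes on a little-endian machine does not. A minimal sketch, not part of the library, and assuming non-negative keys:

    // sketch: big-endian serialization of non-negative Int32 keys, so that
    // byte-wise (lexicographic) comparison matches numeric ordering
    static byte[] SerializeKeyBigEndian(int key)
    {
        return new[]
        {
            (byte)(key >> 24),
            (byte)(key >> 16),
            (byte)(key >> 8),
            (byte)key
        };
    }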

Values

Values have no restrictions on type and no practical size limit. In most cases large values are limited only by the drive size.

By default, values are serialized with BinaryFormatter; in this case, classes should be marked with the [Serializable] attribute. However, BinaryFormatter serialization implies too much overhead in time and disk space. You can easily provide your own serializer based on BSON, Protobuf or any other suitable format. All you have to do is implement two methods like this:

    // example serializer based on protobuf-net
    public class ProtobufSerializer<T> : ISerializer<T>
    {
        public T Deserialize(byte[] bytes)
        {
            using (var ms = new MemoryStream(bytes))
            {
                // reconstruct the object from its protobuf representation
                return Serializer.Deserialize<T>(ms);
            }
        }

        public byte[] Serialize(T obj)
        {
            using (var ms = new MemoryStream())
            {
                // write the protobuf representation and return it as a byte array
                Serializer.Serialize(ms, obj);
                return ms.ToArray();
            }
        }
    }

What about ACID?

  • atomicity - any single operation is atomic
  • consistency - any operation transfers the storage from one consistent state to another
  • isolation - all single operations are isolated from each other
  • durability - durability of updates is achieved by calling the storage.Flush() method

However, transactions in the sense of a unit of work are not supported. Thus, you cannot build up a long series of changes and then commit or roll them back as a whole.
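
So if a particular update must survive a crash, flush explicitly right after it. A minimal sketch, assuming a storage opened as in the Usage example above:

    // sketch: make a single update durable before continuing
    storage.Set(emp.Id, emp);
    storage.Flush(); // the update is guaranteed to be on disk only after this call returns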

Create storage

var factory = new StorageFactory();
var storage = factory.CreateBPlusTreeStorage<int, string>( 
                    BitConverter.GetBytes,               // key serialization
                    p => BitConverter.ToInt32(p, 0),     // key deserialization
                    p => Encoding.UTF8.GetBytes(p),      // value serialization
                    p => Encoding.UTF8.GetString(p),     // value deserialization
                    BPlusTreeStorageSettings.Default(sizeof(int)));

Here we provide the key and value types, serialization methods and storage settings. Please note that the storage instance implements the IDisposable interface: use it in a using block or call Dispose() at the appropriate time to shut the storage down correctly.

To create storage on disk or open an already existing storage use one of

  • storage.OpenExisting(string path)
  • storage.CreateNew(string path)
  • storage.OpenOrCreate(string path)

On-disk storage files:

  • info - an XML file containing common information about the storage
  • pagemap - the mapping of virtual page addresses to on-disk offsets
  • storage - the main storage file containing keys, values and index data

Operations

The storage instance returned by the CreateBPlusTreeStorage() method supports the following operations:

  • Get(TKey key) - gets the value by its key
  • Set(TKey key, TValue value) - inserts or updates key value pair
  • Remove(TKey key) - removes key-value pair by key
  • Exists(TKey key) - checks whether a key-value pair exists
  • GetRawDataLength(TKey key) - retrieves the length (in bytes) of the binary representation of the value referenced by the specified key
  • GetRawDataSegment(TKey key, long startIndex, long endIndex) - retrieves a segment of the binary representation of the value referenced by the specified key
  • Min() - gets the minimal key
  • Max() - gets the maximal key
  • PreviousTo(TKey key) - gets the key previous to the specified key, the existence of the specified key is not required
  • NextTo(TKey key) - gets the key next to the specified key, the existence of the specified key is not required
  • MinValue() - gets the value corresponding to the minimal key
  • MaxValue() - gets the value corresponding to the maximal key
  • Count() - computes the count of key-value pairs
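
A small sketch of these calls, assuming a storage opened as in the Usage example above, with integer keys:

    // sketch: inspecting an open storage
    Console.WriteLine($"pairs stored: {storage.Count()}");
    Console.WriteLine($"key range: {storage.Min()} .. {storage.Max()}");

    if (storage.Exists(1))
        Console.WriteLine($"value for key 1 occupies {storage.GetRawDataLength(1)} bytes");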

Contributors

mario206, victorscherbakov

Issues

Consider adding a secondary page cache

When the CachingPageManager evicts pages, it does not actually free the memory; that's the job of the GC. This means that when we evict a page and it is later requested again, we always have to re-read the page from disk, even though the evicted page object might still be in memory and not yet reclaimed by the GC, so it is essentially a preventable read.

Instead of throwing away evicted pages, a secondary cache keeping weak references to them would thus reduce file I/O and improve performance.

I implemented a basic secondary cache, really nothing more than a Dictionary<ulong, WeakReference<Page>>, along with some cache hit/miss/rate counters, and let it loose on one of my data stores in a real-world program that read about 2M items and wrote about 300K new/updated items, using 16K pages, MaxCachePages = 10,000, and the non-concurrent workstation GC. The primary cache hit rate was 96.7%, so the current cache works for the most part (though I also have rudimentary code to prioritize metadata/index pages over data pages during cache eviction). But the secondary cache (the weak-reference one I added) was able to serve 62.1% of the fetch requests that missed the primary cache, essentially for free when you compare a Dictionary lookup and the small amount of space for the dictionary against hitting the disk. Of all fetch requests, only 1.25% were read from disk (compared to 3.3% without the secondary cache), and most of those were requests for pages never requested before, which therefore could not have been cached.
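
Shape-wise, the secondary cache is roughly this. This is a simplified sketch rather than the actual patch; Page stands in for DataTanker's page type, and the integration with CachingPageManager is omitted:

    // simplified sketch of a weak-reference secondary page cache
    class SecondaryPageCache
    {
        private readonly Dictionary<ulong, WeakReference<Page>> _pages =
            new Dictionary<ulong, WeakReference<Page>>();

        // called when the primary cache evicts a page: remember it weakly
        public void OnEvicted(ulong index, Page page)
        {
            _pages[index] = new WeakReference<Page>(page);
        }

        // called on a primary-cache miss: reuse the page if the GC has not collected it yet
        public bool TryGet(ulong index, out Page page)
        {
            if (_pages.TryGetValue(index, out var weakRef) && weakRef.TryGetTarget(out page))
                return true;

            page = null;
            return false;
        }
    }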

RecoveryFile is horribly broken

So, I was curious how the recovery file machinery was actually implemented, and found some grave bugs that completely disable the feature; and if it weren't disabled, it would actually cause more data corruption.

For starters, CorrectlyFinished is always false, because it tries to seek beyond the end of the stream (Stream.Seek(finalRecordSize, SeekOrigin.End) instead of -finalRecordSize):

Stream.Seek(finalRecordSize, SeekOrigin.End);

Next, it does not actually read the content of update records, because it calls the BlockingRead() overload that takes a size and returns a buffer (and then ignores the return value), so content is never filled, potentially "updating" a page with all zeroes:

var content = new byte[length];
Stream.BlockingRead(length); // page content

Finally, replaying the recovery records fails, because recovery is attempted before Storage.IsOpen is set to true; so when trying to, e.g., update a page, the chain PageExists -> CheckStorage -> not open -> throw new InvalidOperationException occurs, and boom.

Fixing these issues should at least make the basic recovery functionality work.

I added a test to my local fork (which has Recovery renamed to Journal, among other things) that you may be able to adapt.

[TestFixture]
public class JournalTests : FileSystemStorageTestBase
{
    public JournalTests()
    {
        StoragePath = Path.Combine(Directory.GetParent(Assembly.GetExecutingAssembly().Location).FullName,
                                    "..\\..\\Storages");
    }

    [Test]
    public void RecoverJournal()
    {
        long idx;
        long deleteIndex;

        using (var man = new FileSystemPageManager(4096))
        using (var storage = new Storage(man)) {
            storage.Open(StoragePath, PersistentStorageOpenMode.Create);
            var page = man.CreatePage();
            idx = page.Index;
            deleteIndex = man.CreatePage().Index;
        }

        using (var man = new FileSystemPageManager(4096))
        using (var storage = new Storage(man)) {
            storage.Open(StoragePath, PersistentStorageOpenMode.OpenExisting);
            var page = man.FetchPage(idx);
            Assert.AreEqual(page.Index, idx);
            Assert.AreNotEqual(page.Content[10], 0xff);
            Assert.True(man.PageExists(deleteIndex));
        }

        string file;
        using (var man = new FileSystemPageManager(4096))
        using (var storage = new Storage(man))
        using (var journal = new JournalFile(man, man.PageSize)) {
            storage.Open(StoragePath, PersistentStorageOpenMode.OpenExisting);
            var page = man.FetchPage(idx);
            idx = page.Index;
            man.Dispose();

            Assert.AreEqual(page.Index, idx);
            Assert.AreNotEqual(page.Content[10], 0xff);

            journal.WriteDeletePageRecord(deleteIndex);

            // Update twice with different pages to make sure the latest one wins
            page.Content[9] = 0xaa;
            page.Content[10] = 0xaa;
            journal.WriteUpdatePageRecord(page);
            page.Content[10] = 0xff;
            journal.WriteUpdatePageRecord(page);

            journal.WriteFinalMarker();
            file = journal.JournalFileName;
        }

        Assert.True(File.Exists(file));
        Assert.GreaterOrEqual(new FileInfo(file).Length, 1);
        Assert.AreNotEqual(idx, 0);
            
        using (var man = new FileSystemPageManager(4096))
        using (var storage = new Storage(man)) {
            storage.Open(StoragePath, PersistentStorageOpenMode.OpenExisting);
            var page = man.FetchPage(idx);
            Assert.AreEqual(page.Index, idx);
            Assert.AreEqual(page.Content[9], 0xaa);
            Assert.AreEqual(page.Content[10], 0xff);
            Assert.False(man.PageExists(deleteIndex));
            Assert.AreEqual(man.CreatePage().Index, deleteIndex);
        }
    }
}

GREAT

First of all: great project, it totally deserves more visibility.
Second: storage.Max() always returns null; I will try to make a PR.

Storing multiple indices/values in the same storage

First: impressive work, congratulation!

I'd like to use the library to store a variable number of indices, with different key/value types.
I understand I could create one storage per index, but each storage creates three files on disk, so I would end up with 3x files for x indices.
If there were a way to store everything in just three files, I believe it would be better.

Please advise, I'd be happy to contribute if I have some insights and if the workload/complexity is not too high.

I'm willing to convert the library to .NET Standard 2.0. I've run the portability analyzer and there's a 100% match, so it's only a matter of changing the project structure.

Another remark: I've profiled the library during the 100K random-operation run in your WinForms application, and a considerable amount of time is spent in RecoveryFile.EnsureFileExists() checking whether the file already exists; avoiding this check would shave about 10% off the Set() method.

Incorrect uses of CompareTo

I am modifying a copy of DataTanker right now[*] and noticed that in a few places the code calls obj.CompareTo(other) == 1 (or == -1, != 1, and so on).
The contract for this method, however, only states that it returns less-than-zero, zero, or greater-than-zero, not the exact values -1, 0 and 1 that the DataTanker code currently assumes.

Indeed, if I were to implement a custom key type, e.g. using StringComparer.Ordinal (or anything that returns a.Length - b.Length), then things break, horribly.

I'd therefore suggest fixing this bug by inspecting all uses of .CompareTo().

[*] Not that it matters for the issue at hand, but I modded it to remove ComparableKeyOf, KeyOf, ValueOf (in favor of nullable types if you really need them), and to provide a way to supply custom IComparer implementations instead of relying on keys to implement IComparable, which allows me to use e.g. byte[] keys.
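
The difference is easy to illustrate (not actual DataTanker code, just a minimal sketch of the contract):

    // incorrect: assumes CompareTo returns exactly -1, 0 or 1
    if (key.CompareTo(other) == 1) { /* treated as key > other */ }

    // correct: the contract only specifies the sign of the result
    if (key.CompareTo(other) > 0) { /* key > other */ }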

Not an issue

Just stumbled upon this project - very, very cool. Will remember it's out here in case I need in-memory persistent storage :) Thanks for sharing. Cheers

Does this support auto compression?

I am currently looking for a KV store. Memory and hard disk space are limited, and I need to store about 20 billion keys; the values are small and do not need to be indexed.

My keys are variable-length strings of up to about 50 characters, and the values are up to about 50 bytes.

FSPageManager default buffering considered harmful

Before I forget, I looked into the performance characteristics quite a lot and one easy and rather impactful improvement was to change the FileSystemPageManager buffer defaults.

Right now, it defaults to writeBufferSize * PageSize, where writeBufferSize defaults to 100, or worse, 1000 when constructed via StorageFactory with caching enabled and default cache settings.

https://github.com/VictorScherbakov/DataTanker/blob/master/DataTanker/Core/PageManagement/FileSystemPageManager.cs#L727

https://github.com/VictorScherbakov/DataTanker/blob/master/DataTanker/Core/StorageFactory.cs#L337

i.e. given the default 4096-byte page size, the usual case is 1000 * 4096 = 4 096 000 bytes passed to the FileStream as the preferred buffer size.

Given that page access is extremely random during normal operation (as a result of the B+/Radix tree structure), the probability that two successive reads are close enough that the seek to the next requested page offset does not discard the FileStream's internal buffer (and that they are not interrupted by writes, which also discard the buffer) is extremely low. That means most of the time (almost always, for many workloads) the FileStream buffer is discarded after being used to read only a single page. But the buffer was filled with 1000 pages, 999 of which are never accessed before being discarded, creating a lot of useless disk I/O.

I have a real-world, read-only program that uses a 65K page size.

Unmodified:

  • 678 FetchPages
  • 676 disk I/O ReadFile calls of 65536000 bytes each, meaning 2 reads were served from the buffer, a hit rate of 0.3%
  • Total disk reads: 44302336000 bytes = 41.26 GiB
  • ufff (thanks to the OS disk cache, it wasn't actually this many hard reads)

Modified to use writeBufferSize = 1:

  • 678 FetchPages,
  • 678 disk I/O ReadFile calls of 65536 bytes each (obviously)
  • Total disk reads: 44433408 bytes = 0.04 GiB
  • Now that's more like it.

The difference is a whopping 41 GiB in ReadFile requests to the OS. Of course, the larger the page size, the worse it gets.

Conclusion

Given the low probability that the FileStream buffer will be useful, I recommend choosing writeBufferSize = 1 for all defaults.

pagemap size

Hi,
I have a lot of rewrites of the same key-value in my software.
I see that the pagemap file grows each time I finish a cycle of such updates.

When does it shrink, if at all?

Thanks
