
DataTanker

Embedded persistent key-value store for .NET. Pure C# code, B+Tree and RadixTree access methods, MIT License.

Features

  • simple and lightweight
  • fast enough
  • variable length values
  • concurrent access
  • atomic operations
  • user-defined serialization routines

Performance

I have not performed a detailed comparison with competitors, since benchmarks published by competing developers are usually biased and misleading. However, here are some performance figures. To check them yourself, run the Performance.exe test utility.

  • sequentially insert and read 1 000 000 integer keys with 20-byte values - 16 sec
  • insert 100 000 integer keys with large values (from 200 to 5000 bytes) - 12 sec
  • perform 1 000 000 random operations on a set of 100 000 keys - 8 sec

Feedback

Any feedback is welcome. Feel free to ask a question, report a bug, or request a feature.

Usage

    [Serializable]
    class Employee
    {
        public int Id { get; set; }

        public string FirstName { get; set; }
        public string LastName { get; set; }

        public decimal Salary { get; set; }

        public override string ToString()
        {
            return $"Id: {Id}, Name: {FirstName} {LastName}, Salary {Salary}";
        }
    }

    class Program
    {
        public static void SaveEmployee(StorageFactory factory, Employee emp)
        {
            // create storage with integer keys and Employee values
            using (var storage = factory.CreateBPlusTreeStorage<int, Employee>(BPlusTreeStorageSettings.Default(sizeof(int))))
            {
                storage.OpenOrCreate(Directory.GetCurrentDirectory());
                storage.Set(emp.Id, emp);
            }
        }

        public static Employee ReadEmployee(StorageFactory factory, int id)
        {
            // open the same storage and fetch the value by key
            using (var storage = factory.CreateBPlusTreeStorage<int, Employee>(BPlusTreeStorageSettings.Default(sizeof(int))))
            {
                storage.OpenOrCreate(Directory.GetCurrentDirectory());
                return storage.Get(id);
            }
        }

        static void Main(string[] args)
        {
            var emp1 = new Employee { Id = 1, FirstName = "John", LastName = "Smith", Salary = new decimal(100000) };
            var emp2 = new Employee { Id = 2, FirstName = "Steve", LastName = "Barret", Salary = new decimal(150000) };

            var factory = new StorageFactory();
            SaveEmployee(factory, emp1);
            SaveEmployee(factory, emp2);

            var emp3 = ReadEmployee(factory, 1);
            var emp4 = ReadEmployee(factory, 2);

            Console.WriteLine(emp3);
            Console.WriteLine(emp4);
        }
    }

Where should I use it?

Use it when persistent data storage is required, but the file system alone is not suitable and a full DBMS has too much overhead. For example: query caches, message queues, large amounts of temporary data, etc.

Access methods

DataTanker currently supports two access methods: B+Tree and RadixTree. The B+Tree has a good fill factor and shows the best performance on small keys without hierarchy. The RadixTree is well suited for storing long hierarchical keys such as file paths.

Keys

Each existing key corresponds to exactly one value. Keys of common types have built-in serializers, so you don't have to write them yourself. The types with built-in key serializers are: Int32, Int64, UInt32, UInt64, Double, Single, DateTime, String.

To use any other key type, just implement serialize/deserialize routines for it (the Create storage section below shows how they are passed as delegates). Other features depend on the access method.
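
For example, a storage with Guid keys could be created by supplying key serialization delegates, mirroring the Create storage example below. This is only a minimal sketch: the Guid encoding and the 16-byte maximum key size argument are assumptions based on the Default(sizeof(int)) call used elsewhere in this document, not documented behavior.

    var factory = new StorageFactory();

    // sketch: Guid keys serialized with Guid.ToByteArray() (16 bytes)
    using (var storage = factory.CreateBPlusTreeStorage<Guid, string>(
                key => key.ToByteArray(),                 // key serialization
                bytes => new Guid(bytes),                 // key deserialization
                value => Encoding.UTF8.GetBytes(value),   // value serialization
                bytes => Encoding.UTF8.GetString(bytes),  // value deserialization
                BPlusTreeStorageSettings.Default(16)))    // assumed: maximum key size in bytes
    {
        storage.OpenOrCreate(Directory.GetCurrentDirectory());
        storage.Set(Guid.NewGuid(), "some value");
    }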

B+Tree storage

The binary representation of a key has a size limit that depends on the selected page size: each index page must have room for at least three keys in binary form. This requirement follows from the B+Tree properties. Keys must implement the IComparable interface.

RadixTree storage

Keys have no practical size limit, and implementations of the IComparable and IEquatable interfaces are not required. However, the key set is ordered by comparing the byte sequences of the serialized keys, so care must be taken with serialization.
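
For instance, if numeric keys should keep their natural order under byte-wise comparison, a big-endian encoding does the job, while BitConverter.GetBytes on a little-endian machine does not. A minimal sketch, not part of the library, and assuming non-negative keys:

    // sketch: big-endian serialization of non-negative Int32 keys, so that
    // byte-wise (lexicographic) comparison matches numeric ordering
    static byte[] SerializeKeyBigEndian(int key)
    {
        return new[]
        {
            (byte)(key >> 24),
            (byte)(key >> 16),
            (byte)(key >> 8),
            (byte)key
        };
    }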

Values

Values have no restrictions on type and no practical size limit. In most cases large values are limited only by the drive size.

By default, values are serialized with BinaryFormatter; in this case, classes should be marked with the [Serializable] attribute. However, BinaryFormatter serialization implies too much overhead in time and disk space. You can easily provide your own serializer based on BSON, Protobuf or any other suitable format. All you have to do is implement two methods like this:

    // example serializer based on protobuf-net
    public class ProtobufSerializer<T> : ISerializer<T>
    {
        public T Deserialize(byte[] bytes)
        {
            using (var ms = new MemoryStream(bytes))
            {
                // reconstruct the object from its protobuf representation
                return Serializer.Deserialize<T>(ms);
            }
        }

        public byte[] Serialize(T obj)
        {
            using (var ms = new MemoryStream())
            {
                // write the protobuf representation and return it as a byte array
                Serializer.Serialize(ms, obj);
                return ms.ToArray();
            }
        }
    }

What about ACID?

  • atomicity - any single operation is atomic
  • consistency - any operation transfers the storage from one consistent state to another
  • isolation - all single operations are isolated from each other
  • durability - durability of updates is achieved by calling the storage.Flush() method

However, transactions in the sense of a unit of work are not supported. Thus, you cannot build up a long series of changes and then commit or roll them back as a whole.
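
So if a particular update must survive a crash, flush explicitly right after it. A minimal sketch, assuming a storage opened as in the Usage example above:

    // sketch: make a single update durable before continuing
    storage.Set(emp.Id, emp);
    storage.Flush(); // the update is guaranteed to be on disk only after this call returns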

Create storage

var factory = new StorageFactory();
var storage = factory.CreateBPlusTreeStorage<int, string>( 
                    BitConverter.GetBytes,               // key serialization
                    p => BitConverter.ToInt32(p, 0),     // key deserialization
                    p => Encoding.UTF8.GetBytes(p),      // value serialization
                    p => Encoding.UTF8.GetString(p),     // value deserialization
                    BPlusTreeStorageSettings.Default(sizeof(int)));

Here we provide the key and value types, serialization methods and storage settings. Please note that the storage instance implements the IDisposable interface: use it in a using block or call Dispose() at the appropriate time to shut the storage down correctly.

To create storage on disk or open an already existing storage use one of

  • storage.OpenExisting(string path)
  • storage.CreateNew(string path)
  • storage.OpenOrCreate(string path)

On-disk storage files:

  • info - an XML file containing common information about the storage
  • pagemap - the mapping of virtual page addresses to on-disk offsets
  • storage - the main storage file containing keys, values and index data

Operations

The storage instance returned by the CreateBPlusTreeStorage() method supports the following operations:

  • Get(TKey key) - gets the value by its key
  • Set(TKey key, TValue value) - inserts or updates key value pair
  • Remove(TKey key) - removes key-value pair by key
  • Exists(TKey key) - checks whether a key-value pair exists
  • GetRawDataLength(TKey key) - retrieves the length (in bytes) of the binary representation of the value referenced by the specified key
  • GetRawDataSegment(TKey key, long startIndex, long endIndex) - retrieves a segment of the binary representation of the value referenced by the specified key
  • Min() - gets the minimal key
  • Max() - gets the maximal key
  • PreviousTo(TKey key) - gets the key previous to the specified key, the existence of the specified key is not required
  • NextTo(TKey key) - gets the key next to the specified key, the existence of the specified key is not required
  • MinValue() - gets the value corresponding to the minimal key
  • MaxValue() - gets the value corresponding to the maximal key
  • Count() - computes the count of key-value pairs
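
A small sketch of these calls, assuming a storage opened as in the Usage example above, with integer keys:

    // sketch: inspecting an open storage
    Console.WriteLine($"pairs stored: {storage.Count()}");
    Console.WriteLine($"key range: {storage.Min()} .. {storage.Max()}");

    if (storage.Exists(1))
        Console.WriteLine($"value for key 1 occupies {storage.GetRawDataLength(1)} bytes");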

Contributors

mario206, victorscherbakov

Issues

Consider adding a secondary page cache

When the CachingPageManager evicts pages, it does not actually free the memory; that's the job of the GC. This means that when we evict a page and it is later requested again, we always have to re-read the page from disk, even though the evicted page object might still be in memory and not yet reclaimed by the GC, so it is essentially a preventable read.

Instead of throwing away evicted pages, a secondary cache keeping weak references to them would thus reduce file I/O and improve performance.

I implemented a basic secondary cache, really nothing more than a Dictionary<ulong, WeakReference<Page>>, along with some cache hit/miss/rate counters, and let it loose on one of my data stores in a real-world program that read about 2M items and wrote about 300K new/updated items, using 16K pages, MaxCachePages = 10,000, and the non-concurrent workstation GC. The primary cache hit rate was 96.7%, so the current cache works for the most part (though I also have rudimentary code to prioritize metadata/index pages over data pages during cache eviction). But the secondary cache (the weak-reference one I added) was able to serve 62.1% of the fetch requests that missed the primary cache, essentially for free when you compare a Dictionary lookup and the small amount of space for the dictionary against hitting the disk. Of all fetch requests, only 1.25% were read from disk (compared to 3.3% without the secondary cache), and most of those were requests for pages never requested before, which therefore could not have been cached.
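
Shape-wise, the secondary cache is roughly this. This is a simplified sketch rather than the actual patch; Page stands in for DataTanker's page type, and the integration with CachingPageManager is omitted:

    // simplified sketch of a weak-reference secondary page cache
    class SecondaryPageCache
    {
        private readonly Dictionary<ulong, WeakReference<Page>> _pages =
            new Dictionary<ulong, WeakReference<Page>>();

        // called when the primary cache evicts a page: remember it weakly
        public void OnEvicted(ulong index, Page page)
        {
            _pages[index] = new WeakReference<Page>(page);
        }

        // called on a primary-cache miss: reuse the page if the GC has not collected it yet
        public bool TryGet(ulong index, out Page page)
        {
            if (_pages.TryGetValue(index, out var weakRef) && weakRef.TryGetTarget(out page))
                return true;

            page = null;
            return false;
        }
    }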

RecoveryFile is horribly broken

So, I was curious how the recovery file machinery was actually implemented, and found some grave bugs that completely disable the feature; and if it weren't disabled, it would actually cause more data corruption.

For starters, CorrectlyFinished is always false, because it tries to seek beyond the end of the stream (Stream.Seek(finalRecordSize, SeekOrigin.End) instead of -finalRecordSize):

Stream.Seek(finalRecordSize, SeekOrigin.End);

Next, it does not actually read the content of update records, because it calls the BlockingRead() overload that takes a size and returns a buffer (and then ignores the return value), so content is never filled, potentially "updating" a page with all zeroes:

var content = new byte[length];
Stream.BlockingRead(length); // page content

Finally, replaying the recovery records fails, because recovery is attempted before Storage.IsOpen is set to true; so when trying to, e.g., update a page, the chain PageExists -> CheckStorage -> not open -> throw new InvalidOperationException occurs, and boom.

Fixing these issues should at least make the basic recovery functionality work.

I added a test to my local fork (which has Recovery renamed to Journal, among other things) that you may be able to adapt.

[TestFixture]
public class JournalTests : FileSystemStorageTestBase
{
    public JournalTests()
    {
        StoragePath = Path.Combine(Directory.GetParent(Assembly.GetExecutingAssembly().Location).FullName,
                                    "..\\..\\Storages");
    }

    [Test]
    public void RecoverJournal()
    {
        long idx;
        long deleteIndex;

        using (var man = new FileSystemPageManager(4096))
        using (var storage = new Storage(man)) {
            storage.Open(StoragePath, PersistentStorageOpenMode.Create);
            var page = man.CreatePage();
            idx = page.Index;
            deleteIndex = man.CreatePage().Index;
        }

        using (var man = new FileSystemPageManager(4096))
        using (var storage = new Storage(man)) {
            storage.Open(StoragePath, PersistentStorageOpenMode.OpenExisting);
            var page = man.FetchPage(idx);
            Assert.AreEqual(page.Index, idx);
            Assert.AreNotEqual(page.Content[10], 0xff);
            Assert.True(man.PageExists(deleteIndex));
        }

        string file;
        using (var man = new FileSystemPageManager(4096))
        using (var storage = new Storage(man))
        using (var journal = new JournalFile(man, man.PageSize)) {
            storage.Open(StoragePath, PersistentStorageOpenMode.OpenExisting);
            var page = man.FetchPage(idx);
            idx = page.Index;
            man.Dispose();

            Assert.AreEqual(page.Index, idx);
            Assert.AreNotEqual(page.Content[10], 0xff);

            journal.WriteDeletePageRecord(deleteIndex);

            // Update twice with different pages to make sure the latest one wins
            page.Content[9] = 0xaa;
            page.Content[10] = 0xaa;
            journal.WriteUpdatePageRecord(page);
            page.Content[10] = 0xff;
            journal.WriteUpdatePageRecord(page);

            journal.WriteFinalMarker();
            file = journal.JournalFileName;
        }

        Assert.True(File.Exists(file));
        Assert.GreaterOrEqual(new FileInfo(file).Length, 1);
        Assert.AreNotEqual(idx, 0);
            
        using (var man = new FileSystemPageManager(4096))
        using (var storage = new Storage(man)) {
            storage.Open(StoragePath, PersistentStorageOpenMode.OpenExisting);
            var page = man.FetchPage(idx);
            Assert.AreEqual(page.Index, idx);
            Assert.AreEqual(page.Content[9], 0xaa);
            Assert.AreEqual(page.Content[10], 0xff);
            Assert.False(man.PageExists(deleteIndex));
            Assert.AreEqual(man.CreatePage().Index, deleteIndex);
        }
    }
}

GREAT

First of all: great project, it totally deserves more visibility.
Second: storage.Max() always returns null; I will try to make a PR.

Storing multiple indices/values in the same storage

First: impressive work, congratulation!

I'd like to use the library to store a variable number of indices, with different key/value types.
I understand I could create one storage per index, but each storage creates three files on disk, so I would end up with 3x files for x indices.
If there were a way to store everything in just three files, I believe it would be better.

Please advise, I'd be happy to contribute if I have some insights and if the workload/complexity is not too high.

I'm willing to convert the library to .NET Standard 2.0. I've run the portability analyzer and there's a 100% match, so it's only a matter of changing the project structure.

Another remark: I've profiled the library during the 100K random-operation run in your WinForms application, and a considerable amount of time is spent in RecoveryFile.EnsureFileExists() checking whether the file already exists; avoiding this check would shave about 10% off the Set() method.

Incorrect uses of CompareTo

I am modifying a copy of DataTanker right now[*] and noticed that in a few places the code calls obj.CompareTo(other) == 1 (or == -1, != 1, and so on).
The contract for this method, however, only states that it returns less-than-zero, zero, or greater-than-zero, not the exact values -1, 0 and 1 that the DataTanker code currently assumes.

Indeed, if I were to implement a custom key type, e.g. using StringComparer.Ordinal (or anything that returns a.Length - b.Length), then things break, horribly.

I'd therefore suggest fixing this bug by inspecting all uses of .CompareTo().

[*] Not that it matters for the issue at hand, but I modded it to remove ComparableKeyOf, KeyOf, ValueOf (in favor of nullable types if you really need them), and to provide a way to supply custom IComparer implementations instead of relying on keys to implement IComparable, which allows me to use e.g. byte[] keys.
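
The difference is easy to illustrate (not actual DataTanker code, just a minimal sketch of the contract):

    // incorrect: assumes CompareTo returns exactly -1, 0 or 1
    if (key.CompareTo(other) == 1) { /* treated as key > other */ }

    // correct: the contract only specifies the sign of the result
    if (key.CompareTo(other) > 0) { /* key > other */ }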

Not an issue

Just stumbled upon this project - very, very cool. Will remember it's out here in case I need in-memory persistent storage :) Thanks for sharing. Cheers

Does this support auto compression?

I am currently looking for a KV store. Memory and hard disk space are limited, and I need to store about 20 billion keys; the values are small and do not need to be indexed.

My keys are variable-length strings of up to about 50 characters, and the values are up to about 50 bytes.

FSPageManager default buffering considered harmful

Before I forget, I looked into the performance characteristics quite a lot and one easy and rather impactful improvement was to change the FileSystemPageManager buffer defaults.

Right now, it defaults to writeBufferSize * PageSize, where writeBufferSize defaults to 100, or worse, 1000 when constructed via StorageFactory with caching enabled and default cache settings.

https://github.com/VictorScherbakov/DataTanker/blob/master/DataTanker/Core/PageManagement/FileSystemPageManager.cs#L727

https://github.com/VictorScherbakov/DataTanker/blob/master/DataTanker/Core/StorageFactory.cs#L337

i.e. given the default 4096-byte page size, the usual case is 1000 * 4096 = 4 096 000 bytes passed to the FileStream as the preferred buffer size.

Given that page access is extremely random during normal operation (as a result of the B+/Radix tree structure), the probability that two successive reads are close enough that the seek to the next requested page offset does not discard the FileStream's internal buffer (and that they are not interrupted by writes, which also discard the buffer) is extremely low. That means most of the time (almost always, for many workloads) the FileStream buffer is discarded after being used to read only a single page. But the buffer was filled with 1000 pages, 999 of which are never accessed before being discarded, creating a lot of useless disk I/O.

I have a real-world, read-only program that uses a 65K page size.

Unmodified:

  • 678 FetchPages
  • 676 disk I/O ReadFile calls of 65536000 bytes each, meaning 2 reads were served from the buffer, a hit rate of 0.3%
  • Total disk reads: 44302336000 bytes = 41.26 GiB
  • ufff (thanks to the OS disk cache, it wasn't actually this many hard reads)

Modified to use writeBufferSize = 1:

  • 678 FetchPages,
  • 678 disk I/O ReadFile calls of 65536 bytes each (obviously)
  • Total disk reads: 44433408 bytes = 0.04 GiB
  • Now that's more like it.

The difference is a whopping 41 GiB in ReadFile requests to the OS. Of course, the larger the page size, the worse it gets.

Conclusion

Given the low probability that the FileStream buffer will be useful, I recommend choosing writeBufferSize = 1 for all defaults.

pagemap size

Hi,
I have a lot of rewrites of the same key-value in my software.
I see that the pagemap file grows each time I finish a cycle of such updates.

When does it shrink, if at all?

Thanks
