spreads / spreads.lmdb

Low-level zero-overhead and the fastest LMDB .NET wrapper with some additional native methods useful for Spreads

Home Page: http://docs.dataspreads.io/spreads/libs/lmdb/api/README.html

License: Mozilla Public License 2.0

Batchfile 0.22% C# 94.12% Smalltalk 0.01% Dockerfile 0.32% Makefile 0.45% C 4.89%

lmdb dotnet csharp

spreads.lmdb's Introduction

Spreads.LMDB

Low-level zero-overhead and the fastest LMDB .NET wrapper with some additional native methods useful for Spreads.

Available on NuGet as Spreads.LMDB.

C# `async/await` support

In the original version, this library provided a dedicated writer thread for background writes (disabled by default). Now this functionality is removed. It is hard to implement for a general case, but write serialization could be achieved in user code if needed. See #41 for more details.

LMDB's supported "normal" case is when a transaction is executed from a single thread. For .NET this means that if all operations on a transaction are called from a single thread, then it does not matter which thread is executing the transaction and LMDB will just work. However, it's not possible to jump threads inside write transactions, which means no awaits or wait handle waits.

Read transactions could be used from async code, which requires forcing MDB_NOTLS attribute for environments:

A thread may use parallel read-only transactions. A read-only transaction may span threads if the user synchronizes its use. Applications that multiplex many user threads over individual OS threads need this option. Such an application must also serialize the write transactions in an OS thread, since LMDB's write locking is unaware of the user threads.
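A minimal sketch of how write serialization could be done in user code. It assumes nothing about Spreads.LMDB itself: `SingleThreadWriter` is a hypothetical helper built only on `BlockingCollection` and a dedicated `Thread`. Each queued delegate runs start to finish on that one thread, so a write transaction that is opened and committed inside the delegate never jumps threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Spreads.LMDB.
public sealed class SingleThreadWriter : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public SingleThreadWriter()
    {
        // The only thread that ever executes queued work.
        _thread = new Thread(() =>
        {
            foreach (var work in _queue.GetConsumingEnumerable())
            {
                work();
            }
        })
        { IsBackground = true, Name = "lmdb-writer" };
        _thread.Start();
    }

    public int WriterThreadId => _thread.ManagedThreadId;

    // Queues a delegate and returns a task that completes with its result.
    public Task<T> Enqueue<T>(Func<T> write)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(write()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // lets the consuming loop finish
        _thread.Join();
        _queue.Dispose();
    }
}
```

An async caller would `await writer.Enqueue(() => { /* begin, put, commit */ return result; })`; the await then happens outside the transaction rather than inside it.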

Read-only transaction and cursor renewal

Spreads.LMDB automatically takes care of read-only transaction and cursor renewals if they are properly disposed as .NET objects. It does not allocate those objects in steady state (uses internal pools).

Working with memory safely

Warning! This library exposes MDB_val directly as DirectBuffer struct, the struct MUST ONLY be read when inside a transaction (or when it points to an overflow page - but that is an undocumented hack working so far). For writes, the memory behind DirectBuffer MUST BE pinned.

The DirectBuffer.Span property allows accessing MDB_val as Span<byte>. DirectBuffer can be easily constructed from Span<byte>, but the span must be pinned as well if it is backed by byte[].

DirectBuffer has many methods to read/write primitive and generic blittable struct values at any offset, e.g. directBufferInstance.Read<ulong>(8) reads a ulong from offset 8. Bounds are checked by default. An LMDB call via P/Invoke takes much longer than a bounds check, so there is usually no reason to switch the checks off. You can still do so, e.g. if you read individual bytes of large values a lot (for instance via the indexer directBufferInstance[offset], which returns a single byte at offset).

Generic key/values support

Any C# struct that has no references could be used directly as a key or a value. See IROCR docs. Be aware of auto layout, padding and related issues.
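To illustrate the padding caveat, here is a small sketch using only standard .NET APIs. `PaddedKey`, `PackedKey` and `LayoutDemo` are hypothetical names chosen for this example. The two structs carry the same fields, but the stored byte size differs because of alignment padding, and padding bytes become part of the key as far as a bytewise comparison is concerned.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Sequential layout with default packing: padding follows Kind so that
// Id is naturally aligned (typically 16 bytes in total on x64).
[StructLayout(LayoutKind.Sequential)]
public struct PaddedKey
{
    public byte Kind;
    public long Id;
}

// Pack = 1 removes the padding: 1 + 8 = 9 bytes.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct PackedKey
{
    public byte Kind;
    public long Id;
}

public static class LayoutDemo
{
    public static int SizeOfPadded() => Unsafe.SizeOf<PaddedKey>();
    public static int SizeOfPacked() => Unsafe.SizeOf<PackedKey>();
}
```

Checking `Unsafe.SizeOf<T>()` once for every key and value type is a cheap way to confirm that the on-disk size is the one you expect.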

IEnumerable support

A database or duplicate values of a key in a single dupsorted database could be enumerated via dataBaseInstance.AsEnumerable([several overloads]) methods that could return either DirectBuffers or generic blittable structs.

Examples

See tests. The API is very close to the C one but adapted for .NET.

Native libraries

Required native binaries for x64 on Linux, Windows and macOS are included in the NuGet package. They are built via GitHub Actions using this Makefile.

The LMDB version is mdb.master branch matching the latest edit to CHANGES in mdb.RE/0.9 branch.

To build locally, you could adjust SOEXT for your platform and call make or just call make with a target libspreads_lmdb[.so|.dll|.dylib].

The library works with the original native LMDB binaries as well, but TryFind helper methods won't work.

Limitations

The library does not support nested transactions yet - only because we do not use them currently. They will be added as soon as we find a real-world compelling case for them.

Contributing

Issues & PRs are welcome!

Copyright

MPL 2.0 (c) Victor Baybekov, 2018-2023

spreads.lmdb's Issues

Stable version is not extracting native libraries

Like in title - newest version isn't extracting native library for me:

Last working version is 1.0.0-build1903282013.
Is this expected? Should I provide the native library myself?

DUPSORT and DUPFIXED do not work

Hi,

I am unable to store multiple values against a key. This is a basic operation, am I missing something?

var env = LMDBEnvironment.Create("../../../../../lmdb7");
env.Open();
var stat = env.GetStat();
var key = 10000;
Console.WriteLine("start");
var t = Environment.TickCount;
using (var db = env.OpenDatabase("first_db2", new DatabaseConfig(DbFlags.DuplicatesFixed )))
{
    db.Truncate();
    for (var i = 1; i < 10000; i++)
    {
        db.Put(0, Interlocked.Increment(ref key), TransactionPutOptions.AppendData);
    }
}
Console.WriteLine(Environment.TickCount -t);

I am getting MDB_KEYEXIST: Key/data pair already exists all the time. I also tried it with DuplicatesSort to no avail.

Manual reset/renew for read transactions

Automatic pooling often leads to MDB_READERS_FULL when many processes work with the same env. Also NO_TLS makes things worse since it uses slot per txn not thread. Recent commit added an option to disable automatic pooling, but manual reset/renew could be useful.

Cursors do not take a reader slot and always pooled.

Specific exceptions

Need one for MAP_FULL at least.

How to direct use string as key and value?

I'm trying to put the string's pointer and the string's size directly into a DirectBuffer, but I get either a "DirectBuffer size not fixed" error
or an "MDB_KEYEXIST: Key/data pair already exists" error.

using (var tx = envS.BeginTransaction(TransactionBeginFlags.ReadWrite))
{
    for (int i = 0; i < runCount; i++)
    {
        var keyAndValue = i.ToString();
        unsafe
        {
            fixed (char* p = keyAndValue)
            {
                var size = keyAndValue.Length * sizeof(Char);
                var key1 = new DirectBuffer((long) size, (byte*) p);
                var value = new DirectBuffer((long) size, (byte*) p);
                dbS.Put(tx, ref key1, ref value, TransactionPutOptions.AppendData);
            }
        }
    }
    tx.Commit();
}

Could you tell me how to use a string directly as key and value? Thank you!
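A possible explanation for the MDB_KEYEXIST part, assuming TransactionPutOptions.AppendData maps to LMDB's MDB_APPEND: that flag expects keys to arrive already sorted in the database's key order, and LMDB reports MDB_KEYEXIST when they do not. The sketch below uses only standard .NET APIs (`KeyOrderDemo` and its members are hypothetical names) to show that the decimal strings "0", "1", ... stop being sorted bytewise at 10, while zero-padded keys stay sorted.

```csharp
using System;
using System.Text;

public static class KeyOrderDemo
{
    // Bytewise lexicographic comparison; on a common prefix the shorter key
    // sorts first. This models LMDB's default key ordering.
    public static int CompareBytes(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
        => a.SequenceCompareTo(b);

    // Returns the first i in [1, count) whose encoded key does not sort
    // strictly after the key of i - 1, or -1 if all keys are ascending.
    public static int FirstOutOfOrder(int count, Func<int, byte[]> encode)
    {
        for (int i = 1; i < count; i++)
        {
            if (CompareBytes(encode(i - 1), encode(i)) >= 0) return i;
        }
        return -1;
    }

    // Same encoding as in the question: raw UTF-16 bytes of i.ToString().
    public static byte[] Utf16(int i) => Encoding.Unicode.GetBytes(i.ToString());

    // Zero-padded to a fixed width, so bytewise order equals numeric order.
    public static byte[] PaddedUtf8(int i) => Encoding.UTF8.GetBytes(i.ToString("D10"));
}
```

`KeyOrderDemo.FirstOutOfOrder(100, KeyOrderDemo.Utf16)` returns 10, because "10" sorts before "9" bytewise; with `PaddedUtf8` it returns -1. Dropping the append flag, or encoding keys so that they sort in insertion order, are two ways to avoid the mismatch.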

Need refcount of open DBs in Env

Need refcount of open DBs in Env. In the failing test we disposed Env explicitly while DB was finalized later. It should be impossible to dispose Env with outstanding open Dbs.

See commit a6e0025

Should it be possible to enumerate just keys (and not values) with a cursor?

Does the native LMDB API allow for passing NULL as the data parameter to mdb_cursor_get() and thus allow you to enumerate just keys? If so, does Spreads.LMDB expose this ability somehow?

What is the most performant way to check if a database has an entry for a specific key when that is all that you want to know and you do not want the value?

Nuget package does not contain a Mac runtime

I have taken a dependency to the package but unfortunately the nuget package does not contain a Mac runtime for libspreads_lmdb.so.

Can you please add support for mac?

Base class for transaction

Have you considered creating base class for transaction for use with shared functionality?

For example, you can open ReadOnlyCursor on both Transaction or ReadOnlyTransaction. But i want to create shared code for cursor enumeration. Right now i came up with this:

public void Iterate(object transaction)
{
    if (transaction is Transaction tx)
    {
        foreach (var item in db.AsEnumerable(tx, key))
        {
            // do something with item
        }
    }
    else if (transaction is ReadOnlyTransaction rotx)
    {
        foreach (var item in db.AsEnumerable(rotx, key))
        {
            // do something with item
        }
    }
    else
    {
        throw new ArgumentException("Not a transaction object", nameof(transaction));
    }
}

It's working, but I find it very ugly. Is there a reason why you didn't use a shared base class? The way I see it, ReadWriteTransaction should inherit from ReadOnlyTransaction.

sdb_put doesn't throw

Spreads extension sdb_put that does three LMDB operations in one P/Invoke doesn't throw e.g. on MAP_FULL (or throws not always, some strange behavior). It needs proper review. Marked as Obsolete for now.

Allow config to be set when Bootstrapper is not available.

If for any reason, Bootstrapper class is not able to be initialized (see this), Config can never be initialized. Please allow the DefaultLocation to be settable when the Bootstrapper class is unavailable.

SuppressGCTransition for read methods

This could give significant performance boost to read methods.

https://devblogs.microsoft.com/dotnet/improvements-in-native-code-interop-in-net-5-0/#suppressgctransition

Need to target 5.0 and review on which LMDB methods it's safe to apply this attribute.

For read transactions, we call txn_open/renew + read methods + txn_reset/close.

Some write methods could also benefit, but only in NOSYNC/ASYNC case. In normal case, they call fsync, which is slow and may block. Also, on Windows, write methods could trigger mmapped file growth, which is also slow. Therefore, write transactions open/close cannot have this attribute. They differ from read txns by a parameter, so we cannot apply the attribute here for read transactions as well. Only renew/reset are safe, and this library already leverages read-only cursor reuse via pooling and could benefit from the attribute.

Why no DirectBuffer overloads for PutAsync() ?

And a follow-up question is: If I use Put() (not PutAsync) and I am using disableAsync: false in my environment, will the Put() go through the blocking queue as with PutAsync()?

I am using async/await and also using DirectBuffer and Put() and I get random crashes. All of the Spans under my DirectBuffers are on the stack so they are unaffected by GC. Version="2020.0.114"

CursorGetOption.GetMultiple issue

I'm trying to get multiple structs saved by single key. I came up with this code:

public Span<CodeSegment> MatchTrack(long trackId)
{
    var value = default(DirectBuffer);
    using (var cursor = databaseHolder.StoreDatabase.OpenReadOnlyCursor(tx))
    {
        if (cursor.TryGet(ref trackId, ref value, CursorGetOption.GetMultiple))
        {
            return MemoryMarshal.Cast<byte, CodeSegment>(value.Span);
        }
    }
    return null;
}

But i'm getting exception:

Type DirectBuffer is not fixed size. Either add Size parameter to StructLayout attribute or use Spreads.Serialization attribute to explicitly opt-in to treat non-primitive user-defined structs as fixed-size.

I'm saving those structs like this:

public void AddToStore(long trackId, List<CodeSegment> codeSegments)
{
    foreach (var item in codeSegments)
    {
        var codeSegment = item;
        storeCursor.TryPut(ref trackId, ref codeSegment, CursorPutOptions.None);
    }
}

The struct itself is:

[StructLayout(LayoutKind.Sequential, Size = 2 * sizeof(int))]
public struct CodeSegment
{
    #region Constructors

    /// <summary>
    /// Creates new CodeSegment with given parameters.
    /// </summary>
    /// <param name="code">Code.</param>
    /// <param name="time">Code timestamp.</param>
    public CodeSegment(int code, int time)
    {
        Code = code;
        Time = time;
    }

    #endregion Constructors

    #region Properties

    public int Code;
    public int Time;

    #endregion Properties
}

Am I thinking wrong? Is it even possible to read multiple values at once into a span?
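In LMDB itself, MDB_GET_MULTIPLE is documented as returning up to a page of duplicate data items and only applies to MDB_DUPFIXED databases, so the idea of viewing the returned bytes as a span of structs is sound in principle. Whether Spreads.LMDB's TryGet overloads accept a DirectBuffer as the value here is not something this sketch can confirm. It only shows the reinterpretation step with standard .NET APIs; `Segment` is a hypothetical stand-in for CodeSegment and `CastDemo` is a made-up name.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for CodeSegment: two ints, 8 bytes, no padding.
[StructLayout(LayoutKind.Sequential, Size = 8)]
public struct Segment
{
    public int Code;
    public int Time;
}

public static class CastDemo
{
    public static Segment[] RoundTrip(Segment[] input)
    {
        // Lay the structs out back to back, as fixed-size duplicates would be.
        byte[] raw = MemoryMarshal.AsBytes(input.AsSpan()).ToArray();

        // Reinterpret the raw bytes as structs without copying field by field.
        ReadOnlySpan<Segment> view =
            MemoryMarshal.Cast<byte, Segment>(new ReadOnlySpan<byte>(raw));

        // Copy out, because a view over LMDB memory is only valid
        // while the transaction is open.
        return view.ToArray();
    }
}
```

Copying with `ToArray()` before the transaction ends follows the README's rule that MDB_val memory must only be read inside a transaction.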

Some higher level apis

Have you considered adding some higher-level APIs to your library? For example, for someone who doesn't care about Span/Memory, the library could expose enumerators over duplicate entries and so on. Little things that would make using this wrapper a bit easier and faster (in development time). I could open some PRs for this.

LMDB allocating whole file

After the last NuGet update, the LMDB database allocates the whole file (as declared in MapSize) at start. I was using a 1 TB MapSize, since there was no problem with that earlier. Now I can't open the environment using the new version (the disk is too small). Is this planned? Can I set a dynamic file size somehow?

Readme is somewhat outdated

First #28 - old custom packing via bootstrapper is still mentioned. Then some nonsense English in the part about switching off bounds checks.

Use `calli` for hot read methods

See dc5f383

Need to rewrite native methods using calli via UnsafeEx.Calli[signature]. Perf gain could be in the order of 10% for read methods. Maybe more for simple ones where P/Invoke cost dominates.

MDB_BAD_TXN

I'm trying to initiate few databases in environment with code like this:

using Spreads.LMDB;
using System.IO;

namespace LMDBReppro
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            const string lmdbPath = "lmdbDatabase";
            const int hashTablesCount = 50;
            if (Directory.Exists(lmdbPath))
                Directory.Delete(lmdbPath, true);
            Directory.CreateDirectory(lmdbPath);
            using (var environment = LMDBEnvironment.Create(lmdbPath, DbEnvironmentFlags.None))
            {
                environment.MapSize = (1024L * 1024L * 1024L * 10L); // 10 GB
                environment.MaxDatabases = hashTablesCount + 2;
                environment.MaxReaders = 1000;
                environment.Open();

                // Open all database to make sure they exists
                using (var tx = environment.BeginTransaction())
                {
                    var configuration = new DatabaseConfig(DbFlags.Create | DbFlags.IntegerKey);

                    var tracksDatabase = environment.OpenDatabase("tracks", configuration);
                    var subFingerprintsDatabase = environment.OpenDatabase("subFingerprints", configuration);

                    var hashTables = new Database[hashTablesCount];
                    var hashTableConfig = new DatabaseConfig(
                        DbFlags.Create
                        | DbFlags.IntegerKey
                        | DbFlags.DuplicatesSort
                        | DbFlags.DuplicatesFixed
                        | DbFlags.IntegerDuplicates
                    );
                    for (int i = 0; i < hashTablesCount; i++)
                    {
                        hashTables[i] = environment.OpenDatabase($"HashTable{i}", hashTableConfig);
                    }

                    tx.Commit();
                }
            }
        }
    }
}

But it fails with the LMDB error MDB_BAD_TXN. You can reproduce it in a simple console application with your NuGet package (it fails on both .NET Framework 4.7 and .NET Core 2.1).

Duplicate cursor logic

I have this code:

var codeSegments = new List<CodeSegment>(<some_initial_data>);

foreach (var item in codeSegments)
{
    var codeSegment = item;
    storeDatabase.Put(transaction, trackId, codeSegment, TransactionPutOptions.None);
}

using (var cursor = storeDatabase.OpenReadOnlyCursor(transaction))
{
    var value = default(CodeSegment);
    if (cursor.TryGet(ref trackId, ref value, CursorGetOption.Set)
            && cursor.TryGet(ref trackId, ref value, CursorGetOption.FirstDuplicate))
    {
        var segments = new CodeSegment[cursor.Count()];
        var counter = 0;
        segments[counter] = value;
        while (cursor.TryGet(ref trackId, ref value, CursorGetOption.NextDuplicate))
        {
            counter++;
            segments[counter] = value;
        }
        var orderedSegments = segments.OrderBy(e => e.Time).ToArray();

        for (int i = 0; i < ordered.Length; i++)
        {
            Debug.Assert(ordered[i].Time == orderedSegments[i].Time);
            Debug.Assert(ordered[i].Code == orderedSegments[i].Code);
        }
    }
}

But somehow the returned results never match the input (the asserts don't pass, all items are off).

CodeSegment struct looks like this:

[StructLayout(LayoutKind.Sequential, Size = 2 * sizeof(int))]
[BinarySerialization(2 * sizeof(int))]
public struct CodeSegment
{
    public CodeSegment(int code, int time)
    {
        Code = code;
        Time = time;
    }

    public int Code;
    public int Time;
}

Is my cursor logic bad?
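One possible cause, assuming the database uses LMDB's default duplicate comparison: duplicates are then ordered by their raw bytes, not by insertion order and not by the Time field. On a little-endian machine the least significant byte of Code is compared first, so the order can look arbitrary. The sketch below reproduces that ordering with standard .NET APIs only; `Pair` and `DupOrderDemo` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical stand-in for CodeSegment.
public struct Pair
{
    public int Code;
    public int Time;
}

public static class DupOrderDemo
{
    // Raw in-memory bytes of one struct value.
    private static byte[] Bytes(Pair p)
        => MemoryMarshal.AsBytes(new[] { p }.AsSpan()).ToArray();

    // Bytewise lexicographic comparison of two byte arrays.
    private static readonly IComparer<byte[]> Bytewise =
        Comparer<byte[]>.Create((a, b) =>
            new ReadOnlySpan<byte>(a).SequenceCompareTo(new ReadOnlySpan<byte>(b)));

    // Orders values the way a bytewise duplicate comparison would.
    public static Pair[] SortAsRawBytes(Pair[] items)
        => items.OrderBy(p => Bytes(p), Bytewise).ToArray();
}
```

For example, on little-endian hardware Code = 256 (bytes 00 01 00 00) sorts before Code = 1 (bytes 01 00 00 00). If a numeric order is needed, sorting after reading is one option; storing the fields in an order and encoding that makes bytewise order match is another.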

LockedObjectPool behavior

It was taken from the corefx array pool implementation. In essence it is just a bounded thread-safe stack that should be faster than ConcurrentQueue. But the current implementation is quirky: one has to call Rent() on an empty pool to be able to return values there, and create an object if Rent returned null (Rent will increment the counter but has no factory, so it is just reserving space for a future return). The bool return value from Return looks wrong; strange that it doesn't fail as in another place with the SQLite connection pool today.

So the current code is incorrect but working, while tests fail with correct != here.

Need to review pooling once again. Or just add factory back.

Alignment

Important to remember that LMDB data is practically not aligned (to 2-bytes in dupsorted from 2014 comments from googling). Issue is here to review later that now data is returned without https://docs.microsoft.com/en-us/dotnet/api/system.reflection.emit.opcodes.unaligned?view=netframework-4.7.2

We must use (review usage if already) Unsafe.ReadUnaligned for any pointer read to T.

For Intel x64 it's ~noop and for others it's correctness issue, so performance doesn't matter.
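A minimal sketch of an unaligned read with standard .NET APIs. `UnalignedDemo.ReadAt` is a hypothetical helper, not the library's implementation. It reads a value of type T from an arbitrary byte offset via `Unsafe.ReadUnaligned`, so the read does not rely on the address being aligned for T.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class UnalignedDemo
{
    // Reads a T starting at the given byte offset, with a bounds check.
    public static T ReadAt<T>(ReadOnlySpan<byte> source, int offset) where T : unmanaged
    {
        if (offset < 0 || offset > source.Length - Unsafe.SizeOf<T>())
            throw new ArgumentOutOfRangeException(nameof(offset));

        ref byte start = ref MemoryMarshal.GetReference(source);
        // ReadUnaligned makes no assumption about the address alignment.
        return Unsafe.ReadUnaligned<T>(ref Unsafe.Add(ref start, offset));
    }
}
```

`MemoryMarshal.Read<T>` over a sliced span is a bounds-checked alternative for the same purpose.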

Unable to find package System.Runtime.Intrinsics.Experimental

I'm trying to use your library in my .NET Standard library. In the main project (TargetFramework netstandard2.0) everything is OK, but I cannot compile the test project (TargetFramework netcoreapp2.1). The test project references the main project. The error is below. Do you have any idea how to resolve this?

Also, I cannot install the NuGet package itself in the test project. The same error occurs.

Severity	Code	Description	Project	File	Line	Suppression State
Error	NU1102	Unable to find package System.Runtime.Intrinsics.Experimental with version (>= 4.6.0-preview1-26724-02)
  - Found 2 version(s) in nuget.org [ Nearest version: 4.5.0-rc1 ]
  - Found 0 version(s) in Microsoft Visual Studio Offline Packages

libmdbx

Hey,

First of all - thanks for the great job here. I have been using Lightning.NET in our big project, and allocations made by it were killing our servers constantly. Spreads.LMDB seems to be working much better so far. I will be testing it further.

Have you looked at (and considered) using libmdbx instead of LMDB? It has many improvements over LMDB and the project seems to be very active.

Check it out: https://github.com/leo-yuriev/libmdbx

Use pooled IValueTaskSource for async write transactions

Useful links:

https://github.com/kkokosa/PooledValueTaskSource/blob/master/src/PooledValueTaskSource/FileReadingPooledValueTaskSource.cs

https://github.com/dotnet/corefx/pull/35522/files

The first link has the thing I was missing - object is released back to pool in GetResult or during sync completion.

Such reused object is also needed for DataSpreads/DataSpreads#6, so maybe will do a generic base class. For pooling T will be object in LMDB case. Or maybe some non-generic interface could be used for pooling. Actually what is mentioned in that issue should be implemented here.

Low priority because we could live with lambdas so far (until at least the first 6 DataSpreads issues are closed). But the current implementation does not accept a state object so we have to capture everything on every call, which is much more than a single TaskCompletionSource. Therefore we should also change the signature to the standard pattern with a delegate and state.

In DataSpreads (the Ingest sample) it's easy to starve the ThreadPool with high concurrency and blocking calls to LMDB from async methods are probably the main reason for that.

Sync path
When the BlockingCollection in LMDB environment is empty we should start a transaction synchronously. Async path is only to avoid blocking async code, we do not depend on anything special (e.g. NoLock) from the writer thread.

Passing state incurs boxing at least
The relevant queries have 2-3 value type parameters, plus need a global state object. We could only do pooling at Spreads.LMDB level if we use Func<Txn,object,object> and object as state. Alternatively, we could use a custom queue per every query where that matters and store the state in the pooled object. That could be even simpler due to explicit implementation vs fighting with types/boxing/etc.

Read a variable length text value

Hi,

I cannot find an example on tests. Is this possible?

Thanks

How to enumerate by a string prefix ?

Hello, I saw that lmdb can use key prefix for query, but I can't find related examples.

Could you tell me how can I use a prefix of string type for query? For example, use a C:\data as a key prefix to match C:\data\sub1, C:\data\sub1\123

Thank you!
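In LMDB terms, a prefix query is usually a range seek followed by a forward scan: position the cursor at the first key greater than or equal to the prefix (MDB_SET_RANGE in the C API), then move to the next key until a key no longer starts with the prefix. How this maps to Spreads.LMDB cursor options is an assumption that should be checked against the tests. The sketch below models the same two steps over a sorted in-memory list with standard .NET APIs; `PrefixScanDemo` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixScanDemo
{
    // sortedKeys must be sorted with StringComparer.Ordinal.
    public static List<string> WithPrefix(List<string> sortedKeys, string prefix)
    {
        // Seek step: index of the first key >= prefix.
        int i = sortedKeys.BinarySearch(prefix, StringComparer.Ordinal);
        if (i < 0) i = ~i;

        var result = new List<string>();
        // Scan step: walk forward while the prefix still matches.
        for (; i < sortedKeys.Count
               && sortedKeys[i].StartsWith(prefix, StringComparison.Ordinal); i++)
        {
            result.Add(sortedKeys[i]);
        }
        return result;
    }
}
```

Including the trailing separator in the prefix (C:\data\ rather than C:\data) avoids also matching siblings such as C:\database.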

Add NuGet package with LMDB RE branch

This is only relevant for Windows. We use master branch which on Windows allocates disk space on demand, but is slower. For interactive applications allocating all space is inconvenient, but resizing env is even more inconvenient so we prefer on-demand allocation by default. Idea is that Windows is usually a dev machine and on Linux there is no performance overheads from on-demand allocations (LMDB just works that way there).

But e.g. on Windows Server (or a big machine that acts like a server, or a powerful workstation with a dedicated disk) one could afford pre-allocating hundreds of GBs.

Should name the package Spreads.LMDB.RE

See discussion in #19

Typed DB & Cursor

Database<T> & Cursor<T> where T : unmanaged inheriting from untyped will be useful. Must check that T is fixed once and avoid checks on every call.

Random AccessViolationException

On following code:

public unsafe void PutSubFingerprint(SubFingerprintDataDTO subFingerprintDataDTO)
{
    var subFingerprintKey = new DirectBuffer(BitConverter.GetBytes(subFingerprintDataDTO.SubFingerprintReference));
    var value = ZeroFormatterSerializer.Serialize(subFingerprintDataDTO);
    fixed (byte* array = value)
    {
        var directBuffer = new DirectBuffer(value.Length, array);
        databasesHolder.SubFingerprintsDatabase.Put(tx, ref subFingerprintKey, ref directBuffer);
    }
}

I get a randomly occurring AccessViolationException on the line databasesHolder.SubFingerprintsDatabase.Put(tx, ref subFingerprintKey, ref directBuffer);.

I'm calling this code around 2000 times in a single transaction (I don't know if that has any meaning in this context).

By random I mean I can't find a reproduction pattern. I'm building the database using the same data in the same order. Sometimes it works and sometimes it fails. When it fails it's always on a different entry, so I don't think the data is at fault. Am I doing something wrong?
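One thing worth checking, given the README's rule that the memory behind a DirectBuffer must be pinned for writes: in the code above only the value is inside a fixed block, while the key wraps the byte[] returned by BitConverter.GetBytes. If that DirectBuffer constructor does not pin the array (an assumption, not verified here), the GC may move it between construction and the Put call, which would fit the random pattern. The sketch below shows a generic way to keep a byte[] pinned for the duration of a call using standard .NET APIs only; `PinDemo.WithPinned` is a hypothetical helper.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinDemo
{
    // Pins data, passes its address and length to use, then unpins.
    public static T WithPinned<T>(byte[] data, Func<IntPtr, int, T> use)
    {
        var handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            // The address is stable until handle.Free() is called.
            return use(handle.AddrOfPinnedObject(), data.Length);
        }
        finally
        {
            handle.Free();
        }
    }
}
```

In unsafe code, a second `fixed` statement around the key bytes achieves the same without a GCHandle.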

Usage of CursorPutOptions.MultipleData

I cannot find test that shows usage of flag CursorPutOptions.MultipleData. I looked in your code and it seems like you do map native method that get DirectBuffer[] array, but i cannot find method in Cursor that actually uses it.

Can you provide some help with this one?

PS. Sorry for all those questions today, I just find your code very promising :)

@edit: same thing seems to be with getting multiple data

[Question] How to get the crash stack caused by LMDB / native / unsafe operations?

Excuse me, I have a tricky question. I have been using LMDB for some time. After some recent changes to the code, the program runs normally in Debug. However, in Release mode it crashes after several operations. I have checked the code repeatedly and do not know why.

The system Event Viewer and WER files also have very vague CRASH information. There is no useful information.
The Visual Studio debugging process also ends immediately after the process CRASH.

Is there any software that can capture the stack information of CRASH caused by native / unsafe operations in .NET?

Thank you very much !

Fault bucket 2110382570943049284, type 4
Event Name: APPCRASH
Response: Not available
Cab Id: 0

Problem signature:
P1: App.exe
P2: 1.0.0.0
P3: 5da7a666
P4: StackHash_cac1
P5: 10.0.18362.657
P6: 64d10ee0
P7: c0000374
P8: PCH_1D_FROM_ntdll+0x000000000009CC14
P9: 
P10: 

Attached files:
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WER96F5.tmp.dmp
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERA195.tmp.WERInternalMetadata.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERA1D4.tmp.xml
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERA1E4.tmp.csv
\\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERA38B.tmp.txt

These files may be available here:
\\?\C:\ProgramData\Microsoft\Windows\WER\ReportArchive\AppCrash_App._294e28b84ee69f467340d36c963cd5ff208b8ca6_1f478d7f_2f6dac53-00d2-475a-8154-7a9674bd6133

Analysis symbol: 
Rechecking for solution: 0
Report Id: ac47185f-824b-498e-8754-d018c3356820
Report Status: 268435456
Hashed bucket: 8da9a20d69207e766d4995c4593c6e44
Cab Guid: 0


-------


Faulting application name: App.exe, version: 1.0.0.0, time stamp: 0x5da7a666
Faulting module name: ntdll.dll, version: 10.0.18362.657, time stamp: 0x64d10ee0
Exception code: 0xc0000374
Fault offset: 0x00000000000f92a9
Faulting process id: 0x70c28
Faulting application start time: 0x01d5eab8148eb3f9
Faulting application path: X:\git\App\App.exe
Faulting module path: C:\WINDOWS\SYSTEM32\ntdll.dll
Report Id: ac47185f-824b-498e-8754-d018c3356820
Faulting package full name: 
Faulting package-relative application ID:

Access violation on Mac

The code that used to work all the time now gets access violation. Has something else been changed? This could be something on my side but just checking.

Also, in previous versions all I had to do for a type was to add [StructLayout(LayoutKind.Sequential, Size = n)], but now another attribute is needed?

Commented out tests fail

Had to comment out 3 tests to make Linux build pass. The build was randomly, but mostly, failing (~~ 7 of 10 in GH Actions).

They fail on WSL with my binary (both with and without -s gcc option, which explains 91kb vs 370kb) and with a binary from Lightning. So it's not about binaries.

One of the tests is the one from Ali on MacOS. Maybe some use case is really an edge one either with pooling/reset, disposal or really the upstream.

Main vanilla functionality works, including Spreads TryFind methods => low priority to investigate.

Maybe the ENV file path is still too long in those tests specifically. Other failures were due to that.

TryGet current on cursor throws exception when no current record

Hi,

I am deleting records and found out if I delete the last record, I get an exception calling this:

c.TryGet(ref keydb, ref valdb, CursorGetOption.GetCurrent);

The exception is:

andling exception: Spreads.LMDB.LMDBException: Invalid argument
[xUnit.net 00:00:05.4860040]            at Spreads.LMDB.Interop.NativeMethods.ThrowLMDBEx(Int32 res) in C:\MD\Dev\Spreads\Spreads.LMDB\src\Spreads.LMDB\Interop\NativeMethods.cs:line 127
[xUnit.net 00:00:05.4862130]            at Spreads.LMDB.Cursor.TryGet(DirectBuffer& key, DirectBuffer& value, CursorGetOption operation) in C:\MD\Dev\Spreads\Spreads.LMDB\src\Spreads.LMDB\Cursor.cs:line 66
[xUnit.net 00:00:05.4863290]            at

Is this expected? TryGet really should not get an exception since I am not doing anything special.

DB creation in/after a transaction

I cannot call env.OpenDatabase() inside an already created transaction. If I open a db after creating a transaction, db is null and subsequent operations fail.

I think new API can be added for this issue:
LMDBEnvironment.OpenDatabase(string name, Transaction txn, DatabaseConfig config)
or
Transaction.OpenDatabase(string name, DatabaseConfig config)

Here is the failed code:

string dir = "spread_lmdb_hello";

if (Directory.Exists(dir))
    Directory.Delete(dir, true);
var env = LMDBEnvironment.Create(dir, LMDBEnvironmentFlags.NoSync | LMDBEnvironmentFlags.WriteMap);
env.Open();

using Transaction txn = env.BeginTransaction();
using var db = env.OpenDatabase("test", new DatabaseConfig(DbFlags.Create)); // db  gets null here.
db.Truncate(txn); // crash with "Spreads.LMDB.LMDBException : Invalid argument"

No support for string keys?

On the same topic, I need to store string for keys.
From my understanding, keys and values are bytes in LMDB but in this library they are meant to be struct.

Is it not possible from your library? I understand that the struct makes the use of Spans possible but can be limiting. Happy also to open an issue for it so the answer stays.

It is certainly possible to create a struct to represent a fixed-length string with implicit conversion as below but a bit clunky.

struct TenCharAsciiString
{
    public byte char0;
    public byte char1;
    public byte char2;
    public byte char3;
    public byte char4;
    public byte char5;
    public byte char6;
    public byte char7;
    public byte char8;
    public byte char9;

    public static implicit operator string(TenCharAsciiString s)
    {
        //...
    }
    public static implicit operator TenCharAsciiString(string s)
    {
        //...
    }
}

Replace blocking concurrent queue with non-blocking channel?

Write transactions are executed in a single thread via a blocking concurrent queue.

It would be better to replace the blocking queue with a Channel.

The former blocks a thread when waiting for new data; The latter is totally non-blocking hence you can free up thread when there is no data to write.
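A minimal sketch of the channel-based variant, using only System.Threading.Channels. `ChannelQueueDemo` is a hypothetical name and this is not the library's writer. The consumer awaits `WaitToReadAsync`, so no thread is parked while the queue is empty.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelQueueDemo
{
    // Queues itemCount work items and returns how many were executed.
    public static async Task<int> RunAsync(int itemCount)
    {
        var channel = Channel.CreateUnbounded<Action>(
            new UnboundedChannelOptions { SingleReader = true });
        int executed = 0;

        // Consumer: asynchronous wait when empty, synchronous drain otherwise.
        Task consumer = Task.Run(async () =>
        {
            while (await channel.Reader.WaitToReadAsync())
            {
                while (channel.Reader.TryRead(out Action work))
                {
                    work(); // each item runs start to finish on one thread
                }
            }
        });

        for (int i = 0; i < itemCount; i++)
        {
            channel.Writer.TryWrite(() => executed++);
        }

        channel.Writer.Complete(); // lets the consumer loop end
        await consumer;
        return executed;
    }
}
```

One trade-off to keep in mind: after an await the consumer may resume on a different thread-pool thread, so different work items can run on different threads. A write transaction therefore has to begin and commit inside a single work item to stay within LMDB's single-thread rule.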

Win x86 support ?

Hi,

Spreads.LMDB works well on x64, but on Windows x86 it throws a libspreads_lmdb.dll not found error.

I've found libspreads_native.dll in the x86 bin folder, but libspreads_lmdb.dll is only in the x64 folder.

Does Spreads.LMDB support Windows x86?

Thank you!

Add TouchSpace method

It's quite common to put/append many values in a single transaction. With NTDLL API this just kills performance and makes batched operation as slow as non-batched when the env file needs to grow. One could do this manually with MDB_RESERVE and then deleting, but a simple method AddSpace/TouchSpace will be convenient. Touch is a better word than add, since add space means "increase map size".

Issue with multiple data write

Hi,

Could you help me out with an issue I'm having? For the purpose of the SoundFingerprinting algorithm I'm trying to use multiple data as Dictionary<int, Set<ulong>>() and (as far as I understand LMDB) it should work this way. But I found a use case where the written values are just wrong, and I can't find out why that is happening. Could you take a look at this code? It's a full issue reproduction in an xUnit test:
SpreadsLMDBBugRepro.zip

IntegerDuplicates works with ulong but not with long

When I use long for values of duplicates, it is not able to find them when I TryFindDup. It is able to find for int and ulong.

Can you please explain?

I am planning to use long (since is a more natural type) in my design. I can go to ulong but just need to understand whether the support for ulong is only perhaps because of supporting only at int level.