May I ask you a couple of questions that are not directly about the current implementation, but depend on the library elements? You know .NET and LMBD better than me and could spot some flaws as before.
First, about writes.
I want to use the lib with async without worrying about tasks that switch OS threads, so I wrap all write transactions in an async task like this:
/// <summary>
/// Performs a write trasaction asynchronously
/// </summary>
internal static async Task<T> WriteAsync<T>(this DbEnvironment env, Func<Transaction, Task<T>> writeTask) {
var tcs = new TaskCompletionSource<object>();
if (!env.WriteQueue.IsAddingCompleted) {
Func<Transaction, Task<object>> job = async (x) => await writeTask(x);
var tuple = Tuple.Create(tcs, job);
env.WriteQueue.Add(tuple);
} else {
tcs.TrySetCanceled();
}
var res = await tcs.Task;
return (T)res;
}
Where env.WriteQueue
is a concurrent queue that is consumed on a separate long-running Task, which executes the write task and sets a result to a TaskCompletionSource
. As we discussed in the latest comments of #32, my the previous solution (create transactions async) wasn't working, but with TCS
all write transactions are concurrent from C# point of view and serialized on a separate thread (and there is still a named lock for different processes).
// --- THIS IS INSIDE ENV CONSTRUCTOR ---
// Writer Task
// In the current process writes are serialized via the blocking queue
// Accross processes, writes are synchronized via WriteTxnGate
_cts = new CancellationTokenSource();
_writeTask = Task.Factory.StartNew(() => {
while (!WriteQueue.IsCompleted) {
// BLOCKING
try {
var tuple = WriteQueue.Take(_cts.Token);
var tcs = tuple.Item1;
var job = tuple.Item2;
try {
var txn = BeginTransaction(TransactionFlags.ReadWrite);
var result = job(txn).Result;
// Autocommit, if forgot to commit
if (txn.State == TransactionState.Active) {
txn.Commit();
}
tcs.SetResult(result);
}
catch (Exception e) {
tcs.SetException(e);
}
}
catch (InvalidOperationException e) {
}
}
}, _cts.Token, TaskCreationOptions.LongRunning, TaskScheduler.Default);
I believe this is quite similar what LMDB does internally, and all the write locking could be done on .NET side as well. Assuming that I am going to use only this async wrapper for all transactions, do you see any problems with parallelism here?
Second, about read transactions
I am reading this thread and it describes a pretty common (and my major) use case, when many readers should frequently read what writers add to DB. My microbenchmarks show that LMDB could be as fast as some immutable in-memory data-structures like F# map within a single transaction, and eliminating a new transaction allocation for each query could make random access as fast and will allow to read DB updates very quickly.
The solution there is to create many read-only transactions and never commit them, but instead reset and renew them and store in a pool or a thread-local storage.
Howard @hyc writes:
In the actual LMDB API read transactions can be reused by their creating thread, so they are zero-cost after the first time. I don't know if any of the other language wrappers leverage this fact.
(Cursive is mine). Assuming again that I want to wrap all reads in a similar fashion that writes above, a pool is not a good thing because .NET Tasks could run on different threads other than the one where a transaction was created. If Go threads are similar to C# Tasks, Howard then kind of confirms this:
It is unfortunate that you're using a system like Go that multiplexes on top of OS threads. Your pool is going to require locks to manage access, and will be a bottleneck. In a conventional threaded program, thread-local storage can be accessed for free.
Before I saw that mail list thread, I had a naive implementation:
/// <summary>
/// Perform a a read transaction asynchronously
/// </summary>
internal static async Task<T> ReadAsync<T>(this DbEnvironment env, Func<Transaction, Task<T>> readTask) {
var txn = await env.BeginTransactionAsync(TransactionFlags.ReadOnly);
var res = await readTask(txn);
// Autocommit, if forgot to commit
if (txn.State == TransactionState.Active) {
txn.Commit();
}
return res;
}
Instead, inside env:
internal ThreadLocal<Transaction> TlTxn;
... in constructor:
TlTxn = new ThreadLocal<Transaction>(
() => {
var txn = BeginTransaction(TransactionFlags.ReadOnly);
txn.Reset(); // further on, we always renew it, not begin
return txn;
}, true); // track to dispose all of them later
Then the initial naive method becomes:
/// <summary>
/// Perform a a read transaction asynchronously
/// </summary>
internal static async Task<T> ReadAsync<T>(this DbEnvironment env, Func<Transaction, T> readTask) {
return await Task.Run(() => {
var txn = env.TlTxn.Value;
if (txn != null && txn.State == TransactionState.Reset) {
// this is expected state
txn.Renew();
}
else {
if (txn != null) env.TlTxn.Value.Dispose();
txn = env.BeginTransaction(TransactionFlags.ReadOnly);
env.TlTxn.Value = txn;
}
// inside this task only a single thread touches txn unless readTask shares it with others
// because readTask is not async, the thread of the outer task doesn't switch (doesn't await anything)
var res = readTask(txn);
// Autoreset
if (txn.State == TransactionState.Active) {
txn.Reset();
}
return res;
});
}
That works on simple tests, but does anything catch your eyes? Particularly, I worry about disposal of Transactions inside ThreadLocal<> object when in the long-running app threads come and go to/from a threadpool. Also, I wanted to keed readTask
returning a Task
and to await on it, but then I thought that the awaiting thread could switch to another task and access the same thread-local transaction from another ReadAsync
call while awaiting for the readTask
(or after awaiting, if await returns to another OS thread). The second quote is probably about that, but a lock here could deadlock or it is just too complex to reason about it - ThreadLocals with async lamdas are tricky and I am not sure I grasp the subtleties, readTask
as Func<Transaction, T> is enough.
Finally, do you think that this is enough to switch off LMDB locks completely with MDB_NOLOCK
option? It is not a goal per se, but I want C# version to be as thread-safe as the native one.