danburkert / lmdb-rs

Safe Rust bindings for LMDB

License: Apache License 2.0

Rust 100.00%

lmdb-rs's Introduction

lmdb-rs

Idiomatic and safe APIs for interacting with the Symas Lightning Memory-Mapped Database (LMDB).

Building from Source

git clone --recursive git@github.com:danburkert/lmdb-rs.git
cd lmdb-rs
cargo build

lmdb-rs's Issues

Cursor::iter_start() panics on empty database

It seems like it should instead return an iterator that yields no results.

Unfortunately, it doesn't work to simply ignore the result of get() in iter_start() and return the Iter anyway: the subsequent mdb_cursor_get() call in Iter yields error code 22.

One way to do this would be to add a bool field to Iter that records whether or not it should try to look up more records, and set the field to false if the first get() doesn't find a key. The field could also be cleared when the first error is returned to spare further lookups, though that's likely not a bottleneck for anybody.

Anyway, if this sounds good I'm happy to submit a PR for it.
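The flag-based approach proposed in this issue can be sketched without LMDB at all. The block below is only an illustration: MockCursor, its first() and advance() methods, and the done field are hypothetical stand-ins, not the crate's actual types. The flag here records that iteration is finished, so an empty database yields an empty iterator and the unpositioned cursor is never advanced.

```rust
/// A key/value pair, owned for simplicity in this sketch.
type Record = (Vec<u8>, Vec<u8>);

/// Hypothetical stand-in for an LMDB cursor (not the lmdb crate's type).
struct MockCursor {
    records: Vec<Record>,
    pos: Option<usize>,
}

impl MockCursor {
    fn new(records: Vec<Record>) -> Self {
        MockCursor { records, pos: None }
    }

    /// Positions the cursor at the first record, if there is one.
    fn first(&mut self) -> Option<Record> {
        let record = self.records.first().cloned()?;
        self.pos = Some(0);
        Some(record)
    }

    /// Moves to the next record. Like the failing call described above,
    /// this is invalid on a cursor that was never positioned.
    fn advance(&mut self) -> Option<Record> {
        let next = self.pos.expect("cursor is not positioned") + 1;
        self.pos = Some(next);
        self.records.get(next).cloned()
    }
}

/// Iterator that remembers when it is finished.
struct Iter {
    cursor: MockCursor,
    pending: Option<Record>,
    done: bool,
}

/// Builds the iterator; an empty database simply produces `done == true`.
fn iter_start(mut cursor: MockCursor) -> Iter {
    let pending = cursor.first();
    let done = pending.is_none();
    Iter { cursor, pending, done }
}

impl Iterator for Iter {
    type Item = Record;

    fn next(&mut self) -> Option<Record> {
        if self.done {
            return None; // never touch the cursor again
        }
        if let Some(record) = self.pending.take() {
            return Some(record); // the record found by `first`
        }
        let record = self.cursor.advance();
        if record.is_none() {
            self.done = true;
        }
        record
    }
}
```

With this shape, iter_start(MockCursor::new(vec![])) produces an iterator whose first next() call returns None instead of panicking.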

Pointer aliasing issue when using read-write cursors.

The following sample code (using the lmdb 0.8.0 crate and the tempfile 3.1.0 crate) shows a situation where a read-write cursor can cause a value pointed to by an immutable reference to change during the immutable reference's lifetime, which to my understanding is a violation of Rust's safety rules. The sample code doesn't use the unsafe keyword, which makes me think the problem is with some unsafe code in the lmdb crate.

use lmdb::{Environment, DatabaseFlags, WriteFlags, Cursor};
use tempfile::tempdir;
use lmdb_sys::MDB_FIRST;

fn main() {
    let temp_dir = tempdir().unwrap();
    let env = Environment::new().open(temp_dir.path()).unwrap();
    let db = env.create_db(None, DatabaseFlags::default()).unwrap();
    let key: Vec<u8> = vec![1, 2, 3];
    let value_0: Vec<u8> = vec![4, 5, 6];
    let value_1: Vec<u8> = vec![7, 8, 9];
    let mut rw_txn = env.begin_rw_txn().unwrap();
    rw_txn.put(db, &key, &value_0, WriteFlags::default()).unwrap();
    let mut rw_cursor = rw_txn.open_rw_cursor(db).unwrap();
    let returned_value: &[u8] = rw_cursor.get(None, None, MDB_FIRST).unwrap().1;
    println!("returned_value is an immutable slice reference, so the value it points to should not change during its lifetime.");
    println!("First, returned_value points to: {:?}", returned_value);
    rw_cursor.put(&key, &value_1, WriteFlags::default()).unwrap();
    println!("Later, returned_value points to: {:?}", returned_value);
}

On my computer, running the above code yields the following output.

returned_value is an immutable slice reference, so the value it points to should not change during its lifetime.
First, returned_value points to: [4, 5, 6]
Later, returned_value points to: [7, 8, 9]

I think part of the problem is that the key and value references returned by Cursor::get have lifetimes that are not tied to the lifetime of the self reference. Maybe the 'txn lifetime annotations should be removed from the signature of Cursor::get?

I suspect that if you replace rw_cursor.put with rw_cursor.del in the above example, it might even be possible to get a segfault or other illegal memory access due to freeing of the buffer that returned_value points to, but I haven't been able to make that happen in my testing so far, so it is just speculation.
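To illustrate the suggestion about lifetimes, here is a minimal sketch of a cursor whose get() result borrows from the cursor itself. The Cursor type below is a hypothetical in-memory stand-in, not the lmdb crate's cursor. Because the returned slice is tied to the &self borrow, the compiler rejects any put() (which needs &mut self) while that slice is still in use.

```rust
/// Hypothetical in-memory stand-in for a read-write cursor.
struct Cursor {
    value: Vec<u8>,
}

impl Cursor {
    /// The returned slice borrows from `&self`, so it cannot outlive
    /// the shared borrow of the cursor.
    fn get(&self) -> &[u8] {
        &self.value
    }

    /// Needs exclusive access, so it cannot run while a slice
    /// returned by `get` is still alive.
    fn put(&mut self, value: &[u8]) {
        self.value = value.to_vec();
    }
}

/// Reads the current value, overwrites it, and returns the old value.
fn read_then_write(cursor: &mut Cursor, new_value: &[u8]) -> Vec<u8> {
    // Copy the old value out so no borrow is outstanding when `put` runs.
    let old = cursor.get().to_vec();
    cursor.put(new_value);

    // The pattern from the issue would not compile with these signatures:
    //     let stale = cursor.get();
    //     cursor.put(new_value); // error[E0502]: cannot borrow as mutable
    //     println!("{:?}", stale);
    old
}
```

The trade-off is that callers must copy data they want to keep across a write, which is the behaviour the issue argues for in read-write cursors.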

Documentation doesn't clearly specify which flag constants belong to flag group

Currently, it's difficult to discover all flags of a type without looking at the source. For example, if I want to find all possible WriteFlags values when writing code to put a value into a database, I open the flags.rs source file and look at the bitflags macro that defines them. Ditto when I need to find the possible values for DatabaseFlags and EnvironmentFlags.

Instead, the rustdoc-generated documentation should list all flags of a type in one place. I'm not sure where and how, but here are some ideas:

List all relevant flag constants in the type definition (e.g., in the description for WriteFlags, list out APPEND, APPEND_DUP, CURRENT, etc.).
Create a submodule for each flag type, which would force the documentation to group flags together by type.
Explicitly describe the flags and what they do in the description for each method that has a flag parameter. For example, in the RwTransaction::put method, specify all WriteFlags values and what effect they have.

Items 1 and 2, above, are the easiest to implement, though I think item 3 gives the best user experience. Of course, the items aren't mutually exclusive and good documentation is often repetitive instead of maximally concise.

Here are all affected flag types:

DatabaseFlags
EnvironmentFlags
WriteFlags

Provide safe access to LMDB version

I'm currently playing around with accessing a shared LMDB database from multiple languages (Perl, Python, PHP, Rust, NodeJS), and I'm getting an MDB_VERSION_MISMATCH error in my NodeJS implementation. To further debug this, I tried to determine the LMDB version that each implementation is using, which worked...

in NodeJS:

 let lmdb = require('node-lmdb');
 console.log(lmdb.version);

in Python:

 import lmdb
 print("{}".format(lmdb.version()))

in Perl:

 use LMDB_File qw(:flags :version);
 printf "%s\n", LMDB_File::MDB_VERSION_STRING;

in PHP:

 printf("%s\n", dba_handlers(true)["lmdb"]);

But I can't find a way to access this information from Rust with the lmdb crate. As far as I can tell, the only way to get it is by using mdb_version from lmdb-sys, but that is unsafe, and as a Rust newbie, I don't dare use it. From the lmdb-sys code it looks as if the current version is 0.9.21, but that should probably be queryable via code :)

[FR] allow linking with system dylib

segment fault

stable rustc 1.19.0 (0ade33941 2017-07-17)

Thread 21 "refresh_state" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ff446dff700 (LWP 10586)]
0x00005555555c73b7 in mdb_cursor_put ()

(gdb) info stack
#0  0x00005555555c73b7 in mdb_cursor_put ()
#1  0x00005555555c91a0 in mdb_put ()

Database and Environment are marked as Sync and Send, so I assumed they were thread safe, but in my application concurrent puts cause the segmentation fault shown above. Any idea what causes it?

This is my code:

        let mut tx = self.env.begin_rw_txn()?;
        let height = Num(height);
        let encoded_height = serialize(&height, Infinite).unwrap();
        let new_height =
            tx.put(self.index, &height_key, &encoded_height.as_slice(), libmdb::NO_DUP_DATA);
        if new_height.is_ok() {
            let head = tx.get(self.main, &head_key)
                .map(From::from)
                .unwrap_or_else(|_| Num::default());
            if height > head {
                tx.put(
                    self.main,
                    &head_key,
                    &encoded_height.clone().as_slice(),
                    WriteFlags::empty(),
                )?;
            }
            if let Err(libmdb::Error::NotFound) = tx.get(self.main, &tail_key) {
                tx.put(
                    self.main,
                    &tail_key,
                    &encoded_height.clone().as_slice(),
                    WriteFlags::empty(),
                )?;
            }
        }
        tx.commit()

return Result from fallible iteration methods

In #37, I suggested returning a Result instead of panicking when iterator operations fail.

The Cursor::iter*() methods are straightforward to convert, and the ergonomic impact would be minimal. To obtain the equivalent behavior as today (panic on error), a consumer would just need to unwrap the Result:

- cursor.iter().collect::<Vec<_>>()
+ cursor.iter().unwrap().collect::<Vec<_>>()

Alternatively, they could handle it via the other Result methods, via a match expression, or (in functions that themselves return a Result) via the ? operator, which is a single-character change:

- cursor.iter().collect::<Vec<_>>()
+ cursor.iter()?.collect::<Vec<_>>()

Iter::next() is similarly straightforward to convert, although the ergonomic impact seems more significant. Nevertheless, iterating Result instances appears to be common enough that idioms have emerged to simplify it.

To obtain the equivalent behavior as today, a consumer could unwrap() each Result:

- cursor.iter().collect::<Vec<_>>()
+ cursor.iter().map(Result::unwrap).collect::<Vec<_>>()

However, they could also convert a collection of results (e.g. Vec<Result<T, E>>) into a result containing a collection or the first error (i.e. Result<Vec<T>, E>) via this compact syntax as a type annotation:

let items: Result<Vec<_>> = cursor.iter().collect();

Or the turbofish equivalent:

cursor.iter().collect::<Result<Vec<_>>>()

The consumer could then return the error in a function that itself returns Result via the ? operator:

- cursor.iter().collect::<Vec<_>>()
+ cursor.iter().collect::<Result<Vec<_>>>()?

Or handle it in another way that is appropriate to its use case.

Also see Karol Kuczmarski's blog post Iteration patterns for Result & Option, which describes other interesting options, such as collecting only Ok results (ignoring errors) and partitioning results into separate collections of Ok and Err results.

for loops over iterators can similarly be converted to obtain the equivalent behavior as today:

- for (key, data) in cursor.iter() {
+ for (key, data) in cursor.iter().map(Result::unwrap) {

To return on error, however, I think a consumer would need to destructure the tuple in the loop body:

- for (key, data) in cursor.iter() {
+ for result in cursor.iter() {
+     let (key, data) = result?;

@danburkert Do these idioms seem ergonomic enough, or are they still too complex for comfort?
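The idioms discussed in this issue can be tried out with the standard library alone. The sketch below uses a hypothetical records() function and DbError type in place of cursor.iter() and the crate's error type. It shows collecting into a Result and returning early from a for loop with the ? operator.

```rust
/// Hypothetical error type standing in for the crate's error.
#[derive(Debug, PartialEq)]
enum DbError {
    Corrupted,
}

type Record = (&'static str, u32);

/// Stand-in for a fallible `cursor.iter()`: yields three records and,
/// optionally, an error at position `fail_at`.
fn records(fail_at: Option<usize>) -> impl Iterator<Item = Result<Record, DbError>> {
    let data = vec![("a", 1), ("b", 2), ("c", 3)];
    data.into_iter().enumerate().map(move |(i, record)| {
        if Some(i) == fail_at {
            Err(DbError::Corrupted)
        } else {
            Ok(record)
        }
    })
}

/// Collects into `Result<Vec<_>, _>`: stops at the first error.
fn collect_all(fail_at: Option<usize>) -> Result<Vec<Record>, DbError> {
    records(fail_at).collect()
}

/// Destructures each item inside the loop body and propagates errors with `?`.
fn sum_values(fail_at: Option<usize>) -> Result<u32, DbError> {
    let mut total = 0;
    for result in records(fail_at) {
        let (_key, value) = result?;
        total += value;
    }
    Ok(total)
}
```

Both functions return the first error encountered and never look at records after it.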

rustfmt

I'd like to make some other changes and submit them, but I have my editor set to run rustfmt by default when saving a file. I can obviously disable this, but would a pull request that only runs rustfmt be welcome?

Can't clone because submodule is missing

The submodule "lmdb-sys" is hosted on Gitorious, which seems to have shut down.

Could it just be merged in here instead of using a submodule?

Consider accepting the UNC paths for env on Windows

EnvironmentBuilder::open() panics on Windows (latest stable-x86_64-pc-windows-msvc) while calling it with a canonicalized path generated by std::path::Path::canonicalize(). The error_no(123) suggests that the given path is invalid, therefore it couldn't open the environment at that path.

It looks like canonicalized paths on Windows are UNC paths (e.g. \\?\C:\\foo.bar). If the underlying mmap system calls can't handle them, should we consider converting those UNC paths to regular ones here?
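One possible shape for such a conversion is sketched below. The function names are hypothetical, and it works on plain strings so it can be tried on any platform. It only strips the \\?\ prefix when a drive letter follows, and leaves every other form (including \\?\UNC\ network paths) untouched. It does not check path length or reserved names, which a real conversion would probably need to consider.

```rust
/// Returns true if `s` starts with a drive specifier such as `C:`.
fn has_drive_letter(s: &str) -> bool {
    let bytes = s.as_bytes();
    bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':'
}

/// Converts `\\?\C:\dir` to `C:\dir`; any other input is returned unchanged.
fn strip_verbatim_prefix(path: &str) -> &str {
    const PREFIX: &str = r"\\?\";
    match path.strip_prefix(PREFIX) {
        Some(rest) if has_drive_letter(rest) => rest,
        _ => path,
    }
}
```

A caller could apply this to the string form of the canonicalized path before handing it to the environment builder.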

New release?

Since #37 has been merged, it would be nice to get a new release published!

Cursor::iter_from: Panics on empty databases

lmdb-rs/src/cursor.rs

Line 79 in 90e8880

self.get(None, None, ffi::MDB_FIRST).unwrap();

The above line causes a panic in the event of an empty database.
Given that the MDB_SET_RANGE: 'Position at first key greater than or equal to specified key.' option is used, it seems like it might also cause a panic if you ask for a key which is greater than any key stored.

I think that causing a panic here isn't intuitive.
Options I see:

Return an empty iterator instead
(breaking) Make iter_from return a Result<Iter<'txn>, lmdb::error::Error> instead.

Would a PR with one of these options be OK?

Investigate replacing `#[doc(hidden)]` with `pub(crate)`

The #[doc(hidden)] attributes were originally to solve cross-module visibility issues, but now Rust has better features for that. https://github.com/rust-lang/rfcs/blob/master/text/1422-pub-restricted.md
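As a small illustration of the difference, the sketch below uses hypothetical module and method names (environment, transaction, raw_handle), not the crate's real layout. A pub(crate) item is usable from sibling modules inside the crate but is not part of the public API at all, whereas a pub item marked #[doc(hidden)] stays callable by downstream crates and is only omitted from the docs.

```rust
mod environment {
    pub struct Environment {
        handle: usize,
    }

    impl Environment {
        pub fn new() -> Self {
            Environment { handle: 42 }
        }

        /// Visible to every module in this crate, invisible to
        /// downstream crates. No `#[doc(hidden)]` is needed.
        pub(crate) fn raw_handle(&self) -> usize {
            self.handle
        }
    }
}

mod transaction {
    use super::environment::Environment;

    /// A sibling module can still reach the crate-private accessor.
    pub fn describe(env: &Environment) -> String {
        format!("txn on env handle {}", env.raw_handle())
    }
}
```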

Linked documentation is out of date

The documentation on rust-ci.org which is linked from README.md and https://crates.io/crates/lmdb seems to refer to a much older version, e.g. it does not have RwCursor.

set_max_dbs: Confusing function signature name.

fn set_max_dbs(&mut self, max_readers: c_uint) -> &mut EnvironmentBuilder

Maybe we can change max_readers to max_dbs?

Cursor::iter_from panics when database has duplicate keys

@tarcieri reported this panic, which is coming from the unwrap call. It's not clear what is going on here, except that perhaps iter_from should return a Result. It would be good to document exactly which situations trigger this. Reproducible test case: tarcieri/ithos@6d15568#commitcomment-18461783

How to deal with MapFull?

During production use I got the error DB(MapFull). My very basic understanding plus some basic googling says I hit the maximum number of records for my current database, but this is probably not a hard limit.

I open the database as follows:

let dir: &std::path::Path = std::path::Path::new(&config.persistence_file_path);
let db_flags: lmdb::DatabaseFlags = lmdb::DatabaseFlags::empty();
let db_environment = try!(lmdb::Environment::new().set_max_dbs(1).open(&dir));
let database = try!(db_environment.create_db(None, db_flags));

When writing entries I use WriteFlags::empty() within the RwTransaction (which now fails with message MapFull).

The mdb file is 1MB large, so this seems like some default limit being hit. I'm basically only using LMDB as an append-only storage for simple event sourcing, so this was only a matter of time.

Can I increase this db limit somehow by changing the Rust code / using (other) flags? Is there another way? Can I keep my data while resizing, or should I export it from my existing file and import it into a new db?

Thank you for any tips on how to deal with this!

PS: I wasn't sure if I should just go ask at the LMDB project itself, but maybe this use case is already dealt with by a flag within the current crate, and I only have to change my code.
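For context: MapFull corresponds to LMDB's MDB_MAP_FULL, which signals that the environment's memory map size has been reached. The LMDB documentation gives a default map size of 10485760 bytes and recommends a size that is a multiple of the OS page size. The sketch below only does the arithmetic. The helper names and the 4096-byte page size in the usage note are assumptions, and the result is meant to be passed to the builder's set_map_size method (the one used in the DUP_SORT example elsewhere on this page) before open().

```rust
/// LMDB's documented default map size, in bytes (10 MiB).
const DEFAULT_MAP_SIZE: usize = 10_485_760;

/// Rounds `bytes` up to the next multiple of `page_size`.
fn round_up_to_page(bytes: usize, page_size: usize) -> usize {
    assert!(page_size > 0, "page size must be non-zero");
    (bytes + page_size - 1) / page_size * page_size
}

/// Map size for `gib` gibibytes, page aligned.
/// Note: on 32-bit targets this overflows for 4 GiB and above.
fn map_size_for_gib(gib: usize, page_size: usize) -> usize {
    round_up_to_page(gib * 1024 * 1024 * 1024, page_size)
}
```

For example, map_size_for_gib(1, 4096) gives 1073741824, which could be chained as .set_map_size(1073741824) ahead of .open(&dir). To my understanding, raising the map size does not require exporting and re-importing existing data, but that is worth confirming against the LMDB documentation for mdb_env_set_mapsize.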

Potential issue with DUP_SORT

Hey!

I might be missing something here, but from what it looks like, DUP_SORT is not behaving as expected.

I have the following test code:

use lmdb::{Cursor, Database, DatabaseFlags, Environment, RwTransaction, Transaction, WriteFlags};
use std::sync::Arc;
use tempdir::TempDir;

fn main() {
    let db_tmp = TempDir::new("test").unwrap().into_path();
    let db_path = db_tmp.to_str().unwrap();

    let env = Arc::new(
        Environment::new()
            .set_max_dbs(10)
            .set_map_size(10000000)
            .open(db_path.as_ref())
            .unwrap(),
    );

    let db = env
        .create_db(Some("db_name"), DatabaseFlags::DUP_SORT)
        .unwrap();

    let key = vec![3u8; 32];
    let val = vec![4u8; 32];

    {
        let mut db_txn = env.begin_rw_txn().unwrap();

        db_txn.put(db, &key, &val, WriteFlags::empty()).unwrap();

        db_txn.commit().unwrap();
    }

    {
        let mut db_txn = env.begin_rw_txn().unwrap();

        db_txn.del(db, &key, Some(&val)).unwrap();

        db_txn.commit().unwrap();
    }
}

The code panics at db_txn.del(db, &key, Some(&val)).unwrap(); with the Err value NotFound. Is this expected to work?

Thank you!

Support for passing in file permissions of the environment when calling EnvironmentBuilder::open

I would like to specify the file permissions for the environment. This crate defaults to 0644 with no way to override it.

Add support for MDB_MULTIPLE

Hi,

I tried hacking around a little a while ago to add support for the MDB_MULTIPLE insert flag on mdb_cursor_put but failed. Here's my (maybe outdated) dirty branch: master...fbernier:mdb_multiple

The problem I was having is that it seems like what I'm getting back when reading is the pointer to the data and not the data itself.

I am opening this issue because I kind of gave up but feel like it would be a worthy addition.

If I understand correctly, one of the problems with implementing multiple insert is that the data needs to be sequential. If it's not, we can still use it by copying the data but I guess it defeats the purpose.
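The copying fallback mentioned above could look roughly like this. It is a sketch with a hypothetical helper name, and it stops short of calling LMDB: it packs equally sized items into one contiguous buffer and returns the element size and count, which are the two numbers MDB_MULTIPLE expects alongside the pointer to the array. Items of differing sizes are rejected, since MDB_MULTIPLE is only defined for fixed-size duplicates (MDB_DUPFIXED).

```rust
/// Packs equally sized items into one contiguous buffer.
///
/// Returns `(buffer, item_size, item_count)`, or `None` if `items` is
/// empty, contains zero-length items, or mixes different lengths.
fn pack_fixed_size(items: &[&[u8]]) -> Option<(Vec<u8>, usize, usize)> {
    let item_size = items.first()?.len();
    if item_size == 0 || items.iter().any(|item| item.len() != item_size) {
        return None;
    }

    // One allocation, then a sequential copy of every item.
    let mut buffer = Vec::with_capacity(item_size * items.len());
    for item in items {
        buffer.extend_from_slice(item);
    }
    Some((buffer, item_size, items.len()))
}
```

When the caller already holds the values in a single contiguous slice, the copy can be skipped entirely, which is the case where MDB_MULTIPLE pays off.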

Feature request: provide safe access to MDB_envinfo

I'm assuming that access to MDB_envinfo would be very similar to how MDB_stat is safely wrapped?
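A wrapper in that style might look like the sketch below. RawEnvInfo is a local stand-in that mirrors the field layout of MDB_envinfo from lmdb.h, so that the example compiles on its own. The Info type and its getter names are hypothetical, not an existing API of this crate.

```rust
use std::os::raw::{c_uint, c_void};

/// Local mirror of LMDB's `MDB_envinfo` (a stand-in for the ffi type).
#[repr(C)]
struct RawEnvInfo {
    me_mapaddr: *mut c_void,
    me_mapsize: usize,
    me_last_pgno: usize,
    me_last_txnid: usize,
    me_maxreaders: c_uint,
    me_numreaders: c_uint,
}

/// Safe, read-only view over the raw struct (hypothetical name).
pub struct Info(RawEnvInfo);

impl Info {
    /// Size of the memory map, in bytes.
    pub fn map_size(&self) -> usize {
        self.0.me_mapsize
    }

    /// ID of the last used page.
    pub fn last_pgno(&self) -> usize {
        self.0.me_last_pgno
    }

    /// ID of the last committed transaction.
    pub fn last_txnid(&self) -> usize {
        self.0.me_last_txnid
    }

    /// Maximum number of reader slots.
    pub fn max_readers(&self) -> u32 {
        self.0.me_maxreaders as u32
    }

    /// Number of reader slots currently in use.
    pub fn num_readers(&self) -> u32 {
        self.0.me_numreaders as u32
    }
}
```

The raw map address is deliberately not exposed, since handing out that pointer would defeat the purpose of a safe wrapper.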

Compilation fails on armv7

Hi!

Compiling this crate using stable on an armv7 (stable-armv7-unknown-linux-gnueabihf) fails with the following error:

error[E0308]: mismatched types
   --> /root/.cargo/git/checkouts/lmdb-rs-82a4129c4785dceb/37b241c/src/error.rs:119:53
    |
119 |             str::from_utf8_unchecked(CStr::from_ptr(err).to_bytes())
    |                                                     ^^^ expected u8, found i8
    |
    = note: expected type `*const u8`
               found type `*const i8`
    = help: here are some functions which might fulfill your needs:
            - .offset(...)
            - .wrapping_offset(...)
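The mismatch is consistent with the C char type being signed on x86 but unsigned on ARM Linux, so a pointer typed as *const i8 no longer matches what CStr::from_ptr expects there. A portable way to write such code is to use std::os::raw::c_char instead of a concrete integer type. The helper below is a hypothetical sketch of that pattern, not the crate's actual fix.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Copies a NUL-terminated C string into an owned `String`.
///
/// `c_char` resolves to `i8` or `u8` depending on the target, so this
/// signature compiles on both x86_64 and armv7.
///
/// # Safety
/// `ptr` must be non-null and point to a valid NUL-terminated string.
unsafe fn c_str_to_string(ptr: *const c_char) -> String {
    // Invalid UTF-8 is replaced rather than assumed away.
    unsafe { CStr::from_ptr(ptr).to_string_lossy().into_owned() }
}
```

If a foreign function is declared as returning *const i8, changing that declaration to *const c_char (or casting at the call site with `as *const c_char`) removes the platform dependence.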