Using Rust to Make a Safer Interface for Yahoo’s Fast MDBM Database
I’m really supposed to be working on my
serialization series,
but I just saw a neat new library that was open sourced by Yahoo a couple days
ago called MDBM on
Hacker News. I know nothing
about the library, but there are some really neat claims:
It’s supposed to be fast, and that’s always nice.
It’s supposed to have a really slim interface.
It’s so fast because it’s passing around naked pointers to mmapped files,
which is terribly unsafe. Unless you’ve got Rust, which can prove that those
pointers won’t escape :)
So I wanted to see how easy it’d be to make a Rust binding for the project.
If you want to follow along, first make sure you have
Rust installed. Unfortunately it
looks like MDBM only supports Linux and FreeBSD, so I had to build out a Fedora
VM to test this out on. I think this is all you need to build it:
% git clone https://github.com/yahoo/mdbm
% cd mdbm/redhat
% make
% rpm -Uvh ~/rpmbuild/RPMS/x86_64/mdbm-4.11.1-1.fc21.x86_64.rpm
% rpm -Uvh ~/rpmbuild/RPMS/x86_64/mdbm-devel-4.11.1-1.fc21.x86_64.rpm
MDBM doesn’t build on my Mac, but it turns out there’s
plenty I can do to prep while VirtualBox and Fedora 21 download. Let’s start out
by creating our project with cargo:
% cargo new rust-mdbm
% cd rust-mdbm
(Right now there’s no way to have the name be different than the path, so edit
Cargo.toml to rename the project to mdbm. I filed
#1030 to get that
implemented).
By convention, we put bindgen packages into $name-sys, so make that crate as
well:
% cargo new --no-git mdbm-sys
% cd mdbm-sys
We’ve got a really cool tool called
bindgen, which uses clang to parse
header files and convert them into an unsafe Rust interface. So let’s check out
MDBM, and generate a crate to wrap it up in.
% cd ../..
% git clone git@github.com:crabtw/rust-bindgen.git
% cd rust-bindgen
% cargo build
% cd ..
% git clone git@github.com:yahoo/mdbm.git
% cd rust-mdbm/mdbm-sys
% DYLD_LIBRARY_PATH=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib \
~/rust/rust-bindgen/target/bindgen \
-lmdbm \
-o src/lib.rs \
mdbm/include/mdbm.h
Pretty magical. Make sure it builds:
% cargo build
/Users/erickt/rust-mdbm/mdbm-sys/src/lib.rs:3:21: 3:25 error: failed to resolve. Maybe a missing `extern crate libc`?
/Users/erickt/rust-mdbm/mdbm-sys/src/lib.rs:3 pub type __int8_t = ::libc::c_char;
^~~~
/Users/erickt/rust-mdbm/mdbm-sys/src/lib.rs:3:21: 3:35 error: use of undeclared type name `libc::c_char`
/Users/erickt/rust-mdbm/mdbm-sys/src/lib.rs:3 pub type __int8_t = ::libc::c_char;
^~~~~~~~~~~~~~
/Users/erickt/rust-mdbm/mdbm-sys/src/lib.rs:4:22: 4:26 error: failed to resolve. Maybe a missing `extern crate libc`?
/Users/erickt/rust-mdbm/mdbm-sys/src/lib.rs:4 pub type __uint8_t = ::libc::c_uchar;
^~~~
...
Nope! The problem is that we don’t have the libc crate imported. We don’t
have a convention yet for this, but what I like to do is:
This lets me run bindgen later on without mucking up the library. This now
compiles. Next up is our high level interface. Add mdbm-sys to our high level
interface by adding this to the rust-mdbm/Cargo.toml file:
[dependencies.mdbm-sys]
path = "mdbm-sys"
By now I’ve got my VirtualBox setup working, so now to the actual code! Let’s start
with a barebones wrapper around the database:
pub struct MDBM {
    db: *mut mdbm_sys::MDBM,
}
Next is the constructor and destructor. I’m hardcoding things for now and using
IoError, since MDBM appears to report all of its errors through errno:
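For a rough idea of the shape, here is a minimal sketch in current Rust rather than the 2014-era syntax used elsewhere in this post. It is not the rust-mdbm code: `RawDb`, `fake_open`, and `fake_close` are hypothetical stand-ins for the C handle and its open/close calls so that the example compiles without the C library, and `Db` is an illustrative name. The pattern is the point: check for a null handle in the constructor, and release the handle in `Drop`.

```rust
use std::io;

/// Hypothetical stand-in for the opaque C handle.
pub struct RawDb {
    path: String,
}

/// Stand-in for a C open call: returns null on failure.
/// An empty path simulates a failed open.
fn fake_open(path: &str) -> *mut RawDb {
    if path.is_empty() {
        std::ptr::null_mut()
    } else {
        Box::into_raw(Box::new(RawDb { path: path.to_owned() }))
    }
}

/// Stand-in for a C close call: frees the handle.
///
/// Safety: `db` must come from `fake_open` and must not be used afterwards.
unsafe fn fake_close(db: *mut RawDb) {
    unsafe { drop(Box::from_raw(db)) }
}

/// Safe owner of the raw handle.
pub struct Db {
    raw: *mut RawDb,
}

impl Db {
    pub fn open(path: &str) -> io::Result<Db> {
        let raw = fake_open(path);
        if raw.is_null() {
            // With a real C library, `io::Error::last_os_error()` would
            // read errno here; the stand-in has no errno to report.
            Err(io::Error::new(io::ErrorKind::Other, "open failed"))
        } else {
            Ok(Db { raw })
        }
    }

    pub fn path(&self) -> &str {
        // Sound because `raw` is non-null and owned by `self` until drop.
        let db: &RawDb = unsafe { &*self.raw };
        &db.path
    }
}

impl Drop for Db {
    fn drop(&mut self) {
        // Runs once, when the wrapper goes out of scope.
        unsafe { fake_close(self.raw) }
    }
}
```

Calling `Db::open("test.db")` yields a handle that closes itself at the end of its scope, while `Db::open("")` returns an `Err` instead of handing back a null pointer.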
Pretty straightforward translation of the examples with some hardcoded values
to start out. Next up is a wrapper around MDBM’s datum type, which is the
type used for both keys and values. datum is just a simple struct containing
a pointer and length, pretty much analogous to our &[u8] slices. However our
slices are much more powerful because our type system can guarantee that in
safe Rust, these slices can never outlive where they are derived from:
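As an illustration only, here is one way such a wrapper could look in current Rust. The names `Datum`, `AsDatum`, and `to_raw_datum` come from the code later in this post, but the bodies below are assumptions, and `RawDatum` is a hypothetical stand-in for the bindgen-generated `datum` struct (a `dptr` pointer plus a `dsize` length).

```rust
use std::os::raw::{c_char, c_int};

/// Hypothetical stand-in for the generated C `datum` struct.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawDatum {
    pub dptr: *mut c_char,
    pub dsize: c_int,
}

/// Borrowed view of some bytes. The lifetime `'a` stops it from
/// outliving the data it points at.
#[derive(Clone, Copy)]
pub struct Datum<'a> {
    bytes: &'a [u8],
}

impl<'a> Datum<'a> {
    pub fn new(bytes: &'a [u8]) -> Datum<'a> {
        Datum { bytes }
    }

    pub fn as_bytes(&self) -> &'a [u8] {
        self.bytes
    }
}

/// Anything that can lend its bytes out as a `Datum`.
pub trait AsDatum {
    fn as_datum(&self) -> Datum<'_>;
}

impl AsDatum for [u8] {
    fn as_datum(&self) -> Datum<'_> {
        Datum::new(self)
    }
}

impl AsDatum for str {
    fn as_datum(&self) -> Datum<'_> {
        Datum::new(self.as_bytes())
    }
}

// Lets callers pass `&"foo"` (a reference to a `&str`) as well.
impl<'a, T: AsDatum + ?Sized> AsDatum for &'a T {
    fn as_datum(&self) -> Datum<'_> {
        (**self).as_datum()
    }
}

/// Convert to the raw pointer-and-length pair the C side expects.
/// The result carries no lifetime, so it must not outlive `datum`.
pub fn to_raw_datum(datum: &Datum<'_>) -> RawDatum {
    RawDatum {
        dptr: datum.bytes.as_ptr() as *mut c_char,
        dsize: datum.bytes.len() as c_int,
    }
}
```

The safe side only ever sees `Datum<'a>`; the lifetime is dropped at the last moment, right where the value crosses into C.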
And finally, we’ve got setting and getting a key-value pair. Setting is pretty
straightforward. The only fun thing is using the AsDatum constraints so we
can do db.set(&"foo", &"bar", 0) instead of
db.set(Datum::new(&"foo".as_slice()), Datum::new("bar".as_slice()), 0).
Since we’re copying into the database, we don’t have to worry about lifetimes yet:
impl MDBM {
    ...

    /// Set a key.
    pub fn set<K, V>(&self, key: &K, value: &V, flags: int) -> Result<(), IoError>
        where K: AsDatum,
              V: AsDatum,
    {
        unsafe {
            let rc = mdbm_sys::mdbm_store(
                self.db,
                to_raw_datum(&key.as_datum()),
                to_raw_datum(&value.as_datum()),
                flags as libc::c_int);

            if rc == -1 {
                Err(IoError::last_error())
            } else {
                Ok(())
            }
        }
    }

    ...
MDBM requires the database to be locked in order to get the keys. This is
where things get fun, since we need to prevent those interior pointers from escaping.
We’ll create another wrapper type that manages the lock, and uses RAII to
unlock when we’re done. We tie the lifetime of the Lock to the lifetime of
the database and key, which prevents it from outliving either object:
impl MDBM {
    ...

    /// Lock a key.
    pub fn lock<'a, K>(&'a self, key: &'a K, flags: int) -> Result<Lock<'a>, IoError>
        where K: AsDatum,
    {
        let rc = unsafe {
            mdbm_sys::mdbm_lock_smart(
                self.db,
                &to_raw_datum(&key.as_datum()),
                flags as libc::c_int)
        };

        if rc == 1 {
            Ok(Lock { db: self, key: key.as_datum() })
        } else {
            Err(IoError::last_error())
        }
    }

    ...
}

pub struct Lock<'a> {
    db: &'a MDBM,
    key: Datum<'a>,
}

#[unsafe_destructor]
impl<'a> Drop for Lock<'a> {
    fn drop(&mut self) {
        unsafe {
            let rc = mdbm_sys::mdbm_unlock_smart(
                self.db.db,
                &to_raw_datum(&self.key),
                0);

            assert_eq!(rc, 1);
        }
    }
}
(Note that I’ve heard #[unsafe_destructor] as used here may become unnecessary
in 1.0).
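To see the RAII part in isolation, here is a toy version of the same pattern in current Rust, with no C library involved. Everything in it is hypothetical: `Db` just counts how many locks are outstanding, which stands in for the real lock and unlock calls.

```rust
use std::cell::Cell;

/// Toy stand-in for the database: it only counts how many locks are held.
pub struct Db {
    held: Cell<u32>,
    value: Vec<u8>,
}

/// Guard that borrows the database for as long as the lock is held.
pub struct Lock<'a> {
    db: &'a Db,
}

impl Db {
    pub fn new(value: &[u8]) -> Db {
        Db { held: Cell::new(0), value: value.to_vec() }
    }

    /// Take the lock. The guard cannot outlive `self`.
    pub fn lock(&self) -> Lock<'_> {
        self.held.set(self.held.get() + 1);
        Lock { db: self }
    }

    pub fn locks_held(&self) -> u32 {
        self.held.get()
    }
}

impl<'a> Lock<'a> {
    /// The returned slice borrows from the guard, so it cannot outlive it.
    pub fn get(&self) -> &[u8] {
        &self.db.value
    }
}

impl<'a> Drop for Lock<'a> {
    fn drop(&mut self) {
        // Release the lock when the guard goes out of scope.
        self.db.held.set(self.db.held.get() - 1);
    }
}
```

With this shape, something like `let v = { let l = db.lock(); l.get() };` is rejected by the borrow checker, because the slice would outlive the guard it borrows from.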
Finally, let’s get our value! Assuming the value exists, we tie the lifetime of
the Lock to the lifetime of the returned &[u8]:
impl<'a> Lock<'a> {
    /// Fetch a key.
    pub fn get<'a>(&'a self) -> Option<&'a [u8]> {
        unsafe {
            let value = mdbm_sys::mdbm_fetch(
                self.db.db,
                to_raw_datum(&self.key));

            if value.dptr.is_null() {
                None
            } else {
                // we want to constrain the ptr to our lifetime.
                let ptr: &*const u8 = mem::transmute(&value.dptr);
                Some(slice::from_raw_buf(ptr, value.dsize as uint))
            }
        }
    }
}
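For anyone reading this with a newer compiler: `slice::from_raw_buf` no longer exists, and `std::slice::from_raw_parts` is the closest equivalent. It returns a slice with an unconstrained lifetime, so the function signature has to do the constraining. Here is a small sketch of that idea, assuming a hypothetical `RawDatum` stand-in for the C struct and a made-up helper name, `datum_as_slice`:

```rust
use std::os::raw::{c_char, c_int};
use std::slice;

/// Hypothetical stand-in for the generated C `datum` struct.
#[repr(C)]
pub struct RawDatum {
    pub dptr: *mut c_char,
    pub dsize: c_int,
}

/// Turn a raw datum into a slice whose lifetime is tied to `_owner`
/// (for example, a lock guard). A null pointer means "no value".
///
/// Safety: if `raw.dptr` is non-null, it must point at `raw.dsize`
/// readable bytes that stay valid and unmodified for as long as
/// `_owner` is borrowed.
pub unsafe fn datum_as_slice<'a, T: ?Sized>(
    _owner: &'a T,
    raw: &RawDatum,
) -> Option<&'a [u8]> {
    if raw.dptr.is_null() {
        None
    } else {
        // The `'a` in the return type is what pins the slice to the owner.
        Some(unsafe {
            slice::from_raw_parts(raw.dptr as *const u8, raw.dsize as usize)
        })
    }
}
```

The owner argument is never read; it exists only so the compiler has a borrow to attach the returned slice to, which plays the same role as the transmute in the listing above.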
Now to verify it works:
#[test]
fn test() {
    let db = MDBM::new(
        &Path::new("test.db"),
        super::MDBM_O_RDWR | super::MDBM_O_CREAT,
        0o644,
        0,
        0
    ).unwrap();

    db.set(&"hello", &"world", 0).unwrap();

    {
        // key needs to be an lvalue so the lock can hold a reference to
        // it.
        let key = "hello";

        // Lock the key. RAII will unlock it when we exit this scope.
        let value = db.lock(&key, 0).unwrap();

        // Convert the value into a string. The lock is still live at this
        // point.
        let value = str::from_utf8(value.get().unwrap()).unwrap();
        assert_eq!(value, "world");
        println!("hello: {}", value);
    }
}
Success! Not too bad for 2 hours of work. Barring bugs, this mdbm
library should perform at roughly the same speed as the C library, but
eliminate many very painful bug opportunities that require tools like Valgrind
to debug.