serde_codegen
and serde_macros
while projects
switch over to 0.7, I’m going to shift more to a pull-based approach, so please
file a bug ticket if a nightly release has broken you.
On to the list of the major changes!

- Removed `Error` from the `serde::de::Error` variants.
- Renamed the `visit_*` methods to `serialize_*` and `deserialize_*`.
- Made `serde::de::Error` implement `std::error::Error`.
- Added a `serde::de::Deserializer::deserialize_struct_field` hook that allows a `Deserializer` to know the next value is a struct field.
- Added `serde::ser::Error`, which allows a `Serialize` type to produce an error if it cannot be serialized.
- Serializing a `std::path::Path` with non-unicode characters now results in a Serde error, rather than a panic.
- Added support for the `std::net` types.
- Added a `serde::de::Error::unknown_variant` error message hook.
- Renamed `serde::de::Error::syntax` to `serde::de::Error::custom`.
- Added `#[serde(deny_unknown_fields)]`, which makes deserialization report an error for fields the type does not declare.
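For example (a sketch; the `Point` struct is hypothetical, and the serde 0.7-era derive setup is assumed):

```rust
#[derive(Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
struct Point {
    x: i32,
    y: i32,
}
// Deserializing `{"x": 1, "y": 2, "z": 3}` now fails with an
// unknown-field error instead of silently ignoring `z`.
```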
- Added `#[serde(rename="...")]` to rename the container.
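A sketch of the container rename (hypothetical struct):

```rust
// Serializes as "point" rather than "Point".
#[derive(Serialize, Deserialize)]
#[serde(rename = "point")]
struct Point {
    x: i32,
    y: i32,
}
```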
- Added `#[serde(rename="...")]` to rename variants.
- Added `#[serde(rename(serialize="...", deserialize="..."))]`, which supports crazy schemas like AWS that expect serialized fields with the first character lower case, and the deserialized response with the first character upper cased.
- Added `#[serde(default="$path")]`, where `$path` is a path reference to some function that returns a default value for a field if it’s not present when deserializing.
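A sketch of the default hook (`Config` and `default_port` are hypothetical names):

```rust
// If `port` is missing from the input, serde calls `default_port()`
// to fill it in during deserialization.
fn default_port() -> u16 {
    8080
}

#[derive(Deserialize)]
struct Config {
    host: String,
    #[serde(default = "default_port")]
    port: u16,
}
```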
- Added `#[serde(skip_serializing_if="$path")]`, where `$path` is a path reference to some function that returns a `bool`; if it returns true, serializing the field is skipped.
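A sketch of the skip hook (`Record` is hypothetical; `Option::is_none` is the usual predicate):

```rust
// `comment` is omitted from the output whenever the named
// predicate returns true, i.e. whenever it is `None`.
#[derive(Serialize)]
struct Record {
    id: u32,
    #[serde(skip_serializing_if = "Option::is_none")]
    comment: Option<String>,
}
```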
- Added `#[serde(serialize_with="$path")]` and `#[serde(deserialize_with="$path")]`, where `$path` is a path reference to some function that serializes or deserializes a field.
- Added `StreamDeserializer`, which enables parsing a stream of JSON values, optionally separated by whitespace, into an iterator of those deserialized values.

I’d like to thank everyone that’s helped out over the past few months. Please forgive me if I accidentally left you off the list:
Stateful
has some challenges it needs to overcome in order to add new and
exciting control flow mechanisms to Rust. While we don’t get access to any of
the cool analysis passes inside the Rust compiler, Stateful
is able to sneak
around their necessity in many cases since it really only needs to support a
subset of Rust. Here are some of the techniques it exploits, err, uses.
First off, let’s talk about variables. One of the primary things Stateful
needs to do is manage the process of state flowing through the machine. However,
consider a statement as simple as `let x = ...`.
“Obviously it’s a variable, right?” Actually you can’t be sure. Someone may have defined, say, an enum variant with that very name, in which case the compiler would helpfully report a warning.
But that warning only works for simple let
statements. Consider what happens
with matches. Consider:
Is x
or y
a variable, or a variant? There’s no way to know unless you
perform name resolution, otherwise known as the resolve pass in the compiler.
Unfortunately though, there’s no way for Stateful
to run that analysis. As
Sméagol said, “There is another way. More secret, and a dark way.” This leads
us to Cheat Number One: Stateful assumes that all lowercase identifiers are
variables, and uppercase ones are enum variants. Sure, Rust supports
lowercase variants, but there’s no reason why Stateful
has to use them. It
makes our lives much easier.
The next problem is typing. Sure, Rust is nice and all, in that you can write a
local variable like let x = ...
and it’ll infer the type for you. All Rust
asks for is that the user explicitly specify the type of a value that enters or
leaves the bounds of a function. Our problem is that one of the main tasks of
Stateful
is to lift variables into some State
structure so that they’re
available when the function is re-entered. So in effect, all variables inside
Stateful
must be typed. Consider the example from last week:
This State
enumeration is what I’m talking about. It gets passed into and
out of the advance
function. It needs to be some concrete type, which looks
something like this:
The problem is that we want to write code like this:
So how can we resolve this? Well first, we could wait for RFC 105 or RFC 1305 to get implemented, but that’s not happening any time soon. Until then, there is Cheat Number Two: Hide state variables in a boxed trait. This one is from Eduard Burtescu. Instead of the nice well-typed example from the last post, we actually generate some code that hides the types with an overabundance of generic types:
All for the cost of a boxed variable. It’s not ideal, but it does let us keep experimenting. However, if we do want to avoid this allocation, we can just require that all variables that survive across a yield point have their type specified. So our previous example would be written as:
It’s not so bad here, but it’d get obnoxious if we had a generator like:
The type of iter
, by the way, is impossible to write because there is
currently no way to specify the type of the closure. Instead, it needs to be
rewritten to use a free function:
If we want to support closures though, we need to use the `Box<Iterator<...>>`
trick.
This one’s a doozy. Here’s an example of the problem. Consider:
This would look something like this (which also demonstrates how match
statements are represented):
Zero in on this block:
The type of opt
is Option<&'a mut usize>
, and value
is &'a mut usize
.
So we’ve got two outstanding mutable borrows, which is illegal. The real
problem is that without the Resolve and Borrow Checker passes, Stateful
cannot know if a use of a variable is a copy or a move in all cases. So we now
have Cheat Number Three: Use pseudo-macros to hint to Stateful whether a type is
copyable or movable. This is the same technique we use to implement the
pseudo-macro yield_!(...)
, where we would add move_!(...)
and copy_!(...)
to inform Stateful
when something has been, well, moved or copied. Our
previous example would then be written as:
Which would then give Stateful
enough information to generate something like
this, which would then know that the match consumed the option:
I’m also considering some default rules, which could be overridden with these macros:

- If a value is a reference (some `&T` type), then it’s always copied. All other types are assumed to not be copyable.
- A copy can be forced with a `copy_!(...)` hint.
- A move can be forced with a `move_!(...)` hint.
- A value is treated as moved when it is consumed by a `match` statement, unless one of the `match` arms uses `ref` or `ref mut`.

Hopefully this will enable a vast majority of code to work without `copy_!(...)` or `move_!(...)`.
Those are our major cheats! I’m sure there will be plenty more in the future. In the meantime, I want to show off some actual working code! Check this puppy out!
Produces:
Isn’t it beautiful? We got generics, mutable variables, loops, matches, breaks, and a whole host of ignored warnings!
Hello internet! It’s been too long. Not only are the Rust Meetups back up and running, it’s also time for me to get back to blogging. For the past couple months, I’ve been working on a new syntax extension that will allow people to create fun and exciting new control flow mechanisms in stable Rust. “For the love of all that is sigils, why?!” Well, because I can. Sometimes when you stare into the madness, it stares back into you? Or something like that?
It’s called Stateful, which helpfully has no documentation. Such an innocent name, right? It’s very much an in-progress (and mostly broken) implementation of some of the ideas in this and future posts. So don’t go and think these code snippets are executable just yet :)
Anyway, let’s show off Stateful
by showing how we can implement
Generators.
We’ve got an RFC ticket to
implement them, but wouldn’t it be nice to have them sooner? For those of you
unfamiliar with the concept, Generators are functions that can be returned from
multiple times, all while preserving state between those calls. Basically,
they’re just a simpler way to write
Iterators.
Say we wanted to iterate over the numbers 0, 1, and 2. Today, we would write
an Iterator
with something like this:
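A minimal sketch of such a hand-written iterator (the `Iter3` name is borrowed from later in the post; the original listing was longer):

```rust
// The struct field carries the state that survives between `next()` calls.
struct Iter3 {
    count: usize,
}

impl Iterator for Iter3 {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.count < 3 {
            let value = self.count;
            self.count += 1;
            Some(value)
        } else {
            None
        }
    }
}

fn main() {
    let iter = Iter3 { count: 0 };
    assert_eq!(iter.collect::<Vec<_>>(), vec![0, 1, 2]);
}
```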
The struct preserves our state across these function calls. It’s a pretty
straightforward implementation, but it does have some amount of boilerplate
code. For large iterator implementations, this state management can get quite
complicated. Instead, let’s see how this same code could be expressed with
something like Stateful
:
Where `yield_!(i)` is some magical control flow mechanism that not only
returns the value `Some(i)`, but also makes sure the next `iter.next()` call
jumps execution to just after the yield. At the end of the generator, we’d
just return `None`. We could simplify this even more by unrolling that loop
into:
The fun part is figuring out how to convert these generators into something
that’s roughly equivalent to Iter3
. At its heart, Iter3
really is a
simple state machine, where we save the counter state in the structure before
we “yield” the value to the caller. Let’s look at what we would generate for
gen3_unrolled
.
First, we need some boilerplate that sets up the state of our generator. We don’t yet have impl trait, so we hide all our stuff in a module:
We represent our generator’s state with an enum. We have our initial state, a state per yield, then an exit state:
Finally, we have our state machine, and a pretty trivial Iterator
implementation that manages entering and exiting the state machine:
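A minimal sketch of what such a generated machine might look like (names and the exact state layout are assumptions; the real generated code is more involved):

```rust
// One state per yield, plus an initial state and an exit state.
enum State {
    Start,
    Yielded(usize), // resume here, with the saved counter
    End,
}

struct Gen3 {
    state: State,
}

// Advance the machine: return the next state plus an optional yielded value.
fn advance(state: State) -> (State, Option<usize>) {
    match state {
        State::Start => (State::Yielded(0), Some(0)),
        State::Yielded(i) if i + 1 < 3 => (State::Yielded(i + 1), Some(i + 1)),
        State::Yielded(_) => (State::End, None),
        State::End => (State::End, None),
    }
}

impl Iterator for Gen3 {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // Move the current state into `advance`, then store the next state back.
        let state = std::mem::replace(&mut self.state, State::End);
        let (next, value) = advance(state);
        self.state = next;
        value
    }
}

fn main() {
    let g = Gen3 { state: State::Start };
    assert_eq!(g.collect::<Vec<_>>(), vec![0, 1, 2]);
}
```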
We move the current state
into advance
, then have this loop-match
state
machine. Then there are two new control flow constructs:
return_!($expr; $next_state)
and our old friend goto!($next_state)
.
return_!()
returns some value and also sets the position the generator should
resume at, and goto!()
just sets the next state without leaving the function.
Here’s one way they might be implemented:
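Here is a sketch of how these could look as ordinary `macro_rules!` macros (the real Stateful generates this via a syntax extension, so this is only an approximation; note that macro hygiene forces this sketch to pass the `state` binding in explicitly):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum State { Start, Yielded0, End }

// Yield a value and record the state to resume in.
macro_rules! return_ {
    ($expr:expr; $next:expr) => { return ($next, Some($expr)) };
}

// Move to the next state without leaving the function.
macro_rules! goto {
    ($state:ident = $next:expr) => {{ $state = $next; continue; }};
}

fn advance(mut state: State) -> (State, Option<usize>) {
    loop {
        match state {
            State::Start => goto!(state = State::Yielded0),
            State::Yielded0 => return_!(0; State::End),
            State::End => return (State::End, None),
        }
    }
}

fn main() {
    let (state, value) = advance(State::Start);
    assert_eq!(value, Some(0));
    let (_, value) = advance(state);
    assert_eq!(value, None);
}
```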
Relatively straightforward transformation, right? But that’s an easy case.
Things start to get a wee bit more complicated when we start thinking about how
we’d transform gen3
, because it’s got both a while
loop and a mutable
variable. Let’s see that in action. I’ll leave out the boilerplate code and
just focus on the advance
function:
Now things are getting interesting! There are two critical things we can see
off the bat. First, we need to reify the loops and conditionals into the state
machine, because they affect the control flow. Second, we need to lift any
variables that are accessed across states into the State
enum.
We can also start seeing the complications. The obvious one is mutable
variables. We need to somehow thread the information about i
’s mutability
through each of the states. This naive implementation would trip over the
#[warn(unused_mut)]
lint. And now you might start to get a sense of the
horror that lies beneath Stateful
.
At this point, you might be thinking to yourself, “Self, if mutable variables are going to be complicated, what about copies and moves?” You sound like a pretty sensible person. Therein lies madness. You might want to stop thinking too deeply on it. If you can’t, maybe you think “Wait. What about Generics?” Yep. “Borrows?!” Now I’m getting a little worried. “How do you even know what’s a variable!?!” Sorry.
Yeah so there are one or two things that might be a tad challenging.
So that’s Stateful
. It’s an experiment to get some real world experience
with these control flow mechanisms that may someday feed into RFCs, and maybe,
just maybe, might get implemented in the compiler. There’s no reason we need
to support everything, which would require us to basically reimplement the
compiler. Instead, I believe there’s a subset of Rust that we can support in
order to start getting real experience now.
Generators are really just the start. There’s a whole host of other things
that, if you just squint at ’em, are really
just state machines in disguise. It’s quite possible that if we can pull
Stateful
off, we’ll also be able to implement things like
Coroutines,
Continuations, and
that hot new mechanism all the cool languages are implementing these days,
Async/Await.
But that’s all for later. First is to get this to work. In closing, I leave you with these wise words.
ph’nglui mglw’nafh Cthulhu R’lyeh wgah’nagl fhtagn.
Here’s the bug if you missed my callout:
`pub fn as_poll_item<'a>(&self, events: i16) -> PollItem<'a>`
My intention was to tie the lifetime of PollItem<'a>
to the lifetime of the
Socket
, but because I left out one measly 'a
, Rust doesn’t tie the two
together, and instead is actually using the 'static
lifetime. This then lets
you do something evil like:
It’s just that easy. The fix is simple: just change the function to use `&'a self`,
and Rust will refuse to compile this snippet. Job well done!
Well, no, not really. Because what was particularly devious about this bug is
that it actually came back. Later on I accidentally reverted &'a self
back to
&self
because I secretly hate myself. The project and examples still compiled
and ran, but that uninitialized dereference was just waiting around to cause a
security vulnerability.
Oops.
Crap.
Making sure Rust actually rejects programs that it ought to be rejecting is fundamentally important when writing a library that uses Unsafe Rust.
That’s where compiletest comes in.
It’s a testing framework that’s been extracted from
rust-lang/rust
that lets you write these “shouldn’t-compile” tests. Here’s how to use it.
First add this to your Cargo.toml
. We do a little feature dance because
currently compiletest
only runs on nightly:
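A sketch of that feature dance (the version number and feature name are placeholders; Cargo does not allow optional dev-dependencies, hence the regular dependency):

```toml
# compiletest is optional and only enabled under a nightly-only
# feature, so stable builds skip it entirely.
[dependencies]
compiletest_rs = { version = "*", optional = true }

[features]
unstable = ["compiletest_rs"]
```

On nightly you would then run the tests with `cargo test --features unstable`.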
Then, add a test driver tests/compile-tests.rs (or whatever you want to
(or whatever you want to
name it) that runs the compiletest tests:
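A sketch of such a driver (API names assumed from the `compiletest_rs` crate of that era, and the `unstable` feature name is a placeholder):

```rust
#![cfg(feature = "unstable")]
extern crate compiletest_rs as compiletest;

use std::path::PathBuf;

fn run_mode(mode: &'static str) {
    let mut config = compiletest::Config::default();
    config.mode = mode.parse().expect("invalid mode");
    // Tests live under tests/<mode>/, e.g. tests/compile-fail/.
    config.src_base = PathBuf::from(format!("tests/{}", mode));
    compiletest::run_tests(&config);
}

#[test]
fn compile_test() {
    run_mode("compile-fail");
}
```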
Finally, add the test! Here’s the one I wrote, tests/compile-fail/no-leaking-poll-items.rs
:
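A sketch of what such a compile-fail test looks like (the exact `rust-zmq` calls and error message are assumptions); the `//~ ERROR` comment tells compiletest which diagnostic rustc must emit:

```rust
extern crate zmq;

fn main() {
    let mut context = zmq::Context::new();

    // Try to smuggle a PollItem out of the scope that owns its Socket.
    let _poll_item = {
        let socket = context.socket(zmq::REP).unwrap();
        socket.as_poll_item(0)
        //~^ ERROR `socket` does not live long enough
    };
}
```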
Now you can live in peace with the confidence that this bug won’t ever appear again:
In summary, use compiletest
, and demand its use from the Unsafe Rust
libraries you use! Otherwise you can never be sure if unsafe and undefined
behavior like this will sneak into your project.
TLDR:
a `Serialize`
type can now hint to the `Serializer`
how to parse the next bytes.
This will enable Servo to use
bincode for its IPC protocol.
Here are the major changes:
- `serde::json` was factored out into its own separate crate, `serde_json` #114.
- Renamed `visit_named_{map,seq}` to `visit_struct` and `visit_tuple_struct` #114 #120.
- Removed `_error` from `de::Error` #129.
- Updated `serde_macros` to generate fully generic code #117.

Thank you to everyone that’s helped with this release:
It’s been a bit since we last did some benchmarks, so here are the latest numbers with these compilers:
bincode’s serde support makes its first appearance, which starts out roughly 1/3 slower at serialization, but about the same speed at deserialization. I haven’t done much optimization, so there’s probably a lot of low hanging fruit.
serde_json saw a good amount of improvement, mainly from some compiler optimizations in the 1.4 nightly. The deserializer is slightly slower due to the parser rewrite.
capnproto-rust’s unpacked format shows a surprisingly large serialization improvement, jumping from 4GB/s to 15GB/s. Good job dwrensha! Deserialization is half as fast as before though. Perhaps I have a bug in my code?
I’ve changed the Rust MessagePack implementation to rmp, which has a wee bit faster serializer, but deserialization was about the same.
I’ve also updated the numbers for Go and C++, but those numbers stayed roughly the same.
Serialization:
language | library | format | serialization (MB/s) |
---|---|---|---|
Rust | capnproto-rust | Cap’n Proto (unpacked) | |
Go | go-capnproto | Cap’n Proto | 3877 |
Rust | bincode | Raw | |
Rust | bincode (serde) | Raw | 2143 |
Rust | capnproto-rust | Cap’n Proto (packed) | |
Go | gogoprotobuf | Protocol Buffers | |
Rust | rmp | MessagePack | |
Rust | rust-protobuf | Protocol Buffers | |
Rust | serde::json | JSON | |
C++ | rapidjson | JSON | 307 |
Go | goprotobuf | Protocol Buffers | |
Rust | serialize::json | JSON | |
Go | ffjson | JSON | 147 |
Go | encoding/json | JSON | 85 |
Deserialization:
language | library | format | deserialization (MB/s) |
---|---|---|---|
Rust | capnproto-rust | Cap’n Proto (unpacked) | |
Go | go-capnproto | Cap’n Proto (zero copy) | 1407 |
Go | go-capnproto | Cap’n Proto | 711 |
Rust | capnproto-rust | Cap’n Proto (packed) | |
Rust | bincode (serde) | Raw | 310 |
Rust | bincode | Raw | |
Go | gogoprotobuf | Protocol Buffers | 270 |
C++ | rapidjson | JSON (sax) | 182 |
C++ | rapidjson | JSON (dom) | 155 |
Rust | rust-protobuf | Protocol Buffers | 143 |
Rust | rmp | MessagePack | |
Rust | serde::json | JSON | |
Go | ffjson | JSON | 95 |
Go | goprotobuf | Protocol Buffers | 81 |
Go | encoding/json | JSON | 23 |
Rust | serialize::json | JSON | 23 |
#[derive(Serialize, Deserialize)]
decorator annotations. Here’s how to use
it.
First, let’s start with a simple serde 0.3.x project that’s forced to use
nightly because it uses serde_macros
. The Cargo.toml
is:
And the actual library is src/lib.rs
:
In order to use Stable Rust, we can use the new serde_codegen
. Our strategy
is to split our input into two files. The first is the entry point Cargo will
use to compile the library, src/lib.rs
. The second is a template that
contains the macros, src/lib.rs.in
. It will be expanded into
$OUT_DIR/lib.rs
, which is included in src/lib.rs
. So src/lib.rs
looks
like:
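That entry point boils down to a single include of the expanded output (a sketch, matching the layout described above):

```rust
// src/lib.rs: pull in the code that the build script expanded
// from src/lib.rs.in into $OUT_DIR/lib.rs.
include!(concat!(env!("OUT_DIR"), "/lib.rs"));
```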
src/lib.rs.in
then just looks like:
In order to generate the $OUT_DIR/lib.rs
, we’ll use a Cargo build script.
We’ll configure Cargo.toml
with:
Finally, the build.rs
script itself uses syntex
to expand the syntax
extensions:
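A sketch of such a build script, assuming the `syntex`/`serde_codegen` API of that era:

```rust
extern crate serde_codegen;
extern crate syntex;

use std::env;
use std::path::Path;

fn main() {
    let out_dir = env::var_os("OUT_DIR").unwrap();

    let src = Path::new("src/lib.rs.in");
    let dst = Path::new(&out_dir).join("lib.rs");

    // Register serde's expanders with syntex and expand the template
    // into $OUT_DIR/lib.rs, which src/lib.rs then include!()s.
    let mut registry = syntex::Registry::new();
    serde_codegen::register(&mut registry);
    registry.expand("", &src, &dst).unwrap();
}
```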
While syntex
is quite powerful, there are a few major downsides. Rust does
not yet support the ability for a generated file to provide error location
information from a template file. This means that tracking down errors requires
manually looking at the generated code and trying to identify where the error
is in the template. However, there is a workaround. It’s actually not that
difficult to support syntex
and the Rust Nightly compiler plugins. To update
our example, we’ll change the Cargo.toml
to:
Then the build.rs
is changed to optionally expand the macros in our template:
Finally, src/lib.rs
is updated to:
Then most development can happen using Nightly Rust and
cargo build --no-default-features --features nightly
for better error
messages, but downstream consumers can use Stable Rust without worry.
Syntex can only expand macros inside macros it knows about, and it doesn’t know about the builtin macros. This is because a lot of the stable macros are using unstable features under the covers. So unfortunately if you’re using a library like the quasiquoting library quasi, you cannot write:
Instead you have to pull out the syntex macros into a separate variable:
Syntex can take a while to compile. It may be possible to optimize this, but
that may be difficult while keeping compatibility with libsyntax
.
That’s v0.4.0
. I hope you enjoy it! Please let me know if you run into any
problems.
Here are other things that came with this version:
- Added a `LineColIterator` that tracks line and column information for deserializers #58.
- Extended `de::PrimitiveVisitor` to also depend on `FromStr` #70.
- Added `json::Value::lookup`, which allows values to be extracted with `value.lookup("foo.bar.baz")` #76.

Bug Fixes:
A special thanks to everyone that helped with this release:
Here’s what’s also new in serde v0.3.1:

- Renamed `ValueDeserializer::deserializer` into `ValueDeserializer::into_deserializer`.
- Renamed `#[serde(alias="...")]` to `#[serde(rename="...")]`.
- Added implementations for `Box`, `Rc`, and `Arc`.
- Added a `VariantVisitor` to hint to the deserializer which variant kind it is expecting. This allows serializers to serialize a unit variant as a string.
- Added an `Error::unknown_field_error` error message.

Upstream of serde, I’ve also been doing some work on aster and quasi, which are my helper libraries to simplify writing syntax extensions.
aster v0.2.0:

- Added builders for `Vec`, `Box`, `Rc`, and `Arc`.
- Added support for `use` simple paths, globs, and lists.
- Added the `#[automatically_derived]` annotation.

quasi v0.1.9:

- Ported `quote_attr!()` and `quote_matchers!()` from `libsyntax`.

Thanks for everyone’s help with this release!
There’s been a ton of work since 0.2. Here are the highlights:
Ported over from std::old_io to std::io. There is a bit of a performance hit
when serializing to &mut [u8]
, although it’s really not that bad. In my goser
benchmarks, previously it ran in 373 MB/s, but now it’s running at 260 MB/s.
However, this hasn’t impacted the Vec<u8>
serialization performance, nor
deserialization performance.
Much better JSON deserialization errors. Now std::io::Error
is properly
propagated, and error locations are reported when a Deserialize
raises an error.
Merged serde::ser::Serializer
and serde::ser::Visitor
.
Renamed serde::ser::Serialize::visit
to serde::ser::Serialize::serialize
.
Replaced serde::ser::{Seq,Map}Visitor::size_hint
with a len()
method that
returns an optional length. This has a little stronger emphasis that we either
need an exactly length or no length. Formats that need an exact length should
make sure to verify the length passed in matches the actual amount of values
serialized.
serde::json
now deserializes missing values as a ()
.
Finished implementing #[derive(Serialize, Deserialize)]
for all struct and
enum forms.
Ported serde_macros
over to aster
and quasi, which simplies code
generation.
Removed the unnecessary first
argument from visit_{seq,map}_elt
.
Rewrote enum deserializations to not require allocations. Oddly enough this is a tad slower than the allocation form. I suspect it’s coming from the function calls not getting inlined away.
Allowed enum serialization and deserialization to support more than one variant.
Allowed Deserialize
types to hint that it’s expecting a sequence or a map.
Allowed maps to be deserialized from a ()
.
Added a serde::bytes::{Bytes,ByteBuf}
, which wrap &[u8]
/Vec<u8>
to allow
some formats to encode these values more efficiently than generic sequences.
Added serde::de::value
, which contains some helper deserializers to
deserialize from a Rust type.
Added impls for most collection types in the standard library.
Thanks everyone that’s helped out with this release!
Well, it’s been a long time coming, but serde2 is finally in a mostly usable
position! If you recall from
part 3,
one of the problems with serde1 is that we’re paying a lot for tagging our
types, and it’s really hurting us on the deserialization side of things. So
there’s one other pattern that we can use that allows for lookahead that
doesn’t need tags: visitors. A year or so ago I rewrote our generic hashing
framework to use the visitor pattern to great success. serde2
came out of
experiments to see if I could do the same thing here. It turned out that it was
a really elegant approach.
It all starts with a type that we want to serialize:
(Aside: while I’d rather use where
here for this type parameter, that would
force me to write `<V as Visitor>::Value`
due to
#20300).
This Visitor
trait then looks like:
So the implementation for a bool
then looks like:
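As a sketch, here is a stripped-down version of the pattern for `bool` (trait and method names assumed from the post's description; the real traits carry many more methods):

```rust
// The serializer provides a Visitor; the type drives it.
trait Visitor {
    type Value;
    fn visit_bool(&mut self, v: bool) -> Self::Value;
}

trait Serialize {
    fn visit<V: Visitor>(&self, visitor: &mut V) -> V::Value;
}

impl Serialize for bool {
    fn visit<V: Visitor>(&self, visitor: &mut V) -> V::Value {
        visitor.visit_bool(*self)
    }
}

// A toy serializer that renders values to a JSON-ish string.
struct ToJson;

impl Visitor for ToJson {
    type Value = String;
    fn visit_bool(&mut self, v: bool) -> String {
        if v { "true".into() } else { "false".into() }
    }
}

fn main() {
    assert_eq!(true.visit(&mut ToJson), "true");
    assert_eq!(false.visit(&mut ToJson), "false");
}
```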
Things get more interesting when we get to compound structures like a sequence.
Here’s Visitor
again. It needs to both be able to visit the overall structure
as well as each item:
We also have this SeqVisitor
trait that the type to serialize provides. It
really just looks like an Iterator
, but the type parameter has been moved to
the visit
method so that it can return different types:
Finally, to implement this for a type like &[T]
we create an
Iterator
-to-SeqVisitor
adaptor and pass it to the visitor, which then in
turn visits each item:
SeqIteratorVisitor
is publicly exposed, so it should be easy to use it with
custom data structures. Maps follow the same pattern (and also expose
MapIteratorVisitor
), but each item instead uses visit_map_elt(first,
key, value)
. Tuples, struct tuples, and tuple enum variants are all really
just named sequences. Likewise, structs and struct enum variants are just named
maps.
Because struct implementations are so common, here’s an example how to do it:
Fortunately serde2
also comes with a #[derive_serialize]
macro so you don’t
need to write this out by hand if you don’t want to.
Now to actually build a serializer. We start with a trait:
It’s the responsibility of the serializer to create a visitor and then pass it
to the type. Oftentimes the serializer also implements Visitor
, but it’s not
required. Here’s a snippet of the JSON serializer visitor:
Hopefully it is pretty straightforward.
Now serialization is the easy part. Deserialization is where it always gets more tricky. We follow a similar pattern as serialization: a deserializee creates a visitor which accepts any type (most methods resulting in an error), and passes it to a deserializer. This deserializer then extracts its next value from its stream and passes it to the visitor, which then produces the actual type.
It’s achingly close to the same pattern between a serializer and a serializee, but as hard as I tried, I couldn’t unify the two. The error semantics are different. In serialization, you want the serializer (which creates the visitor) to define the error. In deserialization, you want the deserializer which consumes the visitor to define the error.
Let’s start first with Error
. As opposed to serialization, when we’re
deserializing we can error both in the Deserializer
if there is a parse
error, or in the Deserialize
if it’s received an unexpected value. We do this
with an Error
trait, which allows a deserializee to generically create the
few errors it needs:
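A sketch of such an `Error` trait plus a format-specific implementation (the method names are assumptions based on the errors described above):

```rust
#[derive(Debug, PartialEq)]
enum MyError {
    Syntax,
    EndOfStream,
    MissingField(&'static str),
}

// The few errors a deserializee needs to be able to create generically,
// without knowing which concrete format it is being driven by.
trait Error {
    fn syntax_error() -> Self;
    fn end_of_stream_error() -> Self;
    fn missing_field_error(field: &'static str) -> Self;
}

impl Error for MyError {
    fn syntax_error() -> Self { MyError::Syntax }
    fn end_of_stream_error() -> Self { MyError::EndOfStream }
    fn missing_field_error(field: &'static str) -> Self {
        MyError::MissingField(field)
    }
}

fn main() {
    // A deserializee can now report errors for any format generically.
    let e: MyError = Error::missing_field_error("x");
    assert_eq!(e, MyError::MissingField("x"));
}
```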
Now the Deserialize
trait, which looks similar to Serialize
:
The Visitor
also looks like the serialization Visitor
, except that the
methods error by default.
Sequences and Maps are also a little different:
Here is an example struct deserializer. Structs are deserialized as a map, but since maps are unordered, we need a simple state machine to extract the values. In order to get the keys, we just create an enum for the fields, and a custom deserializer to convert a string into a field without an allocation:
It’s a little more complicated, but once again there is
#[derive_deserialize]
, which does all this work for you.
Deserializers then follow the same pattern as serializers. The one difference
is that we need to provide a special hook for Option<T>
types so formats like
JSON can treat null
types as options.
So how does it perform? Here’s the serialization benchmarks, with yet another ordering. This time sorted by the performance:
language | library | format | serialization (MB/s) |
---|---|---|---|
Rust | capnproto-rust | Cap’n Proto (unpacked) | 4226 |
Go | go-capnproto | Cap’n Proto | 3824.20 |
Rust | bincode | Binary | 1020 |
Rust | capnproto-rust | Cap’n Proto (packed) | 672 |
Go | gogoprotobuf | Protocol Buffers | 596.78 |
Rust | rust-msgpack | MessagePack | 397 |
Rust | serde2::json (&[u8]) | JSON | 373 |
Rust | rust-protobuf | Protocol Buffers | 357 |
C++ | rapidjson | JSON | 316 |
Rust | serde2::json (Custom) | JSON | 306 |
Rust | serde2::json (Vec) | JSON | 288 |
Rust | serde::json (Custom) | JSON | 244 |
Rust | serde::json (&[u8]) | JSON | 222 |
Go | goprotobuf | Protocol Buffers | 214.68 |
Rust | serde::json (Vec) | JSON | 149 |
Go | ffjson | JSON | 147.37 |
Rust | serialize::json | JSON | 183 |
Go | encoding/json | JSON | 80.49 |
I think it’s fair to say that on at least this benchmark we’ve hit our
performance numbers. Writing to a preallocated buffer with BufWriter
is 18%
faster than rapidjson (although to be
fair they are allocating). Our Vec<u8>
writer comes in 12% slower. What’s
interesting is this custom Writer. It turns out LLVM is still having trouble
lowering our generic Vec::push_all
into a memcpy
. This Writer variant
however is able to get us to rapidjson’s level:
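As a sketch of the idea, here is a tiny writer that appends with `extend_from_slice` (which reliably lowers to a `memcpy` on modern compilers); the actual Writer from the post differed in detail:

```rust
use std::io::{self, Write};

// Append to a Vec<u8> through a copy that the optimizer can see
// is a plain memcpy, instead of a generic element-by-element push.
struct FastWriter {
    buf: Vec<u8>,
}

impl Write for FastWriter {
    fn write(&mut self, bytes: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(bytes);
        Ok(bytes.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() {
    let mut w = FastWriter { buf: Vec::with_capacity(16) };
    w.write_all(b"[1,2,3]").unwrap();
    assert_eq!(w.buf, b"[1,2,3]");
}
```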
Deserialization we do much better than serde because we aren’t passing around all those tags, but we have a ways to catch up to rapidjson. Still, being just 37% slower than the fastest JSON deserializer makes me feel pretty proud.
language | library | format | deserialization (MB/s) |
---|---|---|---|
Rust | capnproto-rust | Cap’n Proto (unpacked) | 2123 |
Go | go-capnproto | Cap’n Proto (zero copy) | 1407.95 |
Go | go-capnproto | Cap’n Proto | 711.77 |
Rust | capnproto-rust | Cap’n Proto (packed) | 529 |
Go | gogoprotobuf | Protocol Buffers | 272.68 |
C++ | rapidjson | JSON (sax) | 189 |
C++ | rapidjson | JSON (dom) | 162 |
Rust | bincode | Binary | 142 |
Rust | rust-protobuf | Protocol Buffers | 141 |
Rust | rust-msgpack | MessagePack | 138 |
Rust | serde2::json | JSON | 122 |
Go | ffjson | JSON | 95.06 |
Go | goprotobuf | Protocol Buffers | 79.78 |
Rust | serde::json | JSON | 67 |
Rust | serialize::json | JSON | 25 |
Go | encoding/json | JSON | 22.79 |
What a long trip it’s been! I hope you enjoyed it. While there are still a few things left to port over from serde1 to serde2 (like the JSON pretty printer), and some things probably should be renamed, I’m happy with the design so I think it’s in a place where people can start using it now. Please let me know how it goes!
libsyntax
, which is not going to be exposed in
Rust 1.0 because we’re not ready to stabilize its API. That would really hamper
future development of the compiler.
It would be so nice though, writing this for every type we want to serialize:
1 2 3 4 5 |
|
instead of:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
|
So I want to announce my plan on how to deal with this (and also publicly
announce that everyone can blame me if this turns out to hurt Rust’s future
development). I’ve started syntex, a
library that enables code generation using an unofficial tracking fork of
libsyntax
. In order to deal with the fact that libsyntax
might make
breaking changes between minor Rust releases, we will just release a minor or
major release of syntex
, depending on if there were any breaking changes in
libsyntax
. Ideally syntex
will allow the community to experiment with
different code generation approaches to see what would be worth merging
upstream. Or even better, what hooks are needed in the compiler so it doesn’t
have to think about syntax extensions at all.
I’ve got the basic version working right now. Here’s a simple
hello_world_macros
syntax extension (you can see the actual code
here).
First, the hello_world_macros/Cargo.toml
:
1 2 3 4 5 6 7 8 |
|
The syntex_syntax
is the crate for my fork of libsyntax
, and syntex
provides some helper functions to ease registering syntax extensions.
Then the src/lib.rs
, which declares a macro hello_world
that just produces a
"hello world"
string:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
|
Now to use it. This is a little more complicated because we have to do code
generation, but Cargo helps with that. Our strategy is use a build.rs
script
to do code generation in a main.rss
file, and then use the include!()
macro
to include it into our dummy main.rs
file. Here’s the Cargo.toml
we need:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Here’s the build.rs
, which actually performs the code generation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
Our main.rs
driver script:
1 2 |
|
And finally the main.rss
:
1 2 3 4 |
|
One limitation you can see above is that we unfortunately can’t compose our
macros with the Rust macros: syntex
currently has no awareness of the
Rust macros, and since macros are parsed outside-in, we have to leave the
tokens inside a macro like println!()
untouched.
That’s syntex
. There is a bunch more work left to be done in syntex
to make it really usable. There’s also a lot of work in Rust and Cargo that
would help it be really effective:
C
.libsyntax
.#[path]
to reference an environment
variable. This would be a little cleaner than using include!(...)
. On Cargo’s side:
main.rs
/lib.rs
/etc. Cargo.toml
could grow a plugin mechanism to remove the need to write a
build.rs
script. I’m sure there’s plenty more that needs to get done! So, please help out!
edit: comments on reddit
Overall serde
’s approach for serialization works out pretty well. One thing I
forgot to include in the last post was that I also have two benchmarks that are
not using serde
, but are just safely reading and writing values. Assuming I
haven’t missed anything, they should be the upper limit in performance we can
get out of any serialization framework: Here’s
serialization:
language | library | serialization (MB/s) |
---|---|---|
rust | max without string escapes | 353 |
c++ | rapidjson | 304 |
rust | max with string escapes | 234 |
rust | serde::json | 201 |
rust | serialize::json | 147 |
go | ffjson | 147 |
So beyond optimizing string escaping, serde::json
is only 14% slower than the
zero-cost version and 34% slower than rapidjson
.
Deserialization, on the other hand, still has a ways to go:
language | library | deserialization (MB/s) |
---|---|---|
c++ | rapidjson (SAX) | 189 |
c++ | rapidjson (DOM) | 162 |
rust | max with Iterator<u8> | 152 |
go | ffjson | 95 |
rust | max with Reader | 78 |
rust | serde::json | 73 |
rust | serialize::json | 24 |
There are a couple interesting things here:
First, serde::json
is built upon consuming from an Iterator<u8>
, so we’re
48% slower than our theoretical max, and 58% slower than rapidjson
. It looks
like tagged tokens, while faster than the closures in libserialize
, are still
pretty expensive.
Second, ffjson
is beating us and they compile dramatically faster too. The
goser test suite takes about 0.54
seconds to compile, whereas mine takes about 30 seconds at --opt-level=3
(!!). Rust itself is only taking 1.5 seconds, the rest is spent in LLVM. With
no optimization, it compiles in “only” 5.6 seconds, but runs 96% slower.
Third, Reader
is a surprisingly expensive trait when dealing with a format
like JSON that needs to read a byte at a time. It turns out we’re not
generating great code for
types with padding. aatch has been working on fixing this though.
Since I wrote that last section, I did a little more experimentation to try to figure out why our serialization upper bound is 23% slower than rapidjson. And, well, maybe I found it?
serialization (MB/s) | |
---|---|
serde::json with a MyMemWriter | 346 |
serde::json with a Vec<u8> | 247 |
All I did with MyMemWriter
is copy the Vec::<u8>
implementation of Writer
into the local codebase:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 |
|
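The `MyMemWriter` listing is gone from this copy of the post, but the gist is easy to reconstruct. A hedged modern sketch (the real code copied the pre-1.0 `Writer` impl for `Vec<u8>` verbatim; here I approximate it with today’s `io::Write`):

```rust
use std::io::{self, Write};

/// A local clone of the Vec<u8> writer: same behavior, but defined in
/// this crate so the optimizer sees the whole body at the call site.
pub struct MyMemWriter {
    pub buf: Vec<u8>,
}

impl Write for MyMemWriter {
    fn write(&mut self, src: &[u8]) -> io::Result<usize> {
        // extend_from_slice is the modern spelling of push_all.
        self.buf.extend_from_slice(src);
        Ok(src.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```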
Somehow it’s not enough to just mark Vec::write
as
#[inline]
, having it in the same file gave LLVM enough information to
optimize its overhead away. Even using #[inline(always)]
on Vec::write
and
Vec::push_all
isn’t able to get the same increase, so I’m not sure how to
replicate this in the general case.
Also interesting is bench_serializer_slice
, which uses BufWriter
.
serialization (MB/s) | |
---|---|
serde::json with a BufWriter | 342 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Another digression. Since I wrote the above, aatch has put out some PRs that
should help speed up enums:
#19898 and
#20060. He was able to optimize
the padding out of enums and fix an issue with returns generating bad code. In
my bug from earlier
his patches were able to speed up my benchmark returning an
Result<(), IoError>
from running at 40MB/s to 88MB/s. However, if we’re able
to reduce IoError
down to a word, we get the performance up to 730MB/s! We
also might get enum compression, so a type like Result<(), IoError>
would then speed up to 1200MB/s! I think going in this direction is going to really
help speed things up.
That was taking a while, so until next time!
So libserialize
has some pretty serious downsides. It’s slow, it’s got this
weird recursive closure thing going on, and it can’t even represent enum types
like a serialize::json::Json
. We need a new solution, and while I was at it,
we ended up with two: serde and
serde2. Both are
different approaches to trying to address these problems. The biggest one being
the type representation problem.
I want to start with deserialization first, as that’s really the interesting
bit. To repeat myself a little bit from
part 1,
here is a generic json Value
enum:
1 2 3 4 5 6 7 8 9 10 |
|
To deserialize a string like [1, true]
into
Array(vec![I64(1), Boolean(true)])
, we need to peek at one character ahead
(ignoring whitespace) in order to discover what is the type of the next value.
We then can use that knowledge to pick the right variant, and parse the next
value correctly. While I haven’t formally studied this stuff, I believe this
can be stated more formally as: Value
requires at least an LL(1) grammar,
but since libserialize
supports no lookahead, it can handle at most LL(0)
grammars.
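To make the LL(1) point concrete, here’s a small illustrative sketch (mine, not serde code): one non-whitespace character of lookahead is enough to pick the right Value variant in JSON:

```rust
/// One character of lookahead (LL(1)): peek at the first non-whitespace
/// character and dispatch to the matching JSON variant. The kind names
/// here are illustrative, not serde's.
fn peek_value_kind(input: &str) -> Option<&'static str> {
    let c = input.chars().find(|c| !c.is_whitespace())?;
    Some(match c {
        'n' => "null",
        't' | 'f' => "bool",
        '-' | '0'..='9' => "number",
        '"' => "string",
        '[' => "array",
        '{' => "object",
        _ => "invalid",
    })
}
```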
Since I was thinking of this problem in terms of grammars, I wanted to take a
page out of their book and implement generic deserialization in this style.
serde::de::Deserializer
s are then an Iterator<serde::de::Token>
lexer that
produces a token stream, and serde::de::Deserialize
s are a parser that
consumes this stream to produce a value. Here’s serde::de::Token
, which can
represent nearly all the rust types:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
|
The serde::de::Deserialize
stream must generate tokens that follow this
grammar:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
|
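As an illustration of the lexer/parser split (with a toy Token enum of my own; the real serde::de::Token had around 40 variants), here’s how the parser side consumes a tagged token stream to build a value:

```rust
/// Toy version of serde 0.x's tagged token stream. Variant names are
/// mine, and the real Token enum covered nearly all Rust types.
#[derive(Debug, PartialEq)]
enum Token {
    I64(i64),
    SeqStart(usize),
    End,
}

/// The "parser" side: drive the token iterator and build a Vec<i64>,
/// the way a Deserialize impl pulls tokens out of a Deserializer.
fn deserialize_i64_seq<I>(tokens: &mut I) -> Result<Vec<i64>, String>
where
    I: Iterator<Item = Token>,
{
    match tokens.next() {
        Some(Token::SeqStart(len)) => {
            let mut out = Vec::with_capacity(len);
            loop {
                match tokens.next() {
                    Some(Token::I64(v)) => out.push(v),
                    Some(Token::End) => return Ok(out),
                    other => return Err(format!("unexpected token: {:?}", other)),
                }
            }
        }
        other => Err(format!("expected SeqStart, got {:?}", other)),
    }
}
```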
For performance reasons, there is no separator in the compound grammar.
Finishing up this section are the actual traits, Deserialize
and Deserializer
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
|
The Deserialize
trait is kept pretty slim, and is how lookahead is
implemented. Deserializer
is an enhanced Iterator<Result<Token, E>>
, with
many helpful default methods. Here they are in action. First we’ll start with
what’s probably the simplest Deserializer
, which just wraps a Vec<Token>
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
|
Overall it should be pretty straightforward. As usual, error handling makes
things a bit noisier, but hopefully it’s not too onerous. Next is a
Deserialize
for bool
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|
Simple! Sequences are a bit more tricky. Here’s Deserialize
a Vec<T>
. We
use a helper adaptor SeqDeserializer
to deserialize from all types that
implement FromIterator
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 |
|
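The shape of that adaptor can be sketched like this (the names and the fixed `i64` element type are mine; real serde is generic over any `Deserialize` element): an iterator wrapper lets one implementation serve every `FromIterator` collection:

```rust
/// Hedged sketch of the SeqDeserializer idea: expose the incoming
/// sequence as a plain Iterator so that collect() can build any
/// FromIterator collection from it.
struct SeqDeserializer<I> {
    values: I,
}

impl<I: Iterator<Item = i64>> Iterator for SeqDeserializer<I> {
    type Item = i64;
    fn next(&mut self) -> Option<i64> {
        self.values.next()
    }
}

/// One function deserializes into Vec, HashSet, or any other collection.
fn deserialize_collection<C, I>(values: I) -> C
where
    I: Iterator<Item = i64>,
    C: FromIterator<i64>,
{
    SeqDeserializer { values }.collect()
}
```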
Last is a struct deserializer. This relies on a simple state machine in order to deserialize from out of order maps:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
|
It’s more complicated than libserialize
’s struct parsing, but it performs
much better because it can handle out of order maps without buffering tokens.
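The state-machine trick boils down to this (a hedged sketch with my own toy struct, not the generated serde code): buffer each field in an `Option`, accept keys in any order, and only fail at the end if something is missing:

```rust
/// A toy struct to deserialize; in serde this impl would be generated.
#[derive(Debug, PartialEq)]
struct Point {
    x: i64,
    y: i64,
}

/// Accept (key, value) map entries in any order: each field waits in an
/// Option until the map ends, so no token buffering is needed.
fn point_from_entries(entries: &[(&str, i64)]) -> Result<Point, String> {
    let mut x = None;
    let mut y = None;
    for &(key, value) in entries {
        match key {
            "x" => x = Some(value),
            "y" => y = Some(value),
            other => return Err(format!("unknown field: {}", other)),
        }
    }
    Ok(Point {
        x: x.ok_or("missing field x")?,
        y: y.ok_or("missing field y")?,
    })
}
```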
Serialization’s story is a much simpler one. Conceptually
serde::ser::Serializer
/serde::ser::Serialize
are inspired by the
deserialization story, but we don’t need the tagged tokens because we already
know the types. Here are the traits:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 |
|
There are many default methods, so only a handful of implementations need to be
specified. Now let’s look at how they are used. Here’s a simple
AssertSerializer
that I use in my test suite to make sure I’m serializing
properly:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
|
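A hedged sketch of that idea (my own simplified event type, not serde’s actual trait methods): the serializer holds a queue of expected events and fails on the first mismatch:

```rust
/// Simplified serialization events; serde's Serializer has one method
/// per type instead of a single enum like this.
#[derive(Debug, PartialEq)]
enum Event {
    I64(i64),
    SeqStart(usize),
    SeqEnd,
}

/// Test-only serializer: it writes nothing, it just checks each call
/// against a queue of expected events.
struct AssertSerializer {
    expected: std::vec::IntoIter<Event>,
}

impl AssertSerializer {
    fn new(expected: Vec<Event>) -> AssertSerializer {
        AssertSerializer { expected: expected.into_iter() }
    }

    fn serialize(&mut self, event: Event) -> Result<(), String> {
        match self.expected.next() {
            Some(e) if e == event => Ok(()),
            other => Err(format!("expected {:?}, got {:?}", other, event)),
        }
    }

    /// Fails if any expected events were never serialized.
    fn finish(mut self) -> Result<(), String> {
        match self.expected.next() {
            None => Ok(()),
            Some(e) => Err(format!("unserialized event: {:?}", e)),
        }
    }
}
```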
Implementing Serialize
for values follows the same pattern. Here’s bool
:
1 2 3 4 5 6 |
|
Vec<T>
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
|
And structs:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
Much simpler than deserialization.
So how does it perform? Here are the serialization benchmarks, with yet another ordering, this time sorted by performance:
language | library | format | serialization (MB/s) |
---|---|---|---|
Rust | capnproto-rust | Cap’n Proto (unpacked) | 4349 |
Go | go-capnproto | Cap’n Proto | 3824.20 |
Rust | bincode | Binary | 1020 |
Go | gogoprotobuf | Protocol Buffers | 596.78 |
Rust | capnproto-rust | Cap’n Proto (packed) | 583 |
Rust | rust-msgpack | MessagePack | 397 |
Rust | rust-protobuf | Protocol Buffers | 357 |
C++ | rapidjson | JSON | 304 |
Rust | serde::json | JSON | 222 |
Go | goprotobuf | Protocol Buffers | 214.68 |
Go | ffjson | JSON | 147.37 |
Rust | serialize::json | JSON | 147 |
Go | encoding/json | JSON | 80.49 |
serde::json
is doing pretty well! It still has a ways to go to catch up
to rapidjson, but it’s pretty cool that it’s
beating goprotobuf out of the box :)
Here are the deserialization numbers:
language | library | format | deserialization (MB/s) |
---|---|---|---|
Rust | capnproto-rust | Cap’n Proto (unpacked) | 2185 |
Go | go-capnproto | Cap’n Proto (zero copy) | 1407.95 |
Go | go-capnproto | Cap’n Proto | 711.77 |
Rust | capnproto-rust | Cap’n Proto (packed) | 351 |
Go | gogoprotobuf | Protocol Buffers | 272.68 |
C++ | rapidjson | JSON (sax) | 189 |
C++ | rapidjson | JSON (dom) | 162 |
Rust | rust-msgpack | MessagePack | 138 |
Rust | rust-protobuf | Protocol Buffers | 129 |
Go | ffjson | JSON | 95.06 |
Rust | bincode | Binary | 80 |
Go | goprotobuf | Protocol Buffers | 79.78 |
Rust | serde::json | JSON | 67 |
Rust | serialize::json | JSON | 24 |
Go | encoding/json | JSON | 22.79 |
Well, on the plus side, serde::json
is nearly 3 times faster than
libserialize::json
. On the downside, rapidjson is nearly 3 times faster than
us in its SAX-style parsing. Even the newly added deserialization support in
ffjson is 1.4 times faster than us. So we’ve
got our work cut out for us!
Next time, serde2!
PS: I’m definitely getting close to the end of my story, and while I have some better numbers with serde2, nothing is quite putting me in the rapidjson range. Anyone want to help optimize serde? I would greatly appreciate the help!
PPS: I’ve gotten a number of requests for my serialization benchmarks to be ported over to other languages and libraries. Especially a C++ version of Cap’n Proto. Unfortunately I don’t really have the time to do it myself. Would anyone be up for helping to implement it?
comments on reddit
So I wanted to see how easy it’d be to make a Rust binding for the project. If you want to follow along, first make sure you have Rust installed. Unfortunately it looks like MDBM only supports Linux and FreeBSD, so I had to build out a Fedora VM to test this out on. I think this is all you need to build it:
1 2 3 4 5 |
|
Unfortunately it’s Linux-only and I’ve got a Mac, but it turns out there’s plenty I can do to prep while VirtualBox and Fedora 21 download. Let’s start out by creating our project with Cargo:
1 2 |
|
(Right now there’s no way to have the name be different than the path, so edit
Cargo.toml
to rename the project to mdbm
. I filed
#1030 to get that
implemented).
By convention, we put bindgen packages into $name-sys
, so make that crate as
well:
1 2 |
|
We’ve got a really cool tool called bindgen, which uses clang to parse header files and convert them into an unsafe rust interface. So lets check out MDBM, and generate a crate to wrap it up in.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Pretty magical. Make sure it builds:
1 2 3 4 5 6 7 8 9 10 11 |
|
Nope! The problem is that we don’t have the libc
crate imported. We don’t
have a convention yet for this, but what I like to do is:
1
|
|
And create a src/lib.rs
that contains:
1 2 3 4 5 6 7 8 |
|
This lets me run bindgen later on without mucking up the library. This now
compiles. Next up is our high-level interface. Add mdbm-sys
as a dependency
by adding this to the rust-mdbm/Cargo.toml
file:
1 2 |
|
By now I had my VirtualBox setup working, so on to the actual code! Let’s start with a barebones wrapper around the database:
1 2 3 |
|
Next is the constructor and destructor. I’m hardcoding things for now and using IoError, since MDBM appears to report all its errors through errno:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
|
Pretty straightforward translation of the examples with some hardcoded values
to start out. Next up is a wrapper around MDBM’s datum
type, which is the
type used for both keys and values. datum
is just a simple struct containing
a pointer and length, pretty much analogous to our &[u8]
slices. However our
slices are much more powerful because our type system can guarantee that in
safe Rust, these slices can never outlive where they are derived from:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
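In today’s Rust the same shape can be sketched like this (my reconstruction; the original used pre-1.0 syntax, and the `AsDatum` trait here anticipates the conversion the post introduces next). The `PhantomData` lifetime is what keeps a `Datum` from outliving the bytes it points at:

```rust
use std::marker::PhantomData;

/// Sketch of the Datum wrapper: a raw pointer + length pair like MDBM's
/// C `datum`, plus a PhantomData lifetime so the compiler won't let it
/// outlive the slice it was built from.
struct Datum<'a> {
    ptr: *const u8,
    len: usize,
    _marker: PhantomData<&'a [u8]>,
}

impl<'a> Datum<'a> {
    fn new(bytes: &'a [u8]) -> Datum<'a> {
        Datum { ptr: bytes.as_ptr(), len: bytes.len(), _marker: PhantomData }
    }

    /// View the datum as a slice again; the borrow is still tied to 'a.
    fn as_slice(&self) -> &'a [u8] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

/// Convenience conversion so callers can pass &str or &[u8] directly.
trait AsDatum {
    fn as_datum(&self) -> Datum<'_>;
}

impl AsDatum for [u8] {
    fn as_datum(&self) -> Datum<'_> {
        Datum::new(self)
    }
}

impl AsDatum for str {
    fn as_datum(&self) -> Datum<'_> {
        Datum::new(self.as_bytes())
    }
}
```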
And for convenience, let’s add an AsDatum
conversion method:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
And finally, we get to setting and getting a key-value pair. Setting is pretty
straightforward. The only fun thing is using the AsDatum
constraints so we
can do db.set(&"foo", &"bar", 0)
instead of
db.set(Datum::new(&"foo".as_slice()), Datum::new("bar".as_slice()), 0)
. Since
we’re copying into the database, we don’t have to worry about lifetimes yet:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
|
MDBM requires the database to be locked in order to get the keys. This is
where things get fun: we need to prevent those interior pointers from escaping.
We’ll create another wrapper type that manages the lock, and uses RAII to
unlock when we’re done. We tie the lifetime of the Lock
to the lifetime of
the database and key, which prevents it from outliving either object:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
|
(Note that I’ve heard #[unsafe_destructor]
as used here may become unnecessary
in 1.0).
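Stripped of the MDBM calls, the RAII pattern looks like this (a hedged sketch with a stand-in `Db` type; the real code locks the C database handle rather than a boolean flag). The `Lock` borrows the database, values borrow the `Lock`, and `Drop` unlocks:

```rust
use std::cell::Cell;

/// Stand-in for the MDBM handle: the real code wraps the C database and
/// calls the C lock/unlock functions in place of this flag.
struct Db {
    data: Vec<u8>,
    locked: Cell<bool>,
}

/// RAII guard: holding a Lock borrows the Db, and Drop releases it.
struct Lock<'db> {
    db: &'db Db,
}

impl Db {
    fn lock(&self) -> Lock<'_> {
        self.locked.set(true);
        Lock { db: self }
    }
}

impl<'db> Lock<'db> {
    /// The returned bytes borrow the Lock itself, so they cannot escape
    /// the critical section -- exactly the property we want for get().
    fn get(&self) -> &[u8] {
        &self.db.data
    }
}

impl<'db> Drop for Lock<'db> {
    fn drop(&mut self) {
        self.db.locked.set(false);
    }
}
```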
Finally, let’s get our value! Assuming the value exists, we tie the lifetime of
the Lock
to the lifetime of the returned &[u8]
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
Now to verify it works:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
|
Which when run with cargo test
, produces:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Next we want to make sure invalid behavior is a compile-time error. First, make sure we don’t leak the keys:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Which errors with:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
And confirm the value doesn’t leak either:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
Which errors with:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
|
Success! Not too bad for 2 hours of work. Barring bugs, this mdbm
library should perform at roughly the same speed as the C library, but
eliminate many very painful bug opportunities that require tools like Valgrind
to debug.
Low level benchmarking is confusing and non-intuitive.
The end.
Or not. Whatever. So I’m trying to get my
implement-Reader
-and-Writer
-for-&[u8]
type PR
#18980 landed. But
Steven Fackler
obnoxiously and correctly pointed out that this won’t play that nicely with the
new Reader
and Writer
implementation for Vec<u8>
. Grumble grumble. And then
Alex Crichton
had the gall to mention that a Writer
for &mut [u8]
also probably won’t be
that common either. Sure, he’s right and all, but I got it working without
needing an index! That means that the &mut [u8]
Writer
only needs 2
pointers instead of BufWriter
’s three, so it just has to be faster! Well,
doesn’t it?
Stupid benchmarks.
I’ve got to say, writing micro-benchmarks is pretty addictive. It’s a lot of fun seeing how sensitive low-level code can be to just the smallest of tweaks. It’s also really annoying when you write something you think is pretty neat, then you find it’s chock-full of false dependencies between cache lines, or other mean things CPUs like to impose on poor programmers.
Anyway, to start, let’s look at what should be the fastest way to write to a buffer: completely unsafely, with no checks.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
|
With SRC_LEN=4
and BATCHES=128
, we get this. For fun I added the new
libtest
from #19233 that will
hopefully land soon. I also ran variations that explicitly inlined
and didn’t inline the inner function:
1 2 3 |
|
So overall it does quite well. Now lets compare with the code I wrote:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|
And. Well. It didn’t do that well.
1 2 3 |
|
Wow. That’s pretty bad compared to the ideal.
Crud. So not only did I add an implementation that’s probably not going to work
with write!
, it also turns out the performance is pretty terrible. Inlining
isn’t helping like it did in the unsafe case. So how’s
std::io::BufWriter
compare?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
|
Here’s how it does:
1 2 3 |
|
That’s just cruel. The optimization gods obviously hate me. So I started playing with a lot of variations (they’re in my serialization benchmark suite. Yeah yeah, I’m planning on making it more general purpose. Besides, it’s my suite and I can do whatever I want with it, so there):
Writer
into a struct wrapper shouldn’t do anything, and it
didn’t.let write_len = min(dst_len, src_len)
. We can
turn that into the branch-predictor-friendly:1 2 |
|
Doesn’t matter, still performs the same.
src.len()
bytes! Damn the safety! That, of course,
works. I can hear them giggle.std::io::BufWriter
and make sure that
it’s still nearly optimal. It still is.min(dst_len, src_len)
is a bounds check, so
we could switch from the bounds checked std.slice::bytes::copy_memory
to
the unsafe std::ptr::copy_nonoverlapping_memory
, but that also doesn’t
help.std::io::BufWriter
,
and it does shave a couple nanoseconds off. It might be worth pushing it
upstream:1 2 3 |
|
uint
than std::io::BufWriter
, I’m
doing two writes to advance my slice, one to advance the pointer, and one to
shrink the length. std::io::BufWriter
only has to advance its pos
index.
But in this case, instead of treating the slice as a (ptr, length)
, we
can convert it into a (start_ptr, end_ptr)
, where start_ptr=ptr
, and
end_ptr=ptr+length
. This works! Ish:1 2 3 |
|
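Here’s a hedged modern sketch of that (start_ptr, end_ptr) layout (my code, using today’s `io::Write` rather than the old `Writer` trait): writing advances only the start pointer, so there’s a single mutation per write instead of two:

```rust
use std::io::{self, Write};
use std::marker::PhantomData;

/// Two-pointer slice writer: instead of a slice plus a position index,
/// keep start and end pointers and advance start as bytes are written.
struct TwoPtrWriter<'a> {
    start: *mut u8,
    end: *mut u8,
    _marker: PhantomData<&'a mut [u8]>,
}

impl<'a> TwoPtrWriter<'a> {
    fn new(buf: &'a mut [u8]) -> TwoPtrWriter<'a> {
        let range = buf.as_mut_ptr_range();
        TwoPtrWriter { start: range.start, end: range.end, _marker: PhantomData }
    }
}

impl<'a> Write for TwoPtrWriter<'a> {
    fn write(&mut self, src: &[u8]) -> io::Result<usize> {
        let remaining = self.end as usize - self.start as usize;
        let n = remaining.min(src.len());
        unsafe {
            // Safe: n is bounded by the space left between start and end.
            std::ptr::copy_nonoverlapping(src.as_ptr(), self.start, n);
            self.start = self.start.add(n);
        }
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```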
I know when I’m defeated. Oh well. I guess I can at least update
std::io::BufWriter
to support the new error handling approach:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
|
How’s it do?
1 2 3 |
|
Grumble grumble. It turns out that if we tweak the copy_memory
line to:
1
|
|
It shaves 674 nanoseconds off the run:
1 2 3 |
|
But still nowhere near where we need to be. That suggests though that always
cutting down the src
, which triggers another bounds check, has some measurable
impact. So maybe I should only shrink the src
slice when we know it needs to
be shrunk?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
|
Let’s see how it failed this time…
1 2 3 |
|
No way. That actually worked?! That’s awesome! I’ll carve that out into another
PR. Maybe it’ll work for my original version that doesn’t use a pos
?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
|
And yep, just as fast!
1 2 3 |
|
At this point, both solutions are approximately just as fast as the unsafe
ptr::copy_nonoverlapping_memory
! So that’s awesome. Now would anyone really
care enough about the extra uint
? There may be a few very specialized cases
where that extra uint
could cause a problem, but I’m not sure if it’s worth
it. What do you all think?
I thought that was good, but since I’m already here, how’s the new Vec<u8>
writer doing? Here’s the driver:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
|
And the results:
1 2 3 |
|
Wow. That’s pretty terrible. Something weird must be going on with
Vec::push_all
. (Maybe that’s what caused my serialization benchmarks to slow
down by a third?). Let’s skip it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
|
And it looks a bit better, but not perfect:
1 2 3 |
|
There’s even less going on here than before. The only difference is that
reserve call. Commenting it out gets us back to copy_nonoverlapping_memory
territory:
1 2 3 |
|
Unfortunately it’s getting pretty late, so rather than wait until the next time
to dive into this, I’ll leave it up to you all. Does anyone know why reserve
is causing so much trouble here?
PS: While I was working on this, I saw stevencheg submitted a patch to speed up the protocol buffer support. But when I ran the tests, everything was about 40% slower than the last benchmark post! Something happened with Rust’s performance over these past couple weeks!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Vec
as a Writer
. Over the weekend I submitted
#18980, which allows a &[u8]
to be used as a Reader
. Overall a pretty simple change. However, when I was
running the test suite, I found that the std::io::net::tcp::write_close_ip4()
test was occasionally failing:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
|
The write
would succeed a few times, then occasionally error with the
unknown error: Protocol wrong type for socket
, or the EPROTOTYPE
errno.
This is a really surprising error. As far as I know, the only functions that
return that errno are socket
, socketpair
, and connect
. I searched
everywhere, but I couldn’t find any documentation suggesting that write
would
ever produce that error.
I wasn’t the only one who ran into it. bjz opened #18900 describing the same problem. One interesting thing to note is they were also on OSX Yosemite. So I took a little bit of time to extract out that test from the Rust test suite into this gist and got someone on #rust-internals to run it on linux for me with this little driver script:
1 2 3 4 5 6 7 8 9 |
|
and it didn’t error out. So it seems to be system dependent. Further
experimentation showed that introducing sleeps or a mutex synchronization
appeared to fix the problem as well. At this point I wasn’t sure if this was a
non-issue, a bug in our code, or a bug in the OS. One thing’s for sure though:
if there is a bug, it could be scattered somewhere across the Rust codebase,
and std::io
alone is 20 files at around 12,522 lines. It’d be a pain to
cut that down to a self-contained test case.
Fortunately we’ve got C-Reduce to help us out. Back in May Professor John Regehr from the University of Utah came to our Bay Area Rust meetup to talk about compiler testing and fuzzing. We recorded the talk, so if you want to watch it, you can find it here. One of the things he talked about was the tool C-Reduce his research group developed to automatically cut out unnecessary lines of code that can still reproduce a bug you’re looking for. While it’s written to target C files, it turns out Rust is syntactically close enough to C that it works out pretty well for it too. All you need is a single source file and a driver script that’ll report if the compiled source file reproduced the bug.
Aside 1: By the way, one of the other things I did this weekend was put together a Homebrew pull request for C-Reduce. It hasn’t landed yet, but if you want to use it, you can do:
1 2 |
|
Hopefully it’ll land soon so you’ll be able to do:
1
|
|
Anyway, back to the story. So we’ve got a rather large code base to cover, and
while C-reduce does a pretty good job of trimming away lines, just pointing it
at the entire std
module is a bit too much for it to handle in a reasonable
amount of time. It probably needs some more semantic information about Rust to
do a better job of cutting out broad swaths of code.
So I needed to do at least a rough pass to slim down the files. I figured the
problem was probably contained in std::io
or std::sys
, so I wrote a simple
test.rs
that explicitly included those modules as part of the crate (see
this gist if you are
interested), and used the pretty printer to merge it into one file:
1
|
|
Aside 2: Our pretty printer still has some bugs in it, which I filed:
19075 and
19077. Fortunately it was
pretty simple to fix those cases by hand in the generated std.rs
, and odds
are good they’ll be fixed by the time you might use this process.
Next up, we need a driver script. We can adapt our one from before. The only
special consideration is that we need to make sure to only exit with a return
code of 0 if the version of std.rs
we’re compiling errors with EPROTOTYPE
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
|
I used the helper script timeout.sh to time out tests in case C-Reduce accidentally made us an infinite loop.
Finally we’re ready to start running C-Reduce:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
|
I let it run in the background for some time while I did some other things. When I got back, C-Reduce had automatically reduced the file from 153KB to a slim 22KB. I then reran rustc with the lints enabled to manually cut out the dead code C-Reduce failed to remove, and flattened away some unnecessary structs and methods. I was finally left with:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 |
|
This snippet reproduces the same EPROTOTYPE
that we started with at the top
of the post. Pretty cool that we got here without much effort, right?
Now at this point you might ask yourself: couldn’t I have extracted this
out myself? And yeah, you would be right. This is pretty much a C-in-Rust
implementation of this bug. But what’s great about using C-Reduce here is that
I only had to make some very rough guesses about what files were and were not
important to include in my test.rs
. Eventually, when we get some Rust plugins
written for C-Reduce, I probably could just point it at the whole libstd
and let
C-Reduce do its thing. Doing this by hand can be a pretty long and painful
manual process, especially if we’re trying to debug a codegen or runtime bug.
In the past I’ve spent hours reducing some codegen bugs down into a small
snippet that C-Reduce was also able to produce in a couple minutes.
The last step with this code was to eliminate Rust from the picture, and translate this code into C:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 |
|
This also produces EPROTOTYPE
, so we can eliminate Rust altogether. But let’s
keep digging. What exactly is producing this error? If I was on Linux, I’d use
strace
, but that program isn’t on Macs. There’s a similar tool called
dtruss
, but that seemed to slow things down enough that the EPROTOTYPE
never happened. Fortunately though there is another program called errinfo
,
that just prints the errno
along with every syscall. In one terminal I ran
while ./test; do sleep 0.1; done
. In the other:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Right there we see our sendto
syscall is actually returning the EPROTOTYPE
.
This errno
then is definitely being created inside the OSX kernel, not in any
userspace code. Fortunately, most of the Apple kernel, XNU, is open sourced, so
we can dig down to what’ll be my last layer. You can find the tarballs at
http://www.opensource.apple.com/. But I’d rather use the
unofficial GitHub repository. Using GitHub’s
search tools, we can find all 17 instances of
EPROTOTYPE
in the codebase. Now I don’t know the XNU codebase, but there are still some
really interesting things we can find. The first is in
bsd/kern/uipc_usrreq.c:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
|
Hey look at that! There’s a handler for the send
syscall (although for IPC, not
TCP) that actually documents that it can return EPROTOTYPE
! While it doesn’t
explain exactly how this can happen, the fact it mentions unp_connect
hints
that uipc_send
may trigger a connect, and that’s exactly what we find a
couple lines into the function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
The fact that the comment says the socket might not be connected yet when we're
doing a `send` hints that Apple may have introduced some level of asynchrony
and preemption to sockets. So if we trigger the actual connect here, it could
then return `EPROTOTYPE`, which makes sense. Unfortunately, that's still not
quite the behavior we're seeing: we're not getting `EPROTOTYPE` on our first
write, but only after we've done a couple.
I believe we find that behavior in the actual TCP syscall file, `bsd/netinet/tcp_usrreq.c`:
*(kernel source omitted: the comment in `bsd/netinet/tcp_usrreq.c` describing a send racing with socket teardown)*
I believe that comment explains everything we're seeing. If we trigger a `send`
while the kernel is in the middle of tearing down the socket, it returns
`EPROTOTYPE`. This, then, looks to be an error we could retry: once the socket is
fully torn down, it should eventually return the proper `EPIPE`. This is also
pretty easy to test, so I modified the inner loop of our C test:
*(code listing omitted: the C test loop, modified to retry writes that fail with `EPROTOTYPE`)*
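The same retry logic can be sketched in Rust. This is a hedged illustration: the errno constants are the macOS values, and the sender is a mocked closure rather than a real socket, so only the control flow is shown.

```rust
// Sketch of the retry loop (not the original C test): keep retrying a send
// that fails with the transient EPROTOTYPE, and stop on any other outcome,
// such as success or the definitive EPIPE.
const EPROTOTYPE: i32 = 41; // macOS errno: socket is mid-teardown (transient)
const EPIPE: i32 = 32;      // macOS errno: peer is definitively gone

fn write_with_retry<F>(mut send: F) -> Result<usize, i32>
where
    F: FnMut() -> Result<usize, i32>,
{
    loop {
        match send() {
            Err(EPROTOTYPE) => continue, // socket being torn down: try again
            other => return other,       // success, EPIPE, or any other errno
        }
    }
}
```

With a real `TcpStream` one would compare `io::Error::raw_os_error()` against `EPROTOTYPE` instead of pattern-matching a mocked result.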
And yep, it exits cleanly. After all of this, I think it's pretty clear that there's no weird kernel corruption bug going on, just a poorly documented edge case. But it sure was fun chasing this condition through the system.
To prevent anyone else from tripping over this edge case, I filed an Apple Radar
ticket (number 19012087, for any Apple employees reading this). Hopefully if
anyone runs into this mysterious `EPROTOTYPE`, it'll be documented for them, or
at least there’s a chance they’ll stumble over this blog post and save
themselves a weekend diving through the OS.
C++ and Rust JSON tests to serialize enums as uints.

JSON:
language | library | population (ns) | serialization (MB/s) | deserialization (MB/s) |
---|---|---|---|---|
Rust | serialize::json | 1127 | 117 | 26 |
C++ | rapidjson (dom) | 546 | 281 | 144 |
C++ | rapidjson (sax) | 546 | 281 | 181
Go | encoding/json | 343 | 63.99 | 22.46 |
Go | ffjson | 343 | 144.60 | (not supported) |
Cap’n Proto:
language | library | population (ns) | serialization (MB/s) | deserialization (MB/s) |
---|---|---|---|---|
Rust | capnproto-rust (unpacked) | 325 | 4977 | 2251 |
Rust | capnproto-rust (packed) | 325 | 398 | 246 |
Go | go-capnproto | 2368 | 2226.71 | 450 |
Go | go-capnproto (zero copy) | 2368 | 2226.71 | 1393.3 |
Protocol Buffers:
language | library | population (ns) | serialization (MB/s) | deserialization (MB/s) |
---|---|---|---|---|
Rust | rust-protobuf | 1041 | 370 | 118 |
Go | goprotobuf | 1133 | 138.27 | 91.18 |
Go | gogoprotobuf | 343 | 472.69 | 295.33 |
Misc:
language | library | population (ns) | serialization (MB/s) | deserialization (MB/s) |
---|---|---|---|---|
Rust | rust-msgpack | 1143 | 454 | 144 |
Rust | bincode | 1143 | 1149 | 82 |
Anyone want to add more C/Go/Rust/Java/etc benchmarks?
language | library | format | serialization (MB/s) | deserialization (MB/s) |
---|---|---|---|---|
C++ | rapidjson | JSON (dom) | 233 | 102 |
C++ | rapidjson | JSON (sax) | 233 | 124 |
Go | encoding/json | JSON | 54.93 | 16.72 |
Go | ffjson | JSON | 126.40 | (not supported) |
Go | goprotobuf | Protocol Buffers | 138.27 | 91.18 |
Go | gogoprotobuf | Protocol Buffers | 472.69 | 295.33 |
Go | go-capnproto | Cap’n Proto | 2226.71 | 450 |
Go | go-capnproto | Cap’n Proto (zero copy) | 2226.71 | 1393.3 |
Rust | serialize::json | JSON | 89 | 18 |
Rust | rust-msgpack | MessagePack | 160 | 52 |
Rust | rust-protobuf | Protocol Buffers | 177 | 70 |
Rust | capnproto-rust | Cap’n Proto (unpacked) | 1729 | 1276 |
Rust | capnproto-rust | Cap’n Proto (packed) | 398 | 246 |
I upgraded to OS X Yosemite, so I think that brought these numbers down overall from the last post.
Rust's `serialize` library, specifically `serialize::json`, is pretty slow.
Back when I started this project a number of months ago, I wanted to benchmark
to see how we compared to some other languages. There are a bunch of JSON
benchmarks out there, but the one I chose was Cloudflare's Go-language
Goser, mainly because it was using a
complex real-world log structure, and they did the hard work of implementing
benchmarks for encoding/json, goprotobuf, gogoprotobuf, and
go-capnproto. I also included the
Go ffjson and C++ rapidjson libraries, which
both claim to be the fastest JSON libraries for those languages. Here are the
results I got:
language | library | format | serialization (MB/s) | deserialization (MB/s) |
---|---|---|---|---|
C++ | rapidjson | JSON | 294 | 164 (DOM) / 192 (SAX) |
Go | encoding/json | JSON | 71.47 | 25.09 |
Go | ffjson | JSON | 156.67 | (not supported) |
Go | goprotobuf | Protocol Buffers | 148.78 | 99.57 |
Go | gogoprotobuf | Protocol Buffers | 519.48 | 319.40 |
Go | go-capnproto | Cap’n Proto | 3419.54 | 665.35 |
Rust | serialize::json | JSON | 40-ish | 10-ish |
Notes:

- `rapidjson` supports both DOM-style and SAX-style deserializing. DOM-style
  means deserializing into a generic object and then from there into the final
  object; SAX-style means a callback approach, where a handler is called for
  each JSON token.
- `encoding/json` uses reflection to serialize arbitrary values. `ffjson`
  uses code generation to get its serialization speed, but it doesn't
  implement deserialization.
- `goprotobuf` and `gogoprotobuf` use code generation, but `gogoprotobuf`
  uses Protocol Buffers' extension support to do cheaper serialization.

So. Yikes. Not only are we nowhere near `rapidjson`, we were being soundly
beaten by Go's reflection-based framework `encoding/json`. Even worse, our
compile time was at least 10 times theirs. So, not pretty at all.
But that was a couple of months ago. Between then and now, Patrick Walton, Luqman
Aden, myself, and probably lots of others found and fixed a number of bugs across
`serialize::json`, `std::io`, generic function calls, and more. All this work
got us to more than double our performance:
language | library | format | serialization (MB/s) | deserialization (MB/s) |
---|---|---|---|---|
Rust | serialize::json | JSON | 117 | 25 |
We're (kind of) beating Go! At least the builtin reflection-based solution. Better, but not great. I think our challenge is those dang closures. While LLVM can optimize simple closures, it seems to have a lot of trouble with all these recursive closure calls. Unboxed closures, once finished, might finally let us break through this performance bottleneck, but that's not guaranteed.
All in all, this and the representational problems from post 1 make it pretty obvious that we've got some fundamental issues here and need an alternative solution. In the next post I'll start getting into the details of the design of serde.
Anyway, on to the post. My main on-again-off-again project this past year has
been working on Rust's generic `serialize`
library. If you haven't played with it yet, it's really nifty. It's a generic
framework that allows a generic `Encoder` to serialize a generic `Encodable`, and
the inverse with `Decoder` and `Decodable`. This allows you to write just one
`Encodable` impl that can transparently work with our
json library,
msgpack,
toml, and so on. It's simple to use,
too: in most cases you can use `#[deriving(Encodable, Decodable)]` to
automatically create an implementation for your type. Here's an example:
*(code listing omitted: a struct deriving `Encodable` and `Decodable`, round-tripped through JSON)*
There are some downsides to serialize, though. Manually implementing it can be a bit of a pain. Here's the example from before, implemented by hand:
*(code listing omitted: the same example with `Encodable` implemented by hand)*
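To make the closure handshake concrete, here's a from-scratch sketch of the pattern in modern Rust. The trait and method names mirror the old `serialize` API but are simplified assumptions, not the real library.

```rust
// A from-scratch sketch of the old Encoder/Encodable closure handshake.
// The names mirror the old serialize API, but this is a simplified
// illustration written for modern Rust, not the real library.
trait Encoder {
    fn emit_u32(&mut self, v: u32);
    fn emit_str(&mut self, v: &str);
    fn emit_struct(&mut self, name: &str, f: &mut dyn FnMut(&mut Self));
    fn emit_field(&mut self, name: &str, idx: usize, f: &mut dyn FnMut(&mut Self));
}

trait Encodable {
    fn encode<E: Encoder>(&self, e: &mut E);
}

// A tiny JSON encoder implementing the Encoder side of the handshake.
struct JsonEncoder {
    out: String,
}

impl Encoder for JsonEncoder {
    fn emit_u32(&mut self, v: u32) {
        self.out.push_str(&v.to_string());
    }
    fn emit_str(&mut self, v: &str) {
        self.out.push_str(&format!("\"{}\"", v));
    }
    fn emit_struct(&mut self, _name: &str, f: &mut dyn FnMut(&mut Self)) {
        self.out.push('{');
        f(self); // hand control back to the Encodable for the fields
        self.out.push('}');
    }
    fn emit_field(&mut self, name: &str, idx: usize, f: &mut dyn FnMut(&mut Self)) {
        if idx > 0 {
            self.out.push(',');
        }
        self.out.push_str(&format!("\"{}\":", name));
        f(self); // and back again for the field's value
    }
}

struct Employee {
    name: String,
    age: u32,
}

impl Encodable for Employee {
    fn encode<E: Encoder>(&self, e: &mut E) {
        // The recursive-closure handshake the post describes.
        e.emit_struct("Employee", &mut |e| {
            e.emit_field("name", 0, &mut |e| e.emit_str(&self.name));
            e.emit_field("age", 1, &mut |e| e.emit_u32(self.age));
        });
    }
}
```

Note how `encode` can't just write bytes: each `emit_*` call hands the encoder a closure to call back into, and that nesting is exactly what LLVM struggles to inline.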
As you can see, serializing compound structures requires these recursive closure
calls in order to perform the handshake between the `Encoder` and the
`Encodable`. A couple of people have run into trouble in the past where they didn't
implement this pattern, which results in some confusing bugs. Furthermore, LLVM
isn't great at inlining these recursive calls, so `serialize` impls tend not to
perform well.
That's not the worst of it, though. The real problem is that there are types
that can implement `Encodable` but for which there's no way to write a `Decodable`
implementation. They're pretty common, too. For example, the
`serialize::json::Json` type:
*(code listing omitted: the definition of the `Json` enum)*
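From memory, the `Json` type looked approximately like this, modernized here with `Vec` and `BTreeMap` in place of the era's `List` and `TreeMap`; the exact variants changed across versions, so treat this as an approximation:

```rust
use std::collections::BTreeMap;

// Approximate shape of serialize::json::Json: a self-describing value with
// one variant per kind of JSON value.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Json {
    Null,
    Boolean(bool),
    I64(i64),
    U64(u64),
    F64(f64),
    String(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}
```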
The `Json` value can represent any value that's in a JSON string. Implied in
this is the notion that the `Decodable` has to look ahead to see what the next
value is so it can decide which `Json` variant to construct. Unfortunately, our
current `Decoder` infrastructure doesn't support lookahead. The way the
`Decoder`/`Decodable` handshake works is essentially:

1. `Decodable` asks for a struct named `"Employee"`.
2. `Decodable` asks for a field named `"name"`.
3. `Decodable` asks for a value of type `String`.
4. `Decodable` asks for a field named `"age"`.
5. `Decodable` asks for a value of type `uint`.

Any deviation from this pattern results in an error. There isn't a way for the
`Decodable` to ask what the type of the next value is, which is why we
serialize generic enums by explicitly tagging the variant, as in:
*(code listing omitted: an enum serialized with an explicit variant tag)*
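The tagged encoding looked roughly like a `variant` name plus a `fields` array. A hypothetical helper (my own illustration, not the library's API) shows the shape, and why no lookahead is needed: the variant name always comes first.

```rust
// Hypothetical helper illustrating the explicitly tagged enum encoding:
// the variant name is written first, so a decoder can pick the right
// variant without peeking at the field values.
fn encode_tagged(variant: &str, encoded_fields: &[&str]) -> String {
    format!(
        "{{\"variant\":\"{}\",\"fields\":[{}]}}",
        variant,
        encoded_fields.join(",")
    )
}
```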
That's probably good enough for now. In my next post I'll go into my approach to fixing this in serde.
Ragel is a rather neat way of writing simple parsers. In some ways it's pretty similar to Lex, but Ragel also allows you to execute arbitrary code at any point in the state machine. Furthermore, this arbitrary code can manipulate the state machine itself, so it can be used in many places you'd traditionally need a full parser, such as properly handling parentheses.
Here's an example of an `atoi` function:
*(code listing omitted: the Ragel grammar and actions for `atoi`)*
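Since the grammar itself isn't reproduced here, this hand-written Rust sketch (my own illustration, not Ragel output) shows the kind of machine such an `atoi` grammar compiles down to: explicit states, transitions, and actions that fire as each character drives the machine.

```rust
// Hand-rolled equivalent of a Ragel-style atoi state machine: each (state,
// character) pair either takes a transition (possibly running an action)
// or rejects the input.
fn atoi(s: &str) -> Option<i64> {
    #[derive(Clone, Copy)]
    enum State {
        Start,  // expecting an optional sign or a digit
        Sign,   // sign consumed; expecting a digit
        Digits, // at least one digit consumed (the only accepting state)
    }

    let mut state = State::Start;
    let mut neg = false;
    let mut val: i64 = 0;

    for c in s.chars() {
        state = match (state, c) {
            (State::Start, '-') => {
                neg = true; // action attached to this transition
                State::Sign
            }
            (State::Start, '+') => State::Sign,
            (State::Start | State::Sign | State::Digits, d @ '0'..='9') => {
                val = val * 10 + (d as i64 - '0' as i64); // accumulate action
                State::Digits
            }
            _ => return None, // no transition defined: reject
        };
    }

    match state {
        State::Digits => Some(if neg { -val } else { val }),
        _ => None, // ended in a non-accepting state
    }
}
```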
While this is probably a bit more verbose than writing `atoi` by hand, it does
make the grammar pretty explicit, which can help keep it accurate.
Unfortunately, there are some pretty severe performance issues at the moment. Ragel supports two state machine styles, table-driven and goto-driven. My backend uses tables, but since Rust doesn't yet support global constant vectors, I need to malloc the state machine table on every function call. This results in the Ragel-based URL parser being about 10 times slower than the equivalent table-based parser in OCaml. You can see the generated code here.
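For what it's worth, a machine table as a global constant would look like this in today's Rust (a hypothetical toy table, just to show the shape that wasn't expressible then):

```rust
// A toy transition table as a compile-time constant: rows are states,
// columns are character classes (here: digit vs other), and entries are
// the next state, with -1 meaning "no transition".
static TRANSITIONS: [[i8; 2]; 3] = [
    // digit, other
    [1, -1],  // state 0: start
    [1, -1],  // state 1: in digits
    [-1, -1], // state 2: error (unused)
];

fn next_state(state: i8, is_digit: bool) -> i8 {
    if state < 0 {
        return -1; // already rejected
    }
    TRANSITIONS[state as usize][if is_digit { 0 } else { 1 }]
}
```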
The `goto` route could be promising to explore. We could simulate it using
mutually recursive function calls; OCaml does this. But again, since Rust
doesn't support tail calls (and may never), we could run into a stack
explosion. It may work well for small grammars, though, and maybe LLVM could
optimize the calls into tail calls.
Unless I’m doing something glaringly wrong, it seems likely that we are going to need some compiler help before these performance issues get solved.