It’s been a while, hasn’t it? Here’s part 1, part 2, part 2.1, part 2.2, part 3, and part 3.1 if you want to catch up.
Serde Version 2
Well it’s a long time coming, but serde2 is finally in a mostly usable
position! If you recall from
part 3,
one of the problems with serde1 is that we’re paying a lot for tagging our
types, and it’s really hurting us on the deserialization side of things. So
there’s one other pattern that we can use that allows for lookahead that
doesn’t need tags: visitors. A year or so ago I rewrote our generic hashing
framework to use the visitor pattern to great success. serde2
came out of
experiments to see if I could do the same thing here. It turned out that it was
a really elegant approach.
Serialize
It all starts with a type that we want to serialize:
1 2 3 4 5 |
|
(Aside: while I’d rather use where
here for this type parameter, that would
force me to write <V as Visitor>::Value>
due to
#20300).
This Visitor
trait then looks like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
So the implementation for a bool
then looks like:
1 2 3 4 5 6 7 8 |
|
Things get more interesting when we get to compound structures like a sequence.
Here’s Visitor
again. It needs to both be able to visit the overall structure
as well as each item:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
We also have this SeqVisitor
trait that the type to serialize provides. It
really just looks like an Iterator
, but the type parameter has been moved to
the visit
method so that it can return different types:
1 2 3 4 5 6 7 8 9 10 |
|
Finally, to implement this for a type like &[T]
we create an
Iterator
-to-SeqVisitor
adaptor and pass it to the visitor, which then in
turn visits each item:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
|
SeqIteratorVisitor
is publically exposed, so it should be easy to use it with
custom data structures. Maps follow the same pattern (and also expose
MapIteratorVisitor
), but each item instead uses visit_visit_map_elt(first,
key, value)
. Tuples, struct tuples, and tuple enum variants are all really
just named sequences. Likewise, structs and struct enum variants are just named
maps.
Because struct implementations are so common, here’s an example how to do it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
|
Fortunately serde2
also comes with a #[derive_serialize]
macro so you don’t
need to write this out by hand if you don’t want to.
Serializer
Now to actually build a serializer. We start with a trait:
1 2 3 4 5 6 7 |
|
It’s the responsibility of the serializer to create a visitor and then pass it
to the type. Oftentimes the serializer also implements Visitor
, but it’s not
required. Here’s a snippet of the JSON serializer visitor:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 |
|
Hopefully it is pretty straight forward.
Deserialization
Now serialization is the easy part. Deserialization is where it always gets more tricky. We follow a similar pattern as serialization. A deserializee creates a visitor which accepts any type (most resulting in an error), and passes it to a deserializer. This deserializer then extracts it’s next value from it’s stream and passes it to the visitor, which then produces the actual type.
It’s achingly close to the same pattern between a serializer and a serializee, but as hard as I tried, I couldn’t unify the two. The error semantics are different. In serialization, you want the serializer (which creates the visitor) to define the error. In deserialization, you want the deserializer which consumes the visitor to define the error.
Let’s start first with Error
. As opposed to serialization, when we’re
deserializing we can error both in the Deserializer
if there is a parse
error, or in the Deserialize
if it’s received an unexpected value. We do this
with an Error
trait, which allows a deserializee to generically create the
few errors it needs:
1 2 3 4 5 6 7 |
|
Now the Deserialize
trait, which looks similar to Serialize
:
1 2 3 4 5 |
|
The Visitor
also looks like the serialization Visitor
, except for the
methods error by default.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Sequences and Maps are also a little different:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 |
|
Here is an example struct deserializer. Structs are deserialized as a map, but since maps are unordered, we need a simple state machine to extract the values. In order to get the keys, we just create an enum for the fields, and a custom deserializer to convert a string into a field without an allocation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 |
|
It’s a little more complicated, but once again there is
#[derive_deserialize]
, which does all this work for you.
Deserializer
Deserializers then follow the same pattern as serializers. The one difference
is that we need to provide a special hook for Option<T>
types so formats like
JSON can treat null
types as options.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
Performance
So how does it perform? Here’s the serialization benchmarks, with yet another ordering. This time sorted by the performance:
language | library | format | serialization (MB/s) |
---|---|---|---|
Rust | capnproto-rust | Cap’n Proto (unpacked) | 4226 |
Go | go-capnproto | Cap’n Proto | 3824.20 |
Rust | bincode | Binary | 1020 |
Rust | capnproto-rust | Cap’n Proto (packed) | 672 |
Go | gogoprotobuf | Protocol Buffers | 596.78 |
Rust | rust-msgpack | MessagePack | 397 |
Rust | serde2::json (&[u8]) | JSON | 373 |
Rust | rust-protobuf | Protocol Buffers | 357 |
C++ | rapidjson | JSON | 316 |
Rust | serde2::json (Custom) | JSON | 306 |
Rust | serde2::json (Vec) | JSON | 288 |
Rust | serde::json (Custom) | JSON | 244 |
Rust | serde::json (&[u8]) | JSON | 222 |
Go | goprotobuf | Protocol Buffers | 214.68 |
Rust | serde::json (Vec) | JSON | 149 |
Go | ffjson | JSON | 147.37 |
Rust | serialize::json | JSON | 183 |
Go | encoding/json | JSON | 80.49 |
I think it’s fair to say that on at least this benchmark we’ve hit our
performance numbers. Writing to a preallocated buffer with BufWriter
is 18%
faster than rapidjson (although to be
fair they are allocating). Our Vec<u8>
writer comes in 12% slower. What’s
interesting is this custom Writer. It turns out LLVM is still having trouble
lowering our generic Vec::push_all
into a memcpy
. This Writer variant
however is able to get us to rapidjson’s level:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
|
Deserialization we do much better than serde because we aren’t passing around all those tags, but we have a ways to catch up to rapidjson. Still, being just 37% slower than the fastest JSON deserializer makes me feel pretty proud.
language | library | format | deserialization (MB/s) |
---|---|---|---|
Rust | capnproto-rust | Cap’n Proto (unpacked) | 2123 |
Go | go-capnproto | Cap’n Proto (zero copy) | 1407.95 |
Go | go-capnproto | Cap’n Proto | 711.77 |
Rust | capnproto-rust | Cap’n Proto (packed) | 529 |
Go | gogoprotobuf | Protocol Buffers | 272.68 |
C++ | rapidjson | JSON (sax) | 189 |
C++ | rapidjson | JSON (dom) | 162 |
Rust | bincode | Binary | 142 |
Rust | rust-protobuf | Protocol Buffers | 141 |
Rust | rust-msgpack | MessagePack | 138 |
Rust | serde2::json | JSON | 122 |
Go | ffjson | JSON | 95.06 |
Go | goprotobuf | Protocol Buffers | 79.78 |
Rust | serde::json | JSON | 67 |
Rust | serialize::json | JSON | 25 |
Go | encoding/json | JSON | 22.79 |
Conclusion
What a long trip it’s been! I hope you enjoyed it. While there are still a few things left to port over from serde1 to serde2 (like the JSON pretty printer), and some things probably should be renamed, I’m happy with the design so I think it’s in a place where people can start using it now. Please let me know how it goes!