Wow, home stretch! Here’s the rest of the series if you want to catch up: part 1, part 2, part 2.1, part 2.2, and part 3.
Overall `serde`'s approach for serialization works out pretty well. One thing I forgot to include in the last post was that I also have two benchmarks that are not using `serde`, but are just safely reading and writing values. Assuming I haven't missed anything, they should be the upper limit in performance we can get out of any serialization framework. Here's serialization:
language | library | serialization (MB/s)
---|---|---
rust | max without string escapes | 353
c++ | rapidjson | 304
rust | max with string escapes | 234
rust | serde::json | 201
rust | serialize::json | 147
go | ffjson | 147
So beyond optimizing string escaping, `serde::json` is only 14% slower than the zero-cost version and 34% slower than `rapidjson`.
Deserialization, on the other hand, still has a ways to go:
language | library | deserialization (MB/s)
---|---|---
c++ | rapidjson (SAX) | 189
c++ | rapidjson (DOM) | 162
rust | max with Iterator<u8> | 152
go | ffjson | 95
rust | max with Reader | 78
rust | serde::json | 73
rust | serialize::json | 24
There are a couple interesting things here:
First, `serde::json` is built upon consuming from an `Iterator<u8>`, so we're 52% slower than our theoretical max and 61% slower than `rapidjson`'s SAX parser. It looks like tagged tokens, while faster than the closures in `libserialize`, are still pretty expensive.
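To make that cost concrete, here's a minimal sketch of the tagged-token idea in today's Rust (the `Token` enum is illustrative, not serde's actual token type): every value flows through a runtime match on the tag, even when the consumer statically knows what it expects, which is overhead a direct parser avoids.

```rust
#[allow(dead_code)]
enum Token {
    Null,
    Bool(bool),
    I64(i64),
    Str(String),
    SeqStart(usize),
    End,
}

// Every token pays for a branch on the discriminant here, even though
// this consumer only ever cares about integers.
fn sum_ints<I: Iterator<Item = Token>>(tokens: I) -> i64 {
    let mut total = 0;
    for token in tokens {
        if let Token::I64(value) = token {
            total += value;
        }
    }
    total
}

fn main() {
    let tokens = vec![
        Token::SeqStart(3),
        Token::I64(1),
        Token::I64(2),
        Token::I64(3),
        Token::End,
    ];
    assert_eq!(sum_ints(tokens.into_iter()), 6);
}
```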
Second, `ffjson` is beating us, and it compiles dramatically faster too. The goser test suite takes about 0.54 seconds to compile, whereas mine takes about 30 seconds at `--opt-level=3` (!!). Rust itself only takes 1.5 seconds; the rest is spent in LLVM. With no optimization, it compiles in “only” 5.6 seconds, but then runs 96% slower.
Third, `Reader` is a surprisingly expensive trait when dealing with a format like JSON that needs to read a byte at a time. It turns out we're not generating great code for types with padding. aatch has been working on fixing this, though.
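As a rough illustration of why byte-at-a-time reads through the trait hurt (sketched with today's `std::io::Read` rather than the old `Reader`): every single byte costs a full `read` call plus an `io::Result` return value, and the padding in that returned type is exactly where the poor codegen shows up.

```rust
use std::io::Read;

// Pull a single byte through the Read trait: one `read` call and one
// io::Result constructed and moved per byte of input.
fn next_byte<R: Read>(reader: &mut R) -> std::io::Result<Option<u8>> {
    let mut buf = [0u8; 1];
    match reader.read(&mut buf)? {
        0 => Ok(None),         // EOF
        _ => Ok(Some(buf[0])), // the one byte we asked for
    }
}

fn main() -> std::io::Result<()> {
    // &[u8] implements Read, so a slice stands in for a real stream.
    let mut reader: &[u8] = b"{\"key\": 1}";
    while let Some(byte) = next_byte(&mut reader)? {
        print!("{} ", byte);
    }
    Ok(())
}
```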
Since I wrote that last section, I did a little more experimentation to try to figure out why our serialization upper bound is 23% slower than rapidjson. And, well, maybe I found it?
writer | serialization (MB/s)
---|---
serde::json with a MyMemWriter | 346
serde::json with a Vec<u8> | 247
All I did with `MyMemWriter` is copy the `Vec::<u8>` implementation of `Writer` into the local codebase:
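A minimal sketch of the idea (using today's `std::io::Write` in place of the old `Writer` trait, so the names differ from the 2014 code):

```rust
use std::io::{self, Write};

// Local newtype around Vec<u8>; its write impl mirrors Vec's own, but
// living in the same crate lets LLVM see through it.
struct MyMemWriter {
    buf: Vec<u8>,
}

impl Write for MyMemWriter {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // Same body as Vec<u8>'s impl: append the bytes, never fail.
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut writer = MyMemWriter { buf: Vec::new() };
    writer.write_all(b"{\"key\":1}")?;
    assert_eq!(writer.buf, b"{\"key\":1}");
    Ok(())
}
```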
Somehow it's not enough to just mark `Vec::write` as `#[inline]`; having it in the same file gave LLVM enough information to optimize its overhead away. Even using `#[inline(always)]` on `Vec::write` and `Vec::push_all` isn't able to get the same increase, so I'm not sure how to replicate this in the general case.
Also interesting is `bench_serializer_slice`, which uses `BufWriter`.
writer | serialization (MB/s)
---|---
serde::json with a BufWriter | 342
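The benchmark amounts to serializing into a fixed, preallocated slice. A rough sketch of that shape, assuming a `Cursor` over a slice in place of the old slice-backed `BufWriter`:

```rust
use std::io::{Cursor, Write};

fn main() -> std::io::Result<()> {
    // Preallocate a fixed buffer, as the old slice-backed BufWriter
    // did, so serialization never touches the allocator.
    let mut buf = [0u8; 64];
    let mut writer = Cursor::new(&mut buf[..]);
    writer.write_all(b"{\"key\":1}")?;

    // The cursor's position tells us how much of the slice was used.
    let len = writer.position() as usize;
    assert_eq!(&buf[..len], b"{\"key\":1}");
    Ok(())
}
```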
Another digression. Since I wrote the above, aatch has put out some PRs that should help speed up enums: #19898 and #20060 optimize the padding out of enums and fix an issue with returns generating bad code. In my bug from earlier, his patches were able to speed up my benchmark returning a `Result<(), IoError>` from 40MB/s to 88MB/s. However, if we're able to reduce `IoError` down to a word, we get the performance up to 730MB/s! We also might get enum compression, so a type like `Result<(), IoError>` would then speed up to 1200MB/s! I think going in this direction is really going to help speed things up.
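To see why the error type's size matters so much, here's a quick illustration of my own (the `BigError` fields only approximate the old `IoError`'s shape): boxing the payload shrinks the error to a single word, and the null-pointer niche then lets the compiler compress the whole `Result` down to one word, which is the kind of enum compression described above.

```rust
use std::mem::size_of;

// Illustrative stand-in for a wide error type: an inline String
// payload plus extra fields makes every Result<(), E> this wide.
#[allow(dead_code)]
struct BigError {
    kind: u32,
    detail: Option<String>,
    os_error: Option<i32>,
}

// One-word error: the payload lives behind a Box, and the non-null
// pointer gives Result<(), SmallError> a niche to compress into.
#[allow(dead_code)]
struct SmallError(Box<BigError>);

fn main() {
    println!("Result<(), BigError>:   {} bytes", size_of::<Result<(), BigError>>());
    println!("Result<(), SmallError>: {} bytes", size_of::<Result<(), SmallError>>());
}
```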
This post has been taking a while, so until next time!