Wow, home stretch! Here’s the rest of the series if you want to catch up: part 1, part 2, part 2.1, part 2.2, and part 3.
serde’s approach for serialization works out pretty well. One thing I
forgot to include in the last post was that I also have two benchmarks that
aren’t using serde, but are just safely reading and writing values. Assuming I
haven’t missed anything, they should be the upper limit in performance we can
get out of any serialization framework. Here are the serialization numbers:
| language | test                       | MB/s |
|----------|----------------------------|------|
| rust     | max without string escapes | 353  |
| rust     | max with string escapes    | 234  |
So beyond optimizing string escaping,
serde::json is only 14% slower than the
zero-cost version and 34% slower than rapidjson.
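To make the “zero-cost version” concrete, here’s a minimal sketch of what such a benchmark body could look like: pre-escaped bytes pushed straight into a `Vec<u8>`, with no serialization framework in between. The function name and the fields are made up for illustration, not the actual goser record:

```rust
// Hypothetical sketch of an "upper limit" serialization benchmark: the
// record is emitted as raw, already-escaped bytes, so nothing but the
// byte copies themselves can cost anything.
fn write_log_entry(out: &mut Vec<u8>) {
    out.extend_from_slice(b"{\"timestamp\":");
    out.extend_from_slice(b"2837513946597");
    out.extend_from_slice(b",\"zone_id\":1}");
}

fn main() {
    let mut out = Vec::with_capacity(64);
    write_log_entry(&mut out);
    let s = String::from_utf8(out).unwrap();
    assert_eq!(s, "{\"timestamp\":2837513946597,\"zone_id\":1}");
    println!("{}", s);
}
```

Any real framework has to at least do these writes, which is why this style of benchmark bounds everything else from above.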
Deserialization, on the other hand, still has a ways to go:
| language | test                  | MB/s |
|----------|-----------------------|------|
| rust     | max with Iterator<u8> | 152  |
| rust     | max with Reader       | 78   |
There are a couple interesting things here:
serde::json is built upon consuming from an
Iterator<u8>, so we’re
48% slower than our theoretical max, and 58% slower than
rapidjson. It looks
like tagged tokens, while faster than the closures in
libserialize, are still
leaving a lot of performance on the table.
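To illustrate the shape of that pipeline, here’s a hedged sketch (the names are mine, not serde’s) of byte-at-a-time parsing over an `Iterator<Item = u8>`, the style the deserializer is built on:

```rust
// Sketch of byte-at-a-time parsing from an iterator of bytes: every
// digit costs a peek, a branch, and an advance. Parses an unsigned
// integer and leaves the terminator unconsumed for the next token.
fn parse_uint<I: Iterator<Item = u8>>(bytes: &mut std::iter::Peekable<I>) -> Option<u64> {
    let mut value: u64 = 0;
    let mut seen_digit = false;
    while let Some(&b) = bytes.peek() {
        if b.is_ascii_digit() {
            value = value * 10 + u64::from(b - b'0');
            seen_digit = true;
            bytes.next();
        } else {
            break;
        }
    }
    if seen_digit { Some(value) } else { None }
}

fn main() {
    let mut it = b"12345,".iter().copied().peekable();
    assert_eq!(parse_uint(&mut it), Some(12345));
    // The comma is still there for the caller.
    assert_eq!(it.next(), Some(b','));
}
```

The per-byte branching here is cheap but not free, which is roughly where the gap to the hand-tuned rapidjson parser comes from.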
ffjson is beating us, and it compiles dramatically faster too. The
goser test suite takes about 0.54
seconds to compile, whereas mine takes about 30 seconds with optimizations
enabled (!!). Rust itself only takes 1.5 seconds; the rest is spent in LLVM. With
no optimizations it compiles in “only” 5.6 seconds, but then runs 96% slower.
Reader is a surprisingly expensive trait when dealing with a format
like JSON that needs to read a byte at a time. It turns out we’re not
generating great code for
types with padding, but aatch has been working on fixing this.
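A rough sketch of why Reader hurts here: every byte has to come through a fallible `read` call into a one-byte buffer, so the per-byte cost includes a `Result` (and, back then, a padded IoError) on every call. This is a modern-Rust approximation, not the post’s actual code:

```rust
use std::io::{Cursor, Read};

// Pulling one byte at a time through the Read trait: each byte pays for
// a full read() round trip and a Result, instead of a simple pointer bump.
fn next_byte<R: Read>(reader: &mut R) -> std::io::Result<Option<u8>> {
    let mut buf = [0u8; 1];
    match reader.read(&mut buf)? {
        0 => Ok(None),         // EOF
        _ => Ok(Some(buf[0])), // one byte per call
    }
}

fn main() -> std::io::Result<()> {
    let mut cursor = Cursor::new(b"{}".to_vec());
    assert_eq!(next_byte(&mut cursor)?, Some(b'{'));
    assert_eq!(next_byte(&mut cursor)?, Some(b'}'));
    assert_eq!(next_byte(&mut cursor)?, None);
    Ok(())
}
```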
Since I wrote that last section, I did a little more experimentation to try to figure out why our serialization upper bound is 23% slower than rapidjson. And, well, maybe I found it?
| test                          | MB/s |
|-------------------------------|------|
| serde::json with a MyMemWriter | 346 |
| serde::json with a Vec         |     |
All I did with
MyMemWriter is copy the
Vec::<u8> implementation of
Writer into the local codebase.
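The original listing didn’t survive formatting, so here’s a sketch of the idea in modern Rust (the post predates 1.0, when the trait was `Writer` rather than `Write`): a local newtype around `Vec<u8>` whose trait impl lives in the same compilation unit, so LLVM can see through it.

```rust
use std::io::{self, Write};

// A local copy of the Vec<u8>-style writer. Because the impl is in this
// crate, LLVM can inline the writes instead of going through the
// cross-crate Vec impl.
struct MyMemWriter {
    buf: Vec<u8>,
}

impl MyMemWriter {
    fn with_capacity(cap: usize) -> MyMemWriter {
        MyMemWriter { buf: Vec::with_capacity(cap) }
    }
}

impl Write for MyMemWriter {
    #[inline]
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }

    #[inline]
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut w = MyMemWriter::with_capacity(64);
    w.write_all(b"{\"key\":1}")?;
    assert_eq!(w.buf, b"{\"key\":1}");
    Ok(())
}
```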
Somehow it’s not enough to just mark the methods
#[inline]; having them in the same file gave LLVM enough information to
optimize their overhead away. Even using
Vec::push_all isn’t able to get the same increase, so I’m not sure how to
replicate this in the general case.
Also interesting is
bench_serializer_slice, which uses a BufWriter:

| test                          | MB/s |
|-------------------------------|------|
| serde::json with a BufWriter  | 342  |
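The benchmark name suggests writing into a preallocated slice rather than a growable Vec. The old BufWriter isn’t today’s buffering wrapper; the closest modern analogue is that `&mut [u8]` itself implements `Write`, so a sketch of the same idea under that assumption looks like:

```rust
use std::io::Write;

// Serialize into a fixed, stack-allocated buffer: the Write impl for
// &mut [u8] copies bytes in and advances the slice, with no heap growth.
fn main() -> std::io::Result<()> {
    let mut buf = [0u8; 16];
    let written = {
        let mut slice: &mut [u8] = &mut buf;
        slice.write(b"{\"key\":1}")? // returns how many bytes fit
    };
    assert_eq!(written, 9);
    assert_eq!(&buf[..written], b"{\"key\":1}");
    Ok(())
}
```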
Another digression. Since I wrote the above, aatch has put out some PRs that
should help speed up enums.
With #20060 he was able to optimize
the padding out of enums and fix an issue with returns generating bad code. In
my bug from earlier,
his patches were able to speed up my benchmark returning a
Result<(), IoError> from 40MB/s to 88MB/s. However, if we’re able to shrink
IoError down to a word, we get the performance up to 730MB/s! We
also might get enum compression, so a type like
Result<(), IoError>
would then speed up to 1200MB/s! I think going in this direction is going to really
help speed things up.
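The size effect is easy to see in today’s Rust, where the niche optimization (the “enum compression” hoped for above) has since landed. This sketch uses a made-up `BigError` as a stand-in for the old, padded IoError:

```rust
use std::mem::size_of;

// Stand-in for a large, padded error type.
#[allow(dead_code)]
struct BigError {
    detail: String,
    code: u64,
}

fn main() {
    // Boxing the error gives the compiler a non-null niche, so the whole
    // Result collapses to a single word: the null pattern encodes Ok(()).
    assert_eq!(size_of::<Result<(), Box<BigError>>>(), size_of::<usize>());
    // An inline error payload keeps the Result at least payload-sized,
    // which is what made returning Result<(), IoError> so expensive.
    assert!(size_of::<Result<(), BigError>>() >= size_of::<BigError>());
}
```

A word-sized Result can live in a register across the return, which is exactly why shrinking the error type moves the benchmark so much.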
This post was getting long, so until next time!