util: compression APIs to support revlog compression
As part of "zstd all of the things," we need to teach revlogs to
use non-zlib compression formats. Because we're routing all compression
via the "compression manager" and "compression engine" APIs, we need to
introduce functionality there for performing revlog operations.
Ideally, revlog compression and decompression operations would be
implemented in terms of simple "compress" and "decompress" primitives.
However, there are a few considerations that make us want to have a
specialized primitive for handling revlogs:
1) Performance. Revlogs tend to do compression and especially
decompression operations in batches. Any overhead for e.g.
instantiating a "context" for performing an operation can be
noticed. For this reason, our "revlog compressor" primitive is
reusable. For zstd, we reuse the same compression "context" for
multiple operations. I've measured this to be faster than
constructing a new context for each operation.
2) Specialization. By having a primitive dedicated to revlog use,
we can make revlog-specific choices and leave the door open for
more functionality in the future. For example, the zstd revlog
compressor may one day make use of dictionary compression.
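A minimal sketch of the reuse pattern from point 1, not the code in this patch. All names (ZlibEngine, revlogcompressor, ReusableCompressor) are hypothetical, and zlib stands in for zstd so the sketch runs with the standard library alone. The reusable state here is a pre-configured template object that is copied per call:

```python
import zlib


class ReusableCompressor(object):
    """Hypothetical revlog compressor: set up once, used for many chunks."""

    def __init__(self, level=6):
        # Stand-in for one-time setup (for zstd, creating a context).
        self._template = zlib.compressobj(level)

    def compress(self, data):
        # Copy the pristine template instead of redoing the setup.
        z = self._template.copy()
        return z.compress(data) + z.flush()


class ZlibEngine(object):
    """Hypothetical compression engine that hands out revlog compressors."""

    def revlogcompressor(self):
        return ReusableCompressor()


# One compressor object serves the whole batch.
compressor = ZlibEngine().revlogcompressor()
chunks = [b"a" * 100, b"b" * 200, b"c" * 300]
compressed = [compressor.compress(c) for c in chunks]
```

The point is only the shape: the engine returns an object that owns the setup, and callers keep that object around for a batch of operations.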
A future patch will introduce a decompress() on the compressor
object.
The code for the zlib compressor is basically copied from
revlog.compress(), although it doesn't handle the empty input
case, the null first byte case, or the 'u' prefix case. These
cases will continue to be handled in revlog.py once that code is
ported to use this API.
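A rough sketch of that division of labor, under stated assumptions. The class and function names are hypothetical, returning None for "compression did not help" is an assumption of this sketch, and store_chunk() is only a guess at how a caller could cover the three cases listed above:

```python
import zlib


class ZlibRevlogCompressor(object):
    """Hypothetical compressor: only compresses, no special cases."""

    def compress(self, data):
        # Empty input is the caller's responsibility.
        assert len(data) > 0
        compressed = zlib.compress(data)
        if len(compressed) < len(data):
            return compressed
        # Assumption: None signals that compression did not shrink the data.
        return None


def store_chunk(compressor, data):
    """Hypothetical caller handling the cases left to revlog.py."""
    if not data:
        return data  # empty input case
    compressed = compressor.compress(data)
    if compressed is not None:
        return compressed
    if data[0:1] == b"\0":
        return data  # null first byte case: stored as-is
    return b"u" + data  # 'u' prefix case: marks uncompressed data
```

Keeping the special cases in the caller lets every engine's compressor stay a plain "bytes in, smaller bytes or nothing out" primitive.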