##// END OF EJS Templates
copies-rust: add smarter approach for merging small mapping with large mapping...
copies-rust: add smarter approach for merging small mapping with large mapping The current approach (finding the smaller updated set) works great when the mapping have similar size, but do a lot of unnecessary work when one side is tinier than the other one. So we do better in theses cases. See inline documentation for details. It give a sizeable boost to many of out slower cases: Repo Case Source-Rev Dest-Rev # of revisions old time new time Difference Factor time per rev --------------------------------------------------------------------------------------------------------------------------------------------------------------- mozilla-try x00000_revs_x_added_0_copies 6a320851d377 1ebb79acd503 : 363753 revs, 18.123103 s, 5.693818 s, -12.429285 s, × 0.3142, 15 µs/rev mozilla-try x00000_revs_x_added_x_copies 5173c4b6f97c 95d83ee7242d : 362229 revs, 17.907312 s, 5.677655 s, -12.229657 s, × 0.3171, 15 µs/rev mozilla-try x00000_revs_x000_added_x_copies 9126823d0e9c ca82787bb23c : 359344 revs, 17.684797 s, 5.563370 s, -12.121427 s, × 0.3146, 15 µs/rev mozilla-try x00000_revs_x0000_added_x0000_copies 8d3fafa80d4b eb884023b810 : 192665 revs, 2.881471 s, 2.864099 s, -0.017372 s, × 0.9940, 14 µs/rev mozilla-try x00000_revs_x00000_added_x000_copies 9b2a99adc05e 8e29777b48e6 : 382065 revs, 63.148971 s, 59.498652 s, -3.650319 s, × 0.9422, 155 µs/rev mozilla-try x00000_revs_x00000_added_x000_copies 9b2a99adc05e 8e29777b48e6 : 382065 revs, 63.148971 s, 59.498652 s, -3.650319 s, × 0.9422, 155 µs/rev ideally, the im-rs object would have a `merge` method, but it does not (yet) Full timing comparison below (they are one pathological case than become even worse, for unclear reason). Repo Case Source-Rev Dest-Rev # of revisions old time new time Difference Factor time per rev --------------------------------------------------------------------------------------------------------------------------------------------------------------- mercurial x_revs_x_added_0_copies ad6b123de1c7 39cfcef4f463 : 1 revs, 0.000043 s, 0.000042 s, -0.000001 s, × 0.9767, 42 µs/rev mercurial x_revs_x_added_x_copies 2b1c78674230 0c1d10351869 : 6 revs, 0.000105 s, 0.000104 s, -0.000001 s, × 0.9905, 17 µs/rev mercurial x000_revs_x000_added_x_copies 81f8ff2a9bf2 dd3267698d84 : 1032 revs, 0.004895 s, 0.004913 s, +0.000018 s, × 1.0037, 4 µs/rev pypy x_revs_x_added_0_copies aed021ee8ae8 099ed31b181b : 9 revs, 0.000194 s, 0.000191 s, -0.000003 s, × 0.9845, 21 µs/rev pypy x_revs_x000_added_0_copies 4aa4e1f8e19a 359343b9ac0e : 1 revs, 0.000050 s, 0.000050 s, +0.000000 s, × 1.0000, 50 µs/rev pypy x_revs_x_added_x_copies ac52eb7bbbb0 72e022663155 : 7 revs, 0.000115 s, 0.000112 s, -0.000003 s, × 0.9739, 16 µs/rev pypy x_revs_x00_added_x_copies c3b14617fbd7 ace7255d9a26 : 1 revs, 0.000289 s, 0.000288 s, -0.000001 s, × 0.9965, 288 µs/rev pypy x_revs_x000_added_x000_copies df6f7a526b60 a83dc6a2d56f : 6 revs, 0.010513 s, 0.010411 s, -0.000102 s, × 0.9903, 1735 µs/rev pypy x000_revs_xx00_added_0_copies 89a76aede314 2f22446ff07e : 4785 revs, 0.051474 s, 0.052852 s, +0.001378 s, × 1.0268, 11 µs/rev pypy x000_revs_x000_added_x_copies 8a3b5bfd266e 2c68e87c3efe : 6780 revs, 0.088086 s, 0.092828 s, +0.004742 s, × 1.0538, 13 µs/rev pypy x000_revs_x000_added_x000_copies 89a76aede314 7b3dda341c84 : 5441 revs, 0.062176 s, 0.063269 s, +0.001093 s, × 1.0176, 11 µs/rev pypy x0000_revs_x_added_0_copies d1defd0dc478 c9cb1334cc78 : 43645 revs, 0.720950 s, 0.711975 s, -0.008975 s, × 0.9876, 16 µs/rev pypy x0000_revs_xx000_added_0_copies bf2c629d0071 4ffed77c095c : 2 revs, 0.012897 s, 0.012771 s, -0.000126 s, × 0.9902, 6385 µs/rev pypy x0000_revs_xx000_added_x000_copies 08ea3258278e d9fa043f30c0 : 11316 revs, 0.121524 s, 0.124505 s, +0.002981 s, × 1.0245, 11 µs/rev netbeans x_revs_x_added_0_copies fb0955ffcbcd a01e9239f9e7 : 2 revs, 0.000082 s, 0.000082 s, +0.000000 s, × 1.0000, 41 µs/rev netbeans x_revs_x000_added_0_copies 6f360122949f 20eb231cc7d0 : 2 revs, 0.000109 s, 0.000111 s, +0.000002 s, × 1.0183, 55 µs/rev netbeans x_revs_x_added_x_copies 1ada3faf6fb6 5a39d12eecf4 : 3 revs, 0.000175 s, 0.000171 s, -0.000004 s, × 0.9771, 57 µs/rev netbeans x_revs_x00_added_x_copies 35be93ba1e2c 9eec5e90c05f : 9 revs, 0.000719 s, 0.000708 s, -0.000011 s, × 0.9847, 78 µs/rev netbeans x000_revs_xx00_added_0_copies eac3045b4fdd 51d4ae7f1290 : 1421 revs, 0.010426 s, 0.010608 s, +0.000182 s, × 1.0175, 7 µs/rev netbeans x000_revs_x000_added_x_copies e2063d266acd 6081d72689dc : 1533 revs, 0.015712 s, 0.015635 s, -0.000077 s, × 0.9951, 10 µs/rev netbeans x000_revs_x000_added_x000_copies ff453e9fee32 411350406ec2 : 5750 revs, 0.077353 s, 0.072072 s, -0.005281 s, × 0.9317, 12 µs/rev netbeans x0000_revs_xx000_added_x000_copies 588c2d1ced70 1aad62e59ddd : 66949 revs, 0.673930 s, 0.682732 s, +0.008802 s, × 1.0131, 10 µs/rev mozilla-central x_revs_x_added_0_copies 3697f962bb7b 7015fcdd43a2 : 2 revs, 0.000089 s, 0.000090 s, +0.000001 s, × 1.0112, 45 µs/rev mozilla-central x_revs_x000_added_0_copies dd390860c6c9 40d0c5bed75d : 8 revs, 0.000212 s, 0.000210 s, -0.000002 s, × 0.9906, 26 µs/rev mozilla-central x_revs_x_added_x_copies 8d198483ae3b 14207ffc2b2f : 9 revs, 0.000183 s, 0.000182 s, -0.000001 s, × 0.9945, 20 µs/rev mozilla-central x_revs_x00_added_x_copies 98cbc58cc6bc 446a150332c3 : 7 revs, 0.000595 s, 0.000594 s, -0.000001 s, × 0.9983, 84 µs/rev mozilla-central x_revs_x000_added_x000_copies 3c684b4b8f68 0a5e72d1b479 : 3 revs, 0.003117 s, 0.003102 s, -0.000015 s, × 0.9952, 1034 µs/rev mozilla-central x_revs_x0000_added_x0000_copies effb563bb7e5 c07a39dc4e80 : 6 revs, 0.060197 s, 0.060234 s, +0.000037 s, × 1.0006, 10039 µs/rev mozilla-central x000_revs_xx00_added_0_copies 6100d773079a 04a55431795e : 1593 revs, 0.006379 s, 0.006300 s, -0.000079 s, × 0.9876, 3 µs/rev mozilla-central x000_revs_x000_added_x_copies 9f17a6fc04f9 2d37b966abed : 41 revs, 0.005008 s, 0.004817 s, -0.000191 s, × 0.9619, 117 µs/rev mozilla-central x000_revs_x000_added_x000_copies 7c97034feb78 4407bd0c6330 : 7839 revs, 0.065123 s, 0.065451 s, +0.000328 s, × 1.0050, 8 µs/rev mozilla-central x0000_revs_xx000_added_0_copies 9eec5917337d 67118cc6dcad : 615 revs, 0.026404 s, 0.026282 s, -0.000122 s, × 0.9954, 42 µs/rev mozilla-central x0000_revs_xx000_added_x000_copies f78c615a656c 96a38b690156 : 30263 revs, 0.203456 s, 0.206873 s, +0.003417 s, × 1.0168, 6 µs/rev mozilla-central x00000_revs_x0000_added_x0000_copies 6832ae71433c 4c222a1d9a00 : 153721 revs, 1.929809 s, 1.935918 s, +0.006109 s, × 1.0032, 12 µs/rev mozilla-central x00000_revs_x00000_added_x000_copies 76caed42cf7c 1daa622bbe42 : 204976 revs, 2.825064 s, 2.827320 s, +0.002256 s, × 1.0008, 13 µs/rev mozilla-try x_revs_x_added_0_copies aaf6dde0deb8 9790f499805a : 2 revs, 0.000857 s, 0.000842 s, -0.000015 s, × 0.9825, 421 µs/rev mozilla-try x_revs_x000_added_0_copies d8d0222927b4 5bb8ce8c7450 : 2 revs, 0.000870 s, 0.000870 s, +0.000000 s, × 1.0000, 435 µs/rev mozilla-try x_revs_x_added_x_copies 092fcca11bdb 936255a0384a : 4 revs, 0.000161 s, 0.000165 s, +0.000004 s, × 1.0248, 41 µs/rev mozilla-try x_revs_x00_added_x_copies b53d2fadbdb5 017afae788ec : 2 revs, 0.001147 s, 0.001145 s, -0.000002 s, × 0.9983, 572 µs/rev mozilla-try x_revs_x000_added_x000_copies 20408ad61ce5 6f0ee96e21ad : 1 revs, 0.026640 s, 0.026500 s, -0.000140 s, × 0.9947, 26500 µs/rev mozilla-try x_revs_x0000_added_x0000_copies effb563bb7e5 c07a39dc4e80 : 6 revs, 0.059849 s, 0.059407 s, -0.000442 s, × 0.9926, 9901 µs/rev mozilla-try x000_revs_xx00_added_0_copies 6100d773079a 04a55431795e : 1593 revs, 0.006326 s, 0.006325 s, -0.000001 s, × 0.9998, 3 µs/rev mozilla-try x000_revs_x000_added_x_copies 9f17a6fc04f9 2d37b966abed : 41 revs, 0.005188 s, 0.005171 s, -0.000017 s, × 0.9967, 126 µs/rev mozilla-try x000_revs_x000_added_x000_copies 1346fd0130e4 4c65cbdabc1f : 6657 revs, 0.067633 s, 0.066837 s, -0.000796 s, × 0.9882, 10 µs/rev mozilla-try x0000_revs_x_added_0_copies 63519bfd42ee a36a2a865d92 : 40314 revs, 0.306969 s, 0.314252 s, +0.007283 s, × 1.0237, 7 µs/rev mozilla-try x0000_revs_x_added_x_copies 9fe69ff0762d bcabf2a78927 : 38690 revs, 0.293370 s, 0.304160 s, +0.010790 s, × 1.0368, 7 µs/rev mozilla-try x0000_revs_xx000_added_x_copies 156f6e2674f2 4d0f2c178e66 : 8598 revs, 0.087159 s, 0.089223 s, +0.002064 s, × 1.0237, 10 µs/rev mozilla-try x0000_revs_xx000_added_0_copies 9eec5917337d 67118cc6dcad : 615 revs, 0.027251 s, 0.026711 s, -0.000540 s, × 0.9802, 43 µs/rev mozilla-try x0000_revs_xx000_added_x000_copies 89294cd501d9 7ccb2fc7ccb5 : 97052 revs, 3.010011 s, 3.243010 s, +0.232999 s, × 1.0774, 33 µs/rev mozilla-try x0000_revs_x0000_added_x0000_copies e928c65095ed e951f4ad123a : 52031 revs, 0.753434 s, 0.756500 s, +0.003066 s, × 1.0041, 14 µs/rev mozilla-try x00000_revs_x_added_0_copies 6a320851d377 1ebb79acd503 : 363753 revs, 18.123103 s, 5.693818 s, -12.429285 s, × 0.3142, 15 µs/rev mozilla-try x00000_revs_x00000_added_0_copies dc8a3ca7010e d16fde900c9c : 34414 revs, 0.583206 s, 0.590904 s, +0.007698 s, × 1.0132, 17 µs/rev mozilla-try x00000_revs_x_added_x_copies 5173c4b6f97c 95d83ee7242d : 362229 revs, 17.907312 s, 5.677655 s, -12.229657 s, × 0.3171, 15 µs/rev mozilla-try x00000_revs_x000_added_x_copies 9126823d0e9c ca82787bb23c : 359344 revs, 17.684797 s, 5.563370 s, -12.121427 s, × 0.3146, 15 µs/rev mozilla-try x00000_revs_x0000_added_x0000_copies 8d3fafa80d4b eb884023b810 : 192665 revs, 2.881471 s, 2.864099 s, -0.017372 s, × 0.9940, 14 µs/rev mozilla-try x00000_revs_x00000_added_x0000_copies 1b661134e2ca 1ae03d022d6d : 228985 revs, 101.062002 s, 113.297287 s, +12.235285 s, × 1.1211, 494 µs/rev mozilla-try x00000_revs_x00000_added_x000_copies 9b2a99adc05e 8e29777b48e6 : 382065 revs, 63.148971 s, 59.498652 s, -3.650319 s, × 0.9422, 155 µs/rev Differential Revision: https://phab.mercurial-scm.org/D9491

File last commit:

r43207:69de49c4 default
r46744:c94d013e default
Show More
decompressionreader.c
781 lines | 17.6 KiB | text/x-c | CLexer
/**
* Copyright (c) 2017-present, Gregory Szorc
* All rights reserved.
*
* This software may be modified and distributed under the terms
* of the BSD license. See the LICENSE file for details.
*/
#include "python-zstandard.h"
extern PyObject* ZstdError;
static void set_unsupported_operation(void) {
PyObject* iomod;
PyObject* exc;
iomod = PyImport_ImportModule("io");
if (NULL == iomod) {
return;
}
exc = PyObject_GetAttrString(iomod, "UnsupportedOperation");
if (NULL == exc) {
Py_DECREF(iomod);
return;
}
PyErr_SetNone(exc);
Py_DECREF(exc);
Py_DECREF(iomod);
}
static void reader_dealloc(ZstdDecompressionReader* self) {
Py_XDECREF(self->decompressor);
Py_XDECREF(self->reader);
if (self->buffer.buf) {
PyBuffer_Release(&self->buffer);
}
PyObject_Del(self);
}
static ZstdDecompressionReader* reader_enter(ZstdDecompressionReader* self) {
if (self->entered) {
PyErr_SetString(PyExc_ValueError, "cannot __enter__ multiple times");
return NULL;
}
self->entered = 1;
Py_INCREF(self);
return self;
}
static PyObject* reader_exit(ZstdDecompressionReader* self, PyObject* args) {
PyObject* exc_type;
PyObject* exc_value;
PyObject* exc_tb;
if (!PyArg_ParseTuple(args, "OOO:__exit__", &exc_type, &exc_value, &exc_tb)) {
return NULL;
}
self->entered = 0;
self->closed = 1;
/* Release resources. */
Py_CLEAR(self->reader);
if (self->buffer.buf) {
PyBuffer_Release(&self->buffer);
memset(&self->buffer, 0, sizeof(self->buffer));
}
Py_CLEAR(self->decompressor);
Py_RETURN_FALSE;
}
static PyObject* reader_readable(PyObject* self) {
Py_RETURN_TRUE;
}
static PyObject* reader_writable(PyObject* self) {
Py_RETURN_FALSE;
}
static PyObject* reader_seekable(PyObject* self) {
Py_RETURN_TRUE;
}
static PyObject* reader_close(ZstdDecompressionReader* self) {
self->closed = 1;
Py_RETURN_NONE;
}
static PyObject* reader_flush(PyObject* self) {
Py_RETURN_NONE;
}
static PyObject* reader_isatty(PyObject* self) {
Py_RETURN_FALSE;
}
/**
* Read available input.
*
* Returns 0 if no data was added to input.
* Returns 1 if new input data is available.
* Returns -1 on error and sets a Python exception as a side-effect.
*/
int read_decompressor_input(ZstdDecompressionReader* self) {
if (self->finishedInput) {
return 0;
}
if (self->input.pos != self->input.size) {
return 0;
}
if (self->reader) {
Py_buffer buffer;
assert(self->readResult == NULL);
self->readResult = PyObject_CallMethod(self->reader, "read",
"k", self->readSize);
if (NULL == self->readResult) {
return -1;
}
memset(&buffer, 0, sizeof(buffer));
if (0 != PyObject_GetBuffer(self->readResult, &buffer, PyBUF_CONTIG_RO)) {
return -1;
}
/* EOF */
if (0 == buffer.len) {
self->finishedInput = 1;
Py_CLEAR(self->readResult);
}
else {
self->input.src = buffer.buf;
self->input.size = buffer.len;
self->input.pos = 0;
}
PyBuffer_Release(&buffer);
}
else {
assert(self->buffer.buf);
/*
* We should only get here once since expectation is we always
* exhaust input buffer before reading again.
*/
assert(self->input.src == NULL);
self->input.src = self->buffer.buf;
self->input.size = self->buffer.len;
self->input.pos = 0;
}
return 1;
}
/**
* Decompresses available input into an output buffer.
*
* Returns 0 if we need more input.
* Returns 1 if output buffer should be emitted.
* Returns -1 on error and sets a Python exception.
*/
int decompress_input(ZstdDecompressionReader* self, ZSTD_outBuffer* output) {
size_t zresult;
if (self->input.pos >= self->input.size) {
return 0;
}
Py_BEGIN_ALLOW_THREADS
zresult = ZSTD_decompressStream(self->decompressor->dctx, output, &self->input);
Py_END_ALLOW_THREADS
/* Input exhausted. Clear our state tracking. */
if (self->input.pos == self->input.size) {
memset(&self->input, 0, sizeof(self->input));
Py_CLEAR(self->readResult);
if (self->buffer.buf) {
self->finishedInput = 1;
}
}
if (ZSTD_isError(zresult)) {
PyErr_Format(ZstdError, "zstd decompress error: %s", ZSTD_getErrorName(zresult));
return -1;
}
/* We fulfilled the full read request. Signal to emit. */
if (output->pos && output->pos == output->size) {
return 1;
}
/* We're at the end of a frame and we aren't allowed to return data
spanning frames. */
else if (output->pos && zresult == 0 && !self->readAcrossFrames) {
return 1;
}
/* There is more room in the output. Signal to collect more data. */
return 0;
}
static PyObject* reader_read(ZstdDecompressionReader* self, PyObject* args, PyObject* kwargs) {
static char* kwlist[] = {
"size",
NULL
};
Py_ssize_t size = -1;
PyObject* result = NULL;
char* resultBuffer;
Py_ssize_t resultSize;
ZSTD_outBuffer output;
int decompressResult, readResult;
if (self->closed) {
PyErr_SetString(PyExc_ValueError, "stream is closed");
return NULL;
}
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) {
return NULL;
}
if (size < -1) {
PyErr_SetString(PyExc_ValueError, "cannot read negative amounts less than -1");
return NULL;
}
if (size == -1) {
return PyObject_CallMethod((PyObject*)self, "readall", NULL);
}
if (self->finishedOutput || size == 0) {
return PyBytes_FromStringAndSize("", 0);
}
result = PyBytes_FromStringAndSize(NULL, size);
if (NULL == result) {
return NULL;
}
PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize);
output.dst = resultBuffer;
output.size = resultSize;
output.pos = 0;
readinput:
decompressResult = decompress_input(self, &output);
if (-1 == decompressResult) {
Py_XDECREF(result);
return NULL;
}
else if (0 == decompressResult) { }
else if (1 == decompressResult) {
self->bytesDecompressed += output.pos;
if (output.pos != output.size) {
if (safe_pybytes_resize(&result, output.pos)) {
Py_XDECREF(result);
return NULL;
}
}
return result;
}
else {
assert(0);
}
readResult = read_decompressor_input(self);
if (-1 == readResult) {
Py_XDECREF(result);
return NULL;
}
else if (0 == readResult) {}
else if (1 == readResult) {}
else {
assert(0);
}
if (self->input.size) {
goto readinput;
}
/* EOF */
self->bytesDecompressed += output.pos;
if (safe_pybytes_resize(&result, output.pos)) {
Py_XDECREF(result);
return NULL;
}
return result;
}
static PyObject* reader_read1(ZstdDecompressionReader* self, PyObject* args, PyObject* kwargs) {
static char* kwlist[] = {
"size",
NULL
};
Py_ssize_t size = -1;
PyObject* result = NULL;
char* resultBuffer;
Py_ssize_t resultSize;
ZSTD_outBuffer output;
if (self->closed) {
PyErr_SetString(PyExc_ValueError, "stream is closed");
return NULL;
}
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &size)) {
return NULL;
}
if (size < -1) {
PyErr_SetString(PyExc_ValueError, "cannot read negative amounts less than -1");
return NULL;
}
if (self->finishedOutput || size == 0) {
return PyBytes_FromStringAndSize("", 0);
}
if (size == -1) {
size = ZSTD_DStreamOutSize();
}
result = PyBytes_FromStringAndSize(NULL, size);
if (NULL == result) {
return NULL;
}
PyBytes_AsStringAndSize(result, &resultBuffer, &resultSize);
output.dst = resultBuffer;
output.size = resultSize;
output.pos = 0;
/* read1() is supposed to use at most 1 read() from the underlying stream.
* However, we can't satisfy this requirement with decompression due to the
* nature of how decompression works. Our strategy is to read + decompress
* until we get any output, at which point we return. This satisfies the
* intent of the read1() API to limit read operations.
*/
while (!self->finishedInput) {
int readResult, decompressResult;
readResult = read_decompressor_input(self);
if (-1 == readResult) {
Py_XDECREF(result);
return NULL;
}
else if (0 == readResult || 1 == readResult) { }
else {
assert(0);
}
decompressResult = decompress_input(self, &output);
if (-1 == decompressResult) {
Py_XDECREF(result);
return NULL;
}
else if (0 == decompressResult || 1 == decompressResult) { }
else {
assert(0);
}
if (output.pos) {
break;
}
}
self->bytesDecompressed += output.pos;
if (safe_pybytes_resize(&result, output.pos)) {
Py_XDECREF(result);
return NULL;
}
return result;
}
static PyObject* reader_readinto(ZstdDecompressionReader* self, PyObject* args) {
Py_buffer dest;
ZSTD_outBuffer output;
int decompressResult, readResult;
PyObject* result = NULL;
if (self->closed) {
PyErr_SetString(PyExc_ValueError, "stream is closed");
return NULL;
}
if (self->finishedOutput) {
return PyLong_FromLong(0);
}
if (!PyArg_ParseTuple(args, "w*:readinto", &dest)) {
return NULL;
}
if (!PyBuffer_IsContiguous(&dest, 'C') || dest.ndim > 1) {
PyErr_SetString(PyExc_ValueError,
"destination buffer should be contiguous and have at most one dimension");
goto finally;
}
output.dst = dest.buf;
output.size = dest.len;
output.pos = 0;
readinput:
decompressResult = decompress_input(self, &output);
if (-1 == decompressResult) {
goto finally;
}
else if (0 == decompressResult) { }
else if (1 == decompressResult) {
self->bytesDecompressed += output.pos;
result = PyLong_FromSize_t(output.pos);
goto finally;
}
else {
assert(0);
}
readResult = read_decompressor_input(self);
if (-1 == readResult) {
goto finally;
}
else if (0 == readResult) {}
else if (1 == readResult) {}
else {
assert(0);
}
if (self->input.size) {
goto readinput;
}
/* EOF */
self->bytesDecompressed += output.pos;
result = PyLong_FromSize_t(output.pos);
finally:
PyBuffer_Release(&dest);
return result;
}
static PyObject* reader_readinto1(ZstdDecompressionReader* self, PyObject* args) {
Py_buffer dest;
ZSTD_outBuffer output;
PyObject* result = NULL;
if (self->closed) {
PyErr_SetString(PyExc_ValueError, "stream is closed");
return NULL;
}
if (self->finishedOutput) {
return PyLong_FromLong(0);
}
if (!PyArg_ParseTuple(args, "w*:readinto1", &dest)) {
return NULL;
}
if (!PyBuffer_IsContiguous(&dest, 'C') || dest.ndim > 1) {
PyErr_SetString(PyExc_ValueError,
"destination buffer should be contiguous and have at most one dimension");
goto finally;
}
output.dst = dest.buf;
output.size = dest.len;
output.pos = 0;
while (!self->finishedInput && !self->finishedOutput) {
int decompressResult, readResult;
readResult = read_decompressor_input(self);
if (-1 == readResult) {
goto finally;
}
else if (0 == readResult || 1 == readResult) {}
else {
assert(0);
}
decompressResult = decompress_input(self, &output);
if (-1 == decompressResult) {
goto finally;
}
else if (0 == decompressResult || 1 == decompressResult) {}
else {
assert(0);
}
if (output.pos) {
break;
}
}
self->bytesDecompressed += output.pos;
result = PyLong_FromSize_t(output.pos);
finally:
PyBuffer_Release(&dest);
return result;
}
static PyObject* reader_readall(PyObject* self) {
PyObject* chunks = NULL;
PyObject* empty = NULL;
PyObject* result = NULL;
/* Our strategy is to collect chunks into a list then join all the
* chunks at the end. We could potentially use e.g. an io.BytesIO. But
* this feels simple enough to implement and avoids potentially expensive
* reallocations of large buffers.
*/
chunks = PyList_New(0);
if (NULL == chunks) {
return NULL;
}
while (1) {
PyObject* chunk = PyObject_CallMethod(self, "read", "i", 1048576);
if (NULL == chunk) {
Py_DECREF(chunks);
return NULL;
}
if (!PyBytes_Size(chunk)) {
Py_DECREF(chunk);
break;
}
if (PyList_Append(chunks, chunk)) {
Py_DECREF(chunk);
Py_DECREF(chunks);
return NULL;
}
Py_DECREF(chunk);
}
empty = PyBytes_FromStringAndSize("", 0);
if (NULL == empty) {
Py_DECREF(chunks);
return NULL;
}
result = PyObject_CallMethod(empty, "join", "O", chunks);
Py_DECREF(empty);
Py_DECREF(chunks);
return result;
}
static PyObject* reader_readline(PyObject* self) {
set_unsupported_operation();
return NULL;
}
static PyObject* reader_readlines(PyObject* self) {
set_unsupported_operation();
return NULL;
}
static PyObject* reader_seek(ZstdDecompressionReader* self, PyObject* args) {
Py_ssize_t pos;
int whence = 0;
unsigned long long readAmount = 0;
size_t defaultOutSize = ZSTD_DStreamOutSize();
if (self->closed) {
PyErr_SetString(PyExc_ValueError, "stream is closed");
return NULL;
}
if (!PyArg_ParseTuple(args, "n|i:seek", &pos, &whence)) {
return NULL;
}
if (whence == SEEK_SET) {
if (pos < 0) {
PyErr_SetString(PyExc_ValueError,
"cannot seek to negative position with SEEK_SET");
return NULL;
}
if ((unsigned long long)pos < self->bytesDecompressed) {
PyErr_SetString(PyExc_ValueError,
"cannot seek zstd decompression stream backwards");
return NULL;
}
readAmount = pos - self->bytesDecompressed;
}
else if (whence == SEEK_CUR) {
if (pos < 0) {
PyErr_SetString(PyExc_ValueError,
"cannot seek zstd decompression stream backwards");
return NULL;
}
readAmount = pos;
}
else if (whence == SEEK_END) {
/* We /could/ support this with pos==0. But let's not do that until someone
needs it. */
PyErr_SetString(PyExc_ValueError,
"zstd decompression streams cannot be seeked with SEEK_END");
return NULL;
}
/* It is a bit inefficient to do this via the Python API. But since there
is a bit of state tracking involved to read from this type, it is the
easiest to implement. */
while (readAmount) {
Py_ssize_t readSize;
PyObject* readResult = PyObject_CallMethod((PyObject*)self, "read", "K",
readAmount < defaultOutSize ? readAmount : defaultOutSize);
if (!readResult) {
return NULL;
}
readSize = PyBytes_GET_SIZE(readResult);
Py_CLEAR(readResult);
/* Empty read means EOF. */
if (!readSize) {
break;
}
readAmount -= readSize;
}
return PyLong_FromUnsignedLongLong(self->bytesDecompressed);
}
static PyObject* reader_tell(ZstdDecompressionReader* self) {
/* TODO should this raise OSError since stream isn't seekable? */
return PyLong_FromUnsignedLongLong(self->bytesDecompressed);
}
static PyObject* reader_write(PyObject* self, PyObject* args) {
set_unsupported_operation();
return NULL;
}
static PyObject* reader_writelines(PyObject* self, PyObject* args) {
set_unsupported_operation();
return NULL;
}
static PyObject* reader_iter(PyObject* self) {
set_unsupported_operation();
return NULL;
}
static PyObject* reader_iternext(PyObject* self) {
set_unsupported_operation();
return NULL;
}
static PyMethodDef reader_methods[] = {
{ "__enter__", (PyCFunction)reader_enter, METH_NOARGS,
PyDoc_STR("Enter a compression context") },
{ "__exit__", (PyCFunction)reader_exit, METH_VARARGS,
PyDoc_STR("Exit a compression context") },
{ "close", (PyCFunction)reader_close, METH_NOARGS,
PyDoc_STR("Close the stream so it cannot perform any more operations") },
{ "flush", (PyCFunction)reader_flush, METH_NOARGS, PyDoc_STR("no-ops") },
{ "isatty", (PyCFunction)reader_isatty, METH_NOARGS, PyDoc_STR("Returns False") },
{ "readable", (PyCFunction)reader_readable, METH_NOARGS,
PyDoc_STR("Returns True") },
{ "read", (PyCFunction)reader_read, METH_VARARGS | METH_KEYWORDS,
PyDoc_STR("read compressed data") },
{ "read1", (PyCFunction)reader_read1, METH_VARARGS | METH_KEYWORDS,
PyDoc_STR("read compressed data") },
{ "readinto", (PyCFunction)reader_readinto, METH_VARARGS, NULL },
{ "readinto1", (PyCFunction)reader_readinto1, METH_VARARGS, NULL },
{ "readall", (PyCFunction)reader_readall, METH_NOARGS, PyDoc_STR("Not implemented") },
{ "readline", (PyCFunction)reader_readline, METH_NOARGS, PyDoc_STR("Not implemented") },
{ "readlines", (PyCFunction)reader_readlines, METH_NOARGS, PyDoc_STR("Not implemented") },
{ "seek", (PyCFunction)reader_seek, METH_VARARGS, PyDoc_STR("Seek the stream") },
{ "seekable", (PyCFunction)reader_seekable, METH_NOARGS,
PyDoc_STR("Returns True") },
{ "tell", (PyCFunction)reader_tell, METH_NOARGS,
PyDoc_STR("Returns current number of bytes compressed") },
{ "writable", (PyCFunction)reader_writable, METH_NOARGS,
PyDoc_STR("Returns False") },
{ "write", (PyCFunction)reader_write, METH_VARARGS, PyDoc_STR("unsupported operation") },
{ "writelines", (PyCFunction)reader_writelines, METH_VARARGS, PyDoc_STR("unsupported operation") },
{ NULL, NULL }
};
static PyMemberDef reader_members[] = {
{ "closed", T_BOOL, offsetof(ZstdDecompressionReader, closed),
READONLY, "whether stream is closed" },
{ NULL }
};
PyTypeObject ZstdDecompressionReaderType = {
PyVarObject_HEAD_INIT(NULL, 0)
"zstd.ZstdDecompressionReader", /* tp_name */
sizeof(ZstdDecompressionReader), /* tp_basicsize */
0, /* tp_itemsize */
(destructor)reader_dealloc, /* tp_dealloc */
0, /* tp_print */
0, /* tp_getattr */
0, /* tp_setattr */
0, /* tp_compare */
0, /* tp_repr */
0, /* tp_as_number */
0, /* tp_as_sequence */
0, /* tp_as_mapping */
0, /* tp_hash */
0, /* tp_call */
0, /* tp_str */
0, /* tp_getattro */
0, /* tp_setattro */
0, /* tp_as_buffer */
Py_TPFLAGS_DEFAULT, /* tp_flags */
0, /* tp_doc */
0, /* tp_traverse */
0, /* tp_clear */
0, /* tp_richcompare */
0, /* tp_weaklistoffset */
reader_iter, /* tp_iter */
reader_iternext, /* tp_iternext */
reader_methods, /* tp_methods */
reader_members, /* tp_members */
0, /* tp_getset */
0, /* tp_base */
0, /* tp_dict */
0, /* tp_descr_get */
0, /* tp_descr_set */
0, /* tp_dictoffset */
0, /* tp_init */
0, /* tp_alloc */
PyType_GenericNew, /* tp_new */
};
void decompressionreader_module_init(PyObject* mod) {
/* TODO make reader a sub-class of io.RawIOBase */
Py_TYPE(&ZstdDecompressionReaderType) = &PyType_Type;
if (PyType_Ready(&ZstdDecompressionReaderType) < 0) {
return;
}
}