@@ -0,0 +1,27 b'' | |||
|
1 | Copyright (c) 2016, Gregory Szorc | |
|
2 | All rights reserved. | |
|
3 | ||
|
4 | Redistribution and use in source and binary forms, with or without modification, | |
|
5 | are permitted provided that the following conditions are met: | |
|
6 | ||
|
7 | 1. Redistributions of source code must retain the above copyright notice, this | |
|
8 | list of conditions and the following disclaimer. | |
|
9 | ||
|
10 | 2. Redistributions in binary form must reproduce the above copyright notice, | |
|
11 | this list of conditions and the following disclaimer in the documentation | |
|
12 | and/or other materials provided with the distribution. | |
|
13 | ||
|
14 | 3. Neither the name of the copyright holder nor the names of its contributors | |
|
15 | may be used to endorse or promote products derived from this software without | |
|
16 | specific prior written permission. | |
|
17 | ||
|
18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | |
|
19 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
|
20 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | |
|
21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR | |
|
22 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES | |
|
23 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; | |
|
24 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON | |
|
25 | ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT | |
|
26 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS | |
|
27 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
@@ -0,0 +1,63 b'' | |||
|
1 | Version History | |
|
2 | =============== | |
|
3 | ||
|
4 | 0.5.0 (released 2016-11-10) | |
|
5 | --------------------------- | |
|
6 | ||
|
7 | * Vendored version of zstd updated to 1.1.1. | |
|
8 | * Continuous integration for Python 3.6 and 3.7 | |
|
9 | * Continuous integration for Conda | |
|
10 | * Added compression and decompression APIs providing similar interfaces | |
|
11 | to the standard library ``zlib`` and ``bz2`` modules. This allows | |
|
12 | coding to a common interface. | |
|
13 | * ``zstd.__version__`` is now defined. | |
|
14 | * ``read_from()`` on various APIs now accepts objects implementing the buffer | |
|
15 | protocol. | |
|
16 | * ``read_from()`` has gained a ``skip_bytes`` argument. This allows callers | |
|
17 | to pass in an existing buffer with a header without having to create a | |
|
18 | slice or a new object. | |
|
19 | * Implemented ``ZstdCompressionDict.as_bytes()``. | |
|
20 | * Python's memory allocator is now used instead of ``malloc()``. | |
|
21 | * Low-level zstd data structures are reused in more instances, cutting down | |
|
22 | on overhead for certain operations. | |
|
23 | * ``distutils`` boilerplate for obtaining an ``Extension`` instance | |
|
24 | has now been refactored into a standalone ``setup_zstd.py`` file. This | |
|
25 | allows other projects with ``setup.py`` files to reuse the | |
|
26 | ``distutils`` code for this project without copying code. | |
|
27 | * The monolithic ``zstd.c`` file has been split into a header file defining | |
|
28 | types and separate ``.c`` source files for the implementation. | |
|
29 | ||
|
30 | History of the Project | |
|
31 | ====================== | |
|
32 | ||
|
33 | 2016-08-31 - Zstandard 1.0.0 is released and Gregory starts hacking on a | |
|
34 | Python extension for use by the Mercurial project. A very hacky prototype | |
|
35 | is sent to the mercurial-devel list for RFC. | |
|
36 | ||
|
37 | 2016-09-03 - Most functionality from Zstandard C API implemented. Source | |
|
38 | code published on https://github.com/indygreg/python-zstandard. Travis-CI | |
|
39 | automation configured. 0.0.1 release on PyPI. | |
|
40 | ||
|
41 | 2016-09-05 - After the API was rounded out a bit and support for Python | |
|
42 | 2.6 and 2.7 was added, version 0.1 was released to PyPI. | |
|
43 | ||
|
44 | 2016-09-05 - After the compressor and decompressor APIs were changed, 0.2 | |
|
45 | was released to PyPI. | |
|
46 | ||
|
47 | 2016-09-10 - 0.3 is released with a bunch of new features. ZstdCompressor | |
|
48 | now accepts arguments controlling frame parameters. The source size can now | |
|
49 | be declared when performing streaming compression. ZstdDecompressor.decompress() | |
|
50 | is implemented. Compression dictionaries are now cached when using the simple | |
|
51 | compression and decompression APIs. Memory size APIs added. | |
|
52 | ZstdCompressor.read_from() and ZstdDecompressor.read_from() have been | |
|
53 | implemented. This rounds out the major compression/decompression APIs planned | |
|
54 | by the author. | |
|
55 | ||
|
56 | 2016-10-02 - 0.3.3 is released with a bug fix for read_from not fully | |
|
57 | decoding a zstd frame (issue #2). | |
|
58 | ||
|
59 | 2016-10-02 - 0.4.0 is released with zstd 1.1.0, support for custom read and | |
|
60 | write buffer sizes, and a few bug fixes involving failure to read/write | |
|
61 | all data when buffer sizes were too small to hold remaining data. | |
|
62 | ||
|
63 | 2016-11-10 - 0.5.0 is released with zstd 1.1.1 and other enhancements. |
@@ -0,0 +1,776 b'' | |||
|
1 | ================ | |
|
2 | python-zstandard | |
|
3 | ================ | |
|
4 | ||
|
5 | This project provides a Python C extension for interfacing with the | |
|
6 | `Zstandard <http://www.zstd.net>`_ compression library. | |
|
7 | ||
|
8 | The primary goal of the extension is to provide a Pythonic interface to | |
|
9 | the underlying C API. This means exposing most of the features and flexibility | |
|
10 | of the C API while not sacrificing usability or safety that Python provides. | |
|
11 | ||
|
12 | | |ci-status| |win-ci-status| | |
|
13 | ||
|
14 | State of Project | |
|
15 | ================ | |
|
16 | ||
|
17 | The project is officially in beta state. The author is reasonably satisfied | |
|
18 | with the current API and that functionality works as advertised. There | |
|
19 | may be some backwards incompatible changes before 1.0, though the author | |

20 | does not intend to make any major changes to the Python API. | |
|
21 | ||
|
22 | There is continuous integration for Python versions 2.6, 2.7, and 3.3+ | |
|
23 | on Linux x86_64 and Windows x86 and x86_64. The author is reasonably | |
|
24 | confident the extension is stable and works as advertised on these | |
|
25 | platforms. | |
|
26 | ||
|
27 | Expected Changes | |
|
28 | ---------------- | |
|
29 | ||
|
30 | The author is reasonably confident in the current state of what's | |
|
31 | implemented on the ``ZstdCompressor`` and ``ZstdDecompressor`` types. | |
|
32 | Those APIs likely won't change significantly. Some low-level behavior | |
|
33 | (such as naming and types expected by arguments) may change. | |
|
34 | ||
|
35 | There will likely be arguments added to control the input and output | |
|
36 | buffer sizes (currently, certain operations read and write in chunk | |
|
37 | sizes using zstd's preferred defaults). | |
|
38 | ||
|
39 | There should be an API that accepts an object that conforms to the buffer | |
|
40 | interface and returns an iterator over compressed or decompressed output. | |
|
41 | ||
|
42 | The author is on the fence as to whether to support the extremely | |
|
43 | low level compression and decompression APIs. It could be useful to | |
|
44 | support compression without the framing headers. But the author doesn't | |
|
45 | believe it a high priority at this time. | |
|
46 | ||
|
47 | The CFFI bindings are half-baked and need to be finished. | |
|
48 | ||
|
49 | Requirements | |
|
50 | ============ | |
|
51 | ||
|
52 | This extension is designed to run with Python 2.6, 2.7, 3.3, 3.4, and 3.5 | |
|
53 | on common platforms (Linux, Windows, and OS X). Only x86_64 is currently | |
|
54 | well-tested as an architecture. | |
|
55 | ||
|
56 | Installing | |
|
57 | ========== | |
|
58 | ||
|
59 | This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard. | |
|
60 | So, to install this package:: | |
|
61 | ||
|
62 | $ pip install zstandard | |
|
63 | ||
|
64 | Binary wheels are made available for some platforms. If you need to | |
|
65 | install from a source distribution, all you should need is a working C | |
|
66 | compiler and the Python development headers/libraries. On many Linux | |
|
67 | distributions, you can install a ``python-dev`` or ``python-devel`` | |
|
68 | package to provide these dependencies. | |
|
69 | ||
|
70 | Packages are also uploaded to Anaconda Cloud at | |
|
71 | https://anaconda.org/indygreg/zstandard. See that URL for how to install | |
|
72 | this package with ``conda``. | |
|
73 | ||
|
74 | Performance | |
|
75 | =========== | |
|
76 | ||
|
77 | Very crude and non-scientific benchmarking (most benchmarks fall in this | |
|
78 | category because proper benchmarking is hard) show that the Python bindings | |
|
79 | perform within 10% of the native C implementation. | |
|
80 | ||
|
81 | The following table compares the performance of compressing and decompressing | |
|
82 | a 1.1 GB tar file composed of the files in a Firefox source checkout. Values | |
|
83 | obtained with the ``zstd`` program are on the left. The remaining columns detail | |
|
84 | performance of various compression APIs in the Python bindings. | |
|
85 | ||
|
86 | +-------+-----------------+-----------------+-----------------+---------------+ | |
|
87 | | Level | Native | Simple | Stream In | Stream Out | | |
|
88 | | | Comp / Decomp | Comp / Decomp | Comp / Decomp | Comp | | |
|
89 | +=======+=================+=================+=================+===============+ | |
|
90 | | 1 | 490 / 1338 MB/s | 458 / 1266 MB/s | 407 / 1156 MB/s | 405 MB/s | | |
|
91 | +-------+-----------------+-----------------+-----------------+---------------+ | |
|
92 | | 2 | 412 / 1288 MB/s | 381 / 1203 MB/s | 345 / 1128 MB/s | 349 MB/s | | |
|
93 | +-------+-----------------+-----------------+-----------------+---------------+ | |
|
94 | | 3 | 342 / 1312 MB/s | 319 / 1182 MB/s | 285 / 1165 MB/s | 287 MB/s | | |
|
95 | +-------+-----------------+-----------------+-----------------+---------------+ | |
|
96 | | 11 | 64 / 1506 MB/s | 66 / 1436 MB/s | 56 / 1342 MB/s | 57 MB/s | | |
|
97 | +-------+-----------------+-----------------+-----------------+---------------+ | |
|
98 | ||
|
99 | Again, these are very unscientific. But it shows that Python is capable of | |
|
100 | compressing at several hundred MB/s and decompressing at over 1 GB/s. | |
|
101 | ||
|
102 | Comparison to Other Python Bindings | |
|
103 | =================================== | |
|
104 | ||
|
105 | https://pypi.python.org/pypi/zstd is an alternative Python binding to | |
|
106 | Zstandard. At the time this was written, the latest release of that | |
|
107 | package (1.0.0.2) had the following significant differences from this package: | |
|
108 | ||
|
109 | * It only exposes the simple API for compression and decompression operations. | |
|
110 | This extension exposes the streaming API, dictionary training, and more. | |
|
111 | * It adds a custom framing header to compressed data and there is no way to | |
|
112 | disable it. This means that data produced with that module cannot be used by | |
|
113 | other Zstandard implementations. | |
|
114 | ||
|
115 | Bundling of Zstandard Source Code | |
|
116 | ================================= | |
|
117 | ||
|
118 | The source repository for this project contains a vendored copy of the | |
|
119 | Zstandard source code. This is done for a few reasons. | |
|
120 | ||
|
121 | First, Zstandard is relatively new and not yet widely available as a system | |
|
122 | package. Providing a copy of the source code enables the Python C extension | |
|
123 | to be compiled without requiring the user to obtain the Zstandard source code | |
|
124 | separately. | |
|
125 | ||
|
126 | Second, Zstandard has both a stable *public* API and an *experimental* API. | |
|
127 | The *experimental* API is actually quite useful (contains functionality for | |
|
128 | training dictionaries for example), so it is something we wish to expose to | |
|
129 | Python. However, the *experimental* API is only available via static linking. | |
|
130 | Furthermore, the *experimental* API can change at any time. So, control over | |
|
131 | the exact version of the Zstandard library linked against is important to | |
|
132 | ensure known behavior. | |
|
133 | ||
|
134 | Instructions for Building and Testing | |
|
135 | ===================================== | |
|
136 | ||
|
137 | Once you have the source code, the extension can be built via setup.py:: | |
|
138 | ||
|
139 | $ python setup.py build_ext | |
|
140 | ||
|
141 | We recommend testing with ``nose``:: | |
|
142 | ||
|
143 | $ nosetests | |
|
144 | ||
|
145 | A Tox configuration is present to test against multiple Python versions:: | |
|
146 | ||
|
147 | $ tox | |
|
148 | ||
|
149 | Tests use the ``hypothesis`` Python package to perform fuzzing. If you | |
|
150 | don't have it, those tests won't run. | |
|
151 | ||
|
152 | There is also an experimental CFFI module. You need the ``cffi`` Python | |
|
153 | package installed to build and test that. | |
|
154 | ||
|
155 | To create a virtualenv with all development dependencies, do something | |
|
156 | like the following:: | |
|
157 | ||
|
158 | # Python 2 | |
|
159 | $ virtualenv venv | |
|
160 | ||
|
161 | # Python 3 | |
|
162 | $ python3 -m venv venv | |
|
163 | ||
|
164 | $ source venv/bin/activate | |
|
165 | $ pip install cffi hypothesis nose tox | |
|
166 | ||
|
167 | API | |
|
168 | === | |
|
169 | ||
|
170 | The compiled C extension provides a ``zstd`` Python module. This module | |
|
171 | exposes the following interfaces. | |
|
172 | ||
|
173 | ZstdCompressor | |
|
174 | -------------- | |
|
175 | ||
|
176 | The ``ZstdCompressor`` class provides an interface for performing | |
|
177 | compression operations. | |
|
178 | ||
|
179 | Each instance is associated with parameters that control compression | |
|
180 | behavior. These come from the following named arguments (all optional): | |
|
181 | ||
|
182 | level | |
|
183 | Integer compression level. Valid values are between 1 and 22. | |
|
184 | dict_data | |
|
185 | Compression dictionary to use. | |
|
186 | ||
|
187 | Note: When using dictionary data and ``compress()`` is called multiple | |
|
188 | times, the ``CompressionParameters`` derived from an integer compression | |
|
189 | ``level`` and the first compressed data's size will be reused for all | |
|
190 | subsequent operations. This may not be desirable if source data size | |
|
191 | varies significantly. | |
|
192 | compression_params | |
|
193 | A ``CompressionParameters`` instance (overrides the ``level`` value). | |
|
194 | write_checksum | |
|
195 | Whether a 4 byte checksum should be written with the compressed data. | |
|
196 | Defaults to False. If True, the decompressor can verify that decompressed | |
|
197 | data matches the original input data. | |
|
198 | write_content_size | |
|
199 | Whether the size of the uncompressed data will be written into the | |
|
200 | header of compressed data. Defaults to False. The data will only be | |
|
201 | written if the compressor knows the size of the input data. This is | |
|
202 | likely not true for streaming compression. | |
|
203 | write_dict_id | |
|
204 | Whether to write the dictionary ID into the compressed data. | |
|
205 | Defaults to True. The dictionary ID is only written if a dictionary | |
|
206 | is being used. | |
|
207 | ||
|
208 | Simple API | |
|
209 | ^^^^^^^^^^ | |
|
210 | ||
|
211 | ``compress(data)`` compresses and returns data as a one-shot operation.:: | |
|
212 | ||
|
213 | cctx = zstd.ZstdCompressor() | |
|
214 | compressed = cctx.compress(b'data to compress') | |
|
215 | ||
|
216 | Streaming Input API | |
|
217 | ^^^^^^^^^^^^^^^^^^^ | |
|
218 | ||
|
219 | ``write_to(fh)`` (which behaves as a context manager) allows you to *stream* | |
|
220 | data into a compressor.:: | |
|
221 | ||
|
222 | cctx = zstd.ZstdCompressor(level=10) | |
|
223 | with cctx.write_to(fh) as compressor: | |
|
224 | compressor.write(b'chunk 0') | |
|
225 | compressor.write(b'chunk 1') | |
|
226 | ... | |
|
227 | ||
|
228 | The argument to ``write_to()`` must have a ``write(data)`` method. As | |
|
229 | compressed data is available, ``write()`` will be called with the compressed | |
|
230 | data as its argument. Many common Python types implement ``write()``, including | |
|
231 | open file handles and ``io.BytesIO``. | |
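The same sink pattern exists in the standard library. As a purely illustrative analogue (using stdlib ``gzip`` rather than zstd, so the snippet runs without this extension), any object with a ``write(data)`` method can collect a compressed stream::

```python
import gzip
import io

# Any object with a write(data) method can serve as the sink;
# io.BytesIO collects the compressed bytes in memory.
sink = io.BytesIO()

# gzip.GzipFile streams compressed data into `sink` as we write,
# mirroring the write_to() pattern described above.
with gzip.GzipFile(fileobj=sink, mode='wb') as compressor:
    compressor.write(b'chunk 0')
    compressor.write(b'chunk 1')

# The sink now holds a complete gzip stream.
assert gzip.decompress(sink.getvalue()) == b'chunk 0chunk 1'
```

The zstd ``write_to()`` API described above follows the same shape, with the compressor object additionally requiring context manager usage.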
|
232 | ||
|
233 | ``write_to()`` returns an object representing a streaming compressor instance. | |
|
234 | It **must** be used as a context manager. That object's ``write(data)`` method | |
|
235 | is used to feed data into the compressor. | |
|
236 | ||
|
237 | If the size of the data being fed to this streaming compressor is known, | |
|
238 | you can declare it before compression begins:: | |
|
239 | ||
|
240 | cctx = zstd.ZstdCompressor() | |
|
241 | with cctx.write_to(fh, size=data_len) as compressor: | |
|
242 | compressor.write(chunk0) | |
|
243 | compressor.write(chunk1) | |
|
244 | ... | |
|
245 | ||
|
246 | Declaring the size of the source data allows compression parameters to | |
|
247 | be tuned. And if ``write_content_size`` is used, it also results in the | |
|
248 | content size being written into the frame header of the output data. | |
|
249 | ||
|
250 | The size of chunks written to the destination can be specified:: | |
|
251 | ||
|
252 | cctx = zstd.ZstdCompressor() | |
|
253 | with cctx.write_to(fh, write_size=32768) as compressor: | |
|
254 | ... | |
|
255 | ||
|
256 | To see how much memory is being used by the streaming compressor:: | |
|
257 | ||
|
258 | cctx = zstd.ZstdCompressor() | |
|
259 | with cctx.write_to(fh) as compressor: | |
|
260 | ... | |
|
261 | byte_size = compressor.memory_size() | |
|
262 | ||
|
263 | Streaming Output API | |
|
264 | ^^^^^^^^^^^^^^^^^^^^ | |
|
265 | ||
|
266 | ``read_from(reader)`` provides a mechanism to stream data out of a compressor | |
|
267 | as an iterator of data chunks.:: | |
|
268 | ||
|
269 | cctx = zstd.ZstdCompressor() | |
|
270 | for chunk in cctx.read_from(fh): | |
|
271 | # Do something with emitted data. | |
|
272 | ||
|
273 | ``read_from()`` accepts an object that has a ``read(size)`` method or conforms | |
|
274 | to the buffer protocol. (``bytes`` and ``memoryview`` are 2 common types that | |
|
275 | provide the buffer protocol.) | |
|
276 | ||
|
277 | Uncompressed data is fetched from the source either by calling ``read(size)`` | |
|
278 | or by fetching a slice of data from the object directly (in the case where | |
|
279 | the buffer protocol is being used). The returned iterator consists of chunks | |
|
280 | of compressed data. | |
|
281 | ||
|
282 | Like ``write_to()``, ``read_from()`` also accepts a ``size`` argument | |
|
283 | declaring the size of the input stream:: | |
|
284 | ||
|
285 | cctx = zstd.ZstdCompressor() | |
|
286 | for chunk in cctx.read_from(fh, size=some_int): | |
|
287 | pass | |
|
288 | ||
|
289 | You can also control the size of each ``read()`` from the source and | |
|
290 | the ideal size of output chunks:: | |
|
291 | ||
|
292 | cctx = zstd.ZstdCompressor() | |
|
293 | for chunk in cctx.read_from(fh, read_size=16384, write_size=8192): | |
|
294 | pass | |
|
295 | ||
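The shape of this iterator API can be sketched with the standard library. The following stand-in is built on ``zlib.compressobj`` (not the zstd implementation): it reads fixed-size chunks from a source and yields compressed chunks as they become available::

```python
import io
import zlib

def compress_chunks(reader, read_size=16384):
    """Yield compressed chunks from a file-like `reader`.

    Illustrative stand-in for the read_from() iterator pattern,
    built on zlib.compressobj rather than zstd.
    """
    cobj = zlib.compressobj()
    while True:
        chunk = reader.read(read_size)
        if not chunk:
            break
        data = cobj.compress(chunk)
        if data:
            yield data
    # Flush any data still buffered inside the compressor.
    tail = cobj.flush()
    if tail:
        yield tail

source = io.BytesIO(b'data ' * 10000)
compressed = b''.join(compress_chunks(source))
assert zlib.decompress(compressed) == b'data ' * 10000
```

Because it is a generator, no chunk is compressed until the consumer asks for it, which matches the lazy behavior of ``read_from()`` described below.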
|
296 | Stream Copying API | |
|
297 | ^^^^^^^^^^^^^^^^^^ | |
|
298 | ||
|
299 | ``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while | |
|
300 | compressing it.:: | |
|
301 | ||
|
302 | cctx = zstd.ZstdCompressor() | |
|
303 | cctx.copy_stream(ifh, ofh) | |
|
304 | ||
|
305 | For example, say you wish to compress a file:: | |
|
306 | ||
|
307 | cctx = zstd.ZstdCompressor() | |
|
308 | with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: | |
|
309 | cctx.copy_stream(ifh, ofh) | |
|
310 | ||
|
311 | It is also possible to declare the size of the source stream:: | |
|
312 | ||
|
313 | cctx = zstd.ZstdCompressor() | |
|
314 | cctx.copy_stream(ifh, ofh, size=len_of_input) | |
|
315 | ||
|
316 | You can also specify the size of the chunks that are ``read()`` from and | |

317 | written to the streams:: | |
|
318 | ||
|
319 | cctx = zstd.ZstdCompressor() | |
|
320 | cctx.copy_stream(ifh, ofh, read_size=32768, write_size=16384) | |
|
321 | ||
|
322 | The stream copier returns a 2-tuple of bytes read and written:: | |
|
323 | ||
|
324 | cctx = zstd.ZstdCompressor() | |
|
325 | read_count, write_count = cctx.copy_stream(ifh, ofh) | |
|
326 | ||
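For illustration, here is a minimal ``copy_stream()``-style helper built on stdlib ``zlib`` (an analogue, not this extension's implementation), showing the chunked read/compress/write loop and the 2-tuple return::

```python
import io
import zlib

def copy_stream(ifh, ofh, read_size=16384):
    """Copy `ifh` to `ofh`, compressing along the way.

    Sketch of the copy_stream() behavior described above, using
    zlib in place of zstd. Returns (bytes_read, bytes_written).
    """
    cobj = zlib.compressobj()
    read_count = write_count = 0
    while True:
        chunk = ifh.read(read_size)
        if not chunk:
            break
        read_count += len(chunk)
        data = cobj.compress(chunk)
        ofh.write(data)
        write_count += len(data)
    tail = cobj.flush()
    ofh.write(tail)
    write_count += len(tail)
    return read_count, write_count

ifh = io.BytesIO(b'hello world' * 1000)
ofh = io.BytesIO()
read_count, write_count = copy_stream(ifh, ofh)
assert read_count == 11000
assert zlib.decompress(ofh.getvalue()) == b'hello world' * 1000
```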
|
327 | Compressor API | |
|
328 | ^^^^^^^^^^^^^^ | |
|
329 | ||
|
330 | ``compressobj()`` returns an object that exposes ``compress(data)`` and | |
|
331 | ``flush()`` methods. Each returns compressed data or an empty bytes object. | |
|
332 | ||
|
333 | The purpose of ``compressobj()`` is to provide an API-compatible interface | |
|
334 | with ``zlib.compressobj`` and ``bz2.BZ2Compressor``. This allows callers to | |
|
335 | swap in different compressor objects while using the same API. | |
|
336 | ||
|
337 | Once ``flush()`` is called, the compressor will no longer accept new data | |
|
338 | to ``compress()``. ``flush()`` **must** be called to end the compression | |
|
339 | context. If not called, the returned data may be incomplete. | |
|
340 | ||
|
341 | Here is how this API should be used:: | |
|
342 | ||
|
343 | cctx = zstd.ZstdCompressor() | |
|
344 | cobj = cctx.compressobj() | |
|
345 | data = cobj.compress(b'raw input 0') | |
|
346 | data = cobj.compress(b'raw input 1') | |
|
347 | data = cobj.flush() | |
|
348 | ||
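Because the interface matches the stdlib compressor objects, calling code can be parameterized over the compressor factory. A stdlib-only sketch (zstd omitted here so the example stays self-contained)::

```python
import bz2
import zlib

def compress_all(make_compressor, chunks):
    # Works with any object exposing compress(data) and flush(),
    # e.g. zlib.compressobj(), bz2.BZ2Compressor(), or a
    # compressobj() from this extension as described above.
    cobj = make_compressor()
    out = [cobj.compress(chunk) for chunk in chunks]
    out.append(cobj.flush())
    return b''.join(out)

chunks = [b'raw input 0', b'raw input 1']
z = compress_all(zlib.compressobj, chunks)
b = compress_all(bz2.BZ2Compressor, chunks)
assert zlib.decompress(z) == b'raw input 0raw input 1'
assert bz2.decompress(b) == b'raw input 0raw input 1'
```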
|
349 | For best performance results, keep input chunks under 256KB. This avoids | |
|
350 | extra allocations for a large output object. | |
|
351 | ||
|
352 | It is possible to declare the input size of the data that will be fed into | |
|
353 | the compressor:: | |
|
354 | ||
|
355 | cctx = zstd.ZstdCompressor() | |
|
356 | cobj = cctx.compressobj(size=6) | |
|
357 | data = cobj.compress(b'foobar') | |
|
358 | data = cobj.flush() | |
|
359 | ||
|
360 | ZstdDecompressor | |
|
361 | ---------------- | |
|
362 | ||
|
363 | The ``ZstdDecompressor`` class provides an interface for performing | |
|
364 | decompression. | |
|
365 | ||
|
366 | Each instance is associated with parameters that control decompression. These | |
|
367 | come from the following named arguments (all optional): | |
|
368 | ||
|
369 | dict_data | |
|
370 | Compression dictionary to use. | |
|
371 | ||
|
372 | The interface of this class is very similar to ``ZstdCompressor`` (by design). | |
|
373 | ||
|
374 | Simple API | |
|
375 | ^^^^^^^^^^ | |
|
376 | ||
|
377 | ``decompress(data)`` can be used to decompress an entire compressed zstd | |
|
378 | frame in a single operation.:: | |
|
379 | ||
|
380 | dctx = zstd.ZstdDecompressor() | |
|
381 | decompressed = dctx.decompress(data) | |
|
382 | ||
|
383 | By default, ``decompress(data)`` will only work on data written with the content | |
|
384 | size encoded in its header. This can be achieved by creating a | |
|
385 | ``ZstdCompressor`` with ``write_content_size=True``. If compressed data without | |
|
386 | an embedded content size is seen, ``zstd.ZstdError`` will be raised. | |
|
387 | ||
|
388 | If the compressed data doesn't have its content size embedded within it, | |
|
389 | decompression can be attempted by specifying the ``max_output_size`` | |
|
390 | argument.:: | |
|
391 | ||
|
392 | dctx = zstd.ZstdDecompressor() | |
|
393 | uncompressed = dctx.decompress(data, max_output_size=1048576) | |
|
394 | ||
|
395 | Ideally, ``max_output_size`` will be identical to the decompressed output | |
|
396 | size. | |
|
397 | ||
|
398 | If ``max_output_size`` is too small to hold the decompressed data, | |
|
399 | ``zstd.ZstdError`` will be raised. | |
|
400 | ||
|
401 | If ``max_output_size`` is larger than the decompressed data, the allocated | |
|
402 | output buffer will be resized to only use the space required. | |
|
403 | ||
|
404 | Please note that an allocation of the requested ``max_output_size`` will be | |
|
405 | performed every time the method is called. Setting this to a very large value could | |
|
406 | result in a lot of work for the memory allocator and may result in | |
|
407 | ``MemoryError`` being raised if the allocation fails. | |
|
408 | ||
|
409 | If the exact size of decompressed data is unknown, it is **strongly** | |
|
410 | recommended to use a streaming API. | |
|
411 | ||
|
412 | Streaming Input API | |
|
413 | ^^^^^^^^^^^^^^^^^^^ | |
|
414 | ||
|
415 | ``write_to(fh)`` can be used to incrementally send compressed data to a | |
|
416 | decompressor.:: | |
|
417 | ||
|
418 | dctx = zstd.ZstdDecompressor() | |
|
419 | with dctx.write_to(fh) as decompressor: | |
|
420 | decompressor.write(compressed_data) | |
|
421 | ||
|
422 | This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to | |
|
423 | the decompressor by calling ``write(data)`` and decompressed output is written | |
|
424 | to the output object by calling its ``write(data)`` method. | |
|
425 | ||
|
426 | The size of chunks written to the destination can be specified:: | |
|
427 | ||
|
428 | dctx = zstd.ZstdDecompressor() | |
|
429 | with dctx.write_to(fh, write_size=16384) as decompressor: | |
|
430 | pass | |
|
431 | ||
|
432 | You can see how much memory is being used by the decompressor:: | |
|
433 | ||
|
434 | dctx = zstd.ZstdDecompressor() | |
|
435 | with dctx.write_to(fh) as decompressor: | |
|
436 | byte_size = decompressor.memory_size() | |
|
437 | ||
|
438 | Streaming Output API | |
|
439 | ^^^^^^^^^^^^^^^^^^^^ | |
|
440 | ||
|
441 | ``read_from(fh)`` provides a mechanism to stream decompressed data out of a | |
|
442 | compressed source as an iterator of data chunks.:: | |
|
443 | ||
|
444 | dctx = zstd.ZstdDecompressor() | |
|
445 | for chunk in dctx.read_from(fh): | |
|
446 | # Do something with original data. | |
|
447 | ||
|
448 | ``read_from()`` accepts either a) an object with a ``read(size)`` method that | |

449 | will return compressed bytes or b) an object conforming to the buffer protocol that | |
|
450 | can expose its data as a contiguous range of bytes. The ``bytes`` and | |
|
451 | ``memoryview`` types expose this buffer protocol. | |
|
452 | ||
|
453 | ``read_from()`` returns an iterator whose elements are chunks of the | |
|
454 | decompressed data. | |
|
455 | ||
|
456 | The size of each requested ``read()`` from the source can be specified:: | |
|
457 | ||
|
458 | dctx = zstd.ZstdDecompressor() | |
|
459 | for chunk in dctx.read_from(fh, read_size=16384): | |
|
460 | pass | |
|
461 | ||
|
462 | It is also possible to skip leading bytes in the input data:: | |
|
463 | ||
|
464 | dctx = zstd.ZstdDecompressor() | |
|
465 | for chunk in dctx.read_from(fh, skip_bytes=1): | |
|
466 | pass | |
|
467 | ||
|
468 | Skipping leading bytes is useful if the source data contains extra | |
|
469 | *header* data but you want to avoid the overhead of making a buffer copy | |
|
470 | or allocating a new ``memoryview`` object in order to decompress the data. | |
|
471 | ||
|
472 | Similarly to ``ZstdCompressor.read_from()``, the consumer of the iterator | |
|
473 | controls when data is decompressed. If the iterator isn't consumed, | |
|
474 | decompression is put on hold. | |
|
475 | ||
|
476 | When ``read_from()`` is passed an object conforming to the buffer protocol, | |
|
477 | the behavior may seem similar to what occurs when the simple decompression | |
|
478 | API is used. However, this API works when the decompressed size is unknown. | |
|
479 | Furthermore, if feeding large inputs, the decompressor will work in chunks | |
|
480 | instead of performing a single operation. | |
|
481 | ||
|
482 | Stream Copying API | |
|
483 | ^^^^^^^^^^^^^^^^^^ | |
|
484 | ||
|
485 | ``copy_stream(ifh, ofh)`` can be used to copy data across 2 streams while | |
|
486 | performing decompression.:: | |
|
487 | ||
|
488 | dctx = zstd.ZstdDecompressor() | |
|
489 | dctx.copy_stream(ifh, ofh) | |
|
490 | ||
|
491 | For example, to decompress a file to another file:: | |
|
492 | ||
|
493 | dctx = zstd.ZstdDecompressor() | |
|
494 | with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh: | |
|
495 | dctx.copy_stream(ifh, ofh) | |
|
496 | ||
|
497 | The size of chunks being ``read()`` and ``write()`` from and to the streams | |
|
498 | can be specified:: | |
|
499 | ||
|
500 | dctx = zstd.ZstdDecompressor() | |
|
501 | dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384) | |
|
502 | ||
|
503 | Decompressor API | |
|
504 | ^^^^^^^^^^^^^^^^ | |
|
505 | ||
|
506 | ``decompressobj()`` returns an object that exposes a ``decompress(data)`` | |
|
507 | method. Compressed data chunks are fed into ``decompress(data)`` and | |
|
508 | uncompressed output (or an empty bytes) is returned. Output from subsequent | |
|
509 | calls needs to be concatenated to reassemble the full decompressed byte | |
|
510 | sequence. | |
|
511 | ||
|
512 | The purpose of ``decompressobj()`` is to provide an API-compatible interface | |
|
513 | with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor``. This allows callers | |
|
514 | to swap in different decompressor objects while using the same API. | |
|
515 | ||
|
516 | Each object is single use: once an input frame is decoded, ``decompress()`` | |
|
517 | can no longer be called. | |
|
518 | ||
|
519 | Here is how this API should be used:: | |
|
520 | ||
|
521 | dctx = zstd.ZstdDecompressor() | |

522 | dobj = dctx.decompressobj() | |
|
523 | data = dobj.decompress(compressed_chunk_0) | |
|
524 | data = dobj.decompress(compressed_chunk_1) | |
|
525 | ||
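The chunked feed-and-concatenate pattern is identical to the stdlib decompressor objects. An equivalent round trip using ``zlib`` (as an analogue of the zstd API)::

```python
import zlib

original = b'0123456789' * 5000
compressed = zlib.compress(original)

# Feed compressed data in arbitrary chunks; concatenate the
# partial outputs to reassemble the full decompressed bytes.
dobj = zlib.decompressobj()
parts = []
for i in range(0, len(compressed), 1024):
    parts.append(dobj.decompress(compressed[i:i + 1024]))
parts.append(dobj.flush())
result = b''.join(parts)
assert result == original
```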
|
526 | Choosing an API | |
|
527 | --------------- | |
|
528 | ||
|
529 | Various forms of compression and decompression APIs are provided because each | |
|
530 | is suited to different use cases. | |
|
531 | ||
|
532 | The simple/one-shot APIs are useful for small data, when the decompressed | |
|
533 | data size is known (either recorded in the zstd frame header via | |
|
534 | ``write_content_size`` or known via an out-of-band mechanism, such as a file | |
|
535 | size). | |
|
536 | ||
|
537 | A limitation of the simple APIs is that input or output data must fit in memory. | |
|
538 | And unless using advanced tricks with Python *buffer objects*, both input and | |
|
539 | output must fit in memory simultaneously. | |
|
540 | ||
|
541 | Another limitation is that compression or decompression is performed as a single | |
|
542 | operation. So if you feed large input, it could take a long time for the | |
|
543 | function to return. | |
|
544 | ||
|
545 | The streaming APIs do not have the limitations of the simple API. The cost to | |
|
546 | this is they are more complex to use than a single function call. | |
|
547 | ||
|
548 | The streaming APIs put the caller in control of compression and decompression | |
|
549 | behavior by allowing them to directly control either the input or output side | |
|
550 | of the operation. | |
|
551 | ||
|
552 | With the streaming input APIs, the caller feeds data into the compressor or | |
|
553 | decompressor as they see fit. Output data will only be written after the caller | |
|
554 | has explicitly written data. | |
|
555 | ||
|
556 | With the streaming output APIs, the caller consumes output from the compressor | |
|
557 | or decompressor as they see fit. The compressor or decompressor will only | |
|
558 | consume data from the source when the caller is ready to receive it. | |
|
559 | ||
|
560 | One end of the streaming APIs involves a file-like object that must | |
|
561 | ``write()`` output data or ``read()`` input data. Depending on what the | |
|
562 | backing storage for these objects is, those operations may not complete quickly. | |
|
563 | For example, when streaming compressed data to a file, the ``write()`` into | |
|
564 | a streaming compressor could result in a ``write()`` to the filesystem, which | |
|
565 | may take a long time to finish due to slow I/O on the filesystem. So, there | |
|
566 | may be overhead in streaming APIs beyond the compression and decompression | |
|
567 | operations. | |
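One way to mitigate that overhead is to buffer the destination so many small writes become fewer large ones. A minimal stdlib sketch (``io.BufferedWriter`` is a standard library class, not part of this module):

```python
import io

# A BytesIO standing in for a slow destination file object.
raw = io.BytesIO()

# BufferedWriter coalesces many small write() calls into fewer,
# larger writes against the underlying object.
buffered = io.BufferedWriter(raw, buffer_size=64 * 1024)
for _ in range(1000):
    buffered.write(b"x" * 100)   # lands in the in-memory buffer
buffered.flush()                 # actual I/O happens here, in bulk

assert raw.getvalue() == b"x" * 100000
```

The same wrapping works for any writable file object handed to a streaming compressor.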
|
568 | ||
|
569 | Dictionary Creation and Management | |
|
570 | ---------------------------------- | |
|
571 | ||
|
572 | Zstandard allows *dictionaries* to be used when compressing and | |
|
573 | decompressing data. The idea is that if you are compressing a lot of similar | |
|
574 | data, you can precompute common properties of that data (such as recurring | |
|
575 | byte sequences) to achieve better compression ratios. | |
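The same idea exists in the standard library's ``zlib``, which can serve as a runnable illustration when ``zstd`` is not installed (the ``zdict`` parameter is zlib's, not this module's):

```python
import zlib

# A tiny "dictionary" of byte sequences expected to recur in the inputs.
zdict = b'{"name": "", "id": 0, "tags": []}'
sample = b'{"name": "alice", "id": 17, "tags": []}'

# Compress the same payload with and without the shared dictionary.
with_dict = zlib.compressobj(zdict=zdict)
compressed_with = with_dict.compress(sample) + with_dict.flush()

plain = zlib.compressobj()
compressed_plain = plain.compress(sample) + plain.flush()

# Referencing recurring substrings in the dictionary usually shrinks output.
assert len(compressed_with) <= len(compressed_plain)

# The identical dictionary is required on the decompression side.
decomp = zlib.decompressobj(zdict=zdict)
assert decomp.decompress(compressed_with) == sample
```

Zstandard dictionaries work on the same principle, but are *trained* from many samples rather than hand-written, as described below.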
|
576 | ||
|
577 | In Python, compression dictionaries are represented as the | |
|
578 | ``ZstdCompressionDict`` type. | |
|
579 | ||
|
580 | Instances can be constructed from bytes:: | |
|
581 | ||
|
582 | dict_data = zstd.ZstdCompressionDict(data) | |
|
583 | ||
|
584 | More interestingly, instances can be created by *training* on sample data:: | |
|
585 | ||
|
586 | dict_data = zstd.train_dictionary(size, samples) | |
|
587 | ||
|
588 | This takes a list of bytes instances and creates and returns a | |
|
589 | ``ZstdCompressionDict``. | |
|
590 | ||
|
591 | You can see how many bytes are in the dictionary by calling ``len()``:: | |
|
592 | ||
|
593 | dict_data = zstd.train_dictionary(size, samples) | |
|
594 | dict_size = len(dict_data) # will not be larger than ``size`` | |
|
595 | ||
|
596 | Once you have a dictionary, you can pass it to the objects performing | |
|
597 | compression and decompression:: | |
|
598 | ||
|
599 | dict_data = zstd.train_dictionary(16384, samples) | |
|
600 | ||
|
601 | cctx = zstd.ZstdCompressor(dict_data=dict_data) | |
|
602 | for source_data in input_data: | |
|
603 | compressed = cctx.compress(source_data) | |
|
604 | # Do something with compressed data. | |
|
605 | ||
|
606 | dctx = zstd.ZstdDecompressor(dict_data=dict_data) | |
|
607 | for compressed_data in input_data: | |
|
608 | buffer = io.BytesIO() | |
|
609 | with dctx.write_to(buffer) as decompressor: | |
|
610 | decompressor.write(compressed_data) | |
|
611 | # Do something with raw data in ``buffer``. | |
|
612 | ||
|
613 | Dictionaries have unique integer IDs. You can retrieve this ID via:: | |
|
614 | ||
|
615 | dict_id = zstd.dictionary_id(dict_data) | |
|
616 | ||
|
617 | You can obtain the raw data in the dict (useful for persisting and constructing | |
|
618 | a ``ZstdCompressionDict`` later) via ``as_bytes()``:: | |
|
619 | ||
|
620 | dict_data = zstd.train_dictionary(size, samples) | |
|
621 | raw_data = dict_data.as_bytes() | |
|
622 | ||
|
623 | Explicit Compression Parameters | |
|
624 | ------------------------------- | |
|
625 | ||
|
626 | Zstandard's integer compression levels along with the input size and dictionary | |
|
627 | size are converted into a data structure defining multiple parameters to tune | |
|
628 | behavior of the compression algorithm. It is possible to define this | |
|
629 | data structure explicitly to have lower-level control over compression behavior. | |
|
630 | ||
|
631 | The ``zstd.CompressionParameters`` type represents this data structure. | |
|
632 | You can see how Zstandard converts compression levels to this data structure | |
|
633 | by calling ``zstd.get_compression_parameters()``. e.g.:: | |
|
634 | ||
|
635 | params = zstd.get_compression_parameters(5) | |
|
636 | ||
|
637 | This function also accepts the uncompressed data size and dictionary size | |
|
638 | to adjust parameters:: | |
|
639 | ||
|
640 | params = zstd.get_compression_parameters(3, source_size=len(data), dict_size=len(dict_data)) | |
|
641 | ||
|
642 | You can also construct compression parameters from their low-level components:: | |
|
643 | ||
|
644 | params = zstd.CompressionParameters(20, 6, 12, 5, 4, 10, zstd.STRATEGY_FAST) | |
|
645 | ||
|
646 | You can then configure a compressor to use the custom parameters:: | |
|
647 | ||
|
648 | cctx = zstd.ZstdCompressor(compression_params=params) | |
|
649 | ||
|
650 | The members of the ``CompressionParameters`` tuple are as follows:: | |
|
651 | ||
|
652 | * 0 - Window log | |
|
653 | * 1 - Chain log | |
|
654 | * 2 - Hash log | |
|
655 | * 3 - Search log | |
|
656 | * 4 - Search length | |
|
657 | * 5 - Target length | |
|
658 | * 6 - Strategy (one of the ``zstd.STRATEGY_`` constants) | |
|
659 | ||
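For readability, the positional layout above can be mirrored with a ``namedtuple`` when juggling these values in calling code (the field names here are illustrative and not part of this module's API):

```python
from collections import namedtuple

# Illustrative mirror of the 7-element CompressionParameters layout.
CParams = namedtuple(
    "CParams",
    ["window_log", "chain_log", "hash_log", "search_log",
     "search_length", "target_length", "strategy"],
)

# Same values as the constructor example above.
params = CParams(20, 6, 12, 5, 4, 10, "STRATEGY_FAST")

# Positional and named access agree.
assert params[0] == params.window_log == 20
assert params[6] == params.strategy == "STRATEGY_FAST"

# The tuple unpacks in the documented order.
window_log, *rest = params
assert window_log == 20 and len(rest) == 6
```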
|
660 | You'll need to read the Zstandard documentation for what these parameters | |
|
661 | do. | |
|
662 | ||
|
663 | Misc Functionality | |
|
664 | ------------------ | |
|
665 | ||
|
666 | estimate_compression_context_size(CompressionParameters) | |
|
667 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
|
668 | ||
|
669 | Given a ``CompressionParameters`` struct, estimate the memory size required | |
|
670 | to perform compression. | |
|
671 | ||
|
672 | estimate_decompression_context_size() | |
|
673 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
|
674 | ||
|
675 | Estimate the memory size requirements for a decompressor instance. | |
|
676 | ||
|
677 | Constants | |
|
678 | --------- | |
|
679 | ||
|
680 | The following module constants/attributes are exposed: | |
|
681 | ||
|
682 | ZSTD_VERSION | |
|
683 | This module attribute exposes a 3-tuple of the Zstandard version. e.g. | |
|
684 | ``(1, 0, 0)`` | |
|
685 | MAX_COMPRESSION_LEVEL | |
|
686 | Integer max compression level accepted by compression functions | |
|
687 | COMPRESSION_RECOMMENDED_INPUT_SIZE | |
|
688 | Recommended chunk size to feed to compressor functions | |
|
689 | COMPRESSION_RECOMMENDED_OUTPUT_SIZE | |
|
690 | Recommended chunk size for compression output | |
|
691 | DECOMPRESSION_RECOMMENDED_INPUT_SIZE | |
|
692 | Recommended chunk size to feed into decompressor functions | |
|
693 | DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE | |
|
694 | Recommended chunk size for decompression output | |
|
695 | ||
|
696 | FRAME_HEADER | |
|
697 | Bytes containing the header of a Zstandard frame | |
|
698 | MAGIC_NUMBER | |
|
699 | Frame header as an integer | |
|
700 | ||
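The relationship between those two constants is just byte order: per the Zstandard frame format specification, every frame starts with the magic number ``0xFD2FB528`` serialized little-endian (this is a property of the format itself, not something specific to this module):

```python
# Zstandard frame magic number, from the zstd frame format spec.
MAGIC_NUMBER = 0xFD2FB528

# Serialized little-endian, this is the 4-byte prefix of every frame.
frame_header = MAGIC_NUMBER.to_bytes(4, "little")
assert frame_header == b"\x28\xb5\x2f\xfd"

# And back again:
assert int.from_bytes(frame_header, "little") == MAGIC_NUMBER
```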
|
701 | WINDOWLOG_MIN | |
|
702 | Minimum value for compression parameter | |
|
703 | WINDOWLOG_MAX | |
|
704 | Maximum value for compression parameter | |
|
705 | CHAINLOG_MIN | |
|
706 | Minimum value for compression parameter | |
|
707 | CHAINLOG_MAX | |
|
708 | Maximum value for compression parameter | |
|
709 | HASHLOG_MIN | |
|
710 | Minimum value for compression parameter | |
|
711 | HASHLOG_MAX | |
|
712 | Maximum value for compression parameter | |
|
713 | SEARCHLOG_MIN | |
|
714 | Minimum value for compression parameter | |
|
715 | SEARCHLOG_MAX | |
|
716 | Maximum value for compression parameter | |
|
717 | SEARCHLENGTH_MIN | |
|
718 | Minimum value for compression parameter | |
|
719 | SEARCHLENGTH_MAX | |
|
720 | Maximum value for compression parameter | |
|
721 | TARGETLENGTH_MIN | |
|
722 | Minimum value for compression parameter | |
|
723 | TARGETLENGTH_MAX | |
|
724 | Maximum value for compression parameter | |
|
725 | STRATEGY_FAST | |
|
726 | Compression strategy | |
|
727 | STRATEGY_DFAST | |
|
728 | Compression strategy | |
|
729 | STRATEGY_GREEDY | |
|
730 | Compression strategy | |
|
731 | STRATEGY_LAZY | |
|
732 | Compression strategy | |
|
733 | STRATEGY_LAZY2 | |
|
734 | Compression strategy | |
|
735 | STRATEGY_BTLAZY2 | |
|
736 | Compression strategy | |
|
737 | STRATEGY_BTOPT | |
|
738 | Compression strategy | |
|
739 | ||
|
740 | Note on Zstandard's *Experimental* API | |
|
741 | ====================================== | |
|
742 | ||
|
743 | Many of the Zstandard APIs used by this module are marked as *experimental* | |
|
744 | within the Zstandard project. This includes a large number of useful | |
|
745 | features, such as compression and frame parameters and parts of dictionary | |
|
746 | compression. | |
|
747 | ||
|
748 | It is unclear how Zstandard's C API will evolve over time, especially with | |
|
749 | regards to this *experimental* functionality. We will try to maintain | |
|
750 | backwards compatibility at the Python API level. However, we cannot | |
|
751 | guarantee this for things not under our control. | |
|
752 | ||
|
753 | Since a copy of the Zstandard source code is distributed with this | |
|
754 | module and since we compile against it, the behavior of a specific | |
|
755 | version of this module should remain constant over time. So if you | |
|
756 | pin the version of this module used in your projects (which is a Python | |
|
757 | best practice), you should be insulated from unwanted future changes. | |
|
758 | ||
|
759 | Donate | |
|
760 | ====== | |
|
761 | ||
|
762 | A lot of time has been invested into this project by the author. | |
|
763 | ||
|
764 | If you find this project useful and would like to thank the author for | |
|
765 | their work, consider donating some money. Any amount is appreciated. | |
|
766 | ||
|
767 | .. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif | |
|
768 | :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard¤cy_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted | |
|
769 | :alt: Donate via PayPal | |
|
770 | ||
|
771 | .. |ci-status| image:: https://travis-ci.org/indygreg/python-zstandard.svg?branch=master | |
|
772 | :target: https://travis-ci.org/indygreg/python-zstandard | |
|
773 | ||
|
774 | .. |win-ci-status| image:: https://ci.appveyor.com/api/projects/status/github/indygreg/python-zstandard?svg=true | |
|
775 | :target: https://ci.appveyor.com/project/indygreg/python-zstandard | |
|
776 | :alt: Windows build status |
@@ -0,0 +1,247 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | extern PyObject* ZstdError; | |
|
12 | ||
|
13 | ZstdCompressionDict* train_dictionary(PyObject* self, PyObject* args, PyObject* kwargs) { | |
|
14 | static char *kwlist[] = { "dict_size", "samples", "parameters", NULL }; | |
|
15 | size_t capacity; | |
|
16 | PyObject* samples; | |
|
17 | Py_ssize_t samplesLen; | |
|
18 | PyObject* parameters = NULL; | |
|
19 | ZDICT_params_t zparams; | |
|
20 | Py_ssize_t sampleIndex; | |
|
21 | Py_ssize_t sampleSize; | |
|
22 | PyObject* sampleItem; | |
|
23 | size_t zresult; | |
|
24 | void* sampleBuffer; | |
|
25 | void* sampleOffset; | |
|
26 | size_t samplesSize = 0; | |
|
27 | size_t* sampleSizes; | |
|
28 | void* dict; | |
|
29 | ZstdCompressionDict* result; | |
|
30 | ||
|
31 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "nO!|O!", kwlist, | |
|
32 | &capacity, | |
|
33 | &PyList_Type, &samples, | |
|
34 | (PyObject*)&DictParametersType, ¶meters)) { | |
|
35 | return NULL; | |
|
36 | } | |
|
37 | ||
|
38 | /* Validate parameters first since it is easiest. */ | |
|
39 | zparams.selectivityLevel = 0; | |
|
40 | zparams.compressionLevel = 0; | |
|
41 | zparams.notificationLevel = 0; | |
|
42 | zparams.dictID = 0; | |
|
43 | zparams.reserved[0] = 0; | |
|
44 | zparams.reserved[1] = 0; | |
|
45 | ||
|
46 | if (parameters) { | |
|
47 | /* TODO validate data ranges */ | |
|
48 | zparams.selectivityLevel = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 0)); | |
|
49 | zparams.compressionLevel = PyLong_AsLong(PyTuple_GetItem(parameters, 1)); | |
|
50 | zparams.notificationLevel = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 2)); | |
|
51 | zparams.dictID = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 3)); | |
|
52 | } | |
|
53 | ||
|
54 | /* Figure out the size of the raw samples */ | |
|
55 | samplesLen = PyList_Size(samples); | |
|
56 | for (sampleIndex = 0; sampleIndex < samplesLen; sampleIndex++) { | |
|
57 | sampleItem = PyList_GetItem(samples, sampleIndex); | |
|
58 | if (!PyBytes_Check(sampleItem)) { | |
|
59 | PyErr_SetString(PyExc_ValueError, "samples must be bytes"); | |
|
60 | /* TODO probably need to perform DECREF here */ | |
|
61 | return NULL; | |
|
62 | } | |
|
63 | samplesSize += PyBytes_GET_SIZE(sampleItem); | |
|
64 | } | |
|
65 | ||
|
66 | /* Now that we know the total size of the raw samples, we can allocate | |
|
67 | a buffer for the raw data */ | |
|
68 | sampleBuffer = malloc(samplesSize); | |
|
69 | if (!sampleBuffer) { | |
|
70 | PyErr_NoMemory(); | |
|
71 | return NULL; | |
|
72 | } | |
|
73 | sampleSizes = malloc(samplesLen * sizeof(size_t)); | |
|
74 | if (!sampleSizes) { | |
|
75 | free(sampleBuffer); | |
|
76 | PyErr_NoMemory(); | |
|
77 | return NULL; | |
|
78 | } | |
|
79 | ||
|
80 | sampleOffset = sampleBuffer; | |
|
81 | /* Now iterate again and assemble the samples in the buffer */ | |
|
82 | for (sampleIndex = 0; sampleIndex < samplesLen; sampleIndex++) { | |
|
83 | sampleItem = PyList_GetItem(samples, sampleIndex); | |
|
84 | sampleSize = PyBytes_GET_SIZE(sampleItem); | |
|
85 | sampleSizes[sampleIndex] = sampleSize; | |
|
86 | memcpy(sampleOffset, PyBytes_AS_STRING(sampleItem), sampleSize); | |
|
87 | sampleOffset = (char*)sampleOffset + sampleSize; | |
|
88 | } | |
|
89 | ||
|
90 | dict = malloc(capacity); | |
|
91 | if (!dict) { | |
|
92 | free(sampleSizes); | |
|
93 | free(sampleBuffer); | |
|
94 | PyErr_NoMemory(); | |
|
95 | return NULL; | |
|
96 | } | |
|
97 | ||
|
98 | zresult = ZDICT_trainFromBuffer_advanced(dict, capacity, | |
|
99 | sampleBuffer, sampleSizes, (unsigned int)samplesLen, | |
|
100 | zparams); | |
|
101 | if (ZDICT_isError(zresult)) { | |
|
102 | PyErr_Format(ZstdError, "Cannot train dict: %s", ZDICT_getErrorName(zresult)); | |
|
103 | free(dict); | |
|
104 | free(sampleSizes); | |
|
105 | free(sampleBuffer); | |
|
106 | return NULL; | |
|
107 | } | |
|
108 | ||
|
109 | result = PyObject_New(ZstdCompressionDict, &ZstdCompressionDictType); | |
|
110 | if (!result) { | |
|
111 | return NULL; | |
|
112 | } | |
|
113 | ||
|
114 | result->dictData = dict; | |
|
115 | result->dictSize = zresult; | |
|
116 | return result; | |
|
117 | } | |
|
118 | ||
|
119 | ||
|
120 | PyDoc_STRVAR(ZstdCompressionDict__doc__, | |
|
121 | "ZstdCompressionDict(data) - Represents a computed compression dictionary\n" | |
|
122 | "\n" | |
|
123 | "This type holds the results of a computed Zstandard compression dictionary.\n" | |
|
124 | "Instances are obtained by calling ``train_dictionary()`` or by passing bytes\n" | |
|
125 | "obtained from another source into the constructor.\n" | |
|
126 | ); | |
|
127 | ||
|
128 | static int ZstdCompressionDict_init(ZstdCompressionDict* self, PyObject* args) { | |
|
129 | const char* source; | |
|
130 | Py_ssize_t sourceSize; | |
|
131 | ||
|
132 | self->dictData = NULL; | |
|
133 | self->dictSize = 0; | |
|
134 | ||
|
135 | #if PY_MAJOR_VERSION >= 3 | |
|
136 | if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { | |
|
137 | #else | |
|
138 | if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { | |
|
139 | #endif | |
|
140 | return -1; | |
|
141 | } | |
|
142 | ||
|
143 | self->dictData = malloc(sourceSize); | |
|
144 | if (!self->dictData) { | |
|
145 | PyErr_NoMemory(); | |
|
146 | return -1; | |
|
147 | } | |
|
148 | ||
|
149 | memcpy(self->dictData, source, sourceSize); | |
|
150 | self->dictSize = sourceSize; | |
|
151 | ||
|
152 | return 0; | |
|
153 | } | |
|
154 | ||
|
155 | static void ZstdCompressionDict_dealloc(ZstdCompressionDict* self) { | |
|
156 | if (self->dictData) { | |
|
157 | free(self->dictData); | |
|
158 | self->dictData = NULL; | |
|
159 | } | |
|
160 | ||
|
161 | PyObject_Del(self); | |
|
162 | } | |
|
163 | ||
|
164 | static PyObject* ZstdCompressionDict_dict_id(ZstdCompressionDict* self) { | |
|
165 | unsigned dictID = ZDICT_getDictID(self->dictData, self->dictSize); | |
|
166 | ||
|
167 | return PyLong_FromUnsignedLong(dictID); | |
|
168 | } | |
|
169 | ||
|
170 | static PyObject* ZstdCompressionDict_as_bytes(ZstdCompressionDict* self) { | |
|
171 | return PyBytes_FromStringAndSize(self->dictData, self->dictSize); | |
|
172 | } | |
|
173 | ||
|
174 | static PyMethodDef ZstdCompressionDict_methods[] = { | |
|
175 | { "dict_id", (PyCFunction)ZstdCompressionDict_dict_id, METH_NOARGS, | |
|
176 | PyDoc_STR("dict_id() -- obtain the numeric dictionary ID") }, | |
|
177 | { "as_bytes", (PyCFunction)ZstdCompressionDict_as_bytes, METH_NOARGS, | |
|
178 | PyDoc_STR("as_bytes() -- obtain the raw bytes constituting the dictionary data") }, | |
|
179 | { NULL, NULL } | |
|
180 | }; | |
|
181 | ||
|
182 | static Py_ssize_t ZstdCompressionDict_length(ZstdCompressionDict* self) { | |
|
183 | return self->dictSize; | |
|
184 | } | |
|
185 | ||
|
186 | static PySequenceMethods ZstdCompressionDict_sq = { | |
|
187 | (lenfunc)ZstdCompressionDict_length, /* sq_length */ | |
|
188 | 0, /* sq_concat */ | |
|
189 | 0, /* sq_repeat */ | |
|
190 | 0, /* sq_item */ | |
|
191 | 0, /* sq_ass_item */ | |
|
192 | 0, /* sq_contains */ | |
|
193 | 0, /* sq_inplace_concat */ | |
|
194 | 0 /* sq_inplace_repeat */ | |
|
195 | }; | |
|
196 | ||
|
197 | PyTypeObject ZstdCompressionDictType = { | |
|
198 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
199 | "zstd.ZstdCompressionDict", /* tp_name */ | |
|
200 | sizeof(ZstdCompressionDict), /* tp_basicsize */ | |
|
201 | 0, /* tp_itemsize */ | |
|
202 | (destructor)ZstdCompressionDict_dealloc, /* tp_dealloc */ | |
|
203 | 0, /* tp_print */ | |
|
204 | 0, /* tp_getattr */ | |
|
205 | 0, /* tp_setattr */ | |
|
206 | 0, /* tp_compare */ | |
|
207 | 0, /* tp_repr */ | |
|
208 | 0, /* tp_as_number */ | |
|
209 | &ZstdCompressionDict_sq, /* tp_as_sequence */ | |
|
210 | 0, /* tp_as_mapping */ | |
|
211 | 0, /* tp_hash */ | |
|
212 | 0, /* tp_call */ | |
|
213 | 0, /* tp_str */ | |
|
214 | 0, /* tp_getattro */ | |
|
215 | 0, /* tp_setattro */ | |
|
216 | 0, /* tp_as_buffer */ | |
|
217 | Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ | |
|
218 | ZstdCompressionDict__doc__, /* tp_doc */ | |
|
219 | 0, /* tp_traverse */ | |
|
220 | 0, /* tp_clear */ | |
|
221 | 0, /* tp_richcompare */ | |
|
222 | 0, /* tp_weaklistoffset */ | |
|
223 | 0, /* tp_iter */ | |
|
224 | 0, /* tp_iternext */ | |
|
225 | ZstdCompressionDict_methods, /* tp_methods */ | |
|
226 | 0, /* tp_members */ | |
|
227 | 0, /* tp_getset */ | |
|
228 | 0, /* tp_base */ | |
|
229 | 0, /* tp_dict */ | |
|
230 | 0, /* tp_descr_get */ | |
|
231 | 0, /* tp_descr_set */ | |
|
232 | 0, /* tp_dictoffset */ | |
|
233 | (initproc)ZstdCompressionDict_init, /* tp_init */ | |
|
234 | 0, /* tp_alloc */ | |
|
235 | PyType_GenericNew, /* tp_new */ | |
|
236 | }; | |
|
237 | ||
|
238 | void compressiondict_module_init(PyObject* mod) { | |
|
239 | Py_TYPE(&ZstdCompressionDictType) = &PyType_Type; | |
|
240 | if (PyType_Ready(&ZstdCompressionDictType) < 0) { | |
|
241 | return; | |
|
242 | } | |
|
243 | ||
|
244 | Py_INCREF((PyObject*)&ZstdCompressionDictType); | |
|
245 | PyModule_AddObject(mod, "ZstdCompressionDict", | |
|
246 | (PyObject*)&ZstdCompressionDictType); | |
|
247 | } |
@@ -0,0 +1,226 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | void ztopy_compression_parameters(CompressionParametersObject* params, ZSTD_compressionParameters* zparams) { | |
|
12 | zparams->windowLog = params->windowLog; | |
|
13 | zparams->chainLog = params->chainLog; | |
|
14 | zparams->hashLog = params->hashLog; | |
|
15 | zparams->searchLog = params->searchLog; | |
|
16 | zparams->searchLength = params->searchLength; | |
|
17 | zparams->targetLength = params->targetLength; | |
|
18 | zparams->strategy = params->strategy; | |
|
19 | } | |
|
20 | ||
|
21 | CompressionParametersObject* get_compression_parameters(PyObject* self, PyObject* args) { | |
|
22 | int compressionLevel; | |
|
23 | unsigned PY_LONG_LONG sourceSize = 0; | |
|
24 | Py_ssize_t dictSize = 0; | |
|
25 | ZSTD_compressionParameters params; | |
|
26 | CompressionParametersObject* result; | |
|
27 | ||
|
28 | if (!PyArg_ParseTuple(args, "i|Kn", &compressionLevel, &sourceSize, &dictSize)) { | |
|
29 | return NULL; | |
|
30 | } | |
|
31 | ||
|
32 | params = ZSTD_getCParams(compressionLevel, sourceSize, dictSize); | |
|
33 | ||
|
34 | result = PyObject_New(CompressionParametersObject, &CompressionParametersType); | |
|
35 | if (!result) { | |
|
36 | return NULL; | |
|
37 | } | |
|
38 | ||
|
39 | result->windowLog = params.windowLog; | |
|
40 | result->chainLog = params.chainLog; | |
|
41 | result->hashLog = params.hashLog; | |
|
42 | result->searchLog = params.searchLog; | |
|
43 | result->searchLength = params.searchLength; | |
|
44 | result->targetLength = params.targetLength; | |
|
45 | result->strategy = params.strategy; | |
|
46 | ||
|
47 | return result; | |
|
48 | } | |
|
49 | ||
|
50 | PyObject* estimate_compression_context_size(PyObject* self, PyObject* args) { | |
|
51 | CompressionParametersObject* params; | |
|
52 | ZSTD_compressionParameters zparams; | |
|
53 | PyObject* result; | |
|
54 | ||
|
55 | if (!PyArg_ParseTuple(args, "O!", &CompressionParametersType, ¶ms)) { | |
|
56 | return NULL; | |
|
57 | } | |
|
58 | ||
|
59 | ztopy_compression_parameters(params, &zparams); | |
|
60 | result = PyLong_FromSize_t(ZSTD_estimateCCtxSize(zparams)); | |
|
61 | return result; | |
|
62 | } | |
|
63 | ||
|
64 | PyDoc_STRVAR(CompressionParameters__doc__, | |
|
65 | "CompressionParameters: low-level control over zstd compression"); | |
|
66 | ||
|
67 | static PyObject* CompressionParameters_new(PyTypeObject* subtype, PyObject* args, PyObject* kwargs) { | |
|
68 | CompressionParametersObject* self; | |
|
69 | unsigned windowLog; | |
|
70 | unsigned chainLog; | |
|
71 | unsigned hashLog; | |
|
72 | unsigned searchLog; | |
|
73 | unsigned searchLength; | |
|
74 | unsigned targetLength; | |
|
75 | unsigned strategy; | |
|
76 | ||
|
77 | if (!PyArg_ParseTuple(args, "IIIIIII", &windowLog, &chainLog, &hashLog, &searchLog, | |
|
78 | &searchLength, &targetLength, &strategy)) { | |
|
79 | return NULL; | |
|
80 | } | |
|
81 | ||
|
82 | if (windowLog < ZSTD_WINDOWLOG_MIN || windowLog > ZSTD_WINDOWLOG_MAX) { | |
|
83 | PyErr_SetString(PyExc_ValueError, "invalid window log value"); | |
|
84 | return NULL; | |
|
85 | } | |
|
86 | ||
|
87 | if (chainLog < ZSTD_CHAINLOG_MIN || chainLog > ZSTD_CHAINLOG_MAX) { | |
|
88 | PyErr_SetString(PyExc_ValueError, "invalid chain log value"); | |
|
89 | return NULL; | |
|
90 | } | |
|
91 | ||
|
92 | if (hashLog < ZSTD_HASHLOG_MIN || hashLog > ZSTD_HASHLOG_MAX) { | |
|
93 | PyErr_SetString(PyExc_ValueError, "invalid hash log value"); | |
|
94 | return NULL; | |
|
95 | } | |
|
96 | ||
|
97 | if (searchLog < ZSTD_SEARCHLOG_MIN || searchLog > ZSTD_SEARCHLOG_MAX) { | |
|
98 | PyErr_SetString(PyExc_ValueError, "invalid search log value"); | |
|
99 | return NULL; | |
|
100 | } | |
|
101 | ||
|
102 | if (searchLength < ZSTD_SEARCHLENGTH_MIN || searchLength > ZSTD_SEARCHLENGTH_MAX) { | |
|
103 | PyErr_SetString(PyExc_ValueError, "invalid search length value"); | |
|
104 | return NULL; | |
|
105 | } | |
|
106 | ||
|
107 | if (targetLength < ZSTD_TARGETLENGTH_MIN || targetLength > ZSTD_TARGETLENGTH_MAX) { | |
|
108 | PyErr_SetString(PyExc_ValueError, "invalid target length value"); | |
|
109 | return NULL; | |
|
110 | } | |
|
111 | ||
|
112 | if (strategy < ZSTD_fast || strategy > ZSTD_btopt) { | |
|
113 | PyErr_SetString(PyExc_ValueError, "invalid strategy value"); | |
|
114 | return NULL; | |
|
115 | } | |
|
116 | ||
|
117 | self = (CompressionParametersObject*)subtype->tp_alloc(subtype, 0); | |
|
118 | if (!self) { | |
|
119 | return NULL; | |
|
120 | } | |
|
121 | ||
|
122 | self->windowLog = windowLog; | |
|
123 | self->chainLog = chainLog; | |
|
124 | self->hashLog = hashLog; | |
|
125 | self->searchLog = searchLog; | |
|
126 | self->searchLength = searchLength; | |
|
127 | self->targetLength = targetLength; | |
|
128 | self->strategy = strategy; | |
|
129 | ||
|
130 | return (PyObject*)self; | |
|
131 | } | |
|
132 | ||
|
133 | static void CompressionParameters_dealloc(PyObject* self) { | |
|
134 | PyObject_Del(self); | |
|
135 | } | |
|
136 | ||
|
137 | static Py_ssize_t CompressionParameters_length(PyObject* self) { | |
|
138 | return 7; | |
|
139 | }; | |
|
140 | ||
|
141 | static PyObject* CompressionParameters_item(PyObject* o, Py_ssize_t i) { | |
|
142 | CompressionParametersObject* self = (CompressionParametersObject*)o; | |
|
143 | ||
|
144 | switch (i) { | |
|
145 | case 0: | |
|
146 | return PyLong_FromLong(self->windowLog); | |
|
147 | case 1: | |
|
148 | return PyLong_FromLong(self->chainLog); | |
|
149 | case 2: | |
|
150 | return PyLong_FromLong(self->hashLog); | |
|
151 | case 3: | |
|
152 | return PyLong_FromLong(self->searchLog); | |
|
153 | case 4: | |
|
154 | return PyLong_FromLong(self->searchLength); | |
|
155 | case 5: | |
|
156 | return PyLong_FromLong(self->targetLength); | |
|
157 | case 6: | |
|
158 | return PyLong_FromLong(self->strategy); | |
|
159 | default: | |
|
160 | PyErr_SetString(PyExc_IndexError, "index out of range"); | |
|
161 | return NULL; | |
|
162 | } | |
|
163 | } | |
|
164 | ||
|
165 | static PySequenceMethods CompressionParameters_sq = { | |
|
166 | CompressionParameters_length, /* sq_length */ | |
|
167 | 0, /* sq_concat */ | |
|
168 | 0, /* sq_repeat */ | |
|
169 | CompressionParameters_item, /* sq_item */ | |
|
170 | 0, /* sq_ass_item */ | |
|
171 | 0, /* sq_contains */ | |
|
172 | 0, /* sq_inplace_concat */ | |
|
173 | 0 /* sq_inplace_repeat */ | |
|
174 | }; | |
|
175 | ||
|
176 | PyTypeObject CompressionParametersType = { | |
|
177 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
178 | "CompressionParameters", /* tp_name */ | |
|
179 | sizeof(CompressionParametersObject), /* tp_basicsize */ | |
|
180 | 0, /* tp_itemsize */ | |
|
181 | (destructor)CompressionParameters_dealloc, /* tp_dealloc */ | |
|
182 | 0, /* tp_print */ | |
|
183 | 0, /* tp_getattr */ | |
|
184 | 0, /* tp_setattr */ | |
|
185 | 0, /* tp_compare */ | |
|
186 | 0, /* tp_repr */ | |
|
187 | 0, /* tp_as_number */ | |
|
188 | &CompressionParameters_sq, /* tp_as_sequence */ | |
|
189 | 0, /* tp_as_mapping */ | |
|
190 | 0, /* tp_hash */ | |
|
191 | 0, /* tp_call */ | |
|
192 | 0, /* tp_str */ | |
|
193 | 0, /* tp_getattro */ | |
|
194 | 0, /* tp_setattro */ | |
|
195 | 0, /* tp_as_buffer */ | |
|
196 | Py_TPFLAGS_DEFAULT, /* tp_flags */ | |
|
197 | CompressionParameters__doc__, /* tp_doc */ | |
|
198 | 0, /* tp_traverse */ | |
|
199 | 0, /* tp_clear */ | |
|
200 | 0, /* tp_richcompare */ | |
|
201 | 0, /* tp_weaklistoffset */ | |
|
202 | 0, /* tp_iter */ | |
|
203 | 0, /* tp_iternext */ | |
|
204 | 0, /* tp_methods */ | |
|
205 | 0, /* tp_members */ | |
|
206 | 0, /* tp_getset */ | |
|
207 | 0, /* tp_base */ | |
|
208 | 0, /* tp_dict */ | |
|
209 | 0, /* tp_descr_get */ | |
|
210 | 0, /* tp_descr_set */ | |
|
211 | 0, /* tp_dictoffset */ | |
|
212 | 0, /* tp_init */ | |
|
213 | 0, /* tp_alloc */ | |
|
214 | CompressionParameters_new, /* tp_new */ | |
|
215 | }; | |
|
216 | ||
|
217 | void compressionparams_module_init(PyObject* mod) { | |
|
218 | Py_TYPE(&CompressionParametersType) = &PyType_Type; | |
|
219 | if (PyType_Ready(&CompressionParametersType) < 0) { | |
|
220 | return; | |
|
221 | } | |
|
222 | ||
|
223 | Py_INCREF((PyObject*)&CompressionParametersType); | |
|
224 | PyModule_AddObject(mod, "CompressionParameters", | |
|
225 | (PyObject*)&CompressionParametersType); | |
|
226 | } |
@@ -0,0 +1,235 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | extern PyObject* ZstdError; | |
|
12 | ||
|
13 | PyDoc_STRVAR(ZstdCompresssionWriter__doc__, | |
|
14 | """A context manager used for writing compressed output to a writer.\n" | |
|
15 | ); | |
|
16 | ||
|
17 | static void ZstdCompressionWriter_dealloc(ZstdCompressionWriter* self) { | |
|
18 | Py_XDECREF(self->compressor); | |
|
19 | Py_XDECREF(self->writer); | |
|
20 | ||
|
21 | if (self->cstream) { | |
|
22 | ZSTD_freeCStream(self->cstream); | |
|
23 | self->cstream = NULL; | |
|
24 | } | |
|
25 | ||
|
26 | PyObject_Del(self); | |
|
27 | } | |
|
28 | ||
|
29 | static PyObject* ZstdCompressionWriter_enter(ZstdCompressionWriter* self) { | |
|
30 | if (self->entered) { | |
|
31 | PyErr_SetString(ZstdError, "cannot __enter__ multiple times"); | |
|
32 | return NULL; | |
|
33 | } | |
|
34 | ||
|
35 | self->cstream = CStream_from_ZstdCompressor(self->compressor, self->sourceSize); | |
|
36 | if (!self->cstream) { | |
|
37 | return NULL; | |
|
38 | } | |
|
39 | ||
|
40 | self->entered = 1; | |
|
41 | ||
|
42 | Py_INCREF(self); | |
|
43 | return (PyObject*)self; | |
|
44 | } | |
|
45 | ||
|
46 | static PyObject* ZstdCompressionWriter_exit(ZstdCompressionWriter* self, PyObject* args) { | |
|
47 | PyObject* exc_type; | |
|
48 | PyObject* exc_value; | |
|
49 | PyObject* exc_tb; | |
|
50 | size_t zresult; | |
|
51 | ||
|
52 | ZSTD_outBuffer output; | |
|
53 | PyObject* res; | |
|
54 | ||
|
55 | if (!PyArg_ParseTuple(args, "OOO", &exc_type, &exc_value, &exc_tb)) { | |
|
56 | return NULL; | |
|
57 | } | |
|
58 | ||
|
59 | self->entered = 0; | |
|
60 | ||
|
61 | if (self->cstream && exc_type == Py_None && exc_value == Py_None && | |
|
62 | exc_tb == Py_None) { | |
|
63 | ||
|
64 | output.dst = malloc(self->outSize); | |
|
65 | if (!output.dst) { | |
|
66 | return PyErr_NoMemory(); | |
|
67 | } | |
|
68 | output.size = self->outSize; | |
|
69 | output.pos = 0; | |
|
70 | ||
|
71 | while (1) { | |
|
72 | zresult = ZSTD_endStream(self->cstream, &output); | |
|
73 | if (ZSTD_isError(zresult)) { | |
|
74 | PyErr_Format(ZstdError, "error ending compression stream: %s", | |
|
75 | ZSTD_getErrorName(zresult)); | |
|
76 | free(output.dst); | |
|
77 | return NULL; | |
|
78 | } | |
|
79 | ||
|
80 | if (output.pos) { | |
|
81 | #if PY_MAJOR_VERSION >= 3 | |
|
82 | res = PyObject_CallMethod(self->writer, "write", "y#", | |
|
83 | #else | |
|
84 | res = PyObject_CallMethod(self->writer, "write", "s#", | |
|
85 | #endif | |
|
86 | output.dst, output.pos); | |
|
87 | Py_XDECREF(res); | |
|
88 | } | |
|
89 | ||
|
90 | if (!zresult) { | |
|
91 | break; | |
|
92 | } | |
|
93 | ||
|
94 | output.pos = 0; | |
|
95 | } | |
|
96 | ||
|
97 | free(output.dst); | |
|
98 | ZSTD_freeCStream(self->cstream); | |
|
99 | self->cstream = NULL; | |
|
100 | } | |
|
101 | ||
|
102 | Py_RETURN_FALSE; | |
|
103 | } | |
|
104 | ||
|
105 | static PyObject* ZstdCompressionWriter_memory_size(ZstdCompressionWriter* self) { | |
|
106 | if (!self->cstream) { | |
|
107 | PyErr_SetString(ZstdError, "cannot determine size of an inactive compressor; " | |
|
108 | "call when a context manager is active"); | |
|
109 | return NULL; | |
|
110 | } | |
|
111 | ||
|
112 | return PyLong_FromSize_t(ZSTD_sizeof_CStream(self->cstream)); | |
|
113 | } | |
|
114 | ||
|
115 | static PyObject* ZstdCompressionWriter_write(ZstdCompressionWriter* self, PyObject* args) { | |
|
116 | const char* source; | |
|
117 | Py_ssize_t sourceSize; | |
|
118 | size_t zresult; | |
|
119 | ZSTD_inBuffer input; | |
|
120 | ZSTD_outBuffer output; | |
|
121 | PyObject* res; | |
|
122 | ||
|
123 | #if PY_MAJOR_VERSION >= 3 | |
|
124 | if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { | |
|
125 | #else | |
|
126 | if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { | |
|
127 | #endif | |
|
128 | return NULL; | |
|
129 | } | |
|
130 | ||
|
131 | if (!self->entered) { | |
|
132 | PyErr_SetString(ZstdError, "compress must be called from an active context manager"); | |
|
133 | return NULL; | |
|
134 | } | |
|
135 | ||
|
136 | output.dst = malloc(self->outSize); | |
|
137 | if (!output.dst) { | |
|
138 | return PyErr_NoMemory(); | |
|
139 | } | |
|
140 | output.size = self->outSize; | |
|
141 | output.pos = 0; | |
|
142 | ||
|
143 | input.src = source; | |
|
144 | input.size = sourceSize; | |
|
145 | input.pos = 0; | |
|
146 | ||
|
147 | while ((ssize_t)input.pos < sourceSize) { | |
|
148 | Py_BEGIN_ALLOW_THREADS | |
|
149 | zresult = ZSTD_compressStream(self->cstream, &output, &input); | |
|
150 | Py_END_ALLOW_THREADS | |
|
151 | ||
|
152 | if (ZSTD_isError(zresult)) { | |
|
153 | free(output.dst); | |
|
154 | PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); | |
|
155 | return NULL; | |
|
156 | } | |
|
157 | ||
|
158 | /* Copy data from output buffer to writer. */ | |
|
159 | if (output.pos) { | |
|
160 | #if PY_MAJOR_VERSION >= 3 | |
|
161 | res = PyObject_CallMethod(self->writer, "write", "y#", | |
|
162 | #else | |
|
163 | res = PyObject_CallMethod(self->writer, "write", "s#", | |
|
164 | #endif | |
|
165 | output.dst, output.pos); | |
|
166 | Py_XDECREF(res); | |
|
167 | } | |
|
168 | output.pos = 0; | |
|
169 | } | |
|
170 | ||
|
171 | free(output.dst); | |
|
172 | ||
|
173 | /* TODO return bytes written */ | |
|
174 | Py_RETURN_NONE; | |
|
175 | } | |
|
176 | ||
|
177 | static PyMethodDef ZstdCompressionWriter_methods[] = { | |
|
178 | { "__enter__", (PyCFunction)ZstdCompressionWriter_enter, METH_NOARGS, | |
|
179 | PyDoc_STR("Enter a compression context.") }, | |
|
180 | { "__exit__", (PyCFunction)ZstdCompressionWriter_exit, METH_VARARGS, | |
|
181 | PyDoc_STR("Exit a compression context.") }, | |
|
182 | { "memory_size", (PyCFunction)ZstdCompressionWriter_memory_size, METH_NOARGS, | |
|
183 | PyDoc_STR("Obtain the memory size of the underlying compressor") }, | |
|
184 | { "write", (PyCFunction)ZstdCompressionWriter_write, METH_VARARGS, | |
|
185 | PyDoc_STR("Compress data") }, | |
|
186 | { NULL, NULL } | |
|
187 | }; | |
|
188 | ||
|
189 | PyTypeObject ZstdCompressionWriterType = { | |
|
190 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
191 | "zstd.ZstdCompressionWriter", /* tp_name */ | |
|
192 | sizeof(ZstdCompressionWriter), /* tp_basicsize */ | |
|
193 | 0, /* tp_itemsize */ | |
|
194 | (destructor)ZstdCompressionWriter_dealloc, /* tp_dealloc */ | |
|
195 | 0, /* tp_print */ | |
|
196 | 0, /* tp_getattr */ | |
|
197 | 0, /* tp_setattr */ | |
|
198 | 0, /* tp_compare */ | |
|
199 | 0, /* tp_repr */ | |
|
200 | 0, /* tp_as_number */ | |
|
201 | 0, /* tp_as_sequence */ | |
|
202 | 0, /* tp_as_mapping */ | |
|
203 | 0, /* tp_hash */ | |
|
204 | 0, /* tp_call */ | |
|
205 | 0, /* tp_str */ | |
|
206 | 0, /* tp_getattro */ | |
|
207 | 0, /* tp_setattro */ | |
|
208 | 0, /* tp_as_buffer */ | |
|
209 | Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ | |
|
210 | ZstdCompresssionWriter__doc__, /* tp_doc */ | |
|
211 | 0, /* tp_traverse */ | |
|
212 | 0, /* tp_clear */ | |
|
213 | 0, /* tp_richcompare */ | |
|
214 | 0, /* tp_weaklistoffset */ | |
|
215 | 0, /* tp_iter */ | |
|
216 | 0, /* tp_iternext */ | |
|
217 | ZstdCompressionWriter_methods, /* tp_methods */ | |
|
218 | 0, /* tp_members */ | |
|
219 | 0, /* tp_getset */ | |
|
220 | 0, /* tp_base */ | |
|
221 | 0, /* tp_dict */ | |
|
222 | 0, /* tp_descr_get */ | |
|
223 | 0, /* tp_descr_set */ | |
|
224 | 0, /* tp_dictoffset */ | |
|
225 | 0, /* tp_init */ | |
|
226 | 0, /* tp_alloc */ | |
|
227 | PyType_GenericNew, /* tp_new */ | |
|
228 | }; | |
|
229 | ||
|
230 | void compressionwriter_module_init(PyObject* mod) { | |
|
231 | Py_TYPE(&ZstdCompressionWriterType) = &PyType_Type; | |
|
232 | if (PyType_Ready(&ZstdCompressionWriterType) < 0) { | |
|
233 | return; | |
|
234 | } | |
|
235 | } |
@@ -0,0 +1,205 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | extern PyObject* ZstdError; | |
|
12 | ||
|
13 | PyDoc_STRVAR(ZstdCompressionObj__doc__, | |
|
14 | "Perform compression using a standard library compatible API.\n" | |
|
15 | ); | |
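The "standard library compatible API" this docstring refers to is the `zlib.compressobj` / `bz2.BZ2Compressor` protocol: repeated `compress()` calls that may return partial (or empty) output, followed by exactly one `flush()`. A sketch of that protocol using zlib itself, which is the model the type mirrors:

```python
import zlib

def compress_chunks(chunks):
    """Compress an iterable of byte chunks using the compressobj protocol
    that ZstdCompressionObj mirrors: compress() per chunk, flush() once."""
    c = zlib.compressobj()
    out = []
    for chunk in chunks:
        data = c.compress(chunk)
        if data:             # compress() may buffer input and return b""
            out.append(data)
    out.append(c.flush())    # after flush(), further compress() calls error
    return b"".join(out)
```

The one-shot `flush()` rule is why the C implementation tracks a `flushed` flag and rejects `compress()` afterwards.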
|
16 | ||
|
17 | static void ZstdCompressionObj_dealloc(ZstdCompressionObj* self) { | |
|
18 | PyMem_Free(self->output.dst); | |
|
19 | self->output.dst = NULL; | |
|
20 | ||
|
21 | if (self->cstream) { | |
|
22 | ZSTD_freeCStream(self->cstream); | |
|
23 | self->cstream = NULL; | |
|
24 | } | |
|
25 | ||
|
26 | Py_XDECREF(self->compressor); | |
|
27 | ||
|
28 | PyObject_Del(self); | |
|
29 | } | |
|
30 | ||
|
31 | static PyObject* ZstdCompressionObj_compress(ZstdCompressionObj* self, PyObject* args) { | |
|
32 | const char* source; | |
|
33 | Py_ssize_t sourceSize; | |
|
34 | ZSTD_inBuffer input; | |
|
35 | size_t zresult; | |
|
36 | PyObject* result = NULL; | |
|
37 | Py_ssize_t resultSize = 0; | |
|
38 | ||
|
39 | if (self->flushed) { | |
|
40 | PyErr_SetString(ZstdError, "cannot call compress() after flush() has been called"); | |
|
41 | return NULL; | |
|
42 | } | |
|
43 | ||
|
44 | #if PY_MAJOR_VERSION >= 3 | |
|
45 | if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { | |
|
46 | #else | |
|
47 | if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { | |
|
48 | #endif | |
|
49 | return NULL; | |
|
50 | } | |
|
51 | ||
|
52 | input.src = source; | |
|
53 | input.size = sourceSize; | |
|
54 | input.pos = 0; | |
|
55 | ||
|
56 | while ((ssize_t)input.pos < sourceSize) { | |
|
57 | Py_BEGIN_ALLOW_THREADS | |
|
58 | zresult = ZSTD_compressStream(self->cstream, &self->output, &input); | |
|
59 | Py_END_ALLOW_THREADS | |
|
60 | ||
|
61 | if (ZSTD_isError(zresult)) { | |
|
62 | PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); | |
|
63 | return NULL; | |
|
64 | } | |
|
65 | ||
|
66 | if (self->output.pos) { | |
|
67 | if (result) { | |
|
68 | resultSize = PyBytes_GET_SIZE(result); | |
|
69 | if (-1 == _PyBytes_Resize(&result, resultSize + self->output.pos)) { | |
|
70 | return NULL; | |
|
71 | } | |
|
72 | ||
|
73 | memcpy(PyBytes_AS_STRING(result) + resultSize, | |
|
74 | self->output.dst, self->output.pos); | |
|
75 | } | |
|
76 | else { | |
|
77 | result = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); | |
|
78 | if (!result) { | |
|
79 | return NULL; | |
|
80 | } | |
|
81 | } | |
|
82 | ||
|
83 | self->output.pos = 0; | |
|
84 | } | |
|
85 | } | |
|
86 | ||
|
87 | if (result) { | |
|
88 | return result; | |
|
89 | } | |
|
90 | else { | |
|
91 | return PyBytes_FromString(""); | |
|
92 | } | |
|
93 | } | |
|
94 | ||
|
95 | static PyObject* ZstdCompressionObj_flush(ZstdCompressionObj* self) { | |
|
96 | size_t zresult; | |
|
97 | PyObject* result = NULL; | |
|
98 | Py_ssize_t resultSize = 0; | |
|
99 | ||
|
100 | if (self->flushed) { | |
|
101 | PyErr_SetString(ZstdError, "flush() already called"); | |
|
102 | return NULL; | |
|
103 | } | |
|
104 | ||
|
105 | self->flushed = 1; | |
|
106 | ||
|
107 | while (1) { | |
|
108 | zresult = ZSTD_endStream(self->cstream, &self->output); | |
|
109 | if (ZSTD_isError(zresult)) { | |
|
110 | PyErr_Format(ZstdError, "error ending compression stream: %s", | |
|
111 | ZSTD_getErrorName(zresult)); | |
|
112 | return NULL; | |
|
113 | } | |
|
114 | ||
|
115 | if (self->output.pos) { | |
|
116 | if (result) { | |
|
117 | resultSize = PyBytes_GET_SIZE(result); | |
|
118 | if (-1 == _PyBytes_Resize(&result, resultSize + self->output.pos)) { | |
|
119 | return NULL; | |
|
120 | } | |
|
121 | ||
|
122 | memcpy(PyBytes_AS_STRING(result) + resultSize, | |
|
123 | self->output.dst, self->output.pos); | |
|
124 | } | |
|
125 | else { | |
|
126 | result = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); | |
|
127 | if (!result) { | |
|
128 | return NULL; | |
|
129 | } | |
|
130 | } | |
|
131 | ||
|
132 | self->output.pos = 0; | |
|
133 | } | |
|
134 | ||
|
135 | if (!zresult) { | |
|
136 | break; | |
|
137 | } | |
|
138 | } | |
|
139 | ||
|
140 | ZSTD_freeCStream(self->cstream); | |
|
141 | self->cstream = NULL; | |
|
142 | ||
|
143 | if (result) { | |
|
144 | return result; | |
|
145 | } | |
|
146 | else { | |
|
147 | return PyBytes_FromString(""); | |
|
148 | } | |
|
149 | } | |
|
150 | ||
|
151 | static PyMethodDef ZstdCompressionObj_methods[] = { | |
|
152 | { "compress", (PyCFunction)ZstdCompressionObj_compress, METH_VARARGS, | |
|
153 | PyDoc_STR("compress data") }, | |
|
154 | { "flush", (PyCFunction)ZstdCompressionObj_flush, METH_NOARGS, | |
|
155 | PyDoc_STR("finish compression operation") }, | |
|
156 | { NULL, NULL } | |
|
157 | }; | |
|
158 | ||
|
159 | PyTypeObject ZstdCompressionObjType = { | |
|
160 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
161 | "zstd.ZstdCompressionObj", /* tp_name */ | |
|
162 | sizeof(ZstdCompressionObj), /* tp_basicsize */ | |
|
163 | 0, /* tp_itemsize */ | |
|
164 | (destructor)ZstdCompressionObj_dealloc, /* tp_dealloc */ | |
|
165 | 0, /* tp_print */ | |
|
166 | 0, /* tp_getattr */ | |
|
167 | 0, /* tp_setattr */ | |
|
168 | 0, /* tp_compare */ | |
|
169 | 0, /* tp_repr */ | |
|
170 | 0, /* tp_as_number */ | |
|
171 | 0, /* tp_as_sequence */ | |
|
172 | 0, /* tp_as_mapping */ | |
|
173 | 0, /* tp_hash */ | |
|
174 | 0, /* tp_call */ | |
|
175 | 0, /* tp_str */ | |
|
176 | 0, /* tp_getattro */ | |
|
177 | 0, /* tp_setattro */ | |
|
178 | 0, /* tp_as_buffer */ | |
|
179 | Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ | |
|
180 | ZstdCompressionObj__doc__, /* tp_doc */ | |
|
181 | 0, /* tp_traverse */ | |
|
182 | 0, /* tp_clear */ | |
|
183 | 0, /* tp_richcompare */ | |
|
184 | 0, /* tp_weaklistoffset */ | |
|
185 | 0, /* tp_iter */ | |
|
186 | 0, /* tp_iternext */ | |
|
187 | ZstdCompressionObj_methods, /* tp_methods */ | |
|
188 | 0, /* tp_members */ | |
|
189 | 0, /* tp_getset */ | |
|
190 | 0, /* tp_base */ | |
|
191 | 0, /* tp_dict */ | |
|
192 | 0, /* tp_descr_get */ | |
|
193 | 0, /* tp_descr_set */ | |
|
194 | 0, /* tp_dictoffset */ | |
|
195 | 0, /* tp_init */ | |
|
196 | 0, /* tp_alloc */ | |
|
197 | PyType_GenericNew, /* tp_new */ | |
|
198 | }; | |
|
199 | ||
|
200 | void compressobj_module_init(PyObject* module) { | |
|
201 | Py_TYPE(&ZstdCompressionObjType) = &PyType_Type; | |
|
202 | if (PyType_Ready(&ZstdCompressionObjType) < 0) { | |
|
203 | return; | |
|
204 | } | |
|
205 | } |
This diff has been collapsed as it changes many lines (757 lines changed)

@@ -0,0 +1,757 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | extern PyObject* ZstdError; | |
|
12 | ||
|
13 | /** | |
|
14 | * Initialize a zstd CStream from a ZstdCompressor instance. | |
|
15 | * | |
|
16 | * Returns a ZSTD_CStream on success or NULL on failure. If NULL, a Python | |
|
17 | * exception will be set. | |
|
18 | */ | |
|
19 | ZSTD_CStream* CStream_from_ZstdCompressor(ZstdCompressor* compressor, Py_ssize_t sourceSize) { | |
|
20 | ZSTD_CStream* cstream; | |
|
21 | ZSTD_parameters zparams; | |
|
22 | void* dictData = NULL; | |
|
23 | size_t dictSize = 0; | |
|
24 | size_t zresult; | |
|
25 | ||
|
26 | cstream = ZSTD_createCStream(); | |
|
27 | if (!cstream) { | |
|
28 | PyErr_SetString(ZstdError, "cannot create CStream"); | |
|
29 | return NULL; | |
|
30 | } | |
|
31 | ||
|
32 | if (compressor->dict) { | |
|
33 | dictData = compressor->dict->dictData; | |
|
34 | dictSize = compressor->dict->dictSize; | |
|
35 | } | |
|
36 | ||
|
37 | memset(&zparams, 0, sizeof(zparams)); | |
|
38 | if (compressor->cparams) { | |
|
39 | ztopy_compression_parameters(compressor->cparams, &zparams.cParams); | |
|
40 | /* Do NOT call ZSTD_adjustCParams() here because the compression params | |
|
41 | come from the user. */ | |
|
42 | } | |
|
43 | else { | |
|
44 | zparams.cParams = ZSTD_getCParams(compressor->compressionLevel, sourceSize, dictSize); | |
|
45 | } | |
|
46 | ||
|
47 | zparams.fParams = compressor->fparams; | |
|
48 | ||
|
49 | zresult = ZSTD_initCStream_advanced(cstream, dictData, dictSize, zparams, sourceSize); | |
|
50 | ||
|
51 | if (ZSTD_isError(zresult)) { | |
|
52 | ZSTD_freeCStream(cstream); | |
|
53 | PyErr_Format(ZstdError, "cannot init CStream: %s", ZSTD_getErrorName(zresult)); | |
|
54 | return NULL; | |
|
55 | } | |
|
56 | ||
|
57 | return cstream; | |
|
58 | } | |
|
59 | ||
|
60 | ||
|
61 | PyDoc_STRVAR(ZstdCompressor__doc__, | |
|
62 | "ZstdCompressor(level=None, dict_data=None, compression_params=None)\n" | |
|
63 | "\n" | |
|
64 | "Create an object used to perform Zstandard compression.\n" | |
|
65 | "\n" | |
|
66 | "An instance can compress data various ways. Instances can be used multiple\n" | |
|
67 | "times. Each compression operation will use the compression parameters\n" | |
|
68 | "defined at construction time.\n" | |
|
69 | "\n" | |
|
70 | "Compression can be configured via the following named arguments:\n" | 
|
71 | "\n" | |
|
72 | "level\n" | |
|
73 | " Integer compression level.\n" | |
|
74 | "dict_data\n" | |
|
75 | " A ``ZstdCompressionDict`` to be used to compress with dictionary data.\n" | |
|
76 | "compression_params\n" | |
|
77 | " A ``CompressionParameters`` instance defining low-level compression" | |
|
78 | " parameters. If defined, this will overwrite the ``level`` argument.\n" | |
|
79 | "write_checksum\n" | |
|
80 | " If True, a 4 byte content checksum will be written with the compressed\n" | |
|
81 | " data, allowing the decompressor to perform content verification.\n" | |
|
82 | "write_content_size\n" | |
|
83 | " If True, the decompressed content size will be included in the header of\n" | |
|
84 | " the compressed data. This data will only be written if the compressor\n" | |
|
85 | " knows the size of the input data.\n" | |
|
86 | "write_dict_id\n" | |
|
87 | " Determines whether the dictionary ID will be written into the compressed\n" | |
|
88 | " data. Defaults to True. Only adds content to the compressed data if\n" | |
|
89 | " a dictionary is being used.\n" | |
|
90 | ); | |
|
91 | ||
|
92 | static int ZstdCompressor_init(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { | |
|
93 | static char* kwlist[] = { | |
|
94 | "level", | |
|
95 | "dict_data", | |
|
96 | "compression_params", | |
|
97 | "write_checksum", | |
|
98 | "write_content_size", | |
|
99 | "write_dict_id", | |
|
100 | NULL | |
|
101 | }; | |
|
102 | ||
|
103 | int level = 3; | |
|
104 | ZstdCompressionDict* dict = NULL; | |
|
105 | CompressionParametersObject* params = NULL; | |
|
106 | PyObject* writeChecksum = NULL; | |
|
107 | PyObject* writeContentSize = NULL; | |
|
108 | PyObject* writeDictID = NULL; | |
|
109 | ||
|
110 | self->dict = NULL; | |
|
111 | self->cparams = NULL; | |
|
112 | self->cdict = NULL; | |
|
113 | ||
|
114 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iO!O!OOO", kwlist, | |
|
115 | &level, &ZstdCompressionDictType, &dict, | |
|
116 | &CompressionParametersType, ¶ms, | |
|
117 | &writeChecksum, &writeContentSize, &writeDictID)) { | |
|
118 | return -1; | |
|
119 | } | |
|
120 | ||
|
121 | if (level < 1) { | |
|
122 | PyErr_SetString(PyExc_ValueError, "level must be greater than 0"); | |
|
123 | return -1; | |
|
124 | } | |
|
125 | ||
|
126 | if (level > ZSTD_maxCLevel()) { | |
|
127 | PyErr_Format(PyExc_ValueError, "level must be less than %d", | |
|
128 | ZSTD_maxCLevel() + 1); | |
|
129 | return -1; | |
|
130 | } | |
|
131 | ||
|
132 | self->compressionLevel = level; | |
|
133 | ||
|
134 | if (dict) { | |
|
135 | self->dict = dict; | |
|
136 | Py_INCREF(dict); | |
|
137 | } | |
|
138 | ||
|
139 | if (params) { | |
|
140 | self->cparams = params; | |
|
141 | Py_INCREF(params); | |
|
142 | } | |
|
143 | ||
|
144 | memset(&self->fparams, 0, sizeof(self->fparams)); | |
|
145 | ||
|
146 | if (writeChecksum && PyObject_IsTrue(writeChecksum)) { | |
|
147 | self->fparams.checksumFlag = 1; | |
|
148 | } | |
|
149 | if (writeContentSize && PyObject_IsTrue(writeContentSize)) { | |
|
150 | self->fparams.contentSizeFlag = 1; | |
|
151 | } | |
|
152 | if (writeDictID && PyObject_Not(writeDictID)) { | |
|
153 | self->fparams.noDictIDFlag = 1; | |
|
154 | } | |
|
155 | ||
|
156 | return 0; | |
|
157 | } | |
|
158 | ||
|
159 | static void ZstdCompressor_dealloc(ZstdCompressor* self) { | |
|
160 | Py_XDECREF(self->cparams); | |
|
161 | Py_XDECREF(self->dict); | |
|
162 | ||
|
163 | if (self->cdict) { | |
|
164 | ZSTD_freeCDict(self->cdict); | |
|
165 | self->cdict = NULL; | |
|
166 | } | |
|
167 | ||
|
168 | PyObject_Del(self); | |
|
169 | } | |
|
170 | ||
|
171 | PyDoc_STRVAR(ZstdCompressor_copy_stream__doc__, | |
|
172 | "copy_stream(ifh, ofh[, size=0, read_size=default, write_size=default])\n" | |
|
173 | "Compress data between streams\n" | 
|
174 | "\n" | |
|
175 | "Data will be read from ``ifh``, compressed, and written to ``ofh``.\n" | |
|
176 | "``ifh`` must have a ``read(size)`` method. ``ofh`` must have a ``write(data)``\n" | |
|
177 | "method.\n" | |
|
178 | "\n" | |
|
179 | "An optional ``size`` argument specifies the size of the source stream.\n" | |
|
180 | "If defined, compression parameters will be tuned based on the size.\n" | |
|
181 | "\n" | |
|
182 | "Optional arguments ``read_size`` and ``write_size`` define the chunk sizes\n" | |
|
183 | "of ``read()`` and ``write()`` operations, respectively. By default, they use\n" | |
|
184 | "the default compression stream input and output sizes, respectively.\n" | |
|
185 | ); | |
|
186 | ||
|
187 | static PyObject* ZstdCompressor_copy_stream(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { | |
|
188 | static char* kwlist[] = { | |
|
189 | "ifh", | |
|
190 | "ofh", | |
|
191 | "size", | |
|
192 | "read_size", | |
|
193 | "write_size", | |
|
194 | NULL | |
|
195 | }; | |
|
196 | ||
|
197 | PyObject* source; | |
|
198 | PyObject* dest; | |
|
199 | Py_ssize_t sourceSize = 0; | |
|
200 | size_t inSize = ZSTD_CStreamInSize(); | |
|
201 | size_t outSize = ZSTD_CStreamOutSize(); | |
|
202 | ZSTD_CStream* cstream; | |
|
203 | ZSTD_inBuffer input; | |
|
204 | 	ZSTD_outBuffer output = { NULL, 0, 0 }; /* dst must be NULL in case we goto finally before allocating */ | 
|
205 | Py_ssize_t totalRead = 0; | |
|
206 | Py_ssize_t totalWrite = 0; | |
|
207 | char* readBuffer; | |
|
208 | Py_ssize_t readSize; | |
|
209 | PyObject* readResult; | |
|
210 | PyObject* res = NULL; | |
|
211 | size_t zresult; | |
|
212 | PyObject* writeResult; | |
|
213 | PyObject* totalReadPy; | |
|
214 | PyObject* totalWritePy; | |
|
215 | ||
|
216 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nkk", kwlist, &source, &dest, &sourceSize, | |
|
217 | &inSize, &outSize)) { | |
|
218 | return NULL; | |
|
219 | } | |
|
220 | ||
|
221 | if (!PyObject_HasAttrString(source, "read")) { | |
|
222 | PyErr_SetString(PyExc_ValueError, "first argument must have a read() method"); | |
|
223 | return NULL; | |
|
224 | } | |
|
225 | ||
|
226 | if (!PyObject_HasAttrString(dest, "write")) { | |
|
227 | PyErr_SetString(PyExc_ValueError, "second argument must have a write() method"); | |
|
228 | return NULL; | |
|
229 | } | |
|
230 | ||
|
231 | cstream = CStream_from_ZstdCompressor(self, sourceSize); | |
|
232 | if (!cstream) { | |
|
233 | res = NULL; | |
|
234 | goto finally; | |
|
235 | } | |
|
236 | ||
|
237 | output.dst = PyMem_Malloc(outSize); | |
|
238 | if (!output.dst) { | |
|
239 | PyErr_NoMemory(); | |
|
240 | res = NULL; | |
|
241 | goto finally; | |
|
242 | } | |
|
243 | output.size = outSize; | |
|
244 | output.pos = 0; | |
|
245 | ||
|
246 | while (1) { | |
|
247 | /* Try to read from source stream. */ | |
|
248 | readResult = PyObject_CallMethod(source, "read", "n", inSize); | |
|
249 | if (!readResult) { | |
|
250 | PyErr_SetString(ZstdError, "could not read() from source"); | |
|
251 | goto finally; | |
|
252 | } | |
|
253 | ||
|
254 | PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); | |
|
255 | ||
|
256 | /* If no data was read, we're at EOF. */ | |
|
257 | if (0 == readSize) { | |
|
258 | break; | |
|
259 | } | |
|
260 | ||
|
261 | totalRead += readSize; | |
|
262 | ||
|
263 | /* Send data to compressor */ | |
|
264 | input.src = readBuffer; | |
|
265 | input.size = readSize; | |
|
266 | input.pos = 0; | |
|
267 | ||
|
268 | while (input.pos < input.size) { | |
|
269 | Py_BEGIN_ALLOW_THREADS | |
|
270 | zresult = ZSTD_compressStream(cstream, &output, &input); | |
|
271 | Py_END_ALLOW_THREADS | |
|
272 | ||
|
273 | if (ZSTD_isError(zresult)) { | |
|
274 | res = NULL; | |
|
275 | PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); | |
|
276 | goto finally; | |
|
277 | } | |
|
278 | ||
|
279 | if (output.pos) { | |
|
280 | #if PY_MAJOR_VERSION >= 3 | |
|
281 | writeResult = PyObject_CallMethod(dest, "write", "y#", | |
|
282 | #else | |
|
283 | writeResult = PyObject_CallMethod(dest, "write", "s#", | |
|
284 | #endif | |
|
285 | output.dst, output.pos); | |
|
286 | Py_XDECREF(writeResult); | |
|
287 | totalWrite += output.pos; | |
|
288 | output.pos = 0; | |
|
289 | } | |
|
290 | } | |
|
291 | } | |
|
292 | ||
|
293 | /* We've finished reading. Now flush the compressor stream. */ | |
|
294 | while (1) { | |
|
295 | zresult = ZSTD_endStream(cstream, &output); | |
|
296 | if (ZSTD_isError(zresult)) { | |
|
297 | PyErr_Format(ZstdError, "error ending compression stream: %s", | |
|
298 | ZSTD_getErrorName(zresult)); | |
|
299 | res = NULL; | |
|
300 | goto finally; | |
|
301 | } | |
|
302 | ||
|
303 | if (output.pos) { | |
|
304 | #if PY_MAJOR_VERSION >= 3 | |
|
305 | writeResult = PyObject_CallMethod(dest, "write", "y#", | |
|
306 | #else | |
|
307 | writeResult = PyObject_CallMethod(dest, "write", "s#", | |
|
308 | #endif | |
|
309 | output.dst, output.pos); | |
|
310 | totalWrite += output.pos; | |
|
311 | Py_XDECREF(writeResult); | |
|
312 | output.pos = 0; | |
|
313 | } | |
|
314 | ||
|
315 | if (!zresult) { | |
|
316 | break; | |
|
317 | } | |
|
318 | } | |
|
319 | ||
|
320 | ZSTD_freeCStream(cstream); | |
|
321 | cstream = NULL; | |
|
322 | ||
|
323 | totalReadPy = PyLong_FromSsize_t(totalRead); | |
|
324 | totalWritePy = PyLong_FromSsize_t(totalWrite); | |
|
325 | res = PyTuple_Pack(2, totalReadPy, totalWritePy); | |
|
326 | Py_DecRef(totalReadPy); | |
|
327 | Py_DecRef(totalWritePy); | |
|
328 | ||
|
329 | finally: | |
|
330 | if (output.dst) { | |
|
331 | PyMem_Free(output.dst); | |
|
332 | } | |
|
333 | ||
|
334 | if (cstream) { | |
|
335 | ZSTD_freeCStream(cstream); | |
|
336 | } | |
|
337 | ||
|
338 | return res; | |
|
339 | } | |
|
340 | ||
|
341 | PyDoc_STRVAR(ZstdCompressor_compress__doc__, | |
|
342 | "compress(data)\n" | |
|
343 | "\n" | |
|
344 | "Compress data in a single operation.\n" | |
|
345 | "\n" | |
|
346 | "This is the simplest mechanism to perform compression: simply pass in a\n" | |
|
347 | "value and get a compressed value back. It is also the most prone to abuse.\n" | 
|
348 | "The input and output values must fit in memory, so passing in very large\n" | |
|
349 | "values can result in excessive memory usage. For this reason, one of the\n" | |
|
350 | "streaming based APIs is preferred for larger values.\n" | |
|
351 | ); | |
|
352 | ||
|
353 | static PyObject* ZstdCompressor_compress(ZstdCompressor* self, PyObject* args) { | |
|
354 | const char* source; | |
|
355 | Py_ssize_t sourceSize; | |
|
356 | size_t destSize; | |
|
357 | ZSTD_CCtx* cctx; | |
|
358 | PyObject* output; | |
|
359 | char* dest; | |
|
360 | void* dictData = NULL; | |
|
361 | size_t dictSize = 0; | |
|
362 | size_t zresult; | |
|
363 | ZSTD_parameters zparams; | |
|
364 | ZSTD_customMem zmem; | |
|
365 | ||
|
366 | #if PY_MAJOR_VERSION >= 3 | |
|
367 | if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { | |
|
368 | #else | |
|
369 | if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { | |
|
370 | #endif | |
|
371 | return NULL; | |
|
372 | } | |
|
373 | ||
|
374 | destSize = ZSTD_compressBound(sourceSize); | |
|
375 | output = PyBytes_FromStringAndSize(NULL, destSize); | |
|
376 | if (!output) { | |
|
377 | return NULL; | |
|
378 | } | |
|
379 | ||
|
380 | dest = PyBytes_AsString(output); | |
|
381 | ||
|
382 | cctx = ZSTD_createCCtx(); | |
|
383 | if (!cctx) { | |
|
384 | Py_DECREF(output); | |
|
385 | PyErr_SetString(ZstdError, "could not create CCtx"); | |
|
386 | return NULL; | |
|
387 | } | |
|
388 | ||
|
389 | if (self->dict) { | |
|
390 | dictData = self->dict->dictData; | |
|
391 | dictSize = self->dict->dictSize; | |
|
392 | } | |
|
393 | ||
|
394 | memset(&zparams, 0, sizeof(zparams)); | |
|
395 | if (!self->cparams) { | |
|
396 | zparams.cParams = ZSTD_getCParams(self->compressionLevel, sourceSize, dictSize); | |
|
397 | } | |
|
398 | else { | |
|
399 | ztopy_compression_parameters(self->cparams, &zparams.cParams); | |
|
400 | /* Do NOT call ZSTD_adjustCParams() here because the compression params | |
|
401 | come from the user. */ | |
|
402 | } | |
|
403 | ||
|
404 | zparams.fParams = self->fparams; | |
|
405 | ||
|
406 | /* The raw dict data has to be processed before it can be used. Since this | |
|
407 | adds overhead - especially if multiple dictionary compression operations | |
|
408 | are performed on the same ZstdCompressor instance - we create a | |
|
409 | ZSTD_CDict once and reuse it for all operations. */ | |
|
410 | ||
|
411 | /* TODO the zparams (which can be derived from the source data size) used | |
|
412 | on first invocation are effectively reused for subsequent operations. This | |
|
413 | may not be appropriate if input sizes vary significantly and could affect | |
|
414 | chosen compression parameters. | |
|
415 | https://github.com/facebook/zstd/issues/358 tracks this issue. */ | |
|
416 | if (dictData && !self->cdict) { | |
|
417 | Py_BEGIN_ALLOW_THREADS | |
|
418 | memset(&zmem, 0, sizeof(zmem)); | |
|
419 | self->cdict = ZSTD_createCDict_advanced(dictData, dictSize, zparams, zmem); | |
|
420 | Py_END_ALLOW_THREADS | |
|
421 | ||
|
422 | if (!self->cdict) { | |
|
423 | Py_DECREF(output); | |
|
424 | ZSTD_freeCCtx(cctx); | |
|
425 | PyErr_SetString(ZstdError, "could not create compression dictionary"); | |
|
426 | return NULL; | |
|
427 | } | |
|
428 | } | |
|
429 | ||
|
430 | Py_BEGIN_ALLOW_THREADS | |
|
431 | /* By avoiding ZSTD_compress(), we don't necessarily write out content | |
|
432 | size. This means the argument to ZstdCompressor to control frame | |
|
433 | parameters is honored. */ | |
|
434 | if (self->cdict) { | |
|
435 | zresult = ZSTD_compress_usingCDict(cctx, dest, destSize, | |
|
436 | source, sourceSize, self->cdict); | |
|
437 | } | |
|
438 | else { | |
|
439 | zresult = ZSTD_compress_advanced(cctx, dest, destSize, | |
|
440 | source, sourceSize, dictData, dictSize, zparams); | |
|
441 | } | |
|
442 | Py_END_ALLOW_THREADS | |
|
443 | ||
|
444 | ZSTD_freeCCtx(cctx); | |
|
445 | ||
|
446 | if (ZSTD_isError(zresult)) { | |
|
447 | PyErr_Format(ZstdError, "cannot compress: %s", ZSTD_getErrorName(zresult)); | |
|
448 | Py_CLEAR(output); | |
|
449 | return NULL; | |
|
450 | } | |
|
451 | else { | |
|
452 | Py_SIZE(output) = zresult; | |
|
453 | } | |
|
454 | ||
|
455 | return output; | |
|
456 | } | |
|
457 | ||
|
458 | PyDoc_STRVAR(ZstdCompressionObj__doc__, | |
|
459 | "compressobj()\n" | |
|
460 | "\n" | |
|
461 | "Return an object exposing ``compress(data)`` and ``flush()`` methods.\n" | |
|
462 | "\n" | |
|
463 | "The returned object exposes an API similar to ``zlib.compressobj`` and\n" | |
|
464 | "``bz2.BZ2Compressor`` so that callers can swap in the zstd compressor\n" | |
|
465 | "without changing how compression is performed.\n" | |
|
466 | ); | |
|
467 | ||
|
468 | static ZstdCompressionObj* ZstdCompressor_compressobj(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { | |
|
469 | static char* kwlist[] = { | |
|
470 | "size", | |
|
471 | NULL | |
|
472 | }; | |
|
473 | ||
|
474 | Py_ssize_t inSize = 0; | |
|
475 | size_t outSize = ZSTD_CStreamOutSize(); | |
|
476 | ZstdCompressionObj* result = PyObject_New(ZstdCompressionObj, &ZstdCompressionObjType); | |
|
477 | if (!result) { | |
|
478 | return NULL; | |
|
479 | } | |
|
480 | ||
|
481 | 	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &inSize)) { | 
|
482 | 		Py_DECREF(result); /* result was allocated above; avoid leaking it */ | 
|
483 | 		return NULL; | 
|
484 | 	} | 
|
484 | ||
|
485 | result->cstream = CStream_from_ZstdCompressor(self, inSize); | |
|
486 | if (!result->cstream) { | |
|
487 | Py_DECREF(result); | |
|
488 | return NULL; | |
|
489 | } | |
|
490 | ||
|
491 | result->output.dst = PyMem_Malloc(outSize); | |
|
492 | if (!result->output.dst) { | |
|
493 | PyErr_NoMemory(); | |
|
494 | Py_DECREF(result); | |
|
495 | return NULL; | |
|
496 | } | |
|
497 | result->output.size = outSize; | |
|
498 | result->output.pos = 0; | |
|
499 | ||
|
500 | result->compressor = self; | |
|
501 | Py_INCREF(result->compressor); | |
|
502 | ||
|
503 | result->flushed = 0; | |
|
504 | ||
|
505 | return result; | |
|
506 | } | |
|
507 | ||
|
508 | PyDoc_STRVAR(ZstdCompressor_read_from__doc__, | |
|
509 | "read_from(reader, [size=0, read_size=default, write_size=default])\n" | |
|
510 | "Read uncompressed data from a reader and return an iterator\n" | 
|
511 | "\n" | |
|
512 | "Returns an iterator of compressed data produced from reading from ``reader``.\n" | |
|
513 | "\n" | |
|
514 | "Uncompressed data will be obtained from ``reader`` by calling its\n" | 
|
515 | "``read(size)`` method. The source data will be streamed into a\n" | 
|
516 | "compressor. As compressed data is available, it will be exposed to the\n" | |
|
517 | "iterator.\n" | |
|
518 | "\n" | |
|
519 | "Data is read from the source in chunks of ``read_size``. Compressed chunks\n" | |
|
520 | "are at most ``write_size`` bytes. Both values default to the zstd input and\n" | |
|
521 | "output defaults, respectively.\n" | 
|
522 | "\n" | |
|
523 | "The caller is partially in control of how fast data is fed into the\n" | |
|
524 | "compressor by how it consumes the returned iterator. The compressor will\n" | |
|
525 | "not consume from the reader unless the caller consumes from the iterator.\n" | |
|
526 | ); | |
|
527 | ||
|
528 | static ZstdCompressorIterator* ZstdCompressor_read_from(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { | |
|
529 | static char* kwlist[] = { | |
|
530 | "reader", | |
|
531 | "size", | |
|
532 | "read_size", | |
|
533 | "write_size", | |
|
534 | NULL | |
|
535 | }; | |
|
536 | ||
|
537 | PyObject* reader; | |
|
538 | Py_ssize_t sourceSize = 0; | |
|
539 | size_t inSize = ZSTD_CStreamInSize(); | |
|
540 | size_t outSize = ZSTD_CStreamOutSize(); | |
|
541 | ZstdCompressorIterator* result; | |
|
542 | ||
|
543 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nkk", kwlist, &reader, &sourceSize, | |
|
544 | &inSize, &outSize)) { | |
|
545 | return NULL; | |
|
546 | } | |
|
547 | ||
|
548 | result = PyObject_New(ZstdCompressorIterator, &ZstdCompressorIteratorType); | |
|
549 | if (!result) { | |
|
550 | return NULL; | |
|
551 | } | |
|
552 | ||
|
553 | result->compressor = NULL; | |
|
554 | result->reader = NULL; | |
|
555 | result->buffer = NULL; | |
|
556 | result->cstream = NULL; | |
|
557 | result->input.src = NULL; | |
|
558 | result->output.dst = NULL; | |
|
559 | result->readResult = NULL; | |
|
560 | ||
|
561 | if (PyObject_HasAttrString(reader, "read")) { | |
|
562 | result->reader = reader; | |
|
563 | Py_INCREF(result->reader); | |
|
564 | } | |
|
565 | else if (1 == PyObject_CheckBuffer(reader)) { | |
|
566 | result->buffer = PyMem_Malloc(sizeof(Py_buffer)); | |
|
567 | if (!result->buffer) { | |
|
568 | goto except; | |
|
569 | } | |
|
570 | ||
|
571 | memset(result->buffer, 0, sizeof(Py_buffer)); | |
|
572 | ||
|
573 | if (0 != PyObject_GetBuffer(reader, result->buffer, PyBUF_CONTIG_RO)) { | |
|
574 | goto except; | |
|
575 | } | |
|
576 | ||
|
577 | result->bufferOffset = 0; | |
|
578 | sourceSize = result->buffer->len; | |
|
579 | } | |
|
580 | else { | |
|
581 | PyErr_SetString(PyExc_ValueError, | |
|
 582 | "must pass an object with a read() method or that conforms to the buffer protocol"); | |
|
583 | goto except; | |
|
584 | } | |
|
585 | ||
|
586 | result->compressor = self; | |
|
587 | Py_INCREF(result->compressor); | |
|
588 | ||
|
589 | result->sourceSize = sourceSize; | |
|
590 | result->cstream = CStream_from_ZstdCompressor(self, sourceSize); | |
|
591 | if (!result->cstream) { | |
|
592 | goto except; | |
|
593 | } | |
|
594 | ||
|
595 | result->inSize = inSize; | |
|
596 | result->outSize = outSize; | |
|
597 | ||
|
598 | result->output.dst = PyMem_Malloc(outSize); | |
|
599 | if (!result->output.dst) { | |
|
600 | PyErr_NoMemory(); | |
|
601 | goto except; | |
|
602 | } | |
|
603 | result->output.size = outSize; | |
|
604 | result->output.pos = 0; | |
|
605 | ||
|
606 | result->input.src = NULL; | |
|
607 | result->input.size = 0; | |
|
608 | result->input.pos = 0; | |
|
609 | ||
|
610 | result->finishedInput = 0; | |
|
611 | result->finishedOutput = 0; | |
|
612 | ||
|
613 | goto finally; | |
|
614 | ||
|
615 | except: | |
|
616 | if (result->cstream) { | |
|
617 | ZSTD_freeCStream(result->cstream); | |
|
618 | result->cstream = NULL; | |
|
619 | } | |
|
620 | ||
|
621 | Py_DecRef((PyObject*)result->compressor); | |
|
622 | Py_DecRef(result->reader); | |
|
623 | ||
|
624 | Py_DECREF(result); | |
|
625 | result = NULL; | |
|
626 | ||
|
627 | finally: | |
|
628 | return result; | |
|
629 | } | |
|
630 | ||
|
631 | PyDoc_STRVAR(ZstdCompressor_write_to___doc__, | |
|
632 | "Create a context manager to write compressed data to an object.\n" | |
|
633 | "\n" | |
|
634 | "The passed object must have a ``write()`` method.\n" | |
|
635 | "\n" | |
|
636 | "The caller feeds input data to the object by calling ``compress(data)``.\n" | |
|
637 | "Compressed data is written to the argument given to this function.\n" | |
|
638 | "\n" | |
|
639 | "The function takes an optional ``size`` argument indicating the total size\n" | |
|
640 | "of the eventual input. If specified, the size will influence compression\n" | |
|
641 | "parameter tuning and could result in the size being written into the\n" | |
|
642 | "header of the compressed data.\n" | |
|
643 | "\n" | |
|
644 | "An optional ``write_size`` argument is also accepted. It defines the maximum\n" | |
|
645 | "byte size of chunks fed to ``write()``. By default, it uses the zstd default\n" | |
|
646 | "for a compressor output stream.\n" | |
|
647 | ); | |
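The context-manager contract described above can be illustrated with a short Python sketch. Again this is a stand-in, not this module's implementation: `zlib.compressobj` substitutes for the zstd C stream, and the class name `write_to` mirrors the API shape only.

```python
import io
import zlib

class write_to:
    # Sketch of the write_to() context manager: compress(data) feeds the
    # stream, compressed output is written to the wrapped writer, and the
    # stream is finalized on clean exit. zlib stands in for zstd (assumption).
    def __init__(self, writer):
        self._writer = writer
        self._cobj = None

    def __enter__(self):
        self._cobj = zlib.compressobj()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            self._writer.write(self._cobj.flush())  # emit the stream epilogue
        return False

    def compress(self, data):
        out = self._cobj.compress(data)
        if out:
            self._writer.write(out)

dest = io.BytesIO()
with write_to(dest) as compressor:
    compressor.compress(b"data " * 500)
assert zlib.decompress(dest.getvalue()) == b"data " * 500
```

The design choice matches the C code below: the compression stream is created on `__enter__` and torn down on `__exit__`, so `compress()` outside an active context is an error.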
|
648 | ||
|
649 | static ZstdCompressionWriter* ZstdCompressor_write_to(ZstdCompressor* self, PyObject* args, PyObject* kwargs) { | |
|
650 | static char* kwlist[] = { | |
|
651 | "writer", | |
|
652 | "size", | |
|
653 | "write_size", | |
|
654 | NULL | |
|
655 | }; | |
|
656 | ||
|
657 | PyObject* writer; | |
|
658 | ZstdCompressionWriter* result; | |
|
659 | Py_ssize_t sourceSize = 0; | |
|
660 | size_t outSize = ZSTD_CStreamOutSize(); | |
|
661 | ||
|
662 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nk", kwlist, &writer, &sourceSize, | |
|
663 | &outSize)) { | |
|
664 | return NULL; | |
|
665 | } | |
|
666 | ||
|
667 | if (!PyObject_HasAttrString(writer, "write")) { | |
|
668 | PyErr_SetString(PyExc_ValueError, "must pass an object with a write() method"); | |
|
669 | return NULL; | |
|
670 | } | |
|
671 | ||
|
672 | result = PyObject_New(ZstdCompressionWriter, &ZstdCompressionWriterType); | |
|
673 | if (!result) { | |
|
674 | return NULL; | |
|
675 | } | |
|
676 | ||
|
677 | result->compressor = self; | |
|
678 | Py_INCREF(result->compressor); | |
|
679 | ||
|
680 | result->writer = writer; | |
|
681 | Py_INCREF(result->writer); | |
|
682 | ||
|
683 | result->sourceSize = sourceSize; | |
|
684 | ||
|
685 | result->outSize = outSize; | |
|
686 | ||
|
687 | result->entered = 0; | |
|
688 | result->cstream = NULL; | |
|
689 | ||
|
690 | return result; | |
|
691 | } | |
|
692 | ||
|
693 | static PyMethodDef ZstdCompressor_methods[] = { | |
|
694 | { "compress", (PyCFunction)ZstdCompressor_compress, METH_VARARGS, | |
|
695 | ZstdCompressor_compress__doc__ }, | |
|
696 | { "compressobj", (PyCFunction)ZstdCompressor_compressobj, | |
|
697 | METH_VARARGS | METH_KEYWORDS, ZstdCompressionObj__doc__ }, | |
|
698 | { "copy_stream", (PyCFunction)ZstdCompressor_copy_stream, | |
|
699 | METH_VARARGS | METH_KEYWORDS, ZstdCompressor_copy_stream__doc__ }, | |
|
700 | { "read_from", (PyCFunction)ZstdCompressor_read_from, | |
|
701 | METH_VARARGS | METH_KEYWORDS, ZstdCompressor_read_from__doc__ }, | |
|
702 | { "write_to", (PyCFunction)ZstdCompressor_write_to, | |
|
703 | METH_VARARGS | METH_KEYWORDS, ZstdCompressor_write_to___doc__ }, | |
|
704 | { NULL, NULL } | |
|
705 | }; | |
|
706 | ||
|
707 | PyTypeObject ZstdCompressorType = { | |
|
708 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
709 | "zstd.ZstdCompressor", /* tp_name */ | |
|
710 | sizeof(ZstdCompressor), /* tp_basicsize */ | |
|
711 | 0, /* tp_itemsize */ | |
|
712 | (destructor)ZstdCompressor_dealloc, /* tp_dealloc */ | |
|
713 | 0, /* tp_print */ | |
|
714 | 0, /* tp_getattr */ | |
|
715 | 0, /* tp_setattr */ | |
|
716 | 0, /* tp_compare */ | |
|
717 | 0, /* tp_repr */ | |
|
718 | 0, /* tp_as_number */ | |
|
719 | 0, /* tp_as_sequence */ | |
|
720 | 0, /* tp_as_mapping */ | |
|
721 | 0, /* tp_hash */ | |
|
722 | 0, /* tp_call */ | |
|
723 | 0, /* tp_str */ | |
|
724 | 0, /* tp_getattro */ | |
|
725 | 0, /* tp_setattro */ | |
|
726 | 0, /* tp_as_buffer */ | |
|
727 | Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ | |
|
728 | ZstdCompressor__doc__, /* tp_doc */ | |
|
729 | 0, /* tp_traverse */ | |
|
730 | 0, /* tp_clear */ | |
|
731 | 0, /* tp_richcompare */ | |
|
732 | 0, /* tp_weaklistoffset */ | |
|
733 | 0, /* tp_iter */ | |
|
734 | 0, /* tp_iternext */ | |
|
735 | ZstdCompressor_methods, /* tp_methods */ | |
|
736 | 0, /* tp_members */ | |
|
737 | 0, /* tp_getset */ | |
|
738 | 0, /* tp_base */ | |
|
739 | 0, /* tp_dict */ | |
|
740 | 0, /* tp_descr_get */ | |
|
741 | 0, /* tp_descr_set */ | |
|
742 | 0, /* tp_dictoffset */ | |
|
743 | (initproc)ZstdCompressor_init, /* tp_init */ | |
|
744 | 0, /* tp_alloc */ | |
|
745 | PyType_GenericNew, /* tp_new */ | |
|
746 | }; | |
|
747 | ||
|
748 | void compressor_module_init(PyObject* mod) { | |
|
749 | Py_TYPE(&ZstdCompressorType) = &PyType_Type; | |
|
750 | if (PyType_Ready(&ZstdCompressorType) < 0) { | |
|
751 | return; | |
|
752 | } | |
|
753 | ||
|
754 | Py_INCREF((PyObject*)&ZstdCompressorType); | |
|
755 | PyModule_AddObject(mod, "ZstdCompressor", | |
|
756 | (PyObject*)&ZstdCompressorType); | |
|
757 | } |
@@ -0,0 +1,234 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | #define min(a, b) (((a) < (b)) ? (a) : (b)) | |
|
12 | ||
|
13 | extern PyObject* ZstdError; | |
|
14 | ||
|
15 | PyDoc_STRVAR(ZstdCompressorIterator__doc__, | |
|
16 | "Represents an iterator of compressed data.\n" | |
|
17 | ); | |
|
18 | ||
|
19 | static void ZstdCompressorIterator_dealloc(ZstdCompressorIterator* self) { | |
|
20 | Py_XDECREF(self->readResult); | |
|
21 | Py_XDECREF(self->compressor); | |
|
22 | Py_XDECREF(self->reader); | |
|
23 | ||
|
24 | if (self->buffer) { | |
|
25 | PyBuffer_Release(self->buffer); | |
|
26 | PyMem_FREE(self->buffer); | |
|
27 | self->buffer = NULL; | |
|
28 | } | |
|
29 | ||
|
30 | if (self->cstream) { | |
|
31 | ZSTD_freeCStream(self->cstream); | |
|
32 | self->cstream = NULL; | |
|
33 | } | |
|
34 | ||
|
35 | if (self->output.dst) { | |
|
36 | PyMem_Free(self->output.dst); | |
|
37 | self->output.dst = NULL; | |
|
38 | } | |
|
39 | ||
|
40 | PyObject_Del(self); | |
|
41 | } | |
|
42 | ||
|
43 | static PyObject* ZstdCompressorIterator_iter(PyObject* self) { | |
|
44 | Py_INCREF(self); | |
|
45 | return self; | |
|
46 | } | |
|
47 | ||
|
48 | static PyObject* ZstdCompressorIterator_iternext(ZstdCompressorIterator* self) { | |
|
49 | size_t zresult; | |
|
50 | PyObject* readResult = NULL; | |
|
51 | PyObject* chunk; | |
|
52 | char* readBuffer; | |
|
53 | Py_ssize_t readSize = 0; | |
|
54 | Py_ssize_t bufferRemaining; | |
|
55 | ||
|
56 | if (self->finishedOutput) { | |
|
57 | PyErr_SetString(PyExc_StopIteration, "output flushed"); | |
|
58 | return NULL; | |
|
59 | } | |
|
60 | ||
|
61 | feedcompressor: | |
|
62 | ||
|
63 | /* If we have data left in the input, consume it. */ | |
|
64 | if (self->input.pos < self->input.size) { | |
|
65 | Py_BEGIN_ALLOW_THREADS | |
|
66 | zresult = ZSTD_compressStream(self->cstream, &self->output, &self->input); | |
|
67 | Py_END_ALLOW_THREADS | |
|
68 | ||
|
69 | /* Release the Python object holding the input buffer. */ | |
|
70 | if (self->input.pos == self->input.size) { | |
|
71 | self->input.src = NULL; | |
|
72 | self->input.pos = 0; | |
|
73 | self->input.size = 0; | |
|
74 | Py_DECREF(self->readResult); | |
|
75 | self->readResult = NULL; | |
|
76 | } | |
|
77 | ||
|
78 | if (ZSTD_isError(zresult)) { | |
|
79 | PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); | |
|
80 | return NULL; | |
|
81 | } | |
|
82 | ||
|
83 | /* If it produced output data, emit it. */ | |
|
84 | if (self->output.pos) { | |
|
85 | chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); | |
|
86 | self->output.pos = 0; | |
|
87 | return chunk; | |
|
88 | } | |
|
89 | } | |
|
90 | ||
|
91 | /* We should never have output data sitting around after a previous call. */ | |
|
92 | assert(self->output.pos == 0); | |
|
93 | ||
|
94 | /* The code above should have either emitted a chunk and returned or consumed | |
|
95 | the entire input buffer. So the state of the input buffer is not | |
|
96 | relevant. */ | |
|
97 | if (!self->finishedInput) { | |
|
98 | if (self->reader) { | |
|
99 | readResult = PyObject_CallMethod(self->reader, "read", "I", self->inSize); | |
|
100 | if (!readResult) { | |
|
101 | PyErr_SetString(ZstdError, "could not read() from source"); | |
|
102 | return NULL; | |
|
103 | } | |
|
104 | ||
|
105 | PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); | |
|
106 | } | |
|
107 | else { | |
|
108 | assert(self->buffer && self->buffer->buf); | |
|
109 | ||
|
110 | /* Only support contiguous C arrays. */ | |
|
111 | assert(self->buffer->strides == NULL && self->buffer->suboffsets == NULL); | |
|
112 | assert(self->buffer->itemsize == 1); | |
|
113 | ||
|
114 | readBuffer = (char*)self->buffer->buf + self->bufferOffset; | |
|
115 | bufferRemaining = self->buffer->len - self->bufferOffset; | |
|
116 | readSize = min(bufferRemaining, (Py_ssize_t)self->inSize); | |
|
117 | self->bufferOffset += readSize; | |
|
118 | } | |
|
119 | ||
|
120 | if (0 == readSize) { | |
|
121 | Py_XDECREF(readResult); | |
|
122 | self->finishedInput = 1; | |
|
123 | } | |
|
124 | else { | |
|
125 | self->readResult = readResult; | |
|
126 | } | |
|
127 | } | |
|
128 | ||
|
129 | /* EOF */ | |
|
130 | if (0 == readSize) { | |
|
131 | zresult = ZSTD_endStream(self->cstream, &self->output); | |
|
132 | if (ZSTD_isError(zresult)) { | |
|
133 | PyErr_Format(ZstdError, "error ending compression stream: %s", | |
|
134 | ZSTD_getErrorName(zresult)); | |
|
135 | return NULL; | |
|
136 | } | |
|
137 | ||
|
138 | assert(self->output.pos); | |
|
139 | ||
|
140 | if (0 == zresult) { | |
|
141 | self->finishedOutput = 1; | |
|
142 | } | |
|
143 | ||
|
144 | chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); | |
|
145 | self->output.pos = 0; | |
|
146 | return chunk; | |
|
147 | } | |
|
148 | ||
|
149 | /* New data from reader. Feed into compressor. */ | |
|
150 | self->input.src = readBuffer; | |
|
151 | self->input.size = readSize; | |
|
152 | self->input.pos = 0; | |
|
153 | ||
|
154 | Py_BEGIN_ALLOW_THREADS | |
|
155 | zresult = ZSTD_compressStream(self->cstream, &self->output, &self->input); | |
|
156 | Py_END_ALLOW_THREADS | |
|
157 | ||
|
158 | /* The input buffer currently points to memory managed by Python | |
|
159 | (readBuffer). This object was allocated by this function. If it wasn't | |
|
160 | fully consumed, we need to release it in a subsequent function call. | |
|
161 | If it is fully consumed, do that now. | |
|
162 | */ | |
|
163 | if (self->input.pos == self->input.size) { | |
|
164 | self->input.src = NULL; | |
|
165 | self->input.pos = 0; | |
|
166 | self->input.size = 0; | |
|
167 | Py_XDECREF(self->readResult); | |
|
168 | self->readResult = NULL; | |
|
169 | } | |
|
170 | ||
|
171 | if (ZSTD_isError(zresult)) { | |
|
172 | PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult)); | |
|
173 | return NULL; | |
|
174 | } | |
|
175 | ||
|
176 | assert(self->input.pos <= self->input.size); | |
|
177 | ||
|
178 | /* If we didn't write anything, start the process over. */ | |
|
179 | if (0 == self->output.pos) { | |
|
180 | goto feedcompressor; | |
|
181 | } | |
|
182 | ||
|
183 | chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos); | |
|
184 | self->output.pos = 0; | |
|
185 | return chunk; | |
|
186 | } | |
|
187 | ||
|
188 | PyTypeObject ZstdCompressorIteratorType = { | |
|
189 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
190 | "zstd.ZstdCompressorIterator", /* tp_name */ | |
|
191 | sizeof(ZstdCompressorIterator), /* tp_basicsize */ | |
|
192 | 0, /* tp_itemsize */ | |
|
193 | (destructor)ZstdCompressorIterator_dealloc, /* tp_dealloc */ | |
|
194 | 0, /* tp_print */ | |
|
195 | 0, /* tp_getattr */ | |
|
196 | 0, /* tp_setattr */ | |
|
197 | 0, /* tp_compare */ | |
|
198 | 0, /* tp_repr */ | |
|
199 | 0, /* tp_as_number */ | |
|
200 | 0, /* tp_as_sequence */ | |
|
201 | 0, /* tp_as_mapping */ | |
|
202 | 0, /* tp_hash */ | |
|
203 | 0, /* tp_call */ | |
|
204 | 0, /* tp_str */ | |
|
205 | 0, /* tp_getattro */ | |
|
206 | 0, /* tp_setattro */ | |
|
207 | 0, /* tp_as_buffer */ | |
|
208 | Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ | |
|
209 | ZstdCompressorIterator__doc__, /* tp_doc */ | |
|
210 | 0, /* tp_traverse */ | |
|
211 | 0, /* tp_clear */ | |
|
212 | 0, /* tp_richcompare */ | |
|
213 | 0, /* tp_weaklistoffset */ | |
|
214 | ZstdCompressorIterator_iter, /* tp_iter */ | |
|
215 | (iternextfunc)ZstdCompressorIterator_iternext, /* tp_iternext */ | |
|
216 | 0, /* tp_methods */ | |
|
217 | 0, /* tp_members */ | |
|
218 | 0, /* tp_getset */ | |
|
219 | 0, /* tp_base */ | |
|
220 | 0, /* tp_dict */ | |
|
221 | 0, /* tp_descr_get */ | |
|
222 | 0, /* tp_descr_set */ | |
|
223 | 0, /* tp_dictoffset */ | |
|
224 | 0, /* tp_init */ | |
|
225 | 0, /* tp_alloc */ | |
|
226 | PyType_GenericNew, /* tp_new */ | |
|
227 | }; | |
|
228 | ||
|
229 | void compressoriterator_module_init(PyObject* mod) { | |
|
230 | Py_TYPE(&ZstdCompressorIteratorType) = &PyType_Type; | |
|
231 | if (PyType_Ready(&ZstdCompressorIteratorType) < 0) { | |
|
232 | return; | |
|
233 | } | |
|
234 | } |
@@ -0,0 +1,84 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | extern PyObject* ZstdError; | |
|
12 | ||
|
13 | static char frame_header[] = { | |
|
14 | '\x28', | |
|
15 | '\xb5', | |
|
16 | '\x2f', | |
|
17 | '\xfd', | |
|
18 | }; | |
|
19 | ||
|
20 | void constants_module_init(PyObject* mod) { | |
|
21 | PyObject* version; | |
|
22 | PyObject* zstdVersion; | |
|
23 | PyObject* frameHeader; | |
|
24 | ||
|
25 | #if PY_MAJOR_VERSION >= 3 | |
|
26 | version = PyUnicode_FromString(PYTHON_ZSTANDARD_VERSION); | |
|
27 | #else | |
|
28 | version = PyString_FromString(PYTHON_ZSTANDARD_VERSION); | |
|
29 | #endif | |
|
30 | Py_INCREF(version); | |
|
31 | PyModule_AddObject(mod, "__version__", version); | |
|
32 | ||
|
33 | ZstdError = PyErr_NewException("zstd.ZstdError", NULL, NULL); | |
|
34 | PyModule_AddObject(mod, "ZstdError", ZstdError); | |
|
35 | ||
|
36 | /* For now, the version is a simple tuple instead of a dedicated type. */ | |
|
37 | zstdVersion = PyTuple_New(3); | |
|
38 | PyTuple_SetItem(zstdVersion, 0, PyLong_FromLong(ZSTD_VERSION_MAJOR)); | |
|
39 | PyTuple_SetItem(zstdVersion, 1, PyLong_FromLong(ZSTD_VERSION_MINOR)); | |
|
40 | PyTuple_SetItem(zstdVersion, 2, PyLong_FromLong(ZSTD_VERSION_RELEASE)); | |
|
41 | Py_IncRef(zstdVersion); | |
|
42 | PyModule_AddObject(mod, "ZSTD_VERSION", zstdVersion); | |
|
43 | ||
|
44 | frameHeader = PyBytes_FromStringAndSize(frame_header, sizeof(frame_header)); | |
|
45 | if (frameHeader) { | |
|
46 | PyModule_AddObject(mod, "FRAME_HEADER", frameHeader); | |
|
47 | } | |
|
48 | else { | |
|
49 | PyErr_Format(PyExc_ValueError, "could not create frame header object"); | |
|
50 | } | |
|
51 | ||
|
52 | PyModule_AddIntConstant(mod, "MAX_COMPRESSION_LEVEL", ZSTD_maxCLevel()); | |
|
53 | PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_INPUT_SIZE", | |
|
54 | (long)ZSTD_CStreamInSize()); | |
|
55 | PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_OUTPUT_SIZE", | |
|
56 | (long)ZSTD_CStreamOutSize()); | |
|
57 | PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_INPUT_SIZE", | |
|
58 | (long)ZSTD_DStreamInSize()); | |
|
59 | PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE", | |
|
60 | (long)ZSTD_DStreamOutSize()); | |
|
61 | ||
|
62 | PyModule_AddIntConstant(mod, "MAGIC_NUMBER", ZSTD_MAGICNUMBER); | |
|
63 | PyModule_AddIntConstant(mod, "WINDOWLOG_MIN", ZSTD_WINDOWLOG_MIN); | |
|
64 | PyModule_AddIntConstant(mod, "WINDOWLOG_MAX", ZSTD_WINDOWLOG_MAX); | |
|
65 | PyModule_AddIntConstant(mod, "CHAINLOG_MIN", ZSTD_CHAINLOG_MIN); | |
|
66 | PyModule_AddIntConstant(mod, "CHAINLOG_MAX", ZSTD_CHAINLOG_MAX); | |
|
67 | PyModule_AddIntConstant(mod, "HASHLOG_MIN", ZSTD_HASHLOG_MIN); | |
|
68 | PyModule_AddIntConstant(mod, "HASHLOG_MAX", ZSTD_HASHLOG_MAX); | |
|
69 | PyModule_AddIntConstant(mod, "HASHLOG3_MAX", ZSTD_HASHLOG3_MAX); | |
|
70 | PyModule_AddIntConstant(mod, "SEARCHLOG_MIN", ZSTD_SEARCHLOG_MIN); | |
|
71 | PyModule_AddIntConstant(mod, "SEARCHLOG_MAX", ZSTD_SEARCHLOG_MAX); | |
|
72 | PyModule_AddIntConstant(mod, "SEARCHLENGTH_MIN", ZSTD_SEARCHLENGTH_MIN); | |
|
73 | PyModule_AddIntConstant(mod, "SEARCHLENGTH_MAX", ZSTD_SEARCHLENGTH_MAX); | |
|
74 | PyModule_AddIntConstant(mod, "TARGETLENGTH_MIN", ZSTD_TARGETLENGTH_MIN); | |
|
75 | PyModule_AddIntConstant(mod, "TARGETLENGTH_MAX", ZSTD_TARGETLENGTH_MAX); | |
|
76 | ||
|
77 | PyModule_AddIntConstant(mod, "STRATEGY_FAST", ZSTD_fast); | |
|
78 | PyModule_AddIntConstant(mod, "STRATEGY_DFAST", ZSTD_dfast); | |
|
79 | PyModule_AddIntConstant(mod, "STRATEGY_GREEDY", ZSTD_greedy); | |
|
80 | PyModule_AddIntConstant(mod, "STRATEGY_LAZY", ZSTD_lazy); | |
|
81 | PyModule_AddIntConstant(mod, "STRATEGY_LAZY2", ZSTD_lazy2); | |
|
82 | PyModule_AddIntConstant(mod, "STRATEGY_BTLAZY2", ZSTD_btlazy2); | |
|
83 | PyModule_AddIntConstant(mod, "STRATEGY_BTOPT", ZSTD_btopt); | |
|
84 | } |
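The `frame_header` bytes and `MAGIC_NUMBER` constant exported above describe the same value: the four bytes are the zstd frame magic number in little-endian order, and `ZSTD_MAGICNUMBER` is that value as an integer (0xFD2FB528). A small check, independent of the extension module:

```python
import struct

# The FRAME_HEADER bytes exported by the module, as defined above.
frame_header = b"\x28\xb5\x2f\xfd"

# Interpreted as a little-endian 32-bit integer, they equal the zstd
# magic number exposed as MAGIC_NUMBER (ZSTD_MAGICNUMBER).
(magic,) = struct.unpack("<I", frame_header)
assert magic == 0xFD2FB528
```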
@@ -0,0 +1,187 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | extern PyObject* ZstdError; | |
|
12 | ||
|
13 | PyDoc_STRVAR(ZstdDecompressionWriter__doc, | |
|
 14 | "A context manager used for writing decompressed output.\n" | |
|
15 | ); | |
|
16 | ||
|
17 | static void ZstdDecompressionWriter_dealloc(ZstdDecompressionWriter* self) { | |
|
18 | Py_XDECREF(self->decompressor); | |
|
19 | Py_XDECREF(self->writer); | |
|
20 | ||
|
21 | if (self->dstream) { | |
|
22 | ZSTD_freeDStream(self->dstream); | |
|
23 | self->dstream = NULL; | |
|
24 | } | |
|
25 | ||
|
26 | PyObject_Del(self); | |
|
27 | } | |
|
28 | ||
|
29 | static PyObject* ZstdDecompressionWriter_enter(ZstdDecompressionWriter* self) { | |
|
30 | if (self->entered) { | |
|
31 | PyErr_SetString(ZstdError, "cannot __enter__ multiple times"); | |
|
32 | return NULL; | |
|
33 | } | |
|
34 | ||
|
35 | self->dstream = DStream_from_ZstdDecompressor(self->decompressor); | |
|
36 | if (!self->dstream) { | |
|
37 | return NULL; | |
|
38 | } | |
|
39 | ||
|
40 | self->entered = 1; | |
|
41 | ||
|
42 | Py_INCREF(self); | |
|
43 | return (PyObject*)self; | |
|
44 | } | |
|
45 | ||
|
46 | static PyObject* ZstdDecompressionWriter_exit(ZstdDecompressionWriter* self, PyObject* args) { | |
|
47 | self->entered = 0; | |
|
48 | ||
|
49 | if (self->dstream) { | |
|
50 | ZSTD_freeDStream(self->dstream); | |
|
51 | self->dstream = NULL; | |
|
52 | } | |
|
53 | ||
|
54 | Py_RETURN_FALSE; | |
|
55 | } | |
|
56 | ||
|
57 | static PyObject* ZstdDecompressionWriter_memory_size(ZstdDecompressionWriter* self) { | |
|
58 | if (!self->dstream) { | |
|
59 | PyErr_SetString(ZstdError, "cannot determine size of inactive decompressor; " | |
|
60 | "call when context manager is active"); | |
|
61 | return NULL; | |
|
62 | } | |
|
63 | ||
|
64 | return PyLong_FromSize_t(ZSTD_sizeof_DStream(self->dstream)); | |
|
65 | } | |
|
66 | ||
|
67 | static PyObject* ZstdDecompressionWriter_write(ZstdDecompressionWriter* self, PyObject* args) { | |
|
68 | const char* source; | |
|
69 | Py_ssize_t sourceSize; | |
|
70 | size_t zresult = 0; | |
|
71 | ZSTD_inBuffer input; | |
|
72 | ZSTD_outBuffer output; | |
|
73 | PyObject* res; | |
|
74 | ||
|
75 | #if PY_MAJOR_VERSION >= 3 | |
|
76 | if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { | |
|
77 | #else | |
|
78 | if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { | |
|
79 | #endif | |
|
80 | return NULL; | |
|
81 | } | |
|
82 | ||
|
83 | if (!self->entered) { | |
|
84 | PyErr_SetString(ZstdError, "write must be called from an active context manager"); | |
|
85 | return NULL; | |
|
86 | } | |
|
87 | ||
|
88 | output.dst = malloc(self->outSize); | |
|
89 | if (!output.dst) { | |
|
90 | return PyErr_NoMemory(); | |
|
91 | } | |
|
92 | output.size = self->outSize; | |
|
93 | output.pos = 0; | |
|
94 | ||
|
95 | input.src = source; | |
|
96 | input.size = sourceSize; | |
|
97 | input.pos = 0; | |
|
98 | ||
|
99 | while ((ssize_t)input.pos < sourceSize) { | |
|
100 | Py_BEGIN_ALLOW_THREADS | |
|
101 | zresult = ZSTD_decompressStream(self->dstream, &output, &input); | |
|
102 | Py_END_ALLOW_THREADS | |
|
103 | ||
|
104 | if (ZSTD_isError(zresult)) { | |
|
105 | free(output.dst); | |
|
106 | PyErr_Format(ZstdError, "zstd decompress error: %s", | |
|
107 | ZSTD_getErrorName(zresult)); | |
|
108 | return NULL; | |
|
109 | } | |
|
110 | ||
|
111 | if (output.pos) { | |
|
112 | #if PY_MAJOR_VERSION >= 3 | |
|
113 | res = PyObject_CallMethod(self->writer, "write", "y#", | |
|
114 | #else | |
|
115 | res = PyObject_CallMethod(self->writer, "write", "s#", | |
|
116 | #endif | |
|
117 | output.dst, output.pos); | |
|
118 | Py_XDECREF(res); | |
|
119 | output.pos = 0; | |
|
120 | } | |
|
121 | } | |
|
122 | ||
|
123 | free(output.dst); | |
|
124 | ||
|
125 | /* TODO return bytes written */ | |
|
126 | Py_RETURN_NONE; | |
|
127 | } | |
|
128 | ||
|
129 | static PyMethodDef ZstdDecompressionWriter_methods[] = { | |
|
130 | { "__enter__", (PyCFunction)ZstdDecompressionWriter_enter, METH_NOARGS, | |
|
131 | PyDoc_STR("Enter a decompression context.") }, | |
|
132 | { "__exit__", (PyCFunction)ZstdDecompressionWriter_exit, METH_VARARGS, | |
|
133 | PyDoc_STR("Exit a decompression context.") }, | |
|
134 | { "memory_size", (PyCFunction)ZstdDecompressionWriter_memory_size, METH_NOARGS, | |
|
135 | PyDoc_STR("Obtain the memory size in bytes of the underlying decompressor.") }, | |
|
136 | { "write", (PyCFunction)ZstdDecompressionWriter_write, METH_VARARGS, | |
|
137 | PyDoc_STR("Compress data") }, | |
|
138 | { NULL, NULL } | |
|
139 | }; | |
|
140 | ||
|
141 | PyTypeObject ZstdDecompressionWriterType = { | |
|
142 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
143 | "zstd.ZstdDecompressionWriter", /* tp_name */ | |
|
144 | sizeof(ZstdDecompressionWriter),/* tp_basicsize */ | |
|
145 | 0, /* tp_itemsize */ | |
|
146 | (destructor)ZstdDecompressionWriter_dealloc, /* tp_dealloc */ | |
|
147 | 0, /* tp_print */ | |
|
148 | 0, /* tp_getattr */ | |
|
149 | 0, /* tp_setattr */ | |
|
150 | 0, /* tp_compare */ | |
|
151 | 0, /* tp_repr */ | |
|
152 | 0, /* tp_as_number */ | |
|
153 | 0, /* tp_as_sequence */ | |
|
154 | 0, /* tp_as_mapping */ | |
|
155 | 0, /* tp_hash */ | |
|
156 | 0, /* tp_call */ | |
|
157 | 0, /* tp_str */ | |
|
158 | 0, /* tp_getattro */ | |
|
159 | 0, /* tp_setattro */ | |
|
160 | 0, /* tp_as_buffer */ | |
|
161 | Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ | |
|
162 | ZstdDecompressionWriter__doc, /* tp_doc */ | |
|
163 | 0, /* tp_traverse */ | |
|
164 | 0, /* tp_clear */ | |
|
165 | 0, /* tp_richcompare */ | |
|
166 | 0, /* tp_weaklistoffset */ | |
|
167 | 0, /* tp_iter */ | |
|
168 | 0, /* tp_iternext */ | |
|
169 | ZstdDecompressionWriter_methods,/* tp_methods */ | |
|
170 | 0, /* tp_members */ | |
|
171 | 0, /* tp_getset */ | |
|
172 | 0, /* tp_base */ | |
|
173 | 0, /* tp_dict */ | |
|
174 | 0, /* tp_descr_get */ | |
|
175 | 0, /* tp_descr_set */ | |
|
176 | 0, /* tp_dictoffset */ | |
|
177 | 0, /* tp_init */ | |
|
178 | 0, /* tp_alloc */ | |
|
179 | PyType_GenericNew, /* tp_new */ | |
|
180 | }; | |
|
181 | ||
|
182 | void decompressionwriter_module_init(PyObject* mod) { | |
|
183 | Py_TYPE(&ZstdDecompressionWriterType) = &PyType_Type; | |
|
184 | if (PyType_Ready(&ZstdDecompressionWriterType) < 0) { | |
|
185 | return; | |
|
186 | } | |
|
187 | } |
@@ -0,0 +1,170 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | extern PyObject* ZstdError; | |
|
12 | ||
|
13 | PyDoc_STRVAR(DecompressionObj__doc__, | |
|
14 | "Perform decompression using a standard library compatible API.\n" | |
|
15 | ); | |
|
16 | ||
|
17 | static void DecompressionObj_dealloc(ZstdDecompressionObj* self) { | |
|
18 | if (self->dstream) { | |
|
19 | ZSTD_freeDStream(self->dstream); | |
|
20 | self->dstream = NULL; | |
|
21 | } | |
|
22 | ||
|
23 | Py_XDECREF(self->decompressor); | |
|
24 | ||
|
25 | PyObject_Del(self); | |
|
26 | } | |
|
27 | ||
|
28 | static PyObject* DecompressionObj_decompress(ZstdDecompressionObj* self, PyObject* args) { | |
|
29 | const char* source; | |
|
30 | Py_ssize_t sourceSize; | |
|
31 | size_t zresult; | |
|
32 | ZSTD_inBuffer input; | |
|
33 | ZSTD_outBuffer output; | |
|
34 | size_t outSize = ZSTD_DStreamOutSize(); | |
|
35 | PyObject* result = NULL; | |
|
36 | Py_ssize_t resultSize = 0; | |
|
37 | ||
|
38 | if (self->finished) { | |
|
39 | PyErr_SetString(ZstdError, "cannot use a decompressobj multiple times"); | |
|
40 | return NULL; | |
|
41 | } | |
|
42 | ||
|
43 | #if PY_MAJOR_VERSION >= 3 | |
|
44 | if (!PyArg_ParseTuple(args, "y#", | |
|
45 | #else | |
|
46 | if (!PyArg_ParseTuple(args, "s#", | |
|
47 | #endif | |
|
48 | &source, &sourceSize)) { | |
|
49 | return NULL; | |
|
50 | } | |
|
51 | ||
|
52 | input.src = source; | |
|
53 | input.size = sourceSize; | |
|
54 | input.pos = 0; | |
|
55 | ||
|
56 | output.dst = PyMem_Malloc(outSize); | |
|
57 | if (!output.dst) { | |
|
58 | PyErr_NoMemory(); | |
|
59 | return NULL; | |
|
60 | } | |
|
61 | output.size = outSize; | |
|
62 | output.pos = 0; | |
|
63 | ||
|
64 | /* Read input until exhausted. */ | |
|
65 | while (input.pos < input.size) { | |
|
66 | Py_BEGIN_ALLOW_THREADS | |
|
67 | zresult = ZSTD_decompressStream(self->dstream, &output, &input); | |
|
68 | Py_END_ALLOW_THREADS | |
|
69 | ||
|
70 | if (ZSTD_isError(zresult)) { | |
|
71 | PyErr_Format(ZstdError, "zstd decompressor error: %s", | |
|
72 | ZSTD_getErrorName(zresult)); | |
|
73 | result = NULL; | |
|
74 | goto finally; | |
|
75 | } | |
|
76 | ||
|
77 | if (0 == zresult) { | |
|
78 | self->finished = 1; | |
|
79 | } | |
|
80 | ||
|
81 | if (output.pos) { | |
|
82 | if (result) { | |
|
83 | resultSize = PyBytes_GET_SIZE(result); | |
|
84 | if (-1 == _PyBytes_Resize(&result, resultSize + output.pos)) { | |
|
85 | goto except; | |
|
86 | } | |
|
87 | ||
|
88 | memcpy(PyBytes_AS_STRING(result) + resultSize, | |
|
89 | output.dst, output.pos); | |
|
90 | } | |
|
91 | else { | |
|
92 | result = PyBytes_FromStringAndSize(output.dst, output.pos); | |
|
93 | if (!result) { | |
|
94 | goto except; | |
|
95 | } | |
|
96 | } | |
|
97 | ||
|
98 | output.pos = 0; | |
|
99 | } | |
|
100 | } | |
|
101 | ||
|
102 | if (!result) { | |
|
103 | result = PyBytes_FromString(""); | |
|
104 | } | |
|
105 | ||
|
106 | goto finally; | |
|
107 | ||
|
108 | except: | |
|
109 | Py_DecRef(result); | |
|
110 | result = NULL; | |
|
111 | ||
|
112 | finally: | |
|
113 | PyMem_Free(output.dst); | |
|
114 | ||
|
115 | return result; | |
|
116 | } | |
|
117 | ||
|
118 | static PyMethodDef DecompressionObj_methods[] = { | |
|
119 | { "decompress", (PyCFunction)DecompressionObj_decompress, | |
|
120 | METH_VARARGS, PyDoc_STR("decompress data") }, | |
|
121 | { NULL, NULL } | |
|
122 | }; | |
|
123 | ||
|
124 | PyTypeObject ZstdDecompressionObjType = { | |
|
125 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
126 | "zstd.ZstdDecompressionObj", /* tp_name */ | |
|
127 | sizeof(ZstdDecompressionObj), /* tp_basicsize */ | |
|
128 | 0, /* tp_itemsize */ | |
|
129 | (destructor)DecompressionObj_dealloc, /* tp_dealloc */ | |
|
130 | 0, /* tp_print */ | |
|
131 | 0, /* tp_getattr */ | |
|
132 | 0, /* tp_setattr */ | |
|
133 | 0, /* tp_compare */ | |
|
134 | 0, /* tp_repr */ | |
|
135 | 0, /* tp_as_number */ | |
|
136 | 0, /* tp_as_sequence */ | |
|
137 | 0, /* tp_as_mapping */ | |
|
138 | 0, /* tp_hash */ | |
|
139 | 0, /* tp_call */ | |
|
140 | 0, /* tp_str */ | |
|
141 | 0, /* tp_getattro */ | |
|
142 | 0, /* tp_setattro */ | |
|
143 | 0, /* tp_as_buffer */ | |
|
144 | Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ | |
|
145 | DecompressionObj__doc__, /* tp_doc */ | |
|
146 | 0, /* tp_traverse */ | |
|
147 | 0, /* tp_clear */ | |
|
148 | 0, /* tp_richcompare */ | |
|
149 | 0, /* tp_weaklistoffset */ | |
|
150 | 0, /* tp_iter */ | |
|
151 | 0, /* tp_iternext */ | |
|
152 | DecompressionObj_methods, /* tp_methods */ | |
|
153 | 0, /* tp_members */ | |
|
154 | 0, /* tp_getset */ | |
|
155 | 0, /* tp_base */ | |
|
156 | 0, /* tp_dict */ | |
|
157 | 0, /* tp_descr_get */ | |
|
158 | 0, /* tp_descr_set */ | |
|
159 | 0, /* tp_dictoffset */ | |
|
160 | 0, /* tp_init */ | |
|
161 | 0, /* tp_alloc */ | |
|
162 | PyType_GenericNew, /* tp_new */ | |
|
163 | }; | |
|
164 | ||
|
165 | void decompressobj_module_init(PyObject* module) { | |
|
166 | Py_TYPE(&ZstdDecompressionObjType) = &PyType_Type; | |
|
167 | if (PyType_Ready(&ZstdDecompressionObjType) < 0) { | |
|
168 | return; | |
|
169 | } | |
|
170 | } |
@@ -0,0 +1,669 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | extern PyObject* ZstdError; | |
|
12 | ||
|
13 | ZSTD_DStream* DStream_from_ZstdDecompressor(ZstdDecompressor* decompressor) { | |
|
14 | ZSTD_DStream* dstream; | |
|
15 | void* dictData = NULL; | |
|
16 | size_t dictSize = 0; | |
|
17 | size_t zresult; | |
|
18 | ||
|
19 | dstream = ZSTD_createDStream(); | |
|
20 | if (!dstream) { | |
|
21 | PyErr_SetString(ZstdError, "could not create DStream"); | |
|
22 | return NULL; | |
|
23 | } | |
|
24 | ||
|
25 | if (decompressor->dict) { | |
|
26 | dictData = decompressor->dict->dictData; | |
|
27 | dictSize = decompressor->dict->dictSize; | |
|
28 | } | |
|
29 | ||
|
30 | if (dictData) { | |
|
31 | zresult = ZSTD_initDStream_usingDict(dstream, dictData, dictSize); | |
|
32 | } | |
|
33 | else { | |
|
34 | zresult = ZSTD_initDStream(dstream); | |
|
35 | } | |
|
36 | ||
|
37 | if (ZSTD_isError(zresult)) { | |
|
38 | PyErr_Format(ZstdError, "could not initialize DStream: %s", | |
|
39 | ZSTD_getErrorName(zresult)); | |
|
40 | return NULL; | |
|
41 | } | |
|
42 | ||
|
43 | return dstream; | |
|
44 | } | |
|
45 | ||
|
46 | PyDoc_STRVAR(Decompressor__doc__, | |
|
47 | "ZstdDecompressor(dict_data=None)\n" | |
|
48 | "\n" | |
|
49 | "Create an object used to perform Zstandard decompression.\n" | |
|
50 | "\n" | |
|
51 | "An instance can perform multiple decompression operations." | |
|
52 | ); | |
|
53 | ||
|
54 | static int Decompressor_init(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { | |
|
55 | static char* kwlist[] = { | |
|
56 | "dict_data", | |
|
57 | NULL | |
|
58 | }; | |
|
59 | ||
|
60 | ZstdCompressionDict* dict = NULL; | |
|
61 | ||
|
62 | self->refdctx = NULL; | |
|
63 | self->dict = NULL; | |
|
64 | self->ddict = NULL; | |
|
65 | ||
|
66 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O!", kwlist, | |
|
67 | &ZstdCompressionDictType, &dict)) { | |
|
68 | return -1; | |
|
69 | } | |
|
70 | ||
|
71 | /* Instead of creating a ZSTD_DCtx for every decompression operation, | |
|
72 | we create an instance at object creation time and recycle it via | |
|
73 | ZSTD_copyDCtx() on each use. This means each use is a malloc+memcpy | 
|
74 | instead of a malloc+init. */ | |
|
75 | /* TODO lazily initialize the reference ZSTD_DCtx on first use since | |
|
76 | not all instances of ZstdDecompressor will use a ZSTD_DCtx. */ | 
|
77 | self->refdctx = ZSTD_createDCtx(); | |
|
78 | if (!self->refdctx) { | |
|
79 | PyErr_NoMemory(); | |
|
80 | goto except; | |
|
81 | } | |
|
82 | ||
|
83 | if (dict) { | |
|
84 | self->dict = dict; | |
|
85 | Py_INCREF(dict); | |
|
86 | } | |
|
87 | ||
|
88 | return 0; | |
|
89 | ||
|
90 | except: | |
|
91 | if (self->refdctx) { | |
|
92 | ZSTD_freeDCtx(self->refdctx); | |
|
93 | self->refdctx = NULL; | |
|
94 | } | |
|
95 | ||
|
96 | return -1; | |
|
97 | } | |
|
98 | ||
|
99 | static void Decompressor_dealloc(ZstdDecompressor* self) { | |
|
100 | if (self->refdctx) { | |
|
101 | ZSTD_freeDCtx(self->refdctx); | |
|
102 | } | |
|
103 | ||
|
104 | Py_XDECREF(self->dict); | |
|
105 | ||
|
106 | if (self->ddict) { | |
|
107 | ZSTD_freeDDict(self->ddict); | |
|
108 | self->ddict = NULL; | |
|
109 | } | |
|
110 | ||
|
111 | PyObject_Del(self); | |
|
112 | } | |
|
113 | ||
|
114 | PyDoc_STRVAR(Decompressor_copy_stream__doc__, | |
|
115 | "copy_stream(ifh, ofh[, read_size=default, write_size=default]) -- decompress data between streams\n" | |
|
116 | "\n" | |
|
117 | "Compressed data will be read from ``ifh``, decompressed, and written to\n" | |
|
118 | "``ofh``. ``ifh`` must have a ``read(size)`` method. ``ofh`` must have a\n" | |
|
119 | "``write(data)`` method.\n" | |
|
120 | "\n" | |
|
121 | "The optional ``read_size`` and ``write_size`` arguments control the chunk\n" | |
|
122 | "size of data that is ``read()`` and ``write()`` between streams. They default\n" | |
|
123 | "to the default input and output sizes of zstd decompressor streams.\n" | |
|
124 | ); | |
|
125 | ||
|
126 | static PyObject* Decompressor_copy_stream(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { | |
|
127 | static char* kwlist[] = { | |
|
128 | "ifh", | |
|
129 | "ofh", | |
|
130 | "read_size", | |
|
131 | "write_size", | |
|
132 | NULL | |
|
133 | }; | |
|
134 | ||
|
135 | PyObject* source; | |
|
136 | PyObject* dest; | |
|
137 | size_t inSize = ZSTD_DStreamInSize(); | |
|
138 | size_t outSize = ZSTD_DStreamOutSize(); | |
|
139 | ZSTD_DStream* dstream; | |
|
140 | ZSTD_inBuffer input; | |
|
141 | ZSTD_outBuffer output; | |
|
142 | Py_ssize_t totalRead = 0; | |
|
143 | Py_ssize_t totalWrite = 0; | |
|
144 | char* readBuffer; | |
|
145 | Py_ssize_t readSize; | |
|
146 | PyObject* readResult; | |
|
147 | PyObject* res = NULL; | |
|
148 | size_t zresult = 0; | |
|
149 | PyObject* writeResult; | |
|
150 | PyObject* totalReadPy; | |
|
151 | PyObject* totalWritePy; | |
|
152 | ||
|
153 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|kk", kwlist, &source, | |
|
154 | &dest, &inSize, &outSize)) { | |
|
155 | return NULL; | |
|
156 | } | |
|
157 | ||
|
158 | if (!PyObject_HasAttrString(source, "read")) { | |
|
159 | PyErr_SetString(PyExc_ValueError, "first argument must have a read() method"); | |
|
160 | return NULL; | |
|
161 | } | |
|
162 | ||
|
163 | if (!PyObject_HasAttrString(dest, "write")) { | |
|
164 | PyErr_SetString(PyExc_ValueError, "second argument must have a write() method"); | |
|
165 | return NULL; | |
|
166 | } | |
|
167 | ||
|
168 | dstream = DStream_from_ZstdDecompressor(self); | |
|
169 | if (!dstream) { | |
|
170 | res = NULL; | |
|
171 | goto finally; | |
|
172 | } | |
|
173 | ||
|
174 | output.dst = PyMem_Malloc(outSize); | |
|
175 | if (!output.dst) { | |
|
176 | PyErr_NoMemory(); | |
|
177 | res = NULL; | |
|
178 | goto finally; | |
|
179 | } | |
|
180 | output.size = outSize; | |
|
181 | output.pos = 0; | |
|
182 | ||
|
183 | /* Read source stream until EOF */ | |
|
184 | while (1) { | |
|
185 | readResult = PyObject_CallMethod(source, "read", "n", inSize); | |
|
186 | if (!readResult) { | |
|
187 | PyErr_SetString(ZstdError, "could not read() from source"); | |
|
188 | goto finally; | |
|
189 | } | |
|
190 | ||
|
191 | PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); | |
|
192 | ||
|
193 | /* If no data was read, we're at EOF. */ | |
|
194 | if (0 == readSize) { | |
|
195 | break; | |
|
196 | } | |
|
197 | ||
|
198 | totalRead += readSize; | |
|
199 | ||
|
200 | /* Send data to decompressor */ | |
|
201 | input.src = readBuffer; | |
|
202 | input.size = readSize; | |
|
203 | input.pos = 0; | |
|
204 | ||
|
205 | while (input.pos < input.size) { | |
|
206 | Py_BEGIN_ALLOW_THREADS | |
|
207 | zresult = ZSTD_decompressStream(dstream, &output, &input); | |
|
208 | Py_END_ALLOW_THREADS | |
|
209 | ||
|
210 | if (ZSTD_isError(zresult)) { | |
|
211 | PyErr_Format(ZstdError, "zstd decompressor error: %s", | |
|
212 | ZSTD_getErrorName(zresult)); | |
|
213 | res = NULL; | |
|
214 | goto finally; | |
|
215 | } | |
|
216 | ||
|
217 | if (output.pos) { | |
|
218 | #if PY_MAJOR_VERSION >= 3 | |
|
219 | writeResult = PyObject_CallMethod(dest, "write", "y#", | |
|
220 | #else | |
|
221 | writeResult = PyObject_CallMethod(dest, "write", "s#", | |
|
222 | #endif | |
|
223 | output.dst, output.pos); | |
|
224 | ||
|
225 | Py_XDECREF(writeResult); | |
|
226 | totalWrite += output.pos; | |
|
227 | output.pos = 0; | |
|
228 | } | |
|
229 | } | |
|
230 | } | |
|
231 | ||
|
232 | /* Source stream is exhausted. Finish up. */ | |
|
233 | ||
|
234 | ZSTD_freeDStream(dstream); | |
|
235 | dstream = NULL; | |
|
236 | ||
|
237 | totalReadPy = PyLong_FromSsize_t(totalRead); | |
|
238 | totalWritePy = PyLong_FromSsize_t(totalWrite); | |
|
239 | res = PyTuple_Pack(2, totalReadPy, totalWritePy); | |
|
240 | Py_DecRef(totalReadPy); | |
|
241 | Py_DecRef(totalWritePy); | |
|
242 | ||
|
243 | finally: | |
|
244 | if (output.dst) { | |
|
245 | PyMem_Free(output.dst); | |
|
246 | } | |
|
247 | ||
|
248 | if (dstream) { | |
|
249 | ZSTD_freeDStream(dstream); | |
|
250 | } | |
|
251 | ||
|
252 | return res; | |
|
253 | } | |
|
254 | ||
|
255 | PyDoc_STRVAR(Decompressor_decompress__doc__, | |
|
256 | "decompress(data[, max_output_size=None]) -- Decompress data in its entirety\n" | |
|
257 | "\n" | |
|
258 | "This method will decompress the entirety of the argument and return the\n" | |
|
259 | "result.\n" | |
|
260 | "\n" | |
|
261 | "The input bytes are expected to contain a full Zstandard frame (something\n" | |
|
262 | "compressed with ``ZstdCompressor.compress()`` or similar). If the input does\n" | |
|
263 | "not contain a full frame, an exception will be raised.\n" | |
|
264 | "\n" | |
|
265 | "If the frame header of the compressed data does not contain the content size\n" | |
|
266 | "``max_output_size`` must be specified or ``ZstdError`` will be raised. An\n" | |
|
267 | "allocation of size ``max_output_size`` will be performed and an attempt will\n" | |
|
268 | "be made to perform decompression into that buffer. If the buffer is too\n" | |
|
269 | "small or cannot be allocated, ``ZstdError`` will be raised. The buffer will\n" | |
|
270 | "be resized if it is too large.\n" | |
|
271 | "\n" | |
|
272 | "Uncompressed data could be much larger than compressed data. As a result,\n" | |
|
273 | "calling this function could result in a very large memory allocation being\n" | |
|
274 | "performed to hold the uncompressed data. Therefore it is **highly**\n" | |
|
275 | "recommended to use a streaming decompression method instead of this one.\n" | |
|
276 | ); | |
|
277 | ||
|
278 | PyObject* Decompressor_decompress(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { | |
|
279 | static char* kwlist[] = { | |
|
280 | "data", | |
|
281 | "max_output_size", | |
|
282 | NULL | |
|
283 | }; | |
|
284 | ||
|
285 | const char* source; | |
|
286 | Py_ssize_t sourceSize; | |
|
287 | Py_ssize_t maxOutputSize = 0; | |
|
288 | unsigned long long decompressedSize; | |
|
289 | size_t destCapacity; | |
|
290 | PyObject* result = NULL; | |
|
291 | ZSTD_DCtx* dctx = NULL; | |
|
292 | void* dictData = NULL; | |
|
293 | size_t dictSize = 0; | |
|
294 | size_t zresult; | |
|
295 | ||
|
296 | #if PY_MAJOR_VERSION >= 3 | |
|
297 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y#|n", kwlist, | |
|
298 | #else | |
|
299 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s#|n", kwlist, | |
|
300 | #endif | |
|
301 | &source, &sourceSize, &maxOutputSize)) { | |
|
302 | return NULL; | |
|
303 | } | |
|
304 | ||
|
305 | dctx = PyMem_Malloc(ZSTD_sizeof_DCtx(self->refdctx)); | |
|
306 | if (!dctx) { | |
|
307 | PyErr_NoMemory(); | |
|
308 | return NULL; | |
|
309 | } | |
|
310 | ||
|
311 | ZSTD_copyDCtx(dctx, self->refdctx); | |
|
312 | ||
|
313 | if (self->dict) { | |
|
314 | dictData = self->dict->dictData; | |
|
315 | dictSize = self->dict->dictSize; | |
|
316 | } | |
|
317 | ||
|
318 | if (dictData && !self->ddict) { | |
|
319 | Py_BEGIN_ALLOW_THREADS | |
|
320 | self->ddict = ZSTD_createDDict(dictData, dictSize); | |
|
321 | Py_END_ALLOW_THREADS | |
|
322 | ||
|
323 | if (!self->ddict) { | |
|
324 | PyErr_SetString(ZstdError, "could not create decompression dict"); | |
|
325 | goto except; | |
|
326 | } | |
|
327 | } | |
|
328 | ||
|
329 | decompressedSize = ZSTD_getDecompressedSize(source, sourceSize); | |
|
330 | /* 0 returned if content size not in the zstd frame header */ | |
|
331 | if (0 == decompressedSize) { | |
|
332 | if (0 == maxOutputSize) { | |
|
333 | PyErr_SetString(ZstdError, "input data invalid or missing content size " | |
|
334 | "in frame header"); | |
|
335 | goto except; | |
|
336 | } | |
|
337 | else { | |
|
338 | result = PyBytes_FromStringAndSize(NULL, maxOutputSize); | |
|
339 | destCapacity = maxOutputSize; | |
|
340 | } | |
|
341 | } | |
|
342 | else { | |
|
343 | result = PyBytes_FromStringAndSize(NULL, decompressedSize); | |
|
344 | destCapacity = decompressedSize; | |
|
345 | } | |
|
346 | ||
|
347 | if (!result) { | |
|
348 | goto except; | |
|
349 | } | |
|
350 | ||
|
351 | Py_BEGIN_ALLOW_THREADS | |
|
352 | if (self->ddict) { | |
|
353 | zresult = ZSTD_decompress_usingDDict(dctx, PyBytes_AsString(result), destCapacity, | |
|
354 | source, sourceSize, self->ddict); | |
|
355 | } | |
|
356 | else { | |
|
357 | zresult = ZSTD_decompressDCtx(dctx, PyBytes_AsString(result), destCapacity, source, sourceSize); | |
|
358 | } | |
|
359 | Py_END_ALLOW_THREADS | |
|
360 | ||
|
361 | if (ZSTD_isError(zresult)) { | |
|
362 | PyErr_Format(ZstdError, "decompression error: %s", ZSTD_getErrorName(zresult)); | |
|
363 | goto except; | |
|
364 | } | |
|
365 | else if (decompressedSize && zresult != decompressedSize) { | |
|
366 | PyErr_Format(ZstdError, "decompression error: decompressed %zu bytes; expected %llu", | |
|
367 | zresult, decompressedSize); | |
|
368 | goto except; | |
|
369 | } | |
|
370 | else if (zresult < destCapacity) { | |
|
371 | if (_PyBytes_Resize(&result, zresult)) { | |
|
372 | goto except; | |
|
373 | } | |
|
374 | } | |
|
375 | ||
|
376 | goto finally; | |
|
377 | ||
|
378 | except: | |
|
379 | Py_DecRef(result); | |
|
380 | result = NULL; | |
|
381 | ||
|
382 | finally: | |
|
383 | if (dctx) { | |
|
384 | PyMem_FREE(dctx); | |
|
385 | } | |
|
386 | ||
|
387 | return result; | |
|
388 | } | |
|
389 | ||
|
390 | PyDoc_STRVAR(Decompressor_decompressobj__doc__, | |
|
391 | "decompressobj()\n" | |
|
392 | "\n" | |
|
393 | "Incrementally feed data into a decompressor.\n" | |
|
394 | "\n" | |
|
395 | "The returned object exposes a ``decompress(data)`` method. This makes it\n" | |
|
396 | "compatible with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor`` so that\n" | |
|
397 | "callers can swap in the zstd decompressor while using the same API.\n" | |
|
398 | ); | |
|
399 | ||
|
400 | static ZstdDecompressionObj* Decompressor_decompressobj(ZstdDecompressor* self) { | |
|
401 | ZstdDecompressionObj* result = PyObject_New(ZstdDecompressionObj, &ZstdDecompressionObjType); | |
|
402 | if (!result) { | |
|
403 | return NULL; | |
|
404 | } | |
|
405 | ||
|
406 | result->dstream = DStream_from_ZstdDecompressor(self); | |
|
407 | if (!result->dstream) { | |
|
408 | Py_DecRef((PyObject*)result); | |
|
409 | return NULL; | |
|
410 | } | |
|
411 | ||
|
412 | result->decompressor = self; | |
|
413 | Py_INCREF(result->decompressor); | |
|
414 | ||
|
415 | result->finished = 0; | |
|
416 | ||
|
417 | return result; | |
|
418 | } | |
|
419 | ||
|
420 | PyDoc_STRVAR(Decompressor_read_from__doc__, | |
|
421 | "read_from(reader[, read_size=default, write_size=default, skip_bytes=0])\n" | |
|
422 | "Read compressed data and return an iterator\n" | |
|
423 | "\n" | |
|
424 | "Returns an iterator of decompressed data chunks produced from reading from\n" | |
|
425 | "the ``reader``.\n" | |
|
426 | "\n" | |
|
427 | "Compressed data will be obtained from ``reader`` by calling the\n" | |
|
428 | "``read(size)`` method of it. The source data will be streamed into a\n" | |
|
429 | "decompressor. As decompressed data is available, it will be exposed to the\n" | |
|
430 | "returned iterator.\n" | |
|
431 | "\n" | |
|
432 | "Data is ``read()`` in chunks of size ``read_size`` and exposed to the\n" | |
|
433 | "iterator in chunks of size ``write_size``. The default values are the input\n" | |
|
434 | "and output sizes for a zstd streaming decompressor.\n" | |
|
435 | "\n" | |
|
436 | "There is also support for skipping the first ``skip_bytes`` of data from\n" | |
|
437 | "the source.\n" | |
|
438 | ); | |
|
439 | ||
|
440 | static ZstdDecompressorIterator* Decompressor_read_from(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { | |
|
441 | static char* kwlist[] = { | |
|
442 | "reader", | |
|
443 | "read_size", | |
|
444 | "write_size", | |
|
445 | "skip_bytes", | |
|
446 | NULL | |
|
447 | }; | |
|
448 | ||
|
449 | PyObject* reader; | |
|
450 | size_t inSize = ZSTD_DStreamInSize(); | |
|
451 | size_t outSize = ZSTD_DStreamOutSize(); | |
|
452 | ZstdDecompressorIterator* result; | |
|
453 | size_t skipBytes = 0; | |
|
454 | ||
|
455 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kkk", kwlist, &reader, | |
|
456 | &inSize, &outSize, &skipBytes)) { | |
|
457 | return NULL; | |
|
458 | } | |
|
459 | ||
|
460 | if (skipBytes >= inSize) { | |
|
461 | PyErr_SetString(PyExc_ValueError, | |
|
462 | "skip_bytes must be smaller than read_size"); | |
|
463 | return NULL; | |
|
464 | } | |
|
465 | ||
|
466 | result = PyObject_New(ZstdDecompressorIterator, &ZstdDecompressorIteratorType); | |
|
467 | if (!result) { | |
|
468 | return NULL; | |
|
469 | } | |
|
470 | ||
|
471 | result->decompressor = NULL; | |
|
472 | result->reader = NULL; | |
|
473 | result->buffer = NULL; | |
|
474 | result->dstream = NULL; | |
|
475 | result->input.src = NULL; | |
|
476 | result->output.dst = NULL; | |
|
477 | ||
|
478 | if (PyObject_HasAttrString(reader, "read")) { | |
|
479 | result->reader = reader; | |
|
480 | Py_INCREF(result->reader); | |
|
481 | } | |
|
482 | else if (1 == PyObject_CheckBuffer(reader)) { | |
|
483 | /* Object claims it is a buffer. Try to get a handle to it. */ | |
|
484 | result->buffer = PyMem_Malloc(sizeof(Py_buffer)); | |
|
485 | if (!result->buffer) { | |
|
486 | goto except; | |
|
487 | } | |
|
488 | ||
|
489 | memset(result->buffer, 0, sizeof(Py_buffer)); | |
|
490 | ||
|
491 | if (0 != PyObject_GetBuffer(reader, result->buffer, PyBUF_CONTIG_RO)) { | |
|
492 | goto except; | |
|
493 | } | |
|
494 | ||
|
495 | result->bufferOffset = 0; | |
|
496 | } | |
|
497 | else { | |
|
498 | PyErr_SetString(PyExc_ValueError, | |
|
499 | "must pass an object with a read() method or one that conforms to the buffer protocol"); | 
|
500 | goto except; | |
|
501 | } | |
|
502 | ||
|
503 | result->decompressor = self; | |
|
504 | Py_INCREF(result->decompressor); | |
|
505 | ||
|
506 | result->inSize = inSize; | |
|
507 | result->outSize = outSize; | |
|
508 | result->skipBytes = skipBytes; | |
|
509 | ||
|
510 | result->dstream = DStream_from_ZstdDecompressor(self); | |
|
511 | if (!result->dstream) { | |
|
512 | goto except; | |
|
513 | } | |
|
514 | ||
|
515 | result->input.src = PyMem_Malloc(inSize); | |
|
516 | if (!result->input.src) { | |
|
517 | PyErr_NoMemory(); | |
|
518 | goto except; | |
|
519 | } | |
|
520 | result->input.size = 0; | |
|
521 | result->input.pos = 0; | |
|
522 | ||
|
523 | result->output.dst = NULL; | |
|
524 | result->output.size = 0; | |
|
525 | result->output.pos = 0; | |
|
526 | ||
|
527 | result->readCount = 0; | |
|
528 | result->finishedInput = 0; | |
|
529 | result->finishedOutput = 0; | |
|
530 | ||
|
531 | goto finally; | |
|
532 | ||
|
533 | except: | |
|
534 | if (result->reader) { | |
|
535 | Py_DECREF(result->reader); | |
|
536 | result->reader = NULL; | |
|
537 | } | |
|
538 | ||
|
539 | if (result->buffer) { | |
|
540 | PyBuffer_Release(result->buffer); | |
|
541 | Py_DECREF(result->buffer); | |
|
542 | result->buffer = NULL; | |
|
543 | } | |
|
544 | ||
|
545 | Py_DECREF(result); | |
|
546 | result = NULL; | |
|
547 | ||
|
548 | finally: | |
|
549 | ||
|
550 | return result; | |
|
551 | } | |
|
552 | ||
|
553 | PyDoc_STRVAR(Decompressor_write_to__doc__, | |
|
554 | "Create a context manager to write decompressed data to an object.\n" | |
|
555 | "\n" | |
|
556 | "The passed object must have a ``write()`` method.\n" | |
|
557 | "\n" | |
|
558 | "The caller feeds input data to the object by calling ``write(data)``.\n" | 
|
559 | "Decompressed data is written to the argument given as it is decompressed.\n" | |
|
560 | "\n" | |
|
561 | "An optional ``write_size`` argument defines the size of chunks to\n" | |
|
562 | "``write()`` to the writer. It defaults to the default output size for a zstd\n" | |
|
563 | "streaming decompressor.\n" | |
|
564 | ); | |
|
565 | ||
|
566 | static ZstdDecompressionWriter* Decompressor_write_to(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) { | |
|
567 | static char* kwlist[] = { | |
|
568 | "writer", | |
|
569 | "write_size", | |
|
570 | NULL | |
|
571 | }; | |
|
572 | ||
|
573 | PyObject* writer; | |
|
574 | size_t outSize = ZSTD_DStreamOutSize(); | |
|
575 | ZstdDecompressionWriter* result; | |
|
576 | ||
|
577 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|k", kwlist, &writer, &outSize)) { | |
|
578 | return NULL; | |
|
579 | } | |
|
580 | ||
|
581 | if (!PyObject_HasAttrString(writer, "write")) { | |
|
582 | PyErr_SetString(PyExc_ValueError, "must pass an object with a write() method"); | |
|
583 | return NULL; | |
|
584 | } | |
|
585 | ||
|
586 | result = PyObject_New(ZstdDecompressionWriter, &ZstdDecompressionWriterType); | |
|
587 | if (!result) { | |
|
588 | return NULL; | |
|
589 | } | |
|
590 | ||
|
591 | result->decompressor = self; | |
|
592 | Py_INCREF(result->decompressor); | |
|
593 | ||
|
594 | result->writer = writer; | |
|
595 | Py_INCREF(result->writer); | |
|
596 | ||
|
597 | result->outSize = outSize; | |
|
598 | ||
|
599 | result->entered = 0; | |
|
600 | result->dstream = NULL; | |
|
601 | ||
|
602 | return result; | |
|
603 | } | |
|
604 | ||
|
605 | static PyMethodDef Decompressor_methods[] = { | |
|
606 | { "copy_stream", (PyCFunction)Decompressor_copy_stream, METH_VARARGS | METH_KEYWORDS, | |
|
607 | Decompressor_copy_stream__doc__ }, | |
|
608 | { "decompress", (PyCFunction)Decompressor_decompress, METH_VARARGS | METH_KEYWORDS, | |
|
609 | Decompressor_decompress__doc__ }, | |
|
610 | { "decompressobj", (PyCFunction)Decompressor_decompressobj, METH_NOARGS, | |
|
611 | Decompressor_decompressobj__doc__ }, | |
|
612 | { "read_from", (PyCFunction)Decompressor_read_from, METH_VARARGS | METH_KEYWORDS, | |
|
613 | Decompressor_read_from__doc__ }, | |
|
614 | { "write_to", (PyCFunction)Decompressor_write_to, METH_VARARGS | METH_KEYWORDS, | |
|
615 | Decompressor_write_to__doc__ }, | |
|
616 | { NULL, NULL } | |
|
617 | }; | |
|
618 | ||
|
619 | PyTypeObject ZstdDecompressorType = { | |
|
620 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
621 | "zstd.ZstdDecompressor", /* tp_name */ | |
|
622 | sizeof(ZstdDecompressor), /* tp_basicsize */ | |
|
623 | 0, /* tp_itemsize */ | |
|
624 | (destructor)Decompressor_dealloc, /* tp_dealloc */ | |
|
625 | 0, /* tp_print */ | |
|
626 | 0, /* tp_getattr */ | |
|
627 | 0, /* tp_setattr */ | |
|
628 | 0, /* tp_compare */ | |
|
629 | 0, /* tp_repr */ | |
|
630 | 0, /* tp_as_number */ | |
|
631 | 0, /* tp_as_sequence */ | |
|
632 | 0, /* tp_as_mapping */ | |
|
633 | 0, /* tp_hash */ | |
|
634 | 0, /* tp_call */ | |
|
635 | 0, /* tp_str */ | |
|
636 | 0, /* tp_getattro */ | |
|
637 | 0, /* tp_setattro */ | |
|
638 | 0, /* tp_as_buffer */ | |
|
639 | Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ | |
|
640 | Decompressor__doc__, /* tp_doc */ | |
|
641 | 0, /* tp_traverse */ | |
|
642 | 0, /* tp_clear */ | |
|
643 | 0, /* tp_richcompare */ | |
|
644 | 0, /* tp_weaklistoffset */ | |
|
645 | 0, /* tp_iter */ | |
|
646 | 0, /* tp_iternext */ | |
|
647 | Decompressor_methods, /* tp_methods */ | |
|
648 | 0, /* tp_members */ | |
|
649 | 0, /* tp_getset */ | |
|
650 | 0, /* tp_base */ | |
|
651 | 0, /* tp_dict */ | |
|
652 | 0, /* tp_descr_get */ | |
|
653 | 0, /* tp_descr_set */ | |
|
654 | 0, /* tp_dictoffset */ | |
|
655 | (initproc)Decompressor_init, /* tp_init */ | |
|
656 | 0, /* tp_alloc */ | |
|
657 | PyType_GenericNew, /* tp_new */ | |
|
658 | }; | |
|
659 | ||
|
660 | void decompressor_module_init(PyObject* mod) { | |
|
661 | Py_TYPE(&ZstdDecompressorType) = &PyType_Type; | |
|
662 | if (PyType_Ready(&ZstdDecompressorType) < 0) { | |
|
663 | return; | |
|
664 | } | |
|
665 | ||
|
666 | Py_INCREF((PyObject*)&ZstdDecompressorType); | |
|
667 | PyModule_AddObject(mod, "ZstdDecompressor", | |
|
668 | (PyObject*)&ZstdDecompressorType); | |
|
669 | } |
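`Decompressor_copy_stream` above implements a classic pump loop: read a chunk from the source, run it through the streaming decompressor, write any output produced, and return the `(bytes_read, bytes_written)` totals. A hedged Python sketch of the same loop, using the stdlib `zlib` streaming decompressor as a stand-in (the `copy_stream` helper below is illustrative, not the library's implementation):

```python
import io
import zlib

def copy_stream(ifh, ofh, read_size=8192):
    """Decompress from ifh to ofh in chunks; return (bytes_read, bytes_written)."""
    dobj = zlib.decompressobj()
    total_read = total_write = 0
    while True:
        chunk = ifh.read(read_size)
        if not chunk:  # empty read means EOF, mirroring the C loop's break
            break
        total_read += len(chunk)
        data = dobj.decompress(chunk)
        ofh.write(data)
        total_write += len(data)
    tail = dobj.flush()
    ofh.write(tail)
    total_write += len(tail)
    return total_read, total_write

src = io.BytesIO(zlib.compress(b"x" * 10000))
dst = io.BytesIO()
nread, nwritten = copy_stream(src, dst)
assert dst.getvalue() == b"x" * 10000
```

The C version differs mainly in that it releases the GIL around `ZSTD_decompressStream()` and may call `write()` several times per input chunk when the fixed-size output buffer fills up.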
@@ -0,0 +1,254 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | #define min(a, b) (((a) < (b)) ? (a) : (b)) | |
|
12 | ||
|
13 | extern PyObject* ZstdError; | |
|
14 | ||
|
15 | PyDoc_STRVAR(ZstdDecompressorIterator__doc__, | |
|
16 | "Represents an iterator of decompressed data.\n" | |
|
17 | ); | |
|
18 | ||
|
19 | static void ZstdDecompressorIterator_dealloc(ZstdDecompressorIterator* self) { | |
|
20 | Py_XDECREF(self->decompressor); | |
|
21 | Py_XDECREF(self->reader); | |
|
22 | ||
|
23 | if (self->buffer) { | |
|
24 | PyBuffer_Release(self->buffer); | |
|
25 | PyMem_FREE(self->buffer); | |
|
26 | self->buffer = NULL; | |
|
27 | } | |
|
28 | ||
|
29 | if (self->dstream) { | |
|
30 | ZSTD_freeDStream(self->dstream); | |
|
31 | self->dstream = NULL; | |
|
32 | } | |
|
33 | ||
|
34 | if (self->input.src) { | |
|
35 | PyMem_Free((void*)self->input.src); | |
|
36 | self->input.src = NULL; | |
|
37 | } | |
|
38 | ||
|
39 | PyObject_Del(self); | |
|
40 | } | |
|
41 | ||
|
42 | static PyObject* ZstdDecompressorIterator_iter(PyObject* self) { | |
|
43 | Py_INCREF(self); | |
|
44 | return self; | |
|
45 | } | |
|
46 | ||
|
47 | static DecompressorIteratorResult read_decompressor_iterator(ZstdDecompressorIterator* self) { | |
|
48 | size_t zresult; | |
|
49 | PyObject* chunk; | |
|
50 | DecompressorIteratorResult result; | |
|
51 | size_t oldInputPos = self->input.pos; | |
|
52 | ||
|
53 | result.chunk = NULL; | |
|
54 | ||
|
55 | chunk = PyBytes_FromStringAndSize(NULL, self->outSize); | |
|
56 | if (!chunk) { | |
|
57 | result.errored = 1; | |
|
58 | return result; | |
|
59 | } | |
|
60 | ||
|
61 | self->output.dst = PyBytes_AsString(chunk); | |
|
62 | self->output.size = self->outSize; | |
|
63 | self->output.pos = 0; | |
|
64 | ||
|
65 | Py_BEGIN_ALLOW_THREADS | |
|
66 | zresult = ZSTD_decompressStream(self->dstream, &self->output, &self->input); | |
|
67 | Py_END_ALLOW_THREADS | |
|
68 | ||
|
69 | /* We're done with the pointer. Nullify to prevent anyone from getting a | |
|
70 | handle on a Python object. */ | |
|
71 | self->output.dst = NULL; | |
|
72 | ||
|
73 | if (ZSTD_isError(zresult)) { | |
|
74 | Py_DECREF(chunk); | |
|
75 | PyErr_Format(ZstdError, "zstd decompress error: %s", | |
|
76 | ZSTD_getErrorName(zresult)); | |
|
77 | result.errored = 1; | |
|
78 | return result; | |
|
79 | } | |
|
80 | ||
|
81 | self->readCount += self->input.pos - oldInputPos; | |
|
82 | ||
|
83 | /* Frame is fully decoded. Input exhausted and output sitting in buffer. */ | |
|
84 | if (0 == zresult) { | |
|
85 | self->finishedInput = 1; | |
|
86 | self->finishedOutput = 1; | |
|
87 | } | |
|
88 | ||
|
89 | /* If it produced output data, return it. */ | |
|
90 | if (self->output.pos) { | |
|
91 | if (self->output.pos < self->outSize) { | |
|
92 | if (_PyBytes_Resize(&chunk, self->output.pos)) { | |
|
93 | result.errored = 1; | |
|
94 | return result; | |
|
95 | } | |
|
96 | } | |
|
97 | } | |
|
98 | else { | |
|
99 | Py_DECREF(chunk); | |
|
100 | chunk = NULL; | |
|
101 | } | |
|
102 | ||
|
103 | result.errored = 0; | |
|
104 | result.chunk = chunk; | |
|
105 | ||
|
106 | return result; | |
|
107 | } | |
|
108 | ||
|
109 | static PyObject* ZstdDecompressorIterator_iternext(ZstdDecompressorIterator* self) { | |
|
110 | PyObject* readResult = NULL; | |
|
111 | char* readBuffer; | |
|
112 | Py_ssize_t readSize; | |
|
113 | Py_ssize_t bufferRemaining; | |
|
114 | DecompressorIteratorResult result; | |
|
115 | ||
|
116 | if (self->finishedOutput) { | |
|
117 | PyErr_SetString(PyExc_StopIteration, "output flushed"); | |
|
118 | return NULL; | |
|
119 | } | |
|
120 | ||
|
121 | /* If we have data left in the input, consume it. */ | |
|
122 | if (self->input.pos < self->input.size) { | |
|
123 | result = read_decompressor_iterator(self); | |
|
124 | if (result.chunk || result.errored) { | |
|
125 | return result.chunk; | |
|
126 | } | |
|
127 | ||
|
128 | /* Else fall through to get more data from input. */ | |
|
129 | } | |
|
130 | ||
|
131 | read_from_source: | |
|
132 | ||
|
133 | if (!self->finishedInput) { | |
|
134 | if (self->reader) { | |
|
135 | readResult = PyObject_CallMethod(self->reader, "read", "I", self->inSize); | |
|
136 | if (!readResult) { | |
|
137 | return NULL; | |
|
138 | } | |
|
139 | ||
|
140 | PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize); | |
|
141 | } | |
|
142 | else { | |
|
143 | assert(self->buffer && self->buffer->buf); | |
|
144 | ||
|
145 | /* Only support contiguous C arrays for now */ | |
|
146 | assert(self->buffer->strides == NULL && self->buffer->suboffsets == NULL); | |
|
147 | assert(self->buffer->itemsize == 1); | |
|
148 | ||
|
149 | /* TODO avoid memcpy() below */ | |
|
150 | readBuffer = (char *)self->buffer->buf + self->bufferOffset; | |
|
151 | bufferRemaining = self->buffer->len - self->bufferOffset; | |
|
152 | readSize = min(bufferRemaining, (Py_ssize_t)self->inSize); | |
|
153 | self->bufferOffset += readSize; | |
|
154 | } | |
|
155 | ||
|
156 | if (readSize) { | |
|
157 | if (!self->readCount && self->skipBytes) { | |
|
158 | assert(self->skipBytes < self->inSize); | |
|
159 | if ((Py_ssize_t)self->skipBytes >= readSize) { | |
|
160 | PyErr_SetString(PyExc_ValueError, | |
|
161 | "skip_bytes larger than first input chunk; " | |
|
162 | "this scenario is currently unsupported"); | |
|
163 | Py_DecRef(readResult); | |
|
164 | return NULL; | |
|
165 | } | |
|
166 | ||
|
167 | readBuffer = readBuffer + self->skipBytes; | |
|
168 | readSize -= self->skipBytes; | |
|
169 | } | |
|
170 | ||
|
171 | /* Copy input into previously allocated buffer because it can live longer | |
|
172 | than a single function call and we don't want to keep a ref to a Python | |
|
173 | object around. This could be changed... */ | |
|
174 | memcpy((void*)self->input.src, readBuffer, readSize); | |
|
175 | self->input.size = readSize; | |
|
176 | self->input.pos = 0; | |
|
177 | } | |
|
178 | /* No bytes on first read must mean an empty input stream. */ | |
|
179 | else if (!self->readCount) { | |
|
180 | self->finishedInput = 1; | |
|
181 | self->finishedOutput = 1; | |
|
182 | Py_DecRef(readResult); | |
|
183 | PyErr_SetString(PyExc_StopIteration, "empty input"); | |
|
184 | return NULL; | |
|
185 | } | |
|
186 | else { | |
|
187 | self->finishedInput = 1; | |
|
188 | } | |
|
189 | ||
|
190 | /* We've copied the data managed by memory. Discard the Python object. */ | |
|
191 | Py_DecRef(readResult); | |
|
192 | } | |
|
193 | ||
|
194 | result = read_decompressor_iterator(self); | |
|
195 | if (result.errored || result.chunk) { | |
|
196 | return result.chunk; | |
|
197 | } | |
|
198 | ||
|
199 | /* No new output data. Try again unless we know there is no more data. */ | |
|
200 | if (!self->finishedInput) { | |
|
201 | goto read_from_source; | |
|
202 | } | |
|
203 | ||
|
204 | PyErr_SetString(PyExc_StopIteration, "input exhausted"); | |
|
205 | return NULL; | |
|
206 | } | |
|
207 | ||
|
208 | PyTypeObject ZstdDecompressorIteratorType = { | |
|
209 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
210 | "zstd.ZstdDecompressorIterator", /* tp_name */ | |
|
211 | sizeof(ZstdDecompressorIterator), /* tp_basicsize */ | |
|
212 | 0, /* tp_itemsize */ | |
|
213 | (destructor)ZstdDecompressorIterator_dealloc, /* tp_dealloc */ | |
|
214 | 0, /* tp_print */ | |
|
215 | 0, /* tp_getattr */ | |
|
216 | 0, /* tp_setattr */ | |
|
217 | 0, /* tp_compare */ | |
|
218 | 0, /* tp_repr */ | |
|
219 | 0, /* tp_as_number */ | |
|
220 | 0, /* tp_as_sequence */ | |
|
221 | 0, /* tp_as_mapping */ | |
|
222 | 0, /* tp_hash */ | |
|
223 | 0, /* tp_call */ | |
|
224 | 0, /* tp_str */ | |
|
225 | 0, /* tp_getattro */ | |
|
226 | 0, /* tp_setattro */ | |
|
227 | 0, /* tp_as_buffer */ | |
|
228 | Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ | |
|
229 | ZstdDecompressorIterator__doc__, /* tp_doc */ | |
|
230 | 0, /* tp_traverse */ | |
|
231 | 0, /* tp_clear */ | |
|
232 | 0, /* tp_richcompare */ | |
|
233 | 0, /* tp_weaklistoffset */ | |
|
234 | ZstdDecompressorIterator_iter, /* tp_iter */ | |
|
235 | (iternextfunc)ZstdDecompressorIterator_iternext, /* tp_iternext */ | |
|
236 | 0, /* tp_methods */ | |
|
237 | 0, /* tp_members */ | |
|
238 | 0, /* tp_getset */ | |
|
239 | 0, /* tp_base */ | |
|
240 | 0, /* tp_dict */ | |
|
241 | 0, /* tp_descr_get */ | |
|
242 | 0, /* tp_descr_set */ | |
|
243 | 0, /* tp_dictoffset */ | |
|
244 | 0, /* tp_init */ | |
|
245 | 0, /* tp_alloc */ | |
|
246 | PyType_GenericNew, /* tp_new */ | |
|
247 | }; | |
|
248 | ||
|
249 | void decompressoriterator_module_init(PyObject* mod) { | |
|
250 | Py_TYPE(&ZstdDecompressorIteratorType) = &PyType_Type; | |
|
251 | if (PyType_Ready(&ZstdDecompressorIteratorType) < 0) { | |
|
252 | return; | |
|
253 | } | |
|
254 | } |
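The `iternext` implementation above follows a simple control flow: read a chunk from the source, feed it to the decompressor, yield any output, and signal `StopIteration` once the input is exhausted. A minimal pure-Python sketch of that same loop (the `chunk_iterator` name and `transform` callable are illustrative, not part of the real binding):

```python
import io

def chunk_iterator(reader, transform, read_size=8192):
    """Mirror the C iternext control flow: pull from a source object,
    run each piece through a transform, stop when input is exhausted."""
    finished_input = False
    while True:
        if not finished_input:
            data = reader.read(read_size)
            if not data:
                # No bytes on a read means the input stream is exhausted.
                finished_input = True
            else:
                chunk = transform(data)
                if chunk:
                    yield chunk
                continue
        # Input exhausted and nothing left to emit.
        return
```

In the C version the transform is the streaming decompress call and the yielded chunks are bytes objects built from the output buffer.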
@@ -0,0 +1,125 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | PyDoc_STRVAR(DictParameters__doc__, | |
|
12 | "DictParameters: low-level control over dictionary generation"); | |
|
13 | ||
|
14 | static PyObject* DictParameters_new(PyTypeObject* subtype, PyObject* args, PyObject* kwargs) { | |
|
15 | DictParametersObject* self; | |
|
16 | unsigned selectivityLevel; | |
|
17 | int compressionLevel; | |
|
18 | unsigned notificationLevel; | |
|
19 | unsigned dictID; | |
|
20 | ||
|
21 | if (!PyArg_ParseTuple(args, "IiII", &selectivityLevel, &compressionLevel, | |
|
22 | ¬ificationLevel, &dictID)) { | |
|
23 | return NULL; | |
|
24 | } | |
|
25 | ||
|
26 | self = (DictParametersObject*)subtype->tp_alloc(subtype, 1); | |
|
27 | if (!self) { | |
|
28 | return NULL; | |
|
29 | } | |
|
30 | ||
|
31 | self->selectivityLevel = selectivityLevel; | |
|
32 | self->compressionLevel = compressionLevel; | |
|
33 | self->notificationLevel = notificationLevel; | |
|
34 | self->dictID = dictID; | |
|
35 | ||
|
36 | return (PyObject*)self; | |
|
37 | } | |
|
38 | ||
|
39 | static void DictParameters_dealloc(PyObject* self) { | |
|
40 | PyObject_Del(self); | |
|
41 | } | |
|
42 | ||
|
43 | static Py_ssize_t DictParameters_length(PyObject* self) { | |
|
44 | return 4; | |
|
45 | }; | |
|
46 | ||
|
47 | static PyObject* DictParameters_item(PyObject* o, Py_ssize_t i) { | |
|
48 | DictParametersObject* self = (DictParametersObject*)o; | |
|
49 | ||
|
50 | switch (i) { | |
|
51 | case 0: | |
|
52 | return PyLong_FromLong(self->selectivityLevel); | |
|
53 | case 1: | |
|
54 | return PyLong_FromLong(self->compressionLevel); | |
|
55 | case 2: | |
|
56 | return PyLong_FromLong(self->notificationLevel); | |
|
57 | case 3: | |
|
58 | return PyLong_FromLong(self->dictID); | |
|
59 | default: | |
|
60 | PyErr_SetString(PyExc_IndexError, "index out of range"); | |
|
61 | return NULL; | |
|
62 | } | |
|
63 | } | |
|
64 | ||
|
65 | static PySequenceMethods DictParameters_sq = { | |
|
66 | DictParameters_length, /* sq_length */ | |
|
67 | 0, /* sq_concat */ | |
|
68 | 0, /* sq_repeat */ | |
|
69 | DictParameters_item, /* sq_item */ | |
|
70 | 0, /* sq_ass_item */ | |
|
71 | 0, /* sq_contains */ | |
|
72 | 0, /* sq_inplace_concat */ | |
|
73 | 0 /* sq_inplace_repeat */ | |
|
74 | }; | |
|
75 | ||
|
76 | PyTypeObject DictParametersType = { | |
|
77 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
78 | "DictParameters", /* tp_name */ | |
|
79 | sizeof(DictParametersObject), /* tp_basicsize */ | |
|
80 | 0, /* tp_itemsize */ | |
|
81 | (destructor)DictParameters_dealloc, /* tp_dealloc */ | |
|
82 | 0, /* tp_print */ | |
|
83 | 0, /* tp_getattr */ | |
|
84 | 0, /* tp_setattr */ | |
|
85 | 0, /* tp_compare */ | |
|
86 | 0, /* tp_repr */ | |
|
87 | 0, /* tp_as_number */ | |
|
88 | &DictParameters_sq, /* tp_as_sequence */ | |
|
89 | 0, /* tp_as_mapping */ | |
|
90 | 0, /* tp_hash */ | |
|
91 | 0, /* tp_call */ | |
|
92 | 0, /* tp_str */ | |
|
93 | 0, /* tp_getattro */ | |
|
94 | 0, /* tp_setattro */ | |
|
95 | 0, /* tp_as_buffer */ | |
|
96 | Py_TPFLAGS_DEFAULT, /* tp_flags */ | |
|
97 | DictParameters__doc__, /* tp_doc */ | |
|
98 | 0, /* tp_traverse */ | |
|
99 | 0, /* tp_clear */ | |
|
100 | 0, /* tp_richcompare */ | |
|
101 | 0, /* tp_weaklistoffset */ | |
|
102 | 0, /* tp_iter */ | |
|
103 | 0, /* tp_iternext */ | |
|
104 | 0, /* tp_methods */ | |
|
105 | 0, /* tp_members */ | |
|
106 | 0, /* tp_getset */ | |
|
107 | 0, /* tp_base */ | |
|
108 | 0, /* tp_dict */ | |
|
109 | 0, /* tp_descr_get */ | |
|
110 | 0, /* tp_descr_set */ | |
|
111 | 0, /* tp_dictoffset */ | |
|
112 | 0, /* tp_init */ | |
|
113 | 0, /* tp_alloc */ | |
|
114 | DictParameters_new, /* tp_new */ | |
|
115 | }; | |
|
116 | ||
|
117 | void dictparams_module_init(PyObject* mod) { | |
|
118 | Py_TYPE(&DictParametersType) = &PyType_Type; | |
|
119 | if (PyType_Ready(&DictParametersType) < 0) { | |
|
120 | return; | |
|
121 | } | |
|
122 | ||
|
123 | Py_IncRef((PyObject*)&DictParametersType); | |
|
124 | PyModule_AddObject(mod, "DictParameters", (PyObject*)&DictParametersType); | |
|
125 | } |
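`DictParameters` exposes its four fields through the sequence protocol (`sq_length` returns 4, `sq_item` maps indices 0-3 to the fields), so instances behave like a read-only 4-tuple. A pure-Python analog of that shape (the class name is made up for illustration):

```python
class DictParametersSketch:
    """Pure-Python analog of the C DictParameters type: a fixed,
    read-only 4-element sequence over the dictionary parameters."""

    def __init__(self, selectivity, compression_level, notification_level, dict_id):
        self._fields = (selectivity, compression_level,
                        notification_level, dict_id)

    def __len__(self):
        return 4

    def __getitem__(self, i):
        # Matches the C sq_item: valid indices are 0 through 3.
        if not 0 <= i < 4:
            raise IndexError('index out of range')
        return self._fields[i]
```

Because iteration falls back to `__getitem__`, tuple-style unpacking (`s, c, n, d = params`) works without a dedicated iterator.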
@@ -0,0 +1,172 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #define PY_SSIZE_T_CLEAN | |
|
10 | #include <Python.h> | |
|
11 | ||
|
12 | #define ZSTD_STATIC_LINKING_ONLY | |
|
13 | #define ZDICT_STATIC_LINKING_ONLY | |
|
14 | #include "mem.h" | |
|
15 | #include "zstd.h" | |
|
16 | #include "zdict.h" | |
|
17 | ||
|
18 | #define PYTHON_ZSTANDARD_VERSION "0.5.0" | |
|
19 | ||
|
20 | typedef struct { | |
|
21 | PyObject_HEAD | |
|
22 | unsigned windowLog; | |
|
23 | unsigned chainLog; | |
|
24 | unsigned hashLog; | |
|
25 | unsigned searchLog; | |
|
26 | unsigned searchLength; | |
|
27 | unsigned targetLength; | |
|
28 | ZSTD_strategy strategy; | |
|
29 | } CompressionParametersObject; | |
|
30 | ||
|
31 | extern PyTypeObject CompressionParametersType; | |
|
32 | ||
|
33 | typedef struct { | |
|
34 | PyObject_HEAD | |
|
35 | unsigned selectivityLevel; | |
|
36 | int compressionLevel; | |
|
37 | unsigned notificationLevel; | |
|
38 | unsigned dictID; | |
|
39 | } DictParametersObject; | |
|
40 | ||
|
41 | extern PyTypeObject DictParametersType; | |
|
42 | ||
|
43 | typedef struct { | |
|
44 | PyObject_HEAD | |
|
45 | ||
|
46 | void* dictData; | |
|
47 | size_t dictSize; | |
|
48 | } ZstdCompressionDict; | |
|
49 | ||
|
50 | extern PyTypeObject ZstdCompressionDictType; | |
|
51 | ||
|
52 | typedef struct { | |
|
53 | PyObject_HEAD | |
|
54 | ||
|
55 | int compressionLevel; | |
|
56 | ZstdCompressionDict* dict; | |
|
57 | ZSTD_CDict* cdict; | |
|
58 | CompressionParametersObject* cparams; | |
|
59 | ZSTD_frameParameters fparams; | |
|
60 | } ZstdCompressor; | |
|
61 | ||
|
62 | extern PyTypeObject ZstdCompressorType; | |
|
63 | ||
|
64 | typedef struct { | |
|
65 | PyObject_HEAD | |
|
66 | ||
|
67 | ZstdCompressor* compressor; | |
|
68 | ZSTD_CStream* cstream; | |
|
69 | ZSTD_outBuffer output; | |
|
70 | int flushed; | |
|
71 | } ZstdCompressionObj; | |
|
72 | ||
|
73 | extern PyTypeObject ZstdCompressionObjType; | |
|
74 | ||
|
75 | typedef struct { | |
|
76 | PyObject_HEAD | |
|
77 | ||
|
78 | ZstdCompressor* compressor; | |
|
79 | PyObject* writer; | |
|
80 | Py_ssize_t sourceSize; | |
|
81 | size_t outSize; | |
|
82 | ZSTD_CStream* cstream; | |
|
83 | int entered; | |
|
84 | } ZstdCompressionWriter; | |
|
85 | ||
|
86 | extern PyTypeObject ZstdCompressionWriterType; | |
|
87 | ||
|
88 | typedef struct { | |
|
89 | PyObject_HEAD | |
|
90 | ||
|
91 | ZstdCompressor* compressor; | |
|
92 | PyObject* reader; | |
|
93 | Py_buffer* buffer; | |
|
94 | Py_ssize_t bufferOffset; | |
|
95 | Py_ssize_t sourceSize; | |
|
96 | size_t inSize; | |
|
97 | size_t outSize; | |
|
98 | ||
|
99 | ZSTD_CStream* cstream; | |
|
100 | ZSTD_inBuffer input; | |
|
101 | ZSTD_outBuffer output; | |
|
102 | int finishedOutput; | |
|
103 | int finishedInput; | |
|
104 | PyObject* readResult; | |
|
105 | } ZstdCompressorIterator; | |
|
106 | ||
|
107 | extern PyTypeObject ZstdCompressorIteratorType; | |
|
108 | ||
|
109 | typedef struct { | |
|
110 | PyObject_HEAD | |
|
111 | ||
|
112 | ZSTD_DCtx* refdctx; | |
|
113 | ||
|
114 | ZstdCompressionDict* dict; | |
|
115 | ZSTD_DDict* ddict; | |
|
116 | } ZstdDecompressor; | |
|
117 | ||
|
118 | extern PyTypeObject ZstdDecompressorType; | |
|
119 | ||
|
120 | typedef struct { | |
|
121 | PyObject_HEAD | |
|
122 | ||
|
123 | ZstdDecompressor* decompressor; | |
|
124 | ZSTD_DStream* dstream; | |
|
125 | int finished; | |
|
126 | } ZstdDecompressionObj; | |
|
127 | ||
|
128 | extern PyTypeObject ZstdDecompressionObjType; | |
|
129 | ||
|
130 | typedef struct { | |
|
131 | PyObject_HEAD | |
|
132 | ||
|
133 | ZstdDecompressor* decompressor; | |
|
134 | PyObject* writer; | |
|
135 | size_t outSize; | |
|
136 | ZSTD_DStream* dstream; | |
|
137 | int entered; | |
|
138 | } ZstdDecompressionWriter; | |
|
139 | ||
|
140 | extern PyTypeObject ZstdDecompressionWriterType; | |
|
141 | ||
|
142 | typedef struct { | |
|
143 | PyObject_HEAD | |
|
144 | ||
|
145 | ZstdDecompressor* decompressor; | |
|
146 | PyObject* reader; | |
|
147 | Py_buffer* buffer; | |
|
148 | Py_ssize_t bufferOffset; | |
|
149 | size_t inSize; | |
|
150 | size_t outSize; | |
|
151 | size_t skipBytes; | |
|
152 | ZSTD_DStream* dstream; | |
|
153 | ZSTD_inBuffer input; | |
|
154 | ZSTD_outBuffer output; | |
|
155 | Py_ssize_t readCount; | |
|
156 | int finishedInput; | |
|
157 | int finishedOutput; | |
|
158 | } ZstdDecompressorIterator; | |
|
159 | ||
|
160 | extern PyTypeObject ZstdDecompressorIteratorType; | |
|
161 | ||
|
162 | typedef struct { | |
|
163 | int errored; | |
|
164 | PyObject* chunk; | |
|
165 | } DecompressorIteratorResult; | |
|
166 | ||
|
167 | void ztopy_compression_parameters(CompressionParametersObject* params, ZSTD_compressionParameters* zparams); | |
|
168 | CompressionParametersObject* get_compression_parameters(PyObject* self, PyObject* args); | |
|
169 | PyObject* estimate_compression_context_size(PyObject* self, PyObject* args); | |
|
170 | ZSTD_CStream* CStream_from_ZstdCompressor(ZstdCompressor* compressor, Py_ssize_t sourceSize); | |
|
171 | ZSTD_DStream* DStream_from_ZstdDecompressor(ZstdDecompressor* decompressor); | |
|
172 | ZstdCompressionDict* train_dictionary(PyObject* self, PyObject* args, PyObject* kwargs); |
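The `DecompressorIteratorResult` struct declared above carries two outcomes from `read_decompressor_iterator`: `errored` flags a failure, `chunk` carries output, and both being clear means "no output yet, read more input". A small sketch of that three-way contract (names here are illustrative, not the C API):

```python
from collections import namedtuple

# Hypothetical analog of the C DecompressorIteratorResult struct.
IterResult = namedtuple('IterResult', ['errored', 'chunk'])

def interpret(result):
    """Three outcomes: raise on error, return a chunk if produced,
    or return None to tell the caller to feed more input."""
    if result.errored:
        raise RuntimeError('decompression error')
    if result.chunk is not None:
        return result.chunk
    return None
```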
@@ -0,0 +1,110 b'' | |||
|
1 | # Copyright (c) 2016-present, Gregory Szorc | |
|
2 | # All rights reserved. | |
|
3 | # | |
|
4 | # This software may be modified and distributed under the terms | |
|
5 | # of the BSD license. See the LICENSE file for details. | |
|
6 | ||
|
7 | from __future__ import absolute_import | |
|
8 | ||
|
9 | import cffi | |
|
10 | import os | |
|
11 | ||
|
12 | ||
|
13 | HERE = os.path.abspath(os.path.dirname(__file__)) | |
|
14 | ||
|
15 | SOURCES = ['zstd/%s' % p for p in ( | |
|
16 | 'common/entropy_common.c', | |
|
17 | 'common/error_private.c', | |
|
18 | 'common/fse_decompress.c', | |
|
19 | 'common/xxhash.c', | |
|
20 | 'common/zstd_common.c', | |
|
21 | 'compress/fse_compress.c', | |
|
22 | 'compress/huf_compress.c', | |
|
23 | 'compress/zbuff_compress.c', | |
|
24 | 'compress/zstd_compress.c', | |
|
25 | 'decompress/huf_decompress.c', | |
|
26 | 'decompress/zbuff_decompress.c', | |
|
27 | 'decompress/zstd_decompress.c', | |
|
28 | 'dictBuilder/divsufsort.c', | |
|
29 | 'dictBuilder/zdict.c', | |
|
30 | )] | |
|
31 | ||
|
32 | INCLUDE_DIRS = [os.path.join(HERE, d) for d in ( | |
|
33 | 'zstd', | |
|
34 | 'zstd/common', | |
|
35 | 'zstd/compress', | |
|
36 | 'zstd/decompress', | |
|
37 | 'zstd/dictBuilder', | |
|
38 | )] | |
|
39 | ||
|
40 | with open(os.path.join(HERE, 'zstd', 'zstd.h'), 'rb') as fh: | |
|
41 | zstd_h = fh.read() | |
|
42 | ||
|
43 | ffi = cffi.FFI() | |
|
44 | ffi.set_source('_zstd_cffi', ''' | |
|
45 | /* needed for typedefs like U32 references in zstd.h */ | |
|
46 | #include "mem.h" | |
|
47 | #define ZSTD_STATIC_LINKING_ONLY | |
|
48 | #include "zstd.h" | |
|
49 | ''', | |
|
50 | sources=SOURCES, include_dirs=INCLUDE_DIRS) | |
|
51 | ||
|
52 | # Rather than define the API definitions from zstd.h inline, munge the | |
|
53 | # source in a way that cdef() will accept. | |
|
54 | lines = zstd_h.splitlines() | |
|
55 | lines = [l.rstrip() for l in lines if l.strip()] | |
|
56 | ||
|
57 | # Strip preprocessor directives - they aren't important for our needs. | |
|
58 | lines = [l for l in lines | |
|
59 | if not l.startswith((b'#if', b'#else', b'#endif', b'#include'))] | |
|
60 | ||
|
61 | # Remove extern C block | |
|
62 | lines = [l for l in lines if l not in (b'extern "C" {', b'}')] | |
|
63 | ||
|
64 | # The version #defines don't parse and aren't necessary. Strip them. | |
|
65 | lines = [l for l in lines if not l.startswith(( | |
|
66 | b'#define ZSTD_H_235446', | |
|
67 | b'#define ZSTD_LIB_VERSION', | |
|
68 | b'#define ZSTD_QUOTE', | |
|
69 | b'#define ZSTD_EXPAND_AND_QUOTE', | |
|
70 | b'#define ZSTD_VERSION_STRING', | |
|
71 | b'#define ZSTD_VERSION_NUMBER'))] | |
|
72 | ||
|
73 | # The C parser also doesn't like some constant defines referencing | |
|
74 | # other constants. | |
|
75 | # TODO we pick the 64-bit constants here. We should assert somewhere | |
|
76 | # we're compiling for 64-bit. | |
|
77 | def fix_constants(l): | |
|
78 | if l.startswith(b'#define ZSTD_WINDOWLOG_MAX '): | |
|
79 | return b'#define ZSTD_WINDOWLOG_MAX 27' | |
|
80 | elif l.startswith(b'#define ZSTD_CHAINLOG_MAX '): | |
|
81 | return b'#define ZSTD_CHAINLOG_MAX 28' | |
|
82 | elif l.startswith(b'#define ZSTD_HASHLOG_MAX '): | |
|
83 | return b'#define ZSTD_HASHLOG_MAX 27' | |
|
84 | elif l.startswith(b'#define ZSTD_CHAINLOG_MAX '): | |
|
85 | return b'#define ZSTD_CHAINLOG_MAX 28' | |
|
86 | elif l.startswith(b'#define ZSTD_CHAINLOG_MIN '): | |
|
87 | return b'#define ZSTD_CHAINLOG_MIN 6' | |
|
88 | elif l.startswith(b'#define ZSTD_SEARCHLOG_MAX '): | |
|
89 | return b'#define ZSTD_SEARCHLOG_MAX 26' | |
|
90 | elif l.startswith(b'#define ZSTD_BLOCKSIZE_ABSOLUTEMAX '): | |
|
91 | return b'#define ZSTD_BLOCKSIZE_ABSOLUTEMAX 131072' | |
|
92 | else: | |
|
93 | return l | |
|
94 | lines = map(fix_constants, lines) | |
|
95 | ||
|
96 | # ZSTDLIB_API isn't handled correctly. Strip it. | |
|
97 | lines = [l for l in lines if not l.startswith(b'# define ZSTDLIB_API')] | |
|
98 | def strip_api(l): | |
|
99 | if l.startswith(b'ZSTDLIB_API '): | |
|
100 | return l[len(b'ZSTDLIB_API '):] | |
|
101 | else: | |
|
102 | return l | |
|
103 | lines = map(strip_api, lines) | |
|
104 | ||
|
105 | source = b'\n'.join(lines) | |
|
106 | ffi.cdef(source.decode('latin1')) | |
|
107 | ||
|
108 | ||
|
109 | if __name__ == '__main__': | |
|
110 | ffi.compile() |
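The munging steps above (drop preprocessor lines, strip the `extern "C"` block, remove the `ZSTDLIB_API` prefix) can be seen end to end on a tiny sample. The sample header text here is made up for illustration; `ZSTD_compressBound` is a real zstd declaration shape:

```python
sample = b'''#include <stddef.h>
extern "C" {
ZSTDLIB_API size_t ZSTD_compressBound(size_t srcSize);
}
'''

# Same pipeline as the script above, condensed.
lines = [l.rstrip() for l in sample.splitlines() if l.strip()]
lines = [l for l in lines
         if not l.startswith((b'#if', b'#else', b'#endif', b'#include'))]
lines = [l for l in lines if l not in (b'extern "C" {', b'}')]
lines = [l[len(b'ZSTDLIB_API '):] if l.startswith(b'ZSTDLIB_API ') else l
         for l in lines]
cdef_source = b'\n'.join(lines)
```

What survives is exactly the kind of plain C declaration `ffi.cdef()` can parse.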
@@ -0,0 +1,62 b'' | |||
|
1 | #!/usr/bin/env python | |
|
2 | # Copyright (c) 2016-present, Gregory Szorc | |
|
3 | # All rights reserved. | |
|
4 | # | |
|
5 | # This software may be modified and distributed under the terms | |
|
6 | # of the BSD license. See the LICENSE file for details. | |
|
7 | ||
|
8 | from setuptools import setup | |
|
9 | ||
|
10 | try: | |
|
11 | import cffi | |
|
12 | except ImportError: | |
|
13 | cffi = None | |
|
14 | ||
|
15 | import setup_zstd | |
|
16 | ||
|
17 | # Code for obtaining the Extension instance is in its own module to | |
|
18 | # facilitate reuse in other projects. | |
|
19 | extensions = [setup_zstd.get_c_extension()] | |
|
20 | ||
|
21 | if cffi: | |
|
22 | import make_cffi | |
|
23 | extensions.append(make_cffi.ffi.distutils_extension()) | |
|
24 | ||
|
25 | version = None | |
|
26 | ||
|
27 | with open('c-ext/python-zstandard.h', 'r') as fh: | |
|
28 | for line in fh: | |
|
29 | if not line.startswith('#define PYTHON_ZSTANDARD_VERSION'): | |
|
30 | continue | |
|
31 | ||
|
32 | version = line.split()[2][1:-1] | |
|
33 | break | |
|
34 | ||
|
35 | if not version: | |
|
36 | raise Exception('could not resolve package version; ' | |
|
37 | 'this should never happen') | |
|
38 | ||
|
39 | setup( | |
|
40 | name='zstandard', | |
|
41 | version=version, | |
|
42 | description='Zstandard bindings for Python', | |
|
43 | long_description=open('README.rst', 'r').read(), | |
|
44 | url='https://github.com/indygreg/python-zstandard', | |
|
45 | author='Gregory Szorc', | |
|
46 | author_email='gregory.szorc@gmail.com', | |
|
47 | license='BSD', | |
|
48 | classifiers=[ | |
|
49 | 'Development Status :: 4 - Beta', | |
|
50 | 'Intended Audience :: Developers', | |
|
51 | 'License :: OSI Approved :: BSD License', | |
|
52 | 'Programming Language :: C', | |
|
53 | 'Programming Language :: Python :: 2.6', | |
|
54 | 'Programming Language :: Python :: 2.7', | |
|
55 | 'Programming Language :: Python :: 3.3', | |
|
56 | 'Programming Language :: Python :: 3.4', | |
|
57 | 'Programming Language :: Python :: 3.5', | |
|
58 | ], | |
|
59 | keywords='zstandard zstd compression', | |
|
60 | ext_modules=extensions, | |
|
61 | test_suite='tests', | |
|
62 | ) |
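The version-scraping loop in `setup.py` above relies on the `#define` line having the version as its third whitespace-separated token, wrapped in quotes. The same logic as a standalone function (the function name is illustrative):

```python
def parse_version(header_text):
    """Extract the quoted version string from a
    '#define PYTHON_ZSTANDARD_VERSION "x.y.z"' line, or None."""
    for line in header_text.splitlines():
        if line.startswith('#define PYTHON_ZSTANDARD_VERSION'):
            # Third token is the quoted string; strip the quotes.
            return line.split()[2][1:-1]
    return None
```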
@@ -0,0 +1,64 b'' | |||
|
1 | # Copyright (c) 2016-present, Gregory Szorc | |
|
2 | # All rights reserved. | |
|
3 | # | |
|
4 | # This software may be modified and distributed under the terms | |
|
5 | # of the BSD license. See the LICENSE file for details. | |
|
6 | ||
|
7 | import os | |
|
8 | from distutils.extension import Extension | |
|
9 | ||
|
10 | ||
|
11 | zstd_sources = ['zstd/%s' % p for p in ( | |
|
12 | 'common/entropy_common.c', | |
|
13 | 'common/error_private.c', | |
|
14 | 'common/fse_decompress.c', | |
|
15 | 'common/xxhash.c', | |
|
16 | 'common/zstd_common.c', | |
|
17 | 'compress/fse_compress.c', | |
|
18 | 'compress/huf_compress.c', | |
|
19 | 'compress/zbuff_compress.c', | |
|
20 | 'compress/zstd_compress.c', | |
|
21 | 'decompress/huf_decompress.c', | |
|
22 | 'decompress/zbuff_decompress.c', | |
|
23 | 'decompress/zstd_decompress.c', | |
|
24 | 'dictBuilder/divsufsort.c', | |
|
25 | 'dictBuilder/zdict.c', | |
|
26 | )] | |
|
27 | ||
|
28 | ||
|
29 | zstd_includes = [ | |
|
30 | 'c-ext', | |
|
31 | 'zstd', | |
|
32 | 'zstd/common', | |
|
33 | 'zstd/compress', | |
|
34 | 'zstd/decompress', | |
|
35 | 'zstd/dictBuilder', | |
|
36 | ] | |
|
37 | ||
|
38 | ext_sources = [ | |
|
39 | 'zstd.c', | |
|
40 | 'c-ext/compressiondict.c', | |
|
41 | 'c-ext/compressobj.c', | |
|
42 | 'c-ext/compressor.c', | |
|
43 | 'c-ext/compressoriterator.c', | |
|
44 | 'c-ext/compressionparams.c', | |
|
45 | 'c-ext/compressionwriter.c', | |
|
46 | 'c-ext/constants.c', | |
|
47 | 'c-ext/decompressobj.c', | |
|
48 | 'c-ext/decompressor.c', | |
|
49 | 'c-ext/decompressoriterator.c', | |
|
50 | 'c-ext/decompressionwriter.c', | |
|
51 | 'c-ext/dictparams.c', | |
|
52 | ] | |
|
53 | ||
|
54 | ||
|
55 | def get_c_extension(name='zstd'): | |
|
56 | """Obtain a distutils.extension.Extension for the C extension.""" | |
|
57 | root = os.path.abspath(os.path.dirname(__file__)) | |
|
58 | ||
|
59 | sources = [os.path.join(root, p) for p in zstd_sources + ext_sources] | |
|
60 | include_dirs = [os.path.join(root, d) for d in zstd_includes] | |
|
61 | ||
|
62 | # TODO compile with optimizations. | |
|
63 | return Extension(name, sources, | |
|
64 | include_dirs=include_dirs) |
|
1 | NO CONTENT: new file 100644 |
@@ -0,0 +1,15 b'' | |||
|
1 | import io | |
|
2 | ||
|
3 | class OpCountingBytesIO(io.BytesIO): | |
|
4 | def __init__(self, *args, **kwargs): | |
|
5 | self._read_count = 0 | |
|
6 | self._write_count = 0 | |
|
7 | super(OpCountingBytesIO, self).__init__(*args, **kwargs) | |
|
8 | ||
|
9 | def read(self, *args): | |
|
10 | self._read_count += 1 | |
|
11 | return super(OpCountingBytesIO, self).read(*args) | |
|
12 | ||
|
13 | def write(self, data): | |
|
14 | self._write_count += 1 | |
|
15 | return super(OpCountingBytesIO, self).write(data) |
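The helper above exists so tests can assert how many `read()`/`write()` calls a streaming API makes for given `read_size`/`write_size` values. A self-contained copy with a quick demonstration of what the counters capture:

```python
import io

class OpCountingBytesIO(io.BytesIO):
    """BytesIO that counts read() and write() calls."""

    def __init__(self, *args, **kwargs):
        self._read_count = 0
        self._write_count = 0
        super(OpCountingBytesIO, self).__init__(*args, **kwargs)

    def read(self, *args):
        self._read_count += 1
        return super(OpCountingBytesIO, self).read(*args)

    def write(self, data):
        self._write_count += 1
        return super(OpCountingBytesIO, self).write(data)

# Draining 6 bytes in 2-byte reads takes 3 data reads plus the
# final empty read that signals EOF: 4 calls total.
buf = OpCountingBytesIO(b'abcdef')
while buf.read(2):
    pass
```

This is exactly the pattern `test_read_write_size` uses: drain the stream, then compare the counters against the expected number of operations.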
@@ -0,0 +1,35 b'' | |||
|
1 | import io | |
|
2 | ||
|
3 | try: | |
|
4 | import unittest2 as unittest | |
|
5 | except ImportError: | |
|
6 | import unittest | |
|
7 | ||
|
8 | import zstd | |
|
9 | ||
|
10 | try: | |
|
11 | import zstd_cffi | |
|
12 | except ImportError: | |
|
13 | raise unittest.SkipTest('cffi version of zstd not available') | |
|
14 | ||
|
15 | ||
|
16 | class TestCFFIWriteToToCDecompressor(unittest.TestCase): | |
|
17 | def test_simple(self): | |
|
18 | orig = io.BytesIO() | |
|
19 | orig.write(b'foo') | |
|
20 | orig.write(b'bar') | |
|
21 | orig.write(b'foobar' * 16384) | |
|
22 | ||
|
23 | dest = io.BytesIO() | |
|
24 | cctx = zstd_cffi.ZstdCompressor() | |
|
25 | with cctx.write_to(dest) as compressor: | |
|
26 | compressor.write(orig.getvalue()) | |
|
27 | ||
|
28 | uncompressed = io.BytesIO() | |
|
29 | dctx = zstd.ZstdDecompressor() | |
|
30 | with dctx.write_to(uncompressed) as decompressor: | |
|
31 | decompressor.write(dest.getvalue()) | |
|
32 | ||
|
33 | self.assertEqual(uncompressed.getvalue(), orig.getvalue()) | |
|
34 | ||
|
35 |
@@ -0,0 +1,465 b'' | |||
|
1 | import hashlib | |
|
2 | import io | |
|
3 | import struct | |
|
4 | import sys | |
|
5 | ||
|
6 | try: | |
|
7 | import unittest2 as unittest | |
|
8 | except ImportError: | |
|
9 | import unittest | |
|
10 | ||
|
11 | import zstd | |
|
12 | ||
|
13 | from .common import OpCountingBytesIO | |
|
14 | ||
|
15 | ||
|
16 | if sys.version_info[0] >= 3: | |
|
17 | next = lambda it: it.__next__() | |
|
18 | else: | |
|
19 | next = lambda it: it.next() | |
|
20 | ||
|
21 | ||
|
22 | class TestCompressor(unittest.TestCase): | |
|
23 | def test_level_bounds(self): | |
|
24 | with self.assertRaises(ValueError): | |
|
25 | zstd.ZstdCompressor(level=0) | |
|
26 | ||
|
27 | with self.assertRaises(ValueError): | |
|
28 | zstd.ZstdCompressor(level=23) | |
|
29 | ||
|
30 | ||
|
31 | class TestCompressor_compress(unittest.TestCase): | |
|
32 | def test_compress_empty(self): | |
|
33 | cctx = zstd.ZstdCompressor(level=1) | |
|
34 | cctx.compress(b'') | |
|
35 | ||
|
36 | cctx = zstd.ZstdCompressor(level=22) | |
|
37 | cctx.compress(b'') | |
|
38 | ||
|
39 | def test_compress_empty_frame(self): | |
|
40 | cctx = zstd.ZstdCompressor(level=1) | |
|
41 | self.assertEqual(cctx.compress(b''), | |
|
42 | b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') | |
|
43 | ||
|
44 | def test_compress_large(self): | |
|
45 | chunks = [] | |
|
46 | for i in range(255): | |
|
47 | chunks.append(struct.Struct('>B').pack(i) * 16384) | |
|
48 | ||
|
49 | cctx = zstd.ZstdCompressor(level=3) | |
|
50 | result = cctx.compress(b''.join(chunks)) | |
|
51 | self.assertEqual(len(result), 999) | |
|
52 | self.assertEqual(result[0:4], b'\x28\xb5\x2f\xfd') | |
|
53 | ||
|
54 | def test_write_checksum(self): | |
|
55 | cctx = zstd.ZstdCompressor(level=1) | |
|
56 | no_checksum = cctx.compress(b'foobar') | |
|
57 | cctx = zstd.ZstdCompressor(level=1, write_checksum=True) | |
|
58 | with_checksum = cctx.compress(b'foobar') | |
|
59 | ||
|
60 | self.assertEqual(len(with_checksum), len(no_checksum) + 4) | |
|
61 | ||
|
62 | def test_write_content_size(self): | |
|
63 | cctx = zstd.ZstdCompressor(level=1) | |
|
64 | no_size = cctx.compress(b'foobar' * 256) | |
|
65 | cctx = zstd.ZstdCompressor(level=1, write_content_size=True) | |
|
66 | with_size = cctx.compress(b'foobar' * 256) | |
|
67 | ||
|
68 | self.assertEqual(len(with_size), len(no_size) + 1) | |
|
69 | ||
|
70 | def test_no_dict_id(self): | |
|
71 | samples = [] | |
|
72 | for i in range(128): | |
|
73 | samples.append(b'foo' * 64) | |
|
74 | samples.append(b'bar' * 64) | |
|
75 | samples.append(b'foobar' * 64) | |
|
76 | ||
|
77 | d = zstd.train_dictionary(1024, samples) | |
|
78 | ||
|
79 | cctx = zstd.ZstdCompressor(level=1, dict_data=d) | |
|
80 | with_dict_id = cctx.compress(b'foobarfoobar') | |
|
81 | ||
|
82 | cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_dict_id=False) | |
|
83 | no_dict_id = cctx.compress(b'foobarfoobar') | |
|
84 | ||
|
85 | self.assertEqual(len(with_dict_id), len(no_dict_id) + 4) | |
|
86 | ||
|
87 | def test_compress_dict_multiple(self): | |
|
88 | samples = [] | |
|
89 | for i in range(128): | |
|
90 | samples.append(b'foo' * 64) | |
|
91 | samples.append(b'bar' * 64) | |
|
92 | samples.append(b'foobar' * 64) | |
|
93 | ||
|
94 | d = zstd.train_dictionary(8192, samples) | |
|
95 | ||
|
96 | cctx = zstd.ZstdCompressor(level=1, dict_data=d) | |
|
97 | ||
|
98 | for i in range(32): | |
|
99 | cctx.compress(b'foo bar foobar foo bar foobar') | |
|
100 | ||
|
101 | ||
|
102 | class TestCompressor_compressobj(unittest.TestCase): | |
|
103 | def test_compressobj_empty(self): | |
|
104 | cctx = zstd.ZstdCompressor(level=1) | |
|
105 | cobj = cctx.compressobj() | |
|
106 | self.assertEqual(cobj.compress(b''), b'') | |
|
107 | self.assertEqual(cobj.flush(), | |
|
108 | b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') | |
|
109 | ||
|
110 | def test_compressobj_large(self): | |
|
111 | chunks = [] | |
|
112 | for i in range(255): | |
|
113 | chunks.append(struct.Struct('>B').pack(i) * 16384) | |
|
114 | ||
|
115 | cctx = zstd.ZstdCompressor(level=3) | |
|
116 | cobj = cctx.compressobj() | |
|
117 | ||
|
118 | result = cobj.compress(b''.join(chunks)) + cobj.flush() | |
|
119 | self.assertEqual(len(result), 999) | |
|
120 | self.assertEqual(result[0:4], b'\x28\xb5\x2f\xfd') | |
|
121 | ||
|
122 | def test_write_checksum(self): | |
|
123 | cctx = zstd.ZstdCompressor(level=1) | |
|
124 | cobj = cctx.compressobj() | |
|
125 | no_checksum = cobj.compress(b'foobar') + cobj.flush() | |
|
126 | cctx = zstd.ZstdCompressor(level=1, write_checksum=True) | |
|
127 | cobj = cctx.compressobj() | |
|
128 | with_checksum = cobj.compress(b'foobar') + cobj.flush() | |
|
129 | ||
|
130 | self.assertEqual(len(with_checksum), len(no_checksum) + 4) | |
|
131 | ||
|
132 | def test_write_content_size(self): | |
|
133 | cctx = zstd.ZstdCompressor(level=1) | |
|
134 | cobj = cctx.compressobj(size=len(b'foobar' * 256)) | |
|
135 | no_size = cobj.compress(b'foobar' * 256) + cobj.flush() | |
|
136 | cctx = zstd.ZstdCompressor(level=1, write_content_size=True) | |
|
137 | cobj = cctx.compressobj(size=len(b'foobar' * 256)) | |
|
138 | with_size = cobj.compress(b'foobar' * 256) + cobj.flush() | |
|
139 | ||
|
140 | self.assertEqual(len(with_size), len(no_size) + 1) | |
|
141 | ||
|
142 | def test_compress_after_flush(self): | |
|
143 | cctx = zstd.ZstdCompressor() | |
|
144 | cobj = cctx.compressobj() | |
|
145 | ||
|
146 | cobj.compress(b'foo') | |
|
147 | cobj.flush() | |
|
148 | ||
|
149 | with self.assertRaisesRegexp(zstd.ZstdError, r'cannot call compress\(\) after flush'): | |
|
150 | cobj.compress(b'foo') | |
|
151 | ||
|
152 | with self.assertRaisesRegexp(zstd.ZstdError, r'flush\(\) already called'): | |
|
153 | cobj.flush() | |
|
154 | ||
|
155 | ||
|
156 | class TestCompressor_copy_stream(unittest.TestCase): | |
|
157 | def test_no_read(self): | |
|
158 | source = object() | |
|
159 | dest = io.BytesIO() | |
|
160 | ||
|
161 | cctx = zstd.ZstdCompressor() | |
|
162 | with self.assertRaises(ValueError): | |
|
163 | cctx.copy_stream(source, dest) | |
|
164 | ||
|
165 | def test_no_write(self): | |
|
166 | source = io.BytesIO() | |
|
167 | dest = object() | |
|
168 | ||
|
169 | cctx = zstd.ZstdCompressor() | |
|
170 | with self.assertRaises(ValueError): | |
|
171 | cctx.copy_stream(source, dest) | |
|
172 | ||
|
173 | def test_empty(self): | |
|
174 | source = io.BytesIO() | |
|
175 | dest = io.BytesIO() | |
|
176 | ||
|
177 | cctx = zstd.ZstdCompressor(level=1) | |
|
178 | r, w = cctx.copy_stream(source, dest) | |
|
179 | self.assertEqual(int(r), 0) | |
|
180 | self.assertEqual(w, 9) | |
|
181 | ||
|
182 | self.assertEqual(dest.getvalue(), | |
|
183 | b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') | |
|
184 | ||
|
185 | def test_large_data(self): | |
|
186 | source = io.BytesIO() | |
|
187 | for i in range(255): | |
|
188 | source.write(struct.Struct('>B').pack(i) * 16384) | |
|
189 | source.seek(0) | |
|
190 | ||
|
191 | dest = io.BytesIO() | |
|
192 | cctx = zstd.ZstdCompressor() | |
|
193 | r, w = cctx.copy_stream(source, dest) | |
|
194 | ||
|
195 | self.assertEqual(r, 255 * 16384) | |
|
196 | self.assertEqual(w, 999) | |
|
197 | ||
|
198 | def test_write_checksum(self): | |
|
199 | source = io.BytesIO(b'foobar') | |
|
200 | no_checksum = io.BytesIO() | |
|
201 | ||
|
202 | cctx = zstd.ZstdCompressor(level=1) | |
|
203 | cctx.copy_stream(source, no_checksum) | |
|
204 | ||
|
205 | source.seek(0) | |
|
206 | with_checksum = io.BytesIO() | |
|
207 | cctx = zstd.ZstdCompressor(level=1, write_checksum=True) | |
|
208 | cctx.copy_stream(source, with_checksum) | |
|
209 | ||
|
210 | self.assertEqual(len(with_checksum.getvalue()), | |
|
211 | len(no_checksum.getvalue()) + 4) | |
|
212 | ||
|
213 | def test_write_content_size(self): | |
|
214 | source = io.BytesIO(b'foobar' * 256) | |
|
215 | no_size = io.BytesIO() | |
|
216 | ||
|
217 | cctx = zstd.ZstdCompressor(level=1) | |
|
218 | cctx.copy_stream(source, no_size) | |
|
219 | ||
|
220 | source.seek(0) | |
|
221 | with_size = io.BytesIO() | |
|
222 | cctx = zstd.ZstdCompressor(level=1, write_content_size=True) | |
|
223 | cctx.copy_stream(source, with_size) | |
|
224 | ||
|
225 | # Source content size is unknown, so no content size written. | |
|
226 | self.assertEqual(len(with_size.getvalue()), | |
|
227 | len(no_size.getvalue())) | |
|
228 | ||
|
229 | source.seek(0) | |
|
230 | with_size = io.BytesIO() | |
|
231 | cctx.copy_stream(source, with_size, size=len(source.getvalue())) | |
|
232 | ||
|
233 | # We specified source size, so content size header is present. | |
|
234 | self.assertEqual(len(with_size.getvalue()), | |
|
235 | len(no_size.getvalue()) + 1) | |
|
236 | ||
|
237 | def test_read_write_size(self): | |
|
238 | source = OpCountingBytesIO(b'foobarfoobar') | |
|
239 | dest = OpCountingBytesIO() | |
|
240 | cctx = zstd.ZstdCompressor() | |
|
241 | r, w = cctx.copy_stream(source, dest, read_size=1, write_size=1) | |
|
242 | ||
|
243 | self.assertEqual(r, len(source.getvalue())) | |
|
244 | self.assertEqual(w, 21) | |
|
245 | self.assertEqual(source._read_count, len(source.getvalue()) + 1) | |
|
246 | self.assertEqual(dest._write_count, len(dest.getvalue())) | |
|
247 | ||
|
248 | ||
|
249 | def compress(data, level): | |
|
250 | buffer = io.BytesIO() | |
|
251 | cctx = zstd.ZstdCompressor(level=level) | |
|
252 | with cctx.write_to(buffer) as compressor: | |
|
253 | compressor.write(data) | |
|
254 | return buffer.getvalue() | |
|
255 | ||
|
256 | ||
|
257 | class TestCompressor_write_to(unittest.TestCase): | |
|
258 | def test_empty(self): | |
|
259 | self.assertEqual(compress(b'', 1), | |
|
260 | b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') | |
|
261 | ||
|
262 | def test_multiple_compress(self): | |
|
263 | buffer = io.BytesIO() | |
|
264 | cctx = zstd.ZstdCompressor(level=5) | |
|
265 | with cctx.write_to(buffer) as compressor: | |
|
266 | compressor.write(b'foo') | |
|
267 | compressor.write(b'bar') | |
|
268 | compressor.write(b'x' * 8192) | |
|
269 | ||
|
270 | result = buffer.getvalue() | |
|
271 | self.assertEqual(result, | |
|
272 | b'\x28\xb5\x2f\xfd\x00\x50\x75\x00\x00\x38\x66\x6f' | |
|
273 | b'\x6f\x62\x61\x72\x78\x01\x00\xfc\xdf\x03\x23') | |
|
274 | ||
|
275 | def test_dictionary(self): | |
|
276 | samples = [] | |
|
277 | for i in range(128): | |
|
278 | samples.append(b'foo' * 64) | |
|
279 | samples.append(b'bar' * 64) | |
|
280 | samples.append(b'foobar' * 64) | |
|
281 | ||
|
282 | d = zstd.train_dictionary(8192, samples) | |
|
283 | ||
|
284 | buffer = io.BytesIO() | |
|
285 | cctx = zstd.ZstdCompressor(level=9, dict_data=d) | |
|
286 | with cctx.write_to(buffer) as compressor: | |
|
287 | compressor.write(b'foo') | |
|
288 | compressor.write(b'bar') | |
|
289 | compressor.write(b'foo' * 16384) | |
|
290 | ||
|
291 | compressed = buffer.getvalue() | |
|
292 | h = hashlib.sha1(compressed).hexdigest() | |
|
293 | self.assertEqual(h, '1c5bcd25181bcd8c1a73ea8773323e0056129f92') | |
|
294 | ||
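The `foo`/`bar`/`foobar` sample-building loop above is repeated in several tests in this series. As a stdlib-only sketch (the helper name `make_samples` is an illustration, not part of the upstream test suite), the loop factors out to:

```python
def make_samples(n=128):
    """Build the repetitive dictionary-training corpus used by these tests."""
    samples = []
    for _ in range(n):
        # Each iteration contributes three highly redundant samples.
        samples.extend([b'foo' * 64, b'bar' * 64, b'foobar' * 64])
    return samples

samples = make_samples()
assert len(samples) == 384  # 128 iterations * 3 samples each
```

The redundancy is deliberate: dictionary training only produces a useful dictionary when the samples share common substrings.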
|
295 | def test_compression_params(self): | |
|
296 | params = zstd.CompressionParameters(20, 6, 12, 5, 4, 10, zstd.STRATEGY_FAST) | |
|
297 | ||
|
298 | buffer = io.BytesIO() | |
|
299 | cctx = zstd.ZstdCompressor(compression_params=params) | |
|
300 | with cctx.write_to(buffer) as compressor: | |
|
301 | compressor.write(b'foo') | |
|
302 | compressor.write(b'bar') | |
|
303 | compressor.write(b'foobar' * 16384) | |
|
304 | ||
|
305 | compressed = buffer.getvalue() | |
|
306 | h = hashlib.sha1(compressed).hexdigest() | |
|
307 | self.assertEqual(h, '1ae31f270ed7de14235221a604b31ecd517ebd99') | |
|
308 | ||
|
309 | def test_write_checksum(self): | |
|
310 | no_checksum = io.BytesIO() | |
|
311 | cctx = zstd.ZstdCompressor(level=1) | |
|
312 | with cctx.write_to(no_checksum) as compressor: | |
|
313 | compressor.write(b'foobar') | |
|
314 | ||
|
315 | with_checksum = io.BytesIO() | |
|
316 | cctx = zstd.ZstdCompressor(level=1, write_checksum=True) | |
|
317 | with cctx.write_to(with_checksum) as compressor: | |
|
318 | compressor.write(b'foobar') | |
|
319 | ||
|
320 | self.assertEqual(len(with_checksum.getvalue()), | |
|
321 | len(no_checksum.getvalue()) + 4) | |
|
322 | ||
|
323 | def test_write_content_size(self): | |
|
324 | no_size = io.BytesIO() | |
|
325 | cctx = zstd.ZstdCompressor(level=1) | |
|
326 | with cctx.write_to(no_size) as compressor: | |
|
327 | compressor.write(b'foobar' * 256) | |
|
328 | ||
|
329 | with_size = io.BytesIO() | |
|
330 | cctx = zstd.ZstdCompressor(level=1, write_content_size=True) | |
|
331 | with cctx.write_to(with_size) as compressor: | |
|
332 | compressor.write(b'foobar' * 256) | |
|
333 | ||
|
334 | # Source size is not known in streaming mode, so header not | |
|
335 | # written. | |
|
336 | self.assertEqual(len(with_size.getvalue()), | |
|
337 | len(no_size.getvalue())) | |
|
338 | ||
|
339 | # Declaring size will write the header. | |
|
340 | with_size = io.BytesIO() | |
|
341 | with cctx.write_to(with_size, size=len(b'foobar' * 256)) as compressor: | |
|
342 | compressor.write(b'foobar' * 256) | |
|
343 | ||
|
344 | self.assertEqual(len(with_size.getvalue()), | |
|
345 | len(no_size.getvalue()) + 1) | |
|
346 | ||
|
347 | def test_no_dict_id(self): | |
|
348 | samples = [] | |
|
349 | for i in range(128): | |
|
350 | samples.append(b'foo' * 64) | |
|
351 | samples.append(b'bar' * 64) | |
|
352 | samples.append(b'foobar' * 64) | |
|
353 | ||
|
354 | d = zstd.train_dictionary(1024, samples) | |
|
355 | ||
|
356 | with_dict_id = io.BytesIO() | |
|
357 | cctx = zstd.ZstdCompressor(level=1, dict_data=d) | |
|
358 | with cctx.write_to(with_dict_id) as compressor: | |
|
359 | compressor.write(b'foobarfoobar') | |
|
360 | ||
|
361 | cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_dict_id=False) | |
|
362 | no_dict_id = io.BytesIO() | |
|
363 | with cctx.write_to(no_dict_id) as compressor: | |
|
364 | compressor.write(b'foobarfoobar') | |
|
365 | ||
|
366 | self.assertEqual(len(with_dict_id.getvalue()), | |
|
367 | len(no_dict_id.getvalue()) + 4) | |
|
368 | ||
|
369 | def test_memory_size(self): | |
|
370 | cctx = zstd.ZstdCompressor(level=3) | |
|
371 | buffer = io.BytesIO() | |
|
372 | with cctx.write_to(buffer) as compressor: | |
|
373 | size = compressor.memory_size() | |
|
374 | ||
|
375 | self.assertGreater(size, 100000) | |
|
376 | ||
|
377 | def test_write_size(self): | |
|
378 | cctx = zstd.ZstdCompressor(level=3) | |
|
379 | dest = OpCountingBytesIO() | |
|
380 | with cctx.write_to(dest, write_size=1) as compressor: | |
|
381 | compressor.write(b'foo') | |
|
382 | compressor.write(b'bar') | |
|
383 | compressor.write(b'foobar') | |
|
384 | ||
|
385 | self.assertEqual(len(dest.getvalue()), dest._write_count) | |
|
386 | ||
|
387 | ||
|
388 | class TestCompressor_read_from(unittest.TestCase): | |
|
389 | def test_type_validation(self): | |
|
390 | cctx = zstd.ZstdCompressor() | |
|
391 | ||
|
392 | # Object with read() works. | |
|
393 | cctx.read_from(io.BytesIO()) | |
|
394 | ||
|
395 | # Buffer protocol works. | |
|
396 | cctx.read_from(b'foobar') | |
|
397 | ||
|
398 | with self.assertRaisesRegexp(ValueError, 'must pass an object with a read'): | |
|
399 | cctx.read_from(True) | |
|
400 | ||
|
401 | def test_read_empty(self): | |
|
402 | cctx = zstd.ZstdCompressor(level=1) | |
|
403 | ||
|
404 | source = io.BytesIO() | |
|
405 | it = cctx.read_from(source) | |
|
406 | chunks = list(it) | |
|
407 | self.assertEqual(len(chunks), 1) | |
|
408 | compressed = b''.join(chunks) | |
|
409 | self.assertEqual(compressed, b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') | |
|
410 | ||
|
411 | # And again with the buffer protocol. | |
|
412 | it = cctx.read_from(b'') | |
|
413 | chunks = list(it) | |
|
414 | self.assertEqual(len(chunks), 1) | |
|
415 | compressed2 = b''.join(chunks) | |
|
416 | self.assertEqual(compressed2, compressed) | |
|
417 | ||
|
418 | def test_read_large(self): | |
|
419 | cctx = zstd.ZstdCompressor(level=1) | |
|
420 | ||
|
421 | source = io.BytesIO() | |
|
422 | source.write(b'f' * zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE) | |
|
423 | source.write(b'o') | |
|
424 | source.seek(0) | |
|
425 | ||
|
426 | # Creating an iterator should not perform any compression until | |
|
427 | # first read. | |
|
428 | it = cctx.read_from(source, size=len(source.getvalue())) | |
|
429 | self.assertEqual(source.tell(), 0) | |
|
430 | ||
|
431 | # We should have exactly 2 output chunks. | |
|
432 | chunks = [] | |
|
433 | chunk = next(it) | |
|
434 | self.assertIsNotNone(chunk) | |
|
435 | self.assertEqual(source.tell(), zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE) | |
|
436 | chunks.append(chunk) | |
|
437 | chunk = next(it) | |
|
438 | self.assertIsNotNone(chunk) | |
|
439 | chunks.append(chunk) | |
|
440 | ||
|
441 | self.assertEqual(source.tell(), len(source.getvalue())) | |
|
442 | ||
|
443 | with self.assertRaises(StopIteration): | |
|
444 | next(it) | |
|
445 | ||
|
446 | # And again for good measure. | |
|
447 | with self.assertRaises(StopIteration): | |
|
448 | next(it) | |
|
449 | ||
|
450 | # We should get the same output as the one-shot compression mechanism. | |
|
451 | self.assertEqual(b''.join(chunks), cctx.compress(source.getvalue())) | |
|
452 | ||
|
453 | # Now check the buffer protocol. | |
|
454 | it = cctx.read_from(source.getvalue()) | |
|
455 | chunks = list(it) | |
|
456 | self.assertEqual(len(chunks), 2) | |
|
457 | self.assertEqual(b''.join(chunks), cctx.compress(source.getvalue())) | |
|
458 | ||
|
459 | def test_read_write_size(self): | |
|
460 | source = OpCountingBytesIO(b'foobarfoobar') | |
|
461 | cctx = zstd.ZstdCompressor(level=3) | |
|
462 | for chunk in cctx.read_from(source, read_size=1, write_size=1): | |
|
463 | self.assertEqual(len(chunk), 1) | |
|
464 | ||
|
465 | self.assertEqual(source._read_count, len(source.getvalue()) + 1) |
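The `_read_count == len(source.getvalue()) + 1` assertions in this file reflect that a read loop must issue one final `read()` returning `b''` to detect EOF. A stdlib-only sketch of such a loop (no zstd bindings required; `read_chunks` is an illustrative name, not an API from the library):

```python
import io

def read_chunks(reader, read_size=1):
    """Drain ``reader`` in ``read_size`` pieces; count every read() call."""
    reads = 0
    chunks = []
    while True:
        chunk = reader.read(read_size)
        reads += 1
        if not chunk:
            break  # the empty read is what signals EOF
        chunks.append(chunk)
    return b''.join(chunks), reads

joined, reads = read_chunks(io.BytesIO(b'foobar'), read_size=1)
assert joined == b'foobar'
assert reads == len(b'foobar') + 1  # one extra read returns b'' at EOF
```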
@@ -0,0 +1,107 b'' | |||
|
1 | import io | |
|
2 | ||
|
3 | try: | |
|
4 | import unittest2 as unittest | |
|
5 | except ImportError: | |
|
6 | import unittest | |
|
7 | ||
|
8 | try: | |
|
9 | import hypothesis | |
|
10 | import hypothesis.strategies as strategies | |
|
11 | except ImportError: | |
|
12 | hypothesis = None | |
|
13 | ||
|
14 | import zstd | |
|
15 | ||
|
16 | class TestCompressionParameters(unittest.TestCase): | |
|
17 | def test_init_bad_arg_type(self): | |
|
18 | with self.assertRaises(TypeError): | |
|
19 | zstd.CompressionParameters() | |
|
20 | ||
|
21 | with self.assertRaises(TypeError): | |
|
22 | zstd.CompressionParameters(0, 1) | |
|
23 | ||
|
24 | def test_bounds(self): | |
|
25 | zstd.CompressionParameters(zstd.WINDOWLOG_MIN, | |
|
26 | zstd.CHAINLOG_MIN, | |
|
27 | zstd.HASHLOG_MIN, | |
|
28 | zstd.SEARCHLOG_MIN, | |
|
29 | zstd.SEARCHLENGTH_MIN, | |
|
30 | zstd.TARGETLENGTH_MIN, | |
|
31 | zstd.STRATEGY_FAST) | |
|
32 | ||
|
33 | zstd.CompressionParameters(zstd.WINDOWLOG_MAX, | |
|
34 | zstd.CHAINLOG_MAX, | |
|
35 | zstd.HASHLOG_MAX, | |
|
36 | zstd.SEARCHLOG_MAX, | |
|
37 | zstd.SEARCHLENGTH_MAX, | |
|
38 | zstd.TARGETLENGTH_MAX, | |
|
39 | zstd.STRATEGY_BTOPT) | |
|
40 | ||
|
41 | def test_get_compression_parameters(self): | |
|
42 | p = zstd.get_compression_parameters(1) | |
|
43 | self.assertIsInstance(p, zstd.CompressionParameters) | |
|
44 | ||
|
45 | self.assertEqual(p[0], 19) | |
|
46 | ||
|
47 | if hypothesis: | |
|
48 | s_windowlog = strategies.integers(min_value=zstd.WINDOWLOG_MIN, | |
|
49 | max_value=zstd.WINDOWLOG_MAX) | |
|
50 | s_chainlog = strategies.integers(min_value=zstd.CHAINLOG_MIN, | |
|
51 | max_value=zstd.CHAINLOG_MAX) | |
|
52 | s_hashlog = strategies.integers(min_value=zstd.HASHLOG_MIN, | |
|
53 | max_value=zstd.HASHLOG_MAX) | |
|
54 | s_searchlog = strategies.integers(min_value=zstd.SEARCHLOG_MIN, | |
|
55 | max_value=zstd.SEARCHLOG_MAX) | |
|
56 | s_searchlength = strategies.integers(min_value=zstd.SEARCHLENGTH_MIN, | |
|
57 | max_value=zstd.SEARCHLENGTH_MAX) | |
|
58 | s_targetlength = strategies.integers(min_value=zstd.TARGETLENGTH_MIN, | |
|
59 | max_value=zstd.TARGETLENGTH_MAX) | |
|
60 | s_strategy = strategies.sampled_from((zstd.STRATEGY_FAST, | |
|
61 | zstd.STRATEGY_DFAST, | |
|
62 | zstd.STRATEGY_GREEDY, | |
|
63 | zstd.STRATEGY_LAZY, | |
|
64 | zstd.STRATEGY_LAZY2, | |
|
65 | zstd.STRATEGY_BTLAZY2, | |
|
66 | zstd.STRATEGY_BTOPT)) | |
|
67 | ||
|
68 | class TestCompressionParametersHypothesis(unittest.TestCase): | |
|
69 | @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog, | |
|
70 | s_searchlength, s_targetlength, s_strategy) | |
|
71 | def test_valid_init(self, windowlog, chainlog, hashlog, searchlog, | |
|
72 | searchlength, targetlength, strategy): | |
|
73 | p = zstd.CompressionParameters(windowlog, chainlog, hashlog, | |
|
74 | searchlog, searchlength, | |
|
75 | targetlength, strategy) | |
|
76 | self.assertEqual(tuple(p), | |
|
77 | (windowlog, chainlog, hashlog, searchlog, | |
|
78 | searchlength, targetlength, strategy)) | |
|
79 | ||
|
80 | # Verify we can instantiate a compressor with the supplied values. | |
|
81 | # ZSTD_checkCParams moves the goal posts on us from what's advertised | |
|
82 | # in the constants. So move along with them. | |
|
83 | if searchlength == zstd.SEARCHLENGTH_MIN and strategy in (zstd.STRATEGY_FAST, zstd.STRATEGY_GREEDY): | |
|
84 | searchlength += 1 | |
|
85 | p = zstd.CompressionParameters(windowlog, chainlog, hashlog, | |
|
86 | searchlog, searchlength, | |
|
87 | targetlength, strategy) | |
|
88 | elif searchlength == zstd.SEARCHLENGTH_MAX and strategy != zstd.STRATEGY_FAST: | |
|
89 | searchlength -= 1 | |
|
90 | p = zstd.CompressionParameters(windowlog, chainlog, hashlog, | |
|
91 | searchlog, searchlength, | |
|
92 | targetlength, strategy) | |
|
93 | ||
|
94 | cctx = zstd.ZstdCompressor(compression_params=p) | |
|
95 | with cctx.write_to(io.BytesIO()): | |
|
96 | pass | |
|
97 | ||
|
98 | @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog, | |
|
99 | s_searchlength, s_targetlength, s_strategy) | |
|
100 | def test_estimate_compression_context_size(self, windowlog, chainlog, | |
|
101 | hashlog, searchlog, | |
|
102 | searchlength, targetlength, | |
|
103 | strategy): | |
|
104 | p = zstd.CompressionParameters(windowlog, chainlog, hashlog, | |
|
105 | searchlog, searchlength, | |
|
106 | targetlength, strategy) | |
|
107 | size = zstd.estimate_compression_context_size(p) |
@@ -0,0 +1,478 b'' | |||
|
1 | import io | |
|
2 | import random | |
|
3 | import struct | |
|
4 | import sys | |
|
5 | ||
|
6 | try: | |
|
7 | import unittest2 as unittest | |
|
8 | except ImportError: | |
|
9 | import unittest | |
|
10 | ||
|
11 | import zstd | |
|
12 | ||
|
13 | from .common import OpCountingBytesIO | |
|
14 | ||
|
15 | ||
|
16 | if sys.version_info[0] >= 3: | |
|
17 | next = lambda it: it.__next__() | |
|
18 | else: | |
|
19 | next = lambda it: it.next() | |
|
20 | ||
|
21 | ||
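The `OpCountingBytesIO` helper imported from `.common` above is used throughout to assert how many I/O operations occur. A minimal stdlib sketch of what such a helper could look like (the class name comes from the import; the body here is an assumption, and the upstream implementation may differ):

```python
import io

class OpCountingBytesIO(io.BytesIO):
    """BytesIO that counts read() and write() calls (illustrative sketch)."""

    def __init__(self, *args):
        super(OpCountingBytesIO, self).__init__(*args)
        self._read_count = 0
        self._write_count = 0

    def read(self, *args):
        self._read_count += 1
        return super(OpCountingBytesIO, self).read(*args)

    def write(self, data):
        self._write_count += 1
        return super(OpCountingBytesIO, self).write(data)
```

With `write_size=1` or `read_size=1`, every byte becomes its own operation, which is how the tests verify the buffer-size knobs are honored.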
|
22 | class TestDecompressor_decompress(unittest.TestCase): | |
|
23 | def test_empty_input(self): | |
|
24 | dctx = zstd.ZstdDecompressor() | |
|
25 | ||
|
26 | with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'): | |
|
27 | dctx.decompress(b'') | |
|
28 | ||
|
29 | def test_invalid_input(self): | |
|
30 | dctx = zstd.ZstdDecompressor() | |
|
31 | ||
|
32 | with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'): | |
|
33 | dctx.decompress(b'foobar') | |
|
34 | ||
|
35 | def test_no_content_size_in_frame(self): | |
|
36 | cctx = zstd.ZstdCompressor(write_content_size=False) | |
|
37 | compressed = cctx.compress(b'foobar') | |
|
38 | ||
|
39 | dctx = zstd.ZstdDecompressor() | |
|
40 | with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'): | |
|
41 | dctx.decompress(compressed) | |
|
42 | ||
|
43 | def test_content_size_present(self): | |
|
44 | cctx = zstd.ZstdCompressor(write_content_size=True) | |
|
45 | compressed = cctx.compress(b'foobar') | |
|
46 | ||
|
47 | dctx = zstd.ZstdDecompressor() | |
|
48 | decompressed = dctx.decompress(compressed) | |
|
49 | self.assertEqual(decompressed, b'foobar') | |
|
50 | ||
|
51 | def test_max_output_size(self): | |
|
52 | cctx = zstd.ZstdCompressor(write_content_size=False) | |
|
53 | source = b'foobar' * 256 | |
|
54 | compressed = cctx.compress(source) | |
|
55 | ||
|
56 | dctx = zstd.ZstdDecompressor() | |
|
57 | # Will fit into buffer exactly the size of input. | |
|
58 | decompressed = dctx.decompress(compressed, max_output_size=len(source)) | |
|
59 | self.assertEqual(decompressed, source) | |
|
60 | ||
|
61 | # Input size - 1 fails | |
|
62 | with self.assertRaisesRegexp(zstd.ZstdError, 'Destination buffer is too small'): | |
|
63 | dctx.decompress(compressed, max_output_size=len(source) - 1) | |
|
64 | ||
|
65 | # Input size + 1 works | |
|
66 | decompressed = dctx.decompress(compressed, max_output_size=len(source) + 1) | |
|
67 | self.assertEqual(decompressed, source) | |
|
68 | ||
|
69 | # A much larger buffer works. | |
|
70 | decompressed = dctx.decompress(compressed, max_output_size=len(source) * 64) | |
|
71 | self.assertEqual(decompressed, source) | |
|
72 | ||
|
73 | def test_stupidly_large_output_buffer(self): | |
|
74 | cctx = zstd.ZstdCompressor(write_content_size=False) | |
|
75 | compressed = cctx.compress(b'foobar' * 256) | |
|
76 | dctx = zstd.ZstdDecompressor() | |
|
77 | ||
|
78 | # Will get OverflowError on some Python distributions that can't | |
|
79 | # handle really large integers. | |
|
80 | with self.assertRaises((MemoryError, OverflowError)): | |
|
81 | dctx.decompress(compressed, max_output_size=2**62) | |
|
82 | ||
|
83 | def test_dictionary(self): | |
|
84 | samples = [] | |
|
85 | for i in range(128): | |
|
86 | samples.append(b'foo' * 64) | |
|
87 | samples.append(b'bar' * 64) | |
|
88 | samples.append(b'foobar' * 64) | |
|
89 | ||
|
90 | d = zstd.train_dictionary(8192, samples) | |
|
91 | ||
|
92 | orig = b'foobar' * 16384 | |
|
93 | cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_content_size=True) | |
|
94 | compressed = cctx.compress(orig) | |
|
95 | ||
|
96 | dctx = zstd.ZstdDecompressor(dict_data=d) | |
|
97 | decompressed = dctx.decompress(compressed) | |
|
98 | ||
|
99 | self.assertEqual(decompressed, orig) | |
|
100 | ||
|
101 | def test_dictionary_multiple(self): | |
|
102 | samples = [] | |
|
103 | for i in range(128): | |
|
104 | samples.append(b'foo' * 64) | |
|
105 | samples.append(b'bar' * 64) | |
|
106 | samples.append(b'foobar' * 64) | |
|
107 | ||
|
108 | d = zstd.train_dictionary(8192, samples) | |
|
109 | ||
|
110 | sources = (b'foobar' * 8192, b'foo' * 8192, b'bar' * 8192) | |
|
111 | compressed = [] | |
|
112 | cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_content_size=True) | |
|
113 | for source in sources: | |
|
114 | compressed.append(cctx.compress(source)) | |
|
115 | ||
|
116 | dctx = zstd.ZstdDecompressor(dict_data=d) | |
|
117 | for i in range(len(sources)): | |
|
118 | decompressed = dctx.decompress(compressed[i]) | |
|
119 | self.assertEqual(decompressed, sources[i]) | |
|
120 | ||
|
121 | ||
|
122 | class TestDecompressor_copy_stream(unittest.TestCase): | |
|
123 | def test_no_read(self): | |
|
124 | source = object() | |
|
125 | dest = io.BytesIO() | |
|
126 | ||
|
127 | dctx = zstd.ZstdDecompressor() | |
|
128 | with self.assertRaises(ValueError): | |
|
129 | dctx.copy_stream(source, dest) | |
|
130 | ||
|
131 | def test_no_write(self): | |
|
132 | source = io.BytesIO() | |
|
133 | dest = object() | |
|
134 | ||
|
135 | dctx = zstd.ZstdDecompressor() | |
|
136 | with self.assertRaises(ValueError): | |
|
137 | dctx.copy_stream(source, dest) | |
|
138 | ||
|
139 | def test_empty(self): | |
|
140 | source = io.BytesIO() | |
|
141 | dest = io.BytesIO() | |
|
142 | ||
|
143 | dctx = zstd.ZstdDecompressor() | |
|
144 | # TODO should this raise an error? | |
|
145 | r, w = dctx.copy_stream(source, dest) | |
|
146 | ||
|
147 | self.assertEqual(r, 0) | |
|
148 | self.assertEqual(w, 0) | |
|
149 | self.assertEqual(dest.getvalue(), b'') | |
|
150 | ||
|
151 | def test_large_data(self): | |
|
152 | source = io.BytesIO() | |
|
153 | for i in range(255): | |
|
154 | source.write(struct.Struct('>B').pack(i) * 16384) | |
|
155 | source.seek(0) | |
|
156 | ||
|
157 | compressed = io.BytesIO() | |
|
158 | cctx = zstd.ZstdCompressor() | |
|
159 | cctx.copy_stream(source, compressed) | |
|
160 | ||
|
161 | compressed.seek(0) | |
|
162 | dest = io.BytesIO() | |
|
163 | dctx = zstd.ZstdDecompressor() | |
|
164 | r, w = dctx.copy_stream(compressed, dest) | |
|
165 | ||
|
166 | self.assertEqual(r, len(compressed.getvalue())) | |
|
167 | self.assertEqual(w, len(source.getvalue())) | |
|
168 | ||
|
169 | def test_read_write_size(self): | |
|
170 | source = OpCountingBytesIO(zstd.ZstdCompressor().compress( | |
|
171 | b'foobarfoobar')) | |
|
172 | ||
|
173 | dest = OpCountingBytesIO() | |
|
174 | dctx = zstd.ZstdDecompressor() | |
|
175 | r, w = dctx.copy_stream(source, dest, read_size=1, write_size=1) | |
|
176 | ||
|
177 | self.assertEqual(r, len(source.getvalue())) | |
|
178 | self.assertEqual(w, len(b'foobarfoobar')) | |
|
179 | self.assertEqual(source._read_count, len(source.getvalue()) + 1) | |
|
180 | self.assertEqual(dest._write_count, len(dest.getvalue())) | |
|
181 | ||
|
182 | ||
|
183 | class TestDecompressor_decompressobj(unittest.TestCase): | |
|
184 | def test_simple(self): | |
|
185 | data = zstd.ZstdCompressor(level=1).compress(b'foobar') | |
|
186 | ||
|
187 | dctx = zstd.ZstdDecompressor() | |
|
188 | dobj = dctx.decompressobj() | |
|
189 | self.assertEqual(dobj.decompress(data), b'foobar') | |
|
190 | ||
|
191 | def test_reuse(self): | |
|
192 | data = zstd.ZstdCompressor(level=1).compress(b'foobar') | |
|
193 | ||
|
194 | dctx = zstd.ZstdDecompressor() | |
|
195 | dobj = dctx.decompressobj() | |
|
196 | dobj.decompress(data) | |
|
197 | ||
|
198 | with self.assertRaisesRegexp(zstd.ZstdError, 'cannot use a decompressobj'): | |
|
199 | dobj.decompress(data) | |
|
200 | ||
|
201 | ||
|
202 | def decompress_via_writer(data): | |
|
203 | buffer = io.BytesIO() | |
|
204 | dctx = zstd.ZstdDecompressor() | |
|
205 | with dctx.write_to(buffer) as decompressor: | |
|
206 | decompressor.write(data) | |
|
207 | return buffer.getvalue() | |
|
208 | ||
|
209 | ||
|
210 | class TestDecompressor_write_to(unittest.TestCase): | |
|
211 | def test_empty_roundtrip(self): | |
|
212 | cctx = zstd.ZstdCompressor() | |
|
213 | empty = cctx.compress(b'') | |
|
214 | self.assertEqual(decompress_via_writer(empty), b'') | |
|
215 | ||
|
216 | def test_large_roundtrip(self): | |
|
217 | chunks = [] | |
|
218 | for i in range(255): | |
|
219 | chunks.append(struct.Struct('>B').pack(i) * 16384) | |
|
220 | orig = b''.join(chunks) | |
|
221 | cctx = zstd.ZstdCompressor() | |
|
222 | compressed = cctx.compress(orig) | |
|
223 | ||
|
224 | self.assertEqual(decompress_via_writer(compressed), orig) | |
|
225 | ||
|
226 | def test_multiple_calls(self): | |
|
227 | chunks = [] | |
|
228 | for i in range(255): | |
|
229 | for j in range(255): | |
|
230 | chunks.append(struct.Struct('>B').pack(j) * i) | |
|
231 | ||
|
232 | orig = b''.join(chunks) | |
|
233 | cctx = zstd.ZstdCompressor() | |
|
234 | compressed = cctx.compress(orig) | |
|
235 | ||
|
236 | buffer = io.BytesIO() | |
|
237 | dctx = zstd.ZstdDecompressor() | |
|
238 | with dctx.write_to(buffer) as decompressor: | |
|
239 | pos = 0 | |
|
240 | while pos < len(compressed): | |
|
241 | pos2 = pos + 8192 | |
|
242 | decompressor.write(compressed[pos:pos2]) | |
|
243 | pos += 8192 | |
|
244 | self.assertEqual(buffer.getvalue(), orig) | |
|
245 | ||
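The 8192-byte feeding loop in the test above is a generic slicing pattern, not anything zstd-specific. A stdlib-only sketch (the helper name `write_in_chunks` is an illustration):

```python
import io

def write_in_chunks(data, writer, chunk_size=8192):
    """Feed ``data`` to ``writer.write()`` in fixed-size slices; return call count."""
    calls = 0
    pos = 0
    while pos < len(data):
        writer.write(data[pos:pos + chunk_size])
        pos += chunk_size
        calls += 1
    return calls

dest = io.BytesIO()
calls = write_in_chunks(b'x' * 20000, dest, 8192)
assert dest.getvalue() == b'x' * 20000
assert calls == 3  # 8192 + 8192 + 3616 bytes
```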
|
246 | def test_dictionary(self): | |
|
247 | samples = [] | |
|
248 | for i in range(128): | |
|
249 | samples.append(b'foo' * 64) | |
|
250 | samples.append(b'bar' * 64) | |
|
251 | samples.append(b'foobar' * 64) | |
|
252 | ||
|
253 | d = zstd.train_dictionary(8192, samples) | |
|
254 | ||
|
255 | orig = b'foobar' * 16384 | |
|
256 | buffer = io.BytesIO() | |
|
257 | cctx = zstd.ZstdCompressor(dict_data=d) | |
|
258 | with cctx.write_to(buffer) as compressor: | |
|
259 | compressor.write(orig) | |
|
260 | ||
|
261 | compressed = buffer.getvalue() | |
|
262 | buffer = io.BytesIO() | |
|
263 | ||
|
264 | dctx = zstd.ZstdDecompressor(dict_data=d) | |
|
265 | with dctx.write_to(buffer) as decompressor: | |
|
266 | decompressor.write(compressed) | |
|
267 | ||
|
268 | self.assertEqual(buffer.getvalue(), orig) | |
|
269 | ||
|
270 | def test_memory_size(self): | |
|
271 | dctx = zstd.ZstdDecompressor() | |
|
272 | buffer = io.BytesIO() | |
|
273 | with dctx.write_to(buffer) as decompressor: | |
|
274 | size = decompressor.memory_size() | |
|
275 | ||
|
276 | self.assertGreater(size, 100000) | |
|
277 | ||
|
278 | def test_write_size(self): | |
|
279 | source = zstd.ZstdCompressor().compress(b'foobarfoobar') | |
|
280 | dest = OpCountingBytesIO() | |
|
281 | dctx = zstd.ZstdDecompressor() | |
|
282 | with dctx.write_to(dest, write_size=1) as decompressor: | |
|
283 | s = struct.Struct('>B') | |
|
284 | for c in source: | |
|
285 | if not isinstance(c, str): | |
|
286 | c = s.pack(c) | |
|
287 | decompressor.write(c) | |
|
288 | ||
|
289 | ||
|
290 | self.assertEqual(dest.getvalue(), b'foobarfoobar') | |
|
291 | self.assertEqual(dest._write_count, len(dest.getvalue())) | |
|
292 | ||
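The `isinstance(c, str)` check in the loop above papers over a Python 2/3 difference: iterating a bytes object yields length-1 strings on Python 2 but ints on Python 3. A stdlib sketch of the same normalization (the generator name is an illustration):

```python
import struct

def iter_byte_strings(data):
    """Yield each byte of ``data`` as a length-1 bytes object on Python 2 and 3."""
    s = struct.Struct('>B')
    for c in data:
        if not isinstance(c, bytes):  # Python 3: iteration yields ints
            c = s.pack(c)
        yield c

assert b''.join(iter_byte_strings(b'abc')) == b'abc'
```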
|
293 | ||
|
294 | class TestDecompressor_read_from(unittest.TestCase): | |
|
295 | def test_type_validation(self): | |
|
296 | dctx = zstd.ZstdDecompressor() | |
|
297 | ||
|
298 | # Object with read() works. | |
|
299 | dctx.read_from(io.BytesIO()) | |
|
300 | ||
|
301 | # Buffer protocol works. | |
|
302 | dctx.read_from(b'foobar') | |
|
303 | ||
|
304 | with self.assertRaisesRegexp(ValueError, 'must pass an object with a read'): | |
|
305 | dctx.read_from(True) | |
|
306 | ||
|
307 | def test_empty_input(self): | |
|
308 | dctx = zstd.ZstdDecompressor() | |
|
309 | ||
|
310 | source = io.BytesIO() | |
|
311 | it = dctx.read_from(source) | |
|
312 | # TODO this is arguably wrong. Should get an error about a missing frame. | |
|
313 | with self.assertRaises(StopIteration): | |
|
314 | next(it) | |
|
315 | ||
|
316 | it = dctx.read_from(b'') | |
|
317 | with self.assertRaises(StopIteration): | |
|
318 | next(it) | |
|
319 | ||
|
320 | def test_invalid_input(self): | |
|
321 | dctx = zstd.ZstdDecompressor() | |
|
322 | ||
|
323 | source = io.BytesIO(b'foobar') | |
|
324 | it = dctx.read_from(source) | |
|
325 | with self.assertRaisesRegexp(zstd.ZstdError, 'Unknown frame descriptor'): | |
|
326 | next(it) | |
|
327 | ||
|
328 | it = dctx.read_from(b'foobar') | |
|
329 | with self.assertRaisesRegexp(zstd.ZstdError, 'Unknown frame descriptor'): | |
|
330 | next(it) | |
|
331 | ||
|
332 | def test_empty_roundtrip(self): | |
|
333 | cctx = zstd.ZstdCompressor(level=1, write_content_size=False) | |
|
334 | empty = cctx.compress(b'') | |
|
335 | ||
|
336 | source = io.BytesIO(empty) | |
|
337 | source.seek(0) | |
|
338 | ||
|
339 | dctx = zstd.ZstdDecompressor() | |
|
340 | it = dctx.read_from(source) | |
|
341 | ||
|
342 | # No chunks should be emitted since there is no data. | |
|
343 | with self.assertRaises(StopIteration): | |
|
344 | next(it) | |
|
345 | ||
|
346 | # Again for good measure. | |
|
347 | with self.assertRaises(StopIteration): | |
|
348 | next(it) | |
|
349 | ||
|
350 | def test_skip_bytes_too_large(self): | |
|
351 | dctx = zstd.ZstdDecompressor() | |
|
352 | ||
|
353 | with self.assertRaisesRegexp(ValueError, 'skip_bytes must be smaller than read_size'): | |
|
354 | dctx.read_from(b'', skip_bytes=1, read_size=1) | |
|
355 | ||
|
356 | with self.assertRaisesRegexp(ValueError, 'skip_bytes larger than first input chunk'): | |
|
357 | b''.join(dctx.read_from(b'foobar', skip_bytes=10)) | |
|
358 | ||
|
359 | def test_skip_bytes(self): | |
|
360 | cctx = zstd.ZstdCompressor(write_content_size=False) | |
|
361 | compressed = cctx.compress(b'foobar') | |
|
362 | ||
|
363 | dctx = zstd.ZstdDecompressor() | |
|
364 | output = b''.join(dctx.read_from(b'hdr' + compressed, skip_bytes=3)) | |
|
365 | self.assertEqual(output, b'foobar') | |
|
366 | ||
|
367 | def test_large_output(self): | |
|
368 | source = io.BytesIO() | |
|
369 | source.write(b'f' * zstd.DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE) | |
|
370 | source.write(b'o') | |
|
371 | source.seek(0) | |
|
372 | ||
|
373 | cctx = zstd.ZstdCompressor(level=1) | |
|
374 | compressed = io.BytesIO(cctx.compress(source.getvalue())) | |
|
375 | compressed.seek(0) | |
|
376 | ||
|
377 | dctx = zstd.ZstdDecompressor() | |
|
378 | it = dctx.read_from(compressed) | |
|
379 | ||
|
380 | chunks = [] | |
|
381 | chunks.append(next(it)) | |
|
382 | chunks.append(next(it)) | |
|
383 | ||
|
384 | with self.assertRaises(StopIteration): | |
|
385 | next(it) | |
|
386 | ||
|
387 | decompressed = b''.join(chunks) | |
|
388 | self.assertEqual(decompressed, source.getvalue()) | |
|
389 | ||
|
390 | # And again with buffer protocol. | |
|
391 | it = dctx.read_from(compressed.getvalue()) | |
|
392 | chunks = [] | |
|
393 | chunks.append(next(it)) | |
|
394 | chunks.append(next(it)) | |
|
395 | ||
|
396 | with self.assertRaises(StopIteration): | |
|
397 | next(it) | |
|
398 | ||
|
399 | decompressed = b''.join(chunks) | |
|
400 | self.assertEqual(decompressed, source.getvalue()) | |
|
401 | ||
|
402 | def test_large_input(self): | |
|
403 | bytes = list(struct.Struct('>B').pack(i) for i in range(256)) | |
|
404 | compressed = io.BytesIO() | |
|
405 | input_size = 0 | |
|
406 | cctx = zstd.ZstdCompressor(level=1) | |
|
407 | with cctx.write_to(compressed) as compressor: | |
|
408 | while True: | |
|
409 | compressor.write(random.choice(bytes)) | |
|
410 | input_size += 1 | |
|
411 | ||
|
412 | have_compressed = len(compressed.getvalue()) > zstd.DECOMPRESSION_RECOMMENDED_INPUT_SIZE | |
|
413 | have_raw = input_size > zstd.DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE * 2 | |
|
414 | if have_compressed and have_raw: | |
|
415 | break | |
|
416 | ||
|
417 | compressed.seek(0) | |
|
418 | self.assertGreater(len(compressed.getvalue()), | |
|
419 | zstd.DECOMPRESSION_RECOMMENDED_INPUT_SIZE) | |
|
420 | ||
|
421 | dctx = zstd.ZstdDecompressor() | |
|
422 | it = dctx.read_from(compressed) | |
|
423 | ||
|
424 | chunks = [] | |
|
425 | chunks.append(next(it)) | |
|
426 | chunks.append(next(it)) | |
|
427 | chunks.append(next(it)) | |
|
428 | ||
|
429 | with self.assertRaises(StopIteration): | |
|
430 | next(it) | |
|
431 | ||
|
432 | decompressed = b''.join(chunks) | |
|
433 | self.assertEqual(len(decompressed), input_size) | |
|
434 | ||
|
435 | # And again with buffer protocol. | |
|
436 | it = dctx.read_from(compressed.getvalue()) | |
|
437 | ||
|
438 | chunks = [] | |
|
439 | chunks.append(next(it)) | |
|
440 | chunks.append(next(it)) | |
|
441 | chunks.append(next(it)) | |
|
442 | ||
|
443 | with self.assertRaises(StopIteration): | |
|
444 | next(it) | |
|
445 | ||
|
446 | decompressed = b''.join(chunks) | |
|
447 | self.assertEqual(len(decompressed), input_size) | |
|
448 | ||
|
449 | def test_interesting(self): | |
|
450 | # Found this edge case via fuzzing. | |
|
451 | cctx = zstd.ZstdCompressor(level=1) | |
|
452 | ||
|
453 | source = io.BytesIO() | |
|
454 | ||
|
455 | compressed = io.BytesIO() | |
|
456 | with cctx.write_to(compressed) as compressor: | |
|
457 | for i in range(256): | |
|
458 | chunk = b'\0' * 1024 | |
|
459 | compressor.write(chunk) | |
|
460 | source.write(chunk) | |
|
461 | ||
|
462 | dctx = zstd.ZstdDecompressor() | |
|
463 | ||
|
464 | simple = dctx.decompress(compressed.getvalue(), | |
|
465 | max_output_size=len(source.getvalue())) | |
|
466 | self.assertEqual(simple, source.getvalue()) | |
|
467 | ||
|
468 | compressed.seek(0) | |
|
469 | streamed = b''.join(dctx.read_from(compressed)) | |
|
470 | self.assertEqual(streamed, source.getvalue()) | |
|
471 | ||
|
472 | def test_read_write_size(self): | |
|
473 | source = OpCountingBytesIO(zstd.ZstdCompressor().compress(b'foobarfoobar')) | |
|
474 | dctx = zstd.ZstdDecompressor() | |
|
475 | for chunk in dctx.read_from(source, read_size=1, write_size=1): | |
|
476 | self.assertEqual(len(chunk), 1) | |
|
477 | ||
|
478 | self.assertEqual(source._read_count, len(source.getvalue())) |
@@ -0,0 +1,17 b'' | |||
|
1 | try: | |
|
2 | import unittest2 as unittest | |
|
3 | except ImportError: | |
|
4 | import unittest | |
|
5 | ||
|
6 | import zstd | |
|
7 | ||
|
8 | ||
|
9 | class TestSizes(unittest.TestCase): | |
|
10 | def test_decompression_size(self): | |
|
11 | size = zstd.estimate_decompression_context_size() | |
|
12 | self.assertGreater(size, 100000) | |
|
13 | ||
|
14 | def test_compression_size(self): | |
|
15 | params = zstd.get_compression_parameters(3) | |
|
16 | size = zstd.estimate_compression_context_size(params) | |
|
17 | self.assertGreater(size, 100000) |
@@ -0,0 +1,48 b'' | |||
|
1 | from __future__ import unicode_literals | |
|
2 | ||
|
3 | try: | |
|
4 | import unittest2 as unittest | |
|
5 | except ImportError: | |
|
6 | import unittest | |
|
7 | ||
|
8 | import zstd | |
|
9 | ||
|
10 | class TestModuleAttributes(unittest.TestCase): | |
|
11 | def test_version(self): | |
|
12 | self.assertEqual(zstd.ZSTD_VERSION, (1, 1, 1)) | |
|
13 | ||
|
14 | def test_constants(self): | |
|
15 | self.assertEqual(zstd.MAX_COMPRESSION_LEVEL, 22) | |
|
16 | self.assertEqual(zstd.FRAME_HEADER, b'\x28\xb5\x2f\xfd') | |
|
17 | ||
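The `FRAME_HEADER` bytes asserted above are the zstd magic number that begins every frame. As a stdlib-only sketch (no zstd bindings needed), the four bytes decode little-endian to the spec's magic value:

```python
import struct

FRAME_HEADER = b'\x28\xb5\x2f\xfd'  # first four bytes of every zstd frame

# Interpreted as a little-endian 32-bit integer, this is the
# magic number defined by the zstd frame format.
magic, = struct.unpack('<I', FRAME_HEADER)
assert magic == 0xFD2FB528
```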
|
18 | def test_hasattr(self): | |
|
19 | attrs = ( | |
|
20 | 'COMPRESSION_RECOMMENDED_INPUT_SIZE', | |
|
21 | 'COMPRESSION_RECOMMENDED_OUTPUT_SIZE', | |
|
22 | 'DECOMPRESSION_RECOMMENDED_INPUT_SIZE', | |
|
23 | 'DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE', | |
|
24 | 'MAGIC_NUMBER', | |
|
25 | 'WINDOWLOG_MIN', | |
|
26 | 'WINDOWLOG_MAX', | |
|
27 | 'CHAINLOG_MIN', | |
|
28 | 'CHAINLOG_MAX', | |
|
29 | 'HASHLOG_MIN', | |
|
30 | 'HASHLOG_MAX', | |
|
31 | 'HASHLOG3_MAX', | |
|
32 | 'SEARCHLOG_MIN', | |
|
33 | 'SEARCHLOG_MAX', | |
|
34 | 'SEARCHLENGTH_MIN', | |
|
35 | 'SEARCHLENGTH_MAX', | |
|
36 | 'TARGETLENGTH_MIN', | |
|
37 | 'TARGETLENGTH_MAX', | |
|
38 | 'STRATEGY_FAST', | |
|
39 | 'STRATEGY_DFAST', | |
|
40 | 'STRATEGY_GREEDY', | |
|
41 | 'STRATEGY_LAZY', | |
|
42 | 'STRATEGY_LAZY2', | |
|
43 | 'STRATEGY_BTLAZY2', | |
|
44 | 'STRATEGY_BTOPT', | |
|
45 | ) | |
|
46 | ||
|
47 | for a in attrs: | |
|
48 | self.assertTrue(hasattr(zstd, a)) |
@@ -0,0 +1,64 b'' | |||
|
1 | import io | |
|
2 | ||
|
3 | try: | |
|
4 | import unittest2 as unittest | |
|
5 | except ImportError: | |
|
6 | import unittest | |
|
7 | ||
|
8 | try: | |
|
9 | import hypothesis | |
|
10 | import hypothesis.strategies as strategies | |
|
11 | except ImportError: | |
|
12 | raise unittest.SkipTest('hypothesis not available') | |
|
13 | ||
|
14 | import zstd | |
|
15 | ||
|
16 | ||
|
17 | compression_levels = strategies.integers(min_value=1, max_value=22) | |
|
18 | ||
|
19 | ||
|
20 | class TestRoundTrip(unittest.TestCase): | |
|
21 | @hypothesis.given(strategies.binary(), compression_levels) | |
|
22 | def test_compress_write_to(self, data, level): | |
|
23 | """Random data from compress() roundtrips via write_to.""" | |
|
24 | cctx = zstd.ZstdCompressor(level=level) | |
|
25 | compressed = cctx.compress(data) | |
|
26 | ||
|
27 | buffer = io.BytesIO() | |
|
28 | dctx = zstd.ZstdDecompressor() | |
|
29 | with dctx.write_to(buffer) as decompressor: | |
|
30 | decompressor.write(compressed) | |
|
31 | ||
|
32 | self.assertEqual(buffer.getvalue(), data) | |
|
33 | ||
|
34 | @hypothesis.given(strategies.binary(), compression_levels) | |
|
35 | def test_compressor_write_to_decompressor_write_to(self, data, level): | |
|
36 | """Random data from compressor write_to roundtrips via write_to.""" | |
|
37 | compress_buffer = io.BytesIO() | |
|
38 | decompressed_buffer = io.BytesIO() | |
|
39 | ||
|
40 | cctx = zstd.ZstdCompressor(level=level) | |
|
41 | with cctx.write_to(compress_buffer) as compressor: | |
|
42 | compressor.write(data) | |
|
43 | ||
|
44 | dctx = zstd.ZstdDecompressor() | |
|
45 | with dctx.write_to(decompressed_buffer) as decompressor: | |
|
46 | decompressor.write(compress_buffer.getvalue()) | |
|
47 | ||
|
48 | self.assertEqual(decompressed_buffer.getvalue(), data) | |
|
49 | ||
|
50 | @hypothesis.given(strategies.binary(average_size=1048576)) | |
|
51 | @hypothesis.settings(perform_health_check=False) | |
|
52 | def test_compressor_write_to_decompressor_write_to_larger(self, data): | |
|
53 | compress_buffer = io.BytesIO() | |
|
54 | decompressed_buffer = io.BytesIO() | |
|
55 | ||
|
56 | cctx = zstd.ZstdCompressor(level=5) | |
|
57 | with cctx.write_to(compress_buffer) as compressor: | |
|
58 | compressor.write(data) | |
|
59 | ||
|
60 | dctx = zstd.ZstdDecompressor() | |
|
61 | with dctx.write_to(decompressed_buffer) as decompressor: | |
|
62 | decompressor.write(compress_buffer.getvalue()) | |
|
63 | ||
|
64 | self.assertEqual(decompressed_buffer.getvalue(), data) |
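The round-trip property these Hypothesis tests exercise — one-shot compression whose output streams back through a writing decompressor unchanged — can be sketched with the stdlib zlib module, since the vendored zstd extension may not be importable outside this tree. The `roundtrip` helper name is hypothetical; zlib stands in for zstd:

```python
import io
import zlib

def roundtrip(data, level=5):
    # One-shot compress, then streaming decompress into a writer,
    # mirroring the compress() -> write_to() round trip tested above.
    compressed = zlib.compress(data, level)
    buf = io.BytesIO()
    dobj = zlib.decompressobj()
    buf.write(dobj.decompress(compressed))
    buf.write(dobj.flush())
    return buf.getvalue()
```

The property under test is simply `roundtrip(data) == data` for arbitrary byte strings, including the empty one.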
@@ -0,0 +1,46 b'' | |||
|
1 | import sys | |
|
2 | ||
|
3 | try: | |
|
4 | import unittest2 as unittest | |
|
5 | except ImportError: | |
|
6 | import unittest | |
|
7 | ||
|
8 | import zstd | |
|
9 | ||
|
10 | ||
|
11 | if sys.version_info[0] >= 3: | |
|
12 | int_type = int | |
|
13 | else: | |
|
14 | int_type = long | |
|
15 | ||
|
16 | ||
|
17 | class TestTrainDictionary(unittest.TestCase): | |
|
18 | def test_no_args(self): | |
|
19 | with self.assertRaises(TypeError): | |
|
20 | zstd.train_dictionary() | |
|
21 | ||
|
22 | def test_bad_args(self): | |
|
23 | with self.assertRaises(TypeError): | |
|
24 | zstd.train_dictionary(8192, u'foo') | |
|
25 | ||
|
26 | with self.assertRaises(ValueError): | |
|
27 | zstd.train_dictionary(8192, [u'foo']) | |
|
28 | ||
|
29 | def test_basic(self): | |
|
30 | samples = [] | |
|
31 | for i in range(128): | |
|
32 | samples.append(b'foo' * 64) | |
|
33 | samples.append(b'bar' * 64) | |
|
34 | samples.append(b'foobar' * 64) | |
|
35 | samples.append(b'baz' * 64) | |
|
36 | samples.append(b'foobaz' * 64) | |
|
37 | samples.append(b'bazfoo' * 64) | |
|
38 | ||
|
39 | d = zstd.train_dictionary(8192, samples) | |
|
40 | self.assertLessEqual(len(d), 8192) | |
|
41 | ||
|
42 | dict_id = d.dict_id() | |
|
43 | self.assertIsInstance(dict_id, int_type) | |
|
44 | ||
|
45 | data = d.as_bytes() | |
|
46 | self.assertEqual(data[0:4], b'\x37\xa4\x30\xec') |
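The final assertion above checks the Zstandard dictionary magic number, `0xEC30A437` stored little-endian. A minimal sketch of that check (the helper name is hypothetical, not part of this package):

```python
import struct

# Zstandard dictionaries begin with the magic number 0xEC30A437,
# stored little-endian: the byte sequence asserted in the test above.
ZDICT_MAGIC = b'\x37\xa4\x30\xec'

def looks_like_zstd_dict(data):
    """Return True if ``data`` begins with the dictionary magic number."""
    return data[:4] == ZDICT_MAGIC

# Sanity check: the bytes decode to the documented 32-bit constant.
assert struct.unpack('<I', ZDICT_MAGIC)[0] == 0xEC30A437
```

Note this differs from the frame magic `b'\x28\xb5\x2f\xfd'` asserted in the module-attribute tests, which marks compressed frames rather than dictionaries.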
@@ -0,0 +1,112 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | /* A Python C extension for Zstandard. */ | |
|
10 | ||
|
11 | #include "python-zstandard.h" | |
|
12 | ||
|
13 | PyObject *ZstdError; | |
|
14 | ||
|
15 | PyDoc_STRVAR(estimate_compression_context_size__doc__, | |
|
16 | "estimate_compression_context_size(compression_parameters)\n" | |
|
17 | "\n" | |
|
18 | "Give the amount of memory allocated for a compression context given a\n" | |
|
19 | "CompressionParameters instance"); | |
|
20 | ||
|
21 | PyDoc_STRVAR(estimate_decompression_context_size__doc__, | |
|
22 | "estimate_decompression_context_size()\n" | |
|
23 | "\n" | |
|
24 | "Estimate the amount of memory allocated to a decompression context.\n" | |
|
25 | ); | |
|
26 | ||
|
27 | static PyObject* estimate_decompression_context_size(PyObject* self) { | |
|
28 | return PyLong_FromSize_t(ZSTD_estimateDCtxSize()); | |
|
29 | } | |
|
30 | ||
|
31 | PyDoc_STRVAR(get_compression_parameters__doc__, | |
|
32 | "get_compression_parameters(compression_level[, source_size[, dict_size]])\n" | |
|
33 | "\n" | |
|
34 | "Obtains a ``CompressionParameters`` instance from a compression level and\n" | |
|
35 | "optional input size and dictionary size"); | |
|
36 | ||
|
37 | PyDoc_STRVAR(train_dictionary__doc__, | |
|
38 | "train_dictionary(dict_size, samples)\n" | |
|
39 | "\n" | |
|
40 | "Train a dictionary from sample data.\n" | |
|
41 | "\n" | |
|
42 | "A compression dictionary of size ``dict_size`` will be created from the\n" | |
|
43 | "iterable of samples provided by ``samples``.\n" | |
|
44 | "\n" | |
|
45 | "A ``ZstdCompressionDict`` representing the trained dictionary is returned.\n"); | |
|
46 | ||
|
47 | static char zstd_doc[] = "Interface to the Zstandard compression library"; | |
|
48 | ||
|
49 | static PyMethodDef zstd_methods[] = { | |
|
50 | { "estimate_compression_context_size", (PyCFunction)estimate_compression_context_size, | |
|
51 | METH_VARARGS, estimate_compression_context_size__doc__ }, | |
|
52 | { "estimate_decompression_context_size", (PyCFunction)estimate_decompression_context_size, | |
|
53 | METH_NOARGS, estimate_decompression_context_size__doc__ }, | |
|
54 | { "get_compression_parameters", (PyCFunction)get_compression_parameters, | |
|
55 | METH_VARARGS, get_compression_parameters__doc__ }, | |
|
56 | { "train_dictionary", (PyCFunction)train_dictionary, | |
|
57 | METH_VARARGS | METH_KEYWORDS, train_dictionary__doc__ }, | |
|
58 | { NULL, NULL } | |
|
59 | }; | |
|
60 | ||
|
61 | void compressobj_module_init(PyObject* mod); | |
|
62 | void compressor_module_init(PyObject* mod); | |
|
63 | void compressionparams_module_init(PyObject* mod); | |
|
64 | void constants_module_init(PyObject* mod); | |
|
65 | void dictparams_module_init(PyObject* mod); | |
|
66 | void compressiondict_module_init(PyObject* mod); | |
|
67 | void compressionwriter_module_init(PyObject* mod); | |
|
68 | void compressoriterator_module_init(PyObject* mod); | |
|
69 | void decompressor_module_init(PyObject* mod); | |
|
70 | void decompressobj_module_init(PyObject* mod); | |
|
71 | void decompressionwriter_module_init(PyObject* mod); | |
|
72 | void decompressoriterator_module_init(PyObject* mod); | |
|
73 | ||
|
74 | void zstd_module_init(PyObject* m) { | |
|
75 | compressionparams_module_init(m); | |
|
76 | dictparams_module_init(m); | |
|
77 | compressiondict_module_init(m); | |
|
78 | compressobj_module_init(m); | |
|
79 | compressor_module_init(m); | |
|
80 | compressionwriter_module_init(m); | |
|
81 | compressoriterator_module_init(m); | |
|
82 | constants_module_init(m); | |
|
83 | decompressor_module_init(m); | |
|
84 | decompressobj_module_init(m); | |
|
85 | decompressionwriter_module_init(m); | |
|
86 | decompressoriterator_module_init(m); | |
|
87 | } | |
|
88 | ||
|
89 | #if PY_MAJOR_VERSION >= 3 | |
|
90 | static struct PyModuleDef zstd_module = { | |
|
91 | PyModuleDef_HEAD_INIT, | |
|
92 | "zstd", | |
|
93 | zstd_doc, | |
|
94 | -1, | |
|
95 | zstd_methods | |
|
96 | }; | |
|
97 | ||
|
98 | PyMODINIT_FUNC PyInit_zstd(void) { | |
|
99 | PyObject *m = PyModule_Create(&zstd_module); | |
|
100 | if (m) { | |
|
101 | zstd_module_init(m); | |
|
102 | } | |
|
103 | return m; | |
|
104 | } | |
|
105 | #else | |
|
106 | PyMODINIT_FUNC initzstd(void) { | |
|
107 | PyObject *m = Py_InitModule3("zstd", zstd_methods, zstd_doc); | |
|
108 | if (m) { | |
|
109 | zstd_module_init(m); | |
|
110 | } | |
|
111 | } | |
|
112 | #endif |
@@ -0,0 +1,152 b'' | |||
|
1 | # Copyright (c) 2016-present, Gregory Szorc | |
|
2 | # All rights reserved. | |
|
3 | # | |
|
4 | # This software may be modified and distributed under the terms | |
|
5 | # of the BSD license. See the LICENSE file for details. | |
|
6 | ||
|
7 | """Python interface to the Zstandard (zstd) compression library.""" | |
|
8 | ||
|
9 | from __future__ import absolute_import, unicode_literals | |
|
10 | ||
|
11 | import io | |
|
12 | ||
|
13 | from _zstd_cffi import ( | |
|
14 | ffi, | |
|
15 | lib, | |
|
16 | ) | |
|
17 | ||
|
18 | ||
|
19 | _CSTREAM_IN_SIZE = lib.ZSTD_CStreamInSize() | |
|
20 | _CSTREAM_OUT_SIZE = lib.ZSTD_CStreamOutSize() | |
|
21 | ||
|
22 | ||
|
23 | class _ZstdCompressionWriter(object): | |
|
24 | def __init__(self, cstream, writer): | |
|
25 | self._cstream = cstream | |
|
26 | self._writer = writer | |
|
27 | ||
|
28 | def __enter__(self): | |
|
29 | return self | |
|
30 | ||
|
31 | def __exit__(self, exc_type, exc_value, exc_tb): | |
|
32 | if not exc_type and not exc_value and not exc_tb: | |
|
33 | out_buffer = ffi.new('ZSTD_outBuffer *') | |
|
34 | out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE) | |
|
35 | out_buffer.size = _CSTREAM_OUT_SIZE | |
|
36 | out_buffer.pos = 0 | |
|
37 | ||
|
38 | while True: | |
|
39 | res = lib.ZSTD_endStream(self._cstream, out_buffer) | |
|
40 | if lib.ZSTD_isError(res): | |
|
41 | raise Exception('error ending compression stream: %s' % lib.ZSTD_getErrorName(res)) | |
|
42 | ||
|
43 | if out_buffer.pos: | |
|
44 | self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) | |
|
45 | out_buffer.pos = 0 | |
|
46 | ||
|
47 | if res == 0: | |
|
48 | break | |
|
49 | ||
|
50 | return False | |
|
51 | ||
|
52 | def write(self, data): | |
|
53 | out_buffer = ffi.new('ZSTD_outBuffer *') | |
|
54 | out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE) | |
|
55 | out_buffer.size = _CSTREAM_OUT_SIZE | |
|
56 | out_buffer.pos = 0 | |
|
57 | ||
|
58 | # TODO can we reuse existing memory? | |
|
59 | in_buffer = ffi.new('ZSTD_inBuffer *') | |
|
60 | in_buffer.src = ffi.new('char[]', data) | |
|
61 | in_buffer.size = len(data) | |
|
62 | in_buffer.pos = 0 | |
|
63 | while in_buffer.pos < in_buffer.size: | |
|
64 | res = lib.ZSTD_compressStream(self._cstream, out_buffer, in_buffer) | |
|
65 | if lib.ZSTD_isError(res): | |
|
66 | raise Exception('zstd compress error: %s' % lib.ZSTD_getErrorName(res)) | |
|
67 | ||
|
68 | if out_buffer.pos: | |
|
69 | self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) | |
|
70 | out_buffer.pos = 0 | |
|
71 | ||
|
72 | ||
|
73 | class ZstdCompressor(object): | |
|
74 | def __init__(self, level=3, dict_data=None, compression_params=None): | |
|
75 | if dict_data: | |
|
76 | raise Exception('dict_data not yet supported') | |
|
77 | if compression_params: | |
|
78 | raise Exception('compression_params not yet supported') | |
|
79 | ||
|
80 | self._compression_level = level | |
|
81 | ||
|
82 | def compress(self, data): | |
|
83 | # Just use the stream API for now. | |
|
84 | output = io.BytesIO() | |
|
85 | with self.write_to(output) as compressor: | |
|
86 | compressor.write(data) | |
|
87 | return output.getvalue() | |
|
88 | ||
|
89 | def copy_stream(self, ifh, ofh): | |
|
90 | cstream = self._get_cstream() | |
|
91 | ||
|
92 | in_buffer = ffi.new('ZSTD_inBuffer *') | |
|
93 | out_buffer = ffi.new('ZSTD_outBuffer *') | |
|
94 | ||
|
95 | out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE) | |
|
96 | out_buffer.size = _CSTREAM_OUT_SIZE | |
|
97 | out_buffer.pos = 0 | |
|
98 | ||
|
99 | total_read, total_write = 0, 0 | |
|
100 | ||
|
101 | while True: | |
|
102 | data = ifh.read(_CSTREAM_IN_SIZE) | |
|
103 | if not data: | |
|
104 | break | |
|
105 | ||
|
106 | total_read += len(data) | |
|
107 | ||
|
108 | in_buffer.src = ffi.new('char[]', data) | |
|
109 | in_buffer.size = len(data) | |
|
110 | in_buffer.pos = 0 | |
|
111 | ||
|
112 | while in_buffer.pos < in_buffer.size: | |
|
113 | res = lib.ZSTD_compressStream(cstream, out_buffer, in_buffer) | |
|
114 | if lib.ZSTD_isError(res): | |
|
115 | raise Exception('zstd compress error: %s' % | |
|
116 | lib.ZSTD_getErrorName(res)) | |
|
117 | ||
|
118 | if out_buffer.pos: | |
|
119 | ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) | |
|
120 | total_write += out_buffer.pos | |
|
121 | out_buffer.pos = 0 | |
|
122 | ||
|
123 | # We've finished reading. Flush the compressor. | |
|
124 | while True: | |
|
125 | res = lib.ZSTD_endStream(cstream, out_buffer) | |
|
126 | if lib.ZSTD_isError(res): | |
|
127 | raise Exception('error ending compression stream: %s' % | |
|
128 | lib.ZSTD_getErrorName(res)) | |
|
129 | ||
|
130 | if out_buffer.pos: | |
|
131 | ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) | |
|
132 | total_write += out_buffer.pos | |
|
133 | out_buffer.pos = 0 | |
|
134 | ||
|
135 | if res == 0: | |
|
136 | break | |
|
137 | ||
|
138 | return total_read, total_write | |
|
139 | ||
|
140 | def write_to(self, writer): | |
|
141 | return _ZstdCompressionWriter(self._get_cstream(), writer) | |
|
142 | ||
|
143 | def _get_cstream(self): | |
|
144 | cstream = lib.ZSTD_createCStream() | |
|
145 | cstream = ffi.gc(cstream, lib.ZSTD_freeCStream) | |
|
146 | ||
|
147 | res = lib.ZSTD_initCStream(cstream, self._compression_level) | |
|
148 | if lib.ZSTD_isError(res): | |
|
149 | raise Exception('cannot init CStream: %s' % | |
|
150 | lib.ZSTD_getErrorName(res)) | |
|
151 | ||
|
152 | return cstream |
@@ -7,7 +7,7 b'' | |||
|
7 | 7 | New errors are not allowed. Warnings are strongly discouraged. |
|
8 | 8 | (The writing "no-che?k-code" is for not skipping this file when checking.) |
|
9 | 9 | |
|
10 | $ hg locate | sed 's-\\-/-g' | | |
|
10 | $ hg locate -X contrib/python-zstandard | sed 's-\\-/-g' | | |
|
11 | 11 | > xargs "$check_code" --warnings --per-file=0 || false |
|
12 | 12 | Skipping hgext/fsmonitor/pywatchman/__init__.py it has no-che?k-code (glob) |
|
13 | 13 | Skipping hgext/fsmonitor/pywatchman/bser.c it has no-che?k-code (glob) |
@@ -159,6 +159,7 b' outputs, which should be fixed later.' | |||
|
159 | 159 | $ hg locate 'set:**.py or grep(r"^#!.*?python")' \ |
|
160 | 160 | > 'tests/**.t' \ |
|
161 | 161 | > -X contrib/debugshell.py \ |
|
162 | > -X contrib/python-zstandard/ \ | |
|
162 | 163 | > -X contrib/win32/hgwebdir_wsgi.py \ |
|
163 | 164 | > -X doc/gendoc.py \ |
|
164 | 165 | > -X doc/hgmanpage.py \ |
@@ -4,6 +4,17 b'' | |||
|
4 | 4 | $ cd "$TESTDIR"/.. |
|
5 | 5 | |
|
6 | 6 | $ hg files 'set:(**.py)' | sed 's|\\|/|g' | xargs python contrib/check-py3-compat.py |
|
7 | contrib/python-zstandard/setup.py not using absolute_import | |
|
8 | contrib/python-zstandard/setup_zstd.py not using absolute_import | |
|
9 | contrib/python-zstandard/tests/common.py not using absolute_import | |
|
10 | contrib/python-zstandard/tests/test_cffi.py not using absolute_import | |
|
11 | contrib/python-zstandard/tests/test_compressor.py not using absolute_import | |
|
12 | contrib/python-zstandard/tests/test_data_structures.py not using absolute_import | |
|
13 | contrib/python-zstandard/tests/test_decompressor.py not using absolute_import | |
|
14 | contrib/python-zstandard/tests/test_estimate_sizes.py not using absolute_import | |
|
15 | contrib/python-zstandard/tests/test_module_attributes.py not using absolute_import | |
|
16 | contrib/python-zstandard/tests/test_roundtrip.py not using absolute_import | |
|
17 | contrib/python-zstandard/tests/test_train_dictionary.py not using absolute_import | |
|
7 | 18 | hgext/fsmonitor/pywatchman/__init__.py not using absolute_import |
|
8 | 19 | hgext/fsmonitor/pywatchman/__init__.py requires print_function |
|
9 | 20 | hgext/fsmonitor/pywatchman/capabilities.py not using absolute_import |
@@ -10,6 +10,6 b' run pyflakes on all tracked files ending' | |||
|
10 | 10 | > -X mercurial/pycompat.py \ |
|
11 | 11 | > 2>/dev/null \ |
|
12 | 12 | > | xargs pyflakes 2>/dev/null | "$TESTDIR/filterpyflakes.py" |
|
13 | contrib/python-zstandard/tests/test_data_structures.py:107: local variable 'size' is assigned to but never used | |
|
13 | 14 | tests/filterpyflakes.py:39: undefined name 'undefinedname' |
|
14 | 15 | |
|
15 |