zstd: vendor python-zstandard 0.5.0...
Gregory Szorc
r30435:b86a448a default
@@ -0,0 +1,27 b''
1 Copyright (c) 2016, Gregory Szorc
2 All rights reserved.
3
4 Redistribution and use in source and binary forms, with or without modification,
5 are permitted provided that the following conditions are met:
6
7 1. Redistributions of source code must retain the above copyright notice, this
8 list of conditions and the following disclaimer.
9
10 2. Redistributions in binary form must reproduce the above copyright notice,
11 this list of conditions and the following disclaimer in the documentation
12 and/or other materials provided with the distribution.
13
14 3. Neither the name of the copyright holder nor the names of its contributors
15 may be used to endorse or promote products derived from this software without
16 specific prior written permission.
17
18 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
19 ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
20 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
21 DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
22 ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
23 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
24 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
25 ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
26 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
27 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,2 b''
1 graft zstd
2 include make_cffi.py
@@ -0,0 +1,63 b''
1 Version History
2 ===============
3
4 0.5.0 (released 2016-11-10)
5 ---------------------------
6
7 * Vendored version of zstd updated to 1.1.1.
8 * Continuous integration for Python 3.6 and 3.7
9 * Continuous integration for Conda
10 * Added compression and decompression APIs providing similar interfaces
11 to the standard library ``zlib`` and ``bz2`` modules. This allows
12 coding to a common interface.
13 * ``zstd.__version__`` is now defined.
14 * ``read_from()`` on various APIs now accepts objects implementing the buffer
15 protocol.
16 * ``read_from()`` has gained a ``skip_bytes`` argument. This allows callers
17 to pass in an existing buffer with a header without having to create a
18 slice or a new object.
19 * Implemented ``ZstdCompressionDict.as_bytes()``.
20 * Python's memory allocator is now used instead of ``malloc()``.
21 * Low-level zstd data structures are reused in more instances, cutting down
22 on overhead for certain operations.
23 * ``distutils`` boilerplate for obtaining an ``Extension`` instance
24 has now been refactored into a standalone ``setup_zstd.py`` file. This
25 allows other projects with ``setup.py`` files to reuse the
26 ``distutils`` code for this project without copying code.
27 * The monolithic ``zstd.c`` file has been split into a header file defining
28 types and separate ``.c`` source files for the implementation.
29
30 History of the Project
31 ======================
32
33 2016-08-31 - Zstandard 1.0.0 is released and Gregory starts hacking on a
34 Python extension for use by the Mercurial project. A very hacky prototype
35 is sent to the mercurial-devel list for RFC.
36
37 2016-09-03 - Most functionality from Zstandard C API implemented. Source
38 code published on https://github.com/indygreg/python-zstandard. Travis-CI
39 automation configured. 0.0.1 release on PyPI.
40
41 2016-09-05 - After the API was rounded out a bit and support for Python
42 2.6 and 2.7 was added, version 0.1 was released to PyPI.
43
44 2016-09-05 - After the compressor and decompressor APIs were changed, 0.2
45 was released to PyPI.
46
47 2016-09-10 - 0.3 is released with a bunch of new features. ZstdCompressor
48 now accepts arguments controlling frame parameters. The source size can now
49 be declared when performing streaming compression. ZstdDecompressor.decompress()
50 is implemented. Compression dictionaries are now cached when using the simple
51 compression and decompression APIs. Memory size APIs added.
52 ZstdCompressor.read_from() and ZstdDecompressor.read_from() have been
53 implemented. This rounds out the major compression/decompression APIs planned
54 by the author.
55
56 2016-10-02 - 0.3.3 is released with a bug fix for read_from not fully
57 decoding a zstd frame (issue #2).
58
59 2016-10-02 - 0.4.0 is released with zstd 1.1.0, support for custom read and
60 write buffer sizes, and a few bug fixes involving failure to read/write
61 all data when buffer sizes were too small to hold remaining data.
62
63 2016-11-10 - 0.5.0 is released with zstd 1.1.1 and other enhancements.
@@ -0,0 +1,776 b''
1 ================
2 python-zstandard
3 ================
4
5 This project provides a Python C extension for interfacing with the
6 `Zstandard <http://www.zstd.net>`_ compression library.
7
8 The primary goal of the extension is to provide a Pythonic interface to
9 the underlying C API. This means exposing most of the features and flexibility
10 of the C API while not sacrificing usability or safety that Python provides.
11
12 | |ci-status| |win-ci-status|
13
14 State of Project
15 ================
16
17 The project is officially in beta state. The author is reasonably satisfied
18 with the current API and that functionality works as advertised. There
19 may be some backwards incompatible changes before 1.0, though the author
20 does not intend to make any major changes to the Python API.
21
22 There is continuous integration for Python versions 2.6, 2.7, and 3.3+
23 on Linux x86_64 and Windows x86 and x86_64. The author is reasonably
24 confident the extension is stable and works as advertised on these
25 platforms.
26
27 Expected Changes
28 ----------------
29
30 The author is reasonably confident in the current state of what's
31 implemented on the ``ZstdCompressor`` and ``ZstdDecompressor`` types.
32 Those APIs likely won't change significantly. Some low-level behavior
33 (such as naming and types expected by arguments) may change.
34
35 There will likely be arguments added to control the input and output
36 buffer sizes (currently, certain operations read and write in chunk
37 sizes using zstd's preferred defaults).
38
39 There should be an API that accepts an object that conforms to the buffer
40 interface and returns an iterator over compressed or decompressed output.
41
42 The author is on the fence as to whether to support the extremely
43 low level compression and decompression APIs. It could be useful to
44 support compression without the framing headers. But the author doesn't
45 believe it is a high priority at this time.
46
47 The CFFI bindings are half-baked and need to be finished.
48
49 Requirements
50 ============
51
52 This extension is designed to run with Python 2.6, 2.7, 3.3, 3.4, and 3.5
53 on common platforms (Linux, Windows, and OS X). Only x86_64 is currently
54 well-tested as an architecture.
55
56 Installing
57 ==========
58
59 This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard.
60 So, to install this package::
61
62 $ pip install zstandard
63
64 Binary wheels are made available for some platforms. If you need to
65 install from a source distribution, all you should need is a working C
66 compiler and the Python development headers/libraries. On many Linux
67 distributions, you can install a ``python-dev`` or ``python-devel``
68 package to provide these dependencies.
69
70 Packages are also uploaded to Anaconda Cloud at
71 https://anaconda.org/indygreg/zstandard. See that URL for how to install
72 this package with ``conda``.
73
74 Performance
75 ===========
76
77 Very crude and non-scientific benchmarking (most benchmarks fall in this
78 category because proper benchmarking is hard) shows that the Python bindings
79 perform within 10% of the native C implementation.
80
81 The following table compares the performance of compressing and decompressing
82 a 1.1 GB tar file comprised of the files in a Firefox source checkout. Values
83 obtained with the ``zstd`` program are on the left. The remaining columns detail
84 performance of various compression APIs in the Python bindings.
85
86 +-------+-----------------+-----------------+-----------------+---------------+
87 | Level | Native | Simple | Stream In | Stream Out |
88 | | Comp / Decomp | Comp / Decomp | Comp / Decomp | Comp |
89 +=======+=================+=================+=================+===============+
90 | 1 | 490 / 1338 MB/s | 458 / 1266 MB/s | 407 / 1156 MB/s | 405 MB/s |
91 +-------+-----------------+-----------------+-----------------+---------------+
92 | 2 | 412 / 1288 MB/s | 381 / 1203 MB/s | 345 / 1128 MB/s | 349 MB/s |
93 +-------+-----------------+-----------------+-----------------+---------------+
94 | 3 | 342 / 1312 MB/s | 319 / 1182 MB/s | 285 / 1165 MB/s | 287 MB/s |
95 +-------+-----------------+-----------------+-----------------+---------------+
96 | 11 | 64 / 1506 MB/s | 66 / 1436 MB/s | 56 / 1342 MB/s | 57 MB/s |
97 +-------+-----------------+-----------------+-----------------+---------------+
98
99 Again, these are very unscientific. But they show that Python is capable of
100 compressing at several hundred MB/s and decompressing at over 1 GB/s.
101
102 Comparison to Other Python Bindings
103 ===================================
104
105 https://pypi.python.org/pypi/zstd is an alternative Python binding to
106 Zstandard. At the time this was written, the latest release of that
107 package (1.0.0.2) had the following significant differences from this package:
108
109 * It only exposes the simple API for compression and decompression operations.
110 This extension exposes the streaming API, dictionary training, and more.
111 * It adds a custom framing header to compressed data and there is no way to
112 disable it. This means that data produced with that module cannot be used by
113 other Zstandard implementations.
114
115 Bundling of Zstandard Source Code
116 =================================
117
118 The source repository for this project contains a vendored copy of the
119 Zstandard source code. This is done for a few reasons.
120
121 First, Zstandard is relatively new and not yet widely available as a system
122 package. Providing a copy of the source code enables the Python C extension
123 to be compiled without requiring the user to obtain the Zstandard source code
124 separately.
125
126 Second, Zstandard has both a stable *public* API and an *experimental* API.
127 The *experimental* API is actually quite useful (contains functionality for
128 training dictionaries for example), so it is something we wish to expose to
129 Python. However, the *experimental* API is only available via static linking.
130 Furthermore, the *experimental* API can change at any time. So, control over
131 the exact version of the Zstandard library linked against is important to
132 ensure known behavior.
133
134 Instructions for Building and Testing
135 =====================================
136
137 Once you have the source code, the extension can be built via setup.py::
138
139 $ python setup.py build_ext
140
141 We recommend testing with ``nose``::
142
143 $ nosetests
144
145 A Tox configuration is present to test against multiple Python versions::
146
147 $ tox
148
149 Tests use the ``hypothesis`` Python package to perform fuzzing. If you
150 don't have it, those tests won't run.
151
152 There is also an experimental CFFI module. You need the ``cffi`` Python
153 package installed to build and test that.
154
155 To create a virtualenv with all development dependencies, do something
156 like the following::
157
158 # Python 2
159 $ virtualenv venv
160
161 # Python 3
162 $ python3 -m venv venv
163
164 $ source venv/bin/activate
165 $ pip install cffi hypothesis nose tox
166
167 API
168 ===
169
170 The compiled C extension provides a ``zstd`` Python module. This module
171 exposes the following interfaces.
172
173 ZstdCompressor
174 --------------
175
176 The ``ZstdCompressor`` class provides an interface for performing
177 compression operations.
178
179 Each instance is associated with parameters that control compression
180 behavior. These come from the following named arguments (all optional; a combined example follows the list):
181
182 level
183 Integer compression level. Valid values are between 1 and 22.
184 dict_data
185 Compression dictionary to use.
186
187 Note: When using dictionary data and ``compress()`` is called multiple
188 times, the ``CompressionParameters`` derived from an integer compression
189 ``level`` and the first compressed data's size will be reused for all
190 subsequent operations. This may not be desirable if source data size
191 varies significantly.
192 compression_params
193 A ``CompressionParameters`` instance (overrides the ``level`` value).
194 write_checksum
195 Whether a 4 byte checksum should be written with the compressed data.
196 Defaults to False. If True, the decompressor can verify that decompressed
197 data matches the original input data.
198 write_content_size
199 Whether the size of the uncompressed data will be written into the
200 header of compressed data. Defaults to False. The data will only be
201 written if the compressor knows the size of the input data. This is
202 likely not true for streaming compression.
203 write_dict_id
204 Whether to write the dictionary ID into the compressed data.
205 Defaults to True. The dictionary ID is only written if a dictionary
206 is being used.
207
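For example, several of these arguments can be combined when constructing a
compressor (a brief illustrative sketch; the values shown are arbitrary)::

    cctx = zstd.ZstdCompressor(level=10,
                               write_checksum=True,
                               write_content_size=True)
    compressed = cctx.compress(b'data to compress')
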
208 Simple API
209 ^^^^^^^^^^
210
211 ``compress(data)`` compresses and returns data as a one-shot operation.::
212
213 cctx = zstd.ZstdCompressor()
214 compressed = cctx.compress(b'data to compress')
215
216 Streaming Input API
217 ^^^^^^^^^^^^^^^^^^^
218
219 ``write_to(fh)`` (which behaves as a context manager) allows you to *stream*
220 data into a compressor.::
221
222 cctx = zstd.ZstdCompressor(level=10)
223 with cctx.write_to(fh) as compressor:
224 compressor.write(b'chunk 0')
225 compressor.write(b'chunk 1')
226 ...
227
228 The argument to ``write_to()`` must have a ``write(data)`` method. As
229 compressed data is available, ``write()`` will be called with the compressed
230 data as its argument. Many common Python types implement ``write()``, including
231 open file handles and ``io.BytesIO``.
232
233 ``write_to()`` returns an object representing a streaming compressor instance.
234 It **must** be used as a context manager. That object's ``write(data)`` method
235 is used to feed data into the compressor.
236
237 If the size of the data being fed to this streaming compressor is known,
238 you can declare it before compression begins::
239
240 cctx = zstd.ZstdCompressor()
241 with cctx.write_to(fh, size=data_len) as compressor:
242 compressor.write(chunk0)
243 compressor.write(chunk1)
244 ...
245
246 Declaring the size of the source data allows compression parameters to
247 be tuned. And if ``write_content_size`` is used, it also results in the
248 content size being written into the frame header of the output data.
249
250 The size of the chunks passed to the destination's ``write()`` can be specified::
251
252 cctx = zstd.ZstdCompressor()
253 with cctx.write_to(fh, write_size=32768) as compressor:
254 ...
255
256 To see how much memory is being used by the streaming compressor::
257
258 cctx = zstd.ZstdCompressor()
259 with cctx.write_to(fh) as compressor:
260 ...
261 byte_size = compressor.memory_size()
262
263 Streaming Output API
264 ^^^^^^^^^^^^^^^^^^^^
265
266 ``read_from(reader)`` provides a mechanism to stream data out of a compressor
267 as an iterator of data chunks.::
268
269 cctx = zstd.ZstdCompressor()
270 for chunk in cctx.read_from(fh):
271 # Do something with emitted data.
272
273 ``read_from()`` accepts an object that has a ``read(size)`` method or conforms
274 to the buffer protocol. (``bytes`` and ``memoryview`` are 2 common types that
275 provide the buffer protocol.)
276
277 Uncompressed data is fetched from the source either by calling ``read(size)``
278 or by fetching a slice of data from the object directly (in the case where
279 the buffer protocol is being used). The returned iterator consists of chunks
280 of compressed data.
281
282 Like ``write_to()``, ``read_from()`` also accepts a ``size`` argument
283 declaring the size of the input stream::
284
285 cctx = zstd.ZstdCompressor()
286 for chunk in cctx.read_from(fh, size=some_int):
287 pass
288
289 You can also control the size of ``read()`` requests to the source and
290 the ideal size of output chunks::
291
292 cctx = zstd.ZstdCompressor()
293 for chunk in cctx.read_from(fh, read_size=16384, write_size=8192):
294 pass
295
296 Stream Copying API
297 ^^^^^^^^^^^^^^^^^^
298
299 ``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while
300 compressing it.::
301
302 cctx = zstd.ZstdCompressor()
303 cctx.copy_stream(ifh, ofh)
304
305 For example, say you wish to compress a file::
306
307 cctx = zstd.ZstdCompressor()
308 with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
309 cctx.copy_stream(ifh, ofh)
310
311 It is also possible to declare the size of the source stream::
312
313 cctx = zstd.ZstdCompressor()
314 cctx.copy_stream(ifh, ofh, size=len_of_input)
315
316 You can also specify the sizes of the ``read()`` and ``write()`` chunks used
317 with the streams::
318
319 cctx = zstd.ZstdCompressor()
320 cctx.copy_stream(ifh, ofh, read_size=32768, write_size=16384)
321
322 The stream copier returns a 2-tuple of bytes read and written::
323
324 cctx = zstd.ZstdCompressor()
325 read_count, write_count = cctx.copy_stream(ifh, ofh)
326
327 Compressor API
328 ^^^^^^^^^^^^^^
329
330 ``compressobj()`` returns an object that exposes ``compress(data)`` and
331 ``flush()`` methods. Each returns compressed data or an empty bytes.
332
333 The purpose of ``compressobj()`` is to provide an API-compatible interface
334 with ``zlib.compressobj`` and ``bz2.BZ2Compressor``. This allows callers to
335 swap in different compressor objects while using the same API.
336
337 Once ``flush()`` is called, the compressor will no longer accept new data
338 to ``compress()``. ``flush()`` **must** be called to end the compression
339 context. If not called, the returned data may be incomplete.
340
341 Here is how this API should be used::
342
343 cctx = zstd.ZstdCompressor()
344 cobj = cctx.compressobj()
345 data = cobj.compress(b'raw input 0')
346 data = cobj.compress(b'raw input 1')
347 data = cobj.flush()
348
349 For best performance results, keep input chunks under 256KB. This avoids
350 extra allocations for a large output object.
351
352 It is possible to declare the input size of the data that will be fed into
353 the compressor::
354
355 cctx = zstd.ZstdCompressor()
356 cobj = cctx.compressobj(size=6)
357 data = cobj.compress(b'foobar')
358 data = cobj.flush()
359
360 ZstdDecompressor
361 ----------------
362
363 The ``ZstdDecompressor`` class provides an interface for performing
364 decompression.
365
366 Each instance is associated with parameters that control decompression. These
367 come from the following named arguments (all optional):
368
369 dict_data
370 Compression dictionary to use.
371
372 The interface of this class is very similar to ``ZstdCompressor`` (by design).
373
374 Simple API
375 ^^^^^^^^^^
376
377 ``decompress(data)`` can be used to decompress an entire compressed zstd
378 frame in a single operation.::
379
380 dctx = zstd.ZstdDecompressor()
381 decompressed = dctx.decompress(data)
382
383 By default, ``decompress(data)`` will only work on data written with the content
384 size encoded in its header. This can be achieved by creating a
385 ``ZstdCompressor`` with ``write_content_size=True``. If compressed data without
386 an embedded content size is seen, ``zstd.ZstdError`` will be raised.
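
For example, a round trip where the content size is recorded at compression
time might look like this (a small sketch)::

    cctx = zstd.ZstdCompressor(write_content_size=True)
    frame = cctx.compress(b'data to compress')

    dctx = zstd.ZstdDecompressor()
    decompressed = dctx.decompress(frame)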
387
388 If the compressed data doesn't have its content size embedded within it,
389 decompression can be attempted by specifying the ``max_output_size``
390 argument.::
391
392 dctx = zstd.ZstdDecompressor()
393 uncompressed = dctx.decompress(data, max_output_size=1048576)
394
395 Ideally, ``max_output_size`` will be identical to the decompressed output
396 size.
397
398 If ``max_output_size`` is too small to hold the decompressed data,
399 ``zstd.ZstdError`` will be raised.
400
401 If ``max_output_size`` is larger than the decompressed data, the allocated
402 output buffer will be resized to only use the space required.
403
404 Please note that an allocation of the requested ``max_output_size`` will be
405 performed every time the method is called. Setting to a very large value could
406 result in a lot of work for the memory allocator and may result in
407 ``MemoryError`` being raised if the allocation fails.
408
409 If the exact size of decompressed data is unknown, it is **strongly**
410 recommended to use a streaming API.
411
412 Streaming Input API
413 ^^^^^^^^^^^^^^^^^^^
414
415 ``write_to(fh)`` can be used to incrementally send compressed data to a
416 decompressor.::
417
418 dctx = zstd.ZstdDecompressor()
419 with dctx.write_to(fh) as decompressor:
420 decompressor.write(compressed_data)
421
422 This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to
423 the decompressor by calling ``write(data)`` and decompressed output is written
424 to the output object by calling its ``write(data)`` method.
425
426 The size of the chunks passed to the destination's ``write()`` can be specified::
427
428 dctx = zstd.ZstdDecompressor()
429 with dctx.write_to(fh, write_size=16384) as decompressor:
430 pass
431
432 You can see how much memory is being used by the decompressor::
433
434 dctx = zstd.ZstdDecompressor()
435 with dctx.write_to(fh) as decompressor:
436 byte_size = decompressor.memory_size()
437
438 Streaming Output API
439 ^^^^^^^^^^^^^^^^^^^^
440
441 ``read_from(fh)`` provides a mechanism to stream decompressed data out of a
442 compressed source as an iterator of data chunks.::
443
444 dctx = zstd.ZstdDecompressor()
445 for chunk in dctx.read_from(fh):
446 # Do something with original data.
447
448 ``read_from()`` accepts either a) an object with a ``read(size)`` method that
449 will return compressed bytes or b) an object conforming to the buffer protocol
450 that can expose its data as a contiguous range of bytes. The ``bytes`` and
451 ``memoryview`` types expose this buffer protocol.
452
453 ``read_from()`` returns an iterator whose elements are chunks of the
454 decompressed data.
455
456 The size of each ``read()`` requested from the source can be specified::
457
458 dctx = zstd.ZstdDecompressor()
459 for chunk in dctx.read_from(fh, read_size=16384):
460 pass
461
462 It is also possible to skip leading bytes in the input data::
463
464 dctx = zstd.ZstdDecompressor()
465 for chunk in dctx.read_from(fh, skip_bytes=1):
466 pass
467
468 Skipping leading bytes is useful if the source data contains extra
469 *header* data but you want to avoid the overhead of making a buffer copy
470 or allocating a new ``memoryview`` object in order to decompress the data.
471
472 Similarly to ``ZstdCompressor.read_from()``, the consumer of the iterator
473 controls when data is decompressed. If the iterator isn't consumed,
474 decompression is put on hold.
475
476 When ``read_from()`` is passed an object conforming to the buffer protocol,
477 the behavior may seem similar to what occurs when the simple decompression
478 API is used. However, this API works when the decompressed size is unknown.
479 Furthermore, if feeding large inputs, the decompressor will work in chunks
480 instead of performing a single operation.
481
482 Stream Copying API
483 ^^^^^^^^^^^^^^^^^^
484
485 ``copy_stream(ifh, ofh)`` can be used to copy data across 2 streams while
486 performing decompression.::
487
488 dctx = zstd.ZstdDecompressor()
489 dctx.copy_stream(ifh, ofh)
490
491 e.g. to decompress a file to another file::
492
493 dctx = zstd.ZstdDecompressor()
494 with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
495 dctx.copy_stream(ifh, ofh)
496
497 The sizes of the ``read()`` and ``write()`` chunks used with the streams
498 can be specified::
499
500 dctx = zstd.ZstdDecompressor()
501 dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384)
502
503 Decompressor API
504 ^^^^^^^^^^^^^^^^
505
506 ``decompressobj()`` returns an object that exposes a ``decompress(data)``
507 method. Compressed data chunks are fed into ``decompress(data)`` and
508 uncompressed output (or an empty bytes) is returned. Output from subsequent
509 calls needs to be concatenated to reassemble the full decompressed byte
510 sequence.
511
512 The purpose of ``decompressobj()`` is to provide an API-compatible interface
513 with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor``. This allows callers
514 to swap in different decompressor objects while using the same API.
515
516 Each object is single use: once an input frame is decoded, ``decompress()``
517 can no longer be called.
518
519 Here is how this API should be used::
520
521 dctx = zstd.ZstdDecompressor()
522 dobj = dctx.decompressobj()
523 data = dobj.decompress(compressed_chunk_0)
524 data = dobj.decompress(compressed_chunk_1)
525
526 Choosing an API
527 ---------------
528
529 Various forms of compression and decompression APIs are provided because each
530 is suited to different use cases.
531
532 The simple/one-shot APIs are useful for small data, when the decompressed
533 data size is known (either recorded in the zstd frame header via
534 ``write_content_size`` or known via an out-of-band mechanism, such as a file
535 size).
536
537 A limitation of the simple APIs is that input or output data must fit in memory.
538 And unless using advanced tricks with Python *buffer objects*, both input and
539 output must fit in memory simultaneously.
540
541 Another limitation is that compression or decompression is performed as a single
542 operation. So if you feed large input, it could take a long time for the
543 function to return.
544
545 The streaming APIs do not have the limitations of the simple API. The cost
546 is that they are more complex to use than a single function call.
547
548 The streaming APIs put the caller in control of compression and decompression
549 behavior by allowing them to directly control either the input or output side
550 of the operation.
551
552 With the streaming input APIs, the caller feeds data into the compressor or
553 decompressor as they see fit. Output data will only be written after the caller
554 has explicitly written data.
555
556 With the streaming output APIs, the caller consumes output from the compressor
557 or decompressor as they see fit. The compressor or decompressor will only
558 consume data from the source when the caller is ready to receive it.
559
560 One end of the streaming APIs involves a file-like object that must
561 ``write()`` output data or ``read()`` input data. Depending on what the
562 backing storage for these objects is, those operations may not complete quickly.
563 For example, when streaming compressed data to a file, the ``write()`` into
564 a streaming compressor could result in a ``write()`` to the filesystem, which
565 may take a long time to finish due to slow I/O on the filesystem. So, there
566 may be overhead in streaming APIs beyond the compression and decompression
567 operations.
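
To make the trade-off concrete, here is a sketch contrasting the one-shot and
stream copying APIs for compressing a file (``input_path`` and ``output_path``
are hypothetical file names)::

    cctx = zstd.ZstdCompressor()

    # One-shot: the entire input and output are held in memory at once.
    with open(input_path, 'rb') as ifh:
        compressed = cctx.compress(ifh.read())

    # Streaming: data is read and written in chunks.
    with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
        cctx.copy_stream(ifh, ofh)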
568
569 Dictionary Creation and Management
570 ----------------------------------
571
572 Zstandard allows *dictionaries* to be used when compressing and
573 decompressing data. The idea is that if you are compressing a lot of similar
574 data, you can precompute common properties of that data (such as recurring
575 byte sequences) to achieve better compression ratios.
576
577 In Python, compression dictionaries are represented as the
578 ``ZstdCompressionDict`` type.
579
580 Instances can be constructed from bytes::
581
582 dict_data = zstd.ZstdCompressionDict(data)
583
584 More interestingly, instances can be created by *training* on sample data::
585
586 dict_data = zstd.train_dictionary(size, samples)
587
588 This takes a list of bytes instances and creates and returns a
589 ``ZstdCompressionDict``.
590
591 You can see how many bytes are in the dictionary by calling ``len()``::
592
593 dict_data = zstd.train_dictionary(size, samples)
594 dict_size = len(dict_data) # will not be larger than ``size``
595
596 Once you have a dictionary, you can pass it to the objects performing
597 compression and decompression::
598
599 dict_data = zstd.train_dictionary(16384, samples)
600
601 cctx = zstd.ZstdCompressor(dict_data=dict_data)
602 for source_data in input_data:
603 compressed = cctx.compress(source_data)
604 # Do something with compressed data.
605
606 dctx = zstd.ZstdDecompressor(dict_data=dict_data)
607 for compressed_data in input_data:
608 buffer = io.BytesIO()
609 with dctx.write_to(buffer) as decompressor:
610 decompressor.write(compressed_data)
611 # Do something with raw data in ``buffer``.
612
613 Dictionaries have unique integer IDs. You can retrieve this ID via::
614
615 dict_id = zstd.dictionary_id(dict_data)
616
617 You can obtain the raw data in the dict (useful for persisting and constructing
618 a ``ZstdCompressionDict`` later) via ``as_bytes()``::
619
620 dict_data = zstd.train_dictionary(size, samples)
621 raw_data = dict_data.as_bytes()
622
623 Explicit Compression Parameters
624 -------------------------------
625
626 Zstandard's integer compression levels along with the input size and dictionary
627 size are converted into a data structure defining multiple parameters to tune
628 behavior of the compression algorithm. It is possible to define this
629 data structure explicitly to have lower-level control over compression behavior.
630
631 The ``zstd.CompressionParameters`` type represents this data structure.
632 You can see how Zstandard converts compression levels to this data structure
633 by calling ``zstd.get_compression_parameters()``. e.g.::
634
635 params = zstd.get_compression_parameters(5)
636
637 This function also accepts the uncompressed data size and dictionary size
638 to adjust parameters::
639
640 params = zstd.get_compression_parameters(3, source_size=len(data), dict_size=len(dict_data))
641
642 You can also construct compression parameters from their low-level components::
643
644 params = zstd.CompressionParameters(20, 6, 12, 5, 4, 10, zstd.STRATEGY_FAST)
645
646 You can then configure a compressor to use the custom parameters::
647
648 cctx = zstd.ZstdCompressor(compression_params=params)
649
650 The members of the ``CompressionParameters`` tuple are as follows::
651
652 * 0 - Window log
653 * 1 - Chain log
654 * 2 - Hash log
655 * 3 - Search log
656 * 4 - Search length
657 * 5 - Target length
658 * 6 - Strategy (one of the ``zstd.STRATEGY_`` constants)
659
660 You'll need to read the Zstandard documentation for what these parameters
661 do.
662
663 Misc Functionality
664 ------------------
665
666 estimate_compression_context_size(CompressionParameters)
667 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
668
669 Given a ``CompressionParameters`` struct, estimate the memory size required
670 to perform compression.
671
672 estimate_decompression_context_size()
673 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
674
675 Estimate the memory size requirements for a decompressor instance.
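
A small sketch showing both estimates (the compression level is illustrative)::

    params = zstd.get_compression_parameters(3)
    compression_size = zstd.estimate_compression_context_size(params)
    decompression_size = zstd.estimate_decompression_context_size()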
676
677 Constants
678 ---------
679
680 The following module constants/attributes are exposed (a usage sketch follows the list):
681
682 ZSTD_VERSION
683 This module attribute exposes a 3-tuple of the Zstandard version. e.g.
684 ``(1, 0, 0)``
685 MAX_COMPRESSION_LEVEL
686 Integer max compression level accepted by compression functions
687 COMPRESSION_RECOMMENDED_INPUT_SIZE
688 Recommended chunk size to feed to compressor functions
689 COMPRESSION_RECOMMENDED_OUTPUT_SIZE
690 Recommended chunk size for compression output
691 DECOMPRESSION_RECOMMENDED_INPUT_SIZE
692 Recommended chunk size to feed into decompressor functions
693 DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE
694 Recommended chunk size for decompression output
695
696 FRAME_HEADER
697 Bytes containing the header of a Zstandard frame
698 MAGIC_NUMBER
699 Frame header as an integer
700
701 WINDOWLOG_MIN
702 Minimum value for compression parameter
703 WINDOWLOG_MAX
704 Maximum value for compression parameter
705 CHAINLOG_MIN
706 Minimum value for compression parameter
707 CHAINLOG_MAX
708 Maximum value for compression parameter
709 HASHLOG_MIN
710 Minimum value for compression parameter
711 HASHLOG_MAX
712 Maximum value for compression parameter
713 SEARCHLOG_MIN
714 Minimum value for compression parameter
715 SEARCHLOG_MAX
716 Maximum value for compression parameter
717 SEARCHLENGTH_MIN
718 Minimum value for compression parameter
719 SEARCHLENGTH_MAX
720 Maximum value for compression parameter
721 TARGETLENGTH_MIN
722 Minimum value for compression parameter
723 TARGETLENGTH_MAX
724 Maximum value for compression parameter
725 STRATEGY_FAST
726 Compression strategy
727 STRATEGY_DFAST
728 Compression strategy
729 STRATEGY_GREEDY
730 Compression strategy
731 STRATEGY_LAZY
732 Compression strategy
733 STRATEGY_LAZY2
734 Compression strategy
735 STRATEGY_BTLAZY2
736 Compression strategy
737 STRATEGY_BTOPT
738 Compression strategy
739
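The usage sketch below ties a few of these constants together; ``in_path`` and
``out_path`` are hypothetical file names and the chunking pattern is merely
illustrative::

    cctx = zstd.ZstdCompressor(level=zstd.MAX_COMPRESSION_LEVEL)
    with open(in_path, 'rb') as ifh, open(out_path, 'wb') as ofh:
        with cctx.write_to(ofh) as compressor:
            while True:
                chunk = ifh.read(zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE)
                if not chunk:
                    break
                compressor.write(chunk)
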
740 Note on Zstandard's *Experimental* API
741 ======================================
742
743 Many of the Zstandard APIs used by this module are marked as *experimental*
744 within the Zstandard project. This includes a large number of useful
745 features, such as compression and frame parameters and parts of dictionary
746 compression.
747
748 It is unclear how Zstandard's C API will evolve over time, especially with
749 regards to this *experimental* functionality. We will try to maintain
750 backwards compatibility at the Python API level. However, we cannot
751 guarantee this for things not under our control.
752
753 Since a copy of the Zstandard source code is distributed with this
754 module and since we compile against it, the behavior of a specific
755 version of this module should be constant for all of time. So if you
756 pin the version of this module used in your projects (which is a Python
757 best practice), you should be insulated from unwanted future changes.
758
759 Donate
760 ======
761
762 A lot of time has been invested into this project by the author.
763
764 If you find this project useful and would like to thank the author for
765 their work, consider donating some money. Any amount is appreciated.
766
767 .. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif
768 :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard&currency_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted
769 :alt: Donate via PayPal
770
771 .. |ci-status| image:: https://travis-ci.org/indygreg/python-zstandard.svg?branch=master
772 :target: https://travis-ci.org/indygreg/python-zstandard
773
774 .. |win-ci-status| image:: https://ci.appveyor.com/api/projects/status/github/indygreg/python-zstandard?svg=true
775 :target: https://ci.appveyor.com/project/indygreg/python-zstandard
776 :alt: Windows build status
@@ -0,0 +1,247 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 ZstdCompressionDict* train_dictionary(PyObject* self, PyObject* args, PyObject* kwargs) {
14 static char *kwlist[] = { "dict_size", "samples", "parameters", NULL };
15 size_t capacity;
16 PyObject* samples;
17 Py_ssize_t samplesLen;
18 PyObject* parameters = NULL;
19 ZDICT_params_t zparams;
20 Py_ssize_t sampleIndex;
21 Py_ssize_t sampleSize;
22 PyObject* sampleItem;
23 size_t zresult;
24 void* sampleBuffer;
25 void* sampleOffset;
26 size_t samplesSize = 0;
27 size_t* sampleSizes;
28 void* dict;
29 ZstdCompressionDict* result;
30
31 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "nO!|O!", kwlist,
32 &capacity,
33 &PyList_Type, &samples,
34 (PyObject*)&DictParametersType, &parameters)) {
35 return NULL;
36 }
37
38 /* Validate parameters first since it is easiest. */
39 zparams.selectivityLevel = 0;
40 zparams.compressionLevel = 0;
41 zparams.notificationLevel = 0;
42 zparams.dictID = 0;
43 zparams.reserved[0] = 0;
44 zparams.reserved[1] = 0;
45
46 if (parameters) {
47 /* TODO validate data ranges */
48 zparams.selectivityLevel = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 0));
49 zparams.compressionLevel = PyLong_AsLong(PyTuple_GetItem(parameters, 1));
50 zparams.notificationLevel = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 2));
51 zparams.dictID = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 3));
52 }
53
54 /* Figure out the size of the raw samples */
55 samplesLen = PyList_Size(samples);
56 for (sampleIndex = 0; sampleIndex < samplesLen; sampleIndex++) {
57 sampleItem = PyList_GetItem(samples, sampleIndex);
58 if (!PyBytes_Check(sampleItem)) {
59 PyErr_SetString(PyExc_ValueError, "samples must be bytes");
60 /* TODO probably need to perform DECREF here */
61 return NULL;
62 }
63 samplesSize += PyBytes_GET_SIZE(sampleItem);
64 }
65
66 /* Now that we know the total size of the raw samples, we can allocate
67 a buffer for the raw data */
68 sampleBuffer = malloc(samplesSize);
69 if (!sampleBuffer) {
70 PyErr_NoMemory();
71 return NULL;
72 }
73 sampleSizes = malloc(samplesLen * sizeof(size_t));
74 if (!sampleSizes) {
75 free(sampleBuffer);
76 PyErr_NoMemory();
77 return NULL;
78 }
79
80 sampleOffset = sampleBuffer;
81 /* Now iterate again and assemble the samples in the buffer */
82 for (sampleIndex = 0; sampleIndex < samplesLen; sampleIndex++) {
83 sampleItem = PyList_GetItem(samples, sampleIndex);
84 sampleSize = PyBytes_GET_SIZE(sampleItem);
85 sampleSizes[sampleIndex] = sampleSize;
86 memcpy(sampleOffset, PyBytes_AS_STRING(sampleItem), sampleSize);
87 sampleOffset = (char*)sampleOffset + sampleSize;
88 }
89
90 dict = malloc(capacity);
91 if (!dict) {
92 free(sampleSizes);
93 free(sampleBuffer);
94 PyErr_NoMemory();
95 return NULL;
96 }
97
98 zresult = ZDICT_trainFromBuffer_advanced(dict, capacity,
99 sampleBuffer, sampleSizes, (unsigned int)samplesLen,
100 zparams);
101 if (ZDICT_isError(zresult)) {
102 PyErr_Format(ZstdError, "Cannot train dict: %s", ZDICT_getErrorName(zresult));
103 free(dict);
104 free(sampleSizes);
105 free(sampleBuffer);
106 return NULL;
107 }
108
109 result = PyObject_New(ZstdCompressionDict, &ZstdCompressionDictType);
110 if (!result) {
111 return NULL;
112 }
113
114 result->dictData = dict;
115 result->dictSize = zresult;
116 return result;
117 }
118
119
120 PyDoc_STRVAR(ZstdCompressionDict__doc__,
121 "ZstdCompressionDict(data) - Represents a computed compression dictionary\n"
122 "\n"
123 "This type holds the results of a computed Zstandard compression dictionary.\n"
124 "Instances are obtained by calling ``train_dictionary()`` or by passing bytes\n"
125 "obtained from another source into the constructor.\n"
126 );
127
128 static int ZstdCompressionDict_init(ZstdCompressionDict* self, PyObject* args) {
129 const char* source;
130 Py_ssize_t sourceSize;
131
132 self->dictData = NULL;
133 self->dictSize = 0;
134
135 #if PY_MAJOR_VERSION >= 3
136 if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) {
137 #else
138 if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) {
139 #endif
140 return -1;
141 }
142
143 self->dictData = malloc(sourceSize);
144 if (!self->dictData) {
145 PyErr_NoMemory();
146 return -1;
147 }
148
149 memcpy(self->dictData, source, sourceSize);
150 self->dictSize = sourceSize;
151
152 return 0;
153 }
154
155 static void ZstdCompressionDict_dealloc(ZstdCompressionDict* self) {
156 if (self->dictData) {
157 free(self->dictData);
158 self->dictData = NULL;
159 }
160
161 PyObject_Del(self);
162 }
163
164 static PyObject* ZstdCompressionDict_dict_id(ZstdCompressionDict* self) {
165 unsigned dictID = ZDICT_getDictID(self->dictData, self->dictSize);
166
167 return PyLong_FromLong(dictID);
168 }
169
170 static PyObject* ZstdCompressionDict_as_bytes(ZstdCompressionDict* self) {
171 return PyBytes_FromStringAndSize(self->dictData, self->dictSize);
172 }
173
174 static PyMethodDef ZstdCompressionDict_methods[] = {
175 { "dict_id", (PyCFunction)ZstdCompressionDict_dict_id, METH_NOARGS,
176 PyDoc_STR("dict_id() -- obtain the numeric dictionary ID") },
177 { "as_bytes", (PyCFunction)ZstdCompressionDict_as_bytes, METH_NOARGS,
178 PyDoc_STR("as_bytes() -- obtain the raw bytes constituting the dictionary data") },
179 { NULL, NULL }
180 };
181
182 static Py_ssize_t ZstdCompressionDict_length(ZstdCompressionDict* self) {
183 return self->dictSize;
184 }
185
186 static PySequenceMethods ZstdCompressionDict_sq = {
187 (lenfunc)ZstdCompressionDict_length, /* sq_length */
188 0, /* sq_concat */
189 0, /* sq_repeat */
190 0, /* sq_item */
191 0, /* sq_ass_item */
192 0, /* sq_contains */
193 0, /* sq_inplace_concat */
194 0 /* sq_inplace_repeat */
195 };
196
197 PyTypeObject ZstdCompressionDictType = {
198 PyVarObject_HEAD_INIT(NULL, 0)
199 "zstd.ZstdCompressionDict", /* tp_name */
200 sizeof(ZstdCompressionDict), /* tp_basicsize */
201 0, /* tp_itemsize */
202 (destructor)ZstdCompressionDict_dealloc, /* tp_dealloc */
203 0, /* tp_print */
204 0, /* tp_getattr */
205 0, /* tp_setattr */
206 0, /* tp_compare */
207 0, /* tp_repr */
208 0, /* tp_as_number */
209 &ZstdCompressionDict_sq, /* tp_as_sequence */
210 0, /* tp_as_mapping */
211 0, /* tp_hash */
212 0, /* tp_call */
213 0, /* tp_str */
214 0, /* tp_getattro */
215 0, /* tp_setattro */
216 0, /* tp_as_buffer */
217 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
218 ZstdCompressionDict__doc__, /* tp_doc */
219 0, /* tp_traverse */
220 0, /* tp_clear */
221 0, /* tp_richcompare */
222 0, /* tp_weaklistoffset */
223 0, /* tp_iter */
224 0, /* tp_iternext */
225 ZstdCompressionDict_methods, /* tp_methods */
226 0, /* tp_members */
227 0, /* tp_getset */
228 0, /* tp_base */
229 0, /* tp_dict */
230 0, /* tp_descr_get */
231 0, /* tp_descr_set */
232 0, /* tp_dictoffset */
233 (initproc)ZstdCompressionDict_init, /* tp_init */
234 0, /* tp_alloc */
235 PyType_GenericNew, /* tp_new */
236 };
237
238 void compressiondict_module_init(PyObject* mod) {
239 Py_TYPE(&ZstdCompressionDictType) = &PyType_Type;
240 if (PyType_Ready(&ZstdCompressionDictType) < 0) {
241 return;
242 }
243
244 Py_INCREF((PyObject*)&ZstdCompressionDictType);
245 PyModule_AddObject(mod, "ZstdCompressionDict",
246 (PyObject*)&ZstdCompressionDictType);
247 }
@@ -0,0 +1,226 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 void ztopy_compression_parameters(CompressionParametersObject* params, ZSTD_compressionParameters* zparams) {
12 zparams->windowLog = params->windowLog;
13 zparams->chainLog = params->chainLog;
14 zparams->hashLog = params->hashLog;
15 zparams->searchLog = params->searchLog;
16 zparams->searchLength = params->searchLength;
17 zparams->targetLength = params->targetLength;
18 zparams->strategy = params->strategy;
19 }
20
21 CompressionParametersObject* get_compression_parameters(PyObject* self, PyObject* args) {
22 int compressionLevel;
23 unsigned PY_LONG_LONG sourceSize = 0;
24 Py_ssize_t dictSize = 0;
25 ZSTD_compressionParameters params;
26 CompressionParametersObject* result;
27
28 if (!PyArg_ParseTuple(args, "i|Kn", &compressionLevel, &sourceSize, &dictSize)) {
29 return NULL;
30 }
31
32 params = ZSTD_getCParams(compressionLevel, sourceSize, dictSize);
33
34 result = PyObject_New(CompressionParametersObject, &CompressionParametersType);
35 if (!result) {
36 return NULL;
37 }
38
39 result->windowLog = params.windowLog;
40 result->chainLog = params.chainLog;
41 result->hashLog = params.hashLog;
42 result->searchLog = params.searchLog;
43 result->searchLength = params.searchLength;
44 result->targetLength = params.targetLength;
45 result->strategy = params.strategy;
46
47 return result;
48 }
49
50 PyObject* estimate_compression_context_size(PyObject* self, PyObject* args) {
51 CompressionParametersObject* params;
52 ZSTD_compressionParameters zparams;
53 PyObject* result;
54
55 if (!PyArg_ParseTuple(args, "O!", &CompressionParametersType, &params)) {
56 return NULL;
57 }
58
59 ztopy_compression_parameters(params, &zparams);
60 result = PyLong_FromSize_t(ZSTD_estimateCCtxSize(zparams));
61 return result;
62 }
63
64 PyDoc_STRVAR(CompressionParameters__doc__,
65 "CompressionParameters: low-level control over zstd compression");
66
67 static PyObject* CompressionParameters_new(PyTypeObject* subtype, PyObject* args, PyObject* kwargs) {
68 CompressionParametersObject* self;
69 unsigned windowLog;
70 unsigned chainLog;
71 unsigned hashLog;
72 unsigned searchLog;
73 unsigned searchLength;
74 unsigned targetLength;
75 unsigned strategy;
76
77 if (!PyArg_ParseTuple(args, "IIIIIII", &windowLog, &chainLog, &hashLog, &searchLog,
78 &searchLength, &targetLength, &strategy)) {
79 return NULL;
80 }
81
82 if (windowLog < ZSTD_WINDOWLOG_MIN || windowLog > ZSTD_WINDOWLOG_MAX) {
83 PyErr_SetString(PyExc_ValueError, "invalid window log value");
84 return NULL;
85 }
86
87 if (chainLog < ZSTD_CHAINLOG_MIN || chainLog > ZSTD_CHAINLOG_MAX) {
88 PyErr_SetString(PyExc_ValueError, "invalid chain log value");
89 return NULL;
90 }
91
92 if (hashLog < ZSTD_HASHLOG_MIN || hashLog > ZSTD_HASHLOG_MAX) {
93 PyErr_SetString(PyExc_ValueError, "invalid hash log value");
94 return NULL;
95 }
96
97 if (searchLog < ZSTD_SEARCHLOG_MIN || searchLog > ZSTD_SEARCHLOG_MAX) {
98 PyErr_SetString(PyExc_ValueError, "invalid search log value");
99 return NULL;
100 }
101
102 if (searchLength < ZSTD_SEARCHLENGTH_MIN || searchLength > ZSTD_SEARCHLENGTH_MAX) {
103 PyErr_SetString(PyExc_ValueError, "invalid search length value");
104 return NULL;
105 }
106
107 if (targetLength < ZSTD_TARGETLENGTH_MIN || targetLength > ZSTD_TARGETLENGTH_MAX) {
108 PyErr_SetString(PyExc_ValueError, "invalid target length value");
109 return NULL;
110 }
111
112 if (strategy < ZSTD_fast || strategy > ZSTD_btopt) {
113 PyErr_SetString(PyExc_ValueError, "invalid strategy value");
114 return NULL;
115 }
116
117 self = (CompressionParametersObject*)subtype->tp_alloc(subtype, 1);
118 if (!self) {
119 return NULL;
120 }
121
122 self->windowLog = windowLog;
123 self->chainLog = chainLog;
124 self->hashLog = hashLog;
125 self->searchLog = searchLog;
126 self->searchLength = searchLength;
127 self->targetLength = targetLength;
128 self->strategy = strategy;
129
130 return (PyObject*)self;
131 }
132
133 static void CompressionParameters_dealloc(PyObject* self) {
134 PyObject_Del(self);
135 }
136
137 static Py_ssize_t CompressionParameters_length(PyObject* self) {
138 return 7;
139 };
140
141 static PyObject* CompressionParameters_item(PyObject* o, Py_ssize_t i) {
142 CompressionParametersObject* self = (CompressionParametersObject*)o;
143
144 switch (i) {
145 case 0:
146 return PyLong_FromLong(self->windowLog);
147 case 1:
148 return PyLong_FromLong(self->chainLog);
149 case 2:
150 return PyLong_FromLong(self->hashLog);
151 case 3:
152 return PyLong_FromLong(self->searchLog);
153 case 4:
154 return PyLong_FromLong(self->searchLength);
155 case 5:
156 return PyLong_FromLong(self->targetLength);
157 case 6:
158 return PyLong_FromLong(self->strategy);
159 default:
160 PyErr_SetString(PyExc_IndexError, "index out of range");
161 return NULL;
162 }
163 }
164
165 static PySequenceMethods CompressionParameters_sq = {
166 CompressionParameters_length, /* sq_length */
167 0, /* sq_concat */
168 0, /* sq_repeat */
169 CompressionParameters_item, /* sq_item */
170 0, /* sq_ass_item */
171 0, /* sq_contains */
172 0, /* sq_inplace_concat */
173 0 /* sq_inplace_repeat */
174 };
175
176 PyTypeObject CompressionParametersType = {
177 PyVarObject_HEAD_INIT(NULL, 0)
178 "CompressionParameters", /* tp_name */
179 sizeof(CompressionParametersObject), /* tp_basicsize */
180 0, /* tp_itemsize */
181 (destructor)CompressionParameters_dealloc, /* tp_dealloc */
182 0, /* tp_print */
183 0, /* tp_getattr */
184 0, /* tp_setattr */
185 0, /* tp_compare */
186 0, /* tp_repr */
187 0, /* tp_as_number */
188 &CompressionParameters_sq, /* tp_as_sequence */
189 0, /* tp_as_mapping */
190 0, /* tp_hash */
191 0, /* tp_call */
192 0, /* tp_str */
193 0, /* tp_getattro */
194 0, /* tp_setattro */
195 0, /* tp_as_buffer */
196 Py_TPFLAGS_DEFAULT, /* tp_flags */
197 CompressionParameters__doc__, /* tp_doc */
198 0, /* tp_traverse */
199 0, /* tp_clear */
200 0, /* tp_richcompare */
201 0, /* tp_weaklistoffset */
202 0, /* tp_iter */
203 0, /* tp_iternext */
204 0, /* tp_methods */
205 0, /* tp_members */
206 0, /* tp_getset */
207 0, /* tp_base */
208 0, /* tp_dict */
209 0, /* tp_descr_get */
210 0, /* tp_descr_set */
211 0, /* tp_dictoffset */
212 0, /* tp_init */
213 0, /* tp_alloc */
214 CompressionParameters_new, /* tp_new */
215 };
216
217 void compressionparams_module_init(PyObject* mod) {
218 Py_TYPE(&CompressionParametersType) = &PyType_Type;
219 if (PyType_Ready(&CompressionParametersType) < 0) {
220 return;
221 }
222
223 Py_IncRef((PyObject*)&CompressionParametersType);
224 PyModule_AddObject(mod, "CompressionParameters",
225 (PyObject*)&CompressionParametersType);
226 }
@@ -0,0 +1,235 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 PyDoc_STRVAR(ZstdCompresssionWriter__doc__,
14 """A context manager used for writing compressed output to a writer.\n"
15 );
16
17 static void ZstdCompressionWriter_dealloc(ZstdCompressionWriter* self) {
18 Py_XDECREF(self->compressor);
19 Py_XDECREF(self->writer);
20
21 if (self->cstream) {
22 ZSTD_freeCStream(self->cstream);
23 self->cstream = NULL;
24 }
25
26 PyObject_Del(self);
27 }
28
29 static PyObject* ZstdCompressionWriter_enter(ZstdCompressionWriter* self) {
30 if (self->entered) {
31 PyErr_SetString(ZstdError, "cannot __enter__ multiple times");
32 return NULL;
33 }
34
35 self->cstream = CStream_from_ZstdCompressor(self->compressor, self->sourceSize);
36 if (!self->cstream) {
37 return NULL;
38 }
39
40 self->entered = 1;
41
42 Py_INCREF(self);
43 return (PyObject*)self;
44 }
45
46 static PyObject* ZstdCompressionWriter_exit(ZstdCompressionWriter* self, PyObject* args) {
47 PyObject* exc_type;
48 PyObject* exc_value;
49 PyObject* exc_tb;
50 size_t zresult;
51
52 ZSTD_outBuffer output;
53 PyObject* res;
54
55 if (!PyArg_ParseTuple(args, "OOO", &exc_type, &exc_value, &exc_tb)) {
56 return NULL;
57 }
58
59 self->entered = 0;
60
61 if (self->cstream && exc_type == Py_None && exc_value == Py_None &&
62 exc_tb == Py_None) {
63
64 output.dst = malloc(self->outSize);
65 if (!output.dst) {
66 return PyErr_NoMemory();
67 }
68 output.size = self->outSize;
69 output.pos = 0;
70
71 while (1) {
72 zresult = ZSTD_endStream(self->cstream, &output);
73 if (ZSTD_isError(zresult)) {
74 PyErr_Format(ZstdError, "error ending compression stream: %s",
75 ZSTD_getErrorName(zresult));
76 free(output.dst);
77 return NULL;
78 }
79
80 if (output.pos) {
81 #if PY_MAJOR_VERSION >= 3
82 res = PyObject_CallMethod(self->writer, "write", "y#",
83 #else
84 res = PyObject_CallMethod(self->writer, "write", "s#",
85 #endif
86 output.dst, output.pos);
87 Py_XDECREF(res);
88 }
89
90 if (!zresult) {
91 break;
92 }
93
94 output.pos = 0;
95 }
96
97 free(output.dst);
98 ZSTD_freeCStream(self->cstream);
99 self->cstream = NULL;
100 }
101
102 Py_RETURN_FALSE;
103 }
104
105 static PyObject* ZstdCompressionWriter_memory_size(ZstdCompressionWriter* self) {
106 if (!self->cstream) {
107 PyErr_SetString(ZstdError, "cannot determine size of an inactive compressor; "
108 "call when a context manager is active");
109 return NULL;
110 }
111
112 return PyLong_FromSize_t(ZSTD_sizeof_CStream(self->cstream));
113 }
114
115 static PyObject* ZstdCompressionWriter_write(ZstdCompressionWriter* self, PyObject* args) {
116 const char* source;
117 Py_ssize_t sourceSize;
118 size_t zresult;
119 ZSTD_inBuffer input;
120 ZSTD_outBuffer output;
121 PyObject* res;
122
123 #if PY_MAJOR_VERSION >= 3
124 if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) {
125 #else
126 if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) {
127 #endif
128 return NULL;
129 }
130
131 if (!self->entered) {
132 PyErr_SetString(ZstdError, "compress must be called from an active context manager");
133 return NULL;
134 }
135
136 output.dst = malloc(self->outSize);
137 if (!output.dst) {
138 return PyErr_NoMemory();
139 }
140 output.size = self->outSize;
141 output.pos = 0;
142
143 input.src = source;
144 input.size = sourceSize;
145 input.pos = 0;
146
147 while ((ssize_t)input.pos < sourceSize) {
148 Py_BEGIN_ALLOW_THREADS
149 zresult = ZSTD_compressStream(self->cstream, &output, &input);
150 Py_END_ALLOW_THREADS
151
152 if (ZSTD_isError(zresult)) {
153 free(output.dst);
154 PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
155 return NULL;
156 }
157
158 /* Copy data from output buffer to writer. */
159 if (output.pos) {
160 #if PY_MAJOR_VERSION >= 3
161 res = PyObject_CallMethod(self->writer, "write", "y#",
162 #else
163 res = PyObject_CallMethod(self->writer, "write", "s#",
164 #endif
165 output.dst, output.pos);
166 Py_XDECREF(res);
167 }
168 output.pos = 0;
169 }
170
171 free(output.dst);
172
173 /* TODO return bytes written */
174 Py_RETURN_NONE;
175 }
176
177 static PyMethodDef ZstdCompressionWriter_methods[] = {
178 { "__enter__", (PyCFunction)ZstdCompressionWriter_enter, METH_NOARGS,
179 PyDoc_STR("Enter a compression context.") },
180 { "__exit__", (PyCFunction)ZstdCompressionWriter_exit, METH_VARARGS,
181 PyDoc_STR("Exit a compression context.") },
182 { "memory_size", (PyCFunction)ZstdCompressionWriter_memory_size, METH_NOARGS,
183 PyDoc_STR("Obtain the memory size of the underlying compressor") },
184 { "write", (PyCFunction)ZstdCompressionWriter_write, METH_VARARGS,
185 PyDoc_STR("Compress data") },
186 { NULL, NULL }
187 };
188
189 PyTypeObject ZstdCompressionWriterType = {
190 PyVarObject_HEAD_INIT(NULL, 0)
191 "zstd.ZstdCompressionWriter", /* tp_name */
192 sizeof(ZstdCompressionWriter), /* tp_basicsize */
193 0, /* tp_itemsize */
194 (destructor)ZstdCompressionWriter_dealloc, /* tp_dealloc */
195 0, /* tp_print */
196 0, /* tp_getattr */
197 0, /* tp_setattr */
198 0, /* tp_compare */
199 0, /* tp_repr */
200 0, /* tp_as_number */
201 0, /* tp_as_sequence */
202 0, /* tp_as_mapping */
203 0, /* tp_hash */
204 0, /* tp_call */
205 0, /* tp_str */
206 0, /* tp_getattro */
207 0, /* tp_setattro */
208 0, /* tp_as_buffer */
209 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
210 ZstdCompresssionWriter__doc__, /* tp_doc */
211 0, /* tp_traverse */
212 0, /* tp_clear */
213 0, /* tp_richcompare */
214 0, /* tp_weaklistoffset */
215 0, /* tp_iter */
216 0, /* tp_iternext */
217 ZstdCompressionWriter_methods, /* tp_methods */
218 0, /* tp_members */
219 0, /* tp_getset */
220 0, /* tp_base */
221 0, /* tp_dict */
222 0, /* tp_descr_get */
223 0, /* tp_descr_set */
224 0, /* tp_dictoffset */
225 0, /* tp_init */
226 0, /* tp_alloc */
227 PyType_GenericNew, /* tp_new */
228 };
229
230 void compressionwriter_module_init(PyObject* mod) {
231 Py_TYPE(&ZstdCompressionWriterType) = &PyType_Type;
232 if (PyType_Ready(&ZstdCompressionWriterType) < 0) {
233 return;
234 }
235 }
@@ -0,0 +1,205 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 PyDoc_STRVAR(ZstdCompressionObj__doc__,
14 "Perform compression using a standard library compatible API.\n"
15 );
16
17 static void ZstdCompressionObj_dealloc(ZstdCompressionObj* self) {
18 PyMem_Free(self->output.dst);
19 self->output.dst = NULL;
20
21 if (self->cstream) {
22 ZSTD_freeCStream(self->cstream);
23 self->cstream = NULL;
24 }
25
26 Py_XDECREF(self->compressor);
27
28 PyObject_Del(self);
29 }
30
31 static PyObject* ZstdCompressionObj_compress(ZstdCompressionObj* self, PyObject* args) {
32 const char* source;
33 Py_ssize_t sourceSize;
34 ZSTD_inBuffer input;
35 size_t zresult;
36 PyObject* result = NULL;
37 Py_ssize_t resultSize = 0;
38
39 if (self->flushed) {
40 PyErr_SetString(ZstdError, "cannot call compress() after flush() has been called");
41 return NULL;
42 }
43
44 #if PY_MAJOR_VERSION >= 3
45 if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) {
46 #else
47 if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) {
48 #endif
49 return NULL;
50 }
51
52 input.src = source;
53 input.size = sourceSize;
54 input.pos = 0;
55
56 while ((ssize_t)input.pos < sourceSize) {
57 Py_BEGIN_ALLOW_THREADS
58 zresult = ZSTD_compressStream(self->cstream, &self->output, &input);
59 Py_END_ALLOW_THREADS
60
61 if (ZSTD_isError(zresult)) {
62 PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
63 return NULL;
64 }
65
66 if (self->output.pos) {
67 if (result) {
68 resultSize = PyBytes_GET_SIZE(result);
69 if (-1 == _PyBytes_Resize(&result, resultSize + self->output.pos)) {
70 return NULL;
71 }
72
73 memcpy(PyBytes_AS_STRING(result) + resultSize,
74 self->output.dst, self->output.pos);
75 }
76 else {
77 result = PyBytes_FromStringAndSize(self->output.dst, self->output.pos);
78 if (!result) {
79 return NULL;
80 }
81 }
82
83 self->output.pos = 0;
84 }
85 }
86
87 if (result) {
88 return result;
89 }
90 else {
91 return PyBytes_FromString("");
92 }
93 }
94
95 static PyObject* ZstdCompressionObj_flush(ZstdCompressionObj* self) {
96 size_t zresult;
97 PyObject* result = NULL;
98 Py_ssize_t resultSize = 0;
99
100 if (self->flushed) {
101 PyErr_SetString(ZstdError, "flush() already called");
102 return NULL;
103 }
104
105 self->flushed = 1;
106
107 while (1) {
108 zresult = ZSTD_endStream(self->cstream, &self->output);
109 if (ZSTD_isError(zresult)) {
110 PyErr_Format(ZstdError, "error ending compression stream: %s",
111 ZSTD_getErrorName(zresult));
112 return NULL;
113 }
114
115 if (self->output.pos) {
116 if (result) {
117 resultSize = PyBytes_GET_SIZE(result);
118 if (-1 == _PyBytes_Resize(&result, resultSize + self->output.pos)) {
119 return NULL;
120 }
121
122 memcpy(PyBytes_AS_STRING(result) + resultSize,
123 self->output.dst, self->output.pos);
124 }
125 else {
126 result = PyBytes_FromStringAndSize(self->output.dst, self->output.pos);
127 if (!result) {
128 return NULL;
129 }
130 }
131
132 self->output.pos = 0;
133 }
134
135 if (!zresult) {
136 break;
137 }
138 }
139
140 ZSTD_freeCStream(self->cstream);
141 self->cstream = NULL;
142
143 if (result) {
144 return result;
145 }
146 else {
147 return PyBytes_FromString("");
148 }
149 }
150
151 static PyMethodDef ZstdCompressionObj_methods[] = {
152 { "compress", (PyCFunction)ZstdCompressionObj_compress, METH_VARARGS,
153 PyDoc_STR("compress data") },
154 { "flush", (PyCFunction)ZstdCompressionObj_flush, METH_NOARGS,
155 PyDoc_STR("finish compression operation") },
156 { NULL, NULL }
157 };
158
159 PyTypeObject ZstdCompressionObjType = {
160 PyVarObject_HEAD_INIT(NULL, 0)
161 "zstd.ZstdCompressionObj", /* tp_name */
162 sizeof(ZstdCompressionObj), /* tp_basicsize */
163 0, /* tp_itemsize */
164 (destructor)ZstdCompressionObj_dealloc, /* tp_dealloc */
165 0, /* tp_print */
166 0, /* tp_getattr */
167 0, /* tp_setattr */
168 0, /* tp_compare */
169 0, /* tp_repr */
170 0, /* tp_as_number */
171 0, /* tp_as_sequence */
172 0, /* tp_as_mapping */
173 0, /* tp_hash */
174 0, /* tp_call */
175 0, /* tp_str */
176 0, /* tp_getattro */
177 0, /* tp_setattro */
178 0, /* tp_as_buffer */
179 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
180 ZstdCompressionObj__doc__, /* tp_doc */
181 0, /* tp_traverse */
182 0, /* tp_clear */
183 0, /* tp_richcompare */
184 0, /* tp_weaklistoffset */
185 0, /* tp_iter */
186 0, /* tp_iternext */
187 ZstdCompressionObj_methods, /* tp_methods */
188 0, /* tp_members */
189 0, /* tp_getset */
190 0, /* tp_base */
191 0, /* tp_dict */
192 0, /* tp_descr_get */
193 0, /* tp_descr_set */
194 0, /* tp_dictoffset */
195 0, /* tp_init */
196 0, /* tp_alloc */
197 PyType_GenericNew, /* tp_new */
198 };
199
200 void compressobj_module_init(PyObject* module) {
201 Py_TYPE(&ZstdCompressionObjType) = &PyType_Type;
202 if (PyType_Ready(&ZstdCompressionObjType) < 0) {
203 return;
204 }
205 }
@@ -0,0 +1,757 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 /**
14 * Initialize a zstd CStream from a ZstdCompressor instance.
15 *
16 * Returns a ZSTD_CStream on success or NULL on failure. If NULL, a Python
17 * exception will be set.
18 */
19 ZSTD_CStream* CStream_from_ZstdCompressor(ZstdCompressor* compressor, Py_ssize_t sourceSize) {
20 ZSTD_CStream* cstream;
21 ZSTD_parameters zparams;
22 void* dictData = NULL;
23 size_t dictSize = 0;
24 size_t zresult;
25
26 cstream = ZSTD_createCStream();
27 if (!cstream) {
28 PyErr_SetString(ZstdError, "cannot create CStream");
29 return NULL;
30 }
31
32 if (compressor->dict) {
33 dictData = compressor->dict->dictData;
34 dictSize = compressor->dict->dictSize;
35 }
36
37 memset(&zparams, 0, sizeof(zparams));
38 if (compressor->cparams) {
39 ztopy_compression_parameters(compressor->cparams, &zparams.cParams);
40 /* Do NOT call ZSTD_adjustCParams() here because the compression params
41 come from the user. */
42 }
43 else {
44 zparams.cParams = ZSTD_getCParams(compressor->compressionLevel, sourceSize, dictSize);
45 }
46
47 zparams.fParams = compressor->fparams;
48
49 zresult = ZSTD_initCStream_advanced(cstream, dictData, dictSize, zparams, sourceSize);
50
51 if (ZSTD_isError(zresult)) {
52 ZSTD_freeCStream(cstream);
53 PyErr_Format(ZstdError, "cannot init CStream: %s", ZSTD_getErrorName(zresult));
54 return NULL;
55 }
56
57 return cstream;
58 }
59
60
61 PyDoc_STRVAR(ZstdCompressor__doc__,
62 "ZstdCompressor(level=None, dict_data=None, compression_params=None)\n"
63 "\n"
64 "Create an object used to perform Zstandard compression.\n"
65 "\n"
66 "An instance can compress data various ways. Instances can be used multiple\n"
67 "times. Each compression operation will use the compression parameters\n"
68 "defined at construction time.\n"
69 "\n"
70 "Compression can be configured via the following names arguments:\n"
71 "\n"
72 "level\n"
73 " Integer compression level.\n"
74 "dict_data\n"
75 " A ``ZstdCompressionDict`` to be used to compress with dictionary data.\n"
76 "compression_params\n"
77 " A ``CompressionParameters`` instance defining low-level compression"
78 " parameters. If defined, this will overwrite the ``level`` argument.\n"
79 "write_checksum\n"
80 " If True, a 4 byte content checksum will be written with the compressed\n"
81 " data, allowing the decompressor to perform content verification.\n"
82 "write_content_size\n"
83 " If True, the decompressed content size will be included in the header of\n"
84 " the compressed data. This data will only be written if the compressor\n"
85 " knows the size of the input data.\n"
86 "write_dict_id\n"
87 " Determines whether the dictionary ID will be written into the compressed\n"
88 " data. Defaults to True. Only adds content to the compressed data if\n"
89 " a dictionary is being used.\n"
90 );
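The docstring above maps directly onto the Python-level constructor. A minimal usage sketch (all values illustrative; the module is assumed to be importable as ``zstd``)::

    import zstd

    # A reusable compressor; level defaults to 3 per ZstdCompressor_init below.
    cctx = zstd.ZstdCompressor(level=10,
                               write_checksum=True,
                               write_content_size=True)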
91
92 static int ZstdCompressor_init(ZstdCompressor* self, PyObject* args, PyObject* kwargs) {
93 static char* kwlist[] = {
94 "level",
95 "dict_data",
96 "compression_params",
97 "write_checksum",
98 "write_content_size",
99 "write_dict_id",
100 NULL
101 };
102
103 int level = 3;
104 ZstdCompressionDict* dict = NULL;
105 CompressionParametersObject* params = NULL;
106 PyObject* writeChecksum = NULL;
107 PyObject* writeContentSize = NULL;
108 PyObject* writeDictID = NULL;
109
110 self->dict = NULL;
111 self->cparams = NULL;
112 self->cdict = NULL;
113
114 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iO!O!OOO", kwlist,
115 &level, &ZstdCompressionDictType, &dict,
116 &CompressionParametersType, &params,
117 &writeChecksum, &writeContentSize, &writeDictID)) {
118 return -1;
119 }
120
121 if (level < 1) {
122 PyErr_SetString(PyExc_ValueError, "level must be greater than 0");
123 return -1;
124 }
125
126 if (level > ZSTD_maxCLevel()) {
127 PyErr_Format(PyExc_ValueError, "level must be less than %d",
128 ZSTD_maxCLevel() + 1);
129 return -1;
130 }
131
132 self->compressionLevel = level;
133
134 if (dict) {
135 self->dict = dict;
136 Py_INCREF(dict);
137 }
138
139 if (params) {
140 self->cparams = params;
141 Py_INCREF(params);
142 }
143
144 memset(&self->fparams, 0, sizeof(self->fparams));
145
146 if (writeChecksum && PyObject_IsTrue(writeChecksum)) {
147 self->fparams.checksumFlag = 1;
148 }
149 if (writeContentSize && PyObject_IsTrue(writeContentSize)) {
150 self->fparams.contentSizeFlag = 1;
151 }
152 if (writeDictID && PyObject_Not(writeDictID)) {
153 self->fparams.noDictIDFlag = 1;
154 }
155
156 return 0;
157 }
158
159 static void ZstdCompressor_dealloc(ZstdCompressor* self) {
160 Py_XDECREF(self->cparams);
161 Py_XDECREF(self->dict);
162
163 if (self->cdict) {
164 ZSTD_freeCDict(self->cdict);
165 self->cdict = NULL;
166 }
167
168 PyObject_Del(self);
169 }
170
171 PyDoc_STRVAR(ZstdCompressor_copy_stream__doc__,
172 "copy_stream(ifh, ofh[, size=0, read_size=default, write_size=default])\n"
173 "compress data between streams\n"
174 "\n"
175 "Data will be read from ``ifh``, compressed, and written to ``ofh``.\n"
176 "``ifh`` must have a ``read(size)`` method. ``ofh`` must have a ``write(data)``\n"
177 "method.\n"
178 "\n"
179 "An optional ``size`` argument specifies the size of the source stream.\n"
180 "If defined, compression parameters will be tuned based on the size.\n"
181 "\n"
182 "Optional arguments ``read_size`` and ``write_size`` define the chunk sizes\n"
183 "of ``read()`` and ``write()`` operations, respectively. By default, they use\n"
184 "the default compression stream input and output sizes, respectively.\n"
185 );
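A sketch of driving ``copy_stream()`` from Python, based on the docstring above (file names are hypothetical). Per the implementation below, the call returns a ``(bytes_read, bytes_written)`` tuple::

    import zstd

    cctx = zstd.ZstdCompressor()
    with open('input.bin', 'rb') as ifh, open('output.zst', 'wb') as ofh:
        read_count, write_count = cctx.copy_stream(ifh, ofh)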
186
187 static PyObject* ZstdCompressor_copy_stream(ZstdCompressor* self, PyObject* args, PyObject* kwargs) {
188 static char* kwlist[] = {
189 "ifh",
190 "ofh",
191 "size",
192 "read_size",
193 "write_size",
194 NULL
195 };
196
197 PyObject* source;
198 PyObject* dest;
199 Py_ssize_t sourceSize = 0;
200 size_t inSize = ZSTD_CStreamInSize();
201 size_t outSize = ZSTD_CStreamOutSize();
202 ZSTD_CStream* cstream;
203 ZSTD_inBuffer input;
204 ZSTD_outBuffer output;
205 Py_ssize_t totalRead = 0;
206 Py_ssize_t totalWrite = 0;
207 char* readBuffer;
208 Py_ssize_t readSize;
209 PyObject* readResult;
210 PyObject* res = NULL;
211 size_t zresult;
212 PyObject* writeResult;
213 PyObject* totalReadPy;
214 PyObject* totalWritePy;
215
216 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nkk", kwlist, &source, &dest, &sourceSize,
217 &inSize, &outSize)) {
218 return NULL;
219 }
220
221 if (!PyObject_HasAttrString(source, "read")) {
222 PyErr_SetString(PyExc_ValueError, "first argument must have a read() method");
223 return NULL;
224 }
225
226 if (!PyObject_HasAttrString(dest, "write")) {
227 PyErr_SetString(PyExc_ValueError, "second argument must have a write() method");
228 return NULL;
229 }
230
231 cstream = CStream_from_ZstdCompressor(self, sourceSize);
232 if (!cstream) {
233 res = NULL;
234 goto finally;
235 }
236
237 output.dst = PyMem_Malloc(outSize);
238 if (!output.dst) {
239 PyErr_NoMemory();
240 res = NULL;
241 goto finally;
242 }
243 output.size = outSize;
244 output.pos = 0;
245
246 while (1) {
247 /* Try to read from source stream. */
248 readResult = PyObject_CallMethod(source, "read", "n", inSize);
249 if (!readResult) {
250 PyErr_SetString(ZstdError, "could not read() from source");
251 goto finally;
252 }
253
254 PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize);
255
256 /* If no data was read, we're at EOF. */
257 if (0 == readSize) {
258 break;
259 }
260
261 totalRead += readSize;
262
263 /* Send data to compressor */
264 input.src = readBuffer;
265 input.size = readSize;
266 input.pos = 0;
267
268 while (input.pos < input.size) {
269 Py_BEGIN_ALLOW_THREADS
270 zresult = ZSTD_compressStream(cstream, &output, &input);
271 Py_END_ALLOW_THREADS
272
273 if (ZSTD_isError(zresult)) {
274 res = NULL;
275 PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
276 goto finally;
277 }
278
279 if (output.pos) {
280 #if PY_MAJOR_VERSION >= 3
281 writeResult = PyObject_CallMethod(dest, "write", "y#",
282 #else
283 writeResult = PyObject_CallMethod(dest, "write", "s#",
284 #endif
285 output.dst, output.pos);
286 Py_XDECREF(writeResult);
287 totalWrite += output.pos;
288 output.pos = 0;
289 }
290 }
291 }
292
293 /* We've finished reading. Now flush the compressor stream. */
294 while (1) {
295 zresult = ZSTD_endStream(cstream, &output);
296 if (ZSTD_isError(zresult)) {
297 PyErr_Format(ZstdError, "error ending compression stream: %s",
298 ZSTD_getErrorName(zresult));
299 res = NULL;
300 goto finally;
301 }
302
303 if (output.pos) {
304 #if PY_MAJOR_VERSION >= 3
305 writeResult = PyObject_CallMethod(dest, "write", "y#",
306 #else
307 writeResult = PyObject_CallMethod(dest, "write", "s#",
308 #endif
309 output.dst, output.pos);
310 totalWrite += output.pos;
311 Py_XDECREF(writeResult);
312 output.pos = 0;
313 }
314
315 if (!zresult) {
316 break;
317 }
318 }
319
320 ZSTD_freeCStream(cstream);
321 cstream = NULL;
322
323 totalReadPy = PyLong_FromSsize_t(totalRead);
324 totalWritePy = PyLong_FromSsize_t(totalWrite);
325 res = PyTuple_Pack(2, totalReadPy, totalWritePy);
326 Py_DecRef(totalReadPy);
327 Py_DecRef(totalWritePy);
328
329 finally:
330 if (output.dst) {
331 PyMem_Free(output.dst);
332 }
333
334 if (cstream) {
335 ZSTD_freeCStream(cstream);
336 }
337
338 return res;
339 }
340
341 PyDoc_STRVAR(ZstdCompressor_compress__doc__,
342 "compress(data)\n"
343 "\n"
344 "Compress data in a single operation.\n"
345 "\n"
346 "This is the simplest mechanism to perform compression: simply pass in a\n"
347 "value and get a compressed value back. It is almost the most prone to abuse.\n"
348 "The input and output values must fit in memory, so passing in very large\n"
349 "values can result in excessive memory usage. For this reason, one of the\n"
350 "streaming based APIs is preferred for larger values.\n"
351 );
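The one-shot API described above reduces to a single call; a minimal sketch::

    import zstd

    cctx = zstd.ZstdCompressor(level=3)
    compressed = cctx.compress(b'data to compress')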
352
353 static PyObject* ZstdCompressor_compress(ZstdCompressor* self, PyObject* args) {
354 const char* source;
355 Py_ssize_t sourceSize;
356 size_t destSize;
357 ZSTD_CCtx* cctx;
358 PyObject* output;
359 char* dest;
360 void* dictData = NULL;
361 size_t dictSize = 0;
362 size_t zresult;
363 ZSTD_parameters zparams;
364 ZSTD_customMem zmem;
365
366 #if PY_MAJOR_VERSION >= 3
367 if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) {
368 #else
369 if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) {
370 #endif
371 return NULL;
372 }
373
374 destSize = ZSTD_compressBound(sourceSize);
375 output = PyBytes_FromStringAndSize(NULL, destSize);
376 if (!output) {
377 return NULL;
378 }
379
380 dest = PyBytes_AsString(output);
381
382 cctx = ZSTD_createCCtx();
383 if (!cctx) {
384 Py_DECREF(output);
385 PyErr_SetString(ZstdError, "could not create CCtx");
386 return NULL;
387 }
388
389 if (self->dict) {
390 dictData = self->dict->dictData;
391 dictSize = self->dict->dictSize;
392 }
393
394 memset(&zparams, 0, sizeof(zparams));
395 if (!self->cparams) {
396 zparams.cParams = ZSTD_getCParams(self->compressionLevel, sourceSize, dictSize);
397 }
398 else {
399 ztopy_compression_parameters(self->cparams, &zparams.cParams);
400 /* Do NOT call ZSTD_adjustCParams() here because the compression params
401 come from the user. */
402 }
403
404 zparams.fParams = self->fparams;
405
406 /* The raw dict data has to be processed before it can be used. Since this
407 adds overhead - especially if multiple dictionary compression operations
408 are performed on the same ZstdCompressor instance - we create a
409 ZSTD_CDict once and reuse it for all operations. */
410
411 /* TODO the zparams (which can be derived from the source data size) used
412 on first invocation are effectively reused for subsequent operations. This
413 may not be appropriate if input sizes vary significantly and could affect
414 chosen compression parameters.
415 https://github.com/facebook/zstd/issues/358 tracks this issue. */
416 if (dictData && !self->cdict) {
417 Py_BEGIN_ALLOW_THREADS
418 memset(&zmem, 0, sizeof(zmem));
419 self->cdict = ZSTD_createCDict_advanced(dictData, dictSize, zparams, zmem);
420 Py_END_ALLOW_THREADS
421
422 if (!self->cdict) {
423 Py_DECREF(output);
424 ZSTD_freeCCtx(cctx);
425 PyErr_SetString(ZstdError, "could not create compression dictionary");
426 return NULL;
427 }
428 }
429
430 Py_BEGIN_ALLOW_THREADS
431 /* By avoiding ZSTD_compress(), we don't necessarily write out content
432 size. This means the argument to ZstdCompressor to control frame
433 parameters is honored. */
434 if (self->cdict) {
435 zresult = ZSTD_compress_usingCDict(cctx, dest, destSize,
436 source, sourceSize, self->cdict);
437 }
438 else {
439 zresult = ZSTD_compress_advanced(cctx, dest, destSize,
440 source, sourceSize, dictData, dictSize, zparams);
441 }
442 Py_END_ALLOW_THREADS
443
444 ZSTD_freeCCtx(cctx);
445
446 if (ZSTD_isError(zresult)) {
447 PyErr_Format(ZstdError, "cannot compress: %s", ZSTD_getErrorName(zresult));
448 Py_CLEAR(output);
449 return NULL;
450 }
451 else {
452 Py_SIZE(output) = zresult;
453 }
454
455 return output;
456 }
457
458 PyDoc_STRVAR(ZstdCompressionObj__doc__,
459 "compressobj()\n"
460 "\n"
461 "Return an object exposing ``compress(data)`` and ``flush()`` methods.\n"
462 "\n"
463 "The returned object exposes an API similar to ``zlib.compressobj`` and\n"
464 "``bz2.BZ2Compressor`` so that callers can swap in the zstd compressor\n"
465 "without changing how compression is performed.\n"
466 );
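A sketch of the ``compressobj()`` usage pattern implied by the docstring above, mirroring ``zlib.compressobj``::

    import zstd

    cobj = zstd.ZstdCompressor().compressobj()
    chunks = [cobj.compress(b'chunk 0'),
              cobj.compress(b'chunk 1'),
              cobj.flush()]
    compressed = b''.join(chunks)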
467
468 static ZstdCompressionObj* ZstdCompressor_compressobj(ZstdCompressor* self, PyObject* args, PyObject* kwargs) {
469 static char* kwlist[] = {
470 "size",
471 NULL
472 };
473
474 Py_ssize_t inSize = 0;
475 size_t outSize = ZSTD_CStreamOutSize();
476 ZstdCompressionObj* result = PyObject_New(ZstdCompressionObj, &ZstdCompressionObjType);
477 if (!result) {
478 return NULL;
479 }
480
481 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &inSize)) {
482 return NULL;
483 }
484
485 result->cstream = CStream_from_ZstdCompressor(self, inSize);
486 if (!result->cstream) {
487 Py_DECREF(result);
488 return NULL;
489 }
490
491 result->output.dst = PyMem_Malloc(outSize);
492 if (!result->output.dst) {
493 PyErr_NoMemory();
494 Py_DECREF(result);
495 return NULL;
496 }
497 result->output.size = outSize;
498 result->output.pos = 0;
499
500 result->compressor = self;
501 Py_INCREF(result->compressor);
502
503 result->flushed = 0;
504
505 return result;
506 }
507
508 PyDoc_STRVAR(ZstdCompressor_read_from__doc__,
509 "read_from(reader, [size=0, read_size=default, write_size=default])\n"
510 "Read uncompress data from a reader and return an iterator\n"
511 "\n"
512 "Returns an iterator of compressed data produced from reading from ``reader``.\n"
513 "\n"
514 "Uncompressed data will be obtained from ``reader`` by calling the\n"
515 "``read(size)`` method of it. The source data will be streamed into a\n"
516 "compressor. As compressed data is available, it will be exposed to the\n"
517 "iterator.\n"
518 "\n"
519 "Data is read from the source in chunks of ``read_size``. Compressed chunks\n"
520 "are at most ``write_size`` bytes. Both values default to the zstd input and\n"
521 "and output defaults, respectively.\n"
522 "\n"
523 "The caller is partially in control of how fast data is fed into the\n"
524 "compressor by how it consumes the returned iterator. The compressor will\n"
525 "not consume from the reader unless the caller consumes from the iterator.\n"
526 );
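A sketch of consuming the iterator returned by ``read_from()`` (the input file name is hypothetical)::

    import zstd

    cctx = zstd.ZstdCompressor()
    with open('input.bin', 'rb') as fh:
        for compressed_chunk in cctx.read_from(fh):
            pass  # e.g. send each chunk to a socket or another file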
527
528 static ZstdCompressorIterator* ZstdCompressor_read_from(ZstdCompressor* self, PyObject* args, PyObject* kwargs) {
529 static char* kwlist[] = {
530 "reader",
531 "size",
532 "read_size",
533 "write_size",
534 NULL
535 };
536
537 PyObject* reader;
538 Py_ssize_t sourceSize = 0;
539 size_t inSize = ZSTD_CStreamInSize();
540 size_t outSize = ZSTD_CStreamOutSize();
541 ZstdCompressorIterator* result;
542
543 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nkk", kwlist, &reader, &sourceSize,
544 &inSize, &outSize)) {
545 return NULL;
546 }
547
548 result = PyObject_New(ZstdCompressorIterator, &ZstdCompressorIteratorType);
549 if (!result) {
550 return NULL;
551 }
552
553 result->compressor = NULL;
554 result->reader = NULL;
555 result->buffer = NULL;
556 result->cstream = NULL;
557 result->input.src = NULL;
558 result->output.dst = NULL;
559 result->readResult = NULL;
560
561 if (PyObject_HasAttrString(reader, "read")) {
562 result->reader = reader;
563 Py_INCREF(result->reader);
564 }
565 else if (1 == PyObject_CheckBuffer(reader)) {
566 result->buffer = PyMem_Malloc(sizeof(Py_buffer));
567 if (!result->buffer) {
568 goto except;
569 }
570
571 memset(result->buffer, 0, sizeof(Py_buffer));
572
573 if (0 != PyObject_GetBuffer(reader, result->buffer, PyBUF_CONTIG_RO)) {
574 goto except;
575 }
576
577 result->bufferOffset = 0;
578 sourceSize = result->buffer->len;
579 }
580 else {
581 PyErr_SetString(PyExc_ValueError,
582 "must pass an object with a read() method or conforms to buffer protocol");
583 goto except;
584 }
585
586 result->compressor = self;
587 Py_INCREF(result->compressor);
588
589 result->sourceSize = sourceSize;
590 result->cstream = CStream_from_ZstdCompressor(self, sourceSize);
591 if (!result->cstream) {
592 goto except;
593 }
594
595 result->inSize = inSize;
596 result->outSize = outSize;
597
598 result->output.dst = PyMem_Malloc(outSize);
599 if (!result->output.dst) {
600 PyErr_NoMemory();
601 goto except;
602 }
603 result->output.size = outSize;
604 result->output.pos = 0;
605
606 result->input.src = NULL;
607 result->input.size = 0;
608 result->input.pos = 0;
609
610 result->finishedInput = 0;
611 result->finishedOutput = 0;
612
613 goto finally;
614
615 except:
616 if (result->cstream) {
617 ZSTD_freeCStream(result->cstream);
618 result->cstream = NULL;
619 }
620
621 Py_DecRef((PyObject*)result->compressor);
622 Py_DecRef(result->reader);
623
624 Py_DECREF(result);
625 result = NULL;
626
627 finally:
628 return result;
629 }
630
631 PyDoc_STRVAR(ZstdCompressor_write_to___doc__,
632 "Create a context manager to write compressed data to an object.\n"
633 "\n"
634 "The passed object must have a ``write()`` method.\n"
635 "\n"
636 "The caller feeds input data to the object by calling ``compress(data)``.\n"
637 "Compressed data is written to the argument given to this function.\n"
638 "\n"
639 "The function takes an optional ``size`` argument indicating the total size\n"
640 "of the eventual input. If specified, the size will influence compression\n"
641 "parameter tuning and could result in the size being written into the\n"
642 "header of the compressed data.\n"
643 "\n"
644 "An optional ``write_size`` argument is also accepted. It defines the maximum\n"
645 "byte size of chunks fed to ``write()``. By default, it uses the zstd default\n"
646 "for a compressor output stream.\n"
647 );
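A sketch of the context manager flow described above. The writer object exposes a ``write()`` method (see ZstdCompressionWriter_methods earlier in this change); the output file name is hypothetical::

    import zstd

    cctx = zstd.ZstdCompressor()
    with open('output.zst', 'wb') as fh:
        with cctx.write_to(fh) as compressor:
            compressor.write(b'chunk 0')
            compressor.write(b'chunk 1')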
648
649 static ZstdCompressionWriter* ZstdCompressor_write_to(ZstdCompressor* self, PyObject* args, PyObject* kwargs) {
650 static char* kwlist[] = {
651 "writer",
652 "size",
653 "write_size",
654 NULL
655 };
656
657 PyObject* writer;
658 ZstdCompressionWriter* result;
659 Py_ssize_t sourceSize = 0;
660 size_t outSize = ZSTD_CStreamOutSize();
661
662 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nk", kwlist, &writer, &sourceSize,
663 &outSize)) {
664 return NULL;
665 }
666
667 if (!PyObject_HasAttrString(writer, "write")) {
668 PyErr_SetString(PyExc_ValueError, "must pass an object with a write() method");
669 return NULL;
670 }
671
672 result = PyObject_New(ZstdCompressionWriter, &ZstdCompressionWriterType);
673 if (!result) {
674 return NULL;
675 }
676
677 result->compressor = self;
678 Py_INCREF(result->compressor);
679
680 result->writer = writer;
681 Py_INCREF(result->writer);
682
683 result->sourceSize = sourceSize;
684
685 result->outSize = outSize;
686
687 result->entered = 0;
688 result->cstream = NULL;
689
690 return result;
691 }
692
693 static PyMethodDef ZstdCompressor_methods[] = {
694 { "compress", (PyCFunction)ZstdCompressor_compress, METH_VARARGS,
695 ZstdCompressor_compress__doc__ },
696 { "compressobj", (PyCFunction)ZstdCompressor_compressobj,
697 METH_VARARGS | METH_KEYWORDS, ZstdCompressionObj__doc__ },
698 { "copy_stream", (PyCFunction)ZstdCompressor_copy_stream,
699 METH_VARARGS | METH_KEYWORDS, ZstdCompressor_copy_stream__doc__ },
700 { "read_from", (PyCFunction)ZstdCompressor_read_from,
701 METH_VARARGS | METH_KEYWORDS, ZstdCompressor_read_from__doc__ },
702 { "write_to", (PyCFunction)ZstdCompressor_write_to,
703 METH_VARARGS | METH_KEYWORDS, ZstdCompressor_write_to___doc__ },
704 { NULL, NULL }
705 };
706
707 PyTypeObject ZstdCompressorType = {
708 PyVarObject_HEAD_INIT(NULL, 0)
709 "zstd.ZstdCompressor", /* tp_name */
710 sizeof(ZstdCompressor), /* tp_basicsize */
711 0, /* tp_itemsize */
712 (destructor)ZstdCompressor_dealloc, /* tp_dealloc */
713 0, /* tp_print */
714 0, /* tp_getattr */
715 0, /* tp_setattr */
716 0, /* tp_compare */
717 0, /* tp_repr */
718 0, /* tp_as_number */
719 0, /* tp_as_sequence */
720 0, /* tp_as_mapping */
721 0, /* tp_hash */
722 0, /* tp_call */
723 0, /* tp_str */
724 0, /* tp_getattro */
725 0, /* tp_setattro */
726 0, /* tp_as_buffer */
727 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
728 ZstdCompressor__doc__, /* tp_doc */
729 0, /* tp_traverse */
730 0, /* tp_clear */
731 0, /* tp_richcompare */
732 0, /* tp_weaklistoffset */
733 0, /* tp_iter */
734 0, /* tp_iternext */
735 ZstdCompressor_methods, /* tp_methods */
736 0, /* tp_members */
737 0, /* tp_getset */
738 0, /* tp_base */
739 0, /* tp_dict */
740 0, /* tp_descr_get */
741 0, /* tp_descr_set */
742 0, /* tp_dictoffset */
743 (initproc)ZstdCompressor_init, /* tp_init */
744 0, /* tp_alloc */
745 PyType_GenericNew, /* tp_new */
746 };
747
748 void compressor_module_init(PyObject* mod) {
749 Py_TYPE(&ZstdCompressorType) = &PyType_Type;
750 if (PyType_Ready(&ZstdCompressorType) < 0) {
751 return;
752 }
753
754 Py_INCREF((PyObject*)&ZstdCompressorType);
755 PyModule_AddObject(mod, "ZstdCompressor",
756 (PyObject*)&ZstdCompressorType);
757 }
@@ -0,0 +1,234 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 #define min(a, b) (((a) < (b)) ? (a) : (b))
12
13 extern PyObject* ZstdError;
14
15 PyDoc_STRVAR(ZstdCompressorIterator__doc__,
16 "Represents an iterator of compressed data.\n"
17 );
18
19 static void ZstdCompressorIterator_dealloc(ZstdCompressorIterator* self) {
20 Py_XDECREF(self->readResult);
21 Py_XDECREF(self->compressor);
22 Py_XDECREF(self->reader);
23
24 if (self->buffer) {
25 PyBuffer_Release(self->buffer);
26 PyMem_FREE(self->buffer);
27 self->buffer = NULL;
28 }
29
30 if (self->cstream) {
31 ZSTD_freeCStream(self->cstream);
32 self->cstream = NULL;
33 }
34
35 if (self->output.dst) {
36 PyMem_Free(self->output.dst);
37 self->output.dst = NULL;
38 }
39
40 PyObject_Del(self);
41 }
42
43 static PyObject* ZstdCompressorIterator_iter(PyObject* self) {
44 Py_INCREF(self);
45 return self;
46 }
47
48 static PyObject* ZstdCompressorIterator_iternext(ZstdCompressorIterator* self) {
49 size_t zresult;
50 PyObject* readResult = NULL;
51 PyObject* chunk;
52 char* readBuffer;
53 Py_ssize_t readSize = 0;
54 Py_ssize_t bufferRemaining;
55
56 if (self->finishedOutput) {
57 PyErr_SetString(PyExc_StopIteration, "output flushed");
58 return NULL;
59 }
60
61 feedcompressor:
62
63 /* If we have data left in the input, consume it. */
64 if (self->input.pos < self->input.size) {
65 Py_BEGIN_ALLOW_THREADS
66 zresult = ZSTD_compressStream(self->cstream, &self->output, &self->input);
67 Py_END_ALLOW_THREADS
68
69 /* Release the Python object holding the input buffer. */
70 if (self->input.pos == self->input.size) {
71 self->input.src = NULL;
72 self->input.pos = 0;
73 self->input.size = 0;
74 Py_DECREF(self->readResult);
75 self->readResult = NULL;
76 }
77
78 if (ZSTD_isError(zresult)) {
79 PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
80 return NULL;
81 }
82
83 /* If it produced output data, emit it. */
84 if (self->output.pos) {
85 chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos);
86 self->output.pos = 0;
87 return chunk;
88 }
89 }
90
91 /* We should never have output data sitting around after a previous call. */
92 assert(self->output.pos == 0);
93
94 /* The code above should have either emitted a chunk and returned or consumed
95 the entire input buffer. So the state of the input buffer is not
96 relevant. */
97 if (!self->finishedInput) {
98 if (self->reader) {
99 readResult = PyObject_CallMethod(self->reader, "read", "I", self->inSize);
100 if (!readResult) {
101 PyErr_SetString(ZstdError, "could not read() from source");
102 return NULL;
103 }
104
105 PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize);
106 }
107 else {
108 assert(self->buffer && self->buffer->buf);
109
110 /* Only support contiguous C arrays. */
111 assert(self->buffer->strides == NULL && self->buffer->suboffsets == NULL);
112 assert(self->buffer->itemsize == 1);
113
114 readBuffer = (char*)self->buffer->buf + self->bufferOffset;
115 bufferRemaining = self->buffer->len - self->bufferOffset;
116 readSize = min(bufferRemaining, (Py_ssize_t)self->inSize);
117 self->bufferOffset += readSize;
118 }
119
120 if (0 == readSize) {
121 Py_XDECREF(readResult);
122 self->finishedInput = 1;
123 }
124 else {
125 self->readResult = readResult;
126 }
127 }
128
129 /* EOF */
130 if (0 == readSize) {
131 zresult = ZSTD_endStream(self->cstream, &self->output);
132 if (ZSTD_isError(zresult)) {
133 PyErr_Format(ZstdError, "error ending compression stream: %s",
134 ZSTD_getErrorName(zresult));
135 return NULL;
136 }
137
138 assert(self->output.pos);
139
140 if (0 == zresult) {
141 self->finishedOutput = 1;
142 }
143
144 chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos);
145 self->output.pos = 0;
146 return chunk;
147 }
148
149 /* New data from reader. Feed into compressor. */
150 self->input.src = readBuffer;
151 self->input.size = readSize;
152 self->input.pos = 0;
153
154 Py_BEGIN_ALLOW_THREADS
155 zresult = ZSTD_compressStream(self->cstream, &self->output, &self->input);
156 Py_END_ALLOW_THREADS
157
158 /* The input buffer currently points to memory managed by Python
159 (readBuffer). This object was allocated by this function. If it wasn't
160 fully consumed, we need to release it in a subsequent function call.
161 If it is fully consumed, do that now.
162 */
163 if (self->input.pos == self->input.size) {
164 self->input.src = NULL;
165 self->input.pos = 0;
166 self->input.size = 0;
167 Py_XDECREF(self->readResult);
168 self->readResult = NULL;
169 }
170
171 if (ZSTD_isError(zresult)) {
172 PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
173 return NULL;
174 }
175
176 assert(self->input.pos <= self->input.size);
177
178 /* If we didn't write anything, start the process over. */
179 if (0 == self->output.pos) {
180 goto feedcompressor;
181 }
182
183 chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos);
184 self->output.pos = 0;
185 return chunk;
186 }
187
188 PyTypeObject ZstdCompressorIteratorType = {
189 PyVarObject_HEAD_INIT(NULL, 0)
190 "zstd.ZstdCompressorIterator", /* tp_name */
191 sizeof(ZstdCompressorIterator), /* tp_basicsize */
192 0, /* tp_itemsize */
193 (destructor)ZstdCompressorIterator_dealloc, /* tp_dealloc */
194 0, /* tp_print */
195 0, /* tp_getattr */
196 0, /* tp_setattr */
197 0, /* tp_compare */
198 0, /* tp_repr */
199 0, /* tp_as_number */
200 0, /* tp_as_sequence */
201 0, /* tp_as_mapping */
202 0, /* tp_hash */
203 0, /* tp_call */
204 0, /* tp_str */
205 0, /* tp_getattro */
206 0, /* tp_setattro */
207 0, /* tp_as_buffer */
208 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
209 ZstdCompressorIterator__doc__, /* tp_doc */
210 0, /* tp_traverse */
211 0, /* tp_clear */
212 0, /* tp_richcompare */
213 0, /* tp_weaklistoffset */
214 ZstdCompressorIterator_iter, /* tp_iter */
215 (iternextfunc)ZstdCompressorIterator_iternext, /* tp_iternext */
216 0, /* tp_methods */
217 0, /* tp_members */
218 0, /* tp_getset */
219 0, /* tp_base */
220 0, /* tp_dict */
221 0, /* tp_descr_get */
222 0, /* tp_descr_set */
223 0, /* tp_dictoffset */
224 0, /* tp_init */
225 0, /* tp_alloc */
226 PyType_GenericNew, /* tp_new */
227 };
228
229 void compressoriterator_module_init(PyObject* mod) {
230 Py_TYPE(&ZstdCompressorIteratorType) = &PyType_Type;
231 if (PyType_Ready(&ZstdCompressorIteratorType) < 0) {
232 return;
233 }
234 }
@@ -0,0 +1,84 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 static char frame_header[] = {
14 '\x28',
15 '\xb5',
16 '\x2f',
17 '\xfd',
18 };
19
20 void constants_module_init(PyObject* mod) {
21 PyObject* version;
22 PyObject* zstdVersion;
23 PyObject* frameHeader;
24
25 #if PY_MAJOR_VERSION >= 3
26 version = PyUnicode_FromString(PYTHON_ZSTANDARD_VERSION);
27 #else
28 version = PyString_FromString(PYTHON_ZSTANDARD_VERSION);
29 #endif
30 Py_INCREF(version);
31 PyModule_AddObject(mod, "__version__", version);
32
33 ZstdError = PyErr_NewException("zstd.ZstdError", NULL, NULL);
34 PyModule_AddObject(mod, "ZstdError", ZstdError);
35
36 /* For now, the version is a simple tuple instead of a dedicated type. */
37 zstdVersion = PyTuple_New(3);
38 PyTuple_SetItem(zstdVersion, 0, PyLong_FromLong(ZSTD_VERSION_MAJOR));
39 PyTuple_SetItem(zstdVersion, 1, PyLong_FromLong(ZSTD_VERSION_MINOR));
40 PyTuple_SetItem(zstdVersion, 2, PyLong_FromLong(ZSTD_VERSION_RELEASE));
41 Py_IncRef(zstdVersion);
42 PyModule_AddObject(mod, "ZSTD_VERSION", zstdVersion);
43
44 frameHeader = PyBytes_FromStringAndSize(frame_header, sizeof(frame_header));
45 if (frameHeader) {
46 PyModule_AddObject(mod, "FRAME_HEADER", frameHeader);
47 }
48 else {
49 PyErr_Format(PyExc_ValueError, "could not create frame header object");
50 }
51
52 PyModule_AddIntConstant(mod, "MAX_COMPRESSION_LEVEL", ZSTD_maxCLevel());
53 PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_INPUT_SIZE",
54 (long)ZSTD_CStreamInSize());
55 PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_OUTPUT_SIZE",
56 (long)ZSTD_CStreamOutSize());
57 PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_INPUT_SIZE",
58 (long)ZSTD_DStreamInSize());
59 PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE",
60 (long)ZSTD_DStreamOutSize());
61
62 PyModule_AddIntConstant(mod, "MAGIC_NUMBER", ZSTD_MAGICNUMBER);
63 PyModule_AddIntConstant(mod, "WINDOWLOG_MIN", ZSTD_WINDOWLOG_MIN);
64 PyModule_AddIntConstant(mod, "WINDOWLOG_MAX", ZSTD_WINDOWLOG_MAX);
65 PyModule_AddIntConstant(mod, "CHAINLOG_MIN", ZSTD_CHAINLOG_MIN);
66 PyModule_AddIntConstant(mod, "CHAINLOG_MAX", ZSTD_CHAINLOG_MAX);
67 PyModule_AddIntConstant(mod, "HASHLOG_MIN", ZSTD_HASHLOG_MIN);
68 PyModule_AddIntConstant(mod, "HASHLOG_MAX", ZSTD_HASHLOG_MAX);
69 PyModule_AddIntConstant(mod, "HASHLOG3_MAX", ZSTD_HASHLOG3_MAX);
70 PyModule_AddIntConstant(mod, "SEARCHLOG_MIN", ZSTD_SEARCHLOG_MIN);
71 PyModule_AddIntConstant(mod, "SEARCHLOG_MAX", ZSTD_SEARCHLOG_MAX);
72 PyModule_AddIntConstant(mod, "SEARCHLENGTH_MIN", ZSTD_SEARCHLENGTH_MIN);
73 PyModule_AddIntConstant(mod, "SEARCHLENGTH_MAX", ZSTD_SEARCHLENGTH_MAX);
74 PyModule_AddIntConstant(mod, "TARGETLENGTH_MIN", ZSTD_TARGETLENGTH_MIN);
75 PyModule_AddIntConstant(mod, "TARGETLENGTH_MAX", ZSTD_TARGETLENGTH_MAX);
76
77 PyModule_AddIntConstant(mod, "STRATEGY_FAST", ZSTD_fast);
78 PyModule_AddIntConstant(mod, "STRATEGY_DFAST", ZSTD_dfast);
79 PyModule_AddIntConstant(mod, "STRATEGY_GREEDY", ZSTD_greedy);
80 PyModule_AddIntConstant(mod, "STRATEGY_LAZY", ZSTD_lazy);
81 PyModule_AddIntConstant(mod, "STRATEGY_LAZY2", ZSTD_lazy2);
82 PyModule_AddIntConstant(mod, "STRATEGY_BTLAZY2", ZSTD_btlazy2);
83 PyModule_AddIntConstant(mod, "STRATEGY_BTOPT", ZSTD_btopt);
84 }
@@ -0,0 +1,187 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 PyDoc_STRVAR(ZstdDecompressionWriter__doc,
14 """A context manager used for writing decompressed output.\n"
15 );
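This type is not constructed directly. A rough usage sketch, assuming ``ZstdDecompressor`` exposes a ``write_to()`` helper mirroring the compressor's (that method is not part of this hunk)::

    import zstd

    compressed = zstd.ZstdCompressor().compress(b'data')
    dctx = zstd.ZstdDecompressor()
    with open('output.bin', 'wb') as fh:          # file name is hypothetical
        with dctx.write_to(fh) as decompressor:   # write_to() assumed, see note above
            decompressor.write(compressed)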
16
17 static void ZstdDecompressionWriter_dealloc(ZstdDecompressionWriter* self) {
18 Py_XDECREF(self->decompressor);
19 Py_XDECREF(self->writer);
20
21 if (self->dstream) {
22 ZSTD_freeDStream(self->dstream);
23 self->dstream = NULL;
24 }
25
26 PyObject_Del(self);
27 }
28
29 static PyObject* ZstdDecompressionWriter_enter(ZstdDecompressionWriter* self) {
30 if (self->entered) {
31 PyErr_SetString(ZstdError, "cannot __enter__ multiple times");
32 return NULL;
33 }
34
35 self->dstream = DStream_from_ZstdDecompressor(self->decompressor);
36 if (!self->dstream) {
37 return NULL;
38 }
39
40 self->entered = 1;
41
42 Py_INCREF(self);
43 return (PyObject*)self;
44 }
45
46 static PyObject* ZstdDecompressionWriter_exit(ZstdDecompressionWriter* self, PyObject* args) {
47 self->entered = 0;
48
49 if (self->dstream) {
50 ZSTD_freeDStream(self->dstream);
51 self->dstream = NULL;
52 }
53
54 Py_RETURN_FALSE;
55 }
56
57 static PyObject* ZstdDecompressionWriter_memory_size(ZstdDecompressionWriter* self) {
58 if (!self->dstream) {
59 PyErr_SetString(ZstdError, "cannot determine size of inactive decompressor; "
60 "call when context manager is active");
61 return NULL;
62 }
63
64 return PyLong_FromSize_t(ZSTD_sizeof_DStream(self->dstream));
65 }
66
67 static PyObject* ZstdDecompressionWriter_write(ZstdDecompressionWriter* self, PyObject* args) {
68 const char* source;
69 Py_ssize_t sourceSize;
70 size_t zresult = 0;
71 ZSTD_inBuffer input;
72 ZSTD_outBuffer output;
73 PyObject* res;
74
75 #if PY_MAJOR_VERSION >= 3
76 if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) {
77 #else
78 if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) {
79 #endif
80 return NULL;
81 }
82
83 if (!self->entered) {
84 PyErr_SetString(ZstdError, "write must be called from an active context manager");
85 return NULL;
86 }
87
88 output.dst = malloc(self->outSize);
89 if (!output.dst) {
90 return PyErr_NoMemory();
91 }
92 output.size = self->outSize;
93 output.pos = 0;
94
95 input.src = source;
96 input.size = sourceSize;
97 input.pos = 0;
98
99 while ((ssize_t)input.pos < sourceSize) {
100 Py_BEGIN_ALLOW_THREADS
101 zresult = ZSTD_decompressStream(self->dstream, &output, &input);
102 Py_END_ALLOW_THREADS
103
104 if (ZSTD_isError(zresult)) {
105 free(output.dst);
106 PyErr_Format(ZstdError, "zstd decompress error: %s",
107 ZSTD_getErrorName(zresult));
108 return NULL;
109 }
110
111 if (output.pos) {
112 #if PY_MAJOR_VERSION >= 3
113 res = PyObject_CallMethod(self->writer, "write", "y#",
114 #else
115 res = PyObject_CallMethod(self->writer, "write", "s#",
116 #endif
117 output.dst, output.pos);
118 Py_XDECREF(res);
119 output.pos = 0;
120 }
121 }
122
123 free(output.dst);
124
125 /* TODO return bytes written */
126 Py_RETURN_NONE;
127 }
128
129 static PyMethodDef ZstdDecompressionWriter_methods[] = {
130 { "__enter__", (PyCFunction)ZstdDecompressionWriter_enter, METH_NOARGS,
131 PyDoc_STR("Enter a decompression context.") },
132 { "__exit__", (PyCFunction)ZstdDecompressionWriter_exit, METH_VARARGS,
133 PyDoc_STR("Exit a decompression context.") },
134 { "memory_size", (PyCFunction)ZstdDecompressionWriter_memory_size, METH_NOARGS,
135 PyDoc_STR("Obtain the memory size in bytes of the underlying decompressor.") },
136 { "write", (PyCFunction)ZstdDecompressionWriter_write, METH_VARARGS,
137 PyDoc_STR("Compress data") },
138 { NULL, NULL }
139 };
140
141 PyTypeObject ZstdDecompressionWriterType = {
142 PyVarObject_HEAD_INIT(NULL, 0)
143 "zstd.ZstdDecompressionWriter", /* tp_name */
144 sizeof(ZstdDecompressionWriter),/* tp_basicsize */
145 0, /* tp_itemsize */
146 (destructor)ZstdDecompressionWriter_dealloc, /* tp_dealloc */
147 0, /* tp_print */
148 0, /* tp_getattr */
149 0, /* tp_setattr */
150 0, /* tp_compare */
151 0, /* tp_repr */
152 0, /* tp_as_number */
153 0, /* tp_as_sequence */
154 0, /* tp_as_mapping */
155 0, /* tp_hash */
156 0, /* tp_call */
157 0, /* tp_str */
158 0, /* tp_getattro */
159 0, /* tp_setattro */
160 0, /* tp_as_buffer */
161 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
162 ZstdDecompressionWriter__doc, /* tp_doc */
163 0, /* tp_traverse */
164 0, /* tp_clear */
165 0, /* tp_richcompare */
166 0, /* tp_weaklistoffset */
167 0, /* tp_iter */
168 0, /* tp_iternext */
169 ZstdDecompressionWriter_methods,/* tp_methods */
170 0, /* tp_members */
171 0, /* tp_getset */
172 0, /* tp_base */
173 0, /* tp_dict */
174 0, /* tp_descr_get */
175 0, /* tp_descr_set */
176 0, /* tp_dictoffset */
177 0, /* tp_init */
178 0, /* tp_alloc */
179 PyType_GenericNew, /* tp_new */
180 };
181
182 void decompressionwriter_module_init(PyObject* mod) {
183 Py_TYPE(&ZstdDecompressionWriterType) = &PyType_Type;
184 if (PyType_Ready(&ZstdDecompressionWriterType) < 0) {
185 return;
186 }
187 }
@@ -0,0 +1,170 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 PyDoc_STRVAR(DecompressionObj__doc__,
14 "Perform decompression using a standard library compatible API.\n"
15 );
16
17 static void DecompressionObj_dealloc(ZstdDecompressionObj* self) {
18 if (self->dstream) {
19 ZSTD_freeDStream(self->dstream);
20 self->dstream = NULL;
21 }
22
23 Py_XDECREF(self->decompressor);
24
25 PyObject_Del(self);
26 }
27
28 static PyObject* DecompressionObj_decompress(ZstdDecompressionObj* self, PyObject* args) {
29 const char* source;
30 Py_ssize_t sourceSize;
31 size_t zresult;
32 ZSTD_inBuffer input;
33 ZSTD_outBuffer output;
34 size_t outSize = ZSTD_DStreamOutSize();
35 PyObject* result = NULL;
36 Py_ssize_t resultSize = 0;
37
38 if (self->finished) {
39 PyErr_SetString(ZstdError, "cannot use a decompressobj multiple times");
40 return NULL;
41 }
42
43 #if PY_MAJOR_VERSION >= 3
44 if (!PyArg_ParseTuple(args, "y#",
45 #else
46 if (!PyArg_ParseTuple(args, "s#",
47 #endif
48 &source, &sourceSize)) {
49 return NULL;
50 }
51
52 input.src = source;
53 input.size = sourceSize;
54 input.pos = 0;
55
56 output.dst = PyMem_Malloc(outSize);
57 if (!output.dst) {
58 PyErr_NoMemory();
59 return NULL;
60 }
61 output.size = outSize;
62 output.pos = 0;
63
64 /* Read input until exhausted. */
65 while (input.pos < input.size) {
66 Py_BEGIN_ALLOW_THREADS
67 zresult = ZSTD_decompressStream(self->dstream, &output, &input);
68 Py_END_ALLOW_THREADS
69
70 if (ZSTD_isError(zresult)) {
71 PyErr_Format(ZstdError, "zstd decompressor error: %s",
72 ZSTD_getErrorName(zresult));
73 result = NULL;
74 goto finally;
75 }
76
77 if (0 == zresult) {
78 self->finished = 1;
79 }
80
81 if (output.pos) {
82 if (result) {
83 resultSize = PyBytes_GET_SIZE(result);
84 if (-1 == _PyBytes_Resize(&result, resultSize + output.pos)) {
85 goto except;
86 }
87
88 memcpy(PyBytes_AS_STRING(result) + resultSize,
89 output.dst, output.pos);
90 }
91 else {
92 result = PyBytes_FromStringAndSize(output.dst, output.pos);
93 if (!result) {
94 goto except;
95 }
96 }
97
98 output.pos = 0;
99 }
100 }
101
102 if (!result) {
103 result = PyBytes_FromString("");
104 }
105
106 goto finally;
107
108 except:
109 Py_DecRef(result);
110 result = NULL;
111
112 finally:
113 PyMem_Free(output.dst);
114
115 return result;
116 }
117
118 static PyMethodDef DecompressionObj_methods[] = {
119 { "decompress", (PyCFunction)DecompressionObj_decompress,
120 METH_VARARGS, PyDoc_STR("decompress data") },
121 { NULL, NULL }
122 };
123
124 PyTypeObject ZstdDecompressionObjType = {
125 PyVarObject_HEAD_INIT(NULL, 0)
126 "zstd.ZstdDecompressionObj", /* tp_name */
127 sizeof(ZstdDecompressionObj), /* tp_basicsize */
128 0, /* tp_itemsize */
129 (destructor)DecompressionObj_dealloc, /* tp_dealloc */
130 0, /* tp_print */
131 0, /* tp_getattr */
132 0, /* tp_setattr */
133 0, /* tp_compare */
134 0, /* tp_repr */
135 0, /* tp_as_number */
136 0, /* tp_as_sequence */
137 0, /* tp_as_mapping */
138 0, /* tp_hash */
139 0, /* tp_call */
140 0, /* tp_str */
141 0, /* tp_getattro */
142 0, /* tp_setattro */
143 0, /* tp_as_buffer */
144 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
145 DecompressionObj__doc__, /* tp_doc */
146 0, /* tp_traverse */
147 0, /* tp_clear */
148 0, /* tp_richcompare */
149 0, /* tp_weaklistoffset */
150 0, /* tp_iter */
151 0, /* tp_iternext */
152 DecompressionObj_methods, /* tp_methods */
153 0, /* tp_members */
154 0, /* tp_getset */
155 0, /* tp_base */
156 0, /* tp_dict */
157 0, /* tp_descr_get */
158 0, /* tp_descr_set */
159 0, /* tp_dictoffset */
160 0, /* tp_init */
161 0, /* tp_alloc */
162 PyType_GenericNew, /* tp_new */
163 };
164
165 void decompressobj_module_init(PyObject* module) {
166 Py_TYPE(&ZstdDecompressionObjType) = &PyType_Type;
167 if (PyType_Ready(&ZstdDecompressionObjType) < 0) {
168 return;
169 }
170 }
@@ -0,0 +1,669 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 ZSTD_DStream* DStream_from_ZstdDecompressor(ZstdDecompressor* decompressor) {
14 ZSTD_DStream* dstream;
15 void* dictData = NULL;
16 size_t dictSize = 0;
17 size_t zresult;
18
19 dstream = ZSTD_createDStream();
20 if (!dstream) {
21 PyErr_SetString(ZstdError, "could not create DStream");
22 return NULL;
23 }
24
25 if (decompressor->dict) {
26 dictData = decompressor->dict->dictData;
27 dictSize = decompressor->dict->dictSize;
28 }
29
30 if (dictData) {
31 zresult = ZSTD_initDStream_usingDict(dstream, dictData, dictSize);
32 }
33 else {
34 zresult = ZSTD_initDStream(dstream);
35 }
36
37 if (ZSTD_isError(zresult)) {
38 PyErr_Format(ZstdError, "could not initialize DStream: %s",
39 ZSTD_getErrorName(zresult));
40 return NULL;
41 }
42
43 return dstream;
44 }
45
46 PyDoc_STRVAR(Decompressor__doc__,
47 "ZstdDecompressor(dict_data=None)\n"
48 "\n"
49 "Create an object used to perform Zstandard decompression.\n"
50 "\n"
51 "An instance can perform multiple decompression operations."
52 );
53
54 static int Decompressor_init(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
55 static char* kwlist[] = {
56 "dict_data",
57 NULL
58 };
59
60 ZstdCompressionDict* dict = NULL;
61
62 self->refdctx = NULL;
63 self->dict = NULL;
64 self->ddict = NULL;
65
66 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O!", kwlist,
67 &ZstdCompressionDictType, &dict)) {
68 return -1;
69 }
70
71 /* Instead of creating a ZSTD_DCtx for every decompression operation,
72 we create an instance at object creation time and recycle it via
73 ZSTD_copyDCtx() on each use. This means each use is a malloc+memcpy
74 instead of a malloc+init. */
75 /* TODO lazily initialize the reference ZSTD_DCtx on first use since
76 not all instances of ZstdDecompressor will use a ZSTD_DCtx. */
77 self->refdctx = ZSTD_createDCtx();
78 if (!self->refdctx) {
79 PyErr_NoMemory();
80 goto except;
81 }
82
83 if (dict) {
84 self->dict = dict;
85 Py_INCREF(dict);
86 }
87
88 return 0;
89
90 except:
91 if (self->refdctx) {
92 ZSTD_freeDCtx(self->refdctx);
93 self->refdctx = NULL;
94 }
95
96 return -1;
97 }
98
99 static void Decompressor_dealloc(ZstdDecompressor* self) {
100 if (self->refdctx) {
101 ZSTD_freeDCtx(self->refdctx);
102 }
103
104 Py_XDECREF(self->dict);
105
106 if (self->ddict) {
107 ZSTD_freeDDict(self->ddict);
108 self->ddict = NULL;
109 }
110
111 PyObject_Del(self);
112 }
113
114 PyDoc_STRVAR(Decompressor_copy_stream__doc__,
115 "copy_stream(ifh, ofh[, read_size=default, write_size=default]) -- decompress data between streams\n"
116 "\n"
117 "Compressed data will be read from ``ifh``, decompressed, and written to\n"
118 "``ofh``. ``ifh`` must have a ``read(size)`` method. ``ofh`` must have a\n"
119 "``write(data)`` method.\n"
120 "\n"
121 "The optional ``read_size`` and ``write_size`` arguments control the chunk\n"
122 "size of data that is ``read()`` and ``write()`` between streams. They default\n"
123 "to the default input and output sizes of zstd decompressor streams.\n"
124 );
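Mirroring the compressor side, a sketch of streaming decompression between file objects (file names are hypothetical); the call returns a ``(bytes_read, bytes_written)`` tuple per the implementation below::

    import zstd

    dctx = zstd.ZstdDecompressor()
    with open('input.zst', 'rb') as ifh, open('output.bin', 'wb') as ofh:
        read_count, write_count = dctx.copy_stream(ifh, ofh)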
125
126 static PyObject* Decompressor_copy_stream(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
127 static char* kwlist[] = {
128 "ifh",
129 "ofh",
130 "read_size",
131 "write_size",
132 NULL
133 };
134
135 PyObject* source;
136 PyObject* dest;
137 size_t inSize = ZSTD_DStreamInSize();
138 size_t outSize = ZSTD_DStreamOutSize();
139 ZSTD_DStream* dstream;
140 ZSTD_inBuffer input;
141 ZSTD_outBuffer output;
142 Py_ssize_t totalRead = 0;
143 Py_ssize_t totalWrite = 0;
144 char* readBuffer;
145 Py_ssize_t readSize;
146 PyObject* readResult;
147 PyObject* res = NULL;
148 size_t zresult = 0;
149 PyObject* writeResult;
150 PyObject* totalReadPy;
151 PyObject* totalWritePy;
152
153 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|kk", kwlist, &source,
154 &dest, &inSize, &outSize)) {
155 return NULL;
156 }
157
158 if (!PyObject_HasAttrString(source, "read")) {
159 PyErr_SetString(PyExc_ValueError, "first argument must have a read() method");
160 return NULL;
161 }
162
163 if (!PyObject_HasAttrString(dest, "write")) {
164 PyErr_SetString(PyExc_ValueError, "second argument must have a write() method");
165 return NULL;
166 }
167
168 dstream = DStream_from_ZstdDecompressor(self);
169 if (!dstream) {
170 res = NULL;
171 goto finally;
172 }
173
174 output.dst = PyMem_Malloc(outSize);
175 if (!output.dst) {
176 PyErr_NoMemory();
177 res = NULL;
178 goto finally;
179 }
180 output.size = outSize;
181 output.pos = 0;
182
183 /* Read source stream until EOF */
184 while (1) {
185 readResult = PyObject_CallMethod(source, "read", "n", inSize);
186 if (!readResult) {
187 PyErr_SetString(ZstdError, "could not read() from source");
188 goto finally;
189 }
190
191 PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize);
192
193 /* If no data was read, we're at EOF. */
194 if (0 == readSize) {
195 break;
196 }
197
198 totalRead += readSize;
199
200 /* Send data to decompressor */
201 input.src = readBuffer;
202 input.size = readSize;
203 input.pos = 0;
204
205 while (input.pos < input.size) {
206 Py_BEGIN_ALLOW_THREADS
207 zresult = ZSTD_decompressStream(dstream, &output, &input);
208 Py_END_ALLOW_THREADS
209
210 if (ZSTD_isError(zresult)) {
211 PyErr_Format(ZstdError, "zstd decompressor error: %s",
212 ZSTD_getErrorName(zresult));
213 res = NULL;
214 goto finally;
215 }
216
217 if (output.pos) {
218 #if PY_MAJOR_VERSION >= 3
219 writeResult = PyObject_CallMethod(dest, "write", "y#",
220 #else
221 writeResult = PyObject_CallMethod(dest, "write", "s#",
222 #endif
223 output.dst, output.pos);
224
225 Py_XDECREF(writeResult);
226 totalWrite += output.pos;
227 output.pos = 0;
228 }
229 }
230 }
231
232 /* Source stream is exhausted. Finish up. */
233
234 ZSTD_freeDStream(dstream);
235 dstream = NULL;
236
237 totalReadPy = PyLong_FromSsize_t(totalRead);
238 totalWritePy = PyLong_FromSsize_t(totalWrite);
239 res = PyTuple_Pack(2, totalReadPy, totalWritePy);
240 Py_DecRef(totalReadPy);
241 Py_DecRef(totalWritePy);
242
243 finally:
244 if (output.dst) {
245 PyMem_Free(output.dst);
246 }
247
248 if (dstream) {
249 ZSTD_freeDStream(dstream);
250 }
251
252 return res;
253 }
254
255 PyDoc_STRVAR(Decompressor_decompress__doc__,
256 "decompress(data[, max_output_size=None]) -- Decompress data in its entirety\n"
257 "\n"
258 "This method will decompress the entirety of the argument and return the\n"
259 "result.\n"
260 "\n"
261 "The input bytes are expected to contain a full Zstandard frame (something\n"
262 "compressed with ``ZstdCompressor.compress()`` or similar). If the input does\n"
263 "not contain a full frame, an exception will be raised.\n"
264 "\n"
265 "If the frame header of the compressed data does not contain the content size\n"
266 "``max_output_size`` must be specified or ``ZstdError`` will be raised. An\n"
267 "allocation of size ``max_output_size`` will be performed and an attempt will\n"
268 "be made to perform decompression into that buffer. If the buffer is too\n"
269 "small or cannot be allocated, ``ZstdError`` will be raised. The buffer will\n"
270 "be resized if it is too large.\n"
271 "\n"
272 "Uncompressed data could be much larger than compressed data. As a result,\n"
273 "calling this function could result in a very large memory allocation being\n"
274 "performed to hold the uncompressed data. Therefore it is **highly**\n"
275 "recommended to use a streaming decompression method instead of this one.\n"
276 );
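A sketch of the two cases described above: frames that embed their content size decompress directly, while other frames need ``max_output_size``::

    import zstd

    compressed = zstd.ZstdCompressor(write_content_size=True).compress(b'data')
    dctx = zstd.ZstdDecompressor()

    # Content size is in the frame header, so no size hint is required.
    original = dctx.decompress(compressed)

    # Without a content size in the header, a cap must be supplied (value illustrative).
    original = dctx.decompress(compressed, max_output_size=1048576)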
277
278 PyObject* Decompressor_decompress(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
279 static char* kwlist[] = {
280 "data",
281 "max_output_size",
282 NULL
283 };
284
285 const char* source;
286 Py_ssize_t sourceSize;
287 Py_ssize_t maxOutputSize = 0;
288 unsigned long long decompressedSize;
289 size_t destCapacity;
290 PyObject* result = NULL;
291 ZSTD_DCtx* dctx = NULL;
292 void* dictData = NULL;
293 size_t dictSize = 0;
294 size_t zresult;
295
296 #if PY_MAJOR_VERSION >= 3
297 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y#|n", kwlist,
298 #else
299 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s#|n", kwlist,
300 #endif
301 &source, &sourceSize, &maxOutputSize)) {
302 return NULL;
303 }
304
305 dctx = PyMem_Malloc(ZSTD_sizeof_DCtx(self->refdctx));
306 if (!dctx) {
307 PyErr_NoMemory();
308 return NULL;
309 }
310
311 ZSTD_copyDCtx(dctx, self->refdctx);
312
313 if (self->dict) {
314 dictData = self->dict->dictData;
315 dictSize = self->dict->dictSize;
316 }
317
318 if (dictData && !self->ddict) {
319 Py_BEGIN_ALLOW_THREADS
320 self->ddict = ZSTD_createDDict(dictData, dictSize);
321 Py_END_ALLOW_THREADS
322
323 if (!self->ddict) {
324 PyErr_SetString(ZstdError, "could not create decompression dict");
325 goto except;
326 }
327 }
328
329 decompressedSize = ZSTD_getDecompressedSize(source, sourceSize);
330 /* 0 returned if content size not in the zstd frame header */
331 if (0 == decompressedSize) {
332 if (0 == maxOutputSize) {
333 PyErr_SetString(ZstdError, "input data invalid or missing content size "
334 "in frame header");
335 goto except;
336 }
337 else {
338 result = PyBytes_FromStringAndSize(NULL, maxOutputSize);
339 destCapacity = maxOutputSize;
340 }
341 }
342 else {
343 result = PyBytes_FromStringAndSize(NULL, decompressedSize);
344 destCapacity = decompressedSize;
345 }
346
347 if (!result) {
348 goto except;
349 }
350
351 Py_BEGIN_ALLOW_THREADS
352 if (self->ddict) {
353 zresult = ZSTD_decompress_usingDDict(dctx, PyBytes_AsString(result), destCapacity,
354 source, sourceSize, self->ddict);
355 }
356 else {
357 zresult = ZSTD_decompressDCtx(dctx, PyBytes_AsString(result), destCapacity, source, sourceSize);
358 }
359 Py_END_ALLOW_THREADS
360
361 if (ZSTD_isError(zresult)) {
362 PyErr_Format(ZstdError, "decompression error: %s", ZSTD_getErrorName(zresult));
363 goto except;
364 }
365 else if (decompressedSize && zresult != decompressedSize) {
366 PyErr_Format(ZstdError, "decompression error: decompressed %zu bytes; expected %llu",
367 zresult, decompressedSize);
368 goto except;
369 }
370 else if (zresult < destCapacity) {
371 if (_PyBytes_Resize(&result, zresult)) {
372 goto except;
373 }
374 }
375
376 goto finally;
377
378 except:
379 Py_DecRef(result);
380 result = NULL;
381
382 finally:
383 if (dctx) {
384 PyMem_FREE(dctx);
385 }
386
387 return result;
388 }
389
390 PyDoc_STRVAR(Decompressor_decompressobj__doc__,
391 "decompressobj()\n"
392 "\n"
393 "Incrementally feed data into a decompressor.\n"
394 "\n"
395 "The returned object exposes a ``decompress(data)`` method. This makes it\n"
396 "compatible with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor`` so that\n"
397 "callers can swap in the zstd decompressor while using the same API.\n"
398 );
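A sketch of the ``decompressobj()`` pattern, mirroring ``zlib.decompressobj``::

    import zstd

    compressed = zstd.ZstdCompressor().compress(b'data')
    dobj = zstd.ZstdDecompressor().decompressobj()
    original = dobj.decompress(compressed)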
399
400 static ZstdDecompressionObj* Decompressor_decompressobj(ZstdDecompressor* self) {
401 ZstdDecompressionObj* result = PyObject_New(ZstdDecompressionObj, &ZstdDecompressionObjType);
402 if (!result) {
403 return NULL;
404 }
405
406 result->dstream = DStream_from_ZstdDecompressor(self);
407 if (!result->dstream) {
408 Py_DecRef((PyObject*)result);
409 return NULL;
410 }
411
412 result->decompressor = self;
413 Py_INCREF(result->decompressor);
414
415 result->finished = 0;
416
417 return result;
418 }
419
420 PyDoc_STRVAR(Decompressor_read_from__doc__,
421 "read_from(reader[, read_size=default, write_size=default, skip_bytes=0])\n"
422 "Read compressed data and return an iterator.\n"
423 "\n"
424 "Returns an iterator of decompressed data chunks produced from reading from\n"
425 "the ``reader``.\n"
426 "\n"
427 "Compressed data will be obtained from ``reader`` by calling its\n"
428 "``read(size)`` method. The source data will be streamed into a\n"
429 "decompressor. As decompressed data becomes available, it will be exposed to the\n"
430 "returned iterator.\n"
431 "\n"
432 "Data is ``read()`` in chunks of size ``read_size`` and exposed to the\n"
433 "iterator in chunks of size ``write_size``. The default values are the input\n"
434 "and output sizes for a zstd streaming decompressor.\n"
435 "\n"
436 "There is also support for skipping the first ``skip_bytes`` of data from\n"
437 "the source.\n"
438 );
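/*
 * Usage sketch for the iterator API documented above. File names are
 * placeholders; ``frame`` is assumed to hold a zstd frame as bytes:
 *
 *   import zstd
 *
 *   dctx = zstd.ZstdDecompressor()
 *   with open('input.zst', 'rb') as fh, open('output.bin', 'wb') as out:
 *       for chunk in dctx.read_from(fh):
 *           out.write(chunk)
 *
 *   # skip_bytes helps when the stream is prefixed by a small header:
 *   data = b''.join(dctx.read_from(b'hdr' + frame, skip_bytes=3))
 */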
439
440 static ZstdDecompressorIterator* Decompressor_read_from(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
441 static char* kwlist[] = {
442 "reader",
443 "read_size",
444 "write_size",
445 "skip_bytes",
446 NULL
447 };
448
449 PyObject* reader;
450 size_t inSize = ZSTD_DStreamInSize();
451 size_t outSize = ZSTD_DStreamOutSize();
452 ZstdDecompressorIterator* result;
453 size_t skipBytes = 0;
454
455 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kkk", kwlist, &reader,
456 &inSize, &outSize, &skipBytes)) {
457 return NULL;
458 }
459
460 if (skipBytes >= inSize) {
461 PyErr_SetString(PyExc_ValueError,
462 "skip_bytes must be smaller than read_size");
463 return NULL;
464 }
465
466 result = PyObject_New(ZstdDecompressorIterator, &ZstdDecompressorIteratorType);
467 if (!result) {
468 return NULL;
469 }
470
471 result->decompressor = NULL;
472 result->reader = NULL;
473 result->buffer = NULL;
474 result->dstream = NULL;
475 result->input.src = NULL;
476 result->output.dst = NULL;
477
478 if (PyObject_HasAttrString(reader, "read")) {
479 result->reader = reader;
480 Py_INCREF(result->reader);
481 }
482 else if (1 == PyObject_CheckBuffer(reader)) {
483 /* Object claims it is a buffer. Try to get a handle to it. */
484 result->buffer = PyMem_Malloc(sizeof(Py_buffer));
485 if (!result->buffer) {
486 goto except;
487 }
488
489 memset(result->buffer, 0, sizeof(Py_buffer));
490
491 if (0 != PyObject_GetBuffer(reader, result->buffer, PyBUF_CONTIG_RO)) {
492 goto except;
493 }
494
495 result->bufferOffset = 0;
496 }
497 else {
498 PyErr_SetString(PyExc_ValueError,
499 "must pass an object with a read() method or an object conforming to the buffer protocol");
500 goto except;
501 }
502
503 result->decompressor = self;
504 Py_INCREF(result->decompressor);
505
506 result->inSize = inSize;
507 result->outSize = outSize;
508 result->skipBytes = skipBytes;
509
510 result->dstream = DStream_from_ZstdDecompressor(self);
511 if (!result->dstream) {
512 goto except;
513 }
514
515 result->input.src = PyMem_Malloc(inSize);
516 if (!result->input.src) {
517 PyErr_NoMemory();
518 goto except;
519 }
520 result->input.size = 0;
521 result->input.pos = 0;
522
523 result->output.dst = NULL;
524 result->output.size = 0;
525 result->output.pos = 0;
526
527 result->readCount = 0;
528 result->finishedInput = 0;
529 result->finishedOutput = 0;
530
531 goto finally;
532
533 except:
534 if (result->reader) {
535 Py_DECREF(result->reader);
536 result->reader = NULL;
537 }
538
539 if (result->buffer) {
540 PyBuffer_Release(result->buffer);
541 Py_DECREF(result->buffer);
542 result->buffer = NULL;
543 }
544
545 Py_DECREF(result);
546 result = NULL;
547
548 finally:
549
550 return result;
551 }
552
553 PyDoc_STRVAR(Decompressor_write_to__doc__,
554 "Create a context manager to write decompressed data to an object.\n"
555 "\n"
556 "The passed object must have a ``write()`` method.\n"
557 "\n"
558 "The caller feeds input data to the object by calling ``write(data)``.\n"
559 "Decompressed data is written to the passed ``writer`` as it is produced.\n"
560 "\n"
561 "An optional ``write_size`` argument defines the size of chunks to\n"
562 "``write()`` to the writer. It defaults to the default output size for a zstd\n"
563 "streaming decompressor.\n"
564 );
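/*
 * Usage sketch for the context manager documented above. ``compressed``
 * is assumed to hold zstd-compressed bytes:
 *
 *   import io
 *   import zstd
 *
 *   buffer = io.BytesIO()
 *   dctx = zstd.ZstdDecompressor()
 *   with dctx.write_to(buffer) as decompressor:
 *       decompressor.write(compressed)
 *   data = buffer.getvalue()
 */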
565
566 static ZstdDecompressionWriter* Decompressor_write_to(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
567 static char* kwlist[] = {
568 "writer",
569 "write_size",
570 NULL
571 };
572
573 PyObject* writer;
574 size_t outSize = ZSTD_DStreamOutSize();
575 ZstdDecompressionWriter* result;
576
577 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|k", kwlist, &writer, &outSize)) {
578 return NULL;
579 }
580
581 if (!PyObject_HasAttrString(writer, "write")) {
582 PyErr_SetString(PyExc_ValueError, "must pass an object with a write() method");
583 return NULL;
584 }
585
586 result = PyObject_New(ZstdDecompressionWriter, &ZstdDecompressionWriterType);
587 if (!result) {
588 return NULL;
589 }
590
591 result->decompressor = self;
592 Py_INCREF(result->decompressor);
593
594 result->writer = writer;
595 Py_INCREF(result->writer);
596
597 result->outSize = outSize;
598
599 result->entered = 0;
600 result->dstream = NULL;
601
602 return result;
603 }
604
605 static PyMethodDef Decompressor_methods[] = {
606 { "copy_stream", (PyCFunction)Decompressor_copy_stream, METH_VARARGS | METH_KEYWORDS,
607 Decompressor_copy_stream__doc__ },
608 { "decompress", (PyCFunction)Decompressor_decompress, METH_VARARGS | METH_KEYWORDS,
609 Decompressor_decompress__doc__ },
610 { "decompressobj", (PyCFunction)Decompressor_decompressobj, METH_NOARGS,
611 Decompressor_decompressobj__doc__ },
612 { "read_from", (PyCFunction)Decompressor_read_from, METH_VARARGS | METH_KEYWORDS,
613 Decompressor_read_from__doc__ },
614 { "write_to", (PyCFunction)Decompressor_write_to, METH_VARARGS | METH_KEYWORDS,
615 Decompressor_write_to__doc__ },
616 { NULL, NULL }
617 };
618
619 PyTypeObject ZstdDecompressorType = {
620 PyVarObject_HEAD_INIT(NULL, 0)
621 "zstd.ZstdDecompressor", /* tp_name */
622 sizeof(ZstdDecompressor), /* tp_basicsize */
623 0, /* tp_itemsize */
624 (destructor)Decompressor_dealloc, /* tp_dealloc */
625 0, /* tp_print */
626 0, /* tp_getattr */
627 0, /* tp_setattr */
628 0, /* tp_compare */
629 0, /* tp_repr */
630 0, /* tp_as_number */
631 0, /* tp_as_sequence */
632 0, /* tp_as_mapping */
633 0, /* tp_hash */
634 0, /* tp_call */
635 0, /* tp_str */
636 0, /* tp_getattro */
637 0, /* tp_setattro */
638 0, /* tp_as_buffer */
639 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
640 Decompressor__doc__, /* tp_doc */
641 0, /* tp_traverse */
642 0, /* tp_clear */
643 0, /* tp_richcompare */
644 0, /* tp_weaklistoffset */
645 0, /* tp_iter */
646 0, /* tp_iternext */
647 Decompressor_methods, /* tp_methods */
648 0, /* tp_members */
649 0, /* tp_getset */
650 0, /* tp_base */
651 0, /* tp_dict */
652 0, /* tp_descr_get */
653 0, /* tp_descr_set */
654 0, /* tp_dictoffset */
655 (initproc)Decompressor_init, /* tp_init */
656 0, /* tp_alloc */
657 PyType_GenericNew, /* tp_new */
658 };
659
660 void decompressor_module_init(PyObject* mod) {
661 Py_TYPE(&ZstdDecompressorType) = &PyType_Type;
662 if (PyType_Ready(&ZstdDecompressorType) < 0) {
663 return;
664 }
665
666 Py_INCREF((PyObject*)&ZstdDecompressorType);
667 PyModule_AddObject(mod, "ZstdDecompressor",
668 (PyObject*)&ZstdDecompressorType);
669 }
@@ -0,0 +1,254 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 #define min(a, b) (((a) < (b)) ? (a) : (b))
12
13 extern PyObject* ZstdError;
14
15 PyDoc_STRVAR(ZstdDecompressorIterator__doc__,
16 "Represents an iterator of decompressed data.\n"
17 );
18
19 static void ZstdDecompressorIterator_dealloc(ZstdDecompressorIterator* self) {
20 Py_XDECREF(self->decompressor);
21 Py_XDECREF(self->reader);
22
23 if (self->buffer) {
24 PyBuffer_Release(self->buffer);
25 PyMem_FREE(self->buffer);
26 self->buffer = NULL;
27 }
28
29 if (self->dstream) {
30 ZSTD_freeDStream(self->dstream);
31 self->dstream = NULL;
32 }
33
34 if (self->input.src) {
35 PyMem_Free((void*)self->input.src);
36 self->input.src = NULL;
37 }
38
39 PyObject_Del(self);
40 }
41
42 static PyObject* ZstdDecompressorIterator_iter(PyObject* self) {
43 Py_INCREF(self);
44 return self;
45 }
46
47 static DecompressorIteratorResult read_decompressor_iterator(ZstdDecompressorIterator* self) {
48 size_t zresult;
49 PyObject* chunk;
50 DecompressorIteratorResult result;
51 size_t oldInputPos = self->input.pos;
52
53 result.chunk = NULL;
54
55 chunk = PyBytes_FromStringAndSize(NULL, self->outSize);
56 if (!chunk) {
57 result.errored = 1;
58 return result;
59 }
60
61 self->output.dst = PyBytes_AsString(chunk);
62 self->output.size = self->outSize;
63 self->output.pos = 0;
64
65 Py_BEGIN_ALLOW_THREADS
66 zresult = ZSTD_decompressStream(self->dstream, &self->output, &self->input);
67 Py_END_ALLOW_THREADS
68
69 /* We're done with the pointer. Nullify to prevent anyone from getting a
70 handle on a Python object. */
71 self->output.dst = NULL;
72
73 if (ZSTD_isError(zresult)) {
74 Py_DECREF(chunk);
75 PyErr_Format(ZstdError, "zstd decompress error: %s",
76 ZSTD_getErrorName(zresult));
77 result.errored = 1;
78 return result;
79 }
80
81 self->readCount += self->input.pos - oldInputPos;
82
83 /* Frame is fully decoded. Input exhausted and output sitting in buffer. */
84 if (0 == zresult) {
85 self->finishedInput = 1;
86 self->finishedOutput = 1;
87 }
88
89 /* If it produced output data, return it. */
90 if (self->output.pos) {
91 if (self->output.pos < self->outSize) {
92 if (_PyBytes_Resize(&chunk, self->output.pos)) {
93 result.errored = 1;
94 return result;
95 }
96 }
97 }
98 else {
99 Py_DECREF(chunk);
100 chunk = NULL;
101 }
102
103 result.errored = 0;
104 result.chunk = chunk;
105
106 return result;
107 }
108
109 static PyObject* ZstdDecompressorIterator_iternext(ZstdDecompressorIterator* self) {
110 PyObject* readResult = NULL;
111 char* readBuffer;
112 Py_ssize_t readSize;
113 Py_ssize_t bufferRemaining;
114 DecompressorIteratorResult result;
115
116 if (self->finishedOutput) {
117 PyErr_SetString(PyExc_StopIteration, "output flushed");
118 return NULL;
119 }
120
121 /* If we have data left in the input, consume it. */
122 if (self->input.pos < self->input.size) {
123 result = read_decompressor_iterator(self);
124 if (result.chunk || result.errored) {
125 return result.chunk;
126 }
127
128 /* Else fall through to get more data from input. */
129 }
130
131 read_from_source:
132
133 if (!self->finishedInput) {
134 if (self->reader) {
135 readResult = PyObject_CallMethod(self->reader, "read", "I", self->inSize);
136 if (!readResult) {
137 return NULL;
138 }
139
140 PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize);
141 }
142 else {
143 assert(self->buffer && self->buffer->buf);
144
145 /* Only support contiguous C arrays for now */
146 assert(self->buffer->strides == NULL && self->buffer->suboffsets == NULL);
147 assert(self->buffer->itemsize == 1);
148
149 /* TODO avoid memcpy() below */
150 readBuffer = (char *)self->buffer->buf + self->bufferOffset;
151 bufferRemaining = self->buffer->len - self->bufferOffset;
152 readSize = min(bufferRemaining, (Py_ssize_t)self->inSize);
153 self->bufferOffset += readSize;
154 }
155
156 if (readSize) {
157 if (!self->readCount && self->skipBytes) {
158 assert(self->skipBytes < self->inSize);
159 if ((Py_ssize_t)self->skipBytes >= readSize) {
160 PyErr_SetString(PyExc_ValueError,
161 "skip_bytes larger than first input chunk; "
162 "this scenario is currently unsupported");
163 Py_DecRef(readResult);
164 return NULL;
165 }
166
167 readBuffer = readBuffer + self->skipBytes;
168 readSize -= self->skipBytes;
169 }
170
171 /* Copy input into previously allocated buffer because it can live longer
172 than a single function call and we don't want to keep a ref to a Python
173 object around. This could be changed... */
174 memcpy((void*)self->input.src, readBuffer, readSize);
175 self->input.size = readSize;
176 self->input.pos = 0;
177 }
178 /* No bytes on first read must mean an empty input stream. */
179 else if (!self->readCount) {
180 self->finishedInput = 1;
181 self->finishedOutput = 1;
182 Py_DecRef(readResult);
183 PyErr_SetString(PyExc_StopIteration, "empty input");
184 return NULL;
185 }
186 else {
187 self->finishedInput = 1;
188 }
189
190 /* We've copied the data into memory we manage. Discard the Python object. */
191 Py_DecRef(readResult);
192 }
193
194 result = read_decompressor_iterator(self);
195 if (result.errored || result.chunk) {
196 return result.chunk;
197 }
198
199 /* No new output data. Try again unless we know there is no more data. */
200 if (!self->finishedInput) {
201 goto read_from_source;
202 }
203
204 PyErr_SetString(PyExc_StopIteration, "input exhausted");
205 return NULL;
206 }
207
208 PyTypeObject ZstdDecompressorIteratorType = {
209 PyVarObject_HEAD_INIT(NULL, 0)
210 "zstd.ZstdDecompressorIterator", /* tp_name */
211 sizeof(ZstdDecompressorIterator), /* tp_basicsize */
212 0, /* tp_itemsize */
213 (destructor)ZstdDecompressorIterator_dealloc, /* tp_dealloc */
214 0, /* tp_print */
215 0, /* tp_getattr */
216 0, /* tp_setattr */
217 0, /* tp_compare */
218 0, /* tp_repr */
219 0, /* tp_as_number */
220 0, /* tp_as_sequence */
221 0, /* tp_as_mapping */
222 0, /* tp_hash */
223 0, /* tp_call */
224 0, /* tp_str */
225 0, /* tp_getattro */
226 0, /* tp_setattro */
227 0, /* tp_as_buffer */
228 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
229 ZstdDecompressorIterator__doc__, /* tp_doc */
230 0, /* tp_traverse */
231 0, /* tp_clear */
232 0, /* tp_richcompare */
233 0, /* tp_weaklistoffset */
234 ZstdDecompressorIterator_iter, /* tp_iter */
235 (iternextfunc)ZstdDecompressorIterator_iternext, /* tp_iternext */
236 0, /* tp_methods */
237 0, /* tp_members */
238 0, /* tp_getset */
239 0, /* tp_base */
240 0, /* tp_dict */
241 0, /* tp_descr_get */
242 0, /* tp_descr_set */
243 0, /* tp_dictoffset */
244 0, /* tp_init */
245 0, /* tp_alloc */
246 PyType_GenericNew, /* tp_new */
247 };
248
249 void decompressoriterator_module_init(PyObject* mod) {
250 Py_TYPE(&ZstdDecompressorIteratorType) = &PyType_Type;
251 if (PyType_Ready(&ZstdDecompressorIteratorType) < 0) {
252 return;
253 }
254 }
@@ -0,0 +1,125 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 PyDoc_STRVAR(DictParameters__doc__,
12 "DictParameters: low-level control over dictionary generation");
13
14 static PyObject* DictParameters_new(PyTypeObject* subtype, PyObject* args, PyObject* kwargs) {
15 DictParametersObject* self;
16 unsigned selectivityLevel;
17 int compressionLevel;
18 unsigned notificationLevel;
19 unsigned dictID;
20
21 if (!PyArg_ParseTuple(args, "IiII", &selectivityLevel, &compressionLevel,
22 &notificationLevel, &dictID)) {
23 return NULL;
24 }
25
26 self = (DictParametersObject*)subtype->tp_alloc(subtype, 1);
27 if (!self) {
28 return NULL;
29 }
30
31 self->selectivityLevel = selectivityLevel;
32 self->compressionLevel = compressionLevel;
33 self->notificationLevel = notificationLevel;
34 self->dictID = dictID;
35
36 return (PyObject*)self;
37 }
38
39 static void DictParameters_dealloc(PyObject* self) {
40 PyObject_Del(self);
41 }
42
43 static Py_ssize_t DictParameters_length(PyObject* self) {
44 return 4;
45 }
46
47 static PyObject* DictParameters_item(PyObject* o, Py_ssize_t i) {
48 DictParametersObject* self = (DictParametersObject*)o;
49
50 switch (i) {
51 case 0:
52 return PyLong_FromLong(self->selectivityLevel);
53 case 1:
54 return PyLong_FromLong(self->compressionLevel);
55 case 2:
56 return PyLong_FromLong(self->notificationLevel);
57 case 3:
58 return PyLong_FromLong(self->dictID);
59 default:
60 PyErr_SetString(PyExc_IndexError, "index out of range");
61 return NULL;
62 }
63 }
64
65 static PySequenceMethods DictParameters_sq = {
66 DictParameters_length, /* sq_length */
67 0, /* sq_concat */
68 0, /* sq_repeat */
69 DictParameters_item, /* sq_item */
70 0, /* sq_ass_item */
71 0, /* sq_contains */
72 0, /* sq_inplace_concat */
73 0 /* sq_inplace_repeat */
74 };
75
76 PyTypeObject DictParametersType = {
77 PyVarObject_HEAD_INIT(NULL, 0)
78 "DictParameters", /* tp_name */
79 sizeof(DictParametersObject), /* tp_basicsize */
80 0, /* tp_itemsize */
81 (destructor)DictParameters_dealloc, /* tp_dealloc */
82 0, /* tp_print */
83 0, /* tp_getattr */
84 0, /* tp_setattr */
85 0, /* tp_compare */
86 0, /* tp_repr */
87 0, /* tp_as_number */
88 &DictParameters_sq, /* tp_as_sequence */
89 0, /* tp_as_mapping */
90 0, /* tp_hash */
91 0, /* tp_call */
92 0, /* tp_str */
93 0, /* tp_getattro */
94 0, /* tp_setattro */
95 0, /* tp_as_buffer */
96 Py_TPFLAGS_DEFAULT, /* tp_flags */
97 DictParameters__doc__, /* tp_doc */
98 0, /* tp_traverse */
99 0, /* tp_clear */
100 0, /* tp_richcompare */
101 0, /* tp_weaklistoffset */
102 0, /* tp_iter */
103 0, /* tp_iternext */
104 0, /* tp_methods */
105 0, /* tp_members */
106 0, /* tp_getset */
107 0, /* tp_base */
108 0, /* tp_dict */
109 0, /* tp_descr_get */
110 0, /* tp_descr_set */
111 0, /* tp_dictoffset */
112 0, /* tp_init */
113 0, /* tp_alloc */
114 DictParameters_new, /* tp_new */
115 };
116
117 void dictparams_module_init(PyObject* mod) {
118 Py_TYPE(&DictParametersType) = &PyType_Type;
119 if (PyType_Ready(&DictParametersType) < 0) {
120 return;
121 }
122
123 Py_IncRef((PyObject*)&DictParametersType);
124 PyModule_AddObject(mod, "DictParameters", (PyObject*)&DictParametersType);
125 }
@@ -0,0 +1,172 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #define PY_SSIZE_T_CLEAN
10 #include <Python.h>
11
12 #define ZSTD_STATIC_LINKING_ONLY
13 #define ZDICT_STATIC_LINKING_ONLY
14 #include "mem.h"
15 #include "zstd.h"
16 #include "zdict.h"
17
18 #define PYTHON_ZSTANDARD_VERSION "0.5.0"
19
20 typedef struct {
21 PyObject_HEAD
22 unsigned windowLog;
23 unsigned chainLog;
24 unsigned hashLog;
25 unsigned searchLog;
26 unsigned searchLength;
27 unsigned targetLength;
28 ZSTD_strategy strategy;
29 } CompressionParametersObject;
30
31 extern PyTypeObject CompressionParametersType;
32
33 typedef struct {
34 PyObject_HEAD
35 unsigned selectivityLevel;
36 int compressionLevel;
37 unsigned notificationLevel;
38 unsigned dictID;
39 } DictParametersObject;
40
41 extern PyTypeObject DictParametersType;
42
43 typedef struct {
44 PyObject_HEAD
45
46 void* dictData;
47 size_t dictSize;
48 } ZstdCompressionDict;
49
50 extern PyTypeObject ZstdCompressionDictType;
51
52 typedef struct {
53 PyObject_HEAD
54
55 int compressionLevel;
56 ZstdCompressionDict* dict;
57 ZSTD_CDict* cdict;
58 CompressionParametersObject* cparams;
59 ZSTD_frameParameters fparams;
60 } ZstdCompressor;
61
62 extern PyTypeObject ZstdCompressorType;
63
64 typedef struct {
65 PyObject_HEAD
66
67 ZstdCompressor* compressor;
68 ZSTD_CStream* cstream;
69 ZSTD_outBuffer output;
70 int flushed;
71 } ZstdCompressionObj;
72
73 extern PyTypeObject ZstdCompressionObjType;
74
75 typedef struct {
76 PyObject_HEAD
77
78 ZstdCompressor* compressor;
79 PyObject* writer;
80 Py_ssize_t sourceSize;
81 size_t outSize;
82 ZSTD_CStream* cstream;
83 int entered;
84 } ZstdCompressionWriter;
85
86 extern PyTypeObject ZstdCompressionWriterType;
87
88 typedef struct {
89 PyObject_HEAD
90
91 ZstdCompressor* compressor;
92 PyObject* reader;
93 Py_buffer* buffer;
94 Py_ssize_t bufferOffset;
95 Py_ssize_t sourceSize;
96 size_t inSize;
97 size_t outSize;
98
99 ZSTD_CStream* cstream;
100 ZSTD_inBuffer input;
101 ZSTD_outBuffer output;
102 int finishedOutput;
103 int finishedInput;
104 PyObject* readResult;
105 } ZstdCompressorIterator;
106
107 extern PyTypeObject ZstdCompressorIteratorType;
108
109 typedef struct {
110 PyObject_HEAD
111
112 ZSTD_DCtx* refdctx;
113
114 ZstdCompressionDict* dict;
115 ZSTD_DDict* ddict;
116 } ZstdDecompressor;
117
118 extern PyTypeObject ZstdDecompressorType;
119
120 typedef struct {
121 PyObject_HEAD
122
123 ZstdDecompressor* decompressor;
124 ZSTD_DStream* dstream;
125 int finished;
126 } ZstdDecompressionObj;
127
128 extern PyTypeObject ZstdDecompressionObjType;
129
130 typedef struct {
131 PyObject_HEAD
132
133 ZstdDecompressor* decompressor;
134 PyObject* writer;
135 size_t outSize;
136 ZSTD_DStream* dstream;
137 int entered;
138 } ZstdDecompressionWriter;
139
140 extern PyTypeObject ZstdDecompressionWriterType;
141
142 typedef struct {
143 PyObject_HEAD
144
145 ZstdDecompressor* decompressor;
146 PyObject* reader;
147 Py_buffer* buffer;
148 Py_ssize_t bufferOffset;
149 size_t inSize;
150 size_t outSize;
151 size_t skipBytes;
152 ZSTD_DStream* dstream;
153 ZSTD_inBuffer input;
154 ZSTD_outBuffer output;
155 Py_ssize_t readCount;
156 int finishedInput;
157 int finishedOutput;
158 } ZstdDecompressorIterator;
159
160 extern PyTypeObject ZstdDecompressorIteratorType;
161
162 typedef struct {
163 int errored;
164 PyObject* chunk;
165 } DecompressorIteratorResult;
166
167 void ztopy_compression_parameters(CompressionParametersObject* params, ZSTD_compressionParameters* zparams);
168 CompressionParametersObject* get_compression_parameters(PyObject* self, PyObject* args);
169 PyObject* estimate_compression_context_size(PyObject* self, PyObject* args);
170 ZSTD_CStream* CStream_from_ZstdCompressor(ZstdCompressor* compressor, Py_ssize_t sourceSize);
171 ZSTD_DStream* DStream_from_ZstdDecompressor(ZstdDecompressor* decompressor);
172 ZstdCompressionDict* train_dictionary(PyObject* self, PyObject* args, PyObject* kwargs);
@@ -0,0 +1,110 b''
1 # Copyright (c) 2016-present, Gregory Szorc
2 # All rights reserved.
3 #
4 # This software may be modified and distributed under the terms
5 # of the BSD license. See the LICENSE file for details.
6
7 from __future__ import absolute_import
8
9 import cffi
10 import os
11
12
13 HERE = os.path.abspath(os.path.dirname(__file__))
14
15 SOURCES = ['zstd/%s' % p for p in (
16 'common/entropy_common.c',
17 'common/error_private.c',
18 'common/fse_decompress.c',
19 'common/xxhash.c',
20 'common/zstd_common.c',
21 'compress/fse_compress.c',
22 'compress/huf_compress.c',
23 'compress/zbuff_compress.c',
24 'compress/zstd_compress.c',
25 'decompress/huf_decompress.c',
26 'decompress/zbuff_decompress.c',
27 'decompress/zstd_decompress.c',
28 'dictBuilder/divsufsort.c',
29 'dictBuilder/zdict.c',
30 )]
31
32 INCLUDE_DIRS = [os.path.join(HERE, d) for d in (
33 'zstd',
34 'zstd/common',
35 'zstd/compress',
36 'zstd/decompress',
37 'zstd/dictBuilder',
38 )]
39
40 with open(os.path.join(HERE, 'zstd', 'zstd.h'), 'rb') as fh:
41 zstd_h = fh.read()
42
43 ffi = cffi.FFI()
44 ffi.set_source('_zstd_cffi', '''
45 /* needed for typedefs like U32 references in zstd.h */
46 #include "mem.h"
47 #define ZSTD_STATIC_LINKING_ONLY
48 #include "zstd.h"
49 ''',
50 sources=SOURCES, include_dirs=INCLUDE_DIRS)
51
52 # Rather than duplicating the API definitions from zstd.h inline, munge the
53 # header source into a form that cdef() will accept.
54 lines = zstd_h.splitlines()
55 lines = [l.rstrip() for l in lines if l.strip()]
56
57 # Strip preprocessor directives - they aren't important for our needs.
58 lines = [l for l in lines
59 if not l.startswith((b'#if', b'#else', b'#endif', b'#include'))]
60
61 # Remove extern C block
62 lines = [l for l in lines if l not in (b'extern "C" {', b'}')]
63
64 # The version #defines don't parse and aren't necessary. Strip them.
65 lines = [l for l in lines if not l.startswith((
66 b'#define ZSTD_H_235446',
67 b'#define ZSTD_LIB_VERSION',
68 b'#define ZSTD_QUOTE',
69 b'#define ZSTD_EXPAND_AND_QUOTE',
70 b'#define ZSTD_VERSION_STRING',
71 b'#define ZSTD_VERSION_NUMBER'))]
72
73 # The C parser also doesn't like some constant defines referencing
74 # other constants.
75 # TODO we pick the 64-bit constants here. We should assert somewhere
76 # we're compiling for 64-bit.
77 def fix_constants(l):
78 if l.startswith(b'#define ZSTD_WINDOWLOG_MAX '):
79 return b'#define ZSTD_WINDOWLOG_MAX 27'
80 elif l.startswith(b'#define ZSTD_CHAINLOG_MAX '):
81 return b'#define ZSTD_CHAINLOG_MAX 28'
82 elif l.startswith(b'#define ZSTD_HASHLOG_MAX '):
83 return b'#define ZSTD_HASHLOG_MAX 27'
84 elif l.startswith(b'#define ZSTD_CHAINLOG_MAX '):
85 return b'#define ZSTD_CHAINLOG_MAX 28'
86 elif l.startswith(b'#define ZSTD_CHAINLOG_MIN '):
87 return b'#define ZSTD_CHAINLOG_MIN 6'
88 elif l.startswith(b'#define ZSTD_SEARCHLOG_MAX '):
89 return b'#define ZSTD_SEARCHLOG_MAX 26'
90 elif l.startswith(b'#define ZSTD_BLOCKSIZE_ABSOLUTEMAX '):
91 return b'#define ZSTD_BLOCKSIZE_ABSOLUTEMAX 131072'
92 else:
93 return l
94 lines = map(fix_constants, lines)
95
96 # ZSTDLIB_API isn't handled correctly. Strip it.
97 lines = [l for l in lines if not l.startswith(b'# define ZSTDLIB_API')]
98 def strip_api(l):
99 if l.startswith(b'ZSTDLIB_API '):
100 return l[len(b'ZSTDLIB_API '):]
101 else:
102 return l
103 lines = map(strip_api, lines)
104
105 source = b'\n'.join(lines)
106 ffi.cdef(source.decode('latin1'))
107
108
109 if __name__ == '__main__':
110 ffi.compile()
@@ -0,0 +1,62 b''
1 #!/usr/bin/env python
2 # Copyright (c) 2016-present, Gregory Szorc
3 # All rights reserved.
4 #
5 # This software may be modified and distributed under the terms
6 # of the BSD license. See the LICENSE file for details.
7
8 from setuptools import setup
9
10 try:
11 import cffi
12 except ImportError:
13 cffi = None
14
15 import setup_zstd
16
17 # Code for obtaining the Extension instance is in its own module to
18 # facilitate reuse in other projects.
19 extensions = [setup_zstd.get_c_extension()]
20
21 if cffi:
22 import make_cffi
23 extensions.append(make_cffi.ffi.distutils_extension())
24
25 version = None
26
27 with open('c-ext/python-zstandard.h', 'r') as fh:
28 for line in fh:
29 if not line.startswith('#define PYTHON_ZSTANDARD_VERSION'):
30 continue
31
32 version = line.split()[2][1:-1]
33 break
34
35 if not version:
36 raise Exception('could not resolve package version; '
37 'this should never happen')
38
39 setup(
40 name='zstandard',
41 version=version,
42 description='Zstandard bindings for Python',
43 long_description=open('README.rst', 'r').read(),
44 url='https://github.com/indygreg/python-zstandard',
45 author='Gregory Szorc',
46 author_email='gregory.szorc@gmail.com',
47 license='BSD',
48 classifiers=[
49 'Development Status :: 4 - Beta',
50 'Intended Audience :: Developers',
51 'License :: OSI Approved :: BSD License',
52 'Programming Language :: C',
53 'Programming Language :: Python :: 2.6',
54 'Programming Language :: Python :: 2.7',
55 'Programming Language :: Python :: 3.3',
56 'Programming Language :: Python :: 3.4',
57 'Programming Language :: Python :: 3.5',
58 ],
59 keywords='zstandard zstd compression',
60 ext_modules=extensions,
61 test_suite='tests',
62 )
@@ -0,0 +1,64 b''
1 # Copyright (c) 2016-present, Gregory Szorc
2 # All rights reserved.
3 #
4 # This software may be modified and distributed under the terms
5 # of the BSD license. See the LICENSE file for details.
6
7 import os
8 from distutils.extension import Extension
9
10
11 zstd_sources = ['zstd/%s' % p for p in (
12 'common/entropy_common.c',
13 'common/error_private.c',
14 'common/fse_decompress.c',
15 'common/xxhash.c',
16 'common/zstd_common.c',
17 'compress/fse_compress.c',
18 'compress/huf_compress.c',
19 'compress/zbuff_compress.c',
20 'compress/zstd_compress.c',
21 'decompress/huf_decompress.c',
22 'decompress/zbuff_decompress.c',
23 'decompress/zstd_decompress.c',
24 'dictBuilder/divsufsort.c',
25 'dictBuilder/zdict.c',
26 )]
27
28
29 zstd_includes = [
30 'c-ext',
31 'zstd',
32 'zstd/common',
33 'zstd/compress',
34 'zstd/decompress',
35 'zstd/dictBuilder',
36 ]
37
38 ext_sources = [
39 'zstd.c',
40 'c-ext/compressiondict.c',
41 'c-ext/compressobj.c',
42 'c-ext/compressor.c',
43 'c-ext/compressoriterator.c',
44 'c-ext/compressionparams.c',
45 'c-ext/compressionwriter.c',
46 'c-ext/constants.c',
47 'c-ext/decompressobj.c',
48 'c-ext/decompressor.c',
49 'c-ext/decompressoriterator.c',
50 'c-ext/decompressionwriter.c',
51 'c-ext/dictparams.c',
52 ]
53
54
55 def get_c_extension(name='zstd'):
56 """Obtain a distutils.extension.Extension for the C extension."""
57 root = os.path.abspath(os.path.dirname(__file__))
58
59 sources = [os.path.join(root, p) for p in zstd_sources + ext_sources]
60 include_dirs = [os.path.join(root, d) for d in zstd_includes]
61
62 # TODO compile with optimizations.
63 return Extension(name, sources,
64 include_dirs=include_dirs)
1 NO CONTENT: new file 100644
NO CONTENT: new file 100644
@@ -0,0 +1,15 b''
1 import io
2
3 class OpCountingBytesIO(io.BytesIO):
4 def __init__(self, *args, **kwargs):
5 self._read_count = 0
6 self._write_count = 0
7 return super(OpCountingBytesIO, self).__init__(*args, **kwargs)
8
9 def read(self, *args):
10 self._read_count += 1
11 return super(OpCountingBytesIO, self).read(*args)
12
13 def write(self, data):
14 self._write_count += 1
15 return super(OpCountingBytesIO, self).write(data)
@@ -0,0 +1,35 b''
1 import io
2
3 try:
4 import unittest2 as unittest
5 except ImportError:
6 import unittest
7
8 import zstd
9
10 try:
11 import zstd_cffi
12 except ImportError:
13 raise unittest.SkipTest('cffi version of zstd not available')
14
15
16 class TestCFFIWriteToToCDecompressor(unittest.TestCase):
17 def test_simple(self):
18 orig = io.BytesIO()
19 orig.write(b'foo')
20 orig.write(b'bar')
21 orig.write(b'foobar' * 16384)
22
23 dest = io.BytesIO()
24 cctx = zstd_cffi.ZstdCompressor()
25 with cctx.write_to(dest) as compressor:
26 compressor.write(orig.getvalue())
27
28 uncompressed = io.BytesIO()
29 dctx = zstd.ZstdDecompressor()
30 with dctx.write_to(uncompressed) as decompressor:
31 decompressor.write(dest.getvalue())
32
33 self.assertEqual(uncompressed.getvalue(), orig.getvalue())
34
35
@@ -0,0 +1,465 b''
1 import hashlib
2 import io
3 import struct
4 import sys
5
6 try:
7 import unittest2 as unittest
8 except ImportError:
9 import unittest
10
11 import zstd
12
13 from .common import OpCountingBytesIO
14
15
16 if sys.version_info[0] >= 3:
17 next = lambda it: it.__next__()
18 else:
19 next = lambda it: it.next()
20
21
22 class TestCompressor(unittest.TestCase):
23 def test_level_bounds(self):
24 with self.assertRaises(ValueError):
25 zstd.ZstdCompressor(level=0)
26
27 with self.assertRaises(ValueError):
28 zstd.ZstdCompressor(level=23)
29
30
31 class TestCompressor_compress(unittest.TestCase):
32 def test_compress_empty(self):
33 cctx = zstd.ZstdCompressor(level=1)
34 cctx.compress(b'')
35
36 cctx = zstd.ZstdCompressor(level=22)
37 cctx.compress(b'')
38
39 def test_compress_empty_frame(self):
40 cctx = zstd.ZstdCompressor(level=1)
41 self.assertEqual(cctx.compress(b''),
42 b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
43
44 def test_compress_large(self):
45 chunks = []
46 for i in range(255):
47 chunks.append(struct.Struct('>B').pack(i) * 16384)
48
49 cctx = zstd.ZstdCompressor(level=3)
50 result = cctx.compress(b''.join(chunks))
51 self.assertEqual(len(result), 999)
52 self.assertEqual(result[0:4], b'\x28\xb5\x2f\xfd')
53
54 def test_write_checksum(self):
55 cctx = zstd.ZstdCompressor(level=1)
56 no_checksum = cctx.compress(b'foobar')
57 cctx = zstd.ZstdCompressor(level=1, write_checksum=True)
58 with_checksum = cctx.compress(b'foobar')
59
60 self.assertEqual(len(with_checksum), len(no_checksum) + 4)
61
62 def test_write_content_size(self):
63 cctx = zstd.ZstdCompressor(level=1)
64 no_size = cctx.compress(b'foobar' * 256)
65 cctx = zstd.ZstdCompressor(level=1, write_content_size=True)
66 with_size = cctx.compress(b'foobar' * 256)
67
68 self.assertEqual(len(with_size), len(no_size) + 1)
69
70 def test_no_dict_id(self):
71 samples = []
72 for i in range(128):
73 samples.append(b'foo' * 64)
74 samples.append(b'bar' * 64)
75 samples.append(b'foobar' * 64)
76
77 d = zstd.train_dictionary(1024, samples)
78
79 cctx = zstd.ZstdCompressor(level=1, dict_data=d)
80 with_dict_id = cctx.compress(b'foobarfoobar')
81
82 cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_dict_id=False)
83 no_dict_id = cctx.compress(b'foobarfoobar')
84
85 self.assertEqual(len(with_dict_id), len(no_dict_id) + 4)
86
87 def test_compress_dict_multiple(self):
88 samples = []
89 for i in range(128):
90 samples.append(b'foo' * 64)
91 samples.append(b'bar' * 64)
92 samples.append(b'foobar' * 64)
93
94 d = zstd.train_dictionary(8192, samples)
95
96 cctx = zstd.ZstdCompressor(level=1, dict_data=d)
97
98 for i in range(32):
99 cctx.compress(b'foo bar foobar foo bar foobar')
100
101
102 class TestCompressor_compressobj(unittest.TestCase):
103 def test_compressobj_empty(self):
104 cctx = zstd.ZstdCompressor(level=1)
105 cobj = cctx.compressobj()
106 self.assertEqual(cobj.compress(b''), b'')
107 self.assertEqual(cobj.flush(),
108 b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
109
110 def test_compressobj_large(self):
111 chunks = []
112 for i in range(255):
113 chunks.append(struct.Struct('>B').pack(i) * 16384)
114
115 cctx = zstd.ZstdCompressor(level=3)
116 cobj = cctx.compressobj()
117
118 result = cobj.compress(b''.join(chunks)) + cobj.flush()
119 self.assertEqual(len(result), 999)
120 self.assertEqual(result[0:4], b'\x28\xb5\x2f\xfd')
121
122 def test_write_checksum(self):
123 cctx = zstd.ZstdCompressor(level=1)
124 cobj = cctx.compressobj()
125 no_checksum = cobj.compress(b'foobar') + cobj.flush()
126 cctx = zstd.ZstdCompressor(level=1, write_checksum=True)
127 cobj = cctx.compressobj()
128 with_checksum = cobj.compress(b'foobar') + cobj.flush()
129
130 self.assertEqual(len(with_checksum), len(no_checksum) + 4)
131
132 def test_write_content_size(self):
133 cctx = zstd.ZstdCompressor(level=1)
134 cobj = cctx.compressobj(size=len(b'foobar' * 256))
135 no_size = cobj.compress(b'foobar' * 256) + cobj.flush()
136 cctx = zstd.ZstdCompressor(level=1, write_content_size=True)
137 cobj = cctx.compressobj(size=len(b'foobar' * 256))
138 with_size = cobj.compress(b'foobar' * 256) + cobj.flush()
139
140 self.assertEqual(len(with_size), len(no_size) + 1)
141
142 def test_compress_after_flush(self):
143 cctx = zstd.ZstdCompressor()
144 cobj = cctx.compressobj()
145
146 cobj.compress(b'foo')
147 cobj.flush()
148
149 with self.assertRaisesRegexp(zstd.ZstdError, 'cannot call compress\(\) after flush'):
150 cobj.compress(b'foo')
151
152 with self.assertRaisesRegexp(zstd.ZstdError, 'flush\(\) already called'):
153 cobj.flush()
154
155
156 class TestCompressor_copy_stream(unittest.TestCase):
157 def test_no_read(self):
158 source = object()
159 dest = io.BytesIO()
160
161 cctx = zstd.ZstdCompressor()
162 with self.assertRaises(ValueError):
163 cctx.copy_stream(source, dest)
164
165 def test_no_write(self):
166 source = io.BytesIO()
167 dest = object()
168
169 cctx = zstd.ZstdCompressor()
170 with self.assertRaises(ValueError):
171 cctx.copy_stream(source, dest)
172
173 def test_empty(self):
174 source = io.BytesIO()
175 dest = io.BytesIO()
176
177 cctx = zstd.ZstdCompressor(level=1)
178 r, w = cctx.copy_stream(source, dest)
179 self.assertEqual(int(r), 0)
180 self.assertEqual(w, 9)
181
182 self.assertEqual(dest.getvalue(),
183 b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
184
185 def test_large_data(self):
186 source = io.BytesIO()
187 for i in range(255):
188 source.write(struct.Struct('>B').pack(i) * 16384)
189 source.seek(0)
190
191 dest = io.BytesIO()
192 cctx = zstd.ZstdCompressor()
193 r, w = cctx.copy_stream(source, dest)
194
195 self.assertEqual(r, 255 * 16384)
196 self.assertEqual(w, 999)
197
198 def test_write_checksum(self):
199 source = io.BytesIO(b'foobar')
200 no_checksum = io.BytesIO()
201
202 cctx = zstd.ZstdCompressor(level=1)
203 cctx.copy_stream(source, no_checksum)
204
205 source.seek(0)
206 with_checksum = io.BytesIO()
207 cctx = zstd.ZstdCompressor(level=1, write_checksum=True)
208 cctx.copy_stream(source, with_checksum)
209
210 self.assertEqual(len(with_checksum.getvalue()),
211 len(no_checksum.getvalue()) + 4)
212
213 def test_write_content_size(self):
214 source = io.BytesIO(b'foobar' * 256)
215 no_size = io.BytesIO()
216
217 cctx = zstd.ZstdCompressor(level=1)
218 cctx.copy_stream(source, no_size)
219
220 source.seek(0)
221 with_size = io.BytesIO()
222 cctx = zstd.ZstdCompressor(level=1, write_content_size=True)
223 cctx.copy_stream(source, with_size)
224
225 # Source content size is unknown, so no content size written.
226 self.assertEqual(len(with_size.getvalue()),
227 len(no_size.getvalue()))
228
229 source.seek(0)
230 with_size = io.BytesIO()
231 cctx.copy_stream(source, with_size, size=len(source.getvalue()))
232
233 # We specified source size, so content size header is present.
234 self.assertEqual(len(with_size.getvalue()),
235 len(no_size.getvalue()) + 1)
236
237 def test_read_write_size(self):
238 source = OpCountingBytesIO(b'foobarfoobar')
239 dest = OpCountingBytesIO()
240 cctx = zstd.ZstdCompressor()
241 r, w = cctx.copy_stream(source, dest, read_size=1, write_size=1)
242
243 self.assertEqual(r, len(source.getvalue()))
244 self.assertEqual(w, 21)
245 self.assertEqual(source._read_count, len(source.getvalue()) + 1)
246 self.assertEqual(dest._write_count, len(dest.getvalue()))
247
248
249 def compress(data, level):
250 buffer = io.BytesIO()
251 cctx = zstd.ZstdCompressor(level=level)
252 with cctx.write_to(buffer) as compressor:
253 compressor.write(data)
254 return buffer.getvalue()
255
256
257 class TestCompressor_write_to(unittest.TestCase):
258 def test_empty(self):
259 self.assertEqual(compress(b'', 1),
260 b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
261
262 def test_multiple_compress(self):
263 buffer = io.BytesIO()
264 cctx = zstd.ZstdCompressor(level=5)
265 with cctx.write_to(buffer) as compressor:
266 compressor.write(b'foo')
267 compressor.write(b'bar')
268 compressor.write(b'x' * 8192)
269
270 result = buffer.getvalue()
271 self.assertEqual(result,
272 b'\x28\xb5\x2f\xfd\x00\x50\x75\x00\x00\x38\x66\x6f'
273 b'\x6f\x62\x61\x72\x78\x01\x00\xfc\xdf\x03\x23')
274
275 def test_dictionary(self):
276 samples = []
277 for i in range(128):
278 samples.append(b'foo' * 64)
279 samples.append(b'bar' * 64)
280 samples.append(b'foobar' * 64)
281
282 d = zstd.train_dictionary(8192, samples)
283
284 buffer = io.BytesIO()
285 cctx = zstd.ZstdCompressor(level=9, dict_data=d)
286 with cctx.write_to(buffer) as compressor:
287 compressor.write(b'foo')
288 compressor.write(b'bar')
289 compressor.write(b'foo' * 16384)
290
291 compressed = buffer.getvalue()
292 h = hashlib.sha1(compressed).hexdigest()
293 self.assertEqual(h, '1c5bcd25181bcd8c1a73ea8773323e0056129f92')
294
295 def test_compression_params(self):
296 params = zstd.CompressionParameters(20, 6, 12, 5, 4, 10, zstd.STRATEGY_FAST)
297
298 buffer = io.BytesIO()
299 cctx = zstd.ZstdCompressor(compression_params=params)
300 with cctx.write_to(buffer) as compressor:
301 compressor.write(b'foo')
302 compressor.write(b'bar')
303 compressor.write(b'foobar' * 16384)
304
305 compressed = buffer.getvalue()
306 h = hashlib.sha1(compressed).hexdigest()
307 self.assertEqual(h, '1ae31f270ed7de14235221a604b31ecd517ebd99')
308
309 def test_write_checksum(self):
310 no_checksum = io.BytesIO()
311 cctx = zstd.ZstdCompressor(level=1)
312 with cctx.write_to(no_checksum) as compressor:
313 compressor.write(b'foobar')
314
315 with_checksum = io.BytesIO()
316 cctx = zstd.ZstdCompressor(level=1, write_checksum=True)
317 with cctx.write_to(with_checksum) as compressor:
318 compressor.write(b'foobar')
319
320 self.assertEqual(len(with_checksum.getvalue()),
321 len(no_checksum.getvalue()) + 4)
322
323 def test_write_content_size(self):
324 no_size = io.BytesIO()
325 cctx = zstd.ZstdCompressor(level=1)
326 with cctx.write_to(no_size) as compressor:
327 compressor.write(b'foobar' * 256)
328
329 with_size = io.BytesIO()
330 cctx = zstd.ZstdCompressor(level=1, write_content_size=True)
331 with cctx.write_to(with_size) as compressor:
332 compressor.write(b'foobar' * 256)
333
334 # Source size is not known in streaming mode, so header not
335 # written.
336 self.assertEqual(len(with_size.getvalue()),
337 len(no_size.getvalue()))
338
339 # Declaring size will write the header.
340 with_size = io.BytesIO()
341 with cctx.write_to(with_size, size=len(b'foobar' * 256)) as compressor:
342 compressor.write(b'foobar' * 256)
343
344 self.assertEqual(len(with_size.getvalue()),
345 len(no_size.getvalue()) + 1)
346
347 def test_no_dict_id(self):
348 samples = []
349 for i in range(128):
350 samples.append(b'foo' * 64)
351 samples.append(b'bar' * 64)
352 samples.append(b'foobar' * 64)
353
354 d = zstd.train_dictionary(1024, samples)
355
356 with_dict_id = io.BytesIO()
357 cctx = zstd.ZstdCompressor(level=1, dict_data=d)
358 with cctx.write_to(with_dict_id) as compressor:
359 compressor.write(b'foobarfoobar')
360
361 cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_dict_id=False)
362 no_dict_id = io.BytesIO()
363 with cctx.write_to(no_dict_id) as compressor:
364 compressor.write(b'foobarfoobar')
365
366 self.assertEqual(len(with_dict_id.getvalue()),
367 len(no_dict_id.getvalue()) + 4)
368
369 def test_memory_size(self):
370 cctx = zstd.ZstdCompressor(level=3)
371 buffer = io.BytesIO()
372 with cctx.write_to(buffer) as compressor:
373 size = compressor.memory_size()
374
375 self.assertGreater(size, 100000)
376
377 def test_write_size(self):
378 cctx = zstd.ZstdCompressor(level=3)
379 dest = OpCountingBytesIO()
380 with cctx.write_to(dest, write_size=1) as compressor:
381 compressor.write(b'foo')
382 compressor.write(b'bar')
383 compressor.write(b'foobar')
384
385 self.assertEqual(len(dest.getvalue()), dest._write_count)
386
387
388 class TestCompressor_read_from(unittest.TestCase):
389 def test_type_validation(self):
390 cctx = zstd.ZstdCompressor()
391
392 # Object with read() works.
393 cctx.read_from(io.BytesIO())
394
395 # Buffer protocol works.
396 cctx.read_from(b'foobar')
397
398 with self.assertRaisesRegexp(ValueError, 'must pass an object with a read'):
399 cctx.read_from(True)
400
401 def test_read_empty(self):
402 cctx = zstd.ZstdCompressor(level=1)
403
404 source = io.BytesIO()
405 it = cctx.read_from(source)
406 chunks = list(it)
407 self.assertEqual(len(chunks), 1)
408 compressed = b''.join(chunks)
409 self.assertEqual(compressed, b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
410
411 # And again with the buffer protocol.
412 it = cctx.read_from(b'')
413 chunks = list(it)
414 self.assertEqual(len(chunks), 1)
415 compressed2 = b''.join(chunks)
416 self.assertEqual(compressed2, compressed)
417
418 def test_read_large(self):
419 cctx = zstd.ZstdCompressor(level=1)
420
421 source = io.BytesIO()
422 source.write(b'f' * zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE)
423 source.write(b'o')
424 source.seek(0)
425
426 # Creating an iterator should not perform any compression until
427 # first read.
428 it = cctx.read_from(source, size=len(source.getvalue()))
429 self.assertEqual(source.tell(), 0)
430
431 # We should have exactly 2 output chunks.
432 chunks = []
433 chunk = next(it)
434 self.assertIsNotNone(chunk)
435 self.assertEqual(source.tell(), zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE)
436 chunks.append(chunk)
437 chunk = next(it)
438 self.assertIsNotNone(chunk)
439 chunks.append(chunk)
440
441 self.assertEqual(source.tell(), len(source.getvalue()))
442
443 with self.assertRaises(StopIteration):
444 next(it)
445
446 # And again for good measure.
447 with self.assertRaises(StopIteration):
448 next(it)
449
450 # We should get the same output as the one-shot compression mechanism.
451 self.assertEqual(b''.join(chunks), cctx.compress(source.getvalue()))
452
453 # Now check the buffer protocol.
454 it = cctx.read_from(source.getvalue())
455 chunks = list(it)
456 self.assertEqual(len(chunks), 2)
457 self.assertEqual(b''.join(chunks), cctx.compress(source.getvalue()))
458
459 def test_read_write_size(self):
460 source = OpCountingBytesIO(b'foobarfoobar')
461 cctx = zstd.ZstdCompressor(level=3)
462 for chunk in cctx.read_from(source, read_size=1, write_size=1):
463 self.assertEqual(len(chunk), 1)
464
465 self.assertEqual(source._read_count, len(source.getvalue()) + 1)
@@ -0,0 +1,107 b''
1 import io
2
3 try:
4 import unittest2 as unittest
5 except ImportError:
6 import unittest
7
8 try:
9 import hypothesis
10 import hypothesis.strategies as strategies
11 except ImportError:
12 hypothesis = None
13
14 import zstd
15
16 class TestCompressionParameters(unittest.TestCase):
17 def test_init_bad_arg_type(self):
18 with self.assertRaises(TypeError):
19 zstd.CompressionParameters()
20
21 with self.assertRaises(TypeError):
22 zstd.CompressionParameters(0, 1)
23
24 def test_bounds(self):
25 zstd.CompressionParameters(zstd.WINDOWLOG_MIN,
26 zstd.CHAINLOG_MIN,
27 zstd.HASHLOG_MIN,
28 zstd.SEARCHLOG_MIN,
29 zstd.SEARCHLENGTH_MIN,
30 zstd.TARGETLENGTH_MIN,
31 zstd.STRATEGY_FAST)
32
33 zstd.CompressionParameters(zstd.WINDOWLOG_MAX,
34 zstd.CHAINLOG_MAX,
35 zstd.HASHLOG_MAX,
36 zstd.SEARCHLOG_MAX,
37 zstd.SEARCHLENGTH_MAX,
38 zstd.TARGETLENGTH_MAX,
39 zstd.STRATEGY_BTOPT)
40
41 def test_get_compression_parameters(self):
42 p = zstd.get_compression_parameters(1)
43 self.assertIsInstance(p, zstd.CompressionParameters)
44
45 self.assertEqual(p[0], 19)
46
47 if hypothesis:
48 s_windowlog = strategies.integers(min_value=zstd.WINDOWLOG_MIN,
49 max_value=zstd.WINDOWLOG_MAX)
50 s_chainlog = strategies.integers(min_value=zstd.CHAINLOG_MIN,
51 max_value=zstd.CHAINLOG_MAX)
52 s_hashlog = strategies.integers(min_value=zstd.HASHLOG_MIN,
53 max_value=zstd.HASHLOG_MAX)
54 s_searchlog = strategies.integers(min_value=zstd.SEARCHLOG_MIN,
55 max_value=zstd.SEARCHLOG_MAX)
56 s_searchlength = strategies.integers(min_value=zstd.SEARCHLENGTH_MIN,
57 max_value=zstd.SEARCHLENGTH_MAX)
58 s_targetlength = strategies.integers(min_value=zstd.TARGETLENGTH_MIN,
59 max_value=zstd.TARGETLENGTH_MAX)
60 s_strategy = strategies.sampled_from((zstd.STRATEGY_FAST,
61 zstd.STRATEGY_DFAST,
62 zstd.STRATEGY_GREEDY,
63 zstd.STRATEGY_LAZY,
64 zstd.STRATEGY_LAZY2,
65 zstd.STRATEGY_BTLAZY2,
66 zstd.STRATEGY_BTOPT))
67
68 class TestCompressionParametersHypothesis(unittest.TestCase):
69 @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog,
70 s_searchlength, s_targetlength, s_strategy)
71 def test_valid_init(self, windowlog, chainlog, hashlog, searchlog,
72 searchlength, targetlength, strategy):
73 p = zstd.CompressionParameters(windowlog, chainlog, hashlog,
74 searchlog, searchlength,
75 targetlength, strategy)
76 self.assertEqual(tuple(p),
77 (windowlog, chainlog, hashlog, searchlog,
78 searchlength, targetlength, strategy))
79
80 # Verify we can instantiate a compressor with the supplied values.
81 # ZSTD_checkCParams moves the goal posts on us from what's advertised
82 # in the constants. So move along with them.
83 if searchlength == zstd.SEARCHLENGTH_MIN and strategy in (zstd.STRATEGY_FAST, zstd.STRATEGY_GREEDY):
84 searchlength += 1
85 p = zstd.CompressionParameters(windowlog, chainlog, hashlog,
86 searchlog, searchlength,
87 targetlength, strategy)
88 elif searchlength == zstd.SEARCHLENGTH_MAX and strategy != zstd.STRATEGY_FAST:
89 searchlength -= 1
90 p = zstd.CompressionParameters(windowlog, chainlog, hashlog,
91 searchlog, searchlength,
92 targetlength, strategy)
93
94 cctx = zstd.ZstdCompressor(compression_params=p)
95 with cctx.write_to(io.BytesIO()):
96 pass
97
98 @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog,
99 s_searchlength, s_targetlength, s_strategy)
100 def test_estimate_compression_context_size(self, windowlog, chainlog,
101 hashlog, searchlog,
102 searchlength, targetlength,
103 strategy):
104 p = zstd.CompressionParameters(windowlog, chainlog, hashlog,
105 searchlog, searchlength,
106 targetlength, strategy)
107 size = zstd.estimate_compression_context_size(p)
@@ -0,0 +1,478 b''
1 import io
2 import random
3 import struct
4 import sys
5
6 try:
7 import unittest2 as unittest
8 except ImportError:
9 import unittest
10
11 import zstd
12
13 from .common import OpCountingBytesIO
14
15
16 if sys.version_info[0] >= 3:
17 next = lambda it: it.__next__()
18 else:
19 next = lambda it: it.next()
20
21
22 class TestDecompressor_decompress(unittest.TestCase):
23 def test_empty_input(self):
24 dctx = zstd.ZstdDecompressor()
25
26 with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'):
27 dctx.decompress(b'')
28
29 def test_invalid_input(self):
30 dctx = zstd.ZstdDecompressor()
31
32 with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'):
33 dctx.decompress(b'foobar')
34
35 def test_no_content_size_in_frame(self):
36 cctx = zstd.ZstdCompressor(write_content_size=False)
37 compressed = cctx.compress(b'foobar')
38
39 dctx = zstd.ZstdDecompressor()
40 with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'):
41 dctx.decompress(compressed)
42
43 def test_content_size_present(self):
44 cctx = zstd.ZstdCompressor(write_content_size=True)
45 compressed = cctx.compress(b'foobar')
46
47 dctx = zstd.ZstdDecompressor()
48 decompressed = dctx.decompress(compressed)
49 self.assertEqual(decompressed, b'foobar')
50
51 def test_max_output_size(self):
52 cctx = zstd.ZstdCompressor(write_content_size=False)
53 source = b'foobar' * 256
54 compressed = cctx.compress(source)
55
56 dctx = zstd.ZstdDecompressor()
57 # Will fit into buffer exactly the size of input.
58 decompressed = dctx.decompress(compressed, max_output_size=len(source))
59 self.assertEqual(decompressed, source)
60
61 # Input size - 1 fails
62 with self.assertRaisesRegexp(zstd.ZstdError, 'Destination buffer is too small'):
63 dctx.decompress(compressed, max_output_size=len(source) - 1)
64
65 # Input size + 1 works
66 decompressed = dctx.decompress(compressed, max_output_size=len(source) + 1)
67 self.assertEqual(decompressed, source)
68
69 # A much larger buffer works.
70 decompressed = dctx.decompress(compressed, max_output_size=len(source) * 64)
71 self.assertEqual(decompressed, source)
72
73 def test_stupidly_large_output_buffer(self):
74 cctx = zstd.ZstdCompressor(write_content_size=False)
75 compressed = cctx.compress(b'foobar' * 256)
76 dctx = zstd.ZstdDecompressor()
77
78 # Will get OverflowError on some Python distributions that can't
79 # handle really large integers.
80 with self.assertRaises((MemoryError, OverflowError)):
81 dctx.decompress(compressed, max_output_size=2**62)
82
83 def test_dictionary(self):
84 samples = []
85 for i in range(128):
86 samples.append(b'foo' * 64)
87 samples.append(b'bar' * 64)
88 samples.append(b'foobar' * 64)
89
90 d = zstd.train_dictionary(8192, samples)
91
92 orig = b'foobar' * 16384
93 cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_content_size=True)
94 compressed = cctx.compress(orig)
95
96 dctx = zstd.ZstdDecompressor(dict_data=d)
97 decompressed = dctx.decompress(compressed)
98
99 self.assertEqual(decompressed, orig)
100
101 def test_dictionary_multiple(self):
102 samples = []
103 for i in range(128):
104 samples.append(b'foo' * 64)
105 samples.append(b'bar' * 64)
106 samples.append(b'foobar' * 64)
107
108 d = zstd.train_dictionary(8192, samples)
109
110 sources = (b'foobar' * 8192, b'foo' * 8192, b'bar' * 8192)
111 compressed = []
112 cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_content_size=True)
113 for source in sources:
114 compressed.append(cctx.compress(source))
115
116 dctx = zstd.ZstdDecompressor(dict_data=d)
117 for i in range(len(sources)):
118 decompressed = dctx.decompress(compressed[i])
119 self.assertEqual(decompressed, sources[i])
120
121
122 class TestDecompressor_copy_stream(unittest.TestCase):
123 def test_no_read(self):
124 source = object()
125 dest = io.BytesIO()
126
127 dctx = zstd.ZstdDecompressor()
128 with self.assertRaises(ValueError):
129 dctx.copy_stream(source, dest)
130
131 def test_no_write(self):
132 source = io.BytesIO()
133 dest = object()
134
135 dctx = zstd.ZstdDecompressor()
136 with self.assertRaises(ValueError):
137 dctx.copy_stream(source, dest)
138
139 def test_empty(self):
140 source = io.BytesIO()
141 dest = io.BytesIO()
142
143 dctx = zstd.ZstdDecompressor()
144 # TODO should this raise an error?
145 r, w = dctx.copy_stream(source, dest)
146
147 self.assertEqual(r, 0)
148 self.assertEqual(w, 0)
149 self.assertEqual(dest.getvalue(), b'')
150
151 def test_large_data(self):
152 source = io.BytesIO()
153 for i in range(255):
154 source.write(struct.Struct('>B').pack(i) * 16384)
155 source.seek(0)
156
157 compressed = io.BytesIO()
158 cctx = zstd.ZstdCompressor()
159 cctx.copy_stream(source, compressed)
160
161 compressed.seek(0)
162 dest = io.BytesIO()
163 dctx = zstd.ZstdDecompressor()
164 r, w = dctx.copy_stream(compressed, dest)
165
166 self.assertEqual(r, len(compressed.getvalue()))
167 self.assertEqual(w, len(source.getvalue()))
168
169 def test_read_write_size(self):
170 source = OpCountingBytesIO(zstd.ZstdCompressor().compress(
171 b'foobarfoobar'))
172
173 dest = OpCountingBytesIO()
174 dctx = zstd.ZstdDecompressor()
175 r, w = dctx.copy_stream(source, dest, read_size=1, write_size=1)
176
177 self.assertEqual(r, len(source.getvalue()))
178 self.assertEqual(w, len(b'foobarfoobar'))
179 self.assertEqual(source._read_count, len(source.getvalue()) + 1)
180 self.assertEqual(dest._write_count, len(dest.getvalue()))
181
182
183 class TestDecompressor_decompressobj(unittest.TestCase):
184 def test_simple(self):
185 data = zstd.ZstdCompressor(level=1).compress(b'foobar')
186
187 dctx = zstd.ZstdDecompressor()
188 dobj = dctx.decompressobj()
189 self.assertEqual(dobj.decompress(data), b'foobar')
190
191 def test_reuse(self):
192 data = zstd.ZstdCompressor(level=1).compress(b'foobar')
193
194 dctx = zstd.ZstdDecompressor()
195 dobj = dctx.decompressobj()
196 dobj.decompress(data)
197
198 with self.assertRaisesRegexp(zstd.ZstdError, 'cannot use a decompressobj'):
199 dobj.decompress(data)
200
201
202 def decompress_via_writer(data):
203 buffer = io.BytesIO()
204 dctx = zstd.ZstdDecompressor()
205 with dctx.write_to(buffer) as decompressor:
206 decompressor.write(data)
207 return buffer.getvalue()
208
209
210 class TestDecompressor_write_to(unittest.TestCase):
211 def test_empty_roundtrip(self):
212 cctx = zstd.ZstdCompressor()
213 empty = cctx.compress(b'')
214 self.assertEqual(decompress_via_writer(empty), b'')
215
216 def test_large_roundtrip(self):
217 chunks = []
218 for i in range(255):
219 chunks.append(struct.Struct('>B').pack(i) * 16384)
220 orig = b''.join(chunks)
221 cctx = zstd.ZstdCompressor()
222 compressed = cctx.compress(orig)
223
224 self.assertEqual(decompress_via_writer(compressed), orig)
225
226 def test_multiple_calls(self):
227 chunks = []
228 for i in range(255):
229 for j in range(255):
230 chunks.append(struct.Struct('>B').pack(j) * i)
231
232 orig = b''.join(chunks)
233 cctx = zstd.ZstdCompressor()
234 compressed = cctx.compress(orig)
235
236 buffer = io.BytesIO()
237 dctx = zstd.ZstdDecompressor()
238 with dctx.write_to(buffer) as decompressor:
239 pos = 0
240 while pos < len(compressed):
241 pos2 = pos + 8192
242 decompressor.write(compressed[pos:pos2])
243 pos += 8192
244 self.assertEqual(buffer.getvalue(), orig)
245
246 def test_dictionary(self):
247 samples = []
248 for i in range(128):
249 samples.append(b'foo' * 64)
250 samples.append(b'bar' * 64)
251 samples.append(b'foobar' * 64)
252
253 d = zstd.train_dictionary(8192, samples)
254
255 orig = b'foobar' * 16384
256 buffer = io.BytesIO()
257 cctx = zstd.ZstdCompressor(dict_data=d)
258 with cctx.write_to(buffer) as compressor:
259 compressor.write(orig)
260
261 compressed = buffer.getvalue()
262 buffer = io.BytesIO()
263
264 dctx = zstd.ZstdDecompressor(dict_data=d)
265 with dctx.write_to(buffer) as decompressor:
266 decompressor.write(compressed)
267
268 self.assertEqual(buffer.getvalue(), orig)
269
270 def test_memory_size(self):
271 dctx = zstd.ZstdDecompressor()
272 buffer = io.BytesIO()
273 with dctx.write_to(buffer) as decompressor:
274 size = decompressor.memory_size()
275
276 self.assertGreater(size, 100000)
277
278 def test_write_size(self):
279 source = zstd.ZstdCompressor().compress(b'foobarfoobar')
280 dest = OpCountingBytesIO()
281 dctx = zstd.ZstdDecompressor()
282 with dctx.write_to(dest, write_size=1) as decompressor:
283 s = struct.Struct('>B')
284 for c in source:
285 if not isinstance(c, str):
286 c = s.pack(c)
287 decompressor.write(c)
288
289
290 self.assertEqual(dest.getvalue(), b'foobarfoobar')
291 self.assertEqual(dest._write_count, len(dest.getvalue()))
292
293
294 class TestDecompressor_read_from(unittest.TestCase):
295 def test_type_validation(self):
296 dctx = zstd.ZstdDecompressor()
297
298 # Object with read() works.
299 dctx.read_from(io.BytesIO())
300
301 # Buffer protocol works.
302 dctx.read_from(b'foobar')
303
304 with self.assertRaisesRegexp(ValueError, 'must pass an object with a read'):
305 dctx.read_from(True)
306
307 def test_empty_input(self):
308 dctx = zstd.ZstdDecompressor()
309
310 source = io.BytesIO()
311 it = dctx.read_from(source)
312         # TODO this is arguably wrong. Should get an error about a missing frame.
313 with self.assertRaises(StopIteration):
314 next(it)
315
316 it = dctx.read_from(b'')
317 with self.assertRaises(StopIteration):
318 next(it)
319
320 def test_invalid_input(self):
321 dctx = zstd.ZstdDecompressor()
322
323 source = io.BytesIO(b'foobar')
324 it = dctx.read_from(source)
325 with self.assertRaisesRegexp(zstd.ZstdError, 'Unknown frame descriptor'):
326 next(it)
327
328 it = dctx.read_from(b'foobar')
329 with self.assertRaisesRegexp(zstd.ZstdError, 'Unknown frame descriptor'):
330 next(it)
331
332 def test_empty_roundtrip(self):
333 cctx = zstd.ZstdCompressor(level=1, write_content_size=False)
334 empty = cctx.compress(b'')
335
336 source = io.BytesIO(empty)
337 source.seek(0)
338
339 dctx = zstd.ZstdDecompressor()
340 it = dctx.read_from(source)
341
342 # No chunks should be emitted since there is no data.
343 with self.assertRaises(StopIteration):
344 next(it)
345
346 # Again for good measure.
347 with self.assertRaises(StopIteration):
348 next(it)
349
350 def test_skip_bytes_too_large(self):
351 dctx = zstd.ZstdDecompressor()
352
353 with self.assertRaisesRegexp(ValueError, 'skip_bytes must be smaller than read_size'):
354 dctx.read_from(b'', skip_bytes=1, read_size=1)
355
356 with self.assertRaisesRegexp(ValueError, 'skip_bytes larger than first input chunk'):
357 b''.join(dctx.read_from(b'foobar', skip_bytes=10))
358
359 def test_skip_bytes(self):
360 cctx = zstd.ZstdCompressor(write_content_size=False)
361 compressed = cctx.compress(b'foobar')
362
363 dctx = zstd.ZstdDecompressor()
364 output = b''.join(dctx.read_from(b'hdr' + compressed, skip_bytes=3))
365 self.assertEqual(output, b'foobar')
366
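For reference, a hedged sketch of the skip_bytes behaviour tested above: a caller that prepends a private header to a zstd frame can pass the whole buffer to read_from and skip the header, provided skip_bytes is smaller than read_size (the hdr prefix is purely illustrative):

import zstd

HEADER = b'hdr'  # hypothetical application-specific prefix

def decompress_with_header(buf):
    dctx = zstd.ZstdDecompressor()
    # Skip the application header so decoding starts at the zstd frame.
    return b''.join(dctx.read_from(buf, skip_bytes=len(HEADER)))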
367 def test_large_output(self):
368 source = io.BytesIO()
369 source.write(b'f' * zstd.DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE)
370 source.write(b'o')
371 source.seek(0)
372
373 cctx = zstd.ZstdCompressor(level=1)
374 compressed = io.BytesIO(cctx.compress(source.getvalue()))
375 compressed.seek(0)
376
377 dctx = zstd.ZstdDecompressor()
378 it = dctx.read_from(compressed)
379
380 chunks = []
381 chunks.append(next(it))
382 chunks.append(next(it))
383
384 with self.assertRaises(StopIteration):
385 next(it)
386
387 decompressed = b''.join(chunks)
388 self.assertEqual(decompressed, source.getvalue())
389
390 # And again with buffer protocol.
391 it = dctx.read_from(compressed.getvalue())
392 chunks = []
393 chunks.append(next(it))
394 chunks.append(next(it))
395
396 with self.assertRaises(StopIteration):
397 next(it)
398
399 decompressed = b''.join(chunks)
400 self.assertEqual(decompressed, source.getvalue())
401
402 def test_large_input(self):
403 bytes = list(struct.Struct('>B').pack(i) for i in range(256))
404 compressed = io.BytesIO()
405 input_size = 0
406 cctx = zstd.ZstdCompressor(level=1)
407 with cctx.write_to(compressed) as compressor:
408 while True:
409 compressor.write(random.choice(bytes))
410 input_size += 1
411
412 have_compressed = len(compressed.getvalue()) > zstd.DECOMPRESSION_RECOMMENDED_INPUT_SIZE
413 have_raw = input_size > zstd.DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE * 2
414 if have_compressed and have_raw:
415 break
416
417 compressed.seek(0)
418 self.assertGreater(len(compressed.getvalue()),
419 zstd.DECOMPRESSION_RECOMMENDED_INPUT_SIZE)
420
421 dctx = zstd.ZstdDecompressor()
422 it = dctx.read_from(compressed)
423
424 chunks = []
425 chunks.append(next(it))
426 chunks.append(next(it))
427 chunks.append(next(it))
428
429 with self.assertRaises(StopIteration):
430 next(it)
431
432 decompressed = b''.join(chunks)
433 self.assertEqual(len(decompressed), input_size)
434
435 # And again with buffer protocol.
436 it = dctx.read_from(compressed.getvalue())
437
438 chunks = []
439 chunks.append(next(it))
440 chunks.append(next(it))
441 chunks.append(next(it))
442
443 with self.assertRaises(StopIteration):
444 next(it)
445
446 decompressed = b''.join(chunks)
447 self.assertEqual(len(decompressed), input_size)
448
449 def test_interesting(self):
450 # Found this edge case via fuzzing.
451 cctx = zstd.ZstdCompressor(level=1)
452
453 source = io.BytesIO()
454
455 compressed = io.BytesIO()
456 with cctx.write_to(compressed) as compressor:
457 for i in range(256):
458 chunk = b'\0' * 1024
459 compressor.write(chunk)
460 source.write(chunk)
461
462 dctx = zstd.ZstdDecompressor()
463
464 simple = dctx.decompress(compressed.getvalue(),
465 max_output_size=len(source.getvalue()))
466 self.assertEqual(simple, source.getvalue())
467
468 compressed.seek(0)
469 streamed = b''.join(dctx.read_from(compressed))
470 self.assertEqual(streamed, source.getvalue())
471
472 def test_read_write_size(self):
473 source = OpCountingBytesIO(zstd.ZstdCompressor().compress(b'foobarfoobar'))
474 dctx = zstd.ZstdDecompressor()
475 for chunk in dctx.read_from(source, read_size=1, write_size=1):
476 self.assertEqual(len(chunk), 1)
477
478 self.assertEqual(source._read_count, len(source.getvalue()))
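To summarize the read_from usage these tests exercise, a hedged sketch of streaming decompression from one file object to another (file names are illustrative):

import zstd

def decompress_file(in_path, out_path):
    dctx = zstd.ZstdDecompressor()
    with open(in_path, 'rb') as ifh, open(out_path, 'wb') as ofh:
        # read_from yields decompressed chunks as compressed input is consumed.
        for chunk in dctx.read_from(ifh):
            ofh.write(chunk)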
@@ -0,0 +1,17 b''
1 try:
2 import unittest2 as unittest
3 except ImportError:
4 import unittest
5
6 import zstd
7
8
9 class TestSizes(unittest.TestCase):
10 def test_decompression_size(self):
11 size = zstd.estimate_decompression_context_size()
12 self.assertGreater(size, 100000)
13
14 def test_compression_size(self):
15 params = zstd.get_compression_parameters(3)
16 size = zstd.estimate_compression_context_size(params)
17 self.assertGreater(size, 100000)
@@ -0,0 +1,48 b''
1 from __future__ import unicode_literals
2
3 try:
4 import unittest2 as unittest
5 except ImportError:
6 import unittest
7
8 import zstd
9
10 class TestModuleAttributes(unittest.TestCase):
11 def test_version(self):
12 self.assertEqual(zstd.ZSTD_VERSION, (1, 1, 1))
13
14 def test_constants(self):
15 self.assertEqual(zstd.MAX_COMPRESSION_LEVEL, 22)
16 self.assertEqual(zstd.FRAME_HEADER, b'\x28\xb5\x2f\xfd')
17
18 def test_hasattr(self):
19 attrs = (
20 'COMPRESSION_RECOMMENDED_INPUT_SIZE',
21 'COMPRESSION_RECOMMENDED_OUTPUT_SIZE',
22 'DECOMPRESSION_RECOMMENDED_INPUT_SIZE',
23 'DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE',
24 'MAGIC_NUMBER',
25 'WINDOWLOG_MIN',
26 'WINDOWLOG_MAX',
27 'CHAINLOG_MIN',
28 'CHAINLOG_MAX',
29 'HASHLOG_MIN',
30 'HASHLOG_MAX',
31 'HASHLOG3_MAX',
32 'SEARCHLOG_MIN',
33 'SEARCHLOG_MAX',
34 'SEARCHLENGTH_MIN',
35 'SEARCHLENGTH_MAX',
36 'TARGETLENGTH_MIN',
37 'TARGETLENGTH_MAX',
38 'STRATEGY_FAST',
39 'STRATEGY_DFAST',
40 'STRATEGY_GREEDY',
41 'STRATEGY_LAZY',
42 'STRATEGY_LAZY2',
43 'STRATEGY_BTLAZY2',
44 'STRATEGY_BTOPT',
45 )
46
47 for a in attrs:
48 self.assertTrue(hasattr(zstd, a))
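Building on the constants checked above, a small hedged sketch that sniffs whether a buffer starts with a zstd frame (it only inspects the magic bytes and does not validate the rest of the frame):

import zstd

def looks_like_zstd_frame(data):
    # zstd.FRAME_HEADER is b'\x28\xb5\x2f\xfd' per test_constants above.
    return data[:len(zstd.FRAME_HEADER)] == zstd.FRAME_HEADER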
@@ -0,0 +1,64 b''
1 import io
2
3 try:
4 import unittest2 as unittest
5 except ImportError:
6 import unittest
7
8 try:
9 import hypothesis
10 import hypothesis.strategies as strategies
11 except ImportError:
12 raise unittest.SkipTest('hypothesis not available')
13
14 import zstd
15
16
17 compression_levels = strategies.integers(min_value=1, max_value=22)
18
19
20 class TestRoundTrip(unittest.TestCase):
21 @hypothesis.given(strategies.binary(), compression_levels)
22 def test_compress_write_to(self, data, level):
23 """Random data from compress() roundtrips via write_to."""
24 cctx = zstd.ZstdCompressor(level=level)
25 compressed = cctx.compress(data)
26
27 buffer = io.BytesIO()
28 dctx = zstd.ZstdDecompressor()
29 with dctx.write_to(buffer) as decompressor:
30 decompressor.write(compressed)
31
32 self.assertEqual(buffer.getvalue(), data)
33
34 @hypothesis.given(strategies.binary(), compression_levels)
35 def test_compressor_write_to_decompressor_write_to(self, data, level):
36 """Random data from compressor write_to roundtrips via write_to."""
37 compress_buffer = io.BytesIO()
38 decompressed_buffer = io.BytesIO()
39
40 cctx = zstd.ZstdCompressor(level=level)
41 with cctx.write_to(compress_buffer) as compressor:
42 compressor.write(data)
43
44 dctx = zstd.ZstdDecompressor()
45 with dctx.write_to(decompressed_buffer) as decompressor:
46 decompressor.write(compress_buffer.getvalue())
47
48 self.assertEqual(decompressed_buffer.getvalue(), data)
49
50 @hypothesis.given(strategies.binary(average_size=1048576))
51 @hypothesis.settings(perform_health_check=False)
52 def test_compressor_write_to_decompressor_write_to_larger(self, data):
53 compress_buffer = io.BytesIO()
54 decompressed_buffer = io.BytesIO()
55
56 cctx = zstd.ZstdCompressor(level=5)
57 with cctx.write_to(compress_buffer) as compressor:
58 compressor.write(data)
59
60 dctx = zstd.ZstdDecompressor()
61 with dctx.write_to(decompressed_buffer) as decompressor:
62 decompressor.write(compress_buffer.getvalue())
63
64 self.assertEqual(decompressed_buffer.getvalue(), data)
@@ -0,0 +1,46 b''
1 import sys
2
3 try:
4 import unittest2 as unittest
5 except ImportError:
6 import unittest
7
8 import zstd
9
10
11 if sys.version_info[0] >= 3:
12 int_type = int
13 else:
14 int_type = long
15
16
17 class TestTrainDictionary(unittest.TestCase):
18 def test_no_args(self):
19 with self.assertRaises(TypeError):
20 zstd.train_dictionary()
21
22 def test_bad_args(self):
23 with self.assertRaises(TypeError):
24 zstd.train_dictionary(8192, u'foo')
25
26 with self.assertRaises(ValueError):
27 zstd.train_dictionary(8192, [u'foo'])
28
29 def test_basic(self):
30 samples = []
31 for i in range(128):
32 samples.append(b'foo' * 64)
33 samples.append(b'bar' * 64)
34 samples.append(b'foobar' * 64)
35 samples.append(b'baz' * 64)
36 samples.append(b'foobaz' * 64)
37 samples.append(b'bazfoo' * 64)
38
39 d = zstd.train_dictionary(8192, samples)
40 self.assertLessEqual(len(d), 8192)
41
42 dict_id = d.dict_id()
43 self.assertIsInstance(dict_id, int_type)
44
45 data = d.as_bytes()
46 self.assertEqual(data[0:4], b'\x37\xa4\x30\xec')
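Putting the dictionary pieces from these tests together, a hedged end-to-end sketch of training a dictionary and using it for compression and decompression (the sample data is illustrative; real samples should resemble the data being compressed):

import io
import zstd

samples = [b'foo' * 64, b'bar' * 64, b'foobar' * 64] * 128
d = zstd.train_dictionary(8192, samples)

orig = b'foobar' * 1024
buf = io.BytesIO()
cctx = zstd.ZstdCompressor(dict_data=d)
with cctx.write_to(buf) as compressor:
    compressor.write(orig)

dctx = zstd.ZstdDecompressor(dict_data=d)
assert b''.join(dctx.read_from(buf.getvalue())) == orig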
@@ -0,0 +1,112 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 /* A Python C extension for Zstandard. */
10
11 #include "python-zstandard.h"
12
13 PyObject *ZstdError;
14
15 PyDoc_STRVAR(estimate_compression_context_size__doc__,
16 "estimate_compression_context_size(compression_parameters)\n"
17 "\n"
18 "Give the amount of memory allocated for a compression context given a\n"
19 "CompressionParameters instance");
20
21 PyDoc_STRVAR(estimate_decompression_context_size__doc__,
22 "estimate_decompression_context_size()\n"
23 "\n"
24 "Estimate the amount of memory allocated to a decompression context.\n"
25 );
26
27 static PyObject* estimate_decompression_context_size(PyObject* self) {
28 return PyLong_FromSize_t(ZSTD_estimateDCtxSize());
29 }
30
31 PyDoc_STRVAR(get_compression_parameters__doc__,
32 "get_compression_parameters(compression_level[, source_size[, dict_size]])\n"
33 "\n"
34 "Obtains a ``CompressionParameters`` instance from a compression level and\n"
35 "optional input size and dictionary size");
36
37 PyDoc_STRVAR(train_dictionary__doc__,
38 "train_dictionary(dict_size, samples)\n"
39 "\n"
40 "Train a dictionary from sample data.\n"
41 "\n"
42 "A compression dictionary of size ``dict_size`` will be created from the\n"
43 "iterable of samples provided by ``samples``.\n"
44 "\n"
45 "The raw dictionary content will be returned\n");
46
47 static char zstd_doc[] = "Interface to zstandard";
48
49 static PyMethodDef zstd_methods[] = {
50 { "estimate_compression_context_size", (PyCFunction)estimate_compression_context_size,
51 METH_VARARGS, estimate_compression_context_size__doc__ },
52 { "estimate_decompression_context_size", (PyCFunction)estimate_decompression_context_size,
53 METH_NOARGS, estimate_decompression_context_size__doc__ },
54 { "get_compression_parameters", (PyCFunction)get_compression_parameters,
55 METH_VARARGS, get_compression_parameters__doc__ },
56 { "train_dictionary", (PyCFunction)train_dictionary,
57 METH_VARARGS | METH_KEYWORDS, train_dictionary__doc__ },
58 { NULL, NULL }
59 };
60
61 void compressobj_module_init(PyObject* mod);
62 void compressor_module_init(PyObject* mod);
63 void compressionparams_module_init(PyObject* mod);
64 void constants_module_init(PyObject* mod);
65 void dictparams_module_init(PyObject* mod);
66 void compressiondict_module_init(PyObject* mod);
67 void compressionwriter_module_init(PyObject* mod);
68 void compressoriterator_module_init(PyObject* mod);
69 void decompressor_module_init(PyObject* mod);
70 void decompressobj_module_init(PyObject* mod);
71 void decompressionwriter_module_init(PyObject* mod);
72 void decompressoriterator_module_init(PyObject* mod);
73
74 void zstd_module_init(PyObject* m) {
75 compressionparams_module_init(m);
76 dictparams_module_init(m);
77 compressiondict_module_init(m);
78 compressobj_module_init(m);
79 compressor_module_init(m);
80 compressionwriter_module_init(m);
81 compressoriterator_module_init(m);
82 constants_module_init(m);
83 decompressor_module_init(m);
84 decompressobj_module_init(m);
85 decompressionwriter_module_init(m);
86 decompressoriterator_module_init(m);
87 }
88
89 #if PY_MAJOR_VERSION >= 3
90 static struct PyModuleDef zstd_module = {
91 PyModuleDef_HEAD_INIT,
92 "zstd",
93 zstd_doc,
94 -1,
95 zstd_methods
96 };
97
98 PyMODINIT_FUNC PyInit_zstd(void) {
99 PyObject *m = PyModule_Create(&zstd_module);
100 if (m) {
101 zstd_module_init(m);
102 }
103 return m;
104 }
105 #else
106 PyMODINIT_FUNC initzstd(void) {
107 PyObject *m = Py_InitModule3("zstd", zstd_methods, zstd_doc);
108 if (m) {
109 zstd_module_init(m);
110 }
111 }
112 #endif
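To illustrate the module-level functions documented in the docstrings above, a hedged Python sketch mirroring the size-estimation tests earlier in this change:

import zstd

# Estimate memory used by a decompression context.
dctx_size = zstd.estimate_decompression_context_size()

# Derive CompressionParameters for level 3, then estimate the
# corresponding compression context size.
params = zstd.get_compression_parameters(3)
cctx_size = zstd.estimate_compression_context_size(params)

print(dctx_size, cctx_size)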
@@ -0,0 +1,152 b''
1 # Copyright (c) 2016-present, Gregory Szorc
2 # All rights reserved.
3 #
4 # This software may be modified and distributed under the terms
5 # of the BSD license. See the LICENSE file for details.
6
7 """Python interface to the Zstandard (zstd) compression library."""
8
9 from __future__ import absolute_import, unicode_literals
10
11 import io
12
13 from _zstd_cffi import (
14 ffi,
15 lib,
16 )
17
18
19 _CSTREAM_IN_SIZE = lib.ZSTD_CStreamInSize()
20 _CSTREAM_OUT_SIZE = lib.ZSTD_CStreamOutSize()
21
22
23 class _ZstdCompressionWriter(object):
24 def __init__(self, cstream, writer):
25 self._cstream = cstream
26 self._writer = writer
27
28 def __enter__(self):
29 return self
30
31 def __exit__(self, exc_type, exc_value, exc_tb):
32 if not exc_type and not exc_value and not exc_tb:
33 out_buffer = ffi.new('ZSTD_outBuffer *')
34 out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE)
35 out_buffer.size = _CSTREAM_OUT_SIZE
36 out_buffer.pos = 0
37
38 while True:
39 res = lib.ZSTD_endStream(self._cstream, out_buffer)
40 if lib.ZSTD_isError(res):
41                     raise Exception('error ending compression stream: %s' % lib.ZSTD_getErrorName(res))
42
43 if out_buffer.pos:
44 self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos))
45 out_buffer.pos = 0
46
47 if res == 0:
48 break
49
50 return False
51
52 def write(self, data):
53 out_buffer = ffi.new('ZSTD_outBuffer *')
54 out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE)
55 out_buffer.size = _CSTREAM_OUT_SIZE
56 out_buffer.pos = 0
57
58 # TODO can we reuse existing memory?
59 in_buffer = ffi.new('ZSTD_inBuffer *')
60 in_buffer.src = ffi.new('char[]', data)
61 in_buffer.size = len(data)
62 in_buffer.pos = 0
63 while in_buffer.pos < in_buffer.size:
64 res = lib.ZSTD_compressStream(self._cstream, out_buffer, in_buffer)
65 if lib.ZSTD_isError(res):
66 raise Exception('zstd compress error: %s' % lib.ZSTD_getErrorName(res))
67
68 if out_buffer.pos:
69 self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos))
70 out_buffer.pos = 0
71
72
73 class ZstdCompressor(object):
74 def __init__(self, level=3, dict_data=None, compression_params=None):
75 if dict_data:
76 raise Exception('dict_data not yet supported')
77 if compression_params:
78 raise Exception('compression_params not yet supported')
79
80 self._compression_level = level
81
82 def compress(self, data):
83 # Just use the stream API for now.
84 output = io.BytesIO()
85 with self.write_to(output) as compressor:
86 compressor.write(data)
87 return output.getvalue()
88
89 def copy_stream(self, ifh, ofh):
90 cstream = self._get_cstream()
91
92 in_buffer = ffi.new('ZSTD_inBuffer *')
93 out_buffer = ffi.new('ZSTD_outBuffer *')
94
95 out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE)
96 out_buffer.size = _CSTREAM_OUT_SIZE
97 out_buffer.pos = 0
98
99 total_read, total_write = 0, 0
100
101 while True:
102 data = ifh.read(_CSTREAM_IN_SIZE)
103 if not data:
104 break
105
106 total_read += len(data)
107
108 in_buffer.src = ffi.new('char[]', data)
109 in_buffer.size = len(data)
110 in_buffer.pos = 0
111
112 while in_buffer.pos < in_buffer.size:
113 res = lib.ZSTD_compressStream(cstream, out_buffer, in_buffer)
114 if lib.ZSTD_isError(res):
115 raise Exception('zstd compress error: %s' %
116 lib.ZSTD_getErrorName(res))
117
118 if out_buffer.pos:
119 ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos))
120                     total_write += out_buffer.pos
121 out_buffer.pos = 0
122
123 # We've finished reading. Flush the compressor.
124 while True:
125 res = lib.ZSTD_endStream(cstream, out_buffer)
126 if lib.ZSTD_isError(res):
127 raise Exception('error ending compression stream: %s' %
128 lib.ZSTD_getErrorName(res))
129
130 if out_buffer.pos:
131 ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos))
132 total_write += out_buffer.pos
133 out_buffer.pos = 0
134
135 if res == 0:
136 break
137
138 return total_read, total_write
139
140 def write_to(self, writer):
141 return _ZstdCompressionWriter(self._get_cstream(), writer)
142
143 def _get_cstream(self):
144 cstream = lib.ZSTD_createCStream()
145 cstream = ffi.gc(cstream, lib.ZSTD_freeCStream)
146
147 res = lib.ZSTD_initCStream(cstream, self._compression_level)
148 if lib.ZSTD_isError(res):
149 raise Exception('cannot init CStream: %s' %
150 lib.ZSTD_getErrorName(res))
151
152 return cstream
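A hedged usage sketch of this CFFI backend's ZstdCompressor (the module name zstd_cffi is an assumption for illustration; dict_data and compression_params are rejected by the constructor above):

import io

import zstd_cffi as zstd  # hypothetical import name for this cffi module

cctx = zstd.ZstdCompressor(level=3)

# One-shot compression, implemented on top of the streaming writer.
frame = cctx.compress(b'data to compress')

# Stream from one file object to another; returns (bytes_read, bytes_written).
src = io.BytesIO(b'x' * 100000)
dst = io.BytesIO()
read_count, write_count = cctx.copy_stream(src, dst)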
@@ -7,7 +7,7 b''
7 New errors are not allowed. Warnings are strongly discouraged.
8 (The writing "no-che?k-code" is for not skipping this file when checking.)
9
10 $ hg locate | sed 's-\\-/-g' |
10 $ hg locate -X contrib/python-zstandard | sed 's-\\-/-g' |
11 > xargs "$check_code" --warnings --per-file=0 || false
12 Skipping hgext/fsmonitor/pywatchman/__init__.py it has no-che?k-code (glob)
13 Skipping hgext/fsmonitor/pywatchman/bser.c it has no-che?k-code (glob)
@@ -159,6 +159,7 b' outputs, which should be fixed later.'
159 $ hg locate 'set:**.py or grep(r"^#!.*?python")' \
160 > 'tests/**.t' \
161 > -X contrib/debugshell.py \
162 > -X contrib/python-zstandard/ \
163 > -X contrib/win32/hgwebdir_wsgi.py \
164 > -X doc/gendoc.py \
165 > -X doc/hgmanpage.py \
@@ -4,6 +4,17 b''
4 $ cd "$TESTDIR"/..
5
6 $ hg files 'set:(**.py)' | sed 's|\\|/|g' | xargs python contrib/check-py3-compat.py
7 contrib/python-zstandard/setup.py not using absolute_import
8 contrib/python-zstandard/setup_zstd.py not using absolute_import
9 contrib/python-zstandard/tests/common.py not using absolute_import
10 contrib/python-zstandard/tests/test_cffi.py not using absolute_import
11 contrib/python-zstandard/tests/test_compressor.py not using absolute_import
12 contrib/python-zstandard/tests/test_data_structures.py not using absolute_import
13 contrib/python-zstandard/tests/test_decompressor.py not using absolute_import
14 contrib/python-zstandard/tests/test_estimate_sizes.py not using absolute_import
15 contrib/python-zstandard/tests/test_module_attributes.py not using absolute_import
16 contrib/python-zstandard/tests/test_roundtrip.py not using absolute_import
17 contrib/python-zstandard/tests/test_train_dictionary.py not using absolute_import
18 hgext/fsmonitor/pywatchman/__init__.py not using absolute_import
19 hgext/fsmonitor/pywatchman/__init__.py requires print_function
20 hgext/fsmonitor/pywatchman/capabilities.py not using absolute_import
@@ -10,6 +10,6 b' run pyflakes on all tracked files ending'
10 > -X mercurial/pycompat.py \
11 > 2>/dev/null \
12 > | xargs pyflakes 2>/dev/null | "$TESTDIR/filterpyflakes.py"
13 contrib/python-zstandard/tests/test_data_structures.py:107: local variable 'size' is assigned to but never used
14 tests/filterpyflakes.py:39: undefined name 'undefinedname'
15