zstd: vendor python-zstandard 0.5.0...
Gregory Szorc
r30435:b86a448a default
@@ -0,0 +1,27 b''
1 Copyright (c) 2016, Gregory Szorc
2 All rights reserved.
3
4 Redistribution and use in source and binary forms, with or without modification,
5 are permitted provided that the following conditions are met:
6
7 1. Redistributions of source code must retain the above copyright notice, this
8 list of conditions and the following disclaimer.
9
10 2. Redistributions in binary form must reproduce the above copyright notice,
11 this list of conditions and the following disclaimer in the documentation
12 and/or other materials provided with the distribution.
13
14 3. Neither the name of the copyright holder nor the names of its contributors
15 may be used to endorse or promote products derived from this software without
16 specific prior written permission.
17
18 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
19 ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
20 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
21 DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
22 ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
23 (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
24 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
25 ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
26 (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
27 SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,2 b''
1 graft zstd
2 include make_cffi.py
@@ -0,0 +1,63 b''
1 Version History
2 ===============
3
4 0.5.0 (released 2016-11-10)
5 ---------------------------
6
7 * Vendored version of zstd updated to 1.1.1.
8 * Continuous integration for Python 3.6 and 3.7
9 * Continuous integration for Conda
10 * Added compression and decompression APIs providing similar interfaces
11 to the standard library ``zlib`` and ``bz2`` modules. This allows
12 coding to a common interface.
13 * ``zstd.__version__`` is now defined.
14 * ``read_from()`` on various APIs now accepts objects implementing the buffer
15 protocol.
16 * ``read_from()`` has gained a ``skip_bytes`` argument. This allows callers
17 to pass in an existing buffer with a header without having to create a
18 slice or a new object.
19 * Implemented ``ZstdCompressionDict.as_bytes()``.
20 * Python's memory allocator is now used instead of ``malloc()``.
21 * Low-level zstd data structures are reused in more instances, cutting down
22 on overhead for certain operations.
23 * ``distutils`` boilerplate for obtaining an ``Extension`` instance
24 has now been refactored into a standalone ``setup_zstd.py`` file. This
25 allows other projects with ``setup.py`` files to reuse the
26 ``distutils`` code for this project without copying code.
27 * The monolithic ``zstd.c`` file has been split into a header file defining
28 types and separate ``.c`` source files for the implementation.
29
30 History of the Project
31 ======================
32
33 2016-08-31 - Zstandard 1.0.0 is released and Gregory starts hacking on a
34 Python extension for use by the Mercurial project. A very hacky prototype
35 is sent to the mercurial-devel list for RFC.
36
37 2016-09-03 - Most functionality from Zstandard C API implemented. Source
38 code published on https://github.com/indygreg/python-zstandard. Travis-CI
39 automation configured. 0.0.1 release on PyPI.
40
41 2016-09-05 - After the API was rounded out a bit and support for Python
42 2.6 and 2.7 was added, version 0.1 was released to PyPI.
43
44 2016-09-05 - After the compressor and decompressor APIs were changed, 0.2
45 was released to PyPI.
46
47 2016-09-10 - 0.3 is released with a bunch of new features. ZstdCompressor
48 now accepts arguments controlling frame parameters. The source size can now
49 be declared when performing streaming compression. ZstdDecompressor.decompress()
50 is implemented. Compression dictionaries are now cached when using the simple
51 compression and decompression APIs. Memory size APIs added.
52 ZstdCompressor.read_from() and ZstdDecompressor.read_from() have been
53 implemented. This rounds out the major compression/decompression APIs planned
54 by the author.
55
56 2016-10-02 - 0.3.3 is released with a bug fix for read_from not fully
57 decoding a zstd frame (issue #2).
58
59 2016-10-02 - 0.4.0 is released with zstd 1.1.0, support for custom read and
60 write buffer sizes, and a few bug fixes involving failure to read/write
61 all data when buffer sizes were too small to hold remaining data.
62
63 2016-11-10 - 0.5.0 is released with zstd 1.1.1 and other enhancements.
@@ -0,0 +1,776 b''
1 ================
2 python-zstandard
3 ================
4
5 This project provides a Python C extension for interfacing with the
6 `Zstandard <http://www.zstd.net>`_ compression library.
7
8 The primary goal of the extension is to provide a Pythonic interface to
9 the underlying C API. This means exposing most of the features and flexibility
10 of the C API while not sacrificing usability or safety that Python provides.
11
12 | |ci-status| |win-ci-status|
13
14 State of Project
15 ================
16
17 The project is officially in beta state. The author is reasonably satisfied
18 with the current API and that functionality works as advertised. There
19 may be some backwards incompatible changes before 1.0, though the author
20 does not intend to make any major changes to the Python API.
21
22 There is continuous integration for Python versions 2.6, 2.7, and 3.3+
23 on Linux x86_64 and Windows x86 and x86_64. The author is reasonably
24 confident the extension is stable and works as advertised on these
25 platforms.
26
27 Expected Changes
28 ----------------
29
30 The author is reasonably confident in the current state of what's
31 implemented on the ``ZstdCompressor`` and ``ZstdDecompressor`` types.
32 Those APIs likely won't change significantly. Some low-level behavior
33 (such as naming and types expected by arguments) may change.
34
35 There will likely be arguments added to control the input and output
36 buffer sizes (currently, certain operations read and write in chunk
37 sizes using zstd's preferred defaults).
38
39 There should be an API that accepts an object that conforms to the buffer
40 interface and returns an iterator over compressed or decompressed output.
41
42 The author is on the fence as to whether to support the extremely
43 low level compression and decompression APIs. It could be useful to
44 support compression without the framing headers. But the author doesn't
45 believe it is a high priority at this time.
46
47 The CFFI bindings are half-baked and need to be finished.
48
49 Requirements
50 ============
51
52 This extension is designed to run with Python 2.6, 2.7, 3.3, 3.4, and 3.5
53 on common platforms (Linux, Windows, and OS X). Only x86_64 is currently
54 well-tested as an architecture.
55
56 Installing
57 ==========
58
59 This package is uploaded to PyPI at https://pypi.python.org/pypi/zstandard.
60 So, to install this package::
61
62 $ pip install zstandard
63
64 Binary wheels are made available for some platforms. If you need to
65 install from a source distribution, all you should need is a working C
66 compiler and the Python development headers/libraries. On many Linux
67 distributions, you can install a ``python-dev`` or ``python-devel``
68 package to provide these dependencies.
69
70 Packages are also uploaded to Anaconda Cloud at
71 https://anaconda.org/indygreg/zstandard. See that URL for how to install
72 this package with ``conda``.
73
74 Performance
75 ===========
76
77 Very crude and non-scientific benchmarking (most benchmarks fall in this
78 category because proper benchmarking is hard) shows that the Python bindings
79 perform within 10% of the native C implementation.
80
81 The following table compares the performance of compressing and decompressing
82 a 1.1 GB tar file comprised of the files in a Firefox source checkout. Values
83 obtained with the ``zstd`` program are on the left. The remaining columns detail
84 performance of various compression APIs in the Python bindings.
85
86 +-------+-----------------+-----------------+-----------------+---------------+
87 | Level | Native | Simple | Stream In | Stream Out |
88 | | Comp / Decomp | Comp / Decomp | Comp / Decomp | Comp |
89 +=======+=================+=================+=================+===============+
90 | 1 | 490 / 1338 MB/s | 458 / 1266 MB/s | 407 / 1156 MB/s | 405 MB/s |
91 +-------+-----------------+-----------------+-----------------+---------------+
92 | 2 | 412 / 1288 MB/s | 381 / 1203 MB/s | 345 / 1128 MB/s | 349 MB/s |
93 +-------+-----------------+-----------------+-----------------+---------------+
94 | 3 | 342 / 1312 MB/s | 319 / 1182 MB/s | 285 / 1165 MB/s | 287 MB/s |
95 +-------+-----------------+-----------------+-----------------+---------------+
96 | 11 | 64 / 1506 MB/s | 66 / 1436 MB/s | 56 / 1342 MB/s | 57 MB/s |
97 +-------+-----------------+-----------------+-----------------+---------------+
98
99 Again, these are very unscientific. But they show that Python is capable of
100 compressing at several hundred MB/s and decompressing at over 1 GB/s.
101
102 Comparison to Other Python Bindings
103 ===================================
104
105 https://pypi.python.org/pypi/zstd is an alternative Python binding to
106 Zstandard. At the time this was written, the latest release of that
107 package (1.0.0.2) had the following significant differences from this package:
108
109 * It only exposes the simple API for compression and decompression operations.
110 This extension exposes the streaming API, dictionary training, and more.
111 * It adds a custom framing header to compressed data and there is no way to
112 disable it. This means that data produced with that module cannot be used by
113 other Zstandard implementations.
114
115 Bundling of Zstandard Source Code
116 =================================
117
118 The source repository for this project contains a vendored copy of the
119 Zstandard source code. This is done for a few reasons.
120
121 First, Zstandard is relatively new and not yet widely available as a system
122 package. Providing a copy of the source code enables the Python C extension
123 to be compiled without requiring the user to obtain the Zstandard source code
124 separately.
125
126 Second, Zstandard has both a stable *public* API and an *experimental* API.
127 The *experimental* API is actually quite useful (contains functionality for
128 training dictionaries for example), so it is something we wish to expose to
129 Python. However, the *experimental* API is only available via static linking.
130 Furthermore, the *experimental* API can change at any time. So, control over
131 the exact version of the Zstandard library linked against is important to
132 ensure known behavior.
133
134 Instructions for Building and Testing
135 =====================================
136
137 Once you have the source code, the extension can be built via setup.py::
138
139 $ python setup.py build_ext
140
141 We recommend testing with ``nose``::
142
143 $ nosetests
144
145 A Tox configuration is present to test against multiple Python versions::
146
147 $ tox
148
149 Tests use the ``hypothesis`` Python package to perform fuzzing. If you
150 don't have it, those tests won't run.
151
152 There is also an experimental CFFI module. You need the ``cffi`` Python
153 package installed to build and test that.
154
155 To create a virtualenv with all development dependencies, do something
156 like the following::
157
158 # Python 2
159 $ virtualenv venv
160
161 # Python 3
162 $ python3 -m venv venv
163
164 $ source venv/bin/activate
165 $ pip install cffi hypothesis nose tox
166
167 API
168 ===
169
170 The compiled C extension provides a ``zstd`` Python module. This module
171 exposes the following interfaces.
172
173 ZstdCompressor
174 --------------
175
176 The ``ZstdCompressor`` class provides an interface for performing
177 compression operations.
178
179 Each instance is associated with parameters that control compression
180 behavior. These come from the following named arguments (all optional; a combined example follows the list):
181
182 level
183 Integer compression level. Valid values are between 1 and 22.
184 dict_data
185 Compression dictionary to use.
186
187 Note: When using dictionary data and ``compress()`` is called multiple
188 times, the ``CompressionParameters`` derived from an integer compression
189 ``level`` and the first compressed data's size will be reused for all
190 subsequent operations. This may not be desirable if source data size
191 varies significantly.
192 compression_params
193 A ``CompressionParameters`` instance (overrides the ``level`` value).
194 write_checksum
195 Whether a 4 byte checksum should be written with the compressed data.
196 Defaults to False. If True, the decompressor can verify that decompressed
197 data matches the original input data.
198 write_content_size
199 Whether the size of the uncompressed data will be written into the
200 header of compressed data. Defaults to False. The data will only be
201 written if the compressor knows the size of the input data. This is
202 likely not true for streaming compression.
203 write_dict_id
204 Whether to write the dictionary ID into the compressed data.
205 Defaults to True. The dictionary ID is only written if a dictionary
206 is being used.
207
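For example, several of these arguments can be combined when constructing a
compressor (a brief illustrative sketch; the values shown are arbitrary)::

    cctx = zstd.ZstdCompressor(level=10,
                               write_checksum=True,
                               write_content_size=True)
    compressed = cctx.compress(b'data to compress')
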
208 Simple API
209 ^^^^^^^^^^
210
211 ``compress(data)`` compresses and returns data as a one-shot operation.::
212
213 cctx = zstd.ZstdCompressor()
214 compressed = cctx.compress(b'data to compress')
215
216 Streaming Input API
217 ^^^^^^^^^^^^^^^^^^^
218
219 ``write_to(fh)`` (which behaves as a context manager) allows you to *stream*
220 data into a compressor.::
221
222 cctx = zstd.ZstdCompressor(level=10)
223 with cctx.write_to(fh) as compressor:
224 compressor.write(b'chunk 0')
225 compressor.write(b'chunk 1')
226 ...
227
228 The argument to ``write_to()`` must have a ``write(data)`` method. As
229 compressed data is available, ``write()`` will be called with the compressed
230 data as its argument. Many common Python types implement ``write()``, including
231 open file handles and ``io.BytesIO``.
232
233 ``write_to()`` returns an object representing a streaming compressor instance.
234 It **must** be used as a context manager. That object's ``write(data)`` method
235 is used to feed data into the compressor.
236
237 If the size of the data being fed to this streaming compressor is known,
238 you can declare it before compression begins::
239
240 cctx = zstd.ZstdCompressor()
241 with cctx.write_to(fh, size=data_len) as compressor:
242 compressor.write(chunk0)
243 compressor.write(chunk1)
244 ...
245
246 Declaring the size of the source data allows compression parameters to
247 be tuned. And if ``write_content_size`` is used, it also results in the
248 content size being written into the frame header of the output data.
249
250 The size of the chunks passed to the destination's ``write()`` can be specified::
251
252 cctx = zstd.ZstdCompressor()
253 with cctx.write_to(fh, write_size=32768) as compressor:
254 ...
255
256 To see how much memory is being used by the streaming compressor::
257
258 cctx = zstd.ZstdCompressor()
259 with cctx.write_to(fh) as compressor:
260 ...
261 byte_size = compressor.memory_size()
262
263 Streaming Output API
264 ^^^^^^^^^^^^^^^^^^^^
265
266 ``read_from(reader)`` provides a mechanism to stream data out of a compressor
267 as an iterator of data chunks.::
268
269 cctx = zstd.ZstdCompressor()
270 for chunk in cctx.read_from(fh):
271 # Do something with emitted data.
272
273 ``read_from()`` accepts an object that has a ``read(size)`` method or conforms
274 to the buffer protocol. (``bytes`` and ``memoryview`` are 2 common types that
275 provide the buffer protocol.)
276
277 Uncompressed data is fetched from the source either by calling ``read(size)``
278 or by fetching a slice of data from the object directly (in the case where
279 the buffer protocol is being used). The returned iterator consists of chunks
280 of compressed data.
281
282 Like ``write_to()``, ``read_from()`` also accepts a ``size`` argument
283 declaring the size of the input stream::
284
285 cctx = zstd.ZstdCompressor()
286 for chunk in cctx.read_from(fh, size=some_int):
287 pass
288
289 You can also control the size of ``read()`` requests to the source and
290 the ideal size of output chunks::
291
292 cctx = zstd.ZstdCompressor()
293 for chunk in cctx.read_from(fh, read_size=16384, write_size=8192):
294 pass
295
296 Stream Copying API
297 ^^^^^^^^^^^^^^^^^^
298
299 ``copy_stream(ifh, ofh)`` can be used to copy data between 2 streams while
300 compressing it.::
301
302 cctx = zstd.ZstdCompressor()
303 cctx.copy_stream(ifh, ofh)
304
305 For example, say you wish to compress a file::
306
307 cctx = zstd.ZstdCompressor()
308 with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
309 cctx.copy_stream(ifh, ofh)
310
311 It is also possible to declare the size of the source stream::
312
313 cctx = zstd.ZstdCompressor()
314 cctx.copy_stream(ifh, ofh, size=len_of_input)
315
316 You can also specify the sizes of the ``read()`` and ``write()`` chunks used
317 with the streams::
318
319 cctx = zstd.ZstdCompressor()
320 cctx.copy_stream(ifh, ofh, read_size=32768, write_size=16384)
321
322 The stream copier returns a 2-tuple of bytes read and written::
323
324 cctx = zstd.ZstdCompressor()
325 read_count, write_count = cctx.copy_stream(ifh, ofh)
326
327 Compressor API
328 ^^^^^^^^^^^^^^
329
330 ``compressobj()`` returns an object that exposes ``compress(data)`` and
331 ``flush()`` methods. Each returns compressed data or an empty bytes.
332
333 The purpose of ``compressobj()`` is to provide an API-compatible interface
334 with ``zlib.compressobj`` and ``bz2.BZ2Compressor``. This allows callers to
335 swap in different compressor objects while using the same API.
336
337 Once ``flush()`` is called, the compressor will no longer accept new data
338 to ``compress()``. ``flush()`` **must** be called to end the compression
339 context. If not called, the returned data may be incomplete.
340
341 Here is how this API should be used::
342
343 cctx = zstd.ZstdCompressor()
344 cobj = cctx.compressobj()
345 data = cobj.compress(b'raw input 0')
346 data = cobj.compress(b'raw input 1')
347 data = cobj.flush()
348
349 For best performance results, keep input chunks under 256KB. This avoids
350 extra allocations for a large output object.
351
352 It is possible to declare the input size of the data that will be fed into
353 the compressor::
354
355 cctx = zstd.ZstdCompressor()
356 cobj = cctx.compressobj(size=6)
357 data = cobj.compress(b'foobar')
358 data = cobj.flush()
359
360 ZstdDecompressor
361 ----------------
362
363 The ``ZstdDecompressor`` class provides an interface for performing
364 decompression.
365
366 Each instance is associated with parameters that control decompression. These
367 come from the following named arguments (all optional):
368
369 dict_data
370 Compression dictionary to use.
371
372 The interface of this class is very similar to ``ZstdCompressor`` (by design).
373
374 Simple API
375 ^^^^^^^^^^
376
377 ``decompress(data)`` can be used to decompress an entire compressed zstd
378 frame in a single operation.::
379
380 dctx = zstd.ZstdDecompressor()
381 decompressed = dctx.decompress(data)
382
383 By default, ``decompress(data)`` will only work on data written with the content
384 size encoded in its header. This can be achieved by creating a
385 ``ZstdCompressor`` with ``write_content_size=True``. If compressed data without
386 an embedded content size is seen, ``zstd.ZstdError`` will be raised.
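
For example, a round trip where the content size is recorded at compression
time might look like this (a small sketch)::

    cctx = zstd.ZstdCompressor(write_content_size=True)
    frame = cctx.compress(b'data to compress')

    dctx = zstd.ZstdDecompressor()
    decompressed = dctx.decompress(frame)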
387
388 If the compressed data doesn't have its content size embedded within it,
389 decompression can be attempted by specifying the ``max_output_size``
390 argument.::
391
392 dctx = zstd.ZstdDecompressor()
393 uncompressed = dctx.decompress(data, max_output_size=1048576)
394
395 Ideally, ``max_output_size`` will be identical to the decompressed output
396 size.
397
398 If ``max_output_size`` is too small to hold the decompressed data,
399 ``zstd.ZstdError`` will be raised.
400
401 If ``max_output_size`` is larger than the decompressed data, the allocated
402 output buffer will be resized to only use the space required.
403
404 Please note that an allocation of the requested ``max_output_size`` will be
405 performed every time the method is called. Setting to a very large value could
406 result in a lot of work for the memory allocator and may result in
407 ``MemoryError`` being raised if the allocation fails.
408
409 If the exact size of decompressed data is unknown, it is **strongly**
410 recommended to use a streaming API.
411
412 Streaming Input API
413 ^^^^^^^^^^^^^^^^^^^
414
415 ``write_to(fh)`` can be used to incrementally send compressed data to a
416 decompressor.::
417
418 dctx = zstd.ZstdDecompressor()
419 with dctx.write_to(fh) as decompressor:
420 decompressor.write(compressed_data)
421
422 This behaves similarly to ``zstd.ZstdCompressor``: compressed data is written to
423 the decompressor by calling ``write(data)`` and decompressed output is written
424 to the output object by calling its ``write(data)`` method.
425
426 The size of the chunks passed to the destination's ``write()`` can be specified::
427
428 dctx = zstd.ZstdDecompressor()
429 with dctx.write_to(fh, write_size=16384) as decompressor:
430 pass
431
432 You can see how much memory is being used by the decompressor::
433
434 dctx = zstd.ZstdDecompressor()
435 with dctx.write_to(fh) as decompressor:
436 byte_size = decompressor.memory_size()
437
438 Streaming Output API
439 ^^^^^^^^^^^^^^^^^^^^
440
441 ``read_from(fh)`` provides a mechanism to stream decompressed data out of a
442 compressed source as an iterator of data chunks.::
443
444 dctx = zstd.ZstdDecompressor()
445 for chunk in dctx.read_from(fh):
446 # Do something with original data.
447
448 ``read_from()`` accepts either a) an object with a ``read(size)`` method that
449 will return compressed bytes or b) an object conforming to the buffer protocol
450 that can expose its data as a contiguous range of bytes. The ``bytes`` and
451 ``memoryview`` types expose this buffer protocol.
452
453 ``read_from()`` returns an iterator whose elements are chunks of the
454 decompressed data.
455
456 The size of each ``read()`` requested from the source can be specified::
457
458 dctx = zstd.ZstdDecompressor()
459 for chunk in dctx.read_from(fh, read_size=16384):
460 pass
461
462 It is also possible to skip leading bytes in the input data::
463
464 dctx = zstd.ZstdDecompressor()
465 for chunk in dctx.read_from(fh, skip_bytes=1):
466 pass
467
468 Skipping leading bytes is useful if the source data contains extra
469 *header* data but you want to avoid the overhead of making a buffer copy
470 or allocating a new ``memoryview`` object in order to decompress the data.
471
472 Similarly to ``ZstdCompressor.read_from()``, the consumer of the iterator
473 controls when data is decompressed. If the iterator isn't consumed,
474 decompression is put on hold.
475
476 When ``read_from()`` is passed an object conforming to the buffer protocol,
477 the behavior may seem similar to what occurs when the simple decompression
478 API is used. However, this API works when the decompressed size is unknown.
479 Furthermore, if feeding large inputs, the decompressor will work in chunks
480 instead of performing a single operation.
481
482 Stream Copying API
483 ^^^^^^^^^^^^^^^^^^
484
485 ``copy_stream(ifh, ofh)`` can be used to copy data across 2 streams while
486 performing decompression.::
487
488 dctx = zstd.ZstdDecompressor()
489 dctx.copy_stream(ifh, ofh)
490
491 e.g. to decompress a file to another file::
492
493 dctx = zstd.ZstdDecompressor()
494 with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
495 dctx.copy_stream(ifh, ofh)
496
497 The sizes of the ``read()`` and ``write()`` chunks used with the streams
498 can be specified::
499
500 dctx = zstd.ZstdDecompressor()
501 dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384)
502
503 Decompressor API
504 ^^^^^^^^^^^^^^^^
505
506 ``decompressobj()`` returns an object that exposes a ``decompress(data)``
507 method. Compressed data chunks are fed into ``decompress(data)`` and
508 uncompressed output (or an empty bytes) is returned. Output from subsequent
509 calls needs to be concatenated to reassemble the full decompressed byte
510 sequence.
511
512 The purpose of ``decompressobj()`` is to provide an API-compatible interface
513 with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor``. This allows callers
514 to swap in different decompressor objects while using the same API.
515
516 Each object is single use: once an input frame is decoded, ``decompress()``
517 can no longer be called.
518
519 Here is how this API should be used::
520
521 dctx = zstd.ZstdDecompressor()
522 dobj = dctx.decompressobj()
523 data = dobj.decompress(compressed_chunk_0)
524 data = dobj.decompress(compressed_chunk_1)
525
526 Choosing an API
527 ---------------
528
529 Various forms of compression and decompression APIs are provided because each
530 is suited to different use cases.
531
532 The simple/one-shot APIs are useful for small data, when the decompressed
533 data size is known (either recorded in the zstd frame header via
534 ``write_content_size`` or known via an out-of-band mechanism, such as a file
535 size).
536
537 A limitation of the simple APIs is that input or output data must fit in memory.
538 And unless using advanced tricks with Python *buffer objects*, both input and
539 output must fit in memory simultaneously.
540
541 Another limitation is that compression or decompression is performed as a single
542 operation. So if you feed large input, it could take a long time for the
543 function to return.
544
545 The streaming APIs do not have the limitations of the simple API. The cost
546 is that they are more complex to use than a single function call.
547
548 The streaming APIs put the caller in control of compression and decompression
549 behavior by allowing them to directly control either the input or output side
550 of the operation.
551
552 With the streaming input APIs, the caller feeds data into the compressor or
553 decompressor as they see fit. Output data will only be written after the caller
554 has explicitly written data.
555
556 With the streaming output APIs, the caller consumes output from the compressor
557 or decompressor as they see fit. The compressor or decompressor will only
558 consume data from the source when the caller is ready to receive it.
559
560 One end of the streaming APIs involves a file-like object that must
561 ``write()`` output data or ``read()`` input data. Depending on what the
562 backing storage for these objects is, those operations may not complete quickly.
563 For example, when streaming compressed data to a file, the ``write()`` into
564 a streaming compressor could result in a ``write()`` to the filesystem, which
565 may take a long time to finish due to slow I/O on the filesystem. So, there
566 may be overhead in streaming APIs beyond the compression and decompression
567 operations.
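
To make the trade-off concrete, here is a sketch contrasting the one-shot and
stream copying APIs for compressing a file (``input_path`` and ``output_path``
are hypothetical file names)::

    cctx = zstd.ZstdCompressor()

    # One-shot: the entire input and output are held in memory at once.
    with open(input_path, 'rb') as ifh:
        compressed = cctx.compress(ifh.read())

    # Streaming: data is read and written in chunks.
    with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
        cctx.copy_stream(ifh, ofh)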
568
569 Dictionary Creation and Management
570 ----------------------------------
571
572 Zstandard allows *dictionaries* to be used when compressing and
573 decompressing data. The idea is that if you are compressing a lot of similar
574 data, you can precompute common properties of that data (such as recurring
575 byte sequences) to achieve better compression ratios.
576
577 In Python, compression dictionaries are represented as the
578 ``ZstdCompressionDict`` type.
579
580 Instances can be constructed from bytes::
581
582 dict_data = zstd.ZstdCompressionDict(data)
583
584 More interestingly, instances can be created by *training* on sample data::
585
586 dict_data = zstd.train_dictionary(size, samples)
587
588 This takes a list of bytes instances and creates and returns a
589 ``ZstdCompressionDict``.
590
591 You can see how many bytes are in the dictionary by calling ``len()``::
592
593 dict_data = zstd.train_dictionary(size, samples)
594 dict_size = len(dict_data) # will not be larger than ``size``
595
596 Once you have a dictionary, you can pass it to the objects performing
597 compression and decompression::
598
599 dict_data = zstd.train_dictionary(16384, samples)
600
601 cctx = zstd.ZstdCompressor(dict_data=dict_data)
602 for source_data in input_data:
603 compressed = cctx.compress(source_data)
604 # Do something with compressed data.
605
606 dctx = zstd.ZstdDecompressor(dict_data=dict_data)
607 for compressed_data in input_data:
608 buffer = io.BytesIO()
609 with dctx.write_to(buffer) as decompressor:
610 decompressor.write(compressed_data)
611 # Do something with raw data in ``buffer``.
612
613 Dictionaries have unique integer IDs. You can retrieve this ID via::
614
615 dict_id = zstd.dictionary_id(dict_data)
616
617 You can obtain the raw data in the dict (useful for persisting and constructing
618 a ``ZstdCompressionDict`` later) via ``as_bytes()``::
619
620 dict_data = zstd.train_dictionary(size, samples)
621 raw_data = dict_data.as_bytes()
622
623 Explicit Compression Parameters
624 -------------------------------
625
626 Zstandard's integer compression levels along with the input size and dictionary
627 size are converted into a data structure defining multiple parameters to tune
628 behavior of the compression algorithm. It is possible to define this
629 data structure explicitly to have lower-level control over compression behavior.
630
631 The ``zstd.CompressionParameters`` type represents this data structure.
632 You can see how Zstandard converts compression levels to this data structure
633 by calling ``zstd.get_compression_parameters()``. e.g.::
634
635 params = zstd.get_compression_parameters(5)
636
637 This function also accepts the uncompressed data size and dictionary size
638 to adjust parameters::
639
640 params = zstd.get_compression_parameters(3, source_size=len(data), dict_size=len(dict_data))
641
642 You can also construct compression parameters from their low-level components::
643
644 params = zstd.CompressionParameters(20, 6, 12, 5, 4, 10, zstd.STRATEGY_FAST)
645
646 You can then configure a compressor to use the custom parameters::
647
648 cctx = zstd.ZstdCompressor(compression_params=params)
649
650 The members of the ``CompressionParameters`` tuple are as follows::
651
652 * 0 - Window log
653 * 1 - Chain log
654 * 2 - Hash log
655 * 3 - Search log
656 * 4 - Search length
657 * 5 - Target length
658 * 6 - Strategy (one of the ``zstd.STRATEGY_`` constants)
659
660 You'll need to read the Zstandard documentation for what these parameters
661 do.
662
663 Misc Functionality
664 ------------------
665
666 estimate_compression_context_size(CompressionParameters)
667 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
668
669 Given a ``CompressionParameters`` struct, estimate the memory size required
670 to perform compression.
671
672 estimate_decompression_context_size()
673 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
674
675 Estimate the memory size requirements for a decompressor instance.
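
A small sketch showing both estimates (the compression level is illustrative)::

    params = zstd.get_compression_parameters(3)
    compression_size = zstd.estimate_compression_context_size(params)
    decompression_size = zstd.estimate_decompression_context_size()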
676
677 Constants
678 ---------
679
680 The following module constants/attributes are exposed (a usage sketch follows the list):
681
682 ZSTD_VERSION
683 This module attribute exposes a 3-tuple of the Zstandard version. e.g.
684 ``(1, 0, 0)``
685 MAX_COMPRESSION_LEVEL
686 Integer max compression level accepted by compression functions
687 COMPRESSION_RECOMMENDED_INPUT_SIZE
688 Recommended chunk size to feed to compressor functions
689 COMPRESSION_RECOMMENDED_OUTPUT_SIZE
690 Recommended chunk size for compression output
691 DECOMPRESSION_RECOMMENDED_INPUT_SIZE
692 Recommended chunk size to feed into decompressor functions
693 DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE
694 Recommended chunk size for decompression output
695
696 FRAME_HEADER
697 Bytes containing the header of a Zstandard frame
698 MAGIC_NUMBER
699 Frame header as an integer
700
701 WINDOWLOG_MIN
702 Minimum value for compression parameter
703 WINDOWLOG_MAX
704 Maximum value for compression parameter
705 CHAINLOG_MIN
706 Minimum value for compression parameter
707 CHAINLOG_MAX
708 Maximum value for compression parameter
709 HASHLOG_MIN
710 Minimum value for compression parameter
711 HASHLOG_MAX
712 Maximum value for compression parameter
713 SEARCHLOG_MIN
714 Minimum value for compression parameter
715 SEARCHLOG_MAX
716 Maximum value for compression parameter
717 SEARCHLENGTH_MIN
718 Minimum value for compression parameter
719 SEARCHLENGTH_MAX
720 Maximum value for compression parameter
721 TARGETLENGTH_MIN
722 Minimum value for compression parameter
723 TARGETLENGTH_MAX
724 Maximum value for compression parameter
725 STRATEGY_FAST
726 Compression strategy
727 STRATEGY_DFAST
728 Compression strategy
729 STRATEGY_GREEDY
730 Compression strategy
731 STRATEGY_LAZY
732 Compression strategy
733 STRATEGY_LAZY2
734 Compression strategy
735 STRATEGY_BTLAZY2
736 Compression strategy
737 STRATEGY_BTOPT
738 Compression strategy
739
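The usage sketch below ties a few of these constants together; ``in_path`` and
``out_path`` are hypothetical file names and the chunking pattern is merely
illustrative::

    cctx = zstd.ZstdCompressor(level=zstd.MAX_COMPRESSION_LEVEL)
    with open(in_path, 'rb') as ifh, open(out_path, 'wb') as ofh:
        with cctx.write_to(ofh) as compressor:
            while True:
                chunk = ifh.read(zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE)
                if not chunk:
                    break
                compressor.write(chunk)
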
740 Note on Zstandard's *Experimental* API
741 ======================================
742
743 Many of the Zstandard APIs used by this module are marked as *experimental*
744 within the Zstandard project. This includes a large number of useful
745 features, such as compression and frame parameters and parts of dictionary
746 compression.
747
748 It is unclear how Zstandard's C API will evolve over time, especially with
749 regards to this *experimental* functionality. We will try to maintain
750 backwards compatibility at the Python API level. However, we cannot
751 guarantee this for things not under our control.
752
753 Since a copy of the Zstandard source code is distributed with this
754 module and since we compile against it, the behavior of a specific
755 version of this module should be constant for all of time. So if you
756 pin the version of this module used in your projects (which is a Python
757 best practice), you should be insulated from unwanted future changes.
758
759 Donate
760 ======
761
762 A lot of time has been invested into this project by the author.
763
764 If you find this project useful and would like to thank the author for
765 their work, consider donating some money. Any amount is appreciated.
766
767 .. image:: https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif
768 :target: https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=gregory%2eszorc%40gmail%2ecom&lc=US&item_name=python%2dzstandard&currency_code=USD&bn=PP%2dDonationsBF%3abtn_donate_LG%2egif%3aNonHosted
769 :alt: Donate via PayPal
770
771 .. |ci-status| image:: https://travis-ci.org/indygreg/python-zstandard.svg?branch=master
772 :target: https://travis-ci.org/indygreg/python-zstandard
773
774 .. |win-ci-status| image:: https://ci.appveyor.com/api/projects/status/github/indygreg/python-zstandard?svg=true
775 :target: https://ci.appveyor.com/project/indygreg/python-zstandard
776 :alt: Windows build status
@@ -0,0 +1,247 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 ZstdCompressionDict* train_dictionary(PyObject* self, PyObject* args, PyObject* kwargs) {
14 static char *kwlist[] = { "dict_size", "samples", "parameters", NULL };
15 size_t capacity;
16 PyObject* samples;
17 Py_ssize_t samplesLen;
18 PyObject* parameters = NULL;
19 ZDICT_params_t zparams;
20 Py_ssize_t sampleIndex;
21 Py_ssize_t sampleSize;
22 PyObject* sampleItem;
23 size_t zresult;
24 void* sampleBuffer;
25 void* sampleOffset;
26 size_t samplesSize = 0;
27 size_t* sampleSizes;
28 void* dict;
29 ZstdCompressionDict* result;
30
31 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "nO!|O!", kwlist,
32 &capacity,
33 &PyList_Type, &samples,
34 (PyObject*)&DictParametersType, &parameters)) {
35 return NULL;
36 }
37
38 /* Validate parameters first since it is easiest. */
39 zparams.selectivityLevel = 0;
40 zparams.compressionLevel = 0;
41 zparams.notificationLevel = 0;
42 zparams.dictID = 0;
43 zparams.reserved[0] = 0;
44 zparams.reserved[1] = 0;
45
46 if (parameters) {
47 /* TODO validate data ranges */
48 zparams.selectivityLevel = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 0));
49 zparams.compressionLevel = PyLong_AsLong(PyTuple_GetItem(parameters, 1));
50 zparams.notificationLevel = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 2));
51 zparams.dictID = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 3));
52 }
53
54 /* Figure out the size of the raw samples */
55 samplesLen = PyList_Size(samples);
56 for (sampleIndex = 0; sampleIndex < samplesLen; sampleIndex++) {
57 sampleItem = PyList_GetItem(samples, sampleIndex);
58 if (!PyBytes_Check(sampleItem)) {
59 PyErr_SetString(PyExc_ValueError, "samples must be bytes");
60 /* TODO probably need to perform DECREF here */
61 return NULL;
62 }
63 samplesSize += PyBytes_GET_SIZE(sampleItem);
64 }
65
66 /* Now that we know the total size of the raw samples, we can allocate
67 a buffer for the raw data */
68 sampleBuffer = malloc(samplesSize);
69 if (!sampleBuffer) {
70 PyErr_NoMemory();
71 return NULL;
72 }
73 sampleSizes = malloc(samplesLen * sizeof(size_t));
74 if (!sampleSizes) {
75 free(sampleBuffer);
76 PyErr_NoMemory();
77 return NULL;
78 }
79
80 sampleOffset = sampleBuffer;
81 /* Now iterate again and assemble the samples in the buffer */
82 for (sampleIndex = 0; sampleIndex < samplesLen; sampleIndex++) {
83 sampleItem = PyList_GetItem(samples, sampleIndex);
84 sampleSize = PyBytes_GET_SIZE(sampleItem);
85 sampleSizes[sampleIndex] = sampleSize;
86 memcpy(sampleOffset, PyBytes_AS_STRING(sampleItem), sampleSize);
87 sampleOffset = (char*)sampleOffset + sampleSize;
88 }
89
90 dict = malloc(capacity);
91 if (!dict) {
92 free(sampleSizes);
93 free(sampleBuffer);
94 PyErr_NoMemory();
95 return NULL;
96 }
97
98 zresult = ZDICT_trainFromBuffer_advanced(dict, capacity,
99 sampleBuffer, sampleSizes, (unsigned int)samplesLen,
100 zparams);
101 if (ZDICT_isError(zresult)) {
102 PyErr_Format(ZstdError, "Cannot train dict: %s", ZDICT_getErrorName(zresult));
103 free(dict);
104 free(sampleSizes);
105 free(sampleBuffer);
106 return NULL;
107 }
108
109 result = PyObject_New(ZstdCompressionDict, &ZstdCompressionDictType);
110 if (!result) {
111 return NULL;
112 }
113
114 result->dictData = dict;
115 result->dictSize = zresult;
116 return result;
117 }
118
119
120 PyDoc_STRVAR(ZstdCompressionDict__doc__,
121 "ZstdCompressionDict(data) - Represents a computed compression dictionary\n"
122 "\n"
123 "This type holds the results of a computed Zstandard compression dictionary.\n"
124 "Instances are obtained by calling ``train_dictionary()`` or by passing bytes\n"
125 "obtained from another source into the constructor.\n"
126 );
127
128 static int ZstdCompressionDict_init(ZstdCompressionDict* self, PyObject* args) {
129 const char* source;
130 Py_ssize_t sourceSize;
131
132 self->dictData = NULL;
133 self->dictSize = 0;
134
135 #if PY_MAJOR_VERSION >= 3
136 if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) {
137 #else
138 if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) {
139 #endif
140 return -1;
141 }
142
143 self->dictData = malloc(sourceSize);
144 if (!self->dictData) {
145 PyErr_NoMemory();
146 return -1;
147 }
148
149 memcpy(self->dictData, source, sourceSize);
150 self->dictSize = sourceSize;
151
152 return 0;
153 }
154
155 static void ZstdCompressionDict_dealloc(ZstdCompressionDict* self) {
156 if (self->dictData) {
157 free(self->dictData);
158 self->dictData = NULL;
159 }
160
161 PyObject_Del(self);
162 }
163
164 static PyObject* ZstdCompressionDict_dict_id(ZstdCompressionDict* self) {
165 unsigned dictID = ZDICT_getDictID(self->dictData, self->dictSize);
166
167 return PyLong_FromLong(dictID);
168 }
169
170 static PyObject* ZstdCompressionDict_as_bytes(ZstdCompressionDict* self) {
171 return PyBytes_FromStringAndSize(self->dictData, self->dictSize);
172 }
173
174 static PyMethodDef ZstdCompressionDict_methods[] = {
175 { "dict_id", (PyCFunction)ZstdCompressionDict_dict_id, METH_NOARGS,
176 PyDoc_STR("dict_id() -- obtain the numeric dictionary ID") },
177 { "as_bytes", (PyCFunction)ZstdCompressionDict_as_bytes, METH_NOARGS,
178 PyDoc_STR("as_bytes() -- obtain the raw bytes constituting the dictionary data") },
179 { NULL, NULL }
180 };
181
182 static Py_ssize_t ZstdCompressionDict_length(ZstdCompressionDict* self) {
183 return self->dictSize;
184 }
185
186 static PySequenceMethods ZstdCompressionDict_sq = {
187 (lenfunc)ZstdCompressionDict_length, /* sq_length */
188 0, /* sq_concat */
189 0, /* sq_repeat */
190 0, /* sq_item */
191 0, /* sq_ass_item */
192 0, /* sq_contains */
193 0, /* sq_inplace_concat */
194 0 /* sq_inplace_repeat */
195 };
196
197 PyTypeObject ZstdCompressionDictType = {
198 PyVarObject_HEAD_INIT(NULL, 0)
199 "zstd.ZstdCompressionDict", /* tp_name */
200 sizeof(ZstdCompressionDict), /* tp_basicsize */
201 0, /* tp_itemsize */
202 (destructor)ZstdCompressionDict_dealloc, /* tp_dealloc */
203 0, /* tp_print */
204 0, /* tp_getattr */
205 0, /* tp_setattr */
206 0, /* tp_compare */
207 0, /* tp_repr */
208 0, /* tp_as_number */
209 &ZstdCompressionDict_sq, /* tp_as_sequence */
210 0, /* tp_as_mapping */
211 0, /* tp_hash */
212 0, /* tp_call */
213 0, /* tp_str */
214 0, /* tp_getattro */
215 0, /* tp_setattro */
216 0, /* tp_as_buffer */
217 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
218 ZstdCompressionDict__doc__, /* tp_doc */
219 0, /* tp_traverse */
220 0, /* tp_clear */
221 0, /* tp_richcompare */
222 0, /* tp_weaklistoffset */
223 0, /* tp_iter */
224 0, /* tp_iternext */
225 ZstdCompressionDict_methods, /* tp_methods */
226 0, /* tp_members */
227 0, /* tp_getset */
228 0, /* tp_base */
229 0, /* tp_dict */
230 0, /* tp_descr_get */
231 0, /* tp_descr_set */
232 0, /* tp_dictoffset */
233 (initproc)ZstdCompressionDict_init, /* tp_init */
234 0, /* tp_alloc */
235 PyType_GenericNew, /* tp_new */
236 };
237
238 void compressiondict_module_init(PyObject* mod) {
239 Py_TYPE(&ZstdCompressionDictType) = &PyType_Type;
240 if (PyType_Ready(&ZstdCompressionDictType) < 0) {
241 return;
242 }
243
244 Py_INCREF((PyObject*)&ZstdCompressionDictType);
245 PyModule_AddObject(mod, "ZstdCompressionDict",
246 (PyObject*)&ZstdCompressionDictType);
247 }
@@ -0,0 +1,226 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 void ztopy_compression_parameters(CompressionParametersObject* params, ZSTD_compressionParameters* zparams) {
12 zparams->windowLog = params->windowLog;
13 zparams->chainLog = params->chainLog;
14 zparams->hashLog = params->hashLog;
15 zparams->searchLog = params->searchLog;
16 zparams->searchLength = params->searchLength;
17 zparams->targetLength = params->targetLength;
18 zparams->strategy = params->strategy;
19 }
20
21 CompressionParametersObject* get_compression_parameters(PyObject* self, PyObject* args) {
22 int compressionLevel;
23 unsigned PY_LONG_LONG sourceSize = 0;
24 Py_ssize_t dictSize = 0;
25 ZSTD_compressionParameters params;
26 CompressionParametersObject* result;
27
28 if (!PyArg_ParseTuple(args, "i|Kn", &compressionLevel, &sourceSize, &dictSize)) {
29 return NULL;
30 }
31
32 params = ZSTD_getCParams(compressionLevel, sourceSize, dictSize);
33
34 result = PyObject_New(CompressionParametersObject, &CompressionParametersType);
35 if (!result) {
36 return NULL;
37 }
38
39 result->windowLog = params.windowLog;
40 result->chainLog = params.chainLog;
41 result->hashLog = params.hashLog;
42 result->searchLog = params.searchLog;
43 result->searchLength = params.searchLength;
44 result->targetLength = params.targetLength;
45 result->strategy = params.strategy;
46
47 return result;
48 }
49
50 PyObject* estimate_compression_context_size(PyObject* self, PyObject* args) {
51 CompressionParametersObject* params;
52 ZSTD_compressionParameters zparams;
53 PyObject* result;
54
55 if (!PyArg_ParseTuple(args, "O!", &CompressionParametersType, &params)) {
56 return NULL;
57 }
58
59 ztopy_compression_parameters(params, &zparams);
60 result = PyLong_FromSize_t(ZSTD_estimateCCtxSize(zparams));
61 return result;
62 }
63
64 PyDoc_STRVAR(CompressionParameters__doc__,
65 "CompressionParameters: low-level control over zstd compression");
66
67 static PyObject* CompressionParameters_new(PyTypeObject* subtype, PyObject* args, PyObject* kwargs) {
68 CompressionParametersObject* self;
69 unsigned windowLog;
70 unsigned chainLog;
71 unsigned hashLog;
72 unsigned searchLog;
73 unsigned searchLength;
74 unsigned targetLength;
75 unsigned strategy;
76
77 if (!PyArg_ParseTuple(args, "IIIIIII", &windowLog, &chainLog, &hashLog, &searchLog,
78 &searchLength, &targetLength, &strategy)) {
79 return NULL;
80 }
81
82 if (windowLog < ZSTD_WINDOWLOG_MIN || windowLog > ZSTD_WINDOWLOG_MAX) {
83 PyErr_SetString(PyExc_ValueError, "invalid window log value");
84 return NULL;
85 }
86
87 if (chainLog < ZSTD_CHAINLOG_MIN || chainLog > ZSTD_CHAINLOG_MAX) {
88 PyErr_SetString(PyExc_ValueError, "invalid chain log value");
89 return NULL;
90 }
91
92 if (hashLog < ZSTD_HASHLOG_MIN || hashLog > ZSTD_HASHLOG_MAX) {
93 PyErr_SetString(PyExc_ValueError, "invalid hash log value");
94 return NULL;
95 }
96
97 if (searchLog < ZSTD_SEARCHLOG_MIN || searchLog > ZSTD_SEARCHLOG_MAX) {
98 PyErr_SetString(PyExc_ValueError, "invalid search log value");
99 return NULL;
100 }
101
102 if (searchLength < ZSTD_SEARCHLENGTH_MIN || searchLength > ZSTD_SEARCHLENGTH_MAX) {
103 PyErr_SetString(PyExc_ValueError, "invalid search length value");
104 return NULL;
105 }
106
107 if (targetLength < ZSTD_TARGETLENGTH_MIN || targetLength > ZSTD_TARGETLENGTH_MAX) {
108 PyErr_SetString(PyExc_ValueError, "invalid target length value");
109 return NULL;
110 }
111
112 if (strategy < ZSTD_fast || strategy > ZSTD_btopt) {
113 PyErr_SetString(PyExc_ValueError, "invalid strategy value");
114 return NULL;
115 }
116
117 self = (CompressionParametersObject*)subtype->tp_alloc(subtype, 1);
118 if (!self) {
119 return NULL;
120 }
121
122 self->windowLog = windowLog;
123 self->chainLog = chainLog;
124 self->hashLog = hashLog;
125 self->searchLog = searchLog;
126 self->searchLength = searchLength;
127 self->targetLength = targetLength;
128 self->strategy = strategy;
129
130 return (PyObject*)self;
131 }
132
133 static void CompressionParameters_dealloc(PyObject* self) {
134 PyObject_Del(self);
135 }
136
137 static Py_ssize_t CompressionParameters_length(PyObject* self) {
138 return 7;
139 };
140
141 static PyObject* CompressionParameters_item(PyObject* o, Py_ssize_t i) {
142 CompressionParametersObject* self = (CompressionParametersObject*)o;
143
144 switch (i) {
145 case 0:
146 return PyLong_FromLong(self->windowLog);
147 case 1:
148 return PyLong_FromLong(self->chainLog);
149 case 2:
150 return PyLong_FromLong(self->hashLog);
151 case 3:
152 return PyLong_FromLong(self->searchLog);
153 case 4:
154 return PyLong_FromLong(self->searchLength);
155 case 5:
156 return PyLong_FromLong(self->targetLength);
157 case 6:
158 return PyLong_FromLong(self->strategy);
159 default:
160 PyErr_SetString(PyExc_IndexError, "index out of range");
161 return NULL;
162 }
163 }
164
165 static PySequenceMethods CompressionParameters_sq = {
166 CompressionParameters_length, /* sq_length */
167 0, /* sq_concat */
168 0, /* sq_repeat */
169 CompressionParameters_item, /* sq_item */
170 0, /* sq_ass_item */
171 0, /* sq_contains */
172 0, /* sq_inplace_concat */
173 0 /* sq_inplace_repeat */
174 };
175
176 PyTypeObject CompressionParametersType = {
177 PyVarObject_HEAD_INIT(NULL, 0)
178 "CompressionParameters", /* tp_name */
179 sizeof(CompressionParametersObject), /* tp_basicsize */
180 0, /* tp_itemsize */
181 (destructor)CompressionParameters_dealloc, /* tp_dealloc */
182 0, /* tp_print */
183 0, /* tp_getattr */
184 0, /* tp_setattr */
185 0, /* tp_compare */
186 0, /* tp_repr */
187 0, /* tp_as_number */
188 &CompressionParameters_sq, /* tp_as_sequence */
189 0, /* tp_as_mapping */
190 0, /* tp_hash */
191 0, /* tp_call */
192 0, /* tp_str */
193 0, /* tp_getattro */
194 0, /* tp_setattro */
195 0, /* tp_as_buffer */
196 Py_TPFLAGS_DEFAULT, /* tp_flags */
197 CompressionParameters__doc__, /* tp_doc */
198 0, /* tp_traverse */
199 0, /* tp_clear */
200 0, /* tp_richcompare */
201 0, /* tp_weaklistoffset */
202 0, /* tp_iter */
203 0, /* tp_iternext */
204 0, /* tp_methods */
205 0, /* tp_members */
206 0, /* tp_getset */
207 0, /* tp_base */
208 0, /* tp_dict */
209 0, /* tp_descr_get */
210 0, /* tp_descr_set */
211 0, /* tp_dictoffset */
212 0, /* tp_init */
213 0, /* tp_alloc */
214 CompressionParameters_new, /* tp_new */
215 };
216
217 void compressionparams_module_init(PyObject* mod) {
218 Py_TYPE(&CompressionParametersType) = &PyType_Type;
219 if (PyType_Ready(&CompressionParametersType) < 0) {
220 return;
221 }
222
223 Py_IncRef((PyObject*)&CompressionParametersType);
224 PyModule_AddObject(mod, "CompressionParameters",
225 (PyObject*)&CompressionParametersType);
226 }
@@ -0,0 +1,235 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 PyDoc_STRVAR(ZstdCompresssionWriter__doc__,
14 """A context manager used for writing compressed output to a writer.\n"
15 );
16
17 static void ZstdCompressionWriter_dealloc(ZstdCompressionWriter* self) {
18 Py_XDECREF(self->compressor);
19 Py_XDECREF(self->writer);
20
21 if (self->cstream) {
22 ZSTD_freeCStream(self->cstream);
23 self->cstream = NULL;
24 }
25
26 PyObject_Del(self);
27 }
28
29 static PyObject* ZstdCompressionWriter_enter(ZstdCompressionWriter* self) {
30 if (self->entered) {
31 PyErr_SetString(ZstdError, "cannot __enter__ multiple times");
32 return NULL;
33 }
34
35 self->cstream = CStream_from_ZstdCompressor(self->compressor, self->sourceSize);
36 if (!self->cstream) {
37 return NULL;
38 }
39
40 self->entered = 1;
41
42 Py_INCREF(self);
43 return (PyObject*)self;
44 }
45
46 static PyObject* ZstdCompressionWriter_exit(ZstdCompressionWriter* self, PyObject* args) {
47 PyObject* exc_type;
48 PyObject* exc_value;
49 PyObject* exc_tb;
50 size_t zresult;
51
52 ZSTD_outBuffer output;
53 PyObject* res;
54
55 if (!PyArg_ParseTuple(args, "OOO", &exc_type, &exc_value, &exc_tb)) {
56 return NULL;
57 }
58
59 self->entered = 0;
60
61 if (self->cstream && exc_type == Py_None && exc_value == Py_None &&
62 exc_tb == Py_None) {
63
64 output.dst = malloc(self->outSize);
65 if (!output.dst) {
66 return PyErr_NoMemory();
67 }
68 output.size = self->outSize;
69 output.pos = 0;
70
71 while (1) {
72 zresult = ZSTD_endStream(self->cstream, &output);
73 if (ZSTD_isError(zresult)) {
74 PyErr_Format(ZstdError, "error ending compression stream: %s",
75 ZSTD_getErrorName(zresult));
76 free(output.dst);
77 return NULL;
78 }
79
80 if (output.pos) {
81 #if PY_MAJOR_VERSION >= 3
82 res = PyObject_CallMethod(self->writer, "write", "y#",
83 #else
84 res = PyObject_CallMethod(self->writer, "write", "s#",
85 #endif
86 output.dst, output.pos);
87 Py_XDECREF(res);
88 }
89
90 if (!zresult) {
91 break;
92 }
93
94 output.pos = 0;
95 }
96
97 free(output.dst);
98 ZSTD_freeCStream(self->cstream);
99 self->cstream = NULL;
100 }
101
102 Py_RETURN_FALSE;
103 }
104
105 static PyObject* ZstdCompressionWriter_memory_size(ZstdCompressionWriter* self) {
106 if (!self->cstream) {
107 PyErr_SetString(ZstdError, "cannot determine size of an inactive compressor; "
108 "call when a context manager is active");
109 return NULL;
110 }
111
112 return PyLong_FromSize_t(ZSTD_sizeof_CStream(self->cstream));
113 }
114
115 static PyObject* ZstdCompressionWriter_write(ZstdCompressionWriter* self, PyObject* args) {
116 const char* source;
117 Py_ssize_t sourceSize;
118 size_t zresult;
119 ZSTD_inBuffer input;
120 ZSTD_outBuffer output;
121 PyObject* res;
122
123 #if PY_MAJOR_VERSION >= 3
124 if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) {
125 #else
126 if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) {
127 #endif
128 return NULL;
129 }
130
131 if (!self->entered) {
132 PyErr_SetString(ZstdError, "compress must be called from an active context manager");
133 return NULL;
134 }
135
136 output.dst = malloc(self->outSize);
137 if (!output.dst) {
138 return PyErr_NoMemory();
139 }
140 output.size = self->outSize;
141 output.pos = 0;
142
143 input.src = source;
144 input.size = sourceSize;
145 input.pos = 0;
146
147 while ((ssize_t)input.pos < sourceSize) {
148 Py_BEGIN_ALLOW_THREADS
149 zresult = ZSTD_compressStream(self->cstream, &output, &input);
150 Py_END_ALLOW_THREADS
151
152 if (ZSTD_isError(zresult)) {
153 free(output.dst);
154 PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
155 return NULL;
156 }
157
158 /* Copy data from output buffer to writer. */
159 if (output.pos) {
160 #if PY_MAJOR_VERSION >= 3
161 res = PyObject_CallMethod(self->writer, "write", "y#",
162 #else
163 res = PyObject_CallMethod(self->writer, "write", "s#",
164 #endif
165 output.dst, output.pos);
166 Py_XDECREF(res);
167 }
168 output.pos = 0;
169 }
170
171 free(output.dst);
172
173 /* TODO return bytes written */
174 Py_RETURN_NONE;
175 }
176
177 static PyMethodDef ZstdCompressionWriter_methods[] = {
178 { "__enter__", (PyCFunction)ZstdCompressionWriter_enter, METH_NOARGS,
179 PyDoc_STR("Enter a compression context.") },
180 { "__exit__", (PyCFunction)ZstdCompressionWriter_exit, METH_VARARGS,
181 PyDoc_STR("Exit a compression context.") },
182 { "memory_size", (PyCFunction)ZstdCompressionWriter_memory_size, METH_NOARGS,
183 PyDoc_STR("Obtain the memory size of the underlying compressor") },
184 { "write", (PyCFunction)ZstdCompressionWriter_write, METH_VARARGS,
185 PyDoc_STR("Compress data") },
186 { NULL, NULL }
187 };
188
189 PyTypeObject ZstdCompressionWriterType = {
190 PyVarObject_HEAD_INIT(NULL, 0)
191 "zstd.ZstdCompressionWriter", /* tp_name */
192 sizeof(ZstdCompressionWriter), /* tp_basicsize */
193 0, /* tp_itemsize */
194 (destructor)ZstdCompressionWriter_dealloc, /* tp_dealloc */
195 0, /* tp_print */
196 0, /* tp_getattr */
197 0, /* tp_setattr */
198 0, /* tp_compare */
199 0, /* tp_repr */
200 0, /* tp_as_number */
201 0, /* tp_as_sequence */
202 0, /* tp_as_mapping */
203 0, /* tp_hash */
204 0, /* tp_call */
205 0, /* tp_str */
206 0, /* tp_getattro */
207 0, /* tp_setattro */
208 0, /* tp_as_buffer */
209 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
210 ZstdCompresssionWriter__doc__, /* tp_doc */
211 0, /* tp_traverse */
212 0, /* tp_clear */
213 0, /* tp_richcompare */
214 0, /* tp_weaklistoffset */
215 0, /* tp_iter */
216 0, /* tp_iternext */
217 ZstdCompressionWriter_methods, /* tp_methods */
218 0, /* tp_members */
219 0, /* tp_getset */
220 0, /* tp_base */
221 0, /* tp_dict */
222 0, /* tp_descr_get */
223 0, /* tp_descr_set */
224 0, /* tp_dictoffset */
225 0, /* tp_init */
226 0, /* tp_alloc */
227 PyType_GenericNew, /* tp_new */
228 };
229
230 void compressionwriter_module_init(PyObject* mod) {
231 Py_TYPE(&ZstdCompressionWriterType) = &PyType_Type;
232 if (PyType_Ready(&ZstdCompressionWriterType) < 0) {
233 return;
234 }
235 }
@@ -0,0 +1,205 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 PyDoc_STRVAR(ZstdCompressionObj__doc__,
14 "Perform compression using a standard library compatible API.\n"
15 );
16
17 static void ZstdCompressionObj_dealloc(ZstdCompressionObj* self) {
18 PyMem_Free(self->output.dst);
19 self->output.dst = NULL;
20
21 if (self->cstream) {
22 ZSTD_freeCStream(self->cstream);
23 self->cstream = NULL;
24 }
25
26 Py_XDECREF(self->compressor);
27
28 PyObject_Del(self);
29 }
30
31 static PyObject* ZstdCompressionObj_compress(ZstdCompressionObj* self, PyObject* args) {
32 const char* source;
33 Py_ssize_t sourceSize;
34 ZSTD_inBuffer input;
35 size_t zresult;
36 PyObject* result = NULL;
37 Py_ssize_t resultSize = 0;
38
39 if (self->flushed) {
40 PyErr_SetString(ZstdError, "cannot call compress() after flush() has been called");
41 return NULL;
42 }
43
44 #if PY_MAJOR_VERSION >= 3
45 if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) {
46 #else
47 if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) {
48 #endif
49 return NULL;
50 }
51
52 input.src = source;
53 input.size = sourceSize;
54 input.pos = 0;
55
56 while ((ssize_t)input.pos < sourceSize) {
57 Py_BEGIN_ALLOW_THREADS
58 zresult = ZSTD_compressStream(self->cstream, &self->output, &input);
59 Py_END_ALLOW_THREADS
60
61 if (ZSTD_isError(zresult)) {
62 PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
63 return NULL;
64 }
65
66 if (self->output.pos) {
67 if (result) {
68 resultSize = PyBytes_GET_SIZE(result);
69 if (-1 == _PyBytes_Resize(&result, resultSize + self->output.pos)) {
70 return NULL;
71 }
72
73 memcpy(PyBytes_AS_STRING(result) + resultSize,
74 self->output.dst, self->output.pos);
75 }
76 else {
77 result = PyBytes_FromStringAndSize(self->output.dst, self->output.pos);
78 if (!result) {
79 return NULL;
80 }
81 }
82
83 self->output.pos = 0;
84 }
85 }
86
87 if (result) {
88 return result;
89 }
90 else {
91 return PyBytes_FromString("");
92 }
93 }
94
95 static PyObject* ZstdCompressionObj_flush(ZstdCompressionObj* self) {
96 size_t zresult;
97 PyObject* result = NULL;
98 Py_ssize_t resultSize = 0;
99
100 if (self->flushed) {
101 PyErr_SetString(ZstdError, "flush() already called");
102 return NULL;
103 }
104
105 self->flushed = 1;
106
107 while (1) {
108 zresult = ZSTD_endStream(self->cstream, &self->output);
109 if (ZSTD_isError(zresult)) {
110 PyErr_Format(ZstdError, "error ending compression stream: %s",
111 ZSTD_getErrorName(zresult));
112 return NULL;
113 }
114
115 if (self->output.pos) {
116 if (result) {
117 resultSize = PyBytes_GET_SIZE(result);
118 if (-1 == _PyBytes_Resize(&result, resultSize + self->output.pos)) {
119 return NULL;
120 }
121
122 memcpy(PyBytes_AS_STRING(result) + resultSize,
123 self->output.dst, self->output.pos);
124 }
125 else {
126 result = PyBytes_FromStringAndSize(self->output.dst, self->output.pos);
127 if (!result) {
128 return NULL;
129 }
130 }
131
132 self->output.pos = 0;
133 }
134
135 if (!zresult) {
136 break;
137 }
138 }
139
140 ZSTD_freeCStream(self->cstream);
141 self->cstream = NULL;
142
143 if (result) {
144 return result;
145 }
146 else {
147 return PyBytes_FromString("");
148 }
149 }
150
151 static PyMethodDef ZstdCompressionObj_methods[] = {
152 { "compress", (PyCFunction)ZstdCompressionObj_compress, METH_VARARGS,
153 PyDoc_STR("compress data") },
154 { "flush", (PyCFunction)ZstdCompressionObj_flush, METH_NOARGS,
155 PyDoc_STR("finish compression operation") },
156 { NULL, NULL }
157 };
158
159 PyTypeObject ZstdCompressionObjType = {
160 PyVarObject_HEAD_INIT(NULL, 0)
161 "zstd.ZstdCompressionObj", /* tp_name */
162 sizeof(ZstdCompressionObj), /* tp_basicsize */
163 0, /* tp_itemsize */
164 (destructor)ZstdCompressionObj_dealloc, /* tp_dealloc */
165 0, /* tp_print */
166 0, /* tp_getattr */
167 0, /* tp_setattr */
168 0, /* tp_compare */
169 0, /* tp_repr */
170 0, /* tp_as_number */
171 0, /* tp_as_sequence */
172 0, /* tp_as_mapping */
173 0, /* tp_hash */
174 0, /* tp_call */
175 0, /* tp_str */
176 0, /* tp_getattro */
177 0, /* tp_setattro */
178 0, /* tp_as_buffer */
179 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
180 ZstdCompressionObj__doc__, /* tp_doc */
181 0, /* tp_traverse */
182 0, /* tp_clear */
183 0, /* tp_richcompare */
184 0, /* tp_weaklistoffset */
185 0, /* tp_iter */
186 0, /* tp_iternext */
187 ZstdCompressionObj_methods, /* tp_methods */
188 0, /* tp_members */
189 0, /* tp_getset */
190 0, /* tp_base */
191 0, /* tp_dict */
192 0, /* tp_descr_get */
193 0, /* tp_descr_set */
194 0, /* tp_dictoffset */
195 0, /* tp_init */
196 0, /* tp_alloc */
197 PyType_GenericNew, /* tp_new */
198 };
199
200 void compressobj_module_init(PyObject* module) {
201 Py_TYPE(&ZstdCompressionObjType) = &PyType_Type;
202 if (PyType_Ready(&ZstdCompressionObjType) < 0) {
203 return;
204 }
205 }
@@ -0,0 +1,757 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 /**
14 * Initialize a zstd CStream from a ZstdCompressor instance.
15 *
16 * Returns a ZSTD_CStream on success or NULL on failure. If NULL, a Python
17 * exception will be set.
18 */
19 ZSTD_CStream* CStream_from_ZstdCompressor(ZstdCompressor* compressor, Py_ssize_t sourceSize) {
20 ZSTD_CStream* cstream;
21 ZSTD_parameters zparams;
22 void* dictData = NULL;
23 size_t dictSize = 0;
24 size_t zresult;
25
26 cstream = ZSTD_createCStream();
27 if (!cstream) {
28 PyErr_SetString(ZstdError, "cannot create CStream");
29 return NULL;
30 }
31
32 if (compressor->dict) {
33 dictData = compressor->dict->dictData;
34 dictSize = compressor->dict->dictSize;
35 }
36
37 memset(&zparams, 0, sizeof(zparams));
38 if (compressor->cparams) {
39 ztopy_compression_parameters(compressor->cparams, &zparams.cParams);
40 /* Do NOT call ZSTD_adjustCParams() here because the compression params
41 come from the user. */
42 }
43 else {
44 zparams.cParams = ZSTD_getCParams(compressor->compressionLevel, sourceSize, dictSize);
45 }
46
47 zparams.fParams = compressor->fparams;
48
49 zresult = ZSTD_initCStream_advanced(cstream, dictData, dictSize, zparams, sourceSize);
50
51 if (ZSTD_isError(zresult)) {
52 ZSTD_freeCStream(cstream);
53 PyErr_Format(ZstdError, "cannot init CStream: %s", ZSTD_getErrorName(zresult));
54 return NULL;
55 }
56
57 return cstream;
58 }
59
60
61 PyDoc_STRVAR(ZstdCompressor__doc__,
62 "ZstdCompressor(level=None, dict_data=None, compression_params=None)\n"
63 "\n"
64 "Create an object used to perform Zstandard compression.\n"
65 "\n"
66 "An instance can compress data various ways. Instances can be used multiple\n"
67 "times. Each compression operation will use the compression parameters\n"
68 "defined at construction time.\n"
69 "\n"
70 "Compression can be configured via the following names arguments:\n"
71 "\n"
72 "level\n"
73 " Integer compression level.\n"
74 "dict_data\n"
75 " A ``ZstdCompressionDict`` to be used to compress with dictionary data.\n"
76 "compression_params\n"
77 " A ``CompressionParameters`` instance defining low-level compression"
78 " parameters. If defined, this will overwrite the ``level`` argument.\n"
79 "write_checksum\n"
80 " If True, a 4 byte content checksum will be written with the compressed\n"
81 " data, allowing the decompressor to perform content verification.\n"
82 "write_content_size\n"
83 " If True, the decompressed content size will be included in the header of\n"
84 " the compressed data. This data will only be written if the compressor\n"
85 " knows the size of the input data.\n"
86 "write_dict_id\n"
87 " Determines whether the dictionary ID will be written into the compressed\n"
88 " data. Defaults to True. Only adds content to the compressed data if\n"
89 " a dictionary is being used.\n"
90 );
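The docstring above maps directly onto the Python-level constructor. A minimal usage sketch (all values illustrative; the module is assumed to be importable as ``zstd``)::

    import zstd

    # A reusable compressor; level defaults to 3 per ZstdCompressor_init below.
    cctx = zstd.ZstdCompressor(level=10,
                               write_checksum=True,
                               write_content_size=True)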
91
92 static int ZstdCompressor_init(ZstdCompressor* self, PyObject* args, PyObject* kwargs) {
93 static char* kwlist[] = {
94 "level",
95 "dict_data",
96 "compression_params",
97 "write_checksum",
98 "write_content_size",
99 "write_dict_id",
100 NULL
101 };
102
103 int level = 3;
104 ZstdCompressionDict* dict = NULL;
105 CompressionParametersObject* params = NULL;
106 PyObject* writeChecksum = NULL;
107 PyObject* writeContentSize = NULL;
108 PyObject* writeDictID = NULL;
109
110 self->dict = NULL;
111 self->cparams = NULL;
112 self->cdict = NULL;
113
114 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iO!O!OOO", kwlist,
115 &level, &ZstdCompressionDictType, &dict,
116 &CompressionParametersType, &params,
117 &writeChecksum, &writeContentSize, &writeDictID)) {
118 return -1;
119 }
120
121 if (level < 1) {
122 PyErr_SetString(PyExc_ValueError, "level must be greater than 0");
123 return -1;
124 }
125
126 if (level > ZSTD_maxCLevel()) {
127 PyErr_Format(PyExc_ValueError, "level must be less than %d",
128 ZSTD_maxCLevel() + 1);
129 return -1;
130 }
131
132 self->compressionLevel = level;
133
134 if (dict) {
135 self->dict = dict;
136 Py_INCREF(dict);
137 }
138
139 if (params) {
140 self->cparams = params;
141 Py_INCREF(params);
142 }
143
144 memset(&self->fparams, 0, sizeof(self->fparams));
145
146 if (writeChecksum && PyObject_IsTrue(writeChecksum)) {
147 self->fparams.checksumFlag = 1;
148 }
149 if (writeContentSize && PyObject_IsTrue(writeContentSize)) {
150 self->fparams.contentSizeFlag = 1;
151 }
152 if (writeDictID && PyObject_Not(writeDictID)) {
153 self->fparams.noDictIDFlag = 1;
154 }
155
156 return 0;
157 }
158
159 static void ZstdCompressor_dealloc(ZstdCompressor* self) {
160 Py_XDECREF(self->cparams);
161 Py_XDECREF(self->dict);
162
163 if (self->cdict) {
164 ZSTD_freeCDict(self->cdict);
165 self->cdict = NULL;
166 }
167
168 PyObject_Del(self);
169 }
170
171 PyDoc_STRVAR(ZstdCompressor_copy_stream__doc__,
172 "copy_stream(ifh, ofh[, size=0, read_size=default, write_size=default])\n"
173 "compress data between streams\n"
174 "\n"
175 "Data will be read from ``ifh``, compressed, and written to ``ofh``.\n"
176 "``ifh`` must have a ``read(size)`` method. ``ofh`` must have a ``write(data)``\n"
177 "method.\n"
178 "\n"
179 "An optional ``size`` argument specifies the size of the source stream.\n"
180 "If defined, compression parameters will be tuned based on the size.\n"
181 "\n"
182 "Optional arguments ``read_size`` and ``write_size`` define the chunk sizes\n"
183 "of ``read()`` and ``write()`` operations, respectively. By default, they use\n"
184 "the default compression stream input and output sizes, respectively.\n"
185 );
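A sketch of driving ``copy_stream()`` from Python, based on the docstring above (file names are hypothetical). Per the implementation below, the call returns a ``(bytes_read, bytes_written)`` tuple::

    import zstd

    cctx = zstd.ZstdCompressor()
    with open('input.bin', 'rb') as ifh, open('output.zst', 'wb') as ofh:
        read_count, write_count = cctx.copy_stream(ifh, ofh)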
186
187 static PyObject* ZstdCompressor_copy_stream(ZstdCompressor* self, PyObject* args, PyObject* kwargs) {
188 static char* kwlist[] = {
189 "ifh",
190 "ofh",
191 "size",
192 "read_size",
193 "write_size",
194 NULL
195 };
196
197 PyObject* source;
198 PyObject* dest;
199 Py_ssize_t sourceSize = 0;
200 size_t inSize = ZSTD_CStreamInSize();
201 size_t outSize = ZSTD_CStreamOutSize();
202 ZSTD_CStream* cstream;
203 ZSTD_inBuffer input;
204 ZSTD_outBuffer output;
205 Py_ssize_t totalRead = 0;
206 Py_ssize_t totalWrite = 0;
207 char* readBuffer;
208 Py_ssize_t readSize;
209 PyObject* readResult;
210 PyObject* res = NULL;
211 size_t zresult;
212 PyObject* writeResult;
213 PyObject* totalReadPy;
214 PyObject* totalWritePy;
215
216 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nkk", kwlist, &source, &dest, &sourceSize,
217 &inSize, &outSize)) {
218 return NULL;
219 }
220
221 if (!PyObject_HasAttrString(source, "read")) {
222 PyErr_SetString(PyExc_ValueError, "first argument must have a read() method");
223 return NULL;
224 }
225
226 if (!PyObject_HasAttrString(dest, "write")) {
227 PyErr_SetString(PyExc_ValueError, "second argument must have a write() method");
228 return NULL;
229 }
230
231 cstream = CStream_from_ZstdCompressor(self, sourceSize);
232 if (!cstream) {
233 res = NULL;
234 goto finally;
235 }
236
237 output.dst = PyMem_Malloc(outSize);
238 if (!output.dst) {
239 PyErr_NoMemory();
240 res = NULL;
241 goto finally;
242 }
243 output.size = outSize;
244 output.pos = 0;
245
246 while (1) {
247 /* Try to read from source stream. */
248 readResult = PyObject_CallMethod(source, "read", "n", inSize);
249 if (!readResult) {
250 PyErr_SetString(ZstdError, "could not read() from source");
251 goto finally;
252 }
253
254 PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize);
255
256 /* If no data was read, we're at EOF. */
257 if (0 == readSize) {
258 break;
259 }
260
261 totalRead += readSize;
262
263 /* Send data to compressor */
264 input.src = readBuffer;
265 input.size = readSize;
266 input.pos = 0;
267
268 while (input.pos < input.size) {
269 Py_BEGIN_ALLOW_THREADS
270 zresult = ZSTD_compressStream(cstream, &output, &input);
271 Py_END_ALLOW_THREADS
272
273 if (ZSTD_isError(zresult)) {
274 res = NULL;
275 PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
276 goto finally;
277 }
278
279 if (output.pos) {
280 #if PY_MAJOR_VERSION >= 3
281 writeResult = PyObject_CallMethod(dest, "write", "y#",
282 #else
283 writeResult = PyObject_CallMethod(dest, "write", "s#",
284 #endif
285 output.dst, output.pos);
286 Py_XDECREF(writeResult);
287 totalWrite += output.pos;
288 output.pos = 0;
289 }
290 }
291 }
292
293 /* We've finished reading. Now flush the compressor stream. */
294 while (1) {
295 zresult = ZSTD_endStream(cstream, &output);
296 if (ZSTD_isError(zresult)) {
297 PyErr_Format(ZstdError, "error ending compression stream: %s",
298 ZSTD_getErrorName(zresult));
299 res = NULL;
300 goto finally;
301 }
302
303 if (output.pos) {
304 #if PY_MAJOR_VERSION >= 3
305 writeResult = PyObject_CallMethod(dest, "write", "y#",
306 #else
307 writeResult = PyObject_CallMethod(dest, "write", "s#",
308 #endif
309 output.dst, output.pos);
310 totalWrite += output.pos;
311 Py_XDECREF(writeResult);
312 output.pos = 0;
313 }
314
315 if (!zresult) {
316 break;
317 }
318 }
319
320 ZSTD_freeCStream(cstream);
321 cstream = NULL;
322
323 totalReadPy = PyLong_FromSsize_t(totalRead);
324 totalWritePy = PyLong_FromSsize_t(totalWrite);
325 res = PyTuple_Pack(2, totalReadPy, totalWritePy);
326 Py_DecRef(totalReadPy);
327 Py_DecRef(totalWritePy);
328
329 finally:
330 if (output.dst) {
331 PyMem_Free(output.dst);
332 }
333
334 if (cstream) {
335 ZSTD_freeCStream(cstream);
336 }
337
338 return res;
339 }
340
341 PyDoc_STRVAR(ZstdCompressor_compress__doc__,
342 "compress(data)\n"
343 "\n"
344 "Compress data in a single operation.\n"
345 "\n"
346 "This is the simplest mechanism to perform compression: simply pass in a\n"
347 "value and get a compressed value back. It is almost the most prone to abuse.\n"
348 "The input and output values must fit in memory, so passing in very large\n"
349 "values can result in excessive memory usage. For this reason, one of the\n"
350 "streaming based APIs is preferred for larger values.\n"
351 );
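The one-shot API described above reduces to a single call; a minimal sketch::

    import zstd

    cctx = zstd.ZstdCompressor(level=3)
    compressed = cctx.compress(b'data to compress')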
352
353 static PyObject* ZstdCompressor_compress(ZstdCompressor* self, PyObject* args) {
354 const char* source;
355 Py_ssize_t sourceSize;
356 size_t destSize;
357 ZSTD_CCtx* cctx;
358 PyObject* output;
359 char* dest;
360 void* dictData = NULL;
361 size_t dictSize = 0;
362 size_t zresult;
363 ZSTD_parameters zparams;
364 ZSTD_customMem zmem;
365
366 #if PY_MAJOR_VERSION >= 3
367 if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) {
368 #else
369 if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) {
370 #endif
371 return NULL;
372 }
373
374 destSize = ZSTD_compressBound(sourceSize);
375 output = PyBytes_FromStringAndSize(NULL, destSize);
376 if (!output) {
377 return NULL;
378 }
379
380 dest = PyBytes_AsString(output);
381
382 cctx = ZSTD_createCCtx();
383 if (!cctx) {
384 Py_DECREF(output);
385 PyErr_SetString(ZstdError, "could not create CCtx");
386 return NULL;
387 }
388
389 if (self->dict) {
390 dictData = self->dict->dictData;
391 dictSize = self->dict->dictSize;
392 }
393
394 memset(&zparams, 0, sizeof(zparams));
395 if (!self->cparams) {
396 zparams.cParams = ZSTD_getCParams(self->compressionLevel, sourceSize, dictSize);
397 }
398 else {
399 ztopy_compression_parameters(self->cparams, &zparams.cParams);
400 /* Do NOT call ZSTD_adjustCParams() here because the compression params
401 come from the user. */
402 }
403
404 zparams.fParams = self->fparams;
405
406 /* The raw dict data has to be processed before it can be used. Since this
407 adds overhead - especially if multiple dictionary compression operations
408 are performed on the same ZstdCompressor instance - we create a
409 ZSTD_CDict once and reuse it for all operations. */
410
411 /* TODO the zparams (which can be derived from the source data size) used
412 on first invocation are effectively reused for subsequent operations. This
413 may not be appropriate if input sizes vary significantly and could affect
414 chosen compression parameters.
415 https://github.com/facebook/zstd/issues/358 tracks this issue. */
416 if (dictData && !self->cdict) {
417 Py_BEGIN_ALLOW_THREADS
418 memset(&zmem, 0, sizeof(zmem));
419 self->cdict = ZSTD_createCDict_advanced(dictData, dictSize, zparams, zmem);
420 Py_END_ALLOW_THREADS
421
422 if (!self->cdict) {
423 Py_DECREF(output);
424 ZSTD_freeCCtx(cctx);
425 PyErr_SetString(ZstdError, "could not create compression dictionary");
426 return NULL;
427 }
428 }
429
430 Py_BEGIN_ALLOW_THREADS
431 /* By avoiding ZSTD_compress(), we don't necessarily write out content
432 size. This means the argument to ZstdCompressor to control frame
433 parameters is honored. */
434 if (self->cdict) {
435 zresult = ZSTD_compress_usingCDict(cctx, dest, destSize,
436 source, sourceSize, self->cdict);
437 }
438 else {
439 zresult = ZSTD_compress_advanced(cctx, dest, destSize,
440 source, sourceSize, dictData, dictSize, zparams);
441 }
442 Py_END_ALLOW_THREADS
443
444 ZSTD_freeCCtx(cctx);
445
446 if (ZSTD_isError(zresult)) {
447 PyErr_Format(ZstdError, "cannot compress: %s", ZSTD_getErrorName(zresult));
448 Py_CLEAR(output);
449 return NULL;
450 }
451 else {
452 Py_SIZE(output) = zresult;
453 }
454
455 return output;
456 }
457
458 PyDoc_STRVAR(ZstdCompressionObj__doc__,
459 "compressobj()\n"
460 "\n"
461 "Return an object exposing ``compress(data)`` and ``flush()`` methods.\n"
462 "\n"
463 "The returned object exposes an API similar to ``zlib.compressobj`` and\n"
464 "``bz2.BZ2Compressor`` so that callers can swap in the zstd compressor\n"
465 "without changing how compression is performed.\n"
466 );
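A sketch of the ``compressobj()`` usage pattern implied by the docstring above, mirroring ``zlib.compressobj``::

    import zstd

    cobj = zstd.ZstdCompressor().compressobj()
    chunks = [cobj.compress(b'chunk 0'),
              cobj.compress(b'chunk 1'),
              cobj.flush()]
    compressed = b''.join(chunks)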
467
468 static ZstdCompressionObj* ZstdCompressor_compressobj(ZstdCompressor* self, PyObject* args, PyObject* kwargs) {
469 static char* kwlist[] = {
470 "size",
471 NULL
472 };
473
474 Py_ssize_t inSize = 0;
475 size_t outSize = ZSTD_CStreamOutSize();
476 ZstdCompressionObj* result = PyObject_New(ZstdCompressionObj, &ZstdCompressionObjType);
477 if (!result) {
478 return NULL;
479 }
480
481 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &inSize)) {
482 return NULL;
483 }
484
485 result->cstream = CStream_from_ZstdCompressor(self, inSize);
486 if (!result->cstream) {
487 Py_DECREF(result);
488 return NULL;
489 }
490
491 result->output.dst = PyMem_Malloc(outSize);
492 if (!result->output.dst) {
493 PyErr_NoMemory();
494 Py_DECREF(result);
495 return NULL;
496 }
497 result->output.size = outSize;
498 result->output.pos = 0;
499
500 result->compressor = self;
501 Py_INCREF(result->compressor);
502
503 result->flushed = 0;
504
505 return result;
506 }
507
508 PyDoc_STRVAR(ZstdCompressor_read_from__doc__,
509 "read_from(reader, [size=0, read_size=default, write_size=default])\n"
510 "Read uncompress data from a reader and return an iterator\n"
511 "\n"
512 "Returns an iterator of compressed data produced from reading from ``reader``.\n"
513 "\n"
514 "Uncompressed data will be obtained from ``reader`` by calling the\n"
515 "``read(size)`` method of it. The source data will be streamed into a\n"
516 "compressor. As compressed data is available, it will be exposed to the\n"
517 "iterator.\n"
518 "\n"
519 "Data is read from the source in chunks of ``read_size``. Compressed chunks\n"
520 "are at most ``write_size`` bytes. Both values default to the zstd input and\n"
521 "and output defaults, respectively.\n"
522 "\n"
523 "The caller is partially in control of how fast data is fed into the\n"
524 "compressor by how it consumes the returned iterator. The compressor will\n"
525 "not consume from the reader unless the caller consumes from the iterator.\n"
526 );
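A sketch of consuming the iterator returned by ``read_from()`` (the input file name is hypothetical)::

    import zstd

    cctx = zstd.ZstdCompressor()
    with open('input.bin', 'rb') as fh:
        for compressed_chunk in cctx.read_from(fh):
            pass  # e.g. send each chunk to a socket or another file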
527
528 static ZstdCompressorIterator* ZstdCompressor_read_from(ZstdCompressor* self, PyObject* args, PyObject* kwargs) {
529 static char* kwlist[] = {
530 "reader",
531 "size",
532 "read_size",
533 "write_size",
534 NULL
535 };
536
537 PyObject* reader;
538 Py_ssize_t sourceSize = 0;
539 size_t inSize = ZSTD_CStreamInSize();
540 size_t outSize = ZSTD_CStreamOutSize();
541 ZstdCompressorIterator* result;
542
543 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nkk", kwlist, &reader, &sourceSize,
544 &inSize, &outSize)) {
545 return NULL;
546 }
547
548 result = PyObject_New(ZstdCompressorIterator, &ZstdCompressorIteratorType);
549 if (!result) {
550 return NULL;
551 }
552
553 result->compressor = NULL;
554 result->reader = NULL;
555 result->buffer = NULL;
556 result->cstream = NULL;
557 result->input.src = NULL;
558 result->output.dst = NULL;
559 result->readResult = NULL;
560
561 if (PyObject_HasAttrString(reader, "read")) {
562 result->reader = reader;
563 Py_INCREF(result->reader);
564 }
565 else if (1 == PyObject_CheckBuffer(reader)) {
566 result->buffer = PyMem_Malloc(sizeof(Py_buffer));
567 if (!result->buffer) {
568 goto except;
569 }
570
571 memset(result->buffer, 0, sizeof(Py_buffer));
572
573 if (0 != PyObject_GetBuffer(reader, result->buffer, PyBUF_CONTIG_RO)) {
574 goto except;
575 }
576
577 result->bufferOffset = 0;
578 sourceSize = result->buffer->len;
579 }
580 else {
581 PyErr_SetString(PyExc_ValueError,
582 "must pass an object with a read() method or conforms to buffer protocol");
583 goto except;
584 }
585
586 result->compressor = self;
587 Py_INCREF(result->compressor);
588
589 result->sourceSize = sourceSize;
590 result->cstream = CStream_from_ZstdCompressor(self, sourceSize);
591 if (!result->cstream) {
592 goto except;
593 }
594
595 result->inSize = inSize;
596 result->outSize = outSize;
597
598 result->output.dst = PyMem_Malloc(outSize);
599 if (!result->output.dst) {
600 PyErr_NoMemory();
601 goto except;
602 }
603 result->output.size = outSize;
604 result->output.pos = 0;
605
606 result->input.src = NULL;
607 result->input.size = 0;
608 result->input.pos = 0;
609
610 result->finishedInput = 0;
611 result->finishedOutput = 0;
612
613 goto finally;
614
615 except:
616 if (result->cstream) {
617 ZSTD_freeCStream(result->cstream);
618 result->cstream = NULL;
619 }
620
621 Py_DecRef((PyObject*)result->compressor);
622 Py_DecRef(result->reader);
623
624 Py_DECREF(result);
625 result = NULL;
626
627 finally:
628 return result;
629 }
630
631 PyDoc_STRVAR(ZstdCompressor_write_to___doc__,
632 "Create a context manager to write compressed data to an object.\n"
633 "\n"
634 "The passed object must have a ``write()`` method.\n"
635 "\n"
636 "The caller feeds input data to the object by calling ``compress(data)``.\n"
637 "Compressed data is written to the argument given to this function.\n"
638 "\n"
639 "The function takes an optional ``size`` argument indicating the total size\n"
640 "of the eventual input. If specified, the size will influence compression\n"
641 "parameter tuning and could result in the size being written into the\n"
642 "header of the compressed data.\n"
643 "\n"
644 "An optional ``write_size`` argument is also accepted. It defines the maximum\n"
645 "byte size of chunks fed to ``write()``. By default, it uses the zstd default\n"
646 "for a compressor output stream.\n"
647 );
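A sketch of the context manager flow described above. The writer object exposes a ``write()`` method (see ZstdCompressionWriter_methods earlier in this change); the output file name is hypothetical::

    import zstd

    cctx = zstd.ZstdCompressor()
    with open('output.zst', 'wb') as fh:
        with cctx.write_to(fh) as compressor:
            compressor.write(b'chunk 0')
            compressor.write(b'chunk 1')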
648
649 static ZstdCompressionWriter* ZstdCompressor_write_to(ZstdCompressor* self, PyObject* args, PyObject* kwargs) {
650 static char* kwlist[] = {
651 "writer",
652 "size",
653 "write_size",
654 NULL
655 };
656
657 PyObject* writer;
658 ZstdCompressionWriter* result;
659 Py_ssize_t sourceSize = 0;
660 size_t outSize = ZSTD_CStreamOutSize();
661
662 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nk", kwlist, &writer, &sourceSize,
663 &outSize)) {
664 return NULL;
665 }
666
667 if (!PyObject_HasAttrString(writer, "write")) {
668 PyErr_SetString(PyExc_ValueError, "must pass an object with a write() method");
669 return NULL;
670 }
671
672 result = PyObject_New(ZstdCompressionWriter, &ZstdCompressionWriterType);
673 if (!result) {
674 return NULL;
675 }
676
677 result->compressor = self;
678 Py_INCREF(result->compressor);
679
680 result->writer = writer;
681 Py_INCREF(result->writer);
682
683 result->sourceSize = sourceSize;
684
685 result->outSize = outSize;
686
687 result->entered = 0;
688 result->cstream = NULL;
689
690 return result;
691 }
692
693 static PyMethodDef ZstdCompressor_methods[] = {
694 { "compress", (PyCFunction)ZstdCompressor_compress, METH_VARARGS,
695 ZstdCompressor_compress__doc__ },
696 { "compressobj", (PyCFunction)ZstdCompressor_compressobj,
697 METH_VARARGS | METH_KEYWORDS, ZstdCompressionObj__doc__ },
698 { "copy_stream", (PyCFunction)ZstdCompressor_copy_stream,
699 METH_VARARGS | METH_KEYWORDS, ZstdCompressor_copy_stream__doc__ },
700 { "read_from", (PyCFunction)ZstdCompressor_read_from,
701 METH_VARARGS | METH_KEYWORDS, ZstdCompressor_read_from__doc__ },
702 { "write_to", (PyCFunction)ZstdCompressor_write_to,
703 METH_VARARGS | METH_KEYWORDS, ZstdCompressor_write_to___doc__ },
704 { NULL, NULL }
705 };
706
707 PyTypeObject ZstdCompressorType = {
708 PyVarObject_HEAD_INIT(NULL, 0)
709 "zstd.ZstdCompressor", /* tp_name */
710 sizeof(ZstdCompressor), /* tp_basicsize */
711 0, /* tp_itemsize */
712 (destructor)ZstdCompressor_dealloc, /* tp_dealloc */
713 0, /* tp_print */
714 0, /* tp_getattr */
715 0, /* tp_setattr */
716 0, /* tp_compare */
717 0, /* tp_repr */
718 0, /* tp_as_number */
719 0, /* tp_as_sequence */
720 0, /* tp_as_mapping */
721 0, /* tp_hash */
722 0, /* tp_call */
723 0, /* tp_str */
724 0, /* tp_getattro */
725 0, /* tp_setattro */
726 0, /* tp_as_buffer */
727 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
728 ZstdCompressor__doc__, /* tp_doc */
729 0, /* tp_traverse */
730 0, /* tp_clear */
731 0, /* tp_richcompare */
732 0, /* tp_weaklistoffset */
733 0, /* tp_iter */
734 0, /* tp_iternext */
735 ZstdCompressor_methods, /* tp_methods */
736 0, /* tp_members */
737 0, /* tp_getset */
738 0, /* tp_base */
739 0, /* tp_dict */
740 0, /* tp_descr_get */
741 0, /* tp_descr_set */
742 0, /* tp_dictoffset */
743 (initproc)ZstdCompressor_init, /* tp_init */
744 0, /* tp_alloc */
745 PyType_GenericNew, /* tp_new */
746 };
747
748 void compressor_module_init(PyObject* mod) {
749 Py_TYPE(&ZstdCompressorType) = &PyType_Type;
750 if (PyType_Ready(&ZstdCompressorType) < 0) {
751 return;
752 }
753
754 Py_INCREF((PyObject*)&ZstdCompressorType);
755 PyModule_AddObject(mod, "ZstdCompressor",
756 (PyObject*)&ZstdCompressorType);
757 }
@@ -0,0 +1,234 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 #define min(a, b) (((a) < (b)) ? (a) : (b))
12
13 extern PyObject* ZstdError;
14
15 PyDoc_STRVAR(ZstdCompressorIterator__doc__,
16 "Represents an iterator of compressed data.\n"
17 );
18
19 static void ZstdCompressorIterator_dealloc(ZstdCompressorIterator* self) {
20 Py_XDECREF(self->readResult);
21 Py_XDECREF(self->compressor);
22 Py_XDECREF(self->reader);
23
24 if (self->buffer) {
25 PyBuffer_Release(self->buffer);
26 PyMem_FREE(self->buffer);
27 self->buffer = NULL;
28 }
29
30 if (self->cstream) {
31 ZSTD_freeCStream(self->cstream);
32 self->cstream = NULL;
33 }
34
35 if (self->output.dst) {
36 PyMem_Free(self->output.dst);
37 self->output.dst = NULL;
38 }
39
40 PyObject_Del(self);
41 }
42
43 static PyObject* ZstdCompressorIterator_iter(PyObject* self) {
44 Py_INCREF(self);
45 return self;
46 }
47
48 static PyObject* ZstdCompressorIterator_iternext(ZstdCompressorIterator* self) {
49 size_t zresult;
50 PyObject* readResult = NULL;
51 PyObject* chunk;
52 char* readBuffer;
53 Py_ssize_t readSize = 0;
54 Py_ssize_t bufferRemaining;
55
56 if (self->finishedOutput) {
57 PyErr_SetString(PyExc_StopIteration, "output flushed");
58 return NULL;
59 }
60
61 feedcompressor:
62
63 /* If we have data left in the input, consume it. */
64 if (self->input.pos < self->input.size) {
65 Py_BEGIN_ALLOW_THREADS
66 zresult = ZSTD_compressStream(self->cstream, &self->output, &self->input);
67 Py_END_ALLOW_THREADS
68
69 /* Release the Python object holding the input buffer. */
70 if (self->input.pos == self->input.size) {
71 self->input.src = NULL;
72 self->input.pos = 0;
73 self->input.size = 0;
74 Py_DECREF(self->readResult);
75 self->readResult = NULL;
76 }
77
78 if (ZSTD_isError(zresult)) {
79 PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
80 return NULL;
81 }
82
83 /* If it produced output data, emit it. */
84 if (self->output.pos) {
85 chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos);
86 self->output.pos = 0;
87 return chunk;
88 }
89 }
90
91 /* We should never have output data sitting around after a previous call. */
92 assert(self->output.pos == 0);
93
94 /* The code above should have either emitted a chunk and returned or consumed
95 the entire input buffer. So the state of the input buffer is not
96 relevant. */
97 if (!self->finishedInput) {
98 if (self->reader) {
99 readResult = PyObject_CallMethod(self->reader, "read", "I", self->inSize);
100 if (!readResult) {
101 PyErr_SetString(ZstdError, "could not read() from source");
102 return NULL;
103 }
104
105 PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize);
106 }
107 else {
108 assert(self->buffer && self->buffer->buf);
109
110 /* Only support contiguous C arrays. */
111 assert(self->buffer->strides == NULL && self->buffer->suboffsets == NULL);
112 assert(self->buffer->itemsize == 1);
113
114 readBuffer = (char*)self->buffer->buf + self->bufferOffset;
115 bufferRemaining = self->buffer->len - self->bufferOffset;
116 readSize = min(bufferRemaining, (Py_ssize_t)self->inSize);
117 self->bufferOffset += readSize;
118 }
119
120 if (0 == readSize) {
121 Py_XDECREF(readResult);
122 self->finishedInput = 1;
123 }
124 else {
125 self->readResult = readResult;
126 }
127 }
128
129 /* EOF */
130 if (0 == readSize) {
131 zresult = ZSTD_endStream(self->cstream, &self->output);
132 if (ZSTD_isError(zresult)) {
133 PyErr_Format(ZstdError, "error ending compression stream: %s",
134 ZSTD_getErrorName(zresult));
135 return NULL;
136 }
137
138 assert(self->output.pos);
139
140 if (0 == zresult) {
141 self->finishedOutput = 1;
142 }
143
144 chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos);
145 self->output.pos = 0;
146 return chunk;
147 }
148
149 /* New data from reader. Feed into compressor. */
150 self->input.src = readBuffer;
151 self->input.size = readSize;
152 self->input.pos = 0;
153
154 Py_BEGIN_ALLOW_THREADS
155 zresult = ZSTD_compressStream(self->cstream, &self->output, &self->input);
156 Py_END_ALLOW_THREADS
157
158 /* The input buffer currently points to memory managed by Python
159 (readBuffer). This object was allocated by this function. If it wasn't
160 fully consumed, we need to release it in a subsequent function call.
161 If it is fully consumed, do that now.
162 */
163 if (self->input.pos == self->input.size) {
164 self->input.src = NULL;
165 self->input.pos = 0;
166 self->input.size = 0;
167 Py_XDECREF(self->readResult);
168 self->readResult = NULL;
169 }
170
171 if (ZSTD_isError(zresult)) {
172 PyErr_Format(ZstdError, "zstd compress error: %s", ZSTD_getErrorName(zresult));
173 return NULL;
174 }
175
176 assert(self->input.pos <= self->input.size);
177
178 /* If we didn't write anything, start the process over. */
179 if (0 == self->output.pos) {
180 goto feedcompressor;
181 }
182
183 chunk = PyBytes_FromStringAndSize(self->output.dst, self->output.pos);
184 self->output.pos = 0;
185 return chunk;
186 }
187
188 PyTypeObject ZstdCompressorIteratorType = {
189 PyVarObject_HEAD_INIT(NULL, 0)
190 "zstd.ZstdCompressorIterator", /* tp_name */
191 sizeof(ZstdCompressorIterator), /* tp_basicsize */
192 0, /* tp_itemsize */
193 (destructor)ZstdCompressorIterator_dealloc, /* tp_dealloc */
194 0, /* tp_print */
195 0, /* tp_getattr */
196 0, /* tp_setattr */
197 0, /* tp_compare */
198 0, /* tp_repr */
199 0, /* tp_as_number */
200 0, /* tp_as_sequence */
201 0, /* tp_as_mapping */
202 0, /* tp_hash */
203 0, /* tp_call */
204 0, /* tp_str */
205 0, /* tp_getattro */
206 0, /* tp_setattro */
207 0, /* tp_as_buffer */
208 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
209 ZstdCompressorIterator__doc__, /* tp_doc */
210 0, /* tp_traverse */
211 0, /* tp_clear */
212 0, /* tp_richcompare */
213 0, /* tp_weaklistoffset */
214 ZstdCompressorIterator_iter, /* tp_iter */
215 (iternextfunc)ZstdCompressorIterator_iternext, /* tp_iternext */
216 0, /* tp_methods */
217 0, /* tp_members */
218 0, /* tp_getset */
219 0, /* tp_base */
220 0, /* tp_dict */
221 0, /* tp_descr_get */
222 0, /* tp_descr_set */
223 0, /* tp_dictoffset */
224 0, /* tp_init */
225 0, /* tp_alloc */
226 PyType_GenericNew, /* tp_new */
227 };
228
229 void compressoriterator_module_init(PyObject* mod) {
230 Py_TYPE(&ZstdCompressorIteratorType) = &PyType_Type;
231 if (PyType_Ready(&ZstdCompressorIteratorType) < 0) {
232 return;
233 }
234 }
@@ -0,0 +1,84 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 static char frame_header[] = {
14 '\x28',
15 '\xb5',
16 '\x2f',
17 '\xfd',
18 };
19
20 void constants_module_init(PyObject* mod) {
21 PyObject* version;
22 PyObject* zstdVersion;
23 PyObject* frameHeader;
24
25 #if PY_MAJOR_VERSION >= 3
26 version = PyUnicode_FromString(PYTHON_ZSTANDARD_VERSION);
27 #else
28 version = PyString_FromString(PYTHON_ZSTANDARD_VERSION);
29 #endif
30 Py_INCREF(version);
31 PyModule_AddObject(mod, "__version__", version);
32
33 ZstdError = PyErr_NewException("zstd.ZstdError", NULL, NULL);
34 PyModule_AddObject(mod, "ZstdError", ZstdError);
35
36 /* For now, the version is a simple tuple instead of a dedicated type. */
37 zstdVersion = PyTuple_New(3);
38 PyTuple_SetItem(zstdVersion, 0, PyLong_FromLong(ZSTD_VERSION_MAJOR));
39 PyTuple_SetItem(zstdVersion, 1, PyLong_FromLong(ZSTD_VERSION_MINOR));
40 PyTuple_SetItem(zstdVersion, 2, PyLong_FromLong(ZSTD_VERSION_RELEASE));
41 Py_IncRef(zstdVersion);
42 PyModule_AddObject(mod, "ZSTD_VERSION", zstdVersion);
43
44 frameHeader = PyBytes_FromStringAndSize(frame_header, sizeof(frame_header));
45 if (frameHeader) {
46 PyModule_AddObject(mod, "FRAME_HEADER", frameHeader);
47 }
48 else {
49 PyErr_Format(PyExc_ValueError, "could not create frame header object");
50 }
51
52 PyModule_AddIntConstant(mod, "MAX_COMPRESSION_LEVEL", ZSTD_maxCLevel());
53 PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_INPUT_SIZE",
54 (long)ZSTD_CStreamInSize());
55 PyModule_AddIntConstant(mod, "COMPRESSION_RECOMMENDED_OUTPUT_SIZE",
56 (long)ZSTD_CStreamOutSize());
57 PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_INPUT_SIZE",
58 (long)ZSTD_DStreamInSize());
59 PyModule_AddIntConstant(mod, "DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE",
60 (long)ZSTD_DStreamOutSize());
61
62 PyModule_AddIntConstant(mod, "MAGIC_NUMBER", ZSTD_MAGICNUMBER);
63 PyModule_AddIntConstant(mod, "WINDOWLOG_MIN", ZSTD_WINDOWLOG_MIN);
64 PyModule_AddIntConstant(mod, "WINDOWLOG_MAX", ZSTD_WINDOWLOG_MAX);
65 PyModule_AddIntConstant(mod, "CHAINLOG_MIN", ZSTD_CHAINLOG_MIN);
66 PyModule_AddIntConstant(mod, "CHAINLOG_MAX", ZSTD_CHAINLOG_MAX);
67 PyModule_AddIntConstant(mod, "HASHLOG_MIN", ZSTD_HASHLOG_MIN);
68 PyModule_AddIntConstant(mod, "HASHLOG_MAX", ZSTD_HASHLOG_MAX);
69 PyModule_AddIntConstant(mod, "HASHLOG3_MAX", ZSTD_HASHLOG3_MAX);
70 PyModule_AddIntConstant(mod, "SEARCHLOG_MIN", ZSTD_SEARCHLOG_MIN);
71 PyModule_AddIntConstant(mod, "SEARCHLOG_MAX", ZSTD_SEARCHLOG_MAX);
72 PyModule_AddIntConstant(mod, "SEARCHLENGTH_MIN", ZSTD_SEARCHLENGTH_MIN);
73 PyModule_AddIntConstant(mod, "SEARCHLENGTH_MAX", ZSTD_SEARCHLENGTH_MAX);
74 PyModule_AddIntConstant(mod, "TARGETLENGTH_MIN", ZSTD_TARGETLENGTH_MIN);
75 PyModule_AddIntConstant(mod, "TARGETLENGTH_MAX", ZSTD_TARGETLENGTH_MAX);
76
77 PyModule_AddIntConstant(mod, "STRATEGY_FAST", ZSTD_fast);
78 PyModule_AddIntConstant(mod, "STRATEGY_DFAST", ZSTD_dfast);
79 PyModule_AddIntConstant(mod, "STRATEGY_GREEDY", ZSTD_greedy);
80 PyModule_AddIntConstant(mod, "STRATEGY_LAZY", ZSTD_lazy);
81 PyModule_AddIntConstant(mod, "STRATEGY_LAZY2", ZSTD_lazy2);
82 PyModule_AddIntConstant(mod, "STRATEGY_BTLAZY2", ZSTD_btlazy2);
83 PyModule_AddIntConstant(mod, "STRATEGY_BTOPT", ZSTD_btopt);
84 }
@@ -0,0 +1,187 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 PyDoc_STRVAR(ZstdDecompressionWriter__doc,
14 """A context manager used for writing decompressed output.\n"
15 );
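This type is not constructed directly. A rough usage sketch, assuming ``ZstdDecompressor`` exposes a ``write_to()`` helper mirroring the compressor's (that method is not part of this hunk)::

    import zstd

    compressed = zstd.ZstdCompressor().compress(b'data')
    dctx = zstd.ZstdDecompressor()
    with open('output.bin', 'wb') as fh:          # file name is hypothetical
        with dctx.write_to(fh) as decompressor:   # write_to() assumed, see note above
            decompressor.write(compressed)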
16
17 static void ZstdDecompressionWriter_dealloc(ZstdDecompressionWriter* self) {
18 Py_XDECREF(self->decompressor);
19 Py_XDECREF(self->writer);
20
21 if (self->dstream) {
22 ZSTD_freeDStream(self->dstream);
23 self->dstream = NULL;
24 }
25
26 PyObject_Del(self);
27 }
28
29 static PyObject* ZstdDecompressionWriter_enter(ZstdDecompressionWriter* self) {
30 if (self->entered) {
31 PyErr_SetString(ZstdError, "cannot __enter__ multiple times");
32 return NULL;
33 }
34
35 self->dstream = DStream_from_ZstdDecompressor(self->decompressor);
36 if (!self->dstream) {
37 return NULL;
38 }
39
40 self->entered = 1;
41
42 Py_INCREF(self);
43 return (PyObject*)self;
44 }
45
46 static PyObject* ZstdDecompressionWriter_exit(ZstdDecompressionWriter* self, PyObject* args) {
47 self->entered = 0;
48
49 if (self->dstream) {
50 ZSTD_freeDStream(self->dstream);
51 self->dstream = NULL;
52 }
53
54 Py_RETURN_FALSE;
55 }
56
57 static PyObject* ZstdDecompressionWriter_memory_size(ZstdDecompressionWriter* self) {
58 if (!self->dstream) {
59 PyErr_SetString(ZstdError, "cannot determine size of inactive decompressor; "
60 "call when context manager is active");
61 return NULL;
62 }
63
64 return PyLong_FromSize_t(ZSTD_sizeof_DStream(self->dstream));
65 }
66
67 static PyObject* ZstdDecompressionWriter_write(ZstdDecompressionWriter* self, PyObject* args) {
68 const char* source;
69 Py_ssize_t sourceSize;
70 size_t zresult = 0;
71 ZSTD_inBuffer input;
72 ZSTD_outBuffer output;
73 PyObject* res;
74
75 #if PY_MAJOR_VERSION >= 3
76 if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) {
77 #else
78 if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) {
79 #endif
80 return NULL;
81 }
82
83 if (!self->entered) {
84 PyErr_SetString(ZstdError, "write must be called from an active context manager");
85 return NULL;
86 }
87
88 output.dst = malloc(self->outSize);
89 if (!output.dst) {
90 return PyErr_NoMemory();
91 }
92 output.size = self->outSize;
93 output.pos = 0;
94
95 input.src = source;
96 input.size = sourceSize;
97 input.pos = 0;
98
99 while ((ssize_t)input.pos < sourceSize) {
100 Py_BEGIN_ALLOW_THREADS
101 zresult = ZSTD_decompressStream(self->dstream, &output, &input);
102 Py_END_ALLOW_THREADS
103
104 if (ZSTD_isError(zresult)) {
105 free(output.dst);
106 PyErr_Format(ZstdError, "zstd decompress error: %s",
107 ZSTD_getErrorName(zresult));
108 return NULL;
109 }
110
111 if (output.pos) {
112 #if PY_MAJOR_VERSION >= 3
113 res = PyObject_CallMethod(self->writer, "write", "y#",
114 #else
115 res = PyObject_CallMethod(self->writer, "write", "s#",
116 #endif
117 output.dst, output.pos);
118 Py_XDECREF(res);
119 output.pos = 0;
120 }
121 }
122
123 free(output.dst);
124
125 /* TODO return bytes written */
126 Py_RETURN_NONE;
127 }
128
129 static PyMethodDef ZstdDecompressionWriter_methods[] = {
130 { "__enter__", (PyCFunction)ZstdDecompressionWriter_enter, METH_NOARGS,
131 PyDoc_STR("Enter a decompression context.") },
132 { "__exit__", (PyCFunction)ZstdDecompressionWriter_exit, METH_VARARGS,
133 PyDoc_STR("Exit a decompression context.") },
134 { "memory_size", (PyCFunction)ZstdDecompressionWriter_memory_size, METH_NOARGS,
135 PyDoc_STR("Obtain the memory size in bytes of the underlying decompressor.") },
136 { "write", (PyCFunction)ZstdDecompressionWriter_write, METH_VARARGS,
137 PyDoc_STR("Compress data") },
138 { NULL, NULL }
139 };
140
141 PyTypeObject ZstdDecompressionWriterType = {
142 PyVarObject_HEAD_INIT(NULL, 0)
143 "zstd.ZstdDecompressionWriter", /* tp_name */
144 sizeof(ZstdDecompressionWriter),/* tp_basicsize */
145 0, /* tp_itemsize */
146 (destructor)ZstdDecompressionWriter_dealloc, /* tp_dealloc */
147 0, /* tp_print */
148 0, /* tp_getattr */
149 0, /* tp_setattr */
150 0, /* tp_compare */
151 0, /* tp_repr */
152 0, /* tp_as_number */
153 0, /* tp_as_sequence */
154 0, /* tp_as_mapping */
155 0, /* tp_hash */
156 0, /* tp_call */
157 0, /* tp_str */
158 0, /* tp_getattro */
159 0, /* tp_setattro */
160 0, /* tp_as_buffer */
161 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
162 ZstdDecompressionWriter__doc, /* tp_doc */
163 0, /* tp_traverse */
164 0, /* tp_clear */
165 0, /* tp_richcompare */
166 0, /* tp_weaklistoffset */
167 0, /* tp_iter */
168 0, /* tp_iternext */
169 ZstdDecompressionWriter_methods,/* tp_methods */
170 0, /* tp_members */
171 0, /* tp_getset */
172 0, /* tp_base */
173 0, /* tp_dict */
174 0, /* tp_descr_get */
175 0, /* tp_descr_set */
176 0, /* tp_dictoffset */
177 0, /* tp_init */
178 0, /* tp_alloc */
179 PyType_GenericNew, /* tp_new */
180 };
181
182 void decompressionwriter_module_init(PyObject* mod) {
183 Py_TYPE(&ZstdDecompressionWriterType) = &PyType_Type;
184 if (PyType_Ready(&ZstdDecompressionWriterType) < 0) {
185 return;
186 }
187 }
@@ -0,0 +1,170 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 PyDoc_STRVAR(DecompressionObj__doc__,
14 "Perform decompression using a standard library compatible API.\n"
15 );
16
17 static void DecompressionObj_dealloc(ZstdDecompressionObj* self) {
18 if (self->dstream) {
19 ZSTD_freeDStream(self->dstream);
20 self->dstream = NULL;
21 }
22
23 Py_XDECREF(self->decompressor);
24
25 PyObject_Del(self);
26 }
27
28 static PyObject* DecompressionObj_decompress(ZstdDecompressionObj* self, PyObject* args) {
29 const char* source;
30 Py_ssize_t sourceSize;
31 size_t zresult;
32 ZSTD_inBuffer input;
33 ZSTD_outBuffer output;
34 size_t outSize = ZSTD_DStreamOutSize();
35 PyObject* result = NULL;
36 Py_ssize_t resultSize = 0;
37
38 if (self->finished) {
39 PyErr_SetString(ZstdError, "cannot use a decompressobj multiple times");
40 return NULL;
41 }
42
43 #if PY_MAJOR_VERSION >= 3
44 if (!PyArg_ParseTuple(args, "y#",
45 #else
46 if (!PyArg_ParseTuple(args, "s#",
47 #endif
48 &source, &sourceSize)) {
49 return NULL;
50 }
51
52 input.src = source;
53 input.size = sourceSize;
54 input.pos = 0;
55
56 output.dst = PyMem_Malloc(outSize);
57 if (!output.dst) {
58 PyErr_NoMemory();
59 return NULL;
60 }
61 output.size = outSize;
62 output.pos = 0;
63
64 /* Read input until exhausted. */
65 while (input.pos < input.size) {
66 Py_BEGIN_ALLOW_THREADS
67 zresult = ZSTD_decompressStream(self->dstream, &output, &input);
68 Py_END_ALLOW_THREADS
69
70 if (ZSTD_isError(zresult)) {
71 PyErr_Format(ZstdError, "zstd decompressor error: %s",
72 ZSTD_getErrorName(zresult));
73 result = NULL;
74 goto finally;
75 }
76
77 if (0 == zresult) {
78 self->finished = 1;
79 }
80
81 if (output.pos) {
82 if (result) {
83 resultSize = PyBytes_GET_SIZE(result);
84 if (-1 == _PyBytes_Resize(&result, resultSize + output.pos)) {
85 goto except;
86 }
87
88 memcpy(PyBytes_AS_STRING(result) + resultSize,
89 output.dst, output.pos);
90 }
91 else {
92 result = PyBytes_FromStringAndSize(output.dst, output.pos);
93 if (!result) {
94 goto except;
95 }
96 }
97
98 output.pos = 0;
99 }
100 }
101
102 if (!result) {
103 result = PyBytes_FromString("");
104 }
105
106 goto finally;
107
108 except:
109 Py_DecRef(result);
110 result = NULL;
111
112 finally:
113 PyMem_Free(output.dst);
114
115 return result;
116 }
117
118 static PyMethodDef DecompressionObj_methods[] = {
119 { "decompress", (PyCFunction)DecompressionObj_decompress,
120 METH_VARARGS, PyDoc_STR("decompress data") },
121 { NULL, NULL }
122 };
123
124 PyTypeObject ZstdDecompressionObjType = {
125 PyVarObject_HEAD_INIT(NULL, 0)
126 "zstd.ZstdDecompressionObj", /* tp_name */
127 sizeof(ZstdDecompressionObj), /* tp_basicsize */
128 0, /* tp_itemsize */
129 (destructor)DecompressionObj_dealloc, /* tp_dealloc */
130 0, /* tp_print */
131 0, /* tp_getattr */
132 0, /* tp_setattr */
133 0, /* tp_compare */
134 0, /* tp_repr */
135 0, /* tp_as_number */
136 0, /* tp_as_sequence */
137 0, /* tp_as_mapping */
138 0, /* tp_hash */
139 0, /* tp_call */
140 0, /* tp_str */
141 0, /* tp_getattro */
142 0, /* tp_setattro */
143 0, /* tp_as_buffer */
144 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
145 DecompressionObj__doc__, /* tp_doc */
146 0, /* tp_traverse */
147 0, /* tp_clear */
148 0, /* tp_richcompare */
149 0, /* tp_weaklistoffset */
150 0, /* tp_iter */
151 0, /* tp_iternext */
152 DecompressionObj_methods, /* tp_methods */
153 0, /* tp_members */
154 0, /* tp_getset */
155 0, /* tp_base */
156 0, /* tp_dict */
157 0, /* tp_descr_get */
158 0, /* tp_descr_set */
159 0, /* tp_dictoffset */
160 0, /* tp_init */
161 0, /* tp_alloc */
162 PyType_GenericNew, /* tp_new */
163 };
164
165 void decompressobj_module_init(PyObject* module) {
166 Py_TYPE(&ZstdDecompressionObjType) = &PyType_Type;
167 if (PyType_Ready(&ZstdDecompressionObjType) < 0) {
168 return;
169 }
170 }
@@ -0,0 +1,669 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 extern PyObject* ZstdError;
12
13 ZSTD_DStream* DStream_from_ZstdDecompressor(ZstdDecompressor* decompressor) {
14 ZSTD_DStream* dstream;
15 void* dictData = NULL;
16 size_t dictSize = 0;
17 size_t zresult;
18
19 dstream = ZSTD_createDStream();
20 if (!dstream) {
21 PyErr_SetString(ZstdError, "could not create DStream");
22 return NULL;
23 }
24
25 if (decompressor->dict) {
26 dictData = decompressor->dict->dictData;
27 dictSize = decompressor->dict->dictSize;
28 }
29
30 if (dictData) {
31 zresult = ZSTD_initDStream_usingDict(dstream, dictData, dictSize);
32 }
33 else {
34 zresult = ZSTD_initDStream(dstream);
35 }
36
37 if (ZSTD_isError(zresult)) {
38 PyErr_Format(ZstdError, "could not initialize DStream: %s",
39 ZSTD_getErrorName(zresult));
40 return NULL;
41 }
42
43 return dstream;
44 }
45
46 PyDoc_STRVAR(Decompressor__doc__,
47 "ZstdDecompressor(dict_data=None)\n"
48 "\n"
49 "Create an object used to perform Zstandard decompression.\n"
50 "\n"
51 "An instance can perform multiple decompression operations."
52 );
53
54 static int Decompressor_init(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
55 static char* kwlist[] = {
56 "dict_data",
57 NULL
58 };
59
60 ZstdCompressionDict* dict = NULL;
61
62 self->refdctx = NULL;
63 self->dict = NULL;
64 self->ddict = NULL;
65
66 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O!", kwlist,
67 &ZstdCompressionDictType, &dict)) {
68 return -1;
69 }
70
71 /* Instead of creating a ZSTD_DCtx for every decompression operation,
72 we create an instance at object creation time and recycle it via
73 ZSTD_copyDCtx() on each use. This means each use is a malloc+memcpy
74 instead of a malloc+init. */
75 /* TODO lazily initialize the reference ZSTD_DCtx on first use since
76 not all instances of ZstdDecompressor will use a ZSTD_DCtx. */
77 self->refdctx = ZSTD_createDCtx();
78 if (!self->refdctx) {
79 PyErr_NoMemory();
80 goto except;
81 }
82
83 if (dict) {
84 self->dict = dict;
85 Py_INCREF(dict);
86 }
87
88 return 0;
89
90 except:
91 if (self->refdctx) {
92 ZSTD_freeDCtx(self->refdctx);
93 self->refdctx = NULL;
94 }
95
96 return -1;
97 }
98
99 static void Decompressor_dealloc(ZstdDecompressor* self) {
100 if (self->refdctx) {
101 ZSTD_freeDCtx(self->refdctx);
102 }
103
104 Py_XDECREF(self->dict);
105
106 if (self->ddict) {
107 ZSTD_freeDDict(self->ddict);
108 self->ddict = NULL;
109 }
110
111 PyObject_Del(self);
112 }
113
114 PyDoc_STRVAR(Decompressor_copy_stream__doc__,
115 "copy_stream(ifh, ofh[, read_size=default, write_size=default]) -- decompress data between streams\n"
116 "\n"
117 "Compressed data will be read from ``ifh``, decompressed, and written to\n"
118 "``ofh``. ``ifh`` must have a ``read(size)`` method. ``ofh`` must have a\n"
119 "``write(data)`` method.\n"
120 "\n"
121 "The optional ``read_size`` and ``write_size`` arguments control the chunk\n"
122 "size of data that is ``read()`` and ``write()`` between streams. They default\n"
123 "to the default input and output sizes of zstd decompressor streams.\n"
124 );
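Mirroring the compressor side, a sketch of streaming decompression between file objects (file names are hypothetical); the call returns a ``(bytes_read, bytes_written)`` tuple per the implementation below::

    import zstd

    dctx = zstd.ZstdDecompressor()
    with open('input.zst', 'rb') as ifh, open('output.bin', 'wb') as ofh:
        read_count, write_count = dctx.copy_stream(ifh, ofh)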
125
126 static PyObject* Decompressor_copy_stream(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
127 static char* kwlist[] = {
128 "ifh",
129 "ofh",
130 "read_size",
131 "write_size",
132 NULL
133 };
134
135 PyObject* source;
136 PyObject* dest;
137 size_t inSize = ZSTD_DStreamInSize();
138 size_t outSize = ZSTD_DStreamOutSize();
139 ZSTD_DStream* dstream;
140 ZSTD_inBuffer input;
141 ZSTD_outBuffer output;
142 Py_ssize_t totalRead = 0;
143 Py_ssize_t totalWrite = 0;
144 char* readBuffer;
145 Py_ssize_t readSize;
146 PyObject* readResult;
147 PyObject* res = NULL;
148 size_t zresult = 0;
149 PyObject* writeResult;
150 PyObject* totalReadPy;
151 PyObject* totalWritePy;
152
153 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|kk", kwlist, &source,
154 &dest, &inSize, &outSize)) {
155 return NULL;
156 }
157
158 if (!PyObject_HasAttrString(source, "read")) {
159 PyErr_SetString(PyExc_ValueError, "first argument must have a read() method");
160 return NULL;
161 }
162
163 if (!PyObject_HasAttrString(dest, "write")) {
164 PyErr_SetString(PyExc_ValueError, "second argument must have a write() method");
165 return NULL;
166 }
167
168 dstream = DStream_from_ZstdDecompressor(self);
169 if (!dstream) {
170 res = NULL;
171 goto finally;
172 }
173
174 output.dst = PyMem_Malloc(outSize);
175 if (!output.dst) {
176 PyErr_NoMemory();
177 res = NULL;
178 goto finally;
179 }
180 output.size = outSize;
181 output.pos = 0;
182
183 /* Read source stream until EOF */
184 while (1) {
185 readResult = PyObject_CallMethod(source, "read", "n", inSize);
186 if (!readResult) {
187 PyErr_SetString(ZstdError, "could not read() from source");
188 goto finally;
189 }
190
191 PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize);
192
193 /* If no data was read, we're at EOF. */
194 if (0 == readSize) {
195 break;
196 }
197
198 totalRead += readSize;
199
200 /* Send data to decompressor */
201 input.src = readBuffer;
202 input.size = readSize;
203 input.pos = 0;
204
205 while (input.pos < input.size) {
206 Py_BEGIN_ALLOW_THREADS
207 zresult = ZSTD_decompressStream(dstream, &output, &input);
208 Py_END_ALLOW_THREADS
209
210 if (ZSTD_isError(zresult)) {
211 PyErr_Format(ZstdError, "zstd decompressor error: %s",
212 ZSTD_getErrorName(zresult));
213 res = NULL;
214 goto finally;
215 }
216
217 if (output.pos) {
218 #if PY_MAJOR_VERSION >= 3
219 writeResult = PyObject_CallMethod(dest, "write", "y#",
220 #else
221 writeResult = PyObject_CallMethod(dest, "write", "s#",
222 #endif
223 output.dst, output.pos);
224
225 Py_XDECREF(writeResult);
226 totalWrite += output.pos;
227 output.pos = 0;
228 }
229 }
230 }
231
232 /* Source stream is exhausted. Finish up. */
233
234 ZSTD_freeDStream(dstream);
235 dstream = NULL;
236
237 totalReadPy = PyLong_FromSsize_t(totalRead);
238 totalWritePy = PyLong_FromSsize_t(totalWrite);
239 res = PyTuple_Pack(2, totalReadPy, totalWritePy);
240 Py_DecRef(totalReadPy);
241 Py_DecRef(totalWritePy);
242
243 finally:
244 if (output.dst) {
245 PyMem_Free(output.dst);
246 }
247
248 if (dstream) {
249 ZSTD_freeDStream(dstream);
250 }
251
252 return res;
253 }
254
255 PyDoc_STRVAR(Decompressor_decompress__doc__,
256 "decompress(data[, max_output_size=None]) -- Decompress data in its entirety\n"
257 "\n"
258 "This method will decompress the entirety of the argument and return the\n"
259 "result.\n"
260 "\n"
261 "The input bytes are expected to contain a full Zstandard frame (something\n"
262 "compressed with ``ZstdCompressor.compress()`` or similar). If the input does\n"
263 "not contain a full frame, an exception will be raised.\n"
264 "\n"
265 "If the frame header of the compressed data does not contain the content size\n"
266 "``max_output_size`` must be specified or ``ZstdError`` will be raised. An\n"
267 "allocation of size ``max_output_size`` will be performed and an attempt will\n"
268 "be made to perform decompression into that buffer. If the buffer is too\n"
269 "small or cannot be allocated, ``ZstdError`` will be raised. The buffer will\n"
270 "be resized if it is too large.\n"
271 "\n"
272 "Uncompressed data could be much larger than compressed data. As a result,\n"
273 "calling this function could result in a very large memory allocation being\n"
274 "performed to hold the uncompressed data. Therefore it is **highly**\n"
275 "recommended to use a streaming decompression method instead of this one.\n"
276 );
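A sketch of the two cases described above: frames that embed their content size decompress directly, while other frames need ``max_output_size``::

    import zstd

    compressed = zstd.ZstdCompressor(write_content_size=True).compress(b'data')
    dctx = zstd.ZstdDecompressor()

    # Content size is in the frame header, so no size hint is required.
    original = dctx.decompress(compressed)

    # Without a content size in the header, a cap must be supplied (value illustrative).
    original = dctx.decompress(compressed, max_output_size=1048576)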
277
278 PyObject* Decompressor_decompress(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
279 static char* kwlist[] = {
280 "data",
281 "max_output_size",
282 NULL
283 };
284
285 const char* source;
286 Py_ssize_t sourceSize;
287 Py_ssize_t maxOutputSize = 0;
288 unsigned long long decompressedSize;
289 size_t destCapacity;
290 PyObject* result = NULL;
291 ZSTD_DCtx* dctx = NULL;
292 void* dictData = NULL;
293 size_t dictSize = 0;
294 size_t zresult;
295
296 #if PY_MAJOR_VERSION >= 3
297 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y#|n", kwlist,
298 #else
299 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s#|n", kwlist,
300 #endif
301 &source, &sourceSize, &maxOutputSize)) {
302 return NULL;
303 }
304
305 dctx = PyMem_Malloc(ZSTD_sizeof_DCtx(self->refdctx));
306 if (!dctx) {
307 PyErr_NoMemory();
308 return NULL;
309 }
310
311 ZSTD_copyDCtx(dctx, self->refdctx);
312
313 if (self->dict) {
314 dictData = self->dict->dictData;
315 dictSize = self->dict->dictSize;
316 }
317
318 if (dictData && !self->ddict) {
319 Py_BEGIN_ALLOW_THREADS
320 self->ddict = ZSTD_createDDict(dictData, dictSize);
321 Py_END_ALLOW_THREADS
322
323 if (!self->ddict) {
324 PyErr_SetString(ZstdError, "could not create decompression dict");
325 goto except;
326 }
327 }
328
329 decompressedSize = ZSTD_getDecompressedSize(source, sourceSize);
330 /* 0 returned if content size not in the zstd frame header */
331 if (0 == decompressedSize) {
332 if (0 == maxOutputSize) {
333 PyErr_SetString(ZstdError, "input data invalid or missing content size "
334 "in frame header");
335 goto except;
336 }
337 else {
338 result = PyBytes_FromStringAndSize(NULL, maxOutputSize);
339 destCapacity = maxOutputSize;
340 }
341 }
342 else {
343 result = PyBytes_FromStringAndSize(NULL, decompressedSize);
344 destCapacity = decompressedSize;
345 }
346
347 if (!result) {
348 goto except;
349 }
350
351 Py_BEGIN_ALLOW_THREADS
352 if (self->ddict) {
353 zresult = ZSTD_decompress_usingDDict(dctx, PyBytes_AsString(result), destCapacity,
354 source, sourceSize, self->ddict);
355 }
356 else {
357 zresult = ZSTD_decompressDCtx(dctx, PyBytes_AsString(result), destCapacity, source, sourceSize);
358 }
359 Py_END_ALLOW_THREADS
360
361 if (ZSTD_isError(zresult)) {
362 PyErr_Format(ZstdError, "decompression error: %s", ZSTD_getErrorName(zresult));
363 goto except;
364 }
365 else if (decompressedSize && zresult != decompressedSize) {
366 PyErr_Format(ZstdError, "decompression error: decompressed %zu bytes; expected %llu",
367 zresult, decompressedSize);
368 goto except;
369 }
370 else if (zresult < destCapacity) {
371 if (_PyBytes_Resize(&result, zresult)) {
372 goto except;
373 }
374 }
375
376 goto finally;
377
378 except:
379 Py_DecRef(result);
380 result = NULL;
381
382 finally:
383 if (dctx) {
384 PyMem_FREE(dctx);
385 }
386
387 return result;
388 }
389
390 PyDoc_STRVAR(Decompressor_decompressobj__doc__,
391 "decompressobj()\n"
392 "\n"
393 "Incrementally feed data into a decompressor.\n"
394 "\n"
395 "The returned object exposes a ``decompress(data)`` method. This makes it\n"
396 "compatible with ``zlib.decompressobj`` and ``bz2.BZ2Decompressor`` so that\n"
397 "callers can swap in the zstd decompressor while using the same API.\n"
398 );
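A sketch of the ``decompressobj()`` pattern, mirroring ``zlib.decompressobj``::

    import zstd

    compressed = zstd.ZstdCompressor().compress(b'data')
    dobj = zstd.ZstdDecompressor().decompressobj()
    original = dobj.decompress(compressed)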
399
400 static ZstdDecompressionObj* Decompressor_decompressobj(ZstdDecompressor* self) {
401 ZstdDecompressionObj* result = PyObject_New(ZstdDecompressionObj, &ZstdDecompressionObjType);
402 if (!result) {
403 return NULL;
404 }
405
406 result->dstream = DStream_from_ZstdDecompressor(self);
407 if (!result->dstream) {
408 Py_DecRef((PyObject*)result);
409 return NULL;
410 }
411
412 result->decompressor = self;
413 Py_INCREF(result->decompressor);
414
415 result->finished = 0;
416
417 return result;
418 }
419
420 PyDoc_STRVAR(Decompressor_read_from__doc__,
421 "read_from(reader[, read_size=default, write_size=default, skip_bytes=0])\n"
422 "Read compressed data and return an iterator.\n"
423 "\n"
424 "Returns an iterator of decompressed data chunks produced from reading from\n"
425 "the ``reader``.\n"
426 "\n"
427 "Compressed data will be obtained from ``reader`` by calling its\n"
428 "``read(size)`` method. The source data will be streamed into a\n"
429 "decompressor. As decompressed data becomes available, it will be exposed to the\n"
430 "returned iterator.\n"
431 "\n"
432 "Data is ``read()`` in chunks of size ``read_size`` and exposed to the\n"
433 "iterator in chunks of size ``write_size``. The default values are the input\n"
434 "and output sizes for a zstd streaming decompressor.\n"
435 "\n"
436 "There is also support for skipping the first ``skip_bytes`` of data from\n"
437 "the source.\n"
438 );
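/*
 * Usage sketch for the iterator API documented above. File names are
 * placeholders; ``frame`` is assumed to hold a zstd frame as bytes:
 *
 *   import zstd
 *
 *   dctx = zstd.ZstdDecompressor()
 *   with open('input.zst', 'rb') as fh, open('output.bin', 'wb') as out:
 *       for chunk in dctx.read_from(fh):
 *           out.write(chunk)
 *
 *   # skip_bytes helps when the stream is prefixed by a small header:
 *   data = b''.join(dctx.read_from(b'hdr' + frame, skip_bytes=3))
 */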
439
440 static ZstdDecompressorIterator* Decompressor_read_from(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
441 static char* kwlist[] = {
442 "reader",
443 "read_size",
444 "write_size",
445 "skip_bytes",
446 NULL
447 };
448
449 PyObject* reader;
450 size_t inSize = ZSTD_DStreamInSize();
451 size_t outSize = ZSTD_DStreamOutSize();
452 ZstdDecompressorIterator* result;
453 size_t skipBytes = 0;
454
455 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kkk", kwlist, &reader,
456 &inSize, &outSize, &skipBytes)) {
457 return NULL;
458 }
459
460 if (skipBytes >= inSize) {
461 PyErr_SetString(PyExc_ValueError,
462 "skip_bytes must be smaller than read_size");
463 return NULL;
464 }
465
466 result = PyObject_New(ZstdDecompressorIterator, &ZstdDecompressorIteratorType);
467 if (!result) {
468 return NULL;
469 }
470
471 result->decompressor = NULL;
472 result->reader = NULL;
473 result->buffer = NULL;
474 result->dstream = NULL;
475 result->input.src = NULL;
476 result->output.dst = NULL;
477
478 if (PyObject_HasAttrString(reader, "read")) {
479 result->reader = reader;
480 Py_INCREF(result->reader);
481 }
482 else if (1 == PyObject_CheckBuffer(reader)) {
483 /* Object claims it is a buffer. Try to get a handle to it. */
484 result->buffer = PyMem_Malloc(sizeof(Py_buffer));
485 if (!result->buffer) {
486 goto except;
487 }
488
489 memset(result->buffer, 0, sizeof(Py_buffer));
490
491 if (0 != PyObject_GetBuffer(reader, result->buffer, PyBUF_CONTIG_RO)) {
492 goto except;
493 }
494
495 result->bufferOffset = 0;
496 }
497 else {
498 PyErr_SetString(PyExc_ValueError,
499 "must pass an object with a read() method or an object conforming to the buffer protocol");
500 goto except;
501 }
502
503 result->decompressor = self;
504 Py_INCREF(result->decompressor);
505
506 result->inSize = inSize;
507 result->outSize = outSize;
508 result->skipBytes = skipBytes;
509
510 result->dstream = DStream_from_ZstdDecompressor(self);
511 if (!result->dstream) {
512 goto except;
513 }
514
515 result->input.src = PyMem_Malloc(inSize);
516 if (!result->input.src) {
517 PyErr_NoMemory();
518 goto except;
519 }
520 result->input.size = 0;
521 result->input.pos = 0;
522
523 result->output.dst = NULL;
524 result->output.size = 0;
525 result->output.pos = 0;
526
527 result->readCount = 0;
528 result->finishedInput = 0;
529 result->finishedOutput = 0;
530
531 goto finally;
532
533 except:
534 if (result->reader) {
535 Py_DECREF(result->reader);
536 result->reader = NULL;
537 }
538
539 if (result->buffer) {
540 PyBuffer_Release(result->buffer);
541 Py_DECREF(result->buffer);
542 result->buffer = NULL;
543 }
544
545 Py_DECREF(result);
546 result = NULL;
547
548 finally:
549
550 return result;
551 }
552
553 PyDoc_STRVAR(Decompressor_write_to__doc__,
554 "Create a context manager to write decompressed data to an object.\n"
555 "\n"
556 "The passed object must have a ``write()`` method.\n"
557 "\n"
558 "The caller feeds input data to the object by calling ``write(data)``.\n"
559 "Decompressed data is written to the passed ``writer`` as it is produced.\n"
560 "\n"
561 "An optional ``write_size`` argument defines the size of chunks to\n"
562 "``write()`` to the writer. It defaults to the default output size for a zstd\n"
563 "streaming decompressor.\n"
564 );
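/*
 * Usage sketch for the context manager documented above. ``compressed``
 * is assumed to hold zstd-compressed bytes:
 *
 *   import io
 *   import zstd
 *
 *   buffer = io.BytesIO()
 *   dctx = zstd.ZstdDecompressor()
 *   with dctx.write_to(buffer) as decompressor:
 *       decompressor.write(compressed)
 *   data = buffer.getvalue()
 */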
565
566 static ZstdDecompressionWriter* Decompressor_write_to(ZstdDecompressor* self, PyObject* args, PyObject* kwargs) {
567 static char* kwlist[] = {
568 "writer",
569 "write_size",
570 NULL
571 };
572
573 PyObject* writer;
574 size_t outSize = ZSTD_DStreamOutSize();
575 ZstdDecompressionWriter* result;
576
577 if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|k", kwlist, &writer, &outSize)) {
578 return NULL;
579 }
580
581 if (!PyObject_HasAttrString(writer, "write")) {
582 PyErr_SetString(PyExc_ValueError, "must pass an object with a write() method");
583 return NULL;
584 }
585
586 result = PyObject_New(ZstdDecompressionWriter, &ZstdDecompressionWriterType);
587 if (!result) {
588 return NULL;
589 }
590
591 result->decompressor = self;
592 Py_INCREF(result->decompressor);
593
594 result->writer = writer;
595 Py_INCREF(result->writer);
596
597 result->outSize = outSize;
598
599 result->entered = 0;
600 result->dstream = NULL;
601
602 return result;
603 }
604
605 static PyMethodDef Decompressor_methods[] = {
606 { "copy_stream", (PyCFunction)Decompressor_copy_stream, METH_VARARGS | METH_KEYWORDS,
607 Decompressor_copy_stream__doc__ },
608 { "decompress", (PyCFunction)Decompressor_decompress, METH_VARARGS | METH_KEYWORDS,
609 Decompressor_decompress__doc__ },
610 { "decompressobj", (PyCFunction)Decompressor_decompressobj, METH_NOARGS,
611 Decompressor_decompressobj__doc__ },
612 { "read_from", (PyCFunction)Decompressor_read_from, METH_VARARGS | METH_KEYWORDS,
613 Decompressor_read_from__doc__ },
614 { "write_to", (PyCFunction)Decompressor_write_to, METH_VARARGS | METH_KEYWORDS,
615 Decompressor_write_to__doc__ },
616 { NULL, NULL }
617 };
618
619 PyTypeObject ZstdDecompressorType = {
620 PyVarObject_HEAD_INIT(NULL, 0)
621 "zstd.ZstdDecompressor", /* tp_name */
622 sizeof(ZstdDecompressor), /* tp_basicsize */
623 0, /* tp_itemsize */
624 (destructor)Decompressor_dealloc, /* tp_dealloc */
625 0, /* tp_print */
626 0, /* tp_getattr */
627 0, /* tp_setattr */
628 0, /* tp_compare */
629 0, /* tp_repr */
630 0, /* tp_as_number */
631 0, /* tp_as_sequence */
632 0, /* tp_as_mapping */
633 0, /* tp_hash */
634 0, /* tp_call */
635 0, /* tp_str */
636 0, /* tp_getattro */
637 0, /* tp_setattro */
638 0, /* tp_as_buffer */
639 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
640 Decompressor__doc__, /* tp_doc */
641 0, /* tp_traverse */
642 0, /* tp_clear */
643 0, /* tp_richcompare */
644 0, /* tp_weaklistoffset */
645 0, /* tp_iter */
646 0, /* tp_iternext */
647 Decompressor_methods, /* tp_methods */
648 0, /* tp_members */
649 0, /* tp_getset */
650 0, /* tp_base */
651 0, /* tp_dict */
652 0, /* tp_descr_get */
653 0, /* tp_descr_set */
654 0, /* tp_dictoffset */
655 (initproc)Decompressor_init, /* tp_init */
656 0, /* tp_alloc */
657 PyType_GenericNew, /* tp_new */
658 };
659
660 void decompressor_module_init(PyObject* mod) {
661 Py_TYPE(&ZstdDecompressorType) = &PyType_Type;
662 if (PyType_Ready(&ZstdDecompressorType) < 0) {
663 return;
664 }
665
666 Py_INCREF((PyObject*)&ZstdDecompressorType);
667 PyModule_AddObject(mod, "ZstdDecompressor",
668 (PyObject*)&ZstdDecompressorType);
669 }
@@ -0,0 +1,254 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 #define min(a, b) (((a) < (b)) ? (a) : (b))
12
13 extern PyObject* ZstdError;
14
15 PyDoc_STRVAR(ZstdDecompressorIterator__doc__,
16 "Represents an iterator of decompressed data.\n"
17 );
18
19 static void ZstdDecompressorIterator_dealloc(ZstdDecompressorIterator* self) {
20 Py_XDECREF(self->decompressor);
21 Py_XDECREF(self->reader);
22
23 if (self->buffer) {
24 PyBuffer_Release(self->buffer);
25 PyMem_FREE(self->buffer);
26 self->buffer = NULL;
27 }
28
29 if (self->dstream) {
30 ZSTD_freeDStream(self->dstream);
31 self->dstream = NULL;
32 }
33
34 if (self->input.src) {
35 PyMem_Free((void*)self->input.src);
36 self->input.src = NULL;
37 }
38
39 PyObject_Del(self);
40 }
41
42 static PyObject* ZstdDecompressorIterator_iter(PyObject* self) {
43 Py_INCREF(self);
44 return self;
45 }
46
47 static DecompressorIteratorResult read_decompressor_iterator(ZstdDecompressorIterator* self) {
48 size_t zresult;
49 PyObject* chunk;
50 DecompressorIteratorResult result;
51 size_t oldInputPos = self->input.pos;
52
53 result.chunk = NULL;
54
55 chunk = PyBytes_FromStringAndSize(NULL, self->outSize);
56 if (!chunk) {
57 result.errored = 1;
58 return result;
59 }
60
61 self->output.dst = PyBytes_AsString(chunk);
62 self->output.size = self->outSize;
63 self->output.pos = 0;
64
65 Py_BEGIN_ALLOW_THREADS
66 zresult = ZSTD_decompressStream(self->dstream, &self->output, &self->input);
67 Py_END_ALLOW_THREADS
68
69 /* We're done with the pointer. Nullify to prevent anyone from getting a
70 handle on a Python object. */
71 self->output.dst = NULL;
72
73 if (ZSTD_isError(zresult)) {
74 Py_DECREF(chunk);
75 PyErr_Format(ZstdError, "zstd decompress error: %s",
76 ZSTD_getErrorName(zresult));
77 result.errored = 1;
78 return result;
79 }
80
81 self->readCount += self->input.pos - oldInputPos;
82
83 /* Frame is fully decoded. Input exhausted and output sitting in buffer. */
84 if (0 == zresult) {
85 self->finishedInput = 1;
86 self->finishedOutput = 1;
87 }
88
89 /* If it produced output data, return it. */
90 if (self->output.pos) {
91 if (self->output.pos < self->outSize) {
92 if (_PyBytes_Resize(&chunk, self->output.pos)) {
93 result.errored = 1;
94 return result;
95 }
96 }
97 }
98 else {
99 Py_DECREF(chunk);
100 chunk = NULL;
101 }
102
103 result.errored = 0;
104 result.chunk = chunk;
105
106 return result;
107 }
108
109 static PyObject* ZstdDecompressorIterator_iternext(ZstdDecompressorIterator* self) {
110 PyObject* readResult = NULL;
111 char* readBuffer;
112 Py_ssize_t readSize;
113 Py_ssize_t bufferRemaining;
114 DecompressorIteratorResult result;
115
116 if (self->finishedOutput) {
117 PyErr_SetString(PyExc_StopIteration, "output flushed");
118 return NULL;
119 }
120
121 /* If we have data left in the input, consume it. */
122 if (self->input.pos < self->input.size) {
123 result = read_decompressor_iterator(self);
124 if (result.chunk || result.errored) {
125 return result.chunk;
126 }
127
128 /* Else fall through to get more data from input. */
129 }
130
131 read_from_source:
132
133 if (!self->finishedInput) {
134 if (self->reader) {
135 readResult = PyObject_CallMethod(self->reader, "read", "I", self->inSize);
136 if (!readResult) {
137 return NULL;
138 }
139
140 PyBytes_AsStringAndSize(readResult, &readBuffer, &readSize);
141 }
142 else {
143 assert(self->buffer && self->buffer->buf);
144
145 /* Only support contiguous C arrays for now */
146 assert(self->buffer->strides == NULL && self->buffer->suboffsets == NULL);
147 assert(self->buffer->itemsize == 1);
148
149 /* TODO avoid memcpy() below */
150 readBuffer = (char *)self->buffer->buf + self->bufferOffset;
151 bufferRemaining = self->buffer->len - self->bufferOffset;
152 readSize = min(bufferRemaining, (Py_ssize_t)self->inSize);
153 self->bufferOffset += readSize;
154 }
155
156 if (readSize) {
157 if (!self->readCount && self->skipBytes) {
158 assert(self->skipBytes < self->inSize);
159 if ((Py_ssize_t)self->skipBytes >= readSize) {
160 PyErr_SetString(PyExc_ValueError,
161 "skip_bytes larger than first input chunk; "
162 "this scenario is currently unsupported");
163 Py_DecRef(readResult);
164 return NULL;
165 }
166
167 readBuffer = readBuffer + self->skipBytes;
168 readSize -= self->skipBytes;
169 }
170
171 /* Copy input into previously allocated buffer because it can live longer
172 than a single function call and we don't want to keep a ref to a Python
173 object around. This could be changed... */
174 memcpy((void*)self->input.src, readBuffer, readSize);
175 self->input.size = readSize;
176 self->input.pos = 0;
177 }
178 /* No bytes on first read must mean an empty input stream. */
179 else if (!self->readCount) {
180 self->finishedInput = 1;
181 self->finishedOutput = 1;
182 Py_DecRef(readResult);
183 PyErr_SetString(PyExc_StopIteration, "empty input");
184 return NULL;
185 }
186 else {
187 self->finishedInput = 1;
188 }
189
190 /* We've copied the data into memory we manage. Discard the Python object. */
191 Py_DecRef(readResult);
192 }
193
194 result = read_decompressor_iterator(self);
195 if (result.errored || result.chunk) {
196 return result.chunk;
197 }
198
199 /* No new output data. Try again unless we know there is no more data. */
200 if (!self->finishedInput) {
201 goto read_from_source;
202 }
203
204 PyErr_SetString(PyExc_StopIteration, "input exhausted");
205 return NULL;
206 }
207
208 PyTypeObject ZstdDecompressorIteratorType = {
209 PyVarObject_HEAD_INIT(NULL, 0)
210 "zstd.ZstdDecompressorIterator", /* tp_name */
211 sizeof(ZstdDecompressorIterator), /* tp_basicsize */
212 0, /* tp_itemsize */
213 (destructor)ZstdDecompressorIterator_dealloc, /* tp_dealloc */
214 0, /* tp_print */
215 0, /* tp_getattr */
216 0, /* tp_setattr */
217 0, /* tp_compare */
218 0, /* tp_repr */
219 0, /* tp_as_number */
220 0, /* tp_as_sequence */
221 0, /* tp_as_mapping */
222 0, /* tp_hash */
223 0, /* tp_call */
224 0, /* tp_str */
225 0, /* tp_getattro */
226 0, /* tp_setattro */
227 0, /* tp_as_buffer */
228 Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
229 ZstdDecompressorIterator__doc__, /* tp_doc */
230 0, /* tp_traverse */
231 0, /* tp_clear */
232 0, /* tp_richcompare */
233 0, /* tp_weaklistoffset */
234 ZstdDecompressorIterator_iter, /* tp_iter */
235 (iternextfunc)ZstdDecompressorIterator_iternext, /* tp_iternext */
236 0, /* tp_methods */
237 0, /* tp_members */
238 0, /* tp_getset */
239 0, /* tp_base */
240 0, /* tp_dict */
241 0, /* tp_descr_get */
242 0, /* tp_descr_set */
243 0, /* tp_dictoffset */
244 0, /* tp_init */
245 0, /* tp_alloc */
246 PyType_GenericNew, /* tp_new */
247 };
248
249 void decompressoriterator_module_init(PyObject* mod) {
250 Py_TYPE(&ZstdDecompressorIteratorType) = &PyType_Type;
251 if (PyType_Ready(&ZstdDecompressorIteratorType) < 0) {
252 return;
253 }
254 }
@@ -0,0 +1,125 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #include "python-zstandard.h"
10
11 PyDoc_STRVAR(DictParameters__doc__,
12 "DictParameters: low-level control over dictionary generation");
13
14 static PyObject* DictParameters_new(PyTypeObject* subtype, PyObject* args, PyObject* kwargs) {
15 DictParametersObject* self;
16 unsigned selectivityLevel;
17 int compressionLevel;
18 unsigned notificationLevel;
19 unsigned dictID;
20
21 if (!PyArg_ParseTuple(args, "IiII", &selectivityLevel, &compressionLevel,
22 &notificationLevel, &dictID)) {
23 return NULL;
24 }
25
26 self = (DictParametersObject*)subtype->tp_alloc(subtype, 1);
27 if (!self) {
28 return NULL;
29 }
30
31 self->selectivityLevel = selectivityLevel;
32 self->compressionLevel = compressionLevel;
33 self->notificationLevel = notificationLevel;
34 self->dictID = dictID;
35
36 return (PyObject*)self;
37 }
38
39 static void DictParameters_dealloc(PyObject* self) {
40 PyObject_Del(self);
41 }
42
43 static Py_ssize_t DictParameters_length(PyObject* self) {
44 return 4;
45 }
46
47 static PyObject* DictParameters_item(PyObject* o, Py_ssize_t i) {
48 DictParametersObject* self = (DictParametersObject*)o;
49
50 switch (i) {
51 case 0:
52 return PyLong_FromLong(self->selectivityLevel);
53 case 1:
54 return PyLong_FromLong(self->compressionLevel);
55 case 2:
56 return PyLong_FromLong(self->notificationLevel);
57 case 3:
58 return PyLong_FromLong(self->dictID);
59 default:
60 PyErr_SetString(PyExc_IndexError, "index out of range");
61 return NULL;
62 }
63 }
64
65 static PySequenceMethods DictParameters_sq = {
66 DictParameters_length, /* sq_length */
67 0, /* sq_concat */
68 0, /* sq_repeat */
69 DictParameters_item, /* sq_item */
70 0, /* sq_ass_item */
71 0, /* sq_contains */
72 0, /* sq_inplace_concat */
73 0 /* sq_inplace_repeat */
74 };
75
76 PyTypeObject DictParametersType = {
77 PyVarObject_HEAD_INIT(NULL, 0)
78 "DictParameters", /* tp_name */
79 sizeof(DictParametersObject), /* tp_basicsize */
80 0, /* tp_itemsize */
81 (destructor)DictParameters_dealloc, /* tp_dealloc */
82 0, /* tp_print */
83 0, /* tp_getattr */
84 0, /* tp_setattr */
85 0, /* tp_compare */
86 0, /* tp_repr */
87 0, /* tp_as_number */
88 &DictParameters_sq, /* tp_as_sequence */
89 0, /* tp_as_mapping */
90 0, /* tp_hash */
91 0, /* tp_call */
92 0, /* tp_str */
93 0, /* tp_getattro */
94 0, /* tp_setattro */
95 0, /* tp_as_buffer */
96 Py_TPFLAGS_DEFAULT, /* tp_flags */
97 DictParameters__doc__, /* tp_doc */
98 0, /* tp_traverse */
99 0, /* tp_clear */
100 0, /* tp_richcompare */
101 0, /* tp_weaklistoffset */
102 0, /* tp_iter */
103 0, /* tp_iternext */
104 0, /* tp_methods */
105 0, /* tp_members */
106 0, /* tp_getset */
107 0, /* tp_base */
108 0, /* tp_dict */
109 0, /* tp_descr_get */
110 0, /* tp_descr_set */
111 0, /* tp_dictoffset */
112 0, /* tp_init */
113 0, /* tp_alloc */
114 DictParameters_new, /* tp_new */
115 };
116
117 void dictparams_module_init(PyObject* mod) {
118 Py_TYPE(&DictParametersType) = &PyType_Type;
119 if (PyType_Ready(&DictParametersType) < 0) {
120 return;
121 }
122
123 Py_IncRef((PyObject*)&DictParametersType);
124 PyModule_AddObject(mod, "DictParameters", (PyObject*)&DictParametersType);
125 }
@@ -0,0 +1,172 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 #define PY_SSIZE_T_CLEAN
10 #include <Python.h>
11
12 #define ZSTD_STATIC_LINKING_ONLY
13 #define ZDICT_STATIC_LINKING_ONLY
14 #include "mem.h"
15 #include "zstd.h"
16 #include "zdict.h"
17
18 #define PYTHON_ZSTANDARD_VERSION "0.5.0"
19
20 typedef struct {
21 PyObject_HEAD
22 unsigned windowLog;
23 unsigned chainLog;
24 unsigned hashLog;
25 unsigned searchLog;
26 unsigned searchLength;
27 unsigned targetLength;
28 ZSTD_strategy strategy;
29 } CompressionParametersObject;
30
31 extern PyTypeObject CompressionParametersType;
32
33 typedef struct {
34 PyObject_HEAD
35 unsigned selectivityLevel;
36 int compressionLevel;
37 unsigned notificationLevel;
38 unsigned dictID;
39 } DictParametersObject;
40
41 extern PyTypeObject DictParametersType;
42
43 typedef struct {
44 PyObject_HEAD
45
46 void* dictData;
47 size_t dictSize;
48 } ZstdCompressionDict;
49
50 extern PyTypeObject ZstdCompressionDictType;
51
52 typedef struct {
53 PyObject_HEAD
54
55 int compressionLevel;
56 ZstdCompressionDict* dict;
57 ZSTD_CDict* cdict;
58 CompressionParametersObject* cparams;
59 ZSTD_frameParameters fparams;
60 } ZstdCompressor;
61
62 extern PyTypeObject ZstdCompressorType;
63
64 typedef struct {
65 PyObject_HEAD
66
67 ZstdCompressor* compressor;
68 ZSTD_CStream* cstream;
69 ZSTD_outBuffer output;
70 int flushed;
71 } ZstdCompressionObj;
72
73 extern PyTypeObject ZstdCompressionObjType;
74
75 typedef struct {
76 PyObject_HEAD
77
78 ZstdCompressor* compressor;
79 PyObject* writer;
80 Py_ssize_t sourceSize;
81 size_t outSize;
82 ZSTD_CStream* cstream;
83 int entered;
84 } ZstdCompressionWriter;
85
86 extern PyTypeObject ZstdCompressionWriterType;
87
88 typedef struct {
89 PyObject_HEAD
90
91 ZstdCompressor* compressor;
92 PyObject* reader;
93 Py_buffer* buffer;
94 Py_ssize_t bufferOffset;
95 Py_ssize_t sourceSize;
96 size_t inSize;
97 size_t outSize;
98
99 ZSTD_CStream* cstream;
100 ZSTD_inBuffer input;
101 ZSTD_outBuffer output;
102 int finishedOutput;
103 int finishedInput;
104 PyObject* readResult;
105 } ZstdCompressorIterator;
106
107 extern PyTypeObject ZstdCompressorIteratorType;
108
109 typedef struct {
110 PyObject_HEAD
111
112 ZSTD_DCtx* refdctx;
113
114 ZstdCompressionDict* dict;
115 ZSTD_DDict* ddict;
116 } ZstdDecompressor;
117
118 extern PyTypeObject ZstdDecompressorType;
119
120 typedef struct {
121 PyObject_HEAD
122
123 ZstdDecompressor* decompressor;
124 ZSTD_DStream* dstream;
125 int finished;
126 } ZstdDecompressionObj;
127
128 extern PyTypeObject ZstdDecompressionObjType;
129
130 typedef struct {
131 PyObject_HEAD
132
133 ZstdDecompressor* decompressor;
134 PyObject* writer;
135 size_t outSize;
136 ZSTD_DStream* dstream;
137 int entered;
138 } ZstdDecompressionWriter;
139
140 extern PyTypeObject ZstdDecompressionWriterType;
141
142 typedef struct {
143 PyObject_HEAD
144
145 ZstdDecompressor* decompressor;
146 PyObject* reader;
147 Py_buffer* buffer;
148 Py_ssize_t bufferOffset;
149 size_t inSize;
150 size_t outSize;
151 size_t skipBytes;
152 ZSTD_DStream* dstream;
153 ZSTD_inBuffer input;
154 ZSTD_outBuffer output;
155 Py_ssize_t readCount;
156 int finishedInput;
157 int finishedOutput;
158 } ZstdDecompressorIterator;
159
160 extern PyTypeObject ZstdDecompressorIteratorType;
161
162 typedef struct {
163 int errored;
164 PyObject* chunk;
165 } DecompressorIteratorResult;
166
167 void ztopy_compression_parameters(CompressionParametersObject* params, ZSTD_compressionParameters* zparams);
168 CompressionParametersObject* get_compression_parameters(PyObject* self, PyObject* args);
169 PyObject* estimate_compression_context_size(PyObject* self, PyObject* args);
170 ZSTD_CStream* CStream_from_ZstdCompressor(ZstdCompressor* compressor, Py_ssize_t sourceSize);
171 ZSTD_DStream* DStream_from_ZstdDecompressor(ZstdDecompressor* decompressor);
172 ZstdCompressionDict* train_dictionary(PyObject* self, PyObject* args, PyObject* kwargs);
@@ -0,0 +1,110 b''
1 # Copyright (c) 2016-present, Gregory Szorc
2 # All rights reserved.
3 #
4 # This software may be modified and distributed under the terms
5 # of the BSD license. See the LICENSE file for details.
6
7 from __future__ import absolute_import
8
9 import cffi
10 import os
11
12
13 HERE = os.path.abspath(os.path.dirname(__file__))
14
15 SOURCES = ['zstd/%s' % p for p in (
16 'common/entropy_common.c',
17 'common/error_private.c',
18 'common/fse_decompress.c',
19 'common/xxhash.c',
20 'common/zstd_common.c',
21 'compress/fse_compress.c',
22 'compress/huf_compress.c',
23 'compress/zbuff_compress.c',
24 'compress/zstd_compress.c',
25 'decompress/huf_decompress.c',
26 'decompress/zbuff_decompress.c',
27 'decompress/zstd_decompress.c',
28 'dictBuilder/divsufsort.c',
29 'dictBuilder/zdict.c',
30 )]
31
32 INCLUDE_DIRS = [os.path.join(HERE, d) for d in (
33 'zstd',
34 'zstd/common',
35 'zstd/compress',
36 'zstd/decompress',
37 'zstd/dictBuilder',
38 )]
39
40 with open(os.path.join(HERE, 'zstd', 'zstd.h'), 'rb') as fh:
41 zstd_h = fh.read()
42
43 ffi = cffi.FFI()
44 ffi.set_source('_zstd_cffi', '''
45 /* needed for typedefs like U32 references in zstd.h */
46 #include "mem.h"
47 #define ZSTD_STATIC_LINKING_ONLY
48 #include "zstd.h"
49 ''',
50 sources=SOURCES, include_dirs=INCLUDE_DIRS)
51
52 # Rather than duplicating the API definitions from zstd.h inline, munge the
53 # header source into a form that cdef() will accept.
54 lines = zstd_h.splitlines()
55 lines = [l.rstrip() for l in lines if l.strip()]
56
57 # Strip preprocessor directives - they aren't important for our needs.
58 lines = [l for l in lines
59 if not l.startswith((b'#if', b'#else', b'#endif', b'#include'))]
60
61 # Remove extern C block
62 lines = [l for l in lines if l not in (b'extern "C" {', b'}')]
63
64 # The version #defines don't parse and aren't necessary. Strip them.
65 lines = [l for l in lines if not l.startswith((
66 b'#define ZSTD_H_235446',
67 b'#define ZSTD_LIB_VERSION',
68 b'#define ZSTD_QUOTE',
69 b'#define ZSTD_EXPAND_AND_QUOTE',
70 b'#define ZSTD_VERSION_STRING',
71 b'#define ZSTD_VERSION_NUMBER'))]
72
73 # The C parser also doesn't like some constant defines referencing
74 # other constants.
75 # TODO we pick the 64-bit constants here. We should assert somewhere
76 # we're compiling for 64-bit.
77 def fix_constants(l):
78 if l.startswith(b'#define ZSTD_WINDOWLOG_MAX '):
79 return b'#define ZSTD_WINDOWLOG_MAX 27'
80 elif l.startswith(b'#define ZSTD_CHAINLOG_MAX '):
81 return b'#define ZSTD_CHAINLOG_MAX 28'
82 elif l.startswith(b'#define ZSTD_HASHLOG_MAX '):
83 return b'#define ZSTD_HASHLOG_MAX 27'
84 elif l.startswith(b'#define ZSTD_CHAINLOG_MAX '):
85 return b'#define ZSTD_CHAINLOG_MAX 28'
86 elif l.startswith(b'#define ZSTD_CHAINLOG_MIN '):
87 return b'#define ZSTD_CHAINLOG_MIN 6'
88 elif l.startswith(b'#define ZSTD_SEARCHLOG_MAX '):
89 return b'#define ZSTD_SEARCHLOG_MAX 26'
90 elif l.startswith(b'#define ZSTD_BLOCKSIZE_ABSOLUTEMAX '):
91 return b'#define ZSTD_BLOCKSIZE_ABSOLUTEMAX 131072'
92 else:
93 return l
94 lines = map(fix_constants, lines)
95
96 # ZSTDLIB_API isn't handled correctly. Strip it.
97 lines = [l for l in lines if not l.startswith(b'# define ZSTDLIB_API')]
98 def strip_api(l):
99 if l.startswith(b'ZSTDLIB_API '):
100 return l[len(b'ZSTDLIB_API '):]
101 else:
102 return l
103 lines = map(strip_api, lines)
104
105 source = b'\n'.join(lines)
106 ffi.cdef(source.decode('latin1'))
107
108
109 if __name__ == '__main__':
110 ffi.compile()
@@ -0,0 +1,62 b''
1 #!/usr/bin/env python
2 # Copyright (c) 2016-present, Gregory Szorc
3 # All rights reserved.
4 #
5 # This software may be modified and distributed under the terms
6 # of the BSD license. See the LICENSE file for details.
7
8 from setuptools import setup
9
10 try:
11 import cffi
12 except ImportError:
13 cffi = None
14
15 import setup_zstd
16
17 # Code for obtaining the Extension instance is in its own module to
18 # facilitate reuse in other projects.
19 extensions = [setup_zstd.get_c_extension()]
20
21 if cffi:
22 import make_cffi
23 extensions.append(make_cffi.ffi.distutils_extension())
24
25 version = None
26
27 with open('c-ext/python-zstandard.h', 'r') as fh:
28 for line in fh:
29 if not line.startswith('#define PYTHON_ZSTANDARD_VERSION'):
30 continue
31
32 version = line.split()[2][1:-1]
33 break
34
35 if not version:
36 raise Exception('could not resolve package version; '
37 'this should never happen')
38
39 setup(
40 name='zstandard',
41 version=version,
42 description='Zstandard bindings for Python',
43 long_description=open('README.rst', 'r').read(),
44 url='https://github.com/indygreg/python-zstandard',
45 author='Gregory Szorc',
46 author_email='gregory.szorc@gmail.com',
47 license='BSD',
48 classifiers=[
49 'Development Status :: 4 - Beta',
50 'Intended Audience :: Developers',
51 'License :: OSI Approved :: BSD License',
52 'Programming Language :: C',
53 'Programming Language :: Python :: 2.6',
54 'Programming Language :: Python :: 2.7',
55 'Programming Language :: Python :: 3.3',
56 'Programming Language :: Python :: 3.4',
57 'Programming Language :: Python :: 3.5',
58 ],
59 keywords='zstandard zstd compression',
60 ext_modules=extensions,
61 test_suite='tests',
62 )
@@ -0,0 +1,64 b''
1 # Copyright (c) 2016-present, Gregory Szorc
2 # All rights reserved.
3 #
4 # This software may be modified and distributed under the terms
5 # of the BSD license. See the LICENSE file for details.
6
7 import os
8 from distutils.extension import Extension
9
10
11 zstd_sources = ['zstd/%s' % p for p in (
12 'common/entropy_common.c',
13 'common/error_private.c',
14 'common/fse_decompress.c',
15 'common/xxhash.c',
16 'common/zstd_common.c',
17 'compress/fse_compress.c',
18 'compress/huf_compress.c',
19 'compress/zbuff_compress.c',
20 'compress/zstd_compress.c',
21 'decompress/huf_decompress.c',
22 'decompress/zbuff_decompress.c',
23 'decompress/zstd_decompress.c',
24 'dictBuilder/divsufsort.c',
25 'dictBuilder/zdict.c',
26 )]
27
28
29 zstd_includes = [
30 'c-ext',
31 'zstd',
32 'zstd/common',
33 'zstd/compress',
34 'zstd/decompress',
35 'zstd/dictBuilder',
36 ]
37
38 ext_sources = [
39 'zstd.c',
40 'c-ext/compressiondict.c',
41 'c-ext/compressobj.c',
42 'c-ext/compressor.c',
43 'c-ext/compressoriterator.c',
44 'c-ext/compressionparams.c',
45 'c-ext/compressionwriter.c',
46 'c-ext/constants.c',
47 'c-ext/decompressobj.c',
48 'c-ext/decompressor.c',
49 'c-ext/decompressoriterator.c',
50 'c-ext/decompressionwriter.c',
51 'c-ext/dictparams.c',
52 ]
53
54
55 def get_c_extension(name='zstd'):
56 """Obtain a distutils.extension.Extension for the C extension."""
57 root = os.path.abspath(os.path.dirname(__file__))
58
59 sources = [os.path.join(root, p) for p in zstd_sources + ext_sources]
60 include_dirs = [os.path.join(root, d) for d in zstd_includes]
61
62 # TODO compile with optimizations.
63 return Extension(name, sources,
64 include_dirs=include_dirs)
1 NO CONTENT: new file 100644
NO CONTENT: new file 100644
@@ -0,0 +1,15 b''
1 import io
2
3 class OpCountingBytesIO(io.BytesIO):
4 def __init__(self, *args, **kwargs):
5 self._read_count = 0
6 self._write_count = 0
7 return super(OpCountingBytesIO, self).__init__(*args, **kwargs)
8
9 def read(self, *args):
10 self._read_count += 1
11 return super(OpCountingBytesIO, self).read(*args)
12
13 def write(self, data):
14 self._write_count += 1
15 return super(OpCountingBytesIO, self).write(data)
@@ -0,0 +1,35 b''
1 import io
2
3 try:
4 import unittest2 as unittest
5 except ImportError:
6 import unittest
7
8 import zstd
9
10 try:
11 import zstd_cffi
12 except ImportError:
13 raise unittest.SkipTest('cffi version of zstd not available')
14
15
16 class TestCFFIWriteToToCDecompressor(unittest.TestCase):
17 def test_simple(self):
18 orig = io.BytesIO()
19 orig.write(b'foo')
20 orig.write(b'bar')
21 orig.write(b'foobar' * 16384)
22
23 dest = io.BytesIO()
24 cctx = zstd_cffi.ZstdCompressor()
25 with cctx.write_to(dest) as compressor:
26 compressor.write(orig.getvalue())
27
28 uncompressed = io.BytesIO()
29 dctx = zstd.ZstdDecompressor()
30 with dctx.write_to(uncompressed) as decompressor:
31 decompressor.write(dest.getvalue())
32
33 self.assertEqual(uncompressed.getvalue(), orig.getvalue())
34
35
@@ -0,0 +1,465 b''
1 import hashlib
2 import io
3 import struct
4 import sys
5
6 try:
7 import unittest2 as unittest
8 except ImportError:
9 import unittest
10
11 import zstd
12
13 from .common import OpCountingBytesIO
14
15
16 if sys.version_info[0] >= 3:
17 next = lambda it: it.__next__()
18 else:
19 next = lambda it: it.next()
20
21
22 class TestCompressor(unittest.TestCase):
23 def test_level_bounds(self):
24 with self.assertRaises(ValueError):
25 zstd.ZstdCompressor(level=0)
26
27 with self.assertRaises(ValueError):
28 zstd.ZstdCompressor(level=23)
29
30
31 class TestCompressor_compress(unittest.TestCase):
32 def test_compress_empty(self):
33 cctx = zstd.ZstdCompressor(level=1)
34 cctx.compress(b'')
35
36 cctx = zstd.ZstdCompressor(level=22)
37 cctx.compress(b'')
38
39 def test_compress_empty_frame(self):
40 cctx = zstd.ZstdCompressor(level=1)
41 self.assertEqual(cctx.compress(b''),
42 b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
43
44 def test_compress_large(self):
45 chunks = []
46 for i in range(255):
47 chunks.append(struct.Struct('>B').pack(i) * 16384)
48
49 cctx = zstd.ZstdCompressor(level=3)
50 result = cctx.compress(b''.join(chunks))
51 self.assertEqual(len(result), 999)
52 self.assertEqual(result[0:4], b'\x28\xb5\x2f\xfd')
53
54 def test_write_checksum(self):
55 cctx = zstd.ZstdCompressor(level=1)
56 no_checksum = cctx.compress(b'foobar')
57 cctx = zstd.ZstdCompressor(level=1, write_checksum=True)
58 with_checksum = cctx.compress(b'foobar')
59
60 self.assertEqual(len(with_checksum), len(no_checksum) + 4)
61
62 def test_write_content_size(self):
63 cctx = zstd.ZstdCompressor(level=1)
64 no_size = cctx.compress(b'foobar' * 256)
65 cctx = zstd.ZstdCompressor(level=1, write_content_size=True)
66 with_size = cctx.compress(b'foobar' * 256)
67
68 self.assertEqual(len(with_size), len(no_size) + 1)
69
70 def test_no_dict_id(self):
71 samples = []
72 for i in range(128):
73 samples.append(b'foo' * 64)
74 samples.append(b'bar' * 64)
75 samples.append(b'foobar' * 64)
76
77 d = zstd.train_dictionary(1024, samples)
78
79 cctx = zstd.ZstdCompressor(level=1, dict_data=d)
80 with_dict_id = cctx.compress(b'foobarfoobar')
81
82 cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_dict_id=False)
83 no_dict_id = cctx.compress(b'foobarfoobar')
84
85 self.assertEqual(len(with_dict_id), len(no_dict_id) + 4)
86
87 def test_compress_dict_multiple(self):
88 samples = []
89 for i in range(128):
90 samples.append(b'foo' * 64)
91 samples.append(b'bar' * 64)
92 samples.append(b'foobar' * 64)
93
94 d = zstd.train_dictionary(8192, samples)
95
96 cctx = zstd.ZstdCompressor(level=1, dict_data=d)
97
98 for i in range(32):
99 cctx.compress(b'foo bar foobar foo bar foobar')
100
101
102 class TestCompressor_compressobj(unittest.TestCase):
103 def test_compressobj_empty(self):
104 cctx = zstd.ZstdCompressor(level=1)
105 cobj = cctx.compressobj()
106 self.assertEqual(cobj.compress(b''), b'')
107 self.assertEqual(cobj.flush(),
108 b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
109
110 def test_compressobj_large(self):
111 chunks = []
112 for i in range(255):
113 chunks.append(struct.Struct('>B').pack(i) * 16384)
114
115 cctx = zstd.ZstdCompressor(level=3)
116 cobj = cctx.compressobj()
117
118 result = cobj.compress(b''.join(chunks)) + cobj.flush()
119 self.assertEqual(len(result), 999)
120 self.assertEqual(result[0:4], b'\x28\xb5\x2f\xfd')
121
122 def test_write_checksum(self):
123 cctx = zstd.ZstdCompressor(level=1)
124 cobj = cctx.compressobj()
125 no_checksum = cobj.compress(b'foobar') + cobj.flush()
126 cctx = zstd.ZstdCompressor(level=1, write_checksum=True)
127 cobj = cctx.compressobj()
128 with_checksum = cobj.compress(b'foobar') + cobj.flush()
129
130 self.assertEqual(len(with_checksum), len(no_checksum) + 4)
131
132 def test_write_content_size(self):
133 cctx = zstd.ZstdCompressor(level=1)
134 cobj = cctx.compressobj(size=len(b'foobar' * 256))
135 no_size = cobj.compress(b'foobar' * 256) + cobj.flush()
136 cctx = zstd.ZstdCompressor(level=1, write_content_size=True)
137 cobj = cctx.compressobj(size=len(b'foobar' * 256))
138 with_size = cobj.compress(b'foobar' * 256) + cobj.flush()
139
140 self.assertEqual(len(with_size), len(no_size) + 1)
141
142 def test_compress_after_flush(self):
143 cctx = zstd.ZstdCompressor()
144 cobj = cctx.compressobj()
145
146 cobj.compress(b'foo')
147 cobj.flush()
148
149 with self.assertRaisesRegexp(zstd.ZstdError, 'cannot call compress\(\) after flush'):
150 cobj.compress(b'foo')
151
152 with self.assertRaisesRegexp(zstd.ZstdError, 'flush\(\) already called'):
153 cobj.flush()
154
155
156 class TestCompressor_copy_stream(unittest.TestCase):
157 def test_no_read(self):
158 source = object()
159 dest = io.BytesIO()
160
161 cctx = zstd.ZstdCompressor()
162 with self.assertRaises(ValueError):
163 cctx.copy_stream(source, dest)
164
165 def test_no_write(self):
166 source = io.BytesIO()
167 dest = object()
168
169 cctx = zstd.ZstdCompressor()
170 with self.assertRaises(ValueError):
171 cctx.copy_stream(source, dest)
172
173 def test_empty(self):
174 source = io.BytesIO()
175 dest = io.BytesIO()
176
177 cctx = zstd.ZstdCompressor(level=1)
178 r, w = cctx.copy_stream(source, dest)
179 self.assertEqual(int(r), 0)
180 self.assertEqual(w, 9)
181
182 self.assertEqual(dest.getvalue(),
183 b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
184
185 def test_large_data(self):
186 source = io.BytesIO()
187 for i in range(255):
188 source.write(struct.Struct('>B').pack(i) * 16384)
189 source.seek(0)
190
191 dest = io.BytesIO()
192 cctx = zstd.ZstdCompressor()
193 r, w = cctx.copy_stream(source, dest)
194
195 self.assertEqual(r, 255 * 16384)
196 self.assertEqual(w, 999)
197
198 def test_write_checksum(self):
199 source = io.BytesIO(b'foobar')
200 no_checksum = io.BytesIO()
201
202 cctx = zstd.ZstdCompressor(level=1)
203 cctx.copy_stream(source, no_checksum)
204
205 source.seek(0)
206 with_checksum = io.BytesIO()
207 cctx = zstd.ZstdCompressor(level=1, write_checksum=True)
208 cctx.copy_stream(source, with_checksum)
209
210 self.assertEqual(len(with_checksum.getvalue()),
211 len(no_checksum.getvalue()) + 4)
212
213 def test_write_content_size(self):
214 source = io.BytesIO(b'foobar' * 256)
215 no_size = io.BytesIO()
216
217 cctx = zstd.ZstdCompressor(level=1)
218 cctx.copy_stream(source, no_size)
219
220 source.seek(0)
221 with_size = io.BytesIO()
222 cctx = zstd.ZstdCompressor(level=1, write_content_size=True)
223 cctx.copy_stream(source, with_size)
224
225 # Source content size is unknown, so no content size written.
226 self.assertEqual(len(with_size.getvalue()),
227 len(no_size.getvalue()))
228
229 source.seek(0)
230 with_size = io.BytesIO()
231 cctx.copy_stream(source, with_size, size=len(source.getvalue()))
232
233 # We specified source size, so content size header is present.
234 self.assertEqual(len(with_size.getvalue()),
235 len(no_size.getvalue()) + 1)
236
237 def test_read_write_size(self):
238 source = OpCountingBytesIO(b'foobarfoobar')
239 dest = OpCountingBytesIO()
240 cctx = zstd.ZstdCompressor()
241 r, w = cctx.copy_stream(source, dest, read_size=1, write_size=1)
242
243 self.assertEqual(r, len(source.getvalue()))
244 self.assertEqual(w, 21)
245 self.assertEqual(source._read_count, len(source.getvalue()) + 1)
246 self.assertEqual(dest._write_count, len(dest.getvalue()))
247
248
249 def compress(data, level):
250 buffer = io.BytesIO()
251 cctx = zstd.ZstdCompressor(level=level)
252 with cctx.write_to(buffer) as compressor:
253 compressor.write(data)
254 return buffer.getvalue()
255
256
257 class TestCompressor_write_to(unittest.TestCase):
258 def test_empty(self):
259 self.assertEqual(compress(b'', 1),
260 b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
261
262 def test_multiple_compress(self):
263 buffer = io.BytesIO()
264 cctx = zstd.ZstdCompressor(level=5)
265 with cctx.write_to(buffer) as compressor:
266 compressor.write(b'foo')
267 compressor.write(b'bar')
268 compressor.write(b'x' * 8192)
269
270 result = buffer.getvalue()
271 self.assertEqual(result,
272 b'\x28\xb5\x2f\xfd\x00\x50\x75\x00\x00\x38\x66\x6f'
273 b'\x6f\x62\x61\x72\x78\x01\x00\xfc\xdf\x03\x23')
274
275 def test_dictionary(self):
276 samples = []
277 for i in range(128):
278 samples.append(b'foo' * 64)
279 samples.append(b'bar' * 64)
280 samples.append(b'foobar' * 64)
281
282 d = zstd.train_dictionary(8192, samples)
283
284 buffer = io.BytesIO()
285 cctx = zstd.ZstdCompressor(level=9, dict_data=d)
286 with cctx.write_to(buffer) as compressor:
287 compressor.write(b'foo')
288 compressor.write(b'bar')
289 compressor.write(b'foo' * 16384)
290
291 compressed = buffer.getvalue()
292 h = hashlib.sha1(compressed).hexdigest()
293 self.assertEqual(h, '1c5bcd25181bcd8c1a73ea8773323e0056129f92')
294
295 def test_compression_params(self):
296 params = zstd.CompressionParameters(20, 6, 12, 5, 4, 10, zstd.STRATEGY_FAST)
297
298 buffer = io.BytesIO()
299 cctx = zstd.ZstdCompressor(compression_params=params)
300 with cctx.write_to(buffer) as compressor:
301 compressor.write(b'foo')
302 compressor.write(b'bar')
303 compressor.write(b'foobar' * 16384)
304
305 compressed = buffer.getvalue()
306 h = hashlib.sha1(compressed).hexdigest()
307 self.assertEqual(h, '1ae31f270ed7de14235221a604b31ecd517ebd99')
308
309 def test_write_checksum(self):
310 no_checksum = io.BytesIO()
311 cctx = zstd.ZstdCompressor(level=1)
312 with cctx.write_to(no_checksum) as compressor:
313 compressor.write(b'foobar')
314
315 with_checksum = io.BytesIO()
316 cctx = zstd.ZstdCompressor(level=1, write_checksum=True)
317 with cctx.write_to(with_checksum) as compressor:
318 compressor.write(b'foobar')
319
320 self.assertEqual(len(with_checksum.getvalue()),
321 len(no_checksum.getvalue()) + 4)
322
323 def test_write_content_size(self):
324 no_size = io.BytesIO()
325 cctx = zstd.ZstdCompressor(level=1)
326 with cctx.write_to(no_size) as compressor:
327 compressor.write(b'foobar' * 256)
328
329 with_size = io.BytesIO()
330 cctx = zstd.ZstdCompressor(level=1, write_content_size=True)
331 with cctx.write_to(with_size) as compressor:
332 compressor.write(b'foobar' * 256)
333
334 # Source size is not known in streaming mode, so header not
335 # written.
336 self.assertEqual(len(with_size.getvalue()),
337 len(no_size.getvalue()))
338
339 # Declaring size will write the header.
340 with_size = io.BytesIO()
341 with cctx.write_to(with_size, size=len(b'foobar' * 256)) as compressor:
342 compressor.write(b'foobar' * 256)
343
344 self.assertEqual(len(with_size.getvalue()),
345 len(no_size.getvalue()) + 1)
346
347 def test_no_dict_id(self):
348 samples = []
349 for i in range(128):
350 samples.append(b'foo' * 64)
351 samples.append(b'bar' * 64)
352 samples.append(b'foobar' * 64)
353
354 d = zstd.train_dictionary(1024, samples)
355
356 with_dict_id = io.BytesIO()
357 cctx = zstd.ZstdCompressor(level=1, dict_data=d)
358 with cctx.write_to(with_dict_id) as compressor:
359 compressor.write(b'foobarfoobar')
360
361 cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_dict_id=False)
362 no_dict_id = io.BytesIO()
363 with cctx.write_to(no_dict_id) as compressor:
364 compressor.write(b'foobarfoobar')
365
366 self.assertEqual(len(with_dict_id.getvalue()),
367 len(no_dict_id.getvalue()) + 4)
368
369 def test_memory_size(self):
370 cctx = zstd.ZstdCompressor(level=3)
371 buffer = io.BytesIO()
372 with cctx.write_to(buffer) as compressor:
373 size = compressor.memory_size()
374
375 self.assertGreater(size, 100000)
376
377 def test_write_size(self):
378 cctx = zstd.ZstdCompressor(level=3)
379 dest = OpCountingBytesIO()
380 with cctx.write_to(dest, write_size=1) as compressor:
381 compressor.write(b'foo')
382 compressor.write(b'bar')
383 compressor.write(b'foobar')
384
385 self.assertEqual(len(dest.getvalue()), dest._write_count)
386
387
388 class TestCompressor_read_from(unittest.TestCase):
389 def test_type_validation(self):
390 cctx = zstd.ZstdCompressor()
391
392 # Object with read() works.
393 cctx.read_from(io.BytesIO())
394
395 # Buffer protocol works.
396 cctx.read_from(b'foobar')
397
398 with self.assertRaisesRegexp(ValueError, 'must pass an object with a read'):
399 cctx.read_from(True)
400
401 def test_read_empty(self):
402 cctx = zstd.ZstdCompressor(level=1)
403
404 source = io.BytesIO()
405 it = cctx.read_from(source)
406 chunks = list(it)
407 self.assertEqual(len(chunks), 1)
408 compressed = b''.join(chunks)
409 self.assertEqual(compressed, b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00')
410
411 # And again with the buffer protocol.
412 it = cctx.read_from(b'')
413 chunks = list(it)
414 self.assertEqual(len(chunks), 1)
415 compressed2 = b''.join(chunks)
416 self.assertEqual(compressed2, compressed)
417
418 def test_read_large(self):
419 cctx = zstd.ZstdCompressor(level=1)
420
421 source = io.BytesIO()
422 source.write(b'f' * zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE)
423 source.write(b'o')
424 source.seek(0)
425
426 # Creating an iterator should not perform any compression until
427 # first read.
428 it = cctx.read_from(source, size=len(source.getvalue()))
429 self.assertEqual(source.tell(), 0)
430
431 # We should have exactly 2 output chunks.
432 chunks = []
433 chunk = next(it)
434 self.assertIsNotNone(chunk)
435 self.assertEqual(source.tell(), zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE)
436 chunks.append(chunk)
437 chunk = next(it)
438 self.assertIsNotNone(chunk)
439 chunks.append(chunk)
440
441 self.assertEqual(source.tell(), len(source.getvalue()))
442
443 with self.assertRaises(StopIteration):
444 next(it)
445
446 # And again for good measure.
447 with self.assertRaises(StopIteration):
448 next(it)
449
450 # We should get the same output as the one-shot compression mechanism.
451 self.assertEqual(b''.join(chunks), cctx.compress(source.getvalue()))
452
453 # Now check the buffer protocol.
454 it = cctx.read_from(source.getvalue())
455 chunks = list(it)
456 self.assertEqual(len(chunks), 2)
457 self.assertEqual(b''.join(chunks), cctx.compress(source.getvalue()))
458
459 def test_read_write_size(self):
460 source = OpCountingBytesIO(b'foobarfoobar')
461 cctx = zstd.ZstdCompressor(level=3)
462 for chunk in cctx.read_from(source, read_size=1, write_size=1):
463 self.assertEqual(len(chunk), 1)
464
465 self.assertEqual(source._read_count, len(source.getvalue()) + 1)
@@ -0,0 +1,107 b''
1 import io
2
3 try:
4 import unittest2 as unittest
5 except ImportError:
6 import unittest
7
8 try:
9 import hypothesis
10 import hypothesis.strategies as strategies
11 except ImportError:
12 hypothesis = None
13
14 import zstd
15
16 class TestCompressionParameters(unittest.TestCase):
17 def test_init_bad_arg_type(self):
18 with self.assertRaises(TypeError):
19 zstd.CompressionParameters()
20
21 with self.assertRaises(TypeError):
22 zstd.CompressionParameters(0, 1)
23
24 def test_bounds(self):
25 zstd.CompressionParameters(zstd.WINDOWLOG_MIN,
26 zstd.CHAINLOG_MIN,
27 zstd.HASHLOG_MIN,
28 zstd.SEARCHLOG_MIN,
29 zstd.SEARCHLENGTH_MIN,
30 zstd.TARGETLENGTH_MIN,
31 zstd.STRATEGY_FAST)
32
33 zstd.CompressionParameters(zstd.WINDOWLOG_MAX,
34 zstd.CHAINLOG_MAX,
35 zstd.HASHLOG_MAX,
36 zstd.SEARCHLOG_MAX,
37 zstd.SEARCHLENGTH_MAX,
38 zstd.TARGETLENGTH_MAX,
39 zstd.STRATEGY_BTOPT)
40
41 def test_get_compression_parameters(self):
42 p = zstd.get_compression_parameters(1)
43 self.assertIsInstance(p, zstd.CompressionParameters)
44
45 self.assertEqual(p[0], 19)
46
47 if hypothesis:
48 s_windowlog = strategies.integers(min_value=zstd.WINDOWLOG_MIN,
49 max_value=zstd.WINDOWLOG_MAX)
50 s_chainlog = strategies.integers(min_value=zstd.CHAINLOG_MIN,
51 max_value=zstd.CHAINLOG_MAX)
52 s_hashlog = strategies.integers(min_value=zstd.HASHLOG_MIN,
53 max_value=zstd.HASHLOG_MAX)
54 s_searchlog = strategies.integers(min_value=zstd.SEARCHLOG_MIN,
55 max_value=zstd.SEARCHLOG_MAX)
56 s_searchlength = strategies.integers(min_value=zstd.SEARCHLENGTH_MIN,
57 max_value=zstd.SEARCHLENGTH_MAX)
58 s_targetlength = strategies.integers(min_value=zstd.TARGETLENGTH_MIN,
59 max_value=zstd.TARGETLENGTH_MAX)
60 s_strategy = strategies.sampled_from((zstd.STRATEGY_FAST,
61 zstd.STRATEGY_DFAST,
62 zstd.STRATEGY_GREEDY,
63 zstd.STRATEGY_LAZY,
64 zstd.STRATEGY_LAZY2,
65 zstd.STRATEGY_BTLAZY2,
66 zstd.STRATEGY_BTOPT))
67
68 class TestCompressionParametersHypothesis(unittest.TestCase):
69 @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog,
70 s_searchlength, s_targetlength, s_strategy)
71 def test_valid_init(self, windowlog, chainlog, hashlog, searchlog,
72 searchlength, targetlength, strategy):
73 p = zstd.CompressionParameters(windowlog, chainlog, hashlog,
74 searchlog, searchlength,
75 targetlength, strategy)
76 self.assertEqual(tuple(p),
77 (windowlog, chainlog, hashlog, searchlog,
78 searchlength, targetlength, strategy))
79
80 # Verify we can instantiate a compressor with the supplied values.
81 # ZSTD_checkCParams moves the goal posts on us from what's advertised
82 # in the constants. So move along with them.
83 if searchlength == zstd.SEARCHLENGTH_MIN and strategy in (zstd.STRATEGY_FAST, zstd.STRATEGY_GREEDY):
84 searchlength += 1
85 p = zstd.CompressionParameters(windowlog, chainlog, hashlog,
86 searchlog, searchlength,
87 targetlength, strategy)
88 elif searchlength == zstd.SEARCHLENGTH_MAX and strategy != zstd.STRATEGY_FAST:
89 searchlength -= 1
90 p = zstd.CompressionParameters(windowlog, chainlog, hashlog,
91 searchlog, searchlength,
92 targetlength, strategy)
93
94 cctx = zstd.ZstdCompressor(compression_params=p)
95 with cctx.write_to(io.BytesIO()):
96 pass
97
98 @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog,
99 s_searchlength, s_targetlength, s_strategy)
100 def test_estimate_compression_context_size(self, windowlog, chainlog,
101 hashlog, searchlog,
102 searchlength, targetlength,
103 strategy):
104 p = zstd.CompressionParameters(windowlog, chainlog, hashlog,
105 searchlog, searchlength,
106 targetlength, strategy)
107 size = zstd.estimate_compression_context_size(p)
@@ -0,0 +1,478 b''
1 import io
2 import random
3 import struct
4 import sys
5
6 try:
7 import unittest2 as unittest
8 except ImportError:
9 import unittest
10
11 import zstd
12
13 from .common import OpCountingBytesIO
14
15
16 if sys.version_info[0] >= 3:
17 next = lambda it: it.__next__()
18 else:
19 next = lambda it: it.next()
20
21
22 class TestDecompressor_decompress(unittest.TestCase):
23 def test_empty_input(self):
24 dctx = zstd.ZstdDecompressor()
25
26 with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'):
27 dctx.decompress(b'')
28
29 def test_invalid_input(self):
30 dctx = zstd.ZstdDecompressor()
31
32 with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'):
33 dctx.decompress(b'foobar')
34
35 def test_no_content_size_in_frame(self):
36 cctx = zstd.ZstdCompressor(write_content_size=False)
37 compressed = cctx.compress(b'foobar')
38
39 dctx = zstd.ZstdDecompressor()
40 with self.assertRaisesRegexp(zstd.ZstdError, 'input data invalid'):
41 dctx.decompress(compressed)
42
43 def test_content_size_present(self):
44 cctx = zstd.ZstdCompressor(write_content_size=True)
45 compressed = cctx.compress(b'foobar')
46
47 dctx = zstd.ZstdDecompressor()
48 decompressed = dctx.decompress(compressed)
49 self.assertEqual(decompressed, b'foobar')
50
51 def test_max_output_size(self):
52 cctx = zstd.ZstdCompressor(write_content_size=False)
53 source = b'foobar' * 256
54 compressed = cctx.compress(source)
55
56 dctx = zstd.ZstdDecompressor()
57 # Will fit into buffer exactly the size of input.
58 decompressed = dctx.decompress(compressed, max_output_size=len(source))
59 self.assertEqual(decompressed, source)
60
61 # Input size - 1 fails
62 with self.assertRaisesRegexp(zstd.ZstdError, 'Destination buffer is too small'):
63 dctx.decompress(compressed, max_output_size=len(source) - 1)
64
65 # Input size + 1 works
66 decompressed = dctx.decompress(compressed, max_output_size=len(source) + 1)
67 self.assertEqual(decompressed, source)
68
69 # A much larger buffer works.
70 decompressed = dctx.decompress(compressed, max_output_size=len(source) * 64)
71 self.assertEqual(decompressed, source)
72
73 def test_stupidly_large_output_buffer(self):
74 cctx = zstd.ZstdCompressor(write_content_size=False)
75 compressed = cctx.compress(b'foobar' * 256)
76 dctx = zstd.ZstdDecompressor()
77
78 # Will get OverflowError on some Python distributions that can't
79 # handle really large integers.
80 with self.assertRaises((MemoryError, OverflowError)):
81 dctx.decompress(compressed, max_output_size=2**62)
82
83 def test_dictionary(self):
84 samples = []
85 for i in range(128):
86 samples.append(b'foo' * 64)
87 samples.append(b'bar' * 64)
88 samples.append(b'foobar' * 64)
89
90 d = zstd.train_dictionary(8192, samples)
91
92 orig = b'foobar' * 16384
93 cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_content_size=True)
94 compressed = cctx.compress(orig)
95
96 dctx = zstd.ZstdDecompressor(dict_data=d)
97 decompressed = dctx.decompress(compressed)
98
99 self.assertEqual(decompressed, orig)
100
101 def test_dictionary_multiple(self):
102 samples = []
103 for i in range(128):
104 samples.append(b'foo' * 64)
105 samples.append(b'bar' * 64)
106 samples.append(b'foobar' * 64)
107
108 d = zstd.train_dictionary(8192, samples)
109
110 sources = (b'foobar' * 8192, b'foo' * 8192, b'bar' * 8192)
111 compressed = []
112 cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_content_size=True)
113 for source in sources:
114 compressed.append(cctx.compress(source))
115
116 dctx = zstd.ZstdDecompressor(dict_data=d)
117 for i in range(len(sources)):
118 decompressed = dctx.decompress(compressed[i])
119 self.assertEqual(decompressed, sources[i])
120
121
122 class TestDecompressor_copy_stream(unittest.TestCase):
123 def test_no_read(self):
124 source = object()
125 dest = io.BytesIO()
126
127 dctx = zstd.ZstdDecompressor()
128 with self.assertRaises(ValueError):
129 dctx.copy_stream(source, dest)
130
131 def test_no_write(self):
132 source = io.BytesIO()
133 dest = object()
134
135 dctx = zstd.ZstdDecompressor()
136 with self.assertRaises(ValueError):
137 dctx.copy_stream(source, dest)
138
139 def test_empty(self):
140 source = io.BytesIO()
141 dest = io.BytesIO()
142
143 dctx = zstd.ZstdDecompressor()
144 # TODO should this raise an error?
145 r, w = dctx.copy_stream(source, dest)
146
147 self.assertEqual(r, 0)
148 self.assertEqual(w, 0)
149 self.assertEqual(dest.getvalue(), b'')
150
151 def test_large_data(self):
152 source = io.BytesIO()
153 for i in range(255):
154 source.write(struct.Struct('>B').pack(i) * 16384)
155 source.seek(0)
156
157 compressed = io.BytesIO()
158 cctx = zstd.ZstdCompressor()
159 cctx.copy_stream(source, compressed)
160
161 compressed.seek(0)
162 dest = io.BytesIO()
163 dctx = zstd.ZstdDecompressor()
164 r, w = dctx.copy_stream(compressed, dest)
165
166 self.assertEqual(r, len(compressed.getvalue()))
167 self.assertEqual(w, len(source.getvalue()))
168
169 def test_read_write_size(self):
170 source = OpCountingBytesIO(zstd.ZstdCompressor().compress(
171 b'foobarfoobar'))
172
173 dest = OpCountingBytesIO()
174 dctx = zstd.ZstdDecompressor()
175 r, w = dctx.copy_stream(source, dest, read_size=1, write_size=1)
176
177 self.assertEqual(r, len(source.getvalue()))
178 self.assertEqual(w, len(b'foobarfoobar'))
179 self.assertEqual(source._read_count, len(source.getvalue()) + 1)
180 self.assertEqual(dest._write_count, len(dest.getvalue()))
181
182
183 class TestDecompressor_decompressobj(unittest.TestCase):
184 def test_simple(self):
185 data = zstd.ZstdCompressor(level=1).compress(b'foobar')
186
187 dctx = zstd.ZstdDecompressor()
188 dobj = dctx.decompressobj()
189 self.assertEqual(dobj.decompress(data), b'foobar')
190
191 def test_reuse(self):
192 data = zstd.ZstdCompressor(level=1).compress(b'foobar')
193
194 dctx = zstd.ZstdDecompressor()
195 dobj = dctx.decompressobj()
196 dobj.decompress(data)
197
198 with self.assertRaisesRegexp(zstd.ZstdError, 'cannot use a decompressobj'):
199 dobj.decompress(data)
200
201
202 def decompress_via_writer(data):
203 buffer = io.BytesIO()
204 dctx = zstd.ZstdDecompressor()
205 with dctx.write_to(buffer) as decompressor:
206 decompressor.write(data)
207 return buffer.getvalue()
208
209
210 class TestDecompressor_write_to(unittest.TestCase):
211 def test_empty_roundtrip(self):
212 cctx = zstd.ZstdCompressor()
213 empty = cctx.compress(b'')
214 self.assertEqual(decompress_via_writer(empty), b'')
215
216 def test_large_roundtrip(self):
217 chunks = []
218 for i in range(255):
219 chunks.append(struct.Struct('>B').pack(i) * 16384)
220 orig = b''.join(chunks)
221 cctx = zstd.ZstdCompressor()
222 compressed = cctx.compress(orig)
223
224 self.assertEqual(decompress_via_writer(compressed), orig)
225
226 def test_multiple_calls(self):
227 chunks = []
228 for i in range(255):
229 for j in range(255):
230 chunks.append(struct.Struct('>B').pack(j) * i)
231
232 orig = b''.join(chunks)
233 cctx = zstd.ZstdCompressor()
234 compressed = cctx.compress(orig)
235
236 buffer = io.BytesIO()
237 dctx = zstd.ZstdDecompressor()
238 with dctx.write_to(buffer) as decompressor:
239 pos = 0
240 while pos < len(compressed):
241 pos2 = pos + 8192
242 decompressor.write(compressed[pos:pos2])
243 pos += 8192
244 self.assertEqual(buffer.getvalue(), orig)
245
246 def test_dictionary(self):
247 samples = []
248 for i in range(128):
249 samples.append(b'foo' * 64)
250 samples.append(b'bar' * 64)
251 samples.append(b'foobar' * 64)
252
253 d = zstd.train_dictionary(8192, samples)
254
255 orig = b'foobar' * 16384
256 buffer = io.BytesIO()
257 cctx = zstd.ZstdCompressor(dict_data=d)
258 with cctx.write_to(buffer) as compressor:
259 compressor.write(orig)
260
261 compressed = buffer.getvalue()
262 buffer = io.BytesIO()
263
264 dctx = zstd.ZstdDecompressor(dict_data=d)
265 with dctx.write_to(buffer) as decompressor:
266 decompressor.write(compressed)
267
268 self.assertEqual(buffer.getvalue(), orig)
269
270 def test_memory_size(self):
271 dctx = zstd.ZstdDecompressor()
272 buffer = io.BytesIO()
273 with dctx.write_to(buffer) as decompressor:
274 size = decompressor.memory_size()
275
276 self.assertGreater(size, 100000)
277
278 def test_write_size(self):
279 source = zstd.ZstdCompressor().compress(b'foobarfoobar')
280 dest = OpCountingBytesIO()
281 dctx = zstd.ZstdDecompressor()
282 with dctx.write_to(dest, write_size=1) as decompressor:
283 s = struct.Struct('>B')
284 for c in source:
285 if not isinstance(c, str):
286 c = s.pack(c)
287 decompressor.write(c)
288
289
290 self.assertEqual(dest.getvalue(), b'foobarfoobar')
291 self.assertEqual(dest._write_count, len(dest.getvalue()))
292
293
294 class TestDecompressor_read_from(unittest.TestCase):
295 def test_type_validation(self):
296 dctx = zstd.ZstdDecompressor()
297
298 # Object with read() works.
299 dctx.read_from(io.BytesIO())
300
301 # Buffer protocol works.
302 dctx.read_from(b'foobar')
303
304 with self.assertRaisesRegexp(ValueError, 'must pass an object with a read'):
305 dctx.read_from(True)
306
307 def test_empty_input(self):
308 dctx = zstd.ZstdDecompressor()
309
310 source = io.BytesIO()
311 it = dctx.read_from(source)
312         # TODO this is arguably wrong. Should get an error about a missing frame.
313 with self.assertRaises(StopIteration):
314 next(it)
315
316 it = dctx.read_from(b'')
317 with self.assertRaises(StopIteration):
318 next(it)
319
320 def test_invalid_input(self):
321 dctx = zstd.ZstdDecompressor()
322
323 source = io.BytesIO(b'foobar')
324 it = dctx.read_from(source)
325 with self.assertRaisesRegexp(zstd.ZstdError, 'Unknown frame descriptor'):
326 next(it)
327
328 it = dctx.read_from(b'foobar')
329 with self.assertRaisesRegexp(zstd.ZstdError, 'Unknown frame descriptor'):
330 next(it)
331
332 def test_empty_roundtrip(self):
333 cctx = zstd.ZstdCompressor(level=1, write_content_size=False)
334 empty = cctx.compress(b'')
335
336 source = io.BytesIO(empty)
337 source.seek(0)
338
339 dctx = zstd.ZstdDecompressor()
340 it = dctx.read_from(source)
341
342 # No chunks should be emitted since there is no data.
343 with self.assertRaises(StopIteration):
344 next(it)
345
346 # Again for good measure.
347 with self.assertRaises(StopIteration):
348 next(it)
349
350 def test_skip_bytes_too_large(self):
351 dctx = zstd.ZstdDecompressor()
352
353 with self.assertRaisesRegexp(ValueError, 'skip_bytes must be smaller than read_size'):
354 dctx.read_from(b'', skip_bytes=1, read_size=1)
355
356 with self.assertRaisesRegexp(ValueError, 'skip_bytes larger than first input chunk'):
357 b''.join(dctx.read_from(b'foobar', skip_bytes=10))
358
359 def test_skip_bytes(self):
360 cctx = zstd.ZstdCompressor(write_content_size=False)
361 compressed = cctx.compress(b'foobar')
362
363 dctx = zstd.ZstdDecompressor()
364 output = b''.join(dctx.read_from(b'hdr' + compressed, skip_bytes=3))
365 self.assertEqual(output, b'foobar')
366
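For reference, a hedged sketch of the skip_bytes behaviour tested above: a caller that prepends a private header to a zstd frame can pass the whole buffer to read_from and skip the header, provided skip_bytes is smaller than read_size (the hdr prefix is purely illustrative):

import zstd

HEADER = b'hdr'  # hypothetical application-specific prefix

def decompress_with_header(buf):
    dctx = zstd.ZstdDecompressor()
    # Skip the application header so decoding starts at the zstd frame.
    return b''.join(dctx.read_from(buf, skip_bytes=len(HEADER)))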
367 def test_large_output(self):
368 source = io.BytesIO()
369 source.write(b'f' * zstd.DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE)
370 source.write(b'o')
371 source.seek(0)
372
373 cctx = zstd.ZstdCompressor(level=1)
374 compressed = io.BytesIO(cctx.compress(source.getvalue()))
375 compressed.seek(0)
376
377 dctx = zstd.ZstdDecompressor()
378 it = dctx.read_from(compressed)
379
380 chunks = []
381 chunks.append(next(it))
382 chunks.append(next(it))
383
384 with self.assertRaises(StopIteration):
385 next(it)
386
387 decompressed = b''.join(chunks)
388 self.assertEqual(decompressed, source.getvalue())
389
390 # And again with buffer protocol.
391 it = dctx.read_from(compressed.getvalue())
392 chunks = []
393 chunks.append(next(it))
394 chunks.append(next(it))
395
396 with self.assertRaises(StopIteration):
397 next(it)
398
399 decompressed = b''.join(chunks)
400 self.assertEqual(decompressed, source.getvalue())
401
402 def test_large_input(self):
403 bytes = list(struct.Struct('>B').pack(i) for i in range(256))
404 compressed = io.BytesIO()
405 input_size = 0
406 cctx = zstd.ZstdCompressor(level=1)
407 with cctx.write_to(compressed) as compressor:
408 while True:
409 compressor.write(random.choice(bytes))
410 input_size += 1
411
412 have_compressed = len(compressed.getvalue()) > zstd.DECOMPRESSION_RECOMMENDED_INPUT_SIZE
413 have_raw = input_size > zstd.DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE * 2
414 if have_compressed and have_raw:
415 break
416
417 compressed.seek(0)
418 self.assertGreater(len(compressed.getvalue()),
419 zstd.DECOMPRESSION_RECOMMENDED_INPUT_SIZE)
420
421 dctx = zstd.ZstdDecompressor()
422 it = dctx.read_from(compressed)
423
424 chunks = []
425 chunks.append(next(it))
426 chunks.append(next(it))
427 chunks.append(next(it))
428
429 with self.assertRaises(StopIteration):
430 next(it)
431
432 decompressed = b''.join(chunks)
433 self.assertEqual(len(decompressed), input_size)
434
435 # And again with buffer protocol.
436 it = dctx.read_from(compressed.getvalue())
437
438 chunks = []
439 chunks.append(next(it))
440 chunks.append(next(it))
441 chunks.append(next(it))
442
443 with self.assertRaises(StopIteration):
444 next(it)
445
446 decompressed = b''.join(chunks)
447 self.assertEqual(len(decompressed), input_size)
448
449 def test_interesting(self):
450 # Found this edge case via fuzzing.
451 cctx = zstd.ZstdCompressor(level=1)
452
453 source = io.BytesIO()
454
455 compressed = io.BytesIO()
456 with cctx.write_to(compressed) as compressor:
457 for i in range(256):
458 chunk = b'\0' * 1024
459 compressor.write(chunk)
460 source.write(chunk)
461
462 dctx = zstd.ZstdDecompressor()
463
464 simple = dctx.decompress(compressed.getvalue(),
465 max_output_size=len(source.getvalue()))
466 self.assertEqual(simple, source.getvalue())
467
468 compressed.seek(0)
469 streamed = b''.join(dctx.read_from(compressed))
470 self.assertEqual(streamed, source.getvalue())
471
472 def test_read_write_size(self):
473 source = OpCountingBytesIO(zstd.ZstdCompressor().compress(b'foobarfoobar'))
474 dctx = zstd.ZstdDecompressor()
475 for chunk in dctx.read_from(source, read_size=1, write_size=1):
476 self.assertEqual(len(chunk), 1)
477
478 self.assertEqual(source._read_count, len(source.getvalue()))
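To summarize the read_from usage these tests exercise, a hedged sketch of streaming decompression from one file object to another (file names are illustrative):

import zstd

def decompress_file(in_path, out_path):
    dctx = zstd.ZstdDecompressor()
    with open(in_path, 'rb') as ifh, open(out_path, 'wb') as ofh:
        # read_from yields decompressed chunks as compressed input is consumed.
        for chunk in dctx.read_from(ifh):
            ofh.write(chunk)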
@@ -0,0 +1,17 b''
1 try:
2 import unittest2 as unittest
3 except ImportError:
4 import unittest
5
6 import zstd
7
8
9 class TestSizes(unittest.TestCase):
10 def test_decompression_size(self):
11 size = zstd.estimate_decompression_context_size()
12 self.assertGreater(size, 100000)
13
14 def test_compression_size(self):
15 params = zstd.get_compression_parameters(3)
16 size = zstd.estimate_compression_context_size(params)
17 self.assertGreater(size, 100000)
@@ -0,0 +1,48 b''
1 from __future__ import unicode_literals
2
3 try:
4 import unittest2 as unittest
5 except ImportError:
6 import unittest
7
8 import zstd
9
10 class TestModuleAttributes(unittest.TestCase):
11 def test_version(self):
12 self.assertEqual(zstd.ZSTD_VERSION, (1, 1, 1))
13
14 def test_constants(self):
15 self.assertEqual(zstd.MAX_COMPRESSION_LEVEL, 22)
16 self.assertEqual(zstd.FRAME_HEADER, b'\x28\xb5\x2f\xfd')
17
18 def test_hasattr(self):
19 attrs = (
20 'COMPRESSION_RECOMMENDED_INPUT_SIZE',
21 'COMPRESSION_RECOMMENDED_OUTPUT_SIZE',
22 'DECOMPRESSION_RECOMMENDED_INPUT_SIZE',
23 'DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE',
24 'MAGIC_NUMBER',
25 'WINDOWLOG_MIN',
26 'WINDOWLOG_MAX',
27 'CHAINLOG_MIN',
28 'CHAINLOG_MAX',
29 'HASHLOG_MIN',
30 'HASHLOG_MAX',
31 'HASHLOG3_MAX',
32 'SEARCHLOG_MIN',
33 'SEARCHLOG_MAX',
34 'SEARCHLENGTH_MIN',
35 'SEARCHLENGTH_MAX',
36 'TARGETLENGTH_MIN',
37 'TARGETLENGTH_MAX',
38 'STRATEGY_FAST',
39 'STRATEGY_DFAST',
40 'STRATEGY_GREEDY',
41 'STRATEGY_LAZY',
42 'STRATEGY_LAZY2',
43 'STRATEGY_BTLAZY2',
44 'STRATEGY_BTOPT',
45 )
46
47 for a in attrs:
48 self.assertTrue(hasattr(zstd, a))
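Building on the constants checked above, a small hedged sketch that sniffs whether a buffer starts with a zstd frame (it only inspects the magic bytes and does not validate the rest of the frame):

import zstd

def looks_like_zstd_frame(data):
    # zstd.FRAME_HEADER is b'\x28\xb5\x2f\xfd' per test_constants above.
    return data[:len(zstd.FRAME_HEADER)] == zstd.FRAME_HEADER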
@@ -0,0 +1,64 b''
1 import io
2
3 try:
4 import unittest2 as unittest
5 except ImportError:
6 import unittest
7
8 try:
9 import hypothesis
10 import hypothesis.strategies as strategies
11 except ImportError:
12 raise unittest.SkipTest('hypothesis not available')
13
14 import zstd
15
16
17 compression_levels = strategies.integers(min_value=1, max_value=22)
18
19
20 class TestRoundTrip(unittest.TestCase):
21 @hypothesis.given(strategies.binary(), compression_levels)
22 def test_compress_write_to(self, data, level):
23 """Random data from compress() roundtrips via write_to."""
24 cctx = zstd.ZstdCompressor(level=level)
25 compressed = cctx.compress(data)
26
27 buffer = io.BytesIO()
28 dctx = zstd.ZstdDecompressor()
29 with dctx.write_to(buffer) as decompressor:
30 decompressor.write(compressed)
31
32 self.assertEqual(buffer.getvalue(), data)
33
34 @hypothesis.given(strategies.binary(), compression_levels)
35 def test_compressor_write_to_decompressor_write_to(self, data, level):
36 """Random data from compressor write_to roundtrips via write_to."""
37 compress_buffer = io.BytesIO()
38 decompressed_buffer = io.BytesIO()
39
40 cctx = zstd.ZstdCompressor(level=level)
41 with cctx.write_to(compress_buffer) as compressor:
42 compressor.write(data)
43
44 dctx = zstd.ZstdDecompressor()
45 with dctx.write_to(decompressed_buffer) as decompressor:
46 decompressor.write(compress_buffer.getvalue())
47
48 self.assertEqual(decompressed_buffer.getvalue(), data)
49
50 @hypothesis.given(strategies.binary(average_size=1048576))
51 @hypothesis.settings(perform_health_check=False)
52 def test_compressor_write_to_decompressor_write_to_larger(self, data):
53 compress_buffer = io.BytesIO()
54 decompressed_buffer = io.BytesIO()
55
56 cctx = zstd.ZstdCompressor(level=5)
57 with cctx.write_to(compress_buffer) as compressor:
58 compressor.write(data)
59
60 dctx = zstd.ZstdDecompressor()
61 with dctx.write_to(decompressed_buffer) as decompressor:
62 decompressor.write(compress_buffer.getvalue())
63
64 self.assertEqual(decompressed_buffer.getvalue(), data)
@@ -0,0 +1,46 b''
1 import sys
2
3 try:
4 import unittest2 as unittest
5 except ImportError:
6 import unittest
7
8 import zstd
9
10
11 if sys.version_info[0] >= 3:
12 int_type = int
13 else:
14 int_type = long
15
16
17 class TestTrainDictionary(unittest.TestCase):
18 def test_no_args(self):
19 with self.assertRaises(TypeError):
20 zstd.train_dictionary()
21
22 def test_bad_args(self):
23 with self.assertRaises(TypeError):
24 zstd.train_dictionary(8192, u'foo')
25
26 with self.assertRaises(ValueError):
27 zstd.train_dictionary(8192, [u'foo'])
28
29 def test_basic(self):
30 samples = []
31 for i in range(128):
32 samples.append(b'foo' * 64)
33 samples.append(b'bar' * 64)
34 samples.append(b'foobar' * 64)
35 samples.append(b'baz' * 64)
36 samples.append(b'foobaz' * 64)
37 samples.append(b'bazfoo' * 64)
38
39 d = zstd.train_dictionary(8192, samples)
40 self.assertLessEqual(len(d), 8192)
41
42 dict_id = d.dict_id()
43 self.assertIsInstance(dict_id, int_type)
44
45 data = d.as_bytes()
46 self.assertEqual(data[0:4], b'\x37\xa4\x30\xec')
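Putting the dictionary pieces from these tests together, a hedged end-to-end sketch of training a dictionary and using it for compression and decompression (the sample data is illustrative; real samples should resemble the data being compressed):

import io
import zstd

samples = [b'foo' * 64, b'bar' * 64, b'foobar' * 64] * 128
d = zstd.train_dictionary(8192, samples)

orig = b'foobar' * 1024
buf = io.BytesIO()
cctx = zstd.ZstdCompressor(dict_data=d)
with cctx.write_to(buf) as compressor:
    compressor.write(orig)

dctx = zstd.ZstdDecompressor(dict_data=d)
assert b''.join(dctx.read_from(buf.getvalue())) == orig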
@@ -0,0 +1,112 b''
1 /**
2 * Copyright (c) 2016-present, Gregory Szorc
3 * All rights reserved.
4 *
5 * This software may be modified and distributed under the terms
6 * of the BSD license. See the LICENSE file for details.
7 */
8
9 /* A Python C extension for Zstandard. */
10
11 #include "python-zstandard.h"
12
13 PyObject *ZstdError;
14
15 PyDoc_STRVAR(estimate_compression_context_size__doc__,
16 "estimate_compression_context_size(compression_parameters)\n"
17 "\n"
18 "Give the amount of memory allocated for a compression context given a\n"
19 "CompressionParameters instance");
20
21 PyDoc_STRVAR(estimate_decompression_context_size__doc__,
22 "estimate_decompression_context_size()\n"
23 "\n"
24 "Estimate the amount of memory allocated to a decompression context.\n"
25 );
26
27 static PyObject* estimate_decompression_context_size(PyObject* self) {
28 return PyLong_FromSize_t(ZSTD_estimateDCtxSize());
29 }
30
31 PyDoc_STRVAR(get_compression_parameters__doc__,
32 "get_compression_parameters(compression_level[, source_size[, dict_size]])\n"
33 "\n"
34 "Obtains a ``CompressionParameters`` instance from a compression level and\n"
35 "optional input size and dictionary size");
36
37 PyDoc_STRVAR(train_dictionary__doc__,
38 "train_dictionary(dict_size, samples)\n"
39 "\n"
40 "Train a dictionary from sample data.\n"
41 "\n"
42 "A compression dictionary of size ``dict_size`` will be created from the\n"
43 "iterable of samples provided by ``samples``.\n"
44 "\n"
45 "The raw dictionary content will be returned\n");
46
47 static char zstd_doc[] = "Interface to zstandard";
48
49 static PyMethodDef zstd_methods[] = {
50 { "estimate_compression_context_size", (PyCFunction)estimate_compression_context_size,
51 METH_VARARGS, estimate_compression_context_size__doc__ },
52 { "estimate_decompression_context_size", (PyCFunction)estimate_decompression_context_size,
53 METH_NOARGS, estimate_decompression_context_size__doc__ },
54 { "get_compression_parameters", (PyCFunction)get_compression_parameters,
55 METH_VARARGS, get_compression_parameters__doc__ },
56 { "train_dictionary", (PyCFunction)train_dictionary,
57 METH_VARARGS | METH_KEYWORDS, train_dictionary__doc__ },
58 { NULL, NULL }
59 };
60
61 void compressobj_module_init(PyObject* mod);
62 void compressor_module_init(PyObject* mod);
63 void compressionparams_module_init(PyObject* mod);
64 void constants_module_init(PyObject* mod);
65 void dictparams_module_init(PyObject* mod);
66 void compressiondict_module_init(PyObject* mod);
67 void compressionwriter_module_init(PyObject* mod);
68 void compressoriterator_module_init(PyObject* mod);
69 void decompressor_module_init(PyObject* mod);
70 void decompressobj_module_init(PyObject* mod);
71 void decompressionwriter_module_init(PyObject* mod);
72 void decompressoriterator_module_init(PyObject* mod);
73
74 void zstd_module_init(PyObject* m) {
75 compressionparams_module_init(m);
76 dictparams_module_init(m);
77 compressiondict_module_init(m);
78 compressobj_module_init(m);
79 compressor_module_init(m);
80 compressionwriter_module_init(m);
81 compressoriterator_module_init(m);
82 constants_module_init(m);
83 decompressor_module_init(m);
84 decompressobj_module_init(m);
85 decompressionwriter_module_init(m);
86 decompressoriterator_module_init(m);
87 }
88
89 #if PY_MAJOR_VERSION >= 3
90 static struct PyModuleDef zstd_module = {
91 PyModuleDef_HEAD_INIT,
92 "zstd",
93 zstd_doc,
94 -1,
95 zstd_methods
96 };
97
98 PyMODINIT_FUNC PyInit_zstd(void) {
99 PyObject *m = PyModule_Create(&zstd_module);
100 if (m) {
101 zstd_module_init(m);
102 }
103 return m;
104 }
105 #else
106 PyMODINIT_FUNC initzstd(void) {
107 PyObject *m = Py_InitModule3("zstd", zstd_methods, zstd_doc);
108 if (m) {
109 zstd_module_init(m);
110 }
111 }
112 #endif
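To illustrate the module-level functions documented in the docstrings above, a hedged Python sketch mirroring the size-estimation tests earlier in this change:

import zstd

# Estimate memory used by a decompression context.
dctx_size = zstd.estimate_decompression_context_size()

# Derive CompressionParameters for level 3, then estimate the
# corresponding compression context size.
params = zstd.get_compression_parameters(3)
cctx_size = zstd.estimate_compression_context_size(params)

print(dctx_size, cctx_size)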
@@ -0,0 +1,152 b''
1 # Copyright (c) 2016-present, Gregory Szorc
2 # All rights reserved.
3 #
4 # This software may be modified and distributed under the terms
5 # of the BSD license. See the LICENSE file for details.
6
7 """Python interface to the Zstandard (zstd) compression library."""
8
9 from __future__ import absolute_import, unicode_literals
10
11 import io
12
13 from _zstd_cffi import (
14 ffi,
15 lib,
16 )
17
18
19 _CSTREAM_IN_SIZE = lib.ZSTD_CStreamInSize()
20 _CSTREAM_OUT_SIZE = lib.ZSTD_CStreamOutSize()
21
22
23 class _ZstdCompressionWriter(object):
24 def __init__(self, cstream, writer):
25 self._cstream = cstream
26 self._writer = writer
27
28 def __enter__(self):
29 return self
30
31 def __exit__(self, exc_type, exc_value, exc_tb):
32 if not exc_type and not exc_value and not exc_tb:
33 out_buffer = ffi.new('ZSTD_outBuffer *')
34 out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE)
35 out_buffer.size = _CSTREAM_OUT_SIZE
36 out_buffer.pos = 0
37
38 while True:
39 res = lib.ZSTD_endStream(self._cstream, out_buffer)
40 if lib.ZSTD_isError(res):
41                     raise Exception('error ending compression stream: %s' % lib.ZSTD_getErrorName(res))
42
43 if out_buffer.pos:
44 self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos))
45 out_buffer.pos = 0
46
47 if res == 0:
48 break
49
50 return False
51
52 def write(self, data):
53 out_buffer = ffi.new('ZSTD_outBuffer *')
54 out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE)
55 out_buffer.size = _CSTREAM_OUT_SIZE
56 out_buffer.pos = 0
57
58 # TODO can we reuse existing memory?
59 in_buffer = ffi.new('ZSTD_inBuffer *')
60 in_buffer.src = ffi.new('char[]', data)
61 in_buffer.size = len(data)
62 in_buffer.pos = 0
63 while in_buffer.pos < in_buffer.size:
64 res = lib.ZSTD_compressStream(self._cstream, out_buffer, in_buffer)
65 if lib.ZSTD_isError(res):
66 raise Exception('zstd compress error: %s' % lib.ZSTD_getErrorName(res))
67
68 if out_buffer.pos:
69 self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos))
70 out_buffer.pos = 0
71
72
73 class ZstdCompressor(object):
74 def __init__(self, level=3, dict_data=None, compression_params=None):
75 if dict_data:
76 raise Exception('dict_data not yet supported')
77 if compression_params:
78 raise Exception('compression_params not yet supported')
79
80 self._compression_level = level
81
82 def compress(self, data):
83 # Just use the stream API for now.
84 output = io.BytesIO()
85 with self.write_to(output) as compressor:
86 compressor.write(data)
87 return output.getvalue()
88
89 def copy_stream(self, ifh, ofh):
90 cstream = self._get_cstream()
91
92 in_buffer = ffi.new('ZSTD_inBuffer *')
93 out_buffer = ffi.new('ZSTD_outBuffer *')
94
95 out_buffer.dst = ffi.new('char[]', _CSTREAM_OUT_SIZE)
96 out_buffer.size = _CSTREAM_OUT_SIZE
97 out_buffer.pos = 0
98
99 total_read, total_write = 0, 0
100
101 while True:
102 data = ifh.read(_CSTREAM_IN_SIZE)
103 if not data:
104 break
105
106 total_read += len(data)
107
108 in_buffer.src = ffi.new('char[]', data)
109 in_buffer.size = len(data)
110 in_buffer.pos = 0
111
112 while in_buffer.pos < in_buffer.size:
113 res = lib.ZSTD_compressStream(cstream, out_buffer, in_buffer)
114 if lib.ZSTD_isError(res):
115 raise Exception('zstd compress error: %s' %
116 lib.ZSTD_getErrorName(res))
117
118 if out_buffer.pos:
119 ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos))
120                     total_write += out_buffer.pos
121 out_buffer.pos = 0
122
123 # We've finished reading. Flush the compressor.
124 while True:
125 res = lib.ZSTD_endStream(cstream, out_buffer)
126 if lib.ZSTD_isError(res):
127 raise Exception('error ending compression stream: %s' %
128 lib.ZSTD_getErrorName(res))
129
130 if out_buffer.pos:
131 ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos))
132 total_write += out_buffer.pos
133 out_buffer.pos = 0
134
135 if res == 0:
136 break
137
138 return total_read, total_write
139
140 def write_to(self, writer):
141 return _ZstdCompressionWriter(self._get_cstream(), writer)
142
143 def _get_cstream(self):
144 cstream = lib.ZSTD_createCStream()
145 cstream = ffi.gc(cstream, lib.ZSTD_freeCStream)
146
147 res = lib.ZSTD_initCStream(cstream, self._compression_level)
148 if lib.ZSTD_isError(res):
149 raise Exception('cannot init CStream: %s' %
150 lib.ZSTD_getErrorName(res))
151
152 return cstream
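A hedged usage sketch of this CFFI backend's ZstdCompressor (the module name zstd_cffi is an assumption for illustration; dict_data and compression_params are rejected by the constructor above):

import io

import zstd_cffi as zstd  # hypothetical import name for this cffi module

cctx = zstd.ZstdCompressor(level=3)

# One-shot compression, implemented on top of the streaming writer.
frame = cctx.compress(b'data to compress')

# Stream from one file object to another; returns (bytes_read, bytes_written).
src = io.BytesIO(b'x' * 100000)
dst = io.BytesIO()
read_count, write_count = cctx.copy_stream(src, dst)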
@@ -7,7 +7,7 b''
7 New errors are not allowed. Warnings are strongly discouraged.
8 (The writing "no-che?k-code" is for not skipping this file when checking.)
9
10 $ hg locate | sed 's-\\-/-g' |
10 $ hg locate -X contrib/python-zstandard | sed 's-\\-/-g' |
11 > xargs "$check_code" --warnings --per-file=0 || false
12 Skipping hgext/fsmonitor/pywatchman/__init__.py it has no-che?k-code (glob)
13 Skipping hgext/fsmonitor/pywatchman/bser.c it has no-che?k-code (glob)
@@ -159,6 +159,7 b' outputs, which should be fixed later.'
159 $ hg locate 'set:**.py or grep(r"^#!.*?python")' \
160 > 'tests/**.t' \
161 > -X contrib/debugshell.py \
162 > -X contrib/python-zstandard/ \
163 > -X contrib/win32/hgwebdir_wsgi.py \
164 > -X doc/gendoc.py \
165 > -X doc/hgmanpage.py \
@@ -4,6 +4,17 b''
4 $ cd "$TESTDIR"/..
5
6 $ hg files 'set:(**.py)' | sed 's|\\|/|g' | xargs python contrib/check-py3-compat.py
7 contrib/python-zstandard/setup.py not using absolute_import
8 contrib/python-zstandard/setup_zstd.py not using absolute_import
9 contrib/python-zstandard/tests/common.py not using absolute_import
10 contrib/python-zstandard/tests/test_cffi.py not using absolute_import
11 contrib/python-zstandard/tests/test_compressor.py not using absolute_import
12 contrib/python-zstandard/tests/test_data_structures.py not using absolute_import
13 contrib/python-zstandard/tests/test_decompressor.py not using absolute_import
14 contrib/python-zstandard/tests/test_estimate_sizes.py not using absolute_import
15 contrib/python-zstandard/tests/test_module_attributes.py not using absolute_import
16 contrib/python-zstandard/tests/test_roundtrip.py not using absolute_import
17 contrib/python-zstandard/tests/test_train_dictionary.py not using absolute_import
18 hgext/fsmonitor/pywatchman/__init__.py not using absolute_import
19 hgext/fsmonitor/pywatchman/__init__.py requires print_function
20 hgext/fsmonitor/pywatchman/capabilities.py not using absolute_import
@@ -10,6 +10,6 b' run pyflakes on all tracked files ending'
10 > -X mercurial/pycompat.py \
11 > 2>/dev/null \
12 > | xargs pyflakes 2>/dev/null | "$TESTDIR/filterpyflakes.py"
13 contrib/python-zstandard/tests/test_data_structures.py:107: local variable 'size' is assigned to but never used
14 tests/filterpyflakes.py:39: undefined name 'undefinedname'
15