Show More
@@ -0,0 +1,132 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2017-present, Gregory Szorc | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This software may be modified and distributed under the terms | |
|
6 | * of the BSD license. See the LICENSE file for details. | |
|
7 | */ | |
|
8 | ||
|
9 | #include "python-zstandard.h" | |
|
10 | ||
|
11 | extern PyObject* ZstdError; | |
|
12 | ||
|
13 | PyDoc_STRVAR(FrameParameters__doc__, | |
|
14 | "FrameParameters: information about a zstd frame"); | |
|
15 | ||
|
16 | FrameParametersObject* get_frame_parameters(PyObject* self, PyObject* args) { | |
|
17 | const char* source; | |
|
18 | Py_ssize_t sourceSize; | |
|
19 | ZSTD_frameParams params; | |
|
20 | FrameParametersObject* result = NULL; | |
|
21 | size_t zresult; | |
|
22 | ||
|
23 | #if PY_MAJOR_VERSION >= 3 | |
|
24 | if (!PyArg_ParseTuple(args, "y#:get_frame_parameters", | |
|
25 | #else | |
|
26 | if (!PyArg_ParseTuple(args, "s#:get_frame_parameters", | |
|
27 | #endif | |
|
28 | &source, &sourceSize)) { | |
|
29 | return NULL; | |
|
30 | } | |
|
31 | ||
|
32 | /* Needed for Python 2 to reject unicode */ | |
|
33 | if (!PyBytes_Check(PyTuple_GET_ITEM(args, 0))) { | |
|
34 | PyErr_SetString(PyExc_TypeError, "argument must be bytes"); | |
|
35 | return NULL; | |
|
36 | } | |
|
37 | ||
|
38 | zresult = ZSTD_getFrameParams(¶ms, (void*)source, sourceSize); | |
|
39 | ||
|
40 | if (ZSTD_isError(zresult)) { | |
|
41 | PyErr_Format(ZstdError, "cannot get frame parameters: %s", ZSTD_getErrorName(zresult)); | |
|
42 | return NULL; | |
|
43 | } | |
|
44 | ||
|
45 | if (zresult) { | |
|
46 | PyErr_Format(ZstdError, "not enough data for frame parameters; need %zu bytes", zresult); | |
|
47 | return NULL; | |
|
48 | } | |
|
49 | ||
|
50 | result = PyObject_New(FrameParametersObject, &FrameParametersType); | |
|
51 | if (!result) { | |
|
52 | return NULL; | |
|
53 | } | |
|
54 | ||
|
55 | result->frameContentSize = params.frameContentSize; | |
|
56 | result->windowSize = params.windowSize; | |
|
57 | result->dictID = params.dictID; | |
|
58 | result->checksumFlag = params.checksumFlag ? 1 : 0; | |
|
59 | ||
|
60 | return result; | |
|
61 | } | |
|
62 | ||
|
63 | static void FrameParameters_dealloc(PyObject* self) { | |
|
64 | PyObject_Del(self); | |
|
65 | } | |
|
66 | ||
|
67 | static PyMemberDef FrameParameters_members[] = { | |
|
68 | { "content_size", T_ULONGLONG, | |
|
69 | offsetof(FrameParametersObject, frameContentSize), READONLY, | |
|
70 | "frame content size" }, | |
|
71 | { "window_size", T_UINT, | |
|
72 | offsetof(FrameParametersObject, windowSize), READONLY, | |
|
73 | "window size" }, | |
|
74 | { "dict_id", T_UINT, | |
|
75 | offsetof(FrameParametersObject, dictID), READONLY, | |
|
76 | "dictionary ID" }, | |
|
77 | { "has_checksum", T_BOOL, | |
|
78 | offsetof(FrameParametersObject, checksumFlag), READONLY, | |
|
79 | "checksum flag" }, | |
|
80 | { NULL } | |
|
81 | }; | |
|
82 | ||
|
83 | PyTypeObject FrameParametersType = { | |
|
84 | PyVarObject_HEAD_INIT(NULL, 0) | |
|
85 | "FrameParameters", /* tp_name */ | |
|
86 | sizeof(FrameParametersObject), /* tp_basicsize */ | |
|
87 | 0, /* tp_itemsize */ | |
|
88 | (destructor)FrameParameters_dealloc, /* tp_dealloc */ | |
|
89 | 0, /* tp_print */ | |
|
90 | 0, /* tp_getattr */ | |
|
91 | 0, /* tp_setattr */ | |
|
92 | 0, /* tp_compare */ | |
|
93 | 0, /* tp_repr */ | |
|
94 | 0, /* tp_as_number */ | |
|
95 | 0, /* tp_as_sequence */ | |
|
96 | 0, /* tp_as_mapping */ | |
|
97 | 0, /* tp_hash */ | |
|
98 | 0, /* tp_call */ | |
|
99 | 0, /* tp_str */ | |
|
100 | 0, /* tp_getattro */ | |
|
101 | 0, /* tp_setattro */ | |
|
102 | 0, /* tp_as_buffer */ | |
|
103 | Py_TPFLAGS_DEFAULT, /* tp_flags */ | |
|
104 | FrameParameters__doc__, /* tp_doc */ | |
|
105 | 0, /* tp_traverse */ | |
|
106 | 0, /* tp_clear */ | |
|
107 | 0, /* tp_richcompare */ | |
|
108 | 0, /* tp_weaklistoffset */ | |
|
109 | 0, /* tp_iter */ | |
|
110 | 0, /* tp_iternext */ | |
|
111 | 0, /* tp_methods */ | |
|
112 | FrameParameters_members, /* tp_members */ | |
|
113 | 0, /* tp_getset */ | |
|
114 | 0, /* tp_base */ | |
|
115 | 0, /* tp_dict */ | |
|
116 | 0, /* tp_descr_get */ | |
|
117 | 0, /* tp_descr_set */ | |
|
118 | 0, /* tp_dictoffset */ | |
|
119 | 0, /* tp_init */ | |
|
120 | 0, /* tp_alloc */ | |
|
121 | 0, /* tp_new */ | |
|
122 | }; | |
|
123 | ||
|
124 | void frameparams_module_init(PyObject* mod) { | |
|
125 | Py_TYPE(&FrameParametersType) = &PyType_Type; | |
|
126 | if (PyType_Ready(&FrameParametersType) < 0) { | |
|
127 | return; | |
|
128 | } | |
|
129 | ||
|
130 | Py_IncRef((PyObject*)&FrameParametersType); | |
|
131 | PyModule_AddObject(mod, "FrameParameters", (PyObject*)&FrameParametersType); | |
|
132 | } |
@@ -0,0 +1,194 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Facebook, Inc. | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This source code is licensed under the BSD-style license found in the | |
|
6 | * LICENSE file in the root directory of this source tree. An additional grant | |
|
7 | * of patent rights can be found in the PATENTS file in the same directory. | |
|
8 | */ | |
|
9 | ||
|
10 | ||
|
11 | /* ====== Dependencies ======= */ | |
|
12 | #include <stddef.h> /* size_t */ | |
|
13 | #include <stdlib.h> /* malloc, calloc, free */ | |
|
14 | #include "pool.h" | |
|
15 | ||
|
16 | /* ====== Compiler specifics ====== */ | |
|
17 | #if defined(_MSC_VER) | |
|
18 | # pragma warning(disable : 4204) /* disable: C4204: non-constant aggregate initializer */ | |
|
19 | #endif | |
|
20 | ||
|
21 | ||
|
22 | #ifdef ZSTD_MULTITHREAD | |
|
23 | ||
|
24 | #include "threading.h" /* pthread adaptation */ | |
|
25 | ||
|
26 | /* A job is a function and an opaque argument */ | |
|
27 | typedef struct POOL_job_s { | |
|
28 | POOL_function function; | |
|
29 | void *opaque; | |
|
30 | } POOL_job; | |
|
31 | ||
|
32 | struct POOL_ctx_s { | |
|
33 | /* Keep track of the threads */ | |
|
34 | pthread_t *threads; | |
|
35 | size_t numThreads; | |
|
36 | ||
|
37 | /* The queue is a circular buffer */ | |
|
38 | POOL_job *queue; | |
|
39 | size_t queueHead; | |
|
40 | size_t queueTail; | |
|
41 | size_t queueSize; | |
|
42 | /* The mutex protects the queue */ | |
|
43 | pthread_mutex_t queueMutex; | |
|
44 | /* Condition variable for pushers to wait on when the queue is full */ | |
|
45 | pthread_cond_t queuePushCond; | |
|
46 | /* Condition variables for poppers to wait on when the queue is empty */ | |
|
47 | pthread_cond_t queuePopCond; | |
|
48 | /* Indicates if the queue is shutting down */ | |
|
49 | int shutdown; | |
|
50 | }; | |
|
51 | ||
|
52 | /* POOL_thread() : | |
|
53 | Work thread for the thread pool. | |
|
54 | Waits for jobs and executes them. | |
|
55 | @returns : NULL on failure else non-null. | |
|
56 | */ | |
|
57 | static void* POOL_thread(void* opaque) { | |
|
58 | POOL_ctx* const ctx = (POOL_ctx*)opaque; | |
|
59 | if (!ctx) { return NULL; } | |
|
60 | for (;;) { | |
|
61 | /* Lock the mutex and wait for a non-empty queue or until shutdown */ | |
|
62 | pthread_mutex_lock(&ctx->queueMutex); | |
|
63 | while (ctx->queueHead == ctx->queueTail && !ctx->shutdown) { | |
|
64 | pthread_cond_wait(&ctx->queuePopCond, &ctx->queueMutex); | |
|
65 | } | |
|
66 | /* empty => shutting down: so stop */ | |
|
67 | if (ctx->queueHead == ctx->queueTail) { | |
|
68 | pthread_mutex_unlock(&ctx->queueMutex); | |
|
69 | return opaque; | |
|
70 | } | |
|
71 | /* Pop a job off the queue */ | |
|
72 | { POOL_job const job = ctx->queue[ctx->queueHead]; | |
|
73 | ctx->queueHead = (ctx->queueHead + 1) % ctx->queueSize; | |
|
74 | /* Unlock the mutex, signal a pusher, and run the job */ | |
|
75 | pthread_mutex_unlock(&ctx->queueMutex); | |
|
76 | pthread_cond_signal(&ctx->queuePushCond); | |
|
77 | job.function(job.opaque); | |
|
78 | } | |
|
79 | } | |
|
80 | /* Unreachable */ | |
|
81 | } | |
|
82 | ||
|
83 | POOL_ctx *POOL_create(size_t numThreads, size_t queueSize) { | |
|
84 | POOL_ctx *ctx; | |
|
85 | /* Check the parameters */ | |
|
86 | if (!numThreads || !queueSize) { return NULL; } | |
|
87 | /* Allocate the context and zero initialize */ | |
|
88 | ctx = (POOL_ctx *)calloc(1, sizeof(POOL_ctx)); | |
|
89 | if (!ctx) { return NULL; } | |
|
90 | /* Initialize the job queue. | |
|
91 | * It needs one extra space since one space is wasted to differentiate empty | |
|
92 | * and full queues. | |
|
93 | */ | |
|
94 | ctx->queueSize = queueSize + 1; | |
|
95 | ctx->queue = (POOL_job *)malloc(ctx->queueSize * sizeof(POOL_job)); | |
|
96 | ctx->queueHead = 0; | |
|
97 | ctx->queueTail = 0; | |
|
98 | pthread_mutex_init(&ctx->queueMutex, NULL); | |
|
99 | pthread_cond_init(&ctx->queuePushCond, NULL); | |
|
100 | pthread_cond_init(&ctx->queuePopCond, NULL); | |
|
101 | ctx->shutdown = 0; | |
|
102 | /* Allocate space for the thread handles */ | |
|
103 | ctx->threads = (pthread_t *)malloc(numThreads * sizeof(pthread_t)); | |
|
104 | ctx->numThreads = 0; | |
|
105 | /* Check for errors */ | |
|
106 | if (!ctx->threads || !ctx->queue) { POOL_free(ctx); return NULL; } | |
|
107 | /* Initialize the threads */ | |
|
108 | { size_t i; | |
|
109 | for (i = 0; i < numThreads; ++i) { | |
|
110 | if (pthread_create(&ctx->threads[i], NULL, &POOL_thread, ctx)) { | |
|
111 | ctx->numThreads = i; | |
|
112 | POOL_free(ctx); | |
|
113 | return NULL; | |
|
114 | } } | |
|
115 | ctx->numThreads = numThreads; | |
|
116 | } | |
|
117 | return ctx; | |
|
118 | } | |
|
119 | ||
|
120 | /*! POOL_join() : | |
|
121 | Shutdown the queue, wake any sleeping threads, and join all of the threads. | |
|
122 | */ | |
|
123 | static void POOL_join(POOL_ctx *ctx) { | |
|
124 | /* Shut down the queue */ | |
|
125 | pthread_mutex_lock(&ctx->queueMutex); | |
|
126 | ctx->shutdown = 1; | |
|
127 | pthread_mutex_unlock(&ctx->queueMutex); | |
|
128 | /* Wake up sleeping threads */ | |
|
129 | pthread_cond_broadcast(&ctx->queuePushCond); | |
|
130 | pthread_cond_broadcast(&ctx->queuePopCond); | |
|
131 | /* Join all of the threads */ | |
|
132 | { size_t i; | |
|
133 | for (i = 0; i < ctx->numThreads; ++i) { | |
|
134 | pthread_join(ctx->threads[i], NULL); | |
|
135 | } } | |
|
136 | } | |
|
137 | ||
|
138 | void POOL_free(POOL_ctx *ctx) { | |
|
139 | if (!ctx) { return; } | |
|
140 | POOL_join(ctx); | |
|
141 | pthread_mutex_destroy(&ctx->queueMutex); | |
|
142 | pthread_cond_destroy(&ctx->queuePushCond); | |
|
143 | pthread_cond_destroy(&ctx->queuePopCond); | |
|
144 | if (ctx->queue) free(ctx->queue); | |
|
145 | if (ctx->threads) free(ctx->threads); | |
|
146 | free(ctx); | |
|
147 | } | |
|
148 | ||
|
149 | void POOL_add(void *ctxVoid, POOL_function function, void *opaque) { | |
|
150 | POOL_ctx *ctx = (POOL_ctx *)ctxVoid; | |
|
151 | if (!ctx) { return; } | |
|
152 | ||
|
153 | pthread_mutex_lock(&ctx->queueMutex); | |
|
154 | { POOL_job const job = {function, opaque}; | |
|
155 | /* Wait until there is space in the queue for the new job */ | |
|
156 | size_t newTail = (ctx->queueTail + 1) % ctx->queueSize; | |
|
157 | while (ctx->queueHead == newTail && !ctx->shutdown) { | |
|
158 | pthread_cond_wait(&ctx->queuePushCond, &ctx->queueMutex); | |
|
159 | newTail = (ctx->queueTail + 1) % ctx->queueSize; | |
|
160 | } | |
|
161 | /* The queue is still going => there is space */ | |
|
162 | if (!ctx->shutdown) { | |
|
163 | ctx->queue[ctx->queueTail] = job; | |
|
164 | ctx->queueTail = newTail; | |
|
165 | } | |
|
166 | } | |
|
167 | pthread_mutex_unlock(&ctx->queueMutex); | |
|
168 | pthread_cond_signal(&ctx->queuePopCond); | |
|
169 | } | |
|
170 | ||
|
171 | #else /* ZSTD_MULTITHREAD not defined */ | |
|
172 | /* No multi-threading support */ | |
|
173 | ||
|
174 | /* We don't need any data, but if it is empty malloc() might return NULL. */ | |
|
175 | struct POOL_ctx_s { | |
|
176 | int data; | |
|
177 | }; | |
|
178 | ||
|
179 | POOL_ctx *POOL_create(size_t numThreads, size_t queueSize) { | |
|
180 | (void)numThreads; | |
|
181 | (void)queueSize; | |
|
182 | return (POOL_ctx *)malloc(sizeof(POOL_ctx)); | |
|
183 | } | |
|
184 | ||
|
185 | void POOL_free(POOL_ctx *ctx) { | |
|
186 | if (ctx) free(ctx); | |
|
187 | } | |
|
188 | ||
|
189 | void POOL_add(void *ctx, POOL_function function, void *opaque) { | |
|
190 | (void)ctx; | |
|
191 | function(opaque); | |
|
192 | } | |
|
193 | ||
|
194 | #endif /* ZSTD_MULTITHREAD */ |
@@ -0,0 +1,56 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Facebook, Inc. | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This source code is licensed under the BSD-style license found in the | |
|
6 | * LICENSE file in the root directory of this source tree. An additional grant | |
|
7 | * of patent rights can be found in the PATENTS file in the same directory. | |
|
8 | */ | |
|
#ifndef POOL_H
#define POOL_H

#if defined (__cplusplus)
extern "C" {
#endif


#include <stddef.h>   /* size_t */

/* Opaque thread-pool handle. */
typedef struct POOL_ctx_s POOL_ctx;

/*! POOL_create() :
    Create a thread pool with at most `numThreads` threads.
    `numThreads` must be at least 1.
    The maximum number of queued jobs before blocking is `queueSize`.
    `queueSize` must be at least 1.
    @return : The POOL_ctx pointer on success else NULL.
*/
POOL_ctx *POOL_create(size_t numThreads, size_t queueSize);

/*! POOL_free() :
    Free a thread pool returned by POOL_create().
*/
void POOL_free(POOL_ctx *ctx);

/*! POOL_function :
    The function type that can be added to a thread pool.
*/
typedef void (*POOL_function)(void *);

/*! POOL_add_function :
    The function type for a generic thread pool add function.
*/
typedef void (*POOL_add_function)(void *, POOL_function, void *);

/*! POOL_add() :
    Add the job `function(opaque)` to the thread pool.
    Possibly blocks until there is room in the queue.
    Note : The function may be executed asynchronously, so `opaque` must
    live until the function has been completed.
*/
void POOL_add(void *ctx, POOL_function function, void *opaque);


#if defined (__cplusplus)
}
#endif

#endif
@@ -0,0 +1,79 b'' | |||
|
1 | ||
|
2 | /** | |
|
3 | * Copyright (c) 2016 Tino Reichardt | |
|
4 | * All rights reserved. | |
|
5 | * | |
|
6 | * This source code is licensed under the BSD-style license found in the | |
|
7 | * LICENSE file in the root directory of this source tree. An additional grant | |
|
8 | * of patent rights can be found in the PATENTS file in the same directory. | |
|
9 | * | |
|
10 | * You can contact the author at: | |
|
11 | * - zstdmt source repository: https://github.com/mcmilk/zstdmt | |
|
12 | */ | |
|
13 | ||
|
14 | /** | |
|
15 | * This file will hold wrapper for systems, which do not support pthreads | |
|
16 | */ | |
|
17 | ||
|
18 | /* ====== Compiler specifics ====== */ | |
|
19 | #if defined(_MSC_VER) | |
|
20 | # pragma warning(disable : 4206) /* disable: C4206: translation unit is empty (when ZSTD_MULTITHREAD is not defined) */ | |
|
21 | #endif | |
|
22 | ||
|
23 | ||
|
24 | #if defined(ZSTD_MULTITHREAD) && defined(_WIN32) | |
|
25 | ||
|
26 | /** | |
|
27 | * Windows minimalist Pthread Wrapper, based on : | |
|
28 | * http://www.cse.wustl.edu/~schmidt/win32-cv-1.html | |
|
29 | */ | |
|
30 | ||
|
31 | ||
|
32 | /* === Dependencies === */ | |
|
33 | #include <process.h> | |
|
34 | #include <errno.h> | |
|
35 | #include "threading.h" | |
|
36 | ||
|
37 | ||
|
38 | /* === Implementation === */ | |
|
39 | ||
|
40 | static unsigned __stdcall worker(void *arg) | |
|
41 | { | |
|
42 | pthread_t* const thread = (pthread_t*) arg; | |
|
43 | thread->arg = thread->start_routine(thread->arg); | |
|
44 | return 0; | |
|
45 | } | |
|
46 | ||
|
47 | int pthread_create(pthread_t* thread, const void* unused, | |
|
48 | void* (*start_routine) (void*), void* arg) | |
|
49 | { | |
|
50 | (void)unused; | |
|
51 | thread->arg = arg; | |
|
52 | thread->start_routine = start_routine; | |
|
53 | thread->handle = (HANDLE) _beginthreadex(NULL, 0, worker, thread, 0, NULL); | |
|
54 | ||
|
55 | if (!thread->handle) | |
|
56 | return errno; | |
|
57 | else | |
|
58 | return 0; | |
|
59 | } | |
|
60 | ||
|
61 | int _pthread_join(pthread_t * thread, void **value_ptr) | |
|
62 | { | |
|
63 | DWORD result; | |
|
64 | ||
|
65 | if (!thread->handle) return 0; | |
|
66 | ||
|
67 | result = WaitForSingleObject(thread->handle, INFINITE); | |
|
68 | switch (result) { | |
|
69 | case WAIT_OBJECT_0: | |
|
70 | if (value_ptr) *value_ptr = thread->arg; | |
|
71 | return 0; | |
|
72 | case WAIT_ABANDONED: | |
|
73 | return EINVAL; | |
|
74 | default: | |
|
75 | return GetLastError(); | |
|
76 | } | |
|
77 | } | |
|
78 | ||
|
79 | #endif /* ZSTD_MULTITHREAD */ |
@@ -0,0 +1,104 b'' | |||
|
1 | ||
|
2 | /** | |
|
3 | * Copyright (c) 2016 Tino Reichardt | |
|
4 | * All rights reserved. | |
|
5 | * | |
|
6 | * This source code is licensed under the BSD-style license found in the | |
|
7 | * LICENSE file in the root directory of this source tree. An additional grant | |
|
8 | * of patent rights can be found in the PATENTS file in the same directory. | |
|
9 | * | |
|
10 | * You can contact the author at: | |
|
11 | * - zstdmt source repository: https://github.com/mcmilk/zstdmt | |
|
12 | */ | |
|
13 | ||
|
#ifndef THREADING_H_938743
#define THREADING_H_938743

#if defined (__cplusplus)
extern "C" {
#endif

#if defined(ZSTD_MULTITHREAD) && defined(_WIN32)

/**
 * Windows minimalist Pthread Wrapper, based on :
 * http://www.cse.wustl.edu/~schmidt/win32-cv-1.html
 */

/* Native condition variables require Vista or later;
 * pin the target API level accordingly. */
#ifdef WINVER
#  undef WINVER
#endif
#define WINVER 0x0600

#ifdef _WIN32_WINNT
#  undef _WIN32_WINNT
#endif
#define _WIN32_WINNT 0x0600

#ifndef WIN32_LEAN_AND_MEAN
#  define WIN32_LEAN_AND_MEAN
#endif

#include <windows.h>

/* mutex : mapped onto a CRITICAL_SECTION (never fails, not inter-process) */
#define pthread_mutex_t           CRITICAL_SECTION
#define pthread_mutex_init(a,b)   InitializeCriticalSection((a))
#define pthread_mutex_destroy(a)  DeleteCriticalSection((a))
#define pthread_mutex_lock(a)     EnterCriticalSection((a))
#define pthread_mutex_unlock(a)   LeaveCriticalSection((a))

/* condition variable : native Win32 CONDITION_VARIABLE (no destructor) */
#define pthread_cond_t            CONDITION_VARIABLE
#define pthread_cond_init(a, b)   InitializeConditionVariable((a))
#define pthread_cond_destroy(a)   /* No delete */
#define pthread_cond_wait(a, b)   SleepConditionVariableCS((a), (b), INFINITE)
#define pthread_cond_signal(a)    WakeConditionVariable((a))
#define pthread_cond_broadcast(a) WakeAllConditionVariable((a))

/* pthread_create() and pthread_join() :
 * the struct keeps the start routine and its argument alongside the
 * HANDLE so the worker trampoline and join can reach them. */
typedef struct {
    HANDLE handle;
    void* (*start_routine)(void*);
    void* arg;
} pthread_t;

int pthread_create(pthread_t* thread, const void* unused,
                   void* (*start_routine) (void*), void* arg);

#define pthread_join(a, b) _pthread_join(&(a), (b))
int _pthread_join(pthread_t* thread, void** value_ptr);

/**
 * add here more wrappers as required
 */


#elif defined(ZSTD_MULTITHREAD) /* posix assumed ; need a better detection method */
/* === POSIX Systems === */
#  include <pthread.h>

#else  /* ZSTD_MULTITHREAD not defined */
/* No multithreading support : every primitive becomes a no-op. */

#define pthread_mutex_t int   /* #define rather than typedef, as sometimes pthread support is implicit, resulting in duplicated symbols */
#define pthread_mutex_init(a,b)
#define pthread_mutex_destroy(a)
#define pthread_mutex_lock(a)
#define pthread_mutex_unlock(a)

#define pthread_cond_t int
#define pthread_cond_init(a,b)
#define pthread_cond_destroy(a)
#define pthread_cond_wait(a,b)
#define pthread_cond_signal(a)
#define pthread_cond_broadcast(a)

/* do not use pthread_t */

#endif /* ZSTD_MULTITHREAD */

#if defined (__cplusplus)
}
#endif

#endif /* THREADING_H_938743 */
This diff has been collapsed as it changes many lines, (740 lines changed) Show them Hide them | |||
@@ -0,0 +1,740 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Yann Collet, Facebook, Inc. | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This source code is licensed under the BSD-style license found in the | |
|
6 | * LICENSE file in the root directory of this source tree. An additional grant | |
|
7 | * of patent rights can be found in the PATENTS file in the same directory. | |
|
8 | */ | |
|
9 | ||
|
10 | ||
|
11 | /* ====== Tuning parameters ====== */ | |
|
12 | #define ZSTDMT_NBTHREADS_MAX 128 | |
|
13 | ||
|
14 | ||
|
15 | /* ====== Compiler specifics ====== */ | |
|
16 | #if defined(_MSC_VER) | |
|
17 | # pragma warning(disable : 4204) /* disable: C4204: non-constant aggregate initializer */ | |
|
18 | #endif | |
|
19 | ||
|
20 | ||
|
21 | /* ====== Dependencies ====== */ | |
|
22 | #include <stdlib.h> /* malloc */ | |
|
23 | #include <string.h> /* memcpy */ | |
|
24 | #include "pool.h" /* threadpool */ | |
|
25 | #include "threading.h" /* mutex */ | |
|
26 | #include "zstd_internal.h" /* MIN, ERROR, ZSTD_*, ZSTD_highbit32 */ | |
|
27 | #include "zstdmt_compress.h" | |
|
28 | #define XXH_STATIC_LINKING_ONLY /* XXH64_state_t */ | |
|
29 | #include "xxhash.h" | |
|
30 | ||
|
31 | ||
|
32 | /* ====== Debug ====== */ | |
|
33 | #if 0 | |
|
34 | ||
|
35 | # include <stdio.h> | |
|
36 | # include <unistd.h> | |
|
37 | # include <sys/times.h> | |
|
38 | static unsigned g_debugLevel = 3; | |
|
39 | # define DEBUGLOGRAW(l, ...) if (l<=g_debugLevel) { fprintf(stderr, __VA_ARGS__); } | |
|
40 | # define DEBUGLOG(l, ...) if (l<=g_debugLevel) { fprintf(stderr, __FILE__ ": "); fprintf(stderr, __VA_ARGS__); fprintf(stderr, " \n"); } | |
|
41 | ||
|
42 | # define DEBUG_PRINTHEX(l,p,n) { \ | |
|
43 | unsigned debug_u; \ | |
|
44 | for (debug_u=0; debug_u<(n); debug_u++) \ | |
|
45 | DEBUGLOGRAW(l, "%02X ", ((const unsigned char*)(p))[debug_u]); \ | |
|
46 | DEBUGLOGRAW(l, " \n"); \ | |
|
47 | } | |
|
48 | ||
|
49 | static unsigned long long GetCurrentClockTimeMicroseconds() | |
|
50 | { | |
|
51 | static clock_t _ticksPerSecond = 0; | |
|
52 | if (_ticksPerSecond <= 0) _ticksPerSecond = sysconf(_SC_CLK_TCK); | |
|
53 | ||
|
54 | struct tms junk; clock_t newTicks = (clock_t) times(&junk); | |
|
55 | return ((((unsigned long long)newTicks)*(1000000))/_ticksPerSecond); | |
|
56 | } | |
|
57 | ||
|
58 | #define MUTEX_WAIT_TIME_DLEVEL 5 | |
|
59 | #define PTHREAD_MUTEX_LOCK(mutex) \ | |
|
60 | if (g_debugLevel>=MUTEX_WAIT_TIME_DLEVEL) { \ | |
|
61 | unsigned long long beforeTime = GetCurrentClockTimeMicroseconds(); \ | |
|
62 | pthread_mutex_lock(mutex); \ | |
|
63 | unsigned long long afterTime = GetCurrentClockTimeMicroseconds(); \ | |
|
64 | unsigned long long elapsedTime = (afterTime-beforeTime); \ | |
|
65 | if (elapsedTime > 1000) { /* or whatever threshold you like; I'm using 1 millisecond here */ \ | |
|
66 | DEBUGLOG(MUTEX_WAIT_TIME_DLEVEL, "Thread took %llu microseconds to acquire mutex %s \n", \ | |
|
67 | elapsedTime, #mutex); \ | |
|
68 | } \ | |
|
69 | } else pthread_mutex_lock(mutex); | |
|
70 | ||
|
71 | #else | |
|
72 | ||
|
73 | # define DEBUGLOG(l, ...) {} /* disabled */ | |
|
74 | # define PTHREAD_MUTEX_LOCK(m) pthread_mutex_lock(m) | |
|
75 | # define DEBUG_PRINTHEX(l,p,n) {} | |
|
76 | ||
|
77 | #endif | |
|
78 | ||
|
79 | ||
|
80 | /* ===== Buffer Pool ===== */ | |
|
81 | ||
|
/* A raw allocation: base pointer plus capacity in bytes. */
typedef struct buffer_s {
    void* start;
    size_t size;
} buffer_t;

static const buffer_t g_nullBuffer = { NULL, 0 };

/* LIFO cache of reusable buffers (accessed from the main thread only). */
typedef struct ZSTDMT_bufferPool_s {
    unsigned totalBuffers;
    unsigned nbBuffers;
    buffer_t bTable[1];   /* variable size */
} ZSTDMT_bufferPool;
|
94 | ||
|
95 | static ZSTDMT_bufferPool* ZSTDMT_createBufferPool(unsigned nbThreads) | |
|
96 | { | |
|
97 | unsigned const maxNbBuffers = 2*nbThreads + 2; | |
|
98 | ZSTDMT_bufferPool* const bufPool = (ZSTDMT_bufferPool*)calloc(1, sizeof(ZSTDMT_bufferPool) + (maxNbBuffers-1) * sizeof(buffer_t)); | |
|
99 | if (bufPool==NULL) return NULL; | |
|
100 | bufPool->totalBuffers = maxNbBuffers; | |
|
101 | bufPool->nbBuffers = 0; | |
|
102 | return bufPool; | |
|
103 | } | |
|
104 | ||
|
105 | static void ZSTDMT_freeBufferPool(ZSTDMT_bufferPool* bufPool) | |
|
106 | { | |
|
107 | unsigned u; | |
|
108 | if (!bufPool) return; /* compatibility with free on NULL */ | |
|
109 | for (u=0; u<bufPool->totalBuffers; u++) | |
|
110 | free(bufPool->bTable[u].start); | |
|
111 | free(bufPool); | |
|
112 | } | |
|
113 | ||
|
114 | /* assumption : invocation from main thread only ! */ | |
|
115 | static buffer_t ZSTDMT_getBuffer(ZSTDMT_bufferPool* pool, size_t bSize) | |
|
116 | { | |
|
117 | if (pool->nbBuffers) { /* try to use an existing buffer */ | |
|
118 | buffer_t const buf = pool->bTable[--(pool->nbBuffers)]; | |
|
119 | size_t const availBufferSize = buf.size; | |
|
120 | if ((availBufferSize >= bSize) & (availBufferSize <= 10*bSize)) /* large enough, but not too much */ | |
|
121 | return buf; | |
|
122 | free(buf.start); /* size conditions not respected : scratch this buffer and create a new one */ | |
|
123 | } | |
|
124 | /* create new buffer */ | |
|
125 | { buffer_t buffer; | |
|
126 | void* const start = malloc(bSize); | |
|
127 | if (start==NULL) bSize = 0; | |
|
128 | buffer.start = start; /* note : start can be NULL if malloc fails ! */ | |
|
129 | buffer.size = bSize; | |
|
130 | return buffer; | |
|
131 | } | |
|
132 | } | |
|
133 | ||
|
134 | /* store buffer for later re-use, up to pool capacity */ | |
|
135 | static void ZSTDMT_releaseBuffer(ZSTDMT_bufferPool* pool, buffer_t buf) | |
|
136 | { | |
|
137 | if (buf.start == NULL) return; /* release on NULL */ | |
|
138 | if (pool->nbBuffers < pool->totalBuffers) { | |
|
139 | pool->bTable[pool->nbBuffers++] = buf; /* store for later re-use */ | |
|
140 | return; | |
|
141 | } | |
|
142 | /* Reached bufferPool capacity (should not happen) */ | |
|
143 | free(buf.start); | |
|
144 | } | |
|
145 | ||
|
146 | ||
|
147 | /* ===== CCtx Pool ===== */ | |
|
148 | ||
|
149 | typedef struct { | |
|
150 | unsigned totalCCtx; | |
|
151 | unsigned availCCtx; | |
|
152 | ZSTD_CCtx* cctx[1]; /* variable size */ | |
|
153 | } ZSTDMT_CCtxPool; | |
|
154 | ||
|
155 | /* assumption : CCtxPool invocation only from main thread */ | |
|
156 | ||
|
157 | /* note : all CCtx borrowed from the pool should be released back to the pool _before_ freeing the pool */ | |
|
158 | static void ZSTDMT_freeCCtxPool(ZSTDMT_CCtxPool* pool) | |
|
159 | { | |
|
160 | unsigned u; | |
|
161 | for (u=0; u<pool->totalCCtx; u++) | |
|
162 | ZSTD_freeCCtx(pool->cctx[u]); /* note : compatible with free on NULL */ | |
|
163 | free(pool); | |
|
164 | } | |
|
165 | ||
|
166 | /* ZSTDMT_createCCtxPool() : | |
|
167 | * implies nbThreads >= 1 , checked by caller ZSTDMT_createCCtx() */ | |
|
168 | static ZSTDMT_CCtxPool* ZSTDMT_createCCtxPool(unsigned nbThreads) | |
|
169 | { | |
|
170 | ZSTDMT_CCtxPool* const cctxPool = (ZSTDMT_CCtxPool*) calloc(1, sizeof(ZSTDMT_CCtxPool) + (nbThreads-1)*sizeof(ZSTD_CCtx*)); | |
|
171 | if (!cctxPool) return NULL; | |
|
172 | cctxPool->totalCCtx = nbThreads; | |
|
173 | cctxPool->availCCtx = 1; /* at least one cctx for single-thread mode */ | |
|
174 | cctxPool->cctx[0] = ZSTD_createCCtx(); | |
|
175 | if (!cctxPool->cctx[0]) { ZSTDMT_freeCCtxPool(cctxPool); return NULL; } | |
|
176 | DEBUGLOG(1, "cctxPool created, with %u threads", nbThreads); | |
|
177 | return cctxPool; | |
|
178 | } | |
|
179 | ||
|
180 | static ZSTD_CCtx* ZSTDMT_getCCtx(ZSTDMT_CCtxPool* pool) | |
|
181 | { | |
|
182 | if (pool->availCCtx) { | |
|
183 | pool->availCCtx--; | |
|
184 | return pool->cctx[pool->availCCtx]; | |
|
185 | } | |
|
186 | return ZSTD_createCCtx(); /* note : can be NULL, when creation fails ! */ | |
|
187 | } | |
|
188 | ||
|
189 | static void ZSTDMT_releaseCCtx(ZSTDMT_CCtxPool* pool, ZSTD_CCtx* cctx) | |
|
190 | { | |
|
191 | if (cctx==NULL) return; /* compatibility with release on NULL */ | |
|
192 | if (pool->availCCtx < pool->totalCCtx) | |
|
193 | pool->cctx[pool->availCCtx++] = cctx; | |
|
194 | else | |
|
195 | /* pool overflow : should not happen, since totalCCtx==nbThreads */ | |
|
196 | ZSTD_freeCCtx(cctx); | |
|
197 | } | |
|
198 | ||
|
199 | ||
|
200 | /* ===== Thread worker ===== */ | |
|
201 | ||
|
202 | typedef struct { | |
|
203 | buffer_t buffer; | |
|
204 | size_t filled; | |
|
205 | } inBuff_t; | |
|
206 | ||
|
207 | typedef struct { | |
|
208 | ZSTD_CCtx* cctx; | |
|
209 | buffer_t src; | |
|
210 | const void* srcStart; | |
|
211 | size_t srcSize; | |
|
212 | size_t dictSize; | |
|
213 | buffer_t dstBuff; | |
|
214 | size_t cSize; | |
|
215 | size_t dstFlushed; | |
|
216 | unsigned firstChunk; | |
|
217 | unsigned lastChunk; | |
|
218 | unsigned jobCompleted; | |
|
219 | unsigned jobScanned; | |
|
220 | pthread_mutex_t* jobCompleted_mutex; | |
|
221 | pthread_cond_t* jobCompleted_cond; | |
|
222 | ZSTD_parameters params; | |
|
223 | ZSTD_CDict* cdict; | |
|
224 | unsigned long long fullFrameSize; | |
|
225 | } ZSTDMT_jobDescription; | |
|
226 | ||
|
227 | /* ZSTDMT_compressChunk() : POOL_function type */ | |
|
228 | void ZSTDMT_compressChunk(void* jobDescription) | |
|
229 | { | |
|
230 | ZSTDMT_jobDescription* const job = (ZSTDMT_jobDescription*)jobDescription; | |
|
231 | const void* const src = (const char*)job->srcStart + job->dictSize; | |
|
232 | buffer_t const dstBuff = job->dstBuff; | |
|
233 | DEBUGLOG(3, "job (first:%u) (last:%u) : dictSize %u, srcSize %u", job->firstChunk, job->lastChunk, (U32)job->dictSize, (U32)job->srcSize); | |
|
234 | if (job->cdict) { | |
|
235 | size_t const initError = ZSTD_compressBegin_usingCDict(job->cctx, job->cdict, job->fullFrameSize); | |
|
236 | if (job->cdict) DEBUGLOG(3, "using CDict "); | |
|
237 | if (ZSTD_isError(initError)) { job->cSize = initError; goto _endJob; } | |
|
238 | } else { | |
|
239 | size_t const initError = ZSTD_compressBegin_advanced(job->cctx, job->srcStart, job->dictSize, job->params, job->fullFrameSize); | |
|
240 | if (ZSTD_isError(initError)) { job->cSize = initError; goto _endJob; } | |
|
241 | ZSTD_setCCtxParameter(job->cctx, ZSTD_p_forceWindow, 1); | |
|
242 | } | |
|
243 | if (!job->firstChunk) { /* flush frame header */ | |
|
244 | size_t const hSize = ZSTD_compressContinue(job->cctx, dstBuff.start, dstBuff.size, src, 0); | |
|
245 | if (ZSTD_isError(hSize)) { job->cSize = hSize; goto _endJob; } | |
|
246 | ZSTD_invalidateRepCodes(job->cctx); | |
|
247 | } | |
|
248 | ||
|
249 | DEBUGLOG(4, "Compressing : "); | |
|
250 | DEBUG_PRINTHEX(4, job->srcStart, 12); | |
|
251 | job->cSize = (job->lastChunk) ? /* last chunk signal */ | |
|
252 | ZSTD_compressEnd (job->cctx, dstBuff.start, dstBuff.size, src, job->srcSize) : | |
|
253 | ZSTD_compressContinue(job->cctx, dstBuff.start, dstBuff.size, src, job->srcSize); | |
|
254 | DEBUGLOG(3, "compressed %u bytes into %u bytes (first:%u) (last:%u)", (unsigned)job->srcSize, (unsigned)job->cSize, job->firstChunk, job->lastChunk); | |
|
255 | ||
|
256 | _endJob: | |
|
257 | PTHREAD_MUTEX_LOCK(job->jobCompleted_mutex); | |
|
258 | job->jobCompleted = 1; | |
|
259 | job->jobScanned = 0; | |
|
260 | pthread_cond_signal(job->jobCompleted_cond); | |
|
261 | pthread_mutex_unlock(job->jobCompleted_mutex); | |
|
262 | } | |
|
263 | ||
|
264 | ||
|
265 | /* ------------------------------------------ */ | |
|
266 | /* ===== Multi-threaded compression ===== */ | |
|
267 | /* ------------------------------------------ */ | |
|
268 | ||
|
/* Multi-threaded compression context.
 * Owns the worker pool, the buffer/cctx pools, the streaming input buffer,
 * and a ring of job descriptors indexed with `& jobIDMask`. */
struct ZSTDMT_CCtx_s {
    POOL_ctx* factory;                 /* worker-thread pool */
    ZSTDMT_bufferPool* buffPool;       /* recyclable dst/src buffers */
    ZSTDMT_CCtxPool* cctxPool;         /* recyclable per-job compression contexts */
    pthread_mutex_t jobCompleted_mutex; /* protects jobs[].jobCompleted */
    pthread_cond_t jobCompleted_cond;   /* signalled by workers on job completion */
    size_t targetSectionSize;          /* compression job input size (streaming) */
    size_t marginSize;                 /* extra input-buffer slack beyond section size */
    size_t inBuffSize;                 /* = targetDictSize + targetSectionSize + marginSize */
    size_t dictSize;                   /* current overlap prefix kept at start of inBuff */
    size_t targetDictSize;             /* desired overlap size, derived from overlapRLog */
    inBuff_t inBuff;                   /* streaming input accumulation buffer */
    ZSTD_parameters params;            /* parameters of the current frame */
    XXH64_state_t xxhState;            /* running checksum, when fParams.checksumFlag is set */
    unsigned nbThreads;
    unsigned jobIDMask;                /* nbJobs-1 ; nbJobs is a power of 2 */
    unsigned doneJobID;                /* next job to flush (monotonic; masked for indexing) */
    unsigned nextJobID;                /* next job to post (monotonic; masked for indexing) */
    unsigned frameEnded;               /* 1 when the closing job has been posted */
    unsigned allJobsCompleted;         /* 1 when frame fully compressed and flushed */
    unsigned overlapRLog;              /* overlap = windowSize >> overlapRLog ; >=9 means no overlap */
    unsigned long long frameContentSize; /* pledged source size ; 0 == unknown */
    size_t sectionSize;                /* user-requested section size ; 0 == auto */
    ZSTD_CDict* cdict;                 /* digested dictionary, used by the first section only */
    ZSTD_CStream* cstream;             /* single-thread fallback, allocated only when nbThreads==1 */
    ZSTDMT_jobDescription jobs[1];     /* variable size (must lie at the end) */
};
|
296 | ||
|
297 | ZSTDMT_CCtx *ZSTDMT_createCCtx(unsigned nbThreads) | |
|
298 | { | |
|
299 | ZSTDMT_CCtx* cctx; | |
|
300 | U32 const minNbJobs = nbThreads + 2; | |
|
301 | U32 const nbJobsLog2 = ZSTD_highbit32(minNbJobs) + 1; | |
|
302 | U32 const nbJobs = 1 << nbJobsLog2; | |
|
303 | DEBUGLOG(5, "nbThreads : %u ; minNbJobs : %u ; nbJobsLog2 : %u ; nbJobs : %u \n", | |
|
304 | nbThreads, minNbJobs, nbJobsLog2, nbJobs); | |
|
305 | if ((nbThreads < 1) | (nbThreads > ZSTDMT_NBTHREADS_MAX)) return NULL; | |
|
306 | cctx = (ZSTDMT_CCtx*) calloc(1, sizeof(ZSTDMT_CCtx) + nbJobs*sizeof(ZSTDMT_jobDescription)); | |
|
307 | if (!cctx) return NULL; | |
|
308 | cctx->nbThreads = nbThreads; | |
|
309 | cctx->jobIDMask = nbJobs - 1; | |
|
310 | cctx->allJobsCompleted = 1; | |
|
311 | cctx->sectionSize = 0; | |
|
312 | cctx->overlapRLog = 3; | |
|
313 | cctx->factory = POOL_create(nbThreads, 1); | |
|
314 | cctx->buffPool = ZSTDMT_createBufferPool(nbThreads); | |
|
315 | cctx->cctxPool = ZSTDMT_createCCtxPool(nbThreads); | |
|
316 | if (!cctx->factory | !cctx->buffPool | !cctx->cctxPool) { /* one object was not created */ | |
|
317 | ZSTDMT_freeCCtx(cctx); | |
|
318 | return NULL; | |
|
319 | } | |
|
320 | if (nbThreads==1) { | |
|
321 | cctx->cstream = ZSTD_createCStream(); | |
|
322 | if (!cctx->cstream) { | |
|
323 | ZSTDMT_freeCCtx(cctx); return NULL; | |
|
324 | } } | |
|
325 | pthread_mutex_init(&cctx->jobCompleted_mutex, NULL); /* Todo : check init function return */ | |
|
326 | pthread_cond_init(&cctx->jobCompleted_cond, NULL); | |
|
327 | DEBUGLOG(4, "mt_cctx created, for %u threads \n", nbThreads); | |
|
328 | return cctx; | |
|
329 | } | |
|
330 | ||
|
331 | /* ZSTDMT_releaseAllJobResources() : | |
|
332 | * Ensure all workers are killed first. */ | |
|
333 | static void ZSTDMT_releaseAllJobResources(ZSTDMT_CCtx* mtctx) | |
|
334 | { | |
|
335 | unsigned jobID; | |
|
336 | for (jobID=0; jobID <= mtctx->jobIDMask; jobID++) { | |
|
337 | ZSTDMT_releaseBuffer(mtctx->buffPool, mtctx->jobs[jobID].dstBuff); | |
|
338 | mtctx->jobs[jobID].dstBuff = g_nullBuffer; | |
|
339 | ZSTDMT_releaseBuffer(mtctx->buffPool, mtctx->jobs[jobID].src); | |
|
340 | mtctx->jobs[jobID].src = g_nullBuffer; | |
|
341 | ZSTDMT_releaseCCtx(mtctx->cctxPool, mtctx->jobs[jobID].cctx); | |
|
342 | mtctx->jobs[jobID].cctx = NULL; | |
|
343 | } | |
|
344 | memset(mtctx->jobs, 0, (mtctx->jobIDMask+1)*sizeof(ZSTDMT_jobDescription)); | |
|
345 | ZSTDMT_releaseBuffer(mtctx->buffPool, mtctx->inBuff.buffer); | |
|
346 | mtctx->inBuff.buffer = g_nullBuffer; | |
|
347 | mtctx->allJobsCompleted = 1; | |
|
348 | } | |
|
349 | ||
|
/* ZSTDMT_freeCCtx() :
 * Release a multi-threaded compression context and everything it owns.
 * Teardown order matters : workers are stopped first, then any in-flight
 * job resources are returned to the pools, and only then are the pools
 * themselves destroyed.
 * @return : 0 (never fails). */
size_t ZSTDMT_freeCCtx(ZSTDMT_CCtx* mtctx)
{
    if (mtctx==NULL) return 0;   /* compatible with free on NULL */
    POOL_free(mtctx->factory);   /* joins all worker threads : no job can run past this point */
    if (!mtctx->allJobsCompleted) ZSTDMT_releaseAllJobResources(mtctx);   /* stop workers first */
    ZSTDMT_freeBufferPool(mtctx->buffPool);  /* release job resources into pools first */
    ZSTDMT_freeCCtxPool(mtctx->cctxPool);
    ZSTD_freeCDict(mtctx->cdict);    /* NULL-tolerant */
    ZSTD_freeCStream(mtctx->cstream);   /* only non-NULL in single-thread mode */
    pthread_mutex_destroy(&mtctx->jobCompleted_mutex);
    pthread_cond_destroy(&mtctx->jobCompleted_cond);
    free(mtctx);
    return 0;
}
|
364 | ||
|
365 | size_t ZSTDMT_setMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSDTMT_parameter parameter, unsigned value) | |
|
366 | { | |
|
367 | switch(parameter) | |
|
368 | { | |
|
369 | case ZSTDMT_p_sectionSize : | |
|
370 | mtctx->sectionSize = value; | |
|
371 | return 0; | |
|
372 | case ZSTDMT_p_overlapSectionLog : | |
|
373 | DEBUGLOG(4, "ZSTDMT_p_overlapSectionLog : %u", value); | |
|
374 | mtctx->overlapRLog = (value >= 9) ? 0 : 9 - value; | |
|
375 | return 0; | |
|
376 | default : | |
|
377 | return ERROR(compressionParameter_unsupported); | |
|
378 | } | |
|
379 | } | |
|
380 | ||
|
381 | ||
|
382 | /* ------------------------------------------ */ | |
|
383 | /* ===== Multi-threaded compression ===== */ | |
|
384 | /* ------------------------------------------ */ | |
|
385 | ||
|
/* ZSTDMT_compressCCtx() :
 * One-shot multi-threaded compression of src into dst.
 * The input is split into nbChunks sections (at most one per thread), each
 * posted as an independent job to the worker pool ; results are then collected
 * in order and concatenated into dst. Chunk 0 compresses directly into dst ;
 * later chunks use pool buffers and are memcpy'd in afterwards.
 * @return : compressed size written into dst, or an error code. */
size_t ZSTDMT_compressCCtx(ZSTDMT_CCtx* mtctx,
                           void* dst, size_t dstCapacity,
                           const void* src, size_t srcSize,
                           int compressionLevel)
{
    ZSTD_parameters params = ZSTD_getParams(compressionLevel, srcSize, 0);
    size_t const chunkTargetSize = (size_t)1 << (params.cParams.windowLog + 2);   /* 4x window per chunk */
    unsigned const nbChunksMax = (unsigned)(srcSize / chunkTargetSize) + (srcSize < chunkTargetSize) /* min 1 */;
    unsigned nbChunks = MIN(nbChunksMax, mtctx->nbThreads);
    size_t const proposedChunkSize = (srcSize + (nbChunks-1)) / nbChunks;   /* ceil division */
    size_t const avgChunkSize = ((proposedChunkSize & 0x1FFFF) < 0xFFFF) ? proposedChunkSize + 0xFFFF : proposedChunkSize;   /* avoid too small last block */
    size_t remainingSrcSize = srcSize;
    const char* const srcStart = (const char*)src;
    size_t frameStartPos = 0;   /* running offset of the next chunk within src */

    DEBUGLOG(3, "windowLog : %2u => chunkTargetSize : %u bytes ", params.cParams.windowLog, (U32)chunkTargetSize);
    DEBUGLOG(2, "nbChunks : %2u (chunkSize : %u bytes) ", nbChunks, (U32)avgChunkSize);
    params.fParams.contentSizeFlag = 1;   /* srcSize is known : record it in the frame header */

    if (nbChunks==1) {   /* fallback to single-thread mode */
        ZSTD_CCtx* const cctx = mtctx->cctxPool->cctx[0];
        return ZSTD_compressCCtx(cctx, dst, dstCapacity, src, srcSize, compressionLevel);
    }

    /* phase 1 : post one job per chunk */
    { unsigned u;
    for (u=0; u<nbChunks; u++) {
        size_t const chunkSize = MIN(remainingSrcSize, avgChunkSize);
        size_t const dstBufferCapacity = u ? ZSTD_compressBound(chunkSize) : dstCapacity;
        buffer_t const dstAsBuffer = { dst, dstCapacity };
        buffer_t const dstBuffer = u ? ZSTDMT_getBuffer(mtctx->buffPool, dstBufferCapacity) : dstAsBuffer;
        ZSTD_CCtx* const cctx = ZSTDMT_getCCtx(mtctx->cctxPool);

        if ((cctx==NULL) || (dstBuffer.start==NULL)) {
            /* mark this slot as an already-"completed" failed job so phase 2 reports it */
            mtctx->jobs[u].cSize = ERROR(memory_allocation); /* job result */
            mtctx->jobs[u].jobCompleted = 1;
            nbChunks = u+1;
            break; /* let's wait for previous jobs to complete, but don't start new ones */
        }

        mtctx->jobs[u].srcStart = srcStart + frameStartPos;
        mtctx->jobs[u].srcSize = chunkSize;
        mtctx->jobs[u].fullFrameSize = srcSize;
        mtctx->jobs[u].params = params;
        mtctx->jobs[u].dstBuff = dstBuffer;
        mtctx->jobs[u].cctx = cctx;
        mtctx->jobs[u].firstChunk = (u==0);   /* only the first chunk keeps the frame header */
        mtctx->jobs[u].lastChunk = (u==nbChunks-1);   /* the last chunk closes the frame */
        mtctx->jobs[u].jobCompleted = 0;
        mtctx->jobs[u].jobCompleted_mutex = &mtctx->jobCompleted_mutex;
        mtctx->jobs[u].jobCompleted_cond = &mtctx->jobCompleted_cond;

        DEBUGLOG(3, "posting job %u (%u bytes)", u, (U32)chunkSize);
        DEBUG_PRINTHEX(3, mtctx->jobs[u].srcStart, 12);
        POOL_add(mtctx->factory, ZSTDMT_compressChunk, &mtctx->jobs[u]);

        frameStartPos += chunkSize;
        remainingSrcSize -= chunkSize;
    } }
    /* note : since nbChunks <= nbThreads, all jobs should be running immediately in parallel */

    /* phase 2 : collect results in order and concatenate into dst */
    { unsigned chunkID;
    size_t error = 0, dstPos = 0;
    for (chunkID=0; chunkID<nbChunks; chunkID++) {
        DEBUGLOG(3, "waiting for chunk %u ", chunkID);
        PTHREAD_MUTEX_LOCK(&mtctx->jobCompleted_mutex);
        while (mtctx->jobs[chunkID].jobCompleted==0) {
            DEBUGLOG(4, "waiting for jobCompleted signal from chunk %u", chunkID);
            pthread_cond_wait(&mtctx->jobCompleted_cond, &mtctx->jobCompleted_mutex);
        }
        pthread_mutex_unlock(&mtctx->jobCompleted_mutex);
        DEBUGLOG(3, "ready to write chunk %u ", chunkID);

        ZSTDMT_releaseCCtx(mtctx->cctxPool, mtctx->jobs[chunkID].cctx);
        mtctx->jobs[chunkID].cctx = NULL;
        mtctx->jobs[chunkID].srcStart = NULL;
        { size_t const cSize = mtctx->jobs[chunkID].cSize;
            if (ZSTD_isError(cSize)) error = cSize;   /* first error sticks ; later buffers still released */
            if ((!error) && (dstPos + cSize > dstCapacity)) error = ERROR(dstSize_tooSmall);
            if (chunkID) { /* note : chunk 0 is already written directly into dst */
                if (!error) memcpy((char*)dst + dstPos, mtctx->jobs[chunkID].dstBuff.start, cSize);
                ZSTDMT_releaseBuffer(mtctx->buffPool, mtctx->jobs[chunkID].dstBuff);
                mtctx->jobs[chunkID].dstBuff = g_nullBuffer;
            }
            dstPos += cSize ;
        }
    }
    if (!error) DEBUGLOG(3, "compressed size : %u ", (U32)dstPos);
    return error ? error : dstPos;
    }

}
|
477 | ||
|
478 | ||
|
479 | /* ====================================== */ | |
|
480 | /* ======= Streaming API ======= */ | |
|
481 | /* ====================================== */ | |
|
482 | ||
|
483 | static void ZSTDMT_waitForAllJobsCompleted(ZSTDMT_CCtx* zcs) { | |
|
484 | while (zcs->doneJobID < zcs->nextJobID) { | |
|
485 | unsigned const jobID = zcs->doneJobID & zcs->jobIDMask; | |
|
486 | PTHREAD_MUTEX_LOCK(&zcs->jobCompleted_mutex); | |
|
487 | while (zcs->jobs[jobID].jobCompleted==0) { | |
|
488 | DEBUGLOG(4, "waiting for jobCompleted signal from chunk %u", zcs->doneJobID); /* we want to block when waiting for data to flush */ | |
|
489 | pthread_cond_wait(&zcs->jobCompleted_cond, &zcs->jobCompleted_mutex); | |
|
490 | } | |
|
491 | pthread_mutex_unlock(&zcs->jobCompleted_mutex); | |
|
492 | zcs->doneJobID++; | |
|
493 | } | |
|
494 | } | |
|
495 | ||
|
496 | ||
|
/* ZSTDMT_initCStream_internal() :
 * (Re)initialize the streaming state for a new frame.
 * updateDict : when non-zero, rebuild the internal CDict from dict/dictSize
 *              (a reset keeps the previous one by passing 0).
 * pledgedSrcSize : optional ; 0 == unknown.
 * Derives targetDictSize / targetSectionSize / inBuffSize from the current
 * params and overlapRLog, allocates the input buffer, and resets job counters.
 * @return : 0 on success, or an error code. */
static size_t ZSTDMT_initCStream_internal(ZSTDMT_CCtx* zcs,
                    const void* dict, size_t dictSize, unsigned updateDict,
                    ZSTD_parameters params, unsigned long long pledgedSrcSize)
{
    ZSTD_customMem const cmem = { NULL, NULL, NULL };
    DEBUGLOG(3, "Started new compression, with windowLog : %u", params.cParams.windowLog);
    if (zcs->nbThreads==1) return ZSTD_initCStream_advanced(zcs->cstream, dict, dictSize, params, pledgedSrcSize);
    if (zcs->allJobsCompleted == 0) {   /* previous job not correctly finished */
        ZSTDMT_waitForAllJobsCompleted(zcs);
        ZSTDMT_releaseAllJobResources(zcs);
        zcs->allJobsCompleted = 1;
    }
    zcs->params = params;
    if (updateDict) {
        ZSTD_freeCDict(zcs->cdict); zcs->cdict = NULL;   /* drop previous dictionary, if any */
        if (dict && dictSize) {
            zcs->cdict = ZSTD_createCDict_advanced(dict, dictSize, 0, params, cmem);
            if (zcs->cdict == NULL) return ERROR(memory_allocation);
    }   }
    zcs->frameContentSize = pledgedSrcSize;
    /* overlap = window >> overlapRLog ; overlapRLog>=9 disables overlap entirely */
    zcs->targetDictSize = (zcs->overlapRLog>=9) ? 0 : (size_t)1 << (zcs->params.cParams.windowLog - zcs->overlapRLog);
    DEBUGLOG(4, "overlapRLog : %u ", zcs->overlapRLog);
    DEBUGLOG(3, "overlap Size : %u KB", (U32)(zcs->targetDictSize>>10));
    /* section size : explicit user setting, else 4x window ; clamped from below */
    zcs->targetSectionSize = zcs->sectionSize ? zcs->sectionSize : (size_t)1 << (zcs->params.cParams.windowLog + 2);
    zcs->targetSectionSize = MAX(ZSTDMT_SECTION_SIZE_MIN, zcs->targetSectionSize);
    zcs->targetSectionSize = MAX(zcs->targetDictSize, zcs->targetSectionSize);
    DEBUGLOG(3, "Section Size : %u KB", (U32)(zcs->targetSectionSize>>10));
    zcs->marginSize = zcs->targetSectionSize >> 2;   /* 25% slack so input can accumulate while jobs run */
    zcs->inBuffSize = zcs->targetDictSize + zcs->targetSectionSize + zcs->marginSize;
    zcs->inBuff.buffer = ZSTDMT_getBuffer(zcs->buffPool, zcs->inBuffSize);
    if (zcs->inBuff.buffer.start == NULL) return ERROR(memory_allocation);
    zcs->inBuff.filled = 0;
    zcs->dictSize = 0;
    zcs->doneJobID = 0;
    zcs->nextJobID = 0;
    zcs->frameEnded = 0;
    zcs->allJobsCompleted = 0;
    if (params.fParams.checksumFlag) XXH64_reset(&zcs->xxhState, 0);
    return 0;
}
|
537 | ||
|
538 | size_t ZSTDMT_initCStream_advanced(ZSTDMT_CCtx* zcs, | |
|
539 | const void* dict, size_t dictSize, | |
|
540 | ZSTD_parameters params, unsigned long long pledgedSrcSize) | |
|
541 | { | |
|
542 | return ZSTDMT_initCStream_internal(zcs, dict, dictSize, 1, params, pledgedSrcSize); | |
|
543 | } | |
|
544 | ||
|
545 | /* ZSTDMT_resetCStream() : | |
|
546 | * pledgedSrcSize is optional and can be zero == unknown */ | |
|
547 | size_t ZSTDMT_resetCStream(ZSTDMT_CCtx* zcs, unsigned long long pledgedSrcSize) | |
|
548 | { | |
|
549 | if (zcs->nbThreads==1) return ZSTD_resetCStream(zcs->cstream, pledgedSrcSize); | |
|
550 | return ZSTDMT_initCStream_internal(zcs, NULL, 0, 0, zcs->params, pledgedSrcSize); | |
|
551 | } | |
|
552 | ||
|
553 | size_t ZSTDMT_initCStream(ZSTDMT_CCtx* zcs, int compressionLevel) { | |
|
554 | ZSTD_parameters const params = ZSTD_getParams(compressionLevel, 0, 0); | |
|
555 | return ZSTDMT_initCStream_internal(zcs, NULL, 0, 1, params, 0); | |
|
556 | } | |
|
557 | ||
|
558 | ||
|
/* ZSTDMT_createCompressionJob() :
 * Package the first `srcSize` bytes after the overlap prefix of inBuff into a
 * new job, post it to the worker pool, and (unless endFrame) roll any unread
 * input plus the new overlap prefix into a fresh input buffer.
 * endFrame : when non-zero, this job closes the frame (ZSTD_compressEnd path).
 * On allocation failure the job slot is marked completed so that waiters do
 * not block forever, and all pending resources are drained.
 * @return : 0 on success, or an error code. */
static size_t ZSTDMT_createCompressionJob(ZSTDMT_CCtx* zcs, size_t srcSize, unsigned endFrame)
{
    size_t const dstBufferCapacity = ZSTD_compressBound(srcSize);
    buffer_t const dstBuffer = ZSTDMT_getBuffer(zcs->buffPool, dstBufferCapacity);
    ZSTD_CCtx* const cctx = ZSTDMT_getCCtx(zcs->cctxPool);
    unsigned const jobID = zcs->nextJobID & zcs->jobIDMask;   /* ring-buffer slot */

    if ((cctx==NULL) || (dstBuffer.start==NULL)) {
        /* mark slot completed so ZSTDMT_waitForAllJobsCompleted() can pass it */
        zcs->jobs[jobID].jobCompleted = 1;
        zcs->nextJobID++;
        ZSTDMT_waitForAllJobsCompleted(zcs);
        ZSTDMT_releaseAllJobResources(zcs);
        return ERROR(memory_allocation);
    }

    DEBUGLOG(4, "preparing job %u to compress %u bytes with %u preload ", zcs->nextJobID, (U32)srcSize, (U32)zcs->dictSize);
    zcs->jobs[jobID].src = zcs->inBuff.buffer;   /* job takes ownership of the current input buffer */
    zcs->jobs[jobID].srcStart = zcs->inBuff.buffer.start;
    zcs->jobs[jobID].srcSize = srcSize;
    zcs->jobs[jobID].dictSize = zcs->dictSize; /* note : zcs->inBuff.filled is presumed >= srcSize + dictSize */
    zcs->jobs[jobID].params = zcs->params;
    if (zcs->nextJobID) zcs->jobs[jobID].params.fParams.checksumFlag = 0;  /* do not calculate checksum within sections, just keep it in header for first section */
    zcs->jobs[jobID].cdict = zcs->nextJobID==0 ? zcs->cdict : NULL;   /* CDict applies to the first section only */
    zcs->jobs[jobID].fullFrameSize = zcs->frameContentSize;
    zcs->jobs[jobID].dstBuff = dstBuffer;
    zcs->jobs[jobID].cctx = cctx;
    zcs->jobs[jobID].firstChunk = (zcs->nextJobID==0);
    zcs->jobs[jobID].lastChunk = endFrame;
    zcs->jobs[jobID].jobCompleted = 0;
    zcs->jobs[jobID].dstFlushed = 0;
    zcs->jobs[jobID].jobCompleted_mutex = &zcs->jobCompleted_mutex;
    zcs->jobs[jobID].jobCompleted_cond = &zcs->jobCompleted_cond;

    /* get a new buffer for next input */
    if (!endFrame) {
        /* carry forward up to targetDictSize bytes of already-seen input as overlap */
        size_t const newDictSize = MIN(srcSize + zcs->dictSize, zcs->targetDictSize);
        zcs->inBuff.buffer = ZSTDMT_getBuffer(zcs->buffPool, zcs->inBuffSize);
        if (zcs->inBuff.buffer.start == NULL) {   /* not enough memory to allocate next input buffer */
            zcs->jobs[jobID].jobCompleted = 1;
            zcs->nextJobID++;
            ZSTDMT_waitForAllJobsCompleted(zcs);
            ZSTDMT_releaseAllJobResources(zcs);
            return ERROR(memory_allocation);
        }
        DEBUGLOG(5, "inBuff filled to %u", (U32)zcs->inBuff.filled);
        /* filled now counts : new overlap prefix + input not yet dispatched */
        zcs->inBuff.filled -= srcSize + zcs->dictSize - newDictSize;
        DEBUGLOG(5, "new job : filled to %u, with %u dict and %u src", (U32)zcs->inBuff.filled, (U32)newDictSize, (U32)(zcs->inBuff.filled - newDictSize));
        /* copy tail of the old buffer (overlap + unread input) to the front of the new one ;
         * the old buffer now belongs to the posted job */
        memmove(zcs->inBuff.buffer.start, (const char*)zcs->jobs[jobID].srcStart + zcs->dictSize + srcSize - newDictSize, zcs->inBuff.filled);
        DEBUGLOG(5, "new inBuff pre-filled");
        zcs->dictSize = newDictSize;
    } else {   /* closing job : no further input expected */
        zcs->inBuff.buffer = g_nullBuffer;
        zcs->inBuff.filled = 0;
        zcs->dictSize = 0;
        zcs->frameEnded = 1;
        if (zcs->nextJobID == 0)
            zcs->params.fParams.checksumFlag = 0;   /* single chunk : checksum is calculated directly within worker thread */
    }

    DEBUGLOG(3, "posting job %u : %u bytes (end:%u) (note : doneJob = %u=>%u)", zcs->nextJobID, (U32)zcs->jobs[jobID].srcSize, zcs->jobs[jobID].lastChunk, zcs->doneJobID, zcs->doneJobID & zcs->jobIDMask);
    POOL_add(zcs->factory, ZSTDMT_compressChunk, &zcs->jobs[jobID]);   /* this call is blocking when thread worker pool is exhausted */
    zcs->nextJobID++;
    return 0;
}
|
623 | ||
|
624 | ||
|
/* ZSTDMT_flushNextJob() :
 * output : will be updated with amount of data flushed .
 * blockToFlush : if >0, the function will block and wait if there is no data available to flush .
 * On the first visit to a completed job ("scan"), its cctx and src buffer are
 * released, the running checksum is updated, and (on the last job of a
 * checksummed frame) the 4-byte checksum is appended. The job's output is then
 * copied into `output`, possibly across several calls when `output` is small.
 * @return : amount of data remaining within internal buffer, 1 if unknown but > 0, 0 if no more, or an error code */
static size_t ZSTDMT_flushNextJob(ZSTDMT_CCtx* zcs, ZSTD_outBuffer* output, unsigned blockToFlush)
{
    unsigned const wJobID = zcs->doneJobID & zcs->jobIDMask;
    if (zcs->doneJobID == zcs->nextJobID) return 0;   /* all flushed ! */
    PTHREAD_MUTEX_LOCK(&zcs->jobCompleted_mutex);
    while (zcs->jobs[wJobID].jobCompleted==0) {
        DEBUGLOG(5, "waiting for jobCompleted signal from job %u", zcs->doneJobID);
        if (!blockToFlush) { pthread_mutex_unlock(&zcs->jobCompleted_mutex); return 0; }  /* nothing ready to be flushed => skip */
        pthread_cond_wait(&zcs->jobCompleted_cond, &zcs->jobCompleted_mutex);  /* block when nothing available to flush */
    }
    pthread_mutex_unlock(&zcs->jobCompleted_mutex);
    /* compression job completed : output can be flushed */
    {   ZSTDMT_jobDescription job = zcs->jobs[wJobID];   /* local copy ; persistent fields are written back explicitly below */
        if (!job.jobScanned) {   /* first visit to this completed job */
            if (ZSTD_isError(job.cSize)) {   /* compression error : drain everything and report */
                DEBUGLOG(5, "compression error detected ");
                ZSTDMT_waitForAllJobsCompleted(zcs);
                ZSTDMT_releaseAllJobResources(zcs);
                return job.cSize;
            }
            ZSTDMT_releaseCCtx(zcs->cctxPool, job.cctx);
            zcs->jobs[wJobID].cctx = NULL;
            DEBUGLOG(5, "zcs->params.fParams.checksumFlag : %u ", zcs->params.fParams.checksumFlag);
            if (zcs->params.fParams.checksumFlag) {
                XXH64_update(&zcs->xxhState, (const char*)job.srcStart + job.dictSize, job.srcSize);
                if (zcs->frameEnded && (zcs->doneJobID+1 == zcs->nextJobID)) {   /* write checksum at end of last section */
                    U32 const checksum = (U32)XXH64_digest(&zcs->xxhState);
                    DEBUGLOG(4, "writing checksum : %08X \n", checksum);
                    MEM_writeLE32((char*)job.dstBuff.start + job.cSize, checksum);
                    job.cSize += 4;
                    zcs->jobs[wJobID].cSize += 4;   /* keep the stored job in sync with the local copy */
            }   }
            ZSTDMT_releaseBuffer(zcs->buffPool, job.src);
            zcs->jobs[wJobID].srcStart = NULL;
            zcs->jobs[wJobID].src = g_nullBuffer;
            zcs->jobs[wJobID].jobScanned = 1;   /* skip the scan phase on subsequent partial flushes */
        }
        {   size_t const toWrite = MIN(job.cSize - job.dstFlushed, output->size - output->pos);
            DEBUGLOG(4, "Flushing %u bytes from job %u ", (U32)toWrite, zcs->doneJobID);
            memcpy((char*)output->dst + output->pos, (const char*)job.dstBuff.start + job.dstFlushed, toWrite);
            output->pos += toWrite;
            job.dstFlushed += toWrite;
        }
        if (job.dstFlushed == job.cSize) {   /* output buffer fully flushed => move to next one */
            ZSTDMT_releaseBuffer(zcs->buffPool, job.dstBuff);
            zcs->jobs[wJobID].dstBuff = g_nullBuffer;
            zcs->jobs[wJobID].jobCompleted = 0;   /* slot reusable for a future job */
            zcs->doneJobID++;
        } else {
            zcs->jobs[wJobID].dstFlushed = job.dstFlushed;   /* partial flush : remember progress */
        }
        /* return value : how many bytes left in buffer ; fake it to 1 if unknown but >0 */
        if (job.cSize > job.dstFlushed) return (job.cSize - job.dstFlushed);
        if (zcs->doneJobID < zcs->nextJobID) return 1;   /* still some buffer to flush */
        zcs->allJobsCompleted = zcs->frameEnded;   /* frame completed and entirely flushed */
        return 0;   /* everything flushed */
}   }
|
686 | ||
|
687 | ||
|
688 | size_t ZSTDMT_compressStream(ZSTDMT_CCtx* zcs, ZSTD_outBuffer* output, ZSTD_inBuffer* input) | |
|
689 | { | |
|
690 | size_t const newJobThreshold = zcs->dictSize + zcs->targetSectionSize + zcs->marginSize; | |
|
691 | if (zcs->frameEnded) return ERROR(stage_wrong); /* current frame being ended. Only flush is allowed. Restart with init */ | |
|
692 | if (zcs->nbThreads==1) return ZSTD_compressStream(zcs->cstream, output, input); | |
|
693 | ||
|
694 | /* fill input buffer */ | |
|
695 | { size_t const toLoad = MIN(input->size - input->pos, zcs->inBuffSize - zcs->inBuff.filled); | |
|
696 | memcpy((char*)zcs->inBuff.buffer.start + zcs->inBuff.filled, input->src, toLoad); | |
|
697 | input->pos += toLoad; | |
|
698 | zcs->inBuff.filled += toLoad; | |
|
699 | } | |
|
700 | ||
|
701 | if ( (zcs->inBuff.filled >= newJobThreshold) /* filled enough : let's compress */ | |
|
702 | && (zcs->nextJobID <= zcs->doneJobID + zcs->jobIDMask) ) { /* avoid overwriting job round buffer */ | |
|
703 | CHECK_F( ZSTDMT_createCompressionJob(zcs, zcs->targetSectionSize, 0) ); | |
|
704 | } | |
|
705 | ||
|
706 | /* check for data to flush */ | |
|
707 | CHECK_F( ZSTDMT_flushNextJob(zcs, output, (zcs->inBuff.filled == zcs->inBuffSize)) ); /* block if it wasn't possible to create new job due to saturation */ | |
|
708 | ||
|
709 | /* recommended next input size : fill current input buffer */ | |
|
710 | return zcs->inBuffSize - zcs->inBuff.filled; /* note : could be zero when input buffer is fully filled and no more availability to create new job */ | |
|
711 | } | |
|
712 | ||
|
713 | ||
|
/* ZSTDMT_flushStream_internal() :
 * Shared body of flushStream/endStream.
 * Posts one job with whatever input is pending (or an empty closing job when
 * endFrame is set and the frame is not yet closed), provided a job slot is
 * free ; then blocks flushing the oldest completed job into *output.
 * @return : same convention as ZSTDMT_flushNextJob (0 == fully flushed). */
static size_t ZSTDMT_flushStream_internal(ZSTDMT_CCtx* zcs, ZSTD_outBuffer* output, unsigned endFrame)
{
    size_t const srcSize = zcs->inBuff.filled - zcs->dictSize;   /* pending input beyond the overlap prefix */

    if (srcSize) DEBUGLOG(4, "flushing : %u bytes left to compress", (U32)srcSize);
    if ( ((srcSize > 0) || (endFrame && !zcs->frameEnded))
       && (zcs->nextJobID <= zcs->doneJobID + zcs->jobIDMask) ) {   /* job ring not saturated */
        CHECK_F( ZSTDMT_createCompressionJob(zcs, srcSize, endFrame) );
    }

    /* check if there is any data available to flush */
    DEBUGLOG(5, "zcs->doneJobID : %u ; zcs->nextJobID : %u ", zcs->doneJobID, zcs->nextJobID);
    return ZSTDMT_flushNextJob(zcs, output, 1);   /* blocking flush */
}
|
728 | ||
|
729 | ||
|
730 | size_t ZSTDMT_flushStream(ZSTDMT_CCtx* zcs, ZSTD_outBuffer* output) | |
|
731 | { | |
|
732 | if (zcs->nbThreads==1) return ZSTD_flushStream(zcs->cstream, output); | |
|
733 | return ZSTDMT_flushStream_internal(zcs, output, 0); | |
|
734 | } | |
|
735 | ||
|
736 | size_t ZSTDMT_endStream(ZSTDMT_CCtx* zcs, ZSTD_outBuffer* output) | |
|
737 | { | |
|
738 | if (zcs->nbThreads==1) return ZSTD_endStream(zcs->cstream, output); | |
|
739 | return ZSTDMT_flushStream_internal(zcs, output, 1); | |
|
740 | } |
@@ -0,0 +1,78 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Yann Collet, Facebook, Inc. | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This source code is licensed under the BSD-style license found in the | |
|
6 | * LICENSE file in the root directory of this source tree. An additional grant | |
|
7 | * of patent rights can be found in the PATENTS file in the same directory. | |
|
8 | */ | |
|
9 | ||
|
#ifndef ZSTDMT_COMPRESS_H
#define ZSTDMT_COMPRESS_H

#if defined (__cplusplus)
extern "C" {
#endif


/* Note : All prototypes defined in this file shall be considered experimental.
 *        There is no guarantee of API continuity (yet) on any of these prototypes */

/* ===   Dependencies   === */
#include <stddef.h>                /* size_t */
#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_parameters */
#include "zstd.h"                  /* ZSTD_inBuffer, ZSTD_outBuffer, ZSTDLIB_API */


/* ===   Simple one-pass functions   === */

typedef struct ZSTDMT_CCtx_s ZSTDMT_CCtx;
ZSTDLIB_API ZSTDMT_CCtx* ZSTDMT_createCCtx(unsigned nbThreads);
ZSTDLIB_API size_t ZSTDMT_freeCCtx(ZSTDMT_CCtx* cctx);   /* accepts NULL ; always returns 0 */

/* one-shot multi-threaded compression of src into dst.
 * @return : compressed size, or an error code (ZSTD_isError()) */
ZSTDLIB_API size_t ZSTDMT_compressCCtx(ZSTDMT_CCtx* cctx,
                           void* dst, size_t dstCapacity,
                     const void* src, size_t srcSize,
                           int compressionLevel);


/* ===   Streaming functions   === */

ZSTDLIB_API size_t ZSTDMT_initCStream(ZSTDMT_CCtx* mtctx, int compressionLevel);
ZSTDLIB_API size_t ZSTDMT_resetCStream(ZSTDMT_CCtx* mtctx, unsigned long long pledgedSrcSize);  /**< pledgedSrcSize is optional and can be zero == unknown */

ZSTDLIB_API size_t ZSTDMT_compressStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output, ZSTD_inBuffer* input);

ZSTDLIB_API size_t ZSTDMT_flushStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output);   /**< @return : 0 == all flushed; >0 : still some data to be flushed; or an error code (ZSTD_isError()) */
ZSTDLIB_API size_t ZSTDMT_endStream(ZSTDMT_CCtx* mtctx, ZSTD_outBuffer* output);     /**< @return : 0 == all flushed; >0 : still some data to be flushed; or an error code (ZSTD_isError()) */


/* ===   Advanced functions and parameters   === */

#ifndef ZSTDMT_SECTION_SIZE_MIN
#  define ZSTDMT_SECTION_SIZE_MIN (1U << 20)   /* 1 MB - Minimum size of each compression job */
#endif

ZSTDLIB_API size_t ZSTDMT_initCStream_advanced(ZSTDMT_CCtx* mtctx, const void* dict, size_t dictSize,  /**< dict can be released after init, a local copy is preserved within zcs */
                                          ZSTD_parameters params, unsigned long long pledgedSrcSize);  /**< pledgedSrcSize is optional and can be zero == unknown */

/* ZSDTMT_parameter :
 * List of parameters that can be set using ZSTDMT_setMTCtxParameter().
 * NOTE(review): "ZSDTMT" looks like a typo for "ZSTDMT", but the identifier is
 * part of the public API, so renaming it here would break existing callers. */
typedef enum {
    ZSTDMT_p_sectionSize,       /* size of input "section". Each section is compressed in parallel. 0 means default, which is dynamically determined within compression functions */
    ZSTDMT_p_overlapSectionLog  /* Log of overlapped section; 0 == no overlap, 6(default) == use 1/8th of window, >=9 == use full window */
} ZSDTMT_parameter;

/* ZSTDMT_setMTCtxParameter() :
 * allow setting individual parameters, one at a time, among a list of enums defined in ZSTDMT_parameter.
 * The function must be called typically after ZSTD_createCCtx().
 * Parameters not explicitly reset by ZSTDMT_init*() remain the same in consecutive compression sessions.
 * @return : 0, or an error code (which can be tested using ZSTD_isError()) */
ZSTDLIB_API size_t ZSTDMT_setMTCtxParameter(ZSTDMT_CCtx* mtctx, ZSDTMT_parameter parameter, unsigned value);


#if defined (__cplusplus)
}
#endif

#endif   /* ZSTDMT_COMPRESS_H */
This diff has been collapsed as it changes many lines, (1021 lines changed) Show them Hide them | |||
@@ -0,0 +1,1021 b'' | |||
|
1 | /** | |
|
2 | * Copyright (c) 2016-present, Yann Collet, Facebook, Inc. | |
|
3 | * All rights reserved. | |
|
4 | * | |
|
5 | * This source code is licensed under the BSD-style license found in the | |
|
6 | * LICENSE file in the root directory of this source tree. An additional grant | |
|
7 | * of patent rights can be found in the PATENTS file in the same directory. | |
|
8 | */ | |
|
9 | ||
|
10 | /*-************************************* | |
|
11 | * Dependencies | |
|
12 | ***************************************/ | |
|
13 | #include <stdio.h> /* fprintf */ | |
|
14 | #include <stdlib.h> /* malloc, free, qsort */ | |
|
15 | #include <string.h> /* memset */ | |
|
16 | #include <time.h> /* clock */ | |
|
17 | ||
|
18 | #include "mem.h" /* read */ | |
|
19 | #include "pool.h" | |
|
20 | #include "threading.h" | |
|
21 | #include "zstd_internal.h" /* includes zstd.h */ | |
|
22 | #ifndef ZDICT_STATIC_LINKING_ONLY | |
|
23 | #define ZDICT_STATIC_LINKING_ONLY | |
|
24 | #endif | |
|
25 | #include "zdict.h" | |
|
26 | ||
|
/*-*************************************
*  Constants
***************************************/
/* Upper bound on the combined size of all training samples.
 * 64-bit builds allow up to 4 GiB - 1; 32-bit builds are capped at 1 GB so
 * the suffix array and scratch buffers stay addressable.
 * (GB is a byte-size multiplier macro from mem.h.) */
#define COVER_MAX_SAMPLES_SIZE (sizeof(size_t) == 8 ? ((U32)-1) : ((U32)1 GB))
|
31 | ||
|
/*-*************************************
*  Console display
***************************************/
/* Global notification level; 0: no display, 1: errors, 2: default,
 * 3: details, 4: debug. */
static int g_displayLevel = 2;

/* Print to stderr and flush immediately so progress is visible even when
 * stderr is block-buffered.
 * All display macros are wrapped in do { } while (0) so they behave as a
 * single statement; the previous bare-brace / bare-if forms could silently
 * change control flow when used as `if (x) DISPLAYLEVEL(...); else ...`. */
#define DISPLAY(...)                                                           \
  do {                                                                         \
    fprintf(stderr, __VA_ARGS__);                                              \
    fflush(stderr);                                                            \
  } while (0)
#define LOCALDISPLAYLEVEL(displayLevel, l, ...)                                \
  do {                                                                         \
    if (displayLevel >= l) {                                                   \
      DISPLAY(__VA_ARGS__);                                                    \
    }                                                                          \
  } while (0) /* 0 : no display;   1: errors;   2: default;  3: details;  4: debug */
#define DISPLAYLEVEL(l, ...) LOCALDISPLAYLEVEL(g_displayLevel, l, __VA_ARGS__)

/* Rate-limited progress display: prints at most once per refreshRate
 * interval, except at debug level (>= 4) where it always prints. */
#define LOCALDISPLAYUPDATE(displayLevel, l, ...)                               \
  do {                                                                         \
    if (displayLevel >= l) {                                                   \
      if ((clock() - g_time > refreshRate) || (displayLevel >= 4)) {           \
        g_time = clock();                                                      \
        DISPLAY(__VA_ARGS__);                                                  \
        if (displayLevel >= 4)                                                 \
          fflush(stdout);                                                      \
      }                                                                        \
    }                                                                          \
  } while (0)
#define DISPLAYUPDATE(l, ...) LOCALDISPLAYUPDATE(g_displayLevel, l, __VA_ARGS__)
/* Minimum interval between progress updates (~150 ms). */
static const clock_t refreshRate = CLOCKS_PER_SEC * 15 / 100;
static clock_t g_time = 0;
|
59 | ||
|
60 | /*-************************************* | |
|
61 | * Hash table | |
|
62 | *************************************** | |
|
63 | * A small specialized hash map for storing activeDmers. | |
|
64 | * The map does not resize, so if it becomes full it will loop forever. | |
|
65 | * Thus, the map must be large enough to store every value. | |
|
66 | * The map implements linear probing and keeps its load less than 0.5. | |
|
67 | */ | |
|
68 | ||
|
/* Sentinel stored in `value` to mark an empty slot. All-ones is chosen so the
 * whole table can be reset with a single memset of 0xFF bytes. */
#define MAP_EMPTY_VALUE ((U32)-1)
/* One slot of the hash table: a dmer id (key) mapped to its occurrence count
 * within the currently active segment (value). */
typedef struct COVER_map_pair_t_s {
  U32 key;
  U32 value;
} COVER_map_pair_t;
|
74 | ||
|
/* Fixed-capacity linear-probing hash map (see the "Hash table" notes above).
 * It never resizes, so it must be sized generously at init time. */
typedef struct COVER_map_s {
  COVER_map_pair_t *data; /* array of 2^sizeLog slots */
  U32 sizeLog;            /* log2 of the table size */
  U32 size;               /* number of slots == 1 << sizeLog */
  U32 sizeMask;           /* size - 1, used to wrap probe indices */
} COVER_map_t;
|
81 | ||
|
/**
 * Clear the map.
 * memset converts MAP_EMPTY_VALUE ((U32)-1) to the byte 0xFF and fills every
 * byte with it, which sets each slot's `value` (and `key`) to
 * MAP_EMPTY_VALUE — exactly how empty slots are recognized.
 */
static void COVER_map_clear(COVER_map_t *map) {
  memset(map->data, MAP_EMPTY_VALUE, map->size * sizeof(COVER_map_pair_t));
}
|
88 | ||
|
/**
 * Initializes a map of the given size.
 * Returns 1 on success and 0 on failure (allocation failure; the map is then
 * left with size 0).
 * The map must be destroyed with COVER_map_destroy().
 * The map is only guaranteed to be large enough to hold size elements.
 */
static int COVER_map_init(COVER_map_t *map, U32 size) {
  /* Over-allocate (highbit + 2 => roughly 4x `size`) so the load factor
   * stays below 0.5 — linear probing in this table never resizes, so it
   * would loop forever if the table filled up. */
  map->sizeLog = ZSTD_highbit32(size) + 2;
  map->size = (U32)1 << map->sizeLog;
  map->sizeMask = map->size - 1;
  map->data = (COVER_map_pair_t *)malloc(map->size * sizeof(COVER_map_pair_t));
  if (!map->data) {
    map->sizeLog = 0;
    map->size = 0;
    return 0;
  }
  COVER_map_clear(map);
  return 1;
}
|
108 | ||
|
/**
 * Internal hash function
 */
/* Knuth-style multiplicative hashing constant (~2^32 / golden ratio). */
static const U32 prime4bytes = 2654435761U;
static U32 COVER_map_hash(COVER_map_t *map, U32 key) {
  /* Multiply-shift hash: keep the top sizeLog bits of key * prime. */
  return (key * prime4bytes) >> (32 - map->sizeLog);
}
|
116 | ||
|
117 | /** | |
|
118 | * Helper function that returns the index that a key should be placed into. | |
|
119 | */ | |
|
120 | static U32 COVER_map_index(COVER_map_t *map, U32 key) { | |
|
121 | const U32 hash = COVER_map_hash(map, key); | |
|
122 | U32 i; | |
|
123 | for (i = hash;; i = (i + 1) & map->sizeMask) { | |
|
124 | COVER_map_pair_t *pos = &map->data[i]; | |
|
125 | if (pos->value == MAP_EMPTY_VALUE) { | |
|
126 | return i; | |
|
127 | } | |
|
128 | if (pos->key == key) { | |
|
129 | return i; | |
|
130 | } | |
|
131 | } | |
|
132 | } | |
|
133 | ||
|
134 | /** | |
|
135 | * Returns the pointer to the value for key. | |
|
136 | * If key is not in the map, it is inserted and the value is set to 0. | |
|
137 | * The map must not be full. | |
|
138 | */ | |
|
139 | static U32 *COVER_map_at(COVER_map_t *map, U32 key) { | |
|
140 | COVER_map_pair_t *pos = &map->data[COVER_map_index(map, key)]; | |
|
141 | if (pos->value == MAP_EMPTY_VALUE) { | |
|
142 | pos->key = key; | |
|
143 | pos->value = 0; | |
|
144 | } | |
|
145 | return &pos->value; | |
|
146 | } | |
|
147 | ||
|
/**
 * Deletes key from the map if present.
 * Uses backward-shift deletion: after vacating a slot, later entries of the
 * same probe chain are shifted back so lookups never encounter a "hole".
 */
static void COVER_map_remove(COVER_map_t *map, U32 key) {
  U32 i = COVER_map_index(map, key);
  /* `del` is the slot to be vacated. */
  COVER_map_pair_t *del = &map->data[i];
  /* Distance (in probes) between `del` and the slot under inspection. */
  U32 shift = 1;
  if (del->value == MAP_EMPTY_VALUE) {
    /* Key not present: nothing to do. */
    return;
  }
  for (i = (i + 1) & map->sizeMask;; i = (i + 1) & map->sizeMask) {
    COVER_map_pair_t *const pos = &map->data[i];
    /* If the position is empty we are done */
    if (pos->value == MAP_EMPTY_VALUE) {
      del->value = MAP_EMPTY_VALUE;
      return;
    }
    /* If pos can be moved to del do so.
     * pos may move only if that does not place it before its home (hash)
     * slot; the wrapped displacement test below checks exactly that. */
    if (((i - COVER_map_hash(map, pos->key)) & map->sizeMask) >= shift) {
      del->key = pos->key;
      del->value = pos->value;
      del = pos;
      shift = 1;
    } else {
      ++shift;
    }
  }
}
|
176 | ||
|
177 | /** | |
|
178 | * Destroyes a map that is inited with COVER_map_init(). | |
|
179 | */ | |
|
180 | static void COVER_map_destroy(COVER_map_t *map) { | |
|
181 | if (map->data) { | |
|
182 | free(map->data); | |
|
183 | } | |
|
184 | map->data = NULL; | |
|
185 | map->size = 0; | |
|
186 | } | |
|
187 | ||
|
/*-*************************************
* Context
***************************************/

/* Training state built once per `d` and then shared (read-only) by all
 * parameter trials. */
typedef struct {
  const BYTE *samples;        /* all training samples, concatenated */
  size_t *offsets;            /* nbSamples+1 entries; offsets[i] = start of sample i (offsets[0] == 0) */
  const size_t *samplesSizes; /* size of each individual sample */
  size_t nbSamples;
  U32 *suffix;                /* partial suffix array; repurposed as `freqs` after COVER_ctx_init() */
  size_t suffixSize;          /* totalSamplesSize - d + 1 */
  U32 *freqs;                 /* freqs[dmerId] = number of samples containing that dmer */
  U32 *dmerAt;                /* dmerAt[position] = dmerId of the dmer starting at that position */
  unsigned d;                 /* dmer length the suffix array was built for */
} COVER_ctx_t;

/* We need a global context for qsort... */
static COVER_ctx_t *g_ctx = NULL;
|
206 | ||
|
207 | /*-************************************* | |
|
208 | * Helper functions | |
|
209 | ***************************************/ | |
|
210 | ||
|
/**
 * Returns the sum of the sample sizes.
 * No overflow check: the caller bounds the total via COVER_MAX_SAMPLES_SIZE.
 */
static size_t COVER_sum(const size_t *samplesSizes, unsigned nbSamples) {
  size_t total = 0;
  unsigned idx = 0;
  while (idx < nbSamples) {
    total += samplesSizes[idx];
    ++idx;
  }
  return total;
}
|
222 | ||
|
/**
 * Returns -1 if the dmer at lp is less than the dmer at rp.
 * Return 0 if the dmers at lp and rp are equal.
 * Returns 1 if the dmer at lp is greater than the dmer at rp.
 * lp/rp point at U32 positions into ctx->samples; the first ctx->d bytes
 * starting at each position are compared.
 * NOTE: memcmp may return any negative/positive value, not strictly -1/1;
 * callers only rely on the sign.
 */
static int COVER_cmp(COVER_ctx_t *ctx, const void *lp, const void *rp) {
  const U32 lhs = *(const U32 *)lp;
  const U32 rhs = *(const U32 *)rp;
  return memcmp(ctx->samples + lhs, ctx->samples + rhs, ctx->d);
}
|
233 | ||
|
/**
 * Same as COVER_cmp() except ties are broken by pointer value, i.e. by the
 * element's address within the array being sorted — this is what keeps equal
 * dmers ordered by position.
 * NOTE: g_ctx must be set to call this function. A global is required because
 * qsort doesn't take an opaque pointer.
 */
static int COVER_strict_cmp(const void *lp, const void *rp) {
  int result = COVER_cmp(g_ctx, lp, rp);
  if (result == 0) {
    result = lp < rp ? -1 : 1;
  }
  return result;
}
|
246 | ||
|
/**
 * Returns the first pointer in [first, last) whose element does not compare
 * less than value. If no such element exists it returns last.
 * (Binary search equivalent to C++ std::lower_bound on a sorted range.)
 */
static const size_t *COVER_lower_bound(const size_t *first, const size_t *last,
                                       size_t value) {
  const size_t *lo = first;
  const size_t *hi = last;
  while (lo < hi) {
    const size_t *const mid = lo + (hi - lo) / 2;
    if (*mid < value) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}
|
267 | ||
|
268 | /** | |
|
269 | * Generic groupBy function. | |
|
270 | * Groups an array sorted by cmp into groups with equivalent values. | |
|
271 | * Calls grp for each group. | |
|
272 | */ | |
|
273 | static void | |
|
274 | COVER_groupBy(const void *data, size_t count, size_t size, COVER_ctx_t *ctx, | |
|
275 | int (*cmp)(COVER_ctx_t *, const void *, const void *), | |
|
276 | void (*grp)(COVER_ctx_t *, const void *, const void *)) { | |
|
277 | const BYTE *ptr = (const BYTE *)data; | |
|
278 | size_t num = 0; | |
|
279 | while (num < count) { | |
|
280 | const BYTE *grpEnd = ptr + size; | |
|
281 | ++num; | |
|
282 | while (num < count && cmp(ctx, ptr, grpEnd) == 0) { | |
|
283 | grpEnd += size; | |
|
284 | ++num; | |
|
285 | } | |
|
286 | grp(ctx, ptr, grpEnd); | |
|
287 | ptr = grpEnd; | |
|
288 | } | |
|
289 | } | |
|
290 | ||
|
291 | /*-************************************* | |
|
292 | * Cover functions | |
|
293 | ***************************************/ | |
|
294 | ||
|
/**
 * Called on each group of positions with the same dmer.
 * Counts the frequency of each dmer and saves it in the suffix array.
 * Fills `ctx->dmerAt`.
 * NOTE: positions within a group arrive in increasing order (the suffix sort
 * breaks ties by position), which is what makes the single forward pass over
 * the sample offsets below valid.
 */
static void COVER_group(COVER_ctx_t *ctx, const void *group,
                        const void *groupEnd) {
  /* The group consists of all the positions with the same first d bytes. */
  const U32 *grpPtr = (const U32 *)group;
  const U32 *grpEnd = (const U32 *)groupEnd;
  /* The dmerId is how we will reference this dmer.
   * This allows us to map the whole dmer space to a much smaller space, the
   * size of the suffix array.
   */
  const U32 dmerId = (U32)(grpPtr - ctx->suffix);
  /* Count the number of samples this dmer shows up in */
  U32 freq = 0;
  /* Details */
  const size_t *curOffsetPtr = ctx->offsets;
  const size_t *offsetsEnd = ctx->offsets + ctx->nbSamples;
  /* Once *grpPtr >= curSampleEnd this occurrence of the dmer is in a
   * different sample than the last.
   */
  size_t curSampleEnd = ctx->offsets[0];
  for (; grpPtr != grpEnd; ++grpPtr) {
    /* Save the dmerId for this position so we can get back to it. */
    ctx->dmerAt[*grpPtr] = dmerId;
    /* Dictionaries only help for the first reference to the dmer.
     * After that zstd can reference the match from the previous reference.
     * So only count each dmer once for each sample it is in.
     */
    if (*grpPtr < curSampleEnd) {
      continue;
    }
    freq += 1;
    /* Binary search to find the end of the sample *grpPtr is in.
     * In the common case that grpPtr + 1 == grpEnd we can skip the binary
     * search because the loop is over.
     */
    if (grpPtr + 1 != grpEnd) {
      const size_t *sampleEndPtr =
          COVER_lower_bound(curOffsetPtr, offsetsEnd, *grpPtr);
      curSampleEnd = *sampleEndPtr;
      curOffsetPtr = sampleEndPtr + 1;
    }
  }
  /* At this point we are never going to look at this segment of the suffix
   * array again. We take advantage of this fact to save memory.
   * We store the frequency of the dmer in the first position of the group,
   * which is dmerId.
   */
  ctx->suffix[dmerId] = freq;
}
|
348 | ||
|
/**
 * A segment is a range in the source as well as the score of the segment.
 */
typedef struct {
  U32 begin;    /* inclusive start position (dmer index) */
  U32 end;      /* exclusive end position */
  double score; /* sum of frequencies of the distinct dmers in the segment */
} COVER_segment_t;
|
357 | ||
|
/**
 * Selects the best segment in an epoch.
 * Segments are scored according to the function:
 *
 *   Let F(d) be the frequency of dmer d.
 *   Let S_i be the dmer at position i of segment S which has length k.
 *
 *   Score(S) = F(S_1) + F(S_2) + ... + F(S_{k-d+1})
 *
 * Once the dmer d is in the dictionary we set F(d) = 0.
 * Scans [begin, end) with a sliding window of dmersInK dmers, tracking the
 * distinct-dmer score incrementally via the activeDmers occurrence map.
 */
static COVER_segment_t COVER_selectSegment(const COVER_ctx_t *ctx, U32 *freqs,
                                           COVER_map_t *activeDmers, U32 begin,
                                           U32 end, COVER_params_t parameters) {
  /* Constants */
  const U32 k = parameters.k;
  const U32 d = parameters.d;
  const U32 dmersInK = k - d + 1;
  /* Try each segment (activeSegment) and save the best (bestSegment) */
  COVER_segment_t bestSegment = {0, 0, 0};
  COVER_segment_t activeSegment;
  /* Reset the activeDmers in the segment */
  COVER_map_clear(activeDmers);
  /* The activeSegment starts at the beginning of the epoch. */
  activeSegment.begin = begin;
  activeSegment.end = begin;
  activeSegment.score = 0;
  /* Slide the activeSegment through the whole epoch.
   * Save the best segment in bestSegment.
   */
  while (activeSegment.end < end) {
    /* The dmerId for the dmer at the next position */
    U32 newDmer = ctx->dmerAt[activeSegment.end];
    /* The entry in activeDmers for this dmerId */
    U32 *newDmerOcc = COVER_map_at(activeDmers, newDmer);
    /* If the dmer isn't already present in the segment add its score. */
    if (*newDmerOcc == 0) {
      /* The paper suggests using the L-0.5 norm, but experiments show that it
       * doesn't help.
       */
      activeSegment.score += freqs[newDmer];
    }
    /* Add the dmer to the segment */
    activeSegment.end += 1;
    *newDmerOcc += 1;

    /* If the window is now too large, drop the first position */
    if (activeSegment.end - activeSegment.begin == dmersInK + 1) {
      U32 delDmer = ctx->dmerAt[activeSegment.begin];
      U32 *delDmerOcc = COVER_map_at(activeDmers, delDmer);
      activeSegment.begin += 1;
      *delDmerOcc -= 1;
      /* If this is the last occurrence of the dmer, subtract its score */
      if (*delDmerOcc == 0) {
        COVER_map_remove(activeDmers, delDmer);
        activeSegment.score -= freqs[delDmer];
      }
    }

    /* If this segment is the best so far save it */
    if (activeSegment.score > bestSegment.score) {
      bestSegment = activeSegment;
    }
  }
  {
    /* Trim off the zero frequency head and tail from the segment. */
    U32 newBegin = bestSegment.end;
    U32 newEnd = bestSegment.begin;
    U32 pos;
    for (pos = bestSegment.begin; pos != bestSegment.end; ++pos) {
      U32 freq = freqs[ctx->dmerAt[pos]];
      if (freq != 0) {
        newBegin = MIN(newBegin, pos);
        newEnd = pos + 1;
      }
    }
    bestSegment.begin = newBegin;
    bestSegment.end = newEnd;
  }
  {
    /* Zero out the frequency of each dmer covered by the chosen segment. */
    U32 pos;
    for (pos = bestSegment.begin; pos != bestSegment.end; ++pos) {
      freqs[ctx->dmerAt[pos]] = 0;
    }
  }
  return bestSegment;
}
|
446 | ||
|
447 | /** | |
|
448 | * Check the validity of the parameters. | |
|
449 | * Returns non-zero if the parameters are valid and 0 otherwise. | |
|
450 | */ | |
|
451 | static int COVER_checkParameters(COVER_params_t parameters) { | |
|
452 | /* k and d are required parameters */ | |
|
453 | if (parameters.d == 0 || parameters.k == 0) { | |
|
454 | return 0; | |
|
455 | } | |
|
456 | /* d <= k */ | |
|
457 | if (parameters.d > parameters.k) { | |
|
458 | return 0; | |
|
459 | } | |
|
460 | return 1; | |
|
461 | } | |
|
462 | ||
|
463 | /** | |
|
464 | * Clean up a context initialized with `COVER_ctx_init()`. | |
|
465 | */ | |
|
466 | static void COVER_ctx_destroy(COVER_ctx_t *ctx) { | |
|
467 | if (!ctx) { | |
|
468 | return; | |
|
469 | } | |
|
470 | if (ctx->suffix) { | |
|
471 | free(ctx->suffix); | |
|
472 | ctx->suffix = NULL; | |
|
473 | } | |
|
474 | if (ctx->freqs) { | |
|
475 | free(ctx->freqs); | |
|
476 | ctx->freqs = NULL; | |
|
477 | } | |
|
478 | if (ctx->dmerAt) { | |
|
479 | free(ctx->dmerAt); | |
|
480 | ctx->dmerAt = NULL; | |
|
481 | } | |
|
482 | if (ctx->offsets) { | |
|
483 | free(ctx->offsets); | |
|
484 | ctx->offsets = NULL; | |
|
485 | } | |
|
486 | } | |
|
487 | ||
|
/**
 * Prepare a context for dictionary building.
 * The context is only dependent on the parameter `d` and can be used multiple
 * times.
 * Returns 1 on success or zero on error (bad sizes or allocation failure; on
 * error nothing is left allocated).
 * The context must be destroyed with `COVER_ctx_destroy()`.
 */
static int COVER_ctx_init(COVER_ctx_t *ctx, const void *samplesBuffer,
                          const size_t *samplesSizes, unsigned nbSamples,
                          unsigned d) {
  const BYTE *const samples = (const BYTE *)samplesBuffer;
  const size_t totalSamplesSize = COVER_sum(samplesSizes, nbSamples);
  /* Checks */
  if (totalSamplesSize < d ||
      totalSamplesSize >= (size_t)COVER_MAX_SAMPLES_SIZE) {
    DISPLAYLEVEL(1, "Total samples size is too large, maximum size is %u MB\n",
                 (COVER_MAX_SAMPLES_SIZE >> 20));
    return 0;
  }
  /* Zero the context */
  memset(ctx, 0, sizeof(*ctx));
  DISPLAYLEVEL(2, "Training on %u samples of total size %u\n", nbSamples,
               (U32)totalSamplesSize);
  ctx->samples = samples;
  ctx->samplesSizes = samplesSizes;
  ctx->nbSamples = nbSamples;
  /* Partial suffix array */
  ctx->suffixSize = totalSamplesSize - d + 1;
  ctx->suffix = (U32 *)malloc(ctx->suffixSize * sizeof(U32));
  /* Maps index to the dmerID */
  ctx->dmerAt = (U32 *)malloc(ctx->suffixSize * sizeof(U32));
  /* The offsets of each file */
  ctx->offsets = (size_t *)malloc((nbSamples + 1) * sizeof(size_t));
  if (!ctx->suffix || !ctx->dmerAt || !ctx->offsets) {
    DISPLAYLEVEL(1, "Failed to allocate scratch buffers\n");
    COVER_ctx_destroy(ctx);
    return 0;
  }
  ctx->freqs = NULL;
  ctx->d = d;

  /* Fill offsets from the samplesSizes (prefix sums; offsets[i] is where
   * sample i starts / sample i-1 ends). */
  {
    U32 i;
    ctx->offsets[0] = 0;
    for (i = 1; i <= nbSamples; ++i) {
      ctx->offsets[i] = ctx->offsets[i - 1] + samplesSizes[i - 1];
    }
  }
  DISPLAYLEVEL(2, "Constructing partial suffix array\n");
  {
    /* suffix is a partial suffix array.
     * It only sorts suffixes by their first parameters.d bytes.
     * The sort is effectively stable because COVER_strict_cmp breaks ties by
     * element address, so each dmer group is sorted by position in input.
     */
    U32 i;
    for (i = 0; i < ctx->suffixSize; ++i) {
      ctx->suffix[i] = i;
    }
    /* qsort doesn't take an opaque pointer, so pass as a global */
    g_ctx = ctx;
    qsort(ctx->suffix, ctx->suffixSize, sizeof(U32), &COVER_strict_cmp);
  }
  DISPLAYLEVEL(2, "Computing frequencies\n");
  /* For each dmer group (group of positions with the same first d bytes):
   * 1. For each position we set dmerAt[position] = dmerID.  The dmerID is
   *    (groupBeginPtr - suffix).  This allows us to go from position to
   *    dmerID so we can look up values in freq.
   * 2. We calculate how many samples the dmer occurs in and save it in
   *    freqs[dmerId].
   */
  COVER_groupBy(ctx->suffix, ctx->suffixSize, sizeof(U32), ctx, &COVER_cmp,
                &COVER_group);
  /* The suffix array buffer is repurposed: COVER_group wrote each group's
   * frequency at its dmerId slot, so `suffix` now *is* the freqs table. */
  ctx->freqs = ctx->suffix;
  ctx->suffix = NULL;
  return 1;
}
|
565 | ||
|
566 | /** | |
|
567 | * Given the prepared context build the dictionary. | |
|
568 | */ | |
|
569 | static size_t COVER_buildDictionary(const COVER_ctx_t *ctx, U32 *freqs, | |
|
570 | COVER_map_t *activeDmers, void *dictBuffer, | |
|
571 | size_t dictBufferCapacity, | |
|
572 | COVER_params_t parameters) { | |
|
573 | BYTE *const dict = (BYTE *)dictBuffer; | |
|
574 | size_t tail = dictBufferCapacity; | |
|
575 | /* Divide the data up into epochs of equal size. | |
|
576 | * We will select at least one segment from each epoch. | |
|
577 | */ | |
|
578 | const U32 epochs = (U32)(dictBufferCapacity / parameters.k); | |
|
579 | const U32 epochSize = (U32)(ctx->suffixSize / epochs); | |
|
580 | size_t epoch; | |
|
581 | DISPLAYLEVEL(2, "Breaking content into %u epochs of size %u\n", epochs, | |
|
582 | epochSize); | |
|
583 | /* Loop through the epochs until there are no more segments or the dictionary | |
|
584 | * is full. | |
|
585 | */ | |
|
586 | for (epoch = 0; tail > 0; epoch = (epoch + 1) % epochs) { | |
|
587 | const U32 epochBegin = (U32)(epoch * epochSize); | |
|
588 | const U32 epochEnd = epochBegin + epochSize; | |
|
589 | size_t segmentSize; | |
|
590 | /* Select a segment */ | |
|
591 | COVER_segment_t segment = COVER_selectSegment( | |
|
592 | ctx, freqs, activeDmers, epochBegin, epochEnd, parameters); | |
|
593 | /* Trim the segment if necessary and if it is empty then we are done */ | |
|
594 | segmentSize = MIN(segment.end - segment.begin + parameters.d - 1, tail); | |
|
595 | if (segmentSize == 0) { | |
|
596 | break; | |
|
597 | } | |
|
598 | /* We fill the dictionary from the back to allow the best segments to be | |
|
599 | * referenced with the smallest offsets. | |
|
600 | */ | |
|
601 | tail -= segmentSize; | |
|
602 | memcpy(dict + tail, ctx->samples + segment.begin, segmentSize); | |
|
603 | DISPLAYUPDATE( | |
|
604 | 2, "\r%u%% ", | |
|
605 | (U32)(((dictBufferCapacity - tail) * 100) / dictBufferCapacity)); | |
|
606 | } | |
|
607 | DISPLAYLEVEL(2, "\r%79s\r", ""); | |
|
608 | return tail; | |
|
609 | } | |
|
610 | ||
|
611 | /** | |
|
612 | * Translate from COVER_params_t to ZDICT_params_t required for finalizing the | |
|
613 | * dictionary. | |
|
614 | */ | |
|
615 | static ZDICT_params_t COVER_translateParams(COVER_params_t parameters) { | |
|
616 | ZDICT_params_t zdictParams; | |
|
617 | memset(&zdictParams, 0, sizeof(zdictParams)); | |
|
618 | zdictParams.notificationLevel = 1; | |
|
619 | zdictParams.dictID = parameters.dictID; | |
|
620 | zdictParams.compressionLevel = parameters.compressionLevel; | |
|
621 | return zdictParams; | |
|
622 | } | |
|
623 | ||
|
/**
 * Constructs a dictionary using a heuristic based on the following paper:
 *
 *   Liao, Petri, Moffat, Wirth
 *   Effective Construction of Relative Lempel-Ziv Dictionaries
 *   Published in WWW 2016.
 *
 * Writes the dictionary into dictBuffer and returns its size, or a zstd
 * error code (testable with ZDICT_isError / ZSTD_isError) on failure.
 * Requires parameters.k and parameters.d to be set (see
 * COVER_checkParameters()).
 */
ZDICTLIB_API size_t COVER_trainFromBuffer(
    void *dictBuffer, size_t dictBufferCapacity, const void *samplesBuffer,
    const size_t *samplesSizes, unsigned nbSamples, COVER_params_t parameters) {
  BYTE *const dict = (BYTE *)dictBuffer;
  COVER_ctx_t ctx;
  COVER_map_t activeDmers;
  /* Checks */
  if (!COVER_checkParameters(parameters)) {
    DISPLAYLEVEL(1, "Cover parameters incorrect\n");
    return ERROR(GENERIC);
  }
  if (nbSamples == 0) {
    DISPLAYLEVEL(1, "Cover must have at least one input file\n");
    return ERROR(GENERIC);
  }
  if (dictBufferCapacity < ZDICT_DICTSIZE_MIN) {
    DISPLAYLEVEL(1, "dictBufferCapacity must be at least %u\n",
                 ZDICT_DICTSIZE_MIN);
    return ERROR(dstSize_tooSmall);
  }
  /* Initialize global data */
  g_displayLevel = parameters.notificationLevel;
  /* Initialize context and activeDmers */
  if (!COVER_ctx_init(&ctx, samplesBuffer, samplesSizes, nbSamples,
                      parameters.d)) {
    return ERROR(GENERIC);
  }
  /* The map must hold the k - d + 1 dmers of one segment. */
  if (!COVER_map_init(&activeDmers, parameters.k - parameters.d + 1)) {
    DISPLAYLEVEL(1, "Failed to allocate dmer map: out of memory\n");
    COVER_ctx_destroy(&ctx);
    return ERROR(GENERIC);
  }

  DISPLAYLEVEL(2, "Building dictionary\n");
  {
    /* Raw segments are written into [tail, capacity); ZDICT then prepends
     * entropy tables and the header in front of them. */
    const size_t tail =
        COVER_buildDictionary(&ctx, ctx.freqs, &activeDmers, dictBuffer,
                              dictBufferCapacity, parameters);
    ZDICT_params_t zdictParams = COVER_translateParams(parameters);
    const size_t dictionarySize = ZDICT_finalizeDictionary(
        dict, dictBufferCapacity, dict + tail, dictBufferCapacity - tail,
        samplesBuffer, samplesSizes, nbSamples, zdictParams);
    /* dictionarySize may itself be an error code; only report on success
     * and let the caller test the returned value. */
    if (!ZSTD_isError(dictionarySize)) {
      DISPLAYLEVEL(2, "Constructed dictionary of size %u\n",
                   (U32)dictionarySize);
    }
    COVER_ctx_destroy(&ctx);
    COVER_map_destroy(&activeDmers);
    return dictionarySize;
  }
}
|
682 | ||
|
/**
 * COVER_best_t is used for two purposes:
 * 1. Synchronizing threads.
 * 2. Saving the best parameters and dictionary.
 *
 * All of the methods except COVER_best_init() are thread safe if zstd is
 * compiled with multithreaded support.
 */
typedef struct COVER_best_s {
  pthread_mutex_t mutex;     /* guards every field below */
  pthread_cond_t cond;       /* broadcast when liveJobs drops to 0 */
  size_t liveJobs;           /* number of parameter trials still running */
  void *dict;                /* best dictionary seen so far (malloc'd, owned) */
  size_t dictSize;
  COVER_params_t parameters; /* the parameters that produced `dict` */
  size_t compressedSize;     /* its total compressed size; (size_t)-1 until a result arrives */
} COVER_best_t;
|
700 | ||
|
701 | /** | |
|
702 | * Initialize the `COVER_best_t`. | |
|
703 | */ | |
|
704 | static void COVER_best_init(COVER_best_t *best) { | |
|
705 | if (!best) { | |
|
706 | return; | |
|
707 | } | |
|
708 | pthread_mutex_init(&best->mutex, NULL); | |
|
709 | pthread_cond_init(&best->cond, NULL); | |
|
710 | best->liveJobs = 0; | |
|
711 | best->dict = NULL; | |
|
712 | best->dictSize = 0; | |
|
713 | best->compressedSize = (size_t)-1; | |
|
714 | memset(&best->parameters, 0, sizeof(best->parameters)); | |
|
715 | } | |
|
716 | ||
|
717 | /** | |
|
718 | * Wait until liveJobs == 0. | |
|
719 | */ | |
|
720 | static void COVER_best_wait(COVER_best_t *best) { | |
|
721 | if (!best) { | |
|
722 | return; | |
|
723 | } | |
|
724 | pthread_mutex_lock(&best->mutex); | |
|
725 | while (best->liveJobs != 0) { | |
|
726 | pthread_cond_wait(&best->cond, &best->mutex); | |
|
727 | } | |
|
728 | pthread_mutex_unlock(&best->mutex); | |
|
729 | } | |
|
730 | ||
|
731 | /** | |
|
732 | * Call COVER_best_wait() and then destroy the COVER_best_t. | |
|
733 | */ | |
|
734 | static void COVER_best_destroy(COVER_best_t *best) { | |
|
735 | if (!best) { | |
|
736 | return; | |
|
737 | } | |
|
738 | COVER_best_wait(best); | |
|
739 | if (best->dict) { | |
|
740 | free(best->dict); | |
|
741 | } | |
|
742 | pthread_mutex_destroy(&best->mutex); | |
|
743 | pthread_cond_destroy(&best->cond); | |
|
744 | } | |
|
745 | ||
|
746 | /** | |
|
747 | * Called when a thread is about to be launched. | |
|
748 | * Increments liveJobs. | |
|
749 | */ | |
|
750 | static void COVER_best_start(COVER_best_t *best) { | |
|
751 | if (!best) { | |
|
752 | return; | |
|
753 | } | |
|
754 | pthread_mutex_lock(&best->mutex); | |
|
755 | ++best->liveJobs; | |
|
756 | pthread_mutex_unlock(&best->mutex); | |
|
757 | } | |
|
758 | ||
|
759 | /** | |
|
760 | * Called when a thread finishes executing, both on error or success. | |
|
761 | * Decrements liveJobs and signals any waiting threads if liveJobs == 0. | |
|
762 | * If this dictionary is the best so far save it and its parameters. | |
|
763 | */ | |
|
764 | static void COVER_best_finish(COVER_best_t *best, size_t compressedSize, | |
|
765 | COVER_params_t parameters, void *dict, | |
|
766 | size_t dictSize) { | |
|
767 | if (!best) { | |
|
768 | return; | |
|
769 | } | |
|
770 | { | |
|
771 | size_t liveJobs; | |
|
772 | pthread_mutex_lock(&best->mutex); | |
|
773 | --best->liveJobs; | |
|
774 | liveJobs = best->liveJobs; | |
|
775 | /* If the new dictionary is better */ | |
|
776 | if (compressedSize < best->compressedSize) { | |
|
777 | /* Allocate space if necessary */ | |
|
778 | if (!best->dict || best->dictSize < dictSize) { | |
|
779 | if (best->dict) { | |
|
780 | free(best->dict); | |
|
781 | } | |
|
782 | best->dict = malloc(dictSize); | |
|
783 | if (!best->dict) { | |
|
784 | best->compressedSize = ERROR(GENERIC); | |
|
785 | best->dictSize = 0; | |
|
786 | return; | |
|
787 | } | |
|
788 | } | |
|
789 | /* Save the dictionary, parameters, and size */ | |
|
790 | memcpy(best->dict, dict, dictSize); | |
|
791 | best->dictSize = dictSize; | |
|
792 | best->parameters = parameters; | |
|
793 | best->compressedSize = compressedSize; | |
|
794 | } | |
|
795 | pthread_mutex_unlock(&best->mutex); | |
|
796 | if (liveJobs == 0) { | |
|
797 | pthread_cond_broadcast(&best->cond); | |
|
798 | } | |
|
799 | } | |
|
800 | } | |
|
801 | ||
|
/**
 * Parameters for COVER_tryParameters().
 * Passed as an *owning* pointer (see the OWNING note on
 * COVER_tryParameters()): the receiving job is responsible for it.
 */
typedef struct COVER_tryParameters_data_s {
  const COVER_ctx_t *ctx;    /* shared read-only training context */
  COVER_best_t *best;        /* where the trial reports its result */
  size_t dictBufferCapacity; /* size of the dictionary buffer to build */
  COVER_params_t parameters; /* the k/d combination this trial evaluates */
} COVER_tryParameters_data_t;
|
811 | ||
|
812 | /** | |
|
813 | * Tries a set of parameters and upates the COVER_best_t with the results. | |
|
814 | * This function is thread safe if zstd is compiled with multithreaded support. | |
|
815 | * It takes its parameters as an *OWNING* opaque pointer to support threading. | |
|
816 | */ | |
|
817 | static void COVER_tryParameters(void *opaque) { | |
|
818 | /* Save parameters as local variables */ | |
|
819 | COVER_tryParameters_data_t *const data = (COVER_tryParameters_data_t *)opaque; | |
|
820 | const COVER_ctx_t *const ctx = data->ctx; | |
|
821 | const COVER_params_t parameters = data->parameters; | |
|
822 | size_t dictBufferCapacity = data->dictBufferCapacity; | |
|
823 | size_t totalCompressedSize = ERROR(GENERIC); | |
|
824 | /* Allocate space for hash table, dict, and freqs */ | |
|
825 | COVER_map_t activeDmers; | |
|
826 | BYTE *const dict = (BYTE * const)malloc(dictBufferCapacity); | |
|
827 | U32 *freqs = (U32 *)malloc(ctx->suffixSize * sizeof(U32)); | |
|
828 | if (!COVER_map_init(&activeDmers, parameters.k - parameters.d + 1)) { | |
|
829 | DISPLAYLEVEL(1, "Failed to allocate dmer map: out of memory\n"); | |
|
830 | goto _cleanup; | |
|
831 | } | |
|
832 | if (!dict || !freqs) { | |
|
833 | DISPLAYLEVEL(1, "Failed to allocate buffers: out of memory\n"); | |
|
834 | goto _cleanup; | |
|
835 | } | |
|
836 | /* Copy the frequencies because we need to modify them */ | |
|
837 | memcpy(freqs, ctx->freqs, ctx->suffixSize * sizeof(U32)); | |
|
838 | /* Build the dictionary */ | |
|
839 | { | |
|
840 | const size_t tail = COVER_buildDictionary(ctx, freqs, &activeDmers, dict, | |
|
841 | dictBufferCapacity, parameters); | |
|
842 | const ZDICT_params_t zdictParams = COVER_translateParams(parameters); | |
|
843 | dictBufferCapacity = ZDICT_finalizeDictionary( | |
|
844 | dict, dictBufferCapacity, dict + tail, dictBufferCapacity - tail, | |
|
845 | ctx->samples, ctx->samplesSizes, (unsigned)ctx->nbSamples, zdictParams); | |
|
846 | if (ZDICT_isError(dictBufferCapacity)) { | |
|
847 | DISPLAYLEVEL(1, "Failed to finalize dictionary\n"); | |
|
848 | goto _cleanup; | |
|
849 | } | |
|
850 | } | |
|
851 | /* Check total compressed size */ | |
|
852 | { | |
|
853 | /* Pointers */ | |
|
854 | ZSTD_CCtx *cctx; | |
|
855 | ZSTD_CDict *cdict; | |
|
856 | void *dst; | |
|
857 | /* Local variables */ | |
|
858 | size_t dstCapacity; | |
|
859 | size_t i; | |
|
860 | /* Allocate dst with enough space to compress the maximum sized sample */ | |
|
861 | { | |
|
862 | size_t maxSampleSize = 0; | |
|
863 | for (i = 0; i < ctx->nbSamples; ++i) { | |
|
864 | maxSampleSize = MAX(ctx->samplesSizes[i], maxSampleSize); | |
|
865 | } | |
|
866 | dstCapacity = ZSTD_compressBound(maxSampleSize); | |
|
867 | dst = malloc(dstCapacity); | |
|
868 | } | |
|
869 | /* Create the cctx and cdict */ | |
|
870 | cctx = ZSTD_createCCtx(); | |
|
871 | cdict = | |
|
872 | ZSTD_createCDict(dict, dictBufferCapacity, parameters.compressionLevel); | |
|
873 | if (!dst || !cctx || !cdict) { | |
|
874 | goto _compressCleanup; | |
|
875 | } | |
|
876 | /* Compress each sample and sum their sizes (or error) */ | |
|
877 | totalCompressedSize = 0; | |
|
878 | for (i = 0; i < ctx->nbSamples; ++i) { | |
|
879 | const size_t size = ZSTD_compress_usingCDict( | |
|
880 | cctx, dst, dstCapacity, ctx->samples + ctx->offsets[i], | |
|
881 | ctx->samplesSizes[i], cdict); | |
|
882 | if (ZSTD_isError(size)) { | |
|
883 | totalCompressedSize = ERROR(GENERIC); | |
|
884 | goto _compressCleanup; | |
|
885 | } | |
|
886 | totalCompressedSize += size; | |
|
887 | } | |
|
888 | _compressCleanup: | |
|
889 | ZSTD_freeCCtx(cctx); | |
|
890 | ZSTD_freeCDict(cdict); | |
|
891 | if (dst) { | |
|
892 | free(dst); | |
|
893 | } | |
|
894 | } | |
|
895 | ||
|
896 | _cleanup: | |
|
897 | COVER_best_finish(data->best, totalCompressedSize, parameters, dict, | |
|
898 | dictBufferCapacity); | |
|
899 | free(data); | |
|
900 | COVER_map_destroy(&activeDmers); | |
|
901 | if (dict) { | |
|
902 | free(dict); | |
|
903 | } | |
|
904 | if (freqs) { | |
|
905 | free(freqs); | |
|
906 | } | |
|
907 | } | |
|
908 | ||
|
909 | ZDICTLIB_API size_t COVER_optimizeTrainFromBuffer(void *dictBuffer, | |
|
910 | size_t dictBufferCapacity, | |
|
911 | const void *samplesBuffer, | |
|
912 | const size_t *samplesSizes, | |
|
913 | unsigned nbSamples, | |
|
914 | COVER_params_t *parameters) { | |
|
915 | /* constants */ | |
|
916 | const unsigned nbThreads = parameters->nbThreads; | |
|
917 | const unsigned kMinD = parameters->d == 0 ? 6 : parameters->d; | |
|
918 | const unsigned kMaxD = parameters->d == 0 ? 16 : parameters->d; | |
|
919 | const unsigned kMinK = parameters->k == 0 ? kMaxD : parameters->k; | |
|
920 | const unsigned kMaxK = parameters->k == 0 ? 2048 : parameters->k; | |
|
921 | const unsigned kSteps = parameters->steps == 0 ? 32 : parameters->steps; | |
|
922 | const unsigned kStepSize = MAX((kMaxK - kMinK) / kSteps, 1); | |
|
923 | const unsigned kIterations = | |
|
924 | (1 + (kMaxD - kMinD) / 2) * (1 + (kMaxK - kMinK) / kStepSize); | |
|
925 | /* Local variables */ | |
|
926 | const int displayLevel = parameters->notificationLevel; | |
|
927 | unsigned iteration = 1; | |
|
928 | unsigned d; | |
|
929 | unsigned k; | |
|
930 | COVER_best_t best; | |
|
931 | POOL_ctx *pool = NULL; | |
|
932 | /* Checks */ | |
|
933 | if (kMinK < kMaxD || kMaxK < kMinK) { | |
|
934 | LOCALDISPLAYLEVEL(displayLevel, 1, "Incorrect parameters\n"); | |
|
935 | return ERROR(GENERIC); | |
|
936 | } | |
|
937 | if (nbSamples == 0) { | |
|
938 | DISPLAYLEVEL(1, "Cover must have at least one input file\n"); | |
|
939 | return ERROR(GENERIC); | |
|
940 | } | |
|
941 | if (dictBufferCapacity < ZDICT_DICTSIZE_MIN) { | |
|
942 | DISPLAYLEVEL(1, "dictBufferCapacity must be at least %u\n", | |
|
943 | ZDICT_DICTSIZE_MIN); | |
|
944 | return ERROR(dstSize_tooSmall); | |
|
945 | } | |
|
946 | if (nbThreads > 1) { | |
|
947 | pool = POOL_create(nbThreads, 1); | |
|
948 | if (!pool) { | |
|
949 | return ERROR(memory_allocation); | |
|
950 | } | |
|
951 | } | |
|
952 | /* Initialization */ | |
|
953 | COVER_best_init(&best); | |
|
954 | /* Turn down global display level to clean up display at level 2 and below */ | |
|
955 | g_displayLevel = parameters->notificationLevel - 1; | |
|
956 | /* Loop through d first because each new value needs a new context */ | |
|
957 | LOCALDISPLAYLEVEL(displayLevel, 2, "Trying %u different sets of parameters\n", | |
|
958 | kIterations); | |
|
959 | for (d = kMinD; d <= kMaxD; d += 2) { | |
|
960 | /* Initialize the context for this value of d */ | |
|
961 | COVER_ctx_t ctx; | |
|
962 | LOCALDISPLAYLEVEL(displayLevel, 3, "d=%u\n", d); | |
|
963 | if (!COVER_ctx_init(&ctx, samplesBuffer, samplesSizes, nbSamples, d)) { | |
|
964 | LOCALDISPLAYLEVEL(displayLevel, 1, "Failed to initialize context\n"); | |
|
965 | COVER_best_destroy(&best); | |
|
966 | return ERROR(GENERIC); | |
|
967 | } | |
|
968 | /* Loop through k reusing the same context */ | |
|
969 | for (k = kMinK; k <= kMaxK; k += kStepSize) { | |
|
970 | /* Prepare the arguments */ | |
|
971 | COVER_tryParameters_data_t *data = (COVER_tryParameters_data_t *)malloc( | |
|
972 | sizeof(COVER_tryParameters_data_t)); | |
|
973 | LOCALDISPLAYLEVEL(displayLevel, 3, "k=%u\n", k); | |
|
974 | if (!data) { | |
|
975 | LOCALDISPLAYLEVEL(displayLevel, 1, "Failed to allocate parameters\n"); | |
|
976 | COVER_best_destroy(&best); | |
|
977 | COVER_ctx_destroy(&ctx); | |
|
978 | return ERROR(GENERIC); | |
|
979 | } | |
|
980 | data->ctx = &ctx; | |
|
981 | data->best = &best; | |
|
982 | data->dictBufferCapacity = dictBufferCapacity; | |
|
983 | data->parameters = *parameters; | |
|
984 | data->parameters.k = k; | |
|
985 | data->parameters.d = d; | |
|
986 | data->parameters.steps = kSteps; | |
|
987 | /* Check the parameters */ | |
|
988 | if (!COVER_checkParameters(data->parameters)) { | |
|
989 | DISPLAYLEVEL(1, "Cover parameters incorrect\n"); | |
|
990 | continue; | |
|
991 | } | |
|
992 | /* Call the function and pass ownership of data to it */ | |
|
993 | COVER_best_start(&best); | |
|
994 | if (pool) { | |
|
995 | POOL_add(pool, &COVER_tryParameters, data); | |
|
996 | } else { | |
|
997 | COVER_tryParameters(data); | |
|
998 | } | |
|
999 | /* Print status */ | |
|
1000 | LOCALDISPLAYUPDATE(displayLevel, 2, "\r%u%% ", | |
|
1001 | (U32)((iteration * 100) / kIterations)); | |
|
1002 | ++iteration; | |
|
1003 | } | |
|
1004 | COVER_best_wait(&best); | |
|
1005 | COVER_ctx_destroy(&ctx); | |
|
1006 | } | |
|
1007 | LOCALDISPLAYLEVEL(displayLevel, 2, "\r%79s\r", ""); | |
|
1008 | /* Fill the output buffer and parameters with output of the best parameters */ | |
|
1009 | { | |
|
1010 | const size_t dictSize = best.dictSize; | |
|
1011 | if (ZSTD_isError(best.compressedSize)) { | |
|
1012 | COVER_best_destroy(&best); | |
|
1013 | return best.compressedSize; | |
|
1014 | } | |
|
1015 | *parameters = best.parameters; | |
|
1016 | memcpy(dictBuffer, best.dict, dictSize); | |
|
1017 | COVER_best_destroy(&best); | |
|
1018 | POOL_free(pool); | |
|
1019 | return dictSize; | |
|
1020 | } | |
|
1021 | } |
@@ -1,6 +1,33 b'' | |||
|
1 | 1 | Version History |
|
2 | 2 | =============== |
|
3 | 3 | |
|
4 | 0.7.0 (released 2017-02-07) | |
|
5 | --------------------------- | |
|
6 | ||
|
7 | * Added zstd.get_frame_parameters() to obtain info about a zstd frame. | |
|
8 | * Added ZstdDecompressor.decompress_content_dict_chain() for efficient | |
|
9 | decompression of *content-only dictionary chains*. | |
|
10 | * CFFI module fully implemented; all tests run against both C extension and | |
|
11 | CFFI implementation. | |
|
12 | * Vendored version of zstd updated to 1.1.3. | |
|
13 | * Use ZstdDecompressor.decompress() now uses ZSTD_createDDict_byReference() | |
|
14 | to avoid extra memory allocation of dict data. | |
|
15 | * Add function names to error messages (by using ":name" in PyArg_Parse* | |
|
16 | functions). | |
|
17 | * Reuse decompression context across operations. Previously, we created a | |
|
18 | new ZSTD_DCtx for each decompress(). This was measured to slow down | |
|
19 | decompression by 40-200MB/s. The API guarantees say ZstdDecompressor | |
|
20 | is not thread safe. So we reuse the ZSTD_DCtx across operations and make | |
|
21 | things faster in the process. | |
|
22 | * ZstdCompressor.write_to()'s compress() and flush() methods now return number | |
|
23 | of bytes written. | |
|
24 | * ZstdDecompressor.write_to()'s write() method now returns the number of bytes | |
|
25 | written to the underlying output object. | |
|
26 | * CompressionParameters instances now expose their values as attributes. | |
|
27 | * CompressionParameters instances no longer are subscriptable nor behave | |
|
28 | as tuples (backwards incompatible). Use attributes to obtain values. | |
|
29 | * DictParameters instances now expose their values as attributes. | |
|
30 | ||
|
4 | 31 | 0.6.0 (released 2017-01-14) |
|
5 | 32 | --------------------------- |
|
6 | 33 |
@@ -4,10 +4,11 b' python-zstandard' | |||
|
4 | 4 | |
|
5 | 5 | This project provides Python bindings for interfacing with the |
|
6 | 6 | `Zstandard <http://www.zstd.net>`_ compression library. A C extension |
|
7 |
and CFFI interface |
|
|
7 | and CFFI interface are provided. | |
|
8 | 8 | |
|
9 |
The primary goal of the |
|
|
10 | the underlying C API. This means exposing most of the features and flexibility | |
|
9 | The primary goal of the project is to provide a rich interface to the | |
|
10 | underlying C API through a Pythonic interface while not sacrificing | |
|
11 | performance. This means exposing most of the features and flexibility | |
|
11 | 12 | of the C API while not sacrificing usability or safety that Python provides. |
|
12 | 13 | |
|
13 | 14 | The canonical home for this project is |
@@ -23,6 +24,9 b' with the current API and that functional' | |||
|
23 | 24 | may be some backwards incompatible changes before 1.0. Though the author |
|
24 | 25 | does not intend to make any major changes to the Python API. |
|
25 | 26 | |
|
27 | This project is vendored and distributed with Mercurial 4.1, where it is | |
|
28 | used in a production capacity. | |
|
29 | ||
|
26 | 30 | There is continuous integration for Python versions 2.6, 2.7, and 3.3+ |
|
27 | 31 | on Linux x86_x64 and Windows x86 and x86_64. The author is reasonably |
|
28 | 32 | confident the extension is stable and works as advertised on these |
@@ -48,14 +52,15 b' low level compression and decompression ' | |||
|
48 | 52 | support compression without the framing headers. But the author doesn't |
|
49 | 53 | believe it a high priority at this time. |
|
50 | 54 | |
|
51 | The CFFI bindings are half-baked and need to be finished. | |
|
55 | The CFFI bindings are feature complete and all tests run against both | |
|
56 | the C extension and CFFI bindings to ensure behavior parity. | |
|
52 | 57 | |
|
53 | 58 | Requirements |
|
54 | 59 | ============ |
|
55 | 60 | |
|
56 |
This extension is designed to run with Python 2.6, 2.7, 3.3, 3.4, and |
|
|
57 |
on common platforms (Linux, Windows, and OS X). Only x86_64 is |
|
|
58 | well-tested as an architecture. | |
|
61 | This extension is designed to run with Python 2.6, 2.7, 3.3, 3.4, 3.5, and | |
|
62 | 3.6 on common platforms (Linux, Windows, and OS X). Only x86_64 is | |
|
63 | currently well-tested as an architecture. | |
|
59 | 64 | |
|
60 | 65 | Installing |
|
61 | 66 | ========== |
@@ -106,15 +111,11 b' compressing at several hundred MB/s and ' | |||
|
106 | 111 | Comparison to Other Python Bindings |
|
107 | 112 | =================================== |
|
108 | 113 | |
|
109 |
https://pypi.python.org/pypi/zstd is an alternat |
|
|
114 | https://pypi.python.org/pypi/zstd is an alternate Python binding to | |
|
110 | 115 | Zstandard. At the time this was written, the latest release of that |
|
111 | package (1.0.0.2) had the following significant differences from this package: | |
|
112 | ||
|
113 | * It only exposes the simple API for compression and decompression operations. | |
|
114 | This extension exposes the streaming API, dictionary training, and more. | |
|
115 | * It adds a custom framing header to compressed data and there is no way to | |
|
116 | disable it. This means that data produced with that module cannot be used by | |
|
117 | other Zstandard implementations. | |
|
116 | package (1.1.2) only exposed the simple APIs for compression and decompression. | |
|
117 | This package exposes much more of the zstd API, including streaming and | |
|
118 | dictionary compression. This package also has CFFI support. | |
|
118 | 119 | |
|
119 | 120 | Bundling of Zstandard Source Code |
|
120 | 121 | ================================= |
@@ -260,6 +261,10 b' A ``flush()`` method can be called to ev' | |||
|
260 | 261 | compressor's internal state into the output object. This may result in 0 or |
|
261 | 262 | more ``write()`` calls to the output object. |
|
262 | 263 | |
|
264 | Both ``write()`` and ``flush()`` return the number of bytes written to the | |
|
265 | object's ``write()``. In many cases, small inputs do not accumulate enough | |
|
266 | data to cause a write and ``write()`` will return ``0``. | |
|
267 | ||
|
263 | 268 | If the size of the data being fed to this streaming compressor is known, |
|
264 | 269 | you can declare it before compression begins:: |
|
265 | 270 | |
@@ -476,6 +481,10 b' This behaves similarly to ``zstd.ZstdCom' | |||
|
476 | 481 | the decompressor by calling ``write(data)`` and decompressed output is written |
|
477 | 482 | to the output object by calling its ``write(data)`` method. |
|
478 | 483 | |
|
484 | Calls to ``write()`` will return the number of bytes written to the output | |
|
485 | object. Not all inputs will result in bytes being written, so return values | |
|
486 | of ``0`` are possible. | |
|
487 | ||
|
479 | 488 | The size of chunks being ``write()`` to the destination can be specified:: |
|
480 | 489 | |
|
481 | 490 | dctx = zstd.ZstdDecompressor() |
@@ -576,6 +585,53 b' Here is how this API should be used::' | |||
|
576 | 585 | data = dobj.decompress(compressed_chunk_0) |
|
577 | 586 | data = dobj.decompress(compressed_chunk_1) |
|
578 | 587 | |
|
588 | Content-Only Dictionary Chain Decompression | |
|
589 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
|
590 | ||
|
591 | ``decompress_content_dict_chain(frames)`` performs decompression of a list of | |
|
592 | zstd frames produced using chained *content-only* dictionary compression. Such | |
|
593 | a list of frames is produced by compressing discrete inputs where each | |
|
594 | non-initial input is compressed with a *content-only* dictionary consisting | |
|
595 | of the content of the previous input. | |
|
596 | ||
|
597 | For example, say you have the following inputs:: | |
|
598 | ||
|
599 | inputs = [b'input 1', b'input 2', b'input 3'] | |
|
600 | ||
|
601 | The zstd frame chain consists of: | |
|
602 | ||
|
603 | 1. ``b'input 1'`` compressed in standalone/discrete mode | |
|
604 | 2. ``b'input 2'`` compressed using ``b'input 1'`` as a *content-only* dictionary | |
|
605 | 3. ``b'input 3'`` compressed using ``b'input 2'`` as a *content-only* dictionary | |
|
606 | ||
|
607 | Each zstd frame **must** have the content size written. | |
|
608 | ||
|
609 | The following Python code can be used to produce a *content-only dictionary | |
|
610 | chain*:: | |
|
611 | ||
|
612 | def make_chain(inputs): | |
|
613 | frames = [] | |
|
614 | ||
|
615 | # First frame is compressed in standalone/discrete mode. | |
|
616 | zctx = zstd.ZstdCompressor(write_content_size=True) | |
|
617 | frames.append(zctx.compress(inputs[0])) | |
|
618 | ||
|
619 | # Subsequent frames use the previous fulltext as a content-only dictionary | |
|
620 | for i, raw in enumerate(inputs[1:]): | |
|
621 | dict_data = zstd.ZstdCompressionDict(inputs[i]) | |
|
622 | zctx = zstd.ZstdCompressor(write_content_size=True, dict_data=dict_data) | |
|
623 | frames.append(zctx.compress(raw)) | |
|
624 | ||
|
625 | return frames | |
|
626 | ||
|
627 | ``decompress_content_dict_chain()`` returns the uncompressed data of the last | |
|
628 | element in the input chain. | |
|
629 | ||
|
630 | It is possible to implement *content-only dictionary chain* decompression | |
|
631 | on top of other Python APIs. However, this function will likely be significantly | |
|
632 | faster, especially for long input chains, as it avoids the overhead of | |
|
633 | instantiating and passing around intermediate objects between C and Python. | |
|
634 | ||
|
579 | 635 | Choosing an API |
|
580 | 636 | --------------- |
|
581 | 637 | |
@@ -634,6 +690,13 b' Instances can be constructed from bytes:' | |||
|
634 | 690 | |
|
635 | 691 | dict_data = zstd.ZstdCompressionDict(data) |
|
636 | 692 | |
|
693 | It is possible to construct a dictionary from *any* data. Unless the | |
|
694 | data begins with a magic header, the dictionary will be treated as | |
|
695 | *content-only*. *Content-only* dictionaries allow compression operations | |
|
696 | that follow to reference raw data within the content. For one use of | |
|
697 | *content-only* dictionaries, see | |
|
698 | ``ZstdDecompressor.decompress_content_dict_chain()``. | |
|
699 | ||
|
637 | 700 | More interestingly, instances can be created by *training* on sample data:: |
|
638 | 701 | |
|
639 | 702 | dict_data = zstd.train_dictionary(size, samples) |
@@ -700,19 +763,57 b' You can then configure a compressor to u' | |||
|
700 | 763 | |
|
701 | 764 | cctx = zstd.ZstdCompressor(compression_params=params) |
|
702 | 765 | |
|
703 |
The members of |
|
|
766 | The members/attributes of ``CompressionParameters`` instances are as follows:: | |
|
704 | 767 | |
|
705 |
* |
|
|
706 |
* |
|
|
707 |
* |
|
|
708 |
* |
|
|
709 |
* |
|
|
710 |
* |
|
|
711 | * 6 - Strategy (one of the ``zstd.STRATEGY_`` constants) | |
|
768 | * window_log | |
|
769 | * chain_log | |
|
770 | * hash_log | |
|
771 | * search_log | |
|
772 | * search_length | |
|
773 | * target_length | |
|
774 | * strategy | |
|
775 | ||
|
776 | This is the order the arguments are passed to the constructor if not using | |
|
777 | named arguments. | |
|
712 | 778 | |
|
713 | 779 | You'll need to read the Zstandard documentation for what these parameters |
|
714 | 780 | do. |
|
715 | 781 | |
|
782 | Frame Inspection | |
|
783 | ---------------- | |
|
784 | ||
|
785 | Data emitted from zstd compression is encapsulated in a *frame*. This frame | |
|
786 | begins with a 4 byte *magic number* header followed by 2 to 14 bytes describing | |
|
787 | the frame in more detail. For more info, see | |
|
788 | https://github.com/facebook/zstd/blob/master/doc/zstd_compression_format.md. | |
|
789 | ||
|
790 | ``zstd.get_frame_parameters(data)`` parses a zstd *frame* header from a bytes | |
|
791 | instance and return a ``FrameParameters`` object describing the frame. | |
|
792 | ||
|
793 | Depending on which fields are present in the frame and their values, the | |
|
794 | length of the frame parameters varies. If insufficient bytes are passed | |
|
795 | in to fully parse the frame parameters, ``ZstdError`` is raised. To ensure | |
|
796 | frame parameters can be parsed, pass in at least 18 bytes. | |
|
797 | ||
|
798 | ``FrameParameters`` instances have the following attributes: | |
|
799 | ||
|
800 | content_size | |
|
801 | Integer size of original, uncompressed content. This will be ``0`` if the | |
|
802 | original content size isn't written to the frame (controlled with the | |
|
803 | ``write_content_size`` argument to ``ZstdCompressor``) or if the input | |
|
804 | content size was ``0``. | |
|
805 | ||
|
806 | window_size | |
|
807 | Integer size of maximum back-reference distance in compressed data. | |
|
808 | ||
|
809 | dict_id | |
|
810 | Integer of dictionary ID used for compression. ``0`` if no dictionary | |
|
811 | ID was used or if the dictionary ID was ``0``. | |
|
812 | ||
|
813 | has_checksum | |
|
814 | Bool indicating whether a 4 byte content checksum is stored at the end | |
|
815 | of the frame. | |
|
816 | ||
|
716 | 817 | Misc Functionality |
|
717 | 818 | ------------------ |
|
718 | 819 | |
@@ -776,19 +877,32 b' TARGETLENGTH_MIN' | |||
|
776 | 877 | TARGETLENGTH_MAX |
|
777 | 878 | Maximum value for compression parameter |
|
778 | 879 | STRATEGY_FAST |
|
779 |
Compression strateg |
|
|
880 | Compression strategy | |
|
780 | 881 | STRATEGY_DFAST |
|
781 |
Compression strateg |
|
|
882 | Compression strategy | |
|
782 | 883 | STRATEGY_GREEDY |
|
783 |
Compression strateg |
|
|
884 | Compression strategy | |
|
784 | 885 | STRATEGY_LAZY |
|
785 |
Compression strateg |
|
|
886 | Compression strategy | |
|
786 | 887 | STRATEGY_LAZY2 |
|
787 |
Compression strateg |
|
|
888 | Compression strategy | |
|
788 | 889 | STRATEGY_BTLAZY2 |
|
789 |
Compression strateg |
|
|
890 | Compression strategy | |
|
790 | 891 | STRATEGY_BTOPT |
|
791 |
Compression strateg |
|
|
892 | Compression strategy | |
|
893 | ||
|
894 | Performance Considerations | |
|
895 | -------------------------- | |
|
896 | ||
|
897 | The ``ZstdCompressor`` and ``ZstdDecompressor`` types maintain state to a | |
|
898 | persistent compression or decompression *context*. Reusing a ``ZstdCompressor`` | |
|
899 | or ``ZstdDecompressor`` instance for multiple operations is faster than | |
|
900 | instantiating a new ``ZstdCompressor`` or ``ZstdDecompressor`` for each | |
|
901 | operation. The differences are magnified as the size of data decreases. For | |
|
902 | example, the difference between *context* reuse and non-reuse for 100,000 | |
|
903 | 100 byte inputs will be significant (possiby over 10x faster to reuse contexts) | |
|
904 | whereas 10 1,000,000 byte inputs will be more similar in speed (because the | |
|
905 | time spent doing compression dwarfs time spent creating new *contexts*). | |
|
792 | 906 | |
|
793 | 907 | Note on Zstandard's *Experimental* API |
|
794 | 908 | ====================================== |
@@ -28,7 +28,8 b' ZstdCompressionDict* train_dictionary(Py' | |||
|
28 | 28 | void* dict; |
|
29 | 29 | ZstdCompressionDict* result; |
|
30 | 30 | |
|
31 |
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "nO!|O!", |
|
|
31 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "nO!|O!:train_dictionary", | |
|
32 | kwlist, | |
|
32 | 33 | &capacity, |
|
33 | 34 | &PyList_Type, &samples, |
|
34 | 35 | (PyObject*)&DictParametersType, ¶meters)) { |
@@ -57,7 +58,6 b' ZstdCompressionDict* train_dictionary(Py' | |||
|
57 | 58 | sampleItem = PyList_GetItem(samples, sampleIndex); |
|
58 | 59 | if (!PyBytes_Check(sampleItem)) { |
|
59 | 60 | PyErr_SetString(PyExc_ValueError, "samples must be bytes"); |
|
60 | /* TODO probably need to perform DECREF here */ | |
|
61 | 61 | return NULL; |
|
62 | 62 | } |
|
63 | 63 | samplesSize += PyBytes_GET_SIZE(sampleItem); |
@@ -133,10 +133,11 b' static int ZstdCompressionDict_init(Zstd' | |||
|
133 | 133 | self->dictSize = 0; |
|
134 | 134 | |
|
135 | 135 | #if PY_MAJOR_VERSION >= 3 |
|
136 |
if (!PyArg_ParseTuple(args, "y#", |
|
|
136 | if (!PyArg_ParseTuple(args, "y#:ZstdCompressionDict", | |
|
137 | 137 | #else |
|
138 |
if (!PyArg_ParseTuple(args, "s#", |
|
|
138 | if (!PyArg_ParseTuple(args, "s#:ZstdCompressionDict", | |
|
139 | 139 | #endif |
|
140 | &source, &sourceSize)) { | |
|
140 | 141 | return -1; |
|
141 | 142 | } |
|
142 | 143 |
@@ -25,7 +25,8 b' CompressionParametersObject* get_compres' | |||
|
25 | 25 | ZSTD_compressionParameters params; |
|
26 | 26 | CompressionParametersObject* result; |
|
27 | 27 | |
|
28 |
if (!PyArg_ParseTuple(args, "i|Kn", |
|
|
28 | if (!PyArg_ParseTuple(args, "i|Kn:get_compression_parameters", | |
|
29 | &compressionLevel, &sourceSize, &dictSize)) { | |
|
29 | 30 | return NULL; |
|
30 | 31 | } |
|
31 | 32 | |
@@ -47,12 +48,85 b' CompressionParametersObject* get_compres' | |||
|
47 | 48 | return result; |
|
48 | 49 | } |
|
49 | 50 | |
|
51 | static int CompressionParameters_init(CompressionParametersObject* self, PyObject* args, PyObject* kwargs) { | |
|
52 | static char* kwlist[] = { | |
|
53 | "window_log", | |
|
54 | "chain_log", | |
|
55 | "hash_log", | |
|
56 | "search_log", | |
|
57 | "search_length", | |
|
58 | "target_length", | |
|
59 | "strategy", | |
|
60 | NULL | |
|
61 | }; | |
|
62 | ||
|
63 | unsigned windowLog; | |
|
64 | unsigned chainLog; | |
|
65 | unsigned hashLog; | |
|
66 | unsigned searchLog; | |
|
67 | unsigned searchLength; | |
|
68 | unsigned targetLength; | |
|
69 | unsigned strategy; | |
|
70 | ||
|
71 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "IIIIIII:CompressionParameters", | |
|
72 | kwlist, &windowLog, &chainLog, &hashLog, &searchLog, &searchLength, | |
|
73 | &targetLength, &strategy)) { | |
|
74 | return -1; | |
|
75 | } | |
|
76 | ||
|
77 | if (windowLog < ZSTD_WINDOWLOG_MIN || windowLog > ZSTD_WINDOWLOG_MAX) { | |
|
78 | PyErr_SetString(PyExc_ValueError, "invalid window log value"); | |
|
79 | return -1; | |
|
80 | } | |
|
81 | ||
|
82 | if (chainLog < ZSTD_CHAINLOG_MIN || chainLog > ZSTD_CHAINLOG_MAX) { | |
|
83 | PyErr_SetString(PyExc_ValueError, "invalid chain log value"); | |
|
84 | return -1; | |
|
85 | } | |
|
86 | ||
|
87 | if (hashLog < ZSTD_HASHLOG_MIN || hashLog > ZSTD_HASHLOG_MAX) { | |
|
88 | PyErr_SetString(PyExc_ValueError, "invalid hash log value"); | |
|
89 | return -1; | |
|
90 | } | |
|
91 | ||
|
92 | if (searchLog < ZSTD_SEARCHLOG_MIN || searchLog > ZSTD_SEARCHLOG_MAX) { | |
|
93 | PyErr_SetString(PyExc_ValueError, "invalid search log value"); | |
|
94 | return -1; | |
|
95 | } | |
|
96 | ||
|
97 | if (searchLength < ZSTD_SEARCHLENGTH_MIN || searchLength > ZSTD_SEARCHLENGTH_MAX) { | |
|
98 | PyErr_SetString(PyExc_ValueError, "invalid search length value"); | |
|
99 | return -1; | |
|
100 | } | |
|
101 | ||
|
102 | if (targetLength < ZSTD_TARGETLENGTH_MIN || targetLength > ZSTD_TARGETLENGTH_MAX) { | |
|
103 | PyErr_SetString(PyExc_ValueError, "invalid target length value"); | |
|
104 | return -1; | |
|
105 | } | |
|
106 | ||
|
107 | if (strategy < ZSTD_fast || strategy > ZSTD_btopt) { | |
|
108 | PyErr_SetString(PyExc_ValueError, "invalid strategy value"); | |
|
109 | return -1; | |
|
110 | } | |
|
111 | ||
|
112 | self->windowLog = windowLog; | |
|
113 | self->chainLog = chainLog; | |
|
114 | self->hashLog = hashLog; | |
|
115 | self->searchLog = searchLog; | |
|
116 | self->searchLength = searchLength; | |
|
117 | self->targetLength = targetLength; | |
|
118 | self->strategy = strategy; | |
|
119 | ||
|
120 | return 0; | |
|
121 | } | |
|
122 | ||
|
50 | 123 | PyObject* estimate_compression_context_size(PyObject* self, PyObject* args) { |
|
51 | 124 | CompressionParametersObject* params; |
|
52 | 125 | ZSTD_compressionParameters zparams; |
|
53 | 126 | PyObject* result; |
|
54 | 127 | |
|
55 |
if (!PyArg_ParseTuple(args, "O!", |
|
|
128 | if (!PyArg_ParseTuple(args, "O!:estimate_compression_context_size", | |
|
129 | &CompressionParametersType, ¶ms)) { | |
|
56 | 130 | return NULL; |
|
57 | 131 | } |
|
58 | 132 | |
@@ -64,113 +138,33 b' PyObject* estimate_compression_context_s' | |||
|
64 | 138 | PyDoc_STRVAR(CompressionParameters__doc__, |
|
65 | 139 | "CompressionParameters: low-level control over zstd compression"); |
|
66 | 140 | |
|
67 | static PyObject* CompressionParameters_new(PyTypeObject* subtype, PyObject* args, PyObject* kwargs) { | |
|
68 | CompressionParametersObject* self; | |
|
69 | unsigned windowLog; | |
|
70 | unsigned chainLog; | |
|
71 | unsigned hashLog; | |
|
72 | unsigned searchLog; | |
|
73 | unsigned searchLength; | |
|
74 | unsigned targetLength; | |
|
75 | unsigned strategy; | |
|
76 | ||
|
77 | if (!PyArg_ParseTuple(args, "IIIIIII", &windowLog, &chainLog, &hashLog, &searchLog, | |
|
78 | &searchLength, &targetLength, &strategy)) { | |
|
79 | return NULL; | |
|
80 | } | |
|
81 | ||
|
82 | if (windowLog < ZSTD_WINDOWLOG_MIN || windowLog > ZSTD_WINDOWLOG_MAX) { | |
|
83 | PyErr_SetString(PyExc_ValueError, "invalid window log value"); | |
|
84 | return NULL; | |
|
85 | } | |
|
86 | ||
|
87 | if (chainLog < ZSTD_CHAINLOG_MIN || chainLog > ZSTD_CHAINLOG_MAX) { | |
|
88 | PyErr_SetString(PyExc_ValueError, "invalid chain log value"); | |
|
89 | return NULL; | |
|
90 | } | |
|
91 | ||
|
92 | if (hashLog < ZSTD_HASHLOG_MIN || hashLog > ZSTD_HASHLOG_MAX) { | |
|
93 | PyErr_SetString(PyExc_ValueError, "invalid hash log value"); | |
|
94 | return NULL; | |
|
95 | } | |
|
96 | ||
|
97 | if (searchLog < ZSTD_SEARCHLOG_MIN || searchLog > ZSTD_SEARCHLOG_MAX) { | |
|
98 | PyErr_SetString(PyExc_ValueError, "invalid search log value"); | |
|
99 | return NULL; | |
|
100 | } | |
|
101 | ||
|
102 | if (searchLength < ZSTD_SEARCHLENGTH_MIN || searchLength > ZSTD_SEARCHLENGTH_MAX) { | |
|
103 | PyErr_SetString(PyExc_ValueError, "invalid search length value"); | |
|
104 | return NULL; | |
|
105 | } | |
|
106 | ||
|
107 | if (targetLength < ZSTD_TARGETLENGTH_MIN || targetLength > ZSTD_TARGETLENGTH_MAX) { | |
|
108 | PyErr_SetString(PyExc_ValueError, "invalid target length value"); | |
|
109 | return NULL; | |
|
110 | } | |
|
111 | ||
|
112 | if (strategy < ZSTD_fast || strategy > ZSTD_btopt) { | |
|
113 | PyErr_SetString(PyExc_ValueError, "invalid strategy value"); | |
|
114 | return NULL; | |
|
115 | } | |
|
116 | ||
|
117 | self = (CompressionParametersObject*)subtype->tp_alloc(subtype, 1); | |
|
118 | if (!self) { | |
|
119 | return NULL; | |
|
120 | } | |
|
121 | ||
|
122 | self->windowLog = windowLog; | |
|
123 | self->chainLog = chainLog; | |
|
124 | self->hashLog = hashLog; | |
|
125 | self->searchLog = searchLog; | |
|
126 | self->searchLength = searchLength; | |
|
127 | self->targetLength = targetLength; | |
|
128 | self->strategy = strategy; | |
|
129 | ||
|
130 | return (PyObject*)self; | |
|
131 | } | |
|
132 | ||
|
133 | 141 | static void CompressionParameters_dealloc(PyObject* self) { |
|
134 | 142 | PyObject_Del(self); |
|
135 | 143 | } |
|
136 | 144 | |
|
137 |
static Py |
|
|
138 | return 7; | |
|
139 | } | |
|
140 | ||
|
141 | static PyObject* CompressionParameters_item(PyObject* o, Py_ssize_t i) { | |
|
142 | CompressionParametersObject* self = (CompressionParametersObject*)o; | |
|
143 | ||
|
144 | switch (i) { | |
|
145 | case 0: | |
|
146 | return PyLong_FromLong(self->windowLog); | |
|
147 | case 1: | |
|
148 | return PyLong_FromLong(self->chainLog); | |
|
149 | case 2: | |
|
150 | return PyLong_FromLong(self->hashLog); | |
|
151 | case 3: | |
|
152 | return PyLong_FromLong(self->searchLog); | |
|
153 | case 4: | |
|
154 | return PyLong_FromLong(self->searchLength); | |
|
155 | case 5: | |
|
156 | return PyLong_FromLong(self->targetLength); | |
|
157 | case 6: | |
|
158 | return PyLong_FromLong(self->strategy); | |
|
159 | default: | |
|
160 | PyErr_SetString(PyExc_IndexError, "index out of range"); | |
|
161 | return NULL; | |
|
162 | } | |
|
163 | } | |
|
164 | ||
|
165 | static PySequenceMethods CompressionParameters_sq = { | |
|
166 | CompressionParameters_length, /* sq_length */ | |
|
167 | 0, /* sq_concat */ | |
|
168 | 0, /* sq_repeat */ | |
|
169 | CompressionParameters_item, /* sq_item */ | |
|
170 | 0, /* sq_ass_item */ | |
|
171 | 0, /* sq_contains */ | |
|
172 | 0, /* sq_inplace_concat */ | |
|
173 | 0 /* sq_inplace_repeat */ | |
|
145 | static PyMemberDef CompressionParameters_members[] = { | |
|
146 | { "window_log", T_UINT, | |
|
147 | offsetof(CompressionParametersObject, windowLog), READONLY, | |
|
148 | "window log" }, | |
|
149 | { "chain_log", T_UINT, | |
|
150 | offsetof(CompressionParametersObject, chainLog), READONLY, | |
|
151 | "chain log" }, | |
|
152 | { "hash_log", T_UINT, | |
|
153 | offsetof(CompressionParametersObject, hashLog), READONLY, | |
|
154 | "hash log" }, | |
|
155 | { "search_log", T_UINT, | |
|
156 | offsetof(CompressionParametersObject, searchLog), READONLY, | |
|
157 | "search log" }, | |
|
158 | { "search_length", T_UINT, | |
|
159 | offsetof(CompressionParametersObject, searchLength), READONLY, | |
|
160 | "search length" }, | |
|
161 | { "target_length", T_UINT, | |
|
162 | offsetof(CompressionParametersObject, targetLength), READONLY, | |
|
163 | "target length" }, | |
|
164 | { "strategy", T_INT, | |
|
165 | offsetof(CompressionParametersObject, strategy), READONLY, | |
|
166 | "strategy" }, | |
|
167 | { NULL } | |
|
174 | 168 | }; |
|
175 | 169 | |
|
176 | 170 | PyTypeObject CompressionParametersType = { |
@@ -185,7 +179,7 b' PyTypeObject CompressionParametersType =' | |||
|
185 | 179 | 0, /* tp_compare */ |
|
186 | 180 | 0, /* tp_repr */ |
|
187 | 181 | 0, /* tp_as_number */ |
|
188 | &CompressionParameters_sq, /* tp_as_sequence */ | |
|
182 | 0, /* tp_as_sequence */ | |
|
189 | 183 | 0, /* tp_as_mapping */ |
|
190 | 184 | 0, /* tp_hash */ |
|
191 | 185 | 0, /* tp_call */ |
@@ -193,7 +187,7 b' PyTypeObject CompressionParametersType =' | |||
|
193 | 187 | 0, /* tp_getattro */ |
|
194 | 188 | 0, /* tp_setattro */ |
|
195 | 189 | 0, /* tp_as_buffer */ |
|
196 |
Py_TPFLAGS_DEFAULT, |
|
|
190 | Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */ | |
|
197 | 191 | CompressionParameters__doc__, /* tp_doc */ |
|
198 | 192 | 0, /* tp_traverse */ |
|
199 | 193 | 0, /* tp_clear */ |
@@ -202,16 +196,16 b' PyTypeObject CompressionParametersType =' | |||
|
202 | 196 | 0, /* tp_iter */ |
|
203 | 197 | 0, /* tp_iternext */ |
|
204 | 198 | 0, /* tp_methods */ |
|
205 | 0, /* tp_members */ | |
|
199 | CompressionParameters_members, /* tp_members */ | |
|
206 | 200 | 0, /* tp_getset */ |
|
207 | 201 | 0, /* tp_base */ |
|
208 | 202 | 0, /* tp_dict */ |
|
209 | 203 | 0, /* tp_descr_get */ |
|
210 | 204 | 0, /* tp_descr_set */ |
|
211 | 205 | 0, /* tp_dictoffset */ |
|
212 | 0, /* tp_init */ | |
|
206 | (initproc)CompressionParameters_init, /* tp_init */ | |
|
213 | 207 | 0, /* tp_alloc */ |
|
214 | CompressionParameters_new, /* tp_new */ | |
|
208 | PyType_GenericNew, /* tp_new */ | |
|
215 | 209 | }; |
|
216 | 210 | |
|
217 | 211 | void compressionparams_module_init(PyObject* mod) { |
@@ -52,7 +52,7 b' static PyObject* ZstdCompressionWriter_e' | |||
|
52 | 52 | ZSTD_outBuffer output; |
|
53 | 53 | PyObject* res; |
|
54 | 54 | |
|
55 | if (!PyArg_ParseTuple(args, "OOO", &exc_type, &exc_value, &exc_tb)) { | |
|
55 | if (!PyArg_ParseTuple(args, "OOO:__exit__", &exc_type, &exc_value, &exc_tb)) { | |
|
56 | 56 | return NULL; |
|
57 | 57 | } |
|
58 | 58 | |
@@ -119,11 +119,12 b' static PyObject* ZstdCompressionWriter_w' | |||
|
119 | 119 | ZSTD_inBuffer input; |
|
120 | 120 | ZSTD_outBuffer output; |
|
121 | 121 | PyObject* res; |
|
122 | Py_ssize_t totalWrite = 0; | |
|
122 | 123 | |
|
123 | 124 | #if PY_MAJOR_VERSION >= 3 |
|
124 | if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { | |
|
125 | if (!PyArg_ParseTuple(args, "y#:write", &source, &sourceSize)) { | |
|
125 | 126 | #else |
|
126 | if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { | |
|
127 | if (!PyArg_ParseTuple(args, "s#:write", &source, &sourceSize)) { | |
|
127 | 128 | #endif |
|
128 | 129 | return NULL; |
|
129 | 130 | } |
@@ -164,20 +165,21 b' static PyObject* ZstdCompressionWriter_w' | |||
|
164 | 165 | #endif |
|
165 | 166 | output.dst, output.pos); |
|
166 | 167 | Py_XDECREF(res); |
|
168 | totalWrite += output.pos; | |
|
167 | 169 | } |
|
168 | 170 | output.pos = 0; |
|
169 | 171 | } |
|
170 | 172 | |
|
171 | 173 | PyMem_Free(output.dst); |
|
172 | 174 | |
|
173 | /* TODO return bytes written */ | |
|
174 | Py_RETURN_NONE; | |
|
175 | return PyLong_FromSsize_t(totalWrite); | |
|
175 | 176 | } |
|
176 | 177 | |
|
177 | 178 | static PyObject* ZstdCompressionWriter_flush(ZstdCompressionWriter* self, PyObject* args) { |
|
178 | 179 | size_t zresult; |
|
179 | 180 | ZSTD_outBuffer output; |
|
180 | 181 | PyObject* res; |
|
182 | Py_ssize_t totalWrite = 0; | |
|
181 | 183 | |
|
182 | 184 | if (!self->entered) { |
|
183 | 185 | PyErr_SetString(ZstdError, "flush must be called from an active context manager"); |
@@ -215,14 +217,14 b' static PyObject* ZstdCompressionWriter_f' | |||
|
215 | 217 | #endif |
|
216 | 218 | output.dst, output.pos); |
|
217 | 219 | Py_XDECREF(res); |
|
220 | totalWrite += output.pos; | |
|
218 | 221 | } |
|
219 | 222 | output.pos = 0; |
|
220 | 223 | } |
|
221 | 224 | |
|
222 | 225 | PyMem_Free(output.dst); |
|
223 | 226 | |
|
224 | /* TODO return bytes written */ | |
|
225 | Py_RETURN_NONE; | |
|
227 | return PyLong_FromSsize_t(totalWrite); | |
|
226 | 228 | } |
|
227 | 229 | |
|
228 | 230 | static PyMethodDef ZstdCompressionWriter_methods[] = { |
@@ -42,9 +42,9 b' static PyObject* ZstdCompressionObj_comp' | |||
|
42 | 42 | } |
|
43 | 43 | |
|
44 | 44 | #if PY_MAJOR_VERSION >= 3 |
|
45 | if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { | |
|
45 | if (!PyArg_ParseTuple(args, "y#:compress", &source, &sourceSize)) { | |
|
46 | 46 | #else |
|
47 | if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { | |
|
47 | if (!PyArg_ParseTuple(args, "s#:compress", &source, &sourceSize)) { | |
|
48 | 48 | #endif |
|
49 | 49 | return NULL; |
|
50 | 50 | } |
@@ -98,7 +98,7 b' static PyObject* ZstdCompressionObj_flus' | |||
|
98 | 98 | PyObject* result = NULL; |
|
99 | 99 | Py_ssize_t resultSize = 0; |
|
100 | 100 | |
|
101 | if (!PyArg_ParseTuple(args, "|i", &flushMode)) { | |
|
101 | if (!PyArg_ParseTuple(args, "|i:flush", &flushMode)) { | |
|
102 | 102 | return NULL; |
|
103 | 103 | } |
|
104 | 104 |
@@ -16,7 +16,7 b' int populate_cdict(ZstdCompressor* compr' | |||
|
16 | 16 | Py_BEGIN_ALLOW_THREADS |
|
17 | 17 | memset(&zmem, 0, sizeof(zmem)); |
|
18 | 18 | compressor->cdict = ZSTD_createCDict_advanced(compressor->dict->dictData, |
|
19 | compressor->dict->dictSize, *zparams, zmem); | |
|
19 | compressor->dict->dictSize, 1, *zparams, zmem); | |
|
20 | 20 | Py_END_ALLOW_THREADS |
|
21 | 21 | |
|
22 | 22 | if (!compressor->cdict) { |
@@ -128,8 +128,8 b' static int ZstdCompressor_init(ZstdCompr' | |||
|
128 | 128 | self->cparams = NULL; |
|
129 | 129 | self->cdict = NULL; |
|
130 | 130 | |
|
131 |
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iO!O!OOO", |
|
|
132 | &level, &ZstdCompressionDictType, &dict, | |
|
131 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|iO!O!OOO:ZstdCompressor", | |
|
132 | kwlist, &level, &ZstdCompressionDictType, &dict, | |
|
133 | 133 | &CompressionParametersType, ¶ms, |
|
134 | 134 | &writeChecksum, &writeContentSize, &writeDictID)) { |
|
135 | 135 | return -1; |
@@ -243,8 +243,8 b' static PyObject* ZstdCompressor_copy_str' | |||
|
243 | 243 | PyObject* totalReadPy; |
|
244 | 244 | PyObject* totalWritePy; |
|
245 | 245 | |
|
246 |
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nkk", kwlist, |
|
|
247 | &inSize, &outSize)) { | |
|
246 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|nkk:copy_stream", kwlist, | |
|
247 | &source, &dest, &sourceSize, &inSize, &outSize)) { | |
|
248 | 248 | return NULL; |
|
249 | 249 | } |
|
250 | 250 | |
@@ -402,9 +402,9 b' static PyObject* ZstdCompressor_compress' | |||
|
402 | 402 | ZSTD_parameters zparams; |
|
403 | 403 | |
|
404 | 404 | #if PY_MAJOR_VERSION >= 3 |
|
405 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y#|O", | |
|
405 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y#|O:compress", | |
|
406 | 406 | #else |
|
407 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s#|O", | |
|
407 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s#|O:compress", | |
|
408 | 408 | #endif |
|
409 | 409 | kwlist, &source, &sourceSize, &allowEmpty)) { |
|
410 | 410 | return NULL; |
@@ -512,7 +512,7 b' static ZstdCompressionObj* ZstdCompresso' | |||
|
512 | 512 | return NULL; |
|
513 | 513 | } |
|
514 | 514 | |
|
515 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n", kwlist, &inSize)) { | |
|
515 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|n:compressobj", kwlist, &inSize)) { | |
|
516 | 516 | return NULL; |
|
517 | 517 | } |
|
518 | 518 | |
@@ -574,8 +574,8 b' static ZstdCompressorIterator* ZstdCompr' | |||
|
574 | 574 | size_t outSize = ZSTD_CStreamOutSize(); |
|
575 | 575 | ZstdCompressorIterator* result; |
|
576 | 576 | |
|
577 |
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nkk", kwlist, |
|
|
578 | &inSize, &outSize)) { | |
|
577 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nkk:read_from", kwlist, | |
|
578 | &reader, &sourceSize, &inSize, &outSize)) { | |
|
579 | 579 | return NULL; |
|
580 | 580 | } |
|
581 | 581 | |
@@ -693,8 +693,8 b' static ZstdCompressionWriter* ZstdCompre' | |||
|
693 | 693 | Py_ssize_t sourceSize = 0; |
|
694 | 694 | size_t outSize = ZSTD_CStreamOutSize(); |
|
695 | 695 | |
|
696 |
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nk", kwlist, |
|
|
697 | &outSize)) { | |
|
696 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|nk:write_to", kwlist, | |
|
697 | &writer, &sourceSize, &outSize)) { | |
|
698 | 698 | return NULL; |
|
699 | 699 | } |
|
700 | 700 |
@@ -71,11 +71,12 b' static PyObject* ZstdDecompressionWriter' | |||
|
71 | 71 | ZSTD_inBuffer input; |
|
72 | 72 | ZSTD_outBuffer output; |
|
73 | 73 | PyObject* res; |
|
74 | Py_ssize_t totalWrite = 0; | |
|
74 | 75 | |
|
75 | 76 | #if PY_MAJOR_VERSION >= 3 |
|
76 | if (!PyArg_ParseTuple(args, "y#", &source, &sourceSize)) { | |
|
77 | if (!PyArg_ParseTuple(args, "y#:write", &source, &sourceSize)) { | |
|
77 | 78 | #else |
|
78 | if (!PyArg_ParseTuple(args, "s#", &source, &sourceSize)) { | |
|
79 | if (!PyArg_ParseTuple(args, "s#:write", &source, &sourceSize)) { | |
|
79 | 80 | #endif |
|
80 | 81 | return NULL; |
|
81 | 82 | } |
@@ -116,14 +117,14 b' static PyObject* ZstdDecompressionWriter' | |||
|
116 | 117 | #endif |
|
117 | 118 | output.dst, output.pos); |
|
118 | 119 | Py_XDECREF(res); |
|
120 | totalWrite += output.pos; | |
|
119 | 121 | output.pos = 0; |
|
120 | 122 | } |
|
121 | 123 | } |
|
122 | 124 | |
|
123 | 125 | PyMem_Free(output.dst); |
|
124 | 126 | |
|
125 | /* TODO return bytes written */ | |
|
126 | Py_RETURN_NONE; | |
|
127 | return PyLong_FromSsize_t(totalWrite); | |
|
127 | 128 |
|
|
128 | 129 | |
|
129 | 130 | static PyMethodDef ZstdDecompressionWriter_methods[] = { |
@@ -41,9 +41,9 b' static PyObject* DecompressionObj_decomp' | |||
|
41 | 41 | } |
|
42 | 42 | |
|
43 | 43 | #if PY_MAJOR_VERSION >= 3 |
|
44 | if (!PyArg_ParseTuple(args, "y#", | |
|
44 | if (!PyArg_ParseTuple(args, "y#:decompress", | |
|
45 | 45 | #else |
|
46 | if (!PyArg_ParseTuple(args, "s#", | |
|
46 | if (!PyArg_ParseTuple(args, "s#:decompress", | |
|
47 | 47 | #endif |
|
48 | 48 | &source, &sourceSize)) { |
|
49 | 49 | return NULL; |
@@ -59,23 +59,19 b' static int Decompressor_init(ZstdDecompr' | |||
|
59 | 59 | |
|
60 | 60 | ZstdCompressionDict* dict = NULL; |
|
61 | 61 | |
|
62 |
self-> |
|
|
62 | self->dctx = NULL; | |
|
63 | 63 | self->dict = NULL; |
|
64 | 64 | self->ddict = NULL; |
|
65 | 65 | |
|
66 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O!", kwlist, | |
|
66 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O!:ZstdDecompressor", kwlist, | |
|
67 | 67 | &ZstdCompressionDictType, &dict)) { |
|
68 | 68 | return -1; |
|
69 | 69 | } |
|
70 | 70 | |
|
71 | /* Instead of creating a ZSTD_DCtx for every decompression operation, | |
|
72 | we create an instance at object creation time and recycle it via | |
|
73 | ZSTD_copyDCTx() on each use. This means each use is a malloc+memcpy | |
|
74 | instead of a malloc+init. */ | |
|
75 | 71 | /* TODO lazily initialize the reference ZSTD_DCtx on first use since |
|
76 | 72 | not instances of ZstdDecompressor will use a ZSTD_DCtx. */ |
|
77 |
self-> |
|
|
78 |
if (!self-> |
|
|
73 | self->dctx = ZSTD_createDCtx(); | |
|
74 | if (!self->dctx) { | |
|
79 | 75 | PyErr_NoMemory(); |
|
80 | 76 | goto except; |
|
81 | 77 | } |
@@ -88,17 +84,17 b' static int Decompressor_init(ZstdDecompr' | |||
|
88 | 84 | return 0; |
|
89 | 85 | |
|
90 | 86 | except: |
|
91 |
if (self-> |
|
|
92 |
ZSTD_freeDCtx(self-> |
|
|
93 |
self-> |
|
|
87 | if (self->dctx) { | |
|
88 | ZSTD_freeDCtx(self->dctx); | |
|
89 | self->dctx = NULL; | |
|
94 | 90 | } |
|
95 | 91 | |
|
96 | 92 | return -1; |
|
97 | 93 | } |
|
98 | 94 | |
|
99 | 95 | static void Decompressor_dealloc(ZstdDecompressor* self) { |
|
100 |
if (self-> |
|
|
101 |
ZSTD_freeDCtx(self-> |
|
|
96 | if (self->dctx) { | |
|
97 | ZSTD_freeDCtx(self->dctx); | |
|
102 | 98 | } |
|
103 | 99 | |
|
104 | 100 | Py_XDECREF(self->dict); |
@@ -150,8 +146,8 b' static PyObject* Decompressor_copy_strea' | |||
|
150 | 146 | PyObject* totalReadPy; |
|
151 | 147 | PyObject* totalWritePy; |
|
152 | 148 | |
|
153 |
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|kk", kwlist, |
|
|
154 | &dest, &inSize, &outSize)) { | |
|
149 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "OO|kk:copy_stream", kwlist, | |
|
150 | &source, &dest, &inSize, &outSize)) { | |
|
155 | 151 | return NULL; |
|
156 | 152 | } |
|
157 | 153 | |
@@ -291,28 +287,19 b' PyObject* Decompressor_decompress(ZstdDe' | |||
|
291 | 287 | unsigned long long decompressedSize; |
|
292 | 288 | size_t destCapacity; |
|
293 | 289 | PyObject* result = NULL; |
|
294 | ZSTD_DCtx* dctx = NULL; | |
|
295 | 290 | void* dictData = NULL; |
|
296 | 291 | size_t dictSize = 0; |
|
297 | 292 | size_t zresult; |
|
298 | 293 | |
|
299 | 294 | #if PY_MAJOR_VERSION >= 3 |
|
300 |
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y#|n", |
|
|
295 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "y#|n:decompress", | |
|
301 | 296 | #else |
|
302 |
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s#|n", |
|
|
297 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s#|n:decompress", | |
|
303 | 298 | #endif |
|
304 | &source, &sourceSize, &maxOutputSize)) { | |
|
299 | kwlist, &source, &sourceSize, &maxOutputSize)) { | |
|
305 | 300 | return NULL; |
|
306 | 301 | } |
|
307 | 302 | |
|
308 | dctx = PyMem_Malloc(ZSTD_sizeof_DCtx(self->refdctx)); | |
|
309 | if (!dctx) { | |
|
310 | PyErr_NoMemory(); | |
|
311 | return NULL; | |
|
312 | } | |
|
313 | ||
|
314 | ZSTD_copyDCtx(dctx, self->refdctx); | |
|
315 | ||
|
316 | 303 | if (self->dict) { |
|
317 | 304 | dictData = self->dict->dictData; |
|
318 | 305 | dictSize = self->dict->dictSize; |
@@ -320,12 +307,12 b' PyObject* Decompressor_decompress(ZstdDe' | |||
|
320 | 307 | |
|
321 | 308 | if (dictData && !self->ddict) { |
|
322 | 309 | Py_BEGIN_ALLOW_THREADS |
|
323 | self->ddict = ZSTD_createDDict(dictData, dictSize); | |
|
310 | self->ddict = ZSTD_createDDict_byReference(dictData, dictSize); | |
|
324 | 311 | Py_END_ALLOW_THREADS |
|
325 | 312 | |
|
326 | 313 | if (!self->ddict) { |
|
327 | 314 | PyErr_SetString(ZstdError, "could not create decompression dict"); |
|
328 | goto except; | |
|
315 | return NULL; | |
|
329 | 316 | } |
|
330 | 317 | } |
|
331 | 318 | |
@@ -335,7 +322,7 b' PyObject* Decompressor_decompress(ZstdDe' | |||
|
335 | 322 | if (0 == maxOutputSize) { |
|
336 | 323 | PyErr_SetString(ZstdError, "input data invalid or missing content size " |
|
337 | 324 | "in frame header"); |
|
338 | goto except; | |
|
325 | return NULL; | |
|
339 | 326 | } |
|
340 | 327 | else { |
|
341 | 328 | result = PyBytes_FromStringAndSize(NULL, maxOutputSize); |
@@ -348,43 +335,37 b' PyObject* Decompressor_decompress(ZstdDe' | |||
|
348 | 335 | } |
|
349 | 336 | |
|
350 | 337 | if (!result) { |
|
351 | goto except; | |
|
338 | return NULL; | |
|
352 | 339 | } |
|
353 | 340 | |
|
354 | 341 | Py_BEGIN_ALLOW_THREADS |
|
355 | 342 | if (self->ddict) { |
|
356 |
zresult = ZSTD_decompress_usingDDict(dctx, |
|
|
343 | zresult = ZSTD_decompress_usingDDict(self->dctx, | |
|
344 | PyBytes_AsString(result), destCapacity, | |
|
357 | 345 | source, sourceSize, self->ddict); |
|
358 | 346 | } |
|
359 | 347 | else { |
|
360 | zresult = ZSTD_decompressDCtx(dctx, PyBytes_AsString(result), destCapacity, source, sourceSize); | |
|
348 | zresult = ZSTD_decompressDCtx(self->dctx, | |
|
349 | PyBytes_AsString(result), destCapacity, source, sourceSize); | |
|
361 | 350 | } |
|
362 | 351 | Py_END_ALLOW_THREADS |
|
363 | 352 | |
|
364 | 353 | if (ZSTD_isError(zresult)) { |
|
365 | 354 | PyErr_Format(ZstdError, "decompression error: %s", ZSTD_getErrorName(zresult)); |
|
366 | goto except; | |
|
355 | Py_DecRef(result); | |
|
356 | return NULL; | |
|
367 | 357 | } |
|
368 | 358 | else if (decompressedSize && zresult != decompressedSize) { |
|
369 | 359 | PyErr_Format(ZstdError, "decompression error: decompressed %zu bytes; expected %llu", |
|
370 | 360 | zresult, decompressedSize); |
|
371 | goto except; | |
|
361 | Py_DecRef(result); | |
|
362 | return NULL; | |
|
372 | 363 | } |
|
373 | 364 | else if (zresult < destCapacity) { |
|
374 | 365 | if (_PyBytes_Resize(&result, zresult)) { |
|
375 | goto except; | |
|
376 | } | |
|
366 | Py_DecRef(result); | |
|
367 | return NULL; | |
|
377 | 368 | } |
|
378 | ||
|
379 | goto finally; | |
|
380 | ||
|
381 | except: | |
|
382 | Py_DecRef(result); | |
|
383 | result = NULL; | |
|
384 | ||
|
385 | finally: | |
|
386 | if (dctx) { | |
|
387 | PyMem_FREE(dctx); | |
|
388 | 369 | } |
|
389 | 370 | |
|
390 | 371 | return result; |
@@ -455,8 +436,8 b' static ZstdDecompressorIterator* Decompr' | |||
|
455 | 436 | ZstdDecompressorIterator* result; |
|
456 | 437 | size_t skipBytes = 0; |
|
457 | 438 | |
|
458 |
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kkk", kwlist, |
|
|
459 | &inSize, &outSize, &skipBytes)) { | |
|
439 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|kkk:read_from", kwlist, | |
|
440 | &reader, &inSize, &outSize, &skipBytes)) { | |
|
460 | 441 | return NULL; |
|
461 | 442 | } |
|
462 | 443 | |
@@ -534,19 +515,14 b' static ZstdDecompressorIterator* Decompr' | |||
|
534 | 515 | goto finally; |
|
535 | 516 | |
|
536 | 517 | except: |
|
537 |
|
|
|
538 | Py_DECREF(result->reader); | |
|
539 | result->reader = NULL; | |
|
540 | } | |
|
518 | Py_CLEAR(result->reader); | |
|
541 | 519 | |
|
542 | 520 | if (result->buffer) { |
|
543 | 521 | PyBuffer_Release(result->buffer); |
|
544 |
Py_ |
|
|
545 | result->buffer = NULL; | |
|
522 | Py_CLEAR(result->buffer); | |
|
546 | 523 | } |
|
547 | 524 | |
|
548 |
Py_ |
|
|
549 | result = NULL; | |
|
525 | Py_CLEAR(result); | |
|
550 | 526 | |
|
551 | 527 | finally: |
|
552 | 528 | |
@@ -577,7 +553,8 b' static ZstdDecompressionWriter* Decompre' | |||
|
577 | 553 | size_t outSize = ZSTD_DStreamOutSize(); |
|
578 | 554 | ZstdDecompressionWriter* result; |
|
579 | 555 | |
|
580 |
if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|k", kwlist, |
|
|
556 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O|k:write_to", kwlist, | |
|
557 | &writer, &outSize)) { | |
|
581 | 558 | return NULL; |
|
582 | 559 | } |
|
583 | 560 | |
@@ -605,6 +582,200 b' static ZstdDecompressionWriter* Decompre' | |||
|
605 | 582 | return result; |
|
606 | 583 | } |
|
607 | 584 | |
|
585 | PyDoc_STRVAR(Decompressor_decompress_content_dict_chain__doc__, | |
|
586 | "Decompress a series of chunks using the content dictionary chaining technique\n" | |
|
587 | ); | |
|
588 | ||
|
589 | static PyObject* Decompressor_decompress_content_dict_chain(PyObject* self, PyObject* args, PyObject* kwargs) { | |
|
590 | static char* kwlist[] = { | |
|
591 | "frames", | |
|
592 | NULL | |
|
593 | }; | |
|
594 | ||
|
595 | PyObject* chunks; | |
|
596 | Py_ssize_t chunksLen; | |
|
597 | Py_ssize_t chunkIndex; | |
|
598 | char parity = 0; | |
|
599 | PyObject* chunk; | |
|
600 | char* chunkData; | |
|
601 | Py_ssize_t chunkSize; | |
|
602 | ZSTD_DCtx* dctx = NULL; | |
|
603 | size_t zresult; | |
|
604 | ZSTD_frameParams frameParams; | |
|
605 | void* buffer1 = NULL; | |
|
606 | size_t buffer1Size = 0; | |
|
607 | size_t buffer1ContentSize = 0; | |
|
608 | void* buffer2 = NULL; | |
|
609 | size_t buffer2Size = 0; | |
|
610 | size_t buffer2ContentSize = 0; | |
|
611 | void* destBuffer = NULL; | |
|
612 | PyObject* result = NULL; | |
|
613 | ||
|
614 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "O!:decompress_content_dict_chain", | |
|
615 | kwlist, &PyList_Type, &chunks)) { | |
|
616 | return NULL; | |
|
617 | } | |
|
618 | ||
|
619 | chunksLen = PyList_Size(chunks); | |
|
620 | if (!chunksLen) { | |
|
621 | PyErr_SetString(PyExc_ValueError, "empty input chain"); | |
|
622 | return NULL; | |
|
623 | } | |
|
624 | ||
|
625 | /* The first chunk should not be using a dictionary. We handle it specially. */ | |
|
626 | chunk = PyList_GetItem(chunks, 0); | |
|
627 | if (!PyBytes_Check(chunk)) { | |
|
628 | PyErr_SetString(PyExc_ValueError, "chunk 0 must be bytes"); | |
|
629 | return NULL; | |
|
630 | } | |
|
631 | ||
|
632 | /* We require that all chunks be zstd frames and that they have content size set. */ | |
|
633 | PyBytes_AsStringAndSize(chunk, &chunkData, &chunkSize); | |
|
634 | zresult = ZSTD_getFrameParams(&frameParams, (void*)chunkData, chunkSize); | |
|
635 | if (ZSTD_isError(zresult)) { | |
|
636 | PyErr_SetString(PyExc_ValueError, "chunk 0 is not a valid zstd frame"); | |
|
637 | return NULL; | |
|
638 | } | |
|
639 | else if (zresult) { | |
|
640 | PyErr_SetString(PyExc_ValueError, "chunk 0 is too small to contain a zstd frame"); | |
|
641 | return NULL; | |
|
642 | } | |
|
643 | ||
|
644 | if (0 == frameParams.frameContentSize) { | |
|
645 | PyErr_SetString(PyExc_ValueError, "chunk 0 missing content size in frame"); | |
|
646 | return NULL; | |
|
647 | } | |
|
648 | ||
|
649 | dctx = ZSTD_createDCtx(); | |
|
650 | if (!dctx) { | |
|
651 | PyErr_NoMemory(); | |
|
652 | goto finally; | |
|
653 | } | |
|
654 | ||
|
655 | buffer1Size = frameParams.frameContentSize; | |
|
656 | buffer1 = PyMem_Malloc(buffer1Size); | |
|
657 | if (!buffer1) { | |
|
658 | goto finally; | |
|
659 | } | |
|
660 | ||
|
661 | Py_BEGIN_ALLOW_THREADS | |
|
662 | zresult = ZSTD_decompressDCtx(dctx, buffer1, buffer1Size, chunkData, chunkSize); | |
|
663 | Py_END_ALLOW_THREADS | |
|
664 | if (ZSTD_isError(zresult)) { | |
|
665 | PyErr_Format(ZstdError, "could not decompress chunk 0: %s", ZSTD_getErrorName(zresult)); | |
|
666 | goto finally; | |
|
667 | } | |
|
668 | ||
|
669 | buffer1ContentSize = zresult; | |
|
670 | ||
|
671 | /* Special case of a simple chain. */ | |
|
672 | if (1 == chunksLen) { | |
|
673 | result = PyBytes_FromStringAndSize(buffer1, buffer1Size); | |
|
674 | goto finally; | |
|
675 | } | |
|
676 | ||
|
677 | /* This should ideally look at next chunk. But this is slightly simpler. */ | |
|
678 | buffer2Size = frameParams.frameContentSize; | |
|
679 | buffer2 = PyMem_Malloc(buffer2Size); | |
|
680 | if (!buffer2) { | |
|
681 | goto finally; | |
|
682 | } | |
|
683 | ||
|
684 | /* For each subsequent chunk, use the previous fulltext as a content dictionary. | |
|
685 | Our strategy is to have 2 buffers. One holds the previous fulltext (to be | |
|
686 | used as a content dictionary) and the other holds the new fulltext. The | |
|
687 | buffers grow when needed but never decrease in size. This limits the | |
|
688 | memory allocator overhead. | |
|
689 | */ | |
|
690 | for (chunkIndex = 1; chunkIndex < chunksLen; chunkIndex++) { | |
|
691 | chunk = PyList_GetItem(chunks, chunkIndex); | |
|
692 | if (!PyBytes_Check(chunk)) { | |
|
693 | PyErr_Format(PyExc_ValueError, "chunk %zd must be bytes", chunkIndex); | |
|
694 | goto finally; | |
|
695 | } | |
|
696 | ||
|
697 | PyBytes_AsStringAndSize(chunk, &chunkData, &chunkSize); | |
|
698 | zresult = ZSTD_getFrameParams(&frameParams, (void*)chunkData, chunkSize); | |
|
699 | if (ZSTD_isError(zresult)) { | |
|
700 | PyErr_Format(PyExc_ValueError, "chunk %zd is not a valid zstd frame", chunkIndex); | |
|
701 | goto finally; | |
|
702 | } | |
|
703 | else if (zresult) { | |
|
704 | PyErr_Format(PyExc_ValueError, "chunk %zd is too small to contain a zstd frame", chunkIndex); | |
|
705 | goto finally; | |
|
706 | } | |
|
707 | ||
|
708 | if (0 == frameParams.frameContentSize) { | |
|
709 | PyErr_Format(PyExc_ValueError, "chunk %zd missing content size in frame", chunkIndex); | |
|
710 | goto finally; | |
|
711 | } | |
|
712 | ||
|
713 | parity = chunkIndex % 2; | |
|
714 | ||
|
715 | /* This could definitely be abstracted to reduce code duplication. */ | |
|
716 | if (parity) { | |
|
717 | /* Resize destination buffer to hold larger content. */ | |
|
718 | if (buffer2Size < frameParams.frameContentSize) { | |
|
719 | buffer2Size = frameParams.frameContentSize; | |
|
720 | destBuffer = PyMem_Realloc(buffer2, buffer2Size); | |
|
721 | if (!destBuffer) { | |
|
722 | goto finally; | |
|
723 | } | |
|
724 | buffer2 = destBuffer; | |
|
725 | } | |
|
726 | ||
|
727 | Py_BEGIN_ALLOW_THREADS | |
|
728 | zresult = ZSTD_decompress_usingDict(dctx, buffer2, buffer2Size, | |
|
729 | chunkData, chunkSize, buffer1, buffer1ContentSize); | |
|
730 | Py_END_ALLOW_THREADS | |
|
731 | if (ZSTD_isError(zresult)) { | |
|
732 | PyErr_Format(ZstdError, "could not decompress chunk %zd: %s", | |
|
733 | chunkIndex, ZSTD_getErrorName(zresult)); | |
|
734 | goto finally; | |
|
735 | } | |
|
736 | buffer2ContentSize = zresult; | |
|
737 | } | |
|
738 | else { | |
|
739 | if (buffer1Size < frameParams.frameContentSize) { | |
|
740 | buffer1Size = frameParams.frameContentSize; | |
|
741 | destBuffer = PyMem_Realloc(buffer1, buffer1Size); | |
|
742 | if (!destBuffer) { | |
|
743 | goto finally; | |
|
744 | } | |
|
745 | buffer1 = destBuffer; | |
|
746 | } | |
|
747 | ||
|
748 | Py_BEGIN_ALLOW_THREADS | |
|
749 | zresult = ZSTD_decompress_usingDict(dctx, buffer1, buffer1Size, | |
|
750 | chunkData, chunkSize, buffer2, buffer2ContentSize); | |
|
751 | Py_END_ALLOW_THREADS | |
|
752 | if (ZSTD_isError(zresult)) { | |
|
753 | PyErr_Format(ZstdError, "could not decompress chunk %zd: %s", | |
|
754 | chunkIndex, ZSTD_getErrorName(zresult)); | |
|
755 | goto finally; | |
|
756 | } | |
|
757 | buffer1ContentSize = zresult; | |
|
758 | } | |
|
759 | } | |
|
760 | ||
|
761 | result = PyBytes_FromStringAndSize(parity ? buffer2 : buffer1, | |
|
762 | parity ? buffer2ContentSize : buffer1ContentSize); | |
|
763 | ||
|
764 | finally: | |
|
765 | if (buffer2) { | |
|
766 | PyMem_Free(buffer2); | |
|
767 | } | |
|
768 | if (buffer1) { | |
|
769 | PyMem_Free(buffer1); | |
|
770 | } | |
|
771 | ||
|
772 | if (dctx) { | |
|
773 | ZSTD_freeDCtx(dctx); | |
|
774 | } | |
|
775 | ||
|
776 | return result; | |
|
777 | } | |
|
778 | ||
|
608 | 779 | static PyMethodDef Decompressor_methods[] = { |
|
609 | 780 | { "copy_stream", (PyCFunction)Decompressor_copy_stream, METH_VARARGS | METH_KEYWORDS, |
|
610 | 781 | Decompressor_copy_stream__doc__ }, |
@@ -616,6 +787,8 b' static PyMethodDef Decompressor_methods[' | |||
|
616 | 787 | Decompressor_read_from__doc__ }, |
|
617 | 788 | { "write_to", (PyCFunction)Decompressor_write_to, METH_VARARGS | METH_KEYWORDS, |
|
618 | 789 | Decompressor_write_to__doc__ }, |
|
790 | { "decompress_content_dict_chain", (PyCFunction)Decompressor_decompress_content_dict_chain, | |
|
791 | METH_VARARGS | METH_KEYWORDS, Decompressor_decompress_content_dict_chain__doc__ }, | |
|
619 | 792 | { NULL, NULL } |
|
620 | 793 | }; |
|
621 | 794 |
@@ -18,8 +18,8 b' static PyObject* DictParameters_new(PyTy' | |||
|
18 | 18 | unsigned notificationLevel; |
|
19 | 19 | unsigned dictID; |
|
20 | 20 | |
|
21 | if (!PyArg_ParseTuple(args, "IiII", &selectivityLevel, &compressionLevel, | |
|
22 | ¬ificationLevel, &dictID)) { | |
|
21 | if (!PyArg_ParseTuple(args, "IiII:DictParameters", | |
|
22 | &selectivityLevel, &compressionLevel, ¬ificationLevel, &dictID)) { | |
|
23 | 23 | return NULL; |
|
24 | 24 | } |
|
25 | 25 | |
@@ -40,6 +40,22 b' static void DictParameters_dealloc(PyObj' | |||
|
40 | 40 | PyObject_Del(self); |
|
41 | 41 | } |
|
42 | 42 | |
|
43 | static PyMemberDef DictParameters_members[] = { | |
|
44 | { "selectivity_level", T_UINT, | |
|
45 | offsetof(DictParametersObject, selectivityLevel), READONLY, | |
|
46 | "selectivity level" }, | |
|
47 | { "compression_level", T_INT, | |
|
48 | offsetof(DictParametersObject, compressionLevel), READONLY, | |
|
49 | "compression level" }, | |
|
50 | { "notification_level", T_UINT, | |
|
51 | offsetof(DictParametersObject, notificationLevel), READONLY, | |
|
52 | "notification level" }, | |
|
53 | { "dict_id", T_UINT, | |
|
54 | offsetof(DictParametersObject, dictID), READONLY, | |
|
55 | "dictionary ID" }, | |
|
56 | { NULL } | |
|
57 | }; | |
|
58 | ||
|
43 | 59 | static Py_ssize_t DictParameters_length(PyObject* self) { |
|
44 | 60 | return 4; |
|
45 | 61 | } |
@@ -102,7 +118,7 b' PyTypeObject DictParametersType = {' | |||
|
102 | 118 | 0, /* tp_iter */ |
|
103 | 119 | 0, /* tp_iternext */ |
|
104 | 120 | 0, /* tp_methods */ |
|
105 | 0, /* tp_members */ | |
|
121 | DictParameters_members, /* tp_members */ | |
|
106 | 122 | 0, /* tp_getset */ |
|
107 | 123 | 0, /* tp_base */ |
|
108 | 124 | 0, /* tp_dict */ |
@@ -8,6 +8,7 b'' | |||
|
8 | 8 | |
|
9 | 9 | #define PY_SSIZE_T_CLEAN |
|
10 | 10 | #include <Python.h> |
|
11 | #include "structmember.h" | |
|
11 | 12 | |
|
12 | 13 | #define ZSTD_STATIC_LINKING_ONLY |
|
13 | 14 | #define ZDICT_STATIC_LINKING_ONLY |
@@ -15,7 +16,7 b'' | |||
|
15 | 16 | #include "zstd.h" |
|
16 | 17 | #include "zdict.h" |
|
17 | 18 | |
|
18 |
#define PYTHON_ZSTANDARD_VERSION "0. |
|
|
19 | #define PYTHON_ZSTANDARD_VERSION "0.7.0" | |
|
19 | 20 | |
|
20 | 21 | typedef enum { |
|
21 | 22 | compressorobj_flush_finish, |
@@ -37,6 +38,16 b' extern PyTypeObject CompressionParameter' | |||
|
37 | 38 | |
|
38 | 39 | typedef struct { |
|
39 | 40 | PyObject_HEAD |
|
41 | unsigned long long frameContentSize; | |
|
42 | unsigned windowSize; | |
|
43 | unsigned dictID; | |
|
44 | char checksumFlag; | |
|
45 | } FrameParametersObject; | |
|
46 | ||
|
47 | extern PyTypeObject FrameParametersType; | |
|
48 | ||
|
49 | typedef struct { | |
|
50 | PyObject_HEAD | |
|
40 | 51 | unsigned selectivityLevel; |
|
41 | 52 | int compressionLevel; |
|
42 | 53 | unsigned notificationLevel; |
@@ -115,7 +126,7 b' extern PyTypeObject ZstdCompressorIterat' | |||
|
115 | 126 | typedef struct { |
|
116 | 127 | PyObject_HEAD |
|
117 | 128 | |
|
118 |
ZSTD_DCtx* |
|
|
129 | ZSTD_DCtx* dctx; | |
|
119 | 130 | |
|
120 | 131 | ZstdCompressionDict* dict; |
|
121 | 132 | ZSTD_DDict* ddict; |
@@ -172,6 +183,7 b' typedef struct {' | |||
|
172 | 183 | |
|
173 | 184 | void ztopy_compression_parameters(CompressionParametersObject* params, ZSTD_compressionParameters* zparams); |
|
174 | 185 | CompressionParametersObject* get_compression_parameters(PyObject* self, PyObject* args); |
|
186 | FrameParametersObject* get_frame_parameters(PyObject* self, PyObject* args); | |
|
175 | 187 | PyObject* estimate_compression_context_size(PyObject* self, PyObject* args); |
|
176 | 188 | ZSTD_CStream* CStream_from_ZstdCompressor(ZstdCompressor* compressor, Py_ssize_t sourceSize); |
|
177 | 189 | ZSTD_DStream* DStream_from_ZstdDecompressor(ZstdDecompressor* decompressor); |
@@ -9,6 +9,7 b' from __future__ import absolute_import' | |||
|
9 | 9 | import cffi |
|
10 | 10 | import distutils.ccompiler |
|
11 | 11 | import os |
|
12 | import re | |
|
12 | 13 | import subprocess |
|
13 | 14 | import tempfile |
|
14 | 15 | |
@@ -19,6 +20,8 b" SOURCES = ['zstd/%s' % p for p in (" | |||
|
19 | 20 | 'common/entropy_common.c', |
|
20 | 21 | 'common/error_private.c', |
|
21 | 22 | 'common/fse_decompress.c', |
|
23 | 'common/pool.c', | |
|
24 | 'common/threading.c', | |
|
22 | 25 | 'common/xxhash.c', |
|
23 | 26 | 'common/zstd_common.c', |
|
24 | 27 | 'compress/fse_compress.c', |
@@ -26,10 +29,17 b" SOURCES = ['zstd/%s' % p for p in (" | |||
|
26 | 29 | 'compress/zstd_compress.c', |
|
27 | 30 | 'decompress/huf_decompress.c', |
|
28 | 31 | 'decompress/zstd_decompress.c', |
|
32 | 'dictBuilder/cover.c', | |
|
29 | 33 | 'dictBuilder/divsufsort.c', |
|
30 | 34 | 'dictBuilder/zdict.c', |
|
31 | 35 | )] |
|
32 | 36 | |
|
37 | HEADERS = [os.path.join(HERE, 'zstd', *p) for p in ( | |
|
38 | ('zstd.h',), | |
|
39 | ('common', 'pool.h'), | |
|
40 | ('dictBuilder', 'zdict.h'), | |
|
41 | )] | |
|
42 | ||
|
33 | 43 | INCLUDE_DIRS = [os.path.join(HERE, d) for d in ( |
|
34 | 44 | 'zstd', |
|
35 | 45 | 'zstd/common', |
@@ -53,56 +63,92 b" if compiler.compiler_type == 'unix':" | |||
|
53 | 63 | args.extend([ |
|
54 | 64 | '-E', |
|
55 | 65 | '-DZSTD_STATIC_LINKING_ONLY', |
|
66 | '-DZDICT_STATIC_LINKING_ONLY', | |
|
56 | 67 | ]) |
|
57 | 68 | elif compiler.compiler_type == 'msvc': |
|
58 | 69 | args = [compiler.cc] |
|
59 | 70 | args.extend([ |
|
60 | 71 | '/EP', |
|
61 | 72 | '/DZSTD_STATIC_LINKING_ONLY', |
|
73 | '/DZDICT_STATIC_LINKING_ONLY', | |
|
62 | 74 | ]) |
|
63 | 75 | else: |
|
64 | 76 | raise Exception('unsupported compiler type: %s' % compiler.compiler_type) |
|
65 | 77 | |
|
78 | def preprocess(path): | |
|
66 | 79 | # zstd.h includes <stddef.h>, which is also included by cffi's boilerplate. |
|
67 | 80 | # This can lead to duplicate declarations. So we strip this include from the |
|
68 | 81 | # preprocessor invocation. |
|
69 | ||
|
70 | with open(os.path.join(HERE, 'zstd', 'zstd.h'), 'rb') as fh: | |
|
82 | with open(path, 'rb') as fh: | |
|
71 | 83 | lines = [l for l in fh if not l.startswith(b'#include <stddef.h>')] |
|
72 | 84 | |
|
73 | 85 | fd, input_file = tempfile.mkstemp(suffix='.h') |
|
74 | 86 | os.write(fd, b''.join(lines)) |
|
75 | 87 | os.close(fd) |
|
76 | 88 | |
|
77 | args.append(input_file) | |
|
78 | ||
|
79 | 89 | try: |
|
80 | process = subprocess.Popen(args, stdout=subprocess.PIPE) | |
|
90 | process = subprocess.Popen(args + [input_file], stdout=subprocess.PIPE) | |
|
81 | 91 | output = process.communicate()[0] |
|
82 | 92 | ret = process.poll() |
|
83 | 93 | if ret: |
|
84 | 94 | raise Exception('preprocessor exited with error') |
|
95 | ||
|
96 | return output | |
|
85 | 97 | finally: |
|
86 | 98 | os.unlink(input_file) |
|
87 | 99 | |
|
88 | def normalize_output(): | |
|
100 | ||
|
101 | def normalize_output(output): | |
|
89 | 102 | lines = [] |
|
90 | 103 | for line in output.splitlines(): |
|
91 | 104 | # CFFI's parser doesn't like __attribute__ on UNIX compilers. |
|
92 | 105 | if line.startswith(b'__attribute__ ((visibility ("default"))) '): |
|
93 | 106 | line = line[len(b'__attribute__ ((visibility ("default"))) '):] |
|
94 | 107 | |
|
108 | if line.startswith(b'__attribute__((deprecated('): | |
|
109 | continue | |
|
110 | elif b'__declspec(deprecated(' in line: | |
|
111 | continue | |
|
112 | ||
|
95 | 113 | lines.append(line) |
|
96 | 114 | |
|
97 | 115 | return b'\n'.join(lines) |
|
98 | 116 | |
|
117 | ||
|
99 | 118 | ffi = cffi.FFI() |
|
100 | 119 | ffi.set_source('_zstd_cffi', ''' |
|
120 | #include "mem.h" | |
|
101 | 121 | #define ZSTD_STATIC_LINKING_ONLY |
|
102 | 122 | #include "zstd.h" |
|
123 | #define ZDICT_STATIC_LINKING_ONLY | |
|
124 | #include "pool.h" | |
|
125 | #include "zdict.h" | |
|
103 | 126 | ''', sources=SOURCES, include_dirs=INCLUDE_DIRS) |
|
104 | 127 | |
|
105 | ffi.cdef(normalize_output().decode('latin1')) | |
|
128 | DEFINE = re.compile(b'^\\#define ([a-zA-Z0-9_]+) ') | |
|
129 | ||
|
130 | sources = [] | |
|
131 | ||
|
132 | for header in HEADERS: | |
|
133 | preprocessed = preprocess(header) | |
|
134 | sources.append(normalize_output(preprocessed)) | |
|
135 | ||
|
136 | # Do another pass over source and find constants that were preprocessed | |
|
137 | # away. | |
|
138 | with open(header, 'rb') as fh: | |
|
139 | for line in fh: | |
|
140 | line = line.strip() | |
|
141 | m = DEFINE.match(line) | |
|
142 | if not m: | |
|
143 | continue | |
|
144 | ||
|
145 | # The parser doesn't like some constants with complex values. | |
|
146 | if m.group(1) in (b'ZSTD_LIB_VERSION', b'ZSTD_VERSION_STRING'): | |
|
147 | continue | |
|
148 | ||
|
149 | sources.append(m.group(0) + b' ...') | |
|
150 | ||
|
151 | ffi.cdef(u'\n'.join(s.decode('latin1') for s in sources)) | |
|
106 | 152 | |
|
107 | 153 | if __name__ == '__main__': |
|
108 | 154 | ffi.compile() |
@@ -62,6 +62,7 b' setup(' | |||
|
62 | 62 | 'Programming Language :: Python :: 3.3', |
|
63 | 63 | 'Programming Language :: Python :: 3.4', |
|
64 | 64 | 'Programming Language :: Python :: 3.5', |
|
65 | 'Programming Language :: Python :: 3.6', | |
|
65 | 66 | ], |
|
66 | 67 | keywords='zstandard zstd compression', |
|
67 | 68 | ext_modules=extensions, |
@@ -12,6 +12,8 b" zstd_sources = ['zstd/%s' % p for p in (" | |||
|
12 | 12 | 'common/entropy_common.c', |
|
13 | 13 | 'common/error_private.c', |
|
14 | 14 | 'common/fse_decompress.c', |
|
15 | 'common/pool.c', | |
|
16 | 'common/threading.c', | |
|
15 | 17 | 'common/xxhash.c', |
|
16 | 18 | 'common/zstd_common.c', |
|
17 | 19 | 'compress/fse_compress.c', |
@@ -19,11 +21,13 b" zstd_sources = ['zstd/%s' % p for p in (" | |||
|
19 | 21 | 'compress/zstd_compress.c', |
|
20 | 22 | 'decompress/huf_decompress.c', |
|
21 | 23 | 'decompress/zstd_decompress.c', |
|
24 | 'dictBuilder/cover.c', | |
|
22 | 25 | 'dictBuilder/divsufsort.c', |
|
23 | 26 | 'dictBuilder/zdict.c', |
|
24 | 27 | )] |
|
25 | 28 | |
|
26 | 29 | zstd_sources_legacy = ['zstd/%s' % p for p in ( |
|
30 | 'deprecated/zbuff_common.c', | |
|
27 | 31 | 'deprecated/zbuff_compress.c', |
|
28 | 32 | 'deprecated/zbuff_decompress.c', |
|
29 | 33 | 'legacy/zstd_v01.c', |
@@ -63,6 +67,7 b' ext_sources = [' | |||
|
63 | 67 | 'c-ext/decompressoriterator.c', |
|
64 | 68 | 'c-ext/decompressionwriter.c', |
|
65 | 69 | 'c-ext/dictparams.c', |
|
70 | 'c-ext/frameparams.c', | |
|
66 | 71 | ] |
|
67 | 72 | |
|
68 | 73 | zstd_depends = [ |
@@ -1,4 +1,50 b'' | |||
|
1 | import inspect | |
|
1 | 2 | import io |
|
3 | import types | |
|
4 | ||
|
5 | ||
|
6 | def make_cffi(cls): | |
|
7 | """Decorator to add CFFI versions of each test method.""" | |
|
8 | ||
|
9 | try: | |
|
10 | import zstd_cffi | |
|
11 | except ImportError: | |
|
12 | return cls | |
|
13 | ||
|
14 | # If CFFI version is available, dynamically construct test methods | |
|
15 | # that use it. | |
|
16 | ||
|
17 | for attr in dir(cls): | |
|
18 | fn = getattr(cls, attr) | |
|
19 | if not inspect.ismethod(fn) and not inspect.isfunction(fn): | |
|
20 | continue | |
|
21 | ||
|
22 | if not fn.__name__.startswith('test_'): | |
|
23 | continue | |
|
24 | ||
|
25 | name = '%s_cffi' % fn.__name__ | |
|
26 | ||
|
27 | # Replace the "zstd" symbol with the CFFI module instance. Then copy | |
|
28 | # the function object and install it in a new attribute. | |
|
29 | if isinstance(fn, types.FunctionType): | |
|
30 | globs = dict(fn.__globals__) | |
|
31 | globs['zstd'] = zstd_cffi | |
|
32 | new_fn = types.FunctionType(fn.__code__, globs, name, | |
|
33 | fn.__defaults__, fn.__closure__) | |
|
34 | new_method = new_fn | |
|
35 | else: | |
|
36 | globs = dict(fn.__func__.func_globals) | |
|
37 | globs['zstd'] = zstd_cffi | |
|
38 | new_fn = types.FunctionType(fn.__func__.func_code, globs, name, | |
|
39 | fn.__func__.func_defaults, | |
|
40 | fn.__func__.func_closure) | |
|
41 | new_method = types.UnboundMethodType(new_fn, fn.im_self, | |
|
42 | fn.im_class) | |
|
43 | ||
|
44 | setattr(cls, name, new_method) | |
|
45 | ||
|
46 | return cls | |
|
47 | ||
|
2 | 48 | |
|
3 | 49 | class OpCountingBytesIO(io.BytesIO): |
|
4 | 50 | def __init__(self, *args, **kwargs): |
@@ -10,7 +10,10 b' except ImportError:' | |||
|
10 | 10 | |
|
11 | 11 | import zstd |
|
12 | 12 | |
|
13 |
from .common import |
|
|
13 | from .common import ( | |
|
14 | make_cffi, | |
|
15 | OpCountingBytesIO, | |
|
16 | ) | |
|
14 | 17 | |
|
15 | 18 | |
|
16 | 19 | if sys.version_info[0] >= 3: |
@@ -19,6 +22,7 b' else:' | |||
|
19 | 22 | next = lambda it: it.next() |
|
20 | 23 | |
|
21 | 24 | |
|
25 | @make_cffi | |
|
22 | 26 | class TestCompressor(unittest.TestCase): |
|
23 | 27 | def test_level_bounds(self): |
|
24 | 28 | with self.assertRaises(ValueError): |
@@ -28,18 +32,17 b' class TestCompressor(unittest.TestCase):' | |||
|
28 | 32 | zstd.ZstdCompressor(level=23) |
|
29 | 33 | |
|
30 | 34 | |
|
35 | @make_cffi | |
|
31 | 36 | class TestCompressor_compress(unittest.TestCase): |
|
32 | 37 | def test_compress_empty(self): |
|
33 | 38 | cctx = zstd.ZstdCompressor(level=1) |
|
34 | cctx.compress(b'') | |
|
35 | ||
|
36 | cctx = zstd.ZstdCompressor(level=22) | |
|
37 | cctx.compress(b'') | |
|
38 | ||
|
39 | def test_compress_empty(self): | |
|
40 | cctx = zstd.ZstdCompressor(level=1) | |
|
41 | self.assertEqual(cctx.compress(b''), | |
|
42 | b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') | |
|
39 | result = cctx.compress(b'') | |
|
40 | self.assertEqual(result, b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') | |
|
41 | params = zstd.get_frame_parameters(result) | |
|
42 | self.assertEqual(params.content_size, 0) | |
|
43 | self.assertEqual(params.window_size, 524288) | |
|
44 | self.assertEqual(params.dict_id, 0) | |
|
45 | self.assertFalse(params.has_checksum, 0) | |
|
43 | 46 | |
|
44 | 47 | # TODO should be temporary until https://github.com/facebook/zstd/issues/506 |
|
45 | 48 | # is fixed. |
@@ -59,6 +62,13 b' class TestCompressor_compress(unittest.T' | |||
|
59 | 62 | self.assertEqual(len(result), 999) |
|
60 | 63 | self.assertEqual(result[0:4], b'\x28\xb5\x2f\xfd') |
|
61 | 64 | |
|
65 | # This matches the test for read_from() below. | |
|
66 | cctx = zstd.ZstdCompressor(level=1) | |
|
67 | result = cctx.compress(b'f' * zstd.COMPRESSION_RECOMMENDED_INPUT_SIZE + b'o') | |
|
68 | self.assertEqual(result, b'\x28\xb5\x2f\xfd\x00\x40\x54\x00\x00' | |
|
69 | b'\x10\x66\x66\x01\x00\xfb\xff\x39\xc0' | |
|
70 | b'\x02\x09\x00\x00\x6f') | |
|
71 | ||
|
62 | 72 | def test_write_checksum(self): |
|
63 | 73 | cctx = zstd.ZstdCompressor(level=1) |
|
64 | 74 | no_checksum = cctx.compress(b'foobar') |
@@ -67,6 +77,12 b' class TestCompressor_compress(unittest.T' | |||
|
67 | 77 | |
|
68 | 78 | self.assertEqual(len(with_checksum), len(no_checksum) + 4) |
|
69 | 79 | |
|
80 | no_params = zstd.get_frame_parameters(no_checksum) | |
|
81 | with_params = zstd.get_frame_parameters(with_checksum) | |
|
82 | ||
|
83 | self.assertFalse(no_params.has_checksum) | |
|
84 | self.assertTrue(with_params.has_checksum) | |
|
85 | ||
|
70 | 86 | def test_write_content_size(self): |
|
71 | 87 | cctx = zstd.ZstdCompressor(level=1) |
|
72 | 88 | no_size = cctx.compress(b'foobar' * 256) |
@@ -75,6 +91,11 b' class TestCompressor_compress(unittest.T' | |||
|
75 | 91 | |
|
76 | 92 | self.assertEqual(len(with_size), len(no_size) + 1) |
|
77 | 93 | |
|
94 | no_params = zstd.get_frame_parameters(no_size) | |
|
95 | with_params = zstd.get_frame_parameters(with_size) | |
|
96 | self.assertEqual(no_params.content_size, 0) | |
|
97 | self.assertEqual(with_params.content_size, 1536) | |
|
98 | ||
|
78 | 99 | def test_no_dict_id(self): |
|
79 | 100 | samples = [] |
|
80 | 101 | for i in range(128): |
@@ -92,6 +113,11 b' class TestCompressor_compress(unittest.T' | |||
|
92 | 113 | |
|
93 | 114 | self.assertEqual(len(with_dict_id), len(no_dict_id) + 4) |
|
94 | 115 | |
|
116 | no_params = zstd.get_frame_parameters(no_dict_id) | |
|
117 | with_params = zstd.get_frame_parameters(with_dict_id) | |
|
118 | self.assertEqual(no_params.dict_id, 0) | |
|
119 | self.assertEqual(with_params.dict_id, 1584102229) | |
|
120 | ||
|
95 | 121 | def test_compress_dict_multiple(self): |
|
96 | 122 | samples = [] |
|
97 | 123 | for i in range(128): |
@@ -107,6 +133,7 b' class TestCompressor_compress(unittest.T' | |||
|
107 | 133 | cctx.compress(b'foo bar foobar foo bar foobar') |
|
108 | 134 | |
|
109 | 135 | |
|
136 | @make_cffi | |
|
110 | 137 | class TestCompressor_compressobj(unittest.TestCase): |
|
111 | 138 | def test_compressobj_empty(self): |
|
112 | 139 | cctx = zstd.ZstdCompressor(level=1) |
@@ -127,6 +154,12 b' class TestCompressor_compressobj(unittes' | |||
|
127 | 154 | self.assertEqual(len(result), 999) |
|
128 | 155 | self.assertEqual(result[0:4], b'\x28\xb5\x2f\xfd') |
|
129 | 156 | |
|
157 | params = zstd.get_frame_parameters(result) | |
|
158 | self.assertEqual(params.content_size, 0) | |
|
159 | self.assertEqual(params.window_size, 1048576) | |
|
160 | self.assertEqual(params.dict_id, 0) | |
|
161 | self.assertFalse(params.has_checksum) | |
|
162 | ||
|
130 | 163 | def test_write_checksum(self): |
|
131 | 164 | cctx = zstd.ZstdCompressor(level=1) |
|
132 | 165 | cobj = cctx.compressobj() |
@@ -135,6 +168,15 b' class TestCompressor_compressobj(unittes' | |||
|
135 | 168 | cobj = cctx.compressobj() |
|
136 | 169 | with_checksum = cobj.compress(b'foobar') + cobj.flush() |
|
137 | 170 | |
|
171 | no_params = zstd.get_frame_parameters(no_checksum) | |
|
172 | with_params = zstd.get_frame_parameters(with_checksum) | |
|
173 | self.assertEqual(no_params.content_size, 0) | |
|
174 | self.assertEqual(with_params.content_size, 0) | |
|
175 | self.assertEqual(no_params.dict_id, 0) | |
|
176 | self.assertEqual(with_params.dict_id, 0) | |
|
177 | self.assertFalse(no_params.has_checksum) | |
|
178 | self.assertTrue(with_params.has_checksum) | |
|
179 | ||
|
138 | 180 | self.assertEqual(len(with_checksum), len(no_checksum) + 4) |
|
139 | 181 | |
|
140 | 182 | def test_write_content_size(self): |
@@ -145,6 +187,15 b' class TestCompressor_compressobj(unittes' | |||
|
145 | 187 | cobj = cctx.compressobj(size=len(b'foobar' * 256)) |
|
146 | 188 | with_size = cobj.compress(b'foobar' * 256) + cobj.flush() |
|
147 | 189 | |
|
190 | no_params = zstd.get_frame_parameters(no_size) | |
|
191 | with_params = zstd.get_frame_parameters(with_size) | |
|
192 | self.assertEqual(no_params.content_size, 0) | |
|
193 | self.assertEqual(with_params.content_size, 1536) | |
|
194 | self.assertEqual(no_params.dict_id, 0) | |
|
195 | self.assertEqual(with_params.dict_id, 0) | |
|
196 | self.assertFalse(no_params.has_checksum) | |
|
197 | self.assertFalse(with_params.has_checksum) | |
|
198 | ||
|
148 | 199 | self.assertEqual(len(with_size), len(no_size) + 1) |
|
149 | 200 | |
|
150 | 201 | def test_compress_after_finished(self): |
@@ -187,6 +238,7 b' class TestCompressor_compressobj(unittes' | |||
|
187 | 238 | self.assertEqual(header, b'\x01\x00\x00') |
|
188 | 239 | |
|
189 | 240 | |
|
241 | @make_cffi | |
|
190 | 242 | class TestCompressor_copy_stream(unittest.TestCase): |
|
191 | 243 | def test_no_read(self): |
|
192 | 244 | source = object() |
@@ -229,6 +281,12 b' class TestCompressor_copy_stream(unittes' | |||
|
229 | 281 | self.assertEqual(r, 255 * 16384) |
|
230 | 282 | self.assertEqual(w, 999) |
|
231 | 283 | |
|
284 | params = zstd.get_frame_parameters(dest.getvalue()) | |
|
285 | self.assertEqual(params.content_size, 0) | |
|
286 | self.assertEqual(params.window_size, 1048576) | |
|
287 | self.assertEqual(params.dict_id, 0) | |
|
288 | self.assertFalse(params.has_checksum) | |
|
289 | ||
|
232 | 290 | def test_write_checksum(self): |
|
233 | 291 | source = io.BytesIO(b'foobar') |
|
234 | 292 | no_checksum = io.BytesIO() |
@@ -244,6 +302,15 b' class TestCompressor_copy_stream(unittes' | |||
|
244 | 302 | self.assertEqual(len(with_checksum.getvalue()), |
|
245 | 303 | len(no_checksum.getvalue()) + 4) |
|
246 | 304 | |
|
305 | no_params = zstd.get_frame_parameters(no_checksum.getvalue()) | |
|
306 | with_params = zstd.get_frame_parameters(with_checksum.getvalue()) | |
|
307 | self.assertEqual(no_params.content_size, 0) | |
|
308 | self.assertEqual(with_params.content_size, 0) | |
|
309 | self.assertEqual(no_params.dict_id, 0) | |
|
310 | self.assertEqual(with_params.dict_id, 0) | |
|
311 | self.assertFalse(no_params.has_checksum) | |
|
312 | self.assertTrue(with_params.has_checksum) | |
|
313 | ||
|
247 | 314 | def test_write_content_size(self): |
|
248 | 315 | source = io.BytesIO(b'foobar' * 256) |
|
249 | 316 | no_size = io.BytesIO() |
@@ -268,6 +335,15 b' class TestCompressor_copy_stream(unittes' | |||
|
268 | 335 | self.assertEqual(len(with_size.getvalue()), |
|
269 | 336 | len(no_size.getvalue()) + 1) |
|
270 | 337 | |
|
338 | no_params = zstd.get_frame_parameters(no_size.getvalue()) | |
|
339 | with_params = zstd.get_frame_parameters(with_size.getvalue()) | |
|
340 | self.assertEqual(no_params.content_size, 0) | |
|
341 | self.assertEqual(with_params.content_size, 1536) | |
|
342 | self.assertEqual(no_params.dict_id, 0) | |
|
343 | self.assertEqual(with_params.dict_id, 0) | |
|
344 | self.assertFalse(no_params.has_checksum) | |
|
345 | self.assertFalse(with_params.has_checksum) | |
|
346 | ||
|
271 | 347 | def test_read_write_size(self): |
|
272 | 348 | source = OpCountingBytesIO(b'foobarfoobar') |
|
273 | 349 | dest = OpCountingBytesIO() |
@@ -288,18 +364,25 b' def compress(data, level):' | |||
|
288 | 364 | return buffer.getvalue() |
|
289 | 365 | |
|
290 | 366 | |
|
367 | @make_cffi | |
|
291 | 368 | class TestCompressor_write_to(unittest.TestCase): |
|
292 | 369 | def test_empty(self): |
|
293 |
|
|
|
294 |
|
|
|
370 | result = compress(b'', 1) | |
|
371 | self.assertEqual(result, b'\x28\xb5\x2f\xfd\x00\x48\x01\x00\x00') | |
|
372 | ||
|
373 | params = zstd.get_frame_parameters(result) | |
|
374 | self.assertEqual(params.content_size, 0) | |
|
375 | self.assertEqual(params.window_size, 524288) | |
|
376 | self.assertEqual(params.dict_id, 0) | |
|
377 | self.assertFalse(params.has_checksum) | |
|
295 | 378 | |
|
296 | 379 | def test_multiple_compress(self): |
|
297 | 380 | buffer = io.BytesIO() |
|
298 | 381 | cctx = zstd.ZstdCompressor(level=5) |
|
299 | 382 | with cctx.write_to(buffer) as compressor: |
|
300 | compressor.write(b'foo') | |
|
301 | compressor.write(b'bar') | |
|
302 | compressor.write(b'x' * 8192) | |
|
383 | self.assertEqual(compressor.write(b'foo'), 0) | |
|
384 | self.assertEqual(compressor.write(b'bar'), 0) | |
|
385 | self.assertEqual(compressor.write(b'x' * 8192), 0) | |
|
303 | 386 | |
|
304 | 387 | result = buffer.getvalue() |
|
305 | 388 | self.assertEqual(result, |
@@ -318,11 +401,23 b' class TestCompressor_write_to(unittest.T' | |||
|
318 | 401 | buffer = io.BytesIO() |
|
319 | 402 | cctx = zstd.ZstdCompressor(level=9, dict_data=d) |
|
320 | 403 | with cctx.write_to(buffer) as compressor: |
|
321 | compressor.write(b'foo') | |
|
322 | compressor.write(b'bar') | |
|
323 | compressor.write(b'foo' * 16384) | |
|
404 | self.assertEqual(compressor.write(b'foo'), 0) | |
|
405 | self.assertEqual(compressor.write(b'bar'), 0) | |
|
406 | self.assertEqual(compressor.write(b'foo' * 16384), 634) | |
|
324 | 407 | |
|
325 | 408 | compressed = buffer.getvalue() |
|
409 | ||
|
410 | params = zstd.get_frame_parameters(compressed) | |
|
411 | self.assertEqual(params.content_size, 0) | |
|
412 | self.assertEqual(params.window_size, 1024) | |
|
413 | self.assertEqual(params.dict_id, d.dict_id()) | |
|
414 | self.assertFalse(params.has_checksum) | |
|
415 | ||
|
416 | self.assertEqual(compressed[0:32], | |
|
417 | b'\x28\xb5\x2f\xfd\x03\x00\x55\x7b\x6b\x5e\x54\x00' | |
|
418 | b'\x00\x00\x02\xfc\xf4\xa5\xba\x23\x3f\x85\xb3\x54' | |
|
419 | b'\x00\x00\x18\x6f\x6f\x66\x01\x00') | |
|
420 | ||
|
326 | 421 | h = hashlib.sha1(compressed).hexdigest() |
|
327 | 422 | self.assertEqual(h, '1c5bcd25181bcd8c1a73ea8773323e0056129f92') |
|
328 | 423 | |
@@ -332,11 +427,18 b' class TestCompressor_write_to(unittest.T' | |||
|
332 | 427 | buffer = io.BytesIO() |
|
333 | 428 | cctx = zstd.ZstdCompressor(compression_params=params) |
|
334 | 429 | with cctx.write_to(buffer) as compressor: |
|
335 | compressor.write(b'foo') | |
|
336 | compressor.write(b'bar') | |
|
337 | compressor.write(b'foobar' * 16384) | |
|
430 | self.assertEqual(compressor.write(b'foo'), 0) | |
|
431 | self.assertEqual(compressor.write(b'bar'), 0) | |
|
432 | self.assertEqual(compressor.write(b'foobar' * 16384), 0) | |
|
338 | 433 | |
|
339 | 434 | compressed = buffer.getvalue() |
|
435 | ||
|
436 | params = zstd.get_frame_parameters(compressed) | |
|
437 | self.assertEqual(params.content_size, 0) | |
|
438 | self.assertEqual(params.window_size, 1048576) | |
|
439 | self.assertEqual(params.dict_id, 0) | |
|
440 | self.assertFalse(params.has_checksum) | |
|
441 | ||
|
340 | 442 | h = hashlib.sha1(compressed).hexdigest() |
|
341 | 443 | self.assertEqual(h, '1ae31f270ed7de14235221a604b31ecd517ebd99') |
|
342 | 444 | |
@@ -344,12 +446,21 b' class TestCompressor_write_to(unittest.T' | |||
|
344 | 446 | no_checksum = io.BytesIO() |
|
345 | 447 | cctx = zstd.ZstdCompressor(level=1) |
|
346 | 448 | with cctx.write_to(no_checksum) as compressor: |
|
347 | compressor.write(b'foobar') | |
|
449 | self.assertEqual(compressor.write(b'foobar'), 0) | |
|
348 | 450 | |
|
349 | 451 | with_checksum = io.BytesIO() |
|
350 | 452 | cctx = zstd.ZstdCompressor(level=1, write_checksum=True) |
|
351 | 453 | with cctx.write_to(with_checksum) as compressor: |
|
352 | compressor.write(b'foobar') | |
|
454 | self.assertEqual(compressor.write(b'foobar'), 0) | |
|
455 | ||
|
456 | no_params = zstd.get_frame_parameters(no_checksum.getvalue()) | |
|
457 | with_params = zstd.get_frame_parameters(with_checksum.getvalue()) | |
|
458 | self.assertEqual(no_params.content_size, 0) | |
|
459 | self.assertEqual(with_params.content_size, 0) | |
|
460 | self.assertEqual(no_params.dict_id, 0) | |
|
461 | self.assertEqual(with_params.dict_id, 0) | |
|
462 | self.assertFalse(no_params.has_checksum) | |
|
463 | self.assertTrue(with_params.has_checksum) | |
|
353 | 464 | |
|
354 | 465 | self.assertEqual(len(with_checksum.getvalue()), |
|
355 | 466 | len(no_checksum.getvalue()) + 4) |
@@ -358,12 +469,12 b' class TestCompressor_write_to(unittest.T' | |||
|
358 | 469 | no_size = io.BytesIO() |
|
359 | 470 | cctx = zstd.ZstdCompressor(level=1) |
|
360 | 471 | with cctx.write_to(no_size) as compressor: |
|
361 | compressor.write(b'foobar' * 256) | |
|
472 | self.assertEqual(compressor.write(b'foobar' * 256), 0) | |
|
362 | 473 | |
|
363 | 474 | with_size = io.BytesIO() |
|
364 | 475 | cctx = zstd.ZstdCompressor(level=1, write_content_size=True) |
|
365 | 476 | with cctx.write_to(with_size) as compressor: |
|
366 | compressor.write(b'foobar' * 256) | |
|
477 | self.assertEqual(compressor.write(b'foobar' * 256), 0) | |
|
367 | 478 | |
|
368 | 479 | # Source size is not known in streaming mode, so header not |
|
369 | 480 | # written. |
@@ -373,7 +484,16 b' class TestCompressor_write_to(unittest.T' | |||
|
373 | 484 | # Declaring size will write the header. |
|
374 | 485 | with_size = io.BytesIO() |
|
375 | 486 | with cctx.write_to(with_size, size=len(b'foobar' * 256)) as compressor: |
|
376 | compressor.write(b'foobar' * 256) | |
|
487 | self.assertEqual(compressor.write(b'foobar' * 256), 0) | |
|
488 | ||
|
489 | no_params = zstd.get_frame_parameters(no_size.getvalue()) | |
|
490 | with_params = zstd.get_frame_parameters(with_size.getvalue()) | |
|
491 | self.assertEqual(no_params.content_size, 0) | |
|
492 | self.assertEqual(with_params.content_size, 1536) | |
|
493 | self.assertEqual(no_params.dict_id, 0) | |
|
494 | self.assertEqual(with_params.dict_id, 0) | |
|
495 | self.assertFalse(no_params.has_checksum) | |
|
496 | self.assertFalse(with_params.has_checksum) | |
|
377 | 497 | |
|
378 | 498 | self.assertEqual(len(with_size.getvalue()), |
|
379 | 499 | len(no_size.getvalue()) + 1) |
@@ -390,12 +510,21 b' class TestCompressor_write_to(unittest.T' | |||
|
390 | 510 | with_dict_id = io.BytesIO() |
|
391 | 511 | cctx = zstd.ZstdCompressor(level=1, dict_data=d) |
|
392 | 512 | with cctx.write_to(with_dict_id) as compressor: |
|
393 | compressor.write(b'foobarfoobar') | |
|
513 | self.assertEqual(compressor.write(b'foobarfoobar'), 0) | |
|
394 | 514 | |
|
395 | 515 | cctx = zstd.ZstdCompressor(level=1, dict_data=d, write_dict_id=False) |
|
396 | 516 | no_dict_id = io.BytesIO() |
|
397 | 517 | with cctx.write_to(no_dict_id) as compressor: |
|
398 | compressor.write(b'foobarfoobar') | |
|
518 | self.assertEqual(compressor.write(b'foobarfoobar'), 0) | |
|
519 | ||
|
520 | no_params = zstd.get_frame_parameters(no_dict_id.getvalue()) | |
|
521 | with_params = zstd.get_frame_parameters(with_dict_id.getvalue()) | |
|
522 | self.assertEqual(no_params.content_size, 0) | |
|
523 | self.assertEqual(with_params.content_size, 0) | |
|
524 | self.assertEqual(no_params.dict_id, 0) | |
|
525 | self.assertEqual(with_params.dict_id, d.dict_id()) | |
|
526 | self.assertFalse(no_params.has_checksum) | |
|
527 | self.assertFalse(with_params.has_checksum) | |
|
399 | 528 | |
|
400 | 529 | self.assertEqual(len(with_dict_id.getvalue()), |
|
401 | 530 | len(no_dict_id.getvalue()) + 4) |
@@ -412,9 +541,9 b' class TestCompressor_write_to(unittest.T' | |||
|
412 | 541 | cctx = zstd.ZstdCompressor(level=3) |
|
413 | 542 | dest = OpCountingBytesIO() |
|
414 | 543 | with cctx.write_to(dest, write_size=1) as compressor: |
|
415 | compressor.write(b'foo') | |
|
416 | compressor.write(b'bar') | |
|
417 | compressor.write(b'foobar') | |
|
544 | self.assertEqual(compressor.write(b'foo'), 0) | |
|
545 | self.assertEqual(compressor.write(b'bar'), 0) | |
|
546 | self.assertEqual(compressor.write(b'foobar'), 0) | |
|
418 | 547 | |
|
419 | 548 | self.assertEqual(len(dest.getvalue()), dest._write_count) |
|
420 | 549 | |
@@ -422,15 +551,15 b' class TestCompressor_write_to(unittest.T' | |||
|
422 | 551 | cctx = zstd.ZstdCompressor(level=3) |
|
423 | 552 | dest = OpCountingBytesIO() |
|
424 | 553 | with cctx.write_to(dest) as compressor: |
|
425 | compressor.write(b'foo') | |
|
554 | self.assertEqual(compressor.write(b'foo'), 0) | |
|
426 | 555 | self.assertEqual(dest._write_count, 0) |
|
427 | compressor.flush() | |
|
556 | self.assertEqual(compressor.flush(), 12) | |
|
428 | 557 | self.assertEqual(dest._write_count, 1) |
|
429 | compressor.write(b'bar') | |
|
558 | self.assertEqual(compressor.write(b'bar'), 0) | |
|
430 | 559 | self.assertEqual(dest._write_count, 1) |
|
431 | compressor.flush() | |
|
560 | self.assertEqual(compressor.flush(), 6) | |
|
432 | 561 | self.assertEqual(dest._write_count, 2) |
|
433 | compressor.write(b'baz') | |
|
562 | self.assertEqual(compressor.write(b'baz'), 0) | |
|
434 | 563 | |
|
435 | 564 | self.assertEqual(dest._write_count, 3) |
|
436 | 565 | |
@@ -438,10 +567,10 b' class TestCompressor_write_to(unittest.T' | |||
|
438 | 567 | cctx = zstd.ZstdCompressor(level=3, write_checksum=True) |
|
439 | 568 | dest = OpCountingBytesIO() |
|
440 | 569 | with cctx.write_to(dest) as compressor: |
|
441 | compressor.write(b'foobar' * 8192) | |
|
570 | self.assertEqual(compressor.write(b'foobar' * 8192), 0) | |
|
442 | 571 | count = dest._write_count |
|
443 | 572 | offset = dest.tell() |
|
444 | compressor.flush() | |
|
573 | self.assertEqual(compressor.flush(), 23) | |
|
445 | 574 | self.assertGreater(dest._write_count, count) |
|
446 | 575 | self.assertGreater(dest.tell(), offset) |
|
447 | 576 | offset = dest.tell() |
@@ -456,18 +585,22 b' class TestCompressor_write_to(unittest.T' | |||
|
456 | 585 | self.assertEqual(header, b'\x01\x00\x00') |
|
457 | 586 | |
|
458 | 587 | |
|
588 | @make_cffi | |
|
459 | 589 | class TestCompressor_read_from(unittest.TestCase): |
|
460 | 590 | def test_type_validation(self): |
|
461 | 591 | cctx = zstd.ZstdCompressor() |
|
462 | 592 | |
|
463 | 593 | # Object with read() works. |
|
464 | cctx.read_from(io.BytesIO()) | |
|
594 | for chunk in cctx.read_from(io.BytesIO()): | |
|
595 | pass | |
|
465 | 596 | |
|
466 | 597 | # Buffer protocol works. |
|
467 | cctx.read_from(b'foobar') | |
|
598 | for chunk in cctx.read_from(b'foobar'): | |
|
599 | pass | |
|
468 | 600 | |
|
469 | 601 | with self.assertRaisesRegexp(ValueError, 'must pass an object with a read'): |
|
470 | cctx.read_from(True) | |
|
602 | for chunk in cctx.read_from(True): | |
|
603 | pass | |
|
471 | 604 | |
|
472 | 605 | def test_read_empty(self): |
|
473 | 606 | cctx = zstd.ZstdCompressor(level=1) |
@@ -521,6 +654,12 b' class TestCompressor_read_from(unittest.' | |||
|
521 | 654 | # We should get the same output as the one-shot compression mechanism. |
|
522 | 655 | self.assertEqual(b''.join(chunks), cctx.compress(source.getvalue())) |
|
523 | 656 | |
|
657 | params = zstd.get_frame_parameters(b''.join(chunks)) | |
|
658 | self.assertEqual(params.content_size, 0) | |
|
659 | self.assertEqual(params.window_size, 262144) | |
|
660 | self.assertEqual(params.dict_id, 0) | |
|
661 | self.assertFalse(params.has_checksum) | |
|
662 | ||
|
524 | 663 | # Now check the buffer protocol. |
|
525 | 664 | it = cctx.read_from(source.getvalue()) |
|
526 | 665 | chunks = list(it) |
@@ -13,6 +13,12 b' except ImportError:' | |||
|
13 | 13 | |
|
14 | 14 | import zstd |
|
15 | 15 | |
|
16 | from . common import ( | |
|
17 | make_cffi, | |
|
18 | ) | |
|
19 | ||
|
20 | ||
|
21 | @make_cffi | |
|
16 | 22 | class TestCompressionParameters(unittest.TestCase): |
|
17 | 23 | def test_init_bad_arg_type(self): |
|
18 | 24 | with self.assertRaises(TypeError): |
@@ -42,7 +48,81 b' class TestCompressionParameters(unittest' | |||
|
42 | 48 | p = zstd.get_compression_parameters(1) |
|
43 | 49 | self.assertIsInstance(p, zstd.CompressionParameters) |
|
44 | 50 | |
|
45 |
self.assertEqual(p |
|
|
51 | self.assertEqual(p.window_log, 19) | |
|
52 | ||
|
53 | def test_members(self): | |
|
54 | p = zstd.CompressionParameters(10, 6, 7, 4, 5, 8, 1) | |
|
55 | self.assertEqual(p.window_log, 10) | |
|
56 | self.assertEqual(p.chain_log, 6) | |
|
57 | self.assertEqual(p.hash_log, 7) | |
|
58 | self.assertEqual(p.search_log, 4) | |
|
59 | self.assertEqual(p.search_length, 5) | |
|
60 | self.assertEqual(p.target_length, 8) | |
|
61 | self.assertEqual(p.strategy, 1) | |
|
62 | ||
|
63 | ||
|
64 | @make_cffi | |
|
65 | class TestFrameParameters(unittest.TestCase): | |
|
66 | def test_invalid_type(self): | |
|
67 | with self.assertRaises(TypeError): | |
|
68 | zstd.get_frame_parameters(None) | |
|
69 | ||
|
70 | with self.assertRaises(TypeError): | |
|
71 | zstd.get_frame_parameters(u'foobarbaz') | |
|
72 | ||
|
73 | def test_invalid_input_sizes(self): | |
|
74 | with self.assertRaisesRegexp(zstd.ZstdError, 'not enough data for frame'): | |
|
75 | zstd.get_frame_parameters(b'') | |
|
76 | ||
|
77 | with self.assertRaisesRegexp(zstd.ZstdError, 'not enough data for frame'): | |
|
78 | zstd.get_frame_parameters(zstd.FRAME_HEADER) | |
|
79 | ||
|
80 | def test_invalid_frame(self): | |
|
81 | with self.assertRaisesRegexp(zstd.ZstdError, 'Unknown frame descriptor'): | |
|
82 | zstd.get_frame_parameters(b'foobarbaz') | |
|
83 | ||
|
84 | def test_attributes(self): | |
|
85 | params = zstd.get_frame_parameters(zstd.FRAME_HEADER + b'\x00\x00') | |
|
86 | self.assertEqual(params.content_size, 0) | |
|
87 | self.assertEqual(params.window_size, 1024) | |
|
88 | self.assertEqual(params.dict_id, 0) | |
|
89 | self.assertFalse(params.has_checksum) | |
|
90 | ||
|
91 | # Lowest 2 bits indicate a dictionary and length. Here, the dict id is 1 byte. | |
|
92 | params = zstd.get_frame_parameters(zstd.FRAME_HEADER + b'\x01\x00\xff') | |
|
93 | self.assertEqual(params.content_size, 0) | |
|
94 | self.assertEqual(params.window_size, 1024) | |
|
95 | self.assertEqual(params.dict_id, 255) | |
|
96 | self.assertFalse(params.has_checksum) | |
|
97 | ||
|
98 | # Lowest 3rd bit indicates if checksum is present. | |
|
99 | params = zstd.get_frame_parameters(zstd.FRAME_HEADER + b'\x04\x00') | |
|
100 | self.assertEqual(params.content_size, 0) | |
|
101 | self.assertEqual(params.window_size, 1024) | |
|
102 | self.assertEqual(params.dict_id, 0) | |
|
103 | self.assertTrue(params.has_checksum) | |
|
104 | ||
|
105 | # Upper 2 bits indicate content size. | |
|
106 | params = zstd.get_frame_parameters(zstd.FRAME_HEADER + b'\x40\x00\xff\x00') | |
|
107 | self.assertEqual(params.content_size, 511) | |
|
108 | self.assertEqual(params.window_size, 1024) | |
|
109 | self.assertEqual(params.dict_id, 0) | |
|
110 | self.assertFalse(params.has_checksum) | |
|
111 | ||
|
112 | # Window descriptor is 2nd byte after frame header. | |
|
113 | params = zstd.get_frame_parameters(zstd.FRAME_HEADER + b'\x00\x40') | |
|
114 | self.assertEqual(params.content_size, 0) | |
|
115 | self.assertEqual(params.window_size, 262144) | |
|
116 | self.assertEqual(params.dict_id, 0) | |
|
117 | self.assertFalse(params.has_checksum) | |
|
118 | ||
|
119 | # Set multiple things. | |
|
120 | params = zstd.get_frame_parameters(zstd.FRAME_HEADER + b'\x45\x40\x0f\x10\x00') | |
|
121 | self.assertEqual(params.content_size, 272) | |
|
122 | self.assertEqual(params.window_size, 262144) | |
|
123 | self.assertEqual(params.dict_id, 15) | |
|
124 | self.assertTrue(params.has_checksum) | |
|
125 | ||
|
46 | 126 | |
|
47 | 127 | if hypothesis: |
|
48 | 128 | s_windowlog = strategies.integers(min_value=zstd.WINDOWLOG_MIN, |
@@ -65,6 +145,8 b' if hypothesis:' | |||
|
65 | 145 | zstd.STRATEGY_BTLAZY2, |
|
66 | 146 | zstd.STRATEGY_BTOPT)) |
|
67 | 147 | |
|
148 | ||
|
149 | @make_cffi | |
|
68 | 150 | class TestCompressionParametersHypothesis(unittest.TestCase): |
|
69 | 151 | @hypothesis.given(s_windowlog, s_chainlog, s_hashlog, s_searchlog, |
|
70 | 152 | s_searchlength, s_targetlength, s_strategy) |
@@ -73,9 +155,6 b' if hypothesis:' | |||
|
73 | 155 | p = zstd.CompressionParameters(windowlog, chainlog, hashlog, |
|
74 | 156 | searchlog, searchlength, |
|
75 | 157 | targetlength, strategy) |
|
76 | self.assertEqual(tuple(p), | |
|
77 | (windowlog, chainlog, hashlog, searchlog, | |
|
78 | searchlength, targetlength, strategy)) | |
|
79 | 158 | |
|
80 | 159 | # Verify we can instantiate a compressor with the supplied values. |
|
81 | 160 | # ZSTD_checkCParams moves the goal posts on us from what's advertised |
@@ -10,7 +10,10 b' except ImportError:' | |||
|
10 | 10 | |
|
11 | 11 | import zstd |
|
12 | 12 | |
|
13 |
from .common import |
|
|
13 | from .common import ( | |
|
14 | make_cffi, | |
|
15 | OpCountingBytesIO, | |
|
16 | ) | |
|
14 | 17 | |
|
15 | 18 | |
|
16 | 19 | if sys.version_info[0] >= 3: |
@@ -19,6 +22,7 b' else:' | |||
|
19 | 22 | next = lambda it: it.next() |
|
20 | 23 | |
|
21 | 24 | |
|
25 | @make_cffi | |
|
22 | 26 | class TestDecompressor_decompress(unittest.TestCase): |
|
23 | 27 | def test_empty_input(self): |
|
24 | 28 | dctx = zstd.ZstdDecompressor() |
@@ -119,6 +123,7 b' class TestDecompressor_decompress(unitte' | |||
|
119 | 123 | self.assertEqual(decompressed, sources[i]) |
|
120 | 124 | |
|
121 | 125 | |
|
126 | @make_cffi | |
|
122 | 127 | class TestDecompressor_copy_stream(unittest.TestCase): |
|
123 | 128 | def test_no_read(self): |
|
124 | 129 | source = object() |
@@ -180,6 +185,7 b' class TestDecompressor_copy_stream(unitt' | |||
|
180 | 185 | self.assertEqual(dest._write_count, len(dest.getvalue())) |
|
181 | 186 | |
|
182 | 187 | |
|
188 | @make_cffi | |
|
183 | 189 | class TestDecompressor_decompressobj(unittest.TestCase): |
|
184 | 190 | def test_simple(self): |
|
185 | 191 | data = zstd.ZstdCompressor(level=1).compress(b'foobar') |
@@ -207,6 +213,7 b' def decompress_via_writer(data):' | |||
|
207 | 213 | return buffer.getvalue() |
|
208 | 214 | |
|
209 | 215 | |
|
216 | @make_cffi | |
|
210 | 217 | class TestDecompressor_write_to(unittest.TestCase): |
|
211 | 218 | def test_empty_roundtrip(self): |
|
212 | 219 | cctx = zstd.ZstdCompressor() |
@@ -256,14 +263,14 b' class TestDecompressor_write_to(unittest' | |||
|
256 | 263 | buffer = io.BytesIO() |
|
257 | 264 | cctx = zstd.ZstdCompressor(dict_data=d) |
|
258 | 265 | with cctx.write_to(buffer) as compressor: |
|
259 | compressor.write(orig) | |
|
266 | self.assertEqual(compressor.write(orig), 1544) | |
|
260 | 267 | |
|
261 | 268 | compressed = buffer.getvalue() |
|
262 | 269 | buffer = io.BytesIO() |
|
263 | 270 | |
|
264 | 271 | dctx = zstd.ZstdDecompressor(dict_data=d) |
|
265 | 272 | with dctx.write_to(buffer) as decompressor: |
|
266 | decompressor.write(compressed) | |
|
273 | self.assertEqual(decompressor.write(compressed), len(orig)) | |
|
267 | 274 | |
|
268 | 275 | self.assertEqual(buffer.getvalue(), orig) |
|
269 | 276 | |
@@ -291,6 +298,7 b' class TestDecompressor_write_to(unittest' | |||
|
291 | 298 | self.assertEqual(dest._write_count, len(dest.getvalue())) |
|
292 | 299 | |
|
293 | 300 | |
|
301 | @make_cffi | |
|
294 | 302 | class TestDecompressor_read_from(unittest.TestCase): |
|
295 | 303 | def test_type_validation(self): |
|
296 | 304 | dctx = zstd.ZstdDecompressor() |
@@ -302,7 +310,7 b' class TestDecompressor_read_from(unittes' | |||
|
302 | 310 | dctx.read_from(b'foobar') |
|
303 | 311 | |
|
304 | 312 | with self.assertRaisesRegexp(ValueError, 'must pass an object with a read'): |
|
305 | dctx.read_from(True) | |
|
313 | b''.join(dctx.read_from(True)) | |
|
306 | 314 | |
|
307 | 315 | def test_empty_input(self): |
|
308 | 316 | dctx = zstd.ZstdDecompressor() |
@@ -351,7 +359,7 b' class TestDecompressor_read_from(unittes' | |||
|
351 | 359 | dctx = zstd.ZstdDecompressor() |
|
352 | 360 | |
|
353 | 361 | with self.assertRaisesRegexp(ValueError, 'skip_bytes must be smaller than read_size'): |
|
354 | dctx.read_from(b'', skip_bytes=1, read_size=1) | |
|
362 | b''.join(dctx.read_from(b'', skip_bytes=1, read_size=1)) | |
|
355 | 363 | |
|
356 | 364 | with self.assertRaisesRegexp(ValueError, 'skip_bytes larger than first input chunk'): |
|
357 | 365 | b''.join(dctx.read_from(b'foobar', skip_bytes=10)) |
@@ -476,3 +484,94 b' class TestDecompressor_read_from(unittes' | |||
|
476 | 484 | self.assertEqual(len(chunk), 1) |
|
477 | 485 | |
|
478 | 486 | self.assertEqual(source._read_count, len(source.getvalue())) |
|
487 | ||
|
488 | ||
|
489 | @make_cffi | |
|
490 | class TestDecompressor_content_dict_chain(unittest.TestCase): | |
|
491 | def test_bad_inputs_simple(self): | |
|
492 | dctx = zstd.ZstdDecompressor() | |
|
493 | ||
|
494 | with self.assertRaises(TypeError): | |
|
495 | dctx.decompress_content_dict_chain(b'foo') | |
|
496 | ||
|
497 | with self.assertRaises(TypeError): | |
|
498 | dctx.decompress_content_dict_chain((b'foo', b'bar')) | |
|
499 | ||
|
500 | with self.assertRaisesRegexp(ValueError, 'empty input chain'): | |
|
501 | dctx.decompress_content_dict_chain([]) | |
|
502 | ||
|
503 | with self.assertRaisesRegexp(ValueError, 'chunk 0 must be bytes'): | |
|
504 | dctx.decompress_content_dict_chain([u'foo']) | |
|
505 | ||
|
506 | with self.assertRaisesRegexp(ValueError, 'chunk 0 must be bytes'): | |
|
507 | dctx.decompress_content_dict_chain([True]) | |
|
508 | ||
|
509 | with self.assertRaisesRegexp(ValueError, 'chunk 0 is too small to contain a zstd frame'): | |
|
510 | dctx.decompress_content_dict_chain([zstd.FRAME_HEADER]) | |
|
511 | ||
|
512 | with self.assertRaisesRegexp(ValueError, 'chunk 0 is not a valid zstd frame'): | |
|
513 | dctx.decompress_content_dict_chain([b'foo' * 8]) | |
|
514 | ||
|
515 | no_size = zstd.ZstdCompressor().compress(b'foo' * 64) | |
|
516 | ||
|
517 | with self.assertRaisesRegexp(ValueError, 'chunk 0 missing content size in frame'): | |
|
518 | dctx.decompress_content_dict_chain([no_size]) | |
|
519 | ||
|
520 | # Corrupt first frame. | |
|
521 | frame = zstd.ZstdCompressor(write_content_size=True).compress(b'foo' * 64) | |
|
522 | frame = frame[0:12] + frame[15:] | |
|
523 | with self.assertRaisesRegexp(zstd.ZstdError, 'could not decompress chunk 0'): | |
|
524 | dctx.decompress_content_dict_chain([frame]) | |
|
525 | ||
|
526 | def test_bad_subsequent_input(self): | |
|
527 | initial = zstd.ZstdCompressor(write_content_size=True).compress(b'foo' * 64) | |
|
528 | ||
|
529 | dctx = zstd.ZstdDecompressor() | |
|
530 | ||
|
531 | with self.assertRaisesRegexp(ValueError, 'chunk 1 must be bytes'): | |
|
532 | dctx.decompress_content_dict_chain([initial, u'foo']) | |
|
533 | ||
|
534 | with self.assertRaisesRegexp(ValueError, 'chunk 1 must be bytes'): | |
|
535 | dctx.decompress_content_dict_chain([initial, None]) | |
|
536 | ||
|
537 | with self.assertRaisesRegexp(ValueError, 'chunk 1 is too small to contain a zstd frame'): | |
|
538 | dctx.decompress_content_dict_chain([initial, zstd.FRAME_HEADER]) | |
|
539 | ||
|
540 | with self.assertRaisesRegexp(ValueError, 'chunk 1 is not a valid zstd frame'): | |
|
541 | dctx.decompress_content_dict_chain([initial, b'foo' * 8]) | |
|
542 | ||
|
543 | no_size = zstd.ZstdCompressor().compress(b'foo' * 64) | |
|
544 | ||
|
545 | with self.assertRaisesRegexp(ValueError, 'chunk 1 missing content size in frame'): | |
|
546 | dctx.decompress_content_dict_chain([initial, no_size]) | |
|
547 | ||
|
548 | # Corrupt second frame. | |
|
549 | cctx = zstd.ZstdCompressor(write_content_size=True, dict_data=zstd.ZstdCompressionDict(b'foo' * 64)) | |
|
550 | frame = cctx.compress(b'bar' * 64) | |
|
551 | frame = frame[0:12] + frame[15:] | |
|
552 | ||
|
553 | with self.assertRaisesRegexp(zstd.ZstdError, 'could not decompress chunk 1'): | |
|
554 | dctx.decompress_content_dict_chain([initial, frame]) | |
|
555 | ||
|
556 | def test_simple(self): | |
|
557 | original = [ | |
|
558 | b'foo' * 64, | |
|
559 | b'foobar' * 64, | |
|
560 | b'baz' * 64, | |
|
561 | b'foobaz' * 64, | |
|
562 | b'foobarbaz' * 64, | |
|
563 | ] | |
|
564 | ||
|
565 | chunks = [] | |
|
566 | chunks.append(zstd.ZstdCompressor(write_content_size=True).compress(original[0])) | |
|
567 | for i, chunk in enumerate(original[1:]): | |
|
568 | d = zstd.ZstdCompressionDict(original[i]) | |
|
569 | cctx = zstd.ZstdCompressor(dict_data=d, write_content_size=True) | |
|
570 | chunks.append(cctx.compress(chunk)) | |
|
571 | ||
|
572 | for i in range(1, len(original)): | |
|
573 | chain = chunks[0:i] | |
|
574 | expected = original[i - 1] | |
|
575 | dctx = zstd.ZstdDecompressor() | |
|
576 | decompressed = dctx.decompress_content_dict_chain(chain) | |
|
577 | self.assertEqual(decompressed, expected) |
@@ -5,7 +5,12 b' except ImportError:' | |||
|
5 | 5 | |
|
6 | 6 | import zstd |
|
7 | 7 | |
|
8 | from . common import ( | |
|
9 | make_cffi, | |
|
10 | ) | |
|
8 | 11 | |
|
12 | ||
|
13 | @make_cffi | |
|
9 | 14 | class TestSizes(unittest.TestCase): |
|
10 | 15 | def test_decompression_size(self): |
|
11 | 16 | size = zstd.estimate_decompression_context_size() |
@@ -7,9 +7,15 b' except ImportError:' | |||
|
7 | 7 | |
|
8 | 8 | import zstd |
|
9 | 9 | |
|
10 | from . common import ( | |
|
11 | make_cffi, | |
|
12 | ) | |
|
13 | ||
|
14 | ||
|
15 | @make_cffi | |
|
10 | 16 | class TestModuleAttributes(unittest.TestCase): |
|
11 | 17 | def test_version(self): |
|
12 |
self.assertEqual(zstd.ZSTD_VERSION, (1, 1, |
|
|
18 | self.assertEqual(zstd.ZSTD_VERSION, (1, 1, 3)) | |
|
13 | 19 | |
|
14 | 20 | def test_constants(self): |
|
15 | 21 | self.assertEqual(zstd.MAX_COMPRESSION_LEVEL, 22) |
@@ -45,4 +51,4 b' class TestModuleAttributes(unittest.Test' | |||
|
45 | 51 | ) |
|
46 | 52 | |
|
47 | 53 | for a in attrs: |
|
48 | self.assertTrue(hasattr(zstd, a)) | |
|
54 | self.assertTrue(hasattr(zstd, a), a) |
@@ -13,10 +13,14 b' except ImportError:' | |||
|
13 | 13 | |
|
14 | 14 | import zstd |
|
15 | 15 | |
|
16 | from .common import ( | |
|
17 | make_cffi, | |
|
18 | ) | |
|
16 | 19 | |
|
17 | 20 | compression_levels = strategies.integers(min_value=1, max_value=22) |
|
18 | 21 | |
|
19 | 22 | |
|
23 | @make_cffi | |
|
20 | 24 | class TestRoundTrip(unittest.TestCase): |
|
21 | 25 | @hypothesis.given(strategies.binary(), compression_levels) |
|
22 | 26 | def test_compress_write_to(self, data, level): |
@@ -7,6 +7,9 b' except ImportError:' | |||
|
7 | 7 | |
|
8 | 8 | import zstd |
|
9 | 9 | |
|
10 | from . common import ( | |
|
11 | make_cffi, | |
|
12 | ) | |
|
10 | 13 | |
|
11 | 14 | if sys.version_info[0] >= 3: |
|
12 | 15 | int_type = int |
@@ -14,6 +17,7 b' else:' | |||
|
14 | 17 | int_type = long |
|
15 | 18 | |
|
16 | 19 | |
|
20 | @make_cffi | |
|
17 | 21 | class TestTrainDictionary(unittest.TestCase): |
|
18 | 22 | def test_no_args(self): |
|
19 | 23 | with self.assertRaises(TypeError): |
@@ -34,6 +34,11 b' PyDoc_STRVAR(get_compression_parameters_' | |||
|
34 | 34 | "Obtains a ``CompressionParameters`` instance from a compression level and\n" |
|
35 | 35 | "optional input size and dictionary size"); |
|
36 | 36 | |
|
37 | PyDoc_STRVAR(get_frame_parameters__doc__, | |
|
38 | "get_frame_parameters(data)\n" | |
|
39 | "\n" | |
|
40 | "Obtains a ``FrameParameters`` instance by parsing data.\n"); | |
|
41 | ||
|
37 | 42 | PyDoc_STRVAR(train_dictionary__doc__, |
|
38 | 43 | "train_dictionary(dict_size, samples)\n" |
|
39 | 44 | "\n" |
@@ -53,6 +58,8 b' static PyMethodDef zstd_methods[] = {' | |||
|
53 | 58 | METH_NOARGS, estimate_decompression_context_size__doc__ }, |
|
54 | 59 | { "get_compression_parameters", (PyCFunction)get_compression_parameters, |
|
55 | 60 | METH_VARARGS, get_compression_parameters__doc__ }, |
|
61 | { "get_frame_parameters", (PyCFunction)get_frame_parameters, | |
|
62 | METH_VARARGS, get_frame_parameters__doc__ }, | |
|
56 | 63 | { "train_dictionary", (PyCFunction)train_dictionary, |
|
57 | 64 | METH_VARARGS | METH_KEYWORDS, train_dictionary__doc__ }, |
|
58 | 65 | { NULL, NULL } |
@@ -70,6 +77,7 b' void decompressor_module_init(PyObject* ' | |||
|
70 | 77 | void decompressobj_module_init(PyObject* mod); |
|
71 | 78 | void decompressionwriter_module_init(PyObject* mod); |
|
72 | 79 | void decompressoriterator_module_init(PyObject* mod); |
|
80 | void frameparams_module_init(PyObject* mod); | |
|
73 | 81 | |
|
74 | 82 | void zstd_module_init(PyObject* m) { |
|
75 | 83 | /* python-zstandard relies on unstable zstd C API features. This means |
@@ -87,7 +95,7 b' void zstd_module_init(PyObject* m) {' | |||
|
87 | 95 | We detect this mismatch here and refuse to load the module if this |
|
88 | 96 | scenario is detected. |
|
89 | 97 | */ |
|
90 |
if (ZSTD_VERSION_NUMBER != 1010 |
|
|
98 | if (ZSTD_VERSION_NUMBER != 10103 || ZSTD_versionNumber() != 10103) { | |
|
91 | 99 | PyErr_SetString(PyExc_ImportError, "zstd C API mismatch; Python bindings not compiled against expected zstd version"); |
|
92 | 100 | return; |
|
93 | 101 | } |
@@ -104,6 +112,7 b' void zstd_module_init(PyObject* m) {' | |||
|
104 | 112 | decompressobj_module_init(m); |
|
105 | 113 | decompressionwriter_module_init(m); |
|
106 | 114 | decompressoriterator_module_init(m); |
|
115 | frameparams_module_init(m); | |
|
107 | 116 | } |
|
108 | 117 | |
|
109 | 118 | #if PY_MAJOR_VERSION >= 3 |
@@ -39,7 +39,7 b' extern "C" {' | |||
|
39 | 39 | #endif |
|
40 | 40 | |
|
41 | 41 | /* code only tested on 32 and 64 bits systems */ |
|
42 |
#define MEM_STATIC_ASSERT(c) { enum { |
|
|
42 | #define MEM_STATIC_ASSERT(c) { enum { MEM_static_assert = 1/(int)(!!(c)) }; } | |
|
43 | 43 | MEM_STATIC void MEM_check(void) { MEM_STATIC_ASSERT((sizeof(size_t)==4) || (sizeof(size_t)==8)); } |
|
44 | 44 | |
|
45 | 45 |
@@ -43,10 +43,6 b' ZSTD_ErrorCode ZSTD_getErrorCode(size_t ' | |||
|
43 | 43 | * provides error code string from enum */ |
|
44 | 44 | const char* ZSTD_getErrorString(ZSTD_ErrorCode code) { return ERR_getErrorName(code); } |
|
45 | 45 | |
|
46 | /* --- ZBUFF Error Management (deprecated) --- */ | |
|
47 | unsigned ZBUFF_isError(size_t errorCode) { return ERR_isError(errorCode); } | |
|
48 | const char* ZBUFF_getErrorName(size_t errorCode) { return ERR_getErrorName(errorCode); } | |
|
49 | ||
|
50 | 46 | |
|
51 | 47 | /*=************************************************************** |
|
52 | 48 | * Custom allocator |
@@ -18,6 +18,20 b' extern "C" {' | |||
|
18 | 18 | #include <stddef.h> /* size_t */ |
|
19 | 19 | |
|
20 | 20 | |
|
21 | /* ===== ZSTDERRORLIB_API : control library symbols visibility ===== */ | |
|
22 | #if defined(__GNUC__) && (__GNUC__ >= 4) | |
|
23 | # define ZSTDERRORLIB_VISIBILITY __attribute__ ((visibility ("default"))) | |
|
24 | #else | |
|
25 | # define ZSTDERRORLIB_VISIBILITY | |
|
26 | #endif | |
|
27 | #if defined(ZSTD_DLL_EXPORT) && (ZSTD_DLL_EXPORT==1) | |
|
28 | # define ZSTDERRORLIB_API __declspec(dllexport) ZSTDERRORLIB_VISIBILITY | |
|
29 | #elif defined(ZSTD_DLL_IMPORT) && (ZSTD_DLL_IMPORT==1) | |
|
30 | # define ZSTDERRORLIB_API __declspec(dllimport) ZSTDERRORLIB_VISIBILITY /* It isn't required but allows to generate better code, saving a function pointer load from the IAT and an indirect jump.*/ | |
|
31 | #else | |
|
32 | # define ZSTDERRORLIB_API ZSTDERRORLIB_VISIBILITY | |
|
33 | #endif | |
|
34 | ||
|
21 | 35 | /*-**************************************** |
|
22 | 36 | * error codes list |
|
23 | 37 | ******************************************/ |
@@ -49,8 +63,8 b' typedef enum {' | |||
|
49 | 63 | /*! ZSTD_getErrorCode() : |
|
50 | 64 | convert a `size_t` function result into a `ZSTD_ErrorCode` enum type, |
|
51 | 65 | which can be used to compare directly with enum list published into "error_public.h" */ |
|
52 | ZSTD_ErrorCode ZSTD_getErrorCode(size_t functionResult); | |
|
53 | const char* ZSTD_getErrorString(ZSTD_ErrorCode code); | |
|
66 | ZSTDERRORLIB_API ZSTD_ErrorCode ZSTD_getErrorCode(size_t functionResult); | |
|
67 | ZSTDERRORLIB_API const char* ZSTD_getErrorString(ZSTD_ErrorCode code); | |
|
54 | 68 | |
|
55 | 69 | |
|
56 | 70 | #if defined (__cplusplus) |
@@ -267,4 +267,13 b' MEM_STATIC U32 ZSTD_highbit32(U32 val)' | |||
|
267 | 267 | } |
|
268 | 268 | |
|
269 | 269 | |
|
270 | /* hidden functions */ | |
|
271 | ||
|
272 | /* ZSTD_invalidateRepCodes() : | |
|
273 | * ensures next compression will not use repcodes from previous block. | |
|
274 | * Note : only works with regular variant; | |
|
275 | * do not use with extDict variant ! */ | |
|
276 | void ZSTD_invalidateRepCodes(ZSTD_CCtx* cctx); | |
|
277 | ||
|
278 | ||
|
270 | 279 | #endif /* ZSTD_CCOMMON_H_MODULE */ |
@@ -51,8 +51,7 b' static void ZSTD_resetSeqStore(seqStore_' | |||
|
51 | 51 | /*-************************************* |
|
52 | 52 | * Context memory management |
|
53 | 53 | ***************************************/ |
|
54 | struct ZSTD_CCtx_s | |
|
55 | { | |
|
54 | struct ZSTD_CCtx_s { | |
|
56 | 55 | const BYTE* nextSrc; /* next block here to continue on current prefix */ |
|
57 | 56 | const BYTE* base; /* All regular indexes relative to this position */ |
|
58 | 57 | const BYTE* dictBase; /* extDict indexes relative to this position */ |
@@ -61,10 +60,11 b' struct ZSTD_CCtx_s' | |||
|
61 | 60 | U32 nextToUpdate; /* index from which to continue dictionary update */ |
|
62 | 61 | U32 nextToUpdate3; /* index from which to continue dictionary update */ |
|
63 | 62 | U32 hashLog3; /* dispatch table : larger == faster, more memory */ |
|
64 | U32 loadedDictEnd; | |
|
63 | U32 loadedDictEnd; /* index of end of dictionary */ | |
|
64 | U32 forceWindow; /* force back-references to respect limit of 1<<wLog, even for dictionary */ | |
|
65 | 65 | ZSTD_compressionStage_e stage; |
|
66 | 66 | U32 rep[ZSTD_REP_NUM]; |
|
67 |
U32 |
|
|
67 | U32 repToConfirm[ZSTD_REP_NUM]; | |
|
68 | 68 | U32 dictID; |
|
69 | 69 | ZSTD_parameters params; |
|
70 | 70 | void* workSpace; |
@@ -101,7 +101,7 b' ZSTD_CCtx* ZSTD_createCCtx_advanced(ZSTD' | |||
|
101 | 101 | cctx = (ZSTD_CCtx*) ZSTD_malloc(sizeof(ZSTD_CCtx), customMem); |
|
102 | 102 | if (!cctx) return NULL; |
|
103 | 103 | memset(cctx, 0, sizeof(ZSTD_CCtx)); |
|
104 |
|
|
|
104 | cctx->customMem = customMem; | |
|
105 | 105 | return cctx; |
|
106 | 106 | } |
|
107 | 107 | |
@@ -119,6 +119,15 b' size_t ZSTD_sizeof_CCtx(const ZSTD_CCtx*' | |||
|
119 | 119 | return sizeof(*cctx) + cctx->workSpaceSize; |
|
120 | 120 | } |
|
121 | 121 | |
|
122 | size_t ZSTD_setCCtxParameter(ZSTD_CCtx* cctx, ZSTD_CCtxParameter param, unsigned value) | |
|
123 | { | |
|
124 | switch(param) | |
|
125 | { | |
|
126 | case ZSTD_p_forceWindow : cctx->forceWindow = value>0; cctx->loadedDictEnd = 0; return 0; | |
|
127 | default: return ERROR(parameter_unknown); | |
|
128 | } | |
|
129 | } | |
|
130 | ||
|
122 | 131 | const seqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx) /* hidden interface */ |
|
123 | 132 | { |
|
124 | 133 | return &(ctx->seqStore); |
@@ -318,6 +327,14 b' static size_t ZSTD_resetCCtx_advanced (Z' | |||
|
318 | 327 | } |
|
319 | 328 | } |
|
320 | 329 | |
|
330 | /* ZSTD_invalidateRepCodes() : | |
|
331 | * ensures next compression will not use repcodes from previous block. | |
|
332 | * Note : only works with regular variant; | |
|
333 | * do not use with extDict variant ! */ | |
|
334 | void ZSTD_invalidateRepCodes(ZSTD_CCtx* cctx) { | |
|
335 | int i; | |
|
336 | for (i=0; i<ZSTD_REP_NUM; i++) cctx->rep[i] = 0; | |
|
337 | } | |
|
321 | 338 | |
|
322 | 339 | /*! ZSTD_copyCCtx() : |
|
323 | 340 | * Duplicate an existing context `srcCCtx` into another one `dstCCtx`. |
@@ -735,12 +752,19 b' size_t ZSTD_compressSequences(ZSTD_CCtx*' | |||
|
735 | 752 | if ((size_t)(op-ostart) >= maxCSize) return 0; } |
|
736 | 753 | |
|
737 | 754 | /* confirm repcodes */ |
|
738 |
{ int i; for (i=0; i<ZSTD_REP_NUM; i++) zc->rep[i] = zc-> |
|
|
755 | { int i; for (i=0; i<ZSTD_REP_NUM; i++) zc->rep[i] = zc->repToConfirm[i]; } | |
|
739 | 756 | |
|
740 | 757 | return op - ostart; |
|
741 | 758 | } |
|
742 | 759 | |
|
743 | 760 | |
|
761 | #if 0 /* for debug */ | |
|
762 | # define STORESEQ_DEBUG | |
|
763 | #include <stdio.h> /* fprintf */ | |
|
764 | U32 g_startDebug = 0; | |
|
765 | const BYTE* g_start = NULL; | |
|
766 | #endif | |
|
767 | ||
|
744 | 768 | /*! ZSTD_storeSeq() : |
|
745 | 769 | Store a sequence (literal length, literals, offset code and match length code) into seqStore_t. |
|
746 | 770 | `offsetCode` : distance to match, or 0 == repCode. |
@@ -748,13 +772,14 b' size_t ZSTD_compressSequences(ZSTD_CCtx*' | |||
|
748 | 772 | */ |
|
749 | 773 | MEM_STATIC void ZSTD_storeSeq(seqStore_t* seqStorePtr, size_t litLength, const void* literals, U32 offsetCode, size_t matchCode) |
|
750 | 774 | { |
|
751 | #if 0 /* for debug */ | |
|
752 | static const BYTE* g_start = NULL; | |
|
775 | #ifdef STORESEQ_DEBUG | |
|
776 | if (g_startDebug) { | |
|
753 | 777 | const U32 pos = (U32)((const BYTE*)literals - g_start); |
|
754 | 778 | if (g_start==NULL) g_start = (const BYTE*)literals; |
|
755 |
|
|
|
756 | printf("Cpos %6u :%5u literals & match %3u bytes at distance %6u \n", | |
|
779 | if ((pos > 1895000) && (pos < 1895300)) | |
|
780 | fprintf(stderr, "Cpos %6u :%5u literals & match %3u bytes at distance %6u \n", | |
|
757 | 781 | pos, (U32)litLength, (U32)matchCode+MINMATCH, (U32)offsetCode); |
|
782 | } | |
|
758 | 783 | #endif |
|
759 | 784 | /* copy Literals */ |
|
760 | 785 | ZSTD_wildcopy(seqStorePtr->lit, literals, litLength); |
@@ -1004,8 +1029,8 b' void ZSTD_compressBlock_fast_generic(ZST' | |||
|
1004 | 1029 | } } } |
|
1005 | 1030 | |
|
1006 | 1031 | /* save reps for next block */ |
|
1007 |
cctx-> |
|
|
1008 |
cctx-> |
|
|
1032 | cctx->repToConfirm[0] = offset_1 ? offset_1 : offsetSaved; | |
|
1033 | cctx->repToConfirm[1] = offset_2 ? offset_2 : offsetSaved; | |
|
1009 | 1034 | |
|
1010 | 1035 | /* Last Literals */ |
|
1011 | 1036 | { size_t const lastLLSize = iend - anchor; |
@@ -1119,7 +1144,7 b' static void ZSTD_compressBlock_fast_extD' | |||
|
1119 | 1144 | } } } |
|
1120 | 1145 | |
|
1121 | 1146 | /* save reps for next block */ |
|
1122 |
ctx-> |
|
|
1147 | ctx->repToConfirm[0] = offset_1; ctx->repToConfirm[1] = offset_2; | |
|
1123 | 1148 | |
|
1124 | 1149 | /* Last Literals */ |
|
1125 | 1150 | { size_t const lastLLSize = iend - anchor; |
@@ -1273,8 +1298,8 b' void ZSTD_compressBlock_doubleFast_gener' | |||
|
1273 | 1298 | } } } |
|
1274 | 1299 | |
|
1275 | 1300 | /* save reps for next block */ |
|
1276 |
cctx-> |
|
|
1277 |
cctx-> |
|
|
1301 | cctx->repToConfirm[0] = offset_1 ? offset_1 : offsetSaved; | |
|
1302 | cctx->repToConfirm[1] = offset_2 ? offset_2 : offsetSaved; | |
|
1278 | 1303 | |
|
1279 | 1304 | /* Last Literals */ |
|
1280 | 1305 | { size_t const lastLLSize = iend - anchor; |
@@ -1423,7 +1448,7 b' static void ZSTD_compressBlock_doubleFas' | |||
|
1423 | 1448 | } } } |
|
1424 | 1449 | |
|
1425 | 1450 | /* save reps for next block */ |
|
1426 |
ctx-> |
|
|
1451 | ctx->repToConfirm[0] = offset_1; ctx->repToConfirm[1] = offset_2; | |
|
1427 | 1452 | |
|
1428 | 1453 | /* Last Literals */ |
|
1429 | 1454 | { size_t const lastLLSize = iend - anchor; |
@@ -1955,8 +1980,8 b' void ZSTD_compressBlock_lazy_generic(ZST' | |||
|
1955 | 1980 | } } |
|
1956 | 1981 | |
|
1957 | 1982 | /* Save reps for next block */ |
|
1958 |
ctx-> |
|
|
1959 |
ctx-> |
|
|
1983 | ctx->repToConfirm[0] = offset_1 ? offset_1 : savedOffset; | |
|
1984 | ctx->repToConfirm[1] = offset_2 ? offset_2 : savedOffset; | |
|
1960 | 1985 | |
|
1961 | 1986 | /* Last Literals */ |
|
1962 | 1987 | { size_t const lastLLSize = iend - anchor; |
@@ -2150,7 +2175,7 b' void ZSTD_compressBlock_lazy_extDict_gen' | |||
|
2150 | 2175 | } } |
|
2151 | 2176 | |
|
2152 | 2177 | /* Save reps for next block */ |
|
2153 |
ctx-> |
|
|
2178 | ctx->repToConfirm[0] = offset_1; ctx->repToConfirm[1] = offset_2; | |
|
2154 | 2179 | |
|
2155 | 2180 | /* Last Literals */ |
|
2156 | 2181 | { size_t const lastLLSize = iend - anchor; |
@@ -2409,12 +2434,14 b' static size_t ZSTD_compressContinue_inte' | |||
|
2409 | 2434 | |
|
2410 | 2435 | cctx->nextSrc = ip + srcSize; |
|
2411 | 2436 | |
|
2412 | { size_t const cSize = frame ? | |
|
2437 | if (srcSize) { | |
|
2438 | size_t const cSize = frame ? | |
|
2413 | 2439 | ZSTD_compress_generic (cctx, dst, dstCapacity, src, srcSize, lastFrameChunk) : |
|
2414 | 2440 | ZSTD_compressBlock_internal (cctx, dst, dstCapacity, src, srcSize); |
|
2415 | 2441 | if (ZSTD_isError(cSize)) return cSize; |
|
2416 | 2442 | return cSize + fhSize; |
|
2417 | } | |
|
2443 | } else | |
|
2444 | return fhSize; | |
|
2418 | 2445 | } |
|
2419 | 2446 | |
|
2420 | 2447 | |
@@ -2450,7 +2477,7 b' static size_t ZSTD_loadDictionaryContent' | |||
|
2450 | 2477 | zc->dictBase = zc->base; |
|
2451 | 2478 | zc->base += ip - zc->nextSrc; |
|
2452 | 2479 | zc->nextToUpdate = zc->dictLimit; |
|
2453 | zc->loadedDictEnd = (U32)(iend - zc->base); | |
|
2480 | zc->loadedDictEnd = zc->forceWindow ? 0 : (U32)(iend - zc->base); | |
|
2454 | 2481 | |
|
2455 | 2482 | zc->nextSrc = iend; |
|
2456 | 2483 | if (srcSize <= HASH_READ_SIZE) return 0; |
@@ -2557,9 +2584,9 b' static size_t ZSTD_loadDictEntropyStats(' | |||
|
2557 | 2584 | } |
|
2558 | 2585 | |
|
2559 | 2586 | if (dictPtr+12 > dictEnd) return ERROR(dictionary_corrupted); |
|
2560 | cctx->rep[0] = MEM_readLE32(dictPtr+0); if (cctx->rep[0] >= dictSize) return ERROR(dictionary_corrupted); | |
|
2561 | cctx->rep[1] = MEM_readLE32(dictPtr+4); if (cctx->rep[1] >= dictSize) return ERROR(dictionary_corrupted); | |
|
2562 | cctx->rep[2] = MEM_readLE32(dictPtr+8); if (cctx->rep[2] >= dictSize) return ERROR(dictionary_corrupted); | |
|
2587 | cctx->rep[0] = MEM_readLE32(dictPtr+0); if (cctx->rep[0] == 0 || cctx->rep[0] >= dictSize) return ERROR(dictionary_corrupted); | |
|
2588 | cctx->rep[1] = MEM_readLE32(dictPtr+4); if (cctx->rep[1] == 0 || cctx->rep[1] >= dictSize) return ERROR(dictionary_corrupted); | |
|
2589 | cctx->rep[2] = MEM_readLE32(dictPtr+8); if (cctx->rep[2] == 0 || cctx->rep[2] >= dictSize) return ERROR(dictionary_corrupted); | |
|
2563 | 2590 | dictPtr += 12; |
|
2564 | 2591 | |
|
2565 | 2592 | { U32 offcodeMax = MaxOff; |
@@ -2594,7 +2621,6 b' static size_t ZSTD_compress_insertDictio' | |||
|
2594 | 2621 | } |
|
2595 | 2622 | } |
|
2596 | 2623 | |
|
2597 | ||
|
2598 | 2624 | /*! ZSTD_compressBegin_internal() : |
|
2599 | 2625 | * @return : 0, or an error code */ |
|
2600 | 2626 | static size_t ZSTD_compressBegin_internal(ZSTD_CCtx* cctx, |
@@ -2626,9 +2652,9 b' size_t ZSTD_compressBegin_usingDict(ZSTD' | |||
|
2626 | 2652 | } |
|
2627 | 2653 | |
|
2628 | 2654 | |
|
2629 |
size_t ZSTD_compressBegin(ZSTD_CCtx* |
|
|
2655 | size_t ZSTD_compressBegin(ZSTD_CCtx* cctx, int compressionLevel) | |
|
2630 | 2656 | { |
|
2631 |
return ZSTD_compressBegin_usingDict( |
|
|
2657 | return ZSTD_compressBegin_usingDict(cctx, NULL, 0, compressionLevel); | |
|
2632 | 2658 | } |
|
2633 | 2659 | |
|
2634 | 2660 | |
@@ -2733,7 +2759,8 b' size_t ZSTD_compress(void* dst, size_t d' | |||
|
2733 | 2759 | /* ===== Dictionary API ===== */ |
|
2734 | 2760 | |
|
2735 | 2761 | struct ZSTD_CDict_s { |
|
2736 |
void* dict |
|
|
2762 | void* dictBuffer; | |
|
2763 | const void* dictContent; | |
|
2737 | 2764 | size_t dictContentSize; |
|
2738 | 2765 | ZSTD_CCtx* refContext; |
|
2739 | 2766 | }; /* typedef'd tp ZSTD_CDict within "zstd.h" */ |
@@ -2741,39 +2768,45 b' struct ZSTD_CDict_s {' | |||
|
2741 | 2768 | size_t ZSTD_sizeof_CDict(const ZSTD_CDict* cdict) |
|
2742 | 2769 | { |
|
2743 | 2770 | if (cdict==NULL) return 0; /* support sizeof on NULL */ |
|
2744 | return ZSTD_sizeof_CCtx(cdict->refContext) + cdict->dictContentSize; | |
|
2771 | return ZSTD_sizeof_CCtx(cdict->refContext) + (cdict->dictBuffer ? cdict->dictContentSize : 0) + sizeof(*cdict); | |
|
2745 | 2772 | } |
|
2746 | 2773 | |
|
2747 |
ZSTD_CDict* ZSTD_createCDict_advanced(const void* dict, size_t dictSize, |
|
|
2774 | ZSTD_CDict* ZSTD_createCDict_advanced(const void* dictBuffer, size_t dictSize, unsigned byReference, | |
|
2775 | ZSTD_parameters params, ZSTD_customMem customMem) | |
|
2748 | 2776 | { |
|
2749 | 2777 | if (!customMem.customAlloc && !customMem.customFree) customMem = defaultCustomMem; |
|
2750 | 2778 | if (!customMem.customAlloc || !customMem.customFree) return NULL; |
|
2751 | 2779 | |
|
2752 | 2780 | { ZSTD_CDict* const cdict = (ZSTD_CDict*) ZSTD_malloc(sizeof(ZSTD_CDict), customMem); |
|
2753 | void* const dictContent = ZSTD_malloc(dictSize, customMem); | |
|
2754 | 2781 | ZSTD_CCtx* const cctx = ZSTD_createCCtx_advanced(customMem); |
|
2755 | 2782 | |
|
2756 |
if ( |
|
|
2757 | ZSTD_free(dictContent, customMem); | |
|
2783 | if (!cdict || !cctx) { | |
|
2758 | 2784 | ZSTD_free(cdict, customMem); |
|
2759 | 2785 | ZSTD_free(cctx, customMem); |
|
2760 | 2786 | return NULL; |
|
2761 | 2787 | } |
|
2762 | 2788 | |
|
2763 | if (dictSize) { | |
|
2764 | memcpy(dictContent, dict, dictSize); | |
|
2789 | if ((byReference) || (!dictBuffer) || (!dictSize)) { | |
|
2790 | cdict->dictBuffer = NULL; | |
|
2791 | cdict->dictContent = dictBuffer; | |
|
2792 | } else { | |
|
2793 | void* const internalBuffer = ZSTD_malloc(dictSize, customMem); | |
|
2794 | if (!internalBuffer) { ZSTD_free(cctx, customMem); ZSTD_free(cdict, customMem); return NULL; } | |
|
2795 | memcpy(internalBuffer, dictBuffer, dictSize); | |
|
2796 | cdict->dictBuffer = internalBuffer; | |
|
2797 | cdict->dictContent = internalBuffer; | |
|
2765 | 2798 | } |
|
2766 | { size_t const errorCode = ZSTD_compressBegin_advanced(cctx, dictContent, dictSize, params, 0); | |
|
2799 | ||
|
2800 | { size_t const errorCode = ZSTD_compressBegin_advanced(cctx, cdict->dictContent, dictSize, params, 0); | |
|
2767 | 2801 | if (ZSTD_isError(errorCode)) { |
|
2768 |
ZSTD_free( |
|
|
2802 | ZSTD_free(cdict->dictBuffer, customMem); | |
|
2803 | ZSTD_free(cctx, customMem); | |
|
2769 | 2804 | ZSTD_free(cdict, customMem); |
|
2770 | ZSTD_free(cctx, customMem); | |
|
2771 | 2805 | return NULL; |
|
2772 | 2806 | } } |
|
2773 | 2807 | |
|
2774 |
cdict-> |
|
|
2808 | cdict->refContext = cctx; | |
|
2775 | 2809 | cdict->dictContentSize = dictSize; |
|
2776 | cdict->refContext = cctx; | |
|
2777 | 2810 | return cdict; |
|
2778 | 2811 | } |
|
2779 | 2812 | } |
@@ -2783,7 +2816,15 b' ZSTD_CDict* ZSTD_createCDict(const void*' | |||
|
2783 | 2816 | ZSTD_customMem const allocator = { NULL, NULL, NULL }; |
|
2784 | 2817 | ZSTD_parameters params = ZSTD_getParams(compressionLevel, 0, dictSize); |
|
2785 | 2818 | params.fParams.contentSizeFlag = 1; |
|
2786 | return ZSTD_createCDict_advanced(dict, dictSize, params, allocator); | |
|
2819 | return ZSTD_createCDict_advanced(dict, dictSize, 0, params, allocator); | |
|
2820 | } | |
|
2821 | ||
|
2822 | ZSTD_CDict* ZSTD_createCDict_byReference(const void* dict, size_t dictSize, int compressionLevel) | |
|
2823 | { | |
|
2824 | ZSTD_customMem const allocator = { NULL, NULL, NULL }; | |
|
2825 | ZSTD_parameters params = ZSTD_getParams(compressionLevel, 0, dictSize); | |
|
2826 | params.fParams.contentSizeFlag = 1; | |
|
2827 | return ZSTD_createCDict_advanced(dict, dictSize, 1, params, allocator); | |
|
2787 | 2828 | } |
|
2788 | 2829 | |
|
2789 | 2830 | size_t ZSTD_freeCDict(ZSTD_CDict* cdict) |
@@ -2791,7 +2832,7 b' size_t ZSTD_freeCDict(ZSTD_CDict* cdict)' | |||
|
2791 | 2832 | if (cdict==NULL) return 0; /* support free on NULL */ |
|
2792 | 2833 | { ZSTD_customMem const cMem = cdict->refContext->customMem; |
|
2793 | 2834 | ZSTD_freeCCtx(cdict->refContext); |
|
2794 |
ZSTD_free(cdict->dict |
|
|
2835 | ZSTD_free(cdict->dictBuffer, cMem); | |
|
2795 | 2836 | ZSTD_free(cdict, cMem); |
|
2796 | 2837 | return 0; |
|
2797 | 2838 | } |
@@ -2801,7 +2842,7 b' static ZSTD_parameters ZSTD_getParamsFro' | |||
|
2801 | 2842 | return ZSTD_getParamsFromCCtx(cdict->refContext); |
|
2802 | 2843 | } |
|
2803 | 2844 | |
|
2804 |
size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict, |
|
|
2845 | size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict, unsigned long long pledgedSrcSize) | |
|
2805 | 2846 | { |
|
2806 | 2847 | if (cdict->dictContentSize) CHECK_F(ZSTD_copyCCtx(cctx, cdict->refContext, pledgedSrcSize)) |
|
2807 | 2848 | else CHECK_F(ZSTD_compressBegin_advanced(cctx, NULL, 0, cdict->refContext->params, pledgedSrcSize)); |
@@ -2900,7 +2941,7 b' size_t ZSTD_CStreamOutSize(void) { retur' | |||
|
2900 | 2941 | |
|
2901 | 2942 | size_t ZSTD_resetCStream(ZSTD_CStream* zcs, unsigned long long pledgedSrcSize) |
|
2902 | 2943 | { |
|
2903 | if (zcs->inBuffSize==0) return ERROR(stage_wrong); /* zcs has not been init at least once */ | |
|
2944 | if (zcs->inBuffSize==0) return ERROR(stage_wrong); /* zcs has not been init at least once => can't reset */ | |
|
2904 | 2945 | |
|
2905 | 2946 | if (zcs->cdict) CHECK_F(ZSTD_compressBegin_usingCDict(zcs->cctx, zcs->cdict, pledgedSrcSize)) |
|
2906 | 2947 | else CHECK_F(ZSTD_compressBegin_advanced(zcs->cctx, NULL, 0, zcs->params, pledgedSrcSize)); |
@@ -2937,9 +2978,9 b' size_t ZSTD_initCStream_advanced(ZSTD_CS' | |||
|
2937 | 2978 | if (zcs->outBuff == NULL) return ERROR(memory_allocation); |
|
2938 | 2979 | } |
|
2939 | 2980 | |
|
2940 | if (dict) { | |
|
2981 | if (dict && dictSize >= 8) { | |
|
2941 | 2982 | ZSTD_freeCDict(zcs->cdictLocal); |
|
2942 | zcs->cdictLocal = ZSTD_createCDict_advanced(dict, dictSize, params, zcs->customMem); | |
|
2983 | zcs->cdictLocal = ZSTD_createCDict_advanced(dict, dictSize, 0, params, zcs->customMem); | |
|
2943 | 2984 | if (zcs->cdictLocal == NULL) return ERROR(memory_allocation); |
|
2944 | 2985 | zcs->cdict = zcs->cdictLocal; |
|
2945 | 2986 | } else zcs->cdict = NULL; |
@@ -2956,6 +2997,7 b' size_t ZSTD_initCStream_usingCDict(ZSTD_' | |||
|
2956 | 2997 | ZSTD_parameters const params = ZSTD_getParamsFromCDict(cdict); |
|
2957 | 2998 | size_t const initError = ZSTD_initCStream_advanced(zcs, NULL, 0, params, 0); |
|
2958 | 2999 | zcs->cdict = cdict; |
|
3000 | zcs->cctx->dictID = params.fParams.noDictIDFlag ? 0 : cdict->refContext->dictID; | |
|
2959 | 3001 | return initError; |
|
2960 | 3002 | } |
|
2961 | 3003 | |
@@ -2967,7 +3009,8 b' size_t ZSTD_initCStream_usingDict(ZSTD_C' | |||
|
2967 | 3009 | |
|
2968 | 3010 | size_t ZSTD_initCStream_srcSize(ZSTD_CStream* zcs, int compressionLevel, unsigned long long pledgedSrcSize) |
|
2969 | 3011 | { |
|
2970 |
ZSTD_parameters |
|
|
3012 | ZSTD_parameters params = ZSTD_getParams(compressionLevel, pledgedSrcSize, 0); | |
|
3013 | if (pledgedSrcSize) params.fParams.contentSizeFlag = 1; | |
|
2971 | 3014 | return ZSTD_initCStream_advanced(zcs, NULL, 0, params, pledgedSrcSize); |
|
2972 | 3015 | } |
|
2973 | 3016 |
@@ -634,7 +634,7 b' void ZSTD_compressBlock_opt_generic(ZSTD' | |||
|
634 | 634 | } } /* for (cur=0; cur < last_pos; ) */ |
|
635 | 635 | |
|
636 | 636 | /* Save reps for next block */ |
|
637 |
{ int i; for (i=0; i<ZSTD_REP_NUM; i++) ctx-> |
|
|
637 | { int i; for (i=0; i<ZSTD_REP_NUM; i++) ctx->repToConfirm[i] = rep[i]; } | |
|
638 | 638 | |
|
639 | 639 | /* Last Literals */ |
|
640 | 640 | { size_t const lastLLSize = iend - anchor; |
@@ -825,7 +825,7 b' void ZSTD_compressBlock_opt_extDict_gene' | |||
|
825 | 825 | |
|
826 | 826 | match_num = ZSTD_BtGetAllMatches_selectMLS_extDict(ctx, inr, iend, maxSearches, mls, matches, minMatch); |
|
827 | 827 | |
|
828 | if (match_num > 0 && matches[match_num-1].len > sufficient_len) { | |
|
828 | if (match_num > 0 && (matches[match_num-1].len > sufficient_len || cur + matches[match_num-1].len >= ZSTD_OPT_NUM)) { | |
|
829 | 829 | best_mlen = matches[match_num-1].len; |
|
830 | 830 | best_off = matches[match_num-1].off; |
|
831 | 831 | last_pos = cur + 1; |
@@ -835,7 +835,7 b' void ZSTD_compressBlock_opt_extDict_gene' | |||
|
835 | 835 | /* set prices using matches at position = cur */ |
|
836 | 836 | for (u = 0; u < match_num; u++) { |
|
837 | 837 | mlen = (u>0) ? matches[u-1].len+1 : best_mlen; |
|
838 | best_mlen = (cur + matches[u].len < ZSTD_OPT_NUM) ? matches[u].len : ZSTD_OPT_NUM - cur; | |
|
838 | best_mlen = matches[u].len; | |
|
839 | 839 | |
|
840 | 840 | while (mlen <= best_mlen) { |
|
841 | 841 | if (opt[cur].mlen == 1) { |
@@ -907,7 +907,7 b' void ZSTD_compressBlock_opt_extDict_gene' | |||
|
907 | 907 | } } /* for (cur=0; cur < last_pos; ) */ |
|
908 | 908 | |
|
909 | 909 | /* Save reps for next block */ |
|
910 |
{ int i; for (i=0; i<ZSTD_REP_NUM; i++) ctx-> |
|
|
910 | { int i; for (i=0; i<ZSTD_REP_NUM; i++) ctx->repToConfirm[i] = rep[i]; } | |
|
911 | 911 | |
|
912 | 912 | /* Last Literals */ |
|
913 | 913 | { size_t lastLLSize = iend - anchor; |
@@ -1444,7 +1444,7 b' size_t ZSTD_decompress_usingDict(ZSTD_DC' | |||
|
1444 | 1444 | #if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT==1) |
|
1445 | 1445 | if (ZSTD_isLegacy(src, srcSize)) return ZSTD_decompressLegacy(dst, dstCapacity, src, srcSize, dict, dictSize); |
|
1446 | 1446 | #endif |
|
1447 | ZSTD_decompressBegin_usingDict(dctx, dict, dictSize); | |
|
1447 | CHECK_F(ZSTD_decompressBegin_usingDict(dctx, dict, dictSize)); | |
|
1448 | 1448 | ZSTD_checkContinuity(dctx, dst); |
|
1449 | 1449 | return ZSTD_decompressFrame(dctx, dst, dstCapacity, src, srcSize); |
|
1450 | 1450 | } |
@@ -1671,9 +1671,9 b' static size_t ZSTD_loadEntropy(ZSTD_DCtx' | |||
|
1671 | 1671 | } |
|
1672 | 1672 | |
|
1673 | 1673 | if (dictPtr+12 > dictEnd) return ERROR(dictionary_corrupted); |
|
1674 | dctx->rep[0] = MEM_readLE32(dictPtr+0); if (dctx->rep[0] >= dictSize) return ERROR(dictionary_corrupted); | |
|
1675 | dctx->rep[1] = MEM_readLE32(dictPtr+4); if (dctx->rep[1] >= dictSize) return ERROR(dictionary_corrupted); | |
|
1676 | dctx->rep[2] = MEM_readLE32(dictPtr+8); if (dctx->rep[2] >= dictSize) return ERROR(dictionary_corrupted); | |
|
1674 | dctx->rep[0] = MEM_readLE32(dictPtr+0); if (dctx->rep[0] == 0 || dctx->rep[0] >= dictSize) return ERROR(dictionary_corrupted); | |
|
1675 | dctx->rep[1] = MEM_readLE32(dictPtr+4); if (dctx->rep[1] == 0 || dctx->rep[1] >= dictSize) return ERROR(dictionary_corrupted); | |
|
1676 | dctx->rep[2] = MEM_readLE32(dictPtr+8); if (dctx->rep[2] == 0 || dctx->rep[2] >= dictSize) return ERROR(dictionary_corrupted); | |
|
1677 | 1677 | dictPtr += 12; |
|
1678 | 1678 | |
|
1679 | 1679 | dctx->litEntropy = dctx->fseEntropy = 1; |
@@ -1713,39 +1713,44 b' size_t ZSTD_decompressBegin_usingDict(ZS' | |||
|
1713 | 1713 | /* ====== ZSTD_DDict ====== */ |
|
1714 | 1714 | |
|
1715 | 1715 | struct ZSTD_DDict_s { |
|
1716 | void* dict; | |
|
1716 | void* dictBuffer; | |
|
1717 | const void* dictContent; | |
|
1717 | 1718 | size_t dictSize; |
|
1718 | 1719 | ZSTD_DCtx* refContext; |
|
1719 | 1720 | }; /* typedef'd to ZSTD_DDict within "zstd.h" */ |
|
1720 | 1721 | |
|
1721 | ZSTD_DDict* ZSTD_createDDict_advanced(const void* dict, size_t dictSize, ZSTD_customMem customMem) | |
|
1722 | ZSTD_DDict* ZSTD_createDDict_advanced(const void* dict, size_t dictSize, unsigned byReference, ZSTD_customMem customMem) | |
|
1722 | 1723 | { |
|
1723 | 1724 | if (!customMem.customAlloc && !customMem.customFree) customMem = defaultCustomMem; |
|
1724 | 1725 | if (!customMem.customAlloc || !customMem.customFree) return NULL; |
|
1725 | 1726 | |
|
1726 | 1727 | { ZSTD_DDict* const ddict = (ZSTD_DDict*) ZSTD_malloc(sizeof(ZSTD_DDict), customMem); |
|
1727 | void* const dictContent = ZSTD_malloc(dictSize, customMem); | |
|
1728 | 1728 | ZSTD_DCtx* const dctx = ZSTD_createDCtx_advanced(customMem); |
|
1729 | 1729 | |
|
1730 |
if ( |
|
|
1731 | ZSTD_free(dictContent, customMem); | |
|
1730 | if (!ddict || !dctx) { | |
|
1732 | 1731 | ZSTD_free(ddict, customMem); |
|
1733 | 1732 | ZSTD_free(dctx, customMem); |
|
1734 | 1733 | return NULL; |
|
1735 | 1734 | } |
|
1736 | 1735 | |
|
1737 | if (dictSize) { | |
|
1738 | memcpy(dictContent, dict, dictSize); | |
|
1736 | if ((byReference) || (!dict) || (!dictSize)) { | |
|
1737 | ddict->dictBuffer = NULL; | |
|
1738 | ddict->dictContent = dict; | |
|
1739 | } else { | |
|
1740 | void* const internalBuffer = ZSTD_malloc(dictSize, customMem); | |
|
1741 | if (!internalBuffer) { ZSTD_free(dctx, customMem); ZSTD_free(ddict, customMem); return NULL; } | |
|
1742 | memcpy(internalBuffer, dict, dictSize); | |
|
1743 | ddict->dictBuffer = internalBuffer; | |
|
1744 | ddict->dictContent = internalBuffer; | |
|
1739 | 1745 | } |
|
1740 | { size_t const errorCode = ZSTD_decompressBegin_usingDict(dctx, dictContent, dictSize); | |
|
1746 | { size_t const errorCode = ZSTD_decompressBegin_usingDict(dctx, ddict->dictContent, dictSize); | |
|
1741 | 1747 | if (ZSTD_isError(errorCode)) { |
|
1742 |
ZSTD_free(d |
|
|
1748 | ZSTD_free(ddict->dictBuffer, customMem); | |
|
1743 | 1749 | ZSTD_free(ddict, customMem); |
|
1744 | 1750 | ZSTD_free(dctx, customMem); |
|
1745 | 1751 | return NULL; |
|
1746 | 1752 | } } |
|
1747 | 1753 | |
|
1748 | ddict->dict = dictContent; | |
|
1749 | 1754 | ddict->dictSize = dictSize; |
|
1750 | 1755 | ddict->refContext = dctx; |
|
1751 | 1756 | return ddict; |
@@ -1758,15 +1763,27 b' ZSTD_DDict* ZSTD_createDDict_advanced(co' | |||
|
1758 | 1763 | ZSTD_DDict* ZSTD_createDDict(const void* dict, size_t dictSize) |
|
1759 | 1764 | { |
|
1760 | 1765 | ZSTD_customMem const allocator = { NULL, NULL, NULL }; |
|
1761 | return ZSTD_createDDict_advanced(dict, dictSize, allocator); | |
|
1766 | return ZSTD_createDDict_advanced(dict, dictSize, 0, allocator); | |
|
1762 | 1767 | } |
|
1763 | 1768 | |
|
1769 | ||
|
1770 | /*! ZSTD_createDDict_byReference() : | |
|
1771 | * Create a digested dictionary, ready to start decompression operation without startup delay. | |
|
1772 | * Dictionary content is simply referenced, and therefore stays in dictBuffer. | |
|
1773 | * It is important that dictBuffer outlives DDict, it must remain read accessible throughout the lifetime of DDict */ | |
|
1774 | ZSTD_DDict* ZSTD_createDDict_byReference(const void* dictBuffer, size_t dictSize) | |
|
1775 | { | |
|
1776 | ZSTD_customMem const allocator = { NULL, NULL, NULL }; | |
|
1777 | return ZSTD_createDDict_advanced(dictBuffer, dictSize, 1, allocator); | |
|
1778 | } | |
|
1779 | ||
|
1780 | ||
|
1764 | 1781 | size_t ZSTD_freeDDict(ZSTD_DDict* ddict) |
|
1765 | 1782 | { |
|
1766 | 1783 | if (ddict==NULL) return 0; /* support free on NULL */ |
|
1767 | 1784 | { ZSTD_customMem const cMem = ddict->refContext->customMem; |
|
1768 | 1785 | ZSTD_freeDCtx(ddict->refContext); |
|
1769 | ZSTD_free(ddict->dict, cMem); | |
|
1786 | ZSTD_free(ddict->dictBuffer, cMem); | |
|
1770 | 1787 | ZSTD_free(ddict, cMem); |
|
1771 | 1788 | return 0; |
|
1772 | 1789 | } |
@@ -1775,7 +1792,7 b' size_t ZSTD_freeDDict(ZSTD_DDict* ddict)' | |||
|
1775 | 1792 | size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict) |
|
1776 | 1793 | { |
|
1777 | 1794 | if (ddict==NULL) return 0; /* support sizeof on NULL */ |
|
1778 | return sizeof(*ddict) + sizeof(ddict->refContext) + ddict->dictSize; | |
|
1795 | return sizeof(*ddict) + ZSTD_sizeof_DCtx(ddict->refContext) + (ddict->dictBuffer ? ddict->dictSize : 0) ; | |
|
1779 | 1796 | } |
|
1780 | 1797 | |
|
1781 | 1798 | /*! ZSTD_getDictID_fromDict() : |
@@ -1796,7 +1813,7 b' unsigned ZSTD_getDictID_fromDict(const v' | |||
|
1796 | 1813 | unsigned ZSTD_getDictID_fromDDict(const ZSTD_DDict* ddict) |
|
1797 | 1814 | { |
|
1798 | 1815 | if (ddict==NULL) return 0; |
|
1799 | return ZSTD_getDictID_fromDict(ddict->dict, ddict->dictSize); | |
|
1816 | return ZSTD_getDictID_fromDict(ddict->dictContent, ddict->dictSize); | |
|
1800 | 1817 | } |
|
1801 | 1818 | |
|
1802 | 1819 | /*! ZSTD_getDictID_fromFrame() : |
@@ -1827,7 +1844,7 b' size_t ZSTD_decompress_usingDDict(ZSTD_D' | |||
|
1827 | 1844 | const ZSTD_DDict* ddict) |
|
1828 | 1845 | { |
|
1829 | 1846 | #if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT==1) |
|
1830 | if (ZSTD_isLegacy(src, srcSize)) return ZSTD_decompressLegacy(dst, dstCapacity, src, srcSize, ddict->dict, ddict->dictSize); | |
|
1847 | if (ZSTD_isLegacy(src, srcSize)) return ZSTD_decompressLegacy(dst, dstCapacity, src, srcSize, ddict->dictContent, ddict->dictSize); | |
|
1831 | 1848 | #endif |
|
1832 | 1849 | ZSTD_refDCtx(dctx, ddict->refContext); |
|
1833 | 1850 | ZSTD_checkContinuity(dctx, dst); |
@@ -1919,7 +1936,7 b' size_t ZSTD_initDStream_usingDict(ZSTD_D' | |||
|
1919 | 1936 | zds->stage = zdss_loadHeader; |
|
1920 | 1937 | zds->lhSize = zds->inPos = zds->outStart = zds->outEnd = 0; |
|
1921 | 1938 | ZSTD_freeDDict(zds->ddictLocal); |
|
1922 | if (dict) { | |
|
1939 | if (dict && dictSize >= 8) { | |
|
1923 | 1940 | zds->ddictLocal = ZSTD_createDDict(dict, dictSize); |
|
1924 | 1941 | if (zds->ddictLocal == NULL) return ERROR(memory_allocation); |
|
1925 | 1942 | } else zds->ddictLocal = NULL; |
@@ -1956,7 +1973,7 b' size_t ZSTD_setDStreamParameter(ZSTD_DSt' | |||
|
1956 | 1973 | switch(paramType) |
|
1957 | 1974 | { |
|
1958 | 1975 | default : return ERROR(parameter_unknown); |
|
1959 |
case |
|
|
1976 | case DStream_p_maxWindowSize : zds->maxWindowSize = paramValue ? paramValue : (U32)(-1); break; | |
|
1960 | 1977 | } |
|
1961 | 1978 | return 0; |
|
1962 | 1979 | } |
@@ -2007,7 +2024,7 b' size_t ZSTD_decompressStream(ZSTD_DStrea' | |||
|
2007 | 2024 | #if defined(ZSTD_LEGACY_SUPPORT) && (ZSTD_LEGACY_SUPPORT>=1) |
|
2008 | 2025 | { U32 const legacyVersion = ZSTD_isLegacy(istart, iend-istart); |
|
2009 | 2026 | if (legacyVersion) { |
|
2010 | const void* const dict = zds->ddict ? zds->ddict->dict : NULL; | |
|
2027 | const void* const dict = zds->ddict ? zds->ddict->dictContent : NULL; | |
|
2011 | 2028 | size_t const dictSize = zds->ddict ? zds->ddict->dictSize : 0; |
|
2012 | 2029 | CHECK_F(ZSTD_initLegacyStream(&zds->legacyContext, zds->previousLegacyVersion, legacyVersion, |
|
2013 | 2030 | dict, dictSize)); |
@@ -36,12 +36,11 b'' | |||
|
36 | 36 | #include <time.h> /* clock */ |
|
37 | 37 | |
|
38 | 38 | #include "mem.h" /* read */ |
|
39 | #include "error_private.h" | |
|
40 | 39 | #include "fse.h" /* FSE_normalizeCount, FSE_writeNCount */ |
|
41 | 40 | #define HUF_STATIC_LINKING_ONLY |
|
42 | #include "huf.h" | |
|
41 | #include "huf.h" /* HUF_buildCTable, HUF_writeCTable */ | |
|
43 | 42 | #include "zstd_internal.h" /* includes zstd.h */ |
|
44 | #include "xxhash.h" | |
|
43 | #include "xxhash.h" /* XXH64 */ | |
|
45 | 44 | #include "divsufsort.h" |
|
46 | 45 | #ifndef ZDICT_STATIC_LINKING_ONLY |
|
47 | 46 | # define ZDICT_STATIC_LINKING_ONLY |
@@ -61,7 +60,7 b'' | |||
|
61 | 60 | #define NOISELENGTH 32 |
|
62 | 61 | |
|
63 | 62 | #define MINRATIO 4 |
|
64 |
static const int g_compressionLevel_default = |
|
|
63 | static const int g_compressionLevel_default = 6; | |
|
65 | 64 | static const U32 g_selectivity_default = 9; |
|
66 | 65 | static const size_t g_provision_entropySize = 200; |
|
67 | 66 | static const size_t g_min_fast_dictContent = 192; |
@@ -570,7 +569,7 b' static void ZDICT_countEStats(EStats_res' | |||
|
570 | 569 | if (ZSTD_isError(errorCode)) { DISPLAYLEVEL(1, "warning : ZSTD_copyCCtx failed \n"); return; } |
|
571 | 570 | } |
|
572 | 571 | cSize = ZSTD_compressBlock(esr.zc, esr.workPlace, ZSTD_BLOCKSIZE_ABSOLUTEMAX, src, srcSize); |
|
573 |
if (ZSTD_isError(cSize)) { DISPLAYLEVEL( |
|
|
572 | if (ZSTD_isError(cSize)) { DISPLAYLEVEL(3, "warning : could not compress sample size %u \n", (U32)srcSize); return; } | |
|
574 | 573 | |
|
575 | 574 | if (cSize) { /* if == 0; block is not compressible */ |
|
576 | 575 | const seqStore_t* seqStorePtr = ZSTD_getSeqStore(esr.zc); |
@@ -825,6 +824,55 b' static size_t ZDICT_analyzeEntropy(void*' | |||
|
825 | 824 | } |
|
826 | 825 | |
|
827 | 826 | |
|
827 | ||
|
828 | size_t ZDICT_finalizeDictionary(void* dictBuffer, size_t dictBufferCapacity, | |
|
829 | const void* customDictContent, size_t dictContentSize, | |
|
830 | const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples, | |
|
831 | ZDICT_params_t params) | |
|
832 | { | |
|
833 | size_t hSize; | |
|
834 | #define HBUFFSIZE 256 | |
|
835 | BYTE header[HBUFFSIZE]; | |
|
836 | int const compressionLevel = (params.compressionLevel <= 0) ? g_compressionLevel_default : params.compressionLevel; | |
|
837 | U32 const notificationLevel = params.notificationLevel; | |
|
838 | ||
|
839 | /* check conditions */ | |
|
840 | if (dictBufferCapacity < dictContentSize) return ERROR(dstSize_tooSmall); | |
|
841 | if (dictContentSize < ZDICT_CONTENTSIZE_MIN) return ERROR(srcSize_wrong); | |
|
842 | if (dictBufferCapacity < ZDICT_DICTSIZE_MIN) return ERROR(dstSize_tooSmall); | |
|
843 | ||
|
844 | /* dictionary header */ | |
|
845 | MEM_writeLE32(header, ZSTD_DICT_MAGIC); | |
|
846 | { U64 const randomID = XXH64(customDictContent, dictContentSize, 0); | |
|
847 | U32 const compliantID = (randomID % ((1U<<31)-32768)) + 32768; | |
|
848 | U32 const dictID = params.dictID ? params.dictID : compliantID; | |
|
849 | MEM_writeLE32(header+4, dictID); | |
|
850 | } | |
|
851 | hSize = 8; | |
|
852 | ||
|
853 | /* entropy tables */ | |
|
854 | DISPLAYLEVEL(2, "\r%70s\r", ""); /* clean display line */ | |
|
855 | DISPLAYLEVEL(2, "statistics ... \n"); | |
|
856 | { size_t const eSize = ZDICT_analyzeEntropy(header+hSize, HBUFFSIZE-hSize, | |
|
857 | compressionLevel, | |
|
858 | samplesBuffer, samplesSizes, nbSamples, | |
|
859 | customDictContent, dictContentSize, | |
|
860 | notificationLevel); | |
|
861 | if (ZDICT_isError(eSize)) return eSize; | |
|
862 | hSize += eSize; | |
|
863 | } | |
|
864 | ||
|
865 | /* copy elements in final buffer ; note : src and dst buffer can overlap */ | |
|
866 | if (hSize + dictContentSize > dictBufferCapacity) dictContentSize = dictBufferCapacity - hSize; | |
|
867 | { size_t const dictSize = hSize + dictContentSize; | |
|
868 | char* dictEnd = (char*)dictBuffer + dictSize; | |
|
869 | memmove(dictEnd - dictContentSize, customDictContent, dictContentSize); | |
|
870 | memcpy(dictBuffer, header, hSize); | |
|
871 | return dictSize; | |
|
872 | } | |
|
873 | } | |
|
874 | ||
|
875 | ||
|
828 | 876 | size_t ZDICT_addEntropyTablesFromBuffer_advanced(void* dictBuffer, size_t dictContentSize, size_t dictBufferCapacity, |
|
829 | 877 | const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples, |
|
830 | 878 | ZDICT_params_t params) |
@@ -19,15 +19,18 b' extern "C" {' | |||
|
19 | 19 | #include <stddef.h> /* size_t */ |
|
20 | 20 | |
|
21 | 21 | |
|
22 | /*====== Export for Windows ======*/ | |
|
23 | /*! | |
|
24 | * ZSTD_DLL_EXPORT : | |
|
25 | * Enable exporting of functions when building a Windows DLL | |
|
26 | */ | |
|
27 | #if defined(_WIN32) && defined(ZSTD_DLL_EXPORT) && (ZSTD_DLL_EXPORT==1) | |
|
28 | # define ZDICTLIB_API __declspec(dllexport) | |
|
22 | /* ===== ZDICTLIB_API : control library symbols visibility ===== */ | |
|
23 | #if defined(__GNUC__) && (__GNUC__ >= 4) | |
|
24 | # define ZDICTLIB_VISIBILITY __attribute__ ((visibility ("default"))) | |
|
29 | 25 | #else |
|
30 |
# define ZDICTLIB_ |
|
|
26 | # define ZDICTLIB_VISIBILITY | |
|
27 | #endif | |
|
28 | #if defined(ZSTD_DLL_EXPORT) && (ZSTD_DLL_EXPORT==1) | |
|
29 | # define ZDICTLIB_API __declspec(dllexport) ZDICTLIB_VISIBILITY | |
|
30 | #elif defined(ZSTD_DLL_IMPORT) && (ZSTD_DLL_IMPORT==1) | |
|
31 | # define ZDICTLIB_API __declspec(dllimport) ZDICTLIB_VISIBILITY /* It isn't required but allows to generate better code, saving a function pointer load from the IAT and an indirect jump.*/ | |
|
32 | #else | |
|
33 | # define ZDICTLIB_API ZDICTLIB_VISIBILITY | |
|
31 | 34 | #endif |
|
32 | 35 | |
|
33 | 36 | |
@@ -79,29 +82,116 b' typedef struct {' | |||
|
79 | 82 | or an error code, which can be tested by ZDICT_isError(). |
|
80 | 83 | note : ZDICT_trainFromBuffer_advanced() will send notifications into stderr if instructed to, using notificationLevel>0. |
|
81 | 84 | */ |
|
82 | size_t ZDICT_trainFromBuffer_advanced(void* dictBuffer, size_t dictBufferCapacity, | |
|
85 | ZDICTLIB_API size_t ZDICT_trainFromBuffer_advanced(void* dictBuffer, size_t dictBufferCapacity, | |
|
86 | const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples, | |
|
87 | ZDICT_params_t parameters); | |
|
88 | ||
|
89 | /*! COVER_params_t : | |
|
90 | For all values 0 means default. | |
|
91 | kMin and d are the only required parameters. | |
|
92 | */ | |
|
93 | typedef struct { | |
|
94 | unsigned k; /* Segment size : constraint: 0 < k : Reasonable range [16, 2048+] */ | |
|
95 | unsigned d; /* dmer size : constraint: 0 < d <= k : Reasonable range [6, 16] */ | |
|
96 | unsigned steps; /* Number of steps : Only used for optimization : 0 means default (32) : Higher means more parameters checked */ | |
|
97 | ||
|
98 | unsigned nbThreads; /* Number of threads : constraint: 0 < nbThreads : 1 means single-threaded : Only used for optimization : Ignored if ZSTD_MULTITHREAD is not defined */ | |
|
99 | unsigned notificationLevel; /* Write to stderr; 0 = none (default); 1 = errors; 2 = progression; 3 = details; 4 = debug; */ | |
|
100 | unsigned dictID; /* 0 means auto mode (32-bits random value); other : force dictID value */ | |
|
101 | int compressionLevel; /* 0 means default; target a specific zstd compression level */ | |
|
102 | } COVER_params_t; | |
|
103 | ||
|
104 | ||
|
105 | /*! COVER_trainFromBuffer() : | |
|
106 | Train a dictionary from an array of samples using the COVER algorithm. | |
|
107 | Samples must be stored concatenated in a single flat buffer `samplesBuffer`, | |
|
108 | supplied with an array of sizes `samplesSizes`, providing the size of each sample, in order. | |
|
109 | The resulting dictionary will be saved into `dictBuffer`. | |
|
110 | @return : size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`) | |
|
111 | or an error code, which can be tested with ZDICT_isError(). | |
|
112 | Note : COVER_trainFromBuffer() requires about 9 bytes of memory for each input byte. | |
|
113 | Tips : In general, a reasonable dictionary has a size of ~ 100 KB. | |
|
114 | It's obviously possible to target smaller or larger ones, just by specifying different `dictBufferCapacity`. | |
|
115 | In general, it's recommended to provide a few thousands samples, but this can vary a lot. | |
|
116 | It's recommended that total size of all samples be about ~x100 times the target size of dictionary. | |
|
117 | */ | |
|
118 | ZDICTLIB_API size_t COVER_trainFromBuffer(void* dictBuffer, size_t dictBufferCapacity, | |
|
119 | const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples, | |
|
120 | COVER_params_t parameters); | |
|
121 | ||
|
122 | /*! COVER_optimizeTrainFromBuffer() : | |
|
123 | The same requirements as above hold for all the parameters except `parameters`. | |
|
124 | This function tries many parameter combinations and picks the best parameters. | |
|
125 | `*parameters` is filled with the best parameters found, and the dictionary | |
|
126 | constructed with those parameters is stored in `dictBuffer`. | |
|
127 | ||
|
128 | All of the parameters d, k, steps are optional. | |
|
129 | If d is non-zero then we don't check multiple values of d, otherwise we check d = {6, 8, 10, 12, 14, 16}. | |
|
130 | if steps is zero it defaults to its default value. | |
|
131 | If k is non-zero then we don't check multiple values of k, otherwise we check steps values in [16, 2048]. | |
|
132 | ||
|
133 | @return : size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`) | |
|
134 | or an error code, which can be tested with ZDICT_isError(). | |
|
135 | On success `*parameters` contains the parameters selected. | |
|
136 | Note : COVER_optimizeTrainFromBuffer() requires about 8 bytes of memory for each input byte and additionally another 5 bytes of memory for each byte of memory for each thread. | |
|
137 | */ | |
|
138 | ZDICTLIB_API size_t COVER_optimizeTrainFromBuffer(void* dictBuffer, size_t dictBufferCapacity, | |
|
139 | const void* samplesBuffer, const size_t *samplesSizes, unsigned nbSamples, | |
|
140 | COVER_params_t *parameters); | |
|
141 | ||
|
142 | /*! ZDICT_finalizeDictionary() : | |
|
143 | ||
|
144 | Given a custom content as a basis for dictionary, and a set of samples, | |
|
145 | finalize dictionary by adding headers and statistics. | |
|
146 | ||
|
147 | Samples must be stored concatenated in a flat buffer `samplesBuffer`, | |
|
148 | supplied with an array of sizes `samplesSizes`, providing the size of each sample in order. | |
|
149 | ||
|
150 | dictContentSize must be > ZDICT_CONTENTSIZE_MIN bytes. | |
|
151 | maxDictSize must be >= dictContentSize, and must be > ZDICT_DICTSIZE_MIN bytes. | |
|
152 | ||
|
153 | @return : size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`), | |
|
154 | or an error code, which can be tested by ZDICT_isError(). | |
|
155 | note : ZDICT_finalizeDictionary() will push notifications into stderr if instructed to, using notificationLevel>0. | |
|
156 | note 2 : dictBuffer and customDictContent can overlap | |
|
157 | */ | |
|
158 | #define ZDICT_CONTENTSIZE_MIN 256 | |
|
159 | #define ZDICT_DICTSIZE_MIN 512 | |
|
160 | ZDICTLIB_API size_t ZDICT_finalizeDictionary(void* dictBuffer, size_t dictBufferCapacity, | |
|
161 | const void* customDictContent, size_t dictContentSize, | |
|
83 | 162 | const void* samplesBuffer, const size_t* samplesSizes, unsigned nbSamples, |
|
84 | 163 | ZDICT_params_t parameters); |
|
85 | 164 | |
|
86 | 165 | |
|
87 | /*! ZDICT_addEntropyTablesFromBuffer() : | |
|
88 | ||
|
89 | Given a content-only dictionary (built using any 3rd party algorithm), | |
|
90 | add entropy tables computed from an array of samples. | |
|
91 | Samples must be stored concatenated in a flat buffer `samplesBuffer`, | |
|
92 | supplied with an array of sizes `samplesSizes`, providing the size of each sample in order. | |
|
93 | 166 | |
|
94 | The input dictionary content must be stored *at the end* of `dictBuffer`. | |
|
95 | Its size is `dictContentSize`. | |
|
96 | The resulting dictionary with added entropy tables will be *written back to `dictBuffer`*, | |
|
97 | starting from its beginning. | |
|
98 | @return : size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`). | |
|
99 | */ | |
|
167 | /* Deprecation warnings */ | |
|
168 | /* It is generally possible to disable deprecation warnings from compiler, | |
|
169 | for example with -Wno-deprecated-declarations for gcc | |
|
170 | or _CRT_SECURE_NO_WARNINGS in Visual. | |
|
171 | Otherwise, it's also possible to manually define ZDICT_DISABLE_DEPRECATE_WARNINGS */ | |
|
172 | #ifdef ZDICT_DISABLE_DEPRECATE_WARNINGS | |
|
173 | # define ZDICT_DEPRECATED(message) ZDICTLIB_API /* disable deprecation warnings */ | |
|
174 | #else | |
|
175 | # define ZDICT_GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__) | |
|
176 | # if defined (__cplusplus) && (__cplusplus >= 201402) /* C++14 or greater */ | |
|
177 | # define ZDICT_DEPRECATED(message) ZDICTLIB_API [[deprecated(message)]] | |
|
178 | # elif (ZDICT_GCC_VERSION >= 405) || defined(__clang__) | |
|
179 | # define ZDICT_DEPRECATED(message) ZDICTLIB_API __attribute__((deprecated(message))) | |
|
180 | # elif (ZDICT_GCC_VERSION >= 301) | |
|
181 | # define ZDICT_DEPRECATED(message) ZDICTLIB_API __attribute__((deprecated)) | |
|
182 | # elif defined(_MSC_VER) | |
|
183 | # define ZDICT_DEPRECATED(message) ZDICTLIB_API __declspec(deprecated(message)) | |
|
184 | # else | |
|
185 | # pragma message("WARNING: You need to implement ZDICT_DEPRECATED for this compiler") | |
|
186 | # define ZDICT_DEPRECATED(message) ZDICTLIB_API | |
|
187 | # endif | |
|
188 | #endif /* ZDICT_DISABLE_DEPRECATE_WARNINGS */ | |
|
189 | ||
|
190 | ZDICT_DEPRECATED("use ZDICT_finalizeDictionary() instead") | |
|
100 | 191 | size_t ZDICT_addEntropyTablesFromBuffer(void* dictBuffer, size_t dictContentSize, size_t dictBufferCapacity, |
|
101 | 192 |
|
|
102 | 193 | |
|
103 | 194 | |
|
104 | ||
|
105 | 195 | #endif /* ZDICT_STATIC_LINKING_ONLY */ |
|
106 | 196 | |
|
107 | 197 | #if defined (__cplusplus) |
@@ -20,13 +20,16 b' extern "C" {' | |||
|
20 | 20 | |
|
21 | 21 | /* ===== ZSTDLIB_API : control library symbols visibility ===== */ |
|
22 | 22 | #if defined(__GNUC__) && (__GNUC__ >= 4) |
|
23 |
# define ZSTDLIB_ |
|
|
24 | #elif defined(ZSTD_DLL_EXPORT) && (ZSTD_DLL_EXPORT==1) | |
|
25 | # define ZSTDLIB_API __declspec(dllexport) | |
|
23 | # define ZSTDLIB_VISIBILITY __attribute__ ((visibility ("default"))) | |
|
24 | #else | |
|
25 | # define ZSTDLIB_VISIBILITY | |
|
26 | #endif | |
|
27 | #if defined(ZSTD_DLL_EXPORT) && (ZSTD_DLL_EXPORT==1) | |
|
28 | # define ZSTDLIB_API __declspec(dllexport) ZSTDLIB_VISIBILITY | |
|
26 | 29 | #elif defined(ZSTD_DLL_IMPORT) && (ZSTD_DLL_IMPORT==1) |
|
27 | # define ZSTDLIB_API __declspec(dllimport) /* It isn't required but allows to generate better code, saving a function pointer load from the IAT and an indirect jump.*/ | |
|
30 | # define ZSTDLIB_API __declspec(dllimport) ZSTDLIB_VISIBILITY /* It isn't required but allows to generate better code, saving a function pointer load from the IAT and an indirect jump.*/ | |
|
28 | 31 | #else |
|
29 | # define ZSTDLIB_API | |
|
32 | # define ZSTDLIB_API ZSTDLIB_VISIBILITY | |
|
30 | 33 | #endif |
|
31 | 34 | |
|
32 | 35 | |
@@ -53,7 +56,7 b' extern "C" {' | |||
|
53 | 56 | /*------ Version ------*/ |
|
54 | 57 | #define ZSTD_VERSION_MAJOR 1 |
|
55 | 58 | #define ZSTD_VERSION_MINOR 1 |
|
56 |
#define ZSTD_VERSION_RELEASE |
|
|
59 | #define ZSTD_VERSION_RELEASE 3 | |
|
57 | 60 | |
|
58 | 61 | #define ZSTD_LIB_VERSION ZSTD_VERSION_MAJOR.ZSTD_VERSION_MINOR.ZSTD_VERSION_RELEASE |
|
59 | 62 | #define ZSTD_QUOTE(str) #str |
@@ -170,8 +173,8 b' typedef struct ZSTD_CDict_s ZSTD_CDict;' | |||
|
170 | 173 | * When compressing multiple messages / blocks with the same dictionary, it's recommended to load it just once. |
|
171 | 174 | * ZSTD_createCDict() will create a digested dictionary, ready to start future compression operations without startup delay. |
|
172 | 175 | * ZSTD_CDict can be created once and used by multiple threads concurrently, as its usage is read-only. |
|
173 |
* `dict` can be released after ZSTD_CDict creation |
|
|
174 | ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict(const void* dict, size_t dictSize, int compressionLevel); | |
|
176 | * `dictBuffer` can be released after ZSTD_CDict creation, as its content is copied within CDict */ | |
|
177 | ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict(const void* dictBuffer, size_t dictSize, int compressionLevel); | |
|
175 | 178 | |
|
176 | 179 | /*! ZSTD_freeCDict() : |
|
177 | 180 | * Function frees memory allocated by ZSTD_createCDict(). */ |
@@ -191,8 +194,8 b' typedef struct ZSTD_DDict_s ZSTD_DDict;' | |||
|
191 | 194 | |
|
192 | 195 | /*! ZSTD_createDDict() : |
|
193 | 196 | * Create a digested dictionary, ready to start decompression operation without startup delay. |
|
194 |
* |
|
|
195 | ZSTDLIB_API ZSTD_DDict* ZSTD_createDDict(const void* dict, size_t dictSize); | |
|
197 | * dictBuffer can be released after DDict creation, as its content is copied inside DDict */ | |
|
198 | ZSTDLIB_API ZSTD_DDict* ZSTD_createDDict(const void* dictBuffer, size_t dictSize); | |
|
196 | 199 | |
|
197 | 200 | /*! ZSTD_freeDDict() : |
|
198 | 201 | * Function frees memory allocated with ZSTD_createDDict() */ |
@@ -325,7 +328,7 b' ZSTDLIB_API size_t ZSTD_DStreamOutSize(v' | |||
|
325 | 328 | * ***************************************************************************************/ |
|
326 | 329 | |
|
327 | 330 | /* --- Constants ---*/ |
|
328 | #define ZSTD_MAGICNUMBER 0xFD2FB528 /* v0.8 */ | |
|
331 | #define ZSTD_MAGICNUMBER 0xFD2FB528 /* >= v0.8.0 */ | |
|
329 | 332 | #define ZSTD_MAGIC_SKIPPABLE_START 0x184D2A50U |
|
330 | 333 | |
|
331 | 334 | #define ZSTD_WINDOWLOG_MAX_32 25 |
@@ -345,8 +348,9 b' ZSTDLIB_API size_t ZSTD_DStreamOutSize(v' | |||
|
345 | 348 | #define ZSTD_TARGETLENGTH_MAX 999 |
|
346 | 349 | |
|
347 | 350 | #define ZSTD_FRAMEHEADERSIZE_MAX 18 /* for static allocation */ |
|
351 | #define ZSTD_FRAMEHEADERSIZE_MIN 6 | |
|
348 | 352 | static const size_t ZSTD_frameHeaderSize_prefix = 5; |
|
349 |
static const size_t ZSTD_frameHeaderSize_min = |
|
|
353 | static const size_t ZSTD_frameHeaderSize_min = ZSTD_FRAMEHEADERSIZE_MIN; | |
|
350 | 354 | static const size_t ZSTD_frameHeaderSize_max = ZSTD_FRAMEHEADERSIZE_MAX; |
|
351 | 355 | static const size_t ZSTD_skippableHeaderSize = 8; /* magic number + skippable frame length */ |
|
352 | 356 | |
@@ -365,8 +369,8 b' typedef struct {' | |||
|
365 | 369 | } ZSTD_compressionParameters; |
|
366 | 370 | |
|
367 | 371 | typedef struct { |
|
368 |
unsigned contentSizeFlag; /**< 1: content size will be in frame header ( |
|
|
369 |
unsigned checksumFlag; /**< 1: |
|
|
372 | unsigned contentSizeFlag; /**< 1: content size will be in frame header (when known) */ | |
|
373 | unsigned checksumFlag; /**< 1: generate a 32-bits checksum at end of frame, for error detection */ | |
|
370 | 374 |
unsigned noDictIDFlag; /**< 1: no dict |
|
371 | 375 | } ZSTD_frameParameters; |
|
372 | 376 | |
@@ -397,9 +401,23 b' ZSTDLIB_API ZSTD_CCtx* ZSTD_createCCtx_a' | |||
|
397 | 401 | * Gives the amount of memory used by a given ZSTD_CCtx */ |
|
398 | 402 | ZSTDLIB_API size_t ZSTD_sizeof_CCtx(const ZSTD_CCtx* cctx); |
|
399 | 403 | |
|
404 | typedef enum { | |
|
405 | ZSTD_p_forceWindow /* Force back-references to remain < windowSize, even when referencing Dictionary content (default:0)*/ | |
|
406 | } ZSTD_CCtxParameter; | |
|
407 | /*! ZSTD_setCCtxParameter() : | |
|
408 | * Set advanced parameters, selected through enum ZSTD_CCtxParameter | |
|
409 | * @result : 0, or an error code (which can be tested with ZSTD_isError()) */ | |
|
410 | ZSTDLIB_API size_t ZSTD_setCCtxParameter(ZSTD_CCtx* cctx, ZSTD_CCtxParameter param, unsigned value); | |
|
411 | ||
|
412 | /*! ZSTD_createCDict_byReference() : | |
|
413 | * Create a digested dictionary for compression | |
|
414 | * Dictionary content is simply referenced, and therefore stays in dictBuffer. | |
|
415 | * It is important that dictBuffer outlives CDict, it must remain read accessible throughout the lifetime of CDict */ | |
|
416 | ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict_byReference(const void* dictBuffer, size_t dictSize, int compressionLevel); | |
|
417 | ||
|
400 | 418 | /*! ZSTD_createCDict_advanced() : |
|
401 | 419 | * Create a ZSTD_CDict using external alloc and free, and customized compression parameters */ |
|
402 | ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict_advanced(const void* dict, size_t dictSize, | |
|
420 | ZSTDLIB_API ZSTD_CDict* ZSTD_createCDict_advanced(const void* dict, size_t dictSize, unsigned byReference, | |
|
403 | 421 | ZSTD_parameters params, ZSTD_customMem customMem); |
|
404 | 422 | |
|
405 | 423 | /*! ZSTD_sizeof_CDict() : |
@@ -455,6 +473,15 b' ZSTDLIB_API ZSTD_DCtx* ZSTD_createDCtx_a' | |||
|
455 | 473 | * Gives the amount of memory used by a given ZSTD_DCtx */ |
|
456 | 474 | ZSTDLIB_API size_t ZSTD_sizeof_DCtx(const ZSTD_DCtx* dctx); |
|
457 | 475 | |
|
476 | /*! ZSTD_createDDict_byReference() : | |
|
477 | * Create a digested dictionary, ready to start decompression operation without startup delay. | |
|
478 | * Dictionary content is simply referenced, and therefore stays in dictBuffer. | |
|
479 | * It is important that dictBuffer outlives DDict, it must remain read accessible throughout the lifetime of DDict */ | |
|
480 | ZSTDLIB_API ZSTD_DDict* ZSTD_createDDict_byReference(const void* dictBuffer, size_t dictSize); | |
|
481 | ||
|
482 | ZSTDLIB_API ZSTD_DDict* ZSTD_createDDict_advanced(const void* dict, size_t dictSize, | |
|
483 | unsigned byReference, ZSTD_customMem customMem); | |
|
484 | ||
|
458 | 485 | /*! ZSTD_sizeof_DDict() : |
|
459 | 486 | * Gives the amount of memory used by a given ZSTD_DDict */ |
|
460 | 487 | ZSTDLIB_API size_t ZSTD_sizeof_DDict(const ZSTD_DDict* ddict); |
@@ -463,13 +490,13 b' ZSTDLIB_API size_t ZSTD_sizeof_DDict(con' | |||
|
463 | 490 | * Provides the dictID stored within dictionary. |
|
464 | 491 | * if @return == 0, the dictionary is not conformant with Zstandard specification. |
|
465 | 492 | * It can still be loaded, but as a content-only dictionary. */ |
|
466 | unsigned ZSTD_getDictID_fromDict(const void* dict, size_t dictSize); | |
|
493 | ZSTDLIB_API unsigned ZSTD_getDictID_fromDict(const void* dict, size_t dictSize); | |
|
467 | 494 | |
|
468 | 495 | /*! ZSTD_getDictID_fromDDict() : |
|
469 | 496 | * Provides the dictID of the dictionary loaded into `ddict`. |
|
470 | 497 | * If @return == 0, the dictionary is not conformant to Zstandard specification, or empty. |
|
471 | 498 | * Non-conformant dictionaries can still be loaded, but as content-only dictionaries. */ |
|
472 | unsigned ZSTD_getDictID_fromDDict(const ZSTD_DDict* ddict); | |
|
499 | ZSTDLIB_API unsigned ZSTD_getDictID_fromDDict(const ZSTD_DDict* ddict); | |
|
473 | 500 | |
|
474 | 501 | /*! ZSTD_getDictID_fromFrame() : |
|
475 | 502 | * Provides the dictID required to decompressed the frame stored within `src`. |
@@ -481,7 +508,7 b' unsigned ZSTD_getDictID_fromDDict(const ' | |||
|
481 | 508 | * - `srcSize` is too small, and as a result, the frame header could not be decoded (only possible if `srcSize < ZSTD_FRAMEHEADERSIZE_MAX`). |
|
482 | 509 | * - This is not a Zstandard frame. |
|
483 | 510 | * When identifying the exact failure cause, it's possible to used ZSTD_getFrameParams(), which will provide a more precise error code. */ |
|
484 | unsigned ZSTD_getDictID_fromFrame(const void* src, size_t srcSize); | |
|
511 | ZSTDLIB_API unsigned ZSTD_getDictID_fromFrame(const void* src, size_t srcSize); | |
|
485 | 512 | |
|
486 | 513 | |
|
487 | 514 | /******************************************************************** |
@@ -491,7 +518,7 b' unsigned ZSTD_getDictID_fromFrame(const ' | |||
|
491 | 518 | /*===== Advanced Streaming compression functions =====*/ |
|
492 | 519 | ZSTDLIB_API ZSTD_CStream* ZSTD_createCStream_advanced(ZSTD_customMem customMem); |
|
493 | 520 | ZSTDLIB_API size_t ZSTD_initCStream_srcSize(ZSTD_CStream* zcs, int compressionLevel, unsigned long long pledgedSrcSize); /**< pledgedSrcSize must be correct */ |
|
494 | ZSTDLIB_API size_t ZSTD_initCStream_usingDict(ZSTD_CStream* zcs, const void* dict, size_t dictSize, int compressionLevel); | |
|
521 | ZSTDLIB_API size_t ZSTD_initCStream_usingDict(ZSTD_CStream* zcs, const void* dict, size_t dictSize, int compressionLevel); /**< note: a dict will not be used if dict == NULL or dictSize < 8 */ | |
|
495 | 522 | ZSTDLIB_API size_t ZSTD_initCStream_advanced(ZSTD_CStream* zcs, const void* dict, size_t dictSize, |
|
496 | 523 | ZSTD_parameters params, unsigned long long pledgedSrcSize); /**< pledgedSrcSize is optional and can be zero == unknown */ |
|
497 | 524 | ZSTDLIB_API size_t ZSTD_initCStream_usingCDict(ZSTD_CStream* zcs, const ZSTD_CDict* cdict); /**< note : cdict will just be referenced, and must outlive compression session */ |
@@ -500,9 +527,9 b' ZSTDLIB_API size_t ZSTD_sizeof_CStream(c' | |||
|
500 | 527 | |
|
501 | 528 | |
|
502 | 529 | /*===== Advanced Streaming decompression functions =====*/ |
|
503 |
typedef enum { |
|
|
530 | typedef enum { DStream_p_maxWindowSize } ZSTD_DStreamParameter_e; | |
|
504 | 531 | ZSTDLIB_API ZSTD_DStream* ZSTD_createDStream_advanced(ZSTD_customMem customMem); |
|
505 | ZSTDLIB_API size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const void* dict, size_t dictSize); | |
|
532 | ZSTDLIB_API size_t ZSTD_initDStream_usingDict(ZSTD_DStream* zds, const void* dict, size_t dictSize); /**< note: a dict will not be used if dict == NULL or dictSize < 8 */ | |
|
506 | 533 | ZSTDLIB_API size_t ZSTD_setDStreamParameter(ZSTD_DStream* zds, ZSTD_DStreamParameter_e paramType, unsigned paramValue); |
|
507 | 534 | ZSTDLIB_API size_t ZSTD_initDStream_usingDDict(ZSTD_DStream* zds, const ZSTD_DDict* ddict); /**< note : ddict will just be referenced, and must outlive decompression session */ |
|
508 | 535 | ZSTDLIB_API size_t ZSTD_resetDStream(ZSTD_DStream* zds); /**< re-use decompression parameters from previous init; saves dictionary loading */ |
@@ -542,10 +569,10 b' ZSTDLIB_API size_t ZSTD_sizeof_DStream(c' | |||
|
542 | 569 | In which case, it will "discard" the relevant memory section from its history. |
|
543 | 570 | |
|
544 | 571 | Finish a frame with ZSTD_compressEnd(), which will write the last block(s) and optional checksum. |
|
545 |
It's possible to use |
|
|
546 |
Without last block mark, frames will be considered unfinished ( |
|
|
572 | It's possible to use srcSize==0, in which case, it will write a final empty block to end the frame. | |
|
573 | Without last block mark, frames will be considered unfinished (corrupted) by decoders. | |
|
547 | 574 | |
|
548 |
|
|
|
575 | `ZSTD_CCtx` object can be re-used (ZSTD_compressBegin()) to compress some new frame. | |
|
549 | 576 | */ |
|
550 | 577 | |
|
551 | 578 | /*===== Buffer-less streaming compression functions =====*/ |
@@ -553,6 +580,7 b' ZSTDLIB_API size_t ZSTD_compressBegin(ZS' | |||
|
553 | 580 | ZSTDLIB_API size_t ZSTD_compressBegin_usingDict(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, int compressionLevel); |
|
554 | 581 | ZSTDLIB_API size_t ZSTD_compressBegin_advanced(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, ZSTD_parameters params, unsigned long long pledgedSrcSize); |
|
555 | 582 | ZSTDLIB_API size_t ZSTD_copyCCtx(ZSTD_CCtx* cctx, const ZSTD_CCtx* preparedCCtx, unsigned long long pledgedSrcSize); |
|
583 | ZSTDLIB_API size_t ZSTD_compressBegin_usingCDict(ZSTD_CCtx* cctx, const ZSTD_CDict* cdict, unsigned long long pledgedSrcSize); | |
|
556 | 584 | ZSTDLIB_API size_t ZSTD_compressContinue(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize); |
|
557 | 585 | ZSTDLIB_API size_t ZSTD_compressEnd(ZSTD_CCtx* cctx, void* dst, size_t dstCapacity, const void* src, size_t srcSize); |
|
558 | 586 |
This diff has been collapsed as it changes many lines, (1016 lines changed) Show them Hide them | |||
@@ -8,145 +8,1035 b'' | |||
|
8 | 8 | |
|
9 | 9 | from __future__ import absolute_import, unicode_literals |
|
10 | 10 | |
|
11 |
import |
|
|
11 | import sys | |
|
12 | 12 | |
|
13 | 13 | from _zstd_cffi import ( |
|
14 | 14 | ffi, |
|
15 | 15 | lib, |
|
16 | 16 | ) |
|
17 | 17 | |
|
18 | if sys.version_info[0] == 2: | |
|
19 | bytes_type = str | |
|
20 | int_type = long | |
|
21 | else: | |
|
22 | bytes_type = bytes | |
|
23 | int_type = int | |
|
18 | 24 | |
|
19 | _CSTREAM_IN_SIZE = lib.ZSTD_CStreamInSize() | |
|
20 |
|
|
|
25 | ||
|
26 | COMPRESSION_RECOMMENDED_INPUT_SIZE = lib.ZSTD_CStreamInSize() | |
|
27 | COMPRESSION_RECOMMENDED_OUTPUT_SIZE = lib.ZSTD_CStreamOutSize() | |
|
28 | DECOMPRESSION_RECOMMENDED_INPUT_SIZE = lib.ZSTD_DStreamInSize() | |
|
29 | DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE = lib.ZSTD_DStreamOutSize() | |
|
30 | ||
|
31 | new_nonzero = ffi.new_allocator(should_clear_after_alloc=False) | |
|
32 | ||
|
33 | ||
|
34 | MAX_COMPRESSION_LEVEL = lib.ZSTD_maxCLevel() | |
|
35 | MAGIC_NUMBER = lib.ZSTD_MAGICNUMBER | |
|
36 | FRAME_HEADER = b'\x28\xb5\x2f\xfd' | |
|
37 | ZSTD_VERSION = (lib.ZSTD_VERSION_MAJOR, lib.ZSTD_VERSION_MINOR, lib.ZSTD_VERSION_RELEASE) | |
|
38 | ||
|
39 | WINDOWLOG_MIN = lib.ZSTD_WINDOWLOG_MIN | |
|
40 | WINDOWLOG_MAX = lib.ZSTD_WINDOWLOG_MAX | |
|
41 | CHAINLOG_MIN = lib.ZSTD_CHAINLOG_MIN | |
|
42 | CHAINLOG_MAX = lib.ZSTD_CHAINLOG_MAX | |
|
43 | HASHLOG_MIN = lib.ZSTD_HASHLOG_MIN | |
|
44 | HASHLOG_MAX = lib.ZSTD_HASHLOG_MAX | |
|
45 | HASHLOG3_MAX = lib.ZSTD_HASHLOG3_MAX | |
|
46 | SEARCHLOG_MIN = lib.ZSTD_SEARCHLOG_MIN | |
|
47 | SEARCHLOG_MAX = lib.ZSTD_SEARCHLOG_MAX | |
|
48 | SEARCHLENGTH_MIN = lib.ZSTD_SEARCHLENGTH_MIN | |
|
49 | SEARCHLENGTH_MAX = lib.ZSTD_SEARCHLENGTH_MAX | |
|
50 | TARGETLENGTH_MIN = lib.ZSTD_TARGETLENGTH_MIN | |
|
51 | TARGETLENGTH_MAX = lib.ZSTD_TARGETLENGTH_MAX | |
|
52 | ||
|
53 | STRATEGY_FAST = lib.ZSTD_fast | |
|
54 | STRATEGY_DFAST = lib.ZSTD_dfast | |
|
55 | STRATEGY_GREEDY = lib.ZSTD_greedy | |
|
56 | STRATEGY_LAZY = lib.ZSTD_lazy | |
|
57 | STRATEGY_LAZY2 = lib.ZSTD_lazy2 | |
|
58 | STRATEGY_BTLAZY2 = lib.ZSTD_btlazy2 | |
|
59 | STRATEGY_BTOPT = lib.ZSTD_btopt | |
|
60 | ||
|
61 | COMPRESSOBJ_FLUSH_FINISH = 0 | |
|
62 | COMPRESSOBJ_FLUSH_BLOCK = 1 | |
|
63 | ||
|
64 | ||
|
65 | class ZstdError(Exception): | |
|
66 | pass | |
|
21 | 67 | |
|
22 | 68 | |
|
23 |
class |
|
|
24 | def __init__(self, cstream, writer): | |
|
25 | self._cstream = cstream | |
|
69 | class CompressionParameters(object): | |
|
70 | def __init__(self, window_log, chain_log, hash_log, search_log, | |
|
71 | search_length, target_length, strategy): | |
|
72 | if window_log < WINDOWLOG_MIN or window_log > WINDOWLOG_MAX: | |
|
73 | raise ValueError('invalid window log value') | |
|
74 | ||
|
75 | if chain_log < CHAINLOG_MIN or chain_log > CHAINLOG_MAX: | |
|
76 | raise ValueError('invalid chain log value') | |
|
77 | ||
|
78 | if hash_log < HASHLOG_MIN or hash_log > HASHLOG_MAX: | |
|
79 | raise ValueError('invalid hash log value') | |
|
80 | ||
|
81 | if search_log < SEARCHLOG_MIN or search_log > SEARCHLOG_MAX: | |
|
82 | raise ValueError('invalid search log value') | |
|
83 | ||
|
84 | if search_length < SEARCHLENGTH_MIN or search_length > SEARCHLENGTH_MAX: | |
|
85 | raise ValueError('invalid search length value') | |
|
86 | ||
|
87 | if target_length < TARGETLENGTH_MIN or target_length > TARGETLENGTH_MAX: | |
|
88 | raise ValueError('invalid target length value') | |
|
89 | ||
|
90 | if strategy < STRATEGY_FAST or strategy > STRATEGY_BTOPT: | |
|
91 | raise ValueError('invalid strategy value') | |
|
92 | ||
|
93 | self.window_log = window_log | |
|
94 | self.chain_log = chain_log | |
|
95 | self.hash_log = hash_log | |
|
96 | self.search_log = search_log | |
|
97 | self.search_length = search_length | |
|
98 | self.target_length = target_length | |
|
99 | self.strategy = strategy | |
|
100 | ||
|
101 | def as_compression_parameters(self): | |
|
102 | p = ffi.new('ZSTD_compressionParameters *')[0] | |
|
103 | p.windowLog = self.window_log | |
|
104 | p.chainLog = self.chain_log | |
|
105 | p.hashLog = self.hash_log | |
|
106 | p.searchLog = self.search_log | |
|
107 | p.searchLength = self.search_length | |
|
108 | p.targetLength = self.target_length | |
|
109 | p.strategy = self.strategy | |
|
110 | ||
|
111 | return p | |
|
112 | ||
|
113 | def get_compression_parameters(level, source_size=0, dict_size=0): | |
|
114 | params = lib.ZSTD_getCParams(level, source_size, dict_size) | |
|
115 | return CompressionParameters(window_log=params.windowLog, | |
|
116 | chain_log=params.chainLog, | |
|
117 | hash_log=params.hashLog, | |
|
118 | search_log=params.searchLog, | |
|
119 | search_length=params.searchLength, | |
|
120 | target_length=params.targetLength, | |
|
121 | strategy=params.strategy) | |
|
122 | ||
|
123 | ||
|
124 | def estimate_compression_context_size(params): | |
|
125 | if not isinstance(params, CompressionParameters): | |
|
126 | raise ValueError('argument must be a CompressionParameters') | |
|
127 | ||
|
128 | cparams = params.as_compression_parameters() | |
|
129 | return lib.ZSTD_estimateCCtxSize(cparams) | |
|
130 | ||
|
131 | ||
|
132 | def estimate_decompression_context_size(): | |
|
133 | return lib.ZSTD_estimateDCtxSize() | |
|
134 | ||
|
135 | ||
|
136 | class ZstdCompressionWriter(object): | |
|
137 | def __init__(self, compressor, writer, source_size, write_size): | |
|
138 | self._compressor = compressor | |
|
26 | 139 | self._writer = writer |
|
140 | self._source_size = source_size | |
|
141 | self._write_size = write_size | |
|
142 | self._entered = False | |
|
27 | 143 | |
|
28 | 144 | def __enter__(self): |
|
145 | if self._entered: | |
|
146 | raise ZstdError('cannot __enter__ multiple times') | |
|
147 | ||
|
148 | self._cstream = self._compressor._get_cstream(self._source_size) | |
|
149 | self._entered = True | |
|
29 | 150 | return self |
|
30 | 151 | |
|
31 | 152 | def __exit__(self, exc_type, exc_value, exc_tb): |
|
153 | self._entered = False | |
|
154 | ||
|
32 | 155 | if not exc_type and not exc_value and not exc_tb: |
|
33 | 156 | out_buffer = ffi.new('ZSTD_outBuffer *') |
|
34 |
|
|
|
35 |
out_buffer. |
|
|
157 | dst_buffer = ffi.new('char[]', self._write_size) | |
|
158 | out_buffer.dst = dst_buffer | |
|
159 | out_buffer.size = self._write_size | |
|
36 | 160 | out_buffer.pos = 0 |
|
37 | 161 | |
|
38 | 162 | while True: |
|
39 | res = lib.ZSTD_endStream(self._cstream, out_buffer) | |
|
40 | if lib.ZSTD_isError(res): | |
|
41 |
raise |
|
|
163 | zresult = lib.ZSTD_endStream(self._cstream, out_buffer) | |
|
164 | if lib.ZSTD_isError(zresult): | |
|
165 | raise ZstdError('error ending compression stream: %s' % | |
|
166 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
42 | 167 | |
|
43 | 168 | if out_buffer.pos: |
|
44 | self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) | |
|
169 | self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)[:]) | |
|
45 | 170 | out_buffer.pos = 0 |
|
46 | 171 | |
|
47 | if res == 0: | |
|
172 | if zresult == 0: | |
|
48 | 173 | break |
|
49 | 174 | |
|
175 | self._cstream = None | |
|
176 | self._compressor = None | |
|
177 | ||
|
50 | 178 | return False |
|
51 | 179 | |
|
180 | def memory_size(self): | |
|
181 | if not self._entered: | |
|
182 | raise ZstdError('cannot determine size of an inactive compressor; ' | |
|
183 | 'call when a context manager is active') | |
|
184 | ||
|
185 | return lib.ZSTD_sizeof_CStream(self._cstream) | |
|
186 | ||
|
52 | 187 | def write(self, data): |
|
188 | if not self._entered: | |
|
189 | raise ZstdError('write() must be called from an active context ' | |
|
190 | 'manager') | |
|
191 | ||
|
192 | total_write = 0 | |
|
193 | ||
|
194 | data_buffer = ffi.from_buffer(data) | |
|
195 | ||
|
196 | in_buffer = ffi.new('ZSTD_inBuffer *') | |
|
197 | in_buffer.src = data_buffer | |
|
198 | in_buffer.size = len(data_buffer) | |
|
199 | in_buffer.pos = 0 | |
|
200 | ||
|
53 | 201 | out_buffer = ffi.new('ZSTD_outBuffer *') |
|
54 |
|
|
|
55 |
out_buffer. |
|
|
202 | dst_buffer = ffi.new('char[]', self._write_size) | |
|
203 | out_buffer.dst = dst_buffer | |
|
204 | out_buffer.size = self._write_size | |
|
205 | out_buffer.pos = 0 | |
|
206 | ||
|
207 | while in_buffer.pos < in_buffer.size: | |
|
208 | zresult = lib.ZSTD_compressStream(self._cstream, out_buffer, in_buffer) | |
|
209 | if lib.ZSTD_isError(zresult): | |
|
210 | raise ZstdError('zstd compress error: %s' % | |
|
211 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
212 | ||
|
213 | if out_buffer.pos: | |
|
214 | self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)[:]) | |
|
215 | total_write += out_buffer.pos | |
|
216 | out_buffer.pos = 0 | |
|
217 | ||
|
218 | return total_write | |
|
219 | ||
|
220 | def flush(self): | |
|
221 | if not self._entered: | |
|
222 | raise ZstdError('flush must be called from an active context manager') | |
|
223 | ||
|
224 | total_write = 0 | |
|
225 | ||
|
226 | out_buffer = ffi.new('ZSTD_outBuffer *') | |
|
227 | dst_buffer = ffi.new('char[]', self._write_size) | |
|
228 | out_buffer.dst = dst_buffer | |
|
229 | out_buffer.size = self._write_size | |
|
230 | out_buffer.pos = 0 | |
|
231 | ||
|
232 | while True: | |
|
233 | zresult = lib.ZSTD_flushStream(self._cstream, out_buffer) | |
|
234 | if lib.ZSTD_isError(zresult): | |
|
235 | raise ZstdError('zstd compress error: %s' % | |
|
236 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
237 | ||
|
238 | if not out_buffer.pos: | |
|
239 | break | |
|
240 | ||
|
241 | self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)[:]) | |
|
242 | total_write += out_buffer.pos | |
|
56 | 243 | out_buffer.pos = 0 |
|
57 | 244 | |
|
58 | # TODO can we reuse existing memory? | |
|
59 | in_buffer = ffi.new('ZSTD_inBuffer *') | |
|
60 | in_buffer.src = ffi.new('char[]', data) | |
|
61 | in_buffer.size = len(data) | |
|
62 | in_buffer.pos = 0 | |
|
63 | while in_buffer.pos < in_buffer.size: | |
|
64 | res = lib.ZSTD_compressStream(self._cstream, out_buffer, in_buffer) | |
|
65 | if lib.ZSTD_isError(res): | |
|
66 | raise Exception('zstd compress error: %s' % lib.ZSTD_getErrorName(res)) | |
|
245 | return total_write | |
|
246 | ||
|
247 | ||
|
248 | class ZstdCompressionObj(object): | |
|
249 | def compress(self, data): | |
|
250 | if self._finished: | |
|
251 | raise ZstdError('cannot call compress() after compressor finished') | |
|
252 | ||
|
253 | data_buffer = ffi.from_buffer(data) | |
|
254 | source = ffi.new('ZSTD_inBuffer *') | |
|
255 | source.src = data_buffer | |
|
256 | source.size = len(data_buffer) | |
|
257 | source.pos = 0 | |
|
258 | ||
|
259 | chunks = [] | |
|
260 | ||
|
261 | while source.pos < len(data): | |
|
262 | zresult = lib.ZSTD_compressStream(self._cstream, self._out, source) | |
|
263 | if lib.ZSTD_isError(zresult): | |
|
264 | raise ZstdError('zstd compress error: %s' % | |
|
265 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
266 | ||
|
267 | if self._out.pos: | |
|
268 | chunks.append(ffi.buffer(self._out.dst, self._out.pos)[:]) | |
|
269 | self._out.pos = 0 | |
|
270 | ||
|
271 | return b''.join(chunks) | |
|
272 | ||
|
273 | def flush(self, flush_mode=COMPRESSOBJ_FLUSH_FINISH): | |
|
274 | if flush_mode not in (COMPRESSOBJ_FLUSH_FINISH, COMPRESSOBJ_FLUSH_BLOCK): | |
|
275 | raise ValueError('flush mode not recognized') | |
|
276 | ||
|
277 | if self._finished: | |
|
278 | raise ZstdError('compressor object already finished') | |
|
279 | ||
|
280 | assert self._out.pos == 0 | |
|
67 | 281 | |
|
68 | if out_buffer.pos: | |
|
69 | self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) | |
|
70 | out_buffer.pos = 0 | |
|
282 | if flush_mode == COMPRESSOBJ_FLUSH_BLOCK: | |
|
283 | zresult = lib.ZSTD_flushStream(self._cstream, self._out) | |
|
284 | if lib.ZSTD_isError(zresult): | |
|
285 | raise ZstdError('zstd compress error: %s' % | |
|
286 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
287 | ||
|
288 | # Output buffer is guaranteed to hold full block. | |
|
289 | assert zresult == 0 | |
|
290 | ||
|
291 | if self._out.pos: | |
|
292 | result = ffi.buffer(self._out.dst, self._out.pos)[:] | |
|
293 | self._out.pos = 0 | |
|
294 | return result | |
|
295 | else: | |
|
296 | return b'' | |
|
297 | ||
|
298 | assert flush_mode == COMPRESSOBJ_FLUSH_FINISH | |
|
299 | self._finished = True | |
|
300 | ||
|
301 | chunks = [] | |
|
302 | ||
|
303 | while True: | |
|
304 | zresult = lib.ZSTD_endStream(self._cstream, self._out) | |
|
305 | if lib.ZSTD_isError(zresult): | |
|
306 | raise ZstdError('error ending compression stream: %s' % | |
|
307 | ffi.string(lib.ZSTD_getErroName(zresult))) | |
|
308 | ||
|
309 | if self._out.pos: | |
|
310 | chunks.append(ffi.buffer(self._out.dst, self._out.pos)[:]) | |
|
311 | self._out.pos = 0 | |
|
312 | ||
|
313 | if not zresult: | |
|
314 | break | |
|
315 | ||
|
316 | # GC compression stream immediately. | |
|
317 | self._cstream = None | |
|
318 | ||
|
319 | return b''.join(chunks) | |
|
71 | 320 | |
|
72 | 321 | |
|
73 | 322 | class ZstdCompressor(object): |
|
74 |
def __init__(self, level=3, dict_data=None, compression_params=None |
|
|
75 | if dict_data: | |
|
76 | raise Exception('dict_data not yet supported') | |
|
77 | if compression_params: | |
|
78 | raise Exception('compression_params not yet supported') | |
|
323 | def __init__(self, level=3, dict_data=None, compression_params=None, | |
|
324 | write_checksum=False, write_content_size=False, | |
|
325 | write_dict_id=True): | |
|
326 | if level < 1: | |
|
327 | raise ValueError('level must be greater than 0') | |
|
328 | elif level > lib.ZSTD_maxCLevel(): | |
|
329 | raise ValueError('level must be less than %d' % lib.ZSTD_maxCLevel()) | |
|
79 | 330 | |
|
80 | 331 | self._compression_level = level |
|
332 | self._dict_data = dict_data | |
|
333 | self._cparams = compression_params | |
|
334 | self._fparams = ffi.new('ZSTD_frameParameters *')[0] | |
|
335 | self._fparams.checksumFlag = write_checksum | |
|
336 | self._fparams.contentSizeFlag = write_content_size | |
|
337 | self._fparams.noDictIDFlag = not write_dict_id | |
|
81 | 338 | |
|
82 | def compress(self, data): | |
|
83 | # Just use the stream API for now. | |
|
84 | output = io.BytesIO() | |
|
85 | with self.write_to(output) as compressor: | |
|
86 | compressor.write(data) | |
|
87 | return output.getvalue() | |
|
339 | cctx = lib.ZSTD_createCCtx() | |
|
340 | if cctx == ffi.NULL: | |
|
341 | raise MemoryError() | |
|
342 | ||
|
343 | self._cctx = ffi.gc(cctx, lib.ZSTD_freeCCtx) | |
|
344 | ||
|
345 | def compress(self, data, allow_empty=False): | |
|
346 | if len(data) == 0 and self._fparams.contentSizeFlag and not allow_empty: | |
|
347 | raise ValueError('cannot write empty inputs when writing content sizes') | |
|
348 | ||
|
349 | # TODO use a CDict for performance. | |
|
350 | dict_data = ffi.NULL | |
|
351 | dict_size = 0 | |
|
352 | ||
|
353 | if self._dict_data: | |
|
354 | dict_data = self._dict_data.as_bytes() | |
|
355 | dict_size = len(self._dict_data) | |
|
356 | ||
|
357 | params = ffi.new('ZSTD_parameters *')[0] | |
|
358 | if self._cparams: | |
|
359 | params.cParams = self._cparams.as_compression_parameters() | |
|
360 | else: | |
|
361 | params.cParams = lib.ZSTD_getCParams(self._compression_level, len(data), | |
|
362 | dict_size) | |
|
363 | params.fParams = self._fparams | |
|
364 | ||
|
365 | dest_size = lib.ZSTD_compressBound(len(data)) | |
|
366 | out = new_nonzero('char[]', dest_size) | |
|
88 | 367 | |
|
89 | def copy_stream(self, ifh, ofh): | |
|
90 | cstream = self._get_cstream() | |
|
368 | zresult = lib.ZSTD_compress_advanced(self._cctx, | |
|
369 | ffi.addressof(out), dest_size, | |
|
370 | data, len(data), | |
|
371 | dict_data, dict_size, | |
|
372 | params) | |
|
373 | ||
|
374 | if lib.ZSTD_isError(zresult): | |
|
375 | raise ZstdError('cannot compress: %s' % | |
|
376 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
377 | ||
|
378 | return ffi.buffer(out, zresult)[:] | |
|
379 | ||
|
380 | def compressobj(self, size=0): | |
|
381 | cstream = self._get_cstream(size) | |
|
382 | cobj = ZstdCompressionObj() | |
|
383 | cobj._cstream = cstream | |
|
384 | cobj._out = ffi.new('ZSTD_outBuffer *') | |
|
385 | cobj._dst_buffer = ffi.new('char[]', COMPRESSION_RECOMMENDED_OUTPUT_SIZE) | |
|
386 | cobj._out.dst = cobj._dst_buffer | |
|
387 | cobj._out.size = COMPRESSION_RECOMMENDED_OUTPUT_SIZE | |
|
388 | cobj._out.pos = 0 | |
|
389 | cobj._compressor = self | |
|
390 | cobj._finished = False | |
|
391 | ||
|
392 | return cobj | |
|
393 | ||
|
394 | def copy_stream(self, ifh, ofh, size=0, | |
|
395 | read_size=COMPRESSION_RECOMMENDED_INPUT_SIZE, | |
|
396 | write_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE): | |
|
397 | ||
|
398 | if not hasattr(ifh, 'read'): | |
|
399 | raise ValueError('first argument must have a read() method') | |
|
400 | if not hasattr(ofh, 'write'): | |
|
401 | raise ValueError('second argument must have a write() method') | |
|
402 | ||
|
403 | cstream = self._get_cstream(size) | |
|
91 | 404 | |
|
92 | 405 | in_buffer = ffi.new('ZSTD_inBuffer *') |
|
93 | 406 | out_buffer = ffi.new('ZSTD_outBuffer *') |
|
94 | 407 | |
|
95 |
|
|
|
96 |
out_buffer. |
|
|
408 | dst_buffer = ffi.new('char[]', write_size) | |
|
409 | out_buffer.dst = dst_buffer | |
|
410 | out_buffer.size = write_size | |
|
97 | 411 | out_buffer.pos = 0 |
|
98 | 412 | |
|
99 | 413 | total_read, total_write = 0, 0 |
|
100 | 414 | |
|
101 | 415 | while True: |
|
102 |
data = ifh.read( |
|
|
416 | data = ifh.read(read_size) | |
|
103 | 417 | if not data: |
|
104 | 418 | break |
|
105 | 419 | |
|
106 |
|
|
|
107 | ||
|
108 |
in_buffer.src = |
|
|
109 | in_buffer.size = len(data) | |
|
420 | data_buffer = ffi.from_buffer(data) | |
|
421 | total_read += len(data_buffer) | |
|
422 | in_buffer.src = data_buffer | |
|
423 | in_buffer.size = len(data_buffer) | |
|
110 | 424 | in_buffer.pos = 0 |
|
111 | 425 | |
|
112 | 426 | while in_buffer.pos < in_buffer.size: |
|
113 | res = lib.ZSTD_compressStream(cstream, out_buffer, in_buffer) | |
|
114 | if lib.ZSTD_isError(res): | |
|
115 |
raise |
|
|
116 | lib.ZSTD_getErrorName(res)) | |
|
427 | zresult = lib.ZSTD_compressStream(cstream, out_buffer, in_buffer) | |
|
428 | if lib.ZSTD_isError(zresult): | |
|
429 | raise ZstdError('zstd compress error: %s' % | |
|
430 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
117 | 431 | |
|
118 | 432 | if out_buffer.pos: |
|
119 | 433 | ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) |
|
120 | total_write = out_buffer.pos | |
|
434 | total_write += out_buffer.pos | |
|
121 | 435 | out_buffer.pos = 0 |
|
122 | 436 | |
|
123 | 437 | # We've finished reading. Flush the compressor. |
|
124 | 438 | while True: |
|
125 | res = lib.ZSTD_endStream(cstream, out_buffer) | |
|
126 | if lib.ZSTD_isError(res): | |
|
127 |
raise |
|
|
128 | lib.ZSTD_getErrorName(res)) | |
|
439 | zresult = lib.ZSTD_endStream(cstream, out_buffer) | |
|
440 | if lib.ZSTD_isError(zresult): | |
|
441 | raise ZstdError('error ending compression stream: %s' % | |
|
442 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
129 | 443 | |
|
130 | 444 | if out_buffer.pos: |
|
131 | 445 | ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) |
|
132 | 446 | total_write += out_buffer.pos |
|
133 | 447 | out_buffer.pos = 0 |
|
134 | 448 | |
|
135 | if res == 0: | |
|
449 | if zresult == 0: | |
|
136 | 450 | break |
|
137 | 451 | |
|
138 | 452 | return total_read, total_write |
|
139 | 453 | |
|
140 |
def write_to(self, writer |
|
|
141 | return _ZstdCompressionWriter(self._get_cstream(), writer) | |
|
454 | def write_to(self, writer, size=0, | |
|
455 | write_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE): | |
|
456 | ||
|
457 | if not hasattr(writer, 'write'): | |
|
458 | raise ValueError('must pass an object with a write() method') | |
|
459 | ||
|
460 | return ZstdCompressionWriter(self, writer, size, write_size) | |
|
461 | ||
|
462 | def read_from(self, reader, size=0, | |
|
463 | read_size=COMPRESSION_RECOMMENDED_INPUT_SIZE, | |
|
464 | write_size=COMPRESSION_RECOMMENDED_OUTPUT_SIZE): | |
|
465 | if hasattr(reader, 'read'): | |
|
466 | have_read = True | |
|
467 | elif hasattr(reader, '__getitem__'): | |
|
468 | have_read = False | |
|
469 | buffer_offset = 0 | |
|
470 | size = len(reader) | |
|
471 | else: | |
|
472 | raise ValueError('must pass an object with a read() method or ' | |
|
473 | 'conforms to buffer protocol') | |
|
474 | ||
|
475 | cstream = self._get_cstream(size) | |
|
476 | ||
|
477 | in_buffer = ffi.new('ZSTD_inBuffer *') | |
|
478 | out_buffer = ffi.new('ZSTD_outBuffer *') | |
|
479 | ||
|
480 | in_buffer.src = ffi.NULL | |
|
481 | in_buffer.size = 0 | |
|
482 | in_buffer.pos = 0 | |
|
483 | ||
|
484 | dst_buffer = ffi.new('char[]', write_size) | |
|
485 | out_buffer.dst = dst_buffer | |
|
486 | out_buffer.size = write_size | |
|
487 | out_buffer.pos = 0 | |
|
488 | ||
|
489 | while True: | |
|
490 | # We should never have output data sitting around after a previous | |
|
491 | # iteration. | |
|
492 | assert out_buffer.pos == 0 | |
|
493 | ||
|
494 | # Collect input data. | |
|
495 | if have_read: | |
|
496 | read_result = reader.read(read_size) | |
|
497 | else: | |
|
498 | remaining = len(reader) - buffer_offset | |
|
499 | slice_size = min(remaining, read_size) | |
|
500 | read_result = reader[buffer_offset:buffer_offset + slice_size] | |
|
501 | buffer_offset += slice_size | |
|
142 | 502 | |
|
143 | def _get_cstream(self): | |
|
503 | # No new input data. Break out of the read loop. | |
|
504 | if not read_result: | |
|
505 | break | |
|
506 | ||
|
507 | # Feed all read data into the compressor and emit output until | |
|
508 | # exhausted. | |
|
509 | read_buffer = ffi.from_buffer(read_result) | |
|
510 | in_buffer.src = read_buffer | |
|
511 | in_buffer.size = len(read_buffer) | |
|
512 | in_buffer.pos = 0 | |
|
513 | ||
|
514 | while in_buffer.pos < in_buffer.size: | |
|
515 | zresult = lib.ZSTD_compressStream(cstream, out_buffer, in_buffer) | |
|
516 | if lib.ZSTD_isError(zresult): | |
|
517 | raise ZstdError('zstd compress error: %s' % | |
|
518 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
519 | ||
|
520 | if out_buffer.pos: | |
|
521 | data = ffi.buffer(out_buffer.dst, out_buffer.pos)[:] | |
|
522 | out_buffer.pos = 0 | |
|
523 | yield data | |
|
524 | ||
|
525 | assert out_buffer.pos == 0 | |
|
526 | ||
|
527 | # And repeat the loop to collect more data. | |
|
528 | continue | |
|
529 | ||
|
530 | # If we get here, input is exhausted. End the stream and emit what | |
|
531 | # remains. | |
|
532 | while True: | |
|
533 | assert out_buffer.pos == 0 | |
|
534 | zresult = lib.ZSTD_endStream(cstream, out_buffer) | |
|
535 | if lib.ZSTD_isError(zresult): | |
|
536 | raise ZstdError('error ending compression stream: %s' % | |
|
537 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
538 | ||
|
539 | if out_buffer.pos: | |
|
540 | data = ffi.buffer(out_buffer.dst, out_buffer.pos)[:] | |
|
541 | out_buffer.pos = 0 | |
|
542 | yield data | |
|
543 | ||
|
544 | if zresult == 0: | |
|
545 | break | |
|
546 | ||
|
547 | def _get_cstream(self, size): | |
|
144 | 548 | cstream = lib.ZSTD_createCStream() |
|
549 | if cstream == ffi.NULL: | |
|
550 | raise MemoryError() | |
|
551 | ||
|
145 | 552 | cstream = ffi.gc(cstream, lib.ZSTD_freeCStream) |
|
146 | 553 | |
|
147 | res = lib.ZSTD_initCStream(cstream, self._compression_level) | |
|
148 | if lib.ZSTD_isError(res): | |
|
554 | dict_data = ffi.NULL | |
|
555 | dict_size = 0 | |
|
556 | if self._dict_data: | |
|
557 | dict_data = self._dict_data.as_bytes() | |
|
558 | dict_size = len(self._dict_data) | |
|
559 | ||
|
560 | zparams = ffi.new('ZSTD_parameters *')[0] | |
|
561 | if self._cparams: | |
|
562 | zparams.cParams = self._cparams.as_compression_parameters() | |
|
563 | else: | |
|
564 | zparams.cParams = lib.ZSTD_getCParams(self._compression_level, | |
|
565 | size, dict_size) | |
|
566 | zparams.fParams = self._fparams | |
|
567 | ||
|
568 | zresult = lib.ZSTD_initCStream_advanced(cstream, dict_data, dict_size, | |
|
569 | zparams, size) | |
|
570 | if lib.ZSTD_isError(zresult): | |
|
149 | 571 | raise Exception('cannot init CStream: %s' % |
|
150 | lib.ZSTD_getErrorName(res)) | |
|
572 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
151 | 573 | |
|
152 | 574 | return cstream |
|
575 | ||
|
576 | ||
|
577 | class FrameParameters(object): | |
|
578 | def __init__(self, fparams): | |
|
579 | self.content_size = fparams.frameContentSize | |
|
580 | self.window_size = fparams.windowSize | |
|
581 | self.dict_id = fparams.dictID | |
|
582 | self.has_checksum = bool(fparams.checksumFlag) | |
|
583 | ||
|
584 | ||
|
585 | def get_frame_parameters(data): | |
|
586 | if not isinstance(data, bytes_type): | |
|
587 | raise TypeError('argument must be bytes') | |
|
588 | ||
|
589 | params = ffi.new('ZSTD_frameParams *') | |
|
590 | ||
|
591 | zresult = lib.ZSTD_getFrameParams(params, data, len(data)) | |
|
592 | if lib.ZSTD_isError(zresult): | |
|
593 | raise ZstdError('cannot get frame parameters: %s' % | |
|
594 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
595 | ||
|
596 | if zresult: | |
|
597 | raise ZstdError('not enough data for frame parameters; need %d bytes' % | |
|
598 | zresult) | |
|
599 | ||
|
600 | return FrameParameters(params[0]) | |
|
601 | ||
|
602 | ||
|
603 | class ZstdCompressionDict(object): | |
|
604 | def __init__(self, data): | |
|
605 | assert isinstance(data, bytes_type) | |
|
606 | self._data = data | |
|
607 | ||
|
608 | def __len__(self): | |
|
609 | return len(self._data) | |
|
610 | ||
|
611 | def dict_id(self): | |
|
612 | return int_type(lib.ZDICT_getDictID(self._data, len(self._data))) | |
|
613 | ||
|
614 | def as_bytes(self): | |
|
615 | return self._data | |
|
616 | ||
|
617 | ||
|
618 | def train_dictionary(dict_size, samples, parameters=None): | |
|
619 | if not isinstance(samples, list): | |
|
620 | raise TypeError('samples must be a list') | |
|
621 | ||
|
622 | total_size = sum(map(len, samples)) | |
|
623 | ||
|
624 | samples_buffer = new_nonzero('char[]', total_size) | |
|
625 | sample_sizes = new_nonzero('size_t[]', len(samples)) | |
|
626 | ||
|
627 | offset = 0 | |
|
628 | for i, sample in enumerate(samples): | |
|
629 | if not isinstance(sample, bytes_type): | |
|
630 | raise ValueError('samples must be bytes') | |
|
631 | ||
|
632 | l = len(sample) | |
|
633 | ffi.memmove(samples_buffer + offset, sample, l) | |
|
634 | offset += l | |
|
635 | sample_sizes[i] = l | |
|
636 | ||
|
637 | dict_data = new_nonzero('char[]', dict_size) | |
|
638 | ||
|
639 | zresult = lib.ZDICT_trainFromBuffer(ffi.addressof(dict_data), dict_size, | |
|
640 | ffi.addressof(samples_buffer), | |
|
641 | ffi.addressof(sample_sizes, 0), | |
|
642 | len(samples)) | |
|
643 | if lib.ZDICT_isError(zresult): | |
|
644 | raise ZstdError('Cannot train dict: %s' % | |
|
645 | ffi.string(lib.ZDICT_getErrorName(zresult))) | |
|
646 | ||
|
647 | return ZstdCompressionDict(ffi.buffer(dict_data, zresult)[:]) | |
|
648 | ||
|
649 | ||
|
650 | class ZstdDecompressionObj(object): | |
|
651 | def __init__(self, decompressor): | |
|
652 | self._decompressor = decompressor | |
|
653 | self._dstream = self._decompressor._get_dstream() | |
|
654 | self._finished = False | |
|
655 | ||
|
656 | def decompress(self, data): | |
|
657 | if self._finished: | |
|
658 | raise ZstdError('cannot use a decompressobj multiple times') | |
|
659 | ||
|
660 | in_buffer = ffi.new('ZSTD_inBuffer *') | |
|
661 | out_buffer = ffi.new('ZSTD_outBuffer *') | |
|
662 | ||
|
663 | data_buffer = ffi.from_buffer(data) | |
|
664 | in_buffer.src = data_buffer | |
|
665 | in_buffer.size = len(data_buffer) | |
|
666 | in_buffer.pos = 0 | |
|
667 | ||
|
668 | dst_buffer = ffi.new('char[]', DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE) | |
|
669 | out_buffer.dst = dst_buffer | |
|
670 | out_buffer.size = len(dst_buffer) | |
|
671 | out_buffer.pos = 0 | |
|
672 | ||
|
673 | chunks = [] | |
|
674 | ||
|
675 | while in_buffer.pos < in_buffer.size: | |
|
676 | zresult = lib.ZSTD_decompressStream(self._dstream, out_buffer, in_buffer) | |
|
677 | if lib.ZSTD_isError(zresult): | |
|
678 | raise ZstdError('zstd decompressor error: %s' % | |
|
679 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
680 | ||
|
681 | if zresult == 0: | |
|
682 | self._finished = True | |
|
683 | self._dstream = None | |
|
684 | self._decompressor = None | |
|
685 | ||
|
686 | if out_buffer.pos: | |
|
687 | chunks.append(ffi.buffer(out_buffer.dst, out_buffer.pos)[:]) | |
|
688 | out_buffer.pos = 0 | |
|
689 | ||
|
690 | return b''.join(chunks) | |
|
691 | ||
|
692 | ||
|
693 | class ZstdDecompressionWriter(object): | |
|
694 | def __init__(self, decompressor, writer, write_size): | |
|
695 | self._decompressor = decompressor | |
|
696 | self._writer = writer | |
|
697 | self._write_size = write_size | |
|
698 | self._dstream = None | |
|
699 | self._entered = False | |
|
700 | ||
|
701 | def __enter__(self): | |
|
702 | if self._entered: | |
|
703 | raise ZstdError('cannot __enter__ multiple times') | |
|
704 | ||
|
705 | self._dstream = self._decompressor._get_dstream() | |
|
706 | self._entered = True | |
|
707 | ||
|
708 | return self | |
|
709 | ||
|
710 | def __exit__(self, exc_type, exc_value, exc_tb): | |
|
711 | self._entered = False | |
|
712 | self._dstream = None | |
|
713 | ||
|
714 | def memory_size(self): | |
|
715 | if not self._dstream: | |
|
716 | raise ZstdError('cannot determine size of inactive decompressor ' | |
|
717 | 'call when context manager is active') | |
|
718 | ||
|
719 | return lib.ZSTD_sizeof_DStream(self._dstream) | |
|
720 | ||
|
721 | def write(self, data): | |
|
722 | if not self._entered: | |
|
723 | raise ZstdError('write must be called from an active context manager') | |
|
724 | ||
|
725 | total_write = 0 | |
|
726 | ||
|
727 | in_buffer = ffi.new('ZSTD_inBuffer *') | |
|
728 | out_buffer = ffi.new('ZSTD_outBuffer *') | |
|
729 | ||
|
730 | data_buffer = ffi.from_buffer(data) | |
|
731 | in_buffer.src = data_buffer | |
|
732 | in_buffer.size = len(data_buffer) | |
|
733 | in_buffer.pos = 0 | |
|
734 | ||
|
735 | dst_buffer = ffi.new('char[]', self._write_size) | |
|
736 | out_buffer.dst = dst_buffer | |
|
737 | out_buffer.size = len(dst_buffer) | |
|
738 | out_buffer.pos = 0 | |
|
739 | ||
|
740 | while in_buffer.pos < in_buffer.size: | |
|
741 | zresult = lib.ZSTD_decompressStream(self._dstream, out_buffer, in_buffer) | |
|
742 | if lib.ZSTD_isError(zresult): | |
|
743 | raise ZstdError('zstd decompress error: %s' % | |
|
744 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
745 | ||
|
746 | if out_buffer.pos: | |
|
747 | self._writer.write(ffi.buffer(out_buffer.dst, out_buffer.pos)[:]) | |
|
748 | total_write += out_buffer.pos | |
|
749 | out_buffer.pos = 0 | |
|
750 | ||
|
751 | return total_write | |
|
752 | ||
|
753 | ||
|
754 | class ZstdDecompressor(object): | |
|
755 | def __init__(self, dict_data=None): | |
|
756 | self._dict_data = dict_data | |
|
757 | ||
|
758 | dctx = lib.ZSTD_createDCtx() | |
|
759 | if dctx == ffi.NULL: | |
|
760 | raise MemoryError() | |
|
761 | ||
|
762 | self._refdctx = ffi.gc(dctx, lib.ZSTD_freeDCtx) | |
|
763 | ||
|
764 | @property | |
|
765 | def _ddict(self): | |
|
766 | if self._dict_data: | |
|
767 | dict_data = self._dict_data.as_bytes() | |
|
768 | dict_size = len(self._dict_data) | |
|
769 | ||
|
770 | ddict = lib.ZSTD_createDDict(dict_data, dict_size) | |
|
771 | if ddict == ffi.NULL: | |
|
772 | raise ZstdError('could not create decompression dict') | |
|
773 | else: | |
|
774 | ddict = None | |
|
775 | ||
|
776 | self.__dict__['_ddict'] = ddict | |
|
777 | return ddict | |
|
778 | ||
|
779 | def decompress(self, data, max_output_size=0): | |
|
780 | data_buffer = ffi.from_buffer(data) | |
|
781 | ||
|
782 | orig_dctx = new_nonzero('char[]', lib.ZSTD_sizeof_DCtx(self._refdctx)) | |
|
783 | dctx = ffi.cast('ZSTD_DCtx *', orig_dctx) | |
|
784 | lib.ZSTD_copyDCtx(dctx, self._refdctx) | |
|
785 | ||
|
786 | ddict = self._ddict | |
|
787 | ||
|
788 | output_size = lib.ZSTD_getDecompressedSize(data_buffer, len(data_buffer)) | |
|
789 | if output_size: | |
|
790 | result_buffer = ffi.new('char[]', output_size) | |
|
791 | result_size = output_size | |
|
792 | else: | |
|
793 | if not max_output_size: | |
|
794 | raise ZstdError('input data invalid or missing content size ' | |
|
795 | 'in frame header') | |
|
796 | ||
|
797 | result_buffer = ffi.new('char[]', max_output_size) | |
|
798 | result_size = max_output_size | |
|
799 | ||
|
800 | if ddict: | |
|
801 | zresult = lib.ZSTD_decompress_usingDDict(dctx, | |
|
802 | result_buffer, result_size, | |
|
803 | data_buffer, len(data_buffer), | |
|
804 | ddict) | |
|
805 | else: | |
|
806 | zresult = lib.ZSTD_decompressDCtx(dctx, | |
|
807 | result_buffer, result_size, | |
|
808 | data_buffer, len(data_buffer)) | |
|
809 | if lib.ZSTD_isError(zresult): | |
|
810 | raise ZstdError('decompression error: %s' % | |
|
811 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
812 | elif output_size and zresult != output_size: | |
|
813 | raise ZstdError('decompression error: decompressed %d bytes; expected %d' % | |
|
814 | (zresult, output_size)) | |
|
815 | ||
|
816 | return ffi.buffer(result_buffer, zresult)[:] | |
|
817 | ||
|
818 | def decompressobj(self): | |
|
819 | return ZstdDecompressionObj(self) | |
|
820 | ||
|
821 | def read_from(self, reader, read_size=DECOMPRESSION_RECOMMENDED_INPUT_SIZE, | |
|
822 | write_size=DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE, | |
|
823 | skip_bytes=0): | |
|
824 | if skip_bytes >= read_size: | |
|
825 | raise ValueError('skip_bytes must be smaller than read_size') | |
|
826 | ||
|
827 | if hasattr(reader, 'read'): | |
|
828 | have_read = True | |
|
829 | elif hasattr(reader, '__getitem__'): | |
|
830 | have_read = False | |
|
831 | buffer_offset = 0 | |
|
832 | size = len(reader) | |
|
833 | else: | |
|
834 | raise ValueError('must pass an object with a read() method or ' | |
|
835 | 'conforms to buffer protocol') | |
|
836 | ||
|
837 | if skip_bytes: | |
|
838 | if have_read: | |
|
839 | reader.read(skip_bytes) | |
|
840 | else: | |
|
841 | if skip_bytes > size: | |
|
842 | raise ValueError('skip_bytes larger than first input chunk') | |
|
843 | ||
|
844 | buffer_offset = skip_bytes | |
|
845 | ||
|
846 | dstream = self._get_dstream() | |
|
847 | ||
|
848 | in_buffer = ffi.new('ZSTD_inBuffer *') | |
|
849 | out_buffer = ffi.new('ZSTD_outBuffer *') | |
|
850 | ||
|
851 | dst_buffer = ffi.new('char[]', write_size) | |
|
852 | out_buffer.dst = dst_buffer | |
|
853 | out_buffer.size = len(dst_buffer) | |
|
854 | out_buffer.pos = 0 | |
|
855 | ||
|
856 | while True: | |
|
857 | assert out_buffer.pos == 0 | |
|
858 | ||
|
859 | if have_read: | |
|
860 | read_result = reader.read(read_size) | |
|
861 | else: | |
|
862 | remaining = size - buffer_offset | |
|
863 | slice_size = min(remaining, read_size) | |
|
864 | read_result = reader[buffer_offset:buffer_offset + slice_size] | |
|
865 | buffer_offset += slice_size | |
|
866 | ||
|
867 | # No new input. Break out of read loop. | |
|
868 | if not read_result: | |
|
869 | break | |
|
870 | ||
|
871 | # Feed all read data into decompressor and emit output until | |
|
872 | # exhausted. | |
|
873 | read_buffer = ffi.from_buffer(read_result) | |
|
874 | in_buffer.src = read_buffer | |
|
875 | in_buffer.size = len(read_buffer) | |
|
876 | in_buffer.pos = 0 | |
|
877 | ||
|
878 | while in_buffer.pos < in_buffer.size: | |
|
879 | assert out_buffer.pos == 0 | |
|
880 | ||
|
881 | zresult = lib.ZSTD_decompressStream(dstream, out_buffer, in_buffer) | |
|
882 | if lib.ZSTD_isError(zresult): | |
|
883 | raise ZstdError('zstd decompress error: %s' % | |
|
884 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
885 | ||
|
886 | if out_buffer.pos: | |
|
887 | data = ffi.buffer(out_buffer.dst, out_buffer.pos)[:] | |
|
888 | out_buffer.pos = 0 | |
|
889 | yield data | |
|
890 | ||
|
891 | if zresult == 0: | |
|
892 | return | |
|
893 | ||
|
894 | # Repeat loop to collect more input data. | |
|
895 | continue | |
|
896 | ||
|
897 | # If we get here, input is exhausted. | |
|
898 | ||
|
899 | def write_to(self, writer, write_size=DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE): | |
|
900 | if not hasattr(writer, 'write'): | |
|
901 | raise ValueError('must pass an object with a write() method') | |
|
902 | ||
|
903 | return ZstdDecompressionWriter(self, writer, write_size) | |
|
904 | ||
|
905 | def copy_stream(self, ifh, ofh, | |
|
906 | read_size=DECOMPRESSION_RECOMMENDED_INPUT_SIZE, | |
|
907 | write_size=DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE): | |
|
908 | if not hasattr(ifh, 'read'): | |
|
909 | raise ValueError('first argument must have a read() method') | |
|
910 | if not hasattr(ofh, 'write'): | |
|
911 | raise ValueError('second argument must have a write() method') | |
|
912 | ||
|
913 | dstream = self._get_dstream() | |
|
914 | ||
|
915 | in_buffer = ffi.new('ZSTD_inBuffer *') | |
|
916 | out_buffer = ffi.new('ZSTD_outBuffer *') | |
|
917 | ||
|
918 | dst_buffer = ffi.new('char[]', write_size) | |
|
919 | out_buffer.dst = dst_buffer | |
|
920 | out_buffer.size = write_size | |
|
921 | out_buffer.pos = 0 | |
|
922 | ||
|
923 | total_read, total_write = 0, 0 | |
|
924 | ||
|
925 | # Read all available input. | |
|
926 | while True: | |
|
927 | data = ifh.read(read_size) | |
|
928 | if not data: | |
|
929 | break | |
|
930 | ||
|
931 | data_buffer = ffi.from_buffer(data) | |
|
932 | total_read += len(data_buffer) | |
|
933 | in_buffer.src = data_buffer | |
|
934 | in_buffer.size = len(data_buffer) | |
|
935 | in_buffer.pos = 0 | |
|
936 | ||
|
937 | # Flush all read data to output. | |
|
938 | while in_buffer.pos < in_buffer.size: | |
|
939 | zresult = lib.ZSTD_decompressStream(dstream, out_buffer, in_buffer) | |
|
940 | if lib.ZSTD_isError(zresult): | |
|
941 | raise ZstdError('zstd decompressor error: %s' % | |
|
942 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
943 | ||
|
944 | if out_buffer.pos: | |
|
945 | ofh.write(ffi.buffer(out_buffer.dst, out_buffer.pos)) | |
|
946 | total_write += out_buffer.pos | |
|
947 | out_buffer.pos = 0 | |
|
948 | ||
|
949 | # Continue loop to keep reading. | |
|
950 | ||
|
951 | return total_read, total_write | |
|
952 | ||
|
953 | def decompress_content_dict_chain(self, frames): | |
|
954 | if not isinstance(frames, list): | |
|
955 | raise TypeError('argument must be a list') | |
|
956 | ||
|
957 | if not frames: | |
|
958 | raise ValueError('empty input chain') | |
|
959 | ||
|
960 | # First chunk should not be using a dictionary. We handle it specially. | |
|
961 | chunk = frames[0] | |
|
962 | if not isinstance(chunk, bytes_type): | |
|
963 | raise ValueError('chunk 0 must be bytes') | |
|
964 | ||
|
965 | # All chunks should be zstd frames and should have content size set. | |
|
966 | chunk_buffer = ffi.from_buffer(chunk) | |
|
967 | params = ffi.new('ZSTD_frameParams *') | |
|
968 | zresult = lib.ZSTD_getFrameParams(params, chunk_buffer, len(chunk_buffer)) | |
|
969 | if lib.ZSTD_isError(zresult): | |
|
970 | raise ValueError('chunk 0 is not a valid zstd frame') | |
|
971 | elif zresult: | |
|
972 | raise ValueError('chunk 0 is too small to contain a zstd frame') | |
|
973 | ||
|
974 | if not params.frameContentSize: | |
|
975 | raise ValueError('chunk 0 missing content size in frame') | |
|
976 | ||
|
977 | dctx = lib.ZSTD_createDCtx() | |
|
978 | if dctx == ffi.NULL: | |
|
979 | raise MemoryError() | |
|
980 | ||
|
981 | dctx = ffi.gc(dctx, lib.ZSTD_freeDCtx) | |
|
982 | ||
|
983 | last_buffer = ffi.new('char[]', params.frameContentSize) | |
|
984 | ||
|
985 | zresult = lib.ZSTD_decompressDCtx(dctx, last_buffer, len(last_buffer), | |
|
986 | chunk_buffer, len(chunk_buffer)) | |
|
987 | if lib.ZSTD_isError(zresult): | |
|
988 | raise ZstdError('could not decompress chunk 0: %s' % | |
|
989 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
990 | ||
|
991 | # Special case of chain length of 1 | |
|
992 | if len(frames) == 1: | |
|
993 | return ffi.buffer(last_buffer, len(last_buffer))[:] | |
|
994 | ||
|
995 | i = 1 | |
|
996 | while i < len(frames): | |
|
997 | chunk = frames[i] | |
|
998 | if not isinstance(chunk, bytes_type): | |
|
999 | raise ValueError('chunk %d must be bytes' % i) | |
|
1000 | ||
|
1001 | chunk_buffer = ffi.from_buffer(chunk) | |
|
1002 | zresult = lib.ZSTD_getFrameParams(params, chunk_buffer, len(chunk_buffer)) | |
|
1003 | if lib.ZSTD_isError(zresult): | |
|
1004 | raise ValueError('chunk %d is not a valid zstd frame' % i) | |
|
1005 | elif zresult: | |
|
1006 | raise ValueError('chunk %d is too small to contain a zstd frame' % i) | |
|
1007 | ||
|
1008 | if not params.frameContentSize: | |
|
1009 | raise ValueError('chunk %d missing content size in frame' % i) | |
|
1010 | ||
|
1011 | dest_buffer = ffi.new('char[]', params.frameContentSize) | |
|
1012 | ||
|
1013 | zresult = lib.ZSTD_decompress_usingDict(dctx, dest_buffer, len(dest_buffer), | |
|
1014 | chunk_buffer, len(chunk_buffer), | |
|
1015 | last_buffer, len(last_buffer)) | |
|
1016 | if lib.ZSTD_isError(zresult): | |
|
1017 | raise ZstdError('could not decompress chunk %d' % i) | |
|
1018 | ||
|
1019 | last_buffer = dest_buffer | |
|
1020 | i += 1 | |
|
1021 | ||
|
1022 | return ffi.buffer(last_buffer, len(last_buffer))[:] | |
|
1023 | ||
|
1024 | def _get_dstream(self): | |
|
1025 | dstream = lib.ZSTD_createDStream() | |
|
1026 | if dstream == ffi.NULL: | |
|
1027 | raise MemoryError() | |
|
1028 | ||
|
1029 | dstream = ffi.gc(dstream, lib.ZSTD_freeDStream) | |
|
1030 | ||
|
1031 | if self._dict_data: | |
|
1032 | zresult = lib.ZSTD_initDStream_usingDict(dstream, | |
|
1033 | self._dict_data.as_bytes(), | |
|
1034 | len(self._dict_data)) | |
|
1035 | else: | |
|
1036 | zresult = lib.ZSTD_initDStream(dstream) | |
|
1037 | ||
|
1038 | if lib.ZSTD_isError(zresult): | |
|
1039 | raise ZstdError('could not initialize DStream: %s' % | |
|
1040 | ffi.string(lib.ZSTD_getErrorName(zresult))) | |
|
1041 | ||
|
1042 | return dstream |
@@ -7,7 +7,6 b'' | |||
|
7 | 7 | contrib/python-zstandard/setup.py not using absolute_import |
|
8 | 8 | contrib/python-zstandard/setup_zstd.py not using absolute_import |
|
9 | 9 | contrib/python-zstandard/tests/common.py not using absolute_import |
|
10 | contrib/python-zstandard/tests/test_cffi.py not using absolute_import | |
|
11 | 10 | contrib/python-zstandard/tests/test_compressor.py not using absolute_import |
|
12 | 11 | contrib/python-zstandard/tests/test_data_structures.py not using absolute_import |
|
13 | 12 | contrib/python-zstandard/tests/test_decompressor.py not using absolute_import |
|
1 | NO CONTENT: file was removed |
General Comments 0
You need to be logged in to leave comments.
Login now