deltas: set estimated compression upper bound to "3x" instead of "10x"

In practice, we very rarely observe compression better than "3x" on manifest
deltas. Having a more aggressive estimate significantly helps our pathological
use case on a private repository.

Here is a comparison of timings using different upper bounds:

Estimated compression | ø     | ×10  | ×5   | ×3   |
timing                | 14.11 | 2.61 | 1.96 | 1.53 |

We also tested the impact of this series on an array of public repositories.
It showed no impact on either size or timing. The full data set is below for
those interested.

Size
----

Regarding size, no significant impact has been noticed on either public or
private repositories. Here are the numbers we gathered (the last row of each
table is the private repository):

zlib/upperbound | no            | 10x           | 5x            | 3x
mercurial       |     5 875 730 |     5 875 730 |     5 875 730 |     5 875 730
pypy            |    27 782 913 |    27 782 913 |    27 782 913 |    27 782 913
netbeans        |   159 161 207 |   159 161 207 |   159 161 207 |   159 959 879 (+0.5%)
mozilla-central |   323 841 642 |   323 841 642 |   323 841 642 |   319 867 519 (-2.5%)
mozilla-try     |   746 649 123 |   746 649 123 |   746 649 123 |   741 155 568 (-0.7%)
private-repo    | 1 485 287 294 | 1 485 287 294 | 1 485 287 294 | 1 409 248 382 (-5.1%)

zstd/upperbound | no            | 10x           | 5x            | 3x
mercurial       |     5 895 206 |     5 895 206 |     5 895 206 |     5 895 206
pypy            |    28 689 230 |    28 689 230 |    28 689 230 |    28 689 230
netbeans        |   157 636 387 |   157 636 387 |   157 636 387 |   159 692 678 (+1.3%)
mozilla-central |   317 650 281 |   317 650 281 |   317 650 281 |   319 613 603 (+0.6%)
mozilla-try     |   737 555 275 |   737 555 275 |   737 555 275 |   738 079 473 (+0.1%)
private-repo    | 1 352 362 982 | 1 352 362 982 | 1 346 961 880 | 1 361 327 384 (+0.7%)

Speed
-----

Timing gathered using `hg perfrevlogwrite -m`. Values are in seconds.
mercurial

zlib   | no        | 10x       | 5x        | 3x        |
total  | 65.551783 | 65.388887 | 65.260658 | 65.321199 |
max    |  0.034544 |  0.034571 |  0.034659 |  0.034521 |
99.99% |  0.034544 |  0.034571 |  0.034659 |  0.034521 |

zstd   | no        | 10x       | 5x        | 3x        |
total  | 49.118449 | 49.054062 | 48.753588 | 48.740230 |
max    |  0.009338 |  0.009239 |  0.009202 |  0.009178 |
99.99% |  0.007618 |  0.007639 |  0.007626 |  0.007621 |

pypy

zlib   | no         | 10x        | 5x         | 3x         |
total  | 560.865984 | 558.983817 | 559.083815 | 559.349152 |
max    |   0.219614 |   0.215922 |   0.218112 |   0.218107 |
99.99% |   0.219614 |   0.215922 |   0.218112 |   0.218107 |

zstd   | no         | 10x        | 5x         | 3x         |
total  | 349.393280 | 347.395819 | 347.185407 | 345.643985 |
max    |   0.084143 |   0.083536 |   0.081834 |   0.082178 |
99.99% |   0.039445 |   0.039639 |   0.039612 |   0.039175 |

netbeans

zlib   | no           | 10x          | 5x           | 3x           |
total  | 33103.327727 | 33314.932260 | 33211.745233 | 33345.891778 |
max    |     2.666852 |     2.672059 |     2.662453 |     2.662936 |
99.99% |     2.058772 |     2.070429 |     2.069569 |     2.064653 |

zstd   | no           | 10x          | 5x           | 3x           |
total  | 20112.102708 | 20095.879719 | 20083.390300 | 20123.221859 |
max    |     2.063482 |     2.062851 |     2.065229 |     2.060147 |
99.99% |     1.146647 |     1.143794 |     1.142933 |     1.146529 |

mozilla

zlib   | no           | 10x          | 5x           | 3x           |
total  | 41374.102138 | 41418.816773 | 41381.956370 | 41334.280732 |
max    |     3.383474 |     3.387400 |     3.405711 |     3.387316 |
99.99% |     1.006755 |     1.005954 |     1.007700 |     1.007373 |

zstd   | no           | 10x          | 5x           | 3x           |
total  | 24689.691520 | 24643.939662 | 24664.630027 | 24664.512714 |
max    |     1.460822 |     1.449640 |     1.439747 |     1.465304 |
99.99% |     0.527111 |     0.527377 |     0.527807 |     0.527226 |
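The commit message does not show where the estimated upper bound is applied. The following is a minimal, hypothetical Rust model that shows why a smaller bound prunes delta-base candidates more aggressively. It rests on two assumptions of this sketch: (1) deltaing a text against a smaller base must encode at least the size difference, and compression shrinks those bytes by at most the upper bound; (2) a candidate base is skipped when that lower bound exceeds a size limit that halves with each level of snapshot depth. The names `lowest_realistic_delta_len` and `worth_trying` are illustrative. This is not Mercurial's implementation.

```rust
// Hypothetical model, not Mercurial's code. All sizes are in bytes.

/// Smallest delta we can realistically expect when deltaing a text of
/// `text_len` bytes against a base of `base_len` bytes: the bytes the base
/// lacks must be encoded, and compression shrinks them at most `max_comp` times.
fn lowest_realistic_delta_len(text_len: u64, base_len: u64, max_comp: u64) -> u64 {
    text_len.saturating_sub(base_len) / max_comp
}

/// Returns `true` if a candidate base is worth the cost of computing a delta.
/// Assumption of this sketch: the accepted delta size halves per snapshot level.
fn worth_trying(text_len: u64, base_len: u64, snapshot_depth: u32, max_comp: u64) -> bool {
    let limit = text_len >> snapshot_depth;
    lowest_realistic_delta_len(text_len, base_len, max_comp) <= limit
}

fn main() {
    // Illustrative sizes only, not taken from the measurements above.
    let (text_len, base_len, depth) = (1_000_000u64, 100_000u64, 2u32);
    for &max_comp in &[10u64, 5, 3] {
        println!(
            "x{}: lower bound {} bytes, limit {} bytes, try base: {}",
            max_comp,
            lowest_realistic_delta_len(text_len, base_len, max_comp),
            text_len >> depth,
            worth_trying(text_len, base_len, depth, max_comp)
        );
    }
}
```

With these made-up sizes, the 10x and 5x estimates keep the candidate: their lower bounds of 90,000 and 180,000 bytes are under the 250,000-byte limit. The 3x estimate yields 300,000 bytes and skips the candidate, so no delta is computed against that base. A smaller upper bound therefore discards more candidates before any expensive delta computation, which fits the faster timings reported above. The trade-off is that a base that would actually compress better than the estimate may be skipped.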

File last commit:

r35650:fa9747e7 default
r42669:4a3abb33 default
main.rs
233 lines | 7.6 KiB | application/rls-services+xml | RustLexer
// main.rs -- Main routines for `hg` program
//
// Copyright 2017 Gregory Szorc <gregory.szorc@gmail.com>
//
// This software may be used and distributed according to the terms of the
// GNU General Public License version 2 or any later version.
extern crate libc;
extern crate cpython;
extern crate python27_sys;
use cpython::{NoArgs, ObjectProtocol, PyModule, PyResult, Python};
use libc::{c_char, c_int};
use std::env;
use std::path::PathBuf;
use std::ffi::{CString, OsStr};
#[cfg(target_family = "unix")]
use std::os::unix::ffi::{OsStrExt, OsStringExt};

#[derive(Debug)]
struct Environment {
    _exe: PathBuf,
    python_exe: PathBuf,
    python_home: PathBuf,
    mercurial_modules: PathBuf,
}

/// Run Mercurial locally from a source distribution or checkout.
///
/// hg is <srcdir>/rust/target/<target>/hg
/// Python interpreter is detected by build script.
/// Python home is relative to Python interpreter.
/// Mercurial files are relative to hg binary, which is relative to source root.
#[cfg(feature = "localdev")]
fn get_environment() -> Environment {
    let exe = env::current_exe().unwrap();

    let mut mercurial_modules = exe.clone();
    mercurial_modules.pop(); // /rust/target/<target>
    mercurial_modules.pop(); // /rust/target
    mercurial_modules.pop(); // /rust
    mercurial_modules.pop(); // /

    let python_exe: &'static str = env!("PYTHON_INTERPRETER");
    let python_exe = PathBuf::from(python_exe);

    let mut python_home = python_exe.clone();
    python_home.pop();

    // On Windows, python2.7.exe exists at the root directory of the Python
    // install. Everywhere else, the Python install root is one level up.
    if !python_exe.ends_with("python2.7.exe") {
        python_home.pop();
    }

    Environment {
        _exe: exe.clone(),
        python_exe: python_exe,
        python_home: python_home,
        mercurial_modules: mercurial_modules.to_path_buf(),
    }
}

// On UNIX, platform strings are just bytes and should not contain NUL.
#[cfg(target_family = "unix")]
fn cstring_from_os<T: AsRef<OsStr>>(s: T) -> CString {
    CString::new(s.as_ref().as_bytes()).unwrap()
}

// TODO convert to ANSI characters?
#[cfg(target_family = "windows")]
fn cstring_from_os<T: AsRef<OsStr>>(s: T) -> CString {
    CString::new(s.as_ref().to_str().unwrap()).unwrap()
}

// On UNIX, argv starts as an array of char*. So it is easy to convert
// to C strings.
#[cfg(target_family = "unix")]
fn args_to_cstrings() -> Vec<CString> {
    env::args_os()
        .map(|a| CString::new(a.into_vec()).unwrap())
        .collect()
}

// TODO Windows support is incomplete. We should either use env::args_os()
// (or call into GetCommandLineW() + CommandLineToArgvW()), convert these to
// PyUnicode instances, and pass these into Python/Mercurial outside the
// standard PySys_SetArgvEx() mechanism. This will allow us to preserve the
// raw bytes (since PySys_SetArgvEx() is based on char* and can drop wchar
// data).
//
// For now, we use env::args(). This will choke on invalid UTF-8 arguments.
// But it is better than nothing.
#[cfg(target_family = "windows")]
fn args_to_cstrings() -> Vec<CString> {
    env::args().map(|a| CString::new(a).unwrap()).collect()
}
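
fn set_python_home(env: &Environment) {
    // Intentionally leaked: Python keeps this pointer for the lifetime of
    // the interpreter, so the buffer must never be freed.
    let raw = cstring_from_os(&env.python_home).into_raw();
    unsafe {
        python27_sys::Py_SetPythonHome(raw);
    }
}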

fn update_encoding(_py: Python, _sys_mod: &PyModule) {
    // Call sys.setdefaultencoding("undefined") if HGUNICODEPEDANTRY is set.
    let pedantry = env::var("HGUNICODEPEDANTRY").is_ok();

    if pedantry {
        // site.py removes the sys.setdefaultencoding attribute. So we need
        // to reload the module to get a handle on it. This is a lesser
        // used feature and we'll support this later.
        // TODO support this
        panic!("HGUNICODEPEDANTRY is not yet supported");
    }
}

fn update_modules_path(env: &Environment, py: Python, sys_mod: &PyModule) {
    let sys_path = sys_mod.get(py, "path").unwrap();
    sys_path
        .call_method(py, "insert", (0, env.mercurial_modules.to_str()), None)
        .expect("failed to update sys.path to location of Mercurial modules");
}

fn run() -> Result<(), i32> {
    let env = get_environment();

    //println!("{:?}", env);

    // Tell Python where it is installed.
    set_python_home(&env);

    // Set program name. The backing memory needs to live for the duration of the
    // interpreter.
    //
    // TODO consider storing this in a static or associating with lifetime of
    // the Python interpreter.
    //
    // Yes, we use the path to the Python interpreter not argv[0] here. The
    // reason is that Python uses the given path to find the location of
    // Python files. Apparently we could define our own ``Py_GetPath()``
    // implementation. But this may require statically linking Python, which is
    // not desirable.
    //
    // into_raw() takes the buffer out of the CString so it is never freed and
    // the pointer stays valid. Calling as_ptr() on the temporary CString would
    // leave a dangling pointer as soon as this statement ends.
    let program_name = cstring_from_os(&env.python_exe).into_raw();
    unsafe {
        python27_sys::Py_SetProgramName(program_name);
    }

    unsafe {
        python27_sys::Py_Initialize();
    }

    // https://docs.python.org/2/c-api/init.html#c.PySys_SetArgvEx has important
    // usage information about PySys_SetArgvEx:
    //
    // * It says the first argument should be the script that is being executed.
    //   If not a script, it can be empty. We are definitely not a script.
    //   However, parts of Mercurial do look at sys.argv[0]. So we need to set
    //   something here.
    //
    // * When embedding Python, we should use ``PySys_SetArgvEx()`` and set
    //   ``updatepath=0`` for security reasons. Essentially, Python's default
    //   logic will treat an empty argv[0] in a manner that could result in
    //   sys.path picking up directories it shouldn't and this could lead to
    //   loading untrusted modules.

    // env::args() will panic if it sees a non-UTF-8 byte sequence. And
    // Mercurial supports arbitrary encodings of input data. So we need to
    // use OS-specific mechanisms to get the raw bytes without UTF-8
    // interference.
    let args = args_to_cstrings();
    let argv: Vec<*const c_char> = args.iter().map(|a| a.as_ptr()).collect();

    unsafe {
        python27_sys::PySys_SetArgvEx(
            args.len() as c_int,
            argv.as_ptr() as *mut *mut c_char,
            0,
        );
    }

    let result;
    {
        // These need to be dropped before we call Py_Finalize(). Hence the
        // block.
        let gil = Python::acquire_gil();
        let py = gil.python();

        // Mercurial code could call sys.exit(), which will call exit()
        // itself. So this may not return.
        // TODO this may cause issues on Windows due to the CRT mismatch.
        // Investigate if we can intercept sys.exit() or SystemExit() to
        // ensure we handle process exit.
        result = match run_py(&env, py) {
            // Print unhandled exceptions and exit code 255, as this is what
            // `python` does.
            Err(err) => {
                err.print(py);
                Err(255)
            }
            Ok(()) => Ok(()),
        };
    }

    unsafe {
        python27_sys::Py_Finalize();
    }

    result
}

fn run_py(env: &Environment, py: Python) -> PyResult<()> {
    let sys_mod = py.import("sys").unwrap();

    update_encoding(py, &sys_mod);
    update_modules_path(env, py, &sys_mod);

    // TODO consider a better error message on failure to import.
    let demand_mod = py.import("hgdemandimport")?;
    demand_mod.call(py, "enable", NoArgs, None)?;

    let dispatch_mod = py.import("mercurial.dispatch")?;
    dispatch_mod.call(py, "run", NoArgs, None)?;

    Ok(())
}

fn main() {
    let exit_code = match run() {
        Err(err) => err,
        Ok(()) => 0,
    };

    std::process::exit(exit_code);
}
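`main.rs` hands Rust-owned strings to the Python C API in two ways. `run()` builds an `argv` array of borrowed pointers into a `Vec<CString>`, and the program name and Python home are passed as buffers released with `into_raw()` because Python keeps those pointers. The standalone, Unix-only sketch below reproduces both patterns without Python. It also shows that non-UTF-8 argument bytes survive the conversion, which is why the Unix path uses `env::args_os()` rather than `env::args()`. The helper `os_args_to_cstrings` is hypothetical and not part of `main.rs`.

```rust
// Unix-only sketch; `os_args_to_cstrings` is a hypothetical helper.
use std::ffi::{CStr, CString, OsString};
use std::os::raw::c_char;
use std::os::unix::ffi::OsStringExt;

/// Converts OS arguments to C strings byte-for-byte, with no UTF-8 check.
/// Panics if an argument contains an interior NUL byte.
fn os_args_to_cstrings<I: IntoIterator<Item = OsString>>(args: I) -> Vec<CString> {
    args.into_iter()
        .map(|a| CString::new(a.into_vec()).expect("argument contains a NUL byte"))
        .collect()
}

fn main() {
    // 0xff is not valid UTF-8; env::args() would panic on such an argument.
    let args = os_args_to_cstrings(vec![
        OsString::from("hg"),
        OsString::from_vec(vec![b'x', 0xff]),
    ]);

    // `argv` only borrows into `args`, so `args` must outlive every use of it.
    let argv: Vec<*const c_char> = args.iter().map(|a| a.as_ptr()).collect();
    let second = unsafe { CStr::from_ptr(argv[1]) };
    println!("{:?}", second.to_bytes()); // [120, 255]

    // For a pointer that an API keeps indefinitely, into_raw() gives up
    // ownership so Rust never frees the buffer...
    let raw: *mut c_char = CString::new("hg-example").unwrap().into_raw();
    // ...and from_raw() is the correct way to reclaim it if that is ever safe.
    let reclaimed = unsafe { CString::from_raw(raw) };
    println!("{}", reclaimed.to_str().unwrap());
}
```

Keeping `args` alive while `argv` is in use mirrors `run()`, where both vectors stay in scope across the `PySys_SetArgvEx()` call.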