##// END OF EJS Templates
dirstate: use visitchildrenset in traverse...
dirstate: use visitchildrenset in traverse This speeds up `hg status` a fair amount when there is a very large directory and narrow is in use. Timing numbers according to command: hyperfine --warmup 1 'hg status' HGRCPATH points to a file with the following contents: [extensions] narrow = mozilla-unified (called m-u below) was at revision #468856. regular hash: eb39298e432d treemanifests hash: 0553b7f29eaf large-dir-repo (called l-d-r below) was generated with the following script: #!/bin/bash hg init large-dir-repo mkdir -p large-dir-repo/third_party/rust/log touch large-dir-repo/third_party/rust/log/foo.txt for i in $(seq 1 30000); do d=$(mktemp -d large-dir-repo/third_party/XXXXXXXXX) touch $d/file.txt done hg -R large-dir-repo ci -Am 'rev0' --user test --date '0 0' for repos that use narrow, the narrowspec was this: [includes] rootfilesin:third_party/rust/log [excludes] This narrowspec was chosen due to the size of the third_party/rust directory; this directory was *not* modified in revision #468856 in mozilla-unified. Importantly, when using narrow, these repos had everything checked out (in the case of large-dir-repo, that means all 30,001 directories), *before* adding the narrowspec. This is to simulate the behavior when using a virtual filesystem that shows everything for the user even if they haven't added it to the narrowspec yet. This is not a supported configuration, and `hg update` will not really do the "correct" thing, but non-mutating commands should behave correctly. There are two repos below that do not follow the setup above, 'citc1' and 'citc2', which are using a virtual filesystem and can not be reproduced upstream; these numbers are here mostly to indicate that these performance improvements are not hypothetical, and show the benefits we're hoping to achieve on our real workloads. 'citc1' is closest to large-dir-repo with one of our pathological cases, 'citc2' is an arbitrary repo and closer to "average". I'm not claiming anything less than a 5% speed win as improvements due to this change; these are probably eiter measurement artifacts or constant time improvements. The numbers that aren't changing are shown primarily to prove that this doesn't make anything worse in any case I plan on testing during this series. 'before' is hg from commit c83ad576. 'N' indicates narrow in use, 'T' indicates treemanifest in use. hg status: repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before ------+---+---+------------------------+-----------------------+------------ m-u | | | 2.284 s +- 0.022 s | 2.274 s +- 0.021 s | 99.6% m-u | | x | 2.289 s +- 0.008 s | 2.284 s +- 0.028 s | 99.8% m-u | x | | 430.8 ms +- 3.1 ms | 424.5 ms +- 3.2 ms | 98.5% m-u | x | x | 429.8 ms +- 2.5 ms | 425.8 ms +- 3.7 ms | 99.1% l-d-r | | | 681.3 ms +- 5.5 ms | 689.6 ms +- 8.0 ms | 101.2% l-d-r | | x | 666.8 ms +- 21.8 ms | 672.5 ms +- 14.9 ms | 100.9% l-d-r | x | | 282.6 ms +- 1.8 ms | 203.0 ms +- 1.2 ms | 71.8% <-- l-d-r | x | x | 275.2 ms +- 3.9 ms | 199.3 ms +- 3.5 ms | 72.4% <-- citc1 | x | x | 1.023 s +- 0.011 s | 398.6 ms +- 9.2 ms | 39.0% <-- citc2 | x | x | 297.9 ms +- 4.4 ms | 289.6 ms +- 4.2 ms | 97.2% hg status --change .: repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before ------+---+---+------------------------+-----------------------+------------ m-u | | | 478.2 ms +- 2.0 ms | 476.9 ms +- 3.7 ms | 99.7% m-u | | x | 169.5 ms +- 2.7 ms | 169.5 ms +- 2.5 ms | 100.0% m-u | x | | 477.0 ms +- 2.4 ms | 476.1 ms +- 1.4 ms | 99.8% m-u | x | x | 124.7 ms +- 1.9 ms | 124.2 ms +- 3.3 ms | 99.6% l-d-r | | | 97.4 ms +- 1.2 ms | 96.5 ms +- 1.2 ms | 99.1% l-d-r | | x | 4.778 s +- 0.018 s | 4.774 s +- 0.011 s | 99.9% l-d-r | x | | 99.9 ms +- 1.1 ms | 98.8 ms +- 1.3 ms | 98.9% l-d-r | x | x | 848.7 ms +- 7.1 ms | 849.4 ms +- 6.5 ms | 100.1% citc1 | x | x | 4.250 s +- 0.051 s | 4.283 s +- 0.042 s | 100.8% citc2 | x | x | 341.5 ms +- 4.7 ms | 341.5 ms +- 4.1 ms | 100.0% hg update $rev^; hg update $rev: repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before ------+---+---+------------------------+-----------------------+------------ m-u | | | 4.357 s +- 0.032 s | 4.312 s +- 0.093 s | 99.0% m-u | | x | 3.599 s +- 0.061 s | 3.592 s +- 0.071 s | 99.8% m-u | x | | 1.815 s +- 0.012 s | 1.816 s +- 0.013 s | 100.1% m-u | x | x | 1.110 s +- 0.009 s | 1.106 s +- 0.005 s | 99.6% l-d-r | | | 527.1 ms +- 7.8 ms | 523.3 ms +- 6.5 ms | 99.3% l-d-r | | x | 8.835 s +- 0.067 s | 8.825 s +- 0.064 s | 99.9% l-d-r | x | | 313.0 ms +- 2.2 ms | 312.1 ms +- 1.2 ms | 99.7% l-d-r | x | x | 1.780 s +- 0.011 s | 1.799 s +- 0.013 s | 101.1% citc1 | x | x | 6.825 s +- 0.262 s | 6.707 s +- 0.353 s | 98.3% citc2 | x | x | 776.4 ms +- 4.5 ms | 781.3 ms +- 6.3 ms | 100.6% hg diff: repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before ------+---+---+------------------------+-----------------------+------------ m-u | | | 1.519 s +- 0.015 s | 1.525 s +- 0.017 s | 100.4% m-u | | x | 1.512 s +- 0.010 s | 1.517 s +- 0.027 s | 100.3% m-u | x | | 420.0 ms +- 3.2 ms | 417.1 ms +- 1.9 ms | 99.3% m-u | x | x | 415.0 ms +- 3.8 ms | 415.7 ms +- 2.7 ms | 100.2% l-d-r | | | 220.8 ms +- 4.0 ms | 220.8 ms +- 3.7 ms | 100.0% l-d-r | | x | 216.6 ms +- 7.5 ms | 211.4 ms +- 2.1 ms | 97.6% l-d-r | x | | 111.9 ms +- 1.8 ms | 112.0 ms +- 1.5 ms | 100.1% l-d-r | x | x | 111.4 ms +- 1.4 ms | 110.2 ms +- 1.0 ms | 98.9% citc1 | x | x | 268.7 ms +- 2.3 ms | 269.6 ms +- 2.8 ms | 100.3% citc2 | x | x | 273.5 ms +- 5.5 ms | 273.9 ms +- 3.7 ms | 100.1% hg diff -c .: repo | N | T | before (mean +- stdev) | after (mean +- stdev) | % of before ------+---+---+--------------------------+-----------------------+---------- m-u | | | 497.1 ms +- 1.4 ms | 500.1 ms +- 2.4 ms | 100.6% m-u | | x | 195.3 ms +- 13.2 ms | 191.6 ms +- 3.0 ms | 98.1% m-u | x | | 476.8 ms +- 1.9 ms | 476.7 ms +- 2.3 ms | 100.0% m-u | x | x | 122.8 ms +- 2.1 ms | 122.9 ms +- 2.0 ms | 100.1% l-d-r | | | 99.3 ms +- 2.3 ms | 98.8 ms +- 1.7 ms | 99.5% l-d-r | | x | 4.875 s +- 0.041 s | 4.847 s +- 0.038 s | 99.4% l-d-r | x | | 98.5 ms +- 1.2 ms | 98.9 ms +- 1.3 ms | 100.4% l-d-r | x | x | 864.6 ms +- 7.4 ms | 855.4 ms +- 6.6 ms | 98.9% citc1 | x | x | 4.505 s +- 0.060 s | 4.466 s +- 0.036 s | 99.1% citc2 | x | x | 368.0 ms +- 4.0 ms | 365.5 ms +- 6.3 ms | 99.3% Differential Revision: https://phab.mercurial-scm.org/D4131

File last commit:

r35650:fa9747e7 default
r38991:a3cabe94 default
Show More
main.rs
233 lines | 7.6 KiB | application/rls-services+xml | RustLexer
// main.rs -- Main routines for `hg` program
//
// Copyright 2017 Gregory Szorc <gregory.szorc@gmail.com>
//
// This software may be used and distributed according to the terms of the
// GNU General Public License version 2 or any later version.
extern crate libc;
extern crate cpython;
extern crate python27_sys;
use cpython::{NoArgs, ObjectProtocol, PyModule, PyResult, Python};
use libc::{c_char, c_int};
use std::env;
use std::path::PathBuf;
use std::ffi::{CString, OsStr};
#[cfg(target_family = "unix")]
use std::os::unix::ffi::{OsStrExt, OsStringExt};
#[derive(Debug)]
struct Environment {
_exe: PathBuf,
python_exe: PathBuf,
python_home: PathBuf,
mercurial_modules: PathBuf,
}
/// Run Mercurial locally from a source distribution or checkout.
///
/// hg is <srcdir>/rust/target/<target>/hg
/// Python interpreter is detected by build script.
/// Python home is relative to Python interpreter.
/// Mercurial files are relative to hg binary, which is relative to source root.
#[cfg(feature = "localdev")]
fn get_environment() -> Environment {
let exe = env::current_exe().unwrap();
let mut mercurial_modules = exe.clone();
mercurial_modules.pop(); // /rust/target/<target>
mercurial_modules.pop(); // /rust/target
mercurial_modules.pop(); // /rust
mercurial_modules.pop(); // /
let python_exe: &'static str = env!("PYTHON_INTERPRETER");
let python_exe = PathBuf::from(python_exe);
let mut python_home = python_exe.clone();
python_home.pop();
// On Windows, python2.7.exe exists at the root directory of the Python
// install. Everywhere else, the Python install root is one level up.
if !python_exe.ends_with("python2.7.exe") {
python_home.pop();
}
Environment {
_exe: exe.clone(),
python_exe: python_exe,
python_home: python_home,
mercurial_modules: mercurial_modules.to_path_buf(),
}
}
// On UNIX, platform string is just bytes and should not contain NUL.
#[cfg(target_family = "unix")]
fn cstring_from_os<T: AsRef<OsStr>>(s: T) -> CString {
CString::new(s.as_ref().as_bytes()).unwrap()
}
// TODO convert to ANSI characters?
#[cfg(target_family = "windows")]
fn cstring_from_os<T: AsRef<OsStr>>(s: T) -> CString {
CString::new(s.as_ref().to_str().unwrap()).unwrap()
}
// On UNIX, argv starts as an array of char*. So it is easy to convert
// to C strings.
#[cfg(target_family = "unix")]
fn args_to_cstrings() -> Vec<CString> {
env::args_os()
.map(|a| CString::new(a.into_vec()).unwrap())
.collect()
}
// TODO Windows support is incomplete. We should either use env::args_os()
// (or call into GetCommandLineW() + CommandLinetoArgvW()), convert these to
// PyUnicode instances, and pass these into Python/Mercurial outside the
// standard PySys_SetArgvEx() mechanism. This will allow us to preserve the
// raw bytes (since PySys_SetArgvEx() is based on char* and can drop wchar
// data.
//
// For now, we use env::args(). This will choke on invalid UTF-8 arguments.
// But it is better than nothing.
#[cfg(target_family = "windows")]
fn args_to_cstrings() -> Vec<CString> {
env::args().map(|a| CString::new(a).unwrap()).collect()
}
fn set_python_home(env: &Environment) {
let raw = cstring_from_os(&env.python_home).into_raw();
unsafe {
python27_sys::Py_SetPythonHome(raw);
}
}
fn update_encoding(_py: Python, _sys_mod: &PyModule) {
// Call sys.setdefaultencoding("undefined") if HGUNICODEPEDANTRY is set.
let pedantry = env::var("HGUNICODEPEDANTRY").is_ok();
if pedantry {
// site.py removes the sys.setdefaultencoding attribute. So we need
// to reload the module to get a handle on it. This is a lesser
// used feature and we'll support this later.
// TODO support this
panic!("HGUNICODEPEDANTRY is not yet supported");
}
}
fn update_modules_path(env: &Environment, py: Python, sys_mod: &PyModule) {
let sys_path = sys_mod.get(py, "path").unwrap();
sys_path
.call_method(py, "insert", (0, env.mercurial_modules.to_str()), None)
.expect("failed to update sys.path to location of Mercurial modules");
}
fn run() -> Result<(), i32> {
let env = get_environment();
//println!("{:?}", env);
// Tell Python where it is installed.
set_python_home(&env);
// Set program name. The backing memory needs to live for the duration of the
// interpreter.
//
// TODO consider storing this in a static or associating with lifetime of
// the Python interpreter.
//
// Yes, we use the path to the Python interpreter not argv[0] here. The
// reason is because Python uses the given path to find the location of
// Python files. Apparently we could define our own ``Py_GetPath()``
// implementation. But this may require statically linking Python, which is
// not desirable.
let program_name = cstring_from_os(&env.python_exe).as_ptr();
unsafe {
python27_sys::Py_SetProgramName(program_name as *mut i8);
}
unsafe {
python27_sys::Py_Initialize();
}
// https://docs.python.org/2/c-api/init.html#c.PySys_SetArgvEx has important
// usage information about PySys_SetArgvEx:
//
// * It says the first argument should be the script that is being executed.
// If not a script, it can be empty. We are definitely not a script.
// However, parts of Mercurial do look at sys.argv[0]. So we need to set
// something here.
//
// * When embedding Python, we should use ``PySys_SetArgvEx()`` and set
// ``updatepath=0`` for security reasons. Essentially, Python's default
// logic will treat an empty argv[0] in a manner that could result in
// sys.path picking up directories it shouldn't and this could lead to
// loading untrusted modules.
// env::args() will panic if it sees a non-UTF-8 byte sequence. And
// Mercurial supports arbitrary encodings of input data. So we need to
// use OS-specific mechanisms to get the raw bytes without UTF-8
// interference.
let args = args_to_cstrings();
let argv: Vec<*const c_char> = args.iter().map(|a| a.as_ptr()).collect();
unsafe {
python27_sys::PySys_SetArgvEx(args.len() as c_int, argv.as_ptr() as *mut *mut i8, 0);
}
let result;
{
// These need to be dropped before we call Py_Finalize(). Hence the
// block.
let gil = Python::acquire_gil();
let py = gil.python();
// Mercurial code could call sys.exit(), which will call exit()
// itself. So this may not return.
// TODO this may cause issues on Windows due to the CRT mismatch.
// Investigate if we can intercept sys.exit() or SystemExit() to
// ensure we handle process exit.
result = match run_py(&env, py) {
// Print unhandled exceptions and exit code 255, as this is what
// `python` does.
Err(err) => {
err.print(py);
Err(255)
}
Ok(()) => Ok(()),
};
}
unsafe {
python27_sys::Py_Finalize();
}
result
}
fn run_py(env: &Environment, py: Python) -> PyResult<()> {
let sys_mod = py.import("sys").unwrap();
update_encoding(py, &sys_mod);
update_modules_path(&env, py, &sys_mod);
// TODO consider a better error message on failure to import.
let demand_mod = py.import("hgdemandimport")?;
demand_mod.call(py, "enable", NoArgs, None)?;
let dispatch_mod = py.import("mercurial.dispatch")?;
dispatch_mod.call(py, "run", NoArgs, None)?;
Ok(())
}
fn main() {
let exit_code = match run() {
Err(err) => err,
Ok(()) => 0,
};
std::process::exit(exit_code);
}