##// END OF EJS Templates
deltas: set estimated compression upper bound to "3x" instead of "10x"...
deltas: set estimated compression upper bound to "3x" instead of "10x" In pratice, we very rarely observer compression better than "3x" on manifest deltas. Having a more aggressive estimate significantly helps our pathological use case on a private repository. Here are a comparison of timings using different upper bound. Estimated compression | ø | ×10 | ×5 | ×3 | timing | 14.11 | 2.61 | 1.96 | 1.53 | We also tested the impact of this series on an array of public repositories. This shown no impact in either size nor timing. Full data set below for those interested. Size ---- Regarding size, not significant impact have been noticed on neither public nor private repositories. Here are the number we gathered on public repositories: zlib/upperbound | no | 10x | 5x | 3x mercurial | 5 875 730 | 5 875 730 | 5 875 730 | 5 875 730 pypy | 27 782 913 | 27 782 913 | 27 782 913 | 27 782 913 netbeans | 159 161 207 | 159 161 207 | 159 161 207 | 159 959 879 (+0.5%) mozilla-central | 323 841 642 | 323 841 642 | 323 841 642 | 319 867 519 (-2.5%) mozilla-try | 746 649 123 | 746 649 123 | 746 649 123 | 741 155 568 (-0.7%) private-repo | 1 485 287 294 | 1 485 287 294 | 1 485 287 294 | 1 409 248 382 (-5.1%) zstd/upperbound | no | 10x | 5x | 3x mercurial | 5 895 206 | 5 895 206 | 5 895 206 | 5 895 206 pypy | 28 689 230 | 28 689 230 | 28 689 230 | 28 689 230 netbeans | 157 636 387 | 157 636 387 | 157 636 387 | 159 692 678 (+1.3%) mozilla-central | 317 650 281 | 317 650 281 | 317 650 281 | 319 613 603 (+0.6%) mozilla-try | 737 555 275 | 737 555 275 | 737 555 275 | 738 079 473 (+0.1%) private-repo | 1 352 362 982 | 1 352 362 982 | 1 346 961 880 | 1 361 327 384 (+0.7%) Speed ------ Timing gathered using `hg perfrevlogwrite -m`. Value are in seconds. mercurial zlib | no | 10x | 5x | 3x | total | 65.551783 | 65.388887 | 65.260658 | 65.321199 | max | 0.034544 | 0.034571 | 0.034659 | 0.034521 | 99.99% | 0.034544 | 0.034571 | 0.034659 | 0.034521 | zstd | no | 10x | 5x | 3x | total | 49.118449 | 49.054062 | 48.753588 | 48.740230 | max | 0.009338 | 0.009239 | 0.009202 | 0.009178 | 99.99% | 0.007618 | 0.007639 | 0.007626 | 0.007621 | pypy zlib | no | 10x | 5x | 3x | total | 560.865984 | 558.983817 | 559.083815 | 559.349152 | max | 0.219614 | 0.215922 | 0.218112 | 0.218107 | 99.99% | 0.219614 | 0.215922 | 0.218112 | 0.218107 | zstd | no | 10x | 5x | 3x | total | 349.393280 | 347.395819 | 347.185407 | 345.643985 | max | 0.084143 | 0.083536 | 0.081834 | 0.082178 | 99.99% | 0.039445 | 0.039639 | 0.039612 | 0.039175 | netbeans zlib | no | 10x | 5x | 3x | total | 33103.327727 | 33314.932260 | 33211.745233 | 33345.891778 | max | 2.666852 | 2.672059 | 2.662453 | 2.662936 | 99.99% | 2.058772 | 2.070429 | 2.069569 | 2.064653 | zstd | no | 10x | 5x | 3x | total | 20112.102708 | 20095.879719 | 20083.390300 | 20123.221859 | max | 2.063482 | 2.062851 | 2.065229 | 2.060147 | 99.99% | 1.146647 | 1.143794 | 1.142933 | 1.146529 | mozilla zlib | no | 10x | 5x | 3x | total | 41374.102138 | 41418.816773 | 41381.956370 | 41334.280732 | max | 3.383474 | 3.387400 | 3.405711 | 3.387316 | 99.99% | 1.006755 | 1.005954 | 1.007700 | 1.007373 | zstd | no | 10x | 5x | 3x | total | 24689.691520 | 24643.939662 | 24664.630027 | 24664.512714 | max | 1.460822 | 1.449640 | 1.439747 | 1.465304 | 99.99% | 0.527111 | 0.527377 | 0.527807 | 0.527226 |

File last commit:

r42357:1b0be75c default
r42669:4a3abb33 default
Show More
discovery.rs
209 lines | 7.0 KiB | application/rls-services+xml | RustLexer
// discovery.rs
//
// Copyright 2019 Georges Racinet <georges.racinet@octobus.net>
//
// This software may be used and distributed according to the terms of the
// GNU General Public License version 2 or any later version.
//! Discovery operations
//!
//! This is a Rust counterpart to the `partialdiscovery` class of
//! `mercurial.setdiscovery`
use super::{Graph, GraphError, Revision};
use crate::ancestors::MissingAncestors;
use crate::dagops;
use std::collections::HashSet;
pub struct PartialDiscovery<G: Graph + Clone> {
target_heads: Option<Vec<Revision>>,
graph: G, // plays the role of self._repo
common: MissingAncestors<G>,
undecided: Option<HashSet<Revision>>,
missing: HashSet<Revision>,
}
pub struct DiscoveryStats {
pub undecided: Option<usize>,
}
impl<G: Graph + Clone> PartialDiscovery<G> {
/// Create a PartialDiscovery object, with the intent
/// of comparing our `::<target_heads>` revset to the contents of another
/// repo.
///
/// For now `target_heads` is passed as a vector, and will be used
/// at the first call to `ensure_undecided()`.
///
/// If we want to make the signature more flexible,
/// we'll have to make it a type argument of `PartialDiscovery` or a trait
/// object since we'll keep it in the meanwhile
pub fn new(graph: G, target_heads: Vec<Revision>) -> Self {
PartialDiscovery {
undecided: None,
target_heads: Some(target_heads),
graph: graph.clone(),
common: MissingAncestors::new(graph, vec![]),
missing: HashSet::new(),
}
}
/// Register revisions known as being common
pub fn add_common_revisions(
&mut self,
common: impl IntoIterator<Item = Revision>,
) -> Result<(), GraphError> {
self.common.add_bases(common);
if let Some(ref mut undecided) = self.undecided {
self.common.remove_ancestors_from(undecided)?;
}
Ok(())
}
/// Register revisions known as being missing
pub fn add_missing_revisions(
&mut self,
missing: impl IntoIterator<Item = Revision>,
) -> Result<(), GraphError> {
self.ensure_undecided()?;
let range = dagops::range(
&self.graph,
missing,
self.undecided.as_ref().unwrap().iter().cloned(),
)?;
let undecided_mut = self.undecided.as_mut().unwrap();
for missrev in range {
self.missing.insert(missrev);
undecided_mut.remove(&missrev);
}
Ok(())
}
/// Do we have any information about the peer?
pub fn has_info(&self) -> bool {
self.common.has_bases()
}
/// Did we acquire full knowledge of our Revisions that the peer has?
pub fn is_complete(&self) -> bool {
self.undecided.as_ref().map_or(false, |s| s.is_empty())
}
/// Return the heads of the currently known common set of revisions.
///
/// If the discovery process is not complete (see `is_complete()`), the
/// caller must be aware that this is an intermediate state.
///
/// On the other hand, if it is complete, then this is currently
/// the only way to retrieve the end results of the discovery process.
///
/// We may introduce in the future an `into_common_heads` call that
/// would be more appropriate for normal Rust callers, dropping `self`
/// if it is complete.
pub fn common_heads(&self) -> Result<HashSet<Revision>, GraphError> {
self.common.bases_heads()
}
/// Force first computation of `self.undecided`
///
/// After this, `self.undecided.as_ref()` and `.as_mut()` can be
/// unwrapped to get workable immutable or mutable references without
/// any panic.
///
/// This is an imperative call instead of an access with added lazyness
/// to reduce easily the scope of mutable borrow for the caller,
/// compared to undecided(&'a mut self) -> &'a… that would keep it
/// as long as the resulting immutable one.
fn ensure_undecided(&mut self) -> Result<(), GraphError> {
if self.undecided.is_some() {
return Ok(());
}
let tgt = self.target_heads.take().unwrap();
self.undecided =
Some(self.common.missing_ancestors(tgt)?.into_iter().collect());
Ok(())
}
/// Provide statistics about the current state of the discovery process
pub fn stats(&self) -> DiscoveryStats {
DiscoveryStats {
undecided: self.undecided.as_ref().map(|s| s.len()),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::testing::SampleGraph;
/// A PartialDiscovery as for pushing all the heads of `SampleGraph`
fn full_disco() -> PartialDiscovery<SampleGraph> {
PartialDiscovery::new(SampleGraph, vec![10, 11, 12, 13])
}
fn sorted_undecided(
disco: &PartialDiscovery<SampleGraph>,
) -> Vec<Revision> {
let mut as_vec: Vec<Revision> =
disco.undecided.as_ref().unwrap().iter().cloned().collect();
as_vec.sort();
as_vec
}
fn sorted_missing(disco: &PartialDiscovery<SampleGraph>) -> Vec<Revision> {
let mut as_vec: Vec<Revision> =
disco.missing.iter().cloned().collect();
as_vec.sort();
as_vec
}
fn sorted_common_heads(
disco: &PartialDiscovery<SampleGraph>,
) -> Result<Vec<Revision>, GraphError> {
let mut as_vec: Vec<Revision> =
disco.common_heads()?.iter().cloned().collect();
as_vec.sort();
Ok(as_vec)
}
#[test]
fn test_add_common_get_undecided() -> Result<(), GraphError> {
let mut disco = full_disco();
assert_eq!(disco.undecided, None);
assert!(!disco.has_info());
assert_eq!(disco.stats().undecided, None);
disco.add_common_revisions(vec![11, 12])?;
assert!(disco.has_info());
assert!(!disco.is_complete());
assert!(disco.missing.is_empty());
// add_common_revisions did not trigger a premature computation
// of `undecided`, let's check that and ask for them
assert_eq!(disco.undecided, None);
disco.ensure_undecided()?;
assert_eq!(sorted_undecided(&disco), vec![5, 8, 10, 13]);
assert_eq!(disco.stats().undecided, Some(4));
Ok(())
}
/// in this test, we pretend that our peer misses exactly (8+10)::
/// and we're comparing all our repo to it (as in a bare push)
#[test]
fn test_discovery() -> Result<(), GraphError> {
let mut disco = full_disco();
disco.add_common_revisions(vec![11, 12])?;
disco.add_missing_revisions(vec![8, 10])?;
assert_eq!(sorted_undecided(&disco), vec![5]);
assert_eq!(sorted_missing(&disco), vec![8, 10, 13]);
assert!(!disco.is_complete());
disco.add_common_revisions(vec![5])?;
assert_eq!(sorted_undecided(&disco), vec![]);
assert_eq!(sorted_missing(&disco), vec![8, 10, 13]);
assert!(disco.is_complete());
assert_eq!(sorted_common_heads(&disco)?, vec![5, 11, 12]);
Ok(())
}
}