upstream/mercurial-mirror Files · rust/hg-core/src/requirements.rs

revlog: add a mechanism to verify expected file position before appending

If someone uses `hg debuglocks`, or some non-hg process writes to the .hg directory without respecting the locks, or if the repo's on a networked filesystem, it's possible for the revlog code to write out corrupted data.

The form of this corruption can vary depending on what data was written and how that happened. We are in the "networked filesystem" case (though I've had users also do this to themselves with the "`hg debuglocks`" scenario), and most often see this with the changelog. What ends up happening is we produce two items (let's call them rev1 and rev2) in the .i file that have the same linkrev, baserev, and offset into the .d file, while the data in the .d file is appended properly. rev2's compressed_size is accurate for rev2, but when we go to decompress the data in the .d file, we use the offset that's recorded in the index file, which is the same as rev1, and attempt to decompress rev2.compressed_size bytes of rev1's data. This usually does not succeed. :)

When using inline data, this also fails, though I haven't investigated why too closely. This shows up as a "patch decode" error. I believe what's happening there is that we're basically ignoring the offset field, getting the data properly, but since baserev != rev, it thinks this is a delta based on rev (instead of a full text) and can't actually apply it as such.

For now, I'm going to make this an optional component and default it to entirely off. I may increase the default severity of this in the future, once I've enabled it for my users and we gain more experience with it. Luckily, most of my users have a versioned filesystem and can roll back to before the corruption has been written, it's just a hassle to do so and not everyone knows how (so it's a support burden). Users on other filesystems will not have that luxury, and this can cause them to have a corrupted repository that they are unlikely to know how to resolve, and they'll see this as a data-loss event. Refusing to create the corruption is a much better user experience.

This mechanism is not perfect. There may be false-negatives (racy writes that are not detected). There should not be any false-positives (non-racy writes that are detected as such). This is not a mechanism that makes putting a repo on a networked filesystem "safe" or "supported", just *less* likely to cause corruption.

Differential Revision: https://phab.mercurial-scm.org/D9952

Simon Sapin

File last commit: r47191:95b27628 default
r47349:e9901d01 default


      use crate::errors::{HgError, HgResultExt};
      use crate::repo::{Repo, Vfs};
      use std::collections::HashSet;

      fn parse(bytes: &[u8]) -> Result<HashSet<String>, HgError> {
          // The Python code reading this file uses `str.splitlines`
          // which looks for a number of line separators (even including a couple of
          // non-ASCII ones), but Python code writing it always uses `\n`.
          let lines = bytes.split(|&byte| byte == b'\n');

          lines
              .filter(|line| !line.is_empty())
              .map(|line| {
                  // Python uses Unicode `str.isalnum` but feature names are all
                  // ASCII
                  if line[0].is_ascii_alphanumeric() && line.is_ascii() {
                      Ok(String::from_utf8(line.into()).unwrap())
                  } else {
                      Err(HgError::corrupted("parse error in 'requires' file"))
                  }
              })
              .collect()
      }

      pub(crate) fn load(hg_vfs: Vfs) -> Result<HashSet<String>, HgError> {
          parse(&hg_vfs.read("requires")?)
      }

      pub(crate) fn load_if_exists(hg_vfs: Vfs) -> Result<HashSet<String>, HgError> {
          if let Some(bytes) = hg_vfs.read("requires").io_not_found_as_none()? {
              parse(&bytes)
          } else {
              // Treat a missing file the same as an empty file.
              // From `mercurial/localrepo.py`:
              // > requires file contains a newline-delimited list of
              // > features/capabilities the opener (us) must have in order to use
              // > the repository. This file was introduced in Mercurial 0.9.2,
              // > which means very old repositories may not have one. We assume
              // > a missing file translates to no requirements.
              Ok(HashSet::new())
          }
      }

      pub(crate) fn check(repo: &Repo) -> Result<(), HgError> {
          for feature in repo.requirements() {
              if !SUPPORTED.contains(&feature.as_str()) {
                  // TODO: collect all unknown features and include them in the
                  // error message?
                  return Err(HgError::UnsupportedFeature(format!(
                      "repository requires feature unknown to this Mercurial: {}",
                      feature
                  )));
              }
          }
          Ok(())
      }

      // TODO: set this to actually-supported features
      const SUPPORTED: &[&str] = &[
          "dotencode",
          "fncache",
          "generaldelta",
          "revlogv1",
          SHARED_REQUIREMENT,
          SHARESAFE_REQUIREMENT,
          SPARSEREVLOG_REQUIREMENT,
          RELATIVE_SHARED_REQUIREMENT,
          "store",
          // As of this writing everything rhg does is read-only.
          // When it starts writing to the repository, it’ll need to either keep the
          // persistent nodemap up to date or remove this entry:
          "persistent-nodemap",
      ];

      // Copied from mercurial/requirements.py:

      /// When narrowing is finalized and no longer subject to format changes,
      /// we should move this to just "narrow" or similar.
      #[allow(unused)]
      pub(crate) const NARROW_REQUIREMENT: &str = "narrowhg-experimental";

      /// Enables sparse working directory usage
      #[allow(unused)]
      pub(crate) const SPARSE_REQUIREMENT: &str = "exp-sparse";

      /// Enables the internal phase which is used to hide changesets instead
      /// of stripping them
      #[allow(unused)]
      pub(crate) const INTERNAL_PHASE_REQUIREMENT: &str = "internal-phase";

      /// Stores the manifest in a tree structure
      #[allow(unused)]
      pub(crate) const TREEMANIFEST_REQUIREMENT: &str = "treemanifest";

      /// Increment the sub-version when the revlog v2 format changes to lock out old
      /// clients.
      #[allow(unused)]
      pub(crate) const REVLOGV2_REQUIREMENT: &str = "exp-revlogv2.1";

      /// A repository with the sparserevlog feature will have delta chains that
      /// can spread over a larger span. Sparse reading cuts these large spans into
      /// pieces, so that each piece isn't too big.
      /// Without the sparserevlog capability, reading from the repository could use
      /// huge amounts of memory, because the whole span would be read at once,
      /// including all the intermediate revisions that aren't pertinent for the
      /// chain. This is why once a repository has enabled sparse-read, it becomes
      /// required.
      #[allow(unused)]
      pub(crate) const SPARSEREVLOG_REQUIREMENT: &str = "sparserevlog";

      /// A repository with the sidedataflag requirement allows storing extra
      /// information for revisions without altering their original hashes.
      #[allow(unused)]
      pub(crate) const SIDEDATA_REQUIREMENT: &str = "exp-sidedata-flag";

      /// A repository with the copies-sidedata-changeset requirement will store
      /// copies-related information in the changeset's sidedata.
      #[allow(unused)]
      pub(crate) const COPIESSDC_REQUIREMENT: &str = "exp-copies-sidedata-changeset";

      /// The repository uses a persistent nodemap for the changelog and the manifest.
      #[allow(unused)]
      pub(crate) const NODEMAP_REQUIREMENT: &str = "persistent-nodemap";

      /// Denotes that the current repository is a share
      #[allow(unused)]
      pub(crate) const SHARED_REQUIREMENT: &str = "shared";

      /// Denotes that the current repository is a share and the shared source path is
      /// relative to the current repository root path
      #[allow(unused)]
      pub(crate) const RELATIVE_SHARED_REQUIREMENT: &str = "relshared";

      /// A repository with share implemented safely. The repository has different
      /// store and working copy requirements i.e. both `.hg/requires` and
      /// `.hg/store/requires` are present.
      #[allow(unused)]
      pub(crate) const SHARESAFE_REQUIREMENT: &str = "share-safe";
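
The parsing and checking rules in the listing above can be tried outside the hg-core crate with a small standalone sketch. Everything named here is an assumption made for illustration: `RequirementsError` stands in for `HgError`, `parse_requires` mirrors `parse`, `SUPPORTED_SKETCH` is an arbitrary subset rather than the real `SUPPORTED` list, and `check_all` is one possible way to address the TODO about reporting every unknown feature at once. It is not the crate's actual implementation.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for `HgError`; the real type lives in `crate::errors`.
#[derive(Debug, PartialEq)]
enum RequirementsError {
    Corrupted(String),
    Unsupported(Vec<String>),
}

// Same rules as `parse`: split on `\n`, skip empty lines, and accept only
// all-ASCII lines whose first byte is alphanumeric.
fn parse_requires(bytes: &[u8]) -> Result<HashSet<String>, RequirementsError> {
    bytes
        .split(|&byte| byte == b'\n')
        .filter(|line| !line.is_empty())
        .map(|line| {
            if line[0].is_ascii_alphanumeric() && line.is_ascii() {
                // ASCII is always valid UTF-8, so nothing is replaced here.
                Ok(String::from_utf8_lossy(line).into_owned())
            } else {
                Err(RequirementsError::Corrupted(
                    "parse error in 'requires' file".to_owned(),
                ))
            }
        })
        .collect()
}

// Illustrative subset only, not the real `SUPPORTED` list.
const SUPPORTED_SKETCH: &[&str] =
    &["dotencode", "fncache", "generaldelta", "revlogv1", "store"];

// Variant of `check` that collects every unknown feature (sorted for a
// stable message) instead of returning on the first one.
fn check_all(requirements: &HashSet<String>) -> Result<(), RequirementsError> {
    let mut unknown: Vec<String> = requirements
        .iter()
        .filter(|feature| !SUPPORTED_SKETCH.contains(&feature.as_str()))
        .cloned()
        .collect();
    if unknown.is_empty() {
        Ok(())
    } else {
        unknown.sort();
        Err(RequirementsError::Unsupported(unknown))
    }
}

fn main() {
    // Empty lines are skipped, so this input yields four features.
    let parsed =
        parse_requires(b"revlogv1\nstore\n\nexp-sparse\nzzz-unknown\n").unwrap();
    assert_eq!(parsed.len(), 4);

    // A line that does not start with an alphanumeric byte is rejected.
    assert!(parse_requires(b"-bad\n").is_err());

    // Both unknown features are reported together.
    assert_eq!(
        check_all(&parsed),
        Err(RequirementsError::Unsupported(vec![
            "exp-sparse".to_owned(),
            "zzz-unknown".to_owned(),
        ]))
    );
}
```

Collecting into `Result<HashSet<String>, _>` stops at the first malformed line, which matches the all-or-nothing behaviour of `parse`. Sorting the unknown features is a choice made for this sketch, since `HashSet` iteration order is unspecified.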

