##// END OF EJS Templates
changegroup: remove reordering control (BC)...
changegroup: remove reordering control (BC) This logic - including the experimental bundle.reorder option - was originally added in a8e3931e3fb5 in 2011 and then later ported to changegroup.py. The intent of this option and associated logic is to control the ordering of revisions in deltagroups in changegroups. At the time it was implemented, only changegroup version 1 existed and generaldelta revlogs were just coming into the world. Changegroup version 1 requires that deltas be made against the last revision sent over the wire. Used with generaldelta, this created an impedance mismatch of sorts and resulted in changegroup producers spending a lot of time recomputing deltas. Revision reordering was introduced so outgoing revisions would be sent in "generaldelta order" and producers would be able to reuse internal deltas from storage. Later on, we introduced changegroup version 2. It supported denoting which revision a delta was against. So we no longer needed to sort outgoing revisions to ensure optimal delta generation from the producer. So, subsequent changegroup versions disabled reordering. We also later made the changelog not store deltas by default. And we also made the changelog send out deltas in storage order. Why we do this for changelog, I'm not sure. Maybe we want to preserve revision order across clones? It doesn't really matter for this commit. Fast forward to 2018. We want to abstract storage backends. And having changegroup code require knowledge about how deltas are stored internally interferes with that goal. This commit removes reordering control from changegroup generation. After this commit, the reordering behavior is: * The changelog is always sent out in storage order (no behavior change). * Non-changelog generaldelta revlogs are reordered to always be in DAG topological order (previously, generaldelta revlogs would be emitted in storage order for version 2 and 3 changegroups). * Non-changelog non-generaldelta revlogs are sent in storage order (no behavior change). * There exists no config option to override behavior. The big difference here is that generaldelta revlogs now *always* have their revisions sorted in DAG order before going out over the wire. This behavior was previously only done for changegroup version 1. Version 2 and version 3 changegroups disabled reordering because the interchange format supported encoding arbitrary delta parents, so reordering wasn't strictly necessary. I can think of a few significant implications for this change. Because changegroup receivers will now see non-changelog revisions in DAG order instead of storage order, the internal storage order of manifests and files may differ substantially between producer and consumer. I don't think this matters that much, since the storage order of manifests and files is largely hidden from users. Only the storage order of changelog matters (because `hg log` shows the changelog in storage order). I don't think there should be any controversy here. The reordering of revisions has implications for changegroup producers. Previously, generaldelta revlogs would be emitted in storage order. And in the common case, the internally-stored delta could effectively be copied from disk into the deltagroup delta. This meant that emitting delta groups for generaldelta revlogs would be mostly linear read I/O. This is desirable for performance. With us now reordering generaldelta revlog revisions in DAG order, the read operations may use more random I/O instead of sequential I/O. This could result in performance loss. But with the prevalence of SSDs and fast random I/O, I'm not too worried. (Note: the optimal emission order for revlogs is actually delta encoding order. But the changegroup code wasn't doing that before or after this change. We could potentially implement that in a later commit.) Changegroups in DAG order will have implications for receivers. Previously, receiving storage order might mean seeing a number of interleaved branches. This would mean long delta chains, sparse I/O, and possibly more fulltext revisions instead of deltas, blowing up storage storage. (This is the same set of problems that sparse revlogs aims to address.) With the producer now sending revisions in DAG order, the receiver also stores revisions in DAG order. That means revisions for the same DAG branch are all grouped together. And this should yield better storage outcomes. In other words, sending the reordered changegroup allows the receiver to have better storage order and for the producer to not propagate its (possibly sub-optimal) internal storage order. On the mozilla-unified repository, this change influences bundle generation: $ hg bundle -t none-v2 -a before: time: real 355.680 secs (user 256.790+0.000 sys 16.820+0.000) after: time: real 382.950 secs (user 281.700+0.000 sys 17.690+0.000) before: 7,150,228,967 bytes (uncompressed) after: 7,041,556,273 bytes (uncompressed) before: 1,669,063,234 bytes (zstd l=3) after: 1,628,598,830 bytes (zstd l=3) $ hg unbundle before: time: real 511.910 secs (user 466.750+0.000 sys 32.680+0.000) after: time: real 487.790 secs (user 443.940+0.000 sys 30.840+0.000) 00manifest.d size: source: 274,924,292 bytes before: 304,741,626 bytes after: 245,252,087 bytes .hg/store total file size: source: 2,649,133,490 before: 2,680,888,130 after: 2,627,875,673 We see the bundle size drop. That's probably because if a revlog internally isn't storing a delta, it will choose to delta against the last emitted revision. And on repos with interleaved branches (like mozilla-unified), the previous revision could be an unrelated branch and therefore be a large delta. But with this patch, the previous revision is likely p1 or p2 and a delta should be small. We also see the manifest size drop by ~50 MB. It's worth noting that the manifest actually *increased* in size by ~25 MB in the old strategy and decreased ~25 MB from its source in the new strategy. Again, my explanation for this is that the DAG ordering in the changegroup is resulting in better grouping of revisions in the receiver, which results in more compact delta chains and higher storage efficiency. Unbundle time also dropped. I suspect this is due to the revlog having to work less to compute deltas since the incoming deltas are more optimal. i.e. the receiver spends less time resolving fulltext revisions as incoming deltas bounce around between DAG branches and delta chains. We also see bundle generation time increase. This is not desirable. However, the regression is only significant on the original repository: if we generate a bundle from the repository created from the new, always reordered bundles, we're close to baseline (if not at it with expected noise): $ hg bundle -t none-v2 -a before (original): time: real 355.680 secs (user 256.790+0.000 sys 16.820+0.000) after (original): time: real 382.950 secs (user 281.700+0.000 sys 17.690+0.000) after (new repo): time: real 362.280 secs (user 260.300+0.000 sys 17.700+0.000) This regression is a bit worrying because it will impact serving canonical repositories (that don't have optimal internal storage unless they are reordered - possibly as part of running `hg debugupgraderepo`). However, this regression will only be noticed by very large changegroups. And I'm guessing/hoping that any repository that large is using clonebundles to mitigate server load. Again, sending DAG order isn't the optimal send order for servers: sending in storage-delta order is. But in order to enable storage-optimal send order, we'll need a storage API that handles sorting. Future commits will introduce such an API. Differential Revision: https://phab.mercurial-scm.org/D4721

File last commit:

r37195:68ee6182 default
r39897:db5501d9 default
Show More
idatetime.py
577 lines | 19.6 KiB | text/x-python | PythonLexer
##############################################################################
# Copyright (c) 2002 Zope Foundation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
##############################################################################
"""Datetime interfaces.
This module is called idatetime because if it were called datetime the import
of the real datetime would fail.
"""
from __future__ import absolute_import
from .. import Interface, Attribute
from .. import classImplements
from datetime import timedelta, date, datetime, time, tzinfo
class ITimeDeltaClass(Interface):
"""This is the timedelta class interface."""
min = Attribute("The most negative timedelta object")
max = Attribute("The most positive timedelta object")
resolution = Attribute(
"The smallest difference between non-equal timedelta objects")
class ITimeDelta(ITimeDeltaClass):
"""Represent the difference between two datetime objects.
Supported operators:
- add, subtract timedelta
- unary plus, minus, abs
- compare to timedelta
- multiply, divide by int/long
In addition, datetime supports subtraction of two datetime objects
returning a timedelta, and addition or subtraction of a datetime
and a timedelta giving a datetime.
Representation: (days, seconds, microseconds).
"""
days = Attribute("Days between -999999999 and 999999999 inclusive")
seconds = Attribute("Seconds between 0 and 86399 inclusive")
microseconds = Attribute("Microseconds between 0 and 999999 inclusive")
class IDateClass(Interface):
"""This is the date class interface."""
min = Attribute("The earliest representable date")
max = Attribute("The latest representable date")
resolution = Attribute(
"The smallest difference between non-equal date objects")
def today():
"""Return the current local time.
This is equivalent to date.fromtimestamp(time.time())"""
def fromtimestamp(timestamp):
"""Return the local date from a POSIX timestamp (like time.time())
This may raise ValueError, if the timestamp is out of the range of
values supported by the platform C localtime() function. It's common
for this to be restricted to years from 1970 through 2038. Note that
on non-POSIX systems that include leap seconds in their notion of a
timestamp, leap seconds are ignored by fromtimestamp().
"""
def fromordinal(ordinal):
"""Return the date corresponding to the proleptic Gregorian ordinal.
January 1 of year 1 has ordinal 1. ValueError is raised unless
1 <= ordinal <= date.max.toordinal().
For any date d, date.fromordinal(d.toordinal()) == d.
"""
class IDate(IDateClass):
"""Represents a date (year, month and day) in an idealized calendar.
Operators:
__repr__, __str__
__cmp__, __hash__
__add__, __radd__, __sub__ (add/radd only with timedelta arg)
"""
year = Attribute("Between MINYEAR and MAXYEAR inclusive.")
month = Attribute("Between 1 and 12 inclusive")
day = Attribute(
"Between 1 and the number of days in the given month of the given year.")
def replace(year, month, day):
"""Return a date with the same value.
Except for those members given new values by whichever keyword
arguments are specified. For example, if d == date(2002, 12, 31), then
d.replace(day=26) == date(2000, 12, 26).
"""
def timetuple():
"""Return a 9-element tuple of the form returned by time.localtime().
The hours, minutes and seconds are 0, and the DST flag is -1.
d.timetuple() is equivalent to
(d.year, d.month, d.day, 0, 0, 0, d.weekday(), d.toordinal() -
date(d.year, 1, 1).toordinal() + 1, -1)
"""
def toordinal():
"""Return the proleptic Gregorian ordinal of the date
January 1 of year 1 has ordinal 1. For any date object d,
date.fromordinal(d.toordinal()) == d.
"""
def weekday():
"""Return the day of the week as an integer.
Monday is 0 and Sunday is 6. For example,
date(2002, 12, 4).weekday() == 2, a Wednesday.
See also isoweekday().
"""
def isoweekday():
"""Return the day of the week as an integer.
Monday is 1 and Sunday is 7. For example,
date(2002, 12, 4).isoweekday() == 3, a Wednesday.
See also weekday(), isocalendar().
"""
def isocalendar():
"""Return a 3-tuple, (ISO year, ISO week number, ISO weekday).
The ISO calendar is a widely used variant of the Gregorian calendar.
See http://www.phys.uu.nl/~vgent/calendar/isocalendar.htm for a good
explanation.
The ISO year consists of 52 or 53 full weeks, and where a week starts
on a Monday and ends on a Sunday. The first week of an ISO year is the
first (Gregorian) calendar week of a year containing a Thursday. This
is called week number 1, and the ISO year of that Thursday is the same
as its Gregorian year.
For example, 2004 begins on a Thursday, so the first week of ISO year
2004 begins on Monday, 29 Dec 2003 and ends on Sunday, 4 Jan 2004, so
that date(2003, 12, 29).isocalendar() == (2004, 1, 1) and
date(2004, 1, 4).isocalendar() == (2004, 1, 7).
"""
def isoformat():
"""Return a string representing the date in ISO 8601 format.
This is 'YYYY-MM-DD'.
For example, date(2002, 12, 4).isoformat() == '2002-12-04'.
"""
def __str__():
"""For a date d, str(d) is equivalent to d.isoformat()."""
def ctime():
"""Return a string representing the date.
For example date(2002, 12, 4).ctime() == 'Wed Dec 4 00:00:00 2002'.
d.ctime() is equivalent to time.ctime(time.mktime(d.timetuple()))
on platforms where the native C ctime() function
(which time.ctime() invokes, but which date.ctime() does not invoke)
conforms to the C standard.
"""
def strftime(format):
"""Return a string representing the date.
Controlled by an explicit format string. Format codes referring to
hours, minutes or seconds will see 0 values.
"""
class IDateTimeClass(Interface):
"""This is the datetime class interface."""
min = Attribute("The earliest representable datetime")
max = Attribute("The latest representable datetime")
resolution = Attribute(
"The smallest possible difference between non-equal datetime objects")
def today():
"""Return the current local datetime, with tzinfo None.
This is equivalent to datetime.fromtimestamp(time.time()).
See also now(), fromtimestamp().
"""
def now(tz=None):
"""Return the current local date and time.
If optional argument tz is None or not specified, this is like today(),
but, if possible, supplies more precision than can be gotten from going
through a time.time() timestamp (for example, this may be possible on
platforms supplying the C gettimeofday() function).
Else tz must be an instance of a class tzinfo subclass, and the current
date and time are converted to tz's time zone. In this case the result
is equivalent to tz.fromutc(datetime.utcnow().replace(tzinfo=tz)).
See also today(), utcnow().
"""
def utcnow():
"""Return the current UTC date and time, with tzinfo None.
This is like now(), but returns the current UTC date and time, as a
naive datetime object.
See also now().
"""
def fromtimestamp(timestamp, tz=None):
"""Return the local date and time corresponding to the POSIX timestamp.
Same as is returned by time.time(). If optional argument tz is None or
not specified, the timestamp is converted to the platform's local date
and time, and the returned datetime object is naive.
Else tz must be an instance of a class tzinfo subclass, and the
timestamp is converted to tz's time zone. In this case the result is
equivalent to
tz.fromutc(datetime.utcfromtimestamp(timestamp).replace(tzinfo=tz)).
fromtimestamp() may raise ValueError, if the timestamp is out of the
range of values supported by the platform C localtime() or gmtime()
functions. It's common for this to be restricted to years in 1970
through 2038. Note that on non-POSIX systems that include leap seconds
in their notion of a timestamp, leap seconds are ignored by
fromtimestamp(), and then it's possible to have two timestamps
differing by a second that yield identical datetime objects.
See also utcfromtimestamp().
"""
def utcfromtimestamp(timestamp):
"""Return the UTC datetime from the POSIX timestamp with tzinfo None.
This may raise ValueError, if the timestamp is out of the range of
values supported by the platform C gmtime() function. It's common for
this to be restricted to years in 1970 through 2038.
See also fromtimestamp().
"""
def fromordinal(ordinal):
"""Return the datetime from the proleptic Gregorian ordinal.
January 1 of year 1 has ordinal 1. ValueError is raised unless
1 <= ordinal <= datetime.max.toordinal().
The hour, minute, second and microsecond of the result are all 0, and
tzinfo is None.
"""
def combine(date, time):
"""Return a new datetime object.
Its date members are equal to the given date object's, and whose time
and tzinfo members are equal to the given time object's. For any
datetime object d, d == datetime.combine(d.date(), d.timetz()).
If date is a datetime object, its time and tzinfo members are ignored.
"""
class IDateTime(IDate, IDateTimeClass):
"""Object contains all the information from a date object and a time object.
"""
year = Attribute("Year between MINYEAR and MAXYEAR inclusive")
month = Attribute("Month between 1 and 12 inclusive")
day = Attribute(
"Day between 1 and the number of days in the given month of the year")
hour = Attribute("Hour in range(24)")
minute = Attribute("Minute in range(60)")
second = Attribute("Second in range(60)")
microsecond = Attribute("Microsecond in range(1000000)")
tzinfo = Attribute(
"""The object passed as the tzinfo argument to the datetime constructor
or None if none was passed""")
def date():
"""Return date object with same year, month and day."""
def time():
"""Return time object with same hour, minute, second, microsecond.
tzinfo is None. See also method timetz().
"""
def timetz():
"""Return time object with same hour, minute, second, microsecond,
and tzinfo.
See also method time().
"""
def replace(year, month, day, hour, minute, second, microsecond, tzinfo):
"""Return a datetime with the same members, except for those members
given new values by whichever keyword arguments are specified.
Note that tzinfo=None can be specified to create a naive datetime from
an aware datetime with no conversion of date and time members.
"""
def astimezone(tz):
"""Return a datetime object with new tzinfo member tz, adjusting the
date and time members so the result is the same UTC time as self, but
in tz's local time.
tz must be an instance of a tzinfo subclass, and its utcoffset() and
dst() methods must not return None. self must be aware (self.tzinfo
must not be None, and self.utcoffset() must not return None).
If self.tzinfo is tz, self.astimezone(tz) is equal to self: no
adjustment of date or time members is performed. Else the result is
local time in time zone tz, representing the same UTC time as self:
after astz = dt.astimezone(tz), astz - astz.utcoffset()
will usually have the same date and time members as dt - dt.utcoffset().
The discussion of class tzinfo explains the cases at Daylight Saving
Time transition boundaries where this cannot be achieved (an issue only
if tz models both standard and daylight time).
If you merely want to attach a time zone object tz to a datetime dt
without adjustment of date and time members, use dt.replace(tzinfo=tz).
If you merely want to remove the time zone object from an aware
datetime dt without conversion of date and time members, use
dt.replace(tzinfo=None).
Note that the default tzinfo.fromutc() method can be overridden in a
tzinfo subclass to effect the result returned by astimezone().
"""
def utcoffset():
"""Return the timezone offset in minutes east of UTC (negative west of
UTC)."""
def dst():
"""Return 0 if DST is not in effect, or the DST offset (in minutes
eastward) if DST is in effect.
"""
def tzname():
"""Return the timezone name."""
def timetuple():
"""Return a 9-element tuple of the form returned by time.localtime()."""
def utctimetuple():
"""Return UTC time tuple compatilble with time.gmtimr()."""
def toordinal():
"""Return the proleptic Gregorian ordinal of the date.
The same as self.date().toordinal().
"""
def weekday():
"""Return the day of the week as an integer.
Monday is 0 and Sunday is 6. The same as self.date().weekday().
See also isoweekday().
"""
def isoweekday():
"""Return the day of the week as an integer.
Monday is 1 and Sunday is 7. The same as self.date().isoweekday.
See also weekday(), isocalendar().
"""
def isocalendar():
"""Return a 3-tuple, (ISO year, ISO week number, ISO weekday).
The same as self.date().isocalendar().
"""
def isoformat(sep='T'):
"""Return a string representing the date and time in ISO 8601 format.
YYYY-MM-DDTHH:MM:SS.mmmmmm or YYYY-MM-DDTHH:MM:SS if microsecond is 0
If utcoffset() does not return None, a 6-character string is appended,
giving the UTC offset in (signed) hours and minutes:
YYYY-MM-DDTHH:MM:SS.mmmmmm+HH:MM or YYYY-MM-DDTHH:MM:SS+HH:MM
if microsecond is 0.
The optional argument sep (default 'T') is a one-character separator,
placed between the date and time portions of the result.
"""
def __str__():
"""For a datetime instance d, str(d) is equivalent to d.isoformat(' ').
"""
def ctime():
"""Return a string representing the date and time.
datetime(2002, 12, 4, 20, 30, 40).ctime() == 'Wed Dec 4 20:30:40 2002'.
d.ctime() is equivalent to time.ctime(time.mktime(d.timetuple())) on
platforms where the native C ctime() function (which time.ctime()
invokes, but which datetime.ctime() does not invoke) conforms to the
C standard.
"""
def strftime(format):
"""Return a string representing the date and time.
This is controlled by an explicit format string.
"""
class ITimeClass(Interface):
"""This is the time class interface."""
min = Attribute("The earliest representable time")
max = Attribute("The latest representable time")
resolution = Attribute(
"The smallest possible difference between non-equal time objects")
class ITime(ITimeClass):
"""Represent time with time zone.
Operators:
__repr__, __str__
__cmp__, __hash__
"""
hour = Attribute("Hour in range(24)")
minute = Attribute("Minute in range(60)")
second = Attribute("Second in range(60)")
microsecond = Attribute("Microsecond in range(1000000)")
tzinfo = Attribute(
"""The object passed as the tzinfo argument to the time constructor
or None if none was passed.""")
def replace(hour, minute, second, microsecond, tzinfo):
"""Return a time with the same value.
Except for those members given new values by whichever keyword
arguments are specified. Note that tzinfo=None can be specified
to create a naive time from an aware time, without conversion of the
time members.
"""
def isoformat():
"""Return a string representing the time in ISO 8601 format.
That is HH:MM:SS.mmmmmm or, if self.microsecond is 0, HH:MM:SS
If utcoffset() does not return None, a 6-character string is appended,
giving the UTC offset in (signed) hours and minutes:
HH:MM:SS.mmmmmm+HH:MM or, if self.microsecond is 0, HH:MM:SS+HH:MM
"""
def __str__():
"""For a time t, str(t) is equivalent to t.isoformat()."""
def strftime(format):
"""Return a string representing the time.
This is controlled by an explicit format string.
"""
def utcoffset():
"""Return the timezone offset in minutes east of UTC (negative west of
UTC).
If tzinfo is None, returns None, else returns
self.tzinfo.utcoffset(None), and raises an exception if the latter
doesn't return None or a timedelta object representing a whole number
of minutes with magnitude less than one day.
"""
def dst():
"""Return 0 if DST is not in effect, or the DST offset (in minutes
eastward) if DST is in effect.
If tzinfo is None, returns None, else returns self.tzinfo.dst(None),
and raises an exception if the latter doesn't return None, or a
timedelta object representing a whole number of minutes with
magnitude less than one day.
"""
def tzname():
"""Return the timezone name.
If tzinfo is None, returns None, else returns self.tzinfo.tzname(None),
or raises an exception if the latter doesn't return None or a string
object.
"""
class ITZInfo(Interface):
"""Time zone info class.
"""
def utcoffset(dt):
"""Return offset of local time from UTC, in minutes east of UTC.
If local time is west of UTC, this should be negative.
Note that this is intended to be the total offset from UTC;
for example, if a tzinfo object represents both time zone and DST
adjustments, utcoffset() should return their sum. If the UTC offset
isn't known, return None. Else the value returned must be a timedelta
object specifying a whole number of minutes in the range -1439 to 1439
inclusive (1440 = 24*60; the magnitude of the offset must be less
than one day).
"""
def dst(dt):
"""Return the daylight saving time (DST) adjustment, in minutes east
of UTC, or None if DST information isn't known.
"""
def tzname(dt):
"""Return the time zone name corresponding to the datetime object as
a string.
"""
def fromutc(dt):
"""Return an equivalent datetime in self's local time."""
classImplements(timedelta, ITimeDelta)
classImplements(date, IDate)
classImplements(datetime, IDateTime)
classImplements(time, ITime)
classImplements(tzinfo, ITZInfo)
## directlyProvides(timedelta, ITimeDeltaClass)
## directlyProvides(date, IDateClass)
## directlyProvides(datetime, IDateTimeClass)
## directlyProvides(time, ITimeClass)