eol: store and reuse pattern matchers instead of creating in tight loop...
Mads Kiilerich -
r30114:ad43458d default
@@ -1,363 +1,365 b''
1 1 """automatically manage newlines in repository files
2 2
3 3 This extension allows you to manage the type of line endings (CRLF or
4 4 LF) that are used in the repository and in the local working
5 5 directory. That way you can get CRLF line endings on Windows and LF on
6 6 Unix/Mac, thereby letting everybody use their OS native line endings.
7 7
8 8 The extension reads its configuration from a versioned ``.hgeol``
9 9 configuration file found in the root of the working directory. The
10 10 ``.hgeol`` file uses the same syntax as all other Mercurial
11 11 configuration files. It uses two sections, ``[patterns]`` and
12 12 ``[repository]``.
13 13
14 14 The ``[patterns]`` section specifies how line endings should be
15 15 converted between the working directory and the repository. The format is
16 16 specified by a file pattern. The first match is used, so put more
17 17 specific patterns first. The available line endings are ``LF``,
18 18 ``CRLF``, and ``BIN``.
19 19
20 20 Files with the declared format of ``CRLF`` or ``LF`` are always
21 21 checked out and stored in the repository in that format and files
22 22 declared to be binary (``BIN``) are left unchanged. Additionally,
23 23 ``native`` is an alias for checking out in the platform's default line
24 24 ending: ``LF`` on Unix (including Mac OS X) and ``CRLF`` on
25 25 Windows. Note that ``BIN`` (do nothing to line endings) is Mercurial's
26 26 default behavior; it is only needed if you need to override a later,
27 27 more general pattern.
28 28
29 29 The optional ``[repository]`` section specifies the line endings to
30 30 use for files stored in the repository. It has a single setting,
31 31 ``native``, which determines the storage line endings for files
32 32 declared as ``native`` in the ``[patterns]`` section. It can be set to
33 33 ``LF`` or ``CRLF``. The default is ``LF``. For example, this means
34 34 that on Windows, files configured as ``native`` (``CRLF`` by default)
35 35 will be converted to ``LF`` when stored in the repository. Files
36 36 declared as ``LF``, ``CRLF``, or ``BIN`` in the ``[patterns]`` section
37 37 are always stored as-is in the repository.
38 38
39 39 Example versioned ``.hgeol`` file::
40 40
41 41 [patterns]
42 42 **.py = native
43 43 **.vcproj = CRLF
44 44 **.txt = native
45 45 Makefile = LF
46 46 **.jpg = BIN
47 47
48 48 [repository]
49 49 native = LF
50 50
51 51 .. note::
52 52
53 53 The rules will first apply when files are touched in the working
54 54 directory, e.g. by updating to null and back to tip to touch all files.
55 55
56 56 The extension uses an optional ``[eol]`` section read from both the
57 57 normal Mercurial configuration files and the ``.hgeol`` file, with the
58 58 latter overriding the former. You can use that section to control the
59 59 overall behavior. There are three settings:
60 60
61 61 - ``eol.native`` (default ``os.linesep``) can be set to ``LF`` or
62 62 ``CRLF`` to override the default interpretation of ``native`` for
63 63 checkout. This can be used with :hg:`archive` on Unix, say, to
64 64 generate an archive where files have line endings for Windows.
65 65
66 66 - ``eol.only-consistent`` (default True) can be set to False to make
67 67 the extension convert files with inconsistent EOLs. Inconsistent
68 68 means that there is both ``CRLF`` and ``LF`` present in the file.
69 69 Such files are normally not touched under the assumption that they
70 70 have mixed EOLs on purpose.
71 71
72 72 - ``eol.fix-trailing-newline`` (default False) can be set to True to
73 73 ensure that converted files end with an EOL character (either ``\\n``
74 74 or ``\\r\\n`` as per the configured patterns).
75 75
76 76 The extension provides ``cleverencode:`` and ``cleverdecode:`` filters
77 77 like the deprecated win32text extension does. This means that you can
78 78 disable win32text and enable eol and your filters will still work. You
79 79 only need these filters until you have prepared a ``.hgeol`` file.
80 80
81 81 The ``win32text.forbid*`` hooks provided by the win32text extension
82 82 have been unified into a single hook named ``eol.checkheadshook``. The
83 83 hook will look up the expected line endings from the ``.hgeol`` file,
84 84 which means you must migrate to a ``.hgeol`` file first before using
85 85 the hook. ``eol.checkheadshook`` only checks heads; intermediate
86 86 invalid revisions will be pushed. To forbid them completely, use the
87 87 ``eol.checkallhook`` hook. These hooks are best used as
88 88 ``pretxnchangegroup`` hooks.
89 89
90 90 See :hg:`help patterns` for more information about the glob patterns
91 91 used.
92 92 """
93 93
94 94 from __future__ import absolute_import
95 95
96 96 import os
97 97 import re
98 98 from mercurial.i18n import _
99 99 from mercurial import (
100 100 config,
101 101 error,
102 102 extensions,
103 103 match,
104 104 util,
105 105 )
106 106
107 107 # Note for extension authors: ONLY specify testedwith = 'ships-with-hg-core' for
108 108 # extensions which SHIP WITH MERCURIAL. Non-mainline extensions should
109 109 # be specifying the version(s) of Mercurial they are tested with, or
110 110 # leave the attribute unspecified.
111 111 testedwith = 'ships-with-hg-core'
112 112
113 113 # Matches a lone LF, i.e., one that is not part of CRLF.
114 114 singlelf = re.compile('(^|[^\r])\n')
115 115 # Matches a single EOL which can either be a CRLF where repeated CR
116 116 # are removed or a LF. We do not care about old Macintosh files, so a
117 117 # stray CR is an error.
118 118 eolre = re.compile('\r*\n')
119 119
120 120
121 121 def inconsistenteol(data):
122 122 return '\r\n' in data and singlelf.search(data)
123 123
124 124 def tolf(s, params, ui, **kwargs):
125 125 """Filter to convert to LF EOLs."""
126 126 if util.binary(s):
127 127 return s
128 128 if ui.configbool('eol', 'only-consistent', True) and inconsistenteol(s):
129 129 return s
130 130 if (ui.configbool('eol', 'fix-trailing-newline', False)
131 131 and s and s[-1] != '\n'):
132 132 s = s + '\n'
133 133 return eolre.sub('\n', s)
134 134
135 135 def tocrlf(s, params, ui, **kwargs):
136 136 """Filter to convert to CRLF EOLs."""
137 137 if util.binary(s):
138 138 return s
139 139 if ui.configbool('eol', 'only-consistent', True) and inconsistenteol(s):
140 140 return s
141 141 if (ui.configbool('eol', 'fix-trailing-newline', False)
142 142 and s and s[-1] != '\n'):
143 143 s = s + '\n'
144 144 return eolre.sub('\r\n', s)
145 145
146 146 def isbinary(s, params):
147 147 """Filter to do nothing with the file."""
148 148 return s
149 149
150 150 filters = {
151 151 'to-lf': tolf,
152 152 'to-crlf': tocrlf,
153 153 'is-binary': isbinary,
154 154 # The following provide backwards compatibility with win32text
155 155 'cleverencode:': tolf,
156 156 'cleverdecode:': tocrlf
157 157 }
158 158
159 159 class eolfile(object):
160 160 def __init__(self, ui, root, data):
161 161 self._decode = {'LF': 'to-lf', 'CRLF': 'to-crlf', 'BIN': 'is-binary'}
162 162 self._encode = {'LF': 'to-lf', 'CRLF': 'to-crlf', 'BIN': 'is-binary'}
163 163
164 164 self.cfg = config.config()
165 165 # Our files should not be touched. The pattern must be
166 166 # inserted first to override a '** = native' pattern.
167 167 self.cfg.set('patterns', '.hg*', 'BIN', 'eol')
168 168 # We can then parse the user's patterns.
169 169 self.cfg.parse('.hgeol', data)
170 170
171 171 isrepolf = self.cfg.get('repository', 'native') != 'CRLF'
172 172 self._encode['NATIVE'] = isrepolf and 'to-lf' or 'to-crlf'
173 173 iswdlf = ui.config('eol', 'native', os.linesep) in ('LF', '\n')
174 174 self._decode['NATIVE'] = iswdlf and 'to-lf' or 'to-crlf'
175 175
176 176 include = []
177 177 exclude = []
178 self.patterns = []
178 179 for pattern, style in self.cfg.items('patterns'):
179 180 key = style.upper()
180 181 if key == 'BIN':
181 182 exclude.append(pattern)
182 183 else:
183 184 include.append(pattern)
185 m = match.match(root, '', [pattern])
186 self.patterns.append((pattern, key, m))
184 187 # This will match the files for which we need to care
185 188 # about inconsistent newlines.
186 189 self.match = match.match(root, '', [], include, exclude)
187 190
188 191 def copytoui(self, ui):
189 for pattern, style in self.cfg.items('patterns'):
190 key = style.upper()
192 for pattern, key, m in self.patterns:
191 193 try:
192 194 ui.setconfig('decode', pattern, self._decode[key], 'eol')
193 195 ui.setconfig('encode', pattern, self._encode[key], 'eol')
194 196 except KeyError:
195 197 ui.warn(_("ignoring unknown EOL style '%s' from %s\n")
196 % (style, self.cfg.source('patterns', pattern)))
198 % (key, self.cfg.source('patterns', pattern)))
197 199 # eol.only-consistent can be specified in ~/.hgrc or .hgeol
198 200 for k, v in self.cfg.items('eol'):
199 201 ui.setconfig('eol', k, v, 'eol')
200 202
201 203 def checkrev(self, repo, ctx, files):
202 204 failed = []
203 205 for f in (files or ctx.files()):
204 206 if f not in ctx:
205 207 continue
206 for pattern, style in self.cfg.items('patterns'):
207 if not match.match(repo.root, '', [pattern])(f):
208 for pattern, key, m in self.patterns:
209 if not m(f):
208 210 continue
209 target = self._encode[style.upper()]
211 target = self._encode[key]
210 212 data = ctx[f].data()
211 213 if (target == "to-lf" and "\r\n" in data
212 214 or target == "to-crlf" and singlelf.search(data)):
213 215 failed.append((f, target, str(ctx)))
214 216 break
215 217 return failed
216 218
217 219 def parseeol(ui, repo, nodes):
218 220 try:
219 221 for node in nodes:
220 222 try:
221 223 if node is None:
222 224 # Cannot use workingctx.data() since it would load
223 225 # and cache the filters before we configure them.
224 226 data = repo.wfile('.hgeol').read()
225 227 else:
226 228 data = repo[node]['.hgeol'].data()
227 229 return eolfile(ui, repo.root, data)
228 230 except (IOError, LookupError):
229 231 pass
230 232 except error.ParseError as inst:
231 233 ui.warn(_("warning: ignoring .hgeol file due to parse error "
232 234 "at %s: %s\n") % (inst.args[1], inst.args[0]))
233 235 return None
234 236
235 237 def _checkhook(ui, repo, node, headsonly):
236 238 # Get revisions to check and touched files at the same time
237 239 files = set()
238 240 revs = set()
239 241 for rev in xrange(repo[node].rev(), len(repo)):
240 242 revs.add(rev)
241 243 if headsonly:
242 244 ctx = repo[rev]
243 245 files.update(ctx.files())
244 246 for pctx in ctx.parents():
245 247 revs.discard(pctx.rev())
246 248 failed = []
247 249 for rev in revs:
248 250 ctx = repo[rev]
249 251 eol = parseeol(ui, repo, [ctx.node()])
250 252 if eol:
251 253 failed.extend(eol.checkrev(repo, ctx, files))
252 254
253 255 if failed:
254 256 eols = {'to-lf': 'CRLF', 'to-crlf': 'LF'}
255 257 msgs = []
256 258 for f, target, node in sorted(failed):
257 259 msgs.append(_(" %s in %s should not have %s line endings") %
258 260 (f, node, eols[target]))
259 261 raise error.Abort(_("end-of-line check failed:\n") + "\n".join(msgs))
260 262
261 263 def checkallhook(ui, repo, node, hooktype, **kwargs):
262 264 """verify that files have expected EOLs"""
263 265 _checkhook(ui, repo, node, False)
264 266
265 267 def checkheadshook(ui, repo, node, hooktype, **kwargs):
266 268 """verify that files have expected EOLs"""
267 269 _checkhook(ui, repo, node, True)
268 270
269 271 # "checkheadshook" used to be called "hook"
270 272 hook = checkheadshook
271 273
272 274 def preupdate(ui, repo, hooktype, parent1, parent2):
273 275 repo.loadeol([parent1])
274 276 return False
275 277
276 278 def uisetup(ui):
277 279 ui.setconfig('hooks', 'preupdate.eol', preupdate, 'eol')
278 280
279 281 def extsetup(ui):
280 282 try:
281 283 extensions.find('win32text')
282 284 ui.warn(_("the eol extension is incompatible with the "
283 285 "win32text extension\n"))
284 286 except KeyError:
285 287 pass
286 288
287 289
288 290 def reposetup(ui, repo):
289 291 uisetup(repo.ui)
290 292
291 293 if not repo.local():
292 294 return
293 295 for name, fn in filters.iteritems():
294 296 repo.adddatafilter(name, fn)
295 297
296 298 ui.setconfig('patch', 'eol', 'auto', 'eol')
297 299
298 300 class eolrepo(repo.__class__):
299 301
300 302 def loadeol(self, nodes):
301 303 eol = parseeol(self.ui, self, nodes)
302 304 if eol is None:
303 305 return None
304 306 eol.copytoui(self.ui)
305 307 return eol.match
306 308
307 309 def _hgcleardirstate(self):
308 310 self._eolmatch = self.loadeol([None, 'tip'])
309 311 if not self._eolmatch:
310 312 self._eolmatch = util.never
311 313 return
312 314
313 315 try:
314 316 cachemtime = os.path.getmtime(self.join("eol.cache"))
315 317 except OSError:
316 318 cachemtime = 0
317 319
318 320 try:
319 321 eolmtime = os.path.getmtime(self.wjoin(".hgeol"))
320 322 except OSError:
321 323 eolmtime = 0
322 324
323 325 if eolmtime > cachemtime:
324 326 self.ui.debug("eol: detected change in .hgeol\n")
325 327 wlock = None
326 328 try:
327 329 wlock = self.wlock()
328 330 for f in self.dirstate:
329 331 if self.dirstate[f] == 'n':
330 332 # all normal files need to be looked at
331 333 # again since the new .hgeol file might no
332 334 # longer match a file it matched before
333 335 self.dirstate.normallookup(f)
334 336 # Create or touch the cache to update mtime
335 337 self.vfs("eol.cache", "w").close()
336 338 wlock.release()
337 339 except error.LockUnavailable:
338 340 # If we cannot lock the repository and clear the
339 341 # dirstate, then a commit might not see all files
340 342 # as modified. But if we cannot lock the
341 343 # repository, then we can also not make a commit,
342 344 # so ignore the error.
343 345 pass
344 346
345 347 def commitctx(self, ctx, haserror=False):
346 348 for f in sorted(ctx.added() + ctx.modified()):
347 349 if not self._eolmatch(f):
348 350 continue
349 351 fctx = ctx[f]
350 352 if fctx is None:
351 353 continue
352 354 data = fctx.data()
353 355 if util.binary(data):
354 356 # We should not abort here, since the user should
355 357 # be able to say "** = native" to automatically
356 358 # have all non-binary files taken care of.
357 359 continue
358 360 if inconsistenteol(data):
359 361 raise error.Abort(_("inconsistent newline style "
360 362 "in %s\n") % f)
361 363 return super(eolrepo, self).commitctx(ctx, haserror)
362 364 repo.__class__ = eolrepo
363 365 repo._hgcleardirstate()
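
The idea behind this change, building each pattern matcher once and reusing it for every file, can be sketched outside Mercurial roughly as follows. This is a minimal illustration only: `PatternTable` and `style_for` are hypothetical names, and the standard library's `fnmatch` stands in for Mercurial's `match.match()`, whose glob semantics differ (for example, `**` is not supported here).

```python
import fnmatch
import re


class PatternTable(object):
    """Hypothetical stand-in for eolfile: compiles each pattern once."""

    def __init__(self, patterns):
        # patterns: list of (glob, style) pairs, most specific first.
        # fnmatch is an assumption made for this sketch; it is not the
        # matcher the eol extension actually uses.
        self.patterns = [
            (glob, style.upper(), re.compile(fnmatch.translate(glob)).match)
            for glob, style in patterns
        ]

    def style_for(self, filename):
        # The first matching pattern wins, as in the [patterns] section.
        # No matcher is constructed inside this loop.
        for glob, key, m in self.patterns:
            if m(filename):
                return key
        return None


table = PatternTable([('*.vcproj', 'crlf'), ('*.py', 'native'), ('*', 'LF')])
for name in ('app.vcproj', 'setup.py', 'Makefile'):
    print(name, table.style_for(name))
```

Storing the compiled matcher next to the pattern and its upper-cased style mirrors the `(pattern, key, m)` tuples the diff introduces, so per-file checks only pay for matching and not for matcher construction.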