upstream/mercurial-mirror Files · i18n/posplit

worker: don't expose readinto() on _blockingreader since pickle is picky

The `pickle` module expects the input to be buffered and a whole object to be available when `pickle.load()` is called, which is not necessarily true when we send data from workers back to the parent process (i.e., it seems like a bad assumption for the `pickle` module to make). We added a workaround for that in https://phab.mercurial-scm.org/D8076, which made `read()` continue until all the requested bytes have been read.

As we found out at work after a lot of investigation (I've spent the last two days on this), the native version of `pickle.load()` has started calling `readinto()` on the input since Python 3.8. That call was introduced in https://github.com/python/cpython/commit/91f4380cedbae32b49adbea2518014a5624c6523 (and is made only by the C version of `pickle.load()`). Before that, only `read()` and `readline()` were called. The problem was that `readinto()` on our `_blockingreader` simply delegated to the underlying, *unbuffered* object.

The symptom we saw was that `hg fix` started failing sometimes on Python 3.8 on Mac. It failed very reliably in some cases. I still haven't figured out under what circumstances it fails, and I've been unable to reproduce it in test cases (I've tried writing larger amounts of data, using different numbers of workers, and making the formatters sleep). I have, however, been able to reproduce it 3-4 times on Linux, but then it stopped reproducing on the following few hundred attempts.

To fix the problem, we can simply remove the implementation of `readinto()`, since the unpickler will then fall back to calling `read()`. The fallback was added a bit later, in https://github.com/python/cpython/commit/b19f7ecfa3adc6ba1544225317b9473649815b38. However, that commit also added a check that what `read()` returns is a `bytes`, so we also need to convert the `bytearray` we use into that. I was able to add a test for that failure at least.

Differential Revision: https://phab.mercurial-scm.org/D8928
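
A minimal sketch of the fix described in the commit message above, not the actual Mercurial code. The names `BlockingReader` and `ShortReadRaw` are hypothetical; `ShortReadRaw` is only a stand-in for an unbuffered pipe that returns fewer bytes than requested. The wrapper exposes `read()` and `readline()` but deliberately no `readinto()`, so that the unpickler falls back to `read()`, which loops until the requested bytes have arrived and returns `bytes` rather than a `bytearray`.

```python
import io
import pickle


class ShortReadRaw(io.RawIOBase):
    """Hypothetical unbuffered stream that returns at most `chunk` bytes per call."""

    def __init__(self, data, chunk=3):
        super().__init__()
        self._data = data
        self._pos = 0
        self._chunk = chunk

    def readable(self):
        return True

    def readinto(self, b):
        # Simulate a short read: hand back fewer bytes than were asked for.
        n = min(len(b), self._chunk, len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


class BlockingReader:
    """Wrap an unbuffered stream so read(n) returns n bytes unless EOF is hit.

    There is intentionally no readinto() method here.
    """

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def readline(self):
        return self._wrapped.readline()

    def read(self, size=-1):
        if size < 0:
            return self._wrapped.readall()
        buf = bytearray(size)
        view = memoryview(buf)
        pos = 0
        while pos < size:
            ret = self._wrapped.readinto(view[pos:])
            if not ret:  # EOF (0) or no data available (None)
                break
            pos += ret
        del view  # release the buffer export so buf can be resized
        del buf[pos:]
        return bytes(buf)  # the unpickler requires bytes, not bytearray


payload = {"paths": ["a.txt", "b.txt"], "count": 2}
reader = BlockingReader(ShortReadRaw(pickle.dumps(payload)))
restored = pickle.load(reader)
```

Here `restored` should equal `payload` even though the underlying stream never returns more than three bytes per call.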

Gregory Szorc

File last commit: r44089:47ef023d default

r45950:7d24201b default

posplit | 94 lines | 3.2 KiB | text/plain

#!/usr/bin/env python
#
# posplit - split messages in paragraphs on .po/.pot files
#
# license: MIT/X11/Expat
#

from __future__ import absolute_import, print_function

import polib
import re
import sys


def addentry(po, entry, cache):
    e = cache.get(entry.msgid)
    if e:
        e.occurrences.extend(entry.occurrences)

        # merge comments from entry
        for comment in entry.comment.split('\n'):
            if comment and comment not in e.comment:
                if not e.comment:
                    e.comment = comment
                else:
                    e.comment += '\n' + comment
    else:
        po.append(entry)
        cache[entry.msgid] = entry


def mkentry(orig, delta, msgid, msgstr):
    entry = polib.POEntry()
    entry.merge(orig)
    entry.msgid = msgid or orig.msgid
    entry.msgstr = msgstr or orig.msgstr
    entry.occurrences = [(p, int(l) + delta) for (p, l) in orig.occurrences]
    return entry


if __name__ == "__main__":
    po = polib.pofile(sys.argv[1])

    cache = {}
    entries = po[:]
    po[:] = []
    findd = re.compile(r' *\.\. (\w+)::')  # for finding directives
    for entry in entries:
        msgids = entry.msgid.split(u'\n\n')
        if entry.msgstr:
            msgstrs = entry.msgstr.split(u'\n\n')
        else:
            msgstrs = [u''] * len(msgids)

        if len(msgids) != len(msgstrs):
            # places the whole existing translation as a fuzzy
            # translation for each paragraph, to give the
            # translator a chance to recover part of the old
            # translation - erasing extra paragraphs is
            # probably better than retranslating all from start
            if 'fuzzy' not in entry.flags:
                entry.flags.append('fuzzy')
            msgstrs = [entry.msgstr] * len(msgids)

        delta = 0
        for msgid, msgstr in zip(msgids, msgstrs):
            if msgid and msgid != '::':
                newentry = mkentry(entry, delta, msgid, msgstr)
                mdirective = findd.match(msgid)
                if mdirective:
                    if not msgid[mdirective.end() :].rstrip():
                        # only directive, nothing to translate here
                        delta += 2
                        continue
                    directive = mdirective.group(1)
                    if directive in ('container', 'include'):
                        if msgid.rstrip('\n').count('\n') == 0:
                            # only rst syntax, nothing to translate
                            delta += 2
                            continue
                        else:
                            # lines following directly, unexpected
                            print(
                                'Warning: text follows line with directive'
                                ' %s' % directive
                            )
                    comment = 'do not translate: .. %s::' % directive
                    if not newentry.comment:
                        newentry.comment = comment
                    elif comment not in newentry.comment:
                        newentry.comment += '\n' + comment
                addentry(po, newentry, cache)
            delta += 2 + msgid.count('\n')
    po.save()
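
The line-number bookkeeping in the main loop above can be illustrated without `polib`. The following is a simplified sketch, not part of posplit: `split_paragraphs` and `directive_only` are hypothetical helper names, and the sketch covers only the paragraph splitting, the skipping of empty and `::` paragraphs, and the bare-directive check. Each paragraph advances the offset by its own line count plus one blank separator line, which is where `2 + msgid.count('\n')` comes from.

```python
import re

# Same pattern the script uses for finding reStructuredText directives.
FIND_DIRECTIVE = re.compile(r' *\.\. (\w+)::')


def split_paragraphs(msgid, line):
    """Return (line_number, paragraph) pairs for a multi-paragraph msgid.

    `line` is the source line where the whole message starts.
    """
    result = []
    delta = 0
    for para in msgid.split('\n\n'):
        if para and para != '::':
            result.append((line + delta, para))
        # k newlines inside the paragraph means k + 1 lines,
        # plus one blank line separating it from the next paragraph.
        delta += 2 + para.count('\n')
    return result


def directive_only(para):
    """True if the paragraph is a bare directive with nothing to translate."""
    m = FIND_DIRECTIVE.match(para)
    return bool(m) and not para[m.end():].rstrip()


message = "first line\nsecond line\n\n::\n\nthird paragraph"
pairs = split_paragraphs(message, 10)
# pairs == [(10, 'first line\nsecond line'), (15, 'third paragraph')]
```

In this example the message starts at line 10, the `::` paragraph sits on line 13 and is skipped, and the third paragraph is attributed to line 15.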
