worker: document poor partitioning scheme impact...
Gregory Szorc -
r28292:3eb7faf6 default
@@ -157,6 +157,28 @@ def partition(lst, nslices):
     The current strategy takes every Nth element from the input. If
     we ever write workers that need to preserve grouping in input
     we should consider allowing callers to specify a partition strategy.
+
+    mpm is not a fan of this partitioning strategy when files are involved.
+    In his words:
+
+    Single-threaded Mercurial makes a point of creating and visiting
+    files in a fixed order (alphabetical). When creating files in order,
+    a typical filesystem is likely to allocate them on nearby regions on
+    disk. Thus, when revisiting in the same order, locality is maximized
+    and various forms of OS and disk-level caching and read-ahead get a
+    chance to work.
+
+    This effect can be quite significant on spinning disks. I discovered it
+    circa Mercurial v0.4 when revlogs were named by hashes of filenames.
+    Tarring a repo and copying it to another disk effectively randomized
+    the revlog ordering on disk by sorting the revlogs by hash and suddenly
+    performance of my kernel checkout benchmark dropped by ~10x because the
+    "working set" of sectors visited no longer fit in the drive's cache and
+    the workload switched from streaming to random I/O.
+
+    What we should really be doing is have workers read filenames from a
+    ordered queue. This preserves locality and also keeps any worker from
+    getting more than one file out of balance.
     '''
     for i in range(nslices):
         yield lst[i::nslices]
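
The contrast the docstring draws can be sketched as follows. This is a minimal illustration under stated assumptions, not Mercurial's worker implementation: it uses threads and the standard `queue` module, and `run_ordered` and `handle` are hypothetical names introduced only for this example. `partition` is reproduced from the diff so the two approaches can be compared side by side.

```python
import queue
import threading


def partition(lst, nslices):
    """Stride partitioning, as in the diff: slice i takes every Nth element."""
    for i in range(nslices):
        yield lst[i::nslices]


def run_ordered(filenames, nworkers, handle):
    """Hypothetical alternative: workers pull names from one ordered queue.

    Names are handed out in sorted order, so the files in flight at any
    moment are neighbours in that order, and an idle worker simply takes
    the next name instead of being bound to a precomputed slice.
    """
    q = queue.Queue()
    for name in sorted(filenames):
        q.put(name)

    def worker():
        while True:
            try:
                name = q.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            handle(name)

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Stride partitioning scatters adjacent names across workers:
# list(partition(list("abcdef"), 2)) == [['a', 'c', 'e'], ['b', 'd', 'f']]
```

With stride partitioning each worker walks its own slice at its own pace, so the combined access pattern can drift away from the sorted order. With a shared queue, names are always dequeued in sorted order, although the order in which workers finish handling them is still not guaranteed.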