localrepo: iteratively derive local repository type

This commit implements the dynamic local repository type derivation that was
explained in the recent commit bfeab472e3c0 "localrepo: create new function
for instantiating a local repo object."

Instead of a static localrepository class/type which must be customized after
construction, we now dynamically construct a type by building up base
classes/types to represent specific repository interfaces.

Conceptually, the end state is similar to what was happening when various
extensions would monkeypatch the __class__ of newly-constructed repo
instances. However, the approach is inverted. Instead of making the instance
then customizing it, we do the customization up front by influencing the
behavior of the type, then we instantiate that custom type.

This approach gives us much more flexibility. For example, we can use
completely separate classes for implementing different aspects of the
repository. For example, we could have one class representing revlog-based
file storage and another representing non-revlog-based file storage. We then
choose which implementation to use based on the presence of repo
requirements.

A concern with this approach is that it creates a lot more types and
complexity, and that complexity adds overhead. Yes, it is true that this
approach will result in more types being created. Yes, this is more
complicated than the traditional "instantiate a static type." However, I
believe the alternatives to supporting alternate storage backends are just as
complicated.

(Before I arrived at this solution, I had patches storing factory functions
on local repo instances for e.g. constructing a file storage instance. We
ended up having a handful of these. And this was logically identical to
assigning custom methods. Since we were logically changing the type of the
instance, I figured it would be better to just use specialized types instead
of introducing levels of abstraction at run-time.)

On the performance front, I don't believe that having N base classes has any
significant performance overhead compared to just a single base class.
Intuition says that Python will need to iterate the base classes to find an
attribute. However, CPython caches method lookups: as long as the __class__
or MRO isn't changing, method attribute lookup should be constant time after
first access. And non-method attributes are stored in __dict__, of which
there is only 1 per object, so the number of base classes for __dict__ is
irrelevant.

Anyway, this commit splits up the monolithic completelocalrepository
interface into sub-interfaces: 1 for file storage and 1 representing
everything else. We've taught ``makelocalrepository()`` to call a series of
factory functions which will produce types implementing specific interfaces.
It then calls type() to create a new type from the built-up list of base
types.

This commit should be considered a start and not the end state. I suspect
we'll hit a number of problems as we start to implement alternate storage
backends:

* Passing custom arguments to __init__ and setting custom attributes on
  __dict__.
* Customizing the set of interfaces that are needed. e.g. the "readonly"
  intent could translate to not requesting an interface providing methods
  related to writing.
* More ergonomic way for extensions to insert themselves so their callbacks
  aren't unconditionally called.
* Wanting to modify vfs instances, other arguments passed to __init__.

That being said, this code is usable in its current state and I'm convinced
future commits will demonstrate the value in this approach.

Differential Revision: https://phab.mercurial-scm.org/D4642
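The type-composition idea described above can be illustrated with a minimal, self-contained sketch. All names here (`baserepository`, `revlogfilestorage`, `shallowfilestorage`, `makerepository`) are hypothetical stand-ins, not the actual Mercurial APIs; the point is only the mechanism: pick base classes from requirements, then build the final type with `type()` before instantiating it.

```python
class baserepository(object):
    """Behavior common to every repository (stand-in)."""
    def close(self):
        pass

class revlogfilestorage(object):
    """Revlog-based file storage interface (stand-in)."""
    def file(self, path):
        return 'revlog store for %s' % path

class shallowfilestorage(object):
    """Alternate, non-revlog file storage interface (stand-in)."""
    def file(self, path):
        return 'shallow store for %s' % path

def makerepository(requirements):
    # Choose base classes up front, based on repo requirements, instead
    # of monkeypatching __class__ after construction.
    bases = [baserepository]
    if 'shallow' in requirements:
        bases.append(shallowfilestorage)
    else:
        bases.append(revlogfilestorage)

    # Build a custom type from the accumulated bases and instantiate it.
    cls = type('derivedrepository', tuple(bases), {})
    return cls()

repo = makerepository({'revlogv1'})
print(repo.file('a.txt'))  # revlog store for a.txt
```

Extensions would participate by contributing additional base classes to the list before `type()` is called, rather than rewriting `__class__` on a live instance.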

File last commit:
r39491:a913d289 default
r39800:e4e88157 default

hgext/lfs/wireprotolfsserver.py | 344 lines | 11.5 KiB | text/x-python
# wireprotolfsserver.py - lfs protocol server side implementation
#
# Copyright 2018 Matt Harbison <matt_harbison@yahoo.com>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

from __future__ import absolute_import

import datetime
import errno
import json
import traceback

from mercurial.hgweb import (
    common as hgwebcommon,
)
from mercurial import (
    pycompat,
    util,
)

from . import blobstore

HTTP_OK = hgwebcommon.HTTP_OK
HTTP_CREATED = hgwebcommon.HTTP_CREATED
HTTP_BAD_REQUEST = hgwebcommon.HTTP_BAD_REQUEST
HTTP_NOT_FOUND = hgwebcommon.HTTP_NOT_FOUND
HTTP_METHOD_NOT_ALLOWED = hgwebcommon.HTTP_METHOD_NOT_ALLOWED
HTTP_NOT_ACCEPTABLE = hgwebcommon.HTTP_NOT_ACCEPTABLE
HTTP_UNSUPPORTED_MEDIA_TYPE = hgwebcommon.HTTP_UNSUPPORTED_MEDIA_TYPE

def handlewsgirequest(orig, rctx, req, res, checkperm):
    """Wrap wireprotoserver.handlewsgirequest() to possibly process an LFS
    request if it is left unprocessed by the wrapped method.
    """
    if orig(rctx, req, res, checkperm):
        return True

    if not rctx.repo.ui.configbool('experimental', 'lfs.serve'):
        return False

    if not util.safehasattr(rctx.repo.svfs, 'lfslocalblobstore'):
        return False

    if not req.dispatchpath:
        return False

    try:
        if req.dispatchpath == b'.git/info/lfs/objects/batch':
            checkperm(rctx, req, 'pull')
            return _processbatchrequest(rctx.repo, req, res)
        # TODO: reserve and use a path in the proposed http wireprotocol /api/
        #       namespace?
        elif req.dispatchpath.startswith(b'.hg/lfs/objects'):
            return _processbasictransfer(rctx.repo, req, res,
                                         lambda perm:
                                         checkperm(rctx, req, perm))
        return False
    except hgwebcommon.ErrorResponse as e:
        # XXX: copied from the handler surrounding wireprotoserver._callhttp()
        #      in the wrapped function.  Should this be moved back to hgweb to
        #      be a common handler?
        for k, v in e.headers:
            res.headers[k] = v
        res.status = hgwebcommon.statusmessage(e.code, pycompat.bytestr(e))
        res.setbodybytes(b'0\n%s\n' % pycompat.bytestr(e))

        return True

def _sethttperror(res, code, message=None):
    res.status = hgwebcommon.statusmessage(code, message=message)
    res.headers[b'Content-Type'] = b'text/plain; charset=utf-8'
    res.setbodybytes(b'')

def _logexception(req):
    """Write information about the current exception to wsgi.errors."""
    tb = pycompat.sysbytes(traceback.format_exc())
    errorlog = req.rawenv[r'wsgi.errors']

    uri = b''
    if req.apppath:
        uri += req.apppath
    uri += b'/' + req.dispatchpath

    errorlog.write(b"Exception happened while processing request '%s':\n%s" %
                   (uri, tb))

def _processbatchrequest(repo, req, res):
    """Handle a request for the Batch API, which is the gateway to granting
    file access.

    https://github.com/git-lfs/git-lfs/blob/master/docs/api/batch.md
    """

    # Mercurial client request:
    #
    #   HOST: localhost:$HGPORT
    #   ACCEPT: application/vnd.git-lfs+json
    #   ACCEPT-ENCODING: identity
    #   USER-AGENT: git-lfs/2.3.4 (Mercurial 4.5.2+1114-f48b9754f04c+20180316)
    #   Content-Length: 125
    #   Content-Type: application/vnd.git-lfs+json
    #
    #   {
    #     "objects": [
    #       {
    #         "oid": "31cf...8e5b",
    #         "size": 12
    #       }
    #     ],
    #     "operation": "upload"
    #   }

    if req.method != b'POST':
        _sethttperror(res, HTTP_METHOD_NOT_ALLOWED)
        return True

    if req.headers[b'Content-Type'] != b'application/vnd.git-lfs+json':
        _sethttperror(res, HTTP_UNSUPPORTED_MEDIA_TYPE)
        return True

    if req.headers[b'Accept'] != b'application/vnd.git-lfs+json':
        _sethttperror(res, HTTP_NOT_ACCEPTABLE)
        return True

    # XXX: specify an encoding?
    lfsreq = json.loads(req.bodyfh.read())

    # If no transfer handlers are explicitly requested, 'basic' is assumed.
    if 'basic' not in lfsreq.get('transfers', ['basic']):
        _sethttperror(res, HTTP_BAD_REQUEST,
                      b'Only the basic LFS transfer handler is supported')
        return True

    operation = lfsreq.get('operation')
    if operation not in ('upload', 'download'):
        _sethttperror(res, HTTP_BAD_REQUEST,
                      b'Unsupported LFS transfer operation: %s' % operation)
        return True

    localstore = repo.svfs.lfslocalblobstore

    objects = [p for p in _batchresponseobjects(req, lfsreq.get('objects', []),
                                                operation, localstore)]

    rsp = {
        'transfer': 'basic',
        'objects': objects,
    }

    res.status = hgwebcommon.statusmessage(HTTP_OK)
    res.headers[b'Content-Type'] = b'application/vnd.git-lfs+json'
    res.setbodybytes(pycompat.bytestr(json.dumps(rsp)))

    return True

def _batchresponseobjects(req, objects, action, store):
    """Yield one dictionary of attributes for the Batch API response for each
    object in the list.

    req: The parsedrequest for the Batch API request
    objects: The list of objects in the Batch API object request list
    action: 'upload' or 'download'
    store: The local blob store for servicing requests"""

    # Successful lfs-test-server response to solicit an upload:
    # {
    #    u'objects': [{
    #       u'size': 12,
    #       u'oid': u'31cf...8e5b',
    #       u'actions': {
    #           u'upload': {
    #               u'href': u'http://localhost:$HGPORT/objects/31cf...8e5b',
    #               u'expires_at': u'0001-01-01T00:00:00Z',
    #               u'header': {
    #                   u'Accept': u'application/vnd.git-lfs'
    #               }
    #           }
    #       }
    #    }]
    # }

    # TODO: Sort out the expires_at/expires_in/authenticated keys.

    for obj in objects:
        # Convert unicode to ASCII to create a filesystem path
        oid = obj.get('oid').encode('ascii')
        rsp = {
            'oid': oid,
            'size': obj.get('size'),  # XXX: should this check the local size?
            #'authenticated': True,
        }

        exists = True
        verifies = False

        # Verify an existing file on the upload request, so that the client is
        # solicited to re-upload if it is corrupt locally.  Download requests
        # are also verified, so the error can be flagged in the Batch API
        # response.  (Maybe we can use this to short circuit the download for
        # `hg verify`, IFF the client can assert that the remote end is an hg
        # server.)  Otherwise, it's potentially overkill on download, since it
        # is also verified as the file is streamed to the caller.
        try:
            verifies = store.verify(oid)
            if verifies and action == 'upload':
                # The client will skip this upload, but make sure it remains
                # available locally.
                store.linkfromusercache(oid)
        except IOError as inst:
            if inst.errno != errno.ENOENT:
                _logexception(req)

                rsp['error'] = {
                    'code': 500,
                    'message': inst.strerror or 'Internal Server Error'
                }
                yield rsp
                continue

            exists = False

        # Items are always listed for downloads.  They are dropped for uploads
        # IFF they already exist locally.
        if action == 'download':
            if not exists:
                rsp['error'] = {
                    'code': 404,
                    'message': "The object does not exist"
                }
                yield rsp
                continue

            elif not verifies:
                rsp['error'] = {
                    'code': 422,   # XXX: is this the right code?
                    'message': "The object is corrupt"
                }
                yield rsp
                continue

        elif verifies:
            yield rsp  # Skip 'actions': already uploaded
            continue

        expiresat = datetime.datetime.now() + datetime.timedelta(minutes=10)

        def _buildheader():
            # The spec doesn't mention the Accept header here, but avoid
            # a gratuitous deviation from lfs-test-server in the test
            # output.
            hdr = {
                'Accept': 'application/vnd.git-lfs'
            }

            auth = req.headers.get('Authorization', '')
            if auth.startswith('Basic '):
                hdr['Authorization'] = auth

            return hdr

        rsp['actions'] = {
            '%s' % action: {
                'href': '%s%s/.hg/lfs/objects/%s'
                        % (req.baseurl, req.apppath, oid),
                # datetime.isoformat() doesn't include the 'Z' suffix
                "expires_at": expiresat.strftime('%Y-%m-%dT%H:%M:%SZ'),
                'header': _buildheader(),
            }
        }

        yield rsp

def _processbasictransfer(repo, req, res, checkperm):
    """Handle a single file upload (PUT) or download (GET) action for the
    Basic Transfer Adapter.

    After determining if the request is for an upload or download, the access
    must be checked by calling ``checkperm()`` with either 'pull' or 'upload'
    before accessing the files.

    https://github.com/git-lfs/git-lfs/blob/master/docs/api/basic-transfers.md
    """

    method = req.method
    oid = req.dispatchparts[-1]
    localstore = repo.svfs.lfslocalblobstore

    if len(req.dispatchparts) != 4:
        _sethttperror(res, HTTP_NOT_FOUND)
        return True

    if method == b'PUT':
        checkperm('upload')

        # TODO: verify Content-Type?

        existed = localstore.has(oid)

        # TODO: how to handle timeouts?  The body proxy handles limiting to
        #       Content-Length, but what happens if a client sends less than
        #       it says it will?

        statusmessage = hgwebcommon.statusmessage
        try:
            localstore.download(oid, req.bodyfh)
            res.status = statusmessage(HTTP_OK if existed else HTTP_CREATED)
        except blobstore.LfsCorruptionError:
            _logexception(req)

            # XXX: Is this the right code?
            res.status = statusmessage(422, b'corrupt blob')

        # There's no payload here, but this is the header that lfs-test-server
        # sends back.  This eliminates some gratuitous test output
        # conditionals.
        res.headers[b'Content-Type'] = b'text/plain; charset=utf-8'
        res.setbodybytes(b'')

        return True
    elif method == b'GET':
        checkperm('pull')

        res.status = hgwebcommon.statusmessage(HTTP_OK)
        res.headers[b'Content-Type'] = b'application/octet-stream'

        try:
            # TODO: figure out how to send back the file in chunks, instead of
            #       reading the whole thing.  (Also figure out how to send
            #       back an error status if an IOError occurs after a partial
            #       write in that case.  Here, everything is read before
            #       starting.)
            res.setbodybytes(localstore.read(oid))
        except blobstore.LfsCorruptionError:
            _logexception(req)

            # XXX: Is this the right code?
            res.status = hgwebcommon.statusmessage(422, b'corrupt blob')
            res.setbodybytes(b'')

        return True
    else:
        _sethttperror(res, HTTP_METHOD_NOT_ALLOWED,
                      message=b'Unsupported LFS transfer method: %s' % method)
        return True
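A note on the ``expires_at`` handling in ``_batchresponseobjects()``: the inline comment observes that ``datetime.isoformat()`` doesn't include the ``'Z'`` suffix, which is why the code formats the timestamp with an explicit ``strftime`` pattern to match lfs-test-server's output. A quick stdlib-only illustration (the timestamp value is arbitrary):

```python
import datetime

# A fixed, arbitrary naive timestamp for demonstration.
ts = datetime.datetime(2018, 3, 16, 12, 0, 0)

# isoformat() on a naive datetime yields no timezone designator...
assert ts.isoformat() == '2018-03-16T12:00:00'

# ...so the server appends 'Z' via an explicit strftime format,
# matching the shape lfs-test-server emits (e.g. '0001-01-01T00:00:00Z').
assert ts.strftime('%Y-%m-%dT%H:%M:%SZ') == '2018-03-16T12:00:00Z'
```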