upstream/mercurial-mirror Commit - r34332:53133250

```diff
 # This library is free software; you can redistribute it and/or
 # modify it under the terms of the GNU Lesser General Public
 # License as published by the Free Software Foundation; either
 # version 2.1 of the License, or (at your option) any later version.
 #
 # This library is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 # Lesser General Public License for more details.
 #
 # You should have received a copy of the GNU Lesser General Public
 # License along with this library; if not, see
 # <http://www.gnu.org/licenses/>.

 # This file is part of urlgrabber, a high-level cross-protocol url-grabber


 # Modified by Benoit Boissinot:
 #  - fix for digest auth (inspired from urllib2.py @ Python v2.4)
 # Modified by Dirkjan Ochtman:
 #  - import md5 function from a local util module
 # Modified by Augie Fackler:
 #  - add safesend method and use it to prevent broken pipe errors
 #    on large POST requests

 """An HTTP handler for urllib2 that supports HTTP 1.1 and keepalive.

 >>> import urllib2
 >>> from keepalive import HTTPHandler
 >>> keepalive_handler = HTTPHandler()
 >>> opener = urlreq.buildopener(keepalive_handler)
 >>> urlreq.installopener(opener)
 >>>
 >>> fo = urlreq.urlopen('http://www.python.org')

 If a connection to a given host is requested, and all of the existing
 connections are still in use, another connection will be opened. If
 the handler tries to use an existing connection but it fails in some
 way, it will be closed and removed from the pool.

 To remove the handler, simply re-run build_opener with no arguments, and
 install that opener.

 You can explicitly close connections by using the close_connection()
 method of the returned file-like object (described below) or you can
 use the handler methods:

   close_connection(host)
   close_all()
   open_connections()

 NOTE: using the close_connection and close_all methods of the handler
 should be done with care when using multiple threads.
   * there is nothing that prevents another thread from creating new
     connections immediately after connections are closed
   * no checks are done to prevent in-use connections from being closed

 >>> keepalive_handler.close_all()

 EXTRA ATTRIBUTES AND METHODS

   Upon a status of 200, the object returned has a few additional
   attributes and methods, which should not be used if you want to
   remain consistent with the normal urllib2-returned objects:

     close_connection()  -  close the connection to the host
     readlines()         -  you know, readlines()
     status              -  the return status (e.g. 404)
     reason              -  English translation of status (e.g. 'File not found')

   If you want the best of both worlds, use this inside an
   AttributeError-catching try:

   >>> try: status = fo.status
   >>> except AttributeError: status = None

   Unfortunately, these are ONLY there if status == 200, so it's not
   easy to distinguish between non-200 responses. The reason is that
   urllib2 tries to do clever things with error codes 301, 302, 401,
   and 407, and it wraps the object upon return.
 """

 # $Id: keepalive.py,v 1.14 2006/04/04 21:00:32 mstenner Exp $

 from __future__ import absolute_import, print_function

 import errno
 import hashlib
 import socket
 import sys
 import threading

 from .i18n import _
 from . import (
     util,
 )

 httplib = util.httplib
 urlerr = util.urlerr
 urlreq = util.urlreq

 DEBUG = None
```
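The docstring describes the pooling behaviour: reuse an idle connection to the same host, or open another one when all are busy. That behaviour depends on HTTP/1.1 persistent connections. The sketch below illustrates this without using this module. It assumes Python 3's standard `http.client` and `http.server` and a free local port. A throwaway server reports each request's client port, so two requests sent over one `HTTPConnection` should report the same port. `EchoPortHandler` and `fetch_twice` are hypothetical names.

```python
import http.client
import http.server
import threading


class EchoPortHandler(http.server.BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 lets the server keep the TCP connection open
    # between requests instead of closing it after each response.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the client's source port so the caller can tell
        # whether two requests travelled over the same TCP connection.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def fetch_twice():
    """Send two GET requests over one HTTPConnection; return the ports seen."""
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoPortHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    conn = None
    try:
        host, port = server.server_address
        conn = http.client.HTTPConnection(host, port, timeout=5)
        ports = []
        for _ in range(2):
            conn.request("GET", "/")
            resp = conn.getresponse()
            # Read the whole body: http.client will not hand out the next
            # response on this connection until this one is consumed.
            ports.append(int(resp.read()))
        return ports
    finally:
        # Close the client first so the server's handler sees EOF and returns.
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_twice())  # both entries should be the same port number
```

The "read the whole body first" rule plays the same role as this module's `HTTPResponse.close()`. In the module, a pooled connection is marked ready again only after the response reports back through `_request_closed`.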

```diff
 class ConnectionManager(object):
     """
     The connection manager must be able to:
       * keep track of all existing connections
       """
     def __init__(self):
         self._lock = threading.Lock()
         self._hostmap = {} # map hosts to a list of connections
         self._connmap = {} # map connections to host
         self._readymap = {} # map connection to ready state

     def add(self, host, connection, ready):
         self._lock.acquire()
         try:
             if host not in self._hostmap:
                 self._hostmap[host] = []
             self._hostmap[host].append(connection)
             self._connmap[connection] = host
             self._readymap[connection] = ready
         finally:
             self._lock.release()

     def remove(self, connection):
         self._lock.acquire()
         try:
             try:
                 host = self._connmap[connection]
             except KeyError:
                 pass
             else:
                 del self._connmap[connection]
                 del self._readymap[connection]
                 self._hostmap[host].remove(connection)
                 if not self._hostmap[host]: del self._hostmap[host]
         finally:
             self._lock.release()

     def set_ready(self, connection, ready):
         try:
             self._readymap[connection] = ready
         except KeyError:
             pass

     def get_ready_conn(self, host):
         conn = None
         self._lock.acquire()
         try:
             if host in self._hostmap:
                 for c in self._hostmap[host]:
                     if self._readymap[c]:
                         self._readymap[c] = 0
                         conn = c
                         break
         finally:
             self._lock.release()
         return conn

     def get_all(self, host=None):
         if host:
             return list(self._hostmap.get(host, []))
         else:
             return dict(self._hostmap)
```
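`ConnectionManager` keeps three maps guarded by one lock. `get_ready_conn` marks a connection busy inside the same critical section that finds it. That bookkeeping can be restated in Python 3 roughly as below. `ConnectionPool` is a hypothetical name, and this is a sketch, not a drop-in replacement.

One deliberate difference: here `set_ready` takes the lock and only updates connections the pool still tracks. In the listing, assigning to a dict never raises `KeyError`, so its `except KeyError` branch cannot fire. As far as the listing shows, a connection that was already removed would be re-added to `_readymap` alone.

```python
import threading


class ConnectionPool:
    """Hypothetical Python 3 restatement of ConnectionManager's bookkeeping."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hostmap = {}   # host -> list of connections
        self._connmap = {}   # connection -> host
        self._readymap = {}  # connection -> True when idle

    def add(self, host, conn, ready):
        with self._lock:
            self._hostmap.setdefault(host, []).append(conn)
            self._connmap[conn] = host
            self._readymap[conn] = ready

    def remove(self, conn):
        with self._lock:
            host = self._connmap.pop(conn, None)
            if host is None:
                return  # unknown connection: nothing to clean up
            del self._readymap[conn]
            self._hostmap[host].remove(conn)
            if not self._hostmap[host]:
                del self._hostmap[host]

    def set_ready(self, conn, ready):
        with self._lock:
            # Only update connections still tracked by the pool.
            if conn in self._readymap:
                self._readymap[conn] = ready

    def get_ready_conn(self, host):
        # Finding an idle connection and marking it busy happen under one
        # lock, so two threads are never handed the same connection.
        with self._lock:
            for conn in self._hostmap.get(host, []):
                if self._readymap[conn]:
                    self._readymap[conn] = False
                    return conn
            return None
```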

```diff
 class KeepAliveHandler(object):
     def __init__(self):
         self._cm = ConnectionManager()

     #### Connection Management
     def open_connections(self):
         """return a list of connected hosts and the number of connections
         to each. [('foo.com:80', 2), ('bar.org', 1)]"""
         return [(host, len(li)) for (host, li) in self._cm.get_all().items()]

     def close_connection(self, host):
         """close connection(s) to <host>
         host is the host:port spec, as in 'www.cnn.com:8080' as passed in.
         no error occurs if there is no connection to that host."""
         for h in self._cm.get_all(host):
             self._cm.remove(h)
             h.close()

     def close_all(self):
         """close all open connections"""
         for host, conns in self._cm.get_all().iteritems():
             for h in conns:
                 self._cm.remove(h)
                 h.close()

     def _request_closed(self, request, host, connection):
         """tells us that this request is now closed and that the
         connection is ready for another request"""
         self._cm.set_ready(connection, 1)

     def _remove_connection(self, host, connection, close=0):
         if close:
             connection.close()
         self._cm.remove(connection)

     #### Transaction Execution
     def http_open(self, req):
         return self.do_open(HTTPConnection, req)

     def do_open(self, http_class, req):
         host = req.get_host()
         if not host:
             raise urlerr.urlerror('no host given')

         try:
             h = self._cm.get_ready_conn(host)
             while h:
                 r = self._reuse_connection(h, req, host)

                 # if this response is non-None, then it worked and we're
                 # done. Break out, skipping the else block.
                 if r:
                     break

                 # connection is bad - possibly closed by server
                 # discard it and ask for the next free connection
                 h.close()
                 self._cm.remove(h)
                 h = self._cm.get_ready_conn(host)
             else:
                 # no (working) free connections were found. Create a new one.
                 h = http_class(host)
                 if DEBUG:
                     DEBUG.info("creating new connection to %s (%d)",
                                host, id(h))
                 self._cm.add(host, h, 0)
                 self._start_transaction(h, req)
                 r = h.getresponse()
         # The string form of BadStatusLine is the status line. Add some context
         # to make the error message slightly more useful.
         except httplib.BadStatusLine as err:
             raise urlerr.urlerror(_('bad HTTP status line: %s') % err.line)
         except (socket.error, httplib.HTTPException) as err:
             raise urlerr.urlerror(err)

         # if not a persistent connection, don't try to reuse it
         if r.will_close:
             self._cm.remove(h)

         if DEBUG:
             DEBUG.info("STATUS: %s, %s", r.status, r.reason)
         r._handler = self
         r._host = host
         r._url = req.get_full_url()
         r._connection = h
         r.code = r.status
         r.headers = r.msg
         r.msg = r.reason

         return r

     def _reuse_connection(self, h, req, host):
         """start the transaction with a re-used connection
         return a response object (r) upon success or None on failure.
         This does NOT close or remove bad connections in cases where
         it returns. However, if an unexpected exception occurs, it
         will close and remove the connection before re-raising.
         """
         try:
             self._start_transaction(h, req)
             r = h.getresponse()
             # note: just because we got something back doesn't mean it
             # worked. We'll check the version below, too.
         except (socket.error, httplib.HTTPException):
             r = None
         except: # re-raises
             # adding this block just in case we've missed
             # something; we will still raise the exception, but
             # let's try and close the connection and remove it
             # first. We previously got into a nasty loop
             # where an exception was uncaught, and so the
             # connection stayed open. On the next try, the
             # same exception was raised, etc. The trade-off is
             # that it's now possible this call will raise
             # a DIFFERENT exception
             if DEBUG:
                 DEBUG.error("unexpected exception - closing "
                             "connection to %s (%d)", host, id(h))
             self._cm.remove(h)
             h.close()
             raise

         if r is None or r.version == 9:
             # httplib falls back to assuming HTTP 0.9 if it gets a
             # bad header back. This is most likely to happen if
             # the socket has been closed by the server since we
             # last used the connection.
             if DEBUG:
                 DEBUG.info("failed to re-use connection to %s (%d)",
                            host, id(h))
             r = None
         else:
             if DEBUG:
                 DEBUG.info("re-using connection to %s (%d)", host, id(h))

         return r

     def _start_transaction(self, h, req):
         # What follows mostly reimplements HTTPConnection.request()
         # except it adds self.parent.addheaders in the mix and sends headers
         # in a deterministic order (to make testing easier).
         headers = util.sortdict(self.parent.addheaders)
         headers.update(sorted(req.headers.items()))
         headers.update(sorted(req.unredirected_hdrs.items()))
         headers = util.sortdict((n.lower(), v) for n, v in headers.items())
         skipheaders = {}
         for n in ('host', 'accept-encoding'):
             if n in headers:
                 skipheaders['skip_' + n.replace('-', '_')] = 1
         try:
             if req.has_data():
                 data = req.get_data()
                 h.putrequest(
                     req.get_method(), req.get_selector(), **skipheaders)
                 if 'content-type' not in headers:
                     h.putheader('Content-type',
                                 'application/x-www-form-urlencoded')
                 if 'content-length' not in headers:
                     h.putheader('Content-length', '%d' % len(data))
             else:
                 h.putrequest(
                     req.get_method(), req.get_selector(), **skipheaders)
         except socket.error as err:
             raise urlerr.urlerror(err)
         for k, v in headers.items():
             h.putheader(k, v)
         h.endheaders()
         if req.has_data():
             h.send(data)
```
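`do_open` relies on Python's `while ... else`: the `else` branch runs only when the loop ends without `break`, that is, when no idle connection produced a usable response. The selection logic can be pulled into a small function to show this. The sketch is simplified and hypothetical: `open_on_pool` and the `try_reuse`/`create` callables are illustrative names. The pool is only assumed to offer `get_ready_conn`, `add` and `remove`, like `ConnectionManager`.

```python
def open_on_pool(pool, host, try_reuse, create):
    """Pick a connection the way do_open() does (hypothetical sketch).

    pool      -- has get_ready_conn(host), add(host, conn, ready), remove(conn)
    try_reuse -- callable(conn) returning a response, or None if conn is stale
    create    -- callable(host) returning (new_conn, response)
    """
    conn = pool.get_ready_conn(host)
    while conn is not None:
        resp = try_reuse(conn)
        if resp is not None:
            break  # reused an idle connection; the else clause is skipped
        # Stale connection (for example closed by the server): drop it
        # and ask the pool for the next idle one.
        conn.close()
        pool.remove(conn)
        conn = pool.get_ready_conn(host)
    else:
        # The loop ran out of idle connections without a break.
        conn, resp = create(host)
        pool.add(host, conn, False)  # registered as busy, like add(host, h, 0)
    return conn, resp
```

The new connection is registered as busy (`ready` is `0` in the listing). This keeps another thread from being handed it before its response has been read. It becomes ready again only when `HTTPResponse.close()` reports back through `_request_closed`.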

```diff
 class HTTPHandler(KeepAliveHandler, urlreq.httphandler):
     pass
```
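`HTTPHandler` works through inheritance order. `KeepAliveHandler` comes first in the MRO, so its `http_open` takes precedence over the stock handler's. The instance is also still a subclass of the stock HTTP handler, so `build_opener` drops its own default HTTP handler instead of installing two.

Below is a Python 3 sketch with `urllib.request`. `urllib.request.HTTPHandler` is assumed here as the counterpart of `urlreq.httphandler`. `KeepAliveMixin`, `MyHTTPHandler` and `http_handlers` are hypothetical names.

```python
import urllib.request


class KeepAliveMixin:
    """Hypothetical stand-in for KeepAliveHandler; only http_open matters here."""

    def http_open(self, req):
        # A real handler would pick or create a pooled connection here.
        raise NotImplementedError("sketch only")


class MyHTTPHandler(KeepAliveMixin, urllib.request.HTTPHandler):
    """Mixin first, stock handler second, as in HTTPHandler above."""


def http_handlers(opener):
    """Return the handlers on `opener` that are (subclasses of) HTTPHandler."""
    return [h for h in opener.handlers
            if isinstance(h, urllib.request.HTTPHandler)]


if __name__ == "__main__":
    opener = urllib.request.build_opener(MyHTTPHandler())
    print([type(h).__name__ for h in http_handlers(opener)])  # ['MyHTTPHandler']
```

This is also why the docstring's advice works: calling `build_opener()` with no arguments just installs the stock handler again.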

```diff
 class HTTPResponse(httplib.HTTPResponse):
     # we need to subclass HTTPResponse in order to
     # 1) add readline() and readlines() methods
     # 2) add close_connection() methods
     # 3) add info() and geturl() methods

     # in order to add readline(), read must be modified to deal with a
     # buffer. example: readline must read a buffer and then spit back
     # one line at a time. The only real alternative is to read one
     # BYTE at a time (ick). Once something has been read, it can't be
     # put back (ok, maybe it can, but that's even uglier than this),
     # so if you THEN do a normal read, you must first take stuff from
     # the buffer.

     # the read method wraps the original to accommodate buffering,
     # although read() never adds to the buffer.
     # Both readline and readlines have been stolen with almost no
     # modification from socket.py


     def __init__(self, sock, debuglevel=0, strict=0, method=None):
         httplib.HTTPResponse.__init__(self, sock, debuglevel=debuglevel,
                                       strict=True, method=method,
                                       buffering=True)
         self.fileno = sock.fileno
         self.code = None
         self._rbuf = ''
         self._rbufsize = 8096
         self._handler = None # inserted by the handler later
         self._host = None # (same)
         self._url = None # (same)
         self._connection = None # (same)

     _raw_read = httplib.HTTPResponse.read

     def close(self):
         if self.fp:
             self.fp.close()
             self.fp = None
             if self._handler:
                 self._handler._request_closed(self, self._host,
                                               self._connection)

     def close_connection(self):
         self._handler._remove_connection(self._host, self._connection, close=1)
         self.close()

     def info(self):
         return self.headers

     def geturl(self):
         return self._url

     def read(self, amt=None):
         # the _rbuf test is only in this first if for speed. It's not
         # logically necessary
-        if self._rbuf and not amt is None:
+        if self._rbuf and amt is not None:
             L = len(self._rbuf)
             if amt > L:
                 amt -= L
             else:
                 s = self._rbuf[:amt]
                 self._rbuf = self._rbuf[amt:]
                 return s

         s = self._rbuf + self._raw_read(amt)
         self._rbuf = ''
         return s
```
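The only line this commit changes is the first test in `read()`. `not` binds more loosely than `is`, so the old `not amt is None` already parsed as `not (amt is None)`. Behaviour is therefore unchanged; `amt is not None` is simply the spelling PEP 8 recommends. The standard `ast` module can confirm both points (the function names below are hypothetical):

```python
import ast


def agree_on(values):
    """True if `not v is None` and `v is not None` agree for every v."""
    return all((not v is None) == (v is not None) for v in values)


def is_not_of_is(expr):
    """True if `expr` parses as not (<left> is <right>)."""
    node = ast.parse(expr, mode="eval").body
    return (isinstance(node, ast.UnaryOp)
            and isinstance(node.op, ast.Not)
            and isinstance(node.operand, ast.Compare)
            and isinstance(node.operand.ops[0], ast.Is))


if __name__ == "__main__":
    print(agree_on([None, 0, "", 8096]))    # True
    print(is_not_of_is("not amt is None"))  # True: the old spelling
    print(is_not_of_is("amt is not None"))  # False: one `is not` comparison
```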

```diff
     # stolen from Python SVN #68532 to fix issue1088
     def _read_chunked(self, amt):
         chunk_left = self.chunk_left
         parts = []

         while True:
             if chunk_left is None:
                 line = self.fp.readline()
                 i = line.find(';')
                 if i >= 0:
                     line = line[:i] # strip chunk-extensions
                 try:
                     chunk_left = int(line, 16)
                 except ValueError:
                     # close the connection as protocol synchronization is
                     # probably lost
                     self.close()
                     raise httplib.IncompleteRead(''.join(parts))
                 if chunk_left == 0:
                     break
             if amt is None:
                 parts.append(self._safe_read(chunk_left))
             elif amt < chunk_left:
                 parts.append(self._safe_read(amt))
                 self.chunk_left = chunk_left - amt
                 return ''.join(parts)
             elif amt == chunk_left:
                 parts.append(self._safe_read(amt))
                 self._safe_read(2) # toss the CRLF at the end of the chunk
                 self.chunk_left = None
                 return ''.join(parts)
             else:
                 parts.append(self._safe_read(chunk_left))
                 amt -= chunk_left

             # we read the whole chunk, get another
             self._safe_read(2) # toss the CRLF at the end of the chunk
             chunk_left = None

         # read and discard trailer up to the CRLF terminator
         ### note: we shouldn't have any trailers!
         while True:
             line = self.fp.readline()
             if not line:
                 # a vanishingly small number of sites EOF without
                 # sending the trailer
                 break
             if line == '\r\n':
                 break

         # we read everything; close the "file"
         self.close()

         return ''.join(parts)
```
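`_read_chunked` walks the chunked transfer coding, which repeats the following until a zero-size chunk:

- a hexadecimal size line, optionally followed by `;` extensions;
- that many bytes of data;
- a CRLF.

Optional trailer lines then follow, ending in a blank line. Below is a simplified Python 3 sketch that decodes a whole body at once from a binary stream. Unlike the listing, it ignores the `amt` partial-read bookkeeping and raises `ValueError` instead of `IncompleteRead`. `decode_chunked` is a hypothetical name.

```python
import io


def decode_chunked(stream):
    """Decode a complete chunked body read from binary file-like `stream`.

    Simplified sketch: chunk extensions are stripped, trailer lines are
    skipped, and a malformed size line raises ValueError.
    """
    parts = []
    while True:
        line = stream.readline()
        # Drop any ";name=value" chunk extension, then parse the hex size.
        # int() raises ValueError on a malformed (or empty) size line.
        size = int(line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # the last-chunk marker
        parts.append(stream.read(size))
        # Every chunk's data is followed by CRLF.
        if stream.read(2) != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
    # Discard optional trailer lines up to the blank line (or EOF).
    while True:
        line = stream.readline()
        if not line or line == b"\r\n":
            break
    return b"".join(parts)


if __name__ == "__main__":
    body = b"5\r\nhello\r\n7;ext=1\r\n, world\r\n0\r\nX-Trailer: a\r\n\r\n"
    print(decode_chunked(io.BytesIO(body)))  # b'hello, world'
```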

```diff
     def readline(self):
         # Fast path: a line is already available in the read buffer.
         i = self._rbuf.find('\n')
         if i >= 0:
             i += 1
             line = self._rbuf[:i]
             self._rbuf = self._rbuf[i:]
             return line

         # No newline in local buffer. Read until we find one.
         chunks = [self._rbuf]
         i = -1
         readsize = self._rbufsize
         while True:
             new = self._raw_read(readsize)
             if not new:
                 break

             chunks.append(new)
             i = new.find('\n')
             if i >= 0:
                 break

         # We either have exhausted the stream or have a newline in chunks[-1].

         # EOF
         if i == -1:
             self._rbuf = ''
             return ''.join(chunks)

         i += 1
         self._rbuf = chunks[-1][i:]
         chunks[-1] = chunks[-1][:i]
         return ''.join(chunks)

     def readlines(self, sizehint=0):
         total = 0
         list = []
         while True:
             line = self.readline()
             if not line:
                 break
             list.append(line)
             total += len(line)
             if sizehint and total >= sizehint:
                 break
         return list
```

511

512

def safesend(self, str):

512

def safesend(self, str):

513

"""Send `str' to the server.

513

"""Send `str' to the server.

514

515

Shamelessly ripped off from httplib to patch a bad behavior.

515

Shamelessly ripped off from httplib to patch a bad behavior.

516

"""

516

"""

517

# _broken_pipe_resp is an attribute we set in this function

517

# _broken_pipe_resp is an attribute we set in this function

518

# if the socket is closed while we're sending data but

518

# if the socket is closed while we're sending data but

519

# the server sent us a response before hanging up.

519

# the server sent us a response before hanging up.

520

# In that case, we want to pretend to send the rest of the

520

# In that case, we want to pretend to send the rest of the

521

# outgoing data, and then let the user use getresponse()

521

# outgoing data, and then let the user use getresponse()

522

# (which we wrap) to get this last response before

522

# (which we wrap) to get this last response before

523

# opening a new socket.

523

# opening a new socket.

524

if getattr(self, '_broken_pipe_resp', None) is not None:

524

if getattr(self, '_broken_pipe_resp', None) is not None:

525

return

525

return

526

527

if self.sock is None:

527

if self.sock is None:

528

if self.auto_open:

528

if self.auto_open:

529

self.connect()

529

self.connect()

530

else:

530

else:

531

raise httplib.NotConnected

531

raise httplib.NotConnected

532

533

# send the data to the server. if we get a broken pipe, then close

533

# send the data to the server. if we get a broken pipe, then close

534

# the socket. we want to reconnect when somebody tries to send again.

534

# the socket. we want to reconnect when somebody tries to send again.

535

#

535

#

536

# NOTE: we DO propagate the error, though, because we cannot simply

536

# NOTE: we DO propagate the error, though, because we cannot simply

537

# ignore the error... the caller will know if they can retry.

537

# ignore the error... the caller will know if they can retry.

538

if self.debuglevel > 0:

538

if self.debuglevel > 0:

539

print("send:", repr(str))

539

print("send:", repr(str))

540

try:

540

try:

541

blocksize = 8192

541

blocksize = 8192

542

read = getattr(str, 'read', None)

542

read = getattr(str, 'read', None)

543

if read is not None:

543

if read is not None:

544

if self.debuglevel > 0:

544

if self.debuglevel > 0:

545

print("sending a read()able")

545

print("sending a read()able")

546

data = read(blocksize)

546

data = read(blocksize)

547

while data:

547

while data:

548

self.sock.sendall(data)

548

self.sock.sendall(data)

549

data = read(blocksize)

549

data = read(blocksize)

550

else:

550

else:

551

self.sock.sendall(str)

551

self.sock.sendall(str)

552

except socket.error as v:

552

except socket.error as v:

553

reraise = True

553

reraise = True

554

if v[0] == errno.EPIPE: # Broken pipe

554

if v[0] == errno.EPIPE: # Broken pipe

555

if self._HTTPConnection__state == httplib._CS_REQ_SENT:

555

if self._HTTPConnection__state == httplib._CS_REQ_SENT:

556

self._broken_pipe_resp = None

556

self._broken_pipe_resp = None

557

self._broken_pipe_resp = self.getresponse()

557

self._broken_pipe_resp = self.getresponse()

558

reraise = False

558

reraise = False

559

self.close()

559

self.close()

560

if reraise:

560

if reraise:

561

raise

561

raise

562

563

def wrapgetresponse(cls):

563

def wrapgetresponse(cls):

564

"""Wraps getresponse in cls with a broken-pipe sane version.

564

"""Wraps getresponse in cls with a broken-pipe sane version.

565

"""

565

"""

566

def safegetresponse(self):

566

def safegetresponse(self):

567

# In safesend() we might set the _broken_pipe_resp

567

# In safesend() we might set the _broken_pipe_resp

568

# attribute, in which case the socket has already

568

# attribute, in which case the socket has already

569

# been closed and we just need to give them the response

569

# been closed and we just need to give them the response

570

# back. Otherwise, we use the normal response path.

570

# back. Otherwise, we use the normal response path.

571

r = getattr(self, '_broken_pipe_resp', None)

571

r = getattr(self, '_broken_pipe_resp', None)

572

if r is not None:

572

if r is not None:

573

return r

573

return r

574

return cls.getresponse(self)

574

return cls.getresponse(self)

575

safegetresponse.__doc__ = cls.getresponse.__doc__

575

safegetresponse.__doc__ = cls.getresponse.__doc__

576

return safegetresponse

576

return safegetresponse

577

578

class HTTPConnection(httplib.HTTPConnection):

578

class HTTPConnection(httplib.HTTPConnection):

579

# use the modified response class

579

# use the modified response class

580

response_class = HTTPResponse

580

response_class = HTTPResponse

581

send = safesend

581

send = safesend

582

getresponse = wrapgetresponse(httplib.HTTPConnection)

582

getresponse = wrapgetresponse(httplib.HTTPConnection)

583

584

585

#########################################################################

585

#########################################################################

586

##### TEST FUNCTIONS

586

##### TEST FUNCTIONS

587

#########################################################################

587

#########################################################################

588

589

590

def continuity(url):

590

def continuity(url):

591

md5 = hashlib.md5

591

md5 = hashlib.md5

592

format = '%25s: %s'

592

format = '%25s: %s'

593

594

# first fetch the file with the normal http handler

594

# first fetch the file with the normal http handler

595

opener = urlreq.buildopener()

595

opener = urlreq.buildopener()

596

urlreq.installopener(opener)

596

urlreq.installopener(opener)

597

fo = urlreq.urlopen(url)

597

fo = urlreq.urlopen(url)

598

foo = fo.read()

598

foo = fo.read()

599

fo.close()

599

fo.close()

600

m = md5(foo)

600

m = md5(foo)

601

print(format % ('normal urllib', m.hexdigest()))

601

print(format % ('normal urllib', m.hexdigest()))

602

603

# now install the keepalive handler and try again

603

# now install the keepalive handler and try again

604

opener = urlreq.buildopener(HTTPHandler())

604

opener = urlreq.buildopener(HTTPHandler())

605

urlreq.installopener(opener)

605

urlreq.installopener(opener)

606

607

fo = urlreq.urlopen(url)

607

fo = urlreq.urlopen(url)

608

foo = fo.read()

608

foo = fo.read()

609

fo.close()

609

fo.close()

610

m = md5(foo)

610

m = md5(foo)

611

print(format % ('keepalive read', m.hexdigest()))

611

print(format % ('keepalive read', m.hexdigest()))

612

613

fo = urlreq.urlopen(url)

613

fo = urlreq.urlopen(url)

614

foo = ''

614

foo = ''

615

while True:

615

while True:

616

f = fo.readline()

616

f = fo.readline()

617

if f:

617

if f:

618

foo = foo + f

618

foo = foo + f

619

else: break

619

else: break

620

fo.close()

620

fo.close()

621

m = md5(foo)

621

m = md5(foo)

622

print(format % ('keepalive readline', m.hexdigest()))

622

print(format % ('keepalive readline', m.hexdigest()))

623

624

def comp(N, url):

624

def comp(N, url):

625

print(' making %i connections to:\n %s' % (N, url))

625

print(' making %i connections to:\n %s' % (N, url))

626

627

util.stdout.write(' first using the normal urllib handlers')

627

util.stdout.write(' first using the normal urllib handlers')

628

# first use normal opener

628

# first use normal opener

629

opener = urlreq.buildopener()

629

opener = urlreq.buildopener()

630

urlreq.installopener(opener)

630

urlreq.installopener(opener)

631

t1 = fetch(N, url)

631

t1 = fetch(N, url)

632

print(' TIME: %.3f s' % t1)

632

print(' TIME: %.3f s' % t1)

633

634

util.stdout.write(' now using the keepalive handler ')

634

util.stdout.write(' now using the keepalive handler ')

635

# now install the keepalive handler and try again

635

# now install the keepalive handler and try again

636

opener = urlreq.buildopener(HTTPHandler())

636

opener = urlreq.buildopener(HTTPHandler())

637

urlreq.installopener(opener)

637

urlreq.installopener(opener)

638

t2 = fetch(N, url)

638

t2 = fetch(N, url)

639

print(' TIME: %.3f s' % t2)

639

print(' TIME: %.3f s' % t2)

640

print(' improvement factor: %.2f' % (t1 / t2))

640

print(' improvement factor: %.2f' % (t1 / t2))

641

642

def fetch(N, url, delay=0):

642

def fetch(N, url, delay=0):

643

import time

643

import time

644

lens = []

644

lens = []

645

starttime = time.time()

645

starttime = time.time()

646

for i in range(N):

646

for i in range(N):

647

if delay and i > 0:

647

if delay and i > 0:

648

time.sleep(delay)

648

time.sleep(delay)

649

fo = urlreq.urlopen(url)

649

fo = urlreq.urlopen(url)

650

foo = fo.read()

650

foo = fo.read()

651

fo.close()

651

fo.close()

652

lens.append(len(foo))

652

lens.append(len(foo))

653

diff = time.time() - starttime

653

diff = time.time() - starttime

654

655

j = 0

655

j = 0

656

for i in lens[1:]:

656

for i in lens[1:]:

657

j = j + 1

657

j = j + 1

658

if not i == lens[0]:

658

if not i == lens[0]:

659

print("WARNING: inconsistent length on read %i: %i" % (j, i))

659

print("WARNING: inconsistent length on read %i: %i" % (j, i))

660

661

return diff

661

return diff

662

663

def test_timeout(url):

663

def test_timeout(url):

664

global DEBUG

664

global DEBUG

665

dbbackup = DEBUG

665

dbbackup = DEBUG

666

class FakeLogger(object):

666

class FakeLogger(object):

667

def debug(self, msg, *args):

667

def debug(self, msg, *args):

668

print(msg % args)

668

print(msg % args)

669

info = warning = error = debug

669

info = warning = error = debug

670

DEBUG = FakeLogger()

670

DEBUG = FakeLogger()

671

print(" fetching the file to establish a connection")

671

print(" fetching the file to establish a connection")

672

fo = urlreq.urlopen(url)

672

fo = urlreq.urlopen(url)

673

data1 = fo.read()

673

data1 = fo.read()

674

fo.close()

674

fo.close()

675

676

i = 20

676

i = 20

677

print(" waiting %i seconds for the server to close the connection" % i)

677

print(" waiting %i seconds for the server to close the connection" % i)

678

while i > 0:

678

while i > 0:

679

util.stdout.write('\r %2i' % i)

679

util.stdout.write('\r %2i' % i)

680

util.stdout.flush()

680

util.stdout.flush()

681

time.sleep(1)

681

time.sleep(1)

682

i -= 1

682

i -= 1

683

util.stderr.write('\r')

683

util.stderr.write('\r')

684

685

print(" fetching the file a second time")

685

print(" fetching the file a second time")

686

fo = urlreq.urlopen(url)

686

fo = urlreq.urlopen(url)

687

data2 = fo.read()

687

data2 = fo.read()

688

fo.close()

688

fo.close()

689

690

if data1 == data2:

690

if data1 == data2:

691

print(' data are identical')

691

print(' data are identical')

692

else:

692

else:

693

print(' ERROR: DATA DIFFER')

693

print(' ERROR: DATA DIFFER')

694

695

DEBUG = dbbackup

695

DEBUG = dbbackup

696

697

698

def test(url, N=10):

698

def test(url, N=10):

699

print("performing continuity test (making sure stuff isn't corrupted)")

699

print("performing continuity test (making sure stuff isn't corrupted)")

700

continuity(url)

700

continuity(url)

701

print('')

701

print('')

702

print("performing speed comparison")

702

print("performing speed comparison")

703

comp(N, url)

703

comp(N, url)

704

print('')

704

print('')

705

print("performing dropped-connection check")

705

print("performing dropped-connection check")

706

test_timeout(url)

706

test_timeout(url)

707

708

if __name__ == '__main__':

708

if __name__ == '__main__':

709

import time

709

import time

710

try:

710

try:

711

N = int(sys.argv[1])

711

N = int(sys.argv[1])

712

url = sys.argv[2]

712

url = sys.argv[2]

713

except (IndexError, ValueError):

713

except (IndexError, ValueError):

714

print("%s <integer> <url>" % sys.argv[0])

714

print("%s <integer> <url>" % sys.argv[0])

715

else:

715

else:

716

test(url, N)

716

test(url, N)

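The `_rbuf` buffering that `HTTPResponse.read()` and the `readline()`/`readlines()` methods above rely on can be sketched in Python 3 as follows. This is a minimal illustration under stated assumptions, not the module's code: the class name `BufferedLineReader` and its `raw_read` parameter are hypothetical, `raw_read` stands in for `httplib.HTTPResponse.read`, and bytes replace the Python 2 `str` buffers.

```python
import io


class BufferedLineReader(object):
    """Hypothetical helper mirroring HTTPResponse's _rbuf buffering.

    raw_read(n) stands in for httplib.HTTPResponse.read: it returns up to n
    bytes (everything when n is None) and b'' at end of stream.
    """

    def __init__(self, raw_read, bufsize=8192):
        self._raw_read = raw_read
        self._rbuf = b''
        self._rbufsize = bufsize

    def read(self, amt=None):
        # Serve buffered bytes first; read() never adds to the buffer.
        if self._rbuf and amt is not None:
            if amt <= len(self._rbuf):
                s = self._rbuf[:amt]
                self._rbuf = self._rbuf[amt:]
                return s
            amt -= len(self._rbuf)
        s = self._rbuf + self._raw_read(amt)
        self._rbuf = b''
        return s

    def readline(self):
        # Fast path: a complete line is already buffered.
        i = self._rbuf.find(b'\n')
        if i >= 0:
            i += 1
            line, self._rbuf = self._rbuf[:i], self._rbuf[i:]
            return line
        # Otherwise read fixed-size blocks until one contains a newline.
        chunks = [self._rbuf]
        i = -1
        while True:
            new = self._raw_read(self._rbufsize)
            if not new:
                break
            chunks.append(new)
            i = new.find(b'\n')
            if i >= 0:
                break
        if i == -1:  # EOF: return whatever is left
            self._rbuf = b''
            return b''.join(chunks)
        i += 1
        # Keep the bytes after the newline for the next call.
        self._rbuf = chunks[-1][i:]
        chunks[-1] = chunks[-1][:i]
        return b''.join(chunks)

    def readlines(self, sizehint=0):
        # Stop early once at least sizehint bytes have been collected.
        lines, total = [], 0
        while True:
            line = self.readline()
            if not line:
                break
            lines.append(line)
            total += len(line)
            if sizehint and total >= sizehint:
                break
        return lines


if __name__ == '__main__':
    reader = BufferedLineReader(io.BytesIO(b'alpha\nbeta\n').read, bufsize=4)
    print(reader.readlines())
```

A small `bufsize` makes it easy to see a line straddle two raw reads: the bytes after the newline stay in `_rbuf` and are returned first by the next `read()` or `readline()`.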

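The chunked-transfer loop in `_read_chunked()`, including the trailer-discarding step at the top of this section, reduces to roughly the following Python 3 sketch. It assumes a binary file-like object with `readline()` and `read()`, ignores the `amt` argument, and raises `ValueError` where the original raises `httplib.IncompleteRead`; `read_chunked` is a hypothetical name.

```python
import io


def read_chunked(fp):
    """Decode a chunked HTTP/1.1 body from the binary file-like object fp.

    Hypothetical, simplified sketch of HTTPResponse._read_chunked: no `amt`
    handling, and ValueError stands in for httplib.IncompleteRead.
    """
    parts = []
    while True:
        line = fp.readline()
        i = line.find(b';')
        if i >= 0:
            line = line[:i]  # strip chunk-extensions
        try:
            size = int(line, 16)
        except ValueError:
            raise ValueError('malformed chunk-size line: %r' % (line,))
        if size == 0:
            break
        parts.append(fp.read(size))
        fp.read(2)  # toss the CRLF at the end of the chunk

    # Read and discard the trailer up to the CRLF terminator; a few servers
    # hit EOF without sending it, so an empty read also ends the loop.
    while True:
        line = fp.readline()
        if not line or line == b'\r\n':
            break
    return b''.join(parts)


if __name__ == '__main__':
    body = io.BytesIO(b'4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n')
    print(read_chunked(body))
```

The zero-size chunk ends the body; anything after it is trailer headers, which this sketch (like the original) reads and throws away.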

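The broken-pipe handling split across `safesend()` and `wrapgetresponse()` can be illustrated with a self-contained Python 3 sketch. `FakeSock` and `SafeSendConnection` are hypothetical stand-ins, `read_response` is an assumed callable playing the role of `getresponse()`, and the sketch omits the `_CS_REQ_SENT` state check, so it treats every `EPIPE` as "the server already answered".

```python
import errno


class FakeSock(object):
    """Hypothetical socket that accepts `limit` bytes, then fails with EPIPE."""

    def __init__(self, limit):
        self.limit = limit
        self.sent = b''
        self.closed = False

    def sendall(self, data):
        if len(self.sent) + len(data) > self.limit:
            raise OSError(errno.EPIPE, 'Broken pipe')
        self.sent += data

    def close(self):
        self.closed = True


class SafeSendConnection(object):
    """Hypothetical sketch of the safesend()/wrapgetresponse() pattern.

    read_response() is an assumed callable returning whatever the server
    sent before hanging up (keepalive.py calls self.getresponse() here).
    """

    def __init__(self, sock, read_response):
        self.sock = sock
        self._read_response = read_response
        self._broken_pipe_resp = None

    def send(self, data):
        # Once an early response is stashed, pretend the rest was sent.
        if self._broken_pipe_resp is not None:
            return
        try:
            self.sock.sendall(data)
        except OSError as err:
            if err.errno != errno.EPIPE:
                raise  # only broken pipes are treated specially
            # Keep the server's early reply, then drop the dead socket.
            self._broken_pipe_resp = self._read_response()
            self.sock.close()

    def getresponse(self):
        # Return the stashed reply if send() captured one.
        if self._broken_pipe_resp is not None:
            return self._broken_pipe_resp
        return self._read_response()


if __name__ == '__main__':
    conn = SafeSendConnection(FakeSock(limit=4), lambda: '413 Payload Too Large')
    for piece in (b'POST', b' large body', b' more'):
        conn.send(piece)
    print(conn.getresponse())
```

The point of the pattern is that a large POST rejected early by the server surfaces as that server's response rather than as a bare `EPIPE` error.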
             #   This library is free software; you can redistribute it and/or
             #   modify it under the terms of the GNU Lesser General Public
             #   License as published by the Free Software Foundation; either
             #   version 2.1 of the License, or (at your option) any later version.
             #
             #   This library is distributed in the hope that it will be useful,
             #   but WITHOUT ANY WARRANTY; without even the implied warranty of
             #   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
             #   Lesser General Public License for more details.
             #
             #   You should have received a copy of the GNU Lesser General Public
             #   License along with this library; if not, see
             #   <http://www.gnu.org/licenses/>.
             # This file is part of urlgrabber, a high-level cross-protocol url-grabber
             # Copyright 2002-2004 Michael D. Stenner, Ryan Tomayko
             # Modified by Benoit Boissinot:
             #  - fix for digest auth (inspired from urllib2.py @ Python v2.4)
             # Modified by Dirkjan Ochtman:
             #  - import md5 function from a local util module
             # Modified by Augie Fackler:
             #  - add safesend method and use it to prevent broken pipe errors
             #    on large POST requests
             """An HTTP handler for urllib2 that supports HTTP 1.1 and keepalive.
             >>> import urllib2
             >>> from keepalive import HTTPHandler
             >>> keepalive_handler = HTTPHandler()
             >>> opener = urlreq.buildopener(keepalive_handler)
             >>> urlreq.installopener(opener)
             >>>
             >>> fo = urlreq.urlopen('http://www.python.org')
             If a connection to a given host is requested, and all of the existing
             connections are still in use, another connection will be opened.  If
             the handler tries to use an existing connection but it fails in some
             way, it will be closed and removed from the pool.
             To remove the handler, simply re-run build_opener with no arguments, and
             install that opener.
             You can explicitly close connections by using the close_connection()
             method of the returned file-like object (described below) or you can
             use the handler methods:
               close_connection(host)
               close_all()
               open_connections()
             NOTE: using the close_connection and close_all methods of the handler
             should be done with care when using multiple threads.
               * there is nothing that prevents another thread from creating new
                 connections immediately after connections are closed
               * no checks are done to prevent in-use connections from being closed
             >>> keepalive_handler.close_all()
             EXTRA ATTRIBUTES AND METHODS
               Upon a status of 200, the object returned has a few additional
               attributes and methods, which should not be used if you want to
               remain consistent with the normal urllib2-returned objects:
                 close_connection()  -  close the connection to the host
                 readlines()         -  you know, readlines()
                 status              -  the return status (e.g. 404)
                 reason              -  English translation of status (e.g. 'File not found')
               If you want the best of both worlds, use this inside an
               AttributeError-catching try:
               >>> try: status = fo.status
               >>> except AttributeError: status = None
               Unfortunately, these are ONLY there if status == 200, so it's not
               easy to distinguish between non-200 responses.  The reason is that
               urllib2 tries to do clever things with error codes 301, 302, 401,
               and 407, and it wraps the object upon return.
             """
             # $Id: keepalive.py,v 1.14 2006/04/04 21:00:32 mstenner Exp $
             from __future__ import absolute_import, print_function
             import errno
             import hashlib
             import socket
             import sys
             import threading
             from .i18n import _
             from . import (
                 util,
             )
             httplib = util.httplib
             urlerr = util.urlerr
             urlreq = util.urlreq
             DEBUG = None
             class ConnectionManager(object):
                 """
                 The connection manager must be able to:
                   * keep track of all existing
                   """
                 def __init__(self):
                     self._lock = threading.Lock()
                     self._hostmap = {} # map hosts to a list of connections
                     self._connmap = {} # map connections to host
                     self._readymap = {} # map connection to ready state
                 def add(self, host, connection, ready):
                     self._lock.acquire()
                     try:
                         if host not in self._hostmap:
                             self._hostmap[host] = []
                         self._hostmap[host].append(connection)
                         self._connmap[connection] = host
                         self._readymap[connection] = ready
                     finally:
                         self._lock.release()
                 def remove(self, connection):
                     self._lock.acquire()
                     try:
                         try:
                             host = self._connmap[connection]
                         except KeyError:
                             pass
                         else:
                             del self._connmap[connection]
                             del self._readymap[connection]
                             self._hostmap[host].remove(connection)
                             if not self._hostmap[host]: del self._hostmap[host]
                     finally:
                         self._lock.release()
                 def set_ready(self, connection, ready):
                     try:
                         self._readymap[connection] = ready
                     except KeyError:
                         pass
                 def get_ready_conn(self, host):
                     conn = None
                     self._lock.acquire()
                     try:
                         if host in self._hostmap:
                             for c in self._hostmap[host]:
                                 if self._readymap[c]:
                                     self._readymap[c] = 0
                                     conn = c
                                     break
                     finally:
                         self._lock.release()
                     return conn
                 def get_all(self, host=None):
                     if host:
                         return list(self._hostmap.get(host, []))
                     else:
                         return dict(self._hostmap)
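The bookkeeping `ConnectionManager` performs (a host-to-connections map, a reverse map, and a per-connection ready flag, all guarded by one lock) might look like this in Python 3. `HostPool` and its method names are hypothetical; this version also takes the lock in `set_ready()`, only updates connections it already tracks, and returns copies from `get_all()`.

```python
import threading


class HostPool(object):
    """Hypothetical sketch of ConnectionManager's bookkeeping.

    Connections are grouped per host, and each carries a ready flag;
    get_ready() claims an idle connection by flipping that flag under the lock.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._hostmap = {}   # host -> list of connections
        self._connmap = {}   # connection -> host
        self._readymap = {}  # connection -> True when idle

    def add(self, host, conn, ready):
        with self._lock:
            self._hostmap.setdefault(host, []).append(conn)
            self._connmap[conn] = host
            self._readymap[conn] = ready

    def remove(self, conn):
        with self._lock:
            host = self._connmap.pop(conn, None)
            if host is None:
                return  # unknown connection: nothing to do
            del self._readymap[conn]
            self._hostmap[host].remove(conn)
            if not self._hostmap[host]:
                del self._hostmap[host]

    def set_ready(self, conn, ready):
        with self._lock:
            if conn in self._readymap:  # ignore connections we don't track
                self._readymap[conn] = ready

    def get_ready(self, host):
        with self._lock:
            for conn in self._hostmap.get(host, []):
                if self._readymap[conn]:
                    self._readymap[conn] = False  # mark busy before handing out
                    return conn
        return None

    def get_all(self, host=None):
        # Return copies so callers can iterate while connections are removed.
        with self._lock:
            if host:
                return list(self._hostmap.get(host, []))
            return {h: list(c) for h, c in self._hostmap.items()}
```

Flipping the flag inside `get_ready()` is what keeps two threads from being handed the same idle connection.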
             class KeepAliveHandler(object):
                 def __init__(self):
                     self._cm = ConnectionManager()
                 #### Connection Management
                 def open_connections(self):
                     """return a list of connected hosts and the number of connections
                     to each.  [('foo.com:80', 2), ('bar.org', 1)]"""
                     return [(host, len(li)) for (host, li) in self._cm.get_all().items()]
                 def close_connection(self, host):
                     """close connection(s) to <host>
                     host is the host:port spec, as in 'www.cnn.com:8080' as passed in.
                     no error occurs if there is no connection to that host."""
                     for h in self._cm.get_all(host):
                         self._cm.remove(h)
                         h.close()
                 def close_all(self):
                     """close all open connections"""
                     for host, conns in self._cm.get_all().iteritems():
                         for h in conns:
                             self._cm.remove(h)
                             h.close()
                 def _request_closed(self, request, host, connection):
                     """tells us that this request is now closed and that the
                     connection is ready for another request"""
                     self._cm.set_ready(connection, 1)
                 def _remove_connection(self, host, connection, close=0):
                     if close:
                         connection.close()
                     self._cm.remove(connection)
                 #### Transaction Execution
                 def http_open(self, req):
                     return self.do_open(HTTPConnection, req)
                 def do_open(self, http_class, req):
                     host = req.get_host()
                     if not host:
                         raise urlerr.urlerror('no host given')
                     try:
                         h = self._cm.get_ready_conn(host)
                         while h:
                             r = self._reuse_connection(h, req, host)
                             # if this response is non-None, then it worked and we're
                             # done.  Break out, skipping the else block.
                             if r:
                                 break
                             # connection is bad - possibly closed by server
                             # discard it and ask for the next free connection
                             h.close()
                             self._cm.remove(h)
                             h = self._cm.get_ready_conn(host)
                         else:
                             # no (working) free connections were found.  Create a new one.
                             h = http_class(host)
                             if DEBUG:
                                 DEBUG.info("creating new connection to %s (%d)",
                                            host, id(h))
                             self._cm.add(host, h, 0)
                             self._start_transaction(h, req)
                             r = h.getresponse()
                     # The string form of BadStatusLine is the status line. Add some context
                     # to make the error message slightly more useful.
                     except httplib.BadStatusLine as err:
                         raise urlerr.urlerror(_('bad HTTP status line: %s') % err.line)
                     except (socket.error, httplib.HTTPException) as err:
                         raise urlerr.urlerror(err)
                     # if not a persistent connection, don't try to reuse it
                     if r.will_close:
                         self._cm.remove(h)
                     if DEBUG:
                         DEBUG.info("STATUS: %s, %s", r.status, r.reason)
                     r._handler = self
                     r._host = host
                     r._url = req.get_full_url()
                     r._connection = h
                     r.code = r.status
                     r.headers = r.msg
                     r.msg = r.reason
                     return r
                 def _reuse_connection(self, h, req, host):
                     """start the transaction with a re-used connection
                     return a response object (r) upon success or None on failure.
                     This DOES not close or remove bad connections in cases where
                     it returns.  However, if an unexpected exception occurs, it
                     will close and remove the connection before re-raising.
                     """
                     try:
                         self._start_transaction(h, req)
                         r = h.getresponse()
                         # note: just because we got something back doesn't mean it
                         # worked.  We'll check the version below, too.
                     except (socket.error, httplib.HTTPException):
                         r = None
                     except: # re-raises
                         # adding this block just in case we've missed
                         # something we will still raise the exception, but
                         # let's try to close the connection and remove it
                         # first.  We previously got into a nasty loop
                         # where an exception was uncaught, and so the
                         # connection stayed open.  On the next try, the
                         # same exception was raised, etc.  The trade-off is
                         # that it's now possible this call will raise
                         # a DIFFERENT exception
                         if DEBUG:
                             DEBUG.error("unexpected exception - closing "
                                         "connection to %s (%d)", host, id(h))
                         self._cm.remove(h)
                         h.close()
                         raise
                     if r is None or r.version == 9:
                         # httplib falls back to assuming HTTP 0.9 if it gets a
                         # bad header back.  This is most likely to happen if
                         # the socket has been closed by the server since we
                         # last used the connection.
                         if DEBUG:
                             DEBUG.info("failed to re-use connection to %s (%d)",
                                        host, id(h))
                         r = None
                     else:
                         if DEBUG:
                             DEBUG.info("re-using connection to %s (%d)", host, id(h))
                     return r
                 def _start_transaction(self, h, req):
                     # What follows mostly reimplements HTTPConnection.request()
                     # except it adds self.parent.addheaders in the mix and sends headers
                     # in a deterministic order (to make testing easier).
                     headers = util.sortdict(self.parent.addheaders)
                     headers.update(sorted(req.headers.items()))
                     headers.update(sorted(req.unredirected_hdrs.items()))
                     headers = util.sortdict((n.lower(), v) for n, v in headers.items())
                     skipheaders = {}
                     for n in ('host', 'accept-encoding'):
                         if n in headers:
                             skipheaders['skip_' + n.replace('-', '_')] = 1
                     try:
                         if req.has_data():
                             data = req.get_data()
                             h.putrequest(
                                 req.get_method(), req.get_selector(), **skipheaders)
                             if 'content-type' not in headers:
                                 h.putheader('Content-type',
                                             'application/x-www-form-urlencoded')
                             if 'content-length' not in headers:
                                 h.putheader('Content-length', '%d' % len(data))
                         else:
                             h.putrequest(
                                 req.get_method(), req.get_selector(), **skipheaders)
                     except socket.error as err:
                         raise urlerr.urlerror(err)
                     for k, v in headers.items():
                         h.putheader(k, v)
                     h.endheaders()
                     if req.has_data():
                         h.send(data)
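The connection-selection strategy in `do_open()` (reuse idle connections, discard any that fail, otherwise open a new one) is sketched here with injected callables so it runs without a network. `SimplePool`, `open_with_reuse`, `try_reuse`, and `make_conn` are hypothetical; the real handler also closes each dead connection and drops connections whose response says `will_close`.

```python
class SimplePool(object):
    """Tiny single-threaded stand-in for ConnectionManager (hypothetical)."""

    def __init__(self):
        self.conns = {}  # conn -> [host, ready]

    def add(self, host, conn, ready):
        self.conns[conn] = [host, ready]

    def remove(self, conn):
        self.conns.pop(conn, None)

    def get_ready(self, host):
        for conn, state in self.conns.items():
            if state[0] == host and state[1]:
                state[1] = False  # claim it
                return conn
        return None


def open_with_reuse(pool, host, try_reuse, make_conn):
    """Hypothetical sketch of do_open()'s connection selection.

    try_reuse(conn) returns a response, or None if the connection is unusable;
    make_conn(host) returns a (conn, response) pair for a new connection.
    """
    conn = pool.get_ready(host)
    while conn is not None:
        resp = try_reuse(conn)
        if resp is not None:
            return conn, resp
        # Probably closed by the server: discard it and try the next idle one.
        pool.remove(conn)
        conn = pool.get_ready(host)
    # No working idle connection was found: create one and track it as busy.
    conn, resp = make_conn(host)
    pool.add(host, conn, False)
    return conn, resp
```

A connection handed out here stays busy until something like `_request_closed()` marks it ready again.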
             class HTTPHandler(KeepAliveHandler, urlreq.httphandler):
                 pass
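`_start_transaction()` merges `self.parent.addheaders`, `req.headers`, and `req.unredirected_hdrs` into one ordered mapping so headers go out in a deterministic order, and it sets `skip_host`/`skip_accept_encoding` when those headers are supplied. A rough Python 3 equivalent follows; `merge_headers` and `_set_last` are hypothetical, and the pop-then-set helper reproduces the last-set ordering we assume `util.sortdict` provides.

```python
def _set_last(d, key, value):
    # Re-inserting moves the key to the end ("last-set" order).
    d.pop(key, None)
    d[key] = value


def merge_headers(default_headers, req_headers, unredirected_headers):
    """Hypothetical sketch of _start_transaction()'s header merge.

    default_headers: list of (name, value) pairs, like opener.addheaders.
    req_headers, unredirected_headers: dicts, applied in sorted key order.
    Returns (headers keyed by lower-case name, putrequest skip_* flags).
    """
    headers = {}
    for name, value in default_headers:
        _set_last(headers, name, value)
    for source in (req_headers, unredirected_headers):
        for name, value in sorted(source.items()):
            _set_last(headers, name, value)
    merged = {}
    for name, value in headers.items():
        _set_last(merged, name.lower(), value)
    # Tell putrequest() not to emit headers the caller already supplied.
    skip = {}
    for name in ('host', 'accept-encoding'):
        if name in merged:
            skip['skip_' + name.replace('-', '_')] = 1
    return merged, skip
```

Sorting each request dict before applying it is what makes the wire order independent of dict iteration order, which keeps recorded test output stable.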
             class HTTPResponse(httplib.HTTPResponse):
                 # we need to subclass HTTPResponse in order to
                 # 1) add readline() and readlines() methods
                 # 2) add close_connection() methods
                 # 3) add info() and geturl() methods
                 # in order to add readline(), read must be modified to deal with a
                 # buffer.  example: readline must read a buffer and then spit back
                 # one line at a time.  The only real alternative is to read one
                 # BYTE at a time (ick).  Once something has been read, it can't be
                 # put back (ok, maybe it can, but that's even uglier than this),
                 # so if you THEN do a normal read, you must first take stuff from
                 # the buffer.
                 # the read method wraps the original to accommodate buffering,
                 # although read() never adds to the buffer.
                 # Both readline and readlines have been stolen with almost no
                 # modification from socket.py
                 def __init__(self, sock, debuglevel=0, strict=0, method=None):
                     httplib.HTTPResponse.__init__(self, sock, debuglevel=debuglevel,
                                                   strict=True, method=method,
                                                   buffering=True)
                     self.fileno = sock.fileno
                     self.code = None
                     self._rbuf = ''
                     self._rbufsize = 8096
                     self._handler = None # inserted by the handler later
                     self._host = None    # (same)
                     self._url = None     # (same)
                     self._connection = None # (same)
                 _raw_read = httplib.HTTPResponse.read
                 def close(self):
                     if self.fp:
                         self.fp.close()
                         self.fp = None
                         if self._handler:
                             self._handler._request_closed(self, self._host,
                                                           self._connection)
                 def close_connection(self):
                     self._handler._remove_connection(self._host, self._connection, close=1)
                     self.close()
                 def info(self):
                     return self.headers
                 def geturl(self):
                     return self._url
                 def read(self, amt=None):
                     # The _rbuf test in this first if is only there for speed; it
                     # is not logically necessary.
-                    if self._rbuf and not amt is None:
+                    if self._rbuf and amt is not None:
                         L = len(self._rbuf)
                         if amt > L:
                             amt -= L
                         else:
                             s = self._rbuf[:amt]
                             self._rbuf = self._rbuf[amt:]
                             return s
                     s = self._rbuf + self._raw_read(amt)
                     self._rbuf = ''
                     return s
                 # stolen from Python SVN #68532 to fix issue1088
                 def _read_chunked(self, amt):
                     chunk_left = self.chunk_left
                     parts = []
                     while True:
                         if chunk_left is None:
                             line = self.fp.readline()
                             i = line.find(';')
                             if i >= 0:
                                 line = line[:i] # strip chunk-extensions
                             try:
                                 chunk_left = int(line, 16)
                             except ValueError:
                                 # close the connection as protocol synchronization is
                                 # probably lost
                                 self.close()
                                 raise httplib.IncompleteRead(''.join(parts))
                             if chunk_left == 0:
                                 break
                         if amt is None:
                             parts.append(self._safe_read(chunk_left))
                         elif amt < chunk_left:
                             parts.append(self._safe_read(amt))
                             self.chunk_left = chunk_left - amt
                             return ''.join(parts)
                         elif amt == chunk_left:
                             parts.append(self._safe_read(amt))
                             self._safe_read(2)  # toss the CRLF at the end of the chunk
                             self.chunk_left = None
                             return ''.join(parts)
                         else:
                             parts.append(self._safe_read(chunk_left))
                             amt -= chunk_left
                         # we read the whole chunk, get another
                         self._safe_read(2)      # toss the CRLF at the end of the chunk
                         chunk_left = None
                     # read and discard trailer up to the CRLF terminator
                     ### note: we shouldn't have any trailers!
                     while True:
                         line = self.fp.readline()
                         if not line:
                             # a vanishingly small number of sites EOF without
                             # sending the trailer
                             break
                         if line == '\r\n':
                             break
                     # we read everything; close the "file"
                     self.close()
                     return ''.join(parts)
                 def readline(self):
                     # Fast path: a line is already available in the read buffer.
                     i = self._rbuf.find('\n')
                     if i >= 0:
                         i += 1
                         line = self._rbuf[:i]
                         self._rbuf = self._rbuf[i:]
                         return line
                     # No newline in local buffer. Read until we find one.
                     chunks = [self._rbuf]
                     i = -1
                     readsize = self._rbufsize
                     while True:
                         new = self._raw_read(readsize)
                         if not new:
                             break
                         chunks.append(new)
                         i = new.find('\n')
                         if i >= 0:
                             break
                     # We either have exhausted the stream or have a newline in chunks[-1].
                     # EOF
                     if i == -1:
                         self._rbuf = ''
                         return ''.join(chunks)
                     i += 1
                     self._rbuf = chunks[-1][i:]
                     chunks[-1] = chunks[-1][:i]
                     return ''.join(chunks)
                 def readlines(self, sizehint=0):
                     total = 0
                     list = []
                     while True:
                         line = self.readline()
                         if not line:
                             break
                         list.append(line)
                         total += len(line)
                         if sizehint and total >= sizehint:
                             break
                     return list
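The buffering scheme described in the comment at the top of HTTPResponse can be sketched without httplib. In the hypothetical `BufferedLineReader` below, `read()` serves leftover bytes from `_rbuf` first but never adds to it, while `readline()` pulls fixed-size blocks until it sees a newline and keeps whatever follows that newline for the next call. It wraps any object with a `read(n)` method (here an `io.BytesIO`) and uses Python 3 bytes where the original uses Python 2 `str`; it is an illustration of the technique, not a drop-in replacement.

```python
import io


class BufferedLineReader(object):
    """Hypothetical sketch of HTTPResponse's _rbuf buffering."""

    def __init__(self, raw, bufsize=8192):
        self._raw_read = raw.read  # unbuffered read(n) of the wrapped stream
        self._rbuf = b''           # bytes read ahead but not yet returned
        self._rbufsize = bufsize

    def read(self, amt=None):
        # Serve buffered bytes first; read() never refills the buffer.
        if self._rbuf and amt is not None:
            if amt <= len(self._rbuf):
                s = self._rbuf[:amt]
                self._rbuf = self._rbuf[amt:]
                return s
            amt -= len(self._rbuf)
        s = self._rbuf + self._raw_read(-1 if amt is None else amt)
        self._rbuf = b''
        return s

    def readline(self):
        # Fast path: a complete line is already buffered.
        i = self._rbuf.find(b'\n')
        if i >= 0:
            i += 1
            line, self._rbuf = self._rbuf[:i], self._rbuf[i:]
            return line
        # Otherwise read blocks until one contains a newline, or EOF.
        chunks = [self._rbuf]
        i = -1
        while True:
            new = self._raw_read(self._rbufsize)
            if not new:
                break
            chunks.append(new)
            i = new.find(b'\n')
            if i >= 0:
                break
        if i == -1:  # EOF: return whatever is left, newline or not
            self._rbuf = b''
            return b''.join(chunks)
        i += 1
        self._rbuf = chunks[-1][i:]  # keep the read-ahead for later calls
        chunks[-1] = chunks[-1][:i]
        return b''.join(chunks)


r = BufferedLineReader(io.BytesIO(b'first\nsecond\nrest'), bufsize=4)
print(r.readline())  # b'first\n'
print(r.read(3))     # b'sec'
print(r.read())      # b'ond\nrest'
```

A tiny `bufsize` is passed in the usage lines only so that `readline()` has to read several blocks before it finds a newline, which exercises the read-ahead that `read(3)` then consumes.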
             def safesend(self, str):
                 """Send `str' to the server.
                 Shamelessly ripped off from httplib to patch a bad behavior.
                 """
                 # _broken_pipe_resp is an attribute we set in this function
                 # if the socket is closed while we're sending data but
                 # the server sent us a response before hanging up.
                 # In that case, we want to pretend to send the rest of the
                 # outgoing data, and then let the user use getresponse()
                 # (which we wrap) to get this last response before
                 # opening a new socket.
                 if getattr(self, '_broken_pipe_resp', None) is not None:
                     return
                 if self.sock is None:
                     if self.auto_open:
                         self.connect()
                     else:
                         raise httplib.NotConnected
                 # Send the data to the server. If we get a broken pipe, close the
                 # socket; we want to reconnect when somebody tries to send again.
                 #
                 # NOTE: we DO propagate the error, though, because we cannot simply
                 #       ignore it; the caller will know if they can retry. The one
                 #       exception is a broken pipe after the request was sent: the
                 #       server's response is then kept for getresponse() instead.
                 if self.debuglevel > 0:
                     print("send:", repr(str))
                 try:
                     blocksize = 8192
                     read = getattr(str, 'read', None)
                     if read is not None:
                         if self.debuglevel > 0:
                             print("sending a read()able")
                         data = read(blocksize)
                         while data:
                             self.sock.sendall(data)
                             data = read(blocksize)
                     else:
                         self.sock.sendall(str)
                 except socket.error as v:
                     reraise = True
                     if v[0] == errno.EPIPE:      # Broken pipe
                         if self._HTTPConnection__state == httplib._CS_REQ_SENT:
                             self._broken_pipe_resp = None
                             self._broken_pipe_resp = self.getresponse()
                             reraise = False
                         self.close()
                     if reraise:
                         raise
             def wrapgetresponse(cls):
                 """Wraps getresponse in cls with a broken-pipe sane version.
                 """
                 def safegetresponse(self):
                     # In safesend() we might set the _broken_pipe_resp
                     # attribute, in which case the socket has already
                     # been closed and we just need to give them the response
                     # back. Otherwise, we use the normal response path.
                     r = getattr(self, '_broken_pipe_resp', None)
                     if r is not None:
                         return r
                     return cls.getresponse(self)
                 safegetresponse.__doc__ = cls.getresponse.__doc__
                 return safegetresponse
             class HTTPConnection(httplib.HTTPConnection):
                 # use the modified response class
                 response_class = HTTPResponse
                 send = safesend
                 getresponse = wrapgetresponse(httplib.HTTPConnection)
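`safesend()` and the wrapper built by `wrapgetresponse()` cooperate: if sending fails with EPIPE after the request headers have been sent, the response the server already wrote is read and stored in `_broken_pipe_resp`, later sends become no-ops, and the wrapped `getresponse()` returns the stored response instead of reading from the closed socket. The sketch below reproduces only that control flow with hypothetical in-memory classes (`FlakySocket`, `FakeConnection`); the `request_sent` flag is a simplified stand-in for httplib's private `_CS_REQ_SENT` state check, and the canned response string is invented.

```python
import errno
import socket


class FlakySocket(object):
    """Hypothetical socket whose peer has already hung up."""

    def sendall(self, data):
        raise socket.error(errno.EPIPE, 'Broken pipe')


class FakeConnection(object):
    """Hypothetical stand-in for an httplib.HTTPConnection."""

    def __init__(self):
        self.sock = FlakySocket()
        self.request_sent = True  # simplified _CS_REQ_SENT check

    def getresponse(self):
        # httplib would parse this from data the server sent before closing.
        return 'HTTP/1.1 413 Request Entity Too Large'

    def close(self):
        self.sock = None


def safesend(conn, data):
    # After a stashed response, pretend the remaining sends succeed.
    if getattr(conn, '_broken_pipe_resp', None) is not None:
        return
    try:
        conn.sock.sendall(data)
    except socket.error as v:
        reraise = True
        if v.errno == errno.EPIPE:
            if conn.request_sent:
                conn._broken_pipe_resp = conn.getresponse()
                reraise = False
            conn.close()
        if reraise:
            raise


def safegetresponse(conn):
    # Return the stashed response if safesend() saved one.
    r = getattr(conn, '_broken_pipe_resp', None)
    if r is not None:
        return r
    return conn.getresponse()


conn = FakeConnection()
safesend(conn, b'x' * 100)    # EPIPE is absorbed, no exception
print(conn.sock)              # None
print(safegetresponse(conn))  # HTTP/1.1 413 Request Entity Too Large
safesend(conn, b'more data')  # ignored: a response is already stashed
```

The sketch checks `v.errno`, which also works on Python 3, where socket errors are no longer indexable; the original's `v[0]` relies on Python 2 exception behavior.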
             #########################################################################
             #####   TEST FUNCTIONS
             #########################################################################
             def continuity(url):
                 md5 = hashlib.md5
                 format = '%25s: %s'
                 # first fetch the file with the normal http handler
                 opener = urlreq.buildopener()
                 urlreq.installopener(opener)
                 fo = urlreq.urlopen(url)
                 foo = fo.read()
                 fo.close()
                 m = md5(foo)
                 print(format % ('normal urllib', m.hexdigest()))
                 # now install the keepalive handler and try again
                 opener = urlreq.buildopener(HTTPHandler())
                 urlreq.installopener(opener)
                 fo = urlreq.urlopen(url)
                 foo = fo.read()
                 fo.close()
                 m = md5(foo)
                 print(format % ('keepalive read', m.hexdigest()))
                 fo = urlreq.urlopen(url)
                 foo = ''
                 while True:
                     f = fo.readline()
                     if f:
                         foo = foo + f
                     else: break
                 fo.close()
                 m = md5(foo)
                 print(format % ('keepalive readline', m.hexdigest()))
             def comp(N, url):
                 print('  making %i connections to:\n  %s' % (N, url))
                 util.stdout.write('  first using the normal urllib handlers')
                 # first use normal opener
                 opener = urlreq.buildopener()
                 urlreq.installopener(opener)
                 t1 = fetch(N, url)
                 print('  TIME: %.3f s' % t1)
                 util.stdout.write('  now using the keepalive handler       ')
                 # now install the keepalive handler and try again
                 opener = urlreq.buildopener(HTTPHandler())
                 urlreq.installopener(opener)
                 t2 = fetch(N, url)
                 print('  TIME: %.3f s' % t2)
                 print('  improvement factor: %.2f' % (t1 / t2))
             def fetch(N, url, delay=0):
                 import time
                 lens = []
                 starttime = time.time()
                 for i in range(N):
                     if delay and i > 0:
                         time.sleep(delay)
                     fo = urlreq.urlopen(url)
                     foo = fo.read()
                     fo.close()
                     lens.append(len(foo))
                 diff = time.time() - starttime
                 j = 0
                 for i in lens[1:]:
                     j = j + 1
                     if not i == lens[0]:
                         print("WARNING: inconsistent length on read %i: %i" % (j, i))
                 return diff
             def test_timeout(url):
                 global DEBUG
                 dbbackup = DEBUG
                 class FakeLogger(object):
                     def debug(self, msg, *args):
                         print(msg % args)
                     info = warning = error = debug
                 DEBUG = FakeLogger()
                 print("  fetching the file to establish a connection")
                 fo = urlreq.urlopen(url)
                 data1 = fo.read()
                 fo.close()
                 i = 20
                 print("  waiting %i seconds for the server to close the connection" % i)
                 while i > 0:
                     util.stdout.write('\r  %2i' % i)
                     util.stdout.flush()
                     time.sleep(1)
                     i -= 1
                 util.stderr.write('\r')
                 print("  fetching the file a second time")
                 fo = urlreq.urlopen(url)
                 data2 = fo.read()
                 fo.close()
                 if data1 == data2:
                     print('  data are identical')
                 else:
                     print('  ERROR: DATA DIFFER')
                 DEBUG = dbbackup
             def test(url, N=10):
                 print("performing continuity test (making sure stuff isn't corrupted)")
                 continuity(url)
                 print('')
                 print("performing speed comparison")
                 comp(N, url)
                 print('')
                 print("performing dropped-connection check")
                 test_timeout(url)
             if __name__ == '__main__':
                 import time
                 try:
                     N = int(sys.argv[1])
                     url = sys.argv[2]
                 except (IndexError, ValueError):
                     print("%s <integer> <url>" % sys.argv[0])
                 else:
                     test(url, N)

             # parsers.py - Python implementation of parsers.c
             #
             # Copyright 2009 Matt Mackall <mpm@selenic.com> and others
             #
             # This software may be used and distributed according to the terms of the
             # GNU General Public License version 2 or any later version.
             from __future__ import absolute_import
             import struct
             import zlib
             from ..node import nullid
             from .. import pycompat
             stringio = pycompat.stringio
             _pack = struct.pack
             _unpack = struct.unpack
             _compress = zlib.compress
             _decompress = zlib.decompress
             # Some code below makes tuples directly because it's more convenient. However,
             # code outside this module should always use dirstatetuple.
             def dirstatetuple(*x):
                 # x is a tuple
                 return x
             indexformatng = ">Qiiiiii20s12x"
             indexfirst = struct.calcsize('Q')
             sizeint = struct.calcsize('i')
             indexsize = struct.calcsize(indexformatng)
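Each entry in the non-inline index is a fixed 64-byte record described by `indexformatng`: an 8-byte field holding the data offset and flags, six 4-byte integers (compressed length, uncompressed length, base revision, link revision, and the two parent revisions, following the layout documented in Mercurial's revlog.py), a 20-byte node id, and 12 bytes of padding. The sketch packs one made-up entry to show why `indexsize` is 64 and why `indexfirst` (8) is where the compressed length starts; all field values are invented.

```python
import struct

indexformatng = ">Qiiiiii20s12x"
indexsize = struct.calcsize(indexformatng)  # whole entry
indexfirst = struct.calcsize('Q')           # offset/flags field
sizeint = struct.calcsize('i')

# Made-up entry: offset and flags 0, 11 bytes compressed, 15 uncompressed,
# base rev 0, link rev 0, no parents (-1), and a dummy 20-byte node id.
node = b'\x12' * 20
entry = struct.pack(indexformatng, 0, 11, 15, 0, 0, -1, -1, node)

print(indexsize, len(entry))  # 64 64
# The compressed length starts right after the 8-byte first field.
print(struct.unpack('>i', entry[indexfirst:indexfirst + sizeint])[0])  # 11
print(struct.unpack(indexformatng, entry)[5:7])  # (-1, -1)
```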
             def gettype(q):
                 return int(q & 0xFFFF)
             def offset_type(offset, type):
                 return int(int(offset) << 16 | type)
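`offset_type()` packs a byte offset into the upper bits of that first 64-bit field and the 16-bit flags into the lower bits, and `gettype()` masks the flags back out. The `getoffset()` helper below, which recovers the offset with a right shift, is illustrative only and is not defined in this file. The last lines mirror what `BaseIndexObject.__getitem__` does for entry 0: keep the flags and force the offset to 0, since in Mercurial's revlog format the first bytes of that entry are shared with the version header.

```python
def gettype(q):
    return int(q & 0xFFFF)


def offset_type(offset, type):
    return int(int(offset) << 16 | type)


def getoffset(q):
    # Illustrative inverse for the offset half; not part of parsers.py.
    return int(q >> 16)


q = offset_type(123456, 0x1)
print(q)             # 8090812417
print(getoffset(q))  # 123456
print(gettype(q))    # 1

# What __getitem__ does for entry 0: keep the flags, zero the offset.
fixed = offset_type(0, gettype(q))
print(getoffset(fixed), gettype(fixed))  # 0 1
```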
             class BaseIndexObject(object):
                 def __len__(self):
                     return self._lgt + len(self._extra) + 1
                 def insert(self, i, tup):
                     assert i == -1
                     self._extra.append(tup)
                 def _fix_index(self, i):
                     if not isinstance(i, int):
                         raise TypeError("expecting int indexes")
                     if i < 0:
                         i = len(self) + i
                     if i < 0 or i >= len(self):
                         raise IndexError
                     return i
                 def __getitem__(self, i):
                     i = self._fix_index(i)
                     if i == len(self) - 1:
                         return (0, 0, 0, -1, -1, -1, -1, nullid)
                     if i >= self._lgt:
                         return self._extra[i - self._lgt]
                     index = self._calculate_index(i)
                     r = struct.unpack(indexformatng, self._data[index:index + indexsize])
                     if i == 0:
                         e = list(r)
                         type = gettype(e[0])
                         e[0] = offset_type(0, type)
                         return tuple(e)
                     return r
             class IndexObject(BaseIndexObject):
                 def __init__(self, data):
                     assert len(data) % indexsize == 0
                     self._data = data
                     self._lgt = len(data) // indexsize
                     self._extra = []
                 def _calculate_index(self, i):
                     return i * indexsize
                 def __delitem__(self, i):
-                    if not isinstance(i, slice) or not i.stop == -1 or not i.step is None:
+                    if not isinstance(i, slice) or not i.stop == -1 or i.step is not None:
                         raise ValueError("deleting slices only supports a:-1 with step 1")
                     i = self._fix_index(i.start)
                     if i < self._lgt:
                         self._data = self._data[:i * indexsize]
                         self._lgt = i
                         self._extra = []
                     else:
                         self._extra = self._extra[:i - self._lgt]
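The `-`/`+` change in `IndexObject.__delitem__`, like the matching changes in `HTTPResponse.read()` and `InlinedIndexObject.__delitem__`, does not alter behavior: `is` binds more tightly than `not`, so `not x is None` already parses as `not (x is None)`. PEP 8 recommends the `is not` spelling for readability. This standard-library check confirms that the two forms agree and shows how the parser groups the old one:

```python
import ast

for value in (None, 0, -1, '', 'x'):
    old = not value is None   # spelling removed by this commit
    new = value is not None   # spelling added by this commit
    assert old == new

# The old spelling parses as not (value is None), not as (not value) is None.
tree = ast.parse('not value is None', mode='eval')
print(type(tree.body).__name__)          # UnaryOp
print(type(tree.body.operand).__name__)  # Compare
```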
             class InlinedIndexObject(BaseIndexObject):
                 def __init__(self, data, inline=0):
                     self._data = data
                     self._lgt = self._inline_scan(None)
                     self._inline_scan(self._lgt)
                     self._extra = []
                 def _inline_scan(self, lgt):
                     off = 0
                     if lgt is not None:
                         self._offsets = [0] * lgt
                     count = 0
                     while off <= len(self._data) - indexsize:
                         s, = struct.unpack('>i',
                             self._data[off + indexfirst:off + sizeint + indexfirst])
                         if lgt is not None:
                             self._offsets[count] = off
                         count += 1
                         off += indexsize + s
                     if off != len(self._data):
                         raise ValueError("corrupted data")
                     return count
                 def __delitem__(self, i):
-                    if not isinstance(i, slice) or not i.stop == -1 or not i.step is None:
+                    if not isinstance(i, slice) or not i.stop == -1 or i.step is not None:
                         raise ValueError("deleting slices only supports a:-1 with step 1")
                     i = self._fix_index(i.start)
                     if i < self._lgt:
                         self._offsets = self._offsets[:i]
                         self._lgt = i
                         self._extra = []
                     else:
                         self._extra = self._extra[:i - self._lgt]
                 def _calculate_index(self, i):
                     return self._offsets[i]
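In an inline revlog each 64-byte index entry is immediately followed by its revision data, so entries cannot be located by `i * indexsize`. `_inline_scan()` instead reads the compressed length stored just after the 8-byte offset/flags field and skips `indexsize + length` bytes to reach the next entry, rejecting data whose walk does not end exactly at the end of the buffer. A standalone sketch of the same walk over a hand-built, hypothetical two-entry buffer:

```python
import struct

indexformatng = ">Qiiiiii20s12x"
indexsize = struct.calcsize(indexformatng)  # 64
indexfirst = struct.calcsize('Q')           # 8
sizeint = struct.calcsize('i')              # 4


def inline_offsets(data):
    """Return the starting offset of every index entry in inline data."""
    offsets = []
    off = 0
    while off <= len(data) - indexsize:
        start = off + indexfirst
        # Compressed length of the revision data that follows this entry.
        s, = struct.unpack('>i', data[start:start + sizeint])
        offsets.append(off)
        off += indexsize + s
    if off != len(data):
        raise ValueError("corrupted data")
    return offsets


def make_entry(complen):
    # Hypothetical entry: only the compressed length matters for the scan.
    return struct.pack(indexformatng, 0, complen, complen, 0, 0, -1, -1,
                       b'\0' * 20)


data = make_entry(5) + b'hello' + make_entry(3) + b'abc'
print(inline_offsets(data))  # [0, 69]
```

The original scans twice, first with `lgt=None` to count entries and then again to fill a preallocated `_offsets` list; the single pass with `append()` here is a simplification.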
             def parse_index2(data, inline):
                 if not inline:
                     return IndexObject(data), None
                 return InlinedIndexObject(data, inline), (0, data)
             def parse_dirstate(dmap, copymap, st):
                 parents = [st[:20], st[20: 40]]
                 # dereference fields so they will be local in loop
                 format = ">cllll"
                 e_size = struct.calcsize(format)
                 pos1 = 40
                 l = len(st)
                 # the inner loop
                 while pos1 < l:
                     pos2 = pos1 + e_size
                     e = _unpack(">cllll", st[pos1:pos2]) # a literal here is faster
                     pos1 = pos2 + e[4]
                     f = st[pos2:pos1]
                     if '\0' in f:
                         f, c = f.split('\0')
                         copymap[f] = c
                     dmap[f] = e[:4]
                 return parents
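`parse_dirstate()` expects two 20-byte parent hashes followed by one record per file: a `>cllll` header (state, mode, size, mtime, and the length of the name field) and then the name itself, where a NUL byte separates a file from its copy source. The sketch reuses the same loop on a hand-built buffer; the file names, hashes and field values are made up, and it uses Python 3 bytes where the original uses `str`.

```python
import struct


def parse_dirstate(dmap, copymap, st):
    parents = [st[:20], st[20:40]]
    e_size = struct.calcsize(">cllll")  # 17 bytes
    pos1 = 40
    while pos1 < len(st):
        pos2 = pos1 + e_size
        e = struct.unpack(">cllll", st[pos1:pos2])
        pos1 = pos2 + e[4]      # e[4] is the length of the name field
        f = st[pos2:pos1]
        if b'\0' in f:          # "dest\0source" marks a copy
            f, c = f.split(b'\0')
            copymap[f] = c
        dmap[f] = e[:4]         # (state, mode, size, mtime)
    return parents


def record(state, mode, size, mtime, name):
    # Hypothetical helper that builds one on-disk record.
    return struct.pack(">cllll", state, mode, size, mtime, len(name)) + name


p1, p2 = b'\x11' * 20, b'\0' * 20
st = (p1 + p2
      + record(b'n', 0o644, 12, 1000, b'a.txt')
      + record(b'a', 0, -1, -1, b'b.txt\0a.txt'))
dmap, copymap = {}, {}
parents = parse_dirstate(dmap, copymap, st)
print(dmap[b'a.txt'])  # (b'n', 420, 12, 1000)
print(copymap)         # {b'b.txt': b'a.txt'}
```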
             def pack_dirstate(dmap, copymap, pl, now):
                 now = int(now)
                 cs = stringio()
                 write = cs.write
                 write("".join(pl))
                 for f, e in dmap.iteritems():
                     if e[0] == 'n' and e[3] == now:
                         # The file was last modified "simultaneously" with the current
                         # write to dirstate (i.e. within the same second for file-
                         # systems with a granularity of 1 sec). This commonly happens
                         # for at least a couple of files on 'update'.
                         # The user could change the file without changing its size
                         # within the same second. Invalidate the file's mtime in
                         # dirstate, forcing future 'status' calls to compare the
                         # contents of the file if the size is the same. This prevents
                         # mistakenly treating such files as clean.
                         e = dirstatetuple(e[0], e[1], e[2], -1)
                         dmap[f] = e
                     if f in copymap:
                         f = "%s\0%s" % (f, copymap[f])
                     e = _pack(">cllll", e[0], e[1], e[2], e[3], len(f))
                     write(e)
                     write(f)
                 return cs.getvalue()
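`pack_dirstate()` writes that layout back out. Its one subtle step is the race guard described in the comment: a normal (`'n'`) entry whose mtime equals the time of this write is stored with mtime -1, so a later status call compares file contents instead of trusting an unchanged size and mtime. This Python 3 sketch substitutes bytes for `str`, `dict.items()` for `iteritems()`, a plain tuple for `dirstatetuple`, and `io.BytesIO` for `pycompat.stringio`; the entries are invented.

```python
import io
import struct


def pack_dirstate(dmap, copymap, pl, now):
    now = int(now)
    cs = io.BytesIO()
    cs.write(b"".join(pl))
    for f, e in dmap.items():
        if e[0] == b'n' and e[3] == now:
            # Modified in the same second as this write: force a later
            # content comparison by storing an unknown (-1) mtime.
            e = (e[0], e[1], e[2], -1)
            dmap[f] = e
        if f in copymap:
            f = f + b'\0' + copymap[f]
        cs.write(struct.pack(">cllll", e[0], e[1], e[2], e[3], len(f)))
        cs.write(f)
    return cs.getvalue()


dmap = {b'a.txt': (b'n', 0o644, 12, 1000),
        b'b.txt': (b'n', 0o644, 7, 999)}
data = pack_dirstate(dmap, {}, [b'\x11' * 20, b'\0' * 20], now=1000)
print(dmap[b'a.txt'])  # (b'n', 420, 12, -1)
print(dmap[b'b.txt'])  # (b'n', 420, 7, 999)
print(len(data))       # 84
```

Only `a.txt` is invalidated, because its mtime matches `now`; the output is 40 bytes of parents plus two 17-byte headers and two 5-byte names.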