upstream/kallithea Commit - r8362:933d2579

             .. _performance:
             ================================
             Optimizing Kallithea performance
             ================================
             When serving a large amount of big repositories, Kallithea can start performing
             slower than expected. Because of the demanding nature of handling large amounts
             of data from version control systems, here are some tips on how to get the best
             performance.
             Fast storage
             ------------
             Kallithea is often I/O bound, and hence a fast disk (SSD/SAN) and plenty of RAM
             is usually more important than a fast CPU.
             Caching
             -------
             Tweak beaker cache settings in the ini file. The actual effect of that is
             questionable.
             .. note::
                 Beaker has no upper bound on cache size and will never drop any caches. For
                 memory cache, the only option is to regularly restart the worker process.
                 For file cache, it must be cleaned manually, as described in the `Beaker
                 documentation <https://beaker.readthedocs.io/en/latest/sessions.html#removing-expired-old-sessions>`_::
                     find data/cache -type f -mtime +30 -print -exec rm {} \;
             Database
             --------
             SQLite is a good option when having a small load on the system. But due to
             locking issues with SQLite, it is not recommended to use it for larger
             deployments.
             Switching to PostgreSQL or MariaDB/MySQL will result in an immediate performance
             increase. A tool like SQLAlchemyGrate_ can be used for migrating to another
             database platform.
             Horizontal scaling
             ------------------
-            Scaling horizontally means running several Kallithea instances and let them
+            Scaling horizontally means running several Kallithea instances (also known as
-            share the load. That can give huge performance benefits when dealing with large
+            worker processes) and let them share the load. That is essential to serve other
-            amounts of traffic (many users, CI servers, etc.). Kallithea can be scaled
+            users while processing a long-running request from a user. Usually, the
-            horizontally on one (recommended) or multiple machines.
+            bottleneck on a Kallithea server is not CPU but I/O speed - especially network
+            speed. It is thus a good idea to run multiple worker processes on one server.
-            It is generally possible to run WSGI applications multithreaded, so that
+            .. note::
-            several HTTP requests are served from the same Python process at once. That can
-            in principle give better utilization of internal caches and less process
-            overhead.
-            One danger of running multithreaded is that program execution becomes much more
+                Kallithea and the embedded Mercurial backend are not thread-safe. Each
-            complex; programs must be written to consider all combinations of events and
+                worker process must thus be single-threaded.
-            problems might depend on timing and be impossible to reproduce.
-            Kallithea can't promise to be thread-safe, just like the embedded Mercurial
+            Web servers can usually launch multiple worker processes - for example ``mod_wsgi`` with the
-            backend doesn't make any strong promises when used as Kallithea uses it.
+            ``WSGIDaemonProcess`` ``processes`` parameter or ``uWSGI`` or ``gunicorn`` with
-            Instead, we recommend scaling by using multiple server processes.
+            their ``workers`` setting.
-            Web servers with multiple worker processes (such as ``mod_wsgi`` with the
+            Kallithea can also be scaled horizontally across multiple machines.
-            ``WSGIDaemonProcess`` ``processes`` parameter) will work out of the box.
             In order to scale horizontally on multiple machines, you need to do the
             following:
                 - Each instance's ``data`` storage needs to be configured to be stored on a
                   shared disk storage, preferably together with repositories. This ``data``
                   dir contains template caches, sessions, whoosh index and is used for
                   task locking (so it is safe across multiple instances). Set the
                   ``cache_dir``, ``index_dir``, ``beaker.cache.data_dir``, ``beaker.cache.lock_dir``
                   variables in each .ini file to a shared location across Kallithea instances
                 - If using several Celery instances,
                   the message broker should be common to all of them (e.g.,  one
                   shared RabbitMQ server)
                 - Load balance using round robin or IP hash, recommended is writing LB rules
                   that will separate regular user traffic from automated processes like CI
                   servers or build bots.
             Serve static files directly from the web server
             -----------------------------------------------
             With the default ``static_files`` ini setting, the Kallithea WSGI application
             will take care of serving the static files from ``kallithea/public/`` at the
             root of the application URL.
             The actual serving of the static files is very fast and unlikely to be a
             problem in a Kallithea setup - the responses generated by Kallithea from
             database and repository content will take significantly more time and
             resources.
             To serve static files from the web server, use something like this Apache config
             snippet::
                     Alias /images/ /srv/kallithea/kallithea/kallithea/public/images/
                     Alias /css/ /srv/kallithea/kallithea/kallithea/public/css/
                     Alias /js/ /srv/kallithea/kallithea/kallithea/public/js/
                     Alias /codemirror/ /srv/kallithea/kallithea/kallithea/public/codemirror/
                     Alias /fontello/ /srv/kallithea/kallithea/kallithea/public/fontello/
             Then disable serving of static files in the ``.ini`` ``app:main`` section::
                     static_files = false
             If using Kallithea installed as a package, you should be able to find the files
             under ``site-packages/kallithea``, either in your Python installation or in your
             virtualenv. When upgrading, make sure to update the web server configuration
             too if necessary.
             It might also be possible to improve performance by configuring the web server
             to compress responses (served from static files or generated by Kallithea) when
             serving them. That might also imply buffering of responses - that is more
             likely to be a problem; large responses (clones or pulls) will have to be fully
             processed and spooled to disk or memory before the client will see any
             response. See the documentation for your web server.
             .. _SQLAlchemyGrate: https://github.com/shazow/sqlalchemygrate
+            .. _mod_wsgi: https://modwsgi.readthedocs.io/
+            .. _uWSGI: https://uwsgi-docs.readthedocs.io/
+            .. _gunicorn: http://pypi.python.org/pypi/gunicorn
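
Review note on the new "Horizontal scaling" text: the gunicorn case mentioned in the added lines could look roughly like the sketch below, assuming the settings live in a ``gunicorn.conf.py`` file. ``workers``, ``threads``, ``worker_class`` and ``bind`` are standard gunicorn settings; the worker count and the listen address are illustrative assumptions and do not come from this commit.

```python
# gunicorn.conf.py - illustrative sketch, not part of the commit.
# gunicorn treats module-level names in this file as settings.

# Several worker processes share the load, so one long-running request
# does not block other users. The count is an assumed example value.
workers = 4

# One thread per worker: the documentation states that Kallithea and the
# embedded Mercurial backend are not thread-safe.
threads = 1

# The synchronous worker class serves one request at a time per process.
worker_class = "sync"

# Hypothetical listen address; adjust for the actual deployment.
bind = "127.0.0.1:5000"
```

Such a file would be passed to gunicorn with its ``-c`` option. How the Kallithea WSGI application itself is loaded depends on the deployment and is not shown here.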