optimization.txt 9.3 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260
  1. .. _topics-db-optimization:
  2. ============================
  3. Database access optimization
  4. ============================
  5. Django's database layer provides various ways to help developers get the most
  6. out of their databases. This document gathers together links to the relevant
  7. documentation, and adds various tips, organized under an number of headings that
  8. outline the steps to take when attempting to optimize your database usage.
  9. Profile first
  10. =============
  11. As general programming practice, this goes without saying. Find out :ref:`what
  12. queries you are doing and what they are costing you
  13. <faq-see-raw-sql-queries>`. You may also want to use an external project like
  14. 'django-debug-toolbar', or a tool that monitors your database directly.
  15. Remember that you may be optimizing for speed or memory or both, depending on
  16. your requirements. Sometimes optimizing for one will be detrimental to the
  17. other, but sometimes they will help each other. Also, work that is done by the
  18. database process might not have the same cost (to you) as the same amount of
  19. work done in your Python process. It is up to you to decide what your
  20. priorities are, where the balance must lie, and profile all of these as required
  21. since this will depend on your application and server.
  22. With everything that follows, remember to profile after every change to ensure
  23. that the change is a benefit, and a big enough benefit given the decrease in
  24. readability of your code. **All** of the suggestions below come with the caveat
  25. that in your circumstances the general principle might not apply, or might even
  26. be reversed.
  27. Use standard DB optimization techniques
  28. =======================================
  29. ...including:
  30. * Indexes. This is a number one priority, *after* you have determined from
  31. profiling what indexes should be added. Use :attr:`django.db.models.Field.db_index` to add
  32. these from Django.
  33. * Appropriate use of field types.
  34. We will assume you have done the obvious things above. The rest of this document
  35. focuses on how to use Django in such a way that you are not doing unnecessary
  36. work. This document also does not address other optimization techniques that
  37. apply to all expensive operations, such as :ref:`general purpose caching
  38. <topics-cache>`.
  39. Understand QuerySets
  40. ====================
  41. Understanding :ref:`QuerySets <ref-models-querysets>` is vital to getting good
  42. performance with simple code. In particular:
  43. Understand QuerySet evaluation
  44. ------------------------------
  45. To avoid performance problems, it is important to understand:
  46. * that :ref:`QuerySets are lazy <querysets-are-lazy>`.
  47. * when :ref:`they are evaluated <when-querysets-are-evaluated>`.
  48. * how :ref:`the data is held in memory <caching-and-querysets>`.
  49. Understand cached attributes
  50. ----------------------------
  51. As well as caching of the whole ``QuerySet``, there is caching of the result of
  52. attributes on ORM objects. In general, attributes that are not callable will be
  53. cached. For example, assuming the :ref:`example weblog models
  54. <queryset-model-example>`:
  55. >>> entry = Entry.objects.get(id=1)
  56. >>> entry.blog # Blog object is retrieved at this point
  57. >>> entry.blog # cached version, no DB access
  58. But in general, callable attributes cause DB lookups every time::
  59. >>> entry = Entry.objects.get(id=1)
  60. >>> entry.authors.all() # query performed
  61. >>> entry.authors.all() # query performed again
  62. Be careful when reading template code - the template system does not allow use
  63. of parentheses, but will call callables automatically, hiding the above
  64. distinction.
  65. Be careful with your own custom properties - it is up to you to implement
  66. caching.
  67. Use the ``with`` template tag
  68. -----------------------------
  69. To make use of the caching behaviour of ``QuerySet``, you may need to use the
  70. :ttag:`with` template tag.
  71. Use ``iterator()``
  72. ------------------
  73. When you have a lot of objects, the caching behaviour of the ``QuerySet`` can
  74. cause a large amount of memory to be used. In this case,
  75. :meth:`~django.db.models.QuerySet.iterator()` may help.
  76. Do database work in the database rather than in Python
  77. ======================================================
  78. For instance:
  79. * At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
  80. filtering in the database.
  81. * Use :ref:`F() object query expressions <query-expressions>` to do filtering
  82. against other fields within the same model.
  83. * Use :ref:`annotate to do aggregation in the database <topics-db-aggregation>`.
  84. If these aren't enough to generate the SQL you need:
  85. Use ``QuerySet.extra()``
  86. ------------------------
  87. A less portable but more powerful method is
  88. :meth:`~django.db.models.QuerySet.extra()`, which allows some SQL to be
  89. explicitly added to the query. If that still isn't powerful enough:
  90. Use raw SQL
  91. -----------
  92. Write your own :ref:`custom SQL to retrieve data or populate models
  93. <topics-db-sql>`. Use ``django.db.connection.queries`` to find out what Django
  94. is writing for you and start from there.
  95. Retrieve everything at once if you know you will need it
  96. ========================================================
  97. Hitting the database multiple times for different parts of a single 'set' of
  98. data that you will need all parts of is, in general, less efficient than
  99. retrieving it all in one query. This is particularly important if you have a
  100. query that is executed in a loop, and could therefore end up doing many database
  101. queries, when only one was needed. So:
  102. Use ``QuerySet.select_related()``
  103. ---------------------------------
  104. Understand :ref:`QuerySet.select_related() <select-related>` thoroughly, and use it:
  105. * in view code,
  106. * and in :ref:`managers and default managers <topics-db-managers>` where
  107. appropriate. Be aware when your manager is and is not used; sometimes this is
  108. tricky so don't make assumptions.
  109. Don't retrieve things you don't need
  110. ====================================
  111. Use ``QuerySet.values()`` and ``values_list()``
  112. -----------------------------------------------
  113. When you just want a dict/list of values, and don't need ORM model objects, make
  114. appropriate usage of :meth:`~django.db.models.QuerySet.values()`.
  115. These can be useful for replacing model objects in template code - as long as
  116. the dicts you supply have the same attributes as those used in the template, you
  117. are fine.
  118. Use ``QuerySet.defer()`` and ``only()``
  119. ---------------------------------------
  120. Use :meth:`~django.db.models.QuerySet.defer()` and
  121. :meth:`~django.db.models.QuerySet.only()` if there are database columns you
  122. know that you won't need (or won't need in most cases) to avoid loading
  123. them. Note that if you *do* use them, the ORM will have to go and get them in a
  124. separate query, making this a pessimization if you use it inappropriately.
  125. Use QuerySet.count()
  126. --------------------
  127. ...if you only want the count, rather than doing ``len(queryset)``.
  128. Use QuerySet.exists()
  129. ---------------------
  130. ...if you only want to find out if at least one result exists, rather than ``if
  131. queryset``.
  132. But:
  133. Don't overuse ``count()`` and ``exists()``
  134. ------------------------------------------
  135. If you are going to need other data from the QuerySet, just evaluate it.
  136. For example, assuming an Email class that has a ``body`` attribute and a
  137. many-to-many relation to User, the following template code is optimal:
  138. .. code-block:: html+django
  139. {% if display_inbox %}
  140. {% with user.emails.all as emails %}
  141. {% if emails %}
  142. <p>You have {{ emails|length }} email(s)</p>
  143. {% for email in emails %}
  144. <p>{{ email.body }}</p>
  145. {% endfor %}
  146. {% else %}
  147. <p>No messages today.</p>
  148. {% endif %}
  149. {% endwith %}
  150. {% endif %}
  151. It is optimal because:
  152. 1. Since QuerySets are lazy, this does no database if 'display_inbox' is False.
  153. #. Use of ``with`` means that we store ``user.emails.all`` in a variable for
  154. later use, allowing its cache to be re-used.
  155. #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called,
  156. which causes the ``user.emails.all()`` query to be run on the database, and
  157. at the least the first line to be turned into an ORM object. If there aren't
  158. any results, it will return False, otherwise True.
  159. #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling
  160. out the rest of the cache without doing another query.
  161. #. The ``for`` loop iterates over the already filled cache.
  162. In total, this code does either one or zero database queries. The only
  163. deliberate optimization performed is the use of the ``with`` tag. Using
  164. ``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause
  165. additional queries.
  166. Use ``QuerySet.update()`` and ``delete()``
  167. ------------------------------------------
  168. Rather than retrieve a load of objects, set some values, and save them
  169. individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
  170. <topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
  171. <topics-db-queries-delete>` where possible.
  172. Note, however, that these bulk update methods cannot call the ``save()`` or ``delete()``
  173. methods of individual instances, which means that any custom behaviour you have
  174. added for these methods will not be executed, including anything driven from the
  175. normal database object :ref:`signals <ref-signals>`.
  176. Use foreign key values directly
  177. -------------------------------
  178. If you only need a foreign key value, use the foreign key value that is already on
  179. the object you've got, rather than getting the whole related object and taking
  180. its primary key. i.e. do::
  181. entry.blog_id
  182. instead of::
  183. entry.blog.id