optimization.txt 17 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474
  1. ============================
  2. Database access optimization
  3. ============================
  4. Django's database layer provides various ways to help developers get the most
  5. out of their databases. This document gathers together links to the relevant
  6. documentation, and adds various tips, organized under a number of headings that
  7. outline the steps to take when attempting to optimize your database usage.
  8. Profile first
  9. =============
  10. As general programming practice, this goes without saying. Find out :ref:`what
  11. queries you are doing and what they are costing you <faq-see-raw-sql-queries>`.
  12. Use :meth:`.QuerySet.explain` to understand how specific ``QuerySet``\s are
  13. executed by your database. You may also want to use an external project like
  14. django-debug-toolbar_, or a tool that monitors your database directly.
  15. Remember that you may be optimizing for speed or memory or both, depending on
  16. your requirements. Sometimes optimizing for one will be detrimental to the
  17. other, but sometimes they will help each other. Also, work that is done by the
  18. database process might not have the same cost (to you) as the same amount of
  19. work done in your Python process. It is up to you to decide what your
  20. priorities are, where the balance must lie, and profile all of these as required
  21. since this will depend on your application and server.
  22. With everything that follows, remember to profile after every change to ensure
  23. that the change is a benefit, and a big enough benefit given the decrease in
  24. readability of your code. **All** of the suggestions below come with the caveat
  25. that in your circumstances the general principle might not apply, or might even
  26. be reversed.
  27. .. _django-debug-toolbar: https://github.com/jazzband/django-debug-toolbar/
  28. Use standard DB optimization techniques
  29. =======================================
  30. ...including:
  31. * Indexes_. This is a number one priority, *after* you have determined from
  32. profiling what indexes should be added. Use
  33. :attr:`Meta.indexes <django.db.models.Options.indexes>` or
  34. :attr:`Field.db_index <django.db.models.Field.db_index>` to add these from
  35. Django. Consider adding indexes to fields that you frequently query using
  36. :meth:`~django.db.models.query.QuerySet.filter()`,
  37. :meth:`~django.db.models.query.QuerySet.exclude()`,
  38. :meth:`~django.db.models.query.QuerySet.order_by()`, etc. as indexes may help
  39. to speed up lookups. Note that determining the best indexes is a complex
  40. database-dependent topic that will depend on your particular application.
  41. The overhead of maintaining an index may outweigh any gains in query speed.
  42. .. _Indexes: https://en.wikipedia.org/wiki/Database_index
  43. * Appropriate use of field types.
  44. We will assume you have done the things listed above. The rest of this document
  45. focuses on how to use Django in such a way that you are not doing unnecessary
  46. work. This document also does not address other optimization techniques that
  47. apply to all expensive operations, such as :doc:`general purpose caching
  48. </topics/cache>`.
  49. Understand ``QuerySet``\s
  50. =========================
  51. Understanding :doc:`QuerySets </ref/models/querysets>` is vital to getting good
  52. performance with simple code. In particular:
  53. Understand ``QuerySet`` evaluation
  54. ----------------------------------
  55. To avoid performance problems, it is important to understand:
  56. * that :ref:`QuerySets are lazy <querysets-are-lazy>`.
  57. * when :ref:`they are evaluated <when-querysets-are-evaluated>`.
  58. * how :ref:`the data is held in memory <caching-and-querysets>`.
  59. Understand cached attributes
  60. ----------------------------
  61. As well as caching of the whole ``QuerySet``, there is caching of the result of
  62. attributes on ORM objects. In general, attributes that are not callable will be
  63. cached. For example, assuming the :ref:`example Weblog models
  64. <queryset-model-example>`::
  65. >>> entry = Entry.objects.get(id=1)
  66. >>> entry.blog # Blog object is retrieved at this point
  67. >>> entry.blog # cached version, no DB access
  68. But in general, callable attributes cause DB lookups every time::
  69. >>> entry = Entry.objects.get(id=1)
  70. >>> entry.authors.all() # query performed
  71. >>> entry.authors.all() # query performed again
  72. Be careful when reading template code - the template system does not allow use
  73. of parentheses, but will call callables automatically, hiding the above
  74. distinction.
  75. Be careful with your own custom properties - it is up to you to implement
  76. caching when required, for example using the
  77. :class:`~django.utils.functional.cached_property` decorator.
  78. Use the ``with`` template tag
  79. -----------------------------
  80. To make use of the caching behavior of ``QuerySet``, you may need to use the
  81. :ttag:`with` template tag.
  82. Use ``iterator()``
  83. ------------------
  84. When you have a lot of objects, the caching behavior of the ``QuerySet`` can
  85. cause a large amount of memory to be used. In this case,
  86. :meth:`~django.db.models.query.QuerySet.iterator()` may help.
  87. Use ``explain()``
  88. -----------------
  89. :meth:`.QuerySet.explain` gives you detailed information about how the database
  90. executes a query, including indexes and joins that are used. These details may
  91. help you find queries that could be rewritten more efficiently, or identify
  92. indexes that could be added to improve performance.
  93. Do database work in the database rather than in Python
  94. ======================================================
  95. For instance:
  96. * At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
  97. filtering in the database.
  98. * Use :class:`F expressions <django.db.models.F>` to filter
  99. based on other fields within the same model.
  100. * Use :doc:`annotate to do aggregation in the database
  101. </topics/db/aggregation>`.
  102. If these aren't enough to generate the SQL you need:
  103. Use ``RawSQL``
  104. --------------
  105. A less portable but more powerful method is the
  106. :class:`~django.db.models.expressions.RawSQL` expression, which allows some SQL
  107. to be explicitly added to the query. If that still isn't powerful enough:
  108. Use raw SQL
  109. -----------
  110. Write your own :doc:`custom SQL to retrieve data or populate models
  111. </topics/db/sql>`. Use ``django.db.connection.queries`` to find out what Django
  112. is writing for you and start from there.
  113. Retrieve individual objects using a unique, indexed column
  114. ==========================================================
  115. There are two reasons to use a column with
  116. :attr:`~django.db.models.Field.unique` or
  117. :attr:`~django.db.models.Field.db_index` when using
  118. :meth:`~django.db.models.query.QuerySet.get` to retrieve individual objects.
  119. First, the query will be quicker because of the underlying database index.
  120. Also, the query could run much slower if multiple objects match the lookup;
  121. having a unique constraint on the column guarantees this will never happen.
  122. So using the :ref:`example Weblog models <queryset-model-example>`::
  123. >>> entry = Entry.objects.get(id=10)
  124. will be quicker than:
  125. >>> entry = Entry.objects.get(headline="News Item Title")
  126. because ``id`` is indexed by the database and is guaranteed to be unique.
  127. Doing the following is potentially quite slow:
  128. >>> entry = Entry.objects.get(headline__startswith="News")
  129. First of all, ``headline`` is not indexed, which will make the underlying
  130. database fetch slower.
  131. Second, the lookup doesn't guarantee that only one object will be returned.
  132. If the query matches more than one object, it will retrieve and transfer all of
  133. them from the database. This penalty could be substantial if hundreds or
  134. thousands of records are returned. The penalty will be compounded if the
  135. database lives on a separate server, where network overhead and latency also
  136. play a factor.
  137. Retrieve everything at once if you know you will need it
  138. ========================================================
  139. Hitting the database multiple times for different parts of a single 'set' of
  140. data that you will need all parts of is, in general, less efficient than
  141. retrieving it all in one query. This is particularly important if you have a
  142. query that is executed in a loop, and could therefore end up doing many database
  143. queries, when only one was needed. So:
  144. Use ``QuerySet.select_related()`` and ``prefetch_related()``
  145. ------------------------------------------------------------
  146. Understand :meth:`~django.db.models.query.QuerySet.select_related` and
  147. :meth:`~django.db.models.query.QuerySet.prefetch_related` thoroughly, and use
  148. them:
  149. * in :doc:`managers and default managers </topics/db/managers>` where
  150. appropriate. Be aware when your manager is and is not used; sometimes this is
  151. tricky so don't make assumptions.
  152. * in view code or other layers, possibly making use of
  153. :func:`~django.db.models.prefetch_related_objects` where needed.
  154. Don't retrieve things you don't need
  155. ====================================
  156. Use ``QuerySet.values()`` and ``values_list()``
  157. -----------------------------------------------
  158. When you only want a ``dict`` or ``list`` of values, and don't need ORM model
  159. objects, make appropriate usage of
  160. :meth:`~django.db.models.query.QuerySet.values()`.
  161. These can be useful for replacing model objects in template code - as long as
  162. the dicts you supply have the same attributes as those used in the template,
  163. you are fine.
  164. Use ``QuerySet.defer()`` and ``only()``
  165. ---------------------------------------
  166. Use :meth:`~django.db.models.query.QuerySet.defer()` and
  167. :meth:`~django.db.models.query.QuerySet.only()` if there are database columns
  168. you know that you won't need (or won't need in most cases) to avoid loading
  169. them. Note that if you *do* use them, the ORM will have to go and get them in
  170. a separate query, making this a pessimization if you use it inappropriately.
  171. Don't be too aggressive in deferring fields without profiling as the database
  172. has to read most of the non-text, non-VARCHAR data from the disk for a single
  173. row in the results, even if it ends up only using a few columns. The
  174. ``defer()`` and ``only()`` methods are most useful when you can avoid loading a
  175. lot of text data or for fields that might take a lot of processing to convert
  176. back to Python. As always, profile first, then optimize.
  177. Use ``QuerySet.count()``
  178. ------------------------
  179. ...if you only want the count, rather than doing ``len(queryset)``.
  180. Use ``QuerySet.exists()``
  181. -------------------------
  182. ...if you only want to find out if at least one result exists, rather than ``if
  183. queryset``.
  184. But:
  185. .. _overuse_of_count_and_exists:
  186. Don't overuse ``count()`` and ``exists()``
  187. ------------------------------------------
  188. If you are going to need other data from the QuerySet, evaluate it immediately.
  189. For example, assuming an Email model that has a ``body`` attribute and a
  190. many-to-many relation to User, the following template code is optimal:
  191. .. code-block:: html+django
  192. {% if display_inbox %}
  193. {% with emails=user.emails.all %}
  194. {% if emails %}
  195. <p>You have {{ emails|length }} email(s)</p>
  196. {% for email in emails %}
  197. <p>{{ email.body }}</p>
  198. {% endfor %}
  199. {% else %}
  200. <p>No messages today.</p>
  201. {% endif %}
  202. {% endwith %}
  203. {% endif %}
  204. It is optimal because:
  205. #. Since QuerySets are lazy, this does no database queries if 'display_inbox'
  206. is False.
  207. #. Use of :ttag:`with` means that we store ``user.emails.all`` in a variable
  208. for later use, allowing its cache to be re-used.
  209. #. The line ``{% if emails %}`` causes ``QuerySet.__bool__()`` to be called,
  210. which causes the ``user.emails.all()`` query to be run on the database, and
  211. at the least the first line to be turned into an ORM object. If there aren't
  212. any results, it will return False, otherwise True.
  213. #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling
  214. out the rest of the cache without doing another query.
  215. #. The :ttag:`for` loop iterates over the already filled cache.
  216. In total, this code does either one or zero database queries. The only
  217. deliberate optimization performed is the use of the :ttag:`with` tag. Using
  218. ``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause
  219. additional queries.
  220. Use ``QuerySet.update()`` and ``delete()``
  221. ------------------------------------------
  222. Rather than retrieve a load of objects, set some values, and save them
  223. individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
  224. <topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
  225. <topics-db-queries-delete>` where possible.
  226. Note, however, that these bulk update methods cannot call the ``save()`` or
  227. ``delete()`` methods of individual instances, which means that any custom
  228. behavior you have added for these methods will not be executed, including
  229. anything driven from the normal database object :doc:`signals </ref/signals>`.
  230. Use foreign key values directly
  231. -------------------------------
  232. If you only need a foreign key value, use the foreign key value that is already on
  233. the object you've got, rather than getting the whole related object and taking
  234. its primary key. i.e. do::
  235. entry.blog_id
  236. instead of::
  237. entry.blog.id
  238. Don't order results if you don't care
  239. -------------------------------------
  240. Ordering is not free; each field to order by is an operation the database must
  241. perform. If a model has a default ordering (:attr:`Meta.ordering
  242. <django.db.models.Options.ordering>`) and you don't need it, remove
  243. it on a ``QuerySet`` by calling
  244. :meth:`~django.db.models.query.QuerySet.order_by()` with no parameters.
  245. Adding an index to your database may help to improve ordering performance.
  246. Use bulk methods
  247. ================
  248. Use bulk methods to reduce the number of SQL statements.
  249. Create in bulk
  250. --------------
  251. When creating objects, where possible, use the
  252. :meth:`~django.db.models.query.QuerySet.bulk_create()` method to reduce the
  253. number of SQL queries. For example::
  254. Entry.objects.bulk_create([
  255. Entry(headline='This is a test'),
  256. Entry(headline='This is only a test'),
  257. ])
  258. ...is preferable to::
  259. Entry.objects.create(headline='This is a test')
  260. Entry.objects.create(headline='This is only a test')
  261. Note that there are a number of :meth:`caveats to this method
  262. <django.db.models.query.QuerySet.bulk_create>`, so make sure it's appropriate
  263. for your use case.
  264. Update in bulk
  265. --------------
  266. When updating objects, where possible, use the
  267. :meth:`~django.db.models.query.QuerySet.bulk_update()` method to reduce the
  268. number of SQL queries. Given a list or queryset of objects::
  269. entries = Entry.objects.bulk_create([
  270. Entry(headline='This is a test'),
  271. Entry(headline='This is only a test'),
  272. ])
  273. The following example::
  274. entries[0].headline = 'This is not a test'
  275. entries[1].headline = 'This is no longer a test'
  276. Entry.objects.bulk_update(entries, ['headline'])
  277. ...is preferable to::
  278. entries[0].headline = 'This is not a test'
  279. entries.save()
  280. entries[1].headline = 'This is no longer a test'
  281. entries.save()
  282. Note that there are a number of :meth:`caveats to this method
  283. <django.db.models.query.QuerySet.bulk_update>`, so make sure it's appropriate
  284. for your use case.
  285. Insert in bulk
  286. --------------
  287. When inserting objects into :class:`ManyToManyFields
  288. <django.db.models.ManyToManyField>`, use
  289. :meth:`~django.db.models.fields.related.RelatedManager.add` with multiple
  290. objects to reduce the number of SQL queries. For example::
  291. my_band.members.add(me, my_friend)
  292. ...is preferable to::
  293. my_band.members.add(me)
  294. my_band.members.add(my_friend)
  295. ...where ``Bands`` and ``Artists`` have a many-to-many relationship.
  296. When inserting different pairs of objects into
  297. :class:`~django.db.models.ManyToManyField` or when the custom
  298. :attr:`~django.db.models.ManyToManyField.through` table is defined, use
  299. :meth:`~django.db.models.query.QuerySet.bulk_create()` method to reduce the
  300. number of SQL queries. For example::
  301. PizzaToppingRelationship = Pizza.toppings.through
  302. PizzaToppingRelationship.objects.bulk_create([
  303. PizzaToppingRelationship(pizza=my_pizza, topping=pepperoni),
  304. PizzaToppingRelationship(pizza=your_pizza, topping=pepperoni),
  305. PizzaToppingRelationship(pizza=your_pizza, topping=mushroom),
  306. ], ignore_conflicts=True)
  307. ...is preferable to::
  308. my_pizza.toppings.add(pepperoni)
  309. your_pizza.toppings.add(pepperoni, mushroom)
  310. ...where ``Pizza`` and ``Topping`` have a many-to-many relationship. Note that
  311. there are a number of :meth:`caveats to this method
  312. <django.db.models.query.QuerySet.bulk_create>`, so make sure it's appropriate
  313. for your use case.
  314. Remove in bulk
  315. --------------
  316. When removing objects from :class:`ManyToManyFields
  317. <django.db.models.ManyToManyField>`, use
  318. :meth:`~django.db.models.fields.related.RelatedManager.remove` with multiple
  319. objects to reduce the number of SQL queries. For example::
  320. my_band.members.remove(me, my_friend)
  321. ...is preferable to::
  322. my_band.members.remove(me)
  323. my_band.members.remove(my_friend)
  324. ...where ``Bands`` and ``Artists`` have a many-to-many relationship.
  325. When removing different pairs of objects from :class:`ManyToManyFields
  326. <django.db.models.ManyToManyField>`, use
  327. :meth:`~django.db.models.query.QuerySet.delete` on a
  328. :class:`~django.db.models.Q` expression with multiple
  329. :attr:`~django.db.models.ManyToManyField.through` model instances to reduce
  330. the number of SQL queries. For example::
  331. from django.db.models import Q
  332. PizzaToppingRelationship = Pizza.toppings.through
  333. PizzaToppingRelationship.objects.filter(
  334. Q(pizza=my_pizza, topping=pepperoni) |
  335. Q(pizza=your_pizza, topping=pepperoni) |
  336. Q(pizza=your_pizza, topping=mushroom)
  337. ).delete()
  338. ...is preferable to::
  339. my_pizza.toppings.remove(pepperoni)
  340. your_pizza.toppings.remove(pepperoni, mushroom)
  341. ...where ``Pizza`` and ``Topping`` have a many-to-many relationship.