123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260 |
- .. _topics-db-optimization:
- ============================
- Database access optimization
- ============================
- Django's database layer provides various ways to help developers get the most
- out of their databases. This document gathers together links to the relevant
- documentation, and adds various tips, organized under an number of headings that
- outline the steps to take when attempting to optimize your database usage.
- Profile first
- =============
- As general programming practice, this goes without saying. Find out :ref:`what
- queries you are doing and what they are costing you
- <faq-see-raw-sql-queries>`. You may also want to use an external project like
- 'django-debug-toolbar', or a tool that monitors your database directly.
- Remember that you may be optimizing for speed or memory or both, depending on
- your requirements. Sometimes optimizing for one will be detrimental to the
- other, but sometimes they will help each other. Also, work that is done by the
- database process might not have the same cost (to you) as the same amount of
- work done in your Python process. It is up to you to decide what your
- priorities are, where the balance must lie, and profile all of these as required
- since this will depend on your application and server.
- With everything that follows, remember to profile after every change to ensure
- that the change is a benefit, and a big enough benefit given the decrease in
- readability of your code. **All** of the suggestions below come with the caveat
- that in your circumstances the general principle might not apply, or might even
- be reversed.
- Use standard DB optimization techniques
- =======================================
- ...including:
- * Indexes. This is a number one priority, *after* you have determined from
- profiling what indexes should be added. Use :attr:`django.db.models.Field.db_index` to add
- these from Django.
- * Appropriate use of field types.
- We will assume you have done the obvious things above. The rest of this document
- focuses on how to use Django in such a way that you are not doing unnecessary
- work. This document also does not address other optimization techniques that
- apply to all expensive operations, such as :ref:`general purpose caching
- <topics-cache>`.
- Understand QuerySets
- ====================
- Understanding :ref:`QuerySets <ref-models-querysets>` is vital to getting good
- performance with simple code. In particular:
- Understand QuerySet evaluation
- ------------------------------
- To avoid performance problems, it is important to understand:
- * that :ref:`QuerySets are lazy <querysets-are-lazy>`.
- * when :ref:`they are evaluated <when-querysets-are-evaluated>`.
- * how :ref:`the data is held in memory <caching-and-querysets>`.
- Understand cached attributes
- ----------------------------
- As well as caching of the whole ``QuerySet``, there is caching of the result of
- attributes on ORM objects. In general, attributes that are not callable will be
- cached. For example, assuming the :ref:`example weblog models
- <queryset-model-example>`:
- >>> entry = Entry.objects.get(id=1)
- >>> entry.blog # Blog object is retrieved at this point
- >>> entry.blog # cached version, no DB access
- But in general, callable attributes cause DB lookups every time::
- >>> entry = Entry.objects.get(id=1)
- >>> entry.authors.all() # query performed
- >>> entry.authors.all() # query performed again
- Be careful when reading template code - the template system does not allow use
- of parentheses, but will call callables automatically, hiding the above
- distinction.
- Be careful with your own custom properties - it is up to you to implement
- caching.
- Use the ``with`` template tag
- -----------------------------
- To make use of the caching behaviour of ``QuerySet``, you may need to use the
- :ttag:`with` template tag.
- Use ``iterator()``
- ------------------
- When you have a lot of objects, the caching behaviour of the ``QuerySet`` can
- cause a large amount of memory to be used. In this case,
- :meth:`~django.db.models.QuerySet.iterator()` may help.
- Do database work in the database rather than in Python
- ======================================================
- For instance:
- * At the most basic level, use :ref:`filter and exclude <queryset-api>` to do
- filtering in the database.
- * Use :ref:`F() object query expressions <query-expressions>` to do filtering
- against other fields within the same model.
- * Use :ref:`annotate to do aggregation in the database <topics-db-aggregation>`.
- If these aren't enough to generate the SQL you need:
- Use ``QuerySet.extra()``
- ------------------------
- A less portable but more powerful method is
- :meth:`~django.db.models.QuerySet.extra()`, which allows some SQL to be
- explicitly added to the query. If that still isn't powerful enough:
- Use raw SQL
- -----------
- Write your own :ref:`custom SQL to retrieve data or populate models
- <topics-db-sql>`. Use ``django.db.connection.queries`` to find out what Django
- is writing for you and start from there.
- Retrieve everything at once if you know you will need it
- ========================================================
- Hitting the database multiple times for different parts of a single 'set' of
- data that you will need all parts of is, in general, less efficient than
- retrieving it all in one query. This is particularly important if you have a
- query that is executed in a loop, and could therefore end up doing many database
- queries, when only one was needed. So:
- Use ``QuerySet.select_related()``
- ---------------------------------
- Understand :ref:`QuerySet.select_related() <select-related>` thoroughly, and use it:
- * in view code,
- * and in :ref:`managers and default managers <topics-db-managers>` where
- appropriate. Be aware when your manager is and is not used; sometimes this is
- tricky so don't make assumptions.
- Don't retrieve things you don't need
- ====================================
- Use ``QuerySet.values()`` and ``values_list()``
- -----------------------------------------------
- When you just want a dict/list of values, and don't need ORM model objects, make
- appropriate usage of :meth:`~django.db.models.QuerySet.values()`.
- These can be useful for replacing model objects in template code - as long as
- the dicts you supply have the same attributes as those used in the template, you
- are fine.
- Use ``QuerySet.defer()`` and ``only()``
- ---------------------------------------
- Use :meth:`~django.db.models.QuerySet.defer()` and
- :meth:`~django.db.models.QuerySet.only()` if there are database columns you
- know that you won't need (or won't need in most cases) to avoid loading
- them. Note that if you *do* use them, the ORM will have to go and get them in a
- separate query, making this a pessimization if you use it inappropriately.
- Use QuerySet.count()
- --------------------
- ...if you only want the count, rather than doing ``len(queryset)``.
- Use QuerySet.exists()
- ---------------------
- ...if you only want to find out if at least one result exists, rather than ``if
- queryset``.
- But:
- Don't overuse ``count()`` and ``exists()``
- ------------------------------------------
- If you are going to need other data from the QuerySet, just evaluate it.
- For example, assuming an Email class that has a ``body`` attribute and a
- many-to-many relation to User, the following template code is optimal:
- .. code-block:: html+django
- {% if display_inbox %}
- {% with user.emails.all as emails %}
- {% if emails %}
- <p>You have {{ emails|length }} email(s)</p>
- {% for email in emails %}
- <p>{{ email.body }}</p>
- {% endfor %}
- {% else %}
- <p>No messages today.</p>
- {% endif %}
- {% endwith %}
- {% endif %}
- It is optimal because:
- 1. Since QuerySets are lazy, this does no database if 'display_inbox' is False.
- #. Use of ``with`` means that we store ``user.emails.all`` in a variable for
- later use, allowing its cache to be re-used.
- #. The line ``{% if emails %}`` causes ``QuerySet.__nonzero__()`` to be called,
- which causes the ``user.emails.all()`` query to be run on the database, and
- at the least the first line to be turned into an ORM object. If there aren't
- any results, it will return False, otherwise True.
- #. The use of ``{{ emails|length }}`` calls ``QuerySet.__len__()``, filling
- out the rest of the cache without doing another query.
- #. The ``for`` loop iterates over the already filled cache.
- In total, this code does either one or zero database queries. The only
- deliberate optimization performed is the use of the ``with`` tag. Using
- ``QuerySet.exists()`` or ``QuerySet.count()`` at any point would cause
- additional queries.
- Use ``QuerySet.update()`` and ``delete()``
- ------------------------------------------
- Rather than retrieve a load of objects, set some values, and save them
- individual, use a bulk SQL UPDATE statement, via :ref:`QuerySet.update()
- <topics-db-queries-update>`. Similarly, do :ref:`bulk deletes
- <topics-db-queries-delete>` where possible.
- Note, however, that these bulk update methods cannot call the ``save()`` or ``delete()``
- methods of individual instances, which means that any custom behaviour you have
- added for these methods will not be executed, including anything driven from the
- normal database object :ref:`signals <ref-signals>`.
- Use foreign key values directly
- -------------------------------
- If you only need a foreign key value, use the foreign key value that is already on
- the object you've got, rather than getting the whole related object and taking
- its primary key. i.e. do::
- entry.blog_id
- instead of::
- entry.blog.id
|