======
Search
======

A common task for web applications is to search some data in the database with
user input. In a simple case, this could be filtering a list of objects by a
category. A more complex use case might require searching with weighting,
categorization, highlighting, multiple languages, and so on. This document
explains some of the possible use cases and the tools you can use.

We'll refer to the same models used in :doc:`/topics/db/queries`.

Use Cases
=========

Standard textual queries
------------------------

Text-based fields have a selection of matching operations. For example, you may
wish to allow looking up an author like so:

.. code-block:: pycon

    >>> Author.objects.filter(name__contains="Terry")
    [<Author: Terry Gilliam>, <Author: Terry Jones>]

This is a very fragile solution as it requires the user to know an exact
substring of the author's name. A better approach could be a case-insensitive
match (:lookup:`icontains`), but this is only marginally better.

A database's more advanced comparison functions
-----------------------------------------------

If you're using PostgreSQL, Django provides :doc:`a selection of database
specific tools </ref/contrib/postgres/search>` to allow you to leverage more
complex querying options. Other databases have different selections of tools,
possibly via plugins or user-defined functions. Django doesn't include any
support for them at this time. We'll use some examples from PostgreSQL to
demonstrate the kind of functionality databases may have.

.. admonition:: Searching in other databases

    All of the searching tools provided by :mod:`django.contrib.postgres` are
    constructed entirely on public APIs such as :doc:`custom lookups
    </ref/models/lookups>` and :doc:`database functions
    </ref/models/database-functions>`. Depending on your database, you should
    be able to construct queries to allow similar APIs. If there are specific
    things which cannot be achieved this way, please open a ticket.

In the above example, we determined that a case-insensitive lookup would be
more useful. When dealing with non-English names, a further improvement is to
use :lookup:`unaccented comparison <unaccent>`:

.. code-block:: pycon

    >>> Author.objects.filter(name__unaccent__icontains="Helen")
    [<Author: Helen Mirren>, <Author: Helena Bonham Carter>, <Author: Hélène Joy>]
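
To see what this lookup does without a database at hand, the following sketch
approximates it in plain Python. The helper names ``strip_accents`` and
``unaccent_icontains`` are hypothetical, and Unicode decomposition is only a
stand-in for PostgreSQL's ``unaccent`` extension, which applies its own rules
file. The sketch assumes that both the stored value and the search term are
unaccented before the comparison.

.. code-block:: python

    import unicodedata


    def strip_accents(text):
        """Drop combining marks after compatibility decomposition (NFKD)."""
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


    def unaccent_icontains(value, term):
        """Rough stand-in for ``value__unaccent__icontains=term``."""
        return strip_accents(term).casefold() in strip_accents(value).casefold()


    # Illustrative data only; these names mirror the example above.
    names = ["Helen Mirren", "Helena Bonham Carter", "Hélène Joy", "Terry Jones"]
    matches = [name for name in names if unaccent_icontains(name, "Helen")]
    # matches == ["Helen Mirren", "Helena Bonham Carter", "Hélène Joy"]
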

This shows another issue, where we are matching against a different spelling of
the name. In this case we have an asymmetry, though: a search for ``Helen``
will pick up ``Helena`` or ``Hélène``, but not the reverse. Another option
would be to use a :lookup:`trigram_similar` comparison, which compares
sequences of letters.

For example:

.. code-block:: pycon

    >>> Author.objects.filter(name__unaccent__lower__trigram_similar="Hélène")
    [<Author: Helen Mirren>, <Author: Hélène Joy>]

Now we have a different problem: "Helena Bonham Carter" doesn't show up
because it is much longer. Trigram searches consider all combinations of three
letters, and compare how many appear in both search and source strings. For
the longer name, there are more combinations that don't appear in the search
string, so it is no longer considered a close match.
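
The effect of name length can be reproduced with a small sketch of a
trigram comparison. It assumes the scheme used by PostgreSQL's ``pg_trgm``
extension as commonly described: each word is lowercased and padded with two
leading spaces and one trailing space, and similarity is the number of shared
trigrams divided by the number of distinct trigrams in either string, with a
default threshold of ``0.3``. The function names are hypothetical, and the
real extension may differ in details such as character handling.

.. code-block:: python

    import re
    import unicodedata


    def normalize(text):
        """Strip accents and lowercase, mimicking ``unaccent__lower``."""
        decomposed = unicodedata.normalize("NFKD", text)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        return stripped.lower()


    def trigrams(text):
        """Return the set of three-character sequences of each padded word."""
        grams = set()
        for word in re.findall(r"\w+", text):
            padded = f"  {word} "
            grams.update(padded[i : i + 3] for i in range(len(padded) - 2))
        return grams


    def similarity(a, b):
        """Shared trigrams divided by all distinct trigrams of both strings."""
        ta, tb = trigrams(normalize(a)), trigrams(normalize(b))
        if not ta or not tb:
            return 0.0
        return len(ta & tb) / len(ta | tb)


    THRESHOLD = 0.3  # assumed default, see the lead-in
    names = ["Helen Mirren", "Helena Bonham Carter", "Hélène Joy"]
    matches = [name for name in names if similarity(name, "Hélène") > THRESHOLD]
    # "Helen Mirren" scores 5/14, "Hélène Joy" 7/11, and
    # "Helena Bonham Carter" only 5/23, which falls below the threshold.

In this sketch the longer name shares the same five trigrams with the search
term as "Helen Mirren" does, but its two extra words add many trigrams that
have no counterpart, which pulls the ratio down.
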

The correct choice of comparison functions here depends on your particular data
set, for example the language(s) used and the type of text being searched. All
of the examples we've seen are on short strings where the user is likely to
enter something close (by varying definitions) to the source data.

Document-based search
---------------------

Standard database operations stop being a useful approach when you start
considering large blocks of text. Whereas the examples above can be thought of
as operations on a string of characters, full text search looks at the actual
words. Depending on the system used, it's likely to use some of the following
ideas:

- Ignoring "stop words" such as "a", "the", "and".
- Stemming words, so that "pony" and "ponies" are considered similar.
- Weighting words based on different criteria such as how frequently they
  appear in the text, or the importance of the fields, such as the title or
  keywords, that they appear in.
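
The three ideas above can be illustrated with a deliberately naive sketch.
Everything in it is an assumption made for illustration: the stop word list is
tiny, the ``stem`` function only strips two common English suffixes, and the
field weights are arbitrary. Real search systems use proper language-aware
dictionaries and ranking functions.

.. code-block:: python

    import re
    from collections import Counter

    STOP_WORDS = {"a", "the", "and"}  # illustrative list, far from complete


    def stem(word):
        """Naive suffix stripping so that "ponies" and "pony" coincide."""
        if word.endswith("ies") and len(word) > 4:
            return word[:-3] + "y"
        if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
            return word[:-1]
        return word


    def tokens(text):
        """Lowercase, split into words, drop stop words, then stem."""
        words = re.findall(r"[a-z]+", text.lower())
        return [stem(word) for word in words if word not in STOP_WORDS]


    def score(query, title, body, title_weight=2.0, body_weight=1.0):
        """Count query terms per field, weighting title hits more heavily."""
        terms = set(tokens(query))
        title_counts = Counter(tokens(title))
        body_counts = Counter(tokens(body))
        return sum(
            title_weight * title_counts[term] + body_weight * body_counts[term]
            for term in terms
        )


    # One title hit (weight 2.0) plus two body hits (weight 1.0 each).
    result = score("ponies", "Pony care", "A pony and two ponies")
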

There are many alternatives for search software; some of the most prominent
are Elastic_ and Solr_. These are full document-based search solutions. To use
them with data from Django models, you'll need a layer which translates your
data into a textual document, including back-references to the database ids.
When a search using the engine returns a certain document, you can then look
it up in the database. There are a variety of third-party libraries which are
designed to help with this process.

.. _Elastic: https://www.elastic.co/
.. _Solr: https://solr.apache.org/
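
The translation layer described above can be sketched in plain Python. This is
not how Elastic or Solr are used in practice: ``ENTRIES`` is a hypothetical
stand-in for database rows keyed by primary key, and ``ToyIndex`` is a
hypothetical in-memory stand-in for the external search engine. The point is
only to show a document carrying a back-reference to the database id, and the
lookup that follows a search hit.

.. code-block:: python

    import re
    from collections import defaultdict

    # Hypothetical rows, keyed by primary key.
    ENTRIES = {
        1: {"headline": "Cheese on Toast recipes", "body_text": "Melt cheese on bread."},
        2: {"headline": "Pizza recipes", "body_text": "Top the dough with cheese."},
        3: {"headline": "Gardening notes", "body_text": "Plant tomatoes in spring."},
    }


    def to_document(pk, row):
        """Flatten a row into text, keeping the id as a back-reference."""
        return {"id": pk, "text": f"{row['headline']} {row['body_text']}"}


    class ToyIndex:
        """Minimal inverted index mapping each word to document ids."""

        def __init__(self):
            self._postings = defaultdict(set)

        def add(self, document):
            for word in re.findall(r"\w+", document["text"].lower()):
                self._postings[word].add(document["id"])

        def search(self, term):
            return sorted(self._postings.get(term.lower(), set()))


    index = ToyIndex()
    for pk, row in ENTRIES.items():
        index.add(to_document(pk, row))

    hit_ids = index.search("cheese")  # ids returned by the "engine"
    results = [ENTRIES[pk]["headline"] for pk in hit_ids]  # database lookup

With real models, the final step would typically be a query on the returned
ids, for example ``Entry.objects.filter(pk__in=hit_ids)``.
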

PostgreSQL support
~~~~~~~~~~~~~~~~~~

PostgreSQL has its own full text search implementation built-in. While not as
powerful as some other search engines, it has the advantage of being inside
your database and so can easily be combined with other relational queries such
as categorization.

The :mod:`django.contrib.postgres` module provides some helpers to make these
queries. For example, a query might select all the blog entries which mention
"cheese":

.. code-block:: pycon

    >>> Entry.objects.filter(body_text__search="cheese")
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

You can also filter on a combination of fields and on related models:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector("blog__tagline", "body_text"),
    ... ).filter(search="cheese")
    [
        <Entry: Cheese on Toast recipes>,
        <Entry: Pizza recipes>,
        <Entry: Dairy farming in Argentina>,
    ]

See the ``contrib.postgres`` :doc:`/ref/contrib/postgres/search` document for
complete details.