======
Search
======

A common task for web applications is to search some data in the database with
user input. In a simple case, this could be filtering a list of objects by a
category. A more complex use case might require searching with weighting,
categorization, highlighting, multiple languages, and so on. This document
explains some of the possible use cases and the tools you can use.

We'll refer to the same models used in :doc:`/topics/db/queries`.

Use Cases
=========

Standard textual queries
------------------------

Text-based fields have a selection of simple matching operations. For example,
you may wish to allow lookup of an author like so::

    >>> Author.objects.filter(name__contains='Terry')
    [<Author: Terry Gilliam>, <Author: Terry Jones>]

This is a very fragile solution as it requires the user to know an exact
substring of the author's name. A better approach could be a case-insensitive
match (:lookup:`icontains`), but this is only marginally better.

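To see why substring matching is fragile, the following minimal sketch imitates
the two lookups in plain Python. It is an illustration only: the ``contains``
and ``icontains`` helpers and the ``names`` list are stand-ins invented for this
example, not Django APIs, and the real lookups are evaluated by the database.

```python
# Plain-Python stand-ins for the two lookups (hypothetical helpers, not Django APIs).
names = ["Terry Gilliam", "Terry Jones", "Eric Idle"]


def contains(values, term):
    """Case-sensitive substring match, roughly what ``contains`` asks for."""
    return [value for value in values if term in value]


def icontains(values, term):
    """Case-insensitive substring match, roughly what ``icontains`` asks for."""
    needle = term.casefold()
    return [value for value in values if needle in value.casefold()]


print(contains(names, "Terry"))   # both Terrys
print(contains(names, "terry"))   # nothing: the case differs
print(icontains(names, "terry"))  # both Terrys again
print(icontains(names, "Terri"))  # nothing: a one-letter typo still fails
```

Ignoring case rescues the second query, but any misspelling still returns an
empty result.
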
A database's more advanced comparison functions
-----------------------------------------------

If you're using PostgreSQL, Django provides :doc:`a selection of database
specific tools </ref/contrib/postgres/search>` to allow you to leverage more
complex querying options. Other databases have different selections of tools,
possibly via plugins or user-defined functions. Django doesn't include any
support for them at this time. We'll use some examples from PostgreSQL to
demonstrate the kind of functionality databases may have.

.. admonition:: Searching in other databases

    All of the searching tools provided by :mod:`django.contrib.postgres` are
    constructed entirely on public APIs such as :doc:`custom lookups
    </ref/models/lookups>` and :doc:`database functions
    </ref/models/database-functions>`. Depending on your database, you should
    be able to construct queries to allow similar APIs. If there are specific
    things which cannot be achieved this way, please open a ticket.

In the above example, we determined that a case-insensitive lookup would be
more useful. When dealing with non-English names, a further improvement is to
use :lookup:`unaccented comparison <unaccent>`::

    >>> Author.objects.filter(name__unaccent__icontains='Helen')
    [<Author: Helen Mirren>, <Author: Helena Bonham Carter>, <Author: Hélène Joy>]

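The effect of an unaccented, case-insensitive comparison can be approximated in
plain Python. This is a sketch under assumptions: ``strip_accents`` and
``unaccent_icontains`` are hypothetical helpers written for this example, and
stripping Unicode combining marks only approximates the database's own
unaccenting rules.

```python
import unicodedata

# Hypothetical sample data mirroring the query result above.
names = ["Helen Mirren", "Helena Bonham Carter", "Hélène Joy"]


def strip_accents(text):
    """Drop combining marks after Unicode decomposition (approximates unaccent)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def unaccent_icontains(values, term):
    """Accent- and case-insensitive substring match over a list of strings."""
    needle = strip_accents(term).casefold()
    return [v for v in values if needle in strip_accents(v).casefold()]


print(unaccent_icontains(names, "Helen"))   # all three names
print(unaccent_icontains(names, "Hélène"))  # only "Hélène Joy"
```
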
This shows another issue, where we are matching against a different spelling of
the name. In this case we have an asymmetry, though: a search for ``Helen``
will pick up ``Helena`` or ``Hélène``, but not the reverse. Another option
would be to use a :lookup:`trigram_similar` comparison, which compares
sequences of letters.

For example::

    >>> Author.objects.filter(name__unaccent__lower__trigram_similar='Hélène')
    [<Author: Helen Mirren>, <Author: Hélène Joy>]

Now we have a different problem: "Helena Bonham Carter" doesn't show up
because the name is much longer. Trigram searches consider all combinations
of three letters and compare how many appear in both the search and source
strings. For the longer name, there are more combinations which appear only
in the source string, so it is no longer considered a close match.

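The scoring can be sketched in plain Python. This is a rough approximation, not
the database's implementation: it assumes each word is lower-cased and padded
with two leading spaces and one trailing space, that similarity is the number
of shared trigrams divided by the number of distinct trigrams in either string,
and that ``0.3`` is the match threshold (the ``pg_trgm`` default). The helper
names are hypothetical.

```python
import re
import unicodedata

THRESHOLD = 0.3  # assumed default similarity threshold


def unaccent(text):
    """Approximate unaccenting by dropping Unicode combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigrams(text):
    """Return the set of three-character sequences of each padded word."""
    grams = set()
    for word in re.findall(r"\w+", unaccent(text).lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    grams_a, grams_b = trigrams(a), trigrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


for name in ["Helen Mirren", "Helena Bonham Carter", "Hélène Joy"]:
    score = similarity("Hélène", name)
    print(f"{name}: {score:.2f} match={score > THRESHOLD}")
```

Under these assumptions the two shorter names score above the threshold, while
the extra trigrams contributed by "Bonham" and "Carter" enlarge the denominator
enough to push the third name below it.
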
The correct choice of comparison functions here depends on your particular data
set, for example the language(s) used and the type of text being searched. All
of the examples we've seen are on short strings where the user is likely to
enter something close (by varying definitions) to the source data.

Document-based search
---------------------

Simple database operations are too limited an approach when you start
considering large blocks of text. Whereas the examples above can be thought of
as operations on a string of characters, full text search looks at the actual
words. Depending on the system used, it's likely to use some of the following
ideas:

- Ignoring "stop words" such as "a", "the", "and".
- Stemming words, so that "pony" and "ponies" are considered similar.
- Weighting words based on different criteria such as how frequently they
  appear in the text, or the importance of the fields, such as the title or
  keywords, that they appear in.

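The three ideas above can be sketched together in a few lines of Python. This
is a toy illustration, not how any particular search engine works: the stop
word list, the ``naive_stem`` rules, and the field weights are all assumptions
chosen for the example, and real stemmers are language-aware.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on"}  # tiny illustrative list


def naive_stem(word):
    """Very crude suffix stripping; real stemmers are language-aware."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def normalize(text):
    """Lower-case, split into words, drop stop words, and stem what is left."""
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(word) for word in words if word not in STOP_WORDS]


def score(document, query, field_weights):
    """Sum weighted term frequencies of the query terms across the fields."""
    terms = normalize(query)
    total = 0.0
    for field, weight in field_weights.items():
        counts = Counter(normalize(document.get(field, "")))
        total += weight * sum(counts[term] for term in terms)
    return total


doc = {"title": "The Pony Club", "body": "Ponies and a pony in the field"}
print(normalize(doc["body"]))
print(score(doc, "ponies", {"title": 2.0, "body": 1.0}))
```

Here a query for "ponies" matches "pony" in both fields, and a hit in the title
counts twice as much as a hit in the body.
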
There are many alternative search engines; some of the most prominent are
Elastic_ and Solr_. These are full document-based search
solutions. To use them with data from Django models, you'll need a layer which
translates your data into a textual document, including back-references to the
database ids. When a search using the engine returns a certain document, you
can then look it up in the database. There are a variety of third-party
libraries which are designed to help with this process.

.. _Elastic: https://www.elastic.co/
.. _Solr: https://lucene.apache.org/solr/

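The translation layer described above can be pictured with a small in-memory
sketch. Everything here is hypothetical: ``Entry`` is a plain dataclass standing
in for a model instance, ``to_document`` is an invented helper, and ``ToyIndex``
is a stand-in for an external engine rather than the API of Elastic, Solr, or
any third-party library.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for a Django model instance."""
    id: int
    headline: str
    body_text: str


def to_document(entry):
    """Flatten a record into the text document a search engine would index."""
    return {
        "id": entry.id,  # back-reference to the database row
        "text": f"{entry.headline}\n{entry.body_text}",
    }


class ToyIndex:
    """In-memory stand-in for an external search engine."""

    def __init__(self):
        self._postings = {}  # word -> set of document ids

    def add(self, document):
        for word in document["text"].lower().split():
            self._postings.setdefault(word, set()).add(document["id"])

    def search(self, word):
        return sorted(self._postings.get(word.lower(), set()))


entries = {
    1: Entry(1, "Cheese on Toast recipes", "Grill the cheese until golden"),
    2: Entry(2, "Dairy farming", "Milk and cheese production"),
    3: Entry(3, "Bread basics", "Flour, water, salt"),
}

index = ToyIndex()
for entry in entries.values():
    index.add(to_document(entry))

# The engine returns ids; the actual rows are then looked up in the "database".
matching_ids = index.search("cheese")
print([entries[pk].headline for pk in matching_ids])
```

With real models, the final lookup would typically be a query such as
``Entry.objects.filter(pk__in=matching_ids)``.
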
PostgreSQL support
~~~~~~~~~~~~~~~~~~

PostgreSQL has its own full text search implementation built in. While not as
powerful as some other search engines, it has the advantage of being inside
your database and so can easily be combined with other relational queries such
as categorization.

The :mod:`django.contrib.postgres` module provides some helpers to make these
queries. For example, a simple query might be to select all the blog entries
which mention "cheese"::

    >>> Entry.objects.filter(body_text__search='cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

You can also filter on a combination of fields and on related models::

    >>> from django.contrib.postgres.search import SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('blog__tagline', 'body_text'),
    ... ).filter(search='cheese')
    [
        <Entry: Cheese on Toast recipes>,
        <Entry: Pizza recipes>,
        <Entry: Dairy farming in Argentina>,
    ]

See the ``contrib.postgres`` :doc:`/ref/contrib/postgres/search` document for
complete details.