================
Full text search
================

The database functions in the ``django.contrib.postgres.search`` module ease
the use of PostgreSQL's `full text search engine
<https://www.postgresql.org/docs/current/textsearch.html>`_.

For the examples in this document, we'll use the models defined in
:doc:`/topics/db/queries`.

.. seealso::

    For a high-level overview of searching, see the :doc:`topic documentation
    </topics/db/search>`.

.. currentmodule:: django.contrib.postgres.search

The ``search`` lookup
=====================

.. fieldlookup:: search

The simplest way to use full text search is to search a single term against a
single column in the database. For example::

    >>> Entry.objects.filter(body_text__search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

This creates a ``to_tsvector`` in the database from the ``body_text`` field
and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
default database search configuration. The results are obtained by matching the
query and the vector.

To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
:setting:`INSTALLED_APPS`.

``SearchVector``
================

.. class:: SearchVector(*expressions, config=None, weight=None)

Searching against a single field is great but rather limiting. The ``Entry``
instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
To query against both fields, use a ``SearchVector``::

    >>> from django.contrib.postgres.search import SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', 'blog__tagline'),
    ... ).filter(search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

The arguments to ``SearchVector`` can be any
:class:`~django.db.models.Expression` or the name of a field. Multiple
arguments will be concatenated together using a space so that the search
document includes them all.

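
As a rough mental model of that concatenation, the following plain-Python
sketch builds a search document from a dictionary standing in for a database
row. The ``build_document`` helper and the sample row are hypothetical and
aren't part of Django, and the sketch assumes that a missing value contributes
an empty string::

    def build_document(row, field_names):
        """Join the named values with single spaces (hypothetical helper)."""
        parts = []
        for name in field_names:
            value = row.get(name)
            # Assumption: a missing (None) value contributes an empty string.
            parts.append("" if value is None else str(value))
        return " ".join(parts)


    # A dictionary standing in for one Entry row joined to its Blog.
    row = {"body_text": "Cheese on toast", "blog__tagline": "All about food"}
    document = build_document(row, ["body_text", "blog__tagline"])

The database then turns the combined text into a single search vector, which
is why a term found in either field produces a match.
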
``SearchVector`` objects can be combined together, allowing you to reuse them.
For example::

    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text') + SearchVector('blog__tagline'),
    ... ).filter(search='Cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

See :ref:`postgresql-fts-search-configuration` and
:ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
and ``weight`` parameters.

``SearchQuery``
===============

.. class:: SearchQuery(value, config=None, search_type='plain')

``SearchQuery`` translates the terms the user provides into a search query
object that the database compares to a search vector. By default, all the words
the user provides are passed through the stemming algorithms, and then it
looks for matches for all of the resulting terms.

If ``search_type`` is ``'plain'``, which is the default, the terms are treated
as separate keywords. If ``search_type`` is ``'phrase'``, the terms are treated
as a single phrase. If ``search_type`` is ``'raw'``, then you can provide a
formatted search query with terms and operators. Read PostgreSQL's `Full Text
Search docs`_ to learn about differences and syntax. Examples:

.. _Full Text Search docs: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery('red tomato')  # two keywords
    >>> SearchQuery('tomato red')  # same results as above
    >>> SearchQuery('red tomato', search_type='phrase')  # a phrase
    >>> SearchQuery('tomato red', search_type='phrase')  # a different phrase
    >>> SearchQuery("'tomato' & ('red' | 'green')", search_type='raw')  # boolean operators

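
The difference between the ``'plain'`` and ``'phrase'`` search types can be
pictured with a small plain-Python sketch. It is only an illustration: the
function names are hypothetical, and it skips the stemming, stop-word removal,
and position bookkeeping that PostgreSQL performs::

    def tokenize(text):
        """Lowercase and split on whitespace (no stemming in this sketch)."""
        return text.lower().split()


    def matches_plain(document, query):
        """Every query term must occur in the document, in any order."""
        words = set(tokenize(document))
        return all(term in words for term in tokenize(query))


    def matches_phrase(document, query):
        """The query terms must occur next to each other, in the same order."""
        words, terms = tokenize(document), tokenize(query)
        return any(
            words[i:i + len(terms)] == terms
            for i in range(len(words) - len(terms) + 1)
        )


    document = "a ripe red tomato salad"
    plain_either_order = (
        matches_plain(document, "red tomato"),
        matches_plain(document, "tomato red"),
    )
    phrase_either_order = (
        matches_phrase(document, "red tomato"),
        matches_phrase(document, "tomato red"),
    )

In this sketch both plain queries match the sample document, while only the
phrase whose words appear in document order does.
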
``SearchQuery`` terms can be combined logically to provide more flexibility::

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery('meat') & SearchQuery('cheese')  # AND
    >>> SearchQuery('meat') | SearchQuery('cheese')  # OR
    >>> ~SearchQuery('meat')  # NOT

See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.

.. versionadded:: 2.2

    The ``search_type`` parameter was added.

``SearchRank``
==============

.. class:: SearchRank(vector, query, weights=None)

So far, we've returned every result for which any match between the vector
and the query is possible. You'll likely want to order the results by some
measure of relevancy. PostgreSQL provides a ranking function which takes into
account how often the query terms appear in the document, how close together
the terms are in the document, and how important the part of the document is
where they occur. The better the match, the higher the value of the rank. To
order by relevancy::

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector('body_text')
    >>> query = SearchQuery('cheese')
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

See :ref:`postgresql-fts-weighting-queries` for an explanation of the
``weights`` parameter.

.. _postgresql-fts-search-configuration:

Changing the search configuration
=================================

You can pass the ``config`` parameter to a :class:`SearchVector` and a
:class:`SearchQuery` to use a different search configuration. This allows using
different language parsers and dictionaries as defined by the database::

    >>> from django.contrib.postgres.search import SearchQuery, SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', config='french'),
    ... ).filter(search=SearchQuery('œuf', config='french'))
    [<Entry: Pain perdu>]

The value of ``config`` can also be stored in another column::

    >>> from django.db.models import F
    >>> Entry.objects.annotate(
    ...     search=SearchVector('body_text', config=F('blog__language')),
    ... ).filter(search=SearchQuery('œuf', config=F('blog__language')))
    [<Entry: Pain perdu>]

.. _postgresql-fts-weighting-queries:

Weighting queries
=================

Not every field has the same relevance in a query, so you can assign weights
to individual vectors before you combine them::

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector('body_text', weight='A') + SearchVector('blog__tagline', weight='B')
    >>> query = SearchQuery('cheese')
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by('rank')

The weight should be one of the following letters: D, C, B, A. By default,
these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
respectively. If you wish to weight them differently, pass a list of four
floats to :class:`SearchRank` as ``weights``, in the same order as above::

    >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
    >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by('-rank')

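
Because the list runs from D to A rather than from A to D, it's easy to pass
the floats in the wrong order. The following plain-Python sketch pairs each
letter with its number so the ordering is explicit. ``weights_by_label`` is a
hypothetical helper written for this illustration, not part of Django::

    # Letters in the order SearchRank expects its ``weights`` list: D, C, B, A.
    WEIGHT_LABELS = ("D", "C", "B", "A")
    DEFAULT_WEIGHTS = (0.1, 0.2, 0.4, 1.0)


    def weights_by_label(weights=DEFAULT_WEIGHTS):
        """Pair each weight letter with its number (hypothetical helper)."""
        if len(weights) != len(WEIGHT_LABELS):
            raise ValueError("expected four floats, ordered D, C, B, A")
        return dict(zip(WEIGHT_LABELS, weights))


    defaults = weights_by_label()
    custom = weights_by_label([0.2, 0.4, 0.6, 0.8])

With the custom list from the example above, a vector weighted ``'A'`` counts
for ``0.8`` and one weighted ``'B'`` counts for ``0.6``.
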
Performance
===========

Special database configuration isn't necessary to use any of these functions;
however, if you're searching more than a few hundred records, you're likely to
run into performance problems. Full text search is a more intensive process
than comparing the size of an integer, for example.

If all the fields you're querying on are contained within one particular
model, you can create a functional index which matches the search vector you
wish to use. The PostgreSQL documentation has details on
`creating indexes for full text search
<https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX>`_.

``SearchVectorField``
---------------------

.. class:: SearchVectorField

If this approach becomes too slow, you can add a ``SearchVectorField`` to your
model. You'll need to keep it populated with triggers, for example, as
described in the `PostgreSQL documentation`_. You can then query the field as
if it were an annotated ``SearchVector``::

    >>> Entry.objects.update(search_vector=SearchVector('body_text'))
    >>> Entry.objects.filter(search_vector='cheese')
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

.. _PostgreSQL documentation: https://www.postgresql.org/docs/current/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS

Trigram similarity
==================

Another approach to searching is trigram similarity. A trigram is a group of
three consecutive characters. In addition to the :lookup:`trigram_similar`
lookup, you can use a couple of other expressions.

To use them, you need to activate the `pg_trgm extension
<https://www.postgresql.org/docs/current/pgtrgm.html>`_ on PostgreSQL. You can
install it using the
:class:`~django.contrib.postgres.operations.TrigramExtension` migration
operation.

``TrigramSimilarity``
---------------------

.. class:: TrigramSimilarity(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the
trigram similarity between the two arguments.

Usage example::

    >>> from django.contrib.postgres.search import TrigramSimilarity
    >>> Author.objects.create(name='Katy Stevens')
    >>> Author.objects.create(name='Stephen Keats')
    >>> test = 'Katie Stephens'
    >>> Author.objects.annotate(
    ...     similarity=TrigramSimilarity('name', test),
    ... ).filter(similarity__gt=0.3).order_by('-similarity')
    [<Author: Katy Stevens>, <Author: Stephen Keats>]

``TrigramDistance``
-------------------

.. class:: TrigramDistance(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the
trigram distance between the two arguments.

Usage example::

    >>> from django.contrib.postgres.search import TrigramDistance
    >>> Author.objects.create(name='Katy Stevens')
    >>> Author.objects.create(name='Stephen Keats')
    >>> test = 'Katie Stephens'
    >>> Author.objects.annotate(
    ...     distance=TrigramDistance('name', test),
    ... ).filter(distance__lte=0.7).order_by('distance')
    [<Author: Katy Stevens>, <Author: Stephen Keats>]

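
To make the two measures more concrete, here is a plain-Python approximation
of what ``pg_trgm`` computes. It is a sketch under simplifying assumptions
rather than the extension's actual implementation: only ASCII letters and
digits count as word characters, and the function names are hypothetical::

    import re


    def trigrams(text):
        """Return the set of trigrams in ``text``, roughly as pg_trgm does."""
        grams = set()
        # Lowercase the text and keep only runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            # Each word is padded with two leading spaces and one trailing space.
            padded = "  " + word + " "
            grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
        return grams


    def similarity(a, b):
        """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
        first, second = trigrams(a), trigrams(b)
        if not first or not second:
            return 0.0
        return len(first & second) / len(first | second)


    def distance(a, b):
        """Trigram distance is one minus the similarity."""
        return 1.0 - similarity(a, b)


    test = "Katie Stephens"
    closest = max(["Katy Stevens", "Stephen Keats"], key=lambda name: similarity(name, test))

Because the distance is the complement of the similarity, filtering on a
similarity above ``0.3`` and filtering on a distance below ``0.7`` select the
same rows, and ordering by descending similarity matches ordering by ascending
distance.
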