================
Full text search
================

The database functions in the ``django.contrib.postgres.search`` module ease
the use of PostgreSQL's `full text search engine
<https://www.postgresql.org/docs/current/textsearch.html>`_.

For the examples in this document, we'll use the models defined in
:doc:`/topics/db/queries`.

.. seealso::

    For a high-level overview of searching, see the :doc:`topic documentation
    </topics/db/search>`.

.. currentmodule:: django.contrib.postgres.search

The ``search`` lookup
=====================

.. fieldlookup:: search

A common way to use full text search is to search a single term against a
single column in the database. For example:

.. code-block:: pycon

    >>> Entry.objects.filter(body_text__search="Cheese")
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

This creates a ``to_tsvector`` in the database from the ``body_text`` field
and a ``plainto_tsquery`` from the search term ``'Cheese'``, both using the
default database search configuration. The results are obtained by matching the
query and the vector.

To use the ``search`` lookup, ``'django.contrib.postgres'`` must be in your
:setting:`INSTALLED_APPS`.

``SearchVector``
================

.. class:: SearchVector(*expressions, config=None, weight=None)

Searching against a single field is great but rather limiting. The ``Entry``
instances we're searching belong to a ``Blog``, which has a ``tagline`` field.
To query against both fields, use a ``SearchVector``:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector("body_text", "blog__tagline"),
    ... ).filter(search="Cheese")
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

The arguments to ``SearchVector`` can be any
:class:`~django.db.models.Expression` or the name of a field. Multiple
arguments will be concatenated together using a space so that the search
document includes them all.

``SearchVector`` objects can be combined together, allowing you to reuse them.
For example:

.. code-block:: pycon

    >>> Entry.objects.annotate(
    ...     search=SearchVector("body_text") + SearchVector("blog__tagline"),
    ... ).filter(search="Cheese")
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

See :ref:`postgresql-fts-search-configuration` and
:ref:`postgresql-fts-weighting-queries` for an explanation of the ``config``
and ``weight`` parameters.

``SearchQuery``
===============

.. class:: SearchQuery(value, config=None, search_type='plain')

``SearchQuery`` translates the terms the user provides into a search query
object that the database compares to a search vector. By default, all the words
the user provides are passed through the stemming algorithms, and then it
looks for matches for all of the resulting terms.

If ``search_type`` is ``'plain'``, which is the default, the terms are treated
as separate keywords. If ``search_type`` is ``'phrase'``, the terms are treated
as a single phrase. If ``search_type`` is ``'raw'``, then you can provide a
formatted search query with terms and operators. If ``search_type`` is
``'websearch'``, then you can provide a formatted search query, similar to the
one used by web search engines. ``'websearch'`` requires PostgreSQL ≥ 11. Read
PostgreSQL's `Full Text Search docs`_ to learn about differences and syntax.

Examples:

.. _Full Text Search docs: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery("red tomato")  # two keywords
    >>> SearchQuery("tomato red")  # same results as above
    >>> SearchQuery("red tomato", search_type="phrase")  # a phrase
    >>> SearchQuery("tomato red", search_type="phrase")  # a different phrase
    >>> SearchQuery("'tomato' & ('red' | 'green')", search_type="raw")  # boolean operators
    >>> SearchQuery(
    ...     "'tomato' ('red' OR 'green')", search_type="websearch"
    ... )  # websearch operators

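To make the difference between ``'plain'`` and ``'phrase'`` more concrete, the
following plain Python sketch mimics how PostgreSQL's ``plainto_tsquery`` and
``phraseto_tsquery`` place operators between terms. The ``sketch_tsquery()``
helper is hypothetical and isn't part of Django or PostgreSQL. Unlike the real
functions, it doesn't stem words or drop stop words:

.. code-block:: python

    import re


    def sketch_tsquery(text, search_type="plain"):
        """Hypothetical helper: show only where the operators end up."""
        terms = [f"'{term}'" for term in re.findall(r"\w+", text.lower())]
        if search_type == "plain":
            # Every keyword must match, in any order.
            return " & ".join(terms)
        if search_type == "phrase":
            # Each term must directly follow the previous one.
            return " <-> ".join(terms)
        raise ValueError("This sketch only covers 'plain' and 'phrase'.")


    print(sketch_tsquery("red tomato"))  # 'red' & 'tomato'
    print(sketch_tsquery("tomato red", "phrase"))  # 'tomato' <-> 'red'

Because ``&`` doesn't depend on term order, the two ``'plain'`` queries above
return the same results, while the two ``'phrase'`` queries don't.
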
``SearchQuery`` terms can be combined logically to provide more flexibility:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery
    >>> SearchQuery("meat") & SearchQuery("cheese")  # AND
    >>> SearchQuery("meat") | SearchQuery("cheese")  # OR
    >>> ~SearchQuery("meat")  # NOT

See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.

``SearchRank``
==============

.. class:: SearchRank(vector, query, weights=None, normalization=None, cover_density=False)

So far, we've returned every result for which any match between the vector and
the query is possible. You will likely want to order the results by some sort
of relevancy. PostgreSQL provides a ranking function which takes into
account how often the query terms appear in the document, how close together
the terms are in the document, and how important the part of the document is
where they occur. The better the match, the higher the value of the rank. To
order by relevancy:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector("body_text")
    >>> query = SearchQuery("cheese")
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by("-rank")
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

See :ref:`postgresql-fts-weighting-queries` for an explanation of the
``weights`` parameter.

Set the ``cover_density`` parameter to ``True`` to enable the cover density
ranking, which means that the proximity of matching query terms is taken into
account.

Provide an integer to the ``normalization`` parameter to control rank
normalization. This integer is a bit mask, so you can combine multiple
behaviors:

.. code-block:: pycon

    >>> from django.db.models import Value
    >>> Entry.objects.annotate(
    ...     rank=SearchRank(
    ...         vector,
    ...         query,
    ...         normalization=Value(2).bitor(Value(4)),
    ...     )
    ... )

The PostgreSQL documentation has more details about `different rank
normalization options`_.

.. _different rank normalization options: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING

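As a rough illustration of how the bit mask is built, the sketch below gives
names to the normalization flags listed in the PostgreSQL documentation. The
``RankNormalization`` class and its member names are hypothetical and exist
only for this example; neither Django nor PostgreSQL provides them:

.. code-block:: python

    from enum import IntFlag


    class RankNormalization(IntFlag):
        """Hypothetical names for PostgreSQL's documented flag values."""

        DEFAULT = 0  # Ignore the document length.
        LOG_LENGTH = 1  # Divide by 1 + the logarithm of the document length.
        LENGTH = 2  # Divide by the document length.
        EXTENT_DISTANCE = 4  # Divide by the mean harmonic distance between extents.
        UNIQUE_WORDS = 8  # Divide by the number of unique words.
        LOG_UNIQUE_WORDS = 16  # Divide by 1 + the logarithm of unique words.
        SELF_PLUS_ONE = 32  # Divide the rank by itself + 1.


    mask = RankNormalization.LENGTH | RankNormalization.EXTENT_DISTANCE
    print(int(mask))  # 6

The resulting value, ``6``, is the same mask that ``Value(2).bitor(Value(4))``
computes in the database. According to the PostgreSQL documentation, the
``4`` flag only has an effect with cover density ranking.
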
``SearchHeadline``
==================

.. class:: SearchHeadline(expression, query, config=None, start_sel=None, stop_sel=None, max_words=None, min_words=None, short_word=None, highlight_all=None, max_fragments=None, fragment_delimiter=None)

Accepts a single text field or an expression, a query, a config, and a set of
options. Returns highlighted search results.

Set the ``start_sel`` and ``stop_sel`` parameters to the string values to be
used to wrap highlighted query terms in the document. PostgreSQL's defaults are
``<b>`` and ``</b>``.

Provide integer values to the ``max_words`` and ``min_words`` parameters to
determine the longest and shortest headlines. PostgreSQL's defaults are 35 and
15.

Provide an integer value to the ``short_word`` parameter to discard words of
this length or less in each headline. PostgreSQL's default is 3.

Set the ``highlight_all`` parameter to ``True`` to use the whole document in
place of a fragment and ignore the ``max_words``, ``min_words``, and
``short_word`` parameters. That's disabled by default in PostgreSQL.

Provide a non-zero integer value to the ``max_fragments`` parameter to set the
maximum number of fragments to display. That's disabled by default in
PostgreSQL.

Set the ``fragment_delimiter`` string parameter to configure the delimiter
between fragments. PostgreSQL's default is ``" ... "``.

The PostgreSQL documentation has more details on `highlighting search
results`_.

Usage example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchHeadline, SearchQuery
    >>> query = SearchQuery("red tomato")
    >>> entry = Entry.objects.annotate(
    ...     headline=SearchHeadline(
    ...         "body_text",
    ...         query,
    ...         start_sel="<span>",
    ...         stop_sel="</span>",
    ...     ),
    ... ).get()
    >>> print(entry.headline)
    Sandwich with <span>tomato</span> and <span>red</span> cheese.

See :ref:`postgresql-fts-search-configuration` for an explanation of the
``config`` parameter.

.. _highlighting search results: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-HEADLINE

.. _postgresql-fts-search-configuration:

Changing the search configuration
=================================

You can specify the ``config`` attribute to a :class:`SearchVector` and
:class:`SearchQuery` to use a different search configuration. This allows using
different language parsers and dictionaries as defined by the database:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery, SearchVector
    >>> Entry.objects.annotate(
    ...     search=SearchVector("body_text", config="french"),
    ... ).filter(search=SearchQuery("œuf", config="french"))
    [<Entry: Pain perdu>]

The value of ``config`` could also be stored in another column:

.. code-block:: pycon

    >>> from django.db.models import F
    >>> Entry.objects.annotate(
    ...     search=SearchVector("body_text", config=F("blog__language")),
    ... ).filter(search=SearchQuery("œuf", config=F("blog__language")))
    [<Entry: Pain perdu>]

.. _postgresql-fts-weighting-queries:

Weighting queries
=================

Every field may not have the same relevance in a query, so you can set weights
of various vectors before you combine them:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
    >>> vector = SearchVector("body_text", weight="A") + SearchVector(
    ...     "blog__tagline", weight="B"
    ... )
    >>> query = SearchQuery("cheese")
    >>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by(
    ...     "rank"
    ... )

The weight should be one of the following letters: D, C, B, A. By default,
these weights refer to the numbers ``0.1``, ``0.2``, ``0.4``, and ``1.0``,
respectively. If you wish to weight them differently, pass a list of four
floats to :class:`SearchRank` as ``weights`` in the same order above:

.. code-block:: pycon

    >>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
    >>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by("-rank")

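The order of the ``weights`` list is easy to get backwards, because it starts
with the least important label. The following plain Python sketch pairs each
label with its number for the default and the custom weights used above. The
variable names are only illustrative:

.. code-block:: python

    # The list positions map to the labels D, C, B, A, in that order.
    LABELS = "DCBA"

    default_weights = dict(zip(LABELS, [0.1, 0.2, 0.4, 1.0]))
    custom_weights = dict(zip(LABELS, [0.2, 0.4, 0.6, 0.8]))

    print(default_weights)  # {'D': 0.1, 'C': 0.2, 'B': 0.4, 'A': 1.0}
    print(custom_weights)  # {'D': 0.2, 'C': 0.4, 'B': 0.6, 'A': 0.8}

With the custom list, a match in a vector weighted ``"A"`` counts for ``0.8``
and a match in a vector weighted ``"B"`` counts for ``0.6``.
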
Performance
===========

Special database configuration isn't necessary to use any of these functions;
however, if you're searching more than a few hundred records, you're likely to
run into performance problems. Full text search is a more intensive process
than comparing the size of an integer, for example.

In the event that all the fields you're querying on are contained within one
particular model, you can create a functional
:class:`GIN <django.contrib.postgres.indexes.GinIndex>` or
:class:`GiST <django.contrib.postgres.indexes.GistIndex>` index which matches
the search vector you wish to use. For example::

    from django.contrib.postgres.indexes import GinIndex
    from django.contrib.postgres.search import SearchVector

    GinIndex(
        SearchVector("body_text", "headline", config="english"),
        name="search_vector_idx",
    )

The PostgreSQL documentation has details on
`creating indexes for full text search
<https://www.postgresql.org/docs/current/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX>`_.

``SearchVectorField``
---------------------

.. class:: SearchVectorField

If this approach becomes too slow, you can add a ``SearchVectorField`` to your
model. You'll need to keep it populated with triggers, for example, as
described in the `PostgreSQL documentation`_. You can then query the field as
if it were an annotated ``SearchVector``:

.. code-block:: pycon

    >>> Entry.objects.update(search_vector=SearchVector("body_text"))
    >>> Entry.objects.filter(search_vector="cheese")
    [<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

.. _PostgreSQL documentation: https://www.postgresql.org/docs/current/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS

Trigram similarity
==================

Another approach to searching is trigram similarity. A trigram is a group of
three consecutive characters. In addition to the :lookup:`trigram_similar`,
:lookup:`trigram_word_similar`, and :lookup:`trigram_strict_word_similar`
lookups, you can use a couple of other expressions.

To use them, you need to activate the `pg_trgm extension
<https://www.postgresql.org/docs/current/pgtrgm.html>`_ on PostgreSQL. You can
install it using the
:class:`~django.contrib.postgres.operations.TrigramExtension` migration
operation.

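To give a feel for what the database computes, here is a minimal plain Python
sketch of trigram extraction and similarity, based on the description in the
``pg_trgm`` documentation: text is lowercased, non-alphanumeric characters
separate words, and each word is treated as having two spaces prefixed and one
space suffixed. The ``trigrams()`` and ``similarity()`` helpers are
hypothetical approximations and may differ from the extension in edge cases:

.. code-block:: python

    import re


    def trigrams(text):
        """Return the set of trigrams for every word in ``text``."""
        result = set()
        # Split on anything that isn't a letter or a digit.
        for word in re.findall(r"[^\W_]+", text.lower()):
            padded = f"  {word} "
            result.update(padded[i : i + 3] for i in range(len(padded) - 2))
        return result


    def similarity(first, second):
        """Shared trigrams divided by all distinct trigrams of both strings."""
        a, b = trigrams(first), trigrams(second)
        if not a or not b:
            return 0.0
        return len(a & b) / len(a | b)


    print(round(similarity("word", "two words"), 4))  # 0.3636

Here ``"word"`` has 5 trigrams, ``"two words"`` has 10, and 4 are shared, so
the similarity is ``4 / (5 + 10 - 4)``.
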
``TrigramSimilarity``
---------------------

.. class:: TrigramSimilarity(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the
trigram similarity between the two arguments.

Usage example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import TrigramSimilarity
    >>> Author.objects.create(name="Katy Stevens")
    >>> Author.objects.create(name="Stephen Keats")
    >>> test = "Katie Stephens"
    >>> Author.objects.annotate(
    ...     similarity=TrigramSimilarity("name", test),
    ... ).filter(
    ...     similarity__gt=0.3
    ... ).order_by("-similarity")
    [<Author: Katy Stevens>, <Author: Stephen Keats>]

``TrigramWordSimilarity``
-------------------------

.. class:: TrigramWordSimilarity(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the
trigram word similarity between the two arguments.

Usage example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import TrigramWordSimilarity
    >>> Author.objects.create(name="Katy Stevens")
    >>> Author.objects.create(name="Stephen Keats")
    >>> test = "Kat"
    >>> Author.objects.annotate(
    ...     similarity=TrigramWordSimilarity(test, "name"),
    ... ).filter(
    ...     similarity__gt=0.3
    ... ).order_by("-similarity")
    [<Author: Katy Stevens>]

``TrigramStrictWordSimilarity``
-------------------------------

.. class:: TrigramStrictWordSimilarity(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the
trigram strict word similarity between the two arguments. Similar to
:class:`TrigramWordSimilarity() <TrigramWordSimilarity>`, except that it forces
extent boundaries to match word boundaries.

``TrigramDistance``
-------------------

.. class:: TrigramDistance(expression, string, **extra)

Accepts a field name or expression, and a string or expression. Returns the
trigram distance between the two arguments.

Usage example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import TrigramDistance
    >>> Author.objects.create(name="Katy Stevens")
    >>> Author.objects.create(name="Stephen Keats")
    >>> test = "Katie Stephens"
    >>> Author.objects.annotate(
    ...     distance=TrigramDistance("name", test),
    ... ).filter(
    ...     distance__lte=0.7
    ... ).order_by("distance")
    [<Author: Katy Stevens>, <Author: Stephen Keats>]

``TrigramWordDistance``
-----------------------

.. class:: TrigramWordDistance(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the
trigram word distance between the two arguments.

Usage example:

.. code-block:: pycon

    >>> from django.contrib.postgres.search import TrigramWordDistance
    >>> Author.objects.create(name="Katy Stevens")
    >>> Author.objects.create(name="Stephen Keats")
    >>> test = "Kat"
    >>> Author.objects.annotate(
    ...     distance=TrigramWordDistance(test, "name"),
    ... ).filter(
    ...     distance__lte=0.7
    ... ).order_by("distance")
    [<Author: Katy Stevens>]

``TrigramStrictWordDistance``
-----------------------------

.. class:: TrigramStrictWordDistance(string, expression, **extra)

Accepts a string or expression, and a field name or expression. Returns the
trigram strict word distance between the two arguments.