Dylan Griffith authored
The previous algorithm was a heuristic that looked for a term inside a highlighted fragment, then used that term again to find the matching line in the actual content. This occasionally gave incorrect results: the term that happened to be highlighted could also appear earlier in the document, written in a different way that was not intended to be a match, so the wrong line was reported.

The [Elasticsearch highlighter](https://www.elastic.co/guide/en/elasticsearch/reference/current/highlighting.html) supports setting `number_of_fragments` to `0`, which means the highlighted result is the entire content rather than small fragments of it. This makes it much easier to work out the line number, since we can loop through the highlighted content from the beginning and stop as soon as we find the opening highlight tag.

This comes at a cost: every document is now also returned in full in the highlight section, which makes the Elasticsearch response payload approximately twice as large. It is, however, the only correct way to do it that I could find.

An alternative described in [the docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/highlighting.html) is to use `boundary_scanner`, but this requires the [Fast vector highlighter](https://www.elastic.co/guide/en/elasticsearch/reference/current/highlighting.html#fast-vector-highlighter), which in turn requires us to set `offsets` for our [`index_options`](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-options.html). That would use considerably more storage, so we would like to avoid it if possible.

It's worth noting that this won't fix the fact that highlighting is not behaving properly in the quoted examples. For that I've created https://gitlab.com/gitlab-org/gitlab/-/issues/254941, which describes a very similar problem.
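As a rough illustration of the line-number lookup described above, here is a minimal Python sketch. It is not the code from this commit. The field name `blob.content`, the constant `HIGHLIGHT_REQUEST`, and the function `first_highlighted_line` are all hypothetical, and the sketch assumes Elasticsearch's default `<em>` opening tag rather than whatever tags the application actually configures.

```python
from typing import Optional

# Hypothetical request fragment: with number_of_fragments set to 0 the
# highlighter returns the whole field instead of short fragments.
# The field name "blob.content" is an assumption for illustration.
HIGHLIGHT_REQUEST = {
    "highlight": {
        "fields": {
            "blob.content": {"number_of_fragments": 0}
        }
    }
}


def first_highlighted_line(highlighted: str, pre_tag: str = "<em>") -> Optional[int]:
    """Return the 1-based number of the first line containing pre_tag.

    `highlighted` is assumed to be the full field content as returned in
    the highlight section. Returns None when no opening tag is present.
    """
    # Split on "\n" only, so numbering follows plain newline-delimited lines.
    for line_number, line in enumerate(highlighted.split("\n"), start=1):
        if pre_tag in line:
            return line_number  # stop at the first opening highlight tag
    return None


# Example: the match is on the third line of the content.
sample = "def foo\n  bar\n  <em>baz</em>\nend\n"
print(first_highlighted_line(sample))  # prints 3
```

Because the scan runs over the full content and stops at the first opening tag, it does not need to search for the term a second time, which is the step where the previous heuristic could pick an earlier, unintended occurrence.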
f24f5fb5