forked from go-gitea/gitea
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Cherry picked from v1.23.0 - Tentative] Updated tokenizer to better matching when search for code snippets (go-gitea#32261)

This PR improves the accuracy of Gitea's code search. Currently, Gitea does not consider statements such as `console.log("hello")` as hits when the user searches for `log`. The culprit is how both ES and Bleve tokenize the file contents (in both cases, `console.log` is a single token).

For ES, the tokenizer was changed to [simple_pattern_split](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simplepatternsplit-tokenizer.html#:~:text=The%20simple_pattern_split%20tokenizer%20uses%20a,the%20tokenization%20is%20generally%20faster.), so that tokens are words formed by digits and letters. Bleve now employs a [letter](https://blevesearch.com/docs/Tokenizers/) tokenizer.

Resolves go-gitea#32220

---------

Signed-off-by: Bruno Sofiato <bruno.sofiato@gmail.com>
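To illustrate why the tokenizer choice matters, here is a minimal sketch in plain Go. It is not the code from this commit and does not call Bleve or Elasticsearch. The helper names (`whitespaceTokens`, `letterTokens`, `alnumTokens`, `contains`) are hypothetical, and the whitespace split is only a crude stand-in for a tokenizer that keeps `console.log` together as one token.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// whitespaceTokens splits only on whitespace, so punctuation such as '.'
// stays inside a token (a stand-in for the old behavior, an assumption).
func whitespaceTokens(s string) []string {
	return strings.Fields(s)
}

// letterTokens keeps maximal runs of letters, roughly what a "letter"
// tokenizer is described as doing.
func letterTokens(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
}

// alnumTokens keeps maximal runs of letters and digits, matching the
// commit's description of "words formed by digits and letters".
func alnumTokens(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// contains reports whether term exactly matches one of the tokens.
func contains(tokens []string, term string) bool {
	for _, t := range tokens {
		if t == term {
			return true
		}
	}
	return false
}

func main() {
	src := `console.log("hello")`
	fmt.Println(contains(whitespaceTokens(src), "log")) // false
	fmt.Println(contains(letterTokens(src), "log"))     // true
	fmt.Println(alnumTokens("utf8.Decode(x1)"))         // [utf8 Decode x1]
}
```

With the coarse split, `log` never appears as its own token, so an exact term match misses it. Splitting on every non-letter (or non-alphanumeric) character makes `log` a token in its own right.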
Showing 17 changed files with 83 additions and 12 deletions.
Binary file removed (-1.73 KB): tests/gitea-repositories-meta/org42/search-by-path.git/objects/info/commit-graph
2 changes: 1 addition & 1 deletion
tests/gitea-repositories-meta/org42/search-by-path.git/objects/info/packs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,2 +1,2 @@ |
| 1 | | -P pack-393dc29256bc27cb2ec73898507df710be7a3cf5.pack |
| | 1 | +P pack-a7bef76cf6e2b46bc816936ab69306fb10aea571.pack |
| 2 | 2 | |
Binary file removed (-674 Bytes): ...rg42/search-by-path.git/objects/pack/pack-393dc29256bc27cb2ec73898507df710be7a3cf5.bitmap
Binary file removed (-2.03 KB): ...a/org42/search-by-path.git/objects/pack/pack-393dc29256bc27cb2ec73898507df710be7a3cf5.idx
Binary file removed (-196 Bytes): ...a/org42/search-by-path.git/objects/pack/pack-393dc29256bc27cb2ec73898507df710be7a3cf5.rev
Binary file added (+678 Bytes): ...rg42/search-by-path.git/objects/pack/pack-a7bef76cf6e2b46bc816936ab69306fb10aea571.bitmap
Binary file added (+2.06 KB): ...a/org42/search-by-path.git/objects/pack/pack-a7bef76cf6e2b46bc816936ab69306fb10aea571.idx
Binary file renamed (+6.56 KB): ...c29256bc27cb2ec73898507df710be7a3cf5.pack → ...f76cf6e2b46bc816936ab69306fb10aea571.pack
Binary file added (+200 Bytes): ...a/org42/search-by-path.git/objects/pack/pack-a7bef76cf6e2b46bc816936ab69306fb10aea571.rev