🐛 Add digits in searched keywords (#193)
Also preserve combining characters, i.e. characters that are intended to modify other characters.

Closes: Bug with titles containing English quotation marks (" ") #25
lnoss authored Jan 26, 2024
1 parent 8431fe3 commit a7e2d46
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion ophirofox/content_scripts/europresse_search.js
@@ -21,9 +21,17 @@ async function onLoad() {
     const search_terms = await consumeSearchTerms();
     if (!search_terms) return;
     const stopwords = new Set(['d', 'l', 'et', 'sans']);
+
+    /*
+    L = { Lu , Ll , Lt , Lm , Lo }
+    M = { Mn , Mc , Me }
+    Nd: a decimal digit
+    Unicode specification: https://www.unicode.org/reports/tr44/#General_Category_Values
+    Categories browser: https://www.compart.com/fr/unicode/category
+    */
     const keywords = search_terms
         .replace(/œ/g, 'oe')
-        .split(/[^\p{L}]+/u)
+        .split(/[^\p{L}\p{M}\p{Nd}]+/u)
         .filter(w => !stopwords.has(w))
         .join(' ');
     const keyword_field = document.getElementById("Keywords");
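
To see the effect of the new separator, here is a minimal sketch that runs the same pipeline with the old and the new regular expression. Only the two regexes, the `œ` replacement and the stopword list come from the diff; the `extractKeywords` helper, the constant names and the sample title are assumptions made for illustration.

```javascript
// Stopword list copied from the diff.
const stopwords = new Set(['d', 'l', 'et', 'sans']);

// Hypothetical helper: same chain as in onLoad(), with the separator as a parameter.
function extractKeywords(searchTerms, separator) {
    return searchTerms
        .replace(/œ/g, 'oe')
        .split(separator)
        .filter(w => !stopwords.has(w))
        .join(' ');
}

const OLD_SEPARATOR = /[^\p{L}]+/u;               // letters only
const NEW_SEPARATOR = /[^\p{L}\p{M}\p{Nd}]+/u;    // letters, combining marks, decimal digits

// Sample title (made up): "é" is written as "e" + U+0301 (combining acute accent).
const title = 'Re\u0301forme 2023 des retraites';

const oldKeywords = extractKeywords(title, OLD_SEPARATOR);
const newKeywords = extractKeywords(title, NEW_SEPARATOR);

// Old separator: the combining mark splits the word and the year is dropped.
console.log(JSON.stringify(oldKeywords)); // "Re forme des retraites"
// New separator: the decomposed accent and the year are both kept.
console.log(newKeywords === title);       // true
```

With the old pattern, a combining mark (category `Mn`) counts as a separator, so a decomposed accented word is cut in two and digits vanish from the query. The new pattern only splits on characters outside `L`, `M` and `Nd`, such as spaces, punctuation and quotation marks.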