Include offset for cropped fields #769
Replies: 2 comments
-
@solonovamax, feel free to add more details about your needs here.
-
Yes, this would be an important feature for me as well. I will not use any HTML prerendering feature to show highlights because of XSS concerns, but I do want to use the cropping feature for search results, and I do want to highlight matches. This is currently impossible: I get a cropped string, but I only have the matches' positions in the uncropped attribute. Effectively, there is no way to securely highlight matches in cropped attributes. Would it be going too far to call this a potential security issue?

To avoid further processing of the response, it would be ideal to get match positions within the cropped attribute, so that offsets don't have to be calculated manually. And while we're at it, it would be much more usable to (at least optionally) get these positions as character indices rather than byte indices. Or, even better, to get the entire response as segments of text marked as highlighted or not highlighted.

The TypeScript function below is what I currently use to make the response usable in a web frontend (except that it doesn't work for cropped content). Ideally, Meilisearch would respond in this format directly:

```typescript
export type MatchPosition = { start: number; length: number };

export type MarkedText =
  | { type: "standard"; text: string }
  | { type: "highlighted"; text: string };

export function highlightMatches(
  input: string,
  matches: MatchPosition[]
): MarkedText[] {
  const encoder = new TextEncoder();
  const decoder = new TextDecoder("utf-8");
  // Match positions are byte offsets, so work on the UTF-8 encoding of the input
  const byteArray = encoder.encode(input);
  const result: MarkedText[] = [];
  let currentByteIndex = 0;

  // Walk the matches in order of their start offset (assumed non-overlapping)
  const sorted = [...matches].sort((a, b) => a.start - b.start);
  for (const { start, length } of sorted) {
    // Add any text before the match as "standard"
    if (currentByteIndex < start) {
      result.push({
        type: "standard",
        text: decoder.decode(byteArray.slice(currentByteIndex, start)),
      });
    }
    // Add the matched text as "highlighted"
    result.push({
      type: "highlighted",
      text: decoder.decode(byteArray.slice(start, start + length)),
    });
    currentByteIndex = start + length;
  }

  // Add any remaining text after the last match as "standard"
  if (currentByteIndex < byteArray.length) {
    result.push({
      type: "standard",
      text: decoder.decode(byteArray.slice(currentByteIndex)),
    });
  }

  return result;
}
```
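On the request for character rather than byte indices: JavaScript strings are indexed in UTF-16 code units, so byte offsets cannot be passed to `slice` directly. Below is a small sketch of the conversion, assuming (as `highlightMatches` does) that match positions are UTF-8 byte offsets that fall on code point boundaries. `byteToStringIndex` and `toStringIndices` are illustrative names, not part of any Meilisearch client.

```typescript
export type MatchPosition = { start: number; length: number };

// Convert a UTF-8 byte offset into a JavaScript string index (UTF-16 code units).
// Assumes `byteOffset` falls on a code point boundary.
export function byteToStringIndex(text: string, byteOffset: number): number {
  let bytes = 0;
  let index = 0;
  while (index < text.length && bytes < byteOffset) {
    const codePoint = text.codePointAt(index)!;
    // Number of bytes this code point occupies in UTF-8
    bytes += codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
    // Code points above U+FFFF take two UTF-16 code units (a surrogate pair)
    index += codePoint > 0xffff ? 2 : 1;
  }
  return index;
}

// Re-express byte-based match positions as string indices usable with `slice`.
export function toStringIndices(text: string, matches: MatchPosition[]): MatchPosition[] {
  return matches.map(({ start, length }) => {
    const from = byteToStringIndex(text, start);
    const to = byteToStringIndex(text, start + length);
    return { start: from, length: to - from };
  });
}
```

After conversion, `text.slice(m.start, m.start + m.length)` yields the matched text directly, without re-encoding the string.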
-
Originally posted here by @solonovamax
I spent like 20 minutes writing an issue and then my browser tab decided to crash and killed all my progress, and I'm kinda annoyed about that so I'm just gonna be brief here and add more details later.
Currently, when both `showMatchesPosition` and `attributesToCrop` are in use, it is difficult to use the match positions, as you do not know where the crop begins.

I would like to propose adding `[field]_start` and `[field]_length` properties to the returned JSON when searching. For example, searching for `minim` might return this result:

Then, this can be used as follows (in pseudocode):
My specific use case: I want to add match highlighting to search results in my application. However, I cannot use the built-in highlighting, because it wraps HTML tags around the matches and the content is user-supplied. If I didn't escape the HTML, it would open up an avenue for XSS; but once it's escaped, the highlight tags are escaped as well, so wrapping the matches in HTML tags does not work.
And I can't use the offsets provided in the match positions, since I don't know where the content was cropped, and therefore where each match falls in the cropped text.
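A minimal sketch of how a client could consume such properties, assuming the proposed `[field]_start` and `[field]_length` give the byte offset and byte length of the cropped window within the original attribute (hypothetical: Meilisearch does not return them today), and that match positions remain byte offsets into the original attribute. The hypothetical `rebaseMatches` keeps only matches that lie entirely inside the window and shifts them to be relative to the cropped string. `prefixBytes` is an assumed parameter for the byte length of any crop marker (set via `cropMarker`) prepended to the cropped text.

```typescript
export type MatchPosition = { start: number; length: number };

// Hypothetical mirror of the proposed `[field]_start` / `[field]_length` properties.
export type CropWindow = { start: number; length: number };

// Re-base match positions (byte offsets into the original attribute) onto the
// cropped text. Matches that fall outside the window, or straddle its edges, are dropped.
export function rebaseMatches(
  matches: MatchPosition[],
  crop: CropWindow,
  prefixBytes: number
): MatchPosition[] {
  const cropEnd = crop.start + crop.length;
  return matches
    // Keep only matches lying entirely inside the cropped window
    .filter((m) => m.start >= crop.start && m.start + m.length <= cropEnd)
    // Shift offsets so they are relative to the start of the cropped string,
    // accounting for any crop marker placed before it
    .map((m) => ({ start: m.start - crop.start + prefixBytes, length: m.length }));
}
```

The re-based positions could then go through a segmenting function like `highlightMatches` above, with each segment inserted as plain text rather than HTML, which avoids the escaping problem entirely.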
An alternative syntax to the one I suggested could be: