
[Obs AI Assistant] Make content from Search connectors fully searchable #175434

Closed
miltonhultgren opened this issue Jan 24, 2024 · 14 comments
Labels: blocked, Team:Obs AI Assistant

Comments

@miltonhultgren
Contributor

miltonhultgren commented Jan 24, 2024

Today, if we ingest a large piece of text into a Knowledge base entry, only the first 512 word pieces are used for creating the embeddings that ELSER uses to match on during semantic search.

This means that if the relevant parts for the query are not at the "start" of this big text, it won't match, even though there may be critical information at the end of the text.

We should attempt to apply chunking to all documents ingested into the Knowledge base so that the recall search has a better chance of finding relevant hits, regardless of their size.

As a stretch, it would also be valuable if it was possible to extract only the relevant chunk (512 word pieces?) from the matched document in order to send less (and only relevant) text to the LLM.

AC

  • Large texts imported into the Knowledge base get embeddings that cover the full text
  • The Ingest pipeline used to apply the chunking is shared in docs so users can apply it to their search-* indices as well (a sketch of what such a pipeline could look like follows below)
  • Recall is able to search across small Knowledge base documents ("single" embedding) and large documents ("multiple" embeddings) in a seamless manner
  • (Stretch) Only the relevant part of a "multiple embeddings" document is passed to the LLM

More resources on chunking: https://github.com/elastic/elasticsearch-labs/tree/main/notebooks/document-chunking
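
For illustration, here is a rough sketch of what such a chunking pipeline could look like: a script processor that splits the source text into overlapping word windows, followed by a foreach/inference processor that creates ELSER embeddings for each passage. The pipeline id and the body_content/passages field names are placeholders, and the whitespace-based splitting is only an approximation (ELSER's 512 limit is counted in word pieces, not whitespace-separated words):

PUT _ingest/pipeline/chunk-and-embed
{
  "processors": [
    {
      "script": {
        "description": "Split body_content into ~512-word chunks with 256-word overlap",
        "source": """
          String[] words = ctx['body_content'].splitOnToken(' ');
          int chunkSize = 512;
          int overlap = 256;
          List passages = new ArrayList();
          int i = 0;
          while (i < words.length) {
            int end = Math.min(i + chunkSize, words.length);
            StringBuilder sb = new StringBuilder();
            for (int j = i; j < end; j++) {
              if (j > i) { sb.append(' '); }
              sb.append(words[j]);
            }
            Map passage = new HashMap();
            passage.put('text', sb.toString());
            passages.add(passage);
            if (end == words.length) { break; }
            i += chunkSize - overlap;
          }
          ctx['passages'] = passages;
        """
      }
    },
    {
      "foreach": {
        "field": "passages",
        "processor": {
          "inference": {
            "model_id": ".elser_model_2_linux-x86_64",
            "input_output": [
              {
                "input_field": "_ingest._value.text",
                "output_field": "_ingest._value.sparse"
              }
            ]
          }
        }
      }
    }
  ]
}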

@miltonhultgren miltonhultgren added the Team:obs-knowledge Observability Experience Knowledge team label Jan 24, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-knowledge-team (Team:obs-knowledge)

@miltonhultgren miltonhultgren self-assigned this Jan 24, 2024
@miltonhultgren
Contributor Author

If we want to retrieve multiple passages from the same text document, we need to split it before ingesting and store one document per passage.
The recommended chunk size for ELSER is 512 tokens, but to make the search more coherent it's also recommended to overlap the chunks by 256 tokens.

@dgieselaar
Member

If we want to retrieve multiple passages from the same text document, we need to split it before ingesting and store one document per passage.

Do you mean that we can only select a subset of passages if we split them up into separate documents?

@miltonhultgren
Contributor Author

Yes, at least that is my understanding after talking to the AI Search folks.

Assume you have a large document, you store each passage in a nested field, and you create embeddings for each passage.
You'll be able to use knn with inner_hits to search across all passages, but it will still give back the whole document (and perhaps some information about which passage caused the match). You can't pull out more than one passage this way; even setting the knn k value higher just gives you more whole-document hits, each with a single passage.

So to get multiple passage hits we would need to store multiple documents in ES, which would then let us turn up the k value in our search to possibly find multiple hits from the same original large document. Not sure if semantic_text would change this.

@miltonhultgren
Contributor Author

miltonhultgren commented Feb 2, 2024

Do you mean that we can only select a subset of passages if we split them up into separate documents?

@dgieselaar The thing I said above is true when using knn (I've asked if this will change at some point), but if you're using ELSER you cannot use knn (dense vector vs sparse vector), so you need to stick to text_expansion queries, which also support inner_hits but in this case can give back more than one hit.

So as long as we use ELSER (or rather, some model that produces a sparse_vector) for the chunking, we can search across a large document and return the X passages in that document that matched.

Example query:

GET wiki-dual_semantic*/_search
{
  "query": {
    "nested": {
      "path": "passages",
      "query": {
        "text_expansion": {
          "passages.sparse": {
            "model_id": ".elser_model_2_linux-x86_64",
            "model_text": "Where is the Eiffel Tower?"
          }
        }
      },
      "inner_hits": {
        "_source": false,
        "size": 5,
        "fields": [
          "passages.text"
        ]
      }
    }
  },
  "_source": false,
  "fields": [
    "title"
  ]
}
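
For context, the index behind this query would need a mapping roughly along these lines (inferred from the field names used in the query above; the actual mapping is an assumption, not something confirmed in this thread):

PUT wiki-dual_semantic
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "passages": {
        "type": "nested",
        "properties": {
          "text": { "type": "text" },
          "sparse": { "type": "sparse_vector" }
        }
      }
    }
  }
}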

Pseudo query for multi-model hybrid search:

GET my-index/_search
{
  "query": {
    "bool": {
      "should": [
        { text_expansion }, // on nested field1, with inner_hits
        { text_expansion }, // on nested field2, with inner_hits
        { match_phrase }    // on nested field3
      ]
    }
  },
  "knn": [
    {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "k": 10,
      "num_candidates": 100
      // with inner_hits
    },
    {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "k": 10,
      "num_candidates": 100
      // with inner_hits
    }
  ],
  "rank": {
    "rrf": {
      "window_size": 50,
      "rank_constant": 20
    }
  }
}

@dgieselaar
Member

@miltonhultgren that sounds good AFAICT, do you see any concerns?

@miltonhultgren
Contributor Author

miltonhultgren commented Feb 2, 2024

KNN supports multiple inner hits in 8.13 🚀

I haven't gotten to really trying these things out yet. It seems the path is being paved for us here (and semantic_text will only make it easier).
A lot of the things I've looked at are out of scope for this issue and will be things we can plan for future iterations.

For this issue I will stick to using ELSER, chunking into a nested object, using a nested query with text_expansion and inner_hits to grab multiple relevant passages.

I have two small concerns for this ticket:

  1. Should we aim to support keyword/hybrid search (using a normal text match BM25 query with/without RRF)?
  2. I'm not sure I fully understand how to apply the chunking yet, in particular the "512 size, 256 overlap" part

Number 1 would be for cases where, for example, there aren't any embeddings in a search-* index, or there are only dense_vector embeddings; we could still fall back on keyword search and maybe find good matches that way.
That could also allow users to use our Knowledge base without ELSER installed.
I'm leaning towards deferring that until later though (together with multi-model support). Do you agree, @dgieselaar?
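
To sketch what that fallback could look like, a plain BM25 match query with no model or embeddings involved (the body_content field name is an assumption for the example):

GET search-*/_search
{
  "size": 5,
  "_source": false,
  "fields": [ "title" ],
  "query": {
    "match": {
      "body_content": "Where is the Eiffel Tower?"
    }
  }
}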

I'm going to research number 2 next.

@miltonhultgren
Contributor Author

Sample query combining a nested match query with inner_hits and knn with inner_hits, ranked with RRF:

GET wikipedia_*/_search
{
  "size": 5,
  "_source": false,
  "fields": [
    "title",
    "passages.text"
  ], 
  "query": {
    "nested": {
      "path": "passages",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "passages.text": "who is batman"
              }
            }
          ]
        }
      },
      "inner_hits": {
        "name": "query",
        "_source": false,
        "fields": [
          "passages.text"
        ]
      }
    }
  },
  "knn": {
    "inner_hits": {
      "name": "knn",
      "_source": false,
      "fields": [
        "passages.text"
      ]
    },
    "field": "passages.embeddings",
    "k": 5,
    "num_candidates": 100,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "sentence-transformers__all-distilroberta-v1",
        "model_text": "who is batman"
      }
    }
  },
  "rank": {
    "rrf": {}
  }
}

@miltonhultgren
Contributor Author

Would it be desired/ideal to perform a single ranked search across text, dense, and sparse vectors, and also across all indices at once, rather than per source (Knowledge base, Search connectors in different indices)? What are the trade-offs of that?

How would one combine that with "API search", meaning searches that hit an API rather than Elasticsearch? Just thinking out loud here for the future.

@dgieselaar
Member

dgieselaar commented Feb 6, 2024

@miltonhultgren yes, a single search would be preferable, but we have different privilege models for the knowledge base versus search-*: the former uses the internal user and the latter uses the current user, so we cannot (at least to my understanding) execute it as a single search request.

@miltonhultgren
Contributor Author

We're waiting for semantic_text to be available since it will handle chunking for us. At that point this ticket can be rewritten to reflect the work needed to migrate the Knowledge base to use semantic_text instead.
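
For reference, with semantic_text the chunking and embedding generation move into the mapping itself. A minimal sketch, assuming a hypothetical kb-index, a content field, and an ELSER inference endpoint called my-elser-endpoint (the exact syntax may differ by version):

PUT kb-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-elser-endpoint"
      }
    }
  }
}

GET kb-index/_search
{
  "query": {
    "semantic": {
      "field": "content",
      "query": "Where is the Eiffel Tower?"
    }
  }
}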

@emma-raffenne emma-raffenne added this to the 8.15 milestone Apr 17, 2024
@sorenlouv
Member

Update: This is still blocked by semantic_text

@emma-raffenne emma-raffenne changed the title Make Knowledge base articles fully searchable Make content from Search connectors fully searchable Jun 27, 2024
@emma-raffenne emma-raffenne changed the title Make content from Search connectors fully searchable [Obs AI Assistant] Make content from Search connectors fully searchable Jun 27, 2024
@emma-raffenne emma-raffenne removed this from the 8.15 milestone Jun 27, 2024
@emma-raffenne emma-raffenne added Team:Obs AI Assistant Observability AI Assistant and removed Team:obs-knowledge Observability Experience Knowledge team labels Jun 27, 2024
@emma-raffenne
Contributor

This will be solved by elastic/obs-ai-assistant-team#162

@emma-raffenne emma-raffenne closed this as not planned Oct 22, 2024
@miltonhultgren
Contributor Author

Just to clarify, this issue is about content ingested via Search connectors, which have their own mappings and ingest pipelines (i.e. different from the knowledge base index).

The linked issue makes no mention of changing the Search connector mappings or ingest pipelines, so we'd need to verify whether they are already using semantic_text to generate embeddings.
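
One way to verify that (the connector index name is a placeholder) is to inspect the connector index mapping and look at how the content fields are typed:

GET search-my-connector/_mapping
// check whether the text fields come back with "type": "semantic_text";
// if not, the connector mappings/pipelines would need changes of their own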
