[FEATURE] Create an ingest pipeline workflow step #24

joshpalis · 2023-09-07T16:27:27Z

Is your feature request related to a problem?

Implement a workflow setup interface to facilitate the creation of an ingest pipeline, given the pipeline name, the list of processors, and their associated inputs.

This first implementation, for the semantic search use case, should construct an ingest pipeline using the text_embedding pre-processor, which transforms IngestDocuments into vector embeddings before they are indexed into a KNN index. The step should store the pipeline ID within the GlobalContext index. This pipeline ID will then be set as the default_pipeline of the KNN index.
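To illustrate the last step, here is a minimal Python sketch of an index-creation body that points a KNN index at the stored pipeline. The function name `knn_index_settings` and the example pipeline ID are assumptions made for illustration and are not part of the proposed implementation; only the `index.knn` and `default_pipeline` settings are taken from OpenSearch itself.

```python
import json


def knn_index_settings(pipeline_id: str) -> dict:
    """Build an index-creation body that routes documents through pipeline_id.

    Hypothetical helper: the name and shape are illustrative only.
    """
    if not pipeline_id:
        raise ValueError("pipeline_id must be non-empty")
    return {
        "settings": {
            # Enable k-NN on the index.
            "index.knn": True,
            # Every document indexed without an explicit pipeline
            # goes through this ingest pipeline first.
            "default_pipeline": pipeline_id,
        }
    }


if __name__ == "__main__":
    # "neural-pipeline" stands in for the ID read back from the global context.
    print(json.dumps(knn_index_settings("neural-pipeline"), indent=2))
```

In the workflow, the pipeline ID would come from the global context rather than a literal string.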

The implementation should parse the use case template to retrieve the following inputs for the text_embedding pre-processor:

input_field_name
output_field_name

The implementation should leverage the provided client to read the model_id from the global context.

Using this information, create the PutPipelineRequest payload:

PUT _ingest/pipeline/neural-pipeline
{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": {
          "<input_field_name>": "<output_field_name>"
        }
      }
    }
  ]
}
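The payload above can be assembled from the three inputs with a few lines of code. The following Python sketch is only an illustration of the mapping from inputs to request body; the function name `text_embedding_pipeline_body` and the example field names and model ID are assumptions, and the actual step would build a PutPipelineRequest in Java instead.

```python
import json


def text_embedding_pipeline_body(
    model_id: str,
    input_field_name: str,
    output_field_name: str,
    description: str = "An example neural search pipeline",
) -> dict:
    """Return the request body for PUT _ingest/pipeline/<pipeline name>.

    Hypothetical helper: mirrors the payload in the issue, nothing more.
    """
    return {
        "description": description,
        "processors": [
            {
                "text_embedding": {
                    "model_id": model_id,
                    # Maps the text field to the field that will hold the embedding.
                    "field_map": {input_field_name: output_field_name},
                }
            }
        ],
    }


if __name__ == "__main__":
    # All three values are placeholders, not real identifiers.
    body = text_embedding_pipeline_body(
        "example-model-id", "passage_text", "passage_embedding"
    )
    print(json.dumps(body, indent=2))
```

Here `input_field_name` and `output_field_name` would be parsed from the use case template, and `model_id` would be read from the global context.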

Do you have any additional context?

The ingest pipeline should be created via a PutPipelineRequest using the transport client. Reference: https://github.com/opensearch-project/geospatial/blob/main/src/main/java/org/opensearch/geospatial/action/upload/geojson/PipelineManager.java#L92


owaiskazi19 · 2023-09-13T18:32:58Z

@joshpalis this issue shouldn't cover just the neural pipeline but any ingest pipeline we can create. Our building blocks shouldn't be tightly coupled. There should be a type associated with the pipeline when it is created. If the type is neural and the processor is text_embedding, then the pipeline above would be created.

joshpalis · 2023-09-13T19:57:20Z

Sure, I'll make this generic for any ingest pipeline. I do not agree with associating a type with the pipeline; pipelines should just have a pipelineID and the list of processors that were chosen, along with the necessary inputs. Having a type associated with these pipelines would require unnecessary mapping.
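As a rough illustration of the type-free shape described in this comment, the Python sketch below builds a pipeline body from nothing more than a description and an ordered list of processors with their inputs. The function name `ingest_pipeline_body`, the pair-based input format, and the example values are assumptions for illustration, not the design that was merged.

```python
import json
from typing import Dict, List, Tuple


def ingest_pipeline_body(
    description: str, processors: List[Tuple[str, Dict]]
) -> dict:
    """Build a generic ingest pipeline body.

    processors is a list of (processor_name, config) pairs in execution
    order. No pipeline "type" is involved: the processor names alone
    determine what the pipeline does. Hypothetical helper.
    """
    return {
        "description": description,
        # Each processor becomes a single-key object, as the ingest API expects.
        "processors": [{name: dict(config)} for name, config in processors],
    }


if __name__ == "__main__":
    body = ingest_pipeline_body(
        "Example pipeline with two processors",
        [
            ("lowercase", {"field": "title"}),
            (
                "text_embedding",
                {
                    "model_id": "example-model-id",  # placeholder value
                    "field_map": {"passage_text": "passage_embedding"},
                },
            ),
        ],
    )
    print(json.dumps(body, indent=2))
```

With this shape, the text_embedding case is just one entry in the list rather than a special pipeline type.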

joshpalis added the enhancement (New feature or request) and untriaged labels Sep 7, 2023

joshpalis self-assigned this Sep 7, 2023

joshpalis mentioned this issue Sep 6, 2023

[META] Create Infrastructure for the workflow #21

Closed

7 tasks

joshpalis changed the title ~~[FEATURE] Create an ingest pipeline for neural search~~ [FEATURE] Create an ingest pipeline workflow step Sep 13, 2023

joshpalis mentioned this issue Sep 18, 2023

Adds Create Ingest Pipeline Step #44

Merged

joshpalis closed this as completed Sep 25, 2023

dbwiddis mentioned this issue Sep 25, 2023

Add WorkflowStep Factory and implement XContent-based Template Parsing #47

Merged

minalsha removed the untriaged label Sep 25, 2023

amitgalitz mentioned this issue Mar 11, 2024

Adding create ingest pipeline step #558

Merged
