[FEATURE] Create an ingest pipeline workflow step #24

Closed
joshpalis opened this issue Sep 7, 2023 · 2 comments · Fixed by #558
Labels
enhancement New feature or request

Comments

@joshpalis
Member

joshpalis commented Sep 7, 2023

Is your feature request related to a problem?

Implement a workflow setup interface that creates an ingest pipeline from a pipeline name, a list of processors, and each processor's associated inputs.

This first implementation targets the semantic search use case. It should construct an ingest pipeline using the text_embedding processor, which transforms IngestDocuments into vector embeddings before they are indexed into a k-NN index. It should then store the pipeline ID in the GlobalContext index, so that the ID can later be set as the default_pipeline of the k-NN index.
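A rough sketch of how the stored pipeline ID could later be wired into the k-NN index, assuming a separate index-creation step reads it back from the global context. `index.knn`, `index.default_pipeline` and the `knn_vector` field type are standard OpenSearch settings; `build_knn_index_body`, the field name and the dimension are hypothetical placeholders for illustration.

```python
import json


def build_knn_index_body(pipeline_id: str, vector_field: str, dimension: int) -> dict:
    """Create-index body for a k-NN index that runs pipeline_id on every write.

    Hypothetical helper: the vector field name and dimension are illustrative.
    """
    return {
        "settings": {
            "index.knn": True,                      # enable k-NN for this index
            "index.default_pipeline": pipeline_id,  # ingest pipeline applied by default
        },
        "mappings": {
            "properties": {
                vector_field: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_knn_index_body("neural-pipeline", "passage_embedding", 768), indent=2))
```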

The implementation should parse the use case template to retrieve the following inputs for the text_embedding pre-processor:

  • input_field_name
  • output_field_name

The implementation should leverage the provided client to read the model_id from the global context.

Using this information, create the PutPipelineRequest payload:

PUT _ingest/pipeline/neural-pipeline
{
  "description": "An example neural search pipeline",
  "processors" : [
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": {
          "<input_field_name>": "<output_field_name>"
        }
      }
    }
  ]
}
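A minimal sketch of building the payload above from the three inputs (model_id from the global context, plus the two field names from the template). `build_text_embedding_pipeline` is a hypothetical helper name. It only produces the request body; the pipeline ID (`neural-pipeline` above) goes in the request path.

```python
import json


def build_text_embedding_pipeline(model_id: str, input_field_name: str,
                                  output_field_name: str,
                                  description: str = "An example neural search pipeline") -> dict:
    """Return the PUT _ingest/pipeline/<id> body for a single text_embedding processor."""
    return {
        "description": description,
        "processors": [
            {
                "text_embedding": {
                    "model_id": model_id,
                    # source text field -> field that will hold the embedding
                    "field_map": {input_field_name: output_field_name},
                }
            }
        ],
    }


if __name__ == "__main__":
    body = build_text_embedding_pipeline("<model_id>", "passage_text", "passage_embedding")
    print(json.dumps(body, indent=2))
```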

Do you have any additional context?

The ingest pipeline should be created by sending a PutPipelineRequest through the transport client. Reference: https://github.com/opensearch-project/geospatial/blob/main/src/main/java/org/opensearch/geospatial/action/upload/geojson/PipelineManager.java#L92
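The step itself is meant to use the Java transport client. For prototyping from Python, the opensearch-py REST client exposes a roughly equivalent call, `client.ingest.put_pipeline(id=..., body=...)`. The wrapper below and its `acknowledged` check are an illustrative sketch, not the plugin's code. Any object with the same `ingest.put_pipeline` shape can be passed in, so it can be exercised without a cluster.

```python
from typing import Any


def create_ingest_pipeline(client: Any, pipeline_id: str, body: dict) -> str:
    """Create (or overwrite) an ingest pipeline and return its ID for the global context.

    `client` is assumed to behave like opensearch-py's OpenSearch client,
    i.e. to expose client.ingest.put_pipeline(id=..., body=...).
    """
    response = client.ingest.put_pipeline(id=pipeline_id, body=body)
    if not response.get("acknowledged", False):
        raise RuntimeError(f"pipeline {pipeline_id!r} was not acknowledged: {response}")
    return pipeline_id
```

Against a real cluster, `client` would be an `opensearchpy.OpenSearch` instance. The returned ID is what would be written to the GlobalContext index.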

@joshpalis joshpalis added enhancement New feature or request untriaged labels Sep 7, 2023
@joshpalis joshpalis self-assigned this Sep 7, 2023
@owaiskazi19
Member

owaiskazi19 commented Sep 13, 2023

@joshpalis this issue shouldn't be limited to the neural pipeline; it should cover any ingest pipeline we can create. Our building blocks shouldn't be tightly coupled. A type should be associated with the pipeline when it is created: if the type is neural and the processor is text_embedding, the pipeline above would be created.

@joshpalis
Member Author

Sure, I'll make this generic for any ingest pipeline. However, I don't agree with associating a type with the pipeline. A pipeline should just have a pipeline ID and the list of chosen processors, along with their required inputs. Associating a type with each pipeline would require unnecessary mapping.
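A minimal sketch of the untyped design described above, where a pipeline is nothing more than an ordered list of processors and their inputs (the pipeline ID travels separately in the request path). `build_pipeline_body` and the `(name, config)` tuple shape are illustrative assumptions, not the plugin's API.

```python
from typing import Any, Mapping, Sequence, Tuple

# (processor name, processor config), e.g. ("text_embedding", {...})
Processor = Tuple[str, Mapping[str, Any]]


def build_pipeline_body(processors: Sequence[Processor], description: str = "") -> dict:
    """Build a generic ingest pipeline body from an ordered list of processors.

    No pipeline "type" is involved: the processors and their inputs fully define it.
    """
    if not processors:
        # This sketch chooses to reject an empty processor list.
        raise ValueError("an ingest pipeline needs at least one processor")
    body: dict = {"processors": [{name: dict(config)} for name, config in processors]}
    if description:
        body["description"] = description
    return body
```

The neural case then becomes just one possible input, e.g. `[("text_embedding", {"model_id": "<model_id>", "field_map": {...}})]`, with no neural-specific branch in the step.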

@joshpalis joshpalis changed the title [FEATURE] Create an ingest pipeline for neural search [FEATURE] Create an ingest pipeline workflow step Sep 13, 2023

3 participants