[RFC] Vega support with MDS #5927

Closed
Tracked by #5752
huyaboo opened this issue Feb 22, 2024 · 13 comments · Fixed by #5975, #6123 or #6218
Labels
enhancement New feature or request multiple datasource multiple datasource project RFC Substantial changes or new features that require community input to garner consensus. v2.13.0

Comments

@huyaboo
Member

huyaboo commented Feb 22, 2024

Proposal

Currently, Multiple DataSource (MDS) does not support Vega visualizations. We therefore propose adding a new optional field, data_source_name (tentatively named), to the url object inside the data section of the Vega spec. This field takes a datasource name; under the hood, the visualization retrieves data from the index located in that datasource. This lets a user pull data from one or more indices in one or more datasources to create custom Vega visualizations.

Here is an example Vega spec that should be supported with this feature. Note that the data field can hold a single url object or an array containing multiple url objects:

data: {
  url: {
    %context%: true
    %timefield%: @timestamp
    index: opensearch_dashboards_sample_data_logs
    data_source_name: some_datasource_name
    body: {
      aggs: {
        ...
      }
    }
  }
  format: {
    property: aggregations.5
  }
}
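
Because data may be a single object or an array, any code that inspects the spec has to normalize it first. The following is a minimal sketch of that step, assuming the spec has already been parsed into a plain object. The names UrlLike, DataEntry, SpecLike, and collectDataSourceNames are illustrative only and are not part of the Vega plugin:

```typescript
// Hypothetical, simplified shapes for illustration; the real plugin types differ.
interface UrlLike {
  index?: string;
  data_source_name?: string; // tentative field name from this proposal
  [key: string]: unknown;
}

interface DataEntry {
  name?: string;
  url?: UrlLike | string;
  [key: string]: unknown;
}

interface SpecLike {
  data?: DataEntry | DataEntry[];
}

// Returns the distinct data source names referenced by a spec, in first-seen order.
// Entries without data_source_name (local-cluster queries) are skipped.
function collectDataSourceNames(spec: SpecLike): string[] {
  const entries: DataEntry[] =
    spec.data === undefined ? [] : Array.isArray(spec.data) ? spec.data : [spec.data];
  const names = new Set<string>();
  for (const entry of entries) {
    const url = entry.url;
    if (typeof url === 'object' && url !== null && typeof url.data_source_name === 'string') {
      names.add(url.data_source_name);
    }
  }
  return Array.from(names);
}
```

A list like this could be used to resolve each name once per render instead of once per url object, though whether that matters in practice depends on how many url objects a spec contains.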

Background

Vega is a declarative visualization grammar that can be used to create and share custom interactive visualizations. Note that Vega-Lite is a similar but distinct lightweight visualization grammar. Both grammars are supported within Dashboards, with the caveat that the data is retrieved from the index BEFORE any rendering happens. This means that data is NOT dynamically loaded. Additionally, Vega is not supported with MDS since the local cluster is the assumed datasource. This proposal will fix the latter (with the former being out of scope).

It is also important to clarify what is meant by Vega support for MDS. It can be interpreted in two different ways:

  1. Option 1: Vega can support visualizations which fetch data from multiple datasources
    [Diagram: MDS drawio]
    In this example, the Vega visualization is stored in Datasource A but references data from Datasources A, B, and C

  2. Option 2: Vega can support visualizations which reference data from the local cluster or any remote datasource (but not both)
    [Diagram: MDS drawio(1)]
    In this example, the Vega visualization is stored in Datasource A but references data only from Datasource B and not any other datasource

This proposal seeks to support Option 1. Although Option 1 carries more limitations (see the Limitations section), being able to fetch data from any index in any datasource (provided the user has the permissions) gives users a more flexible visualization experience.

Approach

When Dashboards parses the Vega spec to render the visualization, it parses the url object and passes it into the search API, which takes an IOpenSearchSearchRequest as a parameter. This interface provides a dataSourceId field that tells Dashboards to use the data source client. All the Vega plugin needs to do is check whether MDS is enabled and, if so, look up the datasource id associated with the data_source_name field and pass it into the search query.

Add a datasource field to the UrlObject

export interface UrlObject {
  [index: string]: any;
  [CONSTANTS.TIMEFILTER]?: string;
  [CONSTANTS.CONTEXT]?: boolean;
  [CONSTANTS.LEGACY_CONTEXT]?: string;
  [CONSTANTS.TYPE]?: string;
  name?: string;
  index?: string;
  data_source_name?: string;
  body?: Body;
  size?: number;
  timeout?: string;
}

Have the saved objects client get the associated dataSourceId from the data_source_name via find.
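
The lookup step might look like the sketch below. DataSourceFinder is a hypothetical stand-in for the saved objects client's find call (its real signature and response shape are not shown in this RFC), the title field is assumed to hold the datasource name, and the error handling for missing or duplicate names is an assumption about the desired behavior:

```typescript
// Hypothetical minimal view of a stored data source; field names are assumptions.
interface DataSourceRecord {
  id: string;
  title: string;
}

// Hypothetical stand-in for a saved-objects "find" call that searches by title.
type DataSourceFinder = (title: string) => Promise<DataSourceRecord[]>;

// Resolves a data_source_name to a dataSourceId.
// Returns undefined when no name is given, so the caller can fall back to the local cluster.
async function resolveDataSourceId(
  find: DataSourceFinder,
  dataSourceName?: string
): Promise<string | undefined> {
  if (!dataSourceName) return undefined;

  // A search may return partial matches, so keep exact title matches only.
  const matches = (await find(dataSourceName)).filter((ds) => ds.title === dataSourceName);

  if (matches.length === 0) {
    throw new Error(`No data source named "${dataSourceName}" was found`);
  }
  if (matches.length > 1) {
    throw new Error(`Data source name "${dataSourceName}" is ambiguous`);
  }
  return matches[0].id;
}
```

Failing loudly on an ambiguous name is one possible choice; it reflects the concern raised under Alternatives that a name, unlike an id, does not by itself guarantee a unique datasource.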

Then, in the _searchAPI.search() method, we can pass in the dataSourceId as a parameter:

// Pass dataSourceId alongside params so the search is routed to that datasource
return search({ params, dataSourceId }, { abortSignal: this.abortSignal }).pipe(
  tap((data) => this.inspectSearchResult(data, requestResponders[requestId])),
  map((data) => ({
    name: requestId,
    rawResponse: data.rawResponse,
  }))
);
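
The MDS check described in this approach could be folded into how the request object is built. The sketch below is an assumption about one way to do that; SearchRequestLike is a simplified, hypothetical shape (the real IOpenSearchSearchRequest has more fields), and buildSearchRequest is an illustrative helper name:

```typescript
// Simplified, hypothetical request shape for illustration only.
interface SearchRequestLike {
  params: Record<string, unknown>;
  dataSourceId?: string;
}

// Attaches dataSourceId only when MDS is enabled and an id was resolved,
// so non-MDS deployments keep sending the same request as before.
function buildSearchRequest(
  params: Record<string, unknown>,
  mdsEnabled: boolean,
  dataSourceId?: string
): SearchRequestLike {
  if (mdsEnabled && dataSourceId) {
    return { params, dataSourceId };
  }
  return { params };
}
```

Omitting the key entirely, rather than setting it to undefined, keeps the request identical to today's for url objects that do not name a datasource.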

Thus, when the user wants to write a visualization with data from another datasource, they can do something like the following (mockup):
[Image: mockup]

Limitations

This approach provides greater flexibility in enabling users to make visualizations. However, with great power comes great responsibility.

  1. Importing/exporting Vega saved objects will inevitably break. Because data_source_name is specific to an OpenSearch cluster, if users export a visualization and import it into another cluster where the same datasource names are not configured, the visualization cannot find its data and returns errors. This is an acceptable tradeoff for now, but it should ideally be mitigated in the future. See the section Importing Vega saved objects below for ways to mitigate some of these issues.
  2. As an add-on to point 1, editing the data_source_name in each url object that contains one is cumbersome, especially when multiple url objects are involved.

Importing Vega saved objects

In addition to the above requirements, the Vega visualization should support importing saved objects. As mentioned in Limitations, full support is a challenge since the current import logic supports only one datasource. Following logic similar to #5712, this issue will consider the following scenarios:

  1. Importing from non-MDS Vega -> MDS Vega visualization: since the data_source_name will not be present in these visualizations, the field can be added to the Vega spec directly.
  2. Importing from MDS Vega -> MDS Vega: any data_source_name that uses the previous datasource will be updated to use the new data_source_name.
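
Both import scenarios reduce to setting data_source_name on each url object. The sketch below shows this under stated assumptions: the spec has already been parsed into a plain object (the spec is authored as text, and preserving its formatting or comments is out of scope here), only object-form urls are treated as queries, and every such url is pointed at a single target datasource. All type and function names are hypothetical:

```typescript
// Hypothetical, simplified spec shapes for illustration only.
interface UrlLike {
  index?: string;
  data_source_name?: string;
  [key: string]: unknown;
}

interface DataEntry {
  url?: UrlLike | string;
  [key: string]: unknown;
}

interface SpecLike {
  data?: DataEntry | DataEntry[];
  [key: string]: unknown;
}

// Returns a copy of the spec in which every object-form url points at targetName.
// The field is added when missing (non-MDS -> MDS) and overwritten when present (MDS -> MDS).
function retargetDataSource(spec: SpecLike, targetName: string): SpecLike {
  const rewrite = (entry: DataEntry): DataEntry => {
    // String urls and entries without a url are left untouched.
    if (typeof entry.url !== 'object' || entry.url === null) return entry;
    return { ...entry, url: { ...entry.url, data_source_name: targetName } };
  };
  if (spec.data === undefined) return spec;
  const data = Array.isArray(spec.data) ? spec.data.map(rewrite) : rewrite(spec.data);
  return { ...spec, data };
}
```

Note that this collapses every url object onto one datasource, which matches an import flow that supports a single target datasource but would discard the distinction in a spec that originally queried several.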

Alternatives

Initially, the decision was made to use data_source_id rather than data_source_name, because data_source_id uniquely identifies the datasource to query and avoids an extra find query to look the datasource up. However, other plugins refer to datasources by name, not by id, and a name is a more user-friendly identifier.

Open Question(s)

  1. Since datasource can be a bit ambiguous here, what are some alternative field names? I'm thinking data_source_id would help disambiguate this field.
@seraphjiang
Member

dataSourceId looks like a good call here. Could we conduct an e2e PoC next?

@BionIT
Collaborator

BionIT commented Feb 28, 2024

@huyaboo So we are proposing to add a new data source field to the script during creation of the visualization.
How is a visualization that uses Vega persisted in the dashboard? Is it by storing the Vega script? How would the data source be persisted in the visualization?

@kgcreative
Member

I would recommend we use dataSourceName instead of data source ID (since the ID can be queried from the data source name under the hood).
@BionIT, we could also pre-pend the data source name to the Index name dataSourceName::indexName

@kgcreative
Member

@BionIT, we should follow the same conventions that we use for Index Patterns

@seraphjiang
Member

@BionIT, we could also pre-pend the data source name to the Index name dataSourceName::indexName

Looks like we already have an index property.

@huyaboo it would help to check whether this property supports a plain index or an index pattern.

cc: @kgcreative @BionIT

export interface UrlObject {
  [index: string]: any;
  [CONSTANTS.TIMEFILTER]?: string;
  [CONSTANTS.CONTEXT]?: boolean;
  [CONSTANTS.LEGACY_CONTEXT]?: string;
  [CONSTANTS.TYPE]?: string;
  name?: string;
  index?: string;
  datasource?: string;
  body?: Body;
  size?: number;
  timeout?: string;
}

@huyaboo
Member Author

huyaboo commented Feb 29, 2024

Yeah, the Vega plugin supports index patterns, since fetching the data is treated like running a search query. Any information in the Vega spec is persisted, so using an index pattern convention is possible. However, since duplicate index pattern names are possible, I'm not sure whether this will be an issue when rendering visualizations.

@seraphjiang
Member

Yeah Vega plugin supports index patterns

Thanks.

Could we see an example using an index pattern in Vega today without MDS?

A recorded video would help, if possible.

@seraphjiang
Member

seraphjiang commented Mar 3, 2024

@huyaboo @kgcreative @bandinib-amzn @BionIT

Btw, when I asked whether Vega supports index patterns, I meant the index pattern created and saved as a saved object, not an arbitrary target string. A user-created index pattern may contain scripted fields.

GET /<target>/_search

Comma-separated list of indices and aliases to search. Supports wildcards (*). Use * or _all to search all indices.

@seraphjiang
Member

Currently, Multiple DataSource (MDS) does not support Vega visualizations.

Additionally, Vega is not supported with MDS since a datasource needs to be provided to the Vega spec.

@huyaboo

nitpick: let's use Vega is not supported with MDS, to be consistent.

seraphjiang added the RFC label Mar 3, 2024
@seraphjiang
Member

seraphjiang commented Mar 3, 2024

Thanks @kgcreative and team for the brainstorming. I'd like to provide more information to see if it helps us make the call.

  1. syntax for index name with datasource

@BionIT, we could also pre-pend the data source name to the Index name dataSourceName::indexName

Prepending the datasource to the index could be one option; however, the syntax may confuse users who use cross-cluster search. Whether Vega supports cross-cluster search is unknown. It may be beyond the scope of this feature request; we could create a separate issue to track that.

# cross cluster syntax
cluster1:index1,index2

# single datasource syntax
datasource1::index1,index2
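
To illustrate how the two syntaxes could coexist, here is a minimal parsing sketch. The "::" prefix is only the syntax suggested in this thread, not an adopted format, and parseTarget and ParsedTarget are hypothetical names. The sketch assumes datasource names never contain "::":

```typescript
// Result of splitting a target string such as "datasource1::index1,index2".
interface ParsedTarget {
  dataSourceName?: string; // undefined means no datasource prefix was given
  index: string; // may still contain cross-cluster prefixes like "cluster1:index1"
}

// Splits on the first "::" only. A single ":" is left untouched so that
// cross-cluster targets ("cluster1:index1") pass through unchanged.
function parseTarget(target: string): ParsedTarget {
  const sep = target.indexOf('::');
  if (sep === -1) {
    return { index: target };
  }
  const dataSourceName = target.slice(0, sep);
  const index = target.slice(sep + 2);
  if (dataSourceName === '' || index === '') {
    throw new Error(`Invalid target "${target}"`);
  }
  return { dataSourceName, index };
}
```

Even if the parser can tell the two apart, the visual similarity between ":" and "::" remains the usability concern raised above.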

  2. use dataSourceId and dataSource Name for identity

I would recommend we use dataSourceName instead of data source ID (since the ID can be queried from the data source name under the hood).

Currently, data-source-id is the key used to retrieve the detailed information. I agree with @kgcreative that we could look up the ID by name. However, this will introduce another _search API dependency for this feature.

Besides the overhead in the Vega viz edit and rendering pages, we may also need to take saved object import/export for MDS into consideration.

@BionIT @huyaboo @bandinib-amzn could you meet, dive a little deeper, and come up with a proposal quickly?

  3. Target Index vs Index pattern
    Based on my observation, Vega viz supports a target index rather than an index pattern.

See more detail here:
#5927 (comment)
Could I suggest creating another feature request to add index pattern support (which would need to apply to both MDS and non-MDS)?

  4. saved object import/export for Vega support MDS
    I think we need it as part of this feature request/RFC, but it could be addressed incrementally in a separate PR.

  5. Single DataSource vs Multi DataSource

Our end vision is to move to a true multi-datasource world. However, there are ambiguous parts in both the use cases and the technical details.

From the technical side, we might rely on Vega/Vega-Lite to support MDS.
vega/vega-lite#1271

# possible approach is to request data array support in vega/vega lite
[
  {
     data: {}
     name: "data1"
  },
  {
     data: {}
     name: "data2"
  },
]

On the use case side, we will need more use cases to help us validate and prioritize.

cc: @kgcreative @zengyan-amazon @BionIT @huyaboo @bandinib-amzn

@YANG-DB
Member

YANG-DB commented Mar 5, 2024

@huyaboo this is an awesome idea
In this context I'd like to add the following issue for #6040

@huyaboo
Member Author

huyaboo commented Mar 6, 2024

  2. use dataSourceId and dataSource Name for identity

After some conversations with Vega users, the tradeoff with using dataSourceName may be worth it purely from a usability point-of-view.

  3. Target Index vs Index pattern
    Could I suggest creating another feature request to add index pattern support (which would need to apply to both MDS and non-MDS)?

This could be a potential option as well. Right now the Vega visualization only accepts OpenSearch queries.

  4. saved object import/export for Vega support MDS

After some consideration, I'm not sure if any approach can mitigate issues when importing/exporting. The revised proposal will support querying data from multiple indices from multiple datasources (see the Limitations section).

  5. Single DataSource vs Multi DataSource

We do not need to rely on Vega/Vega-Lite for this. The current plugin already supports querying multiple indices from the local cluster by making the data object an array.

@seraphjiang

@seraphjiang
Member

Thanks @huyaboo @YANG-DB

  2. use dataSourceId and dataSource Name for identity

After some conversations with Vega users, the tradeoff with using dataSourceName may be worth it purely from a usability point-of-view.

  3. Target Index vs Index pattern
    Could I suggest creating another feature request to add index pattern support (which would need to apply to both MDS and non-MDS)?

This could be a potential option as well. Right now the Vega visualization only accepts OpenSearch queries.

@huyaboo if it is confirmed that Vega doesn't support index patterns, let's create an issue to track this feature request. We should focus on the enhancement in 2.13 and make incremental progress.

  4. saved object import/export for Vega support MDS

After some consideration, I'm not sure if any approach can mitigate issues when importing/exporting. The revised proposal will support querying data from multiple indices from multiple datasources (see the Limitations section).

@huyaboo did you get a chance to take a look at https://opensearch.org/blog/enhancement-multiple-data-source-import-saved-object/

I hope the change can be compatible with the enhanced import/export feature delivered by @BionIT @yujin-emma. If so, there is no extra work; otherwise, please check with @BionIT @yujin-emma to ensure that customers who live in a non-MDS world can export their Vega saved objects and import them into a world with MDS enabled.

  5. Single DataSource vs Multi DataSource

We do not need to rely on Vega/Vega-Lite for this. The current plugin already supports querying multiple indices from the local cluster by making the data object an array.

huyaboo changed the title from "[MDS] Vega support with MDS" to "[RFC] Vega support with MDS" Mar 15, 2024