[RFC] Vega support with MDS #5927

huyaboo · 2024-02-22T21:33:23Z

Proposal

Currently, Multiple DataSource (MDS) does not support Vega visualizations. We therefore propose adding a new optional field, data_source_name (tentatively named), to the url object inside the data section of the Vega spec. This field takes a datasource name; under the hood, the visualization retrieves data from the index located in that datasource. This lets a user pull data from one or more indices across one or more datasources to create custom Vega visualizations.

Here is an example Vega spec that should be supported with this feature. Note that the data field can be a single url object or an array that contains multiple url objects:

data: {
  url: {
    %context%: true
    %timefield%: @timestamp
    index: opensearch_dashboards_sample_data_logs
    data_source_name: some_datasource_name
    body: {
      aggs: {
        ...
      }
    }
  }
  format: {
    property: aggregations.5
  }
}
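Because the data field may also be an array, a single spec can reference several datasources at once. As a rough TypeScript sketch of that shape (the SketchUrl and SketchData types and the collectDataSourceNames helper are hypothetical, and treating a missing data_source_name as "use the local cluster" is an assumption), the distinct datasource names in a spec could be gathered like this:

```typescript
// Hypothetical, simplified shapes for illustration; not the plugin's real types.
interface SketchUrl {
  index: string;
  data_source_name?: string; // assumption: omitted means the local cluster
  body?: Record<string, unknown>;
}

interface SketchData {
  name?: string;
  url: SketchUrl;
  format?: { property: string };
}

// The data field may be a single object or an array of objects.
type SketchDataField = SketchData | SketchData[];

// Collect the distinct datasource names referenced by the data field,
// in order of first appearance.
function collectDataSourceNames(data: SketchDataField): string[] {
  const entries = Array.isArray(data) ? data : [data];
  const names: string[] = [];
  for (const entry of entries) {
    const name = entry.url.data_source_name;
    if (name !== undefined && names.indexOf(name) === -1) {
      names.push(name);
    }
  }
  return names;
}

// Example: two remote datasources plus one entry that falls back to the local cluster.
const exampleData: SketchDataField = [
  { name: 'logs', url: { index: 'opensearch_dashboards_sample_data_logs', data_source_name: 'datasource_a' } },
  { name: 'flights', url: { index: 'opensearch_dashboards_sample_data_flights', data_source_name: 'datasource_b' } },
  { name: 'local', url: { index: 'opensearch_dashboards_sample_data_ecommerce' } },
];
```

Each distinct name would then need to be resolved to a datasource id before its query is sent.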

Background

Vega is a declarative visualization grammar that can be used to create and share custom interactive visualizations. It is important to note that Vega-Lite is a related but distinct, lighter-weight visualization grammar. Both grammars are supported within Dashboards, with the caveat that the data is retrieved from the index BEFORE any rendering happens. This means that data is NOT dynamically loaded. Additionally, Vega is not supported with MDS since the local cluster is the assumed datasource. This proposal will fix the latter (with the former being out of scope).

It is also important to clarify what is meant by Vega support for MDS. Vega support for MDS can be interpreted in two different ways:

Option 1: Vega can support visualizations which fetch data from multiple datasources

In this example, the Vega visualization is stored in Datasource A but references data from Datasources A, B, and C

Option 2: Vega can support visualizations which reference data from the local cluster or any remote datasource (but not both)

In this example, the Vega visualization is stored in Datasource A but references data only from Datasource B and not any other datasource

The proposal seeks to support Option 1. While Option 1 comes with more limitations (see the Limitations section), being able to fetch data from any index in any datasource (provided the user has the permissions) provides a robust visualization experience.

Approach

When Dashboards parses the Vega spec to render the visualization, it parses the url object and passes it to the search API, which takes an IOpenSearchSearchRequest as a parameter. This interface provides a dataSourceId field that tells Dashboards to use the data source client. All the Vega plugin needs to do is check whether MDS is enabled and, if so, look up the datasource id associated with the data_source_name field and pass it into the search query.

Add a datasource field to the UrlObject

export interface UrlObject {
  [index: string]: any;
  [CONSTANTS.TIMEFILTER]?: string;
  [CONSTANTS.CONTEXT]?: boolean;
  [CONSTANTS.LEGACY_CONTEXT]?: string;
  [CONSTANTS.TYPE]?: string;
  name?: string;
  index?: string;
  data_source_name?: string;
  body?: Body;
  size?: number;
  timeout?: string;
}

Have the saved objects client get the associated dataSourceId from the data_source_name via find
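A minimal sketch of that lookup step, assuming the find call returns data source saved objects whose title attribute holds the datasource name. The DataSourceHit shape and the pickDataSourceId helper are hypothetical and stand in for whatever the saved objects client actually returns:

```typescript
// Hypothetical minimal shape of a data source saved object returned by find.
interface DataSourceHit {
  id: string;
  attributes: { title: string }; // assumption: title stores the datasource name
}

// Pick the id of the datasource whose title matches the given name exactly.
// Throws when the name is unknown or matches more than one datasource.
function pickDataSourceId(dataSourceName: string, hits: DataSourceHit[]): string {
  // A text search may return partial matches, so keep only exact title matches.
  const exact = hits.filter((hit) => hit.attributes.title === dataSourceName);
  if (exact.length > 1) {
    throw new Error(`Data source name "${dataSourceName}" is ambiguous (${exact.length} matches)`);
  }
  const match = exact[0];
  if (match === undefined) {
    throw new Error(`No data source named "${dataSourceName}" was found`);
  }
  return match.id;
}

// Example find results: one exact match and one partial match.
const exampleHits: DataSourceHit[] = [
  { id: 'id-1', attributes: { title: 'prod' } },
  { id: 'id-2', attributes: { title: 'prod-eu' } },
];
```

Failing loudly on a missing or ambiguous name is one possible design choice here; it surfaces a misconfigured spec as an error rather than silently querying the wrong cluster.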

Then, in the _searchAPI.search() method, we can pass in the dataSourceId as a parameter

return search({ params, dataSourceId }, { abortSignal: this.abortSignal }).pipe(
  tap((data) => this.inspectSearchResult(data, requestResponders[requestId])),
  map((data) => ({
    name: requestId,
    rawResponse: data.rawResponse,
  }))
);
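The "check if MDS is enabled, then attach the id" step could look roughly like the sketch below. SketchSearchRequest, buildSearchRequest, and the resolveId callback are all hypothetical names; only the idea of an optional dataSourceId on the request comes from the approach described above.

```typescript
// Hypothetical, trimmed-down request shape: params plus an optional dataSourceId.
interface SketchSearchRequest {
  params: { index: string; body?: Record<string, unknown> };
  dataSourceId?: string;
}

// Hypothetical subset of the url object relevant to this step.
interface SketchUrlObject {
  index: string;
  body?: Record<string, unknown>;
  data_source_name?: string;
}

// Build the search request for one url object. The dataSourceId is only set
// when MDS is enabled and the url object names a datasource; otherwise the
// request is left as a plain local-cluster query.
function buildSearchRequest(
  url: SketchUrlObject,
  mdsEnabled: boolean,
  resolveId: (name: string) => string
): SketchSearchRequest {
  const params: SketchSearchRequest['params'] = { index: url.index };
  if (url.body !== undefined) {
    params.body = url.body;
  }
  const request: SketchSearchRequest = { params };
  if (mdsEnabled && url.data_source_name !== undefined) {
    request.dataSourceId = resolveId(url.data_source_name);
  }
  return request;
}
```

With MDS disabled, data_source_name is ignored in this sketch; whether the real plugin should ignore it or raise an error in that case is not settled by the text above.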

Thus, when the user wants to write a visualization with data from another datasource, they can do something like the following (mockup)

Limitations

This approach provides greater flexibility in enabling users to make visualizations. However with great power comes great responsibility.

1. Importing/exporting Vega saved objects will inevitably break. Because data_source_name is specific to an OpenSearch cluster, if users export a visualization and import it into another cluster where the same datasource names are not configured, the visualization cannot find the data and returns errors. This is an acceptable tradeoff for now, but it should ideally be mitigated in the future. See the Importing Vega saved objects section below for ways to mitigate some of these issues.
2. As an add-on to point 1, adding a data_source_name to every url object that needs one is cumbersome, especially when multiple url objects are involved.

Importing Vega saved objects

In addition to the above requirements, the Vega visualization should support importing saved objects. As mentioned in Limitations, full support is a challenge since the current import logic supports only one datasource. Following similar logic as #5712, this issue will take the following scenarios into consideration:

Importing from non-MDS Vega -> MDS Vega visualization: since the data_source_name will not be present in these visualizations, the field can be added to the Vega spec directly.
Importing from MDS Vega -> MDS Vega: any data_source_name that uses the previous datasource will be updated to use the new data_source_name.
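Both scenarios amount to setting data_source_name on every url object that queries an index. The sketch below works on an already-parsed array of data entries (a single data object would be wrapped in an array first); the ImportUrl and ImportData shapes and the retargetDataSources helper are hypothetical, and replacing every existing name with the single import target is a simplifying assumption based on the one-datasource import logic mentioned above.

```typescript
// Hypothetical loose shapes for parsed data entries of a Vega spec.
interface ImportUrl {
  index?: string;
  data_source_name?: string;
  [key: string]: unknown;
}

interface ImportData {
  url?: ImportUrl;
  [key: string]: unknown;
}

// Return a copy of the data entries in which every url object that targets an
// index points at the datasource chosen during import.
// - non-MDS -> MDS: the field is absent and gets added.
// - MDS -> MDS: the previous name is overwritten.
function retargetDataSources(entries: ImportData[], newName: string): ImportData[] {
  return entries.map((entry) => {
    // Entries without a url object, or whose url has no index, do not run an
    // OpenSearch query in this sketch, so they are returned unchanged.
    if (entry.url === undefined || entry.url.index === undefined) {
      return entry;
    }
    return { ...entry, url: { ...entry.url, data_source_name: newName } };
  });
}
```

The input is not mutated, so the original spec remains available if the import is cancelled or fails.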

Alternatives

Initially the decision was made to use data_source_id instead of data_source_name, because data_source_id identifies a unique datasource to query and does not require an extra find query to look it up. However, other plugins refer to a datasource by name, not by id, and using the name as the identifier is more user friendly.

Open Question(s)

Since datasource can be a bit ambiguous here, what are some alternative field names? I'm thinking data_source_id would help disambiguate this field.

seraphjiang · 2024-02-26T01:51:30Z

dataSourceId looks like a good call here. Could we conduct an e2e PoC next?

BionIT · 2024-02-28T17:45:50Z

@huyaboo So we are proposing to add a new data source field to the script during creation of the visualization.
How is a visualization that uses Vega persisted in the dashboard? Is it by storing the Vega script? How would the data source be persisted in the visualization?

kgcreative · 2024-02-28T23:35:48Z

I would recommend we use dataSourceName instead of data source ID (since the ID can be queried from the data source name under the hood).
@BionIT, we could also pre-pend the data source name to the Index name dataSourceName::indexName

kgcreative · 2024-02-28T23:36:35Z

@BionIT, we should follow the same conventions that we use for Index Patterns

seraphjiang · 2024-02-29T01:41:47Z

@BionIT, we could also pre-pend the data source name to the Index name dataSourceName::indexName

It looks like we already have an index property.

@huyaboo it would help to check whether this property supports a pure index name or an index pattern.

cc: @kgcreative @BionIT

export interface UrlObject {
  [index: string]: any;
  [CONSTANTS.TIMEFILTER]?: string;
  [CONSTANTS.CONTEXT]?: boolean;
  [CONSTANTS.LEGACY_CONTEXT]?: string;
  [CONSTANTS.TYPE]?: string;
  name?: string;
  index?: string;
  datasource?: string;
  body?: Body;
  size?: number;
  timeout?: string;
}

huyaboo · 2024-02-29T21:25:38Z

Yeah, the Vega plugin supports index patterns, since fetching the data is treated like running a search query. Any information in the Vega spec is persisted, so using an Index Pattern convention is possible. However, since duplicate index pattern names are possible, I'm not sure whether this could become an issue when rendering visualizations.

seraphjiang · 2024-03-03T04:46:49Z

Yeah Vega plugin supports index patterns

Thanks.

Could we see an example using an index pattern in Vega today, without MDS?

A recorded video would help if possible.

seraphjiang · 2024-03-03T05:04:56Z

@huyaboo @kgcreative @bandinib-amzn @BionIT

Btw, when I asked whether Vega supports index patterns, I meant the index pattern created and stored as a saved object, not an arbitrary target string. A user-created index pattern may contain scripted fields.

GET /<target>/_search

Comma-separated list of indices and aliases to search. Supports wildcards (*). Use * or _all to search all indices.

seraphjiang · 2024-03-03T17:44:57Z

Currently, Multiple DataSource (MDS) does not support Vega visualizations.

Additionally, Vega is not supported with MDS since a datasource needs to be provided to the Vega spec.

@huyaboo

nitpick: let's use Vega is not supported with MDS, to be consistent.

seraphjiang · 2024-03-03T19:28:48Z

Thanks @kgcreative and team for the brainstorming. I'd like to provide more information to see if it helps us make the call.

syntax for index name with datasource

@BionIT, we could also pre-pend the data source name to the Index name dataSourceName::indexName

Prepending the datasource to the index could be one option; however, the syntax may confuse users who use cross-cluster search. Whether Vega supports cross-cluster search is unknown. It may be beyond the scope of this feature request, so we could create a separate issue to track that.

# cross cluster syntax
cluster1:index1,index2

# single datasource syntax
datasource1::index1,index2
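To make the difference between the two syntaxes concrete, here is a small sketch of how a "::"-prefixed target could be split without touching cross-cluster targets. The parseTarget helper and the ParsedTarget shape are hypothetical; the "::" form is only the option discussed in this thread, not an existing feature.

```typescript
// Hypothetical result of splitting an index target string.
interface ParsedTarget {
  dataSourceName?: string; // set only when the "::" prefix syntax is used
  index: string;           // remaining target, passed to the search unchanged
}

// Split "datasource1::index1,index2" into a datasource name and an index list.
// Targets without "::" (including cross-cluster "cluster1:index1") are
// returned as-is, since a single ":" is not treated as a separator here.
function parseTarget(target: string): ParsedTarget {
  const separator = '::';
  const at = target.indexOf(separator);
  if (at === -1) {
    return { index: target };
  }
  const dataSourceName = target.slice(0, at);
  const index = target.slice(at + separator.length);
  if (dataSourceName === '' || index === '') {
    throw new Error(`Malformed target "${target}": expected dataSourceName::indexName`);
  }
  return { dataSourceName, index };
}
```

Only the first "::" is used as the split point, so anything after it, including a cross-cluster prefix, stays in the index part.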

use dataSourceId and dataSource Name for identity

I would recommend we use dataSourceName instead of data source ID (since the ID can be queried from the data source name under the hood).

Currently, data-source-id is the key used to retrieve the detailed information. I agree with @kgcreative that we could look up the ID by name. However, this would introduce another _search API dependency for this feature.

Besides the overhead on the Vega visualization edit and rendering pages, we may also need to take saved object import/export for MDS into consideration.

@BionIT @huyaboo @bandinib-amzn could you meet, dive a little deeper, and come up with a proposal quickly?

Target Index vs Index pattern

Based on my observation, Vega visualizations support a target index rather than an index pattern.

For more detail, refer to #5927 (comment).

Could I suggest creating another feature request to add index pattern support (which would need to apply to both MDS and non-MDS)?

saved object import/export for Vega support MDS

I think we need it as part of this feature request/RFC, but it could be addressed incrementally in a separate PR.

Single DataSource vs Multi DataSource

Our end vision is to move to a true multi-datasource world. However, there are ambiguities in both the use cases and the technical details.

From the technical side, we might rely on Vega/Vega-Lite to support MDS.
vega/vega-lite#1271

# possible approach is to request data array support in vega/vega lite
[
  {
     data: {}
     name: "data1"
  },
  {
     data: {}
     name: "data2"
  },
]

On the use-case side, we will need more use cases to help us validate and prioritize.

cc: @kgcreative @zengyan-amazon @BionIT @huyaboo @bandinib-amzn

YANG-DB · 2024-03-05T19:27:18Z

@huyaboo this is an awesome idea
In this context I'd like to add the following issue for #6040

huyaboo · 2024-03-06T01:42:37Z

use dataSourceId and dataSource Name for identity

After some conversations with Vega users, the tradeoff with using dataSourceName may be worth it purely from a usability point of view.

Target Index vs Index pattern
Could I suggest to create another feature request to add index pattern support(which need to apply to both MDS and non-MDS)

This could be a potential option as well. Right now the Vega visualization only accepts OpenSearch queries.

saved object import/export for Vega support MDS

After some consideration, I'm not sure if any approach can mitigate issues when importing/exporting. The revised proposal will support querying data from multiple indices from multiple datasources (see the Limitations section).

Single DataSource vs Multi DataSource

We do not need to rely on Vega/Vega-lite for this. The current plugin already supports querying multiple indices from local cluster by making the data object an array.

@seraphjiang

seraphjiang · 2024-03-06T02:24:21Z

Thanks @huyaboo @YANG-DB

3. use dataSourceId and dataSource Name for identity

After some conversations with Vega users, the tradeoff with using dataSourceName may be worth it purely from a usability point of view.

Target Index vs Index pattern
Could I suggest to create another feature request to add index pattern support(which need to apply to both MDS and non-MDS)

This could be a potential option as well. Right now the Vega visualization only accepts OpenSearch queries.

@huyaboo if it is confirmed that Vega doesn't support index patterns, let's create an issue to track this feature request. We should focus on the enhancement in 2.13 and make incremental progress.

saved object import/export for Vega support MDS

After some consideration, I'm not sure if any approach can mitigate issues when importing/exporting. The revised proposal will support querying data from multiple indices from multiple datasources (see the Limitations section).

@huyaboo did you get a chance to take a look at https://opensearch.org/blog/enhancement-multiple-data-source-import-saved-object/

I hope the change can be compatible with the enhanced import/export feature delivered by @BionIT @yujin-emma. If so, there is no extra work; otherwise, please check with @BionIT @yujin-emma to ensure that customers who live in the non-MDS world can export their Vega saved objects and import them into a world with MDS enabled.

Single DataSource vs Multi DataSource

We do not need to rely on Vega/Vega-lite for this. The current plugin already supports querying multiple indices from local cluster by making the data object an array.

huyaboo added the enhancement New feature or request label Feb 22, 2024

github-actions bot added the untriaged label Feb 22, 2024

kavilla added the multiple datasource multiple datasource project label Feb 22, 2024

seraphjiang added v2.13.0 and removed v2.13.0 labels Feb 25, 2024

seraphjiang assigned seraphjiang and huyaboo and unassigned seraphjiang Feb 25, 2024

seraphjiang added v2.13.0 and removed untriaged labels Feb 25, 2024

seraphjiang mentioned this issue Feb 25, 2024

[Meta][2.13] Security, bug fix and enhancemanet for dashboards anywhere #5752

Open

28 tasks

huyaboo mentioned this issue Feb 28, 2024

[MDS] Support Vega Visualizations #5975

Merged

7 tasks

seraphjiang added the RFC Substantial changes or new features that require community input to garner consensus. label Mar 3, 2024

YANG-DB mentioned this issue Mar 5, 2024

[PPL] Vega support with PPL #6040

Closed

bandinib-amzn closed this as completed in #5975 Mar 8, 2024

huyaboo mentioned this issue Mar 12, 2024

[MDS] Add Vega support for importing saved objects #6123

Merged

10 tasks

huyaboo changed the title ~~[MDS] Vega support with MDS~~ [RFC] Vega support with MDS Mar 15, 2024

This was referenced Mar 18, 2024

[DOC] Add documentation about how to use Vega Visualization plugin in OpenSearch Dashboards opensearch-project/documentation-website#6709

Closed

[MDS] Install Vega sample data #6218

Merged

huyaboo mentioned this issue Mar 28, 2024

[MDS] Full TSVB Support #6290

Closed

huyaboo mentioned this issue Apr 10, 2024

[MDS] Support for Timeline #6385

Merged

7 tasks
