Willingness to contribute
Yes. I can contribute a fix for this bug independently.
Feature Request Proposal
Currently, Snowflake support is tied to HdfsSource and SimplePath. To create a Snowflake source, the user has to pass a JDBC URL to an HdfsSource, which translates to a SimplePath in the Computation Engine.
A couple of issues arise because of this:
Defining a Snowflake source is confusing and difficult. Every time the user wants to use Snowflake as a feature source or observation path, they have to build a JDBC URL that includes SF_URL, SF_USER, and SF_ROLE (all of which are already specified in the config settings).
Because this HdfsSource translates to a SimplePath in the Computation Engine, WindowAggregation does not work with Snowflake. SimplePath's isFileBasedLocation defaults to true, but Snowflake and JDBC sources are not file locations, so the time-based path analysis appears to treat the JDBC URL as an HDFS path and fails:
Caused by: URISyntaxException: Relative path in absolute URI:
2022-10-26 05:01:39.682 | ERROR | feathr.spark_provider._databricks_submission:wait_for_completion:218 - at org.apache.hadoop.fs.Path.initialize(Path.java:263)
at org.apache.hadoop.fs.Path.<init>(Path.java:221)
at com.linkedin.feathr.offline.util.HdfsUtils$.exists(HdfsUtils.scala:453)
at com.linkedin.feathr.offline.source.pathutil.HdfsPathChecker.exists(HdfsPathChecker.scala:11)
at com.linkedin.feathr.offline.source.pathutil.TimeBasedHdfsPathAnalyzer.analyze(TimeBasedHdfsPathAnalyzer.scala:51)
at com.linkedin.feathr.offline.transformation.AnchorToDataSourceMapper.getWindowAggAnchorDFMapForJoin(AnchorToDataSourceMapper.scala:102)
at com.linkedin.feathr.offline.swa.SlidingWindowAggregationJoiner.$anonfun$joinWindowAggFeaturesAsDF$8(SlidingWindowAggregationJoiner.scala:142)
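The failure mode above can be illustrated with a small Python model. This is only a sketch of the idea, not the engine's Scala code: the class names `DataLocation`, `SimplePath`, and `SnowflakeLocation`, the method `is_file_based`, and the helper `needs_path_check` are hypothetical stand-ins, and the JDBC string is a made-up placeholder.

```python
from dataclasses import dataclass


class DataLocation:
    """Hypothetical base type for anything the engine can read from."""

    def is_file_based(self) -> bool:
        # Mirrors the default described in the issue: locations are
        # assumed to be file paths unless a subclass says otherwise.
        return True


@dataclass
class SimplePath(DataLocation):
    """Hypothetical model of a path-only location; inherits the default."""
    path: str


@dataclass
class SnowflakeLocation(DataLocation):
    """Hypothetical dedicated location for Snowflake tables."""
    database: str
    schema: str
    dbtable: str

    def is_file_based(self) -> bool:
        # A Snowflake table is not a file, so path checks must be skipped.
        return False


def needs_path_check(location: DataLocation) -> bool:
    """Return True when a file-system existence check would be attempted."""
    return location.is_file_based()


# A JDBC URL wrapped in SimplePath is still treated as a file path.
print(needs_path_check(SimplePath("jdbc:snowflake://example-account")))  # True
print(needs_path_check(SnowflakeLocation("DB", "PUBLIC", "EVENTS")))     # False
```

With a dedicated location type, the decision to skip file-system checks lives with the source itself rather than depending on what the path string looks like.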
Minor Addition:
Currently, the registry implementation (request functionality) is tied to Azure Auth. We propose adding an AWS implementation that lets the user provide an AWSRequestsAuth object and authenticate to the registry.
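One way to picture a registry client that is not tied to a single auth provider is to let the caller inject an auth hook. The sketch below uses only the standard library and is purely illustrative: `RegistryClientSketch`, `AuthHook`, `build_request`, and `fake_aws_auth` are hypothetical names, the endpoint is a placeholder, and no real signing is performed.

```python
from typing import Callable, Dict, Optional

# Hypothetical hook type: takes request headers, returns authenticated headers.
AuthHook = Callable[[Dict[str, str]], Dict[str, str]]


class RegistryClientSketch:
    """Hypothetical registry client that accepts any auth hook."""

    def __init__(self, endpoint: str, auth: Optional[AuthHook] = None) -> None:
        self.endpoint = endpoint.rstrip("/")
        self._auth = auth

    def build_request(self, path: str) -> Dict[str, object]:
        """Assemble the URL and headers without sending anything."""
        headers: Dict[str, str] = {"Accept": "application/json"}
        if self._auth is not None:
            # The provider-specific logic lives entirely in the hook.
            headers = self._auth(headers)
        return {"url": f"{self.endpoint}/{path.lstrip('/')}", "headers": headers}


def fake_aws_auth(headers: Dict[str, str]) -> Dict[str, str]:
    """Placeholder for an AWS-style signer; it does NOT compute a signature."""
    signed = dict(headers)
    signed["Authorization"] = "AWS4-HMAC-SHA256 <placeholder>"
    return signed


client = RegistryClientSketch("https://registry.example.invalid/api", auth=fake_aws_auth)
print(client.build_request("/projects"))
```

In practice the hook would be replaced by a real auth object supplied at client initialization; the point of the sketch is only that the client code does not need to know which provider is in use.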
Motivation
What is the use case for this feature?
Users can define a Snowflake source for features and observations using the SnowflakeSource API. Users can pass in a custom query instead of a dbtable, which allows filtering and processing on the Snowflake cluster before the data is loaded into memory.
Users can use the Registry with AWS auth by creating an AWSRequestsAuth object and passing it in during client initialization.
Details
The goal of this feature is to separate out the Snowflake implementation into its own source and functionality. This includes:
A separate source API (SnowflakeSource).
The source translates to a Snowflake DataLocation in the Computation Engine (no longer tied to SimplePath).
Instead of providing a JDBC URL each time, the user provides the database, schema, and dbtable or query; the rest of the Snowflake config is retrieved from the configuration specified during client initialization.
Instead of a dbtable, users can pass in a query (enabling predicate pushdown).
sfWarehouse is added to the required Snowflake config so the user doesn't need to specify it each time.
The client exposes functionality to generate a Snowflake URL from the same parameters as SnowflakeSource.
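The items above could look roughly like the following from the Python side. This is a minimal sketch under stated assumptions, not the proposed implementation: the class name `SnowflakeSourceSketch`, the method `to_url`, the `snowflake://snowflake_account/` prefix, and the query-parameter names are all hypothetical, and requiring exactly one of `dbtable` or `query` is an assumption rather than something the proposal states.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass(frozen=True)
class SnowflakeSourceSketch:
    """Hypothetical stand-in for the proposed SnowflakeSource."""
    name: str
    database: str
    schema: str
    dbtable: Optional[str] = None
    query: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: exactly one of dbtable / query must be supplied.
        if (self.dbtable is None) == (self.query is None):
            raise ValueError("provide exactly one of dbtable or query")

    def to_url(self) -> str:
        """Build an illustrative URL; account, user, role and warehouse
        would come from the client configuration, not from the source."""
        params = {"sfDatabase": self.database, "sfSchema": self.schema}
        if self.dbtable is not None:
            params["dbtable"] = self.dbtable
        else:
            params["query"] = self.query
        return "snowflake://snowflake_account/?" + urlencode(params)


table_source = SnowflakeSourceSketch("events", "DB", "PUBLIC", dbtable="EVENTS")
query_source = SnowflakeSourceSketch(
    "recent_events", "DB", "PUBLIC",
    query="SELECT * FROM EVENTS WHERE TS > '2022-01-01'",
)
print(table_source.to_url())
print(query_source.to_url())
```

Passing a query rather than a table name lets the filter run on the Snowflake side, so only the matching rows are loaded.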
What component(s) does this feature request affect?
Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
Feature Registry API: The frontend API layer supports SQL and Purview (Atlas) as storage. The API layer is in Python (FastAPI).
Feature Registry Web UI: The Web UI for the feature registry. Written in React.