Adding documentation for azure.cosmos.spark package and managing spark libraries (#1013)

* Removing double quotes

* Commenting out

* Bringing maven jar back

* Adding double quotes

* Reverting changes

* Adding document to manage library for Azure Spark platforms

* More edits, adding section for azure.cosmos.spark package

* Adding back the cosmos DB documentation that was deleted in previous PRs
jainr authored Jan 27, 2023
1 parent e322717 commit 305ef73
Showing 2 changed files with 41 additions and 1 deletion.
26 changes: 25 additions & 1 deletion docs/how-to-guides/jdbc-cosmos-notes.md
@@ -105,4 +105,28 @@ client.get_offline_features(..., output_path=sink)

## Using SQL database as the online store

As with the offline store, create a JDBC sink and add it to the `MaterializationSettings`, set the corresponding environment variables, then use it with `FeathrClient.materialize_features`, as in the sketch below.
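
A minimal sketch, assuming a `JdbcSink` with `(name, url, dbtable, auth)` parameters and the `<NAME>_USER`/`<NAME>_PASSWORD` environment-variable convention described in the JDBC section above; all connection values are placeholders:

```
import os
from feathr import JdbcSink, MaterializationSettings

# The sink name determines which environment variables hold the credentials.
name = 'sql_output'
sink = JdbcSink(name, some_jdbc_url, some_dbtable, 'USERPASS')

# Credentials are passed via environment variables rather than in code.
os.environ[f"{name.upper()}_USER"] = "some_user"
os.environ[f"{name.upper()}_PASSWORD"] = "some_password"

client.materialize_features(..., materialization_settings=MaterializationSettings(..., sinks=[sink]))
```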

## Using CosmosDb as the online store

To use CosmosDb as the online store, create a `CosmosDbSink` and add it to the `MaterializationSettings`, then use it with `FeathrClient.materialize_features`, e.g.:

```
import os
from feathr import CosmosDbSink, MaterializationSettings

# The sink name determines the environment variable Feathr reads the CosmosDb key from: <NAME>_KEY.
name = 'cosmosdb_output'
sink = CosmosDbSink(name, some_cosmosdb_url, some_cosmosdb_database, some_cosmosdb_collection)
os.environ[f"{name.upper()}_KEY"] = "some_cosmosdb_api_key"
client.materialize_features(..., materialization_settings=MaterializationSettings(..., sinks=[sink]))
```

The Feathr client doesn't support reading feature values back from CosmosDb; use the [official CosmosDb client](https://pypi.org/project/azure-cosmos/) to get the values:

```
from azure.cosmos import CosmosClient

client = CosmosClient(some_cosmosdb_url, credential=some_cosmosdb_api_key)
db_client = client.get_database_client(some_cosmosdb_database)
container_client = db_client.get_container_client(some_cosmosdb_collection)

# read_item requires both the item id and its partition key value
doc = container_client.read_item(some_key, partition_key=some_partition_key)
feature_value = doc['feature_name']
```

**Note: There is currently a known issue with the azure.cosmos.spark package on Databricks: the dependency does not resolve correctly when packaged through Gradle, which results in a `ClassNotFoundException`. To mitigate this, install the azure.cosmos.spark package directly from [Maven Central](https://mvnrepository.com/artifact/com.azure.cosmos.spark/azure-cosmos-spark_3-1_2-12) following the steps [here](./manage-library-spark-platform.md).**
16 changes: 16 additions & 0 deletions docs/how-to-guides/manage-library-spark-platform.md
@@ -0,0 +1,16 @@
---
layout: default
title: Manage library for Azure Spark Compute (Azure Databricks and Azure Synapse)
parent: How-to Guides
---

# Manage libraries for Azure Synapse

Sometimes you might run into dependency issues where a particular dependency is missing on your Spark compute; most likely you will hit this while executing the sample notebooks in that environment.

If you want to install Maven, PyPI, or private packages on your Synapse cluster, you can follow the [official documentation](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries) on how to do it at the workspace, pool, and session level and the differences between each of them. For a quick session-scoped install, see the sketch below.
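
As one example (a sketch only; the package name is illustrative), a session-scoped package can be installed directly from a Synapse notebook cell with the `%pip` magic, and applies only to the current Spark session:

```
# Run in a Synapse notebook cell; the package is available for this session only.
%pip install azure-cosmos
```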


## Manage libraries for Azure Databricks

Similarly, to install an external library from PyPI, Maven, or a private package on Databricks, you can follow the official [Databricks documentation](https://learn.microsoft.com/en-us/azure/databricks/libraries/cluster-libraries). A scripted alternative is sketched below.
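
As a scripted alternative (a sketch, assuming the Databricks Libraries REST API endpoint `/api/2.0/libraries/install`; the workspace URL, token, cluster id, and connector version are placeholders), a Maven package such as the azure.cosmos.spark connector can be installed onto a cluster by its coordinates:

```
import requests

# Placeholder values for your workspace.
workspace_url = "https://<your-workspace>.azuredatabricks.net"
token = "<your-personal-access-token>"
cluster_id = "<your-cluster-id>"

# Install the Cosmos Spark connector onto the cluster;
# pick a <version> from Maven Central.
resp = requests.post(
    f"{workspace_url}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_id": cluster_id,
        "libraries": [
            {"maven": {"coordinates": "com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:<version>"}}
        ],
    },
)
resp.raise_for_status()
```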
