Adding documentation for azure.cosmos.spark package and managing spark libraries (#1013)

* Removing double quotes

* Commenting out

* Bringing maven jar back

* Adding double quotes

* Reverting changes

* Adding document to manage library for Azure Spark platforms

* More edits, adding section for azure.cosmos.spark package

* Adding back the cosmos DB documentation that was deleted in previous PRs
jainr authored Jan 27, 2023
1 parent e322717 commit 305ef73
Showing 2 changed files with 41 additions and 1 deletion.
26 changes: 25 additions & 1 deletion docs/how-to-guides/jdbc-cosmos-notes.md
@@ -105,4 +105,28 @@ client.get_offline_features(..., output_path=sink)

## Using SQL database as the online store

As with the offline store, create a JDBC sink and add it to the `MaterializationSettings`, set the corresponding environment variables, then use it with `FeathrClient.materialize_features`, as in the sketch below.
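
A minimal sketch, assuming a `JdbcSink` with `(name, url, dbtable, auth)` parameters and the `<NAME>_USER`/`<NAME>_PASSWORD` environment-variable convention described in the JDBC section above; all connection values are placeholders:

```
import os
from feathr import JdbcSink, MaterializationSettings

# The sink name determines which environment variables hold the credentials.
name = 'sql_output'
sink = JdbcSink(name, some_jdbc_url, some_dbtable, 'USERPASS')

# Credentials are passed via environment variables rather than in code.
os.environ[f"{name.upper()}_USER"] = "some_user"
os.environ[f"{name.upper()}_PASSWORD"] = "some_password"

client.materialize_features(..., materialization_settings=MaterializationSettings(..., sinks=[sink]))
```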

## Using CosmosDb as the online store

To use CosmosDb as the online store, create a `CosmosDbSink` and add it to the `MaterializationSettings`, then use it with `FeathrClient.materialize_features`, e.g.:

```
import os
from feathr import CosmosDbSink, MaterializationSettings

# The sink name determines the environment variable Feathr reads the CosmosDb key from: <NAME>_KEY.
name = 'cosmosdb_output'
sink = CosmosDbSink(name, some_cosmosdb_url, some_cosmosdb_database, some_cosmosdb_collection)
os.environ[f"{name.upper()}_KEY"] = "some_cosmosdb_api_key"
client.materialize_features(..., materialization_settings=MaterializationSettings(..., sinks=[sink]))
```

The Feathr client doesn't support reading feature values back from CosmosDb; use the [official CosmosDb client](https://pypi.org/project/azure-cosmos/) to get the values:

```
from azure.cosmos import CosmosClient

client = CosmosClient(some_cosmosdb_url, credential=some_cosmosdb_api_key)
db_client = client.get_database_client(some_cosmosdb_database)
container_client = db_client.get_container_client(some_cosmosdb_collection)

# read_item requires both the item id and its partition key value
doc = container_client.read_item(some_key, partition_key=some_partition_key)
feature_value = doc['feature_name']
```

**Note: There is currently a known issue with the azure.cosmos.spark package on Databricks: the dependency does not resolve correctly when packaged through Gradle, which results in a `ClassNotFoundException`. To mitigate this, install the azure.cosmos.spark package directly from [Maven Central](https://mvnrepository.com/artifact/com.azure.cosmos.spark/azure-cosmos-spark_3-1_2-12) following the steps [here](./manage-library-spark-platform.md).**
16 changes: 16 additions & 0 deletions docs/how-to-guides/manage-library-spark-platform.md
@@ -0,0 +1,16 @@
---
layout: default
title: Manage library for Azure Spark Compute (Azure Databricks and Azure Synapse)
parent: How-to Guides
---

# Manage libraries for Azure Synapse

Sometimes you might run into dependency issues where a particular dependency is missing on your Spark compute; most likely you will hit this while executing the sample notebooks in that environment.

If you want to install Maven, PyPI, or private packages on your Synapse cluster, you can follow the [official documentation](https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries) on how to do it at the workspace, pool, and session level and the differences between each of them. For a quick session-scoped install, see the sketch below.
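
As one example (a sketch only; the package name is illustrative), a session-scoped package can be installed directly from a Synapse notebook cell with the `%pip` magic, and applies only to the current Spark session:

```
# Run in a Synapse notebook cell; the package is available for this session only.
%pip install azure-cosmos
```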


## Manage libraries for Azure Databricks

Similarly, to install an external library from PyPI, Maven, or a private package on Databricks, you can follow the official [Databricks documentation](https://learn.microsoft.com/en-us/azure/databricks/libraries/cluster-libraries). A scripted alternative is sketched below.
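
As a scripted alternative (a sketch, assuming the Databricks Libraries REST API endpoint `/api/2.0/libraries/install`; the workspace URL, token, cluster id, and connector version are placeholders), a Maven package such as the azure.cosmos.spark connector can be installed onto a cluster by its coordinates:

```
import requests

# Placeholder values for your workspace.
workspace_url = "https://<your-workspace>.azuredatabricks.net"
token = "<your-personal-access-token>"
cluster_id = "<your-cluster-id>"

# Install the Cosmos Spark connector onto the cluster;
# pick a <version> from Maven Central.
resp = requests.post(
    f"{workspace_url}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_id": cluster_id,
        "libraries": [
            {"maven": {"coordinates": "com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:<version>"}}
        ],
    },
)
resp.raise_for_status()
```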
