Problems using the patterns module. #50

Closed
gillespied opened this issue Oct 25, 2023 · 0 comments

Attempting to follow the instructions at https://rapid.readthedocs.io/en/latest/sdk/useful_patterns/

rapid version 7.0.4,
rapid-sdk version 0.1.4

First, a small issue: the patterns module has no file named data; it should be dataset. I'm guessing this is a typo, or the file was renamed recently.

import pandas as pd
from rapid import Rapid
from rapid.patterns import data  # <--- should be dataset
from rapid.items.schema import SchemaMetadata, SensitivityLevel, Owner
from rapid.exceptions import DataFrameUploadValidationException

rapid = Rapid()

raw_data = [{"a": 1, "b": 2, "c": 3}, {"a": 10, "b": 20, "c": 30}]
df = pd.DataFrame(raw_data)

metadata = SchemaMetadata(
    layer="default",
    domain="mydomain",
    dataset="mydataset",
    owners=[Owner(name="myname", email="myemail@email.com")],
    sensitivity=SensitivityLevel.PUBLIC.value,
)

try:
    data.upload_and_create_dataset(  # <--- should be dataset
        rapid=rapid, df=df, metadata=metadata, upgrade_schema_on_fail=False
    )
except DataFrameUploadValidationException:
    print("Incorrect DataFrame schema")

The second problem occurs when attempting to use the `upload_and_create_dataset` method, which fails with an error saying that you must change the owner from the default. The error is:

SchemaCreateFailedException: ('Could not create schema', {'details': 'You must change the default owner'})

I think the issue is with the generate_schema method, which does not use the metadata.owners property and returns a schema with the default owner.

metadata=SchemaMetadata(layer='default', domain='mydomain', dataset='mydataset', sensitivity='PUBLIC', owners=[Owner(name='change_me', email='change_me@email.com')], version=None, key_value_tags={}, key_only_tags=[]) columns=[Column(name='c', data_type='int', partition_index=None, allow_null=True, format=None), Column(name='d', data_type='int', partition_index=None, allow_null=True, format=None), Column(name='e', data_type='int', partition_index=None, allow_null=True, format=None)]

This schema is then passed within upload_and_create_dataset to rapid.create_schema, which throws the error.

It's possible to create the schema by setting the owners again after calling generate_schema, e.g.

schema = rapid.generate_schema(
    df, metadata.layer, metadata.domain, metadata.dataset, metadata.sensitivity
)

schema.metadata.owners = [Owner(name="myname", email="myemail@email.com")]

rapid.create_schema(schema)

That feels like a bit of a hack, though. Is that the intended behaviour?
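
For anyone hitting the same error in the meantime, the workaround above could be wrapped in a small helper. This is only a sketch: `create_schema_with_owners` is a hypothetical name and not part of rapid-sdk, and it assumes that `generate_schema` and `create_schema` behave as in the snippet above and that the returned schema exposes a writable `metadata.owners`.

```python
def create_schema_with_owners(rapid, df, metadata):
    """Generate a schema from df, copy the owners from metadata, then create it.

    Hypothetical helper, not part of rapid-sdk. `rapid` is expected to be a
    Rapid client (or anything with generate_schema and create_schema methods),
    and `metadata` a SchemaMetadata-like object carrying the real owners.
    """
    schema = rapid.generate_schema(
        df, metadata.layer, metadata.domain, metadata.dataset, metadata.sensitivity
    )
    # generate_schema returns the default owner, so overwrite it with the
    # owners supplied by the caller before the schema is created.
    schema.metadata.owners = list(metadata.owners)
    rapid.create_schema(schema)
    return schema
```

With the objects from the first example, this would be called as `create_schema_with_owners(rapid, df, metadata)`. It only covers schema creation; uploading the DataFrame afterwards would still be a separate step.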
