Unexpected KeyError raised related to missing 'primary_key' field schema in the tabular data columns #1633

Open
amelie-rondot opened this issue Jan 31, 2024 · 0 comments · May be fixed by #1641

Overview

While migrating validata.fr from frictionless-py v4 to v5, we hit an unexpected KeyError when validating tabular data whose header does not contain a field that the schema declares as its primary key.

For example:

data = [["b"], ["foo"]]
schema = {
    "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
    "fields": [
        {
            "name": "a",
        }
    ],
    "primaryKey": ["a"],
}

Validating this data against the schema in Python raises a KeyError:

import frictionless

# `data` and `schema` are the objects defined in the example above.
if __name__ == "__main__":
    report = frictionless.validate(
        source=data,
        schema=frictionless.Schema.from_descriptor(schema),
        detector=frictionless.Detector(schema_sync=True),
    )
    print(report)

Output:

Traceback (most recent call last):
  File "code.py", line 4, in <module>
    report = frictionless.validate(
    ...
    File "frictionless/table/row.py", line 281, in __process
        raise KeyError(f"Row does not have a field {key}")
    KeyError: 'Row does not have a field a'

Expected behaviour

According to the primaryKey section of the Table Schema specification, the fields that make up the primary key cannot be null in the data.
I expected an invalid report rather than an exception, for example with a missing-primary-key error whose description reads "Based on the schema there should be a label 'a' corresponding to the schema's primary key that is missing in the data's header." (similar to the existing missing-label validation error).
The expected validation report would be:

{'valid': False,
 'stats': {'tasks': 1, 'errors': 1, 'warnings': 0, 'seconds': 0.003},
 'warnings': [],
 'errors': [],
 'tasks': [{'name': 'memory',
            'type': 'table',
            'valid': False,
            'place': '<memory>',
            'labels': ['b'],
            'stats': {'errors': 1,
                      'warnings': 0,
                      'seconds': 0.003,
                      'fields': 2,
                      'rows': 1},
            'warnings': [],
            'errors': [{'type': 'missing-primary-key',
                        'title': 'Missing Primary Key',
                        'description': 'Based on the schema there should be a '
                                       "label 'a' corresponding to the schema's primary key "
                                       "that is missing in the data's header.",
                        'message': "There is a missing primary key in the header's "
                                   'field "a" at position "2"',
                        'tags': ['#table', '#header', '#label'],
                        'note': '',
                        'labels': ['b'],
                        'rowNumbers': [1],
                        'label': '',
                        'fieldName': 'a',
                        'fieldNumber': 2}]}]}
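
As an illustration only, the check behind such an error could be sketched in plain Python, independently of frictionless internals. The helper name `missing_primary_key_labels` is hypothetical and is not part of the frictionless API; the sketch only assumes that `primaryKey` may be given either as a single string or as a list of strings, as the Table Schema specification allows.

```python
from typing import Any, Dict, List, Sequence


def missing_primary_key_labels(
    labels: Sequence[str], schema: Dict[str, Any]
) -> List[str]:
    """Return the primary key field names that are absent from the header.

    Hypothetical helper for illustration; not a frictionless function.
    """
    primary_key = schema.get("primaryKey", [])
    # Table Schema allows primaryKey to be a string or a list of strings.
    if isinstance(primary_key, str):
        primary_key = [primary_key]
    present = set(labels)
    # Preserve the order in which the schema declares the key fields.
    return [name for name in primary_key if name not in present]


# Same data and schema as in the report above.
data = [["b"], ["foo"]]
schema = {
    "fields": [{"name": "a"}],
    "primaryKey": ["a"],
}

# The first row of the in-memory table is the header.
missing = missing_primary_key_labels(data[0], schema)
print(missing)
```

A non-empty result is the situation in which a missing-primary-key error could be reported instead of letting the KeyError escape from row processing.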

Other details and experiments

Frictionless version 5.16.0

Validation from the command line gives the same result.
I enabled schema_sync to reproduce our use case more closely, but it does not seem to be related to the actual issue.
