Clarify use of _type.purpose Key and Link #367

Open
jamesrhester opened this issue Apr 5, 2023 · 2 comments

Comments

@jamesrhester
Contributor

Some data names may plausibly have purpose of both Link and Key, if they are a foreign key that is also a key data name of the category. Note that both these purposes are redundant information as this information can be determined from other attributes. Therefore we just need to decide on a rule for assigning _type.purpose, and it won't have any semantic consequences.

I suggest that Link overrules Key, so that a key data name that is a link to another data name has _type.purpose of Link.

Also, as currently written a Key data name has to have a unique value, suggesting that it is the single key data name for the category. This purpose would obviously change if new key data names are added, so I suggest that we adjust the ddl description of Key purpose to state that this data name is one of the category key data names.

@vaitkus
Collaborator

vaitkus commented Apr 5, 2023

> Some data names may plausibly have purpose of both Link and Key, if they are a foreign key that is also a key data name of the category. Note that both these purposes are redundant information as this information can be determined from other attributes. Therefore we just need to decide on a rule for assigning _type.purpose, and it won't have any semantic consequences.

I fully agree that there is a need for some clarification, but are the purposes really redundant? I guess the status of an item being a Key can be determined from the category definition, but other than that I am unsure what attributes could signal that. Similarly, the presence of _name.linked_item_id could indicate the Link purpose, but it can also be used by items with the SU purpose. Of course, I might be missing something obvious here.
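
The ambiguity around `_name.linked_item_id` can be sketched as follows. This is only an illustration: the dictionary is modelled as a plain Python mapping of attribute names to values, and the data names (`_demo_child.parent_id` and so on) are hypothetical, not taken from any real dictionary.

```python
# Hypothetical definitions, modelled as plain dicts of DDLm attributes.
definitions = {
    "_demo_child.parent_id": {
        "_name.linked_item_id": "_demo_parent.id",
        "_type.purpose": "Link",
    },
    "_demo_child.length_su": {
        "_name.linked_item_id": "_demo_child.length",
        "_type.purpose": "SU",
    },
    "_demo_child.length": {"_type.purpose": "Measurand"},
}


def purposes_using_linked_item_id(defs):
    """Collect the declared purposes of all items carrying _name.linked_item_id."""
    return sorted(
        {
            attrs["_type.purpose"]
            for attrs in defs.values()
            if "_name.linked_item_id" in attrs
        }
    )


print(purposes_using_linked_item_id(definitions))  # ['Link', 'SU']
```

In this toy data the attribute appears under two different purposes, so its presence alone does not identify a Link item.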

> I suggest that Link overrules Key, so that a key data name that is a link to another data name has _type.purpose of Link.

Seems sound. Do you suggest that this change is made automatically by the software or should we change the existing dictionaries to conform to this rule (I guess that the majority of them already do)?

> Also, as currently written a Key data name has to have a unique value, suggesting that it is the single key data name for the category. This purpose would obviously change if new key data names are added, so I suggest that we adjust the ddl description of Key purpose to state that this data name is one of the category key data names.

I guess that this extension is needed for the merged datasets? I do not object to the change, but currently there seem to be only two scenarios where composite keys are (properly) used:

  1. The individual items serve as foreign keys (links) to other categories. In this case the items will have the Link purpose and will not be required to be unique in the given category (only in the linked category).
  2. The individual items serve as keys of a top-level category. In this case, the only reason to use more than one key is if the keys are natural keys, i.e. they encode a meaningful value (e.g. Miller indices). Due to this, the items would be assigned the Encode purpose.

@jamesrhester
Contributor Author

@vaitkus's logic is impeccable. Let us leave the Key definition in ddl.dic alone. Essentially, a _type.purpose of Key means that a data name forms part of a key, is not linked to a parent data name, and has no information encoded in it.
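
As a rough sketch, the rule agreed in this thread could be written as a small function. The function name and its boolean parameters are hypothetical. The ordering of Link before Encode is an assumption made for illustration only: the thread settles only that Link overrules Key.

```python
from typing import Optional


def purpose_for_key_item(
    in_category_key: bool,
    has_parent_link: bool,
    encodes_information: bool,
) -> Optional[str]:
    """Suggest a _type.purpose for a data name that may be part of a category key.

    Returns None when the data name is not a key data name, because the rule
    discussed here says nothing about such items.
    """
    if not in_category_key:
        return None
    if has_parent_link:
        # Link overrules Key.
        return "Link"
    if encodes_information:
        # Natural keys such as Miller indices.
        # Assumption: checked after Link; the thread does not rank these two.
        return "Encode"
    # Part of a key, not linked to a parent, nothing encoded.
    return "Key"


print(purpose_for_key_item(True, True, False))   # Link
print(purpose_for_key_item(True, False, True))   # Encode
print(purpose_for_key_item(True, False, False))  # Key
```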

> Seems sound. Do you suggest that this change is made automatically by the software or should we change the existing dictionaries to conform to this rule (I guess that the majority of them already do)?

We should change the dictionaries, if necessary, to conform to this rule. I believe that they should already conform, and if there are places that they don't we should check that we haven't overlooked something.
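
A conformance check along these lines might look like the sketch below. It assumes the dictionary has already been parsed into plain Python structures: a mapping from data name to its attributes, and a set of the data names listed as category keys. Both structures, the function name and the example data names are hypothetical; no particular CIF or DDLm parsing library is implied.

```python
def find_key_link_conflicts(definitions, category_keys):
    """Return key data names that link to another data name but declare Key."""
    conflicts = []
    for name in sorted(category_keys):
        attrs = definitions.get(name, {})
        is_linked = "_name.linked_item_id" in attrs
        if is_linked and attrs.get("_type.purpose") == "Key":
            conflicts.append(name)
    return conflicts


# Hypothetical parsed dictionary content.
example_definitions = {
    "_demo_site.id": {"_type.purpose": "Key"},
    "_demo_bond.site_id": {
        "_name.linked_item_id": "_demo_site.id",
        "_type.purpose": "Key",  # would be Link under the proposed rule
    },
    "_demo_angle.site_id": {
        "_name.linked_item_id": "_demo_site.id",
        "_type.purpose": "Link",
    },
}
example_keys = {"_demo_site.id", "_demo_bond.site_id", "_demo_angle.site_id"}

print(find_key_link_conflicts(example_definitions, example_keys))
# ['_demo_bond.site_id']
```

Any name reported this way would be a candidate for manual review rather than automatic rewriting, in line with the suggestion to check that nothing has been overlooked.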
