Provide default values for author identifiers #359
Recent changes to the author categories introduced author identifiers, which were key data names. However, legacy data files will not have this data name and thus fail validation. The current changes introduce dREL methods that provide default values.
Thank you for pointing out the not-so-obvious distinction between … I quite like the current assumption that it is sufficient to run … Also, as mentioned in #103, wouldn't it be better to use a method that does not tie the generated id to the row number? Using something like a UUID would help to prevent accidental id clashes and would be more in line with the idea that loop order can generally be ignored when dealing with DDLm ontologies. Finally, note that according to the interpretation that you provided, the …
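To make the clash concern concrete, here is a small Python model (not dREL) of the two ways of generating missing author ids. It assumes, as the "+ 1" in the reviewed method suggests, that Current_Row counts rows from zero, and it models a loop as a list of dicts. The helper names and the sample data are hypothetical.

```python
import uuid


def fill_ids_by_row(rows):
    """Fill missing 'id' values with the 1-based row number (mimics Current_Row(...) + 1)."""
    return [dict(r, id=r.get("id") or str(i + 1)) for i, r in enumerate(rows)]


def fill_ids_by_uuid(rows):
    """Fill missing 'id' values with a random UUID, independent of loop order."""
    return [dict(r, id=r.get("id") or str(uuid.uuid4())) for r in rows]


def has_clash(rows):
    """True if two rows end up with the same id."""
    ids = [r["id"] for r in rows]
    return len(ids) != len(set(ids))


# Hypothetical loop: the first author carries an explicit id, the second has none.
authors = [
    {"name": "Smith, J.", "id": "2"},
    {"name": "Jones, A."},  # the row-number default for this row is also "2"
]

print(has_clash(fill_ids_by_row(authors)))   # row-number default collides with the explicit "2"
print(has_clash(fill_ids_by_uuid(authors)))  # UUID default does not
```

A row-number default is deterministic but depends on loop order and can collide with ids that are already present; a UUID avoids both, at the cost of being non-reproducible between runs.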
The meaning of …
In the dREL context, there are no data blocks or data files, only collections of tables. So the simplest option for Definition methods is that they do evaluate for every row. The alternative might be to say something like "once for every distinct value of a Set category key", which is more complex to implement and means that Set categories acquire special status in dREL, which they haven't had before.
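The two options can be contrasted with a short Python model, treating a category as a list of dicts. This is an illustration of the trade-off rather than dREL semantics; the function names, the method signature (row index, row) and the "block" key are assumptions.

```python
def default_per_row(rows, item, method):
    """Run the default-providing method once for every row that lacks `item`."""
    out = []
    for i, row in enumerate(rows):
        if item not in row:
            row = {**row, item: method(i, row)}
        out.append(row)
    return out


def default_per_set_key(rows, item, key, method):
    """Alternative: run the method once per distinct value of a Set-category key
    and reuse that result for every row sharing the same key value."""
    cache = {}
    out = []
    for i, row in enumerate(rows):
        if item not in row:
            if row[key] not in cache:
                cache[row[key]] = method(i, row)
            row = {**row, item: cache[row[key]]}
        out.append(row)
    return out


def row_number_method(i, row):
    """Hypothetical default: the 1-based row number as a string."""
    return str(i + 1)


rows = [{"block": "A"}, {"block": "A"}, {"block": "B"}]
print([r["id"] for r in default_per_row(rows, "id", row_number_method)])
print([r["id"] for r in default_per_set_key(rows, "id", "block", row_number_method)])
```

The per-key variant has to be told which data name is the Set key and must cache results, which is the extra machinery the simpler per-row rule avoids.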
Yes, …
Unfortunately, three decades of practice have made that "normally" into "always", and this is hard-coded into too much software to change now. We should change that definition to move away from the sequence number and simply say that it is the value of … But, given my comments above, …
Understood, so the main idea is that …
This is just a nitpick, but a definition can have several attached methods, just not several attached methods of the same type (…
But doesn't dREL still operate on a finite dataset which, in the regular use case, matches a single data block? Probably not all of the currently defined methods could handle merged multi-block data files (e.g. the method provided with …
OK, these changes can be made in a separate PR once the discussions on the id generation functions are settled. As for the current PR, the …
That is the reason for a specific dREL behaviour: when indexing into a category in a dREL method, if the values of any key data names are not provided explicitly, they can be assumed from the values that any related (child-parent linked) key data names take in the currently processed row. Otherwise, the values of the key data names should be provided (…
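The key-inference rule can be modelled in Python as follows. This is a sketch under stated assumptions, not dREL's implementation: categories are lists of dicts, and links is a hypothetical mapping from each key of the indexed category to the linked data name in the current row. A legacy row that lacks the linked id raises an error, which is exactly the missing-key problem discussed in this thread.

```python
def index_category(table, key_names, current_row, links, **explicit):
    """Return the unique row of `table` matching all of its key data names.

    Keys passed in `explicit` are used as given; any other key takes the value
    of its linked (child-parent) data name in the currently processed row.
    """
    wanted = {}
    for k in key_names:
        if k in explicit:
            wanted[k] = explicit[k]
        elif k in links and links[k] in current_row:
            wanted[k] = current_row[links[k]]
        else:
            raise KeyError(f"no value for key {k!r}: it must be supplied explicitly")
    matches = [r for r in table if all(r.get(k) == v for k, v in wanted.items())]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} rows match {wanted}")
    return matches[0]


audit_author = [
    {"id": "1", "name": "Smith, J.", "address": "Lab 1"},
    {"id": "2", "name": "Jones, A.", "address": "Lab 2"},
]
contact_row = {"id": "2", "name": "Jones, A."}  # current audit_contact_author row
links = {"id": "id"}  # hypothetical: audit_author.id is the parent of audit_contact_author.id

print(index_category(audit_author, ["id"], contact_row, links)["address"])
print(index_category(audit_author, ["id"], contact_row, links, id="1")["name"])
```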
In practice a …
I believe there might be a further issue: even if …
Although we can generate arbitrary ids for the parent categories publ_author and audit_author where they are missing, we cannot do the same for their child categories. This commit adds dREL methods that search for matching names in the parent categories and determine the missing identifier accordingly.
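The name-matching approach can be sketched in Python. The sketch is illustrative only: it assumes exact name equality as the matching rule and reports an ambiguous or absent match as an error. Whether the committed dREL treats those cases the same way is not stated here, and the function name is hypothetical.

```python
def fill_child_ids(parent_rows, child_rows, id_name="id", match_name="name"):
    """For child rows lacking an id, copy the id of the parent row with the same name."""
    by_name = {}
    for p in parent_rows:
        by_name.setdefault(p[match_name], []).append(p[id_name])

    filled = []
    for c in child_rows:
        if id_name not in c:
            ids = by_name.get(c.get(match_name), [])
            if len(ids) != 1:
                # Zero or several parents share this name: the id cannot be determined.
                raise LookupError(
                    f"cannot determine {id_name} for {c.get(match_name)!r}: {len(ids)} matches"
                )
            c = {**c, id_name: ids[0]}
        filled.append(c)
    return filled


audit_author = [
    {"id": "1", "name": "Smith, J."},
    {"id": "2", "name": "Jones, A."},
]
audit_contact_author = [{"name": "Jones, A."}]  # legacy row without an id
print(fill_child_ids(audit_author, audit_contact_author))
```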
Yes, the same missing key problem would technically occur with the …
Understood. I think by making this a default value, which might be …
After the discussion, the provided methods look great. I suggested a few minor tweaks for the _audit_contact_author.id method; if you are OK with them, we should also propagate them to the _publ_contact_author method.
Also, I guess issue COMCIFS/dREL#6 should first be resolved before merging this?
Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
Those changes are fine. Yes, we should resolve COMCIFS/dREL#6 first.
cif_core.dic
_method.purpose Definition
_method.expression
;
_enumeration.default = Current_Row(audit_author) + 1
;
The Unique_id function (COMCIFS/dREL#10) might be more suitable here. If not, the Current_Row() call should be updated to reflect the changes implemented in COMCIFS/dREL#9.
cif_core.dic
_method.purpose Definition
_method.expression
;
_enumeration.default = Current_Row(publ_author) + 1
;
The Unique_id function (COMCIFS/dREL#10) might be more suitable here. If not, the Current_Row() call should be updated to reflect the changes implemented in COMCIFS/dREL#9.
Missing identifiers can now be generated by dREL using the Unique_id function.
As all of the related issues seem to be resolved, I will merge this PR. In the future, if we get any more relations similar to those between …
Closes #103.
Note that the dREL is implemented using a Definition method rather than an Evaluation method, as an Evaluation method would imply a single correct value, and validation would therefore fail if the file contained any values for the identifiers that were not the row numbers. Evaluation methods were appropriate for _space_group_symop.id because legacy CIFs used the space_group_symop category row number to refer to symops.

The Definition method creates a new default value for each row. This is a novel use of Definition methods, which until now have only been used in Set categories, where _enumeration.default might be interpreted as creating a category-wide default. Now that Set categories are simply seen as Loop categories that have been split over several data blocks, there is no fundamental difference between a per-row default and a per-Set-category-row default. But please consider whether there is any reason not to allow this behaviour.
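The practical difference between the two method purposes during validation can be modelled in Python. This is a simplified sketch, not the behaviour of any particular validator: a method is modelled as a function of the row index and row, row_number is a hypothetical row-number default, and the error-message format is invented for illustration.

```python
def check_item(rows, item, method, purpose):
    """Complete and check `item` in every row of a category (modelled as a list of dicts).

    "Evaluation": the method defines the single correct value, so a supplied
    value that differs from it is reported as an error.
    "Definition": the method only provides a default for rows lacking the item;
    supplied values are accepted unchanged.
    """
    completed, errors = [], []
    for i, row in enumerate(rows):
        if item not in row:
            row = {**row, item: method(i, row)}
        elif purpose == "Evaluation":
            expected = method(i, row)
            if row[item] != expected:
                errors.append(f"row {i + 1}: {item} = {row[item]!r}, expected {expected!r}")
        completed.append(row)
    return completed, errors


def row_number(i, row):
    """Hypothetical default: the 1-based row number as a string."""
    return str(i + 1)


authors = [{"id": "smith"}, {}]  # first row has an explicit, non-numeric id
print(check_item(authors, "id", row_number, "Evaluation")[1])  # reports row 1
print(check_item(authors, "id", row_number, "Definition")[1])  # no errors
```

In both cases the row lacking an id receives the default; only the Evaluation purpose rejects the explicit "smith" id.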