-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow thesis processors to assign accession numbers to degree periods #1214
Conversation
68d847d
to
2cac042
Compare
2cac042
to
c49ead3
Compare
c49ead3
to
7d36544
Compare
7d36544
to
6aa4816
Compare
6aa4816
to
c0ae957
Compare
c0ae957
to
8746064
Compare
8746064
to
ae2e84e
Compare
ae2e84e
to
62946fa
Compare
This is a big one, and I invite the reviewer to interrogate any part of it (including, but not limited to, the data modeling approach). Thanks in advance to whomever takes this on! |
app/models/accession.rb
Outdated
@@ -0,0 +1,20 @@ | |||
# == Schema Information | |||
# | |||
# Table name: accessions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accession as the Class name for this feels not quite right to me. It feels too generic and by default I feel like I'd assume it was related to something with this App itself rather than an external system this app interacts with.
I know it's somewhat clunkier, but what are your thoughts on something like ArchivematicaAccession
or ArchivalAccession
to better reflect that the Accession
here is external in nature?
(this is just a feeling, I suspect I can be convinced this name is okay)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd lean ArchivematicaAccession
. Normally I hate naming things after a product, but we'll likely have bigger things to worry about than a model name if we change our preservation system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We sort of have mixed usage in this area. SubmissionInformationPackage
doesn't indicate an external system, but DspaceMetadata
does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now let's not touch anything that exists outside of these changes but go with ArchivematicaAccession
for this new model if that feels okay-ish to you.
I'm not entirely sure how much of our SIP is actually Archivematica specific and how much follows something like an OAIS SIP format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good point. The SIPs are intended for use in Archivematica, but they could be used in any digital preservation system. (I noticed we do have ArchivematicaMetadata, so the proposed renaming is internally consistent in that regard.)
updated_at | ||
].freeze | ||
|
||
# FORM_ATTRIBUTES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is possible to add a note to the new
and edit
forms to remind the user that this does not actually "mint" a new Archivematica Accession and this is just where they "enter" the Accession number they minted in Archivematica? I know the current staff understand that but I'm nervous that something like this could get messy in the future if we have staff turnover and they just make stuff up in this without understanding where they are supposed to get the value for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would require custom new
and edit
views. Is that alright with you? I'm not opposed to it, but I want to make sure it's worth the additional debt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm waffly on it. I'd prefer a future user didn't just add nonsense here and had context as to where the number should be generated before entering it... but also yeah I'm not sure it's worth it. I guess we could ask Mikki/Wilder if they are maintaining internal documentation someplace that explains how to actually do the Thesis Processor job that would include something like that naturally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think for now let's not fork the templates and remind/inform stakeholders they need to make sure documentation exists and is accurate someplace. If they ask us to add it to those forms, we can fork though.
.github/PULL_REQUEST_TEMPLATE.md
Outdated
@@ -6,14 +6,15 @@ | |||
[our guide](https://mitlibraries.github.io/guides/basics/a11y.html) and | |||
all issues introduced by these changes have been resolved or opened as new | |||
issues (link to those issues in the Pull Request details above) | |||
- [ ] ERD has been updated (if applicable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we can add this gem even if we haven't automated/semi-automated building the diagrams. However, I think adding these two changes to the PR template will cause confusion as there currently isn't anything to actually do with the ERD for either the reviewer or the developer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I'll take it out for now (and hopefully the gem will work soon).
app/models/accession.rb
Outdated
validates :accession_number, presence: true, | ||
format: { | ||
with: /\A(19|20)\d{2}_\d{3}\z/i, | ||
message: 'must match the format `YYYY_seq`, where `YYYY` is a year between 1900 and ' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should double check with Joe that this validation is correct. From my understanding it is, but it would be good to specifically ask him to confirm that 2023_1000
or 2023_0001
are actually impossible or just not their current practice. We can validate on whatever they want, but I curious if an accession number of whatever
is technically valid in Archivematica so we might want to note someplace that our validations are based on current practices and not technical requirements (I'm not sure where we'd even note that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Joe says that the number will always follow that format. He didn't indicate whether another format was technically possible; I'm not sure my question was clear enough to get to that.
From my perspective, though, I don't think that ETD should concern itself with what is possible in Archivematica, only what stakeholders' current needs/preferences are. Could you say more about why it's important to document that? (Fwiw, if we do, I'd put it in the readme.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'd look for that kind of thing in a readme so maybe we don't document it at this time. My inclination for wanting to document that this may be an arbitrary validation would be so when it inevitably starts not accepting "valid" stakeholder supplied accession numbers we can remember the reason we did this was not for some inherently important reason but because we are trying to match an external requirement. i.e. we don't care what they put here, they do. I suspect we might get confused later on but I suppose because our code links to tickets that shouldn't be too hard for us to track this back to the original requirements coming from stakeholders and not us. tldr; all good nothing to change here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the Jira discussion, I'd say that renaming the model to ArchivematicaAccession
and including the context you provided will help diffuse any future confusion. ("Why did we make this number like this? Oh right, it's for Archivematica.")
My hope is that, even if we can't find the ticket, stakeholders would be able to help us understand why things were the way they were and/or why they have changed. As a former digital preservationist, I know that painstaking documentation is 90% of the job. :)
@JPrevost I think those last two commits address everything, but let me know if I missed something. Thanks for the in-depth review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be cool to only show Degrees that don't already have an AccessionNumber in the form that allows creating a new AccessionNumber dropdown but I didn't see an easy way to do that so we can add that to our mental "reasons we shouldn't use an automated dashboard for actual staff workflows" list.
Why these changes are being introduced: In order to automate SIP processing in Archivematica, incoming SIPs must have an accession number in their S3 keys. We can facilitate this on the ETD side by letting thesis processors create accession numbers and assign them to degree periods, which in turn will enable a thesis to look up its accession number. This PR makes some assumptions: * Each degree period has one and only one accession number. * Each accession number is in the format `YYYY_ddd`, where `YYYY` is a four-digit year and `ddd` is a three-digit sequence number. * A thesis will always have the same accession number (i.e., a thesis' grad date will never change, or if it does, the accession number will remain the same). These assumptions are based on our current understanding of the requirements. However, we expect that edge cases will arise and and can make changes retroactively if this approach does not meet the need. The exception would be looking up an historical accession number if a thesis' accession number were to change, since we are not recording that state on the Thesis model. Relevant ticket(s): https://mitlibraries.atlassian.net/browse/ETD-589 How this addresses that need: This adds ArchivematicaAccession and Degree Period models and corresponding administrate dashboards. The degree periods table is autopopulated on creation based on the grad date data in the thesis table. Processors can use the dashboards to create archivematica accessions and assign them to degree periods, or create degree periods to prepare for future thesis submissions. Note that assigning accession numbers to theses/SIPs is explicitly out of scope for this ticket. This will likely be a convenience method on the Thesis model that looks up the accession number as needed, which will also be added as a publication readiness check. Side effects of this change: * As noted above, this approach does not allows us to look up a past accession number for a thesis. If there's any possibility that a thesis' accession number will change over time, then we should consider adding state to the Thesis or SIP models to facilitate this. * It may be slightly confusing to introduce a Degree Period model that only interacts with the ArchivematicaAccession model. This is intended as a course correction for not having a Degree Period model when we began developing this application. Degree periods are central to the ETD data model and should likely be first class citizens. Longer term, the grad date logic on the Thesis and Transfer models should be refactored to reference the Degree Period model. * Optimally, processors would be able to create and edit this data in a single custom UI rather than in two administrate dashboards. However, processors already use administrate to do other things. A potential future refactor would be to ensure that processors can do everything they need outside of administrate, so the dashboards are only for developer use. co-authored-by: Jeremy Prevost <JPrevost@users.noreply.github.com>
This was intended only for Accession and Degree Period, but it caught a few others, so I'm including it as a separate commit.
a7aa9db
to
f0b4a98
Compare
Why these changes are being introduced: EngX has been discussing ways to add more visual documentation to our applications. The Rails Mermaid ERD gem is one way to achieve that. Relevant ticket(s): N/A How this addresses that need: This adds the Rails Mermaid ERD gem. However, it does not include the actual ERD markdown, because it doesn't work just yet (see side effects). Side effects of this change: GitHub cannot currently render rich display Mermaid diagrams. Apparently, this has been fixed upstream, but GitHub is a few versions behind. Once this is fixed in GitHub, we should generate the diagram, include it in the readme, and update the PR template.
615d036
to
d647acd
Compare
Why these changes are being introduced:
In order to automate SIP processing in Archivematica, incoming
SIPs must have an accession number in their S3 keys. We can
facilitate this on the ETD side by letting thesis processors
create accession numbers and assign them to degree periods, which
in turn will enable a thesis to look up its accession number.
This PR makes some assumptions:
YYYY_ddd
, whereYYYY
is a four-digit year and
ddd
is a three-digit sequence number.thesis' grad date will never change, or if it does, the accession
number will remain the same).
These assumptions are based on our current understanding of the
requirements. However, we expect that edge cases will arise and
and can make changes retroactively if this approach does not meet
the need. The exception would be looking up an historical
accession number if a thesis' accession number were to change,
since we are not recording that state on the Thesis model.
Relevant ticket(s):
https://mitlibraries.atlassian.net/browse/ETD-589
How this addresses that need:
This adds Accession and Degree Period models and corresponding
administrate dashboards. The degree periods table is autopopulated
on creation based on the grad date data in the thesis table.
Processors can use the dashboards to create accessions and assign
them to degree periods, or create degree periods to prepare for
future thesis submissions.
Note that assigning accession numbers to theses/SIPs is explicitly
out of scope for this ticket. This will likely be a convenience
method on the Thesis model that looks up the accession number as
needed, which will also be added as a publication readiness check.
Side effects of this change:
past accession number for a thesis. If there's any possibility that
a thesis' accession number will change over time, then we should
consider adding state to the Thesis or SIP models to facilitate this.
that only interacts with the Accession model. This is intended as
a course correction for not having a Degree Period model when we
began developing this application. Degree periods are central to
the ETD data model and should likely be first class citizens.
Longer term, the grad date logic on the Thesis and Transfer models
should be refactored to reference the Degree Period model.
in a single custom UI rather than in two administrate dashboards.
However, processors already use administrate to do other things. A
potential future refactor would be to ensure that processors can
do everything they need outside of administrate, so the dashboards
are only for developer use.
Developer
our guide and
all issues introduced by these changes have been resolved or opened as new
issues (link to those issues in the Pull Request details above)
Code Reviewer
(not just this pull request message)
Requires database migrations?
YES
Includes new or updated dependencies?
YES