Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow thesis processors to assign accession numbers to degree periods #1214

Merged
merged 3 commits into from
Sep 7, 2023

Conversation

jazairi
Copy link
Contributor

@jazairi jazairi commented Aug 28, 2023

Why these changes are being introduced:

In order to automate SIP processing in Archivematica, incoming
SIPs must have an accession number in their S3 keys. We can
facilitate this on the ETD side by letting thesis processors
create accession numbers and assign them to degree periods, which
in turn will enable a thesis to look up its accession number.

This PR makes some assumptions:

  • Each degree period has one and only one accession number.
  • Each accession number is in the format YYYY_ddd, where YYYY
    is a four-digit year and ddd is a three-digit sequence number.
  • A thesis will always have the same accession number (i.e., a
    thesis' grad date will never change, or if it does, the accession
    number will remain the same).

These assumptions are based on our current understanding of the
requirements. However, we expect that edge cases will arise and
and can make changes retroactively if this approach does not meet
the need. The exception would be looking up an historical
accession number if a thesis' accession number were to change,
since we are not recording that state on the Thesis model.

Relevant ticket(s):

https://mitlibraries.atlassian.net/browse/ETD-589

How this addresses that need:

This adds Accession and Degree Period models and corresponding
administrate dashboards. The degree periods table is autopopulated
on creation based on the grad date data in the thesis table.

Processors can use the dashboards to create accessions and assign
them to degree periods, or create degree periods to prepare for
future thesis submissions.

Note that assigning accession numbers to theses/SIPs is explicitly
out of scope for this ticket. This will likely be a convenience
method on the Thesis model that looks up the accession number as
needed, which will also be added as a publication readiness check.

Side effects of this change:

  • As noted above, this approach does not allows us to look up a
    past accession number for a thesis. If there's any possibility that
    a thesis' accession number will change over time, then we should
    consider adding state to the Thesis or SIP models to facilitate this.
  • It may be slightly confusing to introduce a Degree Period model
    that only interacts with the Accession model. This is intended as
    a course correction for not having a Degree Period model when we
    began developing this application. Degree periods are central to
    the ETD data model and should likely be first class citizens.
    Longer term, the grad date logic on the Thesis and Transfer models
    should be refactored to reference the Degree Period model.
  • Optimally, processors would be able to create and edit this data
    in a single custom UI rather than in two administrate dashboards.
    However, processors already use administrate to do other things. A
    potential future refactor would be to ensure that processors can
    do everything they need outside of administrate, so the dashboards
    are only for developer use.
  • The Rails Mermaid ERD gem has been added, but I have not included the generated markdown because GitHub has a problem rendering it that seems to be related to the version of Mermaid it's using. (See that commit message for more details.)

Developer

  • All new ENV is documented in README
  • All new ENV has been added to Heroku Pipeline, Staging and Prod
  • ANDI or Wave has been run in accordance to
    our guide and
    all issues introduced by these changes have been resolved or opened as new
    issues (link to those issues in the Pull Request details above)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer

  • The commit message is clear and follows our guidelines
    (not just this pull request message)
  • There are appropriate tests covering any new functionality
  • The documentation has been updated or is unnecessary
  • The changes have been verified
  • New dependencies are appropriate or there were no changes

Requires database migrations?

YES

Includes new or updated dependencies?

YES

@mitlib mitlib temporarily deployed to thesis-submit-pr-1214 August 28, 2023 14:26 Inactive
@jazairi jazairi force-pushed the etd-589-accession-model branch from 68d847d to 2cac042 Compare August 28, 2023 14:33
@jazairi jazairi temporarily deployed to thesis-submit-pr-1214 August 28, 2023 14:34 Inactive
@jazairi jazairi force-pushed the etd-589-accession-model branch from 2cac042 to c49ead3 Compare August 28, 2023 14:35
@jazairi jazairi temporarily deployed to thesis-submit-pr-1214 August 28, 2023 14:35 Inactive
@jazairi jazairi force-pushed the etd-589-accession-model branch from c49ead3 to 7d36544 Compare August 28, 2023 14:37
@jazairi jazairi had a problem deploying to thesis-submit-pr-1214 August 28, 2023 14:37 Failure
@coveralls
Copy link

coveralls commented Aug 28, 2023

Coverage Status

coverage: 98.344% (+0.03%) from 98.313% when pulling a7aa9db on etd-589-accession-model into ac9a23d on main.

@jazairi jazairi force-pushed the etd-589-accession-model branch from 7d36544 to 6aa4816 Compare August 28, 2023 14:47
@jazairi jazairi had a problem deploying to thesis-submit-pr-1214 August 28, 2023 14:47 Failure
@jazairi jazairi force-pushed the etd-589-accession-model branch from 6aa4816 to c0ae957 Compare August 28, 2023 16:18
@jazairi jazairi had a problem deploying to thesis-submit-pr-1214 August 28, 2023 16:18 Failure
@jazairi jazairi force-pushed the etd-589-accession-model branch from c0ae957 to 8746064 Compare August 28, 2023 16:23
@jazairi jazairi had a problem deploying to thesis-submit-pr-1214 August 28, 2023 16:23 Failure
@jazairi jazairi force-pushed the etd-589-accession-model branch from 8746064 to ae2e84e Compare August 28, 2023 16:28
@jazairi jazairi temporarily deployed to thesis-submit-pr-1214 August 28, 2023 16:28 Inactive
@jazairi jazairi force-pushed the etd-589-accession-model branch from ae2e84e to 62946fa Compare August 28, 2023 16:54
@jazairi jazairi temporarily deployed to thesis-submit-pr-1214 August 28, 2023 16:54 Inactive
@jazairi jazairi marked this pull request as ready for review August 28, 2023 16:57
@jazairi
Copy link
Contributor Author

jazairi commented Aug 28, 2023

This is a big one, and I invite the reviewer to interrogate any part of it (including, but not limited to, the data modeling approach). Thanks in advance to whomever takes this on!

@JPrevost JPrevost self-assigned this Aug 28, 2023
@JPrevost JPrevost temporarily deployed to thesis-submit-pr-1214 August 29, 2023 13:35 Inactive
app/models/accession.rb Outdated Show resolved Hide resolved
app/models/degree_period.rb Show resolved Hide resolved
@@ -0,0 +1,20 @@
# == Schema Information
#
# Table name: accessions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accession as the Class name for this feels not quite right to me. It feels too generic and by default I feel like I'd assume it was related to something with this App itself rather than an external system this app interacts with.

I know it's somewhat clunkier, but what are your thoughts on something like ArchivematicaAccession or ArchivalAccession to better reflect that the Accession here is external in nature?

(this is just a feeling, I suspect I can be convinced this name is okay)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd lean ArchivematicaAccession. Normally I hate naming things after a product, but we'll likely have bigger things to worry about than a model name if we change our preservation system.

Copy link
Contributor Author

@jazairi jazairi Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We sort of have mixed usage in this area. SubmissionInformationPackage doesn't indicate an external system, but DspaceMetadata does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now let's not touch anything that exists outside of these changes but go with ArchivematicaAccession for this new model if that feels okay-ish to you.

I'm not entirely sure how much of our SIP is actually Archivematica specific and how much follows something like an OAIS SIP format.

Copy link
Contributor Author

@jazairi jazairi Aug 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point. The SIPs are intended for use in Archivematica, but they could be used in any digital preservation system. (I noticed we do have ArchivematicaMetadata, so the proposed renaming is internally consistent in that regard.)

updated_at
].freeze

# FORM_ATTRIBUTES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is possible to add a note to the new and edit forms to remind the user that this does not actually "mint" a new Archivematica Accession and this is just where they "enter" the Accession number they minted in Archivematica? I know the current staff understand that but I'm nervous that something like this could get messy in the future if we have staff turnover and they just make stuff up in this without understanding where they are supposed to get the value for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would require custom new and edit views. Is that alright with you? I'm not opposed to it, but I want to make sure it's worth the additional debt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm waffly on it. I'd prefer a future user didn't just add nonsense here and had context as to where the number should be generated before entering it... but also yeah I'm not sure it's worth it. I guess we could ask Mikki/Wilder if they are maintaining internal documentation someplace that explains how to actually do the Thesis Processor job that would include something like that naturally?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think for now let's not fork the templates and remind/inform stakeholders they need to make sure documentation exists and is accurate someplace. If they ask us to add it to those forms, we can fork though.

@@ -6,14 +6,15 @@
[our guide](https://mitlibraries.github.io/guides/basics/a11y.html) and
all issues introduced by these changes have been resolved or opened as new
issues (link to those issues in the Pull Request details above)
- [ ] ERD has been updated (if applicable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we can add this gem even if we haven't automated/semi-automated building the diagrams. However, I think adding these two changes to the PR template will cause confusion as there currently isn't anything to actually do with the ERD for either the reviewer or the developer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I'll take it out for now (and hopefully the gem will work soon).

test/models/accession_test.rb Outdated Show resolved Hide resolved
test/models/degree_period_test.rb Show resolved Hide resolved
app/models/degree_period.rb Show resolved Hide resolved
app/models/accession.rb Outdated Show resolved Hide resolved
validates :accession_number, presence: true,
format: {
with: /\A(19|20)\d{2}_\d{3}\z/i,
message: 'must match the format `YYYY_seq`, where `YYYY` is a year between 1900 and ' \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should double check with Joe that this validation is correct. From my understanding it is, but it would be good to specifically ask him to confirm that 2023_1000 or 2023_0001 are actually impossible or just not their current practice. We can validate on whatever they want, but I curious if an accession number of whatever is technically valid in Archivematica so we might want to note someplace that our validations are based on current practices and not technical requirements (I'm not sure where we'd even note that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joe says that the number will always follow that format. He didn't indicate whether another format was technically possible; I'm not sure my question was clear enough to get to that.

From my perspective, though, I don't think that ETD should concern itself with what is possible in Archivematica, only what stakeholders' current needs/preferences are. Could you say more about why it's important to document that? (Fwiw, if we do, I'd put it in the readme.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'd look for that kind of thing in a readme so maybe we don't document it at this time. My inclination for wanting to document that this may be an arbitrary validation would be so when it inevitably starts not accepting "valid" stakeholder supplied accession numbers we can remember the reason we did this was not for some inherently important reason but because we are trying to match an external requirement. i.e. we don't care what they put here, they do. I suspect we might get confused later on but I suppose because our code links to tickets that shouldn't be too hard for us to track this back to the original requirements coming from stakeholders and not us. tldr; all good nothing to change here :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the Jira discussion, I'd say that renaming the model to ArchivematicaAccession and including the context you provided will help diffuse any future confusion. ("Why did we make this number like this? Oh right, it's for Archivematica.")

My hope is that, even if we can't find the ticket, stakeholders would be able to help us understand why things were the way they were and/or why they have changed. As a former digital preservationist, I know that painstaking documentation is 90% of the job. :)

@jazairi jazairi requested a review from JPrevost August 30, 2023 19:35
@jazairi
Copy link
Contributor Author

jazairi commented Aug 30, 2023

@JPrevost I think those last two commits address everything, but let me know if I missed something. Thanks for the in-depth review!

Copy link
Member

@JPrevost JPrevost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cool to only show Degrees that don't already have an AccessionNumber in the form that allows creating a new AccessionNumber dropdown but I didn't see an easy way to do that so we can add that to our mental "reasons we shouldn't use an automated dashboard for actual staff workflows" list.

Why these changes are being introduced:

In order to automate SIP processing in Archivematica, incoming
SIPs must have an accession number in their S3 keys. We can
facilitate this on the ETD side by letting thesis processors
create accession numbers and assign them to degree periods, which
in turn will enable a thesis to look up its accession number.

This PR makes some assumptions:
* Each degree period has one and only one accession number.
* Each accession number is in the format `YYYY_ddd`, where `YYYY`
is a four-digit year and `ddd` is a three-digit sequence number.
* A thesis will always have the same accession number (i.e., a
thesis' grad date will never change, or if it does, the accession
number will remain the same).

These assumptions are based on our current understanding of the
requirements. However, we expect that edge cases will arise and
and can make changes retroactively if this approach does not meet
the need. The exception would be looking up an historical
accession number if a thesis' accession number were to change,
since we are not recording that state on the Thesis model.

Relevant ticket(s):

https://mitlibraries.atlassian.net/browse/ETD-589

How this addresses that need:

This adds ArchivematicaAccession and Degree Period models and
corresponding administrate dashboards. The degree periods table
is autopopulated on creation based on the grad date data in the
thesis table.

Processors can use the dashboards to create archivematica
accessions and assign them to degree periods, or create degree
periods to prepare for future thesis submissions.

Note that assigning accession numbers to theses/SIPs is explicitly
out of scope for this ticket. This will likely be a convenience
method on the Thesis model that looks up the accession number as
needed, which will also be added as a publication readiness check.

Side effects of this change:

* As noted above, this approach does not allows us to look up a
past accession number for a thesis. If there's any possibility that
a thesis' accession number will change over time, then we should
consider adding state to the Thesis or SIP models to facilitate this.
* It may be slightly confusing to introduce a Degree Period model
that only interacts with the ArchivematicaAccession model. This is
intended as a course correction for not having a Degree Period model
when we began developing this application. Degree periods are central to
the ETD data model and should likely be first class citizens.
Longer term, the grad date logic on the Thesis and Transfer models
should be refactored to reference the Degree Period model.
* Optimally, processors would be able to create and edit this data
in a single custom UI rather than in two administrate dashboards.
However, processors already use administrate to do other things. A
potential future refactor would be to ensure that processors can
do everything they need outside of administrate, so the dashboards
are only for developer use.

co-authored-by: Jeremy Prevost <JPrevost@users.noreply.github.com>
This was intended only for Accession and Degree Period, but it
caught a few others, so I'm including it as a separate commit.
@jazairi jazairi force-pushed the etd-589-accession-model branch from a7aa9db to f0b4a98 Compare September 7, 2023 13:54
Why these changes are being introduced:

EngX has been discussing ways to add more visual documentation
to our applications. The Rails Mermaid ERD gem is one way to
achieve that.

Relevant ticket(s):

N/A

How this addresses that need:

This adds the Rails Mermaid ERD gem. However, it does not include the
actual ERD markdown, because it doesn't work just yet (see side effects).

Side effects of this change:

GitHub cannot currently render rich display Mermaid diagrams. Apparently,
this has been fixed upstream, but GitHub is a few versions behind. Once this
is fixed in GitHub, we should generate the diagram, include it in the readme,
and update the PR template.
@jazairi jazairi force-pushed the etd-589-accession-model branch from 615d036 to d647acd Compare September 7, 2023 14:02
@jazairi jazairi merged commit 4b03d42 into main Sep 7, 2023
1 check passed
@jazairi jazairi deleted the etd-589-accession-model branch September 7, 2023 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants