Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define how to expose version through the collection metadata API #1028

Merged
merged 12 commits into from
Dec 11, 2023

Conversation

Ndpnt
Copy link
Member

@Ndpnt Ndpnt commented Oct 25, 2023

This PR is an experiment to create RFC that will lead to a decision record delivery.

The idea is to add comments right onto the document, much like you would with code. I will update the document for each consensus reached.

The deadline for this RFC is 27/11 end of day AoE (Anywhere on Earth).

@MattiSG MattiSG added the RFC Request for comments label Oct 25, 2023
Copy link
Member

@MattiSG MattiSG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several stylistic changes and complements.

decision-records/0004-data-api.md Outdated Show resolved Hide resolved
decision-records/0004-data-api.md Outdated Show resolved Hide resolved
decision-records/0004-data-api.md Outdated Show resolved Hide resolved
decision-records/0004-data-api.md Outdated Show resolved Hide resolved
decision-records/0004-data-api.md Outdated Show resolved Hide resolved
decision-records/0004-data-api.md Show resolved Hide resolved
decision-records/0004-data-api.md Outdated Show resolved Hide resolved
decision-records/0004-data-api.md Outdated Show resolved Hide resolved
decision-records/0004-data-api.md Outdated Show resolved Hide resolved
Comment on lines 102 to 111
"content": "
Privacy Policy
==============

Last updated: September 26, 2023

This privacy policy explains how, why and under what conditions the Open Terms Archive (OTA) site collects personal information and how it is used. Our privacy policy will change over time. And of course, we also record the changes of [our own documents](https://github.com/OpenTermsArchive/demo-versions/tree/main/Open%20Terms%20Archive) 😉

"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This multi-line example is invalid. We would need to transform newlines to \n. The example should also provide an example with JSON data, to demonstrate how it would be encoded.

@Ndpnt
Copy link
Member Author

Ndpnt commented Dec 5, 2023

Solutions A and C are interesting, but I find that while it's still possible to pass metadata via HTTP headers, this approach is less common and less discoverable than solution B.
Furthermore, in the case of solution C, the need to make a new request due to a redirect for each call does not seem desirable to me. Especially as the caching and transmission of metadata can be achieved in another way and are also doable in solution B.

So, I vote for the solution B as it aligns with the common and anticipated conventions of API responses. It provides a straightforward and highly extensible method for passing metadata in a standardized manner.

@MattiSG @clementbiron @madoleary @michielbdejong What do you think ?

Note: The deadline elapsed a week ago, and I am starting the implementation this afternoon. The decision on the chosen solution will be confirmed tomorrow morning on the basis of the votes taken between now and then.

Ndpnt and others added 3 commits December 5, 2023 11:42
Co-authored-by: Matti Schneider <matti@opentermsarchive.org>
@MattiSG
Copy link
Member

MattiSG commented Dec 5, 2023

Thanks @Ndpnt!

Reading through solutions A, B and C, I came up with solution D that aims at keeping the best parts of each:

  • A for ease of content access
  • B for metadata extensibility
  • C for performance capabilities

It has the added benefit of being possibly split in implementation in two parts, which are made clear in the proposal. Indeed, I feel that we have two orthogonal choices to make rather than 3 full-fledged solutions:

  1. Do we redirect to leverage caching?
  2. Do we serve JSON or Markdown?

…and that choice 2 can be delayed: for sure, there is relevance in JSON extensibility. Let's start with implementing that one and, while we're at it, defining in this RFC how to implement the Markdown version, without committing to implementing it at this stage as we first collect feedback from the JSOn version 🙂

I thus vote for solution D, with only implementation of the JSON version of the endpoint!

@Ndpnt
Copy link
Member Author

Ndpnt commented Dec 5, 2023

Thanks @MattiSG for this relevant contribution!

I really like the idea of mixing solutions to keep the best parts of each 👍. At this stage I still find that using redirections to leverage caching is a premature optimization. And I prefer to wait until the need arises before choosing how to manage the cache. In addition, this caching optimisation also has the cost of introducing an extra round trip between client and server, which I still don't find desirable.

So for now I'm in favor of the solution D without the redirection mechanism. It means where solution D returns HTTP 302, I suggest to returns HTTP 200 with the version (either in markdown format or in JSON format according to the Content-Type). I added it in the document in solution E.

I thus vote for solution E (and still with only implementation of the JSON version).

@clementbiron
Copy link
Member

I'm in favour of solution E, which I feel is sufficient for the uses identified.

Well done Nico for your work!

@madoleary
Copy link

madoleary commented Dec 5, 2023

I, too, am in favor of solution E. I think the implementation is solid, particularly returning HTTP 200 with the version, and I agree that optimization can come later.

For posterity, I'd like to recap a conversation that happened outside of GitHub regarding this RFC and certain questions of mine.

Generally, I was confused by the term "version" and the function of the date. I also didn't understand the difference between a version and a dataset.

@MattiSG explained to me that a “version” is indeed “the content" of the term text, in Markdown format, and that a dataset is the consolidation of all those versions.

Here is an example of a version: https://github.com/OpenTermsArchive/contrib-versions/blob/46d10a6c2ef50a11eaa39d4fa5ae3913e9258831/Atlassian/Privacy%20Policy.md

Here is an example of a dataset: https://github.com/OpenTermsArchive/contrib-versions/releases/tag/dataset-contrib-2023-12-04

Regarding the date, I wondered whether it simply represents the day the terms were added or updated, and whether there could be multiple dates for the same terms type. I also questioned how handling the dates on the ToS;DR end would work, i.e., how we'd know which date to use for the API request.

Matti answered these questions in kind. There is an infinity of valid dates between their first declaration in Open Terms Archive and the current time. This enables clients such as ToS;DR to request any date and let the API retrieve the version (terms text) that was valid at that date. In our use case, it's likely that we'll always use the current date and time to get the latest version, but this is something we should verify.

Thank you!

@Ndpnt
Copy link
Member Author

Ndpnt commented Dec 6, 2023

Solution E received the most votes and will thus be selected.

Thanks for your participation :).

@Ndpnt Ndpnt requested a review from MattiSG December 6, 2023 08:38
| date | URL-encoded ISO 8601 datetime string | The date and time for which the version is requested |

Note about the date:
- A full date and time is required, and not a simple date (such as `2023-10-24`), to avoid ambiguities on days where a version changed, and timezone differences between client and server.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe pin the timezone to UTC? So "A full UTC data and time"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the timezone is defined as an offset from UTC (ex: 2023-12-06T12:34:56+02:00) is still ISO 8601 valid.
So, why forcing the timezone to be UTC?

| --------- | ------ | ---------------------- |
| serviceId | URL-encoded string | The ID of the service |
| termsType | URL-encoded string | The name of terms type |
| date | URL-encoded ISO 8601 datetime string | The date and time for which the version is requested |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And require that the timezone be UTC?

@michielbdejong
Copy link
Contributor

michielbdejong commented Dec 6, 2023

Ah, looks like I came to this discussion just a day after it finished :)
OK, cool! You can take my suggestion to require UTC as the only allowed timezone or not, we can also simplify our code on our side to always use UTC even if your API supports more.

I think one way we could use this API is by setting the date far into the future, to get the latest known version, right? For instance:

GET /version/Open%20Terms%20Archive/Privacy%20Policy/9999-01-01T00%3A00%3A00.000Z`

And that would be cacheable! :)

@Ndpnt
Copy link
Member Author

Ndpnt commented Dec 6, 2023

Ah, looks like I came to this discussion just a day after it finished :) OK, cool! You can take my suggestion to require UTC as the only allowed timezone or not, we can also simplify our code on our code to always use UTC even if your API supports more.

👍

I think one way we could use this API is by setting the date far into the future, to get the latest known version, right? For instance:

GET /version/Open%20Terms%20Archive/Privacy%20Policy/9999-01-201T00%3A00%3A00.000Z`

And that would be cacheable! :)

We could also use a more elegant alias (latest for example) which would also allow caching:

GET /version/Open%20Terms%20Archive/Privacy%20Policy/latest

@MattiSG
Copy link
Member

MattiSG commented Dec 8, 2023

I think one way we could use this API is by setting the date far into the future, to get the latest known version, right?

In the current proposal, this would not work and would reply with a 416 Range Not Satisfiable: the idea is that you're asking for the legal text that was applicable at the given date, and we can't guess what text will be applicable in the future.

I believe that this spontaneous usage, instead of using Date.now(), justifies further the relevance of a /latest endpoint.


- HTTP 200 with the Markdown content as body when a version is applicable at this date
- HTTP 404 Not Found with Markdown content `# Error\n\n_No version found for date <requested-date>_` for a date that is anterior to the first available version
- HTTP 400 Bad Request with JSON content `# Error\n\n_Requested date <requested-date> is in the future, no version can exist there_` if the date is in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an inconsistency with the Markdown / JSON content in this proposal here.

Suggested change
- HTTP 400 Bad Request with JSON content `# Error\n\n_Requested date <requested-date> is in the future, no version can exist there_` if the date is in the future.
- HTTP 400 Bad Request with Markdown content `# Error\n\n_Requested date <requested-date> is in the future, no version can exist there_` if the date is in the future.

decision-records/0004-data-api.md Outdated Show resolved Hide resolved
decision-records/0004-data-api.md Outdated Show resolved Hide resolved
decision-records/0004-data-api.md Outdated Show resolved Hide resolved
Co-authored-by: Matti Schneider <matti@opentermsarchive.org>
@Ndpnt Ndpnt requested a review from MattiSG December 11, 2023 10:34
Copy link
Member

@MattiSG MattiSG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks everyone! That works for me 🙂 👍

My only concern is the lack of a /latest route for implementers, but since we introduced a 416 for far-future dates, the /9999 trick suggested will not work and will thus enforce the use of Date.now. This might not be comfortable but is at least forward-compatible, and we could thus introduce the /latest route as an improvement after shipping the first version.

@Ndpnt Ndpnt merged commit 4c411db into main Dec 11, 2023
6 of 7 checks passed
@Ndpnt Ndpnt deleted the add-expose-data-in-api-decision-record branch December 11, 2023 10:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants