Implement server-side bookmarks #2843
Conversation
Force-pushed from 716bb0e to 474da36
@catileptic I’ve thought about the bulk endpoint: Right now, the default … Bulk creation is only required for a one-time migration of client-side bookmarks. Apart from that, the bulk endpoint will need to silently ignore entities that are already bookmarked or that do not exist/are not accessible to the user. It will need to support a user-provided … Maybe it’s better to have the default … Then, we could have a separate …
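For illustration, a rough sketch of that “silently ignore” behaviour — this is a hypothetical handler, not the PR’s actual code: `save_bookmark` is an invented stand-in, the `get_index_entity` import path and `request.authz.id` are assumptions.

```python
from flask import request
from werkzeug.exceptions import Forbidden, NotFound

from aleph.views.util import get_index_entity  # assumed import path


def save_bookmark(role_id, entity):
    # Invented stand-in for the actual persistence logic; assume it is an
    # upsert, so already-bookmarked entities are silently ignored too.
    raise NotImplementedError


def create_bookmarks_bulk(data):
    created, skipped = [], []
    for bookmark in data:
        entity_id = bookmark.get("entity_id")
        try:
            entity = get_index_entity(entity_id, request.authz.READ)
        except (NotFound, Forbidden):
            # Silently skip missing/inaccessible entities instead of
            # failing the whole migration request.
            skipped.append(entity_id)
            continue
        save_bookmark(request.authz.id, entity)  # authz.id: role id (assumption)
        created.append(entity_id)
    return {"created": created, "skipped": skipped}
```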
Force-pushed from 69bc551 to fbcaad2
for bookmark in data:
    try:
        entity = get_index_entity(bookmark.get("entity_id"))
Could instead use `entities_by_ids` from `aleph.index.entities` to get all entities in a single ES request, but would need to manually check permissions.
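A hedged sketch of that alternative, assuming `entities_by_ids` accepts a list of IDs and yields entity documents carrying a `collection_id`, and that `Authz.can` is the right manual check:

```python
from aleph.index.entities import entities_by_ids


def accessible_entities(data, authz):
    """Fetch all requested entities in one ES round trip and drop the
    ones the current user is not allowed to read."""
    entity_ids = [bookmark.get("entity_id") for bookmark in data]
    entities = {}
    for entity in entities_by_ids(entity_ids):
        # Manual permission check that get_index_entity would otherwise
        # perform for us, one request at a time.
        if authz.can(entity.get("collection_id"), authz.READ):
            entities[entity.get("id")] = entity
    return entities
```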
Is this following similar conventions in other parts of the codebase? What I'm saying is: if other areas are manually checking the permissions, it might be worth following the convention; if not, that's a good indicator that perhaps this is the way to go.
Yes and no. For singular resources, we do rely on these helpers to check permissions and throw appropriate errors in case of missing permissions etc.
For lists of many resources, we usually ensure that only resources the current user can access are returned at the query level, and we have another last-resort check when serializing resources in API responses.
This case is a little different though -- we’re not processing data from the database, but from a user. And we can’t rely on the serializer-level check because we actually need to know about the issues in order to show a message to the user.
The current solution is the least complex, but I’m not sure if it might lead to issues when migrating huge lists of bookmarks due to the O(n) ES requests. So far it hasn’t been a problem locally, and I’ll make sure to test it on staging once it’s out there.
I would try this approach, which seems easier to follow, and only change it if we see issues with it.
Force-pushed from fbcaad2 to a890ead
@@ -165,6 +165,7 @@ def prune_entity(collection, entity_id=None, job_id=None):
     if doc is not None:
         doc.delete()
     EntitySetItem.delete_by_entity(entity_id)
+    Bookmark.delete_by_entity(entity_id)
Currently, this isn’t covered by tests. I was trying to find existing tests for this method without success so far, and I guess it doesn’t make a lot of sense to test prune_entity in isolation? Does anyone know whether this is already covered as part of a bigger integration-style test somewhere?
Indeed, I'm not seeing any test coverage for this. And I agree it should go into an integration test. Shall we leave a TODO here?
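If we do add coverage, it could be an integration-style test along these lines — a sketch only; the fixtures and import paths are assumptions, not existing test scaffolding:

```python
from aleph.core import db  # assumed import path
from aleph.logic.entities import prune_entity  # assumed import path
from aleph.model import Bookmark  # assumed import path


def test_prune_entity_deletes_bookmarks(collection, entity, role):
    # Assumed fixtures: a collection, an indexed entity, and a user role.
    bookmark = Bookmark(
        role_id=role.id,
        entity_id=entity.id,
        collection_id=collection.id,
    )
    db.session.add(bookmark)
    db.session.commit()

    prune_entity(collection, entity_id=entity.id)

    # The new Bookmark.delete_by_entity call should have removed the row.
    assert Bookmark.query.filter_by(entity_id=entity.id).count() == 0
```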
-db.create_all()
+flask_migrate.upgrade()
Previously, we created the DB schema based on the model classes. I know that some people use Alembic in a way where the model classes are the only source of truth and contain all information required to generate the schema, and then auto-generate Alembic revisions based on the model classes. We don’t seem to do that though (at least not consistently), as many indexes etc. are only defined in revision files, so we ended up with inconsistent DB schemata in test and dev/prod environments.
Not sure if there is any reason not to use the actual migrations (except for maybe test performance and historical reasons)?
I tested whether this has an effect on how long it takes to execute tests but couldn’t find a significant difference.
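For context, the change boils down to something like this in the test setup — a simplified sketch; the `db` import path and the surrounding fixture code are assumptions:

```python
import flask_migrate

from aleph.core import db  # assumed import path


def create_test_schema(app):
    with app.app_context():
        # Previously: db.create_all() built the schema from the model
        # classes and silently skipped anything (indexes etc.) that is
        # only defined in Alembic revision files.
        # Now: apply the actual migrations, so tests run against the
        # same schema as dev/prod.
        flask_migrate.upgrade()
```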
Force-pushed from 36a6865 to fbf4c27
Force-pushed from 298fb4c to f5ac9d5
@@ -176,5 +176,6 @@ def migrate():
         index_elements=[Bookmark.role_id, Bookmark.entity_id],
     )
     db.session.execute(stmt)
+    db.session.commit()
I accidentally removed this line in a previous commit or rebase, I think. I was a little surprised the existing test didn’t catch this: when running this in development, if the session is not committed, no bookmarks are created (expected). However, in a test environment, asserting that bookmarks exist does not fail.
Pretty sure the cause is related to SQLAlchemy’s session handling, autocommit/transaction-isolation config, or something like that, but I wasn’t able to isolate it yet.
Hm, great point, and I see no immediate answer to this. Anything wrapped in a session should be running in a transaction, so I would expect this to require a commit. But perhaps the default isolation level is autocommit? 🤔
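One plausible explanation (an assumption, not verified against Aleph’s test config): the assertions run through the same session that holds the pending rows, and SQLAlchemy autoflushes pending objects before queries, so reads inside the open transaction see the rows even without a commit. A minimal standalone sketch:

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Bookmark(Base):
    __tablename__ = "bookmark"
    id = Column(Integer, primary_key=True)
    entity_id = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Bookmark(entity_id="abc"))
    # No commit! The SELECT triggers an autoflush, so the pending row is
    # written into the open transaction and the assertion passes anyway.
    row = session.scalar(select(Bookmark).where(Bookmark.entity_id == "abc"))
    assert row is not None
# Leaving the block without committing rolls the row back, which is why
# the missing commit() is still a real bug outside the test session.
```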
for bookmark in data:
    entity_id = bookmark.get("entity_id")
    try:
        entity = get_index_entity(entity_id, request.authz.READ)
    except (NotFound, Forbidden):
        raise BadRequest(
            "Could not bookmark the given entity as the entity does not exist or you do not have access."
Perhaps: "We're sorry but we are unable to bookmark this entity for you" - But if this happens are we logging/tracking it?
Looks good to me, Till! Outside of the mysterious commit question, which I can't quite answer, everything seems good to go! 👏
Force-pushed from d11036c to ff6a27d
* Implement basic CRUD operations for bookmarks
* Delete bookmarks when entity is deleted
* Run Alembic migrations in test environment

  Before, we created the database schema based on model classes which meant we'd end up with a slightly different schema in test and dev/prod environments.

* Add endpoint to migrate bookmarks from client-side storage
* Return whether entity is bookmarked in entity API response
* Remove warning popover when using bookmarks feature for the first time
* Load bookmarks from API
* Automatically migrate local bookmarks
* Extend data fetching logic to support partial invalidations

  As a reminder to my future self: In Aleph’s frontend, we use our own mini data fetching framework built on top of Redux. One thing it does is caching data from API responses. For example, when a user views their bookmarks, does something else and then views the bookmarks again, the bookmarks are only fetched once. When viewing the bookmarks the second time, we render them based on a runtime cache. This can lead to outdated data being displayed. For example, when the user creates a new bookmark *after* the bookmarks have been loaded, the list of bookmarks would be outdated.

  Our mini framework does handle data invalidation, but only globally, for all cached data. That works ok in most cases, but for bookmarks, it leads to a bad UX. When you view an entity, then click on the bookmarks button, it would cause the entire page (all the data about the entity) to reload, even though none of that data has changed. The only thing that has changed is the list of bookmarks.

  We handle data invalidation by storing the timestamp when a data object was loaded and the timestamp of the last mutation. Whenever we render cached data, we check whether the cached data might be outdated (i.e. when it has been loaded before the latest mutation). Until now, we only stored one global mutation timestamp. Whenever that timestamp was updated, all cached data became outdated. Now, in addition to the global mutation timestamp, we have an option to store mutation timestamps for specific subsets of the cached data. So when creating or deleting a bookmark, instead of updating the global mutation timestamp (which would invalidate all cached data), we can update the timestamp for the `bookmarks` mutation key. This would invalidate only cached bookmarks, but no other data. (A sketch of this rule follows after this list.)

* Actually commit ORM session to execute queries
* Update wording
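To make the invalidation rule concrete, here is a sketch written in Python for brevity — the real implementation lives in the React/Redux frontend, and all names here are invented:

```python
import time

global_mutation_ts = 0.0
mutation_ts_by_key = {}  # e.g. {"bookmarks": 1700000000.0}


def mark_mutation(key=None):
    """Record a mutation, either globally or for a specific mutation key."""
    global global_mutation_ts
    ts = time.time()
    if key is None:
        global_mutation_ts = ts  # invalidates *all* cached data
    else:
        mutation_ts_by_key[key] = ts  # invalidates only data tagged with key


def is_stale(loaded_at, key=None):
    """Cached data is stale if it was loaded before the global mutation
    timestamp or before the mutation timestamp of its own key."""
    key_ts = mutation_ts_by_key.get(key, 0.0) if key is not None else 0.0
    return loaded_at < max(global_mutation_ts, key_ts)
```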
The current experimental bookmarks feature in Aleph stores bookmarks in the browser’s local storage. This PR extends the feature to store bookmarks on the server, preventing a few common issues (for example, users losing their bookmarks after clearing their browsing data). Closes #2831.
Specifically, there are three primary changes:

- **API**: A new `/bookmarks` API endpoint to retrieve, create, and delete bookmarks.
- **Database**: A new `bookmark` table in Postgres. This table stores the role ID, entity ID, and (to support efficient querying based on user permissions) the collection ID. (A sketch of such a model follows this list.)
- **UI**: … the `Query` class, and merging it with entity data from the ElasticSearch index using the `Serializer` class. So it’s actually quite simple.
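A hedged sketch of what that table might look like as a SQLAlchemy model — only `role_id`, `entity_id`, and `collection_id` are named in this PR; the column types, the `created_at` column, and the unique constraint are inferred from the sorting requirement and the upsert’s `index_elements` seen elsewhere in this thread:

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, UniqueConstraint
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Bookmark(Base):
    __tablename__ = "bookmark"
    # Mirrors the upsert's index_elements: one bookmark per user and entity.
    __table_args__ = (UniqueConstraint("role_id", "entity_id"),)

    id = Column(Integer, primary_key=True)
    role_id = Column(Integer, nullable=False)        # the bookmarking user
    entity_id = Column(String, nullable=False)       # the bookmarked entity
    collection_id = Column(Integer, nullable=False)  # for permission filtering
    created_at = Column(DateTime, default=datetime.utcnow)  # for sorting
```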
Migration

How to test the migration

Testing the happy path: … the `develop` branch. …

Testing migration errors: … the `develop` branch. …

Backend:
- `POST /bookmarks` endpoint that handles both creating a single bookmark as well as bulk creation (see comment below).
- `GET /bookmarks`: Endpoint should return bookmarks sorted by creation date, in descending order.
- `POST /bookmarks`: Endpoint should validate `entity_id` when creating a new bookmark.
- `POST /bookmarks`: Creating a bookmark for an entity that is already bookmarked shouldn’t raise an exception, but silently ignore the request or return a semantic status code (see review comment below; a sketch follows this list).
- `GET /entities/:entity_id` to return whether the entity is bookmarked if the user is authenticated.
- Optional: Delete bookmarks if entity is deleted. (Check entity set implementation.) Update: Entity sets are very similar to bookmarks. I checked the current implementation and right now, if you delete an entity, corresponding entity set items aren’t deleted. They are obviously not returned from API responses and not displayed in the UI, but we keep the records in the database. I think it’s sensible to keep the bookmarks implementation similar for now and don’t think it’s a huge issue.
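The “silently ignore duplicates” item above can be implemented with the Postgres upsert visible in the migrate() diff earlier in the thread. A sketch — the function itself and the import paths are assumptions; the `on_conflict_do_nothing` call mirrors the diff:

```python
from sqlalchemy.dialects.postgresql import insert

from aleph.core import db  # assumed import path
from aleph.model import Bookmark  # assumed import path


def create_bookmark(role_id, entity_id, collection_id):
    stmt = insert(Bookmark).values(
        role_id=role_id,
        entity_id=entity_id,
        collection_id=collection_id,
    ).on_conflict_do_nothing(
        # Same conflict target as the migration endpoint's bulk upsert.
        index_elements=[Bookmark.role_id, Bookmark.entity_id],
    )
    db.session.execute(stmt)
    db.session.commit()  # don't forget -- see the review thread above
```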
Frontend:

…

Other:

- … the `entity_id` column? This will also help us get a better understanding of Aleph’s architecture.