Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change tracking #4768

Closed
wants to merge 22 commits into from
Closed

Change tracking #4768

wants to merge 22 commits into from

Conversation

kravets-levko
Copy link
Collaborator

@kravets-levko kravets-levko commented Apr 1, 2020

What type of PR is this? (check all applicable)

  • Refactor
  • Feature

Description

  • Improve existing change tracking mechanism:
    • choose which attributes to track
    • track creations and deletions (in addition to modifications)
    • track relations (visualizations for queries, widgets for dashboard)
      • when forking query, log creation of all new visualizations
      • track API key updates
  • Enable change tracking for models:
    • Query
      • Visualization
    • Dashboard
      • Widget
      • ApiKey
    • Alert
    • NotificationDestination
  • API to get recent changes
  • Migration to clear old data and update table structure
  • Update tests
    • for models
    • for API endpoints

Example of data stored in DB

image

Closes #3136.

@kravets-levko kravets-levko self-assigned this Apr 1, 2020
@kravets-levko
Copy link
Collaborator Author

@arikfr (cc @rauchy) I still need to update tests, but the feature code is ready and I'd like to know your opinion about the result/what could be improved/etc. Also, I have two more questions:

  1. when forking query, should it also log creation of all new visualizatons?
  2. should it track API key updates (for dashboards and queries)?

def track_changes(attributes, parent_attr=None):
attributes = set(attributes) - {"id", "created_at", "updated_at", "version"}

# monkey-patch class because inheritance will break SQLAlchemy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to create a decorator for change tracking because it looks nice when you use in when defining a model. Decorator needs to modify original model class: intercept __setattr__ and add a method to save changes. First I tried to inherit from original model class (class ChangeTracking(cls)) and it even worked. But in some cases SQLA was throwing errors - something like class is not mapped or like this (in particular that happened in tests). So eventually I checked how they implement own decorators in SQLA (e.g. sqlalchemy_utils.models.generic_repr) and realized that instead of using inheritance, they just monkey-patch the class (replace existing methods with own ones). I tried to do so - and it worked. OOP in Python is a bit weird; I guess when I inherited my class - it changed something in a mechanism SQLA uses to get model's metadata; when patching class I in fact keep the class itself untouched, and everything works

redash/models/changes.py Show resolved Hide resolved
redash/models/changes.py Outdated Show resolved Hide resolved

changes = Change(
object=self,
object_version=getattr(self, "version", None),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we have any use for version -- maybe we can skip it?

Copy link
Collaborator Author

@kravets-levko kravets-levko Apr 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Upd.: if needed - we can always log it as a regular field (in changes array)

user=changed_by,
change={
"object_type": self.__table__.name,
"object_id": self.id,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't object_type/object_id redundant? Or you have it here for cases when changes.object is the parent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you have it here for cases when changes.object is the parent?

Exactly. When we modify, say, visualization - we log it as a change to corresponding query; but then there is no other way to know that changes were made to visualization. I tried few different data structures, but this one looked the most straightforward

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comment in the code that describes a structure of changes record and why we need object_type/object_id in changes field

redash/models/changes.py Outdated Show resolved Hide resolved
redash/models/changes.py Show resolved Hide resolved
@@ -1,12 +1,20 @@
from sqlalchemy.inspection import inspect
from sqlalchemy_utils.models import generic_repr
from sqlalchemy.orm import object_session

from enum import Enum

from .base import GFKBase, db, Column
from .types import PseudoJSON


@generic_repr("id", "object_type", "object_id", "created_at")
class Change(GFKBase, db.Model):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add index(es) for this table to make it efficient to run a query like: give me last N changes for object type X and id Y.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I added only index for object_type, object_id pair. When I'll start working on API - I'll check if we need more indexes or not.

@arikfr
Copy link
Member

arikfr commented Apr 10, 2020

when forking query, should it also log creation of all new visualizatons?

Yes.

should it track API key updates (for dashboards and queries)?

Yes.

@kravets-levko
Copy link
Collaborator Author

Recent changes (since last code review):

  • when creating/forking query it now logs its visualizations as well;
  • stopped logging object_version, removed this column from the changes table; updated migration script;
  • added index for object_type + object_id fields;
  • when creating new instance - log all initial fields (not just modified);
  • log changes to API keys;
  • optimized a way we access "parent" object - now it will not load instance from DB (this also simplified API keys tracking).

@kravets-levko kravets-levko marked this pull request as ready for review April 20, 2020 16:27
@kravets-levko kravets-levko requested a review from arikfr April 20, 2020 16:58
@jhult
Copy link
Contributor

jhult commented Jan 31, 2021

This looks like a cool feature. Any news on it?

@guidopetri
Copy link
Contributor

@kravets-levko , thanks for the PR! We've updated a lot of things now that we're Community-driven so - if you're still interested (or @jhult ) in getting this merged - would you mind rebasing off master to re-run the CI, as well as updating merge conflicts?

We're trying to clean up our PR todo list, so if you're not interested, that's fine - we'll close the PR in about a week if we don't hear back. If you're interested in reopening the PR afterwards, we would also very much welcome that.

@guidopetri guidopetri closed this Jul 29, 2023
@guidopetri guidopetri deleted the change-tracking branch July 29, 2023 01:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Restore change tracking functionality
4 participants