RFC 42: Data transfer between Wagtail installations #42
Conversation
The list of instances offered as import source sites will be configurable through the Django settings file. In cases where sites are locked down by basic authentication, IP range or similar, it is the implementor's responsibility to ensure that each instance is able to make HTTP / HTTPS requests to its 'sibling' sites.

WAGTAIL_IMPORT_SOURCE_INSTANCES = [
    {
Any reason to not require an identifier for each source? (similar to how we identify database and search backends)
WAGTAIL_IMPORT_SOURCE_INSTANCES = {
    'staging': {
        'label': "Staging",  # Could default to the identifier.title()
    }
}
WAGTAIL_IMPORT_SOURCE_INSTANCES = [
We tend to omit the underscore between WAGTAIL and the module name; for example, WAGTAILSEARCH_BACKENDS. Also, maybe _SOURCE_INSTANCES could be replaced with _SOURCES?
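For illustration, a rough sketch of how the setting might look with both suggestions applied (underscore dropped from the prefix, _SOURCE_INSTANCES shortened to _SOURCES). The exact prefix WAGTAILIMPORT_SOURCES and the 'base_url' field are assumptions, not names from the RFC:

# Hypothetical naming and fields, shown only to illustrate the suggestion above
WAGTAILIMPORT_SOURCES = {
    'staging': {
        'label': "Staging",  # could default to the identifier, title-cased
        'base_url': "https://staging.example.com/",  # assumed field
    },
}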
To avoid the need to modify Wagtail's core data model, the import-export app will define a new model / database table to store the mappings from UIDs to local installation-specific IDs:

class IDMapping(models.Model):
Should we add a unique_together constraint on content_type and local_id?
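For context, a minimal sketch of what the IDMapping model could look like with the suggested constraint. The field types here are assumptions based on the RFC's description of mapping UIDs to local, installation-specific IDs, not the final implementation:

# Sketch only: field definitions are assumptions for illustration
from django.contrib.contenttypes.models import ContentType
from django.db import models

class IDMapping(models.Model):
    uid = models.UUIDField(unique=True)
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    local_id = models.CharField(max_length=255)

    class Meta:
        # The constraint suggested above: at most one mapping per local object
        unique_together = [('content_type', 'local_id')]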
For the use case I have in mind, the pull approach should be fine, but the push approach is very much preferred. :) In cases where the reason for doing data transfer is security-related, direct login to 'publish' instances (as opposed to 'author' instances) may not be possible. UX-wise, the push approach also saves the author from having to log in to multiple instances and figure out which instance they are on. Perhaps a menu item in the author instance that jumps you over to the publish instance and initiates the import for that page?
This would be a great feature to have and should definitely be discussed as an RFC here.
+1 @cyface. A reasonable and common pattern is to have the production environment in a network purposefully isolated from staging and development environments. You might not want to expose those environments to the production one, because that opens the possibility of external requests making it down to an internal network.

In those cases it would be preferable to be able to do some kind of file-based export, where users on the "source" Wagtail can generate a transfer artifact which can be imported into the "destination" Wagtail. This would be akin to a super dumpdata which dumped only the desired models, along with a super loaddata which figured out how to smartly ingest them. This file-based export could potentially even live alongside a direct connection-based import/export, sharing the same logic underneath.

This might depend a bit on how the logic works: if an import is "make a bunch of API calls, then manipulate and ingest the data", that seems like it'd work just as well with an interim file-based step. If instead the import logic were iterative (meaning, a query is made to the API, then other queries are made depending on how the destination is set up), this wouldn't fit so well with the proposed approach.
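To make the two-step idea concrete, a minimal sketch of how a file-based artifact could reuse the same ingestion logic as a direct connection. All function names here are hypothetical and do not come from the RFC:

# Hypothetical sketch: the same ingest step consumes records whether they
# arrive over HTTP from a source site or from a previously exported file
import json

def export_to_file(records, path):
    """Write the exported dataset (a 'super dumpdata') to a transfer artifact."""
    with open(path, 'w') as f:
        json.dump(records, f)

def import_from_file(path, ingest):
    """Load a transfer artifact and hand it to the shared ingestion logic."""
    with open(path) as f:
        records = json.load(f)
    ingest(records)  # same function a direct API-based import would call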
@chosak I think it is an iterative process, and one that can potentially vary depending on what objects exist at the destination.

For any item that has been explicitly chosen to be imported, there's a tree of related items to be imported along with it, potentially of any depth (for example, a page may reference a snippet which references an image). For every such relation in that tree, there's a business rule (i.e. one of the bullet points given under https://github.com/gasman/weps/blob/81060c34c190482e262be6b7595ac4d213276704/text/042-data-transfer-between-installations.md#configuration) which determines whether the related object will be imported or skipped. Some of these business rules have a 'get-or-create' behaviour, which means that they may or may not end up importing the related object (and possibly pulling in even more related objects associated with it) depending on whether it exists already. This means that the dataset exported from the source site has to be larger in the 'create' scenario than in the 'get' scenario.

As a concrete example: suppose we're importing a blog section, and the blog posts have categories, identified by a slug. These categories are snippet/modeladmin models, and should be created at the destination whenever we encounter one that doesn't exist. Now, suppose that a category object isn't just a plain slug + label, but includes some 'rich' fields, such as an image or a rich-text description. The import logic has to consider the object references in these fields if, and only if, the category doesn't exist already at the destination - which is something we can't know if we're preparing the export dataset up-front.

This isn't insurmountable - the exporter logic could take a "pessimistic" view and always output the extra records whenever there's a chance of needing them - but it's added complexity that can be left until later.
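To illustrate the 'get-or-create' rule described in that comment, a rough sketch follows. The BlogCategory model and the importer.import_related helper are invented names for illustration only; they are not part of the RFC or wagtail-transfer:

# Hypothetical sketch of a 'get-or-create' business rule for category
# snippets, keyed on slug
from myapp.models import BlogCategory  # hypothetical snippet model

def import_category(category_data, importer):
    existing = BlogCategory.objects.filter(slug=category_data['slug']).first()
    if existing is not None:
        # 'get' branch: the category already exists locally, so its rich
        # fields (and anything they reference) never need to be imported
        return existing

    # 'create' branch: only now do the objects referenced by the category's
    # rich fields (e.g. its image) need to be present in the export dataset
    image = importer.import_related(category_data.get('image_ref'))
    return BlogCategory.objects.create(
        slug=category_data['slug'],
        name=category_data['name'],
        image=image,
        description=category_data['description'],
    )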
Discussed in the Wagtail core meeting. Everyone was in favour of the approach as it stands, as it could be extended to support a push workflow and file-based transfer later on. We also discussed implementing a mechanism in Wagtail core to discover all objects that a page references. This would scrape foreign keys, chooser blocks and rich text features to find all linked pages, images, documents and snippets. Everyone was in favour of this change too.
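A rough sketch of the kind of reference-discovery helper discussed there, limited to foreign keys for simplicity. This is not Wagtail's actual API; a real implementation would also need to walk chooser blocks and rich text as described above:

# Illustrative only: collect (model, pk) pairs that a page instance points
# at via its foreign keys; chooser blocks and rich text links would need
# similar traversal
from django.db import models

def discover_foreign_key_references(page):
    refs = set()
    for field in type(page)._meta.get_fields():
        if isinstance(field, models.ForeignKey):
            target_id = getattr(page, field.attname, None)
            if target_id is not None:
                refs.add((field.related_model._meta.label_lower, target_id))
    return refs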
Work done in https://github.com/wagtail/wagtail-transfer/
Merged, since this is implemented now and the RFC generally reflects what made it into Wagtail Transfer.
I want to start issues |