Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2749] [Spike] Investigate merging merge_from_artifact and add_from_artifact #7965

Closed
aranke opened this issue Jun 27, 2023 · 2 comments · Fixed by #9040
Closed

[CT-2749] [Spike] Investigate merging merge_from_artifact and add_from_artifact #7965

aranke opened this issue Jun 27, 2023 · 2 comments · Fixed by #9040
Labels
stale Issues that have gone stale

Comments

@aranke
Copy link
Member

aranke commented Jun 27, 2023

Thinking more about this: Do we actually need separate methods for merge_from_artifact and add_from_artifact? Or could it simply be that:

  • for nodes which are "being deferred," we completely clobber/replace this manifest's version with the other manifest's version
  • for nodes which aren't "being deferred" (because they're being selected, or already exist in this schema), we keep the existing node entry and add the defer_relation attribute

Then, all tasks could call this same method. We don't need the divergent behavior between clone and run, and we set ourselves up more nicely for future work around contract inference, dev/prod diff, ...

Pseudo code:

    def merge_from_artifact(
        self,
        adapter,
        other: "WritableManifest",
        selected: AbstractSet[UniqueID],
        favor_state: bool = False,
    ) -> None:
        """Given the selected unique IDs and a writable manifest, update this
        manifest by replacing any unselected nonexisting nodes with their counterpart.
        For any selected/existing nodes, add information about each node's location
        in the other manifest.

        Only non-ephemeral refable nodes are examined.
        """
        refables = set(NodeType.refable())
        merged = set()
        for unique_id, node in other.nodes.items():
            current = self.nodes.get(unique_id)
            
            # do the replace thing
            if current and (
                node.resource_type in refables
                and not node.is_ephemeral
                and unique_id not in selected
                and (
                    not adapter.get_relation(current.database, current.schema, current.identifier)
                    or favor_state
                )
            ):
                merged.add(unique_id)
                self.nodes[unique_id] = node.replace(deferred=True)
           
           # do the additive thing
           elif current and (node.resource_type in refables and not node.is_ephemeral):
                defer_state_relation = RelationalNode(node.database, node.schema, node.alias)
                self.nodes[unique_id] = current.replace(defer_relation=defer_relation)

        # Rebuild the flat_graph, which powers the 'graph' context variable,
        # now that we've deferred some nodes
        self.build_flat_graph()

        # log up to 5 items
        sample = list(islice(merged, 5))
        fire_event(MergedFromState(num_merged=len(merged), sample=sample))

The complication I could foresee here: Traditional defer behavior requires the population of the adapter cache (for the "nonexisting" part). I had suggested doing things differently in clone's before_run setup so that we defer first, and then cache, so that we can also cache the "other" schemas (indicated by defer_relation). I'm not sure if that's actually necessary, or if it's extra complication that isn't really worth it.

Originally posted by @jtcohen6 in #7881 (comment)

@github-actions github-actions bot changed the title [Spike] Investigate merging merge_from_artifact and add_from_artifact [CT-2749] [Spike] Investigate merging merge_from_artifact and add_from_artifact Jun 27, 2023
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Dec 25, 2023
Copy link
Contributor

github-actions bot commented Jan 1, 2024

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Issues that have gone stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant