Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move functionality out of PhysicalPlanBuilder into other classes and add tests #405

Merged
merged 1 commit into from
Oct 31, 2017

Conversation

dguy
Copy link
Contributor

@dguy dguy commented Oct 23, 2017

No description provided.

@hjafarpour
Copy link
Contributor

@dguy you have moved the physical layer functionality into the logical layer. We want to keep logical plan phase independent of the physical plan phase.

Copy link
Contributor

@hjafarpour hjafarpour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have moved the physical layer functionality into the logical layer. We want to keep logical plan phase independent of the physical plan phase.

@apurvam
Copy link
Contributor

apurvam commented Oct 24, 2017

@hjafarpour, not sure I follow your point. I understand that the PlanNode hierarchy was the logical layer, and this was converted to a stream topology by the physical plan builder.

However, I have been looking at the physical plan builder code, and it really is quite complicated. In particular, the recursive calls into kafkaStreamsDsl are quite hard to follow. This seems to disentangle some of that by the adding the buildStream method to each PlanNode type which builds the DSL for that particular type of node.

While this does mix the strictly logical PlanNode with the physical DSL, it does so through a clean boundary (ie. the buildStream method).

The proposed structure makes things a bit easier to read and reason about. More importantly, when in the future when we add more types of PlanNode, the logic would also be compartmentalized in those new nodes. It would result in a simpler PhysicalPlanBuilder over time. So there are many benefits.

Is it really that important that the logical and physical layers are separated in the previous way? What are the advantages of that approach in your mind?

@hjafarpour
Copy link
Contributor

@apurvam yes, keeping logical plan independent of physical plan layer is essential since in logical layer we should not care about how we implement the plan in destination physical layer. Our default physical layer is Kafka Streams at the moment but in future we will also consider other execution contexts such as connect or 3rd party systems. So it is essential to keep any dependency to the underlying physical plan away and keep logical plan layer abstract.

@dguy
Copy link
Contributor Author

dguy commented Oct 24, 2017

I (obviously) agree with @apurvam. This makes the code easier to reason about. The logic is actually in the classes it needs to be rather than all dumped in the PhysicalPlanBuilder.
As for the future... Well, if we had some interfaces then this would be not a problem. Maybe we'll get there maybe we won't. I don't think the maybe future should hold this up as it is an improvement in readability and testability

@hjafarpour
Copy link
Contributor

I totally agree with the readability and testability argument but we would be violating a more important principal here which is keeping logical layer independent of physical layer. I would suggest breaking up the PhysicalPlanBuilder by creating new classes in the same layer instead of spilling the physical implementation code in the abstraction layer. All data management systems have kept their logical planning code independent of implementation details and we should not violate this principal.

@dguy
Copy link
Contributor Author

dguy commented Oct 24, 2017

As it is in this PR if we add new extensions of PlanNode we only have to add 1 class that encapsulates the behaviour. If we do it another way we need add multiple classes every time we change something. IMO, this is not a good design. If we put the logic with the data we can just add a single class and it works.
As it stands, there is no abstraction layer - it is just a bunch of containers of data that do nothing.

Copy link
Contributor

@hjafarpour hjafarpour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link

@bluemonk3y bluemonk3y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leggit

@dguy dguy merged commit 93dba91 into confluentinc:4.0.x Oct 31, 2017
@dguy dguy deleted the physical-plan-builder branch October 31, 2017 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants