Add filter ratio dynamic metric #24

Open
wants to merge 3 commits into
base: main

Conversation

andreiNek

  • The calls to calcNodeMetrics in updateSqlNodeMetrics (for live updates) and calculateSql (for completed runs)
    were replaced with a new updateNodeMetrics function.
  • updateNodeMetrics accepts the node graph, so it can add extra insights derived from the Spark metrics and the graph structure.
  • New metric add-on logic can be added there.
  • For now it makes a single call, to addFilterRatioMetric, which adds an extra metric named filter_ratio (a percentage) to Filter/Join nodes.
  • The basic cases are implemented: no input, a stage followed by a filter, more than one input node with rows, and join filtering.
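
As a rough illustration of the computation described above, here is a minimal sketch. The `SketchNode` and `SketchMetric` types, the `predecessors` map, the `"number of output rows"` metric name, and the lower-bound check on the ratio are all assumptions made for the example, not the PR's actual code.

```typescript
// Hypothetical, simplified shapes; the real node and metric types in the PR differ.
interface SketchMetric {
  name: string;
  value: string;
}

interface SketchNode {
  id: number;
  type: string; // e.g. "Filter", "Join", "Scan"
  metrics: SketchMetric[];
}

// Assumed name of the metric that carries a node's output row count.
const OUTPUT_ROWS = "number of output rows";

// Reads a node's output row count, treating a missing or unparsable value as 0.
function outputRowsOf(node: SketchNode): number {
  const metric = node.metrics.find((m) => m.name === OUTPUT_ROWS);
  const parsed = metric === undefined ? NaN : Number(metric.value);
  return Number.isFinite(parsed) ? parsed : 0;
}

// Returns the percentage of input rows dropped by a Filter/Join node,
// or undefined when the ratio cannot be computed.
function filterRatioFor(
  node: SketchNode,
  predecessors: Map<number, number[]>, // node id -> ids of its input nodes
  nodesById: Map<number, SketchNode>,
): number | undefined {
  if (node.type !== "Filter" && node.type !== "Join") return undefined;

  // Sum the output rows of every input node found in the graph.
  const inputIds = predecessors.get(node.id) ?? [];
  let totalInputRows = 0;
  for (const id of inputIds) {
    const input = nodesById.get(id);
    if (input !== undefined) totalInputRows += outputRowsOf(input);
  }
  if (totalInputRows === 0) return undefined; // no input rows, nothing to compare

  const ratio = ((totalInputRows - outputRowsOf(node)) / totalInputRows) * 100;
  // Assumption: a join may emit more rows than it reads; skip the metric then.
  return ratio >= 0 && ratio <= 100 ? ratio : undefined;
}
```

With a scan emitting 1000 rows feeding a filter that emits 250, `filterRatioFor` would return 75.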

const filterRatio = ((totalInputRows - outputRows) / totalInputRows) * 100;
if(filterRatio <= 100){
updatedMetrics.push({
name: "filter_ratio",
Contributor

The name should be human-readable, e.g. "Filter Ratio".

}

let totalInputRows = 0;
let validPredecessors = 0;
Contributor

This is redundant; you can just check whether totalInputRows is 0.

return updatedMetrics;
}

const filterRatio = ((totalInputRows - outputRows) / totalInputRows) * 100;
Contributor

Use the formatPercentage util method.

Contributor

Also, I think "filter ratio" means the opposite: if 1,000 rows out of 1M pass the filter, that is 0.1%, not 99.9%. Maybe the naming should be clearer here, like "filtered rows percentage".
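
To make the naming concern concrete, here is a small sketch comparing the two quantities for the 1,000-out-of-1M example, assuming 1,000 is the number of rows that come out of the filter. Both function names are hypothetical.

```typescript
// Percentage of input rows that were dropped (what the formula in this PR computes).
function filteredOutPercentage(inputRows: number, outputRows: number): number {
  return ((inputRows - outputRows) / inputRows) * 100;
}

// Percentage of input rows that passed through the node.
function passedPercentage(inputRows: number, outputRows: number): number {
  return (outputRows / inputRows) * 100;
}

const exampleInputRows = 1000000;
const exampleOutputRows = 1000;

const droppedText = filteredOutPercentage(exampleInputRows, exampleOutputRows).toFixed(1);
const passedText = passedPercentage(exampleInputRows, exampleOutputRows).toFixed(1);
console.log(`dropped: ${droppedText}%, passed: ${passedText}%`);
```

Whichever quantity is shown, the label should say which of the two it is.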

@@ -399,6 +400,7 @@ export function updateSqlNodeMetrics(

const notEffectedSqls = currentStore.sqls.filter((sql) => sql.id !== sqlId);
const runningSql = runningSqls[0];
const graph = generateGraph(runningSql.edges, runningSql.nodes);
Contributor

This means we regenerate the graph on every metric update cycle, which is not ideal.
We should cache the graph when nothing has changed.
Please at least add a TODO comment about caching the graph.
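
One possible shape for such a cache is a "remember the last call" wrapper, sketched below. It assumes the edges and nodes arrays are replaced rather than mutated when the plan changes, so a reference comparison is enough. `memoizeLast` and `buildGraphSketch` are hypothetical names; `buildGraphSketch` only stands in for the real graph builder.

```typescript
// Wraps a two-argument builder so it re-runs only when either argument
// changes identity (===) compared with the previous call.
function memoizeLast<A, B, R>(build: (a: A, b: B) => R): (a: A, b: B) => R {
  let last: { a: A; b: B; result: R } | undefined;
  return (a: A, b: B): R => {
    if (last === undefined || last.a !== a || last.b !== b) {
      last = { a, b, result: build(a, b) };
    }
    return last.result;
  };
}

// Stand-in for the real graph builder; counts how often it actually runs.
let buildCount = 0;
function buildGraphSketch(
  edges: [number, number][], // [fromId, toId] pairs
  nodes: number[],
): Map<number, number[]> {
  buildCount += 1;
  const predecessors = new Map<number, number[]>(
    nodes.map((id): [number, number[]] => [id, []]),
  );
  for (const [from, to] of edges) {
    predecessors.get(to)?.push(from);
  }
  return predecessors;
}

const cachedGraph = memoizeLast(buildGraphSketch);
const sketchEdges: [number, number][] = [[1, 2]];
const sketchNodes = [1, 2];

const firstGraph = cachedGraph(sketchEdges, sketchNodes);
const secondGraph = cachedGraph(sketchEdges, sketchNodes); // same references, builder is not re-run
```

If the store mutates these arrays in place, the reference check would miss changes and a different cache key (for example a plan version) would be needed.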

): EnrichedSqlMetric[] {

const updatedMetrics = calcNodeMetrics(node.type, metrics);
addFilterRatioMetric(node, updatedMetrics, graph, allNodes);
Contributor

You return updatedMetrics and do nothing with it.
We prefer a functional style, so add the new metric with [..., newMetric] and return that from the method.
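
A minimal sketch of that style, with a hypothetical `MetricSketch` type and `withExtraMetric` helper standing in for the PR's real types and function:

```typescript
// Hypothetical, simplified metric shape for illustration only.
interface MetricSketch {
  name: string;
  value: string;
}

// Returns a new array instead of pushing into the caller's array.
function withExtraMetric(
  metrics: readonly MetricSketch[],
  extra: MetricSketch | undefined,
): MetricSketch[] {
  return extra === undefined ? [...metrics] : [...metrics, extra];
}

const baseMetrics: MetricSketch[] = [{ name: "number of output rows", value: "250" }];
const withRatio = withExtraMetric(baseMetrics, { name: "Filter Ratio", value: "75%" });
// baseMetrics is left untouched; the caller uses the returned array.
```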

2 participants