Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cartesian 2dMap: implement sorting of categorical axes #3827

Merged
merged 21 commits into from
May 8, 2019
Merged

Conversation

antoinerg
Copy link
Contributor

@antoinerg antoinerg commented May 1, 2019

Closes #1097 (see below #3827 (comment))

Closes #1117 (see below #3827 (comment))

@antoinerg
Copy link
Contributor Author

antoinerg commented May 1, 2019

Closes #1097:
Codepen before: https://codepen.io/antoinerg/pen/bJyOjX
Codepen after: https://codepen.io/antoinerg/pen/EJzGdv

@@ -30,12 +30,23 @@ module.exports = function clean2dArray(zOld, transpose) {
old2new = function(zOld, i, j) { return zOld[i][j]; };
}

var xMap = function(i) {return i;};
var yMap = function(i) {return i;};
if(trace && trace.type === 'heatmap') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Why just heatmap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the low-hanging fruit. I guess I should enlarge the scope of this PR and implement sorting of categorical axes for contour, histogram2d and all other traces relying on heatmap's calc routine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely useful for histogram2d! I have qualms from a viz-theory standpoint about sorting an axis of a contour plot: if you're drawing contours, which interpolate between data points, it strongly implies a continuous axis, so you really have no business with categories unless they're some sort of a hack of what's really a continuous axis... and sorting a continuous axis is meaningless.

That said, we allow it in the schema, therefore it's a bug if it doesn't work!

Copy link
Contributor Author

@antoinerg antoinerg May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Histogram2d and contour are now working in 6b06516

By allowing all traces (except carpet and contourcarpet), the only test that would fail was in bar_test.js. In it, we would create a contour trace without x and y data. In that case, trace._x and trace._y would be empty and it failed. To fix this, we check whether those arrays are empty and ignore categorical sorting if they are. After all, if the data has no categories attached to it, it's not clear how it should be sorted.

src/traces/heatmap/calc.js Outdated Show resolved Hide resolved
@etpinard
Copy link
Contributor

etpinard commented May 2, 2019

Thanks very much @antoinerg . This PR is looking good!

@etpinard
Copy link
Contributor

etpinard commented May 2, 2019

Oh wait, codepen https://codepen.io/anon/pen/qqdqWq from #1117 still looks off using your latest commit -> https://codepen.io/etpinard/pen/ROXLbB

@etpinard
Copy link
Contributor

etpinard commented May 2, 2019

... it would be ok to leave #1117 un-addressed in this PR by the way. I hoped to fix two bugs in one go, but it doesn't have to be the case 😄

@antoinerg
Copy link
Contributor Author

@etpinard Although I can get heatmapgl to support categoryorder and categoryarray on first render, it doesn't work on Plotly.relayout (somehow the calc routine is not called again). Maybe this is related to #3833 which I just opened. 🤔

@etpinard
Copy link
Contributor

etpinard commented May 3, 2019

Although I can get heatmapgl to support categoryorder and categoryarray on first render, it doesn't work on Plotly.relayout

That's ok. If you're using heatmapgl, I hope that you have a large data set that isn't categorical, so I have 100% no problem calling this an edge case.

Thanks for opening #3833 !

@antoinerg
Copy link
Contributor Author

@etpinard Ok so in the end, there are minimal changes to the code. Would you like to see more tests?

@etpinard
Copy link
Contributor

etpinard commented May 6, 2019

Ok so in the end, there are minimal changes to the code. Would you like to see more tests?

Test coverage look good for this fix. Thanks very much! Although I'm a little worried about that @flaky tag you added.

Now, could you investigate (i.e. no need to fix this now) why your patch here does not resolve #1117 (which I also suspect will need a patch in clean2darray)? Thanks!

@antoinerg

This comment has been minimized.

@antoinerg antoinerg changed the title heatmap: implement sorting of categorical axes cartesian 2dMap: implement sorting of categorical axes May 6, 2019
@antoinerg
Copy link
Contributor Author

Commit e966030 closes #1117:
Codepen before: https://codepen.io/anon/pen/qqdqWq
Codepen after: https://codepen.io/antoinerg/pen/WBbyyP

@antoinerg
Copy link
Contributor Author

antoinerg commented May 7, 2019

One detail I would like to discuss is whether we really want to support categories that are numerical. Right now, gd._fullLayout.xaxis._categories can be an array of integers. However, whenever findCategories gets called, it is turned into an array of strings by returning a call to Object.keys:

function findCategories(ax, opts) {
var dataAttr = opts.dataAttr || ax._id.charAt(0);
var lookup = {};
var axData;
var i, j;
if(opts.axData) {
// non-x/y case
axData = opts.axData;
} else {
// x/y case
axData = [];
for(i = 0; i < opts.data.length; i++) {
var trace = opts.data[i];
if(trace[dataAttr + 'axis'] === ax._id) {
axData.push(trace);
}
}
}
for(i = 0; i < axData.length; i++) {
var vals = axData[i][dataAttr];
for(j = 0; j < vals.length; j++) {
var v = vals[j];
if(v !== null && v !== undefined) {
lookup[v] = 1;
}
}
}
return Object.keys(lookup);
}

This needs to be adapted if we want to properly support numerical categories.

@etpinard
Copy link
Contributor

etpinard commented May 7, 2019

One detail I would like to discuss is whether we really want to support categories that are numerical.

Yes.

Right now, gd._fullLayout.xaxis._categories can be an array of integers. However, whenever findCategories gets called, it is turned into an array of strings by returning a call to Object.keys

Why is that a big deal? I don't see a problem with turning typeof number items in _categories into strings if that makes things easier.

ax && ax.type === 'category' && trace['_' + ax._id.charAt(0)].length) {
var axMapping = [];
for(i = 0; i < ax._categories.length; i++) {
axMapping.push((trace['_' + ax._id.charAt(0) + 'Map'] || trace[ax._id.charAt(0)]).indexOf(ax._categories[i]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding var axLetter = ax._id.charAt(0) above probably wouldn't hurt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and can we use ax._categoriesMap instead of a (potentially) costly indexOf into ax._categories?

Copy link
Contributor

@etpinard etpinard May 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing, grepping for xMap and yMap doesn't yield anything for me. Where do they come from?

EDIT: Nevermind. I found them in convert_column_xyz.js.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and can we use ax._categoriesMap instead of a (potentially) costly indexOf into ax._categories?

Unfortunately, we cannot. For example, in the heatmap_shared_categories mock, _categoriesMap links category Team C with index 2. That index is out of range for the second trace since it only has 2 columns. We really need to check whether each trace has data for a given category (and fill with BADNUM) to fix #1117.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's very unfortunate, because

for(i = 0; i < ax._categories.length; i++) {
  trace[ax._id.charAt(0)]).indexOf(ax._categories[i]))
}

can be very costly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if instead you made _(x|y)CategoryMap lookup objects?

That is, instead of:

   if(ax1 && ax1.type === 'category') {
        trace['_' + var1Name + 'CategoryMap'] = col1vals.map(function(v) { return ax1._categories[v];});
    }

we could have

   if(ax1 && ax1.type === 'category') {
        var lookup = trace['_' + var1Name + 'CategoryMap'] = {};
        for(var i = 0; i < col1vals.length; i++) {
           var v = col1vals[i];
           lookup[v] = i;
       }
    }

Copy link
Contributor Author

@antoinerg antoinerg May 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About your questions regarding hasOwnProperty:

  • we have it in several places already and it seems well supported
  • hasOwnProperty is roughly 6 times faster than indexOf

However, we now run it for every element so in the end, I'm not sure which is faster with real data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @antoinerg I got confused (again). Your logic looks right to me.

The way you have it setup right now, I think we can get rid of the axMapping.hasOwnProperty(ax._categories[i]) and just have:

function(i) { 
  var ind = axMapping[ax._categories[i]];
  return ind + 1 ? ind : BADNUM;
};

where the + 1 takes care of the ind === 0 case.

This comment is non-blocking as I don't expect much of a performance gain from it 😏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where the + 1 takes care of the ind === 0 case.

Awesome trick :) Done in c7cd1f3

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💃 💃

@antoinerg
Copy link
Contributor Author

@etpinard I updated the various Codepens to use the latest version f180c79!

@etpinard
Copy link
Contributor

etpinard commented May 7, 2019

Nice work @antoinerg !

Thanks for commit 0c41776 - which should be a step in the right direction towards cleaning up our categories stashes (we still have a long way to go though).

The best part about this PR are the tests, so let's get in 💃


I'm not sure if GitHub is smart enough to close multiple issues at once, so make sure both #1097 and #1117 are closed off correctly.

@antoinerg antoinerg merged commit 19299af into master May 8, 2019
@antoinerg antoinerg deleted the issue-1097 branch May 8, 2019 01:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something broken
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Heatmap should look up category position when building brick data heatmaps don't respect axis categoryarray
3 participants