-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hist application does not use the user specified nbin parameter correctly. #3882
Comments
The snarkiness is understandable, but it makes your bug report harder to read. The person your talking about doesn't work on ISIS anymore, so you're directing your snark towards people who are not related to it at all. I'd appreciate it if you removed the snarkiness from your bug report next time so that it's less likely to be misunderstood. |
Yes, that is a fair request. I will try to do my best to not bring up things that happened a generation ago! |
I agree this math looks incorrect to me:
|
@lavalaz Would it be possible for you to share a cube that would be good for testing, possibly the one you used in the original post? |
Tied up in meetings, will try to get you one this evening. Will give you a link to one on /work somewhere... |
@lavalaz You mentioned adding two extra bins to "catch" values outside of the range specified by the user. Would adding a parameter that allows users to select if they want the two extra bins be useful? Is there a use case for this functionality? It looks like the current functionality places values outside of the range in the first and last bins, e.g., bins 1 and 10 in the example. In addition, would changing the CSV output to:
from:
meet the requirements of showing the bounds of each bin? Could this change negatively impact users? I am not familiar with hist or if it is used in any pipelines. Lastly, some of these changes will affect other programs, e.g., @jessemapel If you have any opinions about this too. |
You could make it a switch or just have them there always. There may be some users who complain about "extra" bins if they asked for a specific set of bins, so its not a bad idea but probably not essential. |
After discussions and understanding more of how hist works, I am clarifying desired output: |
To clarify, I was wrong about pixels outside of the range being placed in the first and last bins. I missed a line of code and did some testing. However, we will still need to add how many pixels are outside of the range. |
Because of how this change impacts the output format for hist, we're pushing the release of this bug fix back to August with the full feature release in 4.2 @lavalaz If you need this fix before then, let us know and we'll stage an internal build for you to use. |
The release candidate for 4.2 will be available in early July? |
@blandoplanet Yes |
I don't need a special early version. Thank you! |
PR needed to be temporarily reverted due to lots of failing tests. See PR #3922. |
Fixed by #4004 |
This change became available in ISIS 4.3.0, and changes the number of output columns in the text output file from For the record, this was a breaking change. It altered the default output of an ISIS program. This feature should have triggered 5.0.0 right then, it was not a non-breaking change. This is the kind of thing that the new SemVer approach is supposed to help us enforce, but it slipped through the cracks. I'll also note that this is the kind of thing that really is a breaking change hidden in a minor feature-release that can screw up a set of mission pipelines. It will affect HiRISE processing. |
Yes, we recently discovered this while testing ISIS 4.3.0. I agree that it is a breaking change. We are in the process of updating our pipelines to handle the changed output. Changes like these are what add to the time it takes missions to test and update to newer versions of ISIS. The change seems small, but our pipelines can be complicated and the original developer might no longer be around. It takes us time to understand the original developers intent, the implications of making changes to the code and then to implement and test those changes. It is situations like this that impact which long term support release cadence would work best for mature missions. Changes take time and we are trying to keep up with smaller budgets, fewer people and an aging instrument. |
Good catch here. I think we need to come up with a better definition of what a breaking change is for ISIS. It should capture the library API, the app arguments, and the app outputs that people are writing pipelines against. I'm going to give this some thought over the weekend. |
From our previous work, I am pretty sure we are considering application output outside of the versioning schema. It looks like the change here is in a text file that is output by an application? If so, then isn't this change in the same spirit as #3227. It sounds like we need to revisit the discussion about what an API change is and what the expectations are for parseable output in a different issue. |
I thought the ISIS TC had come to the understanding that "the ISIS API" includes the arguments to userland programs and their output. In fact, there is this sentence in our 2020-02-11 meeting notes: "Currently the API is both the library API and the application API."
Sounds like we do if we aren't all on the same page. If we're going to claim that adding a column of output to an output file isn't a breaking change (which is changing the "return signature" of that program), then this distinction between major and minor will appear to have no meaning for the users of the software. If we tell people that there will be no impact to their work when upgrading minor versions, but we are allowing changes like this, then we are lying to them. |
ISIS version(s) affected: x.y.z
isis4.1.0 (and earlier)
Description
I carefully picked the min/max values and NBINS to put the bin boundaries at very specific DN values. But the application did not honor those user input limits and chose a min value larger than the user specified min and a max value less than the user specified max. Therefore the bin boundaries are not where the user intended.
How to reproduce
Pick any typical image with I/F values between 0 and 1.
hist from=[input.cub] minimum=0.0 maximum=1.0 nbin=10
This should but the bin boundaries at 0.0-0.1, 0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5, 0.5-0.6, 0.6-0.7, 0.7-0.8, 0.8-0.9, 0.9-1.0. Count them. Ten (10) bins. Hist produced 10 bins with "DN" of 0,0.11111,0.222222,0.333333,0.444444... ...0.88888,1. Ten bins alright, but not at 0.1 spacing. I guess I was wrong and 1/10=0.111111. My bad. So what DN's are in those bins? The output only specifies one DN value and I can't tell if its the max or the min of the bin. Since it starts with "0", clearly its the minimum value and the first bin has DN values of 0.0-0.111111. But since the last bin is "1", it must have DN values of 1 to ??? Oh, must be my mistake, the DN value must be the for the top of the bin so the last bin is for DN values of 0.88888-1.0. So the first bin has DNs of ??? to zero? But if I ask for 11 bins, I get bins of "0,0.1,0.2,0.3...1". 1/11=0.1, now I get it!
Possible Solution
Part 1: You need to give the values for minimum and maximum of the range of DNs that are going into each bin, not just a single DN value. You should also clearly document which side of the bin is a greater than/less than versus greater-than-or-equal-to/less-than-or-equal-to.
Part 2: You should also have a bin for "less-than-the-minimum" and "greater-than-the-maximum". So 12, clearly labeled, bins when the user says nbin=10.
Additional context
Just for historical context, I know this issue was raised about 15 years ago. The user in question, not myself, who had a degree in mathematics, was told that they did not understand the mathematics of histograms and this behavior was correct. That is why you are seeing extra snarkiness in this ticket.
The text was updated successfully, but these errors were encountered: