Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simulation Envelopes, Low & High #57

Closed
vpapaioannou opened this issue Jun 26, 2020 · 10 comments
Closed

Simulation Envelopes, Low & High #57

vpapaioannou opened this issue Jun 26, 2020 · 10 comments
Assignees

Comments

@vpapaioannou
Copy link

The way the low and high functions are calculated, https://github.com/pysal/pointpats/blob/master/pointpats/distance_statistics.py#L627, it appears to be confusing since it depends on the pct parameter whereas it shouldn't. If pct=1 then the low and high envelopes are the same as can be seen below.

realizations = PoissonPointProcess( spp.window, spp.n, 10, asPP = True)
kenv = Kenv( spp, intervals = 20, realizations = realizations, pct = 1)
kenv.plot()

image

If pct = 0.05 having the same realizations, then,

image

If there is an intention to incorporate some sort of p - value in the graphs, then from what I know this p - value is determined exclusively by the amount realizations and nothing else. Finally, I would suggest to have an assertion about pct valid values.

Thanks,

Vasilis

@knaaptime knaaptime transferred this issue from pysal/pysal Jun 26, 2020
@sjsrey
Copy link
Member

sjsrey commented Jun 28, 2020

@vpapaioannou if you could add a PR that illustrates they type of functionality you are interested in, that would be very helpful.

@sjsrey sjsrey self-assigned this Jun 28, 2020
@vpapaioannou
Copy link
Author

What does PR mean?

To my understanding, the envelopes should be independent from the level of significance. That means, given the existing code, that the low envelope should be always row 0 and the high envelope should be always the last row of numpy array res. They should be the same if only one realization is provided.

@sjsrey
Copy link
Member

sjsrey commented Jun 29, 2020

What does PR mean?

Pull Request

To my understanding, the envelopes should be independent from the level of significance. That means, given the existing code, that the low envelope should be always row 0 and the high envelope should be always the last row of numpy array res. They should be the same if only one realization is provided.

The simulation envelopes are dependent on the specification of the significance level.

@vpapaioannou
Copy link
Author

Do you have a reference about "The simulation envelopes are dependent on the specification of the significance level."?

@sjsrey
Copy link
Member

sjsrey commented Jun 29, 2020

Do you have a reference about "The simulation envelopes are dependent on the specification of the significance level."?

https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1890/13-2042.1

@vpapaioannou
Copy link
Author

vpapaioannou commented Jun 30, 2020

Reading the paper, I understand your rationale. To my mind though, there are two different but very close concepts. The first one is that of the Lower / Upper Bound (LB / UB) and the second one is that of a K function at level α. In the first case, the LB should be the row 0 in array res in the code, and the UB should be the last row. For the second case, the K functions should be reported together with the level α so that to be distinguished from the actual LB and UB K functions. Also, instead of using the LB / UB labels I would just use K_α.

Finally, in line https://github.com/pysal/pointpats/blob/master/pointpats/distance_statistics.py#L626, the length nres should be nres = len( res) + 1 where the 1 comes from the current point process under testing that is considered as another instance of a CSR process. As I understand it, realizations variable doesn't include the observed one.

@ljwolf
Copy link
Member

ljwolf commented Jun 30, 2020

Would clarifying the documentation help? It seems your interpretation of envelope is the extrema, while we're using it to mean the (1- α)% extrema. It is unlikely that we will change the nomenclature in this code, but a re-vamp of the code is about to be merged that is oriented to let users work with simulations directly.

@ljwolf
Copy link
Member

ljwolf commented Jun 30, 2020

where the 1 comes from the current point process under testing

Edit: Of course, I should also say thank you for reviewing this in depth and giving feedback!!

I believe this is done correctly in the new implementation.

@vpapaioannou
Copy link
Author

vpapaioannou commented Jun 30, 2020

Yes, updating the documentation it would be enough. If pct is small enough then you can get the extrema, otherwise you can get the (1 - α)% extrema. If you do so, I would suggest instead of pct to use alpha and this parameter would be self explanatory (keep the doc string though). Finally, I would like to bring attention to np.int() function that is used, that floors a number e.g. 2.7 -> 2.

As of nres that I said above, since Python starts from 0, I think the code is correct. However, if pct = 0 then an IndexError is raised. Maybe there you need to take care of this in the new implementation.

You are welcome, thanks for implementing all this functionality.

@ljwolf
Copy link
Member

ljwolf commented Jun 2, 2021

This should be addressed in the new implementation

@ljwolf ljwolf closed this as completed Jun 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants