[ENH][DOC] local Geary statistics #145

jeffcsauer · 2020-08-22T05:38:15Z

Purpose of PR

This PR addresses bullet point 7 and 8 of #61. Specifically, this PR adds two new functions to estimate:

Local Geary statistics - univariate (function name: Local_Geary)
Local Geary statistics - multivariate (function name: Local_Geary_MV)

Each function is written in the form of a scikit-learn style estimator. PEP8 formatting has been applied to all of the functions, although a handful of lines are left long due to readability.

Documentation

One notebook is included in the PR. The notebook reviews the core math of the statistic and explains the basic use of the statistic following examples provided by Anselin (2017).

All functions include docstrings and doctests that use built-in PySAL example datasets.

Tests

I have added in two .py test files that execute successfully on a branch specifically for the Geary estimators. These new tests follow the lead of the existing tests, specifically test_moran.py.

Notes for PR consideration

The univariate local Geary is numba-ized, whereas the multivariate local Geary uses a modified form of the old _crand() engine.

codecov-commenter · 2020-08-22T05:41:43Z

Codecov Report

Merging #145 into master will increase coverage by 2.53%.
The diff coverage is 94.83%.

@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
+ Coverage   77.61%   80.14%   +2.53%     
==========================================
  Files          25       37      +12     
  Lines        3274     3758     +484     
==========================================
+ Hits         2541     3012     +471     
- Misses        733      746      +13

Impacted Files	Coverage Δ
esda/local_geary.py	`87.50% <87.50%> (ø)`
esda/losh.py	`89.74% <89.74%> (ø)`
esda/tests/test_ljc.py	`91.30% <91.30%> (ø)`
esda/tests/test_ljc_mv.py	`92.00% <92.00%> (ø)`
esda/tests/test_local_geary.py	`92.00% <92.00%> (ø)`
esda/tests/test_losh.py	`92.00% <92.00%> (ø)`
esda/tests/test_ljc_bv.py	`92.30% <92.30%> (ø)`
esda/tests/test_local_geary_mv.py	`92.59% <92.59%> (ø)`
esda/local_join_count_bv.py	`97.33% <97.33%> (ø)`
esda/local_geary_mv.py	`100.00% <100.00%> (ø)`
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e660983...ffc1b30. Read the comment docs.

jeffcsauer · 2020-08-29T06:02:51Z

A few slight changes in 449e767, specifically:

updating docstrings with some formatting changes
ensuring scikit learn formatting for univariate statistics
fixed permutations such that it is part of init(...) and removed constant from top of file

hh2110 · 2020-09-02T09:22:39Z

Hi Jeff, I'm using this branch to do carry out some spatial autocorrelation. @ljwolf may have noticed this already but setting n_jobs > 1, causes the issue described in #146 - the quick fix described in #146, does not work here since the argument for keep is set to True in the call to _crand_plus (L#128 in local_geary.py).

jeffcsauer · 2020-09-02T14:12:59Z

@hh2110 thanks for catching that! I will take a look at it today and get back to you ASAP.

jeffcsauer · 2020-09-02T14:31:46Z

@hh2110 thanks again for the catch - I appreciate your code review and noting of the issue! The keep_simulations variable was not yet set to be modifiable as it should have been. I ran some basic tests on my side (see chunk below), but do let me know if it's not working!

import libpysal as lp
import geopandas as gpd
guerry = lp.examples.load_example('Guerry')
guerry_ds = gpd.read_file(guerry.get_path('Guerry.shp'))
w = libpysal.weights.Queen.from_dataframe(guerry_ds)
y = guerry_ds['Donatns']
lG = Local_Geary(connectivity=w, n_jobs=2, keep_simulations=False).fit(y)
lG.p_sim[0:5]

array([0.191, 0.05 , 0.058, 0.146, 0.48]

hh2110 · 2020-09-02T17:01:26Z

Thanks @jeffcsauer. One more thing please: I am analysing a 400x400 matrix using the Local_Geary.fit function. For this large array, I noticed that there was a time consuming step in Line#169 in local_geary.py: I think you can replace this line

gs = sum(list(w.weights.values()), []) * (zi-zj)**2

with

gs = adj_list.weight.values * (zi-zj)**2

The sum(list(w.weights.values()), []) seems to take a long time for large arrays. The only thing I am unsure of is whether or not the order of values in adj_list.weight.values matches that of sum(list(w.weights.values()), [])?

jeffcsauer · 2020-09-03T14:08:21Z

@hh2110 taking a look at your suggestion this morning - will report back results! I think you may be onto something as a similar type of improvement has been implemented in the LOSH function.

jeffcsauer · 2020-09-03T14:34:53Z

@hh2110 another great catch! 🚀 I first checked to make sure that sum(list(w.weights.values()) and adj_list.weight.values were equivalent. On my local tests these appear to be the same. I also checked to ensure there was no difference in the results between row-standardized and unstandardized weights - all good there as well. I think in the original implementation I was fixated on a single approach to getting those adjacency values - didn't take the step back to see your more elegant solution!

I've pushed your suggested change in both Local_Geary and Local_Geary_MV. Could you let me know if its (1) producing same/similar results to the slow version of the function, and (2) if it's much faster?

ljwolf · 2021-01-18T16:03:19Z

I've fixed up imports and renamed things to conform to the other esda style (Local_Geary becomes Geary_Local, so that tab completion can work on the statistic, then the variant)

This is some very high-quality work, @jeffcsauer and I'm sorry we couldn't get it merged sooner!

ljwolf · 2021-01-18T16:11:42Z

This is ready to ship. It has notebooks, documentation, and conforms to style.

jeffcsauer · 2021-01-18T18:24:38Z

No worries at all - thanks for the kind comments and mentorship! 😃

Jeffery Sauer added 4 commits July 20, 2020 09:32

[ENH][DOC] local join count and LOSH statistics

066337f

Merge branch 'master' of https://github.com/pysal/esda

55dd123

[ENH][DOC] local geary statistics

13dbad1

Updating doctest in local_geary_mv.py

dfab1cb

ljwolf requested review from ljwolf and sjsrey August 28, 2020 16:07

Updating permutations and structure from Levi LOSH LJC code review

449e767

Updating keep_simulations argument noted by hh2110 in PR145

0648514

Updating adj_list weights variable noted by hh2110 in PR145

ffc1b30

ljwolf added 6 commits January 18, 2021 15:53

rename files and classes to match existing api

8cd4303

update the API

c7e3a56

join_counts is plural, not singular

bd4b050

import multivariate and bivariate statistics from their own file

303c188

fix spatial pearson local import as well

47cc073

update imports for tests, too

75318b4

ljwolf approved these changes Jan 18, 2021

View reviewed changes

update notebooks with correct classes

a0fcf7c

sjsrey merged commit 5e7e38d into pysal:master Jan 18, 2021

jeffcsauer deleted the gearybranch branch January 18, 2021 18:24

ljwolf mentioned this pull request Jan 20, 2021

[ENH][DOC] local join count and LOSH statistics #139

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH][DOC] local Geary statistics #145

[ENH][DOC] local Geary statistics #145

jeffcsauer commented Aug 22, 2020

codecov-commenter commented Aug 22, 2020 •

edited

Loading

jeffcsauer commented Aug 29, 2020

hh2110 commented Sep 2, 2020

jeffcsauer commented Sep 2, 2020

jeffcsauer commented Sep 2, 2020

hh2110 commented Sep 2, 2020

jeffcsauer commented Sep 3, 2020 •

edited

Loading

jeffcsauer commented Sep 3, 2020

ljwolf commented Jan 18, 2021

ljwolf commented Jan 18, 2021

jeffcsauer commented Jan 18, 2021

[ENH][DOC] local Geary statistics #145

[ENH][DOC] local Geary statistics #145

Conversation

jeffcsauer commented Aug 22, 2020

codecov-commenter commented Aug 22, 2020 • edited Loading

Codecov Report

jeffcsauer commented Aug 29, 2020

hh2110 commented Sep 2, 2020

jeffcsauer commented Sep 2, 2020

jeffcsauer commented Sep 2, 2020

hh2110 commented Sep 2, 2020

jeffcsauer commented Sep 3, 2020 • edited Loading

jeffcsauer commented Sep 3, 2020

ljwolf commented Jan 18, 2021

ljwolf commented Jan 18, 2021

jeffcsauer commented Jan 18, 2021

codecov-commenter commented Aug 22, 2020 •

edited

Loading

jeffcsauer commented Sep 3, 2020 •

edited

Loading