limiting outcome comparison to a period, filtering out by percentile in the last year #36

rfl-urbaniak · 2023-10-23T12:21:34Z

Resolves #32.

You can now restrict the outcome variable's contribution to similarity calculations to a range of years, and filter the Euclidean kins output by the outcome's percentile in the last year, like so:

f = FipsQuery(42001, "gdp", lag=0, top=5, time_decay=1.06,
              outcome_comparison_period=(2003, 2019), outcome_percentile_range=(40, 100))

A working example is available in similarity_demo.ipynb.
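The two new options can be illustrated on a toy table. The sketch below is a minimal illustration, not the FipsQuery code. It assumes outcome data in wide format (one row per FIPS code, one integer column per year), and the helpers `restrict_to_period` and `filter_by_last_year_percentile` are hypothetical names. The first keeps only the years inside an inclusive period; the second keeps locations whose value in the table's last year lies between the given percentiles.

```python
import numpy as np
import pandas as pd


def restrict_to_period(df, period):
    """Hypothetical helper: keep only year columns inside the inclusive (start, end) period."""
    start, end = period
    years = [c for c in df.columns if start <= c <= end]
    # Label-based selection keeps the original year labels.
    return df.loc[:, years]


def filter_by_last_year_percentile(df, percentile_range):
    """Hypothetical helper: keep rows whose last-year value lies in the percentile range (0-100)."""
    low, high = percentile_range
    last = df[df.columns.max()]
    low_val, high_val = np.percentile(last, [low, high])
    return df[(last >= low_val) & (last <= high_val)]


# Toy GDP table: rows are FIPS codes, columns are years (made-up values).
gdp = pd.DataFrame(
    {
        2001: [1.0, 2.0, 3.0, 4.0, 5.0],
        2002: [1.5, 2.5, 3.5, 4.5, 5.5],
        2003: [2.0, 1.0, 4.0, 3.0, 6.0],
    },
    index=[42001, 42003, 42005, 42007, 42009],
)

print(restrict_to_period(gdp, (2002, 2003)).columns.tolist())  # [2002, 2003]
print(filter_by_last_year_percentile(gdp, (40, 100)).index.tolist())  # [42005, 42007, 42009]
```

Here the 40th percentile of the 2003 values is 2.6, so only the three locations at or above it survive the (40, 100) filter.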


rfl-urbaniak · 2023-10-23T15:40:57Z

@riadas please investigate the workflow failure; it seems to be related to dependency installation. All tests pass locally.


rfl-urbaniak · 2023-10-31T17:51:46Z

Really great functionality already! Some design questions remain to be figured out, but it's already quite powerful.

I'm happy for this to be merged, though some things still need fixing; please open issues for anything that will be fixed in future PRs.

In addition to what's in my and Andy's comments:

* [ ]  In similarity_demo.ipynb (see the code block below), the FipsQuery option outcome_comparison_period doesn't work right. Setting it to (2003, 2010) yields weights between 2001 and 2008, setting it to (2010, 2023) yields weights between 2001 and 2011, and (2003, 2019) yields (2001, 2017).

f = FipsQuery(42001, "gdp", lag=0, top=5, time_decay=1.06,
              outcome_comparison_period=(2003, 2010), outcome_percentile_range=(40, 100))
f.find_euclidean_kins()
display(f.plot_weights())

* [ ]  Also debug the FipsQuery option outcome_comparison_period for the case of multiple feature groups. This behaves badly (different years for the different variables, none of them matching the specified period):

f = FipsQuery(1007, outcome_var="gdp",
              feature_groups_with_weights={"gdp": 1, "population": 1},
              # with one feature group only, weights 1-4 won't make a difference
              lag=0, top=5, time_decay=1.5, outcome_comparison_period=(2003, 2010))
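In the first item, each requested window comes back with the right length but shifted to start at 2001: (2003, 2010) becomes (2001, 2008) and (2003, 2019) becomes (2001, 2017). That pattern is what one would see if the right values were selected but then relabelled from the first year of the data. The pandas sketch below reproduces this failure mode on hypothetical data (coverage assumed to be 2001 to 2021, function names made up); it is a guess at the kind of bug involved, not a diagnosis of the actual FipsQuery code.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly weights covering 2001..2021.
years = list(range(2001, 2022))
weights = pd.Series(np.linspace(0.1, 1.0, len(years)), index=years)


def buggy_restrict(series, period):
    """Picks the right values, but relabels them from the first data year."""
    start, end = period
    values = series.loc[start:end].to_numpy()
    # Bug: the original year labels are discarded and replaced positionally.
    return pd.Series(values, index=series.index[: len(values)])


def fixed_restrict(series, period):
    """Label-based slicing keeps each value attached to its own year."""
    start, end = period
    return series.loc[start:end]


for period in [(2003, 2010), (2003, 2019)]:
    bad = buggy_restrict(weights, period)
    good = fixed_restrict(weights, period)
    print(period, "buggy:", (bad.index[0], bad.index[-1]),
          "fixed:", (good.index[0], good.index[-1]))
```

Both versions return the same values; only the buggy one attaches them to the wrong years, which is exactly what a weight plot would expose.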

Yeah, that seems like a bug. Thanks for spotting it! I'll get on it once the first version of the explainability stuff is ready.

rfl-urbaniak · 2023-11-01T05:12:19Z

@riadas @emackev the bug has been fixed; inspect the first few cells of the similarity notebook to confirm the results are as desired. @emackev with the first issue resolved, the second is not a bug but a feature. You should expect different years for different variables: the comparison period restricts weights to certain years for the outcome variable only, while the other variables use whatever years of data are available, and those years differ from variable to variable.

In a discussion about this feature, we talked about locations that "have similar outcome variable for years x-y and are similar in other respects", and this implements that idea. If you want the comparison years to restrict all variables in the similarity search, that can be implemented, but I'm not sure it's what you want.
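The intended behaviour (the comparison period restricts only the outcome variable, while other feature groups keep all their available years) can be sketched as a weighted sum of per-group distances. This is a minimal illustration under assumptions, not the FipsQuery implementation: each feature group is a wide DataFrame indexed by FIPS code with integer year columns, all groups share the same index, the per-group distance is plain Euclidean, and groups are combined using weights like those in `feature_groups_with_weights`. The function name `combined_distances` is hypothetical, and `time_decay` is not modelled.

```python
import numpy as np
import pandas as pd


def combined_distances(target_fips, groups, weights, outcome_var, period=None):
    """Hypothetical helper: weighted sum of per-group Euclidean distances from target_fips.

    Only the outcome group is restricted to the inclusive `period`;
    every other group keeps all of its year columns.
    """
    total = None
    for name, df in groups.items():
        if name == outcome_var and period is not None:
            start, end = period
            df = df.loc[:, [c for c in df.columns if start <= c <= end]]
        # Row-wise difference from the target location, then Euclidean norm.
        diff = df - df.loc[target_fips]
        dist = np.sqrt((diff ** 2).sum(axis=1)) * weights[name]
        total = dist if total is None else total + dist
    # Closest locations first, excluding the target itself.
    return total.drop(target_fips).sort_values()


# Toy data: GDP observed 2001-2003, population observed in 2010 and 2015.
gdp = pd.DataFrame(
    {2001: [1.0, 2.0, 5.0], 2002: [1.0, 2.0, 5.0], 2003: [1.0, 20.0, 1.0]},
    index=[1007, 1009, 1011],
)
population = pd.DataFrame(
    {2010: [10.0, 10.0, 30.0], 2015: [10.0, 12.0, 30.0]},
    index=[1007, 1009, 1011],
)
groups = {"gdp": gdp, "population": population}
weights = {"gdp": 1.0, "population": 0.1}

print(combined_distances(1007, groups, weights, "gdp", period=(2001, 2002)))
print(combined_distances(1007, groups, weights, "gdp"))
```

With the outcome restricted to (2001, 2002), location 1009 ranks closest; using all GDP years, its 2003 jump pushes it behind 1011, while the population group contributes the same distances in both calls because its years are never restricted.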


riadas

Things are working well enough now on the frontend, so I think this is good to go.


riadas and others added 5 commits October 13, 2023 15:27

Merge pull request #5 from BasisResearch/ru-add-gdp

428e04d

Added gdp data and the minimal working version

added percentiles

7c79e38

added comparison period

75c88cd

added percentile filtering

3220549

lint

e8c5001

rfl-urbaniak requested review from Niklewa, riadas, emackev and azane October 23, 2023 12:22

rfl-urbaniak changed the base branch from main to runl-generalized-weights October 23, 2023 12:23

Merge pull request #14 from BasisResearch/ru-add-fed-and-features

ed48a3b

Added population and generalized data loading and euclidean distances to multiple features

rfl-urbaniak changed the title ~~limining outcome comparison to a period, filtering out by percentile in the last year~~ limiting outcome comparison to a period, filtering out by percentile in the last year Oct 23, 2023

Emily and others added 17 commits October 23, 2023 21:41

scraping unemployment data from BLS, script & csv

95257c2

fixed FIPS bug; plotting map to check

5db8235

.ipynb for getting personal income

8842241

saving income notebook

8da64ef

saved unempl notebook

0a9f215

removed eggs tracking

89c6cdf

expanding gitignore

046d73f

Merge branch 'ru-add-gdp' of https://github.com/BasisResearch/cities into elm-compile-unempl-rate

665f444

adding ethnic composition data set WIP

c6fe58d

updated data rules to be helpful to Emily

e84e2b9

find root, list features, new DataGrabber test

c0a9a3a

Merge 'ru-add-gdp' into elm-compile-unempl-rate

ac547c3

debugged data folder test

be69d44

Merge 'ru-add-gdp' into elm-compile-unempl-rate

a6c963e

update to gitignore

9cd1517

updated gitignore

bf84ddd

cleanup

f3f4d2b

rfl-urbaniak added 7 commits October 30, 2023 18:19

made file paths more robust in test_cleaning_utils

2bfbd8d

fixed weight plotting

6af22e6

done with E's review of generalized...

6c2fe39

removed torch version no from reqs (runner error)

6476ad1

Merge branch 'ru-divergence-search' of https://github.com/BasisResearch/cities into nl-add-industry-time-series

e734caa

data notebook cleanup, lint

d1fe7e9

Merge pull request #50 from BasisResearch/nl-add-industry-time-series

a01905e

adding industry composition as a time series

rfl-urbaniak mentioned this pull request Oct 31, 2023

weights misbehave with time restrictions #58 (Closed, 2 tasks)

fixed weights when years restricted

33f98f6

rfl-urbaniak and others added 12 commits November 1, 2023 07:48

explainability table WIP

f03de67

small bug fix to operate with frontend

b5acaa1

Merge branch 'ru-divergence-search' of https://github.com/BasisResearch/cities into ru-divergence-search

3a3253f

Merge branch 'ru-divergence-search' of https://github.com/BasisResearch/cities into ru-explainability-table

e250947

explainability table WIP

500f593

explainability table WIP

aba6543

Merge branch 'ru-divergence-search' into elm-compile-unempl-rate

74d83ce

Merge branch 'ru-divergence-search' into elm-compile-unempl-rate

d4275b8

run cleaning_pipeline to update datasets

b220f5c

added explainability table

8f1d709

format, lint

2f97b64

typo

c69f853

riadas approved these changes Nov 1, 2023


Emily and others added 4 commits November 1, 2023 16:34

fix make format, lint issues

e1c5c6c

add message to assertion error check

526a219

Merge pull request #61 from BasisResearch/ru-explainability-table

34b865f

Ru explainability table

Merge pull request #38 from BasisResearch/elm-compile-unempl-rate

de2a6e5

scraping unemployment data from BLS, script & csv

riadas merged commit 37574d9 into runl-generalized-weights Nov 1, 2023

rfl-urbaniak deleted the ru-divergence-search branch November 2, 2023 06:51
