limiting outcome comparison to a period, filtering out by percentile in the last year #36

Merged
merged 97 commits into from
Nov 1, 2023

rfl-urbaniak
Contributor

Resolves #32.

You can now restrict the years over which the outcome variable is compared in similarity calculations, and filter the Euclidean kins output by outcome percentile in the last year, like so:

f = FipsQuery(42001, "gdp", lag=0, top=5, time_decay=1.06,
              outcome_comparison_period=(2003, 2019), outcome_percentile_range=(40, 100))

A working example is available in similarity_demo.ipynb.
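As a rough illustration of the percentile filter, here is a minimal standalone sketch. It is not the FipsQuery implementation: the helper names, the percentile-rank definition (share of locations with an outcome at or below the given one), the inclusive bounds, and the data are all assumptions made for the example.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Map each key to the percentage of values that are <= its own value.

    This rank definition is an assumption; the library may use another one.
    """
    ordered = sorted(values.values())
    n = len(ordered)
    return {key: 100.0 * bisect_right(ordered, value) / n
            for key, value in values.items()}


def filter_by_percentile(last_year_outcome, percentile_range):
    """Keep the keys whose last-year outcome rank lies in the inclusive range."""
    low, high = percentile_range
    ranks = percentile_ranks(last_year_outcome)
    return [key for key, rank in ranks.items() if low <= rank <= high]


# Made-up last-year outcome values keyed by made-up location codes.
last_year_gdp = {1001: 90.0, 1003: 120.0, 1005: 70.0, 1007: 150.0, 1009: 100.0}

kept = filter_by_percentile(last_year_gdp, (40, 100))
print(kept)  # location 1005 has the lowest value and is dropped
```

Under these assumptions, a range of (40, 100) drops only the locations whose last-year outcome is in the bottom part of the distribution.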

@rfl-urbaniak rfl-urbaniak changed the base branch from main to runl-generalized-weights October 23, 2023 12:23
Added population and generalized data loading and euclidean distances to multiple features
@rfl-urbaniak
Contributor Author

@riadas please investigate the workflow failure; it seems to be related to dependency installation. All tests pass locally.

@rfl-urbaniak rfl-urbaniak changed the title limining outcome comparison to a period, filtering out by percentile in the last year limiting outcome comparison to a period, filtering out by percentile in the last year Oct 23, 2023
@rfl-urbaniak
Contributor Author

Really great functionality already, some remaining design stuff to figure out, but already quite powerful!

I'm happy for this to be merged, though there are some things to be fixed -- please make issues for things that will be fixed in future PRs.

In addition to what's in my and Andy's comments:

* [ ]  In similarity_demo.ipynb (see codeblock below), the FipsQuery option outcome_comparison_period doesn't work correctly. Setting it to (2003, 2010) yields weights between 2001 and 2008. Setting it to (2010, 2023) yields weights between 2001 and 2011. Setting it to (2003, 2019) yields weights between 2001 and 2017.
f  = FipsQuery(42001, "gdp", lag = 0, top =5, time_decay = 1.06, 
               outcome_comparison_period=(2003, 2010), outcome_percentile_range= (40,100))
f.find_euclidean_kins()
display(f.plot_weights()) 
* [ ]  Also debug the FipsQuery option outcome_comparison_period for the case of multiple feature groups. This behaves badly (different years for the different variables, neither matching what is specified):
f  = FipsQuery(1007, outcome_var = "gdp",
               feature_groups_with_weights= {"gdp": 1, "population":1}, #with one feature group only
               # weights 1-4 won't make a difference
               lag = 0, top =5, time_decay = 1.5, outcome_comparison_period=(2003, 2010))

Yeah, that seems like a bug. Thanks for spotting it! Will get on it once the first version of the explainability stuff is ready.
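The symptom in the first item could be captured by a small regression check along these lines. This is only a sketch: `years_outside_period` is a hypothetical helper, the weights are stand-ins for whatever the query actually stores, and treating the period endpoints as inclusive is an assumption.

```python
def years_outside_period(weights_by_year, period):
    """Return the years with a nonzero weight that fall outside the period.

    The period is treated as inclusive on both ends (an assumption).
    """
    start, end = period
    return sorted(year for year, weight in weights_by_year.items()
                  if weight != 0 and not (start <= year <= end))


# Made-up weights mimicking the reported symptom:
# (2003, 2010) was requested, but 2001-2008 received weights.
reported = {year: 1.0 for year in range(2001, 2009)}
# Made-up weights for the intended behavior.
intended = {year: 1.0 for year in range(2003, 2011)}

bad = years_outside_period(reported, (2003, 2010))
ok = years_outside_period(intended, (2003, 2010))
print(bad)  # [2001, 2002]
print(ok)   # []
```

An empty result means every weighted year lies inside the requested period; note that this check alone does not detect years that are missing from the end of the period.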

@rfl-urbaniak
Contributor Author

@riadas @emackev the bug has been fixed; inspect the first few cells of the similarity notebook to confirm the results are as desired. @emackev now that the first issue has been resolved, the second is not a bug but a feature. You should expect different years for different variables, because the weights are restricted to certain years for the outcome variable only, and whatever data is available is used for the other variables. The other variables differ in which years are available.

In a discussion about this feature we talked about locations that "have similar outcome variable for years x-y and are similar in other respects", and this implements the idea. If you want comparison years to restrict all variables in similarity search, this can be implemented, but I'm not sure if this is what you want.
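The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the actual code: `comparison_years` is a hypothetical helper, the period is treated as inclusive, and the year ranges are made up.

```python
def comparison_years(available_years, outcome_var, outcome_comparison_period):
    """Pick the years used for each variable in the similarity search.

    Only the outcome variable is restricted to the comparison period;
    every other variable keeps all the years it has data for.
    """
    start, end = outcome_comparison_period
    selected = {}
    for var, years in available_years.items():
        if var == outcome_var:
            selected[var] = [y for y in years if start <= y <= end]
        else:
            selected[var] = list(years)
    return selected


# Made-up availability: the two variables cover different years.
available = {
    "gdp": list(range(2001, 2022)),
    "population": list(range(1993, 2022)),
}

selected = comparison_years(available, "gdp", (2003, 2010))
print(selected["gdp"][0], selected["gdp"][-1])                # 2003 2010
print(selected["population"][0], selected["population"][-1])  # 1993 2021
```

Restricting all variables to the comparison period would amount to applying the same filter in both branches, which is the alternative mentioned above.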

Collaborator

@riadas riadas left a comment

Things are working well enough now on the frontend, so I think this is good to go.

@riadas riadas merged commit 37574d9 into runl-generalized-weights Nov 1, 2023
@rfl-urbaniak rfl-urbaniak deleted the ru-divergence-search branch November 2, 2023 06:51
Development

Successfully merging this pull request may close these issues.

filtering by percentile on outcome variable
5 participants