-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experiment the effect of removing perCapita stop words #4415
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to be losing all those startWithDenom: true
s?
The other diffs look like they're further down in the charts, so seem ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! how does the svindex_differ result look?
"Count_CriminalActivities_LarcenyTheft", | ||
"Count_CriminalActivities_Robbery", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea on this diff?
@@ -31,7 +31,6 @@ | |||
} | |||
], | |||
"denom": "Count_Person", | |||
"startWithDenom": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we are losing startWithDenom?
# WARNING: These will conflate with Correlation | ||
"vs", | ||
"versus", | ||
QUERY_CLASSIFICATION_HEURISTICS: Dict[str, Union[List[str], Dict[str, List[str]]]] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything actually changing in this dict, or are these just formatting diffs?
"title": "Median Family Household Income (${yDate}) Vs. Diabetes (${xDate}) in Counties of United States", | ||
"type": "SCATTER" | ||
} | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why we lose this?
"\brate\b(?!\s*(birth|change|death|exchange|fertility|literacy|mortality|participation|unemployment|withdrawal)\s+rate)", | ||
"rates" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the actual change
|
||
# We do not want to strip words from events / superlatives / temporal | ||
# since we want those to match SVs too! | ||
HEURISTIC_TYPES_IN_VARIABLES = frozenset( | ||
["Event", "Superlative", "Temporal", "PerCapita"]) | ||
HEURISTIC_TYPES_IN_VARIABLES = frozenset(["Event", "Superlative", "Temporal"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the actual change
svindex differ result: https://storage.mtls.cloud.google.com/datcom-embedding-diffs/chejennifer_base_uae_mem_2024_07_09_15_03_08.html base does not strip per capita stop words |
same change as #4415 but rebased off a clean master svindex diff: https://storage.mtls.cloud.google.com/datcom-embedding-diffs/chejennifer_base_uae_mem_2024_07_09_21_55_14.html base does not remove per capita stop words test removes per capita stop words
No description provided.