Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiment the effect of removing perCapita stop words #4415

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

shifucun
Copy link
Contributor

No description provided.

Copy link
Contributor

@chejennifer chejennifer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to be losing all those startWithDenom: trues?

The other diffs look like they're further down in the charts, so seem ok

Copy link
Contributor

@pradh pradh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! how does the svindex_differ result look?

"Count_CriminalActivities_LarcenyTheft",
"Count_CriminalActivities_Robbery",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea on this diff?

@@ -31,7 +31,6 @@
}
],
"denom": "Count_Person",
"startWithDenom": true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we are losing startWithDenom?

# WARNING: These will conflate with Correlation
"vs",
"versus",
QUERY_CLASSIFICATION_HEURISTICS: Dict[str, Union[List[str], Dict[str, List[str]]]] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything actually changing in this dict, or are these just formatting diffs?

"title": "Median Family Household Income (${yDate}) Vs. Diabetes (${xDate}) in Counties of United States",
"type": "SCATTER"
}
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why we lose this?

Comment on lines +308 to +309
"\brate\b(?!\s*(birth|change|death|exchange|fertility|literacy|mortality|participation|unemployment|withdrawal)\s+rate)",
"rates"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the actual change


# We do not want to strip words from events / superlatives / temporal
# since we want those to match SVs too!
HEURISTIC_TYPES_IN_VARIABLES = frozenset(
["Event", "Superlative", "Temporal", "PerCapita"])
HEURISTIC_TYPES_IN_VARIABLES = frozenset(["Event", "Superlative", "Temporal"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the actual change

@chejennifer
Copy link
Contributor

svindex differ result: https://storage.mtls.cloud.google.com/datcom-embedding-diffs/chejennifer_base_uae_mem_2024_07_09_15_03_08.html

base does not strip per capita stop words
test strips the per capita stop words (including the new ones added in this pr)

chejennifer added a commit that referenced this pull request Jul 12, 2024
same change as #4415 but
rebased off a clean master

svindex diff:
https://storage.mtls.cloud.google.com/datcom-embedding-diffs/chejennifer_base_uae_mem_2024_07_09_21_55_14.html

base does not remove per capita stop words
test removes per capita stop words
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants