feat(datarelations): Data Relations MVP #13

Merged: 5 commits, Sep 20, 2021


jfsantos-ds
Contributor

No description provided.

@jfsantos-ds jfsantos-ds self-assigned this Aug 31, 2021
@jfsantos-ds jfsantos-ds marked this pull request as ready for review September 7, 2021 20:21
@UrbanoFonseca
Contributor

Don't forget to add this functionality to the main DataQuality class (I forgot for the Bias&Fairness PR 🤦)

Review threads (all resolved):

  • src/ydata_quality/data_relations/__init__.py (outdated)
  • src/ydata_quality/data_relations/engine.py (4 threads, outdated)
  • src/ydata_quality/utils/auxiliary.py (outdated)
  • src/ydata_quality/utils/correlations.py (outdated)
  • tutorials/data_relations.ipynb (3 threads, 1 outdated)
@UrbanoFonseca
Contributor

really nice module 🚀 I'm looking to try this out on a full dataset

@UrbanoFonseca
Contributor

UrbanoFonseca commented Sep 15, 2021

Pending:

  • Split the correlation and partial correlation plots into two separate figures
  • Update notebook
  • Update census dataset file path

@UrbanoFonseca
Contributor

@jfsantos-ds the current PR still includes the previous dataset; please remove it before merging

Review threads (all resolved):

  • tutorials/data_relations.ipynb (6 threads, outdated)
  • src/ydata_quality/data_relations/engine.py (2 threads)
@jfsantos-ds
Contributor Author

Known issues with the correlations matrix:

  • Odd behavior for categorical columns whose values are all unique (or nearly all unique): these easily produce 100% correlations. I still have to think about the expected behavior for this scenario.

  • The partial correlation matrix is not robust to duplicate columns (I think this is because it tries to invert a rank-deficient matrix). To handle this, we can drop one column from the correlation matrix for each duplicate pair found.
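
Both issues can be reproduced with a minimal sketch. It is not the code in `engine.py` or `correlations.py`: it assumes a Cramér's V style association measure for categorical pairs and assumes that partial correlations come from inverting the correlation matrix. The helper names (`cramers_v`, `is_rank_deficient`, `drop_duplicate_columns`, `partial_correlations`) are hypothetical and only NumPy is used.

```python
import numpy as np


def cramers_v(x, y):
    """Cramér's V between two categorical arrays (assumed measure, no bias correction)."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)  # contingency table of co-occurrence counts
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))


def is_rank_deficient(corr, tol=1e-8):
    """True when the correlation matrix cannot be reliably inverted."""
    return bool(np.linalg.matrix_rank(corr, tol=tol) < corr.shape[0])


def drop_duplicate_columns(data):
    """Keep the first column of every group of exactly identical columns."""
    keep = []
    for j in range(data.shape[1]):
        if not any(np.array_equal(data[:, j], data[:, k]) for k in keep):
            keep.append(j)
    return data[:, keep], keep


def partial_correlations(corr):
    """Partial correlations from the inverse (precision) of a correlation matrix."""
    precision = np.linalg.inv(corr)
    diag = np.diag(precision)
    partial = -precision / np.sqrt(np.outer(diag, diag))
    np.fill_diagonal(partial, 1.0)
    return partial


rng = np.random.default_rng(0)

# Issue 1: two unrelated, all-unique categorical columns look perfectly associated.
ids_a = np.array([f"a{i}" for i in range(50)])
ids_b = rng.permutation(np.array([f"b{i}" for i in range(50)]))
v_unique = cramers_v(ids_a, ids_b)  # 1.0 up to rounding

# Issue 2: a duplicated column makes the correlation matrix rank deficient.
base = rng.normal(size=(200, 3))
data = np.column_stack([base, base[:, 0]])  # 4th column duplicates the 1st
rank_deficient = is_rank_deficient(np.corrcoef(data, rowvar=False))

# Proposed handling: drop one column per duplicate pair, then invert.
deduped, kept = drop_duplicate_columns(data)
partial = partial_correlations(np.corrcoef(deduped, rowvar=False))
```

Checking the rank before inverting avoids relying on `np.linalg.inv` raising an error, since a numerically singular matrix may instead return very large, meaningless values. Dropping exact duplicates does not cover columns that are only nearly collinear; those would need a separate threshold.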

@UrbanoFonseca UrbanoFonseca merged commit a8ad157 into master Sep 20, 2021
@portellaa portellaa deleted the feat/data_relations branch September 23, 2021 14:53