Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingest different df types + tests #134

Conversation

artiom-matvei
Copy link
Contributor

No description provided.

@artiom-matvei artiom-matvei force-pushed the fix_issue_108_ingest_narwhals branch from ca086b6 to a7a60cd Compare October 26, 2024 03:09
@artiom-matvei artiom-matvei force-pushed the fix_issue_108_ingest_narwhals branch from a7a60cd to cb4d158 Compare October 26, 2024 03:17
@artiom-matvei
Copy link
Contributor Author

@vincentarelbundock This one should do the trick for issue #108
These tests were run manually to make sure it accepts pandas and duckdb

import polars as pl
import statsmodels.formula.api as smf
from marginaleffects import *
import numpy as np
import pandas as pd
penguins = pd.read_csv(
    "tests/data/penguins.csv",
    na_values="NA"
).dropna()
mod = smf.ols(
  "body_mass_g ~ flipper_length_mm * species * bill_length_mm + island",
  penguins,
).fit()
p = predictions(mod, newdata=penguins)
import polars as pl
import statsmodels.formula.api as smf
from marginaleffects import *
import numpy as np
import pandas as pd
penguins = pd.read_csv(
    "tests/data/penguins.csv",
    na_values="NA"
).dropna()
import duckdb
df_duckdb = duckdb.sql("SELECT * FROM penguins LIMIT 10")
mod = smf.ols(
  "body_mass_g ~ flipper_length_mm * species * bill_length_mm + island",
  penguins,
).fit()
p = predictions(mod, newdata=df_duckdb)

@vincentarelbundock
Copy link
Owner

pandas is still used in hypotheses_joint.py. Once that is replaced, pandas should be placed in the "extras" list of pyproject.toml, and ideally pyarrow as well, unless we have a compelling reason to keep it there.

Comment on lines -13 to -15
if isinstance(obj, pd.DataFrame):
obj = pl.DataFrame(obj)

Copy link
Contributor Author

@artiom-matvei artiom-matvei Oct 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like obj is of type model and it shouldn't ever be of type dataframe. It was introduced here #94 but not sure why. I also checked in R and obj is not a dataframe in R either.

If it were to be a dataframe all the following operations would fail so I think we can remove it. @vincentarelbundock do you agree?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems right but I won't have time to dig deep, so I'll trust you here.

@artiom-matvei artiom-matvei marked this pull request as ready for review October 27, 2024 01:05
@artiom-matvei
Copy link
Contributor Author

@vincentarelbundock please review and don't hesitate to share your comments.
When merged, please, let me know what would be the next steps. Thank you!

@vincentarelbundock vincentarelbundock merged commit d1d4d24 into vincentarelbundock:main Oct 27, 2024
5 checks passed
@artiom-matvei artiom-matvei deleted the fix_issue_108_ingest_narwhals branch October 27, 2024 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants