[bug] training xgboost doesn't work with dataframe, only numpy array #1

Open
Yarden234 opened this issue Jul 14, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@Yarden234

Hello, and thank you for this package.
I ran into a problem while trying to use an XGBoost model that was trained on a DataFrame.
Here is my code:

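from xgboost import XGBClassifier
from tree_influence.explainers import BoostIn

# load_csv is a helper (not shown) that returns a pandas DataFrame
X_train, X_test, y_train, y_test = load_csv('X_train'), load_csv('X_test'), load_csv('y_train'), load_csv('y_test')
model = XGBClassifier(tree_method='hist')
# NumPy copies of the same data (fitting on these works, see below)
X_train_val, y_train_vals = X_train.values, y_train.values.squeeze()
X_test_val, y_test = X_test.values, y_test.values.squeeze()
# fit the model on the DataFrames
model.fit(X_train, y_train)

# fit influence estimator
explainer = BoostIn().fit(model, X_train, y_train)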

This raises the following exception:

Traceback (most recent call last):
  File "/home/jupyter/owlytics-data-science/influence/influence.py", line 35, in <module>
    explainer = BoostIn().fit(model, X_train, y_train)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/boostin.py", line 44, in fit
    super().fit(model, X, y)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/base.py", line 31, in fit
    self.model_ = parse_model(model, X, y)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/__init__.py", line 33, in parse_model
    trees, params = parse_xgb_ensemble(model)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 17, in parse_xgb_ensemble
    trees = np.array([_parse_xgb_tree(tree_str) for tree_str in string_data], dtype=np.dtype(object))
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 17, in <listcomp>
    trees = np.array([_parse_xgb_tree(tree_str) for tree_str in string_data], dtype=np.dtype(object))
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 88, in _parse_xgb_tree
    node_dict = _parse_line(line)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 190, in _parse_line
    res['feature'], res['threshold'] = _parse_decision_node_line(line)
  File "/opt/conda/envs/py39/lib/python3.9/site-packages/tree_influence/explainers/parsers/parser_xgb.py", line 201, in _parse_decision_node_line
    feature_ndx = int(feature_str[1:])
ValueError: invalid literal for int() with base 10: 'ecent_beta_blockers_change'

However, training on X_train_val and y_train_vals (which are NumPy arrays) works perfectly.
It would be great if you could support training with DataFrames as well.
Thanks again!
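
The last frame of the traceback points at `int(feature_str[1:])`, which suggests the parser expects XGBoost's default `f<index>` feature names in the text dump. When a model is fitted on a DataFrame, XGBoost writes the column names into the dump instead, so stripping the first character leaves a non-numeric string. The following is a minimal sketch of that failure and of one way a parser could tolerate named features. It is not the package's actual code: `parse_split`, its `feature_names` argument, and the column names are hypothetical and chosen only for illustration.

```python
import re

# A split line in an XGBoost text dump looks like
#   "0:[f12<0.5] yes=1,no=2,missing=1"
# When the model was fitted on a DataFrame, the column name appears
# in place of "f12".
SPLIT_RE = re.compile(r"\[(?P<feature>.+)<(?P<threshold>[^<\]]+)\]")


def parse_split(line, feature_names=None):
    """Return (feature_index, threshold) for one decision-node line.

    feature_names is a hypothetical argument: the training columns, in order.
    """
    match = SPLIT_RE.search(line)
    if match is None:
        raise ValueError(f"not a decision-node line: {line!r}")
    feature = match["feature"]
    threshold = float(match["threshold"])

    # Prefer an exact column-name match when names are supplied.
    if feature_names is not None and feature in feature_names:
        return feature_names.index(feature), threshold
    # Otherwise assume the default "f<index>" naming.
    return int(feature[1:]), threshold


columns = ["age", "recent_change"]  # hypothetical column names

print(parse_split("0:[f1<0.5] yes=1,no=2,missing=1"))
# (1, 0.5)
print(parse_split("0:[recent_change<0.5] yes=1,no=2,missing=1", columns))
# (1, 0.5)

try:
    # Without the column names, this fails the same way as the traceback.
    parse_split("0:[recent_change<0.5] yes=1,no=2,missing=1")
except ValueError as err:
    print(err)
# invalid literal for int() with base 10: 'ecent_change'
```

Until DataFrame input is handled, fitting the model on `X_train.values` and `y_train.values.squeeze()`, as described above, keeps the default feature names in the dump and avoids the error.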

jjbrophy47 added the bug label Jul 14, 2022
@jjbrophy47
Owner

Hi Yarden234! Thanks for bringing this up. I believe I've fixed this issue now in version 0.1.1. Please give it a try and feel free to open this issue back up if it's not working. Thanks again!
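
Before reporting that the error persists, it may help to confirm that the environment really has version 0.1.1 or later installed. The sketch below uses only the standard library. The distribution name `tree-influence` is an assumption (adjust it if the package was installed under a different name), and the version comparison is deliberately simplistic: it only looks at the leading numeric components.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def as_tuple(version_string):
    """Turn '0.1.1' into (0, 1, 1), stopping at the first non-numeric part."""
    parts = []
    for part in version_string.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def at_least(found, minimum):
    """Compare two plain dotted version strings."""
    return as_tuple(found) >= as_tuple(minimum)


found = installed_version("tree-influence")  # assumed distribution name
if found is None:
    print("tree-influence was not found in this environment")
elif at_least(found, "0.1.1"):
    print(f"installed version {found} is at or above 0.1.1")
else:
    print(f"installed version {found} predates 0.1.1; try upgrading")
```

Including the printed version in a follow-up comment makes it easier to tell an outdated install apart from a case the fix does not cover.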

@aclarkse

Hi there,

I am still encountering this error. Could you take another look at it? Thanks!

jjbrophy47 reopened this Mar 30, 2024
@jjbrophy47
Owner

Hi @aclarkse, can you provide a fully reproducible example, please?


3 participants