loading pretrained tokenizers returns exception when installing package from git #97

Open
sivanravidos opened this issue Jan 23, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@sivanravidos
Collaborator

Describe the bug
When the package is installed from git with pip install, the compiled tokenizer files are not installed.

To reproduce

  1. install from git:
    pip install git+https://github.com/BiomedSciAI/fuse-med-ml.git
    pip install git+https://github.com/BiomedSciAI/fuse-drug.git

  2. Try to use a pre-trained tokenizer:

import os
from fusedrug.data.tokenizer.ops import FastModularTokenizer
from fusedrug.data.tokenizer.modulartokenizer import pretrained_tokenizers
tokenizer_path = os.path.join(pretrained_tokenizers.get_dir_path(), 'modular_AA_SMILES_genes_single_path')
tokenizer_op = FastModularTokenizer(tokenizer_path=tokenizer_path)

This results in the following exception:

>>> FastModularTokenizer(tokenizer_path=tokenizer_path)
Traceback (most recent call last):
  File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/modular_tokenizer.py", line 440, in load
    loaded_conf: omegaconf.dictconfig.DictConfig = OmegaConf.load(
  File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/omegaconf/omegaconf.py", line 189, in load
    with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/pretrained_tokenizers/modular_AA_SMILES_genes_single_path/config.yaml'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/ops/modular_tokenizer_ops.py", line 47, in __init__
    self._tokenizer = Tokenizer.from_file(self._tokenizer_path)
  File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/modular_tokenizer.py", line 1545, in from_file
    return ModularTokenizer.load(path)
  File "/dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/modular_tokenizer.py", line 445, in load
    raise Exception(f"couldn't load config.yaml from {path}")
Exception: couldn't load config.yaml from /dccstor/fmm2/sivanra/anaconda3/envs/test_fuse_drug/lib/python3.8/site-packages/fusedrug/data/tokenizer/modulartokenizer/pretrained_tokenizers/modular_AA_SMILES_genes_single_path

Expected behavior
The tokenizer files should be added to setup.py so that they are installed together with the Python files.
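A minimal sketch of how the tokenizer files could be declared as package data. This is an assumption about the fix, not the project's actual setup.py: the helper name `find_data_files` is hypothetical, and the directory layout is inferred from the paths in the traceback. setuptools resolves `package_data` entries relative to the package directory, so the helper returns paths relative to it.

```python
# Sketch only: the real setup.py of fuse-drug may be organized differently.
import os
from typing import List


def find_data_files(package_dir: str, data_subdir: str) -> List[str]:
    """Return every file under package_dir/data_subdir, relative to package_dir.

    The result can be passed as a package_data value, which setuptools
    interprets relative to the package directory.
    """
    root = os.path.join(package_dir, data_subdir)
    collected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            collected.append(os.path.relpath(full_path, package_dir))
    return sorted(collected)


# Hypothetical usage inside setup.py (directory names taken from the traceback):
#
# from setuptools import setup, find_packages
#
# setup(
#     packages=find_packages(),
#     package_data={
#         "fusedrug": find_data_files(
#             "fusedrug",
#             os.path.join("data", "tokenizer", "modulartokenizer", "pretrained_tokenizers"),
#         )
#     },
# )
```

Listing the files explicitly keeps the set of installed data visible in one place. Glob patterns in `package_data` would be an alternative, but each nesting level then needs its own pattern.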

@sivanravidos added the "bug" label on Jan 23, 2024
@sivanravidos changed the title from "Package should be installed with the data files" to "loading pretrained tokenizers returns exception when installing package from git" on Jan 23, 2024
@mmdanziger

This looks like a package data issue. The real question here is: how is CI passing?
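One possible explanation, which is an assumption and not confirmed in this thread, is that CI runs the tests against a source checkout or an editable install, where the data files are present on disk regardless of the packaging configuration. A check along the following lines, run against a regular (non-editable) install, might catch this kind of regression. The helper name `missing_tokenizer_configs` is hypothetical; only `pretrained_tokenizers.get_dir_path()` and the tokenizer directory name come from the report.

```python
# Sketch of a packaging sanity check; not part of the fuse-drug test suite.
import os
from typing import Iterable, List


def missing_tokenizer_configs(pretrained_dir: str, expected_names: Iterable[str]) -> List[str]:
    """Return the expected tokenizer names whose config.yaml is not on disk."""
    missing = []
    for name in expected_names:
        config_path = os.path.join(pretrained_dir, name, "config.yaml")
        if not os.path.isfile(config_path):
            missing.append(name)
    return missing


# Hypothetical usage in a test that runs after a plain `pip install`:
#
# from fusedrug.data.tokenizer.modulartokenizer import pretrained_tokenizers
#
# missing = missing_tokenizer_configs(
#     pretrained_tokenizers.get_dir_path(),
#     ["modular_AA_SMILES_genes_single_path"],
# )
# assert not missing, f"tokenizer data not installed: {missing}"
```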
