refactor: save preprocessed data into one .hdf5 file as default #250

Merged: gcroci2 merged 26 commits into main from gcroci2_181_preprocess_one_file on Nov 25, 2022

gcroci2
Collaborator

@gcroci2 gcroci2 commented Nov 18, 2022

Preprocessed data are now saved into a single .hdf5 file by default. Users can still keep the data in separate files by setting the combine_files parameter of the preprocess function to False.

This is the current order of arguments in preprocess:

def preprocess(
    queries: List[Query],
    prefix: Optional[str] = None,
    process_count: Optional[int] = None,
    combine_files: bool = True,
    feature_modules: Union[List[ModuleType], str] = "all",
):

With this order, the default case can be called positionally as preprocess(queries, prefix). This is expected to be the most common usage, especially for users who are new to the package; selecting only a subset of features from the start is meant for "expert" users who already have a good understanding of the features themselves. If feature_modules (or, likewise, combine_files) were placed anywhere other than among the last arguments, callers would always have to pass the arguments by keyword, as in preprocess(queries=queries, prefix=prefix).
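To make the ordering argument concrete, here is a minimal sketch using two stub functions. Both stubs are hypothetical stand-ins, not the deeprankcore implementation: preprocess_current mirrors the argument order shown above, preprocess_reordered is an assumed alternative with feature_modules moved forward, and plain strings stand in for Query objects. The stubs only report how their arguments were bound.

```python
from types import ModuleType
from typing import List, Optional, Union


# Hypothetical stub mirroring the argument order in this PR.
# It does no preprocessing; it only echoes the bound arguments.
def preprocess_current(
    queries: List[str],  # plain strings stand in for Query objects
    prefix: Optional[str] = None,
    process_count: Optional[int] = None,
    combine_files: bool = True,
    feature_modules: Union[List[ModuleType], str] = "all",
) -> dict:
    return {
        "prefix": prefix,
        "process_count": process_count,
        "combine_files": combine_files,
        "feature_modules": feature_modules,
    }


# Hypothetical alternative order: feature_modules placed before prefix.
def preprocess_reordered(
    queries: List[str],
    feature_modules: Union[List[ModuleType], str] = "all",
    prefix: Optional[str] = None,
    process_count: Optional[int] = None,
    combine_files: bool = True,
) -> dict:
    return {
        "prefix": prefix,
        "process_count": process_count,
        "combine_files": combine_files,
        "feature_modules": feature_modules,
    }


queries = ["query_1", "query_2"]

# Current order: the second positional argument is bound to prefix.
current = preprocess_current(queries, "out/run")

# Alternative order: the same call silently binds the string to feature_modules.
reordered_positional = preprocess_reordered(queries, "out/run")

# Alternative order: keywords are needed to reach prefix safely.
reordered_keyword = preprocess_reordered(queries=queries, prefix="out/run")

print(current)
print(reordered_positional)
print(reordered_keyword)
```

With the alternative order, the positional call does not raise an error; it binds the prefix string to feature_modules, which is why keyword arguments would become necessary there.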

That said, we can consider changing this order, @DaniBodor.

@gcroci2 gcroci2 requested a review from DaniBodor November 18, 2022 10:37
@gcroci2 gcroci2 linked an issue Nov 18, 2022 that may be closed by this pull request
@gcroci2 gcroci2 requested a review from cbaakman November 22, 2022 12:37
@gcroci2 gcroci2 merged commit 59c9be4 into main Nov 25, 2022
@gcroci2 gcroci2 deleted the gcroci2_181_preprocess_one_file branch November 25, 2022 16:12
Successfully merging this pull request may close these issues.

Save preprocessed data in one hdf5 file
2 participants