Use Paramspace to automate file naming scheme based on wildcards #36
Labels
feature
A new feature request or enhancement
Comments
A quick proof of concept:

    import pandas as pd
    import yaml
    from snakemake.utils import Paramspace

    with open('config/robust.yml', 'r') as infile:
        config = yaml.load(infile, Loader=yaml.Loader)
    ignore_keys = ['dataset-csv', 'outcome-colname', 'hyperparams', 'find-feature-importance', 'nseeds', 'ncores']
    for k in ignore_keys:
        config.pop(k)
    config_df = pd.DataFrame.from_dict(config)
    paramspace = Paramspace(config_df, param_sep="_")
    print('paramspace.wildcard_pattern:\t', paramspace.wildcard_pattern)
    print('paramspace.instance_patterns:\t', list(paramspace.instance_patterns))

output:
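To illustrate what those two attributes hold, here is a minimal standard-library sketch. The segment layout (one `name<sep>{name}` segment per column, joined with `/`) follows what the Snakemake docs describe for Paramspace, but the helper functions and the parameter names and values below are hypothetical stand-ins, not Snakemake's implementation.

```python
# Hypothetical parameter table: one dict per row, like the rows of config_df.
rows = [
    {"method": "glmnet", "seed": 100},
    {"method": "glmnet", "seed": 101},
    {"method": "rf", "seed": 100},
]


def wildcard_pattern(columns, param_sep="~"):
    """Build a path template with one `name<sep>{name}` segment per column."""
    return "/".join(f"{col}{param_sep}{{{col}}}" for col in columns)


def instance_patterns(rows, param_sep="~"):
    """Yield the template filled in with the values of each row."""
    for row in rows:
        yield wildcard_pattern(row.keys(), param_sep).format(**row)


pattern = wildcard_pattern(rows[0].keys(), param_sep="_")
instances = list(instance_patterns(rows, param_sep="_"))
print(pattern)
print(instances)
```

The template is what a rule's output path would embed, while the filled-in instances are the concrete paths a target rule would request.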
kelly-sovacool added a commit that referenced this issue on Jan 18, 2023
Now with permutations of lists in config, similar to R's

    from itertools import product

    import pandas as pd
    import yaml
    from snakemake.utils import Paramspace

    with open('config/robust.yml', 'r') as infile:
        config = yaml.load(infile, Loader=yaml.Loader)

    # Drop settings that should not become part of the file naming scheme.
    ignore_keys = ['dataset_csv', 'outcome_colname', 'hyperparams', 'find_feature_importance', 'ncores', 'nseeds']
    for k in ignore_keys:
        config.pop(k, None)
    config['seed'] = list(range(100, 102))

    # One row per combination of the list-valued config entries.
    conf_lists = {k: v for k, v in config.items() if isinstance(v, list)}
    params_df = pd.DataFrame(list(product(*conf_lists.values())), columns=list(conf_lists.keys()))

    # Remaining scalar entries are constant across all rows.
    for k in conf_lists.keys():
        config.pop(k)
    for k, v in config.items():
        params_df[k] = v

    paramspace = Paramspace(params_df, param_sep="_")
    print('paramspace.wildcard_pattern:\t', paramspace.wildcard_pattern)
    print('paramspace.instance_patterns:\t', list(paramspace.instance_patterns))

output:
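The expansion step above can be tried without pandas or Snakemake. The following is a minimal sketch under assumptions: the config keys and values are hypothetical, and `expand_config` is an illustrative helper, not part of this repository or of Snakemake.

```python
from itertools import product

# Hypothetical config: list values are expanded, scalar values are held constant.
config = {
    "ml_methods": ["glmnet", "rf"],
    "seed": [100, 101],
    "kfold": 5,
}


def expand_config(config):
    """Return one dict per combination of the list-valued entries."""
    list_items = {k: v for k, v in config.items() if isinstance(v, list)}
    constants = {k: v for k, v in config.items() if not isinstance(v, list)}
    rows = []
    for combo in product(*list_items.values()):
        row = dict(zip(list_items.keys(), combo))
        row.update(constants)  # repeat scalar settings on every row
        rows.append(row)
    return rows


rows = expand_config(config)
for row in rows:
    print(row)
```

Two lists of two values each give four rows, each carrying the scalar setting unchanged. A list of dicts like this can be passed to `pd.DataFrame(rows)` to get a table of the same shape as `params_df`.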
Currently, wildcards are hardcoded in I/O filenames. However, users might want to repeat model training with different parameters (e.g. different outcomes to investigate the same dataset at different taxonomic levels, etc.). Using Paramspace would make the rule definitions more general instead of hard-coded. See the main snakemake docs and the snakemake.utils api docs for how Paramspace() works.
TODO