Add support for pipeline digest in JSON and YAML format #238

Merged
merged 2 commits into from
Oct 8, 2019

janosh
Member

@janosh janosh commented Oct 7, 2019

Let me know if you would like me to add some tests. I tried running the ones that are already there, but pytest has been stuck on test_adaptors.py for half an hour now.

@janosh
Member Author

janosh commented Oct 7, 2019

Here's an example of a YAML digest:

_logger: null
autofeaturizer:
  autofeaturizer:
    _logger: null
    auto_featurizer: true
    bandstruct_col: bandstructure
    cache_src: null
    composition_col: composition
    converted_input_df:
      columns: 3
      obj: <not serializable>
      samples: 537
    do_precheck: true
    dos_col: dos
    drop_inputs: true
    exclude: []
    features:
    - MagpieData minimum Number
    - MagpieData maximum Number
    - MagpieData range Number
    - MagpieData mean Number
    - MagpieData avg_dev Number
    - MagpieData mode Number
    - MagpieData minimum MendeleevNumber
    - MagpieData maximum MendeleevNumber
    - MagpieData range MendeleevNumber
    - MagpieData mean MendeleevNumber
    - MagpieData avg_dev MendeleevNumber
    - MagpieData mode MendeleevNumber
    - MagpieData minimum AtomicWeight
    - MagpieData maximum AtomicWeight
    - MagpieData range AtomicWeight
    - MagpieData mean AtomicWeight
    - MagpieData avg_dev AtomicWeight
    - MagpieData mode AtomicWeight
    - MagpieData minimum MeltingT
    - MagpieData maximum MeltingT
    - MagpieData range MeltingT
    - MagpieData mean MeltingT
    - MagpieData avg_dev MeltingT
    - MagpieData mode MeltingT
    - MagpieData minimum Column
    - MagpieData maximum Column
    - MagpieData range Column
    - MagpieData mean Column
    - MagpieData avg_dev Column
    - MagpieData mode Column
    - MagpieData minimum Row
    - MagpieData maximum Row
    - MagpieData range Row
    - MagpieData mean Row
    - MagpieData avg_dev Row
    - MagpieData mode Row
    - MagpieData minimum CovalentRadius
    - MagpieData maximum CovalentRadius
    - MagpieData range CovalentRadius
    - MagpieData mean CovalentRadius
    - MagpieData avg_dev CovalentRadius
    - MagpieData mode CovalentRadius
    - MagpieData minimum Electronegativity
    - MagpieData maximum Electronegativity
    - MagpieData range Electronegativity
    - MagpieData mean Electronegativity
    - MagpieData avg_dev Electronegativity
    - MagpieData mode Electronegativity
    - MagpieData minimum NsValence
    - MagpieData maximum NsValence
    - MagpieData range NsValence
    - MagpieData mean NsValence
    - MagpieData avg_dev NsValence
    - MagpieData mode NsValence
    - MagpieData minimum NpValence
    - MagpieData maximum NpValence
    - MagpieData range NpValence
    - MagpieData mean NpValence
    - MagpieData avg_dev NpValence
    - MagpieData mode NpValence
    - MagpieData minimum NdValence
    - MagpieData maximum NdValence
    - MagpieData range NdValence
    - MagpieData mean NdValence
    - MagpieData avg_dev NdValence
    - MagpieData mode NdValence
    - MagpieData minimum NfValence
    - MagpieData maximum NfValence
    - MagpieData range NfValence
    - MagpieData mean NfValence
    - MagpieData avg_dev NfValence
    - MagpieData mode NfValence
    - MagpieData minimum NValence
    - MagpieData maximum NValence
    - MagpieData range NValence
    - MagpieData mean NValence
    - MagpieData avg_dev NValence
    - MagpieData mode NValence
    - MagpieData minimum NsUnfilled
    - MagpieData maximum NsUnfilled
    - MagpieData range NsUnfilled
    - MagpieData mean NsUnfilled
    - MagpieData avg_dev NsUnfilled
    - MagpieData mode NsUnfilled
    - MagpieData minimum NpUnfilled
    - MagpieData maximum NpUnfilled
    - MagpieData range NpUnfilled
    - MagpieData mean NpUnfilled
    - MagpieData avg_dev NpUnfilled
    - MagpieData mode NpUnfilled
    - MagpieData minimum NdUnfilled
    - MagpieData maximum NdUnfilled
    - MagpieData range NdUnfilled
    - MagpieData mean NdUnfilled
    - MagpieData avg_dev NdUnfilled
    - MagpieData mode NdUnfilled
    - MagpieData minimum NfUnfilled
    - MagpieData maximum NfUnfilled
    - MagpieData range NfUnfilled
    - MagpieData mean NfUnfilled
    - MagpieData avg_dev NfUnfilled
    - MagpieData mode NfUnfilled
    - MagpieData minimum NUnfilled
    - MagpieData maximum NUnfilled
    - MagpieData range NUnfilled
    - MagpieData mean NUnfilled
    - MagpieData avg_dev NUnfilled
    - MagpieData mode NUnfilled
    - MagpieData minimum GSvolume_pa
    - MagpieData maximum GSvolume_pa
    - MagpieData range GSvolume_pa
    - MagpieData mean GSvolume_pa
    - MagpieData avg_dev GSvolume_pa
    - MagpieData mode GSvolume_pa
    - MagpieData minimum GSbandgap
    - MagpieData maximum GSbandgap
    - MagpieData range GSbandgap
    - MagpieData mean GSbandgap
    - MagpieData avg_dev GSbandgap
    - MagpieData mode GSbandgap
    - MagpieData minimum GSmagmom
    - MagpieData maximum GSmagmom
    - MagpieData range GSmagmom
    - MagpieData mean GSmagmom
    - MagpieData avg_dev GSmagmom
    - MagpieData mode GSmagmom
    - MagpieData minimum SpaceGroupNumber
    - MagpieData maximum SpaceGroupNumber
    - MagpieData range SpaceGroupNumber
    - MagpieData mean SpaceGroupNumber
    - MagpieData avg_dev SpaceGroupNumber
    - MagpieData mode SpaceGroupNumber
    - minimum oxidation state
    - maximum oxidation state
    - range oxidation state
    - std_dev oxidation state
    - avg anion electron affinity
    - compound possible
    - max ionic char
    - avg ionic char
    featurizers:
      bandstructure:
      - <not serializable>
      - <not serializable>
      composition:
      - <not serializable>
      - <not serializable>
      - <not serializable>
      - <not serializable>
      dos:
      - <not serializable>
      - <not serializable>
      - <not serializable>
      - <not serializable>
      structure:
      - <not serializable>
      - <not serializable>
      - <not serializable>
      - <not serializable>
      - <not serializable>
    fittable_fcls: <not serializable>
    fitted_input_df:
      columns: 3
      obj: <not serializable>
      samples: 537
    functionalize: false
    guess_oxistates: true
    ignore_cols: []
    ignore_errors: true
    is_fit: true
    min_precheck_frac: 0.9
    multiindex: false
    n_jobs: null
    needs_fit: false
    preset: express
    removed_featurizers:
    - <not serializable>
    - <not serializable>
    structure_col: structure
cleaner:
  cleaner:
    _logger: null
    drop_na_targets: true
    dropped_features:
    - max ionic char
    - maximum oxidation state
    - avg ionic char
    - avg anion electron affinity
    - std_dev oxidation state
    - compound possible
    - minimum oxidation state
    - range oxidation state
    dropped_samples:
      columns: 142
      obj: <not serializable>
      samples: 0
    encode_categories: true
    encoder: one-hot
    feature_na_method: drop
    fitted_df:
      columns: 134
      obj: <not serializable>
      samples: 537
    fitted_target: zT
    is_fit: true
    max_na_frac: 0.01
    na_method_fit: drop
    na_method_transform: fill
    number_cols:
    - T
    - MagpieData minimum Number
    - MagpieData maximum Number
    - MagpieData range Number
    - MagpieData mean Number
    - MagpieData avg_dev Number
    - MagpieData mode Number
    - MagpieData minimum MendeleevNumber
    - MagpieData maximum MendeleevNumber
    - MagpieData range MendeleevNumber
    - MagpieData mean MendeleevNumber
    - MagpieData avg_dev MendeleevNumber
    - MagpieData mode MendeleevNumber
    - MagpieData minimum AtomicWeight
    - MagpieData maximum AtomicWeight
    - MagpieData range AtomicWeight
    - MagpieData mean AtomicWeight
    - MagpieData avg_dev AtomicWeight
    - MagpieData mode AtomicWeight
    - MagpieData minimum MeltingT
    - MagpieData maximum MeltingT
    - MagpieData range MeltingT
    - MagpieData mean MeltingT
    - MagpieData avg_dev MeltingT
    - MagpieData mode MeltingT
    - MagpieData minimum Column
    - MagpieData maximum Column
    - MagpieData range Column
    - MagpieData mean Column
    - MagpieData avg_dev Column
    - MagpieData mode Column
    - MagpieData minimum Row
    - MagpieData maximum Row
    - MagpieData range Row
    - MagpieData mean Row
    - MagpieData avg_dev Row
    - MagpieData mode Row
    - MagpieData minimum CovalentRadius
    - MagpieData maximum CovalentRadius
    - MagpieData range CovalentRadius
    - MagpieData mean CovalentRadius
    - MagpieData avg_dev CovalentRadius
    - MagpieData mode CovalentRadius
    - MagpieData minimum Electronegativity
    - MagpieData maximum Electronegativity
    - MagpieData range Electronegativity
    - MagpieData mean Electronegativity
    - MagpieData avg_dev Electronegativity
    - MagpieData mode Electronegativity
    - MagpieData minimum NsValence
    - MagpieData maximum NsValence
    - MagpieData range NsValence
    - MagpieData mean NsValence
    - MagpieData avg_dev NsValence
    - MagpieData mode NsValence
    - MagpieData minimum NpValence
    - MagpieData maximum NpValence
    - MagpieData range NpValence
    - MagpieData mean NpValence
    - MagpieData avg_dev NpValence
    - MagpieData mode NpValence
    - MagpieData minimum NdValence
    - MagpieData maximum NdValence
    - MagpieData range NdValence
    - MagpieData mean NdValence
    - MagpieData avg_dev NdValence
    - MagpieData mode NdValence
    - MagpieData minimum NfValence
    - MagpieData maximum NfValence
    - MagpieData range NfValence
    - MagpieData mean NfValence
    - MagpieData avg_dev NfValence
    - MagpieData mode NfValence
    - MagpieData minimum NValence
    - MagpieData maximum NValence
    - MagpieData range NValence
    - MagpieData mean NValence
    - MagpieData avg_dev NValence
    - MagpieData mode NValence
    - MagpieData minimum NsUnfilled
    - MagpieData maximum NsUnfilled
    - MagpieData range NsUnfilled
    - MagpieData mean NsUnfilled
    - MagpieData avg_dev NsUnfilled
    - MagpieData mode NsUnfilled
    - MagpieData minimum NpUnfilled
    - MagpieData maximum NpUnfilled
    - MagpieData range NpUnfilled
    - MagpieData mean NpUnfilled
    - MagpieData avg_dev NpUnfilled
    - MagpieData mode NpUnfilled
    - MagpieData minimum NdUnfilled
    - MagpieData maximum NdUnfilled
    - MagpieData range NdUnfilled
    - MagpieData mean NdUnfilled
    - MagpieData avg_dev NdUnfilled
    - MagpieData mode NdUnfilled
    - MagpieData minimum NfUnfilled
    - MagpieData maximum NfUnfilled
    - MagpieData range NfUnfilled
    - MagpieData mean NfUnfilled
    - MagpieData avg_dev NfUnfilled
    - MagpieData mode NfUnfilled
    - MagpieData minimum NUnfilled
    - MagpieData maximum NUnfilled
    - MagpieData range NUnfilled
    - MagpieData mean NUnfilled
    - MagpieData avg_dev NUnfilled
    - MagpieData mode NUnfilled
    - MagpieData minimum GSvolume_pa
    - MagpieData maximum GSvolume_pa
    - MagpieData range GSvolume_pa
    - MagpieData mean GSvolume_pa
    - MagpieData avg_dev GSvolume_pa
    - MagpieData mode GSvolume_pa
    - MagpieData minimum GSbandgap
    - MagpieData maximum GSbandgap
    - MagpieData range GSbandgap
    - MagpieData mean GSbandgap
    - MagpieData avg_dev GSbandgap
    - MagpieData mode GSbandgap
    - MagpieData minimum GSmagmom
    - MagpieData maximum GSmagmom
    - MagpieData range GSmagmom
    - MagpieData mean GSmagmom
    - MagpieData avg_dev GSmagmom
    - MagpieData mode GSmagmom
    - MagpieData minimum SpaceGroupNumber
    - MagpieData maximum SpaceGroupNumber
    - MagpieData range SpaceGroupNumber
    - MagpieData mean SpaceGroupNumber
    - MagpieData avg_dev SpaceGroupNumber
    - MagpieData mode SpaceGroupNumber
    - minimum oxidation state
    - maximum oxidation state
    - range oxidation state
    - std_dev oxidation state
    - avg anion electron affinity
    - compound possible
    - max ionic char
    - avg ionic char
    object_cols: []
is_fit: true
learner:
  learner:
    _backend: <not serializable>
    _features:
    - T
    - MagpieData mean Number
    - MagpieData mean MendeleevNumber
    - MagpieData avg_dev MendeleevNumber
    - MagpieData maximum AtomicWeight
    - MagpieData range AtomicWeight
    - MagpieData avg_dev AtomicWeight
    - MagpieData minimum MeltingT
    - MagpieData mean MeltingT
    - MagpieData mean Column
    - MagpieData avg_dev Row
    - MagpieData minimum CovalentRadius
    - MagpieData range CovalentRadius
    - MagpieData mean CovalentRadius
    - MagpieData avg_dev CovalentRadius
    - MagpieData maximum Electronegativity
    - MagpieData range Electronegativity
    - MagpieData mean Electronegativity
    - MagpieData avg_dev Electronegativity
    - MagpieData avg_dev NpValence
    - MagpieData mean NdValence
    - MagpieData mode NdValence
    - MagpieData mean NfValence
    - MagpieData avg_dev NfValence
    - MagpieData maximum NValence
    - MagpieData mean NValence
    - MagpieData avg_dev NValence
    - MagpieData mode NValence
    - MagpieData range NpUnfilled
    - MagpieData mean NpUnfilled
    - MagpieData avg_dev NpUnfilled
    - MagpieData range NUnfilled
    - MagpieData mean NUnfilled
    - MagpieData avg_dev NUnfilled
    - MagpieData minimum GSvolume_pa
    - MagpieData range GSvolume_pa
    - MagpieData mean GSvolume_pa
    - MagpieData avg_dev GSvolume_pa
    - MagpieData mean SpaceGroupNumber
    - MagpieData avg_dev SpaceGroupNumber
    _fitted_target: zT
    _logger: null
    greater_score_is_better: null
    is_fit: true
    mode: regression
    models: null
    random_state: null
    tpot_kwargs:
      config_dict:
        sklearn.cluster.FeatureAgglomeration:
          affinity:
          - euclidean
          - l1
          - l2
          - manhattan
          - cosine
          linkage:
          - ward
          - complete
          - average
        sklearn.decomposition.FastICA:
          tol: <not serializable>
        sklearn.decomposition.PCA:
          iterated_power: <not serializable>
          svd_solver:
          - randomized
        sklearn.ensemble.ExtraTreesRegressor:
          bootstrap:
          - true
          - false
          max_features: <not serializable>
          min_samples_leaf: <not serializable>
          min_samples_split: <not serializable>
          n_estimators:
          - 20
          - 100
          - 200
          - 500
          - 1000
        sklearn.ensemble.GradientBoostingRegressor:
          alpha:
          - 0.75
          - 0.8
          - 0.85
          - 0.9
          - 0.95
          - 0.99
          learning_rate:
          - 0.01
          - 0.1
          - 0.5
          - 1.0
          loss:
          - ls
          - lad
          - huber
          - quantile
          max_depth: <not serializable>
          max_features: <not serializable>
          min_samples_leaf: <not serializable>
          min_samples_split: <not serializable>
          n_estimators:
          - 20
          - 100
          - 200
          - 500
          - 1000
          subsample: <not serializable>
        sklearn.ensemble.RandomForestRegressor:
          bootstrap:
          - true
          - false
          max_features: <not serializable>
          min_samples_leaf: <not serializable>
          min_samples_split: <not serializable>
          n_estimators:
          - 20
          - 100
          - 200
          - 500
          - 1000
        sklearn.feature_selection.SelectFromModel:
          estimator:
            sklearn.ensemble.ExtraTreesRegressor:
              max_features: <not serializable>
              n_estimators:
              - 100
          threshold: <not serializable>
        sklearn.feature_selection.SelectFwe:
          alpha: <not serializable>
          score_func:
            sklearn.feature_selection.f_regression: null
        sklearn.feature_selection.SelectPercentile:
          percentile: <not serializable>
          score_func:
            sklearn.feature_selection.f_regression: null
        sklearn.feature_selection.VarianceThreshold:
          threshold:
          - 0.0001
          - 0.0005
          - 0.001
          - 0.005
          - 0.01
          - 0.05
          - 0.1
          - 0.2
        sklearn.kernel_approximation.Nystroem:
          gamma: <not serializable>
          kernel:
          - rbf
          - cosine
          - chi2
          - laplacian
          - polynomial
          - poly
          - linear
          - additive_chi2
          - sigmoid
          n_components: <not serializable>
        sklearn.kernel_approximation.RBFSampler:
          gamma: <not serializable>
        sklearn.linear_model.ElasticNetCV:
          l1_ratio: <not serializable>
          tol:
          - 1e-05
          - 0.0001
          - 0.001
          - 0.01
          - 0.1
        sklearn.linear_model.LassoLarsCV:
          normalize:
          - true
          - false
        sklearn.linear_model.RidgeCV: {}
        sklearn.neighbors.KNeighborsRegressor:
          n_neighbors: <not serializable>
          p:
          - 1
          - 2
          weights:
          - uniform
          - distance
        sklearn.preprocessing.Binarizer:
          threshold: <not serializable>
        sklearn.preprocessing.MaxAbsScaler: {}
        sklearn.preprocessing.MinMaxScaler: {}
        sklearn.preprocessing.Normalizer:
          norm:
          - l1
          - l2
          - max
        sklearn.preprocessing.PolynomialFeatures:
          degree:
          - 2
          include_bias:
          - false
          interaction_only:
          - false
        sklearn.preprocessing.RobustScaler: {}
        sklearn.preprocessing.StandardScaler: {}
        sklearn.svm.LinearSVR:
          C:
          - 0.0001
          - 0.001
          - 0.01
          - 0.1
          - 0.5
          - 1.0
          - 5.0
          - 10.0
          - 15.0
          - 20.0
          - 25.0
          dual:
          - true
          - false
          epsilon:
          - 0.0001
          - 0.001
          - 0.01
          - 0.1
          - 1.0
          loss:
          - epsilon_insensitive
          - squared_epsilon_insensitive
          tol:
          - 1e-05
          - 0.0001
          - 0.001
          - 0.01
          - 0.1
        sklearn.tree.DecisionTreeRegressor:
          max_depth: <not serializable>
          min_samples_leaf: <not serializable>
          min_samples_split: <not serializable>
        tpot.builtins.OneHotEncoder:
          minimum_fraction:
          - 0.05
          - 0.1
          - 0.15
          - 0.2
          - 0.25
          sparse:
          - false
          threshold:
          - 10
        tpot.builtins.ZeroCount: {}
        xgboost.XGBRegressor:
          learning_rate:
          - 0.01
          - 0.1
          - 0.5
          - 1.0
          max_depth: <not serializable>
          min_child_weight: <not serializable>
          n_estimators:
          - 20
          - 100
          - 200
          - 500
          - 1000
          nthread:
          - 1
          subsample: <not serializable>
      cv: 5
      max_time_mins: 60
      memory: auto
      n_jobs: -1
      population_size: 20
      scoring: neg_mean_absolute_error
      template: Selector-Transformer-Regressor
      verbosity: 3
ml_type: regression
post_fit_df:
  columns: 41
  obj: <not serializable>
  samples: 537
pre_fit_df:
  columns: 3
  obj: <not serializable>
  samples: 537
reducer:
  reducer:
    _keep_features: []
    _logger: null
    _pca: null
    _pca_feats: null
    _remove_features: []
    corr_threshold: 0.95
    is_fit: true
    n_pca_features: auto
    n_rebate_features: 0.3
    reducer_params:
      tree:
        importance_percentile: 0.99
        mode: regression
        random_state: 0
    reducers:
    - corr
    - tree
    removed_features:
      corr:
      - MagpieData maximum Number
      - MagpieData range Number
      - MagpieData avg_dev Number
      - MagpieData mode Number
      - MagpieData maximum MendeleevNumber
      - MagpieData range MendeleevNumber
      - MagpieData minimum AtomicWeight
      - MagpieData mean AtomicWeight
      - MagpieData mode AtomicWeight
      - MagpieData minimum Column
      - MagpieData mean Row
      - MagpieData range NsValence
      - MagpieData mean NsValence
      - MagpieData avg_dev NsValence
      - MagpieData minimum NfValence
      - MagpieData maximum NfValence
      - MagpieData maximum NsUnfilled
      - MagpieData range NsUnfilled
      - MagpieData mean NsUnfilled
      - MagpieData range NdUnfilled
      - MagpieData maximum NfUnfilled
      - MagpieData range NfUnfilled
      - MagpieData mean NfUnfilled
      - MagpieData range GSbandgap
      - MagpieData mean GSbandgap
      - MagpieData avg_dev GSbandgap
      - MagpieData maximum GSmagmom
      - MagpieData range GSmagmom
      - MagpieData mean GSmagmom
      - MagpieData avg_dev GSmagmom
      - MagpieData minimum SpaceGroupNumber
      tree:
      - MagpieData minimum Number
      - MagpieData minimum MendeleevNumber
      - MagpieData mode MendeleevNumber
      - MagpieData maximum MeltingT
      - MagpieData range MeltingT
      - MagpieData avg_dev MeltingT
      - MagpieData mode MeltingT
      - MagpieData maximum Column
      - MagpieData range Column
      - MagpieData avg_dev Column
      - MagpieData mode Column
      - MagpieData minimum Row
      - MagpieData maximum Row
      - MagpieData range Row
      - MagpieData mode Row
      - MagpieData maximum CovalentRadius
      - MagpieData mode CovalentRadius
      - MagpieData minimum Electronegativity
      - MagpieData mode Electronegativity
      - MagpieData minimum NsValence
      - MagpieData maximum NsValence
      - MagpieData mode NsValence
      - MagpieData minimum NpValence
      - MagpieData maximum NpValence
      - MagpieData range NpValence
      - MagpieData mean NpValence
      - MagpieData mode NpValence
      - MagpieData minimum NdValence
      - MagpieData maximum NdValence
      - MagpieData range NdValence
      - MagpieData avg_dev NdValence
      - MagpieData range NfValence
      - MagpieData mode NfValence
      - MagpieData minimum NValence
      - MagpieData range NValence
      - MagpieData minimum NsUnfilled
      - MagpieData avg_dev NsUnfilled
      - MagpieData mode NsUnfilled
      - MagpieData minimum NpUnfilled
      - MagpieData maximum NpUnfilled
      - MagpieData mode NpUnfilled
      - MagpieData minimum NdUnfilled
      - MagpieData maximum NdUnfilled
      - MagpieData mean NdUnfilled
      - MagpieData avg_dev NdUnfilled
      - MagpieData mode NdUnfilled
      - MagpieData minimum NfUnfilled
      - MagpieData avg_dev NfUnfilled
      - MagpieData mode NfUnfilled
      - MagpieData minimum NUnfilled
      - MagpieData maximum NUnfilled
      - MagpieData mode NUnfilled
      - MagpieData maximum GSvolume_pa
      - MagpieData mode GSvolume_pa
      - MagpieData minimum GSbandgap
      - MagpieData maximum GSbandgap
      - MagpieData mode GSbandgap
      - MagpieData minimum GSmagmom
      - MagpieData mode GSmagmom
      - MagpieData maximum SpaceGroupNumber
      - MagpieData range SpaceGroupNumber
      - MagpieData mode SpaceGroupNumber
    retained_features:
    - MagpieData minimum MeltingT
    - MagpieData mean Column
    - MagpieData avg_dev MendeleevNumber
    - MagpieData range NUnfilled
    - MagpieData avg_dev NpUnfilled
    - MagpieData mean NUnfilled
    - MagpieData minimum GSvolume_pa
    - MagpieData mean NValence
    - MagpieData avg_dev SpaceGroupNumber
    - MagpieData mean Number
    - MagpieData range NpUnfilled
    - MagpieData avg_dev CovalentRadius
    - MagpieData mean MendeleevNumber
    - T
    - MagpieData avg_dev NpValence
    - MagpieData mean Electronegativity
    - MagpieData mode NValence
    - MagpieData mean NdValence
    - MagpieData avg_dev NValence
    - MagpieData mode NdValence
    - MagpieData mean MeltingT
    - MagpieData avg_dev NfValence
    - MagpieData mean GSvolume_pa
    - MagpieData maximum NValence
    - MagpieData mean CovalentRadius
    - MagpieData range GSvolume_pa
    - MagpieData maximum AtomicWeight
    - MagpieData maximum Electronegativity
    - MagpieData mean NfValence
    - MagpieData range AtomicWeight
    - MagpieData avg_dev AtomicWeight
    - MagpieData mean NpUnfilled
    - MagpieData avg_dev NUnfilled
    - MagpieData avg_dev Row
    - MagpieData avg_dev Electronegativity
    - MagpieData minimum CovalentRadius
    - MagpieData avg_dev GSvolume_pa
    - MagpieData range CovalentRadius
    - MagpieData range Electronegativity
    - MagpieData mean SpaceGroupNumber
    tree_importance_percentile: 0.99
target: zT

@ardunn
Contributor

ardunn commented Oct 7, 2019

Hey @janosh thanks for the PR! This is a great idea.

Yes, we do need some tests for it. You could just add them onto the existing test for test_persistence_and_digest.

Also, we'll need to update the requirements file with a YAML package (its absence is why the current test is failing). In some of our other projects we use ruamel.yaml, but I have no real preference :)

@janosh
Member Author

janosh commented Oct 7, 2019

@ardunn I added some tests and pyyaml as a dependency.

@janosh
Member Author

janosh commented Oct 7, 2019

Also, there might be a more sophisticated way to handle non-serializable attributes than json.dumps(attrs, default=lambda x: "<not serializable>").
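
For context, a minimal sketch of how that fallback behaves. The `attrs` dict and the `Featurizer` class here are hypothetical stand-ins, not taken from the PR; only the `json.dumps(..., default=...)` call mirrors the line above.

```python
import json


class Featurizer:
    """Hypothetical stand-in for an object json cannot encode."""


# Hypothetical attribute dict; the real one would come from the pipeline.
attrs = {
    "preset": "express",
    "n_jobs": None,
    "featurizers": [Featurizer(), Featurizer()],
}

# json.dumps calls `default` only for objects it cannot encode itself,
# so plain strings, numbers, lists and dicts pass through untouched.
digest = json.dumps(attrs, default=lambda x: "<not serializable>", indent=2)
print(digest)
```

Every unsupported object collapses to the same placeholder string, which is why the YAML digest above shows runs of identical `<not serializable>` entries.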

@ardunn
Contributor

ardunn commented Oct 8, 2019

@janosh it might be sufficient to output the __repr__ string of each non-serializable object alongside the <not serializable> marker.

Which attributes are not JSON-serializable as text/lists, though? My thinking is that .digest should produce a file containing only info (i.e., text/lists of text/dicts of text) about the pipeline, and let .save handle serializing the actual pipelines for use.
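
A rough sketch of the `__repr__` idea, under stated assumptions: the `EwaldEnergy` class below is a hypothetical stand-in with a hand-written `__repr__`, and `describe` is an illustrative helper name, not something in the codebase.

```python
import json


class EwaldEnergy:
    """Hypothetical stand-in for a featurizer with a readable repr."""

    def __init__(self, accuracy=4):
        self.accuracy = accuracy

    def __repr__(self):
        return f"EwaldEnergy(accuracy={self.accuracy})"


def describe(obj):
    # Illustrative fallback: json.dumps calls this only for objects it
    # cannot encode, so the marker is kept and the repr is appended.
    return f"<not serializable> {obj!r}"


# Hypothetical attribute dict for demonstration purposes.
attrs = {"structure": [EwaldEnergy()], "is_fit": True}
digest = json.dumps(attrs, default=describe)
print(digest)
```

The digest stays plain text this way, and actual object persistence is still left to `.save`.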

@janosh
Member Author

janosh commented Oct 8, 2019

I just played around a bit more with json.dumps' default hook and found that json.dumps(attrs, default=lambda x: str(x)) gives really nice output. Here's the above YAML file again but with the new hook.

_logger: null
autofeaturizer:
  autofeaturizer:
    _logger: null
    auto_featurizer: true
    bandstruct_col: bandstructure
    cache_src: null
    composition_col: composition
    converted_input_df:
      columns: 3
      obj: <class 'pandas.core.frame.DataFrame'>
      samples: 537
    do_precheck: true
    dos_col: dos
    drop_inputs: true
    exclude: []
    features:
    - MagpieData minimum Number
    - MagpieData maximum Number
    - MagpieData range Number
    - MagpieData mean Number
    - MagpieData avg_dev Number
    - MagpieData mode Number
    - MagpieData minimum MendeleevNumber
    - MagpieData maximum MendeleevNumber
    - MagpieData range MendeleevNumber
    - MagpieData mean MendeleevNumber
    - MagpieData avg_dev MendeleevNumber
    - MagpieData mode MendeleevNumber
    - MagpieData minimum AtomicWeight
    - MagpieData maximum AtomicWeight
    - MagpieData range AtomicWeight
    - MagpieData mean AtomicWeight
    - MagpieData avg_dev AtomicWeight
    - MagpieData mode AtomicWeight
    - MagpieData minimum MeltingT
    - MagpieData maximum MeltingT
    - MagpieData range MeltingT
    - MagpieData mean MeltingT
    - MagpieData avg_dev MeltingT
    - MagpieData mode MeltingT
    - MagpieData minimum Column
    - MagpieData maximum Column
    - MagpieData range Column
    - MagpieData mean Column
    - MagpieData avg_dev Column
    - MagpieData mode Column
    - MagpieData minimum Row
    - MagpieData maximum Row
    - MagpieData range Row
    - MagpieData mean Row
    - MagpieData avg_dev Row
    - MagpieData mode Row
    - MagpieData minimum CovalentRadius
    - MagpieData maximum CovalentRadius
    - MagpieData range CovalentRadius
    - MagpieData mean CovalentRadius
    - MagpieData avg_dev CovalentRadius
    - MagpieData mode CovalentRadius
    - MagpieData minimum Electronegativity
    - MagpieData maximum Electronegativity
    - MagpieData range Electronegativity
    - MagpieData mean Electronegativity
    - MagpieData avg_dev Electronegativity
    - MagpieData mode Electronegativity
    - MagpieData minimum NsValence
    - MagpieData maximum NsValence
    - MagpieData range NsValence
    - MagpieData mean NsValence
    - MagpieData avg_dev NsValence
    - MagpieData mode NsValence
    - MagpieData minimum NpValence
    - MagpieData maximum NpValence
    - MagpieData range NpValence
    - MagpieData mean NpValence
    - MagpieData avg_dev NpValence
    - MagpieData mode NpValence
    - MagpieData minimum NdValence
    - MagpieData maximum NdValence
    - MagpieData range NdValence
    - MagpieData mean NdValence
    - MagpieData avg_dev NdValence
    - MagpieData mode NdValence
    - MagpieData minimum NfValence
    - MagpieData maximum NfValence
    - MagpieData range NfValence
    - MagpieData mean NfValence
    - MagpieData avg_dev NfValence
    - MagpieData mode NfValence
    - MagpieData minimum NValence
    - MagpieData maximum NValence
    - MagpieData range NValence
    - MagpieData mean NValence
    - MagpieData avg_dev NValence
    - MagpieData mode NValence
    - MagpieData minimum NsUnfilled
    - MagpieData maximum NsUnfilled
    - MagpieData range NsUnfilled
    - MagpieData mean NsUnfilled
    - MagpieData avg_dev NsUnfilled
    - MagpieData mode NsUnfilled
    - MagpieData minimum NpUnfilled
    - MagpieData maximum NpUnfilled
    - MagpieData range NpUnfilled
    - MagpieData mean NpUnfilled
    - MagpieData avg_dev NpUnfilled
    - MagpieData mode NpUnfilled
    - MagpieData minimum NdUnfilled
    - MagpieData maximum NdUnfilled
    - MagpieData range NdUnfilled
    - MagpieData mean NdUnfilled
    - MagpieData avg_dev NdUnfilled
    - MagpieData mode NdUnfilled
    - MagpieData minimum NfUnfilled
    - MagpieData maximum NfUnfilled
    - MagpieData range NfUnfilled
    - MagpieData mean NfUnfilled
    - MagpieData avg_dev NfUnfilled
    - MagpieData mode NfUnfilled
    - MagpieData minimum NUnfilled
    - MagpieData maximum NUnfilled
    - MagpieData range NUnfilled
    - MagpieData mean NUnfilled
    - MagpieData avg_dev NUnfilled
    - MagpieData mode NUnfilled
    - MagpieData minimum GSvolume_pa
    - MagpieData maximum GSvolume_pa
    - MagpieData range GSvolume_pa
    - MagpieData mean GSvolume_pa
    - MagpieData avg_dev GSvolume_pa
    - MagpieData mode GSvolume_pa
    - MagpieData minimum GSbandgap
    - MagpieData maximum GSbandgap
    - MagpieData range GSbandgap
    - MagpieData mean GSbandgap
    - MagpieData avg_dev GSbandgap
    - MagpieData mode GSbandgap
    - MagpieData minimum GSmagmom
    - MagpieData maximum GSmagmom
    - MagpieData range GSmagmom
    - MagpieData mean GSmagmom
    - MagpieData avg_dev GSmagmom
    - MagpieData mode GSmagmom
    - MagpieData minimum SpaceGroupNumber
    - MagpieData maximum SpaceGroupNumber
    - MagpieData range SpaceGroupNumber
    - MagpieData mean SpaceGroupNumber
    - MagpieData avg_dev SpaceGroupNumber
    - MagpieData mode SpaceGroupNumber
    - minimum oxidation state
    - maximum oxidation state
    - range oxidation state
    - std_dev oxidation state
    - avg anion electron affinity
    - compound possible
    - max ionic char
    - avg ionic char
    featurizers:
      bandstructure:
      - BandFeaturizer(find_method='nearest', kpoints=None, nbands=2)
      - BranchPointEnergy(atol=1e-05, calculate_band_edges=True, n_cb=1, n_vb=1)
      composition:
      - "ElementProperty(data_source=<matminer.utils.data.MagpieData object at 0x131635cc0>,\n\
        \                features=['Number', 'MendeleevNumber', 'AtomicWeight',\n\
        \                          'MeltingT', 'Column', 'Row', 'CovalentRadius',\n\
        \                          'Electronegativity', 'NsValence', 'NpValence',\n\
        \                          'NdValence', 'NfValence', 'NValence', 'NsUnfilled',\n\
        \                          'NpUnfilled', 'NdUnfilled', 'NfUnfilled', 'NUnfilled',\n\
        \                          'GSvolume_pa', 'GSbandgap', 'GSmagmom',\n     \
        \                     'SpaceGroupNumber'],\n                stats=['minimum',\
        \ 'maximum', 'range', 'mean', 'avg_dev',\n                       'mode'])"
      - OxidationStates(stats=['minimum', 'maximum', 'range', 'std_dev'])
      - ElectronAffinity()
      - "IonProperty(data_source=<matminer.utils.data.PymatgenData object at 0x11740dda0>,\n\
        \            fast=False)"
      dos:
      - "DOSFeaturizer(contributors=1, decay_length=0.1, gaussian_smear=0.05,\n  \
        \            sampling_resolution=100)"
      - DopingFermi(T=300, dopings=[-1e+20, 1e+20], eref='midgap', return_eref=False)
      - "Hybridization(decay_length=0.1, gaussian_smear=0.05, sampling_resolution=100,\n\
        \              species=[])"
      - DosAsymmetry(decay_length=0.5, gaussian_smear=0.05, sampling_resolution=100)
      structure:
      - DensityFeatures(desired_features=None)
      - GlobalSymmetryFeatures(desired_features=None)
      - EwaldEnergy(accuracy=4)
      - SineCoulombMatrix(diag_elems=True, flatten=True)
      - GlobalInstabilityIndex(disordered_pymatgen=False, r_cut=4.0)
    fittable_fcls: '{''BondFractions'', ''BagofBonds'', ''PartialRadialDistributionFunction''}'
    fitted_input_df:
      columns: 3
      obj: <class 'pandas.core.frame.DataFrame'>
      samples: 537
    functionalize: false
    guess_oxistates: true
    ignore_cols: []
    ignore_errors: true
    is_fit: true
    min_precheck_frac: 0.9
    multiindex: false
    n_jobs: null
    needs_fit: false
    preset: express
    removed_featurizers:
    - YangSolidSolution()
    - "Miedema(data_source='Miedema', ss_types=['min'],\n        struct_types=['inter',\
      \ 'amor', 'ss'])"
    structure_col: structure
cleaner:
  cleaner:
    _logger: null
    drop_na_targets: true
    dropped_features:
    - max ionic char
    - maximum oxidation state
    - avg ionic char
    - avg anion electron affinity
    - std_dev oxidation state
    - compound possible
    - minimum oxidation state
    - range oxidation state
    dropped_samples:
      columns: 142
      obj: <class 'pandas.core.frame.DataFrame'>
      samples: 0
    encode_categories: true
    encoder: one-hot
    feature_na_method: drop
    fitted_df:
      columns: 134
      obj: <class 'pandas.core.frame.DataFrame'>
      samples: 537
    fitted_target: zT
    is_fit: true
    max_na_frac: 0.01
    na_method_fit: drop
    na_method_transform: fill
    number_cols:
    - T
    - MagpieData minimum Number
    - MagpieData maximum Number
    - MagpieData range Number
    - MagpieData mean Number
    - MagpieData avg_dev Number
    - MagpieData mode Number
    - MagpieData minimum MendeleevNumber
    - MagpieData maximum MendeleevNumber
    - MagpieData range MendeleevNumber
    - MagpieData mean MendeleevNumber
    - MagpieData avg_dev MendeleevNumber
    - MagpieData mode MendeleevNumber
    - MagpieData minimum AtomicWeight
    - MagpieData maximum AtomicWeight
    - MagpieData range AtomicWeight
    - MagpieData mean AtomicWeight
    - MagpieData avg_dev AtomicWeight
    - MagpieData mode AtomicWeight
    - MagpieData minimum MeltingT
    - MagpieData maximum MeltingT
    - MagpieData range MeltingT
    - MagpieData mean MeltingT
    - MagpieData avg_dev MeltingT
    - MagpieData mode MeltingT
    - MagpieData minimum Column
    - MagpieData maximum Column
    - MagpieData range Column
    - MagpieData mean Column
    - MagpieData avg_dev Column
    - MagpieData mode Column
    - MagpieData minimum Row
    - MagpieData maximum Row
    - MagpieData range Row
    - MagpieData mean Row
    - MagpieData avg_dev Row
    - MagpieData mode Row
    - MagpieData minimum CovalentRadius
    - MagpieData maximum CovalentRadius
    - MagpieData range CovalentRadius
    - MagpieData mean CovalentRadius
    - MagpieData avg_dev CovalentRadius
    - MagpieData mode CovalentRadius
    - MagpieData minimum Electronegativity
    - MagpieData maximum Electronegativity
    - MagpieData range Electronegativity
    - MagpieData mean Electronegativity
    - MagpieData avg_dev Electronegativity
    - MagpieData mode Electronegativity
    - MagpieData minimum NsValence
    - MagpieData maximum NsValence
    - MagpieData range NsValence
    - MagpieData mean NsValence
    - MagpieData avg_dev NsValence
    - MagpieData mode NsValence
    - MagpieData minimum NpValence
    - MagpieData maximum NpValence
    - MagpieData range NpValence
    - MagpieData mean NpValence
    - MagpieData avg_dev NpValence
    - MagpieData mode NpValence
    - MagpieData minimum NdValence
    - MagpieData maximum NdValence
    - MagpieData range NdValence
    - MagpieData mean NdValence
    - MagpieData avg_dev NdValence
    - MagpieData mode NdValence
    - MagpieData minimum NfValence
    - MagpieData maximum NfValence
    - MagpieData range NfValence
    - MagpieData mean NfValence
    - MagpieData avg_dev NfValence
    - MagpieData mode NfValence
    - MagpieData minimum NValence
    - MagpieData maximum NValence
    - MagpieData range NValence
    - MagpieData mean NValence
    - MagpieData avg_dev NValence
    - MagpieData mode NValence
    - MagpieData minimum NsUnfilled
    - MagpieData maximum NsUnfilled
    - MagpieData range NsUnfilled
    - MagpieData mean NsUnfilled
    - MagpieData avg_dev NsUnfilled
    - MagpieData mode NsUnfilled
    - MagpieData minimum NpUnfilled
    - MagpieData maximum NpUnfilled
    - MagpieData range NpUnfilled
    - MagpieData mean NpUnfilled
    - MagpieData avg_dev NpUnfilled
    - MagpieData mode NpUnfilled
    - MagpieData minimum NdUnfilled
    - MagpieData maximum NdUnfilled
    - MagpieData range NdUnfilled
    - MagpieData mean NdUnfilled
    - MagpieData avg_dev NdUnfilled
    - MagpieData mode NdUnfilled
    - MagpieData minimum NfUnfilled
    - MagpieData maximum NfUnfilled
    - MagpieData range NfUnfilled
    - MagpieData mean NfUnfilled
    - MagpieData avg_dev NfUnfilled
    - MagpieData mode NfUnfilled
    - MagpieData minimum NUnfilled
    - MagpieData maximum NUnfilled
    - MagpieData range NUnfilled
    - MagpieData mean NUnfilled
    - MagpieData avg_dev NUnfilled
    - MagpieData mode NUnfilled
    - MagpieData minimum GSvolume_pa
    - MagpieData maximum GSvolume_pa
    - MagpieData range GSvolume_pa
    - MagpieData mean GSvolume_pa
    - MagpieData avg_dev GSvolume_pa
    - MagpieData mode GSvolume_pa
    - MagpieData minimum GSbandgap
    - MagpieData maximum GSbandgap
    - MagpieData range GSbandgap
    - MagpieData mean GSbandgap
    - MagpieData avg_dev GSbandgap
    - MagpieData mode GSbandgap
    - MagpieData minimum GSmagmom
    - MagpieData maximum GSmagmom
    - MagpieData range GSmagmom
    - MagpieData mean GSmagmom
    - MagpieData avg_dev GSmagmom
    - MagpieData mode GSmagmom
    - MagpieData minimum SpaceGroupNumber
    - MagpieData maximum SpaceGroupNumber
    - MagpieData range SpaceGroupNumber
    - MagpieData mean SpaceGroupNumber
    - MagpieData avg_dev SpaceGroupNumber
    - MagpieData mode SpaceGroupNumber
    - minimum oxidation state
    - maximum oxidation state
    - range oxidation state
    - std_dev oxidation state
    - avg anion electron affinity
    - compound possible
    - max ionic char
    - avg ionic char
    object_cols: []
is_fit: true
learner:
  learner:
    _backend: "TPOTRegressor(config_dict={'sklearn.cluster.FeatureAgglomeration':\
      \ {'affinity': ['euclidean',\n                                             \
      \                                    'l1',\n                               \
      \                                                  'l2',\n                 \
      \                                                                'manhattan',\n\
      \                                                                          \
      \       'cosine'],\n                                                       \
      \             'linkage': ['ward',\n                                        \
      \                                        'complete',\n                     \
      \                                                           'average']},\n \
      \                          'sklearn.decomposition.FastICA': {'tol': array([0.\
      \  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,\n       0.55,\
      \ 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95, 1.  ])},\n               \
      \            'sklearn.decomposition.PCA': {'iterated_power'...\n           \
      \   crossover_rate=0.1, cv=5, disable_update_check=False,\n              early_stop=None,\
      \ generations=1000000, max_eval_time_mins=5,\n              max_time_mins=60,\
      \ memory='auto', mutation_rate=0.9, n_jobs=-1,\n              offspring_size=None,\
      \ periodic_checkpoint_folder=None,\n              population_size=20, random_state=None,\n\
      \              scoring='neg_mean_absolute_error', subsample=1.0,\n         \
      \     template='Selector-Transformer-Regressor', use_dask=False,\n         \
      \     verbosity=3, warm_start=False)"
    _features:
    - T
    - MagpieData mean Number
    - MagpieData mean MendeleevNumber
    - MagpieData avg_dev MendeleevNumber
    - MagpieData maximum AtomicWeight
    - MagpieData range AtomicWeight
    - MagpieData avg_dev AtomicWeight
    - MagpieData minimum MeltingT
    - MagpieData mean MeltingT
    - MagpieData mean Column
    - MagpieData avg_dev Row
    - MagpieData minimum CovalentRadius
    - MagpieData range CovalentRadius
    - MagpieData mean CovalentRadius
    - MagpieData avg_dev CovalentRadius
    - MagpieData maximum Electronegativity
    - MagpieData range Electronegativity
    - MagpieData mean Electronegativity
    - MagpieData avg_dev Electronegativity
    - MagpieData avg_dev NpValence
    - MagpieData mean NdValence
    - MagpieData mode NdValence
    - MagpieData mean NfValence
    - MagpieData avg_dev NfValence
    - MagpieData maximum NValence
    - MagpieData mean NValence
    - MagpieData avg_dev NValence
    - MagpieData mode NValence
    - MagpieData range NpUnfilled
    - MagpieData mean NpUnfilled
    - MagpieData avg_dev NpUnfilled
    - MagpieData range NUnfilled
    - MagpieData mean NUnfilled
    - MagpieData avg_dev NUnfilled
    - MagpieData minimum GSvolume_pa
    - MagpieData range GSvolume_pa
    - MagpieData mean GSvolume_pa
    - MagpieData avg_dev GSvolume_pa
    - MagpieData mean SpaceGroupNumber
    - MagpieData avg_dev SpaceGroupNumber
    _fitted_target: zT
    _logger: null
    greater_score_is_better: null
    is_fit: true
    mode: regression
    models: null
    random_state: null
    tpot_kwargs:
      config_dict:
        sklearn.cluster.FeatureAgglomeration:
          affinity:
          - euclidean
          - l1
          - l2
          - manhattan
          - cosine
          linkage:
          - ward
          - complete
          - average
        sklearn.decomposition.FastICA:
          tol: "[0.   0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5  0.55 0.6 \
            \ 0.65\n 0.7  0.75 0.8  0.85 0.9  0.95 1.  ]"
        sklearn.decomposition.PCA:
          iterated_power: range(1, 11)
          svd_solver:
          - randomized
        sklearn.ensemble.ExtraTreesRegressor:
          bootstrap:
          - true
          - false
          max_features: '[0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95]'
          min_samples_leaf: range(1, 21, 3)
          min_samples_split: range(2, 21, 3)
          n_estimators:
          - 20
          - 100
          - 200
          - 500
          - 1000
        sklearn.ensemble.GradientBoostingRegressor:
          alpha:
          - 0.75
          - 0.8
          - 0.85
          - 0.9
          - 0.95
          - 0.99
          learning_rate:
          - 0.01
          - 0.1
          - 0.5
          - 1.0
          loss:
          - ls
          - lad
          - huber
          - quantile
          max_depth: range(1, 11, 2)
          max_features: "[0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5  0.55 0.6\
            \  0.65 0.7\n 0.75 0.8  0.85 0.9  0.95 1.  ]"
          min_samples_leaf: range(1, 21, 3)
          min_samples_split: range(2, 21, 3)
          n_estimators:
          - 20
          - 100
          - 200
          - 500
          - 1000
          subsample: "[0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5  0.55 0.6\
            \  0.65 0.7\n 0.75 0.8  0.85 0.9  0.95 1.  ]"
        sklearn.ensemble.RandomForestRegressor:
          bootstrap:
          - true
          - false
          max_features: '[0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95]'
          min_samples_leaf: range(1, 21, 3)
          min_samples_split: range(2, 21, 3)
          n_estimators:
          - 20
          - 100
          - 200
          - 500
          - 1000
        sklearn.feature_selection.SelectFromModel:
          estimator:
            sklearn.ensemble.ExtraTreesRegressor:
              max_features: '[0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95]'
              n_estimators:
              - 100
          threshold: "[0.   0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5  0.55\
            \ 0.6  0.65\n 0.7  0.75 0.8  0.85 0.9  0.95 1.  ]"
        sklearn.feature_selection.SelectFwe:
          alpha: "[0.    0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.01\
            \  0.011\n 0.012 0.013 0.014 0.015 0.016 0.017 0.018 0.019 0.02  0.021\
            \ 0.022 0.023\n 0.024 0.025 0.026 0.027 0.028 0.029 0.03  0.031 0.032\
            \ 0.033 0.034 0.035\n 0.036 0.037 0.038 0.039 0.04  0.041 0.042 0.043\
            \ 0.044 0.045 0.046 0.047\n 0.048 0.049]"
          score_func:
            sklearn.feature_selection.f_regression: null
        sklearn.feature_selection.SelectPercentile:
          percentile: range(1, 100)
          score_func:
            sklearn.feature_selection.f_regression: null
        sklearn.feature_selection.VarianceThreshold:
          threshold:
          - 0.0001
          - 0.0005
          - 0.001
          - 0.005
          - 0.01
          - 0.05
          - 0.1
          - 0.2
        sklearn.kernel_approximation.Nystroem:
          gamma: "[0.   0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5  0.55 0.6\
            \  0.65\n 0.7  0.75 0.8  0.85 0.9  0.95 1.  ]"
          kernel:
          - rbf
          - cosine
          - chi2
          - laplacian
          - polynomial
          - poly
          - linear
          - additive_chi2
          - sigmoid
          n_components: range(1, 11)
        sklearn.kernel_approximation.RBFSampler:
          gamma: "[0.   0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5  0.55 0.6\
            \  0.65\n 0.7  0.75 0.8  0.85 0.9  0.95 1.  ]"
        sklearn.linear_model.ElasticNetCV:
          l1_ratio: "[0.   0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5  0.55\
            \ 0.6  0.65\n 0.7  0.75 0.8  0.85 0.9  0.95 1.  ]"
          tol:
          - 1e-05
          - 0.0001
          - 0.001
          - 0.01
          - 0.1
        sklearn.linear_model.LassoLarsCV:
          normalize:
          - true
          - false
        sklearn.linear_model.RidgeCV: {}
        sklearn.neighbors.KNeighborsRegressor:
          n_neighbors: range(1, 101)
          p:
          - 1
          - 2
          weights:
          - uniform
          - distance
        sklearn.preprocessing.Binarizer:
          threshold: "[0.   0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5  0.55\
            \ 0.6  0.65\n 0.7  0.75 0.8  0.85 0.9  0.95 1.  ]"
        sklearn.preprocessing.MaxAbsScaler: {}
        sklearn.preprocessing.MinMaxScaler: {}
        sklearn.preprocessing.Normalizer:
          norm:
          - l1
          - l2
          - max
        sklearn.preprocessing.PolynomialFeatures:
          degree:
          - 2
          include_bias:
          - false
          interaction_only:
          - false
        sklearn.preprocessing.RobustScaler: {}
        sklearn.preprocessing.StandardScaler: {}
        sklearn.svm.LinearSVR:
          C:
          - 0.0001
          - 0.001
          - 0.01
          - 0.1
          - 0.5
          - 1.0
          - 5.0
          - 10.0
          - 15.0
          - 20.0
          - 25.0
          dual:
          - true
          - false
          epsilon:
          - 0.0001
          - 0.001
          - 0.01
          - 0.1
          - 1.0
          loss:
          - epsilon_insensitive
          - squared_epsilon_insensitive
          tol:
          - 1e-05
          - 0.0001
          - 0.001
          - 0.01
          - 0.1
        sklearn.tree.DecisionTreeRegressor:
          max_depth: range(1, 11, 2)
          min_samples_leaf: range(1, 21, 3)
          min_samples_split: range(2, 21, 3)
        tpot.builtins.OneHotEncoder:
          minimum_fraction:
          - 0.05
          - 0.1
          - 0.15
          - 0.2
          - 0.25
          sparse:
          - false
          threshold:
          - 10
        tpot.builtins.ZeroCount: {}
        xgboost.XGBRegressor:
          learning_rate:
          - 0.01
          - 0.1
          - 0.5
          - 1.0
          max_depth: range(1, 11, 2)
          min_child_weight: range(1, 21, 4)
          n_estimators:
          - 20
          - 100
          - 200
          - 500
          - 1000
          nthread:
          - 1
          subsample: '[0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95]'
      cv: 5
      max_time_mins: 60
      memory: auto
      n_jobs: -1
      population_size: 20
      scoring: neg_mean_absolute_error
      template: Selector-Transformer-Regressor
      verbosity: 3
ml_type: regression
post_fit_df:
  columns: 41
  obj: <class 'pandas.core.frame.DataFrame'>
  samples: 537
pre_fit_df:
  columns: 3
  obj: <class 'pandas.core.frame.DataFrame'>
  samples: 537
reducer:
  reducer:
    _keep_features: []
    _logger: null
    _pca: null
    _pca_feats: null
    _remove_features: []
    corr_threshold: 0.95
    is_fit: true
    n_pca_features: auto
    n_rebate_features: 0.3
    reducer_params:
      tree:
        importance_percentile: 0.99
        mode: regression
        random_state: 0
    reducers:
    - corr
    - tree
    removed_features:
      corr:
      - MagpieData maximum Number
      - MagpieData range Number
      - MagpieData avg_dev Number
      - MagpieData mode Number
      - MagpieData maximum MendeleevNumber
      - MagpieData range MendeleevNumber
      - MagpieData minimum AtomicWeight
      - MagpieData mean AtomicWeight
      - MagpieData mode AtomicWeight
      - MagpieData minimum Column
      - MagpieData mean Row
      - MagpieData range NsValence
      - MagpieData mean NsValence
      - MagpieData avg_dev NsValence
      - MagpieData minimum NfValence
      - MagpieData maximum NfValence
      - MagpieData maximum NsUnfilled
      - MagpieData range NsUnfilled
      - MagpieData mean NsUnfilled
      - MagpieData range NdUnfilled
      - MagpieData maximum NfUnfilled
      - MagpieData range NfUnfilled
      - MagpieData mean NfUnfilled
      - MagpieData range GSbandgap
      - MagpieData mean GSbandgap
      - MagpieData avg_dev GSbandgap
      - MagpieData maximum GSmagmom
      - MagpieData range GSmagmom
      - MagpieData mean GSmagmom
      - MagpieData avg_dev GSmagmom
      - MagpieData minimum SpaceGroupNumber
      tree:
      - MagpieData minimum Number
      - MagpieData minimum MendeleevNumber
      - MagpieData mode MendeleevNumber
      - MagpieData maximum MeltingT
      - MagpieData range MeltingT
      - MagpieData avg_dev MeltingT
      - MagpieData mode MeltingT
      - MagpieData maximum Column
      - MagpieData range Column
      - MagpieData avg_dev Column
      - MagpieData mode Column
      - MagpieData minimum Row
      - MagpieData maximum Row
      - MagpieData range Row
      - MagpieData mode Row
      - MagpieData maximum CovalentRadius
      - MagpieData mode CovalentRadius
      - MagpieData minimum Electronegativity
      - MagpieData mode Electronegativity
      - MagpieData minimum NsValence
      - MagpieData maximum NsValence
      - MagpieData mode NsValence
      - MagpieData minimum NpValence
      - MagpieData maximum NpValence
      - MagpieData range NpValence
      - MagpieData mean NpValence
      - MagpieData mode NpValence
      - MagpieData minimum NdValence
      - MagpieData maximum NdValence
      - MagpieData range NdValence
      - MagpieData avg_dev NdValence
      - MagpieData range NfValence
      - MagpieData mode NfValence
      - MagpieData minimum NValence
      - MagpieData range NValence
      - MagpieData minimum NsUnfilled
      - MagpieData avg_dev NsUnfilled
      - MagpieData mode NsUnfilled
      - MagpieData minimum NpUnfilled
      - MagpieData maximum NpUnfilled
      - MagpieData mode NpUnfilled
      - MagpieData minimum NdUnfilled
      - MagpieData maximum NdUnfilled
      - MagpieData mean NdUnfilled
      - MagpieData avg_dev NdUnfilled
      - MagpieData mode NdUnfilled
      - MagpieData minimum NfUnfilled
      - MagpieData avg_dev NfUnfilled
      - MagpieData mode NfUnfilled
      - MagpieData minimum NUnfilled
      - MagpieData maximum NUnfilled
      - MagpieData mode NUnfilled
      - MagpieData maximum GSvolume_pa
      - MagpieData mode GSvolume_pa
      - MagpieData minimum GSbandgap
      - MagpieData maximum GSbandgap
      - MagpieData mode GSbandgap
      - MagpieData minimum GSmagmom
      - MagpieData mode GSmagmom
      - MagpieData maximum SpaceGroupNumber
      - MagpieData range SpaceGroupNumber
      - MagpieData mode SpaceGroupNumber
    retained_features:
    - MagpieData minimum MeltingT
    - MagpieData mean Column
    - MagpieData avg_dev MendeleevNumber
    - MagpieData range NUnfilled
    - MagpieData avg_dev NpUnfilled
    - MagpieData mean NUnfilled
    - MagpieData minimum GSvolume_pa
    - MagpieData mean NValence
    - MagpieData avg_dev SpaceGroupNumber
    - MagpieData mean Number
    - MagpieData range NpUnfilled
    - MagpieData avg_dev CovalentRadius
    - MagpieData mean MendeleevNumber
    - T
    - MagpieData avg_dev NpValence
    - MagpieData mean Electronegativity
    - MagpieData mode NValence
    - MagpieData mean NdValence
    - MagpieData avg_dev NValence
    - MagpieData mode NdValence
    - MagpieData mean MeltingT
    - MagpieData avg_dev NfValence
    - MagpieData mean GSvolume_pa
    - MagpieData maximum NValence
    - MagpieData mean CovalentRadius
    - MagpieData range GSvolume_pa
    - MagpieData maximum AtomicWeight
    - MagpieData maximum Electronegativity
    - MagpieData mean NfValence
    - MagpieData range AtomicWeight
    - MagpieData avg_dev AtomicWeight
    - MagpieData mean NpUnfilled
    - MagpieData avg_dev NUnfilled
    - MagpieData avg_dev Row
    - MagpieData avg_dev Electronegativity
    - MagpieData minimum CovalentRadius
    - MagpieData avg_dev GSvolume_pa
    - MagpieData range CovalentRadius
    - MagpieData range Electronegativity
    - MagpieData mean SpaceGroupNumber
    tree_importance_percentile: 0.99
target: zT

@janosh
Member Author

janosh commented Oct 8, 2019

@ardunn FYI, Codacy now complains that

Lambda may not be necessary
digeststr = json.dumps(attrs, default=lambda x: str(x))

but at my end it certainly is necessary. Without it, I get a bunch of

TypeError: Object of type 'ElementProperty' is not JSON serializable
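
For what it's worth, the Codacy warning is probably about the lambda wrapper rather than the `default` argument itself: `json.dumps` accepts any callable for `default`, so the builtin `str` can be passed directly. A minimal sketch, assuming a hypothetical stand-in class in place of the real matminer featurizer and a made-up `attrs` dict:

```python
import json


class ElementProperty:
    """Hypothetical stand-in for a featurizer object that json cannot serialize."""

    def __repr__(self):
        return "ElementProperty(stats=['minimum', 'maximum'])"


# Made-up attribute dict, only for illustration.
attrs = {"preset": "express", "featurizers": [ElementProperty()]}

# Without a `default` hook, json.dumps raises TypeError on the custom object.
try:
    json.dumps(attrs)
    raised = False
except TypeError:
    raised = True

# Passing the builtin `str` directly gives the same result as wrapping it in a lambda.
with_lambda = json.dumps(attrs, default=lambda x: str(x))
with_builtin = json.dumps(attrs, default=str)
```

Both calls fall back to the string form of any object the encoder does not know, so switching to `default=str` should keep the digest unchanged while avoiding the lint warning.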

@janosh janosh mentioned this pull request Oct 8, 2019
@ardunn
Contributor

ardunn commented Oct 8, 2019

@janosh That's OK for now. I might be able to update it on my end to appease Codacy, since I'll need to run the intensive tests (and hopefully fix any issues there) anyway.

@ardunn ardunn merged commit 364a068 into hackingmaterials:master Oct 8, 2019