v24.08.0

github-actions released this 13 Aug 02:52

Packages

Changes

User Tools

  • Remove calculation of GPU cluster recommendation from Python tool when cluster argument is passed (#1278)
  • Remove unused argument --target_platform in Python Tool (#1279)
  • Qualification tool: Add output stats file for Execs(operators) (#1225)
  • Include GPU information in the cluster recommendation for Dataproc and OnPrem (#1265)
  • Remove speedup based recommendation column from qual_summary csv (#1268)
  • Fix prediction CSV files for multiple qual directories (#1267)
  • Clean up tools after removing CLI dependency (#1256)
  • Rename cluster shape columns to use 'worker' prefix in the output files and rename metadata file (#1258)
  • Remove CLI dependency in Dataproc _pull_gpu_hw_info implementation (#1245)
  • Replace split_nds with split_train_val (#1252)
  • Update xgboost models and metrics (#1244)
  • Add footnotes for config recommendations and speedup category in top candidate view (#1243)
  • [BUG] Update Dataproc instance catalog for n1 series GPU info (#1242)
  • Improvements in Cluster Config Recommender (#1241)
  • Improve console output from Python tool for failed/GPU/Photon event logs (#1235)
  • [FEA] Generate and use instance description file for Databricks-Azure platform (#1232)
  • Remove arguments related to cost-savings (#1230)
  • Updated models for latest databricks-aws datasets (#1231)
  • Refactor QualX for Linter and Test Compatibility (#1228)
  • Generate summary metadata file and fix node recommendation in python (#1216)
  • [FEA] Remove gcloud CLI dependency for Dataproc platform (#1223)
  • Updated models for latest dataproc eventlogs (#1226)
  • Remove estimation-model column from qualification summary (#1220)
  • Add option to add features.csv files to training set (#1212)
  • Disable cost saving functionality (#1218)
  • [FEA] Remove CLI dependency for EMR and Databricks-AWS platforms in user tool (#1196)
  • Fix some basic pylint errors in qualx code (#1210)
  • Qual tool: base tuning recommendations on CPU event logs, coherently recommend tunings and node setup, and infer cluster from event log (#1188)
  • Add shap command to internal CLI for debugging (#1197)
  • Add internal CLI to generate instance descriptions for CSPs (#1137)
  • [FEA] Support custom XGBoost model file via user tools CLI (#1184)
  • Updated models for new training data (#1186)
  • Add evaluate_summary command to internal CLI (#1185)
  • [DOC] Fix broken link to QualX docs and update Python prerequisites (#1180)
  • Bump to certifi-2024.7.4 and urllib3-1.26.19 (#1173)
  • Disable UI-HTML report by default in Qualification tool (#1168)
  • Fix parsing App IDs inside metrics directory in QualX (#1167)
  • Refactor Databricks-AWS Qual tool to cache and process pricing info from DB website (#1141)
  • Add plugin mechanism for dataset-specific preprocessing in qualx (#1148)
  • Unsupported op logic should read action column from qual's output (#1150)
  • Update qualx readme for training (#1140)
  • Disable pylint-unreachable code in tox.ini (#1145)

Core

  • Include GPU information in the cluster recommendation for Dataproc and OnPrem (#1265)
  • [TASK] Optimize the storage of accumulables in core tools (#1263)
  • Sync GetJsonObject support with Rapids-Plugin (#1266)
  • Do not create new StageInfo object (#1261)
  • [FEA] Add support for map_from_arrays in qualification tools (#1248)
  • Rename cluster shape columns to use 'worker' prefix in the output files and rename metadata file (#1258)
  • Fix stage level metrics output csv file (#1251)
  • Handle event logs with wildcards in status report generation (#1237)
  • Fix duplicate records in DataSourceInfo report (#1227)
  • Reduce memory footprint of stageInfo (#1222)
  • Ensure UTF-8 encoding for reading non-English characters (#1211)
  • Sync plugin support for hash-hive and shift operators (#1198)
  • Sync-up the support of parse_url in qualification tool (#1195)
  • Include status information for failed event logs in core tool (#1187)
  • [FEA] Adding Benchmarking classes to evaluate core tools performance (#1169)
  • [BUG] Fix handling of non-English characters in tools output files (#1189)
  • [BUG] Fix Java Qual tool handling of --platform argument (#1161)
  • Add all stage metrics to tools output (#1151)
  • Follow-up 1142: remove TODO line (#1146)
  • Mark wholestageCodeGen as shouldRemove when child nodes are removed (#1142)
  • [FEA] Display full failure messages in failed CSV files (#1135)

Miscellaneous

  • Qualification tool: Add option to filter event logs for a maximum file system size (#1275)
  • Qualification tool should print Kryo related recommendations (#1204)
  • Fix header check script to exclude files (#1224)
  • Update header check script for pre-commit hooks (#1219)
  • Follow-up 1189: handle non-English characters in data-output.js (#1208)
  • Update pre-commit hooks to check for headers and white-spaces (#1205)
  • user-tools: Update --help for cluster argument (#1178)
  • Support fine-tuning models (#1174)