
Auto tuning feature enhancements #379

Merged: 6 commits into linkedin:master on May 21, 2018

Conversation

@arpang (Contributor) commented on May 10, 2018

Auto tuning feature enhancements:

  1. Tuning switches off automatically when:
    • the parameters converge
    • the maximum number of tuning iterations is reached
    • there is no gain in the cost function
    (a sketch of this decision follows the list below)
  2. Remembers and returns the best parameter set when:
    • tuning is switched off
    • no new parameter suggestion exists for the job
    • an execution has failed and a retry is attempted
  3. Parameters being tuned now take only discrete values defined by the step size
  4. Bug fixes
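A minimal sketch of the switch-off decision from item 1, under stated assumptions: isParamConverged and maxTuningExecutions are hypothetical names (only isMedianGainNegative appears in the review below), and the real checks live in FitnessComputeUtil.java.

```java
// Sketch only: tuning is disabled once any stopping condition holds.
// isParamConverged and maxTuningExecutions are hypothetical names;
// the actual checks live in FitnessComputeUtil.java.
private boolean shouldSwitchOffTuning(List<JobExecution> jobExecutions) {
  return isParamConverged(jobExecutions)              // parameter set has converged
      || isMedianGainNegative(jobExecutions)          // no gain in the cost function
      || jobExecutions.size() >= maxTuningExecutions; // max tuning iterations reached
}
```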

File-wise summary of changes:

  1. Renamed the exception_class parameter in Scheduler.conf to workflow_client
  2. Removed APIFitnessComputeUtil.java
  3. Remember and return the best parameter set
  4. Changes in FitnessComputeUtil.java:
    • Check whether tuning can be switched off. The qualifying scenarios are:
      • the parameters converge
      • the median gain is negative
      • the maximum number of executions is reached
    • Removed the penalty applied in case of failed executions
    • Normalized metric violations by input size
    • Check for and record the best parameter set (see the sketch after this list)
  5. Changes in JobTuningInfo.java, PSOParamGenerator.java, ParamGenerator.java, and pso_param_generation.py: added jobType information
  6. Added the missing Javadocs
  7. Removed unused variables from the models
  8. Changes in tunein-test1.sql and test-init.sql: added the column names in the INSERT statements
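A rough sketch of the best-parameter-set bookkeeping from item 4, assuming fitness is the cost being minimized; the field and method names here are illustrative, not the PR's exact API.

```java
// Sketch only: remember the execution with the lowest fitness (cost) seen so
// far, so it can be returned when tuning is switched off, no new suggestion
// exists, or a failed execution is retried. Names are illustrative.
private void updateBestParamSetIfNeeded(TuningJobDefinition job, JobExecution latest) {
  if (job.bestJobExecution == null || latest.fitness < job.bestJobExecution.fitness) {
    job.bestJobExecution = latest;
    job.save();
  }
}
```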

@akshayrai (Contributor) left a comment:

Looks good to merge after addressing the minor comments!

Thanks for cleaning up the code as well.

```java
 * @param jobExecId String jobExecId of the execution to which penalty has to be applied
 */
private void applyPenalty(String jobExecId) {
  Integer penaltyConstant = 3;
```
Contributor: Add a comment in the code on why 3 was chosen?

Contributor Author (arpang): Added.

```python
elif param_name[i] == PARAM_PIG_MAX_COMBINED_SPLIT_SIZE:
    max_combined_split_size_index = i
    pig_max_combined_split_size_index = i
elif param_name[i] == PARAM_SPARK_EXECUTOR_MEMORY:
```
Contributor: As you mentioned offline, it would make sense to send this together with the Spark changes.

Contributor Author (arpang): Done.

```sql
) ENGINE=InnoDB;

# --- !Downs
```
Contributor: Can you write the downs section for the alter and insert statements as well?

Contributor Author (arpang): Done.

```sql
INSERT INTO tuning_parameter VALUES (11, 'spark.memory.fraction', 2, 0.6, 0.1, 0.9, 0.1, 0, current_timestamp(0), current_timestamp(0));
INSERT INTO tuning_parameter VALUES (12, 'spark.memory.storageFraction', 2, 0.5, 0.1, 0.9, 0.1, 0, current_timestamp(0), current_timestamp(0));
INSERT INTO tuning_parameter VALUES (13, 'spark.executor.cores', 2, 1, 1, 1, 1, 0, current_timestamp(0), current_timestamp(0));
INSERT INTO tuning_parameter VALUES (14, 'spark.yarn.executor.memoryOverhead', 2, 384, 384, 1024, 100, 0, current_timestamp(0), current_timestamp(0));
```
Contributor: Can you add a comment on the background behind these constants?

Contributor Author (arpang), May 15, 2018: This was actually not needed now (it was needed for Spark tuning), so I removed it altogether.

```diff
@@ -38,8 +39,14 @@
 PARAM_MAPREDUCE_MAP_JAVA_OPTS = 'mapreduce.map.java.opts'
 PARAM_MAPREDUCE_REDUCE_JAVA_OPTS = 'mapreduce.reduce.java.opts'

 PARAM_SPARK_EXECUTOR_MEMORY = "spark.executor.memory"
```
Contributor: You can include these with the Spark changes.

Contributor Author (arpang): Done.

```java
  logger.info("Constraint violated: Sort memory > 60% of map memory");
  violations++;
}
if (mrMapMemory - mrSortMemory < 768) {
```
Contributor: It would be better to define variables for these constants like 768.
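One way to act on this suggestion, as a sketch: the 0.6 and 768 thresholds come from the checks quoted above, while the constant and method names are illustrative, not necessarily what the PR adopted.

```java
// Sketch only: named constants for the magic numbers in the checks above.
// Constant and method names are illustrative.
private static final double MAX_SORT_TO_MAP_MEMORY_RATIO = 0.6; // sort buffer at most 60% of map memory
private static final int MIN_MAP_OVER_SORT_HEADROOM_MB = 768;   // map memory must exceed sort memory by 768 MB

private int countMemoryConstraintViolations(double mrMapMemory, double mrSortMemory) {
  int violations = 0;
  if (mrSortMemory > MAX_SORT_TO_MAP_MEMORY_RATIO * mrMapMemory) {
    logger.info("Constraint violated: Sort memory > 60% of map memory");
    violations++;
  }
  if (mrMapMemory - mrSortMemory < MIN_MAP_OVER_SORT_HEADROOM_MB) {
    logger.info("Constraint violated: map memory exceeds sort memory by less than "
        + MIN_MAP_OVER_SORT_HEADROOM_MB + " MB");
    violations++;
  }
  return violations;
}
```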

```java
protected void updateExecutionMetrics(List<TuningJobExecution> completedExecutions) {
  for (TuningJobExecution tuningJobExecution : completedExecutions) {

private void updateExecutionMetrics(List<TuningJobExecution> completedExecutions) {
  Integer penaltyConstant = 3;
```
Contributor: int?

Contributor Author (arpang): Done.

```java
    .eq(TuningJobDefinition.TABLE.job + '.' + JobDefinition.TABLE.id, jobDefinition.id)
    .findUnique();
if (tuningJobDefinition.tuningEnabled == 1) {
  tuningJobDefinition.tuningEnabled = 0;
```
Contributor: You might want to log this event.

Contributor Author (arpang): Good point. Done.
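A tiny sketch of the kind of log line this could be; the message text is illustrative, and the id field is borrowed from the jobDefinition.id reference in the hunk above.

```java
if (tuningJobDefinition.tuningEnabled == 1) {
  tuningJobDefinition.tuningEnabled = 0;
  // Sketch only: record which job had tuning switched off; the exact
  // message and fields in the PR may differ.
  logger.info("Switching off tuning for job definition id: " + tuningJobDefinition.job.id);
  tuningJobDefinition.save();
}
```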

```java
 * @return true if the median gain is negative, else false
 */
private boolean isMedianGainNegative(List<JobExecution> jobExecutions) {
  int num_fitness_for_median = 6;
```
Contributor: Why 6?

Contributor Author (arpang): Explanation added in the Javadocs.
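One plausible reading of the check, as a sketch: compare the median fitness of the three most recent executions with the median of the three before them (hence six), and report a negative gain when the recent median is no better. This interpretation, and the assumption that JobExecution carries a numeric fitness (cost) field, are not confirmed by the PR's actual Javadoc.

```java
// Sketch only: one plausible implementation of the median-gain test, not the
// PR's actual code. Assumes jobExecutions is ordered oldest-to-newest and that
// JobExecution has a numeric fitness (cost) field being minimized; the 6
// splits into two comparison windows of 3 executions each.
private boolean isMedianGainNegative(List<JobExecution> jobExecutions) {
  int numFitnessForMedian = 6;
  int n = jobExecutions.size();
  if (n < numFitnessForMedian) {
    return false; // not enough history yet to judge the trend
  }
  double olderMedian = medianFitness(jobExecutions.subList(n - 6, n - 3));
  double recentMedian = medianFitness(jobExecutions.subList(n - 3, n));
  return recentMedian > olderMedian; // gain (older - recent) is negative
}

private double medianFitness(List<JobExecution> executions) {
  double[] fitness = executions.stream().mapToDouble(e -> e.fitness).sorted().toArray();
  return fitness[fitness.length / 2]; // middle element; exact for windows of 3
}
```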

@akshayrai akshayrai merged commit d3fb6ba into linkedin:master May 21, 2018
varunsaxena added a commit that referenced this pull request Aug 30, 2018
varunsaxena added a commit that referenced this pull request Aug 31, 2018
pralabhkumar pushed a commit to pralabhkumar/dr-elephant that referenced this pull request Aug 31, 2018
varunsaxena added a commit that referenced this pull request Oct 16, 2018
varunsaxena added a commit that referenced this pull request Oct 16, 2018
edwinalu pushed a commit to edwinalu/dr-elephant that referenced this pull request Oct 23, 2018