-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISSUE[581] Add Built-in Execution Context Variables #654
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looking good! I thought that we should replace those system environment variables in the DAG definition to intermediate representation during DAG building stage and replace back them to the original when running the node(step), but the code change does not seem to include that part. Wouldn't it be necessary?
internal/dag/scheduler/node.go
Outdated
|
||
return os.Setenv(dag.RequestIDEnvKey, dagCtx.DaguRequestID) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we set the environment variables only for the child process rather than the entire agent process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: I thought that we should replace those system environment variables in the DAG definition to intermediate representation during DAG building stage and replace back them to the original when running the node(step), but the code change does not seem to include that part. Wouldn't it be necessary?
A: I attempted to implement this feature this afternoon, but I encountered a problem. All the environment variables such as ${DAG_REQUEST_ID} and ${GOPATH}, as well as other configured environment variables, are converted to an intermediate state {{DAG_REQUEST_ID}}. In the executor, {{DAG_REQUEST_ID}} is converted back into environment variables like ${DAG_REQUEST_ID}. However, in the executor command, when using echo requestId: ${DAG_REQUEST_ID}, the logs show the result as below. I found that to handle the processing of environment variables in echo, the transformation needs to be done within step.commandWithArgs.
Therefore, I do not perform the conversion and instead handle it by adding environment variables directly using os.SetEnv before util.SplitCommandWithParse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: Shouldn't we set the environment variables only for the child process rather than the entire agent process?
A: I am very sorry.I misunderstood, so I will rethink and adjust my code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no worries! Please keep up the great work. FYI, here's the code to set the environment variables: command.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the reminder. I forgot about this piece of code this afternoon. I now understand how to adjust it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yohamta hi! Your provided approach is quite good, but I've noticed it may lead to some problems:
- The step in multiple DAGs needs to be incremented because if there are multiple DAGs running simultaneously,
${STEP_1_DAG_SCHEDULER_LOG_PATH}
could lead to conflicts. - When spawning the child process, we should make sure to pass these variables using their original keys (like
DAG_SCHEDULER_LOG_PATH
etc)
There could be conflicts in environment variables if multiple steps are running.
for example :
A: Dag test , running step1 , pass${STEP_1_DAG_SCHEDULER_LOG_PATH}
toDAG_SCHEDULER_LOG_PATH
B: Dag anotherOneTest, running step2 , pass${STEP_2_DAG_SCHEDULER_LOG_PATH}
toDAG_SCHEDULER_LOG_PATH
Running the steps of the two DAGs(A and B) simultaneously could result in conflicts with the DAG_SCHEDULER_LOG_PATH.
I believe converting each step's DAG_SCHEDULER_LOG_PATH
to internal, step-specific versions is a good approach.Perhaps we can pass variables to {STEP_1_DAG_SCHEDULER_LOG_PATH} environment variable before uitil.SplitCommandWithParse() function
Upon executing the uitil.SplitCommandWithParse() function can substitute the corresponding ${STEP_1_DAG_SCHEDULER_LOG_PATH} environment variable with its own specific data.
- If we are to convert
DAG_SCHEDULER_LOG_PATH
to${STEP_ID_DAG_SCHEDULER_LOG_PATH}
, we need to specifically determine whether it is the designated environment variable, such asDAG_SCHEDULER_LOG_PATH
. In this case, if the scenario is only used for echoing the corresponding build context variables in the command, we can directly replace the specific ${DAG_SCHEDULER_LOG_PATH}. Are there any other considerations for different scenarios?
I'm not sure if my explanation is clear, but if you have any new ideas, we can continue the discussion. Thank you for providing such a great approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @halalala222, thank you for thinking about this so thoroughly. I have a few thoughts on your points.
The step in multiple DAGs needs to be incremented because if there are multiple DAGs running simultaneously,
${STEP_1_DAG_SCHEDULER_LOG_PATH}
could lead to conflicts.
When spawning the child process, we should make sure to pass these variables using their original keys (likeDAG_SCHEDULER_LOG_PATH
etc)
There could be conflicts in environment variables if multiple steps are running.
for example :
A: Dag test , running step1 , pass${STEP_1_DAG_SCHEDULER_LOG_PATH}
toDAG_SCHEDULER_LOG_PATH
B: Dag anotherOneTest, running step2 , pass${STEP_2_DAG_SCHEDULER_LOG_PATH}
toDAG_SCHEDULER_LOG_PATH
Running the steps of the two DAGs(A and B) simultaneously could result in conflicts with theDAG_SCHEDULER_LOG_PATH
.
You're absolutely right that if we use some global value, it would lead to conflicts. However, I think this should be local. Setting an environment variable inside a process does not affect other processes. We can just assign a new internal id field to each step, starting from 1, every time, during the DAG building step. This value would be strictly scoped within the agent process, so we don't have to worry about conflicts with other concurrent workflow executions.
Regarding your example:
A: Dag test, running step1, pass
${STEP_1_DAG_SCHEDULER_LOG_PATH}
toDAG_SCHEDULER_LOG_PATH
B:Dag anotherOneTest, running step2, pass ${STEP_2_DAG_SCHEDULER_LOG_PATH}
toDAG_SCHEDULER_LOG_PATH
Running the steps of the two DAGs (A and B) simultaneously wouldn't result in conflicts with the DAG_SCHEDULER_LOG_PATH
because each would be in its own process.
If we are to convert DAG_SCHEDULER_LOG_PATH to ${STEP_ID_DAG_SCHEDULER_LOG_PATH}, we need to specifically determine whether it is the designated environment variable, such as DAG_SCHEDULER_LOG_PATH. In this case, if the scenario is only used for echoing the corresponding build context variables in the command, we can directly replace the specific ${DAG_SCHEDULER_LOG_PATH}. Are there any other considerations for different scenarios?
Yes, there are other scenarios to consider. For example, users might want to use the environment variable inside a Python script, where we can't replace the string. So I think just replacing the command arguments doesn't work for all use cases.
Thank you for bringing up these important considerations. Let me know if you have any thoughts on this or if there's anything else we should discuss.
However, I've been thinking that this might be a more complex implementation than we initially thought, with potential impacts in various areas of the codebase. Sorry for the issue description was not very clear in details. Given this complexity, it might be better to put this aside for now and focus on other issues, for example paging parameters for DAG list API. What do you think?
Thank you for your hard work and deep thoughts on this PR. It is very helpful for me to clarify the problem and potential solutions significantly. Your contributions are truly valuable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @yohamta Thank you for your reply! I think so, it might be better to put aside this for now and focus on DAG list API pagination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yohamta hi!I will start working on this issue again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @halalala222, sure! Thank you very much! That would be awesome!
61e3c13
to
e2957f6
Compare
Command echo
|
@yohamta Hi, I've updated my PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for the amazing work! Overall looking good! However, there're some changes I would like to suggest.
internal/dag/executor/command.go
Outdated
cmd.Dir = step.Dir | ||
cmd.Env = append(cmd.Env, os.Environ()...) | ||
cmd.Env = append(cmd.Env, step.Variables...) | ||
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", constants.StepDaguExecutionLogPathKeySuffix, os.Getenv(stepSpecialExecutionLogPathKey))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change this to something like this?
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", constants.StepDaguExecutionLogPathKeySuffix, os.Getenv(stepSpecialExecutionLogPathKey))) | |
// Set environment variables from the DAG context. | |
dagContext, err := dag.GetContext(ctx) | |
if err != nil { | |
return nil, err | |
} | |
cmd.Env = append(cmd.Env, dagContext.Envs...) |
internal/dag/scheduler/node.go
Outdated
if err = os.Setenv(util.GenerateStepSpecialExecutionLogPathKey(n.id), n.data.State.Log); err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use DAG context to pass down the special environment variables?
if err = os.Setenv(util.GenerateStepSpecialExecutionLogPathKey(n.id), n.data.State.Log); err != nil { | |
return err | |
} | |
// Execute runs the command synchronously and returns error if any. | |
func (n *Node) Execute(ctx context.Context) error { | |
dagContext, err := dag.GetContext(ctx) | |
if err !=nil { | |
return err | |
} | |
// set special environment variables | |
globalStepLogEnvKey := dag.GenGlobalStepLogEnvKey(n.Step.ID) | |
if err = os.Setenv(globalStepLogEnvKey, n.data.State.Log); err != nil { | |
return err | |
} | |
dagContext.Envs = append( | |
dagContext.Envs, | |
dag.StepLogPathEnvKey + "=" + n.data.State.Log, | |
) | |
ctx = dag.WithContext(ctx, dagContext) | |
// ...... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes!!!!! Thank you!
internal/util/utils.go
Outdated
@@ -225,3 +228,21 @@ func AddYamlExtension(file string) string { | |||
} | |||
return file | |||
} | |||
|
|||
func GenerateStepSpecialExecutionLogPathKey(stepID int) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we define this function within the dag
package? I wonder that if it could be smaller scoped, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Thank you!
I will modify my code during the day tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yohamta Hi! Hi, I've updated my PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!👍✨✨✨🚀🚀🚀 Thank you very much!
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #654 +/- ##
==========================================
- Coverage 65.12% 64.72% -0.41%
==========================================
Files 53 53
Lines 4278 4306 +28
==========================================
+ Hits 2786 2787 +1
Misses 1263 1263
- Partials 229 256 +27
... and 5 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
Pass Build-in Execution Context Variables as command args.
example: