Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature-13331][Remote Logging] Add support for writing task logs to OSS #13332

Merged
merged 3 commits into from
Feb 13, 2023

Conversation

rickchengx
Copy link
Contributor

@rickchengx rickchengx commented Jan 4, 2023

Purpose of the pull request

Add support for writing task logs to OSS

Brief change log

截屏2023-01-18 14 20 57

Task log writing

  • master / worker will send the task log to the remote storage asynchronously after the task is finished if remote.logging.enable=true (By default it's false)

Task log reading

  • master / worker will first read the task log if the log file exists on the local file system
  • if the task log file does not exist, master / worker will download the task log file from the remote storage to the local file system and then read the task log file

Log retention

  • Log retention can be directly configured using the retention policy provided by the remote storage
  • E.g., log retention on OSS

Verify this pull request

manually tested

  1. Configure related variables in common.properties
# Whether to enable remote logging
remote.logging.enable=true
# if remote.logging.enable = true, set the target of remote logging
remote.logging.target=OSS
# log base directory
remote.logging.base.dir=logs
# oss access key id, required if you set remote.logging.target=OSS
remote.logging.oss.access.key.id=<access.key.id>
# oss access key secret, required if you set remote.logging.target=OSS
remote.logging.oss.access.key.secret=<access.key.secret>
# oss bucket name, required if you set remote.logging.target=OSS
remote.logging.oss.bucket.name=<bucket.name>
# oss endpoint, required if you set remote.logging.target=OSS
remote.logging.oss.endpoint=<endpoint>
  1. Create a shell task

  2. View the task log

截屏2023-01-04 11 08 23

  1. Check that the task log is sent to OSS

截屏2023-01-04 11 22 02

  1. delete the local task log file

  2. View the task log again. The task log will be download from OSS.

截屏2023-01-04 11 10 36

@codecov-commenter
Copy link

codecov-commenter commented Jan 4, 2023

Codecov Report

Merging #13332 (02cd619) into dev (a6550bc) will decrease coverage by 0.11%.
The diff coverage is 6.89%.

@@             Coverage Diff              @@
##                dev   #13332      +/-   ##
============================================
- Coverage     39.57%   39.47%   -0.11%     
- Complexity     4373     4377       +4     
============================================
  Files          1097     1102       +5     
  Lines         41293    41437     +144     
  Branches       4723     4736      +13     
============================================
+ Hits          16342    16356      +14     
- Misses        23138    23263     +125     
- Partials       1813     1818       +5     
Impacted Files Coverage Δ
...e/dolphinscheduler/common/constants/Constants.java 75.00% <ø> (ø)
...r/common/log/remote/RemoteLogHandleThreadPool.java 0.00% <0.00%> (ø)
...ler/common/log/remote/RemoteLogHandlerFactory.java 0.00% <0.00%> (ø)
...nscheduler/common/log/remote/RemoteLogService.java 0.00% <0.00%> (ø)
...hinscheduler/common/log/remote/RemoteLogUtils.java 0.00% <0.00%> (ø)
.../server/master/runner/WorkflowExecuteRunnable.java 10.09% <0.00%> (-0.06%) ⬇️
...duler/remote/processor/LoggerRequestProcessor.java 0.00% <0.00%> (ø)
...duler/plugin/task/api/AbstractCommandExecutor.java 0.00% <0.00%> (ø)
...lphinscheduler/plugin/task/api/utils/LogUtils.java 26.96% <ø> (ø)
...heduler/common/log/remote/OssRemoteLogHandler.java 16.00% <16.00%> (ø)
... and 5 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

# remote logging
remote.logging.enable=false
# if remote.logging.enable = true, set the target of remote logging
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a comment to tell users we support oss only currently?

Copy link
Contributor Author

@rickchengx rickchengx Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add it in the doc.

remote.logging.oss.endpoint=<endpoint>
# oss base directory, required if you set remote.logging.target=OSS
remote.logging.oss.base.dir=logs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure whether some of our remote log plugin without base.dir, if all of them have this config, we should use remote.logging.base.dir instead of remote.logging.oss.base.dir

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought about this problem, and I rechecked the relevant configuration in Airflow.

I think we could use remote.logging.base.dir instead, which can be shared by multiple plugins (OSS / S3 / GCS)


void sendRemoteLog(String logPath);

void getRemoteLog(String logPath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also not add an interface to test whether the current remote target work or not?

Copy link
Contributor Author

@rickchengx rickchengx Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @zhongjiajie , I am not sure about this.

Do you mean that the user can click a button somewhere on the front end to test whether the remote storage is available?

The usage scenario of remote logging is somewhat similar to that of the resource center, but it seems that StorageOperate does not currently have a similar interface.

But I add checkBucketNameExists(), like OssStorageOperator and S3StorageOperator.


public RemoteLogHandler getRemoteLogHandler() {
if ("true".equalsIgnoreCase(PropertyUtils.getString(Constants.REMOTE_LOGGING_ENABLE))) {
if ("OSS".equalsIgnoreCase(PropertyUtils.getString(Constants.REMOTE_LOGGING_TARGET))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have getUpperCaseString method now

Suggested change
if ("OSS".equalsIgnoreCase(PropertyUtils.getString(Constants.REMOTE_LOGGING_TARGET))) {
if ("OSS".equals(PropertyUtils.getUpperCaseString(Constants.REMOTE_LOGGING_TARGET))) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll use getUpperCaseString.

public class RemoteLogHandlerFactory {

public RemoteLogHandler getRemoteLogHandler() {
if ("true".equalsIgnoreCase(PropertyUtils.getString(Constants.REMOTE_LOGGING_ENABLE))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about use PropertyUtils.getBoolean directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

private String getObjectNameFromLogPath(String logPath) {
Path path = Paths.get(logPath);
int nameCount = path.getNameCount();
if (nameCount < 2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use constants for magic number 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Comment on lines 2188 to 2189
if ("true".equalsIgnoreCase(PropertyUtils.getString(Constants.REMOTE_LOGGING_ENABLE))) {
if (taskInstance.getHost().endsWith(masterAddress.split(":")[1])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use PropertyUtils.getBoolean instead of string, and we have COLON in constants for :

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

logger.info("Succeed to send master's log {} to remote target {}", taskInstance.getLogPath(),
PropertyUtils.getString(Constants.REMOTE_LOGGING_TARGET));
} catch (Exception e) {
logger.error("send master's log {} to remote target error", taskInstance.getLogPath(), e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find out we already catch exceptions in sendRemoteLog method is there any expected exception will be thrown during log sending? if not I think one catch is enough, in handler or in currently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll look into it.

logger.info("Start to send log {} to remote target {}", taskExecutionContext.getLogPath(),
PropertyUtils.getString(Constants.REMOTE_LOGGING_TARGET));
logger.info("Wait log {} to be flushed...", taskExecutionContext.getLogPath());
Thread.sleep(5000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we have to sleep for 5s here? can we run continue when we get log path immediately?

Copy link
Contributor Author

@rickchengx rickchengx Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for waiting here is to allow the final output of the log to be written to the local file
Here is an example.

截屏2023-01-05 10 19 26

The green box in the figure is the last output of the task log, which is later than the time when the log is uploaded to the remote storage.

Therefore, if the thread does not use sleep(), the log uploaded to the remote storage will lose the statements in the green box.

WDYT, or is there any other better way to flush the final output of the log?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you trigger sendRemoteLogIfNeeded in afterExecute, it does have to sleep for a few seconds, because when handling large amount of logs, async flush will take a lot of time.

So, I recommend you trigger in logHandle when handling FINALIZE_SESSION_MARKER which marks that log appender will be closed in few seconds, flush logic in ds(you can check in parseProcessOutput) makes sure that no logs will reach in these few seconds and we don't have to wait for five seconds any more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you trigger sendRemoteLogIfNeeded in afterExecute, it does have to sleep for a few seconds, because when handling large amount of logs, async flush will take a lot of time.

So, I recommend you trigger in logHandle when handling FINALIZE_SESSION_MARKER which marks that log appender will be closed in few seconds, flush logic in ds(you can check in parseProcessOutput) makes sure that no logs will reach in these few seconds and we don't have to wait for five seconds any more.

Hi, @Radeity Thanks a lot for your suggestion! I think this is a good idea and I'll look into it.

@rickchengx
Copy link
Contributor Author

Hi, @zhongjiajie , thanks a lot for your review and comments.

I will carefully modify according to the suggestions and add related UT and documents.

@EricGao888 EricGao888 added this to the 3.2.0 milestone Jan 5, 2023
@EricGao888 EricGao888 added feature new feature improvement make more easy to user or prompt friendly 3.2.0 for 3.2.0 version labels Jan 5, 2023
@rickchengx
Copy link
Contributor Author

rickchengx commented Jan 17, 2023

Hi, @zhongjiajie @Radeity, thanks again for your kind review. I have modified this PR according to your comments and suggestions.

Here are some brief changes:

  1. Remove the Thread.sleep() in WorkerTaskExecuteRunnable
  • send the task log to the remote storage in clear() method in AbstractCommandExecutor, so there is no need to sleep anymore.

截屏2023-01-17 15 07 36

  • Note that not all tasks will use logHandle and FINALIZE_SESSION_MARKER, such as ZEPPELIN task and the tasks executed on master. So this PR also sends the task log to the remote storage in afterExecuted() and afterThrowing() on worker (if the task does not use logHandle) and taskFinished on master (if the task is executed on master).

Here is some examples:

  • Shell Task (use logHandle)

截屏2023-01-16 14 08 29

  • Zeppelin Task (does not use logHandle)

截屏2023-01-16 14 08 50

  1. Add the doc

截屏2023-01-18 14 03 14

  • I am not sure if the location of this doc is appropriate.

@rickchengx rickchengx force-pushed the Feature-13331 branch 2 times, most recently from 9700d2b to c7b29cf Compare January 17, 2023 07:44
@rickchengx rickchengx marked this pull request as draft January 17, 2023 08:48
@rickchengx rickchengx marked this pull request as ready for review January 17, 2023 08:54
@rickchengx rickchengx marked this pull request as draft January 18, 2023 02:24
@rickchengx rickchengx marked this pull request as ready for review January 18, 2023 05:59
@rickchengx
Copy link
Contributor Author

rickchengx commented Jan 18, 2023

In order to prevent remote logging from affecting the end processing process of the task, the log is written to the remote storage in an asynchronous manner.

截屏2023-01-18 14 20 57

@rickchengx
Copy link
Contributor Author

Hi, @zhongjiajie , could you please help review this again?

@rickchengx rickchengx force-pushed the Feature-13331 branch 2 times, most recently from 5af18c6 to 475906d Compare February 9, 2023 08:27
@@ -100,6 +101,10 @@ public AbstractCommandExecutor(Consumer<LinkedBlockingQueue<String>> logHandler,
this.taskRequest = taskRequest;
this.logger = logger;
this.logBuffer = new LinkedBlockingQueue<>();

if (this.taskRequest != null) {
this.taskRequest.setLogHandleEnable(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why set default true? Please add some comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed logHandleEnable to logBufferEnable for better readability.

This PR records whether the logBuffer is enabled in the task context for the following resaons:

There are two types of tasks:

  1. Tasks use logBuffer to cache logs (tasks that use ShellCommandExecutor which extends AbstractCommandExecutor)
  • At the end of this type of task, the log is not completely written to the file system, so it needs to write the log to the remote storage when the cache is finally emptied
  • it sends the task log to the remote storage in clear() method in AbstractCommandExecutor as below

截屏2023-01-17 15 07 36

  1. Tasks that not use logBuffer
  • Note that not all tasks will use logBuffer , such as ZEPPELIN task and the tasks executed on master. So this PR also sends the task log to the remote storage in afterExecuted() and afterThrowing() on worker (if the task does not use logBuffer) and taskFinished on master (if the task is executed on master).

@rickchengx rickchengx force-pushed the Feature-13331 branch 2 times, most recently from b93a2e4 to 9252c8e Compare February 10, 2023 02:32
@rickchengx
Copy link
Contributor Author

Hi, @caishunfeng , thanks a lot for your review and comments. I've modified it according to your suggestions, could you please help review this again?

@rickchengx rickchengx force-pushed the Feature-13331 branch 2 times, most recently from b70bed3 to 72c3daf Compare February 10, 2023 10:29
caishunfeng
caishunfeng previously approved these changes Feb 12, 2023
Copy link
Contributor

@caishunfeng caishunfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@caishunfeng
Copy link
Contributor

Hi, @caishunfeng , thanks a lot for your review and comments. I've modified it according to your suggestions, could you please help review this again?

Sure, and please resolve the conflicts.

@rickchengx
Copy link
Contributor Author

Hi, @caishunfeng , I've resolved the conflicts.

Copy link
Member

@zhongjiajie zhongjiajie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, one more things, we should add our new adding docs remote-logging.md to https://github.com/apache/dolphinscheduler/blob/dev/docs/configs/docsdev.js file, otherwise your new docs will not show in our website

@rickchengx
Copy link
Contributor Author

@sonarcloud
Copy link

sonarcloud bot commented Feb 13, 2023

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 6 Code Smells

8.4% 8.4% Coverage
0.0% 0.0% Duplication

@rickchengx
Copy link
Contributor Author

Hi, @zhongjiajie , I've add configs in docsdev.js

Copy link
Member

@zhongjiajie zhongjiajie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now

@zhongjiajie zhongjiajie merged commit 2bd65fb into apache:dev Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.2.0 for 3.2.0 version backend document feature new feature improvement make more easy to user or prompt friendly
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature][Remote Logging] Add support for writing task logs to OSS
6 participants