Implement diagnostic flare #454

Merged
merged 7 commits into from
Oct 9, 2024
23 changes: 22 additions & 1 deletion README.md
@@ -157,7 +157,11 @@ The environment variables with the `DATADOG_JENKINS_PLUGIN` namespace take prece

#### Logging

Logging is done by utilizing the `java.util.Logger`, which follows the [best logging practices for Jenkins][6]. To obtain logs, follow the directions in the [Jenkins logging documentation][6]. When adding a logger, all Datadog plugin functions start with `org.datadog.jenkins.plugins.datadog.` and the function name you are after should autopopulate. As of this writing, the only function available was `org.datadog.jenkins.plugins.datadog.listeners.DatadogBuildListener`.
Logging is done with `java.util.logging.Logger`, which follows the [best logging practices for Jenkins][6].

The plugin automatically registers a custom logger named "Datadog Plugin Logs" that writes the plugin's logs with level `INFO` or higher.
The custom logger registration can be disabled by setting the `DD_JENKINS_PLUGIN_LOG_RECORDER_ENABLED` environment variable to `false`.
If you want to see the plugin logs with maximum detail, manually change the level of the custom logger to `ALL`.
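
As a rough illustration of what raising the level to `ALL` means in `java.util.logging` terms, here is a minimal sketch. It assumes the plugin's loggers live under the `org.datadog.jenkins.plugins.datadog` prefix; the class `LogLevelSketch`, the `.sketch` logger name, and the in-memory handler are hypothetical and not part of the plugin. In Jenkins itself the level would normally be changed on the log recorder rather than in code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class LogLevelSketch {

    /** Collects records in memory so the effect of the level change is observable. */
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                records.add(record);
            }
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    /** Logs one FINE and one INFO message at the given logger level and returns how many were captured. */
    static int captureAt(Level level) {
        // Hypothetical logger name under the plugin's package prefix.
        Logger logger = Logger.getLogger("org.datadog.jenkins.plugins.datadog.sketch");
        CapturingHandler handler = new CapturingHandler();
        handler.setLevel(Level.ALL);         // the handler itself filters nothing
        logger.setUseParentHandlers(false);  // keep the demo output off the console
        logger.addHandler(handler);
        logger.setLevel(level);
        try {
            logger.fine("detailed message");
            logger.info("regular message");
            return handler.records.size();
        } finally {
            logger.removeHandler(handler);
        }
    }

    public static void main(String[] args) {
        System.out.println("captured at INFO: " + captureAt(Level.INFO));
        System.out.println("captured at ALL:  " + captureAt(Level.ALL));
    }
}
```

At `INFO` the `FINE` message is dropped by the logger before it reaches any handler, which is why the default recorder shows less detail than one set to `ALL`.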

## Customization

@@ -422,6 +426,23 @@ NOTE: As mentioned in the [job customization](#job-customization) section, there

Build status `jenkins.job.status` with the default tags: `jenkins_url`, `job`, `node`, `user_id`

## Troubleshooting

### Generating a diagnostic flare

The plugin's diagnostic flare contains data that can be used to diagnose problems with the plugin.
At the time of this writing, the flare includes the following:
- plugin configuration in XML format
- results of the plugin connectivity checks
- runtime data (current versions of the JVM, Jenkins Core, and the plugin)
- recent exceptions thrown inside the plugin code
- plugin logs with level `INFO` and above, and recent Jenkins controller logs
- current stack traces of the threads of the Jenkins controller process
- environment variables starting with `DD_` or `DATADOG_` (except API key and/or APP key)

To generate a flare, go to the `Manage Jenkins` page, find the `Troubleshooting` section, and select `Datadog`.
Click `Download Diagnostic Flare` (requires the "MANAGE" permission) to download the flare.
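
The environment-variable item in the list above implies a filtering step. The following is a minimal sketch of one way such a filter could work; it is not the plugin's implementation. The class `EnvVarFilterSketch` is hypothetical, and recognizing secrets by the substrings `API_KEY` and `APP_KEY` in the variable name is an assumption made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

final class EnvVarFilterSketch {

    /** Keeps DD_ / DATADOG_ variables and drops anything that looks like an API or APP key. */
    static Map<String, String> selectForFlare(Map<String, String> env) {
        Map<String, String> selected = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, String> entry : env.entrySet()) {
            String name = entry.getKey();
            boolean datadogVariable = name.startsWith("DD_") || name.startsWith("DATADOG_");
            // Assumption: key-bearing variables can be recognized by their name.
            boolean secret = name.contains("API_KEY") || name.contains("APP_KEY");
            if (datadogVariable && !secret) {
                selected.put(name, entry.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        selectForFlare(System.getenv()).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

Filtering by name keeps the sketch simple; a real implementation may need an explicit deny list if secrets can appear under other names.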

## Issue tracking

GitHub's built-in issue tracking system is used to track all issues relating to this plugin: [jenkinsci/datadog-plugin/issues][7].
5 changes: 5 additions & 0 deletions pom.xml
@@ -205,6 +205,11 @@
      <artifactId>bcpg-jdk18on</artifactId>
      <version>1.72</version>
    </dependency>
    <dependency>
      <groupId>io.jenkins.lib</groupId>
      <artifactId>support-log-formatter</artifactId>
      <version>1.2</version>
    </dependency>
  </dependencies>

  <build>
Binary file added src/main/.DS_Store
Binary file not shown.
@@ -0,0 +1,145 @@
package org.datadog.jenkins.plugins.datadog;

import edu.umd.cs.findbugs.annotations.CheckForNull;
import edu.umd.cs.findbugs.annotations.NonNull;
import hudson.Extension;
import hudson.ExtensionList;
import hudson.model.ManagementLink;
import hudson.security.Permission;
import jenkins.model.Jenkins;
import net.sf.json.JSONArray;
import net.sf.json.JSONObject;
import org.apache.commons.lang3.exception.ExceptionUtils;
import org.datadog.jenkins.plugins.datadog.flare.FlareContributor;
import org.kohsuke.stapler.StaplerRequest;
import org.kohsuke.stapler.StaplerResponse;
import org.kohsuke.stapler.interceptor.RequirePOST;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServletResponse;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

@Extension
public class DatadogPluginManagement extends ManagementLink {

    private static final Logger LOGGER = Logger.getLogger(DatadogPluginManagement.class.getName());

    private final List<FlareContributor> contributors;

    public DatadogPluginManagement() {
        contributors = new ArrayList<>(ExtensionList.lookup(FlareContributor.class));
        contributors.sort(Comparator.comparingInt(FlareContributor::order));
    }

    public List<FlareContributor> getContributors() {
        return contributors;
    }

    @CheckForNull
    @Override
    public String getIconFileName() {
        return "/plugin/datadog/icons/dd_icon_rgb.svg";
    }

    @CheckForNull
    @Override
    public String getDisplayName() {
        return "Datadog";
    }

    @CheckForNull
    @Override
    public String getUrlName() {
        return "datadog";
    }

    @Override
    public String getDescription() {
        return "Datadog Plugin Troubleshooting";
    }

    @NonNull
    @Override
    public Category getCategory() {
        return Category.TROUBLESHOOTING;
    }

    @NonNull
    @Override
    public Permission getRequiredPermission() {
        return Jenkins.MANAGE;
    }

    @RequirePOST
    public void doDownloadDiagnosticFlare(StaplerRequest request, StaplerResponse response) throws Exception {
        if (!Jenkins.get().hasPermission(Jenkins.MANAGE)) {
            response.sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }

        try {
            LocalDateTime now = LocalDateTime.now(ZoneOffset.UTC);
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm-ss");
            String formattedTimestamp = now.format(formatter);

            response.setContentType("application/octet-stream");
            response.setHeader("Content-Disposition", String.format("attachment; filename=dd-jenkins-plugin-flare-%s.zip", formattedTimestamp));

            try (OutputStream out = response.getOutputStream()) {
                List<FlareContributor> selectedContributors = getSelectedContributors(request);
                writeDiagnosticFlare(selectedContributors, out);
            }

        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            LOGGER.severe("Interrupted while generating Datadog plugin flare");

        } catch (Exception e) {
            LOGGER.log(Level.SEVERE, "Failed to generate Datadog plugin flare", e);
            response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
        }
    }

    private List<FlareContributor> getSelectedContributors(StaplerRequest request) throws ServletException {
        JSONObject form = request.getSubmittedForm();
        JSONArray selectedUiControls = form.getJSONArray("selectedContributors");

        List<FlareContributor> selectedContributors = new ArrayList<>();
        for (int i = 0; i < selectedUiControls.size(); i++) {
            if (selectedUiControls.getBoolean(i)) {
                selectedContributors.add(contributors.get(i));
            }
        }
        return selectedContributors;
    }

    private void writeDiagnosticFlare(List<FlareContributor> selectedContributors, OutputStream out) throws Exception {
        try (ZipOutputStream zipOut = new ZipOutputStream(out)) {
            for (FlareContributor contributor : selectedContributors) {
                zipOut.putNextEntry(new ZipEntry(contributor.getFilename()));
                try {
                    contributor.writeFileContents(zipOut);
                } catch (Exception e) {
                    LOGGER.log(Level.SEVERE, "Datadog plugin flare contributor failed: " + contributor.getClass(), e);

                    zipOut.closeEntry();
                    zipOut.putNextEntry(new ZipEntry(contributor.getFilename() + ".error"));
                    zipOut.write(ExceptionUtils.getStackTrace(e).getBytes(StandardCharsets.UTF_8));
                } finally {
                    zipOut.closeEntry();
                }
            }
        }
    }
}
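
The archive-writing pattern in `writeDiagnosticFlare` can be tried outside Jenkins. The sketch below is a simplified stand-in, not the plugin's code: `FlareZipSketch` and its nested `Contributor` interface are hypothetical, reduced to the two methods the writer needs, and the error entry stores only the exception's string form instead of a full stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class FlareZipSketch {

    /** Hypothetical stand-in for the plugin's FlareContributor, reduced to what the writer needs. */
    interface Contributor {
        String getFilename();

        void writeFileContents(OutputStream out) throws IOException;
    }

    /** Writes one entry per contributor; a failing contributor yields an additional ".error" entry. */
    static byte[] writeFlare(List<Contributor> contributors) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bytes)) {
            for (Contributor contributor : contributors) {
                zipOut.putNextEntry(new ZipEntry(contributor.getFilename()));
                try {
                    contributor.writeFileContents(zipOut);
                } catch (Exception e) {
                    // Close the (possibly partial) entry and record the failure next to it.
                    zipOut.closeEntry();
                    zipOut.putNextEntry(new ZipEntry(contributor.getFilename() + ".error"));
                    zipOut.write(String.valueOf(e).getBytes(StandardCharsets.UTF_8));
                } finally {
                    zipOut.closeEntry();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Lists entry names in archive order, for inspection. */
    static List<String> entryNames(byte[] zip) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(zip))) {
            for (ZipEntry entry = zipIn.getNextEntry(); entry != null; entry = zipIn.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Because each contributor is isolated in its own `try` block, one failing contributor does not prevent the remaining files from being written to the archive.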
@@ -34,6 +34,7 @@ of this software and associated documentation files (the "Software"), to deal
import org.apache.commons.lang.StringEscapeUtils;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang.exception.ExceptionUtils;
import org.apache.commons.lang3.tuple.Pair;
import org.datadog.jenkins.plugins.datadog.apm.ShellCommandCallable;
import org.datadog.jenkins.plugins.datadog.clients.HttpClient;
import org.datadog.jenkins.plugins.datadog.model.DatadogPluginAction;
@@ -59,6 +60,8 @@ of this software and associated documentation files (the "Software"), to deal
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.*;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;
@@ -954,6 +957,8 @@ public static void severe(Logger logger, Throwable e, String message) {

    public static void logException(Logger logger, Level logLevel, String message, Throwable e) {
        if (e != null) {
            addExceptionToBuffer(e);

            String stackTrace = ExceptionUtils.getStackTrace(e);
            message = (message != null ? message + " " : "An unexpected error occurred: ") + stackTrace;
        }
@@ -962,6 +967,49 @@ public static void logException(Logger logger, Level logLevel, String message, T
        }
    }

    private static final String EXCEPTIONS_BUFFER_CAPACITY_ENV_VAR = "DD_JENKINS_EXCEPTIONS_BUFFER_CAPACITY";
    private static final int DEFAULT_EXCEPTIONS_BUFFER_CAPACITY = 100;
    private static final BlockingQueue<Pair<Date, Throwable>> EXCEPTIONS_BUFFER;

    static {
        int bufferCapacity = getExceptionsBufferCapacity();
        if (bufferCapacity > 0) {
            EXCEPTIONS_BUFFER = new ArrayBlockingQueue<>(bufferCapacity);
        } else {
            EXCEPTIONS_BUFFER = null;
        }
    }

    private static int getExceptionsBufferCapacity() {
        // look up the variable named by the constant; the quoted literal would never match the real variable
        String bufferCapacityString = System.getenv(EXCEPTIONS_BUFFER_CAPACITY_ENV_VAR);
        if (bufferCapacityString == null) {
            return DEFAULT_EXCEPTIONS_BUFFER_CAPACITY;
        } else {
            try {
                return Integer.parseInt(bufferCapacityString);
            } catch (NumberFormatException e) {
                severe(logger, e, EXCEPTIONS_BUFFER_CAPACITY_ENV_VAR + " environment variable has invalid value");
                return DEFAULT_EXCEPTIONS_BUFFER_CAPACITY;
            }
        }
    }

    private static void addExceptionToBuffer(Throwable e) {
        if (EXCEPTIONS_BUFFER == null) {
            return;
        }
        Pair<Date, Throwable> p = Pair.of(new Date(), e);
        while (!EXCEPTIONS_BUFFER.offer(p)) {
            // rather than popping elements one by one, we drain several with one operation to reduce lock contention
            int drainSize = Math.max(DEFAULT_EXCEPTIONS_BUFFER_CAPACITY / 10, 1);
            EXCEPTIONS_BUFFER.drainTo(new ArrayList<>(drainSize), drainSize);
        }
    }

    public static BlockingQueue<Pair<Date, Throwable>> getExceptionsBuffer() {
        return EXCEPTIONS_BUFFER;
    }

    public static int toInt(boolean b) {
        return b ? 1 : 0;
    }
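
The eviction strategy in `addExceptionToBuffer` (when the bounded queue is full, drop a batch of the oldest elements and retry) can be exercised in isolation. This is a minimal sketch under stated assumptions: `BoundedBufferSketch` and `addEvictingOldest` are hypothetical names, and the batch size is derived from a capacity passed as a parameter, whereas the diff derives it from the default capacity constant.

```java
import java.util.ArrayList;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class BoundedBufferSketch {

    /** Adds an item; when the queue is full, evicts a batch of the oldest items and retries. */
    static <T> void addEvictingOldest(BlockingQueue<T> buffer, T item, int capacity) {
        while (!buffer.offer(item)) {
            // Drain roughly a tenth of the capacity in one call instead of polling item by item.
            int drainSize = Math.max(capacity / 10, 1);
            buffer.drainTo(new ArrayList<>(drainSize), drainSize);
        }
    }

    public static void main(String[] args) {
        int capacity = 20;
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < 25; i++) {
            addEvictingOldest(buffer, i, capacity);
        }
        // The oldest values have been evicted; the newest ones remain in insertion order.
        System.out.println(buffer);
    }
}
```

The loop around `offer` matters under concurrency: another thread may refill the freed slots between the drain and the retry, so a single drain-then-add would not be guaranteed to succeed.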
@@ -138,7 +138,7 @@ public static boolean validateDefaultIntakeConnection(String validatedUrl, Secre
    }

    @SuppressFBWarnings("DLS_DEAD_LOCAL_STORE")
    private static boolean validateWebhookIntakeConnection(String webhookIntakeUrl, Secret apiKey) {
    public static boolean validateWebhookIntakeConnection(String webhookIntakeUrl, Secret apiKey) {
        Map<String, String> headers = new HashMap<>();
        headers.put("DD-API-KEY", Secret.toString(apiKey));

@@ -157,7 +157,7 @@ private static boolean validateWebhookIntakeConnection(String webhookIntakeUrl,
        }
    }

    private static boolean validateLogIntakeConnection(String logsIntakeUrl, Secret apiKey) {
    public static boolean validateLogIntakeConnection(String logsIntakeUrl, Secret apiKey) {
        String payload = "{\"message\":\"[datadog-plugin] Check connection\", " +
                "\"ddsource\":\"Jenkins\", \"service\":\"Jenkins\", " +
                "\"hostname\":\"" + DatadogUtilities.getHostname(null) + "\"}";
@@ -0,0 +1,59 @@
package org.datadog.jenkins.plugins.datadog.flare;

import hudson.Extension;
import net.sf.json.JSONObject;
import org.apache.commons.io.IOUtils;
import org.datadog.jenkins.plugins.datadog.DatadogClient;
import org.datadog.jenkins.plugins.datadog.DatadogGlobalConfiguration;
import org.datadog.jenkins.plugins.datadog.DatadogUtilities;
import org.datadog.jenkins.plugins.datadog.clients.DatadogApiClient;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

@Extension
public class ConnectivityChecksFlare implements FlareContributor {

    @Override
    public int order() {
        return 2;
    }

    @Override
    public String getDescription() {
        return "Connectivity check results";
    }

    @Override
    public String getFilename() {
        return "connectivity-checks.json";
    }

    @Override
    public void writeFileContents(OutputStream out) throws IOException {
        JSONObject payload = new JSONObject();

        // TODO rework the checks below following configuration refactoring
        DatadogGlobalConfiguration globalConfiguration = DatadogUtilities.getDatadogGlobalDescriptor();
        DatadogClient.ClientType clientType = DatadogClient.ClientType.valueOf(globalConfiguration.getReportWith());

        if (clientType == DatadogClient.ClientType.DSD) {
            payload.put("client-type", DatadogClient.ClientType.DSD);
            payload.put("logs-connectivity", globalConfiguration.doCheckAgentConnectivityLogs(globalConfiguration.getTargetHost(), String.valueOf(globalConfiguration.getTargetLogCollectionPort())).toString());
            payload.put("traces-connectivity", globalConfiguration.doCheckAgentConnectivityTraces(globalConfiguration.getTargetHost(), String.valueOf(globalConfiguration.getTargetTraceCollectionPort())).toString());

        } else if (clientType == DatadogClient.ClientType.HTTP) {
            payload.put("client-type", DatadogClient.ClientType.HTTP);
            payload.put("api-connectivity", DatadogApiClient.validateDefaultIntakeConnection(globalConfiguration.getTargetApiURL(), globalConfiguration.getUsedApiKey()));
            payload.put("logs-connectivity", DatadogApiClient.validateLogIntakeConnection(globalConfiguration.getTargetLogIntakeURL(), globalConfiguration.getUsedApiKey()));
            payload.put("traces-connectivity", DatadogApiClient.validateWebhookIntakeConnection(globalConfiguration.getTargetWebhookIntakeURL(), globalConfiguration.getUsedApiKey()));

        } else {
            throw new IllegalArgumentException("Unsupported client type: " + clientType);
        }

        String payloadString = payload.toString(2);
        IOUtils.write(payloadString, out, StandardCharsets.UTF_8);
    }
}
@@ -0,0 +1,42 @@
package org.datadog.jenkins.plugins.datadog.flare;

import hudson.Extension;
import hudson.util.XStream2;
import org.datadog.jenkins.plugins.datadog.DatadogGlobalConfiguration;
import org.datadog.jenkins.plugins.datadog.DatadogUtilities;

import java.io.IOException;
import java.io.OutputStream;

@Extension
public class DatadogConfigFlare implements FlareContributor {

    // TODO use XSTREAM from DatadogGlobalConfiguration following configuration refactor
    private static final XStream2 XSTREAM;

    static {
        XSTREAM = new XStream2();
        XSTREAM.autodetectAnnotations(true);
    }

    @Override
    public int order() {
        return 1;
    }

    @Override
    public String getDescription() {
        return "Plugin configuration";
    }

    @Override
    public String getFilename() {
        return "DatadogGlobalConfiguration.xml";
    }

    @Override
    public void writeFileContents(OutputStream out) throws IOException {
        DatadogGlobalConfiguration globalConfiguration = DatadogUtilities.getDatadogGlobalDescriptor();
        XSTREAM.toXMLUTF8(globalConfiguration, out);
    }
}