Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup serialiazation of TaskReportMap #16217

Merged
merged 13 commits into from
Apr 1, 2024

Conversation

kfaraz
Copy link
Contributor

@kfaraz kfaraz commented Mar 29, 2024

Issue

While serializing a Map or even a List containing TaskReport objects, the type information is lost. Thus, the object cannot be serialized back.

This is a known issue with Jackson.

Existing solution in Druid

The serialization of a Map<String, TaskReport> was originally fixed in #12938.

The way this has been tackled in the Druid code till now is:

  • Use custom serialisation logic in SingleFileTaskReportFileWriter.writeReportToStream() to write out each TaskReport object one by one
  • For live reports that are served over HTTP, build a raw map instead of a concrete TaskReport object

Proposed solution

Add a new ReportMap class.

Changes

  • Add class TaskReport.ReportMap
  • Have TaskReport.buildTaskReports() return the new class
  • Add serde tests for writing to string, file for the known task report types
  • Replace all occurrences of Map<String, TaskReport> with TaskReport.ReportMap to ensure
    that we always use this class for serialization of reports
  • Add method AbstractBatchIndexTask.buildLiveIngestionStatsReport() to reduce duplication and hard-coding of serializable field names.

Important classes

  • TaskReport
  • AbstractBatchIndexTask
  • SingleFileTaskReportFileWriter
  • ParallelIndexSupervisorTask
  • SinglePhaseSubTask
  • IndexTask

Rolling upgrade concerns

None

  • Serialization: results in the same json as before
  • Deserialization: works with the same json as before

@github-actions github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Mar 29, 2024
Assert.assertEquals(reportMap1, reportMap2);
}

@Test
public void testWriteReportMapToStringAndRead() throws Exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also add a test to verify that a serialized old type task report Map<String, TaskReport> deserializes correctly into the new type TaskReport.ReportMap? Should address any upgrade concerns.

Ditto for the reverse roundtrip for a downgrade scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that makes sense.

* a TaskReport is serialized without the type information and cannot be
* deserialized back into a concrete implementation.
*/
class ReportMap extends LinkedHashMap<String, TaskReport>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any tests that verify the reports are indeed ordered since we rely on a LinkedHashMap? Just looking at the callers of buildTaskReports(), I don't seem to find any.

Copy link
Contributor Author

@kfaraz kfaraz Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I can add a test to verify the order. Although, I don't see any actual task writing a report map that contains multiple entries. Also not sure why the order was considered to be important in the first place, its json anyway.

@@ -546,12 +548,17 @@ public ListenableFuture<Void> runTask(String taskId, Object taskObject)

@Override
public ListenableFuture<Map<String, Object>> taskReportAsMap(String taskId)
{
return Futures.immediateFuture(null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this call getLiveReportsForTask(taskId)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was doing that originally but not needed right now as I have added the other method getLiveReportsForTask() just below this one.

This is anyway used only in the tests and I plan to fix it back up once I replace the Map<String, Object> in the OverlordClient with TaskReport.ReportMap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense

return Futures.immediateFuture(null);
}

public TaskReport.ReportMap getLiveReportsForTask(String taskId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public TaskReport.ReportMap getLiveReportsForTask(String taskId)
protected TaskReport.ReportMap getLiveReportsForTask(String taskId)

@@ -546,12 +548,17 @@ public ListenableFuture<Void> runTask(String taskId, Object taskObject)

@Override
public ListenableFuture<Map<String, Object>> taskReportAsMap(String taskId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can taskReportAsMap() now return the concrete type TaskReport.ReportMap instead of Map<String, Object>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have that change in a follow up PR. Didn't do it here as it requires moving all the TaskReport related classes to the druid-processing module, so that OverlordClient can use it.

@kfaraz
Copy link
Contributor Author

kfaraz commented Apr 1, 2024

Thanks a lot for the review, @abhishekrb19 !
Do you think I can include your suggestions in my follow up PR as they are all related to tests?
I don't want to invoke another CI cycle if I can help it 😛 .

Copy link
Contributor

@abhishekrb19 abhishekrb19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! I'm ok with doing the suggestions in a follow-up 👍

@gianm gianm merged commit 0de44d9 into apache:master Apr 1, 2024
85 checks passed
@kfaraz kfaraz deleted the further_report_cleanup branch April 1, 2024 19:23
kfaraz added a commit that referenced this pull request Apr 15, 2024
Follow up to #16217 

Changes:
- Update `OverlordClient.getReportAsMap()` to return `TaskReport.ReportMap`
- Move the following classes to `org.apache.druid.indexer.report` in the `druid-processing` module
  - `TaskReport`
  - `KillTaskReport`
  - `IngestionStatsAndErrorsTaskReport`
  - `TaskContextReport`
  - `TaskReportFileWriter`
  - `SingleFileTaskReportFileWriter`
  - `TaskReportSerdeTest`
- Remove `MsqOverlordResourceTestClient` as it had only one method
which is already present in `OverlordResourceTestClient` itself
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Streaming Ingestion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants