Op#3648 create streaming track parser #554

Open
wants to merge 43 commits into
base: main

KueblerJelle
Contributor

@KueblerJelle KueblerJelle commented Sep 18, 2024

Introduced streaming track and detection parser.

Detections in JSON format are parsed as a stream and grouped into tracks.
The finished flag marks the end of a track. After parsing the last detection of a track, that track is provided to the stream of tracks.
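
A minimal sketch of this grouping idea, not the code from this PR: it assumes each detection is a dict with hypothetical keys `track-id` and `finished`, and the function name `stream_tracks` is made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def stream_tracks(detections: Iterable[dict]) -> Iterator[tuple[int, list[dict]]]:
    """Group a stream of detections into tracks and yield each finished track."""
    # Detections of tracks that have not been finished yet, keyed by track id.
    open_tracks: dict[int, list[dict]] = defaultdict(list)
    for detection in detections:
        track_id = detection["track-id"]  # hypothetical key name
        open_tracks[track_id].append(detection)
        if detection["finished"]:  # hypothetical key name
            # Last detection of this track: hand it over and free its memory.
            yield track_id, open_tracks.pop(track_id)
```

Because finished tracks are popped from the buffer as soon as they are yielded, only the detections of still-open tracks are held in memory.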

Added a new implementation of the CLI and new command line arguments:
cli_mode (stream or bulk) and cli_chunk_size (in streaming mode, this configures the number of tracks to collect before performing cutting, counting, etc.)
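
The effect of the chunk size can be illustrated roughly as follows. This is a sketch under assumptions: the helper `chunked` is hypothetical and only shows how a stream of tracks could be collected into batches of at most `chunk_size` items before further processing.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(tracks: Iterable[T], chunk_size: int) -> Iterator[list[T]]:
    """Collect items from a stream into lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(tracks)
    # islice pulls at most chunk_size items; an empty list ends the loop.
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk
```

The last chunk may be smaller than `chunk_size` when the stream runs out of tracks.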

To reduce memory consumption, the track and event repositories should be cleared regularly.
This, however, requires the export of results to work incrementally.
Therefore, an export mode is introduced:

  • Overwrite -> write the given results into the file, overwriting existing results
  • Initial Merge -> write or collect results incrementally; the first merge may include metadata such as the CSV header
  • Merge -> subsequent incremental results (append or collect)
  • Flush -> final result increment; if results were collected in memory, flush/write them to the file
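
As a rough illustration of how an exporter might respect these modes (a sketch only, not the exporters changed in this PR): the enum members mirror the mode names above, while `export_rows` and its parameters are hypothetical. This variant appends to the file directly instead of collecting results in memory, so FLUSH simply appends the final increment.

```python
import csv
from enum import Enum
from pathlib import Path


class ExportMode(Enum):
    OVERWRITE = "overwrite"
    INITIAL_MERGE = "initial_merge"
    MERGE = "merge"
    FLUSH = "flush"


def export_rows(
    path: Path, header: list[str], rows: list[list[str]], mode: ExportMode
) -> None:
    """Write rows to a CSV file according to the given export mode."""
    # OVERWRITE and INITIAL_MERGE start a fresh file including the header;
    # MERGE and FLUSH append their rows to the existing file.
    starts_file = mode in (ExportMode.OVERWRITE, ExportMode.INITIAL_MERGE)
    with path.open("w" if starts_file else "a", newline="") as file:
        writer = csv.writer(file)
        if starts_file:
            writer.writerow(header)
        writer.writerows(rows)
```

An exporter that aggregates results (for example counts) would instead keep state between calls and write only on FLUSH.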

Adapted all current result export implementations to respect the export mode.

Parameterized test_cli and regression_otanalytics to test both cli_modes.

KueblerJelle and others added 30 commits December 12, 2023 18:00
add lazy version of track dataset and detection parser
Detection parser can accept a stream of file contents to lazily load files only when they are needed
…ode duplication with python track parser

make some functionality reusable through classmethod
use bzip 2 json stream reader to sort files by start date
renamed BulkPythonDetectionParser back to PythonDetectionParser
extracted methods for creating PythonDetections and PythonTracks as static methods on module level
i have some old ottrk files for testing that could no longer be parsed
a new metadata format fixer fixes the old date string format by transforming it to timestamp format
TODO: review versions of fixer
clean up streaming parser and add more doc strings
found potential memory leak: delete det_list in StreamOttrkParser.parser()
OttrkVideoParser crashed in a manual test because the metadata format did not match -> apply format fixer
also reduced code duplication in cli.py and main_application.py

also fixed/adapted tests
ar0305 added 10 commits May 27, 2024 13:55
now a chunk size can be specified, tracks will be loaded from the track files until n tracks are collected
the collected tracks are transformed into a TrackDataset, the caller of StreamingOttrkParser can specify a TrackDataset factory
…o be stateful to allow for incremental result exports

Added ExportMode to various export specification data classes:
ExportMode Overwrite for non incremental result export
ExportModes INITIAL_MERGE, MERGE to append data
ExportMode FLUSH to write out results aggregated in stateful exporters
All exporters now consider this export mode

Introduced missing EventExportSpecification to be more similar to other exporters

If the first event export has no events, the resulting csv/excel files have no header because the data frame is empty. Therefore, all columns are now specified explicitly as the header list

Fixed bug in EventRepository:
clear previously only checked for the existence of (section)events, if none existed nothing was cleared, even if non_section_events were present

stateful exporters for counts require to be cached in a newly added CachedExporterFactory

Added merge_into_dict functions to TracksMetadata and VideosMetadata to allow for incrementally updating a dict with metadata updates
fixed bugs:
no longer use Counter in CsvExport of Counts, instead manually increment counted tags using defaultdict

Streaming parser now has a set of VideosMetadata and TracksMetadata instead of a list, to avoid duplicates

write event list data frame even if empty

OTAnalyticsStreamCli and OTAnalyticsBulkCli both use same attribute name _track_parser
Added error messages to assertions when comparing file content
@KueblerJelle KueblerJelle added the python Pull requests that update Python code label Sep 18, 2024
@KueblerJelle KueblerJelle self-assigned this Sep 18, 2024
ar0305 added 3 commits September 18, 2024 10:19
fixed type annotations for write mode
…comments

removed unused class SingletonTrackDataset
"""
Fix format changes from older ottrk metadata
format versions to the current version.
"""
current_version = Version.from_str(metadata[ottrk_format.OTTRK_VERSION])

@briemla Review: fixing the metadata alone can be reused in streaming_parser, hence I made it public.
This might also apply to fixing single detection data when they are parsed lazily (but this is currently not used).

def fix(self, metadata: dict, current_version: Version) -> dict:
    return self.__fix_recorded_start_date(metadata, current_version)

def __fix_recorded_start_date(

@briemla Review: Not sure in which version this changed

@@ -341,6 +391,65 @@ def __str__(self) -> str:
)


# TODO Review: these methods for creating PythonDetections and PythonTracks are static
# TODO and could live outside a class for reusability in streaming_parser
def parse_python_detection(

@briemla Review: these methods for creating PythonDetections and PythonTracks are static and could live outside a class for reusability in streaming_parser
