Op#3648 create streaming track parser #554

KueblerJelle · 2024-09-18T07:31:21Z

Introduced streaming track and detection parser.

Detections in JSON format are parsed as a stream and grouped into tracks.
The finished flag marks the end of a track: after the last detection of a track has been parsed, that track is passed on to the stream of tracks.
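
The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `stream_tracks` and the dictionary keys `track-id` and `finished` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def stream_tracks(detections: Iterable[dict]) -> Iterator[tuple[int, list[dict]]]:
    """Yield (track_id, detections) as soon as a track's last detection is seen."""
    open_tracks: dict[int, list[dict]] = defaultdict(list)
    for detection in detections:
        track_id = detection["track-id"]  # assumed key name
        open_tracks[track_id].append(detection)
        if detection["finished"]:  # assumed key name for the finished flag
            # The track is complete: emit it and drop it from the buffer.
            yield track_id, open_tracks.pop(track_id)


detections = [
    {"track-id": 1, "frame": 1, "finished": False},
    {"track-id": 2, "frame": 1, "finished": False},
    {"track-id": 1, "frame": 2, "finished": True},
    {"track-id": 2, "frame": 2, "finished": True},
]
tracks = list(stream_tracks(detections))
```

Only unfinished tracks stay buffered, so memory use depends on the number of concurrently open tracks rather than on the total number of detections.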

Added a new implementation of the Cli and added command line arguments:
cli_mode (stream or bulk) and cli_chunk_size (in streaming mode, this configures the number of tracks to be collected before performing cutting, counting, etc.)
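
As a rough illustration of what the chunk size controls, the sketch below batches any stream into lists of at most `chunk_size` items. The helper name `chunked` is hypothetical and not part of this PR; in the PR the items would be tracks.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], chunk_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    iterator = iter(items)
    # islice pulls only chunk_size items at a time, so the source stays lazy.
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk


chunks = list(chunked(range(5), 2))
```

Each chunk can then be cut, counted, and exported before the next one is loaded.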

To reduce memory consumption, the track and event repositories should be cleared regularly.
This, however, requires the export of results to work incrementally.
Therefore, an export mode is introduced:

Overwrite -> write the given results into the file, overwrite existing results
Initial Merge -> write or collect results incrementally, on first merge maybe include metadata such as csv header
Merge -> subsequent incremental results (append or collect)
Flush -> final result increment, if results were collected in memory, flush/write to file
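
A minimal sketch of how an append-style exporter might respect these modes. The enum member names follow the commit notes in this PR; the function `export_rows`, the `HEADER` columns, and the file name are hypothetical and only serve the example.

```python
import csv
import tempfile
from enum import Enum
from pathlib import Path


class ExportMode(Enum):
    OVERWRITE = "overwrite"
    INITIAL_MERGE = "initial_merge"
    MERGE = "merge"
    FLUSH = "flush"


HEADER = ["track_id", "event"]  # hypothetical columns


def export_rows(path: Path, rows: list[list[str]], mode: ExportMode) -> None:
    """Append-style export: only the first write creates the file and header."""
    first_write = mode in (ExportMode.OVERWRITE, ExportMode.INITIAL_MERGE)
    with path.open("w" if first_write else "a", newline="") as file:
        writer = csv.writer(file)
        if first_write:
            writer.writerow(HEADER)
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "events.csv"
    export_rows(target, [["1", "enter"]], ExportMode.INITIAL_MERGE)
    export_rows(target, [["2", "enter"]], ExportMode.MERGE)
    export_rows(target, [["2", "leave"]], ExportMode.FLUSH)
    lines = target.read_text().splitlines()
```

For an exporter that appends to a file, Flush is just the last append. An exporter that collects results in memory would instead accumulate during Initial Merge and Merge and write everything out on Flush.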

Adapted all current result export implementations to respect the export mode.

Parameterized test_cli and regression_otanalytics to test both cli_modes.

A chunk size can now be specified: tracks are loaded from the track files until n tracks are collected.
The collected tracks are transformed into a TrackDataset; the caller of StreamingOttrkParser can specify a TrackDataset factory.

…o be stateful to allow for incremental result exports
Added ExportMode to various export specification data classes:
ExportMode OVERWRITE for non-incremental result export
ExportModes INITIAL_MERGE, MERGE to append data
ExportMode FLUSH to write out results aggregated in stateful exporters
All exporters now consider this export mode.
Introduced missing EventExportSpecification to be more similar to other exporters.
If the first event export has no events, the resulting csv/excel has no header because the data frame is empty; therefore all columns are now specified as a header list.
Fixed bug in EventRepository: clear previously only checked for the existence of (section) events; if none existed, nothing was cleared, even if non_section_events were present.
Stateful exporters for counts need to be cached in a newly added CachedExporterFactory.
Added merge_into_dict functions to TracksMetadata and VideosMetadata to allow for incrementally updating a dict with metadata updates.
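
The EventRepository fix mentioned above can be illustrated with a toy class. This is only a sketch of the described behaviour, not the repository from OTAnalytics: the attribute and method names are assumptions made for the example.

```python
class EventRepository:
    """Toy repository holding section events and non-section events."""

    def __init__(self) -> None:
        self._events: dict[str, list[str]] = {}
        self._non_section_events: list[str] = []

    def add_section_event(self, section_id: str, event: str) -> None:
        self._events.setdefault(section_id, []).append(event)

    def add_non_section_event(self, event: str) -> None:
        self._non_section_events.append(event)

    def is_empty(self) -> bool:
        return not self._events and not self._non_section_events

    def clear(self) -> None:
        # A guard on `self._events` alone would skip clearing a repository
        # that holds nothing but non-section events.
        if not self.is_empty():
            self._events.clear()
            self._non_section_events.clear()


repository = EventRepository()
repository.add_non_section_event("scene-enter")
repository.clear()
```

With the guard covering both collections, a chunk that produced only non-section events no longer leaks them into the next chunk.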

fixed bugs:
no longer use Counter in CsvExport of counts; instead manually increment counted tags using defaultdict
Streaming parser now has a set of VideosMetadata and TracksMetadata instead of a list, to avoid duplications
write event list data frame even if empty
OTAnalyticsStreamCli and OTAnalyticsBulkCli both use the same attribute name _track_parser

KueblerJelle · 2024-09-18T08:28:04Z

OTAnalytics/plugin_parser/otvision_parser.py

+        """
+        Fix format changes from older ottrk metadata
+        format versions to the current version.
+        """
        current_version = Version.from_str(metadata[ottrk_format.OTTRK_VERSION])


@briemla Review: fixing the metadata alone can be reused in streaming_parser, hence I made it public.
This might also apply to fixing single detection data when it is parsed lazily (but this is currently not used).

KueblerJelle · 2024-09-18T08:29:52Z

OTAnalytics/plugin_parser/otvision_parser.py

+    def fix(self, metadata: dict, current_version: Version) -> dict:
+        return self.__fix_recorded_start_date(metadata, current_version)
+
+    def __fix_recorded_start_date(


@briemla Review: Not sure in which version this changed
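
One way to sidestep the open version question would be to detect the old format by the type of the value instead of by version number. The sketch below is only an illustration under loud assumptions: the key name `recorded_start_date`, the old date format string, and the use of UTC are guesses for the example and are not taken from the ottrk format.

```python
from datetime import datetime, timezone

RECORDED_START_DATE = "recorded_start_date"  # assumed metadata key
OLD_DATE_FORMAT = "%Y-%m-%d_%H-%M-%S"  # assumed old date string format


def fix_recorded_start_date(metadata: dict) -> dict:
    """Return metadata whose recorded start date is a timestamp."""
    value = metadata.get(RECORDED_START_DATE)
    if not isinstance(value, str):
        # Already numeric (current format) or missing: nothing to fix.
        return metadata
    parsed = datetime.strptime(value, OLD_DATE_FORMAT).replace(tzinfo=timezone.utc)
    # Copy instead of mutating the caller's dict.
    return {**metadata, RECORDED_START_DATE: parsed.timestamp()}


old = {RECORDED_START_DATE: "2023-01-01_00-00-00"}
new = {RECORDED_START_DATE: 1672531200.0}
fixed_old = fix_recorded_start_date(old)
fixed_new = fix_recorded_start_date(new)
```

A value-based check keeps working for files whose version field does not reliably indicate the date format, at the cost of not documenting when the format changed.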

KueblerJelle · 2024-09-18T08:31:38Z

OTAnalytics/plugin_parser/otvision_parser.py

@@ -341,6 +391,65 @@ def __str__(self) -> str:
 )


+# TODO Review: these methods for creating PythonDetections and PythonTracks are static
+# TODO and could live outside a class for reusability in streaming_parser
+def parse_python_detection(


@briemla Review: these methods for creating PythonDetections and PythonTracks are static and could live outside a class for reusability in streaming_parser

KueblerJelle and others added 30 commits December 12, 2023 18:00

first attempts for streaming track parser

ed0c9f0

add lazy version of track dataset and detection parser

renamed lazy to streaming

6e3c81f

Merge branch 'main' into OP#3648-create-streaming-track-parser

5c2d677

Added functionality to parse detections as stream for multiple files

b3944ed

Detection parser can accept a stream of file contents to lazily load files only when they are needed

Merge branch 'main' into OP#3648-create-streaming-track-parser

66eb2d1

Adapted changes to DetectionParser: added id_generator

723f4c8

Reduced code duplication by inheriting OttrkParser, adapted additions…

ed30083

… made to TrackDataset in subclass

Add test of old parser to compare ram usage

2efca9d

Merge branch 'main' into OP#3648-create-streaming-track-parser

8939dd2

Merge branch 'main' into OP#3648-create-streaming-track-parser

9ca3fde

Merge branch 'main' into OP#3648-create-streaming-track-parser

e65ec68

refactored stream detection / track parser -> requires reduction of c…

5c5bfc9

…ode duplication with python track parser
make some functionality reusable through classmethod
use bzip2 json stream reader to sort files by start date

added doc strings to streaming_parser.py

b1e627e

cleaned up otvision_parser

f01ed28

renamed BulkPythonDetectionParser back to PythonDetectionParser
extracted methods for creating PythonDetections and PythonTracks as static methods on module level

added metadata format fixer for recorded start date format

328b291

I have some old ottrk files for testing that could no longer be parsed.
A new metadata format fixer fixes the old date string format by transforming it to timestamp format.
TODO: review versions of fixer

move stateless reusable methods from PythonTrackDataset to module level

26950b5

clean up streaming parser and add more doc strings

Added more doc strings to streaming_parser.py

ec9c022

fixed bugs in ottrk_parser and streaming_parser

2b7d87a

found potential memory leak: delete det_list in StreamOttrkParser.parser()
OttrkVideoParser crashed in a manual test because the metadata format did not match -> apply format fixer

add stream version of OTAnalyticsCli and code to instantiate it in ma…

cec9362

…n_application.py

Merge branch 'main' into OP#3648-create-streaming-track-parser

4ab4171

apply streaming track parser in cli loop

7f23cd8

Use always all sections to create events

c5fd43a

Track export does not work with streaming parser

09a3fd9

Add ijson as dependency

d283752

Remove dependency to track repository in ModeTagger

a6c1216

Added cli-mode to cli args to start either bulk or stream processing

bd19151

also reduced code duplication in cli.py and main_application.py
also fixed/adapted tests

Merge branch 'OP#3648-create-streaming-track-parser' of https://githu…

3c545de

…b.com/OpenTrafficCam/OTAnalytics into OP#3648-create-streaming-track-parser

fixed cli mode in ArgparseCliParser.parser()

21b6436

fixed MetadataFixer for recorded start date -> catch TypeError

c940583

fixed test_cli patches

Merge branch 'main' into OP#3648-create-streaming-track-parser

ed26a52

ar0305 added 10 commits May 27, 2024 13:55

Merge branch 'main' into OP#3648-create-streaming-track-parser

b1310d2

fixed minor merge bugs, implemented missing max_confidence_of method …

acdfbf6

…in SingletonTrackDataset

Merge branch 'main' into OP#3648-create-streaming-track-parser

3fa15dc

started parameterizing regression test with Stream and Bulk CLI Mode

52b649e

Added error messages to assertions when comparing file content

fixed assertion code after debugging cleanup

abdc0f7

Merge branch 'main' into OP#3648-create-streaming-track-parser

bf07c2a

Merge branch 'main' into OP#3648-create-streaming-track-parser

b11fafa

KueblerJelle added the python Pull requests that update Python code label Sep 18, 2024

KueblerJelle requested a review from briemla September 18, 2024 07:31

KueblerJelle self-assigned this Sep 18, 2024

ar0305 added 3 commits September 18, 2024 10:19

reset cli mode in benchmark to BULK

fb2c45f

fixed type annotations for write mode

fixed todo review comments -> moved to todo comments in pull request …

9cc8ce6

…comments
removed unused class SingletonTrackDataset

removed todo

84abad0
