OP#3648 create streaming track parser #554
base: main
Conversation
add lazy version of track dataset and detection parser
Detection parser can accept a stream of file contents to lazily load files only when they are needed
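The lazy loading described above can be sketched as a generator that defers reading each file until the consumer actually requests it. This is a minimal sketch, not the project's actual implementation; the function name `stream_file_contents` and the injectable `read` callable are assumptions for illustration.

```python
from pathlib import Path
from typing import Callable, Iterator


def stream_file_contents(
    files: list[Path],
    read: Callable[[Path], str] = Path.read_text,
) -> Iterator[str]:
    """Yield each file's content only when the consumer asks for it,
    so at most one file's content is held in memory at a time."""
    for file in files:
        yield read(file)
```

Because the generator is lazy, constructing it performs no I/O; a file is only read when the parser advances the stream.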
… made to TrackDataset in subclass
…ode duplication with python track parser; make some functionality reusable through classmethods; use bzip2 JSON stream reader to sort files by start date
renamed BulkPythonDetectionParser back to PythonDetectionParser; extracted methods for creating PythonDetections and PythonTracks as static methods at module level
I have some old ottrk files for testing that could no longer be parsed. A new metadata format fixer fixes the old date string format by transforming it to the timestamp format. TODO: review versions of fixer
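The date-format fix could look roughly like the sketch below. It assumes the old format stored an ISO date string and the current format stores an epoch timestamp; the key name `RECORDED_START_DATE` and the exact formats are assumptions, since the ottrk metadata schema is not shown here.

```python
from datetime import datetime, timezone

# Hypothetical metadata key; the actual ottrk key may differ.
RECORDED_START_DATE = "recorded_start_date"


def fix_recorded_start_date(metadata: dict) -> dict:
    """Convert an old ISO date string into the newer epoch-timestamp format.

    Values that are already numeric timestamps are left untouched.
    """
    value = metadata[RECORDED_START_DATE]
    if isinstance(value, str):
        parsed = datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
        metadata[RECORDED_START_DATE] = parsed.timestamp()
    return metadata
```

A real fixer would additionally dispatch on the metadata's version number, which is what the TODO about reviewing fixer versions refers to.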
clean up streaming parser and add more docstrings
found potential memory leak: delete det_list in StreamOttrkParser.parser(). OttrkVideoParser crashed in a manual test because the metadata format did not match -> apply format fixer
also reduced code duplication in cli.py and main_application.py; also fixed/adapted tests
…b.com/OpenTrafficCam/OTAnalytics into OP#3648-create-streaming-track-parser
fixed test_cli patches
…in SingletonTrackDataset
now a chunk size can be specified: tracks are loaded from the track files until n tracks are collected. The collected tracks are transformed into a TrackDataset; the caller of StreamingOttrkParser can specify a TrackDataset factory
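The chunking logic described here can be sketched with `itertools.islice`: pull up to `chunk_size` tracks from the stream, hand them to the caller-supplied factory, and repeat until the stream is exhausted. The name `chunked_datasets` and the generic signature are illustrative assumptions, not the project's actual API.

```python
from itertools import islice
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")  # track type
D = TypeVar("D")  # dataset type produced by the factory


def chunked_datasets(
    tracks: Iterator[T],
    chunk_size: int,
    dataset_factory: Callable[[list[T]], D],
) -> Iterator[D]:
    """Collect up to chunk_size tracks at a time, then let the
    caller-provided factory turn each chunk into a dataset."""
    while chunk := list(islice(tracks, chunk_size)):
        yield dataset_factory(chunk)
```

Injecting the factory keeps the streaming parser independent of any concrete TrackDataset implementation.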
…o be stateful to allow for incremental result exports.
Added ExportMode to various export specification data classes: ExportMode OVERWRITE for non-incremental result export; ExportModes INITIAL_MERGE and MERGE to append data; ExportMode FLUSH to write out results aggregated in stateful exporters. All exporters now consider this export mode.
Introduced missing EventExportSpecification to be more similar to other exporters.
If the first event export has no events, the resulting csv/excel has no header, as the data frame is empty; therefore specified all columns as a header list.
Fixed bug in EventRepository: clear previously only checked for the existence of (section) events; if none existed, nothing was cleared, even if non_section_events were present.
Stateful exporters for counts need to be cached in a newly added CachedExporterFactory.
Added merge_into_dict functions to TracksMetadata and VideosMetadata to allow for incrementally updating a dict with metadata updates.
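The export modes listed above could be modeled roughly as an enum plus a small helper deciding whether a write starts a fresh file or appends. This is a hypothetical reconstruction for illustration; the actual ExportMode definition and its consumers are not shown in this diff, and the append behavior of FLUSH is an assumption.

```python
from enum import Enum, auto


class ExportMode(Enum):
    """Hypothetical reconstruction of the export modes described above."""

    OVERWRITE = auto()      # non-incremental export: replace existing results
    INITIAL_MERGE = auto()  # first incremental write: start a fresh file
    MERGE = auto()          # subsequent incremental writes: append data
    FLUSH = auto()          # write out results aggregated in stateful exporters


def file_open_mode(mode: ExportMode) -> str:
    """Map an export mode to a file open mode ("w" truncates, "a" appends)."""
    if mode in (ExportMode.OVERWRITE, ExportMode.INITIAL_MERGE):
        return "w"
    return "a"
```

An exporter receiving INITIAL_MERGE would also write the header row, which matches the fix above of always specifying all columns as a header list even for empty data frames.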
fixed bugs: no longer use Counter in CsvExport of counts; instead manually increment counted tags using a defaultdict. The streaming parser now has a set of VideosMetadata and TracksMetadata, instead of a list, to avoid duplicates. Write the event list data frame even if empty. OTAnalyticsStreamCli and OTAnalyticsBulkCli both use the same attribute name _track_parser
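The Counter-to-defaultdict change can be sketched as below; the tag structure (tuples of classification and section labels) is an assumption for illustration, not taken from the project's code.

```python
from collections import defaultdict


def count_tags(tags: list[tuple[str, ...]]) -> dict[tuple[str, ...], int]:
    """Manually increment a count per tag using defaultdict(int),
    instead of building a Counter over the whole tag list at once."""
    counts: defaultdict[tuple[str, ...], int] = defaultdict(int)
    for tag in tags:
        counts[tag] += 1
    return dict(counts)
```

Manual incrementing makes it straightforward to merge counts incrementally across chunks, which matters once results are exported in streaming mode.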
Added error messages to assertions when comparing file content
fixed type annotations for write mode
…comments; removed unused class SingletonTrackDataset
""" | ||
Fix formate changes from older ottrk metadata | ||
format versions to the current version. | ||
""" | ||
current_version = Version.from_str(metadata[ottrk_format.OTTRK_VERSION]) |
@briemla Review: fixing metadata alone can be reused in streaming_parser, hence I made it public.
This might also apply to fixing single detection data when they are parsed lazily (but currently not used).
def fix(self, metadata: dict, current_version: Version) -> dict:
    return self.__fix_recorded_start_date(metadata, current_version)

def __fix_recorded_start_date(
@briemla Review: Not sure in which version this changed
@@ -341,6 +391,65 @@ def __str__(self) -> str:
    )


# TODO Review: these methods for creating PythonDetections and PythonTracks are static
# TODO and could live outside a class for reusability in streaming_parser
def parse_python_detection(
@briemla Review: these methods for creating PythonDetections and PythonTracks are static and could live outside a class for reusability in streaming_parser
Introduced streaming track and detection parser.
Detections in JSON format are parsed as streams and grouped into tracks.
The finished flag marks the end of a track. After parsing the last detection of a track, that track is provided to the stream of tracks.
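The grouping described here can be sketched as a generator that accumulates detections per track id and yields a track the moment its detection carrying the finished flag arrives. The dict keys `"track-id"` and `"finished"` are assumed names for illustration; the actual ottrk detection keys may differ.

```python
from typing import Iterable, Iterator


def group_into_tracks(detections: Iterable[dict]) -> Iterator[list[dict]]:
    """Group a stream of detections into tracks.

    A track is emitted as soon as its last detection (finished flag set)
    has been parsed, so finished tracks never accumulate in memory.
    """
    open_tracks: dict[str, list[dict]] = {}
    for detection in detections:
        track_id = detection["track-id"]
        open_tracks.setdefault(track_id, []).append(detection)
        if detection["finished"]:
            # Last detection of this track: hand it to the track stream.
            yield open_tracks.pop(track_id)
```

Only unfinished tracks stay buffered, which is what keeps the streaming parser's memory footprint bounded.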
Added new implementation of Cli and added command line arguments:
cli_mode (stream or bulk), cli_chunk_size (in streaming mode, this configures the number of tracks to be collected before performing cutting/counting etc.)
To reduce the memory consumption, track and event repository should be cleared regularly.
This however requires the export of results to work incrementally.
Therefore an export mode is introduced:
Adapted all current result export implementations to respect the export mode.
Parameterized test_cli and regression_otanalytics to test both cli_modes.