
feat!: Encapsulation #15

Merged: 30 commits, Aug 22, 2022

jshlbrd (Contributor) commented Aug 22, 2022

Description

  • Makes encapsulation the default pattern for handling data in all Substation applications and public APIs (see changes in /config/)
    • Adds backwards-compatible functions for handling non-encapsulated data
  • Adds AWS Firehose sink support
  • Adds AWS SQS source and sink support
  • Adds context support to the condition API
  • Updates code style of public APIs

This is a non-breaking change for ITL applications and a breaking change for the public APIs.

Motivation and Context

The motivation for adding encapsulation was the need to let users simultaneously handle both data (structured or unstructured) and metadata. This has two advantages:

  • sources can add metadata when data is ingested
  • metadata can be tracked alongside unstructured (e.g., binary) data

This change means that users can now access and interpret information that only the source application knows; for example:

  • substation/file: the filename and file size of the source file that data is read from
  • substation/aws/kinesis: the Kinesis stream that delivered the data and the data's approximate arrival time
  • substation/aws/s3: the S3 bucket and object that data is read from

It also provides total separation of metadata from data, which is most useful in sink applications; for example:

  • http: storing HTTP headers as metadata instead of in the data object
  • s3: storing the S3 prefix as metadata instead of in the data object
  • sumologic: storing the category as metadata instead of in the data object
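A rough sketch of the sink side, assuming a hypothetical `Capsule` type and a hypothetical `objectKey` helper with a `"prefix"` metadata key (illustrative names, not Substation's actual API): the S3 prefix is read from metadata when present, so it never has to be written into the data object, and a configured default is used otherwise.

```go
package main

import (
	"fmt"
	"path"
)

// Capsule is a hypothetical data-plus-metadata container.
type Capsule struct {
	Data     []byte
	Metadata map[string]string
}

// objectKey builds a hypothetical S3 object key. The prefix comes from the
// capsule's metadata when present; otherwise the configured default is used.
// The data object itself is never inspected or modified.
func objectKey(c Capsule, defaultPrefix, name string) string {
	prefix := defaultPrefix
	if p, ok := c.Metadata["prefix"]; ok && p != "" {
		prefix = p
	}
	return path.Join(prefix, name)
}

func main() {
	withMeta := Capsule{
		Data:     []byte(`{"event":"login"}`),
		Metadata: map[string]string{"prefix": "auth/2022/08/22"},
	}
	withoutMeta := Capsule{Data: []byte(`{"event":"login"}`)}

	fmt.Println(objectKey(withMeta, "default", "events.json"))    // auth/2022/08/22/events.json
	fmt.Println(objectKey(withoutMeta, "default", "events.json")) // default/events.json
}
```

Because the prefix lives in metadata, the JSON written to S3 stays exactly as it was processed, with no sink-specific fields mixed in.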

This also gives us room to grow. For example, instead of relying on statically configured settings for Kinesis streams and S3 buckets, we can add features that let users apply these dynamically from metadata.

How Has This Been Tested?

  • All public APIs had their unit tests updated
  • Firehose and SQS features were tested in a development AWS account

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

jshlbrd (Contributor, PR author) commented Aug 22, 2022

golint is failing because of an unavoidable issue with mocking an SQS API call in a unit test. The fix is to stop using golint (it's deprecated) and switch to staticcheck (its recommended replacement); we'll handle that in a separate PR.

jshlbrd marked this pull request as ready for review August 22, 2022 14:40
jshlbrd requested a review from a team as a code owner August 22, 2022 14:40
jshlbrd merged commit e46e780 into main Aug 22, 2022
jshlbrd deleted the jshlbrd/encapsulation branch August 22, 2022 14:40