This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

Removing messytables and adding ci automation #251

s7clarke10

Problem

  1. During discovery, tap-s3-csv samples data from the CSV files to guess the most suitable type for each column. This sometimes causes type mismatches: when an unexpected value of a different type appears elsewhere in the file (e.g. a float in a column that was inferred as integer), the pipeline breaks.
  2. Development and automated testing of this tap currently require an AWS S3 bucket. This is a bottleneck for running CI checks on PRs from forks, and it hurts the general dev/testing experience.
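
To illustrate the first problem, here is a minimal sketch of sample-based type guessing. It is not the messytables algorithm; `guess_type` and `find_mismatches` are hypothetical helpers written only for this example, and the three-row sample size is an assumption.

```python
def guess_type(values):
    """Guess the narrowest type that every sampled value can be cast to."""
    candidates = (("integer", int), ("number", float))
    for name, cast in candidates:
        try:
            for value in values:
                cast(value)
        except ValueError:
            continue  # at least one value does not fit, try the next type
        return name
    return "string"


def find_mismatches(values, type_name):
    """Return the values that cannot be cast to the guessed type."""
    cast = {"integer": int, "number": float, "string": str}[type_name]
    bad = []
    for value in values:
        try:
            cast(value)
        except ValueError:
            bad.append(value)
    return bad


# Hypothetical column: the float only appears after the sampled rows.
column = ["1", "2", "3", "4.5"]
guessed = guess_type(column[:3])               # guess from the sample only
mismatches = find_mismatches(column, guessed)  # values that break the guess
```

With this data the sample suggests an integer column, and the later value `4.5` is the one that no longer fits.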

Proposed changes

  1. Interpret all columns in the CSV header as string types and drop the use of messytables for guessing column types.
  2. Use a local Minio server as the S3 server for development and testing.
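
As a rough sketch of the first proposed change, the schema could be built from the header row alone, with every column typed as a string. The function name `string_schema_from_header` and the nullable `["null", "string"]` type are assumptions made for illustration, not the code in this PR.

```python
import csv
import io


def string_schema_from_header(csv_text):
    """Build a JSON-schema-like dict that types every header column as string."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # only the first row is read; data rows are ignored
    return {
        "type": "object",
        "properties": {name: {"type": ["null", "string"]} for name in header},
    }


# Hypothetical file content: the float in "price" no longer affects the schema.
schema = string_schema_from_header("id,price\n1,4.5\n")
```

Because no data row is inspected, a later value of a different type cannot conflict with the discovered schema; any casting is left to downstream consumers.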

Pros

  • No more runtime errors related to conflict between data and expected type.
  • Removes the old and seemingly unmaintained messytables library, which eases future migrations to recent Python versions that the library doesn't support.
  • Integration tests will run automatically.

Cons

  • This is a breaking change.

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions

s7clarke10 and others added 30 commits June 17, 2022 06:47
Updating pipelinewise-singer-python to 2.0.* from 1.*
The option remove_character can be used to remove a particular character from a file. For example, to remove double quotes, set remove_character='\"'.
Adding config for s3_proxies to explicitly set proxy for s3 traffic
Additional required change for proxy server
The fork of singer-encodings contains the update to allow selection of encodings
A new PR on the singer-encodings repository should bring in changes to allow selection of encoding
s7clarke10 and others added 28 commits May 23, 2023 17:04
…t_header_rec

Feature/create sample with just header rec
Updates the requirements on [pytest-cov](https://github.com/pytest-dev/pytest-cov) to permit the latest version.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](pytest-dev/pytest-cov@v3.0.0...v4.1.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [ujson](https://github.com/ultrajson/ultrajson) from 5.4.0 to 5.8.0.
- [Release notes](https://github.com/ultrajson/ultrajson/releases)
- [Commits](ultrajson/ultrajson@5.4.0...5.8.0)

---
updated-dependencies:
- dependency-name: ujson
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Updates the requirements on [pytest](https://github.com/pytest-dev/pytest) to permit the latest version.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@7.1.0.dev0...7.4.0)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Updates the requirements on [more-itertools](https://github.com/more-itertools/more-itertools) to permit the latest version.
- [Release notes](https://github.com/more-itertools/more-itertools/releases)
- [Commits](more-itertools/more-itertools@v8.12.0...v10.1.0)

---
updated-dependencies:
- dependency-name: more-itertools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [boto3](https://github.com/boto/boto3) from 1.26.138 to 1.28.30.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](boto/boto3@1.26.138...1.28.30)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
…gte-8.12-and-lt-10.2

Update more-itertools requirement from <9.2,>=8.12 to >=8.12,<10.2
…and-lt-7.5

Update pytest requirement from ==7.1.* to >=7.1,<7.5
…3.0-and-lt-4.2

Update pytest-cov requirement from <4.1,>=3.0 to >=3.0,<4.2
…0722

Patching dependent packages to latest version
Updates the requirements on [pylint](https://github.com/pylint-dev/pylint) to permit the latest version.
- [Release notes](https://github.com/pylint-dev/pylint/releases)
- [Commits](pylint-dev/pylint@pylint-3.0.0a0...v3.2.6)

---
updated-dependencies:
- dependency-name: pylint
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
…-and-lt-3.3

Update pylint requirement from <3.1,>=2.12 to >=2.12,<3.3
…thon

Moving to pypi version of realit-singer-python
…coding

Updating dependencies moving to pypi singer_encodings
@s7clarke10
Author

Merged to incorrect repo - apologies

@s7clarke10 s7clarke10 closed this Aug 19, 2024