🐛 Source S3: "decimal" type added for parquet #14911

Merged: 4 commits, Jul 21, 2022

@@ -825,7 +825,7 @@
 - name: S3
 sourceDefinitionId: 69589781-7828-43c5-9f63-8925b1c1ccc2
 dockerRepository: airbyte/source-s3
-dockerImageTag: 0.1.16
+dockerImageTag: 0.1.17
 documentationUrl: https://docs.airbyte.io/integrations/sources/s3
 icon: s3.svg
 sourceType: file

@@ -7823,7 +7823,7 @@
 supportsNormalization: false
 supportsDBT: false
 supported_destination_sync_modes: []
-- dockerImage: "airbyte/source-s3:0.1.16"
+- dockerImage: "airbyte/source-s3:0.1.17"
 spec:
 documentationUrl: "https://docs.airbyte.io/integrations/sources/s3"
 changelogUrl: "https://docs.airbyte.io/integrations/sources/s3"

airbyte-integrations/connectors/source-s3/Dockerfile: 2 changes (1 addition, 1 deletion)
@@ -17,5 +17,5 @@ COPY source_s3 ./source_s3
 ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
 ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

-LABEL io.airbyte.version=0.1.16
+LABEL io.airbyte.version=0.1.17
 LABEL io.airbyte.name=airbyte/source-s3

@@ -18,6 +18,7 @@
 "boolean": ("boolean", ["BOOLEAN"], None),
 "number": ("number", ["DOUBLE", "FLOAT"], None),
 "integer": ("integer", ["INT32", "INT64", "INT96"], None),
+"decimal": ("number", ["INT32", "INT64", "FIXED_LEN_BYTE_ARRAY"], None),
 # supported by PyArrow types
 "timestamp": ("string", ["INT32", "INT64", "INT96"], lambda v: v.isoformat()),
 "date": ("string", ["INT32", "INT64", "INT96"], lambda v: v.isoformat()),
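
Each entry in the type mapping above appears to be a tuple of (JSON schema type, accepted Parquet physical types, optional value converter), and the new "decimal" entry exposes decimal columns as JSON `number` with no converter. A minimal sketch of how such a table could be consumed, assuming that tuple layout; `TYPES`, `json_type`, `accepts_physical_type`, and `convert_value` are hypothetical names, not the connector's API:

```python
from datetime import date
from decimal import Decimal
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical subset of the mapping in the diff, assuming each entry is
# (JSON schema type, accepted Parquet physical types, optional converter).
TypeSpec = Tuple[str, List[str], Optional[Callable[[Any], Any]]]

TYPES: Dict[str, TypeSpec] = {
    "number": ("number", ["DOUBLE", "FLOAT"], None),
    "decimal": ("number", ["INT32", "INT64", "FIXED_LEN_BYTE_ARRAY"], None),
    "date": ("string", ["INT32", "INT64", "INT96"], lambda v: v.isoformat()),
}


def json_type(logical_type: str) -> str:
    """JSON schema type advertised for a logical type."""
    return TYPES[logical_type][0]


def accepts_physical_type(logical_type: str, physical_type: str) -> bool:
    """True if the Parquet physical type is listed for this logical type."""
    return physical_type in TYPES[logical_type][1]


def convert_value(logical_type: str, value: Any) -> Any:
    """Apply the entry's converter, or return the value unchanged if there is none."""
    if value is None:
        return None
    if logical_type not in TYPES:
        raise TypeError(f"unsupported field type: {logical_type}")
    _, _, converter = TYPES[logical_type]
    return converter(value) if converter else value


print(convert_value("decimal", Decimal("10.50")))  # 10.50
print(convert_value("date", date(2022, 7, 21)))  # 2022-07-21
```

Under this layout a decimal passes through as a `Decimal` object, while dates are turned into ISO strings because their entry carries a converter.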

@@ -9,6 +9,7 @@
 import tracemalloc
 from abc import ABC, abstractmethod
 from datetime import datetime, timedelta
+from decimal import Decimal
 from functools import lru_cache, wraps
 from typing import Any, Callable, List, Mapping

@@ -106,6 +107,8 @@ def _generate_value(cls, typ: str) -> Any:
 elif typ == "time":
 dt = cls._generate_value("timestamp")
 return dt.time() if dt else None
+elif typ == "decimal":
+return Decimal((0, tuple([random.randint(1, 9) for _ in range(10)]), -4))
 raise Exception(f"not supported type: {typ}")

 @classmethod
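
The new `_generate_value("decimal")` branch builds a `Decimal` from a `(sign, digits, exponent)` tuple: sign `0` means positive, the ten digits are each drawn from 1 to 9, and exponent `-4` places four of them after the decimal point. A standalone sketch of the same construction; `random_decimal` is a hypothetical helper, not part of the test suite:

```python
import random
from decimal import Decimal


def random_decimal(num_digits: int = 10, scale: int = 4) -> Decimal:
    """Return a positive Decimal built from a (sign, digits, exponent) tuple.

    sign=0 means positive; each digit is drawn from 1..9, so the value never
    has a leading zero; exponent=-scale puts `scale` digits after the point.
    """
    digits = tuple(random.randint(1, 9) for _ in range(num_digits))
    return Decimal((0, digits, -scale))


# With fixed digits the tuple form is easy to check by hand.
example = Decimal((0, (1, 2, 3, 4, 5, 6, 7, 8, 9, 1), -4))
print(example)  # 123456.7891
```

Because no digit is ever `0`, every value generated with the defaults has precision 10 and scale 4 and lies between 111111.1111 and 999999.9999.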

@@ -76,6 +76,7 @@ def cases(cls) -> Mapping[str, Any]:
 "degrees": "number",
 "birthday": "string",
 "last_seen": "string",
+"salary": "decimal",
 "created_at": "timestamp",
 "created_date_at": "date",
 "created_time_at": "time",
@@ -200,12 +201,14 @@ def cases(cls) -> Mapping[str, Any]:
 "degrees": -9.2,
 "birthday": cls._generate_value("string"),
 "last_seen": cls._generate_value("string"),
+"salary": cls._generate_value("decimal"),
 "created_at": cls._generate_value("timestamp"),
 "created_date_at": cls._generate_value("date"),
 "created_time_at": cls._generate_value("time"),
 }

 expected_record = copy.deepcopy(test_record)
+expected_record["salary"] = ParquetParser.convert_field_data("decimal", expected_record["salary"])
 expected_record["created_date_at"] = ParquetParser.convert_field_data("date", expected_record["created_date_at"])
 expected_record["created_time_at"] = ParquetParser.convert_field_data("time", expected_record["created_time_at"])
 expected_record["created_at"] = ParquetParser.convert_field_data("timestamp", expected_record["created_at"])
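
Since the "decimal" type entry has no converter, the `salary` value in the expected record is presumably still a `Decimal`. Python's standard `json` module cannot serialize `Decimal` on its own, so code that writes such records with `json.dumps` needs a fallback. A minimal sketch, assuming `json.dumps` is the serializer (the connector may serialize records differently) and that a `float` is an acceptable JSON `number`; `decimal_default` is a hypothetical name:

```python
import json
from decimal import Decimal
from typing import Any


def decimal_default(obj: Any) -> Any:
    """Fallback for json.dumps: emit Decimal values as JSON numbers.

    Converting through float is simple but can lose precision for values
    with more significant digits than a double can hold.
    """
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {"salary": Decimal("123456.7891")}

# json.dumps(record) alone raises TypeError; the default hook handles Decimal.
print(json.dumps(record, default=decimal_default))  # {"salary": 123456.7891}
```

If exact decimal precision must survive, returning `str(obj)` instead would preserve it, at the cost of emitting a JSON string rather than a number.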

docs/integrations/sources/s3.md: 5 changes (3 additions, 2 deletions)
@@ -195,8 +195,9 @@ The avro parser uses [fastavro](https://fastavro.readthedocs.io/en/latest/). Cur

 | Version | Date | Pull Request | Subject |
 |:--------|:-----------|:----------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
-| 0.1.16 | 2022-07-13 | [14669](https://github.com/airbytehq/airbyte/pull/14669) | Fixed bug when extra columns appeared to be non-present in master schema |
-| 0.1.15 | 2022-05-31 | [12568](https://github.com/airbytehq/airbyte/pull/12568) | Fixed possible case of files being missed during incremental syncs |
+| 0.1.17 | 2022-07-21 | [14911](https://github.com/airbytehq/airbyte/pull/14911) | "decimal" type added for parquet |
+| 0.1.16 | 2022-07-13 | [14669](https://github.com/airbytehq/airbyte/pull/14669) | Fixed bug when extra columns appeared to be non-present in master schema |
+| 0.1.15 | 2022-05-31 | [12568](https://github.com/airbytehq/airbyte/pull/12568) | Fixed possible case of files being missed during incremental syncs |
 | 0.1.14 | 2022-05-23 | [11967](https://github.com/airbytehq/airbyte/pull/11967) | Increase unit test coverage up to 90% |
 | 0.1.13 | 2022-05-11 | [12730](https://github.com/airbytehq/airbyte/pull/12730) | Fixed empty options issue |
 | 0.1.12 | 2022-05-11 | [12602](https://github.com/airbytehq/airbyte/pull/12602) | Added support for Avro file format |