Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1s1t CatalogParser should handle raw table name collisions #27798

Closed
edgao opened this issue Jun 28, 2023 · 8 comments · Fixed by #28366
Closed

1s1t CatalogParser should handle raw table name collisions #27798

edgao opened this issue Jun 28, 2023 · 8 comments · Fixed by #28366
Assignees
Labels
1-stream~1-table change the format of the final table and any airbyte required tables team/destinations Destinations team's backlog

Comments

@edgao
Copy link
Contributor

edgao commented Jun 28, 2023

In 1s1t, two streams may want to write to the same raw table. E.g. public.users_foo and public_users.foo would both naively want to write to airbyte.public_users_foo. CatalogParserTest#rawNameCollision captures this, but is currently disabled.

Decide how we want to resolve these collisions, then implement that decision in CatalogParser#parseCatalog. Maybe options:

  • use a more interesting delimiter between the namespace+name (e.g. write to airbyte.public__airbyte_raw__users_foo + airbyte.public_users__airbyte_raw__foo)
  • add some more characters to colliding names (incrementing counter, more underscores, etc)
  • something else?
@edgao edgao added the team/db-dw-sources Backlog for Database and Data Warehouse Sources team label Jun 28, 2023
@aaronsteers
Copy link
Collaborator

Related:

  • airbytehq/airbyte-internal-issues#1877

@rodireich
Copy link
Contributor

Grooming meeting 6/28:
Sources team to investigate whether this usage of '_' is generated in source.

@edgao
Copy link
Contributor Author

edgao commented Jun 28, 2023

uhhhhhh I may have screwed up a zendesk thing. This is definitely a destinations issue. Not sure why I tagged it as db sources, that's my bad.

image

@edgao edgao added team/destinations Destinations team's backlog and removed team/db-dw-sources Backlog for Database and Data Warehouse Sources team labels Jun 28, 2023
@rodireich
Copy link
Contributor

Thanks Ed!

@jbfbell
Copy link
Contributor

jbfbell commented Jun 29, 2023

@edgao did I make a duplicate of this in #27817 ? or is this something else

@edgao
Copy link
Contributor Author

edgao commented Jun 29, 2023

I think they're slightly different and worth having separate issues. This issue is: within a single connection, user has two streams public.users_foo and public_users.foo. This is solvable purely by making CatalogParser resolve those conflicts, because we have both streams in our catalog.

Your issue is: across multiple connections (or even outside of airbyte entirely) the user has an airbyte schema with tables in it already. That's not solvable in CatalogParser, and presumably requires a lot more thought.

@jbfbell jbfbell added the 1-stream~1-table change the format of the final table and any airbyte required tables label Jul 7, 2023
@jbfbell
Copy link
Contributor

jbfbell commented Jul 7, 2023

When working on this ticket, consider cleaning up the parseCatalog method in CatalogParser per this discussion

@evantahler
Copy link
Contributor

Grooming:

  • we should do something, but something small. To start, let's use __ as a delimiter and we can rev on this as we see bugs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1-stream~1-table change the format of the final table and any airbyte required tables team/destinations Destinations team's backlog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants