Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow configuring sources/destinations e.g. with YAML #4

Open
jaceksan opened this issue Feb 28, 2024 · 8 comments
Open

Allow configuring sources/destinations e.g. with YAML #4

jaceksan opened this issue Feb 28, 2024 · 8 comments

Comments

@jaceksan
Copy link

Even we could implement readers for 3rd party configurations.
Personally, I implemented a reader for dbt profiles and can contribute with it to this project.

@karakanb
Copy link
Contributor

Thanks for the comment, although I must say I am a bit skeptical to the idea. One of the things that I intend to keep with ingestr is its opinionated simplicity, and having various file-based configuration or extra flexibility may go against that.

I'll give it a thought, and if you have ideas about how a source file could look like or CLI options would behave I'd be happy to read.

@jaceksan
Copy link
Author

My idea is to support both.
I do it exactly this way in our Python SDK:

  • A config exists, use it
  • If not, use arguments
  • If arguments are not used, read ENV variables
  • If empty, use a default or raise an error because it's mandatory(no meaningful default exists for an argument)

So the current behavior of your tool would stay the same.
But, if a config would be found in a default path or a path would be specified using a new argument, the config would be used.

Regarding the config format - well, you could be inspired by the dbt profiles file.
Not only is it battle-proofed but also very large community is behind dbt, which would simplify the onboarding.
dbt uses a templating for ENV variables. I implemented a parser extracting their names and am reading corresponding values from ENV.
It's open-sourced here.

@jaceksan
Copy link
Author

One more comment - specifying credentials in the URL is dangerous.

@akvadrako
Copy link

One more comment - specifying credentials in the URL is dangerous.

Agreed. At least the password (or full URI) should come from a file (ideally) or environmental variable. If you put the password in an argument it will be viewable by all users on the system and saved in shell history.

@karakanb
Copy link
Contributor

The URIs are already supported as env variables: https://github.com/bruin-data/ingestr/blob/main/ingestr/main.py#L80

In fact, I am of the opinion that env variables are far more suitable than having files also for ephemeral environments, e.g. github actions. I am not 100% sold on the idea that we need file-based configuration, but I am not strongly against either, I just don't see it as a priority.

Just to be clear: I don't see us supporting dbt-profiles since it's a platform that we do not have any influence over, therefore if we do support file-based configuration it'll be something specific to ingestr. It can be heavily inspired -or even the same, for that matter- as dbt profile definitions, but they'll evolve separately, and I would rather not play catch-up with that definition.

@jaceksan
Copy link
Author

jaceksan commented Mar 4, 2024

I agree regarding ENV vars, I use them heavily in Gitlab CI/GitHub actions.
I use ENV variables in all artifacts - Meltano, dbt, GoodData.
That is why I specify atomic ENV vars to be able to build any property in any artifact -
e.g. DB_NAME, DB_USER, but also DB-specific, e.g. SNOWFLAKE_WAREHOUSE.
So, if I do it this way, I should be able to set an ingestr ENV variable, e.g.:

export SOURCE_URI="postgresql://${DB_USER}:${DB_PWD}@${DB_HOST}:${DB_PORT}/${DB_NAME}?sslmode=disable"

That's sufficient for me now.

@karakanb
Copy link
Contributor

karakanb commented Mar 4, 2024

not sure if that's clear to you, but you can do what you are suggesting today already with ingestr. does that help?

@jaceksan
Copy link
Author

jaceksan commented Mar 4, 2024

That's what I tried to tell in my last comment:
I am satisfied with the current capabilities of ingestr, specifically with the ability to export the SOURCE_URI variable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants