Clickhouse profile mapping #353

Closed
wants to merge 3 commits into from

Conversation

roadan

@roadan roadan commented Jun 30, 2023

Description

This PR adds a Clickhouse profile mapping that uses the `generic` connection type. To prevent Cosmos from claiming every generic connection, it relies on a required field named `clickhouse`, mapped to `extra.clickhouse`.

To ensure the profile is claimed, users must add the following JSON to the extra field in the connection:

{
    "clickhouse": "True"
}
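
As a rough illustration of the claiming rule described above, the sketch below checks whether a generic connection's extra JSON carries the `clickhouse` marker. It is a standalone approximation, not the code in this PR: `FakeConnection` and `can_claim` are hypothetical names introduced only for this example, and treating any non-empty value as "set" is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class FakeConnection:
    """Hypothetical stand-in for an Airflow connection (not the real class)."""

    conn_type: str
    extra: str = "{}"


def can_claim(conn: FakeConnection) -> bool:
    """Return True only for generic connections whose extra sets `clickhouse`."""
    if conn.conn_type != "generic":
        return False
    try:
        extra = json.loads(conn.extra or "{}")
    except json.JSONDecodeError:
        return False
    # Assumption: any non-empty value counts as "set"; the PR's example uses "True".
    return isinstance(extra, dict) and bool(extra.get("clickhouse"))
```

Under these assumptions, a generic connection without the marker is left alone, which is the point of the required field: other generic connections are not mistaken for Clickhouse ones.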

Related Issue(s)

closes #95

Breaking Change?

Checklist

[x] I have made corresponding changes to the documentation (if required)
[x] I have added tests that prove my fix is effective or that my feature works

@roadan roadan requested a review from a team as a code owner June 30, 2023 01:48
@roadan roadan requested a review from a team June 30, 2023 01:48
@netlify

netlify bot commented Jun 30, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 59c4225
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/649e36946e5c4500089d1783

@roadan roadan changed the title from "initial commit for clickhouse profile mapping" to "Clickhouse profile mapping" Jun 30, 2023
@codecov

codecov bot commented Jun 30, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.07%. Comparing base (1ebee49) to head (59c4225).
Report is 327 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #353      +/-   ##
==========================================
+ Coverage   88.91%   89.07%   +0.15%     
==========================================
  Files          39       41       +2     
  Lines        1299     1318      +19     
==========================================
+ Hits         1155     1174      +19     
  Misses        144      144              


@jlaneve
Collaborator

jlaneve commented Jul 4, 2023

This is looking great! Question for you - it looks like there’s a recommendation to use a SQLite connection for Clickhouse with Airflow (https://github.com/bryzgaloff/airflow-clickhouse-plugin). Do you think this should use generic or SQLite connection type?

@Gaploid

Gaploid commented Jul 21, 2023

Wow! Clickhouse should definitely be supported! I'm voting for this!

@epikhinm

epikhinm commented Jul 21, 2023

Sorry to intervene, but I don't think using airflow-clickhouse-plugin for dbt-clickhouse is a good idea.
dbt-clickhouse is a GitHub repo from the original company, ClickHouse Inc., and airflow-cosmos should use the bindings from the official company, which has a great team behind it.

Hi @silentsokolov. Could somebody from your company make a decision about the future of this feature? All of us are thrilled to have dbt-clickhouse in airflow-cosmos :)

@silentsokolov

I am completely in favor of this feature. More tools for working with ClickHouse - that's awesome.

I'm not part of the ClickHouse company, so it's better to ask @genzgd or @guykoh.

@genzgd

genzgd commented Jul 26, 2023

I don't claim to fully understand this project :), but I think this PR correctly uses dbt-clickhouse and not the airflow-clickhouse-plugin? Mapping values to the dbt-profile seems perfectly acceptable based on what I see.

@CorsettiS
Contributor

@jlaneve since ClickHouse has no official support in Airflow, it makes more sense from my perspective to use a generic HTTP connector instead of SQLite, which is somewhat hacky. This implementation looks very solid to me as of now.

@jlaneve
Collaborator

jlaneve commented Jul 31, 2023

Hey folks, curious if we still want this profile mapping now that we have support for user-provided profiles.

Also worth noting that the Airflow Clickhouse integration recommends using a sqlite connection, so IMO this profile mapping should translate an Airflow sqlite connection to a dbt clickhouse profile, especially now that the recommendation is to explicitly import a profile mapping. This could look like:

from cosmos import ProfileConfig
from cosmos.profiles import ClickhouseUserPasswordProfileMapping

profile_config = ProfileConfig(
    profile_name="my_profile_name",
    target_name="my_target_name",
    profile_mapping=ClickhouseUserPasswordProfileMapping(
        conn_id="my_sqlite_connection",  # use sqlite because that's what airflow-clickhouse-plugin uses
        profile_args={
            "additional_arg": "my_value",
        },
    ),
)

@CorsettiS
Contributor

The sqlite conn and the generic HTTP conn both have the same args, if I am not wrong, so it is not a big deal to use one or the other. At my current company we use the HTTP conn for ClickHouse to avoid mixing them up. In the end it is just a naming convention.
I would advocate for keeping this PR relevant, since it is easier, at least from a user's perspective, to store the creds in Airflow instead of in a profiles.yml file.
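
To make the trade-off concrete, here is a minimal sketch of how connection fields stored in Airflow could be turned into a dbt profile dictionary, with user-supplied arguments merged in and null values dropped. It is only an approximation in the spirit of the `**self.profile_args` and `filter_null` calls visible in the PR diff: `build_profile`, its parameters, and the profile keys chosen here are assumptions for illustration, not the PR's actual code.

```python
from typing import Any, Dict, Optional


def filter_null(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Drop keys whose value is None (simplified stand-in for the PR's helper)."""
    return {key: value for key, value in profile.items() if value is not None}


def build_profile(
    host: Optional[str],
    login: Optional[str],
    password: Optional[str],
    schema: Optional[str],
    port: Optional[int] = None,
    profile_args: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a dbt-style profile dict from connection fields (hypothetical)."""
    profile = {
        "type": "clickhouse",
        "host": host,
        "user": login,
        "password": password,
        "schema": schema,
        "port": port or 9000,  # 9000 mirrors default_port in the PR diff
        # User-supplied args come last so they override the mapped values.
        **(profile_args or {}),
    }
    return filter_null(profile)
```

With this approach the credentials live only in the Airflow connection, and the profile is derived from them when needed rather than being maintained by hand in a separate file.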

**self.profile_args,
}

return self.filter_null(profile)
Collaborator

Please, could you add the mock_profile function & a test for it?
https://github.com/astronomer/astronomer-cosmos/blob/3786703609e69c1e8f4b2db1475fe8b6ea00a117/cosmos/profiles/base.py#L97C7-L97C7

This was introduced after this PR was created and is now required.
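
For readers unfamiliar with the request, the following is a minimal, self-contained sketch of what a mock profile could look like. It does not import Cosmos: `SketchClickhouseMapping`, its `required_fields` list, and the placeholder values are assumptions made for illustration, and the real base class linked above may behave differently.

```python
from typing import Any, Dict, List, Optional


class SketchClickhouseMapping:
    """Hypothetical, simplified mapping; not the Cosmos base class."""

    # Assumed field names, for illustration only.
    required_fields: List[str] = ["host", "user", "schema", "clickhouse"]
    default_port = 9000

    def __init__(self, profile_args: Optional[Dict[str, Any]] = None) -> None:
        self.profile_args = profile_args or {}

    @property
    def mock_profile(self) -> Dict[str, Any]:
        """Build a profile from placeholders so no real connection is needed."""
        profile: Dict[str, Any] = {name: "mock_value" for name in self.required_fields}
        profile["type"] = "clickhouse"
        profile["port"] = self.default_port
        # User-supplied args override the placeholders.
        profile.update(self.profile_args)
        return profile
```

A test for such a property would typically assert that every required field is present and that values passed through `profile_args` take precedence over the placeholders.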


airflow_connection_type: str = "generic"
default_port = 9000

Collaborator

@tatiana tatiana left a comment

@roadan, sorry for the delay. Could you please rebase this branch and address the comments?

@jlaneve, at least three community members would like this feature; what do you think about approving it once the feedback has been addressed and the tests pass? It seems valuable, and it is well documented and tested.

@tatiana tatiana added this to the 1.3.0 milestone Oct 13, 2023
@tatiana tatiana added profile:clickhouse Related to Clickhouse ProfileConfig area:profile Related to ProfileConfig, like Athena, BigQuery, Clickhouse, Spark, Trino, etc labels Nov 9, 2023
@alexisvannier

alexisvannier commented Nov 13, 2023

Would be very nice to see this PR merged. 👍
Is it possible to help with testing?

@vargacypher

Would be very nice to see this PR merged. 👍
Is there anything I can help with?

@tatiana tatiana added the status:awaiting-reviewer The issue/PR is awaiting for a reviewer input label Dec 15, 2023
@tatiana tatiana modified the milestones: 1.3.0, 1.4.0 Jan 4, 2024
@vargacypher

@roadan could I collaborate on the unresolved conversations?
We really want this feature.

@tatiana tatiana modified the milestones: 1.4.0, 1.5.0 Apr 25, 2024
@tatiana tatiana added triage-needed Items need to be reviewed / assigned to milestone and removed triage-needed Items need to be reviewed / assigned to milestone labels May 17, 2024
@tatiana tatiana mentioned this pull request May 17, 2024
@pankajastro
Contributor

Hey @roadan, it looks like we're very close to merging! Could you please rebase and add the mock_profile property? Let me know if there's anything I can do to help. Thanks!

@pankajastro
Contributor

Hi @roadan, I was wondering if you had a chance to read my previous comment. It seems like this feature is very important for the community. If it's alright with you, can I take it forward from here? Thanks!

@pankajastro pankajastro removed the status:awaiting-reviewer The issue/PR is awaiting for a reviewer input label May 31, 2024
@tatiana
Collaborator

tatiana commented Jun 4, 2024

@roadan @vargacypher @alexisvannier @CorsettiS @genzgd @epikhinm @Gaploid

Since this work is still relevant and the PR has gone stale, @pankajastro is rebasing the original implementation in #1016, so we can get this work merged and released as part of Cosmos 1.5. If we release an alpha, would one of you be able to help us test this feature?

@tatiana tatiana added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jun 4, 2024
@vargacypher

I could help with tests, @tatiana

tatiana pushed a commit that referenced this pull request Jun 6, 2024
This PR adds Clickhouse profile mapping using a `generic` connection
type. To prevent cosmos from attaching all generic connections, it uses
a required field named `clickhouse` mapped to `extra.clickhouse`.

To ensure the profile is claimed, users must add the following JSON to
the extra field in the connection:
```JSON
{
    "clickhouse": "True"
}
```
Co-authored-by: Yaniv Rodenski <roadan@gmail.com>

Original PR by @roadan:
#353

Closes #95
@tatiana
Collaborator

tatiana commented Jun 6, 2024

@vargacypher, thanks a lot; we just merged #1016 and will let you know once we have an alpha with this feature!

@tatiana tatiana closed this Jun 6, 2024
@dosubot dosubot bot removed this from the Cosmos 1.5.0 milestone Jun 6, 2024
@pankajkoti pankajkoti mentioned this pull request Jun 27, 2024
tatiana pushed a commit that referenced this pull request Jun 27, 2024
New Features

* Speed up ``LoadMode.DBT_LS`` by caching dbt ls output in Airflow
Variable by @tatiana in #1014
* Support to cache profiles created via ``ProfileMapping`` by
@pankajastro in #1046
* Support for running dbt tasks in AWS EKS in #944 by @VolkerSchiewe
* Add Clickhouse profile mapping by @roadan and @pankajastro in #353 and
#1016
* Add node config to TaskInstance Context by @linchun3 in #1044

Bug fixes

* Support partial parsing when cache is disabled by @tatiana in #1070
* Fix disk permission error in restricted env by @pankajastro in #1051
* Add CSP header to iframe contents by @dwreeves in #1055
* Stop attaching log adaptors to root logger to reduce logging costs by
@glebkrapivin in #1047

Enhancements

* Support ``static_index.html`` docs by @dwreeves in #999
* Support deep linking dbt docs via Airflow UI by @dwreeves in #1038
* Add ability to specify host/port for Snowflake connection by @whummer
in #1063

Docs

* Fix rendering for env ``enable_cache_dbt_ls`` by @pankajastro in #1069

Others

* Update documentation for DbtDocs generator by @arjunanan6 in #1043
* Use uv in CI by @dwreeves in #1013
* Cache hatch folder in the CI by @tatiana in #1056
* Change example DAGs to use ``example_conn`` as opposed to
``airflow_db`` by @tatiana in #1054
* Mark plugin integration tests as integration by @tatiana in #1057
* Ensure compliance with linting rule D300 by using triple quotes for
docstrings by @pankajastro in #1049
* Pre-commit hook updates in #1039, #1050, #1064
* Remove duplicates in changelog by @jedcunningham in #1068
@tatiana tatiana mentioned this pull request Jun 27, 2024
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this pull request Jul 14, 2024
tatiana pushed a commit that referenced this pull request Jul 17, 2024
(cherry picked from commit 18d2c90)
Labels
area:profile Related to ProfileConfig, like Athena, BigQuery, Clickhouse, Spark, Trino, etc epic-assigned profile:clickhouse Related to Clickhouse ProfileConfig stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed
Development

Successfully merging this pull request may close these issues.

Add support for dbt-clickhouse adapter