feat: AWS Athena backend (and general AWS connections) #7682

Open
lostmygithubaccount opened this issue Dec 7, 2023 · 5 comments
Assignees
Labels
feature Features or general enhancements new backend PRs or issues related to adding new backends

Comments

@lostmygithubaccount
Member

Consider how to support generic AWS authentication and backends for services, namely but not limited to Athena


Hi @lostmygithubaccount ,

Thanks for the reply, and sorry for the delayed response. I was a bit occupied with other work, so I wasn't able to spend time on this.

I had a look at the PostgreSQL backend, but I am unsure how to make a connection to Athena with it. At the moment all I have are AWS credentials, namely an AccessKeyId and a SecretAccessKey, and I am not sure how to pass these in the arguments.

If possible, could you please post a sample code snippet that makes a connection to AWS Athena using the PostgreSQL backend?

Originally posted by @uramith in #7229 (reply in thread)

@lostmygithubaccount lostmygithubaccount added the new backend PRs or issues related to adding new backends label Dec 7, 2023
@lostmygithubaccount lostmygithubaccount changed the title feat: AWS connection and Athena backend feat: AWS Athena backend (and general AWS connections) Dec 7, 2023
@lostmygithubaccount lostmygithubaccount added the feature Features or general enhancements label Dec 7, 2023
@jayceslesar
Contributor

Possibly this could build off of https://github.com/laughingman7743/PyAthena? I remember suggesting this a long time ago, but there were concerns about how it could be tested without, you know, having an AWS account in CI, etc.

@EnkiNibiru

This feature would be valuable to me too. It'd probably be good to reuse functionality that's already common and built out in other AWS maintained packages.

For example, the AWS SDK for Pandas uses boto3 sessions for authentication. That includes a default session, which is a nice way to connect to AWS so long as the environment is already configured to work with other AWS tools such as the AWS CLI. I tend to rely on its prioritized credential lookup to autoload from a credentials/config file made with the AWS CLI to refresh any session tokens, but I know others may prefer refreshing the standard AWS environment variables instead.

One other pro of this approach is that the config/credentials files used by boto3 sessions are also what pyarrow uses for its authentication into AWS when reading/writing Parquet files on S3. So this may work nicely with to_parquet/read_parquet and S3 file systems as well. Similarly, it's what PyAthena (mentioned above) uses. In practice this is also just nice to work with in my experience: get the AWS authentication working once, and then I can use the same configs for multiple packages (AWS SDK for Pandas, PyArrow, boto3, PyAthena, etc.).
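To make the prioritized lookup described above concrete, here is a minimal standard-library sketch of the first two steps: the standard AWS environment variables, then the shared credentials file written by the AWS CLI. This is not boto3's implementation; the real chain has more steps (the config file, SSO, container and instance roles, and so on), and `AwsCredentials` and `resolve_credentials` are hypothetical names used only for illustration.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, NamedTuple, Optional


class AwsCredentials(NamedTuple):
    """Hypothetical container for resolved credentials (illustration only)."""

    access_key_id: str
    secret_access_key: str
    session_token: Optional[str]
    source: str  # where the credentials were found


def resolve_credentials(
    env: Optional[Mapping[str, str]] = None,
    credentials_file: Optional[Path] = None,
    profile: str = "default",
) -> Optional[AwsCredentials]:
    """Return credentials from the environment, else from the shared file."""
    env = os.environ if env is None else env

    # Step 1: the standard AWS environment variables take priority.
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return AwsCredentials(
            key, secret, env.get("AWS_SESSION_TOKEN"), "environment"
        )

    # Step 2: fall back to the shared credentials file (~/.aws/credentials).
    path = credentials_file or Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped
    if parser.has_section(profile):
        section = parser[profile]
        key = section.get("aws_access_key_id")
        secret = section.get("aws_secret_access_key")
        if key and secret:
            return AwsCredentials(
                key,
                secret,
                section.get("aws_session_token"),
                "shared-credentials-file",
            )

    # Nothing found in either place.
    return None
```

Calling `resolve_credentials()` with no arguments reads the live environment and the default file location. In real code, constructing `boto3.Session()` with no arguments performs the full lookup, which is why a single working AWS CLI setup can serve several packages.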

Separate from the authentication topic...the AWS SDK for Pandas might make for a good SQL backend for Athena as well, as it implements the standard SQL operators directly in the Athena and Glue services. Likely that'd mean that the Ibis connection object would need to cover some config options, with the main one being different approaches in how to handle getting data from AWS back to the Python session that have a big impact on performance. But if all we need is authentication, then the SQL dialects in Trino (what Athena is based on) ought to get us pretty close too.

Hope the references above are helpful if this gets picked up, thanks!

@cpcloud
Member

cpcloud commented Dec 19, 2023

Agreed that we should work towards making it easier to add support for backends that are ostensibly derivative of existing systems.

It's very likely that we won't get to this until after #7580 (or a sequence of its changes) is merged and released, as we'd like to get away from SQLAlchemy before supporting more backends.

@gforsyth
Member

If someone wants to try handing a pyathena DB-API connection object to the trino.from_connection constructor and see if that's tractable, we can look at what else might be required to make this work. The docs on PyAthena reference dumping query results as CSV to a bucket and then downloading that CSV -- if that's the pattern, we would probably hold off on adding this until there is proper ADBC support.
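For anyone who wants to try this, the DB-API side might look like the sketch below. It assumes PyAthena is installed and relies on its documented `connect(s3_staging_dir=..., region_name=...)` entry point. The helper name `run_athena_query` and the injectable `connect` parameter are hypothetical, added only so the logic can be exercised without an AWS account. Whether the resulting connection object is accepted by the Trino backend is exactly the open question, so that step is left out.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def run_athena_query(
    sql: str,
    s3_staging_dir: str,
    region_name: str,
    connect: Optional[Callable[..., Any]] = None,
    **credentials: str,
) -> Tuple[List[str], Sequence[Any]]:
    """Run one query over a DB-API connection and return (columns, rows).

    `credentials` may carry aws_access_key_id / aws_secret_access_key;
    when omitted, PyAthena falls back to the usual boto3 lookup.
    """
    if connect is None:
        # Imported lazily so the sketch can be tested with a stand-in.
        from pyathena import connect  # third-party; assumed installed

    conn = connect(
        s3_staging_dir=s3_staging_dir,  # where Athena writes query results
        region_name=region_name,
        **credentials,
    )
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        # DB-API: the first item of each description entry is the column name.
        columns = [col[0] for col in cursor.description]
        return columns, cursor.fetchall()
    finally:
        conn.close()
```

A call such as `run_athena_query("SELECT 1", "s3://YOUR_BUCKET/path/", "us-west-2")` needs real AWS access. In CI, passing a stub as `connect` is one way around the testing concern raised earlier in the thread.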

@gforsyth gforsyth closed this as not planned Jul 31, 2024
@github-project-automation github-project-automation bot moved this from backlog to done in Ibis planning and roadmap Jul 31, 2024
@cpcloud
Member

cpcloud commented Dec 23, 2024

I've started working on an Athena backend.

@cpcloud cpcloud reopened this Dec 23, 2024
@cpcloud cpcloud self-assigned this Dec 23, 2024

5 participants