A bunch of issues I ran into when getting started #1082

Closed
bfirsh opened this issue Nov 25, 2020 · 5 comments
Labels
type/bug Something isn't working

Comments


bfirsh commented Nov 25, 2020

Hey fellow W20 company! 🙌

I was battling with setting up Singer, and then searched on Bookface and what do you know -- another company in our batch is making the thing I need. :)

I want to feed a bunch of different data sources (GitHub, a BigQuery table, Amplitude, etc) into BigQuery.

I thought it might be useful to talk through how I set it up and what led me to give up (temporarily, I hope). It might help you prioritize what is actually stopping users from onboarding.

  1. I got it set up on GCP, which was a bit clunky. I would much rather use Heroku, or something like that.
  2. The first thing I tried to set up was feeding GitHub stars into BigQuery. I got this error when setting up BigQuery:

I tried several times to re-create the service account. I then looked at the Compose logs and saw that I had the BigQuery project name wrong. It would have been useful to surface that error in the UI. After correcting the project name I kept getting the same error, and fixed it by giving the service account BigQuery Admin permissions rather than the BigQuery User permissions that your docs suggest.

  3. Sync was "completed", but no data was showing up. Turns out there was an error in the logs:
2020-11-25 05:10:27 ERROR (/tmp/workspace/20/0) DefaultAirbyteStreamFactory(internalLog):108 - CRITICAL {"message":"Must have push access to view repository

I disabled all of the GitHub things except for stargazers, and it then seems to sync stargazers. shrug

  4. The data for stargazers is now syncing correctly, but the "starred at" field is not converted to a date type. I can't see any way of doing that transform myself, and I'm not sure of the best way to fix this in the code. Is this in the GitHub source? Or maybe it's in the normalizer?
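
For reference, the conversion asked about in item 4 could look roughly like the Python sketch below. It assumes the field arrives as an ISO 8601 UTC string such as `2020-11-25T05:10:27Z` and is named `starred_at`. The helper `parse_starred_at` and the sample record are hypothetical, and this is not how the GitHub source or the normalizer actually handles it.

```python
from datetime import datetime, timezone


def parse_starred_at(value):
    """Parse a string like '2020-11-25T05:10:27Z' into a timezone-aware UTC datetime."""
    # Assumption: the value always ends in a literal 'Z' and has no fractional seconds.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Hypothetical record: the "user" field and both values are made up for illustration.
record = {"user": "example-user", "starred_at": "2020-11-25T05:10:27Z"}
record["starred_at"] = parse_starred_at(record["starred_at"])
print(record["starred_at"].isoformat())  # prints 2020-11-25T05:10:27+00:00
```

Once the value is a real datetime rather than a string, a destination can store it in a timestamp column instead of a text column.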

Anyway -- this is the point I gave up. Hope it's useful!


@bfirsh bfirsh added the type/bug Something isn't working label Nov 25, 2020
@michel-tricot
Contributor

Hi @bfirsh! Thank you for this super detailed report.

For 1. we have Heroku on our list of platforms to run Airbyte. The current order is K8s (#1) & Heroku (#2).
For 2. @cgardens do you think we should display logs in the UI when we are checking credentials? (until we have a better way to identify errors?)
For 3. @sherifnada could it be that the container is not failing on this error? (The job should be marked as failed.)
For 4. @sherifnada would #979 take care of it?

@ChristopheDuong
Contributor

For 2. @cgardens do you think we should display logs in the UI when we are checking credentials? (until we have a better way to identify errors?)

I think it's pretty crucial to display some logs in the UI (everywhere, as much as possible) and probably avoid hiding older logs when retrying too.

When using other data loader SaaS, you do get access to the logs in the UI when setting up your integrations and it's super useful as an end-user!

In addition, when you can't figure out how to properly set things up if you made mistakes, you can more easily copy/paste the exception/errors to the support "chat" for further investigations.

@ChristopheDuong
Contributor

This issue #462 might be related

@cgardens
Contributor

I'll follow up tomorrow afternoon with a suggestion on what we'll do here. We know that error handling and debugging are too hard right now. We are going to spend some time tomorrow figuring out what the right story is here. The two (not mutually exclusive) options that we are considering are:

  1. Better logging in the app anywhere that triggers a worker (check connection, discover schema, sync).
  2. Easier experience finding / accessing logs that airbyte produces.
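
As a rough illustration of the second option, here is a minimal Python sketch that scans raw log text for lines that look like failures, using the line shape quoted in the original report. The regular expression, the function name `find_problems`, and the INFO sample line are assumptions made for illustration. This is not Airbyte code, and the real log format may vary.

```python
import re

# Assumed line shape, based on the single line quoted in the report:
# "<date> <time> <LEVEL> <everything else>"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<rest>.*)$"
)


def find_problems(log_text):
    """Return (timestamp, message) pairs for lines that look like failures."""
    problems = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not in the assumed format
        # Flag ERROR-level lines, plus any line relaying a CRITICAL message.
        if match["level"] == "ERROR" or "CRITICAL" in match["rest"]:
            problems.append((f'{match["date"]} {match["time"]}', match["rest"]))
    return problems


sample = "\n".join([
    # Hypothetical INFO line, made up to show a non-failure entry.
    "2020-11-25 05:10:26 INFO (/tmp/workspace/20/0) ExampleLogger(run):42 - starting sync",
    # The line from the report, truncated exactly as it was quoted there.
    '2020-11-25 05:10:27 ERROR (/tmp/workspace/20/0) DefaultAirbyteStreamFactory(internalLog):108 - CRITICAL {"message":"Must have push access to view repository',
])

for when, message in find_problems(sample):
    print(when, message)
```

A check like this would let a sync that reports "completed" still show the user the error lines it produced, instead of requiring a search through the Compose logs.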

@sherifnada
Contributor

All the sub-issues filed here have been fixed.
