Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote Log Storage (take 2) #1137

Merged
merged 3 commits into from
Mar 17, 2016
Merged

Remote Log Storage (take 2) #1137

merged 3 commits into from
Mar 17, 2016

Conversation

jlowin
Copy link
Member

@jlowin jlowin commented Mar 8, 2016

Replaces #1125. Depends on the GCSHook in #1119.

This extends the remote log storage to include Google Cloud Storage in addition to S3. It uses either of the two GCP packages, as available (see discussion in #1109, #1118, #1119).

In addition, per @criccomini's suggestion, it refactors the existing (and new) functionality to use hooks. This allows more fine-grained control over the log writing behavior/access.

@criccomini
Copy link
Contributor

Will have a look tomorrow.

# send log to S3
encrypt = configuration.get('core', 'ENCRYPT_S3_LOGS')
s3_key.set_contents_from_string(new_log, encrypt_key=encrypt)
store_s3_log(log, remote_log_location)
except:
print('Could not send logs to S3.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to do logging.error() instead of prints for these?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to follow the model elsewhere in cli.py -- print instead of logging. I thought maybe it was because the logs are being redirected for the task, so if there's an error you wouldn't actually see it unless you could load the log.

@criccomini
Copy link
Contributor

@jlowin let me know when you're ready for another pass

@jlowin
Copy link
Member Author

jlowin commented Mar 8, 2016

@criccomini I like the idea of factoring out the function -- I will try to get to it tomorrow.

@criccomini
Copy link
Contributor

@jlowin just checking in

@jlowin
Copy link
Member Author

jlowin commented Mar 12, 2016

@criccomini just committed a big refactor

@criccomini
Copy link
Contributor

Haven't forgotten about this. Just a bit back-logged at the moment. Sorry.

@criccomini
Copy link
Contributor

@jlowin can you squash commit?

Also, trying to test locally. How do I install the gcp_api package? My Python foo is lacking. I did a pip install -r requirements.txt and a python setup.py develop.

@jlowin
Copy link
Member Author

jlowin commented Mar 17, 2016

If you first cd to the airflow git repo, it should be pip install ".[gcp_api]" (with quotes) -- or potentially pip install -e ".[gcp_api]" if you want to leave the files where they are.

squashed:
update configuration

more descriptive comment

split remote log uploads into helper functions for S3 and GCS

read logs from s3

read logs from GCS

keep old_log as string

change name to log_base

better logging

overwrite in GCS

use current configuration var

objects could be none; don't check if they exist with method

allow s3 encryption from hook

fix capitalization typo

replace string search with indexing

add param docs

refactor remote log read/write into utility classes
@jlowin
Copy link
Member Author

jlowin commented Mar 17, 2016

I had some trouble loading the gcp_api hook (maybe because my setup is geared around gcloud -- I'm not sure!) so I really appreciate anything you can do to kick the tires with that package :)

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.68% when pulling 274d4e4 on jlowin:rls3 into b95559a on airbnb:master.

@criccomini
Copy link
Contributor

Awesome, thanks.

@criccomini
Copy link
Contributor

Was able to install. Ran some CLI-based tests:

>>> from oauth2client.client import SignedJwtAssertionCredentials, GoogleCredentials

This used to fail. Will run a few more checks in the morning, then merge. Thanks for the squash.

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.61% when pulling b91f6ba on jlowin:rls3 into b95559a on airbnb:master.

@criccomini
Copy link
Contributor

Based on testing, this needs to be: google-api-python-client<=1.4.2 (for gcp_api)

@criccomini
Copy link
Contributor

Other than that, looks good to me. If you fix version, I can merge.

@jlowin
Copy link
Member Author

jlowin commented Mar 17, 2016

@criccomini thanks, just added that cap

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.61% when pulling e2336fa on jlowin:rls3 into b95559a on airbnb:master.

criccomini added a commit that referenced this pull request Mar 17, 2016
Remote Log Storage (take 2)
@criccomini criccomini merged commit 4c73677 into apache:master Mar 17, 2016
@criccomini
Copy link
Contributor

Merged, thanks! Really looking forward to using this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants