S3 Search Improvements #1391

ssalinas · 2017-01-04T19:24:37Z

Opening this early for any feedback.

Currently, all data about which files to upload to S3, along with any bucket or key pattern overrides, is stored in the executor configuration. Unfortunately, this means SingularityService has no idea whether it needs to search a separate bucket/pattern to find logs. It can only search what it knows about from its defaults and the request group overrides.

Notable changes so far:

S3 file and file name related configuration is moved from the executor to the service. This way it is more easily available for searching and/or UI use.
loggingS3Bucket is removed from ExecutorData on the deploy. We have overrides by group and by file name for this purpose. If we want a request-level override separate from the group one, we can add that later.
The executor data passed to the custom executor is now the SingularityTaskExecutorData object, not the ExecutorData from the deploy. This way we can pass additional information to the custom executor from system-wide configuration.
S3LogResource will check the now-available S3 additional files to determine whether additional buckets need to be searched.
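
As a rough illustration of the last point, here is a minimal sketch of how the set of buckets to search could be assembled once the additional-file configuration lives on the service side. The `AdditionalFile` and `BucketsToSearch` classes, their fields, and the bucket names are hypothetical stand-ins for this example, not the actual Singularity classes or configuration keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for one "additional file" entry in the service-side S3 configuration.
final class AdditionalFile {
    final String filename;
    final Optional<String> bucketOverride;

    AdditionalFile(String filename, Optional<String> bucketOverride) {
        this.filename = filename;
        this.bucketOverride = bucketOverride;
    }
}

// Hypothetical helper: decides which buckets a log search has to look in.
final class BucketsToSearch {
    static List<String> collect(String defaultBucket,
                                Map<String, String> groupBucketOverrides,
                                Optional<String> requestGroup,
                                List<AdditionalFile> additionalFiles) {
        // LinkedHashSet removes duplicates while keeping a stable search order.
        Set<String> buckets = new LinkedHashSet<>();

        // The group override wins over the default when the request belongs to a
        // group that has one; otherwise fall back to the default bucket.
        buckets.add(requestGroup.map(groupBucketOverrides::get).orElse(defaultBucket));

        // Any per-file bucket override adds one more bucket to search.
        for (AdditionalFile file : additionalFiles) {
            file.bucketOverride.ifPresent(buckets::add);
        }
        return new ArrayList<>(buckets);
    }
}
```

With a default bucket, one group override, and two additional files sharing the same override bucket, the helper returns two buckets: the group's bucket followed by the shared override bucket.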

Still TODO:

Optionally skip get/download url generation to improve efficiency of the logs endpoints
Generate proper prefixes for key pattern overrides
Add request group as possible param in the uploader key pattern
Paginate the s3 search endpoint
Docs for config changes
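
For the pagination item, the sketch below shows the general shape of continuation-token paging against an in-memory list of keys. It is an illustration under stated assumptions only: `Page`, `PagedListing`, and the token format (the last key returned) are invented for this example, whereas real S3 continuation tokens are opaque strings issued by the service. The loop ends on the truncated flag rather than on a missing token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical result type: one page of keys plus what is needed to fetch the next page.
final class Page {
    final List<String> keys;
    final boolean truncated;
    final Optional<String> continuationToken;

    Page(List<String> keys, boolean truncated, Optional<String> continuationToken) {
        this.keys = keys;
        this.truncated = truncated;
        this.continuationToken = continuationToken;
    }
}

// In-memory simulation of a paged prefix listing; sortedKeys must be in ascending order.
final class PagedListing {
    static Page list(List<String> sortedKeys, String prefix, int maxPerPage, Optional<String> token) {
        if (maxPerPage < 1) {
            throw new IllegalArgumentException("maxPerPage must be at least 1");
        }
        List<String> keys = new ArrayList<>();
        boolean truncated = false;
        for (String key : sortedKeys) {
            if (!key.startsWith(prefix)) {
                continue;
            }
            // Assumption for this sketch: the token is simply the last key already returned.
            if (token.isPresent() && key.compareTo(token.get()) <= 0) {
                continue;
            }
            if (keys.size() == maxPerPage) {
                truncated = true; // at least one more matching key exists
                break;
            }
            keys.add(key);
        }
        Optional<String> next = truncated
            ? Optional.of(keys.get(keys.size() - 1))
            : Optional.<String>empty();
        return new Page(keys, truncated, next);
    }

    // Follows tokens until a page reports that it is not truncated.
    static List<String> listAll(List<String> sortedKeys, String prefix, int maxPerPage) {
        List<String> all = new ArrayList<>();
        Optional<String> token = Optional.empty();
        while (true) {
            Page page = list(sortedKeys, prefix, maxPerPage, token);
            all.addAll(page.keys);
            if (!page.truncated) {
                break;
            }
            token = page.continuationToken;
        }
        return all;
    }
}
```

A caller that wants a single page passes the token from the previous response back in; a caller that wants everything can use `listAll`.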

Note - SingularityService needs to be deployed before the executor for this particular change

/cc @tpetr

ssalinas · 2017-01-06T17:05:28Z

FYI, now building this on top of the commits from #1375. It modifies/moves a bunch of the same files, and the merges are a bit horrid otherwise.

jets3t -> aws sdk

@consumes

need @consumes on POST endpoints; more s3 pagination tweaks; more attempts at better pagination; need >= here; fix maxPerPage; more robust search options; add missing file; fix for findbugs; continuation token format needs group too; use isTruncated for ending; re-request to respect page size; revert re-request for page size; missing tokens gives false positive end of content; typo

Implement pagination for S3 endpoint using continuation tokens


ssalinas · 2017-01-31T18:19:57Z

Added the docs for this now, going to merge

wsorenson added 3 commits December 14, 2016 14:04

allows setting storage class at upload time

57f3ffe

allow each additional file to specify s3 storage class

e26e8dc

add debug stmt

391295c

ssalinas force-pushed the s3_rework branch from fd4dad8 to 4690e9e Compare January 4, 2017 20:23

ssalinas mentioned this pull request Jan 4, 2017

Log start and end time attrs for s3 uploads #1383

Merged

ssalinas force-pushed the s3_rework branch from 4690e9e to d601872 Compare January 4, 2017 20:33

ssalinas added the Configuration Changes label Jan 5, 2017

ssalinas force-pushed the s3_rework branch 2 times, most recently from ef89fc8 to 9c6ad9a Compare January 5, 2017 15:40

ssalinas changed the title ~~(WIP) Move S3 file configuration to SingularityService~~ Move S3 file configuration to SingularityService Jan 5, 2017

ssalinas force-pushed the s3_rework branch from 9c6ad9a to 2ea0702 Compare January 6, 2017 16:29

ssalinas added 2 commits January 6, 2017 11:46

fix merge conflicts

bb55ae8

Move S3 file configuration to SingularityService

13d2fe9

ssalinas force-pushed the s3_rework branch from 2ea0702 to 13d2fe9 Compare January 6, 2017 17:04

ssalinas added the hs_staging label Jan 6, 2017

jets3t -> aws sdk

84e112b

ssalinas changed the title ~~Move S3 file configuration to SingularityService~~ S3 Search Improvements Jan 9, 2017

ssalinas and others added 2 commits January 9, 2017 09:45

use int here

fcde16e

Merge pull request #1394 from HubSpot/jets_to_aws

9db459b

jets3t -> aws sdk

ssalinas added the hs_qa label Jan 10, 2017

ssalinas and others added 8 commits January 10, 2017 11:24

cleanup needs to know defaults for transition to new format

0c9659a

cleaner check for default service log name

508ee73

more working on getting executor cleanup to run properly

6811cb0

this should be in cleanup configuration now s3

3e0d3cf

add support for extra buckets to search for logs

ace58e7

Merge pull request #1395 from HubSpot/s3_pagination

8242609

Implement pagination for S3 endpoint using continuation tokens

Merge branch 's3_rework' of github.com:HubSpot/Singularity into s3_re…

8d52018

…work

ssalinas added this to the 0.14.0 milestone Jan 11, 2017

ssalinas added 7 commits January 11, 2017 10:41

search the additional buckets when present

8861141

do not search prefixes that will return duplicate results

c04a98e

don't need group here after all

95d40fd

fix prefix trim

ddb6e43

need to use length + startsWith, not contains

9de457d

multipart upload needs key as well

8d157e3

Also add file arg

6fb4c2c
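
The prefix-related commits in this group ("do not search prefixes that will return duplicate results", "need to use length + startsWith, not contains") suggest a check along the following lines. This is one reading of those commit titles, not the code from the PR; `PrefixDedup` and the sample prefixes are hypothetical. The idea is that a prefix is redundant when a shorter prefix already covers it, and that coverage must be tested with `startsWith`, since `contains` would also match a prefix that merely embeds the other one somewhere in the middle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper: drops search prefixes whose results are already covered by a shorter prefix.
final class PrefixDedup {
    static List<String> withoutRedundant(List<String> prefixes) {
        // Remove exact duplicates first, then visit shorter prefixes before longer ones.
        List<String> sorted = new ArrayList<>(new LinkedHashSet<>(prefixes));
        sorted.sort(Comparator.comparingInt(String::length));

        List<String> kept = new ArrayList<>();
        for (String candidate : sorted) {
            boolean covered = false;
            for (String shorter : kept) {
                // startsWith, not contains: only a true leading match means the
                // shorter prefix already returns everything under the candidate.
                if (candidate.startsWith(shorter)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

Given `my-request/2017/01`, `my-request/2017`, and `other/my-request/2017`, the first is dropped because the second covers it, while the third is kept even though it contains the second as a substring.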

ssalinas added the hs_stable label Jan 18, 2017

ssalinas added 3 commits January 20, 2017 16:43

Set metadata here as well

5b23fe9

user metadata is separate

d426dea

add docs for s3 changes

5e6d768

ssalinas merged commit cfbab9c into master Jan 31, 2017

ssalinas deleted the s3_rework branch January 31, 2017 18:20

ssalinas removed hs_qa labels Jan 31, 2017

ssalinas mentioned this pull request Jan 31, 2017

Add ability to upload files immediately to S3 #1399

Merged
