Feature: Promtail, scrape logs from Object store #2270
Conversation
```go
func (r *objectReader) getReader(reader io.ReadCloser) (*bufio.Reader, error) {
	// identify the file type
	buf := [3]byte{}
	n, err := io.ReadAtLeast(reader, buf[:], len(buf))
	if err != nil {
		return nil, err
	}

	rd := io.MultiReader(bytes.NewReader(buf[:n]), reader)
	if isGzip(buf) {
		r, err := gzip.NewReader(rd)
		if err != nil {
			return nil, err
		}
		rd = r
	}

	bufReader := bufio.NewReader(rd)
	return bufReader, nil
}
```
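The hunk calls an `isGzip` helper that isn't shown here. A minimal sketch of what such a check could look like, assuming it only tests the gzip magic bytes (the PR's actual implementation may differ):

```go
// Hypothetical sketch of the isGzip helper referenced above. A gzip
// stream starts with the magic bytes 0x1f 0x8b, followed by the
// compression method (8 = DEFLATE).
func isGzip(buf [3]byte) bool {
	return buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 8
}
```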
@cyriltovena I need some help here. I am not sure if I am doing this the right way.
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #2270      +/-   ##
==========================================
+ Coverage   61.65%   61.67%   +0.02%
==========================================
  Files         160      163       +3
  Lines       13565    13836     +271
==========================================
+ Hits         8363     8533     +170
- Misses       4579     4658      +79
- Partials      623      645      +22
```
pkg/promtail/targets/awss3target.go
Outdated
```go
// NewS3Target creates a new S3Target.
func NewS3Target(logger log.Logger, handler api.EntryHandler, positions positions.Positions, jobName string, objectClient chunk.ObjectClient, s3Config *scrape.S3Targetconfig) (*S3Target, error) {
	// object store sync period should not be less than 1 minute
```
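The hunk is truncated here, but the comment implies the constructor clamps the sync period to a 1-minute floor. A hedged sketch of such a guard; the field and function names are assumptions, not taken from the PR:

```go
import (
	"time"

	"github.com/go-kit/kit/log"
	"github.com/go-kit/kit/log/level"
)

// clampSyncPeriod is a hypothetical guard implied by the comment above;
// the real names in the PR may differ. It enforces a 1-minute minimum
// on the object store sync period.
func clampSyncPeriod(logger log.Logger, period time.Duration) time.Duration {
	const minSyncPeriod = time.Minute
	if period < minSyncPeriod {
		level.Warn(logger).Log("msg", "s3 sync period below minimum, clamping", "min", minSyncPeriod)
		return minSyncPeriod
	}
	return period
}
```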
Left some code comments :)
😃 will fix
pkg/promtail/targets/awss3target.go
Outdated
```go
for {
	select {
	case <-ticker.C:
		objects, _, err := t.objectClient.List(context.Background(), t.prefix)
```
I'm afraid this will hang on large buckets, but I think that's fine for an initial feature.
Yep, you are right. We have to add multi-part download for larger files. Added a note here: https://github.com/grafana/loki/blob/ab130b09eb965c3baa8f94d02c63b15704b1c300/docs/clients/promtail/configuration.md#s3_config
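For what it's worth, one way to keep a huge bucket from blocking the sync loop indefinitely would be to bound the `List` call with a timeout. A sketch extending the hunk above; the 30-second value is purely illustrative, not from the PR:

```go
case <-ticker.C:
	// Hypothetical variant: bound the List call so a very large bucket
	// can't block the sync loop forever.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	objects, _, err := t.objectClient.List(ctx, t.prefix)
	cancel()
	if err != nil {
		continue // retry on the next tick
	}
	// ... process objects as before ...
```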
pkg/promtail/targets/awss3target.go
Outdated
```go
}

// skip any object which is already read completely
if modifiedAt == object.ModifiedAt.UnixNano() && pos > 0 && pos == size {
```
Do you need the `pos > 0` check? It seems unnecessary.
There is a chance that Promtail shuts down without reading a single line. On restart, `pos` and `size` will be 0, so checking only `pos == size` would skip the object.
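A sketch spelling out the skip condition under discussion; variable names follow the hunk above:

```go
// Without pos > 0, a fresh positions file (pos == 0, recorded size 0)
// would make pos == size true and wrongly skip an object that was never
// read. The extra guard restricts the skip to objects read to the end.
unchanged := modifiedAt == object.ModifiedAt.UnixNano()
fullyRead := pos > 0 && pos == size
if unchanged && fullyRead {
	continue // already scraped this object completely
}
```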
pkg/promtail/targets/awss3target.go
Outdated
```go
)

var (
	s3ReadBytes = promauto.NewGaugeVec(prometheus.GaugeOpts{
```
I think we should avoid using `path` for these metrics; we'd be creating metrics in proportion to our stream cardinality here.
`path` here means the object name. What do you suggest? Just use the bucket name instead of each individual object?
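A sketch of the lower-cardinality option being floated, labeling the gauge by bucket rather than by per-object path; the metric name and namespace are assumptions:

```go
import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical lower-cardinality variant: one series per bucket instead
// of one per object, so cardinality stays bounded no matter how many
// objects are scraped.
var s3ReadBytes = promauto.NewGaugeVec(prometheus.GaugeOpts{
	Namespace: "promtail",
	Name:      "s3_read_bytes_total",
	Help:      "Number of bytes read from S3 objects.",
}, []string{"bucket"})
```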
Very cool first draft!
I'm worried about the metric cardinality (more in comments) and that Promtail would try to list entire S3 buckets every sync period. I suspect this will work as long as users configure low per-bucket retention and don't have lots of throughput in said bucket.
Another way to ease this could be an optional flag for deleting S3 objects once read. WDYT?
I'll defer to @cyriltovena's judgment for your compression question.
Yep, you are correct. This is just a very simple way of fetching and scraping logs. In future versions we want to support SQS integration: with SQS we don't have to list the objects in the bucket, we just look up the objects we want (see the sketch below). Mentioned the same in the docs as well.
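For context, the SQS-driven flow mentioned here (not implemented in this PR) would look roughly like the following with aws-sdk-go; the queue URL and polling values are assumptions:

```go
import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sqs"
)

// receiveS3Events is a hypothetical sketch of the future SQS integration:
// instead of listing the bucket, consume S3 event notifications that name
// exactly the objects to fetch.
func receiveS3Events(queueURL string) (*sqs.ReceiveMessageOutput, error) {
	sess := session.Must(session.NewSession())
	svc := sqs.New(sess)
	return svc.ReceiveMessage(&sqs.ReceiveMessageInput{
		QueueUrl:            aws.String(queueURL),
		MaxNumberOfMessages: aws.Int64(10),
		WaitTimeSeconds:     aws.Int64(20), // long polling
	})
}
```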
Force-pushed from fbc1093 to 0a2b793
```go
BucketNames:      cfg.S3Config.BucketName,
Endpoint:         cfg.S3Config.Endpoint,
Region:           cfg.S3Config.Region,
AccessKeyID:      cfg.S3Config.AccessKeyID,
SecretAccessKey:  cfg.S3Config.SecretAccessKey,
Insecure:         cfg.S3Config.Insecure,
S3ForcePathStyle: cfg.S3Config.S3ForcePathStyle,
HTTPConfig:       awsHTTPConfig,
```
Dependent on #2318
Force-pushed from 0a2b793 to 0d8dd14
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Any update on this? I'd be happy to contribute if I can be of help.
Would love to help out with getting this PR across. There is a use case where we need it too.
I'd like to hear more about the deployment model of Promtail in this case.
It doesn't seem that you can scale them, because the positions file is local.
I also think the documentation is a bit light on how to configure SQS; at least a reference on how to activate SQS notifications for S3 updates seems required.
You need some sort of disclaimer like the following, and to explain how:
This target uses an SQS queue in the same region as the S3 bucket. We must set up the SQS queue and S3 event notification before using this target.
1. Create a new SQS queue (in the same region as S3)
2. Set proper permissions on the new queue
3. [Configure S3 event notification](http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html)
4. Start a Promtail instance
I think the code is almost there, but a small effort on the documentation is required.
Lastly, I'd like to see if there's a way to add discovered labels (`__XXX`) from file attributes/metadata? That could be done in another PR though.
@cyriltovena Will work on the documentation. Regarding scaling Promtail, just pasting my comment from above again: to scale to process more files, running multiple Promtail instances with the same SQS queue would be tricky. One solution would be to create multiple SQS queues based on some pattern, e.g. bucket/a/file.gz goes to queue A and bucket/b/file.gz goes to queue B. Then multiple Promtail instances can be configured with different queues (A, B, etc.) — sketched below.
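A hedged sketch of that prefix-to-queue routing; the queue URLs are made up for illustration:

```go
import "strings"

// Hypothetical prefix-to-queue routing for the sharding idea above:
// notifications for bucket/a/... go to queue A, bucket/b/... to queue B,
// and each Promtail instance consumes one queue.
var queueForPrefix = map[string]string{
	"a/": "https://sqs.us-east-1.amazonaws.com/123456789012/promtail-a",
	"b/": "https://sqs.us-east-1.amazonaws.com/123456789012/promtail-b",
}

// queueFor picks the SQS queue whose prefix matches the object key, so
// each Promtail instance handles a disjoint subset of objects.
func queueFor(objectKey string) (string, bool) {
	for prefix, queue := range queueForPrefix {
		if strings.HasPrefix(objectKey, prefix) {
			return queue, true
		}
	}
	return "", false
}
```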
Force-pushed from 8f0ed7b to bc2a265
Closing this as there has been no activity for a long time. Feel free to send a new PR if anyone wants to revive the work.
This may be relevant for any repo watchers: #5065
What this PR does / why we need it:
Based on the design doc in #2107, this PR adds a new feature to Promtail where users can scrape logs from an object store. This version specifically supports just AWS S3. Also, this PR covers only the items mentioned as part of the first iteration.
Which issue(s) this PR fixes:
Fixes #2045
Checklist