Feature: Promtail, scrape logs from Object store #2270
Conversation
```go
func (r *objectReader) getReader(reader io.ReadCloser) (*bufio.Reader, error) {
	// identify the file type
	buf := [3]byte{}
	n, err := io.ReadAtLeast(reader, buf[:], len(buf))
	if err != nil {
		return nil, err
	}

	rd := io.MultiReader(bytes.NewReader(buf[:n]), reader)
	if isGzip(buf) {
		r, err := gzip.NewReader(rd)
		if err != nil {
			return nil, err
		}
		rd = r
	}

	bufReader := bufio.NewReader(rd)
	return bufReader, nil
}
```
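The hunk calls an `isGzip` helper that isn't shown here. A minimal sketch of what such a check could look like, assuming it only tests the gzip magic bytes (the PR's actual implementation may differ):

```go
// Hypothetical sketch of the isGzip helper referenced above. A gzip
// stream starts with the magic bytes 0x1f 0x8b, followed by the
// compression method (8 = DEFLATE).
func isGzip(buf [3]byte) bool {
	return buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 8
}
```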
@cyriltovena I need some help here. I am not sure if I am doing this the right way.
Codecov Report

```diff
@@            Coverage Diff             @@
##           master    #2270      +/-   ##
==========================================
+ Coverage   61.65%   61.67%   +0.02%
==========================================
  Files         160      163       +3
  Lines       13565    13836     +271
==========================================
+ Hits         8363     8533     +170
- Misses       4579     4658      +79
- Partials      623      645      +22
```
pkg/promtail/targets/awss3target.go
Outdated
```go
// NewS3Target creates a new S3Target.
func NewS3Target(logger log.Logger, handler api.EntryHandler, positions positions.Positions, jobName string, objectClient chunk.ObjectClient, s3Config *scrape.S3Targetconfig) (*S3Target, error) {
	// object store sync period should not be less than 1 minute
```
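The hunk is truncated here, but the comment implies the constructor clamps the sync period to a 1-minute floor. A hedged sketch of such a guard; the field and function names are assumptions, not taken from the PR:

```go
import (
	"time"

	"github.com/go-kit/kit/log"
	"github.com/go-kit/kit/log/level"
)

// clampSyncPeriod is a hypothetical guard implied by the comment above;
// the real names in the PR may differ. It enforces a 1-minute minimum
// on the object store sync period.
func clampSyncPeriod(logger log.Logger, period time.Duration) time.Duration {
	const minSyncPeriod = time.Minute
	if period < minSyncPeriod {
		level.Warn(logger).Log("msg", "s3 sync period below minimum, clamping", "min", minSyncPeriod)
		return minSyncPeriod
	}
	return period
}
```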
Left some code comments :)
😃 will fix
pkg/promtail/targets/awss3target.go
Outdated
```go
for {
	select {
	case <-ticker.C:
		objects, _, err := t.objectClient.List(context.Background(), t.prefix)
```
I'm afraid this will hang on large buckets, but I think that's fine for an initial feature.
Yep, you are right. We have to add multi-part download for larger files. Added a note here: https://github.com/grafana/loki/blob/ab130b09eb965c3baa8f94d02c63b15704b1c300/docs/clients/promtail/configuration.md#s3_config
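For what it's worth, one way to keep a huge bucket from blocking the sync loop indefinitely would be to bound the `List` call with a timeout. A sketch extending the hunk above; the 30-second value is purely illustrative, not from the PR:

```go
case <-ticker.C:
	// Hypothetical variant: bound the List call so a very large bucket
	// can't block the sync loop forever.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	objects, _, err := t.objectClient.List(ctx, t.prefix)
	cancel()
	if err != nil {
		continue // retry on the next tick
	}
	// ... process objects as before ...
```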
pkg/promtail/targets/awss3target.go
Outdated
```go
}

// skip any object which is already read completely
if modifiedAt == object.ModifiedAt.UnixNano() && pos > 0 && pos == size {
```
Do you need the `pos > 0` check? It seems unnecessary.
There is a chance that Promtail shuts down without reading a single line. On restart, `pos` and `size` will be 0, so checking only `pos == size` would skip the object.
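A sketch spelling out the skip condition under discussion; variable names follow the hunk above:

```go
// Without pos > 0, a fresh positions file (pos == 0, recorded size 0)
// would make pos == size true and wrongly skip an object that was never
// read. The extra guard restricts the skip to objects read to the end.
unchanged := modifiedAt == object.ModifiedAt.UnixNano()
fullyRead := pos > 0 && pos == size
if unchanged && fullyRead {
	continue // already scraped this object completely
}
```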
pkg/promtail/targets/awss3target.go
Outdated
```go
)

var (
	s3ReadBytes = promauto.NewGaugeVec(prometheus.GaugeOpts{
```
I think we should avoid using `path` for these metrics; we'd be creating metrics in proportion to our stream cardinality here.
`path` here means the object name. What do you suggest? Just use the bucket name instead of each individual object?
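A sketch of the lower-cardinality option being floated, labeling the gauge by bucket rather than by per-object path; the metric name and namespace are assumptions:

```go
import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical lower-cardinality variant: one series per bucket instead
// of one per object, so cardinality stays bounded no matter how many
// objects are scraped.
var s3ReadBytes = promauto.NewGaugeVec(prometheus.GaugeOpts{
	Namespace: "promtail",
	Name:      "s3_read_bytes_total",
	Help:      "Number of bytes read from S3 objects.",
}, []string{"bucket"})
```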
Very cool first draft!
I'm worried about the metric cardinality (more in comments) and that Promtail would try to list entire S3 buckets every sync period. I suspect this will work as long as users configure low per-bucket retention and don't have lots of throughput in said bucket.
Another way to ease this could be an optional flag for deleting S3 objects once read. WDYT?
I'll defer to @cyriltovena's judgment for your compression question.
Yep, you are correct. This is just a very simple way of fetching and scraping logs. In future versions we want to support SQS integration: with SQS we don't have to list the objects in the bucket, we just look up the objects we want (see the sketch below). Mentioned the same in the docs as well.
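For context, the SQS-driven flow mentioned here (not implemented in this PR) would look roughly like the following with aws-sdk-go; the queue URL and polling values are assumptions:

```go
import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sqs"
)

// receiveS3Events is a hypothetical sketch of the future SQS integration:
// instead of listing the bucket, consume S3 event notifications that name
// exactly the objects to fetch.
func receiveS3Events(queueURL string) (*sqs.ReceiveMessageOutput, error) {
	sess := session.Must(session.NewSession())
	svc := sqs.New(sess)
	return svc.ReceiveMessage(&sqs.ReceiveMessageInput{
		QueueUrl:            aws.String(queueURL),
		MaxNumberOfMessages: aws.Int64(10),
		WaitTimeSeconds:     aws.Int64(20), // long polling
	})
}
```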
Force-pushed from fbc1093 to 0a2b793
```go
BucketNames:      cfg.S3Config.BucketName,
Endpoint:         cfg.S3Config.Endpoint,
Region:           cfg.S3Config.Region,
AccessKeyID:      cfg.S3Config.AccessKeyID,
SecretAccessKey:  cfg.S3Config.SecretAccessKey,
Insecure:         cfg.S3Config.Insecure,
S3ForcePathStyle: cfg.S3Config.S3ForcePathStyle,
HTTPConfig:       awsHTTPConfig,
```
Dependent on #2318
Force-pushed from 0a2b793 to 0d8dd14
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Any update on this? I'd be happy to contribute if I can be of help.
Would love to help out with getting this PR across. There is a use case where we need it too.
I'd like to hear more about the deployment model of Promtail in this case.
It doesn't seem that you can scale them, because the positions file is local.
I also think the documentation is a bit light on how to configure SQS; at least a reference on how to activate SQS notifications for S3 updates seems required.
You need some sort of disclaimer like the following, and to explain how:
This target uses an SQS queue in the same region as the S3 bucket. We must set up the SQS queue and S3 event notification before using this target.
1. Create a new SQS queue (in the same region as S3)
2. Set proper permissions on the new queue
3. [Configure S3 event notification](http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html)
4. Start a Promtail instance
I think the code is almost there, but a small effort on the documentation is required.
Lastly, I'd like to see if there's a way to add discovered labels (`__XXX`) from file attributes/metadata? That could be done in another PR though.
@cyriltovena Will work on the documentation. Regarding scaling Promtail, just pasting my comment from above again: to scale to process more files, running multiple Promtail instances with the same SQS queue would be tricky. One solution would be to create multiple SQS queues based on some pattern, e.g. bucket/a/file.gz goes to queue A and bucket/b/file.gz goes to queue B. Then multiple Promtail instances can be configured with different queues (A, B, etc.) — sketched below.
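A hedged sketch of that prefix-to-queue routing; the queue URLs are made up for illustration:

```go
import "strings"

// Hypothetical prefix-to-queue routing for the sharding idea above:
// notifications for bucket/a/... go to queue A, bucket/b/... to queue B,
// and each Promtail instance consumes one queue.
var queueForPrefix = map[string]string{
	"a/": "https://sqs.us-east-1.amazonaws.com/123456789012/promtail-a",
	"b/": "https://sqs.us-east-1.amazonaws.com/123456789012/promtail-b",
}

// queueFor picks the SQS queue whose prefix matches the object key, so
// each Promtail instance handles a disjoint subset of objects.
func queueFor(objectKey string) (string, bool) {
	for prefix, queue := range queueForPrefix {
		if strings.HasPrefix(objectKey, prefix) {
			return queue, true
		}
	}
	return "", false
}
```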
Force-pushed from 8f0ed7b to bc2a265
Closing this as there has been no activity for a long time. Feel free to send a new PR if anyone wants to revive the work.
This may be relevant for any repo watchers: #5065
What this PR does / why we need it:
Based on the design doc in #2107, this PR adds a new feature to Promtail where users can scrape logs from an object store. This version specifically supports just AWS S3. Also, this PR covers only the items mentioned as part of the first iteration.
Which issue(s) this PR fixes:
Fixes #2045
Checklist