
Query: Cannot query from Object storage #4466

Closed
btdan opened this issue Jul 22, 2021 · 15 comments

Comments

@btdan

btdan commented Jul 22, 2021

I run with the command as follows:

nohup thanos receive \
  --tsdb.path "/aiops/prometheusData/metricsData" \
  --grpc-address 0.0.0.0:10907 \
  --http-address 0.0.0.0:10909 \
  --receive.replication-factor 1 \
  --label 'receive_replica="0"' \
  --label 'receive_cluster="aiopssection"' \
  --receive.local-endpoint 10.0.90.202:10907 \
  --receive.hashrings-file /aiops/hashring.json \
  --remote-write.address 0.0.0.0:10908 \
  --objstore.config-file "/aiops/bucket.yml" > /thanos_receive.log 2>&1 &

nohup thanos query --http-address 0.0.0.0:19192 --grpc-address 0.0.0.0:19193 --store 10.0.90.202:10907 --store 10.0.90.202:19090 > /thanos_query.log 2>&1 &

nohup thanos store --data-dir /aiops/thanosStoreGateway --objstore.config-file /aiops/bucket.yml --http-address 0.0.0.0:19191 --grpc-address 0.0.0.0:19090 > /thanos_store.log 2>&1 &

But I find that the query result on the web UI at http://10.0.90.202:19192 only includes the data from the local disk of the thanos receive instance. That is to say, the data in object storage is not queried.
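
For reference, one quick way to confirm which store APIs the querier actually sees is to query its HTTP endpoints directly. This is only a sketch based on the addresses in the commands above; the ports may differ in your setup.

curl -s http://10.0.90.202:19191/-/ready        # store gateway readiness
curl -s http://10.0.90.202:10909/-/ready        # receiver readiness
curl -s http://10.0.90.202:19192/api/v1/stores  # store APIs registered with the querier, with their health and time ranges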

I read the log file thanos_query.log and found many records like this:
level=warn ts=2021-07-21T07:12:35.910427898Z caller=proxy.go:450 component=proxy request="min_time:1624258500000 max_time:1626850800000 matchers:<type:RE name:"instance" value:"10\.0\.90\.210:9100" > matchers:<name:"__name__" value:"node_memory_MemAvailable_bytes" > aggregates:COUNT aggregates:SUM " err="receive series from Addr: 10.0.90.202:19090 LabelSets: {receive_cluster="aiopssection", receive_replica="0", tenant_id="default-tenant"} Mint: 1621123202783 Maxt: 1626847200000: rpc error: code = Aborted desc = fetch series for block 01F9J4Q0B5WYV9SH6KCP2WN3AX: load chunks: get range reader: Get \"http://10.0.90.203/api/v1/obj/aiopssection/01F9J4Q0B5WYV9SH6KCP2WN3AX/chunks/000001/?offset=51277073&size=16806\": dial tcp 10.0.90.203:80: socket: too many open files" msg="returning partial response"

We can see there is an error in the log. Can anyone help me solve this problem? Thanks very much.

@bill3tt
Contributor

bill3tt commented Jul 26, 2021

socket: too many open files - sounds like your local machine has run out of file descriptors.
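
A minimal way to verify that against the running processes (the pgrep patterns are just examples matching the commands above):

pid=$(pgrep -f 'thanos store')            # or 'thanos receive' / 'thanos query'
grep 'Max open files' /proc/$pid/limits   # effective soft/hard limit of the running process
ls /proc/$pid/fd | wc -l                  # descriptors currently open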

@btdan
Author

btdan commented Jul 27, 2021

Thanks very much for your reply.
I have changed the open files limit. The output of the "ulimit -a" command is as follows:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31661
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31661
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

@btdan
Author

btdan commented Jul 27, 2021

By the way, I am running in a virtual machine. The operating system is CentOS.
Thanks very much.
Best wishes.

@bill3tt
Contributor

bill3tt commented Jul 27, 2021

This has been raised before: #3047, #3313.

A quick google leads me to this stack overflow question that contains some helpful hints that look relevant to your setup.
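
For a non-systemd setup like the nohup commands above, the usual CentOS-style fix looks roughly like this (illustrative values, not taken from the linked answer):

# as root: raise the per-user open-file limit (applied by PAM at the next login)
cat >> /etc/security/limits.conf <<'EOF'
*    soft    nofile    1048576
*    hard    nofile    1048576
EOF

# and make sure the kernel-wide file handle limit is at least as high
sysctl -w fs.file-max=1048576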

I think this can be safely closed.

@btdan
Author

btdan commented Jul 29, 2021

Hi, thanks very much for your reply.
I don't think #3047 contains an ideal resolution to this problem.
As for the method mentioned in #3313, I followed the post and modified /etc/security/limits.conf and other configuration, but it did not help.
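
One thing worth checking here: limits.conf is only applied at login, so processes that were already running, or that were started with nohup from an older shell session, keep their old limit. A sketch of how to inspect and raise the limit of a live process (prlimit ships with util-linux; the pgrep pattern is illustrative):

pid=$(pgrep -f 'thanos store')
cat /proc/$pid/limits                          # shows the limits the process is actually running with
prlimit --pid $pid --nofile=1048576:1048576    # raise soft:hard without restarting

Alternatively, restart the Thanos processes from a shell where ulimit -n has already been raised, so they inherit the new limit.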

@ahurtaud
Contributor

ahurtaud commented Aug 5, 2021

I am facing the same issue with the new version v0.22.0, whereas at the exact same moment v0.21.1 is working fine, so I fear a bug was introduced in the latest version.
It also looks like this is happening only on Azure Blob Storage, where the release notes state a refactor of the Azure API.

@btdan

  • Which Thanos version are you using? Can you try with v0.21.1?
  • Which object storage provider are you using?

On my side, even a thanos tools bucket inspect is failing, as are the compactor and stores.

./Downloads/thanos-0.21.1.darwin-amd64/thanos tools bucket inspect --objstore.config-file=/Users/ahurtaud/Downloads/thanos/bucket.yaml --timeout=20m

level=info ts=2021-08-05T13:37:15.377671Z caller=factory.go:46 msg="loading bucket configuration"
level=info ts=2021-08-05T13:37:19.41476Z caller=fetcher.go:476 component=block.BaseFetcher msg="successfully synchronized block metadata" duration=3.424255479s cached=702 returned=702 partial=0
[...]
./Downloads/thanos-0.22.0.darwin-amd64/thanos tools bucket inspect --objstore.config-file=/Users/ahurtaud/Downloads/thanos/bucket.yaml --timeout=20m

level=info ts=2021-08-05T13:10:33.506004Z caller=factory.go:46 msg="loading bucket configuration"
level=error ts=2021-08-05T13:21:05.480248Z caller=main.go:130 err="307 errors: meta.json file exists: 01F8Y8NRW38HNQ69HCWYYA3A6M/meta.json: cannot get properties for Azure blob, address: 01F8Y8NRW38HNQ69HCWYYA3A6M/meta.json: Head \"https://<redacted>/01F8Y8NRW38HNQ69HCWYYA3A6M/meta.json?timeout=1162\": dial tcp: lookup <redacted> on 172.16.107.37:53: dial udp 172.16.107.37:53: socket: too many open files; get meta file: 01F5J3YERX9VZH6SDWZJ5AAME1/meta.json: cannot get properties for container: 01F5J3YERX9VZH6SDWZJ5AAME1/meta.json: Head \"https://<redacted>/01F5J3YERX9VZH6SDWZJ5AAME1/meta.json?timeout=1161\": dial tcp: lookup <redacted> on 172.16.107.37:53: dial udp 172.16.107.37:53: socket: too many open files [...]

@bill3tt
Contributor

bill3tt commented Aug 9, 2021

Thanks for the information @ahurtaud - the 0.22.0 release included a refactor of the Azure object store #3970, which looks like it could be related.

I can't see anything obvious in the upstream Azure blob repo. I wonder whether we are unintentionally setting some parameters that we previously did not set? WDYT @wiardvanrij?

@wiardvanrij
Member

wiardvanrij commented Aug 9, 2021

What we implemented were extra config options that you can alter. Those options are kept at "default", i.e. what Azure would do anyway - see: https://github.com/thanos-io/thanos/pull/3970/files#diff-d22adb0d02d7e04d958ed73bc7759bdcc29707d623a8b8326abcb1f547f8c5f8R29

My intention was to make sure it was not a breaking change, and it has been tested, though only on k8s.

Obviously it could be that I missed something but I would not know what.

@wiardvanrij
Member

#4605 might be related

@stale

stale bot commented Oct 30, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Oct 30, 2021
@ahurtaud
Contributor

ahurtaud commented Nov 17, 2021

This is still valid for me :/ Thanos has a lot of issues with our Azure Blob Storage compared to before 0.22.

stale bot removed the stale label Nov 17, 2021
@wiardvanrij
Member

This is still valid for me :/ Thanos has a lot of issues with our Azure Blob Storage compared to before 0.22.

The fix was made but only made it into 0.23. Could you perhaps try again with that version? Please let me know whether it works for you; otherwise I would love to get some more details.

@ahurtaud
Contributor

ahurtaud commented Dec 3, 2021

No, that's the thing: anything above 0.21.1 makes the compactor very unstable for us:

[Screenshot: compactor metrics, 2021-12-03]
Metrics dropping to zero = pod crashloop = thanos v0.23.1
compaction stable = thanos v0.21.1

@ahurtaud
Contributor

ahurtaud commented Jan 4, 2022

It looks like 0.24.0 is as stable as things were before v0.22.
I think this was a duplicate of what caused #4962, and maybe #4928.

Anyway, I suggest updating to v0.24.0, which seems to fix everything we had!
Agreed to close on my side.

@wiardvanrij
Member

It looks like 0.24.0 is as stable as things were before v0.22. I think this was a duplicate of what caused #4962, and maybe #4928.

Anyway, I suggest updating to v0.24.0, which seems to fix everything we had! Agreed to close on my side.

Thanks for the update! I appreciate it, and I am glad it is stable again.
