When using single store (tsdb, boltdb-shipper), indexes and chunks are shipped to object storage but queries are (partly) empty #10529
Comments
We're also experiencing log gaps on the frontend (Loki v2.9.0) despite the logs actually being there. A restart of the loki-backend statefulset brings the logs back, though. |
Hey @bumarcell , can confirm a |
I believe we've experienced this behavior at least twice so far |
I'm tempted to just restart Loki daily via cron, but that shouldn't be needed in a stable application, so I would rather help find the root cause. Are there any additional debug flags I can run to get more useful data about the state of Loki if this happens again? |
My laziness is telling me to wait for the minor updates 😄 |
which hopefully we triggered already 🤞 |
Hi all! Similar problem. I decided to roll back to 2.8.4 |
Hi all, |
Same issue, rolled back to 2.8.4 |
Hello, |
Also using GCP with boltdb-shipper. Upgraded last night to 2.9 and since then my queries only return log lines from BEFORE the upgrade, or logs that are still sitting on my ingesters. Reverting to 2.8.4 has resolved the issue for now and I can see all the logs ingested since last night's upgrade. |
We are investigating this issue, which may have been introduced with #9710 (still needs to be verified). |
Could someone facing this issue post logs (from |
I'm also seeing this issue. Downgrade to 2.8.4 fixed it. I'm running tsdb in single binary mode. Happy to post logs if you can tell me how to find them. |
tsdb + SSD. |
We are hitting this too, running in microservices mode with tsdb. |
Our environment is TSDB + AWS S3. I finally rolled back our Loki from 2.9.0 to 2.8.4 |
As a workaround, you can set the per-tenant setting |
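The exact setting name is cut off above, so the following is only a sketch of how a per-tenant override is applied in general: any limits_config option can be overridden per tenant in the runtime (overrides) file that the main Loki config points at. The tenant name and the chosen limit below are placeholders, not the setting the maintainer is referring to.

# runtime-config.yaml, referenced via runtime_config.file in the main Loki config
overrides:
  "tenant-1":                # placeholder tenant ID
    query_timeout: 300s      # placeholder limit; substitute the setting suggested above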
Can confirm gaps in logs after upgrading to 2.9.0. Restarting Loki filled the gaps again. Downgraded to 2.8.4 until a proper patch is provided. I think the issue should be renamed, because it does not occur on S3 only. |
Agree. Will rename the issue. |
Good news, Loki 2.9.1 has been released. |
Will try it out immediately, thank you so much for looking into it this quickly! |
Doesn't seem to fix the issue. I have quickly rolled out 2.9.1 and again started seeing logs only for the last 30 minutes. Is this working for anyone after the upgrade? |
How long was the 2.9.1 loki instance running? |
I let it run for an hour. |
Running with 2.9.1 for 2 hours now. Until now all fine. |
Running with 2.9.1 (in our dev env) for also roughly 2 hours and it seems to be fine for now. |
total_bytes_structured_metadata=0B this looks off 🤔 |
Running okay since, so we will upgrade production too. |
@isshwar we have the same problem, but it is not an issue of total_bytes_structured_metadata=0B |
I'm seeing a similar issue, and we are running 2.9.0, 2.9.1 and now 2.9.2. Still the same issue. I can't see anything in the logs that indicates any kind of problem. If I port-forward to the loki-write-1/2/3 pods and issue... We have two Loki clusters, and they have different time periods where search results are "missing": one has a gap of about 2-3 hours and the other about ~12 hours. We are using the "grafana/loki" chart. |
@Alexsandr-Random
auth_enabled: false
common:
  compactor_address: 'loki-backend'
  path_prefix: /var/loki
  replication_factor: ${LOKI_REPLICATION_FACTOR}
  storage:
    s3:
      access_key_id: ${S3_ACCESS_KEY_ID}
      bucketnames: ${S3_BUCKET_NAME}-chunks
      endpoint: ${S3_ENDPOINT}
      http_config:
        insecure_skip_verify: true
      insecure: false
      region: ${S3_REGION}
      s3forcepathstyle: true
      secret_access_key: $${q}{S3_SECRET_ACCESS_KEY}
compactor:
  compaction_interval: 30m
  delete_request_cancel_period: 30m
  retention_enabled: true
  shared_store: s3
  working_directory: /var/loki/retention
frontend:
  max_outstanding_per_tenant: 4096
  scheduler_address: query-scheduler-discovery.${K8S_NAMESPACE}.svc.${K8S_CLUSTER_NAME}.local.:9095
frontend_worker:
  scheduler_address: query-scheduler-discovery.${K8S_NAMESPACE}.svc.${K8S_CLUSTER_NAME}.local.:9095
index_gateway:
  mode: ring
limits_config:
  enforce_metric_name: false
  ingestion_burst_size_mb: 30
  ingestion_rate_mb: 20
  max_cache_freshness_per_query: 10m
  max_chunks_per_query: 6000000
  max_entries_limit_per_query: 10000
  max_query_parallelism: 256
  max_query_series: 2000
  max_streams_matchers_per_query: 10000
  per_stream_rate_limit: 20MB
  query_timeout: 300s
  reject_old_samples: false
  reject_old_samples_max_age: 168h
  retention_period: 744h
  split_queries_by_interval: 15m
memberlist:
  join_members:
    - loki-memberlist
querier:
  engine:
    timeout: 300s
  max_concurrent: 2048
query_range:
  align_queries_with_step: true
query_scheduler:
  max_outstanding_requests_per_tenant: 32768
ruler:
  alertmanager_client:
    basic_auth_password: $${q}{ALERTMANAGER_PASSWORD}
    basic_auth_username: ${ALERTMANAGER_USERNAME}
  alertmanager_url: ${ALERTMANAGER_URL}
  enable_alertmanager_v2: true
  enable_sharding: true
  evaluation_interval: ${EVALUATION_INTERVAL}
  remote_write:
    clients:
      prometheus-0:
        basic_auth:
          password: $${q}{PROMETHEUS_PASSWORD}
          username: ${PROMETHEUS_USERNAME}
        name: prom-0
        url: https://prometheus.pageplace.de/api/v1/write
    enabled: true
  storage:
    local:
      directory: /var/ruler
    type: local
  wal:
    dir: /var/loki/ruler-wal
runtime_config:
  file: /etc/loki/runtime-config/runtime-config.yaml
schema_config:
  configs:
    - from: "2022-01-11"
      index:
        period: 24h
        prefix: loki_index_
      object_store: s3
      schema: v12
      store: boltdb-shipper
    - from: "2023-08-09"
      index:
        period: 24h
        prefix: loki_index_
      object_store: s3
      schema: v12
      store: tsdb
server:
  grpc_listen_port: 9095
  grpc_server_max_recv_msg_size: 10485760
  grpc_server_max_send_msg_size: 10485760
  http_listen_port: 3100
  http_server_idle_timeout: 310s
  http_server_read_timeout: 310s
  http_server_write_timeout: 310s
storage_config:
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb/index
    cache_location: /var/loki/boltdb-cache
    shared_store: s3
  hedging:
    at: 250ms
    max_per_second: 20
    up_to: 3
  tsdb_shipper:
    active_index_directory: /var/loki/index
    cache_location: /var/loki/tsdb-cache
    shared_store: s3
|
Oh, @Alexsandr-Random, I get the same issue. When I specify a time range of 12 hours, I can only see logs from the last hour or two. I am using version 2.9.2 with tsdb + s3 as the backend storage. |
My issue is resolved. It was due to a configuration error on my part: I had made these changes to the ingester
What I had not realised was that the querier setting |
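The specific querier setting is cut off above; one well-known interaction of this kind (offered as an assumption about what is meant, not a confirmed reading of the comment) is that the querier only asks ingesters for data newer than query_ingesters_within, so if ingesters are allowed to hold chunks longer than that window, those logs are invisible until they are flushed to object storage. A minimal consistent pairing would look like:

ingester:
  max_chunk_age: 2h            # chunks are force-flushed after at most this long
  chunk_idle_period: 30m
querier:
  query_ingesters_within: 3h   # must cover max_chunk_age, or a gap appears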
This issue is still present, several months in. 2.8.4 works fine, anything higher than this cuts history down to 2-3 hours maximum. |
Downgrading to 2.8.4 fixed the problem. |
I'm using Loki |
I used Loki |
I am using 2.9.0 and have the same issue. There seems to be no update... |
UPDATE: Solved. See https://community.grafana.com/t/logs-are-gone-after-flushing-off-ingester/135458/7
I think I am observing similar behavior on Loki 3.2.1, where logs are gone from the dashboard after they are flushed off the ingester. If I set
querier:
  query_store_only: true
I get almost no results. My config:
target: all,write
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info
  grpc_server_max_concurrent_streams: 1000
common:
  instance_addr:
  path_prefix: /var/lib/loki
  instance_interface_names:
    - eth0
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
  instance_enable_ipv6: true
ingester:
  chunk_encoding: zstd
  max_chunk_age: 6h
  chunk_idle_period: 3h
  chunk_target_size: 16777216
ingester_rf1:
  enabled: false
schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
pattern_ingester:
  enabled: true
  metric_aggregation:
    enabled: true
    loki_address: localhost:3100
ruler:
  alertmanager_url: http://localhost:9093
frontend:
  encoding: protobuf
compactor:
  working_directory: /var/lib/loki/retention
  compaction_interval: 1h
  retention_enabled: false
  retention_delete_delay: 24h
  retention_delete_worker_count: 150
  delete_request_store: filesystem
table_manager:
  retention_period: 365d
limits_config:
  retention_period: 365d
  retention_stream: []
  max_query_parallelism: 16
  discover_log_levels: false
It can be clearly seen that old logs are gradually fading out. |
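A related diagnostic, assuming the standard querier flags: pinning the querier to one side at a time shows whether the missing data ever reached object storage or only ever lived in the ingester.

querier:
  query_store_only: true        # return only data already flushed to object storage
  # query_ingester_only: true   # alternatively: only data still held by ingesters
  # (enable one of the two at a time, not both)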
11.2024 v2.9.3 |
12.2024 3.3.0. |
@tirelibirefe @ticup can you try 3.1.2? For me, there is no issue in 3.1.2. |
@skl256 Thanks for the reply. I'm running it through Helm; the latest version there seems to be 3.3.0.
Something other people have experienced as well #13409 |
I'm sorry, I read your text incorrectly, I thought you had version 2.9.1. |
Wow, I actually found the issue. The problem was that the chunks-cache was not being scheduled anymore.
With that fixed, the disappearing of logs is resolved! |
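For anyone hitting the same scheduling problem with the grafana/loki Helm chart: the chunks-cache memcached requests several GiB of memory by default, which can leave the pod unschedulable on smaller nodes. A hedged sketch of the relevant chart values (names as used in recent chart versions; verify against the chart version you deploy):

chunksCache:
  enabled: true
  allocatedMemory: 1024   # MB; lower than the chart default if nodes cannot fit the pod
  # or set enabled: false to drop the chunks cache entirely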
For about a week now my Loki has had weird gaps in the reported data, and right now it doesn't return any data at all.
This is super annoying, especially because alerts trigger when their conditions are wrongly reported by Loki.
The last time I had similar problems with missing data, the local storage of my instance was full, but that is not currently the case.
What else can lead to this behaviour? How can I fix this?
compose:
loki config: