
Consistent SIGSEGV #398

Open
chaker-sidhom opened this issue Jul 28, 2022 · 4 comments


@chaker-sidhom

Describe the question/issue

Hi all,
We're running the latest aws-for-fluent-bit version, 2.26.0, on AWS EKS, and we're getting consistent segfaults. The issue is very similar to #383.

Configuration

  • Deployment mode: DaemonSet on an EKS cluster. The Docker image is amazon/aws-for-fluent-bit:2.26.0
  • Log format: JSON

The ConfigMap looks as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: amazon-cloudwatch
  labels:
    k8s-app: fluent-bit
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush                     10
        Log_Level                 info
        Daemon                    off
        Parsers_File              parsers.conf
        HTTP_Server               ${HTTP_SERVER}
        HTTP_Listen               0.0.0.0
        HTTP_Port                 ${HTTP_PORT}
        storage.path              /var/fluent-bit/state/flb-storage/
        storage.sync              normal
        storage.checksum          off
        storage.backlog.mem_limit 5M
        
    @INCLUDE application-inputs.conf
    @INCLUDE application-filters.conf
    @INCLUDE application-outputs.conf
    @INCLUDE dataplane-log.conf
    @INCLUDE host-log.conf
  
  application-inputs.conf: |
    [INPUT]
        Name                tail
        Tag                 application.central.*
        Exclude_Path        /var/log/containers/*istio-* 
        Path                /var/log/containers/*_central_*.log
        Docker_Mode         On
        Docker_Mode_Flush   5
        Docker_Mode_Parser  container_firstline
        Parser              docker
        DB                  /var/fluent-bit/state/flb_container_central.db
        Mem_Buf_Limit       25MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      ${READ_FROM_HEAD}
    [INPUT]
        Name                tail
        Tag                 application.istio.*
        Exclude_Path        /var/log/containers/*istio-init*
        Path                /var/log/containers/*istio*.log
        Docker_Mode         On
        Docker_Mode_Flush   5
        Docker_Mode_Parser  container_firstline
        Parser              docker
        DB                  /var/fluent-bit/state/flb_container_istio.db
        Mem_Buf_Limit       25MB
        Skip_Long_Lines     On
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      ${READ_FROM_HEAD}


  application-filters.conf: |
    [FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     application.central.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              Off
        Annotations         Off
        Buffer_Size         0
    [FILTER]
        Name                kubernetes
        Match               application.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_Tag_Prefix     application.istio.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Labels              Off
        Annotations         Off
        Buffer_Size         0
    [FILTER]
        Name                nest
        Match               application.*
        Operation           lift
        Nested_under        kubernetes
        Add_prefix          Kube.
    [FILTER]
        Name                modify
        Match               application.*
        Remove              Kube.docker_id
        Remove              Kube.container_hash
        Remove              stream
    [FILTER]
        Name                nest
        Match               application.*
        Operation           nest
        Wildcard            Kube.*
        Nested_under        k
        Remove_prefix       Kube.

  application-outputs.conf: |
    [OUTPUT]
        Name                cloudwatch_logs
        Match               application.central.*
        region              ${AWS_REGION}
        log_group_name      /aws/containerinsights/${CLUSTER_NAME}/central
        log_stream_prefix   ${HOST_NAME}-
        auto_create_group   true
        extra_user_agent    container-insights
    [OUTPUT]
        Name                cloudwatch_logs
        Match               application.istio.*
        region              ${AWS_REGION}
        log_group_name      /aws/containerinsights/${CLUSTER_NAME}/istio
        log_stream_prefix   ${HOST_NAME}-
        auto_create_group   true
        extra_user_agent    container-insights

  parsers.conf: |
    [PARSER]
        Name                docker
        Format              json
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ
    [PARSER]
        Name                syslog
        Format              regex
        Regex               ^(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key            time
        Time_Format         %b %d %H:%M:%S
    [PARSER]
        Name                container_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\S(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ
    [PARSER]
        Name                cwagent_firstline
        Format              regex
        Regex               (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).*?)(?<!\\)".*(?<stream>(?<="stream":").*?)".*(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
        Time_Key            time
        Time_Format         %Y-%m-%dT%H:%M:%S.%LZ
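Every FILTER and OUTPUT in this ConfigMap selects records by tag through its Match pattern. As a review aid, here is a minimal Python sketch that reads the classic [SECTION] / key value format and lists which sections of a given kind a tag would reach. The helpers parse_sections and matching are hypothetical and not part of Fluent Bit. The sketch assumes one key per line, skips @INCLUDE and comment lines, and keeps only the last value of repeated keys such as Remove. It also approximates Fluent Bit's * wildcard with Python's fnmatch. It only inspects routing and does not diagnose the crash.

```python
from fnmatch import fnmatchcase


def parse_sections(text):
    """Parse classic Fluent Bit config text into a list of (SECTION, {key: value}).

    Assumptions: one `key value` pair per line; `@INCLUDE` and `#` lines are
    skipped; repeated keys (e.g. several `Remove`) keep only the last value.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "@")):
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().upper(), {}))
        elif sections:
            parts = line.split(None, 1)  # key, then the rest of the line
            sections[-1][1][parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return sections


def matching(sections, kind, tag):
    """Names of all `kind` sections whose Match pattern covers `tag`.

    fnmatchcase only approximates Fluent Bit's wildcard matching.
    """
    return [
        opts.get("name", "")
        for section, opts in sections
        if section == kind and fnmatchcase(tag, opts.get("match", ""))
    ]


# The two kubernetes filters from application-filters.conf, trimmed to the relevant keys.
FILTERS = """
[FILTER]
    Name                kubernetes
    Match               application.*
    Kube_Tag_Prefix     application.central.var.log.containers.
[FILTER]
    Name                kubernetes
    Match               application.*
    Kube_Tag_Prefix     application.istio.var.log.containers.
"""

sections = parse_sections(FILTERS)
print(matching(sections, "FILTER", "application.istio.var.log.containers.x.log"))
```

With these two filters, an istio tag (the file name x.log is made up) is selected by both kubernetes filters, because both use Match application.*.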

Fluent Bit Log Output

Last few debug logs before the container is killed

[2022/07/27 10:52:42] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 2 events to CloudWatch
[2022/07/27 10:52:42] [debug] [upstream] KA connection #35 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [debug] [upstream] KA connection #-1 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [debug] [upstream] KA connection #-1 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [debug] [upstream] KA connection #-1 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [debug] [upstream] KA connection #-1 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [debug] [upstream] KA connection #-1 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [debug] [upstream] KA connection #-1 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [debug] [upstream] KA connection #-1 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [debug] [upstream] KA connection #-1 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [debug] [socket] could not validate socket status for #35 (don't worry)
[2022/07/27 10:52:42] [debug] [upstream] KA connection #-1 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [engine] caught signal (SIGSEGV)
AWS for Fluent Bit Container Image Version 2.26.0
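When comparing crashes across image versions, it can help to capture the same window of log lines before each crash. The following is a minimal Python sketch. It assumes the container's output has been saved as text, for example with kubectl logs --previous, which returns the logs of the previous (crashed) container instance. The helper lines_before_crash is hypothetical, and the marker string is copied from the output above.

```python
from collections import deque


def lines_before_crash(log_lines, n=10, marker="caught signal (SIGSEGV)"):
    """Return up to n lines immediately preceding the first line containing
    `marker`, or None if the marker never appears."""
    window = deque(maxlen=n)  # keeps only the most recent n lines
    for line in log_lines:
        if marker in line:
            return list(window)
        window.append(line)
    return None


# A trimmed excerpt of the log above.
LOG = """\
[2022/07/27 10:52:42] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 2 events to CloudWatch
[2022/07/27 10:52:42] [debug] [socket] could not validate socket status for #35 (don't worry)
[2022/07/27 10:52:42] [debug] [upstream] KA connection #-1 to logs.eu-central-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/27 10:52:42] [engine] caught signal (SIGSEGV)
AWS for Fluent Bit Container Image Version 2.26.0
"""

for line in lines_before_crash(LOG.splitlines(), n=2):
    print(line)
```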

Fluent Bit Version Info

Version that crashes: AWS for Fluent Bit Container Image Version 2.26.0
Version that does NOT crash: AWS for Fluent Bit Container Image Version 2.25.0
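The startup banner printed in the log records which image version actually ran, even when a deployment references a moving tag. Below is a minimal Python sketch that reads the version from that banner line and compares it with 2.25.0, the last version this report found not to crash. The helper image_version is hypothetical. Another comment in this thread reports crashes on 2.24.0 as well, so the boundary is not settled.

```python
import re

# Banner format copied from the log output above.
BANNER = re.compile(r"AWS for Fluent Bit Container Image Version (\d+(?:\.\d+)*)")

# Last version reported as not crashing in this issue (2.26.0 crashed).
LAST_KNOWN_GOOD = (2, 25, 0)


def image_version(log_text):
    """Return the image version from the startup banner as a tuple of ints, or None."""
    match = BANNER.search(log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


version = image_version("AWS for Fluent Bit Container Image Version 2.26.0")
print(version, version > LAST_KNOWN_GOOD)
```

Tuples of ints compare element by element, so (2, 26, 0) > (2, 25, 0) holds without any string parsing pitfalls such as "2.9" sorting after "2.10".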

Cluster Details

EKS cluster running Kubernetes version v1.22

Steps to reproduce issue

@PettitWesley
Contributor

If you can, try this so we can get a proper stack trace or core dump: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#segfaults-and-crashes-sigsegv

@qdupuy

qdupuy commented Sep 6, 2022

Hello,

Same issue with amazon/aws-for-fluent-bit:2.24.0.

However, if I go back to version 2.23.0, it works.


@gpetrovgeorgi

gpetrovgeorgi commented Feb 1, 2023

Hello guys,

The last "stable" version that worked for us was 2.28.4. It looks like the "stable" image tag has since been moved to a newer version, and our AWS ECS containers started breaking with a SIGSEGV error:

AWS for Fluent Bit Container Image Version 2.28.4

We run on AWS ECS Fargate, and we are facing the same problem: our containers crash with caught signal (SIGSEGV). Our config uses two output plugins in its OUTPUT sections: S3 and CloudWatch.
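According to this comment, the image under these ECS containers changed because the "stable" tag was moved to a newer release. As a hedged illustration, here is a small Python check that flags image references relying on a floating tag. Treating "stable" and "latest" as the floating tags is an assumption. The helper is_pinned is hypothetical.

```python
# Assumption: tags that may be re-pointed to a newer release over time.
FLOATING_TAGS = {"stable", "latest"}


def is_pinned(image_ref):
    """True if image_ref pins an explicit version tag or a digest.

    A reference without a tag implicitly means `latest`, so it is not pinned.
    """
    if "@" in image_ref:  # digest reference, e.g. repo@sha256:...
        return True
    _, sep, tag = image_ref.rpartition(":")
    # No colon, or a colon that belongs to a registry host:port, means no tag.
    if not sep or "/" in tag:
        return False
    return tag not in FLOATING_TAGS


for ref in ["amazon/aws-for-fluent-bit:stable", "amazon/aws-for-fluent-bit:2.28.4"]:
    print(ref, is_pinned(ref))
```

Pinning an explicit version means the image only changes when the task definition is edited, so a regression can be tied to a specific upgrade.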

@PettitWesley, I would be happy to see a root cause and a fix here.

Here are some messages from the AWS ECS console after I enabled debugging according to this page:

[Screenshot of the ECS console messages, not included]

@PettitWesley
Contributor

@gpetrovgeorgi, many issues have been fixed since that version: #542

Also, if you face this issue on a newer version, we now have pre-built debug images that can output stack traces and upload core dumps to S3:

https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#firelens-crash-report-runbook
