New connection socket descriptor (1024) is not less than FD_SETSIZE #1441
ok, that's interesting!

No, unfortunately not; the log is full of the message when it exits. 3 Orion-LD pods working in parallel.

ok ... difficult then ...

Kubernetes v1.24.17

A typical entity, I think, is a Scooter like this:

ok, thanks. That I can use.
2 cores and 16 GB per broker should give you about 5000 updates/second, per broker. We've seen, in load tests we've performed in the FIWARE Foundation, that mongo needs about 3-5 times the resources that the broker has. But that's for full speed, meaning around 15,000 updates/second (or 30k queries/sec) with your 3 brokers.
The MongoDB is a replica set with 2 replicas, without limits. We have about 160 connections open on average, rising to 300 connections for a short time when a surge of data arrives.

This sometimes happens within minutes, but can also go well for days.

Is this bug already fixed in 1.5.0?
Sorry, no time to look into this yet. It might be that, for some reason, the broker-to-mongo connection is slow and the requests pile up. So, try ulimit for more FDs; perhaps this solves the problem.
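(For illustration: "more FDs via ulimit" can also be requested from inside a process. A minimal sketch, assuming Linux; this is the programmatic equivalent of `ulimit -n`, not Orion-LD's actual startup code:)

```c
/* A minimal sketch of raising the file-descriptor soft limit to the hard
 * limit at startup -- the programmatic equivalent of `ulimit -n`.
 * Note: this does NOT change FD_SETSIZE (see later in the thread). */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
  struct rlimit rl;

  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) { perror("getrlimit"); return 1; }

  rl.rlim_cur = rl.rlim_max;  /* raise the soft limit up to the hard limit */

  if (setrlimit(RLIMIT_NOFILE, &rl) != 0) { perror("setrlimit"); return 1; }

  printf("RLIMIT_NOFILE soft limit now: %llu\n", (unsigned long long) rl.rlim_cur);
  return 0;
}
```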
I think there is a kind of leakage. What we can see: if only roughly 900 FDs are in use (the limit is not reached), Orion does not crash, as expected, but it never frees up the FDs. We can monitor it (one way is sketched below).
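(One way to watch this — an assumption, not necessarily the tool used here — is to list `/proc/<pid>/fd` on Linux; a readlink on each entry also reveals the FD type, e.g. `socket:[12345]`. A minimal C sketch:)

```c
/* A minimal sketch: count a process's open FDs on Linux by listing
 * /proc/<pid>/fd, and print what each descriptor points to (readlink
 * shows e.g. "socket:[12345]"). Pass a PID, or omit it for self. */
#include <stdio.h>
#include <dirent.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
  char dirPath[64];
  snprintf(dirPath, sizeof(dirPath), "/proc/%s/fd", (argc > 1)? argv[1] : "self");

  DIR* dirP = opendir(dirPath);
  if (dirP == NULL) { perror("opendir"); return 1; }

  int            fdCount = 0;
  struct dirent* entryP;

  while ((entryP = readdir(dirP)) != NULL)
  {
    if (entryP->d_name[0] == '.')  /* skip "." and ".." */
      continue;

    char    linkPath[128];
    char    target[256];
    ssize_t len;

    snprintf(linkPath, sizeof(linkPath), "%s/%s", dirPath, entryP->d_name);
    len = readlink(linkPath, target, sizeof(target) - 1);

    if (len >= 0)
    {
      target[len] = 0;
      printf("fd %-4s -> %s\n", entryP->d_name, target);
    }

    ++fdCount;
  }

  closedir(dirP);
  printf("open FDs: %d\n", fdCount);
  return 0;
}
```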
During this crash period our MongoDB replica set has no real trouble with CPU or RAM; there are some slow queries. We can (easily) reproduce it if we burst some data to Orion.

ulimit:

orion setup:
ok, interesting. Any idea what the "type" of the leaked file descriptors is?

ok, a socket ... doesn't really help.
Ok, we need more debugging options. We stripped down Orion to only one subscription. We send nearly no data to Orion. At startup, Orion consumes 20 FDs for a period of time. Suddenly it consumes about 170 FDs and never frees them up.

Yeah, I was thinking a little about all this ... I was going to propose a new test without any subscription at all. So, please remove that last subscription and see if the problem disappears.

ok, this bug has nothing to do with the replica set or authentication of mongodb.

That's good to know. A test doing the exact same thing, but without any notifications (no matching subscriptions), would give important input. Just to rule one thing out.

hmm ... the problem persists without any subscription

ok, valuable info.
So, I found something interesting in the mongoc release notes for 1.24.0 (Orion-LD currently uses 1.22.0 of libmongoc):

So, one thing to test is to bump the mongoc version up to 1.24.0 (I had problems compiling 1.25.x, so that comes later). Another thing we could try is to disable Prometheus metrics, which is on by default. This is all a bit of blindly trying things, as I have no clue who leaves those file descriptors open. Two PRs coming:
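(As an aside, since suspicion falls on the driver: with libmongoc, every pooled client taken with `mongoc_client_pool_pop()` must be returned with `mongoc_client_pool_push()`; a missing push leaks the connection — and its socket FD — on every request. A minimal usage sketch, purely illustrative and not Orion-LD's actual code; the URI is an assumption:)

```c
/* Minimal libmongoc client-pool sketch. Every mongoc_client_pool_pop()
 * must be paired with a mongoc_client_pool_push(); a missing push leaks
 * the connection -- and its socket FD -- on every request. */
#include <stdio.h>
#include <mongoc/mongoc.h>

int main(void)
{
  mongoc_init();

  mongoc_uri_t*         uri  = mongoc_uri_new("mongodb://localhost:27017");  /* assumed URI */
  mongoc_client_pool_t* pool = mongoc_client_pool_new(uri);
  mongoc_client_t*      client = mongoc_client_pool_pop(pool);

  bson_t*      ping = BCON_NEW("ping", BCON_INT32(1));
  bson_t       reply;
  bson_error_t error;

  if (!mongoc_client_command_simple(client, "admin", ping, NULL, &reply, &error))
    fprintf(stderr, "ping failed: %s\n", error.message);

  bson_destroy(&reply);
  bson_destroy(ping);

  mongoc_client_pool_push(pool, client);  /* without this, the FD leaks */

  mongoc_client_pool_destroy(pool);
  mongoc_uri_destroy(uri);
  mongoc_cleanup();
  return 0;
}
```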
We just tested version 1577. The -noprom feature did not fix the problem.

ok, good to know.

The FD_SETSIZE error also persists with 1581.

ok, I'd add a bit more, 10x for example. Now, an update: once we have Orion-LD linked with mongoc 1.24.2 (which supports MongoDB 7.0), I'll let you know and we'll try again.

ok, this error still persists with 1.6.0-PRE-1587, with mongo 6 and mongo 7.

I did a quick search on the error (should have done that ages ago ...).

Ok, I checked some things on mongo.

ok, I guess that if the mongo server is OK, the mongo driver must be as well.

👽 Back to the roots.

Yeah ... we'll get there, I'm sure.
Small update on this: Orion-LD (1.6.0-PRE-1608) rarely crashes without QuantumLeap. If we enable QuantumLeap, we get tons of crashes. It could have something to do with subscriptions.

What else we notice: for example, one unique entity arriving several times with different (old) data:

By chance, a new find. Right before the FD_SETSIZE error we saw: Might that help?

It just might.
So, I created a functest with 1000 notifications to a "bad notification client" - a notification client that accepts the connection and reads the incoming notification BUT doesn't respond. It leaves the poor broker waiting for that response until it finally times out. Unfortunately, I believe I asked you guys to start the broker without any subscriptions at all and you still had your problem, so this fix (coming in a few hours) probably doesn't change anything for you. I'm still not convinced it's not just "normal execution": file descriptors are needed, and the more load the broker receives, the more simultaneously open FDs there will be. Anyway, a new version is on its way, even though I doubt it will fix your problem.
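(For illustration, a minimal sketch of such a misbehaving client — not the actual functest: it accepts connections and reads the incoming notification, but never answers, leaving the broker waiting until its timeout fires. The port is an arbitrary choice, and it handles one connection at a time for brevity:)

```c
/* A minimal "bad notification client" sketch: accept, read, never respond.
 * The peer (the broker) is left waiting until its own timeout closes the
 * connection. Single-threaded for brevity. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
  int listenFd = socket(AF_INET, SOCK_STREAM, 0);
  if (listenFd < 0) { perror("socket"); return 1; }

  int on = 1;
  setsockopt(listenFd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

  struct sockaddr_in addr;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family      = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_ANY);
  addr.sin_port        = htons(9997);  /* arbitrary port */

  if (bind(listenFd, (struct sockaddr*) &addr, sizeof(addr)) < 0) { perror("bind");   return 1; }
  if (listen(listenFd, 128) < 0)                                  { perror("listen"); return 1; }

  for (;;)
  {
    int connFd = accept(listenFd, NULL, NULL);
    if (connFd < 0)
      continue;

    /* Read (and discard) the notification, but never send a response;
     * read() returns 0 once the broker gives up and closes. */
    char buf[4096];
    while (read(connFd, buf, sizeof(buf)) > 0)
      ;

    close(connFd);
  }
}
```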
The PR has been merged, in case you want to test. Dockerfiles should be ready shortly.
Thanks, we will test it. About the fd-max: as far as I can see, we have no limit of 1024 file descriptors. Or how do you increase FD_SETSIZE outside the source?
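(For reference: FD_SETSIZE is a compile-time constant of the C library, while `ulimit -n` (RLIMIT_NOFILE) is a runtime kernel limit; raising the ulimit lets the process open more FDs, but select() still cannot watch descriptors >= FD_SETSIZE. A small diagnostic sketch, assuming Linux/glibc:)

```c
/* A small diagnostic sketch: FD_SETSIZE vs. RLIMIT_NOFILE. With glibc,
 * FD_SETSIZE is fixed at 1024 at compile time and redefining it before
 * the includes has no effect; escaping the limit generally means moving
 * the event loop from select() to poll()/epoll. */
#include <stdio.h>
#include <sys/select.h>
#include <sys/resource.h>

int main(void)
{
  struct rlimit rl;

  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) { perror("getrlimit"); return 1; }

  printf("FD_SETSIZE (compile-time):    %d\n",   FD_SETSIZE);
  printf("RLIMIT_NOFILE soft (runtime): %llu\n", (unsigned long long) rl.rlim_cur);
  printf("RLIMIT_NOFILE hard (runtime): %llu\n", (unsigned long long) rl.rlim_max);

  return 0;
}
```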
What we did over the last few weeks to reduce the number of "crashes":

After all these actions we were able to reduce our 8 instances of Orion-LD to only 2, possibly one. But we will investigate further. idPattern could be a big problem.
Sometimes Orion-LD stops working with the following message:

New connection socket descriptor (1024) is not less than FD_SETSIZE (1024).

Since the K8s pod doesn't crash and Orion-LD just stops working, we only notice this when we look at the logs or a customer complains about missing data.

Are the connections perhaps not closed fast enough?

Tested in v1.2.1, 1.3.0 and 1.4.0.
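(For context on where such a message typically originates: a select()-based server cannot track a descriptor >= FD_SETSIZE — FD_SET() on it is undefined behavior — so its only option is to refuse the connection and log an error. A hedged sketch of that guard, illustrative and not Orion-LD's actual code:)

```c
/* Illustrative only: the kind of guard a select()-based accept loop needs.
 * Any accepted socket whose descriptor number is >= FD_SETSIZE cannot be
 * added to an fd_set, so it must be rejected -- which is why leaked FDs
 * eventually make the broker refuse all new connections. */
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>

int acceptConnection(int listenFd, fd_set* activeFds)
{
  int connFd = accept(listenFd, NULL, NULL);

  if (connFd < 0)
    return -1;

  if (connFd >= FD_SETSIZE)
  {
    /* Cannot be tracked by select() - reject, as the broker's log shows */
    fprintf(stderr, "New connection socket descriptor (%d) is not less than FD_SETSIZE (%d)\n",
            connFd, FD_SETSIZE);
    close(connFd);
    return -1;
  }

  FD_SET(connFd, activeFds);
  return connFd;
}
```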