Possible memory leak #45
Comments
how many inbound messages are you typically receiving in a given day? i'm assuming you're operating at about my scale, but i want to check. my vps has 1gig of memory, and my instance fits comfortably in that. memory usage does increase, but it doesn't grow until a crash. at least for me.
Counting by the posts I see in the front page, I just got 9 posts yesterday before the server became unresponsive. I am not sure how many messages (pings etc.) I received. I have started collecting my logs; I will count the number of lines before this happens again and update.
ok, i get 30-40 posts a minute on average. there's usually a pretty steep memory ramp when i first start the server, but then it generally plateaus, and it will run for days or weeks like that. i've had one or two crashes historically, but those might have been the server being killed off for memory issues unrelated to the server. that's not to say there's not a memory leak. i'm keeping an eye on my instance in case i introduced a problem recently.
i'm trying to find an easy way to chart the memory usage/growth on my server. i don't believe i've done anything that would prevent instances from being garbage collected, but i could be wrong, or there could be bugs in the core libraries.
I have not monitored my server in detail, yet. But ktistec (in docker) crawled from 600MB to 1600MB in the past 5 days. I have been a bit more active over the weekend, so I am not sure if that's a problem. Edit: that's on a VPS with 4GB RAM
memory usage should generally be flat. it doesn't, for example, intentionally try to keep an increasing subset of data in memory. so if you're seeing growth like that day after day, then i'm confident you're seeing a memory leak. i'm going to add additional logging to dump gc and memory stats periodically to see if that helps shed light on what's going on.
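As a rough illustration of what periodic gc and memory logging can look like, here is a minimal sketch. It is written in Python purely for illustration and is not the logging the maintainer added to the server; the function names `sample_memory_stats` and `log_memory_stats` and the dictionary keys are made up for this example.

```python
import gc
import time
import tracemalloc


def sample_memory_stats():
    """Return one snapshot of allocator and garbage-collector counters."""
    current, peak = tracemalloc.get_traced_memory()
    return {
        "traced_bytes": current,       # bytes currently allocated through Python
        "traced_peak_bytes": peak,     # high-water mark since tracing started
        "gc_collections": sum(s["collections"] for s in gc.get_stats()),
        "gc_tracked_objects": len(gc.get_objects()),
    }


def log_memory_stats(count, interval_seconds=60.0, log=print):
    """Log `count` snapshots, sleeping `interval_seconds` between them."""
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    snapshots = []
    for i in range(count):
        stats = sample_memory_stats()
        log(f"memory stats: {stats}")
        snapshots.append(stats)
        if i < count - 1:
            time.sleep(interval_seconds)
    return snapshots
```

Comparing snapshots taken hours apart is usually more telling than any single value: a leak shows up as a number that keeps climbing between samples rather than settling.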
commit 4ce1d3e introduces some code to capture memory information.
Thanks; I have deployed this version. Let me know what to look for and report.
Is there any way to see the current commit/version from a deployed instance?
there is not. i thought version would be more useful—i didn't count on all the early adopters building from |
@vrthra i've found a pretty good candidate bug for the memory leak. i believe sqlite is caching prepared statements for all queries issued. this wouldn't be a huge problem (queries should be parameterized), except that in one query i embedded an id directly in the string instead of passing the value in as a parameter. this is in code that handles inbound activities, so it's going to result in a lot of caching of non-reusable statements. i don't know if this is the whole problem, but in local tests for me it's made a big difference in footprint over time.
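The effect described here can be sketched with a toy model. The snippet below is Python, not the server's code: `CachingConnection` is a hypothetical stand-in for a driver that keeps one prepared statement per distinct SQL string, and the `activities` table is invented for the example. It only shows why an id embedded in the SQL text defeats statement reuse.

```python
import sqlite3


class CachingConnection:
    """Toy stand-in for a driver that caches one statement per SQL string."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE activities (id INTEGER PRIMARY KEY, body TEXT)")
        self.statement_cache = {}  # SQL text -> number of times it was used

    def query(self, sql, params=()):
        self.statement_cache[sql] = self.statement_cache.get(sql, 0) + 1
        return self.db.execute(sql, params).fetchall()


def lookup_interpolated(conn, activity_id):
    # Anti-pattern: the id is part of the SQL text, so each id is a new statement.
    return conn.query(f"SELECT body FROM activities WHERE id = {int(activity_id)}")


def lookup_parameterized(conn, activity_id):
    # The SQL text never changes, so a single cached statement is reused.
    return conn.query("SELECT body FROM activities WHERE id = ?", (activity_id,))


leaky, steady = CachingConnection(), CachingConnection()
for activity_id in range(1000):
    lookup_interpolated(leaky, activity_id)
    lookup_parameterized(steady, activity_id)

print(len(leaky.statement_cache), len(steady.statement_cache))
```

After 1000 lookups the first cache holds 1000 entries and the second holds one. On a code path that runs for every inbound activity, that difference is the kind of steady growth described above.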
@vrthra i created a branch with some changes that seem to make things significantly better for me: https://github.com/toddsundsted/ktistec/tree/memory-problems i need to make some improvements to the code before i put it in main, but it should improve things for you, as well.
After deploying this, I get
Any idea?
it looks like your database has two activitypub object records with the same id/iri. that migration rebuilds your notification and timeline. the server is processing a delete activity but it's finding more than one object. strictly speaking you probably don't need to run that migration, but the duplicate object is disturbing. i'm at a loss at the moment on how it could have been created. something like the following query should return the duplicates. i'd be interested to know how many you have:
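A duplicate check of this kind is typically a `GROUP BY ... HAVING COUNT(*) > 1` query. The sketch below is an illustration only: the table name `objects` and column name `iri` are assumptions, not necessarily the server's schema or the query the maintainer had in mind, and it runs against a throwaway in-memory database rather than a real one.

```python
import sqlite3

# Hypothetical schema: a table `objects` with a text column `iri`.
DUPLICATES_SQL = """
SELECT iri, COUNT(*) AS copies
FROM objects
GROUP BY iri
HAVING COUNT(*) > 1
ORDER BY copies DESC, iri
"""


def find_duplicate_iris(db):
    """Return (iri, copies) for every iri that appears more than once."""
    return db.execute(DUPLICATES_SQL).fetchall()


# Small in-memory demonstration with one duplicated iri.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, iri TEXT)")
db.executemany(
    "INSERT INTO objects (iri) VALUES (?)",
    [("https://a.example/1",), ("https://a.example/2",), ("https://a.example/1",)],
)
duplicates = find_duplicate_iris(db)
print(duplicates)
```

Against a real database, work on a copy of the database file first, since the server may hold the original open.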
It doesn't seem like that is causing it.
ok. i don't have a better theory about the trace at the moment. thanks for testing!
How do I turn off the migration?
you can either delete src/database/migrations/000019-update-timeline-and-notifications.cr before you build, or give me ~1 more day to push out the changes in this branch. those changes will include a try/catch in that migration to log the stack traces, but continue running.
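The "log the stack trace but keep going" idea can be sketched as follows. This is Python for illustration, not the migration code itself, and `run_migrations` with its `(name, callable)` pairs is a hypothetical interface.

```python
import logging
import traceback

logger = logging.getLogger("migrations")


def run_migrations(migrations):
    """Run each (name, callable) pair; log failures and continue with the rest."""
    failed = []
    for name, migrate in migrations:
        try:
            migrate()
        except Exception:
            # Record the full stack trace, then move on to the next migration.
            logger.error("migration %s failed:\n%s", name, traceback.format_exc())
            failed.append(name)
    return failed
```

Continuing past a failure is only reasonable when the migration is not essential to a consistent schema, which matches the earlier remark that this particular migration probably does not need to run.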
@vrthra all changes have been merged into main, including some protection against the migration failing: you'll still see the error messages but the server should continue to start. i'd be interested in how many error messages you get, along with any other logging output. i don't get any errors running that migration, so i want to get a better understanding of what you're seeing.
These are the logs when starting. I can also supply my |
weird weird weird. i'm guessing you're following some server i'm not and it's sending something i don't expect. maybe it's another aspect of the guppe problems. the one common thing is that it's only happening when processing deletes, so it's probably not noticeable in your timeline or notifications, but if you see anything that looks amiss there, let me know. i've signed up to guppe now, so if it shows up in any way related to that i should see it myself...
@vrthra have you seen any improvement in memory usage on your system?
thanks! there are still a few more possibilities. one is that there is another instance somewhere of a prepared statement with an interpolated value. sqlite also allocates pages for its own work, and i haven't looked to see how they are managed. generally what i want to understand now is whether this plateaus and/or whether it eventually crashes. i'll set some vm limits next time i start my instance and see what happens when it bumps into them.
@vrthra the output from |
@toddsundsted please find attached the output of the query |
@vrthra thanks! it looks like there's more hunting to be done!
update on this: i've added a chart to the metrics page that shows heap size to make it easier to track this. i still see heap growth over time, even on my instance, so the problem is there.
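One way to turn a heap-size chart into a yes/no answer is to fit a line to the most recent samples and look at its slope. The sketch below is Python for illustration, not part of the metrics page; the function names, the 24-sample window, and the 0.1% tolerance are arbitrary assumptions.

```python
def growth_per_sample(heap_sizes):
    """Least-squares slope of heap size against sample index."""
    n = len(heap_sizes)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(heap_sizes) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(heap_sizes))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


def is_still_growing(heap_sizes, window=24, tolerance=0.001):
    """True if the trailing window grows faster than `tolerance` of its mean per sample."""
    recent = heap_sizes[-window:]
    mean = sum(recent) / len(recent)
    return growth_per_sample(recent) > tolerance * mean
```

With hourly samples, `is_still_growing(samples)` looks at roughly the last day. A steep ramp right after startup followed by a flat tail would report `False`, while slow but steady growth would report `True`.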
Just so there is another deployment of ktistec in here. Memory usage seems to have changed since deploying 2.0.0-6 for me. The first server restart is the update to 2.0.0-5, when collection of heap-size starts. Second restart is the update to 2.0.0-6. Also, when upgrading to 2.0.0-6, I deleted duplicated rows from the DB, so maybe that's another factor(?).
thanks! some good news on my end. after initial growth, the last 3-4 days on top of 2.0.0-6 have been flat for me, even when i look at the hourly data. i'm going to deploy again today, so i'll lose the trend. i'm going to keep this open a little longer to see if there are any data points refuting the improvements.
i see this, as well. i'm reporting the values returned by the garbage collector, but it's possible i'm interpreting them incorrectly. fwiw, over the last couple days i saw a dip in heap for the first time. unfortunately, i'm not monitoring the process size outside of the app, so i don't know whether there was a corresponding reduction reported there.
@toddsundsted the larger memory leak is back again after upgrading to 07.
thanks! yes, it's definitely been more consistent. i wish i knew better what fixed it.
@vrthra i'd be interested to know if you are running the latest version of ktistec, and if you are still seeing a leak. i've been able to run the latest for extended periods and memory definitely plateaus for me.
@vrthra if there's any indication at all toward the end of the log file about what led up to the crash, i'd love to see that. a stack trace or other abnormal error message.
I have now checked it two times, and both times Ktistec seems to have just exited without an error or a trace. The process just exits. Any idea what I should be looking for in the log or what the cause could be? The log is 500 MB
in my experience this happens when the OS kills the process. there does appear to be a memory spike over the last couple days. did anything out of the ordinary happen then? what are the two dates when you checked and it had been killed? i'm guessing ~8/27 and ~7/18? or was it twice recently?
fwiw, i've also been running the exact same server for the last ~6-7 months. the restarts are due to OS updates that required a reboot.
There is a possible memory leak. Here is my memory utilization:
At this point, the machine becomes unresponsive and has to be rebooted. It is an Oracle Cloud instance, and Ktistec is the only application running on it.