# Persistence of stale session ids in MnesiaCache #713
Very peculiar. This will take some effort to figure out, but it could very well be a bug! First, we'll check whether you can fetch the keys and values, so we can debug further if there's anything different about the old sessions. Next, let's check what format the existing keys have and what the expire timestamp values are; maybe something has set a much larger expire timestamp. If the config setup above worked, it should return them. If not, you should be able to query them directly from Mnesia.
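A minimal sketch of such a check (hypothetical iex session; `session_id` would be one of the stale ids, assuming the standard `Pow.Store.CredentialsCache` API with the Mnesia backend):

```elixir
# Fetch the credentials for one stale session id directly from the cache,
# to compare against a freshly created session.
config = [backend: Pow.Store.Backend.MnesiaCache]
session_id = "bb8a0a9b-d5dd-4d6e-8913-f4de38bafbd5" # hypothetical stale id

Pow.Store.CredentialsCache.get(config, session_id)
```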
---

Thank you for the reply @danschultzer! Indeed, once I added the missing config, I was able to fetch them:

```elixir
config :app, :pow,
  user: App.Auth.User,
  users_context: App.Auth,
  extensions: [PowResetPassword, PowEmailConfirmation, PowPersistentSession, PowInvitation],
  repo: App.Repo,
  cache_store_backend: Pow.Store.Backend.MnesiaCache,
  persistent_session_cookie_opts: [
    secure: true
  ]
```

Querying the expire timestamps, it does look like a significant chunk of them are already past their expiration date:

```elixir
iex> alias Pow.Store.Backend.MnesiaCache
iex> :mnesia.dirty_select(MnesiaCache, [{{MnesiaCache, ["credentials", :_], :_}, [], [:"$_"]}]) |>
...> Map.new(fn {_cache, key, {_, expire}} -> {key, DateTime.from_unix(expire, :millisecond)} end)
%{
  ["credentials", "bb8a0a9b-d5dd-4d6e-8913-f4de38bafbd5"] => {:ok, ~U[2023-08-28 21:16:42.873Z]},
  ["credentials", "65981219-a750-4001-8361-586ad5f07a5c"] => {:ok, ~U[2023-08-10 20:28:33.378Z]},
  ["credentials", "db7ffa49-143a-46cb-90e8-c4efba633849"] => {:ok, ~U[2023-08-10 19:02:25.777Z]},
  ["credentials", "b7709629-17e8-42f8-ad87-8d5ab3137a0e"] => {:ok, ~U[2023-07-13 20:15:53.189Z]},
  ["credentials", "bebe6449-2c8e-49fb-9da5-146a4212b5cf"] => {:ok, ~U[2023-06-13 23:52:52.089Z]},
  ["credentials", "dc395bc3-a9b5-4d2d-b78c-9a95ef6beb7e"] => {:ok, ~U[2023-07-14 19:53:13.103Z]},
  ["credentials", "784fb420-a415-41c6-9c5a-56fd27cd1f7b"] => {:ok, ~U[2023-06-27 22:24:41.075Z]},
  ["credentials", "78d0b931-6258-4e53-a7e9-6d9ea0501a73"] => {:ok, ~U[2023-08-29 20:58:04.250Z]},
  ["credentials", "96e426c6-cfcc-44f5-a664-351af9a52c6d"] => {:ok, ~U[2023-05-24 22:32:45.565Z]},
  ...
}
```
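As a quick follow-up check, counting how many of these are already expired is straightforward given the `{value, expire}` record layout shown above (a sketch, not from the original thread):

```elixir
# Count cached credentials whose expire timestamp is already in the past.
now = :os.system_time(:millisecond)

MnesiaCache
|> :mnesia.dirty_select([{{MnesiaCache, ["credentials", :_], :_}, [], [:"$_"]}])
|> Enum.count(fn {_cache, _key, {_value, expire}} -> expire < now end)
```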
---

I don't believe so, as you can fetch the credentials by setting the cache. The only way you can override this namespace is to make a custom cache module.

Back to what could go wrong here. Following the codepath, we know that these are added to the invalidators when starting the Mnesia cache (as you already wrote): `pow/lib/pow/store/backend/mnesia_cache.ex`, lines 455-460 at `7c57758`.
The TTL being 0 is what catches my eye here. We are calling:

- `pow/lib/pow/store/backend/mnesia_cache.ex`, lines 229-234 at `7c57758`
- `pow/lib/pow/store/backend/mnesia_cache.ex`, lines 236-238 at `7c57758`
- `pow/lib/pow/store/backend/mnesia_cache.ex`, lines 202-206 at `7c57758`
- `pow/lib/pow/store/backend/mnesia_cache.ex`, lines 208-213 at `7c57758`
From reading the docs, it says that the messages may be received before the message sent by …
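As a rough paraphrase of the mechanism under discussion (hypothetical variable names, not the actual Pow source), the re-triggering at init presumably boils down to:

```elixir
# For each stored record, re-arm a timer. Records whose expire timestamp has
# already passed get a TTL clamped to 0, so the :invalidate message fires as
# soon as the GenServer starts processing its mailbox.
ttl = max(expire - :os.system_time(:millisecond), 0)
ref = Process.send_after(self(), {:invalidate, config, key}, ttl)
# ref is kept in the invalidators map so a later write can cancel the timer.
```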
---

No, I can't reproduce the error by only setting an expired TTL and restarting:

```elixir
# test/pow/store/backend/mnesia_cache_test.exs
test "expired records are purged immediately on startup" do
  :mnesia.dirty_write({MnesiaCache, ["pow:test", "key"], {"value", :os.system_time(:millisecond) - 1}})
  assert MnesiaCache.get(@default_config, "key") == "value"

  restart(@default_config)

  assert_receive {:mnesia_table_event, {:delete, _, _}} # Wait for the delete to happen
  assert MnesiaCache.get(@default_config, "key") == :not_found
end
```

This all works as expected. I'll try a few other things and see how I can reproduce something where an expired session isn't being removed.
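For context, the `restart/1` helper would need to do something along these lines for the test to exercise init (a sketch assuming the cache is a locally registered GenServer; the actual test helper may differ):

```elixir
# Stop the cache process and start it again, so that init/1 re-reads the
# Mnesia table and re-triggers the invalidators.
defp restart(config) do
  :ok = GenServer.stop(MnesiaCache)
  {:ok, _pid} = MnesiaCache.start_link(config)
end
```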
---

I've inspected the codepath locally with a copy of our staging environment, and one thing that caught my attention is the empty config passed along the invalidation codepath. With the inspects still in place, I then tried to sign in with a user and, lo and behold, this time the config was there: 🤯

The MnesiaCache usage docs do mention … The same credentials query from before returns:

```elixir
iex> :mnesia.dirty_select(MnesiaCache, [{{MnesiaCache, ["credentials", :_], :_}, [], [:"$_"]}])
[]
```

There are still invalidators in there, but only those with UUID keys, and they still have mostly active timers.

Here's what I'd logged (before making any modifications to the namespace): …

Would it be fine to explicitly pass the namespace to …?
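For illustration, explicitly passing the namespace when calling the backend would look something like this (a sketch assuming the backend honors a `:namespace` key in its config, as the queries above suggest):

```elixir
session_id = "bb8a0a9b-d5dd-4d6e-8913-f4de38bafbd5" # one of the stale ids from above

# With the namespace in the config, the backend can reconstruct the full
# ["credentials", ...] key instead of looking up a bare key.
MnesiaCache.get([namespace: "credentials"], session_id)
```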
---

We know that expired sessions get picked up when MnesiaCache inits, but for some reason the trigger that actually expires them either gets cancelled or never runs in the first place. Alternatively, something rewrites the session, though I have no idea why. It smells like a race condition. Are the expired sessions consistent (is it the same key)? Do you use Mnesia clustering, or is this just a single node?
---

I see, that makes perfect sense. I've worked on a more contrived example to reproduce this from an empty Mnesia state (using a short TTL is convenient for these steps). I got to the problematic state after doing the following:

**1. Spin up the Phoenix server**

Console output:

```
init([]): {:ok, %{invalidators: %{}}}
```

**2. Authenticate a user**

Via the custom authorization plug from the API tutorial, this calls … Console output:

```
trigger_ttl("7c56f349-a262-49c6-a3d7-013bed0d1691", 1800000, [ttl: 1800000, namespace: "credentials"]): #Reference<0.176822019.4286054407.197793>
trigger_ttl([App.Auth.User, :user, 1], 1800000, [ttl: 1800000, namespace: "credentials"]): #Reference<0.176822019.4286054404.201834>
trigger_ttl([App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"], 1800000, [ttl: 1800000, namespace: "credentials"]): #Reference<0.176822019.4286054404.201837>
trigger_ttl("887a5604-6f68-4e66-9f2d-054604175e29", 2592000000, [ttl: 2592000000, namespace: "persistent_session"]): #Reference<0.176822019.4286054404.201840>
```

**3. Stop the server, wait until after the TTL passes, then start it up again**

Console output:

```
trigger_ttl([App.Auth.User, :user, 1], 0, []): #Reference<0.3574711379.2956460033.218191>
append_invalidator(["credentials", App.Auth.User, :user, 1], %{}, 0, []): %{[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>}
trigger_ttl([App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"], 0, []): #Reference<0.3574711379.2956460033.218205>
append_invalidator(["credentials", App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"], %{[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>}, 0, []): %{
[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>,
[App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"] => #Reference<0.3574711379.2956460033.218205>
}
trigger_ttl("7c56f349-a262-49c6-a3d7-013bed0d1691", 0, []): #Reference<0.3574711379.2956460033.218211>
append_invalidator(["credentials", "7c56f349-a262-49c6-a3d7-013bed0d1691"], %{[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>, [App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"] => #Reference<0.3574711379.2956460033.218205>}, 0, []): %{
[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>,
[App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"] => #Reference<0.3574711379.2956460033.218205>,
"7c56f349-a262-49c6-a3d7-013bed0d1691" => #Reference<0.3574711379.2956460033.218211>
}
trigger_ttl("887a5604-6f68-4e66-9f2d-054604175e29", 2587044572, []): #Reference<0.3574711379.2956460033.218217>
append_invalidator(["persistent_session", "887a5604-6f68-4e66-9f2d-054604175e29"], %{[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>, [App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"] => #Reference<0.3574711379.2956460033.218205>, "7c56f349-a262-49c6-a3d7-013bed0d1691" => #Reference<0.3574711379.2956460033.218211>}, 2587044572, []): %{
[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>,
[App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"] => #Reference<0.3574711379.2956460033.218205>,
"7c56f349-a262-49c6-a3d7-013bed0d1691" => #Reference<0.3574711379.2956460033.218211>,
"887a5604-6f68-4e66-9f2d-054604175e29" => #Reference<0.3574711379.2956460033.218217>
}
init([]): {:ok,
%{
invalidators: %{
[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>,
[App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"] => #Reference<0.3574711379.2956460033.218205>,
"7c56f349-a262-49c6-a3d7-013bed0d1691" => #Reference<0.3574711379.2956460033.218211>,
"887a5604-6f68-4e66-9f2d-054604175e29" => #Reference<0.3574711379.2956460033.218217>
}
}}
fetch([], [App.Auth.User, :user, 1]): nil
handle_info({:invalidate, [], [App.Auth.User, :user, 1]}, ...): {:noreply,
%{
invalidators: %{
[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>,
[App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"] => #Reference<0.3574711379.2956460033.218205>,
"7c56f349-a262-49c6-a3d7-013bed0d1691" => #Reference<0.3574711379.2956460033.218211>,
"887a5604-6f68-4e66-9f2d-054604175e29" => #Reference<0.3574711379.2956460033.218217>
}
}}
fetch([], [App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"]): nil
handle_info({:invalidate, [], [App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"]}, ...): {:noreply,
%{
invalidators: %{
[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>,
[App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"] => #Reference<0.3574711379.2956460033.218205>,
"7c56f349-a262-49c6-a3d7-013bed0d1691" => #Reference<0.3574711379.2956460033.218211>,
"887a5604-6f68-4e66-9f2d-054604175e29" => #Reference<0.3574711379.2956460033.218217>
}
}}
fetch([], "7c56f349-a262-49c6-a3d7-013bed0d1691"): nil
handle_info({:invalidate, [], "7c56f349-a262-49c6-a3d7-013bed0d1691"}, ...): {:noreply,
%{
invalidators: %{
[App.Auth.User, :user, 1] => #Reference<0.3574711379.2956460033.218191>,
[App.Auth.User, :user, 1, :session, "7c56f349-a262-49c6-a3d7-013bed0d1691"] => #Reference<0.3574711379.2956460033.218205>,
"7c56f349-a262-49c6-a3d7-013bed0d1691" => #Reference<0.3574711379.2956460033.218211>,
"887a5604-6f68-4e66-9f2d-054604175e29" => #Reference<0.3574711379.2956460033.218217>
}
}}
```

The expired sessions and invalidators remained untouched, with the same key as before, locally on a single node. It does look like some sort of race condition, but from the code/logs I can't tell for sure what might be causing it. I want to build a minimal repo to reproduce this, but I might not be able to get to it for the next few days...
---

I see; reading this, I strongly suspect it has something to do with this part: `pow/lib/pow/store/backend/mnesia_cache.ex`, lines 209-212 at `7c57758`.
If, for some reason, fetching the key returns nil, then the invalidators remain untouched (which I think is incorrect in the first place; it should still clear the invalidator ref). I'm unsure how that could happen without the key actually having been deleted, and I haven't been able to reproduce this in tests. I'll try to set up a full demo app locally to see if I can reproduce it that way (since you are able to reproduce it consistently, it should be trivial).
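Paraphrased as a sketch (hypothetical helper names, not the actual Pow source), the suspected codepath behaves roughly like this:

```elixir
# If fetching the key returns nil, the delete is skipped entirely and the
# invalidator ref stays in state, so the record is never purged.
def handle_info({:invalidate, config, key}, %{invalidators: invalidators} = state) do
  invalidators =
    case table_get(key, config) do
      nil    -> invalidators # nothing deleted, and the stale ref leaks
      _value -> delete_and_clear_invalidator(key, config, invalidators)
    end

  {:noreply, %{state | invalidators: invalidators}}
end
```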
---

I've found a way to reproduce it, and indeed it seems to be a race condition. I'll work on a solution.
---

I found the issue; it was not a race condition but incorrect handling of the keys. I have published v1.0.34, which resolves this. Please let me know if it works correctly now and all the expired keys are being wiped. This is a more serious issue than it appears at first glance, so thank you so much for bringing this up! A CVE will be published shortly.
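To pick up the fix, bumping the dependency should be all that's needed (standard Mix dependency syntax; adjust to your project):

```elixir
# mix.exs
defp deps do
  [
    {:pow, "~> 1.0.34"}
  ]
end
```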
---

**Original issue:**

Hello all! :)

I have recently noticed a peculiar scenario in an API using Mnesia-backed Pow in production, which I'm interested in reproducing to get to the bottom of, but I've had no success thus far. For reasons yet unknown, some sessions are seemingly persisting in the cache even after their TTL has passed.
We have set up an authentication plug similar to the one in the "How to use Pow in an API" tutorial, and we are using `Pow.Store.Backend.MnesiaCache` as our `cache_store_backend`. Under normal circumstances, creating/renewing/deleting sessions via the plug works just as expected, as does having them expire automatically. Even though the application is containerized, Mnesia data is kept between versions in an attached persistent volume.

I'm currently using a snippet to check active sessions for a user (see the sketch below). Testing with a newly-created user, this list behaves normally: a session id is inserted after signing in, replaced after renewal, and deleted after signing out or after 30 minutes. However, for some of the earlier users, this list has been steadily growing over the months, even reaching hundreds of session ids. Luckily, these sessions cannot be used for authentication, since `Pow.Store.CredentialsCache.get/1` returns `:not_found`.
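The snippet itself wasn't captured above; a hypothetical reconstruction using the documented `Pow.Store.CredentialsCache.sessions/2` helper might look like:

```elixir
# List the session ids currently cached for a given user.
config = [backend: Pow.Store.Backend.MnesiaCache]
user = App.Repo.get!(App.Auth.User, 1) # hypothetical lookup

Pow.Store.CredentialsCache.sessions(config, user)
```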
Reproducing these steps in CredentialsCache, the first call to `Base.get` returns the expected `{user_key, metadata}`, but the second one returns `:not_found`. Manually assembling the session_key (`[struct, :user, id, :session, session_id]`), I was also able to verify that the timestamps are still present in the cache.

Next, I tried to inspect the current state of MnesiaCache as well, using the code below:
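The inspection snippet wasn't captured either; a sketch of such an inspection, assuming the cache is a locally registered GenServer, could be:

```elixir
# Read the GenServer state and check whether each invalidator timer is still
# running; Process.read_timer/1 returns false for expired or fired timers.
%{invalidators: invalidators} = :sys.get_state(Pow.Store.Backend.MnesiaCache)

Map.new(invalidators, fn {key, ref} -> {key, Process.read_timer(ref)} end)
```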
And I found that all of the stale sessions still had invalidators in the state, and calling `Process.read_timer/1` on each of their references returned `false`, which suggests they should be expired. Then I went to check the deletion logic in MnesiaCache and tried to reproduce its steps as well. Right away it begins by fetching the key, so I figured I'd get the same result by calling `get/2`:
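Based on the description, the call would have been along these lines (hypothetical reconstruction; the actual invocation wasn't captured):

```elixir
# Fetching the stale session directly via the backend; :not_found here
# corresponds to table_get/2 returning nil internally.
MnesiaCache.get([namespace: "credentials"], session_id)
#=> :not_found
```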
This, according to `table_get/2`, means that the fetch has returned `nil`. It seemed off, since the same call to `Pow.Store.Base.get/3` yielded the `{user, metadata}` earlier, and it was at this point that I decided to reach out for help. 😅

What could be going on here? I suspect it has to do with sessions that persist between deploys, but I couldn't reproduce it consistently yet.
Any help will be much appreciated, thanks!