
* 1st argument: the table identifier does not refer to an existing ETS table #181

Closed
maxguzenski opened this issue Jan 6, 2023 · 20 comments

Comments

@maxguzenski

Hi,
I have an app that makes intensive use of Nebulex, and roughly 100 times per day I see this error in the Elixir logs.
Can anyone help me?

(ArgumentError errors were found at the given arguments:

  * 1st argument: the table identifier does not refer to an existing ETS table
)

The pattern is: it happens many times at once, then the app runs for many hours without issue, then it happens again... (repeat).

My Nebulex config:

config :app, App.Cache,
  gc_cleanup_min_timeout: :timer.seconds(10),
  gc_cleanup_max_timeout: :timer.minutes(10),
  allocated_memory: 8_000_000_000

defmodule App.Cache do
  use Nebulex.Cache,
    otp_app: :app,
    adapter: Nebulex.Adapters.Local
end

Error on Sentry:

[Screenshot: Sentry error, 06 Jan 2023, 01:10 AM]

@cabol
Owner

cabol commented Jan 6, 2023

Hi 👋!! Real quick: are you using the latest Nebulex version? If not, which version are you using?

@maxguzenski
Author

Everything is updated.

Dependency              Current  Latest  Status
absinthe                1.6.6    1.7.0   Update possible
absinthe_phoenix        2.0.2    2.0.2   Up-to-date
absinthe_plug           1.5.8    1.5.8   Up-to-date
bamboo                  2.2.0    2.2.0   Up-to-date
bcrypt_elixir           2.3.0    3.0.1   Update not possible
broadway                1.0.5    1.0.5   Up-to-date
broadway_sqs            0.7.2    0.7.2   Up-to-date
comeonin                5.3.3    5.3.3   Up-to-date
cors_plug               2.0.3    3.0.3   Update not possible
cowboy                  2.9.0    2.9.0   Up-to-date
credo                   1.6.7    1.6.7   Up-to-date
dataloader              1.0.10   1.0.10  Up-to-date
decorator               1.4.0    1.4.0   Up-to-date
dialyxir                1.2.0    1.2.0   Up-to-date
ecto_psql_extras        0.7.10   0.7.10  Up-to-date
ecto_sql                3.9.2    3.9.2   Up-to-date
eqrcode                 0.1.10   0.1.10  Up-to-date
ex_aws                  2.4.1    2.4.1   Up-to-date
ex_aws_lambda           2.1.0    2.1.0   Up-to-date
ex_aws_s3               2.3.3    2.3.3   Up-to-date
ex_aws_sqs              3.3.1    3.4.0   Update possible
feeder_ex               1.1.0    1.1.0   Up-to-date
gettext                 0.20.0   0.21.0  Update possible
guardian                2.3.1    2.3.1   Up-to-date
hackney                 1.18.1   1.18.1  Up-to-date
httpoison               1.8.2    1.8.2   Up-to-date
jason                   1.4.0    1.4.0   Up-to-date
jose                    1.11.5   1.11.5  Up-to-date
mock                    0.3.7    0.3.7   Up-to-date
nebulex                 2.4.2    2.4.2   Up-to-date
phoenix                 1.6.15   1.6.15  Up-to-date
phoenix_ecto            4.4.0    4.4.0   Up-to-date
phoenix_html            3.2.0    3.2.0   Up-to-date
phoenix_live_dashboard  0.7.2    0.7.2   Up-to-date
phoenix_pubsub          2.1.1    2.1.1   Up-to-date
plug_cowboy             2.6.0    2.6.0   Up-to-date
poolboy                 1.5.2    1.5.2   Up-to-date
postgrex                0.16.5   0.16.5  Up-to-date
remote_ip               1.0.0    1.1.0   Update not possible
sentry                  8.0.6    8.0.6   Up-to-date
shards                  1.1.0    1.1.0   Up-to-date
sobelow                 0.11.1   0.11.1  Up-to-date
telemetry               1.1.0    1.2.0   Update not possible
telemetry_metrics       0.6.1    0.6.1   Up-to-date
telemetry_poller        1.0.0    1.0.0   Up-to-date
timex                   3.7.9    3.7.9   Up-to-date
ua_inspector            2.2.0    3.0.1   Update not possible
uuid                    1.1.8    1.1.8   Up-to-date
xxhash                  0.3.1    0.3.1   Up-to-date

@maxguzenski
Author

I moved back to :shards to give it a try; it created 48 shards (I run this app on an AWS c5.12xlarge, in an Ubuntu Docker container).
So far, with shards, it has happened 2 times.

[Screenshot: Sentry error, 06 Jan 2023, 13:11]

@maxguzenski
Author

Now, with shards, I'm getting other errors:

[Screenshot: Sentry error, 06 Jan 2023, 13:18]

ErlangError :shards_meta.get/1

[Screenshot: Sentry error, 06 Jan 2023, 13:20]

ArgumentError :erlang.send_after/3

[Screenshot: Sentry error, 06 Jan 2023, 13:19]

@maxguzenski
Author

Maybe when the old ETS table is removed there are still processes accessing it?
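
For reference, the error above is exactly what ETS raises when a process still holds a reference to a table that has been deleted; a minimal standalone sketch, outside Nebulex, with an illustrative table name:

tab = :ets.new(:old_generation, [:set, :public])
:ets.insert(tab, {"key", "value"})

# Deleting the table invalidates every reference other processes still hold.
:ets.delete(tab)

# Any later access through the stale reference raises the same error:
:ets.lookup(tab, "key")
#=> ** (ArgumentError) errors were found at the given arguments:
#=>   * 1st argument: the table identifier does not refer to an existing ETS table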

@cabol
Owner

cabol commented Jan 7, 2023

Right, let me check. It is weird, because when a new generation is created, the older one is just marked for deletion; it is not deleted immediately. That happens the next time a new generation is created, precisely to give time to processes still hitting it, which should only need a very short period, because the references to the new and old generations have already changed. Anyway, let me review this and try to reproduce it somehow.
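
A rough conceptual sketch of that generational scheme (illustrative only, not Nebulex's actual implementation; the module and function names are made up): the newest table takes writes, the previous one stays readable, and only on the next rotation is the oldest table actually deleted.

defmodule GenerationSketch do
  # Keep at most two live generations. The table deleted here already stopped
  # receiving writes on the previous rotation, so readers that still hold its
  # reference get roughly one full rotation as a grace period.
  def new_generation([newer, older]) do
    :ets.delete(older)
    [:ets.new(:generation, [:set, :public]), newer]
  end

  def new_generation(generations) do
    [:ets.new(:generation, [:set, :public]) | generations]
  end
end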

@cabol
Owner

cabol commented Jan 7, 2023

I was investigating, and the case where this may happen is when two (or more) generations are created one right after the other, almost at the same time, not giving existing processes time to update to the new generation list; but it is very weird. My first suspicion is that the GC timer and the cleanup timer are firing one right after the other, causing a situation like this. It is very unusual, but it may happen, especially since you mention you have a high volume of data and load on the cache. So, I made a fix to avoid it and pushed the changes to the master branch already. Please try it out and let me know if the issue persists; let's rule this out first.
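
A conceptual sketch of the race described above (illustrative only, not Nebulex internals): two timers scheduled independently with Process.send_after/3 can deliver their messages to the same process almost back to back, so two rotations run with almost no gap between them.

# Illustrative only: both timer messages land in the same server's mailbox.
Process.send_after(self(), :gc_timeout, :timer.seconds(10))
Process.send_after(self(), :cleanup_timeout, :timer.seconds(10))
# If each message triggers a new generation, the oldest table can be deleted
# before slower readers have updated to the new generation list.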

@maxguzenski
Author

OK, I'll give it a try in production... on Monday morning.

@maxguzenski
Author

maxguzenski commented Jan 7, 2023

This is my request count per minute; every request accesses Nebulex more than once.
In any case, the cache takes many hours to reach 8 GB.

[Screenshot: request count per minute, 07 Jan 2023, 08:05 AM]

So, comparing the number of requests to the number of errors in Sentry, the error rate is much less than 0.01%.

@cabol
Owner

cabol commented Jan 9, 2023

I'm still investigating, because it is not yet clear to me why this is happening. I've pushed some fixes/improvements in the meantime; they may not fix this issue, but let me know once you try them out.

@maxguzenski
Author

I haven't pushed your new code to production yet (I will soon), but I changed the config 3 days ago:

  # from
  gc_cleanup_min_timeout: :timer.minutes(10),
  gc_cleanup_max_timeout: :timer.minutes(60)

  # to
  gc_interval: :timer.hours(24),
  gc_cleanup_min_timeout: :timer.minutes(10),
  gc_cleanup_max_timeout: :timer.minutes(60)

And everything is running without issues; in this time, the cache cleaned itself up 4 times with no problems. I believe gc_interval (with its default value) is related to the bug.
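
For reference, a minimal sketch of the resulting cache configuration with gc_interval set explicitly (the values are the ones quoted in this thread):

config :app, App.Cache,
  gc_interval: :timer.hours(24),
  gc_cleanup_min_timeout: :timer.minutes(10),
  gc_cleanup_max_timeout: :timer.minutes(60),
  allocated_memory: 8_000_000_000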

@maxguzenski
Author

OK, there is a bug... I will make a video; something is very wrong.

@maxguzenski
Author

My mistake... I need more time to investigate. What I thought was a bug might not be one.

@cabol
Owner

cabol commented Jan 10, 2023

I haven't pushed your new code to production yet (I will soon), but I changed the config 3 days ago:

  # from
  gc_cleanup_min_timeout: :timer.minutes(10),
  gc_cleanup_max_timeout: :timer.minutes(60)

  # to
  gc_interval: :timer.hours(24),
  gc_cleanup_min_timeout: :timer.minutes(10),
  gc_cleanup_max_timeout: :timer.minutes(60)

And everything is running without issues; in this time, the cache cleaned itself up 4 times with no problems. I believe gc_interval (with its default value) is related to the bug.

Interesting, this gives me a better idea of where to look. I'll check and keep you posted, thanks!!

@maxguzenski
Author

maxguzenski commented Jan 10, 2023

How exactly is the allocated_memory parameter, together with a high gc_interval value, supposed to work when memory reaches the maximum allowed?

A) Everything is deleted and a new ETS table is created.
B) A new ETS table is created and the old cache is kept for some time.

I have the impression that in my app option "A" is what happens, because used memory drops from 16 GB (allocated_memory) to near zero.

I'm adding some telemetry data to help see how the cache behaves over time and I'll be putting it all into production soon.
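
A minimal sketch of what such a telemetry handler could look like, assuming the cache keeps Nebulex v2's default telemetry prefix for App.Cache ([:app, :cache]) and its command events; the handler id and printed message are illustrative:

:telemetry.attach(
  "nebulex-cache-commands",
  [:app, :cache, :command, :stop],
  fn _event, measurements, metadata, _config ->
    # :telemetry.span/3 reports :duration in native time units.
    IO.puts("cache command finished in #{measurements.duration} native units; " <>
      "metadata keys: #{inspect(Map.keys(metadata))}")
  end,
  nil
)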

@cabol
Owner

cabol commented Jan 14, 2023

Hey 👋!! There was a fix very much related to this one (#183); you may want to try it out, it is on the master branch. Let me know how it goes, stay tuned!

@maxguzenski
Author

Probably yes, because I tested it with several configurations of gc_interval and gc_cleanup_min_timeout, and the bug only happens when gc_interval is not set.

I'll test the master branch.

@cabol
Owner

cabol commented Mar 26, 2023

Closing this issue, but feel free to reopen it if the issue persists even with fix #183.

@cabol cabol closed this as completed Mar 26, 2023
@pierre-pretorius

@cabol we see the same issue with Nebulex 2.5.1. We will try it soon with the latest 2.5.2. The stack trace reports :ets.lookup/2:

ArgumentError
:ets.lookup/2

(ArgumentError errors were found at the given arguments:

  * 1st argument: the table identifier does not refer to an existing ETS table
)
Stack Trace


ArgumentError: errors were found at the given arguments:

  * 1st argument: the table identifier does not refer to an existing ETS table

  Module "ets", in :ets.lookup/2
  File "lib/nebulex/adapters/local.ex", line 753, in Nebulex.Adapters.Local.get_entry/4
  File "lib/nebulex/adapters/local.ex", line 465, in Nebulex.Adapters.Local.gen_fetch/4
  File "lib/nebulex/adapters/local.ex", line 456, in Nebulex.Adapters.Local.do_get/3
  File "lib/nebulex/adapters/local.ex", line 447, in Nebulex.Adapters.Local.get_/2
  File "lib/nebulex/adapters/local.ex", line 444, in anonymous fn/3 in Nebulex.Adapters.Local.get_/2
  File "/opt/sa_cloud/deps/telemetry/src/telemetry.erl", line 272, in :telemetry.span/3
  File "lib/nebulex/adapters/local.ex", line 440, in Nebulex.Adapters.Local.get/3

Sometimes the stack trace says :ets.take/2:

ArgumentError
:ets.take/2

(ArgumentError errors were found at the given arguments:
  * 1st argument: the table identifier does not refer to an existing ETS table
)

ArgumentError: errors were found at the given arguments:

  * 1st argument: the table identifier does not refer to an existing ETS table

  Module "ets", in :ets.take/2
  File "lib/nebulex/adapters/local.ex", line 761, in Nebulex.Adapters.Local.pop_entry/4
  File "lib/nebulex/adapters/local.ex", line 465, in Nebulex.Adapters.Local.gen_fetch/4
  File "lib/nebulex/adapters/local.ex", line 457, in Nebulex.Adapters.Local.do_get/3
  File "lib/nebulex/adapters/local.ex", line 447, in Nebulex.Adapters.Local.get_/2
  File "lib/nebulex/adapters/local.ex", line 444, in anonymous fn/3 in Nebulex.Adapters.Local.get_/2
  File "/opt/sa_cloud/deps/telemetry/src/telemetry.erl", line 272, in :telemetry.span/3
  File "lib/nebulex/adapters/local.ex", line 440, in Nebulex.Adapters.Local.get/3

Our Nebulex usage is relatively simple:

defmodule MyApp.Cache do
  use Nebulex.Cache,
    otp_app: :sa_cloud,
    adapter: Nebulex.Adapters.Replicated
end

The only functions we call are the cache put and get:

MyApp.Cache.get(string)
MyApp.Cache.put(string, data)

We do, however, run two Elixir nodes on the same machine so that we can restart Elixir without downtime.

rel/env.sh.eex:

#!/bin/sh

# Sets and enables heart (recommended only in daemon mode)
# case $RELEASE_COMMAND in
#   daemon*)
#     HEART_COMMAND="$RELEASE_ROOT/bin/$RELEASE_NAME $RELEASE_COMMAND"
#     export HEART_COMMAND
#     export ELIXIR_ERL_OPTIONS="-heart"
#     ;;
#   *)
#     ;;
# esac

# Set the release to work across nodes. If using the long name format like
# the one below (my_app@127.0.0.1), you need to also uncomment the
# RELEASE_DISTRIBUTION variable below. Must be "sname", "name" or "none".
export RELEASE_DISTRIBUTION=name
export RELEASE_MODE=embedded
export RELEASE_NODE=<%= @release.name %>_$PORT@127.0.0.1

config/releases.exs:

config :kernel,
       sync_nodes_optional: [:"my_app_3000@127.0.0.1", :"my_app_3001@127.0.0.1"],
       sync_nodes_timeout: 10000

@cabol
Owner

cabol commented Jul 20, 2023

Hey 👋! Could you share the cache config? Also, could you elaborate a bit more on when this happens, for example whether there is a particular situation that triggers it or it just happens randomly? Thanks!
