Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PLATFORM-474]: [Auth0Ex] Make observability of cache set errors better #87

Conversation

cpiemontese
Copy link
Contributor

https://prima-assicurazioni-spa.myjetbrains.com/youtrack/issue/PLATFORM-474

This PR tries to enhance the observability of the library by

  • adding missing error logs on token encryption failure
  • adding a metric to track how many tokens are obtained from Auth0

Moreover, we are considering the possibility of adding a flag to make the service crash if something goes wrong, which could be used in staging/qa to detect certain errors more easily.

@cpiemontese
Copy link
Contributor Author

I am REALLY unsure about how I introduced statix into the library.
Should the library even have statix as a dependency?
Should the library explicitly connect to statix or should we make it so that a statix module can passed to it?

@cpiemontese cpiemontese reopened this May 30, 2022
@cpiemontese cpiemontese requested a review from dallagi May 30, 2022 07:12
@enerdgumen
Copy link
Contributor

Should the library even have statix as a dependency?
Should the library explicitly connect to statix or should we make it so that a statix module can passed to it?

I would put statix as an optional dependency, with a PrimaAuth0Ex.Statix module similar to the following:

if Code.ensure_loaded?(Statix) do
  defmodule PrimaAuth0Ex.Statix do
    @moduledoc false

    use Statix, runtime_config: true
  end
else
  defmodule PrimaAuth0Ex.Statix do
    @moduledoc false

    require Logger

    def connect() do
      Logger.debug("Statix is not enabled")
    end

    def increment(_metric, _value, _opts), do: :ok
  end
end

Trying it I noted that also the following optional modules should be adapted to prevent unexpected errors during compilation:

if Code.ensure_loaded?(Plug) do
  defmodule PrimaAuth0Ex.Plug.VerifyAndValidateToken do
  ...
  end
end

if Code.ensure_loaded?(Absinthe.Plug) do
  defmodule PrimaAuth0Ex.Absinthe.CreateSecurityContext do
  ...
  end
end

if Code.ensure_loaded?(Absinthe) do
  defmodule PrimaAuth0Ex.Absinthe.RequirePermissions do
  ...
  end
end

@cpiemontese
Copy link
Contributor Author

cpiemontese commented May 30, 2022

Should the library even have statix as a dependency?
Should the library explicitly connect to statix or should we make it so that a statix module can passed to it?

I would put statix as an optional dependency, with a PrimaAuth0Ex.Statix module similar to the following:

if Code.ensure_loaded?(Statix) do
  defmodule PrimaAuth0Ex.Statix do
    @moduledoc false

    use Statix, runtime_config: true
  end
else
  defmodule PrimaAuth0Ex.Statix do
    @moduledoc false

    require Logger

    def connect() do
      Logger.debug("Statix is not enabled")
    end

    def increment(_metric, _value, _opts), do: :ok
  end
end

I choose to remove statix and just use telemetry to emit an event. This way we are independent of libraries and the caller can freely choose whether to listen to the events or not.
What do you think?

Trying it I noted that also the following optional modules should be adapted to prevent unexpected errors during compilation:

if Code.ensure_loaded?(Plug) do
  defmodule PrimaAuth0Ex.Plug.VerifyAndValidateToken do
  ...
  end
end

if Code.ensure_loaded?(Absinthe.Plug) do
  defmodule PrimaAuth0Ex.Absinthe.CreateSecurityContext do
  ...
  end
end

if Code.ensure_loaded?(Absinthe) do
  defmodule PrimaAuth0Ex.Absinthe.RequirePermissions do
  ...
  end
end

Yes it makes sense, I'll try to fix it

@cpiemontese
Copy link
Contributor Author

cpiemontese commented May 30, 2022

About this

Moreover, we are considering the possibility of adding a flag to make the service crash if something goes wrong, which could be used in staging/qa to detect certain errors more easily.

I am not so sure it makes sense, we could make the server crash but what would be gain?

@cpiemontese cpiemontese requested a review from enerdgumen May 30, 2022 10:19
@cpiemontese cpiemontese marked this pull request as ready for review May 30, 2022 10:19
@cpiemontese cpiemontese removed request for fabuloso and Fs00 May 30, 2022 10:19
@enerdgumen
Copy link
Contributor

I choose to remove statix and just use telemetry to emit an event. This way we are independent of libraries and the caller can freely choose whether to listen to the events or not.
What do you think?

It's a good idea to use a generic library like this one, but this is not so convenient for our downstream dependencies.
As far as I understand, if a project uses auth0_ex and wants to send the telemetry to Datadog it should define an handler like the following:

defmodule Auth0TelemetryHandler do
  def handle_event([:prima_auth0_ex, :retrieve_token, :failure], measurements, %{audience: audience}, _config) do
    Statix.increment("new_token:failure", 1, tags: ["audience:#{audience}"])
  end

  def handle_event([:prima_auth0_ex, :retrieve_token, :success], measurements, %{audience: audience}, _config) do
    Statix.increment("new_token:success", 1, tags: ["audience:#{audience}"])
  end
end

and remember to configure it at the startup:

:telemetry.attach_many(
  "auth0-handler",
  [
    [:prima_auth0_ex, :retrieve_token, :failure],
    [:prima_auth0_ex, :retrieve_token, :success]
  ],
  &Auth0TelemetryHandler.handle_event/4,
  nil
)

If the library adds some metrics, we need to remember to add them to all the clients.
It seems to be annoying and error-prone to do this in every project, so it would be useful to have a ready-to-use component for Statix to reduce boilerplate, although :telemetry still makes sense since someone could not use Statix.

@cpiemontese
Copy link
Contributor Author

cpiemontese commented May 30, 2022

@enerdgumen

You are right, I created a module which could serve as a pre-defined statix handler.
I am not sure how we could handle configurations, e.g.

config :statix, PrimaAuth0Ex.Telemetry.Statix,
  enabled: false,
  host: "statsd",
  prefix: "prima_auth0_ex",
  tags: ["env:#{config_env()}"]

Edit: reword

Statix.increment("retrieve_token:success", count, tags: ["audience:#{audience}"])
end
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure.. in this way we need to duplicate the Statix configuration, the application will open another socket for datadog instead of using the existing one, and it still need to call "setup" manually.

I see some options:

  1. We could make the reporter module configurable in the setup and use it in the handler, example:
def setup do
   reporter = Application.get_env(:prima_auth0_ex, :reporter_module)
   if reporter != nil do
      :ok =
        :telemetry.attach_many(
          "auth0-handler",
          [
            [:prima_auth0_ex, :retrieve_token, :failure],
            [:prima_auth0_ex, :retrieve_token, :success]
          ],
          &Handler.handle_event/4,
          %{reporter: reporter}
        )
    end
end

and then in the handler:

def handle_event([:prima_auth0_ex, :retrieve_token, :failure], %{count: count}, %{audience: audience}, %{reporter: reporter}) do
  reporter.increment("retrieve_token:failure", count, tags: ["audience:#{audience}"])
end

Finally we can call setup from PrimaAuth0Ex.Application.

  1. Instead of Statix, we could use instrument. It stores the reporter module in a genserver and provides a good API, backed by Statix. But then we need to update our projects to use Instrument...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can easily detect the Statix module too, for instance taking the first configured module:

iex> Application.get_all_env(:statix)
[
  {Vito.Statix, ...}
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I wouldn't add any "complicated" module detection I like the idea of letting the user provide a telemetry reporter so that they can leverage the pre-defined handler! It seems the simplest thing we can offer to avoid having users write a lot of boiler-plate code

@cpiemontese cpiemontese requested a review from enerdgumen May 31, 2022 09:18
enerdgumen
enerdgumen previously approved these changes May 31, 2022
Copy link
Contributor

@enerdgumen enerdgumen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great 👍
I would just document the new telemetry_reporter settings in the README.

@enerdgumen
Copy link
Contributor

enerdgumen commented May 31, 2022

I am not so sure it makes sense, we could make the server crash but what would be gain?

Regarding this I agree with you, having logging and telemetry is enough.

@cpiemontese cpiemontese merged commit 80c7cfd into master May 31, 2022
@cpiemontese cpiemontese deleted the PLATFORM-474/user-story/make-observability-of-cache-set-errors-better branch May 31, 2022 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants