Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better path handling #233

Closed
erez-rabih opened this issue Nov 7, 2021 · 14 comments · Fixed by #245
Closed

better path handling #233

erez-rabih opened this issue Nov 7, 2021 · 14 comments · Fixed by #245
Milestone

Comments

@erez-rabih
Copy link

I'm trying to use the collector rack middleware with a sinatra app

I have a simple endpoint /add/:num_a/:num_b that takes num_a and num_b and returns their sum

The path label value generated by the collector middleware for that route is path="/add/:id/5 , path="/add/:id/3 ...

I think a batter solution would be to take the matching route pattern and use it as the path value, in my case, I'd expect to have path="/add/:num_a/:num_b

any thoughts about that?

@dmagliola
Copy link
Collaborator

Hello, sorry for my late response!

Our general recommended approach with the Collector middleware is that if you have custom requirements like this, the easiest approach is to make your own and add that to the Rack middleware chain.

For this case, it'd probably be easiest to subclass the one provided with the gem and override the strip_ids_from_path method, putting in any custom logic you need.

With respect to your suggestion, we tried in the past making this more configurable through parameters, and it invariable became a messy pile of indirection which was very hard to understand, and which ended up providing very little benefit for an object that does so little work, and which invariable ended up requiring more customization.

We think it's better to provide a simple one that you can either use as-is, if it works for you, or that is very easy to understand if you need to use it as a starting point to make your own.

Hope that helps!

@erez-rabih
Copy link
Author

erez-rabih commented Jan 7, 2022

Thank you for taking the time to answer

I'd like to try and challenge the assumption that your argument relies on: that this is a "custom" and very specific use case I'm referring to in my issue. I think that this kind of behavior is what I would have expected from the ruby client by default.

Taking the matching route pattern as-is makes zero assumptions on the clients needs where the current implementation applies pre defined logic trying to guesstimate :id and :uuid in the actual path, enforcing users to repeatedly override it just to avoid their route patterns being altered on the monitoring dashboard.

Personally I was very surprised by the fact that the Prometheus library applies such a logic to my routes as a default behavior.

Another point I would like make is that the current solution uses logic to replace actual path parts with path variables like :id and :uuid but might miss a lot of the cases where the path part does not match the regular expression
For example, a site that uses a /users/:username path
The current logic would take each path and report it as a separate time series, eventually causing failure on prometheus since it is not designed to handle unique identifiers like user name as label values
My suggestion would also catch these kinds of cases since it would use path="/users/:username" as the path label value, avoiding the creation of time series per username

So to sum it up I would say the current behavior is the custom one whereas my suggestion applies zero logic to the matched route pattern and as such is more predictable from client's perspective.
In addition, the current solution is less robust than the one I offered for purposes of lowering the number of unique path values generated for Prometheus, making clients override the "default" logic in favor of a more robust one.

Would love to hear your opinion and thanks for the answer again

@dmagliola
Copy link
Collaborator

Thank you for coming back to this.

I've started looking at this a bit deeper, and I think I understand what you're asking for a bit better.

I have two comments:

  1. "a better solution would be to take the matching route pattern"
    How would we do this? (in a way that is agnostic to Rails/Sinatra/etc)
    How does the Prometheus client know about the app's routes?
    (There may be a super simple response to this that I just don't know, but this doesn't seem possible to me)

  2. Looking at your example a bit closer, however, I noticed something I missed originally. I do think now, that our ID detection regex is (arguably) a bit broken.
    Let's look at your example: /add/3/5
    I don't fully agree we should report: /add/:num_a/:num_b (mostly because i'm not sure we can do that)
    However, I agree that what you are getting: /add/:id/5 is wrong.
    In my mind, you should be getting /add/:id/:id

Basically, in my mind, any part of the path that is a number delimited by slashes should be turned to :id.
Our current regex doesn't do this, and i'm not sure whether it is intentional.
I'm expecting it isn't intentional, but no one found out because it's unusual to have routes with 2 consecutive numbers like that.
Most routes with 2 IDs in the are generally of the form /posts/3/comments/5, which our regex correctly turns into /posts/:id/comments/:id.

So I think we should fix the Regex to pick up all these groups of numbers delimited by slashes, and replace all of them with :id


The Regex in question: .gsub(%r{/\d+([/$])}, '/:id\\1')
This finds "a slash, a bunch of digits, and either a slash or end of string" (capturing that last bit)
And replaces with "slash, :id, and the captured 'slash or end of string'"

The problem with this is that it consumes the second slash, so the next numerical parameter doesn't match anymore.
In a path with many consecutive numbers separated by slashes, it picks "every other number"

If we change the regex to .gsub(%r{/\d+(?=/|$)}, '/:id') instead, the look-ahead means we don't consume the second slash, and it's available for consuming again for the next part of the path.

Example:

irb(main):029:0> "/add/0/1/2/3/4/5/6".gsub(%r{/\d+(/|$)}, '/:id\\1')
=> "/add/:id/1/:id/3/:id/5/:id"
irb(main):030:0> "/add/0/1/2/3/4/5/6".gsub(%r{/\d+(?=/|$)}, '/:id')
=> "/add/:id/:id/:id/:id/:id/:id/:id"

I propose that we change the Regex to do that, so that all "number parts" in the string will be recognized as an :id, even if there are 2 consecutive ones. (And of course, doing the same for the GUID matcher which I haven't tested but looks like it'll behave the same.

@Sinjo what do you think about this second proposed change?
(assuming we don't get a satisfying way to match to routes)

@dmagliola dmagliola reopened this Jan 7, 2022
@erez-rabih
Copy link
Author

thanks for opening this issue again
regarding your comments:

  1. I assumed the route matching middleware injects the matching route pattern somewhere as a key in the ENV map. If that's not the case then indeed we have no way to receive this information unless the route matching middleware is changed to do so. I'll try to check if such information exists.
  2. the new solution seems to do better than the current one but still would miss on cases like /users/:username. It's definitely an improvement but IMO not a complete solution.

I'll try to check regarding 1 and update here if I find anything we can use

@erez-rabih
Copy link
Author

@dmagliola so I checked it on the sinatra repo and there's a ENV['sinatra.route'] available
In my case, I received the following

{"sinatra.route"=>"GET /add/:num_a/:num_b"}

we can use this path pattern for the path value when reporting to prometheus instead of reverse-engineering the path pattern given an actual path

what do you think?

@dmagliola
Copy link
Collaborator

Is there an equivalent for Rails?
Should we only support this feature for Sinatra, and then if that ENV var is not there, fall back to our current logic? (possibly with the improvement to the Regex mentioned above)

@Sinjo do you have any thoughts?
Either of there changes would be "slightly breaking" as they can change labels for existing metrics, so now would be a great time to decide this :D

@Sinjo
Copy link
Member

Sinjo commented Jan 28, 2022

I think it would be nice to do something like this on a best-effort basis (obviously dependant on us being provided the route that matched by the web framework).

Given Rails' popularity it would be awesome to have similar support for it built in. I'm not sure off the top of my head if it injects anything like this into ENV, so I'd have to try it and see.

I'm not sure if there are other frameworks worth supporting? Do people use Grape or Hanami in significant enough numbers for us to try and add those?

@Sinjo
Copy link
Member

Sinjo commented Jan 28, 2022

Oh, to be clear: even if we can't find anything similar for Rails I still think we should make it nicer for Sinatra. I don't think most people spend their days switching between the two, so it doesn't seem like it would be all that jarring.

We can always ask the Rails team/send a PR as a proposal to them to inject this info.

@Sinjo
Copy link
Member

Sinjo commented Jan 29, 2022

@dmagliola So I've been playing around with this a bit.

I think that what @erez-rabih has suggested is the best solution for Sinatra and we should go ahead with that for 3.0.0 (i.e. squeeze this in while we're making other breaking changes).

I've had a bit of a play with a sample Rails app, and I can't see anything similar in request.env. In some ways, they go a bit further and give you the fully deconstructed params like:

"action_dispatch.request.path_parameters": {
    "controller": "example",
    "action": "show",
    "id": "7"
},

but I can't see anything would give us the original route definition. Rails.application.routes.recognize_path (or its equivalent from older Rails versions - Rails.application.routes.recognize_path) returns the same value in action_dispatch.request.path_parameters. I think we'd need to raise an issue in Rails and send them a PR if they agree to add something similar (if it's even reasonable to do with how Rails' route matching works - I don't know the internals well enough to say).

In digging into this, I did notice something about our implementation of strip_ids_from_path, which I hadn't quite understood from the original issue. Our gsub doesn't replace consecutive numeric parts in a path. For example:

$ bx rails c
Loading development environment (Rails 7.0.1)
irb(main):001:0> "/example/1/2/3/4".gsub(%r{/\d+(/|$)}, '/:id\\1')
=> "/example/:id/2/:id/4"

I'm not sure if this was intentional, but if not I can definitely see how we ended up here. Most URL paths for show routes are made up of either a single resource followed by an ID, or an alternating pattern of <resource>/<id>/<nested_resource>/<id>.... It's plausible we never tested the case of two consecutive ID-like path segments, and it's possibly something we don't want to try and second guess in this middleware even now that we're talking about it.

My current leaning is:

  • For Sinatra specifically, use sinatra.route (which I think we still need to prepend SCRIPT_NAME to, but we should check when implementing it).
  • Don't change anything else now. Speak to the Rails maintainers and see what they think about passing this information into request.env.

@Sinjo
Copy link
Member

Sinjo commented Jan 29, 2022

I decided to have a look at Grape, since it's easy enough to set a small app up. It looks like there is an equivalent we can do there if we particularly feel like it, but it's a little less direct than the Sinatra one:

request.env["grape.routing_args"][:route_info].pattern.origin
=> "/example/:id"

@Sinjo
Copy link
Member

Sinjo commented Jan 29, 2022

I also tried Hanami, and their app generator throws an exception, and that is exactly how much time I am willing to give it:

$ gem install hanami
...
$ hanami new sample-hanami
/home/sinjo/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/dry-types-0.12.3/lib/dry/types/definition.rb:33:in `initialize': wrong number of arguments (given 3, expected 1..2) (ArgumentError)

@Sinjo
Copy link
Member

Sinjo commented Jan 29, 2022

@dmagliola Oh, I'd totally skipped over your second point in this comment before investigating our regex and coming to the same conclusion. 🙈

I don't feel all that strongly either way about it. It's maybe a slight improvement to the heuristic we use, but we're getting deeper into guesswork territory about people's route structures.

@Sinjo
Copy link
Member

Sinjo commented Jan 31, 2022

After a quick chat with @dmagliola, we've agreed on the following:

  • For Sinatra and Grape, we'll override the matching route with the value those frameworks inject into request.env
  • For everything else, we'll amend the regex used in strip_ids_from_path so that it works on every path segment, rather than only alternating ones
  • We'll raise an issue with Rails and see if they're open to supporting a similar feature (injecting the matched path)

We're going to slightly delay 3.0.0 to get this in. I'll hopefully have time to make a PR some time this week and we can cut a release soon after that's merged.

@Sinjo
Copy link
Member

Sinjo commented Feb 1, 2022

Everything we can implement at the moment is in #245, and I've opened a thread on the rubyonrails-core discussion board.

@Sinjo Sinjo closed this as completed in #245 Feb 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants