fix: identify google page renderer user agent as a bot #2167

Merged on Mar 15, 2023

Collaborator

@halfwhole halfwhole commented Mar 8, 2023

Problem

TL;DR: our server wrongly classifies Google's page renderer bot as a "Desktop" user, when it should be classified under "Others". Full story:

  • Recently, a user sent a Go link to many recipients via SMS. Their link statistics then showed far more hits than expected, with 90% of them coming from desktop, even though all of their actual human users should have been on mobile (since they were opening the link from SMS).
  • Our logs show, from the user-agent string, that these requests were made by the "Google-PageRenderer" bot, whose purpose is presumably to automatically render pages from links sent by SMS. Because the bot-handling logic on Go is incomplete, these hits were wrongly categorised under "Desktop" when they should have been categorised under "Others".

Solution

Change the BOTS_USER_AGENTS expression to include Google-PageRenderer.

This might not hold up in the long term, though. The current bot check was implemented in #209: a custom-written BOTS_USER_AGENTS expression determines whether a user-agent is a bot. Naturally, the expression cannot exhaustively cover all possible bots, so many other bots fall through the cracks (like this Google page renderer bot).
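For illustration, the kind of change described above can be sketched roughly as follows. This is a minimal sketch and not the actual code from this PR: the contents of the pattern, the `DeviceType` type, the `classifyDevice` helper, and the mobile check are all hypothetical; only the BOTS_USER_AGENTS name and the "Google-PageRenderer" token come from the description.

```typescript
// Hypothetical pattern: the real BOTS_USER_AGENTS expression in the codebase
// may list different tokens. The point is only that "Google-PageRenderer"
// is added as one more alternative.
const BOTS_USER_AGENTS = /bot|crawler|spider|Google-PageRenderer/i

// Hypothetical device categories, named after the ones in the description.
type DeviceType = 'Desktop' | 'Mobile' | 'Others'

// Hypothetical classifier: bots are checked first so that they land in
// "Others" instead of falling through to the "Desktop" default.
function classifyDevice(userAgent: string): DeviceType {
  if (BOTS_USER_AGENTS.test(userAgent)) return 'Others'
  // Assumed, simplified mobile check for the sake of the example.
  if (/Mobi|Android|iPhone/i.test(userAgent)) return 'Mobile'
  return 'Desktop'
}
```

Without the extra alternative, a user-agent that otherwise looks like desktop Chrome on Linux but carries the "Google-PageRenderer" token would match none of the bot tokens in this sketch and would be counted as "Desktop", which is the behaviour reported above.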

In the future, we may want to opt for a more comprehensive solution by using a library like isbot.

Contributor

@gweiying gweiying left a comment

lgtm!

@halfwhole halfwhole merged commit 2db8dfc into develop Mar 15, 2023
@halfwhole halfwhole deleted the fix/bot-google-page-renderer branch March 15, 2023 03:22