ClaudeBot causing collection#show performance problems #4130
Comments
This looks like another ill-behaved spider problem. During a 3-hour window (the first half of the graph above) we saw
2.7.3 :010 > sample = show_events.sample(100)
=> [#<Ahoy::Event id: 298399469, visit_id: 209692590, user_id: nil, name: "collection#show", properties: {"collecti...
2.7.3 :011 > sample.map{|event| event.visit.browser}.tally
=> {"ClaudeBot"=>96, "Chrome Mobile"=>4}
2.7.3 :012 > sample.map{|event| event.visit.user_agent}.tally
=> {"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"=>96, "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.65 Mobile Safari/537.36 (compatible; GoogleOther)"=>4}
2.7.3 :014 > pp sample.map{|event| event.visit.ip}.tally
{"3.138.172.82"=>1,
"3.142.166.143"=>2,
"3.145.49.160"=>1,
"3.147.68.236"=>1,
"18.220.148.149"=>1,
"3.21.46.181"=>1,
"3.21.234.238"=>2,
"3.137.163.44"=>1,
"3.129.67.59"=>1,
"3.129.21.37"=>1,
"3.146.255.113"=>1,
"66.249.66.3"=>1,
"18.219.244.12"=>1,
"18.191.225.36"=>1,
"66.249.66.22"=>2,
"3.128.226.23"=>1,
"3.144.2.77"=>1,
"18.118.210.41"=>1,
"18.224.31.50"=>1,
"3.142.196.223"=>1,
"3.142.42.136"=>1,
"18.118.19.223"=>1,
"3.145.41.49"=>1,
"3.147.238.61"=>1,
"3.12.102.118"=>1,
"18.188.180.136"=>1,
"3.138.69.172"=>1,
"3.145.34.109"=>1,
"52.14.239.105"=>1,
"3.144.151.46"=>1,
"3.23.126.48"=>1,
"18.118.159.232"=>1,
"18.223.188.202"=>1,
"3.22.77.79"=>1,
"3.147.43.233"=>1,
"18.219.190.220"=>1,
"3.147.140.206"=>1,
"18.217.232.226"=>1,
"3.141.47.178"=>1,
"18.117.154.219"=>1,
"52.14.6.128"=>1,
"18.223.205.116"=>1,
"3.17.181.181"=>1,
"18.221.117.51"=>1,
"18.119.119.137"=>1,
"18.220.164.222"=>1,
"18.224.19.167"=>1,
"18.222.12.201"=>1,
"52.15.137.232"=>1,
"3.14.64.41"=>1,
"52.14.196.241"=>1,
"3.17.29.195"=>2,
"3.148.145.236"=>1,
"18.118.135.213"=>1,
"18.219.73.146"=>1,
"13.58.53.247"=>1,
"18.223.211.51"=>1,
"13.58.242.200"=>1,
"18.191.216.97"=>1,
"3.142.250.95"=>1,
"3.14.149.176"=>1,
"52.14.105.188"=>1,
"18.188.90.132"=>1,
"3.12.152.21"=>1,
"3.22.61.226"=>1,
"3.22.118.21"=>1,
"52.15.233.135"=>1,
"3.144.237.77"=>2,
"3.133.131.32"=>1,
"18.119.133.72"=>1,
"3.16.25.91"=>1,
"18.216.27.23"=>1,
"3.16.67.33"=>1,
"3.128.197.221"=>1,
"52.15.48.176"=>1,
"3.144.211.45"=>1,
"18.118.28.34"=>1,
"3.128.201.209"=>1,
"18.117.231.6"=>1,
"66.249.66.23"=>1,
"3.145.204.201"=>1,
"13.58.82.135"=>1,
"3.129.18.221"=>1,
"18.117.102.235"=>1,
"18.117.172.160"=>1,
"18.191.193.231"=>1,
"3.146.176.193"=>1,
"3.133.103.100"=>1,
"3.15.212.91"=>1,
"3.134.85.72"=>1,
"18.223.206.160"=>1,
"3.145.78.155"=>1,
"3.144.187.55"=>1,
"18.219.116.162"=>1,
"3.138.246.227"=>1}
It looks like ClaudeBot is getting confused by facets, since each visit in the sample has a unique landing page URL, which looks something like this: https://fromthepage.com/digitalindy/ipr?search%5Bs2%5D%5B%5D=IPR-Box001_077.jpg&search%5Bs2%5D%5B%5D=IPR-Box016_230.jpg&search%5Bs2%5D%5B%5D=IPR-Box021_193&search%5Bs2%5D%5B%5D=IPR-Box026_863.jpg&search%5Bwork-collection_id%5D=25000140
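To make the facet problem concrete, the query string of the landing page URL quoted above can be decoded with Ruby's standard library. This is only an illustrative sketch: the helper name `facet_params` is hypothetical and is not part of the application.

```ruby
require "uri"

# Landing page URL copied from the sample visit quoted above.
LANDING_URL = "https://fromthepage.com/digitalindy/ipr?" \
  "search%5Bs2%5D%5B%5D=IPR-Box001_077.jpg&" \
  "search%5Bs2%5D%5B%5D=IPR-Box016_230.jpg&" \
  "search%5Bs2%5D%5B%5D=IPR-Box021_193&" \
  "search%5Bs2%5D%5B%5D=IPR-Box026_863.jpg&" \
  "search%5Bwork-collection_id%5D=25000140"

# Hypothetical helper: decode the query string and group values by parameter name.
def facet_params(url)
  query = URI.parse(url).query.to_s
  URI.decode_www_form(query)
     .group_by(&:first)
     .transform_values { |pairs| pairs.map(&:last) }
end

params = facet_params(LANDING_URL)
params.each { |key, values| puts "#{key}: #{values.join(', ')}" }
```

The same `search[s2][]` parameter appears four times with different values. Every distinct combination of facet values is a different URL to a crawler, so following facet links can produce a very large number of distinct collection#show requests.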
Several other bots are also crawling, including Bytespider, which does not, however, identify itself by browser:
Added additional agents based on this:
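Since some crawlers are not recognized in the parsed browser field, one option is to match on the raw User-Agent string instead. The sketch below is a hypothetical illustration, not the agent list that was actually added: `BOT_PATTERNS` and `bot_user_agent?` are made-up names, and it assumes each bot's name appears somewhere in its User-Agent header.

```ruby
# Hypothetical pattern list; only the ClaudeBot and Bytespider names come from this thread.
BOT_PATTERNS = [/ClaudeBot/i, /Bytespider/i].freeze

# Returns true when the User-Agent string matches any listed crawler pattern.
def bot_user_agent?(user_agent)
  ua = user_agent.to_s # tolerate a missing (nil) header
  BOT_PATTERNS.any? { |pattern| pattern.match?(ua) }
end

# User-Agent strings taken from the sample tally above.
claudebot = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; " \
  "ClaudeBot/1.0; +claudebot@anthropic.com)"
google_other = "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.65 Mobile " \
  "Safari/537.36 (compatible; GoogleOther)"

puts bot_user_agent?(claudebot)
puts bot_user_agent?(google_other)
```

A predicate like this could serve as the matching condition in a throttle or blocklist rule.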
The throttling does not appear to be working. To test, execute this from a laptop and watch for 529 response codes:
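One way to run this kind of check is a small script that repeats a request and tallies the status codes it gets back. This is a sketch and not the command originally posted here: `tally_statuses` and `probe` are hypothetical names, and the target URL and request count are placeholders to adjust.

```ruby
require "net/http"
require "uri"

# Run the given block `count` times and tally the status codes it returns.
def tally_statuses(count)
  Array.new(count) { yield }.tally
end

# Hypothetical probe: request one URL repeatedly and count the response codes.
# Net::HTTPResponse#code returns the status as a string, e.g. "200".
def probe(url, count: 50)
  uri = URI(url)
  tally_statuses(count) { Net::HTTP.get_response(uri).code }
end

# Example call (placeholder URL, not executed here):
#   pp probe("https://example.org/some/collection", count: 100)
```

If throttling is active, the tally should show the throttled status code once the limit is exceeded. Rack::Attack responds with 429 by default, so a different code such as 529 would have to be configured in the initializer.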
This is still not working, even after adding a blocklist. It looks like our addition of Production
Development
Success!
…nitializer Externalize rack attack initializer #4130
The collection#show action is getting hammered, possibly by bots. This seems to have brought performance to a standstill at different times of day: