The AI/LLM bot blocker is a web server, firewall, and robots.txt config generator used in production by the Ichido Search Engine. Its configs block known large AI and LLM bots from accessing your site content, while still allowing classical search engines and legitimate users through. The following web servers, firewalls, and standards are supported:
Server/Firewall/Standard | Blocks By |
---|---|
Iptables | IP Addresses |
Apache | User-Agent |
Nginx | User-Agent |
Lighttpd | User-Agent |
Caddy | User-Agent |
IIS | User-Agent |
Robots.txt | User-Agent |
In total there are 16 variants of config files, of which you'll only need 2 with the recommended config (1 web server config and 1 robots.txt), or 3 with the non-recommended config (1 web server config, 1 robots.txt, and 1 firewall config):
- The recommended config will block most AI bots with a low false positive rate, while still allowing archival services and classical search engines access to site content.
- The non-recommended config aggressively blocks bots and site scrapers (including AI bots, classical search engines, and archival services), but will likely produce many false positives. The recommended config is the better choice for nearly all use cases.
The config files can be built manually from source, or prebuilt files can be downloaded from Ichido's file server. Recommended prebuilt files are prefixed with `recommended-` and non-recommended files with `nonrecommended-`. Below are instructions for applying the configurations.
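The download commands in this README all follow one URL pattern: a prefix, a target name, and the suffix `-block-ai-bots.conf`. As a minimal sketch, assuming every prebuilt file follows that pattern, the URL can be built from its parts. `ichido_config_url` is a hypothetical helper name, not part of the project, and the prefix is passed through exactly as you type it.

```shell
# Build the download URL for a prebuilt config file.
# Assumption: all files follow <prefix>-<target>-block-ai-bots.conf,
# as seen in the wget commands in this README.
#   $1: file prefix exactly as published (e.g. "recommended")
#   $2: target (e.g. "nginx", "apache", "robots")
ichido_config_url() {
    printf 'https://files.ichi.do/%s-%s-block-ai-bots.conf\n' "$1" "$2"
}

# Print the URL of the recommended nginx config.
ichido_config_url recommended nginx
```

The result can be passed straight to `wget`, for example `wget "$(ichido_config_url recommended nginx)"`.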
- Download the robots.txt file and add it to the root of your web content (it should be reachable at `https://<your_site>/robots.txt`).
wget https://files.ichi.do/recommended-robots-block-ai-bots.conf -O /var/www/html/<web_root>/robots.txt
For shared hosting, use the `.htaccess` file instructions below.
- Enable the rewrite module.
sudo a2enmod rewrite
- Download the config file into Apache's `conf-available` directory:
sudo wget https://files.ichi.do/recommended-apache-block-ai-bots.conf -O /etc/apache2/conf-available/block-ai-bots.conf
- Create a symbolic link to the config in `/etc/apache2/conf-enabled/`:
sudo ln -s /etc/apache2/conf-available/block-ai-bots.conf /etc/apache2/conf-enabled/
- Restart apache.
sudo service apache2 restart
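A syntax error in a newly added config can keep the server from coming back up after a restart. The sketch below guards the restart with a syntax check and only restarts when the check passes. `check_then_restart` is a hypothetical helper name, and the usage lines assume a Debian-style layout where `apache2ctl configtest` and `nginx -t` are available.

```shell
# Run a config syntax check and restart the service only if it passes.
#   $1:  service name (e.g. "apache2")
#   $2+: the check command and its arguments
check_then_restart() {
    service_name="$1"
    shift
    if "$@"; then
        sudo service "$service_name" restart
    else
        echo "config check failed; $service_name was not restarted" >&2
        return 1
    fi
}

# Usage (assumed check commands, adjust to your system):
#   check_then_restart apache2 sudo apache2ctl configtest
#   check_then_restart nginx sudo nginx -t
```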
- Download the config file.
sudo wget https://files.ichi.do/recommended-htaccess-block-ai-bots.conf
- Merge the config with your existing `.htaccess` file, either manually using your hosting provider tools or with this command.
cat .htaccess recommended-htaccess-block-ai-bots.conf > temp.conf
mv temp.conf .htaccess
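Running the merge command twice appends the rules twice, and it leaves no copy of the previous `.htaccess`. A minimal sketch of a more defensive merge is shown below, assuming you are free to wrap the rules in marker comments. `merge_htaccess` and the marker text are hypothetical and are not part of the downloaded file.

```shell
# Append a downloaded rules file to .htaccess once, keeping a backup.
#   $1: path to .htaccess
#   $2: path to the downloaded rules file
merge_htaccess() {
    htaccess="$1"
    rules="$2"
    # Hypothetical marker used only to detect an earlier merge.
    marker="# BEGIN ai-bot-blocker"

    [ -f "$rules" ] || { echo "missing rules file: $rules" >&2; return 1; }
    touch "$htaccess"

    # Skip if a previous run already appended the rules.
    if grep -qF -- "$marker" "$htaccess"; then
        echo "rules already present in $htaccess"
        return 0
    fi

    # Keep a copy of the current file, then append the rules between markers.
    cp "$htaccess" "$htaccess.bak"
    { echo; echo "$marker"; cat "$rules"; echo "# END ai-bot-blocker"; } >> "$htaccess"
}

# Usage:
#   merge_htaccess .htaccess recommended-htaccess-block-ai-bots.conf
```

The existing directives stay at the top of the file, as with the `cat` command above, and the backup is written to `.htaccess.bak`.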
- Download the config file into nginx's `modules-available` directory:
sudo wget https://files.ichi.do/recommended-nginx-block-ai-bots.conf -O /etc/nginx/modules-available/11-block-ai-bots.conf
- Include the config in your `server` blocks after the `listen` directives:
# Ichido AI Bot Blocker.
include /etc/nginx/modules-available/11-block-ai-bots.conf;
- Restart nginx.
sudo service nginx restart
- Download the config file into lighttpd's `conf-available` directory:
sudo wget https://files.ichi.do/recommended-lighttpd-block-ai-bots.conf -O /etc/lighttpd/conf-available/11-block-ai-bots.conf
- Create a symbolic link to the config in `/etc/lighttpd/conf-enabled/`:
sudo ln -s /etc/lighttpd/conf-available/11-block-ai-bots.conf /etc/lighttpd/conf-enabled/
- Restart lighttpd.
sudo service lighttpd restart
- Make directories to store caddy config files.
sudo mkdir -p /etc/caddy/conf-available/
sudo mkdir -p /etc/caddy/conf-enabled/
- Download the config file into `/etc/caddy/conf-available/`:
sudo wget https://files.ichi.do/recommended-caddy-block-ai-bots.conf -O /etc/caddy/conf-available/11-block-ai-bots.conf
- Create a symbolic link to the config in `/etc/caddy/conf-enabled/`:
sudo ln -s /etc/caddy/conf-available/11-block-ai-bots.conf /etc/caddy/conf-enabled/
- Import the config file in your site blocks. For example:
# Ichido AI Bot Blocker.
:80 {
import /etc/caddy/conf-enabled/11-block-ai-bots.conf
}
- Restart caddy.
sudo service caddy restart
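Once a web server config is active, it can help to confirm that requests are actually filtered by User-Agent. The sketch below prints the HTTP status code a URL returns for a given User-Agent string. `ua_status` is a hypothetical helper name, `GPTBot` is assumed (not confirmed here) to be on the block list, and the status code returned for blocked bots depends on the config you installed.

```shell
# Print the HTTP status code a URL returns for a given User-Agent.
#   $1: User-Agent string
#   $2: URL to request
ua_status() {
    curl -s -o /dev/null -A "$1" -w '%{http_code}\n' "$2"
}

# Usage: compare an assumed-blocked bot with a browser-like User-Agent.
#   ua_status "GPTBot" "https://<your_site>/"
#   ua_status "Mozilla/5.0" "https://<your_site>/"
```

If both requests return the same status code, the config is probably not being loaded for that site.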
IIS instructions: TODO
- Install `iptables-persistent`.
sudo apt-get install -y iptables-persistent
- Download the config file into `/etc/iptables/rules.v4`:
sudo wget https://files.ichi.do/non-recommended-iptables-block-ai-bots.conf -O /etc/iptables/rules.v4
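- Reload the firewall rules. On current Debian and Ubuntu releases, iptables-persistent runs as the netfilter-persistent service:
sudo service netfilter-persistent restart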
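Because the download step writes directly to `/etc/iptables/rules.v4`, any rules already saved in that file are replaced. A minimal sketch of taking a timestamped backup before the download is shown below. `backup_file` is a hypothetical helper name, and writing next to the original file assumes you run it from a root shell.

```shell
# Copy a file to a timestamped backup next to it and print the backup path.
# Does nothing (and prints nothing) if the source file does not exist.
#   $1: file to back up
backup_file() {
    src="$1"
    [ -f "$src" ] || return 0
    dest="$src.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$src" "$dest" && echo "$dest"
}

# Usage (from a root shell, before downloading the new rules):
#   backup_file /etc/iptables/rules.v4
```

After the download, `sudo iptables-restore --test < /etc/iptables/rules.v4` parses the new file without applying it, which catches a truncated or malformed download before the rules are loaded.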
For ease of contribution, this repo is hosted on GitHub and mirrored on Ichido's Software Forge. If you have a GitHub account, you can contribute using GitHub's standard workflow. If you do not, you can still contribute via email patches using the workflow below:
- Clone this repo:
git clone https://git.ichi.do/anthony/ai-bot-blocker
cd ai-bot-blocker
- Add your name and an email address to the locally cloned repo:
git config user.name "<name>"
git config user.email "<email>"
- Make changes to the source code.
- Add those changes and commit:
git add .
git commit -m "ADD: new commit."
- Create a patch file from the new commit:
# Use HEAD~1 for 1 commit, HEAD~2 for 2 commits, etc.
git diff HEAD~1 > diff.patch
- Send the patch file to <anthony.m.mancini@protonmail.com> through email.
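The `git diff` output in the workflow above carries the code changes but not the commit message or author. As an alternative sketch, `git format-patch` writes the same changes together with that metadata, so the recipient can apply them with `git am`. `make_patch` is a hypothetical helper name, and whether the maintainer prefers this format is an assumption.

```shell
# Write the last N commits to a single patch file, including commit
# messages and author information.
#   $1: number of commits to include (default: 1)
#   $2: output file (default: changes.patch)
make_patch() {
    count="${1:-1}"
    out="${2:-changes.patch}"
    git format-patch "-$count" HEAD --stdout > "$out"
}

# Usage, from inside the cloned repo:
#   make_patch 1 diff.patch
```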
(C) Anthony Mancini 2024. Licensed under the AGPL-3.0 (see LICENSE.txt).
- Anthony Mancini <anthony.m.mancini@protonmail.com>