Validate whether a password has already been compromised, using an on-premises service.
This service is meant for people and organizations that want to protect their users from using already compromised passwords without exposing any information (the password hash, or even a part of it) to a third-party service such as https://haveibeenpwned.com/. It uses the same dataset as haveibeenpwned, but locally, since haveibeenpwned provides the complete dataset for download.
This service is created for and used in production by NewReleases.
For any online service, NIST SP 800-63B guidelines state that user-provided passwords should be checked against existing data breaches.
This service provides a command-line tool that runs an HTTP API service to validate whether a specific password has been compromised and how many times.
Its initial setup is not trivial, as it requires generating a database from a publicly available data collection; various options are provided to reduce the database size.
Compromised service binaries have no external dependencies and can just be copied and executed locally.
Binary downloads of the Compromised service can be found on the Releases page.
To install on Linux:
wget https://github.com/janos/compromised/releases/latest/download/compromised-linux-amd64 -O /usr/local/bin/compromised
chmod +x /usr/local/bin/compromised
You may need additional privileges to write to /usr/local/bin, but the file can be saved at any location that you want.
Supported operating systems and architectures:
- macOS 64bit: darwin-amd64
- macOS ARM 64bit: darwin-arm64
- Linux 64bit: linux-amd64
- Linux 32bit: linux-386
- Linux ARM 64bit: linux-arm64
- Linux ARM 32bit: linux-armv6
- Windows 64bit: windows-amd64
- Windows 32bit: windows-386
This tool is implemented in the Go programming language and can also be installed with a go install command:
go install resenje.org/compromised/cmd/compromised@latest
This service does not distribute any passwords or password hashes. It relies on the validity of the data provided by https://haveibeenpwned.com/Passwords and provides a command to generate a searchable database from that data.
It provides an HTTP server with a JSON API endpoint that validates whether a password has been compromised and how many times.
To use the service, first generate the database, then start the service with that database loaded.
Descriptions of available commands and flags can be printed with:
compromised -h
USAGE
compromised [options...] [command]
Executing the program without specifying a command will start a process in
the foreground and log all messages to stderr.
COMMANDS
daemon
Start program in the background.
stop
Stop program that runs in the background.
status
Display status of a running process.
config
Print configuration that program will load on start. This command is
dependent of -config-dir option value.
debug-dump
Send to a running process USR1 signal to log debug information in the log.
index-passwords
Generate passwords database from pwned passwords sha1 file.
version
Print version to Stdout.
OPTIONS
-config-dir string
Directory that contains configuration files.
-h Show program usage.
Flags of the index-passwords command can be printed with:
compromised index-passwords -h
USAGE
index-passwords [input filename] [output directory]
OPTIONS
-h Show program usage.
-hash-counting string
Store approximate hash counts. Possible values: exact, approx, none. (default "exact")
-min-hash-count uint
Skip hashes with counts lower than specified with this flag. (default 1)
-shard-count int
Split hashes into a several files. Possible values: 1, 2, 4, 8, 16, 32, 64, 128, 256. (default 32)
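Download the Pwned Passwords SHA1 file, ordered by hash, as a 7z archive from https://haveibeenpwned.com/Passwords. This file is several gigabytes in size (version 6 is 10.1GB), so make sure that you have enough disk space. The command below downloads version 8, while the examples that follow use version 6 file names; adjust the names to match the version you download.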
wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v8.7z
Extract the text file from the downloaded 7z archive. This file is roughly twice the size of the 7z archive that contains it, around 24GB for version 6. After extraction, the 7z archive can be removed.
Generate the database with the following command:
compromised index-passwords \
pwned-passwords-sha1-ordered-by-hash-v6.txt \
compromised-passwords-db
This command reads the content of the pwned-passwords-sha1-ordered-by-hash-v6.txt file (make sure that you enter the correct path to it) and stores indexes in a fast, searchable database in the compromised-passwords-db directory. The index-passwords command creates the directory itself and stops if the directory already exists. The resulting database is expected to be around 12GB.
By default, all hashes are stored and indexed into 32 files called shards. The database size can be reduced with two optional CLI flags: --hash-counting and --min-hash-count.
For example:
compromised index-passwords \
--hash-counting approx \
--min-hash-count 10 \
pwned-passwords-sha1-ordered-by-hash-v6.txt \
compromised-passwords
The --hash-counting flag with the approx value stores approximate hash counts: small values, up to around 17, are stored exactly, while larger values are less precise (with a variance of around 5%), but close enough to estimate password popularity. With this option, the complete database is 9.7GB.
The --hash-counting flag with the none value does not store hash counts, and the API always returns 1 as the count of a compromised password. With this option, the complete database is 9.3GB.
The --min-hash-count flag takes a number and filters out all password hashes that have been compromised fewer times than specified. This reduces the size of the database by excluding less frequently used passwords. For example, --min-hash-count 2, which excludes only passwords with a count of 1, reduces the database size to 7.6GB; --min-hash-count 5 reduces it to 1.9GB, and --min-hash-count 10 to 800MB.
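The filtering rule can be sketched in Go. Lines of the downloaded text file have the form SHA1:COUNT with an uppercase hex hash; the sketch parses such a line and keeps it only when the count is at least the --min-hash-count value. parseLine and keep are hypothetical helpers, not the index-passwords implementation, and the sample line is only an example.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseLine parses one "SHA1:COUNT" line of the pwned passwords file.
func parseLine(line string) (hash [20]byte, count uint64, err error) {
	hexHash, countStr, ok := strings.Cut(strings.TrimSpace(line), ":")
	if !ok {
		return hash, 0, errors.New("missing ':' separator")
	}
	b, err := hex.DecodeString(hexHash) // accepts upper- and lowercase hex
	if err != nil {
		return hash, 0, err
	}
	if len(b) != len(hash) {
		return hash, 0, fmt.Errorf("hash has %d bytes, want %d", len(b), len(hash))
	}
	copy(hash[:], b)
	count, err = strconv.ParseUint(countStr, 10, 64)
	return hash, count, err
}

// keep reports whether a hash with the given count survives --min-hash-count:
// hashes with counts lower than the threshold are skipped.
func keep(count, minHashCount uint64) bool {
	return count >= minHashCount
}

func main() {
	lines := []string{
		"7C222FB2927D828AF22F592134E8932480637C0D:2996082",
		"not a valid line",
	}
	const minHashCount = 10 // value that would be passed with --min-hash-count
	for _, line := range lines {
		h, count, err := parseLine(line)
		if err != nil {
			fmt.Println("skipping malformed line:", err)
			continue
		}
		fmt.Printf("%x... count=%d indexed=%v\n", h[:3], count, keep(count, minHashCount))
	}
}
```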
You can combine these two options according to available capacity and the level of security and information that you want to provide.
Service configuration is stored in the compromised.yaml configuration file in the /etc/compromised directory by default. You can change the directory with the --config-dir flag:
compromised --config-dir /data/config/compromised
All available options and their default values can be printed with:
compromised config
# compromised
---
listen: :8080
listen-instrumentation: 127.0.0.1:6060
headers:
  Server: compromised/0.1.0-6ed439e-dirty
  X-Frame-Options: SAMEORIGIN
passwords-db: ""
log-dir: ""
log-level: DEBUG
syslog-facility: ""
syslog-tag: compromised
syslog-network: ""
syslog-address: ""
access-log-level: DEBUG
access-syslog-facility: ""
access-syslog-tag: compromised-access
daemon-log-file: daemon.log
daemon-log-file-mode: "644"
pid-file: /var/folders/l4/tn9ytbgs5xx76lshwgx5bj1w0000gn/T/compromised.pid
# config directories
---
- /etc/compromised
- /Users/janos/Library/Application Support/compromised
The service can be configured with environment variables as well. Variable names can be constructed based on the keys in configuration files.
For variables in compromised.yaml, capitalize all letters, replace - with _, and prepend the COMPROMISED_ prefix. For example, to set passwords-db, the environment variable is COMPROMISED_PASSWORDS_DB:
COMPROMISED_PASSWORDS_DB=/path/to/passwords-db compromised
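The naming rule for environment variables can be expressed as a small Go function. envVarName is a hypothetical helper for illustration, not part of the compromised code base.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarName derives the environment variable name for a compromised.yaml
// key: replace "-" with "_", capitalize, and prepend "COMPROMISED_".
func envVarName(key string) string {
	return "COMPROMISED_" + strings.ToUpper(strings.ReplaceAll(key, "-", "_"))
}

func main() {
	for _, key := range []string{"passwords-db", "listen", "log-dir", "listen-instrumentation"} {
		fmt.Println(key, "->", envVarName(key))
	}
}
```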
Executing the program without specifying a command will start a process in the foreground and log all messages to stderr:
compromised
The service requires the passwords-db directory to be specified:
cat /etc/compromised/compromised.yaml
passwords-db: /data/storage/compromised/passwords
To write logs to files on local filesystem:
cat /etc/compromised/compromised.yaml
passwords-db: /data/storage/compromised/passwords
log-dir: /data/log/compromised
Paths in configuration files are given only as examples.
The service can run in the background and manage itself with these commands:
compromised daemon
compromised status
compromised stop
Alternatively, you can use a process manager. For example, this is a systemd service file:
[Unit]
Description=Compromised
After=network.target
[Service]
ExecStart=/usr/local/bin/compromised
ExecStop=/bin/kill $MAINPID
KillMode=none
Restart=on-failure
RestartPreventExitStatus=255
LimitNOFILE=65536
PrivateTmp=true
NoNewPrivileges=true
[Install]
WantedBy=default.target
To minimize the exposure of the passwords that are checked, the API accepts only the SHA1 hash of a password.
First calculate the hash (use printf, not echo, because echo appends a newline):
printf 12345678 | sha1
7c222fb2927d828af22f592134e8932480637c0d
Then make an HTTP request like this:
curl http://localhost:8080/v1/passwords/7c222fb2927d828af22f592134e8932480637c0d
{"compromised":true,"count":2996082}
Make sure that the port matches the one configured with the listen option.
Or, if you choose a strong password that has not been compromised:
printf "my not compromised password" | sha1sum
d391477a0849048fc28e62850a25518d72afd013
Then the HTTP response will look like this:
curl http://localhost:8080/v1/passwords/d391477a0849048fc28e62850a25518d72afd013
{"compromised":false}
Besides the main API, there is another API endpoint, by default available on port 6060 and only on localhost, which exposes some instrumentation information about the service:
- Prometheus metrics: http://localhost:6060/metrics
- Most basic health check endpoint: http://localhost:6060/status
- Most basic JSON health check endpoint: http://localhost:6060/api/status
- Go pprof: http://localhost:6060/debug/pprof/
The instrumentation API can be disabled by setting an empty value for the listen-instrumentation configuration option in /etc/compromised/compromised.yaml:
listen-instrumentation: ""
As this service is written in Go, it provides an HTTP client package, as well as a package that loads the database directly in your own application if you do not want to run a separate compromised service.
Using the HTTP client package:
package main

import (
	"context"
	"crypto/sha1"
	"fmt"

	httppasswords "resenje.org/compromised/pkg/passwords/http"
)

func main() {
	// URL with host and port where the compromised service is listening
	s, err := httppasswords.New("http://localhost:8080", nil)
	if err != nil {
		panic(err)
	}
	// only the SHA1 sum of the password is sent to the service
	c, err := s.IsPasswordCompromised(context.Background(), sha1.Sum([]byte("my password")))
	if err != nil {
		panic(err)
	}
	fmt.Println("this password has been compromised", c, "times")
}
Loading the database directly in your application:
package main

import (
	"context"
	"crypto/sha1"
	"fmt"

	filepasswords "resenje.org/compromised/pkg/passwords/file"
)

func main() {
	// open the database generated with the index-passwords command
	s, err := filepasswords.New("/path/to/passwords-db")
	if err != nil {
		panic(err)
	}
	defer s.Close()
	c, err := s.IsPasswordCompromised(context.Background(), sha1.Sum([]byte("my password")))
	if err != nil {
		panic(err)
	}
	fmt.Println("this password has been compromised", c, "times")
}
The database stores SHA1 hashes in binary format, together with their associated count values. A database is generated once and can only be used in read-only mode.
Each SHA1 hash is 20 bytes long and is split into a 3-byte partition prefix and a 17-byte remainder. This categorizes hashes into 16777216 partitions (2^24, the count of all 3-byte integers).
All hash remainders are stored in multiple files called shards, named hashes-*.db, where * is a base36-encoded positive integer. The shard count, shardCount, is configurable and can be 1, 2, 4, 8, 16, 32, 64, 128 or 256. The shard number for a particular hash is determined by its first byte with the formula byte*shardCount/256 (that is, byte/256*shardCount computed without integer truncation), which ensures that every shard contains the same number of partitions, assigned in consecutive order.
Database files are db.json, index.db and a series of hashes-*.db.
File db.json stores JSON-encoded meta information about the database.
File index.db stores information where a partition of hashes with a common prefix can be found in a particular hashes-*.db shard.
Files hashes-*.db store hash remainders and the count values associated with every hash.
File index.db stores an array of 16777216 + shardCount 32-bit integers, each representing either a shard start or a single partition. In other words, index.db associates a number with every possible partition, and that number is the index of the partition's last hash in the shard file that the partition belongs to.
The binary file index.db consists of an array of big-endian encoded 32-bit unsigned integers. Each integer represents either the start of a shard, with the value 0x00000000, or the last hash index of a particular partition in a particular shard file.
   4 bytes
+----------+
+----------+
|0x00000000| shard 0 start
+----------+
|          | shard 0, partition 0 end
+----------+
|   ...    |
+----------+
|          | shard 0, partition n end
+----------+
|   ...    |
+----------+
|          | shard 0, partition (16777216/shardCount)-1 end
+----------+
|0x00000000| shard 1 start
+----------+
|          | shard 1, partition 16777216/shardCount end
+----------+
|   ...    |
+----------+
|          | shard 1, partition (16777216/shardCount)+n end
+----------+
|   ...    |
+----------+
|          | shard 1, partition (16777216/shardCount)*2-1 end
+----------+
|   ...    |
+----------+
|          | shard shardCount-1, partition 16777215 end
+----------+
This structure makes the index.db file (16777216 + shardCount) * 4 bytes long, ranging from 64MB plus 4 bytes (shardCount 1) to 64MB plus 1024 bytes (shardCount 256); the length does not depend on the number of hashes.
This structure is justified as every partition contains at least one compromised password hash.
The limitation is that every shard can contain up to 4,294,967,296 hashes (the count of unsigned 32-bit integers), so with the maximal shardCount of 256, the database can contain up to 1,099,511,627,776 hashes. These values are large enough compared with the current number of compromised hashes, 572,611,621, to assume that the format will support the growth of the dataset in the foreseeable future.
The binary hashes-*.db files consist of an array of two-part elements. The first part is a fixed-size, 17-byte SHA1 remainder. The second part holds the count of the hash that the remainder belongs to; its size, countSize, is fixed for every database but configurable at the indexing stage, based on the needed precision:
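The size and capacity arithmetic above can be checked with a short Go program. indexFileSize and maxHashes are illustrative helpers, not part of the project; the average per partition uses the hash count quoted in the previous paragraph.

```go
package main

import "fmt"

const partitionCount = 1 << 24 // 16,777,216 three-byte prefixes

// indexFileSize returns the length of index.db in bytes: one 32-bit entry per
// partition plus one shard-start entry per shard.
func indexFileSize(shardCount int) int64 {
	return int64(partitionCount+shardCount) * 4
}

// maxHashes returns the upper bound on stored hashes: each shard is addressed
// with 32-bit indexes, so it can hold up to 2^32 remainders.
func maxHashes(shardCount int) uint64 {
	return uint64(shardCount) << 32
}

func main() {
	for _, n := range []int{1, 32, 256} {
		fmt.Printf("shardCount %3d: index.db %d bytes, capacity %d hashes\n",
			n, indexFileSize(n), maxHashes(n))
	}
	fmt.Printf("average hashes per partition: %.1f\n", 572611621.0/partitionCount)
}
```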
- exact - a big-endian encoded 32-bit unsigned integer - countSize is 4 bytes
- approx - an 8-bit approximation value - countSize is 1 byte
- none - the count value is not stored - countSize is 0 bytes
17 bytes countSize
+-----------+-----------+
+-----------+-----------+
| remainder | count | hash 1
+-----------+-----------+
| remainder | count | ...
+-----------+-----------+
| remainder | count | hash n
+-----------+-----------+
A query on the database determines whether a particular SHA1 hash is in the database and which count value is associated with it.
The uniform distribution of SHA1 hashes allows the described database structure to be efficient in finding if the hash is present in the database or not.
The query for a particular hash starts with identifying which shard and partition that hash should belong to.
The shard is calculated with the formula byte*shardCount/256, where byte is the first byte of the hash, shardCount is read from the db.json file, and 256 is the number of values of a byte (unsigned 8-bit integer), which is also the maximal number of supported shards.
The partition number is the 24-bit unsigned integer decoded, in big-endian order, from the first 3 bytes of the hash.
The index.db file is read at two consecutive positions, the partition number plus the shard number (which skips the preceding shard start entries) and the next one, which gives the range of remainder positions of that partition in the shard file.
The shard number identifies which shard file is read at those remainder positions. The remainders are read sequentially, and each is compared with the last 17 bytes of the hash. A partition holds 34 hashes on average, so a lookup needs at most about 34 comparisons on average; the partition size of 3 bytes is chosen as optimal for the number of hashes in the pwned passwords list for exactly this reason. If a match is found, the count is decoded from the second part of the hashes file element.
To see the current version of the binary, execute:
compromised version
Each version is tagged, and the version is updated accordingly in the version.go file.
Read the contribution guidelines.
This application is distributed under the BSD-style license found in the LICENSE file.