Diskover v2 Install Guide

Chris Park edited this page Oct 20, 2021 · 104 revisions

Below is an install guide for diskover v2 and diskover-web v2. It is written for CentOS 7.x but can also serve as a rough guide for installing on Ubuntu or other Linux distros. If you are looking for documentation on how to use Diskover v2, see the v2 user guide.

Main requirements

  • Python 3.5+
  • Elasticsearch 7.x
  • PHP 7.x + PHP-FPM
  • Nginx

See hardware requirements.
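
The Python requirement above can be checked before starting the install. The following is a minimal sketch and not part of diskover: `py_version_ok` is a hypothetical helper name, and the check covers only the Python 3.5+ item from the list.

```shell
#!/bin/sh
# Sketch: verify the Python 3.5+ requirement listed above.
# py_version_ok is a hypothetical helper, not a diskover command.

# Succeeds if the given version string (e.g. "Python 3.6.8") is 3.5 or newer.
py_version_ok() {
    ver=${1#Python }      # drop the "Python " prefix printed by python3 -V
    major=${ver%%.*}      # text before the first dot
    rest=${ver#*.}        # text after the first dot
    minor=${rest%%.*}     # text before the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 5 ]; }
}

if command -v python3 >/dev/null 2>&1; then
    if py_version_ok "$(python3 -V 2>&1)"; then
        echo "python3: OK"
    else
        echo "python3: older than 3.5"
    fi
else
    echo "python3: not installed"
fi
```

On a fresh minimal install this is expected to report that python3 is not installed; it becomes useful again after the Python install step.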

Other notes

  • Disabling SELinux and using a software firewall are optional and not required to run diskover.
  • Internet access is required during install to download packages with yum.
  • Apache could be used instead of Nginx, but its setup is not covered in this guide.

Installation How-to - diskover

  1. Install CentOS 7.x (tested with CentOS 7.8 DVD iso using minimal install)
  2. Disable SELinux (optional and not required to run diskover; if you keep SELinux enabled, you will need to adjust its policies to allow diskover to run)
vi /etc/sysconfig/selinux
** set SELINUX=disabled
reboot now
  3. Update the server
yum -y update
  4. Install Java 8 JDK (OpenJDK) (required for ES)
yum -y install java-1.8.0-openjdk.x86_64
  5. Install Elasticsearch 7.x
yum install -y https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-x86_64.rpm
**Set JVM configuration (mem heap size)
vi /etc/elasticsearch/jvm.options
-Xms8g    ** set to 50% of Memory, up to 32g max
-Xmx8g    ** set to 50% of Memory, up to 32g max
**Set Firewall rules
firewall-cmd --add-port=9200/tcp --permanent
firewall-cmd --reload
**Update /etc/elasticsearch/elasticsearch.yml
network.host:   ** leave commented out for localhost (default) or uncomment and set to the ip you want to bind to, using "0.0.0.0" will bind to all ips
discovery.seed_hosts:   ** leave commented out for ["127.0.0.1", "[::1]"] (default) or uncomment and set to ["<host ip>"]
path.data:   ** set to fast SSD path or other fast disk
path.logs:   ** set to fast SSD path or other fast disk
bootstrap.memory_lock: true    *** uncomment
**Update elasticsearch systemd service settings
mkdir /etc/systemd/system/elasticsearch.service.d
vi /etc/systemd/system/elasticsearch.service.d/elasticsearch.conf
**Add the text
[Service]
LimitMEMLOCK=infinity
LimitNPROC=4096
LimitNOFILE=65536
**
systemctl daemon-reload
systemctl enable elasticsearch.service
systemctl start elasticsearch.service
systemctl status elasticsearch.service
  6. Install Kibana 7.x (optional)
yum install -y https://artifacts.elastic.co/downloads/kibana/kibana-7.10.2-x86_64.rpm
vi /etc/kibana/kibana.yml
**Uncomment and set the following line:
server.host: "<host ip>"
**Uncomment and set the following line if ES is not listening on localhost:
elasticsearch.hosts: ["http://<es host ip>:9200"]
**Set Firewall rules
firewall-cmd --add-port=5601/tcp --permanent
firewall-cmd --reload
systemctl enable kibana.service
systemctl start kibana.service
systemctl status kibana.service

For securing Elasticsearch and Kibana, see security guide.
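
The heap-size rule from the Elasticsearch step (50% of memory, up to 32g max) can be turned into a small calculation. This is a sketch under that rule only: `es_heap_gb` and `total_ram_gb` are hypothetical helper names, and reading `/proc/meminfo` assumes a Linux host.

```shell
#!/bin/sh
# Sketch: suggest -Xms/-Xmx values using the rule from this guide
# (50% of memory, up to 32g max). es_heap_gb and total_ram_gb are
# hypothetical helper names, not part of Elasticsearch or diskover.

# Print the heap size in GB for a machine with the given total RAM in GB.
es_heap_gb() {
    half=$(( ${1:-0} / 2 ))
    if [ "$half" -lt 1 ]; then half=1; fi
    if [ "$half" -gt 32 ]; then half=32; fi
    echo "$half"
}

# Print total RAM rounded to the nearest GB (Linux /proc/meminfo).
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo 2>/dev/null
}

heap=$(es_heap_gb "$(total_ram_gb)")
# These two lines are candidates for /etc/elasticsearch/jvm.options.
printf '%s\n' "-Xms${heap}g" "-Xmx${heap}g"
```

The script only prints the two lines; copying them into jvm.options remains a manual step so the values can be reviewed first.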

  7. Install Python 3 (Python 3.6.8), pip and dev tools
yum -y install python3 python3-devel gcc
python3 -V
pip3 -V
  8. Install diskover
** Extract diskover compressed file (from ftp server)
mkdir /tmp/diskover
tar -zxvf diskover-<version>.tar.gz -C /tmp/diskover
cd /tmp/diskover/diskover-<version>
** Copy diskover files to opt
cp -a diskover /opt/
cd /opt/diskover
** Install required python dependencies
pip3 install -r requirements.txt
*** If indexing to AWS Elasticsearch run
pip3 install -r requirements-aws.txt
** Copy default/sample configs
for d in configs_sample/*; do d=$(basename "$d") && mkdir -p ~/.config/"$d" && cp configs_sample/"$d"/config.yaml ~/.config/"$d"/; done
** Edit diskover config file
vi ~/.config/diskover/config.yaml
*** set databases > elasticsearch > host to your elasticsearch hostname/ip
** Copy diskover.lic file to /opt/diskover/
  9. Mount your network storage (set up client connection to storage)
*** for NFS
yum -y install nfs-utils
mkdir /mnt/nfsstor1
mount -t nfs -o ro,noatime,nodiratime server_name:/export_name /mnt/nfsstor1
*** for SMB/CIFS
yum -y install cifs-utils
mkdir /mnt/smbstor1
mount -t cifs -o username=user_name //server_name/share_name /mnt/smbstor1
  10. Run your first crawl
cd /opt/diskover
**start crawling
python3 diskover.py -i diskover-<indexname> <storage_top_dir>
  11. Set up the diskoverd (task worker) daemon as a systemd service (optional). See here for how to.

  12. Add additional Elasticsearch nodes to the cluster (a minimum of 3 nodes is recommended for production use). See here for how to.
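
The crawl step uses an index name of the form diskover-<indexname>. Elasticsearch index names must be lowercase, so one way to derive a name from the mount point is sketched below. `index_name_for` is a hypothetical helper name, and replacing other characters with "-" is an assumption made for illustration, not a diskover rule.

```shell
#!/bin/sh
# Sketch: build an index name that follows the diskover-<indexname> pattern
# from the crawl step. index_name_for is a hypothetical helper name.

# Print "diskover-<name>", where <name> is the last path component,
# lowercased, with characters outside a-z, 0-9, "_" and "-" replaced by "-".
index_name_for() {
    base=$(basename "$1" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9_\n-' '-')
    echo "diskover-$base"
}

top=/mnt/nfsstor1
# Print the crawl command instead of running it, so it can be reviewed first.
echo "python3 diskover.py -i $(index_name_for "$top") $top"
```

The printed command can then be run from /opt/diskover as in the crawl step.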

Installation How-to - diskover-web

  1. Install Nginx
yum -y install epel-release yum-utils
yum -y install http://rpms.remirepo.net/enterprise/remi-release-7.rpm
yum -y install nginx
systemctl enable nginx
systemctl start nginx
systemctl status nginx
  2. Install PHP 7 and PHP-FPM (fastcgi)
yum-config-manager --enable remi-php74
yum -y install php php-common php-fpm php-opcache php-pecl-mcrypt php-cli php-gd php-mysqlnd php-ldap php-pecl-zip php-xml php-xmlrpc php-mbstring php-json
vi /etc/php-fpm.d/www.conf
** change user = nginx and group = nginx
** uncomment and change listen.owner = nginx and listen.group = nginx
** change listen to listen = /var/run/php-fpm/php-fpm.sock
chown -R root:nginx /var/lib/php
chown -R nginx:nginx /var/run/php-fpm/
systemctl enable php-fpm
systemctl start php-fpm
systemctl status php-fpm
  3. Install diskover-web
** Extract diskover compressed file (from ftp server) ** can skip this step if you did this already when installing diskover
mkdir -p /tmp/diskover
tar -zxvf diskover-<version>.tar.gz -C /tmp/diskover
cd /tmp/diskover/diskover-<version>
** Copy web files to www
cp -a diskover-web /var/www/
** Edit diskover-web config file (Constants.php)
cd /var/www/diskover-web/src/diskover
cp Constants.php.sample Constants.php
vi Constants.php
** set ES_HOSTS to your elasticsearch hostname/ip
** change ADMIN_PASS to a strong password (default admin user password is darkdata)
** change PASS to a strong password (default diskover user password is darkdata)
cd /var/www/diskover-web/public
** copy default/sample txt files
for f in *.txt.sample; do cp "$f" "${f%.*}"; done
chmod 660 *.txt
cd /var/www/diskover-web/public/tasks/
** copy default/sample json files
for f in *.json.sample; do cp "$f" "${f%.*}"; done
chmod 660 *.json
** Set permissions
chown -R nginx:nginx /var/www/diskover-web
** Create nginx config
vi /etc/nginx/conf.d/diskover-web.conf
*** add below text to diskover-web.conf

server {
        listen   8000;
        server_name  diskover-web;
        root   /var/www/diskover-web/public;
        index  index.php index.html index.htm;
        error_log  /var/log/nginx/error.log;
        access_log /var/log/nginx/access.log;
        location / {
            try_files $uri $uri/ /index.php?$args =404;
        }
        location ~ \.php(/|$) {
            fastcgi_split_path_info ^(.+\.php)(/.+)$;
            set $path_info $fastcgi_path_info;
            fastcgi_param PATH_INFO $path_info;
            try_files $fastcgi_script_name =404; 
            fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock;
            #fastcgi_pass 127.0.0.1:9000;
            fastcgi_index index.php;
            include fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            include fastcgi_params;
            fastcgi_read_timeout 900;
            fastcgi_buffers 16 16k;
            fastcgi_buffer_size 32k;
        }
}

systemctl reload nginx
**open firewall ports for diskover-web
firewall-cmd --add-port=8000/tcp --permanent
firewall-cmd --reload
** Copy diskover-web.lic file to /var/www/diskover-web/src/diskover/
*** Set permissions for license file
chown nginx:nginx /var/www/diskover-web/src/diskover/diskover-web.lic
chmod 644 /var/www/diskover-web/src/diskover/diskover-web.lic
  4. View index in diskover-web after crawl finishes
http://<host_ip>:8000/

* default login is username: diskover and password: darkdata
* default admin login is username: admin and password: darkdata
* usernames/passwords can be set in web config file Constants.php
  5. Check for any errors in nginx log (e.g. permission issues)
tail -f /var/log/nginx/error.log
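
One problem that may show up in that log is nginx failing to connect to PHP-FPM because the socket path in www.conf (listen) differs from the one in diskover-web.conf (fastcgi_pass). The sketch below compares the two. `fpm_socket`, `nginx_socket` and `sockets_match` are hypothetical helper names, and the parsing assumes the simple one-setting-per-line layout shown in this guide.

```shell
#!/bin/sh
# Sketch: confirm that php-fpm and nginx agree on the socket path.
# fpm_socket, nginx_socket and sockets_match are hypothetical helper names.

# Print the value of the first active "listen =" line in a php-fpm pool file.
fpm_socket() {
    sed -n 's/^[[:space:]]*listen[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Print the path from the first active "fastcgi_pass unix:...;" line.
nginx_socket() {
    sed -n 's/^[[:space:]]*fastcgi_pass[[:space:]]*unix:\([^;]*\);.*/\1/p' "$1" | head -n 1
}

# Succeed only if both files name the same, non-empty socket path.
sockets_match() {
    fpm=$(fpm_socket "$1")
    ngx=$(nginx_socket "$2")
    [ -n "$fpm" ] && [ "$fpm" = "$ngx" ]
}

pool=/etc/php-fpm.d/www.conf
site=/etc/nginx/conf.d/diskover-web.conf
if [ -r "$pool" ] && [ -r "$site" ]; then
    if sockets_match "$pool" "$site"; then
        echo "socket paths match"
    else
        echo "socket paths differ: check listen and fastcgi_pass"
    fi
fi
```

Commented-out lines (starting with ; or #) are skipped, so the disabled fastcgi_pass 127.0.0.1:9000 line in the nginx config does not affect the result.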

Updating Diskover v2 to latest version

To update diskover v2, download the latest update-diskover.sh file from the scripts directory on the diskoverspace.com ftp server. After downloading, edit the top of the file to set your ftp info and the paths to diskover v2, then save it and run it. This will update diskover v2 and diskover-web v2 to the latest version on the ftp server.

Make a backup of your existing config files (optional):

cd ~/.config/diskover && cp config.yaml config.yaml.bak
cd <diskover-web_dir>/src/diskover && cp Constants.php Constants.php.bak

Make a backup of your existing data files (optional):

cd <diskover-web_dir>/public && for f in *.txt; do cp "$f" "$f.bak"; done
cd <diskover-web_dir>/public/tasks && for f in *.json; do cp "$f" "$f.bak"; done
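
The backup commands above overwrite any earlier .bak copy each time they are run. If you want to keep a backup per update, a date suffix is one option. The following is a sketch only; `backup_with_suffix` is a hypothetical helper name and the naming scheme is an assumption.

```shell
#!/bin/sh
# Sketch: back up files with a date suffix so that earlier backups are kept.
# backup_with_suffix is a hypothetical helper name.

# Copy each given file to <file>.<suffix>.bak, preserving mode and timestamps.
# Arguments that are not regular files (e.g. an unmatched glob) are skipped.
backup_with_suffix() {
    suffix=$1
    shift
    for f in "$@"; do
        if [ -f "$f" ]; then cp -p "$f" "$f.$suffix.bak"; fi
    done
}

stamp=$(date +%Y%m%d)
echo "example: backup_with_suffix $stamp config.yaml Constants.php"
```

Run it from the directory that holds the files, for example `backup_with_suffix "$stamp" *.txt` in the diskover-web public directory.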

Stop diskoverd (if running):

sudo systemctl stop diskoverd
ps -ef | grep diskoverd

Run update script:

chmod +x update-diskover.sh
./update-diskover.sh

Check that your config files are not missing any new settings:

diff <diskover_dir>/configs_sample/diskover/config.yaml ~/.config/diskover/config.yaml
cd <diskover-web_dir>/src/diskover && diff Constants.php.sample Constants.php 
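
The diff output also includes every value you changed on purpose, which can hide a newly added setting. A narrower check is to list only the setting names that exist in the sample config.yaml but not in yours. This is a rough sketch: `yaml_keys` and `missing_keys` are hypothetical helper names, nesting is ignored, commented-out settings are skipped, and the /opt/diskover path assumes the default install location.

```shell
#!/bin/sh
# Sketch: list setting names that appear in a sample config.yaml but not in
# the live one. yaml_keys and missing_keys are hypothetical helper names.
# Assumption: keys look like "name:" at the start of a line (after optional
# indentation); nesting is ignored and commented-out lines are skipped.

# Print the unique key names found in a YAML file.
yaml_keys() {
    sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\):.*/\1/p' "$1" | sort -u
}

# Print keys present in the sample ($1) but absent from the live config ($2).
missing_keys() {
    live_keys=$(yaml_keys "$2")
    yaml_keys "$1" | while IFS= read -r key; do
        printf '%s\n' "$live_keys" | grep -qxF -- "$key" || printf '%s\n' "$key"
    done
}

sample=/opt/diskover/configs_sample/diskover/config.yaml
live=$HOME/.config/diskover/config.yaml
if [ -r "$sample" ] && [ -r "$live" ]; then
    missing_keys "$sample" "$live"
fi
```

Because key names are compared without their parent sections, a name that appears under two sections is only reported if it is missing everywhere, so the full diff remains the authoritative check.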

Restart nginx and php-fpm:

systemctl restart nginx
systemctl restart php-fpm

Check for any errors in nginx log (e.g. permission issues):

tail -f /var/log/nginx/error.log

Upgrading from tar.gz file

If you have a diskover tar.gz file, you can also update using the commands below, assuming diskover and diskover-web are installed in the default locations.

mkdir -p /tmp/diskover-v2
tar -zxvf diskover-v2-<version>.tar.gz -C /tmp/diskover-v2/
cd /tmp/diskover-v2/
rsync -rcv --exclude=diskover.lic diskover/ /opt/diskover/
rsync -rcv --exclude=diskover-web.lic diskover-web/ /var/www/diskover-web/
chown -R nginx:nginx /var/www/diskover-web
systemctl restart php-fpm
systemctl restart nginx

As described above, check that your existing diskover and diskover-web config files are not missing any new settings from the sample/default configs.

Running Windows 10 Scanner

  1. Extract diskover zip file from ftp server to temp folder

  2. Open a command prompt and copy the diskover folder to Program Files

xcopy C:\tmp\diskover "C:\Program Files\diskover" /E /H /C /I
  3. Install Python

Get Python 3.5+ from https://www.python.org/downloads/ or the Windows Store and install it.

  4. Install Python Modules

Open a command prompt (run as administrator)

cd "C:\Program Files\diskover"
pip3 install -r requirements-win.txt
*** If indexing to AWS Elasticsearch run
pip3 install -r requirements-aws.txt
  5. Copy default/sample configs

Open a command prompt (run as administrator)

cd "C:\Program Files\diskover\configs_sample"
for /F %i in ('dir /b') do (mkdir %APPDATA%\%i & copy %i\config.yaml %APPDATA%\%i\)
  6. Set up the diskover configuration file

Use Notepad to open the following configuration file

%APPDATA%\diskover\config.yaml

Set up Elasticsearch host information

*** If using Elasticsearch in AWS

Set AWS to True (remove the # comment indicator)
aws: True

Set the AWS Elasticsearch URL (remove the # comment indicator, and omit https://)
host: <es host endpoint>

Set the port to 443 for AWS
port: 443

Configure Username
user: myusername

Configure Password
password: changeme
***

*** If using on-prem Elasticsearch instance

Set host information
host: <es host ip>

Set Elasticsearch port
port: 9200
***

Set replacepaths to True
replace: True
  7. Generate an index / scan

Open a command prompt. Running as Administrator is optional, but may be needed for the elevated privileges required to scan/index all the files.

python3 diskover.py -i diskover-<indexname> <top path>