Infrastructure Requirements
The infrastructure required to run a Living Atlas depends on the following factors:
- The number of components beyond the core set of components you wish to run
- The number of occurrence records you need to index in your system
- The number of spatial layers you wish to incorporate
We recommend cloud infrastructure for Living Atlas installations. This could be a commercial provider (e.g. Amazon EC2, Google Compute Engine, Microsoft Azure) or cloud infrastructure operated by an institution within your country (e.g. an OpenStack-based installation).
A basic installation of the core components with support for up to 20 million records could be a single Ubuntu 18.04 server with 4-8 CPUs, 32 GB RAM and SSD storage. Ideally, though, Cassandra and SOLR should be run on separate virtual machines, as both of these components require a reasonable amount of resources. Running Cassandra and SOLR separately lets you run data maintenance tasks (loading, processing, indexing) without affecting the performance of your web portal tools.
For installations that need to index large amounts of data (over 50 million records and/or a large number of spatial layers), we recommend a clustered installation. Clustered installations are in use in Australia (75 million records and 500+ spatial layers) and the UK (219 million records and 50+ spatial layers).
Clustering affects the installation of Apache SOLR, Apache Cassandra and the `biocache` command-line tools.
See the Cassandra requirements and SOLR requirements.
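The sizing guidance above can be sketched as a small decision helper. This is only an illustration, not part of the Living Atlases tooling: the function name `recommend_topology` and the tier labels are hypothetical. The 20 million and 50 million thresholds come from the text. The text gives no figure for a "large number" of spatial layers, so that is left as a flag the caller sets, and treating the range between the two thresholds as "separate VMs" is an assumption.

```python
# Thresholds taken from the guidance above.
SINGLE_SERVER_MAX_RECORDS = 20_000_000  # "up to 20 million records"
CLUSTER_MIN_RECORDS = 50_000_000        # "over 50 million records"


def recommend_topology(occurrence_records: int, many_spatial_layers: bool = False) -> str:
    """Return a rough deployment tier for a Living Atlas installation.

    Hypothetical helper. `many_spatial_layers` is decided by the caller,
    because the guidance does not define a numeric layer threshold.
    """
    if occurrence_records < 0:
        raise ValueError("occurrence_records must not be negative")

    # Over 50 million records and/or many spatial layers: clustered install.
    if occurrence_records > CLUSTER_MIN_RECORDS or many_spatial_layers:
        return "clustered"

    # Up to 20 million records: a single server is the basic option.
    if occurrence_records <= SINGLE_SERVER_MAX_RECORDS:
        return "single-server"

    # Assumption: between the two thresholds, run Cassandra and SOLR
    # on separate virtual machines.
    return "separate-vms"


if __name__ == "__main__":
    for records in (5_000_000, 30_000_000, 75_000_000):
        print(records, recommend_topology(records))
```

Note that "single-server" is the minimum viable option; the guidance still prefers separate virtual machines for Cassandra and SOLR where resources allow.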
The core set of components that a Living Atlas will require as a starting point is the following:
- Data registry (component name: `collectory`, example: UK registry)
- Occurrence search UI (component name: `biocache-hub`, example: ALA occurrence search)
- Occurrence web services (component name: `biocache-service`)
- Occurrence data loading tools (component name: `biocache-store` aka `biocache-cli`)
- Image service (component name: `image-service`, example: ALA images)
- Apache SOLR
- Apache Cassandra
- `mysql` (for the `collectory`) and `postgresql` (for the `image-service`)
- `apache` or `nginx` as proxies
These components will give a Living Atlas installation the following capabilities:
- metadata editing for collections, institutions, data publishers
- loading, processing and indexing of Darwin Core archives
- occurrence searching
- image storage and serving
- basic mapping capabilities
These components are installed as part of the ALA demo installation scripts. This is the recommended starting point for projects in the initial phase of evaluating the Living Atlases components.
In addition to the core components, the following components can be set up to enhance an installation further:
- Authentication (`CAS` based)
- Species lists (component name: `specieslist`, example: NBN UK lists portal)
- Species pages & services
- Spatial services
- Spatial portal - advanced spatial tools and species distribution modelling (component name: `spatial-hub`, example: ALA spatial portal)
- Alerts
- Logger
- Regions
- Dashboard
- Sandbox
- spark hardware requirements
- [zookeeper hardware requirements](https://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#Single+Machine+Requirements)
- hdfs storage calculation