Data Removing
Some info about deleting data.
There is a job for that in Jenkins and a DAG for Airflow.
If you use HDFS, see this job in Jenkins that deletes the data resource (dr) from HDFS so that it is not indexed the next time (you should also delete the dr in your collectory).
ALA now uses EMR and Airflow, so the equivalent job is this.
Imagine that you want to remove the occurrences of a new dr0 that you have created, for later ingestion. Easy:
biocache delete-records -dr dr0
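If several data resources need clearing, the same command can be wrapped in a small loop. The following is a minimal sketch, assuming the `biocache` CLI is on the PATH; the `delete_drs` function and the `DRY_RUN` variable are hypothetical helpers for illustration, not part of biocache.

```shell
#!/bin/sh
# Hypothetical helper: run `biocache delete-records -dr <uid>` for each
# data resource uid passed as an argument.
# With DRY_RUN=1 (the default here) the commands are only printed.
delete_drs() {
  for dr in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "biocache delete-records -dr $dr"
    else
      biocache delete-records -dr "$dr"
    fi
  done
}

# Preview the commands for two data resources (nothing is deleted).
delete_drs dr0 dr1
```

Once the preview looks right, `DRY_RUN=0 delete_drs dr0 dr1` would run the real commands.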
First you need to run a search that matches what you want to delete, and download the results as CSV.
Later you can delete using the API with commands like:
curl -X DELETE https://images.your.l-a.site/ws/image/5ea4b6ed-7567-4a37-b0eb-5c46daf582e0 -H "apiKey: XXXXX"
{"success":true,"message":"Image scheduled for deletion."}
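When scripting this call it can help to check the response before moving on. The sketch below assumes the response has the shape shown above; `is_deleted`, `delete_image`, `BASE_URL` and `API_KEY` are hypothetical names introduced for illustration, and the check is a plain text match rather than a real JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the response text reports "success":true.
is_deleted() {
  printf '%s' "$1" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical wrapper around the DELETE call.
# BASE_URL and API_KEY are placeholders to set to your own values.
delete_image() {
  response=$(curl -s -X DELETE "${BASE_URL}/ws/image/$1" -H "apiKey: ${API_KEY}")
  if is_deleted "$response"; then
    echo "OK $1"
  else
    echo "FAILED $1: $response" >&2
    return 1
  fi
}

# Usage (not run here):
# BASE_URL=https://images.your.l-a.site API_KEY=XXXXX \
#   delete_image 5ea4b6ed-7567-4a37-b0eb-5c46daf582e0
```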
So, to delete all the images listed in the images.csv downloaded from a search, you can use something like:
cat images.csv | awk -F',' '{print $15}' | sed 's/"//g' | sed '1d' | xargs -I ImageId curl -X DELETE https://images.your.l-a.site/ws/image/ImageId -H "apiKey: XXXXX"
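Because the deletion cannot be undone from the command line, it may be worth previewing the identifiers first. The following sketch extracts the same column and keeps only values that look like UUIDs, such as the identifier in the example above. `extract_image_ids` is a hypothetical helper, and column 15 is simply the column used in the one-liner; check it against your own CSV.

```shell
#!/bin/sh
# Hypothetical helper: print one CSV column (default 15), skipping the
# header, stripping double quotes, and keeping only UUID-shaped values.
# Like the one-liner, it splits on plain commas, so it assumes that no
# earlier field contains a comma inside quotes.
extract_image_ids() {
  awk -F',' -v col="${2:-15}" 'NR > 1 { gsub(/"/, "", $col); print $col }' "$1" |
    grep -E '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Preview first, then delete once the list looks right (not run here):
# extract_image_ids images.csv
# extract_image_ids images.csv | xargs -I ImageId curl -X DELETE \
#   https://images.your.l-a.site/ws/image/ImageId -H "apiKey: XXXXX"
```

Rows whose value in that column is not UUID-shaped are silently dropped, so comparing the number of printed identifiers with the number of rows in the CSV is a quick sanity check.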
Later, you have to do an additional step to delete the images on disk: in /admin/, go to "Tools" > "Purge deleted images".
The biocache-store delete-records task doesn't delete media in the image service. Furthermore, the ingest task only re-processes images that don't exist in the image service. So, sometimes you need to delete the images of some resource.
For this, when authenticated via CAS with the Admin role, you can run advanced searches (for instance, images with zero height from some data resource), select all of the results, and then delete them.
So, the general procedure is: first select the images, then go to the selected images near the trolley (top right).
On this new page there is an "Admin Functions" button; from there, use "Deleted images".
If you want to delete all the documents from a Solr core (for instance from bie-offline), you can do it as shown in the screenshot (not reproduced here). Thanks to Jason Loomis for the tip and the screenshot.
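One way to empty a core from the command line, not necessarily the one shown in the original screenshot, is Solr's delete-by-query on `*:*` sent to the core's update handler. This is a minimal sketch: `solr_update_url`, `solr_delete_all` and `SOLR_URL` are hypothetical names, and the default URL assumes Solr is listening on its usual port on localhost.

```shell
#!/bin/sh
# Hypothetical helper: build the update URL for a core, with an immediate commit.
# SOLR_URL is an assumption (Solr's default port on localhost); adjust it.
solr_update_url() {
  echo "${SOLR_URL:-http://localhost:8983/solr}/$1/update?commit=true"
}

# Hypothetical helper: delete every document in the given core
# by posting a delete-by-query that matches all documents.
solr_delete_all() {
  curl -s "$(solr_update_url "$1")" \
    -H 'Content-Type: text/xml' \
    --data-binary '<delete><query>*:*</query></delete>'
}

# Usage (not run here, it is destructive):
# solr_delete_all bie-offline
```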
Depending on how you used your demo, you'll need to at least:
- For your collectory service, stop tomcat and mysql, and clear the mysql database collectory
- Stop solr and clear the solr index under /data
- Stop cassandra and clear the cassandra database (see this task in cassandra ansible role)
Other databases and services to check (image service, species list service, spatial, ...).