Add warning about clearsource_history command and duplicates #393

Open
wants to merge 1 commit into
base: master
22 changes: 12 additions & 10 deletions README.rst
@@ -219,12 +219,14 @@ The following operations can be run from the command line as described underneat
           but keeps the source itself

       harvester clearsource_history [{source-id}]
-        - If no source id is given the history for all harvest sources (maximum is 1000)
-          will be cleared.
-          Clears all jobs and objects related to a harvest source, but keeps the source
-          itself. The datasets imported from the harvest source will **NOT** be deleted!!!
-          If a source id is given, it only clears the history of the harvest source with
-          the given source id.
+        - clears the history for all active harvest sources (up to a maximum of
+          1000 sources), or specify a specific harvest source to clear just
+          that one. All jobs and objects related to the harvest source(s) will
+          be cleared, but it keeps the source itself. This is useful to clean
+          history of long running harvest sources to start again fresh.
+          Warning: The datasets imported from the harvest source will NOT be deleted.
+          They will be disassociated with the harvest source, so if you harvest again
+          it'll create duplicate datasets.

       harvester sources [all]
         - lists harvest sources
@@ -253,9 +255,9 @@ The following operations can be run from the command line as described underneat
           import) without involving the web UI or the queue backends. This is
           useful for testing a harvester without having to fire up
           gather/fetch_consumer processes, as is done in production.

       harvester run_test {source-id/name} force-import=guid1,guid2...
-        - In order to force an import of particular datasets, useful to
+        - In order to force an import of particular datasets, useful to
           target a dataset for dev purposes or when forcing imports on other environments.

       harvester gather_consumer
@@ -654,10 +656,10 @@ harvester run_test
 You can run a harvester simply using the ``run_test`` command. This is handy
 for running a harvest with one command in the console and see all the output
 in-line. It runs the gather, fetch and import stages all in the same process.
-You must ensure that you have pip installed ``dev-requirements.txt``
+You must ensure that you have pip installed ``dev-requirements.txt``
 in ``/home/ckan/ckan/lib/default/src/ckanext-harvest`` before using the
 ``run_test`` command.

 This is useful for developing a harvester because you can insert break-points
 in your harvester, and rerun a harvest without having to restart the
 gather_consumer and fetch_consumer processes each time. In addition, because it
15 changes: 9 additions & 6 deletions ckanext/harvest/cli.py
@@ -110,14 +110,17 @@ def clear(ctx, id):
 @click.argument(u"id", metavar=u"SOURCE_ID_OR_NAME", required=False)
 @click.pass_context
 def clear_history(ctx, id):
-    """If no source id is given the history for all harvest sources
-    (maximum is 1000) will be cleared.
+    """
+    Clears all jobs and objects related to a harvest source, but keeps the
+    source itself. This is useful to clean history of long running harvest
+    sources to start again fresh.

-    Clears all jobs and objects related to a harvest source, but keeps
-    the source itself. The datasets imported from the harvest source
-    will NOT be deleted!!! If a source id is given, it only clears
-    the history of the harvest source with the given source id.
+    Warning: The datasets imported from the harvest source will NOT be deleted.
+    They will be disassociated with the harvest source, so if you harvest again
+    it'll create duplicate datasets.

+    If no source id is given the history for all harvest sources (up to a
+    maximum of 1000 sources) will be cleared.
     """
     flask_app = ctx.meta["flask_app"]

12 changes: 8 additions & 4 deletions ckanext/harvest/commands/harvester.py
@@ -33,10 +33,14 @@ class Harvester(CkanCommand):
           but keeps the source itself

       harvester clearsource_history [{source-id}]
-        - If no source id is given the history for all harvest sources (maximum is 1000) will be cleared.
-          Clears all jobs and objects related to a harvest source, but keeps the source itself.
-          The datasets imported from the harvest source will NOT be deleted!!!
-          If a source id is given, it only clears the history of the harvest source with the given source id.
+        - clears the history for all active harvest sources (up to a maximum of
+          1000 sources), or specify a specific harvest source to clear just
+          that one. All jobs and objects related to the harvest source(s) will
+          be cleared, but it keeps the source itself. This is useful to clean
+          history of long running harvest sources to start again fresh.
+          Warning: The datasets imported from the harvest source will NOT be deleted.
+          They will be disassociated with the harvest source, so if you harvest again
+          it'll create duplicate datasets.

       harvester sources [all]
         - lists harvest sources
18 changes: 12 additions & 6 deletions ckanext/harvest/logic/action/update.py
@@ -236,11 +236,14 @@ def harvest_source_clear(context, data_dict):

 def harvest_sources_job_history_clear(context, data_dict):
     '''
-    Clears the history for all active harvest sources. All jobs and objects related to a harvest source will
-    be cleared, but keeps the source itself.
-    This is useful to clean history of long running harvest sources to start again fresh.
-    The datasets imported from the harvest source will NOT be deleted!!!
-
+    Clears the history for all active harvest sources (up to a maximum of
+    1000 sources). All jobs and objects related to a harvest source will be
+    cleared, but it keeps the source itself. This is useful to clean history of
+    long running harvest sources to start again fresh.
+
+    Warning: The datasets imported from the harvest source will NOT be deleted.
+    They will be disassociated with the harvest source, so if you harvest again
+    it'll create duplicate datasets.
     '''
     check_access('harvest_sources_clear', context, data_dict)

@@ -265,7 +268,10 @@ def harvest_source_job_history_clear(context, data_dict):
     '''
     Clears all jobs and objects related to a harvest source, but keeps the source itself.
     This is useful to clean history of long running harvest sources to start again fresh.
-    The datasets imported from the harvest source will NOT be deleted!!!
+
+    Warning: The datasets imported from the harvest source will NOT be deleted.
+    They will be disassociated with the harvest source, so if you harvest again
+    it'll create duplicate datasets.

     :param id: the id of the harvest source to clear
     :type id: string
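
The duplicate-dataset warning added in this change can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCatalog`, its fields, and the `dataset-N` ids are hypothetical and do not reflect the extension's actual data model or code. The one idea it borrows from the docstrings is that clearing history removes the link between a remote record and its imported dataset while leaving the dataset in place.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyCatalog:
    """Hypothetical, simplified stand-in for a site with one harvest source."""
    datasets: dict = field(default_factory=dict)         # dataset id -> title
    harvest_objects: dict = field(default_factory=dict)  # remote guid -> dataset id
    _ids: count = field(default_factory=count)           # source of new dataset ids

    def harvest(self, remote_records):
        """Import remote records, reusing the dataset already linked to a guid."""
        for guid, title in remote_records.items():
            dataset_id = self.harvest_objects.get(guid)
            if dataset_id is None:
                # No link for this guid, so a brand-new dataset is created.
                dataset_id = f"dataset-{next(self._ids)}"
                self.harvest_objects[guid] = dataset_id
            self.datasets[dataset_id] = title

    def clear_history(self):
        """Drop the guid links but keep the datasets themselves."""
        self.harvest_objects.clear()


remote = {"guid-1": "Air quality", "guid-2": "Bus routes"}
catalog = ToyCatalog()
catalog.harvest(remote)
catalog.harvest(remote)         # links intact, so existing datasets are updated
before = len(catalog.datasets)
catalog.clear_history()
catalog.harvest(remote)         # links gone, so each record is imported again
after = len(catalog.datasets)
print(before, after)  # prints "2 4"
```

In this model the second harvest is harmless because every guid still maps to a dataset, while the harvest after `clear_history()` doubles the dataset count. That is the situation the new warning text asks operators to anticipate before clearing history on a source they intend to harvest again.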