Fix #172 - Add irrd_load_database command for manual loading of sources.
mxsasha committed Feb 7, 2019
1 parent abcb15a commit efe4430
Showing 9 changed files with 301 additions and 22 deletions.
3 changes: 2 additions & 1 deletion docs/admins/configuration.rst
@@ -131,7 +131,8 @@ configuration. A successful reload after a `SIGHUP` is also logged.

.. note::
As a separate script, `irrd_submit_email`, the handler for email submissions
by IRRd users, **always acts on the current configuration file** - not on
by IRRd users, and `irrd_load_database` for manually loading data,
**always act on the current configuration file** - not on
the configuration that IRRd started with.


65 changes: 64 additions & 1 deletion docs/users/mirroring.rst
@@ -9,6 +9,7 @@ This page explains the processes and caveats involved in mirroring.
For details on all configuration options, see
the :doc:`configuration documentation </admins/configuration>`.

.. contents:: :backlinks: none

Scheduling
----------
@@ -38,7 +39,7 @@ mirroring of authoritative or mirrored data by other users.

Periodic exports of the database can be produced for all sources. They consist
of a full export of the text of all objects for a source, gzipped and encoded
in UTF-8. If a local journal is kept, another file is exported with the serial
in UTF-8. If a serial is known, another file is exported with the serial
number of this export. If the database is entirely empty, an error is logged
and no files are exported.

@@ -153,3 +154,65 @@ The mirror can be limited to certain RPSL object classes using the
in this list, are immediately discarded. No logs are kept of this. They
are also not kept in the local journal.
If this setting is undefined, all known classes are accepted.


Manually loading data
---------------------

A third option is to manually load data. This can be useful while testing,
or when generating data files from scripts, as it provides direct feedback
on whether loading data was successful.

Manual loading uses the ``irrd_load_database`` command:

* The command is called with the name of a source and the path to
the file to import. This file cannot be gzipped.
* The source must already be in the config file; its settings may be
left empty if no other settings are needed.
* Optionally, a serial number can be set. See the note about serials below.
* Upon encountering the first error, the process is aborted, and an error
is printed to stdout. No records are created or changed in the database or
in the logs; previously existing objects remain in the database.
The exit status is 1.
* When no errors were encountered, the data is saved, and log messages
are written about the result of the import. The exit status is 0.
Nothing is written to stdout.
* An error means encountering an object that raised errors in
:doc:`non-strict object validation </admins/object-validation>`,
an object with an unknown object class, or an object for which
the `source` attribute is inconsistent with the `--source` argument.
* The object class filter configured, if any, is followed.
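
As a rough sketch of how a wrapper script might drive this command, the Python below builds the argument list from the options documented on this page (``--source``, ``--serial``, ``--config`` and the positional input file) and interprets the exit status as described in the list above. The helper names ``build_load_command``, ``describe_result`` and ``run_load``, and the choice to run the script through ``sys.executable``, are assumptions made for illustration; they are not part of IRRd.

```python
import subprocess
import sys
from typing import List, Optional

# Script path as used in the example on this page.
SCRIPT = 'irrd/scripts/load_database.py'


def build_load_command(source: str, filename: str,
                       serial: Optional[int] = None,
                       config: Optional[str] = None) -> List[str]:
    """Build the argument list (hypothetical helper, not part of IRRd)."""
    cmd = [sys.executable, SCRIPT, '--source', source]
    if serial is not None:
        cmd += ['--serial', str(serial)]
    if config is not None:
        cmd += ['--config', config]
    cmd.append(filename)
    return cmd


def describe_result(returncode: int, stdout: str) -> str:
    """Exit status 0 means the data was saved; otherwise stdout holds the error."""
    if returncode == 0:
        return 'loaded'
    return f'aborted: {stdout.strip()}'


def run_load(source: str, filename: str,
             serial: Optional[int] = None,
             config: Optional[str] = None) -> str:
    """Run the command and summarise the outcome (hypothetical helper)."""
    proc = subprocess.run(build_load_command(source, filename, serial, config),
                          capture_output=True, text=True)
    return describe_result(proc.returncode, proc.stdout)
```

Because nothing is written to stdout on success, a caller can rely on the exit status alone and only needs to read stdout when the status is non-zero.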

On serials:

* If no serial is provided, and none has been provided in the past, no
serial is recorded. This is similar to sources that have ``import_source``
set, but not ``import_source_serial``.
* If no serial is provided, but a serial has been provided in a past
command, or through another mirroring process, the existing serial
is kept.
* If a lower serial is provided than in a past import, the lower
serial is recorded, but the existing data is still overwritten.
This is not recommended.
* The data is reloaded from the provided file regardless of whether a
serial was provided, or what the provided serial is.
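
The serial rules above can be summarised in a few lines of Python. This is only a model of the documented behaviour, not IRRd's implementation; the function name ``serial_after_load`` and its parameters are hypothetical.

```python
from typing import Optional


def serial_after_load(provided: Optional[int],
                      existing: Optional[int]) -> Optional[int]:
    """Return the serial recorded after a successful manual load.

    `provided` is the serial passed on the command line, if any;
    `existing` is the serial already known for the source, if any.
    """
    if provided is not None:
        # A provided serial is always recorded, even when it is lower
        # than the existing one (which is not recommended).
        return provided
    # Without a provided serial, the existing serial is kept;
    # this is None when no serial was ever recorded.
    return existing
```

In every case the data itself is reloaded from the file; the serial only determines what is recorded alongside it.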

.. note::
When other databases mirror the source being loaded,
it is advisable to use incrementing serials, as they may use the
CURRENTSERIAL file to determine whether to run a new import.
Journals cannot be kept for manually loaded sources.

For example, to load data for source TEST with serial 10::

irrd/scripts/load_database.py --source TEST --serial 10 test.db

The ``--config`` parameter can be used to read the configuration from a
different config file. Note that this script always acts on the current
configuration file - not on the configuration that IRRd started with.

.. caution::
Upon manually loading data, all existing journal entries for the
relevant source are discarded, as they may no longer be complete.
This only applies if loading was successful.

5 changes: 3 additions & 2 deletions irrd/mirroring/mirror_runners_import.py
@@ -105,8 +105,9 @@ def run(self, database_handler: DatabaseHandler, serial_newest_seen: Optional[in

database_handler.disable_journaling()
for import_filename, to_delete in import_data:
MirrorFileImportParser(source=self.source, filename=import_filename, serial=import_serial,
database_handler=database_handler)
p = MirrorFileImportParser(source=self.source, filename=import_filename, serial=import_serial,
database_handler=database_handler)
p.run_import()
if to_delete:
os.unlink(import_filename)

50 changes: 36 additions & 14 deletions irrd/mirroring/parsers.py
@@ -32,32 +32,47 @@ class MirrorFileImportParser(MirrorParser):
This parser handles imports of files for mirror databases.
Note that this parser can be called multiple times for a single
full import, as some databases use split files.
If direct_error_return is set, run_import() immediately returns
upon encountering an error. It will return an error
string.
"""
obj_parsed = 0 # Total objects found
obj_errors = 0 # Objects with errors
obj_ignored_class = 0 # Objects ignored due to object_class_filter setting
obj_unknown = 0 # Objects with unknown classes
unknown_object_classes: Set[str] = set() # Set of encountered unknown classes

def __init__(self, source: str, filename: str, serial: Optional[int], database_handler: DatabaseHandler) -> None:
def __init__(self, source: str, filename: str, serial: Optional[int], database_handler: DatabaseHandler,
direct_error_return: bool=False) -> None:
logger.debug(f'Starting file import of {source} from {filename}, setting serial {serial}')
self.source = source
self.filename = filename
self.serial = serial
self.database_handler = database_handler
self.direct_error_return = direct_error_return
super().__init__()

self.run_import()

def run_import(self):
def run_import(self) -> Optional[str]:
"""
Run the actual import. If direct_error_return is set, returns an error
string on encountering the first error. Otherwise, returns None.
"""
f = open(self.filename, encoding='utf-8', errors='backslashreplace')
for paragraph in split_paragraphs_rpsl(f):
self.parse_object(paragraph)
error = self._parse_object(paragraph)
if error is not None:
return error

self.log_report()
f.close()
return None

def parse_object(self, rpsl_text: str) -> None:
def _parse_object(self, rpsl_text: str) -> Optional[str]:
"""
Parse a single object. If direct_error_return is set, returns an error
string on encountering an error. Otherwise, returns None.
"""
try:
self.obj_parsed += 1
# If an object turns out to be a key-cert, and strict_import_keycert_objects
@@ -67,32 +82,39 @@ def parse_object(self, rpsl_text: str) -> None:
obj = rpsl_object_from_text(rpsl_text.strip(), strict_validation=True)

if obj.messages.errors():
log_msg = f'Parsing errors: {obj.messages.errors()}, original object text follows:\n{rpsl_text}'
if self.direct_error_return:
return log_msg
self.database_handler.record_mirror_error(self.source, log_msg)
logger.critical(f'Parsing errors occurred while importing from file for {self.source}. '
f'This object is ignored, causing potential data inconsistencies. A new operation for '
f'this update, without errors, will still be processed and cause the inconsistency to '
f'be resolved. Parser error messages: {obj.messages.errors()}; '
f'original object text follows:\n{rpsl_text}')
self.database_handler.record_mirror_error(self.source, f'Parsing errors: {obj.messages.errors()}, '
f'original object text follows:\n{rpsl_text}')
self.obj_errors += 1
return
return None

if obj.source() != self.source:
msg = f'Invalid source {obj.source()} for object {obj.pk()}, expected {self.source}. '
logger.critical(msg + 'This object is ignored, causing potential data inconsistencies.')
msg = f'Invalid source {obj.source()} for object {obj.pk()}, expected {self.source}'
if self.direct_error_return:
return msg
logger.critical(msg + '. This object is ignored, causing potential data inconsistencies.')
self.database_handler.record_mirror_error(self.source, msg)
self.obj_errors += 1
return
return None

if self.object_class_filter and obj.rpsl_object_class.lower() not in self.object_class_filter:
self.obj_ignored_class += 1
return
return None

self.database_handler.upsert_rpsl_object(obj, forced_serial=self.serial)

except UnknownRPSLObjectClassException as e:
if self.direct_error_return:
return f'Unknown object class: {e.rpsl_object_class}'
self.obj_unknown += 1
self.unknown_object_classes.add(str(e).split(':')[1].strip())
self.unknown_object_classes.add(e.rpsl_object_class)
return None

def log_report(self) -> None:
obj_successful = self.obj_parsed - self.obj_unknown - self.obj_errors - self.obj_ignored_class
9 changes: 6 additions & 3 deletions irrd/mirroring/tests/test_mirror_runners_import.py
@@ -312,12 +312,15 @@ class MockMirrorFileImportParser:
rpsl_data_calls: List[str] = []
expected_serial = 424242

def __init__(self, source, filename, serial, database_handler):
with open(filename, 'r') as f:
self.rpsl_data_calls.append(f.read())
def __init__(self, source, filename, serial, database_handler, direct_error_return=False):
self.filename = filename
assert source == 'TEST'
assert serial == self.expected_serial

def run_import(self):
with open(self.filename, 'r') as f:
self.rpsl_data_calls.append(f.read())


class TestNRTMImportUpdateStreamRunner:
def test_run_import(self, monkeypatch, config_override):
@@ -39,12 +39,13 @@ def test_parse(self, monkeypatch, caplog, tmp_gpg_dir, config_override):
with tempfile.NamedTemporaryFile() as fp:
fp.write(test_input.encode('utf-8'))
fp.seek(0)
MirrorFileImportParser(
parser = MirrorFileImportParser(
source='TEST',
filename=fp.name,
serial=424242,
database_handler=mock_dh,
)
parser.run_import()
assert len(mock_dh.mock_calls) == 4
assert mock_dh.mock_calls[0][0] == 'upsert_rpsl_object'
assert mock_dh.mock_calls[0][1][0].pk() == '192.0.2.0/24AS65537'
@@ -62,6 +63,89 @@ def test_parse(self, monkeypatch, caplog, tmp_gpg_dir, config_override):
key_cert_obj = rpsl_object_from_text(SAMPLE_KEY_CERT, strict_validation=False)
assert key_cert_obj.verify(KEY_CERT_SIGNED_MESSAGE_VALID)

def test_direct_error_return_invalid_source(self, monkeypatch, caplog, tmp_gpg_dir, config_override):
config_override({
'sources': {
'TEST': {},
}
})
mock_dh = Mock()

test_data = [
SAMPLE_UNKNOWN_ATTRIBUTE, # valid, because mirror imports are non-strict
SAMPLE_ROUTE.replace('TEST', 'BADSOURCE'),
]
test_input = '\n\n'.join(test_data)

with tempfile.NamedTemporaryFile() as fp:
fp.write(test_input.encode('utf-8'))
fp.seek(0)
parser = MirrorFileImportParser(
source='TEST',
filename=fp.name,
serial=424242,
database_handler=mock_dh,
direct_error_return=True,
)
error = parser.run_import()
assert error == 'Invalid source BADSOURCE for object 192.0.2.0/24AS65537, expected TEST'
assert len(mock_dh.mock_calls) == 1
assert mock_dh.mock_calls[0][0] == 'upsert_rpsl_object'
assert mock_dh.mock_calls[0][1][0].pk() == '192.0.2.0/24AS65537'

assert 'Invalid source BADSOURCE for object' not in caplog.text
assert 'File import for TEST' not in caplog.text

def test_direct_error_return_malformed_pk(self, monkeypatch, caplog, tmp_gpg_dir, config_override):
config_override({
'sources': {
'TEST': {},
}
})
mock_dh = Mock()

with tempfile.NamedTemporaryFile() as fp:
fp.write(SAMPLE_MALFORMED_PK.encode('utf-8'))
fp.seek(0)
parser = MirrorFileImportParser(
source='TEST',
filename=fp.name,
serial=424242,
database_handler=mock_dh,
direct_error_return=True,
)
error = parser.run_import()
assert 'Invalid address prefix: not-a-prefix' in error
assert not len(mock_dh.mock_calls)

assert 'Invalid address prefix: not-a-prefix' not in caplog.text
assert 'File import for TEST' not in caplog.text

def test_direct_error_return_unknown_class(self, monkeypatch, caplog, tmp_gpg_dir, config_override):
config_override({
'sources': {
'TEST': {},
}
})
mock_dh = Mock()

with tempfile.NamedTemporaryFile() as fp:
fp.write(SAMPLE_UNKNOWN_CLASS.encode('utf-8'))
fp.seek(0)
parser = MirrorFileImportParser(
source='TEST',
filename=fp.name,
serial=424242,
database_handler=mock_dh,
direct_error_return=True,
)
error = parser.run_import()
assert error == 'Unknown object class: foo-block'
assert not len(mock_dh.mock_calls)

assert 'Unknown object class: foo-block' not in caplog.text
assert 'File import for TEST' not in caplog.text


class TestNRTMStreamParser:
def test_test_parse_nrtm_v3_valid(self):
57 changes: 57 additions & 0 deletions irrd/scripts/load_database.py
@@ -0,0 +1,57 @@
#!/usr/bin/env python
# flake8: noqa: E402
import argparse
import logging
import sys

from pathlib import Path

"""
Load an RPSL file into the database.
"""

logger = logging.getLogger(__name__)
sys.path.append(str(Path(__file__).resolve().parents[2]))

from irrd.conf import config_init, CONFIG_PATH_DEFAULT
from irrd.mirroring.parsers import MirrorFileImportParser
from irrd.storage.database_handler import DatabaseHandler


def load(source, filename, serial) -> int:
dh = DatabaseHandler()
dh.delete_all_rpsl_objects_with_journal(source)
dh.disable_journaling()
parser = MirrorFileImportParser(source, filename, serial=serial, database_handler=dh, direct_error_return=True)
error = parser.run_import()
if error:
dh.rollback()
else:
dh.commit()
dh.close()
if error:
print(f'Error occurred while processing object:\n{error}')
return 1
return 0


def main(): # pragma: no cover
description = """Load an RPSL file into the database."""
parser = argparse.ArgumentParser(description=description)
parser.add_argument('--config', dest='config_file_path', type=str,
help=f'use a different IRRd config file (default: {CONFIG_PATH_DEFAULT})')
parser.add_argument('--serial', dest='serial', type=int,
help=f'serial number (optional)')
parser.add_argument('--source', dest='source', type=str, required=True,
help=f'name of the source, e.g. NTTCOM')
parser.add_argument('input_file', type=str,
help='the name of a file to read')
args = parser.parse_args()

config_init(args.config_file_path)

sys.exit(load(args.source, args.input_file, args.serial))


if __name__ == '__main__': # pragma: no cover
main()
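
The control flow of ``load()`` combined with ``direct_error_return`` (stop at the first error, roll back, exit with status 1; otherwise commit and exit with status 0) can be illustrated without a database. The sketch below is a simplified stand-in, not the IRRd code: ``FakeHandler``, ``validate`` and ``import_all`` are hypothetical names, and the "objects" are plain strings rather than RPSL objects.

```python
from typing import Iterable, List, Optional


class FakeHandler:
    """Stand-in for a database handler; it records calls instead of storing data."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def upsert(self, obj: str) -> None:
        self.calls.append(f'upsert:{obj}')

    def commit(self) -> None:
        self.calls.append('commit')

    def rollback(self) -> None:
        self.calls.append('rollback')


def validate(obj: str) -> Optional[str]:
    """Hypothetical validator: returns an error string, or None if valid."""
    if obj.startswith('bad'):
        return f'Invalid object: {obj}'
    return None


def import_all(objects: Iterable[str], handler: FakeHandler) -> Optional[str]:
    """Return the first error encountered, or None if every object was stored."""
    for obj in objects:
        error = validate(obj)
        if error is not None:
            return error  # stop immediately; later objects are never processed
        handler.upsert(obj)
    return None


def load(objects: Iterable[str], handler: FakeHandler) -> int:
    """Commit only if the whole import succeeded; return a process exit status."""
    error = import_all(objects, handler)
    if error:
        handler.rollback()
        print(f'Error occurred while processing object:\n{error}')
        return 1
    handler.commit()
    return 0
```

Objects stored before the first error are discarded by the rollback, which is why a failed load leaves the previously existing data in place.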