
Provide a "dry-run" functionality for bulk import/update #13778

Closed
peteeckel opened this issue Sep 15, 2023 · 8 comments
Labels
type: feature Introduction of new functionality to the application

Comments

@peteeckel
Contributor

NetBox version

v3.6.1

Feature type

New functionality

Proposed functionality

Based on the brief discussion in #13773 I suggest implementing a "dry-run" functionality for bulk import/update data.

Use case

This FR needs to be seen in conjunction with #13775 and #13777. Importing or updating data in bulk can be complex and involve large amounts of data, and certain errors such as misspelled or mis-cased headers currently result in data being silently ignored.

In the case of an import, columns with invalid header names are currently silently ignored while the remaining columns are imported, which then requires a subsequent bulk update run with a new data set that includes object IDs. It would be helpful to validate the input data and check for this kind of error before the import is actually executed, so errors can be fixed up front.
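A dry-run check of this kind could be as simple as comparing the CSV header row against the set of importable field names before any rows are processed. A minimal sketch, assuming a hypothetical field list and helper name (this is not NetBox's actual API):

```python
import csv
import io

# Hypothetical set of importable field names for the IPAddress model.
KNOWN_FIELDS = {"id", "address", "status", "dns_name", "description", "tenant"}

def dry_run_check(csv_text, known_fields=KNOWN_FIELDS):
    """Return a list of warnings for unknown CSV columns, without importing anything."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader, [])
    return [
        f'Field "{h}" is unknown and will not be imported'
        for h in headers
        if h not in known_fields
    ]

# A misspelled "dnsname" header is flagged; "address" and "status" pass.
warnings = dry_run_check("address,status,dnsname\n10.0.0.1/16,active,node1.zone1.example.com\n")
```

If the returned list is non-empty, the import form could display the warnings and stop before touching the database.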

Database changes

None

External dependencies

None

@peteeckel peteeckel added the type: feature Introduction of new functionality to the application label Sep 15, 2023
@peteeckel peteeckel changed the title Provide a "dry-run" functionality for bulk update/import Provide a "dry-run" functionality for bulk import/update Sep 15, 2023
@jeremystretch
Member

I don't see how this would really help. After completing a "dry run" import, you would presumably only be able to see what attributes are listed in the resulting import table; any others that happen not to be displayed in the table cannot be verified.

Additionally, the mechanism by which imported objects are displayed would not permit this behavior. After objects are imported, NetBox redirects the user to a list of objects filtered by request ID. This would not be feasible if the imported objects don't actually exist.

@peteeckel
Contributor Author

peteeckel commented Sep 15, 2023

In combination with the features suggested in #13775 and especially #13777 a dry-run would give the user a list of ignored columns that would not be used in the import without actually performing the import.

Without the "dry-run" the import is performed, but lacks the affected columns. Currently this might not even be noticed. Even with a notice that these columns haven't been imported, fixing the data would at the very least require a bulk update, which needs an additional ID column - which in turn requires exporting the records and generating a new import set.

With a dry-run the user would be aware that there are problematic columns and could fix them before importing, thus avoiding the need for the subsequent update.

@jeremystretch
Member

a dry-run would give the user a list of ignored columns that would not be used in the import without actually performing the import.

How? Maybe an example would help.

@peteeckel
Contributor Author

peteeckel commented Sep 15, 2023

Let's assume someone is trying to import the following data set:

address,status,dnsname
10.0.0.1/16,active,node1.zone1.example.com
10.0.0.2/16,active,node2.zone1.example.com
10.0.0.3/16,active,node3.zone1.example.com
10.0.0.4/16,active,node4.zone1.example.com
10.0.0.5/16,active,node5.zone1.example.com
10.0.0.6/16,active,node6.zone1.example.com
10.0.0.7/16,active,node7.zone1.example.com
[...]
10.0.0.254/16,active,node254.zone1.example.com
10.0.1.1/16,active,node1.zone2.example.com
10.0.1.2/16,active,node2.zone2.example.com
10.0.1.3/16,active,node3.zone2.example.com
10.0.1.4/16,active,node4.zone2.example.com
10.0.1.5/16,active,node5.zone2.example.com
10.0.1.6/16,active,node6.zone2.example.com
10.0.1.7/16,active,node7.zone2.example.com
[...]
10.0.1.254/16,active,node254.zone2.example.com
[...]
10.0.16.1/16,active,node1.zone16.example.com
10.0.16.2/16,active,node2.zone16.example.com
10.0.16.3/16,active,node3.zone16.example.com
10.0.16.4/16,active,node4.zone16.example.com
10.0.16.5/16,active,node5.zone16.example.com
10.0.16.6/16,active,node6.zone16.example.com
10.0.16.7/16,active,node7.zone16.example.com
[...]
10.0.16.254/16,active,node254.zone16.example.com

(and imagine the data being less schematic to add a bit of complexity).

What currently happens, since the dns_name field is optional, is that all data is imported without the dns_name value, because the column header is not spelled correctly.

Now, since the field is missing for all records, the only way to fix it is a bulk update. For that, the user needs the IDs of the IPAddress objects in question, so the CSV data needs to be amended:

id,address,status,dns_name
1,10.0.0.1/16,active,node1.zone1.example.com
2,10.0.0.2/16,active,node2.zone1.example.com
3,10.0.0.3/16,active,node3.zone1.example.com
4,10.0.0.4/16,active,node4.zone1.example.com
5,10.0.0.5/16,active,node5.zone1.example.com
6,10.0.0.6/16,active,node6.zone1.example.com
7,10.0.0.7/16,active,node7.zone1.example.com
[...]
8,10.0.0.254/16,active,node254.zone1.example.com
9,10.0.1.1/16,active,node1.zone2.example.com
10,10.0.1.2/16,active,node2.zone2.example.com
11,10.0.1.3/16,active,node3.zone2.example.com
12,10.0.1.4/16,active,node4.zone2.example.com
13,10.0.1.5/16,active,node5.zone2.example.com
14,10.0.1.6/16,active,node6.zone2.example.com
15,10.0.1.7/16,active,node7.zone2.example.com
[...]
16,10.0.1.254/16,active,node254.zone2.example.com
[...]
17,10.0.16.1/16,active,node1.zone16.example.com
18,10.0.16.2/16,active,node2.zone16.example.com
19,10.0.16.3/16,active,node3.zone16.example.com
20,10.0.16.4/16,active,node4.zone16.example.com
21,10.0.16.5/16,active,node5.zone16.example.com
22,10.0.16.6/16,active,node6.zone16.example.com
23,10.0.16.7/16,active,node7.zone16.example.com
[...]
24,10.0.16.254/16,active,node254.zone16.example.com
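Assembling that amended data set by hand is tedious: the exported IDs have to be joined back onto the original rows, e.g. keyed on the address column. A sketch of that workaround, assuming the export contains at least `id` and `address` columns (the function name is hypothetical):

```python
import csv
import io

def amend_with_ids(original_csv, exported_csv, key="address"):
    """Prepend the exported object IDs to the original rows, matched on `key`."""
    # Build a lookup table from the exported data: key column -> object ID.
    ids = {
        row[key]: row["id"]
        for row in csv.DictReader(io.StringIO(exported_csv))
    }
    reader = csv.DictReader(io.StringIO(original_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id"] + reader.fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow({"id": ids[row[key]], **row})
    return out.getvalue()

# The corrected data (header fixed to "dns_name") merged with the exported IDs:
original = "address,status,dns_name\n10.0.0.1/16,active,node1.zone1.example.com\n"
exported = "id,address\n1,10.0.0.1/16\n"
amended = amend_with_ids(original, exported)
```

This is exactly the extra round trip the dry run would avoid: with up-front validation, the header typo is caught before any objects exist.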

A dry run would have returned a message like 'Field "dnsname" is unknown and will not be imported' (provided #13777 gets implemented) without actually importing anything, thereby giving the user the chance to fix the issue by correcting the header field.

@jeremystretch
Member

Ok, I think I understand the concern better, thanks. I believe this would be addressed by #11617, which seeks to raise a validation error on the presence of an unrecognized column header.

In general I don't like the concept of dry runs because in the best case scenario, they require wasting time, and in the worst the user forgets to utilize them in the first place.

@peteeckel
Contributor Author

Absolutely agreed, but in #13773 @pv2b answered that silently ignoring this kind of error was a feature and not a bug, and suggested the dry-run feature as a way to solve the issue. I'd prefer the error message, combined with not accepting erroneous data, as well.

@jeremystretch
Member

in #13773 @pv2b answered that silently ignoring this kind of error was a feature and not a bug

I'll admit it's a bit subjective, but I'd prefer to treat it as a bug per the principle of least astonishment.

@peteeckel
Contributor Author

I'll admit it's a bit subjective, but I'd prefer to treat it as a bug per the principle of least astonishment.

Since I was quite astonished when I stumbled across this behaviour today, I'm totally with you on that. Especially since in many cases you won't even notice that something is missing, e.g. when the misspelled column is not among the columns displayed in the table that appears after the import.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 15, 2023