Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add simple TUF role metadata model #1112

Merged
merged 16 commits into from
Sep 10, 2020

Conversation

lukpueh
Copy link
Member

@lukpueh lukpueh commented Aug 18, 2020

This PR copies over the new api/metadata.py module (+ tests) added in #1060, which has shifted focus to implementing a Hashicorp Vault interface.

Co-authorship by @trishankatdatadog, @joshuagl, @sechkova and me plus note-worthy commit message details are preserved in a single squashed commit.

Missing TODOs items for this PR include at least.

  • add code documentation (docstrings)
  • finalize signature creation and verification interface
  • add input validatation

See doc header of api/metadata.py for more details


# Copyright 2020, New York University and the TUF contributors
# SPDX-License-Identifier: MIT OR Apache-2.0
""" Unit tests for api/metdata.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
""" Unit tests for api/metdata.py
""" Unit tests for api/metadata.py

@lukpueh lukpueh force-pushed the simple-tuf-api branch 3 times, most recently from be19a11 to ae23d32 Compare August 19, 2020 08:13
Add metadata module with container classes for TUF role metadata, including
methods to read/serialize/write from and to JSON, perform TUF-compliant
metadata updates, and create and verify signatures.

The 'Metadata' class provides a container for inner TUF metadata objects (Root,
Timestamp, Snapshot, Targets) (i.e. OOP composition)

The 'Signed' class provides a base class to aggregate common attributes (i.e.
version, expires, spec_version) of the inner metadata classes. (i.e. OOP
inheritance). The name of the class also aligns with the 'signed' field of
the outer metadata container.

Based on prior observations in TUF's sister project in-toto, this architecture
seems to well represent the metadata model as it is defined in the
specification (see in-toto/in-toto#98 and in-toto/in-toto#142 for related
discussions).

This commits also adds tests.

**TODO: See doc header TODO list**

**Additional design considerations**
(also in regards to prior sketches of this module)

 - Aims at simplicity, brevity and recognizability of the wireline metadata
   format.

 - All attributes that correspond to fields in TUF JSON metadata are public.
   There doesn't seem to be a good reason to protect them with leading
   underscores and use setters/getters instead, it just adds more code, and
   impedes recognizability of the wireline metadata format.

 - Although, it might be convenient to have short-cuts on the Metadata class
   that point to methods and attributes that are common to all subclasses of
   the contained Signed class (e.g. Metadata.version instead of
   Metadata.signed.version, etc.), this also conflicts with goal of
   recognizability of the wireline metadata. Thus we won't add such short-cuts
   for now. See:
   theupdateframework#1060 (comment)

 - Signing keys and a 'consistent_snapshot' boolean are not on the targets
   metadata class. They are a better fit for management code. See:
   theupdateframework#1060 (comment),
   and theupdateframework#660.

 - Does not use sslib schema checks (see TODO notes about validation in
   doc header)

 - Does not use existing tuf utils, such as make_metadata_fileinfo,
   build_dict_conforming_to_schema, if it is easy and more explicit to
   just re-implement the desired behavior on the metadata classes.

 - All datetime's are treated as UTC. Since timezone info is not captured in
   the wireline metadata format it should not be captured in the internal
   representation either.

 - Does not use 3rd-party dateutil package, in order to minimize dependency
   footprint, which is especially important for update clients which often have
   to vendor their dependencies.
   However, compatibility between the more advanced dateutil.relativedelta (e.g
   handles leap years automatically) and timedelta is tested.

 - Uses PEP8 indentation (4 space) and Google-style doc string instead of
   sslab-style. See
   secure-systems-lab/code-style-guidelines#20

 - Does not support Python =< 3.5

Co-authored-by: Trishank Karthik Kuppusamy <trishank.kuppusamy@datadoghq.com>
Co-authored-by: Joshua Lock <jlock@vmware.com>
Co-authored-by: Teodora Sechkova <tsechkova@vmware.com>
Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
@sechkova
Copy link
Contributor

I will be leaving some thoughts here while looking at the way the client side (Updater) may use the api/metadata.

  • There are several places in the code where Metadata needs to be loaded by passing a file object instead of a filename, in particular when Updater downloads temporary metadata files and verifies them. Should we plan for adding another method like read_from_json_object or we may use the Metadata() constructor directly by passing the file object already serialised as a dictionary and not complicating this api?

@lukpueh
Copy link
Member Author

lukpueh commented Aug 31, 2020

  • There are several places in the code where Metadata needs to be loaded by passing a file object instead of a filename, in particular when Updater downloads temporary metadata files and verifies them. Should we plan for adding another method like read_from_json_object or we may use the Metadata() constructor directly by passing the file object already serialised as a dictionary and not complicating this api?

Great idea! What about the following interface?

# Classmethod on all classes
def from_dict(json_dict):
    """
    Initialize object from Python dictionary, and call from_dict for any
    inner complex attribute that is represented by a class.

    In the case of Metadata.from_dict(), it must dispatch to the corresponding
    from_dict of the base class of Signed, identified by
    json_dict['signed']['_type'].

    NOTE: This is basically a mix of the constructor and load_from_json and
    the inverse of `as_dict` in the current version of metadata.py. """

# Classmethods on Metadata (and optionally on Signed)
def from_json(json_object):
    """Deserialize a json string and pass to from_dict(). """

def from_json_file(filename):
    """Read file from storage and pass to from_json(). """

@lukpueh lukpueh force-pushed the simple-tuf-api branch 3 times, most recently from a71b03e to 520bc20 Compare September 3, 2020 14:26
@lukpueh lukpueh changed the title Add simple TUF role metadata model (WIP) Add simple TUF role metadata model Sep 3, 2020
@lukpueh lukpueh marked this pull request as ready for review September 3, 2020 14:30
@lukpueh
Copy link
Member Author

lukpueh commented Sep 3, 2020

Hey @sechkova, I added an Metadata.from_json(metadat_json_string) in a7a33ce. Let me know if this is what you had in mind. 1cdae24 generally cleans up the deserialization interface as I proposed in #1112 (comment) (see detailed commit message).

@woodruffw, @joshuagl, @trishankatdatadog, I'd say this is ready for review aka. throwing rocks at it. It still misses some features but what's there is tested, documented and should serve our purposes for PEP 458. I think we can work off the todo list from metadata.py's doc header independently in follow-up PRs.

version: The metadata version number.
spec_version: The TUF specification version number (semver) the
metadata format adheres to.
expires: The metadata expiration date in 'YYYY-MM-DDTHH:MM:SSZ' format.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now a datetimeobject.

tuf/api/metadata.py Outdated Show resolved Hide resolved
and assign attributes later on.
- Fail on bad json metadata in write_to_json method, but with option to
disable check as there might be a justified reason to write WIP
metadata to json.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. what do others think about the ValidationMixin methods we use in in-toto? (also see Layout._validate_* methods for example usage)

I think it's a commendable way of doing this (credits to @SantiagoTorres)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That ValidationMixin looks nice! I'd be happy to see it used here in tuf

Copy link
Member

@trishankatdatadog trishankatdatadog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, @lukpueh! 🎉

I don't have strong opinions about this PR: it looks largely like the Metadata API I originally proposed, except for changes like removing KeyRing and Threshold so on (we can argue about that later). As long as @woodruffw and @joshuagl also approve, I'm 👍🏽

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice streamlining of the metadata API PR. Thanks @lukpueh! I have a few minor questions and comments inline. I'm also curious to understand why version and spec_version have different types? version: int and spec_version: str, is there any reason spec_version can't be an int?

tuf/api/metadata.py Outdated Show resolved Hide resolved

FIXME: Is this behavior expected? An alternative approach would be
to raise an exception if no signature is found for the keyid,
and/or if more than one sigantures are found for the keyid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and/or if more than one sigantures are found for the keyid.
and/or if more than one signatures are found for the keyid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the alternative approach of raising an exception if no signature is found, or more than one signature is found, for the keyid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I do too. Any advice on the exception type? At some point we should discuss exception taxonomy guidelines. But we shouldn't block on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't seem to have an exception in tuf.exceptions or securesystemslib.exceptions that would fit, we could either throw a generic exception or define a new one. Either way, it will probably need some cleanup once guidelines are defined.

cls.temporary_directory = tempfile.mkdtemp(dir=os.getcwd())

test_repo_data = os.path.join(
os.path.dirname(os.path.realpath(__file__)), 'repository_data')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8 space indentation on a continuation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Good question. I always thought PEP 8 required double-indentation for hanging indent, but it is only a suggestion, and also only to distinguish it from a subsequent nested block.

IMO it's a good idea to always differentiate line continuation indentation from functional indentation, even if there is no subsequent nested block. But I'm fine either way.

FYI, the google style guide does not say anything about double indent but has some examples that could be interpreted that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's a good idea to always differentiate line continuation indentation from functional indentation, even if there is no subsequent nested block. But I'm fine either way.

That seems like a reasonable coding style for us to adopt. We should document it and teach the linter about it 😃

and assign attributes later on.
- Fail on bad json metadata in write_to_json method, but with option to
disable check as there might be a justified reason to write WIP
metadata to json.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That ValidationMixin looks nice! I'd be happy to see it used here in tuf

# Convert 'expires' TUF metadata string to a datetime object, which is
# what the constructor expects and what we store. The inverse operation
# is implemented in 'to_dict'.
signed_dict['expires'] = iso8601.parse_date(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need an iso8601 dependency if we're only targeting Python 3.5+, can we just use datetime.fromisoformat() here?

@@ -3,7 +3,12 @@
# Copyright 2020, New York University and the TUF contributors
# SPDX-License-Identifier: MIT OR Apache-2.0
""" Unit tests for api/metdata.py

Skipped on Python < 3.6.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume the goal would be to revert this patch once all Python 2.7 supporting code has been removed?

tuf/api/metadata.py Outdated Show resolved Hide resolved
@lukpueh
Copy link
Member Author

lukpueh commented Sep 8, 2020

version: int and spec_version: str, is there any reason spec_version can't be an int?

Yes, spec_version is in semver format.

@lukpueh
Copy link
Member Author

lukpueh commented Sep 8, 2020

Thanks for the reviews, @joshuagl and @trishankatdatadog!. I addressed the minor comments in 0a9fc93, which I'm happy to squash with previous commits before merging. The more serious change of Metadata.verify behaviour is in a separate commit, to make review easier. :)

Copy link
Member

@joshuagl joshuagl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lukpueh, this looks solid. I'm very happy to see this long-discussed model implemented.

One minor typo in a comment, but otherwise this looks good to rebase and merge.

@@ -19,6 +17,7 @@
from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta

# TODO: Remove case handling when fully dropping support for versions >= 3.6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# TODO: Remove case handling when fully dropping support for versions >= 3.6
# TODO: Remove case handling when fully dropping support for versions < 3.6

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 thanks for catching this!


import iso8601
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Comment on lines +320 to +299
signed_dict['expires'] = datetime.strptime(
signed_dict['expires'],
"%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for adding this!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have three uses of iso8601.parse_date in the current codebase, so this might be a good candidate for a helper function in future.

This was referenced Sep 10, 2020
The new metadata module uses constructs that are only available
on Python >= 3.6 (typing, f-format strings, etc.).

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Add generic read from json class method that returns a Metadata
object with a signed field that contains the appropriate Signed
subclass, based on the signed._type field of the read metadata.

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Add simple as_json Metadata method and use it instead of repository
lib's internal _get_written_metadata function in write_to_json.

This commit further adds code documentation and the possibility to
write compact json by excluding whitespace to write_to_json, and
also removes a call to the sign method from write_to_json.

The commit also adds tests.

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Add simple methods to create or verify signatures of the
canonical_signed property of a Metadata object.

See corresponding docstrings for behavior and design
considerations.

The commit also adds tests and updates the test setup to load
some test keys into memory.

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Remove metadata factory on Signed class, for the sake of API
simplicity/non-ambiguity, i.e. it's enough to have one
way of loading any Metadata, that is:
Metadata.read_from_json

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Consistenly rename de/serialization interface methods, using
a 'from_' and 'to_' prefix.

read_from_json -> from_json_file
write_to_json  -> to_json_file
as_json        -> to_json
as_dict        -> to_dict
signed_bytes   -> to_canonical_bytes

The latter is also changed from a property to a method for
consistency with the other serialization methods.

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
This commit better separates the Metadata class model from the
Metadata wireline format, by tailoring the constructors
towards class-based parameters and adding an additional
factory classmethod that creates Metadata objects based on the
wireline json/dictionary metadata representation. (pythonic
way of constructor overloading).

This 'from_dict' factory method recurses into the 'from_dict'
methods of each contained complex field/attribute that is also
represented by a class. Currently 'signed' is the only such
attribute.

This commit further:
- Changes optional constructor keyword arguments to mandatory
positional arguments: Reduces code and simplifies usage by
restricting it. For now, users are unlikely to call
constructor directly anyway, but the 'from_dict' factory (or
its 'from_json_file' wrapper) instead.

- Removes Signed.__expiration (datetime) vs. Signed.expires
(datestring) dichotomy: Keeping only one representation of the
same attribute in memory makes the interface simpler and less
ambiguous. We choose the datetime object, because it is more
convenient to modify. Transformation from and to the string
format required by the tuf wireline format is performed in the
corresponding metadata de/serialization methods, i.e.
('to_dict' and 'from_dict').

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Add convenience wrapper that takes a json string and passes it
to from_dict to create a Metadata object.

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Change Metadata.verify(key) behavior to raise an exception if
none or multiple signatures for the passed key are found on the
Metadata object.

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
See:
Add root metadata class to new TUF metadata model theupdateframework#1137
Add classes for complex metadata fields theupdateframework#1139
Add input validation to simple metadata api theupdateframework#1140

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Use builtin datetime instead of external iso6801 for simple
datetime string parsing. Also see
theupdateframework#1065

Signed-off-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
@lukpueh
Copy link
Member Author

lukpueh commented Sep 10, 2020

Thanks again for the reviews! I just cleaned up the git history, fixed a minor inconsistency and removed the todo list after created GitHub issues for each item.
The diff of the new force-pushed tip to your last review can be seen here: https://github.com/lukpueh/tuf/compare/simple-tuf-api-before-rebase..simple-tuf-api

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants