Merge pull request #12 from sarnold/yagrep
add support for simple control ID analysis
sarnold authored Mar 12, 2024
2 parents 511577b + f502a9d commit 526ad3a
Showing 23 changed files with 1,582 additions and 75 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -22,7 +22,7 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-20.04, macos-latest, windows-latest]
python-version: [3.7, 3.8, 3.9, '3.10']
python-version: [3.8, 3.9, '3.10', '3.11']
steps:
- name: Set git crlf/eol
run: |
4 changes: 2 additions & 2 deletions .github/workflows/coverage.yml
@@ -150,11 +150,11 @@ jobs:
- name: Setup old python for test
uses: actions/setup-python@v4
with:
python-version: 3.7
python-version: 3.8

- name: Generate coverage
run: |
tox -e coverage,py37,py311
tox -e coverage,py38,py311
- name: Code Coverage Summary Report (data)
uses: irongut/CodeCoverageSummary@v1.3.0
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -19,7 +19,7 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-20.04, macos-latest, windows-latest]
python-version: [3.7, 3.9, '3.11']
python-version: [3.8, 3.9, '3.11']

steps:
- name: Set git crlf/eol
2 changes: 1 addition & 1 deletion .github/workflows/wheels.yml
@@ -19,7 +19,7 @@ jobs:
fail-fast: false
matrix:
os: [ubuntu-20.04, macos-latest, windows-latest]
python-version: [3.7, 3.8, 3.9, '3.10']
python-version: [3.8, 3.9, '3.10', '3.11']

steps:
- name: Set git crlf/eol
2 changes: 2 additions & 0 deletions .gitignore
@@ -7,10 +7,12 @@ __pycache__/
src/ymltoxml/_version.py
.ymltoxml.y*
.yasort.y*
.yagrep.y*
in.*
out.*
sorted-out/
munch/
nested_lookup/

# C extensions
*.so
119 changes: 99 additions & 20 deletions README.rst
@@ -10,9 +10,9 @@

Python command line tools to convert between XML_ files and YAML_ files,
preserving attributes and comments (with minor corrections). The default
file encoding for both types is UTF-8 without a BOM. Includes another
console entry point to sort large YAML lists (eg, lists of rules found
in the `SCAP Security Guide`_).
file encoding for both types is UTF-8 without a BOM. Now includes more
console entry points to grep or sort interesting YAML files (eg, lists
of rules found in the `SCAP Security Guide`_).

.. _SCAP Security Guide: https://github.com/ComplianceAsCode/content

@@ -47,8 +47,8 @@ idiom to install it on your system in a virtual env after cloning::
The alternative to python venv is the ``tox`` test driver. If you have it
installed already, see the example tox commands below.

Usage
-----
ymltoxml
--------

The current version supports minimal command options; if no options are
provided, the only required arguments are one or more files of a single
@@ -102,20 +102,75 @@ configuration file, do::
$ ymltoxml --dump-config > .ymltoxml.yaml
$ $EDITOR .ymltoxml.yaml

An additional helper script is now provided for sorting large (YAML) lists.
The new ``yasort`` script uses its own configuration file, creatively named
``yasort.yaml``. The above applies equally to this new config file.
yagrep
------

A new helper script is now included for searching keys or values in
YAML files. The ``yagrep`` script also has its own built-in config
file, which can be copied and edited as shown above. In this case the
script is intended to feel more-or-less like ``grep`` so the default
config should Just Work. That said, the script uses the ``dpath``
python library, so you may need to change the default "path" separator
if your data has keys containing forward slashes (see the `upstream
docs`_ for details).

General usage guidelines:

* use the ``-f`` (filter) arg to search for a value string
* follow the (json) output from above to find the key name
* then use the ``-l`` (lookup) arg to extract the values for the above key
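
The filter-then-lookup workflow above can be sketched in plain Python. This is only an illustration of the idea on nested data, not the yagrep implementation (which, per the text above, builds on ``dpath``). The helper names ``filter_values`` and ``lookup_key`` and the sample data are hypothetical.

```python
def filter_values(node, text):
    """Return the parts of ``node`` whose leaf values contain ``text``."""
    if isinstance(node, dict):
        kept = {k: filter_values(v, text) for k, v in node.items()}
        return {k: v for k, v in kept.items() if v not in (None, {}, [])}
    if isinstance(node, list):
        kept = [filter_values(v, text) for v in node]
        return [v for v in kept if v not in (None, {}, [])]
    # Leaf value: keep it only if the search text occurs in it.
    return node if text in str(node) else None


def lookup_key(node, key):
    """Collect every value stored under ``key`` anywhere in ``node``."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(lookup_key(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(lookup_key(item, key))
    return found


# Hypothetical sample data shaped like a small controls file.
data = {
    "controls": [
        {"id": "AC-1", "rules": ["sshd_set_idle_timeout"]},
        {"id": "AC-2", "rules": ["account_disable_post_pw_expiration"]},
    ]
}

# Step 1: filter by a value string; the surviving keys show where it lives.
matches = filter_values(data, "sshd")
# Step 2: look up the key name found in step 1.
rule_lists = lookup_key(data, "rules")
print(matches)
print(rule_lists)
```

The filtered output keeps the enclosing key names, which is what makes the follow-up lookup by key possible.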

Useful yagrep config file settings:

:default_separator: change the path separator to something like ``;`` if data
                    has forward slashes
:output_format: set the output format to ``raw`` for unformatted output
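
To see why the separator setting matters, consider a key that itself contains forward slashes. The sketch below is plain Python rather than ``dpath`` itself; ``get_by_path`` is a hypothetical helper that only mimics splitting a path string on a separator, and the sample data is made up.

```python
def get_by_path(data, path, separator="/"):
    """Walk nested dicts using a separator-delimited path string."""
    node = data
    for part in path.split(separator):
        node = node[part]
    return node


# Hypothetical data where one key is a file path containing '/'.
data = {"files": {"/etc/ssh/sshd_config": {"mode": "0600"}}}

# With '/' as the separator the key is split into pieces and the walk fails.
try:
    get_by_path(data, "files//etc/ssh/sshd_config/mode")
    slash_ok = True
except KeyError:
    slash_ok = False

# With ';' as the separator the key survives intact.
mode = get_by_path(data, "files;/etc/ssh/sshd_config;mode", separator=";")
print(slash_ok, mode)
```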

::

$ yasort
  $ yagrep -h
  usage: yagrep [-h] [--version] [-v] [-d] [-s] [-f | -l] TEXT FILE [FILE ...]

  Search in YAML files for keys and values.

  positional arguments:
    TEXT               Text string to look for (one-only, required) (default:
                       None)
    FILE               Look in file(s) for text string (at least one, required)
                       (default: None)

  options:
    -h, --help         show this help message and exit
    --version          show program's version number and exit
    -v, --verbose      Display more processing info (default: False)
    -d, --dump-config  Dump default configuration file to stdout (default:
                       False)
    -s, --save-config  save active config to default filename (.yagrep.yml) and
                       exit (default: False)
    -f, --filter       Filter out data not matching input string (no paths)
                       (default: False)
    -l, --lookup       Lookup by key and return list of values for any matches
                       (default: False)


.. _upstream docs: https://github.com/dpath-maintainers/dpath-python

yasort
------

Another helper script is included for sorting large (YAML) lists.
The ``yasort`` script also uses its own configuration file, creatively named
``.yasort.yaml``. The above applies equally to this config file.

::

$ yasort -h
usage: yasort [-h] [--version] [-v] [-d] [-s] [FILE ...]

Sort YAML lists and write new files.

positional arguments:
FILE Process input file(s) to target directory (default:
None)
FILE Process input file(s) to target directory (default: None)

options:
-h, --help show this help message and exit
@@ -126,18 +181,42 @@ The new ``yasort`` script uses its own configuration file, creatively named
-s, --save-config save active config to default filename (.yasort.yml) and
exit (default: False)

All of the optional arguments for ``yasort`` are essentially orthogonal to
sorting, thus the only required argument for normal usage is one or more
input files. All of the user settings are in the default configuration file
shown below; use the ``--save-config`` option to create your own config file.

Default yasort.yaml:

.. code-block:: yaml

  ---
  # comments should be preserved
  file_encoding: 'utf-8'
  default_yml_ext: '.yaml'
  output_dirname: 'sorted-out'
  default_parent_key: 'controls'
  default_sort_key: 'rules'
  has_parent_key: true
  preserve_quotes: true
  process_comments: false
  mapping: 4
  sequence: 6
  offset: 4

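
As a rough sketch of what these defaults imply, assume (as the key names suggest) that the input has a top-level ``controls`` list whose items each carry a ``rules`` list. Sorting could then look like the following. The function ``sort_rules`` and the sample data are hypothetical; this is not the yasort source.

```python
def sort_rules(data, parent_key="controls", sort_key="rules", has_parent_key=True):
    """Return a copy of ``data`` with each ``sort_key`` list sorted."""
    # Assumption: without a parent key, the document itself holds the list.
    items = data[parent_key] if has_parent_key else [data]
    sorted_items = [
        {**item, sort_key: sorted(item[sort_key])} if sort_key in item else dict(item)
        for item in items
    ]
    return {**data, parent_key: sorted_items} if has_parent_key else sorted_items[0]


# Hypothetical profile fragment with an unsorted rule list.
profile = {
    "controls": [
        {"id": "AC-1", "rules": ["sshd_b", "sshd_a", "audit_c"]},
        {"id": "AC-2", "notes": "no rules here"},
    ]
}

result = sort_rules(profile)
print(result["controls"][0]["rules"])
```

Returning a new mapping instead of sorting in place keeps the input available for a before/after diff, which matches the stated goal of making duplicates easier to spot.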
Features and limitations
------------------------

We only test on mavlink XML message definitions, so it probably *will not*
work at all on arbitrarily complex XML files with namespaces, etc. The
current round-trip is not exact, due to the following:
We mainly test on mavlink XML message definitions and NIST/SSG YAML files,
so round-trip conversion *may not* work at all on arbitrarily complex XML
files with namespaces, etc. The current round-trip is not exact, due to
the following:

* missing encoding is added to version tag
* leading/trailing whitespace in text elements and comments is not preserved
* elements with self-closing tags are converted to full closing tags
* empty elements on more than one line are not preserved
* XML - elements with self-closing tags are converted to full closing tags
* XML - empty elements on more than one line are not preserved

For the files tested (eg, mavlink) the end result is cleaner/shinier XML.

@@ -172,13 +251,13 @@ only Git, Python, and Tox.
SCAP support
------------

The yasort tool is also intended to be part of a larger workflow, mainly
The yasort/yagrep tools are intended to be part of a larger workflow, mainly
working with SCAP content, ie, the scap-security-guide source files (or
just content_). It is currently used to sort profiles with large numbers
of rules to make it easier to visually diff and spot duplicates, etc.

The configuration file defaults are based on existing yaml structure, but
you are free to change them for another use case. To adjust how the sorting
The yasort configuration file defaults are based on existing yaml structure,
but feel free to change them for another use case. To adjust how the sorting
works, make a local config file (see above) and edit as needed the following
options:

@@ -189,7 +268,7 @@ options:
:default_yml_ext: change the output file extension

The rest of the options are for YAML formatting/flow style (see the ruamel_
documetation for formatting details)
documentation for formatting details)

.. _content: https://complianceascode.readthedocs.io/en/latest/
.. _ruamel: https://yaml.readthedocs.io/en/latest/
2 changes: 2 additions & 0 deletions requirements.txt
@@ -1,6 +1,8 @@
importlib-metadata; python_version < '3.8'
importlib-resources; python_version < '3.10'
dpath
munch
nested-lookup
PyYAML
ruamel.yaml
xmltodict
53 changes: 53 additions & 0 deletions scripts/analyze_control_ids.py
@@ -0,0 +1,53 @@
"""
Simple ID string counter.
"""

import os
import sys
import typing
from collections import Counter
from pathlib import Path

from ymltoxml.utils import get_profile_sets

id_count: typing.Counter[str] = Counter()
FILE = os.getenv('ID_FILE', default='tests/data/PRIVACY-ids.txt')
DEBUG = os.getenv('DEBUG', default=None)
SELFTEST = os.getenv('SELFTEST', default=None)

if not Path(FILE).exists():
    print(f'Input file {FILE} not found!')
    sys.exit(1)

input_ids = list(Path(FILE).read_text(encoding='utf-8').splitlines())
in_set = set(input_ids)

print(f"Input control IDs -> {len(in_set)}")
if DEBUG:
    print(sorted(in_set))
if SELFTEST:
    id_sets, id_names = get_profile_sets('tests/data')
else:
    id_sets, id_names = get_profile_sets('800-53-control-ids/nist')

for id_set, ptype in zip(id_sets, id_names):
    print(f"\n{ptype} profile control IDs -> {len(id_set)}")

    print(f"Input set is in {ptype} set: {id_set > in_set}")
    common_set = sorted(id_set & in_set)
    print(f"Num input controls in {ptype} set -> {len(common_set)}")
    not_in_set = sorted(in_set - id_set)
    print(f"Num input controls not in {ptype} set -> {len(not_in_set)}")
    if DEBUG:
        print(f"Input controls not in {ptype} set: {not_in_set}")

print(f"\n{id_names[2]} set is in {id_names[0]} set: {id_sets[0] > id_sets[2]}")
print(f"{id_names[2]} set is in {id_names[1]} set: {id_sets[1] > id_sets[2]}")
print(f"{id_names[1]} set is in {id_names[0]} set: {id_sets[0] > id_sets[1]}")
print(f"{id_names[3]} set is in {id_names[0]} set: {id_sets[0] > id_sets[3]}")

if DEBUG:
    not_in_high = sorted(in_set - id_sets[0])
    print("\nInput controls not in HIGH set\n")
    for ctl_id in not_in_high:
        print(ctl_id)
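
The "is in" checks in this script rely on Python's built-in set operators. As a brief illustration with made-up control IDs (not taken from the test data): ``>`` is a proper-superset test, so it is false when the two sets are equal, whereas ``>=`` also accepts equality.

```python
# Hypothetical control ID sets, for illustration only.
high = {"AC-1", "AC-2", "AC-3"}
moderate = {"AC-1", "AC-2"}
input_ids = {"AC-2", "AC-3", "ZZ-9"}

proper = high > moderate        # proper superset
equal_proper = high > set(high) # equal sets are not a proper superset
equal_or_super = high >= set(high)

common = sorted(high & input_ids)    # IDs present in both sets
missing = sorted(input_ids - high)   # input IDs absent from the profile
print(proper, equal_proper, equal_or_super)
print(common, missing)
```

If an input list that exactly matches a profile should also count as "in" that profile, ``>=`` (or ``issuperset``) would be the operator to reach for; the script as committed uses ``>``.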
10 changes: 7 additions & 3 deletions setup.cfg
@@ -1,35 +1,38 @@
[metadata]
name = ymltoxml
version = attr: setuptools_scm.get_version
description = attr: ymltoxml.__description__
description = Console tools for YAML and XML processing with config files in YAML.
url = https://github.com/sarnold/ymltoxml
author = Stephen Arnold
author_email = nerdboy@gentoo.org
long_description = file: README.rst
long_description_content_type = text/rst; charset=UTF-8
license_expression = LGPL-2.1-or-later
license_files = LICENSE
license = LGPLv2+
classifiers =
Development Status :: 4 - Beta
Intended Audience :: Developers
Programming Language :: Python
Environment :: Console
Topic :: Software Development
Topic :: Software Development :: Testing
License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)

[options]
python_requires = >= 3.7
python_requires = >= 3.8

setup_requires =
setuptools_scm[toml]

install_requires =
importlib-metadata; python_version < '3.8'
importlib-resources; python_version < '3.10'
nested-lookup
xmltodict
munch
ruamel.yaml
PyYAML
dpath

packages = find_namespace:
package_dir =
@@ -46,6 +49,7 @@ ymltoxml.data =
console_scripts =
ymltoxml = ymltoxml.ymltoxml:main
yasort = ymltoxml.yasort:main
yagrep = ymltoxml.yagrep:main

# extra deps are included here mainly for local/venv installs using pip
# otherwise deps are handled via tox, ci config files or pkg managers
4 changes: 0 additions & 4 deletions src/ymltoxml/__init__.py
@@ -1,5 +1 @@
"""Console tools for YAML/XML processing with config files in YAML."""

__description__ = "Console tools for YAML/XML conversion and sorting."

__all__ = ["__description__"]
11 changes: 11 additions & 0 deletions src/ymltoxml/data/yagrep.yaml
@@ -0,0 +1,11 @@
---
# comments should be preserved
file_encoding: 'utf-8'
default_yml_ext: '.yaml'
default_separator: '/'
output_format: 'json'
preserve_quotes: true
process_comments: false
mapping: 4
sequence: 6
offset: 4