Merge pull request #39 from nexB/37-basic-scan
Add new pipeline for codebase scan for #37

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
pombredanne authored Nov 10, 2020
2 parents 62dbe3a + 8a22e7f commit f03d04b
Showing 20 changed files with 500 additions and 50 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,9 @@

### v1.0.4 (unreleased)

- Add new ScanCodebase pipeline for codebase scan
Fix for https://github.com/nexB/scancode.io/issues/37

- Upgrade Django, Metaflow, and ScanCode-toolkit to latest versions

### v1.0.3 (2020-09-24)
2 changes: 2 additions & 0 deletions docs/index.rst
Expand Up @@ -17,6 +17,7 @@ First application is for Docker container and VM composition analysis.
:caption: Tutorial

scanpipe-tutorial-1
scanpipe-tutorial-2

.. toctree::
:maxdepth: 2
Expand All @@ -25,6 +26,7 @@ First application is for Docker container and VM composition analysis.
scanpipe-concepts
scanpipe-command-line
scanpipe-api
scancodeio-settings
offline-installation

Indices and tables
22 changes: 22 additions & 0 deletions docs/scancodeio-settings.rst
@@ -0,0 +1,22 @@
.. _scancodeio_settings:

ScanCode.io Settings
====================

The ``.env`` file is created at the root of the ScanCode.io codebase during its
installation.
You can configure your preferences using the following settings in the ``.env``
file.

SCANCODE_DEFAULT_OPTIONS
------------------------

Use this setting to provide default options for running the scancode-toolkit.

Refer to `ScanCode-toolkit Available Options <https://scancode-toolkit.readthedocs.io/en/latest/cli-reference/list-options.html>`_
for the full options list.

The following example explicitly defines a value for timeout and sets the number
of parallel processes to 4::

SCANCODE_DEFAULT_OPTIONS=--processes 4,--timeout 60
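As a rough illustration of how a comma-separated value like the one above ends up as a list of options, the sketch below mimics the splitting in plain Python. The `parse_option_list` helper is hypothetical and only approximates what `env.list` does in `scancodeio/settings/base.py`; the actual parsing is handled by the `environ` library, and its exact whitespace handling is not assumed here.

```python
def parse_option_list(raw_value):
    """Split a comma-separated setting value into a list of non-empty items.

    Hypothetical helper: an approximation of list parsing, not the
    library's own implementation.
    """
    return [item.strip() for item in raw_value.split(",") if item.strip()]


# Each comma-separated chunk becomes one entry of the options list.
options = parse_option_list("--processes 4,--timeout 60")
print(options)
```

An empty value yields an empty list, which matches the `default=[]` fallback declared for the setting.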
2 changes: 2 additions & 0 deletions docs/scanpipe-command-line.rst
Expand Up @@ -54,6 +54,8 @@ Optional arguments:

- ``--input INPUTS`` Input file locations to copy in the :guilabel:`input/` workspace directory.

- ``--run`` Start running the pipelines right after project creation.

.. warning::
The pipelines are added and will be running in the order of the provided options.

18 changes: 10 additions & 8 deletions docs/scanpipe-tutorial-1.rst
Expand Up @@ -13,8 +13,8 @@ Requirements
Before you start
----------------

- Download the following test Docker image and save this in your home directory:
https://github.com/nexB/scancode.io-tutorial/releases/download/sample-images/30-alpine-nickolashkraus-staticbox-latest.tar
Download the following test Docker image and save this in your home directory:
`30-alpine-nickolashkraus-staticbox-latest.tar <https://github.com/nexB/scancode.io-tutorial/releases/download/sample-images/30-alpine-nickolashkraus-staticbox-latest.tar>`_


Step-by-step
Expand Down Expand Up @@ -49,7 +49,7 @@ Step-by-step
The ``scanpipe show-pipeline`` command lists all the pipelines added to the
project and their planned runs.
You can use this to get a quick overview of the pipelines that have been running already
(with their success "S" or fail status "F") and those that will be running next.
(with their "SUCCESS" or "FAILURE" status) and those that will be running next.

- Run the docker pipeline on this project::

Expand All @@ -59,7 +59,7 @@ Step-by-step
pipeline run::

$ scanpipe show-pipeline --project staticbox
"[S] scanpipe/pipelines/docker.py"
"[SUCCESS] scanpipe/pipelines/docker.py"

- Get the results of the pipeline run as a JSON file using the ``output`` command::

Expand All @@ -72,8 +72,10 @@ Step-by-step
.. note::
The ``inputs`` and ``pipelines`` can be provided directly at once when
calling the ``create-project`` command.
For example, this command will create a project named ``p2``, copy our test
docker image to the project's inputs, and add the docker pipeline in one
operation::
A ``run`` option is also available to start the pipeline execution right
after the project creation.
For example, the following command will create a project named ``p2``,
copy the test docker image to the project's inputs, add the docker pipeline,
and execute the pipeline run in one operation::

$ scanpipe create-project p2 --input ~/30-alpine-nickolashkraus-staticbox-latest.tar --pipeline scanpipe/pipelines/docker.py
$ scanpipe create-project p2 --input ~/30-alpine-nickolashkraus-staticbox-latest.tar --pipeline scanpipe/pipelines/docker.py --run
45 changes: 45 additions & 0 deletions docs/scanpipe-tutorial-2.rst
@@ -0,0 +1,45 @@
.. _scanpipe_tutorial_2:

Scan Codebase (command line)
============================

Requirements
------------

- **ScanCode.io is installed**, see :ref:`installation`
- **Shell access** on the machine where ScanCode.io is installed


Before you start
----------------

Download the following package archive and save it in your home directory:
`asgiref-3.3.0-py3-none-any.whl <https://files.pythonhosted.org/packages/c0/e8/578887011652048c2d273bf98839a11020891917f3aa638a0bc9ac04d653/asgiref-3.3.0-py3-none-any.whl>`_


Step-by-step
------------

- Open a shell in the ScanCode.io installation directory and activate the virtualenv::

$ source bin/activate

- The following command will create a new project named ``asgiref``,
add the archive as an input for the project,
add the ``scan_codebase`` pipeline, and run its execution::

$ scanpipe create-project asgiref \
--input asgiref-3.3.0-py3-none-any.whl \
--pipeline scanpipe/pipelines/scan_codebase.py \
--run

.. note::
The content of the :guilabel:`input/` directory will be copied to the
:guilabel:`codebase/` directory, where ``extractcode`` will be run before
running ``scancode``.
Alternatively, the codebase content can be manually copied to the
:guilabel:`codebase/` directory, in which case the ``--input`` option can be
omitted.

- The scan results as JSON and CSV will be available in the project
:guilabel:`output/` directory.
5 changes: 4 additions & 1 deletion scancodeio/settings/base.py
Expand Up @@ -151,7 +151,6 @@

USE_TZ = True


# Static files (CSS, JavaScript, Images)

STATIC_URL = "/static/"
Expand Down Expand Up @@ -200,3 +199,7 @@
env = environ.Env()

SECRET_KEY = env.str("SECRET_KEY")

# ScanCode.io custom settings

SCANCODE_DEFAULT_OPTIONS = env.list("SCANCODE_DEFAULT_OPTIONS", default=[])
2 changes: 2 additions & 0 deletions scanpipe/api/serializers.py
Expand Up @@ -110,6 +110,7 @@ class Meta:
"created_date",
"pipeline",
"input_root",
"output_root",
"next_run",
"runs",
"extra_data",
Expand All @@ -118,6 +119,7 @@ class Meta:
)
exclude_from_list_view = [
"input_root",
"output_root",
"extra_data",
"codebase_resources_summary",
"discovered_package_summary",
5 changes: 3 additions & 2 deletions scanpipe/apps.py
Expand Up @@ -37,11 +37,12 @@ def ready(self):
"""
project_root = Path(__file__).parent.parent.absolute()
pipelines_dir = project_root / "scanpipe" / "pipelines"
dot_py_suffix = ".py"

for child in pipelines_dir.iterdir():
if child.name.endswith(".py") and not child.name.startswith("_"):
if child.name.endswith(dot_py_suffix) and not child.name.startswith("_"):
location = str(child.relative_to(project_root))
name = child.name.rstrip(".py")
name = child.name[: -len(dot_py_suffix)]
self.pipelines.append((location, name))

def is_valid(self, pipeline):
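The switch from `rstrip(".py")` to slicing in `scanpipe/apps.py` matters because `str.rstrip` treats its argument as a set of characters rather than a literal suffix. The sketch below shows the difference; the `strip_py_suffix` helper and the `happy.py` filename are hypothetical and used only for illustration.

```python
def strip_py_suffix(filename, suffix=".py"):
    """Remove a literal suffix by slicing, in the spirit of the updated code."""
    if filename.endswith(suffix):
        return filename[: -len(suffix)]
    return filename


# rstrip removes any trailing run of the characters ".", "p", and "y",
# so it can eat into the module name itself.
print("happy.py".rstrip(".py"))
# Slicing removes exactly the suffix and nothing more.
print(strip_py_suffix("happy.py"))
```

Names such as `docker.py` happen to survive `rstrip` because the character before the suffix is not in the stripped set, which is why the problem can go unnoticed.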
13 changes: 13 additions & 0 deletions scanpipe/management/commands/create-project.py
Expand Up @@ -22,6 +22,7 @@

from django.core.exceptions import ValidationError
from django.core.management import CommandError
from django.core.management import call_command
from django.core.management.base import BaseCommand

from scanpipe.management.commands import copy_inputs
Expand Down Expand Up @@ -52,11 +53,17 @@ def add_arguments(self, parser):
default=list(),
help="Input file locations to copy in the input/ work directory.",
)
parser.add_argument(
"--run",
action="store_true",
help="Start running the pipelines right after project creation.",
)

def handle(self, *args, **options):
name = options["name"]
pipelines = options["pipelines"]
inputs = options["inputs"]
run = options["run"]

project = Project(name=name)
try:
Expand All @@ -68,6 +75,9 @@ def handle(self, *args, **options):
validate_pipelines(pipelines)
validate_inputs(inputs)

if run and not pipelines:
raise CommandError("The --run option requires one or more pipelines.")

project.save()
msg = f"Project {name} created with work directory {project.work_directory}"
self.stdout.write(self.style.SUCCESS(msg))
Expand All @@ -76,3 +86,6 @@ def handle(self, *args, **options):
project.add_pipeline(pipeline_location)

copy_inputs(inputs, project.input_path)

if run:
call_command("run", project=project, stderr=self.stderr, stdout=self.stdout)
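The `--run` handling above can be sketched outside Django with plain `argparse`. This is a minimal approximation, not the command's actual code: the `build_parser` and `validate` names are hypothetical, the `--pipeline` option is assumed to append to a `pipelines` list, and `ValueError` stands in for Django's `CommandError`.

```python
import argparse


def build_parser():
    """Build a parser resembling the create-project options (assumed shape)."""
    parser = argparse.ArgumentParser(prog="create-project")
    parser.add_argument("name")
    parser.add_argument("--pipeline", action="append", dest="pipelines", default=[])
    # store_true makes --run a boolean flag that defaults to False.
    parser.add_argument("--run", action="store_true")
    return parser


def validate(options):
    """Reject --run when no pipeline was provided, before anything is saved."""
    if options.run and not options.pipelines:
        raise ValueError("The --run option requires one or more pipelines.")
    return options


options = validate(
    build_parser().parse_args(["p2", "--pipeline", "docker.py", "--run"])
)
print(options.run, options.pipelines)
```

Running the check before `project.save()`, as the diff does, means an invalid combination of options does not leave a half-configured project behind.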
4 changes: 1 addition & 3 deletions scanpipe/management/commands/run.py
Expand Up @@ -53,9 +53,7 @@ def handle(self, *args, **options):
if not run:
raise CommandError(f"No pipelines to {action} on project {self.project}")

msg = f"Pipeline {run.pipeline} {action} in progress..."
self.stdout.write(msg)

self.stdout.write(f"Pipeline {run.pipeline} {action} in progress...")
getattr(run, task_function)()

run.refresh_from_db()
4 changes: 2 additions & 2 deletions scanpipe/management/commands/show-pipeline.py
Expand Up @@ -36,7 +36,7 @@ def handle(self, *args, **options):
def get_run_status_code(self, run):
status = " "
if run.task_succeeded:
status = self.style.SUCCESS("S")
status = self.style.SUCCESS("SUCCESS")
elif run.task_exitcode and run.task_exitcode > 0:
status = self.style.ERROR("F")
status = self.style.ERROR("FAILURE")
return status
24 changes: 21 additions & 3 deletions scanpipe/models.py
Expand Up @@ -257,15 +257,29 @@ def input_files(self):
if path.is_file()
]

@staticmethod
def get_root_content(directory):
"""
Return the list of all files and directories of the `directory`.
Only the first level children are listed.
"""
return [str(path.relative_to(directory)) for path in directory.glob("*")]

@property
def input_root(self):
"""
Return the list of all files and directories of the input/ directory.
Only the first level children are listed.
"""
return [
str(path.relative_to(self.input_path)) for path in self.input_path.glob("*")
]
return self.get_root_content(self.input_path)

@property
def output_root(self):
"""
Return the list of all files and directories of the output/ directory.
Only the first level children are listed.
"""
return self.get_root_content(self.output_path)

def add_input_file(self, file_object):
"""
Expand Down Expand Up @@ -636,6 +650,10 @@ class Meta:
def __str__(self):
return self.package_url or str(self.uuid)

@property
def purl(self):
return self.package_url

@classmethod
def create_from_data(cls, project, package_data):
"""
Expand Down
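The `get_root_content` helper added to `scanpipe/models.py` lists only the first-level children of a directory. The standalone sketch below reproduces that behavior against a temporary directory; the file names are made up, and the `sorted` call is an addition for a stable order that the original method does not include.

```python
import tempfile
from pathlib import Path


def get_root_content(directory):
    """Return the first-level children of `directory` as relative paths."""
    # sorted() is an assumption added here for deterministic output.
    return sorted(str(path.relative_to(directory)) for path in directory.glob("*"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "results.json").write_text("{}")
    (root / "nested").mkdir()
    # Files below the first level are not reported by glob("*").
    (root / "nested" / "deep.txt").write_text("ignored")
    listing = get_root_content(root)
    print(listing)
```

Because `glob("*")` is not recursive, `nested/deep.txt` does not appear in the listing, only `nested` itself.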
Expand Up @@ -28,12 +28,12 @@

from scanpipe.pipelines import Pipeline
from scanpipe.pipelines import step
from scanpipe.pipes import scancode_utils
from scanpipe.pipes import scancode


class CollectInventoryFromScanCodeScan(Pipeline):
class LoadInventoryFromScanCodeScan(Pipeline):
"""
A pipeline to load a files and packages inventory from a ScanCode JSON scan
A pipeline to load a files and packages inventory from a ScanCode JSON scan.
(assumed to contain file information and package scan data).
"""

Expand All @@ -43,6 +43,17 @@ def start(self):
Load the Project instance.
"""
self.project = self.get_project(self.project_name)
self.next(self.get_scan_json_input)

@step
def get_scan_json_input(self):
"""
Locate the JSON scan input from the project input/ directory.
"""
inputs = list(self.project.inputs(pattern="*.json"))
if len(inputs) != 1:
raise Exception("Only 1 JSON input file supported")
self.input_location = str(inputs[0].absolute())
self.next(self.build_inventory_from_scan)

@step
Expand All @@ -51,9 +62,9 @@ def build_inventory_from_scan(self):
Process a JSON scan to populate resources and packages.
"""
project = self.project
scanned_codebase = scancode_utils.get_virtual_codebase(project)
scancode_utils.create_codebase_resources(project, scanned_codebase)
scancode_utils.create_discovered_packages(project, scanned_codebase)
scanned_codebase = scancode.get_virtual_codebase(project, self.input_location)
scancode.create_codebase_resources(project, scanned_codebase)
scancode.create_discovered_packages(project, scanned_codebase)
self.next(self.end)

@step
Expand All @@ -64,4 +75,4 @@ def end(self):


if __name__ == "__main__":
CollectInventoryFromScanCodeScan()
LoadInventoryFromScanCodeScan()
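The new `get_scan_json_input` step requires exactly one JSON file among the project inputs. A minimal standalone sketch of that check follows, assuming a plain directory instead of a `Project` instance; the `get_single_json_input` name is hypothetical, and `ValueError` is used here where the pipeline raises a bare `Exception`.

```python
import tempfile
from pathlib import Path


def get_single_json_input(input_dir):
    """Return the absolute path of the only *.json file in `input_dir`."""
    inputs = list(Path(input_dir).glob("*.json"))
    if len(inputs) != 1:
        raise ValueError("Only 1 JSON input file supported")
    return str(inputs[0].absolute())


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "scan.json").write_text("{}")
    # Non-JSON files are ignored by the "*.json" pattern.
    (Path(tmp) / "notes.txt").write_text("not a scan")
    location = get_single_json_input(tmp)
    print(Path(location).name)
```

Both zero and several JSON files trigger the same error, so the step fails early instead of guessing which scan to load.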