Input format expression not considering $namespaces #2033

Open
fmigneault opened this issue Aug 23, 2024 · 2 comments

@fmigneault (Contributor)

Expected Behavior

When a file format is given as a JS expression (a workaround for common-workflow-language/cwl-v1.3#52), the format check should resolve any $namespaces prefixes in the evaluated format before comparing it with the input file's format.

Actual Behavior

The format check fails if the input file format and the evaluated format do not match exactly. Since $namespaces can be used to write equivalent formats (e.g., https://www.iana.org/assignments/media-types/application/geo+json and iana:application/geo+json), the two should be treated interchangeably. However, the format expression fails unless the evaluated format is written explicitly in its long form (i.e., the full URI).

Input formats submitted in job.yml, whether written as the long-form URI or with the namespace prefix, are both converted to the long-form URI by the time they reach the check below. This forces the JS expression to return the long-form URI for the input to be considered valid.

cwltool/cwltool/builder.py, lines 555 to 559 in 6d8c2a4:

check_format(
    datum,
    evaluated_format,
    self.formatgraph,
)
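The mismatch can be reproduced with a simplified model of that comparison. This sketch assumes that, for these two strings, the check reduces to plain string equality; the real `check_format` in cwl_utils likely also consults the format ontology graph, which is not modeled here. `formats_match` is a hypothetical name.

```python
# Simplified model of the format check (assumption): plain string equality only,
# ignoring any ontology lookup that cwl_utils may also perform.
def formats_match(actual: str, expected: str) -> bool:
    return actual == expected


# Format of the File as it reaches the check: already expanded to the long-form URI,
# whichever form job.yml used.
datum_format = "https://www.iana.org/assignments/media-types/application/geo+json"

# The expression result is compared as-is, without $namespaces resolution.
print(formats_match(datum_format, "iana:application/geo+json"))  # False
print(formats_match(datum_format, "https://www.iana.org/assignments/media-types/application/geo+json"))  # True
```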

However, for a user writing a CWL document that defines a $namespaces section, it is very counter-intuitive to have to use the long-form URI in the format expression only, when everywhere else accepts iana:application/geo+json.
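One way to get the expected behavior would be to expand namespace prefixes in the evaluated format before calling `check_format`. The sketch below is a hypothetical illustration, not cwltool's code: `expand_curie` and `NAMESPACES` are assumed names, and the mapping mirrors the `$namespaces` section of echo_features.cwl shown further down.

```python
from typing import Mapping

# Mirrors the $namespaces section of echo_features.cwl (assumed available at check time).
NAMESPACES = {"iana": "https://www.iana.org/assignments/media-types/"}


def expand_curie(value: str, namespaces: Mapping[str, str]) -> str:
    """Expand 'prefix:suffix' using namespaces; leave full URIs and unknown prefixes untouched."""
    prefix, sep, suffix = value.partition(":")
    # The '//' guard skips values like 'https://...' even if a namespace were named 'https'.
    if sep and prefix in namespaces and not suffix.startswith("//"):
        return namespaces[prefix] + suffix
    return value


evaluated_format = "iana:application/geo+json"  # what the JS expression returns
datum_format = "https://www.iana.org/assignments/media-types/application/geo+json"

print(expand_curie(evaluated_format, NAMESPACES) == datum_format)  # True
```

Expanding the evaluated value, rather than compacting the File's format, keeps the comparison in the same long-form space that job inputs are already normalized to.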

Workflow Code

job.yml

features:
  class: File
  path: /tmp/feature-0.geojson
  format: "https://www.iana.org/assignments/media-types/application/geo+json"

OR

features:
  class: File
  path: /tmp/feature-0.geojson
  format: "iana:application/geo+json"

echo_features.cwl

cwlVersion: "v1.2"
class: CommandLineTool
$namespaces:
  iana: "https://www.iana.org/assignments/media-types/"
baseCommand: echo
requirements:
  InlineJavascriptRequirement: {}
  DockerRequirement:
    dockerPull: "debian:stretch-slim"
inputs:
  features:
    type:
      - "File"
      - type: array
        items: File
    format: |
      ${
        if (Array.isArray(inputs.features)) {
          return "iana:application/geo+json";  // (!) here is the problematic format, unless the full URI is given
        }
        return "http://www.opengis.net/def/glossary/term/FeatureCollection";
      }
    inputBinding:
      valueFrom: |
        ${
          if (Array.isArray(inputs.features)) {
            return {
              "type": "FeatureCollection",
              "features": inputs.features.map(function (item) { return item.contents; })
            };
          }
          return inputs.features.contents;
        }
outputs:
  features:
    type: File
    format: "http://www.opengis.net/def/glossary/term/FeatureCollection"
    outputBinding:
      glob: "features.json"
stdout: "features.json"

Full Traceback

❯ cwllog --debug echo_features.cwl job.yml
Running:  [cwltool --disable-color --debug echo_features.cwl job.yml 2>&1 | tee echo_features.log]
Log Path: [/tmp/echo_features.log]
INFO /home/francis/dev/conda/envs/weaver/bin/cwltool 3.1.20230906142556
INFO Resolved 'echo_features.cwl' to 'file:///tmp/echo_features.cwl'
URI prefix '${
  if (Array.isArray(inputs.features)) {
    return "iana' of '${
  if (Array.isArray(inputs.features)) {
    return "iana:application/geo+json";
  }
  return "http://www.opengis.net/def/glossary/term/FeatureCollection";
}
' not recognized, are you missing a $namespaces section?
URI prefix '${
  if (Array.isArray(inputs.features)) {
    return "iana' of '${
  if (Array.isArray(inputs.features)) {
    return "iana:application/geo+json";
  }
  return "http://www.opengis.net/def/glossary/term/FeatureCollection";
}
' not recognized, are you missing a $namespaces section?
echo_features.cwl:9:3: object id 'echo_features.cwl#features' previously defined
WARNING echo_features.cwl:22:7: JSHINT:       "features": inputs.features.every(item => item.contents)
echo_features.cwl:22:7: JSHINT:                                              ^
echo_features.cwl:22:7: JSHINT: W119: 'arrow function syntax (=>)' is only available in ES6. CWL only supports ES5.1
ERROR Workflow error:
Expected value of 'features' to have format '${\n  if (Array.isArray(inputs.features)) {\n    return "iana:application/geo+json";\n  }\n  return "http://www.opengis.net/def/glossary/term/FeatureCollection";\n}\n' but
 File has an incompatible format: {
    "class": "File",
    "format": "https://www.iana.org/assignments/media-types/application/geo+json",
    "location": "file:///tmp/feature-0.geojson",
    "size": 162,
    "basename": "feature-0.geojson",
    "nameroot": "feature-0",
    "nameext": ".geojson"
}
Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 554, in bind_input
    check_format(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwl_utils/file_formats.py", line 70, in check_format
    raise ValidationException(
schema_salad.exceptions.ValidationException: File has an incompatible format: {
    "class": "File",
    "format": "https://www.iana.org/assignments/media-types/application/geo+json",
    "location": "file:///tmp/feature-0.geojson",
    "size": 162,
    "basename": "feature-0.geojson",
    "nameroot": "feature-0",
    "nameext": ".geojson"
}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/main.py", line 1298, in main
    (out, status) = real_executor(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 62, in __call__
    return self.execute(process, job_order_object, runtime_context, logger)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 145, in execute
    self.run_jobs(process, job_order_object, logger, runtime_context)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 218, in run_jobs
    for job in jobiter:
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 963, in job
    builder = self._init_job(job_order, runtimeContext)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/process.py", line 888, in _init_job
    builder.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 330, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 262, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 560, in bind_input
    raise WorkflowException(
cwltool.errors.WorkflowException: Expected value of 'features' to have format '${\n  if (Array.isArray(inputs.features)) {\n    return "iana:application/geo+json";\n  }\n  return "http://www.opengis.net/def/glossary/term/FeatureCollection";\n}\n' but
 File has an incompatible format: {
    "class": "File",
    "format": "https://www.iana.org/assignments/media-types/application/geo+json",
    "location": "file:///tmp/feature-0.geojson",
    "size": 162,
    "basename": "feature-0.geojson",
    "nameroot": "feature-0",
    "nameext": ".geojson"
}

Your Environment

  • cwltool version: 3.1.20230906142556
fmigneault changed the title from "Input format expression not considering $namespace" to "Input format expression not considering $namespaces" on Aug 23, 2024.
@fmigneault (Contributor, Author)

Error on my end.

If the job input is structured as the script expects (a single File for the FeatureCollection format, an array of Files for the GeoJSON format), all format values work as expected and interchangeably.

single "FeatureCollection" File

features:
  class: File
  path: /tmp/feature-0.geojson
  format: "ogc-term:FeatureCollection"

OR

features:
  class: File
  path: /tmp/feature-0.geojson
  format: "http://www.opengis.net/def/glossary/term/FeatureCollection"

array of "feature" Files

features:
  - class: File
    path: /tmp/feature-0.geojson
    format: "iana:application/geo+json"

OR

features:
  - class: File
    path: /tmp/feature-0.geojson
    format: "https://www.iana.org/assignments/media-types/application/geo+json"

@fmigneault (Contributor, Author) commented Aug 26, 2024

Further investigation reveals that this is actually still an issue.

More specifically, if the input is defined as follows, everything works: the job succeeds whether job.yml uses the full URI or the namespaced variant.

inputs:
  features:
    format: |
      ${
        if (Array.isArray(inputs.features)) {
          return "https://www.iana.org/assignments/media-types/application/geo+json"; 
        }
        return "http://www.opengis.net/def/glossary/term/FeatureCollection";
      }

However, if the CWL expression returns the namespaced format, as below, the job always fails, regardless of which format variant job.yml provides.

inputs:
  features:
    format: |
      ${
        if (Array.isArray(inputs.features)) {
          return "iana:application/geo+json";
        }
        return "ogc-term:FeatureCollection";
      }
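To make the failing combination concrete, the following models the namespaced expression above together with a comparison that expands both sides. Assumptions: the `ogc-term` prefix maps to `http://www.opengis.net/def/glossary/term/` (inferred from the equivalent job.yml forms in the previous comment; it is not in the `$namespaces` of echo_features.cwl as shown), and `expand`, `expected_format` and `check` are hypothetical helpers, not cwltool functions.

```python
from typing import Dict, List, Union

# Namespace prefixes; the "ogc-term" URI is an assumption inferred from job.yml.
NAMESPACES: Dict[str, str] = {
    "iana": "https://www.iana.org/assignments/media-types/",
    "ogc-term": "http://www.opengis.net/def/glossary/term/",
}

Features = Union[Dict[str, str], List[Dict[str, str]]]


def expand(value: str) -> str:
    """Expand a 'prefix:suffix' value with NAMESPACES; full URIs pass through."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in NAMESPACES and not suffix.startswith("//"):
        return NAMESPACES[prefix] + suffix
    return value


def expected_format(features: Features) -> str:
    """Python mirror of the namespaced JS format expression above."""
    if isinstance(features, list):
        return "iana:application/geo+json"
    return "ogc-term:FeatureCollection"


def check(features: Features) -> bool:
    """Compare every File against the expression result, expanding both sides."""
    files = features if isinstance(features, list) else [features]
    wanted = expand(expected_format(features))
    return all(expand(f["format"]) == wanted for f in files)


single = {"class": "File", "format": "ogc-term:FeatureCollection"}
array = [{"class": "File", "format": "https://www.iana.org/assignments/media-types/application/geo+json"}]

print(check(single), check(array))  # True True
# Without expansion, the raw expression result never equals the long-form URI:
print(expected_format(array) == array[0]["format"])  # False
```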

With the namespaced format in the CWL expression, one of two errors occurs, depending on the format provided in job.yml:

  1. The job format is also the namespaced value (which one would expect to match the evaluated CWL expression with a plain == comparison). The error is simply the generic schema_salad.exceptions.ValidationException: File has an incompatible format.

  2. The job format is the full URI. This causes a KeyError: 'name', raised while formatting the error message in builder.py, with the following traceback.

Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 554, in bind_input
    check_format(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwl_utils/file_formats.py", line 70, in check_format
    raise ValidationException(
schema_salad.exceptions.ValidationException: File has an incompatible format: {
    "class": "File",
    "format": "https://www.iana.org/assignments/media-types/application/geo+json",
    "location": "file:///tmp/feature-0.geojson",
    "size": 162,
    "basename": "feature-0.geojson",
    "nameroot": "feature-0",
    "nameext": ".geojson"
}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 218, in run_jobs
    for job in jobiter:
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 963, in job
    builder = self._init_job(job_order, runtimeContext)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/process.py", line 888, in _init_job
    builder.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 330, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 262, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 288, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 355, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 561, in bind_input
    f"Expected value of {schema['name']!r} to have "
KeyError: 'name'
ERROR Workflow error:
'name'
Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 554, in bind_input
    check_format(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwl_utils/file_formats.py", line 70, in check_format
    raise ValidationException(
schema_salad.exceptions.ValidationException: File has an incompatible format: {
    "class": "File",
    "format": "https://www.iana.org/assignments/media-types/application/geo+json",
    "location": "file:///tmp/feature-0.geojson",
    "size": 162,
    "basename": "feature-0.geojson",
    "nameroot": "feature-0",
    "nameext": ".geojson"
}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 218, in run_jobs
    for job in jobiter:
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 963, in job
    builder = self._init_job(job_order, runtimeContext)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/process.py", line 888, in _init_job
    builder.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 330, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 262, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 288, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 355, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 561, in bind_input
    f"Expected value of {schema['name']!r} to have "
KeyError: 'name'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/main.py", line 1298, in main
    (out, status) = real_executor(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 62, in __call__
    return self.execute(process, job_order_object, runtime_context, logger)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 145, in execute
    self.run_jobs(process, job_order_object, logger, runtime_context)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 252, in run_jobs
    raise WorkflowException(str(err)) from err
cwltool.errors.WorkflowException: 'name'
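The KeyError above is raised while building the error message itself: the f-string at builder.py line 561 indexes `schema['name']`, and the schema being processed at that point (apparently a nested array-item schema, judging by the extra `bind_input` frames) has no `name` key. A minimal sketch of a more defensive message, using a hypothetical `describe_format_error` helper rather than cwltool's actual code:

```python
from typing import Any, Mapping


def describe_format_error(schema: Mapping[str, Any], expected: str) -> str:
    # Fall back to a placeholder when the schema (e.g. an array item type) has no 'name'.
    name = schema.get("name", "<unnamed input>")
    return f"Expected value of {name!r} to have format {expected!r}"


print(describe_format_error({"type": "File"}, "iana:application/geo+json"))
# Expected value of '<unnamed input>' to have format 'iana:application/geo+json'
```

This only changes how the error is reported; the format mismatch underneath is still the $namespaces problem described in the issue.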
