Merge pull request #210 from openpreserve/rel/1.6
REL: FIDO v1.6.0
carlwilson authored Dec 16, 2022
2 parents 4c1bedb + d26d2b0 commit 72c9727
Showing 25 changed files with 65,943 additions and 1,084 deletions.
61 changes: 61 additions & 0 deletions .github/workflows/test-pr.yml
@@ -0,0 +1,61 @@
name: PR QA

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  build:
    name: Checkout and Build
    runs-on: ubuntu-20.04

    strategy:
      matrix:
        python-version: ["3.6", "3.7", "3.8", "3.9", "3.10"]

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -U flake8 pep257 pytest-cov codecov codacy-coverage pluggy
          pip install -e .
      - name: Lint code with flake8
        run: flake8 . --count --show-source --max-line-length=127 --statistics
      - name: Lint code with pep257
        if: matrix.python-version == 2.7
        run: pep257 --match="(?!fido).*\.py" ./fido
      - name: Test using pytest
        run: pytest --cov=fido
      - name: Generate LCOV coverage report
        if: matrix.python-version == 3.10
        run: coverage xml -o cobertura.xml
      - name: Upload coverage report
        if: matrix.python-version == 3.10
        uses: actions/upload-artifact@v3
        with:
          name: coverage-report
          path: cobertura.xml
  coverage:
    name: Quality Assurance
    runs-on: ubuntu-20.04
    needs: [ build ]

    steps:
      - name: Download coverage report
        uses: actions/download-artifact@v3
        with:
          name: coverage-report
          path: cobertura.xml
      - name: Codecov coverage reporting
        uses: codecov/codecov-action@v3
        with:
          files: cobertura.xml
          fail_ci_if_error: false # optional (default = false)
          verbose: true # optional (default = false)
      - name: Codacy analysis reporting
        uses: codacy/codacy-analysis-cli-action@4.0.0
23 changes: 0 additions & 23 deletions .travis.yml

This file was deleted.

33 changes: 33 additions & 0 deletions Dockerfile
@@ -0,0 +1,33 @@
FROM python:3.6-alpine as builder

LABEL maintainer="carl.wilson@openpreservation.org" \
      org.openpreservation.vendor="Open Preservation Foundation" \
      version="0.1"

RUN apk update && apk --no-cache --update-cache add gcc build-base libxml2-dev libxslt-dev git

WORKDIR /src

COPY setup.py setup.py
COPY requirements.txt requirements.txt
COPY README.md README.md
COPY fido/* fido/

RUN mkdir /install && pip install -U pip && pip install -r requirements.txt --prefix=/install && pip install --prefix=/install .

FROM python:3.6-alpine

RUN apk update && apk add --no-cache --update-cache libc6-compat libstdc++ bash libxslt
RUN install -d -o root -g root -m 755 /opt && adduser --uid 1000 -h /opt/fido_sigs -S eark && pip install -U pip python-dateutil

WORKDIR /opt/fido_sigs

COPY --from=builder /install /usr/local
COPY . /opt/fido_sigs/
RUN chown -R eark:users /opt/fido_sigs

USER eark

EXPOSE 5000
ENV FLASK_APP='fido.signatures'
ENTRYPOINT flask run --host "0.0.0.0" --port "5000"
92 changes: 67 additions & 25 deletions README.md
@@ -9,31 +9,34 @@ FIDO is a command-line tool to identify the file formats of digital objects.
It is designed for simple integration into automated work-flows.

FIDO uses the UK National Archives (TNA) PRONOM File Format and Container descriptions.
PRONOM is available from http://www.nationalarchives.gov.uk/pronom/
PRONOM is available from <http://www.nationalarchives.gov.uk/pronom/>
See [LICENSE](LICENSE.txt) for license information.

* Download from: https://github.com/openpreserve/fido/releases
* Usage guide: http://wiki.opf-labs.org/display/KB/FIDO+usage+guide
* Download from: <https://github.com/openpreserve/fido/releases>
* Usage guide: <http://wiki.opf-labs.org/display/KB/FIDO+usage+guide>
* Author: Adam Farquhar (BL), 2010
* Maintainer: Maurice de Rooij (OPF/NANETH), 2011, 2012, 2013, Misty de Meo 2014, 2015, 2016, Holly Becker 2016

Usage
-----

```
usage: fido.py [-h] [-v] [-q] [-recurse] [-zip] [-nocontainer] [-pronom_only]
[-input INPUT] [-filename FILENAME] [-useformats INCLUDEPUIDS]
[-nouseformats EXCLUDEPUIDS] [-matchprintf FORMATSTRING]
[-nomatchprintf FORMATSTRING] [-bufsize BUFSIZE]
[-container_bufsize CONTAINER_BUFSIZE]
[-loadformats XML1,...,XMLn] [-confdir CONFDIR]
[FILE [FILE ...]]
```shell
usage: fido [-h] [-v] [-q] [-recurse] [-zip] [-noextension] [-nocontainer]
[-pronom_only] [-input INPUT] [-filename FILENAME]
[-useformats INCLUDEPUIDS] [-nouseformats EXCLUDEPUIDS]
[-matchprintf FORMATSTRING] [-nomatchprintf FORMATSTRING]
[-bufsize BUFSIZE] [-sigs SIG_ACT]
[-container_bufsize CONTAINER_BUFSIZE]
[-loadformats XML1,...,XMLn] [-confdir CONFDIR]
[FILE [FILE ...]]
```
positional arguments:
* `FILE`: files to check. If the file is `-`, read content from stdin. In this case, Python must be invoked with `-u` or it may convert the line terminators.
optional arguments:
* `-h`, `--help`: show this help message and exit
* `-v`: show version information
* `-q`: run (more) quietly
@@ -48,18 +51,22 @@ optional arguments:
* `-matchprintf FORMATSTRING`: format string (Python style) to use on match. See nomatchprintf, README.txt.
* `-nomatchprintf FORMATSTRING`: format string (Python style) to use if no match. See README.txt
* `-bufsize BUFSIZE`: size (in bytes) of the buffer to match against (default=131072 bytes)
* `-sigs SIG_ACT`: signature file action. SIG_ACT `check` checks whether a newer signature file is available for download;
  `list` lists all available signature file versions;
  `update` automatically updates to the latest available signature file;
  `n` downloads and uses signature file version n.
* `-container_bufsize CONTAINER_BUFSIZE`: size (in bytes) of the buffer to match against (default=524288 bytes)
* `-loadformats XML1,...,XMLn`: comma-separated string of XML format files to add.
* `-confdir CONFDIR`: configuration directory from which to load, for example, the format specifications.
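The `-sigs` option takes either a keyword or a version number. One way to validate such a value with Python's standard `argparse` module is sketched below; it is an illustration under that assumption, not FIDO's own argument handling, and the helper name `sig_action` is hypothetical.

```python
import argparse


def sig_action(value):
    """Accept 'check', 'list' or 'update', or a signature version number n."""
    if value in ("check", "list", "update"):
        return value
    try:
        return int(value)  # e.g. "104" selects signature version 104
    except ValueError:
        raise argparse.ArgumentTypeError(
            "SIG_ACT must be check, list, update or a version number, not %r" % value
        ) from None


# A single-dash long option such as -sigs is valid in argparse; its dest is "sigs".
parser = argparse.ArgumentParser(prog="fido")
parser.add_argument("-sigs", metavar="SIG_ACT", type=sig_action,
                    help="check, list, update, or a version number n")

if __name__ == "__main__":
    print(parser.parse_args(["-sigs", "update"]).sigs)  # update
    print(parser.parse_args(["-sigs", "104"]).sigs)  # 104
```

Converting the version to an `int` during parsing means later code can compare versions numerically instead of as strings.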
Installation
------------
(also see: http://wiki.opf-labs.org/display/KB/FIDO+usage+guide)
(also see: <http://wiki.opf-labs.org/display/KB/FIDO+usage+guide>)
Any platform
1. Download the latest zip release from https://github.com/openpreserve/fido/releases
1. Download the latest zip release from <https://github.com/openpreserve/fido/releases>
2. Unzip into some directory
3. Open a command shell, cd to the directory that you placed the zip contents into
4. Run `python setup.py install` to install FIDO and dependencies. This may require sudo on Linux/OSX or admin privileges on Windows.
@@ -75,11 +82,42 @@ Using pip
Updating signatures
-------------------
To update FIDO with the latest PRONOM file format definitions, run:
`fido-update-signatures`
This is an interactive CLI script which downloads the latest PRONOM signature file and signatures. Please note that it can take a while to download all PUID signatures.
Signatures can be updated from the OPF's signature service.
The service is pull-only, and its location is set in the `versions.xml`
configuration file as:
```xml
<updateSite>https://fidosigs.openpreservation.org</updateSite>
```
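A sketch of reading that setting with the standard library's `xml.etree.ElementTree`. Only the `<updateSite>` element comes from the text above; the `<versions>` root element and the helper name `update_site` are assumptions for illustration, so check them against the `versions.xml` actually shipped with FIDO.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for versions.xml; the <versions> root element is an assumption.
VERSIONS_XML = """<versions>
  <updateSite>https://fidosigs.openpreservation.org</updateSite>
</versions>"""


def update_site(xml_text):
    """Return the text of the first <updateSite> element, or None if there is none."""
    element = ET.fromstring(xml_text).find(".//updateSite")
    if element is None or not element.text:
        return None
    return element.text.strip()


if __name__ == "__main__":
    print(update_site(VERSIONS_XML))  # https://fidosigs.openpreservation.org
```

For a file on disk, `ET.parse(path).getroot()` gives the same root element. If the real file declares an XML namespace, the search path would need to include it.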
To check which version of the PRONOM signatures you are using,
type `fido -v` and you'll see something like:
If you are having trouble running the script due to firewall restrictions, see OPF wiki: http://wiki.opf-labs.org/display/PT/Command+Line+Interface+proxy+usage
```shell
FIDO v1.6.0 (pronom-xml-95.zip, container-signature-20200121.xml, format_extensions.xml)
```
Here `pronom-xml-95.zip` denotes PRONOM version 95. To see if a more recent
set of signatures is available, type `fido -sigs check`, which will report:
```shell
Updated signatures v104 are available, current version is v95
```
if new signatures are available or
```shell
Your signature files are up to date, current version is v104
```
if not. To update signatures to the latest version, type `fido -sigs update`:
```shell
Updated signatures v104 are available, current version is v95
Updating signatures
```
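The check described above reduces to comparing two integer version numbers. The sketch below assumes the installed archive is named `pronom-xml-N.zip`, as in the `fido -v` output, and that the latest available version is already known as an integer; the helper names are hypothetical and this is not FIDO's implementation.

```python
import re


def pronom_version(archive_name):
    """Return N from a PRONOM archive name of the form 'pronom-xml-N.zip'."""
    match = re.fullmatch(r"pronom-xml-(\d+)\.zip", archive_name)
    if match is None:
        raise ValueError("unexpected signature archive name: %r" % archive_name)
    return int(match.group(1))


def check_message(current_archive, latest_version):
    """Build a status line in the style of the `fido -sigs check` examples."""
    current = pronom_version(current_archive)
    if latest_version > current:
        return "Updated signatures v%d are available, current version is v%d" % (
            latest_version, current)
    return "Your signature files are up to date, current version is v%d" % current


if __name__ == "__main__":
    print(check_message("pronom-xml-95.zip", 104))
    print(check_message("pronom-xml-104.zip", 104))
```

Comparing integers rather than strings matters here: as strings, `"95"` sorts after `"104"`.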
If you are having trouble due to firewall restrictions, see OPF wiki: <http://wiki.opf-labs.org/display/PT/Command+Line+Interface+proxy+usage>
Please note that this WILL NOT update the container signature file located in the 'conf' folder.
The reason for this is that the PRONOM container signature file contains special types
@@ -97,6 +135,8 @@ or a pip installation will handle dependencies.
FIDO 1.3.3 and later have experimental Python 3 support.
FIDO 1.4 and later have Python 3 support.
Format Definitions
------------------
@@ -118,11 +158,12 @@ an object called info with the following fields:
* `printnomatch`: `info.count` (file N)
The defaults for FIDO 1.0 are:
* `printmatch`:
* `"OK,%(info.time)s,%(info.puid)s,%(info.formatname)s,%(info.signaturename)s,%(info.filesize)s,\"%(info.filename)s\",\"%(info.mimetype)s\",\"%(info.matchtype)s\"\n"`
* `printnomatch`:
* `"KO,%(info.time)s,,,,%(info.filesize)s,\"%(info.filename)s\",,\"%(info.matchtype)s\"\n"`
It can be useful to provide an empty string for either, for example to ignore all failed matches, or all successful ones (see examples below).
Note that a newline needs to be added to the end of the string using \n.
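The templates are ordinary Python `%`-style format strings with named keys, so any mapping whose keys are the literal names `info.time`, `info.puid` and so on can fill them. A minimal sketch, assuming such a plain dictionary (FIDO's internal lookup may work differently) and using made-up match values; `PUID_ONLY` is a hypothetical custom template.

```python
# Default FIDO 1.0 printmatch template, copied from the README.
DEFAULT_PRINTMATCH = "OK,%(info.time)s,%(info.puid)s,%(info.formatname)s,%(info.signaturename)s,%(info.filesize)s,\"%(info.filename)s\",\"%(info.mimetype)s\",\"%(info.matchtype)s\"\n"

# A trimmed custom template: PUID and file name only, tab separated.
PUID_ONLY = "%(info.puid)s\t%(info.filename)s\n"

# Made-up values for one successful match (hypothetical, for illustration).
info = {
    "info.time": 12,
    "info.puid": "example/1",
    "info.formatname": "Example Format",
    "info.signaturename": "Example Signature",
    "info.filesize": 1024,
    "info.filename": "report.pdf",
    "info.mimetype": "application/pdf",
    "info.matchtype": "signature",
}

if __name__ == "__main__":
    print(DEFAULT_PRINTMATCH % info, end="")
    print(PUID_ONLY % info, end="")
    # An empty template renders as an empty string, e.g. to hide failed matches.
    print("" % info, end="")
```

Because the rendered text is printed as-is, dropping the trailing `\n` from a template would run successive results together on one line.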
@@ -131,10 +172,11 @@ Matchtypes
-----------
FIDO returns the following matchtypes:
- fail: the object could not be identified with signature or file extension
- extension: the object could only be identified by file extension
- signature: the object has been identified with (a) PRONOM signature(s)
- container: the object has been identified with (a) PRONOM container signature(s)
* fail: the object could not be identified with signature or file extension
* extension: the object could only be identified by file extension
* signature: the object has been identified with (a) PRONOM signature(s)
* container: the object has been identified with (a) PRONOM container signature(s)
In some cases multiple results are returned.
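Assuming the output uses the default templates shown earlier, the matchtype is the last field of both `OK` and `KO` lines, and the quoted file name is the third field from the end. The sketch below uses Python's `csv` module to tally matchtypes and to find objects that received more than one result; the sample lines are made up and the helper names are hypothetical.

```python
import csv
from collections import Counter

# Made-up FIDO output in the default printmatch/printnomatch format.
SAMPLE = '''OK,12,example/1,Example Format,Signature A,1024,"a.bin","application/octet-stream","signature"
OK,12,example/2,Example Format, Variant 2,Signature B,1024,"a.bin","application/octet-stream","signature"
KO,3,,,,20,"b.dat",,"fail"
OK,5,example/3,Example Text,External,40,"c.txt","text/plain","extension"
'''


def tally_matchtypes(lines):
    """Count how often each matchtype occurs (the last field of every line)."""
    return Counter(row[-1] for row in csv.reader(lines) if row)


def objects_with_multiple_results(lines):
    """Return file names reported more than once (the third field from the end)."""
    per_file = Counter(row[-3] for row in csv.reader(lines) if row)
    return sorted(name for name, count in per_file.items() if count > 1)


if __name__ == "__main__":
    lines = SAMPLE.splitlines()
    print(tally_matchtypes(lines))
    print(objects_with_multiple_results(lines))  # ['a.bin']
```

Indexing from the end is deliberate: `formatname` is not quoted in the default template, so a format name containing a comma (as in the second sample line) would shift any index counted from the start.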
@@ -152,14 +194,14 @@ Take input from a list of files:
Linux:
```
```shell
ls > files.txt
python fido.py -input files.txt
```
Windows:
```
```shell
dir /b > files.txt
python fido.py -input files.txt
```
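A portable alternative, sketched in Python, writes the list without relying on the shell. With `ls > files.txt` the shell creates `files.txt` before `ls` runs, so the list usually names itself. This sketch skips the list file and subdirectories and writes full paths. The function name is hypothetical, and UTF-8 output is an assumption about what `-input` accepts.

```python
import os
import tempfile


def write_file_list(directory, list_path):
    """Write the path of every regular file in `directory` to `list_path`, one per line.

    Subdirectories and the list file itself are skipped.
    Returns the number of paths written.
    """
    skip = os.path.abspath(list_path)
    paths = sorted(
        entry.path
        for entry in os.scandir(directory)
        if entry.is_file() and os.path.abspath(entry.path) != skip
    )
    with open(list_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(path + "\n")
    return len(paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name in ("b.txt", "a.pdf"):
            open(os.path.join(folder, name), "w").close()
        list_file = os.path.join(folder, "files.txt")
        print(write_file_list(folder, list_file))  # 2
```

The resulting file can then be passed to FIDO with `-input` as in the shell examples.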
42 changes: 42 additions & 0 deletions RELEASENOTES.md
@@ -1,5 +1,6 @@
RELEASE NOTES
=============

Format Identification for Digital Objects (fido).
Copyright 2010 by Open Preservation Foundation.

@@ -8,8 +9,49 @@ Copyright 2010 The Open Preservation Foundation
Fido is made available under the Apache License, Version 2.0; see the file
LICENSE.txt for details.

Fido 1.6.0
-------------

2022-12-15

New command line options for updating signatures

- PRONOM signatures can now be updated from a web service [[#202][]].
- PRONOM v104 support with successful signature compilation (see issue [#203][]) [[#204][]].
- Closed issue [#100][]: Added Unicode support for Windows Python 2.7 [[#200][]].
- Generated signature file now validated against XSD schema [[#197][]].
- Refactored code and cleared the final PEP and Flake8 lint warnings [[#197][]].
- Closed issue [#150][]: Trapped some of the signature compilation issues [[#197][]].
- Closed issue [#179][], [#198][]: Crash on XLS format by updating olefile version to 0.46 [[#195][]].
- Closed issue [#192][]: Fixed signature file defaults [[#193][]].
- added an update signature parameter to control the signature download version;
- trapped regex creation exception so that sig file creation is not derailed;
- PRONOM/DROID signature file now downloaded from URL rather than via SOAP service;
- moved sleep between SOAP downloads so that it's only applied between actual downloads, not when processing cached results;
- code style warnings:
  - some minor refactoring for complex methods;
  - factored out string constants;
  - renamed some variables and methods;
  - removed some commented code;
  - tidied exit conditions; and
  - removed some unreachable code.
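Two of the changes listed above follow common patterns: catching `re.error` per pattern so that one bad regular expression does not stop signature-file generation, and sleeping only between real downloads rather than on cache hits. Both are sketched below under stated assumptions: patterns are held in a dict keyed by a PUID-like string, `download` is any callable that fetches one entry, and every name here is hypothetical rather than FIDO's actual code.

```python
import re
import time


def compile_patterns(patterns):
    """Compile each pattern, recording failures instead of aborting the whole run.

    `patterns` maps a (hypothetical) PUID key to a regular expression string.
    Returns (compiled, failed): compiled regexes and error messages, keyed by PUID.
    """
    compiled, failed = {}, {}
    for puid, pattern in patterns.items():
        try:
            compiled[puid] = re.compile(pattern)
        except re.error as exc:
            failed[puid] = str(exc)  # note the problem and move on to the next signature
    return compiled, failed


def fetch_all(puids, cache, download, delay=1.0, sleep=time.sleep):
    """Return {puid: data}, pausing `delay` seconds only between real downloads.

    Cache hits are served immediately; `download(puid)` is called for misses
    and its result is stored in `cache`.
    """
    results = {}
    downloaded = False
    for puid in puids:
        if puid in cache:
            results[puid] = cache[puid]
            continue
        if downloaded:
            sleep(delay)  # throttle between successive requests, never before the first
        results[puid] = cache[puid] = download(puid)
        downloaded = True
    return results
```

Passing `sleep` as a parameter keeps the throttling testable without real delays, for example by recording the requested pauses in a list.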

[#100]: https://github.com/openpreserve/fido/issues/100
[#150]: https://github.com/openpreserve/fido/issues/150
[#179]: https://github.com/openpreserve/fido/issues/179
[#192]: https://github.com/openpreserve/fido/issues/192
[#193]: https://github.com/openpreserve/fido/pull/193
[#195]: https://github.com/openpreserve/fido/pull/195
[#198]: https://github.com/openpreserve/fido/issues/198
[#200]: https://github.com/openpreserve/fido/pull/200
[#202]: https://github.com/openpreserve/fido/pull/202
[#203]: https://github.com/openpreserve/fido/issues/203
[#204]: https://github.com/openpreserve/fido/pull/204

Fido 1.4.0
-------------

2018-12-19

- Python 3 support [[#156][]]
2 changes: 1 addition & 1 deletion fido/__init__.py
@@ -14,7 +14,7 @@
from six.moves import input as rinput


__version__ = '1.4.1'
__version__ = '1.6.0'


CONFIG_DIR = join(abspath(dirname(__file__)), 'conf')
2 changes: 0 additions & 2 deletions fido/conf/DROID_SignatureFile-v104.xml

This file was deleted.
