neuston_net RUN
Once a .ptl model has been trained with neuston_net TRAIN, it can be used with neuston_net RUN to perform inference on IFCB bins and unlabeled images. As output, the command creates one or more classification result files containing each image's determined class, confidence score, and other run/model metadata.
neuston_net RUN requires the following input parameters, in this order:
- SRC - input data
- MODEL - a .ptl model file
- RUN_ID - a label for this inference run (included in result metadata and output options)
For SRC, neuston_net RUN accepts a single IFCB bin-id, a .txt text file with a newline-delimited list of bin-ids, or a directory containing IFCB bins (directories are searched recursively). Bin-ids must be prefixed with the path to the bin's actual files on disk (a bin comprises three files bearing the same bin-id; see pyifcb for details on IFCB bins). It is also possible to run inference on regular image files instead of bins using the --type img flag, though note that this affects output options.
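A bin list file for SRC can be assembled with a few lines of Python. The following is a minimal sketch, not part of neuston_net: the helper names `find_bins` and `write_bin_list` are hypothetical, and it assumes the three files of a bin carry the extensions .adc, .hdr, and .roi (the usual IFCB raw-data layout, which this page does not spell out).

```python
from pathlib import Path

# Assumption: a complete bin is <bin-id>.adc + <bin-id>.hdr + <bin-id>.roi
BIN_EXTENSIONS = {".adc", ".hdr", ".roi"}


def find_bins(root):
    """Return sorted path-prefixed bin-ids for every complete bin under root."""
    seen = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in BIN_EXTENSIONS:
            # Key on the path without its extension, i.e. the path-prefixed bin-id
            seen.setdefault(path.with_suffix(""), set()).add(path.suffix)
    # Keep only bins for which all three files were found
    return sorted(str(stem) for stem, exts in seen.items() if exts == BIN_EXTENSIONS)


def write_bin_list(root, list_file):
    """Write one path-prefixed bin-id per line and return the list written."""
    bins = find_bins(root)
    Path(list_file).write_text("\n".join(bins) + "\n")
    return bins
```

The resulting text file could then be passed as SRC. Bins that are missing one of the three files are left out of the list.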
You can further narrow which bins or images to run inference on using the --filter flag.
With --filter OUT, the bins/images you specify are excluded.
Conversely, with --filter IN, all bins/images are excluded except the ones you specify.
More than one filter value may be given sequentially on the command line.
The filter option also accepts a .txt file (newline-delimited) of filter values.
You do NOT need to specify a bin's or image's filepath when filtering.
Note: For images, if any of the filter values appears in an image filename, that file is matched by the filter.
Examples:
- --filter IN bin1 bin2 - limit processing to just bin1 and bin2
- --filter IN list-of-binID.txt - limit processing to the list of bins in the .txt file
- --filter OUT badbin - if you know that you don't want to classify data from badbin, you can filter it out
- --type img --filter IN 2021-03-22 - assuming SRC is a multi-year directory of images whose filenames include a date, this filter processes only images with "2021-03-22" in the filename, i.e. images from March 22nd, 2021
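The IN/OUT semantics described above can be illustrated with a small Python sketch. This is not neuston_net's own code: `load_keywords` and `apply_filter` are hypothetical names, and the sketch assumes substring matching against the final path component for bins as well as images (the page only documents substring matching for images).

```python
from pathlib import Path


def load_keywords(values):
    """Expand filter values; an existing .txt file is read as a newline-delimited list."""
    keywords = []
    for value in values:
        candidate = Path(value)
        if candidate.suffix == ".txt" and candidate.is_file():
            lines = candidate.read_text().splitlines()
            keywords.extend(line.strip() for line in lines if line.strip())
        else:
            keywords.append(value)
    return keywords


def apply_filter(names, mode, values):
    """Keep (IN) or drop (OUT) the names whose filename contains any keyword."""
    keywords = load_keywords(values)

    def matches(name):
        # Only the filename is compared, so keywords need no directory prefix
        return any(keyword in Path(name).name for keyword in keywords)

    if mode == "IN":
        return [name for name in names if matches(name)]
    if mode == "OUT":
        return [name for name in names if not matches(name)]
    raise ValueError("mode must be 'IN' or 'OUT'")
```

For instance, `apply_filter(["a/bin1", "a/bin2", "b/badbin"], "OUT", ["badbin"])` keeps only the first two entries, mirroring the --filter OUT badbin example.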
By default, if a target bin output file already exists, processing for that bin is skipped. This is practical for picking up where processing left off if a run is interrupted before completion. To disable this behavior and overwrite any existing files, use --clobber. This behavior is NOT enabled for --type img.
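To preview which bins a resumed run would still process, the skip-if-exists rule can be sketched as below. This is an illustration under stated assumptions rather than the tool's implementation: `bins_to_process` is a hypothetical helper, and the caller is assumed to already know each bin's target output path.

```python
from pathlib import Path


def bins_to_process(bin_outputs, clobber=False):
    """Return the bin-ids that still need processing.

    bin_outputs maps each bin-id to its target output file path.
    With clobber=True every bin is returned, mirroring --clobber.
    """
    if clobber:
        return list(bin_outputs)
    # Default behavior: skip any bin whose output file already exists
    return [bin_id for bin_id, out in bin_outputs.items() if not Path(out).exists()]
```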
By default, one output file is created per bin. The location it is saved to is determined by --outdir and --outfile.
OUTDIR defines the root folder inference results are saved under, and OUTFILE specifies the filetype, filename, and any bin-based directory structure beyond OUTDIR. Note the formatting tags in {curly braces}, which are replaced by actual values at output.
--outdir default value: run-output/{RUN_ID}/v3/{MODEL_ID}/. MODEL_ID is the model id in the MODEL's metadata, and RUN_ID is provided directly in the neuston_net RUN command.
--outfile default value: "D{BIN_YEAR}/D{BIN_DATE}/{BIN_ID}_class.h5". This creates a year/date/file directory structure under OUTDIR.
There are three available output formats: HDF (.h5), MATLAB (.mat), and JSON (.json). {BIN_YEAR}, {BIN_DATE}, and {BIN_ID} are replaced with a given bin's collection year, collection date, and bin id, respectively. When processing bins, {BIN_ID} is required. Additionally, {INPUT_SUBDIRS} is an available formatting tag whose value is the bin's parent directory path relative to SRC (that is, not including SRC itself).
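The tag substitution can be previewed with a short Python sketch. It is not the tool's implementation: `expand_outdir` and `expand_outfile` are hypothetical names, the bin-id format `D<YYYYMMDD>T<HHMMSS>_IFCB<NNN>` is an assumption, and so is the choice to render {BIN_YEAR} as `YYYY` and {BIN_DATE} as `YYYYMMDD`.

```python
import re
from pathlib import Path

# Assumed bin-id layout: D<YYYYMMDD>T<HHMMSS>_IFCB<NNN>
BIN_ID_PATTERN = re.compile(r"^D(?P<date>(?P<year>\d{4})\d{4})T\d{6}_IFCB\d+$")


def expand_outdir(run_id, model_id, outdir="run-output/{RUN_ID}/v3/{MODEL_ID}"):
    """Fill the {RUN_ID} and {MODEL_ID} tags of an OUTDIR pattern."""
    return outdir.format(RUN_ID=run_id, MODEL_ID=model_id)


def expand_outfile(bin_id, outdir,
                   outfile="D{BIN_YEAR}/D{BIN_DATE}/{BIN_ID}_class.h5",
                   input_subdirs=""):
    """Return the output path for one bin under outdir."""
    if "{BIN_ID}" not in outfile:
        raise ValueError("OUTFILE must include {BIN_ID} when processing bins")
    match = BIN_ID_PATTERN.match(bin_id)
    if match is None:
        raise ValueError(f"unrecognized bin-id format: {bin_id}")
    relative = outfile.format(
        BIN_ID=bin_id,
        BIN_YEAR=match.group("year"),   # assumed to be YYYY
        BIN_DATE=match.group("date"),   # assumed to be YYYYMMDD
        INPUT_SUBDIRS=input_subdirs,
    )
    return Path(outdir) / relative
```

Under these assumptions, a bin-id of `D20210322T123456_IFCB010` with the default patterns would land in a `D2021/D20210322/` subfolder of OUTDIR. The paths the tool itself produces may differ if it formats the date tags differently.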
When processing images, no formatting tags are available. The default OUTFILE is img_results.json.
neuston_net.py RUN path/to/SRC path/to/MODEL RUN_ID
usage: neuston_net.py RUN [-h] [--type {bin,img}] [--outdir OUTDIR] [--outfile OUTFILE]
[--filter IN|OUT [KEYWORD ...]] [--clobber] SRC MODEL RUN_ID
positional arguments:
SRC Resource(s) to be classified. Accepts a bin, an image, a text-file, or a directory.
Directories are accessed recursively
MODEL Path to a previously-trained model file
RUN_ID Run ID. Used by --outdir
optional arguments:
-h, --help show this help message and exit
--type {bin,img} File type to perform classification on. Defaults is "bin"
--outdir OUTDIR Default is "run-output/{RUN_ID}/v3/{MODEL_ID}"
--outfile OUTFILE Name/pattern of the output classification file.
If TYPE==bin, files are created on a per-bin basis.
OUTFILE must include "{BIN_ID}", which will be replaced with the a bin's id.
A few patters are recognized: {BIN_ID}, {BIN_YEAR}, {BIN_DATE}, {INPUT_SUBDIRS}.
A few output file formats are recognized: .json, .mat, and .h5 (hdf).
Default for TYPE==bin is "D{BIN_YEAR}/D{BIN_DATE}/{BIN_ID}_class.h5";
Default for TYPE==img is "img_results.json".
--filter IN|OUT [KEYWORD ...]
Explicitly include (IN) or exclude (OUT) bins or image-files by KEYWORDs.
KEYWORD may also be a text file containing KEYWORDs, line-deliminated.
--clobber If set, already processed bins in OUTDIR are reprocessed.
By default, if an OUTFILE exists already the associated bin is not reprocessed.
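When launching runs from a script, the argument list can be assembled programmatically. The sketch below uses only the flags shown in the usage text above. `build_run_command` is a hypothetical helper, invoking the tool as `python neuston_net.py` is an assumption about the local setup, and --filter is placed last on the assumption that, because it accepts a variable number of KEYWORDs, it could otherwise consume the positional arguments.

```python
import shlex


def build_run_command(src, model, run_id, type_=None, outdir=None, outfile=None,
                      clobber=False, filter_mode=None, filter_values=()):
    """Assemble an argv list for `neuston_net.py RUN` from the documented options."""
    # Assumption: the script is run with `python neuston_net.py`
    cmd = ["python", "neuston_net.py", "RUN", src, model, run_id]
    if type_ is not None:
        if type_ not in ("bin", "img"):
            raise ValueError("type_ must be 'bin' or 'img'")
        cmd += ["--type", type_]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if outfile is not None:
        cmd += ["--outfile", outfile]
    if clobber:
        cmd.append("--clobber")
    if filter_mode is not None:
        if filter_mode not in ("IN", "OUT"):
            raise ValueError("filter_mode must be 'IN' or 'OUT'")
        # --filter goes last so its KEYWORD list cannot swallow other arguments
        cmd += ["--filter", filter_mode, *filter_values]
    return cmd


if __name__ == "__main__":
    argv = build_run_command("path/to/SRC", "path/to/MODEL", "RUN_ID",
                             filter_mode="OUT", filter_values=["badbin"])
    # Print a shell-quoted command line; pass argv to subprocess.run to execute it
    print(shlex.join(argv))
```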