As a user, I want to validate MP4/H.264/AAC encoded video with audio as observational data #605

Closed
Tracked by #717
jordanpadams opened this issue Mar 9, 2023 · 19 comments

@jordanpadams
Member

jordanpadams commented Mar 9, 2023

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Archive Manager

💪 Motivation

...so that I know the encoded video is valid and meets the ISO standards for MP4/H.264/AAC.

📖 Additional Details

See CCB-325 for more details. Info to validate in this doc: white_paper_v4.pdf (see Testing for PDS MP4 Compliance section)

Acceptance Criteria

Given a properly labeled and valid MP4/H.264/AAC encoded audio product
When I perform validation of that data content
Then I expect validate to complete successfully

Given a properly labeled but invalid MP4/H.264/AAC encoded audio product
When I perform validation of that data content
Then I expect validate to fail with an ERROR

⚙️ Engineering Details

Related to NASA-PDS/pds4-information-model#616

@thareUSGS

thareUSGS commented May 2, 2023

I took the video from this post #604 and a free audio file, "SCIShip_Spaceship passage 1 (ID 1971)_BSB.wav" from https://bigsoundbank.com, and combined them in avidemux, which can pass the video through unchanged while converting the WAV file to AAC on output.

ingenuity_sol120_30fps_crf13_fakeAudioAAC.mp4

The added sound is fake! There is no audio for ingenuity.

@thareUSGS

thareUSGS commented May 2, 2023

Here are two sibling raccoons wrestling at 3 am from my house. The sound is just static but the video is H.264 and the audio AAC.

6109045024.mp4

@jordanpadams
Member Author

@thareUSGS so how would we label this file? basically just have this:

        <Encoded_Audio>
            <offset unit="byte">0</offset>
            <encoding_standard_id>MP4/H.264/AAC</encoding_standard_id>
        </Encoded_Audio>

would this basically be the same for Video too? Or would you have both Audio and Video in the label?

@jordanpadams
Member Author

@al-niessner here is an example label. example.xml.txt. will need to change the filename referenced to match the sample files above, but awaiting response from Trent.

@thareUSGS

offset should be 0 since we are pointing to the "container" of video only or video/audio
The encoding_standard_id should remove the "AAC" if no audio.

looks like you found the example from the jira ticket. But I think we need IM "K" to fix the schema links in that example.

@al-niessner
Contributor

@jordanpadams

Using the provided example:

      ERROR  [error.label.schema]   line 117, 24: cvc-complex-type.2.4.a: Invalid content was found starting with element '{"http://pds.nasa.gov/pds4/pds/v1":Encoded_Video}'. One of '{"http://pds.nasa.gov/pds4/pds/v1":Composite_Structure, "http://pds.nasa.gov/pds4/pds/v1":Array, "http://pds.nasa.gov/pds4/pds/v1":Array_1D, "http://pds.nasa.gov/pds4/pds/v1":Array_2D, "http://pds.nasa.gov/pds4/pds/v1":Array_2D_Image, "http://pds.nasa.gov/pds4/pds/v1":Array_2D_Map, "http://pds.nasa.gov/pds4/pds/v1":Array_2D_Spectrum, "http://pds.nasa.gov/pds4/pds/v1":Array_3D, "http://pds.nasa.gov/pds4/pds/v1":Array_3D_Image, "http://pds.nasa.gov/pds4/pds/v1":Array_3D_Movie, "http://pds.nasa.gov/pds4/pds/v1":Array_3D_Spectrum, "http://pds.nasa.gov/pds4/pds/v1":Encoded_Header, "http://pds.nasa.gov/pds4/pds/v1":Header, "http://pds.nasa.gov/pds4/pds/v1":Stream_Text, "http://pds.nasa.gov/pds4/pds/v1":Table_Binary, "http://pds.nasa.gov/pds4/pds/v1":Table_Character, "http://pds.nasa.gov/pds4/pds/v1":Table_Delimited}' is expected.
      ERROR  [error.validation.internal_error]   Could not process the encoding type: encoding parameter 'MP4/H.264/AAC' is not known to this version of validate.
      ERROR  [error.label.context_ref_not_found]   line 59: 'Context product not found: urn:nasa:pds:context:instrument:mars2020.camera
  1. First error seems to be a disagreement with PDS4 schema. Is there a newer schema?
  2. Can fix the second error by updating validate for MP4/H.264/AAC.
  3. Can fix the last error by using --skip-context-validation

Does fixing the second and third errors as stated match with your expectations or should the third error be fixed some other way?
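For triaging reports like the one above, the three messages can be split into severity, message type, and text. This is only a minimal sketch: the regular expression is inferred from the lines quoted in this thread and is not a documented validate output format, and `parse_message` and `Message` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the messages quoted in this thread (an assumption,
# not an official format): SEVERITY  [message.type]   free text
MESSAGE_RE = re.compile(
    r"^\s*(?P<severity>[A-Z_]+)\s+\[(?P<kind>[\w.]+)\]\s+(?P<text>.*)$"
)


class Message(NamedTuple):
    severity: str  # e.g. ERROR or FATAL_ERROR
    kind: str      # e.g. error.label.schema
    text: str      # remainder of the line


def parse_message(line: str) -> Optional[Message]:
    """Return the parsed message, or None if the line is not a message line."""
    match = MESSAGE_RE.match(line)
    if match is None:
        return None
    return Message(match["severity"], match["kind"], match["text"])


sample = (
    "      ERROR  [error.validation.internal_error]   Could not process the "
    "encoding type: encoding parameter 'MP4/H.264/AAC' is not known to this "
    "version of validate."
)
parsed = parse_message(sample)
print(parsed.severity, parsed.kind)
```

Grouping by `kind` separates the schema disagreement from the unknown encoding and the missing context product, which need different fixes.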

@al-niessner
Contributor

@jordanpadams

With respect to audio: the example has MP4/H.264/AAC, but PDS says Encoded_Audio/pds:encoding_standard_id must be equal to one of the following values: 'M4A/AAC', 'WAV'. Do we have a new audio that exactly matches the video or should audio be M4A/AAC in the example?

@jordanpadams
Member Author

Using the provided example:

      ERROR  [error.label.schema]   line 117, 24: cvc-complex-type.2.4.a: Invalid content was found starting with element '{"http://pds.nasa.gov/pds4/pds/v1":Encoded_Video}'. One of '{"http://pds.nasa.gov/pds4/pds/v1":Composite_Structure, "http://pds.nasa.gov/pds4/pds/v1":Array, "http://pds.nasa.gov/pds4/pds/v1":Array_1D, "http://pds.nasa.gov/pds4/pds/v1":Array_2D, "http://pds.nasa.gov/pds4/pds/v1":Array_2D_Image, "http://pds.nasa.gov/pds4/pds/v1":Array_2D_Map, "http://pds.nasa.gov/pds4/pds/v1":Array_2D_Spectrum, "http://pds.nasa.gov/pds4/pds/v1":Array_3D, "http://pds.nasa.gov/pds4/pds/v1":Array_3D_Image, "http://pds.nasa.gov/pds4/pds/v1":Array_3D_Movie, "http://pds.nasa.gov/pds4/pds/v1":Array_3D_Spectrum, "http://pds.nasa.gov/pds4/pds/v1":Encoded_Header, "http://pds.nasa.gov/pds4/pds/v1":Header, "http://pds.nasa.gov/pds4/pds/v1":Stream_Text, "http://pds.nasa.gov/pds4/pds/v1":Table_Binary, "http://pds.nasa.gov/pds4/pds/v1":Table_Character, "http://pds.nasa.gov/pds4/pds/v1":Table_Delimited}' is expected.
      ERROR  [error.validation.internal_error]   Could not process the encoding type: encoding parameter 'MP4/H.264/AAC' is not known to this version of validate.
      ERROR  [error.label.context_ref_not_found]   line 59: 'Context product not found: urn:nasa:pds:context:instrument:mars2020.camera
  1. First error seems to be a disagreement with PDS4 schema. Is there a newer schema?

Here is a new example that references the latest version of the IM and standard_id = WAV
example.xml.txt

  2. Can fix the second error by updating validate for MP4/H.264/AAC.
    I think this should be fixed per the discussion still to come on the other comment above.
  3. Can fix the last error by using --skip-context-validation
    Yes. This is a notional instrument that will take audio/video.

With respect to audio: the example has MP4/H.264/AAC, but PDS says Encoded_Audio/pds:encoding_standard_id must be equal to one of the following values: 'M4A/AAC', 'WAV'. Do we have a new audio that exactly matches the video or should audio be M4A/AAC in the example?

Let's use the WAV file referenced from here: https://bigsoundbank.com/sound-1971-spaceship-passage-1.html

I think the WAV overlaid on the video can probably be used as a second example for #604, so that we have one example with just video and another with video+audio. See Trent's comment above.

@jordanpadams
Member Author

@al-niessner ☝️

@al-niessner
Contributor

@jordanpadams

Sorry, a bit confused. I thought we had 2 items (one per issue): video and video+audio. Are you saying we have 3: audio, video, and video+audio? I get <Encoded_Audio> for audio only and <Encoded_Video> for video only, but do you need (should you have?) both for video+audio?

I remade the #604 video using the same ffmpeg command but shortened it considerably; we do not need 10 MB of movie. I am using the raccoons for video+audio and will take your suggestion for audio.

@jordanpadams
Member Author

@al-niessner yeah per #605 (comment), it sounds like for Encoded_Video there may be 2 different use cases possible there: video or video+audio with encoding_standard_id values of MP4/H.264 and MP4/H.264/AAC, respectively.
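The labeling rule discussed so far can be sketched in a few lines: offset stays 0 because the label points at the container, and "AAC" is dropped from encoding_standard_id when there is no audio. This is a rough illustration only. The allowed-value sets list just the values quoted in this thread (the schema may permit others), a real label will likely need more elements than the two shown, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Only the values mentioned in this thread; not the full PDS4 value lists.
ALLOWED_IDS = {
    "Encoded_Audio": {"M4A/AAC", "WAV"},
    "Encoded_Video": {"MP4/H.264", "MP4/H.264/AAC"},
}


def video_standard_id(has_audio: bool) -> str:
    """Drop the AAC suffix when the container carries no audio track."""
    return "MP4/H.264/AAC" if has_audio else "MP4/H.264"


def encoded_video_element(has_audio: bool) -> ET.Element:
    """Build a bare Encoded_Video fragment (namespace handling omitted)."""
    elem = ET.Element("Encoded_Video")
    offset = ET.SubElement(elem, "offset", unit="byte")
    offset.text = "0"  # the label points at the start of the MP4 container
    standard = ET.SubElement(elem, "encoding_standard_id")
    standard.text = video_standard_id(has_audio)
    return elem


def id_is_allowed(class_name: str, standard_id: str) -> bool:
    """Check a value against the sets quoted in this thread."""
    return standard_id in ALLOWED_IDS.get(class_name, set())


print(ET.tostring(encoded_video_element(True), encoding="unicode"))
print(id_is_allowed("Encoded_Audio", "MP4/H.264/AAC"))
```

The last call shows why the original example failed: MP4/H.264/AAC is a video value, so it is rejected inside Encoded_Audio.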

@al-niessner
Contributor

@jordanpadams

1K00 does not seem to be available:

    69           error.label.unresolvable_resource

      FATAL_ERROR  [error.label.unresolvable_resource]   https://pds.nasa.gov/pds4/proc/v1/PDS4_PROC_1K00_1300.xsd

Different version like J or L maybe?

@jordanpadams
Member Author

@al-niessner right yes. forgot to mention. you will either need to include those LDDs in the GitHub repo and feed them in using the appropriate flags, or link to their dev locations in the schemaLocation and schematron references since they will not be released until the first week of June.

PDS XSD: https://pds.nasa.gov/datastandards/schema/develop/pds/PDS4_PDS_1K00.xsd
PDS SCH: https://pds.nasa.gov/datastandards/schema/develop/pds/PDS4_PDS_1K00.sch

PROC XSD: https://raw.githubusercontent.com/pds-data-dictionaries/ldd-proc/main/build/release/1.20.0.0/PDS4_PROC_1K00_1300.xsd
PROC SCH: https://raw.githubusercontent.com/pds-data-dictionaries/ldd-proc/main/build/release/1.20.0.0/PDS4_PROC_1K00_1300.sch
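As a sketch of the second option (linking the dev locations from the label), the snippet below assembles an xsi:schemaLocation value and the xml-model processing instructions from the URLs above. The PDS namespace is taken from the error messages earlier in this thread; the PROC namespace is an assumption inferred from the path of the unresolvable PROC URL, so check it against the actual dictionary. The helper names are hypothetical.

```python
# Namespace taken from the schema error quoted earlier in the thread.
PDS_NS = "http://pds.nasa.gov/pds4/pds/v1"
# Assumption: inferred from the path of the unresolvable PROC URL.
PROC_NS = "http://pds.nasa.gov/pds4/proc/v1"

PROC_BASE = (
    "https://raw.githubusercontent.com/pds-data-dictionaries/ldd-proc/"
    "main/build/release/1.20.0.0/PDS4_PROC_1K00_1300"
)

DEV_XSD = {
    PDS_NS: "https://pds.nasa.gov/datastandards/schema/develop/pds/PDS4_PDS_1K00.xsd",
    PROC_NS: PROC_BASE + ".xsd",
}

DEV_SCH = [
    "https://pds.nasa.gov/datastandards/schema/develop/pds/PDS4_PDS_1K00.sch",
    PROC_BASE + ".sch",
]


def schema_location(xsd_by_namespace: dict) -> str:
    """Join namespace/URL pairs with spaces, as xsi:schemaLocation expects."""
    return " ".join(f"{ns} {url}" for ns, url in xsd_by_namespace.items())


def xml_model_pi(sch_url: str) -> str:
    """Render an xml-model processing instruction for a schematron file."""
    return (
        f'<?xml-model href="{sch_url}" '
        'schematypens="http://purl.oclc.org/dsdl/schematron"?>'
    )


print(schema_location(DEV_XSD))
for url in DEV_SCH:
    print(xml_model_pi(url))
```

Once 1K00 is released, only the URL tables would need to change back to the released locations.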

@al-niessner
Contributor

@jordanpadams

I am fine with using a local schema and schematron. But do you want to commit it that way, with a ticket to remove them once 1K00 is released, or leave this work on a PR until 1K00 is released, so that we do not pin dev schema/schematron files that may change before then? I suggest the latter.

@jordanpadams
Member Author

Let's go with the latter then and we can keep this in draft until then

@jordanpadams jordanpadams changed the title As a user, I want to validate MP4/H.264/AAC encoded audio as observational data As a user, I want to validate MP4/H.264/AAC encoded video with audio as observational data Jun 14, 2023
@jordanpadams
Member Author

@al-niessner can we add the example from this comment from @thareUSGS to test that the software is catching invalid video+audio files?

#606 (comment)

Labeling should follow the same format as the others.

@al-niessner
Contributor

@al-niessner can we add the example from this comment from @thareUSGS to test that the software is catching invalid video+audio files?

#606 (comment)

Labeling should follow the same format as the others.

@jordanpadams

I can, but I chose the files I used not because they look good but because they are trimmed to a minimum size; I trimmed the ones given in this ticket even more just to save space. validate does not need GB-sized test files, just representative 1 KB files. If the suggested files are too big, git will refuse them, as it did for the data given for #500, which was around 65 MB.

@al-niessner
Contributor

@al-niessner can we add the example from this comment from @thareUSGS to test that the software is catching invalid video+audio files?

#606 (comment)

Labeling should follow the same format as the others.

Are you saying you want validate to detect that a text file named fake.mp4 is not an MP4? I thought you just wanted to handle the new extension and check that it matched what was allowed for its supposed type (MP4), not that you wanted a binary validation that the file was MP4. As in, fake.mp4 with the text "hahaha" would work just fine with the encoding_standard_id of MP4 (M4A now it seems).

@jordanpadams
Member Author

@al-niessner sorry for the lack of clarity. if possible, we did want to actually perform the binary content validation of the products using the libraries noted in the whitepaper / ticket.
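To illustrate the difference between the extension check and a content check, here is a very small heuristic that would already reject the "hahaha" text file. It is a stand-in only, not the library-based validation described in the white paper: it assumes the file starts with an ISO base media `ftyp` box, which is typical for MP4 but not strictly guaranteed, and it says nothing about whether the H.264 or AAC streams are valid. `looks_like_mp4` is a hypothetical name.

```python
import os
import struct
import tempfile


def looks_like_mp4(path: str) -> bool:
    """Heuristic: does the file begin with an ISO base media 'ftyp' box?"""
    with open(path, "rb") as handle:
        header = handle.read(8)
    if len(header) < 8:
        return False
    # Box header: 32-bit big-endian size followed by a 4-byte type code.
    size, box_type = struct.unpack(">I4s", header)
    return box_type == b"ftyp" and size >= 8


with tempfile.TemporaryDirectory() as tmp:
    # A text file that only carries the .mp4 extension.
    fake = os.path.join(tmp, "fake.mp4")
    with open(fake, "wb") as out:
        out.write(b"hahaha")

    # A 16-byte stub with a well-formed ftyp header (not a playable video).
    stub = os.path.join(tmp, "stub.mp4")
    with open(stub, "wb") as out:
        out.write(struct.pack(">I4s4sI", 16, b"ftyp", b"isom", 0))

    fake_result = looks_like_mp4(fake)
    stub_result = looks_like_mp4(stub)

print(fake_result, stub_result)
```

A check like this catches mislabeled files cheaply, but a stub like the one above still passes, which is why full stream validation needs a real media library.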
