Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate making incorrect assumption that first object has and object length == file_size #781

Closed
jordanpadams opened this issue Dec 4, 2023 · 5 comments Β· Fixed by #783
Closed
Assignees
Labels
B14.1 bug Something isn't working i&t.done s.high

Comments

@jordanpadams
Copy link
Member

jordanpadams commented Dec 4, 2023

Checked for duplicates

Yes - I've already checked

πŸ› Describe the bug

When I did ran validate against a product with numerous objects and a file_size defined, it seems that Validate expects the first object to be that length.

πŸ•΅οΈ Expected behavior

I expected file_size to mean all objects combined

πŸ“œ To Reproduce

See #684 (comment)

πŸ–₯ Environment Info

Mac OSx

πŸ“š Version of Software Used

gov.nasa.pds:validate
Version 3.4.0-SNAPSHOT
Release Date: 2023-12-04 07:19:11

🩺 Test Data / Additional context

See See #684 (comment)

πŸ¦„ Related requirements

No response

βš™οΈ Engineering Details

No response

I&T

TestRail Test ID: T8681196

@jordanpadams jordanpadams added bug Something isn't working needs:triage labels Dec 4, 2023
@jordanpadams jordanpadams self-assigned this Dec 4, 2023
@github-project-automation github-project-automation bot moved this to Release Backlog in B14.1 Dec 4, 2023
@jordanpadams jordanpadams changed the title Validate seems to making incorrect assumption that first object has and object length == file_size Validate making incorrect assumption that first object has and object length == file_size Dec 4, 2023
@al-niessner
Copy link
Contributor

@jordanpadams

This is not a problem with validate but what you want validate to do and complete contradiction to #684

The label does not define the object_length. For #684 it was determined that when object_length was missing, the file size should be used. Because object_length is missing there is no way to tell if the table A in the file overlaps table B. validate thinks that it does because #684 said that missing the object_length use file_size instead forcing the assumption that there would be only one table.

The question is, given that many tables exist in the file, must they use object_length. If not, then do we assume it just works out?

Anyway, the user in #684 wanted validate to use file_size when object_length was missing. This user does not want us doing that. You are the arbiter.

@jordanpadams
Copy link
Member Author

jordanpadams commented Dec 5, 2023

The question is, given that many tables exist in the file, must they use object_length. If not, then do we assume it just works out?

Unfortunately, in this case, I think we have to just assume the tables do not overlap and hope one of the other checks we are doing for the content triggers some other error.

I think we can still use file_size when we do not have multiple tables, but for these kinds of tables, we have to just throw our hands up I guess.

@rchenatjpl if this is in a peer review, can we recommend object_length for table delimited? As @al-niessner notes, we have no way of knowing how big each table is supposed to be, so we just have to trust it is described correctly. Ideally, the label more be more explicit about the object_length of each table so we can actually check each object size against the following offsets.

@jordanpadams
Copy link
Member Author

@al-niessner Would it be possible for us to "guess" the object_length using the numbers records and the record delimiters to know when we have come to the end of a table? So for calculating the size of each table, we assume the point where we encounter the record delimiter of the last record, that is the size of that object.

So from the example, for the first table, we read 1 record, and then calculate the bytes from the expected offset (0) to where we encounter the end of that 1 record (?) and that is our guess for the object_length.

@al-niessner
Copy link
Contributor

al-niessner commented Dec 6, 2023

@jordanpadams

If by "guess" you mean estimate, then maybe. In the case of the data provided for this ticket, no. The file is multiple comma delimited tables where nothing is known about the field sizes just their position and that they are of type string or ascii integer. Obviously ascii integer is constrained in a maximum size, but it can also be 1 byte. Here is an example of what is given:

				<Field_Delimited>
					<name>N_F</name>
					<field_number>2</field_number>
					<data_type>ASCII_Integer</data_type>
					<description>Number of Fixed Parameters</description>
				</Field_Delimited>
				<Field_Delimited>
					<name>BODY_NAME</name>
					<field_number>3</field_number>
					<data_type>ASCII_String</data_type>
					<description>Planet/Body Name</description>
				</Field_Delimited>

From validate's perspective, the interesting part is that for any offset to ever work, then one of two things have to be true: One, all sizes are fixed and therefore could be provided in fixed rather than delimited table. Two, sizes are variable but constrained by an upper maximum which must always be padded. If fixed table was used, then all checks would have simply worked. If variable delimited table, then will verify padding indirectly when the file content is checked against the label.

It seems that if a file area is given with delimited tables that do not define an object length, then skip this check and let the content checker find problems.

PS: for completeness, if padded then object length is known a priori and could easily be defined in the label.

@rchenatjpl
Copy link
Contributor

Sorry, I just now noticed I got tagged. So for me or whomever: recommend object_length for Table_Delimited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B14.1 bug Something isn't working i&t.done s.high
Projects
Status: 🏁 Done
Status: 🏁 Done
Development

Successfully merging a pull request may close this issue.

4 participants