UCO should represent a database record at the record-cell level #415
Drafting examples are being developed at: https://github.com/casework/CASE-Examples/tree/415-database-records/examples/illustrations/database_records
CASE-Examples development has been merged, and the example is now here. @kchason, could you please copy that The crossover page presents that row 3 (last row) in @kchason, can you please add rows 1 and 3 as a new code snippet with the drafting concepts, with appropriate descriptive narrative (could reword what I put above)? I'll revise this Issue into a proposal now.
While the referenced examples contain anecdotal use of the new DatabaseRecordFacet and some properties, I believe the Solution Suggestion section of this CP should explicitly outline details of all proposed new classes and properties, including property type and cardinality.
I voted to accept the above, as I do agree with the majority of what is being proposed, but I have the following main reservation about the definitions in drafting.ttl and in the supporting example database_records.json:
Note: A Record contains an [ordered/unordered] collection of Fields, but the current definition only describes a single field.
A further reservation is that databases do not directly have records, or fields, assigned to them; they only have tables (of types Table, View, and Index). So, semantically, I think the current naming of a number of the classes and properties in the definition is likely to cause confusion for the following reasons:
Personally, I would much prefer that, some time in the future, the following simple semantic changes be made to more closely match what seems to be described by the current draft ontology and its associated example:
- Rename drafting:DatabaseRecord -> drafting:TableField
- Rename drafting:databaseRowID -> drafting:recordRowID
- Add a new class called drafting:Table to hold the current properties drafting:databaseTable and drafting:databaseSchema.
These changes would, in my opinion, also make it relatively simple to describe non-relational databases, and other table types found in forensics, using the same classes and properties.
@gjwebb-case, I agree with you on "Record" more typically corresponding to a database row. I'd vaguely recalled relational algebra using "record" when referencing a tuple. As for your field names, we had discussed leaving the possibility open in the future for a higher-level abstraction of "Table" that could also accommodate a table in an HTML, Word, PDF, or other document. So, I'd NACK these two changes, because "tableSchema" would not mesh well with that.
I had thought that
I'm on the fence about whether we need to keep the word "database" in these, as a matter of verbosity vs. concept-scope confusion. We might want close to the same property names for modeling HTML forms in the future. (Skipping a few steps in modeling needs, I could see some desire to model certain HTML-web-form-based attacks.) I think for now we would be reasonably future-safe for that other modeling need if we used
Implementation has not been completed to the point of exercising the database NULL representation. We could vote on this today if we think it is sufficiently specified, but there is some risk from not testing yet.
A follow-on patch will regenerate Make-managed files. References: * ucoProject/UCO#415 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * ucoProject/UCO#415 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Looking over the state of the PR, I think there's only one thing I'd like to see changed. If there's anyone else in favor, please note so, and we can make the extension.
Should
These tests were initially drafted in CASE-Examples, and merged in that repository's Pull Request 89. A follow-on patch will regenerate Make-managed files. References: * casework/CASE-Examples#89 * ucoProject#415 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
References: * ucoProject#415 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
The sample is the SHA-1 of the 0-length string. No effects were observed on Make-managed files. References: * ucoProject#415 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
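As a quick sanity check of that sample value, the SHA-1 digest of the zero-length string can be recomputed with Python's standard hashlib module. This assumes the sample is written as the usual lowercase hexadecimal digest; it is only a cross-check, not part of the test suite itself.

```python
import hashlib

# SHA-1 over the empty (0-length) byte string, rendered as lowercase hex.
empty_sha1 = hashlib.sha1(b"").hexdigest()
print(empty_sha1)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```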
No effects were observed on Make-managed files. References: * ucoProject/UCO#415 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
No effects were observed in Make-managed files. References: * ucoProject/UCO#415 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Alex,
I know it has been a long time, and I should have reacted sooner. Let me just ask the question and leave it at that.
Why are we modelling the design of a relational database schema, as opposed to what it represents?
We can discuss this Friday if you want.
Best
Paul
On 26 Jul 2022, at 18:03, Alex Nelson ***@***.***> wrote:
On the CASE Crossover Scenario, a tool's output identified an incorrect association with an ICCID number for a phone. The ICCID number is known to be incorrect because the person who seeded the phone data knows what the ICCID was.
https://caseontology.org/examples/crossover_wmd/
The ICCID should be 8931088918010550289, whereas the tool claimed the value 89390100002217635543. This discrepancy comes from having selected an incorrect record in a SQLite file.
How should UCO designate a certain record within SQLite? SQLite has the advantage of being able to identify its allocated rows with row_number(). (I had thought there was a ROW_NO, but haven't had a chance to go back through the SQLite docs to confirm.)
https://www.sqlite.org/windowfunctions.html
Additional nuance for this example: This example relies on allocated records. Do we need a mechanism to identify unallocated records as well initially (especially for carving work)?
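To make the quoted SQLite question concrete, here is a minimal sketch using Python's standard sqlite3 module. In an ordinary (rowid) SQLite table, each allocated row has an engine-assigned integer rowid, while row_number() is a window function (SQLite 3.25 or later) that numbers the rows of a query result, so its value depends on the query's ORDER BY. The table name iccid_history and its single column are hypothetical stand-ins, not the layout of the scenario's actual database; only the two ICCID values come from this thread.

```python
import sqlite3

# Hypothetical stand-in for the scenario's ICCID-history table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iccid_history (iccid TEXT)")
conn.executemany(
    "INSERT INTO iccid_history (iccid) VALUES (?)",
    [("89390100002217635543",), ("8931088918010550289",)],
)

# rowid: the engine-assigned integer key stored with each allocated row.
by_rowid = conn.execute(
    "SELECT rowid, iccid FROM iccid_history ORDER BY rowid"
).fetchall()

# row_number(): numbers the rows of this particular query result; a different
# ORDER BY would give the same record a different number.
by_row_number = conn.execute(
    "SELECT row_number() OVER (ORDER BY iccid) AS rn, iccid "
    "FROM iccid_history ORDER BY rn"
).fetchall()

print(by_rowid)       # [(1, '89390100002217635543'), (2, '8931088918010550289')]
print(by_row_number)  # [(1, '8931088918010550289'), (2, '89390100002217635543')]
```

Both queries only see allocated rows; a record carved from free pages would not be returned by either, which is why the unallocated-record question needs a different kind of locator.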
Hi Alex, I agree with your view: xsd:integer would be by far the better data type to use. In my opinion, xsd:integer would still be best irrespective of whether a record is modelled directly on the database or on what it represents. It would also allow a record to be modelled by its location based on a disk offset (e.g., a record in unallocated disk space).
Structurally, I agree. Higher precision is typically better when modeling the world (e.g., 4096 vs. 4K).
The only challenge is that, as I’m certain has been discussed, different methods are frequently used to characterize file and disk sizes (e.g., 40GB). People and/or tools populating these values are likely to use imprecise terms.
Perhaps a modifier?
Note: I do support Integer.
From: gjwebb-case ***@***.***>
Sent: Wednesday, November 9, 2022 9:11:20 AM
To: ucoProject/UCO ***@***.***>
Cc: Subscribed ***@***.***>
Subject: Re: [ucoProject/UCO] UCO should represent a database record at the record-cell level (Issue #415)
Re: @gjwebb-case
The point I raised about integers was scoped to
I need to warn everybody about usage of the word "Unique" in general. "Unique" implies a universe of scope and a defining authority. @gjwebb-case, what you suggested about using the location based on disk offset would be incongruous with an identifier supplied by the database engine. If you need to recover such a record from unallocated space, and you don't have an inlined identifier within that record, it would be inappropriate in my opinion to assign your own unique identifier. In that scenario, you should instead use If others agree with avoiding concept conflation, I suggest
Meanwhile, while I prefer Re: @packet-rat :
I suggest not introducing support for "rounding" modifiers like "KiB". The original issue inspiring this proposal needed to select between two records in a single SQLite table that had, IIRC, two records total. Permitting rounding defeats the use case where we need to point to this offset within a disk sector or database page, rather than that offset.
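A small illustration of the rounding concern, using made-up offsets: two records that sit inside the same 4096-byte page remain distinguishable as exact integers, but collapse to the same value once expressed in whole KiB. The offsets and the to_kib helper are hypothetical and exist only for this sketch.

```python
# Hypothetical byte offsets of two records within one 4096-byte page.
record_offsets = [8192, 8448]


def to_kib(offset: int) -> int:
    """Round a byte offset to the nearest whole KiB (1 KiB = 1024 bytes)."""
    return round(offset / 1024)


rounded = [to_kib(offset) for offset in record_offsets]
print(rounded)  # [8, 8]: both records become "8 KiB"
print(len(set(record_offsets)), len(set(rounded)))  # 2 1
```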
Re: @plbt5
I'm not quite sure what level this question is at, but I'll reply per my best guess. We are defining a model that can, as a user needs, represent a record with a single object carrying a field per layer, rather than requiring representation of each layer of the database model (database engine class, database engine instance, backing-store file or disk partition, schema instance if applicable, table instance, row instance, column instance) and the relationships between each layer. We do this in part to support provenience (independent of provenance), so we can relate a single field within a record geometrically with its containing objects (such as the backing-store storage object). Does that answer your question?
I think this generally makes sense. I have a concern with just switching it to While not really the concern of the ontology community itself, this would cause issues for some of the strongly-typed bindings that have been developed to help generate CASE graphs, and would otherwise require additional logic for consumers of the graphs to account for both types.
On review, not really.
So, I don't think we have precedent, but we have the knowledge of how to encode and test for a union. We can make a union work.
This is an adoption-level concern that is relevant to the ontology committee. Do we know what the total union of datatypes permitted for primary key usage is? E.g., is a
I'm becoming hesitant to include
This could instead refer to the internal record ID contained within the database, but that may not be as readily available from standard database queries or record exports into flat files, so I would still prefer this be defined as the defined primary key of the record. Some databases recommend integers, some suggest strings, and others recommend GUIDs for the primary key field of a database. As @ajnelson-nist pointed out on a call this morning, this also doesn't account for clustered primary keys consisting of two or more fields. I'd still suggest
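The composite-key point can be sketched with SQLite through Python's sqlite3 module. The table readings and its columns are hypothetical. PRAGMA table_info reports each column's 1-based position in the primary key (0 if it is not part of the key), so the declared key of a record can be a tuple mixing TEXT and INTEGER, while the engine's internal rowid is a separate integer. Neither a single xsd:integer nor a single string covers every case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose primary key spans two columns of different types.
conn.execute(
    "CREATE TABLE readings (device TEXT, seq INTEGER, value REAL, "
    "PRIMARY KEY (device, seq))"
)
conn.execute("INSERT INTO readings VALUES ('sensor-a', 7, 1.5)")


def primary_key_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return the table's primary-key column names in key order."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    key_cols = [row for row in rows if row[5] > 0]
    return [row[1] for row in sorted(key_cols, key=lambda row: row[5])]


pk_cols = primary_key_columns(conn, "readings")
# The same record located two ways: internal rowid vs. declared primary key.
rowid, *key_values = conn.execute(
    f"SELECT rowid, {', '.join(pk_cols)} FROM readings"
).fetchone()
print(pk_cols)     # ['device', 'seq']
print(rowid)       # 1
print(key_values)  # ['sensor-a', 7]
```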
The OCs decided this morning that
The definition of
No effects were observed on Make-managed files. References: * ucoProject#415 Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Background
This Issue stemmed from a question:
Requirements
Requirement 1
UCO must be able to represent an individual cell within a database table.
Requirement 2
UCO must be able to relate a cell within a database to its containing column within the row. Likewise for containing row within the table, table within the database, or table within the schema and schema within the database. Each of these relating properties must tie to the cell.
Requirement 3
At least in the initial representation, it must not be necessary to represent each layer between the whole database as an object, down to schema, down to table, down to row, down to column, down to cell. The cell object itself must be able to carry its locating characteristics.
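A hypothetical sketch of Requirement 3: a single record object carries all of its locating characteristics itself, rather than linking through separate schema, table, row, and column objects. The property names partly follow drafting terms quoted in this thread (some were later discussed for renaming); drafting:databaseColumn, the table and column names, and the row number are invented for illustration, and only the ICCID value comes from the issue.

```python
# Hypothetical flat record: every locating characteristic lives on one object.
record_facet = {
    "@type": "drafting:DatabaseRecordFacet",
    "drafting:databaseSchema": "main",  # SQLite's name for the primary schema
    "drafting:databaseTable": "iccid_history",  # hypothetical table name
    "drafting:databaseRowID": 2,  # hypothetical row number
    "drafting:databaseColumn": "iccid",  # hypothetical property and column
    "drafting:databaseFieldValue": "8931088918010550289",
}

# Locating properties, narrowing from schema down to a single cell.
LOCATING_PROPERTIES = (
    "drafting:databaseSchema",
    "drafting:databaseTable",
    "drafting:databaseRowID",
    "drafting:databaseColumn",
)


def cell_locator(facet: dict) -> tuple:
    """Read the cell's position from the facet alone; raise if any part is missing."""
    missing = [p for p in LOCATING_PROPERTIES if p not in facet]
    if missing:
        raise KeyError(f"missing locating properties: {missing}")
    return tuple(facet[p] for p in LOCATING_PROPERTIES)


print(cell_locator(record_facet))  # ('main', 'iccid_history', 2, 'iccid')
```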
Requirement 4
The cell contents must be able to represent strings, binary content, floating point numbers (xsd:decimal), integers, OR that the cell is NULL.
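One way to picture Requirement 4, sketched with Python's sqlite3 module: each value that SQLite returns (TEXT, BLOB, REAL, INTEGER, or NULL) becomes either a typed lexical value or an explicit NULL marker, never both. The xsd:decimal choice for floating point follows the requirement text; xsd:hexBinary for binary content, and the dict keys used here, are assumptions for this sketch.

```python
import sqlite3
from decimal import Decimal

# Assumed mapping from the Python types sqlite3 returns to XSD datatypes.
XSD_FOR_PYTHON_TYPE = {
    str: "xsd:string",  # TEXT
    bytes: "xsd:hexBinary",  # BLOB (assumption)
    float: "xsd:decimal",  # REAL, per the requirement text
    int: "xsd:integer",  # INTEGER
}


def describe_cell(value) -> dict:
    """Return a typed lexical value, or an explicit NULL marker, never both."""
    if value is None:  # SQL NULL arrives as Python None
        return {"isNull": True}
    datatype = XSD_FOR_PYTHON_TYPE[type(value)]
    if isinstance(value, bytes):
        lexical = value.hex()
    elif isinstance(value, float):
        as_decimal = Decimal(repr(value))
        if not as_decimal.is_finite():
            # inf and NaN have no xsd:decimal lexical form.
            raise ValueError(f"no xsd:decimal form for {value!r}")
        lexical = format(as_decimal, "f")  # plain notation, no exponent
    else:
        lexical = str(value)
    return {"value": lexical, "datatype": datatype}


conn = sqlite3.connect(":memory:")
row = conn.execute("SELECT 'abc', x'00ff', 1.5, 42, NULL").fetchone()
results = [describe_cell(v) for v in row]
for result in results:
    print(result)
```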
Risk / Benefit analysis
Benefits
Risks
Competencies demonstrated
Competency 1
The Crossover "WMD" scenario has an issue where a tool drew an assignment for ICCID from either a system.dat file or from a certain selection within a table that stored ICCID histories. (See the "SIM CARDS" section on that page.) The ground truth is known in this instance, and the value reported by that tool is incorrect.
Competency Question 1.1
What database cell provided the ICCID values 89390100002217635543 (known incorrect) and 8931088918010550289 (known in ground truth to be correct)?
Result 1.1
Solution suggestion
The solution is drafted in CASE-Examples' drafting.ttl here, except that a Boolean-ranged property, drafting:databaseFieldIsNull, should be added as part of an sh:xone (exactly one) constraint incorporating constraints on drafting:databaseFieldValue.
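A minimal Python sketch of what the proposed sh:xone constraint is meant to enforce, assuming a cell is a plain dict whose property values are lists. This is not SHACL and not the drafting.ttl shapes; the two predicate functions stand in for the two member shapes of the sh:xone list, and a cell conforms only if exactly one of them holds.

```python
IS_NULL = "drafting:databaseFieldIsNull"
VALUE = "drafting:databaseFieldValue"


def conforms_to_null_shape(cell: dict) -> bool:
    # Member shape A: the cell explicitly states that the field is NULL.
    return cell.get(IS_NULL) == [True]


def conforms_to_value_shape(cell: dict) -> bool:
    # Member shape B: the cell carries exactly one field value.
    return len(cell.get(VALUE, [])) == 1


def xone_conforms(cell: dict) -> bool:
    # sh:xone semantics: exactly one member shape may conform.
    matches = [conforms_to_null_shape(cell), conforms_to_value_shape(cell)]
    return sum(matches) == 1


print(xone_conforms({VALUE: ["8931088918010550289"]}))  # True
print(xone_conforms({IS_NULL: [True]}))  # True
print(xone_conforms({IS_NULL: [True], VALUE: ["x"]}))  # False
print(xone_conforms({}))  # False
```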
Coordination
- recordRowId range including strings or not is recorded
- recordRowId definition to: "The unique ID that identifies a database record, supplied by the originating database engine."
- develop