Move ActOfDataTransformation up to the Event Ontology #211

Closed
APCox opened this issue Jan 10, 2024 · 4 comments · Fixed by #388
@APCox
Contributor

APCox commented Jan 10, 2024

cco:ActOfDataTransformation currently resides in a CCO extension ontology, but is generic enough that it more appropriately belongs in the CCO. Propose moving the term into the Event Ontology along with its parent class cco:ActOfInformationProcessing.

# http://www.ontologyrepository.com/CommonCoreOntologies/ActOfDataTransformation
:ActOfDataTransformation a owl:Class;
  rdfs:subClassOf :ActOfInformationProcessing;
  :definition "An Act of Information Processing in which an algorithm is executed to transform one or more input Information Content Entities into one or more output Information Content Entities."@en;
  :elucidation "It is not a requirement that the output Information Content Entity(ies) be qualitatively distinct from the input(s) as a result of an Act of Data Transformation, though doing so is typically the goal of performing this Act. Consider, for example, selecting a column in an Excel spreadsheet then executing the \"Remove Duplicates\" Algorithm on it. The intent is to remove rows in that column containing duplicate content. If no duplicate values are present, the information in the column remains unchanged but an Act of Data Transformation was nonetheless performed."@en;
  :is_curated_in_ontology "http://www.ontologyrepository.com/CommonCoreOntologies/Mid/EventOntology"^^xsd:anyURI;
  rdfs:label "Act of Data Transformation"@en .

# http://www.ontologyrepository.com/CommonCoreOntologies/ActOfInformationProcessing
:ActOfInformationProcessing a owl:Class;
  rdfs:subClassOf :IntentionalAct;
  :definition "A Planned Act in which one or more input Information Content Entities are received, manipulated, transferred, or stored by an Agent."@en;
  :is_curated_in_ontology "http://www.ontologyrepository.com/CommonCoreOntologies/Mid/EventOntology"^^xsd:anyURI;
  rdfs:label "Act of Information Processing"@en .
@mark-jensen
Contributor

I agree these probably deserve a home in CCO-mid, although I can see a case for scoping a domain ontology for information processing. But until then, let's keep them here.

The definition for ActOfDataTransformation is circular. What it means to transform something needs to be articulated. Presumably 'manipulated' in the parent class includes transformation.

First hit on Google:
"Data transformation is the process of converting, cleansing, and structuring data into a usable format ...". It goes on to suggest four types:

  • Constructive, where data is added, copied or replicated
  • Destructive, where records and fields are deleted
  • Aesthetic, where certain values are standardized, or
  • Structural, which includes columns being renamed, moved, and combined
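
To make the four quoted categories concrete, here is a minimal Python sketch. The record layout and the function names (`constructive`, `destructive`, `aesthetic`, `structural`) are hypothetical illustrations and are not part of CCO or of the quoted source; each function returns new rows and leaves its input untouched.

```python
# Hypothetical sample data: two rows with inconsistent casing in "city".
records = [
    {"name": "alice", "city": "NYC"},
    {"name": "bob", "city": "nyc"},
]


def constructive(rows):
    """Constructive: add data (here, a derived field) to every row."""
    return [{**r, "name_length": len(r["name"])} for r in rows]


def destructive(rows):
    """Destructive: delete a field (here, "city") from every row."""
    return [{k: v for k, v in r.items() if k != "city"} for r in rows]


def aesthetic(rows):
    """Aesthetic: standardize values (here, upper-case the city code)."""
    return [{**r, "city": r["city"].upper()} for r in rows]


def structural(rows):
    """Structural: rename a column (here, "name" becomes "full_name")."""
    return [
        {("full_name" if k == "name" else k): v for k, v in r.items()}
        for r in rows
    ]
```

Under this sketch only the constructive case adds content that was not present in the input, which is the case the wide-net versus narrow-net question turns on.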

Wikipedia says "In computing, data transformation is the process of converting data from one format or structure into another format or structure."

The question is: do we cast a wide net and allow transformation to include generating new content, e.g., when a table is "transformed" into a graph with added content provided by the semantic model, or do we limit it to formatting and structural changes? If the former, then how do we reconcile transformation with statistical and ML processes?

@cameronmore
Contributor

Act of Data Transformation = An Act of Information Processing in which an algorithm is executed to act upon one or more input Information Content Entities into one or more output Information Content Entities.

Saying 'act upon' avoids the problem of enumerating the possibilities of transformation (conversion, restructuring, etc), and also (per the elucidation) allows for the possibility that the data is not changed, just acted upon. I may have a function that removes references to a certain word in a body of text, but if the text never contained that word, then the text data that was transformed never actually changed.
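
The word-removal case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `remove_word` is a hypothetical helper, and it matches whole space-separated tokens only, so a word with attached punctuation would not be removed.

```python
def remove_word(text: str, word: str) -> str:
    """Return `text` with every space-separated token equal to `word` dropped.

    Splitting on a single space and re-joining with a single space reproduces
    the input exactly when no token matches.
    """
    tokens = text.split(" ")
    kept = [t for t in tokens if t != word]
    return " ".join(kept)


before = "the quick brown fox"

# The word is present: the output differs from the input.
after_hit = remove_word(before, "quick")

# The word is absent: the algorithm still executes, but the output equals the input.
after_miss = remove_word(before, "zebra")
```

Both calls execute the same algorithm over an input. Whether the second call counts as a data transformation is the point the definition has to settle.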

@alanruttenberg
Contributor

Answering the comment on #133, yes, this is the sort of thing I'm looking for. However, it's not a data transformation if there's no change, so I don't buy the rationale for "act on". That might be appropriate for a more neutral superclass act of information processing. However, a problem with its current definition is that merely receiving doesn't seem to be a "processing", in the normal sense. If something is received and acted on, that's a processing.

Also, there's something wrong with the grammar: "act on ... into". Note the OBI definition "A planned process that produces output data from input data.". Perhaps: a process which takes as input an information content entity and has output a changed input or new output information content entity.
As an aside I dislike the automatic prefix "Act of". It's hard to misinterpret the simpler "data transformation".

There's a bit of an issue with intentional processes in general, in that it isn't clear what the scope of intention is. Suppose I run a command line with an input being not the file I intended, perhaps because autocomplete completed the wrong thing. That doesn't seem to satisfy the definition of "Planned Act": "An Act in which at least one Agent plays a causative role and which is prescribed by some Directive Information Content Entity held by at least one of the Agents.". The first part holds (assuming general problems with definitions which depend on cause are resolved), but the second clause would seem to not be.

@neilotte added the "for 2.0 release" label (This label indicates updates to be made in the 2.0 release, which will include a new IRI format.) and removed the "term request" label on Aug 18, 2024
@neilotte
Contributor

@cameronmore Please move out on executing the revision you've articulated based on @mark-jensen 's request.

For @alanruttenberg's comments regarding treating these as 'acts', this is a matter that warrants a larger discussion. Hence I'll start a thread on the discussion board and cite both this ticket and 133 for context. Thanks!

@neilotte added the "for 1.6 release" label and removed the "for 2.0 release" label on Aug 18, 2024
@cameronmore added a commit to cameronmore/CommonCoreOntologies that referenced this issue on Aug 18, 2024
@cameronmore linked a pull request on Aug 18, 2024 that will close this issue

5 participants