Propose new environmental consideration information for ML models #396
Comments
I don't understand the issue.
This description in no way describes the actual problem; it only gives a reason why a certain problem should be solved.
@stevespringett can you help me here? I don't see a reason for putting these values in an ML-BOM.
@jkowalleck The energy crisis for AI was just starting to happen when the AI/ML workgroup was operational. Over the last year, the crisis has grown exponentially. Organizations previously were talking about being carbon neutral. With the energy demands of AI, that likely is not possible. This reality is captured in the text of the AI Act. The energy considerations can also be combined with CDXA so that organizations can attest to the data in the model card. The environment consideration support that Matt is working on will help CycloneDX adopters meet requirements in the AI Act.
This is the use case that Matt is trying to achieve with this feature.
To frame this in a use case: As an AI producer or operator, I want the ability to represent environmental concerns, including energy consumption and CO2 emissions, throughout the lifecycle of a model, from data acquisition through training and fine-tuning to MLOps (including inference). I want to use CycloneDX to help my organization comply with the environmental transparency requirements in the AI Act.
Environmental costs for ML-BOM are just one aspect. The thing is, all these "costs" are currently (in the real world) priced in money (taxes, operational costs, R&D, etc.).
Valid point. However, the same logic could be applied to the majority of the model card, including performance metrics and biases, and that's not where the industry currently is. In the proposed design, though, we could reuse this data outside of the model card in a generic sense and make it available to every component and service.
That sounds good: finding a generalized solution that can be reused 👍 PS: here are others asking for a generic approach
Existing work/art in the field: Green Software Foundation - Impact Framework - see https://if.greensoftware.foundation/
A follow-up will be #406.
The fact that datasets used to train AI models are increasingly large and take an enormous amount of energy (and indirectly produce large CO2 emissions) to develop, train and run has come to the forefront. This PR contains proposed additions to the "modelCard" type to account for these considerations when selecting/utilizing a model.

- Adds `ModelCardConsiderations.environmentalConsiderations`
  this fixes #396
- Adds `OrganizationalEntity.address`

----

TODO

- [x] modify JSON schema
- [x] modify XML schema
- [x] modify protobuf schema
- [x] add examples & test resources
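To make the proposed addition more concrete, here is a minimal Python sketch of how per-activity energy and CO2 figures might be attached to a model card's considerations. Only the `environmentalConsiderations` key under the model card's considerations comes from the PR description above; the nested names (`energyConsumptions`, `activity`, `activityEnergyCost`, `co2CostEquivalent`), the units, and the helper functions are assumptions made for illustration, and the numbers are made up. The output is not guaranteed to validate against the final schema.

```python
import json


def energy_consumption(activity, kwh, co2_tonnes):
    """Build one hypothetical energy-consumption entry for a lifecycle activity.

    The field names and units here are assumptions, not taken from the thread.
    """
    return {
        "activity": activity,
        "activityEnergyCost": {"value": kwh, "unit": "kWh"},
        "co2CostEquivalent": {"value": co2_tonnes, "unit": "tCO2eq"},
    }


def model_card_with_environment(consumptions):
    """Wrap the entries under considerations.environmentalConsiderations."""
    return {
        "considerations": {
            "environmentalConsiderations": {
                # "energyConsumptions" is an assumed container name.
                "energyConsumptions": list(consumptions),
            }
        }
    }


# Illustrative, made-up figures for two lifecycle activities.
model_card = model_card_with_environment(
    [
        energy_consumption("training", 1200.0, 0.5),
        energy_consumption("inference", 30.0, 0.01),
    ]
)

print(json.dumps(model_card, indent=2))
```

Keeping one entry per activity mirrors the use case stated in the thread (data acquisition, training, fine-tuning, inference), so totals per lifecycle phase can be reported separately or summed.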
## Added

* Core enhancement: Attestation ([#192](#192) via [#348](#348))
* Core enhancement: Cryptography Bill of Materials (CBOM) ([#171](#171), [#291](#291) via [#347](#347))
* Feature to express the URL to source distribution ([#98](#98) via [#269](#269))
* Feature to express the URL to RFC 9116 compliant documents ([#380](#380) via [#381](#381))
* Feature to express tags/keywords for services and components (via [#383](#383))
* Feature to express details for component authors ([#335](#335) via [#379](#379))
* Feature to express details for component and BOM manufacturer ([#346](#346) via [#379](#379))
* Feature to express concluded values from observed evidences ([#411](#411) via [#412](#412))
* Feature to express license acknowledgement ([#407](#407) via [#408](#408))
* Feature to express environmental consideration information for model cards ([#396](#396) via [#395](#395))
* Feature to express the address of organizational entities (via [#395](#395))
* Feature to express additional component identifiers: Universal Bill Of Receipts Identifier and Software Heritage persistent IDs ([#413](#413) via [#414](#414))

## Fixed

* Allow multiple evidence identities by XML/JSON schema ([#272](#272) via [#359](#359))
  This was already correct via ProtoBuf schema.
* Prevent empty `license` entities by XML schema ([#288](#288) via [#292](#292))
  This was already correct in JSON/ProtoBuf schema.
* Prevent empty or malformed `property` entities by JSON schema ([#371](#371) via [#375](#375))
  This was already correct in XML/ProtoBuf schema.
* Allow multiple `licenses` in `Metadata` by ProtoBuf schema ([#264](#264) via [#401](#401))
  This was already correct in XML/JSON schema.

## Changed

* Allow arbitrary `$schema` values by JSON schema ([#402](#402) via [#403](#403))
* Increased max length of `versionRange` (via [`3e01ce6`](3e01ce6))
* Harmonized length of `version` (via [#417](#417))

## Deprecated

* Data model "Component"'s field `author` was deprecated. (via [#379](#379))
  Use field `authors` or field `manufacturer` instead.
* Data model "Metadata"'s field `manufacture` was deprecated. ([#346](#346) via [#379](#379))
  Use "Metadata"'s field `component`'s field `manufacturer` instead.
  - for XML: `/bom/metadata/component/manufacturer`
  - for JSON: `$.metadata.component.manufacturer`
  - for ProtoBuf: `Bom:metadata.component.manufacturer`

## Documentation

* Centralize version and version-range (via [#322](#322))
* Streamlined SPDX expression related descriptions (via [#327](#327))
* Enhanced descriptions of `bom-ref`/`refType` ([#336](#336) via [#344](#344))
* Enhanced readability of enum documentation in JSON schema ([#361](#361) via [#362](#362))
* Fixed typo "compliment" -> "complement" (via [#369](#369))
* Added documentation for enum "ComponentScope"'s values in JSON schema ([#293](#293) via [`d92e58e`](d92e58e))
  Texts were taken from the existing ones in XML/ProtoBuf schema.
* Added documentation for enum "TaskType"'s values ([#245](#245) via [#377](#377))
* Improve documentation for data model "Metadata"'s field `licenses` ([#273](#273) via [#378](#378))
* Added documentation for enum "MachineLearningApproachType"'s values ([#351](#351) via [#416](#416))
* Rephrased some texts here and there.

## Test data

* Added test data for newly added use cases
* Added quality assurance for our ProtoBuf schemas ([#384](#384) via [#385](#385))
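The `manufacture` deprecation noted above could be handled in a JSON BOM roughly as follows. This is a sketch only: the function name `migrate_manufacture`, the "existing value wins" policy, and the sample data ("Acme", "app") are assumptions for illustration; the only thing taken from the changelog is the target path `$.metadata.component.manufacturer`.

```python
import copy


def migrate_manufacture(bom):
    """Return a copy of a JSON BOM dict with the deprecated
    metadata.manufacture moved to metadata.component.manufacturer.

    Hypothetical helper, not part of any CycloneDX tooling.
    """
    result = copy.deepcopy(bom)
    metadata = result.get("metadata")
    if not isinstance(metadata, dict) or "manufacture" not in metadata:
        return result  # nothing to migrate

    component = metadata.get("component")
    if not isinstance(component, dict):
        # No component to attach the value to: keep the deprecated field
        # rather than silently dropping information.
        return result

    legacy = metadata.pop("manufacture")
    # Assumed policy: an already present "manufacturer" is kept as-is.
    component.setdefault("manufacturer", legacy)
    return result


# Made-up sample document for illustration.
old_bom = {
    "metadata": {
        "manufacture": {"name": "Acme"},
        "component": {"type": "application", "name": "app"},
    }
}
new_bom = migrate_manufacture(old_bom)
```

Working on a deep copy leaves the input untouched, which makes it easy to diff the document before and after migration.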
see #396 (comment)
The fact that datasets used to train AI models are increasingly large and take an enormous amount of energy (and indirectly produce large CO2 emissions) to develop, train and run has come to the forefront. This PR contains proposed additions to the "modelCard" type to account for these considerations when selecting/utilizing a model.
Background:
many more from any search engine...