
Syntax validation and differences between uom-systems/ucum and ucum-essence.xml #173

Open
JohnTimm opened this issue Aug 13, 2020 · 7 comments

@JohnTimm

JohnTimm commented Aug 13, 2020

I am looking for a JSR 385-based library to parse and validate UCUM units in our FHIR server implementation (http://github.com/ibm/fhir), with the potential for supporting unit conversion in the future. I wrote a unit test that parses the codes in http://hl7.org/fhir/valueset-ucum-common.html and found a number of issues:

  1. Handling of annotations in the numerator or denominator (e.g. code: %/100{WBC}, display: percent / 100 WBC)
    Encountered " <ANNOTATION> "{WBC} "" at line 1, column 6.
    There are also problems with codes such as /{oif}, where the denominator (or the numerator) consists only of an annotation.

  2. Handling of annotations that contain spaces (e.g. code: %{Negative Control}, display: percent Negative Control)
    Lexical error at line 1, column 11. Encountered: " " (32), after : "{NEGATIVE"

  3. Missing symbols (e.g. [iU] (or [IU]) for international units, bit_s, Bd, etc.)

Here are the results for the codes listed in that value set:

Total: 1364
Success: 1117
Error: 247
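For anyone who wants to reproduce numbers like these, here is a minimal sketch of such a tally. It is not the JUnit test from this issue: the class name `ParseTally` is hypothetical, and the parser is passed in as a plain `java.util.function.Function` so that no particular uom-systems API is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical harness: runs a pluggable parser over UCUM codes and
// tallies Total / Success / Error the way the numbers above are reported.
final class ParseTally {
    final int total;
    final List<String> failures;

    private ParseTally(int total, List<String> failures) {
        this.total = total;
        this.failures = failures;
    }

    int success() {
        return total - failures.size();
    }

    int error() {
        return failures.size();
    }

    static ParseTally run(List<String> codes, Function<String, ?> parser) {
        List<String> failures = new ArrayList<>();
        for (String code : codes) {
            try {
                parser.apply(code);
            } catch (RuntimeException | Error e) {
                // Any exception counts as a failure. Error is caught too because
                // JavaCC-generated lexers report "Lexical error ..." via TokenMgrError,
                // which extends Error (assumption: the parser may let it escape).
                failures.add(code);
            }
        }
        return new ParseTally(codes.size(), failures);
    }
}
```

With a real UCUM format you would pass a lambda that calls its parse method on each code; the `failures` list then yields a code list like the one posted below.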

It looks like part of the problem is how strict the UCUM format parser is. In the short term, I can look at "fixing up" some of the codes before passing them to the parser (e.g. removing spaces from annotations). What concerns me most, however, is how many symbols are missing, especially when you compare what's in ucum-essence.xml with the resource files that the format parser uses.
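The short-term "fix-up" of annotations could look roughly like this. It is a hypothetical helper (`AnnotationFixup.stripSpacesInAnnotations` is not part of uom-systems) that removes whitespace only inside `{...}` braces and leaves the rest of the code untouched. It does not address the grammar problems from item 1 (`%/100{WBC}`, `/{oif}`).

```java
// Hypothetical pre-processing step applied before handing a code to a strict
// UCUM parser: drops whitespace that appears inside {annotation} braces,
// e.g. "%{Negative Control}" -> "%{NegativeControl}".
final class AnnotationFixup {
    static String stripSpacesInAnnotations(String code) {
        StringBuilder out = new StringBuilder(code.length());
        boolean inAnnotation = false;
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            if (c == '{') {
                inAnnotation = true;
            } else if (c == '}') {
                inAnnotation = false;
            } else if (inAnnotation && Character.isWhitespace(c)) {
                continue; // skip whitespace, but only while inside braces
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Removing the spaces (rather than, say, replacing them with underscores) is just one choice; it keeps the annotation text readable while making it a single token.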

Is there a way to configure UCUMFormatParser to use ucum-essence.xml as a starting point for its symbol map? I looked into using Eclipse UOMo, but it is not as active as this project and it doesn't look like it is up to date on JSR 385 compliance. Please advise.
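To quantify the gap between ucum-essence.xml and a parser's symbols, a sketch like the following could help. It does not configure any parser; it only reads the case-sensitive `Code` attribute of the `<base-unit>` and `<unit>` elements (assumed to follow the essence file's layout) and diffs them against a set of supported codes. The class `EssenceCodes` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Sketch: collect the case-sensitive codes of all <base-unit> and <unit>
// elements, then diff them against the codes a parser is known to support.
final class EssenceCodes {
    static Set<String> unitCodes(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            Set<String> codes = new TreeSet<>();
            for (String tag : new String[] {"base-unit", "unit"}) {
                NodeList nodes = doc.getElementsByTagName(tag);
                for (int i = 0; i < nodes.getLength(); i++) {
                    // getAttribute returns "" when the attribute is absent
                    String code = ((Element) nodes.item(i)).getAttribute("Code");
                    if (!code.isEmpty()) {
                        codes.add(code);
                    }
                }
            }
            return codes;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not read UCUM essence XML", e);
        }
    }

    // Codes present in the essence file but absent from the supported set.
    static Set<String> missing(Set<String> essence, Set<String> supported) {
        Set<String> result = new TreeSet<>(essence);
        result.removeAll(supported);
        return result;
    }
}
```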

Here's a list of the 247 codes that generated exceptions:

%/100{WBC}
%{Negative Control}
/[arb'U]
/[HPF]
/[iU]
/[LPF]
/[HPF]
/[LPF]
/10*10
/10*12
/10*12{rbc}
/10*6
/10*9
/100{cells}
/100{neutrophils}
/100{spermatozoa}
/100{WBC}
/100{WBCs}
/cm[H2O]
[APL'U]
[APL'U]/mL
[arb'U]
[arb'U]/L
[arb'U]/mL
[AU]
[BAU]
[beth'U]
[beth'U]
[CFU]
[CFU]/L
[CFU]/mL
[Ch]
[drp]
[drp]/[HPF]
[drp]/h
[drp]/min
[drp]/mL
[drp]/s
[GPL'U]
[iU]
[IU]/(2.h)
[IU]/(24.h)
[IU]/10*9{RBCs}
[IU]/d
[IU]/dL
[IU]/g
[IU]/g{Hb}
[iU]/g{Hgb}
[IU]/h
[IU]/kg
[IU]/kg/d
[IU]/L
[IU]/min
[IU]/mL
[MPL'U]
[tb'U]
[todd'U]
[todd'U]
{# of calculi}
{# of donor informative markers}
{# of fetuses}
{# of informative markers}
{2 or 3 times}/d
{3 times}/d
{4 times}/d
{5 times}/d
{cells}/[HPF]
{clock time}
U{G}
{P2Y12 Reaction Units}
10*12/L
10*3
10*3.{RBC}
10*3.U
10*3/L
10*3/mL
10*3/uL
10*3{Copies}/mL
10*-3{Polarization'U}
10*5
10*6
10*6.[iU]
10*6.eq/mL
10*6.U
10*6/{Specimen}
10*6/kg
10*6/L
10*6/mL
10*6/mm3
10*6/uL
10*-6{Immunofluorescence'U}
10*8
10*9/L
10*9/mL
10*9/uL
cm[H2O]
cm[H2O]/(s.m)
cm[H2O]/L/s
cm[Hg]
dB
eq
eq/L
eq/mL
eq/mmol
eq/umol
GBq
[iU]
k[IU]/L
k[IU]/mL
kPa
m[iU]
m[IU]/L
m[IU]/mL
meq
meq/(12.h)
meq/(2.h)
meq/(24.h)
meq/(8.h)
meq/(8.h.kg)
meq/(kg.d)
meq/{Specimen}
meq/d
meq/dL
meq/g
meq/g{Cre}
meq/h
meq/kg
meq/kg/h
meq/kg/min
meq/L
meq/m2
meq/min
meq/mL
mg/d/(173.10*-2.m2)
mL/cm[H2O]
mL/min/(173.10*-2.m2)
mm[H2O]
mm[Hg]
mosm
mosm/kg
mosm/L
mPa
ng/10*6
osm/kg
osm/L
U/10*10{cells}
U/10*12
U/10*6
U/10*9
u[IU]
u[IU]/L
u[IU]/mL
ueq
ueq/L
ueq/mL
10*4/uL
[bdsk'U]
cm[H2O]/s/m
{CPM}/10*3{cell}
U/10*10
U/(10.g){feces}
U{25Cel}/L
U{37Cel}/L
U/10*12{RBCs}
{Globules}/[HPF]
g/(8.h){shift}
g/kg/(8.h){shift}
[HPF]
[GPL'U]/mL
[MPL'U]/mL
[in_i'H2O]
[IU]
[IU]/L{37Cel}
[IU]/mg{creat}
[ka'U]
[LPF]
[mclg'U]
meq/g{creat}
meq/{specimen}
meq/{total_volume}
10*6.[CFU]/L
10*6.[IU]
10*6/(24.h)
mPa.s
ng/10*6{RBCs}
nmol/min/10*6{cells}
{#}/[HPF]
{#}/[LPF]
osm
/10*4{RBCs}
/[IU]
/10*3
/10*3.{RBCs}
/10*12{RBCs}
10*3{copies}/mL
10*3{RBCs}
%[slope]
/100{Spermatozoa}
[Amb'a'1'U]
[CCID_50]
[D'ag'U]
[diop]
[dye'U]
[FFU]
[hnsf'U]
[hp_C]
[hp_M]
[hp_Q]
[hp_X]
[in_i'Hg]
[iU]/dL
[iU]/g
[iU]/kg
[iU]/L
[iU]/mL
[knk'U]
[Lf]
[mesh_i]
[MET]
[p'diop]
[PFU]
[PNU]
[S]
[smgy'U]
[smoot]
[TCID_50]
[USP'U]
10*
10^
a_g
a_j
a_t
b
B
B[kW]
B[mV]
B[SPL]
B[uV]
B[V]
B[W]
Bd
bit_s
k[iU]/mL
m[H2O]
m[Hg]
R
REM

Needs #59

@keilw
Member

keilw commented Aug 26, 2020

Thanks for the input and for creating the JUnit test. Is there a chance it could be run here, e.g. in a special Maven profile?
There are a few units we found missing, but those helping us then could not contribute further on it: #59. Does that match the missing symbols or units?
You are right about UOMo UCUM: it is fully functional and supports the latest ucum-essence.xml, but it is currently based on version 1.x of the API and Indriya (JSR 363).
The biggest difference between the UCUM class and the XML file is that the class implements the SystemOfUnits interface and supports the type-safe unit model of JSR 385, while UOMo UCUM implements only the most basic types like Unit, with late binding via a quantity wildcard. The UnitFormat implementations also do that for parsing, but I can't say for sure whether they could be used for UCUM the same way UOMo does.

@keilw keilw added this to To do in Unit Systems via automation Sep 3, 2020
@keilw
Member

keilw commented Sep 4, 2020

In https://ucum.org/ucum.html#chemical the "international unit" appears twice, with variations of both the print symbol and the c/s code. The only way to represent that is via an alias like INTERNATIONAL_UNIT_ALT ("alternate"; happy to hear other name suggestions), because there is no UnitFormat.alias() that would work for a variant. Parsing the c/i variant leads to an ambiguity; AFAIK the first one is picked there. If we should eliminate one, please advise, but ucum-essence seems to contain a few such irregularities (not many, maybe a handful).
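To illustrate the c/i ambiguity, here is a small hypothetical index keyed by case-insensitive code. Assuming both `[iU]` and `[IU]` share the c/i code `[IU]`, a `putIfAbsent`-based registration keeps the first unit and reports the second as a collision, which mirrors the "first one is picked" behavior. `CaseInsensitiveIndex` and `INTERNATIONAL_UNIT` are illustrative names; only INTERNATIONAL_UNIT_ALT comes from this comment.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical c/i lookup: several case-sensitive codes may share one
// case-insensitive code. putIfAbsent keeps the first registration, so later
// ones are reported as collisions instead of silently overwriting it.
final class CaseInsensitiveIndex {
    private final Map<String, String> unitByCiCode = new LinkedHashMap<>();

    // Returns true if registered, false if the c/i code was already taken.
    boolean register(String ciCode, String unitName) {
        return unitByCiCode.putIfAbsent(ciCode.toUpperCase(Locale.ROOT), unitName) == null;
    }

    // Case-insensitive lookup: the input is upper-cased before matching.
    String lookup(String code) {
        return unitByCiCode.get(code.toUpperCase(Locale.ROOT));
    }
}
```

Reporting the collision (rather than overwriting) makes it easy to list exactly which essence entries need an alias.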

@keilw
Member

keilw commented Sep 4, 2020

Btw, how come BAUD is missing? It has been there since 2018.
I also added tests for baud to https://github.com/unitsofmeasurement/uom-systems/blob/master/ucum/src/test/java/systems/uom/ucum/format/UCUMFormatTable4Test.java, so @JohnTimm, could you elaborate on what fails with "Bd"?

@keilw
Member

keilw commented Sep 6, 2020

@JohnTimm I hope you are well, since we haven't heard any feedback for almost a month. A significant number of these are combinations with previously missing units like "eq", but most of them are there now. Could you repeat the test with 2.1-SNAPSHOT of uom-systems?

@alexanderkiel

I also need all units from https://build.fhir.org/valueset-ucum-units.html. I would be happy to contribute with some guidance.

@keilw
Member

keilw commented Oct 26, 2023

@alexanderkiel Is that list from FHIR identical to the 1364+ entries in UCUM?
It seems many of them are not in the latest UCUM files, and a large portion are combined units like "pmol/min", which should be derived from UCUM or other system units, e.g. PICO(MOL).divide(MINUTE).
Others are annotated units created like RED_BLOOD_CELLS = ((AbstractUnit)Units.ONE).annotate("RBC") or Unit<Volume> PERCENT_VOL = ((AbstractUnit)Units.PERCENT).annotate("vol").

There is no system for that, and it does not seem to be part of UCUM, so it is either application-specific or belongs in a domain-specific system under uom-domain; I'd say a module under health sounds appropriate. You'd be more than welcome to contribute if you have time.

@alexanderkiel

Hi @keilw, I'm not an expert in UCUM. I work on a FHIR server written in Clojure/Java and use the systems.uom/systems-ucum and systems.uom/systems-quantity dependencies inside a query engine to represent quantities, so that calculations can make use of some unit conversions.

Both the data and the queries can contain UCUM units from the FHIR UCUM value set I mentioned above. All quantities have to pass a parsing step before I can evaluate queries, so I have a problem whenever I encounter a unit that can't be parsed.

Although it would be good to support as many units as possible in the future, maybe you have a recommendation on how to deal with unknown units. Is there a hook where I can just return, for example, an annotated dimensionless unit for unknown units?
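Until such a hook exists, one possible workaround is to wrap the strict parser outside the library and delegate unknown codes to a fallback. `LenientUnitParser` is hypothetical and generic over the application's unit type, so no uom-systems API is assumed; in practice the fallback might build an annotated dimensionless unit along the lines of the `((AbstractUnit)Units.ONE).annotate(...)` example mentioned earlier in this thread.

```java
import java.util.function.Function;

// Hypothetical wrapper: try the strict parser first; if it throws, hand the
// code to a caller-supplied fallback (e.g. an annotated dimensionless unit).
final class LenientUnitParser<U> {
    private final Function<String, U> strict;
    private final Function<String, U> fallback;

    LenientUnitParser(Function<String, U> strict, Function<String, U> fallback) {
        this.strict = strict;
        this.fallback = fallback;
    }

    U parse(String code) {
        try {
            return strict.apply(code);
        } catch (RuntimeException e) {
            // Unknown or unsupported code: use the fallback instead of failing
            // the whole query. (Widen the catch if the parser can throw an Error
            // subclass for lexical errors.)
            return fallback.apply(code);
        }
    }
}
```

Fallback units produced this way are dimensionless, so converting between them and real units would be meaningless; a query engine may prefer to flag such quantities rather than convert them silently.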

3 participants