The Hawkins-Young Uncertainty Method #240
Replies: 4 comments
-
Thanks for laying this out so clearly. I don't know your implementation plans here in eLCI, but it could be valuable to have this function available more broadly within esupy if you are open to considering it.
-
I'm open to it!
-
The one suggestion I would make would be to allow a single data point to be evaluated. Currently, it would fail, and there are no checks to prevent it (that's another point). For sensitivity analysis using Monte Carlo and log-normal error distributions, it might be nice to provide a value of
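For example, a possible input guard along these lines (illustrative only; `safe_hawkins_young` is not part of the current code):

```python
import numpy as np

def safe_hawkins_young(data, ef, alpha):
    """Illustrative wrapper: flag degenerate inputs (e.g., a single data
    point or non-finite values) instead of failing mid-calculation."""
    data = np.asarray(data, dtype=float)
    if data.size < 2 or not np.isfinite(data).all():
        # Return an error-flagged result with the same keys as hawkins_young.
        return {'mu': np.nan, 'sigma': np.nan, 'mu_g': np.nan,
                'sigma_g': np.nan, 'ci_%': np.nan, 'error': True}
    return hawkins_young(data, ef, alpha)
```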
-
After testing the 2020 baseline inventory in eLCI, I've made a few updates/improvements to the main function:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import t, uniform


def hawkins_young(data, ef, alpha):
    """Model a log-normal uncertainty distribution to a dataset.

    The (geometric) mean and (geometric) standard deviation are fitted to an
    assumed distribution that has an expected value of the emission factor,
    `ef`, and the 95th percentile at the 90% confidence upper interval.

    This modeled log-normal distribution is for use with Monte-Carlo
    simulations to guarantee non-negative emission values with an expected
    value that matches a given emission factor.

    This method does not fit a log-normal distribution to the given data!

    Parameters
    ----------
    data : numpy.array
        A data array.
    ef : float
        The emission factor, the expected value.
    alpha : float
        The confidence level, expressed as a fraction
        (e.g., 90% confidence = 0.9).

    Returns
    -------
    dict
        A dictionary of results. Keys include the following.

        - 'mu' (float): The mean of the normally distributed values of
          Y = log(X).
        - 'sigma' (float): The standard deviation of the normally
          distributed values of Y = log(X).
        - 'mu_g' (float): The geometric mean for the log-normal distribution.
        - 'sigma_g' (float): The geometric standard deviation for the
          log-normal distribution.
        - 'ci_%' (float): The upper end of the 90% confidence interval for X
          (expressed as a percentage).
        - 'error' (bool): Whether the method failed (e.g., too few data
          points, ciu <= -1, ef < 0). To be used to quality check results.
    """
    # From Young et al. (2019) <https://doi.org/10.1021/acs.est.8b05572>,
    # the prediction interval is expressed as the percentage of the expected
    # release factor; Eq. 3 expresses it as:
    #   P = s * sqrt(1 + 1/n) * z / y_hat
    # where:
    #   s is the standard error of the expected value, SEM
    #   n is the sample size
    #   z is the critical value for 90% confidence
    #   y_hat is the expected value
    # Note that there is no assumed log-normal distribution here.
    # HOTFIX nans in z and ciu calcs [2024-05-14; TWD]
    is_error = True
    n = len(data)
    z = 0.0
    if n > 1:
        is_error = False
        z = t.ppf(q=alpha, df=n-1)
    se = np.std(data)/np.sqrt(n)
    y_hat = data.mean()
    ciu = 0.0
    if y_hat != 0:
        ciu = se*np.sqrt(1 + 1/n)*z/y_hat
    if ciu <= -1:
        is_error = True
        ciu = -9.999999e-1  # makes log(0.0000001) in hawkins_young_sigma

    # Use least-squares fitting for the quadratic.
    # NOTE: remember, we are fitting sigma, the standard deviation of the
    #       underlying normal distribution. A 'safe' assumption is to
    #       expect sigma to be between 1 and 5. So run a few fits and
    #       keep the smallest positive root.
    #       Alternatively, we could take std(ddof=1) of the log of the data
    #       to get an estimate of the standard deviation and search across
    #       4x's of it. See snippet code for method:
    #       `s_std = np.round(4*np.log(data).std(ddof=1), 0)`
    all_ans = []
    for i in uniform.rvs(0, 6, size=10):
        ans = least_squares(
            hawkins_young_sigma, i, kwargs={'alpha': alpha, 'ciu': ciu})
        all_ans.append(ans['x'][0])

    # Find the minimum positive root:
    all_ans = np.array(all_ans)
    sigma = all_ans[np.where(all_ans > 0)].min()

    if ef < 0:
        is_error = True
        mu = np.nan
    else:
        mu = np.log(ef) - 0.5*sigma**2
    mu_g = np.exp(mu)
    sigma_g = np.exp(sigma)

    return {
        'mu': mu,
        'sigma': sigma,
        'mu_g': mu_g,
        'sigma_g': sigma_g,
        'ci_%': ciu*100,
        'error': is_error,
    }
```
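Note that `hawkins_young_sigma` isn't shown above. Based on the derivation in the next comment, a minimal sketch of the residual it would need to return (the exact form here is my assumption, not the verbatim implementation):

```python
import numpy as np
from scipy.special import erf

def hawkins_young_sigma(sigma, alpha, ciu):
    """Residual for the sigma fit (assumed form).

    For a log-normal X with mu = ln(EF) - 0.5*sigma**2, requiring the CDF
    at EF*(1 + ciu) to equal (1 + alpha)/2 (i.e., 0.95 for alpha = 0.9) is
    equivalent to:

        erf((ln(1 + ciu) + 0.5*sigma**2) / (sigma*sqrt(2))) = alpha

    so the residual driven to zero is alpha minus the left-hand side.
    """
    return alpha - erf(
        (np.log(1.0 + ciu) + 0.5*sigma**2) / (sigma*np.sqrt(2.0)))
```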
-
It took a while, but I think I am finally coming to grips with the uncertainty calculation used in eLCI's `generation.py`. Most of the origins were lost in the evolution of the `_geometric_mean` and `_calc_geom_std` helpers found in the `aggregate_data` function. This seems reasonable having waded through all the approximation methods, value substitutions, and distribution assumptions that serve as the background.

First, I had to dispense with the error-function approximation methods that were used at the heart of this method.
As a reminder, the cumulative distribution function, $D(x)$, of a log-normal distribution is given by:

$$D(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{\ln x - \mu}{\sigma\sqrt{2}}\right)\right]$$
and let $x=EF\times(1+CIU)$ and $\mu=\ln(EF)-0.5\sigma^2$, where $EF$ is the emission factor and $CIU$ is the 90% confidence interval expressed as a fraction (or percentage), such that $EF\times(1+CIU)$ gives you the upper limit value. By substitution, you get:

$$D(x) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{\ln(1+CIU) + 0.5\sigma^2}{\sigma\sqrt{2}}\right)\right]$$
So we are looking for the $x$ that gives us an error-function value of 0.9 (with $D(x)=0.95$, the error-function term must equal $2\times 0.95 - 1 = 0.9$).
Here are the two approximate methods from Abramowitz & Stegun's equations, which have been set up to be minimized against a value of 0.9 (hence the `return 0.9 - r`). Just using scipy's least-squares method, we can find the $x$ value associated with 0.9.
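A sketch of the two approximations and the least-squares call, assuming the standard A&S 7.1.25 and 7.1.26 rational approximations (the specific equations and function names are my assumption):

```python
import numpy as np
from scipy.optimize import least_squares

def erf_as_7_1_25(x):
    # Abramowitz & Stegun eq. 7.1.25: three-term rational approximation.
    p, a1, a2, a3 = 0.47047, 0.3480242, -0.0958798, 0.7478556
    u = 1.0/(1.0 + p*x)
    r = 1.0 - (a1*u + a2*u**2 + a3*u**3)*np.exp(-x**2)
    return 0.9 - r  # residual against the target erf value

def erf_as_7_1_26(x):
    # Abramowitz & Stegun eq. 7.1.26: five-term rational approximation.
    p = 0.3275911
    a = [0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429]
    u = 1.0/(1.0 + p*x)
    r = 1.0 - sum(a_i*u**(i+1) for i, a_i in enumerate(a))*np.exp(-x**2)
    return 0.9 - r

# Find the x where each approximation of erf(x) equals 0.9:
for fun in (erf_as_7_1_25, erf_as_7_1_26):
    ans = least_squares(fun, 1.0)
    print(fun.__name__, ans['x'][0])
```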
And we see the output: both approximations converge on $x \approx 1.1631$, in agreement with `scipy.special.erfinv(0.9)`.
Okay, we can use scipy's special inverse error function, `erfinv`, in place of the approximations.

Let's look at example data, which has been purposefully log-normally oriented, but doesn't have to be:
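For instance, a stand-in for such data (the distribution parameters are arbitrary):

```python
import numpy as np

# Arbitrary example: 25 log-normally distributed "emission" values.
rng = np.random.default_rng(42)
data = rng.lognormal(mean=0.5, sigma=1.2, size=25)
```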
In eLCI, the emission factor tends to be the regional sum, and our confidence level is 90% (inexcusably labeled here as `alpha`, even though in all the textbooks confidence = 1 − alpha). The goal was to assign log-normal distribution values for the geometric mean and geometric standard deviation such that the expected value is the emission factor and the 95th percentile sits at the 90% CIU.
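A sketch of the call and the two checks described next, using the example `data` from above (variable names are illustrative):

```python
import numpy as np
from scipy.stats import lognorm

ef = data.sum()  # in eLCI, the regional sum serves as the emission factor
alpha = 0.9      # 90% confidence level

res = hawkins_young(data, ef, alpha)

# Expected value of the modeled log-normal, exp(mu + 0.5*sigma^2), vs. ef:
print(np.exp(res['mu'] + 0.5*res['sigma']**2), ef)

# CDF at EF*(1 + CIU) should be 0.95:
ciu = res['ci_%']/100
print(lognorm.cdf(ef*(1 + ciu), s=res['sigma'], scale=np.exp(res['mu'])))
```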
In the output, we see that the expected value of the modeled error distribution matches the emission factor and the CDF at $EF\times(1+CIU)$ is 0.95!
If you're feeling it, you can check these values against their fitted alternatives (this assumes the data are log-normal, which is not the assumption held within the Hawkins-Young method).
Start with a handy function that computes geometric mean and geometric standard deviation from a given dataset.
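A minimal version of such a function (the name `geo_mean_std` is illustrative; eLCI's own helpers are `_geometric_mean` and `_calc_geom_std`):

```python
import numpy as np

def geo_mean_std(data, ddof=1):
    """Geometric mean and geometric standard deviation of a dataset,
    computed from the mean and standard deviation of log(data)."""
    log_data = np.log(data)
    gm = np.exp(log_data.mean())
    gsd = np.exp(log_data.std(ddof=ddof))
    return gm, gsd
```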
Then get the results. In my case, the fitted geometric mean and geometric standard deviation differed from the Hawkins-Young values; I expect them to be different and they are. The assumptions behind the Hawkins-Young method are now laid out, and it serves its purpose. Please feel free to comment.