Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple bugfixes for CARQ correction of OFCL track #68

Merged
merged 10 commits into from
Mar 24, 2023

Conversation

SorooshMani-NOAA
Copy link
Collaborator

Fixes #64, #65, #66, #67

Comment on lines -1022 to -1032
forecast.loc[
mslp_missing, "central_pressure"
] = relation.central_pressure(
max_sustained_wind_speed=forecast.loc[
mslp_missing, "max_sustained_wind_speed"
],
background_pressure=forecast.loc[
mslp_missing, "background_pressure"
],
holland_b=holland_b,
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to the fixes in this pull, this update forecast data frame was not stored anywhere and the corrections were discarded. #66

holland_b=holland_b,
)

corr_ofcl_tracks[initial_time] = forecast
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the correction is stored in the corr_ofcl_tracks which replaces the values in the original forecast track. #66

holland_b=holland_b,
)
if all(adv in tracks for adv in ["OFCL", "CARQ"]):
tracks = correct_ofcl_based_on_carq_n_hollandb(tracks)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for CARQ correction is now moved to a standalone function

)
if all(adv in tracks for adv in ["OFCL", "CARQ"]):
tracks = correct_ofcl_based_on_carq_n_hollandb(tracks)
dataframe = combine_tracks(tracks)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The corrected tracks are now converted back into a dataframe which replaces the original one #66

Comment on lines -1007 to -1009
mrd_missing = pandas.isna(forecast["radius_of_maximum_winds"])
mslp_missing = pandas.isna(forecast["central_pressure"])
radp_missing = pandas.isna(forecast["background_pressure"])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original values for these variables from the ATCF file were not empty, but they were 0 instead. So, the correction logic would ignore them and didn't fix the incorrect 0 value. #64

Comment on lines +1205 to +1210
columns_of_interest[columns_of_interest == 0] = pandas.NA
missing = columns_of_interest.isna()
# Order of columns is the same as columns_of_interest
mrd_missing = missing.iloc[:, 0]
mslp_missing = missing.iloc[:, 1]
radp_missing = missing.iloc[:, 2]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values are now checked for both NA as well as 0 to indicate the missing data. #64

@@ -1175,6 +1129,7 @@ def separate_tracks(data: DataFrame) -> Dict[str, Dict[str, DataFrame]]:
tracks = {}
for advisory in pandas.unique(data["advisory"]):
advisory_data = data[data["advisory"] == advisory]
advisory_data["forecast_hours"] = advisory_data.forecast_hours.astype(int)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The forecast_hours data is used for sorting the dataframe. Because the data was originally of string type, the sort used to happen at character level (i.e. first 1s and then 2s, etc. so that 120 would come before 24). Now this issue is fixed by using int type #67

holland_b = numpy.nanmean(holland_b)

# Get CARQ from forecast hour 0 and isotach 34kt (i.e. the first item)
carq_ref = carq_forecast.loc[carq_forecast.forecast_hours == 0].iloc[0]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CARQ track entry to be used is now the first entry of hour-0 forecast (i.e. 34 kt) #65

# fill OFCL maximum wind radius with the first entry from the CARQ advisory
forecast.loc[
mrd_missing, "radius_of_maximum_winds"
] = carq_forecast["radius_of_maximum_winds"].iloc[0]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first CARQ track could be a -24 hour "forecast", which has missing values for variables of interest. iloc[0] just gets the first entry of carq_forecast dataframe instead of getting the first hour-0 entry

Comment on lines +277 to +289
def test_carq_autofix_ofcl():
track = VortexTrack.from_storm_name(
"Florence", 2018, advisories=["OFCL"], file_deck="a"
)

variables_of_interest = [
"central_pressure",
"background_pressure",
"radius_of_maximum_winds",
]

assert not (track.data[variables_of_interest] == 0).any(axis=None)
assert not (track.data[variables_of_interest].isna()).any(axis=None)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for CARQ correction of OFCL functionality

@codecov
Copy link

codecov bot commented Mar 15, 2023

Codecov Report

Merging #68 (e6b777d) into main (44a5fa4) will increase coverage by 0.11%.
The diff coverage is 97.67%.

@@            Coverage Diff             @@
##             main      #68      +/-   ##
==========================================
+ Coverage   90.89%   91.01%   +0.11%     
==========================================
  Files          18       18              
  Lines        1857     1880      +23     
==========================================
+ Hits         1688     1711      +23     
  Misses        169      169              
Impacted Files Coverage Δ
stormevents/nhc/track.py 92.57% <97.36%> (+0.26%) ⬆️
tests/test_nhc.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@@ -184,7 +184,7 @@ def test_vortex_track_distances():
track_1.distances["BEST"]["20180830T060000"], 8725961.838567913
)
assert numpy.isclose(
track_2.distances["OFCL"]["20180831T000000"], 8882602.389540724
track_2.distances["OFCL"]["20180831T000000"], 3499027.5307995058
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old value was wrong as the order of track points was incorrect due to #67. It used to be:

[(-20.2, 13.1), (-22.3, 13.7), (-44.0, 21.5), (-47.0, 24.0), (-48.5, 27.0), (-24.9, 14.5), (-20.9, 13.2), (-27.7, 15.3), (-30.4, 16.0), (-35.5, 17.5), (-40.0, 19.0)]

while it should be

[(-20.2, 13.1), (-20.9, 13.2), (-22.3, 13.7), (-24.9, 14.5), (-27.7, 15.3), (-30.4, 16.0), (-35.5, 17.5), (-40.0, 19.0), (-44.0, 21.5), (-47.0, 24.0), (-48.5, 27.0)]

This wrong order results in the large difference between the distance values prior to the fix of #67

@SorooshMani-NOAA
Copy link
Collaborator Author

Ideally I'd like to merge this by mid-week next week. I'd appreciate it if you could please review it by then. Especially since you're more familiar with the code @zacharyburnett and @WPringle

@SorooshMani-NOAA SorooshMani-NOAA self-assigned this Mar 15, 2023
zacharyburnett
zacharyburnett previously approved these changes Mar 16, 2023
Copy link
Collaborator

@zacharyburnett zacharyburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, good catch for the issues

@WPringle
Copy link
Contributor

@SorooshMani-NOAA What is the function call you were using for this work?

@SorooshMani-NOAA
Copy link
Collaborator Author

looks good to me, good catch for the issues

Thank you for your quick response @zacharyburnett

@SorooshMani-NOAA
Copy link
Collaborator Author

@SorooshMani-NOAA What is the function call you were using for this work?

@WPringle do mean for the correction?
The logic used to be embedded in the the function that was returning the data. The first time that track data is accessed, it is corrected (if track was not created by passing a dataframe).

def unfiltered_data(self, dataframe: DataFrame):
# fill missing values of MRD and MSLP in the OFCL advisory
if "OFCL" in self.advisories:
tracks = separate_tracks(dataframe)
if "OFCL" in tracks:
ofcl_tracks = tracks["OFCL"]
carq_tracks = tracks["CARQ"]

Now I moved all of that logic to a function at the end of the track.py file:

def correct_ofcl_based_on_carq_n_hollandb(
tracks: Dict[str, Dict[str, DataFrame]]
) -> Dict[str, Dict[str, DataFrame]]:

Does that answer your question?

@WPringle
Copy link
Contributor

@SorooshMani-NOAA I mean what is the example? Like the command from CLI or main python script? So I can repeat and check on my system and try with other advisories too.

@SorooshMani-NOAA
Copy link
Collaborator Author

SorooshMani-NOAA commented Mar 16, 2023

@SorooshMani-NOAA I mean what is the example? Like the command from CLI or main python script? So I can repeat and check on my system and try with other advisories too.

stormevents doesn't have a CLI interface right now. I would install the package and test it in Python. Since the correction is automatically applied to the data, you can just do the following:

import stormevents
my_track = stormevents.nhc.VortexTrack('al062018', file_deck='a')
my_track.to_file("path/to/output/file.dat")

and then look at OFCL entries in a text editor. Or you can just do

my_dataframe = my_track.data

and inspect the entries of the my_dataframe in Python.

It only automatically applies this on OFCL, if you want to apply the same logic on other tracks (to be fixed by CARQ or by other tracks), I need to rename and modify the correct_ofcl_based_on_carq_n_hollandb function a little bit to accept advisory as argument.

Please let me know if this is useful to add or not, thanks!

Update
Note that you can also create a new storm track by name and year:

my_track = stormevents.nhc.VortexTrack.from_storm_name('Florence', 2018, file_deck='a')

@WPringle
Copy link
Contributor

WPringle commented Mar 23, 2023

@SorooshMani-NOAA I think there is something going wrong with getting pressure from CARQ.
Just looking at the 2018091218 forecast as example which I had done before, it is putting min pressure at 883 hPa! I don't know where this came from but I see the correct value from CARQ at 948 hPa being put in the background pressure column.
The background pressure should 1013 from the CARQ as well.

                                               Vmax,  Pmin                                        Pb
AL, 06, 2018091218, 01, CARQ,   0, 304N,  719W,  110,  948, HU,  34, NEQ,  170,  140,  100,  140, 1013, 
AL, 06, 2018091218, 03, OFCL,   0, 304N,  719W,  110,  883, HU,  34, NEQ,  170,  140,  100,  140,  948,

@WPringle
Copy link
Contributor

@SorooshMani-NOAA The Rmax seems to be correct.

@WPringle
Copy link
Contributor

WPringle commented Mar 23, 2023

@SorooshMani-NOAA How is the start_date and end_date being used with this? When I put start_date of '2018091218' it still gives all the data before it.

@WPringle
Copy link
Contributor

WPringle commented Mar 23, 2023

@SorooshMani-NOAA I think we are very close to be able to get single advisory, all you need to do is use start_date to filter out only those dates. Currently start_date and end_dates are doing nothing..

Using the following I could get the file with only OFCL inside correctly:
track = stormevents.nhc.VortexTrack( storm='al062018', start_date='2018091218', end_date=None, advisories='OFCL', file_deck='a', )

@WPringle
Copy link
Contributor

WPringle commented Mar 23, 2023

@SorooshMani-NOAA Then we may add another capability in another PR for a variable called hindcast_days or something that you specify how many days before the start_date to add on using the previous CARQ 0-hr values.

Or we could have a variable called forecast_hours than can be [backward_hours, forward_hours] or just forward_hours, so by default forecast_hours is [0,120] for 120 hour forecast but can be change to be like [-120,120] to get the five days before and five days forecast, or [-48,72], to have two days hindcast, 3-day forecast etc

@SorooshMani-NOAA
Copy link
Collaborator Author

@WPringle Thanks for testing

it is putting min pressure at 883 hPa! I don't know where this came from but I see the correct value from CARQ at 948 hPa being put in the background pressure column. The background pressure should 1013 from the CARQ as well.

This is the logic:

relation = HollandBRelation()
holland_b = relation.holland_b(
max_sustained_wind_speed=carq_forecast["max_sustained_wind_speed"],
background_pressure=carq_forecast["background_pressure"],
central_pressure=carq_forecast["central_pressure"],
)
holland_b[holland_b == numpy.inf] = numpy.nan
holland_b = numpy.nanmean(holland_b)
# Get CARQ from forecast hour 0 and isotach 34kt (i.e. the first item)
carq_ref = carq_forecast.loc[carq_forecast.forecast_hours == 0].iloc[0]
columns_of_interest = forecast[
["radius_of_maximum_winds", "central_pressure", "background_pressure"]
]
columns_of_interest[columns_of_interest == 0] = pandas.NA
missing = columns_of_interest.isna()
# Order of columns is the same as columns_of_interest
mrd_missing = missing.iloc[:, 0]
mslp_missing = missing.iloc[:, 1]
radp_missing = missing.iloc[:, 2]
# fill OFCL maximum wind radius with the first entry from the CARQ advisory
forecast.loc[mrd_missing, "radius_of_maximum_winds"] = carq_ref[
"radius_of_maximum_winds"
]
# fill OFCL background pressure with the first entry from the CARQ advisory central pressure (at sea level)
forecast.loc[radp_missing, "background_pressure"] = carq_ref["central_pressure"]
# fill OFCL central pressure (at sea level) with the 3rd hour entry, preserving Holland B
forecast.loc[mslp_missing, "central_pressure"] = relation.central_pressure(
max_sustained_wind_speed=forecast.loc[
mslp_missing, "max_sustained_wind_speed"
],
background_pressure=forecast.loc[mslp_missing, "background_pressure"],
holland_b=holland_b,
)

So the radius of max wind is picked up from the CARQ directly:

forecast.loc[mrd_missing, "radius_of_maximum_winds"] = carq_ref[
"radius_of_maximum_winds"
]

but forecast's background_pressure is set equal to CARQ's central_pressure

forecast.loc[radp_missing, "background_pressure"] = carq_ref["central_pressure"]

and the forecast's central_pressure is calculated based on HollandB relation and forecast background_pressure and max_sustained_wind_speed:

forecast.loc[mslp_missing, "central_pressure"] = relation.central_pressure(
max_sustained_wind_speed=forecast.loc[
mslp_missing, "max_sustained_wind_speed"
],
background_pressure=forecast.loc[mslp_missing, "background_pressure"],
holland_b=holland_b,
)

I assumed this correction was reviewed between you and @zacharyburnett so I didn't touch it. Please let me know if it needs to change.

@WPringle
Copy link
Contributor

@SorooshMani-NOAA I can go in and make the changes in the code for the correct pressures if you think that's OK?

@SorooshMani-NOAA
Copy link
Collaborator Author

Then we may add another capability in another PR for a variable ...

I'm not sure if I fully understand what you mean, may be it will be covered by this: #70

If it does, please add the description of your use-case in that ticket, if not please either create a new Issue, or let me know so that I can create a new one and copy your description from above.

We can then discuss the details on that separate ticket.

@SorooshMani-NOAA
Copy link
Collaborator Author

@SorooshMani-NOAA I can go in and make the changes in the code for the correct pressures if you think that's OK?

@WPringle sure, please feel free to do so, then please push it to the same branch or create a new branch and I'll merge.

@SorooshMani-NOAA
Copy link
Collaborator Author

SorooshMani-NOAA commented Mar 23, 2023

... Currently start_date and end_dates are doing nothing..

I think all they do is to filter track entries returned by .data based on the datetime (not track_start_time). I will create a new ticket for this one as well, but this one would be outside the scope of this pull request.

Let's discuss the details of the expected behavior in the new ticket.

Updated
The new ticket is #71

@WPringle
Copy link
Contributor

WPringle commented Mar 23, 2023

... Currently start_date and end_dates are doing nothing..

I think all they do is to filter track entries returned by .data based on the datetime (not track_start_time). I will create a new ticket for this one as well, but this one would be outside the scope of this pull request.

Let's discuss the details of the expected behavior in the new ticket.

@SorooshMani-NOAA yeah but I still don't understand why entries that have datetimes before the start_date are output to file.

@SorooshMani-NOAA
Copy link
Collaborator Author

but I still don't understand why entries that have datetimes before the start_date are output to file.

@WPringle let me test on my side to check what I see. In any case, let's continue the discussion about start_time on #71. Thanks!

WPringle
WPringle previously approved these changes Mar 23, 2023
Copy link
Contributor

@WPringle WPringle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's working correctly to replace OFCL central pressure, background pressure, and Rmax from the 0-hr CARQ. So can merge this PR and work on the datetime issue in PR #70

@SorooshMani-NOAA SorooshMani-NOAA merged commit a3e9c47 into main Mar 24, 2023
@SorooshMani-NOAA SorooshMani-NOAA deleted the bugfix/carq_fix branch March 24, 2023 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants