SHINY: support migration from Chandra.cmd_states / Ska.ParseCM to kadi #30

Merged 9 commits on Aug 14, 2020


taldcroft
Contributor

@taldcroft taldcroft commented Jun 30, 2020

Description

This makes necessary changes to migrate from using Chandra.cmd_states and Ska.ParseCM to just using kadi.commands and kadi.commands.states.

This is a WIP and requires the ska3 "shiny" distribution (https://github.com/sot/skare3/wiki/Shiny-Ska3). At this moment the shiny distribution is not available for installation and testing, so this PR is just to let ACIS know I'm working on this. After taking a look at the original code I realized that it would probably be a tough job for someone else to do this migration, and in fact I ended up making a number of substantive changes to kadi to support this.

Of course this will need comprehensive testing, which is best done by ACIS.

Apologies for all the whitespace changes. VS Code just does this for me (extraneous whitespace is not allowed by astropy style), but it is easy enough to change the GitHub diff settings to ignore whitespace (there is a settings button on the file diffs page; google it if you can't find it).

This is paired with a PR on backstop_history, to be submitted shortly.

In addition, this fixes what appears to be a bug in calculating the MD5 sum for a thermal model spec: it seems to be computing the MD5 sum of the file name string, not of the file contents.
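The difference between hashing the name and hashing the contents is easy to see with the standard-library `hashlib`. A minimal sketch; `md5_of_contents` and `md5_of_name` are hypothetical names, not the functions in acis_thermal_check:

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_contents(path, chunk_size=65536):
    """Return the hex MD5 digest of the bytes stored in the file at ``path``."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so a large file need not fit in memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def md5_of_name(path):
    """The suspected bug: hashes the file-name string, not the file contents."""
    return hashlib.md5(str(path).encode("utf-8")).hexdigest()


# Editing a spec file changes its content hash but not its name hash
with tempfile.TemporaryDirectory() as tmpdir:
    spec = Path(tmpdir) / "dpa_model_spec.json"
    spec.write_text('{"comment": "v1"}')
    v1 = (md5_of_contents(spec), md5_of_name(spec))
    spec.write_text('{"comment": "v2"}')
    v2 = (md5_of_contents(spec), md5_of_name(spec))
```

Only the content hash changes when the spec file is edited, which is presumably what the "model_spec file MD5sum" line in the run header is meant to track.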

Testing

  • [N/A] Passes unit tests (no unit tests)
  • Functional testing (described below)

Functional testing

This is not intended to be comprehensive but just a check that this PR is on the right track.

Using "sql" state builder

On my Mac using the ska3-shiny distribution:

(ska3-shiny) ➜  acis_thermal_check git:(shiny) dpa_check \
   --outdir=out_shiny_shiny \
   --oflsdir=${SKA}/data/ska_testr/test_loads/2019/MAY2019/oflsa \
   --nlet_file=${SKA}/data/ska_testr/test_loads/2019/MAY2019/oflsa/NonLoadTrackedEvents.txt \
   --state-builder=sql \
   --run-start=2019:135:12:00:00
#####################################################################
# dpa_check (version 3.0.0) run at Tue Jun 30 12:19:03 2020 by aldcroft
# acis_thermal_check version = 3.1.1.dev5+g8bdd907
# model_spec file = /Users/aldcroft/miniconda3/envs/ska3-shiny/lib/python3.6/site-packages/dpa_check/dpa_model_spec.json
# model_spec file MD5sum = 014a9257120258350c148489799f7da6
#####################################################################

Command line options:
{'T_init': None,
 'backstop_file': '/Users/aldcroft/ska/data/ska_testr/test_loads/2019/MAY2019/oflsa',
 'days': 21.0,
 'interrupt': False,
 'model_spec': '/Users/aldcroft/miniconda3/envs/ska3-shiny/lib/python3.6/site-packages/dpa_check/dpa_model_spec.json',
 'nlet_file': '/Users/aldcroft/ska/data/ska_testr/test_loads/2019/MAY2019/oflsa/NonLoadTrackedEvents.txt',
 'oflsdir': '/Users/aldcroft/ska/data/ska_testr/test_loads/2019/MAY2019/oflsa',
 'outdir': 'out_shiny_shiny',
 'pred_only': False,
 'run_start': '2019:135:12:00:00',
 'state_builder': 'sql',
 'traceback': True,
 'verbose': 1,
 'version': False}

ACISThermalCheck is using the 'sql' state builder.
Fetching telemetry between 2019:114:12:00:00.000 and 2019:135:12:00:00.000
Calculating DPA thermal model
RLTT = 2019:140:01:00:00.000
sched_stop = 2019:146:23:52:35.589
Fetching msid: dp_dpa_power over 2019:135:11:09:34.816 to 2019:147:00:13:42.816
Making temperature prediction plots
Writing plot file out_shiny_shiny/1dpamzt.png
Writing plot file out_shiny_shiny/pow_sim.png
Writing plot file out_shiny_shiny/roll.png
Checking for limit violations
Writing states to out_shiny_shiny/states.dat
Writing temperatures to out_shiny_shiny/temperatures.dat
Getting commanded states between 2019:114:12:02:38.816 - 2019:135:11:58:46.816
Calculating DPA thermal model for validation
Fetching msid: 1dpamzt over 2019:114:11:40:46.816 to 2019:135:12:15:10.816
Fetching msid: dp_dpa_power over 2019:114:11:40:46.816 to 2019:135:12:15:10.816
Making DPA model validation plots and quantile table
Writing plot file out_shiny_shiny/1dpamzt_valid.png
Writing plot file out_shiny_shiny/1dpamzt_valid_hist.png
Writing plot file out_shiny_shiny/pitch_valid.png
Writing plot file out_shiny_shiny/pitch_valid_hist.png
Writing plot file out_shiny_shiny/tscpos_valid.png
Writing plot file out_shiny_shiny/tscpos_valid_hist.png
Writing plot file out_shiny_shiny/roll_valid.png
Writing plot file out_shiny_shiny/roll_valid_hist.png
Writing plot file out_shiny_shiny/ccd_count_valid.png
Writing quantile table out_shiny_shiny/validation_quant.csv
Writing validation data out_shiny_shiny/validation_data.pkl
Checking for validation violations
WARNING: PITCH 1% quantile value of -3.107 exceeds limit of 3.00
WARNING: PITCH 99% quantile value of 4.022 exceeds limit of 3.00
validation warning(s) in output at out_shiny_shiny
Writing report file out_shiny_shiny/index.rst

The output states are the same except for the first one: kadi states always start at exactly the requested start time, not at the start of the state containing that time.

(ska3-shiny) ➜  acis_thermal_check git:(shiny) diff out_shiny_shiny/states.dat ~/tmp/dpa/out_flight/states.dat 
2c2
< 5     1       2019:135:11:36:54.816   2019:135:12:18:50.816   66.40351472572986       ENAB    5       RETR    RETR    20552 NPNT     94.59   XTZ0000005      0.391128545     -0.739199939    0.237963076     0.493938747     239.01337324562377      175.04386642278533     TE_00458        -536    92904           674307484.00    674310000.00    1
---
> 5     1       2019:135:09:32:10.816   2019:135:12:18:50.816   66.40351472572986       ENAB    5       RETR    RETR    20552 NPNT     94.59   XTZ0000005      0.391128545     -0.739199939    0.237963076     0.493938747     239.01337324562377      175.04386642278533     TE_00458        -536    92904   pitch   674300000.00    674310000.00    1
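The only difference in the first state is its start time. A minimal sketch of the clipping behaviour described above, assuming a state is just a (tstart, tstop) interval in CXC seconds; `State` and `clip_to_start` are hypothetical and not kadi's implementation:

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class State:
    """Hypothetical minimal state: an interval in CXC seconds."""
    tstart: float
    tstop: float


def clip_to_start(states: List[State], start: float) -> List[State]:
    """Drop states that end at or before ``start`` and clip the first remaining
    state so that it begins exactly at ``start``."""
    kept = [s for s in states if s.tstop > start]
    if kept and kept[0].tstart < start:
        kept[0] = replace(kept[0], tstart=start)
    return kept


# tstart/tstop of the first state in the flight output in the diff above
flight_first = State(tstart=674300000.00, tstop=674310000.00)
shiny_first = clip_to_start([flight_first], start=674307484.00)[0]
```

The clipped state keeps the original stop time, so every later state is unaffected, which matches the diff showing only the first line changing.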

The output predicted temperatures are exactly the same (when rounded to 0.01 degC) except for a startup transient, which I believe is related to the flight version starting about 2 hours earlier (09:32 instead of 11:36).

Using "ACIS" state builder

Run on my Mac. This uses an sshfs mount to see the appropriate files on the HEAD LAN.

(ska3-shiny) ➜  acis_thermal_check git:(shiny) dpa_check \                                                    
   --outdir=headout \
   --oflsdir=/Users/aldcroft/kady/ska/data/acis/LoadReviews/2018/MAY2818/oflsa \
   --run-start=2018:142
GET_BS_CMDS - Using backstop file /Users/aldcroft/kady/ska/data/acis/LoadReviews/2018/MAY2818/oflsa/CR147_2306.backstop
GET_BS_CMDS - Found 1230 backstop commands between 2018:147:23:25:00.000 and 2018:154:23:29:21.284
#####################################################################
# dpa_check (version 3.0.0) run at Tue Jun 30 12:25:10 2020 by aldcroft
# acis_thermal_check version = 3.1.1.dev5+g8bdd907
# model_spec file = /Users/aldcroft/miniconda3/envs/ska3-shiny/lib/python3.6/site-packages/dpa_check/dpa_model_spec.json
# model_spec file MD5sum = 014a9257120258350c148489799f7da6
#####################################################################

Command line options:
{'T_init': None,
 'backstop_file': '/Users/aldcroft/kady/ska/data/acis/LoadReviews/2018/MAY2818/oflsa',
 'days': 21.0,
 'interrupt': False,
 'model_spec': '/Users/aldcroft/miniconda3/envs/ska3-shiny/lib/python3.6/site-packages/dpa_check/dpa_model_spec.json',
 'nlet_file': '/data/acis/LoadReviews/NonLoadTrackedEvents.txt',
 'oflsdir': '/Users/aldcroft/kady/ska/data/acis/LoadReviews/2018/MAY2818/oflsa',
 'outdir': 'headout',
 'pred_only': False,
 'run_start': '2018:142',
 'state_builder': 'acis',
 'traceback': True,
 'verbose': 1,
 'version': False}

ACISThermalCheck is using the 'acis' state builder.
Fetching telemetry between 2018:121:12:00:00.000 and 2018:142:12:00:00.000
Calculating DPA thermal model
GET_BS_CMDS - Using backstop file /Users/aldcroft/kady/ska/data/acis/LoadReviews/2018/MAY2118/ofls/CR141_0311.backstop
GET_BS_CMDS - Found 1492 backstop commands between 2018:141:03:35:03.643 and 2018:147:23:27:16.409
Fetching msid: dp_dpa_power over 2018:142:11:11:58.816 to 2018:154:23:46:30.816
Making temperature prediction plots
Writing plot file headout/1dpamzt.png
Writing plot file headout/pow_sim.png
Writing plot file headout/roll.png
Checking for limit violations
Writing states to headout/states.dat
Writing temperatures to headout/temperatures.dat
Getting commanded states between 2018:121:12:05:02.816 - 2018:142:11:55:42.816
Calculating DPA thermal model for validation
Fetching msid: 1dpamzt over 2018:121:11:43:10.816 to 2018:142:12:17:34.816
Fetching msid: dp_dpa_power over 2018:121:11:43:10.816 to 2018:142:12:17:34.816
Making DPA model validation plots and quantile table
Writing plot file headout/1dpamzt_valid.png
Writing plot file headout/1dpamzt_valid_hist.png
Writing plot file headout/pitch_valid.png
Writing plot file headout/pitch_valid_hist.png
Writing plot file headout/tscpos_valid.png
Writing plot file headout/tscpos_valid_hist.png
Writing plot file headout/roll_valid.png
Writing plot file headout/roll_valid_hist.png
Writing plot file headout/ccd_count_valid.png
Writing quantile table headout/validation_quant.csv
Writing validation data headout/validation_data.pkl
Checking for validation violations
WARNING: 1DPAMZT 1% quantile value of -2.81 exceeds limit of 2.00
WARNING: PITCH 1% quantile value of -3.882 exceeds limit of 3.00
WARNING: PITCH 99% quantile value of 3.867 exceeds limit of 3.00
validation warning(s) in output at headout
Writing report file headout/index.rst

The results are similar to the SQL builder diffs:

  • The states are all identical except for the start time of the first state.
  • The temperature predictions are identical after about the first day, and within 0.3 degC before that.

@@ -35,6 +36,9 @@
"less_equal": "<="}


use_noon_day_start()
Contributor Author

This needs a comment, like: Without this, the epoch times in the existing model spec files are interpreted as midnight for YYYY:DDD dates, while the regression data are for YYYY:DDD:12:00:00. This compatibility shim makes everything work without changing all the configured model spec files.
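The effect of the shim can be illustrated with plain `datetime` parsing. A minimal sketch; `parse_yday` is a hypothetical helper, not the Chandra.Time or kadi implementation, and it only accepts whole seconds:

```python
from datetime import datetime, timedelta


def parse_yday(date: str, noon_day_start: bool = False) -> datetime:
    """Parse 'YYYY:DDD' or 'YYYY:DDD:hh:mm:ss' into a datetime.

    A bare 'YYYY:DDD' date is taken as midnight by default; with
    ``noon_day_start=True`` it is taken as 12:00:00 instead, mimicking the
    old convention that the model spec epochs were written for.
    """
    parts = date.split(":")
    if len(parts) == 2:
        day = datetime.strptime(date, "%Y:%j")
        return day + timedelta(hours=12) if noon_day_start else day
    return datetime.strptime(date, "%Y:%j:%H:%M:%S")
```

With the noon convention, an epoch written as "2019:135" lines up with regression data that start at 2019:135:12:00:00; with the midnight convention it is 12 hours earlier.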

@@ -499,7 +503,8 @@ def write_states(self, outdir, states):
states_table['pitch'].format = '%.2f'
states_table['tstart'].format = '%.2f'
states_table['tstop'].format = '%.2f'
states_table.write(outfile, format='ascii', delimiter='\t', overwrite=True)
states_table.write(outfile, format='ascii', delimiter='\t', overwrite=True,
Contributor Author

Also needs a comment. The fast writer crashes because there is an object column (the transition keys). The pure-Python writer just takes the str() of every entry, which gives the desired output.
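Why str() per entry is enough can be shown with a plain-Python writer. This sketch does not use astropy (whose `ascii` writer has a `fast_writer` argument for choosing the pure-Python implementation); `TransKeys` and `write_tab_delimited` are hypothetical stand-ins that only mimic the behaviour described in the comment above:

```python
import io


class TransKeys(set):
    """Hypothetical stand-in for the set-valued ``trans_keys`` column entries."""

    def __str__(self):
        return ",".join(sorted(self))


def write_tab_delimited(rows, names, fh):
    """Write dict rows as tab-delimited text, taking str() of every entry."""
    fh.write("\t".join(names) + "\n")
    for row in rows:
        fh.write("\t".join(str(row[name]) for name in names) + "\n")


rows = [
    {"tstart": "674307484.00", "trans_keys": TransKeys()},
    {"tstart": "674310000.00", "trans_keys": TransKeys({"pitch", "obsid"})},
]
buf = io.StringIO()
write_tab_delimited(rows, ["tstart", "trans_keys"], buf)
```

An empty set of transition keys becomes an empty field, and a non-empty one becomes a comma-separated list, regardless of the column's dtype.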

@taldcroft taldcroft changed the title SHINY: WIP support migration from Chandra.cmd_states / Ska.ParseCM to kadi SHINY: support migration from Chandra.cmd_states / Ska.ParseCM to kadi Jun 30, 2020
@taldcroft
Contributor Author

Ready for review (along with acisops/backstop_history#9), but I'm not in any particular hurry on this.

@jzuhone
Member

jzuhone commented Jul 1, 2020

@taldcroft thanks for doing this--I started on it but you've gone way beyond what I was anticipating. This week is bad (vacation week for preschool) but I am going to look at this soon.

@@ -1009,7 +1014,7 @@ def _setup_proc_and_logger(self, args):
config_logging(args.outdir, args.verbose)

# Store info relevant to processing for use in outputs
proc = dict(run_user=os.environ['USER'],
proc = dict(run_user=getpass.getuser(),
Contributor Author

The USER environment variable is not guaranteed to be set (it wasn't in my Mac environment). getpass.getuser() is platform-independent.
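A minimal sketch of the two approaches using only the standard library; the function names are hypothetical:

```python
import getpass
import os


def get_run_user_old():
    """Original approach: raises KeyError if USER is not set."""
    return os.environ["USER"]


def get_run_user(default="unknown"):
    """Platform-independent login name.

    getpass.getuser() checks LOGNAME, USER, LNAME and USERNAME (in that order)
    and then falls back to the password database where one exists.
    """
    try:
        return getpass.getuser()
    except Exception:  # nothing in the environment and no usable pwd entry
        return default
```

Because `getpass.getuser()` also consults LOGNAME and USERNAME (the latter is what Windows sets), it keeps working where a lookup of USER alone would raise.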

states[0].datestart = DateTime(states[0].tstart).date
states[-1].tstop = DateTime(datestop).secs + 0.01
states[-1].datestop = DateTime(states[-1].tstop).date
dt = 0.01 / 86400
Contributor Author

I'm just curious if this is still really needed. Do you have a regression test case where this was required?

Member

This has been there for eons. Not sure if we really need it or not.
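Nobody in the thread is sure why the pad exists. One plausible motivation, assuming states are treated as half-open [tstart, tstop) intervals, is that a sample exactly at the requested stop time would otherwise fall outside the last state. A hypothetical sketch; `in_state` and `pad_last_state` are not functions in acis_thermal_check:

```python
SECS_PER_DAY = 86400.0


def in_state(tstart, tstop, t):
    """Half-open membership test, tstart <= t < tstop (an assumed convention)."""
    return tstart <= t < tstop


def pad_last_state(states, tstop_secs, pad_secs=0.01):
    """Extend the last state's stop past the requested stop time by
    ``pad_secs`` so that a sample exactly at the stop time is still inside it."""
    states[-1]["tstop"] = tstop_secs + pad_secs
    return states


# The same 0.01 s pad expressed in days, as in ``dt = 0.01 / 86400``
PAD_DAYS = 0.01 / SECS_PER_DAY
```

If the code that consumes the states uses closed intervals instead, the pad would be unnecessary, which would be one way to settle the question.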

@@ -29,26 +38,6 @@ def get_prediction_states(self, tlm):
"""
raise NotImplementedError("'StateBuilder should be subclassed!")

def _get_bs_cmds(self):
Contributor Author

Somewhat gratuitous, but I moved this method back into SQLStateBuilder because it isn't called by ACISStateBuilder, and having it in the base class confused me.
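The resulting layout can be sketched as follows. The method bodies are hypothetical placeholders; only the class names, `get_prediction_states` and `_get_bs_cmds` come from the diff:

```python
class StateBuilder:
    """Base class: defines only the interface."""

    def get_prediction_states(self, tlm):
        raise NotImplementedError("StateBuilder should be subclassed!")


class SQLStateBuilder(StateBuilder):
    def _get_bs_cmds(self):
        # Helper used only by this subclass, so it lives here
        return ["placeholder backstop command"]

    def get_prediction_states(self, tlm):
        return {"cmds": self._get_bs_cmds(), "tlm": tlm}


class ACISStateBuilder(StateBuilder):
    def get_prediction_states(self, tlm):
        # Builds its states by a different route and never calls _get_bs_cmds()
        return {"cmds": [], "tlm": tlm}
```

Keeping the helper on the only subclass that uses it makes it obvious that ACISStateBuilder does not depend on it.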

@jzuhone
Member

jzuhone commented Jul 16, 2020

@taldcroft how can I get an environment on my laptop to test this? I have a fully updated (I think) version of shiny, but I find that I have to pip install Chandra.Time and kadi locally, and I also get an ImportError for Chandra.Maneuver (from within kadi) even though it is installed.

What is the appropriate way to update shiny? I did this:

conda update -c $ska3conda ska3-core-latest ska3-flight-latest

@jzuhone
Member

jzuhone commented Jul 16, 2020

@taldcroft never mind, I am trying a fresh install of shiny and I'll get back to you.

@jzuhone
Member

jzuhone commented Jul 16, 2020

Hi @taldcroft,

Ok, so I managed to get this and the corresponding PR for backstop_history working.

I ran the regression tests for dpa_check, and they are failing in this way:

E               AssertionError:
E               Not equal to tolerance rtol=1e-05, atol=0
E
E               Mismatched elements: 103 / 5529 (1.86%)
E               Max absolute difference: 0.03582576
E               Max relative difference: 0.001088
E                x: array([12.280151, 12.26951 , 12.258868, ..., 10.796232, 10.793319,
E                      10.793319])
E                y: array([12.280151, 12.26951 , 12.258868, ..., 10.796232, 10.793319,
E                      10.793319])

This is a pretty small error, but I'd like to understand where it's happening (maybe it's related to the differences you found above, I'm not sure). This is for the tests using the "SQL" state builder. The ACIS state builder just fails, for a reason I need to dig into further.

@taldcroft
Contributor Author

Some comments:

  • The shiny distribution has been in a state of extreme flux in the last couple of days as I have moved forward to Python 3.8 and was ironing out packaging kinks. I think that I have a good 3.8 build / package set for Mac and linux now. Windows has been thorny because of what seems like a bug in conda packaging that I found (Numpy dependencies in pkgs/main/linux-64/repodata.json don't match index.json or recipe ContinuumIO/anaconda-issues#11920). I believe I now have something working for Windows, but it isn't available on the server yet.
  • You should create a fresh environment at this point.
  • About the diffs, it is important to know whether they occur at the beginning (a startup transient) or later. Hopefully they are a startup transient, in which case this would be acceptable. If the diffs are up to 0.03 deg throughout, then that needs investigation. In my runs, at least, they were just startup (propagation) related.
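One way to answer that question from the regression output is to find where the mismatches occur, using the same criterion as the failing test (rtol=1e-05, atol=0). A minimal NumPy sketch; `mismatch_summary` is a hypothetical helper, and it assumes the times of the temperature samples are available alongside the two arrays:

```python
import numpy as np


def mismatch_summary(x, y, times, rtol=1e-5, atol=0.0):
    """Report where x and y fail the assert_allclose-style criterion
    |x - y| <= atol + rtol * |y|.

    Returns (n_bad, first_bad_time, last_bad_time), with both times set to
    None when everything agrees.
    """
    x, y, times = map(np.asarray, (x, y, times))
    bad = ~np.isclose(x, y, rtol=rtol, atol=atol)
    idx = np.flatnonzero(bad)
    if idx.size == 0:
        return 0, None, None
    return int(idx.size), float(times[idx[0]]), float(times[idx[-1]])
```

If the last mismatched time falls within the first few hours of the run, the 103 mismatched elements are a startup transient; a last mismatch near the end of the run would point to a real difference.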

@jzuhone
Member

jzuhone commented Jul 17, 2020

Hi @taldcroft,

Can you do this run and check the differences in states.dat?

dpa_check --oflsdir=/data/acis/LoadReviews/2017/MAR0617/oflsa --state-builder=acis --out=test_dpa_mar0617a_acis

vs.

dpa_check --oflsdir=/data/acis/LoadReviews/2017/MAR0617/oflsa --state-builder=sql --out=test_dpa_mar0617a_sql

The "SQL" run is missing the ACIS stopScience command from the FEB2717 load previous to MAR0617, which occurs at the exact same time as the first command in the MAR0617 load, which is a SIM translation.
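A generic illustration of how a command can be lost when it shares a timestamp with the first command of the next load: if previous-load commands are cut with a strict `<` at the new load start, a tied command disappears. This is a hypothetical sketch of the boundary issue only, not the actual kadi logic:

```python
def splice_cmds(prev_cmds, new_cmds, new_start, include_ties=True):
    """Combine previous-load commands with a new load.

    Previous-load commands are kept up to ``new_start``. With
    ``include_ties=False`` a previous-load command at exactly ``new_start``
    is dropped.
    """
    if include_ties:
        kept = [c for c in prev_cmds if c["time"] <= new_start]
    else:
        kept = [c for c in prev_cmds if c["time"] < new_start]
    # sorted() is stable, so a tied previous-load command stays first
    return sorted(kept + list(new_cmds), key=lambda c: c["time"])


prev = [{"time": 5.0, "name": "earlier"}, {"time": 10.0, "name": "stopScience"}]
new = [{"time": 10.0, "name": "SIM translation"}]
```

With `include_ties=False` the result contains only "earlier" and "SIM translation", which is the symptom seen in the SQL run.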

@taldcroft
Contributor Author

@jzuhone - thanks for the feedback, very useful! The problem you showed is fixed with 36fdadd, which showed that my understanding of overlapping loads was incomplete. I thought it was just a single AOACRSTD but for MAR0617 there are a couple of subfunction disables after that.

Please have a look and see if you think the logic is OK. At some level it is just a matter of getting it to pass regressions because going forward we have RLTT commanding in the loads, so the new logic will never be hit.

@jzuhone
Member

jzuhone commented Jul 28, 2020

@taldcroft I did some more digging into the regression testing of this PR (combined with acisops/backstop_history#9) and I can confirm that both the "SQL" state builder (which we should probably rename to "Kadi" or something similar now) and the "ACIS" state builder give the same results for the DPA model. I haven't checked the other models yet but I suspect I will get a similar answer.

Then I checked the new states.dat table against the "gold standard" answer version for a number of different loads which acis_thermal_check uses to test, and I see differences. As you already noted, the beginning state has a different start time for the reason you mentioned, which is fine. For loads which begin after an interrupt of some kind, I get some additional differences that I'd like you to check. I've put all of the loads in ~jzuhone/shiny_tests on the HEAD LAN, with this structure:

dpa_test_acis # ACIS state builder using "shiny"
dpa_test_sql  # SQL state builder using "shiny"
dpa_test_old  # answers produced by flight code

If I compare the "ACIS" states.dat table with the one currently produced by the "flight" code, I see things like the following:

  • diff dpa_test_acis/MAR0817B/states.dat dpa_test_old/MAR0817B/states.dat shows that they do not agree on the value of vid_board in the first 13 states.
  • diff dpa_test_acis/JUL2717A/states.dat dpa_test_old/JUL2717A/states.dat shows 13 states in the old table which are not in the new, and even more oddly it looks like in the new states table that tstart > tstop: 2017:207:23:37:34.816 > 2017:207:23:27:18.468
  • diff dpa_test_acis/SEP0417A/states.dat dpa_test_old/SEP0417A/states.dat shows a change in pitch by 0.1 degree in the first two entries
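A quick sanity check can flag the JUL2717A symptom automatically. A minimal sketch, assuming each state is a dict with `tstart`/`tstop` in CXC seconds; `check_states` is hypothetical and not part of acis_thermal_check or kadi:

```python
def check_states(states, tol=1e-3):
    """Return a list of problems found in a time-ordered states table.

    Checks that every state has tstart <= tstop and that each state starts
    where the previous one stopped (within ``tol`` seconds).
    """
    problems = []
    for i, state in enumerate(states):
        if state["tstart"] > state["tstop"]:
            problems.append(f"state {i}: tstart > tstop")
        if i > 0 and abs(state["tstart"] - states[i - 1]["tstop"]) > tol:
            problems.append(f"state {i}: gap or overlap with previous state")
    return problems
```

Running such a check on every regression output would catch an inverted state like 2017:207:23:37:34.816 > 2017:207:23:27:18.468 without having to diff against the gold standard.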

There are some other oddities but I figured this was enough to start--maybe some of these are expected, but I have no idea what is going on with the JUL2717A case.

@jzuhone
Member

jzuhone commented Aug 3, 2020

@taldcroft any way I can help further with this? If you point me in the right direction regarding my last comment about the state differences, I might be able to tease it out myself.

@taldcroft
Contributor Author

@jzuhone - I'll have a look.

@taldcroft
Contributor Author

FYI, I haven't gotten to this yet due to other priorities but it is still on a sticky on my home page.

@taldcroft
Contributor Author

taldcroft commented Aug 12, 2020

@jzuhone -

diff dpa_test_acis/MAR0817B/states.dat dpa_test_old/MAR0817B/states.dat shows that they do not agree on the value of vid_board in the first 13 states.

This is the return-to-science load after the 2017:066 NSM. From what I can see in emails and the CAPs iFOT database, there was not an ECS measurement, which leads me to believe that the correct answer for the vid_board state at the beginning of the MAR0817B loads would be 0 (expected state after SCS107). The "old" states are showing 1 while the shiny states are showing 0.

My conclusion here is that the "gold standard" values do not correspond to the actual as-flown state of the spacecraft, BUT I could be missing something. The shiny version of continuity state is driven by the commands database which shows:

In [4]: sts                                                                                                                        
Out[4]: 
<Table length=10>
      datestart              datestop           tstart        tstop     vid_board trans_keys
        str21                 str21            float64       float64      int64     object  
--------------------- --------------------- ------------- ------------- --------- ----------
2017:066:00:00:00.000 2017:066:00:25:38.960 605232069.184 605233608.144         1           
2017:066:00:25:38.960 2017:067:05:07:06.506 605233608.144 605336895.690         0  vid_board
2017:067:05:07:06.506 2017:067:05:07:30.506 605336895.690 605336919.690         0  vid_board
2017:067:05:07:30.506 2017:067:11:57:43.506 605336919.690 605361532.690         1  vid_board
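Reading off the continuity value at a given time from a table like this is a step-function lookup. A minimal sketch using `bisect` on the four rows shown above; `value_at` is hypothetical, it assumes each state lasts until the next state's tstart, and it says nothing about which value is correct for MAR0817B:

```python
from bisect import bisect_right

# tstart and vid_board values copied from the four rows shown above
TSTARTS = [605232069.184, 605233608.144, 605336895.690, 605336919.690]
VID_BOARD = [1, 0, 0, 1]


def value_at(t, tstarts=TSTARTS, values=VID_BOARD):
    """Return the state value in effect at time t (CXC seconds)."""
    i = bisect_right(tstarts, t) - 1
    if i < 0:
        raise ValueError("time is before the first state")
    return values[i]
```

Any time between 2017:066:00:25:38.960 and 2017:067:05:07:30.506 gives vid_board = 0 here, consistent with the post-SCS107 state described above.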

@jzuhone
Member

jzuhone commented Aug 12, 2020

@taldcroft I can check. What about my second example? That's the really weird one to me.

@taldcroft
Contributor Author

I'm going one at a time. 😄

@jeanconn

And I assume in this thread that dpa_test_old "flight" is using the ACIS state builder even if that wasn't in use at the time of some of these historical schedules?

@jzuhone
Member

jzuhone commented Aug 12, 2020

@jeanconn that's correct, but the other state builder also gives the same answer.

@jeanconn

Thanks, good to know.

@taldcroft
Contributor Author

diff dpa_test_acis/JUL2717A/states.dat dpa_test_old/JUL2717A/states.dat shows 13 states in the old table which are not in the new, and even more oddly it looks like in the new states table that tstart > tstop: 2017:207:23:37:34.816 > 2017:207:23:27:18.468

Good to get some real-world testing of kadi states. This was a real bug related to starting the states request mid-maneuver. There was actually a test of exactly this situation (I did worry about this case) but it turned out the test did not hit this bug for obscure reasons.

Anyway this is fixed in sot/kadi#176 and I confirmed that now the output states look reasonable. There are still a number of states that only appear in the "old" states file, but these are superfluous since they all occur before the run start time. This might be related to the availability of telemetry at the time it was run or it might be related to how Chandra.cmd_states goes back in time a bit further.

In any case @jzuhone, I will put the new version of kadi into shiny and you can update that package and give it another go.

And finally I will look at the 3rd point you mentioned.

@taldcroft
Contributor Author

diff dpa_test_acis/SEP0417A/states.dat dpa_test_old/SEP0417A/states.dat shows a change in pitch by 0.1 degree in the first two entries

This goes away with the latest version of kadi. Yay.

@jzuhone
Member

jzuhone commented Aug 12, 2020

@taldcroft thanks! I'll update and check things over here and let you know.

@taldcroft
Contributor Author

@jzuhone - I've updated the shiny package repository with the new version and updated the installation on HEAD. Locally:

conda update --override-channels \
 -c https://icxc.cfa.harvard.edu/aspect/ska3-conda/shiny kadi

@jzuhone
Member

jzuhone commented Aug 14, 2020

Hi @taldcroft,

Ok, I've looked at everything for our 4 production models and I am satisfied now with the results. This is good to go.

@taldcroft taldcroft merged commit 1a661d5 into acisops:master Aug 14, 2020
@taldcroft taldcroft deleted the shiny branch August 14, 2020 21:41
@taldcroft taldcroft mentioned this pull request Nov 25, 2020