
pvdaq io functions #664 (Open)

wants to merge 7 commits into base: main
Conversation

@bmeyers commented Feb 18, 2019

  • Closes issue PVDAQ API IO tools #663
  • I am familiar with the contributing guidelines.
  • Fully tested. Added and/or modified tests to ensure correct behavior for all reasonable inputs. Tests (usually) must pass on the TravisCI and Appveyor testing services.
  • Updates entries to docs/sphinx/source/api.rst for API changes.
  • Adds description and name entries in the appropriate docs/sphinx/source/whatsnew file for all changes.
  • Code quality and style is sufficient. Passes LGTM and SticklerCI checks.
  • New code is fully documented. Includes sphinx/numpydoc compliant docstrings and comments in the code where necessary.
  • Pull request is nearly complete and ready for detailed review.

Brief description of the problem and proposed solution (if not already fully described in the issue linked to above): Functions for making data requests to NREL's PVDAQ API. I'm using the year CSV file API, which is the most efficient way of obtaining multi-year, sub-hourly data.
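(For context, a minimal sketch of what such a yearly CSV request could look like. This is an illustration, not the PR's exact code: the endpoint and parameter names follow NREL's public PVDAQ v3 API documentation, the function name is hypothetical, and DEMO_KEY is NREL's rate-limited demonstration key.)

    import io

    import pandas as pd
    import requests

    def fetch_pvdaq_year(system_id=2, year=2011, api_key='DEMO_KEY'):
        """Fetch one year of PVDAQ data as a DataFrame."""
        url = 'https://developer.nrel.gov/api/pvdaq/v3/data_file'
        params = {'api_key': api_key, 'system_id': system_id, 'year': year}
        # requests builds the query string from the params dict
        response = requests.get(url, params=params)
        response.raise_for_status()
        return pd.read_csv(io.StringIO(response.text))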

@bmeyers (Author) commented Feb 18, 2019

I will fix the lint errors. Other suggestions on what to complete before this can be merged are appreciated. Thanks!

@bmeyers (Author) commented Feb 18, 2019

Also, this function relies on requests. What should I do to capture that dependency?

@mikofski (Member):
Maybe take a quick look at the other iotools to see how they handle dependencies?
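(For illustration, one common pattern in io modules is to defer the import so the rest of the package works without the optional dependency installed. A generic sketch, not necessarily pvlib's actual convention:)

    def get_pvdaq_data(*args, **kwargs):
        # deferred import: only fails if this function is actually called
        try:
            import requests
        except ImportError:
            raise ImportError('get_pvdaq_data requires the requests package')
        ...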

@wholmgren (Member):
Thanks @bmeyers!

I think requests is already pulled into our test environments for another package. It's probably ok to make it an explicit requirement. The tmy module calls on urllib and I'd be happy to replace that with requests if it's an explicit dependency.

Please add get_pvdaq_data to https://github.com/pvlib/pvlib-python/blob/master/pvlib/iotools/__init__.py

Have you considered any simple methods for testing? It's not necessary to test the live connection, but it's good to assert that the format is read in the way you expect it to be on a very short file.

The progress bar is interesting. Do you know how it behaves in the different environments that are often used by pvlib community (python terminal, ipython terminal, notebooks, scripts w/ w/o logging)?

@wholmgren wholmgren added this to the 0.6.2 milestone Feb 18, 2019
@wholmgren wholmgren added the io label Feb 18, 2019
@cwhanse (Member) commented Feb 19, 2019

The standardize_time_axis function could be useful outside of the pvdaq module. What does the group think about an iotools.util module?

@bmeyers (Author) commented Feb 19, 2019

@wholmgren:
For testing, I could see two options. (1) Test the standardize_time_axis function on a small, minimal test data frame, generated by code, or (2) test a live connection using the DEMO_KEY. Do you have a preference?
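(A sketch of what option (1) could look like. The import path and the datetimekey argument are assumptions based on the code under review:)

    import pandas as pd

    from pvlib.iotools.pvdaq import standardize_time_axis  # hypothetical path

    def test_standardize_time_axis():
        # tiny synthetic frame generated in code -- no data file needed
        df = pd.DataFrame({
            'Date-Time': ['2011-01-01 00:00:00', '2011-01-01 00:15:00'],
            'power': [0.0, 1.5],
        })
        out = standardize_time_axis(df, datetimekey='Date-Time')
        assert isinstance(out.index, pd.DatetimeIndex)
        assert out.index[0] == pd.Timestamp('2011-01-01 00:00:00')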

The progress bar has been tested in python terminal, ipython terminal, notebooks, and with scripts. My understanding is that server logs will capture the output from the sys.stdout.write command, but I have not tested this. I also haven't tested server deployment without logging, but I believe that the sys.stdout.write command would have no effect.
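(A hypothetical sketch of the carriage-return style of bar being described here, not the PR's exact implementation:)

    import sys

    def progress(count, total):
        frac = count / total
        bar = '=' * int(40 * frac)
        # '\r' returns to the start of the line so the bar overwrites itself
        sys.stdout.write('\r[{:<40}] {:>4.0%}'.format(bar, frac))
        sys.stdout.flush()
        if count == total:
            sys.stdout.write('\n')  # leave the finished bar on its own line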

Should I add the requests dependency to setup.py? Is it an "extra"?

@cwhanse:
Thanks! I'm happy to start an iotools.util module with this in it. The function is from this package I recently put together, and there might be some other nuggets in there that could start to fill out an iotools.util module. Shall we make this part of the same PR?

Commit: …progress bar is not overwritten by a later print statement in a user script. Also added an if __name__ == "__main__" block for testing purposes
@wholmgren (Member):
The resampling part of standardize_time_axis could be useful in more situations, but I don't know if it should be applied by default.

What's the advantage of the set_index part of this function over calling the pandas functions directly? In my experience using to_datetime is either straightforward (single column in a format very close to ISO 8601) or a pain that is hard to generalize (e.g. #666).

For testing, we'd want a pvlib/test/test_pvdaq.py module that covers most if not all use cases. I'm trying to be less picky about code and test code in iotools to encourage more contributions. Testing small data files bundled in pvlib/data is usually preferable to testing network connections. Tests that need network connections should be wrapped in the pandas network decorator (see the test_tmy module for example).
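(For example, a live-connection test wrapped in the pandas network decorator might look like the sketch below; the decorator lived at pandas.util.testing.network in pandas releases of that era, and it skips the test when no connection is available.)

    from pandas.util.testing import network

    from pvlib.iotools.pvdaq import get_pvdaq_data

    @network
    def test_get_pvdaq_data_live():
        # skipped gracefully when the network is unreachable
        data = get_pvdaq_data(sysid=2, api_key='DEMO_KEY', year=2011)
        assert len(data) > 0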

@bmeyers (Author) commented Feb 20, 2019

Hi Will, you're right, I should have made the use of standardize_time_axis user-controllable. I just pushed a commit doing exactly that. I think having the default be True is nice, but I'm happy to change it if you'd rather the default behavior be not to apply the function.

The set_index call makes the Date-Time column the index of the data frame. This allows the use of the reindex method later on when applying the standardized time index. What do you mean by calling pandas functions directly? I also rely on to_datetime for parsing the datetime strings. Is that a problem? I haven't hit a request with the PVDAQ API where that didn't work.

10-4 on testing.

@wholmgren (Member):
> What do you mean by calling pandas functions directly?

Hopefully the discussion below makes this clearer.

The first part of your function contains these lines to create a DatetimeIndex:

    # convert index to timeseries
    try:
        df[datetimekey] = pd.to_datetime(df[datetimekey])
        df.set_index('Date-Time', inplace=True)
    except KeyError:
        time_cols = [col for col in df.columns
                     if np.logical_or('Time' in col, 'time' in col)]
        key = time_cols[0]
        df[datetimekey] = pd.to_datetime(df[key])
        df.set_index(datetimekey, inplace=True)

The first part of the try block is straightforward and I'm questioning if putting it in a pvlib.iotools.utils function is actually helpful. Beyond that, I'm wondering two related things about this part of the function:

  1. Is it general enough to be useful as its own function in a utils module?
  2. Assuming it is general enough, does it actually improve code clarity in other parsing functions?

Here are links to all of the datetime parsing code in iotools:

  • midc
  • srml
  • surfrad
  • tmy
  • proposed CRN parser in #666

It seems to me that the answer to question 1 is "not at this point". I don't yet know the answer to 2, but I think writing generic parsing code is hard!

> I also rely on to_datetime for parsing the datetime strings. Is that a problem?

No problem. I think that's the right way to do it.

I think the second half of the function merits its own function that assumes a Series/DataFrame that already has a DatetimeIndex.
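(For illustration, such a helper might look like the sketch below; my own paraphrase, assuming the input already carries a DatetimeIndex. The name and default frequency are hypothetical.)

    import pandas as pd

    def standardize_index(data, freq='15min'):
        """Snap a DatetimeIndex-ed Series/DataFrame onto a regular grid."""
        regular = pd.date_range(start=data.index[0], end=data.index[-1],
                                freq=freq)
        # timestamps missing from the regular grid become NaN rows
        return data.reindex(regular)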

Review comment (Member) on this docstring excerpt:

        years will be concatenated into a single data frame
    delim: string
        The delimiter used in the CSV file being requested

Does pvdaq ever use other delimiters?
Review comment (Member) on the function signature:

    def get_pvdaq_data(sysid=2, api_key='DEMO_KEY', year=2011, delim=',',
                       standardize=True):

Suggest year before api_key.

Review comment (Member) on the status-code check:

                      for item in req_params.items()]
        req_url = base_url + '&'.join(param_list)
        response = requests.get(req_url)
        if int(response.status_code) != 200:

How about raising an exception?
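(For reference, requests has this built in: raise_for_status() raises requests.HTTPError for any 4xx/5xx response.)

    response = requests.get(req_url)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx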

Review comment (Member) on the pvlib/iotools/__init__.py diff:

    @@ -7,3 +7,4 @@
     from pvlib.iotools.midc import read_midc_raw_data_from_nrel  # noqa: F401
     from pvlib.iotools.ecmwf_macc import read_ecmwf_macc  # noqa: F401
     from pvlib.iotools.ecmwf_macc import get_ecmwf_macc  # noqa: F401
    +from pvlib.iotools.pvdaq import get_pvdaq_data

Just append # noqa: F401 to tell stickler to ignore -- this is solely for the API.
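That is, the added line would become:

    from pvlib.iotools.pvdaq import get_pvdaq_data  # noqa: F401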

@wholmgren wholmgren modified the milestones: 0.6.2, 0.7.0 May 15, 2019
@wholmgren wholmgren modified the milestones: 0.7.0, 0.8.0 Nov 26, 2019
@mikofski (Member):
@bmeyers any update on this? There's renewed interest from @shirubana (Silvana @ NREL) 😁

@bmeyers (Author) commented Jan 22, 2020

@mikofski Sorry all for the lack of progress on my part. After PVSC abstracts get submitted, I'll pick this up again. I definitely want to get this completed.

@wholmgren (Member):
I implemented a couple of PVDAQ fetch functions over in our Solar Forecast Arbiter project:

https://github.com/SolarArbiter/solarforecastarbiter-core/blob/master/solarforecastarbiter/io/fetch/pvdaq.py

https://github.com/SolarArbiter/solarforecastarbiter-core/blob/master/solarforecastarbiter/io/fetch/tests/test_pvdaq.py

I based them on @bmeyers's code but removed most of the DatetimeIndex handling code and the progress bar. I'll describe my motivations here in case they're useful for advancing this PR:

I didn't find that the date/time key control was needed to handle the publicly accessible data. Perhaps it's needed for non-public data?

I also wanted to control how I handled the resampling/reindexing based on my requirements, so I left that to my own function. For example, I wanted to resample sub 1 minute data into 1 minute data using resample.mean but wanted to use resample.first for 5 and 15 minute data with the occasional odd reading.
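(Sketched roughly, that frequency-dependent strategy could read as follows; this is my paraphrase, not the solarforecastarbiter code, and the function name is hypothetical:)

    import pandas as pd

    def to_regular_freq(data, native_freq):
        if pd.Timedelta(native_freq) < pd.Timedelta('1min'):
            # average sub-minute samples up to 1 minute
            return data.resample('1min').mean()
        # keep the first observation in each 5/15 minute bin, dropping
        # the occasional odd extra reading
        return data.resample(native_freq).first()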

Finally, while I'd like to return a localized DatetimeIndex, the time zone of the data is not clear. It appears to be in local standard time, but I don't see any metadata or documentation about this (I could have missed it). So, it appears it's up to the user to decipher the UTC offset based on the lat/lon of the separately accessible metadata.
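(If a user does decipher a fixed standard-time offset, localization is a one-liner. Note the reversed sign convention of the Etc zones: 'Etc/GMT+7' means UTC-7, i.e. US Mountain Standard Time, with no DST, which suits data logged in local standard time.)

    # assumes df has a naive DatetimeIndex in local standard time
    df = df.tz_localize('Etc/GMT+7')  # fixed UTC-7 offset, no DST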

I still think it would be great if the functionality ended up in pvlib in the long run.

@wholmgren wholmgren added the solarfx2 DOE SETO Solar Forecasting 2 / Solar Forecast Arbiter label May 12, 2020
@mikofski (Member):
My vote would be to add the Solar Forecast Arbiter pvdaq implementation to iotools and close this PR.

Usage is the best user story; premature optimization is the killer (this PR being a case in point).

I would rather see this in pvlib, observe usage, and have users create issues or discuss on groups/SO, and then modify as needed, than not have it at all or leave it living in PR state indefinitely.

@wholmgren wholmgren modified the milestones: 0.8.0, Someday Aug 28, 2020