
Enable "RNN-like" behaviour for sequences: Each cross-attention gets a different input timestep #1

Open · JackKelly opened this issue Jul 27, 2021 · 6 comments
Labels: enhancement (New feature or request)
JackKelly commented Jul 27, 2021

Should be fairly simple to implement by just modifying the for loop at the bottom of Perceiver.forward(). Note that x is the latents.
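Here's a minimal sketch of what that modification might look like, assuming the loop at the bottom of Perceiver.forward() is structured roughly like lucidrains' perceiver-pytorch (the variable names and layer structure below are assumptions, not this repo's exact code):

```python
# Sketch only: assumes `data` now has shape (batch, time, M, C), i.e. one byte
# array per timestep, and that `self.layers` holds one (cross_attn, cross_ff,
# self_attns) tuple per timestep (with weight sharing, these can be the same
# modules repeated).
def forward(self, data, mask=None):
    b = data.shape[0]
    x = self.latents.expand(b, -1, -1)  # x is the latents
    per_timestep_latents = []
    for t, (cross_attn, cross_ff, self_attns) in enumerate(self.layers):
        # The key change: each cross-attention sees a different input timestep.
        x = cross_attn(x, context=data[:, t], mask=mask) + x
        x = cross_ff(x) + x
        for self_attn, self_ff in self_attns:
            x = self_attn(x) + x
            x = self_ff(x) + x
        per_timestep_latents.append(x)  # optionally expose per-timestep latents
    return x, per_timestep_latents
```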

Here's a quick diagram. My additions are in black.

(I've removed the "weight sharing" from the diagram, but weight sharing would absolutely still be part of this)

[diagram: the Perceiver architecture, modified so each cross-attention receives a different input timestep]

The Perceiver paper talks about using different timestep inputs, but I don't think it talks about using a different output for each timestep. Maybe that's a bad idea :)

Related to: openclimatefix/predict_pv_yield#35

Background

The ultimate aim is to predict solar electricity generation for a single solar system, over the next few hours, every five minutes. The inputs to the model will include 5-minutely satellite data, real-time power from thousands of solar PV systems, etc.

Whilst satellite imagery is probably great for telling us that "there's a big dark cloud coming in 25 minutes", satellite imagery probably doesn't tell us exactly how much sunlight will get through that big dark cloud, so we need to blend satellite data with measurements from the ground. IMHO, this is where ML can really shine: combining multiple data sources, many of which will be quite low quality.

The input to the "Perceiver RNN" would include:

  • Recent history (for the last hour or so):
    • Satellite data (1 image every 5 minutes). Probably just a 64x64 crop of satellite imagery, centred on the solar system that we're making predictions for. 12 channels.
    • Ground measurements within the geospatial extent of the satellite imagery:
      • Solar electricity generation from all solar systems within the region of interest
      • Air-quality measurements
      • Rainfall radar
      • Weather measurements (temperature, irradiance, wind speed, etc.)
  • Predictions of the next few hours:
    • Predicted satellite imagery for the next few hours (using SatFlow)
    • Numerical weather predictions

(I'm really excited about The Perceiver because our data inputs are "multi-modal", and The Perceiver works really well for multi-modal perception!)

So, maybe we'd actually have two "Perceiver RNNs" (i.e. weights would be shared within the encoder and within the decoder, but the encoder and decoder would have different weights):

  1. An "encoder" which gets each timestep of the recent history. The per-timestep outputs would be ignored. The final output would form the latent array for the "decoder". The job of the "encoder" is to "perceive" the recent context, e.g. to use ground-based measurements to calibrate how "dark" each cloud is in the satellite imagery.
  2. A "decoder" which gets each timestep of our predictions, and outputs each timestep of the predicted solar electricity generation.

One problem with this is that information can only flow in one direction: forwards. So it might be interesting to add a self-attention block which functions over the time dimension:

[diagram: per-timestep latents feeding a self-attention block that operates over the time dimension]
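
Here's a sketch of one way to do that, assuming we've kept the per-timestep latents (names are hypothetical, and pooling each timestep's latent array down to a single vector is just one option):

```python
import torch
from torch import nn

class TemporalSelfAttention(nn.Module):
    """Self-attention over the time dimension, so information can flow
    backwards as well as forwards through the sequence."""

    def __init__(self, latent_dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(latent_dim, heads, batch_first=True)

    def forward(self, per_timestep_latents: torch.Tensor) -> torch.Tensor:
        # per_timestep_latents: (batch, time, n_latents, latent_dim)
        z = per_timestep_latents.mean(dim=2)  # (batch, time, latent_dim)
        out, _ = self.attn(z, z, z)  # every timestep attends to every other
        return out
```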

Maybe the initial latent is the embedding for the solar PV system of interest.

At each input timestep, the PV input would be a concatenation of the power, the embedding of the PV system ID, and the geospatial location.

@JackKelly JackKelly added the enhancement New feature or request label Jul 27, 2021
@JackKelly JackKelly self-assigned this Jul 27, 2021
@JackKelly JackKelly changed the title Enable "RNN-like" behaviour for sequences: Each cross-attention gets a different input timestep; and each output from the latent self-attention stack is made available Enable "RNN-like" behaviour for sequences: Each cross-attention gets a different input timestep; & output the latent self-attention stack Jul 27, 2021
@JackKelly JackKelly changed the title Enable "RNN-like" behaviour for sequences: Each cross-attention gets a different input timestep; & output the latent self-attention stack Enable "RNN-like" behaviour for sequences: Each cross-attention gets a different input timestep Jul 27, 2021
tcapelle commented Jul 27, 2021

Why not just let the sequences attend to the full input (and latents), like in the Informer implementation? https://github.com/zhouhaoyi/Informer2020

My bad, this is directly trying this type of architecture!

danstowell commented:
Could you add a line saying what the output is, please? "Output for timestep 0" is... is it a single prediction for a fixed lookahead amount? Is it a whole timeseries of predictions from now until now+delta?

JackKelly (Member Author) commented:
Sure! Here's a "zoomed out" diagram showing the encoder & decoder (but with just two timesteps of history (t-1 and t0) and two timesteps of predictions (t1 and t2)). The idea is that it'll create predictions for every timestep in a single forward pass. For example, if we were predicting 2 hours ahead at 5-minute intervals, the decoder would output 24 timesteps at once.

[diagram: "zoomed out" view of the encoder consuming history timesteps t-1 and t0, whose final latent array seeds the decoder, which consumes prediction timesteps t1 and t2 and emits one output per timestep]

JackKelly commented Jul 27, 2021

Some more details:

Output of the decoder

This model just predicts PV power for a single PV system at a time. Each timestep's output would be a probability distribution over normalised PV power (PV power in kW divided by max(PV power)). I'm planning to use a mixture density network (I know MDNs aren't very popular, but I've found they work quite well :) ).
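
For concreteness, here's a minimal sketch of what such an MDN output head might look like, as a Gaussian mixture built with torch.distributions (the class name, pooling, and component count are all assumptions):

```python
import torch
from torch import nn

class MDNHead(nn.Module):
    """Maps one timestep's latent array to a Gaussian mixture distribution
    over normalised PV power. Sketch only."""

    def __init__(self, latent_dim: int, n_components: int = 5):
        super().__init__()
        # Per mixture component: a weight logit, a mean, and a log std dev.
        self.to_params = nn.Linear(latent_dim, 3 * n_components)

    def forward(self, latents: torch.Tensor) -> torch.distributions.Distribution:
        pooled = latents.mean(dim=1)  # pool over the latent index: (batch, latent_dim)
        logits, mu, log_sigma = self.to_params(pooled).chunk(3, dim=-1)
        mixture = torch.distributions.Categorical(logits=logits)
        components = torch.distributions.Normal(mu, log_sigma.exp())
        return torch.distributions.MixtureSameFamily(mixture, components)

# Training would minimise the negative log-likelihood of the observed
# normalised PV power: loss = -dist.log_prob(target).mean()
```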

Inputs

Recent history (the inputs for the "encoder")

Each timestep gets a different byte array with M rows and C columns.

  • Satellite imagery: a 64 x 64 crop of satellite imagery, centred over the solar PV system we're making the predictions for. Each row will be a concatenation of: the pixel values for the 12 satellite channels; a position encoding for the geospatial position of the pixel (basically its lat & lon) and its relative position within the image ("near the top left"); and the time of day and day of year. We'll try using satellite imagery that's been reprojected to OSGB. But I'm also eager to try "raw" (unprojected) satellite imagery (one of the lovely things about transformers is that the data doesn't have to be on a regular grid - see this issue).
  • Solar electricity generation from all solar systems within the region of interest, up to a maximum of, say, 128 PV systems. Each row will be a concatenation of the normalised PV power, the embedding of the PV system ID, and the geospatial location of the PV system. We want to include an embedding of the PV system ID so the model can learn that some PV systems behave differently (e.g. due to shading, performance issues, etc.). The recent history for the PV system we're making predictions for will be included.
  • Air quality measurements, rainfall radar, and weather measurements will be similarly encoded to the PV power.
  • Dense, gridded numerical weather predictions will be encoded similarly to the satellite imagery.
  • Clear-sky irradiance

Perhaps each input "mode" (PV or satellite or air quality, etc.) needs to also be identified by a learned embedding, which would be concatenated onto each row. Or maybe that's not necessary. Or maybe we can just use a simple one-hot encoding!
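
To make this encoding concrete, here's a rough sketch of assembling one timestep's byte array for two of the modalities (every tensor and helper name here is hypothetical):

```python
import torch
import torch.nn.functional as F

def pad_to_width(rows: torch.Tensor, width: int) -> torch.Tensor:
    """Zero-pad rows on the right so all modalities share a common width C."""
    return F.pad(rows, (0, width - rows.shape[1]))

def build_byte_array(sat_pixels,      # (4096, 12): 64x64 crop, 12 channels
                     sat_pos_enc,     # (4096, P): lat/lon + relative pixel position
                     pv_power,        # (n_pv, 1): normalised PV power, n_pv <= 128
                     pv_id_emb,       # (n_pv, E): learned PV-system-ID embedding
                     pv_pos_enc,      # (n_pv, P): geospatial location of each system
                     datetime_enc,    # (T,): time-of-day + day-of-year encoding
                     modality_emb):   # dict of per-modality embeddings (or one-hots)
    n_pix, n_pv = sat_pixels.shape[0], pv_power.shape[0]
    sat_rows = torch.cat([
        sat_pixels, sat_pos_enc,
        datetime_enc.expand(n_pix, -1),
        modality_emb['satellite'].expand(n_pix, -1),
    ], dim=1)
    pv_rows = torch.cat([
        pv_power, pv_id_emb, pv_pos_enc,
        datetime_enc.expand(n_pv, -1),
        modality_emb['pv'].expand(n_pv, -1),
    ], dim=1)
    # Pad to a common width C, then stack modalities along the row dimension.
    width = max(sat_rows.shape[1], pv_rows.shape[1])
    return torch.cat([pad_to_width(sat_rows, width),
                      pad_to_width(pv_rows, width)], dim=0)  # (M, C)
```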

Forecasts (the inputs for the "decoder")

  • Predictions of satellite imagery from SatFlow. Encoded the same way as the real satellite imagery fed into the encoder.
  • Dense, gridded numerical weather predictions.
  • Clear-sky irradiance

The initial values for the latent array

Could be all-zeros; or learnt; or maybe the embedding of the solar PV system we're making the forecast for.

JackKelly (Member Author) commented:
See openclimatefix/predict_pv_yield#68 for latest ideas about using Perceiver IO for solar PV nowcasting.

JackKelly (Member Author) commented:
TBH, I'm slightly de-prioritising the idea of saving the separate latent arrays from each cross-attend. Instead, from the Perceiver IO paper, it feels like we're probably better off feeding in all our data as one big chunk. But I'll leave this issue open because I would definitely like to give this a go, if we get time. It's just probably not a priority.
