Bréboin Alexandre, Delarue Simon, Nourry Mathias, Pannier Valentin
We are four students pursuing a Post Master's degree in Big Data at Télécom Paris, each coming to this Hackathon with a different background:
- Alexandre : "Engineer with a management and consulting background, I am delighted to work on my technical skills with this Hi!ckathon!"
- Simon : "Engineer with 5 years of experience in the banking sector and AI enthusiast, I'm pleased to join this Hi!ckathon!"
- Mathias : "Engineer with 3 years of experience in the energy sector and in the management of IT solutions in the industrial field, this Hi!ckathon was a great way to apply my knowledge!"
- Valentin : "Holding a Master's degree in Physics, I'm moving to the Data domain especially dealing with Deep learning approaches."
To tackle this challenge, we first focused on understanding the context of energy supply in microgrids and carried out exploratory data analysis to build intuition about the different metrics. Then, as our team includes several data scientists, we chose to investigate both the "rules-based" approach and the "simple" reinforcement learning approach. This allowed us to compare the results in the first round, then to improve our learning agent (RL approach) with the context knowledge we had gathered. Finally, as the challenge came to an end, we developed a Deep Q-Learning approach, which further lowered our final budgets on the buildings.
We developed an algorithm able to learn by itself an improved strategy for energy supply management, for 3 different buildings. The strategy it found is efficient both environmentally, as we focused on frugality, and financially, since one of its objectives is to reduce the final energy cost.
After a quick look at the data, we used the "rules-based" naive approach proposed in the hackathon to get a baseline model. The purpose of this model is to give a quick intuition about the achievable results.
Then, to solve the problem raised in the hackathon, we decided to work in parallel on two sides of the subject:
- The understanding of the data. We focused on developing human-based rules to manage the energy supply, which served as guidelines to keep in mind when developing and testing the autonomous agent
- The implementation of a reinforcement learning based approach, in which we gave an agent :
  - some basic rules to manage supply
  - some crafted rules to try and find by itself the best strategy to minimize the final energy cost
  - penalization at Q-table initialization, in order to steer our agent's choices toward specific behaviours (battery use for example)
Since the results of this approach outperformed the baseline, yet still fell well short of the "theoretical" values given in the project, we developed a third approach, based on a Deep Q-Network. In this method, we gave our agent only the simplest rules and relied on a neural network to improve its long-term strategy.
Crafted-rules
- We reduced the total price by removing as much of our system's loss load as possible
- We focused on reducing the pv_curtailed of our system in order to maximize the use of already available resources
- We focused on the agent's ability to charge/use its battery when needed
Q-table penalization
The Q-tables for buildings 1 & 2 were not initialized completely at random, but with a penalization when battery energy is not sufficiently used. Indeed, we saw that the agent tended to import energy from the grid rather than use its own capacity.
As the 3rd building was specific in its energy management possibilities (genset available, yet with a bad ecological footprint, and power cuts), we decided to initialize its Q-table (state/action cross-product) not completely at random, but with a penalization on the genset actions.
This steered our agent's choices toward a better long-term ecological impact, by lowering the value associated with choosing the genset. Beyond the ecological part of the problem, this method showed a slightly better cost after a year of choices by our agent.
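The penalized initialization described above can be sketched as follows. This is a minimal illustration, not the code from the notebook: the table size, the action index `GENSET_ACTION`, the penalty value and the helper names `init_q_table` and `greedy_action` are all assumptions made for the example.

```python
import random

def init_q_table(n_states, n_actions, penalized_actions,
                 penalty=-1.0, scale=0.01, seed=0):
    """Build a Q-table of small random values, then lower the entries
    of the penalized actions by a fixed (assumed) penalty."""
    rng = random.Random(seed)
    q = [[rng.uniform(0.0, scale) for _ in range(n_actions)]
         for _ in range(n_states)]
    for row in q:
        for action in penalized_actions:
            row[action] += penalty
    return q

def greedy_action(q, state):
    """Return the index of the action with the highest Q-value."""
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical action indexing, for illustration only.
GENSET_ACTION = 3
q_table = init_q_table(n_states=10, n_actions=4,
                       penalized_actions=[GENSET_ACTION])
```

Because the penalized entries start below all the others, a greedy policy avoids those actions at first and only returns to them if later Q-learning updates raise their value.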
Architecture
We chose an architecture with respect to the frugality/performance trade-off :
- Layers : 1 hidden layer, 1 input layer, 1 output layer, all with 64 units
- Activation function : ReLU
- Batch size : 64
- Optimizer : Adam
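To illustrate where the batch size of 64 comes into play, here is a minimal sketch of the minibatch sampling commonly used in Deep Q-Learning. It assumes a replay memory, which this README does not describe explicitly; the `ReplayBuffer` class, its capacity, the `td_targets` helper and the discount factor `gamma` are hypothetical names and values chosen for the example.

```python
import random
from collections import deque

BATCH_SIZE = 64  # batch size reported in the list above

class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done)."""

    def __init__(self, capacity=10_000, seed=0):
        self.memory = deque(maxlen=capacity)  # oldest entries drop out
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=BATCH_SIZE):
        """Draw a minibatch uniformly at random, without replacement."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

def td_targets(batch, q_next_max, gamma=0.99):
    """Q-learning targets: r + gamma * max_a' Q(s', a'), or r if done.
    `q_next_max` is any callable returning max_a' Q(s', a') for a state."""
    return [r if done else r + gamma * q_next_max(s2)
            for (_, _, r, s2, done) in batch]
```

In a full training loop, each minibatch of targets would be passed to the network's optimizer step (Adam here) as the regression target for the Q-values of the actions actually taken.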
The baseline approach results on the test buildings are as follows :
- Building 1 : 14 399.5€
- Building 2 : 48 012.3€
- Building 3 : 43 901.5€
In this method, we tried to find the best trade-off between frugality (agent training time and CPU cost) and performance. Thus, we trained our models for 15 episodes on all 3 buildings.
The "simple" reinforcement learning approach results on the test buildings are as follows :
- Building 1 : 8 728.2€
- Building 2 : 29 951.0€
- Building 3 : 37 758.1€
Frugality
- Training : 350.6s
- Test : 21.8s
This approach gave us the best results on our problem by far. The Deep reinforcement learning approach results on the test buildings are as follows (in parentheses, the relative error with respect to the theoretical values):
- Building 1 : 4 119.9€ (+12.3%)
- Building 2 : 13 749.1€ (+12.4%)
- Building 3 : 16 429.0€ (+19.9%)
Frugality
- Training : 6696s
- Test : 5s
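The three sets of results can be compared directly. The short script below computes the cost reduction of each learning approach relative to the baseline; the costs are copied from the figures reported above, while the helper `reduction_pct` is a name introduced only for this example.

```python
# Final costs in euros per building, copied from the results above.
baseline = {1: 14399.5, 2: 48012.3, 3: 43901.5}
q_learning = {1: 8728.2, 2: 29951.0, 3: 37758.1}
deep_q = {1: 4119.9, 2: 13749.1, 3: 16429.0}

def reduction_pct(cost, reference):
    """Percentage by which `cost` is lower than `reference`."""
    return 100.0 * (reference - cost) / reference

for building in baseline:
    simple = reduction_pct(q_learning[building], baseline[building])
    deep = reduction_pct(deep_q[building], baseline[building])
    print(f"Building {building}: simple RL -{simple:.1f}%, deep RL -{deep:.1f}%")
```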
During this project, we had trouble training our agent with the limited state dictionary. Indeed, its narrow range of values did not reflect some interesting aspects of the problem, such as the price or the level of curtailment. The method we developed, adding crafted rules to our actions, is only a partial answer to this. To go further, we could use a Machine Learning model to predict actions based on newly created features that recover this information.
Another improvement would be to train our agent on different time windows. At the moment, it learns on windows of 1 hour, but we could hypothesize that similar patterns in energy demand (and thus supply need) would emerge only on day-long windows (night and day) or even week-long windows (working days vs weekends). This could give our agent more intuition about energy needs at specific moments, and lead it to learn more complex strategies in terms of price/efficiency optimization.
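One possible way to expose daily and weekly patterns to the agent is to add time features to its state. The sketch below is only an illustration of that idea: it assumes one step per hour and that step 0 falls at midnight on a Monday, neither of which is confirmed here, and `time_features` is a hypothetical helper.

```python
import math

def time_features(step):
    """Derive day/week features from an hourly step index.
    Assumes step 0 is Monday 00:00 (an assumption, not from the data)."""
    hour = step % 24
    day_of_week = (step // 24) % 7
    angle = 2.0 * math.pi * hour / 24.0
    return {
        # Cyclical encoding keeps 23:00 and 00:00 close to each other.
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
        # Days 5 and 6 are Saturday and Sunday under the assumption above.
        "is_weekend": day_of_week >= 5,
    }
```

The sine/cosine pair would need to be discretized before being used as a Q-table index, whereas a Deep Q-Network could take it as input directly.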
Files
- data/ : Directory containing pickle files for buildings 1, 2 and 3
- results/ : Directory containing json files for results (Deep RL) on buildings 1, 2 and 3
- Requirements.txt : Package requirements
- DiscreteEnvironment.py : Contains the original DiscreteEnvironment class
- DiscreteEnvironment_modified.py : Contains the DiscreteEnvironment class, with crafted rules
- Submission_notebook.ipynb : Notebook containing code for "Simple" and Deep reinforcement learning implementations. Agents are trained and tested on all 3 buildings.
Usage
Clone this repository and run the Submission_notebook.ipynb notebook.