Commit

new readme
OuyangWenyu committed May 30, 2024
1 parent eb9de98 commit 0d22e3d
Showing 40 changed files with 24 additions and 850 deletions.
5 changes: 1 addition & 4 deletions .gitignore
@@ -214,15 +214,12 @@ fabric.properties
# Editor-based Rest Client
.idea/httpRequests

# trained models
models/

*.xlsx
.vscode
/test/SCEUA_*.csv
/test/SCEUA_*
/hydromodel/app/*.csv
/test/test_data_camels_cc.py
/example/*
/results/
/results/*
/.hydrodataset*/
44 changes: 5 additions & 39 deletions README.md
@@ -1,50 +1,16 @@
<!--
* @Author: Wenyu Ouyang
* @Date: 2023-10-29 17:35:04
* @LastEditTime: 2024-02-12 15:49:47
* @LastEditTime: 2024-05-30 09:06:30
* @LastEditors: Wenyu Ouyang
* @Description: Hydro forecast
* @FilePath: \HydroForecast\README.md
* @FilePath: \hydroevaluate\README.md
* Copyright (c) 2023-2024 Wenyu Ouyang. All rights reserved.
-->
# HydroForecast

It's a project for hydrological forecasting based on big data and artificial intelligence technology (especially deep learning). The project is still in progress, and the current version is only a prototype.
This project aims to provide a unified evaluation framework for hydrological models, facilitating the evaluation and comparison of different models.

## Introduction
Currently, both physically-based and machine learning-based hydrological models heavily rely on existing datasets for evaluation, without considering the performance of hydrological forecasts. For example, many papers divide the CAMELS dataset into training and testing sets, train models on the training set, and evaluate them on the testing set. However, in actual forecasting, it is common to distinguish between observed rainfall and forecasted rainfall. Models should not have access to any observed data within the forecast period. Therefore, a more realistic evaluation approach would be to evaluate models without using any observed data as input within the forecast period. While the differences may not be significant when comparing different models, this evaluation approach is more appropriate for assessing actual forecasting performance.
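The forecast-mode evaluation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from this repository: the function names `build_forecast_inputs` and `nse`, the `issue_index` parameter, and the choice of the Nash-Sutcliffe efficiency as the score are all hypothetical, since the README does not name an API or a metric.

```python
from typing import List, Sequence


def build_forecast_inputs(
    observed_rain: Sequence[float],
    forecast_rain: Sequence[float],
    issue_index: int,
) -> List[float]:
    """Return the rainfall series a model may see for a forecast issued at issue_index.

    Hypothetical helper: observed rainfall is used only before the issue
    time; from the issue time onward, forecasted rainfall is used instead.
    """
    if not 0 < issue_index <= len(observed_rain):
        raise ValueError("issue_index must lie within the observed record")
    history = list(observed_rain[:issue_index])  # observations available at issue time
    return history + list(forecast_rain)  # forecast period: forecasts only


def nse(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency, used here only as an example score."""
    if len(simulated) != len(observed) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - squared_error / spread


if __name__ == "__main__":
    observed = [0.0, 2.0, 5.0, 1.0, 7.0, 3.0]
    forecast = [4.0, 4.0]  # forecasted rainfall for the two-step forecast period
    inputs = build_forecast_inputs(observed, forecast, issue_index=4)
    # The observed values 7.0 and 3.0 inside the forecast period are never passed on.
    print(inputs)
```

In such a setup, a score like `nse` would be computed only over the forecast period, comparing the model's simulated discharge with the discharge observed afterwards, so that observations serve as the evaluation target rather than as model input.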

The project is based on the [PyTorch](https://pytorch.org/) framework, and the main code is written in Python.

It is divided into two parts: data processing and model training. The data processing part is currently based mainly on our [hydrodata](https://github.com/iHeadWater/hydrodata) project, which is used to download, process, read, and write public data sources related to flood forecasting. The model training part is based mainly on [torchhydro](https://github.com/iHeadWater/torchhydro) and [hydromodel](https://github.com/iHeadWater/hydromodel), our self-developed frameworks focusing on hydrological forecasting.

The idea of the project is to use public data sources from data-rich regions such as the United States and Europe to train a foundation model. We then use the trained model to predict river stage or discharge in data-poor regions such as China (there is actually plenty of data in China, but most of it is not accessible to the public). The current version is mainly based on Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models, with precipitation from [Global Precipitation Measurement (GPM)](https://gpm.nasa.gov/) and the Global Forecast System (GFS) as input and river stage or discharge as output.

## Installation

The project is based on Python 3.10. The required packages are listed in `env.yml`. You can install them by running the following command:

```bash
# simply install a new environment AIFF
conda env create -f env.yml
# then we install packages developed by ourselves as follows
conda activate HydroForecast
# xxx means your Github username; xxxxx means the name of the package; xx means the git-branch of the package
pip install git+ssh://git@github.com/xxx/xxxxx.git@xx
```

The packages we developed are listed as follows in [iHeadWater](https://github.com/iHeadWater):

```bash
torchhydro
hydromodel
```

It is best to use the latest versions of these packages. You can check the package versions on GitHub.

## Usage

The project is still in progress, and the current version is only a prototype. The main code is in the root folder. You can run the code by running the following command:

```bash
python main.py
```
Furthermore, there is a lot of research on model evaluation, and we will continuously incorporate relevant studies into the program to explore the topic more comprehensively.
110 changes: 16 additions & 94 deletions README_CN.md
@@ -1,94 +1,16 @@

# AIFloodForecast

## Usage


1. Create the AIFF virtual environment
```bash
conda env create -f env.yml
```
2. Activate the environment
```bash
conda activate AIFF
```
3. Install torchhydro
```bash
pip install git+ssh://git@github.com/iHeadWater/torchhydro.git@dev
```
4. Create a new yml file (e.g. v002.yml) modeled on v001.yml
```yaml
# record version info here as the following format:
data_cfgs:
  sub: "/v002"
  source: "GPM_GFS"
  source_path: "gpm_gfs_data"
  source_region: "US"
  download: 0
  ctx: [0]
  dataset: "GPM_GFS_Dataset"
  sampler: "WuSampler"
  scaler: "GPM_GFS_Scaler"

model_cfgs:
  model_name: "SPPLSTM"
  model_hyperparam:
    seq_length: 168
    forecast_length: 24
    n_output: 1
    n_hidden_states: 80

training_cfgs:
  train_epoch: 50
  save_epoch: 1
  te: 50
  batch_size: 256
  loss_func: "RMSESum"
  opt: "Adam"
  lr_scheduler: {1: 1e-4, 2: 5e-5, 3: 1e-5}
  which_first_tensor: "sequence"

train_period: ["2016-08-01", "2016-12-31"]
test_period: ["2016-08-01", "2016-12-31"]
valid_period: ["2016-08-01", "2016-12-31"]

gage_id:
  - '21401550'

var_out: ["streamflow"]
var_t: ["tp"]
```
5. Call the yml file on the last line of main.py
```python
run_normal_dl(cfg_path_dir + "v002.yml")
```
6. Go into the torchhydro source code (data_source_gpm_gfs.py) and change the data paths (three places need to be changed)
```python
# In os.path.join, set the path of the streamflow nc file to read
def read_streamflow_xrdataset(...):
    ...
    streamflow = xr.open_dataset(
        os.path.join())
    ...
```
```python
# In os.path.join, set the path of the precipitation nc file to read
def read_gpm_xrdataset(...):
    ...
    for basin in gage_id_lst:
        gpm = xr.open_dataset(
            os.path.join())
    ...
```
```python
# In os.path.join, set the path of the attributes nc file to read
def read_attr_xrdataset(...):
    ...
    attr = xr.open_dataset(
        os.path.join())
    ...
```
7. Run main.py
```bash
python main.py
```
<!--
* @Author: Wenyu Ouyang
* @Date: 2024-02-12 09:52:49
* @LastEditTime: 2024-05-30 09:05:20
* @LastEditors: Wenyu Ouyang
* @Description: Chinese version of the README
* @FilePath: \hydroevaluate\README_CN.md
* Copyright (c) 2023-2024 Wenyu Ouyang. All rights reserved.
-->
# hydroevaluate

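This project aims to provide a unified evaluation framework for hydrological models, to make it easier to evaluate and compare them.

At present, both physically-based and machine learning-based hydrological models are evaluated mainly on existing datasets, without measuring hydrological forecasting performance. For example, many papers split the CAMELS dataset into training and testing sets, train a model on the training set, and evaluate it on the testing set. In actual forecasting, however, the most typical practice is to distinguish between observed rainfall and forecasted rainfall, and a model cannot have any actual observations within the forecast period. A more realistic evaluation therefore uses no observations as input within the forecast period. Admittedly, when comparing different models the difference may not be large, but for assessing actual forecasting performance this approach is more reasonable.

In addition, there is a lot of research on model evaluation, and relevant studies will be continuously incorporated into the program to explore the topic in more depth.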
6 changes: 2 additions & 4 deletions env.yml
@@ -1,12 +1,10 @@
name: HydroForecast
name: hydroevaluate
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- python=3.10
- numpy=1.23
- pytorch=1.12
- python
- pip
- pip:
- hydromodel
177 changes: 0 additions & 177 deletions results/v001/01_December_202311_51AM.json

This file was deleted.

Binary file removed results/v001/01_December_202311_51AM_model.pth
Binary file not shown.