Skip to content

Download And Preprocess Russian Meteorological Data

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

atsyplenkov/rp5pik

Repository files navigation

rp5pik

DOI Lifecycle: experimental pkgcheck CRAN status GitHub R package version GitHub last commit

The rp5pik package provides a set of functions to download and preprocess meteorological data from http://www.pogodaiklimat.ru/

Installation

You can install the development version of rp5pik from GitHub with:

# install.packages("devtools")
devtools::install_github("atsyplenkov/rp5pik")

# OR

# install.packages("remotes")
remotes::install_github("atsyplenkov/rp5pik")

Examples

1. Data download

Below is an example for rp_parse_pik functions. It allows you to download meteo data at 3-hour temporal resolution for various stations using their WMO ID from http://www.pogodaiklimat.ru/:

library(rp5pik)

example <-
  rp_parse_pik(
    wmo_id = c("20069", "27524"),
    start_date = "2022-05-01",
    end_date = "2022-05-31"
  )
#> ⠙ 1/2 ETA:  4s | Downloading data                                    Parsing data

example
#> # A tibble: 496 × 11
#>    wmo   datetime_utc           ta    td    rh    ps   psl  prec windd
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#>  1 20069 2022-05-01 00:00:00  -7.9  -9.1    91  993   996.   0.8    90
#>  2 20069 2022-05-01 03:00:00  -7    -8.2    91  994.  997.  NA      90
#>  3 20069 2022-05-01 06:00:00  -7.2  -8.4    91  994.  997.  NA      90
#>  4 20069 2022-05-01 09:00:00  -7.7  -8.8    92  994.  998.  NA      90
#>  5 20069 2022-05-01 12:00:00  -8.4  -9.5    92  995.  998.   0.6    90
#>  6 20069 2022-05-01 15:00:00  -9.8 -11      91  995   998.  NA      45
#>  7 20069 2022-05-01 18:00:00 -11.3 -12.7    89  995.  998.  NA      45
#>  8 20069 2022-05-01 21:00:00 -14.2 -16      86  995.  999.  NA      45
#>  9 20069 2022-05-02 00:00:00 -14.3 -15.9    88  995.  998.  NA      45
#> 10 20069 2022-05-02 03:00:00 -11.8 -14.4    81  994.  997.  NA       0
#> # ℹ 486 more rows
#> # ℹ 2 more variables: winds_mean <dbl>, winds_max <dbl>

List of available variables:

  • wmo character. WMO index of the meteostation

  • datetime_utc POSIXct. Date and Time of the measurement at UTC

  • ta numeric. Air temperature at 2m above the surface, °C

  • td numeric. Dew point, °C

  • rh numeric. Relative humidity at 2m above the surface, %

  • ps numeric. Atmosphere pressure at meteostation, hPa

  • psl numeric. Atmosphere pressure adjusted to the height of mean sea level, hPa

  • prec numeric. Cumulative precipitation for the last 12 hours, mm

  • windd integer. Wind direction, deg

  • winds_mean numeric. Average 10-min wind speed, m/s

  • winds_max numeric. Maximum wind speed, m/s

We can visualize the example dataset using ggplot2 as follows:

library(ggplot2)

example |> 
  ggplot(
    aes(
      x = datetime_utc,
      y = ta,
      group = wmo
    )
  ) +
  geom_line(aes(color = wmo)) +
  labs(
    x = "",
    y = "Average Temperature, °C"
  ) +
  theme_minimal()

2. Data preprocessing

Since the downloaded with rp_parse_pik data contains raw data, it requires additional checking and cleaning. We suggest to explore the raw dataset by yourselves before any further manipulations.

However, the rp5pik package has a function to aggregate raw data on daily (24h) or semi-daily (12h) periods. The rp_aggregate_pik function removes known error codes from precipitation data (699 values). Additionally, it calculates daily precipitation sums based on measured precipitation at 06 UTC and 18 UTC in European part of Russia (see meteostation manuals for more info).

⚠ As of 2023-06-28 this function works only with Moscow timezone.

This is how you can aggregate data daily:

library(dplyr)

example_daily <- 
  example |> 
  rp_aggregate_pik(.period = "24h") |> 
  group_split(wmo)

example_daily
#> <list_of<
#>   tbl_df<
#>     wmo       : character
#>     date      : date
#>     p         : double
#>     ta        : double
#>     td        : double
#>     rh        : double
#>     ps        : double
#>     psl       : double
#>     windd     : double
#>     winds_mean: double
#>     winds_max : double
#>   >
#> >[2]>
#> [[1]]
#> # A tibble: 32 × 11
#>    wmo   date           p     ta     td    rh    ps   psl windd winds_mean
#>    <chr> <date>     <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>
#>  1 20069 2022-05-01    NA  -8.47  -9.67  91    994.  998.  77.1       9.14
#>  2 20069 2022-05-02    NA -12.9  -14.9   85    992.  995. 129.        6.12
#>  3 20069 2022-05-03    NA -16.0  -19.0   77.8  988.  992. 152.        2.25
#>  4 20069 2022-05-04    NA -14.3  -17.4   77.2  996.  999. 248.        5.38
#>  5 20069 2022-05-05    NA -14.2  -16.0   86.6  998. 1002.  67.5       3.38
#>  6 20069 2022-05-06    NA -12.6  -14.3   87.2  994.  997. 253.        4.12
#>  7 20069 2022-05-07    NA  -9.31 -11.1   86.9  999. 1002. 231.        3.12
#>  8 20069 2022-05-08    NA  -6.88  -8.25  90.1  999. 1003. 152.        9.25
#>  9 20069 2022-05-09    NA  -7.19  -9.02  86.6  991.  995. 202.        9.62
#> 10 20069 2022-05-10    NA  -7.25  -8.89  88    987.  990. 197.        7.75
#> # ℹ 22 more rows
#> # ℹ 1 more variable: winds_max <dbl>
#> 
#> [[2]]
#> # A tibble: 32 × 11
#>    wmo   date           p    ta     td    rh    ps   psl windd winds_mean
#>    <chr> <date>     <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>
#>  1 27524 2022-05-01  NA    7.21 -5.59   43    996. 1023.  238.       3.71
#>  2 27524 2022-05-02   0    9.85 -2.4    44.1  991. 1018.  199.       3   
#>  3 27524 2022-05-03  NA   11.6   1.54   53.9  982. 1008.  231.       5.88
#>  4 27524 2022-05-04   0.6  6    -1.14   60.5  985. 1012.  298.       5.25
#>  5 27524 2022-05-05  NA    5.24 -6.68   45.5  997. 1025.  248.       3.75
#>  6 27524 2022-05-06  NA    9.59 -5.91   34.2 1001. 1028.  191.       3.5 
#>  7 27524 2022-05-07  NA   13.1  -2.59   34.2  999. 1025.  180        3.62
#>  8 27524 2022-05-08   1   13.6   2.17   47.8  992. 1019.  208.       4.75
#>  9 27524 2022-05-09   2.6  6.99  0.575  64.2  994. 1021.  158.       6.25
#> 10 27524 2022-05-10   0    6.62 -3.5    50.6  992. 1019.  158.       7.38
#> # ℹ 22 more rows
#> # ℹ 1 more variable: winds_max <dbl>

Or semi-daily:

library(dplyr)

example_12h <- 
  example |> 
  rp_aggregate_pik(.period = "12h", .tz = "Europe/Moscow") |> 
  group_split(wmo)

example_12h
#> <list_of<
#>   tbl_df<
#>     wmo        : character
#>     datetime_tz: datetime<Europe/Moscow>
#>     p          : double
#>     ta         : double
#>     td         : double
#>     rh         : double
#>     ps         : double
#>     psl        : double
#>     windd      : double
#>     winds_mean : double
#>     winds_max  : double
#>   >
#> >[2]>
#> [[1]]
#> # A tibble: 63 × 11
#>    wmo   datetime_tz             p     ta     td    rh    ps   psl windd
#>    <chr> <dttm>              <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 20069 2022-05-01 09:00:00    NA  -7.37  -8.57  91    993.  997.  90  
#>  2 20069 2022-05-01 21:00:00    NA  -9.3  -10.5   91    995.  998.  67.5
#>  3 20069 2022-05-02 09:00:00    NA -13.0  -15.0   84.8  994.  997.  22.5
#>  4 20069 2022-05-02 21:00:00    NA -12.8  -14.8   85.2  989.  993. 236. 
#>  5 20069 2022-05-03 09:00:00    NA -17.0  -19.7   79.8  987.  990.  90  
#>  6 20069 2022-05-03 21:00:00    NA -15    -18.4   75.8  990.  993. 214. 
#>  7 20069 2022-05-04 09:00:00    NA -15.0  -18.6   74.5  993.  997. 248. 
#>  8 20069 2022-05-04 21:00:00    NA -13.5  -16.2   80    998. 1001. 248. 
#>  9 20069 2022-05-05 09:00:00    NA -14.9  -16.6   86.8 1000. 1003. 124. 
#> 10 20069 2022-05-05 21:00:00    NA -13.4  -15.3   86.5  997. 1000.  11.2
#> # ℹ 53 more rows
#> # ℹ 2 more variables: winds_mean <dbl>, winds_max <dbl>
#> 
#> [[2]]
#> # A tibble: 63 × 11
#>    wmo   datetime_tz             p    ta    td    rh    ps   psl windd
#>    <chr> <dttm>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 27524 2022-05-01 09:00:00  NA    2.47 -5.27  59.3  998. 1026.  225 
#>  2 27524 2022-05-01 21:00:00  NA   10.8  -5.82  30.8  995. 1022.  248.
#>  3 27524 2022-05-02 09:00:00  NA    7.55 -1.58  53    993. 1020.  195 
#>  4 27524 2022-05-02 21:00:00   0   12.2  -3.22  35.2  990. 1016.  202.
#>  5 27524 2022-05-03 09:00:00  NA    7.8   3.22  73.2  985. 1011.  214.
#>  6 27524 2022-05-03 21:00:00  NA   15.4  -0.15  34.5  980. 1006.  248.
#>  7 27524 2022-05-04 09:00:00   0.3  7     0.55  63.5  982. 1009.  292.
#>  8 27524 2022-05-04 21:00:00   0.3  5    -2.82  57.5  988. 1015.  304.
#>  9 27524 2022-05-05 09:00:00  NA    1.9  -4.85  62    996. 1023.  315 
#> 10 27524 2022-05-05 21:00:00  NA    8.58 -8.5   29    999. 1026.  180 
#> # ℹ 53 more rows
#> # ℹ 2 more variables: winds_mean <dbl>, winds_max <dbl>

Roadmap

rp5pik 📦
├── Parser functions for
│   ├── pogodaiklimat
│   │   ├── rp5pik::rp_parse_pik ✅
│   │   └── rp5pik::rp_aggregate_pik ✅
│   ├── rp5 🔲
│   └── gmvo.skniivh 🔲
├── WMO stations coordinates  🔲
└── Rain/Snow guessing  
    └── rp5pik::rp_get_temp50 ✅

About

Download And Preprocess Russian Meteorological Data

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Packages

No packages published

Languages