Onno Kleen edited this page Apr 2, 2019 · 27 revisions

Background

The highfrequency package is the go-to package for the analysis of intraday price data. It was created in 2012 as a merger of the packages RTAQ and realized, and it is now in need of a thorough update.

Related work

The HighFreq package provides some functionality for aggregating trades and quotes data.

Details of your coding project

  • Improve the basis of the package

    • Set up a test environment using the R-package 'testthat'
    • Clean up inconsistent naming conventions, e.g. camelCase vs. underscore vs. dot
    • Align functionality across realized covariance estimators. Some allow for multi-day calculation (like rCov), others do not (like rHYCov). This is a consequence of the merger of the two packages, after which functionality was never made homogeneous.
    • Improve documentation throughout the package, possibly by using the R-package 'roxygen2'
    • Add vignettes
  • Fix bugs

    • Update examples and functions where needed to be compatible with the millisecond data from TAQ.
    • Support for millisecond data in 'mergeTradesSameTimestamp'
  • Data

    • Allow data to be in data.table-format instead of xts
      • data.frame-like objects (e.g. data.table) allow columns to be lists. Hence, output from rCov or similar realized covariance estimators could be stored in a date-indexed format instead of in plain lists, which are probably the most common approach at present.
      • Right now, frequent object conversion is necessary when working in a data.frame-like environment: data.frame -> xts -> data.frame -> xts -> ... This is a) cumbersome and b) a source of considerable overhead that slows down the analysis.
    • Optional: Support for high-frequency data providers other than TAQ, e.g. Tick Data or LOBSTER
    • Implement direct loading of the Realized Library of the Oxford-Man Institute.
      • Because the current version of the Realized Library is in 'long' format, it would be a good data set for vignettes that describe a data.frame/data.table/dplyr approach to financial analysis using the highfrequency package.
      • Because the realized measures are already calculated in this data set, users will first and foremost employ the models implemented in the highfrequency package, e.g. HAR, HEAVY and possibly others to be added through GSoC, rather than its data-cleaning functions. These models are among the ones most often applied to this data set. Hence, direct loading will be of great help to researchers even though no data cleaning is necessary.
  • Features

    • Allow external regressors, e.g. the VIX, in models that are implemented in the package, e.g. harModel
    • Add generic functions, e.g. predict, to models included in the package, e.g. for HEAVY models
    • Optional: Simplify model evaluation by integrating models into broom
    • Optional: Include a spot drift estimation function (contact with the author of DriftBurstHypothesis has already been established)
    • Optional: Include a spot covariation estimator
    • Use C++ implementations via Rcpp where current functions are slow
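
To make the "external regressors" and "generic functions" items above more concrete, here is a minimal base-R sketch of a HAR-type regression with one extra regressor and an S3 predict method. It is only an illustration under stated assumptions: the names trailing_mean, har_design and har_sketch are hypothetical and are not part of the highfrequency package, the interface of harModel may differ, and the data are simulated (the variable called vix is random noise standing in for an external regressor).

```r
# Hypothetical helper names; none of this is the highfrequency API.

# Trailing mean over the last k observations (NA until k values are available).
trailing_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# HAR regressors: daily value, weekly (5-day) and monthly (22-day) averages,
# plus an optional external regressor observed on the same days.
har_design <- function(rv, x = NULL) {
  d <- data.frame(rv_d = rv,
                  rv_w = trailing_mean(rv, 5),
                  rv_m = trailing_mean(rv, 22))
  if (!is.null(x)) d$x <- x
  d
}

# Fit next-day realized variance on today's regressors by OLS.
har_sketch <- function(rv, x = NULL) {
  design <- har_design(rv, x)
  n <- length(rv)
  train <- data.frame(y = rv[-1], design[-n, , drop = FALSE])
  fit <- lm(y ~ ., data = train)  # rows with NA averages are dropped by lm
  structure(list(fit = fit, last = design[n, , drop = FALSE]),
            class = "har_sketch")
}

# S3 method so that the generic predict() works on the fitted object.
# Without newdata it returns the one-step-ahead forecast.
predict.har_sketch <- function(object, newdata = NULL, ...) {
  if (is.null(newdata)) newdata <- object$last
  unname(predict(object$fit, newdata = newdata, ...))
}

# Simulated inputs, for illustration only.
set.seed(1)
rv  <- abs(rnorm(300))
vix <- 20 + rnorm(300)

model    <- har_sketch(rv, x = vix)
forecast <- predict(model)
```

Keeping the regressors of the last observed day inside the fitted object is one possible design choice: it lets predict(model) produce a forecast without the user rebuilding the weekly and monthly averages by hand.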

Expected impact

The changes to the package reflect the requests of different users of the highfrequency package. Addressing those needs will thus be useful for the R community.

Mentors

Kris Boudt, Dirk Eddelbuettel, Scott Payseur.
