---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# techfactor

<!-- badges: start -->
[![R build status](https://github.com/shrektan/techfactor/workflows/R-CMD-check/badge.svg)](https://github.com/shrektan/techfactor/actions)
[![codecov](https://codecov.io/gh/shrektan/techfactor/branch/master/graph/badge.svg)](https://codecov.io/gh/shrektan/techfactor)
<!-- badges: end -->

## The goal

The main purpose of this package is to provide a C++ "framework" for implementing complex technical stock factors, e.g., ["WorldQuant 101 Alphas"](https://arxiv.org/pdf/1601.00991.pdf) and ["GTJA 191 Alphas" (the Chinese title of this research paper is "基于短周期价量特征的多因子选股体系")](https://guorn.com/static/upload/file/3/134065454575605.pdf), in an **efficient, maintainable and correct** way.

This package currently implements all 191 alphas documented in GTJA's research paper. We plan to make the package extensible in the future so that users can easily implement their own definitions by taking advantage of the C++ framework.

## The difficulty

Most of these technical factors are generated by machine (through data mining), so their formulas are often nested several layers deep. For example, the formulas of the "alpha 87" and "alpha 160" factors in "GTJA 191 Alphas" look like this:

```
Alpha87: 
((RANK(DECAYLINEAR(DELTA(VWAP, 4), 7)) + 
TSRANK(DECAYLINEAR(((((LOW * 0.9) + 
(LOW * 0.1)) - VWAP) / (OPEN - ((HIGH + LOW) / 2))), 11), 7))
* -1)
Alpha160:
SMA((CLOSE<=DELAY(CLOSE,1)?STD(CLOSE,20):0),20,1)
```

As you can see, the formulas are complicated in several ways:

- They use not only the historical prices of the individual stock but also information about its peers at each point in time (e.g., the cross-sectional `RANK`).
- The formulas are deeply nested, so it is hard for a researcher to write the implementation correctly.
- Some functions can't be expressed directly in code, so implementing a formula may require bloated and thus error-prone code.
- It's difficult to know how much historical data a given formula requires, which makes optimization and `NA` handling harder.

**What's more, the efficiency of the implementation is very important**: because the effectiveness of technical factors declines quickly, we need factor values at daily frequency. At the time of writing, there are more than 3,000 stocks in the A-share market. Even if we could perform 100 calculations per second, it would take 10.5 hours to compute five years of historical values for a single factor (5 * 252 * 3000 / 3600 / 100).

However, given the complexity of the formulas, without a framework it's easy to use future information by accident (very dangerous in quant research) or to write incorrect code, while with a framework it's difficult to keep the implementation efficient.

## The solution

This package strives to provide a framework so that you can write those alpha formulas in an **efficient, maintainable and correct** way. The two alphas above can be implemented with C++ code like that shown below.

The first one still looks complicated, but if you check the code carefully, you can see that **the code is very close to the original formula**. In addition, the framework takes over the data handling and thus **prevents you from using future data accidentally**. More importantly, **it runs fast, thanks to the zero-cost abstractions that C++ provides** (it takes *less than 1 minute* to calculate the two alphas for all A-share stocks over the past five years, 201501 - 201912, on a regular PC using three cores).

### The C++ code

```cpp
Alpha_mfun alpha087 = [](const Quotes& qts) -> Timeseries {
  auto decay_linear1 = [](const Quote& qt) {
    return decaylinear(
      qt.ts<double>(7, [](const Quote& qt){ return delta(qt.ts_vwap(4)); })
    );
  };
  auto decay_linear2 = [](const Quote& qt) {
    // Numerator: ((LOW * 0.9) + (LOW * 0.1)) - VWAP
    auto part1 = qt.ts_low(11) * 0.9 + qt.ts_low(11) * 0.1 - qt.ts_vwap(11);
    // Denominator: OPEN - ((HIGH + LOW) / 2)
    auto part2 = qt.ts_open(11) - (qt.ts_high(11) + qt.ts_low(11)) / 2.0;
    return decaylinear(part1 / part2);
  };
  auto ts_rank = [decay_linear2](const Quote& qt) {
    return tsrank(qt.ts<double>(7, decay_linear2));
  };
  Timeseries part1 = rank(qts.apply(decay_linear1));
  Timeseries part2 = qts.apply(ts_rank);
  return (part1 + part2) * -1.0;
};

Alpha_fun alpha160 = [](const Quote& qt) -> double {
  auto fun = [](const Quote& qt) {
    return (qt.close() <= qt.close(1)) ? stdev(qt.ts_close(20)) : 0.0;
  };
  return sma(qt.ts<double>(20, fun), 1);
};
```
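One way to see how a framework can rule out look-ahead is a point-in-time view that only exposes data at or before "today". The sketch below is a simplified illustration under that assumption: the class `CloseView` and its members are hypothetical and are not the package's `Quote` class. It also records the deepest lookback requested, which hints at how the required history length of a formula could be discovered.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical point-in-time view; NOT the package's Quote class.
// It stores a close-price series (oldest -> newest) plus the index of
// "today" and never hands out values after that index.
class CloseView {
 public:
  CloseView(std::vector<double> close, std::size_t today)
      : close_(std::move(close)), today_(today) {
    if (today_ >= close_.size()) {
      throw std::out_of_range("today is outside the series");
    }
  }

  // Close price `delay` days before today. The delay is unsigned, so a
  // future date cannot be expressed as a valid argument.
  double close(std::size_t delay = 0) const {
    if (delay > today_) throw std::out_of_range("not enough history");
    max_lookback_ = std::max(max_lookback_, delay);
    return close_[today_ - delay];
  }

  // The last n closes up to and including today, oldest first.
  std::vector<double> ts_close(std::size_t n) const {
    if (n == 0 || n > today_ + 1) {
      throw std::out_of_range("not enough history");
    }
    max_lookback_ = std::max(max_lookback_, n - 1);
    const auto first =
        close_.begin() + static_cast<std::ptrdiff_t>(today_ + 1 - n);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(n));
  }

  // Deepest delay requested so far through this view.
  std::size_t max_lookback() const { return max_lookback_; }

 private:
  std::vector<double> close_;
  std::size_t today_;
  mutable std::size_t max_lookback_ = 0;
};
```

Because a formula written against such a view can only ask for "today" or earlier, an off-by-one mistake surfaces as an exception instead of silently leaking future prices into the factor value.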

## Example

```{r example}
library(techfactor)
head(tf_quote)
(from_to <- range(tail(tf_quote$DATE)))

factors <- tf_reg_factors()
str(factors)
(normal_factor <- attr(factors, "normal")[1])
(panel_factor <- attr(factors, "panel")[1])

qt <- tf_quote_xptr(tf_quote)
tf_qt_cal(qt, normal_factor, from_to)

head(tf_quotes[1])
qts <- tf_quotes_xptr(tf_quotes)
tf_qts_cal(qts, normal_factor, from_to)
tf_qts_cal(qts, panel_factor, from_to)
```

## Session info

```{r session_info}
xfun::session_info(packages = 'techfactor')
```