Skip to content

Latest commit

 

History

History
142 lines (115 loc) · 5.02 KB

README.md

File metadata and controls

142 lines (115 loc) · 5.02 KB

discrete.time.markov

Applies the discrete time Markov model (Markov chain) to analyze data (both text and non-text) by computing probabilities of transitions between different states.

Inspired by my Stochastic Models in Management module, as well as my Honors Dissertation, as part of my coursework in the National University of Singapore (NUS) Business School.

Installation

You can install the released version of discrete.time.markov from my GitHub with:

library(devtools)
install_github("kaiwei-tan/discrete.time.markov")

Functionality

This package contains the following functions:

  • get.transitions: creates dataframe of all state transition probabilities
  • get.transitions.text: text version of get.transitions
  • get.transition.matrix: creates transition probability matrix
  • get.transition.matrix.text: text version of get.transition.matrix
  • get.steady.state: calculates steady-state / long-run probabilities of all states

Examples

Here is a basic example, where we generate a random sequence of states and calculate their transition probabilities:

library(discrete.time.markov)

# Generate random sequence of states
set.seed(57)
states <- sample(c('up', 'down', 'same'), 10000, replace=TRUE)

# Create transition probability matrix
get.transition.matrix(states, option='prob', output_type='matrix')
#>           down      same        up
#> down 0.3348044 0.3309797 0.3342159
#> same 0.3324275 0.3324275 0.3351449
#> up   0.3524939 0.3302920 0.3172141

Here is another example involving text. Let each word in our example sentence be considered as a state, and we calculate their transition probabilities accordingly:

sentence <- 'the quick, brown fox jumps over the lazy dog.'

# Function ignores all punctuation
# Create transition probability matrix
get.transition.matrix.text(sentence, 1, option='prob', output_type='matrix', punct='none')
#>       brown dog fox jumps lazy over quick the
#> brown     0   0   1     0  0.0    0   0.0   0
#> dog       0   1   0     0  0.0    0   0.0   0
#> fox       0   0   0     1  0.0    0   0.0   0
#> jumps     0   0   0     0  0.0    1   0.0   0
#> lazy      0   1   0     0  0.0    0   0.0   0
#> over      0   0   0     0  0.0    0   0.0   1
#> quick     1   0   0     0  0.0    0   0.0   0
#> the       0   0   0     0  0.5    0   0.5   0

Now let’s see what happens when we consider two words in a state, and also involve the period at the end (ignoring mid-sentence punctuation, each symbol counts as a state regardless of the number of words considered as a state).

# Function ignores mid-sentence punctuation (the comma)
# Create transition probability matrix
get.transition.matrix.text(sentence, 2, option='prob', output_type='matrix', punct='end')
#>             . brown fox fox jumps jumps over lazy dog over the quick brown
#> .           1         0         0          0        0        0           0
#> brown fox   0         0         1          0        0        0           0
#> fox jumps   0         0         0          1        0        0           0
#> jumps over  0         0         0          0        0        1           0
#> lazy dog    1         0         0          0        0        0           0
#> over the    0         0         0          0        0        0           0
#> quick brown 0         1         0          0        0        0           0
#> the lazy    0         0         0          0        1        0           0
#> the quick   0         0         0          0        0        0           1
#>             the lazy the quick
#> .                  0         0
#> brown fox          0         0
#> fox jumps          0         0
#> jumps over         0         0
#> lazy dog           0         0
#> over the           1         0
#> quick brown        0         0
#> the lazy           0         0
#> the quick          0         0

We can also use the output of get.transitions.text (for text data) or get.transitions (for non-text), which return a dataframe of state transitions and their probabilities, with igraph::graph_from_data_frame to get a state transition diagram:

library(magrittr)
library(igraph)

lyrics <- c('never gonna give you up', 'never gonna let you down')

# Create dataframe of state transitions and probabilities
lyrics_transitions <- get.transitions.text(lyrics, 1, option='prob', punct='none')
lyrics_transitions
#> # A tibble: 7 x 3
#>   start_state end_state  prob
#>   <chr>       <chr>     <dbl>
#> 1 give        you         1  
#> 2 gonna       give        0.5
#> 3 gonna       let         0.5
#> 4 let         you         1  
#> 5 never       gonna       1  
#> 6 you         down        0.5
#> 7 you         up          0.5

# Plot graph
igraph::graph_from_data_frame(lyrics_transitions) %>%
  plot(edge.label=lyrics_transitions$prob, vertex.color='white')