Skip to content

ICDE 2019 - KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping

License

Notifications You must be signed in to change notification settings

DSM-fudan/KV-match

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KV-match

A Subsequence Matching Approach Supporting Normalization and Time Warping

Demo

Prerequisites

Requirement

  • Java Runtime Environment 1.8+
  • HBase 1.1.5+ (only for HBase table version)

Build from source

Clone the repository:

$ git clone https://github.com/dsm-fudan/KV-match.git

Build the project:

$ cd KV-match
$ mvn package

The executable program will be generated in the target directory.

Generate data and put to HBase

Execute the Data Generator, and type in the data length n.

$ java -cp KV-match-1.0-SNAPSHOT.jar cn.edu.fudan.dsm.kvmatch.DataGenerator
n = 1000000
Put to HBase? = [true/false]

The data will be generated by combining three synthetic time series generators.

It will be stored in a local file named data-{n}, and put to a HBase table named KVmatch:data-{n} if you enable the HBase table version.

Build KV-index on the data

Execute the Index Builder, and type in the length n of data you just generated.

$ java -cp KV-match-1.0-SNAPSHOT.jar cn.edu.fudan.dsm.kvmatch.IndexBuilder
n = 1000000

Subsequence matching by KV-matchDP

RSM-ED

cn.edu.fudan.dsm.kvmatch.QueryEngine

RSM-DTW

cn.edu.fudan.dsm.kvmatch.QueryEngineDtw

cNSM-ED

cn.edu.fudan.dsm.kvmatch.NormQueryEngine

cNSM-DTW

cn.edu.fudan.dsm.kvmatch.NormQueryEngineDtw

Execute the Query Engine for specific problem, and type in the parameter offset, length and epsilon. You can query multiple times, and exit with Offset = -1.

For example, we demonstrate an RSM-ED query here.

$ java -cp KV-match-1.0-SNAPSHOT.jar cn.edu.fudan.dsm.kvmatch.QueryEngine
n = 1000000
Offset = 123456
Length = 8192
Epsilon = 10

It will fetch the length-length subsequence start from offset in the data series, and use it as the query series to execute the query.

You will see the results like following.

Best: 123456, distance: 0.0
T: 176.0 ms, T_1: 86.0 ms, T_2: 21.0 ms, #candidates: 147.0, #answers: 1.0

That means the query process costs 176 ms in total, 86 ms in phase 1 (index-probing), and 21 ms in phase 2 (post-processing). The number of candidates after index-probing phase is 147, and the number of answers actually fall within the distance threshold epsilon is 1.

Default settings

  • KV-matchDP with Σ = {25, 50, 100, 200, 400}
  • Index building: d = 0.5 and α = 0.8
  • Optimizations enabled:
    • Window reduction
    • Window reordering
    • Incremental index visiting

If you want, you can change them in the code.

License

Apache-2.0 © DSM@fudan wujysh

Releases

No releases published

Packages

No packages published

Languages