Update README.md
Cleaned up the intro by removing documentation that's duplicated below.
Alex Olivas authored Jan 14, 2022
1 parent 4273b3c commit 3a98fc0
Showing 1 changed file with 19 additions and 5 deletions.
```
$ coverage run --source=voka -m unittest
```

# Introduction

Nearly all modern HPC applications execute in a distributed system.  Whether the system is a commercial cloud, public grid,
or private cluster, application execution represents a significant cost of time, energy, and money.  Continuous Integration
(CI) systems have long served as the first stage of code validation.  More recently, Continuous Delivery (CD) systems have
become popular as a means to autonomously deliver validated software.  A new stage in this pipeline is emerging, called
Continuous Benchmarking (CB), which attempts to look beyond the traditional unit and integration tests common to all
CI/CD systems. The technique presented here is currently used by IceCube in a nightly CI/CD/CB system to validate its
physics codebase before release and subsequent petabyte-scale mass production.

The method described here was inspired by F. Porter's paper[1], which describes various methods for testing the consistency
of two histograms. It can be considered an extension: instead of asking whether two histograms are consistent with each
other, we ask whether one test histogram is consistent with an ensemble of N benchmark histograms.
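As a rough sketch of the framing only (the toy observable, sample sizes, and N below are invented for illustration, and
this is not voka's API), the inputs are a single test histogram and an ensemble of N benchmark histograms of the same
quantity:

```python
import numpy as np

rng = np.random.default_rng(42)
N_BENCHMARKS = 10                      # N validated production runs
BINS = np.linspace(0.0, 10.0, 21)      # common binning for all histograms

# Each benchmark histogram comes from an independent, previously validated run.
benchmark_ensemble = [
    np.histogram(rng.exponential(2.0, size=5_000), bins=BINS)[0]
    for _ in range(N_BENCHMARKS)
]

# The test histogram comes from the candidate release.
test_histogram = np.histogram(rng.exponential(2.0, size=5_000), bins=BINS)[0]

# The question the method answers: is `test_histogram` consistent with the
# ensemble as a whole, rather than with any one benchmark in isolation?
```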

## Advantages
- No prior detailed knowledge of the distribution is required.
- Works on distributions of naturally human-interpretable metrics (i.e. histograms).

## Difficulty with Traditional Methods
The chi^2 test statistic computed between two histograms does not itself follow a chi^2 distribution; it has long tails,
so a p-value cannot easily be derived from it with standard tools. Instead, the test-statistic distribution must be
determined empirically for each physical distribution, which is impractical. Fitting has the same problem[3]: it is
impractical, and the distributions often do not fit analytic functions well, since there is no guarantee they were
sampled from a function with continuous first derivatives.
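A minimal simulation sketch of that point, under assumed conditions (an exponential toy observable, 20 bins, and 1,000
events per histogram are all invented for illustration), shows what "empirically determining the test-statistic
distribution" entails for even a single distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
BINS = np.linspace(0.0, 10.0, 21)

def chi2_between(h1, h2):
    """Symmetric chi^2-style statistic between two histograms."""
    mask = (h1 + h2) > 0
    return np.sum((h1[mask] - h2[mask]) ** 2 / (h1[mask] + h2[mask]))

# Build the statistic's distribution empirically under the null hypothesis:
# both histograms are drawn from the same underlying process.
null_samples = [
    chi2_between(
        np.histogram(rng.exponential(2.0, 1_000), bins=BINS)[0],
        np.histogram(rng.exponential(2.0, 1_000), bins=BINS)[0],
    )
    for _ in range(2_000)
]

# Compare the empirical 95th percentile with the tail probability a naive
# chi^2 lookup would assign to the same value; sparsely populated bins are
# where the two diverge most.
q95 = np.percentile(null_samples, 95)
print(f"empirical 95th percentile of the statistic: {q95:.1f}")
print(f"naive chi^2 tail probability at that value: {stats.chi2.sf(q95, df=len(BINS) - 1):.3f}")
```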

In practice, physical distributions are not guaranteed to be Poissonian [2]. IceCube has non-Poissonian noise [?].

## Basics
The method presented here uses traditional test statistics, such as chi^2, Kolmogorov-Smirnov (KS), and Anderson-Darling
(AD), as inputs to an outlier detection algorithm (specifically, a Local Outlier Factor calculation) to determine whether
the test histogram is consistent with the benchmark ensemble.
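A minimal sketch of that pipeline, assuming scipy and scikit-learn are available (the toy samples, statistic choices, and
LOF settings are illustrative, not voka's actual implementation), could look like this:

```python
import numpy as np
from scipy import stats
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
BINS = np.linspace(0.0, 10.0, 21)

def draw_samples(scale=2.0, n=2_000):
    """Toy observable standing in for a real physics distribution."""
    return rng.exponential(scale, size=n)

def feature_vector(test_samples, benchmark_samples):
    """Summarize one test/benchmark comparison with several traditional test statistics."""
    ks = stats.ks_2samp(test_samples, benchmark_samples).statistic
    ad = stats.anderson_ksamp([test_samples, benchmark_samples]).statistic
    h_test, _ = np.histogram(test_samples, bins=BINS)
    h_bench, _ = np.histogram(benchmark_samples, bins=BINS)
    mask = (h_test + h_bench) > 0
    chi2 = np.sum((h_test[mask] - h_bench[mask]) ** 2 / (h_test[mask] + h_bench[mask]))
    return [ks, ad, chi2]

# Benchmark ensemble: samples from previously validated runs.
benchmarks = [draw_samples() for _ in range(10)]

# Benchmark-vs-benchmark comparisons define the "inlier" cloud in statistic space.
X_bench = [feature_vector(benchmarks[i], benchmarks[j])
           for i in range(len(benchmarks))
           for j in range(len(benchmarks)) if i != j]

# The test run is compared against every benchmark in the same way.
test = draw_samples(scale=2.2)   # deliberately shifted to mimic a regression
X_test = [feature_vector(test, b) for b in benchmarks]

# Local Outlier Factor flags test comparisons that fall outside the inlier cloud.
lof = LocalOutlierFactor(n_neighbors=10, novelty=True).fit(X_bench)
print("LOF scores for the test run:", lof.score_samples(X_test))
```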

### Getting Started
