
Sandmark 2.0


MVP

Status       Feature
IN_PROGRESS  Minimal dependencies using dunemark
DONE         Use dune natively instead of sys_dune hack in Makefile
DONE         Package and upstream orun and rungen to opam.ocaml.org
DONE         Support ITER variable for multiple benchmark runs
DONE         Dynamic package override support with use of dev.opam file
DONE         Add meta-header in .bench result file
DONE         List available benchmark tags
DONE         Classify benchmarks based on time tags
DONE         Integration and execution with current-bench OCurrent pipeline

Future

Status  Feature
TODO    User input configuration
TODO    ocaml-ci deployment pipeline for developer branches/commits and Upstream CI
TODO    UI Dashboard (Sandmark nightly and current-bench)
TODO    Analytics Dashboard (AI/ML)

Motivation

The following properties or characteristics are the objectives for Sandmark 2.0:

  • Types The user input is provided to the benchmark pipeline, and hence must be type-checked. We do not want erroneous input cascading through the various phases of the execution runs, so it is essential to validate the input before testing. Dependent types are useful here, as the appropriate parameters can be verified while compiling the specification (a sketch of a typed specification follows this list).

  • Reproducibility The specification for the experiments must contain all the information required to reproduce them in the same or a similar environment. This helps the researcher keep an account of the changes made to the set of experiments during report analysis, and to make further test runs with the required changes. Reproducibility also allows a third party to verify and validate the results.

  • Repeatability The complete, self-contained specification for the experiments allows the scheduler to make multiple runs, so that an average result can be computed rather than relying on a single run.

  • Separation of Concerns A layered architecture with well-defined interfaces is essential, so that each component performs its function and does it well. At the interface of each layer, the data format should be clearly specified, so that the functional components can be replaced as needed.

  • Categorization It is useful to classify benchmarks into multiple categories. The classification could be based on the algorithms the researcher is interested in, the particular hardware being tested, the compiler feature being evaluated, the time a benchmark takes to run, and so on. Tags help distinguish the different benchmarks (see the typed specification sketch after this list).

  • OCaml We would like to leverage the packages and tools available in the OCaml ecosystem for implementing the various components. The earlier version of Sandmark used Makefiles, Bash, Python and OCaml, and we would like to converge on OCaml for all our needs. We can still allow moving parts in the system, but in a controlled environment.

  • Parallelization The execution pipeline needs to be parallelizable, so that builds and tests complete faster on modern multiprocessor hardware. Using OCaml for the various moving parts also allows us to use Multicore OCaml, which should drastically reduce build times and speed up report generation (a scheduling sketch also follows this list).
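
As a minimal sketch of the Types and Categorization goals, the following hypothetical OCaml types describe a benchmark specification whose shape is checked by the compiler, with a small runtime validation step for the remaining constraints. The names (experiment, benchmark, time_tag, and so on) are illustrative assumptions and not the actual Sandmark 2.0 interface; the sketch also uses ordinary OCaml types rather than dependent types.

(* Hypothetical, illustrative types for a benchmark specification;
   none of these names come from the actual Sandmark code base. *)

(* Time-based tags, mirroring the "classify benchmarks based on
   time tags" feature in the MVP table. *)
type time_tag = Lt_1s | Lt_10s | Lt_100s | Gt_100s

type tag =
  | Time of time_tag
  | Macro
  | Micro
  | Custom of string

type compiler_variant = {
  name : string;                    (* e.g. "5.0.0+trunk" *)
  configure_options : string list;
  ocamlrunparam : string option;
}

type benchmark = {
  bench_name : string;
  executable : string;
  arguments : string list;
  tags : tag list;
}

type experiment = {
  variants : compiler_variant list;
  benchmarks : benchmark list;
  iterations : int;                 (* the ITER count for repeatability *)
}

(* Constraints that the type system alone does not capture. *)
let validate (e : experiment) : (experiment, string) result =
  if e.iterations < 1 then Error "iterations must be at least 1"
  else if e.benchmarks = [] then Error "no benchmarks selected"
  else Ok e

Selecting benchmarks by tag then becomes an ordinary list filter, and malformed input is rejected before any compiler build or benchmark run is started.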
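
For the Parallelization goal, the sketch below fans independent jobs out over OCaml 5 domains. The job representation and run_job function are placeholder assumptions; a real scheduler would build compiler variants or run benchmarks instead of shelling out, and would bound the number of domains to the available cores rather than spawning one per job.

(* Sketch: run independent jobs in parallel using the OCaml 5 Domain module. *)

let run_job (cmd : string) : int =
  (* Placeholder: a real scheduler would build a compiler variant or
     execute one benchmark here. *)
  Sys.command cmd

let run_parallel (jobs : string list) : int list =
  (* One domain per job, purely for illustration. *)
  jobs
  |> List.map (fun job -> Domain.spawn (fun () -> run_job job))
  |> List.map Domain.join

let () =
  run_parallel [ "echo building variant A"; "echo building variant B" ]
  |> List.iter (fun code -> Printf.printf "exit code: %d\n" code)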

These characteristics can be worked on in parallel, if required, but prioritization and reviews of future changes are essential to keep improving the system. The design should be flexible enough that any component can be replaced with a newer implementation to improve the overall performance and efficiency of the system.

Architecture

The proposed design has a layered architecture as shown in the illustration:

At the core is the Multicore OCaml compiler setup and environment. Different variants and compiler options can be provided for testing. At the next level up is the list of benchmarks, and possible combinations of them, supplied to evaluate the compiler. The scheduler is responsible for executing the benchmark runs based on the chosen design of experiments for different hardware requirements and compilers. The visualization module provides a means to analyze the different test results and to generate reports. Finally, the AI/ML module is used to suggest enhancements and improvements to the design and the system.
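
One way to make the layer boundaries concrete is with module signatures. The signatures below are a sketch under the assumption of four layers (compiler setup, benchmarks, scheduler, visualization); they are not the actual Sandmark 2.0 interfaces.

(* Illustrative signatures for the layers; each layer is defined only
   by its interface, so an implementation can be swapped out. *)

module type Compiler_setup = sig
  type variant                                  (* a compiler variant plus options *)
  val build : variant -> (unit, string) result
end

module type Benchmarks = sig
  type suite
  val select : tags:string list -> suite        (* pick benchmarks by tag *)
end

module type Scheduler = sig
  type suite
  type run_result
  val run : iterations:int -> suite -> run_result list
end

module type Visualization = sig
  type run_result
  val report : run_result list -> unit          (* notebooks, dashboards, ... *)
end

Because the data format at each boundary is fixed by the signature, a different scheduler or visualization back end can be substituted without touching the other layers, which is the Separation of Concerns property described in the Motivation.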

Visualization

JupyterHub notebooks for the sequential and parallel benchmarks are available for analyzing the benchmark results. In the future, we should be able to create a JSON template for Grafana, where the data is fetched from the archival system and processed with the notebooks. An interactive option should be possible with the notebooks, along with nightly report generation. AI/ML techniques are also to be incorporated into the pipeline for a deeper understanding of the system.

Two important features are the Sandmark nightly report generation for the N and N+1 commits using the JupyterHub notebook, and a Grafana dashboard with information on macro-, micro-, and nano-benchmark results.
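
As a small, hedged example of how a notebook or dashboard exporter might consume a .bench file, the snippet below assumes that each line of the file is one JSON record containing "name" and "time_secs" fields; that assumption about the result format, the example file path, and the use of the yojson opam package are illustrative rather than prescriptive.

(* Sketch: summarise a .bench file where each line is one JSON record.
   The field names "name" and "time_secs" are assumptions. *)

let read_lines path =
  let ic = open_in path in
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> close_in ic; List.rev acc
  in
  loop []

let print_mean_times path =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun line ->
      let open Yojson.Safe.Util in
      let json = Yojson.Safe.from_string line in
      let name = json |> member "name" |> to_string in
      let time = json |> member "time_secs" |> to_number in
      let prev = try Hashtbl.find tbl name with Not_found -> [] in
      Hashtbl.replace tbl name (time :: prev))
    (read_lines path);
  Hashtbl.iter
    (fun name times ->
      let mean =
        List.fold_left ( +. ) 0.0 times /. float_of_int (List.length times)
      in
      Printf.printf "%s: %.4f s (mean of %d runs)\n" name mean (List.length times))
    tbl

let () = print_mean_times "results/example.bench"   (* hypothetical path *)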

Plan

The proposed implementation can be worked on in multiple phases, in a bottom-up order, to ensure that the solution is feasible. The different phases are listed below.

Phase 0 The end-to-end pipeline test is to be performed with the proposed tools to ensure that it is possible to implement the solution, from building the compiler, to generating a set of benchmark results.

Phase 1 Different Multicore OCaml compiler variants and options need to be tested to see if they can be built with ocaml-compiler.

Phase 2 Benchmarks along with their dependencies need to be built on this new environment, and .bench result files need to be generated and verified.

Phase 3 User configuration, with examples and documentation, to design an experiment and provide the necessary and sufficient input for running the benchmarks (a configuration sketch follows this plan).

Phase 4 Study of OCurrent, OCaml-CI and OCluster to be able to orchestrate deployment of the compiler setup, along with the required benchmarks provided by the user and to generate output results.

Phase 5 The use of various visualization tools to project the benchmark results to make it meaningful to the user for analysis and introspection.

Phase 6 The use of AI/ML techniques to provide improvements to the system and its configuration, for performance and optimization.

The steps in Phases 1-5 can be repeated when user requirements change or when there is room for improvement over time.
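
For Phase 3, the value below sketches what a user-supplied experiment configuration could look like, reusing the hypothetical experiment type and validate function from the Motivation section (assumed to be in scope). The variant name, configure option, and benchmark entry are purely illustrative.

(* Hypothetical user configuration for one experiment, built from the
   illustrative types sketched in the Motivation section. *)
let my_experiment : experiment =
  {
    variants =
      [ { name = "5.0.0+trunk";
          configure_options = [ "--enable-frame-pointers" ];
          ocamlrunparam = None } ];
    benchmarks =
      [ { bench_name = "example_bench";                (* illustrative entry *)
          executable = "benchmarks/example/main.exe";
          arguments = [ "1000" ];
          tags = [ Macro; Time Lt_10s ] } ];
    iterations = 5;                                    (* the ITER count *)
  }

let () =
  match validate my_experiment with
  | Ok _ -> print_endline "configuration accepted"
  | Error msg -> prerr_endline ("invalid configuration: " ^ msg)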

References

$ git clone https://github.com/shakthimaan/sandmark.git
$ cd sandmark
$ git checkout 2.0.0-alpha+001