Skip to content

oscaro/ds-test-tools

Repository files navigation

ds-test-tools Clojure CI Clojars Project

ds-test-tools is a small library to help test Datasplash pipelines.

Usage

[com.oscaro/ds-test-tools "0.2.1"]

Then:

(ns your.project
  (:require [ds-test-tools.core :as dt]))

The only function you need is dt/run-pipeline.

It takes inputs as Clojure data, mapping of keys to result files, and a function to call in order to build the pipeline. It dumps the input data in the appropriate files; builds a configuration map; pass it to your function; run the pipeline; collect the results; and return them to you.

Simple Usage

;; Your pipeline
(defn my-job [conf p]
  (->> p
    (ds/read-edn-file (:numbers conf))
    (ds/map inc)
    (ds/write-edn-file (str (:output conf) "/higher.edn"))))


(let [{:keys [result]} (dt/run-pipeline
                         {:numbers [1 2 3 4]}
                         {:result "higher"}
                         my-job)]
  (println (sort result))) ; '(2 3 4 5)

Specifying inputs

The inputs config map uses the same format as your configuration map. Your build function should take a map of keywords to file paths:

(defn my-job [{:keys [people houses output]} p]
  (let [people (ds/read-edn-file people p)
        houses (ds/read-edn-file houses p)]
    (->> (ds/join-by (fn [p h] [p :lives-in h])
                     [[people :house-id {:type :required}]
                      [houses :id {:type :required}]])
         (ds/write-edn-file (tio/join-path output "housing.edn")))))

The pipeline above would use the following inputs config map:

{:people [{:name "John" :house-id 1} {:name "Jane" :house-id 2} ...]
 :houses [{:id 1 :name "Red House"} {:id 2 :name "Green House"} ...]}

Changing the input/output format

By default, it assumes you use EDN as inputs and outputs. You can change that by setting the :reader (used to read outputs) and :writer (used to write inputs) keys in the optional options map:

(dt/run-pipeline
    {:reader :jsons  ; or :edns (the default)
     :writer :jsons} ; or :edns (the default)
    inputs-config
    outputs-config)

If you have mixed formats or something else than EDN/JSONS, you can also provide a function of two (readers) or three (writers) arguments:

(dt/run-pipeline
    {:reader (fn [k filename] ; k is the key in your inputs-config map
               (case k
                 :my-jsons-output (tio/read-jsons-file filename)
                 :my-edns-output (tio/read-edns-file filename)))

     :writer (fn [k filename data]
               (case k
                 :my-jsons-input (tio/write-jsons-file filename data)
                 :my-csv-input (tio/write-csv-file filename data)
                 :my-text-input (tio/write-text-file filename data)))}
    inputs-config
    outputs-config)

License

Copyright © 2018-2019 Oscaro

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.