From 40c297e5bae03234d0d8ada4dbcf672b1e05e05d Mon Sep 17 00:00:00 2001
From: David Dias <daviddias.p@gmail.com>
Date: Sat, 2 Jan 2016 19:59:29 +0100
Subject: [PATCH 1/3] initial commit for data importing spec

---
 data-importing/README.md           |  80 +++++++++++++++++++++++++++++
 data-importing/graphs/arch.monopic | Bin 0 -> 1598 bytes
 data-importing/graphs/arch.txt     |   8 +++
 3 files changed, 88 insertions(+)
 create mode 100644 data-importing/README.md
 create mode 100644 data-importing/graphs/arch.monopic
 create mode 100644 data-importing/graphs/arch.txt
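
Reviewer note: a minimal, non-normative sketch of the chunker, layout and importer flow drawn in the Architecture diagram of this patch. It is in Python only for brevity. Every name in it (`size_chunker`, `balanced_layout`, `importer`, `read_back`), the dict-based node shape, the SHA-256 node ids, and the default chunk size and fanout are assumptions made for illustration. None of them comes from the spec or from go-chunk.

```python
import hashlib
from typing import Callable, Iterable, Iterator


def size_chunker(data: bytes, size: int = 4) -> Iterator[bytes]:
    """Hypothetical universal chunker: fixed-size pieces, deterministic on the input."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def node_id(payload: bytes) -> str:
    """Assumed addressing scheme for this sketch: hex SHA-256 of the payload."""
    return hashlib.sha256(payload).hexdigest()


def balanced_layout(chunks: Iterable[bytes], fanout: int = 2) -> dict:
    """Hypothetical balanced layout: leaves hold chunks, parents link to children.

    fanout must be at least 2, otherwise the tree never narrows to one root.
    """
    level = [{"id": node_id(c), "data": c, "links": []} for c in chunks]
    if not level:
        # Empty input still produces a single (empty) leaf so there is a root.
        level = [{"id": node_id(b""), "data": b"", "links": []}]
    while len(level) > 1:
        parents = []
        for start in range(0, len(level), fanout):
            children = level[start:start + fanout]
            joined = "".join(child["id"] for child in children).encode()
            parents.append({"id": node_id(joined), "data": b"", "links": children})
        level = parents
    return level[0]


def importer(
    data: bytes,
    chunker: Callable[[bytes], Iterable[bytes]] = size_chunker,
    layout: Callable[[Iterable[bytes]], dict] = balanced_layout,
) -> dict:
    """Importer: feed the data through a chunker, then hand the chunks to a layout."""
    return layout(chunker(data))


def read_back(node: dict) -> bytes:
    """Walk the DAG depth-first, left to right, and rebuild the original bytes."""
    if not node["links"]:
        return node["data"]
    return b"".join(read_back(child) for child in node["links"])
```

With these assumptions, `read_back(importer(b"hello ipfs world"))` returns the original bytes, and importing the same bytes twice yields the same root id, which is the determinism the chunker description asks for.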

diff --git a/data-importing/README.md b/data-importing/README.md
new file mode 100644
index 00000000..65315c8c
--- /dev/null
+++ b/data-importing/README.md
@@ -0,0 +1,80 @@
+RFC - IPFS Data Importing
+=========================
+
+Authors:
+
+Reviewers:
+
+
+> tl;dr: This document describes how data is chunked and represented inside the IPFS network.
+
+* * *
+
+# Abstract
+
+The IPFS Data Importing spec describes the importing mechanisms used by IPFS, which can also be reused by other systems. An importing mechanism is composed of one or more chunkers and data format layouts.
+
+# Status of this spec
+
+> **This spec is a Work In Progress (WIP).**
+
+# Organization of this document
+
+This RFC is organized by chapters described on the *Table of contents* section.
+
+# Table of contents
+
+- [%N%. Introduction]()
+- [%N%. Requirements]()
+- [%N%. Architecture]()
+- [%N%. Interfaces]()
+- [%N%. Implementations]()
+- [%N%. References]()
+
+# Introduction
+
+### Goals
+
+- Have a set of primitives to digest, chunk and parse files, so that different chunkers can be replaced/added without any trouble.
+
+# Requirements
+
+# Architecture
+
+```bash
+              ┌───────────┐        ┌──────────┐
+┌──────┐      │           │        │          │        ┌────────────────┐
+│ DATA │━━━━━▶│  chunker  │━━━━━━━▶│  layout  │━━━━━━━▶│ DATA formatted │
+└──────┘      │           │        │          │        └────────────────┘
+              └───────────┘        └──────────┘
+             ▲                                 ▲
+             └─────────────────────────────────┘
+                          Importer
+```
+
+- `chunkers or splitters`: algorithms that read a stream and produce a series of chunks. For our purposes, they should be deterministic on the stream: the same stream always yields the same chunks. They are divided into:
+  - `universal chunkers`, which work on any stream given to them (e.g. size, rabin). They should work roughly equally well across inputs.
+  - `specific chunkers`, which work on specific types of files (tar splitter, mp4 splitter, etc.). They are special purpose, but very useful for big files and special types of data.
+- `layouts or topologies`: graph topologies (e.g. balanced vs. trickledag vs. ext4).
+- `importer`: a process that reads in some data (single file, set of files, archive, db, etc.) and outputs a DAG. It may use many chunkers and many layouts.
+
+# Interfaces
+
+#### chunker (splitters)
+
+#### layout (topologies)
+
+#### importer
+
+# Implementations
+
+#### chunker
+
+- go-chunk https://github.com/jbenet/go-chunk
+
+#### layout
+
+#### importer
+
+# References
+
diff --git a/data-importing/graphs/arch.monopic b/data-importing/graphs/arch.monopic
new file mode 100644
index 0000000000000000000000000000000000000000..f4185c9637730330eeca0cc5f2b85b141e2e827a
GIT binary patch
literal 1598
zcmV-E2EqCNO;1iwP)S1pABzY8000000t4k*OLL<}5dJG$oN*QN;^A8=xuhzmRBnlj
zLSiHlBX9ukuH({w&%7D|0<RDnJGKv?u{_h$@BVs5&rZ4eo3ESNxqX(y8!QK_vPyXs
zH`#Zd$4%wjI*#i!{Cg88dA2R~yl9+T<T~rJXv8>P<-ACF>fAQfBR4a%qRvtdU)8%L
zEx&1snq1N}$$FQFr7)+r>&om}3FeYw`<N$HtS30P)O9vhvgdJ9taoJ<S9~KTOYVK*
z4Cv&wrjwWJd&%J@$!oD!TCO|GkeIN`(v%nN1Sx~~@2~rZvTAtsS0M=}>-%k0J{D=*
zJUs}$u~0#?6W?ri?ZVC^UP$qdYDu=vttURk+Bg$mW3N?Ax!KfQZs@t51LD=$zhV<7
zbR~cF<4VjGD+Sv^ra$l6we2>B1F*_~W=}1ZQA^*_=F38CK9pT^v@WYcTA*WlwX_EP
zhhl13W!p?>(z#*o4WjMhm-t!L9VZ*;&4`)F{pzfFQV#G?ieXk3ppK*Y)qui)!N7+>
z;M}#Fr%Am#ZujgwC<8sJcPZY7p@#CF+T`+9?b<rpF{rdxwMo`I7K{!B>PrqJJI-~C
z5~YuQ`IYOK^R#hdo@PzQ;$4ERDI!>2f8$&Wrv3!YDJ!0a*T9_)Mr%?#c~<a&$_xX8
z=xQfU=yVWk6Mj4<=~TvnkYas1V8ct@JmuOft#cCQBJ3Qa(m4HZV)_>uZ=()Iz>A3T
zq8(4txG0-M>ywP-l^6V==38l2@pbyrE+diO3BG;u(AkkltB>atr3kGSG7(g#SsGaE
zu@_3NP7>cRp&QRZ-q{t!`yW;mgsxms-mZ=y;<7g>B)@Bo=Q^;?t0c(pRubagEU6>?
zJaa}NQC3$23<D@F2ucfLuTV(P+ZPfP{IHOCFBg(-Hc<HTG*;5AQZ_Y|x<v|iJYD|L
z8f=p%Tfrncw1$~W+CP&tM%G9pRauWy!?8>P_ITEcHvuD^buIIZBRx$RR6O_M;WEid
z-!xVd{;nLGxYC)adNcL*)jLq{P`xAZzB?G+h-b}Mm@E?e7xCYp#Pf&xJQWoyVTJ-9
z1;h4Sg{koGBo#v+s|{Y-QlSYQ_TB7`JaCophg=^QA<1GUwI%a@-~f9;Pw&wLOG0=p
zwpm8egjAHykP3}SMd%_=$YtRd&y&o`CJ-9&e7Qyd;OJtF`=D{U$kR`RIQp<ZVG=}_
zIvQom9gTd5hJg&aSvJuy;Hidz0gzw-Bp3h*20(&AkIDe|vj7I+NSlGL)Mgx63QriI
za~lsIpVN5IMU4lGpojNhfabI!B)lm3`RJl$%(D~tG_Y2@XmrJ@Ea1Ay$(4BaCQmm8
zprmJ4?bH11I>^%>NPp;sNUub+o5F*4-F@Qd<8+_+9d)0CcDqkZU%CCnkmv{`Is$cs
zK+-1UbF`m?U9tUSQ=m!J;6<Q`g;S)1#iZ#JX)=Q}9;c0`Xrsln{rQzDI<r#wY&L0j
zE@^TZ`X!?L5P-jaWT*HmOfeQZFNEnya_>gCKME$tGxB(r?-suP#BThXsxglK1~tK4
zL_#BBun9m@fPM}Xc>M_&7E&kjWPWUOxPhYmsyVEYt+^(lfT<}zC&kyK9=#{H11$Bf
zfd^$D2Y3W7SqeOE&}$(N8g-EeeQo4HA-pKW7lpv15NS009LR%tS41A+hr2+P=OJ|N
z1u6#r!oI}|)WI%u&M|Ok{{j^Uvo284@<T{OtidJhStE^@HT;ONHPR@sbl{8Z*^t^{
z8Bd;3lcm=N)SDo!8sug7ERnryuh7z&Ain5`k}x~$C{IoD2a(LU{-vPKzcv3SE*|$Q
zUMYG={mWNxs@_5jeCX&>LSqHBd-+o!`$G#(Y{di#rW_0{^<Ce)%vzRuO7`KPiH}{C
zVP-)S3TJH;S%fBoUFJX&O6)@ukw|7h6B=5pqJ&r@O?_+m-N$x9G<x5^p$k9EVM6$x
z&Jr-`{Tf>|2e14qIyjR<a43f%`(e1x!1!|<CSZEaYmDHsmQXl3YLBQ{blJ7`g+C5q
ww7WOCoCt1-pv$$;67ek|a^JUhOYc()_3NM!k4z`e&R6x#f1YQf=o~Zv06a<#hX4Qo

literal 0
HcmV?d00001

diff --git a/data-importing/graphs/arch.txt b/data-importing/graphs/arch.txt
new file mode 100644
index 00000000..46919dd3
--- /dev/null
+++ b/data-importing/graphs/arch.txt
@@ -0,0 +1,8 @@
+              ┌───────────┐        ┌──────────┐
+┌──────┐      │           │        │          │        ┌────────────────┐
+│ DATA │━━━━━▶│  chunker  │━━━━━━━▶│  layout  │━━━━━━━▶│ DATA formatted │
+└──────┘      │           │        │          │        └────────────────┘
+              └───────────┘        └──────────┘
+             ▲                                 ▲
+             └─────────────────────────────────┘
+                          Importer
\ No newline at end of file

From 464745c67f56ce5ac47060507298c7e148e8ea3c Mon Sep 17 00:00:00 2001
From: David Dias <daviddias.p@gmail.com>
Date: Mon, 4 Jan 2016 15:57:02 +0100
Subject: [PATCH 2/3] add intro and requirements

---
 data-importing/README.md | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)
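
Reviewer note: a rough sketch of how the splitter requirements added in this patch could look in code, namely an agnostic splitter, a dedicated splitter, an encoder/decoder-like API, and picking a strategy based on the data. It is in Python only for brevity. The `Splitter` protocol, its `accepts`/`split`/`join` methods, both classes and `choose_splitter` are hypothetical names chosen for this note. The "newline-terminated text" format stands in for a real dedicated format such as tar or mp4.

```python
from typing import List, Protocol, Sequence


class Splitter(Protocol):
    """Assumed encoder/decoder-like API: split() encodes, join() decodes."""

    def accepts(self, data: bytes) -> bool: ...
    def split(self, data: bytes) -> List[bytes]: ...
    def join(self, chunks: Sequence[bytes]) -> bytes: ...


class FixedSizeSplitter:
    """Agnostic splitter: chunks any data format in the same way."""

    def __init__(self, size: int = 8) -> None:
        self.size = size

    def accepts(self, data: bytes) -> bool:
        return True

    def split(self, data: bytes) -> List[bytes]:
        return [data[i:i + self.size] for i in range(0, len(data), self.size)]

    def join(self, chunks: Sequence[bytes]) -> bytes:
        return b"".join(chunks)


class LineSplitter:
    """Dedicated splitter: only handles newline-terminated text, one chunk per line."""

    def accepts(self, data: bytes) -> bool:
        return data.endswith(b"\n")

    def split(self, data: bytes) -> List[bytes]:
        # keepends=True keeps the line endings so join() is lossless.
        return data.splitlines(keepends=True)

    def join(self, chunks: Sequence[bytes]) -> bytes:
        return b"".join(chunks)


def choose_splitter(data: bytes) -> Splitter:
    """Prefer a dedicated splitter that accepts the data, else fall back to the agnostic one."""
    for candidate in (LineSplitter(),):
        if candidate.accepts(data):
            return candidate
    return FixedSizeSplitter()
```

Under these assumptions, `choose_splitter(b"one\ntwo\n")` picks the line splitter, arbitrary binary data falls back to fixed-size chunks, and in both cases `join(split(data))` gives back the input.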

diff --git a/data-importing/README.md b/data-importing/README.md
index 65315c8c..d04dc6f9 100644
--- a/data-importing/README.md
+++ b/data-importing/README.md
@@ -33,12 +33,42 @@ This RFC is organized by chapters described on the *Table of contents* section.
 
 # Introduction
 
+Importing data into IPFS can be done in a variety of ways. These are use-case specific and produce different data structures, different graph topologies, and so on. They are not strictly needed in an IPFS implementation, but they make it more useful.
+
+These data importing primitives are really just tools on top of IPLD, meaning that they can be generic and separate from IPFS itself.
+
+Essentially, data importing is divided into two parts:
+
+- Layouts - The graph topologies in which data is going to be structured and represented. These can include:
+  - balanced graphs, simpler to implement
+  - trickledag, a custom graph optimized for seeking
+  - live stream
+  - database indices
+  - and so on
+- Splitters - The chunking algorithms applied to each file. These can be:
+  - fixed size chunking (also known as dumb chunking)
+  - rabin fingerprinting
+  - dedicated format chunking, which requires knowledge of the format and typically only works with certain types of files (e.g. video, audio, images, etc.)
+  - special data structure chunking; formats like tar, pdf, doc, and container and/or VM images fall into this category
+
 ### Goals
 
 - Have a set of primitives to digest, chunk and parse files, so that different chunkers can be replaced/added without any trouble.
 
 # Requirements
 
+This is a set of requirements (or guidelines) that a layout or a splitter is expected to fulfill:
+
+- a layout should expose an encoder/decoder-like API, that is, it should be able to convert data to its format and convert it back to the original format
+- a layout should contain a clear, unambiguous representation of the data that gets converted to its format
+- a layout can leverage one or more splitting strategies, applying the best strategy depending on the data format (dedicated format chunking)
+- a splitter can be:
+  - agnostic - chunks any data format in the same way
+  - dedicated - only able to chunk specific data formats
+- a splitter should also expose an encoder/decoder-like API
+- a splitter, once fed with data, should yield chunks to be added to a layout or to another layout of itself
+- an importer is an aggregate of layouts and splitters
+
 # Architecture
 
 ```bash
@@ -60,9 +90,9 @@ This RFC is organized by chapters described on the *Table of contents* section.
 
 # Interfaces
 
-#### chunker (splitters)
+#### splitters
 
-#### layout (topologies)
+#### layout
 
 #### importer
 

From b210bcabf46866fcc5f56822e677129e1646ac14 Mon Sep 17 00:00:00 2001
From: David Dias <daviddias.p@gmail.com>
Date: Mon, 13 Feb 2017 08:25:24 -0800
Subject: [PATCH 3/3] rename to DEX (for now) and point to all of the
 discussions

---
 {data-importing => dex}/README.md           |  12 +++++++++++-
 {data-importing => dex}/graphs/arch.monopic | Bin
 {data-importing => dex}/graphs/arch.txt     |   0
 3 files changed, 11 insertions(+), 1 deletion(-)
 rename {data-importing => dex}/README.md (93%)
 rename {data-importing => dex}/graphs/arch.monopic (100%)
 rename {data-importing => dex}/graphs/arch.txt (100%)

diff --git a/data-importing/README.md b/dex/README.md
similarity index 93%
rename from data-importing/README.md
rename to dex/README.md
index d04dc6f9..88b03c76 100644
--- a/data-importing/README.md
+++ b/dex/README.md
@@ -1,8 +1,11 @@
-RFC - IPFS Data Importing
+RFC - DEX (name still under consideration)
 =========================
 
 Authors:
 
+- David Dias
+- Juan Benet
+
 Reviewers:
 
 
@@ -18,6 +21,13 @@ IPFS Data Importing spec describes the several importing mechanisms used by IPFS
 
 > **This spec is a Work In Progress (WIP).**
 
+There have been many discussions around this topic. Some of them are here:
+
+- https://github.com/ipfs/notes/issues/204
+- https://github.com/ipfs/notes/issues/216
+- https://github.com/ipfs/notes/issues/205
+- https://github.com/ipfs/notes/issues/144
+
 # Organization of this document
 
 This RFC is organized by chapters described on the *Table of contents* section.
diff --git a/data-importing/graphs/arch.monopic b/dex/graphs/arch.monopic
similarity index 100%
rename from data-importing/graphs/arch.monopic
rename to dex/graphs/arch.monopic
diff --git a/data-importing/graphs/arch.txt b/dex/graphs/arch.txt
similarity index 100%
rename from data-importing/graphs/arch.txt
rename to dex/graphs/arch.txt