% example for dissertation.sty
\documentclass[
% Replace oneside by twoside if you are printing your thesis on both sides
% of the paper, leave as is for single sided prints or for viewing on screen.
oneside,
%twoside,
11pt, a4paper,
footinclude=true,
headinclude=true,
cleardoublepage=empty
]{scrbook}
\usepackage{dissertation}
\usepackage{chngcntr}
\counterwithin{figure}{chapter}
\theoremstyle{definition}
\newtheorem{definition}{Definition}[section]
\theoremstyle{definition}
\newtheorem{example}{Example}[definition]
\newtheorem{theorem}{Theorem}[section]
\newtheorem{corollary}{Corollary}[theorem]
\usepackage{upgreek}
\usepackage{stmaryrd}
%----- jno added stuff -----
%\usepackage{fleqn}
%\setlength\mathindent{3em}
\usepackage{multirow}
\usepackage{amssymb}
\newenvironment{lcbr}{\left\{\begin{array}{l}}{\end{array}\right.}
\usepackage[all]{xy}
\def\rarrow#1#2#3{\def\nolabel{}\def\lab{#2}\ifx\lab\nolabel{#1\to #3}\else\xymatrix{ #1 \ar[r]^-{#2} & #3 }\fi}
\def\larrow#1#2#3{\def\nolabel{}\def\lab{#2}\ifx\lab\nolabel{#3\from#1}\else\xymatrix{ #3 & #1 \ar[l]_-{#2}}\fi}
\def\longlarrow#1#2#3{\xymatrix{ #3 && #1 \ar[ll]_-{#2} }}
\def\longrarrow#1#2#3{\xymatrix{ #1 \ar[rr]^-{#2} && #3 }}
\usepackage{wrapfig}
\def\doc{paper}
\def\start{&&}
\def\more{\\&&}
\def\just#1#2{\\ & \rule{2em}{0pt} \{ \mbox{\rule[-.7em]{0pt}{1.8em} \small #2 \/} \} \nonumber\\ && }
\def\conv#1{#1^\circ}
\def\comp{ \mathbin{\cdot} }
\def\kr{\mathbin{\hbox{\tiny${}^\triangledown$}}}
\def\bind{\mathbin{{>\hskip-1ex>\hskip-.4ex=}}}
\def\bindA{\mathbin{{>\hskip-1ex>\hskip-1ex>}}}
\def\ap{\mathbin{{<\hskip-1ex*\hskip-1ex>}}}
\let\kons=\underline
\def\xsor{{\dot\vee}}
%-----
\def\msplit#1#2{\left[\begin{array}{c}#1\\\hline#2\end{array}\right]} % Split
\def\meither#1#2{\left[\begin{array}{c|c}{#1}&{#2}\end{array}\right]} % either
\def\matfour#1#2#3#4{\begin{bmatrix} \left.\begin{matrix} \dfrac{#1}{#3} \end{matrix}\ \right|\ \begin{matrix} \dfrac{#2}{#4} \end{matrix}\end{bmatrix}}
%------
\def\grantsponsor#1#2#3{#2}
\newcommand\grantnum[3][]{#3%
\def\@tempa{#1}\ifx\@tempa\@empty\else\fi}
%---------------------------
% ACRONYMS -----------------------------------------------------
%import the necessary package with some options
%\usepackage[acronym,nonumberlist,nomain]{glossaries}
%enable the following to avoid links from the acronym usage to the list
%\glsdisablehyper
%displays the first use of an acronym in italic
\defglsdisplayfirst[\acronymtype]{\emph{#1#4}}
%the style of the Glossary
\glossarystyle{listgroup}
% set the name for the acronym entries page
\renewcommand{\glossaryname}{Acronyms}
%this shall be the last thing in the acronym configuration!!
\makeglossaries
% here are the acronym entries
\newacronym{miei}{MIEI}{Integrated Master's in Informatics Engineering}
\newacronym{di}{DI}{Informatics Department}
\newacronym{um}{UM}{University of Minho}
\newacronym{fp}{FP}{Functional Programming}
\newacronym{ct}{CT}{Category Theory}
\newacronym{cs}{CS}{Computer Science}
\newacronym{pfp}{PFP}{Probabilistic Functional Programming}
\newacronym{aop}{AoP}{Algebra of Programming}
\newacronym{laop}{LAoP}{Linear Algebra of Programming}
\newacronym{saf}{SAF}{Selective Applicative Functor}
\newacronym{edsl}{eDSL}{Embedded domain specific language}
\newacronym{ppl}{PPL}{Probabilistic Programming Language}
\newacronym{gadt}{GADT}{Generalised Algebraic Datatype}
\newacronym{ghc}{GHC}{Glasgow Haskell Compiler}
% these could go in an acronyms.tex file, and loaded with:
% \loadglsentries[\acronymtype]{Parts/Definitions/acronyms}
% when using this, you may want to remove 'nomain' from the package options
%% **MORE INFO** %%
%to add the acronyms list add the following where you want to print it:
%\printglossary[type=\acronymtype]
%\clearpage
%\thispagestyle{empty}
%to use an acronym:
%\gls{qps}
% compile the thesis in command line with the following command sequence:
% pdflatex dissertation.tex
% makeglossaries dissertation
% bibtex dissertation
% pdflatex dissertation.tex
% pdflatex dissertation.tex
% ----------------------------------------------------------------
% Title
\titleA{Selective Applicative Functors}
\titleB{\& Probabilistic Programming} % (if any)
%\subtitleA{First Part of Subtitle}
%\subtitleB{Second part of Subtitle} % (if any)
% Author
\author{Armando João Isaías Ferreira dos Santos}
% Supervisor(s)
\supervisor{José Nuno Oliveira (INESCTEC \& University of Minho)} % change the bold
\cosupervisor{Andrey Mokhov (Newcastle University, UK)}
% University (uncomment if you need to change default values)
% \def\school{Escola de Engenharia}
% \def\department{Departamento de Inform\'{a}tica}
% \def\university{Universidade do Minho}
% \def\masterdegree{Computer Science}
% Date
\date{\myear} % change to text if date is not today
% Keywords
%\keywords{master thesis, functional programming, probabilistic programming, monads, applicatives, selective applicative functor, haskell}
% Glossaries & Acronyms
%\makeglossaries % either use this ...
%\makeindex % ... or this
% Define Acronyms
%\input{sec/acronyms}
%\glsaddall[types={\acronymtype}]
\ummetadata % add metadata to the document (author, publisher, ...)
\AtBeginDocument{\usefont{\encodingdefault}{phv}{m}{n}}
\begin{document}
%fontfamily{phv}\fontseries{mc}\selectfont
% Cover page ---------------------------------------
\umfrontcover
\umtitlepage
\chapter*{Copyright and Terms of Use for Third Party Work}
This dissertation reports on academic work that can be used by third parties as long as the internationally accepted standards and good practices are respected concerning copyright and related rights.
\vskip 1em
\noindent This work can thereafter be used under the terms established in the license below.
\vskip 1em
\noindent Readers needing authorisation conditions not provided for in the indicated licensing should contact the author through the RepositóriUM of the University of Minho.
\section*{License granted to users of this work:}
\CCBY % or replace by one in the list below, cf https://alunos.uminho.pt/PT/estudantes/Formataes/3_Despacho_RT-31_2019_Anexo%203-Informa%c3%a7%c3%a3o-Direitor%20de%20Autor.docx
%---------
%\CCBYNCND
%\CCBYNCSA
%\CCBYNC
%\CCBYND
%\CCBYSA
% Add acknowledgements ----------------------------
\chapter*{Acknowledgements}
Throughout the writing of this dissertation I have received a great deal of support and assistance.
I would also like to express my sincere gratitude to my supervisor, Professor José N. Oliveira, first and foremost, for all his assistance and guidance. Without him, I would not have been able to tackle many of the challenges I found along the way. I would like to thank him for his patient support and for all the opportunities I have been given to further my studies.
I would like to thank my co-supervisor, Andrey Mokhov, for welcoming me as his student and for proposing the subject of this master's thesis. Andrey's expertise was invaluable in the formulation of research questions and methodology. His insightful feedback pushed me to sharpen my thinking and brought my work to a higher level.
While doing this work I held a Research Grant of the DaVinci Project funded by FEDER and by National Funds, so I wish to thank \grantsponsor{ GS100000001 }{ FCT (Portuguese Foundation for Science and Technology, I.P.) }{ http://dx.doi.org/10.13039/501100001871 } and all people involved in the project for the opportunity.
A special, heartfelt thank you to Cláudia Correia, whose invaluable and unconditional support was what kept me moving. Thank you for always being there for me, in good and bad times, and for encouraging me to be professional and to do the right thing even when the road got rough. Without your support, this work would not have been possible.
Moreover, I would like to thank my family for standing by me, for investing in my education, and for always providing me with everything I needed to thrive.
Last but not least, for all the wonderful adventures, stories and shared moments over the past five years, I want to acknowledge and thank all of my friends. This master's thesis represents five years of work, friendship and love, and without each one of you I definitely would not have made it this far.
\chapter*{Statement of Integrity}
I hereby declare having conducted this academic work with integrity.
\vskip 1em\noindent
I confirm that I have not used plagiarism or any form of undue use of information or falsification of results along the process leading to its elaboration.
\vskip 1em\noindent
I further declare that I have fully acknowledged the Code of Ethical Conduct of the University of Minho.
% Add abstracts (en,pt) ---------------------------
\chapter*{Abstract}
In functional programming, \emph{selective applicative functors} (SAF) are an abstraction between applicative functors and monads. This abstraction requires all effects to be statically declared, but provides a way to select which effects to execute dynamically. SAFs have been shown to be a useful abstraction in several examples, including two industrial case studies. Selective functors have been used for their static analysis capabilities. Because they collect information about all possible effects in a computation and enable \emph{speculative} execution, they can be exploited to describe probabilistic computations as an alternative to monads. In particular, selective functors appear to provide a way to obtain a more efficient implementation of probability distributions than monads.
This dissertation addresses a probabilistic interpretation for the \emph{arrow} and \emph{selective} abstractions in the light of the linear algebra of programming discipline, as well as exploring ways of offering SAF capabilities to probabilistic programming by exposing sampling as a concurrency problem. As a result, it provides a type-safe Haskell matrix library capable of expressing probability distributions and probabilistic computations as typed matrices, and a probabilistic programming eDSL that explores various techniques in order to offer a novel, performant solution for probabilistic functional programming.
\vskip0.5cm
\keywords{master thesis, functional programming, probabilistic programming, monads, applicatives, selective applicative functor, Haskell, matrices}
\cleardoublepage
\chapter*{Resumo}
Em programação funcional, os functores \emph{aplicativos seletivos} (FAS) são uma abstração entre functores aplicativos e monades. Essa abstração requer que todos os efeitos sejam declarados estaticamente, mas fornece uma maneira de selecionar quais efeitos serão executados dinamicamente. FAS têm se mostrado uma abstração útil em vários exemplos, incluindo dois estudos de caso industriais. Functores seletivos têm sido usados pelas suas capacidades de análise estática. O conjunto de informações sobre todos os efeitos possíveis numa computação e o facto de que eles permitem a execução \emph{especulativa} tornam possível descrever computações probabilísticas. Em particular, functores seletivos parecem oferecer uma maneira de obter uma implementação mais eficiente de distribuições probabilísticas do que monades.
Esta dissertação aborda uma interpretação probabilística para as abstrações \emph{Arrow} e \emph{Selective} à luz da disciplina da álgebra linear da programação, bem como explora formas de oferecer as capacidades dos FAS para programação probabilística, expondo \emph{sampling} como um problema de concorrência. Como resultado, fornece uma biblioteca de matrizes em Haskell, capaz de expressar distribuições de probabilidade e cálculos probabilísticos como matrizes tipadas, e uma eDSL de programação probabilística que explora várias técnicas, com o objetivo de oferecer uma solução inovadora e de alto desempenho para a programação funcional probabilística.
\vskip0.5cm
\keywordsPT{dissertação de mestrado, programação funcional, programação probabilística, monades, aplicativos, funtores aplicativos seletivos, haskell, matrizes}
% Summary Lists ------------------------------------
\tableofcontents
\listoffigures
\listoftables
\lstlistoflistings
%\listofabbreviations
\renewcommand{\listtheoremname}{List of theorems and definitions}
\printglossary[type=\acronymtype]
\clearpage
\thispagestyle{empty}
\pagenumbering{arabic}
% CHAPTER - Introduction -------------------------
\chapter{Introduction}
% This dissertation describes the work carried out in the context of the author's \gls{miei} offered by the \gls{di} of the \gls{um}.
% This chapter provides an introduction to the research by explaining the context of the research problem, the research motivation and objectives, state of the art, scope and related work.
% \section{Context}
\label{sec-context}
\gls{fp} deals with the complexity of real life problems by handling so-called (side) \emph{effects} in an algebraic manner. Monads are one such algebraic device, pioneered by \cite{Moggi:1991:NCM:116981.116984} in the field of computer science to verify \emph{effectful} programs, i.e.\ programs that deal with side effects. \cite{Wadler:1989:TF:99370.99404} was among the first to recommend monads in functional programming as a general and powerful approach for describing effectful (or impure) computations, while still using pure functions. The key ingredient of the monad abstraction is the \textit{bind} operator, which applies functions to monadic objects \emph{carrying the effects through}. This operator leads to an approach to composing effectful computations which is inherently sequential. This intrinsic nature of monads can be used for conditional effect execution. However, this abstraction is often too strong for particular programming situations, where abstractions with weaker laws are welcome.
Applicative functors \citep{mcbride2008applicative} can be used for composing statically known collections of effectful computations, as long as these computations are independent from each other. Therefore, this kind of functor can only take two effectful computations and, independently (i.e.\ in parallel), compute their values and return their composition.
There are situations in which just having a Monad or an Applicative is too limiting, calling for a programming abstraction sitting somewhere between Monad and Applicative. An abstraction that requires all effects to be statically declared but provides a way to select which of the effects to execute dynamically was introduced by \cite{andrey2019selective} to cope with such situations. It is called the \gls{saf} abstraction.
In the field of \gls{pfp}, monads are used to describe events (probabilistic computations in this case) that depend on others \citep{erwig_kollmansberger_2006}. Better than monads, which are inherently sequential, selective functors provide a nicer abstraction for describing conditional probabilistic computations. According to \cite{andrey2019selective}, this kind of functor has proved to be a helpful abstraction in the fields of static analysis (at Jane Street) and speculative execution (at Facebook), achieving good results without disturbing the adopted code style.
Arrows \citep{Hughes:2000:GMA:347238.347246} are more generic than monads and were designed to abstract the structure of more complex patterns than the monad interface could support. The most common example is the parsing library by \cite{swiestra&duponcheel} that takes advantage of static analysis to improve its performance. This example could not be optimised using the Monad interface, given its sequential nature. Having \gls{ct} as a foundation, the Arrow abstraction has made its way to the \gls{fp} ecosystem as a way to mitigate the somewhat heavy requirements of the powerful Monad.
There are reasons to believe that by adopting the selective abstraction one could shorten the gap that once was only filled by the Arrow abstraction \citep{Hughes:2000:GMA:347238.347246}. On the one hand, the generality of the Arrow interface makes it possible to work around some of the structural constraints that prevent one from implementing a stronger abstraction, and to compose various combinators in order to achieve greater expressiveness. On the other hand, languages such as Haskell, which implement many of these abstractions out of the box, render code written in the Arrow style not only convoluted, but also unnatural and difficult to refactor.
\section{Motivation and Goals}\label{sec-moti-goals}
The rise of new topics such as machine learning, deep learning and quantum computing is stimulating major advances in the programming language domain \citep{selinger2004brief, innes2018machine}. To cope with the increased complexity, mathematics has always had a principal role, either by formalising the underlying theory, or by providing robust and sound theories to deal with the new heavy machinery. But what do these topics have in common? They all deal, in some way, with probabilities.
Programming languages are a means of communicating (complex) concepts to computers. They provide a way to express, automate, abstract and reason about them. There are programming languages, especially functional programming languages, that work closer to the mathematical level and are based on concepts like referential transparency and purity. However, not all of the abstractions useful in \gls{cs} have come directly from mathematics. Several abstractions were meant to factor out some kind of ubiquitous behaviour, or to provide a sound and robust framework in which one could reason about the code and provide a more efficient solution. The \gls{saf} is such an abstraction.
Probabilistic programming allows programmers to model probabilistic events and predict or calculate results with a certain degree of uncertainty. In particular \gls{pfp} manipulates and manages probabilities in an abstract, high-level way, circumventing convoluted notation and complex mathematical formulas. Probabilistic programming research is primarily focused on developing optimisations to inference and sampling algorithms in order to make code run faster while preserving the posterior probabilities. There are many strategies and techniques for optimising probabilistic programs, namely using static analysis \citep{bernstein2019static}.
The main goal of this research is to study, evaluate and compare ways of describing and implementing probabilistic computations using the so-called \emph{selective abstraction}, and in particular to evaluate the benefits of doing so in the \gls{pfp} ecosystem. This will be accomplished by proposing an appropriate set of case studies and, ultimately, developing a couple of Haskell libraries that provide an efficient encoding of probabilities, taking advantage of the selective applicative abstraction. Focusing on how to overcome the intrinsically sequential nature of the monad abstraction \citep{Scibior:2015:PPP:2887747.2804317} in favour of the speculative execution of selective functors, one of the aims of this work is to answer the following research question:
\begin{quote}
"Can the \texttt{select} operator be implemented more efficiently than the monadic \texttt{bind} operator?"
\end{quote}
\section{State of the Art}\label{sec-state-art}
In the context of this research, abstractions can be viewed from two perspectives:
\begin{itemize}
\item The programming language;
\item The underlying mathematical theory.
\end{itemize}{}
As expected, the programming language prism makes one see things more concretely, i.e.\ brings one down the abstraction ladder. That is why many abstractions tend to be associated with frequently occurring patterns and interfaces that programmers wish to generalise.
This said, a recurrent problem arises when authors try to explain their mathematical abstractions by going down to a comfortable, intuitive and easy-to-understand level \citep{DBLP:journals/corr/abs-1803-10195}. However, in \gls{cs} the level might be so low (one could even write: \emph{ad-hoc}) that the need for such abstractions may be questionable. Mathematical abstractions are useful ways of generalising and organising patterns that abide by the same rules, i.e.\ are governed by the same set of laws. Thanks to much work on abstract algebra and \gls{ct}, these abstractions automatically become powerful conceptual tools. In this regard, finding the right mathematical description of an abstraction is \emph{halfway} towards using it correctly.
The following section presents widely used mathematical abstractions that made their way into programming languages, in particular in the probabilistic programming environment. How recent work by \cite{andrey2019selective} relates to such abstractions will also be addressed. Given the scope of this research and aiming to explore interesting ways of thinking about probability distributions, every abstraction is introduced accompanied by a concrete instance in the probabilistic setting.
\subsection{Hierarchy of Abstractions}
The purpose of every abstraction is to generalise a certain pattern or behaviour. Abstract algebra is a field of mathematics devoted to studying mathematical abstractions, in particular ways of building more complex abstractions by composing simpler ones. Regarding abstractions as layers, one can think of the inheritance mechanism that object oriented programming is so fond of \citep{Liskov:1987:KAD:62139.62141}.
A hierarchy of abstractions aims to hide information and manage complexity. The highest level has the least information and lowest complexity. For the purposes of this research, it is interesting to see how the abstractions presented in the next sections map to the corresponding probability theory features and how the underlying levels translate to more complex ones.
\subsection{Functors}
\paragraph{What Functors are}
Functors originate from \gls{ct} as morphisms between categories \citep{Awodey:2010:CT:2060081}. Functors abstract a common pattern in programming and provide the ability to map a function inside some kind of structure. Since functors must preserve structure they are a powerful reasoning tool in programming.
\begin{lstlisting}[mathescape, language=Haskell, caption={Functor laws},captionpos=b]
class Functor f where
fmap :: (a -> b) -> f a -> f b
-- fmap id = id
-- fmap f . fmap g = fmap (f . g)
\end{lstlisting}{}
\paragraph{Probabilistic Speak}
There are many situations in which the type \texttt{f a} makes sense. The easiest way to understand it is to see \texttt{f} as a data container; then readers can instantiate \texttt{f} to a concrete type, for instance lists (\texttt{[a]}).
For the purpose of probabilistic thinking, \texttt{f a} instantiates to the "Distribution of a's" container and \texttt{fmap} (the factored out pattern) to the action of mapping a function over all the values of a distribution without changing their probabilities. (Probabilities sum up wherever the mapped function is not injective.) As will be seen in chapter \ref{ch-background}, there are multiple ways of combining probabilities. However, given the properties of a functor, it is only possible to map functions inside it while preserving its structure. This said, the probability functor can be informally regarded as only being capable of expressing the probability $P(A)$ of an event $A$ in probability theory \citep{jtobin}.
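For concreteness, the following minimal sketch shows one possible encoding of such a distribution container, in the spirit of \cite{erwig_kollmansberger_2006} (the names \texttt{Dist}, \texttt{unDist} and \texttt{norm} are illustrative only, not taken from any particular library). Merging the outcomes of a non-injective function is delegated to a separate normalisation step, because the \texttt{Functor} class cannot carry the \texttt{Ord} constraint that merging requires.
\begin{lstlisting}[language=Haskell, caption={A minimal distribution type and its Functor instance (illustrative sketch)},captionpos=b]
import Data.List (groupBy, sortOn)
import Data.Function (on)

-- A discrete distribution: an enumeration of outcomes and their probabilities.
newtype Dist a = Dist { unDist :: [(a, Rational)] }

instance Functor Dist where
  -- map f over the support; each probability is left untouched
  fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

-- Merge equal outcomes (e.g. produced by a non-injective function),
-- summing their probabilities.
norm :: Ord a => Dist a -> Dist a
norm (Dist xs) =
  Dist [ (fst (head g), sum (map snd g))
       | g <- groupBy ((==) `on` fst) (sortOn fst xs) ]
\end{lstlisting}{}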
\subsection{Applicative Functors}
\paragraph{What Applicative Functors are}
Most functional programming languages separate pure computations from effectful ones. An effectful computation performs side effects or runs in a given context while delivering its result. While working with the Haskell functional programming language, \cite{mcbride2008applicative} found that the pattern of applying pure functions to effectful computations came up very often in a wide range of fields. The pattern consists mostly of
\begin{inparaenum}
\item embedding a pure computation in the current context while maintaining its semantics, i.e.\ lifting a value into an "effect free" context,\label{p-1} and then
\item combining the results of the computations, i.e.\ \emph{applying} a pure computation to effectful ones.\label{p-2}
\end{inparaenum}
All it takes to abstract this pattern is a way to factor out \ref{p-1} and \ref{p-2}.
\begin{lstlisting}[language=Haskell, caption={Applicative laws},captionpos=b]
class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
-- pure id <*> u == u
-- pure f <*> pure x == pure (f x)
-- u <*> (v <*> w) = pure (.) <*> u <*> v <*> w
-- u <*> pure y = pure ($ y) <*> u
\end{lstlisting}{}
It is important to note that in order to be an applicative, \texttt{f} first needs to be a functor. So, every applicative is a functor. This can be seen as going down one layer of abstraction in the hierarchy, by empowering a functor \texttt{f} with more capabilities if it respects the applicative laws (given in the listing above).
Applicatives are interesting abstractions in the sense that they were not a transposition of a known mathematical concept. However, \cite{mcbride2008applicative} establish a correspondence with the standard categorical ``zoo" by concluding that \emph{in categorical terms applicative functors are strong lax monoidal functors}. This has opened ground for a stream of fascinating research, see e.g.\ \citep{Paterson:2012:CAF:2368298.2368321, Cooper:2008:EFA:1485346.1485361, DBLP:journals/corr/CapriottiK14}.
\paragraph{Probabilistic Speak}
Looking at the laws of applicative functors one sees that they pretty much define what the intended semantics regarding sequencing effects are. The last one, called the \emph{interchange law} \citep{mcbride2008applicative}, clearly says that when evaluating the application of an effectful function to a pure argument, the order in which one evaluates the function and its argument \emph{does not matter}. However, if both computations are effectful the order \emph{does matter}, but a computation cannot depend on values returned by prior computations, i.e.\ the result of the applicative action can depend on earlier values but the effects cannot. In other words, computations can run \emph{independently} from each other \citep{Cooper:2008:EFA:1485346.1485361, Marlow:2014:NFA:2692915.2628144, Marlow:2016:DHD:3241625.2976007, andrey2019selective}.
So, if \texttt{f a} represents a distribution then \texttt{pure} can be seen as the embedding of a given value \texttt{a} in the probabilistic context with 100\% chance, and \texttt{($\ap$)} as the action responsible for combining two \emph{independent} distributions, calculating their joint probability. This said, the probability instance of applicative functors can be regarded as being able to express $P(A, B) = P(A)P(B)$, i.e.\ statistical independence \citep{jtobin}.
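Continuing the illustrative \texttt{Dist} sketch from the previous section, a possible \texttt{Applicative} instance makes this reading concrete: \texttt{pure} is a point-mass distribution and \texttt{($\ap$)} multiplies the probabilities of two independent distributions.
\begin{lstlisting}[language=Haskell, caption={A possible Applicative instance for the illustrative \texttt{Dist} type},captionpos=b]
instance Applicative Dist where
  -- embed a value with probability 1 (a point-mass distribution)
  pure x = Dist [(x, 1)]
  -- joint distribution of two independent computations:
  -- probabilities multiply, P(A,B) = P(A)P(B)
  Dist fs <*> Dist xs = Dist [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]
\end{lstlisting}{}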
\subsection{Monads}
\paragraph{What Monads are}
Before being introduced in programming languages, monads had already been used in algebraic topology by \cite{godement1958topologie} and in \gls{ct} by \cite{maclane:71}. Monads were used in these areas because they were able to embed a given value into another structured object and because they were able to express a lot of different constructions in a single structure \citep{DBLP:journals/corr/abs-1803-10195}. Evidence of the flexibility and usefulness of monads can be found in programming: \cite{Moggi:1991:NCM:116981.116984} introduced monads in order to be capable of reasoning about effectful programs and \cite{1995_wadler_monads} used them to implement effectful programs in Haskell. Although they are not presented in the same way, the mathematical monad and the programming language monad are the same concept.
\begin{definition}{A monad in a category $\mathscr{C}$ is defined as a triple $(T,\eta, \mu)$ where $T : \mathscr{C} \rightarrow \mathscr{C}$ is a functor; $\eta : Id_\mathscr{C} \rightarrow T$ and $\mu : T^2 \rightarrow T$ are natural transformations, such that:}
\begin{align*}
&\mu_A \cdot T\mu_A = \mu_A \cdot \mu_{TA} \\
&\mu_A \cdot \eta_{TA} = id_{TA} = \mu_A \cdot T\eta_A
\end{align*}{}
\end{definition}
In programming, two alternative but equivalent definitions for monads come up. A functor can be seen as a type constructor and natural transformations as functions:
\begin{lstlisting}[mathescape, language=Haskell, caption={Monad laws and definition in terms of \texttt{unit} and \texttt{join}},captionpos=b]
class Applicative m => Monad m where
unit :: a -> m a
join :: m (m a) -> m a
-- join $\cdot$ join = join $\cdot$ fmap join
-- join $\cdot$ unit = id = join $\cdot$ fmap unit
-- fmap f $\cdot$ join = join $\cdot$ fmap (fmap f)
-- fmap f $\cdot$ unit = unit $\cdot$ f
\end{lstlisting}{}
\begin{lstlisting}[language=Haskell, caption={Monad laws and definition in terms of \texttt{unit} and \texttt{bind}},captionpos=b]
class Applicative m => Monad m where
unit :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
-- unit a >>= f = f a
-- m >>= unit = m
-- (m >>= f) >>= g = m >>= (\x -> f x >>= g)
\end{lstlisting}{}
As can be seen, both definitions once again highlight the hierarchy of abstractions where every monad is an applicative, and consequently a functor. These two definitions are related by the following law:
\begin{lstlisting}[language=Haskell, caption={Relation between \texttt{join} and \texttt{bind}},captionpos=b]
m >>= f = join (fmap f m)
\end{lstlisting}{}
If monads are so versatile what type of pattern do they abstract? Intuitively, monads abstract the idea of "taking some uninteresting object and turning it into something with more structure" \citep{DBLP:journals/corr/abs-1803-10195}. This idea can be explained by using some of several known metaphors:
\begin{itemize}
\item Monads as \emph{containers}: Visualising a box as representing the type \texttt{m a}, the \texttt{unit} operation takes a value and wraps it in a box, and the \texttt{join} operation takes a box of boxes and flattens it into a single box. This metaphor, however, is not so good at giving intuition for \texttt{bind ($\bind$)}, but, as the previous listing demonstrated, \texttt{bind} can be seen as a combination of \texttt{fmap} and \texttt{join}.
\item Monads as \emph{computations}: Visualising \texttt{m a} as a computation, \texttt{a $\rightarrow$ m b} represents computations that depend on previous values; so \texttt{bind} lets us combine two computations, emulating the sequential, imperative programming paradigm, and \texttt{unit} represents a computation that does nothing beyond returning a value.
\end{itemize}{}
Brought to programming languages, monads are used to encode different notions of computations and their structure allows us to separate pure from impure code, obtaining, in this way, nice and structured programs that are easier to reason about.
\paragraph{Probabilistic Speak}
It is more rewarding to look at a probability distribution as a probabilistic computation or event. Given this, by observing the type of \texttt{bind} one can infer that it lets us combine an event \texttt{m a} with another that depends on the previous value \texttt{a $\rightarrow$ m b} \citep{erwig_kollmansberger_2006}. In other words, \texttt{bind} in a sense encapsulates the notion of conditional probability. What happens in a conditional probability calculation $P(B|A)$ is that $A$ becomes the sample space, and $A$ and $B$ will only occur a fraction $P(A \cap B)$ of the time. Relating this to the type signature of \texttt{($\bind$)}: \texttt{m a} represents the new sample space $A$ and \texttt{a $\rightarrow$ m b} the fraction where $A$ and $B$ occur. This being said, the probability monad can be seen as being able to express $P(B|A) = \frac{P(B \cap A)}{P(A)}$.
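A possible \texttt{Monad} instance for the illustrative \texttt{Dist} type sketched earlier makes this reading concrete: \texttt{bind} weights each conditional distribution by the probability of the value it depends on (the helper \texttt{coin} below is purely illustrative).
\begin{lstlisting}[language=Haskell, caption={A possible Monad instance for the illustrative \texttt{Dist} type},captionpos=b]
instance Monad Dist where
  return = pure
  -- weight each conditional distribution (f x) by the probability of x
  Dist xs >>= f = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- unDist (f x) ]

-- usage sketch: a fair coin followed by a coin whose bias depends on it
twoCoins :: Dist (Bool, Bool)
twoCoins = coin (1/2) >>= \a ->
           fmap (\b -> (a, b)) (coin (if a then 3/4 else 1/4))
  where coin p = Dist [(True, p), (False, 1 - p)]
\end{lstlisting}{}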
The observation that probability distributions form a monad is not new. Thanks to the work of \cite{giry1982} and following the hierarchy of abstractions, it is easy to see that it is indeed possible to talk about probabilities with respect to the weaker structures mentioned in the other sections \citep{jtobin, Scibior19}.
\subsection{Arrows}
\paragraph{What Arrows are}
Most abstractions described until now are based on \gls{ct}. This is because \gls{ct} can be seen as the "theory of everything", a framework into which a lot of mathematical structures fit. So, how can such an abstract theory be so useful in programming? Because computer scientists value abstraction. When designing an interface, it is meant to reveal as little as possible about the implementation details, and it should be possible to swap the implementation for an alternative one, i.e.\ another \emph{instance} of the same \emph{concept}. It is the generality of a monad that is so valuable, and it is the generality of \gls{ct} that makes it so useful in programming.
This being said, Arrows, introduced by \cite{Hughes:2000:GMA:347238.347246} and inspired by the ubiquity of \gls{ct}, aim to abstract how to build and structure more generic combinator libraries by suggesting the following type-class:
\begin{lstlisting}[language=Haskell, caption={Arrow type-class},captionpos=b]
class Arrow a where
arr :: (b -> c) -> a b c
(>>>) :: a b c -> a c d -> a b d
first :: a b c -> a (b, d) (c, d)
\end{lstlisting}{}
As one can note, Arrows make the dependence on an input explicit and abstract the structure of a given output type. This is why it is said that Arrows generalise monads.
Because there are many more arrow combinators than monadic ones, a larger set of laws is required, and the reader is referred to the paper by \cite{Hughes:2000:GMA:347238.347246} for more information about them. However, a brief explanation of the three combinators is given: \texttt{arr} can be seen as doing the same as \texttt{return} does for monads, lifting pure functions to computations; \texttt{($\bindA$)} is analogous to \texttt{($\bind$)}, being the left-to-right composition of arrows; and \texttt{first} arises from the limitation that Arrows cannot express binary arrow functions, so this operator converts an arrow from $b$ to $c$ into an arrow on pairs that applies its argument to the first component and leaves the other unchanged.
The astute reader will see how Arrows try to encode the notion of a category and indeed the associativity law of \texttt{($\bindA$)} is one of the laws of this type-class. Moreover, if one thinks about how, for any monad a function of type \texttt{a $\rightarrow$ m b} is a Kleisli arrow \citep{Awodey:2010:CT:2060081}, one can define the arrow combinators as follows:
\begin{lstlisting}[mathescape, language=Haskell, caption={Arrow Kleisli type-class instance},captionpos=b]
newtype Kleisli m a b = K (a -> m b)
instance Arrow (Kleisli m) where
arr f = K (\b -> return (f b))
K f >>> K g = K (\b -> f b >>= g)
first (K f) = K (\(b, d) -> f b >>= \c -> return (c, d))
\end{lstlisting}{}
This shows that Arrows in fact generalise monads. Nevertheless, there is still one question that goes unanswered --- why generalise monads if they serve the same purpose of providing a common structure to generic programming libraries? \cite{Hughes:2000:GMA:347238.347246} saw in the example of \cite{swiestra&duponcheel} a limitation of the monadic interface and argues that the advantage of the Arrow interface is that it has a wider class of implementations. Thus, simpler libraries based on abstract data types that are not monads can be given an arrow interface.
It seems that Arrows are more expressive than the abstractions seen in the previous sections, but what \emph{is} their relation to them? \cite{Lindley:2011:IOA:1953652.1954016} established the relative order of strength Applicative $\rightarrow$ Arrow $\rightarrow$ Monad, in contrast to the putative order Arrow $\rightarrow$ Applicative $\rightarrow$ Monad. Furthermore, given the right restrictions, Arrows are isomorphic to both Applicatives and Monads, being able to "slide" between the layers of this hierarchy of abstractions.
\paragraph{Probabilistic Speak}
As seen, Arrows allow us to categorically reason about a particular structure and benefit from all the combinators that its interface offers. However, Arrows find themselves between Applicatives and Monads with respect to their strength and therefore do not express any extra special capabilities \citep{Lindley:2011:IOA:1953652.1954016}. Nevertheless, due to their generality, Arrows are able to offer the capabilities of either of the two abstractions (Applicative and Monad), provided that the respective laws are verified.
In fact, Monads are able to express the minimum structure needed to represent arbitrary probability distributions \citep{jtobin}. However, there are cases where it becomes hard to reason about probability distributions using only the monadic interface \citep{Oliveira2016KeepDC}. Arrows come into play regarding this problem, allowing for the so-called \gls{laop} \citep{Macedo2012MatricesAA}, as will be seen in chapter \ref{sec-current-work}.
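To give a flavour of this reading, the sketch below (which assumes the illustrative \texttt{Dist} and \texttt{norm} definitions from the earlier sections, together with the \texttt{Kleisli} type defined above) tabulates a Kleisli arrow over \texttt{Dist} as a column-stochastic matrix: one column per input value, each column listing the probability of every output and summing to one.
\begin{lstlisting}[language=Haskell, caption={Reading a Kleisli arrow over \texttt{Dist} as a stochastic matrix (sketch)},captionpos=b]
type Stoch a b = Kleisli Dist a b

-- one inner list per input value a: the probability of each output b,
-- i.e. one column of the corresponding column-stochastic matrix
columns :: (Bounded a, Enum a, Ord b) => Stoch a b -> [[(b, Rational)]]
columns (K f) = [ unDist (norm (f a)) | a <- [minBound .. maxBound] ]
\end{lstlisting}{}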
\subsection{Selective Applicative Functors}
\paragraph{What Selective Applicative Functors are}
Like Applicatives, \glspl{saf} did not originate from any existing mathematical construction, but rather from observing interface limitations in the hierarchy of abstractions established so far.
Working in specific research domains, namely build systems and static analysis, \cite{andrey2019selective} identified the following limitations:
\begin{itemize}
\item Applicative functors allow effects to be statically declared, which makes it possible to perform static analysis. However, they only permit combining independent effects leaving static analysis of conditional effects aside;
\item Monads allow for combining conditional effects but can only do this dynamically, which makes static analysis impossible.
\end{itemize}{}
This said, \cite{andrey2019selective} developed an interface (abstraction) aiming at getting the best of both worlds, the \gls{saf}:
\begin{lstlisting}[language=Haskell, label={lst:saf}, caption={Selective Applicative Functor laws},captionpos=b]
class Applicative f => Selective f where
-- also known as (<*?)
select :: f (Either a b) -> f (a -> b) -> f b
-- x <*? pure id = either id id <$> x
-- pure x <*? (y *> z) = (pure x <*? y) *> (pure x <*? z)
-- x <*? (y <*? z) = (f <$> x) <*? (g <$> y) <*? (h <$> z)
-- where
-- f x = Right <$> x
-- g y = \a -> bimap (,a) ($ a) y
-- h z = uncurry z
\end{lstlisting}{}
By construction, \glspl{saf} find themselves between Applicatives and Monads and provide only one operator, \texttt{select}. By parametricity \citep{Wadler:1989:TF:99370.99404}, it is possible to understand that this operator runs an effect \texttt{f (Either a b)} which returns either an \texttt{a} or a \texttt{b}. If the returned value is of type \texttt{a}, the second effect must be run, in order to apply the function \texttt{a $\rightarrow$ b} and obtain the \texttt{f b} value. If the returned value is of type \texttt{b}, then the second computation can be \emph{skipped}.
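From \texttt{select} alone, richer conditional combinators can be derived. The definitions below follow the presentation of \cite{andrey2019selective} (with \texttt{<*?} as an infix synonym for \texttt{select}): \texttt{branch} runs exactly one of two continuations depending on an \texttt{Either} value, and \texttt{ifS} is a lifted conditional.
\begin{lstlisting}[language=Haskell, caption={Conditional combinators derived from \texttt{select}},captionpos=b]
infixl 4 <*?
(<*?) :: Selective f => f (Either a b) -> f (a -> b) -> f b
(<*?) = select

-- run exactly one of the two continuations, depending on the Either value
branch :: Selective f => f (Either a b) -> f (a -> c) -> f (b -> c) -> f c
branch x l r = fmap (fmap Left) x <*? fmap (fmap Right) l <*? r

-- a "lifted if": select between two effectful alternatives
ifS :: Selective f => f Bool -> f a -> f a -> f a
ifS x t e = branch ((\b -> if b then Left () else Right ()) <$> x)
                   (const <$> t) (const <$> e)
\end{lstlisting}{}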
The laws presented in listing \ref{lst:saf} characterise \glspl{saf}. The first law indicates that the \texttt{select} operator should not duplicate any effect associated with \texttt{x}, and the second indicates that \texttt{select} should not add any computation when the first one is pure, which allows it to be distributed.
It is worth noting that there is no law requiring \glspl{saf} to discard the second computation, in particular \texttt{pure (Right x) <*? y = pure x}, nor is there a law requiring the function returned by \texttt{f (a $\rightarrow$ b)} to be applied to the value obtained by the first computation, in particular \texttt{pure (Left x) <*? y = (\$ x) <\$> y}. The explanation for this is simple: this freedom admits instances of \glspl{saf} that are useful for static analysis, and makes the \texttt{select} operator more expressive, in the same way that Applicative Functors do not fix the execution order of two independent computations.
With this in mind, it is possible to see how \glspl{saf} solve the limitation of Applicatives and Monads in the context of static analysis, allowing over-approximation and under-approximation of effects in a circuit with conditional branches. Moreover, \glspl{saf} are useful not only in static contexts but also in dynamic ones, benefiting from speculative execution \citep{andrey2019selective}.
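As an illustration of why these weaker laws matter, the following sketch is in the spirit of the over-approximating static-analysis instance discussed by \cite{andrey2019selective}: it records the effects of \emph{both} arguments of \texttt{select}, something that would be unlawful if skipping the second computation were mandatory (the name \texttt{Over} follows the paper; the code is a simplified sketch).
\begin{lstlisting}[language=Haskell, caption={A sketch of an over-approximating static-analysis instance},captionpos=b]
-- collects effects of type m, ignoring all values
newtype Over m a = Over m

instance Functor (Over m) where
  fmap _ (Over m) = Over m

instance Monoid m => Applicative (Over m) where
  pure _            = Over mempty
  Over x <*> Over y = Over (x <> y)

instance Monoid m => Selective (Over m) where
  -- record the effects of BOTH branches: an over-approximation of the
  -- effects that may possibly be executed
  select (Over x) (Over y) = Over (x <> y)
\end{lstlisting}{}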
From a theoretic point of view, \glspl{saf} can be seen as the composition of an Applicative functor \texttt{f} with the \texttt{Either} monad \citep{andrey2019selective}. Even though this formalisation is not studied by \cite{andrey2019selective}, one should address the relation between \glspl{saf} and Arrows.
%Taking advantage of the first sentence of this paragraph, we can infer that,
Every \gls{saf} is, by definition, an instance of Applicative; conversely, every Applicative functor can also be made an instance of Selective (by always executing both effects). Moreover, as pointed out by \cite{andrey2019selective}, it is possible to implement a specialised version of the \texttt{bind ($\bind$)} operator for any \emph{enumerable} data type, i.e.\ the capacity of \emph{selecting} among all possible cases makes \glspl{saf} equivalent to Monads \citep{gist_2019}. It seems that, like Arrows, given the right conditions, \glspl{saf} are also able to "slide" between Applicatives and Monads. As a matter of fact, \cite{Hughes:2000:GMA:347238.347246} had already come up with an interface that extended Arrows with conditional capabilities, the \texttt{ArrowChoice} type-class.
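A small sketch of this specialised bind for the (trivially enumerable) type \texttt{Bool}, written only in terms of the \texttt{ifS} combinator derived earlier, illustrates the idea.
\begin{lstlisting}[language=Haskell, caption={A specialised bind for an enumerable type, using only the Selective interface},captionpos=b]
-- enumerate all possible values of the argument (here just True and False)
-- and select, with ifS, the continuation that matches the actual value
bindBool :: Selective f => f Bool -> (Bool -> f a) -> f a
bindBool x f = ifS x (f True) (f False)
\end{lstlisting}{}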
Given that there was already an abstraction capable of expressing the same as \glspl{saf}, why did these arise? Arrows are more general and powerful than \glspl{saf} and could be used to solve the static analysis and speculative execution examples presented by \cite{andrey2019selective}. In fact, the build system DUNE \citep{dune_2018} is an example of a successful application of Arrows. However, adding the ability to perform static analysis or speculative execution to a code-base that is not written using the Arrow abstraction becomes more complicated than simply defining a \gls{saf} instance in just a couple of lines. This being said, \glspl{saf} are a "just good enough" solution for providing static analysis of conditional effects and speculative execution without relying on the more powerful and intrusive Arrow abstraction.
\subsection{Summary}\label{sec-summary}
The discrete probability distribution is a particular representation of probability distributions. A distribution is represented by a sampling space, i.e.\ an enumeration of the support together with the probability mass associated with each point.
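In terms of the illustrative \texttt{Dist} type sketched earlier, such a representation is simply a list of outcome/probability pairs, as in the following example.
\begin{lstlisting}[language=Haskell, caption={A fair six-sided die as an enumeration of its support and probability mass},captionpos=b]
-- a fair die: every outcome in the support carries mass 1/6
die :: Dist Int
die = Dist [ (i, 1 / 6) | i <- [1 .. 6] ]
\end{lstlisting}{}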
Discrete distributions are also instances of the Functor type-class, which means that one can take advantage of the \texttt{fmap} operator to map all values (the distribution domain) to others while keeping the distribution structure intact, i.e.\ maintaining the probabilities of each value.
The Applicative instance lets us apply pure functions to distributions. By taking advantage of the Applicative laws, it is possible, for example, to combine two distributions and calculate their joint probability, if one knows that they are independent from each other.
The Monad instance lets us chain distributions, giving the possibility of expressing the calculation of conditional probability.
\begin{table}%[]
\centering
\resizebox{0.6\textwidth}{!}{%
\begin{tabular}{|c|c|c|}
\hline
\textbf{Abstraction} & \textbf{Operators} & \textbf{Probabilistic Equivalent} \\ \hline
Functor & \texttt{fmap f A} & $P(A)$ \\ \hline
Applicative & \begin{tabular}[c]{@{}c@{}}\texttt{pure A}\\ \\ \texttt{A $\ap$ B}\end{tabular} & \begin{tabular}[c]{@{}c@{}}$A$\\ \\ $P(A)P(B)$\end{tabular} \\ \hline
Monad & \begin{tabular}[c]{@{}c@{}}\texttt{return A}\\ \\ \texttt{A $\bind$ B}\end{tabular} & \begin{tabular}[c]{@{}c@{}}$A$\\ \\ $P(B|A) = \frac{P(B \cap A)}{P(A)}$\end{tabular} \\ \hline
Arrow & \begin{tabular}[c]{@{}c@{}}\texttt{arr f}\\ \\ \texttt{A $\bindA$ B}\end{tabular} & \begin{tabular}[c]{@{}c@{}}Stochastic Matrix \texttt{f}\\ \\ Stochastic Matrix Composition\end{tabular} \\ \hline
Selective & \texttt{select A B} & - \\ \hline
\end{tabular}%
}
\caption{Summary of abstractions and their probabilistic counterpart}
\label{tab:my-table-2}
\end{table}
The most prevalent abstractions in \gls{fp} were analysed in order to understand the motivation and theory behind them and how they relate to the probabilistic setting. Table \ref{tab:my-table-2} summarises the relation between each abstraction and its probabilistic counterpart. The conclusion is that mathematical foundations run through all the abstractions addressed and, in particular, \gls{ct} is ubiquitous in programming and \gls{cs} in general. There are cases in which the need for abstraction comes from more practical contexts, calling for a more systematic and disciplined study grounded on sound mathematical frameworks and leading to the design of correct and efficient solutions.
This said, this dissertation is chiefly concerned with identifying which probabilistic interpretations or benefits are achievable with \glspl{saf}. After a detailed analysis of the different abstractions found in the \gls{fp} ecosystem, several starting points are outlined, in order to
% try and achieve the main goal of this thesis: proving
prove that the \gls{saf} abstraction is useful in providing a more efficient solution than Monads to encode probabilities. The static analysis and speculative execution capabilities of \glspl{saf} have proved very useful in optimising certain libraries, as was the case of the Haxl library \citep{Marlow:2014:NFA:2692915.2628144, andrey2019selective}. On the other hand, the adoption of an abstraction weaker than the monadic one may prove to be of value in mitigating the performance constraints that the monadic interface imposes due to its inherently sequential nature.
\section{Related Work}\label{sec-related-work}
This thesis benefits from synergies among computing and mathematics fields such as probability theory, category theory and programming languages. This section reviews similar work in such fields of research.
\subsection{Exhaustive Probability Distribution Encoding}
Over the past few years, the field of probabilistic programming has been primarily concerned with extending language capabilities for expressing probabilistic calculations and with serving as a practical tool for Bayesian modelling and inference \citep{erwig_kollmansberger_2006}. As a result, several languages were created to respond to emerging limitations. Although the observation that probability distributions form a monad is not new, it was not until later that their sequential and compositional nature was explored by \cite{Ramsey:2002:SLC:565816.503288}, \cite{Goodman:2013:PPP:2480359.2429117} and \cite{Gordon:2013:MPB:2429069.2429119}.
\cite{erwig_kollmansberger_2006} were among the first to encode distributions as monads by designing a probability and simulation library based on this concept. The following year, \cite{kidd2007build}, inspired by the work of \cite{Ramsey:2002:SLC:565816.503288}, introduced a modular way of constructing probability monads and showed the power of using monads as an abstraction. Through a set of different monads, he was able to offer ways to calculate probabilities and to explore their compositionality, from discrete distributions to sampling algorithms.
\cite{erwig_kollmansberger_2006}, in their library, used the non-deterministic monad to represent distributions, resulting in an exhaustive approach capable of calculating the exact probabilities of any event. However, common examples of probabilistic programming grow the sample space exponentially and make it impossible to calculate the entire distribution. Despite \cite{larsen2011memory}'s efforts to improve the performance of this library, his approach was still limited to exhaustively calculating all possible outcomes.
Apart from the asymptotically poor performance of the library of \cite{erwig_kollmansberger_2006}, the use of the non-deterministic monad means that its sequential nature does not allow for further optimisations. It was with these two limitations in mind that many probabilistic programming systems were proposed.
\subsection{Embedded Domain Specific Languages}\label{rel-work-edsl}
\glspl{ppl} usually extend an existing programming language. The choice of the base language may depend on many factors such as paradigm, popularity and performance. There are many probabilistic programming languages with different trade-offs \citep{Scibior:2015:PPP:2887747.2804317}, and many of them are restricted so as to ensure that the model has certain properties that make inference fast. The approach followed by these programming languages, such as BUGS \citep{gilks1994language} and Infer.NET \citep{minkainfer}, simplifies writing inference algorithms at the price of reduced expressiveness.
A more generic approach, known as universal probabilistic programming, allows the user to specify any type of model that has a computable prior. The pioneering language was Church \citep{goodman2012church}, a sub-dialect of Scheme. Other examples of probabilistic programming languages include Venture and Anglican \citep{mansinghka2014venture, tolpin2015probabilistic}, both also Scheme sub-dialects.
\cite{Scibior:2015:PPP:2887747.2804317} show that the Haskell functional language is an excellent alternative to the above mentioned languages with regard to Bayesian modelling and the development of inference algorithms. Just as \cite{erwig_kollmansberger_2006}, \cite{Scibior:2015:PPP:2887747.2804317} use monads and develop a practical probabilistic programming library whose performance is competitive with that of the Anglican language. In order to achieve the desired performance, an approach less accurate than exhaustive calculation of probabilities is used: sampling. This work by \cite{Scibior:2015:PPP:2887747.2804317}, also elaborated in the first author's doctoral dissertation \citep{Scibior19}, later gave rise to a more modular extension of the library presented in previous work, in order to separate modelling from inference \citep{scibior2018functional}. Despite the results obtained, both solutions suffer from the fact that they use monads only to construct probability distributions. Since monads are inherently sequential, they are unable to exploit parallelism in the sampling of two independent variables.
\cite{jtobin} contributes to the investigation of embedded probabilistic programming languages, which have the advantage of benefiting from various features for free, such as the parser, the compiler and the host language's library ecosystem. Moreover, \cite{jtobin} studies the functorial and applicative nature of the Giry monad and highlights its various capabilities by mapping them to the probabilistic setting. He uses free monads, a novel technique for embedding a statically typed probabilistic programming language into a purely functional language, obtaining a syntax based on the Giry monad, and uses free applicative functors to be able to express statistical independence and explore its parallel nature. Notwithstanding the progress and studies shown, \cite{jtobin} does not address the more recent \gls{saf} abstraction, nor does it fill the gap on how \glspl{saf} fit into a probabilistic context in order to benefit from their properties.
\section{Structure of the Dissertation}
This text is structured in the following way: this chapter provides the context, motivation and overall goals of the dissertation. It also presents a review of the state of the art and related work.
%related to the subject of this dissertation.
Chapter \ref{ch-background} introduces the most relevant background topics and chapter \ref{ch-problem} explains in more detail the problem at target and its main challenges. Chapters \ref{sec-current-work} and \ref{ch-applications} present all details of the implemented solution, as well as some application examples and evaluation results. Finally, chapter \ref{ch-conclusion} presents conclusions and guidelines for future work.
\chapter{Background}\label{ch-background}
This chapter shines a light on probabilities and their foundations. The aim is to provide readers with a refresher and some intuition, saving them from the need to resort to heavy textbooks. While further reading is required for a full understanding of the whole background, this chapter can easily be skipped by readers already familiar with these subjects.
\section{Set Theory}
The field of probability theory is the basis of statistics, providing the means to model social and economic behaviour, to draw inferences from scientific experiments, and much else. Through these models, statisticians are able to draw inferences from the examination of only a part of the whole.
Just as statistics was built upon the foundations of probability theory, probability theory was built upon set theory. Statisticians aim at drawing conclusions about populations of objects by making observations or conducting experiments, for which they need to identify all possible outcomes in the first place, the \emph{sample space}.
\begin{definition}{}
The set $S$ of all possible outcomes of an experiment is called the \emph{sample space}.
\end{definition}
The next step, after the sample space is defined, is to consider collections of possible outcomes of an experiment.
\begin{definition}{}
An \emph{event} $E$ is any collection of possible outcomes of an experiment, i.e.\ any subset of $S$ ($E \subseteq S$).
\end{definition}
As the reader surely knows, there are several elementary operations on sets (or events):
\vskip0.2cm
\textbf{Union:} The union of two events, $A \cup B$, is the set of elements that belong to either $A$ or $B$, or both:
\begin{equation}
A \cup B = \{x : x \in A\ or\ x \in B\}
\end{equation}{}
\textbf{Intersection:} The intersection of two events, $A \cap B$, is the set of elements that belong to both $A$ and $B$:
\begin{equation}
A \cap B = \{x : x \in A\ and\ x \in B\}
\end{equation}{}
\textbf{Complementation:} The complement of an event $A$, $A^c$, is the set of all elements that are not in $A$:
\begin{equation}
A^c = \{x : x \notin A \}
\end{equation}{}
These elementary operations can be combined and behave much like numbers:
\begin{theorem}{}
\begin{align*}
&1.\ Commutativity && A \cup B = B \cup A \\
& && A \cap B = B \cap A \\
&2.\ Associativity && A \cup (B \cup C) = (A \cup B) \cup C \\
& && A \cap (B \cap C) = (A \cap B) \cap C \\
&3.\ Distributive\ Laws && A \cap (B \cup C) = (A \cap B) \cup (A \cap C) \\
& && A \cup (B \cap C) = (A \cup B) \cap (A \cup C) \\
&4.\ DeMorgan's\ Laws && (A \cup B)^c = A^c \cap B^c \\
& && (A \cap B)^c = A^c \cup B^c
\end{align*}{}
\end{theorem}
The reader is referred to \cite{CaseBerg:01} for the proofs of these properties.
\section{Basic Probabilities and Distributions}
Probabilities come up rather often in our daily lives, and they are not only of use to statisticians. A good understanding of probability theory allows us to assess the likelihood of everyday events and to make better-informed choices. Probability theory is also useful in the fields of economics, medicine, science and engineering, and in risk analysis. For example, the design of a nuclear reactor must be such that a leak of radioactivity into the environment is an extremely rare event. Using probability theory as a tool to deal with uncertainty, the reactor can be designed so that an unacceptable amount of radiation is expected to escape no more than once in a billion years.
\paragraph{What Probabilities are}
When an experiment is performed, its realisation results in an outcome from the sample space. If the experiment is repeated multiple times, the outcome may vary from repetition to repetition. This "frequency of occurrence" of an event can be seen as a \emph{probability}.
However, this "frequency of occurrence" is just one of many interpretations of what a probability is; another, more subjective one regards a probability as the \emph{belief} that an event will occur.
For each event $A$ in the sample space $S$, a number in the interval $[0, 1]$, called the probability of $A$ and denoted $P(A)$, is associated with $A$. The domain of $P$, which intuitively is the set of all subsets of $S$, is called a \emph{sigma algebra}, denoted by $\mathscr{B}$.%
\begin{definition}{A collection of subsets of $S$ is a sigma algebra, denoted $\mathscr{B}$, if it satisfies the following properties:}
\begin{align*}
&1.\ \emptyset \in \mathscr{B}\ \text{(the empty set is an element of $\mathscr{B}$)}\\
&2.\ \text{If}\ A \in \mathscr{B}\ \text{, then}\ A^c \in \mathscr{B}\ \text{($\mathscr{B}$ is closed under complementation)}\\
&3.\ \text{If}\ A_1, A_2, ... \in \mathscr{B}\ \text{, then}\ \bigcup_{i=1}^{\infty} A_i \ \in \mathscr{B}\ \text{($\mathscr{B}$ is closed under countable unions)}
\end{align*}{}
\end{definition}{}
\begin{example}{(Sigma algebras)}
If $S$ has $n$ elements, then $\mathscr{B}$ has $2^n$ elements. For example, if $S = \{1,2,3\}$, then $\mathscr{B}$ is the collection of $2^3 = 8$ sets:
\begin{align*}
&\{1\}\quad \{1,2\}\quad \{1,2,3\} \\
&\{2\}\quad \{1,3\}\quad \emptyset \\
&\{3\}\quad \{2,3\}
\end{align*}{}
If $S$ is uncountable (e.g.\ $S = (-\infty, +\infty)$), then $\mathscr{B}$ is usually taken to be the smallest sigma algebra that contains all intervals of the form:
\begin{align*}
[a, b]\quad [a, b)\quad (a, b]\quad (a, b)
\end{align*}{}
\end{example}{}
Given this, $P(\cdot)$ can now be defined as a function from $\mathscr{B}$ to $[0,1]$. This probability measure must assign to each event $A$ a probability $P(A)$ and abide by the following properties:
\begin{definition}{Given a sample space $S$ and an associated sigma algebra $\mathscr{B}$, a probability function $P$ satisfies:}
\begin{align*}
& 1.\ P(A) \in [0, 1]\ \text{, for all}\ A \in \mathscr{B} \\
& 2.\ P(\emptyset) = 0\ \text{(i.e. if $A$ is the empty set, then $P(A) = 0$)} \\
& 3.\ P(S) = 1\ \text{(i.e. if $A$ is the entire sample space, then $P(A) = 1$)} \\
& 4.\ \text{$P$ is \emph{countably additive}, meaning that if, $A_1$, $A_2$, ...} \\
& \quad \text{is a finite or countable sequence of \emph{disjoint} events, then:}\\
& \qquad\qquad P(A_1 \cup A_2 \cup ....) = P(A_1) + P(A_2) + \text{...}
\end{align*}{}
\end{definition}{}
These properties are the Axioms of Probability (also known as the Kolmogorov Axioms), and every function that satisfies them is called a probability function. The first three are fairly intuitive and easy to understand. The fourth, the \emph{Axiom of Countable Additivity}, is more subtle: it says that one can calculate the probability of a complicated event by adding up the probabilities of smaller events, provided those smaller events are disjoint and together make up the entire complicated event.
\subsubsection{Calculus of Probabilities}
Many properties of a probability function follow from the Axioms of Probability and are useful for calculating more complex probabilities. In particular, the additivity property automatically implies certain basic properties that are true for any probability model.
Taking a look at $A$ and $A^c$ one can see that they are always disjoint, and their union is the entire sample space: $A \cup A^c = S$. By the additivity property one has $P(A \cup A^c) = P(A) + P(A^c) = P(S)$, and since $P(S) = 1$ is known, then $P(A \cup A^c) = 1$ or:
\begin{align}
P(A^c) = 1 - P(A)
\end{align}{}
\noindent In words: the probability of an event not happening is equal to one minus the probability of the event happening.
% We can already see how this property can be useful, and in fact there are a lot others that follow from other properties.
\begin{theorem}{Let $A_1$, $A_2$, ... be events that form a partition of the sample space $S$. Let $B$ be any event, then:}
\begin{align*}
P(B) = P(A_1 \cap B) + P(A_2 \cap B) + ...
\end{align*}{}
\end{theorem}
\begin{theorem}{Let $A$ and $B$ be two events such that $B \subseteq A$. Then:}
\begin{align}\label{2-1-3}
P(A) = P(B) + P(A \cap B^c)
\end{align}{}
\end{theorem}
\noindent Since $P(A \cap B^c) \geq 0$ always holds, one can draw from this:
\begin{corollary}{(Monotonicity) Let $A$ and $B$ be two events such that $B \subseteq A$. Then:}
\begin{align*}
P(A) \geq P(B)
\end{align*}{}
\end{corollary}
\noindent Moreover, by rearranging (\ref{2-1-3}) one obtains:
\begin{corollary}{Let $A$ and $B$ be two events such that $B \subseteq A$. Then:}
\begin{align*}
P(A \cap B^c) = P(A) - P(B)
\end{align*}{}
\end{corollary}
\noindent Finally, by lifting the constraint $B \subseteq A$ one has the following, more general property:
\begin{theorem}{(Principle of inclusion–exclusion, two-event version) Let $A$ and $B$ be two events. Then:}
\begin{align}\label{t-2-1-4}
P(A \cup B) = P(A) + P(B) - P(A \cap B)
\end{align}{}
\end{theorem}
% Property \ref{2-1-4} gives an useful inequality for the probability of an intersection.
Since $P(A \cup B) \leq 1$, property (\ref{t-2-1-4}) leads to (after some rearranging):
\begin{align}\label{al-6}
P(A \cap B) \geq P(A) + P(B) - 1
\end{align}{}
This inequality is a special case of what is known as \emph{Bonferroni's inequality}. Altogether, one can say that the basic properties of total probability, subadditivity, and monotonicity hold. The interested reader is referred to \citep{CaseBerg:01} or \citep{RePEc:bes:amstat:v:59:y:2005:m:august:p:276-276} for proofs and more details concerning the theorems above.
\subsubsection{Counting and Enumerating Outcomes}
Counting methods can be used to assess probabilities in finite sample spaces. In general, counting is non-trivial, often needing constraints to be taken into account. The approach is to break counting problems into easy-to-count sub-problems and use some combination rules.
\begin{theorem}{(Fundamental Theorem of Counting)}\label{t-2-1-5}
If a job consists of $k$ tasks, where the $i$-th task can be done in $n_i$ ways, $i = 1,...,k$, then the whole job can be done in $n_1 \times n_2 \times ... \times n_k$ ways.
\end{theorem}{}
Although theorem \ref{t-2-1-5} is a good starting point, in some situations there are more aspects to consider. In a lottery, for instance, the first number can be chosen in 44 ways and the second in 43 ways, making a total of $44 \times 43 = 1892$ ways. However, if the player could pick the same number twice then the first two numbers could be picked in $44 \times 44 = 1936$ ways. This shows the distinction between counting \emph{with replacement} and counting \emph{without replacement}. There is a second important aspect in any counting problem: whether or not the \emph{order} of the tasks matters. Taking these considerations into account, it is possible to construct a $2 \times 2$ table of possibilities.
Back to the lottery example, one can express all the ways a player can pick 6 numbers out of 44, under the four possible cases:
\begin{itemize}
\item \emph{ordered, without replacement} - Following theorem \ref{t-2-1-5} the first number can be picked in 44 ways, the second in 43, etc. So, there are:
\begin{align*}
44 \times 43 \times 42 \times 41 \times 40 \times 39 = \frac{44!}{38!} = 5082517440
\end{align*}{}
\item \emph{ordered, with replacement} - Each number can be picked in 44 ways, so:
\begin{align*}
44 \times 44 \times 44 \times 44 \times 44 \times 44 = 44^6 = 7256313856
\end{align*}{}
\item \emph{unordered, without replacement} - Since the number of ways of picking the numbers when order matters is already known, one just needs to divide out the redundant orderings. Following theorem \ref{t-2-1-5}, 6 numbers can be rearranged in $6!$ ways, so:
\begin{align*}
\frac{44 \times 43 \times 42 \times 41 \times 40 \times 39}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = \frac{44!}{6!38!} = 7059052
\end{align*}{}
\end{itemize}{}
This last form of counting is so frequent that there is a special notation for it:
\begin{definition}{For non-negative integers $n$ and $r$, where $n \geq r$, the symbol $\binom{n}{r}$, read \emph{$n$ choose $r$}, is defined as:}
\begin{align*}
\binom{n}{r} = \frac{n!}{r!(n-r)!}
\end{align*}{}
\end{definition}{}
\begin{itemize}
\item \emph{unordered, with replacement} - %This is the most difficult case to count.
To count this more difficult case, it is easier to think of placing 6 markers into 44 boxes. It has been observed \citep{feller-vol-2} that all one needs to keep track of is the arrangement of the markers and the interior walls of the boxes. Therefore, there are 43 (interior walls) $+$ 6 (markers) $=$ 49 objects, which can be arranged in $49!$ ways. The redundant orderings still need to be divided out, so:
\begin{align*}
\frac{49!}{6!43!} = 13983816
\end{align*}{}
\end{itemize}{}
The following table summarises these situations:
\begin{table}[h]
\centering
\resizebox{0.6\textwidth}{!}{%
\begin{tabular}{@{}ccc@{}}
\toprule
\multicolumn{1}{l}{} & Without replacement & With replacement \\ \midrule
Ordered & $\frac{n!}{(n-r)!}$ & $n^r$ \\
Unordered & $\binom{n}{r}$ & $\binom{n+r-1}{r}$ \\ \bottomrule
\end{tabular}%
}
\caption{Number of possible arrangements of size $r$ from $n$ objects}
\label{tab:my-table}
\end{table}
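As a quick sanity check of the formulas in table \ref{tab:my-table}, the following Haskell sketch (purely illustrative; all function names are hypothetical) computes the four entries for the lottery example of picking $r = 6$ numbers out of $n = 44$:
\begin{verbatim}
-- Illustrative sketch: the four counting formulas of the table,
-- checked against the lottery example (r = 6 out of n = 44).

factorial :: Integer -> Integer
factorial k = product [1 .. k]

-- Ordered, without replacement: n! / (n - r)!
orderedNoRep :: Integer -> Integer -> Integer
orderedNoRep n r = factorial n `div` factorial (n - r)

-- Ordered, with replacement: n ^ r
orderedRep :: Integer -> Integer -> Integer
orderedRep n r = n ^ r

-- Unordered, without replacement: n choose r
choose :: Integer -> Integer -> Integer
choose n r = factorial n `div` (factorial r * factorial (n - r))

-- Unordered, with replacement: (n + r - 1) choose r
unorderedRep :: Integer -> Integer -> Integer
unorderedRep n r = choose (n + r - 1) r

main :: IO ()
main = do
  print (orderedNoRep 44 6)   -- 5082517440
  print (orderedRep   44 6)   -- 7256313856
  print (choose       44 6)   -- 7059052
  print (unorderedRep 44 6)   -- 13983816
\end{verbatim}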
Counting techniques are useful when the sample space is finite and all outcomes in $S$ are equally probable. In that case, the probability of an event can be calculated by counting its outcomes. For $S = \{s_1, ..., s_N\}$, saying that all the elements are equally probable means $P(\{s_i\}) = \frac{1}{N}$. From the Axiom of Countable Additivity, for any event $A$:
\begin{align*}
P(A) = \frac{\text{\# of elements in $A$}}{\text{\# of elements in $S$}}
\end{align*}{}
\subsubsection{Conditional Probability and Independence}
All probabilities dealt with so far were unconditional. There are situations in which it is desirable to \emph{update the sample space based on new information}, that is to calculate conditional probabilities.
\begin{definition}{If $A$ and $B$ are events in $S$, and $P(B) > 0$, then the conditional probability of $A$ given $B$ is:}
\begin{align}\label{d-2-1-6}
P(A|B) = \frac{P(A \cap B)}{P(B)}
\end{align}{}
\end{definition}{}
It is worth noting that what happens in a conditional probability calculation is that $B$ becomes the sample space ($P(B|B) = 1$). The intuition is that the event $B$ will occur a fraction $P(B)$ of the time, and both $A$ and $B$ will occur a fraction $P(A \cap B)$ of the time; so the ratio $P(A \cap B)/P(B)$ gives the \emph{proportion of times} when both $B$ and $A$ occur.
Rearranging (\ref{d-2-1-6}) gives a useful way to calculate intersections:
\begin{align}\label{a-7}
P(A \cap B) = P(A|B)P(B)
\end{align}{}
By symmetry with (\ref{a-7}), $P(A \cap B) = P(B|A)P(A)$ also holds; equating the two right-hand sides and dividing by $P(B)$ yields:
\begin{theorem}{(Bayes' Theorem) Let $A$ and $B$ be two events, each with positive probability. Then:}
\begin{align}\label{t-bayes}
P(A|B) = P(B|A)\frac{P(A)}{P(B)}
\end{align}{}
\end{theorem}{}
There might be cases where an event $B$ does not have any impact on the probability of another event $A$: $P(A|B) = P(A)$. If this holds, then by Bayes' rule (\ref{t-bayes}):
\begin{align*}
P(B|A) = P(A|B)\frac{P(B)}{P(A)} = P(A)\frac{P(B)}{P(A)} = P(B)
\end{align*}{}
In this case $A$ and $B$ are said to be \emph{independent}; by (\ref{a-7}), independence is equivalent to $P(A \cap B) = P(A)P(B)$.
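To make these notions concrete, here is a minimal Haskell sketch (illustrative only; the \texttt{prob} and \texttt{condProb} helpers are hypothetical names and assume a finite sample space with equally likely outcomes) that computes probabilities by counting and checks Bayes' theorem numerically for one roll of a fair die:
\begin{verbatim}
type Event a = a -> Bool

-- P(A) = (number of outcomes in A) / (number of outcomes in S),
-- assuming all outcomes are equally likely.
prob :: [a] -> Event a -> Double
prob space a =
  fromIntegral (length (filter a space)) / fromIntegral (length space)

-- P(A|B) = P(A and B) / P(B), assuming P(B) > 0.
condProb :: [a] -> Event a -> Event a -> Double
condProb space a b = prob space (\x -> a x && b x) / prob space b

main :: IO ()
main = do
  let die    = [1 .. 6] :: [Int]
      isEven = even       -- event A: the outcome is even
      isBig  = (> 3)      -- event B: the outcome is greater than 3
  print (prob die isEven)              -- 0.5
  print (condProb die isEven isBig)    -- P(A|B) = 2/3
  -- Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
  print (condProb die isBig isEven * prob die isEven / prob die isBig)
\end{verbatim}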
%\subsubsection{Distribution functions}
%\subsection{Statistical and Bayesian inference}
%\subsection{Probabilistic programming}
%\section{Category Theory}
\section{(Linear) Algebra of Programming}
\gls{laop} is a quantitative extension of the \gls{aop} discipline, which treats functions as special cases of binary relations. The extension generalises relations to matrices, regarding them as arrows, i.e.\ morphisms typed by the dimensions of the matrix. This is important as it paves the way to a categorical approach, which is the starting point for the development of an advanced type system for linear algebra and its operators.
Central to the approach is the notion of a \emph{biproduct}, which merges categorical products and coproducts into a single construction. Careful analysis of the biproduct axioms as a system of equations provides a rich palette of constructs for building matrices from smaller ones, regarded as blocks.
By regarding a matrix as a morphism between two dimensions, matrix multiplication becomes simple matrix composition:
\begin{center}
\begin{tikzcd}
m & n \arrow[l, "A"'] & q \arrow[l, "B"'] \arrow[ll, "C = A \cdot B", bend left]
\end{tikzcd}
\end{center}
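The following Haskell sketch illustrates this view (it is illustrative only and is not the library developed in this work): a matrix type indexed by two type-level dimensions, whose composition is ordinary matrix multiplication and only type checks when the intermediate dimensions agree.
\begin{verbatim}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}

import Data.List (transpose)
import GHC.TypeLits (Nat)

-- A matrix seen as a morphism from its number of columns to its number
-- of rows; both dimensions are recorded as phantom type-level naturals.
newtype Matrix (cols :: Nat) (rows :: Nat) = Matrix [[Double]]
  deriving Show

-- Composition is matrix multiplication:  A . B : q -> m,
-- given A : n -> m and B : q -> n.  Mismatched dimensions are rejected
-- by the type checker.
compose :: Matrix n m -> Matrix q n -> Matrix q m
compose (Matrix a) (Matrix b) =
  Matrix [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]

exampleA :: Matrix 3 2           -- A : 3 -> 2 (2 rows, 3 columns)
exampleA = Matrix [[1, 2, 3], [4, 5, 6]]

exampleB :: Matrix 2 3           -- B : 2 -> 3 (3 rows, 2 columns)
exampleB = Matrix [[1, 0], [0, 1], [1, 1]]

main :: IO ()
main = print (compose exampleA exampleB)   -- C = A . B : 2 -> 2
\end{verbatim}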
Since this discipline is based on \gls{ct}, some basic familiarity with categories $\mathbb{C}$, $\mathbb{D}$, functors $\mathbb{F}, \mathbb{G} : \mathbb{C} \rightarrow \mathbb{D}$, natural transformations $\alpha, \beta : \mathbb{F} \rightarrow \mathbb{G}$, products and coproducts, is assumed. The reader is referred to e.g.\ 
\citep{Awodey:2010:CT:2060081}, \citep{jno-5}
%. A more detailed overview of the topic can also be seen in the work done by
and \citep{Macedo2012MatricesAA} for more details.
\subsection{Basic Structure of the Category of Matrices}
\subsubsection{Vectors} Whenever one of the dimensions of a matrix is $1$, the matrix is referred to as a \emph{vector}. In more detail, a matrix of type $m \leftarrow 1$ is a column vector, and one of type $1 \leftarrow m$ is a row vector.
\subsubsection{Identity} The identity matrix has type $n \leftarrow n$. For every object $n$ in the category there must be a morphism of this type, which will be denoted by $n \xleftarrow{id_n} n$.
\subsubsection{Transposed Matrix} The transposition operator changes the matrix shape by swapping rows with columns. Type-wise, this means converting an arrow of type $n \xleftarrow{A} m$ into an arrow of type $m \xleftarrow{A^\circ} n$.
\subsubsection{Bilinearity} Given two matrices of the same type it is possible to add them up entry-wise, leading to $A + B$, with the matrix wholly filled with $0$'s as the unit of addition. This zero matrix behaves as one would expect with respect to addition and matrix composition:
\begin{align*}
& A + 0 = A = 0 + A \\
& A \cdot 0 = 0 = 0 \cdot A
\end{align*}
In fact, matrices form an Abelian category, in which composition is bilinear with respect to addition:
\begin{align*}
& A \cdot (B + C) = A \cdot B + A \cdot C\\
& (B + C) \cdot A = B \cdot A + C \cdot A
\end{align*}
\subsection{Biproducts}
In an Abelian category, a biproduct diagram for the objects $m$, $n$ is a diagram of the following shape
\begin{center}
\begin{tikzcd}
m \arrow[r, "i_1"', bend right] & r \arrow[l, "\pi_1"', bend right] \arrow[r, "\pi_2", bend left] & n \arrow[l, "i_2", bend left]
\end{tikzcd}
\end{center}{}
\noindent whose arrows $\pi_1$, $\pi_2$, $i_1$, $i_2$, satisfy the following laws:
\begin{align*}
&\pi_1 \cdot i_1 = id_m \\
&\pi_2 \cdot i_2 = id_n \\
&i_1 \cdot \pi_1 + i_2 \cdot \pi_2 = id_r \\
&\pi_1 \cdot i_2 = 0 \\
&\pi_2 \cdot i_1 = 0
\end{align*}
How do biproducts relate to products and coproducts in the category? The diagram and definitions below depict how products and coproducts arise from biproducts (the product diagram is in the lower half; the upper half is the coproduct one):
\begin{center}
\begin{tikzcd}
& & b & & \\
a \arrow[rr, "i_1"', bend left] \arrow[rru, "A", bend left] & & a + b \arrow[u, "{\left[A|B\right]}"'] \arrow[ll, "\pi_1", bend left] \arrow[rr, "\pi_2"', bend right] & & b \arrow[ll, "i_2", bend right] \arrow[llu, "B"', bend right] \\
& & c \arrow[u, "{\left[\frac{C}{D}\right]}"'] \arrow[llu, "C", bend left] \arrow[rru, "D"', bend right] & &
\end{tikzcd}
\end{center}{}
By analogy with the \gls{aop}, expressions $\left[A | B\right]$ and $\left[\tfrac{C}{D}\right]$ will be read 'A junc B' and 'C split D', respectively. These combinators capture the effect of putting matrices side by side or stacking them on top of each other, respectively. From the rich algebra of such combinators, the following laws are very useful, where capital letters $A$, $B$, $P$, etc.\ denote suitably typed matrices (the types, i.e.\ dimensions, involved in each equality can be inferred by drawing the corresponding diagram); a small code sketch of these combinators is given right after the list of laws:
\begin{itemize}
\item Fusion laws:
\begin{align}\label{fusion}
& P \cdot \left[A|B\right] = \left[P \cdot A | P \cdot B\right] \\
& \left[\frac{A}{B}\right] \cdot P = \left[\frac{A \cdot P}{B \cdot P}\right]
\end{align}
\item Divide and conquer:
\begin{equation}\label{div-conq}
\left[A | B\right] \cdot \left[\frac{C}{D}\right] = A \cdot C + B \cdot D
\end{equation}
\item Converse duality:
\begin{equation}
\left[A | B\right]^\circ = \left[\frac{A^\circ}{B^\circ}\right]
\end{equation}
\item Exchange ("Abide") law:
\begin{equation}\label{abide}
\left[\left[\frac{A}{C}\right] | \left[\frac{B}{D}\right]\right] = \left[\frac{\left[A | B\right]}{\left[C | D\right]}\right]
\end{equation}
\end{itemize}{}
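A minimal Haskell sketch of the two block combinators (plain lists of rows; the helper names are hypothetical) may help fix intuitions; it also checks the divide-and-conquer law (\ref{div-conq}) on a small instance:
\begin{verbatim}
import Data.List (transpose)

type M = [[Double]]   -- a matrix as a list of rows

-- Matrix multiplication and entry-wise addition.
mul :: M -> M -> M
mul a b = [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]

add :: M -> M -> M
add = zipWith (zipWith (+))

-- [A|B]: put A and B side by side (same number of rows).
junc :: M -> M -> M
junc = zipWith (++)

-- [A/B]: stack A on top of B (same number of columns).
split :: M -> M -> M
split = (++)

main :: IO ()
main = do
  let a = [[1, 2], [3, 4]]
      b = [[0, 1], [1, 0]]
      c = [[1], [1]]
      d = [[2], [3]]
  print (mul (junc a b) (split c d))   -- [A|B] . [C/D]
  print (add (mul a c) (mul b d))      -- A . C + B . D  (same result)
\end{verbatim}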
\subsection{Biproduct Functors}
As in the relational setting of the standard \gls{aop}, the biproduct presented above gives rise to the coproduct bifunctor that joins two matrices (which is usually known as \emph{direct sum}):
\begin{center}
\begin{tikzcd}
& {A \oplus B = \left[i_1 \cdot A | i_2 \cdot B \right]} & \\
k & k+j & j \\
n \arrow[u, "A"] & n + m \arrow[u, "A \oplus B" description] & m \arrow[u, "B"']
\end{tikzcd}
\end{center}{}
The well-known Kronecker product is the tensor product in matrix categories. In the context of \gls{laop}, this bifunctor may be expressed in terms of the Khatri-Rao product which, in turn, can be expressed in terms of the Hadamard (Schur) matrix product:
\begin{center}
\begin{tikzcd}
k & k \times j & j \\
n \arrow[u, "A"] & n \times m \arrow[u, "A \otimes B" description] & m \arrow[u, "B"']
\end{tikzcd}
\end{center}{}
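For reference, a direct (and purposely naive) Haskell sketch of the Kronecker product over list-of-lists matrices is shown below; the \texttt{kronecker} name is hypothetical:
\begin{verbatim}
type M = [[Double]]

-- Kronecker product: if A : n -> k and B : m -> j,
-- then A (x) B : n*m -> k*j, with entries A(i,j) * B(k,l).
kronecker :: M -> M -> M
kronecker a b = [ [ x * y | x <- rowA, y <- rowB ] | rowA <- a, rowB <- b ]

main :: IO ()
main = mapM_ print (kronecker [[1, 2], [3, 4]] [[0, 1], [1, 0]])
-- [0.0,1.0,0.0,2.0]
-- [1.0,0.0,2.0,0.0]
-- [0.0,3.0,0.0,4.0]
-- [3.0,0.0,4.0,0.0]
\end{verbatim}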
\section{Stochastic Matrices}\label{sec-stoch-mat}
Functions are special cases of relations (the deterministic, totally defined ones). Relations, however, can also be considered as special cases of functions (the set-valued ones), as captured by the following universal property:
\begin{align}\label{func-rel}
f = \Lambda R \equiv \langle \forall\ b,a\ ::\ b R a \equiv b \in f\ a \rangle
\end{align}
\noindent This implies that a binary relation $R$ can be expressed uniquely by the $\Lambda R$ function, which yields the (possibly empty) set of all $b$ that $R$ relates to $a$ for a given input $a$. Dually, any set-valued function $f$ "is" a relation that relates every input to any of its \emph{possible} outputs.
Note the word 'possible' in the previous paragraph: it means that any outcome may be output, but nothing is said about which outputs are more \emph{probable} than others. Even if one were able to foresee such a probability or tendency, how would it be expressed?
Written in terms of types, (\ref{func-rel}) is the isomorphism:
% https://tikzcd.yichuanshen.de/#N4Igdg9gJgpgziAXAbVABwnAlgFyxMJZABgBpiBdUkANwEMAbAVxiRAEEACAHW4CcsAcwAWOOnz4QA7j24BbOjmFwAxn2AAFAL6cAQiC2l0mXPkIoATOSq1GLNl14CRYidL0GjIDNjwEiAIzW1PTMrIggvCoEggY2MFCC8ESgAGaSckhkIDgQSEEgAEYwYFBIAMzZDHTFDBomfuYgDDCpOCAhduEgABS8WGCyKlAQOACUnmkZ+dS5WdTFpRVVNTB1DWZsLW0dtmFsvAAydHKFUHRxWkA
\begin{equation}\label{func-rel-2}
\begin{tikzcd}
A \rightarrow \mathscr{P} B \arrow[rr, "(\in \cdot)", bend left] & \cong & A \rightarrow B \arrow[ll, "\Lambda", bend left]
\end{tikzcd}
\end{equation}
\noindent $A \rightarrow \mathscr{P}B$ is the functional type that can also be written $(\mathscr{P}B)^{A}$, where $\mathscr{P}B$ denotes the power set of $B$, and $A \rightarrow B$ is the relational type of all relations $R \subseteq B \times A$. Operator $\Lambda$, termed the \emph{power transpose} \citep{Bird:1997:AP:248932,oliveira2012towards, freyd1990categories}, defines the isomorphism from right to left. Since $\mathscr{P}B$ is isomorphic to $2^{B}$, which is the set of all $B$ predicates, one might write $A \rightarrow 2^{B}$ for the type of $f$ in (\ref{func-rel}), where $2 = \{0,1\}$ is the set of truth values ($0$ is false and $1$ is true). So, for each input $a \in A$, $f\ a$ is a predicate that tells which outputs $b \in B$ are possible.
With $2^B$ one is able to tell which outputs can be issued but not how likely they are. Ranking output probability can be achieved by extending from $B$ predicates to $[0,1]^{B}$ distributions, where $[0,1]$ denotes the interval of real numbers between $0$ and $1$. That is, one extends the discrete set $\{0,1\}$ to the corresponding interval of real numbers. Not every function $\mu \in [0,1]^{B}$ qualifies: only those such that $\sum\limits_{b \in B} \mu\ b = 1$ holds. By defining
\begin{align}\label{func-rel-3}
\mathscr{D}B = \{\mu \in [0,1]^{B}\ |\ \sum\limits_{b \in B} \mu\ b = 1 \}
\end{align}
\noindent $A \rightarrow \mathscr{D} B$ will be regarded as the type of all probabilistic functions from $A$ to $B$. Probabilistic functions have been around in various guises. For $B = A$ they can be regarded as Markov chains.
In what way does the \gls{aop} extend to probabilistic functions? In the same way: one can look for an isomorphism similar to (\ref{func-rel}), this time with $\mathscr{D} B$ instead of $\mathscr{P} B$. This is not difficult to accomplish: just write $(\mathscr{D} B)^{A}$ instead of $A \rightarrow \mathscr{D} B$ and extend $\mathscr{D}B$ to $[0,1]^{B}$, temporarily leaving aside the requirement captured by the summation in (\ref{func-rel-3}): by uncurrying, $([0,1]^{B})^{A}$ is isomorphic to $[0,1]^{B \times A}$, which can be considered as the mathematical space of all $[0,1]$-valued matrices with as many columns as elements in $A$ and rows as elements in $B$. Thus, given the probabilistic function $A \xrightarrow[]{f} \mathscr{D}B$, its matrix transform $\llbracket f \rrbracket$ is the unique matrix $M$ such that:
%
\begin{align}
M = \llbracket f \rrbracket \equiv \langle \forall b,a\ ::\ M(b,a)\ = (f\ a)\ b \rangle
\end{align}
Recalling (\ref{func-rel-3}), each matrix of this kind is such that all its columns add up to $1$, i.e.\ it is \emph{left-stochastic}. This gives rise to a \emph{typed} linear algebra of programming in which matrices replace relations and which can be used to express and reason about (recursive) probabilistic functions \citep{oliveira2012towards}.
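The matrix transform $\llbracket \cdot \rrbracket$ can be illustrated with a small Haskell sketch (hypothetical names; finite distributions represented as weighted lists): each column of the resulting matrix is the distribution produced by the probabilistic function for one input, and therefore adds up to $1$.
\begin{verbatim}
-- A finite distribution as a list of (value, probability) pairs.
type Dist b = [(b, Double)]

-- The matrix transform of a probabilistic function f : A -> Dist B,
-- given finite enumerations of A and B:  M(b, a) = (f a) b.
-- The matrix is laid out as a list of rows, one row per element of B.
toMatrix :: Eq b => [a] -> [b] -> (a -> Dist b) -> [[Double]]
toMatrix as bs f = [ [ probOf b (f a) | a <- as ] | b <- bs ]
  where probOf b = sum . map snd . filter ((== b) . fst)

-- A toy probabilistic function: a "noisy negation" that flips its input
-- with probability 0.9 and leaves it unchanged with probability 0.1.
noisyNot :: Bool -> Dist Bool
noisyNot x = [(not x, 0.9), (x, 0.1)]

main :: IO ()
main = mapM_ print (toMatrix [False, True] [False, True] noisyNot)
-- [0.1,0.9]   (row b = False)
-- [0.9,0.1]   (row b = True)
-- Each column adds up to 1: the matrix is left-stochastic.
\end{verbatim}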
\section{Summary}
Common knowledge indicates that probabilities are numerical descriptions of how likely an event is to occur, or of how likely a hypothesis is to be true. The probability of an occurrence is a number between 0 and 1, where 0 indicates the impossibility of the event and 1 indicates certainty. It is easy to reason about such outcomes for simple events, e.g.\ a flip of a coin or a game of cards, but most real-life situations require care and rigour.
%With this being said,
Set theory is the mathematical framework on which probability theory and its calculus are founded, by expressing them through a set of axioms in a rigorous mathematical manner. Without this basis, abstractions such as \gls{laop}, which allow one to reason about probabilities in a more compositional, generic, higher-level manner, would not be possible. With regard to the computational aspect of probabilities and probabilistic programming, linear algebra is closer than one might imagine: thanks to matrices and these mathematical foundations, correct, efficient and compositional solutions can be built.
The main purpose of the current chapter was to provide a path through basic probability theory, its foundations and its common vocabulary, followed by a brief introduction to the linear algebra of programming and to stochastic matrices.
\chapter{Contribution}\label{sec-current-work}
This chapter presents the main contributions of the work carried out in this master's project and discusses its major obstacles and difficulties.
%of this master's dissertation, as well as its main contributions, that arose from the study and work carried out during the master's degree.
As will be seen later, the contributions range over theoretical and practical aspects of the problem being addressed. Concerning theoretical contributions, the probabilistic interpretations of Arrows and \glspl{saf} are discussed, while bridging between sampling problems and concurrency ones. With regard to practical contributions, two probabilistic programming libraries in Haskell are described and presented.
\section{The Problem and its Challenges}\label{ch-problem}
%This section addresses the main problems and challenges that were found during this master thesis. The proposed approach and corresponding solution are described as well.
\subsection{Probabilistic Interpretation of Selective Functors}
Section \ref{sec-state-art} presented some of the most well-known abstractions in functional programming, as well as their probabilistic interpretation. How these abstractions are translated into the context of probabilistic programming was also addressed. The fact that the Giry monad \citep{giry1982, jtobin} is an Applicative Functor takes us closer to a potential probabilistic interpretation of \glspl{saf}. In addition, the relationship between \glspl{saf} and Arrows \citep{andrey2019selective} raises the challenge of figuring out how the fundamentals and generality of Arrows can help in finding the probabilistic interpretation of \glspl{saf} and in relating it to existing probabilistic programming constructs.
\subsection{Inefficient Probability Encodings}
There are two main ways to encode probability distributions. In the light of the work outlined in section \ref{sec-related-work}, it is possible to opt for an exhaustive representation of distributions, where all value-probability pairs are stored and any structural manipulation is done by changing all pairs, one by one. This method has the advantage of allowing the exact probability of any event to be calculated. However, even a seemingly simple problem can lead to state explosions within distributions, with major negative impacts on performance. With this in mind, another (less rigorous) method is to infer probabilities using less reliable, yet faster and more efficient inference algorithms, instead of always computing the exact probabilities across all values.
Modelling a simple program that calculates the likelihood of a particular event occurring in $N$ throws of a die with an exhaustive method easily becomes unfeasible, even for a relatively small $N$. On the other hand, modelling complex, safety-critical problems (such as, e.g.\ calculating the probability of two aircraft colliding) with a non-exhaustive approach may lead to hazardous situations if the results are not accurate enough. This trade-off is a topic of much concern in probabilistic programming.
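To make the trade-off concrete, the following Haskell sketch (illustrative only; it is not one of the libraries presented later) contrasts an exhaustive weighted-list encoding, here made a monad so that models compose, with a crude sampling-based estimate; already for $N = 6$ throws the exhaustive support has $6^6 = 46656$ weighted outcomes:
\begin{verbatim}
-- Illustrative sketch, not one of the libraries presented later.
-- Exhaustive encoding: a distribution is a list of (value, probability)
-- pairs, made a monad so that probabilistic models compose.
-- (The sampling part assumes the 'random' package for System.Random.)

import Control.Monad (replicateM)
import System.Random (randomRIO)

newtype Dist a = Dist { runDist :: [(a, Double)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [ (f x, p) | (x, p) <- xs ]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [ (f x, p * q) | (f, p) <- fs, (x, q) <- xs ]

instance Monad Dist where
  Dist xs >>= k = Dist [ (y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x) ]

die :: Dist Int
die = Dist [ (face, 1 / 6) | face <- [1 .. 6] ]

-- Exact probability of an event: sum the weights of matching outcomes.
probOf :: (a -> Bool) -> Dist a -> Double
probOf ev (Dist xs) = sum [ p | (x, p) <- xs, ev x ]

-- Sampling alternative: estimate the same probability from random throws.
estimate :: Int -> Int -> ([Int] -> Bool) -> IO Double
estimate runs n ev = do
  hits <- replicateM runs (ev <$> replicateM n (randomRIO (1, 6)))
  return (fromIntegral (length (filter id hits)) / fromIntegral runs)

main :: IO ()
main = do
  let n      = 6
      throws = replicateM n die        -- exhaustive: 6^n weighted outcomes
      event  = any (== 6)              -- at least one six in n throws
  print (length (runDist throws))      -- 46656 pairs kept in memory
  print (probOf event throws)          -- exact answer (about 0.665)
  estimate 10000 n event >>= print     -- cheaper, approximate answer
\end{verbatim}
The exhaustive encoding gives exact answers but keeps every outcome in memory, whereas sampling trades accuracy for a constant memory footprint; this is precisely the trade-off discussed above.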
Therefore, finding a way to minimise the gap between the two most common encodings of probability distributions is challenging. A further challenge is to find one that takes advantage of the \gls{saf} abstraction and makes the most of its static analysis and speculative execution properties.