-
Notifications
You must be signed in to change notification settings - Fork 0
/
curvatureNNI_full.tex
1132 lines (917 loc) · 74.8 KB
/
curvatureNNI_full.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[11pt]{amsart}
\usepackage{amsmath,amsthm,amssymb}
\usepackage{todonotes}
%\usepackage[notref,notcite]{showkeys}
\usepackage{enumerate}
\usepackage{url}
\usepackage[hidelinks]{hyperref}
\usepackage{graphicx}
\usepackage{etoolbox}
\usepackage[style=authoryear,ibidtracker=false,uniquename=false,giveninits=true,terseinits=true,maxbibnames=5]{biblatex}
\DeclareNameAlias{sortname}{last-first}
\renewcommand*{\revsdnamepunct}{}
\renewbibmacro{in:}{}
%\usepackage[nomarkers,nolists]{endfloat}
\usepackage{outlines}
\setlength{\textwidth}{\paperwidth}
\setlength{\textheight}{\paperheight}
\addtolength{\textwidth}{-2in}
\addtolength{\textheight}{-2in}
\calclayout
\addbibresource{curvatureNNI.bib}
\newtheorem{lemma}{Lemma}
\newtheorem{question}[lemma]{Question}
\newtheorem{proposition}[lemma]{Proposition}
\newtheorem{corollary}[lemma]{Corollary}
\newtheorem{theorem}[lemma]{Theorem}
\newtheorem{problem}[lemma]{Problem}
\newtheorem{conjecture}[lemma]{Conjecture}
\theoremstyle{definition}
\newtheorem{example}[lemma]{Example}
\newcommand{\nni}{\mathrm{NNI}}
\newcommand{\rnni}{\mathrm{RNNI}}
\newcommand{\rnniu}{\mathrm{RNNIu}}
\newcommand{\dtt}{\mathrm{DtT}}
\newcommand{\dttu}{\mathrm{DtTu}}
\newcommand{\MH}{\mathrm{MH}}
\newcommand{\ric}{\operatorname{ric}}
\newcommand{\rt}{\operatorname{rt}}
\newcommand{\tp}{\operatorname{tp}}
\newcommand{\W}{\mathcal{W}}
\newcommand{\M}{\mathcal{M}}
\newcommand{\dom}{\operatorname{dom}}
\newcommand{\G}{\mathcal{G}}
\renewcommand{\O}{\mathcal{O}}
\newtoggle{curvatureON}
\togglefalse{curvatureON}
\sloppy
\begin{document}
\title[The combinatorics of discrete time-trees]{The combinatorics of discrete time-trees:\\
theory and open problems}
\author{Alex Gavryushkin}
\address{Department of Biosystems Science and Engineering, ETH Z\"urich, 4058 Basel, Switzerland}
\email{alex@gavruskin.com}
\author{Chris Whidden}
\address{Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109}
\email{cwhidden@fredhutch.org}
\author{Frederick A Matsen IV}
%\address{Program in Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, WA 98109}
\email{matsen@fredhutch.org}
\date{October 26, 2016}
\thanks{AG was supported by Marsden Fund of the Royal Society of New Zealand, grant UOA1324, through the Centre for Computational Evolution, the University of Auckland, where this work has been carried out.
This work was done in part while AG was visiting Fred Hutchinson Cancer Research Center.}
\thanks{CW and FM were funded by National Science Foundation awards 1223057 and 1564137.}
\thanks{CW is a Simons Foundation Fellow of the Life Sciences Research Foundation.}
\thanks{FM supported in part by a Faculty Scholar grant from the Howard Hughes Medical Institute and the Simons Foundation.}
\begin{abstract}
A \emph{time-tree} is a rooted phylogenetic tree such that all internal nodes are equipped with absolute divergence dates and all leaf nodes are equipped with sampling dates.
Such time-trees have become a central object of study in phylogenetics but little is known about the parameter space of such objects.
Here we introduce and study a hierarchy of discrete approximations of the space of time-trees from the graph-theoretic and algorithmic point of view.
One of the basic and widely used phylogenetic graphs, the $\nni$ graph, is the roughest approximation and bottom level of our hierarchy.
More refined approximations discretize the relative timing of evolutionary divergence and sampling dates.
We study basic graph-theoretic questions for these graphs, including the size of neighborhoods, diameter upper and lower bounds, and the problem of finding shortest paths.
We settle many of these questions by extending the concept of graph grammars introduced by Sleator, Tarjan, and Thurston to our graphs.
Although time values greatly increase the number of possible trees, we show that $1$-neighborhood sizes remain linear, allowing for efficient local exploration and construction of these graphs.
We also obtain upper bounds on the $r$-neighborhood sizes of these graphs, including a smaller bound than was previously known for $\nni$.
\iftoggle{curvatureON}{
We also investigate the Ricci-Ollivier curvature of basic random walks on the space.
}{}
Our results open up a number of possible directions for theoretical investigation of graph-theoretic and algorithmic properties of the time-tree graphs.
We discuss the directions that are most valuable for phylogenetic applications and give a list of prominent open problems for those applications.
In particular, we conjecture that the split theorem applies to shortest paths in time-tree graphs, a property not shared in the general $\nni$ case.
\end{abstract}
\maketitle
\thispagestyle{empty}
\addtocounter{page}{-1}
\newpage
\section{Introduction}
The last ten years have seen an explosion of methods using sequence data to infer demographic model parameters by sampling phylogenetic trees \autocite{Kuhner1995-mj,Kuhner1998-tq,Kuhner2000-af,Beerli2001-sc,Kuhner2006-vx,Drummond2002,Drummond2005-ks,Drummond2006-oa,Minin2008-wz}.
These methods have had an especially significant impact in the study of quickly evolving organisms such as viruses, such as inferring historical epidemic spreading rates.
For example, such methods can be used to infer time to a most recent common ancestor of human HIV group M viruses \autocite{Worobey2008-rt,Baele2013-op}.
Thus the \emph{time-tree}---a rooted phylogenetic tree with all internal nodes equipped with absolute divergence dates---has become an important object of investigation.
This interpretation of continuous parameters for time-trees stands in contrast to that for classical phylogenetic trees, in which branch lengths quantify the amount of molecular substitution along a branch.
Posterior distributions on both time-trees and classical trees are estimated using Markov chain Monte Carlo (MCMC) \autocite{Mau1997-sq,Yang1997-gv,Drummond2002}, but with different transition kernels.
This paper builds a foundation for a mathematical understanding of time-trees, such as understanding convergence properties of MCMC thereupon.
\begin{figure}[ht]
\centering
\includegraphics[width=0.7\textwidth]{timeTree.eps}
\caption{Time-tree on $6$ leaves.
Time is measured by non-negative real numbers.
If all times are integers, the tree is a discrete time-tree.}
\label{timeTree.eps}
\end{figure}
Although Markov chain Monte Carlo (MCMC) is guaranteed to sample from the true posterior given an infinite run time, it is important to understand mixing properties of the chain, which determine sampling properties for a finite time run.
The mixing properties of phylogenetic MCMC have, in the classical unrooted case, been a major area of research from both the theoretical perspective of mixing time bounds \autocite{Mossel2005-ly,Mossel2006-fo,Stefankovic2011-hu,spade2014note} and the practical perspective of performance on real data \autocite{beiko2006searching,Ronquist2006-fv,lakner2008efficiency,Whidden2015-yi}.
This research has focused on mixing over the set of discrete phylogenetic tree graph structures, because mixing over these discrete structures is the primary obstruction to MCMC convergence.
Time-trees use different MCMC transition kernels than do classical phylogenetic trees.
The discrete component of MCMC moves between classical phylogenetic trees are typically determined by their discretization.
For example, common moves include \emph{subtree prune and regraft} (SPR) moves, which cut a subtree off and reattach it at another location, and the subset of SPR moves called \emph{nearest-neighbor interchange} ($\nni$) moves.
Classically, these are applied without reference to branch lengths.
On the other hand, the discrete component of moves between time-trees are defined also in terms of their timing information.
For example, \textcite{Hohna2008-vl} show that SPR-like moves that reattach subtrees at the same divergence time are more effective than ones that do not.
This sort of move cannot be expressed using the type of discretization used thus far in which all continuous information is lost, and so the previous work on MCMC mixing cannot be applied.
However, one can discretize time-trees in a way that does preserve some of the information.
For example, by retaining the order of internal nodes backward in time one obtains a so-called \emph{ranked tree} \autocite{Semple2003-nj}.
To make a less rough approximation of a time-tree, one can allow the time periods between nodes of the tree to take only finitely many possible values \autocite{Akerborg2008-cl}.
This results in an object we call a \emph{discrete time-tree}.
The graph built on such trees provides a discretization of the space of time-trees, which we call a \emph{discrete time-tree graph}.
By a \emph{graph} on a set of trees here and throughout the paper we mean a graph consisting of trees as vertices, with edges connecting pairs of trees that are identical after a given tree rearrangement operation \autocite{Semple2003-nj}.
This graph-theoretic terminology is convenient and has become widespread in recent years \autocite{spade2014note, Whidden2015-yi, Gavryushkin2014-bw}.
A sequence of time-trees sampled using MCMC projects to a collection of movements on a graph in which each vertex is a discrete time-tree and each edge is a discrete version of an MCMC move on time-trees.
Although inferential algorithms have applied MCMC on time-trees for over a decade, and graphs corresponding to discretizations of unrooted tree space have been studied for even longer, we are not aware of any work defining graphs from discretization of time-tree spaces or analyzing random walks thereupon.
Such a theory would provide a foundation for understanding the behavior of MCMC algorithms on time-trees, as has been done previously for graphs associated with unrooted phylogenetic trees.
Up to now, the only discretization of time-tree space is that of ranked trees \autocite{Page1991-rd,Ford2009-qi,Lambert2013-mr}, and the corresponding graphs have not been studied.
In this paper we initiate the mathematical study of discrete time-tree graphs, obtain basic geometric and graph-theoretic results,
\iftoggle{curvatureON}{
study random walks on these graphs,
}{}
and compare these results to those in the classical phylogenetic setting.
In particular, we focus on ranked trees, discrete time-trees (as defined above), and the ultrametric versions of those types of trees.
We establish size bounds for neighborhoods and diameters in these tree spaces and show how those bounds can be used to develop efficient tree search algorithms such as MCMC.
The importance of these basic geometric characteristics in phylogenetics and other areas of evolutionary biology is highlighted in \autocite{Huber2011-uu}.
\iftoggle{curvatureON}{
We study Ricci-Ollivier curvature \autocite{Ollivier2009-cj} of Markov chains on these graphs, a formalism which quantifies to what extent random walking brings unit balls together.
This formalism has recently been applied to Markov chains on graphs of classical phylogenetic trees \autocite{Whidden2015-es}.
}
\section{Technical introduction}
\iftoggle{curvatureON}{
\subsection{Discrete time-trees}
}{}
Throughout the paper by a (phylogenetic) \emph{tree} we mean a rooted binary tree with designated leaves, that is, an undirected acyclic graph with the following properties:
(1) all nodes have degree $1$, $2$, or $3$; (2) there exists exactly one node of degree $2$, this node is called the \emph{root} of the tree; (3) all nodes of degree $1$ are labeled by distinct identifiers, these nodes are called \emph{leaves} or \emph{taxa} (singular \emph{taxon}).
The \emph{parent} of a node $x$ in a tree is the unique node $y$ that is both adjacent to $x$ and closer to the root of the tree than $x$.
Every node of a tree has a parent except for the root, and is called the \emph{child} of that parent.
We fix the number of leaves of the trees and denote this number by $n$ throughout the paper.
We also assume that each tree with $n$ leaves uses the same fixed set of $n$ labels to mark the leaves.
Hence, we say that two trees are \emph{isomorphic} if they are isomorphic as graphs and the isomorphism maps leaves marked by the same label to each other.
We do not distinguish between isomorphic trees, i.e.\ we \emph{identify} them.
By a \emph{time-tree} (Figure~\ref{timeTree.eps}) we mean a phylogenetic tree with an absolute time associated with every node of the tree so that the time strictly increases along every path from a leaf to the root: for internal nodes the time is interpreted as (typically estimated) divergence time and for leaves it is the (typically known) sampling time.
By ``absolute'' we clarify that these are not relative times, but rather actual times that can be put on a calendar.
We assume that time progresses backward, from the leaves to the root.
A \emph{discrete time-tree} is a tree such that all its nodes are assigned distinct times from the set of non-negative integers and every node has a smaller time than its parent (Figure~\ref{timeTree.eps}).
Note that this implies that for every pair of nodes $x,y$ if the shortest path from the root to $x$ passes through $y$ then the time of $y$ is greater than the time of $x$.
Each node is associated with a type of event: internal nodes represent divergence events, while leaf nodes represent the sampling of new taxa.
The \emph{rank} of a node is the number of nodes in the tree with strictly smaller time.
We say that a pair of nodes $x,y$ of a discrete time-tree is an \emph{event interval} if there exists no node $z$ such that the time of $z$ is between the time of $x$ and $y$.
Hence, an event interval is an interval between two divergence events, a taxon and a divergence event, or two taxa.
Note that taxon events can be younger than divergence nodes, i.e.\ when the sampling time precedes divergence events on the tree.
The difference between the times of $x$ and $y$ is called the \emph{length} of the interval $x,y$.
We identify two discrete time-trees if they are isomorphic as trees and the isomorphism preserves ranks of the nodes as well as event interval lengths.
We are now ready to introduce a hierarchy of discrete time-trees.
At the bottom level of our hierarchy is the well-known $\nni$ graph, which does not have timing information.
$\nni$ is a graph with the vertices being all trees on $n$ leaves.
Two trees $T$ and $R$ are adjacent in $\nni$ if there exists an edge $e$ in $T$ and an edge $f$ in $R$ such that both edges are not adjacent to a leaf and the graph obtained from $T$ by shrinking $e$ to a vertex is isomorphic to the graph obtained from $R$ by shrinking $f$.
We denote this graph by $\dtt_0$, where $\dtt$ stands for ``discrete time-trees''.
\begin{figure}[ht]
\centering
\includegraphics[width=0.8\textwidth]{DtT_with_rank_move_short_both_intervals}
%\includegraphics[width=0.8\textwidth]{DtT_with_rank_move_short}
\caption{All possible moves performed on intervals $I$ and $J$.
Assuming that the length of every event interval is either $1$ or $2$, the outer trees are all possible neighbors in $\dtt$ of the tree in the middle obtained by moves performed on intervals $I$ and $J$.
The trees on the right are obtained by moves performed on interval $I$ and those on the left on $J$.}
\label{DtT.pdf}
\end{figure}
The following graph $\dtt_m$ forms the level $m > 0$ of the hierarchy.
The set of vertices of the graph is the set of all discrete time-trees on $n$ leaves such that every event interval has its length not greater than $m$.
Two trees $T$ and $R$ are adjacent in $\dtt_m$ if $R$ can be obtained from $T$ by one of three operations: a \emph{length move performed on interval} $I$, \emph{swapping the rank of two nodes $x$ and $y$ on interval} $I$ or an \emph{$\nni$ move performed on interval $I$} (Figure~\ref{DtT.pdf}).
A length move changes the length of $I$ by 1.
Swapping the rank of two nodes swaps the times of the nodes; such a swap is only possible when the nodes bound an event interval $I$ of length $1$.
$R$ can be obtained from $T$ by an $\nni$ move if there exist event intervals $I_T$ and $I_R$ of length $1$ in $T$ and $R$, respectively, such that the graphs obtained by shrinking $I_T$ and $I_R$ to vertices are isomorphic and the isomorphism preserves the lengths of the event intervals.
In other words, by going from one tree to an adjacent tree in the $\dtt_m$ graph we can either change the length of one event interval by one unit, swap the rank of two nodes bounding an event interval of minimal length, or send the length of an event interval of minimal length down to zero and then resolve the multifurcation to either of the two possible trees.
In the latter case, the new interval is of minimal length.
See Figure~\ref{dts_neighbors.pdf} for an example of the full variety of possible moves.
\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{dts_neighbors.pdf}
\caption{Trees $T$ and $R$ are at $\dtt_2$ distance $3$.
To move from $T$ to $R$ in $\dtt_2$, one must decrease the length of interval $I$, swap the ranks of nodes $A$ and $C$ bounding interval $J$, and perform an $\nni$ move on interval $K$, resolving nodes $C$ and $D$.}
\label{dts_neighbors.pdf}
\end{figure}
Note that $\dtt_0$ can be obtained from $\dtt_1$ by ``forgetting'' ranks and $\dtt_1$---from $\dtt_m$ by forgetting lengths.
In general, the graph structure for $m > 1$ can be understood by considering all four-tree diagrams as the four trees on the right in Figure~\ref{DtT.pdf} and introducing an extra layer of nodes in every tail adjacent to the $\nni$ triangle.
This hierarchy can be seen as a set of discrete refinements of the full space of phylogenetic time-trees, where event intervals can take an arbitrary real value.
Indeed, if we allow the lengths of event intervals to take every possible non-negative real value and impose the Euclidean metric on trees with the same ranked topology, then we get the $\tau$-space introduced by \textcite{Gavryushkin2014-bw}.
In this case, $\dtt_1$ is the graph with the vertices being the orthants of $\tau$-space and the adjacency relation being the relation of ``have a shared facet of co-dimension $1$.''
Similarly $\dtt_0$ is the adjacency graph on the orthants of BHV space \autocite{Billera2001-rj}.
The $\dtt_m$ graph on ultrametric trees, that is the set of trees such that each leaf has the same time, is denoted by $\dttu_m$.
The $\dtt_1$ graph is denoted by $\rnni$ and called the \emph{ranked $\nni$} graph; on ultrametric trees we will denote it $\rnniu$.
The $\mathrm R$ in $\rnni$ stands for \emph{ranked} and not for rooted.
We emphasize that although this is the $\rnni$ \emph{graph}, $\nni$ \emph{moves} can be applied to trees in any of the spaces considered (under appropriate conditions).
For simplicity, we use the generic identifier $\dtt$ to refer to a $\dtt_m$ with an arbitrary $m > 1$.
A graph can also be seen as a metric space where the distance is given by the length of a shortest path, so we will refer to $\dtt_m$ and other graphs introduced below as both, in agreement with \autocite{Semple2003-nj}.
\iftoggle{curvatureON}{
Furthermore, as $\nni$ graph on unrooted tree topologies with $n-1$ leaves is isomorphic (as a graph) to the $\nni$ graph on rooted tree topologies with $n$ leaves, all our results on curvature immediately imply their counterparts in unrooted $\nni$ graph.
We omit from the text the exact formulations for unrooted trees, as they can be obtained in the obvious way.
\subsection{Ricci-Ollivier curvature}
A random walk on a metric space is defined by its proposal mechanism, which is a functional $m$ that maps points of the space to the set of measures on this space \autocite[see][]{Ollivier2009-cj}.
In other words, for every point $v$ in the space, $m(v,w)$, which we denote by $m_v(w)$, is the probability of the random walk moving from $v$ to $w$ in one step.
Given a metric space $(V,d)$, a \emph{coupling} $\mu$ between two measures $\mu_1$ and $\mu_2$ on $V$ is a probability measure on the set of pairs of points $V \times V$ that is $\mu_1$ after marginalizing over the second component, and $\mu_2$ after marginalizing over the first component, that is, $\sum\limits_{y \in V}\mu(x,y) = \mu_1(x)$ and $\sum\limits_{y \in V}\mu(y,x) = \mu_2(x)$ for all $x \in V$.
A coupling can be seen as a scenario of transforming the measure $\mu_1$ on $V$ to $\mu_2$ by looking at $\mu(x,y)$ as the probability mass that has to be moved from $x$ to $y$ under the scenario.
Using these notations, we define the \emph{work} $\W_\mu$ needed to be done to transform $\mu_1$ to $\mu_2$ via coupling $\mu$ by setting
\[
\W_\mu(\mu_1,\mu_2) = \sum\limits_{x,y\in V}\mu(x,y) d(x,y).
\]
Let $\M$ be the set of all couplings between measures $\mu_1$ and $\mu_2$.
Then the \emph{earth mover's distance} $W$ between $\mu_1$ and $\mu_2$ is defined to be $W(\mu_1,\mu_2) = \inf\limits_{\mu\in\M}\W_\mu(\mu_1,\mu_2)$.
We are now ready to introduce two curvature notions that we are studying in this paper.
Let $m$ be a random walk on a metric space $(V,d)$.
Then the \emph{coarse Ricci-Ollivier curvature} between two points $x$ and $y$ from $V$ with the random walk $m$ is
\[
\kappa(m;x,y) = 1 - \frac{W(m_x,m_y)}{d(x,y)}
\]
Another important notion of this paper is that of asymptotic curvature.
The following random walk $m$ is called a \emph{uniform $p$-lazy random walk} on a metric space $(V,d)$:
\[
m_x(y) =
\begin{cases}
1-p & \mbox{ if } x=y \\
0 & \mbox{ if } d(x,y) > 1 \\
\dfrac{p}{|N(x)|} & \mbox{ if } d(x,y) = 1 \\
\end{cases}
\]
where $x,y \in V$ and $N(x) = \{y \mid d(x,y) = 1\}$.
We follow \textcite{Loisel2014-gu} and define \emph{asymptotic Ricci-Ollivier curvature} $\ric$ between points $x,y$ from a metric space with uniform $p$-lazy random walk $m$ by
\[
\ric(x,y) = \lim_{p\to0} \frac{\kappa(m;x,y)}{p}
\]
One of the central properties of Ricci-Ollivier curvature is that the curvature is a local property \autocite{Ollivier2009-cj}.
For finite metric spaces that means that for all distinct $u$ and $v$,
\[
\inf\{\kappa(m;x,y)\mid d(x,y) = 1\} \leq \kappa(m;u,v).
\]
Hence, we define the \emph{coarse Ricci-Ollivier curvature of a metric space} with a random walk $m$ to be $\inf\{\kappa(m;x,y)\mid d(x,y) = 1\}$ and the \emph{asymptotic Ricci-Ollivier curvature of a metric space} to be $\inf\{\ric(x,y)\mid d(x,y) = 1\}$.
}{}
\section{Geometry and complexity of discrete time-trees}
We begin by considering the shortest path distance on the graphs.
It is well-known that computing distances is NP-hard in $\nni$ \autocite{Dasgupta2000-xa}.
Hence the following question is natural.
\begin{problem}
\label{problemComplexity}
What is the complexity of computing the distance between two discrete time-trees?
\end{problem}
Although this problem remains open, we make progress towards the solution by establishing several geometric and algorithmic properties of these graphs.
First, we demonstrate in the following example that even for caterpillar trees the complexity cannot be derived from that of $\nni$ distance.
A \emph{caterpillar tree} (sometimes called a \emph{ladder tree}) has all leaves connected to a single path from the root.
A \emph{cherry} is a pair of taxa adjacent to a common internal node in the tree.
\begin{example}
\label{ex_caterpillars}
Let $T$ be the ultrametric caterpillar tree denoted $((((((1, 2), 3), 4), 5), 6), 7)$ in Newick notation \autocite{felsenstein1990newick} and $R$ be the ultrametric tree $((((((1, 4), 5), 6), 2), 3), 7)$.
Then a shortest $\nni$ path is given by first making a cherry $(2,3)$, then moving the cherry up to the split $1456 \mid 7$, and then resolving the cherry back.
In $\rnniu$ this path is not shortest, and one shortest path moves the parents of $2$ and $3$ up independently.
\begin{figure}[ht]
\centering
\includegraphics[width=\textwidth]{NNI_VS_rNNI.pdf}
\caption{Shortest paths in $\nni$ may not be shortest in $\rnni$.}
\label{NNI_VS_rNNI.pdf}
\end{figure}
\end{example}
A set of trees $A$ is called \emph{convex} if for every pair of trees from $A$, there exists a shortest path between them such that every tree on the path belongs to $A$.
Example~\ref{ex_caterpillars} can be generalized to show that the set of trees of the form $(\ldots(1, i_2), \ldots, i_{n-1}), n)$, where $\{i_2, \ldots, i_{n-1}\} = \{2, \ldots, n-1\}$, is convex in $\dtt$ and is not convex in $\nni$.
Indeed, the example shows that if two leaves have to be moved up, they can be moved independently along a shortest path.
The generalization of this statement to an arbitrary number of leaves implies convexity in $\dtt$, while the need to group leaves in $\nni$ as in the previous example shows non-convexity in $\nni$ space.
This basic difference in the structure of the two graphs suggests that the complexity of computing the distance is likely to be different.
We proceed by establishing basic geometric properties of discrete time-trees.
\subsection{Sizes of neighborhoods}
We first bound the number of trees in each graph.
As expected, times greatly expand the size of the graphs.
\begin{lemma}[see \autocite{Semple2003-nj}]
\label{spaceSizes}
Let $|V|$ be the number of vertices in the graph, then $|V|$ is equal to
\begin{align*}
& \frac{(n-1)!\,n!\,n!\,m^{n-1}}{2^{n-1}} & \mbox{in $\dtt_m$,}
&&& \frac{(n-1)!\,n!\,m^{n-1}}{2^{n-1}} & \mbox{in $\dttu_m$,}\\
& \frac{(n-1)!\,n!\,n!}{2^{n-1}} & \mbox{in $\rnni$,}
&&& \frac{(n-1)!\,n!\,}{2^{n-1}} & \mbox{in $\rnniu$,}\\
& (2n - 3)!! & \mbox{in $\nni$.}
\end{align*}
\end{lemma}
\proof
The proof for $\nni$ can be found in \autocite{Semple2003-nj}.
The rest follow using similar arguments.
\endproof
We continue by tightly bounding the sizes of $1$-neighborhoods of various discretizations of the space of time-trees.
Each has a $1$-neighborhood size that is linear in $n$ allowing for efficient local traversal and enumeration, a property of importance for phylogenetic algorithms, e.g.\ tree proposals in MCMC.
\iftoggle{curvatureON}{
We will use these bounds to estimate the curvature of various random walks later in the paper.
}{}
\begin{lemma}
\label{neighBound}
Let $T$ be a tree on $n$ leaves and $\deg(T)$---the number of trees adjacent to $T$.
Then
\begin{align*}
& 2(n-1) \leq \deg(T) \leq 5n-6 & \mbox{in $\dtt$,}
&&& n-1 \leq \deg(T) \leq 3n-5 & \mbox{in $\dttu$,}\\
& n-1\leq \deg(T) \leq3n-4 & \mbox{in $\rnni$,}
&&& n-1 \leq \deg(T) \leq 2n-4 & \mbox{in $\rnniu$,}\\
& \deg(T) = 2n-4 & \mbox{in $\nni$.}
\end{align*}
All bounds are tight.
\end{lemma}
\proof
We prove the bounds by providing exemplar trees with the specified degrees and explaining why no tree can have a greater (or lesser) degree.
Recall that $m > 1$ is the maximal possible length of an event interval in a tree from $\dtt$.
The lower bound in $\dtt$ is attained by any tree with all event intervals being of length $m$.
In this case, $\deg(T)$ is simply the number of event intervals, since every event interval adds $1$ to the total degree of the tree.
Other trees have the same or more possible event interval changes, showing that this is a lower bound.
The upper bound is attained by a caterpillar tree with all intervals short, every taxon being younger than every divergence event, and both taxa in the cherry being younger than at least one other taxon, that is, a caterpillar tree with internal nodes having ranks $n, \ldots, 2(n-1)$ and the ranks of the taxa in the cherry $< n-1$.
In this case, $\deg(T)$ is bounded by the sum of the $2(n-1)$ possible interval length changes, at most $2(n-2)$ $\nni$ neighbors, and at most $n$ rank changes that can occur between the $n$ taxa and the root of the cherry.
In total, $\deg(T) \le 2(n-1) + 2(n-2) + n = 5n-6$.
Note that this is an upper bound indeed, as each of the $2(n-2)$ intervals excluding the most recent can contribute either a rank change or $2$ $\nni$ moves.
The caterpillar tree described above maximizes the number of intervals that contribute $2$ $\nni$ moves and enables the rest of intervals to contribute a rank change.
For ultrametric trees, the number of event intervals is $n-1$, hence they add $n-1$ to the degree of the caterpillar tree from interval length changes.
The number of intervals on which an $\nni$ move is possible is $n-2$ for ultrametric trees, hence they contribute $2(n-2)$ to the degree.
In total, this gives the upper bound of $3n-5$ for ultrametric trees.
Every $\nni$ move results in two neighbors, so the equality $\deg(T) = 2(n-2)$ follows for the $\nni$ graph.
Indeed, no $\nni$ move can be performed on an edge adjacent to a taxon, exactly two $\nni$ moves can be performed on every edge between internal nodes (the number of which is $n-2$), and no two trees obtained by an $\nni$ move performed on different edges are identical.
The lower bound in $\rnni$ is attained by the caterpillar-tree where taxa get ranks $0, 1, 3, 5, \ldots, 2n-3$ and internal nodes get ranks $2, 4, 6, \ldots, 2n-2$.
In other words, the divergence events alternate with the taxa in the ranked topology of the tree so that if we parse the tree from the present to the past, we meet the nodes in the following order: taxon, taxon, coalescence, taxon, coalescence, taxon, coalescence, and so on.
In this case, intervals bounded by a taxon from below add nothing to the degree of the tree and intervals bounded from below by an internal node add one each, hence $n-1$ in total.
The upper bound for both $\rnni$ and $\rnniu$ is obtained in the same way as for $\dtt$.
For the lower bound in $\rnniu$, we note that the oldest event interval (the one adjacent to the root) necessarily contributes $2$ to the degree.
Hence the lower bound can be reached by a tree such that no other interval contributes more than 1.
The degree of such a tree is $n-1$ in $\rnniu$.
\endproof
Lemma~\ref{neighBound} gives tight bounds for the number of trees in a $1$-neighborhood, while the following theorem gives an upper bound for the number of trees in an $r$-neighborhood.
The best known upper bound for $\nni$ that we are aware of is $3^{n-2} 2^{4r}$ \autocite{li1996some}.
We obtain a smaller bound for $\rnniu$ and, hence, improve the result for $\nni$ as well.
\begin{theorem}
\label{neighSizeTh}
The number of trees within distance $r$ from any given tree is at most
\begin{align*}
& 3^{n+2r-1} & \mbox{in $\rnniu$,}
&&& 3^{2n+2r-1} & \mbox{in $\rnni$,}\\
& 4^{n+2r-1} & \mbox{in $\dtt$,}
&&& 4^{2n+2r-1} & \mbox{in $\dttu$.}
\end{align*}
\end{theorem}
\proof
We describe the proof in detail for $\rnniu$ and then explain how to modify the proof for the other three graphs.
The proof employs the technique by \textcite{Sleator1992-bp} for counting paths in a graph using a \emph{graph grammar}.
A graph grammar is a method of encoding each possible graph as a finite set of productions (described more formally below).
Given the grammar, the number of possible productions to the power of the encoding length for a given neighborhood radius gives a simple upper bound on the number of trees in a neighborhood.
Specifically, Theorem~2.3 from \autocite{Sleator1992-bp}, adapted to trees, states the following:
\begin{itemize}
\item[] Let $T$ be a tree with $h$ nodes, including all internal nodes and taxa, $\Gamma$ be a graph grammar, $d$ be the number of vertices in left sides of $\Gamma$, and $k$ be the maximum number of nodes in any right side of a production of $\Gamma$.
Let $R(T, \Gamma, r)$ be the set of graphs obtainable from $T$ by derivations in $\Gamma$ of length at most $r$.
Then $|R(T, \Gamma, r)| \leq (d + 1)^{h+k\cdot r}$.
\end{itemize}
We first apply Theorem~2.3 from \autocite{Sleator1992-bp} and then improve the bound using specific properties of the $\rnniu$ graph.
We note that since our trees are ranked trees, they possess additional structure (ranking) on top of the tree graph structure, hence we will be applying the same style of argument as in \autocite{Sleator1992-bp} but to a somewhat different structure.
We now introduce the \emph{graph grammar} for $\rnniu$.
Since we will apply the graph grammar to modify trees, we use the graph-theoretic terminology here and recall that a tree is a graph.
A graph grammar consists of a finite set of productions $\{L_i \to_i R_i\}$, where $L_i$ and $R_i$ are connected undirected edge-end labeled graphs and $\to_i$ is a one-to-one map between half-edges of $L_i$ and those of $R_i$.
Here, for every edge in the tree we distinguish between its two ends and refer to them as half-edges---this is necessary to be able to substitute one subgraph by another in a unique way by connecting corresponding half-edges---as in \autocite{Sleator1992-bp}.
The productions are then applied to the starting tree $T$ to derive all possible trees at $\rnniu$ distance up to $r$ from $T$.
By a \emph{running tree at stage $s$} we mean the tree obtained after $s$ applications of the production rules in the derivation.
A production is said to be \emph{ready} at a stage $s$ of the derivation if the running tree at stage $s$ has a subgraph isomorphic to the left side $L_i$ of the production.
A ready production can be applied to the running tree by destroying all nodes corresponding to the left side of the production under the isomorphism and replacing them with the right side $R_i$ of the production.
The map $\to_i$ of the production then says how to reconnect the right side of the production to the half-edges of the running tree that were created after the destruction.
The obtained tree is the new running tree for the next stage $s+1$ of the derivation.
See \autocite{Sleator1992-bp} for precise definitions and details.
The graph grammar for $\rnniu$ is the grammar $\Gamma$ shown on Figure~\ref{grammar_rNNIu.pdf}.
Note that the definition of a graph grammar requires the left sides $L_i$ to be connected and we have a disconnected left side in the third production.
We can adapt our graph grammar to fit this requirement by noting that the nodes on the left side of the production must be of consecutive ranks, so we consider those nodes as being adjacent via a second type of adjacency relation by declaring nodes with consecutive ranks to be adjacent.
This second type of relation is in addition to the first type of adjacency relation given by the branches of the tree.
Furthermore, this second type of adjacency allows us to avoid considering two separate moves for the left and right sister nodes.
Indeed, the half-edge labels order the edges to make the distinction between left and right, however since our nodes are ordered via ranking, this ordering of edges is irrelevant.
We will further exploit this important property below, where we improve the bound obtained directly from grammar $\Gamma$.
\begin{figure}
\centering
\includegraphics[width=0.7\textwidth]{grammar_rNNIu.pdf}
\caption{Graph grammar $\Gamma$ for $\rnniu$.
Maps $\to_i$ are shown by dashed lines.
In the first two productions, the edge-ends marked by $\beta$ and $\gamma$ without dashed lines are mapped to each other: top $\gamma$ to top $\gamma$, $\beta$ to $\beta$, bottom $\gamma$ to bottom $\gamma$.
In the last production, edge-ends marked by the same label are mapped to each other.
All pairs of nodes on both sides of each production must be of consecutive ranks.
The first two productions correspond to an $\nni$ move, while the last production corresponds to rank change.}
\label{grammar_rNNIu.pdf}
\end{figure}
The number of vertices in left sides of $\Gamma$ is $6$, the maximum number of vertices in any right side of a production of $\Gamma$ is $2$.
Since the number of internal nodes of a tree on $n$ leaves in $\rnniu$ is $n-1$, directly applying Theorem~2.3 from \autocite{Sleator1992-bp} we get the bound of $7^{n+2r-1}$.
This bound can be improved following the techniques described in Sections 3.2--3.4 of \autocite[][see also Section 5]{Sleator1992-bp} by bounding the number of possible productions available at a given time by a number smaller than their total.
Every node of a tree in any application of a production of grammar $\Gamma$ can play either of two roles: top node or bottom node.
Hence to indicate that a pair of nodes is ready for being destroyed and replaced using a production, we must specify which node is a top node and which is a bottom node.
We use a total of three node-labels (\textcite{Sleator1992-bp} call these ``labels of vertices'') to identify these two roles and to distinguish which of the two possible $\nni$ moves is applied.
We claim that three node-labels $a, b, c$ is enough to specify which productions can be applied to a given tree.
We must redefine the notion of readiness to verify this claim.
We say that a pair of consecutive nodes is \emph{ready} if the node-label of the bottom node is $a$ and the node-label of the top node is either $b$ or $c$.
If two consecutive nodes are ready and not connected by an edge in the tree, then only the rank move is possible.
If they are, the type of the move is determined by the node-label of the top node.
Hence, this notion of readiness uniquely identifies which of the three productions is applied.
\textcite{Sleator1992-bp} use a special node-label, called the ``zero label'', to mark nodes that should not be destroyed.
They also explain how to eliminate the zero label in various cases.
In our case, we can use one of the labels that are already in use, namely node-label $a$.
That is, to indicate that a node $v$ should never be destroyed, we label the node with an $a$.
This labeling will indeed preserve the node $v$ because of the following.
If $w$ is the node directly succeeding $v$ at some step of the derivation then $w$ can only be marked with an $a$ and hence the pair $v, w$ is not ready.
If $w$ is the node directly preceding $v$ at some step of the derivation then the pair $w, v$ is never ready.
Hence $v$ is preserved in either case.
Thus, all possible configurations encoded by the six nodes on the left sides of the productions of $\Gamma$ can be encoded using three node-labels.
This implies that the derivation can be encoded by a ternary sequence of length $n-1+2r$.
Indeed, the first $n-1$ entries of the sequence are needed to encode all possible node-labelings of the initial tree, plus two entries are needed for each of the $r$ applications of productions because every production creates two new nodes, each of which has to be node-labeled.
The number of such strings, $3^{n-1+2r}$, gives the desired improved bound.
The proof for the other three spaces follows similarly.
For $\rnni$, we extend the graph grammar $\Gamma$ by two productions, namely, one for the move when two taxa swap their ranks and one for the move when a taxon and an internal node swap their ranks.
Since in both of these productions the only possible type of move is the rank move, two node-labels is enough for these productions.
Since the number of (internal and leaf) nodes in an $\rnni$ tree is $2n - 1$ and the maximum number of nodes on right sides of the productions is still $2$, the desired bound is $3^{2n-1 + 2r}$.
For $\dttu$, we have to extend the grammar $\Gamma$ by two productions, namely, one for increasing the length of the branch between two nodes in the production and one for decreasing.
This increases the number of necessary node-labels by one.
Indeed, if a production has a long event interval on the left side, three node-labels are enough to indicate whether the event interval length increases or decreases.
Alternatively, if a production has a short event interval on the left side, three node-labels are needed for the top node to indicate the type of move to apply: a length increase move, one of the two possible $\nni$ moves, or a rank swap move.
Together, this requires $4$ node-labels and the desired bound is $4^{n-1 + 2r}$.
For $\dtt$, we extend the grammar constructed for $\dttu$ by four productions, namely, one for the move when two taxa either swap their ranks or increase the interval length between them, one for the move when two taxa either increase or decrease the interval length between them, one for the move when a taxon and an internal node either swap their ranks or increase the interval length between them, and one for the move when a taxon and an internal node either increase or decrease the interval length between them.
All these moves require three node-labels, hence the desired bound is $4^{2n-1 + 2r}$.
\endproof
Since this theorem bounds the sizes of $r$-neighborhoods by functions of the form $a^{f(n) + 2r}$, the following corollary bounds the diameters of the graphs under consideration from below.
\begin{corollary}
Let $\Delta(\G)$, $|\G|$, and $\delta_r(\G)$ be the diameter, the number of vertices, and the number of vertices within distance $r$ from any given vertex in graph $\G$, respectively.
If $\delta_r(\G) = a^{f(n) + 2r}$ then
\[
\Delta(\G) \geq \frac 12 \log_a\frac{|\G|}{b},
\]
where $b = a^{f(n)}$.
In particular, $\Delta(\rnniu) \geq \frac 12 \log_3\frac{(n-1)!\,n!}{6^{n-1}}$.
\end{corollary}
\proof
We note that every $r$ that satisfies the inequality $\delta_r(\G) < |\G|$ is smaller than the diameter, that is, $\Delta(\G) > r$ for all such $r$.
By taking $\log_a(\cdot)$, the former inequality is equivalent to $\log_ab + 2r < \log_a|\G|$, that is, to $r < \frac 12 \log_a\frac{|\G|}{b}$.
Hence, the desired inequality.
\endproof
\subsection{Diameter of the graphs}
Now we estimate the diameter of the graphs from above, complementing the above lower bounds.
Recall that $\Delta(\G)$ is the diameter of graph $\G$.
\begin{theorem}
\label{diameterUpperBound}
For $n \ge 4$, $\Delta(\rnniu) \le n^2 - 3n - \displaystyle\frac 58$.
\end{theorem}
\proof
Let $T$ and $R$ be trees in $\rnniu$.
We show that there exists a path between them of length $n^2 - 3n - \frac 58$ or shorter.
We denote the taxa on one side from the root of $T$ by $A$ and the rest of the taxa by $B$.
Let $A = A_1 \cup A_2$ and $B = B_1 \cup B_2$ so that the root of $R$ splits the taxa into $A_1 \cup B_1$ and $A_2 \cup B_2$.
Note that all $A_i$'s and $B_i$'s have to be disjoint.
We assume that $|A_1| = |A_2| = |B_1| = |B_2| = \frac{n-1}{4}$ and denote this number by $s$.
We will see later in the proof that this assumption does not restrict the generality of our argument.
We construct the path from $T$ to $R$ proceeding in the following steps, which are illustrated on Figure~\ref{diameterUpperBound.pdf}.
Let $f(n)$ be the desired upper bound.
For move counts in the remainder of this proof, we will allow the ``null move'' in which no modification is made.
\begin{figure}
\centering
\includegraphics[width = \textwidth]{diameterUpperBound.pdf}
\caption{An algorithm to compute a (not necessarily shortest) path between discrete time-trees.
Although the trees at steps $(6)$ and $(7)$ look identical, the topologies inside the triangles are different: those at step $(6)$ correspond to the restriction of $T$, at step $(7)$---of $R$.}
\label{diameterUpperBound.pdf}
\end{figure}
\begin{enumerate}[(1)]
\item Move all the nodes adjacent to taxa in $A_2$ on top of all the other nodes in $T$, apart from the root.
The tree restricted to $A_2$ is then a caterpillar tree.
This takes $3s^2$ moves.
\item Move the caterpillar tree on $A_2$ to the other side of the root.
This takes $\frac12 s^2 + \frac12 s$ moves.
\item Move all the nodes adjacent to taxa in $B_1$ on top of all nodes apart from those adjacent to taxa in $A_2$ and the root.
The tree restricted to $B_1$ is then a caterpillar tree.
This takes $2s^2$ moves.
\item Swap the caterpillars on $A_2$ and $B_1$.
This takes $s^2$ moves.
\item Move the caterpillar tree on $B_1$ to the other side of the root.
This takes $\frac12 s^2 + \frac12 s$ moves.
\item Translate the tree restricted to $B_2$ up to place it between the trees on $A_1$ and $A_2$.
This takes $s^2$ moves.
\item Convert the tree on $A_1 \cup B_2$ to the corresponding subtree in $R$ in terms of topology and relative ranking of nodes within the subtree.
This takes $2f(n/4)$ moves.
\item Translate the tree on $B_2$ down so that the tree on $A_1 \cup B_2$ matches the corresponding subtree in $R$ in terms of topology and relative ranking of nodes.
This takes $s^2$ moves.
\item Merge the tree on $A_2$ with the tree on $B_2$ so that the tree on $A_1 \cup A_2 \cup B_2$ coincides with that in $R$.
This takes $2s^2$ moves.
\item Merge the tree on $B_1$ with the tree on $A_1$ so that the tree coincides with $R$.
This takes $3s^2$ moves.
\end{enumerate}
In total, we have a recursive equation $f(n) = 14s^2 + s + 2 f(n/4)$.\\
Assuming that the solution is a polynomial, we see that the polynomial must be of degree $2$.
It remains to apply the recursive equation to find the coefficients of the polynomial:
\[
f(n) = n^2 - 3n - \frac 58.
\]
It remains to note that the assumption $|A_1| = |A_2| = |B_1| = |B_2| = \frac{n-1}{4}$ does not increase the distance between $T$ and $R$.
Indeed, the sets can be chosen so that at most half of the nodes have to cross the root, that is, $|A_2 \cup B_1| \le \frac{n-1}{2}$.
Furthermore, the smaller the size of $A_2 \cup B_1$ the shorter the path.
Finally, the total length of the path is maximized when $|A_2| = |B_1|$ and $|A_1| = |B_2|$.
\endproof
Although we stated and proved the theorem only for the $\rnniu$ graph, the idea can be adapted to the other graphs considered here.
However, this adaptation goes beyond the scope of this paper.
We also note that the proof of the theorem provides an efficient algorithm for computing (not necessarily shortest) paths between discrete time-trees.
This algorithm can be used for efficient exploration of the space, e.g.\ via MCMC, where approximate paths can be used to improve mixing by making distant proposals.
The algorithm might also be useful to determine if two trees appear to be separated by a valley (e.g.\ of posterior probability) or if they are rather located on a plateau.
A non-ranked tree has a symmetry associated with every internal node.
Those symmetries can be employed to consider only a single representative from equivalent pairs of trees, such as by using tanglegrams \autocite{Matsen2015-fn, Whidden2015-es}, where a \emph{tanglegram} is a graph formed by identifying the leaves of two phylogenetic trees.
However, as the following proposition shows, ranked trees are free from all but one of those symmetries, hence the tanglegram approach is not applicable in this case.
\begin{proposition}
Let $T$ be a tree from $\dtt$, $\dttu$, $\rnni$, or $\rnniu$ and $\sigma$ a permutation of taxa of $T$.
Let $T_\sigma$ be the tree obtained from $T$ by permuting the taxa using $\sigma$.
Then $T = T_\sigma$ if and only if $\sigma$ is either the identity permutation or a transposition of a pair of taxa that form a cherry in $T$.
\end{proposition}
\proof
The sufficiency is obvious.
For the necessity, assume that $\sigma$ transpose taxa $i,j$ such that $i,j$ is not a cherry in $T$.
This implies that $i$ and $j$ have different parents.
Hence $T \ne T_\sigma$ because the ranks of the parent of $i$ are different in $T$ and $T_\sigma$.
\endproof
These observations suggest that the $\nni$ graph is geometrically and algorithmically different from the other four graphs.
\iftoggle{curvatureON}{
\section{Ricci-Ollivier curvature of discrete time-trees}
We need the following corollaries for our analysis of curvature.
\begin{corollary}
\label{degreeBounds}
The following are satisfied in $\dtt$ for every pair of trees $T$ and $R$.
\begin{align*}
& \deg(T)-\deg(R) \leq 3n-4 && \mbox{in $\dtt$,}\\
& \deg(T)-\deg(R) \leq 2n-4 && \mbox{in $\dttu$,}\\
& \deg(T)-\deg(R) \leq 2n-3 && \mbox{in $\rnni$,}\\
& \deg(T)-\deg(R) \leq n-3 && \mbox{in $\rnniu$,}\\
& \dfrac25 < \dfrac{\deg(T)}{\deg(R)} < \dfrac52 && \mbox{in $\dtt$,}\\
& \dfrac13 < \dfrac{\deg(T)}{\deg(R)} < 3 && \mbox{in $\dttu$ and $\rnni$,}\\
& \dfrac12 < \dfrac{\deg(T)}{\deg(R)} < 2 && \mbox{in $\rnniu$,}\\
\end{align*}
and all inequalities are tight.
\end{corollary}
\proof
The first four inequalities follow from Lemma~\ref{neighBound} immediately.
For the fifth inequality we note that
\[
\frac{\deg(T)}{deg(R)} \geq \frac{2(n-1)}{5n-6} = \frac25 \left(1 + \frac{1}{5n-6}\right) \searrow \frac 25 \mbox{ when } n\to\infty
\]
The upper bound follows similarly, as well as the rest of inequalities.
\endproof
For $\nni$ graph, Lemma~\ref{neighBound} implies that $deg(T)-deg(R) = 0$ and $\frac{\deg(T)}{deg(R)} = 1$.
Let $N(T)$ be the set of trees at distance one from $T$, that is, the set of trees adjacent to $T$ in the corresponding graph.
The following lemma describes how many trees the tree neighborhoods can have in common.
\begin{lemma}
\label{intersecNeighb}
The following is true for all the graphs $\nni$, $\rnni$, $\rnniu$, $\dtt$, and $\dttu$.
If $d(T,R) = 1$ then $|N(T)\cap N(R)|\in\{0,1\}$.
\end{lemma}
\proof
Suppose $T$ and $R$ are neighbors.
If the trees differ by a single branch length or rank change then the fact that $\nni$ moves only apply to short branches implies that any shared neighbor $S$ would have to match the length or rank of one of the trees.
As $S$ must differ from both trees, they do not have a neighbor in common.
If $T$ and $R$ are $\nni$-neighbors, there is precisely one tree which is a neighbor of both $T$ and $R$, namely the third tree that can be obtained by resolving the interval that connects $T$ and $R$.
\endproof
The following general lemma is true for all graphs, particularly, for those we consider in this paper.
By \emph{distance-one random walk}, we mean a random walk $(m_x)_{x \in X}$ satisfying $m_x(y) = 0$ for all $x$ and $y$ such that $d(x,y) > 1$.
\begin{lemma}
\label{curvBoundGeneral}
Let $(M,d)$ be a finite metric space and $x,y$ a pair of points from $M$.
If $(m_x)_{x \in M}$ is a distance-one random walk on $M$, then the curvature $\kappa$ of the random walk satisfies the following boundary conditions.
\[
\dfrac{-2}{d(x,y)} \leq \kappa(x,y) \leq \dfrac{2}{d(x,y)}.
\]
\end{lemma}
\proof
Let $x$, $y$, $u$, and $v$ be arbitrary points from $M$ such that both $m_x(u)$ and $m_y(v)$ are positive.
Since $m$ is a distance-one random walk, $d(u,v) \leq d(x,y) + 2$.
Hence for any coupling $\{p_{u,v}\}$,
\[
W(m_x,m_y) \leq \sum p_{u,v} d(u,v) \leq (d(x,y)+2)\sum p_{u,v} = d(x,y) + 2,
\]
where the sum is taken over a correspondence between the set of $u \in N(x)$ and $v \in N(y)$.
So, $\kappa(x,y) \geq - 2/d(x,y)$.
The lower bound follows similarly from $d(u,v) \geq d(x,y) - 2$.
\endproof
Note that this lemma can be generalized to \emph{distance-$d$} random walks, which are those satisfying $m_x(y) = 0$ for all $x$ and $y$ such that $d(x,y) > d$.
In this case, using the notations of the lemma, $d(u,v) \leq d(x,y) + 2d$.
Thus,
\[
-\dfrac{2d}{d(x,y)} \leq \kappa(x,y) \leq \dfrac{2d}{d(x,y)}.
\]
As we will see in the next few lemmas, these bounds can be improved for certain random walks on the graphs under consideration.
\begin{lemma}
\label{uniformUpper}
Let $T$ and $R$ be adjacent trees.
Then the curvature of the following spaces with uniform random walk satisfy
\begin{align*}
& \kappa(T,R) \leq \dfrac{1}{2(n-1)} && \mbox{for $\dtt$,}\\
& \kappa(T,R) \leq \dfrac{1}{2(n-2)} && \mbox{for $\nni$, and}\\
& \kappa(T,R) \leq \dfrac{1}{n-1} && \mbox{for $\rnni$, $\rnniu$, and $\dttu$.}
\end{align*}
The bounds are tight.
\end{lemma}
\proof
To maximize the curvature $\kappa(T,R)$, we have to minimize $W(m_T,m_R) = \inf\limits_{\mu\in\M} \sum\limits_{x,y\in V}\mu(x,y) d(x,y)$.
The latter is minimized when the probability mass $\mu(x,y)$ to be moved at shorter distance $d(x,y)$ is maximized.
Since $d(T,R) = 1$, it follows from Lemma~\ref{intersecNeighb} that the smallest possible value of $d(x,y)$ is $0$ (the case when $|N(T) \cap N(R)| = 1$).
Hence, to maximize the curvature we are to maximize the probability mass allocated to the tree $E$ in the intersection $N(T) \cap N(R)$.
It follows from Lemma~\ref{neighBound} that the maximal mass $p_E$ we can allocate under a uniform random walk to the tree $E$ is $\frac{1}{2(n-1)}$ in $\dtt$, $\frac{1}{2(n-2)}$ in $\nni$, and $\frac{1}{n-1}$ in $\rnni$, $\rnniu$, and $\dttu$.
It is not hard to see that trees $T$ and $R$ can be chosen so that $d(x,y) = 1$ for all $x,y$ which are different from $E$ and for which $\mu(x,y) > 0$.
Hence, $\kappa(T, R) = 1 - \left(\sum\limits_{x,y\in V\setminus\{E\}}\mu(x,y)\cdot 1 + p_E \cdot 0\right) = 1 - (1-p_E) = p_E$.
The lemma follows.
\endproof
This lemma and corollary provide tighter upper bounds for the curvature than Lemma~\ref{curvBoundGeneral} for all $n$.
It is easy to generalize the lemma to the uniform $p$-lazy random walk.
Indeed, we observe that the curvature between adjacent trees $T$ and $R$ under a uniform $p$-lazy random walk satisfies
\begin{align*}
& \kappa(T,R) \leq \frac{p}{2(n-1)} && \mbox{in $\dtt$,}\\
& \kappa(T,R) \leq \frac{p}{2(n-2)} && \mbox{in $\nni$, and}\\
& \kappa(T,R) \leq \frac{p}{n-1} && \mbox{in $\rnni$, $\rnniu$, and $\dttu$.}
\end{align*}
We recall that the \emph{asymptotic Ricci-Ollivier curvature} \autocite{Loisel2014-gu} $\ric$ of a metric space with uniform $p$-lazy random walk is defined by
\[
\ric(x,y) = \lim_{p\to0} \frac{\kappa(x,y)}{p}
\]
In this case the asymptotic curvature between adjacent trees in $\dtt$ with uniform $p$-lazy random walk is bounded from above by
\[
\frac{1}{2(n-1)}
\]
Following \textcite{Loisel2014-gu}, we will refer to non-asymptotic curvature as \emph{coarse curvature} to distinguish between the two alternative notions of curvature.
Thus, the upper bounds of asymptotic curvatures of $\nni$, $\rnni$, $\rnniu$, $\dtt$, and $\dttu$ spaces coincides with the coarse curvatures of the spaces.
We continue with lower bounds on curvature of spaces with uniform lazy random walks.
\begin{lemma}
\label{uniformLower}
Let $T$ and $R$ be adjacent trees.
Then both the asymptotic curvature of the space with $p$-lazy uniform random walk and the curvature of the space with uniform random walk are at least
\begin{enumerate}[(i)]
\item $-\dfrac{4}{n-1}$ in $\dtt$,
\item $-\dfrac{4}{n-2}$ in $\nni$,
\item $-\dfrac{8}{n-1}$ in $\rnni$.
\end{enumerate}
The bounds are tight.
\end{lemma}
\proof
Let $T$ and $R$ be two adjacent trees.
To minimize the curvature, we have to maximize $W(m_T, m_R)$ across $T$ and $R$.
The latter is maximized when the probability mass on trees $E\in N(T)$ such that $d(E, N(R)) > d(T, R)$, is
maximized\footnote{If $x$ is a point of a metric space $(M,d)$ and $S \subseteq M$ then by $d(x,S)$ we denote $\inf\limits_{s \in S} d(x,s)$.}.
Since $d(T, R) = 1$, the maximum possible number of such trees $E$ is
\todo{Explain this.
EM skipped pending explanation.}
$8$.
To make $W(m_T,m_R)$ being as large as possible, we have to assume $d(S, N(R)) = 1$ for the rest of the trees $S$ from $N(T)$.
Hence for the uniform random walk we have:
\[
W(m_T,m_R)\leq 8 \cdot 2 \cdot \frac{1}{|N(T)|} +
(|N(T)| - 8) \cdot \frac{1}{N(T)} = 1 + \dfrac{8}{|N(T)|}.
\]
Hence, $\kappa(T,R) = 1 - W(m_T,m_R) \geq - \dfrac{8}{|N(T)|}$.
A similar reasoning applies for the $p$-lazy random walk:
\[
W(m_T,m_R)\leq 8 \cdot 2 \cdot \frac{p}{|N(T)|} +
(|N(T)| - 7) \cdot \frac{p}{|N(T)|} + (1-p) - \frac{p}{|N(T)|} =
\]
$1 + \dfrac{8p}{|N(T)|}$.
Hence, $\ric(T,R) = \lim\limits_{p\to0}\left(1 - W(m_T,m_R)\right) \geq - \dfrac{8}{|N(T)|}$.
It remains to note that Lemma~\ref{neighBound} gives us an estimate $|N(T)| \geq 2(n-1)$ in $\dtt$, $|N(T)| \geq 2(n-2)$ in $\nni$, and $|N(T)| \geq n-1$ in $\rnni$.
The lemma follows.
\endproof
\begin{corollary}
\label{flatInLimDTS}
Let $(T_n,R_n)_{n\in\omega}$ be a sequence of pairs of trees such that $d(T_n,R_n) = 1$ in $\nni$, $\rnni$, or $\dtt$.
Then $\lim\limits_{n \to \infty}\kappa(T_n,R_n) = \lim\limits_{n \to \infty}\ric(T_n,R_n) = 0$.
\end{corollary}
\proof
Follows from Lemma~\ref{uniformUpper} and Lemma~\ref{uniformLower}.
\endproof
As we have just proved, the corollary above holds true in $\nni$, $\rnni$, and $\dtt$.
This corollary is also true in the SPR graph of rooted trees \autocite{Whidden2015-es}.
Actually, this property can be derived from the following general statement for all spaces mentioned, as well as for many others.
\begin{proposition}
\label{flatInLimGen}
Let $(M_n,d_n)_{n \in \omega}$ be a sequence of finite metric spaces and $(x_n, y_n)$ a sequence of adjacent vertices from $M_n$, that is $d_n(x_n,y_n) = 1$, such that
\begin{enumerate}[(1)]
\item
\label{intersec}
$\big|N(x_n) \cap N(y_n)\big| = o(|N(x_n)|)$
\item
\label{difference}
$\big||N(x_n)| - |N(y_n)|\big| = o(|N(x_n)|)$
\item
\label{dist2}
$\big|\{(x,y) \mid
x \in N(x_n),~ y \in N(y_n),~ d(x, y) \geq 2\}\big| = o(|N(x_n)|)$
\end{enumerate}
Then $\lim\limits_{n \to \infty} \kappa(x_n, y_n) = 0$.
\end{proposition}
\proof
Consider an arbitrary $n$ and the pair of points $x_n,y_n$.
Set $I_0 = N(x_n) \cap N(y_n)$, denote by $I_1$ the set of points $z$ such that the probability mass is moved from $z$ at distance one under the optimum coupling for $W(m_{x_n},m_{y_n})$, by $I_2$ the set of points $z$ such that the probability mass is moved from $z$ a distance $\geq 2$ under the optimum scenario for $W(m_x,m_y)$.
Then
\[
W(m_{x_n},m_{y_n}) = \sum_{I_0} d_i x_i p_{x_i} + \sum_{I_1} d_i x_i p_{x_i} +
\sum_{I_2} d_i x_i p_{x_i}.
\]
It follows from~(\ref{intersec}) and~(\ref{dist2}) that the first and the last sums are $\O(1)$ when $n\to\infty$.
This together with property~(\ref{difference}) implies that the middle sum tends to $1$ when $n\to\infty$ because the number of points $z$ such that the probability mass has to travel at distance one under an optimum scenario, divided by $|N(x_n)|$, tends to $1$ when $n\to\infty$.
\endproof
We note that in both Corollary~\ref{flatInLimDTS} and Proposition~\ref{flatInLimGen}, the assumption of being at distance one can be relaxed by that of being at a distance uniformly bounded by a slowly growing function.
But since the primary focus of this paper is on the spaces under consideration, we will not \todo{Should we?} include the statement in general terms of metric spaces.
However, we prove below (Theorem~\ref{zero-in-the-limit}) a much stronger property of the spaces, namely, that the curvature tends to zero when the tree size tends to infinity does not matter what distance between the trees is.
}
{}
\iftoggle{curvatureON}{
We now use Theorem~\ref{max_good_neighbors} to prove the following.
\todo[inline]{$\downarrow$ This must be true for $p$-lazy walk as well.}
\begin{theorem}
\label{zero-in-the-limit}
Let $(T_n,R_n)_{n\in\omega}$ be a sequence of pairs of $\dtt_2$-trees on $n$ taxa.
Then $\lim\limits_{n \to \infty}\kappa(T_n,R_n) = 0$ for the uniform random walk.
\end{theorem}
\proof
Since the curvature is a local property, it follows from Lemma~\ref{uniformLower} that $\kappa(T_n,R_n) \geq -\dfrac{4}{n-1}$ for all $n$.
To finish the proof, we need to bound the sequence $\kappa(T_n,R_n)$ from above by another sequence which tends to zero.
Let us fix an $n$ and let $U$ be the set of trees $E \in N(T)$ such that $d_\dtt_2(E,R_n) \leq d_\dtt_2(T_n,R_n)$ as in Theorem~\ref{max_good_neighbors}, and $B$ be the rest of trees $N(T_n)\setminus U$ from $N(T_n)$.
Let's denote $d_\dtt_2(T_n,R_n)$ by $d$.
Then
\[
W(m_{T_n},m_{R_n}) = \sum_{i\in U} p_i d_i + \sum_{j\in B} p_j d_j \geq
\sum_{i\in U} p_i (d-2) + \sum_{j\in B} p_j d =
\frac{|U|(d-2)}{|N(T_n)|} + \frac{|B|d}{|N(T_n)|}.
\]
Hence,
\[
\kappa(T_n,R_n) = 1 - \frac{W(m_{T_n},m_{R_n})}{d} \leq
1 - \frac{|U| + |B|}{|N(T_n)|} + \frac{2|U|}{|N(T_n)|d}
= \frac{2|U|}{|N(T_n)|d}.
\]
It follows from Theorem~\ref{max_good_neighbors} that $|U| \leq 4d$.
Thus, $\kappa(T_n,R_n) \leq \dfrac{8}{|N(T_n)|}$.
It remains to note that $\lim\limits_{n\to\infty}\dfrac{8}{|N(T_n)|} = 0$.
\endproof
Clearly, the same reasoning goes through for the other three spaces under consideration:
\begin{corollary}
Let $(T_n,R_n)_{n\in\omega}$ be a sequence of pairs of trees on $n$ taxa.
Then $\lim\limits_{n \to \infty}\kappa(T_n,R_n) = 0$ for $\nni$, $\rnni$, and $\dtt$ distances.
\end{corollary}
We now estimate the curvature of the Metropolis-Hastings random walk, which proposes a tree in the $1$-neighborhood uniformly at random and accepts the proposal with probability $\min\left(1, \dfrac{|N(T_{old})|}{|N(T_{new})|}\right)$.
We denote the corresponding curvature by $\kappa(\MH;T,R)$.
\begin{lemma}
The following inequalities are satisfied in both $\dtt_2$ and $\dtt$.
\[
\kappa(T,R) - \dfrac{6}{5d(T,R)} \leq \kappa(\MH;T,R) \leq \kappa(T,R) +
\dfrac{6}{5d(T,R)}\mbox{, and}
\]
\[
\kappa(T,R) - \dfrac35 \leq \kappa(\MH;T,R) \leq \kappa(T,R) + \dfrac35.
\]
\end{lemma}
\proof
From Corollary~\ref{degreeBounds} we have that $\frac{|N(T_{old})|}{|N(T_{new})|} \geq \frac{2}{5}$, so the probability mass that an MH move leaves at $T_{old}$ is not greater than $\frac35$.
The rest of the proof follows \autocite[][Proof of Lemma~V.8]{Whidden2015-es} literally.
\endproof
\begin{corollary}
The following inequalities are satisfied in $\rnni$ \todo{What about NNI?}.
\[
\kappa(T,R) - \dfrac{4}{3d(T,R)} \leq \kappa(\MH;T,R) \leq \kappa(T,R) +
\dfrac{4}{3d(T,R)}\mbox{, and}
\]
\[
\kappa(T,R) - \dfrac23 \leq \kappa(\MH;T,R) \leq \kappa(T,R) + \dfrac23.
\]
\end{corollary}
\proof
Use Corollary~\ref{degreeBounds}.
\endproof
}{}
\subsection{Efficient algorithm for generating $\rnni$ graphs}
We now introduce an algorithm to compute the $\rnni$ graph on $n$ leaves by extending an algorithm from \autocite{Whidden2015-es}.
The input to our algorithm is a set $S$ of $\rnni$ trees in the format described by \textcite{Gavryushkin2014-bw} and implemented in \autocite{tauGeodesic} (see also \autocite{Semple2003-nj}).
The two key requirements of the algorithm are the ability to enumerate the neighbors of a given tree in the $\rnni$ graph, and the ability to determine whether a given tree is already a vertex of the graph.
Both of these requirements follow from an efficient unique representation for ranked trees.
A node of a tree is given by the set of taxa that are all descendants of the node.
A tree is then given by a sequence of nodes ordered by their ranks.
Clearly, this obtained representation is unique.
For example, the representation of the tree on Figure~\ref{timeTree.eps} is $(\{A\}, \{C\}, \{D\}, \{B\}, \{E\}, \{A,B\}, \{D,E\}, \{C, D, E\}, \{F\}, \{A,B,F\}, \{A,B,C,D,E,F\})$, or
$(\{A,B\}, \{D,E\}, \{C, D, E\}, \{A,B,F\}, \{A,B,C,D,E,F\})$ if we make the tree ultrametric by placing all taxa below the most recent divergence event.
Using this representation of trees we construct a map $\nu : S \to \omega$, where $\omega$ is the set of non-negative integers.
This map allows us to determine whether or not a tree is already a vertex of the graph.
This takes linear time if the map is implemented as a trie.
We can also enumerate the neighbors of a tree in $O(n^2)$-time using Lemma~\ref{neighBound}.
The algorithm therefore takes $O(|S|*n^2)$-time.
The high-level steps of the algorithm are as follows.
See \autocite{Whidden2015-es} for an analogous proof of correctness.
\medskip
\textsc{Construct-$\rnni$-Graph($S$)}
\begin{enumerate}[1.]
\item Let $G$ be an empty graph.
\item Let $\nu$ be an empty mapping from trees to integers.
\item Let $i = 0$.
\item For each of the $m$ trees: \vspace{-0.2em}
\begin{enumerate}
\item Add a vertex $i$ to $G$ representing the current tree $T_i$.
\item Add $T_i \rightarrow i$ to $\nu$.
\item For each neighbor $T_j$ of $T_i$:
\begin{enumerate}
\item[(j)] If $T_j$ is in $\dom(\nu)$ then add an edge $(i, \nu(T_j))$ to $G$.
\end{enumerate}
\item $i = i + 1$.
\end{enumerate}
\end{enumerate}
\iftoggle{curvatureON}{
\section{Some other spaces}
Several important proposal mechanisms used in phylogenetic Bayesian inference by popular software packages such as BEAST2 \autocite{beast2} favor topological moves or tree moves depending on various conditions.
All tree moves we have been considering so far do not make an explicit distinction between topological changes and branch length changes.
To address this issue\todo{issue?}, we consider the following tree move that explicitly allows distributing the acceptance probability between topological and branch length moves.
{\bf Lazy walk.}
Let $p$ be the laziness probability, that is, we do nothing with probability $1-p$ and distribute the probability $p$ as follows.
We decide first on what type of move we want to perform: choose a topological move with probability $q$ and a length move otherwise, that is, $q \leq p$ and the probability of a length move is $p-q$.
The proposal is rejected if a topological move is impossible.
{\bf tau move.}
Choose a coordinate uniformly at random.
Increase the coordinate by $1$ with probability $p$ and decrease it by $1$ otherwise.
If the coordinate becomes $0$, resolve the multifurcation uniformly at random and set the new coordinate to $1$.
Note that this mechanism does not bound edge lengths from above and favors topological moves when $p<1/2$.