-
Notifications
You must be signed in to change notification settings - Fork 1
/
ms.tex
1383 lines (1226 loc) · 105 KB
/
ms.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[amsmath,amssymb,aps,prd,reprint,nopreprintnumbers,nofootinbib,showpacs]{revtex4-1}
\usepackage[nolist]{acronym}
\usepackage{multirow}
\usepackage{bigdelim}
\usepackage{multirow}
\usepackage{bm}
\usepackage{graphicx}
\usepackage{mathrsfs}
\usepackage{aas_macros}
\usepackage[colorlinks]{hyperref}
\usepackage{protosem}
\usepackage{listings}
\lstset{ %
basicstyle=\footnotesize\ttfamily,
showspaces=false,
showstringspaces=false,
showtabs=false,
keepspaces,
frame=tb,
commentstyle=\color{red},
keywordstyle=\color{blue}
}
% From http://www.f.kth.se/~ante/latex.php
\setlength{\marginparwidth}{0.5in}
\let\oldmarginpar\marginpar
\renewcommand\marginpar[1]{\-\oldmarginpar[\raggedleft\scriptsize #1]%
{\raggedright\scriptsize #1}}
\DeclareMathOperator{\Tr}{Tr}
\DeclareMathOperator{\erf}{erf}
\DeclareMathOperator{\cov}{cov}
\DeclareMathOperator{\std}{std}
\DeclareMathOperator*{\argmin}{\arg\!\min}
\DeclareMathOperator*{\argmax}{\arg\!\max}
% Macros for collapsing sizes of things
% From TUGboat, Volume 22 (2001), No. 4
% http://www.tug.org/TUGboat/tb22-4/tb72perlS.pdf
\def\clap#1{\hbox to 0pt{\hss#1\hss}}
\def\mathllap{\mathpalette\mathllapinternal}
\def\mathrlap{\mathpalette\mathrlapinternal}
\def\mathclap{\mathpalette\mathclapinternal}
\def\mathllapinternal#1#2{\llap{$\mathsurround=0pt#1{#2}$}}
\def\mathrlapinternal#1#2{\rlap{$\mathsurround=0pt#1{#2}$}}
\def\mathclapinternal#1#2{\clap{$\mathsurround=0pt#1{#2}$}}
\newcommand\transpose{\ensuremath{^{^\mathsf{T}}}}
\begin{document}
\title[Rapid-response Bayesian sky localization]{Rapid Bayesian position reconstruction for gravitational\nobreakdashes-wave transients}
\author{Leo P. Singer}
\email{leo.singer@ligo.org}
\author{Larry R. Price}
\email{larryp@caltech.edu}
\affiliation{\acs{LIGO} Laboratory, California Institute of Technology, Pasadena, California 91125, USA}
\received{28 September 2015}
\published{14 January 2016}
\pacs{04.80.Nn, 04.30.Tv, 02.50.Tt}
\begin{abstract}
Within the next few years, Advanced LIGO and Virgo should detect \aclp{GW} from \acl{BNS} and \acl{NSBH} mergers. These sources are also predicted to power a broad array of \acl{EM} transients. Because the \acl{EM} signatures can be faint and fade rapidly, observing them hinges on rapidly inferring the sky location from the \acl{GW} observations. \acl{MCMC} methods for gravitational\nobreakdashes-wave parameter estimation can take hours or more. We introduce BAYESTAR, a rapid, Bayesian, non-\acl{MCMC} sky localization algorithm that takes just seconds to produce probability sky maps that are comparable in accuracy to the full analysis. Prompt localizations from BAYESTAR will make it possible to search \acl{EM} counterparts of compact binary mergers.
\end{abstract}
\maketitle
The Laser Interferometer Gravitational Wave Observatory (\acsu{LIGO})~\cite{AdvancedLIGOHarry,AdvancedLIGO} has just begun taking data \cite{LIGOObservingScenarios} in its ``Advanced'' configuration. The two \ac{LIGO} detectors will ultimately increase their reach in volume within the local Universe by 3 orders of magnitude as compared to their initial configurations through 2010. They form the first parts of a sensitive global gravitational-wave (\acsu{GW}) detector network, soon to be augmented by Advanced Virgo~\cite{AdvancedVirgo} and later by the Japanese KAGRA facility~\cite{LCGT,KAGRAInterferometerDesign} and \ac{LIGO}\nobreakdashes--India~\cite{LIGOIndia}.
The most readily detectable sources of \acp{GW} include binary neutron star mergers, with 0.4\nobreakdashes--400 events per year within the reach of Advanced LIGO at its final design sensitivity~\cite{rates}. These binary systems are not only efficient sources of \acp{GW} but also potential sources of detectable \ac{EM} transients from the aftermath of the tidal disruption of the \acp{NS}. \citet{MostPromisingEMCounterpart} argue that the most promising \ac{EM} counterparts are the hypothesized optical/infrared ``kilonovae'' powered by the radioactive decay of $r$\nobreakdashes-process elements synthesized within the neutron\nobreakdashes-rich ejecta. These are expected to be faint and red and to peak rapidly, reaching an absolute magnitude of only $M_R \sim -13$ within a week, though rising several magnitudes brighter in the infrared~\citep{KilonovaHighOpacities}.
Several mechanisms could make the kilonovae brighter, bluer, and hence more readily detectable \citep{KilonovaPrecursor,KilonovaRedOrBlue}, but peak even earlier, within hours. If, as is widely believed \citep{1986ApJ...308L..43P,1989Natur.340..126E,1992ApJ...395L..83N,2011ApJ...732L...6R}, \ac{BNS} mergers are indeed progenitors of short \acp{GRB}, then a small (due to jet collimation) fraction of Advanced \ac{LIGO} events could also be accompanied by a bright optical afterglow, but this signature, likewise, would peak within hours or faster.
Adding to the challenge of detecting a faint, short-lived optical transient, there is an extreme mismatch between the sky localization accuracy of \ac{GW} detector networks, $\sim$10\nobreakdashes--500\,deg$^2$ \citep{FairhurstTriangulation,WenLocalizationAdvancedLIGO,FairhurstLocalizationAdvancedLIGO,2011PhRvD..84j4020V,RodriguezBasicParameterEstimation,NissankeLocalization,NissankeKasliwalEMCounterparts,KasliwalTwoDetectors,Grover:2013,SiderySkyLocalizationComparison,FirstTwoYears}, and the \acp{FOV} of 1\nobreakdashes--8\,m\nobreakdashes-class optical telescopes. Wide\nobreakdashes-field optical transient facilities such as BlackGEM~(0.6\,m/2.7\,deg$^2$), the \acl{ZTF}~(1.2\,m/47\,deg$^2$)~\cite{ZTF}, the \acl{DECAM}~(4\,m/3\,deg$^2$), or the \acl{LSST}~(8.4\,m/9.6\,deg$^2$), operated in ``target of opportunity'' mode, may be able to tile these large areas rapidly enough to find the one needle in the haystack that is connected to the \ac{GW} event. However, prompt and accurate \ac{GW} position reconstructions will be of the utmost importance for guiding the selection of fields to observe.
The final science run of the initial LIGO and Virgo instruments saw the first joint search for \ac{GW} and \ac{EM} emission from compact binaries. This involved several advances in the \ac{GW} data analysis \citep{CBCLowLatency}, including the first real\nobreakdashes-time matched\nobreakdashes-filter detection pipeline, the \acl{MBTA}~\cite{LIGOVirgoInspiralPipelineComparison}; a semi\nobreakdashes-coherent, \textit{ad hoc} rapid triangulation code, Timing++; and the first version of a rigorous Bayesian \ac{MCMC} parameter estimation code, LALINFERENCE~\cite{S6PE}---all in service of the first search for X\nobreakdashes-ray \citep{SwiftFollowup} and optical \citep{OpticalImageAnalysis} counterparts of \ac{GW} triggers, by a consortium of facilities. Despite the technical achievements in the \ac{GW} data analysis, there was an undesirable tradeoff between the speed as well as accuracy of the rapid localization and the full parameter estimation: the former could analyze a detection candidate in minutes, whereas the latter took half a day; the latter decreased the area on the sky by a factor of 1/20 over the former but took 1000 times as long to run~\cite{SiderySkyLocalizationComparison}.
The success of \ac{EM} follow\nobreakdashes-up of \ac{LIGO} events will depend critically on disseminating high quality sky localizations within a time scale of minutes to hours. To that end, we have devised a rapid and accurate Bayesian sky localization method that takes mere seconds to achieve approximately the same accuracy as the full \ac{MCMC} analysis. Our key insights are the following:
%
\begin{enumerate}
\item Nearly all of the information in the \ac{GW} time series that is informative for sky localization is encapsulated within the matched\nobreakdashes-filter estimates of the times, amplitudes, and phases on arrival at the detectors. To infer the position and distance of a \ac{GW} event, we only have to consider three numbers per detector rather than a densely sampled strain time series per detector.
\item The matched\nobreakdashes-filter pipeline can be treated as a measurement system in and of itself. Just like the strain time series from the detectors, the resultant times, amplitudes, and phases have a predictable and quantifiable measurement uncertainty that can be translated into a likelihood function suitable for Bayesian inference.
\item The Fisher information matrix will provide clues as to suitable forms of this likelihood function. Recent \ac{GW} parameter estimation literature has largely rejected the Fisher matrix,\footnote{Though not entirely; see \cite{EffectiveFisherMatrix}.} but this is mostly on the grounds of the abuse of the related \ac{CRLB} outside its realm of validity of moderate to high \ac{SNR}~\cite{BayesianBounds,UseAbuseFisherMatrix,FisherMatrixAsymptoticExpansions,InadequaciesFisherMatrix}. However, we recognize that the block structure of the Fisher matrix provides important insights and is a useful quantity for checking the validity of the aforementioned likelihood function, quite independent of the \ac{CRLB}.
\item The Fisher matrix teaches us that errors in sky localization are semi\nobreakdashes-independent from errors in masses. This implies that if we care only about position reconstruction and not about jointly estimating masses as well, then we can reduce the dimensionality of the parameter estimation problem significantly. Moreover, this frees us of the need to directly compute the expensive post-Newtonian model waveforms, making the likelihood itself much faster to evaluate.
\item Thanks to a simple likelihood function and a well\nobreakdashes-characterized parameter space, we may dispense with costly and parallelization\nobreakdashes-resistant \ac{MCMC} integration and instead perform the Bayesian marginalization with classic, deterministic, very low order Gaussian quadrature.
\item The Bayesian inference scheme thus designed to operate on the matched\nobreakdashes-filter detection pipeline outputs could be trivially generalized to operate on the full \ac{GW} time series within the same computational constraints. This would yield a fast and coherent localization algorithm that is mathematically equivalent to the full \ac{MCMC} parameter estimation, restricted to extrinsic parameters (sky location, binary orientation, and distance).
\end{enumerate}
%
We call this algorithm \ac{BAYESTAR}.\footnote{A pun on the Cylon battleships in the American television series Battlestar Galactica. The defining characteristic of the Cylons is that they repeatedly defeat humanity by using their superhuman information\nobreakdashes-gathering ability to coordinate overwhelming forces.}\footnote{We do not like to mention the final `L' in the acronym, because then it would be pronounced BAYESTARL, which sounds stupid.} It is as fast as Timing++ but nearly as accurate as the rigorous full parameter estimation. It is unique in that it bridges the detection and parameter estimation of \ac{GW} signals, two tasks that have until now involved very different numerical methods and time scales. Beginning with the first Advanced \acs{LIGO} observing run, \ac{BAYESTAR} is providing localizations within minutes of the detection of any \ac{BNS} merger candidate, playing a key role in enabling rapid follow-up observations.
This paper is organized as follows. In Sec.~\ref{sec:detection}, we describe the \ac{GW} signal model and sketch the standard detection algorithm, the matched\nobreakdashes-filter bank. In Sec.~\ref{sec:parameter-estimation}, we describe Bayesian inference formalism and the prevailing method for inferring the parameters of detected candidates, \ac{MCMC} sampling. In Sec.~\ref{sec:bayestar-likelihood}, we propose the \ac{BAYESTAR} likelihood as a model for the uncertainty in the matched\nobreakdashes-filter parameter estimates, and discuss its relationship to and consistency with the likelihood for the full \ac{GW} data. In Sec.~\ref{sec:prior}, we describe the input to \ac{BAYESTAR} supplied by the detection pipeline, and the prior distribution on parameters. In Sec.~\ref{sec:marginal-posterior}, we explain the integration scheme by which the posterior probability distribution is calculated for a given sky location. In Sec.~\ref{sec:adaptive-sampling}, we show a scheme whereby the sky posterior is sampled on an adaptive \ac{HEALPix} grid. In Sec.~\ref{sec:run-time}, we report the running time of the algorithm on the hardware available on the \ac{LIGO} Data Grid. Finally, in Sec.~\ref{sec:case-study}, we quantify the sky localization performance on a comprehensive set of simulated \ac{GW} events.
\section{Signal model and detection}
\label{sec:detection}
In the \acl{TD}, the strain observed by a single \ac{GW} interferometer is
%
\begin{equation}
y_i(t) = x_i (t; \bm\theta) + n_i (t).
\end{equation}
%
In the \acl{FD},
%
\begin{equation}\label{eq:signal-model}
Y_i (\omega) = \int_{-\infty}^\infty y(t) e^{-i \omega t} dt = X_i (\omega; \bm\theta) + N_i (\omega),
\end{equation}
%
where $X_i (\omega; \bm\theta)$ is the \ac{GW} signal given a parameter vector $\bm\theta$ that describes the \ac{GW} source and $N_i (\omega)$ is that detector's Gaussian noise with one\nobreakdashes-sided \ac{PSD} $S_i(\omega) = E\left[\left|N_i(\omega)\right|^2\right] + E\left[\left|N_i(-\omega)\right|^2\right] = 2 E\left[\left|N_i(\omega)\right|^2\right]$. We shall denote the combined observation from a network of detectors as $\mathbf Y (\omega) \equiv \{Y_i (\omega)\}_i$.
Under the assumptions that the detector noise is Gaussian and that the noise from different detectors is uncorrelated, the likelihood of the observation, $\mathbf y$, conditioned upon the parameters $\bm\theta$, is a product of Gaussian distributions:
%
\begin{multline}\label{eq:gaussian-likelihood}
\mathcal{L}(\mathbf Y; \bm\theta) = \prod_i p(Y_i | \bm\theta)
\\
\propto \exp \left[
- \frac{1}{2} \sum_i \int_0^\infty \frac{\left|Y_i (\omega)
- X_i(\omega; \bm\theta) \right|^2}{S_i(\omega)} \, d\omega
\right].
\end{multline}
A \ac{CBC} source is specified by a vector of extrinsic parameters describing its position and orientation and intrinsic parameters describing the physical properties of the binary components\footnote{This list of parameters involves some simplifying assumptions. Eccentricity is omitted; although it may play a major role in the evolution and waveforms of rare close binaries formed by dynamical capture~\citep{PhysRevD.87.043004,2013PhRvD..87l7501H,2014PhRvD..90h4016H}, \ac{BNS} systems formed by binary stellar evolution should almost always circularize due to tidal interaction~\citep{0004-637X-572-1-407} and later \ac{GW} emission~\citep{PhysRev.136.B1224} long before the inspiral enters LIGO's frequency range of $\sim$10\nobreakdashes--1000~kHz. Tidal deformabilities of the \acp{NS} are omitted because the signal imprinted by the companions' material properties is so small that it will only be detectable by an Einstein Telescope\nobreakdashes-class \ac{GW} observatory~\citep{PhysRevD.81.123016}. Furthermore, in \ac{GW} detection efforts, especially those focused on \ac{BNS} systems, the component spins $\mathbf{S}_1$ and $\mathbf{S}_2$ are often assumed to be nonprecessing and aligned with the system's total angular momentum and condensed to a single scalar parameter $\chi$, or even neglected entirely: $\mathbf{S}_1 = \mathbf{S}_2 = 0$.}:
%
\begin{equation}\label{eq:params}
\bm\theta = \left[
\begin{array}{cl@{\quad}p{4cm}lp{3.75cm}}
\alpha & \rdelim]{11}{0mm} & right ascension & \rdelim\}{7}{1mm} & \multirow{7}{2cm}{extrinsic parameters, $\bm\theta_\mathrm{ex}$} \\
\delta && declination & \\
r && distance & \\
t_\oplus && arrival time at geocenter & \\
\iota && inclination angle & \\
\psi && polarization angle & \\
\phi_c && coalescence phase & \\
\cline{1-1}\cline{3-3}
m_1 && first component's mass & \rdelim\}{4}{1mm} & \multirow{4}{2cm}{intrinsic parameters, $\bm\theta_\mathrm{in}$.}\\
m_2 && second component's mass & \\
\mathbf S_1 && first component's spin & \\
\mathbf S_2 && second component's spin & \\
\end{array}\right.
\end{equation}
Assuming a non\nobreakdashes-precessing circular orbit, we can write the \ac{GW} signal received by any detector as a linear combination of two basis waveforms, $H_0$ and $H_{\pi/2}$ \citep{PhysRevD.83.084002}. $H_0$ and $H_{\pi/2}$ are approximately ``in quadrature'' in the same sense as the cosine and sine functions, being orthogonal and out of phase by ${\pi/2}$ at all frequencies. If $H_0$ and $H_{\pi/2}$ are Fourier transforms of real functions, then $H_0(\omega) = H_0^*(-\omega)$ and $H_{\pi/2}(\omega) = H_{\pi/2}^*(-\omega)$, and we can write
%
\begin{equation}
H_{\pi/2}(\omega) = H_0(\omega) \cdot
\begin{cases}
-i & \text{if } \omega \geq 0 \\
i & \text{if } \omega < 0
\end{cases}.
\end{equation}
%
For brevity, we define $H \equiv H_{0}$ and write all subsequent equations in terms of the $H$ basis vector alone. Then, we can write the signal model in a way that isolates all dependence on the extrinsic parameters, $\bm\theta_\mathrm{ex}$, into a few coefficients and all dependence on the intrinsic parameters, $\bm\theta_\mathrm{in}$, into the basis waveform, by taking the Fourier transform of Eq.~(2.8) of Ref.~\cite{PhysRevD.83.084002},
%
\begin{multline}\label{eq:full-signal-model}
X_i(\omega; \bm\theta) = e^{-i \omega (t_\oplus - \mathbf{d}_i \cdot \mathbf{n})}
\frac{r_{1,i}}{r}
e^{2 i \phi_c}
\\
\left[
\frac{1}{2} \left(1 + \cos^2 \iota\right) \Re \left\{\zeta\right\} - i
\left(\cos\iota\right) \Im \left\{\zeta\right\}
\right]
H(\omega; \bm\theta_\mathrm{in})
\end{multline}
%
for $\omega \geq 0$, where
%
\begin{equation}
\zeta = e^{-2 i \psi} \left(
F_{+,i}(\alpha, \delta, t_\oplus) +
i F_{\times,i}(\alpha, \delta, t_\oplus)
\right).
\end{equation}
%
The quantities $F_{+,i}$ and $F_{\times,i}$ are the dimensionless detector antenna factors, defined such that $0 \leq {F_{+,i}}^2 + {F_{\times,i}}^2 \leq 1$. They depend on the orientation of detector $i$ as well as the sky location and sidereal time of the event and are presented in Ref.~\cite{ExcessPower}. In a coordinate system with the $x$ and $y$ axes aligned with the arms of a detector, the antenna pattern is given in spherical polar coordinates as
%
\begin{align}
F_+ &= -\frac{1}{2}(1 + \cos^2 \theta) \cos{2\phi}, \\
F_\times &= -\cos\theta \sin{2\phi}.
\end{align}
%
The unit vector $\mathbf{d}_i$ represents the position of detector $i$ in units of light travel time.\footnote{When considering transient \ac{GW} sources such as those that we are concerned with in this article, the origin of the coordinate system is usually taken to be the geocenter. For long\nobreakdashes-duration signals such as those from statically deformed neutron stars, the solar system barycenter is a more natural choice.} The vector $\mathbf{n}$ is the direction of the source. The negative sign in the dot product $-\mathbf{d}_i \cdot \mathbf{n}$ is present because the direction of travel of the \ac{GW} signal is opposite to that of its sky location. The quantity $r_{1,i}$ is a fiducial distance at which detector $i$ would register \ac{SNR}=1 for an optimally oriented binary (face\nobreakdashes-on, and in a direction perpendicular to the interferometer's arms):
%
\begin{equation}\label{eq:horizon}
r_{1,i} = 1 / \sigma_i, \qquad {\sigma_i}^2 = \int_0^\infty \frac{\left|H(\omega; \boldsymbol \theta_\mathrm{in})\right|^2}{S_i(\omega)} \,d\omega.
\end{equation}
More succinctly, we can write the signal received by detector $i$ in terms of observable extrinsic parameters $\bm\theta_i = (\rho_i, \gamma_i, \tau_i)$, the amplitude $\rho_i$, phase $\gamma_i$, and time delay $\tau_i$ on arrival at detector $i$:
%
\begin{multline}\label{eq:signal-model}
X_i (\omega; \bm\theta_i, \bm\theta_\mathrm{in})
\\
= X_i (\omega; \rho_i, \gamma_i, \tau_i, \bm\theta_\mathrm{in}) = \frac{\rho_i}{\sigma_i} e^{i (\gamma_i - \omega \tau_i)} H(\omega; \bm\theta_\mathrm{in}).
\end{multline}
The prevailing technique for detection of \acp{GW} from \acp{CBC} is to realize a \ac{ML} estimator (MLE) from the likelihood in Eq.~(\ref{eq:gaussian-likelihood}) and the signal model in Eq.~(\ref{eq:signal-model}). Concretely, this results in a bank of matched filters, or cross-correlations between the incoming data stream and a collection of template waveforms,
%
\begin{equation}
z_i(\tau_i;\bm\theta_\mathrm{in}) = \frac{1}{\sigma_i (\bm\theta_\mathrm{in})} \int_0^\infty \frac{H^*(\omega; \bm\theta_\mathrm{in}) Y_i(\omega) e^{i \omega \tau_i}}{S_i(\omega)} \,d\omega.
\end{equation}
%
The \ac{ML} point estimates of the signal parameters, $\mathrm{MLE}(\mathbf{y}) = \{\{ \hat{\bm\theta}_i \}_i, \hat{\bm\theta}_\mathrm{in}\} = \{\left\{ \hat\rho_i, \hat\gamma_i, \hat\tau_i \right\}_i, \hat{\bm\theta}_\mathrm{in}\}$, are given by
%
\begin{eqnarray}
\label{eq:optimal-tau}
\hat{\bm\theta}_\mathrm{in}, \{\hat\tau_i\}_i
&=& \argmax_{\bm\theta_\mathrm{in}, \{\hat\tau_i\}_i}
\sum_i \left| z_i\left(\tau_i;
\bm\theta_\mathrm{in}\right) \right|^2, \\
\label{eq:optimal-rho}
\hat\rho_i &=& \left| z_i\left(\hat\tau_i;
\hat{\bm\theta}_\mathrm{in}\right) \right|, \\
\label{eq:optimal-gamma}
\hat\gamma_i &=& \arg z_i\left(\hat\tau_i;
\hat{\bm\theta}_\mathrm{in}\right).
\end{eqnarray}
%
A detection candidate consists of $\{\left\{ \hat\rho_i, \hat\gamma_i, \hat\tau_i \right\}_i, \hat{\bm\theta}_\mathrm{in}\}$. There are various ways to characterize the significance of a detection candidate. In Gaussian noise, the maximum likelihood for the network is obtained by maximizing the network \ac{SNR}, $\rho_\mathrm{net}$,
%
\begin{equation}
\hat\rho_\mathrm{net} = \max_{\bm\theta} \sum_i {|z_i({\bm\theta})|}^2 = \sqrt{\sum_i {\hat\rho_i}^2};
\end{equation}
%
this, therefore, is the simplest useful candidate ranking statistic.
\subsection{Uncertainty and the Fisher matrix}
\label{sec:fisher}
We can predict the uncertainty in the detection pipeline's \ac{ML} estimates using the \ac{CRLB}. The \ac{CRLB} has been widely applied in \ac{GW} data analysis to estimate parameter estimation uncertainty \cite{1996PhRvD..53.3033B,FairhurstTriangulation,2009PhRvD..79h4032A,WenLocalizationAdvancedLIGO,LIGOObservingScenarios,FairhurstLIGOIndia}.\footnote{\label{footnote:template-banks}The Fisher matrix is also used in construction of \ac{CBC} matched-filter banks. The common procedure is to place templates uniformly according to the determinant of the signal space metric, which is the Fisher matrix. This is equivalent to uniformly sampling the Jeffreys prior. In practice, this is done either by constructing a hexagonal lattice~\citep{PhysRevD.76.102004} or sampling stochastically~\citep{2009PhRvD..80j4014H,2009PhRvD..80b4009V,SBank,2010PhRvD..81b4004M,2014PhRvD..89b4003P}.} As we noted, there are significant caveats to the \ac{CRLB} at low or moderate \ac{SNR} \cite{BayesianBounds,UseAbuseFisherMatrix,FisherMatrixAsymptoticExpansions,InadequaciesFisherMatrix}. However, here we will be concerned more with gaining intuition from the block structure of the Fisher matrix than its numerical value. Furthermore, the Fisher matrix in its own right---independent of its suitability to describe the parameter covariance---is a well\nobreakdashes-defined property of any likelihood function, and we will exploit it as such in Sec.~\ref{sec:bayestar-likelihood}.
We will momentarily consider the likelihood for a single detector:
%
\begin{multline}\label{eq:gaussian-likelihood-spa}
\mathcal{L}\left(Y_i; \rho_i, \gamma_i, \tau_i,
\bm\theta_\mathrm{in}\right)
\\
\propto \exp \left[
- \frac{1}{2} \int_0^\infty \frac{\left|Y_i (\omega)
- X_i\left(\omega; \rho_i, \gamma_i, \tau_i,
\bm\theta_\mathrm{in}\right)
\right|^2}{S_i(\omega)} \, d\omega
\right],
\end{multline}
%
with $X_i(\omega; \rho_i, \gamma_i, \tau_i, \bm\theta_\mathrm{in})$ given by Eq.~(\ref{eq:signal-model}).
The Fisher information matrix for a measurement $y$ described by the unknown parameter vector $\bm{\theta}$ is the conditional expectation value
%
\begin{equation}\label{eq:general-fisher-matrix}
\mathcal{I}_{jk} = \mathrm{E} \, \left[
\left(\frac{\partial \log
\mathcal{L}(Y_i ; \bm\theta)}
{\partial \theta_j}\right)
\left(\frac{\partial \log
\mathcal{L}(Y_i ; \bm\theta)}
{\partial \theta_k}\right)
\middle| \bm\theta
\right].
\end{equation}
The Fisher matrix describes how strongly the likelihood depends, on average, on the parameters. Furthermore, it provides an estimate of the mean-square error in the parameters. If $\hat{\bm\theta}$ is an unbiased estimator of $\bm\theta$, $\tilde{\bm\theta} = \hat{\bm\theta} - \bm\theta$ is the measurement error, and $\Sigma = \mathrm{E} \, [\tilde{\bm\theta}\tilde{\bm\theta}\transpose]$ is the covariance of the measurement error, then the \ac{CRLB} says that $\Sigma \geq \mathcal{I}^{-1}$, in the sense that $\left(\Sigma - \mathcal{I}^{-1}\right)$ is positive semidefinite.
Note that if $\log\mathcal{L}$ is twice differentiable in terms of $\bm\theta$, then the Fisher matrix can also be written in terms of second derivatives as
%
\begin{equation}\label{eq:general-fisher-matrix-second-derivatives}
\mathcal{I}_{jk} = \mathrm{E} \, \left[
-\frac{\partial^2 \log
\mathcal{L}(Y_i ; \bm\theta)}
{\partial \theta_j \partial \theta_k}
\middle| \bm\theta
\right].
\end{equation}
When (as in our assumptions) the likelihood is Gaussian,\footnote{This assumes that the merger occurs at a frequency outside the sensitive ``bucket'' of the detector's noise \ac{PSD}. There are additional terms if the \ac{GW} spectrum drops to zero within the sensitive bandwidth of the detector, as can be the case for \acl{NSBH} mergers; see Ref.~\cite{2014CQGra..31o5005M}.} Eq.~(\ref{eq:general-fisher-matrix}) simplifies to
%
\begin{equation}\label{eq:gaussian-fisher-matrix}
\mathcal{I}_{jk} = \int_0^\infty \Re \left[
\left(\frac{\partial X_i}{\partial \theta_j}\right)^*
\left(\frac{\partial X_i}{\partial \theta_k}\right)
\right] \frac{1}{S_i(\omega)} \, d\omega.
\end{equation}
%
This form is useful because it involves manipulating the signal $X_i (\omega)$ rather than the entire observation $Y (\omega)$. In terms of the $k$th \ac{SNR}-weighted moment of angular frequency,
%
\begin{equation}\label{eq:angular-frequency-moments}
{\overline{\omega^k}}_i =
\left[ \int_0^\infty \frac{|h (\omega)|^2}{S_i(\omega)} \omega^k \, d\omega \right]
\left[ \int_0^\infty \frac{|h (\omega)|^2}{S_i(\omega)} \, d\omega \right]^{-1},
\end{equation}
%
the Fisher matrix for the signal in the $i$th detector is
%
\begin{equation}
\label{eq:single-detector-block-fisher-matrix}
\mathcal{I}_i = \left(
\begin{array}{cc}
\mathcal{I}_{\bm\theta_i,\bm\theta_i} &
\mathcal{I}_{\bm\theta_i,\bm\theta_\mathrm{in}} \\
{\mathcal{I}_{\bm\theta_i,\bm\theta_\mathrm{in}}^{^\mathsf{T}}} &
{\rho_i}^2 \mathcal{I}_{\bm\theta_\mathrm{in},\bm\theta_\mathrm{in}}
\end{array}
\right).
\end{equation}
%
The top-left block describes only the extrinsic parameters, and is given by
%
\begin{equation}\label{eq:fisher-matrix}
\mathcal{I}_{\bm\theta_i,\bm\theta_i} = \bordermatrix{
~ & \rho_i & \gamma_i & \tau_i \cr
\rho_i & 1 & 0 & 0 \cr
\gamma_i & 0 & {\rho_i}^2 & -{\rho_i}^2 {\overline{\omega}}_i \cr
\tau_i & 0 & -{\rho_i}^2 {\overline{\omega}}_i & {\rho_i}^2 {\overline{\omega^2}}_i
}.
\end{equation}
%
(This is equivalent to an expression given in Ref.~\cite{Grover:2013}.) The bottom row and right column of Eq.~(\ref{eq:single-detector-block-fisher-matrix}) describe the intrinsic parameters and how they are coupled to the extrinsic parameters. We show in Appendix~\ref{sec:independence-intrinsic-extrinsic} that we need not consider the intrinsic parameters at all if we are concerned only with sky localization.
For our likelihood, the \ac{CRLB} implies that
%
\begin{equation}\label{eq:covariance-matrix}
\cov{
\left(
\begin{array}{c}
\tilde{\rho}_i \\
\tilde{\gamma}_i \\
\tilde{\tau}_i
\end{array}
\right)
} \geq \mathcal{I}^{-1} = \left(
\begin{array}{ccc}
1 & 0 & 0 \\
0 & {\rho_i}^2 {\overline{\omega^2}}_i/{\omega_{\mathrm{rms},i}}^2 & {\rho_i}^2 {\overline{\omega}}_i/{\omega_{\mathrm{rms},i}}^2 \\
0 & {\rho_i}^2 {\overline{\omega}}_i/{\omega_{\mathrm{rms},i}}^2 & {\rho_i}^2/{\omega_{\mathrm{rms},i}}^2
\end{array}
\right),
\end{equation}
%
where ${\omega_{\mathrm{rms},i}}^2 = {\overline{\omega^2}}_i - {{\overline{\omega}}_i}^2$. This structure implies that errors in the \ac{SNR} are uncorrelated with errors in time and phase and that there is a particular sum and difference of the times and phases that are measured independently (see Appendix~\ref{sec:interpretation-of-errors}).
Reading off the $\tau \tau$ element of the covariance matrix reproduces the timing accuracy in Eq.~(24) of Ref.~\cite{FairhurstTriangulation},
%
\begin{equation}\label{eq:timing-crlb}
\std \left(\hat{\tau}_i - \tau_i \right) \geq \sqrt{\left(\mathcal{I}^{-1}\right)_{\tau\tau}} = \frac{\rho_i}{\omega_{\mathrm{rms},i}}.
\end{equation}
%
\citet{FairhurstTriangulation} goes on to frame the characteristic position reconstruction accuracy of a \ac{GW} detector network in terms of time delay triangulation, with the above formula describing the time of arrival uncertainty for each detector. In Appendix~\ref{sec:position-resolution}, we show how to extend this formalism to include the phases and amplitudes on arrival as well.
\section{Bayesian probability and parameter estimation}
\label{sec:parameter-estimation}
In the Bayesian framework, parameters are inferred from the data by forming the posterior distribution, $p(\bm\theta|\mathbf y)$, which describes the probability of the parameters given the observations. Bayes's rule relates the likelihood $p(\mathbf y|\bm\theta)$ to the posterior $p(\bm\theta|\mathbf y)$,
%
\begin{equation}\label{bayes}
p(\bm\theta|\mathbf y) = \frac{p(\mathbf y|\bm\theta) p(\bm\theta)}{p(\mathbf y)},
\end{equation}
%
introducing the prior distribution $p(\bm\theta)$ which encapsulates previous information about the parameters (for example, arising from earlier observations or from known physical bounds on the parameters) and the evidence $p(\mathbf y)$ which can be thought of as a normalization factor or as describing the parsimoniousness of the model.
The choice of prior is determined by one's astrophysical assumptions. During \acl{S6} when LIGO's Bayesian \ac{CBC} parameter estimation pipelines were first deployed, the prior was taken to be isotropic in the sky location and binary orientation and uniform in volume, arrival time, and the component masses~\cite{S6PE}.
In Bayesian inference, although it is often easy to write down the likelihood or even the full posterior in closed form, usually one is interested in only a subset $\bm\beta$ of all of the model's parameters, the others $\bm\lambda$ being nuisance parameters. In this case, we integrate away the nuisance parameters, forming the marginal posterior
%
\begin{equation}\label{eq:marginal-posterior}
p(\bm\beta|\mathbf y) = \int \frac{p(\mathbf y|\bm\beta,\bm\lambda) p(\bm\beta,\bm\lambda)}{p(\mathbf y)} \,d\bm\lambda
\end{equation}
%
with $\bm\theta = (\bm\beta, \bm\lambda)$. For instance, for the purpose of locating a \ac{GW} source on the sky, all parameters but $(\alpha, \delta)$ are nuisance parameters.
% Bayesian parameter estimation has many advantages, including broad generality and the ability to make probabilistically meaningful statements even with very low \ac{SNR} measurements. However, in problems of even modest complexity, the marginalization step involves many-dimensional, ill-behaved integrals. The powerful \ac{MCMC} integration technique has become almost synonymous with Bayesian inference. Though powerful, \ac{MCMC} is inherently non-deterministic and resistant to parallelization, as well as (at least historically) slow. With the most sophisticated \ac{CBC} parameter estimation codes, it still takes days to process a single event. This delay is undesirable for planning targeted \ac{EM} follow\nobreakdashes-up searches of LIGO events.
\section{\acs{BAYESTAR} likelihood}
\label{sec:bayestar-likelihood}
For the purpose of rapid sky localization, we assume that we do not have access to the \ac{GW} data $\mathbf{Y}$ itself and that our only contact with it is through the \ac{ML} parameter estimates $\{\left\{ \hat\rho_i, \hat\gamma_i, \hat\tau_i \right\}_i, \hat{\bm\theta}_\mathrm{in}\}$. Although this is a significant departure from conventional \ac{GW} parameter estimation techniques, we can still apply the full Bayesian machinery of Eq.~(\ref{eq:marginal-posterior}) to compute a posterior distribution for the sky location.
The relevant likelihood is now the probability of the \ac{ML} estimates, conditioned upon the true parameter values, and marginalized over all possible \ac{GW} observations:
%
\begin{equation}\label{eq:detection-candidate-likelihood}
p\left(\{\hat{\bm\theta}_i\}_i,
\hat{\bm\theta}_\mathrm{in}
\middle| \bm\theta\right)
\propto \int\limits_{\mathclap{\mathbf{Y} | \mathrm{MLE}(\mathbf{Y}) =
\{\{\hat{\bm\theta}_i\}_i,
\hat{\bm\theta}_\mathrm{in}\}}}
p(\mathbf{Y} | \bm\theta) \, p(\bm\theta)
\, d\mathbf{Y}.
\end{equation}
%
Although we may not be able to evaluate this equation directly, with some educated guesses we can create a likelihood that has many properties in common with it. Any valid approximate likelihood must have the same Fisher matrix as shown in Eq.~(\ref{eq:single-detector-block-fisher-matrix}). It must also have the same limiting behavior: it should be periodic in the phase error $\tilde{\gamma}_i$ and go to zero as $\tilde{\tau}_i \rightarrow \pm \infty$, $\hat{\rho}_i \rightarrow 0$, or $\hat{\rho}_i \rightarrow \infty$. Additionally, when $\tilde{\tau}_i = 0$, the distribution of ${\hat{\rho}_i}^2$ should reduce to a noncentral $\chi^2$ distribution with 2 degrees of freedom, centered about ${\rho_i}^2$, because the complex matched-filter time series $z_i(t)$ is Gaussian (under the ideal assumption the \ac{GW} strain time series is Gaussian).
These conditions could be satisfied by realizing a multivariate Gaussian distribution with covariance matrix $\Sigma = \mathcal{I}^\intercal$ and then replacing individual quadratic terms in the exponent of the form $-\tilde{\theta}^2/2$ with $\cos{\tilde{\theta}}$.
A more natural way is to plug the signal model from Eq.~(\ref{eq:signal-model}) evaluated at the \ac{ML} parameter estimates into the single-detector likelihood in Eq.~(\ref{eq:gaussian-likelihood-spa}):
%
\begin{multline}
p\left(\hat{\bm\theta}_i \middle| \bm\theta \right)
:=
p\left(Y_i(\omega) = X_i(\omega; \hat{\bm\theta})
\middle| \bm\theta \right)
\\
\propto \exp \Bigg[
- \frac{1}{2} \int_0^\infty \Bigg|
\frac{\hat{\rho}_i}{\sigma_i(\hat{\bm\theta}_\mathrm{in})} e^{i (\hat\gamma_i - \omega \hat\tau_i)} \frac{H(\omega; \hat{\bm\theta}_\mathrm{in})}{S_i(\omega)}
\\
- \frac{\rho_i}{\sigma_i(\bm\theta_\mathrm{in})} e^{i (\gamma_i - \omega \tau_i)} \frac{H(\omega; \bm\theta_\mathrm{in})}{S_i(\omega)}
\Bigg|^2 \, d\omega
\Bigg].
\end{multline}
%
If we further assume that the intrinsic parameters are equal to their \ac{ML} estimates, $\bm\theta_\mathrm{in} = \hat{\bm\theta}_\mathrm{in}$, then this reduces to what we call the autocorrelation likelihood,
%
\begin{multline}\label{eq:autocor-likelihood}
p\left(\hat\rho_i, \hat\gamma_i, \hat\tau_i
\middle| \rho_i, \gamma_i, \tau_i \right)
\\
\propto
\exp \left[ - \frac{1}{2}{\hat\rho_i}^2 - \frac{1}{2}{\rho_i}^2
+ \hat\rho_i \rho_i \Re \left\{ e^{i \tilde{\gamma}_i} a_i^*(\tilde{\tau}_i)
\right\}
\right],
\end{multline}
%
with $\tilde{\gamma}_i = \hat\gamma_i - \gamma_i$, $\tilde{\tau}_i = \hat\tau_i - \tau_i$, and the template's autocorrelation function $a_i(t; \bm\theta_\mathrm{in})$ defined as
%
\begin{equation}\label{eq:autocorrelation-function}
a_i(t; \bm\theta_\mathrm{in}) := \frac{1}{{{\sigma_i}^2(\bm\theta_\mathrm{in})}} \int_0^\infty \frac{\left| H(\omega; \hat{\bm\theta}_\mathrm{in})\right|^2}{S_i(\omega)} e^{i \omega t} \,d\omega.
\end{equation}
%
Some example autocorrelation functions and corresponding likelihoods are shown in Fig.~\ref{fig:autocorr-likelihood}. To assemble the joint likelihood for the whole network, we form the product of the autocorrelation likelihoods from the individual detectors:
%
\begin{multline}
p\left(\left\{\hat\rho_i, \hat\gamma_i, \hat\tau_i\right\}_i
\middle| \left\{\rho_i, \gamma_i, \tau_i\right\}_i \right) \\
\propto
\exp \left[ - \frac{1}{2} \sum_i {\hat\rho_i}^2 - \frac{1}{2} \sum_i {\rho_i}^2
+ \sum_i \hat\rho_i \rho_i \Re \left\{ e^{i \tilde{\gamma}_i} a^*(\tilde{\tau}_i)
\right\}
\right].
\end{multline}
In the following section, we discuss some key properties of the autocorrelation likelihood.
\begin{figure*}
\includegraphics[width=\textwidth]{autocorr-likelihood}
\caption{\label{fig:autocorr-likelihood}The autocorrelation likelihood for a $(1.4, 1.4)$\,$M_\odot$ binary as observed by four detector configurations: from top to bottom, the final sensitivity achieved by the \acs{LIGO} Hanford, \acs{LIGO} Livingston, and Virgo detectors in their ``initial'' configuration and the final Advanced \acs{LIGO} design sensitivity. The left panels show the noise amplitude spectral densities. The middle panels show the absolute value of the autocorrelation function. The right panel shows the phase-marginalized autocorrelation likelihood for \ac{SNR}=1,~2,~4,~and~8. In the right panel, the time scale is normalized by the \ac{SNR} so that one can see that as the \ac{SNR} increases, a central parabola is approached (the logarithm of a Gaussian distribution with standard deviation given by the Fisher matrix).}
\end{figure*}
\subsection{Properties}
\label{sec:properties}
First, the autocorrelation likelihood has the elegant feature that if we were to replace the autocorrelation function with the \ac{SNR} time series for the best-matching template, $z(\tau; \hat{\mathbf{\theta}}_\mathrm{in})$, we would recover the likelihood for the full \ac{GW} time series, evaluated at the \ac{ML} estimate of the intrinsic parameters, viz.:
%
\begin{equation}\label{eq:extrinsic-only-likelihood}
% p\left(z_i(\hat{\tau}, \hat{\bm\theta}_\mathrm{in})
% \middle| \rho_i, \gamma_i, \tau_i \right) \propto \\
\exp \left[ - \frac{1}{2} \sum_i {\rho_i}^2
+ \sum_i \rho_i \Re \left\{ e^{-i \gamma_i} z_i^*(\tau_i)
\right\}
\right].
\end{equation}
%
[We have omitted the term $\int |Y_i(\omega)|^2/S(\omega)d\omega$, which takes the place of the earlier ${\hat{\rho}_i}^2$ term and is only important for normalization.] The numerical scheme that we will develop is thus equally applicable for rapid, coincidence-based localization, or as a fast extrinsic marginalization step for the full parameter estimation.
Second, observe that at the true parameter values, $\hat{\bm\theta}_i = \bm\theta_i$, the logarithms of Eqs.~(\ref{eq:autocor-likelihood}) and~(\ref{eq:gaussian-likelihood-spa}) have the same Jacobian. This is because the derivatives of the autocorrelation function are
%
\begin{equation*}
a^{(n)}(t) = i^n \overline{\omega^n},
\end{equation*}
%
with $\overline{\omega^n}$ defined in Eq.~(\ref{eq:angular-frequency-moments}). For example, the first few derivatives are
%
\begin{equation*}
a(0) = 1,
\qquad
\dot{a}(0) = i \overline{\omega},
\qquad
\ddot{a}(0) = -\overline{\omega^2}.
\end{equation*}
Using Eq.~(\ref{eq:general-fisher-matrix-second-derivatives}), we can compute the Fisher matrix elements for the autocorrelation likelihood given by Eq.~(\ref{eq:autocor-likelihood}), with the detector subscript suppressed,
%
\begin{align}
\nonumber
\mathcal{I}_{\rho\rho} &= 1, \\
\nonumber
\mathcal{I}_{\rho\gamma} &= 0, \\
\nonumber
\mathcal{I}_{\rho\tau} &= 0, \\
\label{eq:fisher-autocor-gamma-gamma}
\mathcal{I}_{\gamma\gamma} &= \rho^2
\int_{-T}^T \left|a(t)\right|^2 w(t; \rho) dt, \\
\label{eq:fisher-autocor-tau-tau}
\mathcal{I}_{\tau\tau} &= -\rho^2
\int_{-T}^T \Re\left[a^*(t) \ddot{a}(t)\right] w(t; \rho) dt, \\
\label{eq:fisher-autocor-gamma-tau}
\mathcal{I}_{\gamma\tau} &= -\rho^2
\int_{-T}^T \Im\left[a^*(t) \dot{a}(t)\right] w(t; \rho) dt,
\intertext{where}\label{eq:fisher-weight-factor}
w(t; \rho) &= \frac{
\displaystyle
\exp\left[\frac{\rho^2}{4}\left|a(t)\right|^2\right]
\left(
I_0\left[\frac{\rho^2}{4}\left|a(t)\right|^2\right] +
I_1\left[\frac{\rho^2}{4}\left|a(t)\right|^2\right]
\right)
}{
\displaystyle
2 \int_{-T}^T
\exp\left[\frac{\rho^2}{4}\left|a(t')\right|^2\right]
I_0\left[\frac{\rho^2}{4}\left|a(t')\right|^2\right]
dt'
}.
\end{align}
%
The notation $I_k$ denotes a modified Bessel function of the first kind. Matrix elements that are not listed have values that are implied by the symmetry of the Fisher matrix. Note that the minus signs are correct but a little confusing; despite them, $\mathcal{I}_{\gamma\gamma}, \mathcal{I}_{\tau\tau} \geq 0$ and $\mathcal{I}_{\gamma\tau} \leq 0$. The time integration limits $[-T, T]$ correspond to a flat prior on arrival time or a time coincidence window between detectors.
\begin{figure*}
\centering
\includegraphics[width=\columnwidth]{crlb_tau}
\includegraphics[width=\columnwidth]{crlb_gamma}
\caption[\acl{CRLB} on time and phase accuracy]{\label{fig:crlb-tau}\ac{CRLB} on \acl{RMS} timing uncertainty and phase error, using the likelihood for the full \ac{GW} data [Eq.~(\ref{eq:gaussian-likelihood-spa}); dashed diagonal line] or the autocorrelation likelihood [Eq.~(\ref{eq:autocor-likelihood}); solid lines] with a selection of arrival time priors.}
\end{figure*}
We can show that the weighting function $w(t; \rho)$ approaches a Dirac delta function as $\rho \rightarrow \infty$, so that the Fisher matrix for the autocorrelation likelihood approaches the Fisher matrix for the full \ac{GW} data, Eq.~(\ref{eq:fisher-matrix}), as $\rho \rightarrow \infty$. The Bessel functions asymptotically approach
%
\begin{equation*}
I_0(x), I_1(x) \rightarrow \frac{e^x}{\sqrt{2 \pi x}}
\textrm{ as } x \rightarrow \infty.
\end{equation*}
%
For large $\rho$, the exponents of $e^{\rho^2}$ dominate Eq.~(\ref{eq:fisher-weight-factor}), and we can write
%
\begin{equation*}
w(t; \rho) \rightarrow \frac{
\displaystyle
\exp\left[\frac{\rho^2}{2}|a(t)|^2\right]
}{
\displaystyle
\int_{-T}^T \exp\left[\frac{\rho^2}{2}|a(t')|^2\right] dt'
}
\textrm{ as } \rho \rightarrow \infty.
\end{equation*}
%
The Taylor expansion of $|a(t)|^2$ is
%
\begin{align}
\nonumber
|a(t)|^2 &= 1 + \frac{1}{2} \left(\frac{\partial^2}{\partial t^2}|a(t)|^2 \Bigg|_{t=0}\right) t^2 + \mathcal{O}(t^4) \\
\nonumber
&= 1 - {\omega_\mathrm{rms}}^2 t^2 + \mathcal{O}(t^4).
\end{align}
%
Substituting, we find that $w(t; \rho)$ approaches a normalized Gaussian distribution:
%
\begin{equation*}
w(t; \rho) \approx \frac{
\displaystyle
\exp\left[-\frac{1}{2} \rho^2 {\omega_\mathrm{rms}}^2 t^2\right]
}{
\displaystyle
\int_{-T}^T \exp\left[-\frac{1}{2} \rho^2 {\omega_\mathrm{rms}}^2 (t')^2\right] dt'
}.
\end{equation*}
%
And finally, because the Dirac delta function may be defined as the limit of a Gaussian, $w(t; \rho) \rightarrow \delta(t)$ as $\rho \rightarrow \infty$.
We can now write the Fisher matrix for the autocorrelation likelihood in a way that makes a comparison to the full signal model explicit. Define
%
\begin{align*}
\mathcal{I}_{\gamma\gamma} &= \rho^2 \cdot \textproto{D}_{\gamma\gamma}(\rho), \\
\mathcal{I}_{\tau\tau} &= \rho^2 \overline{\omega^2} \cdot \textproto{D}_{\tau\tau}(\rho), \\
\mathcal{I}_{\gamma\tau} &= -\rho^2 \overline{\omega} \cdot \textproto{D}_{\gamma\tau}(\rho).
\end{align*}
%
Now, the $\textproto{D}_{ij}$%
%
\footnote{The Fish(er) factor.}
%
contain the integrals from Eqs.~(\ref{eq:fisher-autocor-gamma-gamma},~\ref{eq:fisher-autocor-tau-tau},~\ref{eq:fisher-autocor-gamma-tau}) and encode the departure of the autocorrelation likelihood from the likelihood of the full data at a low \ac{SNR}. All of the $\textproto{D}_{ij}(\rho)$ are sigmoid-type functions that asymptotically approach 1 as $\rho \rightarrow \infty$ (see Figs.~\ref{fig:crlb-tau} and~\ref{fig:fishfactor}). The transition \ac{SNR} $\rho_\mathrm{crit}$ is largely the same for all three nontrivial matrix elements, and is determined by the time coincidence window $T$ and the signal bandwidth $\omega_\mathrm{rms}$.
\begin{figure}
\centering
\includegraphics{fishfactor}
\caption[Ratio of Fisher matrix elements between autocorrelation likelihood and full \acs{GW} data]{\label{fig:fishfactor}Ratio between Fisher matrix elements (solid: $\textproto{D}_{\gamma\gamma}$, dashed: $\textproto{D}_{\gamma\tau}$, dotted: $\textproto{D}_{\tau\tau}$) for the autocorrelation likelihood and the full \ac{GW} data. Colors correspond to different arrival time priors as in Fig.~\ref{fig:crlb-tau}.}
\end{figure}
In the limit of large \ac{SNR}, our interpretation is that the point estimates $(\hat\rho, \hat\gamma, \hat\tau)$ contain all of the information about the underlying extrinsic parameters.
On the other hand, in the low \ac{SNR} limit, the diminishing value of $\textproto{D}_{ij}(\rho)$ reflects the fact that some information is lost when the full data $\mathbf{x}$ are discarded. Concretely, as the prior interval $T$ becomes large compared to 1/$\rho\omega_\mathrm{rms}$, the \ac{ML} estimator becomes more and more prone to picking up spurious noise fluctuations far from the true signal. Clearly, when the coincidence window $T$ is kept as small as possible, more information is retained in the \ac{ML} point estimates. Put another way, if $T$ is small, then the transition \ac{SNR} $\rho_\mathrm{crit}$ is also small and fainter signals become useful for parameter estimation. In this way, the \ac{BAYESTAR} likelihood exhibits the \emph{threshold effect} that is well known in communication and radar applications \citep{barankin1949locally,mcaulay1969barankin,mcaulay1971barankin}.
In the following sections, we describe our prior and our numerical schemes to integrate over nuisance parameters, which together amount to the \ac{BAYESTAR} algorithm.
\section{Prior and problem setup}
\label{sec:prior}
The detection pipeline supplies a candidate, $\{\left\{ \hat\rho_i, \hat\gamma_i, \hat\tau_i \right\}_i, \hat{\bm\theta}_\mathrm{in}\}$, and discretely sampled noise \acp{PSD}, $S_i(\omega_j)$, for all detectors. We compute the \ac{GW} signal for a source with intrinsic parameters equal to the detection pipeline's estimate, $H(\omega; \hat{\bm\theta}_\mathrm{in})$.
Then, we find the \ac{SNR}=1 horizon distance $r_{1,i}$ for each detector by numerically integrating Eq.~(\ref{eq:horizon}).
We have no explicit prior on the intrinsic parameters; in our analysis they are fixed at their \ac{ML} estimates, $\hat{\bm\theta}_\mathrm{in}$.\footnote{As noted in footnote \ref{footnote:template-banks}, the detection template bank is typically designed to uniformly sample the Jeffreys prior on the intrinsic parameters. Due to the equivalence of marginalization and maximization with respect to a parameter under a Gaussian distribution, fixing the intrinsic parameters at their \ac{ML} estimates is roughly equivalent to selecting the Jeffreys prior.}
The arrival time prior is connected to the origin of the detector coordinate system. Given the Earth\nobreakdashes-fixed coordinates of the detectors $\mathbf{n}_i$ and the arrival times $\tau_i$, we compute their averages weighted by the timing uncertainty formula:
%
\begin{equation*}
\langle \mathbf{n} \rangle = \frac{
\displaystyle
\sum_i \frac{\mathbf{n}_i}
{\left(\hat\rho_i \omega_{\mathrm{rms},i}\right)^2}
}{
\displaystyle
\sum_i \frac{1}{\left(\hat\rho_i \omega_{\mathrm{rms},i}\right)^2}
},
\qquad
\langle \hat\tau \rangle = \frac{
\displaystyle
\sum_i \frac{\hat\tau_i}
{\left(\hat\rho_i \omega_{\mathrm{rms},i}\right)^2}
}{
\displaystyle
\sum_i \frac{1}{\left(\hat\rho_i \omega_{\mathrm{rms},i}\right)^2}
}.
\end{equation*}
%
Then, we subtract these means:
%
\begin{equation*}
\mathbf{n}_i \leftarrow \mathbf{n}_i - \langle \mathbf{n} \rangle,
\qquad
\hat\tau_i \leftarrow \hat\tau_i - \langle \hat\tau \rangle.
\end{equation*}
%
In these coordinates, now relative to the weighted detector array barycenter, the arrival time prior is uniform in $-T \leq t \leq T$, with $T = \max\limits_i |\mathbf{n}_i| / c + 5~\textrm{ms}$.
The distance prior is a user-selected power of distance,
%
\begin{equation*}
p(r) \propto \begin{cases}
r^m & \text{if } r_\mathrm{min} < r < r_\mathrm{max} \\
0 & \text{otherwise},
\end{cases}
\end{equation*}
%
where $m=2$ for a prior that is uniform in volume and $m=-1$ for a prior that is uniform in the logarithm of the distance. If a distance prior is not specified, the default is uniform in volume out to the maximum \ac{SNR}=4 horizon distance:
%
\begin{equation*}
m = 2,
\qquad
r_\mathrm{min} = 0,
\qquad
r_\mathrm{max} = \frac{1}{4} \max_i r_{1,i}.
\end{equation*}
Finally, the prior is uniform in $-1 \leq \cos\iota \leq 1$ and $0 \leq \psi < \pi$.
We compute the autocorrelation function for each detector from $t = 0$ to $t = T$ at intervals of $\Delta t = 1/f_s$, where $f_s$ is the smallest power of 2 that is greater than or equal to the Nyquist rate. Because BNS signals typically terminate at about 1500~Hz, a typical value for $\Delta t$ is $(4096\,\textrm{Hz})^{-1}$. We use a pruned \ac{FFT} because for BNS systems, the \ac{GW} signal remains in LIGO's sensitive band for $\sim$100\nobreakdashes--1000~s, whereas $T \sim 10$~ms.\footnote{See \url{http://www.fftw.org/pruned.html} for a discussion of methods for computing the pruned \ac{FFT}, the first $K$ samples of an \ac{FFT} of length $N$.}
\section{Marginal posterior}
\label{sec:marginal-posterior}
The marginal posterior as a function of the sky location is
%
\begin{multline}
f(\alpha, \delta) \propto
\int_{0}^{\pi}
\int_{-1}^{1}
\int_{-T}^{T}
\int_{r_\mathrm{min}}^{r_\mathrm{max}}
\int_{0}^{2\pi}
\\
\exp \left[ - \frac{1}{2} \sum_i {\rho_i}^2
+ \sum_i \hat\rho_i \rho_i \Re \left\{ e^{i \tilde{\gamma}_i}
a^*(\tilde{\tau}_i)
\right\}
\right] \\
r^m d\phi_c \, dr \, dt_\oplus \, d\cos{\iota} \, d\psi.
\end{multline}
To marginalize over the coalescence phase, we can write $\tilde{\gamma}_i = \tilde{\gamma}_i^\prime + 2\phi_c$. Then, integrating over $\phi_c$ and suppressing normalization factors, we get
%
\begin{multline}
f(\alpha, \delta) \rightarrow
\int_{0}^{\pi}
\int_{-1}^{1}
\int_{-T}^{T}
\int_{r_\mathrm{min}}^{r_\mathrm{max}}
\\
\exp \left[ - \frac{1}{2} \sum_i {\rho_i}^2 \right] I_0 \left[
\left| \sum_i \hat\rho_i \rho_i e^{i \tilde{\gamma}_i} a_i^*(\tilde{\tau}_i)
\right|
\right] \\
r^m dr \, dt_\oplus \, d\cos{\iota} \, d\psi.
\end{multline}
%
In the above equation, we need not distinguish between $\tilde{\gamma}_i$ and $\tilde{\gamma}_i^\prime$ because the likelihood is now invariant under arbitrary phase shifts of all of the detectors' signals.
\subsection{Integral over angles and time}
The integrand is periodic in $\psi$, so simple Newton\nobreakdashes--Cotes quadrature over $\psi$ exhibits extremely rapid convergence (see Fig.~\ref{fig:angle-convergence}). We therefore sample the posterior on a regular grid of ten points from 0 to $\pi$.
The integral over $\cos\iota$ converges just as rapidly with Gauss\nobreakdashes--Legendre quadrature (see Fig.~\ref{fig:angle-convergence}), so we use a ten\nobreakdashes-point Gauss\nobreakdashes--Legendre rule for integration over $\cos\iota$.
\begin{figure*}
\begin{minipage}[t]{0.5\textwidth}
\centering
(a) Polarization angle \\
\includegraphics{polarization_angle_integral_convergence}
\end{minipage}%
\begin{minipage}[t]{0.5\textwidth}
\centering
(b) Inclination angle \\
\includegraphics{inclination_integral_convergence}
\end{minipage}
\caption{\label{fig:angle-convergence}Relative error in the \ac{BAYESTAR} integration scheme as a function of the number of Gaussian quadrature nodes. The two panels describe (a) the integral over the polarization angle $\psi$ and (b) the integral over the inclination angle $\iota$.}
\end{figure*}
We sample $t_\oplus$ regularly from $-T$ to $T$ at intervals of $\Delta t$. This is typically $\sim 2 (10\,\mathrm{ms})(4096\,\mathrm{Hz}) \approx 80$~samples. We use Catmull\nobreakdashes--Rom cubic splines to interpolate the real and imaginary parts of the autocorrelation functions between samples.
\subsection{Integral over distance}
\label{sec:distance}
The distance integral is now performed differently from what we initially described in Refs.~\cite{leo-singer-thesis,BerryLocalization}; the method described in the present work is about an order of magnitude faster. We define $\rho_i = \omega_i / r$ in order to absorb all of the distance-independent terms in the amplitudes into $\omega_i$ and then define
%
\begin{align}
p^2 &= \frac{1}{2} \sum_i {\omega_i}^2 \label{eq:likelihood-p-factor} \\
b &= \left| \sum_i \hat\rho_i \omega_i e^{i \tilde{\gamma}_i}
a_i^*(\tilde{\tau}_i) \right|. \label{eq:likelihood-b-factor} \end{align}
%
The innermost integral over distance $r$ may then be written as
%
\begin{align}\label{eq:distance-integral}
\mathscr{F} &=
\int_{\mathrlap{r_\mathrm{min}}}^{\mathrlap{r_\mathrm{max}}}
\exp\left[-\frac{p^2}{r^2}\right]
I_0\left[\frac{b}{r}\right] r^m dr \nonumber \\
&=
\int_{\mathrlap{r_\mathrm{min}}}^{\mathrlap{r_\mathrm{max}}}
\exp\left[-\frac{p^2}{r^2}+\frac{b}{r}\right]
\overline{I}_0\left[\frac{b}{r}\right] r^m dr
\end{align}
%
or, completing the square,
%
\begin{align}
\mathscr{F} &= \exp\left[\frac{p^2}{{r_0}^2}\right]
\int_{\mathrlap{r_\mathrm{min}}}^{\mathrlap{r_\mathrm{max}}}
\exp\left[-\left(\frac{p}{r} - \frac{p}{r_0}\right)^2\right]
\overline{I}_0\left[\frac{2p^2}{r r_0}\right] r^m dr \\
&= \exp\left[\frac{p^2}{{r_0}^2}\right] \mathscr{G},
\end{align}
%
where
%
\begin{align}
r_0 &= 2 p^2 / b \\
\overline{I}_0(x) &= \exp(-|x|) I_0(x).
\end{align}
%
The coefficients $p^2$ and $b$ are non\nobreakdashes-negative and independent of distance. $p$ has a maximum value of
%
\begin{equation}
p_\mathrm{max} = \sqrt{\frac{1}{2} \sum_i \left(\frac{r_{1,i}}{r_\mathrm{max}}\right)^2}.
\end{equation}
%
The symbol $\overline{I}_0$ denotes an exponentially scaled Bessel function. In the limit of large argument, $I_0(|x|) \sim \exp(|x|) / \sqrt{2 \pi |x|}$ \citep{Olver:2010:NHMF,NIST:DLMF}\footnote{\url{http://dlmf.nist.gov/10.40.E1}}. The scaled Bessel function is useful for evaluation on a computer because it has a relatively small range $(0, 1]$ and varies slowly in proportion to $x^{1/2}$.
\subsubsection{Parameter grid}
\begin{figure}
\includegraphics{radial_integral_interpolant}
\caption{\label{fig:radial-integral-interpolant}Partition of the parameter space of the distance integral into three regions for (bi)cubic interpolation.}
\end{figure}
This integral is not particularly amenable to low-order Gaussian quadrature. However, luckily $\mathscr{G}$ is a very well-behaved function of $p$ and $r_0$, so we evaluate it using a lookup table and bicubic interpolation. The lookup table is produced in logarithmic coordinates
%
\begin{equation}
x = \log p, \qquad y = \log r_0.
\end{equation}
%
As shown in Fig.~\ref{fig:radial-integral-interpolant}, the function basically consists of a plateau region in the upper-left half of the plane delimited by the lines $y = x$ and $x = \log p_0$, with
%
\begin{equation}
p_0 = \frac{1}{2}
\begin{cases}
r_\mathrm{max} & \text{if } m \geq 0 \\
r_\mathrm{min} & \text{if } m < 0.
\end{cases}
\end{equation}
%
We tabulate $\mathscr{G}$ on a $400 \times 400$ regular grid spanning the range
%
\begin{align}
x_0 &= \log \min(p_0, p_\mathrm{max}) \\
x_\mathrm{min} &= x_0 - (1 + \sqrt{2}) \alpha \\
x_\mathrm{max} &= \log p_\mathrm{max} \\
y_\mathrm{min} &= 2 x_0 - \sqrt{2} \alpha - x_\mathrm{max} \\
y_\mathrm{max} &= x_0 + \alpha
\end{align}
%
where $\alpha = 4$ is a constant parameter that determines the extent of the grid.
\subsubsection{Lookup table construction}
The lookup table for $\mathscr{G}$ is populated as follows. If we neglect both the Bessel function and the $r^m$ prior, then the approximate likelihood $\exp(-(p/r-p/r_0)^2)$ is maximized when $r = r_0$. The likelihood takes on a factor $\eta$ (say, $\eta=0.01$) of its maximum value when
%
\begin{equation}
r = r_\pm = \left(\frac{1}{r_0} \mp \frac{\sqrt{-\log\eta}}{p}\right)^{-1}.
\end{equation}
We have now identified up to five breakpoints that partition the distance integrand into up to four intervals with quantitatively distinct behavior. These intervals are depicted in Fig.~\ref{fig:radial_integrand} with the distance increasing from left to right. There is a left\nobreakdashes-hand or small distance tail in which the integrand is small and monotonically increasing, a left\nobreakdashes- and right\nobreakdashes-hand side of the maximum likelihood peak, and a right\nobreakdashes-hand tail in which the integrand is small and monotonically decreasing. These breakpoints are
%
\begin{equation}
r_\mathrm{break} = \{ r \in
\left\{
\begin{array}{c}
r_\mathrm{min} \\
r_- \\
r_0 \\
r_+ \\
r_\mathrm{max}
\end{array}
\right\} :
r_\mathrm{min} \leq r
\leq r_\mathrm{max}\}.
\end{equation}
We use these breakpoints as initial subdivisions in an adaptive Gaussian quadrature algorithm\footnote{For instance, \acl{GSL}'s \texttt{gsl\_integrate\_qagp} function, \url{http://www.gnu.org/software/gsl/manual/html_node/QAGP-adaptive-integration-with-known-singular-points.html}.}. This function estimates the integral over each subdivision and each interval's contribution to the total error, then subdivides the interval that contributes the most to the error. Subdivisions continue until a fixed total fractional error is reached. In this way, most integrand evaluations are expended on the most important distance interval, whether that happens to be the tails (when the posterior is dominated by the prior) or the peak (when the posterior is dominated by the observations).
\begin{figure}
\begin{center}
\includegraphics[width=\columnwidth]{radial_integrand}
\end{center}
\caption{\label{fig:radial_integrand}Illustration of initial subdivisions for the distance integration scheme. The distance increases from left to right. In the color version, the left\nobreakdashes-hand tail, the left\nobreakdashes- and right\nobreakdashes-hand sides of the maximum likelihood peak, and the right\nobreakdashes-hand tail, are colored cyan, red, green, and blue, respectively.}
\end{figure}
\subsubsection{Interpolation}
The interpolant is evaluated slightly differently depending on which of the three regions marked I, II, and III in Fig.~\ref{fig:radial-integral-interpolant} contains the point of interest. In region I, we use bicubic interpolation of $\log \mathcal{G}$ in $x$ and $y$. In region II, we use univariate cubic interpolation of $\log \mathcal{G}$ in $x$, with the sample points taken from the horizontal boundary between regions I and II. In region III, we use univariate cubic interpolation of $\log \mathcal{G}$ in $u = (x - y) / 2$, with the sample points taken from the downward diagonal boundary between regions I and III. Finally, the distance integral $\mathscr{F}$ is obtained by multiplying the interpolated value of $\mathscr{G}$ by $\exp\left(p^2/{r_0}^2\right)$. For a $400\times400$ grid, the entire lookup table scheme is accurate to a relative error of about $10^{-5}$ in $\mathscr{F}$ (see Fig.~\ref{fig:distance_interpolant_convergence}).
\begin{figure}
\begin{center}
\includegraphics[width=\columnwidth]{distance_interpolant_convergence}
\end{center}
\caption{\label{fig:distance_interpolant_convergence}Relative error in the \ac{BAYESTAR} distance integral interpolation scheme as a function of the size of the grid.}
\end{figure}
\section{Adaptive HEALPix sampling}
\label{sec:adaptive-sampling}
We have explained how we evaluate the marginal posterior at a given sky location. Now we must specify where we choose to evaluate it.
Our sampling of the sky relies completely on \ac{HEALPix}~\cite{healpix}, a special data structure designed for all\nobreakdashes-sky maps. \ac{HEALPix} divides the sky into equal\nobreakdashes-area pixels. There is a hierarchy of \ac{HEALPix} resolutions. A \ac{HEALPix} resolution may be designated by its order $N$. The $N=0$th order or base tiling has 12 pixels. At every successive order, each tile is subdivided into four new tiles. A resolution may also be referred to by the number of subdivisions along each side of the base tiles, $N_\mathrm{side} = 2^N$. There are $N_\mathrm{pix} = 12 {N_\mathrm{side}}^2$ pixels at any given resolution. The \ac{HEALPix} projection uniquely specifies the coordinates of the center of each pixel by providing a mapping from the resolution and pixel index $(N_\mathrm{side}, i_\mathrm{pix})$ to right ascension and declination $(\alpha, \delta)$.
The \ac{BAYESTAR} adaptive sampling process works as follows. We begin by evaluating the posterior probability density at the center of each of the $N_{\mathrm{pix},0} = 3072$ pixels of an $N_{\mathrm{side},0}=16$ \ac{HEALPix} grid. At this resolution, each pixel has an area of 13.4~deg$^2$. We then rank the pixels by contained probability (assuming constant probability density within a pixel) and subdivide the most probable $N_{\mathrm{pix},0}/4$ pixels into $N_{\mathrm{pix},0}$ new daughter pixels. We then evaluate the posterior again at the centers of the new daughter pixels, sort again, and repeat seven times. By the end of the last iteration, we have evaluated the posterior probability density a total of $8 N_{\mathrm{pix},0}$ times. On \emph{most} subdivision steps, we descend one level deeper in \ac{HEALPix} resolution. This process is illustrated in Fig.~\ref{fig:adaptive-illustration}.
\begin{figure*}
\includegraphics[width=\textwidth]{illustration}
\caption{\label{fig:adaptive-illustration}Illustration of the \ac{BAYESTAR} adaptive HEALPix sampling scheme.}
\end{figure*}
The resulting map is a tree structure that describes a mesh of pixels with different resolutions. An example \ac{BAYESTAR} subdivision is shown in Fig.~\ref{fig:adaptive-mesh}. To convert this mesh into a \ac{FITS}~\cite{FITS} image, we traverse the tree and flatten it into the highest resolution represented. The highest possible resolution is $N_\mathrm{side}=2^{11}$, with an area of $\approx 10^{-3}$~deg$^2$ per pixel.\footnote{Although the resulting sky map contains $N_\mathrm{pix} \approx 5\times10^6$ pixels, at most $\approx 2\times10^4$ pixels have distinct values. For the purpose of delivery to observers, therefore, the output is always \texttt{gzip}\nobreakdashes-compressed with a ratio of $\approx 250:1$.}
\begin{figure}
\includegraphics[width=\columnwidth]{adaptive_mesh}
\caption{\label{fig:adaptive-mesh}An example multiresolution \ac{HEALPix} mesh arising from the \ac{BAYESTAR} sampling scheme (plotted in a cylindrical projection). This is event 18951 from Ref.~\cite{FirstTwoYears}.}
\end{figure}
\section{Parallelization}
\ac{MCMC} and similar stochastic schemes are typically very resistant to parallelization. However, \ac{BAYESTAR} is completely deterministic and easily parallelizable because each pixel can be evaluated independently from all of the others. \ac{BAYESTAR} consists of nine computationally intensive loops: the generation of the distance integral lookup table and the eight loops over pixels in the adaptive HEALPix sampling step. The iterations of each loop are distributed across multiple cores using OpenMP\footnote{\url{http://openmp.org/}.} compiler directives. In Sec.~\ref{sec:run-time}, we will show that \ac{BAYESTAR}'s run time is almost perfectly proportional to the number of cores, demonstrating that the serial sections (the sorts between the adaptation steps) are a negligible contribution to the overall wall clock time.
\section{Case study}
\label{sec:case-study}
We have completed our description of the \ac{BAYESTAR} algorithm. In Ref.~\cite{FirstTwoYears}, the authors presented a comprehensive and astrophysically realistic sample of simulated \ac{BNS} mergers. We focused on the first two planned Advanced LIGO and Virgo observing runs as desribed in Ref.~\cite{LIGOObservingScenarios}. That work presented a catalog of 500 sky localizations from \ac{BAYESTAR} and LALINFERENCE and dealt with the quantitative position reconstruction accuracy as well as the qualitative sky morphologies. In the present work, we will use the same data set but instead focus on demonstrating the correctness and performance of the \ac{BAYESTAR} algorithm.
\subsection{Observing scenarios}
To review the assumptions made in Ref.~\cite{FirstTwoYears}, the two scenarios are:
\textit{2015}.---The first Advanced \acs{LIGO} observing run, or ``O1,'' scheduled to start in September 2015 and continue for three months. There are only two detectors participating in this run: \acs{LIGO} Hanford (H) and \acs{LIGO} Livingston (L). Both detectors are expected to operate with a direction\nobreakdashes-averaged \ac{BNS} merger range of 40\nobreakdashes--80\,Mpc (though ongoing Advanced \acs{LIGO} commissioning suggests that the higher end of this range will be achieved). As a result of having only two detectors, most localizations are long, thin arcs a few degrees wide and tens to hundreds of degrees long. The median 90\% credible area is about 600\,deg$^2$.
\textit{2016}.---The second observing run, ``O2,'' with the two between Advanced \acs{LIGO} detectors, upgraded to a \ac{BNS} range of 80\nobreakdashes--120\,Mpc, operated jointly with the newly commissioned Advanced Virgo detector (V), operating at a range of 20\nobreakdashes--60\,Mpc. The run is envisioned as lasting for six months in 2016\nobreakdashes--2017. The detectors are assumed to have random and independent 80\% duty cycles. Consequently, all three detectors (HLV) are in science mode about half of the time, with the remaining time divided roughly equally between each of the possible pairs (HL, HV, or LV) and one or fewer detectors (at least two \ac{GW} facilities are required for a detection). Virgo's range is assumed to be somewhat less than \acs{LIGO}'s because its commissioning time table is about a year behind. Although the simulated signals are generally too weak in Virgo to trigger the matched\nobreakdashes-filter pipeline and contribute to \emph{detection}, even these subthreshold signals aid in position reconstruction with LALINFERENCE by lifting degeneracies. As a result, the median 90\% credible area decreases to about 200\,deg$^2$.
All simulated sources have component masses distributed uniformly between 1.2 and 1.6\,$M_\odot$ and randomly oriented spins with dimensionless magnitudes $\chi = c|\mathbf{S}|/Gm^2$ between -0.05 and +0.05. Sky positions and binary orientations are random and isotropic. Distances are drawn uniformly from ${D_\mathrm{L}}^3$, reflecting a uniform source population (neglecting cosmological effects, which are small within the Advanced \acs{LIGO} \ac{BNS} range).
\subsection{Detection and localization}
The simulated waveforms were deposited in Gaussian noise that has been filtered to have the \acp{PSD} consistent with Ref.~\cite{LIGOObservingScenarios}. They were detected using the real\nobreakdashes-time matched\nobreakdashes-filter pipeline, GSTLAL\_INSPIRAL \citep{Cannon:2011vi}. Candidates with estimated \acp{FAR} less than $10^{-2}$\,yr$^{-1}$ were considered to be ``detections.'' Because using Gaussian noise results in lower \acp{FAR} than would be calculated in realistically glitchy detector noise, we imposed an additional detection threshold on the network \ac{SNR}, $\hat{\rho} \geq 12$, which has been found to correspond to a comparable \ac{FAR} in the initial \ac{LIGO} runs.\footnote{See Ref.~\cite{BerryLocalization} for an analysis of the effect of glitchy noise on detection and parameter estimation.} Localizations for the detections were generated with \ac{BAYESTAR} as well as the functionally equivalent and interchangeable LALINFERENCE\_MCMC, LALINFERENCE\_NEST, and LALINFERENCE\_BAMBI samplers (collectively referred to as LALINFERENCE).
\subsection{Areas}
We measured sky localization areas for each event as follows. First, we ranked the \ac{HEALPix} pixels by descending posterior probability. Then, we computed the cumulative sum of the pixels in that order. Finally, we searched for the index of the pixel of which the cumulative sum was equal to a given value: for example, 0.9 if we are interested in the 90\% credible area. That pixel index times the area per pixel is the area of the smallest region of the specified credible level. This area can be thought of as measuring the precision of the sky localization: it is a measure of the scale of the posterior distribution.
We can construct a second measure, called the searched area, as the smallest such constructed area that contains the true location of the source. A telescope with a \ac{FOV} that is small compared to the characteristic scale of the posterior would intercept the true location of the source after covering the searched area. This measure is mainly useful because it measures the accuracy of the sky localization independently of the precision. In other words, it treats the sky map as merely a ranking statistic.
Histograms of the 90\% credible area and the searched area are shown in Fig.~\ref{fig:area-hist}, broken down by observing scenario (2015 or 2016) and detector network (HL, HV, LV, or HLV). Note that there are no statistically significant differences in areas between \ac{BAYESTAR} and LALINFERENCE, with the exception in the 2016/HLV configuration, for which some LALINFERENCE sky maps span about an order of magnitude less area than \ac{BAYESTAR}. If we consider only events for which all three detectors contained a signal that was loud enough to trigger the matched\nobreakdashes-filter pipeline, the difference becomes much smaller and insignificant within 95\% error bars. This is because if the signal is too weak to trigger the detection pipeline in one of the detectors, then \ac{BAYESTAR} receives no information about that detector. This issue does not occur in the two\nobreakdashes-detector configurations (HL, HV, or LV) because two or more triggers are required to report a detection candidate.
\begin{figure*}
\includegraphics{area-hist}
\caption{\label{fig:area-hist}Cumulative histograms of sky area, broken down by observing run and detector network. The plots in the left column show the 90\% credible area and the plots in the right column show the searched area. From top bottom, the rows refer to the following observing scenarios/network configurations: 2015/HL, 2016/HL, 2016/HV, 2016/LV, and 2016/HLV. The shaded regions represent the 95\% confidence bounds. The magenta lines represent \ac{BAYESTAR} and the blue lines LALINFERENCE. Where relevant, dotted lines show all events in the given network configuration and solid lines show only events for which the matched\nobreakdashes-filter pipeline triggered on all operating detectors. Note that statistically significant differences in areas between the \ac{BAYESTAR} and LALINFERENCE localizations occur only for events that were below the detection threshold in one or more detectors.}
\end{figure*}
This is a significant issue for the 2016 configuration, because the most accurate localizations are possible when all three detectors are operating. However, there may be a simple remedy. As we noted in Sec.~\ref{sec:properties}, the \ac{BAYESTAR} likelihood can be modified to use, instead of the times, phases, and amplitudes on arrival, the full complex matched\nobreakdashes-filter time series from all detectors. The detection pipeline, GSTLAL\_INSPIRAL, would have to be modified to save and transmit a small interval of the complex \ac{SNR} time series (perhaps a few tens of milliseconds) around the time of each detection candidate. In addition to supplying the missing information for subthreshold signals, this would make \ac{BAYESTAR} mathematically equivalent to the LALINFERENCE analysis, but with the intrinsic parameters fixed to their maximum\nobreakdashes-likelihood values. This idea will be pursued in future work.
\subsection{Self-consistency}
As we observed above, the area of a given credible region describes the precision of the sky localization, whereas the searched area describes the accuracy. However, self-consistency requires that the two are related. For example, we should find that on average 90\% of events have their true locations contained within their respective 90\% credible regions. More generally, if we make a cumulative histogram of the credible levels corresponding to the searched areas of all of the events, then we should obtain a diagonal line (with small deviations due to finite sample size). This test, popularized for \ac{GW} data analysis by Ref.~\cite{SiderySkyLocalizationComparison}, is a necessary but not sufficient condition for the validity of any Bayesian parameter estimation scheme.
It is already well established that LALINFERENCE localizations satisfy the $P$\nobreakdashes--$P$ plot test when deployed with accurate templates and reasonable priors. We found at first that \ac{BAYESTAR}'s $P$\nobreakdashes--$P$ plots tended to sag below the diagonal, indicating that though the accuracy (i.e., searched area) was comparable to LALINFERENCE, the \emph{precision} was overstated, with confidence intervals that were only about 70\% of the correct area. This was rectified by prescaling the \acp{SNR} from GSTLAL\_INSPIRAL by a factor of 0.83 prior to running \ac{BAYESTAR}. This correction factor suggests that, for example, a \ac{SNR}=10 trigger from GSTLAL\_INSPIRAL has the \emph{effective information content} of a \ac{SNR}=8.3 signal. The missing information may be due to losses from the discreteness of the template bank, from the \acl{SVD}, from mismatch between the matched\nobreakdashes-filter templates and the simulated signals, from the small but nonzero correlations between masses and intrinsic parameters, or from elsewhere within the detection pipeline. The correction is hard coded into the rapid localization. With it, the $P$\nobreakdashes--$P$ plots are diagonalized without negatively affecting the searched area (see Fig.~\ref{fig:pp}).
\begin{figure*}
\begin{center}
\includegraphics[width=0.45\textwidth]{2015-pp}
\includegraphics[width=0.45\textwidth]{2016-pp}
\end{center}
\caption[\acs{BAYESTAR} $P$\nobreakdashes--$P$ plots]{\label{fig:pp}$P$\nobreakdashes--$P$ plots for \ac{BAYESTAR} and LALINFERENCE localizations in the 2015 and 2016 configurations. The gray lozenge around the diagonal is a target 95\% confidence band derived from a binomial distribution.}
\end{figure*}
\subsection{Run time}
\label{sec:run-time}
Since \ac{BAYESTAR} is designed as one of the final steps in the real\nobreakdashes-time \ac{BNS} search, it is important to characterize how long it takes to calculate a sky map. We compiled \ac{BAYESTAR} with the Intel~C~Compiler~(\texttt{icc}) at the highest architecture\nobreakdashes-specific optimization setting (\texttt{-ipo~-O3~-xhost}). We timed it under Scientific~Linux~6.1 on a Supermicro SuperServer 6028TP\nobreakdashes-HTTR system with dual 8\nobreakdashes-core Intel Xeon E5\nobreakdashes-2630~v3 CPUs clocked at 2.40\,GHz, capable of executing 32 threads simultaneously (with hyperthreading). In Fig.~\ref{fig:runtimes}, we show how long it took to calculate a localization with \ac{BAYESTAR} as the number of OpenMP threads was varied from 1 to 32. This is a violin plot, a smoothed vertical histogram. The magenta regions show run times for a two\nobreakdashes-detector network (HL) modeled on the first scheduled Advanced \ac{LIGO} observing run in 2015, and the blue regions show run times for a three\nobreakdashes-detector network (HLV) based on the second planned observing run in 2016. These are the two observing scenarios that are discussed in Ref.~\cite{FirstTwoYears}.
\begin{figure}
\centering
\includegraphics{runtimes}
\caption[\acs{BAYESTAR} run time]{\label{fig:runtimes}Violin plot of \ac{BAYESTAR} run times as the number of OpenMP threads is varied from 1 to 32. The 2015 scenario is shown in red and the 2016 scenario in blue.}
\end{figure}
Several features are apparent. First, at any number of threads, the two configurations have similar run times, although the 2016 events contain a subpopulation of outliers that take about 2.5 times as long as the 2015 events. These are probably due to taking one of the more expensive code branches in the distance integral interpolation. Second, the run times decrease proportionally to the number of threads. Based on experiences running \ac{BAYESTAR} on the 32\nobreakdashes-core (64 threads with hyperthreading) cluster login machine, we expect the almost ideal parallel speedup to continue on machines with even more processors.
With just one thread, the \ac{BAYESTAR} analysis takes 76\nobreakdashes--356~s, already orders of magnitude faster than the full parameter estimation. With 32 threads, \ac{BAYESTAR} takes just 4\nobreakdashes--13~s. In practice, \ac{BAYESTAR}'s data handling (reading the detectors' \acp{PSD}, communicating with the \ac{GW} candidate database, writing \ac{FITS} files) takes an additional 15~s or so, though this overhead could be reduced by parallelizing many of these steps. The overall latency is comparable to the other stages (data aggregation, trigger generation, alert distribution) in the real\nobreakdashes-time \ac{BNS} analysis; therefore, any significant further speedup would require significant changes through Advanced \acs{LIGO} computing and infrastructure. The 32\nobreakdashes-thread configuration is representative of how \ac{BAYESTAR} might be deployed in early Advanced~\ac{LIGO}.\footnote{\ac{BAYESTAR} has been successfully ported to the Intel's Many Integrated Core architecture and has been tested in a 500 thread configuration on a system with dual Intel Xeon Phi 5110P coprocessors.} For comparison, sky localization with LALINFERENCE takes about 100\,h~\cite{BerryLocalization}.
Note that this benchmark shows \ac{BAYESTAR} to be an order of magnitude faster than what was reported in Refs.~\cite{leo-singer-thesis,BerryLocalization} due to the changes in the distance integration scheme that we noted in Sec.~\ref{sec:distance}.
\section{Future work}
One immediately pressing direction for future work is to address the issue of subthreshold signals, as this will be a major issue when Advanced Virgo comes online in 2016\nobreakdashes--2017. Using the full \ac{SNR} time series in place of the autocorrelation function seems like a promising avenue; implementing this requires some infrastructure changes to both the matched-filter pipeline and \ac{BAYESTAR}. Along these lines, we also refer the reader to Ref.~\cite{PankowRapidPE} for a similar, non\nobreakdashes-\ac{MCMC} approach to the rapid exploration of the full parameter space.
A more open\nobreakdashes-ended question is how to account for spin precession. The simulations in Ref.~\cite{FirstTwoYears} and in this paper featured extremely modest spins of $\chi \leq 0.05$, consistent with the fastest known pulsars in binaries \citep{2003Natur.426..531B,DetectingBNSSystemsWithSpin}. The signals were detected using a template bank that lacked spins entirely. Reference~\cite{FirstTwoYearsSpin} shows that using nonspinning \ac{BNS} templates for parameter estimation has negligible impact on sky localization. However, if one or both companions are spinning as fast as a millisecond pulsar, $\chi \sim 0.4$ \cite{FastestSpinningMillisecondPulsar}, or even near breakup, $\chi \sim 0.7$, then the orbital plane may precess; in this case, spins can no longer be neglected for detection \cite{DetectingBNSSystemsWithSpin} and may also be important for parameter estimation. Since spinning \ac{BNS} searches are still an active area of development, \ac{BAYESTAR}'s sky localization accuracy in this regime should be reexamined in the future.
Although the response time of \ac{BAYESTAR} has been driven by the anticipated time scales for kilonova and afterglow emission, a recurring question is whether there is any detectable \ac{EM} signal in the seconds before, during, and after the merger itself. Since the \ac{GW} inspiral signal is in principal detectable for up to hundreds of seconds before merger, one could imagine positioning rapidly slewing instruments to search for any prompt emission. This concept was explored by the authors \citep{Cannon:2011vi}.
On the topic of very low\nobreakdashes-latency localization, we also recommend Chen~\&~Holz~\cite{ChenHolzRapidLocalization}, who propose a rapid localization scheme that is similar to ours, but even faster because it makes some additional compromises: their likelihood is strictly Gaussian, so one more marginalization integral (the integral over arrival time) can be performed analytically.
\section{Conclusion}
We have presented a novel, fast, accurate, Bayesian algorithm for inferring the sky locations of compact binary merger sources that may soon be detected by advanced ground\nobreakdashes-based \ac{GW} detectors. For \ac{BNS} systems with small spins, we have shown that \ac{BAYESTAR} produces sky maps that are as accurate as the full \ac{MCMC} parameter estimation code but can do so within $\sim$10\,s after a detection. Still faster response times should be possible in the future (if warranted) by deploying \ac{BAYESTAR} on machines with more cores or by distributing \ac{BAYESTAR} across multiple computers.
Following a \ac{BNS} merger, the signal will be detected by the matched\nobreakdashes-filter pipeline within tens of seconds; an alert containing the time and estimated significance of the event can be distributed almost immediately (although a human validation stage that may be present at the beginning of the first observing run may introduce some additional latency). The localization from \ac{BAYESTAR} will be available tens of seconds to a minute later. Finally, the refined localization and the detailed estimates of masses and spins from LALINFERENCE will be distributed hours to days later.
Relevant time scales for possible \ac{EM} counterparts to \ac{GW} signals include seconds (the prompt \ac{GRB} signature), hundreds of seconds (extended emission and X\nobreakdashes-ray plateaus that are observed for some short \acp{GRB}), minutes to hours (X\nobreakdashes-ray and optical afterglow), hours to days (the kilonova or the blue flashes associated with unbound ejecta or disk winds), and days to years (the radio afterglow). For the first time, we are able to provide accurate localizations before the peak of any of these \ac{EM} signatures (except for the short \ac{GRB} or any premerger signal). Even for components like the kilonova that should peak within hours to days, the availability of the localizations within seconds might provide a window of several hours to obtain tiled images of the area before the \ac{EM} emission begins. These could be used as reference images, crucial at optical wavelengths for establishing the rapid rise and quickly distinguishing from slower background transients.
\input{ligo-acronyms/acronyms}
\begin{acknowledgements}
This paper has LIGO Document Number LIGO\nobreakdashes-P1500009\nobreakdashes-v8.
We thank John Veitch and Will Farr for chairing a review of the analysis and code. We thank Britt Griswold for assistance with preparing Fig.~\ref{fig:adaptive-illustration}.
\end{acknowledgements}
LIGO was constructed by the California Institute of Technology and Massachusetts Institute of Technology with funding from the \ac{NSF} and operates under Cooperative Agreement No.~PHY\nobreakdashes-0107417.
This research was supported by the \ac{NSF} through a Graduate Research Fellowship to L.~S. L.~S. thanks the Aspen Center for Physics and \ac{NSF} Grant \#1066293 for hospitality during the editing of this paper.
Source code for \ac{BAYESTAR} is available as part of LALInference,\footnote{\url{https://ligo-vcs.phys.uwm.edu/cgit/lalsuite/tree/lalinference}.} the open source LIGO/Virgo parameter estimation toolchain, which is in turn part of LALSuite.\footnote{\url{http://www.lsc-group.phys.uwm.edu/daswg/projects/lalsuite.html}.} This research made use of Astropy\footnote{\url{http://www.astropy.org}.}~\citep{astropy}, a community-developed core Python package for astronomy. Some of the results in this paper have been derived using \ac{HEALPix}\footnote{\url{http://healpix.sourceforge.net}.}~\cite{healpix}.
Some results were produced on the NEMO computing cluster operated by the Center for Gravitation and Cosmology at University of Wisconsin\nobreakdashes--Milwaukee under \ac{NSF} Grants No.~PHY\nobreakdashes-0923409 and No.~PHY\nobreakdashes-0600953.
\appendix
\section{Independence of intrinsic and extrinsic errors}
\label{sec:independence-intrinsic-extrinsic}
If all of the detectors have the same noise \acp{PSD} up to multiplicative factors, $c_1 S_1(\omega) = c_2 S_2(\omega) = \cdots = c_n S_n(\omega) \equiv S(\omega)$, then we can show that the errors in the intrinsic parameters (masses) are not correlated with sky position errors. This is because we can change variables from amplitudes, phases, and times to amplitude ratios, phase differences, and time differences. With $N$ detectors, we can form a single average amplitude, time, and phase, plus $N-1$ linearly independent differences. The averages are correlated with the intrinsic parameters, but neither are correlated with the differences. Since only the differences inform the sky location, this gives us license to neglect uncertainty in masses when we are computing the sky resolution.
This is easiest to see if we make the temporary change of variables $\rho \rightarrow \varsigma = \log \rho$. This allows us to factor out the \ac{SNR} dependence from the single-detector Fisher matrix. The extrinsic part becomes
%
\begin{align}
\mathcal{I}_{\bm\theta_i,\bm\theta_i} &= \bordermatrix{
~ & \varsigma_i & \gamma_i & \tau_i \cr
\varsigma_i & {\rho_i}^2 & 0 & 0 \cr
\gamma_i & 0 & {\rho_i}^2 & -{\rho_i}^2 {\overline{\omega}}_i \cr
\tau_i & 0 & -{\rho_i}^2 {\overline{\omega}}_i & {\rho_i}^2 {\overline{\omega^2}}_i
}
\\
\nonumber
&= {\rho_i}^2 \left( \begin{array}{ccc}
1 & 0 & 0 \\
0 & 1 & -{\overline{\omega}}_i \\
0 & -{\overline{\omega}}_i & {\overline{\omega^2}}_i
\end{array} \right).
\end{align}
%
Due to our assumption that the detectors' \acp{PSD} are proportional to each other, the noise moments are the same for all detectors, ${\overline{\omega^k}}_i \equiv {\overline{\omega^k}}$. Then, we can write the single-detector Fisher matrix as
%
\begin{equation}
\mathcal{I}_i = {\rho_i}^2 \left(
\begin{array}{cc}
A & B \\
B\transpose & C