---
description: My prediction/betting strategies and track record, reflections on rationality, prediction judgments
tags: psychology, statistics, predictions, politics, Bitcoin
created: 10 Jan 2009
status: finished
belief: highly likely
...
Everything old is new again. Wikipedia is the collaboration of amateur gentlemen, writ in countless epistolary IRC or email or talk page messages. And the American public's [untrammeled betting on elections](http://www.unc.edu/~cigar/papers/PoP_submit4.pdf) and victories has been reborn as [prediction markets](!Wikipedia).
# Prediction markets
Wikipedia admirably summarizes the basic idea:
> Prediction markets...are speculative markets created for the purpose of making predictions. Assets are created whose final cash value is tied to a particular event (e.g., will the next US president be a Republican) or parameter (e.g., total sales next quarter). The *current market prices can be interpreted as predictions of the probability of the event or the [expected value](!Wikipedia)* of the parameter[^interpretation]. Prediction markets are thus structured as betting exchanges, without any risk for the bookmaker.
[^interpretation]: As is true of every short description, this is a little over-simplified. People are risk-averse and fundamentally uncertain, so their beliefs about the true probability won't directly translate into the percentage/price they will buy at, and one can't even average out and say 'this is what the market believes the probability is'. See economist Rajiv Sethi's ["On the Interpretation of Prediction Market Data"](http://rajivsethi.blogspot.com/2011/03/on-interpretation-of-prediction-market.html) & ["From Order Books to Belief Distributions"](http://rajivsethi.blogspot.com/2011/03/from-order-books-to-belief.html); for more rigor, see Wolfers & Zitzewitz's paper, ["Interpreting Prediction Market Prices as Probabilities"](http://www.econstor.eu/bitstream/10419/33261/1/510931871.pdf)
Emphasis is added on the most important characteristic of a prediction market, the way in which it differs from regular stock markets. The idea is that by tracking accuracy - punishing ignorance & rewarding knowledge in equal measure - a prediction market can elicit one's *true* beliefs, and avoid the failure mode of predictions as a pundit's bloviating or wishful thinking or signaling alignment:
> "The usual touchstone of whether what someone asserts is mere persuasion or at least a subjective conviction, i.e., firm belief, is betting. Often someone pronounces his propositions with such confident and inflexible defiance that he seems to have entirely laid aside all concern for error. A bet disconcerts him. Sometimes he reveals that he is persuaded enough for one ducat but not for ten. For he would happily bet one, but at 10 he suddenly becomes aware of what he had not previously noticed, namely that it is quite possible that he has erred."^[[Immanuel Kant](!Wikipedia), _[Critique of Pure Reason](!Wikipedia)_ (A824/B852)]
### Events, not dividends or sales
Imagine a prediction market in which every day the administrator sells off pairs of shares (he doesn't want to risk paying out more than he received) for $1 a share, and all the shares say either heads or tails. Then he flips a coin and gives everyone with a 'right' share $2. Obviously if people bid up heads to $5, this is crazy and irrational - even if heads wins today, one would still lose. Similarly for any amount greater than $2. But $2 is also crazy: the only way that share price doesn't lose money is if heads is 100% guaranteed. Of course, it isn't - it is quite precisely guaranteed to happen only 50% of the time. Any price above $1 (the 50% point) is going to lose in the long run.
A smart investor could come into this market and blindly buy any share whatsoever that cost less than $1; they would make money. If their shares cost even 99¢, then about half would turn into $2 and half into nothing - an expected value of $1, for a penny of profit per share...
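A minimal simulation of this toy market makes the arithmetic concrete (hypothetical prices; the fair coin and $2 payout are as specified above):

~~~{.python}
import random

def expected_profit(price, payout=2.0, p_heads=0.5, flips=100_000):
    """Simulate buying one 'heads' share at `price` every day;
    a winning share pays `payout`, a losing share pays nothing."""
    total = 0.0
    for _ in range(flips):
        won = random.random() < p_heads
        total += (payout if won else 0.0) - price
    return total / flips  # average profit per share

# Any price below the $1 expected value profits in the long run;
# any price above it loses, no matter how often heads 'just won':
for price in (0.50, 0.99, 1.00, 1.50, 2.00):
    print(f"price ${price:.2f}: avg profit/share ${expected_profit(price):+.3f}")
~~~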
This is all elementary and obvious, and it's how we can convince ourselves that market prices can indeed be interpreted as predictions of expected value. But that's only because the odds are known in advance! We specified it was a fair coin. If the odds of the event were not known, then things would be much more interesting. No one bets on a coin flip: we bet on whether John is bluffing.
Real prediction markets famously prefer to make the subject of a share a topic like the party of the victor of the 2008 American presidential elections; a topic with a relatively clear outcome (barring the occasional George W. Bush or coin landing on its edge) and of considerable interest to many.
Interest, I mean, not merely for speculating on, but possibly of real-world importance. Advocates for prediction markets as tools, such as [Robin Hanson](!Wikipedia), tirelessly remind us of the possible benefits in 'aggregating information'. A prediction market rewards clear thinking and insider information, but focuses on topics it'd be difficult to clearly bet for or against on regular financial markets.
Yes, if I thought the financial markets were undervaluing green power stocks because they were weighing Senator John McCain's presidential candidacy too heavily, then I could do something like short those stocks. But suppose that's all I know about the green power stocks and the financial markets? It'd be madness to go and trade on that belief alone. I'd be exposing myself to countless risks, countless ways for the price of green stocks to be unconnected to McCain's odds, countless intermediaries, countless other relations of green stocks which may cancel out my correct appraisal of one factor. Certainly in the long run, weakly related factors will have exactly the effect they deserve to have. But this is a long run in which the investor is quite dead.
Prediction markets offer a way to cut through all the confounding effects of proxies, and bet directly and precisely on that bit of information. If I believe Senator Barack Obama has been unduly discounted, then I can directly buy shares in him instead of casting about for some class of stocks that might be correlated with him - which is a formidable task in and of itself; perhaps oil stocks will rise because Obama's platform includes withdrawal from Iraq, which would render the Middle East less stable, or perhaps green stocks will rise for similar reasons, or perhaps they'll all fall because people think he'll be incompetent, or perhaps optimism over a historic election of a half-black man and enthusiasm over his plans will lift all boats...
One will never get a faithful summation of all the information about Obama scattered among hundreds or thousands of traders if one places multiple difficult barriers in front of a trader who wishes to turn his superior knowledge or analysis into money.
Or here's another example: many of the early uses of prediction markets have been inside corporations, betting on metrics like quarterly sales. Now, all of those metrics are important and will in the long run affect stock prices or dividends. But what employee working down in the R&D department is going to say 'People are too optimistic about next year's sales, the prototypes just aren't working as well as they would need to' and go short the company's stock? No one, of course. A small difference in their assessment from everyone else's is unlikely to make a noticeable price difference, even if the transaction costs of shorting didn't bar it. And yet, the company wants to know what this employee knows.
## How much to bet
There's something of an [efficient market](!Wikipedia) issue with prediction markets, specifically a [no-trade theorem](!Wikipedia): unlike in the regular stock market, trades in prediction markets are usually [zero-sum](!Wikipedia)[^subsidy], and so lots of traders are going to be net losers. If you don't have any particular reason to think you are one of the wolves canny enough to make money off the sheep, then you're one of the sheep, and why trade at all? (I understand poker players have a saying - if you can't spot the fish at the table, you're the fish.)
[^subsidy]: Or negative-sum, when you consider the costs of running the prediction market and the various fees that might be assessed on participants - the house needs a cut. In some circumstances, prediction markets can be positive-sum for traders: if some party benefits from the information and will subsidize it to encourage trading. For example, when companies run internal prediction markets they tend to subsidize the markets.
Public prediction market subsidies are much rarer - the only instance I know of is [Peter McCluskey subsidizing 2008 Intrade markets](http://www.bayesianinvestor.com/amm/) ([announcement](http://www.overcomingbias.com/2008/01/presidential-de.html)). As far as he could tell in November 2008, his subsidies [did not do much](http://www.bayesianinvestor.com/blog/index.php/2008/11/13/automated-market-maker-results/ "Automated Market Maker Results"). I emailed him in May 2012, and he said:
> I was somewhat disappointed with the results.
>
> I don't expect a small number of subsidized markets is enough to accomplish much. I suspect it would require many donors (or a billionaire donor) to create the markets needed for me to consider them successful. I see no hint that my efforts encouraged anyone else to subsidize such markets.
So, the bad-and-self-aware won't participate. If you are trading in a prediction market, you are either good-and-aware or good-and-ignorant or bad-but-ignorant. Ironically, the latter two can't tell whether they are the first group or not. It reminds me of the [smoking lesion](http://wiki.lesswrong.com/wiki/Smoking_lesion) puzzle or ["King Solomon's problem"](http://lesswrong.com/lw/3pf/does_evidential_decision_theory_really_fail/) in [decision theory](!Wikipedia): you may have many [cognitive bias](!Wikipedia)es such as the [overconfidence effect](!Wikipedia) (the lesion), and they may cause you to fail or succeed on the prediction market (get cancer or not) and also want to participate therein. What do you do?
Best of course is to test for the lesion directly - to test whether our predictions are [calibrated](!Wikipedia "Calibration (statistics)")^[See also the [LessWrong articles](http://lesswrong.com/r/lesswrong/tag/calibration/) on the topic.], whether events we confidently predict at 0% do in fact never happen and so on. If we manage to overcome our biases, we can give [calibrated probability assessment](!Wikipedia)s. We can do this sort of testing with the relevant biases - just knowing about them and introspecting about one's predictions can improve them. Coming up with the precise reasons one is making a prediction improves one's predictions[^Delphi-reasons] and can also help with the [hindsight bias](!Wikipedia)^[See ["Eliminating the hindsight bias"](/docs/sunkcosts/1988-arkes.pdf), Arkes 1988.] or the temptation to [falsify your memories](http://www.wired.com/wiredscience/2011/10/how-friends-ruin-memory-the-social-conformity-effect/) [based on social feedback](http://www.weizmann.ac.il/neurobiology/labs/dudai/uploads/files/Science-2011-Edelson-108-11.pdf), all of which is important to figuring out how well you will do in the future. We can quickly test calibration using our partial ignorance about many factual questions, eg. the links in ["Test Your Calibration!"](http://lesswrong.com/lw/1f8/test_your_calibration/). [My recent practice](http://predictionbook.com/users/gwern) with thousands of real-world predictions on [PredictionBook.com](http://predictionbook.com/) has surely helped my calibration.
[^Delphi-reasons]: [Rowe & Wright 2001](/docs/predictions/2001-rowe.pdf "Expert opinions in forecasting: the role of the Delphi technique"), reviewing studies of the [Delphi method](!Wikipedia):
> When one restricts the exchange of information among panelists so severely and denies them the chance to explain the rationales behind their estimates, it is no surprise that feedback loses its potency (indeed, the statistical information may encourage the sort of group pressures that Delphi was designed to pre-empt). We ([Rowe and Wright 1996](/docs/predictions/1996-rowe.pdf "The impact of task characteristics on the performance of structured group forecasting techniques")) compared a simple iteration condition (with no feedback) to a condition involving the feedback of statistical information (means and medians) and to a condition involving the feedback of reasons (with no averages) and found that the greatest degree of improvement in accuracy over rounds occurred in the "reasons" condition. Furthermore, we found that, although subjects were less inclined to change their forecasts as a result of receiving reasons feedback than they were if they received either "statistical" feedback or no feedback at all, when "reasons" condition subjects *did* change their forecasts they tended to change towards more accurate responses. Although panelists tended to make greater changes to their forecasts under the "iteration" and "statistical" conditions than those under the 'reasons' condition, these changes did not tend to be toward more accurate predictions. This suggests that informational influence is a less compelling force for opinion change than normative influence, but that it is a more effective force. Best (1974) has also provided some evidence that feedback of reasons (in addition to averages) can lead to more accurate judgments than feedback of averages (e.g., medians) alone.
It may be a stretch to generalize this to a single person predicting on their own, though many tools involve groups or you could view predicting as a Delphi method involving temporally separated *selves*. (If multiple selves works for [Ainslie](!Wikipedia "George Ainslie (psychologist)") in explaining addiction, why not predicting?)
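Since calibration testing comes up repeatedly here, a minimal sketch of what such a check involves (invented toy data, not PredictionBook's actual scoring): bucket your predictions by stated confidence and compare each bucket's average stated probability against the frequency with which those events actually happened.

~~~{.python}
from collections import defaultdict

def calibration_table(forecasts, buckets=10):
    """Group (probability, outcome) pairs into deciles and compare
    the average stated probability to the observed frequency."""
    groups = defaultdict(list)
    for p, outcome in forecasts:
        groups[min(int(p * buckets), buckets - 1)].append((p, outcome))
    for b in sorted(groups):
        pairs = groups[b]
        stated = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(o for _, o in pairs) / len(pairs)
        print(f"{b*10:>3}-{b*10+9}%: stated {stated:.2f}, "
              f"observed {observed:.2f}, n={len(pairs)}")

# Toy data: (claimed probability, whether the event happened: 1/0).
# A calibrated predictor's 'stated' and 'observed' columns match.
calibration_table([(0.9, 1), (0.9, 1), (0.8, 1), (0.7, 0), (0.3, 0), (0.1, 0)])
~~~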
So, how much better are you than your competing traders? What is your edge? This, believe it or not, is pretty much all you need to know in order to decide how much to bet on any contract. The exact fraction of your portfolio to bet given a particular edge is defined by the [Kelly criterion](!Wikipedia) ([more details](http://www.elem.com/~btilly/kelly-criterion/)), which maximizes the expected [growth rate](http://r6.ca/blog/20070816T193609Z.html) of your bankroll. (But you need to be psychologically tough[^wp] to use it lest you begin to [play irrationally](!Wikipedia "Tilt (poker)"): it's not a [risk averse](!Wikipedia) strategy. And strictly speaking, it doesn't immediately apply to multiple bets you can choose from, but let's say that whatever we're looking at is the bet we feel is the most mispriced and we can do the best on.)
[^wp]: Wikipedia's [criticism section](!Wikipedia "Kelly criterion#Reasons to bet less than Kelly") remarks that "Kelly betting leads to highly volatile short-term outcomes which many people find unpleasant, even if they believe they will do well in the end."
The formula is:
$x = \frac{o \times e - (1 - e)}{o}$
- _o_ = the net odds on the bet (your profit per unit staked if you win)
- _e_ = your estimated probability of winning
- _x_ = the fraction of your portfolio to bet
To quote the Wikipedia explanation:
> As an example, if a gamble has a 60% chance of winning ($e = 0.60$), but the gambler receives 1-to-1 odds on a winning bet ($o = 1$), then the gambler should bet 20% of the bankroll at each opportunity ($x = 0.20$), in order to maximize the long-run growth rate of the bankroll.
So, suppose the President's re-election contract was floating at 50%, but based on his performance and past incumbent re-election rates, you decide the true odds are 60%; you can buy the contract at 50% and if you hold until the election and are right, you get back double your money, so the odds are 1:1. The filled-in equation looks like
1. $x = \frac{1 \times 0.6 - (1 - 0.6)}{1}$
2. $x = 1 \times 0.6 - (1 - 0.6)$
3. $x = 0.6 - (1 - 0.6)$
4. $x = 0.6 - 0.4$
5. $x = 0.2$
Hence, you ought to put 20% of your portfolio into buying the President's contract. (If we assume that all bets are double-or-nothing, Wikipedia tells us it simplifies to $x = (2 \times p) - 1$, which in this example would be $(2 \times 0.6) - 1$ = $1.2 - 1$ = $0.2$. But usually our contracts in prediction markets won't be that simple, so the simplification isn't very useful here.)
It's not too hard to apply this to more complex situations. Suppose the president were at, say, 10% but you are convinced the unfortunate equine sex scandal will soon be forgotten and the electorate will properly appreciate _el Presidente_ for winning World War III by making his true re-election odds 80%. You can buy in at 10% and you resolve to sell out at 80%, for a reward of 70% or 7 times your initial stake (7:1). And we'll again say you're right 60% of the time. So your Kelly criterion looks like:
1. $x = \frac{7 \times 0.6 - (1 - 0.6)}{7}$
2. $x = \frac{(7 \times 0.6) - 0.4}{7}$
3. $x = \frac{4.2 - 0.4}{7}$
4. $x = \frac{3.8}{7}$
5. $x = 0.54$
Wow! We're supposed to bet more than half our portfolio despite knowing we'll lose the bet 40% of the time? Well, yes. With an upside like 7x, we can lose several bets in a row and eventually make up our loss. And if we win the first time, we just won huge.
It goes both ways, of course. If we have a market/true-odds of 80%/90% and we do the same thing, we have a return of only 12.5% (9/8) rather than 100% - that is, $o = 0.125$ - and so:

1. $x = \frac{0.125 \times 0.6 - (1 - 0.6)}{0.125}$
2. $x = \frac{(0.125 \times 0.6) - 0.4}{0.125}$
3. $x = \frac{0.075 - 0.4}{0.125}$
4. $x = \frac{-0.325}{0.125}$
5. $x = -2.6$

The fraction comes out *negative*: risking our whole stake 40% of the time to win a mere 12.5% the other 60% is a losing proposition, and the Kelly criterion says to stake nothing at all. As one would expect, with a smaller reward but equal risk compared to our first example, the KC recommends less than the 0.2 fraction - here, shrinking all the way past zero. (Under this simplified model we lose our whole stake whenever we're wrong; in practice one would exit at a partial loss, making the bet less dire.)
If one doesn't enjoy calculating the KC by hand, one could always write a program to do so; Russell O'Connor has a nice Haskell blog post on ["Implementing the Kelly Criterion"](http://r6.ca/blog/20070820T175938Z.html) (he also has an interesting post on the [KC and the lottery](http://r6.ca/blog/20090522T015739Z.html)).
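To check the worked examples by machine, a minimal sketch of the formula above (using _o_ and _e_ as defined earlier; not O'Connor's implementation):

~~~{.python}
def kelly_fraction(o, e):
    """Kelly criterion: fraction of bankroll to stake on a bet with
    net odds `o` (profit per unit staked) that wins with probability `e`.
    A non-positive result means the bet is negative-EV: stake nothing."""
    return max(0.0, (o * e - (1 - e)) / o)

print(kelly_fraction(1, 0.6))      # 0.2   -- the 1:1, 60% example
print(kelly_fraction(7, 0.6))      # ~0.54 -- buy at 10%, sell at 80%
print(kelly_fraction(0.125, 0.6))  # 0.0   -- buy at 80%, sell at 90%: too little reward
~~~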
## Specific markets
So once we are interested in prediction markets and would like to try them out, we need to pick one. There are several. I generally ignore the 'play money' markets like the [Hollywood Stock Exchange](!Wikipedia), despite their similar levels of accuracy to the real money markets; I just have a prejudice that if I make a killing, then I ought to have a real reward like a nice steak dinner and not just increment some bits on a computer. The primary markets to consider are:
- [Betfair](!Wikipedia) and [BETDAQ](!Wikipedia) are probably the 2 largest prediction markets, but unfortunately, it is difficult for Americans to make use of them - Betfair bans them outright.
- [Intrade](!Wikipedia) is another European prediction market, similar to Betfair and BETDAQ, but it does not go out of its way to bar Americans, and thus is likely the most popular market in the United States. (Its sister site [TradeSports](!Wikipedia) was sport-only, and is now defunct.)
- [HedgeStreet](!Wikipedia) is some sort of hybrid of derivatives and predictions. I know little about it.
- [The Iowa Electronic Markets](!Wikipedia) (IEM) is an old prediction market, and one of the better-covered in the American press. It's a research prediction market, so it handles only small quantities of money and trades and has only a few traders[^size]. Accounts max out at $500, a major factor limiting the depth & liquidity of its markets.
[^size]: On 27 January 2008, the IEM sent out an email which accidentally listed all recipients in the CC; the list totaled 292 addresses. Given that many of these traders (like myself) are surely inactive or infrequent, and only a fraction will be active at a given time, this means the 10 or so markets are thinly inhabited.
I didn't want to wager too much money on what was only a lark, and the IEM has the favorable distinction of being clearly legal in the USA. So I chose them.
### IEM
In 2003, I sent in a check for $20. A given market's contracts in the IEM are supposed to sum to $1, so $20 would let me buy around 40 shares - enough to play around with.
#### My IEM trading
##### 2004
> "Like all weak men he laid an exaggerated stress on not changing one's mind."^[[William Somerset Maugham](!Wikipedia), writer (1874-1965)]
Prediction markets are known to have a number of biases. Some of these biases are shared with other betting exchanges; horse-racing is plagued with a 'long-shot favoritism' just like prediction markets are. (An example of long-shot favoritism would be Intrade and IEM shares for libertarian Ron Paul winning the 2008 Republican nomination trading at ludicrous valuations like 10¢, or Al Gore - who wasn't even running - for the Democratic nomination at 5¢.) The financial structure of markets also seems to make shorting of such low-value (but still over-valued) shares more difficult. They can be manipulated, consciously or unconsciously, due to not being very good markets (["They are thin, trading volumes are anemic, and the dollar amounts at risk are pitifully small"](https://web.archive.org/web/20140102011857/http://www.wired.com/techbiz/it/magazine/16-06/st_essay "Prediction Markets Are Hot, But Here's Why They Can Be So Wrong")), and that's where they aren't reflecting the prejudices of their users (one can't help but suspect Ron Paul shares were overpriced because he has so many fans among techies).
I began experimenting with some small trades on IEM's Federal Reserve interest rate market; I had a theory that there was a 'favorites bias' (the inverse of long-shot favoritism, where traders buck the conventional wisdom despite it being more correct). I simply based my trades on what I read in the _New York Times_. It worked fairly well. In 2005, I also dabbled in the markets on Microsoft and Apple share prices, but I didn't find any values I liked.
2004 was, of course, a presidential election year. I couldn't resist, and traded heavily. I avoided the Democratic nomination markets, reasoning that I was too ignorant that year - which was true, I did not expect John Kerry to eventually win the nomination - and focused on the party-victory market. The traders there were far too optimistic about a Democratic victory; I knew 'Bush is a war-time president' (in addition to the incumbency!), as people said, and that this mattered a lot to the half of the electorate that had voted for him in 2000. Giving him a re-election probability of under 40% was too foolish for words.
I did well on these trades, and then in October, I closed out all my trades, sold my Republican/Bush shares, and bought Kerry. I thought the debates had gone well for Kerry and was confident the Swift Boating wouldn't do much in the end, and certainly couldn't compensate for the albatross of Iraq.
As you know, I was quite wrong in this strategy. Bush did win, and won more than in 2000. And I lost $5-10. (Between a quarter and a half my initial capital. Ouch! I was glad I hadn't invested some more substantial sum like \$200.) I had profited early on from people who had confused what they *wanted* to happen with what *would*, but then I had succumbed to the same thing. Yes, everyone around me (I live in a liberal state) was sure Kerry would win, but that's no excuse for starting off with a correct assessment and then choosing a false one. It was a valuable lesson for me; this experience makes me sometimes wonder whether 'personal' prediction markets, if you will, could be a useful tool.
##### 2005/2006
In 2005 & 2006, I did minimal interesting trading. I largely continued my earlier strategies in the interest rate markets. Slowly, I made up for my failures in 2004.
##### 2007
In 2007, the presidential markets started back up! I surveyed the markets and the political field with great excitement. As everyone remembers, it was the most interesting election in a very long time, with such memorable characters (Hillary Clinton, Ron Paul, Barack Obama, John McCain, Sarah Palin) and unexpected twists.
###### The Republicans
As in 2004, the odds of an ultimate Republican victory were far too low - hovering in the range of 30-40%. This is obviously wrong on purely historical considerations (Democrats don't win the presidency *that* often), and seems particularly wrong when we consider that George W. Bush won in 2004. Anyone arguing that GWB poisoned the well for a succeeding Republican administration faces the difficult task of explaining (at least) 3 things:
1. How association with GWB would be so damaging when GWB himself was re-elected in 2004 with a larger percentage of votes than in 2000.
2. How association with GWB policies like Iraq would be so damaging when the daily security situation in Iraq has clearly improved since 2004.
3. And in general: how a fresh Republican face (with the same old policies) could do any worse than GWB did, given that he would possess all the benefits of GWB's policies and none of the personal animus against GWB.
The key to Republican betting was figuring out who was hopeless, and working from there by essentially short-selling them. As time passed, one could sharpen one's bets and begin betting for a candidate rather than against. My list ultimately looked like this:
1. **Ron Paul** was so obviously not going to win. He appealed to only a small minority of the Republican party, had views idiosyncratic where they weren't offensive, and wanted to destroy important Republican constituencies. If the Internets were America, perhaps he could've won.
2. **Rudy Giuliani** was another easy candidate to bet against. He had multiple strikes: he was far too skeevy and ethically questionable (the investigations of Bernard Kerik were well underway at this point), had made himself a parody, had few qualifications, and had a campaign strategy that was as ruinous as it was perplexing. He was unacceptable culturally, what with his divorces, loose living, humorous cross-dressing, and New York ways. He would not play well in Peoria.
3. **[Fred Thompson](!Wikipedia)** was undone by being a bad version of Reagan. He didn't campaign nearly as industriously as he needed to. The death knell, as far as I was concerned, was when national publications began mentioning the "lazy like a fox" joke as an old joke. No special appeal, no special resources, no conventional ability...
4. **Mitt Romney** had 2 problems: he was slick and seemed inauthentic, and people focused too much on his Mormonism and his Massachusetts governorship (a position that would've been a great aid - had it not been in that disgustingly liberal state). I was less confident about striking him off, but I decided his odds of 20% or so were too generous.
5. **Mike Huckabee** struck me as not having the resources to make it to the nomination. I was even less sure about this one than Mitt, but I lucked out - the supporters of Huckabee began infighting with Romney supporters.
This didn't leave very many candidates for consideration. By this process of elimination, I was in fact left with only John McCain as a serious Republican contender. If you remember the early days, this was in fact a very strange result to reach: John McCain appeared tired, a beaten man from 2004 making one last pro forma try, his campaign inept and riven by infighting, and he was just in general - old, old, old.
But hey, his shares were trading in the 5-15% range. They were the best bargain going in the market. I held them for a long time and ultimately would sell them at 94-99¢ for a roughly 900% gain. (I sold them instead of waiting for the Republican convention because I was forgoing minimal gains, and I was concerned by reports on his health.)
###### The Democrats
A similar process obtained for the Democrats. A certain dislike of Hillary Clinton led me to think that her status as the heir presumptive (reflected in share prices) would be damaged at some point. All of the other candidates struck me as flakes and hopeless causes, with the exceptions of John Edwards and Barack Obama.
I eventually ruled out John Edwards as having no compelling characteristics and smacking of phoniness (much like Romney); I was never tempted to change my mind on him, and the adultery and hair flaps turned out to be waiting in the wings for him.
Is it any surprise I lighted on Obama? He had impressed me (and just about everyone else) with his 2004 convention speech, his campaign seemed quite competent and well-funded, the media clearly loved him, and so on. Best of all, his shares were relatively low (30-40%) and I had money left after the Republicans. So I bought Obama and sold Clinton. I eventually sold out of Obama at the quite respectable 78¢.
### Summing up
By the end of the election, I had made a killing on my Obama and McCain shares. My account balance stood at $38; so over the 3 or 4 years of trading I had nearly doubled my investment. $18 is perhaps enough for a steak dinner.
Further, I had learned a valuable lesson in 2004 about my own political biases and irrationality, and had earned the right in 2008 to be smug about foreseeing a McCain and Obama match-up when the majority of pundits were trying to figure out whether Hillary would be running against Huckabee or Romney.
And finally, I've concluded that my few observations aside, prediction markets are pretty accurate. I often use them to sanity-check myself by asking 'If I disagree, what special knowledge do I have?' Often I have none.
When I got out of the IEM, I reflected on my trades: I learned some valuable lessons, I had a good experience, and I came out a believer. I resolved that one day I'd like to try out a more substantial and varied market, like Intrade.
#### IEM logs
The following is an edited IEM trading history for me, removing many limit positions and other expired or canceled trades:
Order date Time     Market     Contract   Order type Qty Unit price Expiry            Resolution     R.qty R.price
---------- -------- ---------- ---------- ---------- --- ---------- ----------------- -------------- ----- -------
12/29/04 20:16:23 FedPolicyB FRsame0205 Purchase 20 0.048 Traded 10 0.048
12/29/04 20:17:26 FedPolicyB FRup0205 Purchase 2 0.956 Traded 2 0.956
12/29/04 20:17:46 FedPolicyB FRsame0205 Purchase 10 0.049 Traded 10 0.049
02/12/05 17:48:51 Comp-Ret AAPL-05b Bid 5 0.96 3/14/2005 11:59PM Cancel-Manager
02/13/05 16:43:33 Comp-Ret AAPL-05b Bid 7 0.982 3/15/2005 11:59PM Traded 7 0.98
02/21/05 10:03:45 FedPolicyB FRsame0505 Bid 12 0.053 4/23/2005 11:59PM Traded 12 0.053
02/21/05 10:04:35 FedPolicyB FRup0305 Bid 7 0.988 3/23/2005 11:59PM Traded 7 0.988
02/21/05 10:04:35 FedPolicyB FRup0305 Traded 3 0.007 3/3/2005 9:23AM
02/21/05 10:06:59 FedPolicyB FRsame0305 Bid 6 0.007 3/23/2005 11:59PM Traded 3 0.007
02/21/05 10:07:51 Comp-Ret AAPL-05b Bid 5 0.998 3/23/2005 11:59PM Cancel-Manager
02/21/05 10:07:51 Comp-Ret AAPL-05b Traded 4 0.889 2/28/2005 8:56:AM
02/26/05 10:14:08 Comp-Ret AAPL-05c Bid 5 0.889 3/28/2005 11:59PM Traded 1 0.889
02/26/05 10:14:30 Comp-Ret MSFT-05c Bid 1 0.889 3/28/2005 11:59PM Traded 1 0.889
02/26/05 10:15:43 MSFT-Price ? Traded 1 0.4 3/5/2005 10:39PM
03/05/05 12:51:45 MSFT-Price MS025-05cL Bid 5 0.4 4/7/2005 11:59PM Traded 4 0.4
03/05/05 12:53:27 Comp-Ret AAPL-05c Ask 4 0.95 7/7/2005 11:59PM Cancel-Manager
03/05/05 12:53:56 Comp-Ret MSFT-05c Ask 1 0.5 7/7/2005 11:59PM Cancel-Manager
03/05/05 12:54:38 FedPolicyB FRsame0505 Ask 12 0.7 9/7/2005 11:59PM Cancel-Manager
03/05/05 12:55:07 FedPolicyB FRsame0305 Ask 6 0.2 9/7/2005 11:59PM Cancel-Manager
03/05/05 12:55:33 FedPolicyB FRup0305 Ask 6 0.998 6/7/2005 11:59PM Traded 6 0.998
03/05/05 12:55:33 FedPolicyB ? Traded 2 0.803 9/16/2005 3:37PM
03/05/05 12:55:33 FedPolicyB ? Traded 5 0.803 9/16/2005 3:34PM
09/16/05 14:38:57 FedPolicyB FRup0905 Bid 12 0.803 9/20/2005 11:59PM Traded 5 0.803
09/16/05 14:39:34 FedPolicyB FRsame0905 Bid 6 0.17 9/22/2005 11:59PM Traded 6 0.17
09/28/05 23:49:01 FedPolicyB FRsame1105 Bid 15 0.066 10/1/2005 11:59PM Traded 15 0.066
10/07/05 12:28:48 FedPolicyB FRsame1105 Ask 15 0.07 10/9/2006 11:59PM Cancel-Manager
10/07/05 12:29:23 FedPolicyB FRup1105 Bid 2 0.95 10/9/2006 11:59PM Cancel-Manager
10/10/05 14:54:45 FedPolicyB FRup1105 Bid 3 0.97 10/12/2005 11:59PM Traded 3 0.97
12/09/05 15:02:02 FedPolicyB FRup1205 Bid 15 0.995 12/12/2005 11:59PM Traded 15 0.995
12/09/05 15:02:20 FedPolicyB FRsame1205 Bid 10 0.002 12/12/2005 11:59PM Traded 10 0.002
12/09/05 15:02:43 FedPolicyB FRdown1205 Bid 2 0.001 12/13/2005 11:59PM Traded 2 0.001
12/09/05 15:02:43 FedPolicyB ? Traded 2 0.719 6/2/2006 8:41:40AM
12/09/05 15:02:43 FedPolicyB ? Traded 10 0.719 6/2/2006 8:39:46AM
05/31/06 21:28:25 FedPolicyB FRup0606 Bid 22 0.719 6/6/2006 11:59PM Traded 10 0.719
08/07/06 21:19:08 FedPolicyB FRup0806 Bid 20 0.27 8/22/2006 11:59PM Traded 20 0.27
08/07/06 21:19:08 FedPolicyB ? Traded 7 0.608 8/8/2006 1:13:17PM
08/07/06 21:19:47 FedPolicyB FRsame0806 Bid 10 0.608 8/9/2006 11:59PM Traded 3 0.608
08/07/06 21:19:47 FedPolicyB ? Traded 7 0.7 8/7/2006 9:52:43PM
08/07/06 21:20:29 FedPolicyB FRsame0906 Bid 10 0.7 8/9/2006 11:59PM Traded 3 0.7
08/07/06 21:20:54 FedPolicyB FRdown0906 Bid 10 0.006 8/9/2006 11:59PM Traded 10 0.006
08/07/06 21:23:04 PRES08-WTA DEM08-WTA Bid 15 0.5 12/23/2006 11:59PM Traded 15 0.5
08/28/06 09:20:10 PRES08-VS UREP08-VS Bid 10 0.48 12/30/2006 11:59PM Traded 10 0.48
08/28/06 09:20:10 PRES08-VS ? Traded 3 0.5 9/19/2006 10:24AM
08/28/06 09:20:26 PRES08-VS UDEM08-VS Bid 10 0.5 12/30/2006 11:59PM Traded 1 0.5
06/01/07 20:00:20 PRES08-WTA DEM08-WTA Ask 10 0.66 9/3/2007 11:59PM Traded 10 0.66
06/01/07 20:01:24 PRES08-WTA DEM08-WTA Ask 5 0.7 6/3/2008 11:59PM Traded 5 0.7
06/01/07 20:02:21 PRES08-WTA REP08-WTA Bid 10 0.33 9/3/2007 11:59PM Traded 10 0.33
06/01/07 20:04:26 RConv08 ROMN-NOM Bid 5 0.2 7/3/2007 11:59PM Traded 5 0.2
06/01/07 20:05:33 DConv08 OBAM-NOM Purchase 5 0.322 6/1/2007 8:05:33PM Traded 1 0.322
06/06/07 23:41:39 DConv08 DConv08 Buy-bundle 3 1 Traded 3 1
06/06/07 23:42:20 DConv08 EDWA-NOM Ask 3 0.1 6/8/2008 11:59PM Traded 3 0.1
06/06/07 23:42:46 DConv08 DROF-NOM Ask 3 0.13 6/8/2008 11:59PM Traded 3 0.13
06/06/07 23:44:29 RConv08 RConv08 Buy-bundle 3 1 Traded 3 1
06/06/07 23:45:12 RConv08 GIUL-NOM Ask 3 0.21 9/20/2007 11:59PM Traded 3 0.21
06/06/07 23:45:34 RConv08 MCCA-NOM Ask 3 0.15 9/20/2007 11:59PM Traded 3 0.15
06/06/07 23:46:55 PRES08-VS UDEM08-VS Ask 4 0.56 6/8/2008 11:59PM Traded 4 0.56
12/11/07 16:08:57 RConv08 HUCK-NOM Ask 3 0.22 12/13/2007 11:59PM Traded 3 0.22
12/11/07 16:10:08 RConv08 ROMN-NOM Ask 4 0.25 12/13/2007 11:59PM Traded 4 0.25
12/11/07 16:14:22 RConv08 RROF-NOM Ask 3 0.03 12/13/2007 11:59PM Traded 3 0.03
12/11/07 16:16:12 RConv08 MCCA-NOM Bid 5 0.1 12/13/2008 11:59PM Traded 5 0.1
12/11/07 16:16:57 RConv08 RConv08 Buy-bundle 5 1 12/11/2007 4:16PM Traded 5 1
12/11/07 16:17:39 RConv08 GIUL-NOM Sell 5 0.375 12/11/2007 4:17PM Traded 5 0.375
12/11/07 16:18:01 RConv08 HUCK-NOM Sell 5 0.207 12/11/2007 4:18PM Traded 5 0.207
12/11/07 16:18:10 RConv08 MCCA-NOM Sell 5 0.108 12/11/2007 4:18PM Traded 5 0.108
12/11/07 16:18:22 RConv08 ROMN-NOM Sell 5 0.241 12/11/2007 4:18PM Traded 5 0.241
12/11/07 16:18:33 RConv08 THOMF-NOM Sell 5 0.04 12/11/2007 4:18PM Traded 5 0.04
12/11/07 16:18:46 RConv08 RROF-NOM Sell 5 0.02 12/11/2007 4:18PM Traded 5 0.02
12/11/07 16:19:03 RConv08 ROMN-NOM Sell 4 0.24 12/11/2007 4:19PM Traded 4 0.24
12/11/07 16:20:28 DConv08 DConv08 Buy-bundle 10 1 12/11/2007 4:20PM Traded 10 1
12/11/07 16:20:51 DConv08 DROF-NOM Ask 10 0.03 12/13/2008 11:59PM Traded 10 0.03
12/11/07 16:20:51 DConv08 ? Traded 5 0.09 12/19/2007 3:34PM
12/11/07 16:21:31 DConv08 EDWA-NOM Ask 10 0.09 12/13/2008 11:59PM Traded 5 0.09
12/11/07 16:21:31 DConv08 ? Traded 1 0.58 12/11/2007 9:40PM
12/11/07 16:21:31 DConv08 ? Traded 9 0.58 12/11/2007 9:40PM
12/11/07 16:25:21 DConv08 CLIN-NOM Ask 13 0.58 12/13/2008 11:59PM Traded 3 0.58
12/11/07 16:26:08 DConv08 OBAM-NOM Ask 14 0.45 12/13/2008 11:59PM Traded 14 0.45
12/11/07 16:27:05 DConv08 OBAM-NOM Bid 5 0.3 12/31/2007 11:59PM Traded 5 0.3
12/11/07 16:28:51 FedPolicyB FRsame0108 Bid 3 0.31 12/31/2007 11:59PM Traded 3 0.31
02/05/08 22:41:41 RConv08 THOMF-NOM Sell 3 0.002 2/5/2008 10:41PM Traded 3 0.002
02/05/08 22:47:46 DConv08 OBAM-NOM Bid 10 0.42 2/7/2008 11:59PM Traded 10 0.42
02/05/08 22:48:09 DConv08 OBAM-NOM Bid 5 0.43 2/7/2008 11:59PM Traded 5 0.425
02/07/08 14:46:34 DConv08 DConv08 Buy-bundle 5 1 2/7/2008 2:46PM Traded 5 1
02/07/08 14:47:21 DConv08 EDWA-NOM Sell 5 0.002 2/7/2008 2:47PM Traded 5 0.002
02/07/08 14:47:34 DConv08 DROF-NOM Sell 5 0.006 2/7/2008 2:47PM Traded 5 0.006
02/07/08 14:47:54 DConv08 OBAM-NOM Ask 15 0.6 2/9/2008 11:59PM Traded 15 0.6
02/07/08 15:11:51 PRES08-WTA REP08-WTA Ask 10 0.51 2/9/2009 11:59PM Traded 10 0.51
02/07/08 15:13:24 RConv08 RConv08 Buy-bundle 4 1 2/7/2008 3:13PM Traded 4 1
02/07/08 15:13:42 RConv08 GIUL-NOM Sell 4 0.001 2/7/2008 3:13PM Traded 4 0.001
02/07/08 15:13:49 RConv08 HUCK-NOM Sell 4 0.017 2/7/2008 3:13PM Traded 4 0.017
02/07/08 15:13:58 RConv08 ROMN-NOM Purchase 4 0.005 2/7/2008 3:13PM Traded 4 0.005
02/07/08 15:14:06 RConv08 THOMF-NOM Sell 4 0.003 2/7/2008 3:14PM Traded 4 0.003
02/07/08 15:14:14 RConv08 RROF-NOM Sell 4 0.009 2/7/2008 3:14PM Traded 4 0.009
02/07/08 15:14:29 RConv08 RConv08 Buy-bundle 1 1 2/7/2008 3:14PM Traded 1 1
02/07/08 15:14:44 RConv08 ROMN-NOM Sell 9 0.002 2/7/2008 3:14PM Traded 9 0.002
02/07/08 15:14:54 RConv08 GIUL-NOM Sell 1 0.001 2/7/2008 3:14PM Traded 1 0.001
02/07/08 15:15:02 RConv08 HUCK-NOM Sell 1 0.017 2/7/2008 3:15PM Traded 1 0.017
02/07/08 15:15:10 RConv08 THOMF-NOM Purchase 1 0.006 2/7/2008 3:15PM Traded 1 0.006
02/07/08 15:15:22 RConv08 RROF-NOM Sell 1 0.009 2/7/2008 3:15PM Traded 1 0.009
02/07/08 15:15:30 RConv08 THOMF-NOM Sell 2 0.003 2/7/2008 3:15PM Traded 2 0.003
04/06/08 13:52:28 DConv08 CLIN-NOM Ask 5 0.15 4/8/2008 11:59PM Traded 4 0.15
04/06/08 13:52:51 DConv08 CLIN-NOM Ask 1 0.14 4/8/2008 11:59PM Traded 1 0.14
04/06/08 13:52:51 DConv08 ? Traded 3 0.79 4/10/2008 6:45PM
04/06/08 13:55:08 DConv08 OBAM-NOM Bid 5 0.79 4/8/2009 11:59PM Traded 2 0.79
04/06/08 13:59:43 RConv08 RConv08 Buy-bundle 10 1 4/6/2008 1:59PM Traded 10 1
04/06/08 14:00:27 RConv08 GIUL-NOM Sell 10 0.004 4/6/2008 2:00PM Traded 10 0.004
04/06/08 14:00:41 RConv08 HUCK-NOM Sell 10 0.007 4/6/2008 2:00PM Traded 10 0.007
04/06/08 14:00:54 RConv08 ROMN-NOM Sell 10 0.01 4/6/2008 2:00PM Traded 10 0.01
04/06/08 14:01:07 RConv08 THOMF-NOM Sell 10 0.004 4/6/2008 2:01PM Traded 10 0.004
04/06/08 14:01:20 RConv08 RROF-NOM Sell 10 0.025 4/6/2008 2:01PM Traded 10 0.025
04/14/08 13:51:41 DConv08 OBAM-NOM Bid 3 0.78 4/16/2008 11:59PM Traded 3 0.78
05/03/08 12:06:18 DConv08 OBAM-NOM Ask 18 0.78 5/5/2008 11:59PM Traded 18 0.78
05/05/08 20:21:52 RConv08 MCCA-NOM Ask 20 0.94 5/7/2008 11:59PM Traded 20 0.94
05/20/08 15:44:10 PRES08-VS UREP08-VS Sell 10 0.483 5/20/2008 3:44PM Traded 1 0.483
05/20/08 15:45:29 PRES08-VS UREP08-VS Sell 10 0.482 5/20/2008 3:45PM Traded 9 0.482
### Intrade
In 2010, I signed up for Intrade since the IEM was too small and had too few contracts to maintain my interest.
#### Payment
Paying Intrade, a foreign company in Ireland, was a little tricky. I first looked into paying via debit card, but Intrade demanded considerable documentation, so I abandoned that approach. I then tried a bank transfer since that would be quick; but my credit union failed me and said Intrade had not provided enough information (which seemed unlikely to me, and Intrade's customer service agreed) - and even if they had, they would have charged me $10! Finally, I decided to just snail-mail them a check. I was pleasantly surprised to see that postage to Ireland was ~$1, and it made it there without a problem. But very slowly: perhaps 15 days or so before the check finally cleared and my initial $200 was deposited.
#### My Intrade trading
Intrade has a considerably less usable system than IEM. In IEM, selling short is very easy: you purchase a pair of contracts (yes/no) which together sum to $1, and then you sell off the one you don't want. If I think DEM08 is too high compared to REP08, I get 1 share of each and sell the DEM08. Intrade, on the other hand, requires you to 'sell' a share. I don't entirely understand it, but it *seems* to be equivalent.
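A worked sketch of why the two mechanisms come to the same thing, assuming a 2-contract winner-take-all market whose bundle costs $1 (hypothetical numbers):

~~~{.python}
def iem_style_short(sale_price, shorted_side_wins):
    """Short DEM08 the IEM way: buy a (DEM, REP) bundle for $1, then sell
    the DEM contract at `sale_price`. You are left holding REP, which
    pays $1 iff DEM loses -- exactly the payoff of an Intrade-style short."""
    net_cost = 1.0 - sale_price                 # bundle cost minus sale proceeds
    payout = 0.0 if shorted_side_wins else 1.0  # your remaining REP share's settlement
    return payout - net_cost

print(iem_style_short(0.66, shorted_side_wins=False))  # +0.66: DEM lost, the short paid off
print(iem_style_short(0.66, shorted_side_wins=True))   # -0.34: DEM won, you ate the loss
~~~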
I wanted to sell short some of the more crazy probabilities, such as Japan going nuclear or the USA attacking North Korea or Iran, but it turned out that to make even small profits on them I would have to hold them a long time, and because their probabilities were so low already, Intrade was demanding large [margins](!Wikipedia "Margin (finance)") - to put on 4 or 5 shorts would lock up half my account!^[The problem is that if a contract is at 10% and you short it, then if the contract actually pays off you have to come up with the full 100% to pay the buyer their winnings. Intrade, to guarantee them payment, keeps your 10% sale proceeds and freezes another 90% of your account per contract.]
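The footnote's margin arithmetic as a sketch (payoff normalized to $1; this is the logic, not Intrade's actual margining code):

~~~{.python}
def short_margin(price, n):
    """Worst-case collateral for shorting `n` contracts at `price`
    (as a fraction of a payoff normalized to $1): if the event happens,
    each short owes the full $1, so the exchange holds your sale
    proceeds plus the remainder of the potential loss."""
    proceeds = price * n         # cash received for selling
    liability = 1.0 * n          # owed if the event occurs
    return liability - proceeds  # extra account funds frozen

# Shorting 10 contracts of a 10% long-shot freezes 10 * $0.90 = $9.00
# of your own money, to chase at most 10 * $0.10 = $1.00 of profit:
print(short_margin(0.10, 10))
~~~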
My first trade was to sell short the [Intrade contract](https://www.intrade.com/jsp/intrade/common/c_cd.jsp?conDetailID=702407&z=1285870999458) on [California Proposition 19 (2010)](!Wikipedia), which would legalize non-medical marijuana possession. I reasoned that California had recently banned gay marriage at the polls, and medical marijuana is well-known as a joke (lessening the incentive to pass Prop 19), and that its true probability of passing was more like 30% - well below its current price. The contract would expire in just 2 months, making it even more attractive.
It was at 49 when I shorted it. I put around 20% of my portfolio (~\$40) into the position after consulting the [Kelly criterion](#how-much-to-bet). 2 days later, the price had increased to 53.3, and on 4 October, it had spiked all the way to 76%. I began to seriously consider how confident I was in my prediction, and whether I was faced with a choice between losing the full $40 I had invested or buying shares at 76% (to fulfill my shorting contracts) and eating the loss of ~$20. I meditated, and reasoned that there wasn't *that* much liquidity and I had found no germane information online (like a poll registering strong public support), and decided to hold onto my shares. As of 27 October, the price had plummeted all the way to 27%, and continued to bounce around the 25-35% price range. I had at the beginning decided that the true probability was in the 30% decile, and if anything, it was now *underpriced*. Given that, I was running a risk holding onto my shorts. So on 30 October, I bought 10 shares at 26%, closing out my shorts, and netting me $75.83, for a return of $25.83, or 50% over the month I held it.
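For concreteness, the arithmetic of closing out such a short early (a sketch using Intrade's quotation in points worth $0.10 apiece; my reported $25.83 also reflects exact fills and fees):

~~~{.python}
POINT_VALUE = 0.10  # Intrade quotes prices in points; each point is worth $0.10

def close_short(entry_pts, exit_pts, n_contracts):
    """Profit from shorting `n_contracts` at `entry_pts` and later
    buying them back at `exit_pts` to close the position."""
    return (entry_pts - exit_pts) * POINT_VALUE * n_contracts

# Short around 49, cover at 26: a 23-point move on 10 contracts = ~$23.
print(close_short(49, 26, 10))
~~~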
My second trade dipped into the highly liquid 2012 US presidential elections. The partisan contracts were trading at ~36% for the Republicans and ~73% for the [Democrats](https://www.intrade.com/jsp/intrade/common/c_cd.jsp?conDetailID=639648&z=1285871684768). I would agree that the true odds are >50% for the Democrats since presidents are usually re-elected and the Republicans have few good-looking candidates compared to Obama, who has accomplished quite a bit in office. However, I think 73% is overstated, and further, that the markets always panic during an election and squish the ratio to around 50:50. So I sold Democrat and bought Republican. (I wound up purchasing more Republican contracts than selling Democrat contracts because of the aforementioned margin issues.)
I bought 5 Reps at 39, and shorted 1 Dem at 60.8. 2 days later, they had changed to 37.5 and 62.8 respectively. By 26 November 2010, it was 42 and 56.4. By 1 January 2011, Republicans was at 39.8 and Democrats at 56.8.
Finally, I decided that Sarah Palin has next to no chance at the Republican nomination since she blew a major hole in her credentials by her bizarre resignation as governor, and [her shares](https://www.intrade.com/jsp/intrade/common/c_cd.jsp?conDetailID=652756&z=1285872122100) at 18% were just crazy.
I shorted 10 at 18% since I thought the true odds are more like [10%](http://predictionbook.com/predictions/460). 2 days later, they had risen to 19%. By 26 November, they were still at 19%, but the odds of her [announcing a candidacy](https://www.intrade.com/jsp/intrade/common/c_cd.jsp?conDetailID=686537&z=1290800258450) had risen to 75%. I'd put the odds of her announcing a run at [~90%](http://predictionbook.com/predictions/378) (a mistake, given that she ultimately decided against running in October 2011), but I don't have any spare cash to buy contracts. I *could* sell out of the anti-nomination contracts and put that money into announcement, but I'm not sure this is a good idea - the announcement is very volatile, and I dislike eating the fees. She hasn't done too well as the Tea Party eminence grise, but maybe she prefers it to the hard work of a national campaign?
By 1 January 2011, the nominee odds were still stuck at 18% but the announcement had fallen to 62%. The latter is dramatic enough that I'm wondering whether my 90% odds really are correct (they probably weren't). By June, I had begun to think that Palin knows she has little chance of winning either the nomination or the presidency, and is just milking the speculation for all it's worth. Checking on 8 June, I see that the odds of an announcement have fallen from 62% to 33% and a nomination from 18% to 5.9% - so I would have made out very nicely on the nomination contract had I held the short, but been mauled if I had made any shorts on the announcement. I am not sure what lesson to draw from this observation; probably that I am better at assessing outcomes based on a great many people (like a nomination) than outcomes based on a single individual's psychology (like whether to announce a run or not).
##### Cashing out
In January 2011[^lw], Intrade announced a new fee structure - instead of paying a few cents per trade, one has free trading but your account is charged [$5 every month](http://www.intrade.com/jsp/intrade/help/index.jsp?page=general.html%23fees) or $60 a year (see also the [forum announcement](http://bb.intrade.com/intradeForum/posts/list/4797.page#44860)). Fees have been a problem with Intrade in the past due to the small amounts usually wagered - see for example financial journalist [Felix Salmon](!Wikipedia)'s [2008 complaints](http://www.portfolio.com/views/blogs/market-movers/2008/12/03/the-problem-with-intrade/).
[^lw]: This section first appeared on [LessWrong.com](http://lesswrong.com/) as ["2011 Intrade fee changes, or, Intrade considered no longer useful for LessWrongers"](http://lesswrong.com/r/discussion/lw/3l2/2011_intrade_fee_changes_or_intrade_considered_no/) and includes some discussion.
Initially, the new changes didn't seem so bad to me, but then I compared the annual cost of this fee to my trading stake, ~\$200. I would have to earn a return of 30% just to cover the fee! (This is also pointed out by many in the forum thread above.)
I don't trade very often since I think I'm best at spotting mispricings over the long term (the CA Proposition 19 contract being a case in point: despite being ultimately correct, I could have been mauled by some of the spikes if I had tried only short-term trades). If this fee had been in place since I joined, I would be down by $30 or $40.
I'm confident that I can earn a good return like 10 or 20%, but I can't do >30% without taking tremendous risks and wiping myself out.
And more generally, assuming that this isn't raiding accounts[^raid] as a prelude to shutting down (as a number of forumers claim), Intrade is no longer useful for LessWrongers like me as it is heavily penalizing small long-term bets like the ones we are usually concerned with - bets intended to be educational or informative. It may be time to investigate other prediction markets like Betfair, or just resign ourselves to non-monetary/play-money sites like [PredictionBook.com](http://predictionbook.com/).
[^raid]: When I submitted my withdrawal request for my balance, I received an email offering to instead set my account to 'inactive' status such that I could not trade but would not be charged the fee; if I wanted to trade, I would simply be charged that month's \$5. I declined the offer, but I couldn't help wonder - why didn't they simply set all accounts to 'inactive' and then let people opt in to the new fee structure? Or at least set 'inactive' all accounts which have not engaged in any transactions within X months?
Regardless, here are my probabilities for Intrade ending in the next few years:
- [Intrade will close/merge/be sold by 2012](http://predictionbook.com/predictions/2051): 5%
- [Intrade will close/merge/be sold by 2013](http://predictionbook.com/predictions/2052): 8%
- [Intrade will close/merge/be sold by 2015](http://predictionbook.com/predictions/2053): 18%
- [Intrade will not be open for business in 2020](http://predictionbook.com/predictions/4236): 35%
In March 2013 (relevant events post-dating my predictions include the US CFTC attacking Intrade), Intrade announced it was shutting down trading and liquidating all positions. I probably was far too optimistic.
Fortunately for my decision to cash out (I didn't see anything I wanted to risk holding for more than a few weeks), prices had moved enough that I didn't have to take any losses on any positions^[I made $0.31 on DEM.2012, $3.65 on REP.2012, and $1.40 on 2012.REP.NOM.PALIN for a total profit of $5.36.], and I wound up with $223.32. The $5 for January had already been assessed, and there is a 5 euro fee for a check withdrawal, so my check will actually be for something more like $217, a net profit of $17.
I requested my account be closed on 5 January and the check arrived 16 January; the fee for withdrawal was $5.16 and my sum total $218.16 (a little higher than the $217 I had guessed).
### Bitcoin
In May-June 2011, [Bitcoin](!Wikipedia), an online currency, underwent approximately 5-6 doublings of its exchange rate against the US dollar, drawing the interest of much of the tech world and myself. (I had first heard of it when it was at 50 cents to the dollar, but had written it off as not worth my time to investigate in detail.)
During the first doubling, when it hit parity with the dollar, I began reading up on it and acquired a Bitcoin of my own - a donation from Kiba on [#lesswrong](irc://irc.freenode.net#lesswrong) to try out [Witcoin](http://en.bitcoin.it/wiki/Witcoin), a social news site where votes were worth fractions of bitcoins. I then [gave my thoughts](http://lesswrong.com/lw/4cs/making_money_with_bitcoin/3kde) on LessWrong when the topic came up:
> After thinking about it and looking at the current community and the surprising amount of activity being conducted in bitcoins, I estimate that bitcoin has somewhere between 0 and 0.1% chance of eventually replacing a decent size fiat currency, which would put the value of a bitcoin at anywhere upwards of $10,000 a bitcoin. (Match the existing outstanding number of whatever currency to 21m bitcoins. Many currencies have billions or trillions outstanding.) Cut that in half to $5000, and call the probability an even 0.05% (average of 0 and 0.1%), and my expected utility/value for possessing a coin is $25 a bitcoin ($5000 \times 0.005$).
I was more than a little surprised that by June, my expected value had already been surpassed by the market value of bitcoins. Which leads to a tricky question: should I sell now? If Bitcoin is a bubble as frequently argued, then I would be foolish not to sell my 5 bitcoins for a cool $130 (excluding transaction costs). But... I had not expected Bitcoin to rise so much, and if Bitcoin did better than I expected, doesn't it follow that I should no longer believe the probability of success is merely 0.05%? Shouldn't it have increased a bit? Even if it increased only to 0.07%, that would make the EV more like $35 and so I would continue to hold bitcoins.
The stakes are high. It is a curious problem, but it's also a prediction market. One is simply predicting what the ultimate price of bitcoins will be. Will they be worthless, or a global currency? The current price is the probability, against an unknown payoff. To predict the latter, one simply holds bitcoins. To predict the former, one simply sells bitcoins. Bitcoins are not commodities in *any* sense. Buying a cow is not a prediction market on beef because the value of beef can't drop to literally 0: you can always eat it. You can't eat bitcoins or do anything at all with them. They are even more purely money than fiat money (the US government has perpetual problems with the zinc or nickel or copper in its coins being worth more as metal than as coins, and dollar bills are just a tough cotton-linen fabric).
[Mencius Moldbug](http://unqualified-reservations.blogspot.com/2011/04/on-monetary-restandardization.html) turns out to have a similar analysis of the situation:
> If Bitcoin becomes the new global monetary system, one bitcoin purchased today (for 90 cents, last time I checked) will make you a very wealthy individual. You are essentially buying Manhattan for a quarter. There are only 21 million bitcoins (including those not yet minted). (In my design, this was a far more elegant 2^64^, with quantities in exponential notation. Just sayin'.) Mapped to $100 trillion of global money, to pull a random number out of the air, you become a millionaire. Wow!
>
> So even if the probability of Bitcoin succeeding is epsilon, a million to one, it's still worthwhile for anyone to buy at least a few bitcoins now. The currency thus derives an initial value from this probability, and boots itself into existence from pure worthlessness - becoming a viable repository of savings. If a very strange, dangerous and unstable one. I think the probability of Bitcoin succeeding is very low. I would not put it at a million to one, though, so I recommend that you go out and buy a few bitcoins if you have the technical chops. My financial advice is to not buy more than ten^[An aside: there's not much point in accumulating more than, say, 1000 bitcoins. It's generally believed that Bitcoin's ultimate fate will be victory or failure - it'd be very strange if Bitcoin leveled off as a stable permanent alternative currency for only part of the Internet. In such a situation, the difference between 1000 bitcoins and 1500 bitcoins is like the difference to Bill Gates between $60 billion and $65 billion; it matters in some abstract sense, but not even a tiny fraction as much as the difference between $1 and $100 million. Money is logarithmic in utility, as the saying goes.], which should be F-U money if Bitcoin wins.
Bitcoin cumulatively represents my largest ever wager in a prediction market; at stake was >$130 in losses (if bitcoins go to zero), or indefinite thousands. It will be very interesting to see what happens. By 5 August 2011, Bitcoin had worked its way down to around $10/฿, making my net worth $26; I did spend several bitcoins on the [Silk Road](), though. By 23 November 2011, it had trended down to $2.35/฿, but due to a large donation of 20 bitcoins, I spent most of my balance at the Silk Road, leaving me with 4.7 bitcoins. Overall, not a good start.

By July 2012, donations brought my stock up to ฿12.5, with prices trading at $5-7. After an unexpected spike on 17 July to $9, I did some reading and learned that "pirateat40" (the operator of a [possible](http://predictionbook.com/predictions/7485 "pirateat40 will win his bet with vandroiy that BTCST will not default, as judged by nanotube") Ponzi scheme) was [boasting in `#bitcoin`](http://www.bitcointrading.com/forum/talk-bitcoin/confused-about-the-7-8-9-5-btcusd-bubble-answers-here-bitcoin-2012-07-17/) ([Reddit discussion](http://www.reddit.com/r/Bitcoin/comments/wpllp/can_someone_explain_in_simple_terms_what_the_hell/)) of using the funds to manipulate the market in an apparent [pump and dump](!Wikipedia) scheme, and also mocking the ignorance of most buyers and sellers for not paying attention to the Bitcoin forums or IRC channel. pirateat40's manipulation and insinuation of future plans soured me on holding many bitcoins, and I resolved to sell if the price on [MtGox](https://en.bitcoin.it/wiki/MtGox) went quickly back up to >$9; it did so the next day (18 July), and I sold at $9.17. Withdrawing from MtGox turned out to be a major pain, with [Dwolla](!Wikipedia) withdrawal requiring documentation like a passport, and a bank transfer costing $25; I ultimately used the [`#bitcoin-otc`](http://bitcoin-otc.com/) channel to arrange a swap with "nanotube" of my $115 MtGox dollars for an equivalent donation to my PayPal account. The next day, the price had fallen to $7.77; demonstrating why I don't try to time markets, by 11 August the price had jumped to $11.50. This was a little worrisome for my long-term view that there was a good chance the Ponzi scheme would be used in market manipulation or collapse, but there was still much time left. A few days later, the price had spiked as high as $15, and I felt like quite a fool; but that is the marvelous thing about markets - one day you are a genius, and the next you are a fool. Unexpectedly, pirateat40 announced the dissolution of his BTCST. Was it a Ponzi or not? No one knew. Perhaps on fears of that, or perhaps because pirateat40 was fleeing with the funds, on 18-19 August the price began dropping, and kept dropping, all the way through $10, then $9, then $8. Watching this, I resolved to buy back in. It was very difficult to find anyone who would accept PayPal on `#bitcoin-otc`, but ultimately Namegduf agreed to a MtGox voucher swap, and I got $60 which I then spent at $7.8 for ฿7.6.

In late February 2013, Bitcoin was almost at its all-time high of $31, and I happened to also need cash badly; I had received additional donations, so I sold out my ฿5.79 at $31.5 even as the price reached $32 - I just wanted to be out of what might have been another bubble.
I then watched slack-jawed as the bubble failed to pop - failed even to hold its price level - and instead doubled to $60, doubled again to $120, hit $159 on 7 April 2013 (having quintupled since I decided to sell out), and finally peaked at $266 2 days later before falling back down to a steady-state of ~$100. That sale was not a great testament to my market-timing skills, and prompted me to [rethink my opinions about Bitcoin](http://lesswrong.com/lw/h5x/bitcoin_cryonics_fund/8qew#body_t1_8qew). At various points through August 2013, I sold on `#bitcoin-otc` ฿0.5 for $52, ฿0.28 for $50, ฿1.15 for $120, ฿0.5 for $66 & $64, ฿0.25 for $32, ฿0.1 for $13, and ฿1.0 for $127 & $129 - leaving me uncomfortably exposed at ฿18 (having had difficulty finding trustworthy buyers). On 2 October 2013, the news burst that Silk Road had been busted & DPR arrested & charged; Bitcoin immediately began dropping by $20-$40 from ~$127 (depending on exchange), so I purchased ฿2.7 at $105 each.
(One might wonder why I don't use the fairly active [Bets of Bitcoin](https://en.bitcoin.it/wiki/Bets_of_Bitcoin) prediction market; that is because the payout rules are [insane](http://betsofbitco.in/help) and I have no idea how to translate the "total weighted bets" into actual probabilities - betting blind is never a good idea. And I have no interest in ever using BitBet, as they brazenly [steal from users](https://bitcointalk.org/index.php?topic=339544.0;all "BitBet Stole ~$7,000 from me (10 BTC)").)
#### Zerocoin
[A research paper](http://zerocoin.org/media/pdf/ZerocoinOakland.pdf "'Zerocoin: Anonymous Distributed E-Cash from Bitcoin', Miers et al 2013") ([overview](http://blog.cryptographyengineering.com/2013/04/zerocoin-making-bitcoin-anonymous.html)) introduced [zero-knowledge proofs](!Wikipedia) for the destruction of coins in a hypothetical Bitcoin variant ([Zerocoin](http://zerocoin.org/)); this allows the creation of new coins out of nothing while still keeping total coins constant (simply require a proof that for every new coin, an older coin was destroyed). In other words, truly *anonymous* coins, rather than the pseudonymity and trackability of Bitcoin. Existing coin mixes are guaranteed neither to work nor to refrain from stealing your coins, so this scheme could be useful to Bitcoin users and worth adding. Efficiency concerns meant that the original version was impossible to add, but the researchers/developers kept working on it and shrank the proofs to the point where they should be feasible to use. But they also announced they were looking into [launching the functionality into an altcoin](https://twitter.com/matthew_d_green/status/401797786347114496 "We're going to release it as an alt-coin. It will take a few months to get it to that point. Bitcoin can do what it wants.").
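The accounting invariant, though emphatically not the cryptography, fits in a few lines. A toy Haskell sketch in which the zero-knowledge proof is deliberately faked as a plain membership check, so it illustrates only the conservation-of-coins bookkeeping:

```haskell
import qualified Data.Set as Set

-- Toy ledger: an escrow pool of destroyed bitcoins, the public list of
-- mint commitments, and the serial numbers revealed by past spends.
data Ledger = Ledger { pool    :: Int
                     , commits :: Set.Set String
                     , spent   :: Set.Set String }

-- Minting destroys one bitcoin into the pool and publishes a commitment.
mint :: String -> Ledger -> Ledger
mint c l = l { pool = pool l + 1, commits = Set.insert c (commits l) }

-- Spending reveals a fresh serial number plus, in the real protocol, a
-- zero-knowledge proof that it corresponds to *some* commitment in the
-- set without identifying which; here the proof is faked as a direct
-- membership check, which is exactly the linkage Zerocoin avoids.
spend :: String -> String -> Ledger -> Maybe Ledger
spend serial c l
  | c `Set.member` commits l && serial `Set.notMember` spent l =
      Just l { pool = pool l - 1, spent = Set.insert serial (spent l) }
  | otherwise = Nothing
```

Since `spend` only ever releases a coin that `mint` previously locked up, coins minted minus coins spent always equals the pool: no coins are created from nothing.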
This raises a question: would this potential "Zerocoin" altcoin be worth possessing? That is, might it be more than simply a testbed for the zero-knowledge proofs to see how they perform before merging into Bitcoin proper?
I am extremely cynical about altcoins, as being generally pump-and-dump schemes like Litecoin; I except [Namecoin](!Wikipedia), because distributed domain names are an interesting application of the global ledger, and the proof-of-stake altcoins, as interesting experiments on alternatives to Bitcoin's proof-of-work solution. Anonymity seems to me to be even more important than Namecoin's DNS functionality - witness the willingness of people to pay the fees of laundries like Bitcoin Fog without even a guarantee that they will receive safe bitcoins back (or look at the Tor network itself). So I see basically a few possible long-term outcomes:
1. Zerocoin fizzles out and the network disintegrates because no one cares
2. Zerocoin core functionality is captured in Bitcoin and it disintegrates because it is now redundant
3. Zerocoin survives as an anonymity layer: people buy zerocoins with tainted bitcoins, then sell the zerocoins for unlinked bitcoins
4. Zerocoin replaces Bitcoin
Probability-wise, I'd rank outcome #1 as the most likely; #2 as likely but not very likely, because the Bitcoin Foundation seems increasingly beholden to corporate and government overseers and, even if not actively opposed, will engage in motivated reasoning, looking for reasons to reject Zerocoin functionality and avoid rocking its boat; #3 seems a little less likely, since people can use the laundries or alternative tumbling solutions like [CoinJoin](https://bitcointalk.org/index.php?topic=279249.0;all "CoinJoin: Bitcoin privacy for the real world"), but still fairly probable; and #4 very improbable, like 1%.
To elaborate a little more on the reasoning for believing #2 unlikely: my belief that the Foundation & core developers are not keen on Zerocoin is based on my personal intuition about a number of things:
- the decision by the Zerocoin developers to pursue an altcoin at all, which would be a massive waste of effort *if* they had no reason to expect it to be hard to merge in (or if the barriers to Zerocoin use were purely technical); the altcoin is a very recent decision, and they were clear upfront that "Zerocoin is not intended as a replacement for Bitcoin" (written 11 April 2013).
- [the iron law of oligarchy](!Wikipedia), which suggests that the Foundation & core developers may be gradually shifting into an accommodationist mode of thought - attending government hearings to defend Bitcoin, repeatedly stating Bitcoin is not anonymous but pseudonymous and so is no threat to the status quo (which is misleading and, even technically interpreted, would be torpedoed by Zerocoin), and discussing whitelisting addresses. To put it crudely, we may be in the early stages of them "selling out": moderating their positions and cooperating with the Powers That Be to avoid rocking the boat and achieve things they value more, like mainstream acceptance & praise. (I believe something very similar happened to [Wikipedia's WikiMedia Foundation after the Seigenthaler incident](In Defense Of Inclusionism).)
- the lack of any really positive statements about Zerocoin, despite the technical implications: the holy grail achieved - truly anonymous decentralized digital cash! With Zerocoin added in, the impossible will have become possible. It says a lot about how far Bitcoin has drifted from its libertarian cypherpunk roots that Zerocoin is not a top priority.
Price-wise, #1 and #2 mean zerocoins go to zero, but on the plus side, mining or buying at least signals support and may have positive effects on the Foundation or Bitcoin community. Outcome #4 (replacing Bitcoin) obviously means ludicrous profits as Zerocoin goes from pennies or a dollar each to $500+ (assuming for convenience Zerocoin also sets 21m coins). Interestingly, outcome #3 (anonymity layer) *also* means substantial profits, because the price of zerocoins will be more than pennies due to the float from Bitcoin users washing coins. Imagine that there are 1m zerocoins actively traded, that Bitcoin users want to launder $10m of bitcoins a year, and that it takes on average a day for each Bitcoin user to finish moving in and out of zerocoins; then at any given time there is $27,378 locked up in zerocoins, and spread over the 1m zerocoins, each zerocoin must be worth ~3¢ from the float alone (which is a nice profit for anyone who, say, bought zerocoins at 1¢ after the Zerocoin genesis block).
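The same back-of-the-envelope float calculation, parameterized; the $10m/year, 1-day, and 1m-coin figures are the same hypothetical numbers as above:

```haskell
-- Back-of-the-envelope float valuation: if $X of bitcoins are washed
-- per year, dwelling an average of d days as zerocoins, then roughly
-- X * (d/365) dollars are locked up at any moment; spread over the
-- coins outstanding, the float alone sets a price floor.
floatValue :: Double -> Double -> Double -> Double
floatValue annualVolume avgDays coins = annualVolume * avgDays / 365 / coins

-- floatValue 10000000 1 1000000 ~= 0.027, ie. ~3 cents per zerocoin
```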
I personally think Bitcoin should incorporate Zerocoin if the resource requirements are not too severe, and supporting Zerocoin may help this happen. And if Bitcoin doesn't incorporate it, then the altcoin may well be profitable. In either case, I benefit. So if/when the Zerocoin genesis block is released, I will consider trying to mine it or establishing a price floor (eg publicly committing $100 to buying zerocoins at 1¢ from any and all comers).
Predictions:
- Zerocoin as functioning altcoin network within a year: [65%](http://predictionbook.com/predictions/22178)
- Zerocoin market cap >$7,700,000,000 within 5 years (conditional on launch): [1%](http://predictionbook.com/predictions/22179)
- Zerocoin market cap >$7,000,000 within 5 years (conditional on launch): [7%](http://predictionbook.com/predictions/22180)
- Zerocoin functionality incorporated into Bitcoin within 1 year: [33%](http://predictionbook.com/predictions/22181)
- Zerocoin functionality incorporated into Bitcoin within 5 years: [45%](http://predictionbook.com/predictions/22182)
# Personal bets
> Overall, I am for betting because I am against [bullshit](!Wikipedia "Bullshit#Harry Frankfurt's concept"). Bullshit is polluting our discourse and drowning the facts. A bet costs the bullshitter more than the non-bullshitter so the willingness to bet signals honest belief. A bet is a tax on bullshit; and it is a just tax, tribute paid by the bullshitters to those with genuine knowledge.^[Alex Tabarrok, ["A Bet is a Tax on Bullshit"](http://marginalrevolution.com/marginalrevolution/2012/11/a-bet-is-a-tax-on-bullshit.html)]
Besides prediction markets, one can make person-to-person bets. These are not common because they require a degree of trust, due to the issues of who will judge a bet & [counterparty risk](!Wikipedia), and I have not found many people online that I would be willing to bet with, or vice versa. Below is a list of attempts:
Person Bet Accepted Date offered Expiration Theirs My $ My _P_ Bet Position Result Notes
-------------- -------------------------------------------------------------- -------- -------------- ----------- ------ ---- --------- ---------- ---------- -----------------
mostlyacoustic Entrance fee/RSVP required at NYU lecture. No 3 March 2011 2 days $5 $100 <5% Against Win [LW discussion](http://lesswrong.com/lw/4mi/eliezer_yudkowsky_and_michael_vassar_at_nyu/3mg9)
Eliezer Yudkowsky _HP MoR_ will win Hugo for Best Novel 2013-2017 Yes 12 April 2012 5 Sep 2017 $5 $100 5% Against [LW discussion](http://lesswrong.com/lw/bfo/harry_potter_and_the_methods_of_rationality/6bcw)
Filipe Cosma Shalizi believes that P=NP Yes 4 June 2012 1 week $100 $100 1% Against Win I forgave the amount due to his personal circumstances.
mtaran Kim Suozzi's donation solicitations not a scam No 19 August 2012 1 Jan 2013 $10 $100 90% Against Win [LW discussion](http://lesswrong.com/lw/e5d/link_reddit_help_me_find_some_peace_im_dying_young/786d); in negotiating the details, mtaran didn't seem to understand betting, so the bet fell through.
chaosmosis Mitt Romney lose 2012 Presidential election No 15 Oct 2012 3 Nov 2013 $30 $20 70% For Win
David Lee >1m people using Google Glass-style HUD in 10 years. No 8 June 2013 10 years ? ? 50% Against [_Forbes_](http://www.forbes.com/sites/haydnshaughnessy/2013/05/08/what-is-driving-the-google-stock-price-up/) discussion; Lee's cavalier acceptance of 100:1 odds indicated he was not serious, so I declined.
chaosmosis _HP MoR_: the dead character Hermione to reappear as ghost No 30 June 2013 1 year ? $25 30% Against Win [Reddit discussion](http://www.reddit.com/r/HPMOR/comments/1hc6x6/spoiler_discussion_thread_for_ch_8889/cat7wsz)
jacoblyles MIRI/CFAR to evolve into terrorist organizations No 18 Oct 2012 30 years ? <$1000 <1% Against [LW discussion](http://lesswrong.com/lw/bql/our_phyg_is_not_exclusive_enough/7msl?context=1#7msl)
Patrick Robotham Whether I could prove to a third party that I took an economics course Yes 20 Sep 2013 immediate $50 $10 50% Against Loss
Mparaiso >30 [Silk Road]()-related arrests in the year after the bust No 8 Oct 2013 1 Oct 2014 $20 $100 20% Against [offer](https://news.ycombinator.com/item?id=6518399), [PB.com](http://predictionbook.com/predictions/21664)
qwertyoruiop Bitcoin ≤$50/฿ between October & December 2013 Yes 19 Oct 2013 19 Dec 2013 ฿0.1 ฿0.1 5% Against Win [PB.com](http://predictionbook.com/predictions/21779 "The price of Bitcoin will be <=$50 before 20 December 2013: 5%"); [signed contract](http://pastebin.com/raw.php?i=0Psiuupw); qwertyoruiop paid early as once Bitcoin reached a peak of $900, it was obviously not going to be ≤$50 again, as indeed it was not.
everyone Sheep Marketplace to shut down in 6 months No 30 Oct 2013 30 Apr 2014 ฿2.3 ฿1.0 40% For Loss [Reddit post](http://www.reddit.com/r/SilkRoad/comments/1pko9y/the_bet_bmr_and_sheep_to_die_in_a_year/)
* Sheep Marketplace to shut down in 12 months No 30 Oct 2013 30 Oct 2014 ฿0.66 ฿1.0 50% For Win *
* BlackMarket Reloaded to shut down in 6 months No 30 Oct 2013 30 Apr 2014 ฿3.0 ฿1.0 35% For Win? *
* BlackMarket Reloaded to shut down in 12 months No 30 Oct 2013 30 Oct 2014 ฿1.5 ฿1.0 50% For Win? *
Delerrar Nanotube is providing escrow for the 4 BMR/Sheep bets No 30 Oct 2013 31 Oct 2013 ฿0.1 ฿0.1 <5% For Win [Offer on Reddit](http://www.reddit.com/r/SilkRoad/comments/1pko9y/the_bet_bmr_and_sheep_to_die_in_a_year/cd3ar75)
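The stakes columns implicitly encode odds: when I stake my amount against theirs on an event not happening, the wager is positive-expected-value only if my probability for the event is below theirs/(theirs + mine). A minimal sketch of that arithmetic (the variable names are mine):

```haskell
-- Your expected value from staking `mine` against `theirs`, where you
-- lose `mine` if the event happens (probability p) and win `theirs`
-- otherwise:
betEV :: Double -> Double -> Double -> Double
betEV theirs mine p = (1 - p) * theirs - p * mine

-- The bet is +EV exactly when p is below the break-even probability:
breakEven :: Double -> Double -> Double
breakEven theirs mine = theirs / (theirs + mine)

-- eg. the first row: risking $100 to win $5 pays only if
-- breakEven 5 100 ~= 0.048, ie. you must believe P(event) < ~4.8%.
```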
# Predictions
> "I recall, for example, suggesting to a regular loser at a weekly poker game that he keep a record of his winnings and losses. His response was that he used to do so but had given up because it proved to be unlucky." --Ken Binmore, [_Rational Decisions_](http://www.amazon.com/Rational-Decisions-Gorman-Lectures-Economics/dp/0691149895/)
Markets teach humility to all except those who have very good or very poor memories. Writing down precise predictions is like [spaced repetition](Spaced repetition): it's brutal to do because it is almost a paradigmatic long-term activity, being wrong is *physically* unpleasant[^dopamine], and it requires 2 skills: formulating precise predictions, and then actually predicting. (For spaced repetition: writing good flashcards, and then actually regularly reviewing.) There are lots of exercises to try (calibrating yourself using trivia questions on obscure historical events, geography, etc.), but they only take you so far; it's the real-world near-term and long-term predictions that give you the most food for thought, and those require a year or three at minimum. I've used PB heavily for 11 months now, and I used prediction markets for years before PB, and only now do I begin to feel like I am getting a grasp on predicting. We'll look at these alternatives.
[^dopamine]: The famous neurotransmitter [dopamine](!Wikipedia "Dopamine#Motivation and pleasure") is intimately involved with feelings of happiness and pleasure (which is why dopamine is affected by most addictions or addictive drugs). It also is involved in learning - make an error and no dopamine for you; ["Midbrain Dopamine Neurons Encode a Quantitative Reward Prediction Error Signal"](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1564381/) (Bayer & Glimcher 2005, _Neuron_):
> The midbrain dopamine neurons are hypothesized to provide a physiological correlate of the reward prediction error signal required by current models of [reinforcement learning](!Wikipedia). We examined the activity of single dopamine neurons during a task in which subjects learned by trial and error when to make an eye movement for a juice reward. We found that these neurons encoded the difference between the current reward and a weighted average of previous rewards, a reward prediction error, but only for outcomes that were better than expected. Thus, the firing rate of midbrain dopamine neurons is quantitatively predicted by theoretical descriptions of the reward prediction error signal used in reinforcement learning models for circumstances in which this signal has a positive value. We also found that the dopamine system continued to compute the reward prediction error even when the behavioral policy of the animal was only weakly influenced by this computation.
## Prediction sites
> "The best salve for failure - to have quite a lot else going on."^[[Alain de Botton](https://twitter.com/#!/alaindebotton/status/112772752041181184)]
Besides the specific mechanism of prediction markets, one can just make and keep track of predictions oneself. Such predictions are much cheaper than prediction markets or informal betting, and correspondingly tend to elicit many more responses.^[For example, no one has actually taken up [Kevin's offer](http://lesswrong.com/r/discussion/lw/6g0/making_an_amanda_knox_prediction_market/) to wager on the outcome of [Amanda Knox's](!Wikipedia "Amanda Knox") appeal, while there are dozens of specific probabilities given in an [earlier survey](http://lesswrong.com/r/lesswrong/lw/1ir/you_be_the_jury_survey_on_a_current_event/).]
There are a number of relevant websites I have a little experience with; some aspire to be like David Brin's proposed [prediction registries](http://www.davidbrin.com/predictionsregistry.htm), some do not:
1. [PredictionBook](http://www.predictionbook.com) (PB) is a general-purpose free-form prediction site. PB is a site intended for personal use and small groups registering predictions; the hope was that LessWrongers would use it whenever they made predictions about things (as they ought to in order to keep their theories grounded in reality). It hasn't seen much uptake, though not for the lack of my trying.
I [personally use it](http://predictionbook.com/users/gwern) heavily and have input somewhere around 1000 predictions, of which around 300 have been judged. (I apparently am rather *under*confident.) A good way to get started is to go to the list of [upcoming predictions](http://predictionbook.com/predictions/future) and start entering in your own assessment; this will give you feedback quickly.
2. [Long Bets](http://longbets.org/)
I find the Long Bets concept interesting, but it has serious flaws for anyone who wants to do more than make a public statement like [Warren Buffett has](http://longbets.org/362/). Forcing people to put up money has kept real-money prediction markets pretty small in both participants and volume - and how much more so when all proceeds go to charity? No wonder that, half a decade or more later, there are only a few hundred money-bets going, even with prominent participants like Buffett. Non-money markets or prediction registries can work in the higher volumes necessary for learning to predict better: I have single-handedly made on PB 10 times the number of predictions on all of Long Bets. Where will I learn & improve more, Long Bets or PB? (It was easy for me to borrow all the decent predictions and register them on PB.)
3. [FutureTimeline](http://www.futuretimeline.net/) is a maintained list of projected technological milestones, events like the Olympics, and mega-construction deadlines.
FutureTimeline does not assign any probabilities and doesn't attempt to track which came true; hence, it's more of a list of suggestions than predictions. I have copied over many of the more falsifiable ones to PB.
4. WrongTomorrow: a site that was devoted solely to registering and judging predictions made by pundits (such as the infamous [Tom Friedman](!Wikipedia "Friedman (unit)")).
Unfortunately, WT was moderated, and when it didn't see a sudden massive surge in contributions, moderation fell badly behind until eventually the server was simply turned off in favor of the author's other projects. I still managed to copy a number of predictions off it into PB, however. WT is an example of a general failure mode for collections of predictions: no follow-through. Predictions are the paradigmatic [Long Content](About#long-content), and WT will probably not be the last site to learn this the hard way.
And the last site demonstrates why Brin's prediction registries have not come into existence. One of the few approximations to a prediction registry, [Philip Tetlock](!Wikipedia)'s justly famous 2005 book [_Expert Political Judgment: How Good Is It? How Can We Know?_](http://www.amazon.com/Expert-Political-Judgment-Good-Know/dp/0691128715/) - which discusses an ongoing study tracking >28,000 predictions by >284 experts - proves why: experts are not accurate and [can be outperformed](http://www.overcomingbias.com/2006/11/foxes_vs_hedgho.html) by [embarrassingly simple models](http://lesswrong.com/lw/3gv/statistical_prediction_rules_outperform_expert/), and they do not learn from their experience, attempting to retroactively justify their predictions with reference to counterfactuals. (If wishes were fishes... Predictions are about the real world, and in the real world, hacks and bubbles are normal, expected phenomena. A verse I saw somewhere runs: "Since the beginning / not one unusual thing has happened". If your predictions can't handle normal exogenous events, then they are still wrong. Tetlock identifies this as a common failure mode of hedgehog-style experts: "I was actually right! but for X Y Z...") And looking around, I think I agree with [Eliezer Yudkowsky](http://lesswrong.com/lw/hi/futuristic_predictions_as_consumable_goods/) that when the vast majority of people make a prediction, it is not an actual prediction to be judged right or wrong, but an entertaining [performative utterance](!Wikipedia) intended to [signal partisan loyalties](http://www.overcomingbias.com/2009/08/tetlock-wisdom.html).
Another feature worth mentioning is that prediction sites do not generally allow *retrospective* predictions, because that is easily abused even by the honest (who may be suffering [confirmation bias](!Wikipedia)); prediction markets, needless to say, universally ban retrospective predictions. So predicting generally doesn't give fast feedback: intrinsically, you can't learn very much from short-term predictions, because either there's serious randomness involved, such that it takes hundreds of predictions to begin to improve, or the predictions are so over-determined by available information that one learns little from the successes.
### Prediction sources
A short list of sites which make it easy to find newly-created predictions or (for quicker gratification & calibration) predictions which are about to reach their due dates:
- PredictionBook.com: [new](http://predictionbook.com/predictions)/[upcoming](http://predictionbook.com/predictions/future)
- Bets of Bitcoin: [new](http://betsofbitco.in/list?status=available&category=All&sorting=-moderationTime)/[upcoming](http://betsofbitco.in/list?status=available&category=All&sorting=deadlineTime)
- Inkling: [new](http://home.inklingmarkets.com/recent/markets)/[upcoming](http://home.inklingmarkets.com/expiring/markets)
- NITLE: [new](http://markets.nitle.org/markets)/[upcoming](http://markets.nitle.org/expiring/markets)
- iPredict: [new](https://www.ipredict.co.nz/app.php?do=browse&tag=4)/[upcoming](https://www.ipredict.co.nz/app.php?do=browse&tag=3)
- Foresight Exchange: [new/upcoming](http://www.ideosphere.com/fx-bin/ListClaims) ("Sort order: date_created"/"Sort order: date_due")
- LongBets: [new](http://feeds.feedburner.com/longbets)
- Intrade: [new](http://www.intrade.com/v4/misc/recentpredictions/)
- Hollywood Stock Exchange: [opening movies](http://www.hsx.com/security/feature.php?type=opening)/[upcoming movies](http://www.hsx.com/security/feature.php?type=upcoming)
### IARPA: The Good Judgment Project
In 2011, the [Intelligence Advanced Research Projects Activity](!Wikipedia) (IARPA) began the [Aggregative Contingent Estimation (ACE) Program](http://www.iarpa.gov/index.php/research-programs/ace), pitting 5 research teams against each other to investigate and improve prediction of geopolitical events. One team, [the Good Judgment Project](http://goodjudgmentproject.blogspot.com/) (see the [_Wired_ interview](http://www.wired.com/wiredscience/2011/08/do-political-experts-know-what-theyre-talking-about/) with [Philip Tetlock](!Wikipedia)), solicited college graduates for the 4-year time period of ACE to register predictions on selected events, for a $150 honorarium. A last-minute notice was posted [on LessWrong](http://lesswrong.com/lw/6ya/link_get_paid_to_train_your_rationality/), and I immediately signed up and was accepted, as [I predicted](http://predictionbook.com/predictions/2973).
The initial survey upon my acceptance was long and detailed (calibration on geopolitics, finance, and religion; personality surveys with a lot of fox/hedgehog questions; basic probability; a critical thinking test, the CRT; educational test scores; and then what looked like a full matrix IQ test - we were allowed to see some of [our own results](/docs/2011-gwern-gjp-psychsurveys.html), like the season 2 calibration test[^GJP-2-calibration]). The final results will no doubt turn up many interesting correlations or lack of correlation. I look forward [to completing the study](http://predictionbook.com/predictions/2978). At the very least, they will supply a few hundred predictions I can put on PredictionBook.com - formulating a quality prediction (falsifiable, objective, and interesting) can be the hardest part of predicting.
[^GJP-2-calibration]: Unfortunately they don't give any population statistics so it's hard for me to interpret my results:
> Your calibration score is -3. Calibration is defined as the difference between the percentage average confidence rating and the percentage of correct answers. A score of zero is perfect calibration. Positive numbers indicate overconfidence and can go up to 100. Negative numbers represent under-confidence and can go down to -100.
>
> Your discrimination score is 4.48. Discrimination is defined as the difference between the percentage average confidence rating for the correct items and the percentage average confidence rating for the incorrect items. Higher positive numbers indicate greater discrimination and are better scores.
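Spelled out in code - my reconstruction from these verbal definitions, not GJP's actual scoring program - with inputs as pairs of a confidence rating in percent and whether the answer was correct:

```haskell
-- Calibration: average confidence (%) minus percentage correct;
-- positive = overconfident, negative = underconfident, 0 = perfect.
calibrationScore :: [(Double, Bool)] -> Double
calibrationScore xs =
    mean (map fst xs) - 100 * mean [if b then 1 else 0 | (_, b) <- xs]

-- Discrimination: average confidence on the correct items minus
-- average confidence on the incorrect items; higher is better.
discriminationScore :: [(Double, Bool)] -> Double
discriminationScore xs =
    mean [c | (c, True) <- xs] - mean [c | (c, False) <- xs]

mean :: [Double] -> Double
mean ys = sum ys / fromIntegral (length ys)
```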
#### Season 1 results
My initial batch of short-term predictions did well; even though I made a major mistake when I fumble-fingered a prediction about Mugabe (I bet that he would fall from office in a month, when I believed the opposite), I was still up by \$700 in its play-money. I have, naturally, been copying my predictions [onto PredictionBook.com](http://www.google.com/search?q=%22GJP%3A%20%22%20site%3Apredictionbook%2Ecom) the entire time.
Despite a very questionable prediction closure by IARPA which cost me $200[^China], I finished 2011 well in the green. My [results](/docs/2011-gwern-gjp-forecastresults.html):
> - Your total earnings for 84 out of 85 closed forecasts is 15,744.
> - You were ranked 28 among the 204 forecasters in Group 3c.
Not *too* shabby; I was actually under the impression I was doing a lot worse than that. Hopefully I can do better in 2012 - I seem fairly accurate, so I ought to make my bets larger.
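"Larger" has a standard formalization in the [Kelly criterion](!Wikipedia); for a binary contract priced at `c` which one believes pays off with probability `p`, Kelly says to stake `(p-c)/(1-c)` of one's bankroll. (Kelly is my gloss here, not anything GJP recommended.) A sketch:

```haskell
-- Kelly fraction for buying a binary contract priced at c (0 < c < 1)
-- which you believe pays $1 with probability p: stake (p-c)/(1-c) of
-- bankroll when you have an edge, and nothing otherwise.
kellyFraction :: Double -> Double -> Double
kellyFraction p c = max 0 ((p - c) / (1 - c))

-- eg. kellyFraction 0.7 0.5 == 0.4: believing 70% against a 50% price,
-- commit 40% of bankroll (in practice, one bets a fraction of Kelly).
```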
[^China]: Specifically, prediction [#1007](http://predictionbook.com/predictions/3254). In its preface to the results page, GJP told us:
> Question 1007 (the "lethal confrontation" question) illustrates this point. Many of our best forecasters got 'burned' on this question because a Chinese fishing captain killed a South Korean Coast Guard officer late in the forecasting window - an outcome that the tournament's sponsors deemed to satisfy the criteria for resolving the question as 'yes', but one that had little geopolitical significance (it did not signify a more assertive Chinese naval policy). These forecasters had followed our advice (or their own common sense) by lowering their estimated likelihood of a lethal confrontation as time elapsed and made their betting decisions based on this assumption.
#### Season 2 results
Naturally, I signed up for Season 2. But it took the GJP months to actually send us the honorarium, and for Season 2 they switched to a much harder-to-use prediction-market interface which I did not like at all. I used up my initial allotment of money, but was unsure how actively I would participate: there was still some novelty, but the UI was bad enough that all the fun was gone. The later addition of 'trading agents', where one could just specify one's probability and it would make appropriate trades automatically, lured me back in for some trading, but as one would expect from my disengagement, [my final results](/docs/2013-gwern-gjp-forecastresults.pdf) were far worse than for Season 1: I ranked 184 out of 245.
I might as well stick around for season 3. Maybe I will try harder this time.
## Calibration
> "The best lack all conviction, while the worst are full of passionate intensity." --Yeats, ["The Second Coming"](!Wikipedia "The Second Coming (poem)")
Faster even than making one's own predictions is the procedure of [*calibrating*](http://lesswrong.com/r/lesswrong/tag/calibration/) yourself. Simply put, instead of buying shares or not, you give a direct probability: your 10% predictions should come true 10% of the time, your 20% predictions 20% of the time, etc. This is not so much about figuring out the true probability of the event or fact in the real world as about *your* own ignorance; it is as much about learning humility and avoiding hubris as it is about accuracy. You can be well-calibrated even making predictions about topics you are completely ignorant of - simply flip a coin to choose between 2 possibilities. You are still better off than someone who is equally ignorant but arrogantly tries to pick the right answers anyway and fails: he will be revealed as miscalibrated. Those who are ignorant and don't know it will come out overconfident; those who are knowledgeable and don't realize it will come out underconfident. (Note that learning of your overconfidence is less painful than in a prediction market, where you lose your money.)
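Checking calibration is purely mechanical: bucket the predictions by stated probability and compare each bucket's frequency of coming true. A minimal Haskell sketch:

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Bucket (stated probability, came true?) pairs by decile and report
-- (decile, observed frequency, count) for each bucket; a calibrated
-- predictor's observed frequencies track the deciles.
calibrationCurve :: [(Double, Bool)] -> [(Double, Double, Int)]
calibrationCurve preds =
    [ (decile g, frequency g, length g)
    | g <- groupBy ((==) `on` bucket) (sortOn bucket preds) ]
  where
    bucket (p, _) = round (p * 10) :: Int
    decile g      = fromIntegral (bucket (head g)) / 10
    frequency g   = fromIntegral (length (filter snd g)) / fromIntegral (length g)
```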
Thus, one can simply compile a trivia list and test people on their calibration; there are [at least 4](http://lesswrong.com/r/lesswrong/lw/1f8/test_your_calibration/) such online quizzes along with the board game [Wits & Wagers](http://boardgamegeek.com/boardgame/20100/wits-wagers). (Consultant Douglas Hubbard has a book [_How to Measure Anything: Finding the Value of "Intangibles" in Business_](http://www.amazon.com/How-Measure-Anything-Intangibles-Business/dp/0470539399/) which is principally on the topic of applying a combination of calibration and [Fermi estimates](Notes#fermi-calculations) to many business problems, which I found imaginative & interesting.) These tests are also useful for occasional independent checks on whether you easily succumb to bias or miscalibration in other domains; I personally seem to do reasonably well[^Your-Morals].
[^Your-Morals]: For example, in the [YourMorals.org](http://yourmorals.org/) tests dealing with calibration/bias, I usually do well above average, even for LessWrongers; see:
- ["an experimental investigation of how people evaluate research evidence that either supports or opposes their pre-existing beliefs"](/docs/2011-gwern-yourmorals.org/evalinfo_process.html)
- ["Over-claiming Technique"](/docs/2011-gwern-yourmorals.org/cog_ab_process.html)
- ["Balanced Inventory of Desirable Responding"](/docs/2011-gwern-yourmorals.org/bidr_process.html)
- ["Marlowe-Crowne Social Desirability Scale"](/docs/2011-gwern-yourmorals.org/crowne_process.html)
- ["This scale is designed to measure the better-than-average effect, which is also known as the illusory superiority bias."](/docs/2011-gwern-yourmorals.org/selfeval_process.html)
Some professional groups do much better on forecasting than others. Two of the key factors found by Armstrong and other forecasting researchers are fast and clear feedback[^WSJ] and a willingness to be falsified: conversely, Tetlock's "hedgehogs" were characterized by constant attempts to rationalize unexpected outcomes and so refrain from falsifying their cherished world-view. Trivia questions, and to a lesser extent the predictions on PredictionBook.com, offer both factors.
[^WSJ]: The 2001 anthology of reviews and papers, [_Principles of Forecasting_](/docs/predictions/2001-principlesforecasting.pdf "Armstrong et al 2001"), is invaluable, although many of the papers are highly technical. Excerpts from Dylan Evans's [_Risk Intelligence_](http://www.amazon.com/Risk-Intelligence-How-Live-Uncertainty/dp/1451610904/) (in the _[Wall Street Journal](http://online.wsj.com/article/SB10001424052702304451104577392270431239772.html)_) may be more readable:
> Psychologists have tended to assume that such biases are universal and virtually impossible to avoid. But certain groups of people - such as meteorologists and professional gamblers - have managed to overcome these biases and are thus able to estimate probabilities much more accurately than the rest of us. Are they doing something the rest of us can learn? Can we improve our risk intelligence?
>
> Sarah Lichtenstein, an expert in the field of decision science, points to several characteristics of groups that exhibit high intelligence with respect to risk. First, they tend to be comfortable assigning numerical probabilities to possible outcomes. Starting in 1965, for instance, U.S. National Weather Service forecasters have been required to say not just whether or not it will rain the next day, but how likely they think it is in percentage terms. Sure enough, when researchers measured the risk intelligence of American forecasters a decade later, they found that it ranked among the highest ever recorded, according to a study in the Journal of the Royal Statistical Society.
>
> It helps, too, if the group makes predictions only on a narrow range of topics. The question for weather forecasters, for example, is always roughly the same: Will it rain or not? Doctors, on the other hand, must consider all sorts of different questions: Is this rib broken? Is this growth malignant? Will this drug cocktail work? Studies have found that doctors score rather poorly on tests of risk intelligence.
>
> Finally, groups with high risk intelligence tend to get prompt and well-defined feedback, which increases the chance that they will incorporate new information into their understanding. For weather forecasters, it either rains or it doesn't. For battlefield commanders, targets are either disabled or not. For doctors, on the other hand, patients may not come back, or they may be referred elsewhere. Diagnoses may remain uncertain.
>
> ...Royal Dutch Shell introduced just such a program in the 1970s. Senior executives had noticed that when newly hired geologists predicted oil strikes at four out of 10 new wells, only one or two actually produced. This overconfidence cost Royal Dutch Shell millions of dollars. In the training program, the company gave geologists details of previous explorations and asked them for numerical estimates of the chances of finding oil. The inexperienced geologists were then given feedback on the number of oil strikes that had actually been made. By the end of the program, their estimates roughly matched the actual number of oil strikes.
>
> ...Just by becoming aware of our tendency to be overconfident or underconfident in our estimates, we can go a long way toward correcting for our most common errors. Doctors, for instance, could provide numerical estimates of probability when making diagnoses and then get data about which ones turned out to be right. As for the rest of us, we could estimate the likelihood of various events in a given week, record our estimates in numerical terms, review them the next week and thus measure our risk intelligence in everyday life. A similar technique is used by many successful gamblers: They keep accurate and detailed records of their earnings and their losses and regularly review their strategies in order to learn from their mistakes.
## 1001 PredictionBook Nights
> I explain what I've learned from creating and judging thousands of predictions on personal and real-world matters: the challenges of maintenance, the limitations of prediction markets, the interesting applications to my other essays, skepticism about pundits and unreflective persons' opinions, my own biases like optimism & planning fallacy, 3 very useful heuristics/approaches, and the costs of these activities in general. (Plus an extremely geeky parody of _Fate/Stay Night_.)
(Initial discussion [on LessWrong](http://lesswrong.com/lw/7z9/1001_predictionbook_nights/#comments).)
> I am the [core of my mind.](http://www.paulgraham.com/identity.html) \
> [Belief](http://lesswrong.com/lw/s6/probability_is_subjectively_objective/) is my body and [choice](http://wiki.lesswrong.com/wiki/Rationalists_should_win) is my blood. \
> [I have recorded](http://predictionbook.com/users/gwern) over a thousand predictions, \
> [Unaware of fear](!Wikipedia "Loss aversion") \
> Nor [aware of hope](http://lesswrong.com/lw/uk/beyond_the_reach_of_god/) \
> [Have](http://lesswrong.com/lw/21b/ugh_fields/) [withstood](http://lesswrong.com/lw/5xx/overcoming_suffering_emotional_acceptance/) [pain](http://lesswrong.com/lw/jy/avoiding_your_beliefs_real_weak_points/) [to update](http://lesswrong.com/lw/i9/the_importance_of_saying_oops/) many times \
> Waiting for [truth's arrival](http://predictionbook.com/predictions/future). \
> This is the [one uncertain path](http://wiki.lesswrong.com/wiki/How_To_Actually_Change_Your_Mind). \
> My whole life has been... \
> [Unlimited Bayes Works](http://lesswrong.com/lw/h8/tsuyoku_naritai_i_want_to_become_stronger/)!^[Modified version of [Eliezer Yudkowsky's parody](http://lesswrong.com/lw/ya/normal_ending_last_tears_68/qvc) of the [_Fate/Stay Night_ chant](http://comipress.com/article/2007/07/02/2228).]
In October 2009, the site [PredictionBook.com](http://www.predictionbook.com) was [announced on LW](http://lesswrong.com/lw/1bh/predictionbookcom_track_your_calibration/). I signed up in July 2010, as tracking free-form predictions was the logical endpoint of my dabbling in prediction markets, and I had recently withdrawn from Intrade due to [fee changes](Prediction markets#cashing-out). Since then [I have been](http://predictionbook.com/users/gwern) the principal user of PB.com, and a while ago, I registered my 1001^st^ prediction. (I am currently up to >1628 predictions, with >383 judged; PB in total has >4258 predictions.) I had to write and research most of them myself, and they represent a large time investment. To what use have I put the site, and what have I gotten out of the predictions?
### Using PredictionBook
> "Our errors are surely not such awfully solemn things. In a world where we are so certain to incur them in spite of all our caution, a certain lightness of heart seems healthier than this excessive nervousness on their behalf." --[William James](!Wikipedia), "[The Will to Believe](!Wikipedia)", section VII
Using PredictionBook taught me two things as far as such sites go:
1. I learned the value of centralizing (and [backing up](Archiving URLs)) predictions of interest to me. I ransacked [LongBets.org](http://longbets.org/), `WrongTomorrow.com`, [Intrade](!Wikipedia), [FutureTimeline.net](http://futuretimeline.net/), and various collections of predictions like [Arthur C. Clarke's list](http://www.arthurcclarke.net/?scifi=3), LessWrong's own annual prediction threads ([2010](http://lesswrong.com/lw/1la/new_years_predictions_thread/), [2011](http://lesswrong.com/lw/3kz/new_years_predictions_thread_2010/)), or simply [random comments on LW](http://www.google.com/search?q=lesswrong.com%20site%3Apredictionbook.com) (sometimes [Reddit](http://www.google.com/search?q=predictionbook%2Ecom%20site%3Areddit%2Ecom) too). This makes searching for previous predictions easier, graphs all my registered predictions, and makes backups a little simpler. WrongTomorrow promptly vindicated my paranoia by dying without notice. I now have a reply to [David Brin](!Wikipedia)'s oft-repeated plea for a '[predictions registry](http://www.davidbrin.com/predictionsregistry.htm)': no one cares, so if you want one, you need to do it yourself.
2. I realized that using prediction markets had narrowed my appreciation of what predictions are good for. IEM & Intrade had taught me contempt for certain pundits (and respect for [Nate Silver](!Wikipedia)), because the pundits would yammer on about issues where I knew better from the relevant market; but there are very few liquid markets on either site, and so I learned this for only a few things like the US Presidential elections. Prediction markets will be flawed for the foreseeable future, with individual contracts subject to long-shot bias[^longshot] or simply bizarre claims due to illiquidity[^Taiwan]; for these things, one must go elsewhere or not go at all.
At worst, this fixation on prediction markets - and real-money prediction markets - may lead one to engage in [epic](http://lesswrong.com/lw/le/lost_purposes/) [yak-shaving](http://projects.csail.mit.edu/gsb/old-archive/gsb-archive/gsb2000-02-11.html) in [striving](http://lesswrong.com/lw/atm/cult_impressions_of_less_wrongsingularity/7czw?context=1#7czw) [to change](http://lesswrong.com/lw/h/test_your_rationality/7s) US laws to permit prediction markets! I am reminded of Thoreau:
> This spending of the best part of one's life earning money in order to enjoy a questionable liberty during the least valuable part of it reminds me of the Englishman who went to India to make a fortune first, in order that he might return to England and live the life of a poet. He should have gone up the garret at once.
[^longshot]: [Long-shot bias](!Wikipedia "Favourite-longshot bias") is the overvaluing of events in the 0-5% range or so; it plagues even heavily traded markets on Intrade. Ron Paul and Michele Bachmann are 2 cases in point - they are covered by the heavily-traded US Presidential contracts, yet they are priced too high, and this has been noted by many:
- <http://fskrealityguide.blogspot.com/2008/02/defect-of-intrade.html>
- <http://lesswrong.com/lw/1ia/arbitrage_of_prediction_markets/>
- <http://www.freakonomics.com/2007/05/24/what-do-you-have-to-say-about-ron-paul/>
- <http://www.regruntled.com/2008/10/21/selling-delusion-short/>
- <http://www.regruntled.com/2008/11/06/intrade-retrospective/>
Beyond blog posts, a [2004 Wolfers & Zitzewitz paper](http://www.econ.ku.dk/tyran/Teaching/BEecon_MA/readings_BEecon/readings%20MA_Expecon/Wolfers%20and%20Zitzewitz_Prediction%20Markets_JEP2004.pdf "Prediction Markets") finds their presence (see also [Rothschild 2011](http://repository.upenn.edu/cgi/viewcontent.cgi?article=1421&context=edissertations "Forecasting: Expectations, Intentions, and Confidence")):
> In fact, the price differences implied a (small) arbitrage opportunity that persisted for most of summer 2003 and has reappeared in 2004. Similar patterns existed for Tradesports securities on other financial variables like crude oil, gold prices and exchange rates. This finding is consistent with the long-shot bias being more pronounced on smaller-scale exchanges.
This is apparently due in part to the short-term pressure on prediction market traders; [Robin Hanson](http://lesswrong.com/lw/1ia/arbitrage_of_prediction_markets/1b2l) says:
> "Intrade and IEM don't usually pay interest on deposits, so for long term bets you can win the bet and still lose overall. The obvious solution is for them to pay such interest, but then they'd lose a hidden tax many customers don't notice."
Another reason to use a free-form site like PB.com - you can (and I have) made predictions about decades or centuries into the far future without worrying about how to earn returns of thousands of percent.
[^Taiwan]: Going through Intrade to copy over predictions to PB.com, I was struck by how non-liquid markets could be left at hilarious prices, prices that make no rational sense since they can't even represent someone hedging against that outcome because so few shares have been sold; example contracts include:
1. [US attacking North Korea](http://predictionbook.com/predictions/3048)
2. [China attacking Taiwan](http://predictionbook.com/predictions/3052)
3. [Japan acquiring nuclear weapons](http://predictionbook.com/predictions/3073)
### Noted predictions
> "Robert Morris has a very unusual quality: he's never wrong. It might seem this would require you to be omniscient, but actually it's surprisingly easy. Don't say anything unless you're fairly sure of it. If you're not omniscient, you just don't end up saying much. More precisely, the trick is to pay careful attention to how you qualify what you say...He has an almost superhuman integrity. He's not just generally correct, but also correct about how correct he is. You'd think it would be such a great thing never to be wrong that everyone would do this. It doesn't seem like that much extra work to pay as much attention to the error on an idea as to the idea itself. And yet practically no one does." --[Paul Graham](http://www.paulgraham.com/heroes.html "Some Heroes")
Do any particular sets of predictions come to my mind? Yes:
1. My largest outstanding collection is [the >207 predictions](otaku-predictions) about the unreleased _Evangelion_ movies & manga; I regard their upcoming releases as excellent chances to test my theories about _Evangelion_ interpretation in a way that is usually impossible when it comes to literary interpretation
2. For my personal Adderall double-blind trial, I [recorded 16 predictions about a trial](Nootropics#adderall-blind-testing) (guessing whether it was placebo or Adderall) to try to see how strong an effect I could diagnose, in addition to whether there was one at all. (I also did one for [modafinil](Nootropics#modalert-blind-day-trial) & [LSD microdosing]())
3. During the big Bitcoin bubble, I recorded a number of predictions on Reddit & LW and followed up on a number of them; I believe this was educational for those involved - at the least, I think I tempered my own enthusiasm by noting the regular failure of the most optimistic predictions and the very low Outside View probability of a take-off
4. I made qualitative predictions in [Haskell Summer of Code]() for [2010](Haskell Summer of Code#predicting-2010-results) & [2011](Haskell Summer of Code#predicting-2011-results), but I've refrained from recording them because I've been accused of being subjective in my evaluations; for [2012](Haskell Summer of Code#predictions) & [2013](Haskell Summer of Code#predictions-1), I bit the bullet.
5. For my modeling & predictions of [when Google will kill its various products](Google shutdowns#predictions), I registered my own adjustments to the final set of 5-year survival predictions so as to compare my performance with the model's performance 5 years later
### Benefits from making predictions
> Day ends, market closes up or down, reporter looks for good or bad news respectively, and writes that the market was up on news of Intel's earnings, or down on fears of instability in the Middle East. Suppose we could somehow feed these reporters false information about market closes, but give them all the other news intact. Does anyone believe they would notice the anomaly, and not simply write that stocks were up (or down) on whatever good (or bad) news there was that day? That they would say, "hey, wait a minute, how can stocks be up with all this unrest in the Middle East?"^[[Paul Graham](!Wikipedia "Paul Graham (computer programmer)"), ["It's Charisma, Stupid"](http://www.paulgraham.com/charisma.html "Graham 2004")]
When I do use predictions, I've noticed some direct benefits:
- Giving probabilities can make an analysis clearer (how do I know what I think until I see what I predict?); when I speculated on the identity of [Mike Darwin](http://www.chronopause.com)'s patron (above, 'Notes'), the very low probabilities I assigned in the conclusion to any particular billionaire make clear that I repose no real confidence in any of my guesses and that this is more of a Fermi-problem puzzle or exercise than anything else. (And indeed, none of them were correct.) I believe that sharpening my analyses has also made me better at spotting political bloviation and pundits pontifying:
> "Don't ask whether predictions are made, ask whether predictions are implied." --[Steven Kaas](https://twitter.com/#!/stevenkaas/statuses/149616831290818560)
- Going on the record with time-stamps can turn sour grapes into a small victory. If one read my [Silk Road]() article and saw [a footnote](Silk Road#fn3) to the effect that the Bitcoin forum administrators were censors who removed any discussion of the Silk Road, such an accusation is rather less convincing than a footnote linking to a prediction that a particular thread would be removed, and noting that, as the reader can verify for themselves, said thread was indeed subsequently deleted.
One of the things I hoped would make my site [unusual](About#long-content) was regularly employing prediction; I haven't been able to do it as often as I hoped, but I've still used it in 19 pages:
- [About](): projections about finishing writing/research projects, and site pageviews
- [Choosing Software](): whether I will continue to use certain software tools chosen in accordance with its principles
- [Haskell Summer of Code](): success of the 2012 projects
- [In Defense Of Inclusionism](): predicting the WMF's half-hearted efforts at editor retention will fail; predictions about informal experiments I've carried out
- [Mistakes](): computer Go
- [Nootropics](): checking the success of blinding Adderall, day-time modafinil, iodine, and nicotine experiments (see above); <!-- TODO: LSD -->
- [Zeo](): checking blinding of 2 vitamin D experiments
- [Notes](Notes#the-hidden-library-of-the-long-now): predictions on Steve Jobs's lack of charity, correctness of speculative analysis
- [Wikipedia and Knol](): in my description of the failure of Knol as a Wikipedia or blog competitor, I naturally registered several estimates of when I expected it to die; I was correct to expect it to die quickly, in 2012 or 2013, but not that the content would remain public. This experience was part of the motivation for my later [Google shutdowns]() analysis.
- [_Evangelion_ predictions](otaku-predictions): see above
- [_Harry Potter and the Methods of Rationality_ predictions](hpmor-predictions): an exercise similar to the _Evangelion_ predictions
- [Prediction markets](): political predictions, Intrade failure predictions, GJP acceptance
- [Silk Road](): prediction of censorship on main Bitcoin forums (see above), and of no legal repercussions
- [Slowing Moore's Law](Slowing Moore's Law): asserts semiconductor manufacturing is fragile and hence that Kryder's law has been *permanently* set back by the 2011 Thai floods
- [_The Notenki Memoirs_](/docs/2002-notenki-memoirs#fn222): [Hiroyuki Yamaga](!Wikipedia)'s perpetually in-planning movie _Aoki Uru_ will not be released.
- [Modafinil](): correctly predicted tolerance for a particularly frequent user
- [Death Note script](): I registered predictions on what replies I expected from Parlapanides, asking about whether he wrote the leaked script being analyzed, to forestall accusations of hindsight bias
- ["The Crypto-Currency: Bitcoin and its mysterious inventor"](/docs/2011-davis), _The New Yorker_ 2011; mentioned my own failed prediction of a government crackdown
- [Google shutdowns](): as part of my statistical modeling of the likely lifetimes of Google products, I took the final model's predictions of 5-year survival (to 2018) and adjusted them to what I felt intuitively was more right.
### Lessons learned
> "We should not be upset that others hide the truth from us, when we hide it so often from ourselves." --[François de La Rochefoucauld](!Wikipedia), _Maximes_ 11
To sum things up, like the [haunted rationalist](http://lesswrong.com/lw/1l/the_mystery_of_the_haunted_rationalist/), I learned in my gut things that I already supposedly knew - the biases are now [more satisfying](http://lesswrong.com/lw/73r/remind_physicalists_theyre_physicalists/); the following are my subjective impressions:
- I knew (to quote Julius Caesar) that "What we wish, we readily believe, and what we ourselves think, we imagine others think also", or (to quote Orwell) that "Politics...is a sort of sub-atomic or non-Euclidean world where it is quite easy for the part to be greater than the whole or for two objects to be in the same place simultaneously"[^Orwell], but it wasn't until I was sure that George Bush would not be re-elected in 2004 that I learned I could succumb to this even on abstract issues about which I had read enormous quantities of information & speculation.
- while I am weak in areas close to me, in other areas I am underconfident, which is [a sin](http://lesswrong.com/lw/c3/the_sin_of_underconfidence/) and as much to be remedied as overconfidence. (Specifically, it seemed I was initially overconfident on 95%+ predictions and underconfident in the 60-90% regime; I think I've learned my lesson, but by the nature of these things, my recorded calibration will take many predictions to recover in the extreme ranges.)
- I am too optimistic and not cynical enough; the cardinal example, personally, would be the five-year [XiXiDu](http://predictionbook.com/predictions/2909) prediction which was falsified in *one month*. The Outside View heavily militated against it, as did my fellow predictors, and if it had been formulated as something socially disapproved of like alcohol or smoking, I would probably have gone with 10 or 20% like JoshuaZ; but because it was a fellow LessWronger trying to get his life straight...
- I am considerably more skeptical of op-eds and other punditry, after tracking the rare clear predictions they made. (I was already wary due to Tetlock and a more recent [study of major pundits](http://www.hamilton.edu/news/polls/pundit/an-analysis-of-the-accuracy-of-forecasts-in-the-political-media.pdf), but not enough, it seems.)
The rareness of such predictions has instilled in me an appreciation of Hansonian signaling theories of politics - it is *so* hard to get falsifiable predictions out of writings even when they *look* clear; for example, leading up to the 2011 US Federal debt crisis and ratings downgrade, everyone prognosticated furiously - but did they mean any rating agency, or all of them, or just a majority?
- I respect fundamental trends more; they are powerful predictors indeed, and like Philip Tetlock's experts, I find that it's hard to out-perform the past in predicting. I no longer expect much of politicians, who are as trapped as the rest of us.
This could be seen as more use of base rates as the prior, or as moving towards more of an Outside View. I am frequently reminded of the power of reductionism and analysis - pace _MoR_ Quirrell's question to Harry[^mor], what states of the world would a prediction coming true imply had become more likely? Sometimes when I record predictions, I see someone who has [clearly not considered](http://lesswrong.com/lw/1la/new_years_predictions_thread/2hg3) what his predictions coming true implies about the *current* state of the world; I sigh and reflect on how you just can't get *there* from *here*.
- Merely contemplating seriously my predictions over years and decades makes the future much more concrete to me; I will live most of my life there, so I *should* take a longer-term perspective.
- Making thousands of predictions has helped me gain detachment from particular positions and ideas (which made it easier for me to write my [Mistakes]() essay and publicly admit them - after so many 'failures' on PB.com, what were a few described in more detail?) To quote [Alain de Botton](https://twitter.com/#!/alaindebotton/status/112772752041181184):
> The best salve for failure -- to have quite a lot else going on.
This detachment itself seems to help accuracy; I was struck by a psychology study demonstrating that not only are people better at falsifying theories put forth by other people, they are better at falsifying *when pretending it is held by an imaginary friend*[^imaginaryfriend]!
- Raw probabilities are more intuitive; I can't describe this much better than the poker article, ["This is what 5% feels like."](http://rationalpoker.com/2011/04/21/this-is-what-5-feels-like/)
- [Planning fallacy](http://lesswrong.com/lw/jg/planning_fallacy/): I knew it perfectly well, but still committed it until I tracked predictions; this is true both of my own mundane activities like writing, and of larger, more global events (recently, running out the clock on the Palestinian nationhood UN vote)
This was interesting because it's so easy to make excuses - 'I would've succeeded if not for X!' The question (in the classic study) is whether students could predict their projects' actual completion time; they're not trying to predict project completion time given a hypothetical version of themselves which didn't procrastinate. If they aren't self-aware enough to know they procrastinate and to take that into account - their predictions are still bad, no matter *why* they're bad. (And someone on the outside who is told that in the past the students had finished -1 days before the due date will just shrug and say: 'regardless of whether they took so long because of procrastination, or because of [Parkinson's law](!Wikipedia), or because of a 3rd reason, I have no reason to believe they'll finish early *this* time.' And they'd be absolutely correct.) It's like a fellow who predicts he won't fall off a cliff, but falls off anyway. 'If only that cliff hadn't been there, I wouldn't've fallen!' Well, duh. But you still fell. How can you correct this until you stop making excuses?
- Less [hindsight bias](http://wiki.lesswrong.com/wiki/Hindsight_bias); when I have my previous opinions written down, it's harder to claim I knew it all along (when I didn't), and as [Arkes et al 1988](/docs/sunkcosts/1988-arkes.pdf "Eliminating the hindsight bias") indicated, writing down my reasons (even in Twitter-sized comments) helped prevent it.
Example: I had put the 2011 S&P downgrade at [5%](http://predictionbook.com/predictions/2030), and reminded of my skepticism, I can see the double-standards being applied by pundits - all of a sudden they remember how the ratings agencies failed in the housing bubble and how the academic literature has proven they are inferior to the [CDS](!Wikipedia "Credit default swap") markets and how they are a bad government-granted monopoly, even though they were happy to cite the AAA rating beforehand and are still happy to cite the *other* ratings agencies... In short, while base rates are powerful indeed, there are still many exogenous events and multiplicities of low probability events.
[^Orwell]: ["In Front of Your Nose"](http://orwell.ru/library/articles/nose/english/e_nose), [George Orwell](!Wikipedia) 1946:
> To see what is in front of one's nose needs a constant struggle. One thing that helps toward it is to keep a diary, or, at any rate, to keep some kind of record of one's opinions about important events. Otherwise, when some particularly absurd belief is exploded by events, one may simply forget that one ever held it. Political predictions are usually wrong. But even when one makes a correct one, to discover why one was right can be very illuminating. In general, one is only right when either wish or fear coincides with reality. If one recognizes this, one cannot, of course, get rid of one's subjective feelings, but one can to some extent insulate them from one's thinking and make predictions cold-bloodedly, by the book of arithmetic. In private life most people are fairly realistic. When one is making out one's weekly budget, two and two invariably make four. Politics, on the other hand, is a sort of sub-atomic or non-Euclidean world where it is quite easy for the part to be greater than the whole or for two objects to be in the same place simultaneously. Hence the contradictions and absurdities I have chronicled above, all finally traceable to a secret belief that one's political opinions, unlike the weekly budget, will not have to be tested against solid reality.
[^mor]: Eliezer Yudkowsky, [chapter 20](http://www.fanfiction.net/s/5782108/20/Harry_Potter_and_the_Methods_of_Rationality), [_Harry Potter and the Methods of Rationality_](http://www.hpmor.com/):
> ...while I suppose it is barely possible that perfectly good people exist even though I have never met one, it is nonetheless *improbable* that someone would be beaten for fifteen minutes and then stand up and feel a great surge of kindly forgiveness for his attackers. On the other hand it is *less* improbable that a young child would imagine this as the *role to play* in order to convince his teacher and classmates that he is not the next Dark Lord.
>
> The import of an act lies not in what that act *resembles on the surface*, Mr. Potter, but in the states of mind which make that act more or less probable.
[^imaginaryfriend]: ["When falsification is the only path to truth"](http://csjarchive.cogsci.rpi.edu/Proceedings/2005/docs/p512.pdf); abstract:
> Can people consistently attempt to falsify, that is, search for refuting evidence, when testing the truth of hypotheses? Experimental evidence indicates that people tend to search for confirming evidence. We report two novel experiments that show that people can consistently falsify when it is the only helpful strategy. Experiment 1 showed that participants readily falsified somebody else's hypothesis. Their task was to test a hypothesis belonging to an 'imaginary participant' and they knew it was a low quality hypothesis. Experiment 2 showed that participants were able to falsify a low quality hypothesis belonging to an imaginary participant more readily than their own low quality hypothesis. The results have important implications for theories of hypothesis testing and human rationality.
One line of thought in [evolutionary psychology](!Wikipedia) is that our minds are *not* evolved for truth-seeking per se, but rather are split between effective heuristics & procedures like that falsification strategy, and argumentation meant to deceive & persuade others; eg. ["Why do humans reason? Arguments for an argumentative theory"](http://files.meetup.com/284333/Philosophy-Reason_SSRN-id1698090.pdf) (Mercier & Sperber 2011). This ties in well with why we are better at falsifying the theories of *others* - you don't convince anyone by falsifying your own theories, but you do by falsifying others' theories.
I think, but am not sure, that I really have [internalized](http://lesswrong.com/lw/1yq/understanding_your_understanding/) these lessons; they simply seem... obvious to me, now. I was surprised when I looked up my earliest work and saw it was only around 14 months ago - I felt like I'd been recording predictions for far longer.
### Non-benefits
> "If people don't want to come to the ballpark how are you going to stop them?" --[Yogi Berra](!Wikipedia), p. 36 [_The Yogi book_](http://www.amazon.com/Yogi-Book-Berra/dp/0761154434/) (1997)
Making predictions has been personally costly; while some predictions have been total time investments of a score of seconds, other predictions required considerable research, and thinking carefully is no picnic, as we've all noticed. I justify the invested time as a learning experience which would hopefully pay off for others as well, who can free-ride off the many predictions (eg. the [soon-to-expire predictions](http://predictionbook.com/predictions/future)) I have laboriously added to PB.com. (Only a fool learns only from his own mistakes.)
What have I not noticed? It was suggested that predictions might help me in resolutions based on some experimental evidence[^hbv]; I did not notice anything, but I didn't carefully track it or put in predictions about many routine tasks. Making predictions seems to be largely effective for improving one's *epistemic* rationality; I make no promises or implied warranties as to whether it is *instrumentally* rational.
[^hbv]: ["Can self-prediction overcome barriers to hepatitis B vaccination? A randomized controlled trial"](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3244552/ "Cox et al 2011"):
> Half of participants were assigned randomly to a "self-prediction" intervention, asking them to predict their future acceptance of HBV vaccination. The main outcome measure was subsequent vaccination behavior. Other measures included perceived barriers to HBV vaccination, measured prior to the intervention. Results: There was a [statistically-]significant interaction between the intervention and vaccination barriers, indicating the effect of the intervention differed depending on perceived vaccination barriers. Among high-barriers patients, the intervention [statistically-]significantly increased vaccination acceptance. Among low-barriers patients, the intervention did not influence vaccination acceptance. Conclusions: The self-prediction intervention [statistically-]significantly increased vaccination acceptance among "high-barriers" patients, who typically have very low vaccination rates.
### How I make predictions
A prediction can be broken up into 3 steps:
1. The specification
2. The due-date
3. The probability
The first issue is simply formulating the prediction. The goal is to make a statement on an objective and easily checkable fact; imagine that the other people predicting are yourself if you had been raised in some completely opposite fashion like an evangelical Republican household, and they are quite as suspicious of you as you are of them, and believe you to be suffering from as many partisan and self-serving biases as you believe them to. Wording is important as words [frame](!Wikipedia "Framing (social sciences)") how we think about things and can directly bias us (eg. [push poll](!Wikipedia)s)[^Rowe-wording]. The prediction should be so clear that they would expose themselves to mockery even among their own kind if they were to seriously disagree about the judgment^[It may help to read the dialogue/examples of "Dr. Malfoy" and "Dr. Potter" in [chapter 22](http://www.fanfiction.net/s/5782108/22/Harry_Potter_and_the_Methods_of_Rationality "The Scientific Method") of Eliezer Yudkowsky's _Harry Potter and the Methods of Rationality_.]. For example, 'Obama will be the next President' is perfectly precise - *everyone* knows and understands what it is to be President and how one would decide - and so there's no need to do any more; it would be risible to try to deny it. On the other hand, 'the globe will increase 1 degree Fahrenheit' may initially sound good, but your dark counterpart immediately objects: 'what if it's colder in Russia? When is this increase going to happen? Is this exactly 1 degree or are you going to try to claim as success only 0.9 degrees too? Who's deciding this anyway?' A good resolution might be 'OK, global temperatures will increase >=1.0 degrees Fahrenheit on average according to the next IPCC report'.
[^Rowe-wording]: Rowe & Wright 2001:
> - In phrasing questions, use clear and succinct definitions and avoid emotive terms.
>
> How a question is worded can lead to [substantial] response biases. By changing words or emphasis, one can induce respondents to give dramatically different answers to a question. For example, Hauser (1975) describes a 1940 survey in which 96% of people answered yes to the question "do you believe in freedom of speech?" and yet only 22% answered yes to the question "do you believe in freedom of speech to the extent of allowing radicals to hold meetings and express their views to the community?" The second question is consistent with the first; it simply entails a fuller definition of the concept of freedom of speech. One might therefore ask which of these answers more clearly reflects the views of the sample. Arguably, the more apt representation comes from the question that includes a clearer *definition* of the concept of interest, because this should ensure that the respondents are all *answering the same question*. Researchers on Delphi per se have shown little empirical interest in question wording. Salancik, Wenger and Heifer (1971) provide the only example of which we are aware; they studied the effect of question length on initial panelist consensus and found that one could apparently obtain greater consensus by using questions that were neither "too short" nor "too long." This is a generally accepted principle for wording items on surveys: they should be long enough to define the question adequately so that respondents do not interpret it differently, yet they should not be so long and complicated that they result in information overload, or so precisely define a problem that they demand a particular answer. Also, questions should not contain emotive words or phrases: the use of the term "radicals" in the second version of the freedom-of-speech question, with its potentially negative connotations, might lead to emotional rather than reasoned responses.
> - Frame questions in a balanced manner.
>
> Tversky and Kahneman (1974, 1981) provide a second example of the way in which question framing may bias responses. They posed a hypothetical situation to subjects in which human lives would be lost: if subjects were to choose one option, a certain number of people would *definitely* die, but if they chose a second option, then there was a *probability* that more would die, but also a chance that less would die. Tversky and Kahneman found that the proportion of subjects choosing each of the two options changed when they phrased the options in terms of people surviving instead of in terms of dying (i.e., subjects responded differently to an option worded "60% will survive" than to one worded "40% will die," even though these are logically identical statements). The best way to phrase such questions might be to clearly state both death and survival rates (balanced), rather than leave half of the consequences implicit. Phrasing a question in terms of a single perspective, or numerical figure, may provide an anchor point as the focus of attention, so biasing responses.
Deciding the due-date of a prediction is usually trivial and not worth discussing; when making open-ended predictions about people (eg. 'X will receive a Nobel Prize'), I find it helpful to consult [life table](!Wikipedia)s like [Social Security's table](http://www.ssa.gov/oact/STATS/table4c6.html) to figure out their average life expectancy and then set the due-date to that. (This both minimizes the number of changes to the due date and helps calibrate us by pointing out what time spans we're really dealing with.)
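As a minimal sketch (in Python, with made-up remaining-life-expectancy values standing in for the real SSA table), the due-date calculation looks like:

```python
# Illustrative remaining life expectancies by age; *not* the actual SSA figures.
REMAINING_YEARS = {40: 38.2, 50: 29.3, 60: 21.3, 70: 14.2, 80: 8.4}

def due_year(current_age: int, current_year: int = 2012) -> int:
    """Set an open-ended prediction's due-date to the subject's expected year of death."""
    age_key = max(a for a in REMAINING_YEARS if a <= current_age)  # nearest tabulated age below
    return current_year + round(REMAINING_YEARS[age_key])

print(due_year(62))  # eg. a 62-year-old Nobel candidate -> a due-date of ~2033
```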
When we begin deciding what probability to give the prediction, we can employ a number of heuristics (partially drawn from ["Techniques for probability estimates"](http://lesswrong.com/lw/3m6/techniques_for_probability_estimates)):
1. What does the prediction about the future world imply about the present world?
Every prediction one makes is also a *retrodiction*: you are claiming that the world is now and in the past on a course towards the future you have picked out of all the possibilities (or not on that course), and on that course to the degree you specified. What does your claim imply about the world as it is now? The world has to be in a state which can progress of its own internal logic to the future state, and so we can work backwards to figure out what that implies about the present or past. (You can think of this as a kind of proof by contradiction: assuming prediction _X_ is true, what can we infer from _X_ about the present world which is absurd?)
In our first example, Miller predicted 15% for ["Within ten years either genetic manipulation or embryo selection will have been used on at least 50% of Chinese babies to increase the babies' expected intelligence"](http://predictionbook.com/predictions/1689). This initially seems reasonable: China is a big place with known interests in eugenics. But then we start working backwards - this prediction implies handling >=9 million pregnancies annually, which entails hundreds of thousands of gynecologists, geneticists, lab technicians etc., which all have lead-times measured in years or decades. (It takes a long time to train a doctor even if your standards are low.) And the program must be set up with hundreds of thousands of employees, policies experimented with and implemented, and so on. As matters stand, even in the United States mere [SNP](!Wikipedia "Single-nucleotide polymorphism") genotyping couldn't be done for 9 million people annually, and genetic sequencing is much more expensive & difficult, and genetic modification is even hairier. If we work backwards, we would expect to see such a program already begun and active as it frantically tries to scale up to handle those millions of cases a year in order to hit Miller's deadline. But as far as I know, all the pieces were absent in China as of the day it was predicted; hence, it's already too late. And then there are the politics; it is a deeply doubtful assertion that the Chinese population would countenance this, given the stress over the [One Child policy](!Wikipedia) and the continuing [selective abortion](!Wikipedia "Sex-selective abortion") crisis. Even if the prediction comes true eventually, it definitely will not come true in time. (The same logic applies to ["Within ten years the SAT testing service will require students to take a blood test to prove they are not on cognitive enhancing drugs."](http://predictionbook.com/predictions/1691); [~1.65 million test-takers](http://press.collegeboard.org/releases/2011/43-percent-2011-college-bound-seniors-met-sat-college-and-career-readiness-benchmark) implies scores of thousands of [phlebotomists](!Wikipedia), who do not exist, although in theory they could be trained in under a year - but whence the trainers?)
A second example would be a series of predictions on anti-aging/life-extension registered in November 2011. The first and earliest prediction - ["By 2025 there will be at least one confirmed person who has lived to 130"](http://predictionbook.com/predictions/3847) - initially seems at least possible (I am optimistic about the approaches suggested by [SENS](!Wikipedia)), and so I assigned it a reasonable probability of 3%. But I felt troubled - something about it seemed wrong. So I applied this heuristic: what does the existence of a 130-year-old in 2025 imply about people in 2011? Well, if someone is 130 in 2025, then that implies they are now 116 years old ($130 - (2025 - 2011)$). Then I looked up the then-oldest person in the world: [Besse Cooper](!Wikipedia), aged 11*5* years old. Oops. It's *impossible* for the prediction to come true, but because we didn't think about what it coming true implied about the present world, we made an absurdly high prediction. We can do this for all the other anti-aging predictions; for example ["By 2085 there will be at least one confirmed person who has lived to 150"](http://predictionbook.com/predictions/4431) can be rephrased as 'someone aged 76 now will live to 2085', which seems implausible except with a [technological singularity](!Wikipedia) of some sort ("Hmm, phrased in that context, my estimate has to go down"). This can be applied to financial or economic questions, too, since under even the weakest version of [efficient markets](!Wikipedia), the markets are smarter than you - [Tyler Cowen](!Wikipedia) asks [why we don't](http://marginalrevolution.com/marginalrevolution/2011/11/what-is-the-future-of-solar-power.html) see investors piling into solar power if it's following an exponential curve downwards and is such a great idea ([Robin Hanson](http://www.overcomingbias.com/2011/11/when-see-solar.html) appeals to discount rates and purblind investors).
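The age arithmetic is trivial but worth making mechanical; a minimal Python sketch of the rephrasing check (115 being Besse Cooper's age at the time the predictions were registered):

```python
def required_current_age(target_age: int, target_year: int, current_year: int = 2011) -> int:
    """Rephrase 'someone will be TARGET_AGE in TARGET_YEAR' as a claim about the present."""
    return target_age - (target_year - current_year)

OLDEST_LIVING = 115  # Besse Cooper, November 2011

for age, year in [(130, 2025), (150, 2085)]:
    needed = required_current_age(age, year)
    verdict = "already impossible" if needed > OLDEST_LIVING else "possible, but reconsider"
    print(f"age {age} by {year}: implies a {needed}-year-old today -> {verdict}")
# age 130 by 2025: implies a 116-year-old today -> already impossible
# age 150 by 2085: implies a 76-year-old today -> possible, but reconsider
```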
The idea of 'rephrasing' leads directly into the next heuristic.
2. [Base rates](!Wikipedia). Already discussed, but base rates should be your mental starting point for every prediction, before you take into account any other opinion or belief.
Base rates are easily expressed in terms of frequencies: "of the last Y years, X happened only once, so I will start with a probability of 1/Y per year". ("There are 10 candidates for the 2012 Republican nominee, so I will assume 10% until I've looked at each candidate more closely.") Frequency formats have a long history in the academic literature of making suboptimal or fallacious performance simply disappear[^Rowe-frequencies], and there's no reason to think that is not true for your predictions as well. This works for personal predictions as well - focus on what sort of person you are and how you've done in similar cases over the years, and you'll improve your predictions[^personal-predictions].
An example: ["A Level 7 (Chernobyl/2011 Japan level) nuclear accident will take place by end of 2020"](http://predictionbook.com/predictions/4654). One's gut impression is a very bad place to start because Fukushima and Chernobyl - mentioned in the very prediction! - are such vivid and [mentally available](!Wikipedia "Availability heuristic") examples. 60%? 50%? Read the coverage of Fukushima and many people give every impression of expecting fresh disasters in coming years. (Look at Germany quickly announcing the shutdown of its nuclear reactors, despite tsunamis not being a *frequent* problem in northern Europe, shall we say.) But if we *start* with base rates and look up nuclear accidents, we realize something interesting: Chernobyl and Fukushima come to mind readily in part because they are - literally - the *only* such level-7 accidents over the past >40 years. So the frequency would be 1 in ~20 years, which puts a different face on a prediction spanning 9 years. This gives us a base rate more like ~40%. This is our starting point for asking how much does the rate go down because Fukushima has prompted additional safety improvements or closure of older plants (Fukushima's equally-outdated sibling nuclear plants will have a harder time getting stays in execution) and how much the rate goes up due to global warming or aging nuclear plants. But from here we can hope to arrive at a sensible answer and not be spooked by a recent incident.
3. Breaking predictions down into conjunctions
Similar to heuristic #1, we may not realize what a prediction implies *internally* and so wind up giving high probability to [a vivid or interesting scenario](!Wikipedia "Conjunction fallacy").
"Hillary Clinton will become President in 2016" is specific, easily dateable, implies things about the present world like rumors of Clinton running and strong political connections (as do exist), and yet this prediction is *still easy to mess up* for someone in 2012. Why? Because becoming President is actually the outcome of a long series of steps, every one of which must be successful and every one of which is doubtful: Hillary must resign from the White House where she was then Secretary of State, she must announce a run, she must become Democratic nominee (out of several candidates), and she must actually win. It's the exceptional nominee who ever has >50% odds, so we start with a coin flip and work our way down to perhaps a few percent. This is more plausible than most national-level Democrats, but not as plausible as pundits might lead you to believe.
We can see a particularly striking failure to analyze in the prediction ["Obama gets reelected and during that time Hillary Clinton brokers the middle east peace deal between Israel and Palestine for the two state solution. This secures her presidency in 2016."](http://predictionbook.com/predictions/4145), where the predictor gave it a flabbergasting *80%*; before clicking through, the reader is invited to assign probabilities to the following events (and then multiply them to obtain the probability that they will *all* come true):
1. Barack Obama is re-elected
2. A Middle East peace deal is brokered
3. The peace deal is for a two state solution
4. Hillary Clinton runs in 2016
5. Hillary Clinton is the 2016 Democratic nominee
6. Hillary Clinton is elected
(Sometimes the examples are [even more extreme](http://predictionbook.com/predictions/5070) than 6 clauses.) This heuristic is not perfect, as it works best on step-by-step processes where every step must happen. If this is not true, the heuristic will be overly pessimistic. Worse, it is possible to lie to ourselves by simply breaking down the steps into ever tinier steps and giving them relatively small probabilities like 99%: the opposite of the good heuristic is the bad [Subadditivity effect](!Wikipedia), where if we then multiply out each of our exaggerated sub-steps, we wind up being absurdly skeptical. [Steven Kaas](https://twitter.com/#!/stevenkaas/status/9852587905912832) furnishes an example:
> Walking requires dozens of different muscles working together, so if you think you can walk you're just committing the conjunction fallacy.
(One more complex use of this heuristic is to combine it with a [hope function analysis](/docs/statistics/1994-falk "'The Ups and Downs of the Hope Function In a Fruitless Search', Falk et al 1994"): decide the odds of it ever happening, the last date it could happen by, and then you can figure out how your confidence will change in each year that goes by without it happening. I have found this useful in thinking about Artificial Intelligence, which is something that may or may not happen but which one should *somehow* be changing one's opinion on as another year goes by with no H.A.L.)
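A minimal sketch of such a hope-function update, under the simplifying assumption - mine, for illustration - that if the event happens at all, it is equally likely to happen in any year up to the last possible date:

```python
def hope_update(prior: float, deadline_years: int, years_elapsed: int) -> float:
    """Posterior P(event ever happens) after years of silence, assuming a
    uniform arrival distribution over the window if it happens at all."""
    p_silence_if_true = (deadline_years - years_elapsed) / deadline_years
    p_silence = prior * p_silence_if_true + (1 - prior)
    return prior * p_silence_if_true / p_silence

# eg. 80% that AI ever happens, with a last possible date 50 years out:
for t in [0, 10, 25, 40, 49]:
    print(t, round(hope_update(0.8, 50, t), 3))
# 0.8, 0.762, 0.667, 0.444, 0.074:
# confidence erodes only slowly at first, then collapses near the deadline
```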
4. Building predictions up into disjunctions
One of the problems with non-frequency information is that we're not always good at an 'absolute pitch' for probability - we may have intuitive probabilities but they are fuzzy. On the other hand, comparisons are much easier: I may not be able to say that Obama had a 52.5% chance of election vs McCain at 47.3%, but I can tell you which guy was on the happier side of 50%. This suggests we pit predictions against each other: I pit my intuition about Obama against my intuition about McCain and I see Obama comes out on top. The more predictions you can pit against each other the better, which ultimately leads to an exhaustive list of outcomes, a full disjunction: "either Obama (52.5%) *or* McCain (47.3%) *or* Nader (0.2%) will win".
Surprised to see Ralph Nader there? He ran too, you know. This is one of the pitfalls of disjunctive reasoning (as overstated conditionality and floors on percentages are a pitfall of conjunctive reasoning), the pitfall of the possibilities you forgot to list and make room for.
Nader is pretty trivial, but imagine you were discussing Middle Eastern politics and your interlocutor immediately goes "either Israel will aerially attack Iran *or* Israel will launch covert ops *or* the US will aerially attack Iran *or*..." If you dutifully begin assigning probabilities ("let's see, 15% sounds reasonable, and covert ops is a lot less probable so we'll give that just 5%, and then the US is just as likely to attack Iran so that's 15% too, and..."), you find you have somehow concluded Iran will be attacked, 35%+, when no prediction market remotely agrees with you! What happened? You read about one disjunct ("Iran will be attacked, period") divided up into fine detail, [anchored](!Wikipedia "Anchoring") on it, and ignored how many possibilities were also being tucked away under "Iran will not be attacked, period". If you had constructed your own disjunction before listening to the other guy, you might have instead said that no-attack was 80%+ probable, and then correctly divvied up the remaining percentage among the various attack options. Even domain-experts have problems when the tree of categories or outcomes is presented to them with modifications, unfortunately[^disjunctions].
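The repair can be made mechanical: fix your own top-level disjunction first, then shrink the fine-grained disjuncts proportionally so the whole thing still sums to 100%. A minimal sketch with illustrative numbers:

```python
# Probabilities assigned one-by-one while anchored on the attack scenarios:
fine_grained = {
    "Israeli airstrike": 0.15,
    "Israeli covert op": 0.05,
    "US airstrike":      0.15,
}
implied_attack = sum(fine_grained.values())  # 0.35 - no prediction market agrees

p_no_attack = 0.80                 # construct your own top-level disjunction first
budget = 1 - p_no_attack           # only 0.20 remains for *all* attack options combined
rescaled = {k: round(v * budget / implied_attack, 3) for k, v in fine_grained.items()}
print(rescaled)  # each disjunct shrinks proportionally; the full disjunction sums to 1
```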
5. Sets of predictions must be consistent: a full set of disjunctions must add to 100%, the probability something will happen and will not happen must also sum to 100%, etc.[^Rowe-consistency] It's surprising how often people mess this up.
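Such coherence checks are easy to automate; a minimal sketch:

```python
def check_coherence(outcomes, tol=1e-9):
    """Verify that a mutually-exclusive, exhaustive set of predictions sums to 1."""
    total = sum(outcomes.values())
    if abs(total - 1) > tol:
        raise ValueError(f"incoherent: outcomes sum to {total:.3f}, not 1.0")

check_coherence({"Obama": 0.525, "McCain": 0.473, "Nader": 0.002})  # passes

try:
    check_coherence({"will happen": 0.7, "will not happen": 0.4})
except ValueError as e:
    print(e)  # incoherent: outcomes sum to 1.100, not 1.0
```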
[^Rowe-consistency]: Rowe & Wright 2001:
> - Use coherence checks when eliciting estimates of probabilities.
>
> Assessed probabilities are sometimes incoherent. One useful *coherence* check is to elicit from the forecaster not only the probability (or confidence) that an event will occur, but also the probability that it will not occur. The two probabilities should sum to one. A variant of this technique is to *decompose* the probability of the event not occurring into the occurrence of other possible events. If the events are mutually exclusive and exhaustive, then the addition rule can be applied, since the sum of the assessed probabilities should be one. Wright and Whalley (1983) found that most untrained probability assessors followed the additivity axiom in simple two-outcome assessments involving the probabilities of an event happening and not happening. However, as the number of mutually exclusive and exhaustive events in a set increased, more forecasters became supra-additive, and to a greater extent, in that their assessed probabilities added up to more than one. Other coherence checks can be used when events are interdependent (Goodwin and Wright 1998; Wright, et al. 1994).
>
> There is a debate in the literature as to whether decomposing analytically complex assessments into analytically more simple marginal and conditional assessments of probability is worthwhile as a means of simplifying the assessment task. This debate is currently unresolved (Wright, Saunders and Ayton 1988; Wright et al. 1994). Our view is that the best solution to problems of inconsistency and incoherence in probability assessment is for the pollster to show forecasters the results of such checks and then allow interactive resolution between them of departures from consistency and coherence. MacGregor (2001) concludes his review of decomposition approaches with similar advice.
>
> When assessing probability distributions (e.g., for the forecast range within which an uncertain quantity will lie), individuals tend to be overconfident in that they forecast too narrow a range. Some response modes fail to counteract this tendency. For example, if one asks a forecaster initially for the median value of the distribution (the value the forecaster perceives as having a 50% chance of being exceeded), this can act as an anchor. Tversky and Kahneman (1974) were the first to show that people are unlikely to make sufficient adjustments from this anchor when assessing other values in the distribution. To counter this bias, Goodwin and Wright (1998) describe the "probability method" for eliciting probability distributions, an assessment method that de-emphasizes the use of the median as a response anchor. McClelland and Bolger (1994) discuss overconfidence in the assessment of probability distributions and point probabilities. Wright and Ayton (1994) provide a general overview of psychological research on subjective probability. Arkes (2001) lists a number of principles to help forecasters to counteract overconfidence.
[^Rowe-frequencies]: For example, the famous and replicated examples of doctors failing to correctly apply Bayes' theorem to cancer rates is reduced when the percentages are translated into frequencies. Rowe & Wright 2001 give this advice:
> - When possible, give estimates of uncertainty as frequencies rather than probabilities or odds.
>
> Many applications of Delphi require panelists to make either numerical estimates of the probability of an event happening in a specified time period, or to assess their confidence in the accuracy of their predictions. Researchers on behavioral decision making have examined the adequacy of such numerical judgments. Results from these findings, summarized by Goodwin and Wright (1998), show that sometimes judgments from *direct* assessments (what is the probability that...?) are inconsistent with those from *indirect* methods. In one example of an indirect method, subjects might be asked to imagine an urn filled with 1,000 colored balls (say, 400 red and 600 blue). They would then be asked to choose between betting on the event in question happening, or betting on a red ball being drawn from the urn (both bets offering the same reward). The ratio of red to blue balls would then be varied until a subject was *indifferent* between the two bets, at which point the required probability could be inferred. Indirect methods of eliciting subjective probabilities have the advantage that subjects do not have to verbalize numerical probabilities. Direct estimates of odds (such as 25 to 1, or 1,000 to 1), perhaps because they have no upper or lower limit, tend to be more extreme than direct estimates of *probabilities* (which must lie between zero and one). If probability estimates derived by different methods for the same event are inconsistent, which method should one take as the true index of degree of belief? One way to answer this question is to use a single method of assessment that provides the most consistent results in repeated trials. In other words, the subjective probabilities provided at different times by a single assessor for the same event should show a high degree of agreement, given that the assessor's knowledge of the event is unchanged. Unfortunately, little research has been done on this important problem. Beach and Phillips (1967) evaluated the results of several studies using direct estimation methods. Test-retest correlations were all above 0.88, except for one study using students assessing odds, where the reliability was 0.66.
>
> Gigerenzer (1994) provided empirical evidence that the untrained mind is not equipped to reason about uncertainty using subjective probabilities but is able to reason successfully about uncertainty using frequencies. Consider a gambler betting on the spin of a roulette wheel. If the wheel has stopped on red for the last 10 spins, the gambler may feel subjectively that it has a greater probability of stopping on black on the next spin than on red. However, ask the same gambler the relative frequency of red to black on spins of the wheel and he or she may well answer 50-50. Since the roulette ball has no memory, it follows that for each spin of the wheel, the gambler should use the latter, relative frequency assessment (50-50) in betting. Kahneman and Lovallo (1993) have argued that forecasters tend to see forecasting problems as unique when they should think of them as instances of a broader class of events. They claim that people's natural tendency in thinking about a particular issue, such as the likely success of a new business venture, is to take an "inside" rather than an "outside" view. Forecasters tend to pay particular attention to the distinguishing features of the particular event to be forecast (e.g., the personal characteristics of the entrepreneur) and reject analogies to other instances of the same general type as superficial. Kahneman and Lovallo cite a study by Cooper, Woo, and Dunkelberger (1988), which showed that 80% of entrepreneurs who were interviewed about their chances of business success described this as 70% or better, while the overall survival rate for new business is as low as 33 percent. Gigerenzer's advice, in this context, would be to ask the individual entrepreneurs to estimate the proportion of new businesses that survive (as they might make accurate estimates of this relative frequency) and use this as an estimate of their own businesses surviving. Research has shown that such interventions to change the required response mode from subjective probability to relative frequency improve the predictive accuracy of elicited judgments. For example, Sniezek and Buckley (1991) gave students a series of general knowledge questions with two alternative answers for each, one of which was correct. They asked students to select the answer they thought was correct and then estimate the probability that it was correct. Their results showed the same general overconfidence that Arkes (2001) discusses. However, when Sniezek and Buckley asked respondents to state how many of the questions they had answered correctly of the total number of questions, their frequency estimates were accurate. This was despite the fact that the same individuals were generally overconfident in their subjective probability assessments for individual questions. Goodwin and Wright (1998) discuss the usefulness of distinguishing between single-event probabilities and frequencies. If a reference class of historic frequencies is not obvious, perhaps because the event to be forecast is truly unique, then the only way to assess the likelihood of the event is to use a subjective probability produced by judgmental heuristics. Such heuristics can lead to judgmental overconfidence, as Arkes (2001) documents.
[^personal-predictions]: From [_Principles of Forecasting_](/docs/predictions/2001-principlesforecasting.pdf "Armstrong et al 2001"):
> [Osberg and Shrauger (1986)](/docs/predictions/1986-osberg.pdf "Self-Prediction: Exploring the Parameters of Accuracy") determined prediction accuracy by scoring an item as a hit if the respondents predicted the event definitely or probably would occur and it did, or if the respondent predicted that the event definitely or probably would not occur and it did not. Respondents who were instructed to focus on their own personal dispositions predicted [statistically-]significantly more of the 55 items correctly (74%) than did respondents in the control condition who did not receive instructions (69%). Respondents whose instructions were to focus on personal base rates had higher accuracy (72%) and respondents whose instructions were to focus on population base rates had lower accuracy (66%) than control respondents, although these differences were not statistically-significant.
[^disjunctions]: From [Richards Heuer](!Wikipedia)'s [_Psychology of Intelligence Analysis_](https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/art13.html):
> Ideally, intelligence analysts should be able to recognize what relevant evidence is lacking and factor this into their calculations. They should also be able to estimate the potential impact of the missing data and to adjust confidence in their judgment accordingly. Unfortunately, this ideal does not appear to be the norm. Experiments suggest that "out of sight, out of mind" is a better description of the impact of gaps in the evidence.
>
> This problem has been demonstrated using fault trees, which are schematic drawings showing all the things that might go wrong with any endeavor. Fault trees are often used to study the fallibility of complex systems such as a nuclear reactor or space capsule.
>
> A fault tree showing all the reasons why a car might not start was shown to several groups of experienced mechanics.^[96](/docs/predictions/1978-fischhoff.pdf "'Fault trees: Sensitivity of estimated failure probabilities to problem representation', Fischhoff et al 1978")^ The tree had seven major branches--insufficient battery charge, defective starting system, defective ignition system, defective fuel system, other engine problems, mischievous acts or vandalism, and all other problems--and a number of subcategories under each branch. One group was shown the full tree and asked to imagine 100 cases in which a car won't start. Members of this group were then asked to estimate how many of the 100 cases were attributable to each of the seven major branches of the tree. A second group of mechanics was shown only an incomplete version of the tree: three major branches were omitted in order to test how sensitive the test subjects were to what was left out.
>
> If the mechanics' judgment had been fully sensitive to the missing information, then the number of cases of failure that would normally be attributed to the omitted branches should have been added to the "Other Problems" category. In practice, however, the "Other Problems" category was increased only half as much as it should have been. This indicated that the mechanics shown the incomplete tree were unable to fully recognize and incorporate into their judgments the fact that some of the causes for a car not starting were missing. When the same experiment was run with non-mechanics, the effect of the missing branches was much greater.
>
> As compared with most questions of intelligence analysis, the "car won't start" experiment involved rather simple analytical judgments based on information that was presented in a well-organized manner. That the presentation of relevant variables in the abbreviated fault tree was incomplete could and should have been recognized by the experienced mechanics selected as test subjects. Intelligence analysts often have similar problems. Missing data is normal in intelligence problems, but it is probably more difficult to recognize that important information is absent and to incorporate this fact into judgments on intelligence questions than in the more concrete "car won't start" experiment.
# See also
- [Statistically judging the 2012 election forecasters](2012 election predictions "Compiling academic and media forecaster's 2012 American presidential election predictions and statistically judging correctness; Nate Silver was not the best")
# External links
- ["Calibrate your self-assessments"](http://lesswrong.com/lw/7o7/calibrate_your_selfassessments/) -(miscalibration of one's capability, performance, personal appearance etc. can cause suffering & stress)
- ["Calibrating our Confidence"](http://measureofdoubt.com/2011/08/17/calibrating-our-confidence/) -(on PB.com)
- ["Amanda Knox: post mortem"](http://lesswrong.com/lw/84j/amanda_knox_post_mortem/) -(even if we cannot infallibly judge our predictions, our beliefs should still change over time)
- ["PredictionBook: A Short Note"](http://lesswrong.com/r/discussion/lw/8dx/predictionbook_a_short_note/)
- ["How Using a Decision Journal can Help you Make Better Decisions"](http://www.farnamstreetblog.com/2014/02/decision-journal/)
- [Hacker News discussion](https://news.ycombinator.com/item?id=6489135)
# Appendices
## _Modus tollens_ vs _modus ponens_
For an explanation of this aphorism, see ["Knowing your argumentative limitations, OR one rationalist's _modus ponens_ is another's _modus tollens_."](http://www.overcomingbias.com/2008/01/knowing-your-ar.html); a modern version is [G. E. Moore](!Wikipedia)'s [here is a hand](!Wikipedia) argument, and it is related to the [Duhem-Quine thesis](!Wikipedia). It can be considered a flaw in uses of proofs by contradiction or the _[reductio ad absurdum](!Wikipedia)_ - how does one know the conclusion really is absurd and to reject one of the premises instead of perhaps "biting the bullet", as in [a mock review of smoking's benefits](http://lesswrong.com/lw/8j1/how_to_prove_anything_with_a_review_article/5aw9)? A fun Islamic version goes ([Imam al-Haddad](!Wikipedia), _The Sublime Treasures_):
> Nothing can be soundly understood \
> If daylight itself needs proof.
When Roberts argues that one's subjective memories about sleep conflict with the sleep data recorded by one's Zeo EEG, does that constitute a disproof of the Zeo's accuracy? No: establishing contradictions between one's memories/subjective impressions and the Zeo merely tells us that *one* (or both) are wrong; it doesn't tell us that the *Zeo* is wrong unless you have additional data or arguments which say that the Zeo is less reliable than the memories. One could take the Zeo contradicting memories as just proof of the [fallibility of sleep-related memories](http://web.archive.org/web/20120628120817/http://blog.myzeo.com/sleep-forgetting-to-remember-to-forget/)[^sleep-fallibility]! (The fundamental question of epistemology: "What do you believe, and why do you believe it?")
[^sleep-fallibility]: That sleep affects consciousness & memory is an uncontroversial claim; eg in the [_WSJ_](http://www.marketwatch.com/Story/story/print?guid=C5AC50E0-C305-11E2-BA61-002128040CF6 "10 things the sleep-aid industry won't tell you: A little light reading to distract you from the cracks in your ceiling"):
> One little known aspect of insomnia is that the seemingly sleep-deprived often underestimate (or overestimate) how much shut-eye they're getting, says Matt Bianchi, director of the sleep division at Massachusetts General Hospital in Boston. "They could sleep seven hours in the sleep lab and they would say they didn't sleep one minute," says Bianchi, adding that many patients also wake up multiple times without remembering it. That disconnect has sparked numerous apps and gadgets that offer to help people gauge how much sleep they're getting.
For example, if someone is caught on camera sleep-walking, and denies strenuously that he was sleep-walking, do you take _modus ponens_ and say his memories prove he was not sleep-walking and reject the camera footage; or _modus tollens_ and say that the claim that his sleep memories are reliable implies he could not have been caught on camera, but he was, therefore we can reject the claim that his sleep memories imply no walking? But extraordinary claims require extraordinary evidence, so obviously you choose to take _modus tollens_ - because you have priors which say that memories are malleable and untrustworthy, while camera footage is much harder to fake. Before the discovery of the timing error, the 2011 [FTL neutrino anomaly](!Wikipedia "Faster-than-light neutrino anomaly") was an excellent place to apply this ['I defy (that particular) data'](http://wiki.lesswrong.com/wiki/Defying_the_data) reasoning, as are such errors in general: [Steven Kaas](http://web.archive.org/web/20131020232721/http://www.acceleratingfuture.com/steven/?p=238 "A New Challenge to 98% Confidence") puts it nicely:
> According to [the 2009 blog post] ["A New Challenge to Einstein"](http://blogs.discovermagazine.com/cosmicvariance/2009/10/12/a-new-challenge-to-einstein/), General Relativity has been refuted at 98% confidence. I wonder if it wouldn't be more accurate to say that, actually, 98% confidence has been refuted at General Relativity.
Similarly, if I read [Sturrock 2013](http://www.amazon.com/dp/0984261419/ "AKA Shakespeare: A Scientific Approach to the Authorship Question") on the [Shakespeare authorship question](!Wikipedia) where he argues that Shakespeare's plays were not written by Shakespeare but by [De Vere](!Wikipedia "Edward de Vere, 17th Earl of Oxford"), and gives an example analysis concluding that the odds Shakespeare wrote his plays is [1 in 10^13 (10 trillion)](http://lesswrong.com/r/discussion/lw/huj/book_aka_shakespeare_an_extended_bayesian/), then I am apt to think that the available evidence on the issue could *never* afford us an extraordinary level of certainty, that we do not have this level of certainty in things like the theory of relativity, and that Sturrock has instead proven his analysis to be 1 in trillions likely to be a valid analysis! When any method claims to have reached such an extraordinary level of certainty, it has certainly disproven itself. (This can be applied on a much smaller scale as a [statistical power](!Wikipedia) analysis: asking to what extent the employed data could ever support the conclusion reached by a regular analysis.)
This point isn't always appreciated: when you have 2 contradicting claims or arguments, only 1 can be correct but the contradiction doesn't tell you *which* one is correct. You need to step outside the argument and find additional data or perspectives. From [Gary Drescher](!Wikipedia)'s [_Good and Real: Demystifying Paradoxes from Physics to Ethics_](/docs/2006-drescher-goodandreal.pdf):
> A paradox arises when two seemingly airtight arguments lead to contradictory conclusions - conclusions that cannot possibly both be true. It's similar to adding a set of numbers in a two-dimensional array and getting different answers depending on whether you sum up the rows first or the columns. Since the correct total must be the same either way, the difference shows that an error must have been made in at least one of the two sets of calculations. But it remains to discover at which step (or steps) an erroneous calculation occurred in either or both of the running sums. There are two ways to rebut an argument. We might call them countering and invalidating.
>
> - To counter an argument is to provide another argument that establishes the opposite conclusion.
> - To invalidate an argument, we show that there is some step in that argument that simply does not follow from what precedes it (or we show that the argument's premises - the initial steps - are themselves false).
>
> If an argument starts with true premises, and if every step in the argument does follow, then the argument's conclusion must be true. However, invalidating an argument - identifying an incorrect step somewhere - does not show that the argument's conclusion must be false. Rather, the invalidation merely removes that argument itself as a reason to think the conclusion true; the conclusion might still be true for other reasons. Therefore, to firmly rebut an argument whose conclusion is false, we must both invalidate the argument and also present a counterargument for the opposite conclusion.
>
> In the case of a paradox, invalidating is especially important. Whichever of the contradictory conclusions is incorrect, we've already got an argument to counter it - that's what makes the matter a paradox in the first place! Piling on additional counterarguments may (or may not) lead to helpful insights, but the counterarguments themselves cannot suffice to resolve the paradox. What we must also do is invalidate the argument for the false conclusion - that is, we must show how that argument contains one or more steps that do not follow.
>
> Failing to recognize the need for invalidation can lead to frustratingly circular exchanges between proponents of the conflicting positions. One side responds to the other's argument with a counterargument, thinking it a sufficient rebuttal. The other side responds with a counter-counterargument - perhaps even a repetition of the original argument - thinking it an adequate rebuttal of the rebuttal. This cycle may persist indefinitely. With due attention to the need to invalidate as well as counter, we can interrupt the cycle and achieve a more productive discussion.
An example from mathematics by [Timothy Gowers](!Wikipedia) ("Vividness in Mathematics and Narrative", in [_Circles Disturbed: The Interplay of Mathematics and Narrative_](http://www.amazon.com/Circles-Disturbed-Interplay-Mathematics-Narrative/dp/0691149046/)):
> ...a suggestion was made that [proofs by contradiction](!Wikipedia) are the mathematician's version of [irony](!Wikipedia). I'm not sure I agree with that: when we give a proof by contradiction, we make it very clear that we are discussing a counterfactual, so our words *are* intended to be taken at face value. But perhaps this is not necessary. Consider the following passage.
>
>> There are those who would believe that every polynomial equation with integer coefficients has a rational solution, a view that leads to some intriguing new ideas. For example, take the equation x² - 2 = 0. Let p/q be a rational solution. Then (p/q)² - 2 = 0, from which it follows that p² = 2q². The highest power of 2 that divides p² is obviously an even power, since if 2^k^ is the highest power of 2 that divides p, then 2^2k^ is the highest power of 2 that divides p². Similarly, the highest power of 2 that divides 2q² is an odd power, since it is greater by 1 than the highest power that divides q². Since p² and 2q² are equal, there must exist a positive integer that is both even and odd. Integers with this remarkable property are quite unlike the integers we are familiar with: as such, they are surely worthy of further study.
>
> I find that it conveys the irrationality of √2 rather forcefully. But could mathematicians afford to use this literary device? How would a reader be able to tell the difference in intent between what I have just written and the following superficially similar passage?
>
>> There are those who would believe that every polynomial equation has a solution, a view that leads to some intriguing new ideas. For example, take the equation x² + 1 = 0. Let i be a solution of this equation. Then i² + 1 = 0, from which it follows that i² = -1. We know that i cannot be positive, since then i² would be positive. Similarly, i cannot be negative, since i² would again be positive (because the product of two negative numbers is always positive). And i cannot be 0, since 0² = 0. It follows that we have found a number that is not positive, not negative, and not zero. Numbers with this remarkable property are quite unlike the numbers we are familiar with: as such, they are surely worthy of further study.
Indeed, how *would* a reader tell the difference - why do we apply _modus tollens_ when we accept √2 must be irrational but then apply _modus ponens_ and accept [_i_](!Wikipedia "Imaginary unit") as being real in some sense? Do we simply appeal to the utility of using _i_, and say with Wittgenstein, "If a contradiction were now actually found in arithmetic - that would only prove that an arithmetic with *such* a contradiction in it could render very good service; and it would be better for us to modify our concept of the certainty required, than to say it would really not yet have been a proper arithmetic." But such use of priors may lead us to fanaticism:
> "An atheist familiar with biology and medicine has no reason to believe the biblical story of the resurrection. But a Christian who believes it by faith should not, according to [Plantinga](!Wikipedia "Alvin Plantinga"), be dissuaded by general biological evidence. Plantinga compares the difference in justified beliefs to a case where you are accused of a crime on the basis of very convincing evidence, but you know that you didn't do it. For you, the immediate evidence of your memory is not defeated by the public evidence against you, even though your memory is not available to others. Likewise, the Christian's faith in the truth of the gospels, though unavailable to the atheist, is not defeated by the secular evidence against the possibility of resurrection. Of course sometimes contrary evidence may be strong enough to persuade you that your memory is deceiving you. Something analogous can occasionally happen with beliefs based on faith, but it will typically take the form, according to Plantinga, of a change in interpretation of what the Bible means. This tradition of interpreting scripture in light of scientific knowledge goes back to Augustine, who applied it to the "days" of creation. But Plantinga even suggests in a footnote that those whose faith includes, as his does not, the conviction that the biblical chronology of creation is to be taken literally can for that reason regard the evidence to the contrary as systematically misleading. One would think that this is a consequence of his epistemological views that he would hope to avoid." --[Thomas Nagel](!Wikipedia), ["A Philosopher Defends Religion"](http://www.nybooks.com/articles/archives/2012/sep/27/philosopher-defends-religion/?pagination=false)
As is characteristic of this philosophical use, we will often find instances of the disagreement whenever foundational issues like methodology come up in a field; an example from [sociology](!Wikipedia) is provided by the classic paper ["The Iron Law Of Evaluation And Other Metallic Rules"](/docs/1987-rossi "Peter H. Rossi 1987") (emphasis added):
> A possibility that deserves very serious consideration is that there is something radically wrong with the ways in which we go about conducting evaluations. Indeed, this argument is the foundation of a revisionist school of evaluation, composed of evaluators who are intent on calling into question the main body of methodological procedures used in evaluation research, especially those that emphasize quantitative and particularly experimental approaches to the estimation of net impacts. The revisionists include such persons as Michael Patton (1980) and Egon Guba (1981). Some of the revisionists are reformed number crunchers who have seen the errors of their ways and have been reborn as qualitative researchers. Others have come from social science disciplines in which qualitative ethnographic field methods have been dominant. Although the issue of the appropriateness of social science methodology is an important one, so far the revisionist arguments fall far short of being fully convincing. At the root of the revisionist argument appears to be that *the revisionists find it difficult to accept the findings that most social programs, when evaluated for impact assessment by rigorous quantitative evaluation procedures, fail to register main effects: hence the defects must be in the method of making the estimates*. This argument per se is an interesting one, and deserves attention: all procedures need to be continually re-evaluated. There are some obvious deficiencies in most evaluations, some of which are inherent in the procedures employed. For example, a program that is constantly changing and evolving cannot ordinarily be rigorously evaluated since the treatment to be evaluated cannot be clearly defined. Such programs either require new evaluation procedures or should not be evaluated at all.
[Bryan Caplan](!Wikipedia) offers [an interesting example](http://online.wsj.com/article/SB10000872396390444180004578018480061824600.html "The Intelligence Boom: If your IQ is in the top bracket, your verbal smarts will be about as good in your 80s as they were in your teens") of its use in politics:
> Other sections - most notably [Mr. Flynn's](!Wikipedia "James R. Flynn") attack on the death penalty - are also tainted by serious left-wing bias. He is eager to argue that the "competent" murderers of 1960 were "mentally retarded" by modern standards. You could just as easily conclude, however, that the "mentally retarded" murderers of today are "competent" by the standards of 1960. Mr. Flynn briefly considers this argument, and objects that while IQ has risen, "practical intelligence" - the "ability to live autonomous lives" - hasn't. If so, the "Flynn effect" has no effect on this debate: Can't we simply use unadjusted IQ scores as a proxy for practical intelligence?
And [later commented](http://econlog.econlib.org/archives/2013/03/inescapable_int.html "Inescapable Intuition") on the issue of "common sense" & burdens of proof:
> Hasn't common sense been wrong before? Of course. But how do people show that a common sense view is wrong? By demonstrating a conflict with other views *even* more firmly grounded in common sense. The strongest scientific evidence can always be rejected if you're willing to say, "Our senses deceive us" or "Memory is never reliable" or "All the scientists have conspired to trick us." The only problem with these foolproof intellectual defenses is... that... they're... absurd.
See also ["Inherited Improbabilities: Transferring the Burden of Proof"](http://lesswrong.com/lw/35d/inherited_improbabilities_transferring_the_burden/) on application to the [Amanda Knox](!Wikipedia) case.
## Case Study: Testing Confirmation Bias
> Original [LessWrong](http://lesswrong.com/r/discussion/lw/c5f/case_study_testing_confirmation_bias/#comments) discussion
[Confirmation bias](http://en.wikipedia.org/wiki/Confirmation_bias) is one of the most common cognitive biases: it is putting excess weight on evidence that confirms your belief, and even ignoring evidence falsifying your belief. It's particularly dreadful because at no point does one say something that is actually *wrong* - you can build up an immaculate scientific case for a wrong position by cherry-picking evidence. (A funny example: ["Cigarette smoking: an underused tool in high-performance endurance training"](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3001541/ "Myers 2010").)
Confirmation bias is an issue in [self-experimentation](!Wikipedia)/[Quantified Self](!Wikipedia) because one is already at a disadvantage in evaluating the results - humans don't weigh such small samples very well; [satt](http://lesswrong.com/r/discussion/lw/bs0/knowledge_value_knowledge_quality_domain/6dcn?context=1#6dbu) points out that (via the [Bienaymé formula](http://en.wikipedia.org/wiki/Variance#Sum_of_uncorrelated_variables_.28Bienaym.C3.A9_formula.29)) "An [RCT](!Wikipedia "Randomized controlled trial") with a sample size of e.g. 400 would still be 10 times better than 4 self-experiments by this metric." If one encountered 4 immaculately run self-experiments, I suspect they would *feel* like more evidence than 1/10th that RCT. When you toss in any selection effects (due to confirmation bias), the value of those 4 trials plunges even further.
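(satt's arithmetic is worth spelling out; a minimal sketch in R, assuming each data point is an uncorrelated measurement of equal variance - which is what the Bienaymé formula requires:)

```r
# Bienaymé: the variance of a mean of n uncorrelated measurements is sigma^2/n,
# so precision (1/standard error) grows only as sqrt(n).
sigma   <- 1                   # assume unit variance per measurement
se_rct  <- sigma / sqrt(400)   # a single RCT with n = 400
se_self <- sigma / sqrt(4)     # 4 pooled self-experiments, n = 4
se_self / se_rct               # sqrt(400/4) = 10: the RCT is 10x more precise
```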
[Fortunately](http://lesswrong.com/lw/im/hindsight_devalues_science/ "Hindsight Devalues Science"), just as there is a somewhat easy way to test for status quo bias, there's also a somewhat easy way to test for confirmation bias: simply present a high-quality result - with the reverse of the true outcome. Or present the initial data about the setup and whatnot, but hide the results. (The savvy will recognize this as similar to Robin Hanson's proposal for [result-blind peer review](http://www.overcomingbias.com/2012/04/who-wants-unbiased-journals.html).) Or better yet, present the same study with both egosyntonic and egodystonic outcomes: if the subject rates them differently, well, the only varying factor was the outcome... (One of the tests on [YourMorals.org](http://lesswrong.com/lw/8lk/poll_lesswrong_group_on_yourmoralsorg/) does just this by rewording a 'study' on gun control.) This is a strategy similar but not identical to a [Sokal affair](!Wikipedia), since the subjects in the Sokal affair could plausibly claim that they didn't understand the pseudo-physics and were trusting the word of a physics & mathematics professor in good standing - in a confirmation bias test, the subject must understand the material and reject it because it conflicts with his prior beliefs.
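(To illustrate the egosyntonic/egodystonic variant concretely - the ratings below are invented for the sketch, not actual YourMorals.org data - the analysis reduces to comparing ratings of two otherwise-identical write-ups:)

```r
# Hypothetical 1-7 'methodological quality' ratings of the same gun-control
# 'study', differing only in which way the reported outcome points:
egosyntonic <- c(6, 5, 7, 6, 5, 6, 7, 5)  # outcome agrees with the rater's politics
egodystonic <- c(4, 3, 5, 4, 4, 3, 5, 4)  # outcome disagrees
t.test(egosyntonic, egodystonic)  # a difference implicates the outcome, not the methods
```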
### Seth Roberts
> Amateur Science - \
> I do what I must, \
> because, I can. \
> For the good of all of us. \
> Except the ones who were tricked. \
> But there's no sense crying over the missing frills, \
> You just keep on trying until you run out of pills. \
> And the Science[^portal-thanks] was fun, \
> And you get neat posts done \
> For the people who are, still alive.
[^portal-thanks]: This parody has not been approved by the FDA for any medicinal purposes nor has it been replicated. I thank [Zack M. Davis](http://lesswrong.com/lw/c5f/case_study_testing_confirmation_bias/6hz5) for some excellent versifying suggestions.
[Seth Roberts](!Wikipedia) is a psychology professor and [blogger](http://blog.sethroberts.net/), famous for the unconventional diet proposed in his book _[The Shangri-La Diet](!Wikipedia)_; I've read his blog since ~2010, and found it filled with interesting self-experiment suggestions. He has also been [repeatedly mentioned positively on LessWrong](http://www.google.com/search?q=seth%20roberts%20site%3Alesswrong.com), for example in Eliezer Yudkowsky's post ["The Unfinished Mystery of the Shangri-La Diet"](http://lesswrong.com/lw/a6/the_mysteries_of_shangrila_dieting/). Roberts posts much material critical of mainstream psychology & medicine. Fair enough; I'm not a [huge fan](DNB FAQ#flaws-in-mainstream-science-and-psychology) either. But he also posts many anecdotes/interviews, most of which are unremittingly positive, and is willing to use sources - apparently not humorously - that I regard as utter cesspools. (For example, [Ayurvedic medicine](http://blog.sethroberts.net/2011/07/02/ancient-wisdom-butter-is-brain-food/ "Ancient Wisdom: Butter is Brain Food"), which you may remember as being keen on [heavy metal poisoning](!Wikipedia "Ayurveda#Safety").) All this began to make me wonder. Roberts has many publications and theories, so one might just read through them carefully and see how many were borne out; but unfortunately, few to no real trials have been done as far as I am able to tell. (When asked about the absence of trials in March 2010, Roberts pointed to [20 unpublished "case series"](http://andrewgelman.com/2010/03/clippin_it/#comment-53303 "Clippin' it") of his Shangri-La diet by a "professor at SUNY Upstate Medical Center".) Since it's not easy to directly check his work, one would have to resort to much less reliable & indirect methods.
#### Vitamin D
##### The First Experiment
Vitamin D is Roberts's latest theory, which he began posting on frequently starting around December 2011. If you look through the [vitamin D](http://blog.sethroberts.net/category/sleep/vitamin-d3-and-sleep/) category on his blog, you will see that most or all of the anecdotes are non-randomized retrospective reports covering <7 days, about generalities like wake-time or mood, with no real analysis. This is unfortunate, as sleep data is very noisy, and self-reported unrecorded data even worse. But I found the idea interesting enough (and [completely uncovered](Zeo#background) in the academic literature I searched) to drop my [modafinil](Modafinil) experiments and start setting up some decent self-experiments.
While I was still running my first [vitamin D & sleep experiment](Zeo#vitamin-d), I emailed with Roberts about it beginning 24 January 2012; back in June, he had [written favorably](http://web.archive.org/web/20120126152237/http://blog.myzeo.com/how-the-other-person-sleeps-seth-roberts-on-christine-petersons-zeo-research/ "How the Other Person Sleeps: Seth Roberts on Christine Peterson's Zeo Research") on Zeos. He seemed interested and asked for help interpreting the Zeo data set (which I had provided as a CSV export). His main reaction was that I was only testing whether vitamin D in the *evening* was damaging my sleep, which did not interest him since he was suggesting that vitamin D in the *morning* would help sleep; that I was randomizing individual days, which meant multi-day effects could contaminate the results; and that my blinding was too complicated, specifically my attempt to keep vitamin D consumption constant (so as not to confound levels of vitamin D with timing). He said he had done a _t_-test on the data I had posted so far, and the effect was there but not statistically-significant; he also said in that email that he didn't trust the Zeo summary score (ZQ) of sleep length/awakenings/composition. When I finished and posted my analysis showing that the damage had indeed reached significance on multiple metrics, Roberts said it was interesting work and he'd link it.
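(The sort of test involved is just a two-sample comparison of a Zeo metric across randomized days; a minimal sketch in R, with simulated numbers standing in for the real data:)

```r
set.seed(42)                                 # arbitrary seed for reproducibility
zq_active  <- rnorm(25, mean = 85, sd = 15)  # hypothetical ZQ on vitamin-D evenings
zq_placebo <- rnorm(25, mean = 95, sd = 15)  # hypothetical ZQ on placebo evenings
t.test(zq_active, zq_placebo)                # Welch two-sample t-test
```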
##### The Second Experiment
I had intended to stop there and go back to modafinil, but it got posted to Hacker News and apparently people were interested in it and what vitamin D in the morning would do. So I emailed Roberts again, after re-reading his criticisms, and proposed a design for a morning experiment: 5-day blocks, randomized as before, recorded the morning after, for a full 50 days. After some back and forth, we settled on 7-day paired blocks. He would have preferred that every pair of weeks be blinded, but I didn't have that many Tupperwares on hand; he didn't mention any criticism of Zeo data.
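(A sketch of the paired-block randomization we settled on; the exact number of week-pairs here is illustrative, not the actual schedule:)

```r
set.seed(1)  # arbitrary
# 4 pairs of weeks; within each pair, which week is active is randomized:
pair_order <- replicate(4, sample(c("active", "placebo")))
schedule   <- rep(as.vector(pair_order), each = 7)  # expand weeks into days
table(schedule)  # 28 'active' days, 28 'placebo' days
```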
So I created the 50 active & placebo pills, and started the experiment. This time I did not post any data publicly.
Things proceeded reasonably well, and Friday (28 April 2012) was the last day. On Saturday, I uploaded the Zeo data, annotated the days by active/placebo, put in the 1-5 Mood ranking, and ran the same R functions. To my considerable surprise, of the 9 metrics, only *1* reached significance ('Morning Feel' - how I feel when I wake up in the morning, cruddy or refreshed, 1-5), but it was highly statistically-significant (_p_=0.005, surviving multiple correction; see the sketch below) and a strong effect size as well (_d_=0.7). By a remarkable coincidence, Roberts posted [his own results](http://blog.sethroberts.net/2012/04/28/effect-of-vitamin-d3-on-my-sleep/ "Effect of Vitamin D3 on My Sleep") that day, and found little effect except this one:
> When I woke up in the morning I rated how rested I felt on a 0-100 scale, where 0 = not rested at all and 100 = completely rested. I'd been using this scale for years. Here are the results (means and standard errors):
>
> ![rested fraction versus treatment (IU dose)](/images/zeo/2012-sethroberts-restedratingvsvitamind.jpg "http://blog.sethroberts.net/wp-content/uploads/2012/04/2012-01-17-rested-ratings-vs-Vitamin-D-1024x575.jpg")
>
> Vitamin D3 had a clear effect, but the necessary dose was more than 2000 IU. If Vitamin D3 acts like sunlight, you might think that taking it in the morning would make me wake up earlier. Here are the results for the time I woke up:...There was no clear effect of dosage on when I got up. Shifting the time from 8 am to 9 am may have had an effect (I wish I had 3 more days at 9 am).
I take 5000 IU, and his 'rested' rating seems pretty much identical to my 'Morning Feel'. Very promising!
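(Why _p_=0.005 survives correction across 9 metrics: a sketch using Bonferroni as the most conservative choice - the correction actually applied may differ, and the other 8 _p_-values here are placeholders, not the real ones:)

```r
p <- c(MorningFeel = 0.005, 0.31, 0.45, 0.62, 0.21, 0.77, 0.54, 0.38, 0.89)
p.adjust(p, method = "bonferroni")["MorningFeel"]  # 0.005 * 9 = 0.045 < 0.05
```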
##### The Test
But I had kept my data private for a reason: so I could edit it. I tampered with the high marks in 'Morning Feel', leaving only a small non-significant increase. I published the doctored results to `gwern.net`, and sent an email to Seth Roberts, principally reading:
> I was surprised to see http://blog.sethroberts.net/2012/04/28/effect-of-vitamin-d3-on-my-sleep/ today, but the timing is remarkably apt - yesterday was the last data day for my morning experiment (you remember helping me design it, I hope!), and I was in the middle of processing and then analyzing my results: http://www.gwern.net/Zeo#morning-analysis (If you don't see the analysis, you may need to force-reload.)
>
> So not to mince words, the upshot is that none of the metrics showed any significance. The best _p_-value is like 0.3. Given that it is a pretty good quality data set and the Zeo is a lot more reliable than writing things down or whatever your informants were doing, and I followed your suggestions on experimental design, I'd say this suggests that improvements from vitamin D are just noise/selection effects or alternately, reflect the very genuine improvement caused by *not* taking vitamin D in the evening and messing with your sleep. Anyway, can I assume you will post a link to it on your blog? I'd like to see what the other commentators make of it.
Then, I registered [a prediction of 70% on PredictionBook.com](http://predictionbook.com/predictions/6737) for the next 3 days: "Seth Roberts will not post a blog post on my (supposedly) null result for taking vitamin D in the morning."
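(Scoring such a registered prediction afterwards is mechanical; a sketch using the [Brier score](!Wikipedia), where lower is better:)

```r
brier <- function(p, outcome) (p - outcome)^2  # outcome: 1 = happened, 0 = did not
brier(0.70, 1)  # the prediction comes true: 0.09
brier(0.70, 0)  # the prediction fails:      0.49
```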
Roberts replied within the hour, asking why I trusted Zeo data when he thought his own Zeo misreported when he went to sleep, and when there were negative reviews on Amazon. (I am not quoting him because I asked later whether I could, and he declined, since he didn't want to hurt a struggling company like Zeo Inc.; Zeo would indeed shut down in 2013.)
I can't say I was surprised that he tried to deny the results (see above), but I was disappointed. There were many possible replies to my report - null results are *expected* in _n_=1 experiments, for example, given that many people's responses will be idiosyncratic or their self-experiments underpowered. I mentioned my car being totaled and ruining that week (and the next), which was a plausible reason as well. And so on. But instead, he went after the Zeo. (I suppose I should be glad he didn't resort to ad hominems like 'you must have screwed something up', although ironically, that was actually the right class of explanations for the data!) I replied:
> [why I trust the Zeo data:] Principally [the papers](http://web.archive.org/web/20130515062508/http://www.myzeo.com/sleep/validation) comparing the Zeo data to [polysomnography](!Wikipedia), which is of course the gold standard. They give the Zeo something like 75% accuracy in the sense of giving the same state classification per time unit.
>
> I've seen the Amazon page, and what I would ask is what are they comparing the Zeo to? "One man's [_modus ponens_ is another man's _modus tollens_](#modus-tollens-vs-modus-ponens)", as we philosophy types like to say. The study of sleep is replete with subjective illusions where people are simply flat out wrong about sleep: they forget long intervals before going to sleep, they underestimate number of wakings [[among other things](http://web.archive.org/web/20121124084214/http://www.myzeo.com/sleep/knowledge-center/articles/sleep-forgetting-remember-forget)], [false awakening](!Wikipedia) is a real phenomenon, sense of time is distorted\*, [hypnagogic](!Wikipedia) illusions abound, etc.
>
> (I'd compare it to saying 'well, horoscopes work for *me*', except that's a little unfair to the Zeo critics - as I said, there is a real mistake rate and this is one reason to use datasets like 50 days rather than 1 or 2.)
>
> \* [lucid dreaming](!Wikipedia) offers the best example. Some of [LaBerge's](!Wikipedia "Stephen LaBerge") experiments involved the subject when lucid moving his eyes - externally detectable on EEG - and waiting a fixed time and then moving his eyes again. The times were considerably wrong, which makes sense since we've all had dreams where one 'experiences' hours or years, even though dreams/REM intervals only last a few minutes.
Roberts dismissed the papers as upper bounds, designed to produce the best possible results (eg. use of new headbands - not that I noticed any degradation of data after months of use, and as it happens, I replaced my headband shortly before starting); he would take the Zeo seriously only if the ZQ correlated with how he felt on waking or during the day. (ZQ was only one of the 9 metrics, and 2 of the metrics did correspond to those two examples.) This was a little surprising because he had previously seemed [positive about Zeo results](http://blog.sethroberts.net/2011/06/04/christine-petersons-zeo-research/) and had not criticized it - indeed, as of 1 June 2013, he has yet to post or publish anything I am aware of listing criticisms of Zeo-based data or reasons not to trust it.
I decided to drop the conversation there: I had heard enough. All I wanted now was permission to quote him, which he denied (see above).
As far as I know, before posting this my fake results were never linked or discussed publicly by Roberts. He has mentioned no links, I have seen nothing relevant in my RSS subscription, and Google Analytics reports no referrals. I was prepared to immediately correct the page if there was any activity, but there wasn't^[I was monitoring my email and RSS closely the night I sent the email. I had already written and proofread the real version, and written an email explaining I had discovered a mistake; so to change my site was just a matter of issuing a single revision-control command (`darcs rollback`) & re-syncing my site, and then sending the email. I don't think the bad version would have been up for more than 10 or 20 minutes past him posting or replying clearly that he would post it.]. He had posted multiple fresh blog posts since, such as [another anecdotal interview](http://blog.sethroberts.net/2012/04/29/interview-with-a-shangri-la-dieter/ "Interview with a Shangri-La Dieter") on the Shangri-La Diet (based on a forum posting 2 days previously).
(One might wonder whether my experiment is itself an example of confirmation bias - had Roberts linked it, would I still be posting it? I hope so; besides the private PredictionBook.com prediction, I gave a hash pre-commitment to AngryParsley, told another LWer about my plan months ago, discussed it in `#lesswrong`, and alluded to it in some of my LW comments.)
### Roberts's reply
In 5 emails:
> it's really bad behavior, after I say I don't want to be quoted, to quote and paraphrase me.
>
> why I haven't yet commented on your results: because I haven't yet studied them. Not because they were negative.
>
> I collected my own data in November. It took me four months to post it. So forgive me for not commenting on yours within 3 days.
>
> I never looked at your (fake) data. I simply reacted to your email. Please note the absence of Zeo results in my own data (and I have a Zeo). I analyzed my own data -- ignoring the Zeo stuff -- long before your email.
>
> You write as if as soon as I heard your (fake) results were negative, I dismissed them by questioning your Zeo. Actually, my email to you, reacting to your (fake) Vitamin D results, began "before I make a longer comment". Those were its very first words. You failed to wait for that longer comment before passing judgment. Nor do you say anything about it in your description of what I did.
>
> although you told me (August 2011) that your one-legged standing data supported my claims I have not yet blogged about it. I have not yet analyzed your data, either. This does not support your idea that if I don't analyze or report some data, it must be due to confirmation bias.
>
> You really won't take down information that I said you couldn't quote? After I ask you to?
(I have pointed out to him that I have scrupulously *not* quoted him, but only paraphrased him.)
### Final thoughts
So, Roberts is not very good about reporting results contradicting his theories. This is useful to keep in mind: he has a lot of ideas, many of which will be false, and one should treat them with due caution. It is also interesting because I believe Roberts has acknowledged as much in his papers defending self-experimentation (defending it as acceptable because self-experiments can be iterated quickly and the end results will be valuable) - which is just another demonstration of the old observation that you can know all about a bias, and still succumb anyway.
I doubt I will do many more confirmation tests in the future, if any at all:
- This little test has used up a good deal of my time, and delayed me from posting my real results.
- Roberts is deeply offended by this, so I can forget about getting advice on setting up self-experiments from him in the future. (I made sure to get advice on my pending [lithium self-experiment](Nootropics#lithium-experiment) before I finished, but what about future self-experiments?)
- And testing confirmation bias in this fashion is intrinsically deceptive, so I probably have damaged my online reputation as well.
But if I were to? My experiment has been fiercely criticized by Roberts and the [LessWrong commenters](http://lesswrong.com/lw/c5f/case_study_testing_confirmation_bias/#comments), and they have given me multiple ways I could have done better: