This repository has been archived by the owner on Feb 20, 2021. It is now read-only.
forked from snowplow/snowplow
CHANGELOG
Release 61 Pygmy Parrot (2015-03-02)
------------------------------------
Common: bumped VERSION file to r61-pygmy-parrot
Common: added Gradle to up.playbooks (#1270)
Common: added .travis.yml file and Travis button to repo (#1359)
Common: added Release button to README (#1428)
Common: added License button to README (#1427)
Clojure Collector: bumped to 1.0.0
Clojure Collector: updated access-valve to depend on Tomcat 8 classes (#1203)
Clojure Collector: updated .ebextensions to depend on Tomcat 8 (#1202)
Clojure Collector: added ability to disable third-party cookies (#1362)
Clojure Collector: added CORS support (#1146)
Clojure Collector: added CORS-style support for ActionScript3 Tracker (#1330)
Clojure Collector: added support for /:vendor/:version to HEAD (#1166)
Clojure Collector: now using UTF-8 for character encoding throughout (#1354)
Scala Common Enrich: bumped to 0.12.0
Scala Common Enrich: updated SnowplowAdapter to accept "charset=UTF-8" (#1424)
Scala Common Enrich: Base64 decoding does not specify UTF-8 charset (#1403)
Scala Common Enrich: removed incorrect extra layer of URL decoding from non-Base64-encoded JSONs (#1396)
Scala Common Enrich: added support for ti_nm for transaction item name as well as ti_na (#1401)
Scala Common Enrich: added CloudfrontAccessLogAdapter (#1282)
Scala Common Enrich: made timestamp field of CollectorPayload an Option (#1417)
Scala Hadoop Enrich: bumped to 0.13.0
Scala Hadoop Enrich: bumped Scala Common Enrich to 0.12.0 (#1395)
Scala Hadoop Enrich: added test for non-Base64-encoded JSON (#1394)
Scala Hadoop Enrich: updated tests to include Unicode (#1390)
Scala Hadoop Enrich: added integration test for CloudfrontAccessLogAdapter (#1423)
Scala Hadoop Bad Rows: removed .travis.yml (#1382)
EmrEtlRunner: bumped to 0.12.0
EmrEtlRunner: now appending region name to Clojure Collector log files (#1379)
EmrEtlRunner: added support for moving and archiving timestamped Clojure Collector log files (#1400)
EmrEtlRunner: now appending rather than prepending instance names to Clojure Collector log files (#1404)
EmrEtlRunner: changed Clojure Collector log timestamp format to match CloudFront logs (#1398)
EmrEtlRunner: added dedicated return code for no files to process (#1397)
EmrEtlRunner: now allowing tsv/*/* and json/*/* as :etl:collector_format (#1284)
EmrEtlRunner: now performing S3DistCp from processing for tsv/com.amazon.aws.cloudfront/* (#1431)
EmrEtlRunner: added output directory empty check prior to staging step (#1151)
StorageLoader: updated shell script to only run StorageLoader if EmrEtlRunner found files (#1399)
StorageLoader: wrote JSON Path file for a com.snowplowanalytics.snowplow/flash_context (#1305)
StorageLoader: wrote JSON Path file for a com.snowplowanalytics.snowplow/timing event (#1388)
StorageLoader: wrote JSON Path file for a com.amazon.aws.cloudfront/wd_access_log event (#1285)
StorageLoader: wrote JSON Path file for a com.google.analytics/cookies context (#1409)
StorageLoader: wrote JSON Path file for a com.snowplowanalytics.snowplow/desktop_context (#1421)
Redshift: added Redshift DDL for a com.snowplowanalytics.snowplow/timing event (#1387)
Redshift: added Redshift DDL for a com.snowplowanalytics.snowplow/flash_context (#1304)
Redshift: added Redshift DDL for a com.amazon.aws.cloudfront/wd_access_log event (#1286)
Redshift: added Redshift DDL for a com.google.analytics/cookies context (#1408)
Redshift: added Redshift DDL for a com.snowplowanalytics.snowplow/desktop_context (#1420)
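Note on the Base64 charset change (#1403) above: the sketch below is illustrative only and is not the Scala Common Enrich implementation. It shows the general idea of naming the charset explicitly when turning decoded Base64 bytes back into text, so the result does not depend on a platform default. The function name `decode_base64_utf8` and the padding handling are assumptions made for this example.

```python
import base64


def decode_base64_utf8(encoded: str) -> str:
    """Decode a URL-safe Base64 string and interpret the bytes as UTF-8."""
    # Assumption for this sketch: the sender may have stripped '=' padding,
    # so restore it before decoding.
    padded = encoded + "=" * (-len(encoded) % 4)
    raw = base64.urlsafe_b64decode(padded)
    # Name the charset explicitly rather than relying on a default.
    return raw.decode("utf-8")
```

Python's `bytes.decode` already defaults to UTF-8; the explicit argument is kept here to mirror the concern the changelog entry describes.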
Release 60 Bee Hummingbird (2015-02-03)
---------------------------------------
Common: added VERSION file in root to assist vagrant push (#1293)
Common: added vagrant push scripting to publish Kinesis apps (#1288)
Common: added lzo.yml to up.playbooks (#1325)
Thrift Raw Event: bumped Thrift version to 0.9.1 (#1225)
Thrift Raw Event: added collector-payload-1 and schema-sniffer-1 (#1322)
Thrift Raw Event: created a subproject for each Thrift class (#1298)
Thrift Raw Event: updated README and project description to reflect new structure (#1300)
Thrift Raw Event: renamed to thrift-schemas (#1299)
Scala Stream Collector: bumped to 0.3.0
Scala Stream Collector: started sending CollectorPayloads instead of SnowplowRawEvents (#1226)
Scala Stream Collector: added support for POST requests (#187)
Scala Stream Collector: added support for any {api-vendor}/{api-version} for GET and POST (#652)
Scala Stream Collector: stopped decoding URLs (#1217)
Scala Stream Collector: changed 1x1 pixel response to use a stable GIF (#1260)
Scala Stream Collector: renamed default.conf to config.hocon.sample (#1243)
Scala Stream Collector: started using ThreadLocal to handle Thrift serialization, thanks @denismo and @pkallos! (#1254)
Scala Stream Collector: added healthcheck for load balancers, thanks @duncan! (#1360)
EmrEtlRunner: bumped to 0.11.0
EmrEtlRunner: added "thrift" collector format (#1301)
EmrEtlRunner: implemented time_diff manually (#1310)
EmrEtlRunner: fixed failure reporting when jobflow step(s) created_at is nil (#1351)
Scala Common Enrich: bumped to 0.11.0
Scala Common Enrich: added schema-sniffer-1 and collector-payload-1 dependencies (#1296)
Scala Common Enrich: bumped user-agent-utils version to 1.14 (#1224)
Scala Common Enrich: changed EnrichedEvent field name to ip_organization (#1145)
Scala Common Enrich: changed "thrift" to "thrift-raw" in Loader object (#1302)
Scala Common Enrich: added tests for getLoader function (#558)
Scala Hadoop Enrich: bumped to 0.12.0
Scala Hadoop Enrich: bumped Scala Common Enrich to 0.11.0 (#1294)
Scala Hadoop Enrich: added collector-payload-1 and snowplow-thrift-raw-event as test dependencies (#1248)
Scala Hadoop Enrich: added support for processing Thrift raw events, thanks @pkallos! (#538)
Scala Hadoop Enrich: added tests to Hadoop Enrich for processing Thrift raw events (#559)
Scala Kinesis Enrich: bumped to 0.3.0
Scala Kinesis Enrich: bumped Scala Common Enrich to 0.11.0 (#1295)
Scala Kinesis Enrich: renamed default.conf to config.hocon.sample (#1242)
Kinesis Elasticsearch Sink: added LICENSE-2.0.txt (#1329)
Kinesis LZO S3 Sink: added. Version 0.1.0, thanks @pkallos! (#1016)
Version 0.9.14 (2014-12-31)
---------------------------
Common: added dedicated Vagrant setup (#1266)
Common: added Quickstart section to README (#1268)
Common: added script to sync region-specific Snowplow Hosted Assets buckets (#1269)
CloudFront Collector: replaced 1x1 pixel with stable GIF (#1259)
Clojure Collector: bumped to 0.9.1
Clojure Collector: increased Tomcat's HTTP header tolerance to 64kB (#1249)
Clojure Collector: changed 1x1 pixel response to use a stable GIF (#1258)
EmrEtlRunner: bumped to 0.10.0
EmrEtlRunner: removed hyphen from the pattern match for Clojure Collector logs (#1194)
EmrEtlRunner: on job failure, log overall jobflow and individual step statuses (#1153)
Scala Common Enrich: bumped to 0.10.0
Scala Common Enrich: bumped Scala Iglu Client to 0.2.0 (#1222)
Scala Common Enrich: updated SnowplowAdapter to accept payload_data versions above 1-0-0 (#1220)
Scala Common Enrich: updated SnowplowAdapter to make charset=utf-8 optional (#1257)
Scala Common Enrich: added Adapter to pre-process Pingdom events (#1164)
Scala Common Enrich: added Adapter to pre-process PagerDuty events (#1158)
Scala Common Enrich: added Adapter to pre-process Mandrill events (#1061)
Scala Hadoop Enrich: bumped to 0.11.0
Scala Hadoop Enrich: bumped Scala Common Enrich to 0.10.0 (#1223)
Scala Hadoop Enrich: added test job for PingdomAdapter (#1176)
Scala Hadoop Enrich: added test job for PagerdutyAdapter (#1175)
Scala Hadoop Enrich: added test job for MandrillAdapter (#1171)
Scala Hadoop Enrich: added test job for more relaxed payload_data schema matching (#1235)
Scala Hadoop Shred: bumped to 0.3.0
Scala Hadoop Shred: bumped Scala Common Enrich to 0.10.0 (#1236)
Scala Hadoop Shred: bumped Iglu Scala Client to 0.2.0 (#1230)
Scala Hadoop Shred: loosened match criteria for unstructured events and contexts (#1231)
StorageLoader: wrote JSON Path file for com.pingdom/incident_notify_of_close event (#1182)
StorageLoader: wrote JSON Path file for com.pingdom/incident_assign event (#1181)
StorageLoader: wrote JSON Path file for com.pingdom/incident_notify_user event (#1251)
StorageLoader: wrote JSON Path file for com.pagerduty/incident event (#1177)
StorageLoader: wrote JSON Path file for com.mandrill/message_sent event (#1059)
StorageLoader: wrote JSON Path file for com.mandrill/message_bounced event (#1058)
StorageLoader: wrote JSON Path file for com.mandrill/message_opened event (#1057)
StorageLoader: wrote JSON Path file for com.mandrill/message_marked_as_spam event (#1056)
StorageLoader: wrote JSON Path file for com.mandrill/message_delayed event (#1055)
StorageLoader: wrote JSON Path file for com.mandrill/message_soft_bounced event (#1054)
StorageLoader: wrote JSON Path file for com.mandrill/message_clicked event (#1053)
StorageLoader: wrote JSON Path file for com.mandrill/message_rejected event (#1052)
StorageLoader: wrote JSON Path file for com.mandrill/recipient_unsubscribed event (#1051)
Redshift: added Redshift DDL for a com.pingdom/incident_notify_of_close event (#1180)
Redshift: added Redshift DDL for a com.pingdom/incident_assign event (#1179)
Redshift: added Redshift DDL for a com.pingdom/incident_notify_user event (#1252)
Redshift: added Redshift DDL for a com.pagerduty/incident event (#1178)
Redshift: added Redshift DDL for a com.mandrill/message_sent event (#1050)
Redshift: added Redshift DDL for a com.mandrill/message_bounced event (#1049)
Redshift: added Redshift DDL for a com.mandrill/message_opened event (#1048)
Redshift: added Redshift DDL for a com.mandrill/message_marked_as_spam event (#1047)
Redshift: added Redshift DDL for a com.mandrill/message_delayed event (#1046)
Redshift: added Redshift DDL for a com.mandrill/message_soft_bounced event (#1045)
Redshift: added Redshift DDL for a com.mandrill/message_clicked event (#1044)
Redshift: added Redshift DDL for a com.mandrill/message_rejected event (#1043)
Redshift: added Redshift DDL for a com.mandrill/recipient_unsubscribed event (#1042)
Redshift: removed trailing commas from com.mailchimp SQL table definitions (#1174)
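Note on the relaxed payload_data matching (#1220, #1235) above: a minimal sketch of what accepting schema versions above 1-0-0 could look like, assuming the usual Iglu URI shape `iglu:vendor/name/format/MODEL-REVISION-ADDITION`. The names `SchemaKey`, `parse_schema_uri` and `is_supported_payload_data` are hypothetical, and the rule "any version with model 1" is an assumption for illustration, not the SnowplowAdapter's documented behaviour.

```python
import re
from typing import NamedTuple, Optional


class SchemaKey(NamedTuple):
    vendor: str
    name: str
    format: str
    model: int
    revision: int
    addition: int


# Assumed URI shape: iglu:<vendor>/<name>/<format>/<model>-<revision>-<addition>
_IGLU_URI = re.compile(
    r"^iglu:([A-Za-z0-9._-]+)/([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)/(\d+)-(\d+)-(\d+)$"
)


def parse_schema_uri(uri: str) -> Optional[SchemaKey]:
    """Split a schema URI into its parts, or return None if it does not match."""
    match = _IGLU_URI.match(uri)
    if match is None:
        return None
    vendor, name, fmt, model, revision, addition = match.groups()
    return SchemaKey(vendor, name, fmt, int(model), int(revision), int(addition))


def is_supported_payload_data(uri: str) -> bool:
    """Accept any payload_data schema whose model is 1 (assumed rule)."""
    key = parse_schema_uri(uri)
    return (
        key is not None
        and key.vendor == "com.snowplowanalytics.snowplow"
        and key.name == "payload_data"
        and key.model == 1
    )
```

Matching on the model number alone means later revisions and additions within the same model pass without a code change, which is the kind of loosening the entries describe.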
Version 0.9.13 (2014-12-01)
---------------------------
Scala Common Enrich: bumped to 0.9.1
Scala Common Enrich: added error handling for Netaporter URI parsing (#1216)
Scala Kinesis Enrich: bumped to 0.2.1
Scala Kinesis Enrich: bumped Scala Common Enrich to 0.9.1
Scala Kinesis Enrich: fixed conflict with Specs2 version, thanks @knservis! (#1213)
Scala Hadoop Enrich: bumped to 0.10.1
Scala Hadoop Enrich: bumped Scala Common Enrich to 0.9.1
Common: deleted test-file in repository root (#1219)
Version 0.9.12 (2014-11-26)
---------------------------
Scala Stream Collector: bumped to 0.2.0
Scala Stream Collector: changed organization to "com.snowplowanalytics" (#1168)
Scala Stream Collector: made the --config option mandatory (#1128)
Scala Stream Collector: added ability to set AWS credentials from environment variables (#1116)
Scala Stream Collector: now enforcing Java 7 for compilation (#1068)
Scala Stream Collector: increased request character limit to 32768 (#987)
Scala Stream Collector: improved performance by using Future, thanks @pkallos! (#580)
Scala Stream Collector, Scala Kinesis Enrich: made endpoint configurable, thanks @sambo1972! (#978)
Scala Stream Collector, Scala Kinesis Enrich: added support for IAM roles, thanks @pkallos! (#534)
Scala Stream Collector, Scala Kinesis Enrich: replaced stream list with describe to tighten permissions, thanks @pkallos! (#535)
Scala Kinesis Enrich: bumped to 0.2.0
Scala Kinesis Enrich: bumped Scala Common Enrich to 0.9.0
Scala Kinesis Enrich: changed organization to "com.snowplowanalytics" (#1167)
Scala Kinesis Enrich: made the --config option mandatory (#1126)
Scala Kinesis Enrich: updated instructions in README (#1125)
Scala Kinesis Enrich: added ability to set AWS credentials from environment variables (#1117)
Scala Kinesis Enrich: now enforcing Java 7 for compilation (#1067)
Scala Kinesis Enrich: replaced printlns with Java Logger (#521)
Scala Kinesis Enrich: started sending bad records to a separate stream (#463)
Scala Kinesis Enrich: added page_url and page_referrer back into enrichment output (#686)
Scala Kinesis Enrich: stopped opening a new file for each enriched event, thanks @pkallos! (#714)
Scala Common Enrich: bumped to 0.9.0
Scala Common Enrich: added BadRow from Scala Hadoop Enrich (#1118)
Scala Common Enrich: added ability to override collector-set nuid with tracker-set tnuid (#1095)
Scala Common Enrich: made URI parsing more permissive using NetAPorter's URI library, thanks @rupeshmane! (#1172)
Scala Hadoop Enrich: bumped to 0.10.0
Scala Hadoop Enrich: bumped Scala Common Enrich to 0.9.0
Scala Hadoop Enrich: moved BadRow into Scala Common Enrich (#1119)
Scala Hadoop Enrich: updated README with new Snowplow capitalization (#1127)
Kinesis Elasticsearch Sink: added. Version 0.1.0
Version 0.9.11 (2014-11-10)
---------------------------
Clojure Collector: bumped to 0.9.0
Clojure Collector: added support for /:vendor/:version to GET (#1131)
Scala Common Enrich: bumped to 0.8.0
Scala Common Enrich: bumped json4s to 3.2.11 (#1141)
Scala Common Enrich: bumped Scala Iglu Client to 0.1.1 (#1140)
Scala Common Enrich: removed check that POST request has body and content-type (#1132)
Scala Common Enrich: moved payload API detection into CollectorApi.parse (#1113)
Scala Common Enrich: fixed bug in CljTomcatLoader expecting request body to be "_" instead of "-" (#1112)
Scala Common Enrich: added Adapter to pre-process CallRail events (#1108)
Scala Common Enrich: added Adapter to pre-process MailChimp events (#1086)
Scala Common Enrich: added Adapter to pre-process Iglu-compatible events (#1060)
Scala Hadoop Enrich: bumped to 0.9.0
Scala Hadoop Enrich: added job test for unrecognized api name/version (#1115)
Scala Hadoop Enrich: updated DiscardableCfLinesSpec given /not-ice.png is no longer discarded (#1114)
Scala Hadoop Enrich: added test job for MailchimpAdapter (#1159)
Scala Hadoop Enrich: added test job for CallrailAdapter (#1160)
Redshift: removed not null constraint on change_form's value column (#1162)
Redshift: added Redshift DDL for a com.callrail/call_complete event (#1110)
Redshift: added Redshift DDL for a com.mailchimp/campaign_sending_status event (#1085)
Redshift: added Redshift DDL for a com.mailchimp/cleaned_email event (#1084)
Redshift: added Redshift DDL for a com.mailchimp/email_address_change event (#1083)
Redshift: added Redshift DDL for a com.mailchimp/profile_update event (#1082)
Redshift: added Redshift DDL for a com.mailchimp/unsubscribe event (#1081)
Redshift: added Redshift DDL for a com.mailchimp/subscribe event (#1080)
StorageLoader: wrote JSON Path file for com.callrail/call_complete event (#1109)
StorageLoader: wrote JSON Path file for com.mailchimp/campaign_sending_status event (#1079)
StorageLoader: wrote JSON Path file for com.mailchimp/cleaned_email event (#1078)
StorageLoader: wrote JSON Path file for com.mailchimp/email_address_change event (#1077)
StorageLoader: wrote JSON Path file for com.mailchimp/profile_update event (#1076)
StorageLoader: wrote JSON Path file for com.mailchimp/unsubscribe event (#1075)
StorageLoader: wrote JSON Path file for com.mailchimp/subscribe event (#1074)
Version 0.9.10 (2014-11-06)
---------------------------
StorageLoader: wrote JSON Path file for PerformanceTiming (#1147)
StorageLoader: wrote JSON Path file for social_interaction (#1029)
StorageLoader: wrote JSON Path file for site_search (#1027)
StorageLoader: wrote JSON Path file for change_form (#1025)
StorageLoader: wrote JSON Path file for submit_form (#1023)
StorageLoader: wrote JSON Path file for remove_from_cart (#1021)
StorageLoader: wrote JSON Path file for add_to_cart (#1019)
Redshift: converted all Redshift DDLs to use tabs (#1034)
Redshift: added Redshift DDL for PerformanceTiming (#1032)
Redshift: added Redshift DDL for social_interaction (#1030)
Redshift: added Redshift DDL for site_search (#1028)
Redshift: added Redshift DDL for change_form (#1026)
Redshift: added Redshift DDL for submit_form (#1024)
Redshift: added Redshift DDL for remove_from_cart (#1022)
Redshift: added Redshift DDL for add_to_cart (#1020)
Version 0.9.9 (2014-10-27)
--------------------------
.NET Tracker: added git submodule. Version 0.1.0 (#1000)
PHP Tracker: added git submodule. Version 0.1.0 (#1013)
Clojure Collector: bumped to 0.8.0
Clojure Collector: fixed regression in log record format caused by #854 (#992)
Clojure Collector: correctly handles multiple IPs in X-Forwarded-For (#970)
StorageLoader: bumped to 0.3.3
StorageLoader: selecting Snowplow's hosted-assets bucket based on region (#1012)
EmrEtlRunner: bumped to 0.9.2
EmrEtlRunner: no rows to process now returns 0, not 1 (#1018)
EmrEtlRunner: fixed bug where --process-enrich doesn't work, thanks @kingo55! (#1089)
EmrEtlRunner: now checking that output directories are empty before running (#1124)
Scala Common Enrich: bumped to 0.7.0
Scala Common Enrich: bumped scala-maxmind-iplookups to 0.2.0 (#1002)
Scala Common Enrich: added support for non-GA campaign attribution: phase 1 (#402)
Scala Common Enrich: rewrote AttributionEnrichments tests as RefererParserEnrichment tests (#974)
Scala Common Enrich: now allows, but downcases, a-f characters in incoming event_id (#1006)
Scala Common Enrich: now extracts useragent from ua parameter (#1011)
Scala Common Enrich: fixed issue where unset integer fields throw an NPE (#570)
Scala Common Enrich: fixed issue where unset double fields throw an NPE (#1062)
Scala Common Enrich: added tests for ConversionUtils.stringToJInteger (#1064)
Scala Common Enrich: now enforcing Java 7 for compilation (#1065)
Scala Hadoop Enrich: bumped to 0.8.0
Scala Hadoop Enrich: bumped Scala Common Enrich to 0.7.0 (#995)
Scala Hadoop Enrich: added test for empty integer and double fields to ensure no NPE thrown (#1063)
Scala Hadoop Enrich: now enforcing Java 7 for compilation (#1066)
Scala Hadoop Enrich: updated test jobs to reflect updated useragent parsing (#1070)
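Note on the X-Forwarded-For fix (#970) above: a hedged sketch of one way to handle a header that carries several comma-separated addresses. It is not the Clojure Collector's code. The function name `client_ip_from_xff` is hypothetical, and picking the leftmost valid address as the client IP is an assumption made for this example.

```python
import ipaddress
from typing import Optional


def client_ip_from_xff(header: str) -> Optional[str]:
    """Return the first syntactically valid IP in an X-Forwarded-For value."""
    for candidate in header.split(","):
        candidate = candidate.strip()
        try:
            # Raises ValueError for anything that is not an IPv4/IPv6 address.
            ipaddress.ip_address(candidate)
        except ValueError:
            continue
        return candidate
    # No usable address found (empty header or only placeholders).
    return None
```

Skipping invalid tokens instead of failing keeps a single malformed entry from discarding the whole header.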
Version 0.9.8 (2014-09-18)
--------------------------
iOS Tracker: added git submodule. Version 0.1.1 (#982)
Android Tracker: added git submodule. Version 0.1.1 (#983)
Clojure Collector: bumped to 0.7.0
Clojure Collector: merged snowplow/tomcat-cf-access-log-valve into Snowplow as clojure-collector/access-valve (#898)
Clojure Collector: bumped access-valve to 0.1.0
Clojure Collector: changed access-valve's package path to com.snowplowanalytics.snowplow.collectors.clojure.accessvalve (#924)
Clojure Collector: changed access-valve to use Gradle (#899)
Clojure Collector: changed access-valve to publish to war-resources/.ebextensions (#900)
Clojure Collector: updated access-valve and added web.xml to log request body and content type (#901)
Clojure Collector: fixed empty querystring in access-valve (#938)
Clojure Collector: fixed IP address forwarding for VPC-based environments (#854)
Clojure Collector: added support for API vendor and version in routing (#925)
Clojure Collector: added support for POST as well as GET (#654)
Scala Stream Collector: fixed broken link to `thrift-raw-event`, thanks @bamos! (#955)
Scala Common Enrich: bumped to 0.6.0
Scala Common Enrich: split out Clojure and CloudFront Collector event processing (#943)
Scala Common Enrich: added CljTomcatLoaderSpec tests (#963)
Scala Common Enrich: filtering non-GETs from CloudfrontLoader (#944)
Scala Common Enrich: replaced all Argonaut code with json4s (#945)
Scala Common Enrich: renamed CanonicalOutput to EnrichedEvent (#964)
Scala Common Enrich: replaced CanonicalInput and TrackerPayload with CollectorPayload and RawEvent (#946)
Scala Common Enrich: updated EnrichmentManager to process RawEvent not CanonicalInput (#903)
Scala Common Enrich: added Snowplow Tp2 Adapter to convert event JSON to NEL of RawEvents (#904)
Scala Common Enrich: geo-IP lookup now supports ip parameter on querystring (#961)
Scala Common Enrich: IP address anonymization now works with ip parameter on querystring (#960)
Scala Hadoop Enrich: bumped to 0.7.0
Scala Hadoop Enrich: bumped to Scala Common Enrich 0.6.0 (#940)
Scala Hadoop Enrich: updated to support generating multiple enriched events from one raw payload (#902)
StorageLoader: wrote JSON Path file for mobile_context (#776)
StorageLoader: wrote JSON Path file for geolocation_context (#962)
Redshift: added Redshift DDL for mobile_context (#542)
Redshift: added Redshift DDL for geolocation_context (#950)
Version 0.9.7 (2014-09-02)
--------------------------
Ruby Tracker: bumped git submodule to 0.3.0 (#939)
Java Tracker: bumped git submodule to 0.5.1 (#948)
Node.js Tracker: added git submodule. Version 0.1.0 (#949)
Trackers: fixed broken git submodule links, thanks @OAGr! (#957)
EmrEtlRunner: bumped to 0.9.1
EmrEtlRunner: fixed @jobflow.ec2_subnet_id not being set due to incorrect guard, thanks @rslifka! (#956)
EmrEtlRunner: fixed bugs in --process-bucket (#973)
EmrEtlRunner: renamed --process-bucket option to --process-enrich (#972)
EmrEtlRunner: changed -s option for --skip to -x to prevent clash with -s for --start (#975)
EmrEtlRunner: now allows shredding without prior enrichment (#927)
StorageLoader: bumped to 0.3.2
StorageLoader: removed EMPTYASNULL for loading JSONs (#942)
StorageLoader: added missing targetUrl field to ad_impression JSON Path file, thanks @gisripa! (#951)
StorageLoader: made providing jsonpath_assets optional (#958)
StorageLoader: added support for cross-region Redshift COPY (#971)
Hive Storage: bumped table-def.q to 0.2.0
Hive Storage: added and removed fields to synchronize with 0.9.6's enriched event format (#965)
Scala Hadoop Shred: bumped to version 0.2.1
Scala Hadoop Shred: fixed multiple JSONs not being shredded for a single row (#968)
Scala Hadoop Shred: strengthened test suite (#967)
Version 0.9.6 (2014-07-26)
--------------------------
Java Tracker: bumped git submodule to 0.4.0 (#892)
EmrEtlRunner: bumped to 0.9.0
EmrEtlRunner: passed etl_tstamp into Hadoop Enrich as an argument (#396)
EmrEtlRunner: removed enrichment-specific code (#811)
EmrEtlRunner: removed enrichment-specific parameters from config.yml.sample (#809)
EmrEtlRunner: replaced enrichment-specific arguments from EmrEtlRunner (#808)
EmrEtlRunner: removed %3D code following Scalding upgrade (#849)
EmrEtlRunner: fixed contract on partition_by_run (#894)
EmrEtlRunner: updated Bash script to support enrichments path (#916)
StorageLoader: bumped to 0.3.1
StorageLoader: now looking in eu-west-1 region for s3://snowplow-hosted-assets (#895)
StorageLoader: updated combined Bash script to support enrichments path (#917)
Scala Hadoop Enrich: bumped to 0.6.0
Scala Hadoop Enrich: bumped Scala to 2.10.4 (#912)
Scala Hadoop Enrich: bumped Scalding to 0.11.1 (#911)
Scala Hadoop Enrich: bumped Hadoop to 1.2.1 (#913)
Scala Hadoop Enrich: bumped to Scala Common Enrich 0.5.0 (#788)
Scala Hadoop Enrich: passed etl_tstamp into Scala Common Enrich (#817)
Scala Hadoop Enrich: removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#835)
Scala Hadoop Enrich: removed %3D handling for compatibility with old Scalding Args (#850)
Scala Hadoop Enrich: added ability to download additional MaxMind databases (#885)
Scala Hadoop Enrich: added runHadoop and Tool.main tests (#914)
Scala Common Enrich: bumped to 0.5.0
Scala Common Enrich: bumped user-agent-utils version, thanks @pkallos! (#662)
Scala Common Enrich: bumped referer-parser to 0.2.2 (#864)
Scala Common Enrich: bumped httpclient to 4.3.3 (#897)
Scala Common Enrich: bumped scala-maxmind-geoip to scala-maxmind-iplookups 0.1.0 (#882)
Scala Common Enrich: stored etl_tstamp in new field in CanonicalOutput (#818)
Scala Common Enrich: removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#836)
Scala Common Enrich: made referer parsing configurable with list of internal domains (#857)
Scala Common Enrich: migrated configurable enrichments to new EnrichmentRegistry (#858)
Scala Common Enrich: added validation of enrichments JSON (#807)
Scala Common Enrich: replaced "anon_ip_quartets" with "anon_ip_octets" everywhere (#547)
Scala Common Enrich: added ability to extract event_id from querystring (#723)
Scala Common Enrich: extracted CanonicalInput's userId as network_userid, thanks @pkallos! (#855)
Scala Common Enrich: added MaxMind region_name field (#873)
Scala Common Enrich: added IP -> ISP lookup (#861)
Scala Common Enrich: added IP -> organization lookup (#887)
Scala Common Enrich: added IP -> domain lookup (#886)
Scala Common Enrich: added IP -> net speed lookup (#889)
Scala Common Enrich: added validation for transaction ID (#428)
Scala Common Enrich: renamed Tests to Specs for consistency (#618)
Scala Hadoop Shred: bumped to 0.2.0
Scala Hadoop Shred: bumped to Scala Common Enrich 0.5.0 (#918)
Scala Hadoop Shred: trailing empty fields no longer cause shredding for that row to fail (#921)
Scala Hadoop Shred: updated column offsets for enriched events TSV (#915)
Redshift: bumped table-def to 0.4.0
Redshift: added etl_tstamp to atomic.events (#819)
Redshift: removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#834)
Redshift: added new MaxMind fields (#871)
Redshift: applied runlength encoding to all fields keyed off IP address (#883)
Redshift: migration script added for 0.3.0 to 0.4.0 (#838)
Postgres: bumped table-def to 0.3.0
Postgres: added etl_tstamp to atomic.events (#820)
Postgres: removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#833)
Postgres: added new MaxMind fields (#871)
Postgres: migration script added for 0.2.0 to 0.3.0 (#837)
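Note on the "anon_ip_quartets" to "anon_ip_octets" rename (#547) above: each dotted part of an IPv4 address is one octet (8 bits), which is what the new name reflects. The sketch below illustrates octet-based anonymization in general terms. It is not the Scala Common Enrich implementation; the function name `anonymize_ip` and the use of "x" as the masking character are assumptions.

```python
def anonymize_ip(ip: str, octets: int) -> str:
    """Mask the last `octets` parts of a dotted IPv4 address with 'x'."""
    parts = ip.split(".")
    if len(parts) != 4:
        raise ValueError(f"not a dotted IPv4 address: {ip!r}")
    if not 0 <= octets <= 4:
        raise ValueError("octets must be between 0 and 4")
    keep = 4 - octets
    # Keep the leading octets and replace the rest with a placeholder.
    return ".".join(parts[:keep] + ["x"] * octets)
```

For example, `anonymize_ip("192.168.1.42", 2)` keeps the first two octets and masks the last two.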
Version 0.9.5 (2014-07-09)
--------------------------
Ruby Tracker: added git submodule. Version 0.1.0 (#645)
Java Tracker: added git submodule. Version 0.2.0 (#843)
JavaScript Tracker: bumped git submodule to 2.0.0 (#635)
Python Tracker: bumped Python Tracker git submodule to 0.4.0 (#634)
Scala Hadoop Shred: added. Version 0.1.0
EmrEtlRunner: bumped to 0.8.0
EmrEtlRunner: updated S3DistCp steps to use new S3DistCpStep from Elasticity (#629)
EmrEtlRunner: added --skip s3distcp option (#313)
EmrEtlRunner: added ability to start Lingual in EmrEtlRunner (#623)
EmrEtlRunner: added ability to start HBase in EmrEtlRunner (#622)
EmrEtlRunner: improved load performance by switching ETL to write out to HDFS (#278)
EmrEtlRunner: now invoking Scala Hadoop Shredder after main job (#644)
EmrEtlRunner: added :iglu: section to config.yml for Scala Hadoop Shred (#814)
EmrEtlRunner: updated to run Scala Hadoop Shred following Hadoop Enrich (#815)
EmrEtlRunner: added --skip shred option (#659)
StorageLoader: bumped to 0.3.0
StorageLoader: bumped Sluice to 0.2.1 (#881)
StorageLoader: added initial Ruby.contracts support (#391)
StorageLoader: updated config.yml to support shredding (#897)
StorageLoader: added ACCEPTINVCHARS to StorageLoader (#411)
StorageLoader: wrote JSON Path files for ad_* events (#642)
StorageLoader: wrote JSON Path file for link_click (#599)
StorageLoader: wrote JSON Path file for screen_view (#643)
StorageLoader: wrote JSON Path file for schema.org's WebPage (#772)
StorageLoader: added :jsonpath_assets: setting for StorageLoader (#606)
StorageLoader: added ability to load custom tables using JSON Paths (#607)
StorageLoader: added --skip shred option (#660)
StorageLoader: added :in: hint on StorageLoader configuration, thanks @joaolcorreia! (#755)
Redshift: added Redshift DDL for ad_* events (#639)
Redshift: added Redshift DDL for link_click events (#600)
Redshift: added Redshift DDL for screen_view events (#640)
Redshift: added Redshift DDL for schema.org's WebPage (#771)
Looker Analytics: wrote LookML for ad_* events (#605)
Looker Analytics: wrote LookML for screen_view events (#637)
Looker Analytics: wrote LookML for link_click events (#636)
Looker Analytics: wrote LookML for schema.org's WebPage (#770)
Looker Analytics: updated LookML to use liquid templating (#851)
Version 0.9.4 (2014-05-30)
--------------------------
Redshift: added reference_data.country_codes (#779)
Postgres: added reference_data.country_codes (#781)
Looker Analytics: New 'traffic_pulse' dashboard with globally configurable drill-down variables (#765)
Looker Analytics: Snowplow website specific dimensions and metrics removed: base model is now company-generic (#764)
Looker Analytics: cleaner joining of data sets in Looker model (#763)
Looker Analytics: dimensions and metrics renamed to make it clearer for an analyst getting started with the data (#761)
Looker Analytics: added distkeys and sortkeys to derived tables to speed up query times (#696)
Looker Analytics: derived tables now auto-generated when new data is loaded into atomic.events (#688)
Looker Analytics: 'visits' renamed to 'sessions' (#762)
Looker Analytics: LookML models versioned using SchemaVer (#766)
Version 0.9.3 (2014-05-21)
--------------------------
EmrEtlRunner: bumped to 0.7.0
EmrEtlRunner: bumped Sluice to 0.2.1 (#405)
EmrEtlRunner: bumped Elasticity to 3.0.4 (#665)
EmrEtlRunner: replaced hadoop_version setting with ami_version setting (#701)
EmrEtlRunner: fixed handling of region, placement and ec2_subnet_id (#754)
EmrEtlRunner: fixed regression where 0 files staged still kicks off EMR (#409)
EmrEtlRunner: stopped Sluice file operation threads being killed by folders (#401)
EmrEtlRunner: fixed disabling of Cascading error catching (#721)
EmrEtlRunner: renamed Clojure Collector log files in processing bucket to support multiple instances (#717)
EmrEtlRunner: added initial Ruby.contracts support into EmrEtlRunner (#392)
EmrEtlRunner: updated to use the Ruby Logger (#194)
EmrEtlRunner: updated so it's embeddable in other applications (#128)
EmrEtlRunner: added ability to bundle as a JRuby fat jar (#674)
EmrEtlRunner: added initial unit tests (#672)
Clojure Collector: bumped to 0.6.0
Clojure Collector: load balancer IP address getting stored in logs (#719)
Documentation: removed all Snowplow tracking from READMEs, thanks @acinader! (#720)
Documentation: fixed (slightly) inconsistent EmrEtlRunner documentation, thanks @pvdb! (#749)
Version 0.9.2 (2014-04-30)
--------------------------
Scala Hadoop Enrich: bumped to 0.5.0
Scala Hadoop Enrich: bumped to Scala Common Enrich 0.4.0 (#699)
Scala Hadoop Enrich: bumped SBT to 0.13.2 (#702)
Scala Hadoop Enrich: bumped to using sbt-assembly 0.11.2 (#704)
Scala Common Enrich: bumped to 0.4.0
Scala Common Enrich: upgraded to support new and future CloudFront file formats (#698)
Scala Common Enrich: bumped SBT to 0.13.2 (#703)
Scala Hadoop Bad Rows: added. Version 0.1.0
Hive Storage: bumped table-def.q to 0.1.0
Hive Storage: added new unstructured fields to Hive table definition (#709)
Hive Storage: added raw page_url and page_referrer into Hive table (#710)
Hive Storage: added name_tracker field to Hive table (#711)
Version 0.9.1 (2014-04-11)
--------------------------
Scala Hadoop Enrich: bumped to 0.4.0
Scala Hadoop Enrich: bumped to Scala Common Enrich 0.3.0 (#497)
Scala Hadoop Enrich: renamed AnonQuartets to AnonOctets (#498)
Scala Hadoop Enrich: renamed all Snowplow Hadoop Tests to Specs (#515)
Scala Hadoop Enrich: added page_url and page_referrer back into ETL's output (#483)
Scala Common Enrich: bumped to 0.3.0
Scala Common Enrich: bumped Argonaut to 6.0.3 (#620)
Scala Common Enrich: added app and mob as valid platform codes, thanks @kinabalu! (#524)
Scala Common Enrich: added support for remaining platform codes (#516)
Scala Common Enrich: updated POJO in Scalding ETL to include new unstructured fields (#362)
Scala Common Enrich: updated POJO in Scalding ETL to include name_tracker field (#595)
Scala Common Enrich: extract evn from Tracker Protocol (#604)
Scala Common Enrich: extract tna from Tracker Protocol (#616)
Scala Common Enrich: extract and validate unstructured events (#142)
Scala Common Enrich: extract and validate custom contexts (#426)
Scala Common Enrich: reformat incoming event and context JSONs (#589)
Scala Common Enrich: make sure to error a JSON if > length (#567)
EmrEtlRunner: bumped to 0.6.0
EmrEtlRunner: bumped Elasticity to 3.0.2 (#587)
EmrEtlRunner: allowed AWS VPC selection in EmrEtlRunner (#581)
EmrEtlRunner: set :visible_to_all_users to true for EMR jobs, thanks @smugryan! (#560)
Redshift: atomic-def script bumped to 0.3.0
Redshift: migration script added for 0.2.2 to 0.3.0
Redshift: added new unstructured fields to Redshift table definition (#361)
Redshift: changed distkey to be event_id, not domain_userid (#584)
Redshift: added raw page_url and page_referrer into Redshift table (#591)
Redshift: added name_tracker field to Redshift table (#594)
Redshift: converted Redshift varchar(38) for event IDs to char(36) (#282)
Postgres: atomic-def script bumped to 0.2.0
Postgres: migration script added for 0.1.x to 0.2.0
Postgres: added new unstructured fields to Postgres table definition (#359)
Postgres: added raw page_url and page_referrer into Postgres table (#592)
Postgres: added name_tracker field to Postgres table (#593)
Postgres: converted varchar(36) for event IDs to char(36) (#596)
StorageLoader: bumped to 0.2.0
StorageLoader: added TIMEFORMAT 'auto' to StorageLoader to handle outlier dvce_timestamps (#427)
JavaScript Tracker: bumped git submodule to 1.0.1 (#585)
Python Tracker: added git submodule pointing to 0.1.0 (#586)
Version 0.9.0 (2014-02-04)
--------------------------
Thrift Raw Event: added. Version 0.1.0
Thrift Raw Event: specified Thrift IDL for new raw event schema (#430)
Scala Stream Collector: added. Version 0.1.0
Scala Stream Collector: implemented new spray-can (Akka Http) Scala stream collector (#432)
Scala Kinesis Enrich: added. Version 0.1.0
Scala Kinesis Enrich: implemented initial Kinesis-based enrichment (#460)
Scala Common Enrich: bumped to 0.2.0
Scala Common Enrich: added Thrift SnowplowRawEvent as a dependency to common-enrich (#475)
Scala Common Enrich: added ability to read Thrift SnowplowRawEvent (Thrift) (#462)
Scala Common Enrich: renamed CloudFront to Cloudfront in code (#495)
Scala Common Enrich: renamed AnonQuartets to AnonOctets (#491)
Scala Common Enrich: added raw -> CanonicalInput tests (#484)
Scala Common Enrich: updated GET payload extraction to handle empty payloads (#502)
Git submodules: changed git:// protocol in .gitmodules to https:// (#512)
NodeJS Collector: removed contrib-nodejs-collector from 2-collectors (#474)
JavaScript Tracker: bumped JS Tracker submodule to 0.13.1 release (#511)
Version 0.8.13 (2014-01-08)
---------------------------
Looker Analytics: added 0.1.0
Looker Analytics: created Snowplow metadata model for Looker BI (www.looker.com) (#472)
Version 0.8.12 (2014-01-07)
---------------------------
Hadoop ETL: bumped to 0.3.6
Hadoop ETL: bumped to SBT 0.13.0 (#404)
Hadoop ETL: bumped to using sbt-assembly 0.10.1 (#421)
Hadoop ETL: bumped to Scala 2.10.3 (#423)
Hadoop ETL: bumped to Scalding 0.8.11 (#422)
Hadoop ETL: upgraded useragent utils to 1.11 & moved to Maven dependency (#416)
Hadoop ETL: added test running back into sbt-assembly step (#420)
Hadoop ETL: updated copyright messages to be Snowplow not SnowPlow, and to 2014 not 2013 (#419)
Hadoop ETL: added ValidatedString as a type to package.scala (#328)
Hadoop ETL: added missing validation to stringToJByte (#408)
Hadoop ETL: missing page URI no longer interpreted as bad row (#399)
Hadoop ETL: updated CfRegex to reflect Cfcs(Cookie) can be empty (#410)
Hadoop ETL: numeric fields in tr_ and ti_ now parsed to doubles, not madeTsvSafe strings (#400)
Hadoop ETL: moved ETL core into separate project scala-enrich-common (#417)
Scala Common Enrich: updated ETL versioning to include host and common versions (#448)
Postgres: bumped cube-pages.sql to 0.1.1
Postgres: minor fix: cube_pages.complete referenced non-existent table cube_pages.basic, thanks @mrwalker! (#414)
Version 0.8.11 (2013-10-22)
---------------------------
Hadoop ETL: bumped to 0.3.5
Hadoop ETL: added Argonaut 6.0 as a dependency (#342)
Hadoop ETL: added fromTimestamp to EventEnrichments (#340)
Hadoop ETL: added makeTsvSafe to ConversionUtils (#338)
Hadoop ETL: added JsonUtils (#323)
Hadoop ETL: added support for 3 and 4 return values from MapTransformer (#324)
Hadoop ETL: updated GetJsonPayload to use Argonaut and renamed to JsonPayload (#339)
Hadoop ETL: added ability to mask IP addresses in ETL (#309)
Hadoop ETL: refr_ and page_ fields now stored raw (#374)
Hadoop ETL: defensively fixed raw spaces in page and referer URLs (#346)
Hadoop ETL: fixed regression, single-encoded %s logic didn't account for % itself (#347)
Hadoop ETL: added unit tests for fixTabsNewlines (#332)
Hadoop ETL: tests now report the failing CanonicalOutput field (#325)
Hadoop ETL: now handling all fields double-encoded as per CloudFront post-14-September (#348)
Hadoop ETL: added support for 21 Oct CloudFront access log format (#384)
Hadoop ETL: added truncation to refr_term (#379)
Hadoop ETL: added truncation to se_label (#394)
Hadoop ETL: made all prior ME.identity fields TSV-safe (#395)
EmrEtlRunner: bumped to 0.5.0
EmrEtlRunner: bumped Sluice to 0.1.5 (#96)
EmrEtlRunner: bumped Elasticity to 2.6 (#345)
EmrEtlRunner: enabled EMR Job Flow debugging for easier access to logs (#279)
EmrEtlRunner: ETL job no longer fails if there's no data for last run period (#296)
EmrEtlRunner: empty processing dir check now works if dir contains 1 file (#326)
EmrEtlRunner: added ability to mask IP addresses in ETL (#309)
EmrEtlRunner: made the examples match what you get from git out of the box, thanks @shermozle (#331)
StorageLoader: bumped to 0.1.1
StorageLoader: bumped Sluice to 0.1.5 (#96)
StorageLoader: fixed "\" in fields acts as an escape character for Postgres, thanks @kingo55 (#329)
StorageLoader: added ability to --skip analyze (#335)
StorageLoader: moved VACUUM SORT ONLY to a --include step (#321)
StorageLoader: added COMPROWS to config and --include compupdate option (#344)
StorageLoader: changed Postgres VACUUM FULL to VACUUM (#357)
StorageLoader: added TRUNCATECOLUMNS for Redshift load (#360)
StorageLoader: added FILLRECORD to our Redshift COPY command (#380)
Postgres: fixed error in `recipes_basic.technology_mobile` recipe (#397)
Version 0.8.10 (2013-10-18)
---------------------------
Redshift: bumped table-def to 0.2.2
Redshift: moved events table to a new atomic schema in atomic-def.sql (#301)
Redshift: added migration script for 0.2.1 to 0.2.2
Redshift: added SQL DDL to define Redshift recipes (#297)
Redshift: added SQL DDL to define Redshift cubes (#298)
Postgres: bumped table-def to 0.1.1
Postgres: renamed table-def file to atomic-def.sql
Postgres: added migration script for 0.1.0 to 0.1.1
Postgres: moved NOT NULL constraint on event field to event_vendor field (#318)
Postgres: added SQL DDL to define Postgres recipes (#303)
Postgres: added SQL DDL to define Postgres cubes (#302)
Documentation: fixed wrong path to no-js-tracker subdirectory, thanks @gregakespret (#343)
Documentation: improved "Find out more" table in README, thanks @dideler (#353)
Version 0.8.9 (2013-09-05)
--------------------------
Hadoop ETL: bumped to 0.3.4
Hadoop ETL: updated to handle singly-encoded %s in CloudFront querystring field (#333)
Version 0.8.8 (2013-08-04)
--------------------------
JavaScript Tracker: moved into own repo (#277)
Hadoop ETL: bumped to 0.3.3
Hadoop ETL: URL-decodes "%3D" to "=" to allow Hive-style directory names as arguments (#305)
Hadoop ETL: bumped referer-parser to 0.1.1 to fix java.lang.NullPointerException (#314)
EmrEtlRunner: bumped to 0.4.0
EmrEtlRunner: bumped Sluice to 0.0.7 (#299)
EmrEtlRunner: removed :snowplow: section from config.yml.sample (#289)
EmrEtlRunner: simplified EmrEtlRunner and its config (#287)
EmrEtlRunner: added run= to timestamped ETL folder names (#294)
EmrEtlRunner: updated "Jobflow started" stdout message to include jobflow ID (#315)
Hive ETL: removed folder 3-enrich/hive-etl as no longer supported (#286)
Hive storage: updated hive-storage scripts to work with current Redshift-format flatfile (#290)
Infobright: removed folder 4-storage/infobright as not currently supported (#285)
Postgres: add Postgres table definition in atomic schema (#160)
StorageLoader: bumped to 0.1.0
StorageLoader: bumped Sluice to 0.0.7 (#300)
StorageLoader: removed code to delete Hive ETL's empty event files (#306)
StorageLoader: fixed bug where download path has to be set (even when using Redshift) (#280)
StorageLoader: optimized ANALYZE and VACUUM commands (#283)
StorageLoader: added MAXERROR as StorageLoader configuration value for Redshift (#273)
StorageLoader: added support for loading Postgres (#161)
StorageLoader: removed Infobright loading capability (#307)
StorageLoader: added support for loading into multiple storage targets (#311)
Version 0.8.7 (2013-07-07)
--------------------------
JavaScript Tracker: bumped to 0.12.0
JavaScript Tracker: fixed document reference to use documentAlias (#247)
JavaScript Tracker: fixed bug with setCustomUrl (#267)
JavaScript Tracker: changed ev_ to se_ for structured events (#197)
JavaScript Tracker: fixed Firefox failure when "Always ask" set for cookies (#163)
JavaScript Tracker: fixed bug in page ping functionality detected in IE 8 (#260)
JavaScript Tracker: replaced forEach as not supported in IE 6-8 (#295)
EmrEtlRunner: fixed bug in config.yml.sample (#291)
Arduino tracker: added git submodule link (#292)
Version 0.8.6 (2013-06-03)
--------------------------
Hadoop ETL: bumped to 0.3.2
Hadoop ETL: bumped Scalding to 0.8.5
Hadoop ETL: bumped Scala version to 2.10.0
Hadoop ETL: bumped scala-maxmind-geoip to 0.0.5 to work with Scala 2.10.0
Hadoop ETL: bumped SBT from 0.12.1 to 0.12.3
Hadoop ETL: bumped Specs2 to 1.14
Hadoop ETL: replaced Bytes in CanonicalOutput with JBytes (#254)
Hadoop ETL: disabled "corruption" detection in ETL overriding custom URLs with longer collector referer URLs (#268)
EmrEtlRunner: bumped to 0.3.0
EmrEtlRunner: updated config.yml.sample to support spot task instances
EmrEtlRunner: let EmrEtlRunner use spot task instances (#193)
EmrEtlRunner: consolidate small files prior to running ETL job (#207)
Version 0.8.5 (2013-05-24)
--------------------------
Hadoop ETL: bumped to 0.3.1
Hadoop ETL: now supports downloading GeoLiteCity.dat from public S3 URL if needed, thanks @petervanwesep (part of #258)
Hadoop ETL: added Twitter Maven Repo as a resolution repo, thanks @rgabo (#239)
Hadoop ETL: stripping control characters in addition to tabs and newlines (#259)
Hadoop ETL: fixed issue with large values for se_value (#263)
Hadoop ETL: renamed ev_ fields in CanonicalOutput to se_
Hadoop ETL: extractResolution renamed and fails gracefully if view dimensions exceed Integer max size (#264)
EmrEtlRunner: bumped to 0.2.1
EmrEtlRunner: returns public S3 URL to GeoLiteCity.dat file if hosted by Snowplow, thanks @petervanwesep (part of #258)
Redshift: table-def script bumped to 0.2.1
Redshift: migration script added for 0.2.0 to 0.2.1
Redshift: bumped se_value from a float to a double
Redshift: increased size of `_urlport` fields, thanks @petervanwesep (#266)
Infobright: bumped setup_ and verify_infobright.sql to 0.0.9
Infobright: added migration script 0.0.8->0.0.9
Infobright: increased size of `_urlport` fields, thanks @petervanwesep (#266)
Version 0.8.4 (2013-05-16)
--------------------------
Hadoop ETL: bumped to 0.3.0
Hadoop ETL: added geo-ip lookup to Scalding ETL
Hadoop ETL: bumped referer-parser from 0.1.0-M6 to 0.1.0
Hadoop ETL: removed truncation of page_referrer (#236)
Hadoop ETL: added truncation of referer path/qs/fragment (#235)
Hadoop ETL: removing tabs found in referer search terms (#234)
Hadoop ETL: fixed client timestamp so it's not incorrectly localised - thanks @rgabo (#238)
Hadoop ETL: added parsing of collector version `cv` (#243)
Hadoop ETL: bumped Scalaz from 7.0.0-M9 to 7.0.0
Hadoop ETL: removed .gets from extractPageUri (#249)
EmrEtlRunner: bumped to 0.2.0
EmrEtlRunner: now passes MaxMind .dat file into Scalding ETL (#213)
EmrEtlRunner: improve messages when ETL job starts and fails (#230)
Redshift: table-def script bumped to 0.2.0
Redshift: migration script added for 0.1.0 to 0.2.0
Redshift: added geo-ip fields to Redshift table definition (#226)
Redshift: rename ev_ fields to se_ for structured events (#227)
Version 0.8.3 (2013-05-14)
--------------------------
JavaScript Tracker: bumped to 0.11.2
JavaScript Tracker: added unstructured events, thanks @rgabo, @tarsolya, @lackac (#198)
JavaScript Tracker: remove leading ampersand in querystring (#188)
Clojure Collector: bumped to 0.5.0
Clojure Collector: upgraded to use Tomcat AccessLogValve 0.0.4 (#240)
Clojure Collector: now logging Clojure Collector and Tomcat AccessLogValve versions (#239)
Common: completed splitting custom event type into: unstructured and structured events (#133)
Version 0.8.2 (2013-05-08)
--------------------------
Clojure Collector: bumped to 0.4.0
Clojure Collector: remove duplicate of wrap-request-logging in middleware.clj (#221)
Clojure Collector: check/potentially bump lein-ring dependency in project.clj (#222)
Clojure Collector: simplify building Clojure Collector, thanks @butlermh (#223, #225)
Clojure Collector: fix Tomcat log bug of missing cs(Referer) (#220)
Version 0.8.1 (2013-04-12)
--------------------------
Hadoop ETL: bumped to 0.2.0
Hadoop ETL: break referer_url into constituent parts (part of #175)
Hadoop ETL: remove raw referrer_url (as no space in Redshift table defn) (part of #175)
Hadoop ETL: added referer parsing (#176)
Redshift: table-def script bumped to 0.1.0
Redshift: migration script added for 0.0.1 to 0.1.0
Redshift: add/update referer fields in Redshift table definition (#204)
Redshift: fix bug where mkt_source and mkt_medium are getting swapped around (#215)
Common: replaced embedded architecture images with CloudFront-hosted images
Common: completed rename of 3-etl to 3-enrich (#99)
Common: "SnowPlow" -> "Snowplow" in 1st and 2nd level READMEs
Version 0.8.0 (2013-04-03)
--------------------------
Hadoop ETL: added. Version 0.1.0 (#177)
Hadoop ETL: truncate 6 "high risk" fields for Redshift (raw useragent, page title etc) (#192)
Hadoop ETL: ev_value now extracted as a float (#201)
EmrEtlRunner: bumped to 0.1.0
EmrEtlRunner: updated to work with new config.yml fields (part of #178)
EmrEtlRunner: added support for Hadoop ETL (part of #178)
EmrEtlRunner: added run ID and human-friendly job name (#100)
EmrEtlRunner: added run IDs to output folders (Hadoop ETL only) (#79)
EmrEtlRunner: changed .rvmrc to .ruby-version, thanks @richo (part of #190)
StorageLoader: changed .rvmrc to .ruby-version, thanks @richo (part of #190)
StorageLoader: added final missing /Gemfile to BUNDLE_GEMFILE in Bash script, thanks @frutik (#206)
Common: started rename of 3-etl to 3-enrich (part of #99)
Version 0.7.6 (2013-03-03)
--------------------------
HiveQL: redshift-etl.q added. Version 0.0.1 (#174)
HiveQL: hive-rolling-etl.q renamed to hive-etl.q and bumped to 0.5.7
HiveQL: non-hive-rolling-etl.q renamed to mysql-infobright-etl.q and bumped to 0.0.8 (part of #172)
EmrEtlRunner: bumped to 0.0.9
EmrEtlRunner: renamed :snowplow: variable names and added new Redshift one in config.yml (part of #172)
EmrEtlRunner: updated to support Redshift as a storage format (#173)
EmrEtlRunner: added missing /Gemfile to BUNDLE_GEMFILE in Bash script
StorageLoader: bumped to 0.0.5
StorageLoader: added Redshift-specific fields to config.yml (part of #159)
StorageLoader: added Redshift load support into StorageLoader (part of #159)
StorageLoader: added missing /Gemfile to BUNDLE_GEMFILE in Bash scripts
Redshift: table-def.sql script added. Version 0.0.1 (#158)
Infobright: bumped setup_ and verify_infobright.sql to 0.0.8
Infobright: widened useragent field (#184)
Infobright: added migration script 0.0.7->0.0.8
Serde: fixed and enabled broken tests (#14). Version unchanged
Version 0.7.5 (2013-02-25)
--------------------------
JavaScript Tracker: bumped to 0.11.1
JavaScript Tracker: fixed bug with cookie secure flag killing user ID cookies (#181)
Version 0.7.4 (2013-02-22)
--------------------------
JavaScript Tracker: bumped to 0.11.0
JavaScript Tracker: introduced setAppId() and deprecated setSiteId() (#168)
JavaScript Tracker: 1st party user ID now transmitted as duid (domain uid) (part of #150)
JavaScript Tracker: now sends dtm - the client timestamp (#149)
JavaScript Tracker: deprecated and disabled attachUserId()
JavaScript Tracker: deprecated getVisitorId() and getVisitorInfo() - use getDomainUserId() and getDomainUserInfo() instead
JavaScript Tracker: add setUserId which sets the uid field (#167)
JavaScript Tracker: SnowPlow cookies no longer tied to site ID (#148)
Clojure Collector: bumped to 0.3.0
Clojure Collector: now append nuid (network aka 3rd party) user ID, not uid (#150)
Serde: bumped to 0.5.5
Serde: renamed tstamp field to dtm
Serde: dt and tm split into dvce_x and collector_x (#149)
Serde: extract new nuid and duid fields (#150)
Serde: renamed visit_id to domain_sessionidx (#171)
HiveQL: hive-rolling-etl.q bumped to 0.5.6
HiveQL: non-hive-rolling-etl.q bumped to 0.0.7
HiveQL: dt and tm split into dvce_x and collector_x (#149)
HiveQL: now extracts uid, nuid and duid (#150)
HiveQL: renamed visit_id to domain_sessionidx (#171)
Infobright: bumped setup_infobright.sql to 0.0.7
Infobright: renamed dt and tm to dvce_x and collector_x (#149)
Infobright: now supports uid, nuid and duid (#150)
Infobright: renamed visit_id to domain_sessionidx (#171)
Infobright: added migration script 0.0.6 CloudFront collector -> 0.0.7
Infobright: added migration script 0.0.6 Clojure collector -> 0.0.7
Version 0.7.3 (2013-02-15)
--------------------------
JavaScript Tracker: bumped to 0.10.0
JavaScript Tracker: updated copyright notices
JavaScript Tracker: removed deprecated setAccount(), setTracker(), setHeartBeatTimer() - BREAKING CHANGE (#86)
JavaScript Tracker: added document charset to querystring (#138)
JavaScript Tracker: page ping no longer killed by 1 heartbeat w/o activity (#132)
JavaScript Tracker: added document & viewport dimensions (#94)
JavaScript Tracker: introduced trackStructEvent and deprecated trackEvent (#143)
JavaScript Tracker: cleaned up getRequest code to use improved requestStringBuilder
JavaScript Tracker: fixed logImpression (was using wrong argument names) (#162)
JavaScript Tracker: added scroll offsets to page ping (#127)
Serde: bumped to 0.5.4
Serde: updated copyright notices
Serde: structured events now logged as "struct" not "custom" - DATA CHANGE
Serde: added setting of new event_vendor field (to com.snowplowanalytics) (#144)
Serde: added extraction of doc charset (#138)
Serde: added extraction of document & viewport dimensions (#94)
Serde: added extraction of scroll offsets for enhanced page ping (#127)
Serde: added extraction of URL components (#105)
HiveQL: hive-rolling-etl.q bumped to 0.5.5
HiveQL: non-hive-rolling-etl.q bumped to 0.0.6
HiveQL: updated copyright notices
HiveQL: now supports charset, document & viewport, URL components, event_vendor and enhanced page ping
Infobright: bumped setup_infobright.sql to 0.0.6
Infobright: updated copyright notices
Infobright: added migration scripts (0.0.4->.6; 0.0.5->.6)
Infobright: added charset, document & viewport, URL components, event_vendor and enhanced page ping
Version 0.7.2 (2013-01-29)
--------------------------
No-JavaScript Tracker: added. Version 0.1.0
JavaScript Tracker: bumped to 0.9.1
JavaScript Tracker: fixed bug where secure flag not being set on cookies sent via HTTPS
Clojure Collector: bumped to 0.2.0
Clojure Collector: fixed Tomcat config issue of times being recorded in 12-hour clock
Serde: added NoJsTrackerTest
Serde: fixed CljTomcatFormatTest
Version 0.7.1 (2013-01-22)
--------------------------
EmrEtlRunner: bumped to 0.0.8
EmrEtlRunner: updated copyright notices
EmrEtlRunner: added .rvmrc file (part of #121, #84)
EmrEtlRunner: removed .gemspec file
EmrEtlRunner: added dependencies to Gemfile and re-generated Gemfile.lock
StorageLoader: bumped to 0.0.4
StorageLoader: updated copyright notices
StorageLoader: added .rvmrc file (part of #121, #84)
StorageLoader: removed .gemspec file
StorageLoader: added dependencies to Gemfile and re-generated Gemfile.lock
Documentation: updated to use `bundle install` (#122)
Version 0.7.0 (2013-01-04)
--------------------------
Clojure Collector: added. Version 0.1.0
HiveQL: hive-rolling-etl.q bumped to 0.5.4
HiveQL: non-hive-rolling-etl.q bumped to 0.0.5
HiveQL: v_collector now set via Hive variable, not Serde (#118)
EmrEtlRunner: bumped to 0.0.7
EmrEtlRunner: bumped to using Sluice 0.0.6
EmrEtlRunner: added "Complete" message at end of run (part of #97)
EmrEtlRunner: validates "clj-tomcat" as collector format (#119)
EmrEtlRunner: passes collector format through to HiveQL (#119)
EmrEtlRunner: support for log files generated by Clojure Collector on Tomcat (#117)
Serde: added broken CljTomcatFormatTest
StorageLoader: bumped to 0.0.3
StorageLoader: bumped to using Sluice 0.0.6
StorageLoader: added "Complete" message at end of run (part of #97)
StorageLoader: --skip argument now supports a list (#81)
Infobright: bumped setup_infobright.sql to 0.0.5
Infobright: added migration script (0.0.4 -> 0.0.5)
Infobright: user_id field widened to 38 chars to support UUID
Version 0.6.5 (2012-12-26)
--------------------------
JavaScript Tracker: bumped to 0.9.0
JavaScript Tracker: each event now sent with an event type `e` (#63)
JavaScript Tracker: refactoring of event definition code
JavaScript Tracker: added attachUserId(boolean) method (#92)
JavaScript Tracker: removed configCustomData from logImpression (#115)
JavaScript Tracker: cleaned up activity tracking (page pings)
JavaScript Tracker: added a combine only option to snowpak.sh
Serde: bumped to 0.5.3
Serde: now extracts event type (`e`) from querystring (#63)
Serde: now attaches UUID event_id to each event (#89)
Serde: added support for IP address override in querystring (#90)
Serde: no longer dies on corrupted querystring (#114)
HiveQL: hive-rolling-etl.q bumped to 0.5.3
HiveQL: non-hive-rolling-etl.q bumped to 0.0.4
HiveQL: event and event_id now extracted from Serde (#63, #89)
EmrEtlRunner: updated config file template
Version 0.6.4 (2012-12-20)
--------------------------
HiveQL: renamed table-def.q to non-hive-format-table-def.q
HiveQL: added hive-format-table-def.q (#111)
Infobright: bumped setup_infobright.sql to 0.0.4
Infobright: added migration script (0.0.3 -> 0.0.4)
Infobright: now supports long br_langs and urls (#107)
Infobright: removed lookup from fields which slow a large load (#107)
Version 0.6.3 (2012-12-18)
--------------------------
JavaScript Tracker: bumped to 0.8.2
JavaScript Tracker: fixed regressions from splitting JS into multiple files (#103)
HiveQL: hive-rolling-etl.q bumped to 0.5.2
HiveQL: added missing comma in hive-rolling-etl.q (#112)
Version 0.6.2 (2012-11-29)
--------------------------
JavaScript Tracker: bumped to 0.8.1
JavaScript Tracker: fixed bug with trailing comma (#102)
JavaScript Tracker: removed console.log when not debugging (#101)
JavaScript Tracker: removed minified sp.js from version control (added .gitignore to keep it out)
SnowCannon: bumped submodule to latest shermozle/SnowCannon commit
Version 0.6.1 (2012-11-28)
--------------------------
JavaScript Tracker: bumped to 0.8.0
JavaScript Tracker: rename ice.png to i - BREAKING CHANGE (#29)
JavaScript Tracker: added setCollectorCf() and deprecated setAccount() (#32)
JavaScript Tracker: Tracker constructor now supports Cf or Url (part of #44)
JavaScript Tracker: getTrackerCf() and -Url() added, getTracker() deprecated (part of #44)
JavaScript Tracker: added tracker version (`tv`) to querystring (#41)
JavaScript Tracker: added color depth tracking (part of #69)
JavaScript Tracker: added timezone tracking (part of #69)
JavaScript Tracker: added user fingerprinting (#70)
JavaScript Tracker: broke out .js into multiple files (#55)
EmrEtlRunner: bumped to 0.0.6
EmrEtlRunner: --skip takes multiple args (part of #83, supersedes #80)
EmrEtlRunner: add --process-bucket to process a bucket directly (part of #83)
StorageLoader: bumped to 0.0.2
StorageLoader: changed the data file encloser to NULL (#88)
Serde: bumped to 0.5.2
Serde: now extracts color depth, timezone and fingerprint fields
Serde: added useragent into ETL (#68)
Serde: now extracts platform field
HiveQL: hive-rolling-etl.q bumped to 0.5.1
HiveQL: non-hive-rolling-etl.q bumped to 0.0.3
HiveQL: now extracts color depth, timezone and fingerprint fields
HiveQL: now includes raw useragent as a separate field (#68)
HiveQL: platform field no longer a placeholder
HiveQL: event_name field renamed to event (prep for #89)
HiveQL: added event_id as a placeholder
Infobright: bumped setup_infobright.sql to 0.0.3
Infobright: added migration script (0.0.1/2 -> 0.0.3)
Infobright: now includes color depth, timezone and fingerprint fields
Infobright: now includes raw useragent (#68)
Infobright: event_name field renamed to event
Infobright: added event_id as a placeholder (prep for #89)
Version 0.6.0 (2012-11-12)
--------------------------
EmrEtlRunner: bumped to 0.0.5
EmrEtlRunner: bumped gem dependencies to match StorageLoader (including Sluice 0.0.4)
EmrEtlRunner: renamed snowplow-emr-etl.sh to snowplow-emr-etl-runner.sh
StorageLoader: added. Ruby app to load SnowPlow events into local databases etc
Serde: bumped to 0.5.1
Serde: changed all Booleans to Bytes for non-Hive output
HiveQL: bumped non-hive-rolling-etl.q to 0.0.2
HiveQL: changed non-hive-rolling-etl.q to use the two _bt Byte fields
Infobright: bumped setup_infobright.sql to 0.0.2
Infobright: changed booleans to tinyint(1)s (non-breaking change)
Version 0.5.2 (2012-11-05)
--------------------------
EmrEtlRunner: bumped to 0.0.4
EmrEtlRunner: fixed reference to old version of Hive deserializer in config.yml (fixes #71)
EmrEtlRunner: fixed bug using sub-folders with the Processing Bucket (fixes #72)
EmrEtlRunner: can now skip move-files-to-Processing-Bucket or EMR stages (fixes #58)
EmrEtlRunner: S3 filecopy code now moved to Sluice, an external Ruby gem
Version 0.5.1 (2012-10-31)
--------------------------
Data model: stubbed new event_name and platform fields
Infobright: added setup scripts and docs into 4-storage/infobright (fixes #57)
Infobright: added version handling (v_tracker, v_collector, v_etl)
HiveQL: removed hive-exact-etl.q as no longer supported
HiveQL: added non-hive-rolling-etl.q for Infobright- (and other db-)friendly event file format
HiveQL: added version handling (v_tracker, v_collector, v_etl) (fixes #42)
Serde: bumped to 0.5.0
Serde: updated to avoid throwing exceptions on a bad field, fixes #52 (thanks @mtibben!)