/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
Pig Change Log
Trunk (unreleased changes)
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-4925: Support for passing the bloom filter to the Bloom UDF (rohini)
PIG-4911: Provide option to disable DAG recovery (rohini)
PIG-4906: Add BigDecimal functions in Over function (cgalan via daijy)
PIG-2768: Fix org.apache.hadoop.conf.Configuration deprecation warnings for Hadoop 23 (rohini)
OPTIMIZATIONS
BUG FIXES
PIG-4938: [PiggyBank] XPath returns empty values when using aggregation method
PIG-4751: XPath/XPathAll - ignoreNamespace breaks searching for XML attributes
PIG-4896: Param substitution ignored when redefined (knoguchi)
PIG-2315: Make as clause work in generate (daijy via knoguchi)
PIG-4921: Kill running jobs on InterruptedException (rohini)
PIG-4916: Pig on Tez fail to remove temporary HDFS files in some cases (daijy)
Release 0.16.0 - Unreleased
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-4719: Documentation for PIG-4704: Customizable Error Handling for Storers in Pig (daijy)
PIG-4714: Improve logging across multiple components with callerId (daijy)
PIG-4885: Turn off union optimizer if there is PARALLEL clause in union in Tez (rohini)
PIG-4894: Add API for StoreFunc to specify if they are write safe from two different vertices (rohini)
PIG-4884: Tez needs to use DistinctCombiner.Combine (rohini)
PIG-4874: Remove schema tuple reference overhead for replicate join hashmap (rohini)
PIG-4879: Pull latest version of joda-time (rohini)
PIG-4526: Make setting up the build environment easier (nielsbasjes via rohini)
PIG-4641: Print the instance of Object without using toString() (sandyridgeracer via rohini)
PIG-4455: Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter (zjffdu via rohini)
PIG-4866: Do not serialize PigContext in configuration to the backend (rohini)
PIG-4547: Update Jython version to 2.7.0 (erwaman via daijy)
PIG-4862: POProject slow by creating StackTrace repeatedly (knoguchi)
PIG-4853: Fetch inputs before starting outputs (rohini)
PIG-4847: POPartialAgg processing and spill improvements (rohini)
PIG-4840: Do not turn off UnionOptimizer for unsupported storefuncs in case of no vertex groups (rohini)
PIG-4843: Turn off combiner in reducer vertex for Tez if bags are in combine plan (rohini)
PIG-4796: Authenticate with Kerberos using a keytab file (nielsbasjes via daijy)
PIG-4817: Bump HTTP Logparser to version 2.4 (nielsbasjes via daijy)
PIG-4811: Upgrade groovy library to address MethodClosure vulnerability (daijy)
PIG-4803: Improve performance of regex-based builtin functions (eyal via daijy)
PIG-4802: Autoparallelism should estimate less when there is combiner (rohini)
PIG-4761: Add more information to front end error messages (eyal via daijy)
PIG-4792: Do not add java and sun system properties to jobconf (rohini)
PIG-4787: Log JSONLoader exception while parsing records (rohini)
PIG-4763: Insufficient check for the number of arguments in runpigmix.pl (sekikn via rohini)
PIG-4411: Support for vertex level configuration like speculative execution (rohini)
PIG-4775: Better default values for shuffle bytes per reducer (rohini)
PIG-4753: Pigmix should have option to delete outputs after completing the tests (mitdesai via rohini)
PIG-4744: Honor tez.staging-dir setting in tez-site.xml (rohini via daijy)
PIG-4742: Document Pig's Register Artifact Command added in PIG-4417 (akshayrai09 via daijy)
PIG-4417: Pig's register command should support automatic fetching of jars from repo (akshayrai09 via daijy)
PIG-4713: Document Bloom UDF (gliptak via daijy)
PIG-3251: Bzip2TextInputFormat requires double the memory of maximum record size (knoguchi)
PIG-4704: Customizable Error Handling for Storers in Pig (siddhimehta via daijy)
PIG-4717: Update Apache HTTPD LogParser to latest version (nielsbasjes via daijy)
PIG-4468: Pig's jackson version conflicts with that of hadoop 2.6.0 or newer (zjffdu via daijy)
PIG-4708: Upgrade joda-time to 2.8 (rohini)
PIG-4697: Pig needs to serialize only part of the udfcontext for each vertex (rohini)
PIG-4702: Load once for sampling and partitioning in order by for certain LoadFuncs (rohini)
PIG-4699: Print Job stats information in Tez like mapreduce (rohini)
PIG-4554: Compress pig.script before encoding (sandyridgeracer via rohini)
PIG-4670: Embedded Python scripts still parse line by line (rohini)
PIG-4663: HBaseStorage should allow the MaxResultsPerColumnFamily limit to avoid memory or scan timeout issues (pmazak via rohini)
PIG-4673: Built In UDF - REPLACE_MULTI : For a given string, search and replace all occurrences
of search keys with replacement values (murali.k.h.rao@gmail.com via daijy)
PIG-4674: TOMAP should infer schema (daijy)
PIG-4676: Upgrade Hive to 1.2.1 (daijy)
PIG-4574: Eliminate identity vertex for order by and skewed join right after LOAD (rohini)
PIG-4365: TOP udf should implement Accumulator interface (eyal via rohini)
PIG-4570: Allow AvroStorage to use a class for the schema (pmazak via daijy)
PIG-4405: Adding 'map[]' support to mock/Storage (nielsbasjes via daijy)
PIG-4638: Allow TOMAP to accept dynamically sized input (nielsbasjes via daijy)
PIG-4639: Add better parser for Apache HTTPD access log (nielsbasjes via daijy)
BUG FIXES
PIG-4821: Pig chararray field with special UTF-8 chars as part of tuple join key produces wrong results in Tez (rohini)
PIG-4734: TOMAP schema inferring breaks some scripts in type checking for bincond (daijy)
PIG-4786: CROSS will not work correctly with Grace Parallelism (daijy)
PIG-3227: SearchEngineExtractor does not work for bing (dannyant via daijy)
PIG-4902: Fix UT failures on 0.16 branch: TestTezGraceParallelism, TestPigScriptParser (daijy)
PIG-4909: PigStorage incompatible with commons-cli-1.3 (knoguchi)
PIG-4908: JythonFunction refers to Oozie launcher script absolute path (rohini)
PIG-4905: Input of empty dir does not produce empty output file in Tez (rohini)
PIG-4576: Nightly test HCat_DDL_2 fails with TDE ON (nmaheshwari via daijy)
PIG-4873: InputSplit.getLocations returns null and results in an NPE in Pig (daijy)
PIG-4895: User UDFs relying on mapreduce.job.maps broken in Tez (rohini)
PIG-4883: MapKeyType of splitter was set wrongly in specific multiquery case (kellyzly via rohini)
PIG-4887: Parameter substitution skipped with glob on register (knoguchi)
PIG-4889: Replacing backslash fails as lexical error (knoguchi)
PIG-4880: Overlapping of parameter substitution names inside&outside a macro fails with NPE (knoguchi)
PIG-4881: TestBuiltin.testUniqueID failing on hadoop-1.x (knoguchi)
PIG-4888: Line number off when reporting syntax error inside a macro (knoguchi)
PIG-3772: Syntax error when casting an inner schema of a bag and line break involved (ssvinarchukhorton via knoguchi)
PIG-4892: removing /tmp/output before UT (daijy)
PIG-4882: Remove hardcoded groovy.grape.report.downloads=true from DownloadResolver (erwaman via daijy)
PIG-4581: thread safe issue in NodeIdGenerator (rcatherinot via rohini)
PIG-4878: Fix issues from PIG-4847 (rohini)
PIG-4877: LogFormat parser fails test (nielsbasjes via daijy)
PIG-4860: Loading data using OrcStorage() accepts only default FileSystem path (beriaanirudh via rohini)
PIG-4868: Low values for bytes.per.reducer configured by user not honored in Tez for inputs (rohini)
PIG-4869: Removing unwanted configuration in Tez broke ConfiguredFailoverProxyProvider (rohini)
PIG-4867: -stop_on_failure does not work with Tez (rohini)
PIG-4844: Tez AM runs out of memory when vertex has high number of outputs (rohini)
PIG-3906: ant site errors out (nielsbasjes via daijy)
PIG-4851: Null not padded when input has less fields than declared schema for some loader (rohini)
PIG-4850: Registered jars do not use submit replication (rdblue via cheolsoo)
PIG-4845: Parallel instantiation of classes in Tez cause tasks to fail (rohini)
PIG-4841: Inline-op with schema declaration fails with syntax error (knoguchi)
PIG-4832: Fix TestPruneColumn NPE failure (kellyzly via daijy)
PIG-4833: TestBuiltin.testURIWithCurlyBrace in TEZ failing after PIG-4819 (knoguchi)
PIG-4819: RANDOM() udf can lead to missing or redundant records (knoguchi)
PIG-4816: Read a null scalar causing a Tez failure (daijy)
PIG-4818: Single quote inside comment in GENERATE is not being ignored (knoguchi)
PIG-4814: AvroStorage does not take namenode HA as part of schema file url (daijy)
PIG-4812: Register Groovy UDF with relative path does not work (daijy)
PIG-4806: UDFContext can be reset in the middle during Tez input and output initialization (rohini)
PIG-4808: PluckTuple overwrites regex if used more than once in the same script (eyal via daijy)
PIG-4801: Provide backward compatibility with mapreduce mapred.task settings (rohini)
PIG-4759: Fix Classresolution_1 e2e failure (rohini)
PIG-4800: EvalFunc.getCacheFiles() fails for different namenode (rohini)
PIG-4790: Join after union fail due to UnionOptimizer (rohini)
PIG-4686: Backend code should not call AvroStorageUtils.getPaths (mitdesai via rohini)
PIG-4795: Flushing ObjectOutputStream before calling toByteArray on the underlying ByteArrayOutputStream (emopers via daijy)
PIG-4690: Union with self replicate join will fail in Tez (rohini)
PIG-4791: PORelationToExprProject filters records instead of returning emptybag in nested foreach after union (rohini)
PIG-4779: testBZ2Concatenation[pig.bzip.use.hadoop.inputformat = true] failing due to successful read (knoguchi)
PIG-4587: Applying isFirstReduceOfKey for Skewed left outer join skips records (rohini)
PIG-4782: OutOfMemoryError: GC overhead limit exceeded with POPartialAgg (rohini)
PIG-4737: Check and fix clone implementation for all classes extending PhysicalOperator (rohini)
PIG-4770: OOM with POPartialAgg in some cases (rohini)
PIG-4773: [Pig on Tez] Secondary key descending sort in nested foreach after union does ascending instead (rohini)
PIG-4774: Fix NPE in SUM,AVG,MIN,MAX UDFs for null bag input (rohini)
PIG-4757: Job stats on successfully read/output records wrong with multiple inputs/outputs (rohini)
PIG-4769: UnionOptimizer hits errors when merging vertex group into split (rohini)
PIG-4768: EvalFunc reporter is null in Tez (rohini)
PIG-4760: TezDAGStats.convertToHadoopCounters is not used, but impose MR counter limit (daijy)
PIG-4755: Typo in runpigmix script (mitdesai via daijy)
PIG-4736: Removing empty keys in UDFContext broke one LoadFunc (rohini)
PIG-4733: Avoid NullPointerException in JVMReuseImpl for builtin classes (rohini)
PIG-4722: [Pig on Tez] NPE while running Combiner (rohini)
PIG-4730: [Pig on Tez] Total parallelism estimation does not account load parallelism (rohini)
PIG-4689: CSV Writes incorrect header if two CSV files are created in one script (nielsbasjes via daijy)
PIG-4727: Incorrect types table for AVG in docs (nsmith via daijy)
PIG-4725: Typo in FrontendException messages "Incompatable" (nsmith via daijy)
PIG-4721: IsEmpty documentation error (nsmith via daijy)
PIG-4712: [Pig on Tez] NPE in Bloom UDF after Union (rohini)
PIG-4707: [Pig on Tez] Streaming job hangs with pig.exec.mapPartAgg=true (rohini)
PIG-4703: TezOperator.stores shall not ship to backend (daijy)
PIG-4696: Empty map returned by a streaming_python udf wrongly contains a null key (cheolsoo)
PIG-4691: [Pig on Tez] Support for whitelisting storefuncs for union optimization (rohini)
PIG-3957: Refactor out resetting input key in TezDagBuilder (rohini)
PIG-4688: Limit followed by POPartialAgg can give empty or partial results in Tez (rohini)
PIG-4635: NPE while running pig script in tez mode (daijy)
PIG-4683: Nested order is broken after PIG-3591 in some cases (daijy)
PIG-4679: Performance degradation due to InputSizeReducerEstimator since PIG-3754 (daijy)
PIG-4315: MergeJoin or Split followed by order by gives NPE in Tez (rohini)
PIG-4654: Reduce tez memory.reserve-fraction and clear spillables for better memory utilization (rohini)
PIG-4628: Pig 0.14 job with order by fails in mapreduce mode with Oozie (knoguchi)
PIG-4651: Optimize NullablePartitionWritable serialization for skewed join (rohini)
PIG-4627: [Pig on Tez] Self join does not handle null values correctly (rohini)
PIG-4644: PORelationToExprProject.clone() is broken (erwaman via rohini)
PIG-4650: ant mvn-deploy target is broken (daijy)
PIG-4649: [Pig on Tez] Union followed by HCatStorer misses some data (rohini)
PIG-4636: Occurred spelled incorrectly in error message for Launcher and POMergeCogroup (stevenmz via daijy)
PIG-4624: Error on ORC empty file without schema (daijy)
PIG-3622: Allow casting bytearray fields to bytearray type (redisliu via daijy)
PIG-4618: When using Tez as the engine, setting pig.user.cache.enabled=true does not take effect (wisgood via rohini)
PIG-4533: Document error: Pig does support concatenated gz file (xhudik via daijy)
PIG-4578: ToDateISO should support optional ' ' space variant used by JDBC (michaelthoward via daijy)
Release 0.15.0
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-4560: Pig 0.15.0 Documentation (daijy)
PIG-4429: Add Pig alias information and Pig script to the DAG view in Tez UI (daijy)
PIG-3994: Implement getting backend exception for Tez (rohini)
PIG-4563: Upgrade to released Tez 0.7.0 (daijy)
PIG-4525: Clarify "Scalar has more than one row in the output." (Niels Basjes via gates)
PIG-4511: Add columns to prune from PluckTuple (jbabcock via cheolsoo)
PIG-4434: Improve auto-parallelism for tez (daijy)
PIG-4495: Better multi-query planning in case of multiple edges (rohini)
PIG-3294: Allow Pig use Hive UDFs (daijy)
PIG-4476: Fix logging in AvroStorage* classes and SchemaTuple class (rdsr via rohini)
PIG-4458: Support UDFs in a FOREACH Before a Merge Join (wattsinabox via daijy)
PIG-4454: Upgrade tez to 0.6.0 (daijy)
PIG-4451: Log partition and predicate filter pushdown information and fix optimizer looping (rohini)
PIG-4430: Pig should support reading log4j.properties file from classpath as well (rdsr via daijy)
PIG-4407: Allow specifying a replication factor for jarcache (jira.shegalov via rohini)
PIG-4401: Add pattern matching to PluckTuple (cheolsoo)
PIG-2692: Make the Pig unit facilities more generalizable and update javadocs (razsapps via daijy)
PIG-4379: Make RoundRobinPartitioner public (daijy)
PIG-4378: Better way to fix tez local mode test hanging (daijy)
PIG-4358: Add test cases for utf8 chinese in Pig (nmaheshwari via daijy)
PIG-4370: HBaseStorage should support delete markers (bridiver via daijy)
PIG-4360: HBaseStorage should support setting the timestamp field (bridiver via daijy)
PIG-4337: Split Types and MultiQuery e2e tests into multiple groups (rohini)
PIG-4333: Split BigData tests into multiple groups (rohini)
BUG FIXES
PIG-4592: Pig 0.15 stopped working with Hadoop 1.x (daijy)
PIG-4580: Fix TestTezAutoParallelism.testSkewedJoinIncreaseParallelism test failure (daijy)
PIG-4571: TestPigRunner.testGetHadoopCounters fail on Windows (daijy)
PIG-4541: Skewed full outer join does not return records if any relation is empty. Outer join does not
return any record if left relation is empty (daijy)
PIG-4564: Pig can deadlock in POPartialAgg if there is a bag (rohini via daijy)
PIG-4569: Fix e2e test Rank_1 failure (rohini)
PIG-4490: MIN/MAX builtin UDFs return wrong results when accumulating for strings (xplenty via rohini)
PIG-4418: NullPointerException in JVMReuseImpl (rohini)
PIG-4562: Typo in DataType.toDateTime (daijy)
PIG-4559: Fix several new tez e2e test failures (daijy)
PIG-4506: binstorage fails to write biginteger (ssavvides via daijy)
PIG-4556: Local mode is broken in some case by PIG-4247 (daijy)
PIG-4523: Tez engine should use tez config rather than mr config whenever possible (daijy)
PIG-4452: Embedded SQL using "SQL" instead of "sql" fails with string index out of range: -1 error (daijy)
PIG-4543: TestEvalPipelineLocal.testRankWithEmptyReduce fail on Hadoop 1 (daijy)
PIG-4544: Upgrade Hbase to 0.98.12 (daijy)
PIG-4481: e2e tests ComputeSpec_1, ComputeSpec_2 and StreamingPerformance_3 produce different result on Windows (daijy)
PIG-4496: Fix CBZip2InputStream to close underlying stream (petersla via daijy)
PIG-4528: Fix a typo in src/docs/src/documentation/content/xdocs/basic.xml (namusyaka via daijy)
PIG-4532: Pig Documentation contains typo for AvroStorage (fredericschmaljohann via daijy)
PIG-4377: Skewed outer join produce wrong result in some cases (daijy)
PIG-4538: Pig script fail with CNF in follow up MR job (daijy)
PIG-4537: Fix unit test failure introduced by TEZ-2392: TestCollectedGroup, TestLimitVariable, TestMapSideCogroup, etc (daijy)
PIG-4530: StackOverflow in TestMultiQueryLocal running under hadoop20 (nielsbasjes via rohini)
PIG-4529: Pig on tez hit counter limit imposed by MR (daijy)
PIG-4524: Pig Minicluster unit tests broken by TEZ-2333 (daijy)
PIG-4527: NON-ASCII Characters in Javadoc break 'ant docs' (nielsbasjes via daijy)
PIG-4494: Pig's htrace version conflicts with that of hadoop 2.6.0 (daijy)
PIG-4519: Correct link to Contribute page (gliptak via daijy)
PIG-4514: pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change (thejas)
PIG-4503: [Pig on Tez] NPE in UnionOptimizer with multiple levels of union (rohini)
PIG-4509: [Pig on Tez] Unassigned applications not killed on shutdown (rohini)
PIG-4508: [Pig on Tez] PigProcessor check for commit only on MROutput (rohini)
PIG-4505: [Pig on Tez] Auto adjust AM memory can hit OOM with 3.5GXmx (rohini)
PIG-4502: E2E tests build fail with udfs compile (nmaheshwari via daijy)
PIG-4498: AvroStorage in Piggybank does not handle bad records and fails (viraj via rohini)
PIG-4499: mvn-build miss tez classes in pig-h2.jar (daijy)
PIG-4488: Pig on tez mask tez.queue.name (daijy)
PIG-4497: [Pig on Tez] NPE for null scalar (rohini)
PIG-4493: Pig on Tez gives wrong results if Union is followed by Split (rohini)
PIG-4491: Streaming Python Bytearray Bugs (jeremykarn via daijy)
PIG-4487: Pig on Tez gives wrong success message on failure in case of multiple outputs (rohini)
PIG-4483: Pig on Tez output statistics shows storing to same directory twice for union (rohini)
PIG-4480: Pig script failure on Tez with split and order by due to missing sample collection (rohini)
PIG-4484: Ant pull jetty-6.1.26.zip on some platform (daijy)
PIG-4479: Pig script with union within nested splits followed by join failed on Tez (rohini)
PIG-4457: Error is thrown by JobStats.getOutputSize() when storing to a MySql table (rohini)
PIG-4475: Keys in AvroMapWrapper are not proper Pig types (rdsr via daijy)
PIG-4478: TestCSVExcelStorage fails with jdk8 (rohini)
PIG-4474: Increasing intermediate parallelism has issue with default parallelism (rohini)
PIG-4465: Pig streaming ship fails for relative paths on Tez (rohini)
PIG-4461: Use benchmarks for Windows Pig e2e tests (nmaheshwari via daijy)
PIG-4463: AvroMapWrapper still leaks Avro data types and AvroStorageDataConversionUtilities do not handle
Pig maps (rdsr via daijy)
PIG-4460: TestBuiltIn testValueListOutputSchemaComplexType and testValueSetOutputSchemaComplexType tests
create bags whose inner schema is not a tuple (erwaman via daijy)
PIG-4448: AvroMapWrapper leaks Avro data types when the map values are complex avro records (rdsr via daijy)
PIG-4453: Remove test-tez-local target (daijy)
PIG-4443: Write inputsplits in Tez to disk if the size is huge and option to compress pig input splits (rohini)
PIG-4447: Pig Cannot handle nullable values (arrays and records) in avro records (rdsr via daijy)
PIG-4444: Fix unit test failure TestTezAutoParallelism (daijy)
PIG-4445: VALUELIST and VALUESET outputSchema does not match actual schema of data returned when map value schema
is complex (erwaman via daijy)
PIG-4442: Eliminate redundant RPC call to get file information in HPath (cnauroth via daijy)
PIG-4440: Some code samples in documentation use Unicode left/right single quotes, which cause a
parse failure (cnauroth via daijy)
PIG-4264: Port TestAvroStorage to tez local mode (daijy)
PIG-4437: Fix tez unit test failure TestJoinSmoke, TestSkewedJoin (daijy)
PIG-4432: Built-in VALUELIST and VALUESET UDFs do not preserve the schema when the map value type is
a complex type (erwaman via daijy)
PIG-4408: Merge join should support replicated join as a predecessor (bridiver via daijy)
PIG-4389: Flag to run selected test suites in e2e tests (daijy)
PIG-4385: testDefaultBootup fails because it cannot find "pig.properties" (mkudlej via daijy)
PIG-4397: CSVExcelStorage incorrect output if last field value is null (daijy)
PIG-4431: ReadToEndLoader does not close the record reader for the last input split (rdsr via daijy)
PIG-4426: RowNumber(simple) Rank not producing correct results (knoguchi)
PIG-4433: Loading bigdecimal in nested tuple does not work (kpriceyahoo via daijy)
PIG-4410: Fix testRankWithEmptyReduce in tez mode (daijy)
PIG-4392: RANK BY fails when default_parallel is greater than cardinality of field being ranked by (daijy)
PIG-4403: Combining -Dpig.additional.jars.uris with -useHCatalog breaks due to combination
with colon instead of comma (ovlaere via daijy)
PIG-4402: JavaScript UDF example in the doc is broken (cheolsoo)
PIG-4394: Fix Split_9 and Union_5 e2e failures (rohini)
PIG-4391: Fix TestPigStats test failure (rohini)
PIG-4387: Honor yarn settings in tez-site.xml and optimize dag status fetch (rohini)
PIG-4352: Port local mode tests to Tez - TestUnionOnSchema (daijy)
PIG-4359: Port local mode tests to Tez - part4 (daijy)
PIG-4340: PigStorage fails parsing empty map (daijy)
PIG-4366: Port local mode tests to Tez - part5 (daijy)
PIG-4381: PIG grunt shell DEFINE commands fails when it spans multiple lines (daijy)
PIG-4384: TezLauncher thread should be daemon thread (zjffdu via daijy)
PIG-4376: NullPointerException accessing a field of an invalid bag from a nested foreach
(kspringborn via daijy)
PIG-4355: Piggybank: XPath can't handle namespace in xpath, nor can it return more than one match
(cavanaug via daijy)
PIG-4371: Duplicate snappy.version in libraries.properties (daijy)
PIG-4368: Port local mode tests to Tez - TestLoadStoreFuncLifeCycle (daijy)
PIG-4367: Port local mode tests to Tez - TestMultiQueryBasic (daijy)
PIG-4339: e2e test framework assumes default exectype as mapred (rohini)
PIG-2949: JsonLoader only reads arrays of objects (eyal via daijy)
PIG-4213: CSVExcelStorage not quoting texts containing \r (CR) when storing (alfonso.nishikawa via daijy)
PIG-2647: Split Combining drops splits with empty getLocations() (tmwoodruff via daijy)
PIG-4294: Enable unit test "TestNestedForeach" for spark (kellyzly via rohini)
PIG-4282: Enable unit test "TestForEachNestedPlan" for spark (kellyzly via rohini)
PIG-4361: Fix perl script problem in TestStreaming.java (kellyzly via xuefu)
PIG-4354: Port local mode tests to Tez - part3 (daijy)
PIG-4338: Fix test failures with JDK8 (rohini)
PIG-4351: TestPigRunner.simpleTest2 fail on trunk (daijy)
PIG-4350: Port local mode tests to Tez - part2 (daijy)
PIG-4326: AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records (mprim via daijy)
PIG-4345: e2e test "RubyUDFs_13" fails because of the different result of "group a all" in different engines like "spark", "mapreduce" (kellyzly via rohini)
PIG-4332: Remove redundant jars packaged into pig-withouthadoop.jar for hadoop 2 (rohini)
PIG-4331: update README, '-x' option in usage to include tez (thejas via daijy)
PIG-4327: Schema of map with value that has an alias can't be parsed again (mprim via daijy)
PIG-4330: Regression test for PIG-3584 - AvroStorage does not correctly translate arrays of strings (brocknoland via daijy)
PIG-3615: Update the way that JsonLoader/JsonStorage deal with BigDecimal (tyro89 via daijy)
PIG-4329: Fetch optimization should be disabled when limit is not pushed up (lbendig via cheolsoo)
PIG-3413: JsonLoader fails the pig job in case of malformed json input (eyal via daijy)
PIG-4247: S3 properties are not picked up from core-site.xml in local mode (cheolsoo)
PIG-4242: For indented xmls with multiline content (e.g. wikipedia) XMLLoader cuts out the beginning of every line
(holdfenytolvaj via daijy)
Release 0.14.1 - Unreleased
INCOMPATIBLE CHANGES
IMPROVEMENTS
BUG FIXES
PIG-4409: fs.defaultFS is overwritten in JobConf by replicated join at runtime (cheolsoo)
PIG-4404: LOAD with HBaseStorage on secure cluster is broken in Tez (rohini)
PIG-4375: ObjectCache should use ProcessorContext.getObjectRegistry() (rohini)
PIG-4334: PigProcessor does not set pig.datetime.default.tz (rohini)
PIG-4342: Pig 0.14 cannot identify the uppercase of DECLARE and DEFAULT (daijy)
Release 0.14.0
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-4321: Documentation for 0.14 (daijy)
PIG-4328: Upgrade Hive to 0.14 (daijy)
PIG-4318: Make PigConfiguration naming consistent (rohini)
PIG-4316: Port TestHBaseStorage to tez local mode (rohini)
PIG-4224: Upload Tez payload history string to timeline server (daijy)
PIG-3977: Get TezStats working for Oozie (rohini)
PIG-3979: group all performance, garbage collection, and incremental aggregation (rohini)
PIG-4253: Add a UniqueID UDF (daijy)
PIG-4160: Provide a way to pass local jars in pig.additional.jars when using a remote
url for a script (acoliver via daijy)
PIG-4246: HBaseStorage should implement getShipFiles (rohini)
PIG-3456: Reduce threadlocal conf access in backend for each record (rohini)
PIG-3861: duplicate jars get added to distributed cache (chitnis via rohini)
PIG-4039: New interface for resetting static variables for jvm reuse (rohini)
PIG-3870: STRSPLITTOBAG UDF (cryptoe via daijy)
PIG-4080: Add Preprocessor commands and more to the black/whitelisting feature (prkommireddi via daijy)
PIG-4162: Intermediate reducer parallelism in Tez should be higher (rohini)
PIG-4186: Fix e2e run against new build of pig and some enhancements (rohini)
PIG-3838: Organize tez code into subpackages (rohini)
PIG-4069: Limit reduce task should start as soon as one map task finishes (rohini)
PIG-4141: Ship UDF/LoadFunc/StoreFunc dependent jar automatically (daijy)
PIG-4146: Create a target to run mr and tez unit test in one shot (daijy)
PIG-4144: Make pigunit.PigTest work in tez mode (daijy)
PIG-4128: New logical optimizer rule: ConstantCalculator (daijy)
PIG-4124: Command for Python streaming udf should be configurable (cheolsoo)
PIG-4114: Add Native operator to tez (daijy)
PIG-4117: Implement merge cogroup in Tez (daijy)
PIG-4119: Add message at end of each testcase with timestamp in Pig system tests (nmaheshwari via daijy)
PIG-4008: Pig code change to enable Tez Local mode (airbots via daijy)
PIG-4091: Predicate pushdown for ORC (rohini via daijy)
PIG-4077: Some fixes and e2e test for OrcStorage (rohini)
PIG-4054: Do not create job.jar when submitting job (daijy)
PIG-4047: Break up pig withouthadoop and fat jar (daijy)
PIG-4062: Add ascending order option to builtin TOP function (raj171 via cheolsoo)
PIG-3558: ORC support for Pig (daijy)
PIG-2122: Parameter Substitution doesn't work in the Grunt shell (daijy)
PIG-4031: Provide Counter aggregation for Tez (daijy)
PIG-4028: add a flag to control the ivy resolve/retrieve output (gkesavan via daijy)
PIG-4015: Provide a way to disable auto-parallelism in tez (daijy)
PIG-3846: Implement automatic reducer parallelism (daijy)
PIG-3939: SPRINTF function to format strings using a printf-style template (mrflip via cheolsoo)
PIG-3970: Merge Tez branch into trunk (daijy)
OPTIMIZATIONS
PIG-4657: [Pig on Tez] Optimize GroupBy and Distinct key comparison (rohini)
BUG FIXES
PIG-4335: Pig release tarball miss tez classes (daijy)
PIG-4325: StackOverflow when spilling InternalCachedBag (daijy)
PIG-4324: Remove jsch-LICENSE.txt (daijy)
PIG-4267: ToDate has incorrect timezone offsets (bridiver via daijy)
PIG-4319: Make LoadPredicatePushdown InterfaceAudience.Private till PIG-4093 (rohini)
PIG-4312: TestStreamingUDF tez mode leave orphan process on Windows (daijy)
PIG-4314: BigData_5 hang on some machine (daijy)
PIG-4299: SpillableMemoryManager assumes tenured heap incorrectly (prkommireddi via daijy)
PIG-4298: Descending order-by is broken in some cases when key is bytearrays (cheolsoo)
PIG-4263: Move tez local mode unit tests to a separate target (daijy)
PIG-4257: Fix several e2e tests on secure cluster (daijy)
PIG-4261: Skip shipping local resources in tez local mode (daijy)
PIG-4182: e2e tests Scripting_[1-12] fail on Windows (daijy)
PIG-4259: Fix few issues related to Union, CROSS and auto parallelism in Tez (rohini)
PIG-4250: Fix Security Risks found by Coverity (daijy)
PIG-4258: Fix several e2e tests on Windows (daijy)
PIG-4256: Fix StreamingPythonUDFs e2e test failure on Windows (daijy)
PIG-4166: Collected group drops last record when combined with merge join (bridiver via daijy)
PIG-2495: Using merge JOIN from a HBaseStorage produces an error (bridiver via daijy)
PIG-4235: Fix unit test failures on Windows (daijy)
PIG-4245: 1-1 edge vertices should use same jvm opts (rohini)
PIG-4252: Tez container reuse fail when using script udf (daijy)
PIG-4241: Auto local mode mistakenly converts large jobs to local mode when using with Hive tables (cheolsoo)
PIG-4184: UDF backward compatibility issue after POStatus.STATUS_NULL refactoring (daijy)
PIG-4238: Property 'pig.job.converted.fetch' should be unset when fetch finishes (lbendig)
PIG-4151: Pig Cannot Write Empty Maps to HBase (daijy)
PIG-4181: Cannot launch tez e2e test on Windows (daijy)
PIG-2834: MultiStorage requires unused constructor argument (daijy)
PIG-4230: Documentation fix: first nested foreach example is incomplete (lbendig via daijy)
PIG-4199: Mapreduce ACLs should be translated to Tez ACLs (rohini)
PIG-4227: Streaming Python UDF handles bag outputs incorrectly (cheolsoo)
PIG-4219: When parsing a schema, pig drops tuple inside of Bag if it contains only one field (lbendig via daijy)
PIG-4226: Upgrade Tez to 0.5.1 (daijy)
PIG-4220: MapReduce-based Rank failing with NPE due to missing Counters (knoguchi)
PIG-3985: Multiquery execution of RANK with RANK BY causes NPE (rohini)
PIG-4218: Pig OrcStorage fail to load a map with null key (daijy)
PIG-4164: After Pig job finish, Pig client spend too much time retry to connect to AM (daijy)
PIG-4212: Allow LIMIT of 0 for variableLimit (constant 0 is already allowed) (knoguchi)
PIG-4196: Auto ship udf jar is broken (daijy)
PIG-4214: Fix unit test fail TestMRJobStats (daijy)
PIG-4217: Fix documentation in BuildBloom (praveenr019 via daijy)
PIG-4215: Fix unit test failure TestParamSubPreproc and TestMacroExpansion (daijy)
PIG-4175: PIG CROSS operation follow by STORE produces non-deterministic results each run (daijy)
PIG-4202: Reset UDFContext state before OutputCommitter invocations in Tez (rohini)
PIG-4205: e2e test property-check does not check all prerequisites (kellyzly via daijy)
PIG-4180: e2e test Native_3 fail on Hadoop 2 (daijy)
PIG-4178: HCatDDL_[1-3] fail on Windows (daijy)
PIG-4046: PiggyBank DBStorage DATETIME should use setTimestamp with java.sql.Timestamp (sinchii via daijy)
PIG-4050: HadoopShims.getTaskReports() can cause OOM with Hadoop 2 (rohini)
PIG-4176: Fix tez e2e test Bloom_[1-3] (daijy)
PIG-4195: Support loading char/varchar data in OrcStorage (daijy)
PIG-4201: Native e2e tests fail when run against old version of pig (rohini)
PIG-4197: Fix typo in Job Stats header: MinMapTIme => MinMapTime (jmartell7 via daijy)
PIG-4194: ReadToEndLoader does not call setConf on pigSplit in initializeReader (shadanan via rohini)
PIG-4187: Fix Orc e2e tests (daijy)
PIG-4177: BigData_1 fail after PIG-4149 (daijy)
PIG-3507: Pig fails to run in local mode on a Kerberos enabled Hadoop cluster (kellyzly via rohini)
PIG-4171: Streaming UDF fails when direct fetch optimization is enabled (cheolsoo)
PIG-4170: Multiquery with different type of key gives wrong result (daijy)
PIG-4104: Accumulator UDF throws OOM in Tez (rohini)
PIG-4169: NPE in ConstantCalculator (cheolsoo)
PIG-4161: check for latest Hive snapshot dependencies (daijy)
PIG-4102: Adding e2e tests and several improvements for Orc predicate pushdown (daijy)
PIG-4156: [PATCH] fix NPE when running scripts stored on hdfs:// (acoliver via daijy)
PIG-4159: TestGroupConstParallelTez and TestJobSubmissionTez should be excluded in Hadoop 20 unit tests (cheolsoo)
PIG-4154: ScriptState#setScript(File) does not close resources (lars_francke via daijy)
PIG-4155: Quitting grunt shell using CTRL-D character throws exception (abhishek.agarwal via daijy)
PIG-4157: Pig compilation failure due to HIVE-7208 (daijy)
PIG-4158: TestAssert is broken in trunk (cheolsoo)
PIG-4143: Port more mini cluster tests to Tez - part 7 (daijy)
PIG-4149: Rounding issue in FindQuantiles (daijy)
PIG-4145: Port local mode tests to Tez - part1 (daijy)
PIG-4076: Fix pom file (daijy)
PIG-4140: VertexManagerEvent.getUserPayload returns ReadOnlyBuffer after TEZ-1449 (daijy)
PIG-4136: No special handling jythonjar/jrubyjar in e2e tests after PIG-4047 (daijy)
PIG-4137: Fix hadoopversion 23 compilation due to TEZ-1469 (daijy)
PIG-4135: Fetch optimization should be disabled if plan contains no limit (cheolsoo)
PIG-4061: Make Streaming UDF work in Tez (hotfix PIG-4061-3.patch)
PIG-4134: TEZ-1449 broke the build (knoguchi)
PIG-4132: TEZ-1246 and TEZ-1390 broke a build (knoguchi)
PIG-4129: Pig -Dhadoopversion=23 compile fail after TEZ-1426 (daijy)
PIG-4127: Build failure due to TEZ-1132 and TEZ-1416 (lbendig)
PIG-4125: TEZ-1347 broke the build
PIG-4123: Increase memory for TezMiniCluster (daijy)
PIG-4122: Fix hadoopversion 23 compilation due to TEZ-1194 (daijy)
PIG-4061: Make Streaming UDF work in Tez (daijy)
PIG-4118: Fix hadoopversion 23 compilation due to TEZ-1237/TEZ-1407 (daijy)
PIG-4109: register local jar fail on Windows when Pig script is remote (daijy)
PIG-4116: Update Pig doc about Hadoop 2 Streaming Python UDF support (cheolsoo)
PIG-4112: NPE in packager when union + group-by followed by replicated join in Tez (rohini via cheolsoo)
PIG-4113: TEZ-1386 breaks hadoop 2 compilation in trunk (cheolsoo)
PIG-4110: TEZ-1382 breaks Hadoop 2 compilation (cheolsoo)
PIG-4105: Fix TestAvroStorage with ibm jdk (fang fang chen via daijy)
PIG-4108: Pig -Dhadoopversion=23 compile fail after TEZ-1317 (daijy)
PIG-4086: Fix Orc e2e tests for tez (daijy)
PIG-4101: Lower tez.am.task.max.failed.attempts to 2 from 4 in Tez mini cluster (cheolsoo)
PIG-4099: "ant copypom" failed with "could not find file $PIG_HOME/ivy/pig.pom to copy" (fang fang chen via cheolsoo)
PIG-4098: Vertex Location Hint api update after TEZ-1041 (jeagles via cheolsoo)
PIG-4088: TEZ-1346 breaks hadoop 2 compilation in trunk (cheolsoo)
PIG-4089: TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1 (cheolsoo)
PIG-4085: TEZ-1303 broke hadoop 2 compilation in trunk (cheolsoo)
PIG-4082: TEZ-1278 broke hadoop 2 compilation in trunk (cheolsoo)
PIG-4079: Parallel clause is not honored in local mode (cheolsoo)
PIG-4078: Port more mini cluster tests to Tez - part 6 (rohini)
PIG-4071: Fix TestStore.testSetStoreSchema, TestParamSubPreproc.testGruntWithParamSub, TestJobSubmission.testReducerNumEstimation (daijy)
PIG-4074: mapreduce.client.submit.file.replication is not honored in cached files (cheolsoo)
PIG-4052: TestJobControlSleep, TestInvokerSpeed are unreliable (daijy)
PIG-4053: TestMRCompiler succeeded with sun jdk 1.6 while failed with sun jdk 1.7 (daijy)
PIG-3982: ant target test-tez should depend on jackson-pig-3039-test-download (daijy)
PIG-4064: Fix tez auto parallelism test failures (daijy)
PIG-4075: TEZ-1311 broke Hadoop2 compilation (cheolsoo)
PIG-4070: Change from TezJobConfig to TezRuntimeConfiguration (rohini)
PIG-4068: ObjectCache causes ClassCastException (cheolsoo)
PIG-4067: TestAllLoader in piggybank fails with new hive version (rohini)
PIG-4065: Fix failing unit tests in Tez (rohini)
PIG-4060: Refactor TezJob and TezLauncher (cheolsoo)
PIG-2689: JsonStorage fails to find schema when LimitAdjuster runs (rohini)
PIG-4056: Remove PhysicalOperator.setAlias (rohini)
PIG-4058: Use single config in Tez for input and output (rohini)
PIG-3886: UdfDistributedCache_1 fails in tez branch (cheolsoo)
PIG-4055: Build broke after TEZ-1130 API rename (knoguchi)