[HUDI-3760] Adding capability to fetch Metadata Records by prefix #5208

Merged
merged 73 commits into from
Apr 6, 2022
7e33ba2
Added `getRecordsByKeyPrefix` to `HoodieHFileReader`
Apr 1, 2022
cdf5975
Tidying up
Apr 1, 2022
67bec3c
Added `getRecordIteratorByKeyPrefix` to `HoodieHFileReader`;
Apr 1, 2022
57b2c83
Made `HoodieDataBlock` accept keys specified as prefixes;
Apr 1, 2022
0000180
Tidying up
Apr 1, 2022
5d650dd
Modified `AbstractHoodieLogRecordReader` to be able to scan records b…
Apr 1, 2022
87bd0df
Adjusted `HoodieFileReader` interface
Apr 1, 2022
aebceba
Add `HoodieBackedTableMetadata::getRecordsByKeyPrefixes`
Apr 1, 2022
4b07e6c
Added test for `HFileReader` seq fetching records by key-prefixes
Apr 1, 2022
f048d24
Fixing compilation
Apr 2, 2022
c8a3566
Fixing read from Metadata table for col stats partition
nsivabalan Apr 2, 2022
52d2ad0
Fixing prefix lookup in HFile reader and enhancing tests for the same
nsivabalan Apr 2, 2022
184a292
Adding test to TestHoodieBackedMetadata to test prefix look up in col…
nsivabalan Apr 2, 2022
fa2b889
Fixing determining col stats partition availability in dataskipping c…
nsivabalan Apr 3, 2022
d32ff02
Fixing compilation error in test with rebase
nsivabalan Apr 3, 2022
e894a83
Fixing HFileScanner usage and HoodieData for prefix based look up wit…
nsivabalan Apr 4, 2022
2b57016
fixing arg name for useCachedReaders
nsivabalan Apr 4, 2022
260d2b3
Fixing test failures related to write schema not added to Hfile on th…
nsivabalan Apr 4, 2022
166791d
Extracted common utils
Apr 3, 2022
032f0f6
Rebased `ColumnStatsIndexSupport` to fetch CSI records by key-prefix …
Apr 3, 2022
27a900b
Tidying up
Apr 3, 2022
578c1af
Rebased `HoodieFileIndex` onto updated API
Apr 3, 2022
bb15c1f
Fixed `HoodieDataBlock` to properly handle when records are fetched b…
Apr 3, 2022
fc6788c
Fixed `HoodieBackedTableMetadata` to avoid doing full-scans
Apr 3, 2022
039d597
Fixed tests
Apr 3, 2022
8f3a919
Fixing compilation
Apr 4, 2022
558892d
Fixed iterator to properly loop
Apr 4, 2022
3676540
Fixed incorrect key-prefix access gen
Apr 4, 2022
672a23a
Tidying up
Apr 4, 2022
341e338
Removed references to HFile from `HoodieFileReader`;
Apr 4, 2022
edc99f9
Cleaned up `HoodieFileReader` APIs;
Apr 4, 2022
3884f07
Cleaned up `HoodieHFileReader` APIs;
Apr 4, 2022
b55a0d3
Fixing compilation
Apr 4, 2022
4d0a0ba
`lint`
Apr 4, 2022
d24ccf7
Removed APIs which dereference iterators from `HoodieFileReader`, ins…
Apr 4, 2022
e580150
Fixing compilation
Apr 4, 2022
63c7160
Fixed records iterators
Apr 4, 2022
b4c9ef6
Fixed tests
Apr 4, 2022
974d903
Fixing some more
Apr 4, 2022
fe1f54f
Fixing a little more
Apr 4, 2022
53a0369
Fixing task being non-serializable
Apr 4, 2022
eb6773a
Tidying up
Apr 4, 2022
51d708c
Consolidated configuration around `forceFullScan` property for `LogRe…
Apr 4, 2022
54c5ebf
Adding assertions
Apr 4, 2022
8a43c10
Test with all log-files scan modes
Apr 4, 2022
44ccc1b
Killed dead-code
Apr 4, 2022
9189246
Fixing incorrect sorting of the keys
Apr 5, 2022
917a6d4
Fixing tests
Apr 5, 2022
d3d03e3
Tidying up
Apr 5, 2022
4aedf0d
Cleaned up `ColumnStatsIndexSupport` removing duplication
Apr 5, 2022
4fb6dcc
Fixed schemas used to read from MT, to be fetched from HFile
Apr 5, 2022
c84616b
Fixed `TestHoodieFileIndex`
Apr 5, 2022
42144d0
Tidying up
Apr 5, 2022
6abff9e
Disallow full-scans for "column_stats", "bloom_filters" partitions
Apr 5, 2022
bc2de38
Killed `fsDataInputStream`
Apr 5, 2022
a10a6cc
Rebased `schema` field to become final, thread-safe
Apr 5, 2022
8f99436
Tidying up
Apr 5, 2022
7f7a75d
Tidying up
Apr 5, 2022
b035ff3
Added `sharedScanner` instance to be re-used by point-lookup queries
Apr 5, 2022
b911163
Fixed `RecordIterator`, `RecordByKeysIterator`
Apr 5, 2022
87ff41f
Fixed `RecordByKeyPrefixIterator`
Apr 5, 2022
ade64a5
Tidying up
Apr 5, 2022
a640f38
Missing license
Apr 5, 2022
885b577
Fixing tests
Apr 5, 2022
10551b6
Fixed `RecordByKeyPrefixIterator` to properly handle EOF
Apr 5, 2022
4980837
Rebased `HoodieMergeOnReadRDD` to use Metadata Config from the File I…
Apr 5, 2022
1ce0aa9
Fixed opts for MT reads
Apr 5, 2022
0370146
Force MT full-scan when reading it from Spark DS;
Apr 5, 2022
92aab50
Fixed query-type missing from config props in Spark Streaming
Apr 5, 2022
47c2631
Added fallback to default val for `QUERY_TYPE` config in Spark File I…
Apr 5, 2022
40613eb
Fixed `TestLayoutOptimization` missing required MT configs
Apr 6, 2022
f110928
Cleaned up duplicated methods
Apr 6, 2022
0dcfe21
Close reader and remove public API from metadata payload
codope Apr 6, 2022
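
For readers unfamiliar with prefix lookups over sorted keys, which several commits above touch (`getRecordsByKeyPrefix`, `RecordByKeyPrefixIterator`), here is a minimal sketch of the general technique. It is not the code from this PR: `PrefixScanSketch` and its `TreeMap`-backed store are hypothetical stand-ins for a sorted key/value file, and the `colA|file1` key layout is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a sorted key/value file; this is NOT Hudi's HoodieHFileReader.
class PrefixScanSketch {

    // Returns the values whose keys start with the given prefix, in key order.
    static List<String> getRecordsByKeyPrefix(NavigableMap<String, String> sorted, String prefix) {
        List<String> out = new ArrayList<>();
        // Seek to the first key >= prefix, then read forward while the prefix still matches.
        for (Map.Entry<String, String> e : sorted.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so no later key can match either
            }
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        // Made-up keys: column name first, so one prefix selects all entries for a column.
        store.put("colA|file1", "stats-A1");
        store.put("colA|file2", "stats-A2");
        store.put("colB|file1", "stats-B1");
        System.out.println(getRecordsByKeyPrefix(store, "colA"));
    }
}
```

The sketch relies on keys being sorted: one seek plus a short forward scan replaces a full scan, and the loop also ends cleanly when the matching keys run to the end of the store.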
Fixing test failures related to write schema not added to Hfile on the write path
nsivabalan authored and Alexey Kudinkin committed Apr 6, 2022
commit 260d2b341cc55d9937bfbef355ffb339cec0c82b
@@ -149,6 +149,8 @@ protected byte[] serializeRecords(List<IndexedRecord> records) throws IOExceptio
}
});

writer.appendFileInfo(HoodieHFileReader.KEY_SCHEMA.getBytes(), getSchema().toString().getBytes());

writer.close();
ostream.flush();
ostream.close();
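
The hunk above stores the writer schema in the file's info section under `HoodieHFileReader.KEY_SCHEMA`, so that a reader can later recover it from the file itself. A minimal sketch of that round trip follows, under stated assumptions: a plain `Map<String, byte[]>` stands in for the HFile file-info section, and `SchemaFileInfoSketch`, its methods, and the `"schema"` key value are hypothetical names, not Hudi or HBase APIs. The sketch also pins UTF-8 explicitly, whereas the diff calls `getBytes()` with no charset argument, which uses the platform default.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an HFile's file-info section: a plain key -> bytes map.
class SchemaFileInfoSketch {

    // Assumed key value for illustration; the diff refers to HoodieHFileReader.KEY_SCHEMA.
    static final String KEY_SCHEMA = "schema";

    // Write path: store the schema text alongside the data before closing the writer.
    static void appendSchema(Map<String, byte[]> fileInfo, String schemaJson) {
        fileInfo.put(KEY_SCHEMA, schemaJson.getBytes(StandardCharsets.UTF_8));
    }

    // Read path: recover the schema text, failing loudly if the writer never stored it.
    static String readSchema(Map<String, byte[]> fileInfo) {
        byte[] raw = fileInfo.get(KEY_SCHEMA);
        if (raw == null) {
            throw new IllegalStateException("schema missing from file info");
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, byte[]> fileInfo = new HashMap<>();
        String schema = "{\"type\":\"record\",\"name\":\"Example\",\"fields\":[]}";
        appendSchema(fileInfo, schema);
        System.out.println(readSchema(fileInfo).equals(schema));
    }
}
```

A reader that expects the schema in the file fails on the missing-key branch when the write path skips this step, which matches the kind of test failure the commit title describes.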