-
-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG]: Type mismatch silently leads to incorrect de-serialization result #573
Comments
Not reproducible, here is the test proving it: class EdgeCaseInt32 {
public int Id { get; set; }
}
[Fact]
public async Task EdgeCase_rawint64_to_classInt32() {
var schema = new ParquetSchema(new DataField<long>("Id"));
using var ms = new MemoryStream();
using(ParquetWriter writer = await ParquetWriter.CreateAsync(schema, ms)) {
using(ParquetRowGroupWriter rg = writer.CreateRowGroup()) {
await rg.WriteColumnAsync(new DataColumn(schema.DataFields[0], new long[] { 1, 2, 3 }));
}
}
ms.Position = 0;
IList<EdgeCaseInt32> data = await ParquetSerializer.DeserializeAsync<EdgeCaseInt32>(ms);
Assert.Equal(1, data[0].Id);
Assert.Equal(2, data[1].Id);
Assert.Equal(3, data[2].Id);
} Feel free to reopen with reproducible test if I didn't understand you correctly. |
You are right, I did miss one detail: to reproduce this issue one needs a file with the following schema (without logical type/hint):
If I get it correctly, the provided test actually replicates the following schema (with logical type/hint):
That's why it doesn't reproduce the issue. But I attached a sample file with the following content: parquet-tools cat ./no-logical-type.parquet
# output
[{"Id":1},{"Id":2},{"Id":3}] And here is an integration test for it: class EdgeCaseInt32 {
public int? Id { get; set; }
}
[Fact]
public async Task EdgeCase_fileInt64_to_classInt32() {
IList<EdgeCaseInt32> data = await ParquetSerializer.DeserializeAsync<EdgeCaseInt32>("no-logical-type.parquet");
Assert.Equal(1, data[0].Id); // Actual: 1 - SUCCESS
Assert.Equal(2, data[1].Id); // Actual: 0 - FAILURE
Assert.Equal(3, data[2].Id); // Actual: 2 - FAILURE
} |
@aloneguid please reopen the issue on my behalf as I don't have required permissions to do it. thanks! |
@aloneguid please reopen this issue or let me know if provided details are not enough |
Got it, the issue is that your class should be defined with optional Int64 rather than Int32, then deserialization will work: class EdgeCase {
public long? Id { get; set; }
} I have added stricter validation for this use case to v5.0.3 as well. |
Great! |
Library Version
5.0.2
OS
Ubuntu Linux 22.04
OS Architecture
64 bit
How to reproduce?
Result: de-serialization completes without errors/warnings but values in objects don't match values in the file, e.g.
3
instead of1
.Failing test
No response
The text was updated successfully, but these errors were encountered: