Unsupported Nested Structs #100

Closed
sequencerr opened this issue Feb 5, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@sequencerr

[image]
Is there a problem on my side? #3

@sequencerr added the bug label on Feb 5, 2024
@sequencerr
Author

Also, it's showing only the first element of lists.

@mukunku
Owner

mukunku commented Feb 5, 2024

Unfortunately, nested complex types are not supported. If you could share a sample file, it would help get this implemented.

@sequencerr
Author

Hello, @mukunku
Sample files:
sample-parquet-files.zip

You might also be interested in https://github.com/LibertyDSNP/parquetjs/blob/c07e7e81847523f4d74edd0adf9b2f9b6bbd1d90/lib/reader.ts#L104

Generated using:

import { ParquetSchema, ParquetWriter } from '@dsnp/parquetjs';

// Schema with a single repeated (list) column of strings.
const SchemaList = new ParquetSchema({
	groceries: { type: 'UTF8', repeated: true }
});
// Schema with a struct nested two levels deep: user -> rating -> { value, count }.
const SchemaUser = new ParquetSchema({
	user: {
		fields: {
			rating: {
				fields: {
					value: { type: 'FLOAT' },
					count: { type: 'INT64' }
				}
			}
		}
	}
});

// One output file per schema.
const writer2 = await ParquetWriter.openFile(SchemaList, 'list.parquet');
const writer1 = await ParquetWriter.openFile(SchemaUser, 'user.parquet');

// A single row in each file is enough to reproduce the display problems.
await writer2.appendRow({ groceries: ['foo', 'bar', 'baz', 'no', 'naming', 'imagination'] });
await writer1.appendRow({
	user: {
		rating: {
			value: 4.3,
			count: 34
		}
	}
});

// Closing flushes the buffered rows and writes the file footer.
await writer1.close();
await writer2.close();

Some closed-source web readers that might help:
https://parquetreader.com/home - works well, but the data display is poor
https://www.parquet-viewer.com - top search engine result, not quite accurate (same as https://apps.microsoft.com/detail/9N33Z6DPLR49)
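
For readers following along, the nested schema in the generator script can be pictured as a tree whose leaves are the columns actually stored in the file. The sketch below is only an illustration of that idea: `FieldDef` and `leafPaths` are hypothetical names that mirror the shape of the schema definitions in this comment, not types or functions from parquetjs or ParquetViewer.

```typescript
// Hypothetical, simplified schema shape modeled on the definitions in this
// thread; it is not the parquetjs library's own type.
type FieldDef = {
  type?: string;                      // primitive type for leaf fields, e.g. 'FLOAT'
  repeated?: boolean;                 // true for list-like fields
  fields?: Record<string, FieldDef>;  // present only for nested structs
};

// Walk the definition depth-first and collect the dotted path of every leaf.
function leafPaths(fields: Record<string, FieldDef>, prefix: string[] = []): string[] {
  const paths: string[] = [];
  for (const [name, def] of Object.entries(fields)) {
    const path = [...prefix, name];
    if (def.fields) {
      // A struct contributes no column itself, only its descendants do.
      paths.push(...leafPaths(def.fields, path));
    } else {
      paths.push(path.join('.'));
    }
  }
  return paths;
}

const userSchema: Record<string, FieldDef> = {
  user: {
    fields: {
      rating: {
        fields: { value: { type: 'FLOAT' }, count: { type: 'INT64' } }
      }
    }
  }
};
const listSchema: Record<string, FieldDef> = {
  groceries: { type: 'UTF8', repeated: true }
};

console.log(leafPaths(userSchema)); // [ 'user.rating.value', 'user.rating.count' ]
console.log(leafPaths(listSchema)); // [ 'groceries' ]
```

A viewer that only handles top-level fields would see a single `user` entry here, whereas the data actually lives in the two leaf columns under it.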

@sequencerr
Author

sequencerr commented Feb 6, 2024

Well, there is https://github.com/aloneguid/parquet-dotnet/tree/master/src/Parquet.Floor, which works as intended for nested data (although non-Latin UTF-8 text displays poorly).

@mukunku
Owner

mukunku commented Feb 6, 2024

Thanks, this is all helpful. I'll take a look when I get the chance. I'll leave this issue open in case anyone else wants to give implementing this a shot as well.

@dbraaten42

dbraaten42 commented Feb 27, 2024

I also can't view the file that is created when running the Parquet.Net example for dictionaries. This is likely related.
From https://aloneguid.github.io/parquet-dotnet/serialisation.html#nested-types

class IdWithTags {
    public int Id { get; set; }

    public Dictionary<string, string>? Tags { get; set; }
}

var data = Enumerable.Range(0, 10).Select(i => new IdWithTags {
    Id = i,
    Tags = new Dictionary<string, string> {
        ["id"] = i.ToString(),
        ["gen"] = DateTime.UtcNow.ToString()
    }}).ToList();

await ParquetSerializer.SerializeAsync(data, "c:\\tmp\\map.parquet");

The exception thrown is Field schema path not found: key_value/key
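
The thread does not say what layout the failing file used or how the viewer looked up the map columns, so the following is only a sketch of one general approach. The Parquet format specification describes a map as a group containing a single repeated group (conventionally named `key_value`) with `key` and `value` children, and notes that readers should not rely on those names in existing data. Under that assumption, a reader can resolve the fields by position rather than by name. `SchemaNode` and `resolveMapFields` are hypothetical names, not part of ParquetViewer or Parquet.Net.

```typescript
// Hypothetical, minimal schema node; not ParquetViewer's or Parquet.Net's type.
interface SchemaNode {
  name: string;
  children?: SchemaNode[];
}

// Resolve the key and value leaves of a map group by position instead of by
// name, so that non-standard group names (for example "map") are tolerated.
function resolveMapFields(mapGroup: SchemaNode): { keyPath: string; valuePath?: string } {
  const entries = mapGroup.children?.[0]; // the single repeated group
  const key = entries?.children?.[0];     // first child holds the keys
  if (!entries || !key) {
    throw new Error(`Field "${mapGroup.name}" is not a well-formed map`);
  }
  const value = entries.children?.[1];    // the value field is optional
  const base = `${mapGroup.name}/${entries.name}`;
  return {
    keyPath: `${base}/${key.name}`,
    valuePath: value ? `${base}/${value.name}` : undefined,
  };
}

// Standard naming, as described in the format specification.
const standard: SchemaNode = {
  name: 'Tags',
  children: [{ name: 'key_value', children: [{ name: 'key' }, { name: 'value' }] }],
};
// Assumed legacy-style naming, used here only to exercise the fallback.
const legacy: SchemaNode = {
  name: 'Tags',
  children: [{ name: 'map', children: [{ name: 'key' }, { name: 'value' }] }],
};

console.log(resolveMapFields(standard).keyPath); // Tags/key_value/key
console.log(resolveMapFields(legacy).keyPath);   // Tags/map/key
```

Looking fields up by position trades strictness for tolerance: a file with an unexpected group name still opens, at the cost of not catching a genuinely malformed schema by name alone.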

@mukunku mentioned this issue on Mar 5, 2024
@mukunku changed the title from "Unsupported Nested" to "Unsupported Nested Structs" on Mar 5, 2024
@mukunku
Owner

mukunku commented Mar 6, 2024

Thanks again for the sample files and code, folks. I went ahead and created a pre-release of v2.10.1 with fixes for your issues.

@sequencerr I added nested struct support so this new version can open your test user.parquet file that you shared. The utility will still have issues opening nested lists or maps but at least nested struct support is there now.

@dbraaten42 I broadened the Map type support, so ParquetViewer now supports Maps created with Parquet.Net 😁 Thanks a lot for reporting the issue.

Please give this new version a try, folks. I'm going to close this ticket out but feel free to open a new one if you have more parquet files you can't view.

@mukunku mukunku closed this as completed Mar 6, 2024

3 participants