Similar to the `stats --stats-jsonl` option, create a `frequency --frequency-jsonl` option.
The frequency cache will contain the COMPLETE frequency table for each column, not just the top N values (default: 10) that `frequency` returns by default.
This frequency metadata will let us support "smarter" use cases that require more information than the stats cache provides.
IMPLEMENTATION NOTE:
The frequency cache also depends on the stats cache existing before it is computed, so for ID columns it will only have an ALL_UNIQUE entry.
Also, for columns whose cardinality exceeds 1000 or 90% of the row count (both thresholds configurable), whichever is smaller, it will have a HIGH_CARDINALITY entry instead.
In this way, the frequency cache should still be relatively "tiny" metadata, even for very large CSV files.
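The rule in this note can be sketched in a few lines of Python. This is only an illustration of the decision logic as described above, not the actual implementation: the function name `frequency_cache_entry`, its parameter names, and the return shape are hypothetical, and the cardinality is computed directly here rather than read from the stats cache.

```python
from collections import Counter

# Default thresholds from the note above; both are described as configurable.
MAX_CARDINALITY = 1000
MAX_CARDINALITY_PCT = 0.90


def frequency_cache_entry(values, max_cardinality=MAX_CARDINALITY,
                          max_pct=MAX_CARDINALITY_PCT):
    """Return the (hypothetical) cache entry for one column's values."""
    rowcount = len(values)
    counts = Counter(values)
    cardinality = len(counts)

    # Assumption: in the proposal this information comes from the stats cache.
    if rowcount > 0 and cardinality == rowcount:
        return "ALL_UNIQUE"

    # Use whichever limit is smaller: the absolute cap or the share of rows.
    threshold = min(max_cardinality, max_pct * rowcount)
    if cardinality > threshold:
        return "HIGH_CARDINALITY"

    # Otherwise keep the complete frequency table, most frequent first.
    return dict(counts.most_common())
```

For example, a column of 5000 rows with 1500 distinct values exceeds the smaller limit (1000), so it would get a HIGH_CARDINALITY entry, while a three-row column `["a", "b", "a"]` would keep its full table.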