`infer`: new command to infer additional dataset metadata based on summary stats/frequency table #2184
Labels
- datapusher+: for Datapusher+
- enhancement: New feature or request. Once marked with this label, it's in the backlog.
- qsv pro: requires backend/cloud services

Date/Datetime formats
- If `--infer-date` is enabled, `format` should be set to the date format used.
- Check `min`, `max`, `median` and `modes` to see if they match one of the 19 date formats recognized by qsv-dateparser.

Location
- `--infer-location` flag
- Check `min`, `max`, `median` and `modes` to see if they match common location formats - https://www.maptools.com/tutorials/lat_lon/formats
- `latitude` format
- `longitude` format

Email
- `--infer-email` flag
- Check `min`, `max`, `median` and `modes` to see if they match common email formats using the email_address crate.

Using the same approach above (looking at summary stats `min`, `max`, `median`, `modes`), also infer:
- `--infer-hostnames` option
- `--infer-ipaddress` option, for both `ipv4` and `ipv6` formats
- `--infer-phoneno` option
- `--infer-currency` option, adding currency symbol metadata to the format entry - e.g. "currency - USD ($)", "currency - JPY (¥)", "currency - PHP (₱)", "currency - ? ($)", etc. As some currency symbols like $ are used in several countries, it will use "?" instead of the three-letter ISO 4217 code if it cannot infer the code.
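
The shared approach in the list above (run a format detector over a column's `min`, `max`, `median` and `modes`) could be sketched roughly as follows. This is only an illustration under stated assumptions: `SummaryStats`, `infer_format`, `ip_format` and `currency_format` are hypothetical names, not part of qsv, and the rule that every non-empty value must agree on one format is a guess at the intended behavior. IP detection uses the standard library's `IpAddr` parser, and the currency table is a tiny subset taken from the examples in this issue.

```rust
use std::net::IpAddr;

/// Hypothetical holder for one column's summary stats (not a qsv type).
struct SummaryStats<'a> {
    min: &'a str,
    max: &'a str,
    median: &'a str,
    modes: Vec<&'a str>,
}

impl<'a> SummaryStats<'a> {
    /// The representative values to inspect, skipping empty ones.
    fn candidates(&self) -> Vec<&'a str> {
        let mut values = vec![self.min, self.max, self.median];
        values.extend(self.modes.iter().copied());
        values.retain(|v| !v.is_empty());
        values
    }
}

/// Returns "ipv4" or "ipv6" if the value parses as an IP address.
fn ip_format(value: &str) -> Option<String> {
    match value.trim().parse::<IpAddr>() {
        Ok(IpAddr::V4(_)) => Some("ipv4".to_string()),
        Ok(IpAddr::V6(_)) => Some("ipv6".to_string()),
        Err(_) => None,
    }
}

/// Returns a format entry such as "currency - JPY (¥)" for a leading
/// currency symbol followed by a plain amount. Assumption: only three
/// symbols are mapped, and "$" is treated as ambiguous ("?").
fn currency_format(value: &str) -> Option<String> {
    let value = value.trim();
    let symbol = value.chars().next()?;
    let code = match symbol {
        '$' => "?",
        '¥' => "JPY",
        '₱' => "PHP",
        _ => return None,
    };
    // Strip the symbol and thousands separators, then require a number.
    let amount = value[symbol.len_utf8()..].trim().replace(',', "");
    if amount.is_empty() || !amount.chars().all(|c| c.is_ascii_digit() || c == '.') {
        return None;
    }
    amount.parse::<f64>().ok()?;
    Some(format!("currency - {code} ({symbol})"))
}

/// Infers a format only if every candidate value yields the same result.
fn infer_format<F>(stats: &SummaryStats, detect: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut values = stats.candidates().into_iter();
    let first = detect(values.next()?)?;
    for value in values {
        if detect(value)? != first {
            return None;
        }
    }
    Some(first)
}
```

Calling `infer_format(&stats, ip_format)` or `infer_format(&stats, currency_format)` returns `None` as soon as one summary value fails the detector or disagrees with the others, so a column mixing `ipv4` and `ipv6` values would not be labeled in this sketch.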

Also add a `-F, --infer-all-formats` convenience option.

If a CSV is indexed and the `--format-sample <sample_size>` option is used, randomly sample the CSV to further verify that the format inferred from the summary stats is correct.
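
The sampling check could look something like the sketch below. It is not qsv code: `sample_match_rate` is a hypothetical helper, a slice stands in for random access into an indexed CSV, and a small xorshift generator stands in for a real RNG crate so the example needs only the standard library. It samples with replacement and reports the fraction of sampled values that match the inferred format.

```rust
/// Tiny xorshift64 generator; a stand-in for a real RNG crate.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Returns the fraction of randomly sampled values accepted by `is_match`.
/// `column` is a hypothetical stand-in for seeking into an indexed CSV.
fn sample_match_rate<F>(column: &[&str], sample_size: usize, seed: u64, is_match: F) -> f64
where
    F: Fn(&str) -> bool,
{
    if column.is_empty() || sample_size == 0 {
        return 0.0;
    }
    // xorshift state must be nonzero, so clamp the seed to at least 1.
    let mut rng = XorShift64(seed.max(1));
    let mut hits = 0usize;
    for _ in 0..sample_size {
        // Pick a random record index (sampling with replacement).
        let idx = (rng.next() % column.len() as u64) as usize;
        if is_match(column[idx]) {
            hits += 1;
        }
    }
    hits as f64 / sample_size as f64
}
```

A caller might keep the inferred format only when the match rate is at or near 1.0; the threshold is a design choice this issue leaves open.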