This connector allows Presto to query data stored in CSV files from local or SFTP storage.
Download the source code of Presto from GitHub, copy presto-csv
into it, and add the following line to pom.xml
under the root module:
<module>presto-csv</module>
Create a catalog properties file etc/catalog/csv.properties
. If you want to read files from local drives, use the following configuration:
connector.name=csv
csv.protocol=file
csv.base=/path/to/dir
csv.splitter=,
csv.suffix=csv
-
csv.base
sets the base directory. The schema name is the second level directory, and table name is the file name. -
csv.splitter
sets the delimiter of the CSV file. The default value is,
. -
csv.suffix
sets the suffix of the CSV file. The default value iscsv
. -
The first line of the CSV file must be the header.
The connector also supports reading files from a SFTP server.
connector.name=csv
csv.protocol=sftp
csv.base=/path/to/dir
csv.splitter=,
csv.suffix=csv
csv.host=xxx.xxx.xxx.xxx
csv.port=xxx
csv.username=xxx
csv.password=xxx
Presto does not support case-sensitive identifiers (see this issue). The connector cannot recognize the file that has uppercase letters in its path either.
- Support HDFS, HTTP Server, etc.
Presto CSV connector,支持查询本地或 SFTP 服务器上的 CSV 文件。
从 GitHub 获取 Presto 源码,将本项目复制进去,在根目录的 pom.xml
中添加模块:
<module>presto-csv</module>
创建配置文件 etc/catalog/csv.properties
。读取本地文件的配置如下:
connector.name=csv
csv.protocol=sftp
csv.base=/path/to/dir
csv.splitter=,
csv.suffix=csv
csv.host=xxx.xxx.xxx.xxx
csv.port=xxx
csv.username=xxx
csv.password=xxx
-
csv.base
为根目录,schema 对应二级目录,table 对应二级目录下的文件名。 -
csv.splitter
可以指定分隔符,默认为,
。 -
csv.suffix
可以指定文件后缀,默认为csv
。 -
文件第一行必须是字段名。
也可以从 SFTP 服务器读取文件:
connector.name=csv
csv.protocol=sftp
csv.base=/path/to/dir
csv.splitter=,
csv.suffix=csv
csv.host=xxx.xxx.xxx.xxx
csv.port=xxx
csv.username=xxx
csv.password=xxx
Presto 不支持大写表名(见 这个 issue),该连接器也不支持读取路径中包含大写字母的文件。
- 支持 HDFS、HTTP 等协议。