From 608fd1d38c55bea13f4e34b8e27639831e38e7d6 Mon Sep 17 00:00:00 2001
From: slothever <18522955+wsjz@users.noreply.github.com>
Date: Sat, 4 Nov 2023 15:42:45 +0800
Subject: [PATCH] [fix](multi-catalog)add the FAQ for Aliyun DLF and add the
 fs.xx.impl check #25594 (#26422)

---
 docs/en/docs/lakehouse/faq.md                 | 278 ++++++++++++++++++
 docs/en/docs/lakehouse/multi-catalog/dlf.md   |  32 +-
 docs/zh-CN/docs/lakehouse/faq.md              | 273 +++++++++++++++++
 .../zh-CN/docs/lakehouse/multi-catalog/dlf.md |  34 ++-
 .../datasource/hive/HiveMetaStoreCache.java   |   6 +-
 .../property/PropertyConverter.java           |   7 +-
 6 files changed, 594 insertions(+), 36 deletions(-)
 create mode 100644 docs/en/docs/lakehouse/faq.md
 create mode 100644 docs/zh-CN/docs/lakehouse/faq.md

diff --git a/docs/en/docs/lakehouse/faq.md b/docs/en/docs/lakehouse/faq.md
new file mode 100644
index 00000000000000..a6c97cfbb6d9e7
--- /dev/null
+++ b/docs/en/docs/lakehouse/faq.md
@@ -0,0 +1,278 @@
---
{
    "title": "FAQ",
    "language": "en"
}
---

# FAQ

## Kerberos

1. What to do with the `GSS initiate failed` error when connecting to Hive Metastore with Kerberos authentication?

    This is usually caused by incorrect Kerberos authentication information. You can troubleshoot it with the following steps:

    1. In versions before 1.2.1, the libhdfs3 library that Doris depends on does not enable gsasl. Please update to a version later than 1.2.2.
    2. Confirm that the correct keytab and principal are set for each component, and confirm that the keytab file exists on all FE and BE nodes.

        1. `hadoop.kerberos.keytab`/`hadoop.kerberos.principal`: used for Hadoop HDFS access.
        2. `hive.metastore.kerberos.principal`: used for Hive Metastore.

    3. Try replacing the IP in the principal with a domain name (do not use the default `_HOST` placeholder).
    4. Confirm that the `/etc/krb5.conf` file exists on all FE and BE nodes.

2. An error is reported when connecting to the Hive database through the Hive Catalog: `RemoteException: SIMPLE authentication is not enabled. Available: [TOKEN, KERBEROS]`

    If both `show databases` and `show tables` work and the above error occurs only when querying, perform the following two operations:

    - Place core-site.xml and hdfs-site.xml in the fe/conf and be/conf directories.
    - Run Kerberos `kinit` on the BE nodes, restart the BEs, and then run the query again.

3. If an error is reported while querying a catalog with Kerberos: `GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos Ticket)`.

    - Restarting FE and BE solves the problem in most cases.
    - Before restarting all nodes, you can add `-Djavax.security.auth.useSubjectCredsOnly=false` to `JAVA_OPTS` in `"${DORIS_HOME}/be/conf/be.conf"`, so that credentials are obtained through the underlying mechanism rather than through the application.
    - More solutions to common JAAS errors can be found in the [JAAS Troubleshooting](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/Troubleshooting.html) guide.

4. Solutions for the error `Unable to obtain password from user` when configuring Kerberos in the catalog:

    - The principal used must exist in the keytab; check with `klist -kt your.keytab`.
    - Ensure the catalog configuration is correct, e.g. that `yarn.resourcemanager.principal` is not missing.
    - If the preceding checks are correct, the JDK installed by yum or another package-management utility on the current system may lack support for the required encryption algorithm. It is recommended to install the JDK yourself and set the `JAVA_HOME` environment variable.

5. An error is reported when using KMS to access HDFS: `java.security.InvalidKeyException: Illegal key size`

    Upgrade the JDK to Java 8 u162 or later, or download and install the JCE Unlimited Strength Jurisdiction Policy Files corresponding to the JDK.

6. If an error is reported while configuring Kerberos in the catalog: `SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]`.

    Put `core-site.xml` into the `"${DORIS_HOME}/be/conf"` directory.

    If an error is reported while accessing HDFS: `No common protection layer between client and server`, check the `hadoop.rpc.protection` on the client and server to make them consistent.

    ```
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
        <property>
            <name>hadoop.security.authentication</name>
            <value>kerberos</value>
        </property>
    </configuration>
    ```

7. If an error is reported while configuring Kerberos for Broker Load: `Cannot locate default realm.`.

    Add `-Djava.security.krb5.conf=/your-path` to `JAVA_OPTS` in the broker startup script `start_broker.sh`.

8. When Kerberos is configured in the Catalog, the `hadoop.username` property must not appear in the catalog properties.
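For reference, the Kerberos-related properties from the checklist above all go into the catalog definition. Below is a minimal sketch of such a catalog, assuming placeholder values for the Metastore address, principals, and keytab path; the keytab file must exist on every FE and BE node.

```sql
CREATE CATALOG kerberos_hive PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://your-metastore-host:9083',
    'hadoop.security.authentication' = 'kerberos',
    -- principal/keytab used for Hadoop HDFS access (item 2.1 above)
    'hadoop.kerberos.principal' = 'doris/your-host@YOUR.REALM.COM',
    'hadoop.kerberos.keytab' = '/path/to/your.keytab',
    -- principal of the Hive Metastore service (item 2.2 above)
    'hive.metastore.kerberos.principal' = 'hive/your-metastore-host@YOUR.REALM.COM'
);
```

Note that `hadoop.username` is deliberately not set here: as stated in item 8, it cannot be combined with a Kerberos configuration.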
## JDBC Catalog

1. An error is reported when connecting to SQLServer through the JDBC Catalog: `unable to find valid certification path to requested target`

    Please add the `trustServerCertificate=true` option to `jdbc_url`.

2. When connecting to a MySQL database through the JDBC Catalog, Chinese characters are garbled, or queries with Chinese characters in the conditions return incorrect results

    Please add `useUnicode=true&characterEncoding=utf-8` to `jdbc_url`.

    > Note: Since version 1.2.3, these parameters are added automatically when a JDBC Catalog connects to a MySQL database.

3. An error is reported when connecting to a MySQL database through the JDBC Catalog: `Establishing SSL connection without server's identity verification is not recommended`

    Please add `useSSL=true` to `jdbc_url`.

4. When using a JDBC Catalog to synchronize MySQL data to Doris, date values are synchronized incorrectly. Check whether the MySQL version matches the MySQL driver package; for example, MySQL 8 and above requires the driver com.mysql.cj.jdbc.Driver.
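Putting items 2 and 3 together, the options are appended to `jdbc_url` as query parameters. The following is a rough sketch of a MySQL JDBC Catalog with both options set; the host, database, credentials, and driver jar name are placeholders.

```sql
CREATE CATALOG mysql_jdbc PROPERTIES (
    'type' = 'jdbc',
    'user' = 'your_user',
    'password' = 'your_password',
    -- useSSL avoids the identity-verification warning (item 3);
    -- useUnicode/characterEncoding avoid garbled Chinese characters (item 2)
    'jdbc_url' = 'jdbc:mysql://your-mysql-host:3306/your_db?useSSL=true&useUnicode=true&characterEncoding=utf-8',
    'driver_url' = 'mysql-connector-java-8.0.25.jar',
    'driver_class' = 'com.mysql.cj.jdbc.Driver'
);
```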
## Hive Catalog

1. What to do with errors such as `failed to get schema` and `Storage schema reading not supported` when accessing Iceberg tables via Hive Metastore?

    To fix this, place the Iceberg runtime jar in the `lib/` directory of Hive.

    Then configure the following in `hive-site.xml`:

    ```
    metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
    ```

    After configuring, please restart Hive Metastore.

2. An error is reported when connecting to the Hive Catalog: `Caused by: java.lang.NullPointerException`

    If fe.log contains a stack trace like the following:

    ```
    Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:78) ~[hive-exec-3.1.3-core.jar:3.1.3]
        at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:55) ~[hive-exec-3.1.3-core.jar:3.1.3]
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1548) ~[doris-fe.jar:3.1.3]
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1542) ~[doris-fe.jar:3.1.3]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
    ```

    Try adding `"metastore.filter.hook" = "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl"` to the `create catalog` statement.

3. If `show tables` works after creating the Hive Catalog, but queries report `java.net.UnknownHostException: xxxxx`

    Add a property to the CATALOG:
    ```
    'fs.defaultFS' = 'hdfs://<your_nameservice>'
    ```

4. For ORC tables from Hive 1.x, the underlying ORC file schema may contain system column names such as `_col0`, `_col1`, `_col2`..., which needs to be handled in the catalog configuration. Set `hive.version` to 1.x.x so that the column names of the Hive table are used for the mapping.

    ```sql
    CREATE CATALOG hive PROPERTIES (
        'hive.version' = '1.x.x'
    );
    ```

5. If an error related to the Hive Metastore is reported while querying the catalog: `Invalid method name`.

    Configure `hive.version`.

    ```sql
    CREATE CATALOG hive PROPERTIES (
        'hive.version' = '2.x.x'
    );
    ```

6. When querying a table in ORC format, FE reports the error `Could not obtain block` or `Caused by: java.lang.NoSuchFieldError: types`

    For ORC files, FE accesses HDFS by default to obtain file information and split the files. In some cases, FE may not be able to access HDFS. This can be solved by adding the following parameter:

    `"hive.exec.orc.split.strategy" = "BI"`

    Other options: HYBRID (default), ETL.

7. The values of the partition fields of a Hudi table can be queried in Hive, but not in Doris.

    Doris and Hive currently query Hudi differently. Doris requires the partition fields to be added to the avsc file of the Hudi table schema. If they are not added, Doris reads `partition_val` as empty (even if `hoodie.datasource.hive_sync.partition_fields=partition_val` is set).

    ```
    {
        "type": "record",
        "name": "record",
        "fields": [{
            "name": "partition_val",
            "type": [
                "null",
                "string"
                ],
            "doc": "Preset partition field, empty string when not partitioned",
            "default": null
            },
            {
            "name": "name",
            "type": "string",
            "doc": "名称"
            },
            {
            "name": "create_time",
            "type": "string",
            "doc": "创建时间"
            }
        ]
    }
    ```

8. Querying a Hive external table fails with: `java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found`

    Search the Hadoop environment for `hadoop-lzo-*.jar`, put it under `"${DORIS_HOME}/fe/lib/"`, and then restart FE.

    Starting from version 2.0.2, this file can be placed in BE's `custom_lib/` directory (if it does not exist, just create it manually) to prevent the file from being lost when the lib directory is replaced during a cluster upgrade.

9. A Hive table is created with `serde` set to `org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe`, and an error is reported when accessing the table: `storage schema reading not supported`

    Add the following configuration to the hive-site.xml file and restart the HMS service:

    ```
    <property>
        <name>metastore.storage.schema.reader.impl</name>
        <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
    </property>
    ```

10. Error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty

    The full error found in fe.log is shown below:
    ```
    org.apache.doris.common.UserException: errCode = 2, detailMessage = S3 list path failed. path=s3://bucket/part-*,msg=errors while get file status listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    org.apache.doris.common.UserException: errCode = 2, detailMessage = S3 list path exception. path=s3://bucket/part-*, err: errCode = 2, detailMessage = S3 list path failed. path=s3://bucket/part-*,msg=errors while get file status listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    org.apache.hadoop.fs.s3a.AWSClientIOException: listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    Caused by: javax.net.ssl.SSLException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    Caused by: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
    ```

    Try updating the CA certificates on the FE node with `update-ca-trust` (CentOS/RockyLinux), then restart the FE process.
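Several of the workarounds above are plain catalog properties and can be combined in a single catalog definition. The following is an illustrative sketch; the Metastore address, nameservice, and version are placeholders, and you would keep only the properties your environment actually needs.

```sql
CREATE CATALOG hive_with_workarounds PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://your-metastore-host:9083',
    -- item 2: bypass the authorization filter hook that triggers the NullPointerException
    'metastore.filter.hook' = 'org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl',
    -- item 3: make the HDFS nameservice resolvable to avoid UnknownHostException
    'fs.defaultFS' = 'hdfs://your_nameservice',
    -- items 4/5: match your Hive version so column mapping and Metastore calls work
    'hive.version' = '2.x.x',
    -- item 6: avoid FE reading ORC file details from HDFS when splitting files
    'hive.exec.orc.split.strategy' = 'BI'
);
```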
## HDFS

1. What to do with the `java.lang.VerifyError: xxx` error when accessing HDFS 3.x?

    In versions before 1.2.1, Doris depends on Hadoop 2.8. Please update Hadoop to 2.10.2, or update Doris to 1.2.2 or later.

2. Use Hedged Read to mitigate slow HDFS reads.

    In some cases, high load on HDFS can make reading a replica take a long time, which slows down the overall query. The HDFS client provides Hedged Read for this: when a read request has not returned within a certain threshold, another read thread is started to read the same data, and whichever result returns first is used. Note that this feature may increase the load on the HDFS cluster, so use it with discretion.

    This feature can be enabled through the parameters used to create the Catalog:

    ```
    create catalog regression properties (
        'type'='hms',
        'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
        'dfs.client.hedged.read.threadpool.size' = '128',
        'dfs.client.hedged.read.threshold.millis' = '500'
    );
    ```

    `dfs.client.hedged.read.threadpool.size` indicates the number of threads used for Hedged Read, which are shared by one HDFS client. Usually, for a given HDFS cluster, the BE nodes share one HDFS client.

    `dfs.client.hedged.read.threshold.millis` is the read threshold in milliseconds. When a read request exceeds this threshold without returning, Hedged Read is triggered.

    After enabling it, you can see the related counters in the Query Profile:

    `TotalHedgedRead`: the number of Hedged Reads initiated.

    `HedgedReadWins`: the number of successful Hedged Reads (hedged requests that were initiated and returned faster than the original request).

    Note that these values are accumulated per HDFS client rather than per query; the same HDFS client is reused by multiple queries.

## DLF Catalog

1. When using a DLF Catalog, if BE reports `Invalid address` while fetching JindoFS data, add the domain-name-to-IP mapping that appears in the log to `/etc/hosts`.

2. If you are not authorized to read the data, use the `hadoop.username` property to specify an authorized user.

3. The metadata in a DLF Catalog is consistent with DLF. When DLF is used to manage metadata, newly imported Hive partitions may not have been synchronized by DLF yet, resulting in inconsistency between the DLF and Hive metadata. In this case, first make sure that the Hive metadata is fully synchronized to DLF.
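For item 2, `hadoop.username` is just another catalog property. Below is a rough sketch based on the DLF catalog example later in this patch, with placeholder credentials and a hypothetical user name; as noted in the Kerberos section, it must not be combined with a Kerberos configuration.

```sql
CREATE CATALOG dlf_with_user PROPERTIES (
    "type" = "hms",
    "hive.metastore.type" = "dlf",
    "dlf.proxy.mode" = "DLF_ONLY",
    "dlf.endpoint" = "datalake-vpc.cn-beijing.aliyuncs.com",
    "dlf.region" = "cn-beijing",
    "dlf.uid" = "uid",
    "dlf.access_key" = "ak",
    "dlf.secret_key" = "sk",
    -- read data as a user that has been granted access (FAQ item 2 above)
    "hadoop.username" = "hadoop"
);
```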
diff --git a/docs/en/docs/lakehouse/multi-catalog/dlf.md b/docs/en/docs/lakehouse/multi-catalog/dlf.md
index 763fa9fdd8cd07..7fb88bb316b75e 100644
--- a/docs/en/docs/lakehouse/multi-catalog/dlf.md
+++ b/docs/en/docs/lakehouse/multi-catalog/dlf.md
@@ -67,23 +67,25 @@ Doris supports accessing Hive/Iceberg/Hudi metadata in DLF.
 ### Use OSS-HDFS as the datasource
 
 1. Enable OSS-HDFS. [Grant access to OSS or OSS-HDFS](https://www.alibabacloud.com/help/en/e-mapreduce/latest/oss-hdfsnew)
-2. Download the SDK. [JindoData SDK](https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/user/5.x/5.0.0-beta7/jindodata_download.md)
-3. Decompress the jindosdk.tar.gz, and then enter its lib directory and put `jindo-core.jar, jindo-sdk.jar` to both `${DORIS_HOME}/fe/lib` and `${DORIS_HOME}/be/lib/java_extensions`.
+2. Download the SDK. [JindoData SDK](https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/user/5.x/5.0.0-beta7/jindodata_download.md). If the Jindo SDK directory already exists on the cluster, skip this step.
+3. Decompress the jindosdk.tar.gz, or locate the Jindo SDK directory on the cluster, then enter its lib directory and copy `jindo-core.jar` and `jindo-sdk.jar` into both `${DORIS_HOME}/fe/lib` and `${DORIS_HOME}/be/lib/java_extensions/preload-extensions`.
 4.
Create DLF Catalog, set `oss.hdfs.enabled` as `true`: -```sql -CREATE CATALOG dlf_oss_hdfs PROPERTIES ( - "type"="hms", - "hive.metastore.type" = "dlf", - "dlf.proxy.mode" = "DLF_ONLY", - "dlf.endpoint" = "datalake-vpc.cn-beijing.aliyuncs.com", - "dlf.region" = "cn-beijing", - "dlf.uid" = "uid", - "dlf.access_key" = "ak", - "dlf.secret_key" = "sk", - "oss.hdfs.enabled" = "true" -); -``` + ```sql + CREATE CATALOG dlf_oss_hdfs PROPERTIES ( + "type"="hms", + "hive.metastore.type" = "dlf", + "dlf.proxy.mode" = "DLF_ONLY", + "dlf.endpoint" = "datalake-vpc.cn-beijing.aliyuncs.com", + "dlf.region" = "cn-beijing", + "dlf.uid" = "uid", + "dlf.access_key" = "ak", + "dlf.secret_key" = "sk", + "oss.hdfs.enabled" = "true" + ); + ``` + +5. When the Jindo SDK version is inconsistent with the version used on the EMR cluster, will reported `Plugin not found` and the Jindo SDK needs to be replaced with the corresponding version. ### DLF Iceberg Catalog diff --git a/docs/zh-CN/docs/lakehouse/faq.md b/docs/zh-CN/docs/lakehouse/faq.md new file mode 100644 index 00000000000000..c1ab720e9e07ff --- /dev/null +++ b/docs/zh-CN/docs/lakehouse/faq.md @@ -0,0 +1,273 @@ +--- +{ + "title": "常见问题", + "language": "zh-CN" +} +--- + + + + +# 常见问题 + +## Kerberos + +1. 连接 Kerberos 认证的 Hive Metastore 报错:`GSS initiate failed` + + 通常是因为 Kerberos 认证信息填写不正确导致的,可以通过以下步骤排查: + + 1. 1.2.1 之前的版本中,Doris 依赖的 libhdfs3 库没有开启 gsasl。请更新至 1.2.2 之后的版本。 + 2. 确认对各个组件,设置了正确的 keytab 和 principal,并确认 keytab 文件存在于所有 FE、BE 节点上。 + + 1. `hadoop.kerberos.keytab`/`hadoop.kerberos.principal`:用于 Hadoop hdfs 访问,填写 hdfs 对应的值。 + 2. `hive.metastore.kerberos.principal`:用于 hive metastore。 + + 3. 尝试将 principal 中的 ip 换成域名(不要使用默认的 `_HOST` 占位符) + 4. 确认 `/etc/krb5.conf` 文件存在于所有 FE、BE 节点上。 + +2. 通过 Hive Catalog 连接 Hive 数据库报错:`RemoteException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]`. + + 如果在 `show databases` 和 `show tables` 都是没问题的情况下,查询的时候出现上面的错误,我们需要进行下面两个操作: + - fe/conf、be/conf 目录下需放置 core-site.xml 和 hdfs-site.xml + - BE 节点执行 Kerberos 的 kinit 然后重启 BE ,然后再去执行查询即可. + +3. 查询配置了Kerberos的外表,遇到该报错:`GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos Ticket)`,一般重启FE和BE能够解决该问题。 + + - 重启所有节点前可在`"${DORIS_HOME}/be/conf/be.conf"`中的JAVA_OPTS参数里配置`-Djavax.security.auth.useSubjectCredsOnly=false`,通过底层机制去获取JAAS credentials信息,而不是应用程序。 + - 在[JAAS Troubleshooting](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/Troubleshooting.html)中可获取更多常见JAAS报错的解决方法。 + +4. 在Catalog中配置Kerberos时,报错`Unable to obtain password from user`的解决方法: + + - 用到的principal必须在klist中存在,使用`klist -kt your.keytab`检查。 + - 检查catalog配置是否正确,比如漏配`yarn.resourcemanager.principal`。 + - 若上述检查没问题,则当前系统yum或者其他包管理软件安装的JDK版本存在不支持的加密算法,建议自行安装JDK并设置`JAVA_HOME`环境变量。 + +5. 使用 KMS 访问 HDFS 时报错:`java.security.InvalidKeyException: Illegal key size` + + 升级 JDK 版本到 >= Java 8 u162 的版本。或者下载安装 JDK 相应的 JCE Unlimited Strength Jurisdiction Policy Files。 + +6. 在Catalog中配置Kerberos时,如果报错`SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]`,那么需要将`core-site.xml`文件放到`"${DORIS_HOME}/be/conf"`目录下。 + + 如果访问HDFS报错`No common protection layer between client and server`,检查客户端和服务端的`hadoop.rpc.protection`属性,使他们保持一致。 + + ``` + + + + + + + hadoop.security.authentication + kerberos + + + + ``` +7. 在使用Broker Load时,配置了Kerberos,如果报错`Cannot locate default realm.`。 + + 将 `-Djava.security.krb5.conf=/your-path` 配置项添加到Broker Load启动脚本的 `start_broker.sh` 的 `JAVA_OPTS`里。 + +8. 当在Catalog里使用Kerberos配置时,不能同时使用`hadoop.username`属性。 + +## JDBC Catalog + +1. 
通过 JDBC Catalog 连接 SQLServer 报错:`unable to find valid certification path to requested target` + + 请在 `jdbc_url` 中添加 `trustServerCertificate=true` 选项。 + +2. 通过 JDBC Catalog 连接 MySQL 数据库,中文字符乱码,或中文字符条件查询不正确 + + 请在 `jdbc_url` 中添加 `useUnicode=true&characterEncoding=utf-8` + + > 注:1.2.3 版本后,使用 JDBC Catalog 连接 MySQL 数据库,会自动添加这些参数。 + +3. 通过 JDBC Catalog 连接 MySQL 数据库报错:`Establishing SSL connection without server's identity verification is not recommended` + + 请在 `jdbc_url` 中添加 `useSSL=true` + +4. 使用JDBC Catalog将MySQL数据同步到Doris中,日期数据同步错误。需要校验下MySQL的版本是否与MySQL的驱动包是否对应,比如MySQL8以上需要使用驱动com.mysql.cj.jdbc.Driver。 + + +## Hive Catalog + +1. 通过 Hive Metastore 访问 Iceberg 表报错:`failed to get schema` 或 `Storage schema reading not supported` + + 在 Hive 的 lib/ 目录放上 `iceberg` 运行时有关的 jar 包。 + + 在 `hive-site.xml` 配置: + + ``` + metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader + ``` + + 配置完成后需要重启Hive Metastore。 + +2. 连接 Hive Catalog 报错:`Caused by: java.lang.NullPointerException` + + 如 fe.log 中有如下堆栈: + + ``` + Caused by: java.lang.NullPointerException + at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:78) ~[hive-exec-3.1.3-core.jar:3.1.3] + at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:55) ~[hive-exec-3.1.3-core.jar:3.1.3] + at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1548) ~[doris-fe.jar:3.1.3] + at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1542) ~[doris-fe.jar:3.1.3] + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181] + ``` + + 可以尝试在 `create catalog` 语句中添加 `"metastore.filter.hook" = "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl"` 解决。 + +3. 如果创建 Hive Catalog 后能正常`show tables`,但查询时报`java.net.UnknownHostException: xxxxx` + + 可以在 CATALOG 的 PROPERTIES 中添加 + ``` + 'fs.defaultFS' = 'hdfs://' + ``` +4. Hive 1.x 的 orc 格式的表可能会遇到底层 orc 文件 schema 中列名为 `_col0`,`_col1`,`_col2`... 这类系统列名,此时需要在 catalog 配置中添加 `hive.version` 为 1.x.x,这样就会使用 hive 表中的列名进行映射。 + + ```sql + CREATE CATALOG hive PROPERTIES ( + 'hive.version' = '1.x.x' + ); + ``` + +5. 使用Catalog查询表数据时发现与Hive Metastore相关的报错:`Invalid method name`,需要设置`hive.version`参数。 + + ```sql + CREATE CATALOG hive PROPERTIES ( + 'hive.version' = '2.x.x' + ); + ``` + +6. 查询 ORC 格式的表,FE 报错 `Could not obtain block` 或 `Caused by: java.lang.NoSuchFieldError: types` + + 对于 ORC 文件,在默认情况下,FE 会访问 HDFS 获取文件信息,进行文件切分。部分情况下,FE 可能无法访问到 HDFS。可以通过添加以下参数解决: + + `"hive.exec.orc.split.strategy" = "BI"` + + 其他选项:HYBRID(默认),ETL。 + +7. 在hive上可以查到hudi表分区字段的值,但是在doris查不到。 + + doris和hive目前查询hudi的方式不一样,doris需要在hudi表结构的avsc文件里添加上分区字段,如果没加,就会导致doris查询partition_val为空(即使设置了hoodie.datasource.hive_sync.partition_fields=partition_val也不可以) + ``` + { + "type": "record", + "name": "record", + "fields": [{ + "name": "partition_val", + "type": [ + "null", + "string" + ], + "doc": "Preset partition field, empty string when not partitioned", + "default": null + }, + { + "name": "name", + "type": "string", + "doc": "名称" + }, + { + "name": "create_time", + "type": "string", + "doc": "创建时间" + } + ] + } + ``` +8. 
查询hive外表,遇到该报错:`java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found` + + 去hadoop环境搜索`hadoop-lzo-*.jar`放在`"${DORIS_HOME}/fe/lib/"`目录下并重启fe。 + + 从 2.0.2 版本起,可以将这个文件放置在BE的 `custom_lib/` 目录下(如不存在,手动创建即可),以防止升级集群时因为 lib 目录被替换而导致文件丢失。 + +9. 创建hive表指定serde为 `org.apache.hadoop.hive.contrib.serde2.MultiDelimitserDe`,访问表时报错:`storage schema reading not supported` + + 在hive-site.xml文件中增加以下配置,并重启hms服务: + + ``` + + metastore.storage.schema.reader.impl + org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader + + ``` + +10. 报错:java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty + + FE日志中完整报错信息如下: + ``` + org.apache.doris.common.UserException: errCode = 2, detailMessage = S3 list path failed. path=s3://bucket/part-*,msg=errors while get file status listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty + org.apache.doris.common.UserException: errCode = 2, detailMessage = S3 list path exception. path=s3://bucket/part-*, err: errCode = 2, detailMessage = S3 list path failed. path=s3://bucket/part-*,msg=errors while get file status listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty + org.apache.hadoop.fs.s3a.AWSClientIOException: listStatus on s3://bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty + Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty + Caused by: javax.net.ssl.SSLException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty + Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty + Caused by: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty + ``` + + 尝试更新FE节点CA证书,使用 `update-ca-trust(CentOS/RockyLinux)`,然后重启FE进程即可。 + +## HDFS + +1. 访问 HDFS 3.x 时报错:`java.lang.VerifyError: xxx` + + 1.2.1 之前的版本中,Doris 依赖的 Hadoop 版本为 2.8。需更新至 2.10.2。或更新 Doris 至 1.2.2 之后的版本。 + +2. 
使用 Hedged Read 优化 HDFS 读取慢的问题。 + + 在某些情况下,HDFS 的负载较高可能导致读取某个 HDFS 上的数据副本的时间较长,从而拖慢整体的查询效率。HDFS Client 提供了 Hedged Read 功能。 + 该功能可以在一个读请求超过一定阈值未返回时,启动另一个读线程读取同一份数据,哪个先返回就是用哪个结果。 + + 注意:该功能可能会增加 HDFS 集群的负载,请酌情使用。 + + 可以通过以下两种方式开启这个功能: + + - 在创建 Catalog 的参数中指定: + + ``` + create catalog regression properties ( + 'type'='hms', + 'hive.metastore.uris' = 'thrift://172.21.16.47:7004', + 'dfs.client.hedged.read.threadpool.size' = '128', + 'dfs.client.hedged.read.threshold.millis' = "500" + ); + ``` + + `dfs.client.hedged.read.threadpool.size` 表示用于 Hedged Read 的线程数,这些线程由一个 HDFS Client 共享。通常情况下,针对一个 HDFS 集群,BE 节点会共享一个 HDFS Client。 + + `dfs.client.hedged.read.threshold.millis` 是读取阈值,单位毫秒。当一个读请求超过这个阈值未返回时,会触发 Hedged Read。 + + + 开启后,可以在 Query Profile 中看到相关参数: + + `TotalHedgedRead`: 发起 Hedged Read 的次数。 + + `HedgedReadWins`:Hedged Read 成功的次数(发起并且比原请求更快返回的次数) + + 注意,这里的值是单个 HDFS Client 的累计值,而不是单个查询的数值。同一个 HDFS Client 会被多个查询复用。 + +## DLF Catalog + +1. 使用DLF Catalog时,BE读在取JindoFS数据出现`Invalid address`,需要在`/ets/hosts`中添加日志中出现的域名到IP的映射。 + +2. 读取数据无权限时,使用`hadoop.username`属性指定有权限的用户。 + +3. DLF Catalog中的元数据和DLF保持一致。当使用DLF管理元数据时,Hive新导入的分区,可能未被DLF同步,导致出现DLF和Hive元数据不一致的情况,对此,需要先保证Hive元数据被DLF完全同步。 diff --git a/docs/zh-CN/docs/lakehouse/multi-catalog/dlf.md b/docs/zh-CN/docs/lakehouse/multi-catalog/dlf.md index 822ecff1bb02f8..43d30ff5e6562e 100644 --- a/docs/zh-CN/docs/lakehouse/multi-catalog/dlf.md +++ b/docs/zh-CN/docs/lakehouse/multi-catalog/dlf.md @@ -66,24 +66,26 @@ CREATE CATALOG dlf PROPERTIES ( ### 使用开启了HDFS服务的OSS存储数据 -1. 确认OSS开启了HDFS服务。[开通并授权访问OSS-HDFS服务](https://help.aliyun.com/document_detail/419505.html?spm=a2c4g.2357115.0.i0) -2. 下载SDK。[JindoData SDK下载](https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/user/5.x/5.0.0-beta7/jindodata_download.md) -3. 解压下载后的jindosdk.tar.gz,将其lib目录下的`jindo-core.jar、jindo-sdk.jar`放到`${DORIS_HOME}/fe/lib`和`${DORIS_HOME}/be/lib/java_extensions`目录下。 +1. 确认OSS开启了HDFS服务。[开通并授权访问OSS-HDFS服务](https://help.aliyun.com/document_detail/419505.html?spm=a2c4g.2357115.0.i0)。 +2. 下载SDK。[JindoData SDK下载](https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/user/5.x/5.0.0-beta7/jindodata_download.md)。如果集群上已有SDK目录,忽略这一步。 +3. 解压下载后的jindosdk.tar.gz或者在集群上找到Jindo SDK的目录,将其lib目录下的`jindo-core.jar、jindo-sdk.jar`放到`${DORIS_HOME}/fe/lib`和`${DORIS_HOME}/be/lib/java_extensions/preload-extensions`目录下。 4. 创建DLF Catalog,并配置`oss.hdfs.enabled`为`true`: -```sql -CREATE CATALOG dlf_oss_hdfs PROPERTIES ( - "type"="hms", - "hive.metastore.type" = "dlf", - "dlf.proxy.mode" = "DLF_ONLY", - "dlf.endpoint" = "datalake-vpc.cn-beijing.aliyuncs.com", - "dlf.region" = "cn-beijing", - "dlf.uid" = "uid", - "dlf.access_key" = "ak", - "dlf.secret_key" = "sk", - "oss.hdfs.enabled" = "true" -); -``` + ```sql + CREATE CATALOG dlf_oss_hdfs PROPERTIES ( + "type"="hms", + "hive.metastore.type" = "dlf", + "dlf.proxy.mode" = "DLF_ONLY", + "dlf.endpoint" = "datalake-vpc.cn-beijing.aliyuncs.com", + "dlf.region" = "cn-beijing", + "dlf.uid" = "uid", + "dlf.access_key" = "ak", + "dlf.secret_key" = "sk", + "oss.hdfs.enabled" = "true" + ); + ``` + +5. 
当Jindo SDK版本与EMR集群上所用的版本不一致时,会出现`Plugin not found`的问题,需更换到对应版本。 ### 访问DLF Iceberg表 diff --git a/fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java b/fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java index 6380dd2e6e0d68..850619cb246430 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java +++ b/fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java @@ -404,8 +404,10 @@ private FileCacheValue loadFiles(FileCacheKey key) { if (uri.getScheme() != null) { String scheme = uri.getScheme(); updateJobConf("fs." + scheme + ".impl.disable.cache", "true"); - if (!scheme.equals("hdfs") && !scheme.equals("viewfs")) { - updateJobConf("fs." + scheme + ".impl", PropertyConverter.getHadoopFSImplByScheme(scheme)); + if (jobConf.get("fs." + scheme + ".impl") == null) { + if (!scheme.equals("hdfs") && !scheme.equals("viewfs")) { + updateJobConf("fs." + scheme + ".impl", PropertyConverter.getHadoopFSImplByScheme(scheme)); + } } } } catch (Exception e) { diff --git a/fe/fe-core/src/main/java/org/apache/doris/datasource/property/PropertyConverter.java b/fe/fe-core/src/main/java/org/apache/doris/datasource/property/PropertyConverter.java index 174b0808bc240a..1b0e3b6d97298a 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/datasource/property/PropertyConverter.java +++ b/fe/fe-core/src/main/java/org/apache/doris/datasource/property/PropertyConverter.java @@ -332,8 +332,8 @@ private static void rewriteHdfsOnOssProperties(Map ossProperties region + ".oss-dls.aliyuncs.com"); } } - ossProperties.put("fs.oss.impl", "com.aliyun.emr.fs.oss.JindoOssFileSystem"); - ossProperties.put("fs.AbstractFileSystem.oss.impl", "com.aliyun.emr.fs.oss.OSS"); + ossProperties.put("fs.oss.impl", "com.aliyun.jindodata.oss.JindoOssFileSystem"); + ossProperties.put("fs.AbstractFileSystem.oss.impl", "com.aliyun.jindodata.oss.OSS"); } private static Map convertToCOSProperties(Map props, CloudCredential credential) { @@ -454,7 +454,8 @@ private static void getPropertiesFromDLFProps(Map props, if (!Strings.isNullOrEmpty(region)) { boolean hdfsEnabled = Boolean.parseBoolean(props.getOrDefault(OssProperties.OSS_HDFS_ENABLED, "false")); if (hdfsEnabled) { - props.putIfAbsent("fs.oss.impl", "com.aliyun.emr.fs.oss.JindoOssFileSystem"); + props.putIfAbsent("fs.oss.impl", "com.aliyun.jindodata.oss.JindoOssFileSystem"); + props.put("fs.AbstractFileSystem.oss.impl", "com.aliyun.jindodata.oss.OSS"); props.putIfAbsent(OssProperties.REGION, region); // example: cn-shanghai.oss-dls.aliyuncs.com // from https://www.alibabacloud.com/help/en/e-mapreduce/latest/oss-kusisurumen