Need Solution to the error.Running packages on Rstudio. Are the packages built to run on Impala ? #2

kandupl · 2020-04-03T03:34:58Z

DBMS:
impala

Error:
java.sql.SQLException: [Cloudera]ImpalaJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Possible loss of precision for target table 'rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref'.
Expression 'CAST(concat('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS STRING)' (type: STRING) would need to be cast to VARCHAR(512) for column 'covariate_name'
), Query: INSERT INTO rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref (covariate_id,covariate_name,analysis_id,concept_id) SELECT t1.covariate_id, CAST( CONCAT('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS string ) as covariate_name, 410 as analysis_id, CAST( ((t1.covariate_id - 410) / 1000) AS int ) as concept_id FROM (SELECT DISTINCT e0pr395scov_1.covariate_id FROM rwd_p_omop_truven_medicaid_results2.e0pr395scov_1) AS t1 LEFT JOIN rwd_p_omop_truven_medicaid_cdm2.concept ON (concept.concept_id = CAST( ((t1.covariate_id - 410) / 1000) AS int )).

SQL:
INSERT INTO rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref (
covariate_id,
covariate_name,
analysis_id,
concept_id
)
SELECT covariate_id,

CAST(CONCAT('drug_era group during day -365 through -1 days relative to index: ', CASE WHEN concept_name IS NULL THEN 'Unknown concept' ELSE concept_name END) AS VARCHAR(512)) AS covariate_name,


410 AS analysis_id,
CAST((covariate_id - 410) / 1000 AS INT) AS concept_id

FROM (
SELECT DISTINCT covariate_id
FROM rwd_p_omop_truven_medicaid_results2.e0pr395scov_1
) t1
LEFT JOIN rwd_p_omop_truven_medicaid_cdm2.concept
ON concept_id = CAST((covariate_id - 410) / 1000 AS INT)

R version:
R version 3.6.0 (2019-04-26)

Platform:
x86_64-redhat-linux-gnu

Attached base packages:

stats
graphics
grDevices
utils
datasets
methods
base

Other attached packages:

HospitalizationInSymptomaticPatientsValidation (0.0.3)
Covid19CohortEvaluation (0.0.1)
DatabaseConnector (2.4.3)

**Time it took to run- 9 hours / R packages are adequately tuned?
Can we generate table/column statistics for the respective tables during each ETL?
**

2020-04-01 20:31:23 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used
2020-04-01 20:48:58 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used
2020-04-01 20:48:58 [Main thread] WARN DatabaseConnector getJbcDriverSingleton only the first element is used as variable name
2020-04-01 20:49:17 [Main thread] INFO HospitalizationInSymptomaticPatientsValidation execute Creating Cohorts
2020-04-01 20:49:17 [Main thread] WARN HospitalizationInSymptomaticPatientsValidation execute Error 'cannot open the connection' when writing log to file '/app/sas/users/kandupl/Lakshmi_Kandukuri/2020/2020-03-23-005_ttt_covid-19_response/Data/MCID_2/log.txt. Removing file appender from logger.
2020-04-01 20:49:17 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used
2020-04-01 20:49:17 [Main thread] WARN SqlRender translate Table name ' rwd_p_omop_truven_medicaid_results2' is too long. Table names should be shorter than 30 characters to prevent Oracle from crashing.
2020-04-02 01:55:15 [Main thread] INFO HospitalizationInSymptomaticPatientsValidation createCohorts Counting cohorts
2020-04-02 01:55:16 [Main thread] INFO HospitalizationInSymptomaticPatientsValidation execute Validating Models
2020-04-02 01:55:17 [Main thread] INFO PatientLevelPrediction evaluateMultiplePlp Evaluating model in /homes/kandupl/R/x86_64-redhat-linux-gnu-library/3.6/HospitalizationInSymptomaticPatientsValidation/plp_models/Analysis_2
2020-04-02 01:55:17 [Main thread] INFO PatientLevelPrediction evaluateMultiplePlp plpResult found in /homes/kandupl/R/x86_64-redhat-linux-gnu-library/3.6/HospitalizationInSymptomaticPatientsValidation/plp_models/Analysis_2
2020-04-02 01:55:17 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used
2020-04-02 01:55:18 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used
2020-04-02 01:55:56 [Main thread] INFO PatientLevelPrediction 3 Error: Error: Error executing SQL: java.sql.SQLException: [Cloudera]ImpalaJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Possible loss of precision for target table 'rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref'. Expression 'CAST(concat('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS STRING)' (type: STRING) would need to be cast to VARCHAR(512) for column 'covariate_name' ), Query: INSERT INTO rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref (covariate_id,covariate_name,analysis_id,concept_id) SELECT t1.covariate_id, CAST( CONCAT('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS string ) as covariate_name, 410 as analysis_id, CAST( ((t1.covariate_id - 410) / 1000) AS int ) as concept_id FROM (SELECT DISTINCT e0pr395scov_1.covariate_id FROM rwd_p_omop_truven_medicaid_results2.e0pr395scov_1) AS t1 LEFT JOIN rwd_p_omop_truven_medicaid_cdm2.concept ON (concept.concept_id = CAST( ((t1.covariate_id - 410) / 1000) AS int )). An error report has been created at /app/sas/users/kandupl/Lakshmi_Kandukuri/2020/2020-03-23-005_ttt_covid-19_response/Code/errorReport.txt

Thanks
Lakshmi

The text was updated successfully, but these errors were encountered:

jreps · 2020-04-08T12:55:53Z

what PLP package are you trying to run?

kandupl · 2020-04-09T15:49:05Z

https://github.com/ohdsi-studies/Covid19PredictionStudies/tree/master/HospitalizationInSymptomaticPatientsValidation

jreps · 2020-04-09T15:59:58Z

One thing I see is the table name is long (probably worth reducing it): WARN SqlRender translate Table name ' rwd_p_omop_truven_medicaid_results2' is too long. Table names should be shorter than 30 characters to prevent Oracle from crashing.

The issue is some casting problem: (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS STRING)' (type: STRING) would need to be cast to VARCHAR(512) for column 'covariate_name' ) - the covariate_name column in the covariate reference table is too short. I think this is a FeatureExtraction bug.

gklebanov · 2020-04-09T16:50:14Z

hey Lakshmi - let's debug it together with Konstantin and Vitaly?

kandupl · 2020-04-09T17:25:05Z

Sure will debug with Konstantin and Vitaly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Need Solution to the error.Running packages on Rstudio. Are the packages built to run on Impala ? #2

Need Solution to the error.Running packages on Rstudio. Are the packages built to run on Impala ? #2

kandupl commented Apr 3, 2020

jreps commented Apr 8, 2020

kandupl commented Apr 9, 2020

jreps commented Apr 9, 2020

gklebanov commented Apr 9, 2020

kandupl commented Apr 9, 2020

Need Solution to the error.Running packages on Rstudio. Are the packages built to run on Impala ? #2

Need Solution to the error.Running packages on Rstudio. Are the packages built to run on Impala ? #2

Comments

kandupl commented Apr 3, 2020

jreps commented Apr 8, 2020

kandupl commented Apr 9, 2020

jreps commented Apr 9, 2020

gklebanov commented Apr 9, 2020

kandupl commented Apr 9, 2020