Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need Solution to the error.Running packages on Rstudio. Are the packages built to run on Impala ? #2

Open
kandupl opened this issue Apr 3, 2020 · 5 comments

Comments

@kandupl
Copy link

kandupl commented Apr 3, 2020

DBMS:
impala

Error:
java.sql.SQLException: [Cloudera]ImpalaJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Possible loss of precision for target table 'rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref'.
Expression 'CAST(concat('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS STRING)' (type: STRING) would need to be cast to VARCHAR(512) for column 'covariate_name'
), Query: INSERT INTO rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref (covariate_id,covariate_name,analysis_id,concept_id) SELECT t1.covariate_id, CAST( CONCAT('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS string ) as covariate_name, 410 as analysis_id, CAST( ((t1.covariate_id - 410) / 1000) AS int ) as concept_id FROM (SELECT DISTINCT e0pr395scov_1.covariate_id FROM rwd_p_omop_truven_medicaid_results2.e0pr395scov_1) AS t1 LEFT JOIN rwd_p_omop_truven_medicaid_cdm2.concept ON (concept.concept_id = CAST( ((t1.covariate_id - 410) / 1000) AS int )).

SQL:
INSERT INTO rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref (
covariate_id,
covariate_name,
analysis_id,
concept_id
)
SELECT covariate_id,

CAST(CONCAT('drug_era group during day -365 through -1 days relative to index: ', CASE WHEN concept_name IS NULL THEN 'Unknown concept' ELSE concept_name END) AS VARCHAR(512)) AS covariate_name,


410 AS analysis_id,
CAST((covariate_id - 410) / 1000 AS INT) AS concept_id

FROM (
SELECT DISTINCT covariate_id
FROM rwd_p_omop_truven_medicaid_results2.e0pr395scov_1
) t1
LEFT JOIN rwd_p_omop_truven_medicaid_cdm2.concept
ON concept_id = CAST((covariate_id - 410) / 1000 AS INT)

R version:
R version 3.6.0 (2019-04-26)

Platform:
x86_64-redhat-linux-gnu

Attached base packages:

  • stats
  • graphics
  • grDevices
  • utils
  • datasets
  • methods
  • base

Other attached packages:

  • HospitalizationInSymptomaticPatientsValidation (0.0.3)
  • Covid19CohortEvaluation (0.0.1)
  • DatabaseConnector (2.4.3)

**Time it took to run- 9 hours / R packages are adequately tuned?
Can we generate table/column statistics for the respective tables during each ETL?
**

2020-04-01 20:31:23 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used
2020-04-01 20:48:58 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used
2020-04-01 20:48:58 [Main thread] WARN DatabaseConnector getJbcDriverSingleton only the first element is used as variable name
2020-04-01 20:49:17 [Main thread] INFO HospitalizationInSymptomaticPatientsValidation execute Creating Cohorts
2020-04-01 20:49:17 [Main thread] WARN HospitalizationInSymptomaticPatientsValidation execute Error 'cannot open the connection' when writing log to file '/app/sas/users/kandupl/Lakshmi_Kandukuri/2020/2020-03-23-005_ttt_covid-19_response/Data/MCID_2/log.txt. Removing file appender from logger.
2020-04-01 20:49:17 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used
2020-04-01 20:49:17 [Main thread] WARN SqlRender translate Table name ' rwd_p_omop_truven_medicaid_results2' is too long. Table names should be shorter than 30 characters to prevent Oracle from crashing.
2020-04-02 01:55:15 [Main thread] INFO HospitalizationInSymptomaticPatientsValidation createCohorts Counting cohorts
2020-04-02 01:55:16 [Main thread] INFO HospitalizationInSymptomaticPatientsValidation execute Validating Models
2020-04-02 01:55:17 [Main thread] INFO PatientLevelPrediction evaluateMultiplePlp Evaluating model in /homes/kandupl/R/x86_64-redhat-linux-gnu-library/3.6/HospitalizationInSymptomaticPatientsValidation/plp_models/Analysis_2
2020-04-02 01:55:17 [Main thread] INFO PatientLevelPrediction evaluateMultiplePlp plpResult found in /homes/kandupl/R/x86_64-redhat-linux-gnu-library/3.6/HospitalizationInSymptomaticPatientsValidation/plp_models/Analysis_2
2020-04-02 01:55:17 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used
2020-04-02 01:55:18 [Main thread] WARN DatabaseConnector getJbcDriverSingleton the condition has length > 1 and only the first element will be used
2020-04-02 01:55:56 [Main thread] INFO PatientLevelPrediction 3 Error: Error: Error executing SQL: java.sql.SQLException: [Cloudera]ImpalaJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Possible loss of precision for target table 'rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref'. Expression 'CAST(concat('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS STRING)' (type: STRING) would need to be cast to VARCHAR(512) for column 'covariate_name' ), Query: INSERT INTO rwd_p_omop_truven_medicaid_results2.e0pr395scov_ref (covariate_id,covariate_name,analysis_id,concept_id) SELECT t1.covariate_id, CAST( CONCAT('drug_era group during day -365 through -1 days relative to index: ', (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS string ) as covariate_name, 410 as analysis_id, CAST( ((t1.covariate_id - 410) / 1000) AS int ) as concept_id FROM (SELECT DISTINCT e0pr395scov_1.covariate_id FROM rwd_p_omop_truven_medicaid_results2.e0pr395scov_1) AS t1 LEFT JOIN rwd_p_omop_truven_medicaid_cdm2.concept ON (concept.concept_id = CAST( ((t1.covariate_id - 410) / 1000) AS int )). An error report has been created at /app/sas/users/kandupl/Lakshmi_Kandukuri/2020/2020-03-23-005_ttt_covid-19_response/Code/errorReport.txt

Thanks
Lakshmi

@jreps
Copy link
Contributor

jreps commented Apr 8, 2020

what PLP package are you trying to run?

@jreps
Copy link
Contributor

jreps commented Apr 9, 2020

One thing I see is the table name is long (probably worth reducing it): WARN SqlRender translate Table name ' rwd_p_omop_truven_medicaid_results2' is too long. Table names should be shorter than 30 characters to prevent Oracle from crashing.

The issue is some casting problem: (CASE WHEN (concept.concept_name IS NULL) THEN 'Unknown concept' ELSE concept.concept_name END)) AS STRING)' (type: STRING) would need to be cast to VARCHAR(512) for column 'covariate_name' ) - the covariate_name column in the covariate reference table is too short. I think this is a FeatureExtraction bug.

@gklebanov
Copy link

hey Lakshmi - let's debug it together with Konstantin and Vitaly?

@kandupl
Copy link
Author

kandupl commented Apr 9, 2020

Sure will debug with Konstantin and Vitaly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants