
Oracle connector reads all tables in ALL schemas under the same DB, not only the tables in the schema the user ID can access #4944

Closed
Brianlifung opened this issue Jul 23, 2021 · 18 comments · Fixed by #5779

Comments

@Brianlifung

Brianlifung commented Jul 23, 2021

Environment

  • Airbyte version: 0.27.3-alpha
  • OS Version / Instance: Ubuntu 16.04.7 LTS
  • Deployment: Docker
  • Source Connector and version: Oracle DB connector: 0.3.1
  • Destination Connector and version: Snowflake 0.3.11
  • Severity: HIGH
  • Step where error happened: Setup new connection

Current Behavior

Step: I set the source to Oracle and the destination to Snowflake. Both the source and destination passed their checks, but setup hangs when creating the connection. (There is one ~4 GB table in the Oracle schema.)
Loading a CSV file from S3 to Snowflake works, but Oracle -> Snowflake hangs without an error while setting up the connection.
After checking with Davin (thanks), the log contains "Caused by: io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: grpc: received message larger than max (5420089 vs. 4194304)", so it seems too many tables are being loaded from the DB.
However, I followed the steps under https://docs.airbyte.io/integrations/sources/oracle#2-create-a-dedicated-read-only-user-with-access-to-the-relevant-tables-recommended-but-optional and created a new user with access to a single table. Checking the log further, e.g.:
2021-07-23 09:15:37 WARN i.a.i.s.j.AbstractJdbcSource(lambda$discoverInternal$26):490 - {} - Could not convert column: "START_DATE" from table: "APEX_180200"."APEX_APPLICATION_PAGE_IR_SUB" with type: "-102". Casting to VARCHAR.
another schema, "APEX_180200" (which the user cannot access), shows up. So the issue may be that Airbyte scans all schemas and all tables in the DB when setting up a new connection, not just the tables the Oracle user ID can access.
There are many schemas and many tables on this Oracle DB, but I created a new user with access to one table on the same DB for extraction with Airbyte.
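For context on how discovery can produce such a large catalog, here is a minimal JDBC sketch. It is not the connector's actual code; the host, service name, and credentials are placeholders. Passing a null schema pattern to DatabaseMetaData.getTables returns tables from every schema the data dictionary exposes, while passing the connecting user's schema keeps the result small:

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class DiscoverAllVsOneSchema {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details, not the host/credentials from this issue.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1", "APP_USER", "secret")) {
            DatabaseMetaData meta = conn.getMetaData();

            // schemaPattern = null: every schema in the dictionary (APEX_*, OLAPSYS, SYS, ...)
            // is walked, which is how a discovered catalog can grow past the ~4 MB gRPC limit.
            try (ResultSet rs = meta.getTables(null, null, null, new String[] {"TABLE"})) {
                int count = 0;
                while (rs.next()) {
                    count++;
                }
                System.out.println("tables discovered with no schema filter: " + count);
            }

            // schemaPattern = the connecting user: discovery stays limited to that schema.
            try (ResultSet rs = meta.getTables(null, "APP_USER", null, new String[] {"TABLE"})) {
                while (rs.next()) {
                    System.out.println(rs.getString("TABLE_SCHEM") + "." + rs.getString("TABLE_NAME"));
                }
            }
        }
    }
}
```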

Expected Behavior

Airbyte should only scan/discover the schemas the user ID can access/select (this would also avoid the issue in #3943, "Discover schema fails when databases have too many tables"). In my setup the user ID can access only 1 table in 1 schema.
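As an illustration of that expected scoping, a query against the Oracle ALL_TABLES view only returns tables the session user actually has privileges on. This is a sketch under that assumption, not the connector's discovery query; the connection URL and password are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ListAccessibleTables {
    public static void main(String[] args) throws Exception {
        // Placeholder URL/password; the user mirrors the dedicated read-only user from the docs.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1", "APP_JEDOX_DATA", "secret");
             Statement stmt = conn.createStatement();
             // ALL_TABLES lists only tables the current session user can access, so a user
             // granted SELECT on a single table sees that table (plus whatever else it
             // happens to have privileges on), not every table in the DB.
             ResultSet rs = stmt.executeQuery(
                     "SELECT OWNER, TABLE_NAME FROM ALL_TABLES ORDER BY OWNER, TABLE_NAME")) {
            while (rs.next()) {
                System.out.println(rs.getString("OWNER") + "." + rs.getString("TABLE_NAME"));
            }
        }
    }
}
```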

Logs

 "2021-07-23 09:14:49 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-07-23 09:14:49 [32mINFO[m i.a.i.b.IntegrationRunner(run):83 - {} - Integration config: IntegrationConfig{command=DISCOVER, configPath='source_config.json', catalogPath='null', statePath='null'}"


## and the source_config.json contains the user ID: APP_JEDOX_DATA
{"host":"172.23.10.128","port":51521,"sid":"LFTDWH.SERVER","username":"APP_JEDOX_DATA","password":


## but the log also includes another schema name: APEX_180200
2021-07-23 09:15:37 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-07-23 09:15:37 WARN i.a.i.s.j.AbstractJdbcSource(lambda$discoverInternal$26):490 - {} - Could not convert column: "START_DATE" from table: "APEX_180200"."APEX_APPLICATION_PAGE_IR_SUB" with type: "-102". Casting to VARCHAR.

Steps to Reproduce

Are you willing to submit a PR?


@Brianlifung Brianlifung added the type/bug Something isn't working label Jul 23, 2021
@sherifnada sherifnada added the area/connectors Connector related issues label Jul 23, 2021
@AlexandruManolache1

AlexandruManolache1 commented Jul 27, 2021

Hello,
this is very much the same as #4906.
We also need OLAPSYS and APEX_200200 to be excluded, or make it configurable somehow.
These are public tables in Oracle, and it is not possible for a user not to see them.
We cannot "hide" them at the DB level, so we need to tackle this in Airbyte.

@Brianlifung
Author

(image attached)

There are many schemas on our Oracle DB; if there is no way to filter them out, we hit issue #3943.

@AlexandruManolache1

Yes, I agree @Brianlifung.

@davinchia
Contributor

davinchia commented Aug 23, 2021

Hi, thanks for reporting this. We have a backlog of connector work at the moment, so I don't think we'll get to this for another 1–2 months. We do accept contributions :)

@davinchia
Contributor

@Brianlifung @AlexandruManolache1 do you want to give this another shot?

@AlexandruManolache1

AlexandruManolache1 commented Sep 3, 2021

Hi @davinchia, I am now on 0.29.13-alpha; I guess I need to update to 0.29.14-alpha?
Update: I saw the update, I am testing it today :)

@davinchia
Contributor

@AlexandruManolache1 updating the connector should be sufficient

@AlexandruManolache1

AlexandruManolache1 commented Sep 3, 2021

Hi, it works OK. The only thing worth mentioning is that the exclusion of the default schemas is still in place, e.g. HR.

@davinchia
Contributor

@AlexandruManolache1 that is correct, yes? We don't want the default schemas to show?

@AlexandruManolache1

AlexandruManolache1 commented Sep 4, 2021

Considering that we can specify just the schemas we need, I would remove that filter.
It's simpler with no filter: just specify the ones you want to pull data from.

Also, for my use case, I have some data I need to sync from the HR schema.

@davinchia
Contributor

Got it. You do have a use case for the default schemas, so you would want to be able to specify them in the configuration and retrieve information. Is that right?

@Brianlifung
Author

I can pick the tables now, but it failed on run:
2021-09-06 03:58:49 INFO () LogClientSingleton(setJobMdc):146 - Setting docker job mdc
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):196 - {} - Finalizing tables in destination completed.
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):199 - {} - Cleaning tmp tables in destination started for 1 streams
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):203 - {} - Cleaning tmp table in destination started for stream DATA_SOURCE. schema PD_UTL, tmp table name: _airbyte_tmp_blp_DATA_SOURCE
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):208 - {} - Cleaning tmp tables in destination completed.
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.b.IntegrationRunner(run):153 - {} - Completed integration: io.airbyte.integrations.destination.snowflake.SnowflakeDestination
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.d.s.SnowflakeDestination(main):83 - {} - completed destination: class io.airbyte.integrations.destination.snowflake.SnowflakeDestination
2021-09-06 03:58:51 INFO () DefaultReplicationWorker(run):144 - Destination thread complete.
2021-09-06 03:58:51 ERROR () DefaultReplicationWorker(run):148 - Sync worker failed.
io.airbyte.workers.WorkerException: Source process exit with code 1. This warning is normal if the job was cancelled.
    at io.airbyte.workers.protocols.airbyte.DefaultAirbyteSource.close(DefaultAirbyteSource.java:135) ~[io.airbyte-airbyte-workers-0.29.14-alpha.jar:?]
    at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:146) ~[io.airbyte-airbyte-workers-0.29.14-alpha.jar:?]
    at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:52) ~[io.airbyte-airbyte-workers-0.29.14-alpha.jar:?]
    at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:152) ~[io.airbyte-airbyte-workers-0.29.14-alpha.jar:?]
    at java.lang.Thread.run(Thread.java:832) [?:?]
2021-09-06 03:58:51 INFO () DefaultReplicationWorker(run):172 - sync summary: io.airbyte.config.ReplicationAttemptSummary@60de167c[status=failed,recordsSynced=0,bytesSynced=0,startTime=1630900689409,endTime=1630900731802]
2021-09-06 03:58:51 INFO () DefaultReplicationWorker(run):181 - Source did not output any state messages
2021-09-06 03:58:51 WARN () DefaultReplicationWorker(run):189 - State capture: No new state, falling back on input state: io.airbyte.config.State@6b1d9b[state={}]
2021-09-06 03:58:51 INFO () TemporalAttemptExecution(get):139 - Stopping cancellation check scheduling...
2021-09-06 03:58:51 INFO () SyncWorkflow$ReplicationActivityImpl(replicate):184 - sync summary: io.airbyte.config.StandardSyncOutput@7149004a[standardSyncSummary=io.airbyte.config.StandardSyncSummary@344b8007[status=failed,recordsSynced=0,bytesSynced=0,startTime=1630900689409,endTime=1630900731802],state=io.airbyte.config.State@6b1d9b[state={}],outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@44d3d50f[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@53590c3[stream=io.airbyte.protocol.models.AirbyteStream@3c479b7f[name=DATA_SOURCE,jsonSchema={"type":"object","properties":{"CREATED":{"type":"string"},"DSC_KEY":{"type":"number"},"UPDATED":{"type":"string"},"DSC_LONG":{"type":"string"},"DSC_ACTIVE":{"type":"number"},"DSC_PREFIX":{"type":"string"},"DSC_PREDICATE":{"type":"string"},"DSC_STRUCTURE_NAME":{"type":"string"}}},supportedSyncModes=[full_refresh, incremental],sourceDefinedCursor=<null>,defaultCursorField=[],sourceDefinedPrimaryKey=[],namespace=PD_UTL,additionalProperties={}],syncMode=full_refresh,cursorField=[],destinationSyncMode=overwrite,primaryKey=[],additionalProperties={}]],additionalProperties={}]

@AlexandruManolache1

Got it. You do have a use case for the default schemas, so you would want to be able to specify them in the configuration and retrieve information. Is that right?

Yes @davinchia. Exactly how you have configured it now, just without excluding some schemas in the backend.

@Brianlifung
Author

May I know whether it supports the Oracle 'CLOB' column type? I think I have an issue related to this.

@davinchia
Contributor

davinchia commented Sep 14, 2021

@Brianlifung can you post your error in the troubleshooting channel in our Slack?

@davinchia
Contributor

@AlexandruManolache1 I'm going to remove the default schemas right now

@Brianlifung
Author

@davinchia why remove the default schemas? With this in place I can work fine with the connector, since there are many schemas on my Oracle DB and it hits the limits without it.

@davinchia
Contributor

The latest Oracle connector has a default filter on the schemas returned that matches the user. It will not return the system schemas anymore.
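For readers landing here later, the idea of a "filter on the schemas returned that matches the user" can be sketched as scoping discovery to objects owned by the connecting login. This is an illustration under that assumption, not the connector's exact query; the connection details are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Locale;

public class UserScopedDiscovery {
    public static void main(String[] args) throws Exception {
        String loginUser = "APP_JEDOX_DATA"; // the username from the source config
        // Placeholder connection details.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1", loginUser, "secret");
             // Scope discovery to objects owned by the connecting user, so dictionary
             // schemas such as APEX_* or OLAPSYS never enter the catalog.
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT TABLE_NAME FROM ALL_TABLES WHERE OWNER = ?")) {
            ps.setString(1, loginUser.toUpperCase(Locale.ROOT));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(loginUser + "." + rs.getString("TABLE_NAME"));
                }
            }
        }
    }
}
```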
