
Oracle connector reads all tables in ALL schemas under the same DB, not only the tables in the schema the user ID can access #4944

Closed
Brianlifung opened this issue Jul 23, 2021 · 18 comments · Fixed by #5779

Comments

@Brianlifung

Brianlifung commented Jul 23, 2021

Environment

  • Airbyte version: 0.27.3-alpha
  • OS Version / Instance: Ubuntu 16.04.7 LTS
  • Deployment: Docker
  • Source Connector and version: Oracle DB connector: 0.3.1
  • Destination Connector and version: Snowflake 0.3.11
  • Severity: HIGH
  • Step where error happened: Setup new connection

Current Behavior

Step: I set the source to Oracle and the destination to Snowflake. Both the source and destination passed their checks, but setup hangs when creating the connection. (There is one ~4 GB table in the Oracle schema.)
Loading a CSV file from S3 to Snowflake works, but Oracle -> Snowflake hangs without an error while setting up the connection.
After checking with Davin (thanks), the log contains "Caused by: io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: grpc: received message larger than max (5420089 vs. 4194304)", so it seems too many tables are being loaded from the DB.
However, I followed the steps under https://docs.airbyte.io/integrations/sources/oracle#2-create-a-dedicated-read-only-user-with-access-to-the-relevant-tables-recommended-but-optional and created a new user with access to a single table. Checking the log further, e.g.:
2021-07-23 09:15:37 WARN i.a.i.s.j.AbstractJdbcSource(lambda$discoverInternal$26):490 - {} - Could not convert column: "START_DATE" from table: "APEX_180200"."APEX_APPLICATION_PAGE_IR_SUB" with type: "-102". Casting to VARCHAR.
another schema, "APEX_180200" (which the user cannot access), shows up. So the issue may be that Airbyte scans all schemas and all tables in the DB when setting up a new connection, not just the tables the Oracle user ID can access.
There are many schemas and many tables on this Oracle DB, but I created a new user with access to one table on the same DB for extraction with Airbyte.
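For context on how discovery can produce such a large catalog, here is a minimal JDBC sketch. It is not the connector's actual code; the host, service name, and credentials are placeholders. Passing a null schema pattern to DatabaseMetaData.getTables returns tables from every schema the data dictionary exposes, while passing the connecting user's schema keeps the result small:

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class DiscoverAllVsOneSchema {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details, not the host/credentials from this issue.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1", "APP_USER", "secret")) {
            DatabaseMetaData meta = conn.getMetaData();

            // schemaPattern = null: every schema in the dictionary (APEX_*, OLAPSYS, SYS, ...)
            // is walked, which is how a discovered catalog can grow past the ~4 MB gRPC limit.
            try (ResultSet rs = meta.getTables(null, null, null, new String[] {"TABLE"})) {
                int count = 0;
                while (rs.next()) {
                    count++;
                }
                System.out.println("tables discovered with no schema filter: " + count);
            }

            // schemaPattern = the connecting user: discovery stays limited to that schema.
            try (ResultSet rs = meta.getTables(null, "APP_USER", null, new String[] {"TABLE"})) {
                while (rs.next()) {
                    System.out.println(rs.getString("TABLE_SCHEM") + "." + rs.getString("TABLE_NAME"));
                }
            }
        }
    }
}
```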

Expected Behavior

Airbyte should only scan/discover the schemas the user ID can access/select (this would also avoid the issue in #3943, "Discover schema fails when databases have too many tables"). In my setup the user ID can access only 1 table in 1 schema.
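As an illustration of that expected scoping, a query against the Oracle ALL_TABLES view only returns tables the session user actually has privileges on. This is a sketch under that assumption, not the connector's discovery query; the connection URL and password are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ListAccessibleTables {
    public static void main(String[] args) throws Exception {
        // Placeholder URL/password; the user mirrors the dedicated read-only user from the docs.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1", "APP_JEDOX_DATA", "secret");
             Statement stmt = conn.createStatement();
             // ALL_TABLES lists only tables the current session user can access, so a user
             // granted SELECT on a single table sees that table (plus whatever else it
             // happens to have privileges on), not every table in the DB.
             ResultSet rs = stmt.executeQuery(
                     "SELECT OWNER, TABLE_NAME FROM ALL_TABLES ORDER BY OWNER, TABLE_NAME")) {
            while (rs.next()) {
                System.out.println(rs.getString("OWNER") + "." + rs.getString("TABLE_NAME"));
            }
        }
    }
}
```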

Logs

 "2021-07-23 09:14:49 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-07-23 09:14:49 [32mINFO[m i.a.i.b.IntegrationRunner(run):83 - {} - Integration config: IntegrationConfig{command=DISCOVER, configPath='source_config.json', catalogPath='null', statePath='null'}"


## and the source_config.json contains the user ID: APP_JEDOX_DATA
{"host":"172.23.10.128","port":51521,"sid":"LFTDWH.SERVER","username":"APP_JEDOX_DATA","password":


## but the log also includes another schema name: APEX_180200
2021-07-23 09:15:37 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-07-23 09:15:37 WARN i.a.i.s.j.AbstractJdbcSource(lambda$discoverInternal$26):490 - {} - Could not convert column: "START_DATE" from table: "APEX_180200"."APEX_APPLICATION_PAGE_IR_SUB" with type: "-102". Casting to VARCHAR.

Steps to Reproduce

Are you willing to submit a PR?


@Brianlifung Brianlifung added the type/bug Something isn't working label Jul 23, 2021
@sherifnada sherifnada added the area/connectors Connector related issues label Jul 23, 2021
@AlexandruManolache1

AlexandruManolache1 commented Jul 27, 2021

Hello,
this is very much the same as #4906.
We also need OLAPSYS and APEX_200200 to be excluded, or make it configurable somehow.
These are public tables in Oracle, and it is not possible for a user not to see them.
We cannot "hide" them at the DB level, so we need to tackle this in Airbyte.

@Brianlifung
Author

(image attached)

There are many schemas on our Oracle DB; if there is no way to filter them out, we hit issue #3943.

@AlexandruManolache1

Yes, I agree @Brianlifung.

@davinchia
Contributor

davinchia commented Aug 23, 2021

Hi, thanks for reporting this. We have a backlog of connector work at the moment, so I don't think we'll get to this for another 1–2 months. We do accept contributions :)

@davinchia
Contributor

@Brianlifung @AlexandruManolache1 do you want to give this another shot?

@AlexandruManolache1

AlexandruManolache1 commented Sep 3, 2021

Hi @davinchia, I am now on 0.29.13-alpha; I guess I need to update to 0.29.14-alpha?
Update: I saw the update, I am testing it today :)

@davinchia
Contributor

@AlexandruManolache1 updating the connector should be sufficient

@AlexandruManolache1

AlexandruManolache1 commented Sep 3, 2021

Hi, it works OK. The only thing worth mentioning is that the exclusion of the default schemas is still in place, e.g. HR.

@davinchia
Contributor

@AlexandruManolache1 that is correct, yes? We don't want the default schemas to show?

@AlexandruManolache1

AlexandruManolache1 commented Sep 4, 2021

Considering that we can specify just the schemas we need, I would remove that filter.
It's simpler with no filter: just specify the ones you want to pull data from.

Also, for my use case, I have some data I need to sync from the HR schema.

@davinchia
Contributor

Got it. You do have a use case for the default schemas, so you would want to be able to specify them in the configuration and retrieve information. Is that right?

@Brianlifung
Author

I can pick the tables now, but it failed on run:
2021-09-06 03:58:49 INFO () LogClientSingleton(setJobMdc):146 - Setting docker job mdc
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):196 - {} - Finalizing tables in destination completed.
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):199 - {} - Cleaning tmp tables in destination started for 1 streams
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):203 - {} - Cleaning tmp table in destination started for stream DATA_SOURCE. schema PD_UTL, tmp table name: _airbyte_tmp_blp_DATA_SOURCE
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):208 - {} - Cleaning tmp tables in destination completed.
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.b.IntegrationRunner(run):153 - {} - Completed integration: io.airbyte.integrations.destination.snowflake.SnowflakeDestination
2021-09-06 03:58:50 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-06 03:58:50 INFO i.a.i.d.s.SnowflakeDestination(main):83 - {} - completed destination: class io.airbyte.integrations.destination.snowflake.SnowflakeDestination
2021-09-06 03:58:51 INFO () DefaultReplicationWorker(run):144 - Destination thread complete.
2021-09-06 03:58:51 ERROR () DefaultReplicationWorker(run):148 - Sync worker failed.
io.airbyte.workers.WorkerException: Source process exit with code 1. This warning is normal if the job was cancelled.
    at io.airbyte.workers.protocols.airbyte.DefaultAirbyteSource.close(DefaultAirbyteSource.java:135) ~[io.airbyte-airbyte-workers-0.29.14-alpha.jar:?]
    at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:146) ~[io.airbyte-airbyte-workers-0.29.14-alpha.jar:?]
    at io.airbyte.workers.DefaultReplicationWorker.run(DefaultReplicationWorker.java:52) ~[io.airbyte-airbyte-workers-0.29.14-alpha.jar:?]
    at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:152) ~[io.airbyte-airbyte-workers-0.29.14-alpha.jar:?]
    at java.lang.Thread.run(Thread.java:832) [?:?]
2021-09-06 03:58:51 INFO () DefaultReplicationWorker(run):172 - sync summary: io.airbyte.config.ReplicationAttemptSummary@60de167c[status=failed,recordsSynced=0,bytesSynced=0,startTime=1630900689409,endTime=1630900731802]
2021-09-06 03:58:51 INFO () DefaultReplicationWorker(run):181 - Source did not output any state messages
2021-09-06 03:58:51 WARN () DefaultReplicationWorker(run):189 - State capture: No new state, falling back on input state: io.airbyte.config.State@6b1d9b[state={}]
2021-09-06 03:58:51 INFO () TemporalAttemptExecution(get):139 - Stopping cancellation check scheduling...
2021-09-06 03:58:51 INFO () SyncWorkflow$ReplicationActivityImpl(replicate):184 - sync summary: io.airbyte.config.StandardSyncOutput@7149004a[standardSyncSummary=io.airbyte.config.StandardSyncSummary@344b8007[status=failed,recordsSynced=0,bytesSynced=0,startTime=1630900689409,endTime=1630900731802],state=io.airbyte.config.State@6b1d9b[state={}],outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@44d3d50f[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@53590c3[stream=io.airbyte.protocol.models.AirbyteStream@3c479b7f[name=DATA_SOURCE,jsonSchema={"type":"object","properties":{"CREATED":{"type":"string"},"DSC_KEY":{"type":"number"},"UPDATED":{"type":"string"},"DSC_LONG":{"type":"string"},"DSC_ACTIVE":{"type":"number"},"DSC_PREFIX":{"type":"string"},"DSC_PREDICATE":{"type":"string"},"DSC_STRUCTURE_NAME":{"type":"string"}}},supportedSyncModes=[full_refresh, incremental],sourceDefinedCursor=<null>,defaultCursorField=[],sourceDefinedPrimaryKey=[],namespace=PD_UTL,additionalProperties={}],syncMode=full_refresh,cursorField=[],destinationSyncMode=overwrite,primaryKey=[],additionalProperties={}]],additionalProperties={}]

@AlexandruManolache1

Got it. You do have a use case for the default schemas, so you would want to be able to specify them in the configuration and retrieve information. Is that right?

Yes @davinchia. Exactly how you have configured it now, just without excluding some schemas in the backend.

@Brianlifung
Author

May I know whether it supports the Oracle 'CLOB' column type? I think I have an issue related to this.

@davinchia
Contributor

davinchia commented Sep 14, 2021

@Brianlifung can you post your error in the troubleshooting channel in our Slack?

@davinchia
Contributor

@AlexandruManolache1 I'm going to remove the default schemas right now

@Brianlifung
Author

@davinchia why remove the default schemas? With this in place I can work fine with the connector, since there are many schemas on my Oracle DB and it hits the limits without it.

@davinchia
Contributor

The latest Oracle connector has a default filter on the schemas returned that matches the user. It will not return the system schemas anymore.
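For readers landing here later, the idea of a "filter on the schemas returned that matches the user" can be sketched as scoping discovery to objects owned by the connecting login. This is an illustration under that assumption, not the connector's exact query; the connection details are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Locale;

public class UserScopedDiscovery {
    public static void main(String[] args) throws Exception {
        String loginUser = "APP_JEDOX_DATA"; // the username from the source config
        // Placeholder connection details.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1", loginUser, "secret");
             // Scope discovery to objects owned by the connecting user, so dictionary
             // schemas such as APEX_* or OLAPSYS never enter the catalog.
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT TABLE_NAME FROM ALL_TABLES WHERE OWNER = ?")) {
            ps.setString(1, loginUser.toUpperCase(Locale.ROOT));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(loginUser + "." + rs.getString("TABLE_NAME"));
                }
            }
        }
    }
}
```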
