-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add lark sheets connector #17218
Add lark sheets connector #17218
Conversation
c7198fd
to
65ac8f6
Compare
020e181
to
6f96def
Compare
Hello, @ChunxuTang @beinan. Could you help review this PR? |
Thanks, @7c00 for your contribution, we were planning to request @ChunxuTang and @beinan as they helped us earlier to review the google sheet connector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @7c00, thanks a lot for your work! This looks to be a very nice feature, and the Presto community could benefit.
Just left some suggestions/comments. For the first round of review, I mainly focused on the coding styles. Thanks again for your contributions!
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/SheetsMetadata.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/SheetsMetadata.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/SheetsMetadata.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/SheetsRecordCursor.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/SheetsRecordCursor.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/SheetsSplit.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/api/SheetsApiFactory.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/api/SheetsApiFactory.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/SheetsConfig.java
Outdated
Show resolved
Hide resolved
@@ -0,0 +1 @@ | |||
ewogICJhcHAtc2VjcmV0IiA6ICJYSHdrckVna0N6Y3dFZ0NDNlpQQjRnWXZtb25OSmdNTiIKfQ== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a TTL for this secret? And is there an owner email account? There's a potential risk here that when the secret expires, it may break presto CI workflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no TTL for the secret. The secret is not bound to an email account.
I wonder if we should disable TestSheetsIntegration
by default as TestGoogleSheets
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we may need to disable it.
Hi @beinan, could you help disable this integration test by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's disable the test, otherwise it would be too flaky I guess
We could use @Test(enabled = false)
in the integration test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disabled all the tests in TestLarkSheetsIntegration
.
d1c4dd5
to
91d0063
Compare
@ChunxuTang Thanks a lot for your comments. I haved update the code. Could you please review it again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @7c00, thanks for your revision!
Left a few minor suggestions/comments.
|
||
connector.name=lark-sheets | ||
app-domain=FEISHU | ||
app-id=cli_a16fc7ddf4b9900d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering is this a real app id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a real app id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering as this is a real ID, is it OK to expose it in an example file? Maybe just use exampleAppId
?
Your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to a fake app id as example_app_id
.
|
||
The ``token`` property is taken from the last path segment of the spreadsheet url, e.g. | ||
``shtcnBf5pg4BNSkwV2Ku5xwW9Pf`` for | ||
``https://test-ch80md45anra.feishu.cn/sheets/shtcnBf5pg4BNSkwV2Ku5xwW9Pf?sheet=MT1p4I`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is a real spreadsheet. Will the link expire?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it will not expire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it more readable, I might use a pattern string rather than a real link. e.g. https://{account_id}.feishu.ch/sheet/{token}/sheet={sheet_id}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the pattern string. And give an example as well.
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/SheetsMetadata.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/SheetsMetadata.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/SheetsRecordCursor.java
Outdated
Show resolved
Hide resolved
presto-lark-sheets/src/main/java/com/facebook/presto/lark/sheets/api/SheetsApiFactory.java
Outdated
Show resolved
Hide resolved
public class SimpleSheetsApi | ||
implements SheetsApi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering as the CachingSheetsApi
is implemented in a separate file, why do you prefer to implement the SimpleSheetsApi
in the factory class? Any specific reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible some of my perference during development I guess. Have rafactored SimpleSheetsApi to a top level class to make it matched the code style of CachingSheetsApi.
presto-lark-sheets/src/test/java/com/facebook/presto/lark/sheets/TestSheetsPlugin.java
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,70 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor suggestion: As you added some utility functions to SheetUtil
, would you mind also testing them here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Added tests for new functions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you add them? Forgot to commit changes?
It doesn't show any changes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests was added to another class by mistake. Have corrected this and moved the test to TestLarkSheetsUtil
.
@ChunxuTang Thanks for your comments. ❤️ |
Sure! I'll take another round of review very soon this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@7c00 Thanks for your contributions!
As you created this PR some time ago and it is now left a bit behind the master branch, could you merge master into this feature branch to avoid conflicts?
9866ddf
to
b8cd8f8
Compare
|
||
The ``token`` property is taken from the last path segment of the spreadsheet url, e.g. | ||
``shtcnBf5pg4BNSkwV2Ku5xwW9Pf`` for | ||
``https://test-ch80md45anra.feishu.cn/sheets/shtcnBf5pg4BNSkwV2Ku5xwW9Pf?sheet=MT1p4I`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it more readable, I might use a pattern string rather than a real link. e.g. https://{account_id}.feishu.ch/sheet/{token}/sheet={sheet_id}
|
||
import static java.util.Objects.requireNonNull; | ||
|
||
public class SheetsConnector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be more consistent, can we use LarkSheetsConnector as the class name? Similarly, other class names should also use LarkSheets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine. Renamed to LarkSheetsConnector and prefixed other class with Lark as well.
@@ -0,0 +1 @@ | |||
ewogICJhcHAtc2VjcmV0IiA6ICJYSHdrckVna0N6Y3dFZ0NDNlpQQjRnWXZtb25OSmdNTiIKfQ== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's disable the test, otherwise it would be too flaky I guess
We could use @Test(enabled = false)
in the integration test.
fb67c06
to
aefb67a
Compare
@ChunxuTang @beinan Thanks for your comments. I have rebased my branch with master and updated the code as the comments. Could you please review it again? Thanks in advance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just left a couple comments for some minor issues, I will continue review this giant pr in this week. Many thanks for this create contribution!
LarkSheetsApi api = apiFactory.get(); | ||
LarkSheetsApi cachingApi = new CachingLarkSheetsApi(api); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: inline api?
LarkSheetsApi api = apiFactory.get(); | |
LarkSheetsApi cachingApi = new CachingLarkSheetsApi(api); | |
LarkSheetsApi cachingApi = new CachingLarkSheetsApi(apiFactory.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
SCHEMA_TOKEN_NOT_PROVIDED(0x03, USER_ERROR), | ||
SCHEMA_NOT_EXISTS(0x04, USER_ERROR), | ||
SHEET_NAME_AMBIGUOUS(0x04, USER_ERROR), | ||
SHEETS_API_ERROR(0x10, EXTERNAL), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this plural? if the plural is necessary, it might be better to follow alphabetical order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a typo. Have reorderred the error codes to three catagories.
SCHEMA_ALREADY_EXISTS(0x03, USER_ERROR), | ||
SCHEMA_TOKEN_NOT_PROVIDED(0x03, USER_ERROR), | ||
SCHEMA_NOT_EXISTS(0x04, USER_ERROR), | ||
SHEET_NAME_AMBIGUOUS(0x04, USER_ERROR), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why some of the error are sharing the same error code? the 0x03 and 0x04 are used two times here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a typo. Have reorderred the error codes to three catagories.
} | ||
else if (matched.size() > 1) { | ||
throw new PrestoException(LarkSheetsErrorCode.SHEET_NAME_AMBIGUOUS, | ||
format("Ambiguous name %s in spreadsheet %s: matched sheets: %s", sheetName, token, matched)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we throw an exception with the token? will there be any security concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting the raw tokens in exception messages might cause data leak. Have masked the token before writing the token.
public List<SchemaTableName> listTables(ConnectorSession session, Optional<String> schemaName) | ||
{ | ||
if (!schemaName.isPresent()) { | ||
throw new PrestoException(NOT_PERMITTED, "Schema is required to list tables"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: any default schema for lark/feishu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default schema is not supported, since we dont know which spreadsheet shoud be associated with the schema.
public void renameSchema(ConnectorSession session, String source, String target) | ||
{ | ||
LarkSheetsSchema schema = requireVisibleSchema(session, source); | ||
checkSchemaUpdatable(schema, session.getUser(), "rename"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to use constant or enum for the operations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operation parameter in checkSchemaUpdatable
is used for generating an error message. So I think it might be not neccessary to define constants of enums for it.
LinkedHashMap<String, LarkSheetsColumnHandle> columns = new LinkedHashMap<>(numColumns); | ||
for (int i = 0; i < numColumns; i++) { | ||
String rawColumnName = header.get(i); | ||
if (rawColumnName == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just thinking if we could use the column index when the column name is null. Your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The column name can be a sequence of any UTF-8 characters. So it might be hard to define the column index name spec. For example, using $2
as the column index to reference the second column, it might conflict with another column with the name of $2
. In the other hand, updating a spreadsheet to fill a column name is very easy for end users. So column index name is not be supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, thanks for the clarification!
@ChunxuTang @beinan Hi, thanks for your review. ❤️ Do we need more review before merge this pr? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks good to me except two very minor issues.
this.api = requireNonNull(api, "api is null"); | ||
this.split = requireNonNull(split, "split is null"); | ||
this.readColumns = requireNonNull(readColumns, "columns is null"); | ||
this.indexMapping = createIndexMapping(readColumns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary the ".this"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
// Used for fast switch to enable or disable tests | ||
private static final boolean TEST_ENABLED = false; | ||
private static final String CATALOG = "larksheets"; | ||
// https://test-ch80md45anra.feishu.cn/sheets/shtcnBf5pg4BNSkwV2Ku5xwW9Pf?sheet=MT1p4I |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's this comment for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This url points to the spreadsheet used for this integration test. The TESTING_TOKEN
is extracted from the url. I have added comments to make it more clear.
08e82ab
to
ee0facb
Compare
ping @beinan @ChunxuTang |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @7c00 Thanks for your nice work!
Rebased with master branch. |
5585ba4
to
0f6088f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, rerun the test
Thanks @beinan ❤️ . This CI finished. Could we merge this PR? |
Thanks @beinan @ChunxuTang for your time to review! |
@7c00 Do you think you can provide short documentation on how to use this connector that we can publish at https://prestodb.io/docs/current/connector.html ? |
@rohanpednekar Yes. The connector doc is included in the PR. It's https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/larksheets.rst. It's a bit strange that it's not displayed in prestodb website. |
Thanks @7c00 for the confirmation, I guess this ll get published with the 0.272 release notes PR |
This pull introduces Lark Sheets Connector, which is inspired by Google Sheets Connector. Though there are tiny differences between them.
Developers could run
LarkSheetsQueryRunners#main
, and then follow the QueryModel section from the doc to try the connector.Test plan