Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improvement](jdbc catalog) Optimize JdbcCatalog case mapping stability #41510

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zy-kkk
Copy link
Member

@zy-kkk zy-kkk commented Sep 30, 2024

This PR makes the following changes to the uppercase and lowercase mapping of JdbcCatalog

  1. The identifierMapping is managed by JdbcExternalCatalog instead of JdbcClient to better control its lifecycle
  2. The identifierMapping no longer loads remoteName alone, but Catalog controls the loading uniformly
  3. The identifierMapping will be loaded when each FE performs makeSureInitialized() to ensure that each FE has a mapping
  4. The initialization of mapping will only be performed once in makeSureInitialized(), which means that even if you use metaCache, if your source data is updated when identifierMapping is enabled, you must refresh the catalog to query normally.
  5. The identifierMapping is only responsible for the properties of the Catalog and is no longer affected by the fe config, simplifying the processing logic
  6. If lower_case_mete_names is false and meta_names_mapping is empty in the catalog properties, the identifierMapping will no longer take effect, further enhancing the stability of the default settings
  7. The JdbcClient is no longer closed during onRefreshCache, reducing the repeated creation of resources, improving reuse, and reducing the leakage of some global shared threads

…ty (apache#40891)

This PR makes the following changes to the uppercase and lowercase
mapping of JdbcCatalog
1. The identifierMapping is managed by JdbcExternalCatalog instead of
JdbcClient to better control its lifecycle
2. The identifierMapping no longer loads remoteName alone, but Catalog
controls the loading uniformly
3. The identifierMapping will be loaded when each FE performs
makeSureInitialized() to ensure that each FE has a mapping
4. The initialization of mapping will only be performed once in
makeSureInitialized(), which means that even if you use metaCache, if
your source data is updated when identifierMapping is enabled, you must
refresh the catalog to query normally.
5. The identifierMapping is only responsible for the properties of the
Catalog and is no longer affected by the fe config, simplifying the
processing logic
6. If lower_case_mete_names is false and meta_names_mapping is empty in
the catalog properties, the identifierMapping will no longer take
effect, further enhancing the stability of the default settings
7. The JdbcClient is no longer closed during onRefreshCache, reducing
the repeated creation of resources, improving reuse, and reducing the
leakage of some global shared threads
@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zy-kkk
Copy link
Member Author

zy-kkk commented Sep 30, 2024

run buildall

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add need to add more FE unit test to test all these logics

String mappedName = nameMapping.getOrDefault(name, name);
String localName = isLowerCaseMetaNames ? mappedName.toLowerCase() : mappedName;
public interface IdentifierMapping {
List<String> fromRemoteDatabaseName(List<String> remoteDatabaseNames);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment for all these methods

localNameToRemoteName.computeIfAbsent(localName, k -> name);

if (isLowerCaseMetaNames && !lowerCaseNames.add(localName)) {
if (nameMap.containsKey(localName)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the nameMap must contains localName if goes here?

+ " use meta_name_mapping to specify the column names.");
}
for (int i = 0; i < remoteColumns.size(); i++) {
remoteColumns.get(i).setName(localColumnNames.get(i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not modify the input parameter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think, the method fromRemoteColumnName should be act same as fromRemoteDbName and fromRemoteTableName, to return List<String> instead of List<Column>.
Which means these method only handle Names, you can wrap it and generate Column object outside

return columnMap;
}

private <K, V> V getRequiredMapping(Map<K, V> map, K key, String typeName, String entityName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entityName seems unnecessary.

private <K, V> V getRequiredMapping(Map<K, V> map, K key, String typeName, String entityName) {
V value = map.get(key);
if (value == null) {
LOG.warn("No remote {} found for {}: {}. Please refresh this catalog.", typeName, typeName, entityName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG.warn("No remote {} found for {}: {}. Please refresh this catalog.", typeName, typeName, entityName);
LOG.warn("No remote {} found for: {}. Please refresh this catalog.", typeName, key);

}

@Override
protected void buildDatabaseMapping() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method's logic is same as listDatabaseNames(), extract one

@@ -149,6 +150,9 @@ public abstract class ExternalCatalog
protected Optional<Boolean> useMetaCache = Optional.empty();
protected MetaCache<ExternalDatabase<? extends ExternalTable>> metaCache;

protected IdentifierMapping identifierMapping;
private boolean mappingsInitialized = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why using a separator flag mappingsInitialized?
Why not just using initialized?

public List<String> fromRemoteTableName(String remoteDbName, List<String> remoteTableNames) {
// If mapping is not required, return the original input
if (!isLowerCaseMetaNames && isMappingInvalid()) {
return remoteTableNames;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If user set Config.lower_case_table_names = 1,
I think we also need to handle the name mapping?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants