Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CCR] Add time since last auto follow fetch to auto follow stats #36542

Merged

Conversation

martijnvg
Copy link
Member

For each remote cluster the auto follow coordinator, starts an auto
follower that checks the remote cluster state and determines whether an
index needs to be auto followed. The time since last auto follow is
reported per remote cluster and gives insight whether the auto follow
process is alive.

Relates to #33007
Originates from #35895

For each remote cluster the auto follow coordinator, starts an auto
follower that checks the remote cluster state and determines whether an
index needs to be auto followed. The time since last auto follow is
reported per remote cluster and gives insight whether the auto follow
process is alive.

Relates to elastic#33007
Originates from elastic#35895
@martijnvg martijnvg added >enhancement v7.0.0 :Distributed/CCR Issues around the Cross Cluster State Replication features v6.6.0 labels Dec 12, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@@ -73,24 +90,28 @@ public static AutoFollowStats fromXContent(final XContentParser parser) {
private final long numberOfFailedRemoteClusterStateRequests;
private final long numberOfSuccessfulFollowIndices;
private final NavigableMap<String, ElasticsearchException> recentAutoFollowErrors;
private final NavigableMap<String, Long> trackingRemoteClusters;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether all stats should be reported per remote cluster, so numberOfSuccessfulFollowIndices, recentAutoFollowErrors etc. would then be reported on a per remote cluster bases. This makes more sense to me now considering how the auto follow coordinator currently operates.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @martijnvg. This looks good. I left some comments.

@@ -156,7 +156,7 @@ public Ccr(final Settings settings) {

return Arrays.asList(
ccrLicenseChecker,
new AutoFollowCoordinator(client, clusterService, ccrLicenseChecker)
new AutoFollowCoordinator(client, clusterService, ccrLicenseChecker, System::nanoTime)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use ThreadPool::relativeTimeInMillis.


private volatile long lastAutoFollowTime = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the time unit to the variable names (relativeTimeProvider and lastAutoFollowTime)?

long numberOfFailedRemoteClusterStateRequests,
long numberOfSuccessfulFollowIndices,
NavigableMap<String, ElasticsearchException> recentAutoFollowErrors,
NavigableMap<String, Long> trackingRemoteClusters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this parameter name (trackingRemoteClusters).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, current name is not so good. I'm thinking about auto_followed_cluster and also including the last_seen_metadata version together with time_since_last_auto_follow_millis.

@martijnvg martijnvg mentioned this pull request Dec 13, 2018
10 tasks
@martijnvg
Copy link
Member Author

@dnhatn Thanks for look and I've updated the PR.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments around the time unit. Feel free to merge it after addressing these.

@@ -79,10 +84,13 @@
public AutoFollowCoordinator(
Client client,
ClusterService clusterService,
CcrLicenseChecker ccrLicenseChecker) {
CcrLicenseChecker ccrLicenseChecker,
LongSupplier relativeNanoTimeProvider) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this became milliseconds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, I completely missed that. Good catch!

long lastSeenMetadataVersion = entry.getValue().metadataVersion;
if (lastAutoFollowTimeInNanos != -1) {
long timeSinceLastAutoFollowInMillis =
TimeUnit.NANOSECONDS.toMillis(relativeNanoTimeProvider.getAsLong() - lastAutoFollowTimeInNanos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the time unit is ms, we should remove this conversion.

"cluster_name": {
"type": "keyword"
},
"time_since_last_auto_follow_started_millis": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe drop started in the field name?


private volatile long lastAutoFollowTimeInNanos = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also became ms now.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@martijnvg
Copy link
Member Author

run the gradle build tests 1

@bleskes
Copy link
Contributor

bleskes commented Dec 17, 2018

I wonder if we should call this time_since_last_check. Fetch sounds implementation specific IMO.

@martijnvg
Copy link
Member Author

I wonder if we should call this time_since_last_check. Fetch sounds implementation specific IMO.

👍 I will rename it to time_since_last_check

@martijnvg martijnvg merged commit a181a25 into elastic:master Dec 17, 2018
martijnvg added a commit that referenced this pull request Dec 17, 2018
)

For each remote cluster the auto follow coordinator, starts an auto
follower that checks the remote cluster state and determines whether an
index needs to be auto followed. The time since last auto follow is
reported per remote cluster and gives insight whether the auto follow
process is alive.

Relates to #33007
Originates from #35895
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Dec 18, 2018
* master: (30 commits)
  Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (elastic#35320)"
  Deprecate types in get_source and exist_source (elastic#36426)
  Fix duplicate phrase in shrink/split error message (elastic#36734)
  ingest: support default pipelines + bulk upserts (elastic#36618)
  TESTS:Debug Log. IndexStatsIT#testFilterCacheStats
  [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (elastic#35320)
  [TEST] fix float comparison in RandomObjects#getExpectedParsedValue
  Initialize startup `CcrRepositories` (elastic#36730)
  ingest: fix on_failure with Drop processor (elastic#36686)
  SNAPSHOTS: Adjust BwC Versions in Restore Logic (elastic#36718)
  [Painless] Add boxed type to boxed type casts for method/return (elastic#36571)
  Do not resolve addresses in remote connection info (elastic#36671)
  Add back one line removed by mistake regarding java version check and COMPAT jvm parameter existence
  Fixing line length for EnvironmentTests and RecoveryTests (elastic#36657)
  SQL: Fix translation of LIKE/RLIKE keywords (elastic#36672)
  [DOCS] Adds monitoring requirement for ingest node (elastic#36665)
  SNAPSHOTS: Disable BwC Tests Until elastic#36659 Landed (elastic#36709)
  Add doc's sequence number + primary term to GetResult and use it for updates (elastic#36680)
  [CCR] Add time since last auto follow fetch to auto follow stats (elastic#36542)
  Watcher accounts constructed lazily (elastic#36656)
  ...
@colings86 colings86 removed the v7.0.0 label Feb 7, 2019
@codebrain
Copy link
Contributor

Would it be possible to update the documentation? https://www.elastic.co/guide/en/elasticsearch/reference/6.7/ccr-get-stats.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/CCR Issues around the Cross Cluster State Replication features >enhancement v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants