
Monitor sweep index #4694

Open
mlissner opened this issue Nov 15, 2024 · 6 comments
@mlissner
Member

We need to monitor the new sweep index. From our earlier PR:

Do we just need to check on the cronjob tomorrow, then, I assume?

Yes, we could also re-check the ES query to confirm that the reindexing task is running.

GET _tasks?detailed=true&actions=*reindex

Originally posted by @albertisfu in #4672 (comment)

@mlissner mlissner self-assigned this Nov 15, 2024
@mlissner mlissner added this to Sprint Nov 15, 2024
@mlissner mlissner moved this to In progress in Sprint Nov 15, 2024
@mlissner
Member Author

@albertisfu, I spun this into a new issue, so we can discuss it here. I ran the tasks command just now and it actually had results:

{
  "nodes": {
    "nL1wCXyqQCW64ODQvubEnA": {
      "name": "elastic-cluster-es-master-data-nodes-v3-5",
      "transport_address": "172.30.0.203:9300",
      "host": "172.30.0.203",
      "ip": "172.30.0.203:9300",
      "roles": [
        "data",
        "data_cold",
        "data_content",
        "data_frozen",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "attributes": {
        "ml.allocated_processors": "4",
        "k8s_node_name": "ip-172-30-0-75.us-west-2.compute.internal",
        "ml.allocated_processors_double": "4.0",
        "ml.machine_memory": "31138512896",
        "xpack.installed": "true",
        "ml.max_jvm_size": "17179869184"
      },
      "tasks": {
        "nL1wCXyqQCW64ODQvubEnA:1261302059": {
          "node": "nL1wCXyqQCW64ODQvubEnA",
          "id": 1261302059,
          "type": "transport",
          "action": "indices:data/write/reindex",
          "status": {
            "total": 792895,
            "updated": 0,
            "created": 390000,
            "deleted": 0,
            "batches": 391,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
              "bulk": 0,
              "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1,
            "throttled_until_millis": 0
          },
          "description": "reindex from [recap_vectors] to [recap_sweep]",
          "start_time_in_millis": 1731628870357,
          "running_time_in_nanos": 2474749945113,
          "cancellable": true,
          "cancelled": false,
          "headers": {}
        }
      }
    }
  }
}

Surprising, no?
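As a side note, the fields that matter in that response reduce to a quick summary. A minimal sketch in plain Python (no client library; `summarize_reindex_tasks` is a hypothetical helper, not part of the codebase) that parses the response above:

```python
def summarize_reindex_tasks(tasks_response: dict) -> list[dict]:
    """Summarize each reindex task found in an ES _tasks API response."""
    summaries = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            status = task["status"]
            done = status["created"] + status["updated"] + status["deleted"]
            total = status["total"]
            summaries.append({
                "task_id": task_id,
                "description": task.get("description", ""),
                "progress_pct": round(100 * done / total, 1) if total else 0.0,
                # running_time_in_nanos -> minutes
                "running_minutes": round(task["running_time_in_nanos"] / 60e9, 1),
            })
    return summaries


# Trimmed-down version of the response above:
sample = {
    "nodes": {
        "nL1wCXyqQCW64ODQvubEnA": {
            "tasks": {
                "nL1wCXyqQCW64ODQvubEnA:1261302059": {
                    "status": {"total": 792895, "updated": 0,
                               "created": 390000, "deleted": 0},
                    "description": "reindex from [recap_vectors] to [recap_sweep]",
                    "running_time_in_nanos": 2474749945113,
                },
            },
        },
    },
}
print(summarize_reindex_tasks(sample))
# one task, ~49.2% done after ~41.2 minutes of running time
```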

@albertisfu
Contributor

Are there any cron job instances running for the command?

Is it possible that the code deployment restarted one of the cron job processes? Since you removed the Redis keys, it would just have started again.

According to the running time (2474749945113 ns), it has been running for only about 41 minutes.

The total number of documents targeted for re-indexing is 792,895, and it had only reached 390,000, even though the original run should have finished by now.

If not, we can cancel it with:

POST _tasks/nL1wCXyqQCW64ODQvubEnA:1261302059/_cancel

That way, when the cron job runs again, only one ES task will be running.

It might also be necessary to clean up the Redis keys again.
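To avoid cancelling the wrong thing, the task IDs can first be collected from the same `_tasks` response before issuing the `_cancel` call. A rough sketch in plain Python (the helper name is made up; only the `action` and `cancelled` fields come from the actual response format):

```python
def running_reindex_task_ids(tasks_response: dict) -> list[str]:
    """Collect IDs of live reindex tasks, suitable for POST _tasks/<id>/_cancel."""
    ids = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if (task.get("action") == "indices:data/write/reindex"
                    and not task.get("cancelled", False)):
                ids.append(task_id)
    return ids


# Stripped-down version of the response in the first comment:
sample = {
    "nodes": {
        "nL1wCXyqQCW64ODQvubEnA": {
            "tasks": {
                "nL1wCXyqQCW64ODQvubEnA:1261302059": {
                    "action": "indices:data/write/reindex",
                    "cancelled": False,
                },
            },
        },
    },
}
print(running_reindex_task_ids(sample))
# -> ['nL1wCXyqQCW64ODQvubEnA:1261302059']
```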

@mlissner
Member Author

Well, this took me down a rabbit hole, but luckily I had some time. We didn't have timezone information set on this or any of our other cronjobs, so I just audited and tweaked them all. This means this one won't run until tonight, so we'll have to check on it on Monday.

I did clear out the Redis keys a second time, though, so that should be good.

@ERosendo
Contributor

Here are the logs from the latest execution on November 15th:

2024-11-15 20:01:09.712 INFO Re-indexing task scheduled ID: 0cQl85qiTiyiNppgQkkPOA:1952097644
2024-11-15 20:02:09.717 INFO Task progress: 26000/627645 documents. Estimated time to finish: 1388.953806 seconds.
2024-11-15 20:17:09.720 INFO Task progress: 26000/627645 documents. Estimated time to finish: 22215.187001 seconds.
2024-11-15 20:32:09.723 INFO Task progress: 192000/627645 documents. Estimated time to finish: 4220.377473 seconds.
2024-11-15 20:47:09.727 INFO Task progress: 315000/627645 documents. Estimated time to finish: 2739.398105 seconds.
2024-11-15 21:02:09.730 INFO Task progress: 402000/627645 documents. Estimated time to finish: 2054.400233 seconds.
2024-11-15 21:17:09.734 INFO Task progress: 507000/627645 documents. Estimated time to finish: 1085.100631 seconds.
2024-11-15 21:32:09.738 INFO Task progress: 625000/627645 documents. Estimated time to finish: 60.0 seconds.

2024-11-15 21:33:09.760 INFO Resuming re-indexing process for date: 2024-11-15 00:00:00
2024-11-15 21:33:10.286 INFO Re-indexing task scheduled ID: 0cQl85qiTiyiNppgQkkPOA:1952406648
2024-11-15 21:34:10.289 INFO Task progress: 9000/564072 documents. Estimated time to finish: 3701.636893 seconds.
2024-11-15 21:49:10.292 INFO Task progress: 9000/564072 documents. Estimated time to finish: 59208.99034 seconds.
2024-11-15 22:04:10.296 INFO Task progress: 150000/564072 documents. Estimated time to finish: 5134.561793 seconds.
2024-11-15 22:19:10.299 INFO Task progress: 283000/564072 documents. Estimated time to finish: 2741.225053 seconds.
2024-11-15 22:34:10.303 INFO Task progress: 395000/564072 documents. Estimated time to finish: 1566.604886 seconds.
2024-11-15 22:49:10.306 INFO Task progress: 537000/564072 documents. Estimated time to finish: 229.886932 seconds.

I've included timestamps to help us understand how long it takes to reindex records.
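The "Estimated time to finish" figures look consistent with a simple average-rate extrapolation. A sketch of that arithmetic (hypothetical; not necessarily how the command actually computes it):

```python
def eta_seconds(done: int, total: int, elapsed_seconds: float) -> float:
    """Naive ETA: assume the average indexing rate so far continues to the end."""
    if done <= 0:
        return float("inf")  # no progress yet, so the ETA is undefined
    rate = done / elapsed_seconds  # documents per second so far
    return (total - done) / rate


# First progress line above: 26000/627645 documents after roughly one minute.
print(round(eta_seconds(26000, 627645, 60.0), 1))
# close to the logged 1388.953806 s; the exact value depends on the measured elapsed time
```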

It looks like the missing_error_index issue is resolved. However, the command was unable to complete due to issue #4698.

@mlissner
Member Author

Cool. Sounds like we should close #4646 and put #4698 on our next sprint, with this as a parent?

@ERosendo
Contributor

Sounds like we should close #4646 and put #4698 on our next sprint, with this as a parent?

Sounds good! Let's close #4646 and move #4698 to the next sprint.
