fault tolerance and hdds.datanode.replication.work.dir #7525

julienlau · 2024-12-03T10:48:21Z

Hello,

I use Ozone 1.4.0 and I am concerned about hdds.datanode.replication.work.dir (default value: /tmp).

1/ Is it used only for replication, or also for erasure coding?
2/ The Cloudera documentation at https://docs.cloudera.com/storage/latest/storage-options/topics/ozone-performance-tuning-for-ozone.html states:

The /tmp directory is used for staging transient files as part of data replication. Cloudera recommends a minimum of 50 GB size for the Datanode Replication Working Directory. Cloudera recommends you to use SSD for better performance during data replication. It can use the same SSD as datanode metadata.

Is it fault tolerant, or is it a point of failure for a datanode?
If the drive hosting /tmp fails or is full, does my datanode crash?

Regards

Answered by errose28

errose28 · 2024-12-13T22:26:14Z

Collaborator

Hi @julienlau, this config has been deprecated since #3648. In 1.4.0, datanodes automatically import containers into a tmp directory on the same storage volume that the container is destined for. The Cloudera docs are outdated.

For context, datanodes need a staging directory to import containers into before moving them to the main working directory. This ensures that partial state is not left behind in the main datanode working directories of the storage volumes if an import fails partway through. Previously the staging directory was simply /tmp, which raised concerns about space, throughput, and non-atomic moves across filesystems.
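To illustrate the stage-then-move idea, here is a minimal Python sketch. It is not Ozone code: the function name `import_container` and the `tmp` and `containers` subdirectory names are hypothetical, chosen only to show why staging on the same volume matters. A rename within one filesystem is atomic on POSIX, while a rename across filesystems fails and has to fall back to a copy.

```python
import os
import shutil
import tempfile
from pathlib import Path


def import_container(source_dir: Path, volume_root: Path, container_id: int) -> Path:
    """Stage a container under <volume>/tmp, then rename it into place.

    The directory layout is hypothetical and only illustrates the pattern.
    """
    staging_root = volume_root / "tmp"
    final_root = volume_root / "containers"
    staging_root.mkdir(parents=True, exist_ok=True)
    final_root.mkdir(parents=True, exist_ok=True)

    # Unique scratch directory on the same volume as the final location.
    staging = Path(tempfile.mkdtemp(prefix=f"import-{container_id}-", dir=staging_root))
    final = final_root / str(container_id)
    try:
        # If the copy fails, nothing has been written under final_root.
        shutil.copytree(source_dir, staging / "data")
        # Same filesystem, so the rename is atomic on POSIX systems.
        # Across filesystems os.rename raises OSError instead.
        os.rename(staging / "data", final)
    finally:
        # Remove the scratch directory whether the import succeeded or not.
        shutil.rmtree(staging, ignore_errors=True)
    return final
```

If staging lived on a different filesystem (such as /tmp), the final step could not be a single rename, so a crash partway through could leave a half-copied container in the working directory.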

In neither implementation is this directory required for fault tolerance, though. If an import fails partway through, the operation will be retried. Datanodes will not report to SCM that they have successfully imported a container until it has been fully moved into its final location on the volume. In the old implementation, if /tmp failed, the datanode would not be able to import more containers. The new implementation, which uses the storage volumes directly, does not have this problem.
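The ordering described here (retry on failure, report only after the final move) can be sketched as follows. This is a simplified illustration, not the datanode's implementation: `import_with_retry`, `do_import`, `report_to_scm`, and `max_attempts` are hypothetical names, and the local retry loop is an assumption made for brevity, since the thread does not say which component drives the retry.

```python
from typing import Callable


def import_with_retry(do_import: Callable[[], None],
                      report_to_scm: Callable[[], None],
                      max_attempts: int = 3) -> bool:
    """Run do_import up to max_attempts times; report only after it succeeds."""
    for _ in range(max_attempts):
        try:
            # Stands in for "stage the container and move it into place".
            do_import()
        except OSError:
            # A failed attempt leaves no container in the final location,
            # so it is safe to simply try again.
            continue
        # Reached only once the container is fully in its final location.
        report_to_scm()
        return True
    return False
```

Because the report happens strictly after the move, a failure in the staging area can delay an import but cannot cause a partially imported container to be counted as a valid replica.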

1 reply

errose28 Dec 13, 2024
Collaborator

For EC reconstruction, the containers will be created in place but marked with the RECOVERING state until fully rebuilt. Reconstruction will not use this directory afaik. There are cases where we use plain copy/replication for EC containers, for example during decommissioning. If we are decommissioning a node with replica index 1 (the first set of chunks in the stripe), we will copy it to a new node, and this directory will be used in that case.

Answer selected by julienlau
