HDDS-10685. Short circuit read support in Ozone #7236
Conversation
Thanks for the PR! Since this is a big PR, it'll take me more time to go through it entirely but here are some comments.
defaultValue = "600", | ||
type = ConfigType.LONG, | ||
description = "If some unknown IO error happens on Domain socket read, short circuit read will be disabled " + | ||
"temporary for this period of time(seconds).", |
typo
"temporary for this period of time(seconds).", | |
"temporarily for this period of time(seconds).", |
type = ConfigType.SIZE,
description = "Buffer size of reader/writer.",
tags = { ConfigTag.CLIENT, ConfigTag.DATANODE })
private int shortCircuitBufferSize = 128 * 1024;
Just curious to know how the default value is determined. The similar HDFS property "dfs.client.read.shortcircuit.buffer.size" defaults to 1 MB.
The short-circuit channel only exchanges GetBlock requests and responses. The value is determined by how large a request and a response would be for a block with a 256 MB block size, 4 MB chunk size, and 16 KB bytes-per-checksum. The request is around 500 bytes, and the response is around
30 + 53 * 64 + 6 * 16384 ~ 100k
block info (excluding chunks) - 30 bytes
one chunk info (excluding checksums) - 53 bytes
one checksum entry - 6 bytes
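As a rough sketch of that arithmetic (the 30/53/6-byte figures come from the estimate above; the block, chunk, and bytes-per-checksum sizes are the assumed values, not constants taken from the code):

// Back-of-the-envelope estimate of the largest GetBlock response the
// short-circuit channel needs to buffer, under the assumptions above.
public class GetBlockResponseSizeEstimate {
  public static void main(String[] args) {
    long blockSize = 256L * 1024 * 1024;   // 256 MB block
    long chunkSize = 4L * 1024 * 1024;     // 4 MB chunks
    long bytesPerChecksum = 16L * 1024;    // one checksum per 16 KB of data

    long chunks = blockSize / chunkSize;            // 64
    long checksums = blockSize / bytesPerChecksum;  // 16384

    long response = 30                // block info, excluding chunks
        + 53 * chunks                 // per-chunk info, excluding checksums
        + 6 * checksums;              // one checksum entry each
    System.out.println(response);     // ~102,000 bytes, comfortably under the 128 KB default
  }
}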
dataOut.write(bytes);
dataOut.flush();
}
// LOG.info("send {} {} bytes with id {}", type, bytes.length, key);
remove
port = dn.getPort(DatanodeDetails.Port.Name.STANDALONE).getValue();
InetSocketAddress addr = NetUtils.createSocketAddr(dn.getIpAddress(), port);
if (OzoneNetUtils.isAddressLocal(addr) &&
    dn.getCurrentVersion() >= SHORT_CIRCUIT_READS.toProtoValue()) {
Sure. It will be addressed with a new patch.
LOG.error("Short-circuit read is not enabled");
return null;
}
}
should abort if it's neither grpc nor short-circuit.
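Something along these lines, for example (the method and helper names here are hypothetical, just to illustrate the fail-fast behavior being asked for):

// Sketch: fail fast instead of returning null when the configured read
// channel is neither gRPC nor short-circuit (names are illustrative).
private XceiverClientSpi createReadClient(ChannelType type, DatanodeDetails dn) throws IOException {
  switch (type) {
  case GRPC:
    return createGrpcClient(dn);
  case SHORT_CIRCUIT:
    return createShortCircuitClient(dn);
  default:
    throw new IOException("Unsupported read channel type: " + type);
  }
}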
import org.apache.hadoop.hdds.utils.IOUtils;
import org.apache.hadoop.hdfs.server.datanode.DataNode;
import org.apache.hadoop.hdfs.server.datanode.DatanodeUtil;
Are these org.apache.hadoop.hdfs.* dependencies required? Ideally we should avoid depending on HDFS.
- import org.apache.hadoop.hdds.utils.IOUtils;
- import org.apache.hadoop.hdfs.server.datanode.DataNode;
- import org.apache.hadoop.hdfs.server.datanode.DatanodeUtil;
removed.
import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
import org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier;
import org.apache.hadoop.hdfs.server.datanode.DataNode;
import org.apache.hadoop.hdfs.server.datanode.DatanodeUtil;
I don't think these org.apache.hadoop.hdfs.* dependencies are needed.
- import org.apache.hadoop.hdfs.protocol.ExtendedBlock;
- import org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier;
- import org.apache.hadoop.hdfs.server.datanode.DataNode;
- import org.apache.hadoop.hdfs.server.datanode.DatanodeUtil;
removed
import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_KERBEROS_PRINCIPAL_KEY;

/**
 * Security protocol for a secure OzoneManager.
 */
@KerberosInfo(
    serverPrincipal = DFS_DATANODE_KERBEROS_PRINCIPAL_KEY
)
can these be removed?
removed
} catch (IOException e) {
  e.printStackTrace();
IMO an IOException should fail.
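For instance, a minimal sketch of the suggested handling, propagating the error instead of swallowing it (whether to rethrow like this or declare the exception on the enclosing method depends on the surrounding code):

} catch (IOException e) {
  // Fail instead of only printing the stack trace; alternatively declare
  // `throws IOException` on the enclosing method and drop the catch.
  throw new UncheckedIOException(e);
}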
} catch (IOException | InterruptedException e) {
  e.printStackTrace();
here, too.
See also https://issues.apache.org/jira/browse/HDFS-16198
We will need to handle block token expiration in a secure environment too.
public static final String OZONE_READ_SHORT_CIRCUIT = "ozone.client.read.short-circuit";
public static final boolean OZONE_READ_SHORT_CIRCUIT_DEFAULT = false;
public static final String OZONE_DOMAIN_SOCKET_PATH = "ozone.domain.socket.path";
public static final String OZONE_DOMAIN_SOCKET_PATH_DEFAULT = "/var/lib/ozone/dn_socket";
We need to document this property somewhere.
I plan to add a document about this feature and will explain all these new properties there.
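For reference, a minimal client-side sketch using the keys quoted above (the values shown are just the defaults from the same snippet; this is not taken from the patch):

// Sketch: enabling short-circuit read on the client and pointing it at
// the datanode's domain socket path.
OzoneConfiguration conf = new OzoneConfiguration();
conf.setBoolean("ozone.client.read.short-circuit", true);
conf.set("ozone.domain.socket.path", "/var/lib/ozone/dn_socket");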
public static final String OZONE_READ_SHORT_CIRCUIT = "ozone.client.read.short-circuit";
public static final boolean OZONE_READ_SHORT_CIRCUIT_DEFAULT = false;
I believe this is redundant; see shortCircuitEnabled.
- public static final String OZONE_READ_SHORT_CIRCUIT = "ozone.client.read.short-circuit";
- public static final boolean OZONE_READ_SHORT_CIRCUIT_DEFAULT = false;
try {
  request = ContainerProtos.ContainerCommandRequestProto.parseFrom(requestInBytes);
  assertTrue(request.hasGetBlock());
  assertEquals(ContainerProtos.Type.GetBlock, request.getCmdType());
  assertEquals(containerID, request.getContainerID());
  assertEquals(datanodeID, request.getDatanodeUuid());
  assertEquals(localBlockID, request.getGetBlock().getBlockID().getLocalID());
  assertEquals(bcsid, request.getGetBlock().getBlockID().getBlockCommitSequenceId());
} catch (InvalidProtocolBufferException e) {
  throw new RuntimeException(e);
}
I think you can just let it throw the exception and exit. Change the method signature accordingly too.
- try {
-   request = ContainerProtos.ContainerCommandRequestProto.parseFrom(requestInBytes);
-   assertTrue(request.hasGetBlock());
-   assertEquals(ContainerProtos.Type.GetBlock, request.getCmdType());
-   assertEquals(containerID, request.getContainerID());
-   assertEquals(datanodeID, request.getDatanodeUuid());
-   assertEquals(localBlockID, request.getGetBlock().getBlockID().getLocalID());
-   assertEquals(bcsid, request.getGetBlock().getBlockID().getBlockCommitSequenceId());
- } catch (InvalidProtocolBufferException e) {
-   throw new RuntimeException(e);
- }
+ request = ContainerProtos.ContainerCommandRequestProto.parseFrom(requestInBytes);
+ assertTrue(request.hasGetBlock());
+ assertEquals(ContainerProtos.Type.GetBlock, request.getCmdType());
+ assertEquals(containerID, request.getContainerID());
+ assertEquals(datanodeID, request.getDatanodeUuid());
+ assertEquals(localBlockID, request.getGetBlock().getBlockID().getLocalID());
+ assertEquals(bcsid, request.getGetBlock().getBlockID().getBlockCommitSequenceId());
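The enclosing method would then declare the checked exception instead of catching it, roughly like this (the method name and signature are hypothetical):

// Sketch: let the parse failure propagate instead of wrapping it.
private void verifyGetBlockRequest(byte[] requestInBytes)
    throws InvalidProtocolBufferException {
  ContainerProtos.ContainerCommandRequestProto request =
      ContainerProtos.ContainerCommandRequestProto.parseFrom(requestInBytes);
  assertTrue(request.hasGetBlock());
  // ... remaining assertions as above ...
}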
Also take a look at https://issues.apache.org/jira/browse/HDFS-15202, where having multiple short-circuit cache instances (dfs.client.short.circuit.num) improves throughput.
https://issues.apache.org/jira/browse/HDFS-13639
https://issues.apache.org/jira/browse/HDFS-16535 SCR slot release
also https://issues.apache.org/jira/browse/HDFS-14535 on an interesting heap problem.
Is it possible to split this into subtasks? A 220K patch is hard to review.
BlockID blockID = BlockID.getFromProtobuf(
    request.getGetBlock().getBlockID());
if (replicaIndexCheckRequired(request)) {
  BlockUtils.verifyReplicaIdx(kvContainer, blockID);
shouldn't we keep this line to verify replica index?
}

@Override
public CompletableFuture<XceiverClientReply> watchForCommit(long index) {
The latest master changed the interface definition of this method; it now returns XceiverClientReply instead.
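In that case the override would presumably become something like the following (the body is only a placeholder; the original implementation's body is not shown in the quoted hunk):

// Sketch: updated to the current interface, which returns
// XceiverClientReply directly instead of a CompletableFuture.
@Override
public XceiverClientReply watchForCommit(long index) {
  throw new UnsupportedOperationException(
      "watchForCommit is not supported by the short-circuit read client");
}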
The same KeyValueHandler#handleGetBlock is used to process the getBlock request from the short-circuit channel. The block token will be verified; if it has expired, the block will not be opened on the server side. We don't use shared memory in the current implementation, so I think we should be fine.
I will separate the patch into sub-patches and close this one.
What changes were proposed in this pull request?
Support short-circuit read for the datanode.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10685
How was this patch tested?
new unit tests
manual test with freon
Still working on whether it's possible to test it with a robot test.