feat(deadline): add option to the RenderQueue to use cachefilesd #367

ddneilson · 2021-03-29T16:12:26Z

Implements: #366

Testing

Started up a basic render farm from the RFDK examples, modified to set the fsc mount option on the Amazon EFS filesystem and to enable cachefilesd on the RenderQueue.

Used a Systems Manager session to connect to the RenderQueue's ECS container host, and then:

Verified that cachefilesd is installed and running.
Verified that the /mnt/repo filesystem was mounted using the EFS mount helper, which in turn uses the NFSv4 mount driver, and that the fsc option was properly set on the NFSv4 mount point.

[root@ip-10-0-126-223 ~]# mount | grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
127.0.0.1:/ on /mnt/repo type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,port=20312,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,fsc,local_lock=none,addr=127.0.0.1,_netdev)
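
As a rough illustration of the check described above, the following Python sketch parses one line of `mount` output and reports whether the fsc option is set on an NFSv4 mount. The function names and the regular expression are illustrative assumptions and are not part of this pull request; the sample line is the one shown above.

```python
import re

# Sample line copied from the `mount | grep nfs` output above.
MOUNT_LINE = (
    "127.0.0.1:/ on /mnt/repo type nfs4 "
    "(rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,"
    "noresvport,proto=tcp,port=20312,timeo=600,retrans=2,sec=sys,"
    "clientaddr=127.0.0.1,fsc,local_lock=none,addr=127.0.0.1,_netdev)"
)

# Matches "<source> on <target> type <fstype> (<options>)".
_MOUNT_RE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) "
    r"\((?P<options>[^)]*)\)$"
)


def parse_mount_line(line):
    """Return (target, fstype, option_names), or None if the line does not match."""
    match = _MOUNT_RE.match(line.strip())
    if match is None:
        return None
    # Keep only the option name, dropping any "=value" suffix.
    names = {
        opt.split("=", 1)[0]
        for opt in match.group("options").split(",")
        if opt
    }
    return match.group("target"), match.group("fstype"), names


def is_fsc_enabled(line, target="/mnt/repo"):
    """True when `line` describes an nfs4 mount at `target` with fsc set."""
    parsed = parse_mount_line(line)
    if parsed is None:
        return False
    mount_target, fstype, names = parsed
    return mount_target == target and fstype == "nfs4" and "fsc" in names
```

With the sample line, `is_fsc_enabled(MOUNT_LINE)` returns `True`; the `sunrpc` line from the same output is rejected because its filesystem type is not nfs4.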

Ran a series of full copies of the /mnt/repo filesystem (approx. 1.3 GB of data) to the local device, both with and without cachefilesd running. The image below shows the throughput metrics on the Amazon EFS filesystem through these tests; as expected, once the local cache has been populated, the read throughput with cachefilesd running is much lower than for a full copy without it.
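
A repeated-copy test of this kind could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used here: the helper names are hypothetical, and it only times the copy on the client side, whereas the numbers discussed in this pull request are throughput metrics reported for the Amazon EFS filesystem.

```python
import os
import shutil
import tempfile
import time


def tree_size_bytes(root):
    """Sum the sizes of all regular files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def timed_copy(src, scratch_parent=None):
    """Copy `src` into a fresh scratch directory and return (bytes, seconds).

    The copy is deleted afterwards so that repeated runs start from the
    same state on the local device.
    """
    scratch = tempfile.mkdtemp(dir=scratch_parent)
    dst = os.path.join(scratch, "copy")
    try:
        start = time.monotonic()
        shutil.copytree(src, dst)
        elapsed = time.monotonic() - start
        return tree_size_bytes(dst), elapsed
    finally:
        shutil.rmtree(scratch, ignore_errors=True)
```

Calling `timed_copy("/mnt/repo")` several times, once with cachefilesd stopped and once with it running, would give a client-side view to set beside the EFS metrics.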

Performance Testing

Test setup:

RFDK Basic all-in application with an added bastion host set up as a workstation (Deadline and DCV installed). That means a single RCS (c5.large), an EFS burst-mode filesystem with ~1 GB of data (encrypted), DocDB (db.r5.large), a Linux-based Worker ASG (t3.medium), and TLS enabled throughout.

So, cachefilesd shaves about 30% off metered throughput when starting 200 workers (9.73 MB/s -> 6.9 MB/s) and 26% off metered throughput when rendering 1k 'sleep 10' jobs (1.9 MB/s -> 1.6 MB/s), with the standard caveat, of course, that this is a single data point, so there will be sizable error bars on those numbers.
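
For anyone re-running the comparison, the relative reduction can be computed as below. This is a small sketch with a hypothetical helper name. Applied to the rounded figures quoted above, the first pair gives roughly 29%, in line with "about 30%"; the second pair gives roughly 16%, so the quoted 26% presumably comes from unrounded measurements or a baseline that is not shown in this thread.

```python
def percent_reduction(before, after):
    """Relative drop from `before` to `after`, expressed in percent."""
    return (before - after) / before * 100.0


# Rounded throughput figures (MB/s) quoted in the paragraph above.
worker_startup = percent_reduction(9.73, 6.9)
sleep_jobs = percent_reduction(1.9, 1.6)

print(f"starting 200 workers: {worker_startup:.1f}%")
print(f"1k 'sleep 10' jobs:   {sleep_jobs:.1f}%")
```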

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

RandomInsano · 2021-03-29T22:27:26Z

Looks great! My only concern is with the namespace for this option and underlying function names being generic as it has the following two effects:

Those aware of cachefilesd have to check the docs to know what the underlying tech is
If we find an even better caching option or choose to change the Repo's backing protocol (say to SMB) we'll have problems.

Maybe we could swap "enable_local_file_caching" for "enable_fsc_caching", etc?

jericht

Just one really minor comment, otherwise this LGTM

packages/aws-rfdk/lib/deadline/scripts/bash/enableCacheFilesd.sh

ddneilson · 2021-03-31T03:41:50Z

> Looks great! My only concern is with the namespace for this option and underlying function names being generic as it has the following two effects:
> 1. Those aware of cachefilesd have to check the docs to know what the underlying tech is
> 2. If we find an _even better_ caching option or choose to change the Repo's backing protocol (say to SMB) we'll have problems.
>
> Maybe we could swap "enable_local_file_caching" for "enable_fsc_caching", etc?

@RandomInsano Thanks for the feedback, Edwin. Variable/property naming is hard. ;-) On one hand... generic for those that don't know the details, or don't want to know/care... On the other hand... it is nice to know what is going on under the hood. I'm trying to strike that balance by going generic with the property name and providing details in the docstring (which shows up under quick-info-type IDE functionality). My thinking is also that if there's a different caching tech for a different filesystem, then we'd want to support that under the same option but detect the filesystem type if we can; it's all the same high-level functionality, regardless of how it's accomplished.

You're thinking that I'm missing the mark with it?
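
The "one generic option, detect the filesystem type" idea could look roughly like the sketch below. Everything here is hypothetical: the mapping and the function name are illustrative only, and the single entry reflects the one combination this pull request deals with (cachefilesd for an NFSv4 mount).

```python
# Hypothetical mapping from filesystem type to the local caching mechanism
# that a single generic option could enable. Only the nfs4 entry reflects
# this pull request; any other entry would be a future addition.
CACHE_BACKENDS = {
    "nfs4": "cachefilesd",
}


def select_cache_backend(fstype):
    """Return the caching mechanism for `fstype`, or None if none is known."""
    return CACHE_BACKENDS.get(fstype)
```

Under this shape, supporting another filesystem would mean adding an entry to the mapping rather than introducing a new user-facing option.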

packages/aws-rfdk/lib/deadline/scripts/bash/enableCacheFilesd.sh

jusiskin

Just one minor question, but otherwise looks great.

packages/aws-rfdk/lib/deadline/scripts/bash/enableCacheFilesd.sh

feat(deadline): Adds option to the RenderQueue to use cachefilesd

bb0fee3

ddneilson linked an issue Mar 29, 2021 that may be closed by this pull request

Enable cachefilesd on RenderQueue #366

ddneilson requested a review from jusiskin March 29, 2021 16:55

jusiskin added the contribution/core label ("This is a PR that came from AWS.") Mar 30, 2021

jericht previously approved these changes Mar 31, 2021

Add error message when unable to install cachefilesd

940d9f6

ddneilson dismissed jericht’s stale review via 940d9f6 March 31, 2021 03:50

ddneilson requested a review from jericht March 31, 2021 03:50

jericht reviewed Mar 31, 2021

jusiskin requested changes Mar 31, 2021

Fix subshell in shell script

9c76082

ddneilson requested review from jericht and jusiskin March 31, 2021 16:44

jericht approved these changes Mar 31, 2021

jusiskin changed the title ~~feat(deadline): Adds option to the RenderQueue to use cachefilesd~~ feat(deadline): adds option to the RenderQueue to use cachefilesd Mar 31, 2021

jusiskin changed the title ~~feat(deadline): adds option to the RenderQueue to use cachefilesd~~ feat(deadline): add option to the RenderQueue to use cachefilesd Mar 31, 2021

jusiskin approved these changes Mar 31, 2021

jusiskin merged commit 901b749 into aws:mainline Mar 31, 2021

ddneilson deleted the file-caching-rq branch April 1, 2021 13:09
