feat(deadline): add option to the RenderQueue to use cachefilesd #367
Looks great! My only concern is with the namespace for this option and underlying function names being generic as it has the following two effects:
Maybe we could swap "enable_local_file_caching" for "enable_fsc_caching", etc?
Just one really minor comment, otherwise this LGTM
packages/aws-rfdk/lib/deadline/scripts/bash/enableCacheFilesd.sh
@RandomInsano Thanks for the feedback, Edwin. Variable/property naming is hard. ;-) On one hand... generic for those that don't know the details, or don't want to know/care... On the other hand... it is nice to know what is going on under the hood. I'm trying to strike that balance by going generic with the property name and providing details in the docstring (which shows up under quick-info-type IDE functionality). My thinking is also that if there's a different caching tech for a different filesystem, then we'd want to support that under the same option but detect the filesystem type if we can; it's all the same high-level functionality, regardless of how it's accomplished. You're thinking that I'm missing the mark with it?
Just one minor question, but otherwise looks great.
Implements: #366
Testing
Started up a basic render farm using the RFDK examples, but modified the example to set the fsc mount option on the Amazon EFS and enable cachefilesd on the RenderQueue. Used a Systems Manager connection to remote in to the RenderQueue's ECS container host, and then:
- Verified that the /mnt/repo filesystem was mounted using the EFS helper, which in turn uses the NFSv4 mount driver, and that the fsc option was properly set on the NFSv4 mount point.
- Copied the /mnt/repo filesystem (approx. 1.3 GB of data) to the local device both with and without cachefilesd running. The image below shows the throughput metrics on the Amazon EFS filesystem through these tests; as expected, the read throughput with cachefilesd is much lower than a full copy once the local cache has been populated.
Performance Testing
Test setup:
So, cachefilesd shaves about 30% off metered throughput when starting 200 workers (9.73 MB/s -> 6.9 MB/s) and 26% off metered throughput when rendering 1k 'sleep 10' jobs (1.9 MB/s -> 1.6 MB/s). With the standard caveat, of course, that this is a single data point, so there will be sizable error bars on those numbers.
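For anyone wanting to re-derive the percentages from the quoted rates, here is a minimal sketch of the arithmetic. The helper name pct_reduction is hypothetical and not part of this PR; it only computes (before - after) / before from the rounded figures given above.

```shell
# Hypothetical helper (not part of the PR): percentage reduction from a
# "before" rate to an "after" rate, printed with one decimal place.
pct_reduction() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

pct_reduction 9.73 6.9   # starting 200 workers      -> prints 29.1
pct_reduction 1.9 1.6    # 1k 'sleep 10' render jobs -> prints 15.8
```

With the rounded rates as quoted, the worker-startup case comes out at roughly 29%, in line with the "about 30%" figure. The rendering case comes out at roughly 16% rather than 26%, so that percentage was presumably computed from different measurements than the rounded rates shown, or one of the figures is mistyped.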
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license