This repository has been archived by the owner on Mar 31, 2022. It is now read-only.
Cloud Dataproc now natively supports autoscaling. Dataproc's autoscaling seems to be a superset of the functionality in Spydra's autoscaler. If you're interested, I'd be happy to take a stab at moving Spydra to Dataproc's autoscaler and getting rid of the init action.
The one major difference is that the minimum cooldown period (scaling interval) in Dataproc is 10 minutes, while Spydra's README suggests 2 minutes. Are folks at Spotify using scaling intervals that short?
I haven't looked closely at Dataproc's native autoscaler, but replacing our simple heuristic with it makes sense. Our overall strategy has been to fill in gaps and to replace our tools with officially supported ones as they become available. I would be happy to see a PR from your side.
The 10-minute interval should be fine. The only implication is that the cluster needs a reasonable initial size so that jobs don't take 10 minutes longer, but I believe that's a reasonable requirement.
Hey as an update, autoscaling just launched to Beta today! A few updates since alpha:
The minimum cooldown period is now 2 minutes
Monitoring autoscaling and cluster metrics is far easier now. We have common YARN and HDFS metrics on the cluster page (in the Cloud Console) and autoscaler logs that explain why the autoscaler made certain decisions (click "View logs" and select just the autoscaler logs).
You can also enable autoscaling, disable autoscaling, or switch autoscaling policies on clusters at any time. You can also update autoscaling policies live, without needing to touch the cluster.
(Teaser) there's going to be a new shuffle service so you can actually autoscale clusters without killing in-progress jobs.
Now that the API is stable, I'd like to circle back and actually integrate native Dataproc autoscaling into Spydra. I think the easiest option would be to have users create autoscaling policies outside of spydra, and then just specify the autoscaling policy to use in their spydra config. WDYT?
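A rough sketch of what that split could look like, in Python purely for illustration. The policy dict follows the field names of Dataproc's autoscaling policy resource (`workerConfig`, `basicAlgorithm`, `yarnConfig`) as I understand them. The Spydra side is entirely an assumption: the `cluster.options` layout and the `autoscaling-policy` key are hypothetical and not an existing Spydra option, and the default factors are arbitrary placeholders.

```python
import copy

# Minimum cooldown period mentioned for the Beta launch (2 minutes).
MIN_COOLDOWN_MINUTES = 2


def build_autoscaling_policy(min_workers, max_workers, cooldown_minutes=2,
                             scale_up_factor=0.5, scale_down_factor=1.0,
                             graceful_timeout_minutes=60):
    """Build a dict shaped like a Dataproc autoscaling policy.

    The policy is created outside of Spydra; this helper only assembles
    and sanity-checks the values. Default factors are placeholders.
    """
    if cooldown_minutes < MIN_COOLDOWN_MINUTES:
        raise ValueError(
            f"cooldown must be at least {MIN_COOLDOWN_MINUTES} minutes")
    if not 0 <= min_workers <= max_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")
    return {
        "workerConfig": {
            "minInstances": min_workers,
            "maxInstances": max_workers,
        },
        "basicAlgorithm": {
            # Durations are written as a number of seconds, e.g. "120s".
            "cooldownPeriod": f"{cooldown_minutes * 60}s",
            "yarnConfig": {
                "scaleUpFactor": scale_up_factor,
                "scaleDownFactor": scale_down_factor,
                "gracefulDecommissionTimeout":
                    f"{graceful_timeout_minutes * 60}s",
            },
        },
    }


def with_autoscaling_policy(spydra_config, policy_name):
    """Return a copy of a Spydra-style config that names an existing policy.

    HYPOTHETICAL: the "cluster" -> "options" -> "autoscaling-policy" path
    is an assumed layout for this sketch, not a confirmed Spydra setting.
    """
    config = copy.deepcopy(spydra_config)
    options = config.setdefault("cluster", {}).setdefault("options", {})
    options["autoscaling-policy"] = policy_name
    return config
```

With this shape, Spydra would only need to carry the policy name; the scaling parameters themselves would live in the policy and could be changed without touching the Spydra config.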