Search for a more cost-efficient cloud provider #62
Plan for elastic scaling
If SLK can programmatically spin up a container for both the selenium server and the scraper job, that would be awesome. Relevant references: the AWS SDK for JavaScript on GitHub and the AWS CDK EC2 package on npm. Spec for each container:
One thing we want to ask: do the VPC and subnet cost anything? If not, we can use tf to create them beforehand; otherwise, we may want to include them in SLK's dynamic resource-creation logic.
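(On the cost question: AWS doesn't charge for a VPC or subnet itself - NAT gateways and elastic IPs are what cost money - so creating them beforehand with tf should be free.) For the programmatic spin-up, here is a minimal sketch using the AWS SDK for JavaScript; the AMI, subnet, and instance size are placeholders and assumptions, not values from this ticket:

```js
// Sketch: have SLK launch a short-lived EC2 instance for one selenium + scraper pair.
// Assumes aws-sdk v2 and a VPC/subnet pre-created by tf.
const AWS = require('aws-sdk');

const ec2 = new AWS.EC2({ region: 'us-east-1' });

async function launchScraperNode() {
  const result = await ec2.runInstances({
    ImageId: 'ami-xxxxxxxx',     // placeholder: an AMI with Docker preinstalled
    InstanceType: 't3.medium',   // assumption: 2 vCPU / 4G fits one selenium + scraper pair
    MinCount: 1,
    MaxCount: 1,
    SubnetId: 'subnet-xxxxxxxx', // placeholder: subnet created beforehand by tf
    TagSpecifications: [{
      ResourceType: 'instance',
      Tags: [{ Key: 'role', Value: 'scraper' }],
    }],
  }).promise();
  return result.Instances[0].InstanceId;
}

// Tear the instance down once the job finishes, so we only pay while scraping.
async function terminateScraperNode(instanceId) {
  await ec2.terminateInstances({ InstanceIds: [instanceId] }).promise();
}
```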
Plan for more RAM
Elastic Approach

Looks like the elastic approach is probably the most cost-efficient one. The idea is basically:
K8S Elastic Approach

Looks like via the Node.js DO client, the node pool and its nodes are quite troublesome: a node inside gets stuck at "provisioning". Why is that?
Another method is to use tf to create a separate node pool with a node count of 0. SLK will then just update the count when it needs to scale up, then poll to check node readiness (a sketch follows below).
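A minimal sketch of that second method, calling the DigitalOcean REST API directly rather than going through the Node.js DO client; the cluster ID, pool ID, and pool name are placeholders:

```js
// Sketch: scale a pre-created DO node pool up, then poll until nodes leave
// "provisioning" and report "running". Assumes Node 18+ (global fetch).
const DO_TOKEN = process.env.DO_TOKEN;
const BASE = 'https://api.digitalocean.com/v2';

async function scaleNodePool(clusterId, poolId, count) {
  await fetch(`${BASE}/kubernetes/clusters/${clusterId}/node_pools/${poolId}`, {
    method: 'PUT',
    headers: {
      Authorization: `Bearer ${DO_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ name: 'scraper-pool', count }), // pool name is a placeholder
  });
}

async function waitForNodes(clusterId, poolId, count, intervalMs = 15000) {
  for (;;) {
    const res = await fetch(
      `${BASE}/kubernetes/clusters/${clusterId}/node_pools/${poolId}`,
      { headers: { Authorization: `Bearer ${DO_TOKEN}` } },
    );
    const { node_pool } = await res.json();
    const ready = node_pool.nodes.filter((n) => n.status.state === 'running');
    if (ready.length >= count) return ready;
    await new Promise((r) => setTimeout(r, intervalMs)); // still provisioning; poll again
  }
}
```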
Practical approach to k8s elastic scaling
Problem
Tuning Profile
Final Milestone

While there's a lot of room for improvement, we could set a final milestone here just to achieve two things:
Steps
Side notes
Standalone Approach

Things were going well until we faced the challenges here. As you can see, there's a problem with the k8s node assignment algorithm. Before the whole cluster went nuts, besides the 2+2 job-switching overlap - which is dangerous too, and we can lower the memory request - there was an additional job assigned to this node, making it run 5 jobs concurrently at that moment. Looks like we can't trust k8s's node assignment. We can lower the memory request so that a job doesn't claim so much memory at the beginning and only claims more when it needs it - we can do that. But we don't have control over node assignment. Unless k8s has some additional parameter to configure this, we will need to implement this node distribution algorithm on our own. Two ways (see the k8s-native sketch below):
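On "some additional parameter": k8s does in fact ship one - a pod anti-affinity rule (or, on newer clusters, topology spread constraints) can forbid two scraper pods from sharing a node; the next comment explores a Redis-semaphore alternative. A minimal sketch of a Job combining a lower memory request with such a rule, via @kubernetes/client-node; the image, namespace, and sizes are assumptions, not values from this ticket:

```js
// Sketch: a scraper Job with a low initial memory request (it can grow up to
// the limit) plus anti-affinity so two scraper pods avoid sharing a node.
const k8s = require('@kubernetes/client-node');

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const batch = kc.makeApiClient(k8s.BatchV1Api);

const job = {
  apiVersion: 'batch/v1',
  kind: 'Job',
  metadata: { name: 'scraper-job-1', labels: { app: 'scraper' } },
  spec: {
    template: {
      metadata: { labels: { app: 'scraper' } },
      spec: {
        restartPolicy: 'Never',
        affinity: {
          podAntiAffinity: {
            requiredDuringSchedulingIgnoredDuringExecution: [{
              labelSelector: { matchLabels: { app: 'scraper' } },
              topologyKey: 'kubernetes.io/hostname', // at most one scraper pod per node
            }],
          },
        },
        containers: [{
          name: 'scraper',
          image: 'registry.example.com/scraper:latest', // placeholder image
          resources: {
            requests: { memory: '512Mi' }, // low claim at the beginning
            limits: { memory: '1536Mi' },  // hard cap per job (assumption)
          },
        }],
      },
    },
  },
};

batch.createNamespacedJob('default', job); // namespace is an assumption
```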
Challenges implementing anti-affinity by redis semaphore

The semaphore objects are possibly created across different Node processes. We got two issues:
Some ideas on why the previous work succeeded:
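Whatever the root cause, one way to sidestep the cross-process problem is to keep the semaphore state entirely in Redis, so it no longer matters which Node process creates the object. A minimal sketch, assuming ioredis and a simple INCR-based counting semaphore; the key name and limit are placeholders:

```js
// Sketch: a cross-process counting semaphore held in Redis, so every SLK /
// worker process sees the same count regardless of who created it.
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

const KEY = 'scraper:semaphore'; // placeholder key
const LIMIT = 4;                 // assumption: max concurrent jobs

async function acquire() {
  // INCR is atomic across clients, so two processes can't both sneak in.
  const count = await redis.incr(KEY);
  if (count > LIMIT) {
    await redis.decr(KEY); // over the limit: roll back and report failure
    return false;
  }
  return true;
}

async function release() {
  await redis.decr(KEY);
}

// Usage: spin until a slot frees up, run the job, then release the slot.
async function withSlot(fn) {
  while (!(await acquire())) {
    await new Promise((r) => setTimeout(r, 5000));
  }
  try {
    return await fn();
  } finally {
    await release();
  }
}
```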
Overlap memory usage issue

During a job switch, it looks like a k8s job does not immediately release memory.
Some ideas to tackle this situation
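One concrete mitigation in this direction (our assumption of how it could look, not necessarily what the ticket settled on): have SLK wait until the previous job's pods are fully gone before creating the next job, so the memory-overlap window closes. A sketch with @kubernetes/client-node:

```js
// Sketch: before starting the next job, poll until no pods from the previous
// job remain (even terminating ones), so their memory is truly released.
const k8s = require('@kubernetes/client-node');

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const core = kc.makeApiClient(k8s.CoreV1Api);

async function waitForPodsGone(labelSelector, intervalMs = 10000) {
  for (;;) {
    const res = await core.listNamespacedPod(
      'default', undefined, undefined, undefined, undefined, labelSelector,
    );
    if (res.body.items.length === 0) return; // old pods fully gone
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}

// Usage: waitForPodsGone('job-name=scraper-job-1').then(startNextJob);
```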
Shape of one node
Looks like it works perfectly now! Final check
Scaling SLK

Recap of the workflow:
The core node pool, given 1v2G, is too small and unstable. We got network and nginx crashes even when the SLK nodes were fine, so we had to upgrade the core node pool's droplet size. In the end it just loses the point - we can simply use a 4v8G droplet for both core and SLK, and that's a firm $40 per month. You can scale the entire k8 down with terraform - right, though it's not perfect in that sense. I don't know, maybe we need a separate k8 cluster to "manage" the k8 used for the scraper. Also, we would like to populate many companies currently missing from our database, including:
Scaling up and down is quite stable right now. Further automation would find it really hard to maintain the same cost level of around $20-40 monthly, and can't really save us money while keeping the ability to scale up.

Summary

The current max capacity is 60 jobs, using a core droplet size of 8G RAM - either 4v8G or memory-optimized 1v8G. The basic monthly bill at this memory size is $40. But we figured out a way to bypass letsencrypt's duplicate-certificate limit, so we can always scale the entire k8s cluster down. Of course we still have to do this in the terminal. It would be ideal to have a meta-service running that at least triggers a travis job to handle k8s provision / deletion. Perhaps heroku could be a good place for this due to its free plan. Scaling up costs extra, using 2v4G machines.
This ticket also deals with the vision of this project. If we want to scale while paying a reasonable cloud bill, we need a more flexible way to run a Kubernetes cluster, and using Kubernetes as a service is definitely limiting us on that path.
Ideally something like AWS Fargate would do best - if we can lower the cost while our cluster is idle, then we can afford more concurrency when scraper jobs fire up (see the sketch below).
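For a sense of what that would look like, here is a minimal sketch of firing up one scraper task on Fargate with the AWS SDK for JavaScript - cost accrues per task-second, so an idle cluster costs essentially nothing. The cluster name, task definition, and subnet are placeholders:

```js
// Sketch: run one scraper task on Fargate; we pay only while it runs.
const AWS = require('aws-sdk');

const ecs = new AWS.ECS({ region: 'us-east-1' });

async function runScraperTask() {
  const result = await ecs.runTask({
    cluster: 'scraper-cluster',       // placeholder cluster name
    taskDefinition: 'scraper-task:1', // placeholder task definition
    launchType: 'FARGATE',
    count: 1,
    networkConfiguration: {
      awsvpcConfiguration: {
        subnets: ['subnet-xxxxxxxx'], // placeholder subnet
        assignPublicIp: 'ENABLED',    // the scraper needs outbound internet
      },
    },
  }).promise();
  return result.tasks[0].taskArn;
}
```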
AWS route: more RAM w/ cost efficiency
Several requirements for provisioning on AWS
Elastic scale route: save cost
Approach 1: AWS Fargate, or any other container service
Approach 2: K8 auto-scaling, K8 API
Approach 3: manual slack command in K8s
This approach is supposed to be the most feasible and fastest to start - no need to look for another platform. It aims to keep a low-cost node running SLK - SLK has to be up all the time in order to receive manual scale-up / scale-down commands. That is, this approach uses SLK as a platform to manually scale up and down (see the sketch after the command list). This should save us cost and prevent keeping a large-cost node running without any scraper job present.
- `up`: SLK triggers a travis build, which runs a terraform script to provision the resources for the selenium server on a dedicated node.
- `down`: SLK triggers a travis build, which runs the same terraform script but using `destroy` to tear down the node, along with the selenium server and all scraper jobs.
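A minimal sketch of the SLK side, assuming the Travis CI v3 API; the repo slug, branch, and env-var name are placeholders, and both commands share one function since only the build's env var differs:

```js
// Sketch: SLK handler for the manual `up` / `down` commands - each one just
// kicks off a travis build that runs terraform apply or destroy.
// Assumes Node 18+ (global fetch) and a Travis API token in the environment.
const TRAVIS_TOKEN = process.env.TRAVIS_TOKEN;
const REPO_SLUG = encodeURIComponent('our-org/scraper-infra'); // placeholder repo

async function triggerTravis(action) { // action: 'apply' or 'destroy'
  await fetch(`https://api.travis-ci.com/repo/${REPO_SLUG}/requests`, {
    method: 'POST',
    headers: {
      'Travis-API-Version': '3',
      Authorization: `token ${TRAVIS_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      request: {
        branch: 'master',                       // placeholder branch
        config: { env: { TF_ACTION: action } }, // the terraform script reads this
      },
    }),
  });
}

// Wire to the slash commands:
//   'up'   -> triggerTravis('apply')
//   'down' -> triggerTravis('destroy')
```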