ThirdEye comprises three components.
Of these, the Worker does the heavy lifting: it runs the various kinds of tasks scheduled by the Scheduler. As more alerts are added, the task count grows proportionally. ThirdEye lets you scale the workers horizontally as well as vertically to keep up with the growing task load.
The ThirdEye Worker has an internal thread pool that it uses to run tasks in parallel. We can control the parallelism with the maxParallelTasks setting.
Configure it for the load the worker sees regularly. For example, if 10 alerts have detection scheduled every minute, it makes sense to let 10 tasks run in parallel.
For a Helm installation:

    ...
    worker:
      config:
        maxParallelTasks: 10 # default is 5

For a bare-metal installation:

    ...
    taskDriver:
      maxParallelTasks: 10
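The sizing rule described above can be sketched in Python. This is an illustrative calculation only, not part of ThirdEye: the AlertGroup class, the function name, and the one-hour cutoff for what counts as frequent load are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class AlertGroup:
    """Hypothetical description of alerts that share one schedule."""
    count: int           # number of alerts in the group
    period_seconds: int  # how often each alert's detection task fires


def suggested_max_parallel_tasks(groups, frequent_cutoff_seconds=3600):
    """Suggest a maxParallelTasks value from the frequent load only.

    Assumes the worst case where every frequent alert fires at the same
    instant and each alert produces one task. Rarely scheduled groups
    (period above the cutoff) are ignored, since they are better served
    by temporarily adding workers.
    """
    frequent = [g for g in groups if g.period_seconds <= frequent_cutoff_seconds]
    return max(1, sum(g.count for g in frequent))


# 10 alerts running detection every minute, as in the example above.
every_minute = AlertGroup(count=10, period_seconds=60)
print(suggested_max_parallel_tasks([every_minute]))
```

The result is only a starting point. CPU and heap usage per task still decide whether the worker can sustain that many threads.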
ThirdEye Workers can be scaled by replicating them.
This is the more flexible way to scale, because it lets us increase or decrease the number of workers in response to sudden, temporary changes in task rate. For example, suppose 5 alerts run detection every minute and another 10 alerts trigger only once a day at 2 am. In that case it makes sense to spin up another worker before 2 am and take it down once the tasks are served. (HPA is not yet supported in the official ThirdEye Helm charts, as the trigger to scale can vary with the use case.)
Configuration
For a Helm installation, provide the number of replicas needed.

    ...
    worker:
      replicas: 2
      ...
      config:
        randomWorkerIdEnabled: true
For a bare-metal installation, run multiple workers on different ports.

    ...
    taskDriver:
      enabled: true
      randomWorkerIdEnabled: true
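To decide how many replicas to run around a burst such as the 2 am example, a rough estimate is the number of concurrent tasks divided by the per-worker parallelism, rounded up. The sketch below is a back-of-the-envelope helper, not a ThirdEye API. The function name is hypothetical, and it assumes one task per alert and that all tasks in a burst start together.

```python
import math


def workers_needed(concurrent_tasks, max_parallel_tasks=5):
    """Estimate replicas needed to run `concurrent_tasks` at once.

    `max_parallel_tasks` mirrors the worker's maxParallelTasks setting
    (default 5). At least one worker is always kept running.
    """
    if max_parallel_tasks < 1:
        raise ValueError("max_parallel_tasks must be at least 1")
    return max(1, math.ceil(concurrent_tasks / max_parallel_tasks))


baseline = workers_needed(5)      # the 5 every-minute alerts
burst = workers_needed(5 + 10)    # plus the 10 daily alerts at 2 am
print(baseline, burst)            # prints: 1 3
```

With the default parallelism, one worker covers the steady load, and the estimate rises to three workers for the short window around the daily burst.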
What is the maximum number of parallel tasks that can run with vertical scaling?
Each thread takes CPU time and heap memory, so long-running, memory-intensive tasks call for fewer parallel threads, while quick, low-memory tasks can afford more parallelism. The default is 5.
What is the maximum number of workers one can set with horizontal scaling?
ThirdEye itself imposes no limit. In practice the ceiling is (resources available) / (resources required per worker).
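As a worked example of that ratio, the sketch below takes the tighter of the CPU and memory bounds. The function name and the figures are invented for illustration and do not reflect measured ThirdEye requirements.

```python
def max_workers(available_cpu, available_mem_gib, cpu_per_worker, mem_per_worker_gib):
    """Upper bound on worker replicas for a given resource budget.

    Each resource gives its own bound (available // required); the
    scarcer resource determines the overall ceiling.
    """
    if cpu_per_worker <= 0 or mem_per_worker_gib <= 0:
        raise ValueError("per-worker requirements must be positive")
    by_cpu = available_cpu // cpu_per_worker
    by_mem = available_mem_gib // mem_per_worker_gib
    return int(min(by_cpu, by_mem))


# Hypothetical cluster: 16 cores and 64 GiB free, workers sized at 2 cores / 4 GiB.
print(max_workers(16, 64, 2, 4))  # prints: 8 (CPU is the limiting resource)
```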
How about consistency and reliability? In which case (vertical or horizontal scaling) does ThirdEye currently perform better?
Which scaling method suits best depends on the scenario. Neither method will put ThirdEye in an inconsistent state. As for reliability, as long as enough resources are available for a given setup, scaling will not affect the system.