The Register reports on Google’s approach to active task monitoring and management:
“Our solution, CPI2, uses cycles-per-instruction (CPI) data obtained by hardware performance counters to identify problems, select the likely perpetrators, and then optionally throttle them so that the victims can return to their expected behavior. It automatically learns normal and anomalous behaviors by aggregating data from multiple tasks in the same job.”
The article focuses on the benefit for low-latency guarantees: resource-hungry batch jobs can be throttled or killed before they degrade latency-sensitive work. The data might be useful for tracking down performance bottlenecks too. Imagine never having to attach a profiler to a running process because you can already attribute the time spent on a job to its individual tasks.
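To make the quoted idea concrete, here is a minimal sketch of the detection step only: compute CPI per task, aggregate across the tasks of one job, and flag tasks whose CPI sits well above the job's norm. It is not Google's implementation. The function names, the two-standard-deviation threshold, and the sample counter values are all assumptions made for illustration, and selecting and throttling the perpetrators is not covered.

```python
from statistics import mean, pstdev


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction for one sampling window."""
    return cycles / instructions


def find_victims(samples: dict[str, float], num_sigmas: float = 2.0) -> list[str]:
    """Return the tasks whose CPI is unusually high for their job.

    `samples` maps a task name to its CPI. The threshold (mean plus
    `num_sigmas` standard deviations) is an assumed choice, not taken
    from the article.
    """
    values = list(samples.values())
    if len(values) < 2:
        return []  # nothing to compare a single task against
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every task behaves identically
    threshold = mu + num_sigmas * sigma
    return sorted(name for name, value in samples.items() if value > threshold)


# Hypothetical counters: (cycles, instructions) per task in one job.
counters = {f"task-{i}": (1_000_000, 1_000_000) for i in range(9)}
counters["task-9"] = (3_000_000, 1_000_000)  # suffering from interference

samples = {name: cpi(c, n) for name, (c, n) in counters.items()}
print(find_victims(samples))  # ['task-9']
```

Because the outlier itself inflates the standard deviation, a simple rule like this only works when the job has enough tasks to establish a baseline. With a handful of tasks, a single slow task can never exceed the two-sigma threshold.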