User journey: Google processing a large dataset
Step 1: write MapReduce code
- build a binary
- submit it to a Borg cluster for execution
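To make "MapReduce code" concrete, here is a minimal single-process sketch of the MapReduce programming model (map, shuffle/group-by-key, reduce) using word count. It is not Google's MapReduce API. The function names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical, and everything runs in one process instead of on a Borg cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    # Map phase: emit (word, 1) for every word in one input record.
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: Iterable[int]) -> tuple[str, int]:
    # Reduce phase: combine all values emitted for the same key.
    return key, sum(values)


def run_mapreduce(records: Iterable[str]) -> dict[str, int]:
    # Shuffle phase: group intermediate values by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # One reduce call per distinct key, in sorted key order.
    return dict(reduce_fn(k, groups[k]) for k in sorted(groups))


if __name__ == "__main__":
    print(run_mapreduce(["the quick fox", "the lazy dog"]))
```

In the real system the map and reduce calls run as many parallel tasks, and the shuffle moves data between machines.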
MapReduce → Flume
GFS → Colossus
F1 Query: A disaggregated query layer
Why multi-tenant services?
Provision the shared resource pool for the peak of the summed demand instead of the sum of each workload's peak demand
- SQL queries handled by F1 range from milliseconds to hours
- each worker runs many queries concurrently
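The "peak of sums vs. sum of peaks" argument can be checked with a small calculation. The sketch below uses made-up per-tenant demand numbers (workers needed per time slot); the tenant names and values are hypothetical. Because the tenants peak at different times, a shared pool sized for the peak of aggregate demand is much smaller than dedicated pools each sized for their own peak.

```python
# Hypothetical demand (workers needed) for three tenants over six time slots.
demand = {
    "interactive": [10, 40, 10, 10, 10, 10],
    "batch":       [30, 5, 5, 30, 5, 5],
    "etl":         [5, 5, 35, 5, 5, 5],
}


def sum_of_peaks(series: dict[str, list[int]]) -> int:
    # Dedicated pools: each tenant is provisioned for its own peak.
    return sum(max(s) for s in series.values())


def peak_of_sums(series: dict[str, list[int]]) -> int:
    # Shared pool: provision for the largest aggregate demand in any time slot.
    return max(sum(slot) for slot in zip(*series.values()))


if __name__ == "__main__":
    print(sum_of_peaks(demand), peak_of_sums(demand))  # prints: 105 50
```

Here the shared pool needs 50 workers instead of 105. The saving depends entirely on how uncorrelated the tenants' peaks are.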
For each query:
- Schedule N computation tasks onto a shared pool of F1 workers
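A toy version of "schedule N tasks onto a shared pool" might look like the sketch below. The least-loaded placement policy and the `schedule` function are assumptions for illustration only, not F1's actual scheduler. The key point is that `load` is a single table shared by all queries, so each query's tasks land on whichever workers are least busy at that moment.

```python
import heapq


def schedule(query_id: str, num_tasks: int, load: dict[str, int]) -> dict[str, list[str]]:
    """Hypothetical least-loaded placement of one query's tasks.

    `load` maps worker name -> number of running tasks and is updated in
    place, so successive queries share the same worker pool.
    """
    # Min-heap ordered by current load (ties broken by worker name).
    heap = [(n, w) for w, n in load.items()]
    heapq.heapify(heap)
    placement: dict[str, list[str]] = {}
    for i in range(num_tasks):
        n, worker = heapq.heappop(heap)
        placement.setdefault(worker, []).append(f"{query_id}/task{i}")
        load[worker] = n + 1
        heapq.heappush(heap, (n + 1, worker))
    return placement


if __name__ == "__main__":
    pool = {"w0": 0, "w1": 0, "w2": 0}
    print(schedule("q1", 4, pool))
    print(schedule("q2", 2, pool))
    print(pool)  # prints: {'w0': 2, 'w1': 2, 'w2': 2}
```

A real scheduler would also weigh task cost, data locality and worker health; this sketch only counts tasks.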