Scaling way up: From NoSQL back to SQL (1)

User Journey: Google processing a large dataset

Step 1: MapReduce code
  1. Build a binary
  2. Submit it to a Borg cluster for execution
MapReduce → Flume
GFS → Colossus

F1 Query: A disaggregated query layer

Why multi-tenant services

Provision the resource pool for the peak of sums instead of the sum of peaks
  • SQL queries handled by F1 range from milliseconds to hours
  • Workers run many queries simultaneously
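
The "peak of sums vs. sum of peaks" point can be made concrete with a little arithmetic. The sketch below is only an illustration: the tenant names and load numbers are invented, not measurements from F1. It compares the capacity needed when each tenant gets a dedicated pool sized for its own peak against one shared pool sized for the peak of the combined load.

```python
# Hypothetical per-tenant load (e.g. CPU cores needed), sampled at the
# same four points in time. All numbers are made up for illustration.
tenant_load = {
    "tenant_a": [10, 80, 20, 10],
    "tenant_b": [70, 10, 15, 20],
    "tenant_c": [5, 10, 90, 15],
}


def sum_of_peaks(loads):
    """Capacity if every tenant gets a dedicated pool sized for its own peak."""
    return sum(max(series) for series in loads.values())


def peak_of_sums(loads):
    """Capacity if all tenants share one pool sized for the combined peak."""
    return max(sum(step) for step in zip(*loads.values()))


dedicated = sum_of_peaks(tenant_load)  # 80 + 70 + 90 = 240
shared = peak_of_sums(tenant_load)     # combined load per step: 85, 100, 125, 45

print(f"sum of peaks (dedicated pools): {dedicated}")
print(f"peak of sums (shared pool):     {shared}")
```

With these made-up numbers the shared pool needs 125 units instead of 240, because the tenants' peaks do not coincide. If all tenants peaked at the same instant, the two figures would be equal, so the saving depends on how uncorrelated the workloads are.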
For each query:
  • Schedule N computation tasks onto a shared pool of F1 workers
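
As a rough sketch of what scheduling N tasks onto a shared pool might look like, the code below places each task on the currently least-loaded worker. This is an assumption made for illustration only: the notes do not describe F1's actual placement policy, and the `schedule` function, the worker names, and the task naming are all hypothetical.

```python
import heapq


def schedule(query_id, num_tasks, pool):
    """Place num_tasks tasks of one query on the least-loaded workers.

    pool is a heap of (active_task_count, worker_name) tuples shared by
    all queries. Least-loaded placement is an assumed policy, not F1's.
    """
    placements = []
    for task_idx in range(num_tasks):
        load, worker = heapq.heappop(pool)           # least-loaded worker
        placements.append((f"{query_id}/task{task_idx}", worker))
        heapq.heappush(pool, (load + 1, worker))     # worker now has one more task
    return placements


# One shared pool of three hypothetical workers, all idle at the start.
pool = [(0, "w0"), (0, "w1"), (0, "w2")]
heapq.heapify(pool)

q1 = schedule("q1", 4, pool)  # a larger query with N = 4 tasks
q2 = schedule("q2", 2, pool)  # a smaller query with N = 2 tasks

for task, worker in q1 + q2:
    print(task, "->", worker)
```

Both queries draw from the same pool, so a worker ends up running tasks from several queries at once, which is the multi-tenant behavior described above.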