Distributed Scheduler - Datamodel
Explore temporal data modeling for distributed job schedulers, which separates scheduling concerns into distinct core entities.
Summary
Temporal data modeling for distributed job schedulers separates scheduling concerns into three core entities: Job stores schedule definitions with cron expressions, Run tracks individual executions with timing boundaries, and RunStatus maintains the complete state transition history. This temporal separation enables the critical scheduler operations: determining next executions via next_scheduled_run queries, analyzing patterns through historical Run data, and debugging via complete state transition timelines. The design evolves iteratively from simple status fields to comprehensive audit trails, incorporating denormalized job_id references for efficient querying while maintaining referential integrity across millions of state transitions.
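To make the "state transition timeline" idea concrete, here is a minimal sketch of how an append-only RunStatus history could be read back for debugging. The row layout (run_id, job_id, status, timestamp), the sample timestamps, and the helper names time_in_each_state and current_status are illustrative assumptions, not part of the original design.

```python
from datetime import datetime, timedelta

# Hypothetical RunStatus rows for one run: (run_id, job_id, status, timestamp).
# The status names come from the model's status list; the timestamps are made up.
t0 = datetime(2024, 1, 1, 12, 0, 0)
transitions = [
    (1, 42, "QUEUED", t0),
    (1, 42, "RUNNING", t0 + timedelta(seconds=5)),
    (1, 42, "CANCELED", t0 + timedelta(seconds=65)),
]


def time_in_each_state(rows):
    """Return {status: seconds spent in it} for one run's transition history.

    The final status has no successor row, so it gets no duration.
    """
    ordered = sorted(rows, key=lambda r: r[3])
    durations = {}
    for current, following in zip(ordered, ordered[1:]):
        elapsed = (following[3] - current[3]).total_seconds()
        durations[current[2]] = durations.get(current[2], 0.0) + elapsed
    return durations


def current_status(rows):
    """The most recent transition is the run's current status."""
    return max(rows, key=lambda r: r[3])[2]


print(time_in_each_state(transitions))
print(current_status(transitions))
```

Because every transition is kept rather than overwritten, both the current status and the time spent in each earlier state can be derived from the same rows.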
Designed Datamodel
Job
id: int (primary key)
name: string
payload: blob
execution_url: string
next_scheduled_run
deleted: boolean
created_by: string
permissions: [principals]
recurring_schedule: string, or null for a one-off job

Run
id: int (primary key)
job_id: foreign key
current_status: int (NONE, QUEUED, RUNNING, DELETE, CANCELED)
executor_id: string
run_start_time:
run_end_time:
results_path: "s3://"
error_path: "s3://"

RunStatus
run_id: composite key with job_id, status
job_id:
status:
timestamp:
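The three entities above could be expressed as relational tables roughly as follows. This is a sketch under stated assumptions rather than the author's schema: it uses Python's built-in sqlite3 module, stores timestamps as ISO-8601 text and statuses as text for readability, and leaves out the permissions list, which would likely need its own table. The table names, sample rows, and the helpers due_jobs and job_timeline are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    name TEXT,
    payload BLOB,
    execution_url TEXT,
    next_scheduled_run TEXT,            -- assumed ISO-8601 UTC timestamp
    deleted INTEGER DEFAULT 0,
    created_by TEXT,
    recurring_schedule TEXT             -- cron expression, NULL for one-off jobs
);
CREATE TABLE run (
    id INTEGER PRIMARY KEY,
    job_id INTEGER REFERENCES job(id),
    current_status TEXT,
    executor_id TEXT,
    run_start_time TEXT,
    run_end_time TEXT,
    results_path TEXT,
    error_path TEXT
);
CREATE TABLE run_status (
    run_id INTEGER REFERENCES run(id),
    job_id INTEGER REFERENCES job(id),  -- denormalized copy of run.job_id
    status TEXT,
    timestamp TEXT,
    PRIMARY KEY (run_id, job_id, status)
);
""")

# Made-up sample data: one recurring job that is due, one one-off job that is not.
conn.execute(
    "INSERT INTO job (id, name, next_scheduled_run, recurring_schedule) "
    "VALUES (1, 'nightly-report', '2024-01-01T00:00:00', '0 0 * * *')"
)
conn.execute(
    "INSERT INTO job (id, name, next_scheduled_run, recurring_schedule) "
    "VALUES (2, 'one-off-export', '2024-01-02T00:00:00', NULL)"
)
conn.execute("INSERT INTO run (id, job_id, current_status) VALUES (10, 1, 'RUNNING')")
conn.executemany(
    "INSERT INTO run_status VALUES (?, ?, ?, ?)",
    [
        (10, 1, "QUEUED", "2024-01-01T00:00:01"),
        (10, 1, "RUNNING", "2024-01-01T00:00:03"),
    ],
)


def due_jobs(conn, now):
    """IDs of non-deleted jobs whose next_scheduled_run is at or before `now`."""
    rows = conn.execute(
        "SELECT id FROM job WHERE deleted = 0 AND next_scheduled_run <= ? "
        "ORDER BY next_scheduled_run",
        (now,),
    ).fetchall()
    return [r[0] for r in rows]


def job_timeline(conn, job_id):
    """All state transitions for a job, oldest first, without joining through run."""
    return conn.execute(
        "SELECT run_id, status, timestamp FROM run_status "
        "WHERE job_id = ? ORDER BY timestamp",
        (job_id,),
    ).fetchall()


print(due_jobs(conn, "2024-01-01T12:00:00"))
print(job_timeline(conn, 1))
```

The denormalized job_id on run_status is what lets job_timeline filter by job directly; without it, the same query would need a join against run for every lookup.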