Multiple pods · one execution per fire · misfire policies · failover recovery · all interactive
SELECT … FROM QRTZ_LOCKS WHERE LOCK_NAME='TRIGGER_ACCESS' FOR UPDATE. The DB grants the row lock to exactly one pod; the rest block briefly, then see no triggers to acquire and back off.
SELECT FOR UPDATE is a row-level pessimistic lock. The first transaction to acquire the row blocks all others until it commits or rolls back.
Quartz uses this on a single row in QRTZ_LOCKS with LOCK_NAME='TRIGGER_ACCESS'. While one pod holds it, no other pod can acquire triggers.
The winner picks up the trigger from QRTZ_TRIGGERS, marks it ACQUIRED, inserts a row into QRTZ_FIRED_TRIGGERS, releases the lock, then runs the job.
Cost: this serializes trigger acquisition cluster-wide. Beyond ~3 nodes, contention starts to matter.
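The serialization described above can be mimicked in plain Java, with a ReentrantLock standing in for the TRIGGER_ACCESS row lock. This is an illustrative sketch, not Quartz API; all names are made up:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class TriggerAccessDemo {
    // Stand-in for the QRTZ_LOCKS row with LOCK_NAME='TRIGGER_ACCESS'
    static final ReentrantLock TRIGGER_ACCESS = new ReentrantLock();
    static final List<String> acquisitions = new CopyOnWriteArrayList<>();

    static void acquireTriggers(String pod, CountDownLatch done) {
        TRIGGER_ACCESS.lock();       // like SELECT ... FOR UPDATE: blocks until the lock is granted
        try {
            acquisitions.add(pod);   // only one pod is ever inside this section at a time
        } finally {
            TRIGGER_ACCESS.unlock(); // like COMMIT: releases the row lock for the next waiter
        }
        done.countDown();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(3);
        for (String pod : List.of("pod-a", "pod-b", "pod-c")) {
            new Thread(() -> acquireTriggers(pod, done)).start();
        }
        done.await();
        // All three pods got through, but strictly one at a time
        System.out.println(acquisitions.size()); // 3
    }
}
```

All three "pods" eventually acquire, which is exactly why the contention cost grows with node count: each one waits its turn for the same single lock.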
QRTZ_LOCKS (the lock row), QRTZ_FIRED_TRIGGERS (currently running jobs), and QRTZ_SCHEDULER_STATE (heartbeats, used for failover detection).
The dead pod's QRTZ_SCHEDULER_STATE row stops heartbeating; the cluster detects the failure after the checkin interval expires and (if requestsRecovery=true) re-fires the job on a healthy pod.
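A minimal sketch of a recoverable job definition using the Quartz builder API. The job class and identity are placeholders for illustration, not names from this article:

```java
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;

public class RecoverableJobConfig {

    // Hypothetical job class; substitute your own
    public static class BillingJob implements Job {
        @Override
        public void execute(JobExecutionContext ctx) {
            // the actual work goes here
        }
    }

    public static JobDetail billingJobDetail() {
        return JobBuilder.newJob(BillingJob.class)
                .withIdentity("billingJob", "billing")
                .storeDurably()         // keep the QRTZ_JOB_DETAILS row even without a trigger
                .requestRecovery(true)  // re-fire on a healthy pod if the executing pod dies
                .build();
    }
}
```

The requestRecovery(true) flag is what turns a pod crash into a re-fire rather than a silently lost execution, so only set it on jobs that are safe to run twice.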
The /actuator/health output updates. Kubernetes uses this for readiness/liveness probes; a NotReady pod stops receiving traffic.
Use /actuator/health/readiness for the K8s readiness probe and /actuator/health/liveness for liveness. Don't use /actuator/health for both: a transient DB blip can liveness-fail your pod and trigger a restart loop.
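One way to wire that up in Spring Boot is to enable the dedicated probe groups. The property names below are standard Actuator configuration; including db in the readiness group (so a DB outage fails readiness but not liveness) is a choice shown here as an assumption:

```yaml
management:
  endpoint:
    health:
      probes:
        enabled: true   # exposes /actuator/health/liveness and /actuator/health/readiness
      group:
        readiness:
          include: readinessState,db   # DB outage -> NotReady (no traffic), but no restart
```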
Without @DisallowConcurrentExecution, each trigger fires a new job instance even if the last one is still running; threads pile up until the pool is exhausted. Flip the annotation on and see the difference.
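Flipping it on is a one-line change on the job class (the class name here is hypothetical):

```java
import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobExecutionContext;

// While one instance of this JobKey is running anywhere in the cluster,
// no other pod will fire it; overdue triggers wait (or misfire) instead.
@DisallowConcurrentExecution
public class ReportJob implements Job {
    @Override
    public void execute(JobExecutionContext ctx) {
        // long-running, non-reentrant work
    }
}
```

Note the annotation goes on the Job class but the guarantee is per JobKey, so two different JobDetails using the same class can still run concurrently.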
QRTZ_SCHEDULER_STATE row has gone stale; it does not use OS signals or Kubernetes events.

```yaml
spring:
  quartz:
    job-store-type: jdbc
    jdbc:
      initialize-schema: never          # run quartz_tables.sql manually in production
    properties:
      org.quartz:
        scheduler:
          instanceName: ClusteredScheduler
          instanceId: AUTO              # auto-generates unique ID per pod
        jobStore:
          class: org.springframework.scheduling.quartz.LocalDataSourceJobStore
          driverDelegateClass: org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
          tablePrefix: QRTZ_
          isClustered: true             # THE switch; without this, behavior is undefined
          clusterCheckinInterval: 7500  # ms; how often each pod heartbeats
          useProperties: false
        threadPool:
          class: org.quartz.simpl.SimpleThreadPool
          threadCount: 10
```
- QRTZ_JOB_DETAILS - job class names, durability, requestsRecovery flag, JobDataMap
- QRTZ_TRIGGERS - trigger metadata (next/prev fire time, state, misfire instruction)
- QRTZ_SIMPLE_TRIGGERS - SimpleTrigger-specific data (repeat count, repeat interval)
- QRTZ_CRON_TRIGGERS - CronTrigger-specific (cron expression, time zone)
- QRTZ_SIMPROP_TRIGGERS - calendar/daily-time-interval triggers (simple properties)
- QRTZ_BLOB_TRIGGERS - serialized custom triggers (rarely used)
- QRTZ_CALENDARS - serialized Calendar objects (exclusion calendars)
- QRTZ_PAUSED_TRIGGER_GRPS - paused trigger groups
- QRTZ_FIRED_TRIGGERS - currently executing triggers (one row per running execution; key for failover detection)
- QRTZ_SCHEDULER_STATE - heartbeat row per scheduler instance with LAST_CHECKIN_TIME
- QRTZ_LOCKS - row-level lock table (TRIGGER_ACCESS, STATE_ACCESS, JOB_ACCESS, CALENDAR_ACCESS, MISFIRE_ACCESS)

By default, Quartz is happy to fire the same job concurrently on different pods (or on the same pod for staggered triggers). If your job is non-idempotent, like sending an email or updating a balance, you don't want that.
Adding @DisallowConcurrentExecution on the JobDetail prevents concurrent runs cluster-wide. While job instance A is running on any pod, no other pod will fire the same JobKey.
Trap: if pod A crashes mid-job, its row lingers in QRTZ_FIRED_TRIGGERS and the job looks stuck. Cluster recovery (at the next checkin) cleans it up. With requestsRecovery=true the job re-fires; without it, future fires are simply unblocked.
- MISFIRE_INSTRUCTION_SMART_POLICY (default) - Quartz picks based on trigger type: simple = fire once now, cron = fire once now
- MISFIRE_INSTRUCTION_FIRE_ONCE_NOW - execute immediately, ignore how many were missed
- MISFIRE_INSTRUCTION_DO_NOTHING - skip all missed, wait for next regular fire
- MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY - disable misfire detection, fire all missed back-to-back (dangerous for high-frequency jobs)

Set per-trigger when building it: .withSchedule(cronSchedule(...).withMisfireHandlingInstructionFireAndProceed())
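For example, a cron trigger that skips missed fires entirely. The identity and schedule are placeholders:

```java
import org.quartz.CronScheduleBuilder;
import org.quartz.CronTrigger;
import org.quartz.TriggerBuilder;

public class MisfireDemo {
    public static CronTrigger nightlyTrigger() {
        return TriggerBuilder.newTrigger()
                .withIdentity("nightly", "reports")
                .withSchedule(
                        CronScheduleBuilder.cronSchedule("0 0 2 * * ?")     // 02:00 every day
                                .withMisfireHandlingInstructionDoNothing()) // skip missed, wait for next fire
                .build();
    }
}
```

DO_NOTHING is the sane choice for jobs where "run it late" is worse than "skip it", e.g. a nightly report that would otherwise fire at noon after a long outage.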
The cluster-wide lock on QRTZ_LOCKS:TRIGGER_ACCESS serializes trigger acquisition. With 3 nodes the contention is mild. With 8+ nodes the lock becomes the bottleneck; pods spend more time waiting for the lock than running jobs.
Workarounds:
- org.quartz.jobStore.acquireTriggersWithinLock=false (older Quartz; modern versions handle this)
- isClustered=true
- instanceId (don't hardcode it; use AUTO)
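Pulled together as a quartz.properties fragment. The acquireTriggersWithinLock line only applies to older Quartz versions, as noted above:

```properties
org.quartz.jobStore.isClustered=true
org.quartz.scheduler.instanceId=AUTO
# Older Quartz only; modern versions manage this safely themselves
org.quartz.jobStore.acquireTriggersWithinLock=false
```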