"Fire and Forget" Is a Lie: Why Your Async Code Drops Work
Fire-and-forget async code feels efficient: trigger an email, log an event, call an API, return a response, and move on.
Then production loses work. Emails never send, logs go missing, payments are inconsistent, and events vanish without clear errors.
Fire-and-forget often means fire and hope.
The Problem: Async Scheduling Is Not Guaranteed Execution
Triggering async work does not guarantee it completes. Between scheduling and execution, requests end, processes restart, runtimes shut down, and networks fail.
Where It Breaks in Real Systems
1. Request ends before task completes
handle_request():
    send_email_async()   # scheduled, never awaited
    return response      # the request ends here
The request lifecycle ends, cleanup runs, and the detached task can be abandoned with no retry.
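A minimal Python sketch of this failure, using asyncio. The function names mirror the pseudocode above; the `sent` list and `serve_one_request` are hypothetical helpers added for illustration. Here the event loop shutting down stands in for the cleanup that follows a request, which is an assumption: a real server may keep its loop alive longer, but nothing guarantees that.

```python
import asyncio

sent = []  # hypothetical record of emails that actually finished sending


async def send_email_async(user_id: int) -> None:
    await asyncio.sleep(0.05)  # stands in for a slow SMTP or API call
    sent.append(user_id)


async def handle_request(user_id: int) -> str:
    # Fire and forget: the task is scheduled, but nothing awaits it.
    asyncio.create_task(send_email_async(user_id))
    return "200 OK"


def serve_one_request(user_id: int) -> str:
    # asyncio.run() cancels tasks that are still pending once the main
    # coroutine returns, so the email task never completes.
    return asyncio.run(handle_request(user_id))
```

Calling `serve_one_request(123)` returns "200 OK" and leaves `sent` empty. The caller sees success, and the email is gone with no error raised anywhere.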
2. Process crash mid-task
The task starts, applies partial side effects, and then the process crashes. No built-in recovery exists.
3. Silent failures
Exceptions in unobserved async tasks can be swallowed, or surface only as a late runtime warning. No await means no structured error-handling path.
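If a task truly cannot be awaited, one mitigation is to attach a callback that retrieves its exception. The sketch below uses asyncio's `Task.add_done_callback`; the `failures` list, `flaky_job`, and `report_failure` are hypothetical names, and a real system would persist or alert rather than append to a list. This makes failures visible, but it does not make the work durable.

```python
import asyncio
import logging

logger = logging.getLogger("background")
failures = []  # hypothetical in-memory record of observed failures


async def flaky_job() -> None:
    raise RuntimeError("SMTP connection refused")


def report_failure(task: asyncio.Task) -> None:
    # Runs when the task finishes and retrieves its exception explicitly.
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        failures.append(repr(exc))
        logger.error("background task failed: %r", exc)


async def main() -> None:
    task = asyncio.create_task(flaky_job())  # keep a reference to the task
    task.add_done_callback(report_failure)
    await asyncio.sleep(0.01)  # give the task and its callback time to run


def run_demo() -> list:
    failures.clear()
    asyncio.run(main())
    return list(failures)
```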
4. Deploys drop in-flight work
Rolling restarts kill old instances. In-flight async work on those nodes disappears.
5. Overload starves task execution
A busy event loop delays background callbacks, and memory pressure can get the process killed before they ever run.
The Core Mistake
Treating in-process async execution as durable work processing is the failure.
The Fix: Make Work Durable Before It Runs
1. Persist first
INSERT INTO email_jobs (user_id, status)
VALUES (123, 'pending');
Now work survives crashes and can be audited and retried.
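As a rough sketch, the request handler can write the job row and return, leaving the actual send to a worker. This example uses Python's built-in sqlite3 so it runs anywhere; the column types, `open_db`, `enqueue_email`, and the response shape are assumptions, not a prescribed schema.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS email_jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id INTEGER NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue_email(conn: sqlite3.Connection, user_id: int) -> int:
    # "with conn" commits on success and rolls back on error, so the
    # job is recorded before the caller ever sees a response.
    with conn:
        cur = conn.execute(
            "INSERT INTO email_jobs (user_id, status) VALUES (?, 'pending')",
            (user_id,),
        )
    return cur.lastrowid


def handle_request(conn: sqlite3.Connection, user_id: int) -> dict:
    job_id = enqueue_email(conn, user_id)
    # No email is sent here; a separate worker picks up the pending row.
    return {"status": "accepted", "job_id": job_id}
```

With a file path instead of ":memory:", the pending row outlives the process that wrote it.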
2. Use queues for critical work
Payments, emails, and state mutations need queue-backed workers with retry and visibility.
3. Track execution state
Every task should move through explicit states:
pending -> processing -> completed | failed
Without state, you cannot detect dropped work or recover safely.
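One way to enforce those states is a guarded UPDATE that only succeeds when the row is still in the expected state. The sketch below assumes an `email_jobs` table in SQLite; `ALLOWED`, `transition`, and `stuck_jobs` are hypothetical helpers.

```python
import sqlite3

# Legal moves, matching pending -> processing -> completed | failed.
ALLOWED = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
}


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE email_jobs ("
        " id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT NOT NULL)"
    )
    return conn


def transition(conn: sqlite3.Connection, job_id: int, old: str, new: str) -> bool:
    if new not in ALLOWED.get(old, set()):
        raise ValueError(f"illegal transition {old} -> {new}")
    with conn:
        # The WHERE clause makes this a compare-and-set: it changes the
        # row only if the job is still in the state we expect.
        cur = conn.execute(
            "UPDATE email_jobs SET status = ? WHERE id = ? AND status = ?",
            (new, job_id, old),
        )
    return cur.rowcount == 1


def stuck_jobs(conn: sqlite3.Connection) -> list:
    # Jobs left in 'processing' are candidates for recovery after a crash.
    rows = conn.execute(
        "SELECT id FROM email_jobs WHERE status = 'processing' ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]
```

A production version would likely also store a timestamp so that only jobs stuck for longer than some threshold are recovered.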
4. Make handlers idempotent
process_email_if_not_sent(job_id)
Retries and duplicates are normal. Idempotency prevents repeated side effects.
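A toy version of `process_email_if_not_sent` might look like the following. The in-memory `job_status` dict and `sent_log` list are stand-ins for a real job table and email provider, and both are assumptions made to keep the example runnable.

```python
sent_log = []    # stand-in for the real email provider
job_status = {}  # stand-in for the persisted job table


def send_email(job_id: int) -> None:
    sent_log.append(job_id)


def process_email_if_not_sent(job_id: int) -> bool:
    # A retry or duplicate delivery of a finished job becomes a no-op.
    if job_status.get(job_id) == "completed":
        return False
    send_email(job_id)
    job_status[job_id] = "completed"
    return True
```

Calling it twice with the same `job_id` sends one email. Note the remaining gap: a crash after the send but before the status update would still produce a duplicate on retry, which is why many providers accept an idempotency key on the request itself.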
5. Handle shutdown gracefully
Workers should drain, checkpoint, or requeue in-flight tasks before termination.
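A small asyncio sketch of the drain-and-requeue idea follows. The names `worker`, `process`, and `run_until_shutdown` are hypothetical, and the timed `stop.set()` stands in for a SIGTERM handler. The in-flight job is allowed to finish, and anything not yet started is handed back for requeueing.

```python
import asyncio


async def process(job: int) -> None:
    await asyncio.sleep(0.01)  # stands in for real work


async def worker(queue: asyncio.Queue, stop: asyncio.Event, completed: list) -> None:
    while not stop.is_set():
        try:
            job = queue.get_nowait()
        except asyncio.QueueEmpty:
            await asyncio.sleep(0.005)
            continue
        await process(job)  # an in-flight job always runs to completion
        completed.append(job)


async def run_until_shutdown(jobs: list, shutdown_after: float):
    queue: asyncio.Queue = asyncio.Queue()
    for job in jobs:
        queue.put_nowait(job)
    stop = asyncio.Event()
    completed: list = []
    task = asyncio.create_task(worker(queue, stop, completed))

    await asyncio.sleep(shutdown_after)
    stop.set()   # in production, a signal handler would set this
    await task   # drain: wait for the in-flight job to finish

    leftover = []  # jobs never started; return them to the durable queue
    while not queue.empty():
        leftover.append(queue.get_nowait())
    return completed, leftover
```

The invariant worth checking is that every job ends up in exactly one of the two lists, so shutdown loses nothing.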
6. Log failures explicitly
Capture retries, errors, and final fail status. Silent background failure is operational debt.
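A minimal retry wrapper that logs every attempt and the final outcome could look like this. `run_with_retries` and its parameters are hypothetical, and backoff between attempts is omitted for brevity.

```python
import logging

logger = logging.getLogger("jobs")


def run_with_retries(job_id: int, handler, max_attempts: int = 3) -> str:
    for attempt in range(1, max_attempts + 1):
        try:
            handler(job_id)
        except Exception:
            # logger.exception records the traceback with the message.
            logger.exception(
                "job %s failed on attempt %d/%d", job_id, attempt, max_attempts
            )
        else:
            logger.info("job %s completed on attempt %d", job_id, attempt)
            return "completed"
    # The final failure is logged on its own line so it is easy to alert on.
    logger.error("job %s marked failed after %d attempts", job_id, max_attempts)
    return "failed"
```

The returned status is meant to be written back to the job record, so the final state is visible in both the logs and the data.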
7. Restrict fire-and-forget to non-critical work
Metrics and optional analytics are acceptable. Payments and user-facing state changes are not.
Pattern That Works in Production
Persist -> Queue -> Process -> Track.
This gives durability, observability, and predictable recovery under failure.
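Put together, the four steps can be sketched end to end with a database table acting as the queue. Everything here is an assumption chosen for illustration: the `jobs` schema, the `deliver` handler, and a single worker polling SQLite. A real deployment would more likely use a dedicated queue or row locking to support multiple workers.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " error TEXT)"
    )
    return conn


def persist(conn: sqlite3.Connection, payload: str) -> int:
    # Persist: the job exists on disk before any work starts.
    with conn:
        return conn.execute(
            "INSERT INTO jobs (payload) VALUES (?)", (payload,)
        ).lastrowid


def claim_next(conn: sqlite3.Connection):
    # Queue: take the oldest pending job (single-worker assumption).
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'processing' WHERE id = ?", (row[0],))
    return row


def deliver(payload: str) -> None:
    # Hypothetical handler: rejects anything that is not an address.
    if "@" not in payload:
        raise ValueError(f"invalid address: {payload}")


def work_once(conn: sqlite3.Connection, handler=deliver) -> bool:
    job = claim_next(conn)
    if job is None:
        return False
    job_id, payload = job
    try:
        handler(payload)  # Process
        status, error = "completed", None
    except Exception as exc:
        status, error = "failed", repr(exc)
    with conn:  # Track: record the outcome either way
        conn.execute(
            "UPDATE jobs SET status = ?, error = ? WHERE id = ?",
            (status, error, job_id),
        )
    return True
```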
What Not To Do
- Do not rely on raw async tasks for important work.
- Do not ignore task errors and exceptions.
- Do not assume detached tasks will always finish.
- Do not skip persistence for critical operations.
The Mental Shift
Stop thinking: "It does not need to block, so async is enough."
Start thinking: "Does this need to be guaranteed? If yes, make it durable."
Reliable systems ensure work is durable before execution, not after failure.