April 20, 2026 - 10 min read - Async Reliability

"Fire and Forget" Is a Lie: Why Your Async Code Drops Work

Fire-and-forget async code feels efficient: trigger an email, log an event, call an API, return the response, and move on.

Then production loses work. Emails never send, logs go missing, payments are inconsistent, and events vanish without clear errors.

Fire-and-forget often means fire and hope.

The Problem: Async Scheduling Is Not Guaranteed Execution

Triggering async work does not guarantee it completes. Between scheduling and execution, requests end, processes restart, runtimes shut down, and networks fail.

Where It Breaks in Real Systems

1. Request ends before task completes

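import asyncio

async def handle_request(request):
    # Detached task: nothing awaits it, so nothing notices if it never finishes.
    asyncio.create_task(send_email_async(request))
    return {"status": "accepted"}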

The request lifecycle ends, cleanup runs, and the detached task can be abandoned with no retry.

2. Process crash mid-task

The task starts, applies partial side effects, and then the process crashes. Nothing recorded how far it got, so no built-in recovery exists.

3. Silent failures

Unobserved async exceptions can be swallowed. Without an await, no caller is left to receive the error, so there is no structured error-handling path.
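To make the silent-failure case concrete, here is a minimal Python asyncio sketch. The names `send_email`, `make_reporter`, and `main` are hypothetical and not from any real codebase. It shows one way to give a detached task an error path: attach a done callback that retrieves the exception. The `asyncio.wait` call is there only to keep the demo deterministic; a real handler would return without it.

```python
import asyncio
import logging

logger = logging.getLogger("background")


async def send_email(user_id: int) -> None:
    # Hypothetical task that always fails, standing in for a real mail call.
    raise RuntimeError(f"SMTP unavailable for user {user_id}")


def make_reporter(failures: list):
    def report(task: asyncio.Task) -> None:
        # Called when the task finishes; retrieving the exception here
        # means it is logged and recorded instead of silently dropped.
        if task.cancelled():
            return
        exc = task.exception()
        if exc is not None:
            logger.error("background task failed: %r", exc)
            failures.append(repr(exc))
    return report


async def main() -> list:
    failures: list = []
    # Keep a reference: the event loop holds only weak references to tasks.
    task = asyncio.create_task(send_email(123))
    task.add_done_callback(make_reporter(failures))
    # Demo only: wait for completion without re-raising the exception.
    await asyncio.wait([task])
    return failures
```

Calling `asyncio.run(main())` returns a list with one recorded failure. This makes the error visible, but it does not make the work durable: if the process dies first, the callback never runs.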

4. Deploys drop in-flight work

Rolling restarts kill old instances. In-flight async work on those nodes disappears.

5. Overload starves task execution

Under load, busy event loops delay background callbacks, and memory pressure can kill the process before queued callbacks ever run.

The Core Mistake

The failure is treating in-process async execution as durable work processing.

The Fix: Make Work Durable Before It Runs

1. Persist first

INSERT INTO email_jobs (user_id, status)
VALUES (123, 'pending');

Now the work survives crashes and can be audited and retried.
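As a rough illustration of the insert above, the following Python sketch uses the standard-library `sqlite3` module as a stand-in for whatever database the service already has. The `email_jobs` columns beyond `user_id` and `status`, and the helper names `open_db`, `enqueue_email`, and `pending_jobs`, are assumptions made for the example.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    # Use a file path in practice; an in-memory database would not survive a crash.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS email_jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " user_id INTEGER NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue_email(conn: sqlite3.Connection, user_id: int) -> int:
    # Commit the job row before the request is acknowledged.
    with conn:
        cur = conn.execute(
            "INSERT INTO email_jobs (user_id, status) VALUES (?, 'pending')",
            (user_id,),
        )
    return cur.lastrowid


def pending_jobs(conn: sqlite3.Connection) -> list:
    # After a restart, this is how a worker rediscovers unfinished work.
    rows = conn.execute(
        "SELECT id FROM email_jobs WHERE status = 'pending' ORDER BY id"
    )
    return [row[0] for row in rows]
```

The request handler would call `enqueue_email` and return; a separate worker picks the row up later. Because the row is committed first, a crash between the two steps leaves a `pending` job to find instead of nothing.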

2. Use queues for critical work

Payments, emails, and state mutations need queue-backed workers with retry and visibility.

3. Track execution state

Every task should move through explicit states:

pending -> processing -> completed | failed

Without state, you cannot detect dropped work or recover safely.
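One way to enforce those states is a small transition table, sketched below in Python. The `State` enum mirrors the four states listed above; `ALLOWED`, `transition`, and `find_stuck` are hypothetical helpers, and the idea of treating a job stuck in `processing` past a timeout as dropped is an assumption about how detection might work.

```python
from enum import Enum


class State(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# pending -> processing -> completed | failed
ALLOWED = {
    State.PENDING: {State.PROCESSING},
    State.PROCESSING: {State.COMPLETED, State.FAILED},
    State.COMPLETED: set(),
    State.FAILED: set(),
}


def transition(current: State, new: State) -> State:
    # Reject any move the state machine does not define.
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def find_stuck(jobs: dict, now: float, timeout: float) -> list:
    """jobs maps job_id -> (state, started_at); return IDs stuck in processing."""
    return sorted(
        job_id
        for job_id, (state, started_at) in jobs.items()
        if state is State.PROCESSING and now - started_at > timeout
    )
```

A periodic sweep with `find_stuck` is what turns "the task vanished" into a row you can inspect and requeue.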

4. Make handlers idempotent

process_email_if_not_sent(job_id)

Retries and duplicates are normal. Idempotency prevents repeated side effects.

5. Handle shutdown gracefully

Workers should drain, checkpoint, or requeue in-flight tasks before termination.
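A drain step might look like the asyncio sketch below. `drain` and its parameters are hypothetical; the sketch assumes the worker tracks in-flight tasks in a dict keyed by job ID and that something else (for example a SIGTERM handler) decides when to call it.

```python
import asyncio


async def drain(in_flight: dict, grace_seconds: float) -> list:
    """Wait for in-flight tasks, then cancel stragglers.

    in_flight maps job_id -> asyncio.Task. Returns the job IDs that did
    not finish within the grace period and should be requeued.
    """
    if not in_flight:
        return []
    done, pending = await asyncio.wait(
        list(in_flight.values()), timeout=grace_seconds
    )
    for task in pending:
        task.cancel()
    # Let the cancellations propagate before the process exits.
    await asyncio.gather(*pending, return_exceptions=True)
    return sorted(job_id for job_id, task in in_flight.items() if task in pending)
```

The returned IDs are the ones to move back to `pending` in the job store. Tasks that finished with an error during the grace period are not inspected here; a fuller version would check those too.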

6. Log failures explicitly

Capture retries, errors, and the final failure status. Silent background failure is operational debt.

7. Restrict fire-and-forget to non-critical work

Metrics and optional analytics are acceptable. Payments and user-facing state changes are not.

Pattern That Works in Production

Persist -> Queue -> Process -> Track.

This gives durability, observability, and predictable recovery under failure.

What Not To Do

Do not rely on raw async tasks for important work.
Do not ignore task errors and exceptions.
Do not assume detached tasks will always finish.
Do not skip persistence for critical operations.

The Mental Shift

Stop thinking: "It does not need to block, so async is enough."

Start thinking: "Does this need to be guaranteed? If yes, make it durable."

Reliable systems ensure work is durable before execution, not after failure.