The "Post-Mortem Drafter"
Turn an incident Slack thread into an RCA doc before the next standup.
THE PROMPT
Act as an engineering team lead writing a post-mortem / root cause analysis (RCA) document.
I'll give you a raw incident timeline — this could be a Slack thread, bullet point notes,
a PagerDuty log, or a rough description of what happened.
Produce a structured post-mortem with these sections:
1. Incident Summary — what happened, when, and how long (2 sentences)
2. Timeline — clean chronological list with timestamps if available
3. Root Cause — the actual underlying cause (not just the symptom)
4. Contributing Factors — 2–3 things that made this worse or allowed it to happen
5. Customer / Business Impact — quantified where possible
6. Action Items — 3–5 concrete, owner-assignable follow-up tasks
7. What went well — at least 1 thing (blameless culture matters)
Tone: Factual, blameless, constructive. No finger-pointing. Focus on systems, not people.
[PASTE INCIDENT TIMELINE OR NOTES HERE]
What it does
Post-mortems are essential but universally dreaded because they're time-consuming to write under pressure. This prompt takes the raw, chaotic record of an incident and structures it into a clean, blameless RCA in minutes — freeing the engineering team to focus on the fixes rather than the paperwork.
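For teams that want to drop this into a tool rather than copy-paste it, the template above can be wrapped in a small helper that splices the raw incident notes into the prompt. A minimal sketch in Python — the function name `build_rca_prompt` and the constant `RCA_TEMPLATE` are hypothetical, not from any SDK; the template text mirrors the prompt above:

```python
# Hypothetical helper that fills the post-mortem prompt template with raw
# incident notes. The resulting string can be sent to any LLM of your choice.
RCA_TEMPLATE = """Act as an engineering team lead writing a post-mortem / root cause analysis (RCA) document.
I'll give you a raw incident timeline — this could be a Slack thread, bullet point notes,
a PagerDuty log, or a rough description of what happened.
Produce a structured post-mortem with these sections:
1. Incident Summary — what happened, when, and how long (2 sentences)
2. Timeline — clean chronological list with timestamps if available
3. Root Cause — the actual underlying cause (not just the symptom)
4. Contributing Factors — 2–3 things that made this worse or allowed it to happen
5. Customer / Business Impact — quantified where possible
6. Action Items — 3–5 concrete, owner-assignable follow-up tasks
7. What went well — at least 1 thing (blameless culture matters)
Tone: Factual, blameless, constructive. No finger-pointing. Focus on systems, not people.

{incident_notes}
"""

def build_rca_prompt(incident_notes: str) -> str:
    """Return the full prompt with the raw incident notes spliced in."""
    notes = incident_notes.strip()
    if not notes:
        raise ValueError("incident_notes is empty; paste the raw timeline first")
    return RCA_TEMPLATE.format(incident_notes=notes)
```

The empty-input guard matters in practice: sending the bare template with no notes tends to make the model invent an incident from scratch.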
Example in Action
Raw Data:
"2:14am alerts fire — loan origination API returning 503s. On-call woke up. Took 20 min to find it was the 3rd party identity verification (IDV) provider. They had a rate limit we hit due to a marketing campaign that drove 3x normal volume. Originations down for 47 minutes. About 200 applications dropped. Marketing didn't tell eng about the campaign. We don't have rate limit monitoring on that API."
Result:
Incident Summary: On [date] at 02:14 AM CST, OppFi's loan origination API began returning 503 errors due to a third-party identity verification provider hitting its rate limit. The incident lasted 47 minutes and affected approximately 200 loan applications.
Timeline:
- 02:14 — PagerDuty alert fires; origination API returning 503s
- 02:19 — On-call engineer begins investigation
- 02:34 — Root cause isolated to IDV provider rate limiting
- 03:01 — Rate limit raised via vendor emergency escalation; originations resume
- 03:11 — All-clear confirmed; incident closed
Root Cause: A 3x spike in origination volume driven by a marketing campaign exhausted the rate limit on the third-party IDV API. No monitoring existed for this limit, so there was no advance warning.
Contributing Factors:
- No cross-functional communication process between Marketing and Engineering for campaigns that affect origination volume
- No rate limit monitoring or alerting on the IDV API integration
- Rate limit was set at contract signing and never revisited as volume grew
Business Impact: ~200 applications dropped during the window; estimated $X in potential originations at risk. Customer experience impact unknown (applicants likely saw generic error).
Action Items:
- [Owner: Infra] Add rate limit monitoring + alerting to all third-party API integrations — due [date]
- [Owner: Eng + Marketing] Create campaign notification process (Marketing notifies Eng >48hrs before any campaign expected to 2x+ normal volume) — due [date]
- [Owner: Partnerships] Renegotiate IDV rate limits based on current and projected volume — due [date]
What went well: On-call identified and resolved the root cause in 47 minutes despite facing a novel failure mode with no runbook.
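Before a drafted post-mortem is circulated, it is worth checking that the model actually produced all seven sections — LLM output occasionally drops one. A minimal sanity check, assuming the section headers from the prompt above (matching on "Impact" covers both "Customer / Business Impact" and the shortened "Business Impact" seen in the example):

```python
# Hypothetical checker: list which required post-mortem sections are missing
# from a drafted document. Section names come from the prompt template above.
REQUIRED_SECTIONS = [
    "Incident Summary",
    "Timeline",
    "Root Cause",
    "Contributing Factors",
    "Impact",          # matches "Customer / Business Impact" or "Business Impact"
    "Action Items",
    "What went well",
]

def missing_sections(post_mortem: str) -> list[str]:
    """Return the required section headers absent from the drafted doc."""
    return [s for s in REQUIRED_SECTIONS if s not in post_mortem]
```

If the returned list is non-empty, re-run the prompt or ask the model to fill in the missing sections before sharing the doc.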