Root Cause Analysis (RCA) Overview

Published Date: May 13, 2025

Validated: Yes

Audience: Everyone

Products and Versions Covered:

Cloud/CVC
Self-hosted, Replicated - KOTS
Jama Connect®

Summary

A Root Cause Analysis (RCA) is a structured investigation conducted after an issue has been resolved or temporarily mitigated. The investigation is tracked in a separate ticket created specifically for the RCA.

An RCA identifies the underlying cause of a production outage and provides recommendations to prevent recurrence.

RCA Prerequisites

Customers may request an RCA only for a Severity 1 Production Outage.
A Severity 1 Production Outage ticket must have been created during the outage.
The customer must explicitly request an RCA.

Required Information

Support cannot complete an RCA without sufficient context and data. Customers must provide:

A clear and detailed description of the issue
Relevant logs and/or screenshots
Thread dumps captured during performance-related incidents
A brief timeline outlining when the issue occurred and key events

Customer Engagement

The customer must actively collaborate with Support.
Timely responses to information requests are required.
If the customer stops responding, Support documents the findings available at that point, notifies the customer, and then closes the ticket.

Customer-Requested RCA

Important expectations:

Submitting an RCA request does not guarantee that an RCA will be performed.
Determining a definitive root cause is not always possible.
An RCA ticket may remain open for several months, depending on complexity.
Communication typically occurs monthly through the dedicated RCA ticket.
The original Severity 1 ticket will be closed.
A new ticket will be created specifically for the RCA investigation.

Support will investigate, document findings, and share recommendations where applicable.

Resolution

The RCA Process

RCA investigations differ from standard support cases and do not follow standard SLA response timelines.

RCA tickets have lower priority than active production issues because the incident has already been resolved or mitigated.

Whenever possible, Support follows these structured steps:

1. Investigate

Collect logs, system data, and configuration information
Analyze events leading up to the issue
Reproduce the problem if feasible
Identify potential failure points

2. Report

Document findings
File a defect or bug report if applicable
Provide detailed technical analysis to relevant teams

3. Recommend

Provide mitigation steps
Recommend configuration improvements
Suggest process changes or upgrades
Outline preventive measures to reduce recurrence
