What Is Root Cause Analysis (RCA)?

Read about root cause analysis, its methods and benefits.

  • Root cause analysis (RCA) is the process of finding and analyzing the causes of a problem or an event that affects the value delivery of an application or an organization. RCA can uncover one or more underlying root causes that must be addressed to solve the problem and prevent it from recurring.

    Many science and engineering domains employ RCA to find causes of problems and develop a systematic approach to address them. In hazardous workplace environments, investigators perform root cause analysis to understand what led to an incident or a system failure and subsequently prevent such events. In public health systems, epidemiologists perform RCA to determine underlying conditions and implement an emergency response to contain a disease outbreak.

    In IT operations, successfully determining root causes is instrumental in minimizing the impact of problems such as a data breach, application downtime, or a service disruption.

  • A wide range of methods, tools, techniques, and philosophies influence RCA. While the techniques and methodologies used can vary across problem domains, the phases involved in RCA are common to all of them.

    Problem Definition: The process of root cause analysis starts by clearly defining the problem and its impact. This step is critical because a poorly described problem hampers identifying root causes and subsequently addressing the problem.

    For example, based on an alert from the IT monitoring system, the IT Ops team might poorly define the problem as “the host machine restarted abruptly.” However, a restart is only one possible explanation for the host machine being unavailable. It could also be a network issue, a badly configured host machine, or an error in the monitoring itself. Hence, a more accurate description in this case is “the host machine went offline abruptly for a certain period of time.”

    Data, Information, and Evidence: In this phase, any relevant information about the problem should be gathered. In the IT Ops realm, this data is typically available from systems used for log management, application performance monitoring, network monitoring, etc. It’s important to gather only relevant data and to differentiate facts from opinions. This phase should answer:

    • how long has the problem or event existed?
    • are there any specific circumstances that led to the problem?
    • did a sequence of internal or external events lead to the problem?
    • what are the symptoms of the problem?
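
    As a rough illustration of gathering only relevant data, the sketch below keeps the records that fall within a time window around an incident and orders them oldest first. The record fields, sample messages, and the `gather_evidence` helper are assumptions made for this example and do not reflect the export format of any particular monitoring tool.

```python
from datetime import datetime, timedelta

# Hypothetical records standing in for data exported from log management,
# application performance monitoring, and network monitoring systems.
records = [
    {"time": datetime(2024, 1, 10, 8, 0), "source": "network", "message": "routine config backup completed"},
    {"time": datetime(2024, 1, 10, 9, 55), "source": "monitoring", "message": "host01 heartbeat missed"},
    {"time": datetime(2024, 1, 10, 9, 50), "source": "host", "message": "host01 memory usage above 95%"},
    {"time": datetime(2024, 1, 10, 14, 30), "source": "app", "message": "scheduled report generated"},
]


def gather_evidence(records, incident_time, window):
    """Return records within `window` before or after the incident, oldest first."""
    start, end = incident_time - window, incident_time + window
    relevant = [r for r in records if start <= r["time"] <= end]
    return sorted(relevant, key=lambda r: r["time"])


incident_time = datetime(2024, 1, 10, 10, 0)
evidence = gather_evidence(records, incident_time, timedelta(minutes=30))
for record in evidence:
    print(record["time"].isoformat(), record["source"], record["message"])
```

    A time window is only one possible relevance filter. In practice, investigators might also filter by host, service, or severity, and would still need to separate recorded facts from opinions by hand.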

    Issues and Events Contributing to the Problem: Based on the information and data, the issues and events concerning the problem are identified. These issues are mapped, and the sequence is analyzed to reach the main contributing factors of the problem.

    Determine Root Causes: Cause-and-effect relationships are determined by analyzing the contributing factors of the problem. This allows classifying the causes into three main categories:

    • Direct Cause: Causes that contribute directly to the problem and are the most visible
    • Indirect Cause: Causes that do not directly create the problem but are related to it in some way or play a role in a direct cause
    • Root Cause: The most fundamental causes underlying the problem that must be addressed to prevent it from recurring

    After determining root causes, recommendations and mitigation plans are identified to eliminate the problem or prevent it from recurring.

  • A variety of RCA methods are available to find the fundamental causes of a problem. Depending on the problem domain and context, investigators use a single method or a combination of methods. Five Whys and Fishbone Diagram Analysis are among the most popular.

    Five Whys: A simple cause-and-effect approach that begins by questioning why a problem occurred. The response to the first why question acts as input for the next why question. This chain of questions and responses leads to one or more root causes of the problem. To be effective, however, it’s critical to ask the right questions and respond based on facts. Moreover, the number of questions that need to be asked can be as few as two or as many as 40 until all root causes are identified.
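
    The chaining described above, where each response feeds the next why question, can be sketched in a few lines. The `five_whys` helper and the sample answers are hypothetical. In a real investigation, each answer would come from gathered evidence rather than a hard-coded list.

```python
def five_whys(problem, answers):
    """Build a chain of why-questions; each answer becomes the subject of the next one."""
    chain = []
    subject = problem
    for answer in answers:
        chain.append((f"Why did this happen: {subject}?", answer))
        subject = answer  # the response is the input for the next why question
    return chain


# Hypothetical fact-based answers recorded by an investigator.
answers = [
    "the host ran out of memory",
    "a service leaked memory after the last deployment",
    "the release skipped the load test",
]

chain = five_whys("the host machine went offline abruptly", answers)
for question, answer in chain:
    print(question, "->", answer)

# The last answer in the chain is the candidate root cause.
root_cause = chain[-1][1]
```

    This sketch follows a single chain. A real analysis may branch when a question has more than one valid answer, which is how several root causes can emerge.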

    Fishbone Diagram Analysis: Also known as Ishikawa Cause and Effect Analysis, this root cause analysis method uses a visualization that looks like a fish skeleton. The head contains the problem definition, and the body represents each cause as a bone, along with the factors contributing to that cause. The analysis continues until one or more root causes are identified.
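
    One way to picture the fishbone structure is as a small nested data structure: a problem at the head, and each cause as a bone carrying its contributing factors. The sketch below is a plain-text rendering under that assumption. The dictionary layout, the `render_fishbone` helper, and the listed factors are illustrative and not part of any standard notation.

```python
# Hypothetical fishbone for the host example: each key under "bones" is a
# cause, and each list holds factors contributing to that cause.
fishbone = {
    "problem": "the host machine went offline abruptly",
    "bones": {
        "Network": ["switch port flapping"],
        "Configuration": ["memory limit not set", "swap disabled"],
        "Monitoring": ["probe timeout too short"],
    },
}


def render_fishbone(diagram):
    """Render the head (problem) and each bone (cause with factors) as indented text."""
    lines = [f"Problem: {diagram['problem']}"]
    for cause, factors in diagram["bones"].items():
        lines.append(f"  Cause: {cause}")
        for factor in factors:
            lines.append(f"    Factor: {factor}")
    return "\n".join(lines)


print(render_fishbone(fishbone))
```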

  • RCA is an iterative process and is most effective when carried out systematically. It benefits an organization by helping to:

    • Find permanent solutions to problems by identifying the most fundamental causes
    • Create an organized, practical approach to problem-solving using the organizational data
    • Identify organizational needs for improvement by analyzing past data
  • Put simply, RCA focuses on analyzing a problem to find what happened, how it occurred, and why it happened, which helps determine and implement permanent solutions to the problem.
