[VisualPro Tech Insight] Hidden Risks in Semiconductor Fabs? A Guide to STPA Safety Analysis Examples

Modern systems such as smart factories, autonomous vehicles, and aerospace and aviation systems are far more complex than their predecessors. Major accidents are now often caused not by simple component failures but by 'incorrect interactions between system components.'


To manage the complexity of these modern systems, engineers use STPA (System-Theoretic Process Analysis), a safety analysis technique developed at MIT. Today, we will explore how to conduct an STPA analysis focusing on 'Semiconductor Fab Safety' using our VisualPro software.

Note: The risk-related scenarios and analysis content in this material are fictional, created with the assistance of AI, and are not related to any specific real-world organization or individual.




Step 1: What Must Be Prevented? (Defining Losses and Hazards) 

The first step is to identify the situations that must not occur in the system (Losses) and the system states that lead to those losses (Hazards).

In this analysis, the following losses were identified:

| ID | Loss | Traceability |
|---|---|---|
| Loss-1 | Loss of life and injury (toxic gas leaks, explosions, occupational diseases, physical safety accidents) | H-1, H-2, H-3, H-5 |
| Loss-2 | Environmental pollution (atmospheric dispersion of harmful gases, unauthorized discharge of toxic wastewater, soil contamination) | H-1, H-4 |
| Loss-3 | Financial loss and equipment damage (damage to expensive equipment due to fire/explosion, mass product disposal and recalls) | H-1, H-2, H-3, H-4, H-5 |
| Loss-4 | Mission (production) failure (full factory shutdown due to accidents, failure to secure yield, supply chain disruption) | H-1, H-2, H-4, H-5 |

The following hazards were defined as the system states that can lead to these losses:

| ID | Hazard | Traceability |
|---|---|---|
| H-1 | Leakage of highly toxic and corrosive chemicals | Loss-1, Loss-2, Loss-3, Loss-4 |
| H-2 | Accumulation of flammable/explosive gases in confined spaces and exposure to ignition sources | Loss-1, Loss-3, Loss-4 |
| H-3 | Neutralization of shielding and interlocks for high-energy sources (radiation, high temperature, etc.) | Loss-1, Loss-3 |
| H-4 | Bypassing of hazardous pollutant purification facilities and untreated discharge | Loss-2, Loss-3, Loss-4 |
| H-5 | Non-compliance with Lockout/Tagout (LOTO) procedures during hazardous work and neglect of defects | Loss-1, Loss-3, Loss-4 |
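Because each loss lists its hazards and each hazard lists its losses, the two tables can be checked against each other. The following is a minimal Python sketch of such a check, not a description of how VisualPro manages traceability; the dictionary and function names are hypothetical, and the data is copied from the two tables above.

```python
# Traceability exactly as stated in the loss and hazard tables above.
LOSS_TO_HAZARDS = {
    "Loss-1": {"H-1", "H-2", "H-3", "H-5"},
    "Loss-2": {"H-1", "H-4"},
    "Loss-3": {"H-1", "H-2", "H-3", "H-4", "H-5"},
    "Loss-4": {"H-1", "H-2", "H-4", "H-5"},
}
HAZARD_TO_LOSSES = {
    "H-1": {"Loss-1", "Loss-2", "Loss-3", "Loss-4"},
    "H-2": {"Loss-1", "Loss-3", "Loss-4"},
    "H-3": {"Loss-1", "Loss-3"},
    "H-4": {"Loss-2", "Loss-3", "Loss-4"},
    "H-5": {"Loss-1", "Loss-3", "Loss-4"},
}


def find_mismatches(loss_to_hazards, hazard_to_losses):
    """Return (loss, hazard) pairs recorded in one table but not the other."""
    forward = {(loss, h) for loss, hs in loss_to_hazards.items() for h in hs}
    backward = {(loss, h) for h, ls in hazard_to_losses.items() for loss in ls}
    # Symmetric difference: links that appear in only one direction.
    return sorted(forward ^ backward)


mismatches = find_mismatches(LOSS_TO_HAZARDS, HAZARD_TO_LOSSES)
print("Traceability mismatches:", mismatches)
```

For the tables in this example the list comes back empty, meaning every loss-to-hazard link is mirrored by a hazard-to-loss link.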



Step 2: How Is the System Controlled? (Control Structure Modeling) 

This step visualizes the commands and feedback exchanged between system components. Using VisualPro, even complex diagrams can be drawn with ease.

In the semiconductor fab control structure, there are broadly four controllers above the controlled process, the Semiconductor Manufacturing Process and Pollution Prevention Facilities:

  • Government and Regulatory/Standards Bodies: Enforce legal compliance and issue corrective orders.
  • Top Management and Corporate EHS Organization: Issue Standard Operating Procedures (SOPs) and operate the work permit system.
  • On-site Integrated System and Safety Manager: Execute automatic equipment interlocks and Emergency Off (EMO) mechanisms.
  • Field Operator and Maintenance Personnel: Manually operate chemical equipment and implement LOTO.

[Figure: control structure diagram]

Diagramming this Master structure yields the following components:

  • Human
    Name: Field Operator and Maintenance Personnel
    Type: Human
    Level: 1
    Responsibilities: [None]
    Process Model: [None]
    Control Action: Manual operation of chemical equipment & Implementation of LOTO (Lockout/Tagout)
    Target of Control Action: Semiconductor Manufacturing Process and Pollution Prevention Facilities
    Feedback: Reporting of safety blind spots and hazards (Near-misses)
    Target of Feedback: Top Management and Corporate EHS Organization


  • Controller
    Name: On-site Integrated System and Safety Manager
    Type: Controller
    Level: 1
    Responsibilities: [None]
    Process Model: [None]
    Control Action: Automated equipment Interlock & Emergency Off (EMO)
    Target of Control Action: Semiconductor Manufacturing Process and Pollution Prevention Facilities
    Feedback: Reporting of on-site diagnostic data & Safety expenditure status
    Target of Feedback: Top Management and Corporate EHS Organization


  • Controlled Process
    Name: Semiconductor Manufacturing Process and Pollution Prevention Facilities
    Type: Controlled Process
    Level: 1
    Control Action: External discharge of treated wastewater and exhaust gas
    Target of Control Action: Surrounding Ecosystem (External Factors) and Local Community
    Feedback: Physical pressure gauge indication & Emergency siren alarm
    Target of Feedback: Field Operator and Maintenance Personnel
    Feedback: Real-time data streaming (Gas concentration, vibration, etc.)
    Target of Feedback: On-site Integrated System and Safety Manager


  • External Controller
    Name: Government and Regulatory/Standards Bodies
    Type: External Controller
    Level: 1
    Control Action: Enforcement of legal compliance & Corrective orders
    Target of Control Action: Top Management and Corporate EHS Organization
    Feedback: [None]
    Target of Feedback: [None]
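One way to review a control structure like this is to treat control actions and feedback as directed links and look for control actions that have no feedback returning to the controller. The sketch below is only an illustration in Python; it assumes nothing about VisualPro's internal model. The short variable names and the `open_loops` helper are hypothetical, the ecosystem name is shortened, and the links are taken from the component definitions above plus the two Top Management control actions that appear in the Step 3 table.

```python
from collections import namedtuple

# A directed link in the control structure (control action or feedback).
Link = namedtuple("Link", ["source", "target", "label"])

GOV = "Government and Regulatory/Standards Bodies"
TOP = "Top Management and Corporate EHS Organization"
SYS = "On-site Integrated System and Safety Manager"
OPS = "Field Operator and Maintenance Personnel"
PROC = "Semiconductor Manufacturing Process and Pollution Prevention Facilities"
ECO = "Surrounding Ecosystem and Local Community"

CONTROL_ACTIONS = [
    Link(GOV, TOP, "Enforcement of legal compliance & corrective orders"),
    Link(TOP, OPS, "SOP issuance & work permit system operation"),
    Link(TOP, SYS, "Issuance of safety goals & direction for field supervision"),
    Link(SYS, PROC, "Automated equipment interlock & Emergency Off (EMO)"),
    Link(OPS, PROC, "Manual operation of chemical equipment & LOTO"),
    Link(PROC, ECO, "External discharge of treated wastewater and exhaust gas"),
]
FEEDBACK = [
    Link(OPS, TOP, "Reporting of safety blind spots and hazards (near-misses)"),
    Link(SYS, TOP, "On-site diagnostic data & safety expenditure status"),
    Link(PROC, OPS, "Physical pressure gauge indication & emergency siren"),
    Link(PROC, SYS, "Real-time data streaming"),
]


def open_loops(control_actions, feedback):
    """Return control actions whose target sends no feedback to the source."""
    # A feedback link from B to A closes the loop for a control action A -> B.
    closed = {(f.target, f.source) for f in feedback}
    return [c for c in control_actions if (c.source, c.target) not in closed]


for link in open_loops(CONTROL_ACTIONS, FEEDBACK):
    print(f"No feedback path: {link.source} -> {link.target}")
```

With the links listed here, the check flags the regulator-to-management link and the discharge to the surrounding ecosystem. A flagged link is not automatically a flaw; it is a prompt to ask how the controller would learn that its action had the intended effect.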




Step 3: Which Control Actions Create Risks? (UCA Identification) 

Now, it is time to identify Unsafe Control Actions (UCAs). We analyze the hazards that arise when a control action is not provided, is provided incorrectly, is provided too early, too late, or out of order, or is stopped too soon or applied too long.

Through VisualPro, we were able to identify critical UCAs such as the following:

  • (UCA-N-1): The Top Management/EHS Organization fails to issue a mandatory work permit requiring a preliminary risk assessment, leaving workers to perform inspections exposed to danger (Not providing).
  • (UCA-P-2): The permit is arbitrarily and incorrectly approved due to production schedule pressure even though gas shut-off is incomplete (Providing).
  • (UCA-N-9): Field operators proceed with opening equipment without physically locking out hazardous energy valves (LOTO) (Not providing).

The full list of derived UCAs is shown below.


| Control Action (Source -> Target) | Level | UCA Flag | UCA ID | Description | Assumption |
|---|---|---|---|---|---|
| SOP Issuance & Work Permit System Operation (Top Management & Corporate EHS Organization -> Field Operators & Maintenance Personnel) | 1 | Not providing causes hazard | (UCA-N-1) | Failure to issue a mandatory work permit requiring a preliminary risk assessment, leaving the operator to conduct inspections exposed to danger [H-1, H-2] | |
| | | Providing causes hazard | (UCA-P-2) | Arbitrary incorrect permit approval due to production schedule pressure even though gas shut-off is incomplete [H-1, H-2] | |
| | | Too early, too late, out of order | (UCA-T-3) | Allowing entry too early before the residual toxic gas purge is complete [H-1] | |
| | | Stopped too soon, applied too long | (UCA-S-4) | Leaving the existing permit approval state active for too long without re-evaluation despite changes in the hazardous environment [H-1] | |
| Enforcement of Legal Compliance & Corrective Orders (Government & Regulatory/Standards Bodies -> Top Management & Corporate EHS Organization) | 1 | Not providing causes hazard | | | |
| | | Providing causes hazard | | | |
| | | Too early, too late, out of order | | | |
| | | Stopped too soon, applied too long | | | |
| Automated Equipment Interlock & Emergency Off (EMO) (On-site Integrated System & Safety Manager -> Semiconductor Manufacturing Process & Pollution Prevention Facilities) | 1 | Not providing causes hazard | (UCA-N-5) | Failure to provide Interlock/EMO emergency control even after detecting signs of a gas leak, leaving the leak unattended [H-1, H-2] | |
| | | Providing causes hazard | (UCA-P-6) | Forcible EMO activation due to sensor malfunction during the process, shutting off even essential cooling systems [H-1] | |
| | | Too early, too late, out of order | (UCA-T-7) | Emergency activation is too late due to leak notification relay delays, resulting in fatal concentration levels being exceeded [H-1] | |
| | | Stopped too soon, applied too long | (UCA-S-8) | Releasing the safety shut-off too early while risk factors are still unresolved, causing a secondary leak [H-1] | |
| Issuance of Safety Goals & Direction for Field Supervision (Top Management & Corporate EHS Organization -> On-site Integrated System & Safety Manager) | 1 | Not providing causes hazard | | | |
| | | Providing causes hazard | | | |
| | | Too early, too late, out of order | | | |
| | | Stopped too soon, applied too long | | | |
| External Discharge of Treated Wastewater & Exhaust Gas (Semiconductor Manufacturing Process & Pollution Prevention Facilities -> Surrounding Ecosystem & Local Community) | 1 | Not providing causes hazard | | | |
| | | Providing causes hazard | | | |
| | | Too early, too late, out of order | | | |
| | | Stopped too soon, applied too long | | | |
| Manual Operation of Chemical Equipment & LOTO Implementation (Field Operators & Maintenance Personnel -> Semiconductor Manufacturing Process & Pollution Prevention Facilities) | 1 | Not providing causes hazard | (UCA-N-9) | Proceeding with equipment opening without physically locking out (LOTO) the hazardous energy valves [H-3, H-5] | |
| | | Providing causes hazard | (UCA-P-10) | Applying LOTO to the wrong valve or mistakenly locking the scrubber exhaust valve where ventilation is essential [H-3, H-5] | |
| | | Too early, too late, out of order | (UCA-T-11) | Implementing the LOTO lock too late, long after entering the work environment, leading to exposure from initially emitted gases [H-5] | |
| | | Stopped too soon, applied too long | (UCA-S-12) | Releasing the LOTO lock too early before checking the surroundings while a colleague is still inside, causing a gas/power leak [H-5] | |
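The UCA table is in effect a grid of control actions against the four UCA types, so it is easy to list which cells are still empty. The Python sketch below illustrates that coverage check under the assumption that an empty cell means "not yet analyzed or nothing recorded"; the names `UCA_IDS` and `empty_cells` are hypothetical and the IDs are copied from the table above.

```python
UCA_TYPES = [
    "Not providing causes hazard",
    "Providing causes hazard",
    "Too early, too late, out of order",
    "Stopped too soon, applied too long",
]

# UCA IDs per control action, in the order of UCA_TYPES; None = empty cell.
UCA_IDS = {
    "SOP Issuance & Work Permit System Operation":
        ["UCA-N-1", "UCA-P-2", "UCA-T-3", "UCA-S-4"],
    "Enforcement of Legal Compliance & Corrective Orders":
        [None, None, None, None],
    "Automated Equipment Interlock & Emergency Off (EMO)":
        ["UCA-N-5", "UCA-P-6", "UCA-T-7", "UCA-S-8"],
    "Issuance of Safety Goals & Direction for Field Supervision":
        [None, None, None, None],
    "External Discharge of Treated Wastewater & Exhaust Gas":
        [None, None, None, None],
    "Manual Operation of Chemical Equipment & LOTO Implementation":
        ["UCA-N-9", "UCA-P-10", "UCA-T-11", "UCA-S-12"],
}


def empty_cells(uca_ids):
    """List (control action, UCA type) pairs with no UCA recorded."""
    return [
        (action, uca_type)
        for action, ids in uca_ids.items()
        for uca_type, uca_id in zip(UCA_TYPES, ids)
        if uca_id is None
    ]


gaps = empty_cells(UCA_IDS)
unanalyzed = sorted({action for action, _ in gaps})
print(f"{len(gaps)} empty cells across {len(unanalyzed)} control actions")
```

An empty cell can be a legitimate result when no unsafe case exists for that type, so the output is a review list rather than an error report.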



Step 4: Why Did Such UCAs Occur? (Loss Scenario Discovery) 

Next, we trace the causal scenarios that explain why each Unsafe Control Action (UCA) could occur. Candidate causes include system defects, sensor errors, and operator misunderstandings.

  • System Misjudgment (LS-1): Even though there were signs of a minor hydrofluoric acid gas leak the previous day, the EHS system fails to recognize it, misclassifies the task as 'simple lighting/filter replacement,' and omits the risk assessment permit.
  • Psychological Pressure (LS-4): As exhaust system failure alarms and urgent maintenance requests from the production team arrive simultaneously, a manager feeling psychological pressure prioritizes the production team and bypasses the permit procedure.
  • Model Inconsistency (LS-8): Despite receiving operator feedback, the referenced drawings are outdated, leading to the failure to recognize and address the gas leak risks in newly bypassed piping.
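The model inconsistency in LS-8 comes down to a gap between what the controller believes exists (the drawing) and what is actually installed. As a rough illustration only, that gap can be expressed as a set difference in Python. The valve tags and the `unmodeled_components` helper below are hypothetical and are not taken from the analysis.

```python
def unmodeled_components(drawing_valves, field_valves):
    """Return valves present in the field but missing from the drawing."""
    return sorted(set(field_valves) - set(drawing_valves))


# Hypothetical valve tags; "V-104-BYP" stands for a newly added bypass valve.
drawing = {"V-101", "V-102", "V-103"}
field = {"V-101", "V-102", "V-103", "V-104-BYP"}

missing_from_drawing = unmodeled_components(drawing, field)
print("Not in drawing:", missing_from_drawing)
```

Any non-empty result means the process model is out of date, which is the condition under which the bypass piping risk in LS-8 goes unrecognized.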

The loss scenarios derived for UCA-N-1 are listed below.

Control Action: SOP Issuance & Work Permit System Operation
Level: 1
UCA-N-1: Failure to issue a mandatory work permit requiring a preliminary risk assessment, leaving the operator to conduct inspections exposed to danger

| ID | Loss Scenario |
|---|---|
| LS-1 | [Class 1/Case 6] Permit omission due to blind spots in rule-based logic that ignores exceptional situations |
| LS-1-1 | The EHS system determines 'simple lighting/filter replacement' as a general task not requiring a preliminary risk assessment. Even though there were signs of a minor hydrofluoric acid gas leak the previous day, the system fails to recognize it and omits the permit. |
| LS-2 | [Class 1/Case 20] Permit omission due to blind faith in past default safety states in the absence of feedback |
| LS-2-1 | When the gas concentration sensor fails to operate due to a communication disconnection, the manager blindly trusts the past state, assuming it is safe because a purge was done yesterday, and verbally proceeds without a permit. |
| LS-3 | [Class 1/Case 23] Failure to send the actual on-site permit because the server was left in simulation mode |
| LS-3-1 | The EHS server remains in 'diagnostic mode' after a weekend inspection, mistaking a silane pipe repair permit request for test data and omitting issuance to the actual site. |
| LS-4 | [Class 1/Case 30] Ignoring the risk assessment by prioritizing the production team's urgent request during multiple input conflicts |
| LS-4-1 | As exhaust system failure alarms and urgent maintenance requests from the production team arrive simultaneously, a manager feeling psychological pressure prioritizes the production team and bypasses the permit procedure. |
| LS-5 | [Class 1/Case 18] Exclusion from risk assessment target by misinterpreting pressure drop feedback as an empty gas cylinder |
| LS-5-1 | A sudden drop in gas cabinet pressure is misinterpreted as 'the operator has already closed the valve' instead of a gas leak, thereby invalidating the review logic. |
| LS-6 | [Class 2/Case 8] Failure to recognize fatal shielded area opening feedback due to alarm flood overload |
| LS-6-1 | Due to a flood of tens of thousands of sensor alarms during a factory maintenance period, critical alarms such as the opening of a shielded door are missed, and the permit issuance procedure is not initiated. |
| LS-7 | [Class 2/Case 11] Ignoring ambiguous field anomaly reports due to excessive blind faith in machine readings |
| LS-7-1 | When a field operator's radio report of an ammonia smell conflicts with the system data showing 0%, the mechanical readings are absolutely trusted, ignoring the field judgment, and personnel are dispatched. |
| LS-8 | [Class 2/Case 7] Overlooking the risk of gas leaks in bypass piping by applying an outdated process model (old drawings) |
| LS-8-1 | Despite receiving operator feedback, the referenced drawing is an outdated process model, leading to the failure to recognize and address the gas leak risks in newly bypassed piping. |
| LS-9 | [Class 3/Case 9] Both commands are rejected as control actions from different sources collide at the field terminal |
| LS-9-1 | An 'access control' command from a smart pad and a 'maintenance permit' message from the center collide at the door terminal, and both are rejected, creating a control gap. |
| LS-10 | [Class 3/Case 5] Permit packet is lost due to being overwritten by a broadcast message on the wireless communication network |
| LS-10-1 | At the moment of transmitting a specific work instruction packet, a firmware update broadcast distributed throughout the factory consumes the communication network, causing data loss. |
| LS-11 | [Class 3/Case 7] Reversal due to a past 'cancel' command arriving after the latest 'permit' due to a communication buffer delay |
| LS-11-1 | Due to a buffer delay, an old 'permit cancellation' packet from yesterday is processed 1 second after the newly approved signal, resulting in the permit ultimately being canceled on-site. |
| LS-12 | [Class 4/Case 7] A screen door control chip that should be blocked allows the door to open due to being stuck in standby mode |
| LS-12-1 | Zone lockdown should be the default upon error, but the door control chipset gets stuck in standby mode, ignoring the system's access block command and allowing a free pass. |
| LS-13 | [Class 4/Case 6] Physical unlocking outside the central control logic due to latch (component) damage caused by corrosive gas |
| LS-13-1 | The central server assumes the door is locked because no permit was issued, but the actual on-site locking latch is completely rusted by chemical fumes and clatters open. |
| LS-14 | [Class 4/Case 13] Corruption of received permit data due to high-frequency (RF) electromagnetic noise interference |
| LS-14-1 | The permit data packet on the field pad is corrupted by strong noise generated during the operation of high-power plasma etching equipment. The operator mistakes it for a standby delay and begins arbitrary disassembly. |
| LS-15 | [Class 4/Case 4] Standby team exposure due to self-triggered emergency exhaust caused by errors such as internal sensor failure |
| LS-15-1 | While waiting for central review, a secondary sensor inside the cabinet malfunctions and autonomously triggers emergency exhaust, causing backflowing gas to strike the work team waiting outside the door for the permit. |
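Each scenario above carries a `[Class n/Case m]` tag. The article does not define these classes, so the Python sketch below makes no assumption about their meaning. It only shows how the tags could be parsed and the scenarios grouped by class to see how the 15 scenarios are distributed. The dictionary name and the regular expression are illustrative.

```python
import re
from collections import defaultdict

# Tags copied from the loss scenario table above.
SCENARIO_TAGS = {
    "LS-1": "[Class 1/Case 6]", "LS-2": "[Class 1/Case 20]",
    "LS-3": "[Class 1/Case 23]", "LS-4": "[Class 1/Case 30]",
    "LS-5": "[Class 1/Case 18]", "LS-6": "[Class 2/Case 8]",
    "LS-7": "[Class 2/Case 11]", "LS-8": "[Class 2/Case 7]",
    "LS-9": "[Class 3/Case 9]", "LS-10": "[Class 3/Case 5]",
    "LS-11": "[Class 3/Case 7]", "LS-12": "[Class 4/Case 7]",
    "LS-13": "[Class 4/Case 6]", "LS-14": "[Class 4/Case 13]",
    "LS-15": "[Class 4/Case 4]",
}

TAG_PATTERN = re.compile(r"\[Class (\d+)/Case (\d+)\]")

by_class = defaultdict(list)
for scenario_id, tag in SCENARIO_TAGS.items():
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"Unrecognized tag for {scenario_id}: {tag}")
    by_class[int(match.group(1))].append(scenario_id)

for class_number in sorted(by_class):
    print(f"Class {class_number}: {', '.join(by_class[class_number])}")
```

Grouping this way shows that all four classes are represented for UCA-N-1, with five scenarios in Class 1, three each in Classes 2 and 3, and four in Class 4.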



Step 5: How Do We Make It Safe? (Establishing Countermeasures)

Finally, countermeasures that can fundamentally block the identified scenarios are established as system requirements.

  • [CM-1] Cross-validation Interlock Logic: Integrates with gas leak anomaly symptom monitoring to completely block automatic entry approval upon detecting errors.
  • [CM-4] Hard-coded Hierarchical Safety Interrupt: Grants the highest scheduling authority to life-threatening alarms (e.g., exhaust system failure) and suspends operations indefinitely.
  • [CM-8] Digital Twin Piping Recognition Forced Lock: Triggers a system-level permit lock if the RFID information of newly added valves does not match the central drawings.
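The idea behind CM-4 can be sketched as a priority queue with fixed priority levels: a life-threatening alarm is always handled before a production request, and once one has been handled, production requests are held instead of executed. This is a simplified Python illustration and not the countermeasure's actual design; the class, constants, and function names are hypothetical.

```python
import heapq
import itertools

# Fixed ("hard-coded") priority levels: a lower number is handled first.
LIFE_SAFETY = 0
PRODUCTION = 1


class RequestQueue:
    """Priority queue that preserves arrival order within a priority level."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal priorities

    def push(self, priority, description):
        heapq.heappush(self._heap, (priority, next(self._counter), description))

    def pop(self):
        priority, _, description = heapq.heappop(self._heap)
        return priority, description

    def __len__(self):
        return len(self._heap)


def process_all(queue):
    """Handle requests by priority; hold production work after a safety alarm."""
    handled, held = [], []
    suspended = False
    while len(queue) > 0:
        priority, description = queue.pop()
        if priority == LIFE_SAFETY:
            suspended = True  # operations stay suspended from here on
            handled.append(description)
        elif suspended:
            held.append(description)
        else:
            handled.append(description)
    return handled, held


queue = RequestQueue()
queue.push(PRODUCTION, "Urgent maintenance request from production team")
queue.push(LIFE_SAFETY, "Exhaust system failure alarm")
handled, held = process_all(queue)
print("Handled:", handled)
print("Held:", held)
```

In this run the alarm is handled first even though it arrived second, and the maintenance request is held. That is the ordering LS-4 lacked when both inputs arrived at once.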

Each countermeasure is traced to the loss scenario it addresses, as shown below.

| ID | Priority | Countermeasure | Description | Traceability |
|---|---|---|---|---|
| CM-1 | 1 | Cross-validation Interlock Logic | Integrates with gas leak anomaly symptom monitoring to completely block automatic entry approval upon detecting errors. | LS-1 |
| CM-2 | 1 | Hardware-level Fail-Safe Default Design | Upon loss of sensor feedback, ignores the previous state and immediately defaults to a 'Danger (Access Prohibited)' state. | LS-2 |
| CM-3 | 1 | Forced Network Isolation and Auto-Reboot in Diagnostic Mode | Applies a test mode timer; unconditionally forces a return to real-time operation mode upon timeout. | LS-3 |
| CM-4 | 1 | Hard-coded Hierarchical Safety Interrupt | Grants the highest scheduling authority to life-threatening alarms (e.g., exhaust system failure) and suspends operations indefinitely. | LS-4 |
| CM-5 | 1 | Multi-variable Sensor Fusion (AND Gate) | Approves 'Safe' logic only when complex conditions (purge/pump, etc.) are met, rather than relying on a single pressure sensor. | LS-5 |
| CM-6 | 1 | Alarm Triage and Independent Relay Indicator Network | Assigns critical hazard signals to hardware warning lights isolated from general monitors and prevents operators from muting them. | LS-6 |
| CM-7 | 1 | Manual Override Lockdown | Overrides existing 0% digital sensor readings upon manual field reports (e.g., emergency buttons), enforcing the highest-priority forced lockdown control. | LS-7 |
| CM-8 | 1 | Digital Twin Piping Recognition Forced Lock | Triggers a system-level permit lock if the RFID information of newly added valves does not match the central drawings. | LS-8 |
| CM-9 | 1 | Deterministic Arbiter Gate | Outputs an unconditionally conservative measure (e.g., total lockdown) via hardware logic combinations when conflicting commands are received at the device. | LS-9 |
| CM-10 | 1 | Dedicated Independent Fieldbus Network Allocation for Safety | Physically separates the safety permit packet network to prevent interference from high-volume traffic such as general firmware updates. | LS-10 |
| CM-11 | 1 | Timestamp Sequencing Discard (TTL) | Forces fail-safe discarding of arriving packets that have exceeded their timeout to prevent command reversal caused by communication buffer delays. | LS-11 |
| CM-12 | 1 | Normally Closed (NC) Relay Integrated Watchdog | Treats chipset freezing in standby mode as a missing survival heartbeat, cutting power and forcibly maintaining a door-closed state. | LS-12 |
| CM-13 | 1 | Physical Tension-based Position Feedback Sensor | Measures actual latch corrosion/friction in addition to door closure sensors; triggers immediate lockdown if the probability of loosening/separation increases. | LS-13 |
| CM-14 | 1 | Noise-Shielded Optical Line Conversion & Enhanced CRC | Designs optical cables for high-frequency interference process zones and installs structures to instantly block corrupted received data packets. | LS-14 |
| CM-15 | 1 | Hardware-Isolated Check Valve Forced Vent Configuration | Designed to completely block gas backflow using a mechanical damper (Check Valve) even if the internal S/W sensor malfunctions and reverses rotation. | LS-15 |
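As one concrete illustration, CM-11 (Timestamp Sequencing Discard) can be sketched as a receiver that rejects any packet older than a time-to-live or older than a command it has already applied. The Python below is a minimal sketch under stated assumptions: sender and receiver clocks are synchronized, each packet carries a sender timestamp, and the class names, field names, and 5-second TTL are all hypothetical and not taken from the analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermitPacket:
    command: str      # "permit" or "cancel"
    issued_at: float  # sender timestamp in seconds (assumed synchronized clock)


class PermitReceiver:
    """Applies permit commands, discarding stale or out-of-order packets."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self.state = "locked"  # default state when no valid permit exists
        self.last_issued_at = float("-inf")

    def receive(self, packet, now):
        """Return True if the packet was applied, False if it was discarded."""
        if now - packet.issued_at > self.ttl_seconds:
            return False  # exceeded its time-to-live
        if packet.issued_at <= self.last_issued_at:
            return False  # older than a command already applied
        self.last_issued_at = packet.issued_at
        self.state = "permitted" if packet.command == "permit" else "locked"
        return True


# Replay of LS-11: yesterday's cancel arrives one second after today's permit.
DAY = 86_400.0
receiver = PermitReceiver(ttl_seconds=5.0)
permit_applied = receiver.receive(PermitPacket("permit", DAY), now=DAY + 0.5)
cancel_applied = receiver.receive(PermitPacket("cancel", 0.0), now=DAY + 1.5)
print(permit_applied, cancel_applied, receiver.state)
```

Here the delayed cancellation is discarded and the newly approved permit stays in effect, so the reversal described in LS-11 does not occur. A real design would also need to decide how to react when a permit itself arrives stale, which this sketch simply discards.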



In this way, STPA maps out the big picture of the system, systematically investigates the blind spots that can occur in the interactions between components, and derives concrete safety measures.

[Figure: VisualPro: The Official MIT-Certified STPA Analysis Tool]


By working through the analysis step by step in VisualPro, you can carry out even seemingly complex STPA analyses without omissions. Its intuitive UI and automated traceability management are designed so that anyone can perform expert-level STPA analysis. A recently added AI Chat feature assists with the analysis, making it easier and more accurate.

Experience easy-to-understand STPA analysis with VisualPro.


Roh Kyung Hyun
04559, 5F Pyeonggwang Building, 243 Toegye-ro, Jung-gu, Seoul (Chungmuro 5-ga 19-19)
+82-10-8337-9837
631-81-00287
www.vwaycorp.com
vway@vwaycorp.com

© VWAY All rights reserved

