Definition
An oops is a diagnostic report generated by the Linux kernel when it detects a serious internal error that is not necessarily fatal. The message provides details about the exception, including the processor state, memory addresses, and a stack trace. The kernel typically terminates the offending process while the rest of the system continues to operate.
Overview
When the kernel encounters an unexpected condition, such as an invalid memory access, a null‑pointer dereference, or an illegal instruction, it may trigger an oops. The kernel's exception handling routine captures the relevant context and writes the oops message to the kernel log buffer and the console. If configured, a logging daemon then stores it in persistent system logs (e.g., /var/log/kern.log on Debian-based systems). After reporting, the kernel attempts to recover by killing the current process, which in many cases allows other processes to continue running. However, the bug behind the oops may already have corrupted kernel data structures, or the killed task may have been holding locks. The system can therefore become unstable and later hit a kernel panic. An oops in interrupt context, or in the idle or init task, cannot be recovered this way and leads directly to a panic.
Etymology / Origin
The term “oops” is informal slang for a minor mistake. In the Linux community, it was adopted early in the kernel’s development (mid‑1990s) to denote a less catastrophic error than a full kernel panic. The naming reflects the developers’ view that such errors, while undesirable, are often recoverable and do not immediately halt the entire operating system.
Characteristics
- Trigger Conditions: Invalid memory accesses, failed assertions such as BUG() and BUG_ON(), CPU exceptions (e.g., general protection faults or invalid opcodes), and other kernel‑level bugs.
- Content of the Report:
  - Exception type and error code.
  - CPU register dump (e.g., EIP and EAX on 32-bit x86, RIP and RAX on x86-64).
  - Stack trace with function names and offsets.
  - Process identifier (PID) and command name of the faulting task.
  - Module information if the fault originated in a loadable kernel module.
- Impact on System:
  - The offending process is usually terminated by the kernel directly, so it cannot catch or ignore the termination.
  - The kernel is marked as tainted (taint flag D). The system may remain usable, but repeated oopses can indicate deeper instability.
  - In certain configurations (e.g., the panic_on_oops sysctl), the kernel escalates the oops to a panic, halting the system.
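- Logging and Analysis:
  - Stored in the kernel log buffer (readable with dmesg).
  - Helper scripts in the kernel source tree, such as scripts/decode_stacktrace.sh and scripts/faddr2line, translate the addresses in an oops into source files and line numbers. On 2.4-era and older kernels, ksymoops served this purpose.
  - Developers use the information to locate and fix bugs in kernel code or modules.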
- Configuration Options:
  - panic_on_oops (sysctl kernel.panic_on_oops) – forces a panic after an oops.
  - oops_limit (sysctl kernel.oops_limit, Linux 6.2 and later) – the kernel panics once the number of oopses since boot reaches this limit.
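The configuration options and the taint state can be inspected from user space through /proc/sys/kernel. The taint flag D is bit 7 of /proc/sys/kernel/tainted, and it is set once the running kernel has oopsed or hit BUG(). The sketch below reads these files. It assumes that a missing oops_limit file simply means the running kernel does not provide that sysctl. The `oops_status` function and its `sysctl_root` parameter are hypothetical; the parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional

TAINT_DIE = 1 << 7  # taint flag 'D': an oops or BUG() has occurred since boot


def _read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a /proc/sys file, or None if it is absent."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


def oops_status(sysctl_root: str = "/proc/sys/kernel") -> dict:
    """Summarize oops-related sysctls and whether the running kernel has oopsed."""
    root = Path(sysctl_root)
    tainted = _read_int(root / "tainted") or 0
    return {
        "panic_on_oops": _read_int(root / "panic_on_oops"),
        "oops_limit": _read_int(root / "oops_limit"),  # None if not provided
        "has_oopsed": bool(tainted & TAINT_DIE),
    }
```

On a live system, `oops_status()` with no argument reads the real sysctls. The same values can be checked with `sysctl kernel.panic_on_oops`, and setting it to 1 with `sysctl -w` makes any later oops escalate to a panic.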
Related Topics
- Kernel panic – a non‑recoverable error that halts the entire system.
- Segmentation fault – a user‑space analogue, in which an invalid memory access causes a process to receive SIGSEGV.
- Exception handling in Linux – mechanisms for managing hardware and software exceptions.
- Loadable kernel modules (LKMs) – dynamically added code that can be a source of oopses if buggy.
- System logging (syslog, journald) – facilities for storing kernel messages, including oops reports.
- Debugging tools – kdb, kgdb, ftrace, and perf can be used to investigate and reproduce oops conditions.