How to adjust OOM score for a process?

In Linux, each process is assigned an Out-Of-Memory (OOM) score, which reflects its memory usage relative to other processes. When the system nears memory exhaustion, the kernel’s OOM killer terminates the process with the highest score to free up memory.

This mechanism is crucial for system stability but can inadvertently target critical applications such as database servers if not properly managed. DBA Square brings you an insight into this today!

Understanding the OOM Score

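The OOM score is determined primarily by a process’s memory consumption. On current kernels this covers its resident memory, swap usage, and page tables, measured against the memory available to it, plus any manual adjustment. Older kernels also weighed factors such as CPU time, run time, and nice value. The current score of a process can be read from /proc/[pid]/oom_score, and the OOM killer uses it as a heuristic to decide which process to kill when the system is under memory pressure.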
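To see which processes currently rank highest, the scores can be read straight from /proc. The following is a minimal sketch rather than a standard tool: the function name top_oom_candidates is made up for this example, and it assumes a kernel that exposes /proc/[pid]/comm.

```shell
# List the processes the OOM killer currently ranks highest.
# Usage: top_oom_candidates [COUNT]   (COUNT defaults to 10)
top_oom_candidates() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        # A process can exit while we iterate, so skip unreadable entries.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        printf '%s\t%s\t%s\n' "$score" "$pid" "$name"
    done | sort -k1,1nr | head -n "${1:-10}"
}

# Columns: OOM score, PID, command name.
top_oom_candidates 10
```

Running this before and after an adjustment is a quick way to confirm that a protected process has dropped out of the top of the list.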

Adjusting the OOM Score

To protect important applications from being terminated, you can manually adjust their OOM score by writing to the oom_score_adj file in the /proc filesystem. Kernels older than 2.6.36 provide only the deprecated oom_adj interface.

  • For modern kernels (>= 2.6.36):
    The file /proc/[pid]/oom_score_adj accepts values ranging from -1000 to 1000.
  • For older kernels (< 2.6.36):
    The file /proc/[pid]/oom_adj accepts values from -17 to 15.

A negative value decreases the likelihood of the process being killed, while a positive value increases it. The lowest value (-1000, or -17 with oom_adj) exempts the process from the OOM killer entirely.
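Because a mistyped value is either rejected by the kernel or has an unintended effect, it can help to check the range before writing. The helper below is only a sketch: set_oom_score_adj is a hypothetical function name, and it targets the modern oom_score_adj interface. Lowering the value normally requires root privileges.

```shell
# set_oom_score_adj PID VALUE
# Validates VALUE against the oom_score_adj range, then writes it.
set_oom_score_adj() {
    pid=$1
    value=$2

    # Accept only an optional minus sign followed by digits.
    case ${value#-} in
        '' | *[!0-9]*)
            echo "set_oom_score_adj: not an integer: $value" >&2
            return 2
            ;;
    esac

    if [ "$value" -lt -1000 ] || [ "$value" -gt 1000 ]; then
        echo "set_oom_score_adj: $value is outside -1000..1000" >&2
        return 2
    fi

    # Lowering the value normally requires root (CAP_SYS_RESOURCE).
    echo "$value" > "/proc/$pid/oom_score_adj"
}
```

For instance, set_oom_score_adj 6445 -500 would make PID 6445 a less likely target without exempting it completely, while set_oom_score_adj 6445 2000 is refused before anything is written.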

For example, to reduce the chances of losing the mysqld process:

# ps ax | grep '[m]ysqld'
 6445 ?        Ssl    0:04 /usr/sbin/mysqld --defaults-file=/etc/mysql/my.cnf
# cat /proc/6445/oom_score
124
# echo '-1000' > /proc/6445/oom_score_adj
# cat /proc/6445/oom_score
0

Practical Example: Protecting the MySQL Server

To reduce the risk of the MySQL daemon (mysqld) being terminated, follow these steps:

  1. Identify the Process ID (PID):
       ps ax | grep '[m]ysqld'
     This command filters the process list to find the PID of mysqld.
  2. Check the Current OOM Score:
       cat /proc/<PID>/oom_score
     Replace <PID> with the actual process ID obtained from the previous step.
  3. Adjust the OOM Score:
     To significantly lower the chance of termination, write -1000 to the oom_score_adj file:
       echo '-1000' > /proc/<PID>/oom_score_adj
  4. Verify the Change:
     Re-check the OOM score to ensure it has been adjusted:
       cat /proc/<PID>/oom_score
     The score should now be reduced accordingly (often showing as 0 when the adjustment fully compensates the calculated score).
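The four steps can also be combined into one small function. This is a rough sketch, assuming pgrep from the procps package is installed and the script runs as root. The name protect_process is invented for this example.

```shell
# protect_process NAME [ADJ]
# Applies ADJ (default -1000) to every process whose name is exactly NAME
# and prints the OOM score before and after the change. Run as root.
protect_process() {
    name=$1
    adj=${2:--1000}
    found=0
    for pid in $(pgrep -x "$name"); do
        found=1
        before=$(cat "/proc/$pid/oom_score")
        echo "$adj" > "/proc/$pid/oom_score_adj" || return 1
        after=$(cat "/proc/$pid/oom_score")
        printf '%s[%s]: oom_score %s -> %s\n' "$name" "$pid" "$before" "$after"
    done
    if [ "$found" -eq 0 ]; then
        echo "protect_process: no process named $name" >&2
        return 1
    fi
}
```

Calling protect_process mysqld performs the same adjustment as the manual steps. The setting applies only to the running process and has to be repeated after mysqld restarts.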

Considerations and Best Practices

  • System Stability:
    Adjusting OOM scores can protect critical services but may inadvertently lead to less critical processes being terminated. Always test changes in a controlled environment before deploying them to production.
  • Automating with Systemd:
    For services managed by systemd, you can set the OOM score adjustment in the service unit file using the OOMScoreAdjust directive. A value written to /proc is lost when the process exits, whereas this setting is applied every time the service starts, including after a reboot.
  • Monitoring and Alerts:
    Consider implementing monitoring tools to track memory usage and OOM events. Alerts can help you understand when the system is under memory pressure and allow you to take proactive measures.
  • Documentation:
    Document any manual changes to OOM scores within your system configuration management to ensure that all team members understand the custom settings and their rationale.
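As an illustration of the systemd approach, the sketch below writes a drop-in file containing an OOMScoreAdjust setting. Several details are assumptions made for the example: the unit name mysql.service (the real name varies by distribution), the file name oom.conf, the value -900, and the helper name write_oom_dropin. The optional third argument exists only so the result can be previewed in a scratch directory.

```shell
# write_oom_dropin UNIT VALUE [ROOT]
# Creates ROOT/etc/systemd/system/UNIT.d/oom.conf with an OOMScoreAdjust
# setting. ROOT defaults to empty (the real filesystem); pass a scratch
# directory to preview the result without touching the system.
write_oom_dropin() {
    unit=$1
    value=$2
    root=${3:-}
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nOOMScoreAdjust=%s\n' "$value" > "$dir/oom.conf"
}

# Preview the drop-in in a scratch directory (unit name is an assumption).
scratch=$(mktemp -d)
write_oom_dropin mysql.service -900 "$scratch"
cat "$scratch/etc/systemd/system/mysql.service.d/oom.conf"
```

When the drop-in is written to the real /etc/systemd/system as root, run systemctl daemon-reload and restart the service for it to take effect. Afterwards, systemctl show -p OOMScoreAdjust mysql.service should report the configured value.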

Conclusion

Adjusting the OOM score is a powerful technique to ensure that essential services remain operational during memory pressure events. However, it should be done with caution and a clear understanding of the system’s overall memory management strategy. By following the steps and best practices outlined above, you can tailor the behavior of the OOM killer to better suit your system’s needs.