We're migrating some servers from CentOS 7 (around 7.7) to Rocky Linux 8, and we're finding that just about every process uses 20-50% more memory, causing machines to lock up.

For example, sh on CentOS:

# sh -c 'egrep "VmRSS|VmSwap" /proc/$$/status'
VmRSS:       664 kB
VmSwap:        0 kB

vs. Rocky:

# sh -c 'egrep "VmRSS|VmSwap" /proc/$$/status'
VmRSS:      1136 kB
VmSwap:        0 kB

That's just a simple example; httpd, nginx, and postgres processes all show large increases on identical hardware, although the exact amount varies from around 20% to 50%.

Does anyone have suggestions for what to look at to debug this? Some /proc settings that we might look at?

Kernel versions:
CentOS: 3.10.0-1160.62.1.el7.x86_64
Rocky: 4.18.0-348.12.2.el8_5.x86_64

Edit:

We have updated the CentOS 7 kernel to match and then exceed the Rocky 8 version. CentOS 7 memory usage did not grow to match what's going on with Rocky 8 (it increased a little, maybe 3%, as I'd expect just from kernel bloat). So we hypothesize it's not a kernel problem, but something in the libraries, since the problem is so widespread across different processes.

getconf PAGESIZE returns 4096 on both OSes.
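One way to check the library hypothesis is to break a process's RSS down by mapping on both systems and compare. The following is only a sketch: `rss_by_mapping` is a hypothetical helper name, and it assumes the usual /proc/PID/smaps layout (a header line per mapping followed by `Rss:` lines in kB).

```shell
# Hypothetical helper: reads smaps-formatted text on stdin and prints
# the resident size per mapped file, largest first.
# Assumption: mapping paths contain no spaces (only the first word is kept).
rss_by_mapping() {
    awk '
        # Header line of a mapping: "start-end perms offset dev inode [path]"
        /^[0-9a-f]+-[0-9a-f]+ / { name = (NF >= 6) ? $6 : "[anon]"; next }
        # Resident size of the current mapping, in kB
        $1 == "Rss:"            { rss[name] += $2 }
        END { for (n in rss) printf "%8d kB  %s\n", rss[n], n }
    ' | sort -rn
}

# Example (run on both hosts, then compare the two listings):
#   sh -c 'cat /proc/$$/smaps' | rss_by_mapping
```

Reading another process's smaps generally needs root or the same user. If the growth shows up under shared libraries such as libc, that would support the library theory; if it is mostly anonymous memory, the allocator or the process itself is a more likely place to look.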

/proc/meminfo (note this machine has swap, which our production machines don't)

MemTotal:        3826180 kB
MemFree:         1292540 kB
MemAvailable:    2010048 kB
Buffers:             100 kB
Cached:          1071740 kB
SwapCached:        76916 kB
Active:          1028020 kB
Inactive:        1219360 kB
Active(anon):     266816 kB
Inactive(anon):  1097744 kB
Active(file):     761204 kB
Inactive(file):   121616 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1679356 kB
SwapFree:         983108 kB
Dirty:                52 kB
Writeback:             0 kB
AnonPages:       1104196 kB
Mapped:           132244 kB
Shmem:            193280 kB
KReclaimable:      78224 kB
Slab:             130756 kB
SReclaimable:      78224 kB
SUnreclaim:        52532 kB
KernelStack:        5728 kB
PageTables:        31140 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3592444 kB
Committed_AS:    3708844 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:             1776 kB
HardwareCorrupted:     0 kB
AnonHugePages:    724992 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      370560 kB
DirectMap2M:     3823616 kB
  • Please add the output of the cat /proc/meminfo and getconf PAGESIZE commands.
    – asktyagi
    Apr 18 at 6:49
  • Why do you think that a greater VmRSS has anything to do with machine lockup?
    – shodanshok
    Apr 18 at 17:19
  • Available memory is near zero. Various processes die from lack of available memory soon after reboot, even when the machine is idle. Commands fail to run from the shell with out-of-memory errors. Finally, we noticed that our dev machines have swap configured and run fine.
    – Matt DiMeo
    Apr 20 at 15:27

1 Answer

You could try running sysctl -a on both systems and comparing the output to find what's different. Look for settings such as vm.swappiness, vm.dirty_background_ratio, and vm.overcommit_memory.
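As a rough sketch of that comparison: dump the settings to a file on each host, copy both files to one machine, and diff them after sorting. `diff_sysctl` and the file names below are made up for illustration.

```shell
# Hypothetical helper: compare two saved `sysctl -a` dumps.
# $1 and $2 are files captured on each host with: sysctl -a > somefile
diff_sysctl() {
    ds_a=$(mktemp) || return 2
    ds_b=$(mktemp) || return 2
    # Sort first so differing key order does not show up as noise
    sort "$1" > "$ds_a"
    sort "$2" > "$ds_b"
    ds_rc=0
    diff "$ds_a" "$ds_b" || ds_rc=$?
    rm -f "$ds_a" "$ds_b"
    return "$ds_rc"   # 0 = identical, 1 = differences found
}

# Example:
#   diff_sysctl centos7.sysctl rocky8.sysctl | grep -E '^[<>] vm\.'
```

Run sysctl -a as root on both hosts, since some keys are not readable by unprivileged users. Keys that exist only on the newer kernel will show up as one-sided lines.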

You might also want to check your httpd, nginx, and postgres config files to see whether the defaults have changed. Also check for any extra packages installed on Rocky Linux 8 that might be using more memory. For example, with Apache/Nginx, make sure the number of worker processes/threads matches the available hardware resources; too many workers can drive up memory usage.
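To compare worker counts and their combined footprint on the two hosts, something like the following could be used. It is a minimal sketch: `summarize_rss` is a hypothetical helper, and it assumes one RSS value in kB per input line, which is what `ps -o rss=` prints.

```shell
# Hypothetical helper: reads one RSS value (kB) per line on stdin and
# prints how many processes were seen and their summed RSS.
summarize_rss() {
    awk '{ n++; total += $1 }
         END { printf "%d processes, %d kB RSS total\n", n, total }'
}

# Example (process names are whatever your services are called):
#   ps -C nginx -o rss= | summarize_rss
#   ps -C httpd -o rss= | summarize_rss
```

Summed RSS counts pages shared between workers more than once, so treat the total as an upper bound that is only useful for comparing the same service across the two systems.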

If all else fails, you could profile the heap memory usage of those processes with Valgrind's Massif tool. But that's a whole other can of worms.
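A minimal sketch of a Massif run, assuming valgrind is installed and the program can be started under it (Massif profiles a program from launch, so an already running daemon has to be restarted under valgrind). `profile_heap` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: run a command under Massif and print the report.
profile_heap() {
    ph_out=$(mktemp) || return 1
    # Massif writes its heap snapshots to the file given here
    valgrind --tool=massif --massif-out-file="$ph_out" "$@"
    # ms_print ships with valgrind and renders the snapshots as text
    ms_print "$ph_out"
    rm -f "$ph_out"
}

# Example:
#   profile_heap sh -c 'true'
```

By default Massif only tracks heap allocations made through malloc and friends. Adding `--pages-as-heap=yes` to the valgrind command makes it account for all mapped pages instead, which may be closer to what VmRSS reports, at the cost of a noisier report.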

  • Sysctl settings are essentially the same, aside from new features. Config files are identical.
    – Matt DiMeo
    Apr 20 at 15:28
