I'm facing a 100% usage issue on the root partition (/dev/sda2) of my Ubuntu server, despite no obvious large files, and it's driving me insane:
$ df -h /
Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sda2       996G  976G      0  100%  /
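One detail in that df output: Used (976G) plus Avail (0) doesn't add up to Size (996G). On ext4, that gap is usually the root-reserved blocks (5% by default, roughly 50G here), which explains why Avail hit 0 before Used reached Size, though not the du discrepancy itself. Assuming the filesystem is ext4, the reserve can be read with tune2fs:

```shell
# Show the root-reserved block count on an ext4 filesystem
# (default is 5% of the filesystem; ~50G on a 996G partition).
sudo tune2fs -l /dev/sda2 | grep -i 'reserved block count'

# Raw numbers for comparison:
df --output=size,used,avail,pcent /
```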
While du reports only 521G in use:
$ du -xh --max-depth=1 / | sort -hr | head -n 5
521G /
466G /home
25G /var
12G /usr
11G /root
That leaves about 455G used by something I can't put my finger on.
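Based on my reading about df/du mismatches, the next thing I plan to try is bind-mounting / at a scratch path (the path below is arbitrary) so du can see what lives *underneath* the mount points (/mnt/NASLABO2, /TDL): anything written into those directories while a mount was down consumes root-partition space but is invisible to a normal du.

```shell
# Bind-mount the root fs at a scratch path; in the bind view,
# files hidden under the real mount points become visible.
sudo mkdir -p /mnt/rootfs
sudo mount --bind / /mnt/rootfs
sudo du -xh --max-depth=1 /mnt/rootfs | sort -hr | head -n 10
sudo umount /mnt/rootfs
```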
This issue prevented my users from accessing their sessions, so I cleaned up some user directories in /home to buy time, but the usage keeps creeping back up.
What I have already checked:
Docker images and volumes are healthy:
$ docker system df
TYPE            TOTAL  ACTIVE  SIZE     RECLAIMABLE
Images          12     5       15.84GB  8.503GB (53%)
Containers      6      0       297.5kB  297.5kB (100%)
Local Volumes   4      3       72.06MB  25.08kB (0%)
Build Cache     0      0       0B       0B
Snap packages: removed old disabled revisions with sudo snap remove --purge, which got me ~3G of headroom (yeah!).
Journal logs: only ~352MB used.
/var/lib/snapd is around 4.7G.
lsof | grep '(deleted)' shows only PulseAudio memfd buffers (RAM-backed, not disk files):
$ lsof | grep '(deleted)'
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
Output information may be incomplete.
lsof: WARNING: can't stat() nsfs file system /var/snap/lxd/common/ns/shmounts
Output information may be incomplete.
lsof: WARNING: can't stat() nsfs file system /var/snap/lxd/common/ns/mntns
Output information may be incomplete.
pulseaudi 210380 administrateur 6u REG 0,1 67108864 5195534 /memfd:pulseaudio (deleted)
pulseaudi 210380 210419 null-sink administrateur 6u REG 0,1 67108864 5195534 /memfd:pulseaudio (deleted)
pulseaudi 210380 210422 snapd-gli administrateur 6u REG 0,1 67108864 5195534 /memfd:pulseaudio (deleted)
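For completeness, here is a tighter version of that check I intend to rerun: lsof's +L1 option lists open files whose link count is zero (i.e. deleted but still held open), and restricting it to / limits it to the root filesystem, so the sizes can simply be summed:

```shell
# Sum the space still held by deleted-but-open files on the
# root filesystem (column 7 of lsof output is SIZE/OFF).
sudo lsof +L1 / 2>/dev/null \
  | awk 'NR>1 {sum += $7} END {printf "%.1f GiB held by deleted files\n", sum/1024^3}'
```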
I have a large mount under /mnt/NASLABO2, but since this is a remote NAS mount it should not affect local disk usage, right?
//192.168.26.102/Backup_Linux 118T 73T 46T 62% /mnt/NASLABO2
I already moved a lot of data from the root partition to a larger local partition, but I don't want to touch the /home of my main group of users yet.
/dev/sda3 31T 5.4T 25T 19% /TDL
I even took some downtime to run fsck, without it finding anything.
Something interesting a user told me: the hidden usage first appeared years ago, after an electrical failure and a hard reboot.
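Given that power-failure history, one place I haven't ruled out is /lost+found, where fsck drops orphaned inodes after an unclean shutdown. I also notice my du above ran without sudo, so it could not descend into root-only directories such as /lost+found (mode 700) and may be undercounting:

```shell
# fsck moves orphaned inodes here after an unclean shutdown;
# the directory is mode 700, so an unprivileged du skips it.
sudo du -sh /lost+found

# Rerun the top-level scan as root to catch anything an
# unprivileged du could not read:
sudo du -xh --max-depth=1 / | sort -hr | head -n 10
```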
Since this is a critical server where downtime must be minimized, I'm looking for a more permanent solution. Any idea what could be causing this issue, and how I could fix it?
Cheers!