Troubleshooting a Linux Server That Has Run Out of Disk Space

2019/06/24

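Today I happened to reboot the server and found that MySQL would no longer start. Starting the service by hand did not help either; it printed the following errors.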

Jun 24 15:22:32 ubuntu systemd[1]: Starting MySQL Community Server...
Jun 24 15:22:33 ubuntu systemd[1]: mysql.service: Main process exited, code=exited, status=1/FAILURE

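My server only has a 20 GB disk, so my first guess was that the disk had filled up and MySQL could not start because there was no storage left for it to use. I checked the usage of each mount point with df and, sure enough, / was full, and /boot had no free space left either.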

root@ubuntu:~# df -lh
Filesystem      Size  Used Avail Use% Mounted on
udev            484M     0  484M   0% /dev
tmpfs           101M  5.6M   96M   6% /run
/dev/sda2        20G   19G  129M 100% /
tmpfs           504M     0  504M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           504M     0  504M   0% /sys/fs/cgroup
/dev/sda1       361M  359M     0 100% /boot
tmpfs           101M     0  101M   0% /run/user/0
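
The check above can also be scripted. The following is a minimal sketch rather than something used in the original troubleshooting: the helper name full_mounts and the 95% threshold are assumptions, and it relies on the column layout of df -P, so mount points containing spaces are not handled.

```shell
# full_mounts LIMIT: read `df -P` style output on stdin and print every
# mount point whose usage is at or above LIMIT percent.
# (hypothetical helper; column 5 is the usage, column 6 the mount point)
full_mounts() {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)            # "100%" -> "100"
            if (use + 0 >= limit + 0) print $6
        }
    '
}

# On a live system:
#   df -P -l | full_mounts 95
```

Fed with the df output shown above, this would report / and /boot.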

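In my experience, when /boot runs out of space it is usually because too many Linux kernels have accumulated. After Ubuntu installs a new kernel the old ones stay on the system, so the old kernels need to be cleaned up.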

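apt-get autoremove normally takes care of this, but in my case it reported a string of errors, and running apt-get -f install as suggested did not succeed either. So I decided to remove the old kernels by hand.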

root@ubuntu:~# sudo apt-get autoremove
Reading package lists... Done
Building dependency tree       
Reading state information... Done
You might want to run 'apt-get -f install' to correct these.
The following packages have unmet dependencies:
 linux-image-generic : Depends: linux-image-4.4.0-151-generic but it is not installed or
                                linux-image-unsigned-4.4.0-151-generic but it is not installed
                       Recommends: thermald but it is not installed
 linux-modules-extra-4.4.0-151-generic : Depends: linux-image-4.4.0-151-generic but it is not installed or
                                                  linux-image-unsigned-4.4.0-151-generic but it is not installed
E: Unmet dependencies. Try using -f.

First, check which kernel is currently running. My server is running 4.4.0-150.

root@ubuntu:~# uname -r
4.4.0-150-generic

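Next, list the kernels that can be removed. The first column of the output is the dpkg package state:

  • rc: the package has been removed, but its configuration files remain
  • ii: the package is installed, so it can be removed
  • iU: the package is unpacked but not yet configured (do not remove it)

root@ubuntu:~# dpkg -l | tail -n +6 | grep -E 'linux-image-[0-9]+' | grep -Fv $(uname -r)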
ii  linux-image-4.4.0-130-generic         4.4.0-130.156                              amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-4.4.0-134-generic         4.4.0-134.160                              amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-4.4.0-137-generic         4.4.0-137.163                              amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-4.4.0-139-generic         4.4.0-139.165                              amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-4.4.0-141-generic         4.4.0-141.167                              amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-4.4.0-142-generic         4.4.0-142.168                              amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
ii  linux-image-4.4.0-143-generic         4.4.0-143.169                              amd64        Signed kernel image generic
ii  linux-image-4.4.0-145-generic         4.4.0-145.171                              amd64        Signed kernel image generic
ii  linux-image-4.4.0-148-generic         4.4.0-148.174                              amd64        Signed kernel image generic
ii  linux-image-4.4.0-62-generic          4.4.0-62.83                                amd64        Linux kernel image for version 4.4.0 on 64 bit x86 SMP
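
The same filter can be written so that it prints only package names in the ii state, which is easier to review than the full dpkg -l lines. This is a minimal sketch; removable_kernels is a hypothetical helper name, and like the pipeline above it also lists any installed kernel newer than the running one, so the result should be read before anything is removed.

```shell
# removable_kernels RUNNING: read `dpkg -l` output on stdin and print the
# names of installed (state "ii") linux-image-<version> packages that do
# not belong to the RUNNING kernel release.
# (hypothetical helper; RUNNING is the output of `uname -r`)
removable_kernels() {
    awk -v running="$1" '
        $1 == "ii" && $2 ~ /^linux-image-[0-9]/ && index($2, running) == 0 {
            print $2
        }
    '
}

# On a live system:
#   dpkg -l | removable_kernels "$(uname -r)"
```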

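With the removable kernels identified, I tried uninstalling them with dpkg --purge. Unfortunately that did not work either. The image package is held back by a package that depends on it, and purging that dependent package ends with No space left on device.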

root@ubuntu:~# sudo dpkg --purge linux-image-4.4.0-130-generic
dpkg: dependency problems prevent removal of linux-image-4.4.0-130-generic:
 linux-image-extra-4.4.0-130-generic depends on linux-image-4.4.0-130-generic.

dpkg: error processing package linux-image-4.4.0-130-generic (--purge):
 dependency problems - not removing
Errors were encountered while processing:
 linux-image-4.4.0-130-generic
 
root@ubuntu:~# sudo dpkg --purge linux-image-extra-4.4.0-130-generic
(Reading database ... 454615 files and directories currently installed.)
Removing linux-image-extra-4.4.0-130-generic (4.4.0-130.156) ...
depmod: FATAL: could not load /boot/System.map-4.4.0-130-generic: No such file or directory
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.4.0-130-generic /boot/vmlinuz-4.4.0-130-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.4.0-130-generic /boot/vmlinuz-4.4.0-130-generic
update-initramfs: Generating /boot/initrd.img-4.4.0-130-generic
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.

gzip: stdout: No space left on device
E: mkinitramfs failure cpio 141 gzip 1
update-initramfs: failed for /boot/initrd.img-4.4.0-130-generic with 1.
run-parts: /etc/kernel/postinst.d/initramfs-tools exited with return code 1
dpkg: error processing package linux-image-extra-4.4.0-130-generic (--purge):
 subprocess installed post-removal script returned error exit status 1
Errors were encountered while processing:
 linux-image-extra-4.4.0-130-generic
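
The log shows where the purge breaks: removing the package triggers update-initramfs, which tries to write a new image into /boot, and the initrd images listed later in this post are roughly 39 MB each. A pre-check along these lines would have predicted the failure. It is only a sketch: boot_has_room is a hypothetical helper name, and the 40000 KB figure is an assumption taken from those image sizes.

```shell
# boot_has_room NEED_KB: read `df -Pk` output for a single filesystem on
# stdin and succeed only if its "Available" column is at least NEED_KB.
# (hypothetical helper; column 4 of `df -Pk` is the available space in KB)
boot_has_room() {
    awk -v need="$1" '
        NR == 2 { ok = ($4 + 0 >= need + 0) }
        END     { exit ok ? 0 : 1 }
    '
}

# On a live system:
#   df -Pk /boot | boot_has_room 40000 || echo "/boot is too full to rebuild an initramfs"
```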

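All of the methods above failed for the same reason: /boot was completely full. Kernel-related operations, including the ones apt runs, need some free disk space themselves, so none of them could complete. The only option left was to delete some files by hand.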

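First, change into /boot and list its contents with du. Apart from grub and lost+found, any file whose name does not end in 4.4.0-150-generic belongs to another kernel. Almost all of them are leftovers from old kernels; the 4.4.0-151 files belong to the newer kernel that apt reported above as not yet installed, so I left those alone.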

root@ubuntu:~# cd /boot
root@ubuntu:/boot# du -sk *|sort -n
1       retpoline-4.4.0-139-generic
1       retpoline-4.4.0-142-generic
12      lost+found
188     config-4.4.0-139-generic
188     config-4.4.0-142-generic
188     config-4.4.0-143-generic
188     config-4.4.0-145-generic
188     config-4.4.0-148-generic
188     config-4.4.0-150-generic
188     config-4.4.0-151-generic
1229    abi-4.4.0-139-generic
1230    abi-4.4.0-142-generic
3830    System.map-4.4.0-139-generic
3830    System.map-4.4.0-142-generic
3831    System.map-4.4.0-143-generic
3831    System.map-4.4.0-145-generic
3833    System.map-4.4.0-148-generic
3834    System.map-4.4.0-150-generic
3834    System.map-4.4.0-151-generic
6859    grub
7030    vmlinuz-4.4.0-139-generic
7045    vmlinuz-4.4.0-142-generic
7050    vmlinuz-4.4.0-145-generic
7052    vmlinuz-4.4.0-143-generic
7057    vmlinuz-4.4.0-148-generic
7059    vmlinuz-4.4.0-150-generic
9390    initrd.img-4.4.0-138-generic
38910   initrd.img-4.4.0-139-generic
38987   initrd.img-4.4.0-141-generic
38997   initrd.img-4.4.0-143-generic
38998   initrd.img-4.4.0-142-generic
39657   initrd.img-4.4.0-145-generic
39945   initrd.img-4.4.0-148-generic
39946   initrd.img-4.4.0-150-generic
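
To see how much space each kernel version takes before deciding what to delete, the du output can be grouped by version. This is a minimal sketch; kb_per_version is a hypothetical helper name, and it assumes version strings of the form 4.4.0-150 as they appear in the listing above.

```shell
# kb_per_version: read `du -sk` style lines ("SIZE NAME") on stdin and
# print "TOTAL_KB VERSION" for each kernel version found in the names,
# smallest total first. Names without a version (grub, lost+found) are
# ignored. (hypothetical helper)
kb_per_version() {
    awk '
        match($2, /[0-9]+\.[0-9]+\.[0-9]+-[0-9]+/) {
            total[substr($2, RSTART, RLENGTH)] += $1
        }
        END { for (v in total) print total[v], v }
    ' | sort -n
}

# On a live system:
#   (cd /boot && du -sk -- *) | kb_per_version
```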

Based on the listing above, delete the files that are no longer needed with rm. Running du again confirms that they are gone.

root@ubuntu:/boot# sudo rm -rf /boot/*-4.4.0-{138,139,141,143,142,145,148}-*
root@ubuntu:/boot# du -sk *|sort -n
12      lost+found
188     config-4.4.0-150-generic
188     config-4.4.0-151-generic
3834    System.map-4.4.0-150-generic
3834    System.map-4.4.0-151-generic
6859    grub
7059    vmlinuz-4.4.0-150-generic
39946   initrd.img-4.4.0-150-generic
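
A brace-expansion glob passed to rm -rf is unforgiving, so it may be worth previewing what would be deleted first. The sketch below is an illustration rather than part of the original fix: stale_boot_files is a hypothetical helper that prints every file name carrying a kernel version other than the ones it is told to keep.

```shell
# stale_boot_files KEEP...: read file names on stdin and print those that
# contain a kernel version string not listed in KEEP. Names without a
# version (grub, lost+found) are never printed. Nothing is deleted.
# (hypothetical helper; versions are given as e.g. 4.4.0-150)
stale_boot_files() {
    awk -v keep="$*" '
        BEGIN {
            n = split(keep, k, " ")
            for (i = 1; i <= n; i++) wanted[k[i]] = 1
        }
        match($0, /[0-9]+\.[0-9]+\.[0-9]+-[0-9]+/) {
            if (!(substr($0, RSTART, RLENGTH) in wanted)) print
        }
    '
}

# On a live system:
#   ls /boot | stale_boot_files 4.4.0-150 4.4.0-151
```

For this server the keep list would be 4.4.0-150 and 4.4.0-151, which covers both the running kernel and the one apt was in the middle of installing.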

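After the deletion, usage on /boot immediately dropped by about 80%, and apt became usable again. Running the command below then fully removes the kernels and packages that are no longer needed.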

sudo apt-get autoremove

Copyright © 2015-2021 MikeTech.it. All rights reserved.

Developed By Yigang Zhou