JVM coredump analysis series (4): common SIGBUS case studies


DevKit

Published on 2023/06/07


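Preface

I have run into several SIGBUS crashes before. This post summarizes how to locate them and provides test cases that reproduce them, so that similar problems can be diagnosed faster. A memory access typically triggers SIGBUS in the following scenarios:

(1) Unaligned memory reads or writes

(2) Physical memory failure on the machine

(3) Abnormal access to a file mapping

This post covers the physical memory failure and file-mapping scenarios, describing the symptoms and the troubleshooting method for each, and providing reproduction cases for the file-mapping scenario.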

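SIGBUS triggered by physical memory failure

Symptom: many processes on the machine crash, the stack differs from one crash to the next, and some processes crash inside system libraries such as libc.so and libpthread.so.

1. Troubleshooting method

(1) Analyze the hs_err_pid file

The hs_err_pid file shows that an access to address 0x000000054c0bc000 triggered SIGBUS with si_code 4 (BUS_MCEERR_AR).

According to the sigaction(2) man page [1], the si_code values BUS_MCEERR_AR (4) and BUS_MCEERR_AO (5) indicate a hardware memory error: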

BUS_ADRALN
    Invalid address alignment.

BUS_ADRERR
    Nonexistent physical address.

BUS_OBJERR
    Object-specific hardware error.

BUS_MCEERR_AR (since Linux 2.6.32)
    Hardware memory error consumed on a machine check;
    action required.

BUS_MCEERR_AO (since Linux 2.6.32)
    Hardware memory error detected in process but not
    consumed; action optional.
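When many hs_err_pid files have to be triaged, the si_code check above can be scripted. The following is a minimal Java sketch: the class name `SigBusCodeDecoder` is hypothetical, the numeric values are the Linux si_code constants listed in sigaction(2), and the sample `siginfo` line (built around the address from this case) is an assumption about the hs_err layout, so the regex may need adjusting for your JDK version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SigBusCodeDecoder {
    // Assumed hs_err pattern: "si_code: 4" or "si_code=4"
    private static final Pattern SI_CODE = Pattern.compile("si_code[:=]\\s*(\\d+)");

    // Returns the numeric si_code found in a siginfo line, or -1 if there is none
    public static int parseSiCode(String siginfoLine) {
        Matcher m = SI_CODE.matcher(siginfoLine);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Linux SIGBUS si_code constants as listed in sigaction(2)
    public static String name(int siCode) {
        switch (siCode) {
            case 1: return "BUS_ADRALN";
            case 2: return "BUS_ADRERR";
            case 3: return "BUS_OBJERR";
            case 4: return "BUS_MCEERR_AR";
            case 5: return "BUS_MCEERR_AO";
            default: return "UNKNOWN(" + siCode + ")";
        }
    }

    // BUS_MCEERR_AR (4) and BUS_MCEERR_AO (5) mean a hardware memory error
    public static boolean isHardwareMemoryError(int siCode) {
        return siCode == 4 || siCode == 5;
    }

    public static void main(String[] args) {
        // Sample line (assumed format) using the address from the case above
        String line = "siginfo: si_signo: 7 (SIGBUS), si_code: 4 (BUS_MCEERR_AR), "
                + "si_addr: 0x000000054c0bc000";
        int code = parseSiCode(line);
        System.out.println(name(code) + " hardware=" + isHardwareMemoryError(code));
        // prints: BUS_MCEERR_AR hardware=true
    }
}
```

Values 4 and 5 point to a hardware memory error; for the file-mapping case discussed below, the Linux kernel typically reports BUS_ADRERR (2) instead.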

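(2) Analyze the system logs

Check the system logs around the time of the crash. They contain many kernel error messages (Hardware Error, hardware memory error, and so on), which further confirms that the crash was caused by a physical memory failure.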
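Scanning the logs by hand is error-prone when the crash window is long. A minimal sketch, assuming the system log is a plain-text file (for example `/var/log/messages`, whose location depends on the distribution) and that the kernel messages contain the keywords quoted above; the class name `HardwareErrorLogFilter` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class HardwareErrorLogFilter {
    // Keywords from the kernel messages mentioned above (matched case-insensitively)
    private static final List<String> KEYWORDS =
            Arrays.asList("hardware error", "hardware memory error");

    // Returns the lines that contain at least one keyword, in their original order
    public static List<String> filter(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            String lower = line.toLowerCase(Locale.ROOT);
            for (String keyword : KEYWORDS) {
                if (lower.contains(keyword)) {
                    hits.add(line);
                    break;
                }
            }
        }
        return hits;
    }

    // Usage: java HardwareErrorLogFilter <log file>
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HardwareErrorLogFilter <log file>");
            return;
        }
        // ISO-8859-1 accepts any byte sequence, so stray binary bytes do not abort the read
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        filter(lines).forEach(System.out::println);
    }
}
```

The matching lines still need to be compared against the crash timestamp from hs_err_pid by hand.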

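SIGBUS triggered by abnormal access to a file mapping

SIGBUS caused by abnormal file-mapping access is the most common case in user space [2], and also the easiest to trigger. Usually the root cause is that a process mmaps a file and another process (or another thread) then truncates that file, so some of the mapped pages now lie beyond the file's actual size. Accessing those pages triggers SIGBUS. Typical scenarios are:

(1) A process mmaps a file, and another process truncates it to a smaller size;

(2) A dynamic library is updated by overwriting it in place with cp;

(3) An executable is updated by overwriting it in place with cp.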
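Scenarios (2) and (3) hit the same problem because `cp` onto an existing file opens that file with truncation and rewrites it in place, so every process that has it mapped sees the file shrink. A common way to avoid this is to write the new version to a temporary file and rename it over the old one: rename(2) only swaps the directory entry, and processes that still map the old file keep its inode and data. A minimal Java sketch, assuming a Linux (POSIX) file system; `AtomicFileReplace` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicFileReplace {
    // Writes newContent next to target, then renames it over target.
    // Processes that already mmap the old file keep the old inode and its data.
    public static void replace(Path target, byte[] newContent) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        // Temporary file in the same directory so the rename stays on one file system
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            Files.write(tmp, newContent);
            if (Files.exists(target)) {
                // createTempFile uses owner-only permissions; keep the target's mode
                // (e.g. the execute bit of a binary). Assumes a POSIX file system.
                Files.setPosixFilePermissions(tmp, Files.getPosixFilePermissions(target));
            }
            Files.move(tmp, target,
                    StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

Per the `Files.move` Javadoc, whether an atomic move replaces an existing target is implementation specific; on Linux it is done with rename(2), which does replace it. From a shell, the equivalent is copying to a temporary name in the same directory and then using `mv`.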

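1. Troubleshooting method

We can troubleshoot SIGBUS caused by abnormal file-mapping access with the following steps:

(1) Find the si_addr printed in the THREAD section of the hs_err_pid file;

(2) In the Dynamic libraries section of the hs_err_pid file, find the file that si_addr is mapped to;

(3) Log the operations on that file in the application logs and check whether it is being read and written concurrently.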
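Steps (1) and (2) amount to finding the address range that contains si_addr. On Linux, the Dynamic libraries section of hs_err_pid is a copy of `/proc/<pid>/maps` (`start-end perms offset dev inode path`), so the lookup can be sketched as below. The class `SiAddrMapper` and its methods are hypothetical, and the parsing assumes that layout.

```java
import java.util.List;
import java.util.Optional;

public class SiAddrMapper {
    // Parses an address such as "0x00007f40d0001000" printed as si_addr
    public static long parseAddress(String hex) {
        String digits = hex.startsWith("0x") || hex.startsWith("0X") ? hex.substring(2) : hex;
        return Long.parseUnsignedLong(digits, 16);
    }

    // Returns the path of the mapping whose [start, end) range contains addr.
    // Each line is assumed to follow the /proc/<pid>/maps layout:
    //   start-end perms offset dev inode path
    public static Optional<String> findMapping(List<String> dynamicLibraries, long addr) {
        for (String line : dynamicLibraries) {
            // Limit 6 keeps a path that contains spaces in a single field
            String[] fields = line.trim().split("\\s+", 6);
            if (fields.length < 6) {
                continue; // header lines and anonymous mappings without a path
            }
            String[] range = fields[0].split("-");
            if (range.length != 2) {
                continue;
            }
            try {
                long start = Long.parseUnsignedLong(range[0], 16);
                long end = Long.parseUnsignedLong(range[1], 16);
                if (Long.compareUnsigned(addr, start) >= 0 && Long.compareUnsigned(addr, end) < 0) {
                    return Optional.of(fields[5]);
                }
            } catch (NumberFormatException e) {
                // not an address range; skip the line
            }
        }
        return Optional.empty();
    }
}
```

Pass it the lines under `Dynamic libraries:` together with the si_addr from the siginfo line; an empty result means the address does not fall inside a file-backed mapping.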

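2. Reproduction cases

In Java applications, the thread stack of a file-mapping SIGBUS can differ from one crash to the next. Below are the two most common cases.

Case 1: SIGBUS triggered by concurrent processing of the same file

In production I have repeatedly hit SIGBUS in ~StubRoutines::jlong_disjoint_arraycopy on x86_64 machines and in ~StubRoutines::arrayof_jlong_disjoint_arraycopy on aarch64 machines. Following the troubleshooting method described above, the cause was eventually traced to multiple threads operating on the same file at the same time.

(1) Stack at the time of SIGBUS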

// x86_64
Stack: [0x00007f3c798a6000,0x00007f3c799a7000],  sp=0x00007f3c799a5940,  free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
v  ~StubRoutines::jlong_disjoint_arraycopy
J 1553 C2 java.nio.Bits.copyToArray(JLjava/lang/Object;JJJ)V (68 bytes) @ 0x00007f40c184f100 [0x00007f40c184f0a0+0x60]
J 1551 C2 java.nio.DirectByteBuffer.get([BII)Ljava/nio/ByteBuffer; (126 bytes) @ 0x00007f40c184dfcc [0x00007f40c184df20+0xac]
J 1549 C2 java.nio.ByteBuffer.get([B)Ljava/nio/ByteBuffer; (9 bytes) @ 0x00007f40c184c424 [0x00007f40c184c3e0+0x44]
j  TestSigBus$2.run()V+68
J 1501 C2 java.lang.Thread.run()V (17 bytes) @ 0x00007f40c181d56c [0x00007f40c181d520+0x4c]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x6e87e5]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xdd5
V  [libjvm.so+0x6e5d1b]  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x2ab
V  [libjvm.so+0x6e6337]  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x57
V  [libjvm.so+0x7865cb]  thread_entry(JavaThread*, Thread*)+0x7b
V  [libjvm.so+0xb00911]  JavaThread::thread_main_inner()+0xf1
V  [libjvm.so+0x9a5558]  java_start(Thread*)+0xf8
C  [libpthread.so.0+0x8164]  start_thread+0xe4

// aarch64

Stack: [0x0000fffda92b0000,0x0000fffda94b0000],  sp=0x0000fffda94ae380,  free space=2040k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
v  ~StubRoutines::arrayof_jlong_disjoint_arraycopy
J 1730 C2 java.nio.Bits.copyToArray(JLjava/lang/Object;JJJ)V (68 bytes) @ 0x0000ffff718c0848 [0x0000ffff718c0800+0x48]
C  0x0000000000002000

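(2) Reproduction steps

a. In the main thread, initialize a file with 2 * PAGE_SIZE bytes of data and mmap it;

b. Start a truncate thread that first empties the file and then writes PAGE_SIZE bytes of data back;

c. Start a read thread that reads all of the mapped file data;

d. Run the TestSigBus test case.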
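Why step b is enough to crash the reader comes down to page arithmetic: within a file mapping, bytes past EOF in the last partial page read as zero, but a page that lies entirely beyond EOF is no longer backed by the file, and touching it raises SIGBUS. A small sketch of that calculation, assuming 4 KiB pages (the test itself uses `Unsafe.pageSize()`, which can be larger, e.g. 64 KiB on some aarch64 kernels); `FaultingOffset` is a hypothetical helper.

```java
public class FaultingOffset {
    // First offset in a mapping that is no longer backed by the file:
    // fileSize rounded up to the next page boundary.
    public static long firstUnbackedOffset(long fileSize, long pageSize) {
        return ((fileSize + pageSize - 1) / pageSize) * pageSize;
    }

    public static void main(String[] args) {
        long pageSize = 4096;              // assumption: 4 KiB pages
        long mapped = 2 * pageSize;        // step a: 2 pages written, then mapped
        long afterTruncate = pageSize;     // step b: emptied, then 1 page rewritten
        long first = firstUnbackedOffset(afterTruncate, pageSize);
        System.out.println("reads in [" + first + ", " + mapped + ") raise SIGBUS");
        // prints: reads in [4096, 8192) raise SIGBUS
    }
}
```

So the read thread in step c copies the first page normally and crashes as soon as its 8-byte copies reach the second page.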

(3) Reproduction code

import sun.misc.Unsafe;

import java.io.*;
import java.lang.reflect.Field;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.locks.ReentrantLock;

public class TestSigBus {
    private static Unsafe unsafe;
    private static int pageSize;
    private static int fileSize;

    static {
        unsafe = createUnsafe();
        pageSize = unsafe.pageSize();
        fileSize = pageSize * 2;
    }

    public static Unsafe createUnsafe() {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field field = unsafeClass.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            Unsafe unsafe = (Unsafe) field.get(null);
            return unsafe;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    public static File initFile() {
        if (unsafe == null) {
            System.err.println("Create Unsafe failed.");
            return null;
        }

        // Test file in the current working directory
        File file = new File("tmp.ttt");
        try (FileWriter fileWriter = new FileWriter(file)) {
            for (int i = 0; i < fileSize; i++) {
                fileWriter.write('1');
                fileWriter.flush();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return file;
    }

    public static MappedByteBuffer mappingFile(File file) {
        MappedByteBuffer mappedByteBuffer = null;
        try (FileInputStream fileInputStream = new FileInputStream(file)) {
            FileChannel fileChannel = fileInputStream.getChannel();
            long size = fileChannel.size();
            System.out.println(size);
            mappedByteBuffer = fileChannel.map(
                    FileChannel.MapMode.READ_ONLY, 0, size);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return mappedByteBuffer;
    }

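    // new FileWriter(file) truncates the file to 0 bytes and then rewrites only one page,
    // so the second page of the earlier mapping is no longer backed by the file
    public static boolean truncateFile(File file) {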
        int len = pageSize;
        try (FileWriter fileWriter = new FileWriter(file)) {
            for (int i = 0; i < len; i++) {
                fileWriter.write('1');
                fileWriter.flush();
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // init
        File file = initFile();
        if (file == null) {
            System.err.println("Init file failed.");
            return;
        }

        // mapping
        MappedByteBuffer mappedByteBuffer = mappingFile(file);
        if (mappedByteBuffer == null) {
            System.err.println("Mapping file failed.");
            return;
        }

        ReentrantLock lock = new ReentrantLock();
        // truncate thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                lock.lock();
                try {
                    boolean isSuccess = truncateFile(file);
                    if (!isSuccess) {
                        System.err.println("Clear file failed.");
                        return;
                    }
                } finally {
                    lock.unlock();
                }
            }
        }).start();

        // Give the truncate thread time to finish before the read thread starts
        Thread.sleep(2000);

        // read thread
        /*
         * The byteLen should be more than 6 (java.nio.Bits.JNI_COPY_TO_ARRAY_THRESHOLD).
         * @see java.nio.DirectByteBuffer#get(byte[], int, int)
         * @see java.nio.Bits.JNI_COPY_TO_ARRAY_THRESHOLD
         *
         */
        new Thread(new Runnable() {
            @Override
            public void run() {
                lock.lock();
                try {
                    int byteLen = 8;
                    byte[] bytes = new byte[byteLen];
                    int capacity = mappedByteBuffer.capacity();
                    int loops = capacity / byteLen;
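                    // Once the position passes the first page, the copy touches
                    // memory beyond the truncated file size and the JVM gets SIGBUS
                    for (int i = 0; i < loops; i++) {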
                        mappedByteBuffer.get(bytes);
                    }
                } finally {
                    lock.unlock();
                }
            }
        }).start();

    }
}
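In TestSigBus both threads take the same ReentrantLock, yet the read thread still crashes, because after the truncation it walks the whole capacity of the mapping without checking how much of the file is left. A sketch of a guarded reader for the single-JVM case: writers shrink the file only under a write lock, and readers copy at most `min(capacity, channel.size())` bytes under the read lock. `GuardedMappedReader` is a hypothetical class; it cannot protect against other processes (for example a `cp` overwrite), which bypass the lock.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GuardedMappedReader {
    private final FileChannel channel;       // must be writable if truncate() is used
    private final MappedByteBuffer mapping;  // mapping created from the same file
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public GuardedMappedReader(FileChannel channel, MappedByteBuffer mapping) {
        this.channel = channel;
        this.mapping = mapping;
    }

    // Writers shrink the file only while holding the write lock
    public void truncate(long newSize) throws IOException {
        lock.writeLock().lock();
        try {
            channel.truncate(newSize);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers copy only the part of the mapping that is still backed by the file;
    // the read lock keeps truncate() from running in the middle of the copy
    public byte[] readAll() throws IOException {
        lock.readLock().lock();
        try {
            int backed = (int) Math.min(mapping.capacity(), channel.size());
            byte[] out = new byte[backed];
            ByteBuffer view = mapping.duplicate(); // private position for this call
            view.position(0);
            view.get(out);
            return out;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The size check only helps because it is done under the same lock that every truncating writer takes; checking `channel.size()` without that lock would still leave a window for the crash.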

Case 2: SIGBUS triggered when a compressed file is modified or emptied while it is being processed

(1) Stack at the time of SIGBUS

Stack: [0x0000ffff847d0000,0x0000ffff849d0000], sp=0x0000ffff849cda60, free space=2038k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libzip.so+0x139fc] newEntry.isra.4+0x74
C [libzip.so+0x14918] ZIP_GetEntry2+0x168
C [libzip.so+0x4488] Java_java_util_zip_ZipFile_getEntry+0x98
j java.util.zip.ZipFile.getEntry(J[BZ)J+0
j java.util.zip.ZipFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry;+38
j java.util.jar.JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry;+2
j java.util.jar.JarFile.getJarEntry(Ljava/lang/String;)Ljava/util/jar/JarEntry;+2
j java.util.jar.JarFile.getManEntry()Ljava/util/jar/JarEntry;+11
j java.util.jar.JarFile.getManifestFromReference()Ljava/util/jar/Manifest;+27
j java.util.jar.JarFile.getManifest()Ljava/util/jar/Manifest;+1
j TestJarFileSigBus.main([Ljava/lang/String;)V+20
v ~StubRoutines::call_stub
V [libjvm.so+0x6d057c] JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0xe54
V [libjvm.so+0x75caf8] jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) [clone .isra.83] [clone .constprop.126]+0x198
V [libjvm.so+0x75edb0] jni_CallStaticVoidMethod+0x148
C [libjli.so+0x8530]
C [libpthread.so.0+0x7d38] start_thread+0xb4
C [libc.so.6+0xdf5f0] thread_start+0x30

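(2) Reproduction steps

a. Create a jar file containing the TestClass class (TestJarFileSigBus reads the compiled TestClass.class from the working directory, so compile TestClass.java first);

b. While the jar is still open as a JarFile, empty the file;

c. Run the TestJarFileSigBus test case.

(3) Reproduction code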

// TestClass.java
public class TestClass {
    static {
        System.out.println("test");
    }

    public static void main(String[] args) {
        System.out.println(TestClass.class);
    }
}

// TestJarFileSigBus.java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.*;

public class TestJarFileSigBus {
    static String jarPath = "test.jar";
    static String classPath = "TestClass.class";

    public static void main(String[] args) throws IOException {
        createTestJarFile();
  
        try (JarFile jarFile = new JarFile(jarPath)) {
            // Empty the jar while the JarFile is still open, then read from it
            clearJarFile();
            jarFile.getManifest();
        }
    }
  
    public static void clearJarFile() throws IOException {
        try (FileWriter fileWriter = new FileWriter(jarPath)) {
            fileWriter.write("");
        }
    }
  
    public static void createTestJarFile() throws IOException {
        Manifest manifest = new Manifest();
        Attributes mainAttributes = manifest.getMainAttributes();
        mainAttributes.put(new Attributes.Name("Manifest-Version"), "1.0.0");
        mainAttributes.put(new Attributes.Name("Main-Class"), "TestClass");
        Path path = Paths.get(classPath);
  
        try (JarOutputStream jos = new JarOutputStream(
                new FileOutputStream(jarPath), manifest)) {
            byte[] bytes = Files.readAllBytes(path);
            for (int i = 0; i < 10; i++) {
                JarEntry jarEntry = new JarEntry("TestClass" + i + ".class");
                jos.putNextEntry(jarEntry);
                jos.write(bytes);
                jos.closeEntry();
            }
            jos.finish();
        }
    }
}

References

1. https://man7.org/linux/man-pages/man2/sigaction.2.html

2. https://www.cnblogs.com/catch/p/10973762.html
