Memory problems caused by the substring and split methods of java.lang.String

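Our project hit an OutOfMemoryError at runtime. Out of memory? My first guess was that MaxPermSize was set too small, so I raised it to 1 GB, but the error persisted. (MaxPermSize only sizes the permanent generation, not the Java heap.) I then used a Java heap analysis tool to find the suspect objects that were occupying more memory than expected.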

dump heap

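A heap dump is a snapshot of a Java process's memory at a particular point in time. Heap dumps come in several formats, but in general they all record the Java objects and class information present when the snapshot is triggered. A Full GC is usually triggered before the heap dump file is written, so the file contains the objects that survived that Full GC.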

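Getting a heap dump with jconsole:
After connecting, select the MBeans tab and invoke the dumpHeap operation under com.sun.management.HotSpotDiagnostic. The first parameter, p0, is the full path of the dump file to write; remember to give the file the .hprof extension (it must have this extension to be opened in the Memory Analysis perspective). To dump only live objects, leave the second parameter, p1, set to true.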
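The same dumpHeap operation can also be invoked from code instead of through the jconsole UI. The following is a minimal sketch, assuming a HotSpot JVM on Java 8 or later; the class name HeapDumper and the output location are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of the JDK.
class HeapDumper {
    // Writes a heap dump of the current JVM to the given path and returns the file.
    // The target file must not exist yet and should end in .hprof.
    static File dump(String path, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(path, liveOnly); // same p0 / p1 parameters as in jconsole
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new File(path);
    }

    public static void main(String[] args) {
        // Illustrative output location in the system temp directory.
        String path = new File(System.getProperty("java.io.tmpdir"),
                "heap-" + System.currentTimeMillis() + ".hprof").getPath();
        File file = dump(path, true); // true = only live (reachable) objects
        System.out.println("Heap dump written to " + file);
    }
}
```

This is handy when the dump has to be taken at a specific point in the program rather than whenever someone is attached with jconsole.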
Using the jmap tool that ships with the JDK:

jmap -dump:format=b,file=heap.bin <pid>

format=b means the dump is written in binary format.
file=heap.bin means the dump file is named heap.bin.
<pid> is the process ID of the JVM.
(On Linux) first run ps aux | grep java to find the JVM's pid, then run jmap -dump:format=b,file=heap.bin <pid> to obtain the heap dump file.

analyze heap
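Turning a binary heap dump file into human-readable information takes a specialized tool: Memory Analyzer.
Memory Analyzer, or MAT for short, is an open source project of the Eclipse Foundation, contributed by SAP and IBM. Software from the big vendors does tend to be useful: MAT can analyze heaps containing hundreds of millions of objects, quickly compute how much memory each object occupies, show the reference relationships between objects, and automatically detect memory leak suspects. It is powerful and its interface is friendly and easy to use.
MAT's interface is built on Eclipse and it is released in two forms: an Eclipse plug-in and an Eclipse RCP application. Its analysis results are presented as charts and reports, which makes them easy to read at a glance.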

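In the end, the analysis showed memory usage suddenly surging at this point.

Seeing that, I wondered whether there was a problem with how we were using split(), so I searched online and read up on how split works.

Original article: https://blog.csdn.net/caihaijiang/article/details/7748560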

Let's start with an extreme example that shows how String's substring method can cause an OutOfMemoryError:

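import java.util.ArrayList;

public class TestGC {
  // A 100,000-char string; its backing char[] takes roughly 200 KB.
  private String large = new String(new char[100000]);

  // Returns only the first two characters of the large string.
  public String getSubString() {
    return this.large.substring(0, 2);
  }

  public static void main(String[] args) {
    // Every 2-char substring stays reachable through this list.
    ArrayList<String> subStrings = new ArrayList<String>();
    for (int i = 0; i < 1000000; i++) {
      TestGC testGC = new TestGC();
      subStrings.add(testGC.getSubString());
    }
  }
}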

The program repeatedly takes a small substring of a very long string and stores it in an ArrayList.
Running it produces: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

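Why does this happen? The reason can be found in the source of the substring method in the JDK's String class. The source below is the implementation used in JDK 6 and earlier (from JDK 7 update 6 onward, substring copies the characters instead of sharing the array):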

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }

The last line of this method calls a package-private constructor of String, shown below:

    // Package private constructor which shares value array for speed.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

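To avoid a memory copy and improve performance, this constructor does not create a new char array. It reuses the char[] of the original String and uses a different offset and length to identify the new string's content. In other words, the small String returned by substring still points at the large char[] of the original String, so that array cannot be garbage-collected while the substring is reachable. In the example, each 2-character substring keeps a 100,000-char array (about 200 KB) alive, and a million of them would need roughly 200 GB, which is what causes the OutOfMemoryError.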
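Because current JDKs no longer share the array, the effect is hard to observe with the real String class. The sketch below is therefore a simplified model of the old layout, not JDK code: SharedString and retainedChars are hypothetical names introduced only to make the sharing visible.

```java
// Simplified, hypothetical model of the JDK 6 String layout (not the real class).
final class SharedString {
    private final char[] value;  // backing array, possibly shared with other instances
    private final int offset;    // index of this string's first character in value
    private final int count;     // number of characters in this string

    SharedString(char[] chars) {
        this(0, chars.length, chars.clone());
    }

    // Mirrors the package-private constructor: shares the array, no copy.
    private SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    int length() {
        return count;
    }

    // Number of chars this object keeps reachable through its backing array.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString large = new SharedString(new char[100000]);
        SharedString small = large.substring(0, 2);
        System.out.println(small.length());         // 2
        System.out.println(small.retainedChars());  // 100000
    }
}
```

The two-character result reports a length of 2 but still holds on to all 100,000 chars, which is the retention pattern a heap analyzer would show for the old String implementation.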

Having found the problem, change the getSubString method in the code above as follows:

    public String getSubString() {
        return new String(this.large.substring(0,2)); 
    }

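This wraps the result of substring in a new String. Run the program again and the OutOfMemoryError no longer occurs. Why? Because this time the public String(String) constructor is called, whose source is: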

    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself.  Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }

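As the code shows, when the length of the String's value array is greater than count, the constructor allocates a new char[] and copies the characters into it. The new String then holds only the two characters it needs, and the large array becomes eligible for collection once nothing else references it.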
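The fix above relies on the length check inside that particular constructor. As an alternative sketch, the helper below always produces a String with freshly allocated storage, whatever the JDK version, because toCharArray() is documented to return a newly allocated array. The names StringCompactor and compact are hypothetical.

```java
// Hypothetical helper, shown only to illustrate the copying idiom.
final class StringCompactor {
    // Returns an equal String that does not share storage with the argument.
    // toCharArray() allocates a new array sized to the string's length,
    // and new String(char[]) copies that array once more.
    static String compact(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String large = new String(new char[100000]).replace('\0', 'x');
        String small = compact(large.substring(0, 2));
        System.out.println(small); // xx
    }
}
```

The trade-off is an extra copy per call, so this is only worth doing for small pieces that outlive a much larger source string.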

Besides substring, String's split method has the same problem. The source of split is:

    public String[] split(String regex, int limit) {
        return Pattern.compile(regex).split(this, limit);
    }

As you can see, String's split is implemented by Pattern's split method, whose source is:

    public String[] split(CharSequence input, int limit) {
        int index = 0;
        boolean matchLimited = limit > 0;
        ArrayList<String> matchList = new ArrayList<String>();
        Matcher m = matcher(input);
 
        // Add segments before each match found
        while(m.find()) {
            if (!matchLimited || matchList.size() < limit - 1) {
                String match = input.subSequence(index, m.start()).toString();
                matchList.add(match);
                index = m.end();
            } else if (matchList.size() == limit - 1) { // last one
                String match = input.subSequence(index,
                                                 input.length()).toString();
                matchList.add(match);
                index = m.end();
            }
        }
 
        // If no match was found, return this
        if (index == 0)
            return new String[] {input.toString()};
 
        // Add remaining segment
        if (!matchLimited || matchList.size() < limit)
            matchList.add(input.subSequence(index, input.length()).toString());
 
        // Construct result
        int resultSize = matchList.size();
        if (limit == 0)
            while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
                resultSize--;
        String[] result = new String[resultSize];
        return matchList.subList(0, resultSize).toArray(result);
    }

The first statement inside the loop, String match = input.subSequence(index, m.start()).toString();
calls the subSequence method of the String class, whose source is:

    public CharSequence subSequence(int beginIndex, int endIndex) {
        return this.substring(beginIndex, endIndex);
    }

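The code shows that this ends up calling String's substring method, so the same problem exists: the small strings produced by split directly use the char[] of the original String.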
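The call chain can be checked directly: for a String, subSequence returns a String and toString() returns the same object, so no copy is introduced along the way. The sketch below shows that, plus a hypothetical helper (splitAndCopy) that applies the new String(...) idiom from the substring fix to every token. The input string is made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, shown only as an illustration.
final class SplitTokens {
    // Splits the input and re-creates each token with new String(...),
    // the same idiom the article applies to the result of substring.
    static String[] splitAndCopy(String input, String regex) {
        String[] parts = Pattern.compile(regex).split(input, 0);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = new String(parts[i]);
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "id=42,name=alice,role=admin"; // illustrative input

        // For a String, subSequence yields a String and toString() returns itself.
        CharSequence seq = line.subSequence(0, 5);
        System.out.println(seq instanceof String);   // true
        System.out.println(seq.toString() == seq);   // true

        System.out.println(splitAndCopy(line, ",").length); // 3
    }
}
```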

I also looked at the substring methods of StringBuilder and StringBuffer, and they do not have this problem. Their source is:

    public String substring(int start, int end) {
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (end > count)
            throw new StringIndexOutOfBoundsException(end);
        if (start > end)
            throw new StringIndexOutOfBoundsException(end - start);
        return new String(value, start, end - start);
    }

The last line calls a public constructor of the String class, whose source is:

    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.offset = 0;
        this.count = count;
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

This constructor does not use the builder's char[] directly; it copies the requested range into a new array.
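One way to see that the result is an independent copy is to overwrite the builder after taking the substring and check that the returned String is unaffected. A minimal sketch, with BuilderSubstring and headThenMutate as hypothetical names:

```java
// Hypothetical demo class, not part of the JDK.
final class BuilderSubstring {
    // Takes the first `end` characters, then overwrites the builder's content
    // to show that the returned String does not share the builder's buffer.
    static String headThenMutate(StringBuilder sb, int end) {
        String head = sb.substring(0, end);
        for (int i = 0; i < sb.length(); i++) {
            sb.setCharAt(i, '#');
        }
        return head;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hello world");
        String head = headThenMutate(sb, 5);
        System.out.println(head); // hello
    }
}
```

If the String shared the builder's array, the later setCharAt calls would have changed it; since Strings are immutable, the copy is required for correctness and not only a memory consideration.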
