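Common Data Structures: String

Strings

In Java, a single character can be stored in the primitive type char, while a string is an object: Java provides the String class for creating and manipulating strings.

Three classes are commonly used to work with strings: String, StringBuilder, and StringBuffer.

Let's look at how each of them is typically used.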

String

String value = new String("测试下");
System.out.println(value);

Characteristics:

Immutable

String value_a = "测试下原来值";
value_a = "改变了";
System.out.println(value_a);

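Actual output:

改变了

It looks as if the value changed, so why is String still called immutable?

Let's look at where value_a points while the program above runs.

At first value_a points to the string "测试下原来值"; afterwards it points to "改变了".

To see why String is immutable, let's open the String source and check how it is implemented.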

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

    /**
     * Class String is special cased within the Serialization Stream Protocol.
     *
     * A String instance is written into an ObjectOutputStream according to
     * <a href="{@docRoot}/../platform/serialization/spec/output.html">
     * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
     */
    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];

    /**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

    /**
     * Allocates a new {@code String} so that it represents the sequence of
     * characters currently contained in the character array argument. The
     * contents of the character array are copied; subsequent modification of
     * the character array does not affect the newly created string.
     *
     * @param  value
     *         The initial value of the string
     */
    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the character array argument. The {@code offset} argument is the
     * index of the first character of the subarray and the {@code count}
     * argument specifies the length of the subarray. The contents of the
     * subarray are copied; subsequent modification of the character array does
     * not affect the newly created string.
     *
     * @param  value
     *         Array that is the source of characters
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code value} array
     */
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the <a href="Character.html#unicode">Unicode code point</a> array
     * argument.  The {@code offset} argument is the index of the first code
     * point of the subarray and the {@code count} argument specifies the
     * length of the subarray.  The contents of the subarray are converted to
     * {@code char}s; subsequent modification of the {@code int} array does not
     * affect the newly created string.
     *
     * @param  codePoints
     *         Array that is the source of Unicode code points
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IllegalArgumentException
     *          If any invalid Unicode code point is found in {@code
     *          codePoints}
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code codePoints} array
     *
     * @since  1.5
     */
    public String(int[] codePoints, int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= codePoints.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > codePoints.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }

        final int end = offset + count;

        // Pass 1: Compute precise size of char[]
        int n = count;
        for (int i = offset; i < end; i++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                continue;
            else if (Character.isValidCodePoint(c))
                n++;
            else throw new IllegalArgumentException(Integer.toString(c));
        }

        // Pass 2: Allocate and fill in char[]
        final char[] v = new char[n];

        for (int i = offset, j = 0; i < end; i++, j++) {
            int c = codePoints[i];
            if (Character.isBmpCodePoint(c))
                v[j] = (char)c;
            else
                Character.toSurrogates(c, v, j++);
        }

        this.value = v;
    }
}

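As you can see, the String class is declared final. The final modifier can be applied to classes, methods, and variables: a final class cannot be extended, a final method cannot be overridden by subclasses, and a final variable cannot be reassigned once initialized. The character data itself lives in private final char value[]. final only stops that field from being pointed at another array, so immutability also depends on the array being private, on String exposing no method that modifies it, and on the class being final so that no subclass can break those rules.

In other words, the content a String object holds never changes. Take the earlier example:

value_a = "测试下原来值"; This value never changes: the string is stored in a char value[] that is never modified.
value_a = "改变了"; Assigning again points the value_a reference at a new string, "改变了", rather than modifying the char value[] that holds "测试下原来值". Because "测试下原来值" is a literal, it stays in the string constant pool even though value_a no longer refers to it; a string created at run time that nothing references any more would become eligible for garbage collection.

Conclusion: String immutability means that the content of a String object never changes; a String reference, however, can be pointed at a different string.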
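To make the conclusion concrete, here is a minimal sketch that keeps a second reference to the old string before reassigning. The class name ImmutabilityDemo, the method names, and the ASCII literals are made up for illustration; they mirror the value_a example rather than reproduce it.

```java
public class ImmutabilityDemo {
    /** Reassigns a variable and returns what the old alias and the variable see afterwards. */
    static String[] reassign() {
        String valueA = "original";
        String alias = valueA;   // second reference to the same String object
        valueA = "changed";      // rebinds valueA only; the old object is untouched
        return new String[] { alias, valueA };
    }

    /** "Modifying" methods such as toUpperCase() return a new String instead. */
    static String[] transform() {
        String s = "hello";
        String upper = s.toUpperCase();
        return new String[] { s, upper };
    }

    public static void main(String[] args) {
        String[] r = reassign();
        System.out.println(r[0]); // original
        System.out.println(r[1]); // changed

        String[] t = transform();
        System.out.println(t[0]); // hello
        System.out.println(t[1]); // HELLO
    }
}
```

The alias still sees the old content, which is exactly what "the reference moved, the object did not change" means.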

Next, let's look at an example.

String pre = "Hello,World";
String new_value = "Hello,World";
String object_value = new String("Hello,World");
System.out.println(object_value.hashCode());
System.out.println(pre.hashCode());
System.out.println(new_value.hashCode());
System.out.print("pre == new_value -> ");
System.out.println(pre==new_value);
System.out.print("pre == object_value -> ");
System.out.println(pre==object_value);
System.out.println("----------------");
System.out.print("pre.equals(new_value) -> ");
System.out.println(pre.equals(new_value));
System.out.print("pre.equals(object_value) -> ");
System.out.println(pre.equals(object_value));

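What does the program above print in each case: true or false?

For "Hello,World" created from a literal, no matter how many times it is written, both == and equals report equality.

Comparing the "Hello,World" created with new against the one created from a literal, == returns false, while equals returns true.

First, let's look at the difference between equals and ==.

Difference between equals and ==

==: for object references, checks whether the two references point to the same object (the same address).

equals: the root superclass Object defines an equals method: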

    public boolean equals(Object obj) {
        return (this == obj);
    }

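Object simply checks whether the two references are the same address, which is exactly what == does.

Every class inherits from Object.

Here is the official description: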

package java.lang;

/**
 * Class {@code Object} is the root of the class hierarchy.
 * Every class has {@code Object} as a superclass. All objects,
 * including arrays, implement the methods of this class.
 * @author  unascribed
 * @see     java.lang.Class
 * @since   JDK1.0
 */
public class Object

So String inherits from Object as well and therefore has an equals method.

Let's see how String implements equals:

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true; // same reference (==), so the two are equal
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) { // walk through both value arrays
                if (v1[i] != v2[i])
                    return false; // any differing character means not equal
                i++;
            }
            return true; // every character matches, so the two are equal
        }
    }
    return false; // not a String, or the lengths differ: not equal
}

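So String's equals method works like this:

1. If the two references are the same object (==), return true.

2. Check whether the argument is a String.

3. If the lengths match, walk through the value arrays; any differing character means the two are not equal.

4. If every character matches, the two are equal.

5. If the argument is not a String (or the lengths differ), the two are not equal.

Object, by contrast, only checks whether the two references are the same address.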
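The contrast between the inherited Object.equals and an overriding equals can be sketched with two tiny classes. PlainName and ValueName are hypothetical classes invented for this illustration; only the second one overrides equals (and hashCode), in the same spirit as String does.

```java
import java.util.Objects;

public class EqualsDemo {
    /** Hypothetical class that keeps Object's default equals (reference comparison). */
    static class PlainName {
        final String name;
        PlainName(String name) { this.name = name; }
    }

    /** Hypothetical class that overrides equals/hashCode to compare content. */
    static class ValueName {
        final String name;
        ValueName(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;                 // same reference
            if (!(o instanceof ValueName)) return false; // wrong type (or null)
            return Objects.equals(name, ((ValueName) o).name);
        }

        @Override
        public int hashCode() { return Objects.hashCode(name); }
    }

    static boolean plainEquals() {
        return new PlainName("a").equals(new PlainName("a"));
    }

    static boolean valueEquals() {
        return new ValueName("a").equals(new ValueName("a"));
    }

    public static void main(String[] args) {
        System.out.println(plainEquals()); // false: two distinct objects
        System.out.println(valueEquals()); // true: same content
    }
}
```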

With that in mind, let's go back to the earlier example:

String pre = "Hello,World";
String new_value = "Hello,World";
String object_value = new String("Hello,World");
System.out.println(object_value.hashCode());
System.out.println(pre.hashCode());
System.out.println(new_value.hashCode());
System.out.print("pre == new_value -> ");
System.out.println(pre==new_value);
System.out.print("pre == object_value -> ");
System.out.println(pre==object_value);
System.out.println("----------------");
System.out.print("pre.equals(new_value) -> ");
System.out.println(pre.equals(new_value));
System.out.print("pre.equals(object_value) -> ");
System.out.println(pre.equals(object_value));

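pre and new_value are both created from literals. The string "Hello,World" is stored in the string constant pool, and the reference variables pre and new_value point to that same "Hello,World" object. They are one and the same object, so both == and equals return true.

object_value creates a new object on the heap, and the reference variable object_value points to that heap "Hello,World" object.

So pre==object_value returns false: they are two different objects.

pre.equals(object_value) returns true, just like pre.equals(new_value): String overrides equals, so as long as both are String objects whose value arrays hold the same characters, the result is true.

hashCode

Now, why do all three have the same hash code?

Straight to the source: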

public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

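This returns the hash code (also called the hash value).

hashCode(): returns the object's hash code, an int. The hash code is used to decide which bucket of a hash table the object lands in. Ideally, different objects produce different hash codes, but that is not required. The hash code is usually computed from the object's fields, and two different objects may still produce the same hash code, which is known as a hash collision.

The hashCode implementation above boils down to the following. Let value be the character array and h[i] the running value after processing value[i]:

i=0: h[0] = value[0]

i=1: h[1] = h[0]*31 + value[1]

i=2: h[2] = h[1]*31 + value[2] = (value[0]*31 + value[1])*31 + value[2]

In general, for n characters: h = value[0]*31^(n-1) + value[1]*31^(n-2) + ... + value[n-1]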
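The recurrence can be checked by recomputing the hash by hand and comparing it with String.hashCode(). This is only a sketch; the class HashDemo and the method manualHash are names made up for the illustration.

```java
public class HashDemo {
    /** Recomputes the polynomial hash h = 31*h + c over all characters of s. */
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow simply wraps around, as in String.hashCode()
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "Hello,World";
        System.out.println(manualHash(s) == s.hashCode()); // true
        System.out.println(manualHash("ab"));               // 97*31 + 98 = 3105
    }
}
```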

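Why 31:

Cheaper arithmetic: 31 * i can be rewritten as (i << 5) - i, so the multiplication can be replaced by a shift and a subtraction, an optimization the JVM is able to apply.

Fewer collisions: 31 is an odd prime. A prime multiplier is the traditional choice for keeping each character's contribution to the hash distinct, which tends to spread the results out and reduce collisions.

In binary, 31 is 11111, i.e. 32 - 1: shifting i left by 5 bits multiplies it by 32, and subtracting i once leaves 31*i.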
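The shift identity is easy to verify, including for values where the multiplication overflows, because int arithmetic wraps around the same way on both sides. ShiftDemo and sameAsShift are names invented for this sketch.

```java
public class ShiftDemo {
    /** Checks that 31 * i and (i << 5) - i give the same int result. */
    static boolean sameAsShift(int i) {
        return 31 * i == (i << 5) - i;
    }

    public static void main(String[] args) {
        int[] samples = { 0, 1, 7, -3, 1_000_000, Integer.MAX_VALUE, Integer.MIN_VALUE };
        for (int i : samples) {
            System.out.println(i + " -> " + sameAsShift(i)); // true for every sample
        }
    }
}
```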

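Points to note:

1. If two objects are equal according to equals, their hashCode values must be the same.

2. If two objects have the same hashCode, they are not necessarily equal according to equals.

3. To reduce collisions in hash tables, the hash codes of different objects should differ as far as possible.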
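Points 1 and 2 can be seen with a well-known pair of short strings: "Aa" and "BB" both hash to 65*31 + 97 = 66*31 + 66 = 2112, yet they are not equal. The class and method names below are made up for this sketch.

```java
public class CollisionDemo {
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Point 1: equal strings always share a hash code.
        String x = "Hello,World";
        String y = new String("Hello,World");
        System.out.println(x.equals(y) && sameHash(x, y)); // true

        // Point 2: the same hash code does not imply equality.
        System.out.println("Aa".hashCode());    // 2112
        System.out.println("BB".hashCode());    // 2112
        System.out.println("Aa".equals("BB"));  // false
    }
}
```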

pre, new_value, and object_value all hold the same characters in their value arrays: "Hello,World".

So the hash algorithm produces the same hash value for all three.

Here's another example:

String a = "Hello";
String b = "Hello";
String c = new String("Hello");
String e = a + b;
String t = a + b + c;
String l = a + "World";
String l1 = "Hello" + "World";

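How many objects does the program above create?

a: the literal "Hello" is created in the string constant pool: 1 object.

b: "Hello" already exists in the string constant pool, so no new object is created; b refers to the same one.

c: new String("Hello") creates 1 new String object on the heap; the variable c on the stack refers to it.

e: a and b are ordinary variables, so a+b is concatenated at run time (via StringBuilder on JDK 8 and earlier); the result "HelloHello" is 1 new String object on the heap.

t: "Hello" for a and b is already in the constant pool and c already exists on the heap, so nothing new is needed for the operands; the result "HelloHelloHello" is 1 new String object on the heap.

l: a is not final, so a+"World" is also concatenated at run time, and the result "HelloWorld" is 1 new String object on the heap that is not in the constant pool. The "World" literal itself is an ordinary compile-time constant; on JDK 8 it is loaded from the constant pool like any other literal.

l1: "Hello"+"World" is folded into "HelloWorld" at compile time, giving 1 new object in the constant pool. This differs from l, which is built at run time: l and l1 are not the same object.

Counting one new String each for a, c, e, t, l, and l1 gives 6 objects in total; the temporary StringBuilder instances and the pooled "World" literal are not included in that count.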
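The identity relationships in this example can be observed directly with ==. The following sketch reuses the variable names from the example; the class ConcatIdentityDemo and the method check are invented for the illustration.

```java
public class ConcatIdentityDemo {
    /** Returns { a == b, a == c, l == l1, l.equals(l1), l1 == "HelloWorld" }. */
    static boolean[] check() {
        String a = "Hello";
        String b = "Hello";
        String c = new String("Hello");
        String l = a + "World";        // run-time concatenation: new heap object
        String l1 = "Hello" + "World"; // folded at compile time: pooled constant
        return new boolean[] { a == b, a == c, l == l1, l.equals(l1), l1 == "HelloWorld" };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println("a == b: " + r[0]);              // true
        System.out.println("a == c: " + r[1]);              // false
        System.out.println("l == l1: " + r[2]);             // false
        System.out.println("l.equals(l1): " + r[3]);        // true
        System.out.println("l1 == \"HelloWorld\": " + r[4]); // true
    }
}
```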

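Summary

Creating from literals

String a = "Hello"; // "Hello" is stored in the string constant pool: 1 new object, referenced by a
String b = "Hello"; // "Hello" is found in the string constant pool: no new object, b refers to the existing one
String c = "World"; // "World" is not in the string constant pool yet: it is stored there, 1 new object, referenced by c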

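Creating with new

String a = "Hello"; // "Hello" is stored in the string constant pool: 1 new object, referenced by a
String b = "Hello"; // "Hello" is found in the string constant pool: no new object, b refers to the existing one
String c = "World"; // "World" is not in the string constant pool yet: it is stored there, 1 new object, referenced by c
String t = new String("Hello"); // "Hello" is already in the pool, so nothing is added there; 1 new object is created on the heap and t refers to it
String t1 = new String("HelloWorld"); // "HelloWorld" is not in the pool yet: it is added to the pool, and 1 new object is created on the heap for t1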

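Concatenation with +

String a = "Hello"; // "Hello" is stored in the string constant pool: 1 new object, referenced by a
String b = "Hello"; // "Hello" is found in the string constant pool: no new object, b refers to the existing one
String c = "World"; // "World" is not in the string constant pool yet: it is stored there, 1 new object, referenced by c
String t = new String("Hello"); // "Hello" is already in the pool, so nothing is added there; 1 new object is created on the heap and t refers to it
String t1 = new String("HelloWorld"); // "HelloWorld" is not in the pool yet: it is added to the pool, and 1 new object is created on the heap for t1
String b1 = "Hello"+"World"; // "HelloWorld" is already in the pool, so no new object is created
String new_value = "Hello"+"Java"; // "HelloJava" is not in the pool yet, so it is created there
String value_new = a+b; // concatenated at run time via StringBuilder: 1 new object

Concatenating constants such as "Hello": the pieces are joined directly at compile time (constant folding), and the result goes into the constant pool.

Not every expression is folded, though; only constants whose value the compiler can determine at compile time qualify:

Literals of the primitive types (byte, boolean, short, char, int, float, long, double) and string literals.
final variables of primitive type or of type String that are initialized with a constant.
Strings built with "+" from such constants, arithmetic between primitives (add, subtract, multiply, divide), and bit shifts on primitives (<<, >>, >>>).

The value of an ordinary reference variable cannot be determined at compile time, so the compiler cannot fold it.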

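Concatenating reference variables: when a variable is involved, "+" is implemented under the hood with StringBuilder (this is what the compiler generates on JDK 8 and earlier): a StringBuilder is created, append() is called for each operand, and toString() produces the result. That result is a separate String object that lives on the heap and is not placed in the constant pool.

Note: String.intern() looks the string up in the string constant pool. If the pool already contains an equal string, the reference to that pooled string is returned; otherwise this String object is added to the pool and a reference to it is returned.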
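A short sketch of intern() in action, assuming nothing beyond the standard String API; the class InternDemo and the method check are illustrative names.

```java
public class InternDemo {
    /** Returns { heap == literal, heap.intern() == literal }. */
    static boolean[] check() {
        String literal = "Hello,World";          // pooled constant
        String heap = new String("Hello,World"); // separate heap object
        return new boolean[] { heap == literal, heap.intern() == literal };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println(r[0]); // false: different objects
        System.out.println(r[1]); // true: intern() returns the pooled instance
    }
}
```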

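Note that when the operands are variables declared final and initialized with constants, the concatenation can be folded at compile time and the result goes into the constant pool.

But if a final variable is initialized with a value that is only known at run time, the concatenation still goes through StringBuilder.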
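The difference between the two kinds of final variable shows up when the result is compared with the pooled literal using ==. This is a sketch; FinalFoldingDemo, check, and the helper runtimeValue (which stands in for any value only known at run time) are hypothetical names.

```java
public class FinalFoldingDemo {
    /** Hypothetical helper: its result is not a compile-time constant. */
    static String runtimeValue() {
        return "Hello";
    }

    /** Returns { folded == pooled, notFolded == pooled, alsoNotFolded == pooled }. */
    static boolean[] check() {
        final String a = "Hello";
        final String b = "World";
        String folded = a + b;            // both operands are constant variables: folded by the compiler

        String x = "Hello";
        String notFolded = x + b;         // x is not final: concatenated at run time

        final String r = runtimeValue();  // final, but initialized at run time
        String alsoNotFolded = r + b;     // still concatenated at run time

        String pooled = "HelloWorld";
        return new boolean[] { folded == pooled, notFolded == pooled, alsoNotFolded == pooled };
    }

    public static void main(String[] args) {
        boolean[] r = check();
        System.out.println(r[0]); // true
        System.out.println(r[1]); // false
        System.out.println(r[2]); // false
    }
}
```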

In day-to-day code, repeated concatenation of many strings with + should be kept to a minimum.

That is what StringBuilder and StringBuffer are for.
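As a rough illustration of the advice above, here are the two ways of building a string in a loop. Both produce the same text; the + version creates a fresh String on every iteration, while the builder version keeps appending to one object. JoinDemo and its methods are names made up for this sketch, and no timing claims are made here.

```java
public class JoinDemo {
    /** Concatenates 0..n-1 with +=, creating a new String on each iteration. */
    static String joinWithPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    /** Concatenates 0..n-1 by appending to a single StringBuilder. */
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithPlus(5));    // 01234
        System.out.println(joinWithBuilder(5)); // 01234
    }
}
```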

StringBuilder

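Because String is immutable, Java also provides the StringBuilder class.

StringBuilder is a mutable string class in Java, meant for situations where string content is modified frequently, to improve performance.

Advantages and use cases of StringBuilder

Performance: StringBuilder is mutable, so modifications act directly on the current object without creating a new one; for frequent concatenation or modification it is faster.
Use cases: suited to single-threaded code, especially tasks that modify strings frequently, where StringBuilder outperforms StringBuffer.

Source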

public final class StringBuilder
    extends AbstractStringBuilder
    implements java.io.Serializable, CharSequence
{

    /** use serialVersionUID for interoperability */
    static final long serialVersionUID = 4383685877147921099L;

    /**
     * Constructs a string builder with no characters in it and an
     * initial capacity of 16 characters.
     */
    public StringBuilder() {
        super(16);
    }

    /**
     * Constructs a string builder with no characters in it and an
     * initial capacity specified by the {@code capacity} argument.
     *
     * @param      capacity  the initial capacity.
     * @throws     NegativeArraySizeException  if the {@code capacity}
     *               argument is less than {@code 0}.
     */
    public StringBuilder(int capacity) {
        super(capacity);
    }

    /**
     * Constructs a string builder initialized to the contents of the
     * specified string. The initial capacity of the string builder is
     * {@code 16} plus the length of the string argument.
     *
     * @param   str   the initial contents of the buffer.
     */
    public StringBuilder(String str) {
        super(str.length() + 16);
        append(str);
    }

    /**
     * Constructs a string builder that contains the same characters
     * as the specified {@code CharSequence}. The initial capacity of
     * the string builder is {@code 16} plus the length of the
     * {@code CharSequence} argument.
     *
     * @param      seq   the sequence to copy.
     */
    public StringBuilder(CharSequence seq) {
        this(seq.length() + 16);
        append(seq);
    }

    @Override
    public StringBuilder append(Object obj) {
        return append(String.valueOf(obj));
    }

    @Override
    public StringBuilder append(String str) {
        super.append(str);
        return this;
    }

    /**
     * Appends the specified {@code StringBuffer} to this sequence.
     * <p>
     * The characters of the {@code StringBuffer} argument are appended,
     * in order, to this sequence, increasing the
     * length of this sequence by the length of the argument.
     * If {@code sb} is {@code null}, then the four characters
     * {@code "null"} are appended to this sequence.
     * <p>
     * Let <i>n</i> be the length of this character sequence just prior to
     * execution of the {@code append} method. Then the character at index
     * <i>k</i> in the new character sequence is equal to the character at
     * index <i>k</i> in the old character sequence, if <i>k</i> is less than
     * <i>n</i>; otherwise, it is equal to the character at index <i>k-n</i>
     * in the argument {@code sb}.
     *
     * @param   sb   the {@code StringBuffer} to append.
     * @return  a reference to this object.
     */
    public StringBuilder append(StringBuffer sb) {
        super.append(sb);
        return this;
    }

    @Override
    public StringBuilder append(CharSequence s) {
        super.append(s);
        return this;
    }
}

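As you can see, there are four constructors:

StringBuilder(): an empty builder with an initial capacity of 16 characters

StringBuilder(int capacity): an empty builder with the specified initial capacity

StringBuilder(CharSequence seq): a builder containing the same characters as the given CharSequence (initial capacity: 16 plus its length)

StringBuilder(String str): a builder containing the same characters as the given String (initial capacity: 16 plus its length)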
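The initial capacities documented for the four constructors can be read back with capacity(). A small sketch; CapacityDemo and capacities are names invented for the illustration.

```java
public class CapacityDemo {
    /** Returns the initial capacity produced by each of the four constructors. */
    static int[] capacities() {
        CharSequence seq = "abcd";
        return new int[] {
            new StringBuilder().capacity(),      // default: 16
            new StringBuilder(100).capacity(),   // as requested: 100
            new StringBuilder("abc").capacity(), // 16 + 3 = 19
            new StringBuilder(seq).capacity()    // 16 + 4 = 20
        };
    }

    public static void main(String[] args) {
        for (int c : capacities()) {
            System.out.println(c);
        }
    }
}
```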

Common methods:

append():

public AbstractStringBuilder append(String str) {
        if (str == null)
            return appendNull();
        int len = str.length();
        ensureCapacityInternal(count + len);
        str.getChars(0, len, value, count);
        count += len;
        return this;
    }

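The method returns this: no new builder object is created (only the internal array grows when capacity runs out), so far less memory is used than with String concatenation.

insert():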

public AbstractStringBuilder insert(int offset, char[] str) {
        if ((offset < 0) || (offset > length()))
            throw new StringIndexOutOfBoundsException(offset);
        int len = str.length;
        ensureCapacityInternal(count + len);
        System.arraycopy(value, offset, value, offset + len, count - offset);
        System.arraycopy(str, 0, value, offset, len);
        count += len;
        return this;
    }

Again, the method returns this, so no new builder object is created.

toString():

@Override
    public String toString() {
        // Create a copy, don't share the array
        return new String(value, 0, count);
    }
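Because append() and insert() return the builder itself, calls can be chained on a single object, and toString() is only needed once at the end. The sketch below uses the insert(int, char[]) overload from the source above; BuilderChainDemo and its method names are made up for the illustration.

```java
public class BuilderChainDemo {
    /** Returns true if insert() and append() hand back the very builder they were called on. */
    static boolean returnsSameBuilder() {
        StringBuilder sb = new StringBuilder("World");
        StringBuilder returned = sb.insert(0, "Hello,".toCharArray()).append("!");
        return returned == sb;
    }

    /** Builds "Hello,World!" by chaining calls on one builder. */
    static String build() {
        return new StringBuilder("World")
                .insert(0, "Hello,".toCharArray()) // prepend at offset 0
                .append("!")                        // add to the end
                .toString();                        // copy the characters into a String
    }

    public static void main(String[] args) {
        System.out.println(returnsSameBuilder()); // true
        System.out.println(build());              // Hello,World!
    }
}
```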

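StringBuilder is not thread-safe: if several threads operate on the same StringBuilder object at the same time, concurrency problems can occur.

StringBuffer

StringBuffer is thread-safe, mainly because its public methods are declared with the synchronized keyword. Multiple threads can therefore access and modify the same StringBuffer object without corrupting its data.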

@Override
    public synchronized StringBuffer append(String str) {
        toStringCache = null;
        super.append(str);
        return this;
    }

synchronized is a lock: while one thread is executing the method, other threads wait until that thread releases the lock.
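A minimal sketch of what that guarantee buys: several threads append to one shared StringBuffer, and no append is lost. BufferThreadDemo and appendConcurrently are illustrative names, and the thread and iteration counts are arbitrary. Swapping in a StringBuilder here may lose characters or throw, which is why that variant is not shown with an expected result.

```java
public class BufferThreadDemo {
    /** Appends perThread characters from each of `threads` threads and returns the final length. */
    static int appendConcurrently(int threads, int perThread) {
        StringBuffer buffer = new StringBuffer(); // shared by all worker threads
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    buffer.append('x'); // synchronized: one thread at a time
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return buffer.length();
    }

    public static void main(String[] args) {
        System.out.println(appendConcurrently(4, 10_000)); // 40000
    }
}
```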