String Interning, the String Pool, and Memory in Java — What Actually Happens

by Arif Ikhsanudin, Backend Developer

The three ways a String ends up in memory

Not all String objects are the same in Java. Where a string lives in memory and whether it shares identity with another string of the same content depends on how it was created.

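String literals (any string written directly in source code) are recorded in the class file's constant pool and resolved to an entry in the JVM's string pool, also called the string constant pool. HotSpot resolves each literal lazily, the first time the code that uses it runs, rather than eagerly at class load time. The pool is deduplicated: two class files containing the literal "pending" reference the same String object in the pool, not two separate objects.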

new String(...) — explicitly allocating a string — always creates a new object on the heap, separate from the pool, even if an identical string already exists in the pool.

String.intern() — returns the pooled version of a string, adding it to the pool if not already present.

String a = "hello";           // pool
String b = "hello";           // same pool entry as a
String c = new String("hello"); // new heap object, not the pool entry
String d = c.intern();        // returns the pool entry — same as a and b

System.out.println(a == b);   // true  — same pool object
System.out.println(a == c);   // false — c is a separate heap object
System.out.println(a == d);   // true  — d is the pool entry
System.out.println(a.equals(c)); // true — content is the same

This is why == for string comparison is a bug — two strings with identical content may or may not be the same object depending on how they were created. equals() always compares content. == compares identity.

Where the pool lives

Before Java 7, the string pool was in PermGen — a fixed-size memory region separate from the heap. This made aggressive interning dangerous: fill PermGen with interned strings and you get OutOfMemoryError: PermGen space.

Since Java 7, the string pool is on the heap. This means:

  • Pool strings are subject to GC (though in practice, strings interned from literals are reachable through class metadata and rarely collected)
  • The pool can grow as large as the heap allows
  • -XX:StringTableSize controls the number of buckets in the pool's hash table (default 65536 in Java 11+, tunable for large-scale interning)

The pool is implemented as a hash table keyed by string content. intern() performs a lookup — O(1) average — and either returns the existing entry or inserts the new one.
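To check what a given JVM is actually running with, the table size can be read at run time. The following is a minimal sketch that assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean bean is available; the class and method names are hypothetical. It also shows the lookup behaviour described above: two separately allocated strings with equal content intern to one object.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Sketch: inspect the string table size and the canonicalizing behaviour of intern().
class StringTableProbe {

    // Returns the effective StringTableSize, or "unavailable" when the JVM
    // does not expose the HotSpot diagnostic bean or this option.
    static String stringTableSize() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption("StringTableSize").getValue();
        } catch (IllegalArgumentException e) {
            return "unavailable";
        }
    }

    // True when two distinct String objects with equal content intern to the same object.
    static boolean internIsCanonical(String content) {
        String first = new String(content.toCharArray());
        String second = new String(content.toCharArray());
        return first != second && first.intern() == second.intern();
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + stringTableSize());
        System.out.println("intern() is canonical: " + internIsCanonical("pending"));
    }
}
```

The value printed for StringTableSize depends on the JVM version and any -XX:StringTableSize setting on the command line.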

How javac handles literals and constant expressions

The Java compiler performs compile-time string concatenation of literals. Constant string expressions are folded at compile time, not runtime:

String s1 = "hello" + " " + "world"; // compile-time: single literal "hello world"
String s2 = "hello world";

System.out.println(s1 == s2); // true — both reference the same pool entry

If any operand is a variable rather than a compile-time constant, folding doesn't apply and the concatenation happens at run time:

String prefix = "hello";
String s3 = prefix + " world"; // runtime concatenation — new heap object

System.out.println(s2 == s3); // false — s3 is a separate heap object

final variables that are compile-time constants are treated as literals:

final String PREFIX = "hello";
String s4 = PREFIX + " world"; // compile-time constant — folded to "hello world"

System.out.println(s2 == s4); // true

This is a subtle distinction: final variables that are initialized with non-constant expressions — final String timestamp = LocalDateTime.now().toString() — are not compile-time constants and do not participate in constant folding.
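The difference between the two kinds of final variable can be checked directly. A small sketch follows; the class and method names are hypothetical, and String.valueOf(char[]) stands in for any initializer that is not a constant expression.

```java
// Sketch: a final variable is folded only when its initializer is a constant expression.
class ConstantFoldingDemo {

    static final String LITERAL = "hello world";

    // PREFIX is final and initialized with a literal, so javac folds the
    // concatenation into the single constant "hello world".
    static boolean foldedFromConstant() {
        final String PREFIX = "hello";
        String s = PREFIX + " world";
        return s == LITERAL;
    }

    // prefix is final but initialized by a method call, so it is not a
    // compile-time constant and the concatenation runs at run time.
    static boolean foldedFromMethodResult() {
        final String prefix = String.valueOf("hello".toCharArray());
        String s = prefix + " world";
        return s == LITERAL;
    }

    // The run-time result has equal content, so interning it yields the pooled literal.
    static boolean internedMatchesLiteral() {
        final String prefix = String.valueOf("hello".toCharArray());
        return (prefix + " world").intern() == LITERAL;
    }

    public static void main(String[] args) {
        System.out.println(foldedFromConstant());     // true
        System.out.println(foldedFromMethodResult()); // false
        System.out.println(internedMatchesLiteral()); // true
    }
}
```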

String concatenation and allocation

How the + operator on strings compiles depends on the Java version. Up to Java 8, javac emits a StringBuilder chain. Since Java 9 (JEP 280), it emits an invokedynamic call bootstrapped by StringConcatFactory, and the JDK chooses the concatenation strategy at run time. Either way, each non-constant concatenation expression creates a new String object on the heap, not in the pool:

String result = "Order " + orderId + " status: " + status;
// On Java 8 and earlier, roughly equivalent to:
// new StringBuilder().append("Order ").append(orderId)
//     .append(" status: ").append(status).toString()

In a hot path called millions of times, this allocates on every call: at minimum the result String and its backing array, plus the StringBuilder and its buffer on Java 8 and earlier. For logging, where the string may not even be used if the log level is off, this allocation happens before the level check:

// Allocates the string even if DEBUG is disabled
logger.debug("Processing order " + orderId + " for user " + userId);

// The message is only formatted if DEBUG is enabled
logger.debug("Processing order {} for user {}", orderId, userId);
// Or with a supplier (SLF4J 2.x fluent API; Log4j 2 accepts logger.debug(() -> ...) directly):
logger.atDebug().log(() -> "Processing order " + orderId + " for user " + userId);

SLF4J's parameterized logging ({} placeholders) defers string construction until after the level check. This is not a minor optimization in high-throughput services: logging at DEBUG in a method called 100,000 times per second creates at least 200,000 objects per second (each message String plus its backing array) if the string is always constructed.
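The effect of deferring construction can be observed without any logging dependency. The sketch below uses java.util.logging instead of SLF4J, purely to stay self-contained; its Logger.fine method has an overload that takes a Supplier<String>. The class name, logger name, and build counter are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: count how often a log message is built with eager vs deferred construction.
class DeferredLogMessageDemo {

    // Incremented every time the message string is actually constructed.
    static final AtomicInteger BUILDS = new AtomicInteger();

    static String buildMessage(long orderId, long userId) {
        BUILDS.incrementAndGet();
        return "Processing order " + orderId + " for user " + userId;
    }

    // Logs 'calls' times at FINE in both styles and returns the number of
    // message constructions observed for each: {eager, deferred}.
    static int[] countBuilds(Level loggerLevel, int calls) {
        Logger logger = Logger.getLogger("demo.orders");
        logger.setUseParentHandlers(false); // keep the demo from printing records
        logger.setLevel(loggerLevel);

        BUILDS.set(0);
        for (int i = 0; i < calls; i++) {
            // The argument is evaluated before fine() can check the level.
            logger.fine(buildMessage(i, 42));
        }
        int eager = BUILDS.get();

        BUILDS.set(0);
        for (int i = 0; i < calls; i++) {
            final long orderId = i;
            // The supplier is invoked only if FINE is enabled for this logger.
            logger.fine(() -> buildMessage(orderId, 42));
        }
        int deferred = BUILDS.get();

        return new int[] {eager, deferred};
    }

    public static void main(String[] args) {
        int[] counts = countBuilds(Level.INFO, 1000);
        System.out.println("eager builds: " + counts[0] + ", deferred builds: " + counts[1]);
    }
}
```

With the logger at INFO, the eager loop builds every message and the deferred loop builds none. With the logger at FINE, both loops build every message, so the deferred form costs nothing extra when the output is wanted.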

intern() — when it helps and when it backfires

intern() is appropriate when you have a large number of objects holding the same string values, and equality checks are frequent and performance-sensitive. The canonical case: a field that holds one of a small set of known values — status codes, category names, currency codes.

// Without interning — each deserialized record creates a new String
record.setStatus(jsonNode.get("status").asText()); // "pending", "shipped", etc.

// With interning — all records with status "pending" share one object
record.setStatus(jsonNode.get("status").asText().intern());

// Equality check becomes identity check
if (record.getStatus() == "pending") { ... } // valid after interning

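The memory saving: 10 million records each holding a separate "pending" string means 10 million String objects. On a 64-bit JVM with compressed oops, each String object takes about 24 bytes and its backing array for a 7-character value about another 24 bytes, so roughly 240 MB for the String objects alone and close to 480 MB once the arrays are counted. With interning, all records reference one object.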

The identity-check optimization is real but dangerous as a practice — it works only if you can guarantee all strings in the comparison have been interned, which requires discipline across the entire codebase. Miss one new String(...) and == silently returns false. equals() is always safer.

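The risk: interning high-cardinality strings (user IDs, session tokens, request IDs) fills the pool with unique values. Since Java 7 an interned string that nothing else references is eligible for collection, so the pool is not a permanent leak by itself. But as long as those values stay referenced, the table keeps growing, every intern() call pays for a lookup in an ever larger table, and the GC has more table entries to scan and clean. On Java 6 and earlier the same pattern exhausted PermGen. This is the String.intern() memory leak described in the memory leaks article.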

The rule: intern strings only if the cardinality is bounded and small. Status codes, ISO currency codes, HTTP method names — these are safe to intern. Arbitrary user input, request identifiers, URLs — these are not.
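One way to enforce that rule in code is to canonicalize against an explicit allow-list instead of calling intern() on whatever arrives. This is a swapped-in technique rather than String.intern() itself, shown here as a minimal sketch; the class name and the status values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: share one instance per known status; never store unknown values.
class StatusCanonicalizer {

    private static final Map<String, String> KNOWN = new HashMap<>();

    static {
        // Illustrative status names. Each literal is already a pool entry,
        // so the shared instance is the same object as the literal.
        for (String status : new String[] {"pending", "shipped", "delivered", "cancelled"}) {
            KNOWN.put(status, status);
        }
    }

    // Returns the shared instance for a known status. Unknown values pass
    // through unchanged and are never added, so the table cannot grow.
    static String canonicalize(String raw) {
        if (raw == null) {
            return null;
        }
        String shared = KNOWN.get(raw);
        return shared != null ? shared : raw;
    }

    public static void main(String[] args) {
        String parsed = new String("pending".toCharArray()); // stands in for a deserialized value
        System.out.println(canonicalize(parsed) == "pending"); // true
    }
}
```

Because unknown values are returned as-is, arbitrary input such as a request ID costs one map lookup and leaves nothing behind.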

G1 string deduplication — automatic without interning

G1GC has a background string deduplication feature that identifies String objects with identical content and replaces their backing char[] (or byte[] since Java 9's compact strings) with a shared reference — without changing the String object's identity or moving it to the pool:

-XX:+UseStringDeduplication  # requires -XX:+UseG1GC (default since Java 9) on Java 8 through 17; JDK 18+ also supports it with other collectors

Candidate strings are identified during garbage collection, and a background deduplication thread then compares their contents and makes duplicates reference the same underlying array. The String objects remain separate heap objects, so == is still false, but they share backing storage.

This reduces heap usage for applications with many duplicate strings without the risks of intern(). The tradeoff: deduplication costs some CPU in the background and adds a little work to each collection, so there is a small throughput cost. For applications with high string duplication (log processing, data pipelines, applications that parse the same field values repeatedly), the memory savings typically outweigh the cost.

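Monitor with (Java 8):

-XX:+PrintStringDeduplicationStatistics

On Java 9 and later this flag was replaced by unified logging, for example -Xlog:gc+stringdedup=debug (the exact tag set varies by JDK version).

Either form logs how many strings were deduplicated and how much space was reclaimed.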

The equals() contract and pool assumptions

One final trap: code that assumes pool membership and uses == breaks when strings arrive from outside the pool:

// Brittle — works only if status was interned or is a literal
if (order.getStatus() == "PENDING") { ... }

// This works regardless of how status was created
if ("PENDING".equals(order.getStatus())) { ... }
// Putting the literal first also handles null safely — no NullPointerException

The equals() method is defined on content, not identity. It works correctly regardless of whether either string was interned, created with new, deserialized from JSON, read from a database, or produced by concatenation. == works correctly only for strings you can guarantee are pool entries, which in practice means literals and other compile-time constants compared with each other. The pool is JVM-wide, so that guarantee holds across classes and class loaders, but it says nothing about strings that arrive at run time.
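The failure mode is easy to reproduce. In the sketch below, decoding from bytes stands in for any value that arrives from I/O; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Sketch: identity comparison fails for strings that did not come from the pool.
class StatusComparisonDemo {

    // Simulates a value read from the network or a database: decoding bytes
    // always produces a fresh String, never the pooled literal.
    static String fromWire(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Brittle: true only when status is the pooled "PENDING" instance.
    static boolean isPendingByIdentity(String status) {
        return status == "PENDING";
    }

    // Robust: compares content, and a null status simply yields false.
    static boolean isPendingByContent(String status) {
        return "PENDING".equals(status);
    }

    // Null-safe comparison when neither side is a literal.
    static boolean sameStatus(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String status = fromWire("PENDING");
        System.out.println(isPendingByIdentity(status)); // false
        System.out.println(isPendingByContent(status));  // true
        System.out.println(isPendingByContent(null));    // false
    }
}
```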

The practical takeaway: use equals() for string comparison in all application code. Use intern() only for deliberate memory optimization on bounded-cardinality strings, with awareness of the pool growth risk. Let G1's deduplication handle the rest if memory pressure from duplicate strings is a measured problem.
