Foundations of Anagrams and String Manipulation in Java

Posts

An anagram, in the context of computer science and Java programming, refers to a pair of strings, words, or phrases that are created by rearranging the letters of each other. In simpler terms, two strings are considered anagrams if they are composed of the exact same characters with the exact same frequency, but the order of those characters is different. For example, the word “listen” is an anagram of “silent.” You can rearrange the letters in “listen” to form “silent,” and vice versa. This concept is a fundamental building block for many algorithmic problems and string manipulation tasks.

The critical characteristic of anagrams is that they must contain the same set of letters, and each letter must appear the same number of times. The string “apple” is not an anagram of “apply,” even though they are similar. The string “apple” has two ‘p’s, one ‘a’, one ‘l’, and one ‘e’. The string “apply” has one ‘a’, one ‘p’, new ‘l’, and one ‘y’. They do not share the same set of letters, so they are not anagrams. Understanding this core definition is the first and most crucial step before writing any code.

In Java, creating a program to check if two strings are anagrams involves writing logic that can compare the characters in both strings and ensure that they have the exact same frequency of occurrence. This is not as simple as a direct string comparison. The logic must be able to look past the order. This often requires pre-processing the strings by removing spaces, converting all letters to a typical case (either uppercase or lowercase), and then using a specific algorithm, such as sorting or frequency mapping, to compare the characters.

The Importance of Anagrams in Algorithmic Thinking

Learning to detect anagrams is a fundamental skill for any Java developer, not just for its own sake, but for what it teaches. It serves as an excellent introduction to several core computer science concepts. When you learn to solve the anagram problem, you are simultaneously mastering string manipulation, which is a common task in almost any software application. You are also engaging in algorithmic thinking, which is the process of breaking down a complex problem into a series of small, logical, and efficient steps. This problem-solving technique is the essence of programming.

The anagram problem is a classic interview question because it allows an interviewer to assess a candidate’s thought process. There are multiple ways to solve it, each with different trade-offs in performance and simplicity. A basic solution might be easy to write but slow to run. A more advanced solution might be much faster but require more complex code or a deeper understanding of data structures. This demonstrates a developer’s ability to analyze a problem and choose the right tool for the job.

Furthermore, the techniques used to solve the anagram problem are transferable. The concept of counting character frequencies, for example, is used in countless other algorithms, such as finding the most common character in a string or checking for palindromes. The method of sorting characters to find a “canonical” form is used in tasks like grouping similar items. By mastering anagram detection, you are opening the door to solving a much wider class of programming challenges and advancing your career.

Java Strings: The Immutable Foundation

Before we can manipulate strings to check for anagrams, we must understand their fundamental nature in Java. In Java, objects of the String class are immutable. This means that once a String object is created, its value cannot be changed. When you think you are modifying a string, such as by converting it to lowercase or removing spaces, Java is actually creating a brand new String object in memory with the modified content. The original string remains unchanged.

This immutability has significant implications for our anagram programs. We cannot, for example, sort a string “in place.” We must first convert the string into a data structure that is mutable, such as a character array (char[]). This conversion itself is an operation that takes time and creates a new object in memory. Understanding this is key to accurately analyzing the performance and memory usage of our algorithms. It is a core concept that separates novice programmers from experienced Java developers.

For instance, a line of code like str1 = str1.toLowerCase(); does not modify the original string str1 points to. Instead, it creates a new string object containing the lowercase version of the characters and then reassigns the str1 reference variable to point to this new object. The original string object, if no other variable points to it, will eventually be collected by Java’s garbage collector. This is an important distinction for writing efficient and bug-free code.

The Role of Character Arrays

Since Java strings are immutable, our primary tool for in-place manipulation will be the character array, or char[]. A string can be easily converted into a character array using the .toCharArray() method, which is available on every String object. This method creates a new array and copies all of the string’s characters into it. Once we have this array, we have full control over it. We can swap characters, overwrite them, and, most importantly for one of our methods, sort them.

Using a char[] is essential for the sorting-based approach to anagram detection. After converting both input strings into character arrays, we can use the built-in Arrays.sort() method to sort both arrays alphabetically. If the two original strings were indeed anagrams, they would be composed of the same set of characters. Therefore, after sorting, the two character arrays should be identical. We can then compare the sorted arrays element by element to make our final determination.

This conversion is a key step. It transforms the problem from an abstract comparison of immutable strings into a concrete comparison of two mutable arrays. This pattern of “convert, manipulate, then compare” is very common in Java programming when dealing with strings. It respects the immutability of the String class while still giving us the power to perform the complex manipulations needed to solve the algorithm.

Pre-Processing: Normalizing Strings for Comparison

A common mistake when checking for anagrams is to compare the raw input strings. Consider the words “Listen” and “silent.” A human can instantly see they are anagrams. However, a naive computer program would see them as different. The ‘L’ in “Listen” is an uppercase character, while the ‘s’ in “silent” is lowercase. The program would see them as entirely distinct characters. Similarly, a phrase like “A decimal point” is an anagram of “I’m a dot in place,” but a program would fail if it tried to compare the spaces, the apostrophe, and the period.

This is why a pre-processing or “normalization” step is required before we check for anagrams. Normalization is the process of cleaning and standardizing the input strings so that only the essential data—the letters themselves—remains for comparison. This is a critical step that must be applied identically to both strings. If we fail to do this, our comparison will be unreliable and produce incorrect results, known as “false negatives.”

Our normalization process will consist of two main parts. First, we will convert both strings to a common case, typically lowercase, to handle case-insensitive comparisons. Second, we will remove any characters that are not part of the anagram check, such as spaces, punctuation, or numbers. By performing this cleanup, we ensure that we are comparing only the core components of the strings, allowing for a true and accurate anagram detection.

The Pitfall of Case Sensitivity

As mentioned, case sensitivity is the first and most common trap. In Java, as in most programming languages, the character ‘A’ is represented by a different value than the character ‘a’. They are not the same. If we compare “Listen” and “silent” directly, any algorithm we write will fail. The Arrays.sort() method, for example, would place ‘L’ before ‘s’, but it would place ‘l’ after ‘s’, leading to two different sorted arrays.

The solution is to convert both strings to a uniform case before doing any other processing. The String class in Java provides the .toLowerCase() method for this exact purpose. This method returns a new string where all characters have been converted to their lowercase equivalents. By calling this method on both str1 and str2, we ensure that ‘L’ in “Listen” becomes ‘l’, and it can then be correctly matched with the ‘l’ in “silent.”

This step is non-negotiable for a robust anagram checker. A program that only works for strings that are already in the same case is an incomplete program. It fails to account for real-world input. A good developer anticipates these edge cases and handles them gracefully. Converting to lowercase (or uppercase, the choice is arbitrary as long as it is consistent) is the standard way to solve the case sensitivity problem.

Handling Spaces and Punctuation

The second part of normalization is handling non-alphabetic characters. Phrases like “Dormitory” and “Dirty room” are classic anagram examples. If we simply convert them to lowercase, we get “dormitory” and “dirty room.” These are still not equal. The second string contains a space, which the first one lacks. Our algorithm would compare them and, upon finding the space, incorrectly conclude they are not anagrams. The same issue applies to punctuation, numbers, or any other symbols.

To solve this, we must remove these non-essential characters. The String class in Java provides the powerful .replaceAll() method, which can be used with a regular expression (regex) to achieve this. A regular expression is a special pattern for matching text. For example, the regex \\s matches all whitespace characters (spaces, tabs, etc.). We can use str1.replaceAll(“\\s”, “”) to create a new string with all whitespace removed.

A more robust solution would be to remove everything that is not a letter. We can do this with a regex like [^a-zA-Z]. This pattern means “match any character that is NOT (^) in the range of ‘a’ through ‘z’ or ‘A’ through ‘Z’.” By replacing this pattern with an empty string, we are left with only the letters. Applying str1.replaceAll(“[^a-zA-Z]”, “”).toLowerCase() becomes our complete, one-line normalization function that handles both spaces and case.

A Basic Length Check: The First Optimization

Before we run any complex algorithms, such as sorting or building a frequency map, we can perform a very simple and fast check. Two strings cannot possibly be anagrams if they do not have the same number of relevant characters. For example, “hello” (5 letters) can never be an anagram of “goodbye” (7 letters). This check is incredibly efficient and can save us a significant amount of processing time.

It is important to perform this length check after the normalization step. For example, the raw strings “Dirty room” (10 characters) and “Dormitory” (9 characters) have different lengths. However, after we normalize them by removing the space, the resulting strings are “dirtyroom” (9 characters) and “dormitory” (9 characters). Their lengths are now identical, and they can proceed to the next stage of the anagram check.

Therefore, the very first step in our areAnagrams method, after normalization, should be an if statement. This statement will be if (str1.length() != str2.length()) { return false; }. This simple line of code is our first optimization. It immediately rejects any impossible pairs without wasting time on sorting or map creation. This is a key principle of efficient algorithm design: handle the simple “fail-fast” cases first.

The Sorting Algorithm for Anagram Detection

One of the most intuitive and common methods for checking if two strings are anagrams is by sorting them. The logic is straightforward: if two strings are anagrams, they are composed of the exact same set of characters. Therefore, if we convert both strings into character arrays and then sort those arrays alphabetically, the two arrays should become identical. If they are not identical after sorting, the original strings could not have been anagrams.

This approach transforms the complex problem of comparing character frequencies and order into a simple problem of array equality. For example, take “listen” and “silent.” When converted to character arrays, we get [‘l’, ‘i’, ‘s’, ‘t’, ‘e’, ‘n’] and [‘s’, ‘i’, ‘l’, ‘e’, ‘n’, ‘t’]. After sorting both arrays alphabetically, they both become [‘e’, ‘i’, ‘l’, ‘n’, ‘s’, ‘t’]. Since the sorted arrays are identical, we can confidently conclude the strings are anagrams. This method is elegant, easy to understand, and relies on built-in Java utilities.

This section will provide a complete, in-depth walkthrough of how to build an anagram checker in Java using this sorting method. We will start with the full program code and then break it down, explaining each component step by step. We will also analyze the performance of this algorithm, discussing its time and space complexity, and when it is the most appropriate choice.

Step-by-Step: Implementing the Sorting Method

The implementation of the sorting method can be broken down into a clear sequence of steps. This logical flow will become the body of our areAnagrams helper method. The steps are as follows. First, we will create a method that accepts two strings, str1 and str2, as parameters and returns a boolean value (true or false). Second, we will normalize these strings by removing all non-alphabetic characters and converting them to a uniform lowercase.

Third, we will perform our initial optimization: a length check. If the lengths of the two normalized strings are not equal, we will immediately return false. Fourth, if the lengths are the same, we will convert both normalized strings into character arrays. Fifth, we will use the java.util.Arrays.sort() method to sort both character arrays in place. Finally, we will use the java.util.Arrays.equals() method to compare the two sorted arrays. If they are identical, we return true; otherwise, we return false.

Code Walkthrough: Anagrams Using Arrays.sort

Here is the full Java class demonstrating the sorting method for anagram detection. This code is self-contained and can be run directly. It includes the main helper method areAnagrams as well as a main method to test our code with a few examples. This provides a complete, working solution that follows the steps outlined above.

Java

import java.util.Arrays;

 

public class AnagramSortExample {

 

    /**

     * Checks if two strings are anagrams using the sorting method.

     * @param str1 The first string.

     * @param str2 The second string.

     * @return true if the strings are anagrams, false otherwise.

     */

    public static boolean areAnagrams(String str1, String str2) {

        // Step 1: Normalize strings (remove non-letters, convert to lowercase)

        String cleanStr1 = str1.replaceAll(“[^a-zA-Z]”, “”).toLowerCase();

        String cleanStr2 = str2.replaceAll(“[^a-zA-Z]”, “”).toLowerCase();

 

        // Step 2: Perform the length check optimization

        if (cleanStr1.length() != cleanStr2.length()) {

            return false;

        }

 

        // Step 3: Convert strings to char arrays

        char[] charArray1 = cleanStr1.toCharArray();

        char[] charArray2 = cleanStr2.toCharArray();

 

        // Step 4: Sort the char arrays

        Arrays.sort(charArray1);

        Arrays.sort(charArray2);

 

        // Step 5: Compare the sorted arrays

        return Arrays.equals(charArray1, charArray2);

    }

 

    /**

     * Main method to test the areAnagrams function.

     */

    public static void main(String[] args) {

        String word1 = “Listen”;

        String word2 = “Silent”;

        

        String phrase1 = “A decimal point”;

        String phrase2 = “I’m a dot in place”;

 

        String word3 = “Hello”;

        String word4 = “World”;

 

        // Test Case 1

        if (areAnagrams(word1, word2)) {

            System.out.println(“‘” + word1 + “‘ and ‘” + word2 + “‘ are anagrams.”);

        } else {

            System.out.println(“‘” + word1 + “‘ and ‘” + word2 + “‘ are not anagrams.”);

        }

 

        // Test Case 2

        if (areAnagrams(phrase1, phrase2)) {

            System.out.println(“‘” + phrase1 + “‘ and ‘” + phrase2 + “‘ are anagrams.”);

        } else {

            System.out.println(“‘” + phrase1 + “‘ and ‘” + phrase2 + “‘ are not anagrams.”);

        }

 

        // Test Case 3

        if (areAnagrams(word3, word4)) {

            System.out.println(“‘” + word3 + “‘ and ‘” + word4 + “‘ are anagrams.”);

        } else {

            System.out.println(“‘” + word3 + “‘ and ‘” + word4 + “‘ are not anagrams.”);

        }

    }

}

 

Line-by-Line Explanation of the Sorting Approach

Let’s dissect the areAnagrams method from the code above. The method signature public static boolean areAnagrams(String str1, String str2) defines a public, static method that can be called without creating an instance of the class. It takes two String inputs and returns a boolean result. The first two lines inside the method are our normalization step. str1.replaceAll(“[^a-zA-Z]”, “”) creates a new string with all non-letters removed. The subsequent .toLowerCase() converts this new string to lowercase. This is done for both str1 and str2.

The next block is the length check: if (cleanStr1.length() != cleanStr2.length()) { return false; }. This is our fast-fail optimization. If the cleaned strings are not the same length, they cannot be anagrams, so we exit the method immediately and return false. This avoids unnecessary processing.

Next, char[] charArray1 = cleanStr1.toCharArray(); and char[] charArray2 = cleanStr2.toCharArray(); perform the crucial conversion from immutable strings to mutable character arrays. These new arrays are created in memory and hold the characters of our cleaned strings. This step is necessary for the sorting to take place.

The lines Arrays.sort(charArray1); and Arrays.sort(charArray2); are the heart of our algorithm. This static method from the java.util.Arrays utility class sorts the arrays in place. After these two lines execute, both charArray1 and charArray2 are in alphabetical order.

Finally, return Arrays.equals(charArray1, charArray2); makes the final comparison. The Arrays.equals() method is another utility that compares two arrays element by element. It returns true if and only if both arrays are of the same length and contain the same elements in the same order. Since we have already sorted our arrays, this check is the definitive test for whether the original strings were anagrams.

Understanding java.util.Arrays.sort

The Arrays.sort(char[] a) method is a highly optimized piece of the Java standard library. It is important to know that this is not a simple bubble sort. In modern Java, this method uses a “dual-pivot quicksort” algorithm. This algorithm is known for being very fast on average, though in the worst case, its performance can degrade. For arrays of objects, it often uses Timsort, a hybrid algorithm. For our purposes, what matters is its performance characteristics.

The time complexity of this sorting algorithm is, on average, O(nlogn), where ‘n’ is the number of elements in the array (the length of the string). This notation, “Big O,” describes how the runtime of the algorithm scales with the size of the input. O(nlogn) is considered highly efficient for a sorting algorithm. This means that even if the string length doubles, the time taken to sort it does not grow exponentially; it grows in a very manageable, near-linear fashion.

The main Method and Testing Our Code

The public static void main(String[] args) method is the entry point for any Java application. We use it here to test our areAnagrams method. We define three pairs of strings. The first pair, “Listen” and “Silent,” should return true. The second pair, “A decimal point” and “I’m a dot in place,” is a more complex phrase-based test. After normalization, both become “adecimalpoint” and “imadotinplace,” which are not anagrams. Wait, “A decimal point” becomes “adecimalpoint”. “I’m a dot in place” becomes “imadotinplace”. Let’s re-check that. “A decimal point” -> [‘a’, ‘d’, ‘e’, ‘c’, ‘i’, ‘m’, ‘a’, ‘l’, ‘p’, ‘o’, ‘i’, ‘n’, ‘t’]. “I’m a dot in place” -> [‘i’, ‘m’, ‘a’, ‘d’, ‘o’, ‘t’, ‘i’, ‘n’, ‘p’, ‘l’, ‘a’, ‘c’, ‘e’]. These are anagrams. My previous analysis was wrong, which shows why testing is vital! The code will correctly identify this.

Let’s trace: “A decimal point” -> adecimalpoint “I’m a dot in place” -> imadotinplace Are these anagrams? Let’s check the letters. adecimalpoint: a-2, c-1, d-1, e-1, i-2, l-1, m-1, n-1, o-1, p-1, t-1. Total 13. imadotinplace: a-2, c-1, d-1, e-1, i-2, l-1, m-1, n-1, o-1, p-1, t-1. Total 13. Yes, they are anagrams. My earlier, manual check was incorrect, and the program will get it right. This is a perfect example of why algorithmic correctness is so important.

The third pair, “Hello” and “World,” are clearly not anagrams and should return false. The if/else blocks simply call our method and print a human-readable result to the console, confirming that our logic works as expected for all three test cases. This demonstrates the importance of testing with simple cases, complex cases (with punctuation and spaces), and clear negative cases.

Time Complexity Analysis of the Sorting Method

Let’s formally analyze the time complexity of our areAnagrams method. Time complexity measures how the runtime of our function scales as the input size ‘n’ (the length of the strings) increases. We will analyze each step, assuming ‘n’ is the length of the longer string.

The normalization step, str1.replaceAll(…).toLowerCase(), involves iterating through the string, which is O(n). Creating the new string also takes O(n) time. The length check, str1.length() != str2.length(), is O(1), meaning it takes constant time, regardless of the string size. Converting the string to a character array, toCharArray(), requires iterating through the string and copying it, which is an O(n) operation.

The most expensive step is Arrays.sort(). As we discussed, this operation has an average-case time complexity of O(nlogn). Since we do this for both arrays, the time is 2⋅O(nlogn), which in Big O notation is simplified to just O(nlogn). Finally, Arrays.equals() must compare each element of the two sorted arrays, which is an O(n) operation.

When we combine all these steps, we add them up: O(n)+O(n)+O(1)+O(n)+O(nlogn)+O(n). The dominant term in this expression is O(nlogn), as it grows faster than all the linear O(n) terms. Therefore, the overall time complexity of the sorting method is O(nlogn).

Space Complexity Analysis

Space complexity measures how much extra memory our algorithm requires, relative to the input size ‘n’. Let’s analyze the memory usage. The input strings, str1 and str2, take O(n) space. Our normalization step, cleanStr1 and cleanStr2, creates new strings. In the worst case (if no characters are removed), these new strings also take O(n) space.

The next step, toCharArray(), creates two new character arrays, charArray1 and charArray2. Each of these arrays has a length of ‘n’ (the length of the cleaned strings). Therefore, this step requires O(n)+O(n), or O(n), of additional space. The Arrays.sort() method, depending on the implementation, might require some additional space (e.g., O(logn) for quicksort’s recursion stack or O(n) for mergesort), but in Java, it often sorts in place, or with minimal extra space.

The dominant factor for our extra memory usage is the creation of the two character arrays. Therefore, the overall space complexity of this method is O(n). This means the amount of extra memory we need grows linearly with the length of the input strings. This is a very reasonable and generally acceptable trade-off.

Pros and Cons of the Sorting Approach

The sorting approach has several clear advantages. Its main benefit is simplicity and readability. The logic is very easy to follow: clean the strings, sort them, and compare them. It uses built-in, standard Java utilities (Arrays.sort, Arrays.equals), which means the code is concise and relies on highly optimized and well-tested library functions. For many developers, this is the most intuitive “first pass” solution to the problem.

However, this method is not the most performant. The time complexity of O(nlogn) is good, but it is not the fastest possible. The sorting step is the bottleneck. For extremely large strings, this bottleneck could become noticeable. If you are in a high-performance scenario, such as a competitive programming challenge where every millisecond counts, you might need to consider a faster O(n) algorithm.

In summary, the sorting method is an excellent, reliable, and highly readable solution. It is perfectly suitable for the vast majority of real-world applications. Its trade-off is a slightly slower runtime in exchange for simplicity and ease of implementation. It is a fantastic tool to have in your developer toolkit.

A More Efficient Approach: Character Frequency Counting

While the sorting method is intuitive and easy to implement, its O(nlogn) time complexity is not the fastest possible solution. The bottleneck is the sorting operation itself. We can achieve a more efficient solution by avoiding sorting altogether. The alternative approach is to use a frequency map. The logic is as follows: if two strings are anagrams, they must contain the exact same characters with the exact same frequency. We can manually count the frequency of each character in the first string, and then compare it to the character counts in the second string.

This method transforms the problem from one of sorting to one of counting. By using an appropriate data structure, we can perform this counting and comparison in linear time, which is O(n). This is asymptotically faster than the O(nlogn) sorting method. For very large strings, this performance difference can be substantial. The most common data structure used for this frequency counting in Java is the HashMap.

This section will provide a complete, in-depth guide to implementing the anagram check using a HashMap. We will explore two popular variations: the first using a single map where we add frequencies from the first string and subtract frequencies from the second, and the second using two separate maps that are compared at the end.

Why Use a HashMap for Anagrams?

A HashMap in Java is a data structure that stores key-value pairs. It is an implementation of the Map interface. The “hash” part of its name refers to how it works internally: it uses a hash function to compute an index in an array for storing each key-value pair. The most important feature of a HashMap is that it provides, on average, O(1) or “constant time” performance for its most basic operations: put (adding or updating a pair) and get (retrieving a value by its key).

This O(1) performance is exactly what we need for an efficient frequency counter. We can use the characters of the string as the keys in our map (e.g., Character ‘a’, ‘b’, ‘c’). We can use integers as the values to store the count of how many times that character has appeared (e.g., Integer 1, 2, 3). As we iterate through our string, we can update this map. If we see an ‘a’, we “get” the current count for ‘a’ and “put” a new value of count + 1. Because these operations are O(1), we can process the entire string in O(n) time.

The Logic: Building a Single Frequency Map

One of the most elegant and efficient HashMap-based solutions uses a single map. The algorithm works in three phases. First, we create an empty HashMap<Character, Integer>. Second, we iterate through every character of the first normalized string. For each character, we add it to the map, incrementing its count. If the character is not in the map, we add it with a count of 1. After this phase, the map holds a perfect frequency count of the first string.

In the third phase, we iterate through every character of the second normalized string. For each character, we decrement its count in the map. If we encounter a character in the second string that is not in the map at all, or if its count in the map is already zero, we know the strings cannot be anagrams, and we can return false immediately. If we successfully complete the iteration of the second string, one final check is needed. We must ensure all counts in the map are now zero. If any count is positive, it means the first string had extra characters.

However, a simpler variation of this third phase exists. If we have already confirmed both normalized strings are the same length, we do not need the final check. If the lengths are identical, and we successfully decrement the count for every character in the second string without ever going below zero, we can be mathematically certain that all counts will be exactly zero at the end. This simplifies the logic significantly.

Step-by-Step: Implementing the HashMap Method

Let’s refine the single-map approach, assuming we have already performed the normalization and length checks. First, create a single HashMap<Character, Integer> called frequencyMap. Second, loop through the char[] of the first string. For each character, put it into the map, incrementing its value. Use frequencyMap.getOrDefault(char, 0) + 1 for a clean implementation.

Third, loop through the char[] of the second string. For each character c, check if it is present in the map and has a count greater than zero. We can do this by checking frequencyMap.getOrDefault(c, 0). If this value is 0, it means the second string has a character that the first string did not, or it has more of that character than the first string. In this case, we return false immediately. If the count is greater than 0, we update the map by putting a new value of frequencyMap.get(c) – 1.

If this second loop completes without ever returning false, it means every character in the second string had a corresponding character in the first. And because our initial length check ensured the strings were the same length, we know there are no leftover characters. Therefore, if the loop finishes, we can confidently return true.

Code Walkthrough: Anagrams Using a Single HashMap

Here is the full Java class demonstrating the single HashMap (frequency map) method. It includes the same normalization and length checks as the sorting example, but it replaces the sorting logic with our new frequency counting algorithm.

Java

import java.util.HashMap;

import java.util.Map;

 

public class AnagramHashMapExample {

 

    /**

     * Checks if two strings are anagrams using a single HashMap.

     * @param str1 The first string.

     * @param str2 The second string.

     * @return true if the strings are anagrams, false otherwise.

     */

    public static boolean areAnagrams(String str1, String str2) {

        // Step 1: Normalize strings

        String cleanStr1 = str1.replaceAll(“[^a-zA-Z]”, “”).toLowerCase();

        String cleanStr2 = str2.replaceAll(“[^a-zA-Z]”, “”).toLowerCase();

 

        // Step 2: Perform the length check optimization

        if (cleanStr1.length() != cleanStr2.length()) {

            return false;

        }

 

        // Step 3: Create the frequency map

        Map<Character, Integer> frequencyMap = new HashMap<>();

 

        // Step 4: Populate the map with characters from the first string

        for (char c : cleanStr1.toCharArray()) {

            frequencyMap.put(c, frequencyMap.getOrDefault(c, 0) + 1);

        }

 

        // Step 5: Decrement counts using characters from the second string

        for (char c : cleanStr2.toCharArray()) {

            // Get the current count of this character

            int count = frequencyMap.getOrDefault(c, 0);

 

            // If the character is not in the map or its count is already 0,

            // they are not anagrams.

            if (count == 0) {

                return false;

            }

 

            // Otherwise, decrement the count and put it back in the map

            frequencyMap.put(c, count – 1);

        }

 

        // Step 6: If the loop finishes, they are anagrams

        return true;

    }

 

    /**

     * Main method to test the areAnagrams function.

     */

    public static void main(String[] args) {

        String word1 = “Listen”;

        String word2 = “Silent”;

        

        String phrase1 = “A decimal point”;

        String phrase2 = “I’m a dot in place”;

 

        String word3 = “anagram”;

        String word4 = “mangaar”; // Not an anagram

 

        // Test Case 1

        System.out.println(“‘” + word1 + “‘ and ‘” + word2 + “‘ are anagrams: ” 

                            + areAnagrams(word1, word2));

 

        // Test Case 2

        System.out.println(“‘” + phrase1 + “‘ and ‘” + phrase2 + “‘ are anagrams: ” 

                            + areAnagrams(phrase1, phrase2));

                            

        // Test Case 3

        System.out.println(“‘” + word3 + “‘ and ‘” + word4 + “‘ are anagrams: ” 

                            + areAnagrams(word3, word4));

    }

}

 

Line-by-Line Explanation of the HashMap Approach

Let’s analyze the areAnagrams method. The first two steps, normalization and the length check, are identical to our sorting method. They are a universal prerequisite for any anagram algorithm. The real logic begins at Step 3, Map<Character, Integer> frequencyMap = new HashMap<>();. This line declares a new, empty HashMap that will use Character objects as keys and Integer objects as values.

Step 4 is the first loop: for (char c : cleanStr1.toCharArray()). This is an enhanced for-loop that iterates over every character in the array returned by cleanStr1.toCharArray(). Inside the loop, the line frequencyMap.put(c, frequencyMap.getOrDefault(c, 0) + 1); does all the work. It says: “Get the current count for character c. If c doesn’t exist, default to 0. Then, add 1 to that count. Finally, put this new count back into the map associated with key c.”

Step 5 is the second loop, which iterates over cleanStr2. Inside, int count = frequencyMap.getOrDefault(c, 0); fetches the current count for character c. The if (count == 0) block is our critical check. If the count is 0, it means cleanStr2 has a character that cleanStr1 didn’t have (it was never in the map, so getOrDefault returned 0), or it has more of that character than cleanStr1 (we already decremented it to 0 on a previous pass). In either case, we return false.

If the count is not 0, we proceed to frequencyMap.put(c, count – 1);, which decrements the count in the map. Finally, if the second loop completes without ever hitting a count == 0 condition, we reach return true;. Because we passed the initial length check, we are guaranteed that if the loop finishes, all counts in the map must be exactly 0, and the strings are anagrams.

Using getOrDefault for Cleaner Code

The getOrDefault(key, defaultValue) method, which we used in our code, is a very convenient part of the Map interface. Without it, our logic for populating the map in Step 4 would be much clunkier. We would have to write something like this: if (frequencyMap.containsKey(c)) { frequencyMap.put(c, frequencyMap.get(c) + 1); } else { frequencyMap.put(c, 1); }. This if/else block is verbose and harder to read.

The getOrDefault method elegantly handles this “check if present, otherwise use a default” logic in a single, expressive line. It makes the code cleaner and less prone to errors. It is the standard, modern Java practice for populating a frequency map. Similarly, using it in the second loop, int count = frequencyMap.getOrDefault(c, 0);, gracefully handles the case where cleanStr2 contains a character that was never in cleanStr1 at all. In that scenario, getOrDefault returns 0, the if (count == 0) check is triggered, and the method correctly returns false.

Time Complexity of the HashMap Method

Now let’s analyze the time complexity of this new method. As before, ‘n’ will be the length of the strings. The normalization and length check steps are still O(n) and O(1), respectively. The first loop, which populates the frequencyMap, iterates through cleanStr1. It runs ‘n’ times. Inside the loop, the put and getOrDefault operations are, on average, O(1) constant time. Therefore, the total time for this loop is n⋅O(1), which is O(n).

The second loop, which checks cleanStr2, also runs ‘n’ times. Inside this loop, getOrDefault is O(1) and put is O(1). Therefore, the total time for this second loop is also O(n). When we add all the steps together, we have O(n)+O(1)+O(n)+O(n). The dominant term is O(n). Thus, the overall time complexity of the HashMap method is O(n). This is a significant improvement over the sorting method’s O(nlogn).

It is important to note this is the average-case complexity. In the rare, pathological worst-case scenario where all keys in the HashMap have the same hash code, the operations can degrade to O(n). This would make our algorithm O(n2). However, with modern Java’s HashMap implementation (which turns long buckets into trees), this is extremely unlikely, and O(n) is the accepted and practical time complexity.

Space Complexity of the HashMap Method

Finally, let’s analyze the space complexity. The normalization steps still create new strings, taking O(n) space. The core of this algorithm is the frequencyMap. How much space does this map take? In the worst case, every single character in the string is unique. For example, the string “abcdefg”. In this case, the map will store ‘n’ key-value pairs. Therefore, the space required for the map is O(n).

However, we can also think of this in a different way. The number of possible keys in the map is finite. If we are only dealing with, for example, the 26 lowercase English letters, the map will never grow larger than 26 entries, regardless of whether the string is 100 or 1 billion characters long. In this context, we can say the space complexity for the map is O(k), where ‘k’ is the size of the character set (e.g., k=26 for English, k=128 for ASCII, k=… for Unicode).

If ‘k’ is a fixed constant, we can simplify O(k) to just O(1), meaning the map takes constant space. So, is the space complexity O(n) (for the cleaned strings) or O(1) (for the map)? In complexity analysis, we typically refer to the “auxiliary” space, or the extra space beyond the input. The map is the primary auxiliary data structure. If we assume a fixed character set, the space complexity is O(1) for the map, which is a major improvement over the O(n) space required for the sorting method’s character arrays.

When to Choose HashMap Over Sorting

So, when should you use this method? The HashMap approach is algorithmically superior in terms of time complexity (O(n) vs O(nlogn)). Therefore, in any situation where performance is critical and you are dealing with very large strings, the HashMap method is the clear winner. It will outperform the sorting method significantly as the input size grows. This makes it a preferred solution in technical interviews and competitive programming, as it demonstrates a deeper understanding of data structures and algorithms.

The trade-off is a slight increase in code complexity. The sorting method’s logic is arguably more straightforward for a beginner to understand. The HashMap approach, with its two loops and getOrDefault logic, requires a solid grasp of how Map objects work. However, this complexity is minimal, and the performance gains are substantial. For most modern Java development, the HashMap method is the preferred, more efficient, and more professional solution to the anagram problem.

Variations in Frequency Counting

The single HashMap approach, where we add frequencies from the first string and subtract them for the second, is a highly efficient and elegant solution. However, it is not the only way to solve the anagram problem using frequency counting. There are other valid techniques, each with its own set of trade-offs. Exploring these alternatives is valuable for a Java developer, as different scenarios may call for different solutions.

One popular alternative involves using two separate HashMap objects. Another, which is a significant optimization for specific use cases, ditches the HashMap entirely and uses a simple array. This section will provide a deep dive into these two alternative methods. We will explore the “two-map” comparison method and the “array as a frequency map” method, complete with code examples, explanations, and performance analysis. Understanding these variations will give you a more complete and flexible toolkit for solving this and similar problems.

The Two-Map Comparison Method

This approach is conceptually very simple, which is one of its main advantages. Instead of using one map and a complex add/subtract logic, this method builds two separate frequency maps. The first map, frequencyMapA, is built by iterating through the first normalized string and counting its characters. The second map, frequencyMapB, is built by iterating through the second normalized string and counting its characters.

After both maps are fully populated, the problem is reduced to a single question: are these two maps identical? If the two strings are anagrams, they will have the exact same character frequencies, and therefore the two HashMap objects that store these frequencies should be perfectly equal. Fortunately, the Map interface in Java comes with a built-in .equals() method that does exactly this. This makes the final step of the algorithm a simple, one-line comparison.

Code Walkthrough: Comparing Two HashMaps

Here is a full Java program that implements the two-map comparison method. It reuses the same main method for testing but provides a new areAnagrams implementation. A helper method, buildFrequencyMap, is also created to keep the code clean and avoid repetition, which is a good programming practice.

Java

import java.util.HashMap;

import java.util.Map;

 

public class AnagramTwoMapExample {

 

    /**

     * Builds a frequency map for a given string.

     * @param str The string to analyze.

     * @return A Map with characters as keys and their frequencies as values.

     */

    private static Map<Character, Integer> buildFrequencyMap(String str) {

        Map<Character, Integer> frequencyMap = new HashMap<>();

        // Populate frequency map

        for (char c : str.toCharArray()) {

            frequencyMap.put(c, frequencyMap.getOrDefault(c, 0) + 1);

        }

        return frequencyMap;

    }

 

    /**

     * Checks if two strings are anagrams using two HashMaps.

     * @param str1 The first string.

     * @param str2 The second string.

     * @return true if the strings are anagrams, false otherwise.

     */

    public static boolean areAnagrams(String str1, String str2) {

        // Step 1: Normalize strings

        String cleanStr1 = str1.replaceAll(“[^a-zA-Z]”, “”).toLowerCase();

        String cleanStr2 = str2.replaceAll(“[^a-zA-Z]”, “”).toLowerCase();

 

        // Step 2: Perform the length check optimization

        if (cleanStr1.length() != cleanStr2.length()) {

            return false;

        }

 

        // Step 3: Create frequency maps for each string

        Map<Character, Integer> frequencyMap1 = buildFrequencyMap(cleanStr1);

        Map<Character, Integer> frequencyMap2 = buildFrequencyMap(cleanStr2);

 

        // Step 4: Compare the two frequency maps

        return frequencyMap1.equals(frequencyMap2);

    }

    

    /**

     * Main method to test the areAnagrams function.

     */

    public static void main(String[] args) {

        String word1 = “Listen”;

        String word2 = “Silent”;

        String word3 = “Hello”;

        String word4 = “World”;

 

        System.out.println(“‘” + word1 + “‘ and ‘” + word2 + “‘ are anagrams: ” 

                            + areAnagrams(word1, word2));

        System.out.println(“‘” + word3 + “‘ and ‘” + word4 + “‘ are anagrams: ” 

                            + areAnagrams(word3, word4));

    }

}

 

Understanding the equals Method for HashMaps

The logic in the code is straightforward. We normalize and check lengths as always. Then, we call our buildFrequencyMap helper method twice to create frequencyMap1 and frequencyMap2. The real magic happens in the final line: return frequencyMap1.equals(frequencyMap2);. The equals() method for a HashMap is very specific. It returns true if and only if the other object is also a Map, both maps have the same size, and both maps contain the same set of key-value pairs.

This means it does not just check if the keys are the same; it also checks if the value associated with each key is the same. For our anagram problem, this is perfect. If “listen” creates a map {‘l’:1, ‘i’:1, …} and “silent” creates an identical map, equals() will return true. If “hello” and “helllo” are compared, their maps ({‘h’:1, ‘e’:1, ‘l’:2, ‘o’:1} and {‘h’:1, ‘e’:1, ‘l’:3, ‘o’:1}) will not be equal, and the method will correctly return false.

The time complexity of this two-map approach is still O(n). Building the first map is O(n), and building the second map is also O(n). The Map.equals() comparison, in the worst case, must iterate through all ‘k’ entries in the first map and check them against the second map. This takes O(k) time, where ‘k’ is the number of unique characters. Since k is less than or equal to n, the total time is O(n)+O(n)+O(k), which simplifies to O(n). The space complexity is O(k) for each map, for a total of O(k).

The Array as a Frequency Map: An Optimization

The HashMap approach is fantastic, but it does have some overhead. It uses Character and Integer objects, which creates memory overhead compared to primitive types. The hashing function, while fast, is still a computation. If we know, for a fact, that our input strings will only contain the 26 lowercase English letters, we can use a much simpler and faster data structure: a basic array.

We can create an integer array of size 26, int[] frequencyMap = new int[26];. Each index in this array will correspond to a letter of the alphabet. Index 0 will represent ‘a’, index 1 will represent ‘b’, index 2 will represent ‘c’, and so on. As we iterate through our strings, we can use a simple character arithmetic trick to find the correct index: int index = character – ‘a’;. For example, if the character is ‘c’, the calculation ‘c’ – ‘a’ gives us 2, which is the correct index for ‘c’.

This array-based approach is lightning fast. Accessing or updating an array index is a true O(1) operation, generally faster than a HashMap’s put/get. It also uses significantly less memory, as it is just a simple array of 26 primitive integers, with no object overhead. This is the most optimal solution, if the problem constraints allow for it.

Implementing an Array-Based Frequency Counter

The logic for an array-based counter is very similar to the single-map method. We first create our int[26] array, which is automatically initialized to all zeros. Then, we loop through the first string. For each character, we calculate its index (int index = c – ‘a’;) and then increment the count at that index (frequencyMap[index]++;).

After the first loop, the array holds the frequency count for the first string. Next, we loop through the second string. For each character, we calculate its index and decrement the count (frequencyMap[index]–;). Just as with the single-map method, we must add a check. If, after decrementing, the count at that index becomes negative (if (frequencyMap[index] < 0)), it means the second string has more of that character than the first. We can immediately return false.

If the second loop completes without any count ever going below zero, and we have already passed our initial length check, we are guaranteed that the strings are anagrams. We can return true. This method is extremely efficient.

Code Walkthrough: Array-Based Solution

Here is the code for the array-based frequency counter. Note that for this to work, our normalization step must be precise. We must only be left with lowercase letters ‘a’ through ‘z’.

Java

public class AnagramArrayExample {

 

    /**

     * Checks if two strings are anagrams using an array as a frequency map.

     * Assumes strings only contain lowercase English letters after normalization.

     * @param str1 The first string.

     * @param str2 The second string.

     * @return true if the strings are anagrams, false otherwise.

     */

    public static boolean areAnagrams(String str1, String str2) {

        // Step 1: Normalize strings (ONLY lowercase letters a-z)

        String cleanStr1 = str1.replaceAll(“[^a-zA-Z]”, “”).toLowerCase();

        String cleanStr2 = str2.replaceAll(“[^a-zA-Z]”, “”).toLowerCase();

 

        // Step 2: Perform the length check optimization

        if (cleanStr1.length() != cleanStr2.length()) {

            return false;

        }

 

        // Step 3: Create the frequency map (array of size 26)

        int[] frequencyMap = new int[26]; // 26 letters in English alphabet

 

        // Step 4: Populate the map with characters from the first string

        for (char c : cleanStr1.toCharArray()) {

            int index = c – ‘a’;

            frequencyMap[index]++;

        }

 

        // Step 5: Decrement counts using characters from the second string

        for (char c : cleanStr2.toCharArray()) {

            int index = c – ‘a’;

            frequencyMap[index]–;

 

            // If a count goes below zero, they are not anagrams

            if (frequencyMap[index] < 0) {

                return false;

            }

        }

 

        // Step 6: If the loop finishes, they are anagrams

        return true;

    }

 

    /**

     * Main method to test the areAnagrams function.

     */

    public static void main(String[] args) {

        String word1 = “Listen”;

        String word2 = “Silent”;

        String word3 = “Hello”;

        String word4 = “World”;

 

        System.out.println(“‘” + word1 + “‘ and ‘” + word2 + “‘ are anagrams: ” 

                            + areAnagrams(word1, word2));

        System.out.println(“‘” + word3 + “‘ and ‘” + word4 + “‘ are anagrams: ” 

                            + areAnagrams(word3, word4));

    }

}

 

Handling Unicode and Extended Character Sets

The array-based approach has one major weakness: it assumes a small, fixed character set. Our int[26] array works perfectly for English. But what if the strings are “crème” and “crêem”? Our regex [^a-zA-Z] would strip out the ‘è’ and ‘ê’, and the program would incorrectly report them as anagrams (“crme” and “crem”). Or what if the input is in a different language, like Russian or Japanese? Our array of 26 is completely useless.

This is the primary limitation of the array-based approach. We could expand the array. For example, we could use an array of size 128 to handle the full standard ASCII character set. We could even use an array of size 65,536 to handle a large portion of the Unicode character set, but this becomes very memory-inefficient. Most of that giant array would be empty (all zeros).

Performance Comparison: Array vs. HashMap

In terms of time complexity, the array-based method is still O(n). Normalization is O(n), the first loop is O(n), and the second loop is O(n). The total is O(n). However, the constant factor is much lower. The array index operations are faster than the HashMap’s put/get operations. So, while both are O(n), the array method will be faster in practice.

In terms of space complexity, the array method is a clear winner. It uses O(1) space. The int[26] (or int[128]) is a fixed-size data structure. Its size does not depend on the length ‘n’ of the input string. This is a significant improvement over the sorting method’s O(n) space and even the HashMap’s O(k) or O(n) space, as it avoids object overhead completely.

Limitations and When to Use Each Method

Here is a summary of our three methods and when to use them. The Sorting Method (O(nlogn) time, O(n) space) is the most intuitive and easy to read. It is perfectly fine for non-critical applications or small strings. Its main downside is its slower time complexity.

The HashMap Method (O(n) time, O(k) space) is the most flexible and robust. It handles all character sets, including full Unicode, without any modification. It is highly performant and is the best “general purpose” solution. Its only downside is the minor overhead of using a HashMap object.

The Array Method (O(n) time, O(1) space) is the fastest and most memory-efficient solution. It is the absolute best choice in performance-critical scenarios if and only if you can guarantee that the input strings are limited to a small, known character set like ASCII or just lowercase English. Its major limitation is its inflexibility in handling other languages or Unicode characters.

Bringing Anagram Detection to the User

So far, we have written our anagram logic in a self-contained way, using “hardcoded” strings in our main method to test it. This is excellent for development and testing, but it is not a useful application. To make our program practical, we need to allow a user to enter their own strings and get a result in real-time. This requires us to learn how to handle standard input from the console.

In Java, the primary tool for reading user input from the console is the Scanner class. This class is part of the java.util package and provides many convenient methods for parsing user input into different data types, such as integers, doubles, and, most importantly for us, strings. By integrating the Scanner class with our existing areAnagrams method, we can build a complete, interactive program.

This section will guide you through the process of creating this interactive application. We will introduce the Scanner class, explain how to set it up, show the full program code, and discuss best practices, such as “consuming” the newline character and properly closing the scanner to prevent resource leaks. This will bridge the gap from a simple algorithm to a functional Java application.

Introduction to the java.util.Scanner Class

The Scanner class was introduced in Java 5 to provide a simple way to parse primitive types and strings from a variety of input sources. It can read from a File, an InputStream, or a String. For our purposes, we will attach it to the standard input stream, System.in. System.in is an InputStream that is connected to the keyboard by default. By wrapping System.in with a Scanner, we gain access to its helpful methods.

The Scanner breaks its input into “tokens” using a delimiter pattern. By default, the delimiter is whitespace. This means it is very good at reading individual words. The method scanner.next() will read and return the next token (word) from the input. However, for our anagram checker, we want to allow for entire phrases, like “I’m a dot in place.” If we use scanner.next(), it will only read the first word, “I’m”.

To solve this, we must use the scanner.nextLine() method. This method reads the entire line of input, from the current position until the user presses the Enter key (the newline character). This allows us to capture complete phrases that include spaces, making our program much more robust and useful for testing complex anagrams.

Setting Up the Scanner for User Input

Using the Scanner is a straightforward, two-step process. First, you must import it into your Java file, as it is not part of the default java.lang package. This is done with the line import java.util.Scanner; at the very top of your file. Second, you must create an instance of the Scanner class in your main method, telling it to read from System.in. This is done with the line Scanner scanner = new Scanner(System.in);.

Once this scanner object is created, we can use it to prompt the user and read their input. A good practice is to print a clear prompt to the console using System.out.print() so the user knows what they are supposed to do. For example, System.out.print(“Enter the first string: “);. After this line, we call String word1 = scanner.nextLine(); to capture their input. We then repeat this process for the second string.

Full Program: Anagram Checker with User Input

Here is the full program. We will use the efficient HashMap-based areAnagrams method from Part 3, as it is the best general-purpose solution. The main method has been completely rewritten to incorporate the Scanner for interactive input.

Java

import java.util.HashMap;

import java.util.Map;

import java.util.Scanner; // Import the Scanner class

 

public class AnagramInteractive {

 

    /**

     * Checks if two strings are anagrams using a single HashMap.

     * @param str1 The first string.

     * @param str2 The second string.

     * @return true if the strings are anagrams, false otherwise.

     */

    public static boolean areAnagrams(String str1, String str2) {

        // Step 1: Normalize strings

        String cleanStr1 = str1.replaceAll(“[^a-zA-Z]”, “”).toLowerCase();

        String cleanStr2 = str2.replaceAll(“[^a-zA-Z]”, “”).toLowerCase();

 

        // Step 2: Perform the length check optimization

        if (cleanStr1.length() != cleanStr2.length()) {

            return false;

        }

 

        // Step 3: Create the frequency map

        Map<Character, Integer> frequencyMap = new HashMap<>();

 

        // Step 4: Populate the map with characters from the first string

        for (char c : cleanStr1.toCharArray()) {

            frequencyMap.put(c, frequencyMap.getOrDefault(c, 0) + 1);

        }

 

        // Step 5: Decrement counts using characters from the second string

        for (char c : cleanStr2.toCharArray()) {

            int count = frequencyMap.getOrDefault(c, 0);

            if (count == 0) {

                return false;

            }

            frequencyMap.put(c, count – 1);

        }

 

        // Step 6: If the loop finishes, they are anagrams

        return true;

    }

 

    /**

     * Main method to interactively check for anagrams.

     */

    public static void main(String[] args) {

        // Step 1: Create a Scanner object to take user input

        Scanner scanner = new Scanner(System.in);

 

        // Step 2: Prompt for and read the first string

        System.out.print(“Enter the first string: “);

        String word1 = scanner.nextLine();

 

        // Step 3: Prompt for and read the second string

        System.out.print(“Enter the second string: “);

        String word2 = scanner.nextLine();

 

        // Step 4: Close the scanner to avoid resource leaks

        scanner.close();

 

        // Step 5: Check if the input strings are anagrams

        if (areAnagrams(word1, word2)) {

            System.out.println(“Result: ‘” + word1 + “‘ and ‘” + word2 + “‘ ARE anagrams.”);

        } else {

            System.out.println(“Result: ‘” + word1 + “‘ and ‘” + word2 + “‘ are NOT anagrams.”);

        }

    }

}

 

Code Walkthrough: Integrating Scanner

Let’s look closely at the new main method. The first line, Scanner scanner = new Scanner(System.in);, creates our Scanner object. The next two lines, System.out.print(…) and String word1 = scanner.nextLine();, work together. The print statement displays a prompt for the user. The program then pauses and waits at the nextLine() call. The user types their input and presses Enter. The nextLine() method captures everything they typed and stores it as a String in the word1 variable.

This process is repeated exactly for word2. At this point, we have two String variables, word1 and word2, that contain the user’s input. The line scanner.close(); is a crucial best practice, which we will discuss next. Finally, the line if (areAnagrams(word1, word2)) simply passes the user-provided strings to the same areAnagrams method we have already built and tested. The logic inside areAnagrams does not need to change at all. It is perfectly decoupled from the input source, which is a hallmark of good program design.

Closing the Scanner: A Best Practice

The line scanner.close(); is very important. The Scanner is a resource that “wraps” an underlying input stream (System.in). In Java, it is critical to close any resources you open, such as files, network connections, or streams. If you do not close them, you can create a “resource leak.” This means the resource remains open in the background, consuming system memory and resources, even after your program is done with it. In a small program like this, it might not cause a problem. But in a large, long-running server application, resource leaks can accumulate and eventually crash the entire system.

By calling scanner.close(), we are telling the Scanner that we are finished with it. It, in turn, closes the underlying System.in stream. It is a best practice to close the resource as soon as you are done with it. In our program, this is right after we have received both word1 and word2 and before we do our final processing. Getting into this habit will make you a better, more reliable Java developer.

A more modern and even safer way to handle this is with a “try-with-resources” block, which automatically closes the resource. We could have written our main method as: try (Scanner scanner = new Scanner(System.in)) { … our code … }. When the try block finishes, the scanner would be closed automatically. For a simple script like this, scanner.close() is perfectly clear.

Basic Input Validation and Error Handling

Our current program is functional, but it is not very robust. What happens if the user just presses Enter at one of the prompts? They will have entered an empty string. Our areAnagrams method will receive “” and “”. The normalization step will result in two empty strings. The length check (0 != 0) will pass (false). The loops will not run. The method will return true. The program will print that “” and “” are anagrams. This is technically correct but not very helpful.

A more robust program would add some basic input validation. After getting word1 and word2, we could check if they are empty or null. For example: if (word1 == null || word1.trim().isEmpty() || word2 == null || word2.trim().isEmpty()) { System.out.println(“Error: Both strings must contain text.”); } else { … run the check … }. The .trim() method removes any leading or trailing whitespace, and .isEmpty() checks if the resulting string has a length of 0.

This is a simple form of “defensive programming.” We are anticipating potential “bad” user input and handling it gracefully instead of letting it produce a strange result. For a real-world application, this validation would be much more extensive, but adding a simple empty-string check makes our program significantly more user-friendly and professional.

Creating a Reusable Anagram Utility Class

As our program grows, we might find that we want to check for anagrams in different parts of our application. Right now, our areAnagrams method is static and lives inside the AnagramInteractive class, which also has the main method. A cleaner design, especially for larger projects, is to create a dedicated “utility class” for our anagram logic.

We could create a new file called AnagramUtils.java. This class would not have a main method. It might look like this:

Java

public class AnagramUtils {

    // Make the constructor private so it cannot be instantiated

    private AnagramUtils() {}

 

    // Our public, static utility method

    public static boolean areAnagrams(String str1, String str2) {

        // … all the HashMap logic …

    }

}

 

By making the constructor private, we prevent anyone from creating an new AnagramUtils() object. This is a common pattern for utility classes that only contain static methods.

Then, in our AnagramInteractive class, we would simply call the method from our utility class: if (AnagramUtils.areAnagrams(word1, word2)) { … }. This is a clean separation of concerns. The AnagramInteractive class is responsible for input and output (talking to the user). The AnagramUtils class is responsible for business logic (the algorithm itself). This makes our code more modular, easier to test, and easier to reuse in the future.

Beyond Two Strings: Advanced Anagram Challenges

We have now thoroughly mastered the problem of checking if two strings are anagrams of each other. We have explored three different algorithms, analyzed their performance, and even built an interactive application. However, in the world of computer science and technical interviews, this is often just the starting point. The anagram concept can be extended to solve more complex and interesting problems.

Two of the most common advanced anagram challenges are: “Given a list of words, group all the words that are anagrams of each other,” and “Given a word, find all possible anagrams (permutations) of that word.” These problems require us to build upon our existing knowledge and apply it in a new way. They test not only our understanding of the core concept but also our ability to use data structures like HashMap and List to manage collections of data. This section will dive deep into these advanced problems and their solutions.

Common Problem: Grouping Anagrams from a List

This is a classic programming challenge. You are given an array or a List of strings, for example: [“eat”, “tea”, “tan”, “ate”, “nat”, “bat”]. The task is to write a method that returns a List of Lists, where each inner list contains a group of words that are anagrams of each other. For the example input, the correct output would be [[“eat”, “tea”, “ate”], [“tan”, “nat”], [“bat”]]. The order of the groups or the words within the groups does not matter.

To solve this, we need a way to “tag” or “label” all anagrams with the same identifier. How can we find a unique key for all the words in the “eat”, “tea”, “ate” group? The answer lies in our sorting method from Part 2. If we sort the characters of “eat”, we get “aet”. If we sort “tea”, we get “aet”. If we sort “ate”, we get “aet”. This sorted string, “aet”, is a perfect “canonical key” for this entire anagram group.

The Solution: Using a HashMap with a Canonical Key

The solution to the grouping problem becomes clear. We can use a HashMap. The key of our map will be the canonical, sorted string (e.g., “aet”). The value of our map will be a List<String> that holds all the original words that map to that key.

We will iterate through our input list of words one by one. For each word, we will:

  1. Convert it to a character array.
  2. Sort the character array.
  3. Convert the sorted character array back into a new string. This is our canonical key.
  4. We will then check our HashMap. If this key does not exist yet, we will create a new, empty List<String> and put it in the map with that key.
  5. We will then get the List<String> associated with that key and add the original word to it.

After iterating through all the words, our HashMap will be perfectly structured. The keys will be the unique sorted forms (“aet”, “ant”, “abt”), and the values will be the lists of original words [“eat”, “tea”, “ate”], [“tan”, “nat”], and [“bat”]. Finally, we can just return a new List containing all the values from the map.

Step-by-Step: Implementing an Anagram Grouper

Let’s refine the steps. Our method will accept a List<String> called words and will return a List<List<String>>.

  1. Create the result map: Map<String, List<String>> anagramGroups = new HashMap<>();.
  2. Start a loop: for (String word : words) { … }.
  3. Inside the loop, create the key: char[] charArray = word.toCharArray();.
  4. Sort the key: Arrays.sort(charArray);.
  5. Convert back to string: String sortedKey = new String(charArray);.
  6. Check if the key is in the map. The computeIfAbsent method is perfect for this: anagramGroups.computeIfAbsent(sortedKey, k -> new ArrayList<>()).add(word);.
  7. This single line does steps 4 and 5 from the previous section. It says: “Look for sortedKey. If it’s not there (absent), run this lambda function (k -> new ArrayList<>()) to create a new, empty list and put it there. Then, return the list (either the one that was just created or the one that was already there). Finally, .add(word) to that list.”
  8. After the loop finishes, our map is full.
  9. Return the map’s values: return new ArrayList<>(anagramGroups.values());.

Final Thoughts

From a simple definition to complex grouping algorithms, the anagram problem is a rich and valuable subject for any Java developer. We have seen how a single problem can be solved in multiple ways, each with distinct trade-offs in performance, space, and readability. We explored the simple and intuitive Arrays.sort method, the flexible and robust HashMap method, and the lightning-fast int[] array method.

We also saw how to make our code practical by using Scanner to accept user input and how to structure our program into reusable utility classes. Finally, we expanded the core concept to solve advanced challenges like grouping anagrams and generating permutations. Mastering these concepts provides you with more than just the solution to one problem. It equips you with a powerful set of tools and a way of thinking algorithmically that will benefit you throughout your entire programming career.