Longest Substring Without Repeating Characters

Photo by Trent Erwin on Unsplash

Longest Substring Without Repeating Characters

Leetcode problem - No.3

Given a string s, find the length of the longest substring without repeating characters.

Example 1:

Input: s = "abcabcbb"
Output: 3
Explanation: The answer is "abc", with the length of 3.

Example 2:

Input: s = "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.

Example 3:

Input: s = "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3.
Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.

Constraints:

  • 0 <= s.length <= 5 * 10<sup>4</sup>

  • s consists of English letters, digits, symbols and spaces.

Solution

This code solves the problem using a sliding window approach with two pointers: leftPointer and rightPointer.

public class Solution {
    public int LengthOfLongestSubstring(string s) {
        HashSet<char> charSet = new HashSet<char>();
        int leftPointer = 0;
        int rightPointer = 0;
        int maxLength = 0;

        while (rightPointer < s.Length)
        {
            if (!charSet.Contains(s[rightPointer]))
            {
                charSet.Add(s[rightPointer]);
                maxLength = Math.Max(maxLength, rightPointer - leftPointer + 1);
                rightPointer++;
            }
            else
            {
                charSet.Remove(s[leftPointer]);
                leftPointer++;
            }
        }
        return maxLength;
    }
}

Code Breakdown:

  1. Initialization:

    • HashSet<char> charSet: This set stores unique characters encountered so far in the current substring. (Explanation for HashSet could be found below)

    • leftPointer: Starts at the beginning of the string (leftPointer = 0).

    • rightPointer: Starts at the beginning of the string (rightPointer = 0).

    • maxLength: Stores the length of the longest substring found so far, initialized to 0.

  2. Loop:

    • While rightPointer is within the string's boundaries (rightPointer < s.Length):

      • Case 1: No duplicate character:

        • If the current character (s[rightPointer]) is not present in the charSet:

          • Add the character to the charSet.

          • Update maxLength if the current substring length (rightPointer - leftPointer + 1) is greater than the previous maximum.

          • Move rightPointer forward to explore the next character.

      • Case 2: Duplicate character:

        • If the current character (s[rightPointer]) is already in the charSet:

          • Remove the character from the charSet, shrinking the current substring.

          • Move leftPointer forward to exclude the removed character.

  3. Return:

    • After the loop completes, maxLength holds the length of the longest substring without repeating characters.

    • Return maxLength.

Example:

Consider the string abcabcbb.

  • Initially, maxLength = 0, leftPointer = 0, rightPointer = 0, and charSet = {}.

  • As rightPointer moves, charSet keeps track of unique characters:

    • rightPointer = 1: charSet = {a} (update maxLength to 1)

    • rightPointer = 2: charSet = {a, b} (update maxLength to 2)

    • rightPointer = 3: charSet = {a, b, c} (update maxLength to 3)

  • However, at rightPointer = 4, c is already in charSet:

    • charSet = {a, b} again

    • leftPointer moves to 1, excluding the first c

  • The process continues, updating maxLength and adjusting charSet as needed.

  • Finally, maxLength = 3 is returned, representing the substring "abc".

Key Points:

  • The charSet helps keep track of unique characters within the current sliding window.

  • The sliding window expands by adding new characters and shrinks by removing duplicates.

  • The maxLength variable is updated whenever a longer substring is found.

Explanation about HashSet

In computer science, a hash set is a data structure that stores a collection of items. Unlike regular sets, it uses a hashing function to efficiently check if an item is already present in the set. Here's a breakdown of how it works:

How it works:

  1. Hashing: Each item in the set is associated with a unique key called a hash. The hash is generated by a hashing function, which is a special algorithm that takes an item as input and produces a fixed-length output (the hash). Ideally, different items should have different hashes.

  2. Storage: Items are stored in an array based on their hashes. This array is often called a hash table. Collisions can occur when different items have the same hash (think of multiple people having the same birthday). Collision resolution techniques are used to address this issue.

  3. Membership testing: To check if an item is present in the set, the hashing function is used to calculate its hash. The item is then compared to the item stored at the corresponding location in the hash table. If they match, the item is present. Otherwise, it's not.

Advantages of hash sets:

  • Fast membership testing: Finding if an item is present in the set is very fast, typically taking constant time (independent of the set size) in the average case. This makes them ideal for tasks like checking if an email address is already registered or if a user has already seen a particular ad.

  • No order preservation: Hash sets don't maintain the order in which items are added, which can be beneficial when only membership testing is required and order doesn't matter.

Disadvantages of hash sets:

  • Duplicates not allowed: Since each item has a unique hash, hash sets inherently don't allow duplicate items.

  • Order not preserved: If the order of items is important, a hash set is not the right choice.

The HashSet<char> in the solution for finding the longest substring without repeating characters utilizes these properties to efficiently determine if a character has already been encountered within the current substring. This helps identify and shrink duplicate characters, ultimately leading to the longest substring discovery.

Examples of usage:

Hash sets are widely used in various applications:

  • Checking website visitor uniqueness

  • Managing user accounts (unique usernames)

  • Storing unique words in a text document

  • Implementing caching mechanisms