Finally Diving into DS & A


I’ve been working through the Structy.net course recently to gain a deeper understanding of Data Structures & Algorithms (as the aspiring Software Engineer does). The course is great, and Alvin is a terrific teacher! The lessons are slow, methodical, and visual, which I appreciate. I’m also already finding myself thinking of ways to implement some of the concepts I’m learning into my code at work. Pretty neat.

The reason I felt compelled to write this post was some interesting behavior in the execution times I observed while solving the most_frequent_char problem. Specifically, I noticed that Python’s collections.Counter was slightly faster than my custom code for counting letter frequencies. This, of course, led me down a rabbit hole, including a Reddit thread that highlighted this observation. CPython, the reference implementation of Python, is written in C, and when you use something like collections.Counter you can see a slightly speedier result than when writing out a counting dict yourself. There is a bit of back and forth about what Counter uses under the hood, but it appears to call the C helper function _count_elements directly. And as the user tiedyedvortex commented in that Reddit post:

reddit-person-says-use-counter.png

Disclaimer 🚨: When I say speedier, we are talking 0.02s compared to 0.07s from my contrived testing in a small domain. This isn’t even noticeable in most cases, and I’m purely interested in the min-max mentality of knowing what’s under the hood when Python is running. Turns out it’s C (which I didn’t know). This is purely for discovery, not silly over-optimization. It’s just interesting!
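If you want to poke at this yourself, on CPython the C-accelerated helper is importable from the private _collections module. This is an implementation detail (other Python implementations may not have it), so treat this as a curiosity, not something to rely on in real code:

python
from collections import Counter

# Implementation detail (CPython): Counter.update tries to import a
# C-accelerated helper, _count_elements, and silently falls back to a
# pure-Python loop if the C module isn't available.
from _collections import _count_elements

# The helper does the raw tallying that Counter builds on: it takes a
# mapping and an iterable, and bumps a count for each element.
counts = {}
_count_elements(counts, "banana")
print(counts)  # {'b': 1, 'a': 3, 'n': 2}
print(Counter("banana")["a"])  # 3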

Slow(er) Custom Code

Here was my initial implementation of the most_frequent_char function:

python
def most_frequent_char(s):
    count = {}
    for char in s:
        if char not in count:
            count[char] = 0
        count[char] += 1

    most = None
    for char in s:
        if most is None or count[char] > count[most]:
            most = char

    return most

The function works in two passes over the string:

  1. Creating a dictionary to count the frequency of each character in the string s.
  2. Iterating through the string again to find the character with the highest count.

While this implementation is fine, when benchmarked it's slower than using Python’s built-in collections.Counter, and I’ll get into more detail below.
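Before reaching for Counter, the custom version can already be tightened a bit with dict.get and max. This is the same two-pass idea, just with fewer explicit branches (note it would raise ValueError on an empty string, where the original returns None):

python
def most_frequent_char(s):
    # Tally characters; get() supplies a default of 0 for unseen keys.
    count = {}
    for char in s:
        count[char] = count.get(char, 0) + 1

    # max with key=count.get returns the character with the highest tally.
    # Dicts preserve insertion order, so ties resolve to the character
    # seen first, matching the original implementation.
    return max(count, key=count.get)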

Using collections.Counter


In addition to an execution speed boost, using collections.Counter simplifies the code as well. Because collections.Counter is a subclass of Python’s built-in dict and is specifically designed for counting hashable objects, it’s a super clean and efficient way to count elements in an iterable, such as a string.

python
from collections import Counter

def most_frequent_char(s):
    return Counter(s).most_common(1)[0][0]
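For context, most_common(n) returns a list of (element, count) pairs ordered by count, which is why the [0][0] indexing pulls out the winning character:

python
from collections import Counter

c = Counter("mississippi")
# Pairs come back sorted by count, descending; ties keep first-seen order.
print(c.most_common(2))        # [('i', 4), ('s', 4)]
print(c.most_common(1)[0][0])  # 'i'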

Testing collections.Counter

Here’s where we can actually set up a test to see what all the hype around saving milliseconds is about lol. Below is a basic test showing some of the different ways collections.Counter takes first place against a manual hash map.

Most Frequent Char

python
from collections import Counter
import time
import random
import string

def custom_most_frequent_char(s):
    count = {}
    for char in s:
        if char not in count:
            count[char] = 0
        count[char] += 1

    most = None
    for char in s:
        if most is None or count[char] > count[most]:
            most = char

    return most

def counter_most_frequent_char(s):
    return Counter(s).most_common(1)[0][0]

def run_test(test_name, input_string):
    print(f"\n{test_name}")

    start = time.time()
    result_custom = custom_most_frequent_char(input_string)
    custom_time = time.time() - start
    print(f"Custom implementation: Result = {result_custom}, Time = {custom_time:.8f} seconds")

    start = time.time()
    result_counter = counter_most_frequent_char(input_string)
    counter_time = time.time() - start
    print(f"Counter implementation: Result = {result_counter}, Time = {counter_time:.8f} seconds")

# Test Case 1: Uniform distribution (random characters)
random.seed(42)
uniform_string = ''.join(random.choices(string.ascii_lowercase, k=1000000))
run_test("Test 1: Uniform distribution (random lowercase letters)", uniform_string)

# Test Case 2: Skewed distribution (mostly 'a' with some other characters)
skewed_string = 'a' * 900000 + ''.join(random.choices(string.ascii_lowercase, k=100000))
run_test("Test 2: Skewed distribution (90% 'a')", skewed_string)

# Test Case 3: Single character (all 'x')
single_char_string = 'x' * 1000000
run_test("Test 3: Single character (all 'x')", single_char_string)

# Test Case 4: Mixed case with spaces (simulating text)
mixed_string = ''.join(random.choices(string.ascii_letters + ' ', k=1000000))
run_test("Test 4: Mixed case with spaces", mixed_string)

Output

Mileage may vary. Here’s what I got when running on my M1:

❯ python py/most_freq_char.py

Test 1: Uniform distribution (random lowercase letters)
Custom implementation: Result = q, Time = 0.07745409 seconds
Counter implementation: Result = q, Time = 0.03135109 seconds

Test 2: Skewed distribution (90% 'a')
Custom implementation: Result = a, Time = 0.07405376 seconds
Counter implementation: Result = a, Time = 0.02743888 seconds

Test 3: Single character (all 'x')
Custom implementation: Result = x, Time = 0.07211709 seconds
Counter implementation: Result = x, Time = 0.02716064 seconds

Test 4: Mixed case with spaces
Custom implementation: Result = P, Time = 0.08189106 seconds
Counter implementation: Result = P, Time = 0.03045082 seconds

And there you have it. It is faster, but again, this is not something that's going to make or break your code. It's cool to see the actual numbers, but using collections.Counter is more about the cleaner solution it provides rather than the speed boost!
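One caveat on the methodology: single time.time() measurements are noisy. The standard library’s timeit module runs a snippet many times and is a more reliable way to compare. Here’s a hedged sketch, reusing the same two implementations from above on a smaller input:

python
import timeit
import random
import string
from collections import Counter

def custom_most_frequent_char(s):
    count = {}
    for char in s:
        if char not in count:
            count[char] = 0
        count[char] += 1
    most = None
    for char in s:
        if most is None or count[char] > count[most]:
            most = char
    return most

def counter_most_frequent_char(s):
    return Counter(s).most_common(1)[0][0]

random.seed(42)
s = ''.join(random.choices(string.ascii_lowercase, k=100_000))

# timeit returns the total seconds for `number` runs; smaller is faster.
custom_secs = timeit.timeit(lambda: custom_most_frequent_char(s), number=10)
counter_secs = timeit.timeit(lambda: counter_most_frequent_char(s), number=10)
print(f"custom:  {custom_secs:.4f}s")
print(f"counter: {counter_secs:.4f}s")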

Wrapping Up


| Aspect | Custom Implementation | collections.Counter |
| --- | --- | --- |
| Language | Python (interpreted) | C (compiled) |
| Counting Method | Manual loop with dictionary updates | Optimized _count_elements C function |
| Method Lookups | Repeated Python-level lookups (e.g., `in`, item access) | Minimal, handled in C |
| Code Complexity | More lines, explicit loops | Concise, single method call |

The performance difference between collections.Counter and my custom implementation comes down to how Counter is implemented under the hood. A lot of the heavy lifting has already been done in Python to accommodate developer needs when it comes to these kinds of data structures. I would say this was a worthwhile detour. I learned a bit more about Python, including that its reference implementation, CPython, is written in C. As I keep progressing in my DS&A journey, I’m even more excited to keep exploring Python’s standard library and community veterans’ insights to write better, more-informed code.

Remember to commit early and often, see ya next time! 🏁