Finally Diving into DS & A
I’ve been working through the Structy.net course recently to gain a deeper understanding of Data Structures & Algorithms (as the aspiring Software Engineer does). The course is great, and Alvin is a terrific teacher! The lessons are slow, methodical, and visual, which I appreciate. I’m also already finding myself thinking of ways to apply some of the concepts I’m learning to my code at work. Pretty neat.
The reason I felt compelled to write this post was some interesting behavior I observed in execution times while solving the `most_frequent_char` problem. Specifically, I noticed that Python’s `collections.Counter` was slightly faster than my custom code for counting letter frequencies. This, of course, led me down a rabbit hole, including a Reddit thread that highlighted the same observation. CPython (the reference implementation of Python) is written in C, and when you use something like `collections.Counter` you can see a slightly speedier result than when writing out a counting hash map yourself. There is a bit of back and forth about what `Counter` is using under the hood, but it appears to call the C function `_count_elements` directly
, which is also what the user tiedyedvortex suggested in that Reddit thread.

Disclaimer 🚨: When I say speedier, we are talking 0.02s compared to 0.07s from my contrived testing in a small domain. This isn’t even noticeable in most cases, and I’m purely interested in the min-max mentality of what’s under the hood when Python is running. Turns out it’s C (which I didn’t know). This is purely for discovery, not silly over-optimization. It’s just interesting!
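If you’re curious whether the C-accelerated path is actually what’s loaded on your machine, here’s a quick check. This assumes CPython, and `_count_elements` is an internal detail, so it could move or change between versions:

```python
import collections

# Counter.update() delegates the tallying work to _count_elements.
# CPython tries to import a C-accelerated version from _collections;
# if that import succeeded, this prints a built-in (C) function.
print(collections._count_elements)
# <built-in function _count_elements>
# (A pure-Python fallback would print as a regular function instead.)
```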
Slow(er) Custom Code
Here was my initial implementation of the `most_frequent_char` function:
```python
def most_frequent_char(s):
    # Tally how many times each character appears
    count = {}
    for char in s:
        if char not in count:
            count[char] = 0
        count[char] += 1

    # Walk the string again, keeping the character with the highest tally
    most = None
    for char in s:
        if most is None or count[char] > count[most]:
            most = char
    return most
```
We initialize our tally dictionary and then make two passes over the string `s`:
- Counting the frequency of each character in the string.
- Iterating through the string again to find the character with the highest count.
While this implementation is fine, when benchmarked it’s slower than using Python’s built-in `collections.Counter`, and I’ll get into more detail below.
Using collections.Counter
In addition to an execution speed boost, using `collections.Counter` simplifies the code as well. Because `collections.Counter` is a subclass of Python’s built-in `dict` and is specifically designed for counting hashable objects, it’s a super clean and efficient way to count elements in an iterable, such as a string.
```python
from collections import Counter


def most_frequent_char(s):
    # most_common(1) returns a list with the single (char, count) pair
    # that has the highest count, so [0][0] pulls out the character.
    return Counter(s).most_common(1)[0][0]
```
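For a quick sense of what that one-liner is doing, here’s a throwaway example (using "banana" as the input string):

```python
from collections import Counter

counts = Counter("banana")

print(issubclass(Counter, dict))    # True -- Counter really is a dict subclass
print(counts)                       # Counter({'a': 3, 'n': 2, 'b': 1})
print(counts.most_common(1))        # [('a', 3)]
print(counts.most_common(1)[0][0])  # a
```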
Testing collections.Counter
Here’s where we can actually set up a test to see all the hype around saving milliseconds, lol. Below is a basic benchmark covering a few of the scenarios where `collections.Counter` takes first place against a manual hash map.
Most Frequent Char
```python
from collections import Counter
import time
import random
import string


def custom_most_frequent_char(s):
    count = {}
    for char in s:
        if char not in count:
            count[char] = 0
        count[char] += 1
    most = None
    for char in s:
        if most is None or count[char] > count[most]:
            most = char
    return most


def counter_most_frequent_char(s):
    return Counter(s).most_common(1)[0][0]


def run_test(test_name, input_string):
    print(f"\n{test_name}")

    start = time.time()
    result_custom = custom_most_frequent_char(input_string)
    custom_time = time.time() - start
    print(f"Custom implementation: Result = {result_custom}, Time = {custom_time:.8f} seconds")

    start = time.time()
    result_counter = counter_most_frequent_char(input_string)
    counter_time = time.time() - start
    print(f"Counter implementation: Result = {result_counter}, Time = {counter_time:.8f} seconds")


# Test Case 1: Uniform distribution (random characters)
random.seed(42)
uniform_string = ''.join(random.choices(string.ascii_lowercase, k=1000000))
run_test("Test 1: Uniform distribution (random lowercase letters)", uniform_string)

# Test Case 2: Skewed distribution (mostly 'a' with some other characters)
skewed_string = 'a' * 900000 + ''.join(random.choices(string.ascii_lowercase, k=100000))
run_test("Test 2: Skewed distribution (90% 'a')", skewed_string)

# Test Case 3: Single character (all 'x')
single_char_string = 'x' * 1000000
run_test("Test 3: Single character (all 'x')", single_char_string)

# Test Case 4: Mixed case with spaces (simulating text)
mixed_string = ''.join(random.choices(string.ascii_letters + ' ', k=1000000))
run_test("Test 4: Mixed case with spaces", mixed_string)
```
Output
Mileage may vary. Here’s what I got when running on my M1:

```
❯ python py/most_freq_char.py

Test 1: Uniform distribution (random lowercase letters)
Custom implementation: Result = q, Time = 0.07745409 seconds
Counter implementation: Result = q, Time = 0.03135109 seconds

Test 2: Skewed distribution (90% 'a')
Custom implementation: Result = a, Time = 0.07405376 seconds
Counter implementation: Result = a, Time = 0.02743888 seconds

Test 3: Single character (all 'x')
Custom implementation: Result = x, Time = 0.07211709 seconds
Counter implementation: Result = x, Time = 0.02716064 seconds

Test 4: Mixed case with spaces
Custom implementation: Result = P, Time = 0.08189106 seconds
Counter implementation: Result = P, Time = 0.03045082 seconds
```
And there you have it. `Counter` is faster, but again, this is not something that’s going to make or break your code. It’s cool to see the actual numbers, but using `collections.Counter` is more about the cleaner solution it provides than the speed boost!
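If you want numbers that are a little less noisy than one-shot `time.time()` calls, `timeit` repeats the measurement for you. Here’s a minimal sketch, assuming the two functions from the benchmark script above are defined in the same module:

```python
import timeit
import random
import string

random.seed(42)
s = ''.join(random.choices(string.ascii_lowercase, k=1_000_000))

# Assumes custom_most_frequent_char and counter_most_frequent_char from the
# benchmark script above are already defined in this same module.
runs = 10
custom_total = timeit.timeit(lambda: custom_most_frequent_char(s), number=runs)
counter_total = timeit.timeit(lambda: counter_most_frequent_char(s), number=runs)

print(f"custom : {custom_total / runs:.5f} s per run")
print(f"counter: {counter_total / runs:.5f} s per run")
```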
Wrapping Up
| Aspect | Custom Implementation | `collections.Counter` |
| --- | --- | --- |
| Language | Python (interpreted) | C (compiled) |
| Counting Method | Manual loop with dictionary updates | Optimized `_count_elements` C function |
| Method Lookups | Repeated Python-level lookups (e.g., `in`, `get`) | Minimal, handled in C |
| Code Complexity | More lines, explicit loops | Concise, single method call |
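As a side note on the “Method Lookups” row: even staying in pure Python, the counting loop can be tightened with `dict.get`, which folds the explicit `in` check into a single method call per character (still Python-level, so don’t expect it to catch `Counter`). A quick sketch of that variant:

```python
def get_based_most_frequent_char(s):
    count = {}
    for char in s:
        # .get() replaces the separate "if char not in count" membership test
        count[char] = count.get(char, 0) + 1
    # Rank the dict's keys by their counts; None for an empty string
    return max(count, key=count.get) if count else None
```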
The performance difference between `collections.Counter` and my custom implementation comes down to how `Counter` is implemented under the hood. A lot of the heavy lifting has already been done in Python’s standard library to accommodate developer needs for these kinds of data structures. I would say this was a worthwhile detour. I learned a bit more about Python, including that its reference implementation, CPython, is written in C. As I keep progressing in my DS&A journey, I’m even more excited to keep exploring Python’s standard library and community veterans’ insights to write better, more informed code.