The One-Character Bug That Broke My Recommendation Algorithm
How a single '+' instead of '*' in a scoring formula destroyed recommendation quality—and why this matters for anyone building ML systems.

Introduction
I built a recommendation system that was supposed to intelligently balance two factors: semantic similarity (70% weight) and community ratings (30% weight). In testing, it worked. Recommendations looked reasonable. Then I actually looked at the scoring formula and found something that made me question everything—a typo that had been silently breaking the algorithm the entire time.
The bug was small: one character in one line. But it exposed something bigger about how easily scoring systems can fail silently, and why code review matters more for math than it does for anything else.
The Bug: A Typo Nobody Noticed
Here's the original code:
recommended_anime['combined_score'] = (0.7 * (1 - D[0])) + (0.3 + recommended_anime['normalized_rating'])

Do you see it?
The second part should be:
recommended_anime['combined_score'] = (0.7 * (1 - D[0])) + (0.3 * recommended_anime['normalized_rating'])

That's it. A + instead of *.
What this meant:
- Intended: Multiply the rating weight (0.3) by the normalized rating (0 to 1)
- Actual: Add a flat 0.3 to every anime's score, then add the full normalized rating on top
Why This Broke Everything (And Why Nobody Noticed)
Let's work through the math with actual numbers.
Intended formula:
- Similarity score: 0.7 * (1 - distance) → ranges from 0 to 0.7
- Rating score: 0.3 * normalized_rating → ranges from 0 to 0.3
- Combined: ranges from 0 to 1.0
Actual formula:
- Similarity score: 0.7 * (1 - distance) → ranges from 0 to 0.7
- Rating score: 0.3 + normalized_rating → ranges from 0.3 to 1.3
- Combined: ranges from 0.3 to 2.0
This means:
- Every recommendation started with a baseline of 0.3, just from the constant being added
- Rating swamped similarity: the rating term now contributed anywhere from 0.3 to 1.3 instead of being capped at 0.3
- A highly-rated mediocre match would score higher than a perfectly similar low-rated anime
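A quick way to confirm those ranges is to evaluate both formulas at their input extremes. Here's a minimal sketch in plain Python; the helper names are mine, not the project's, and both inputs are assumed to lie in [0, 1]:

# distance comes from the similarity search, normalized_rating is the
# community rating scaled to [0, 1].
def intended(distance, normalized_rating):
    return 0.7 * (1 - distance) + 0.3 * normalized_rating

def buggy(distance, normalized_rating):
    return 0.7 * (1 - distance) + (0.3 + normalized_rating)

corners = [(d, r) for d in (0.0, 1.0) for r in (0.0, 1.0)]
print("intended:", min(intended(d, r) for d, r in corners),
      "to", max(intended(d, r) for d, r in corners))  # 0.0 to 1.0
print("buggy:   ", min(buggy(d, r) for d, r in corners),
      "to", max(buggy(d, r) for d, r in corners))      # 0.3 to 2.0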
Example:
- Anime A: 95% similar to user preference, rated 6/10
  - Score: 0.7 * 0.95 + (0.3 + 0.6) = 0.665 + 0.9 = 1.565
- Anime B: 50% similar to user preference, rated 9/10
  - Score: 0.7 * 0.50 + (0.3 + 0.9) = 0.35 + 1.2 = 1.55
Result: Anime B essentially ties Anime A (1.55 vs. 1.565), even though Anime A is almost twice as similar; nudge B's rating up a fraction and it overtakes A outright. Under the intended formula, A wins decisively, 0.845 to 0.62.
The kicker: because every score got the same kind of inflation (an extra 0.3 plus the full rating instead of a contribution capped at 0.3), the relative rankings still looked plausible. You'd get recommendations, they'd seem okay, but they'd be skewed toward rating over relevance in a hidden, hard-to-catch way.
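Here is the same head-to-head in code, with the fixed formula alongside the buggy one so the difference is visible. The numbers are the ones from the example above; the dictionary layout is just for illustration:

# distance = 1 - similarity, normalized_rating = rating / 10
candidates = {
    "Anime A": {"distance": 0.05, "normalized_rating": 0.6},  # 95% similar, 6/10
    "Anime B": {"distance": 0.50, "normalized_rating": 0.9},  # 50% similar, 9/10
}

for name, c in candidates.items():
    buggy = 0.7 * (1 - c["distance"]) + (0.3 + c["normalized_rating"])
    fixed = 0.7 * (1 - c["distance"]) + 0.3 * c["normalized_rating"]
    print(f"{name}: buggy={buggy:.3f}, fixed={fixed:.3f}")

# Anime A: buggy=1.565, fixed=0.845
# Anime B: buggy=1.550, fixed=0.620
# Buggy: a near tie despite a 45-point similarity gap. Fixed: A wins clearly.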
Why This Happens
This is a category of bug I call a "silent scoring failure." It's dangerous because:
- No crashes: The code runs fine. No errors, no exceptions.
- Outputs look reasonable: You get 5 recommendations, they're real anime, the similarity scores seem in range.
- The mistake is subtle: In a string of math operations, a + can hide in plain sight.
- Testing blind spots: If you test with hand-picked data, you might miss the bias because your examples aren't adversarial enough.
The deeper issue: formula bugs are easy to hide because the output space is continuous. A database query that returns the wrong row is obviously wrong. A scoring function that's off by 20% is invisible until you're wondering why your recommendations suddenly got worse after six months.
The Fix
recommended_anime['combined_score'] = (0.7 * (1 - D[0])) + (0.3 * recommended_anime['normalized_rating'])

Change the + to *. That's it. (A short sketch of the fixed line in context follows the list below.)
Now the formula works as designed:
- Similarity contributes 0-0.7
- Rating contributes 0-0.3
- Total ranges from 0-1.0
- Weights are enforced mathematically
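For context, here's roughly how the fixed line sits in the surrounding code. This is a minimal sketch with made-up data: I'm assuming D is the distance array returned by the nearest-neighbor search and recommended_anime is a pandas DataFrame of candidate results, which may differ from the real pipeline.

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real pipeline's outputs:
# D[0] holds the candidates' distances (0 = identical, 1 = unrelated),
# normalized_rating is the community rating scaled to [0, 1].
D = np.array([[0.05, 0.20, 0.50]])
recommended_anime = pd.DataFrame({
    'title': ['Anime A', 'Anime C', 'Anime B'],
    'normalized_rating': [0.6, 0.8, 0.9],
})

recommended_anime['combined_score'] = (0.7 * (1 - D[0])) + (0.3 * recommended_anime['normalized_rating'])

# Every score now lands in [0, 1], and sorting reflects the intended 70/30 split.
print(recommended_anime.sort_values('combined_score', ascending=False))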
Real-World Impact
After the fix, recommendation quality changed noticeably:
Before (buggy):
- Top recommendations were often high-rated but low-similarity matches
- Users complained: "Why are you recommending random popular anime?"
After (fixed):
- Top recommendations balanced similarity and quality
- Niche, highly-similar anime with moderate ratings ranked appropriately
- The system actually delivered on its premise
The bug hadn't destroyed the system—it had just broken the weights. Recommendations were still useful, just weighted wrong.
The Lesson: Math Is Not Forgiving
Here's what this teaches:
1. Formula bugs hide in code review
Most developers read 0.3 + as "the rating weight term" and move on. Your brain fills in the expected operation. This is why having someone unfamiliar with the code review math-heavy logic matters.
2. Test with adversarial examples
Create test cases that would expose broken weights (a sketch follows the list below):
- High similarity + low rating vs. low similarity + high rating
- Edge cases: similarity at 0, rating at 0
- Verify the output ratios match your formula
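For example, written here against a hypothetical combined_score() helper; the real code computes the score inline on a DataFrame, so treat the function and its signature as assumptions:

def combined_score(distance, normalized_rating):
    # The intended formula: 70% similarity, 30% rating, output in [0, 1].
    return 0.7 * (1 - distance) + 0.3 * normalized_rating

def test_similarity_outweighs_rating():
    # High similarity + low rating should beat low similarity + high rating
    # when the similarity gap is larger than the rating weight can close.
    close_match = combined_score(distance=0.05, normalized_rating=0.1)
    popular_but_unrelated = combined_score(distance=0.9, normalized_rating=1.0)
    assert close_match > popular_but_unrelated

def test_output_stays_in_range():
    for d in (0.0, 0.5, 1.0):
        for r in (0.0, 0.5, 1.0):
            assert 0.0 <= combined_score(d, r) <= 1.0

test_similarity_outweighs_rating()
test_output_stays_in_range()

With the buggy + in place, both of these fail immediately; with the fix, both pass.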
3. Separate formula from implementation
Write your formula as a comment or doc before coding:
# Score = 0.7 * similarity + 0.3 * rating
# Range: 0-1.0
recommended_anime['combined_score'] = (0.7 * (1 - D[0])) + (0.3 * recommended_anime['normalized_rating'])

This creates a contract. Later developers can verify the code matches the spec.
4. Validate output ranges
Before deploying, check:
print(f"Score range: {scores.min()} to {scores.max()}")
print(f"Weight contribution - Similarity: {sim_scores.max()}, Rating: {rating_scores.max()}")If your "weighted average" produces values >1, something's wrong.
Checklist: Scoring System Review
Before you deploy any scoring formula (a sketch that automates most of these checks follows the list):
- Formula is documented as a comment with expected output range
- Weights sum to 1 (check the math algebraically)
- Test with edge cases (min/max for each input)
- Verify output range with actual data (check min/max of final scores)
- Adversarial comparison (high X + low Y vs. low X + high Y)
- Code review includes someone who didn't write it (fresh eyes catch operators)
- Output is logged in production (so future bugs are catchable)
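Most of this checklist can be automated. Below is a hedged sketch of a pre-deploy sanity check; sanity_check_scoring and its score_fn(similarity, normalized_rating) signature are illustrative assumptions, not the project's actual API:

def sanity_check_scoring(score_fn, weights=(0.7, 0.3)):
    # Checklist items a machine can verify for a weighted 0-1 score.
    w_sim, w_rating = weights

    # Weights should sum to 1 for a weighted average.
    assert abs(w_sim + w_rating - 1.0) < 1e-9, "weights do not sum to 1"

    # Output range: evaluate at the input extremes (both inputs assumed in [0, 1]).
    corner_scores = [score_fn(s, r) for s in (0.0, 1.0) for r in (0.0, 1.0)]
    assert min(corner_scores) >= 0.0 and max(corner_scores) <= 1.0, "score leaves [0, 1]"

    # Adversarial comparison: with a 70/30 split, a near-perfect match with a
    # weak rating should beat a weak match with a perfect rating.
    assert score_fn(0.95, 0.1) > score_fn(0.1, 1.0), "rating is overpowering similarity"

sanity_check_scoring(lambda similarity, normalized_rating: 0.7 * similarity + 0.3 * normalized_rating)

The adversarial thresholds depend on the weights you chose, so adjust them if your split isn't 70/30.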
Conclusion
A single character—the difference between + and *—is all it took to silently degrade recommendation quality. The system kept running, kept returning results, and nobody noticed until we looked. This is the nature of scoring bugs: they don't crash, they corrupt. When you're building ML systems or any data ranking system, remember that mathematical correctness is not something you can debug your way out of later. You have to build it in from the start.