The ops team has a week of response-time data — milliseconds per request. They want mean, median, standard deviation, and the 95th percentile. What do you reach for?
I know how to compute a mean — sum divided by count. Median I can do with sorted(). But standard deviation I'd have to look up the formula. And 95th percentile I have no idea.
Python ships both answers. The math module for mathematical functions. The statistics module for descriptive statistics. Let's start with math:
import math
# Basic math operations beyond arithmetic
print(math.sqrt(144)) # 12.0
print(math.log(1000, 10)) # 3.0 — log base 10
print(math.log(math.e)) # 1.0 — natural log
print(math.ceil(3.2)) # 4 — round up
print(math.floor(3.8)) # 3 — round down
print(math.pi) # 3.141592653589793
print(math.inf) # inf — positive infinity
For the ops team, math.log could be useful — response time distributions are often log-normal. math.ceil and math.floor for binning response times into buckets. And math.inf as a sentinel for "no maximum seen yet" in a running max tracker.
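A minimal sketch of that running max tracker, using math.inf and -math.inf as sentinels so the first real value always wins (the function name here is illustrative):

```python
import math

def running_extremes(times):
    """Track min and max response times without special-casing the first value."""
    lo, hi = math.inf, -math.inf  # sentinels: any real value replaces them
    for t in times:
        lo = min(lo, t)
        hi = max(hi, t)
    return lo, hi

print(running_extremes([245, 312, 189, 456]))  # (189, 456)
```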
All of those. The statistics module handles the descriptive stats:
import statistics
response_times = [245, 312, 189, 1205, 344, 287, 456, 198, 2341, 301]
print(statistics.mean(response_times)) # 587.8
print(statistics.median(response_times)) # 306.5
print(statistics.stdev(response_times)) # ~683.8 (sample std dev)
print(statistics.pstdev(response_times)) # ~648.7 (population std dev)
print(statistics.variance(response_times)) # ~467639.3 (sample variance)
One-liners for all of these. I was about to write a manual standard deviation calculation — variance is the mean of squared deviations from the mean, then the square root of that, which I looked up in Track 1 stats... statistics.stdev(response_times) does all of that in one call.
stdev is the sample standard deviation — uses N-1 in the denominator, assumes your data is a sample of a larger population. pstdev is population standard deviation — uses N. For server response times where you have all the data for a time window, pstdev is technically more correct. In practice the difference is small for large samples.
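To see that "small for large samples" claim concretely, a quick sketch (the data here is made up, just three values repeated to grow N):

```python
import statistics

for data in ([245, 312, 189], [245, 312, 189] * 20):
    s = statistics.stdev(data)   # N-1 denominator (sample)
    p = statistics.pstdev(data)  # N denominator (population)
    # relative gap between the two estimators shrinks as N grows
    print(f"N={len(data)}: stdev={s:.1f}, pstdev={p:.1f}, gap={(s - p) / p:.1%}")
```

At N=3 the gap is over 20%; at N=60 it is under 1%.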
What about percentiles? The ops team's SLA says 95th percentile response time must be under 500ms. How do I compute that?
statistics.quantiles() for percentiles:
import statistics
response_times = [245, 312, 189, 1205, 344, 287, 456, 198, 2341, 301]
# quartiles — 25th, 50th, 75th percentile
q = statistics.quantiles(response_times, n=4)
print(q) # [233.25, 306.5, 643.25]
# percentiles — specify n=100 for percentile-level granularity
p95_list = statistics.quantiles(response_times, n=100)
p95 = p95_list[94] # index 94 = 95th percentile
print(f"p95: {p95:.1f}ms")
quantiles(data, n=100) gives me a list of 99 cut points — p95_list[94] is the 95th percentile. The index math is off by one because the list has n-1 elements for n quantiles.
Correct — quantiles(n=4) gives 3 values (Q1, Q2, Q3), not 4. The list is the boundaries between the quartiles. For percentiles: n=100 gives 99 values; the 95th percentile is index 94. And one more function that's genuinely useful for log spike detection:
import statistics
# mode — most common value
errors_per_hour = [3, 5, 3, 7, 3, 12, 3, 4]
print(statistics.mode(errors_per_hour)) # 3 — most common value
statistics.mode() for "which error count is most typical this week." The ops team asks me that manually every week and I count by eye. One function call.
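One caveat worth knowing: when two values tie for most common, mode() (since Python 3.8) returns the first one encountered, while statistics.multimode() returns all of them. A quick sketch with tied counts:

```python
import statistics

errors_per_hour = [3, 5, 3, 7, 5, 12]  # 3 and 5 each appear twice
print(statistics.mode(errors_per_hour))       # 3 — first mode encountered
print(statistics.multimode(errors_per_hour))  # [3, 5] — all tied modes
```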
Now you're building the weekly stats report in your head. Today's problem: a list of response times, compute the full descriptive profile — mean, median, stdev, p95, min, max, and a slow-request count above a threshold. All stdlib, no third-party packages.
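One possible shape for that function, as a sketch rather than the official solution (the function name, dict keys, and 500ms default are illustrative choices):

```python
import statistics

def describe(times, slow_ms=500):
    """Full descriptive profile of a list of response times (ms)."""
    if len(times) < 2:
        # quantiles() and stdev() both need at least two data points
        raise ValueError("need at least two response times")
    pct = statistics.quantiles(times, n=100)  # 99 cut points
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stdev": statistics.stdev(times),
        "p95": pct[94],
        "min": min(times),
        "max": max(times),
        "slow": sum(1 for t in times if t > slow_ms),
    }

times = [245, 312, 189, 1205, 344, 287, 456, 198, 2341, 301]
profile = describe(times)
print(profile["median"], profile["slow"])  # 306.5 2
```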
Tomorrow is random — for test data generation. I'm guessing I'll use it to generate fake response time distributions for testing the stats function I'm about to write.
You guessed exactly what tomorrow's lesson is about. The stats function you write today, you'll test with random data tomorrow. That's the progression.
Python's math and statistics modules provide numerical functions that would otherwise require NumPy or a manual implementation. For log analysis — computing response time statistics, normalizing metrics, detecting anomalies — both modules are sufficient without any third-party dependencies.
math is a thin wrapper around the C standard library's math functions. All operations work on Python float and int values. The module provides: trigonometric functions (sin, cos, tan), exponential and logarithmic functions (exp, log, log2, log10), power functions (pow, sqrt), rounding (ceil, floor, trunc), special values (pi, e, inf, nan), and utility functions (factorial, gcd, isfinite, isinf, isnan).
For log analysis specifically: math.log for log-normal response time analysis, math.ceil and math.floor for bucketing, math.inf as a "no-value-yet" sentinel in running min/max trackers, and math.isnan / math.isfinite for data quality checks before statistical computation.
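A sketch of that data-quality guard, filtering non-finite values before computing statistics (the raw list stands in for a hypothetical messy log-parsing step):

```python
import math
import statistics

raw = [245.0, float("nan"), 312.0, math.inf, 189.0]  # e.g. from a messy log parse
clean = [t for t in raw if math.isfinite(t)]  # drops both nan and inf
print(clean)                   # [245.0, 312.0, 189.0]
print(statistics.mean(clean))  # ~248.7
```

Filtering matters because a single NaN silently poisons mean() and stdev() into returning nan.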
statistics was added in Python 3.4 to provide basic descriptive statistics without NumPy. Key functions: mean(), median(), mode(), multimode() (Python 3.8+), stdev() (sample), pstdev() (population), variance(), pvariance(), quantiles() (Python 3.8+), harmonic_mean(), geometric_mean() (Python 3.8+).
statistics.stdev() computes sample standard deviation (denominator N-1). statistics.pstdev() computes population standard deviation (denominator N). For server monitoring where you have all requests in a time window — not a sample — pstdev is technically correct. For anomaly detection where you're treating the current window as a sample of the system's general behavior, stdev is more conservative. The practical difference is negligible for N > 30.
statistics.quantiles(data, n=4) returns n-1 cut points that divide the sorted data into n equal-probability groups. For percentiles: n=100 gives 99 values; the 95th percentile is index 94 (0-based). Note that quantiles() raises StatisticsError when given fewer than two data points — always guard before calling.
statistics is appropriate for small datasets (thousands of values) and simple computations. NumPy is appropriate for large datasets (millions of values), vectorized operations, and array-based workflows. The statistics module operates on Python lists and is single-threaded. NumPy operates on C arrays and can exploit SIMD instructions. For weekly report generation over a day's worth of response times, statistics is fine. For real-time streaming metrics over millions of data points, NumPy is the right tool.