Logs tell you what happened. Metrics tell you how often and how long. Two flavors cover most cases:
```python
import time
import json

def do_op(name, work_secs):
    start = time.monotonic()
    time.sleep(work_secs)  # simulated work
    duration_ms = int((time.monotonic() - start) * 1000)
    print(json.dumps({"metric": "latency_ms", "op": name, "value": duration_ms}))
    return duration_ms

timings = []
timings.append(do_op("a", 0.01))
timings.append(do_op("b", 0.02))
timings.append(do_op("c", 0.03))
print(json.dumps({"metric": "summary", "ops": len(timings), "total_ms": sum(timings)}))
```

Why time.monotonic() instead of time.time()?
time.time() returns wall-clock time, which can jump backwards when the system clock is adjusted (NTP sync, manual change). time.monotonic() is guaranteed never to decrease, so durations measured with it are always non-negative. Use it for intervals; reach for time.time() only when you need an actual wall-clock timestamp.
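If you do want a wall-clock timestamp on each emitted line, use both clocks for their respective jobs. A minimal sketch (the "ts" field name is illustrative, not a standard):

```python
import time
import json

start = time.monotonic()
# ... work ...
print(json.dumps({
    "ts": time.time(),  # wall-clock: when it happened
    "metric": "latency_ms",
    "value": int((time.monotonic() - start) * 1000),  # monotonic: how long it took
}))
```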
Why emit one line per op AND a summary?
Per-op lines give you the distribution ("95th percentile latency"); the summary gives you the headline number ("3 ops in 60ms"). Both are useful, and emitting both is essentially free.
| Shape | Question it answers | Example |
|---|---|---|
| Counter | "How many?" | items processed, errors, retries |
| Timing | "How long?" | per-op latency, total wall-clock |
A third shape — gauge (current value: queue depth, items in-flight) — matters too, but counters and timings cover most automation needs.
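For completeness, a gauge is just a value that can go up as well as down; you emit its current reading rather than a running total. A minimal sketch, with a hypothetical work queue:

```python
import json
from collections import deque

queue = deque(["msg-1", "msg-2", "msg-3"])  # hypothetical work queue

while queue:
    item = queue.popleft()
    # ... process item ...
    # gauge: current depth at this moment, not a cumulative count
    print(json.dumps({"metric": "queue_depth", "value": len(queue)}))
```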
The core pattern with time.monotonic():

```python
import time

start = time.monotonic()
# ... work ...
duration_ms = int((time.monotonic() - start) * 1000)
```

time.monotonic() returns a float (seconds; resolution is platform-dependent, typically sub-microsecond). Convert to ms (multiply by 1000) for readability: two-digit ms is more legible than 0.0234567 seconds.
A helper for repeated use:
```python
import time
import json
from contextlib import contextmanager

@contextmanager
def timed(op_name):
    start = time.monotonic()
    try:
        yield
    finally:
        # emit the timing even if the body raises
        duration_ms = int((time.monotonic() - start) * 1000)
        print(json.dumps({"metric": "latency_ms", "op": op_name, "value": duration_ms}))

with timed("fetch"):
    messages = fetch_messages()
with timed("filter"):
    matching = filter_to_urgent(messages)
```

For learning, the explicit start = ...; duration = ... shape is clearer. The context-manager idiom is a refinement.
```python
error_count = 0
for item in items:
    try:
        process(item)
    except Exception:
        error_count += 1
print(json.dumps({"metric": "errors", "value": error_count}))
```

A plain Python int is the simplest counter store. For multi-process systems you'd use Prometheus, StatsD, or another real metrics library: same shape, persistent counter.
When you care about distribution (median, p95, p99) rather than just total/average, you need a histogram. Real metrics systems compute these server-side from individual timing emissions. Your job: emit each timing as a line. The aggregator handles the math.
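To make "the aggregator handles the math" concrete, here is a rough local stand-in for that server-side computation, using the standard library (statistics.quantiles needs Python 3.8+; the sample data is made up):

```python
import statistics

# individual timing emissions, as the aggregator would collect them
timings_ms = [12, 15, 11, 48, 13, 14, 97, 12, 16, 13]

p50 = statistics.median(timings_ms)
# n=100 yields 99 cut points; index 94 is the 95th percentile.
# method="inclusive" treats the sample as the whole population,
# so small samples aren't extrapolated past their min/max.
p95 = statistics.quantiles(timings_ms, n=100, method="inclusive")[94]
print(f"p50={p50}ms p95={p95:.0f}ms")
```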
{"event": "failed", "err": "ConnectionError", "item_id": "x"}){"metric": "errors", "value": 12})The lines blur — many systems treat metrics as a special-shaped log. For learning purposes: if you want to query distributions, emit a metric line. If you want to read the story of one item's path, emit a log line.
A closing summary line is incredibly useful for at-a-glance scanning:
```python
print(json.dumps({
    "metric": "summary",
    "ops": len(timings),
    "total_ms": sum(timings),
    "errors": error_count,
    "processed": len(items) - error_count,
}))
```

One line tells you whether the run succeeded, how long it took, and what failed.