Stop copy-pasting formulas. Start writing scripts that do the work for you.
I'm a data analyst. I live in Excel. Why should I bother learning Python?
How much time do you spend cleaning data every week?
Honestly? 6-8 hours. Removing duplicates, fixing formats, merging sheets from different sources.
Python can do all of that in under 30 seconds. A 10-line script cleans, merges, and formats data that takes you hours by hand. Here's what that Monday cleanup actually looks like:
```python
import pandas as pd

df = pd.read_csv("sales_data.csv")
df = df.drop_duplicates()
df["date"] = pd.to_datetime(df["date"])
df["revenue"] = df["revenue"].fillna(0)  # assign back; inplace fillna on a column is deprecated
df.to_csv("sales_data_clean.csv", index=False)
print("Done. Cleaned", len(df), "rows.")
```

You write it once. Run it every Monday. Never think about it again.
But what about pandas? I keep hearing about it and it looks intimidating.
Pandas is just Excel on steroids — the concepts map almost directly. VLOOKUP becomes df.merge(), pivot tables become df.groupby(), filtering becomes df[df["revenue"] > 1000]. You already know the concepts. Pandas just removes the row limit and the crashes.
My company uses Excel. Everyone sends me spreadsheets. Can Python work with that?
Python reads .xlsx files directly through pandas (via the openpyxl engine). Load a multi-sheet workbook, process every sheet, write results back — without opening Excel once. It also works with CSVs, Google Sheets, SQL databases, and APIs. Whatever your company throws at you.
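A sketch of that multi-sheet pattern, assuming pandas with the openpyxl engine is installed (the file and sheet names here are invented for the demo, which builds its own small workbook first):

```python
import pandas as pd

# Build a small demo workbook — a stand-in for the files colleagues send you
with pd.ExcelWriter("monthly_reports.xlsx") as writer:
    pd.DataFrame({"region": ["North"], "revenue": [500]}).to_excel(writer, sheet_name="Jan", index=False)
    pd.DataFrame({"region": ["South"], "revenue": [300]}).to_excel(writer, sheet_name="Feb", index=False)

# Load every sheet and stack them into one DataFrame
xl = pd.ExcelFile("monthly_reports.xlsx")
combined = pd.concat([xl.parse(s) for s in xl.sheet_names], ignore_index=True)

# Write the merged result back — Excel itself never gets opened
combined.to_excel("combined_report.xlsx", index=False)
print(len(combined))  # one row per sheet: 2
```

The same three lines in the middle scale from two sheets to fifty; `xl.sheet_names` means you never hard-code which tabs exist.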
That Monday cleanup takes me 2 hours every week. If I could automate that...
That's 100 hours a year you'd get back. And cohort analysis that takes a day in Excel takes 20 lines in pandas. Start with the Python Fundamentals track — by week 3 you'll be automating your first real workflow.
OK I'm starting today. My spreadsheets can wait.
If you've spent years in Excel and Google Sheets, you're already a programmer — you just don't know it yet. Every SUMIFS, every VLOOKUP, every pivot table is a data transformation expressed in a domain-specific language. Python is that same logic, freed from a grid.
This isn't about replacing Excel. It's about knowing when Excel is the right tool and when Python does in 10 seconds what Excel can't finish at all.
Excel's 1,048,576-row limit sounds huge until you're working with transaction logs, clickstream data, or any dataset that accumulates daily. At 800K rows, Excel slows to a crawl. At 1M+, it crashes. Pandas loads 10 million rows in seconds on a laptop — and the same analysis code runs unchanged whether you have 10 rows or 10 million.
```python
import pandas as pd

# 10 million rows — loads in seconds
df = pd.read_csv("transactions_2024.csv")
print(f"Loaded {len(df):,} rows")  # Loaded 10,000,000 rows

# Group and aggregate — no spinning wheel
summary = df.groupby(["region", "product_category"]).agg(
    total_revenue=("revenue", "sum"),
    order_count=("order_id", "count"),
    avg_order_value=("revenue", "mean"),
).round(2)
```

The conceptual jump is smaller than it looks:
| Excel operation | Pandas equivalent |
|---|---|
| VLOOKUP | df.merge(other, on="id") |
| Pivot Table | df.groupby("category").sum() |
| Remove blank rows | df.dropna() |
| Filter rows | df[df["revenue"] > 1000] |
| COUNTIF | df["status"].value_counts() |
| Sort A to Z | df.sort_values("date") |
| IF formula | df["flag"] = df["revenue"].apply(lambda x: "high" if x > 5000 else "low") |
| IFERROR | pd.to_numeric(df["col"], errors="coerce") |
Learning pandas is mostly re-learning what you already know under different syntax.
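To make the table concrete, here is a tiny runnable demo of three of those rows (the DataFrames and column names are invented for illustration):

```python
import pandas as pd

orders = pd.DataFrame({
    "id": [1, 2, 3],
    "region_id": [10, 20, 10],
    "revenue": [1200, 800, 5600],
})
regions = pd.DataFrame({"region_id": [10, 20], "region": ["North", "South"]})

# VLOOKUP -> merge: pull the region name onto each order
joined = orders.merge(regions, on="region_id")

# Pivot table -> groupby: total revenue per region
by_region = joined.groupby("region")["revenue"].sum()

# Filter rows -> boolean indexing
big = joined[joined["revenue"] > 1000]

print(by_region.to_dict())  # {'North': 6800, 'South': 800}
print(len(big))             # 2
```

Each line replaces a formula you would otherwise drag down a column, and none of them breaks when the data grows.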
The biggest shift isn't the analysis — it's repeatability. In Excel, every Monday you open the file, run the same steps, and save the result. In Python, you write those steps once and schedule them:
```python
# schedule with cron: 0 8 * * 1 (every Monday at 8 AM)
import smtplib
from email.mime.text import MIMEText

import pandas as pd

df = pd.read_csv("transactions.csv")
weekly = df.groupby("region")["revenue"].sum().reset_index()
weekly.columns = ["Region", "Total Revenue"]
weekly["Total Revenue"] = weekly["Total Revenue"].map("${:,.0f}".format)

table_html = weekly.to_html(index=False)
msg = MIMEText(f"<h2>Weekly Revenue</h2>{table_html}", "html")
msg["Subject"] = "Weekly Revenue Report — Auto-generated"
msg["From"] = "analytics@company.com"
msg["To"] = "team@company.com"

with smtplib.SMTP("localhost") as server:  # swap in your company's SMTP host
    server.send_message(msg)
print("Report sent.")
```

That script replaces 90 minutes of your Monday. Every Monday. Forever.
| Task | Before Python | After Python |
|---|---|---|
| Weekly data cleanup | 6-8 hours manual | 30-second script, scheduled |
| Merging 12 monthly sheets | 45 minutes of copy-paste | pd.concat([xl.parse(s) for s in xl.sheet_names]) |
| Cohort retention analysis | Full day in Excel | 20-line pandas script |
| Ad-hoc revenue breakdown | New pivot table per request | Reusable script with parameters |
| Stakeholder report | Build from scratch weekly | Scheduled email, auto-formatted |
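The "reusable script with parameters" row deserves a sketch: instead of building a new pivot table per request, one function answers any breakdown question. The demo data and column names below are invented for illustration:

```python
import pandas as pd

def revenue_breakdown(path, group_col="region", value_col="revenue"):
    """Total of `value_col` grouped by `group_col` — one script, any request."""
    df = pd.read_csv(path)
    return df.groupby(group_col)[value_col].sum().sort_values(ascending=False)

# Demo data standing in for a real export
pd.DataFrame({
    "region": ["North", "South", "North"],
    "product": ["A", "A", "B"],
    "revenue": [1200, 800, 5600],
}).to_csv("demo_sales.csv", index=False)

print(revenue_breakdown("demo_sales.csv", "region").to_dict())
# {'North': 6800, 'South': 800}
print(revenue_breakdown("demo_sales.csv", "product").to_dict())
# {'B': 5600, 'A': 2000}
```

When the next "can you break that down by product instead?" email arrives, the answer is one argument change, not a rebuilt spreadsheet.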
Retention analysis is the clearest example of Python doing something Excel genuinely cannot:
```python
import pandas as pd

df = pd.read_csv("user_events.csv", parse_dates=["event_date"])

# Each user's acquisition month
first_activity = df.groupby("user_id")["event_date"].min().dt.to_period("M")
first_activity.name = "cohort"
df = df.join(first_activity, on="user_id")

# Whole months elapsed since the cohort month
df["months_since_start"] = (
    df["event_date"].dt.to_period("M") - df["cohort"]
).apply(lambda x: x.n)

# Unique active users per cohort per month, then normalize by month 0
cohort_table = df.groupby(["cohort", "months_since_start"])["user_id"].nunique().unstack()
retention = cohort_table.divide(cohort_table[0], axis=0)
```

This produces a full cohort retention matrix across every acquisition month. Building the equivalent in Excel requires hours of formulas and manual cross-referencing. In Python it's a one-time write.
If you earn $70K a year and spend 30% of your time on manual data wrangling, that's $21,000 of salary spent on work a script could do. Show your manager one automated report. That's the pitch. Analysts who learn Python don't just save time — they take on higher-value work, get noticed, and move up faster.
The analysts who get replaced by automation are the ones who only know Excel. The ones who stay are the ones who write the automation.
Not syntax — just thinking. How would you solve these?
1. Your manager asks for a breakdown of total revenue by region for Q3. You have a CSV with 200,000 rows. What's the right approach?
2. A colleague sends you a dataset where dates are stored as strings like '03-15-2024'. Your analysis needs to calculate days between events. What do you do first?
3. You've built a weekly cleanup script that takes 5 seconds to run. Your manager wants the cleaned file emailed to the team every Monday at 8 AM. What's the best next step?
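If you want to check your instinct on the first question: the pattern is "filter, then group." A sketch on stand-in data (the real file would have 200,000 rows; the columns `date`, `region`, `revenue` are assumed):

```python
import pandas as pd

# Stand-in for the 200,000-row export
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-02-10", "2024-07-15", "2024-08-03", "2024-09-20"]),
    "region": ["North", "North", "South", "North"],
    "revenue": [100, 250, 400, 150],
})

# Q3 = July through September: filter first, then group
q3 = df[df["date"].dt.quarter == 3]
print(q3.groupby("region")["revenue"].sum().to_dict())  # {'North': 400, 'South': 400}
```

Two lines of logic, and the same two lines work next quarter with `== 4`.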
Build real Python step by step — runs right here in your browser.
Clean the Weekly Sales Data
You receive a raw sales CSV every Monday. It has duplicate rows, missing revenue values, and inconsistent region names ("North", "north", "NORTH" all mean the same thing). Write a function `clean_sales(rows)` that takes a list of row dicts and returns a cleaned list:
- Remove exact duplicate rows
- Fill missing revenue (None or missing key) with 0
- Normalize region names to title case (e.g. "north" → "North")
# clean_sales([{"id": 1, "region": "north", "revenue": 500}, {"id": 1, "region": "north", "revenue": 500}, {"id": 2, "region": "SOUTH", "revenue": None}])
[
{
"id": 1,
"region": "North",
"revenue": 500
},
{
"id": 2,
"region": "South",
"revenue": 0
}
]

Start with the free Python track. No credit card required.