Pull up the data pipeline row parser. You destructure each row with three index assignments: row[0], row[1], row[2:]. What does row[2:] represent?
The remaining fields after ID and name. I slice from index 2 to get everything else. It works for any row length, but you have to know the structure from context — the variable names don't tell you what the data means.
Here is the same thing with extended iterable unpacking:
# Before: three index assignments
record_id = row[0]
name = row[1]
rest = row[2:]
# After: one line, names carry the structure
record_id, name, *rest = row
The star captures everything after the named positions into a list. It can go in the middle too — first, *middle, last = row — wherever you need it. One starred expression per unpacking.
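A quick sketch of the flexible star position; the row values here are made up for illustration:

```python
# The starred name can sit anywhere in the target list; one star per unpacking.
row = ["r1", "Ada", "f1", "f2", "2024"]

first, *middle, last = row
print(first)   # r1
print(middle)  # ['Ada', 'f1', 'f2']
print(last)    # 2024

# The star always yields a list, even when nothing falls into it:
a, b, *rest = ["x", "y"]
print(rest)    # []
```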
And if row has fewer elements than the named positions expect?
ValueError immediately. That is the point. row[0] on a one-element row gives you the only element with no complaint. record_id, name, *rest = row raises immediately if fewer than two elements exist. The unpacking version is a contract — it asserts the structure and fails loudly if violated. Slicing is silent about structure. It gives you whatever shape you feed it.
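The contrast can be shown directly on a one-element row (made up for illustration):

```python
row = ["only-id"]

# Slicing is silent about structure: no error, just whatever is there.
print(row[0])    # only-id
print(row[2:])   # []

# Unpacking asserts the structure and fails loudly.
try:
    record_id, name, *rest = row
except ValueError as exc:
    error = str(exc)
print(error)     # not enough values to unpack (expected at least 2, got 1)
```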
I see the correctness argument. What about the merge? I have a function that loops through overrides and applies them one by one to a copy of the base dict. Is there a shorter form?
Dict unpacking with double-star:
# Before: explicit copy + update loop
def merge(base, overrides):
    result = dict(base)
    for key, value in overrides.items():
        result[key] = value
    return result

# After: one expression
def merge(base, overrides):
    return {**base, **overrides}
Later keys win. base | overrides does the same in Python 3.9+. Both create a new dict. Neither touches the inputs.
I use dict.update() for this. One line. Why isn't that the right answer?
update() mutates in place. If the caller passed you their dict, you have changed their data. In a pipeline that merges the same base config with different overrides in parallel, silent mutation is a race condition. The {**} version creates a copy first. The caller's dict is unchanged.
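A minimal sketch of the trap, with made-up config values:

```python
base = {"host": "localhost", "port": 8080}
overrides = {"port": 9090}

# The trap: update() changes the caller's dict in place.
config = base            # same object, not a copy
config.update(overrides)
print(base["port"])      # 9090: the caller's data changed

# The safe form: {**} builds a new dict and leaves the input alone.
base = {"host": "localhost", "port": 8080}
merged = {**base, **overrides}
print(merged["port"])    # 9090
print(base["port"])      # 8080
```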
I have six helper functions in a utility module — each is a three-line loop doing what a one-line expression does. merge_dicts, apply_overrides, copy_and_update. Every one of them is unnecessary. I can delete the whole file.
Well-tested dead weight is the most expensive kind. The tests pass so nobody questions it. The file grows. Then someone writes a seventh function doing the same thing because they didn't find the first six. Now — put both idioms together on the pipeline:
# Before: index access + loop merge
def process_record(row, overrides):
    record_id = row[0]
    name = row[1]
    rest = row[2:]
    result = dict(row_to_dict(record_id, name, rest))
    for key, value in overrides.items():
        result[key] = value
    return result

# After: star unpack + dict merge
def process_record(row, overrides):
    record_id, name, *rest = row
    return {**row_to_dict(record_id, name, rest), **overrides}
Seven lines to two. And the second version reads like the spec: unpack the row into semantic parts, build a dict from them, apply overrides. The first version reads like an implementation. You have to trace every step to understand the intent.
That is the difference between declarative and imperative code. Star unpacking declares the shape of the input. {**} declares the precedence of the output. When you can name the structure, Python lets you put the names in the line that reads the data.
How star unpacking works in CPython. first, *rest = iterable compiles to an UNPACK_EX bytecode instruction whose argument encodes two counts: the number of names before the star and the number after it. CPython iterates the right-hand side, assigns the leading named variables from the front, assigns the trailing named variables from the back, and collects everything in between into a list. The star always produces a list — never a tuple, never None — even if zero elements fall into it. For sequences, this is O(n) with a single pass. For generators, Python exhausts the generator to collect the starred portion.
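This is visible with the standard dis module:

```python
import dis

def split(row):
    first, *middle, last = row
    return first, middle, last

# UNPACK_EX shows up in the compiled bytecode; its argument packs the
# before-star and after-star name counts.
names = [ins.opname for ins in dis.get_instructions(split)]
print("UNPACK_EX" in names)  # True

# The starred name is collected into a list in a single pass.
print(split(range(5)))  # (0, [1, 2, 3], 4)
```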
The correctness argument for unpacking over slicing. row[1:-1] on a two-element list returns [] silently. first, *middle, last = row raises ValueError: not enough values to unpack if fewer than two elements exist. Unpacking is a structural assertion. You are telling Python: "I expect this iterable to have at least this shape." Slicing makes no assertion — it accommodates any shape and returns an empty slice for mismatched expectations. In data pipelines where rows should always have a fixed schema, the strict failure is preferable to the silent wrong result.
Dict merge semantics and the mutation trap. {**d1, **d2} creates a new dict. CPython 3.9+ evaluates it with DICT_UPDATE instructions (earlier versions used BUILD_MAP_UNPACK): start with an empty dict, then merge each ** argument into it in order, with the same semantics as dict.update(). Later keys win. d1 | d2 (Python 3.9+) has the same semantics but is implemented as dict.__or__, which also creates a new dict. d1 |= d2 is dict.__ior__, which mutates d1 in place — the in-place version of update(). The rule: when you do not own the input dict (received as a function argument), never mutate it. Express the desired output as a new value.
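A small sketch of the copying form versus the in-place form, with made-up dicts:

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 3}

merged = d1 | d2   # Python 3.9+: new dict, inputs untouched
print(merged)      # {'a': 1, 'b': 3}
print(d1)          # {'a': 1, 'b': 2}

d1 |= d2           # in-place: mutates d1, like update()
print(d1)          # {'a': 1, 'b': 3}
```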
When | is cleaner than {**}. d1 | d2 reads as a binary operation — merge d1 with d2. For exactly two dicts it is the most readable form. {**d1, **d2} is better when you need to add extra keys inline — {**base, **overrides, "timestamp": now()} — or when merging more than two sources. For merging a variable list of dicts, neither | nor {**} scales cleanly; the correct form is a comprehension: {k: v for d in dicts for k, v in d.items()}.
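And the comprehension form for a variable number of sources, with made-up dicts:

```python
dicts = [{"a": 1}, {"b": 2}, {"a": 3, "c": 4}]

# Later dicts win, matching the precedence of {**d1, **d2}.
merged = {k: v for d in dicts for k, v in d.items()}
print(merged)  # {'a': 3, 'b': 2, 'c': 4}
```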