---

### 🎯 Your Next Step

1. Pick a workbook question.
2. Follow the **Context → Code → Commentary** template above.
3. Run the code locally to verify it works.
4. Polish the write‑up, add the performance notes, and you’ll have a solid, original answer.

---

```python
from pyspark import SparkContext

# Reuse the active SparkContext, or create one if none exists
sc = SparkContext.getOrCreate()

# 1️⃣ Load the file as an RDD
lines = sc.textFile("hdfs:///data/input.txt")
```

## 7. Putting It All Together – A Mini‑Project Blueprint