r/dataengineersindia
KPMG Data Engineer Interview Experience (Face-to-Face | Azure + Databricks + PySpark)
I recently attended a face-to-face interview for a Data Engineering role at KPMG. Here are the questions asked:
- Short introduction
- From a Python list with repeating items, get only the first occurrence.
- Extract even numbers from a nested JSON.
- How do we read Excel files in Databricks?
- If CSV data has extra commas inside values, how do we handle it?
- What is the most challenging task you have done?
- Write SCD Type 2 logic in Spark.
- Explain your day-to-day work.
- How do you write tests in CI/CD pipelines?
- What is flake8? I mentioned that we use the Black formatter, so they then asked why Black is used.
- Explain CI/CD pipeline architecture. Why are pipelines written in YAML?
- What is git stash?
- Given an email column, extract the first name and create a new column.
- Difference between Tables and Volumes in Databricks.
- Difference between Serverless and Dedicated SQL Pools.
- What are the libraries you have used most?
- Difference between External and Managed tables. Where is the data stored in both?
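For anyone preparing, here is a minimal sketch of the three plain-Python style questions from the list (first occurrences in a list, even numbers from nested JSON, first name from an email). The function names and sample data are hypothetical, and the email helper assumes addresses shaped like `first.last@domain` or `first_last@domain`, which the interviewers may not have intended. In PySpark the email part would more likely be done with `split` or `regexp_extract` from `pyspark.sql.functions` inside `withColumn`.

```python
import json
import re


def first_occurrences(items):
    """Drop repeats while keeping the order in which items first appear."""
    # dicts preserve insertion order (Python 3.7+), so the keys act as an ordered set.
    return list(dict.fromkeys(items))


def even_numbers(node):
    """Recursively collect even integers from parsed JSON (dicts, lists, scalars)."""
    if isinstance(node, dict):
        return [n for value in node.values() for n in even_numbers(value)]
    if isinstance(node, list):
        return [n for value in node for n in even_numbers(value)]
    # bool is a subclass of int in Python, so True/False are excluded explicitly.
    if isinstance(node, int) and not isinstance(node, bool) and node % 2 == 0:
        return [node]
    return []


def first_name_from_email(email):
    """Assumption: the part before '@' looks like 'first.last' or 'first_last'."""
    local_part = email.split("@", 1)[0]
    return re.split(r"[._]", local_part, maxsplit=1)[0].capitalize()


# Hypothetical sample data, only for illustration.
sample = json.loads('{"a": [1, 2, {"b": 4, "c": [5, 6]}], "d": 8, "e": "10"}')
print(first_occurrences([3, 1, 3, 2, 1]))             # [3, 1, 2]
print(even_numbers(sample))                           # [2, 4, 6, 8]
print(first_name_from_email("john.doe@example.com"))  # John
```

Note that the string `"10"` is skipped because it is not a number in the JSON; whether numeric strings should count is worth clarifying with the interviewer.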
The interview was mostly practical and project-oriented, with a focus on PySpark, Databricks, CI/CD, and real-world scenarios.
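On the question about extra commas inside CSV values, one common answer is that such values need to be quoted in the file, and the reader then treats the quoted commas as data. Below is a rough sketch using Python's standard `csv` module with made-up sample rows. It only works when the file is actually quoted; unquoted stray commas need a different delimiter or custom parsing. In Spark the same idea is usually expressed through the `quote` and `escape` options of the CSV reader.

```python
import csv
import io

# Hypothetical file content: the address column contains commas but is quoted.
raw = (
    "id,name,address\n"
    '1,Asha,"12, MG Road, Pune"\n'
    '2,Ravi,"Flat 4, Sector 9"\n'
)

# csv.reader keeps commas that sit inside the quote character as part of the value.
rows = list(csv.reader(io.StringIO(raw), quotechar='"'))
header, records = rows[0], rows[1:]

for record in records:
    print(dict(zip(header, record)))
```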
u/Acceptable-Trash-420 — 6 days ago