u/Acceptable-Trash-420

I recently attended a face-to-face interview for a Data Engineering role at KPMG. Sharing the questions asked:

Short introduction
From a Python list with repeating items, get only the first occurrence.
Extract even numbers from a nested JSON.
How do we read Excel files in Databricks?
If CSV data has extra commas inside values, how do we handle it?
What is the most challenging task you have done?
Write SCD Type 2 logic in Spark.
Explain your day-to-day work.
How do you write tests in CI/CD pipelines?
What is flake8? I mentioned that we use the Black formatter, and they then asked why Black is used.
Explain CI/CD pipeline architecture. Why are pipelines written in YAML?
What is git stash?
Given an email column, extract the first name and create a new column.
Difference between Tables and Volumes in Databricks.
Difference between Serverless and Dedicated SQL Pools.
What are the libraries you have used most?
Difference between External and Managed tables. Where is the data stored in both?
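
For anyone preparing, here is a minimal sketch of how the three pure-Python questions (first occurrence in a list, even numbers from nested JSON, first name from an email) might be answered. The function names and sample data are illustrative assumptions, not from the interview, and the email helper assumes addresses of the form first.last@domain.

```python
import json
from typing import Any, Iterable, Iterator


def first_occurrences(items: Iterable[Any]) -> list:
    """Keep only the first occurrence of each item, preserving order."""
    # dict keys are unique and keep insertion order (Python 3.7+);
    # this assumes the items are hashable.
    return list(dict.fromkeys(items))


def even_numbers(node: Any) -> Iterator[int]:
    """Recursively yield even integers from parsed JSON (dicts, lists, scalars)."""
    if isinstance(node, dict):
        for value in node.values():
            yield from even_numbers(value)
    elif isinstance(node, list):
        for value in node:
            yield from even_numbers(value)
    # bool is a subclass of int in Python, so exclude True/False explicitly.
    elif isinstance(node, int) and not isinstance(node, bool) and node % 2 == 0:
        yield node


def first_name_from_email(email: str) -> str:
    """Assumes first.last@domain: take the text before the first dot of the local part."""
    local_part = email.split("@", 1)[0]
    return local_part.split(".", 1)[0].capitalize()


print(first_occurrences([3, 1, 3, 2, 1]))  # [3, 1, 2]

payload = json.loads('{"a": [1, 2, {"b": 4, "c": [5, 6]}], "d": 8, "e": true}')
print(list(even_numbers(payload)))  # [2, 4, 6, 8]

print(first_name_from_email("jane.doe@example.com"))  # Jane
```

In PySpark, the same email split could be applied column-wise with `pyspark.sql.functions.split` inside `withColumn`, though the exact expression depends on how the addresses are formatted.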

The interview was mostly practical and project-oriented, with a focus on PySpark, Databricks, CI/CD, and real-world scenarios.
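
On the question about extra commas inside CSV values, one common approach is to rely on quoting: if the values containing commas are wrapped in double quotes, a standard CSV parser keeps them as a single field. The sketch below uses Python's built-in `csv` module with made-up sample data; it assumes the source file is properly quoted, which may not hold for every real-world export.

```python
import csv
import io

# Hypothetical sample: the address values contain commas but are double-quoted.
raw = (
    "id,name,address\n"
    '1,Asha,"12 MG Road, Pune"\n'
    '2,Ravi,"Flat 4, Sector 9, Noida"\n'
)

# csv.DictReader honours double quotes by default, so embedded commas
# stay inside one field instead of splitting the row into extra columns.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row["id"], "|", row["address"])
```

Spark's CSV reader exposes similar behaviour through its `quote` and `escape` options. If the file is not quoted at all, the fix usually has to happen upstream or with custom parsing.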