u/Acceptable-Trash-420

KPMG Data Engineer Interview Experience (Face-to-Face | Azure + Databricks + PySpark)

Recently attended a face-to-face interview for a Data Engineering role at KPMG. Sharing the questions asked:

  1. Short introduction
  2. From a Python list with repeating items, get only the first occurrence.
  3. Extract even numbers from a nested JSON.
  4. How do we read Excel files in Databricks?
  5. If CSV data has extra commas inside values, how do we handle it?
  6. What is the most challenging task you have done?
  7. Write SCD Type 2 logic in Spark.
  8. Explain your day-to-day work.
  9. How do you write tests in CI/CD pipelines?
  10. What is flake8? I mentioned that we use the black formatter, so they asked why black is used.
  11. Explain CI/CD pipeline architecture. Why are pipelines written in YAML?
  12. What is git stash?
  13. Given an email column, extract the first name and create a new column.
  14. Difference between Tables and Volumes in Databricks.
  15. Difference between Serverless and Dedicated SQL Pools.
  16. What are the libraries you have used most?
  17. Difference between External and Managed tables. Where is the data stored in both?
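
For anyone practicing, here is a minimal plain-Python sketch of questions 2, 3, and 13. These are not the answers given in the interview. The function names are made up for illustration, and the email helper assumes addresses shaped like first.last@domain, which the interviewer did not specify.

```python
import re


def first_occurrences(items):
    """Q2: drop repeats, keeping each item's first occurrence in order.

    Assumes the items are hashable; dict keys preserve insertion order
    in Python 3.7+.
    """
    return list(dict.fromkeys(items))


def even_numbers(node):
    """Q3: recursively collect even integers from parsed JSON."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(even_numbers(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(even_numbers(value))
    # bool is a subclass of int, so exclude True/False explicitly
    elif isinstance(node, int) and not isinstance(node, bool):
        if node % 2 == 0:
            found.append(node)
    return found


def first_name_from_email(email):
    """Q13: take the text before '@', then the part before the first separator.

    Assumption: the local part looks like first.last (or first_last, first-last).
    """
    local_part = email.split("@", 1)[0]
    return re.split(r"[._+-]", local_part, maxsplit=1)[0].capitalize()


if __name__ == "__main__":
    print(first_occurrences([3, 1, 3, 2, 1]))
    print(even_numbers({"a": 2, "b": [1, 4, {"c": 6}], "e": "8"}))
    print(first_name_from_email("john.doe@example.com"))
```

In PySpark, question 13 would more likely be written with column functions such as `split` inside `withColumn` than with a Python function, but the splitting logic is the same.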

The interview was mostly practical and project-oriented, with a focus on PySpark, Databricks, CI/CD, and real-world scenarios.

reddit.com
u/Acceptable-Trash-420 — 6 days ago

Hi everyone,

The role at Stolt Nielsen is focused on Databricks, PySpark, SQL, and Azure data engineering, so I’m trying to understand what to expect in terms of:

  • Number of rounds
  • Level of difficulty (coding vs system design)
  • Focus areas (PySpark transformations, SQL optimization, pipelines, etc.)
  • Any real-world case studies or scenario-based questions
  • Overall interview experience

Also, if anyone has insights on the work culture or what they emphasize during interviews, that would really help.

Would appreciate any guidance or tips. Thanks in advance!

u/Acceptable-Trash-420 — 20 days ago