Celebal Technologies | Senior Data Engineer Interview Experience | 2 YoE



Round 1: Technical

✅ Databricks vs. Hadoop

📍 How is Databricks different from Hadoop?

✅ Scaling in Databricks

📍 What are the methods to scale in Databricks?

📍 How can you ensure autoscaling in a cluster?
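
One way to approach the autoscaling question is to define the cluster with a worker range rather than a fixed worker count. The sketch below assumes the Databricks Clusters API payload shape, where an `autoscale` block carries `min_workers` and `max_workers`. The helper name, cluster name, runtime version, node type, and worker counts are placeholder values chosen for illustration, not details from the interview.

```python
import json


def build_cluster_spec(name, min_workers, max_workers):
    """Return a cluster definition that autoscales between two worker counts."""
    if not 0 <= min_workers <= max_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",    # placeholder Azure node type
        # A worker range replaces a fixed "num_workers" value.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }


spec = build_cluster_spec("etl-autoscale-demo", 2, 8)
print(json.dumps(spec, indent=2))
```

The same range can be set in the cluster creation UI; the payload form is shown here only because it makes the two bounds explicit.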

✅ Spark Architecture

📍 Provide a walkthrough of Spark architecture.

✅ Cluster Transformations

📍 What types of transformations can a cluster handle?

📍 Explain the concept of lazy transformations.
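
Lazy evaluation can be illustrated without a Spark cluster. The sketch below is a plain-Python analogy, not the Spark API: `LazyDataset` is a hypothetical toy class that only records `map` and `filter` steps and runs them when the `collect` action is called, which mirrors how Spark defers transformations until an action.

```python
class LazyDataset:
    """Toy stand-in for a Spark DataFrame: records steps, runs them on an action."""

    def __init__(self, source, plan=()):
        self._source = source
        self._plan = plan  # tuple of (kind, function) steps, not yet executed

    def map(self, fn):
        # Transformation: returns a new dataset with one more recorded step.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, fn):
        # Transformation: also only recorded, nothing is computed here.
        return LazyDataset(self._source, self._plan + (("filter", fn),))

    def collect(self):
        # Action: the recorded plan is executed only now.
        rows = iter(self._source)
        for kind, fn in self._plan:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []


def double(x):
    calls.append(x)  # track when the function actually runs
    return x * 2


ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
print(len(calls))   # 0, because no action has been called yet
result = ds.collect()
print(result)       # [6, 8]
```

In real Spark the recorded plan is also optimized before execution, which this toy version does not attempt.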

✅ Adaptive Query Execution (AQE)

📍 What is AQE, and how does it enhance query performance?
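
As a starting point for an answer: AQE re-optimizes the query plan at runtime using shuffle statistics, for example by coalescing small shuffle partitions and splitting skewed join partitions. The sketch below lists the Spark SQL configuration keys commonly associated with it. The `apply_settings` helper is hypothetical, and a plain dict stands in for a Spark session so the snippet runs anywhere.

```python
# Spark SQL configuration keys related to Adaptive Query Execution.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def apply_settings(conf_set, settings):
    """Call conf_set(key, value) for every setting."""
    for key, value in settings.items():
        conf_set(key, value)


# With a live session this would be: apply_settings(spark.conf.set, AQE_SETTINGS)
# Here a dict stands in for the session configuration.
applied = {}
apply_settings(applied.__setitem__, AQE_SETTINGS)
print(applied)
```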

✅ Delta Live Tables (DLT)

📍 Explain DLT in Databricks.

📍 Provide the code flow for the DLT framework.

✅ Optimization in Databricks

📍 What are the key optimization techniques in Databricks?

✅ Pipeline Design

📍 How would you design a pipeline where new data arrives daily and historical data must be managed at the same time?
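
One common line of reasoning for this question is an idempotent upsert: each daily batch is merged into the historical table by key, so reruns and late corrections do not create duplicates. The sketch below shows that idea in plain Python under assumed column names (`id`, `amount`, `updated_on`); on Databricks the same logic would typically be expressed as a Delta Lake merge, which is not shown here.

```python
from datetime import date


def merge_daily_batch(history, batch):
    """Upsert batch rows into history keyed by 'id', keeping the newest 'updated_on'."""
    merged = dict(history)  # leave the input untouched
    for row in batch:
        current = merged.get(row["id"])
        if current is None or row["updated_on"] >= current["updated_on"]:
            merged[row["id"]] = row
    return merged


history = {
    1: {"id": 1, "amount": 100, "updated_on": date(2024, 1, 1)},
    2: {"id": 2, "amount": 50, "updated_on": date(2024, 1, 1)},
}
batch = [
    {"id": 2, "amount": 75, "updated_on": date(2024, 1, 2)},  # update to an existing row
    {"id": 3, "amount": 20, "updated_on": date(2024, 1, 2)},  # brand new row
]

result = merge_daily_batch(history, batch)
print(sorted(result))        # [1, 2, 3]
print(result[2]["amount"])   # 75
```

Because the merge is keyed and keeps the newest row, applying the same batch twice gives the same result, which makes reprocessing safe.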

✅ Data Flow vs. Control Flow

📍 What is the difference between data flow and control flow?

✅ Triggers in Azure Data Factory (ADF)

📍 What are the types of triggers available in ADF?

Round 2: Technical

✅ Connecting to Different Sources in ADF

📍 What are the basic steps to establish a connection with different sources in ADF?

✅ Integration Runtime (IR)

📍 What is the role of the Integration Runtime in ADF?

✅ Linked Service

📍 What is a Linked Service in ADF?

📍 Can a single Linked Service be used to connect two different Salesforce instances via IR?

✅ Dynamic Scheduling

📍 When pulling multiple Excel files over SFTP without a schedule trigger, how can you set up a trigger that fires whenever a file arrives?
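
A possible answer uses an ADF storage event trigger. The sketch below assumes the files are first landed in Azure Blob Storage or ADLS Gen2, since storage event triggers react to blob events rather than to an SFTP server directly. It builds a trigger definition in the `BlobEventsTrigger` JSON shape; the helper, trigger name, pipeline name, container, folder, and scope are all hypothetical placeholders.

```python
import json


def build_storage_event_trigger(name, pipeline, container, folder, scope):
    """Build an ADF storage event trigger that fires when an .xlsx blob is created."""
    return {
        "name": name,
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                "blobPathBeginsWith": f"/{container}/blobs/{folder}/",
                "blobPathEndsWith": ".xlsx",
                "ignoreEmptyBlobs": True,
                "scope": scope,  # resource ID of the storage account
                "events": ["Microsoft.Storage.BlobCreated"],
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline,
                        "type": "PipelineReference",
                    },
                    # Pass the arriving file's name and folder to the pipeline.
                    "parameters": {
                        "fileName": "@triggerBody().fileName",
                        "folderPath": "@triggerBody().folderPath",
                    },
                }
            ],
        },
    }


trigger = build_storage_event_trigger(
    name="OnExcelArrival",
    pipeline="IngestExcelFiles",
    container="landing",
    folder="excel",
    scope="/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
          "/providers/Microsoft.Storage/storageAccounts/<account-name>",
)
print(json.dumps(trigger, indent=2))
```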

✅ Databricks vs. ADF for Data Ingestion

📍 Databricks and ADF both support data ingestion. Which is preferred and why?

✅ Large Data Ingestion

📍 If ingesting 10 TB of data from on-premises, what tool or approach would you prefer?

✅ Pipeline Troubleshooting

📍 A long-running pipeline processes only 50 GB of data but takes over 6 hours. Where would you start troubleshooting as a data engineer?
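
One place such an investigation often starts is data skew, where a few partitions hold most of the rows and a handful of tasks dominate the runtime. The sketch below is a hypothetical helper, not part of any library: it takes per-partition row counts (made-up numbers here) and flags skew when the largest partition is far above the median. The threshold of 5 is an arbitrary assumption.

```python
from statistics import median


def skew_ratio(partition_rows):
    """Return largest partition size divided by the median partition size."""
    if not partition_rows:
        raise ValueError("partition_rows must not be empty")
    mid = median(partition_rows)
    return max(partition_rows) / mid if mid else float("inf")


def is_skewed(partition_rows, threshold=5.0):
    """Flag skew when the largest partition exceeds threshold times the median."""
    return skew_ratio(partition_rows) > threshold


# Made-up row counts; in PySpark they could come from grouping by
# pyspark.sql.functions.spark_partition_id() and counting.
partition_rows = [1000, 1100, 950, 1050, 48000]
ratio = skew_ratio(partition_rows)
print(round(ratio, 1))
print(is_skewed(partition_rows))
```

If the ratio is high, repartitioning or salting the hot key is a typical next step; if it is close to 1, attention usually shifts to source throughput, cluster sizing, or the query plan.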

Round 3: Technical

✅ PySpark and SQL

📍 Questions focused on PySpark and SQL problem-solving.


✅ Verdict: Selected