u/parkdrew

▲ 9 r/SQL

How should I update tables in Databricks?

I’m very unfamiliar with data engineering (I’m a junior data analyst), so any feedback would be appreciated. I have a setup in Databricks where I use Python scripts to ingest data from multiple SAP tables and land it in the bronze layer. Rows in these tables can be changed, added, or deleted, and we always want the latest version of each table.

We’ve gone through a few iterations of how we update our silver tables from bronze. At first we just called CREATE OR REPLACE TABLE, which overwrote all the data with fresh data on every run.
Then, we used MERGE INTO to make it more efficient for incremental changes.
Then, we used row-hash comparison in Python to update all the tables.
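For readers unfamiliar with the third approach, here is a minimal sketch of what a row-hash comparison does, written in plain Python rather than Spark so it runs anywhere. It is an illustration only: the helper names `row_hash` and `diff_by_hash`, the `id` key, and the sample columns are hypothetical and not taken from the actual pipeline.

```python
import hashlib


def row_hash(row, columns):
    """Hash the chosen columns of a row into a stable hex digest."""
    # Use a separator so ("ab", "c") and ("a", "bc") hash differently,
    # and a distinct marker so None is not confused with an empty string.
    payload = "\x1f".join("\x00" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_by_hash(bronze_rows, silver_rows, key, columns):
    """Return (inserts, updates, deletes) as sorted lists of key values."""
    bronze = {r[key]: row_hash(r, columns) for r in bronze_rows}
    silver = {r[key]: row_hash(r, columns) for r in silver_rows}
    inserts = sorted(k for k in bronze if k not in silver)
    updates = sorted(k for k in bronze if k in silver and bronze[k] != silver[k])
    deletes = sorted(k for k in silver if k not in bronze)
    return inserts, updates, deletes


# Hypothetical sample data: bronze is the fresh extract, silver the current table.
bronze = [
    {"id": 1, "name": "bolt", "qty": 10},
    {"id": 2, "name": "nut", "qty": 7},
    {"id": 4, "name": "washer", "qty": 3},
]
silver = [
    {"id": 1, "name": "bolt", "qty": 10},
    {"id": 2, "name": "nut", "qty": 5},
    {"id": 3, "name": "screw", "qty": 1},
]

inserts, updates, deletes = diff_by_hash(bronze, silver, "id", ["name", "qty"])
print(inserts, updates, deletes)  # [4] [2] [3]
```

Only the keys in the three lists would need to be written to silver; unchanged rows (here `id` 1) are skipped, which is the point of comparing hashes instead of overwriting everything.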
The tables don’t have many rows yet, with the largest having almost 1M rows. But we are constantly ingesting more tables as the project size grows.
Looking back now, maybe all the iterations were a waste of time, since the tables aren’t big enough for it to matter. We wanted to minimize SQL run time to keep costs down.
Those who are seasoned experts, what do you think?

reddit.com
u/parkdrew — 2 days ago

It's not like I hate it. I'm just lost. Maybe I'm slow and don't understand how much of the science in this game works. Maybe that lack of understanding just led to more confusion in the end.

I enjoyed it and was thrilled from the middle of the game through to the end. I couldn't wait to hop back on and uncover the remaining mysteries of the universe. Especially after seeing people on social media say the game changed their outlook on life, I was excited to reach the end myself. But for some reason, I don't think I developed the kind of connection with the game that other people did.

For instance, maybe it's because I don't really understand how the >! 22-minute mechanic works !< and >! what the eye really represents. !<

Maybe I'll learn to appreciate it more at a different point in my life. But I'm really glad many people here had very good experiences. Just wanted to vent and see if others experienced something similar.

u/parkdrew — 19 days ago
▲ 2 r/boeing

Hi, I'm curious whether anyone here applied to the Entry-level Data Analyst position in late March to early April and has heard back. I applied on April 1st, and my application still shows as being considered.

I know it can take some time for them to get back to applicants. Just curious whether anyone has already heard something.

u/parkdrew — 22 days ago