u/Silent_Damage_1156

Just got accepted as a Data Engineering intern at a startup with almost no guidance. I need advice on organizing the work and choosing a tech stack

Hi everyone,

I recently got accepted for a Data Engineering internship at a small startup, and I’m both excited and a bit overwhelmed. Since the company is still small, there isn’t any structured mentorship or established data infrastructure yet, so I’ll have to handle a lot of things independently.

I’ll basically be starting from scratch:

  • organizing raw data
  • building ETL pipelines
  • cleaning and preparing datasets
  • choosing the tech stack
  • deciding on storage/database solutions
  • selecting platforms/tools/workflows
  • preparing data for AI/model training later on

The data comes in many formats (PDFs, presentations, images, logs, documents, etc.), so I’m trying to figure out the best way to structure everything professionally from the beginning.

I have theoretical knowledge and some academic/project experience, but this is my first real-world experience where the work will actually be used in production, so I want to avoid bad decisions early on.

I’d really appreciate advice on:

  • how to organize the project from day one
  • recommended beginner-friendly but scalable tech stacks
  • tools/platforms you would choose if starting today
  • common mistakes to avoid in startup environments
  • how to document and manage everything properly
  • what skills I should prioritize learning first

Any advice, roadmap suggestions, GitHub repos, YouTube channels, or real-world tips would help a lot. Thanks!

reddit.com
u/Silent_Damage_1156 — 4 days ago
