u/Silent_Damage_1156

Hi everyone,

I recently got accepted for a Data Engineering internship at a small startup, and I’m both excited and a bit overwhelmed. Since the company is still small, there’s no structured mentorship or established data infrastructure yet, so I’ll have to handle a lot of things independently.

I’ll basically be starting from scratch:

organizing raw data
building ETL pipelines
cleaning and preparing datasets
choosing the tech stack
deciding on storage/database solutions
selecting platforms/tools/workflows
preparing data for AI/model training later on

The data comes in many formats (PDFs, presentations, images, logs, documents, etc.), so I’m trying to figure out the best way to structure everything professionally from the beginning.

I have theoretical knowledge and some academic/project experience, but this is my first real-world role where my work will actually be used in production, so I want to avoid bad decisions early on.

I’d really appreciate advice on:

how to organize the project from day one
recommended beginner-friendly but scalable tech stacks
tools/platforms you would choose if starting today
common mistakes to avoid in startup environments
how to document and manage everything properly
what skills I should prioritize learning first

Any advice, roadmap suggestions, GitHub repos, YouTube channels, or real-world tips would help a lot. Thanks!