Guys, I need some "Brutal Honesty" from people working in Data Science and Machine Learning, in Egypt or remotely.

I’ve finished two projects that I feel carry real weight in my portfolio, but I have a mental block and I need your opinion on it.

The projects I built:

Hybrid Recommender System (Amazon Dataset): I worked with half a million reviews (500k). I used Implicit ALS with TF-IDF. I ran into RAM problems and solved them with Sparse Matrices (CSR).
Fintech Fraud Detection: A banking-domain project. I handled the severe Class Imbalance and focused on F1-Score and Recall instead of misleading Accuracy.

So where’s the problem? I understand the logic and the math behind these models very well, and I understand the architecture, but the truth is that I relied heavily on AI as a "Pair Programmer". It helped me a lot with complex syntax, annoying library errors (like the Scipy index mapping), and boilerplate code.

If you asked me right now to write the CSR matrix mapping from scratch, completely on my own with no help, I’d probably fumble a bit or take a long time.

Is it a "Red Flag" for a Junior or Mid-level candidate to rely on AI for implementation, as long as they understand the logic and the fundamentals the model runs on?
How do I prove in a technical interview that I genuinely understand every line of code, and that it isn’t just "Copy-Paste" from GPT?
What’s the next step to become truly "Independent" and get over the fear of writing code on my own?

I want to know whether I’m ready to apply for a well-paid remote job, or whether I should slow down and study more first.

u/Grand-Squirrel3173 — 17 days ago

▲ 28 r/MLQuestions

Hey everyone,

I’ve just finished two solid projects for my portfolio, and I’m looking for some brutal honesty.

The Projects:

Hybrid Recommender (Amazon Dataset): Built a system for 500k reviews using Implicit ALS and TF-IDF. Handled memory issues with Sparse Matrices.

Fintech Fraud Detection: Solved extreme class imbalance using feature engineering and prioritized F1-Score/Recall over accuracy.
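To make the accuracy point concrete, here is a minimal sketch with made-up numbers (not taken from the actual project). It assumes a 1% fraud rate and compares a "model" that never flags anything against a hypothetical model that catches most fraud at the cost of some false alarms. The helper names `confusion_counts` and `metrics` are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1, guarding against 0/0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up data: 1,000 transactions, 10 of them fraudulent (1%).
y_true = [1] * 10 + [0] * 990

# A "model" that never flags anything.
always_legit = [0] * 1000

# A hypothetical model: catches 8 of 10 frauds, raises 12 false alarms.
useful = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

baseline = metrics(y_true, always_legit)
candidate = metrics(y_true, useful)

print(baseline)
print(candidate)
```

On this toy data the do-nothing baseline has slightly higher accuracy than the useful model while its recall and F1 are zero, which is why accuracy alone says little under heavy imbalance.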

I understand the logic, the math behind ALS, and why I chose my metrics. However, I used AI heavily as a "Pair Programmer" to handle complex syntax, library errors (like Scipy index mapping), and boilerplate code.

If you asked me to write the entire CSR matrix mapping from scratch without assistance, I’d probably struggle.
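For reference, the "CSR matrix mapping" boils down to two steps: mapping raw user and item IDs to dense 0-based indices, then packing the interactions into three arrays (`data`, `indices`, `indptr`). Below is a minimal pure-Python sketch of that idea. It is not the code from the project and not how Scipy implements it internally; the function names and the sample interactions are made up for illustration.

```python
def build_index(raw_ids):
    """Map each distinct raw ID to a dense 0-based index, in first-seen order."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index


def to_csr(interactions):
    """Build CSR arrays from (user_id, item_id, value) triples.

    Returns (data, indices, indptr, user_index, item_index).
    Duplicate (user, item) pairs are summed.
    """
    user_index = build_index(u for u, _, _ in interactions)
    item_index = build_index(i for _, i, _ in interactions)

    # Accumulate values per row so duplicates collapse into one entry.
    rows = [dict() for _ in user_index]
    for user, item, value in interactions:
        row = rows[user_index[user]]
        col = item_index[item]
        row[col] = row.get(col, 0.0) + value

    data, indices, indptr = [], [], [0]
    for row in rows:
        for col in sorted(row):        # keep column indices sorted per row
            indices.append(col)
            data.append(row[col])
        indptr.append(len(indices))    # row r lives in indptr[r]:indptr[r + 1]
    return data, indices, indptr, user_index, item_index


def row_items(r, data, indices, indptr):
    """Return the (column, value) pairs stored for row r."""
    start, end = indptr[r], indptr[r + 1]
    return list(zip(indices[start:end], data[start:end]))


# Made-up interactions: (user ID, product ID, rating or count).
interactions = [
    ("u_a", "chips", 5.0),
    ("u_b", "cola", 3.0),
    ("u_a", "salsa", 4.0),
    ("u_c", "chips", 2.0),
    ("u_a", "chips", 1.0),   # repeat interaction, summed with the first one
]

data, indices, indptr, user_index, item_index = to_csr(interactions)

# Reverse map: turn a column index from the model back into a product ID.
index_to_item = {idx: raw for raw, idx in item_index.items()}

print(data, indices, indptr)
print(row_items(user_index["u_a"], data, indices, indptr))
```

The same three arrays can be handed to Scipy as `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_users, n_items))`. Keeping the forward and reverse ID maps next to the matrix is what avoids the index-mapping errors mentioned above.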

Is it a "red flag" for a Junior/Mid candidate to rely on AI for implementation if they understand the underlying architecture?
How do I prove in a technical interview that I actually "get it"?
Based on these projects, what should be my next step to become truly "independent"?

u/Grand-Squirrel3173 — 17 days ago

▲ 0 r/MLQuestions

Hi everyone,

I’ve been grinding on two major end-to-end Machine Learning projects to build a solid portfolio, and I’d love to get some feedback from the senior engineers here on my progress and whether I’m "Remote-Ready."

Project 1: Hybrid Recommender System (Amazon Fine Food Dataset)

Goal: Build a scalable recommendation engine handling 500k+ reviews.

The Problem: Solving user-item sparsity and the "Cold Start" problem.
My Approach: I implemented a Hybrid System combining Collaborative Filtering (Implicit ALS) to capture latent user patterns and Content-Based Filtering (TF-IDF) on review summaries.
Key Engineering:
- Optimized memory usage by leveraging Scipy CSR Sparse Matrices.
- Handled data privacy using SHA-256 Hashing for User IDs.
- Managed complex indexing/mapping issues between raw data and the ALS model's latent factors.
Result: The model successfully recommends logically related items (e.g., suggesting varied snacks to a chips buyer) even with sparse interaction history.
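The post does not say how the two signals are combined, so the following is only one plausible sketch of a hybrid blend, not the project's actual method. It assumes each source produces a per-item score (collaborative scores from ALS, content scores from TF-IDF similarity), rescales both to [0, 1], and shifts weight toward content similarity for users with little history. The names `blend`, `top_k`, `full_trust_at` and all scores are hypothetical.

```python
def min_max(scores):
    """Rescale a {item: score} dict to [0, 1] so both sources are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def blend(cf_scores, content_scores, n_interactions, full_trust_at=10):
    """Weighted sum of collaborative and content scores.

    The collaborative weight grows linearly with the user's history and is
    0 for a brand-new user (cold start), so content similarity takes over.
    """
    alpha = min(n_interactions / full_trust_at, 1.0)
    cf, content = min_max(cf_scores), min_max(content_scores)
    items = set(cf) | set(content)
    return {
        item: alpha * cf.get(item, 0.0) + (1 - alpha) * content.get(item, 0.0)
        for item in items
    }


def top_k(scores, k):
    """Return the k best item IDs, highest score first (ties broken by ID)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Made-up scores for one user.
cf_scores = {"pretzels": 2.1, "cola": 1.4, "dog_food": 0.2}
content_scores = {"pretzels": 0.35, "popcorn": 0.60, "dog_food": 0.05}

new_user = top_k(blend(cf_scores, content_scores, n_interactions=0), k=2)
active_user = top_k(blend(cf_scores, content_scores, n_interactions=10), k=2)

print(new_user)
print(active_user)
```

With no history the ranking follows content similarity only, and with enough history it follows the collaborative scores. The linear ramp is an arbitrary choice; a tuned weight or a learned ranker could replace it.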

Project 2: Financial Risk Analytics (Fraud & Churn Detection)

Goal: Identifying rare fraudulent transactions and predicting customer churn.

The Problem: Dealing with extreme Class Imbalance (fraud cases are <1% of the data).
My Approach:
- Heavy Feature Engineering to extract behavioral patterns from financial logs.
- Focused on Precision-Recall Curves and F1-Score rather than Accuracy to ensure the business doesn't lose money on missed fraud.
- Used advanced classification models to balance sensitivity and specificity.
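As a rough illustration of what reading a Precision-Recall curve means in practice, the sketch below sweeps the decision threshold over a handful of made-up fraud scores and picks the threshold with the best F1. The data and the function names are hypothetical, and a real project would more likely use a library routine such as scikit-learn's `precision_recall_curve` than this hand-rolled loop.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as fraud."""
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, scores) if s < threshold and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # nothing flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def best_f1_threshold(y_true, scores):
    """Try each observed score as a threshold; return (best_f1, threshold)."""
    best_f1, best_threshold = 0.0, None
    for threshold in sorted(set(scores)):
        p, r = precision_recall_at(y_true, scores, threshold)
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        if f1 > best_f1:
            best_f1, best_threshold = f1, threshold
    return best_f1, best_threshold


# Made-up model outputs: 1 = fraud, score = predicted fraud probability.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.45, 0.60, 0.80, 0.95]

# Print one (threshold, precision, recall) point per candidate threshold.
for threshold in sorted(set(scores)):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")

best_f1, best_threshold = best_f1_threshold(y_true, scores)
print(best_f1, best_threshold)
```

Lowering the threshold catches more fraud (higher recall) at the price of more false alarms (lower precision). Which point on the curve is acceptable depends on the relative cost of a missed fraud versus a blocked legitimate transaction, so best F1 is only one possible selection rule.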

My Questions for the Community:

Architecture: For the recommender, is the ALS + TF-IDF hybrid still a strong baseline in production, or should I jump straight into Deep Learning (like Two-Tower models)?
Remote Readiness: Does a portfolio covering both Fintech (Structured/Tabular) and E-commerce (Text/NLP/Big Data) show enough versatility for a Junior/Mid-level remote position?
Next Steps: Should I focus on MLOps (FastAPI, Docker, monitoring) for these projects, or build a third project in a different domain (e.g., Computer Vision)?

I’m currently polishing the GitHub repos, but I wanted to hear your thoughts on the technical stack and project choice first.

Thanks for your time and feedback!

u/Grand-Squirrel3173 — 17 days ago