u/Cold_Bass3981

The “stalling phase” in every beginner project

If your project is stuck as a prototype, it’s likely you’re treating it more like a research paper than a product. What saved me a lot of time and energy was putting all my focus into the boring things most organisations actually want results on: cost, safety, and speed.

The details people need to keep being reminded of:

  • Token Economics. If you don't know exactly what a user session costs in API credits, you have a liability on your hands.
  • Latency Targets. A 10-second response time is just broken. If your initial response isn't under 2 seconds, your UX has failed.
  • Data Safety. Security is the foundational pipe that determines if you’re allowed to touch production data.

More often than not a 1% accuracy boost is useless if it adds 3 seconds to the latency.

  • Audit your token usage then build a dashboard to track spend per request.
  • Users will value a system that is 90% accurate and 100% reliable over a system that is 98% accurate but crashes twice a day.
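
As a rough sketch of the audit step above, per-request spend can be worked out from the token counts most APIs report back. The prices, the `RequestLog` fields, and the sample numbers are all placeholder assumptions, not real rates; swap in your provider's current pricing.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; replace with your provider's real rates.
PRICE_PER_M_INPUT = 0.50
PRICE_PER_M_OUTPUT = 1.50


@dataclass
class RequestLog:
    session_id: str
    input_tokens: int
    output_tokens: int


def request_cost(log: RequestLog) -> float:
    """Cost of one request in USD, from its token counts."""
    return (
        log.input_tokens * PRICE_PER_M_INPUT
        + log.output_tokens * PRICE_PER_M_OUTPUT
    ) / 1_000_000


def cost_per_session(logs: list[RequestLog]) -> dict[str, float]:
    """Sum request costs by session, the number a dashboard would chart."""
    totals: dict[str, float] = {}
    for log in logs:
        totals[log.session_id] = totals.get(log.session_id, 0.0) + request_cost(log)
    return totals


# Made-up sample logs for illustration.
logs = [
    RequestLog("a", input_tokens=2_000, output_tokens=500),
    RequestLog("a", input_tokens=1_000, output_tokens=200),
    RequestLog("b", input_tokens=4_000, output_tokens=1_000),
]
session_costs = cost_per_session(logs)
```

Feeding `session_costs` into whatever charting tool you already use is enough for a first dashboard.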
reddit.com
u/Cold_Bass3981 — 3 days ago

RAG is 90% Data Engineering

If your RAG system is giving you bad answers, it's usually because your data pipeline is a mess. 

Most beginners spend weeks trying to write the perfect prompt, but don't realize that how you cut up your data (Semantic Chunking) matters much more. If you split a paragraph in the middle of a vital sentence, the AI loses the meaning.

When the system tries to find information later, it pulls back a broken fragment instead of a clear fact.

  • Clean your data first. Use a tool like Unstructured to strip out the junk (headers, footers, weird table artifacts) from your PDFs or HTML files before they ever reach the model.
  • Smart Chunking. Instead of a blind character count, try cutting text where the topic changes. This keeps the context together so the AI can actually "understand" the relationship between facts.
  • Use a framework like Ragas to score your answers. This gives you a hard number to look at so you know for sure if your system is getting smarter or dumber.
  • Re-ranking. Add a step (like a Cross-Encoder) that checks whether the information the retriever found is actually relevant to the question before you let the model generate an answer.

If you cannot prove your system improved after a change, you're just guessing.
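
To make the "clean your data first" step concrete, here is a minimal sketch in plain Python (not Unstructured's API). It drops lines that repeat on most pages, which is how headers and footers usually show up in extracted text. The 0.6 threshold and the sample pages are assumptions for illustration.

```python
from collections import Counter


def strip_repeated_lines(pages: list[str], min_share: float = 0.6) -> list[str]:
    """Remove lines that appear on at least `min_share` of pages (likely headers/footers)."""
    if len(pages) < 2:
        return pages  # nothing to compare against, so leave a single page alone

    # Count each distinct line once per page.
    counts = Counter()
    for page in pages:
        counts.update({line.strip() for line in page.splitlines() if line.strip()})

    threshold = min_share * len(pages)
    boilerplate = {line for line, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        kept = [
            line for line in page.splitlines()
            if line.strip() and line.strip() not in boilerplate
        ]
        cleaned.append("\n".join(kept))
    return cleaned


# Made-up pages: the first and last line of each one is boilerplate.
pages = [
    "ACME Corp Confidential\nRefunds are issued within 30 days.\nPage footer",
    "ACME Corp Confidential\nShipping takes 5 business days.\nPage footer",
    "ACME Corp Confidential\nSupport is open on weekdays.\nPage footer",
]
cleaned = strip_repeated_lines(pages)
```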

u/Cold_Bass3981 — 4 days ago

If you didn't measure, you've already failed

The common trap you can fall into as an engineer is what I call Vibe Based engineering. You write a prompt, it looks pretty good at first, and you just assume you're finished. I’m guilty of this. And then you wonder why the 4:00 AM emergency arises and the AI starts to hallucinate like hell for all your users.

Evaluation comes first. Period. Before you even write your first line of code.

Building an AI system is like building a bridge. You don't just put some bricks down and hope it holds up under the pressure of a car. You calculate the stress points first. In AI, those stress points are your eval sets: a simple list of 20 to 50 tricky questions that your AI must answer correctly every time. 

When you change a setting or a prompt, you run these tests to see if you made things better or just different.

The starter kit I use and recommend to everyone:

  • The Librarian Phase (RAG). Break PDFs into chunks and store them in a Vector Database (like Pinecone or pgvector). If the AI can't find the right info, it can't give the right answer.
  • The Judge Phase (Evals). Use tools like Promptfoo or Ragas to grade your AI. You can even use a stronger model (like GPT-4o or Claude 3.5 Sonnet) to grade your smaller, production model.
  • The Mechanic Phase (Observability). Once your AI is live, you need to see what’s happening under the hood. Tools like LangSmith or Langfuse let you trace exactly where a conversation went wrong so you can fix it.

Before your next deployment, create a Golden Set of 20 Q&A pairs. Run your prompt through this set and manually grade the faithfulness. If you can't hit 90% accuracy on this small set, your system isn't ready for a thousand users.
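
A Golden Set run can be as small as the sketch below. It is not Promptfoo or Ragas; the substring check is a crude stand-in for manual faithfulness grading, and `fake_answer`, the questions, and the expected facts are hypothetical.

```python
from typing import Callable

# A tiny golden set; a real one would hold 20+ hand-checked pairs.
GOLDEN_SET = [
    {"question": "What is the refund window?", "must_contain": "30 days"},
    {"question": "Which plan includes SSO?", "must_contain": "Enterprise"},
    {"question": "What is the API rate limit?", "must_contain": "100 requests"},
]


def run_golden_set(answer_fn: Callable[[str], str], golden_set: list[dict]) -> float:
    """Return the share of questions whose answer contains the expected fact."""
    passed = 0
    for case in golden_set:
        answer = answer_fn(case["question"])
        if case["must_contain"].lower() in answer.lower():
            passed += 1
    return passed / len(golden_set)


def fake_answer(question: str) -> str:
    """Hypothetical stand-in for the real pipeline (prompt + model call)."""
    canned = {
        "What is the refund window?": "Refunds are accepted within 30 days.",
        "Which plan includes SSO?": "SSO is part of the Enterprise plan.",
        "What is the API rate limit?": "I am not sure.",
    }
    return canned[question]


accuracy = run_golden_set(fake_answer, GOLDEN_SET)
ready_to_ship = accuracy >= 0.90  # the 90% bar from the post
```

Rerun the same set after every prompt or setting change and compare `accuracy` to the previous run.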

u/Cold_Bass3981 — 4 days ago
▲ 2 r/AIEngineeringMastery+1 crossposts

Day 1 Reality of AI Engineering

When I first dipped my toe into AI engineering, I thought I was going to be writing complex algorithms.

But I just spent 80% of my time figuring out how to clean up messy text files and 20% of my time trying to figure out why the AI was ignoring my instructions.

If you’re starting today, the most important shift you can make is realizing that you're building a system.

Think about the last time you tried to find a specific receipt in a shoebox. You didn't need a genius to find it; you just needed a better way to organize the box. That’s what most AI engineering is. We take a giant shoebox of data (PDFs, emails, Slack logs) and we turn it into something a computer can navigate.

If you want to build AI, these things always apply:

  • The Filing Cabinet (Vector DBs). Learn how to store information so the AI can find it by meaning rather than just keywords. If I search for frozen treats, a good system finds ice cream even if the exact words don't match.
  • The Filter (RAG). This is the bread and butter of the job. It’s the process of grabbing the right piece of info and handing it to the AI at the right time.
  • The Stress Test (Evaluation). This is the part everyone forgets. You have to learn how to prove your AI is getting better. If you change one line of code, did the answers get smarter or just weirder?

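To stop shoebox data problems, implement a basic preprocessing script using LangChain’s RecursiveCharacterTextSplitter. Don't dump raw text; split it into meaningful chunks (e.g., 500-1000 tokens) with about 10% overlap, so a sentence cut at a chunk boundary still shows up whole in the neighboring chunk. Note that the splitter measures chunk_size in characters by default, so use a token-based length function if you want token counts.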
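
To show what the overlap buys you, here is a minimal plain-Python sketch. It is not LangChain's implementation: whitespace-separated words stand in for tokens, and `chunk_with_overlap` is a hypothetical helper.

```python
def chunk_with_overlap(text: str, chunk_size: int = 500, overlap_ratio: float = 0.10) -> list[str]:
    """Split text into windows of `chunk_size` words, each sharing ~10% with the previous one."""
    words = text.split()  # assumption: whitespace words approximate tokens
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# 25 dummy words, windows of 10 with 1 word of overlap.
sample = " ".join(f"w{i}" for i in range(25))
chunks = chunk_with_overlap(sample, chunk_size=10, overlap_ratio=0.10)
```

Each chunk starts with the last word of the previous one, so content at a boundary appears on both sides.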

u/Cold_Bass3981 — 5 days ago

Building AI? I’d start here.

I get asked this question at least once per week: "Do I need a PhD in math to become an AI Engineer?" The answer is NO.

A few years ago, you had to understand the deep, complex calculus behind how an AI thinks just to get it to say hello. Today, building with AI is much more like playing with LEGOs. The pieces are already made for you; your job is to learn how to snap them together to build something useful.

If you’re starting from zero today, here is the no-panic roadmap:

  • Learn the Logic of Language. Before you write a single line of code, learn how to talk to the AI. If you can't get a clear answer out of a chatbot, you won't be able to build a tool that does it for others.
  • Pick up Basic Python. Python is the glue of the AI world. You don't need to be a world-class coder, but you do need to know how to move data from point A to point B.
  • Understand the Lego Pieces. Learn what a Model is (the brain), what a Vector DB is (the filing cabinet), and what an API is (the phone line that connects them).

The mistake I see beginners make is spending six months studying linear algebra before they ever build an app. Don't do that. Build a tiny, dumb app first. Make it fail. Then figure out why it failed.

Start by building a "Hello World" RAG app using Streamlit and OpenAI’s API. Focus on the integration logic: getting a user query to fetch a text snippet and return an answer. Mastering the data flow is 10x more valuable for a junior engineer than mastering the underlying calculus.

u/Cold_Bass3981 — 7 days ago

When you treat AI like Google search

Most people use AI the same way they use a search engine. They type in a short, two-word phrase like "marketing plan" or "recipe ideas" and then get frustrated when the AI gives them a generic, boring answer. The secret to getting better results is in changing how you view the AI entirely.

Instead of seeing it as a search box, try treating it like a brilliant but very literal intern. If you hired a new intern and just said "make a marketing plan," they would have no idea who your customers are, what your budget is, or what you have tried before.

They would probably come back with something totally useless. But if you sat that intern down and said, "I’m launching a new brand of organic dog treats for city-dwellers, my budget is $500, and I want a three-week plan for Instagram," you would get something much closer to what you need.

The tech world calls this Prompt Engineering, but that is just a fancy way of saying "being a good communicator." The more context you give, the more the AI can help you.

How to get better results today:

  • Define the Persona: Tell the AI exactly who it is supposed to be (e.g., Act as a Senior Content Strategist).
  • Give Constraints: Set a budget, a word count, or a specific tone of voice to follow.
  • Provide Context: Instead of "write an email," try "write a follow-up email to a client who hasn't replied to my $2,000 quote in three days."
  • Use Few-Shot Prompting: Give the AI 2 or 3 examples of the style or format you want. It's the fastest way to get it to match your "vibe" without writing a novel of instructions.

Next time you use an LLM, try the "Role-Task-Format" framework. Define the Role (Expert Coder), the Task (Refactor this Python function), and the Format (Output a markdown code block with comments). You'll notice an immediate jump in output quality.
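
If you call a model from code, the Role-Task-Format idea can live in one small helper. This is a minimal sketch; `build_prompt` and the section labels are my own hypothetical choices, not a standard.

```python
def build_prompt(role: str, task: str, output_format: str, context: str = "") -> str:
    """Assemble a prompt with explicit Role, Task, and Format sections."""
    sections = [
        f"Role: {role}",
        f"Task: {task}",
        f"Format: {output_format}",
    ]
    if context:
        # Optional extra material, such as the code to refactor.
        sections.append(f"Context:\n{context}")
    return "\n\n".join(sections)


prompt = build_prompt(
    role="Expert Python developer",
    task="Refactor this function for readability without changing behavior.",
    output_format="A single markdown code block with comments.",
    context="def f(x): return [i for i in x if i%2==0]",
)
```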

u/Cold_Bass3981 — 9 days ago

when clients don't value you more as an engineer

painful lesson #6666

I worried about deep math for so f****** long and over engineering my agent to look more impressive in front of my clients (vanity metric). looking back now it was just wasted time.

what I'm doing now with clients is paying attention to the things that would worry my previous boss: how much the AI costs to run, how to keep user data safe, and how to make the app fast.

these are the boring details that most people brush off, but make no mistake they are important when you are trying to ship a product. if you cannot solve these basic underlying problems, your project will never leave the testing phase. this is what I saw my other fellow engineers get credited for

start by auditing your token usage per request and setting hard latency targets (e.g., < 2s for initial response). building a simple dashboard to track these metrics is more valuable to a stakeholder than a slightly better accuracy score on a theoretical dataset.
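
a rough sketch of the latency half of that audit: time each call, then check the 95th percentile against the budget. the sample latencies are made up, and `timed_call` / `p95` are hypothetical helper names.

```python
import math
import time
from typing import Callable


def timed_call(fn: Callable[[], object]) -> float:
    """Run `fn` once and return the wall-clock latency in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def p95(latencies: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    ordered = sorted(latencies)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


LATENCY_BUDGET_S = 2.0  # the "< 2s for initial response" target

# Hypothetical recorded latencies in seconds; in practice, collect them with timed_call().
samples = [0.8, 1.1, 0.9, 1.4, 3.2, 1.0, 1.2, 0.7, 1.3, 1.6]
within_budget = p95(samples) <= LATENCY_BUDGET_S
```

looking at p95 instead of the average keeps one slow outlier like the 3.2s call from hiding behind nine fast ones.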

when I shifted my focus to the boring ass plumbing, the parts that handle data and cost, I became much more valuable in my clients' eyes. companies want a system that is secure and cheap enough to run every day.

thought I'd share, so you don't make the same painful mistake.
don't know if anyone else can relate?

u/Cold_Bass3981 — 9 days ago

getting past the text only bottleneck with multimodal??

I’m curious if anyone else has been doing this.

My limit on building with AI used to be the text box. If I had a broken sink or a buggy UI, for the love of god, I’d have to write a whole paragraph to explain it. That translation layer is mostly gone, praise the lord.

The models process images, audio, and video directly, and it's changing how I build tools. AI finally handles raw context without a human in the loop to describe it.

This is what I’m doing right now. Thought I’d share.

  • Visual Debugging. Upload a raw UI screenshot to GPT-4o or Claude 3.5 Sonnet. It can identify layout shifts and suggest a CSS fix immediately. This is much faster than when I would manually describe a bug in a ticket.
  • Audio-to-Data. Use Whisper to pipe messy voice notes into a structured JSON schema. This turns unstructured speech into data your backend can actually use for logs or field reports.
  • Multimodal RAG. Index your visual assets alongside your text. Add captions and visual descriptions to the vector database so the search engine understands both the technical documentation and the actual schematics.

To be honest, treating the model as a partner that processes raw input, rather than a chat box, flippin helped. I stopped wasting my time on prompting and put all my focus on solving the underlying problem.

u/Cold_Bass3981 — 11 days ago

wtf is an AI Agent??

The term AI Agent is everywhere lately, and it usually makes people feel like they’re missing out on some complex secret. However, the reality is much simpler than the tech world makes it sound.

A normal chatbot is basically just talk. It can give you great advice, explain things really well, and answer almost any question, but it can’t actually do anything. You tell it to organize your spreadsheet and it’ll tell you how… but it can’t open the file or make any changes.

An AI Agent is different. It can take action.

Give it the right tools and it can actually use them. For example, if you ask an agent to plan a trip, it doesn’t just list hotels. It can check real flight prices, look at your calendar, compare dates, and even draft the emails for you. If one step doesn’t work, it tries another way until the task is done.

In simple terms:

Chatbots talk.

Agents do the work.

Here’s how to start building them:

  • Tool Calling (Function Calling): This is what gives the agent hands. Using OpenAI or Anthropic APIs, you give the model access to specific functions like checking the weather, querying your database, sending emails, etc. The model decides when to use them.
  • Reasoning Loops (ReAct): Instead of asking once and getting one answer, you run a loop: the agent thinks → takes an action → sees the result → thinks again. This lets it fix its own mistakes if something goes wrong.
  • Start small: Don’t try to build an all-powerful assistant right away. Begin with one clear purpose, like a Calendar Optimizer or Expense Tracker. It’s way easier to build, test, and make reliable.
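
The tool-calling and think → act → observe ideas above fit in a few lines. This is a sketch, not the OpenAI or Anthropic API: `scripted_model` is a hypothetical stand-in for a real LLM call, and both tools are fake.

```python
from typing import Callable


# Tools the agent is allowed to call; both are hypothetical stand-ins.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


def add_event(title: str) -> str:
    return f"Added '{title}' to calendar"


TOOLS: dict[str, Callable[[str], str]] = {"get_weather": get_weather, "add_event": add_event}


def scripted_model(history: list[str]) -> dict:
    """Stand-in for a real LLM call: picks the next step from what it has seen so far."""
    if not any(h.startswith("get_weather") for h in history):
        return {"tool": "get_weather", "arg": "Lisbon"}
    if not any(h.startswith("add_event") for h in history):
        return {"tool": "add_event", "arg": "Beach trip"}
    return {"final": "Weather checked and trip scheduled."}


def run_agent(model: Callable[[list[str]], dict], max_steps: int = 5) -> tuple[str, list[str]]:
    """Think -> act -> observe loop, capped so a confused model cannot run forever."""
    history: list[str] = []
    for _ in range(max_steps):
        decision = model(history)  # think
        if "final" in decision:
            return decision["final"], history
        tool = TOOLS.get(decision["tool"])
        if tool is None:
            history.append(f"{decision['tool']} -> error: unknown tool")
            continue
        result = tool(decision["arg"])  # act
        history.append(f"{decision['tool']} -> {result}")  # observe
    return "Stopped: step limit reached.", history


answer, trace = run_agent(scripted_model)
```

The `max_steps` cap matters more than it looks: an agent left looping against a paid API keeps billing you until something stops it.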
u/Cold_Bass3981 — 12 days ago

The one thing that actually gets you hired in AI Engineering

The people landing the best offers are not the ones with the most complicated or impressive code. They’re the ones who have the clearest proof that they can build something useful.

If you’re job hunting right now, stop spending all your time on super complex projects that only you understand. Instead, focus on building something simple that recruiters can click, play with, and instantly get.

For me, the project that got the most attention was a simple RAG tool: you upload a PDF, ask a question, and it highlights the exact sentence it used for the answer. Nothing flashy, but a recruiter could open the link on their phone and see it working in under 30 seconds. That alone did most of the talking for me.

A lot of us overthink this. We pour weeks into fancy backend stuff nobody will ever see, while the actual demo looks messy or hard to use. I only realized my mistake when I started prioritizing reliability and ease of use over raw complexity. A clean, working tool that someone can try immediately beats a sophisticated but broken notebook every single time.

Here’s what seems to work:

  • Make the UI simple and clean. Use Streamlit, Gradio, or Vercel to turn your script into something recruiters can click without any setup. If they can’t open it and try it, it basically doesn’t exist for them.
  • Solve one small, clear problem. Instead of building a giant all-in-one AI assistant, make something specific like a Legal Contract Summarizer or a GitHub README Generator. Specificity shows you understand how to solve real problems.
  • Show your work behind the scenes. Add a public link to your LangSmith or Arize Phoenix traces so they can see that you actually monitor and care about how the app performs. It quietly proves you think like an engineer.

Bottom line: If you want your LinkedIn messages to start getting replies, build one reliable app, make it look decent, deploy it publicly, and let people play with it.

u/Cold_Bass3981 — 13 days ago

I used to treat evaluation like a deep-cleaning day. Something I only did once a month when I had extra time. Predictably, that meant I was shipping code that broke on edge cases I could have caught in minutes if I just had a repeatable process.

Now, I don't hit deploy without running a minimalist 5-minute check. It’s not a full research benchmark, but it catches the retrieval misses that account for the vast majority of production failures.

My eval stack starts with a "20-Question Golden Set." I stopped trying to build 500-question datasets because, for a v1, you only need 20 high-quality rows. I divide them into four buckets:

  • 5 "Happy Path": Standard questions the model should nail.
  • 5 "Multi-Hop": Requires connecting info from different parts of a document.
  • 5 "Edge Cases": Specific details found in things like footnotes or tables.
  • 5 "Negative Cases": Questions where the answer is intentionally missing from the context.

To grade these, I use an LLM-as-a-Judge prompt with a small, fast model (like Llama 3 or Phi-3.5). I have the judge extract every factual claim and check if it’s directly supported by the source context. If a claim is unsupported, it's flagged as a hallucination.

I track two specific Ship/No-Ship Metrics:

  1. Faithfulness Rate (>90%): The AI can't lie more than once in ten tries.
  2. Abstention Accuracy (100%): This is the hard rule. If the AI tries to answer a "Negative Case" instead of saying it doesn't know, the deploy is dead.
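
Once the judge has graded each row, the two gates reduce to a few lines. This is a sketch under assumptions: the judge call itself is not shown, the `GradedCase` fields are hypothetical, and I'm assuming faithfulness is computed over the 15 answerable rows only.

```python
from dataclasses import dataclass


@dataclass
class GradedCase:
    bucket: str      # "happy", "multi_hop", "edge", or "negative"
    faithful: bool   # judge found every claim supported by the context
    abstained: bool  # model said it did not know


def ship_decision(cases: list[GradedCase]) -> dict:
    """Apply the two gates: faithfulness > 90% and 100% abstention on negative cases."""
    answerable = [c for c in cases if c.bucket != "negative"]
    negatives = [c for c in cases if c.bucket == "negative"]

    faithfulness = sum(c.faithful for c in answerable) / len(answerable)
    abstention = sum(c.abstained for c in negatives) / len(negatives)

    return {
        "faithfulness": faithfulness,
        "abstention": abstention,
        "ship": faithfulness > 0.90 and abstention == 1.0,
    }


# Made-up grades: one unfaithful multi-hop answer, one negative case answered anyway.
cases = (
    [GradedCase("happy", True, False)] * 5
    + [GradedCase("multi_hop", True, False)] * 4
    + [GradedCase("multi_hop", False, False)]
    + [GradedCase("edge", True, False)] * 5
    + [GradedCase("negative", True, True)] * 4
    + [GradedCase("negative", False, False)]
)
report = ship_decision(cases)
```

Here faithfulness clears the bar at 14 out of 15, but the single answered negative case blocks the deploy on its own.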

This simple ritual has saved me from at least three "how did this happen?" meetings in the last month alone. If your model tries to be "helpful" by making up an answer to a question it can't solve, you need to tighten the system instructions before your users find those hallucinations for you.

u/Cold_Bass3981 — 19 days ago

I used to spend hours writing massive, obsessive system prompts for my RAG apps. I’d have ten different refusal examples, "never do X," "always check Y," and a whole paragraph of the model role-playing as a "safe and truthful assistant." 

It looked impressive in the code, but the second a real user tried a basic jailbreak, the model would just fold. I was playing a game of whack-a-mole with my own instructions, adding 50 words every time a hallucination slipped through until the prompt became a novel the model started ignoring anyway.

I only broke that cycle when I started treating prompt engineering like a technical constraint rather than a creative writing exercise. I leaned into structured prompting patterns to move away from "be helpful" and toward "follow these exact logic gates." 

Now, I use one simple pattern for 90% of my builds. I slap an 8-line guardrail template at the end of every prompt that forces the model to answer ONLY using the provided context and to reply with a specific "not enough information" string if the context is missing.

The secret sauce is forcing the model to quote 1-3 verbatim sentences from the source before answering. By making the AI "prove its work" with no paraphrasing allowed, you kill 80% of hallucinations instantly. 

It’s not a 100% fix, but it replaced nearly all of my custom guardrail code with eight lines of text. When I tested it against 20 jailbreak attempts last week, it refused 95% of them. It turns out that a reliable system doesn't need a longer prompt; it just needs a stricter structure.

Next time you see your RAG app hallucinating, resist the urge to add "please be more accurate" to your prompt. Instead, add a rule that requires a verbatim quote from the source before the answer. If the model can't find a quote, it can't invent a lie.
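
The quote rule can also be enforced in code after the model replies. The sketch below is not the 8-line template from this post (that one isn't shown here); it is a hypothetical post-check that assumes the prompt tells the model to wrap its quotes in straight double quotes.

```python
import re


def extract_quotes(answer: str) -> list[str]:
    """Pull out every double-quoted span from the model's answer."""
    return re.findall(r'"([^"]+)"', answer)


def normalize(text: str) -> str:
    """Collapse whitespace so line wraps do not break an otherwise exact match."""
    return " ".join(text.split())


def quotes_are_verbatim(answer: str, context: str, min_quotes: int = 1) -> bool:
    """True only if the answer has enough quotes and each appears word for word in the context."""
    quotes = extract_quotes(answer)
    if len(quotes) < min_quotes:
        return False
    haystack = normalize(context)
    return all(normalize(q) in haystack for q in quotes)


# Made-up context and answers for illustration.
context = "Refunds are issued within 30 days of purchase. Shipping is free over $50."
good = 'Quote: "Refunds are issued within 30 days of purchase." So the window is 30 days.'
bad = 'Quote: "Refunds are issued within 90 days." So the window is 90 days.'
```

If `quotes_are_verbatim` returns False, the app can fall back to the "not enough information" string instead of showing the answer.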

u/Cold_Bass3981 — 21 days ago

I used to be all-in on cloud APIs. For any side project, I’d just grab an OpenAI or Anthropic key and not think twice. It was convenient. No worrying about VRAM, super fast responses, and I could spin something up in minutes.

But that “pay-as-you-go” comfort slowly turned into real pain.

Last month one of my small RAG tools that I built for a few friends racked up $120 in API costs. Then an experimental agent I left running in a loop hit $450. That was the moment I opened a spreadsheet and realized I was basically burning money every time someone used my stuff.

The numbers that really shocked me were pretty simple:

A single RAG query on something like GPT-4o-mini costs around $0.0005. Sounds tiny, right? But once you scale to a million queries, that becomes a $500 monthly bill for what’s supposed to be a side project.

Now compare that to running a quantized Llama-3.1-8B locally on a 4090. For those same million queries, you’re probably looking at just $15–30 in electricity and normal hardware wear.
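
Spelled out as arithmetic, using only the estimates from this post (hardware purchase price is not part of these figures):

```python
CLOUD_COST_PER_QUERY = 0.0005     # USD per query, the GPT-4o-mini estimate above
LOCAL_COST_RANGE = (15.0, 30.0)   # USD per million queries, electricity and wear estimate above
QUERIES = 1_000_000

cloud_bill = CLOUD_COST_PER_QUERY * QUERIES

# Difference between cloud and local for the same million queries.
savings_low = cloud_bill - LOCAL_COST_RANGE[1]   # against the high local estimate
savings_high = cloud_bill - LOCAL_COST_RANGE[0]  # against the low local estimate
```

That works out to a $500 cloud bill against $15 to $30 locally, a gap of roughly $470 to $485 per million queries.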

Even at a more realistic 200k tokens per month, the cloud bill was hitting $50 while the local setup cost me barely $10. And the best part? My latency went from about 2 seconds waiting on the cloud to under 0.5 seconds locally.

These days I still use Claude 3.5 Sonnet when I’m in the early prototyping phase and I need that really strong reasoning. But the moment a project starts getting real users or higher volume, I move it over to a local model.

The freedom feels good. No more rate limits, full privacy, and zero surprise bills at the end of the month.

If you’re tired of watching your cloud costs creep up, try tracking your token usage for just one week. If you’re spending more than $50 a month on inference for stuff that a 7B or 8B model can handle decently, it might be worth thinking about running things locally instead of renting compute forever.

Has anyone else made the switch from cloud to local and actually stuck with it?

u/Cold_Bass3981 — 22 days ago
▲ 6 r/AiBuilders+1 crossposts

I’ve been auditing quite a few RAG codebases lately, and it’s surprising how often the hallucinations creep in even when the setup looks decent on paper.

A lot of the trouble starts with chunking. People are still breaking documents into fixed-size pieces with no overlap whatsoever. That means a sentence can get sliced right down the middle, or an important qualifying detail ends up in a completely different chunk. The model doesn’t get the full picture, so it ends up guessing to make the answer hang together.

I’ve tried switching to splitting on actual sentences and adding something like 100 tokens of overlap. It’s a small tweak, but it gives the model complete thoughts instead of fragments. In the cases I tested, it reduced a good chunk of those made-up answers pretty quickly.
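
Here is roughly what sentence-based splitting with overlap looks like. It is a simplified sketch: the regex splitter is naive, words stand in for tokens, and the overlap is counted in sentences rather than the ~100 tokens mentioned above.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_chunks(text: str, max_words: int = 200, overlap_sentences: int = 1) -> list[str]:
    """Pack whole sentences into chunks; each new chunk repeats the tail of the previous one."""
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        size = sum(len(s.split()) for s in current)
        if current and size + len(sentence.split()) > max_words:
            chunks.append(" ".join(current))
            # Carry the last sentence(s) forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


# Made-up text; note the qualifier in the second sentence.
text = (
    "Refunds take 30 days. They apply only to annual plans. "
    "Monthly plans are excluded. Contact support to start one."
)
chunks = sentence_chunks(text, max_words=12, overlap_sentences=1)
```

No chunk ends mid-sentence, and the qualifier about annual plans travels with the sentences on both sides of it.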

Another issue that shows up a lot is missing metadata filtering. The retriever just grabs any chunks that seem related, even if they come from totally different documents or sections. 

You might get one piece from the beginning of a report and another from way later, and the model tries to stitch them together. That almost always leads to invented connections that weren’t in the original material.

Putting in basic filters, like keeping everything tied to the right filename or section header, helps keep the context focused and relevant. It’s not fancy, but it stops a lot of that mixing-and-matching nonsense.
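
A minimal sketch of that filter-then-rank order, in plain Python rather than any specific vector database's API. The `Chunk` fields and the scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    text: str
    source: str    # filename the chunk came from
    section: str   # nearest section header
    score: float   # similarity score from the retriever (made-up values below)


def retrieve(chunks: list[Chunk], source: str, section: Optional[str] = None, top_k: int = 3) -> list[Chunk]:
    """Filter by metadata first, then rank by score, so unrelated documents never mix."""
    candidates = [
        c for c in chunks
        if c.source == source and (section is None or c.section == section)
    ]
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]


chunks = [
    Chunk("Q3 revenue rose 12%.", "report_2024.pdf", "Finance", 0.82),
    Chunk("Q3 revenue fell 4%.", "report_2023.pdf", "Finance", 0.85),
    Chunk("Headcount grew to 120.", "report_2024.pdf", "People", 0.60),
]
hits = retrieve(chunks, source="report_2024.pdf")
```

Without the filter, the 2023 chunk would rank first on score alone and the model would be handed two contradictory revenue figures.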

On top of that, most projects don’t test properly. Throwing in a line like “be accurate” in the prompt doesn’t do much in practice. What actually helps is putting together a small set of real questions (maybe 20 or so) that you know the correct answers for, then using another LLM to judge whether the generated response sticks faithfully to the retrieved sources. 

Without that kind of check, it’s hard to know if your system is really solid or just lucky on the easy cases.

When it comes down to it, making RAG reliable has less to do with picking the newest model and more to do with cleaning up these everyday parts: better ways to split the text, smarter retrieval rules, and honest evaluation that catches problems early.

If your RAG starts hallucinating on a question, my first move now is to look at the chunk boundaries. If a key fact is split between two chunks, the model never really had everything it needed, so it’s no wonder it starts filling in the blanks.

Have any of you dealt with hallucinations that were tricky to track down? What fixed it for you?

u/Cold_Bass3981 — 23 days ago

Look, when those 2 million-token context windows dropped earlier this year, I thought RAG was dead. I was like, “Why am I still chunking documents and building vector databases when I can just throw 50 PDFs into one prompt and be done?”

So I tried it for a week straight. Big mistake.

Yeah, the model can technically read everything, but its attention drifts like crazy, and the reasoning still falls apart. It starts missing important parts, especially in the middle.

I also ran into latency issues, waiting 40–45 seconds for every single response. Users hated it, and honestly, I got tired of it too.

So I went back to a hybrid setup. Use RAG to quickly grab the 10 most relevant chunks, then feed just those into the large context window for the actual reasoning. Boom! Responses dropped to ~2 seconds, with way better accuracy.

What I realized is that it’s not “RAG vs. long context.” It’s “use RAG so you don’t dump garbage into that long context.”

Even with massive windows, a little smart filtering still wins. Old-school retrieval keeps the AI fast and actually focused.

If you’re thinking about stuffing your whole codebase or a bunch of docs into one prompt… do yourself a favor and run a quick “needle in a haystack” test first. If the model starts missing details in the middle, you already know you still need retrieval.
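
A quick version of that test looks something like this. It's a sketch: `forgetful_model` is a hypothetical stand-in that only "reads" the start and end of the prompt, so swap in your real model call, and the filler text and secret code are made up.

```python
from typing import Callable


def build_haystack(filler: str, needle: str, depth: float, total_sentences: int = 200) -> str:
    """Place `needle` at a relative `depth` (0.0 = start, 1.0 = end) inside repeated filler."""
    position = round(depth * total_sentences)
    sentences = [filler] * total_sentences
    sentences.insert(position, needle)
    return " ".join(sentences)


def needle_test(
    ask_model: Callable[[str], str],
    needle: str,
    expected: str,
    depths: tuple[float, ...] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> dict[float, bool]:
    """Ask the same question with the fact buried at each depth; record whether it was found."""
    results = {}
    for depth in depths:
        prompt = build_haystack("The sky was grey that day.", needle, depth)
        prompt += "\n\nQuestion: What is the secret code?"
        results[depth] = expected in ask_model(prompt)
    return results


def forgetful_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model: it only sees the first and last 30% of the prompt."""
    cut = int(len(prompt) * 0.3)
    visible = prompt[:cut] + prompt[-cut:]
    return "The code is 7421." if "7421" in visible else "I could not find a code."


results = needle_test(forgetful_model, "The secret code is 7421.", "7421")
```

With this stand-in, the fact is found at every depth except 0.5, which is the "lost in the middle" pattern to look for with a real model.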

What do you guys think: still going all-in on long context, or keeping RAG in the mix?

u/Cold_Bass3981 — 23 days ago
