u/Expensive-Insect-317

Controlling BigQuery jobs from dbt (priority, concurrency, timeouts & cost governance)

I just published a new article where I explore how to better control BigQuery jobs when using dbt, focusing on:

  • Job priority management
  • Concurrency control
  • Timeout strategies
  • Cost governance in BigQuery + dbt workflows

If you're working with dbt + BigQuery in production, this might help you avoid runaway costs and better structure workloads.
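
To make the four points concrete, here is a minimal sketch of how they could map onto a dbt BigQuery target. It is not taken from the article. It assumes the dbt-bigquery profile keys `priority`, `threads`, `job_execution_timeout_seconds` and `maximum_bytes_billed`; the helper name `build_bigquery_target`, the default values and the project/dataset names are hypothetical. The dict mirrors what would sit under `outputs` in `profiles.yml`.

```python
import json

# Job priorities accepted by the dbt-bigquery "priority" setting.
ALLOWED_PRIORITIES = {"interactive", "batch"}


def build_bigquery_target(
    project: str,
    dataset: str,
    priority: str = "batch",
    threads: int = 4,
    timeout_seconds: int = 600,
    max_gib_billed: int = 50,
) -> dict:
    """Hypothetical helper: return a dbt BigQuery target dict with job guardrails."""
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(ALLOWED_PRIORITIES)}")
    if threads < 1 or timeout_seconds < 1 or max_gib_billed < 1:
        raise ValueError("threads, timeout_seconds and max_gib_billed must be positive")
    return {
        "type": "bigquery",
        "method": "oauth",
        "project": project,
        "dataset": dataset,
        # Priority: batch jobs wait for idle capacity instead of running at once.
        "priority": priority,
        # Concurrency: dbt runs at most this many models (and so jobs) in parallel.
        "threads": threads,
        # Timeout: dbt gives up on a query that runs longer than this.
        "job_execution_timeout_seconds": timeout_seconds,
        # Cost: BigQuery rejects a query that would bill more bytes than this.
        "maximum_bytes_billed": max_gib_billed * 1024**3,
    }


if __name__ == "__main__":
    # Placeholder project and dataset names, for illustration only.
    print(json.dumps(build_bigquery_target("my-project", "analytics"), indent=2))
```

Keeping the limits in one validated place makes it harder for a new target to ship without a byte cap or a timeout. The numbers are placeholders and should be tuned per workload.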

medium.com
u/Expensive-Insect-317 — 5 days ago
▲ 1 r/data

Guardrails in LLM Agents: Why They’re a System Design Problem, Not Just Prompts

I recently read this article on guardrails in LLM agents and it made me rethink how we’re building production AI systems.

The core idea is that guardrails are not just “safety filters”, but actual system architecture:

  • Input validation layers
  • Context and memory control
  • Output verification
  • Tool execution boundaries
  • Observability and auditability
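
The layers above can be sketched as ordinary code that wraps the model rather than as prompt text. This is my own toy illustration, not the article's design: the class `GuardedAgent`, the tool names, the size limits and the `sk-` secret pattern are all hypothetical, and a real system would use much stronger checks at every layer.

```python
import re
from dataclasses import dataclass, field

MAX_INPUT_CHARS = 2000                              # hypothetical input size limit
ALLOWED_TOOLS = {"search_docs", "get_weather"}      # hypothetical tool allow-list
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")  # toy pattern for leaked keys


class GuardrailError(Exception):
    """Raised when a request is stopped by a guardrail."""


@dataclass
class GuardedAgent:
    audit_log: list = field(default_factory=list)

    def _record(self, stage: str, detail: str) -> None:
        # Observability and auditability: every decision leaves a trace.
        self.audit_log.append((stage, detail))

    def check_input(self, text: str) -> str:
        # Input validation layer: reject empty or oversized input.
        cleaned = text.strip()
        if not cleaned or len(cleaned) > MAX_INPUT_CHARS:
            self._record("input", "rejected")
            raise GuardrailError("input is empty or too long")
        self._record("input", "accepted")
        return cleaned

    def trim_context(self, turns: list, max_turns: int = 10) -> list:
        # Context and memory control: keep only the most recent turns.
        kept = turns[-max_turns:]
        self._record("context", f"kept {len(kept)} of {len(turns)} turns")
        return kept

    def call_tool(self, name: str, tools: dict, *args):
        # Tool execution boundary: only allow-listed tools may run.
        if name not in ALLOWED_TOOLS:
            self._record("tool", f"blocked:{name}")
            raise GuardrailError(f"tool '{name}' is not allowed")
        self._record("tool", f"allowed:{name}")
        return tools[name](*args)

    def check_output(self, text: str) -> str:
        # Output verification: redact anything that looks like a secret.
        redacted = SECRET_PATTERN.sub("[REDACTED]", text)
        self._record("output", "redacted" if redacted != text else "clean")
        return redacted
```

The design point is that each check lives outside the model and runs whether or not the prompt asked for it, and the audit log makes blocked actions visible afterwards.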

What stood out to me is the framing that as models get more capable, guardrails become more important, not less, because greater capability increases the impact of a failure.

medium.com
u/Expensive-Insect-317 — 5 days ago

This Agent Skills Framework idea is really interesting. The concept of a middle layer for modular, reusable agent capabilities feels like a step toward more structured and scalable AI systems rather than prompt-heavy setups.

medium.com
u/Expensive-Insect-317 — 26 days ago
▲ 14 r/data+2 crossposts

I just read an interesting article about using Apache Spark not only to transform data but also to enforce data contracts within pipelines.

The key idea: the problem isn't that jobs fail, but that they don't fail when they should. The pipelines keep running even though the data might be corrupted, which leads to silent errors.

The proposal:

  • Define contracts (schema, quality, SLAs)
  • Validate them at runtime with Spark
  • Fail on critical errors and monitor the rest
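
The "fail on critical, monitor the rest" step above can be sketched as follows. To stay runnable without a cluster, this uses plain Python dicts as a stand-in for Spark rows, so it is not the article's Spark implementation. The names `Rule`, `ContractViolation`, `validate` and `ORDER_CONTRACT`, along with the sample columns, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the row satisfies the rule
    critical: bool                 # critical rules stop the pipeline


class ContractViolation(Exception):
    """Raised when a row breaks a critical rule of the contract."""


def validate(rows: list, rules: list) -> dict:
    """Raise on the first critical failure; count non-critical failures per rule."""
    warnings: dict = {}
    for index, row in enumerate(rows):
        for rule in rules:
            if rule.check(row):
                continue
            if rule.critical:
                raise ContractViolation(f"row {index} broke critical rule '{rule.name}'")
            # Non-critical: keep going, but record it so it can be monitored.
            warnings[rule.name] = warnings.get(rule.name, 0) + 1
    return warnings


# Hypothetical contract for an orders dataset.
ORDER_CONTRACT = [
    Rule("order_id is int", lambda r: isinstance(r.get("order_id"), int), critical=True),
    Rule(
        "amount is non-negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
        critical=True,
    ),
    Rule("country present", lambda r: bool(r.get("country")), critical=False),
]
```

With Spark the same split would presumably run as column-level checks over a DataFrame, but the control flow stays the same: critical violations abort the job, while the rest are counted and sent to monitoring.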

This turns pipelines into systems that enforce quality rather than just moving data.

If you don't validate your data within the pipeline, you're relying on assumptions.

u/Expensive-Insect-317 — 19 days ago