r/bigquery

A workspace that unifies AI SQL generation, BigQuery execution, and visualization into a single flow.

Hey everyone,

While AI has sped up writing BigQuery SQL, the actual workflow around it is still heavily fragmented.

For most data teams, the process currently looks like this: prompt an external LLM, copy the SQL, paste it into the BQ console, fix the schema errors, run the query, and then export the results to a BI tool like Looker Studio or Tableau just to visualize it.

We built Dataki.ai to eliminate that context switching. It’s a unified workspace designed specifically to bridge the gap between AI, BigQuery, and your dashboards.

How it works:

  • Schema-Aware Generation: Dataki connects directly to your BigQuery environment. The AI understands your actual tables and schemas, which drastically reduces hallucinations.
  • Auto-Visualization: When a query runs, the output is automatically mapped to interactive visualizations. No manual axis mapping required.
  • Full Code Control: The platform doesn't hide the code. The generated SQL is fully exposed in the editor for your team to tweak, optimize, and review.
  • Instant Dashboards: You can pin any chart or table directly into a live dashboard without leaving the platform, then share it with your team.
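To make the "schema-aware" idea concrete for anyone curious, here is a minimal sketch of how table metadata could be folded into a prompt so a model only sees real tables and columns. This is not Dataki's implementation; the `schema_to_prompt` helper, the `Schema` shape, and the prompt wording are all hypothetical and only illustrate the general technique.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: table name -> list of (column name, BigQuery type).
Schema = Dict[str, List[Tuple[str, str]]]


def schema_to_prompt(schema: Schema, question: str) -> str:
    """Render table metadata plus a question into one text prompt."""
    lines = ["You may only use the tables and columns listed below.", ""]
    # Sort tables so the same schema always produces the same prompt.
    for table, columns in sorted(schema.items()):
        cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
        lines.append(f"TABLE {table} ({cols})")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Calling `schema_to_prompt({"shop.orders": [("id", "INT64"), ("total", "NUMERIC")]}, "Revenue by month?")` yields a prompt that lists `shop.orders` with its two columns ahead of the question. In a real setup the schema dictionary would come from the warehouse's metadata rather than being typed by hand.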

Why we're posting:

Dataki is currently in beta and completely free to use.

We are looking for unvarnished feedback from data engineers and analysts who live in BigQuery (or any other supported data source). We want to know how the platform handles your real-world workflows and, more importantly, where it breaks down when you throw complex schemas or nested arrays at it.

If your team is looking to streamline the AI-to-BI pipeline, you can try it out here: dataki.ai

We'll be in the comments to answer any technical questions or hear your feedback.

u/fgatti — 2 days ago
▲ 12 r/bigquery+1 crossposts

Cost effective setup for decentralized users with BigQuery as the data warehouse

I work at a national healthcare organization where health facilities submit patient data through an in-house system. We then have an ELT pipeline that takes the raw data from this system to BigQuery. Data is cleaned weekly by national-level analysts, either within BQ using SQL or in RStudio using the bigrquery package, depending on the analyst's preference for each dataset. Both raw and clean datasets are stored in BigQuery.

To ensure uniform numbers between national and sub-national levels (the level between our national office and the health facility), we want to make the clean data accessible to analysts working at the sub-national office. There are 20 sub-national offices. National and sub-national analysts use the clean data to make weekly static reports, dashboards, and ad hoc reports per request.

Is it cost-effective to provide BQ access at the sub-national level? Or should we put the clean data in separate storage, like Cloud SQL? We use GCP infrastructure, so we are limited to Google services.
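For anyone weighing the same question, a back-of-the-envelope estimate of on-demand query spend can help frame it. The sketch below assumes on-demand (per-byte-scanned) pricing; the default price per TiB, the weeks-per-month factor, and the function name `monthly_query_cost` are assumptions for illustration, not figures from this post, so check the current pricing for your region. It also ignores any free tier, storage costs, and BI tool caching.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def monthly_query_cost(bytes_scanned_per_query: int,
                       queries_per_week: int,
                       offices: int,
                       price_per_tib_usd: float = 6.25,   # assumed price, verify for your region
                       weeks_per_month: float = 4.33) -> float:
    """Estimate monthly on-demand query spend in USD across all offices."""
    queries_per_month = queries_per_week * weeks_per_month * offices
    tib_scanned = queries_per_month * bytes_scanned_per_query / TIB
    return round(tib_scanned * price_per_tib_usd, 2)
```

With 20 offices, you would plug in a typical bytes-scanned figure from your own job history (the clean tables' size is an upper bound per full scan) and a guess at weekly query volume. If the result is small, direct BQ access with partitioned tables and per-user quotas may be simpler than maintaining a second copy of the data.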

reddit.com
u/anonyuser2023 — 4 days ago

First time building a Data Warehouse — going with BigQuery + PostgreSQL for a client-facing app

Hi all, first post here :)!

I've been heads-down designing our company's first real Data Warehouse for the past few months and honestly it's been equal parts exciting and overwhelming. Thought I'd throw our setup out here and see if anyone's been through something similar.

Quick background: we're a mid-sized company in Mexico trying to stop living in spreadsheets and actually centralize our data. We have three main sources — an on-prem ERP (Microsip, probably not well known outside MX), HubSpot for CRM, and Shopify for e-commerce. The idea is to consolidate everything into a Medallion architecture (Bronze/Silver/Gold) and have one actual source of truth.

Worth mentioning — we're not dealing with massive scale here. About 10GB built up over 5 years of operations. Not exactly big data, I know. But we've been burned before by building things that don't scale, so we're trying to do this right from the start even if it feels like overkill right now.

There are two things we need this to do: feed internal dashboards and reporting, and also power a client-facing portal where our customers can log in and see their purchase history, warranty info, product suggestions, promotions — basically a unified view of everything across the three platforms.

What we're thinking stack-wise:

BigQuery as the core warehouse handling all the Medallion layers and BI stuff. Then Cloud SQL for PostgreSQL as a serving layer for the app — because from what I've read and tested, hitting BigQuery directly for a customer portal with concurrent users is just not a great idea latency-wise.

We'd sync the relevant Gold-layer data over to Postgres and serve the app from there. Still figuring out the sync mechanism, leaning toward Datastream or just a scheduled pipeline.
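For the scheduled-pipeline option, the Postgres side usually comes down to an idempotent upsert so a rerun does not duplicate rows. Here is a minimal sketch that builds a parameterized `INSERT ... ON CONFLICT` statement; the helper name `build_upsert_sql` and the example table and columns are hypothetical, and table/column names are interpolated directly, so they must come from trusted config rather than user input.

```python
from typing import Sequence


def build_upsert_sql(table: str,
                     columns: Sequence[str],
                     key_columns: Sequence[str]) -> str:
    """Build a parameterized PostgreSQL upsert for one serving table."""
    non_key = [c for c in columns if c not in key_columns]
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    conflict = ", ".join(key_columns)
    if non_key:
        # Overwrite non-key columns with the incoming row's values.
        updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in non_key)
        action = f"DO UPDATE SET {updates}"
    else:
        # Nothing to update when every column is part of the key.
        action = "DO NOTHING"
    return (
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict}) {action}"
    )
```

For example, `build_upsert_sql("portal.purchases", ["order_id", "customer_id", "total"], ["order_id"])` produces a statement that can be passed to a Postgres driver's `executemany` with rows read from the Gold layer. The target table needs a unique constraint on the key columns for `ON CONFLICT` to apply.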

Where I'm still lost:

Is BQ → PostgreSQL actually the move here or is there a cleaner pattern I'm missing?

Do you sync full Gold models to the serving layer or build separate denormalized tables just for the app?

Anyone dealt with on-prem ERPs in a setup like this? That's honestly our biggest headache right now.

CDC vs scheduled batch for the sync — how much does it matter for a portal like this?

And genuinely curious — given we're only at 10GB, is there anything in this stack you'd simplify or replace with something lighter?

Any experience will be helpful, thanksss!

reddit.com
u/Comfortable_Bus_9781 — 4 days ago
▲ 5 r/bigquery+1 crossposts

Datastream - MySQL to BigQuery

Hello Everyone!

I basically want to replicate data from my Cloud SQL instance to BigQuery. The problem is that the initial load is expensive, so I'm going to use a dump for that and only want the real-time changes to be captured.

I want it to create the empty datasets and tables in BigQuery automatically, without the initial historical data. Any other solution?

reddit.com
u/OkRock1009 — 5 days ago
▲ 12 r/bigquery+1 crossposts

dbt + BigQuery = perfect match

I've been using dbt and BigQuery for a while and the duo is just wow.
Curious to know what you're using dbt with?

reddit.com
u/ouhaddaoualid — 6 days ago
▲ 6 r/bigquery+2 crossposts

BigQuery - larger dataset issue

Has anyone had an issue when trying to fetch 20k+ records from BigQuery into a Postgres DB? Everything works fine if I keep it under 10k, using Table Input + SQL, but as soon as I try more records the pipeline fails with an odd Java error message. Ultimately, I am looking to move around 500k records from BQ to Postgres.
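One generic workaround for this kind of failure, whatever the underlying cause turns out to be, is to move the rows in fixed-size batches instead of one large fetch. The sketch below is tool-agnostic and only shows the batching step; the helper name `in_batches` and the 10k batch size are assumptions, and it does not reproduce the Table Input step mentioned above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items from rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        # islice pulls the next chunk lazily, so the full result set
        # never has to sit in memory at once.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch can then be written to Postgres in its own insert and commit, so 500k rows become 50 writes of 10k, a size already known to work here.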

reddit.com
u/zadrogasauce — 12 days ago

Is BigQuery late to the AI game?

I've used BigQuery for a few years now and this past year I've seen so many different AI tools that help with everything from text-to-SQL to actually building reports and other features.

On one hand I understand they make their bread and butter from the actual warehouse and processing, but as a user I would've liked to see more AI features integrated into the product. The new Gemini features work alright but feel like an afterthought: there's no way to build reports or visualizations, integrate with messaging apps, or connect your context and semantic layers.

That was one of the reasons why I joined Bruin as a Developer Advocate recently: I wanted to be involved in building tools that address the stuff I wished I had as a data engineer. We just made our AI data analyst generally available. It connects to any warehouse, like BigQuery, imports the metadata of your datasets, and creates a mental map of your data. You can also connect your dbt, Airflow, Dagster, or Bruin pipeline repos to add additional context about your models.

The whole point is to have an agent that lives right inside your team and acts like a team member - from answering quick questions to preparing reports and even troubleshooting data & pipeline issues.

I was quite skeptical at first but we have dozens of clients using it and the more they use it the better the agent gets because it is self-correcting - every conversation and every correction further improves the context.

While I'm speaking about Bruin here, this is the general blueprint and framework for any organization to build themselves an AI data agent that does more than just text-to-SQL.

reddit.com
u/uncertainschrodinger — 11 days ago