u/uncertainschrodinger


[OC] Wikipedia AI referenced articles growth since

Sources: Wikipedia MediaWiki Action API, Wikipedia Vital Articles / Level 4

Tools: Bruin CLI, BigQuery, Bruin DAC

Methodology

Universe. Two tiers (14,004 articles total, 11 top-level subjects, 110 sub-subjects). Tier 1: Wikipedia Vital Articles / Level 4 - 9,907 curated articles across all 11 subjects. Tier 2: 4,097 WikiProject Top/High-importance articles from Companies, Brands, Computing, Internet culture, and Business - added only to Society and social sciences (+2,735) and Technology (+1,362) to compensate for those areas being under-represented in Vital L4. Vital takes priority on collision.
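A minimal sketch of the two-tier merge, assuming each Tier 2 row already arrives tagged with one of the two extension subjects. The `Article` record, its fields, and `build_universe` are hypothetical names for illustration, not the pipeline's schema; the only rule taken from the methodology is that Vital L4 wins when a title appears in both tiers.

```python
from dataclasses import dataclass

# Subjects allowed to receive the WikiProject Top/High extension (from the methodology).
EXTENSION_SUBJECTS = {"Society and social sciences", "Technology"}


@dataclass(frozen=True)
class Article:
    # Hypothetical record shape; the real staging schema is not shown in the post.
    title: str
    subject: str
    sub_subject: str
    tier: int  # 1 = Vital Articles / Level 4, 2 = WikiProject Top/High


def build_universe(vital, wikiproject):
    """Merge both tiers keyed by title; Vital L4 wins on collision."""
    universe = {a.title: a for a in vital}
    for a in wikiproject:
        if a.subject not in EXTENSION_SUBJECTS:
            continue  # guard: Tier 2 only extends the two under-represented subjects
        universe.setdefault(a.title, a)  # keep the Vital entry if the title already exists
    return universe


vital = [Article("Artificial intelligence", "Technology", "Computing", 1)]
extension = [
    Article("Artificial intelligence", "Technology", "Computing", 2),
    Article("OpenAI", "Technology", "Companies", 2),
]
universe = build_universe(vital, extension)
print(len(universe), universe["Artificial intelligence"].tier)
```

Using `setdefault` keeps the collision rule in one place: whichever tier is loaded first stays, so loading Vital first is what gives it priority.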

AI seed list. 48 curated AI-topic articles spanning foundations (Artificial intelligence, Machine learning, Neural network, Deep learning, Supervised/Unsupervised/Self-supervised learning), architectures (Transformer, CNN, RNN, GAN, Diffusion model, Attention, LSTM), modern systems (LLM, GPT-3, GPT-4, ChatGPT, Claude, Gemini, LLaMA, BERT, Stable Diffusion, DALL-E, Midjourney, Generative AI, Foundation model), companies (OpenAI, Anthropic, DeepMind, Hugging Face), sub-fields (NLP, Computer vision, RL, Speech recognition, Symbolic AI, Machine translation, Robotics, Expert system), and cultural/policy (AI alignment, safety, ethics, AGI, existential risk, technological singularity, regulation, AI winter). Each canonical title is expanded with its current redirect aliases.
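One way to do the alias expansion is the MediaWiki Action API's `prop=redirects` module, which lists the pages that redirect to each requested title; `titles` accepts up to 50 titles per request for regular clients, so 48 seeds fit in one query. The sketch only builds the request parameters and parses a `formatversion=2` JSON response. The HTTP call, `continue` pagination for long redirect lists, and the helper names are assumptions for illustration, not the pipeline's code, and the sample response is hand-written.

```python
def redirect_params(titles):
    """Query parameters for MediaWiki's prop=redirects (send as a GET to /w/api.php)."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "redirects",
        "titles": "|".join(titles),  # up to 50 titles per request for regular clients
        "rdnamespace": "0",          # only main-namespace redirects
        "rdlimit": "max",
    }


def alias_map(response):
    """Map every alias (and the canonical title itself) to its canonical title.

    Expects a formatversion=2 response; 'continue' pagination is not handled here.
    """
    aliases = {}
    for page in response.get("query", {}).get("pages", []):
        canonical = page["title"]
        aliases[canonical] = canonical
        for redirect in page.get("redirects", []):
            aliases[redirect["title"]] = canonical
    return aliases


# Hand-written response in the formatversion=2 shape (not real API output).
sample = {
    "query": {
        "pages": [
            {"ns": 0, "title": "Artificial intelligence",
             "redirects": [{"ns": 0, "title": "AI"}]},
            {"ns": 0, "title": "ChatGPT"},
        ]
    }
}
print(alias_map(sample))
```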

Snapshots. 14 semiannual snapshots at fixed dates (December 1 and May 1, Dec-2019 through May-2026). For each (article × date), the MediaWiki Action API returns the closest revision at or before the target date; body wikilinks (regex-extracted from wikitext, excluding namespace, self, and anchor links) are intersected with the AI alias list to count "AI references".
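The per-(article × date) lookup maps onto `prop=revisions` with `rvlimit=1`, `rvdir=older`, and `rvstart` set to the snapshot timestamp, which returns the newest revision at or before that moment (this enumeration mode only accepts a single title per request). A minimal sketch that generates the 14 snapshot dates and the query parameters for one article; treating each snapshot as 00:00 UTC on the target date, and the helper names, are assumptions.

```python
from datetime import datetime, timezone

FIRST = datetime(2019, 12, 1, tzinfo=timezone.utc)
LAST = datetime(2026, 5, 1, tzinfo=timezone.utc)


def snapshot_dates(first=FIRST, last=LAST):
    """May 1 and December 1 of each year, clipped to [first, last]."""
    dates = []
    for year in range(first.year, last.year + 1):
        for month in (5, 12):
            d = datetime(year, month, 1, tzinfo=timezone.utc)
            if first <= d <= last:
                dates.append(d)
    return dates


def revision_params(title, at):
    """Newest revision of `title` at or before `at` (single-title mode)."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "titles": title,
        "rvlimit": "1",
        "rvdir": "older",  # enumerate from rvstart towards older revisions
        "rvstart": at.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",
    }


dates = snapshot_dates()
print(len(dates), dates[0].date(), dates[-1].date())
print(revision_params("Chess", dates[0])["rvstart"])
```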

Pipeline. Raw scrapes -> staging joins -> subject/sub-subject/article aggregates. This dashboard queries staging.wat_ai_reference_counts directly. All assets run via Bruin CLI on BigQuery; the dashboard renders via Bruin DAC.
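To make the aggregation step concrete, here is a sketch of rolling per-article counts up to a (subject, snapshot) share. It assumes the plotted metric is the percentage of articles with at least one AI reference (the eligibility filter's wording suggests this), but the actual columns of `staging.wat_ai_reference_counts` are not given, and the rows below are made up. In the real pipeline this step would be SQL on BigQuery; Python is used only for illustration.

```python
from collections import defaultdict


def subject_shares(rows):
    """rows: (subject, snapshot, title, ai_ref_count) tuples.

    Returns {(subject, snapshot): % of articles with at least one AI reference}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for subject, snapshot, _title, ai_refs in rows:
        key = (subject, snapshot)
        totals[key] += 1
        if ai_refs >= 1:
            hits[key] += 1
    return {key: 100.0 * hits[key] / totals[key] for key in totals}


# Made-up rows for illustration only.
rows = [
    ("Technology", "2019-12", "Chess engine", 0),
    ("Technology", "2019-12", "Search engine", 2),
    ("Technology", "2026-05", "Chess engine", 1),
    ("Technology", "2026-05", "Search engine", 3),
]
print(subject_shares(rows))
```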

Limitations & caveats

Slicing & filtering. Gainer charts rank by absolute percentage-point gain since Dec 2019, not relative growth; the sub-subject chart shows the top 8 only. Both gainer charts and every small-multiples panel apply the same eligibility filter: n>=20 articles AND >=1 AI-referencing article at the latest snapshot. The 20-article floor avoids small-denominator noise (e.g. a 2-article sub-subject swinging to 50% on a single edit). Small-multiples panels show up to 7 sub-subjects (top by article count); panels with sparse AI uptake show fewer (History 2; Everyday life and Geography 3; Mathematics 4; Arts and Physical sciences 5; Biology & health 6).
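A sketch of the gainer ranking and eligibility filter, with a worked example of why the 20-article floor matters: a 2-article sub-subject going from 0 to 1 AI-referencing article gains 50 percentage points from a single edit. The `Group` fields and helper names are hypothetical, and a group's article count is assumed to be the same at every snapshot (the universe is fixed).

```python
from dataclasses import dataclass


@dataclass
class Group:
    # Hypothetical shape: n is assumed constant across snapshots (fixed universe).
    name: str
    n: int            # articles in the (sub-)subject
    ai_baseline: int  # AI-referencing articles at Dec 2019
    ai_latest: int    # AI-referencing articles at the latest snapshot


def pp_gain(g):
    """Absolute percentage-point gain, not relative growth."""
    return 100.0 * g.ai_latest / g.n - 100.0 * g.ai_baseline / g.n


def top_gainers(groups, k=8, min_n=20):
    """Apply the eligibility filter (n >= 20 and >= 1 AI article now), then rank."""
    eligible = [g for g in groups if g.n >= min_n and g.ai_latest >= 1]
    return sorted(eligible, key=pp_gain, reverse=True)[:k]


groups = [
    Group("Tiny", n=2, ai_baseline=0, ai_latest=1),   # +50 pp from one edit
    Group("Mid", n=40, ai_baseline=2, ai_latest=6),   # 5% -> 15%: +10 pp
    Group("Flat", n=30, ai_baseline=0, ai_latest=0),  # no AI uptake
]
print([(g.name, pp_gain(g)) for g in top_gainers(groups)])
```

"Mid" triples in relative terms but gains only 10 pp, which is why ranking by percentage points and ranking by relative growth can order groups differently.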

Comparability. In the small-multiples grid, per-panel y-ranges are independent - compare shapes, not heights. The universe is not uniform across subjects: only Society and social sciences and Technology receive the WikiProject Top/High extension; the other 9 subjects are Vital L4 only. Cross-subject magnitudes therefore reflect both AI uptake AND uneven corpus composition.

What "AI reference" means. A structural body wikilink to one of 48 curated AI articles (plus current redirect aliases), not a semantic measure of AI content. Template-generated and navbox links are excluded; only editor-chosen body links count.
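A simplified sketch of the body-link extraction and the intersection with the alias list. It assumes aliases map back to their canonical AI article, so an article linking both `AI` and `Artificial intelligence` counts once. The regex, the helper names, and the rule "any target containing `:` is a namespace or interwiki link" are simplifications (some article titles contain colons), not the pipeline's exact pattern.

```python
import re

# Matches [[Target]], [[Target|label]], [[Target#Section|label]].
# Simplification: nested links (e.g. inside image captions) are only partly handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]*)(?:\|[^\[\]]*)?\]\]")


def normalize(title):
    """Underscores to spaces, collapse whitespace, upper-case the first letter."""
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def body_links(wikitext, self_title):
    links = set()
    self_norm = normalize(self_title)
    for raw in WIKILINK.findall(wikitext):
        target = raw.split("#", 1)[0]
        if not target.strip():
            continue  # anchor-only link such as [[#History]]
        if ":" in target:
            continue  # namespace/interwiki link (also drops titles with colons)
        title = normalize(target)
        if title != self_norm:
            links.add(title)
    return links


def ai_references(wikitext, self_title, aliases):
    """Distinct canonical AI articles linked from the body; aliases maps alias -> canonical."""
    return {aliases[t] for t in body_links(wikitext, self_title) if t in aliases}


aliases = {
    "Artificial intelligence": "Artificial intelligence",
    "AI": "Artificial intelligence",
    "Machine learning": "Machine learning",
    "Deep learning": "Deep learning",
}
text = ("[[Machine learning|ML]] and [[AI]] and [[Artificial intelligence#History|history]], "
        "[[File:X.png|thumb]], [[#See also]], [[Chess]], [[Deep_learning]]")
print(sorted(ai_references(text, "Chess", aliases)))
```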

Scope. Universe is curated (Vital L4 + WikiProject Top/High in 5 categories = 14,004 articles), not a random or exhaustive sample of Wikipedia. English Wikipedia only. Results generalise to "important, well-edited articles", not to long-tail content.

Time. Some AI seed pages did not exist in 2019 (e.g. ChatGPT, GPT-4, Claude, Gemini, LLaMA, Stable Diffusion, Midjourney), so apparent growth partly reflects new AI vocabulary entering Wikipedia rather than only existing articles adopting new links. Snapshots are roughly semiannual (Dec 1 / May 1, so the gaps alternate between 5 and 7 months); spikes shorter than the gap and revisions reverted between snapshots are invisible. Because the MediaWiki API returns the closest revision at or before each snapshot date, an edit made just after one snapshot is not reflected until the next one, up to ~7 months later.

u/uncertainschrodinger — 2 days ago

Do you prefer building dashboards using a UI-based BI tool or code?

On a scale of 1 to 3, what is your preference for building dashboards and visualizations?

1 -> fully no-code (drag-and-drop only): you don't write queries or code, and often can't fully access or customize the underlying code behind charts or dashboards (e.g. Power BI, Data Studio)

2 -> hybrid (mostly drag-and-drop): you build dashboards by drag-and-drop, but you can also write/edit queries, view the underlying code, and customize things (e.g. Grafana, Superset, Metabase)

3 -> fully code-driven (code-only): queries, layout, styling, interactions, and chart behaviour are all defined in code (e.g. Plotly Dash, D3.js, Streamlit)

u/uncertainschrodinger — 8 days ago

[OC] How Claude is used across occupations: directive, iteration, feedback, validation, learning (AEI v3)

Sources: Anthropic Economic Index v3 - collaboration patterns (CC BY 4.0; per-task directive / task_iteration / feedback_loop / validation / learning percentages and per-task usage_count_global weights), O*NET-SOC 28.3 (CC BY 4.0; 8-digit O*NET-SOC codes used to roll AEI tasks up to BLS SOC major groups via LEFT(onet_soc_code, 2)).

Tools: Bruin CLI (pipeline), BigQuery (warehouse), Bruin DAC (visualization).

Limitations: AEI also publishes none and not_classified collaboration buckets. Both are dropped before renormalizing so that the five shown patterns sum to 100% within a group, which inflates each pattern's share in proportion to how much fell into those two buckets. Within-group percentages are weighted by AEI conversation count, so heavy tasks dominate the group composition; an unweighted view would be flatter. Groups with fewer than 1,000 weighted conversations are dropped. Construction & Extraction and Building & Grounds Cleaning have all-NULL collaboration fields in release_2026_01_15 and are also dropped, leaving 20 of the 22 BLS SOC major groups.
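A sketch of the roll-up and renormalization, assuming each task row carries `onet_soc_code`, `usage_count`, and the five pattern percentages (field names, helper names, and sample rows are illustrative). Slicing the first two characters mirrors `LEFT(onet_soc_code, 2)`. Renormalizing over the five kept patterns divides by their combined share, so if 20% of a group's conversations fell into none/not_classified, every remaining share is scaled by 1 / 0.8 = 1.25 (a 30% directive share becomes 37.5%). Weighting first and renormalizing at group level is an assumed order of operations.

```python
from collections import defaultdict

PATTERNS = ("directive", "task_iteration", "feedback_loop", "validation", "learning")


def major_group(onet_soc_code):
    """Python equivalent of LEFT(onet_soc_code, 2): '15-1252.00' -> '15'."""
    return onet_soc_code[:2]


def group_mix(tasks, min_conversations=1000):
    weight = defaultdict(float)                     # weighted conversations per group
    mass = defaultdict(lambda: defaultdict(float))  # conversations per group and pattern
    for t in tasks:
        if any(t.get(p) is None for p in PATTERNS):
            continue  # NULL collaboration fields: the task contributes nothing
        g = major_group(t["onet_soc_code"])
        w = t["usage_count"]
        weight[g] += w
        for p in PATTERNS:
            mass[g][p] += w * t[p] / 100.0  # percent of conversations -> conversations
    mix = {}
    for g, total in weight.items():
        kept = sum(mass[g].values())  # none / not_classified never enter `mass`
        if total < min_conversations or kept == 0:
            continue
        mix[g] = {p: 100.0 * mass[g][p] / kept for p in PATTERNS}
    return mix


# Made-up task rows; each of the first three leaves 20% in the dropped buckets.
tasks = [
    {"onet_soc_code": "15-1252.00", "usage_count": 800, "directive": 40,
     "task_iteration": 20, "feedback_loop": 10, "validation": 5, "learning": 5},
    {"onet_soc_code": "15-1211.00", "usage_count": 400, "directive": 10,
     "task_iteration": 30, "feedback_loop": 20, "validation": 10, "learning": 10},
    {"onet_soc_code": "25-1011.00", "usage_count": 500, "directive": 50,
     "task_iteration": 10, "feedback_loop": 10, "validation": 10, "learning": 0},
    {"onet_soc_code": "47-2061.00", "usage_count": 5000, **{p: None for p in PATTERNS}},
]
print(group_mix(tasks))
```

Group "25" falls under the 1,000-conversation floor and group "47" has only NULL fields, so only group "15" survives.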

u/uncertainschrodinger — 10 days ago

Co-Working offices in Kadikoy

Anyone have any experience with co-working offices in Kadikoy or surrounding area?

Most of the team lives near the Marmaray line, so we're looking for something accessible by Marmaray or M4.

We've checked out a few places, but I'm curious to hear what places there are and what your experience has been.

u/uncertainschrodinger — 10 days ago

Is BigQuery late to the AI game?

I've used BigQuery for a few years now and this past year I've seen so many different AI tools that help with everything from text-to-SQL to actually building reports and other features.

On one hand, I understand they make their bread and butter from the actual warehouse and processing, but as a user I would've liked to see more AI features integrated into the product. The new Gemini features work alright but feel like an afterthought: there's no way to build reports or visualizations, integrate with messaging apps, or connect your context and semantic layers.

That was one of the reasons I recently joined Bruin as a Developer Advocate: I wanted to be involved in building the tools I wished I'd had as a data engineer. We just made our AI data analyst generally available. It connects to any warehouse, such as BigQuery, imports the metadata of your datasets, and creates a mental map of your data. You can also connect your dbt, Airflow, Dagster, or Bruin pipeline repos to add more context about your models.

The whole point is to have an agent that lives right inside your team and acts like a team member - from answering quick questions to preparing reports and even troubleshooting data & pipeline issues.

I was quite skeptical at first, but we have dozens of clients using it, and the more they use it, the better the agent gets, because it is self-correcting: every conversation and every correction further improves the context.

While I'm speaking about Bruin here, this is a general blueprint any organization can follow to build itself an AI data agent that does more than just text-to-SQL.

u/uncertainschrodinger — 11 days ago

Tool for data ingestion, transformation, orchestrations, and analysis [self-promotion]

Disclaimer: I’m a developer advocate at Bruin. I previously worked in data analyst and then data engineering roles for almost 10 years, and at this job I finally have the freedom to play around with data just for fun. This community has always been my go-to place to find cool datasets.

That’s why I’m excited to share this announcement with you, but I promise to keep the promotional talk to a minimum.

I’m sure many of you use AI agents to analyze data, build dashboards, and share them with friends and others. Bruin has a lot of open-source tools for data ingestion, transformation, orchestration, and visualization. Today we are announcing the general availability of Bruin Cloud, the managed service for those free open-source tools.

I’m personally excited because as a dev advocate I’ve focused mainly on our open-source tools, but managing and deploying them locally is sometimes an obstacle for someone who just wants to play around with data. The free tier of Bruin Cloud (no payment required) gives you enough credits to get started running your pipelines and, more importantly, to analyze your data using the AI data analyst and dashboard builder.

Check out the open-source tools: https://github.com/bruin-data

If interested, feel free to check Bruin Cloud too: https://cloud.getbruin.com/register

u/uncertainschrodinger — 11 days ago

Sources: Meteostat, Open-Meteo, Polymarket CLOB.

Tools: Bruin CLI (pipeline), BigQuery (warehouse), Bruin DAC (visualization).

Limitations: Meteostat returns the METAR nearest the top of each UTC hour, so the alleged sub-hour spike at CDG on 2026-04-15 between 19:00 and 20:00 shows up as a recovery leg rather than a spike. The dashed price line is the last CLOB tick within each hour; intra-hour movement is not visible. Trader identity and on-chain wallet attribution are out of scope.
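A toy illustration of the two sampling effects described above: keeping only the observation nearest each top of the hour can drop a sub-hour excursion entirely, and keeping the last tick per hour hides intra-hour price moves. The readings, timestamps, and helper names are made up for illustration (not CDG or Polymarket data), and the real Meteostat resampling may differ in detail.

```python
from datetime import datetime, timedelta


def hour_floor(ts):
    return ts.replace(minute=0, second=0, microsecond=0)


def nearest_to_top_of_hour(obs):
    """Per hour H, keep only the observation closest to H:00 (METAR-style sampling)."""
    best = {}
    for ts, value in obs:
        hour = hour_floor(ts + timedelta(minutes=30))  # round to the nearest hour
        dist = abs(ts - hour)
        if hour not in best or dist < best[hour][0]:
            best[hour] = (dist, value)
    return {h: v for h, (_, v) in sorted(best.items())}


def last_tick_per_hour(ticks):
    """Per hour, keep only the last price tick (later ticks overwrite earlier ones)."""
    out = {}
    for ts, price in sorted(ticks):
        out[hour_floor(ts)] = price
    return out


def at(hh, mm):
    return datetime(2026, 4, 15, hh, mm)  # made-up timestamps


readings = [(at(19, 0), 17.0), (at(19, 25), 22.0), (at(19, 35), 22.0), (at(20, 0), 17.5)]
ticks = [(at(19, 5), 0.30), (at(19, 30), 0.60), (at(19, 55), 0.45)]
print(nearest_to_top_of_hour(readings))  # the 22.0 excursion never appears
print(last_tick_per_hour(ticks))         # the 0.60 intra-hour peak is hidden
```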

u/uncertainschrodinger — 22 days ago

Sources: Meteostat, Open-Meteo, Polymarket CLOB.

Tools: Bruin CLI (pipeline), BigQuery (warehouse), Bruin DAC (visualization).

Limitations: Meteostat returns the METAR nearest the top of each UTC hour, so the alleged sub-hour spike at CDG on 2026-04-15 between 19:00 and 20:00 shows up as a recovery leg rather than a spike. The dashed price line is the last CLOB tick within each hour; intra-hour movement is not visible. Trader identity and on-chain wallet attribution are out of scope.

u/uncertainschrodinger — 22 days ago

I’ve built an open-source CLI tool for building dashboards. The key point is that it follows “dashboard as code” principles: every dashboard’s properties, queries, and semantic layer live inside YAML or TSX files, which makes it agent-friendly out of the box.

This is my answer to all the AI dashboard and BI tools out there, but with more focus on the framework and semantic layer so that it works better with AI agents.

Today's the first day of releasing this publicly, so please share your honest feedback, skepticism, and even roast it - and if you want, give the repo a star.

u/uncertainschrodinger — 23 days ago
r/foss

DAC (dashboard-as-code) is a free, open-source tool built in Go that connects to most databases and lets you build dashboards right inside YAML/JSX files (and yes, that means charts, tabs, and values can be generated dynamically at load time).

The idea is to create an open standard for building analytics tools on top of databases, designed for AI agents out of the box. You can connect it to any agent, start building the semantic layer and dashboards, and deploy it locally or on a server.

Today's the first day of releasing this publicly, so please share your honest feedback, skepticism, and even roast it - and if you want, give the repo a star.

https://github.com/bruin-data/dac

u/uncertainschrodinger — 23 days ago

I wrote a blog post summarizing my personal journey this past year: going from an AI skeptic who barely used Copilot in VS Code for basic tasks to using it every day to build pipelines, maintain things, and analyze data. Link in the comments.

u/uncertainschrodinger — 24 days ago