r/SQL

Self-Join
▲ 1 r/SQL

Self-Join

Can you guys explain which answer is right and why? I think A is the right option here

u/Iva1996 — 7 hours ago
▲ 52 r/SQL

Pretty sure I just blew the biggest interview of my life. AMA!

Just had an interview with an employer that most people would consider a dream job and am nearly 100% sure I blew it. This is the only interview I've ever studied for. I did not apply to this role. An internal recruiter reached out to me. I do have some positive takeaways as I know what weaknesses I need to shore up for future opportunities.

reddit.com
u/TraumaBondage — 21 hours ago
▲ 36 r/SQL+27 crossposts

We ran a 1,655 person blind study on AI memory. The results changed how we think about the problem.

We’re building KAPEX (getkapex.ai), memoryware for AI applications. Two co-founders, bootstrapped, patent pending. I wanted to share some of what we’ve learned because the discourse in this space keeps circling the same assumptions and I think a few of them are wrong.

The study: 1,655 participants interacted with AI systems with and without our memory layer. Blind setup, they didn’t know which condition they were in.
The finding that mattered most: first-session preference was around 65%. Not bad, but not a clear signal. After 20+ sessions, preference climbed past 80% and kept rising. The longer people used it, the wider the gap.

That trajectory is the insight. Not the final number. The trajectory.

Here’s why that matters for anyone building in this space:

Most AI memory tools are optimized for first impressions. Demo well, retrieve fast, show the user you remembered their name. That’s fine. But it means the entire evaluation framework for memory (including the benchmarks everyone cites) is testing the wrong thing. LongMemEval and LoCoMo test whether you can find what was said. They don’t test whether the system knows what still matters.
Retrieval and relevance are different problems. The industry has spent two years building better retrieval. Almost nobody is building relevance governance: what stays important, what fades, what gets superseded, and whether the user can see and correct what the system believes.

Three things we learned the hard way:

1.	Clean store beats fancy retrieval. Every time. If your memory layer lets stale context accumulate without governance, no amount of reranking or hybrid search fixes the degradation over time. The capture and maintenance side is where the leverage actually is.

2.	Memory without transparency is a black box. If developers can’t see why the agent believes something, and users can’t see what the system thinks it knows about them, then memory becomes a liability rather than a feature. Inspectability isn’t a nice-to-have. It’s what makes correctability possible.

3.	The value of memory is invisible in short sessions. This is why benchmarks miss it. A 5-turn evaluation can’t distinguish between a system with real governance and one that just retrieved the right vector. The difference only shows up after sustained use, which is also when it matters most.  

Our approach treats relevance as something that should be handled continuously by the architecture, not at query time by the retrieval layer. Context that stops being reinforced through usage naturally loses priority. Not deleted, just deprioritized. That’s the principle. Can’t share more on implementation for IP reasons.

Curious what others here are seeing. Is anyone else finding that the retrieval-first paradigm breaks down over time? And is anyone working on evaluation frameworks that test sustained-use performance rather than single-session recall?

getkapex.ai if you want to follow along. Still pre-launch but opening access soon.

reddit.com
u/sandstone-oli — 22 hours ago
▲ 0 r/SQL

Can anybody tell me why this query fails ? Thank you

SELECT 
            H1.created_at , H1.event_type, 
            CASE 
                WHEN H1.event_type = 9 THEN H1.animal_id
                ELSE "No animal assigned"
            END as animal_id,
            H1.rank_assigned_by ,H1.supervisor_id as CURR_SUPERVISOR, 
            H2.supervisor_id as PREV_SUPERVISOR
            ,H1.rescue_point_id , H1.emp_rank as CURR_RANK , H2.emp_rank as PREV_RANK  , 
            H1.salary as CURR_SALARY , H2.salary as PREV_SALARY ,H1.reason
            from Employee_history  H1 left join  (
                SELECT * from Employee_history  E1 where E1.created_at < Employee_history.created_at 
                order by E1.created_at
                limit 1
            ) as H2 on H1.emp_id = H2.emp_id;
reddit.com
u/OddCardiologist2981 — 1 day ago
▲ 1 r/SQL

I built a SQL workspace inside VS Code — saved queries, version history, data compare, session monitor

https://preview.redd.it/z4pqqtxbxs2h1.jpg?width=1917&format=pjpg&auto=webp&s=a9c51a8930ad3f3eca0f4d2f9392ad3d87276090

I got tired of switching between VS Code and a separate DB tool every few minutes, so I built SQLLab — a VS Code extension that puts a full SQL workspace inside the editor.

What it does:

  • Connect to SQL Server, PostgreSQL, MySQL, Oracle, SQLite, DuckDB
  • Saved query library with full-text search (search name, description, and the SQL itself)
  • Every save creates a version — diff any two versions side by side
  • Data Compare: run two queries against different DBs and see SAME/DIFFERENT per row and column
  • Session Monitor: see blocked/active sessions, kill them from right-click
  • Batch Run: run multiple saved queries in parallel, track status per script
  • Parameterised SQL with Jinja-style {{ variable }} substitution
  • Schema-aware autocomplete (tables, columns, PKs, indexes)
  • Paginated results — no freezing on large datasets

Still early stage, so I’d really appreciate feedback:
What features are missing?
What would stop you from using it?
What DB workflow annoys you the most today?

​

reddit.com
u/No-Lettuce-1655 — 23 hours ago
▲ 6 r/SQL+1 crossposts

Synthetic Insurance Claims Dataset for SQL practice - 54 exercises from basic to advanced

I'm an actuary, and I spent the last few weeks building a realistic insurance claims dataset. Insurance data is difficult to come by in the wild; most datasets are either too simple or completely proprietary. Other practice datasets are retail sales or Titanic, which don't provide much value. I wanted something that reflects how real industry data actually looks, so I built it.

It's a SQLite database covering four years of claims across employer groups — members, claims, claim lines, providers, plans, premium rates, the works. Realistic messy data: out-of-network pricing spreads, denial reasons, pending claims, annual maximum exhaustion, processing lag.

Comes with 54 exercises across five tiers:

  1. Foundational SQL (SELECT, WHERE, GROUP BY, JOINs)

  2. Intermediate analytics (window functions, utilization metrics, provider analysis)

  3. Advanced (CTEs, self-joins, cost trend, member behavior analysis)

  4. Actuarial analyses (IBNR, experience rating, credibility, frequency/severity)

  5. Data quality investigation (duplicate claims, billing anomalies, eligibility audits)

Plus four open-ended capstone projects suitable for a portfolio (e.g. dashboards).

Full solution guide included. Works in DBeaver, DB Browser for SQLite, or any SQLite-compatible client — no server setup.

I published the dataset and guides on Gumroad. If interested, let me know and I will provide the link.

reddit.com
u/TarHeelActuary — 20 hours ago
▲ 7 r/SQL

Best way to update NULL values?

I have data that looks like this:

Col1 Col2 Col3
A Two 3
B Two NULL
C Two NULL
D Five NULL
E Five 6

Working in SQL Server, what's the best way to update the NULL values in Col3 to the only non-null value associated with equal values in Col2? e.g. I'd want to update this table to read

Col1 Col2 Col3
A Two 3
B Two 3
C Two 3
D Five 6
E Five 6
reddit.com
u/FlucDissThm — 2 days ago
▲ 9 r/SQL

How should I update tables in Databricks?

I’m very unfamiliar with data engineering (I’m a junior data analyst), so any feedback would be appreciated. I have a set up in Databricks where I use python scripts to ingest multiple table data from SAP and put them in the bronze layer. These data could be changed, added, or deleted, and we always want the latest versions of the tables.

We’ve had some iterations for updating our silver tables from bronze. At first we just called CREATE OR REPLACE TABLE, so it would overwrite all the data with fresh data every time we ran.
Then, we used MERGE INTO to make it more efficient for incremental changes.
Then, we used row-hash comparison in Python to update all the tables.
The tables don’t have many rows yet, with the largest having almost 1M rows. But we are constantly ingesting more tables as the project size grows.
Now looking back, maybe all the iterations was a waste of time since the tables aren’t big enough. We wanted to minimize sql run time to minimize cost.
Those who are seasoned experts, what do you think?

reddit.com
u/parkdrew — 1 day ago
▲ 42 r/SQL

What are common SQL red flags?

Hello! interview prepping, here wondering what are some common red flags for wrioting SQL?

Like

LIKE failing to index, not having trasnactions, usign SELECT * instead of specific collums, etc 😃

reddit.com
u/badboyzpwns — 3 days ago
▲ 5 r/SQL+1 crossposts

Anyone interested in "adopting" a disciple? :P

I'm interested in starting in the data analysis area, i work atm in an accounting office, almost nothing related"... Its been 5 years already, i'm from Brazil, i want learn data analysis and start working internationally to receive in U$, i'm tired of receiving in BRL and paying in U$ 😛...

I would love having someone as a "teaching coach" through discord, if anyone is interested in helping me learn, give me a "roadmap", teach me stuff, MOTIVATE me, would love keeping contact with that person... At the end, think that you get someone who knows all you wanted it to know, soo i'll be like you with less knowledge (i'll learn everything i know, not everything you know, aaah~), could end up working with you/for you, who knows :c... Appreciate your time reading and even more if your interested in taking my hand through this new scary career change!

reddit.com
u/Weaszy — 2 days ago
▲ 0 r/SQL

Help me out !

I've just started SQL from Data with Baara ~ 30hr course
For the course he used Microsoft SQL, but I'm a Mac user
what should I do ?

reddit.com
u/Consistent-Leader-24 — 2 days ago
▲ 2 r/SQL

Struggling with Self-Joins

Hey everyone, I am struggling with learning the self join concept specifically when you are joining the table and it's duplicate on the same column. Why are there duplicate values? What is an example use case for this situation? And lastly regarding the filtering you can do in the WHERE clause, why does it remove the duplicate values? And is that particular filtering logic pretty much the same every time in this situation?

I truly feel like an idiot trying to get my brain to understand this, so please try to explain in the most simplistic way possible.

Thanks!

reddit.com
u/Inner-Significance41 — 3 days ago
▲ 2 r/SQL

using a materialised view to track user-entity authorisation

I'm wondering if this is a used pattern, or something definitely not to do. Working with SQL Server.

We have a system where the rules about what a user can access are things like "if the user has presented a course within the last year then they have access to course materials within that same subject area."

So "rules-based" auth not role-based. Many queries are slow because of all the things they need to check to show a list of available courses, etc to any particular user.

Taking the above example, I was thinking of creating materialised views whose SQL create rows that link users to the entities they can access. So e.g. a user_coursematerial view, which contains user_id and coursematerial_id - the query for which joins users and course materials based on those rules as above.

(Edit: The idea being that the view maintains those connections itself as data changes - I assume that's what happens?)

Then every time I want to list all the course materials a user has access to, I just join the user to that view, and bam I have all the correct material_ids for that user, which is fast and the rules are all in one place.

Is this achievable and acceptable, or are there better approaches to complex permissions issues like this?

reddit.com
u/sweetnsourgrapes — 3 days ago
▲ 2 r/SQL

Best way to practice SQL?

Hello, I am studying data analysis by my own. I've done 2 projects for github and currently I am doing my third project end to end, from SQL to PowerBI.

The thing is that my SQL skills are pretty lame, I need to practice A LOT, I can only do basic queries and I would like to tryhard my SQL skills.

I started this third project and decided to use it for practicing from zero with DB little.

I am doing basic queries through asking myself basic sales questions from the tables that I have, also I am learning with SQL bolt. Everytime I am stuck or the query isn't retriving what I want, I ask Chatgpt to correct and explain the solution to me.

Is this the best way to practice? Have you found a better alternative to learn and practice?

reddit.com
u/franco6652 — 3 days ago
▲ 8 r/SQL

Platforms to practice SQL

I have completed my graduation and have been practicing SQL from a while including stored procedures , triggers.

I want to know what are some certifications that i have to do it or any good platforms to practice/solve and get certifications

or should I do some projects (pls tell me if u know what type of projects )

Thank you

reddit.com
u/FerretLow4499 — 3 days ago
▲ 9 r/SQL

Frontend polling + heavy SQL joins = deadlocks. Looking for architecture advice

Hi everyone,

I’d like some advice on a scalability/database architecture issue.

At work, we built a truck management system. Trucks enter the factory, load products, and deliver them to different distribution centers.

The problem is that management now wants near real-time dashboards showing the full lifecycle of operations. Most of our dashboard queries rely on joins against large historical tables, and some queries take 10–15 seconds to complete.

Right now, the frontend polls the API on a timer to refresh dashboards. This is starting to cause issues:

  • Heavy read queries sometimes block write operations
  • Backend update processes occasionally deadlock with dashboard queries
  • Overall DB performance is degrading as data grows

My current idea is to create separate denormalized/reporting tables specifically for dashboards, populated every few minutes by background jobs, so dashboards stop querying historical transactional data directly.

Would this be the right approach?
How would you handle this architecture-wise?

We're using SQL SERVER.

reddit.com
u/samirson — 4 days ago
▲ 1 r/SQL

[MySQL] creating new column based off categorical data in two other columns, duplicating for each value

hey folks --

apologies for the title gore. I can't actually think of the cleanest way to ask this question (which is partially why I haven't had luck using Claude/etc.). I'll try to explain with a relatively simple example.

let's say you've got columns A, B, and C.

A is a user id (can be treated as a string). B and C are categorical columns (string values) -- let's say B is 'cuisine' and C is 'country'.

presently my table is keyed on A -- there's only a single row, and that row has columns for both cuisine and country.

I'm trying to find the most efficient way to create a new table that has two columns: A and B_C.

for every value in A, there needs to be two rows where one row's B_C value is the value that was in 'cuisine' and the other row's B_C value is what was in 'country.'

I know that I can, e.g., query the original table twice and do a 'CASE WHEN' -- I'm trying to figure out if there's a way to do this in a single query.

reddit.com
u/roastedoolong — 3 days ago