u/mportdata

Text-to-SQL with DuckDB
▲ 13 r/DuckDB+2 crossposts

I’m currently building a modular text-to-SQL library. For this I need to run integration tests and evals regularly, preferably on databases with a variety of tables. DuckDB has been perfect for this because I can run it locally, without the cost of storing and processing my sample database in a cloud data warehouse. The TPC-H extension also made it really easy to populate my DuckDB database. I’ve outlined where the library is up to, with a DuckDB example, in this video. Sharing because DuckDB has been so useful for me in this scenario.

Text-to-SQL: using piglets to prepare your context with u/duckdb and u/OpenAI

https://youtu.be/cNXm1t_4mh0

u/mportdata — 13 days ago

For anyone building text-to-SQL workflows or agents, I've created a new Python library that might help:

https://github.com/mportdata/piglets

Video on what it can do so far is here: https://www.youtube.com/watch?v=MARYRBQY2OE

So far piglets can be used to perform logical planning and dual-pathway pruning.

It works with all LLM providers and, so far, with Snowflake, BigQuery and MotherDuck on the cloud data warehouse side.

Why did I make this? From using out-of-the-box text-to-SQL tools, I've found it helps to do some batch preprocessing up front, such as enhancing metadata with an LLM or reducing the context to only the fields we believe are relevant.
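
To make the "reduce the context" idea concrete, here is a rough sketch in plain Python. It is not the piglets API and not the method from the paper: the `prune_schema` helper, the keyword-overlap scoring, the stopword list and the TPC-H-style column descriptions are all assumptions made up for illustration. A real setup would more likely use an LLM or embeddings to judge relevance.

```python
import re

# Hypothetical schema metadata: table -> list of (column, description) pairs.
# Column names follow TPC-H naming; the descriptions are invented for this sketch.
SCHEMA = {
    "customer": [
        ("c_custkey", "customer id"),
        ("c_name", "customer name"),
        ("c_acctbal", "account balance"),
    ],
    "orders": [
        ("o_orderkey", "order id"),
        ("o_custkey", "customer id of the order"),
        ("o_totalprice", "total price of the order"),
    ],
    "part": [
        ("p_partkey", "part id"),
        ("p_name", "part name"),
    ],
}

# Assumed stopword list so filler words in the question never count as a match.
STOPWORDS = {"what", "is", "the", "of", "for", "each", "a", "per"}


def tokens(text):
    """Lowercase a string and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def prune_schema(question, schema):
    """Keep only columns whose description shares a keyword with the question."""
    keywords = tokens(question) - STOPWORDS
    kept = {}
    for table, columns in schema.items():
        relevant = [col for col, desc in columns if keywords & tokens(desc)]
        if relevant:  # drop tables with no matching columns at all
            kept[table] = relevant
    return kept


pruned = prune_schema("What is the total price of orders for each customer?", SCHEMA)
for table, columns in pruned.items():
    print(table, columns)
```

Only the pruned tables and columns would then go into the prompt, which keeps the context small. Plain keyword overlap is crude (it misses synonyms and can keep irrelevant fields that share a word), which is part of why smarter pruning is worth doing as a separate step.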

piglets is meant to be a modular toolkit, so you can bolt additional functionality onto an existing text-to-SQL workflow. I will be adding more functionality soon. The techniques I plan to implement come from recent research papers, and I will call out where they come from as I add them. The current techniques both come from the Apex-SQL paper.

u/mportdata — 25 days ago