u/ruslan_zasukhin

▲ 18 r/DuckDB

What would you expect from a good visual Parquet workflow for DuckDB users?

Hi DuckDB community,

Disclosure: I’m from Paradigma Software, the team behind Valentina Studio. 

Some users asked us whether Valentina Studio could support working with local Parquet files.

In response, we added Parquet support in Valentina Studio 17.4, and I’m curious whether it covers the workflows DuckDB users actually need.

In the Free edition, you can open a Parquet file, read it, and query it in the SQL Editor with AI Chat assistance.

The Pro edition adds more tools for working with, editing, and managing Parquet files the same way you would a local database, such as a DuckDB or SQLite file:

  • Schema Editor: inspect the Parquet schema and column types
  • Data Editor: browse file content visually, including nested/list/struct values
  • Data Transfer: move records between Parquet and any other supported database, in either direction
  • Report Editor: use Parquet as a data source

For those of you who work with Parquet regularly: do you have a wish list of features or workflow improvements?

P.S. Valentina Studio is available on macOS, Windows (32-bit, 64-bit, ARM), and Linux (x64, ARM).

u/ruslan_zasukhin — 6 days ago
▲ 15 r/DuckDB

DuckDB 1.5 introduced the new VARIANT type, so we added support for it in Valentina Studio 17.3.

https://preview.redd.it/okxv3i49p90h1.png?width=1536&format=png&auto=webp&s=24587f8e3f6db2eab1bebd7c7efc3642e6eff8da

Current support includes:

  • Schema Editor integration
  • Visual inspection/editing of nested objects & arrays
  • Special editors for images, blobs, UUIDs, etc.
  • AI-assisted SQL Editor
  • Direct editing of VARIANT values in Data Editor

Available on macOS, Windows and Linux.

Curious how DuckDB users currently work with semi-structured data and whether visual tooling around VARIANT is useful in practice.

More details:
https://valentina-db.com/dokuwiki/doku.php?id=valentina:articles:vstudio_v173_duck_variant

u/ruslan_zasukhin — 13 days ago
▲ 1 r/SQL+1 crossposts

Hi all,

Let me share some information.

About two years ago, we were asked to add DuckDB support to Valentina Studio. As we explored this database, we realized that from an architectural perspective it is similar to Valentina DB and SQLite — a local embedded database engine. At the same time, like Valentina DB, DuckDB is column-oriented.

This observation led us to the idea of integrating DuckDB into Valentina Server as well. We implemented this integration approximately 1.5 years ago by adding DuckDB under our VKERNEL DLL layer, similar to what we had previously done with SQLite.

For a long time, we were focused on other priorities. However, we were naturally interested in benchmarking Valentina DB against DuckDB. In March, we started developing a benchmark suite using Google Benchmark.

During this work, we also introduced SIMD vectorization into Valentina DB, which significantly improved performance.

Below are the results for WHERE queries. These results are from approximately three weeks ago.

Benchmarks were run on an M1 MacBook (ARM) against tables with 100K, 1M, and 10M records. Tests include both full table scans and filtered selections at three selectivity levels, where N is the number of records in the table:

  • Small: ~10 records
  • Half: ~0.5N records
  • Large: ~N − 10 records
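
To make the three selectivity levels concrete, here is a minimal sketch of how such WHERE queries could be set up. It is not our benchmark suite (that one is built on Google Benchmark); it uses Python's built-in sqlite3 module purely as a stand-in engine, and the table name `t`, the columns `id` and `payload`, and the helper functions are all hypothetical. It assumes a sequential integer key, so an upper bound on `id` controls how many records match.

```python
import sqlite3
import time


def selectivity_thresholds(n):
    """Exclusive upper bounds on id that select ~10, ~0.5N and ~N-10 records."""
    return {"small": 10, "half": n // 2, "large": n - 10}


def run_where_benchmark(n, indexed=False):
    """Time one filtered COUNT(*) per selectivity level on an n-record table.

    Hypothetical schema: t(id, payload) with ids 0..n-1. sqlite3 is only a
    stand-in here; the timings say nothing about Valentina DB or DuckDB.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, payload INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)", ((i, i % 97) for i in range(n))
    )
    if indexed:
        # Optional index to mimic the indexed search scenario.
        conn.execute("CREATE INDEX idx_t_id ON t (id)")
    conn.commit()

    results = {}
    for name, bound in selectivity_thresholds(n).items():
        start = time.perf_counter()
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM t WHERE id < ?", (bound,)
        ).fetchone()
        results[name] = (count, time.perf_counter() - start)
    conn.close()
    return results


if __name__ == "__main__":
    for name, (count, seconds) in run_where_benchmark(100_000).items():
        print(f"{name}: {count} records matched in {seconds * 1000:.2f} ms")
```

A single timed run like this is noisy; a real harness repeats each query many times and reports aggregate statistics, which is what Google Benchmark handles for us.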

In these tests, Valentina DB outperforms DuckDB in both indexed and non-indexed search scenarios.

https://preview.redd.it/pxmo2y8fd2zg1.png?width=1736&format=png&auto=webp&s=55283174079deed662c47c9389b8d54a9305ba83

u/ruslan_zasukhin — 19 days ago