u/Shoddy_One4465

▲ 11 r/erlang+1 crossposts

[ANN] ExDataSketch v0.9.0 - Streaming Integrations

GitHub | Hex | Docs

Summary

ExDataSketch v0.9.0 adds stream-native integration, production persistence, structured observability, and ULL accuracy corrections. The release positions ex_data_sketch as a BEAM-native streaming approximate analytics infrastructure layer, not merely a collection of probabilistic algorithms.

Key Changes

  • Stream/Collectable integration for all 13 mergeable sketch types
  • Broadway, GenStage, and Flow pipeline integration (all optional)
  • 5 persistence backends: ETS, DETS, CubDB, Mnesia, Ecto
  • Structured :telemetry events + OpenTelemetry bridge
  • ULL linear counting + large range correction (62.5% -> 0.8% error at p=8/n=1000)
  • Configurable update_many chunk size (HLL, ULL, CMS, Theta)
  • EXSK v1 serialization escape hatch for backward compatibility
  • 9 production-oriented Livebooks
  • 3 new educational guides (aggregation_wall, distributed_merge_semantics, livebooks)
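For anyone unfamiliar with what "Stream/Collectable integration" buys you: once a data structure implements Elixir's `Collectable` protocol, any lazy stream can be piped straight into it with `Enum.into/2`. The snippet below is a minimal sketch of that idea using a made-up `ToySketch` struct (an exact `MapSet` counter, not a probabilistic sketch). None of these names come from ExDataSketch; its real API may look different.

```elixir
# Hypothetical stand-in for a mergeable sketch. It counts distinct items
# exactly with a MapSet; a real sketch would approximate instead.
defmodule ToySketch do
  defstruct items: MapSet.new()

  def new, do: %__MODULE__{}

  # Add one item to the sketch.
  def update(%__MODULE__{items: items} = sketch, item) do
    %{sketch | items: MapSet.put(items, item)}
  end

  # Combine two sketches built independently (e.g. on different nodes).
  def merge(%__MODULE__{items: a}, %__MODULE__{items: b}) do
    %__MODULE__{items: MapSet.union(a, b)}
  end

  # Number of distinct items seen.
  def estimate(%__MODULE__{items: items}), do: MapSet.size(items)
end

# Implementing Collectable is what lets Enum.into/2 feed the struct.
defimpl Collectable, for: ToySketch do
  def into(sketch) do
    collector = fn
      acc, {:cont, item} -> ToySketch.update(acc, item)
      acc, :done -> acc
      _acc, :halt -> :ok
    end

    {sketch, collector}
  end
end

# Usage: a lazy stream collected directly into the sketch.
distinct =
  1..1000
  |> Stream.map(&rem(&1, 100))
  |> Enum.into(ToySketch.new())
  |> ToySketch.estimate()
```

Here `distinct` is 100, since `rem/2` folds the input into the values 0..99.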

Backward Compatibility

Full backward compatibility with v0.8.0. No API changes to existing modules. ULL estimates at low cardinalities (p < 12, n < 500) are more accurate but may differ numerically from v0.8.0.
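For context on what "linear counting" usually means in the HyperLogLog family: at low cardinalities the classic small-range estimator is n ≈ m · ln(m / V), where m = 2^p is the register count and V is the number of registers that are still empty. The snippet below is only that textbook formula. The post does not say how ExDataSketch's ULL applies its correction, and the module name here is invented for illustration.

```elixir
# Textbook linear-counting estimator (assumed form, not taken from ExDataSketch).
defmodule LinearCounting do
  # m: total number of registers (2^p)
  # zero_registers: how many registers are still empty
  def estimate(m, zero_registers)
      when is_integer(m) and zero_registers > 0 and zero_registers <= m do
    m * :math.log(m / zero_registers)
  end
end

# p = 8 gives m = 256 registers; suppose half of them are still empty.
half_empty = LinearCounting.estimate(256, 128)
```

With every register empty the estimate is 0.0, and it grows as registers fill up. The formula is undefined once no register is empty, which is why it only applies in the small range.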

New Dependencies

  • :telemetry ~> 1.0 (required)
  • :broadway, :flow, :cubdb, :ecto_sql, :opentelemetry_api (optional)

Modules Added (21)

ExDataSketch.Stream
ExDataSketch.Broadway
ExDataSketch.Broadway.PeriodicAggregator
ExDataSketch.Flow
ExDataSketch.GenStage
ExDataSketch.GenStage.SketchConsumer
ExDataSketch.GenStage.SketchProducer
ExDataSketch.GenStage.SketchStage
ExDataSketch.Storage
ExDataSketch.Storage.ETS
ExDataSketch.Storage.DETS
ExDataSketch.Storage.CubDB
ExDataSketch.Storage.Mnesia
ExDataSketch.Storage.Ecto
ExDataSketch.Storage.Ecto.Schema
ExDataSketch.Storage.Ecto.Migration
ExDataSketch.Telemetry
ExDataSketch.Telemetry.OpenTelemetry
ExDataSketch.Integration
ExDataSketch.Binary (encode_v1/4)

Livebooks (9)

streaming_cardinality, broadway_integration, genstage_aggregation, rolling_telemetry, distributed_merges, persistence_snapshots, livedashboard_integration, ai_token_analytics, phoenix_observability

Guides (3 new + 7 updated)

New: aggregation_wall, distributed_merge_semantics, livebooks

Updated: streaming_sketches, broadway_integration, genstage_integration, flow_integration, persistence, telemetry, observability

Benchmarks (5 new)

persistence_bench, serialization_bench, merge_throughput_bench, update_many_chunk_bench, stream_ingestion_bench

Upgrade Path

No code changes required. Update dependency to {:ex_data_sketch, "~> 0.9.0"}.

Known Risks

  • ULL low-cardinality estimates differ from v0.8.0 (more accurate)
  • Membership filter raw-NIF hashing deferred to v0.10.0
  • Mnesia compile warnings are pre-existing OTP tracking limitations
  • OTEL integration requires :opentelemetry_api ~> 1.0
u/Shoddy_One4465 — 1 day ago
▲ 21 r/elixir

[ANN] ExDatalog v0.2.0

ExDatalog is a production-grade, pure-Elixir Datalog engine with semi-naive fixpoint evaluation, pluggable storage backends, and deterministic cross-backend results.
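If "semi-naive fixpoint evaluation" is new to you: instead of re-joining every known fact on each round, the engine only joins the facts that were newly derived in the previous round (the "delta") and stops when a round produces nothing new. Here is a minimal sketch of that technique for a single hard-coded rule pair (transitive closure). It is a standalone illustration with made-up names, not ExDatalog's implementation or API.

```elixir
# Semi-naive evaluation of:
#   path(X, Y) :- edge(X, Y).
#   path(X, Z) :- path(X, Y), edge(Y, Z).
defmodule SemiNaive do
  def transitive_closure(edges) do
    edge_set = MapSet.new(edges)
    # Round 0: every edge is a path, and all of them are "new".
    loop(edge_set, edge_set, edge_set)
  end

  # edges: base facts, total: all paths so far, delta: paths new last round
  defp loop(edges, total, delta) do
    if MapSet.size(delta) == 0 do
      # Fixpoint reached: the last round derived nothing new.
      total
    else
      # Join only the delta against the base relation.
      derived =
        for {x, y} <- delta, {y2, z} <- edges, y == y2, into: MapSet.new() do
          {x, z}
        end

      new_delta = MapSet.difference(derived, total)
      loop(edges, MapSet.union(total, new_delta), new_delta)
    end
  end
end

paths = SemiNaive.transitive_closure([{1, 2}, {2, 3}, {3, 4}])
```

For the chain 1 -> 2 -> 3 -> 4 this derives the three extra paths `{1, 3}`, `{2, 4}`, and `{1, 4}`. It also terminates on cyclic graphs, because facts already in `total` never re-enter the delta.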

What's new in v0.2.0

  • Constraint types: type predicates (type_integer, type_binary, type_atom), string predicates (starts_with, contains), and membership (member) — all evaluated deterministically across backends.
  • ETS storage backend: off-heap per-relation ETS tables with O(1) membership, concurrent read support, and observability via :observer.
  • Capabilities system: each backend reports what it supports; merge and query capability sets with Capabilities.merge/2 and satisfies?/2.
  • Provenance: track which rule derived each fact with explain: true.
  • Telemetry: :telemetry events for query start, stop, and exceptions.
  • 601 tests, 0 failures, credo clean, dialyzer clean.

Breaking changes

  • Constraints.StringConstraints.StringPredicate
  • Telemetry.emit_stop/4 default arg removed; pass storage_type explicitly
  • Constraint behaviour evaluate/3 callback now uses Binding.t() not map()
  • Storage teardown callback return type widened to :ok | {:error, term()}

Bug fixes

  • ETS member?/3 was using :ets.match_object/3 instead of :ets.member/2 (O(n) instead of O(1), with an unreachable clause)
  • ETS teardown now raises ArgumentError with a clear message on post-teardown use instead of opaque :badarg
  • Constraint.valid_right?(:member, ...) now requires {:const, list}
  • Engine wraps evaluation in try/after to guarantee ETS table cleanup
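Two of the fixes above are easy to picture with plain ETS calls. The sketch below assumes facts are stored as the key of a `:set` table, so `:ets.member/2` is a key lookup rather than a pattern match, and it wraps the work in `try/after` so the table is deleted even if evaluation raises. The module and function names are hypothetical and the storage layout is a guess, not ExDatalog's actual backend.

```elixir
defmodule EtsMembership do
  # Loads facts into a throwaway ETS table, checks one candidate,
  # and always deletes the table afterwards.
  def member_with_cleanup(facts, candidate) do
    table = :ets.new(:facts, [:set, :private])

    try do
      # Each fact tuple is stored as the key of a one-element record.
      Enum.each(facts, fn fact -> :ets.insert(table, {fact}) end)

      # Key lookup on a :set table, no pattern matching involved.
      :ets.member(table, candidate)
    after
      # Runs on success and on exceptions alike.
      :ets.delete(table)
    end
  end
end

facts = [{:parent, :alice, :bob}, {:parent, :bob, :carol}]
found? = EtsMembership.member_with_cleanup(facts, {:parent, :alice, :bob})
```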

u/Shoddy_One4465 — 6 days ago
▲ 13 r/erlang+1 crossposts

ex_data_sketch v0.8.0 — Deterministic Foundations

ex_data_sketch v0.8.0 is out. This release invests entirely in the substrate that all 15 existing sketches share, laying the groundwork for v0.9.0, which will add streaming integrations (Broadway / GenStage) and ETS / DETS / Zarr support.

What's new:

  • Deterministic hashing. Every sketch now goes through a validated, byte-stable hash layer. HLL, ULL, Theta, and CMS accept hash_strategy: :murmur3 for Apache DataSketches interop — this was silently ignored in v0.7.x. XXHash3 remains the default and fastest path (~30 M items/sec at p=14 on the Rust NIF).

  • Binary stability & corruption detection. Serialized sketches now carry a CRC32C trailer and an embedded hash metadata block (EXSK v2). Bit-flip corruption that previously would silently produce wrong estimates is now caught and returns a structured DeserializationError. v0.8.0 reads v1 frames; v0.7.x cannot read v2 — stage your rollout accordingly.

  • Murmur3 hot path. 8 new Rust NIFs extend in-Rust hashing to Murmur3. The Murmur3 path is within 8% of XXH3 throughput. No more falling off the fast path when you select :murmur3.

  • Precompiled NIFs for Windows. x86_64 and ARM64 MSVC targets join the matrix. 16 artifacts total (8 targets x 2 NIF versions). No Rust toolchain needed on any supported platform.

  • Property-locked guarantees. 14 StreamData properties lock HLL/ULL monotonicity and error bounds, KLL/REQ rank consistency, CMS overestimation-only, and Bloom/XorFilter/Cuckoo no-false-negative. A 200-mutation fuzz suite verifies that binary v2 corruption never silently propagates.
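To make the checksum-trailer idea concrete, here is a minimal sketch of appending a checksum to a payload and verifying it on decode. Important caveats: it uses `:erlang.crc32/1`, which is plain CRC-32 and not the CRC32C the release uses, and the frame layout (payload followed by a 4-byte big-endian checksum) plus all names are assumptions for illustration only. It is not the EXSK v2 format.

```elixir
defmodule ChecksumFrame do
  # Append a 4-byte CRC-32 of the payload as a trailer.
  def encode(payload) when is_binary(payload) do
    <<payload::binary, :erlang.crc32(payload)::unsigned-big-32>>
  end

  # Split off the trailer and compare it with a freshly computed checksum.
  def decode(frame) when is_binary(frame) and byte_size(frame) >= 4 do
    payload_size = byte_size(frame) - 4
    <<payload::binary-size(payload_size), stored::unsigned-big-32>> = frame

    if :erlang.crc32(payload) == stored do
      {:ok, payload}
    else
      {:error, :checksum_mismatch}
    end
  end

  # Anything shorter than the trailer itself cannot be a valid frame.
  def decode(_frame), do: {:error, :truncated}
end

frame = ChecksumFrame.encode("sketch-bytes")

# Flip the lowest bit of the first byte to simulate corruption.
<<first, rest::binary>> = frame
corrupted = <<:erlang.bxor(first, 1), rest::binary>>
```

Decoding `frame` returns `{:ok, "sketch-bytes"}`, while decoding `corrupted` returns `{:error, :checksum_mismatch}` instead of handing back silently wrong data.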

Breaking changes (2):

  1. EXSK v2 is one-way. v0.7.x readers can't decode v2 frames. Deploy readers first, then producers.
  2. hash_strategy: :murmur3 is no longer silently overridden to :xxhash3. Sketches that specified Murmur3 will now actually use it — estimates are correct but differ from v0.7.x.

One-liner upgrade:

{:ex_data_sketch, "~> 0.8.0"}

Most users need no code changes. Full migration guide ships in HexDocs.

Stats: 1,317 tests, 171 properties, 92.7% coverage, 0 credo issues.

GitHub | Hex | Docs

u/Shoddy_One4465 — 9 days ago
▲ 20 r/erlang+1 crossposts

Released v0.2.0 of ExSystolic -- a BEAM-native systolic array simulator. If you're into parallel algorithms, dataflow computing, or just like seeing deterministic parallelism on the BEAM, this might be interesting.

What's a systolic array? It's a grid of simple processors (PEs) connected by FIFO links, all driven by a global clock. Data pulses through the grid one tick at a time. The canonical use case is matrix multiplication, but the same pattern works for convolution, shortest paths (tropical semi-ring), and any sliding-window computation.

A systolic array is a hardware execution model designed for high-throughput, repetitive computations where data flows through a grid of simple processing units. Instead of constantly moving data back and forth from memory (the real bottleneck in modern systems), it keeps data in motion and reuses it as it propagates through the array. This makes it extremely efficient for workloads dominated by linear algebra—especially matrix multiplications, convolutions, and streaming transformations.
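To give a feel for the "data pulses through the grid one tick at a time" idea without any library, here is a toy timing model of an output-stationary array in plain Elixir. It assumes the usual skewed feeding, so the PE at `{i, j}` sees the operand pair `a[i][k]`, `b[k][j]` at tick `i + j + k` and accumulates the product locally. It does not simulate FIFO links, and `ToySystolic` is a made-up module, not part of ExSystolic.

```elixir
defmodule ToySystolic do
  # a is n x m, b is m x p, both as lists of lists.
  def gemm(a, b) do
    n = length(a)
    m = length(b)
    p = length(hd(b))

    # The last operand pair reaches the far-corner PE at this tick.
    last_tick = n - 1 + (p - 1) + (m - 1)

    # One accumulator per PE, all starting at zero.
    acc0 = for i <- 0..(n - 1), j <- 0..(p - 1), into: %{}, do: {{i, j}, 0}

    # Global clock: every PE takes one step per tick.
    acc = Enum.reduce(0..last_tick, acc0, fn tick, acc -> step(acc, a, b, m, tick) end)

    for i <- 0..(n - 1) do
      for j <- 0..(p - 1) do
        Map.fetch!(acc, {i, j})
      end
    end
  end

  # On a given tick, PE {i, j} multiplies-and-accumulates only if an
  # operand pair has arrived (0 <= k < m); otherwise it idles.
  defp step(acc, a, b, m, tick) do
    Map.new(acc, fn {{i, j}, sum} ->
      k = tick - i - j

      if k >= 0 and k < m do
        {{i, j}, sum + at(a, i, k) * at(b, k, j)}
      else
        {{i, j}, sum}
      end
    end)
  end

  defp at(matrix, row, col), do: matrix |> Enum.at(row) |> Enum.at(col)
end

product = ToySystolic.gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

For the 2x2 inputs above, `product` is `[[19, 22], [43, 50]]`, and the model needs ticks 0 through 3 to finish because the far-corner PE receives its last operands at tick 3.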

This is why systolic designs underpin much of today’s AI and high-performance compute stack. Google TPUs, accelerators from NVIDIA, and chips used by Tesla all leverage similar principles to power neural networks, computer vision, and real-time inference. The same model also applies to signal processing, scientific computing, and even emerging database and graph workloads—making systolic execution a compelling abstraction for building next-generation data and compute systems.

What's new in v0.2.0:

  • Parallel backend -- splits arrays into tiles, dispatches them in parallel via Task.Supervisor or a Poolex worker pool. The interpreted (sequential) backend still works.
  • Proven determinism -- both backends follow the same 6-step BSP contract. Conformance tests verify that interpreted and partitioned backends produce identical PE states and trace events. The parallel backend uses ordered: true dispatch and sorts trace events by {tick, coord}.
  • Pluggable topology -- ExSystolic.Space behaviour with a new links/2 callback. Default is 2D grid, but you can implement graph spaces, hierarchical layouts, etc.
  • Shared link operations -- Backend.LinkOps eliminates triple-duplicated inject/read/write logic (~150 LOC removed).
  • 98.4% test coverage, 185 tests + 34 doctests, 0 dialyzer errors.
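The determinism point is worth a small illustration. The sketch below processes "tiles" in parallel with `Task.async_stream/3` (with `ordered: true`, results come back in input order regardless of which task finishes first) and then sorts the emitted trace events by `{tick, coord}`, so the parallel run is indistinguishable from the sequential one. Everything here (`ToyTiles`, the event shape, the tile layout) is hypothetical and only mirrors the approach described above, not ExSystolic's internals.

```elixir
defmodule ToyTiles do
  # Sequential reference: step every tile in order, then sort the trace.
  def run_sequential(tiles, tick) do
    tiles
    |> Enum.flat_map(&step_tile(&1, tick))
    |> sort_trace()
  end

  # Parallel version: one task per tile, results kept in input order.
  def run_parallel(tiles, tick) do
    tiles
    |> Task.async_stream(&step_tile(&1, tick), ordered: true)
    |> Enum.flat_map(fn {:ok, events} -> events end)
    |> sort_trace()
  end

  # A tile is a list of PE coordinates; each PE emits one trace event.
  defp step_tile(coords, tick) do
    for coord <- coords do
      %{tick: tick, coord: coord}
    end
  end

  # A canonical order makes the trace independent of tile boundaries.
  defp sort_trace(events), do: Enum.sort_by(events, &{&1.tick, &1.coord})
end

tiles = [[{1, 1}, {0, 0}], [{0, 1}, {1, 0}]]
same? = ToyTiles.run_sequential(tiles, 0) == ToyTiles.run_parallel(tiles, 0)
```

Sorting by a total key like `{tick, coord}` is what makes the trace comparable across backends. Ordered dispatch alone only guarantees the order of tiles, not of events inside them.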

Quick example (2x2 GEMM, both backends):

alias ExSystolic.{Array, Clock, PE.MAC, Examples.GEMM}

a = [[1,2],[3,4]]
b = [[5,6],[7,8]]

array =
  Array.new(rows: 2, cols: 2)
  |> Array.fill(MAC)
  |> Array.connect(:west_to_east)
  |> Array.connect(:north_to_south)
  |> Array.input(:west, GEMM.west_streams(a, 2, 2, 2))
  |> Array.input(:north, GEMM.north_streams(b, 2, 2, 2))

# Sequential
interp = Clock.run(array, ticks: 5) |> Array.result_matrix()

# Parallel (same result!)
part = Clock.run(array, ticks: 5, backend: :partitioned) |> Array.result_matrix()

interp == part  # => true

The README has a full tutorial on systolic arrays, including image convolution and shortest-path examples.

Would love feedback, especially on the Space/topology abstraction and the parallel dispatch design.

u/Shoddy_One4465 — 18 days ago