[ANN] ExDataSketch v0.9.0 - Streaming Integrations
Summary
ExDataSketch v0.9.0 adds stream-native integration, production persistence, structured observability, and ULL accuracy corrections. The release positions ex_data_sketch as a BEAM-native streaming approximate analytics infrastructure layer, not merely a collection of probabilistic algorithms.
Key Changes
- Stream/Collectable integration for all 13 mergeable sketch types
- Broadway, GenStage, and Flow pipeline integration (all optional)
- 5 persistence backends: ETS, DETS, CubDB, Mnesia, Ecto
- Structured :telemetry events + OpenTelemetry bridge
- ULL linear counting + large range correction (62.5% -> 0.8% error at p=8/n=1000)
- Configurable update_many chunk size (HLL, ULL, CMS, Theta)
- EXSK v1 serialization escape hatch for backward compatibility
- 9 production-oriented Livebooks
- 3 new educational guides (aggregation_wall, distributed_merge_semantics, livebooks)
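For readers unfamiliar with the Collectable protocol, the toy example below shows the general pattern that lets a stream be piped into a data structure with Enum.into/2. ToyDistinct is a hypothetical stand-in backed by an exact MapSet. It is not an ExDataSketch module and does not reflect the library's actual API.

```elixir
defmodule ToyDistinct do
  # Hypothetical stand-in for a sketch: it stores every item exactly,
  # so it is NOT probabilistic and NOT part of ExDataSketch.
  defstruct seen: MapSet.new()

  # Number of distinct items collected so far.
  def count(%__MODULE__{seen: seen}), do: MapSet.size(seen)
end

defimpl Collectable, for: ToyDistinct do
  # Collectable.into/1 returns the initial accumulator plus a collector
  # function that handles {:cont, item}, :done, and :halt.
  def into(sketch) do
    collector = fn
      acc, {:cont, item} -> %{acc | seen: MapSet.put(acc.seen, item)}
      acc, :done -> acc
      _acc, :halt -> :ok
    end

    {sketch, collector}
  end
end

# Pipe a lazy stream straight into the structure.
result =
  1..1_000
  |> Stream.map(&rem(&1, 100))
  |> Enum.into(%ToyDistinct{})

IO.inspect(ToyDistinct.count(result))
```

A real sketch would hold a fixed-size summary rather than every item, but any struct that implements Collectable can be the target of Enum.into/2 in the same way.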
Backward Compatibility
Full backward compatibility with v0.8.0. No API changes to existing modules.
ULL estimates at low cardinalities (p < 12, n < 500) are more accurate but may differ numerically from v0.8.0.
New Dependencies
- :telemetry ~> 1.0 (required)
- :broadway, :flow, :cubdb, :ecto_sql, :opentelemetry_api (optional)
Modules Added (21)
ExDataSketch.Stream
ExDataSketch.Broadway
ExDataSketch.Broadway.PeriodicAggregator
ExDataSketch.Flow
ExDataSketch.GenStage
ExDataSketch.GenStage.SketchConsumer
ExDataSketch.GenStage.SketchProducer
ExDataSketch.GenStage.SketchStage
ExDataSketch.Storage
ExDataSketch.Storage.ETS
ExDataSketch.Storage.DETS
ExDataSketch.Storage.CubDB
ExDataSketch.Storage.Mnesia
ExDataSketch.Storage.Ecto
ExDataSketch.Storage.Ecto.Schema
ExDataSketch.Storage.Ecto.Migration
ExDataSketch.Telemetry
ExDataSketch.Telemetry.OpenTelemetry
ExDataSketch.Integration
ExDataSketch.Binary (encode_v1/4)
Livebooks (9)
streaming_cardinality, broadway_integration, genstage_aggregation, rolling_telemetry, distributed_merges, persistence_snapshots, livedashboard_integration, ai_token_analytics, phoenix_observability
Guides (3 new + 7 updated)
New: aggregation_wall, distributed_merge_semantics, livebooks
Updated: streaming_sketches, broadway_integration, genstage_integration, flow_integration, persistence, telemetry, observability
Benchmarks (5 new)
persistence_bench, serialization_bench, merge_throughput_bench, update_many_chunk_bench, stream_ingestion_bench
Upgrade Path
No code changes required. Update dependency to {:ex_data_sketch, "~> 0.9.0"}.
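As an illustration, the dependency entry in a project's mix.exs might look like the following. The MyApp module and :my_app application name are placeholders, not part of the release.

```elixir
# mix.exs (MyApp and :my_app are placeholder names)
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      deps: deps()
    ]
  end

  defp deps do
    [
      # "~> 0.9.0" allows any 0.9.x release but not 0.10.0.
      {:ex_data_sketch, "~> 0.9.0"}
      # Optional integrations (:broadway, :flow, :cubdb, :ecto_sql,
      # :opentelemetry_api) are only needed if you use them; add each one
      # here with a version requirement that suits your project.
    ]
  end
end
```

After editing the file, run `mix deps.get` to fetch the new version.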
Known Risks
- ULL low-cardinality estimates differ from v0.8.0 (more accurate)
- Membership filter raw-NIF hashing deferred to v0.10.0
- Mnesia compile warnings are pre-existing OTP tracking limitations
- OTEL integration requires :opentelemetry_api ~> 1.0