r/elasticsearch

VM elastic best practices

Hello everyone,

I have an Elastic VM hosted on my VMware ESXi 8 server.

Recently, I’ve been noticing slowness on this VM, and after reading a bit about it, I saw that one possible cause of slowness/freezing could be the VM’s CPU Ready.

Look:

https://preview.redd.it/w4y346h3pc2h1.png?width=728&format=png&auto=webp&s=80d802dde0b2f3bb99cb601e3120a8873d491059

How bad are my current parameters? What would be considered ideal?
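For reference, the CPU Ready counter in vSphere performance charts is a summation in milliseconds, and it is usually converted to a percentage before judging it. Below is a minimal sketch of that conversion. It assumes the value was read from the VM-level counter (which sums all vCPUs) and divides by the vCPU count to get a per-vCPU average. The sample reading and vCPU count are made up, and the often-quoted "keep it under about 5% per vCPU" figure is a rule of thumb rather than an official limit.

```python
# Chart sampling intervals in seconds for vSphere performance charts.
CHART_INTERVAL_SECONDS = {
    "realtime": 20,
    "past_day": 300,
    "past_week": 1800,
    "past_month": 7200,
    "past_year": 86400,
}


def cpu_ready_percent(ready_ms: float, chart: str = "realtime", vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) into an average per-vCPU percentage."""
    interval_ms = CHART_INTERVAL_SECONDS[chart] * 1000
    return ready_ms * 100 / interval_ms / vcpus


# Hypothetical reading: 4000 ms of ready time on a 4-vCPU VM, real-time chart.
print(cpu_ready_percent(4000, "realtime", vcpus=4))  # 5.0
```

The same millisecond value means very different things depending on the chart interval, so it helps to note which chart the screenshot came from.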

reddit.com

Ingest pipeline doesn't work

Hi,
I want to send logs through an ingest pipeline to rename their fields. The pipeline looks like it's running, but the names aren't changing.

If I test it with a random document from the index, it says it worked and all the processors show a green check, but the names just won't change.

I'm trying to ingest logs from Hayabusa, so every log has a different set of fields.

Thanks for any help I can get.
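A minimal sketch of what a rename pipeline for logs with varying fields could look like, assuming it is generated from a mapping of old to new names. The field names and the `build_rename_pipeline` and `apply_renames` helpers are hypothetical. `rename`, `field`, `target_field`, and `ignore_missing` are real processor options, and `ignore_missing` matters when not every log carries every field. The local function only mimics the rename on flat field names so the mapping can be checked offline.

```python
import json


def build_rename_pipeline(renames: dict[str, str]) -> dict:
    """Build an ingest pipeline body with one rename processor per field."""
    return {
        "description": "Rename fields (illustrative)",
        "processors": [
            {
                "rename": {
                    "field": old,
                    "target_field": new,
                    # Skip documents that do not contain this field.
                    "ignore_missing": True,
                }
            }
            for old, new in renames.items()
        ],
    }


def apply_renames(doc: dict, pipeline: dict) -> dict:
    """Offline approximation of the rename processors for flat field names."""
    out = dict(doc)
    for proc in pipeline["processors"]:
        cfg = proc["rename"]
        if cfg["field"] in out:
            out[cfg["target_field"]] = out.pop(cfg["field"])
    return out


# Hypothetical field names for illustration only.
pipeline = build_rename_pipeline({"Computer": "host_name", "EventID": "event_code"})
print(json.dumps(pipeline, indent=2))
print(apply_renames({"Computer": "WIN-01", "Level": "high"}, pipeline))
```

A pipeline that passes a simulate test still only runs at index time if the indexing request names it or the index sets `index.default_pipeline`, so that is worth checking. Documents indexed before the pipeline was attached are not changed retroactively.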

u/Choice-Departure6379 — 2 days ago

Fetching rows beyond 10k index

We populate table data using the ES search API with pagination. The total row count is more than 10k, so if we go to the last page it returns empty rows. I found out that we need to use search_after or scroll_id, but we don't get pagination using those two, right? So is there a way to get pagination and also fetch rows beyond 10k?
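A minimal sketch of how `search_after` paging works, assuming a sort on a timestamp plus a unique tiebreaker field. The field names (`created_at`, `row_id`) and the in-memory `fake_search` stand-in are hypothetical; a real client would send the same body to the `_search` endpoint. Each request carries the sort values of the last hit from the previous page, so there is no `from` offset and the default 10,000 limit on `from + size` (`index.max_result_window`) does not come into play.

```python
from typing import Optional

# Hypothetical in-memory rows standing in for an index with more than 10k docs.
ROWS = [{"row_id": i, "created_at": 1_700_000_000 + i // 3} for i in range(10_500)]
SORT_FIELDS = ["created_at", "row_id"]  # row_id breaks ties between equal timestamps
ORDERED = sorted(ROWS, key=lambda r: [r[f] for f in SORT_FIELDS])


def build_body(size: int, search_after: Optional[list] = None) -> dict:
    """Build a search body; search_after is omitted on the first page."""
    body = {"size": size, "sort": [{f: "asc"} for f in SORT_FIELDS]}
    if search_after is not None:
        body["search_after"] = search_after
    return body


def fake_search(body: dict) -> list[dict]:
    """Stand-in for the search call: returns hits with their sort values."""
    rows = ORDERED
    after = body.get("search_after")
    if after is not None:
        rows = [r for r in rows if [r[f] for f in SORT_FIELDS] > after]
    return [
        {"_source": r, "sort": [r[f] for f in SORT_FIELDS]}
        for r in rows[: body["size"]]
    ]


def fetch_all(size: int = 1000) -> int:
    """Walk every page by feeding the last hit's sort values into the next request."""
    total, cursor = 0, None
    while True:
        hits = fake_search(build_body(size, cursor))
        if not hits:
            return total
        total += len(hits)
        cursor = hits[-1]["sort"]


print(fetch_all())  # 10500
```

For a table UI this gives next/previous style paging. Jumping straight to an arbitrary page number still needs either the stored cursor for that page or a raised `index.max_result_window`, which costs memory. Pairing `search_after` with a point in time (PIT) keeps results consistent between pages.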

u/Tharun_116 — 4 days ago
▲ 69 r/elasticsearch+1 crossposts

Faster vector search in Elasticsearch with SIMD (deep dive into the new engine)

Hey folks,

I’ve been working on improving vector search performance in Elasticsearch and wanted to share a deep dive into a new SIMD-accelerated vector search engine we’ve been building.

We focus on:

  • How SIMD is used to speed up vector similarity computations
  • What changes were made under the hood in Elasticsearch
  • Real performance gains and tradeoffs
  • When this approach actually helps (and when it doesn’t)

If you're working with kNN, embeddings, or large-scale retrieval systems, this might be useful.

Would love feedback from anyone running vector search in production — especially around bottlenecks or tuning challenges.

Blog post:
https://www.elastic.co/search-labs/blog/elasticsearch-vector-search-simdvec-engine

u/chegar999 — 9 days ago
▲ 5 r/elasticsearch+1 crossposts

SQL Full-Text Search vs ElasticSearch

We're looking to implement a full-text search of .pdf documents we have stored in a SQL database. The application front-end is Angular. The plan is a textbox within the application that users can type a search term into and have it bring up all .pdfs that contain that term.

The documents are stored as VARBINARY(MAX) FILESTREAM in a single SQL table. There are currently around 500,000 .pdfs in the table and we average approx. 4,800 new .pdfs added each month.

I want something that will return results to the user within a couple of seconds and that won't require any manual process when new .pdfs are added. It needs to handle multi-page .pdfs and should allow us to retain our existing security restrictions on what documents the user is allowed to see.

Based on my research it seems like Elasticsearch is the best tool for this, but I've also been looking at the native SQL Server full-text search feature. It seems like it would be significantly easier to implement and maintain, but I'm worried about performance given the number of files.

I'm new to full-text search. Does anyone have any experience with these tools? Or have a recommendation for a different one?

Thanks!

u/NationalMonument — 9 days ago
▲ 25 r/elasticsearch+4 crossposts

paradedb/benchmarker: a workload agnostic, multi-backend benchmarking tool.

Hi r/postgresql!

We just open sourced ParadeDB Benchmarker, a multi-backend benchmarking framework built on top of the excellent Grafana k6 (blog post).

One of the goals was avoiding a shared query abstraction layer. PostgreSQL queries stay PostgreSQL queries, with their own driver and native SQL.

Supports PostgreSQL, Elasticsearch, OpenSearch, ClickHouse, MongoDB, and ParadeDB with:

  • mixed read/write workloads
  • support for docker-compose profiles per backend
  • dataset loader
  • config and setup capture
  • live metrics + exported reports

One of the ah-ha moments I had building this was using the pgx Go driver in anger for the first time. I'm a Rust guy, but I'm seriously impressed with pgx and what it can do.

Any comments welcome, we will be using this to benchmark ParadeDB, but you can write your own datasets and workloads which have nothing to do with full-text search.

github.com
u/jamesgresql — 9 days ago

Reroute logs in different dataset

Hello guys,

I ingest logs from one SaaS solution through the pre-built Elastic Agent integration. The logs are pretty noisy and I want to reroute them to different namespaces (data streams) to apply different ILM policies.
What are my options?
I tried to reroute those logs via the *@custom pipeline using different fields, and it broke the integration (at least, no logs arrived from the integration until I emptied the pipeline by deleting all processors, lol). I am thinking of adding the reroute processors in the "final pipeline" after the logs are parsed. Is that a good idea at all?

I would appreciate any help regarding this.
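A minimal sketch of a `reroute` processor that changes only the namespace, assuming the noisy events can be told apart by a field. The condition field (`log.level`), the dataset and namespace values, and both helper functions are hypothetical. `reroute`, `namespace`, and `if` are real processor options, and data streams follow the `<type>-<dataset>-<namespace>` naming scheme.

```python
def reroute_processor(namespace: str, condition: str) -> dict:
    """Build a reroute processor that sends matching events to another namespace."""
    return {
        "reroute": {
            "namespace": namespace,
            # Painless condition; only matching documents are rerouted.
            "if": condition,
        }
    }


def target_data_stream(ds_type: str, dataset: str, namespace: str) -> str:
    """Name of the data stream a document lands in after rerouting."""
    return f"{ds_type}-{dataset}-{namespace}"


# Hypothetical condition, dataset, and namespace for illustration only.
proc = reroute_processor("noisy", "ctx.log?.level == 'debug'")
print(proc)
print(target_data_stream("logs", "saas.audit", "noisy"))  # logs-saas.audit-noisy
```

Once a `reroute` processor fires, the remaining processors of the current pipeline are skipped, so it usually goes last. If logs disappear after rerouting, one thing worth checking is whether the agent's API key is allowed to write to the new data stream.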

u/proclick- — 11 days ago

Dashboards

Hi,
Why is it so tricky to import an NDJSON file and get it to work? Is the syntax and formatting really that strict?

Does anyone have any tips or tricks for handling it more easily?
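On strictness: NDJSON is one complete JSON object per line, so pretty-printed or hand-edited files tend to fail. Below is a minimal sketch of a checker that reports which lines do not parse, assuming the file is a saved-objects export. The `check_ndjson` helper and the sample text are hypothetical.

```python
import json


def check_ndjson(text: str) -> list[tuple[int, str]]:
    """Return (line_number, error) for every non-empty line that is not a JSON object."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped here
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, "line is valid JSON but not an object"))
    return problems


# Hypothetical sample: the second object was pretty-printed across lines 2 and 3.
sample = '{"type": "dashboard", "id": "a"}\n{\n"type": "lens"}\n'
print(check_ndjson(sample))
```

Running a check like this before importing narrows the problem down to specific lines instead of a generic import error.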

u/ShirtResponsible4233 — 10 days ago

Best Practices for Handling Unmatched Logs

Hi, I’m looking for a good strategy to capture and monitor logs that are not matched by any existing parsing, filtering, or classification rules.

I’m considering setting up a dedicated dashboard for unmatched logs to improve visibility and identify missing patterns or filters over time. Maybe something like this already exists?

Do you already have a solution or recommended approach for this? Also, are there any RFCs, standards, or industry best practices related to handling unmatched or unclassified logs?
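One common pattern, sketched here under assumptions, is to let parsing failures tag the document instead of dropping it, so a dashboard can filter on the tag. The grok pattern, the `parse_status` and `parse_error` field names, and the `classify` helper are hypothetical. `on_failure` and the `_ingest.on_failure_message` metadata field are real ingest pipeline features.

```python
import re

# Pipeline body: if grok fails, mark the document rather than rejecting it.
PIPELINE = {
    "processors": [
        {"grok": {"field": "message", "patterns": ["%{IP:client_ip} %{WORD:verb}"]}},
        {"set": {"field": "parse_status", "value": "matched"}},
    ],
    "on_failure": [
        {"set": {"field": "parse_status", "value": "unmatched"}},
        {"set": {"field": "parse_error", "value": "{{ _ingest.on_failure_message }}"}},
    ],
}

# Offline approximation of the same idea with a plain regex.
PATTERN = re.compile(r"^(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3}) (?P<verb>\w+)")


def classify(message: str) -> dict:
    """Return the parsed fields plus a parse_status a dashboard could filter on."""
    match = PATTERN.match(message)
    if match is None:
        return {"message": message, "parse_status": "unmatched"}
    return {"message": message, "parse_status": "matched", **match.groupdict()}


print(classify("10.0.0.1 GET /index.html"))
print(classify("something nobody wrote a pattern for"))
```

Counting documents by `parse_status` over time then shows how much traffic falls through the existing rules and which sources need new patterns.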

u/ShirtResponsible4233 — 9 days ago
▲ 0 r/elasticsearch+1 crossposts

Built an embedded systems search engine that searches Stack Overflow, EE Stack Exchange + GitHub Issues simultaneously. Solo project from India, roast it

Hey r/embedded,

I’m a fresh B.Tech graduate from Gujarat, and I’ve been running into a recurring frustration while debugging embedded systems. Wanted to sanity check if others feel the same or if I’m just overthinking it.

When working on issues like STM32 I2C hangs, ESP32 WiFi instability, or FreeRTOS scheduling bugs, I usually end up jumping between:
  • Stack Overflow
  • Vendor docs (ST, Espressif, Nordic, etc.)
  • GitHub issues
  • Random forum threads
  • Community discussions

The info is all out there, but it’s super fragmented, and the constant context switching slows things down a lot.

To explore this, I built a small weekend prototype that tries to aggregate embedded debugging knowledge into one search flow. It pulls relevant discussions from multiple sources and shows consolidated answers instead of just links.

I’m also planning to integrate Reddit responses (where possible) since a lot of real-world debugging gold is there, and add datasheet analysis support as well.

The idea is to give two ways to use it:

  1. A normal platform-based search interface
  2. An MCP-based integration so AI tools can pull more grounded embedded answers directly

This is still very early and experimental, and I’m not even sure if it’s actually useful or just “nice in theory”.

I’d really appreciate feedback on:
  • Does this solve a real problem in your workflow or is it unnecessary abstraction?
  • How do you currently handle debugging across all these sources?
  • What would make something like this actually useful (or less annoying)?
  • Any ideas on what direction this should go next?

Example queries it handles:
  • STM32 I2C bus hanging
  • ESP32 WiFi drops connection
  • FreeRTOS task priority not working
  • MOSFET gate driver issues

If anyone’s curious, here’s the prototype (very rough):
embedmcp.up.railway.app

GitHub repo:
https://github.com/Sakriya-Kirtan/embedmcp

Happy to take any criticism, especially the blunt kind. Trying to figure out if this is worth iterating on or just a learning project.

u/HalfNote_ — 11 days ago