r/zabbix


Port monitoring

Hi,

I am pretty new to Zabbix. I have a server set up with several agents working in either passive or active mode. I have a new service/server that needs to be monitored. Right now I check whether the service is up by telnetting to its port to see if a TCP connection is established. That's all I need, and I would like to recreate that check in Zabbix.

In my head I would like to create the host and run that test periodically. If it comes back "open" the little icon stays green, if not it turns red. I don't know if I'm even on the correct path. Is this the Zabbix way?

I found two different recommendations: one was to use a Zabbix agent item, the other a Simple check. The key for both is net.tcp.service[tcp,4.200.200.200, 678]
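Both item types do what the telnet test does: open a TCP connection to the IP and port in the key. The difference is where the connection starts from: a Simple check is performed by the Zabbix server (or proxy), while a Zabbix agent item is performed by the agent on the monitored host. As far as I know, net.tcp.service returns 1 when the connection succeeds and 0 when it does not, so a trigger that fires on 0 gives the green/red behavior described above. A minimal Python sketch of the same check, assuming a plain TCP handshake is all that is needed (the function name and the 3-second timeout are illustrative, not Zabbix settings):

```python
import socket


def tcp_service_check(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 1 if a TCP connection to host:port can be opened, else 0.

    This mirrors the 1/0 result of net.tcp.service[tcp,<ip>,<port>]:
    it only checks that the TCP handshake completes, like a telnet test.
    """
    try:
        # create_connection completes the TCP handshake or raises OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Refused, unreachable and timed-out connections all count as "down".
        return 0
```

A matching trigger expression would be something like last(/Your host/net.tcp.service[tcp,4.200.200.200,678])=0, where "Your host" is a placeholder for the host name in Zabbix.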

When I test the active agent option, I get an error: "Received empty response from Zabbix Agent at [4.150.186.189]. Assuming that agent dropped connection because of access permissions." To me that seems like it passed. I'm reading it as: the server sent something back, but it was empty. That is enough for my purpose.

The Simple check item runs the test and doesn't throw an error; it just goes back to the test window. Did that pass?

I feel like I'm missing something.

reddit.com
u/miners-cart — 13 hours ago

How to get accurate CPU Utilization Hyper V Host?

Getting True CPU Utilization on Hyper-V Hosts in Zabbix

I'm trying to get accurate CPU utilization figures for my Hyper-V hosts in Zabbix, but I'm running into a discrepancy between what Zabbix reports and what I'm actually seeing in Windows.

The Problem: Zabbix (and Task Manager) are only reporting CPU usage for the host OS partition itself — not including the load from guest VMs. Resource Monitor, on the other hand, shows the true aggregate utilization including all VM workloads.

As you can see in my screenshot, Task Manager and Zabbix both show ~4% CPU — but Resource Monitor is sitting closer to 50%, which is the real picture when you factor in the VMs running on the host.

What I've tried / what I know:

  • Zabbix is pulling the standard Windows CPU performance counter, which only reflects the root partition
  • Resource Monitor appears to use a different counter that accounts for hypervisor-level usage

What I'm looking for: Has anyone successfully configured Zabbix to capture true host CPU utilization on Hyper-V — the kind that includes VM guest load? Specifically, I'm wondering if there's a Hyper-V-specific WMI query or a performance counter (like Hyper-V Hypervisor Logical Processor) that would give the right number.
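For reference, Zabbix agent reads Windows performance counters with the perf_counter_en[counter,<interval>] key, which takes the English counter path. My understanding, not verified against Resource Monitor, is that counters under the Hyper-V Hypervisor Logical Processor object (for example % Total Run Time, or % Guest Run Time for the share used by VMs) report physical CPU time as the hypervisor sees it, whereas \Processor(_Total)\% Processor Time in the root partition does not. The Python helper below is only a sketch that assembles the key string; the function name is hypothetical:

```python
from typing import Optional


def perf_counter_en_key(obj: str, counter: str, instance: str = "_Total",
                        interval: Optional[int] = None) -> str:
    """Build a Zabbix perf_counter_en[...] item key from counter path parts."""
    # Windows counter path syntax: \Object(Instance)\Counter
    path = f"\\{obj}({instance})\\{counter}"
    # Quote the path because it contains spaces, parentheses and '%'.
    params = f'"{path}"'
    if interval is not None:
        # Optional second parameter: averaging interval in seconds.
        params += f",{interval}"
    return f"perf_counter_en[{params}]"


# Hypervisor-level counters discussed in the question.
HOST_TOTAL_KEY = perf_counter_en_key("Hyper-V Hypervisor Logical Processor",
                                     "% Total Run Time")
GUEST_ONLY_KEY = perf_counter_en_key("Hyper-V Hypervisor Logical Processor",
                                     "% Guest Run Time")
```

Running typeperf -q "Hyper-V Hypervisor Logical Processor" on the host should list the exact counter names available there, which is worth checking before creating the item.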

Any tips or working templates would be greatly appreciated!

u/eld101 — 1 day ago

Zabbix maintenance period not suppressing alerts for hosts behind a proxy.

Running Zabbix 7.4.10 and experiencing an issue with maintenance periods not working correctly for hosts monitored through a Zabbix proxy.

The setup:
- Two hosts sitting behind a Zabbix proxy
- Maintenance period configured with both hosts explicitly listed
- Maintenance type: "With data collection"
- "Pause operations for suppressed problems" is enabled in the trigger action

The problem:
During the maintenance window, alerts are still being sent out.
When checking Monitoring - Problems with "Show suppressed problems" enabled, the problems don't show up at all - meaning the server never marked them as suppressed in the first place.
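One way to narrow this down: maintenance is evaluated by the Zabbix server, not the proxy, and the host objects returned by the API's host.get method include a maintenance_status field ("1" while a maintenance is in effect). Querying it during the window should show whether the server considers the two hosts to be in maintenance at all. A sketch assuming API token authentication through the Authorization: Bearer header; the URL, token and host names are placeholders:

```python
import json
import urllib.request


def host_maintenance_request(hostnames: list[str], request_id: int = 1) -> dict:
    """JSON-RPC body for host.get returning each host's maintenance fields."""
    return {
        "jsonrpc": "2.0",
        "method": "host.get",
        "params": {
            "output": ["hostid", "host", "maintenance_status",
                       "maintenance_type", "maintenanceid"],
            "filter": {"host": hostnames},  # technical host names
        },
        "id": request_id,
    }


def hosts_not_in_maintenance(result: list[dict]) -> list[str]:
    """Names of hosts the server does NOT consider to be in maintenance.

    The API returns maintenance_status as a string; "1" means in effect.
    """
    return [h["host"] for h in result if h.get("maintenance_status") != "1"]


def call_api(url: str, token: str, body: dict) -> dict:
    """POST a JSON-RPC body to api_jsonrpc.php using an API token."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json-rpc",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Usage would be along the lines of hosts_not_in_maintenance(call_api("https://zabbix.example.com/api_jsonrpc.php", "TOKEN", host_maintenance_request(["host-a", "host-b"]))["result"]). If both hosts come back as not in maintenance during the window, that would point at the maintenance definition (period, timezone, host selection, tag conditions) rather than at the proxy.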

After some digging I found this in the Zabbix documentation:
"The Zabbix proxy is not aware of maintenance periods because there is no synchronization of maintenance configuration between the Zabbix server and proxy."

So it seems the proxy just forwards data without knowing about the maintenance, and the server isn't handling the suppression correctly either.

Moving the hosts directly under the Zabbix server is not an option due to network/firewall restrictions.

Has anyone dealt with this and found a cleaner solution?
Would love to hear how others are handling maintenance periods in a proxy-based setup.

u/sgmmaffe — 3 days ago

Zabbix Agent: Active + Passive

Hi everyone,

In my Zabbix agent config, I have enabled both modes pointing to the same server IP:

Server=IP
ServerActive=IP

Is there any downside to, or reason against, running the agent in this "hybrid" mode (both active and passive) in production?

Does it cause any performance issues or conflicts, or is this considered standard practice?
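For context on what the two lines do: Server= lists who may connect in for passive checks (the server or proxy opens a TCP connection to the agent, port 10050 by default, and asks for one item key per connection), while ServerActive= tells the agent where to connect out to (port 10051 by default) to fetch its list of active checks and send results in batches. Both directions use the same framed protocol. The Python sketch below only illustrates that framing for a single passive request; it is a simplification, not a client to use in production:

```python
import struct

HEADER = b"ZBXD"
FLAG_ZABBIX_PROTOCOL = 0x01
FLAG_COMPRESSED = 0x02
FLAG_LARGE_PACKET = 0x04


def pack_request(item_key: str) -> bytes:
    """Frame a passive-check request (one item key) in the Zabbix header."""
    data = item_key.encode("utf-8")
    # "ZBXD" + flags (1 byte) + data length (uint32 LE) + reserved (uint32 LE)
    return HEADER + struct.pack("<BII", FLAG_ZABBIX_PROTOCOL, len(data), 0) + data


def unpack_response(packet: bytes) -> str:
    """Extract the payload from a packet framed with the same 13-byte header."""
    if packet[:4] != HEADER:
        raise ValueError("missing ZBXD header")
    flags, length, _reserved = struct.unpack("<BII", packet[4:13])
    if flags & (FLAG_COMPRESSED | FLAG_LARGE_PACKET):
        # zlib-compressed payloads and large packets (8-byte lengths)
        # need extra handling that this sketch omits.
        raise ValueError("compressed/large packets are not handled in this sketch")
    return packet[13:13 + length].decode("utf-8")
```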

Thanks for your feedback!

u/Level_Pool3403 — 4 days ago

Distributed monitoring across branch offices has become painful to manage

We support multiple branch offices and remote locations, and monitoring them consistently has become increasingly difficult. VPN instability, firewall rules, disconnected collectors, and inconsistent configurations create blind spots all the time. Expanding monitoring to a new site often feels more complicated than deploying the actual infrastructure there. I am trying to find a way to centralize visibility without constantly fighting connectivity and deployment issues, and I'm especially interested in approaches that don't require heavy infrastructure at every location.
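One approach Zabbix itself offers for this is a small proxy per site running in active mode: the proxy initiates the connection to the server (port 10051 by default), so the branch only needs an outbound firewall rule, and ProxyOfflineBuffer lets it keep collected data locally while the link is down. A SQLite-backed proxy is light enough for a small VM. A hypothetical Python helper that renders one consistent config per site, aimed at the "inconsistent configurations" problem; the naming scheme, path and values are illustrative assumptions:

```python
def render_proxy_conf(site: str, server: str, offline_buffer_hours: int = 24) -> str:
    """Render a minimal zabbix_proxy.conf for an active proxy at one site."""
    if not 1 <= offline_buffer_hours <= 720:
        raise ValueError("ProxyOfflineBuffer must be between 1 and 720 hours")
    settings = {
        "ProxyMode": "0",  # 0 = active: the proxy connects out to the server
        "Server": server,  # Zabbix server address the proxy reports to
        "Hostname": f"proxy-{site}",  # must match the proxy name in the frontend
        "DBName": f"/var/lib/zabbix/proxy-{site}.db",  # SQLite file (hypothetical path)
        "ProxyOfflineBuffer": str(offline_buffer_hours),  # hours kept while offline
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"
```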

u/Ste2_fan4 — 4 days ago

OPNsense monitoring API endpoints for Zabbix

Hello guys,
here are the most important OPNsense API endpoints for letting Zabbix pull monitoring information from OPNsense with HTTP agent items:
r/opnsense

"services":                "api/core/service/search",
"interfaces":              "api/diagnostics/traffic/interface",
"protocolStatistics":      "api/diagnostics/interface/get_protocol_statistics",
"pfStatisticsByInterface": "api/diagnostics/firewall/pf_statistics/interfaces",
"arp":                     "api/diagnostics/interface/search_arp",
"dhcpv4":                  "api/dhcpv4/leases/searchLease",
"openVPNInstances":        "api/openvpn/instances/search",
"openVPNSessions":         "api/openvpn/service/search_sessions",
"gatewaysStatus":          "api/routing/settings/searchGateway",
"unboundDNSStatus":        "api/unbound/diagnostics/stats",
"cronJobs":                "api/cron/settings/searchJobs",
"wireguardClients":        "api/wireguard/service/show",
"ipsecPhase1":             "api/ipsec/sessions/search_phase1",
"ipsecPhase2":             "api/ipsec/sessions/search_phase2",
"healthCheck":             "api/core/system/status",
"firmware":                "api/core/firmware/status",
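As far as I know, OPNsense authenticates API calls with HTTP Basic auth, using the API key as the user name and its secret as the password, which in a Zabbix HTTP agent item maps to HTTP authentication "Basic". To try an endpoint outside Zabbix first, a small Python sketch (the base URL and credentials are placeholders, and only a few endpoints from the list are included):

```python
import base64
import urllib.request

# A few of the endpoint paths from the list above.
ENDPOINTS = {
    "healthCheck": "api/core/system/status",
    "firmware": "api/core/firmware/status",
    "gatewaysStatus": "api/routing/settings/searchGateway",
}


def opnsense_request(base_url: str, path: str, api_key: str,
                     api_secret: str) -> urllib.request.Request:
    """Build a GET request authenticated with an OPNsense API key/secret pair."""
    # HTTP Basic credentials: key as user name, secret as password.
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )
```

Passing the result to urllib.request.urlopen(req, timeout=10) sends it; a firewall using a self-signed certificate would also need an ssl context that trusts that certificate.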

u/OccasionExtra8029 — 3 days ago

Zabbix on EKS

Does anyone have real-world use cases of running Zabbix in an Amazon EKS environment?

I’d like to understand the best way to make Zabbix scale automatically as the environment grows.

How many replicas do you usually run for the Zabbix server/proxies?

What is the recommended approach for autoscaling?

Is there a way to run the cluster in active-active mode with load balancing between all nodes/instances?

Any recommendations regarding HA architecture, proxies, or database scalability?

Would appreciate any architecture examples, lessons learned, or production experiences.

u/FG1100 — 5 days ago

Ubiquiti AirOS SNMP: Interface ath1 – High Error Rate (>2 for 5m)

Hi everyone,

I’m having an issue with a trigger in my monitoring dashboard related to a Ubiquiti AirOS device.

The trigger “Interface ath1: High error rate (>2 for 5m)” is constantly being activated, and I’m trying to understand the real cause instead of simply increasing the macro threshold to hide the alert.

At the moment, the interface appears to be working normally, but the error counter keeps increasing and generating alerts frequently. I’m not sure if this could be related to:

  • Wireless interference
  • Signal quality issues
  • Hardware problems
  • Duplex/speed mismatch
  • High traffic load
  • SNMP polling behavior
  • Something specific in AirOS

Has anyone experienced the same issue before?

I’d like to properly troubleshoot and identify the root cause rather than just suppressing the alert. Any suggestions on what I should check first, or which metrics/logs are most useful for diagnosing this kind of problem?
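It may help to pin down what the trigger measures before chasing causes. If it comes from the stock Zabbix SNMP interface template (an assumption, since the expression is not shown), the error items store ifInErrors/ifOutErrors as a change per second, and the trigger fires when the minimum rate over the last 5 minutes exceeds {$IF.ERRORS.WARN} (2 by default). So ">2 for 5m" would mean every sample in the window saw more than 2 errors per second, not 2 errors in total. A Python sketch of that evaluation with made-up samples:

```python
def per_second_rates(samples):
    """Convert (timestamp, counter) samples into (timestamp, rate) pairs.

    Approximates 'Change per second' preprocessing; a counter decrease
    (for example after a reboot) is simply skipped in this sketch.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if c1 >= c0 and t1 > t0:
            rates.append((t1, (c1 - c0) / (t1 - t0)))
    return rates


def high_error_rate(rates, now, window=300, threshold=2.0):
    """True if the minimum rate within the last `window` seconds exceeds `threshold`."""
    recent = [rate for t, rate in rates if now - window < t <= now]
    return bool(recent) and min(recent) > threshold
```

Graphing the per-second in and out error rates next to traffic and wireless signal metrics for the same period is one way to see whether the errors track load or RF conditions.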

Thanks in advance.

u/RPIEROTTI — 7 days ago

Generate PDF Report

Hey guys,

Currently I'm building a monitoring dashboard for APs and smart cabinets in Zabbix 7.0.25. Is there any way to generate monthly and yearly PDF reports?

As far as I know, Zabbix only has scheduled reports, but I want to generate a report whenever I want. I asked ChatGPT and it recommended using the Zabbix API plus an external PDF generator, which requires coding and is a bit of a hassle, to be honest.
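For the API route, the main building block is asking trend.get for the hourly min/avg/max values over a fixed period; the PDF step can then be any tool. A Python sketch that computes the Unix time bounds of a calendar month (in UTC, which is an assumption; adjust for your timezone) and builds the request body; the item IDs are placeholders:

```python
import calendar
from datetime import datetime, timezone


def month_bounds_utc(year: int, month: int) -> tuple[int, int]:
    """Return (time_from, time_till) Unix timestamps covering one month in UTC."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    days = calendar.monthrange(year, month)[1]  # number of days in the month
    end = start.timestamp() + days * 86400 - 1  # last second of the month
    return int(start.timestamp()), int(end)


def monthly_trends_request(itemids: list[str], year: int, month: int,
                           request_id: int = 1) -> dict:
    """JSON-RPC body for trend.get (hourly min/avg/max) over one month."""
    time_from, time_till = month_bounds_utc(year, month)
    return {
        "jsonrpc": "2.0",
        "method": "trend.get",
        "params": {
            "output": ["itemid", "clock", "value_min", "value_avg", "value_max"],
            "itemids": itemids,
            "time_from": time_from,
            "time_till": time_till,
        },
        "id": request_id,
    }
```

A yearly report could reuse the same call once per month and combine the results.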

Appreciate the input from you guys!

u/cytrium — 9 days ago

We had Grafana for 2 years.

At work we currently use Grafana to visualize alerts, but I'm starting to question whether it's the right choice. Since Grafana is purely a visualization layer and relies on an external data source for alerting, it feels like an incomplete solution on its own.

I've been thinking that a better architecture would be: a Zabbix server in High Availability, with a Zabbix Proxy and Agents deployed per customer. This would cover data collection, alerting, escalation, and dashboarding all in one place.

That said — am I wrong? Is there a strong case for Grafana that I'm missing? I'm genuinely curious if anyone has a setup where Grafana truly shines over a full monitoring stack like Zabbix.

u/MediumAd7537 — 13 days ago

Email notification issue

We have Zabbix deployed in a Docker container and recently updated the SendGrid password. Since then, Zabbix email notifications have stopped working, even after updating the new password in the media type configuration.

I tested email sending manually from inside the container, and emails are being received successfully. However, when testing through the Zabbix media type, it fails with the following error:

"Timeout was reached: Connection timed out after 40002 milliseconds."

Any idea how I can resolve it?
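The wording of the error suggests the TCP connection to the SMTP server never completes, rather than the new password being rejected. In a Docker deployment, media type tests and alerts are sent by the Zabbix server process, so the container running zabbix-server is the one that needs outbound access to the SMTP host and port set in the media type; a manual test from another container would not prove that. A Python sketch to run from the server container, assuming Python is available there and that the media type uses STARTTLS (for example smtp.sendgrid.net on port 587; port 465 with implicit TLS would need smtplib.SMTP_SSL instead):

```python
import smtplib
import ssl


def smtp_probe(host: str, port: int = 587, timeout: float = 10.0) -> tuple[bool, str]:
    """Connect, send EHLO and try STARTTLS; report how far it got.

    No login and no mail is sent: this only checks reachability and TLS.
    """
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            code, _ = smtp.ehlo()
            if not smtp.has_extn("starttls"):
                return True, f"connected (EHLO {code}), STARTTLS not offered"
            smtp.starttls(context=ssl.create_default_context())
            return True, "connected and STARTTLS succeeded"
    except OSError as exc:
        # Timeouts, refused connections, TLS errors and smtplib.SMTPException
        # are all subclasses of OSError.
        return False, f"{type(exc).__name__}: {exc}"
```

Calling smtp_probe("smtp.sendgrid.net", 587) and getting a timeout there would match the error from the media type test.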

u/desiboyomi — 11 days ago

Is there some kind of baseline of what to monitor in an M365/Windows environment?

Hello everyone!

I've pretty much just started working with Zabbix in an M365/Windows environment, and I've helped the company where I'm currently doing my internship set up monitoring for things like:

- AD Replication
- Certificate expiration on "critical" certificates
- RDS-related monitoring
- Specific Windows services (not included in the Zabbix templates for Windows)
- Other AD-related monitoring
And more.
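Since certificate expiry is already on the list: the quantity worth alerting on is the number of days until notAfter, with a warning threshold and a higher-severity one. If the hosts run Zabbix agent 2, its web.certificate.get item returns certificate details as JSON that can feed this kind of calculation through preprocessing. A Python sketch of the arithmetic itself, using the standard library's parser for the OpenSSL-style date string (the 30/7-day thresholds are illustrative):

```python
import ssl


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (Unix time) and a certificate's notAfter string."""
    # Expects the format used by ssl/OpenSSL, e.g. "Jan  5 09:34:43 2018 GMT".
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def expiry_severity(days_left: float, warn: int = 30, high: int = 7) -> str:
    """Map remaining days to a severity label (thresholds are examples)."""
    if days_left <= high:
        return "high"
    if days_left <= warn:
        return "warning"
    return "ok"
```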

I've also set up triggers for the monitoring I've created, adjusted some of the existing triggers to behave more precisely, and created a couple of dashboards that bring some of the more important services/servers together in one place.

Outside of this I feel like we monitor the "standard stuff" in our Zabbix environment.

What I'm now looking for is not the "basic server/service monitoring", but more the things that templates and the basic configuration don't actually include/monitor.

Do you guys have any tips, tricks, or examples of things that are worth monitoring in Zabbix, but that people often miss when setting up their environment?

I'm especially interested in practical examples/tips from real environments.

Thanks in advance and take care!

u/hehe123exde — 11 days ago

File Based Appliance Upgrades

In Checkmk, you can upload a single file to upgrade everything. Does Zabbix have anything like this for its appliance?

Or is there a way to make a Docker install work in an air-gapped network, where I can just pull some Docker images and use a Docker host OS that supports file-based updates?

u/flappysack- — 12 days ago

Help with getting certain information into alert

Hello,

I've got a nice Teams alert working with lots of information sent within the alert. However I just can't get the firmware version to show in the alert.

Here is what I have.

NVMe Device: {#NVME_DEVICE}

I'm using a discovery like this which works:

https://preview.redd.it/k3omhe4b220h1.png?width=1826&format=png&auto=webp&s=beb35e6d098cefa9ac0fb9a8fb0735f54ce4ea1b

Trigger works:

https://preview.redd.it/svcttosyk30h1.png?width=1468&format=png&auto=webp&s=63e9015931e88c1cb6c7d63cebf7311bbceab96f

Latest data is good:

https://preview.redd.it/c92w20fl220h1.png?width=1580&format=png&auto=webp&s=e053df1636be5a3a54aa0350316fd0a62f90deb2

This is what my Teams alert shows:

https://preview.redd.it/t6dz9myq220h1.png?width=700&format=png&auto=webp&s=19ec9835c69d3fc9586d1d98f5aea7305e74a487

With this in the message:

NVMe Device: {#NVME_DEVICE}
NVMe Status: {ITEM.VALUE1}
{EVENT.OPDATA}

Should I be using {ITEM.VALUE1}?
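As far as I know, LLD macros such as {#NVME_DEVICE} are only substituted when discovery creates items and triggers from prototypes; action and media type messages are rendered later, from the event, so an LLD macro written there stays literal. A common workaround is to copy the value into the trigger prototype, for example a tag named device with the value {#NVME_DEVICE} (the tag name is hypothetical), and reference it in the message as {EVENT.TAGS.device}. {ITEM.VALUE1} is the value of the first item in the trigger expression, so a firmware version would only show up through it if that item were part of the expression. A toy Python model of the two stages (not Zabbix's implementation; unresolved macros are simply left as-is here):

```python
import re

# LLD macro syntax: {#NAME} with upper-case letters, digits, '_' and '.'.
LLD_MACRO = re.compile(r"\{#[A-Z0-9_.]+\}")
EVENT_TAG_MACRO = re.compile(r"\{EVENT\.TAGS\.([^}]+)\}")


def expand_lld(text: str, lld_row: dict[str, str]) -> str:
    """Stage 1 (discovery): replace {#MACRO} from one LLD row; unknown ones stay."""
    return LLD_MACRO.sub(lambda m: lld_row.get(m.group(0), m.group(0)), text)


def render_message(template: str, event_tags: dict[str, str]) -> str:
    """Stage 2 (alert): replace {EVENT.TAGS.<name>}; {#...} macros are untouched."""
    return EVENT_TAG_MACRO.sub(lambda m: event_tags.get(m.group(1), m.group(0)),
                               template)
```

For testing without waiting for a real failure, the media type's Test action, as far as I know, sends a message built from values you type in by hand, so it checks delivery and formatting but not how macros resolve from a real event.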

Is there a way to test these alerts without having to wait until the issue happens?

Thanks

u/bgprouting — 13 days ago