r/ansible

▲ 10 r/ansible

Semantics question: do you use the .j2 file extension on templates, or do you prefer to keep the original file extensions?

This is a discussion to check what the community at large prefers, and why.

My philosophy is always to keep my working environment as simple as possible, because complexity raises the chance of problems, and problems in my work environment are something I absolutely do not want to waste time on.

And therefore I keep most of my template files in their original file extension, .ini, .yaml, or whatever that might be. I never add the .j2 extension.

Because the j2 part of templates is 99/100 times a tiny part, most of the file is in its original syntax.

So why should I add complex editor plugins that first parse the file as j2, and then also have to support various upstream formats like yaml, ini, toml or whatever? It seems unnecessarily complex.

I'm a vim user if that matters, but I think the same philosophy applies to any editor or IDE.

reddit.com
u/sajkoterrapefft — 21 hours ago

Sending email without using SMTP?

Hey everyone,
Pretty new to Ansible. I was trying to send myself an email from Ansible. Unfortunately, the org that I work for does not support using an SMTP server for Ansible. I was able to get the info using a Teams webhook.
My question is: are there any other ways I can send myself an email? One thing I thought of was, instead of using the Ansible built-in email module, to run a curl command that uses SMTP to send the email. Any advice is appreciated.

▲ 39 r/ansible+2 crossposts

Built a Dockerized Ansible lab with a browser-based IDE

I built a portable Ansible lab that spins up in seconds using Docker. Thought some of you might find it useful for learning or testing playbooks.

https://github.com/Yoas1/ansible-handson

The setup:

  • **1 controller** (Python + Ansible + code-server IDE on port 8080)
  • **2 workers** — one Ubuntu 22.04, one Red Hat UBI 9
  • Pre-configured SSH keys (Ed25519), inventory, ansible.cfg, Vault, and linters

You literally run `docker compose up`, open your browser, and start writing/running playbooks. No manual VM setup, no SSH config headaches.

What I like about it:

  • **Hot-reload configs** — edit .config/ files and inotifywait auto-applies them via update_config.sh
  • **Pre-commit hooks** built in — yamllint, ansible-lint, shellcheck, markdownlint all run before commit
  • **Multi-distro workers** — test your playbooks against both Debian-based and RHEL-based systems
  • **Code-server** — full VS Code in the browser with Ansible and Python extensions

Would love feedback or ideas for improvement. The full setup is on my GitHub if anyone wants to check it out.

Cheers

u/yoas1a — 1 day ago

VMware Ubuntu VM Provisioning and Cloud-init config

I have written a playbook that provisions an Ubuntu 24.04 VM from a template using VMware, which works fine.

However, I am then trying to pass cloud-init config from Ansible through VMware and into the VM. Most of this config works; the only thing I cannot get the VM to detect or apply is the networking config.

VMware keeps inserting its own netplan config that uses DHCP on IPv6.

The only way I have gotten my config to appear at all is to have the userdata create a file containing the netplan config and then run netplan apply in runcmd.

However, when I do this the VMware netplan config still applies, and provisioning time goes from under a minute to over 5 minutes: the boot gets stuck on various services and the VM reboots when it reaches the login screen.

Anyone got any ideas? I would post the playbook, but there have been many iterations of the networking and I was curious whether there is just something very obvious I'm missing.

I have added disable_vmware_customization: true to a config file and all other cloud-init config seems to be applying (some file changes, hostname, etc.); I'm just really struggling with the networking.

u/HPFOC — 2 days ago

Login with password

I get assigned machines with a temporary root password so the first thing I do is ssh in and create a usr1 user I'll use to do all the setup work. I set up usr1 with sudo so it can do what root can. I have to manually set up all the authentication such as deleting the password for root and setting up ssh keys for usr1.

And then I can run it like this from my computer:

ansible-playbook play.yml

I run that as the usr1 user on my computer so it uses the same usr1 user on the machine. By default tasks will run as usr1 but if a task has become: yes it runs as root.

But it takes too long to set up all the authentication manually the first time, so I want to try doing it from Ansible. I want ansible-playbook to run as usr1 like it always does, but instead of logging in through SSH using the SSH keys for usr1, it should use the temporary SSH password for the root user.

I tried this, but tasks without become: yes run as root when they should run as usr1. Is there a better way?

ansible-playbook play.yml --extra-vars "ansible_user=root ansible_password=temporarysshpassword become_user=usr1"

I tried --ssh-extra-args but I don't know what the proper syntax is.

u/Beautiful-Log5632 — 2 days ago

AAP 2.6-8 Containerized, Time Zone breaks get_service_token()

So for the past three weeks I've been working through the guide here, Upgrading and Migrating from AAP 2.5 RPM to 2.6 Container, and I wanted to document a bug I hit and how I resolved it.

I'm going to fast-forward and not touch on every little tidbit of the process, but please ask questions if you want. I also do not claim 100% accuracy on the statements I make below. This is just what I found, and what worked for me.

So I went through the guide and got to the part where it's time to install 2.6 and upgrade the containerized 2.5 hosts. I've gotta mention that the installation process took upwards of 8 hours (!!!), and this is on M6i.xlarge EC2s as well!

Anyway, the installation completed, but the web GUI reported an error connecting to the Controller API. Web dev tools showed an HTTP 401 to /api/controller/v2/me/. So I started digging into the controller and gateway logs. This is the gist of what I found:

  • Envoy authenticates the user, adds a JWT, then forwards the request to the controller
  • Controller validates the JWT, then needs the user claims from the gateway
  • Controller generates a service token using datetime.now(), which in this case returned CDT or 5 hours behind UTC
  • PyJWT encoded the CDT time as UTC
  • Controller sends the expired token value to the gateway's /api/gateway/v1/jwt_claims/ endpoint
  • Gateway rejects the expired token, returns HTTP 401
  • Controller can't validate the user, returns HTTP 401 to the browser
  • Dashboard shows "Error connecting to the controller api"

At the time I was unaware that containers default to UTC. I did find a RH KB on fixing the receptor image's TZ, but I did not find info on the other containers. Since I like seeing logs and events in my local timezone, I used those steps and created a mounts.conf under .config/container/ for the AAP user on all 8 hosts to set it to 'America/Chicago'.

This did not resolve the token issue, I figure because Django doesn't care what the OS is set to use. So I changed the time_zone in my inventory file to 'UTC'. I went ahead and tested this by editing /home/aap_user/aap/controller/etc/settings.py (the controller_extra_settings time_zone) to UTC and restarting the containers.

Controller API error resolved.

From what I gather, this might not be a problem if get_service_token() at /var/lib/awx/venv/awx/lib64/python3.12/site-packages/ansible_base/resource_registry/resource_server.py line:

payload["exp"] = datetime.now() + timedelta(seconds=expiration)
was changed to:
from datetime import timezone

payload["exp"] = datetime.now(tz=timezone.utc) + timedelta(seconds=expiration)
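
To make the failure mode concrete, here is a minimal standard-library sketch of why a naive datetime.now() can turn into an already-expired token. It assumes the encoder converts datetimes with timegm(dt.utctimetuple()), which is how PyJWT appears to handle exp; the fixed "now", the fixed CDT offset, and the helper names are made up for illustration and are not taken from the AAP code.

```python
from calendar import timegm
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for America/Chicago during daylight time (assumption).
CDT = timezone(timedelta(hours=-5))


def to_epoch_like_jwt(dt):
    """Convert a datetime to epoch seconds; naive values are treated as UTC."""
    return timegm(dt.utctimetuple())


def demo(expiration=60):
    """Return (naive_exp - now, aware_exp - now) in seconds for a fixed 'now'."""
    now_aware = datetime(2025, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
    # What datetime.now() would return on a host set to CDT: local wall-clock
    # time with no tzinfo attached.
    now_naive_local = now_aware.astimezone(CDT).replace(tzinfo=None)

    real_now = to_epoch_like_jwt(now_aware)
    exp_naive = to_epoch_like_jwt(now_naive_local + timedelta(seconds=expiration))
    exp_aware = to_epoch_like_jwt(now_aware + timedelta(seconds=expiration))
    return exp_naive - real_now, exp_aware - real_now


if __name__ == "__main__":
    naive_delta, aware_delta = demo()
    print(f"naive exp is {naive_delta}s from now, aware exp is {aware_delta}s from now")
```

With the naive value the expiry lands about five hours in the past, so the receiving side would see an expired token right away. The timezone-aware value lands 60 seconds in the future, as intended.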

But I'm no Python dev so.. IDK. Anyway thanks for reading.

u/invalidpath — 3 days ago

Using Ansible through a Perle console server

We use Ansible for configuration deployment in a network environment.

We would like to push baseline configurations to devices, mainly Cisco devices.

For console access, we use a new Perle IOLAN SCR258. However, when we try to connect through it, we get a blank screen and need to press Enter on the keyboard to get the CLI prompt. Ansible gets stuck at this point.

Since the connection works over a console port, commands are buffered, and when we finally connect and press Enter, the buffered commands are executed all at once.

I have already read all the documentation and even asked AI tools, but I could not find a proper solution.

I hope someone has been in a similar situation and can help.

u/Big_Individual8863 — 4 days ago

Ansible become password prompt when running tasks on multiple servers

I am working in a lab environment using Ansible to manage multiple Ubuntu servers (3 nodes) over SSH.

When executing tasks that require elevated privileges (using --become), Ansible prompts for a “BECOME password” during the playbook execution. I’m not sure which password I should provide. At first, I thought it was asking for my workstation password (the local machine I am using to run Ansible).

However, after some research, I found that it is actually asking for the password of the remote server.

Is this correct?

In my setup, I have created a separate password for each server.

  • I am connecting to multiple servers via SSH
  • Using Ansible ad-hoc commands and playbooks
  • Running tasks that require root privileges
  • Ansible requests the become (sudo) password interactively when --ask-become-pass is enabled

I would really like someone to explain this, because I searched and still didn't really get it.

u/mello_v5 — 6 days ago
▲ 20 r/ansible

For private setup, do you set up an actual physical control node?

I have a couple of home and cloud servers for various private use cases. Right now I'm slowly transitioning all setup to Ansible. Love it so far!

Right now I am deploying everything from my laptop. It is the only machine that contains all the credentials. Of course backups exist in various places, but on no other machine.

Problem: When I am away from home, I have to carry my laptop with me, because if anything breaks, it's the only machine that allows me to log in and fix anything.

I could install an ssh client on my phone (iOS) and create keys, then I can at least log into servers, but still cannot run playbooks.

What I am thinking about: building a physical control node, i.e. a small PC with Linux, completely security isolated, only for that purpose, full disk encryption, the only way to login is ssh key, and from there I can access all other machines, run all playbooks. I could even install semaphore or something on it to get a web interface.

To be clear, I am not asking about connectivity - that is solved, all my machines and phones share a VPN and can talk to one another.

My current issue is about being able to run meaningful tasks while away: running Ansible on the road, and not being sure whether I can trust iOS terminal apps...


tl;dr: (1) Does a physical control node make sense for a small private setup or is it more effort than it's worth? Do you have one?; (2) Can one trust iOS ssh terminal apps?

u/AlpineGuy — 6 days ago

Automated software installation idea

Hi, guys! While struggling with Windows software installations and writing a custom install script for every piece of software, I came up with the idea of creating a site/API where vendors and the community in general can add the commands and switches/flags/arguments for installing and uninstalling software. It would be a 'database' available to everyone who wants to automate software (un)installation.

Since every installer has its own command arguments, which differ from one installer to the next, this API would help standardize software (un)installation and finally make it possible to have one 'task' in Ansible, or whatever tool, to install all needed software.

I prototyped this and you can find more details on https://use-cli.com and this thread. Any feedback is welcome.
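
To illustrate the kind of lookup such a shared database could provide, here is a rough sketch. The schema, the INSTALLERS name, and build_command are hypothetical and not taken from use-cli.com. The switches shown are the commonly documented silent flags for MSI, Inno Setup, and NSIS installers.

```python
# Hypothetical schema: installer type -> action -> argument template.
# "{file}" is replaced with the installer path when the command is built.
INSTALLERS = {
    "msi": {
        "install": ["msiexec", "/i", "{file}", "/qn", "/norestart"],
        "uninstall": ["msiexec", "/x", "{file}", "/qn", "/norestart"],
    },
    "inno": {
        "install": ["{file}", "/VERYSILENT", "/SUPPRESSMSGBOXES", "/NORESTART"],
    },
    "nsis": {
        "install": ["{file}", "/S"],
    },
}


def build_command(installer_type, action, file):
    """Return the argument list for a silent install or uninstall."""
    try:
        template = INSTALLERS[installer_type][action]
    except KeyError:
        raise LookupError(f"no {action!r} entry for installer type {installer_type!r}") from None
    return [part.format(file=file) for part in template]


if __name__ == "__main__":
    print(build_command("msi", "install", "app.msi"))
```

A single generic task could then take the installer type and file as variables and run whatever argument list the lookup returns, instead of hard-coding switches per package.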

u/Many_Ad7628 — 6 days ago

My experience learning Ansible with Claude and need suggestions going fwd!

Hi, I am a network engineer and new to Ansible.

I installed WSL 2 on Windows and tried some things. What happened is I soon ran into a lot of issues, like some Python issues and then some Azure environment issues, because I was trying to integrate secrets management with Azure Key Vault. I am not sure if AI is teaching me good practices.

The next thing I am trying to get set up is Semaphore because I think that's one of the ways to actually learn production practices.

I am just putting my workflow here, what I did so far, but I want to know if this is a good idea or maybe I'm going down a wrong path and learning bad practices.

  • Verified Ansible was installed via apt on Ubuntu WSL2 — no /etc/ansible/ created, used project-based ansible.cfg in ~/ansible/ instead
  • Reinstalled Az CLI natively in WSL (was running Windows Az CLI via PATH passthrough) and authenticated with az login --use-device-code
  • Created Service Principal sp-ansible-akmlabs in Entra ID, assigned Key Vault Secrets Officer role on the vault, stored credentials in ~/ansible/.env
  • Created Azure Key Vault akmlab-az-kv in portal with RBAC mode enabled, stored a test secret via Az CLI
  • Created Python venv, installed azure.azcollection requirements plus hidden dependencies, installed ansible-core inside the venv so Ansible runs from venv Python
  • Wrote and tested playbook that authenticates as the SP and pulls the secret from Key Vault — confirmed hello-from-akmlabs returned successfully

Then ran into some crazy Python issues which took 3 hrs to troubleshoot!

  • No /etc/ansible/ — modern apt install doesn't create it; used project-based ansible.cfg in ~/ansible/ instead
  • Az CLI using Windows Python — WSL was calling the Windows Az CLI via PATH passthrough; fixed by installing Az CLI natively in WSL via the Microsoft install script
  • Wrong collection module name (Claude got these wrong) — used azure_keyvault_secret_info which doesn't exist; correct name is azure_rm_keyvaultsecret_info, found by listing the modules directory
  • Ubuntu blocked pip — PEP 668 prevents pip from modifying system Python; fixed by creating a venv and installing all packages there
  • Missing hidden dependencies — requirements.txt from the collection is incomplete; had to manually install msrestazure and others not listed (known upstream issue GitHub #1463) - This then took 3 hrs lol!
  • Venv packages being ignored — interpreter_python in ansible.cfg controls target host Python, not Ansible's own process; fixed by installing ansible-core inside the venv so Ansible itself runs from venv Python
  • Wrong env var names — azure.azcollection uses AZURE_SECRET and AZURE_TENANT, not AZURE_CLIENT_SECRET and AZURE_TENANT_ID; silent auth failure until corrected
  • Wrong secret field name — output dict uses .secret not .value; dumped raw output with var: kv_output to find the correct field name.
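
The silent auth failure from the env var names in the list above is the kind of thing a tiny pre-flight check can catch. This is a minimal sketch, not part of azure.azcollection. RENAMES and find_misnamed are hypothetical names, and the mapping only covers the two variable pairs mentioned in the list.

```python
import os

# Wrong name (left) -> name the collection expects (right), per the list above.
RENAMES = {
    "AZURE_CLIENT_SECRET": "AZURE_SECRET",
    "AZURE_TENANT_ID": "AZURE_TENANT",
}


def find_misnamed(env):
    """Return {wrong_name: expected_name} for variables set only under the wrong name."""
    return {
        wrong: right
        for wrong, right in RENAMES.items()
        if wrong in env and right not in env
    }


if __name__ == "__main__":
    for wrong, right in find_misnamed(os.environ).items():
        print(f"{wrong} is set but {right} is not; the collection may not pick it up")
```

Running it in the same shell where the .env file was sourced, before starting the playbook, would flag the mismatch up front.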

Feeling like all this was a waste of time and I did not learn anything that is done in a real production environment. If somebody can make a quick list of things that I could look into and actually learn something, it would be nice. Overall I think it's still fine. I did something I guess.

u/masterofrants — 7 days ago
▲ 18 r/ansible

AD DC Promotion via Ansible

I’m being “forced” to automate my DC promotion process (and a lot of overall AD administrative tasks, but baby steps). Has anyone successfully used an Ansible playbook to promote a DC into an existing forest?

If so, would you mind sharing your playbook and how you did it? Was it worth it?

u/Slow-Savings-414 — 8 days ago
▲ 35 r/ansible

Ansible for large compute cluster

So I have mostly worked on Ansible-based node bring-up for smaller environments (100-200 servers) and I am comfortable with Ansible playbooks, roles, Molecule testing, ansible-lint/rules, CI pipelines, etc.

Now I have been thrown into a very different scale problem:

We’re building an on-prem bare-metal CPU compute cluster starting at ~10,000 nodes (mostly AMD EPYC nodes) with plans to scale toward 20k–30k nodes.

Think large HPC-style infrastructure / compute farm setup.

Current thinking is:

- Initial provisioning via Kickstart/PXE/iPXE
- Then handoff to Ansible for configuration and lifecycle management
- Mostly bare metal
- Need fast, repeatable node bring-up and recovery
- Scale matters more than “traditional enterprise Ansible”

I’d really like opinions from people who’ve actually operated infrastructure at this scale.

Some areas I’m trying to think through:

- Would you still use “push-based” Ansible at this scale?
- Would you move toward Ansible Pull?
- Multiple/decentralized control nodes?
- Event-driven orchestration?
- How do you avoid SSH/control-node bottlenecks?
- Golden images vs fully dynamic provisioning?
- How much should happen in Kickstart vs post-provision Ansible?
- How do you handle inventory at this scale?
- Any lessons around idempotency/performance becoming painful?

Is Ansible Automation Platform/Tower worth it at this scale, or does it become more of an operational overhead?

What would you absolutely avoid after learning the hard way?

Would especially love responses from people running:

- HPC clusters
- AI/ML farms
- Large on-prem compute fleets
- Bare-metal Kubernetes worker farms

Interested in architectural patterns, lessons, scaling bottlenecks, recovery strategies, and “things you only learn after production pain.”

We are also exploring tools like Tinkerbell, Canonical MAAS, etc. instead of Kickstart; would love opinions on that as well.

u/no1bullshitguy — 10 days ago
▲ 15 r/ansible

Best setup: several playbooks or one big one

Hi all,

I've recently started to get into Ansible to set up my Proxmox, VMs and CTs.

Now I'm struggling a bit with what's best practice.
I have a first playbook that sets up SSH, e.g. it changes the port and installs key pairs. This means that after the first run, I cannot use this playbook anymore since the SSH port and login method have changed.

This is just one of several similar issues I'm facing.

What would be the best solution:
Keep just one playbook for a first setup --> bootstrap?
Next, for each (Proxmox, VM and CT) a separate playbook?

Or just one big playbook and using tags?
(I'm not at the point of setting up and working with roles yet... but I want to get there eventually.)
I'm still discovering stuff with Ansible, and maybe further along down the road I can optimize the whole process myself, but at the moment I'm a bit stuck here 😃
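
For the SSH port problem specifically, one possible approach is to probe which port answers before running the playbook, so the same bootstrap works before and after the port change. This is only a sketch of that idea. first_open_port is a made-up helper, and which ports you probe depends on your own setup.

```python
import socket


def first_open_port(host, ports, timeout=1.0):
    """Return the first port in `ports` that accepts a TCP connection, else None."""
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or no route.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None
```

You would call it with the new port first and 22 as the fallback, then hand the result to the run as the ansible_port variable (for example via -e), so a fresh host is reached on 22 and an already bootstrapped one on the changed port.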

I hope someone could shed a light on this for me, on how to tackle this challenge.

u/Patrice_77 — 11 days ago
▲ 18 r/ansible

Introducing ansible to an existing setup

Hi there!

We're just getting started with Ansible in our company: we set up a test environment, played with some simple playbooks, and looked into some best-practice blogs. The next step would be using Ansible for fresh systems.

But I am not sure how to get started using Ansible for our existing server setup. We have a very mixed setup with different OS versions (at least most of it is Debian) and in some cases very different configurations for the same types of servers. The best way would probably be to leave the old setup as it is and just start a rotation, but that's not an option for every task. We need some of them for the old setups too, and I am really afraid of missing something during testing and breaking some serious shit on that old stuff :')

Any tips or advice? Some first-hand experience on how you handled something like that?

u/larox-sn4k3s — 13 days ago
▲ 26 r/ansible

Windows alternative to Ansible-Pull

Basically the title. Is there an ansible-pull alternative for Windows hosts?

We manage roughly 6k servers, where half of them are Windows.

We currently use Puppet for Desired State Configuration for its idempotency, and Ansible/AWX for ad-hoc automation.

We are planning to consolidate our platforms into a single tool, and thought of using Ansible.

Any tips on how to replace Puppet with Ansible?

u/iamnotMJ — 13 days ago

Ansible AAP Doesn't find collections but running the same playbook from the execution environment directly finds them

For some reason, when I run a playbook for accessing Huawei network devices through AAP, it doesn't find the collection (community.network) and gives the following error:

"ERROR! Couldn't resolve module/action 'community.network.ce_command'. This often indicates a misspelling, missing collection, or incorrect module path."

However when I run the same playbook from the EE Pod directly it finds the collection.

<192.168.X.X> ANSIBLE_NETWORK_IMPORT_MODULES: enabled

<192.168.X.X> ANSIBLE_NETWORK_IMPORT_MODULES: found community.network.ce_command at /usr/share/ansible/collections/ansible_collections/community/network/plugins/modules/ce_command.py

<192.168.X.X> ANSIBLE_NETWORK_IMPORT_MODULES: running community.network.ce_command

Even the Red Hat engineer mentioned that the collection is not loading, but I showed the "ansible-galaxy collection list" output from the EE and it showed the modules.

Is there a way to find out why the collection is not found when the Playbook is executed from the AAP? Are we missing any settings?

u/antiriad76 — 14 days ago
▲ 11 r/ansible

"become: true" not working with Ubuntu 26.04 LTS

Hey folks,

I'm still relatively new to Ansible, but I have a bunch of well-working roles and playbooks at this point. Some of them require elevated privileges, so I use "become: true" with those tasks. I use SSH key authentication for login and a different become password for each machine, stored in an Ansible vault. Everything works fine with my Ubuntu 24.04 LTS machines, but today I tried 26.04 LTS and got the following error:

"Timeout (12s) waiting for privilege escalation prompt".

Has anyone seen this behavior?

u/ksmt — 15 days ago