r/openstack

I built a tool that deploys a fully functional OpenStack on Ubuntu/Debian with a single command

Hey everyone,

I've been working on DeployStack, an open-source CLI tool that deploys a complete, working OpenStack environment on a single Debian/Ubuntu node — batteries included.

Why I built it

If you've ever tried to set up OpenStack for development or testing on Ubuntu, you know the pain. DevStack is messy and developer-oriented, MicroStack is locked into Snap and doesn't configure Cinder or Neutron properly out of the box, and tools like Kolla-Ansible or Juju are overkill for a single node. On RHEL/CentOS there was Packstack, which actually worked. On Debian/Ubuntu, nothing comparable ever existed, so I built it.

What it does

One command:

deploystack deploy --allinone

A few minutes later you have a fully working OpenStack with:

  • Keystone, Glance, Nova, Neutron, Placement, Horizon
  • Cinder with LVM backend (loopback or physical volume) — works immediately, no extra steps
  • Neutron with OVS or OVN — instances have internet access out of the box
  • Automatic network interface detection — no manual bridge configuration
  • Floating IPs working immediately after deployment

You can also launch instances directly:

deploystack launch --name my-vm --image ubuntu --password MySecret123

And download and upload cloud images automatically:

deploystack image upload --os ubuntu --version noble --arch amd64

What makes it different from MicroStack

MicroStack gives you OpenStack "installed" but not "working": Cinder requires extra flags that are marked experimental and often fail, and instances don't have internet access without manual network configuration. DeployStack configures everything end-to-end, including OVS/OVN bridges, LVM volumes, and provider networks.

Stack

  • Python 3.10+
  • Debian/Ubuntu (tested on Ubuntu 22.04, 24.04)
  • OpenStack Caracal
  • OVS or OVN for Neutron

Still in active development — a .deb package is coming soon.

GitHub: https://github.com/St3vSoft/DeployStack
Wiki: https://github.com/St3vSoft/DeployStack/wiki

Would love feedback from anyone who's fought with OpenStack deployments before!

u/Sorecchione07 — 5 days ago

[Help] How to achieve Instance HA (Masakari) on a 3-Node Hyperconverged cluster? (Kolla-Ansible Pacemaker conflict)

Hi everyone,

I’m looking for some architectural advice. I have 3 powerful bare-metal servers and I want to deploy a highly available OpenStack cloud on them. Because I only have 3 nodes, they need to be hyperconverged (running both Control and Compute services on all 3 nodes).

My primary requirement is Instance HA—if one of the physical nodes suddenly dies, I need the VMs to automatically evacuate and restart on the surviving nodes. Naturally, I looked into Masakari.

I am currently using Kolla-Ansible, but I've hit an architectural roadblock:

  • Masakari's host-monitor relies on Pacemaker/Corosync to detect host failures.
  • In Kolla, Controller nodes run the full pacemaker service, while Compute nodes run pacemaker_remote.
  • Because my nodes are both Control and Compute, Kolla-Ansible conflicts trying to deploy both pacemaker roles on the same host, breaking the deployment/monitoring.

I am open to any changes necessary to get this working. My questions for the community are:

  1. Is there a clean workaround in Kolla-Ansible for this? Has anyone successfully deployed Masakari on hyperconverged nodes using Kolla?
  2. Alternative Masakari Drivers: I’ve read that Masakari can technically use Consul or direct libvirt polling instead of Pacemaker. Is it worth trying to hack Kolla to use Consul + external IPMI fencing scripts, or is that a maintenance nightmare?
  3. Different Deployment Tools: Do other deployment tools (like OpenStack-Ansible, Kolla-K8s, or Canonical/Sunbeam) handle Instance HA on hyperconverged nodes better than Kolla-Ansible?
  4. The Proxmox Route: Would it be better to just install Proxmox on the bare-metal for node-level HA, and run OpenStack Control and Compute as VMs on top? (I'm worried about the nested virtualization performance penalty here).

Any advice, documentation, or reality-checks would be hugely appreciated. Thanks in advance!

u/UniiMiinD — 6 days ago

Nova/RabbitMQ error?

Hello, this time I have a problem while trying to set up a VM. Everything works and was set up correctly as far as I know, but creating a server triggers an error in nova-conductor that I haven't been able to figure out. The error persists until a restart, and the VM stays stuck in "BUILD" status. All config files, passwords, and connectivity between nodes have been checked. May I ask for some help?

root@aio-controller ~(keystone)# tail -n 5 /var/log/nova/nova-conductor.log
2026-05-16 16:01:00.899 3427 INFO oslo_service.backend._eventlet.service [None req-9c11cc4e-78d1-4fe3-9962-59cf5153293e - - - - - -] Starting 4 workers
2026-05-16 16:01:00.903 3456 INFO nova.service [-] Starting conductor node (version 32.0.0)
2026-05-16 16:01:00.904 3457 INFO nova.service [-] Starting conductor node (version 32.0.0)
2026-05-16 16:01:00.906 3458 INFO nova.service [-] Starting conductor node (version 32.0.0)
2026-05-16 16:01:00.906 3459 INFO nova.service [-] Starting conductor node (version 32.0.0)
root@aio-controller ~(keystone)# source keystonerc 
root@aio-controller ~(keystone)# openstack server create --flavor m1.tiny --image cirros --security-group secgroup01 --nic net-id=$netID --key-name mykey cirrosnew 
+-------------------------------------+------------------------------------------------------------------------------------------------------+
| Field                               | Value                                                                                                |
+-------------------------------------+------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                                               |
| OS-EXT-AZ:availability_zone         | nova                                                                                                 |
| OS-EXT-SRV-ATTR:host                | None                                                                                                 |
| OS-EXT-SRV-ATTR:hostname            | cirrosnew                                                                                            |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None                                                                                                 |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000006                                                                                    |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                                      |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                                    |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                                      |
| OS-EXT-SRV-ATTR:reservation_id      | r-vu0i0p25                                                                                           |
| OS-EXT-SRV-ATTR:root_device_name    | None                                                                                                 |
| OS-EXT-SRV-ATTR:user_data           | None                                                                                                 |
| OS-EXT-STS:power_state              | NOSTATE                                                                                              |
| OS-EXT-STS:task_state               | scheduling                                                                                           |
| OS-EXT-STS:vm_state                 | building                                                                                             |
| OS-SRV-USG:launched_at              | None                                                                                                 |
| OS-SRV-USG:terminated_at            | None                                                                                                 |
| accessIPv4                          |                                                                                                      |
| accessIPv6                          |                                                                                                      |
| addresses                           |                                                                                                      |
| config_drive                        |                                                                                                      |
| created                             | 2026-05-16T14:17:49Z                                                                                 |
| description                         | None                                                                                                 |
| flavor                              | description=, disk='10', ephemeral='0', , id='m1.tiny', is_disabled=, is_public='True', location=,   |
|                                     | name='m1.tiny', original_name='m1.tiny', ram='1024', rxtx_factor=, swap='0', vcpus='1'               |
| hostId                              |                                                                                                      |
| host_status                         |                                                                                                      |
| id                                  | e29bed1f-4630-4094-b547-198e75c8faa1                                                                 |
| image                               | cirros (a09135ed-0d8b-4ac8-bce8-8ca6d1b20654)                                                        |
| key_name                            | mykey                                                                                                |
| locked                              | False                                                                                                |
| locked_reason                       | None                                                                                                 |
| name                                | cirrosnew                                                                                            |
| pinned_availability_zone            | None                                                                                                 |
| progress                            | 0                                                                                                    |
| project_id                          | c872b474c2e54238bd55660a8992e4e5                                                                     |
| properties                          |                                                                                                      |
| scheduler_hints                     |                                                                                                      |
| server_groups                       | []                                                                                                   |
| status                              | BUILD                                                                                                |
| tags                                |                                                                                                      |
| trusted_image_certificates          | None                                                                                                 |
| updated                             | 2026-05-16T14:25:01Z                                                                                 |
| user_id                             | 7e14185c00394d3e8d56ebfcd0119d0a                                                                     |
| volumes_attached                    |                                                                                                      |
+-------------------------------------+------------------------------------------------------------------------------------------------------+
root@aio-controller ~(keystone)# tail -n 5 /var/log/nova/nova-conductor.log
2026-05-16 16:01:00.904 3457 INFO nova.service [-] Starting conductor node (version 32.0.0)
2026-05-16 16:01:00.906 3458 INFO nova.service [-] Starting conductor node (version 32.0.0)
2026-05-16 16:01:00.906 3459 INFO nova.service [-] Starting conductor node (version 32.0.0)
2026-05-16 16:02:00.674 3457 ERROR oslo.messaging._drivers.impl_rabbit [None req-a563165f-39cb-41e1-b70d-1188d41ed766 7e14185c00394d3e8d56ebfcd0119d0a c872b474c2e54238bd55660a8992e4e5 - - default default] Connection failed: [Errno 111] ECONNREFUSED (retrying in 1.0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED
2026-05-16 16:02:01.945 3457 ERROR oslo.messaging._drivers.impl_rabbit [None req-a563165f-39cb-41e1-b70d-1188d41ed766 7e14185c00394d3e8d56ebfcd0119d0a c872b474c2e54238bd55660a8992e4e5 - - default default] Connection failed: [Errno 111] ECONNREFUSED (retrying in 3.0 seconds): ConnectionRefusedError: [Errno 111] ECONNREFUSED
root@aio-controller ~(keystone)# openstack server list
+--------------------------------------+-----------+--------+----------+--------+---------+
| ID                                   | Name      | Status | Networks | Image  | Flavor  |
+--------------------------------------+-----------+--------+----------+--------+---------+
| b4e1442e-b6c4-4452-aea2-e601af0a2552 | cirrosnew | BUILD  |          | cirros | m1.tiny |
+--------------------------------------+-----------+--------+----------+--------+---------+
root@aio-controller ~(keystone)# 
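
The ECONNREFUSED lines above mean that nothing accepted the TCP connection at the address nova-conductor tried to reach, so a first step is to confirm which host and port that is and whether RabbitMQ is listening there. Below is a minimal Python sketch of that check, not a diagnosis. It assumes the broker address comes from transport_url in nova.conf and that 5672 (RabbitMQ's default AMQP port) applies when no port is given. The helper names parse_transport_hosts and can_connect are made up for illustration, and the parser ignores IPv6 literals and passwords containing "/" or ",".

```python
import socket


def parse_transport_hosts(transport_url, default_port=5672):
    """Return (host, port) pairs from an oslo.messaging-style transport URL.

    Hypothetical helper: handles the common
    rabbit://user:pass@host:port[,user:pass@host2:port]/vhost form only.
    """
    # Drop the scheme ("rabbit://") and everything from the vhost "/" onward.
    rest = transport_url.split("://", 1)[1]
    netloc = rest.split("/", 1)[0]

    hosts = []
    for entry in netloc.split(","):
        # Drop the "user:password@" credentials if present.
        hostport = entry.rsplit("@", 1)[-1]
        host, sep, port = hostport.rpartition(":")
        if not sep:
            # No explicit port: fall back to the assumed default.
            host, port = hostport, default_port
        hosts.append((host, int(port)))
    return hosts


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling parse_transport_hosts on the transport_url value and then can_connect on each pair, from the node where nova-conductor runs, shows whether the refusal can be reproduced outside Nova.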
u/BladyGreg — 5 days ago

Live Migration Failure for Instance with PCI Passthrough (OpenStack Epoxy / Ubuntu 24.04)

Hi everyone,

I encountered an issue when trying to perform a live migration for an instance with PCI passthrough.

Environment:

Issue Description: I can successfully spawn instances with PCI passthrough on every compute node without any issues. However, when I attempt to live migrate the instance via the Dashboard (Horizon), the process fails.

I found the following error messages in the nova-compute logs:

---------------------------------------------------------------------------

2026-05-13 15:29:41.668 7 INFO nova.compute.rpcapi [None req-3573ed71-a795-4673-8cec-75c834b352e7 1c048bb1747e49fca293e1b9d8c2e854 83b1a4951d534fc6980f7dda61cebeaf - - default default] Automatically selected compute RPC version 6.4 from minimum service version 68

2026-05-13 15:29:50.223 7 INFO nova.compute.manager [None req-3573ed71-a795-4673-8cec-75c834b352e7 1c048bb1747e49fca293e1b9d8c2e854 83b1a4951d534fc6980f7dda61cebeaf - - default default] [instance: 2e860bab-d6cd-49e7-a72b-b813537d2f33] Took 9.07 seconds for pre_live_migration on destination host ecc-edge-compute01.

2026-05-13 15:29:50.498 7 WARNING nova.compute.manager [req-585626ca-e41f-4522-97b5-dbe2d3179410 req-c44b83bf-65da-43d1-b2d0-60a39583a4db d73bc2af52f2481ba54878eaabd331aa e28d9231c61e48259e7fa2211e3b65fe - - default default] [instance: 2e860bab-d6cd-49e7-a72b-b813537d2f33] Received unexpected event network-vif-plugged-aef81b5a-d016-4286-a4b0-e07213f9f86c for instance with vm_state active and task_state migrating.

2026-05-13 15:29:51.301 7 ERROR nova.virt.libvirt.driver [None req-3573ed71-a795-4673-8cec-75c834b352e7 1c048bb1747e49fca293e1b9d8c2e854 83b1a4951d534fc6980f7dda61cebeaf - - default default] [instance: 2e860bab-d6cd-49e7-a72b-b813537d2f33] Live Migration failure: Requested operation is not valid: cannot migrate domain: 0000:3b:00.0: VFIO migration is not supported in kernel: libvirt.libvirtError: Requested operation is not valid: cannot migrate domain: 0000:3b:00.0: VFIO migration is not supported in kernel

2026-05-13 15:29:51.760 7 ERROR nova.virt.libvirt.driver [None req-3573ed71-a795-4673-8cec-75c834b352e7 1c048bb1747e49fca293e1b9d8c2e854 83b1a4951d534fc6980f7dda61cebeaf - - default default] [instance: 2e860bab-d6cd-49e7-a72b-b813537d2f33] Migration operation has aborted

2026-05-13 15:29:52.297 7 INFO nova.compute.manager [None req-3573ed71-a795-4673-8cec-75c834b352e7 1c048bb1747e49fca293e1b9d8c2e854 83b1a4951d534fc6980f7dda61cebeaf - - default default] [instance: 2e860bab-d6cd-49e7-a72b-b813537d2f33] Swapping old allocation on dict_keys(['0908272f-fb28-4fcd-b888-faed3ebe008d']) held by migration c544f968-a817-43c0-9ad8-ce31da02715a for instance

2026-05-13 15:29:57.274 7 WARNING nova.compute.manager [req-d154f165-86f0-4461-825f-5d6732f75dec req-93ca2943-9913-4eb8-938d-b7b3b352d741 d73bc2af52f2481ba54878eaabd331aa e28d9231c61e48259e7fa2211e3b65fe - - default default] [instance: 2e860bab-d6cd-49e7-a72b-b813537d2f33] Received unexpected event network-vif-unplugged-aef81b5a-d016-4286-a4b0-e07213f9f86c for instance with vm_state active and task_state None.

---------------------------------------------------------------------------

Does anyone have any ideas or suggestions on why this might be happening?

Thanks in advance for your help!

u/RickWangRD — 9 days ago

Complete OpenStack beginner with 3 servers for lab, which architecture?

Hey everyone,

Total newbie to OpenStack here. I've got a decent Linux sysadmin background but never touched OpenStack before, and I really want to build a proper lab to learn.

I'm working with 3 physical servers I can dedicate to this, each with 4+ NICs. I also have switches and a firewall on hand if I need them.

My current thinking is to deploy all 3 nodes as combined controller + compute.

I don't want to burn all my hardware just running the control plane and end up with barely anything left to actually spin up VMs and experiment. But I'm honestly not sure if that's a smart move for learning.

So I'd love some input from people who've been down this road:

  • Is the converged controller+compute setup a reasonable starting point, or should I run the controllers as VMs on a 4th hypervisor?
  • Is Kolla-Ansible the right deployment tool for this?
  • With 4 NICs per node, how would you split management, external, tenant, and storage traffic?
  • Are there any diagrams, tutorials, or blog posts that explain how to deploy this?
u/Gilgaflynn — 11 days ago

OpenStack Alternatives

Hi,

We are in the process of deploying OpenStack at our firm, but from my (limited) research it seems that OpenStack isn't as popular anymore and that businesses are moving away from it.

Firstly, is this true? If so, what are the alternatives that businesses are moving to?

And as a side note, does anyone have any tutorials they can recommend for a newbie?

Thanks!

Edit: Also, how much in-depth hardware knowledge does one need to deploy and administer OpenStack?

u/nightcrow100 — 11 days ago

PCIe topology for GPU/Infiniband VMs

Hi everyone,

I'm working on an OpenStack deployment with several GPU-enabled nodes, each having a fairly complex PCIe topology connecting 8x H200 GPUs to 4x ConnectX-7 InfiniBand NICs.

PCI passthrough is working correctly and inside the VM we can see all GPUs, NVSwitches, and NICs without issues.

However, in order to achieve near bare-metal performance for distributed AI workloads, the default libvirt XML generated by Nova is not enough. We need to:

- pin guest memory to the correct NUMA nodes

- pin vCPUs appropriately

- create a guest PCIe topology that closely mirrors the host topology

NVIDIA documents this approach here:

https://docs.nvidia.com/ai-enterprise/planning-resource/optimizing-vm-configuration-ai-inference/latest/configuring-vms.html#virtual-cpu-configuration

Without these adjustments, topology-aware libraries like NCCL cannot correctly compute optimal communication graphs, and microbenchmark performance is significantly worse than bare metal.

Our current workflow is roughly:

- create the VM normally through Nova

- intercept/dump the libvirt XML from nova_libvirt

- patch the XML with a custom script following the NVIDIA recommendations

- restart the domain with virsh

After this, performance becomes extremely close to bare metal and everything works well.
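
The XML-patching step in the workflow above could look roughly like the Python sketch below. It only covers vCPU pinning and NUMA memory binding through libvirt's cputune/vcpupin and numatune/memory elements; mirroring the host PCIe topology is not attempted. The function name pin_domain, the vCPU-to-host-CPU map, and the nodeset value are illustrative assumptions, not the script referred to above.

```python
import xml.etree.ElementTree as ET


def pin_domain(xml_text, vcpu_to_cpu, memory_nodeset):
    """Return domain XML with vCPU pinning and strict NUMA memory binding.

    xml_text       -- libvirt domain XML as a string (e.g. from `virsh dumpxml`)
    vcpu_to_cpu    -- dict mapping guest vCPU index to host CPU id (assumed layout)
    memory_nodeset -- host NUMA node set for guest memory, e.g. "0" or "0-1"
    """
    root = ET.fromstring(xml_text)

    # Reuse an existing <cputune> so unrelated tuning settings survive,
    # but replace any previous <vcpupin> entries.
    cputune = root.find("cputune")
    if cputune is None:
        cputune = ET.SubElement(root, "cputune")
    for old in cputune.findall("vcpupin"):
        cputune.remove(old)
    for vcpu, cpu in sorted(vcpu_to_cpu.items()):
        ET.SubElement(cputune, "vcpupin", vcpu=str(vcpu), cpuset=str(cpu))

    # Same approach for <numatune>: keep the element, replace <memory>.
    numatune = root.find("numatune")
    if numatune is None:
        numatune = ET.SubElement(root, "numatune")
    for old in numatune.findall("memory"):
        numatune.remove(old)
    ET.SubElement(numatune, "memory", mode="strict", nodeset=memory_nodeset)

    return ET.tostring(root, encoding="unicode")
```

The patched string would then be fed back to libvirt (for example with virsh define) before restarting the domain.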

The problem is that any Nova-driven operation (soft reboot, hard reboot, cold migration, etc.) regenerates the libvirt XML, so we need to repeat the entire procedure every time.

My question is:

Does Nova expose any mechanism to deeply customize or persist libvirt XML configuration for instances?

I know about flavor/image metadata and extra specs, but they seem too limited for this level of topology customization. Ideally we'd like a cleaner and more OpenStack-native approach than patching XML after instance creation.

Has anyone here tackled something similar for high-performance GPU/NVLink/InfiniBand workloads?

Thanks!

u/mjf-89 — 13 days ago

Can I get some help? I checked every configuration file and every log; problems arise only with the nova-manage db sync command.

root@aio-controller stack(keystone)# su -s /bin/bash placement -c "placement-manage db sync"

root@aio-controller stack(keystone)# su -s /bin/bash nova -c "nova-manage api_db sync"

root@aio-controller stack(keystone)# su -s /bin/bash nova -c "nova-manage cell_v2 map_cell0"

Cell0 is already setup

root@aio-controller stack(keystone)# su -s /bin/bash nova -c "nova-manage db sync"

ERROR: Could not access cell0.

Has the nova_api database been created?

Has the nova_cell0 database been created?

Has "nova-manage api_db sync" been run?

Has "nova-manage cell_v2 map_cell0" been run?

Is [api_database]/connection set in nova.conf?

Is the cell0 database connection URL correct?

Error: Can't load plugin: sqlalchemy.dialects:mysql_pymysql
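
One thing worth checking, going only by the last line of the error: as far as I understand it, SQLAlchemy builds the dialect plugin name from the scheme of the connection URL, turning "+" into ".", so mysql+pymysql is looked up as mysql.pymysql. A plugin name of mysql_pymysql would be consistent with a URL that starts with mysql_pymysql:// instead of mysql+pymysql://. This is a guess from the message, not a confirmed cause. The sketch below mimics that name mapping in plain Python; plugin_name and diagnose are hypothetical helpers and the URLs are placeholders.

```python
def plugin_name(connection_url):
    """Return the dialect plugin name implied by the URL scheme.

    Mirrors the "+" -> "." mapping described above; a scheme without "+"
    is used unchanged.
    """
    scheme = connection_url.split("://", 1)[0]
    return scheme.replace("+", ".")


def diagnose(connection_url):
    """Give a rough hint about the scheme part of a connection URL."""
    scheme = connection_url.split("://", 1)[0]
    if "+" in scheme:
        return "ok: dialect+driver form (" + scheme + ")"
    if "_" in scheme:
        # An underscore where "+" was probably intended.
        suggestion = scheme.replace("_", "+", 1)
        return "suspicious: did you mean " + suggestion + "?"
    return "ok: bare dialect (" + scheme + ")"


# Placeholder URLs for illustration only.
good = "mysql+pymysql://nova:password@controller/nova_cell0"
bad = "mysql_pymysql://nova:password@controller/nova_cell0"
print(plugin_name(good), "|", diagnose(good))
print(plugin_name(bad), "|", diagnose(bad))
```

The cell0 URL is recorded in the nova_api database when map_cell0 runs, so nova-manage cell_v2 list_cells --verbose is one way to see what was stored, alongside the connection values in nova.conf.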

u/BladyGreg — 14 days ago

Hi all,

I've been trying for the past weeks to get the following going:

3 datacenters -> 2 big, one small (space-wise)
OpenStack-Helm + Rook-Ceph (stretched mode)

I'd like to set up 3 availability zones for customers to use: one in dc1, one in dc2, and one "stretched" zone for workloads that can't do their own HA.

So far, I've managed to get Ceph configured and set up the corresponding Cinder backends and volume types (disabling cross-AZ attach in Nova and AZ fallback in Cinder), but I've run into a brick wall with two services: Nova/Horizon and, by extension, Octavia (Amphora).

The issue is that, because I need multiple backends in Cinder, I need different volume types for the different AZs even though they are all the same "quality" (NVMe). Since Horizon does not let me select the volume type at instance creation time, creating a new instance fails when Nova requests a volume in the selected Nova/Cinder AZ.

I can create the volume first with the correct volume type and then create an instance from it, but that's very inconvenient.

With Octavia it's similar. If I don't hardcode the volume type in the config, Octavia requests the instance in the correct Nova AZ, but the volume creation fails there as well.

Has anyone encountered this problem before? If so, how did you solve it?
Or am I completely misunderstanding AZs?

u/_k4mpfk3ks_ — 14 days ago