As I am sure you are all aware, the OpenStack Stein release is available and documented on the RDO site now.
While the Stein release has been available for a while, this post took a long time to prepare: I had to work through a lot of issues before I could get a perfectly working installation, so the post was delayed until I could document a working implementation. It is therefore in two parts: issues to be aware of, and then how to get a fully working install.
Following this post will give you a fully working install, including additional compute nodes should you wish to add them. I would recommend adding additional compute nodes, as openstack estimates available memory based on the physical memory installed and, in an allinone setup, does not take into account that most of that physical memory is not actually available but is used by the openstack processes themselves.
There are some issues to be aware of when installing the RDO release of Stein.
Networking
The first major thing to note is that in the Stein release Open vSwitch (OVS) networking has been replaced by Open Virtual Network (OVN) as the default. It uses the OVN mechanism driver instead of openvswitch, and geneve as the ML2 tenant network type instead of vxlan… which simply does not work. Using packstack with the default OVN networking results in the following warning after each run, so there is no good reason for it to be the default.
Additional information: * Parameter CONFIG_NEUTRON_L2_AGENT: You have choosen OVN neutron backend. Note that this backend does not support LBaaS, VPNaaS or FWaaS services. Geneve will be used as encapsulation method for tenant networks
Apart from the warning message there is the small detail that geneve for tenant networks does not work; either one or more required services are not set up correctly, or it is just not yet supported. You will always get errors like the one below when trying to launch any instance, which places the instance into an error state (basically, if you use geneve for tenant networks you cannot launch instances).
2019-07-21 17:38:31.846 10172 ERROR neutron.plugins.ml2.managers [req-4118a28e-b444-4031-910e-c570b760d0e9 f1c86cefca4e463b82851b3819cf9623 bdae98d8c35b4303b66f0aa9ddb63275 - default default] Failed to bind port d72d31ba-0dcd-4146-af83-16bafb85138f on host region1server1 for vnic_type normal using segments [{'network_id': '941c40cb-934d-4939-a2be-1e64440db0b9', 'segmentation_id': 27, 'physical_network': None, 'id': '1c302d60-fe08-4848-9b09-fa199b174079', 'network_type': u'geneve'}]
Also note that when creating a tenant network, if the tenant network types are set to a list (i.e. =vxlan,geneve) you do not actually get a choice of which type of tenant network to create; it will always just use the first entry in the list (by default with OVN the only entry set is geneve, which does not work). This prevents configuring both types in order to have working vxlan while trying to debug geneve, so just don’t use geneve.
If you have already installed the release and are having the issue where launching an instance from the dashboard results in different ip-addresses appearing/disappearing before the instance goes into a failed “error” state, you may be able to resolve it by editing the neutron plugins ml2_conf.ini file and changing the tenant network type from geneve to vxlan; you will probably also have to change a few settings from ovn to openvswitch, based on the commands I used to create my working system.
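For reference, after such an edit the relevant lines in /etc/neutron/plugins/ml2/ml2_conf.ini would look something like the minimal sketch below (matching the packstack options used later in this post; restart the neutron services after changing them):

[ml2]
type_drivers = vxlan,flat
tenant_network_types = vxlan
mechanism_drivers = openvswitch,l2population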
Another point worth noting is that openvswitch is still required for floating ip-addresses; at least, the section on the RDO website for using an existing external network shows only an OVS example, and of course the network bridge is still openvswitch.
With the default OVN install, the default MTU sizes on networks created by horizon still seem to be set to 1500; the documentation at https://docs.openstack.org/networking-ovn/latest/install/migration.html indicates these need to be much lower, but there is no way to override the defaults when creating tenant networks in horizon. The major changes between OVS and OVN are summarised at https://docs.openstack.org/networking-ovn/latest/faq/index.html; the main one for debugging is that there is no qrouter network namespace to look at.
Disclaimer: had I spent another few months working on it I may have been able to get everything working using OVN rather than OVS, despite google searches showing everyone having trouble with geneve. However, I was not willing to spend that extra time, as my primary goal was to get a working system I could continue using for my testing/development tasks; basically, I wanted to replace my aging Ocata system with the Stein release. As that has been achieved by reverting to OVS networking, my current goal is met. It may be entirely possible to get an environment up and running using OVN and geneve networking, but this post is about reverting to OVS and vxlan networking.
In order to get it working for my use I decided to use Openvswitch only.
If you want to ‘plug-in’ to an existing home network, review the documentation at https://www.rdoproject.org/networking/neutron-with-existing-external-network/ before running packstack, although I have included the relevant commands in this post. I also covered setting up the openvswitch bridge in my much earlier post on setting up for the queens install at https://mdickinson.dyndns.org/php/wordpress/?p=872, which may or may not provide additional information. Basically, you need to set up an openvswitch network bridge on the machine you will use for providing networking.
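As a sketch of what that bridge setup looks like on a CentOS host using the network-scripts files (adapted from the RDO documentation linked above; substitute your own interface name and addressing), /etc/sysconfig/network-scripts/ifcfg-br-ex would contain

DEVICE=br-ex
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
IPADDR=192.168.1.172
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=192.168.1.1
ONBOOT=yes

and ifcfg-ens3 (for my ens3 card) would contain

DEVICE=ens3
TYPE=OVSPort
DEVICETYPE=ovs
OVS_BRIDGE=br-ex
ONBOOT=yes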
Console access
Another issue is that console access into instances from the horizon dashboard once again does not work ‘out-of-the-box’. All the settings needed for console access must be set manually; I cover how to do that later on.
High IO activity
And the final thing worth mentioning: use fast disks if possible. “iotop” shows that mariadb (mysqld) hammers the disks even with zero instances running; if installed on a normal desktop you can expect around a 4% I/O wait in top due to database activity even when nothing is happening.
How I installed a fully working setup, using Openvswitch instead of OVN
Creating br-ex as an openvswitch bridge is covered in my earlier post linked above, plus there are examples on the RDO site in the documentation on using an existing external network. Creating the VM(s) and setting them up I will not repeat here; just make sure all your servers can reference each other in the hosts file of each machine (or are in dns) and that you have an openvswitch bridge set up on the controller host.
My ethernet card on the allinone host is ens3, configured under openvswitch to br-ex. My controller/network (allin1) host is 192.168.1.172, and I also define a second compute node on a host at 192.168.1.162. My existing external network is 192.168.1.0/24.
I explicitly selected an Openvswitch installation as that was the networking I could get working.
The initial step is to generate a packstack answers file to edit before actually using it.
packstack --gen-answer-file=answers_default_allin1.txt \
   --allinone --timeout=999999 --default-password=password \
   --provision-demo=n \
   --os-neutron-ovs-bridge-mappings=extnet:br-ex \
   --os-neutron-ovs-bridge-interfaces=br-ex:ens3 \
   --os-neutron-ml2-type-drivers=vxlan,flat \
   --os-neutron-ml2-tenant-network-types=vxlan \
   --os-heat-install=y --os-heat-cfn-install=y \
   --os-magnum-install=y \
   --os-neutron-l2-agent=openvswitch
Edit the answers file and change the setting below to what is shown; packstack will have set it to ‘ovn’, which will cause serious networking problems.
CONFIG_NEUTRON_ML2_MECHANISM_DRIVERS=openvswitch,l2population
I personally add additional compute hosts at this step; do not edit the CONFIG_COMPUTE_HOSTS entry if you are just doing an allinone install to a single host. The example below is for my environment, where the ‘allinone’ host is 172 and my second compute host is 162.
CONFIG_COMPUTE_HOSTS=192.168.1.172,192.168.1.162
Then run the packstack command using the answers file, with a huge timeout value. It should complete without problems.
If there are timeouts the command can be rerun, but rerun it in a “screen” session if you are not on the console, as a rerun will drop the network (with a screen session you can just ssh back into the server and reconnect to it).
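For example, a minimal screen workflow around the command below:

screen -S packstack     # start a named session, then run the packstack command inside it
screen -r packstack     # after sshing back in following a network drop, reattach to the session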
packstack --timeout=99999 --answer-file=answers_default_allin1.txt
Then you should manually create the external network and subnet pool using the command line, as I had issues doing so from the dashboard. Note that I use a 192.168.1.0/24 network; make sure you change it to your external network. Also, the allocation pool should be addresses in a range your dhcp-server/router does not issue. I use only 240-250 for openstack (and 160-190 for non-openstack physical and kvm machines) as I have few dynamic devices and my router has never issued anything above 14 (grin).
While such a limited 240-250 range reserved for openstack floating ip-addresses may seem small, it is actually probably too large for a home lab (see the usage tips at the end of the post).
source ~/keystonerc_admin
neutron net-create external_network \
   --shared --provider:network_type flat \
   --provider:physical_network extnet \
   --router:external
neutron subnet-create --name public_subnet \
   --enable_dhcp=False \
   --allocation-pool=start=192.168.1.240,end=192.168.1.250 \
   --gateway=192.168.1.1 external_network 192.168.1.0/24
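Before moving on you can confirm both were created correctly with the same neutron client:

neutron net-show external_network
neutron subnet-show public_subnet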
As we did not provision the demo, manually load the cirros image to test with:
source ~/keystonerc_admin
curl http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img | glance \
   image-create --name='cirros image' --visibility=public \
   --container-format=bare --disk-format=qcow2
Logon to the horizon dashboard as the admin user (http://your-allinone-server-ip/)
- create a userid for yourself as an admin user under the identity/user tab, leave project blank
- then create a project for yourself, make yourself the only member with an admin role
- logoff the admin user
Now is a good time to reboot the server(s) to make sure everything works.
Logon to the dashboard as the new user you just created (http://your-allinone-server-ip/)
- create a tenant network from the project network tab (e.g. 10.0.1.0/24 with a subnet range of 10.0.1.2,10.0.1.254; start at .2 not .1 as one address is reserved for the gateway/router/dhcp-server, so in this example 10.0.1.1 will be the gateway) and ensure dhcp is enabled; a CLI equivalent is sketched after this list
- launch a test instance using the cirros image and m1.tiny flavor, using your new tenant network
- if it launches ok and goes to active/running you are in business
- click on the instance name and check the log to make sure it started ok
- note: console access is not working at this point as noted in the issues earlier
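If you prefer the command line, the same steps look roughly like this sketch using the unified openstack client (the network, subnet and instance names here are my own placeholders):

openstack network create tenant_net
openstack subnet create --network tenant_net --subnet-range 10.0.1.0/24 \
   --allocation-pool start=10.0.1.2,end=10.0.1.254 --dhcp tenant_subnet
openstack server create --image 'cirros image' --flavor m1.tiny \
   --network tenant_net test1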
If you try to associate a floating ip at this point it will fail, as your project has no access to the external network yet (a command-line equivalent is sketched after this list), so:
- under the project network/router tab create a router for the project using the external network you created earlier
- select the router and add an interface using the tenant network you have created
- now you can associate a floating ip to the instance
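The rough command-line equivalent, reusing the placeholder names from the sketch above:

openstack router create tenant_router
openstack router set --external-gateway external_network tenant_router
openstack router add subnet tenant_router tenant_subnet
openstack floating ip create external_network
openstack server add floating ip test1 192.168.1.241   # use the address the previous command allocated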
You may be surprised that you cannot ping the instance from your external network using the floating ip-address; that is because access to the floating ip is not allowed by the default security rules, so create a new network security group (a command-line equivalent is sketched after this list)
- select the network tab, select security groups
- create a new security group named ssh-and-icmp
- add rule “all icmp, ingress, CIDR, 0.0.0.0/0”
- add rule “ssh, ingress, CIDR, 0.0.0.0/0”
- go back to compute/instances and modify the instance security groups by adding your new group; security group rules can be added/deleted on the fly while the instance is running
- you can now ping your cirros test instance using the floating ip address
- you can also ssh cirros@floatingipaddr and login with password “cubswin:)” as seen in the instance log
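The same security group can be built from the command line; a sketch, reusing the test1 instance name from earlier:

openstack security group create ssh-and-icmp
openstack security group rule create --protocol icmp --ingress --remote-ip 0.0.0.0/0 ssh-and-icmp
openstack security group rule create --protocol tcp --dst-port 22 --ingress --remote-ip 0.0.0.0/0 ssh-and-icmp
openstack server add security group test1 ssh-and-icmp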
Most cloud images do not allow direct login but need key pairs; for example, a Fedora cloud image expects logins only as the fedora user via ssh key.
- select instances/key pairs and create a new ssh key; when prompted to download it, save it somewhere you can remember and copy it to every workstation you will be using to login to your instances (I normally place it in my ~/.ssh directory); a command line alternative follows this list
- you will need to access most cloud instances with “ssh clouduser@ipaddress -i keypairname.pem” as most do not allow userid/password logins, the test cirros image being the exception
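If you prefer to create the key pair from the command line, one way (the key and file names here are my own choice) is:

openstack keypair create mykey > ~/.ssh/mykey.pem
chmod 600 ~/.ssh/mykey.pem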
Now, as ssh is a command line interface, let’s mention something you do need to do on the command line. A lot of openstack commands are issued from the command line, and you don’t want to have to run them all as the admin user, especially as you really only need to worry about your own project at this point. So cd to the root user home directory and copy keystonerc_admin to keystonerc_yournewuserid, then edit keystonerc_yournewuserid to use the userid, password and default project you created for your openstack userid. Copy that file to your personal unix directory and source it instead of keystonerc_admin when issuing commands; as you created your new userid as an admin, you can issue pretty much any command needed.
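As a rough sketch of what the edited file needs to contain (the values here are placeholders; keep any other variables your generated keystonerc_admin already sets):

export OS_USERNAME=yournewuserid
export OS_PASSWORD=yourpassword
export OS_PROJECT_NAME=yourproject
export OS_AUTH_URL=http://192.168.1.172:5000/v3
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default
export OS_IDENTITY_API_VERSION=3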
At this point you will probably want to create images for actual cloud distributions, most distributions provide images such as CentOS-7-x86_64-GenericCloud-1704.qcow2 and Fedora-Cloud-Base-30-1.2.x86_64.qcow2. You will also need to create custom flavours for those images as they will each have their own requirements.
For example, “qemu-img info Fedora-Cloud-Base-30-1.2.x86_64.qcow2” shows the disk image size is actually 4Gb, which is the minimum needed and should be set on the image to prevent flavors with smaller disk sizes using it, so loading that as an image would be
source ~/keystonerc_admin
glance image-create \
   --name "Fedora 30" \
   --visibility public \
   --disk-format qcow2 \
   --min-ram 512 \
   --min-disk 4 \
   --container-format bare \
   --protected False \
   --progress \
   --file Fedora-Cloud-Base-30-1.2.x86_64.qcow2
And you would then create a custom flavour for fedora30 with at least 512Mb of ram and a 4Gb or larger disk. Certainly you could use an existing flavour that allocates Gbs of ram and a huge disk, but why would you want to in a home lab where resources are scarce.
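A minimal sketch of creating such a flavour from the command line (the flavour name m1.fedora30 is my own):

source ~/keystonerc_admin
openstack flavor create --ram 512 --disk 4 --vcpus 1 m1.fedora30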
Getting console access working
Make sure all instances on all compute nodes are stopped; you will be rebooting, as that is the simplest way to pick up the changes.
On all compute nodes edit /etc/nova/nova.conf; all changes to be made are in the [vnc] section (a consolidated example follows this list)
- set server_listen to the ip-address of the compute node you are updating the file on, default is 127.0.0.1
- set server_proxyclient_address to the ip-address of the compute node you are updating the file on
- set novncproxy_base_url=http://controller-ipaddr:6080/vnc_auto.html, not the compute node address but the controller address
- set xvpvncproxy_base_url=http://controller-ipaddr:6081/console, not the compute node address but the controller address
- reboot everything
- when all the servers have stabilised, restart the instances; you now have console access via the horizon dashboard to instances on all compute nodes
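Pulling those settings together, on my second compute node (192.168.1.162, with the controller at 192.168.1.172) the [vnc] section ends up looking like this:

[vnc]
server_listen = 192.168.1.162
server_proxyclient_address = 192.168.1.162
novncproxy_base_url = http://192.168.1.172:6080/vnc_auto.html
xvpvncproxy_base_url = http://192.168.1.172:6081/console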
Testing the install, if you have multiple compute nodes
If, like me, you installed additional compute nodes you should test them all to ensure they work as expected.
I do this by disabling all but one compute host at a time from the hypervisor tab and starting instances, which forces an instance onto a specific compute host; after you have an instance on each compute host, remember to re-enable all the compute hosts (a command-line equivalent is sketched below).
By doing so you have tested that you can start an instance on each compute node.
You should then test private tenant networking between compute hosts by pinging or sshing between the instances on different compute nodes, plus check you have console access to instances on all the compute nodes.
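The disable/enable steps can also be done from the command line if you prefer; a sketch, where node2 is a placeholder for the compute host name as registered with nova:

openstack compute service set --disable --disable-reason testing node2 nova-compute
openstack compute service set --enable node2 nova-compute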
Usage tips
It lies: the dashboard and openstack environment report lots of free memory on the allinone host, but they assume all memory resources are available based on the physical memory installed and do not take into account that the openstack software is actually using most of that memory, so trying to launch an instance that needs a lot of memory will probably fail. If the dashboard shows the compute node on the allinone server has 14Gb free on a 15Gb server, you can safely assume you have at most 5Gb free.
What I do on the allinone server is start a small footprint instance on the private tenant network I am using and assign it a floating ip-address, then disable the allinone compute node to force all other instances onto my second compute node. I can then use that small footprint instance as a gateway server to access all instances on the private tenant network without needing to assign additional floating ip-addresses to any of them. To reach the instances on the private network, each desktop machine that wants access simply adds a route; for example, to access instances on a private network of 10.0.1.0/24 through a gateway instance with a floating ip of 192.168.1.241, you would just add the route below, and all those desktop machines can immediately locate and access instances on the private network.
route add -net 10.0.1.0/24 gw 192.168.1.241
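Note that a route added this way does not survive a reboot of the desktop. On distributions where the iproute2 tools have replaced the old route command, the equivalent is:

ip route add 10.0.1.0/24 via 192.168.1.241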
This means your reserved floating ip-address range can be extremely small as you only need one per tenant network, not one per instance you want external access to.
The reason I start the gateway instance on the allinone node is simply because that is the network node, and if it is not available nothing would work anyway.
Another thing I would generally do is install nrpe to allow monitoring. While that would normally be a simple case of “yum -y install epel-release”, “yum -y install nrpe nagios-plugins-all”, “systemctl enable nrpe”, “systemctl start nrpe” there is actually a catch.
Remember that openstack must have firewalld disabled, so you cannot use firewall-cmd to open the nrpe port; plus its use of iptables to load all its convoluted but necessary network traffic rules takes quite a bit of time on a system restart. There is no safe way to automate opening the nrpe port, as you must be 100% sure all the iptables rules needed for openstack to function have been completely set up before touching them. The only safe way I have found is to manually enter the command below about 10 minutes after rebooting the server(s) to insert the rule opening the nrpe port 5666.
iptables -I INPUT -p tcp --dport 5666 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
Even if you do not monitor using nagios/nrpe, it is important to know that you should not play with the iptables rules until after the openstack environment has fully initialised and stabilised.
Summary
The OpenStack Stein release, using openvswitch (OVS) and vxlan networking, is, if you were able to follow the directions, installed and working.
Enjoy.