Docker Isolation, and non-Isolation

Docker is not KVM; there are major security trade-offs with containers. The key ones are described below.

Processes are not isolated

The processes run by containers are, for all intents and purposes, processes on the Docker host machine. This is an issue because an admin on the host machine may inadvertently cause the failure of applications within a container. For example, in a memory shortage situation on the host an admin may kill a process that is a memory hog without realising it was spawned by a container application.

The display below is a process listing on the host; every one of the results (apart from the grep itself) is a process launched from within the container.

[root@vosprey2 log]# ps -ef | grep herc
root      1203  1185  0 15:45 ?        00:00:00 /bin/bash /home/mark/hercules/tk4-minus/start_system_wrapper.sh
root      1240  1203  0 15:45 ?        00:00:00 /usr/bin/su -c bash /home/mark/hercules/tk4-minus/start_system.sh > /var/tmp/hercstart.log 2>&1 mark
mark      1241  1240  0 15:45 ?        00:00:00 bash -c bash /home/mark/hercules/tk4-minus/start_system.sh > /var/tmp/hercstart.log 2>&1
mark      1242  1241  0 15:45 ?        00:00:00 bash /home/mark/hercules/tk4-minus/start_system.sh
mark      1244  1203  0 15:45 ?        00:00:00 SCREEN -t hercules -S hercules -p hercules -d -m hercules -f mark/marks.conf
mark      1246  1244  9 15:45 pts/0    00:00:49 hercules -f mark/marks.conf
mark      3321  1246  0 15:46 pts/0    00:00:00 hercules -f mark/marks.conf
mark      3322  3321  0 15:46 pts/0    00:00:00 /bin/bash /home/mark/hercules/tk4-minus/mark/scripts/printer_interface.sh
mark      3333  3322  0 15:46 pts/0    00:00:00 /bin/bash /home/mark/hercules/tk4-minus/mark/scripts/printer_interface.sh
root     11963 10154  0 15:53 pts/0    00:00:00 grep --color=auto herc
[root@vosprey2 log]# 

Network connections are hidden

In a complete reversal of the above issue, network connections to applications within a Docker container are not visible on the host machine. That can make diagnosing network connectivity issues difficult, as an admin would normally look on the host for established TCP/IP sessions, but established sessions to container applications are not visible there.

Refer to the output below. The container application listens on port 3270. While netstat on the host shows no established sessions in this example, a computer remote to the Docker host does have an established connection through the host into the container on that port. A docker exec into the container lets us see that established connection from 192.168.1.187, and it is dangerous that the connection is not displayed on the Docker host. The last command in the output below was run on the computer that established the session; it correctly shows it is connected to the Docker host (189) on the port mapped to the container, even though displays on the Docker host do not show the connection.

By dangerous I mean that in a large environment, when network issues need to be resolved, an admin cannot be expected to attach to or exec into every container to see what established connections exist on a host (assuming the container is even built to contain the netstat command). Not being able to display established connections to a host on the host itself is something I would consider a serious issue. And an ‘ip netns’ shows no network namespaces are in use, so it is not immediately obvious how the connection is being hidden.

[root@vosprey2 log]# netstat -an | grep 3270
tcp6       0      0 :::3270                 :::*                    LISTEN     

[root@vosprey2 log]# docker exec -ti mvs38j1 /bin/bash
bash-5.0# netstat -an | grep 3270
tcp        0      0 0.0.0.0:3270            0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:3270          127.0.0.1:39892         ESTABLISHED
tcp        0      0 127.0.0.1:39896         127.0.0.1:3270          ESTABLISHED
tcp        0      0 172.17.0.2:3270         192.168.1.187:48842     ESTABLISHED
tcp        0      0 127.0.0.1:39892         127.0.0.1:3270          ESTABLISHED
tcp        0      0 127.0.0.1:3270          127.0.0.1:39896         ESTABLISHED
tcp        0      0 127.0.0.1:39894         127.0.0.1:3270          ESTABLISHED
tcp        0      0 127.0.0.1:3270          127.0.0.1:39894         ESTABLISHED
unix  2      [ ACC ]     STREAM     LISTENING     16297755 /run/screen/S-mark/71.c3270A
unix  2      [ ACC ]     STREAM     LISTENING     16297819 /run/screen/S-mark/76.c3270B
bash-5.0# exit
exit

[root@vosprey2 log]# netstat -an | grep 3270 
tcp6       0      0 :::3270                 :::*                    LISTEN     
[root@vosprey2 log]#
[root@vosprey2 log]# ip netns
[root@vosprey2 log]#

[mark@phoenix mvs38j]$ netstat -an | grep 3270
tcp        0      0 192.168.1.187:48842     192.168.1.189:3270      ESTABLISHED
[mark@phoenix mvs38j]$ 
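
For what it is worth, the reason ‘ip netns’ shows nothing is that Docker keeps its network namespaces under /proc/<pid>/ns rather than registering them in /var/run/netns where the ip command looks. If you already know which container is involved you can still inspect its connections from the host with nsenter, along the lines of the sketch below (using the mvs38j1 container from the example above); but that still assumes you know which container to look at, which is the whole problem.

# find the PID of the container's main process, then run netstat inside
# only its network namespace; the netstat binary used is the host's copy,
# so this works even if the container image does not include netstat
CPID=$(docker inspect --format '{{.State.Pid}}' mvs38j1)
nsenter -t $CPID -n netstat -an | grep 3270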

The dangers of running apps in containers as non-root

Everybody will tell you user applications should never be run as the root user, and applications in containers should also follow that rule.

There is a major issue with running applications as a non-root user in containers however.

Remember that the processes launched within a container run as actual processes on the Docker host, and as containers are supposed to be portable there is no way to guarantee that UIDs assigned to users within the container will match UIDs on the Docker host.

Refer to the output shown below. Within the container the application runs under the userid ‘ircserver’ with a UID of 1000; all good, right?

[root@vosprey2 log]# docker exec -ti ircserver1 /bin/bash
bash-5.0# ps -ef 
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 03:28 ?        00:00:00 /bin/bash /home/ircserver/marks_irc_server
ircserv+    23     1  0 03:28 ?        00:00:00 /home/ircserver/IRC/inspircd-3.3.0/run/bin/inspircd --config=/home/ircserver/IRC/inspircd-3.3.0/run/conf/inspircd.conf
root        81     1  0 03:58 ?        00:00:00 sleep 600
root        82     0  7 04:01 ?        00:00:00 /bin/bash
root        87    82  0 04:01 ?        00:00:00 ps -ef
bash-5.0# grep ircserver /etc/passwd
ircserver:x:1000:1000:Used to run the inspircd IRC server:/home/ircserver:/bin/bash
bash-5.0# 

Wrong! Displaying the process on the Docker host shows a different story. On the host it does run under UID 1000 as expected, but on the Docker host the ircserver user does not exist and another user is assigned UID 1000.

[root@vosprey2 log]# ps -ef | grep IRC | grep -v grep
mark     17875 17834  0 15:28 ?        00:00:00 /home/ircserver/IRC/inspircd-3.3.0/run/bin/inspircd --config=/home/ircserver/IRC/inspircd-3.3.0/run/conf/inspircd.conf
[root@vosprey2 log]# 
[root@vosprey2 log]# grep 1000 /etc/passwd
mark:x:1000:1000:Mark Dickinson:/home/mark:/bin/bash
[root@vosprey2 log]# 

The obvious major issue here is that unless all container applications are run as root they cannot be considered portable, as there is no way to ensure UIDs either match or do not exist on every docker host that may ever run the image.

Imagine again, from an admin perspective, trying to troubleshoot a memory hog process where a ‘ps’ shows the offending process is being run by user ‘fred’, but user fred swears he never started the process. He may not have started it; it could have been spawned from a container using a UID matching fred's. Admittedly, if they were all root processes admins would still have to track down the container causing the impact, but they would not have been sidetracked into wasting time chasing after fred.

Also, let's not forget that ‘fred’ can at any time kill any of those processes he has been granted ownership of, causing issues for the application within the container.
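
One mitigation worth mentioning, although it was not in use on the host shown above, is Docker's user namespace remapping, which maps container UIDs onto a dedicated subordinate UID range on the host so that container UID 1000 no longer shows up as a real host user. A minimal sketch assuming the default ‘dockremap’ setup is below; note it has side effects (for example it cannot be combined with --net=host, and volume file ownership needs thought), so test before relying on it.

# contents of /etc/docker/daemon.json
{
  "userns-remap": "default"
}

After a ‘systemctl restart docker’ the container's UID 1000 shows up on the host as an offset UID taken from the dockremap range in /etc/subuid, not as whichever host user happens to own UID 1000.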

Operating system dependent

Containers are not truly portable; they should be considered operating system dependent, and a container built on Fedora 30 should only run on a Fedora 30 host, with the container and host on similar patch levels. If an OS upgrade is done, all containers should be rebuilt.

Obviously this depends on the complexity of your container applications. One I have been working on requires --device mapping and overlaying the host /lib/modules directory over the container… because when the container is run ‘uname’ reports the host kernel level, not the kernel version the container was built from, so the container does not have the correct modules. But it is fair to say a container OS must be a very close match to the host OS in order to function.
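
For the record, the kind of invocation I mean is sketched below; the device path and image name are placeholders rather than the actual container discussed above.

docker run --rm -ti \
  --device /dev/ttyUSB0 \
  -v /lib/modules:/lib/modules:ro \
  example/image:latest /bin/bash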

Summary

If you are security conscious you would not consolidate applications running on KVM hosts into Docker containers.

If your environment is secure and totally locked down then Docker containers can be used to consolidate applications running on KVM instances. You will get no application memory savings from moving an application (if an app needs 1Gb to run on a KVM instance it will still need 1Gb to run in a Docker container), but you will get back around 750Mb of OS-only overhead from each KVM instance you shut down if you are migrating from KVM to Docker.

And of course containers start faster than booting a KVM, which I personally do not consider a selling point. If designed properly, images enable copies of applications to be started in containers on multiple hosts with minimal effort; then again, KVM live migration has been a thing for a long time now, so that is not really a major selling point either. Being able to encapsulate an entire online application in an image is useful.

Docker networking between containers in a multihost Docker engine environment is much hyped, and much is touted about Docker swarms, so I'm sure Docker's internal networking works well, although I have not needed to play with any of that.

External networking from Docker containers is another issue; one such problem, from a troubleshooting viewpoint on the host, I have already highlighted above.

There are many other networking issues I will probably cover in a later post, when I figure out how to resolve them. Let's just say that if your Docker container needs pass-through access to external hosts outside the internal Docker network, prepare for a long, frustrating time.


Using Bacula to backup your Fedora/CentOS Linux servers

This post was written to acknowledge that bacula has saved my ass again. My main desktop machine failed, a new replacement was purchased and swapped in, and a new OS was installed with the same hostname and ip-address. I then installed puppet and let the puppet agent do its thing; one of its tasks was to install and configure bacula-fd. So within 10 minutes of starting the puppet agent I could use bacula to restore all custom server configuration files and everything under /home, and as the last bacula backup ran only a few hours before the server died I lost zero data. As bacula is wonderful it deserves another post.

First, be aware that bacula is most useful if you have multiple machines (physical or virtual) to back up to a separate dedicated physical server.
The reason is simply that your data is going to be available. If the backup server fails in any way, the servers being backed up lose no data; you can quickly rebuild the backup server and do a full backup of all the machines to recreate a backup point for them all (yes, you will have lost backups of individual files you may have accidentally deleted prior to that date). And if a server fails, you can simply rebuild it and restore from the backup server.

If you have only one machine, bacula would still be useful for restoring one or more files from a specific date should you accidentally delete or corrupt them, although obviously if that server fails you have lost everything.

This post is on how to set it up in a home lab environment. My environment is a Fedora 30 backup server used to back up Fedora and CentOS servers. I use only one storage daemon with maximum concurrent jobs set to 1, so jobs queue to ensure only one job runs at a time, and I use multiple schedules.

While it is a large post, with what may seem like a lot of manual effort and time needed to set bacula up, it is in fact extremely simple to set up and get running; you will probably spend less than 10 minutes in an editor on the configuration files, unless you have dozens of clients to add to be backed up.

So the ideal configuration would be

  • a dedicated backup server with a lot of storage (or usb ports for a very large external disk), or at least one always-on machine that can do the backups
  • multiple servers to be backed up

The considerations to take into account for the requirement of a lot of storage, or a very large external disk, are

  • even though bacula does compress the data used by backups, you must remember that multiple backups will be kept depending upon your configuration; for example you may do a full backup once every 30 days and incremental backups across those 30 days, and if you change a lot of large files daily you will need more space
  • you should size for the maximum; for example, if you are backing up machines with 1Tb disks that are only 30% used, in the 30 day backup cycle mentioned above your backups would probably only take around 400Gb or less per machine (I back up 9 machines with a combined actual filesystem usage of around 3.5Tb using that 30 day cycle, and all the backups fit (just) into 2Tb of space). However, over time space usage on the machines being backed up will grow, so size for the maximum space that could be used, not what is currently used
  • what I use as a backup storage device for my home lab is an external 2Tb disk; the major benefit of using a single external disk is that as the backup space needed grows you can simply replace it with a larger device on the same mount point (after copying the backup data off the old disk onto it, of course)
  • do not grow space by linking multiple disks into a pool or union, because if any single one of those disks fails you have lost all your backups anyway; better to use a single large disk, external or otherwise, so if it fails you simply replace it without needing to reinstate pool or union structures
  • if the backup device used by bacula on the backup machine fails with data loss and needs to be replaced with an empty device, you need to drop and recreate the database and delete all the work files it uses, then do full backups on all the machines. The reason for that is you do not want the bacula backup service to try to append to or recall from data that no longer exists
  • if sized correctly, the disk based virtual backup volumes will just be re-used, resulting in static space usage once they have all been used and start recycling. Personally I hit an issue when a few hundred Gb that should have been excluded got backed up and I had to manually label a few extra volumes; to avoid having to manually delete those when they eventually expired I changed my configuration to “Action On Purge = Truncate”, which does not affect existing volumes, only new volumes created, so for existing volumes update them all from bconsole as per this example: “update volume=Inc-0091 ActionOnPurge=Truncate”
Update 14Mar2021: more recent versions of bacula introduce new commands that work in conjunction with the "Action On Purge = Truncate" option. My Catalog backup job now looks like this.

Job {
  Name = "BackupCatalog"
  JobDefs = "DefaultJob"
  Client = vmhost1-fd
  Level = Full
  FileSet="Catalog"
  Schedule = "WeeklyCycleAfterBackup"
  RunBeforeJob = "/usr/libexec/bacula/make_catalog_backup.pl bacula"
  # I take a copy of the catalog backup and move it to my database dump/backup directory
  RunAfterJob  = "/var/spool/bacula/marks_post_BackupCatalog"
  Write Bootstrap = "/var/spool/bacula/%n.bsr"
  Priority = 11                   # run after main backup
  RunScript {
    RunsWhen=Before
    RunsOnClient=No
    Console = "prune expired volume yes"
  }
  RunScript {
    RunsWhen=After
    RunsOnClient=No
    Console = "purge volume action=all allpools storage=File"
    Console = "truncate storage=File pool=File-Incr-Pool"
    Console = "truncate storage=File pool=File-Full-Pool"
  }
}

The actual software requirements needed to run bacula are

  • on the backup server itself
    • the bacula-fd service, this is on all servers and is used as the client to perform backups/restores
    • the bacula-sd service, this is the storage service; it requires the backup device to be available before it is started. Note that the bacula-sd service can be on a separate machine, and you can in fact have multiple servers running a bacula-sd service in your environment, which allows not just spreading the load but also the setup of copy and migration jobs to get ‘offsite’ backups; nice, but not really relevant for a home lab
    • the bacula-dir service, this is the director service that runs the backup jobs on schedule and runs ad-hoc backup and restore jobs
    • the bacula-console package, the command line interface needed to run restore jobs, control scheduled jobs and servers (cancel backups, disable/enable servers to be backed up etc)
    • mariadb is the database I prefer to use; bacula also supports PostgreSQL and sqlite3, but as I use mariadb only that is covered in this post

    when using external disks the bacula-sd and bacula-dir services should be started (in that order) only after the external disks have been mounted, so do not autostart those services if using external disks

  • on all the servers to be backed up
    • the bacula-fd service, this is on all servers and is used as the client to perform backups/restores
    • the bacula-console package, the command line interface needed to run restore jobs for the server, and all the admin functions mentioned in the server section
    • optionally install the bacula-console-bat package on desktops as it needs gtk for a gui interface, personally I find the command line interface easier to use

The configuration that should be set before running bacula is

  • On the backup server itself, the one running the bacula-dir service
    • look in /usr/libexec/bacula; you need to check/edit and run the scripts create_mysql_database, make_mysql_tables and grant_mysql_privileges to create the database to be used
  • On the backup server itself in /etc/bacula
    • bacula-sd.conf : storage configuration
      • set all the passwords, they must match the ones used in bacula-dir.conf, bacula-fd.conf, bconsole.conf
      • set the archive device file path; it must be a mounted filesystem with enough space to hold all your backups, and this file path cannot easily be changed later so get it correct now. This may be external media as long as you can guarantee it will be mounted before the bacula-sd service starts after reboots; make sure automatic mount is set to yes and always open is set to no
      • comment out all the autochanger/tape entries. If these are not commented out the bacula-sd service will hang scanning for tape devices on startup
    • bacula-dir.conf : jobs, filesets, schedule, clients (servers to backup), storage servers to use, sql database user/password/port, volume pools, pretty much everything
      • set all the passwords, they must match the ones used in bacula-sd.conf, bacula-fd.conf, bconsole.conf
      • search for the Catalog entry, set the user/password/databasename etc to match the database you created earlier. Note that the supplied example with dbuser and dbpassword parameters does not work, use something like
        Catalog {
          Name = bacula
          dbname = "bacula"
          user = "bacula"
          password = "bacula"
          DB address = "vmhost1"
          DB port = "3306"
        }
        
      • make sure the ‘mail’ address is set correctly to something that works, as bacula “manual intervention required” errors are emailed. Note: the mail address configuration occurs twice in the file, so set up both
      • set up multiple schedules to be used; you do not want all your backups to run simultaneously (although they can queue). Note: a job defined without a schedule will not be automatically scheduled but can still be run on demand
      • set up your fileset entries to define the filesystems to be backed up. Bacula will not back up multiple filesystems, so if you have a standard Fedora installation with a separate /home directory you will need at least two filesets, one for / and one for /home. It is in the fileset definition that you specify exclude entries; customise the exclude entries for your servers before running any backups. For example, if you run lots of virt-manager VMs you may want to exclude /usr/lib/libvirt/images, which may contain many hundreds of Gb of data in disk images that, if the VMs are running and changing the virtual disk access time, would be backed up with every incremental backup (basically, exclude huge files that change often, other than important things like database backups). If you do want to back up things like that, create a separate fileset and a non-scheduled job using that fileset and do ad-hoc backups rather than backing them up every day
      • update the “default” jobsdef entry as it can be used by all your jobs
      • edit the job definition for RestoreFiles, which is the default job used when restores are requested; specifically change the “Where” entry from /tmp/bacula-restores to somewhere you have a lot of space. I change it to /var/tmp/bacula-restores, the difference being that on Fedora /tmp is a ‘tmpfs’ filesystem using real memory and /var/tmp is on disk; I can restore 100Gb onto disk but not into memory :-). Note: CentOS 7 has /tmp on disk so would be OK, but the job definition is used by all clients, so cater for all your client OSs
      • create client entries for every server you intend to back up; the password configured in the client entry has to match that set in the bacula-fd.conf on the client server. Unless you are in a super secure environment use the same password for all client entries, or at least the same format, as that will make it much simpler to manage rolling out the bacula-fd.conf file to every client. Also ensure your client names follow a fixed naming convention, for example servers wally and fred could have client name entries of wally-fd and fred-fd, making automated client deployment easier
      • Now you can create backup “job” definitions for the clients. A simple standard would be: if the client is a Fedora server named wally you would have two jobs, BackupWally and BackupWallyHome, assuming you created separate filesets for the two filesystems, and you may even have them on different schedules. If you add a schedule entry to the job it will run automatically when scheduled; with no schedule entry the job is only available for ad-hoc backup requests
      • check all the entries to make sure the ‘max full interval’, ‘file retention’ and ‘job retention’ values throughout the file all make sense. For example, a full backup retention of 14 days with an incremental backup retention of 30 days does not make sense. Bacula is smart enough to determine during an incremental run that it has no full backup and, if your job definition is set up correctly, will run a full backup instead of an incremental; you may not want a full backup every 14 days, so make sure your full backup expiries closely match your incremental ones
      • open firewall ports 9101, 9102, 9103 (bacula-dir, bacula-fd, bacula-sd)
      • for a home lab set “Maximum Concurrent Jobs = 1” in the Director entry of bacula-dir.conf, this ensures only one backup job runs at a time and others are queued after each other, to avoid stress on your lab network and backup server disks
      • in the BackupCatalog job the default RunAfter job deletes the on-disk sql backup file created after bacula has backed it up. Personally I replace this job with a custom one that copies the file (/var/spool/bacula/bacula.sql) to the same external storage device I use for the bacula virtual backup volumes before deleting it, as it seems silly to me to only be able to access it to rebuild bacula if bacula is actually running and able to restore it, so I keep a copy. You may want to do the same
  • On all the servers to be backed up, including the backup server itself, in /etc/bacula
    • bacula-fd.conf
      • set all the passwords, they must match the ones used in bacula-dir.conf
      • in the filedaemon section set the ‘name’ to match the client name you used when defining the client on the backup server in bacula-dir.conf, they must match
      • open firewall port 9102 (bacula-fd)
    • bconsole.conf
      • set the hostname of the backup server and the director password to match the one you set in bacula-dir.conf
      • ensure the /etc/bacula/bconsole.conf file is readable by all users that need to perform restores and issue admin commands to the director. You should probably restrict access, as the interface grants full admin access, although restricting access means users cannot restore their own files and must always bug an admin user to do so; not as much of an issue in a home lab environment. Personally I recommend securing all conf files root:bacula 640 and adding backup admins (only yourself in a home lab case) to the bacula group
      • optionally if you installed the bacula-console-bat package ‘cp -p bconsole.conf bat.conf’, as they are identical in configuration

On the client servers you can ‘systemctl enable bacula-fd’ and ‘systemctl start bacula-fd’ immediately on all the client servers, there will be no activity from them until the bacula director requests a backup, whether manual or scheduled.

I would suggest manually logging into mariadb to make sure the userid/password works and you can access the bacula tables before starting anything on the backup server.
When happy make sure the archive file path is mounted and ‘systemctl enable bacula-sd’ and ‘systemctl start bacula-sd’. Check messages in /var/log/messages and ‘systemctl status bacula-sd’ to make sure there were no errors.
When happy ‘systemctl enable bacula-dir’ and ‘systemctl start bacula-dir’, again check messages for startup errors.
If all OK they will start automatically on a system reboot.
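
A sketch of that startup sequence on the backup server, assuming the catalog database, user and password are all ‘bacula’ as in the earlier Catalog example and that /backup is where the archive device path lives (both are placeholders for your own values):

mysql -u bacula -p bacula -e 'show tables;'   # prompts for the password, lists the catalog tables
df -h /backup                                 # confirm the archive filesystem is mounted
systemctl enable bacula-sd
systemctl start bacula-sd
systemctl status bacula-sd                    # also check /var/log/messages for errors
systemctl enable bacula-dir
systemctl start bacula-dir
grep bacula /var/log/messages                 # check for director startup errors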

It is important to note that if your storage configuration is set up to use external media you should not enable the bacula-sd and bacula-dir services to autostart on a reboot, as there is no guarantee the external media will spin up in time to be mounted and available. When using external storage, manually start the services in the order bacula-sd then bacula-dir after checking the external media is available.

Tips on configuration

  • I use passwords that include the server name, so template deployment can be done onto servers
  • I use bacula-fd client names that include the server name, so template deployment can be done onto servers
  • as the SD task can be run and used on multiple servers should you have a huge home lab, include the server name in the name of the bacula-sd name so template deployment can be done onto servers
  • if your bacula-dir.conf file is getting too large to maintain, the @ symbol can be used to include files (ie: @/etc/bacula/client1.conf in bacula-dir.conf would include that file when the director starts) so you can split sections out. As the @ symbol can also be used within included files to include additional files, it is easy to create structured/managed configurations; a sketch follows this list.
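
A sketch of what that can look like; the file layout here is purely illustrative.

# in /etc/bacula/bacula-dir.conf
@/etc/bacula/clients/wally.conf
@/etc/bacula/clients/fred.conf
# and an included file may itself use @ to pull in further files
@/etc/bacula/jobs/adhoc-jobs.conf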

Security tips, and issues

  • users cannot be allowed to restore their own files, simply because bconsole is a full admin interface and you cannot limit a user to only restore functions whilst omitting destructive functions
  • another reason general users must be kept away is that all the configuration files for bacula have hard-coded passwords used for the services to talk to each other and to use the sql database; note however this is not unique to bacula, and most *nix applications have passwords stored in configuration files, sometimes hashed and sometimes not
  • these are not security issues regardless of what an auditor may tell you; as long as the configuration files are secured so only admin users have access to read them (which is needed to perform their jobs) there is no exposure to non-admin users, so not an issue. However, as not all admin users (in Fedora, those in the wheel group) will be tasked with doing restores, add only those that need access to the bacula group as well. The config files should all be root:bacula 640; the bacula user must be able to read the files, so leave the group as bacula, not root or wheel, or bacula can never start (unless you make the perms 644 instead of 640 and let everybody on the server read the config files, which you should not do)
  • the only issue is that in a secure world sysadmins have to do all the restores, not ideal but common across every *nix backup solution I have ever seen

Tips on usage

  • bconsole can do everything
  • if you are running short on disk space on the bacula storage device use ‘list volumes’ to check for any expired volumes you can delete with the delete volume command. This should not be necessary if you have configured the volumes for truncation however
  • make damn sure you have changed the default restore location as mentioned in the setup steps if you are going to restore GBs of data
  • when using bconsole to restore files if you are not sure of the exact filenames use the “6: Select backup for a client before a specified time” option, this builds a file list and drops you into a ‘shell’ where you can cd between backed up directories and select files or entire directories to ‘mark’ for restore
  • scheduling issues: if you have a few jobs scheduled to run at 17:00 daily and the director is stopped over the 17:00 period, when it is restarted the jobs will still be scheduled to run at 17:00 the next day. There is no catch-up on missed jobs, so just do not shut down the director over backup periods
  • set up a job to periodically check the last 40 or so messages in /var/log/bacula/bacula.log for manual intervention cases. I have an nrpe job that checks for required manual ‘label’ requests, which occur when the maximum allowed number of volumes has been exceeded and one must be manually created. All errors such as that will be emailed as well, so ensure you have the email address set up correctly; but who checks their email all day long?
    Update 14Mar2021: changes in bacula mean mount messages are no longer logged to the log file. I have now created a new user, mount (in fact any action) events are mailed to that user, and my nrpe check job now checks whether a mail file exists for that user under /var/spool/mail and raises an alert if it does; a sketch of that check follows this list.
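
A minimal sketch of that kind of check, assuming the mail user created for the alerts is called ‘baculaops’ (the name, and the nagios-style exit codes, are illustrative):

#!/bin/bash
# alert if bacula has mailed intervention messages to the operator mailbox
MAILFILE=/var/spool/mail/baculaops
if [ -s "$MAILFILE" ]; then
   echo "WARNING: bacula intervention mail waiting in $MAILFILE"
   exit 1
fi
echo "OK: no bacula intervention mail"
exit 0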

Installing the OpenStack Stein release from the RDO repositories

As I am sure you are all aware, the OpenStack Stein release is available and documented on the RDO site now.

While the “stein” release has been available for a while, this post took a long time to prepare as I had to work through a lot of issues before I could get a perfectly working installation, so it was delayed until I could document a working implementation. The post is in two parts: issues to be aware of, and then how to get a fully working install.

Following the post will give you a fully working install, including additional compute nodes should you wish to add them. I would recommend adding additional compute nodes, as OpenStack guesstimates available memory based on the physical memory installed and in an allinone setup does not take into account that most of the physical memory is not actually available but is used by the OpenStack processes themselves.

There are some issues to be aware of when installing the RDO release of Stein

Networking

The first major thing to note is that in the Stein release openvswitch (OVS) networking has been replaced by OVN (Open Virtual Network) as the default. It uses the OVN mechanism driver instead of openvswitch and an ML2 type driver of geneve instead of vxlan… which simply does not work. Using packstack with the default OVN networking results in the following warning after each run, so there is no good reason for it to be the default.

Additional information:
 * Parameter CONFIG_NEUTRON_L2_AGENT: You have choosen OVN neutron backend. Note that this backend does not support LBaaS, VPNaaS or FWaaS services. Geneve will be used as encapsulation method for tenant networks

Apart from the warning message there is the small detail that geneve for tenant networks does not work; either one or more required services are not set up correctly or it is just not yet supported. You will always get errors such as the one below when trying to launch any instance, which places the instance into an error state (basically, if you use geneve for tenant networks you cannot launch instances).

2019-07-21 17:38:31.846 10172 ERROR neutron.plugins.ml2.managers [req-4118a28e-b444-4031-910e-c570b760d0e9 f1c86cefca4e463b82851b3819cf9623 bdae98d8c35b4303b66f0aa9ddb63275 - default default] Failed to bind port d72d31ba-0dcd-4146-af83-16bafb85138f on host region1server1 for vnic_type normal using segments [{'network_id': '941c40cb-934d-4939-a2be-1e64440db0b9', 'segmentation_id': 27, 'physical_network': None, 'id': '1c302d60-fe08-4848-9b09-fa199b174079', 'network_type': u'geneve'}]

Also note that when tenant network types are set to a list (ie: =vxlan,geneve) you do not actually get a choice of which type of tenant network to create; it will always use the first entry in the list (by default with OVN the only entry set is geneve, which doesn’t work). This prevents configuring both types so as to have a working vxlan while trying to debug geneve, so just don’t use geneve.

If you have already installed the release and are hitting the issue where launching an instance from the dashboard results in different ip-addresses appearing/disappearing before the instance goes into a failed “error” state, you may be able to resolve it by editing the neutron plugins ml2_conf.ini file and changing the tenant network type from geneve to vxlan; you will probably also have to change a few settings from ovn to openvswitch, based on the commands I used to create my working system. A sketch of the target values follows.
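
As a rough guide only, the values I would expect to end up with in /etc/neutron/plugins/ml2/ml2_conf.ini on the controller are shown below; these mirror the packstack options used later in this post rather than a tested in-place conversion from OVN, so treat it as a sketch.

[ml2]
type_drivers = vxlan,flat
tenant_network_types = vxlan
mechanism_drivers = openvswitch,l2population
# restart the neutron services (or simply reboot) after changing these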

Another point worth noting is that for floating ip addresses openvswitch is still required; at least, the section on the RDO website on using an existing external network shows only an ovs example, and of course the network bridge is still openvswitch.

With the default OVN install, after installation the default MTU sizes on networks created by horizon still seem to be set to 1500; the documentation at https://docs.openstack.org/networking-ovn/latest/install/migration.html indicates these need to be much lower, but there is no way to override the defaults when creating tenant networks in horizon. There are major changes between OVS and OVN summarised at https://docs.openstack.org/networking-ovn/latest/faq/index.html, the main one for debugging being that there is no qrouter network namespace to look at.

Disclaimer: if I had spent another few months working on it I may have been able to get everything working using OVN rather than OVS, despite google searches returning everyone having trouble with geneve. However I was not willing to spend that extra time, as my primary goal was to get a working system I could continue using for the testing/development tasks I have for it; basically I wanted to replace my aging Ocata system with the Stein release. As that has been achieved by reverting to OVS networking, my current goal has been met. It may be entirely possible to get an environment up and running using OVN and geneve networking, but this post is about reverting to OVS and vxlan networking.

In order to get it working for my use I decided to use Openvswitch only.

If you want to ‘plug in’ to an existing home network, review the documentation at https://www.rdoproject.org/networking/neutron-with-existing-external-network/ before running packstack, although I have included the relevant commands in this post. I also covered setting up the openvswitch bridge in my much earlier post on setting up for the queens install at https://mdickinson.dyndns.org/php/wordpress/?p=872, which may or may not provide additional information. Basically you need to set up an openvswitch network bridge on the machine you will use for providing networking.

Console access

Another issue is that console access into instances from the horizon dashboard once again does not work ‘out of the box’. All the settings needed for console access have to be set manually; how to do that is covered later on.

High IO activity

And the final thing worth mentioning is to use fast disks if possible; “iotop” shows that mariadb (mysqld), even with zero instances running, hammers the disks, and you can expect around a 4% I/O wait in top due to database activity even when nothing is happening if installed on a normal desktop.

How I installed a fully working setup, using Openvswitch instead of OVN

Creating br-ex as an openvswitch bridge is covered in my earlier post linked to above, plus examples are on the RDO site in the documentation on using an existing external network. Creating the VM(s) and setting those up I will not repeat here; just make sure all your servers can reference each other in the hosts file of each machine (or are in dns) and that you have an openvswitch bridge set up on the controller host.

My ethernet card on the allinone host is ens3, configured under openvswitch to br-ex. My controller/network (allin1) host is 192.168.1.172, and I also define a second compute node on a host at 192.168.1.162. My existing external network is 192.168.1.0/24.

I explicitly selected an Openvswitch installation as that was the networking I could get working.

The initial step is to generate a packstack answers file to edit before actually using it.

packstack --gen-answer-file=answers_default_allin1.txt \
 --allinone --timeout=999999 --default-password=password \
 --provision-demo=n \
 --os-neutron-ovs-bridge-mappings=extnet:br-ex \
 --os-neutron-ovs-bridge-interfaces=br-ex:ens3 \
 --os-neutron-ml2-type-drivers=vxlan,flat \
 --os-neutron-ml2-tenant-network-types=vxlan \
 --os-heat-install=y --os-heat-cfn-install=y \
 --os-magnum-install=y \
 --os-neutron-l2-agent=openvswitch

Edit the answers file and change the setting below to the value shown; packstack will have set it to ‘ovn’, which will cause serious networking problems.

CONFIG_NEUTRON_ML2_MECHANISM_DRIVERS=openvswitch,l2population 

I personally add additional compute hosts at this step; do not edit the CONFIG_COMPUTE_HOSTS entry if you are just doing an allinone install to a single host. The example below is for my environment, where the ‘allinone’ host is 172 and my second compute host is 162.
CONFIG_COMPUTE_HOSTS=192.168.1.172,192.168.1.162

Then run the packstack command using the answers file, with a huge timeout value. It should complete without problems.
If there are timeouts the command can be rerun but rerun it in a “screen” session if you are not on the console as a rerun will drop the network (with a screen session you can just ssh back into the server and reconnect to the screen session).

packstack --timeout=99999 --answer-file=answers_default_allin1.txt

Then you should manually create the external network and subnet pool using the command line, as I had issues doing so using the dashboard. Note that I use a 192.168.1.0/24 network; make sure you change it to your external network. Also, the allocation pool should be addresses in a range your dhcp-server/router does not issue. I use only 240-250 for openstack (and 160-190 for non-openstack physical and kvm machines) as I have few dynamic devices and my router has never issued anything above 14 (grin).

While such a limited 240-250 range reserved for floating ipaddrs for openstack use may seem small it is actually probably too large for a home lab (see usage tips at the end of the post).

source ~/keystonerc_admin
neutron net-create external_network \
 --shared --provider:network_type flat \
 --provider:physical_network extnet \
 --router:external
neutron subnet-create --name public_subnet \
 --enable_dhcp=False \
 --allocation-pool=start=192.168.1.240,end=192.168.1.250 \
 --gateway=192.168.1.1 external_network 192.168.1.0/24

As we did not provision the demo, manually load the cirros image to test with.

source ~/keystonerc_admin
curl http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img | glance \
         image-create --name='cirros image' --visibility=public --container-format=bare --disk-format=qcow2

Logon to the horizon dashboard as the admin user (http://your-allinone-server-ip/)

  • create a userid for yourself as an admin user under the identity/user tab, leave project blank
  • then create a project for yourself, make yourself the only member with an admin role
  • logoff the admin user

Now is a good time to reboot the server(s) to make sure everything works.

Logon to the dashboard as the new user you just created (http://your-allinone-server-ip/)

  • create a tenant network from the project network tab (ie: 10.0.1.0/24 with subnet range 10.0.1.2,10.0.1.254, start at .2 not .1 as one address is reserved for the gateway/router/dhcp-server, in this example 10.0.1.1 will be the gateway), ensure dhcp is enabled
  • launch a test instance using the cirros image and m1.tiny flavor, using your new tenant network
  • if it launches ok and goes to active/running you are in business
  • click on the instance name and check the log to make sure it started ok
  • note: console access is not working at this point as noted in the issues earlier

If you try to associate a floating ip at this point it will fail as your project has no access to the external network yet, so

  • under the project network/router tab create a router for the project using the external network you created earlier
  • select the router and add an interface using the tenant network you have created
  • now you can associate a floating ip to the instance

You may be surprised that you cannot ping the instance from your external network using the floating ip-address; that is because access to the floating ip is not allowed by the default security rules, so create a new network security group

  • select the network tab, select security groups
  • create a new rule ssh-and-icmp
  • add rule “all icmp, ingress, CIDR, 0.0.0.0/0”
  • add rule “ssh, ingress, CIDR, 0.0.0.0/0”
  • go back to compute/instances and modify the security groups by adding your new rule group, security group rules can be added/deleted on the fly while the instance is running
  • you can now ping your cirros test instance using the floating ip address
  • you can also ssh cirros@floatingipaddr and log in with password “cubswin:)” as seen in the instance log

Most cloud images do not allow direct login but need key pairs. For example a Fedora cloud image expects login to fedora instances only to the fedora user via ssh key.

  • select instances/key pairs and create a new ssh key, when prompted to download save it somewhere you can remember it, and copy it to every workstation you will be using to login to your instances (I normally place it in my ~/.ssh directory)
  • you will need to access most cloud instances with ssh clouduser@ipaddress -i keypairname.pem as most do not allow userid/password logins, the test cirros image being the exception.

Now, as jolly ssh is a command line interface, let's mention something you do need to do on the command line. There are a lot of openstack commands that are issued from the command line, and you don't want to have to issue them all as the admin user, especially as you really only need to worry about your own project at this point. So cd to the root user home directory and copy keystonerc_admin to keystonerc_yournewuserid, then edit keystonerc_yournewuserid to use the userid, password and default project you created for your openstack userid. Copy that to your personal unix directory and source it instead of keystonerc_admin when issuing commands; as you created your new userid as an admin you can issue pretty much any command needed.
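
A minimal sketch of that, assuming the new openstack user is called ‘mark’ with a project named ‘markproject’ (the names and password are illustrative only):

cp /root/keystonerc_admin ~/keystonerc_mark
# edit ~/keystonerc_mark and change at least these three variables
#   export OS_USERNAME=mark
#   export OS_PASSWORD=the-password-you-set
#   export OS_PROJECT_NAME=markproject
source ~/keystonerc_mark
openstack server list    # any openstack command now runs as your own user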

At this point you will probably want to create images for actual cloud distributions, most distributions provide images such as CentOS-7-x86_64-GenericCloud-1704.qcow2 and Fedora-Cloud-Base-30-1.2.x86_64.qcow2. You will also need to create custom flavours for those images as they will each have their own requirements.

For example, a “qemu-img info Fedora-Cloud-Base-30-1.2.x86_64.qcow2” shows the disk image size is actually 4Gb, which is the minimum needed and should be set on the image to prevent flavors with smaller disk sizes from using it, so loading that as an image would be

source ~/keystonerc_admin
glance image-create \
 --name "Fedora 30" \
 --visibility public \
 --disk-format qcow2 \
 --min-ram 512 \
 --min-disk 4 \
 --container-format bare \
 --protected False \
 --progress \
 --file Fedora-Cloud-Base-30-1.2.x86_64.qcow2

You would then create a custom flavour for fedora30 with at least 512Mb of ram and a 4Gb or larger disk, something like the sketch below. Certainly you could use an existing flavour that allocates Gbs of ram and a huge disk, but why would you want to in a home lab where resources are scarce.
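
A sketch of creating such a flavour; the flavour name and vcpu count here are my own choices, adjust to taste.

source ~/keystonerc_admin
openstack flavor create --ram 512 --disk 4 --vcpus 1 m1.fedora30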

Getting console access working

Make sure all instances on all compute nodes are stopped; you will be rebooting, as that is the simplest way to pick up the changes.

On all compute nodes edit /etc/nova/nova.conf; all changes to be made are in the [vnc] section (a consolidated example follows the list below).

  • set server_listen to the ip-address of the compute node you are updating the file on, default is 127.0.0.1
  • set server_proxyclient_address to the ip-address of the compute node you are updating the file on
  • set novncproxy_base_url=http://controller-ipaddr:6080/vnc_auto.html, not the compute node address but the controller address
  • set xvpvncproxy_base_url=http://controller-ipaddr:6081/console, not the compute node address but the controller address
  • reboot everything
  • when all the servers have stabilised, restart the instances; you now have console access via the horizon dashboard to instances on all compute nodes
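
Pulling those settings together, the [vnc] section on a compute node at 192.168.1.162 with the controller at 192.168.1.172 would look something like the sketch below (the addresses are from my environment, substitute your own):

[vnc]
server_listen = 192.168.1.162
server_proxyclient_address = 192.168.1.162
novncproxy_base_url = http://192.168.1.172:6080/vnc_auto.html
xvpvncproxy_base_url = http://192.168.1.172:6081/console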

Testing the install, if you have multiple compute nodes

If, like me, you installed additional compute nodes you should test them all to ensure they work as expected.
I do this by disabling all but one compute host at a time from the hypervisor tab and starting instances to force an instance to start on specific compute hosts, and after you have an instance on each compute host remember to re-enable all the compute hosts.
You have by doing so tested you can start an instance on each compute node.
You should then test private tenant networking between compute hosts by pinging or sshing between the instances on different compute nodes, plus check you have console access to instances on all the compute nodes.

Usage tips

It lies: the dashboard and openstack environment report lots of free memory on the allinone host, but they assume all memory resources are available based on the physical memory installed and do not take into account that the openstack software is actually using most of that memory, so trying to launch an instance that needs a lot of memory will probably fail. If the dashboard shows the compute node on the allinone server has 14Gb free on a 15Gb server, you can safely assume you have at most 5Gb free.

What I do on the allinone server is start a small footprint instance on the private tenant network I am using, assigned a floating ip-address, then disable the allinone compute node to force all other instances to start on my second compute node. The small footprint instance can then be used as a gateway server to access all instances on the private tenant network without any need to assign additional floating ip-addresses to any of them. To access the instances on the private network, each desktop machine that wants to reach them simply adds a route; for example, to access instances on a private network of 10.0.1.0/24 through a gateway instance assigned a floating ip of 192.168.1.241 you would just add the route below, and all those desktop machines can immediately locate and access instances on the private network.

route add -net 10.0.1.0/24 gw 192.168.1.241

This means your reserved floating ip-address range can be extremely small as you only need one per tenant network, not one per instance you want external access to.

The reason I start the gateway instance on the allinone node is simply because that is the network node, and if it is not available nothing would work anyway.

Another thing I would generally do is install nrpe to allow monitoring. While that would normally be a simple case of “yum -y install epel-release”, “yum -y install nrpe nagios-plugins-all”, “systemctl enable nrpe”, “systemctl start nrpe” there is actually a catch.

Remember that openstack must have firewalld disabled, so you cannot use firewall-cmd to open the nrpe port; plus its use of iptables to load all its convoluted but necessary network traffic rules takes quite a bit of time on a system restart. There is no safe way to automate opening the nrpe port, as you must be 100% sure all the iptables rules needed for openstack to function have been completely set up before touching them. The only safe way I have found is to manually enter the command below about 10 minutes after rebooting the server(s) to insert the rule to open the nrpe port 5666.


iptables -I INPUT -p tcp --dport 5666 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT

Even if you do not monitor using nagios/nrpe, it is important to know that you should not play with the iptables rules until after the openstack environment has fully initialised and stabilised.

Summary

The OpenStack Stein release, using openvswitch (OVS) and vxlan networking, is, if you were able to follow the directions, installed and working.
Enjoy.


Looks like Google is breaking the rules

It would appear I am blocking Google from indexing my site, at least until they fix their robot indexers to perform as documented.

The documentation at http://www.google.com/bot.html (the url thoughtfully provided in the user agent string) clearly says that Googlebot and all respectable search engine bots will respect the directives in robots.txt and that only spammers do not. Well, Google obviously does ignore robots.txt, as seen from my logs.

access_log-20190526:66.249.71.126 - - [19/May/2019:20:59:40 +1200] "GET /badbehavedbot HTTP/1.1" 200 677 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
access_log-20190526:159.69.117.171 - - [25/May/2019:18:38:48 +1200] "GET /badbehavedbot HTTP/1.1" 200 678 "-" "Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)"
access_log-20190616:144.76.68.76 - - [12/Jun/2019:16:19:05 +1200] "GET /badbehavedbot HTTP/1.1" 200 676 "-" "serpstatbot/1.0 (advanced backlink tracking bot; http://serpstatbot.com/; abuse@serpstatbot.com)"

It does appear to be a real google server that is ignoring the robots.txt file.

[root@vosprey2 httpd]# nslookup 66.249.71.126
126.71.249.66.in-addr.arpa	name = crawl-66-249-71-126.googlebot.com.

Why do I think it is ignoring it, you ask? Because my robots.txt file starts with

User-agent: *
Disallow: /badbehavedbot/
and many more

And also specifically has a section

User-agent: Googlebot
Disallow: /badbehavedbot/
and many more

The reason this will be preventing Google from indexing my site is that the /badbehavedbot/ URL is a ‘honeytrap’ link that exists hidden on all pages including the main page. No user can ever click on it; it is specifically there to automatically blacklist any search engine that refuses to obey the robots.txt file and tries to follow it… therefore, within seconds of Googlebot trying to index that link, the source ip-address the robot is crawling from is permanently blocked by iptables rules from accessing my website.

The blacklisting is fully automatic and I don’t intend to change it; it only impacts badly behaved web crawlers, which the URL Google provide in their agent string clearly states is behaviour only engaged in by spammers or nogoodniks. It appears Google now include themselves in that category.

Actually, looking at a few search results (yes found by google) on this issue it seems to have been reported in 2017 that Googlebot suddenly started ignoring robots.txt. You can do your own searches but it seems to be a common issue.

One suggestion made in the posts I found is that entries are being indexed because the bot followed the url from a link on another site which may have been new behaviour added, but as this particular link simply does not refer to an existing page I doubt anyone has linked to it or that is the reason for it happening here.

I suppose it is possible that following a url posted on another site referencing a page on my site makes them believe they can then try to index everything on my site, simply because the remote site that directed them here did not say in its robots.txt file that they should not. That is the sort of fuzzy logic I will completely ignore; I will continue blacklisting ip-addresses that do not observe my robots.txt file when crawling my site.

Another suggestion is that the crawler behaviour was changed to retrieve all pages and then refer to robots.txt to decide what not to index, rather than referring to robots.txt before attempting to retrieve pages. If that is the case, they have decided to retrieve pages you do not want them to retrieve, which I am sure they would not do, so let's discount that as a reason.

From my perspective, I am gradually blacklisting all the googlebot crawler servers ip-addresses (and other badly behaved bots) from accessing my site as new ip-addresses trigger the blacklist rule, however as I do not particularly care about page rankings I can live with that. I will probably every few months remove those specific blocks to see if the behaviour has improved happy in the knowledge that if it has not they will simply be automatically blocked again.

The important thing is that if there is a Disallow entry in the robots.txt file it is there because you specifically do not want that page retrieved and indexed, it may be sensitive or secure information. The correct behaviour is to immediately block the requesting ip-address from accessing the website, so I will leave my automation rules in place to do so.


Logcheck on Fedora, for my use

Logcheck on fedora, by default when installed, runs every hour. For my use this generates so much email traffic (as it runs on all my servers) that it becomes garbage, and I ended up just deleting all the emails rather than reading them.

In reference to the garbage, by garbage I mean that running hourly, against the same log files, resulted in the emails being sent containing pretty much duplicate information from the prior run; pointless. Yes I could change the time parameters but I don’t want the emails every hour anyway.

So I decided to change it from hourly to daily, while that does not mean I will read them all it does mean I have a chance to actually check other emails from other subsystems that tended to be deleted while I was bulk deleting the logcheck ones.

The scheduling for logcheck does not reside in /etc/cron.hourly, where it could simply be moved to /etc/cron.daily to run whenever /etc/anacrontab decides to run it during the defined day; it is actually in the file /etc/cron.d/logcheck, set to run at a specific time.

Point of interest: files in /etc/cron.d are similar to, but not the same as normal crontab entries, the main difference being there is a requirement to specify the userid a job is to be run as. The interesting point however is that files in /etc/cron.d also have the interesting property of being able to be scheduled to run at ‘@reboot’ rather than a specific crontab format time such as ‘2 * * * *’, which I may find a use for at some point.

So I simply changed the default scheduling time in /etc/cron.d/logcheck from ‘2 * * * *’ to ‘2 2 * * *’ to run at 2:02am daily rather than at two minutes past every hour.
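
For reference, the entry ends up looking something like the sketch below; the command path is illustrative rather than a copy of what the Fedora package ships, the point being the extra user field and the daily run time.

# /etc/cron.d/logcheck (sketch)
# min hour dom mon dow   user       command
2 2 * * *                logcheck   /usr/sbin/logcheck
# @reboot is also accepted in place of the time fields in cron.d files, e.g.
#@reboot                 logcheck   /usr/sbin/logcheck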

I also have found a need to create custom suppression filters, which will be an ongoing work in progress, those rules also need to be copied to all my servers.

So, as this affected all my servers, I created a puppet class for logcheck, added it to the ‘base’ profile, and added that to my generic ‘allservers’ role containing the common configurations and packages needed on all servers. This ensures all servers are correctly set up, as I was starting to lose track of which ones I had manually updated the cron.d entry on; and now that I am creating custom filter rules that will be updated fairly regularly, I just want to update everything from one place, which puppet lets me do.

While some may think the simple change to the scheduling time was probably not worth a post, I think the ability to use ‘@reboot’ as a scheduling option in files under /etc/cron.d has potential, which many of you may not have known about had you not read this pointless post. Irritatingly I knew that function existed but had forgotten about it, so perhaps this post is more a reminder to me :-)

Posted in Unix | Comments Off on Logcheck on Fedora, for my use

Suppressing audit messages to /var/log/messages on Fedora 29 and 30

One of my machines has started rebooting for no reason that I can find, possibly power surges, as my UPS now seems to have a dead battery.

One difficulty in finding the issue is that the messages file is full of audit messages, which are a waste of space as those same messages are also logged to /var/log/audit.

How to turn them off was a question posted on fedoraforum.org that I answered after finding a similar match on stackexchange, but having rebuilt a few machines and had to search out the answer I posted again, I have decided to place the information in my blog as well, simply so it is easier for me to find.

Adding the two lines below as the first entries under the #### RULES #### section in /etc/rsyslog.conf, and doing a “systemctl restart rsyslog”, will stop the messages being logged to the messages file; they are still logged to audit.log if you want to look at them.


# no audit
:programname, isequal, "audit" ~

That is not ideal, as it logs a deprecation warning, and it is simply suppression; it does not address the source of the problem, which is that the audit messages are configured somewhere to be written to both syslog and the audit log, nor does it address why the messages are being generated in the first place.
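
If the deprecation warning bothers you, my understanding is that the newer “stop” action is the direct replacement for the “~” discard action, so the equivalent rule without the warning would be:

# no audit
:programname, isequal, "audit" stop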

But getting rid of the bulk of the messages being logged to /var/log/messages may help me track down why my machine is rebooting itself… which is a pain, as it has LUKS encrypted disks, so although it tries to restart itself it cannot come back up unattended.

Posted in Unix | Comments Off on Suppressing audit messages to /var/log/messages on Fedora 29 and 30

Using the certbot package on Fedora 30 to get LetsEncrypt certificates

There was a post on the Fedora forums stating that the certbot apache plugin does not work on Fedora, so I had a look. The post was correct: the apache plugin for certbot wants to use the “apachectl -v” command, which simply does not exist on Fedora, so certbot with the --apache option will always error and fail on Fedora.

That rpm package for the apache plugin should probably be removed from the Fedora repositories, but that is not my call. The certbot package itself I found works OK and provides all the hands-off functionality needed.

As I was interested in replacing my self signed certificates with a “real” one from LetsEncrypt I had a more detailed look into certbot and its options.

I had avoided looking into it in the past as my site's name is resolved by dyndns and is a .dyndns.org address, and as I don't own the dyndns.org domain I had wrongly assumed I could not obtain a certificate that included it. But it is possible using the --webroot option.

The goal of course is to use LetsEncrypt to provide valid certificates for my website to use, and to have them managed in an automated hands-off way.

One important thing to note in using certbot is that despite the entire idea of SSL certs being to have all traffic across port 443, it is also necessary to have port 80 open and your web server handling unencrypted requests on that port, as certbot needs it. If you get errors about your server being unreachable it will be because you have port 80 blocked (or your dns name is unresolvable on the internet, which is a completely separate issue).
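
On Fedora, if firewalld is managing the local firewall (an assumption, your setup may differ, and any router port forwarding is extra), opening port 80 is along the lines of:

firewall-cmd --permanent --add-service=http
firewall-cmd --reload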

The first time you run certbot it will prompt for details such as your name and contact information. Unfortunately I did not take a copy of the output for that, as it was the first of many failed attempts while my port 80 was still blocked. Once the information is entered it is not asked for again.
I assume it is embedded in the private files created under /etc/letsencrypt/live/mdickinson.dyndns.org or other files under the directories created under /etc/letsencrypt; it is enough to know to enter the correct details in the first place as they would appear to be difficult to change.

Anyway, once port 80 was opened, using the --webroot option works perfectly. You need to run the command as root as it needs to create directories under /etc/letsencrypt. The --webroot path needs to be the directory your website's html is served from; certbot will create a temporary directory under there that the LetsEncrypt servers will attempt to read, to verify that the internet dns name used as your domain name does in fact resolve to your webserver. If they can, from the big wide internet, resolve your server address and read the temporary files created under your webserver path, that verifies you have admin control over the machine the webserver is hosted on, and the certificates are created.

A log of the working request, unfortunately missing the prompts for site details as mentioned above, is below.

[root@vosprey2 httpd]# certbot certonly --webroot -w /var/www/html -d your.hostname.org
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for your.hostname.org
Using the webroot path /var/www/html for all unmatched domains.
Waiting for verification...
Cleaning up challenges

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at:
   /etc/letsencrypt/live/your.hostname.org/fullchain.pem
   Your key file has been saved at:
   /etc/letsencrypt/live/your.hostname.org/privkey.pem
   Your cert will expire on 2019-08-13. To obtain a new or tweaked
   version of this certificate in the future, simply run certbot
   again. To non-interactively renew *all* of your certificates, run
   "certbot renew"
 - If you like Certbot, please consider supporting our work by:

   Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
   Donating to EFF:                    https://eff.org/donate-le

The following changes were made to /etc/httpd/conf.d/ssl.conf to replace my self signed certificate files with the LetsEncrypt ones. They were the only changes needed.

[root@vosprey2 conf.d]# grep -i letsencrypt ssl.conf
SSLCertificateFile /etc/letsencrypt/live/your.hostname.org/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/your.hostname.org/privkey.pem
SSLCertificateChainFile /etc/letsencrypt/live/your.hostname.org/fullchain.pem

A “systemctl restart httpd” to pick up the new keys, and clicking on the lock icon when browsing the website over https, shows the LetsEncrypt provided certificate is working perfectly.

Note that I changed the certificate paths to the location that LetsEncrypt places the certificates in, which in theory means that a simple “certbot renew” as shown in the output above will replace the certificates with no need for further changes to my configuration.

Of course a scheduled job will need to be set up to run the command and perform a restart of httpd, plus parse the output to find the next expiry date and reschedule the job; all on the todo list. For now I have a LetsEncrypt certificate instead of a self signed one.
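
A simpler option than parsing expiry dates, and probably the direction I will take, is just a daily cron.d entry; “certbot renew” only replaces certificates that are close to expiry, so running it every day is harmless. The time, filename and hook below are my own choices for illustration, not anything certbot installs for you.

# /etc/cron.d/certbot-renew   (hypothetical file)
15 3 * * *  root  /usr/bin/certbot renew --quiet --deploy-hook "systemctl restart httpd"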

Using certbot to obtain LetsEncrypt signed certificates on Fedora 30 is quite easy; if you haven't already, give it a try.

Posted in Unix | Comments Off on Using the certbot package on Fedora 30 to get LetsEncrypt certificates

Interesting issues with grub2 and Fedora 30

While the resolution was found in the Fedora forums, as RedHat is built from successful pilots of new code in Fedora the issue may exist in RedHat systems as well.

I had major issues upgrading my KVM F29 instances to F30. Every attempt resulted in the “grub>” prompt. The beauty of KVM instances of course is that I could just restore the qcow2 disk image and retry a lot of different options until I found the post with the solution to the issue.

This does affect physical machines as well; the post I found with the resolution was by someone who had hit the issue on a physical system and had to resort to a live-cd to rebuild their grub setup. I was lucky in that the only physical machine I had updated was recently rebuilt from F28 and upgraded to F29 before I did the F30 upgrade, so it was up-to-date enough to upgrade OK. If I had attempted the upgrade on my main machine it would have been bricked, well, left at the “grub>” prompt requiring hours of work with a live-cd to recover.

I am dubious about taking the risk of updating the F29 OS on my dual boot laptop as there are lots of posts reporting that grub2 can only find windoze partitions after an upgrade to F30, but I guess I will have to attempt it at some point. That is probably a separate issue I may post on if it goes wrong.

Anyway, the issue identified by the post I found is that on BIOS machines (and KVM, as it emulates a BIOS environment) that have been running a while and been upgraded through F24, F25, F26, F27, F28, F29 and now F30, the updates will update the installed grub2 software packages but do not actually update the physical grub2 boot environment; the MBR (or boot sector, if you boot off a partition instead of the MBR) itself is not updated with the newer versions of the software, so the boot sector becomes stale.

With the update to F30 the differences between the stale boot code in the MBR and the installed software on the system result in the “grub>” prompt being displayed instead of the boot menu. Somewhere along the upgrade chain the way grub2 is laid out on the filesystem has changed, and the older MBR version of grub2 cannot handle the new layout.

The way to successfully update the KVM instances is to replace the normal upgrade of

sudo dnf upgrade --refresh
sudo dnf install dnf-plugin-system-upgrade
sudo dnf system-upgrade download --releasever=30
sudo dnf system-upgrade reboot

With the below, there is just the one extra command

sudo dnf upgrade --refresh
sudo dnf install dnf-plugin-system-upgrade
sudo dnf system-upgrade download --releasever=30
sudo grub2-install /dev/vda   # < ---- the extra line, change to your boot disk MBR or partition
sudo dnf system-upgrade reboot

Obviously on physical hardware you need to know whether you are booting from the MBR or from a partition on a disk, and perform the grub2-install onto the correct boot device; it is also only needed if you have not done so for a while, although personally I will now do it for every new upgrade.
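
If you are not sure what sort of system you are dealing with, a couple of quick generic checks before running grub2-install do not hurt (this post is only about legacy BIOS boots; UEFI systems are a different procedure entirely).

# UEFI systems expose /sys/firmware/efi, legacy BIOS systems do not
[ -d /sys/firmware/efi ] && echo "UEFI boot" || echo "legacy BIOS boot"
# lsblk helps identify which disk holds /boot, and therefore where the MBR lives
lsblk -o NAME,SIZE,MOUNTPOINT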

Fedora 30 grub issues after the upgrade are getting quite a few posts raised on the forums; this must be one of the messiest upgrades in quite a while. Not looking forward to upgrading the dual boot laptop at all.

Posted in Unix | Comments Off on Interesting issues with grub2 and Fedora 30

BasKet notes appear to be no more, switched to CherryTree

I used to rely on BasKet notes on both Fedora and RedHat.

After rebuilding a desktop I discovered it was no longer available in the Fedora repositories. This may not be a new thing, as I discovered I had also downloaded and saved the RPM file for it, so I must have manually installed it in the past, I would assume to avoid losing years worth of how-to notes I had saved. The application appears to no longer be in active development, which is a shame.

This time I decided to move forward, and after a fair bit of searching I found that ‘CherryTree’ provided similar functionality, and as an added bonus the documentation indicated it could import basket files. It uses SQLite for storage but, unlike many other products that do, it provides a database compression/reclaim option from within the application menu (admittedly I have not tried to compress the database yet, as it will be a good few months of entries before I have enough fragmentation to see if it does anything).

Anyway, the ‘CherryTree’ package is still supported in the Fedora repositories so it was a simple case of ‘dnf -y install cherrytree’ to make it available.

I uncompressed a baskets backup and used CherryTree to import the baskets data… it worked. So I can move on to a currently supported application without losing years of notes. Its functionality for day-to-day use is also very similar, so it is a good match as a replacement application.

Posted in Unix | Comments Off on BasKet notes appear to be no more, switched to CherryTree

Mirroring up two spare internal disks under Fedora29

Having had to rebuild another machine, once it was up and running I had a server with the OS installed on the boot disk and two spare internal disks that used to be in a raid array. While the machine supported hardware raid I chose to set up software raid as it gives me more control, and personally I would rather ‘tweak’ things at the command line than at the hardware bios level.

Anyway, rather than re-invent the wheel, there is a very good article at https://www.tecmint.com/create-raid1-in-linux/ that explains how to mirror up two spare disks.

I put a LUKS filesystem on the mirrored disks and made it the /home directory.
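
For the sake of having it all in one place, the general shape of what that involves is below. This is only a rough sketch: the device names are examples and will certainly differ, and the mdadm.conf, /etc/crypttab and /etc/fstab entries needed to make it persistent across reboots are not shown.

# create the raid1 array from the two spare disks
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
# put LUKS on the array, open it, and build a filesystem inside
cryptsetup luksFormat /dev/md0            # prompts for the LUKS passphrase
cryptsetup luksOpen /dev/md0 home_crypt
mkfs.ext4 /dev/mapper/home_crypt
mount /dev/mapper/home_crypt /home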

Posted in Unix | Comments Off on Mirroring up two spare internal disks under Fedora29