Jellyfin media server, and using remote disks

Have I mentioned I have discovered Jellyfin ?

Everyone wants their own home media server to avoid having to locate that elusive DVD you want to watch again, which may have degraded to the point it is unwatchable anyway and so should have been backed up to disk in the first place. Of course backing it up to disk may still mean you have to move a disk to the TV you want to watch it on, so you are still moving things around. A media server solves that.

There are quite a few about: Kodi has existed for what seems like forever and is probably the gold standard, Plex has been around for a while, and now I have found Jellyfin.

I chose to use remote external hard drives to host my media and had to work through some issues to get that going, so there is a section at the end of this post on the issues you will hit and how to work around them; they were not issues with Jellyfin but with permissions.

A bonus for me is that the Jellyfin client application can be installed on Chromecast devices (I have two), Android tablets (I have one) and Android-based smart TVs, all of which can access your Jellyfin media server using just your home network internal 192.168.x.x address. (Note: Android tablets can cast to the Chromecast using the Jellyfin app, but the Linux Jellyfin app cannot… and web browsers cannot cast to a Chromecast on internal network addresses either, so presumably the Google casting library was used; lots of internet search results say that is because Google requires you to cast to a DNS name secured by https (presumably via its DNS servers), so having a client installable on the Chromecast (or smart TV) that can locate the media by local ip-address alone was essential to me.)

Note: google searches while investigating the casting issue show Plex is not affected… because Plex routes through the Plex servers; I am not sure exactly how that works, but I wanted something that could use 192.168.1.nnn addresses and not go near the internet, so I chose Jellyfin.

Jellyfin and other similar media library implementations will helpfully try to locate movie covers, cast biographies, and a lot of other stuff by querying internet servers such as IMDB for every movie or show you add. To get any useful information from those queries it relies on you having the directory structure and naming standard each solution expects, and none of us tend to back up in the expected naming standard, so expect a lot of folder renaming and file relocation just to get it to look like you spent some time on it, and expect to have to locate DVD covers manually. Jellyfin will try to work out how to display your media backups as they are if possible and also has a “folder” view that can be enabled, but that is not as pretty as putting the time into sorting them into movies and shows (assuming you already back up in categories here).

I would say if you are copying a huge archive of backups into a media server for the first time, let it do the lookups (in my case it populated about 5% of my titles), then turn external lookups off and just enter everything yourself going forward; a bit of effort in locating media covers and movie info, but easier than trying to rename a decade's worth of archives and constantly rescanning.

Anyway, why did I choose Jellyfin over the others ?

  • It has a very small footprint. I created a new Debian12 VM with only 1Gb memory assigned, with 4Tb of movies and shows (on a remote external disk) for it to scan/manage, and it is using zero swap… but I did create a new VM and install natively rather than use a container (Jellyfin provide a docker container) as that best suited how I wanted to use it
  • Even if naming standards are not followed it can still in many cases work out shows and movies
  • They have implemented an optional “folder view” facility, not brilliant but it can find and handle some edge cases with a lot of directories to traverse; not especially well, but it beats having to rename a decade's worth of collected backups
  • Easy to manually edit images and metadata to put DVD cover pictures and year of release etc into the displays, if you remember to tick the store images locally option for each library
  • There is a Jellyfin client that can be installed on ChromeCast devices (and presumably any Android TV) that can directly use the Jellyfin server by internal network ip-address
  • Media can be added via local directories, nfs or samba (you really should read my notes on remote disks below first)
  • It can be installed from the Debian12 repos into my new VM with a simple “apt install jellyfin”

How I decided to use it

I chose a VM rather than a container as a lot of information (such as locally saved graphics images) does not seem to be stored in the directories on the media library disk themselves, and as it takes a lot of effort to locate and set up images I would not want to lose all that info, which I would if I deleted a container in order to start it on another machine; moving a VM disk to another machine will not lose any info (and moving a VM disk I find easier than snapshotting containers and volumes and trying to move them about). I also store a copy of the VM disk on each media external disk.

While Jellyfin allows the use of remote disks within the application itself (I think it can mount samba/CIFS directories itself) I decided to use only local filesystem mount points such as /mnt/media1, /mnt/media2 etc. and mount remote disks manually, so I have control over what remote disks are mounted and can move the external disks between machines as needed without needing to reconfigure Jellyfin, as the application will always refer to the same mountpoints (plus I would prefer to use NFS rather than CIFS); so both the disks and the VM are portable and do not need to be on the same physical machine :-). Be sure to read my notes on the Samba/CIFS and NFS issues you will hit below.

Another advantage of using a VM is that I can keep a reasonably up to date copy of the VM disk image on the external media disk itself, so as long as I back up the media disk I can spin it up anywhere (yes, a snapshot of a container could have been taken if I had gone the container way; containers are a lot harder to upgrade without losing data however).

So my key requirements were

  • must not need to go anywhere near the internet, all streaming to be confined to my 192.168.1.x network
  • must have a small footprint (I have so many VMs I am maxing out all my dedicated VM machines)
  • must be usable on ChromeCast [due to casting limitations of the google APIs that means must have a client app]
  • must allow me to move external library disks about; I’m exhausting available USB slots in physical machines and despite whatever anyone tells you USB powered disks (even on a powered USB hub) just won’t work, so I have to move them about as needed
  • must be easy to setup and use

On the last… while easy to set up and use, it requires a specific directory structure which my backup disks do not use. As my backup disks are in a structure that makes sense to me and I chose not to change those, I needed another disk to re-layout the files for Jellyfin (actually another two, backup everything !; a copy of the VM image is also backed up on the disks providing the libraries, which is another advantage of using a VM).

Also on the easy to use: the web interface can easily update DVD cover images from locally saved images (ie: from IMDB, or media cover shots of something snapped from a camera) and it is easy to edit the metadata fields (if you remember to tick the store locally box).

While I am a strong believer in backing up everything identically, out of pure habit all my initial testing was on a 2Tb external disk, LUKS encrypted with an ext4 filesystem (NFS mounted to the Jellyfin VM). When I needed to move to a 4Tb disk I left it as FAT32 so I could test for any issues with CIFS/Samba and see how it compared to NFS (result: CIFS works OK, NFS seems to be faster and have fewer pauses; all my interfaces and switches are 1Gb but the Chromecasts attached to the TVs are using wireless, so the only difference was the change from NFS to CIFS [and the disk was changed from 2Tb EXT4 to 4Tb FAT32] and I only see pausing using CIFS, but the pausing is infrequent so certainly usable).

OK, so how to do it

Debian has the Jellyfin server package. Create a new Debian12 VM (as noted only 1Gb of memory is needed, I allocated 2 vCPUs but it can probably work with 1); then “apt update;apt upgrade -y;apt install jellyfin;systemctl stop jellyfin;systemctl disable jellyfin”.
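
Broken out one command per line (nothing beyond what is quoted above), the initial setup on the new VM is just:

apt update
apt upgrade -y
apt install jellyfin
systemctl stop jellyfin
systemctl disable jellyfin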

The stop and disable of the jellyfin service you may be curious about… remember my media libraries are on external disks that may be on remote machines (always remote for a VM actually); I need to mount them onto /mnt/mediaN for my setup before manually starting jellyfin.

Then copy some files in using the Movies/Shows filesystem structure; “systemctl start jellyfin”; use the web interface to add Libraries (I kept shows and movies in separate libraries as recommended). Jellyfin will scan the libraries as they are added but it will take a while.

To add additional media later you can just copy them into the folders already added to libraries (I had turned off the automatic regular scans for new files but it either does it daily anyway or I missed turning it off in at least one library I copied files into later). You can either wait for them to be detected over time or request a library scan from the web interface dashboard.

Oh yes, you should also create a new user/admin id and remove the default. You can also add additional non-admin users such as kids, in which case the rating of the movies/shows can limit what they can see… I found that pointless as the Chromecast app remembers who set it up/logged on and automatically logs on as that user each time anyway.

That's it. You can now watch media using the web browser interface. Obviously you want a lot more than that, so install the official Jellyfin client application on your tablets, TVs and Chromecasts… note when you install the client apps they expect your Jellyfin server to be running and will search for it on your home network at install time, at which point I find just confirming the ipaddr found is correct and manually entering username/password works better than trying to run back to the admin gui to look for some hot-plug OK button that never seems to appear.

Then enjoy the benefit of having your media available anywhere… and what I really like is just being able to back-arrow out of something I am watching in the living room and exit the app, then start the app in the bedroom, go to the “continue watching” section and resume where I left off; I don’t know how I lived without it.

And a separate section on the issues of using remote disks

Disk labels

Lots of external disks have default disk labels with spaces in the name (ie: “One Touch” for Seagate One Touch drives); that is difficult to manage for entries in fstab, exports and samba. Install gparted, unmount the disk, use gparted on the partition (ie: gparted /dev/sdc2), right click on the partition and change the label; this does it non-destructively. That is a lot easier than trying to figure out where quotes and backslashes are needed in configuration files… and not all config files allow spaces at all.
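
If you prefer the command line, the label can also be changed non-destructively with the standard labelling tools; a quick sketch assuming the partition is /dev/sdc2 (adjust to suit):

e2label /dev/sdc2 JELLYFIN       # for an EXT4 partition
fatlabel /dev/sdc2 JELLYFIN      # for a FAT32 partition (fatlabel is in the dosfstools package)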

Disk mount locations, permissions and issues

What you must remember for all the examples discussed here is that I have decided to use Jellyfin with a remote disk (as my Jellyfin instance runs in a VM the disk will always be remote even if plugged into the same physical machine as the VM), and to avoid having to reconfigure the VM whenever the disk is moved, Jellyfin within the VM is always configured to look for its libraries at /mnt/media1. All discussions and examples are about getting the disk available at /mnt/media1 and writeable by the jellyfin user in the VM.

It should also probably be noted that as things like image uploads seem to be stored in locations within the VM rather than on the media disk, the media disk(s) probably does not have to be writeable for Jellyfin use, but as I had issues with permissions with CIFS shares everything here discusses making it writeable, as it will probably be helpful to know for other unrelated projects as well.

On Linux machines manually plugging in an external disk will normally place it under the logged on user's /media/username directory (Debian) or /run/media/username (rhel based). For FAT32 disks (the default for large external HDs) all the files will be set to the ownership of the logged on user; for EXT4 disks I think ownership and permissions are treated as for a normal ext4 filesystem, but of course only the logged on user can traverse the initial /media/username directory path.

On Linux machines a FAT32 disk/directory must be exported using Samba (NFS cannot export FAT32). Samba can probably also export an EXT4 filesystem so you may think it easier just to go with Samba; just bear in mind EXT4/NFS is faster and in my experience more stable.
On Windows machines, even though disks may get a different drive letter they seem to remember when a directory has been set to shared on them, but it always pays to check each time.

If you plug in your “Library” disk to a Windows machine it must be FAT32/NTFS and can only be shared via CIFS. If plugging into a Linux machine you can use EXT4 shared by either NFS or Samba or if a FAT32 disk it can only be shared by Samba.

My preference is for EXT4 filesystems as I like to LUKS encrypt all my external drives. I also dislike the need to install and configure Samba on each machine I might want to plug the external disk into when I already have NFS on all the Linux ones anyway.

The main issue with using EXT4 filesystems is that they must be attached to a Linux machine and user ownership and permissions are correctly maintained; an issue in that the Jellyfin processes are not going to be running under the same userid you used to populate the files on your disk. If mounted using samba that can be bypassed (see the Samba notes below), but if NFS mounting you must change ownership of all the directories and files to the jellyfin user, which makes it difficult to add additional files using your own userid.

The issue with FAT32 filesystems is that while they can be plugged into both Windows and Linux machines, and are easy to share from a Windows machine, on Linux you will have to install and configure Samba on each machine you might plug them into. You must also remember to override the ownership of the remote mount as discussed below on Samba mounts.

On the Jellyfin VM you need to install either the NFS or CIFS tools (or both) depending on what you will use.
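
On Debian that is just the obvious client packages; install whichever of the two you need (package names may differ slightly on rhel family systems):

apt install nfs-common      # NFS client support
apt install cifs-utils      # CIFS/Samba client support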

Using NFS mounts

You must ensure all the directories on the remote disk are traversable/updateable by the Jellyfin user. But actually mounting an EXT4 disk using NFS is simple.

Examples are for an EXT4 filesystem with a disk label of JELLYFIN, so it is mounted under /media/mark; my Library folders are all under an Exported directory.

An /etc/exports entry on the Linux server exporting the disk (the Jellyfin VM is named jellyfin), when updating/changing it remember to “systemctl restart nfs-mountd.service”.

/media/mark/JELLYFIN/Exported jellyfin(rw,sync,no_subtree_check,no_root_squash)

Also on the server exporting the disk (only has to be done once) “firewall-cmd --add-service nfs;firewall-cmd --add-service nfs --permanent”.

An /etc/fstab entry on the Jellyfin VM, when updating fstab remember to “systemctl daemon-reload”. Note that I use noauto as I choose to manually mount my remote media.

vmhost3:/media/mark/JELLYFIN/Exported  /mnt/media1   nfs noauto,nofail,noexec,nosuid,noatime,nolock,intr,tcp,actimeo=1800 0 0
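
With noauto set nothing mounts at boot, so after the VM boots (or the disk has been moved back to its serving host) bringing the library up is just a manual mount followed by starting the service:

mount /mnt/media1
systemctl start jellyfin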

Using CIFS/Samba mounts to a disk on a Windows machine

Examples are for a FAT32 filesystem, my Library folders are all under an Exported directory.

On the Windows machine use file explorer to select the Exported directory, right click on it and select properties, select the sharing tab and share the directory with a name of Exported.

Critical notes: to avoid having to enter Windows user credentials on each mount use a credentials file as shown below; and even more importantly the mounted files will by default only be updateable by root, so use the fstab CIFS mount options uid/gid to set ownership (as far as the VM mounting it is concerned) to the jellyfin UID and GID (values for your install can be obtained by grepping jellyfin from passwd and group). With those set correctly a “ls -la” of the mounted filesystem will show owner:group as jellyfin, which is required.
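
For example, to find the values to use on your VM (the 103/110 shown in the fstab entries below are just what my install happened to use):

grep jellyfin /etc/passwd     # third and fourth fields are the UID and GID
grep jellyfin /etc/group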

An /etc/fstab entry on the Jellyfin VM, when updating fstab remember to “systemctl daemon-reload”.

//192.168.1.178/Exported /mnt/media1 cifs noauto,uid=103,gid=110,credentials=/home/jellyfin/smb_credfile_windows.txt

An example of a credentials file for a Win10 Home machine being used to “share” the directory

username=windozeuser
password=userpassword
domain=

Using CIFS/Samba mounts to a disk on a Linux machine running Samba

Obviously the first step on the machine you will be plugging the disk into is “apt install samba -y” (or dnf install if on a rhel type OS).

Then “systemctl stop smbd;systemctl disable smbd”. Required because not only do we need to edit the config, remember that we are using external disks that may not be plugged in, so let's not allow Samba to start automatically.
At this time you may as well also “firewall-cmd --add-service samba;firewall-cmd --add-service samba --permanent”.

Critical notes: remembering that externally mounted disks are normally mounted under /media/username and only username can traverse the path, you do not want any defaults when the directory is shared by samba to anonymous users (anonymous users will be treated as user nobody, which will not have permissions, which is why normally Samba shares are on world writeable directories or secured to a group/user in smbpasswd, but we do not want to waste time with that here), so you must use the “force user” entry set to the user that owns the files on the server, which in my case is always going to be me (mark) for disks under /media/mark.

Example of the share needing to be added to /etc/samba/smb.conf (after changes “systemctl restart smbd”).

[Exported]
path = /media/mark/JELLYFIN/Exported
browseable = yes
writable = yes
read only = no
guest ok = yes
force user = mark

Critical notes: to avoid having to enter user credentials on each mount use a credentials file as shown below; and even more importantly the mounted files will by default not be updateable by the jellyfin user, so use the fstab CIFS mount options uid/gid to set ownership (as far as the VM mounting it is concerned) to the jellyfin UID and GID (values for your install can be obtained by grepping jellyfin from passwd and group, as shown earlier). With those set correctly a “ls -la” of the mounted filesystem will show owner:group as jellyfin, which is required.

An /etc/fstab entry on the Jellyfin VM, when updating fstab remember to “systemctl daemon-reload”.

//192.168.1.179/Exported /mnt/media1 cifs noauto,uid=103,gid=110,credentials=/home/jellyfin/smb_credfile_samba.txt

An example of a credentials file for a Linux machine with the default samba setup (default domain is WORKGROUP) and the Exported directory above. Yes, it does prompt for user root and an empty password (on Debian12 with the default setup anyway; you would probably use smbpasswd to set up groups/users properly, but not for this jellyfin post).

username=root
password=
domain=WORKGROUP

CIFS/Samba troubleshooting notes

On the client server you are going to mount onto you must “apt install cifs-utils” (Debian12; rhel may use a different package name). It installs a lot of stuff so you may want to remove it again after testing.

To list shares available on the remote server using/testing a credential file

smbclient --authentication-file=/home/jellyfin/smb_credfile_samba.txt --list 192.168.1.179

To list shares available on the remote server without a credential file (you will be prompted for the empty ROOT password)

smbclient --list 192.168.1.179

To manually mount/test a mount works before adding it to fstab.
The below prompts for a password for ROOT@ipaddr, which is just enter anyway

mount -t cifs -o uid=103,gid=110 //192.168.1.179/Exported /mnt/media1

Or with a credential file with root and blank password skips that prompt

mount -t cifs -o uid=103,gid=110,credentials=/home/jellyfin/smb_credfile_samba.txt //192.168.1.179/Exported /mnt/media1

On the Samba server side logs are kept in /var/log/samba, and any mount errors (if the client was able to contact the server) will be in a file named log.servername (so in my case log.jellyfin as jellyfin is the client server name) or, if the server name is not resolvable, log.ipaddress, assuming you have left the rest of the default samba configuration file untouched.

Update: 09Apr2024
I have discovered that the default FAT32 filesystem that comes on standard 4Tb external disks either doesn’t like large files or Linux doesn’t play well with that filesystem type; for some reason (portability) I decided to use the default filesystem on a Seagate 4Tb external drive mounted to Linux; a directory containing large files became corrupt [Linux showed “d?????????”; plugging it into a Windows machine Windows found no errors, but when trying to delete the damaged directory, even though file explorer showed it, Windows said the directory did not exist and may have been moved; none of the hits on solutions a google search found for that Windows error (there were many, this issue seems to be common for FAT32) worked].
So, don’t use FAT32. I have now ditched FAT32/CIFS for the disk and gone back to EXT4/NFS to serve the disk to my Jellyfin VM as I have been using that combination to serve remote disks for years without issue.
Up to you of course.

Posted in Home Life

Apache Guacamole terminal server – self hosted

Apache Guacamole is a terminal server you can run in-house to provide access to pretty much any server you are running, using nothing but a web browser. It can be installed onto a physical host or VM, or run as docker containers. Installed natively you will need at least 4Gb of memory and a couple of CPUs; installed as containers it will run happily on a host with 2Gb memory (I have run it OK in 1.5Gb with minimal swap usage when the host is dedicated to it) and a couple of virtual CPUs.

Useful for those that do not already have tools providing the ability to SSH, VNC or RDP into remote machines; it does not yet support the SPICE protocol so it is not really suitable for KVM users unless you want to reconfigure VMs from the default of spice to vnc (not as simple as it sounds, and if you are a Linux user, which you are if you are using KVM, then you already have remote-viewer which supports both spice and vnc for access to all your VM consoles).

For Windows clients that may not have all the opensource tools Linux users have, or for those that simply like the idea of accessing everything via a web browser, this is also useful.

There are lots of Videos on YouTube showing how easy Apache Guacamole is to setup and use. Interestingly none of those videos mention any of the pitfalls or issues you can expect from using it; so I will cover a few of those here.

If you have not already implemented remote access from your desktop to everything in your home lab this can be very useful, although in all honesty it is only easy to set up if you have already set up remote access to everything anyway. The reason I say that is covered in the list of available connection types quickly run through here

    A. Kubernetes, I have not tried to figure this one out yet
    B. Telnet, nobody uses that to logon anymore
    C. RDP access to windows machines, is obviously going to be easier if you have already configured your windows machines to accept RDP connections
    D. VNC access to windows machines, is obviously going to be easier if you have already configured your windows machines to accept VNC connections (I use TigerVNC to remote access my windows machines as RDP is only available in windows professional/enterprise editions, not home editions)
    E. SSH access to remote machines, is obviously easier if you already have an existing private key on your desktop for your user that you already use to SSH into your remote machines, so you can simply add the username and paste the existing key when defining SSH sessions for your user (which is not good practice, but I have put my thoughts on that below)
    F. SPICE is not yet implemented, which is bad if you use KVM virtual machines as by default all console sessions for KVM instances default to SPICE [which you have of course configured to fixed ports] (and I have a few notes on that below as well)

    My notes on point E SSH: while it is probably best practice to have a separate private key per machine the user is likely to logon from, with the public key copied to all the servers the user is likely to want to ssh to from each of those machines, apart from that eventually becoming unmanageable it does not actually work like that anyway, which is why private keys should be secured tightly: you can copy a private key to any machine and it will work.

    For a home lab I will in most cases have one private key per user (not machine) and use puppet to push the public key to all the servers (and puppet to push the private key to ‘desktop’ roles the user is in) so I can change the keys in a few minutes if needed… at the *nix level; which is where applications start making it messy…

    Example1: AWX (or ansible tower): as I do have a few ansible scripts I run from the command line (ssh keys deployed by puppet of course), when I started playing with AWX I just imported the same SSH key used from the command line into AWX and it all just worked

    Example2: Guacamole (this post), I just copy/pasted the private key text into each host connection entry for my username, and SSH from Guacamole just worked [so make sure for those cases no definitions are “shared” as user/key should not be shared]

    Why are the examples relevant ?, because in both those examples the SSH keys are now stored in databases, not just in the user's .ssh directory, so using a tool like puppet or ansible to change keys globally is no longer an option. Possibly an ansible script could walk through the Guacamole SQL database searching for records containing the username and updating the key (it is in every user SSH connection definition); other scripts for every other application that wants to use SSH… becoming unmanageable again. For Guacamole you could configure user/password and set all your servers to allow password logons across SSH, but passwords expire more often than keys.

    My thoughts on this: Just something to be aware of… lots of youtube videos on how easy Guacamole is to setup, none on the issues you will encounter as you periodically change keys and have to update every connection entry.

    My notes on point F SPICE: when using VMs under KVM the default console type is SPICE, and of course you edit your virsh definitions to use a fixed port per VM and open firewall ports to the host for the connection, so from your remote desktop when things go wrong and you cannot ssh in you can connect to the VM console with a simple “remote-viewer spice://hostname:portnumber --display $DISPLAY” to fix the problem.

    It is possible to use “virsh edit” to play with the “graphics” section to change the KVM console settings from SPICE to VNC, in which case you would simply change spice to vnc for remote-viewer and use “remote-viewer vnc://hostname:portnumber --display $DISPLAY” to access the console from the command line. And when it is configured for VNC, Guacamole can connect to it.

    However changing from SPICE to VNC gets more complicated with every update to KVM; SPICE now has a lot of hooks into USB virtual devices so it is not simply a case of changing the “graphics” section from spice to vnc anymore, and deleting all the USB entries tied to SPICE devices can leave you with no mouse when you VNC into a remote Gnome session, which is pretty fatal for troubleshooting. But it can be done.

    My thoughts on this: VNC has been implemented which is nice, RDP has been implemented which is only available on Windows professional editions (not available on Windows Home editions), but SPICE, which is the default for all KVM VMs, has not been implemented and is not considered a priority. So while it is nice that Apache Guacamole is free, remember it seems targeted at large commercial users (nobody at home has Windows Professional with RDP or is running kubernetes) and most Linux users use KVM and spice. For home windows users TigerVNC server installed on your windows machines will work with the VNC connection type [ without Guacamole just using the command line “vncviewer -geometry 1366x768 windows-machine-ipaddr” works to test that ]; and there are a couple of developers working on bringing SPICE to Guacamole (thanks folks) so it may be available one day, but at the moment if you use KVM, Guacamole is not for you if you want it for console connections to KVM.

    I did mention earlier in the post that it is easier to use Guacamole if you already have an environment for remotely connecting to everything. What made it easy for me to set up was my existing environment, which of course is all run from a terminal session under Gnome to provide the display; my environment is…

    • All my KVM instances do not use port “auto” for consoles, but have explicit port numbers assigned (and firewall ports opened). The consoles, whether spice or vnc, are remotely accessible from my desktops for those occasions when the server stops accepting ssh sessions (those that are VNC can be added to Guacamole easily; those that are SPICE cannot). I just have a shell script “vmmenu” that lets me select any VM console on any of my VM hosts that I want to connect to
    • I have a single private SSH key for my personal userid on my desktop(s), public key deployed to all Linux servers and VMs by puppet-ce; so I can already SSH to any server. Configuring Guacamole to SSH to the servers was simply a case of using the same userid and pasting the same private key into the connection entry. I can ssh into any server without needing Guacamole
    • I only have Windows Home Edition on my dual-boot laptop, and Windows Home Edition does not allow RDP into it. TigerVNC however is installed on it and “vncviewer” as mentioned above can remotely control that just as well as RDP could. Note: TigerVNC allows multiple connections to the same session, noticed only when I was playing with Guacamole as I had both a Guacamole (in a web browser) and a vncviewer connection active in different windows at the same time, and could see a mouse move in one interface move in the other; whether that is good or bad is up to you. You should configure each user session so each user has a different port, but you do not have to apparently :-)

    So with my environment the advantages Guacamole gives me are… none. The disadvantages are that web browsers (whether firefox, chrome, brave…) all chew through unreasonable amounts of real memory (and swap) on any Linux desktop.

    It is also fair to say you would never open up a Guacamole server to be accessible from the internet. Maybe if you had a VPN into your environment ? (which you would need to resolve hostnames/ipaddrs anyway).

    Soooo… I am not sure who all those YouTube videos on how wonderful it is are aimed at. Nor where this solution is aimed: it is not at Linux users as SPICE is needed by anyone that uses KVM, not at home users as Windows Home edition does not have RDP, and not at corporate users as having ‘connection’ properties defined per user for every connection is not feasible (not that I would ever want anything to go near windows directory/ldap as that immediately breaks everything).

    My thoughts: I have no idea what target this tool is being aimed at, or what problem it is trying to solve, as at the moment it is only partially useful for a home lab environment, but it works for what has been implemented. Additional protocols and authentication methods may (probably will) be added over time but I am still unsure of the target audience [ to allow access to remote servers using only a web browser… for those clients that do not already have clients for rdp/ssh/vnc/telnet/kubernetes (I cannot think of a single client OS that does not already have a client for the protocol) ].

    But it works, so lets continue on.

    Fast start setup using the container docker implementation

    A. Create a new VM, min 2CPU and 2Gb Memory (I used Debian as the OS)
      This is not covered here as you should already know how to create VMs
    B. Install docker
      This is not covered here as you should already know how to install docker (not podman)
    C. Install mariadb or mysql
      This is not covered here, you should know how to install this from the existing repos and it is different depending on whether you are using a rhel or debian family OS
    D. Use the provided container to generate the SQL required to create the DB, and create the DB
    E. Run it
    F. Logon, (optionally change the admin user/password,) create a group and user, and start creating connections

    Step D – create the mariadb/mysql database

    A container is provided that generates the DB schema; generate that, use mysql to create a database and permissions and then run the generated schema.

    docker run --rm guacamole/guacamole /opt/guacamole/bin/initdb.sh --mysql > initdb.sql
    mysql -u root -p      [reply to password prompt]
        CREATE DATABASE guacamole_db;
        CREATE USER 'guacamole_user'@'%' IDENTIFIED BY 'guacamole_pw';
        GRANT SELECT,INSERT,UPDATE,DELETE ON guacamole_db.* TO 'guacamole_user'@'%';
        FLUSH PRIVILEGES;
        USE guacamole_db;
        \. initdb.sql
        exit
    

    Step E – Run it

    The only thing to note is that “guacd” must be started first so the other container can link to it. You must also of course ensure the DB parameters match what you created just above.

    docker run --name mark-guacd -d guacamole/guacd
    docker run --name mark-guacamole \
        --link mark-guacd:guacd        \
        -e REMOTE_IP_VALVE_ENABLED=TRUE \
        -e MYSQL_HOSTNAME="192.168.1.189"  \
        -e MYSQL_PORT=3306 \
        -e MYSQL_DATABASE="guacamole_db" \
        -e MYSQL_USER="guacamole_user" \
        -e MYSQL_PASSWORD="guacamole_pw" \
        -d -p 8080:8080 guacamole/guacamole
    

    That's it; when a “docker container list” shows they have gone from starting to started it is ready for use. Simple, yes ?.

    Step F – Use it

    Point your web browser at your host port 8080 (remember to open the firewall port on your host) and you must have the guacamole/ part of the url… in the example above the URL would be http://192.168.1.189:8080/guacamole/; logon with the defaults of user “guacadmin” with the password “guacadmin”. At that point you should create a group and your own admin user and delete the default one.
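
    If your host is running firewalld (as used in the firewall-cmd examples earlier in this post), opening that port would be something like:

    firewall-cmd --add-port=8080/tcp
    firewall-cmd --add-port=8080/tcp --permanent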

    Logon, under your “name” on the right top of the window is a “settings” option, use that to start creating “connections” to your servers.

    For ssh connections, as I mentioned earlier, I prefer to use a username and private ssh key; user/password may work instead. As I only have Windows Home edition (which does not provide an RDP server) I use TigerVNC server on windows machines and VNC in to port 5900, which works fine. For KVM consoles the graphics section of each KVM listens on explicit ports and ipaddrs (not “auto”) and a few I have converted from spice to VNC, but for the bulk of them I will wait until Guacamole supports the default KVM spice protocol and stick to remote-viewer rather than use Guacamole.

    Leave settings and go back to the home screen to start using the connections.

    One usage note: when you connect, the browser tab is dedicated to that connection; start another tab/window to the guacamole URL to start a connection to an additional machine (repeat as needed). Then you can just switch between the tabs to switch between your active connections. On the main URL page where it shows a list of machines you are connected to… ignore the pretty picture section as it does not show all the active connections; you need to look at the list a little further down the page.

    Guacamole showing two active connections

Posted in Unix, Virtual Machines

Accessing EXT4 drives from Windows10

People are still asking how to do this in forums today; presumably mostly dual boot users that want to look at their Linux partition.

This post is primarily for EXT4 filesystems, as those are the ones used when installing the supplied (--online) images into WSL2.

So there are two use cases: (1) those that have a dual boot environment on a single machine, (2) those that want Windows to remote mount filesystems from a remote Linux machine.

Local mounting, everything on the one machine

For users with dual boot systems on the same physical hard drive

Trying to mount an ext4 partition from Windows on a dual boot machine when the linux partition is on the same HDD/SSD as the Windows partition seems to have limited options.

  • Commercial: Paragon software offers ExtFS for Windows. It allows you to read and write ext2 ext3 and ext4 from all Windows OS versions ( ref: http://www.paragon-software.com/home/extfs-windows/ )
  • Free and Commercial: Linux Reader from Diskinternals, which can mount all ext versions, and HFS and ReiserFS too, but read-only in the free version at least ( ref: http://www.diskinternals.com/linux-reader/ ). You can copy files off; writing back may need the PRO commercial version
  • Free (GPL) solutions for “read only” on github, https://github.com/bobranten/Ext4Fsd which is a fork of the earlier https://github.com/matt-wu/Ext3Fsd/releases/tag/Ext3Fsd-0.69

Obviously none of the above or similar options will work if you have wisely LUKS encrypted that Linux partition.

For users with dual boot systems with Linux on a second hard drive

Can be done from WSL which can mount physical disks (entire disks, not partitions).

You can use WSL to mount entire disks formatted with ext4, but not individual partitions, so currently that is not an option when you have a linux partition on the same HDD/SSD as your windows OS; but if your Linux environment is on a separate disk you can mount it into a WSL instance.

There is microsoft documentation on how to do this at https://learn.microsoft.com/en-us/windows/wsl/wsl2-mount-disk.

Also a Youtube tutorial on doing it at https://www.youtube.com/watch?v=aX1vH1j7m7U.
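
For reference, the basic form documented there is run from an administrator PowerShell prompt; a sketch only, the drive number will differ on your machine:

wsl --mount \\.\PHYSICALDRIVE2          # attach the whole ext4 disk to WSL
wsl --unmount \\.\PHYSICALDRIVE2        # detach it again when finished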

However this has both pros and cons

  • pro: WSL virtual disks (and anything mounted onto them) are visible via a Windows File Explorer “Linux” tab if they exist, making them easy to navigate; they are read-write so you can open and edit shell scripts with notepad for example
  • pro: the WSL Linux system is a full linux system so you can install the crypt utilities and use this method to mount LUKS encrypted partitions if they are on a separate physical disk
  • con: if you are going to install a Linux system under WSL anyway do you really need a dual boot environment (the answer of course is yes if you intend to use it rather than play with it, ie: if you are going to store data on it it should be LUKS encrypted, which I do not think WSL supports for its virtual disks)

But… if you just want to play with Linux, just install into WSL, which takes only a few commands and is much easier than having to partition disks, manually install an OS, edit the bootloader etc., so the issue will not exist. An example is shown below where an installed Ubuntu image is viewed using Windows File Explorer from the “Linux” tab.

Note: I assume Windows has some sort of API into WSL virtual disks to manage the ext4 file systems (a “mount” command within a WSL instance shows they are ext4, so that may be all they support) and will not magically recognize an external disk with an ext4 file system that is just plugged into a USB port. I have not tried that as all my external disks are LUKS encrypted, which it does not handle. I might try with a dying USB stick one day by making that ext4 to see if it does handle it.

Remote mounting, the filesystem is on a remote machine

Again multiple ways of doing this. It depends on your environment. Both options I would consider involve a lot of work.

If you want to avoid using WSL altogether really your only option is SMB/NMB (SAMBA). Anyone who has been playing with Linux servers for a while (long before Windows10/WSL) and still has some Windows clients will have gone down this path. However while that may make it easy from a Windows user point of view, as the shares will (should, after a lot of effort) show up as available networked devices, setting up Samba on your Linux servers is not trivial; you do not want to do that on a lot of machines. It also tends to break with every upgrade and need rework.

If you have an old USB attached printer on one of those remote servers then Samba and CUPS may be the only option for using the printer; with newer ip printers not so much

That solution suits an environment with very few Linux servers and a lot of Windows clients, where whoever is on the clients has no interest in knowing that the filesystems are served from a unix server. However you need to setup users on the Linux server(s) for each Windows client user, decide on a domain server, and generally do a lot of work on the Linux server side.
For someone that just wants to mount an ext4 filesystem for single use that is overkill.

The second option, if you are a lone developer with a single windows client, is to use WSL on the Windows10 machine and just NFS mount the remote filesystem you want to work with onto a directory in your WSL instance, so it shows up under the Windows File Explorer “Linux” tab as just another directory to walk into.

That also requires some work on the remote servers to set up the NFS exports, plus a few commands within the WSL instance to mount them. And never mount a directory that has symbolic links and expect it to work… for example you may think of creating an exports entry of /mnt/shareme on the remote machine and placing in it symbolic links such as /mnt/shareme/marks_homedir pointing to /home/mark on the remote machine… the actual behaviour when that is mounted is that references to /mnt/shareme/marks_homedir will reference files under /home/mark on the local machine, not the remote one. Yes, a nasty trap; do not use links.

It should be noted that the issue of symbolic links causing problems is not limited to NFS mounts, although that exact trap should not confuse Samba (which should map actual directories and not links as well anyway; but in a reference mapping case Samba would cope). I doubt there is a unix admin alive anywhere that has not done a “du -ks dirname” to check space in a directory and found it used say 100Mb, then done a recursive copy “cp -rp dirname newplace” and found the target disk 100% full after a few 100Gbs of data were copied… because somewhere in the directory structure was a link pointing up a directory level which was faithfully followed up a level, reached again and followed up a level, repeat until out of disk space (now even if mapped by Samba, trying to copy the directory and everything under it would hit the same loop condition).
Basically try and avoid mounting any remote directory that has symbolic links under it.

I will not discuss Samba, setting that up is a never ending task. To implement NFS mounts to a WSL instance however is simple.

  • start a powershell session
  • “wsl --list” to see what you have installed
  • if nothing then “wsl --list --online” to see what is available and “wsl --install -d nameofoneofthem” to install one
  • always “wsl --update” to get the latest kernel
  • simply use “wsl” to drop into it
  • then “sudo apt install nfs-common”, and you have everything you need to mount remote exported filesystems, which when mounted to WSL are read/write available to Windows via the “Linux” tab in Windows File Explorer

On the remote server to make for example /home/mark available, /etc/exports would contain

/home/mark *(rw,sync,no_subtree_check,no_root_squash,insecure)

The * before the ( should be replaced by the ipaddr of the machine running WSL (not the ip assigned to the WSL instance, as it is NAT’ed via the host machine, so it is the host machine ipaddr that is presented to NFS on the remote machine); but * works if you don’t care who connects.

If you change or add mountpoints to /etc/exports on the remote machine you must “systemctl restart nfs-mountd.service” on the remote machine to pick up the changes.

Under WSL simply “sudo mount -t nfs xxx.xxx.xxx.xxx:/home/mark /some/local/dir/mountpoint”. From Windows File Explorer under the “Linux” tab, under the WSL instance, it is available under /some/local/dir/mountpoint as fully read-write.

Posted in Unix, windoze

Setting up a DNS (dnsmasq) server for your home network

First, what this post is and is not. It covers only using dnsmasq.

Who this post is for

It is primarily for people that have a lot of VMs running in their home network and are finding that the time needed to keep /etc/hosts files up to date on multiple machines is becoming too much.

It is for people that want a simple home network dns resolver as the solution, rather than investigating deployment tools that could push out hosts files to dozens of VMs/servers.

It is for people who used to have dnsmasq working and then it suddenly all broke a few years ago (which is a “read the docs” issue: the way short names were handled by default completely changed, which is good because now all clients across all OSs can expect the same responses).

And the biggee: it is for people who have tried to set it up but have issues with short names not resolving, SERVFAIL responses coming back, or just a general mess as a result. This post covers off the things that are probably breaking in the way you have set it up.

And see the very last section on why you would want to, you may not want to bother.

Who this post is not for

It is definitely not for people that want to assign known ip-addresses to the many devices that may connect to their home network via wireless connections… as of course those get their info from your wireless router, which is outside the scope of this.

That includes things like laptops that may have both a cabled static address plus a wireless dhcp address to your network active at the same time; the wireless connection will really mess things up as the dhcp settings happily take precedence over your static ones.

So on to DNSMASQ itself

As I am sure you are aware, dnsmasq builds its entries by reading the /etc/hosts file on the machine it is running on (by default; you can provide other files if you wish). That should be simple should it not ?, if your hosts file works it should also work in dnsmasq, right ?.

Of course not; your hosts files are probably populated with a mix of short names and FQDNs, making it impossible for a remote client to know what format to use.

If you are reading this post because you have issues with short names not resolving for some machines, long names sometimes resolving and sometimes not, or generally strange and apparently inconsistent behaviour, then read on.

The key thing about using dnsmasq to provide DNS lookup services for your home network is that everything in your home network should be in the same domain.

So the issues you experience could be caused by

  • you have not configured your domain in dnsmasq
  • your servers were installed using defaults like localdomain, or if built using dhcp before assigning a static address they will be in your router's domain (Home, D-Link etc) instead of your home domain
  • you have made life over complicated by having both short names and FQDNs in the hosts file used by dnsmasq

STEP1 – configure DNSMASQ properly

First step: in any /etc/hosts file on servers running dnsmasq only have the short names of the servers (ie: do not have entries like myserver1.myhomenet.org; you would only have an entry for myserver1). If you have a FQDN in there you are doing it all wrong.
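
As an illustration only (made-up names and addresses, using the example domain used throughout this post), the hosts file on the dnsmasq server would contain nothing but short names:

127.0.0.1       localhost
192.168.1.181   dnshost1
192.168.1.185   jellyfin
192.168.1.189   guacamole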

Second step: in /etc/dnsmasq.conf search for the line “#local=/localnet/” and change it to your domain, for example “local=/myhomenet.org/” (uncommented of course).

The effect of this change is that dnsmasq will append myhomenet.org to all the short names read from the hosts file it uses (a short name being anything without a trailing dot and domain). You may wonder why this is going to help if you want to look up a short name; read on, as it is the client's fault. Remember to change the example myhomenet.org to your domain of course.

Third step: in dnsmasq.conf uncomment “domain-needed” and “bogus-priv”. You do not want short name queries forwarded to the internet; and we will be correcting client queries later.

Fourth step: in dnsmasq.conf uncomment “strict-order”.

On your dnsmasq DNS server(s) you should have configured them to first look up their own ip-address (have your local dnsmasq server ip as the first nameserver, as it should be the first queried by tools like nslookup if run on your dnsmasq host; otherwise you will find client queries work but queries on the dnsmasq host itself do not, so you must also tell this dnsmasq server it can resolve names using itself).

If you do not have strict-order set, nameservers will be selected from the list in resolv.conf in random (or round robin) order and you will end up with some queries being sent to DNS servers that know nothing about your local domain (or, if step 3 was not done, to the internet to resolve), and queries will randomly fail.

For those of you who may have played with this in the past and got frustrated by some queries made on the dnsmasq host itself working and some not, omitting strict-order is probably why; it would have been occasionally querying the upstream server instead of itself (for queries from things like “nslookup”; tools that do not use dns but would have used the /etc/hosts file will still have worked, masking any issue… assuming your nsswitch.conf has files as the first entry, which I think you should have in a home network, as even if a client's hosts file has nothing but localhost in it there is always going to be an exception).

Slightly off topic: “dig” and “nslookup” work in different ways; if “dig” is working and “nslookup” is not, you have a config error in your DNS setup.
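
Pulling the four steps together, the lines changed/uncommented in /etc/dnsmasq.conf end up looking like this (example domain of course; restart dnsmasq to pick up the changes):

domain-needed
bogus-priv
strict-order
local=/myhomenet.org/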

That is it for the dnsmasq config but I bet it is still not working :-).
We have to move onto the clients.

Random tip: if your DNS search list (/etc/resolv.conf) on the server running dnsmasq does not contain the ip of the machine running your dnsmasq instance, and the interface it listens on is managed by NetworkManager, the example below may help (eth0 is the primary interface on my server and .181 is the server ip… I lock it down to the server ip; if you do not do that, 127.0.0.1 may work as the first entry). The last nameserver (.1) is my router, so with strict order, if it cannot find a server locally it will hunt the internet dns servers for it via the router with whatever ISP defaults were set on that.

nmcli conn modify eth0 ipv4.dns 192.168.1.181,192.168.1.1
systemctl restart NetworkManager

STEP2 – configure your clients, that's the real problem

We have dnsmasq set up correctly, so why are queries for short names still failing ?. Because the clients are misconfigured of course.

Now I do not know how many servers or VMs you have set up, but if you have accepted all the default prompts you probably have localdomain as a domain. If you installed a VM using DHCP and later changed it to a static address it probably still has the domain name assigned by the DHCP server (or your router).

Guess what, that is not going to work. There are two things you can do: manually edit /etc/resolv.conf on each server after every reboot, or fix each client server.

Do it properly, fix each client server and for future installs use the
correct domain name instead of the defaults.

You may see in some of your client /etc/resolv.conf files entries like “search localdomain” or “domain localdomain” (or on some of my VMs built from DHCP before changing to static things like Home or D-Link).

That is obviously a problem that will prevent name lookups working… (well part of it as discussed later).

When a DNS lookup is performed on a short name (a hostname without a dot), the client will append the domain/search value from resolv.conf to the query to make a FQDN to be looked up.

Now if the search value in /etc/resolv.conf on the client were “myhomenet.org”, as used in the dnsmasq steps, then the short name query would work, as the client will append the correct domain part and find a match in dnsmasq, which has also now been configured to add the domain name to short names.
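
In other words, on a correctly configured client the resulting /etc/resolv.conf should end up looking something like this (example domain and my example dns server ip):

search myhomenet.org
nameserver 192.168.1.181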

But if you have left install defaults like localdomain still lying around
on clients that will never work (unless of course all your servers are setup for
localdomain and you set that as the home network domain in dnsmasq).

So, first fix the server running dnsmasq, a unique client setup

This is what you want on the server running dnsmasq, not for clients in general. The main difference is that the server(s) running dnsmasq must only reference themselves and upstream DNS resolvers, never any other dnsmasq servers you may be running for the same domain, or you will be in an endless loop as they refer to each other while trying to resolve a mistyped host name.

The assumption in the examples is that the interface name is eth0 and your dns server
ip address is 192.168.1.181 with the next upstream server an internet (or router) dns server.
The next upstream server is required as in many cases you will want to resolve
hostnames/URLs out in the big wide world also.

RHEL has pretty much dropped /etc/sysconfig/network-scripts and you must use
NetworkManager; NetworkManager is on most Debian servers as well although you
can still use the older network/interfaces.d files there

NetworkManager: Assuming eth0 is your interface name and we use the example myhomenet.org.

nmcli conn modify eth0 ipv4.dns-search myhomenet.org
nmcli conn modify eth0 ipv4.dns 192.168.1.181,192.168.1.1
systemctl restart NetworkManager   # applies changes to resolv.conf, no connection drop

For debian static manual interfaces in /etc/network/interfaces.d

auto eth0
iface eth0 inet static
      address 192.168.1.181
      netmask 255.255.0.0
      gateway 192.168.1.1
      dns-nameserver 192.168.1.181
      dns-nameserver 192.168.1.1
      dns-search myhomenet.org
      dns-domain myhomenet.org

When the dnsmasq server is correct, update/correct all your clients

It goes without saying that to avoid doing this often, always use the correct domain name
and dns list when building a new VM or server so it will never need to be done on those.

It is important to note a few things here about clients in general that do not run
any copies of dnsmasq themselves.

  1. they can refer to multiple dnsmasq servers on your home network so they can resolve names if one is down
  2. while you could also include an upstream DNS server, that will probably stop things working correctly again; you should only search your home network dnsmasq servers

The second point above is an important one; you will remember we configured dnsmasq to use strict-order to always check the home network dnsmasq instance first.

Guess what, clients are also unlikely to use a strict order. Depending upon what version of operating system you are running, lookup operations by the client are quite likely to round-robin through the nameserver list rather than use strict order, so they can quite easily query the upstream server instead of your dnsmasq server; this is something else that can cause dns lookups of servers in your home network to sometimes work and sometimes not.

The solution I use is to have two dnsmasq servers so one is always available,
all clients only use those two for all name resolution (yes, including external)
and no client has any internet dns resolver address configured relying on the
dnsmasq server instead.

As the dnsmasq servers are configured with an upstream dns, for any external host name they are unable to resolve themselves (ie: google.com) the dnsmasq servers will query the external DNS and return the correct ip-address for the name to the client requesting it. The client does not need an external dns entry, and not having one avoids lookup problems if a client round-robins through name servers.

Remember this is just for the dns name resolution, once the client has the ip-address
it will cache it and all traffic is from the client direct to the ip-address (unless you go
through a proxy of course).

Based on the examples above, with a single server providing dnsmasq
and that dnsmasq server querying any upstream dns resolvers on behalf of the client(s),
a client configuration would look like the below.
If you have a second dnsmasq server just add it to the dns ip list,
as long as the two dnsmasq servers do not reference each other in any way, to avoid
the endless lookup loop situation.

# NetworkManager
nmcli conn modify eth0 ipv4.dns-search myhomenet.org
nmcli conn modify eth0 ipv4.dns 192.168.1.181
systemctl restart NetworkManager   # applies changes to resolv.conf, no connection drop

For Debian static manual interfaces in /etc/network/interfaces.d

# Debian /etc/network/interfaces.d files (xxx is the client ipaddr)
auto eth0
iface eth0 inet static
      address 192.168.1.xxx
      netmask 255.255.0.0
      gateway 192.168.1.1
      dns-nameserver 192.168.1.181
      dns-search myhomenet.org
      dns-domain myhomenet.org

Congratulations, it now of course works

You can lookup your internal servers from any client by either short or FQDN names
using your home network domain without
any issue; and internet names are still resolvable for all your clients.
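
For example, assuming a host called jellyfin is listed in the hosts file your dnsmasq servers read (the name is just illustrative), both the short and fully qualified names should now resolve from any client:

nslookup jellyfin
nslookup jellyfin.myhomenet.org
dig +short jellyfin.myhomenet.org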

Troubleshooting

  • “dig” works but “nslookup” does not. You have misconfigured it
  • to check a nameserver use “nslookup hostnametolookup dnsserveripaddr”;
    if querying the dns server explicitly by dnsserveripaddr returns the
    results expected there is nothing wrong with the server, the issue is your nameserver search order (see the example after this list)
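
A concrete example of querying the dnsmasq server directly, bypassing the client resolver order (the hostname is illustrative):

nslookup jellyfin 192.168.1.181
dig @192.168.1.181 jellyfin.myhomenet.org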

You could of course manually edit /etc/resolv.conf to correct the search and nameserver
entries for testing, but those changes would be lost pretty quickly.

For search this may be an option for testing.
(Ref: https://man7.org/linux/man-pages/man5/resolv.conf.5.html)
The search keyword of a system’s resolv.conf file can be
overridden on a per-process basis by setting the environment
variable LOCALDOMAIN to a space-separated list of search domains.

That depends on the client implementing it of course.
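
Where it is implemented it can be handy for a quick test, something like the below (a sketch; it only affects the single command it prefixes and only if the resolver honours LOCALDOMAIN):

LOCALDOMAIN="myhomenet.org" getent hosts shorthostname
LOCALDOMAIN="myhomenet.org" ping -c 1 shorthostname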

The big question of course is why you would want to

So why would you want to run a home network DNS server?

The main reasons for needing a DNS server for your home network would be
that you have a lot of servers or VMs and trying to keep their hosts files
all synchronised is becoming too much effort; you want to be able to just
edit a single file and have all your servers able to use it.

This would only be an issue if you did not have any sort of deployment
infrastructure like ansible/chef/puppet that could deploy a “template”
hosts file to all your servers from a single source file; and yes I do
mean a template file, not a static hosts file, as each hosts file would have to
be correctly set with the ip and hostname of the server it is being deployed to
as so many things depend on that.

Now suppose you did run two dnsmasq servers, without a deployment tool
to push a central edited hosts file to both servers and restart dnsmasq
you are already editing two hosts files for every change now. Still a lot
less than the effort of doing so on every server but they could also get out
of sync if manual edits on each are required.

You should (if you have enough headroom for another small VM on a different
physical server) run two copies; if you followed the post I have made here you
will have all your clients now doing internet name resolution via your dnsmasq
servers’ upstream queries, so if you have a single dns server and it stops
you have lost name resolution not only to your servers but to the internet
(which is not a disaster, you just edit /etc/resolv.conf to insert a nameserver
line for your router or maybe one of googles nameservers to get internet access
back for that client so you can at least watch youtube while your dnsmasq server reboots).

Using something like puppet-ce or ansible would let you deploy a ‘source’
hosts file to both dnsmasq servers so you only need to edit the file in one place,
however you could also go wild and use them to deploy the hosts file to all
your servers, negating the need for a home network dns server at all… the drawback with
the latter of course being that anyone who can see your /etc/hosts file would then
know every machine in your network; best to have it on as few machines as possible.

Deployment tools have a learning curve you may not be interested in, so for
a home network dns setup I would say just run two dnsmasq servers and on only
those two servers have a rsync job that runs occasionally to check for an
updated hosts file on whatever server you want to make the edits on (a sketch of such a job is below). Or if you
only want to run one then there is only one file to edit so no syncing needed at all.
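
A minimal sketch of such a sync job; the host name edithost, the /srv/dns/hosts path and the addn-hosts file are all assumptions here, adjust to suit your own setup:

#!/bin/bash
# sync_dnsmasq_hosts.sh - run from cron on each dnsmasq server
SRC="edithost:/srv/dns/hosts"      # where the master hosts file is edited (assumption)
DST="/etc/hosts.dnsmasq"           # file referenced by addn-hosts= in dnsmasq.conf (assumption)
CHANGES=$(rsync -ai "${SRC}" "${DST}")
if [ -n "${CHANGES}" ]; then
   # a reload (SIGHUP) makes dnsmasq re-read its hosts files without a full restart
   systemctl reload dnsmasq
fi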

Once you get above 4-5 VMs manually keeping /etc/hosts files up-to-date
on each becomes a nightmare. A home network DNS resolver (or two) becomes
essential.

Hopefully this long winded post has got you past any issues you were having
with setting one up using dnsmasq.

One last note: if the changes above do not result in network or NetworkManager startup correctly setting /etc/resolv.conf then… it could simply be that you do not have NetworkManager/resolved or a similar service installed that updates it; in which case simply vi/nano the resolv.conf file and set the values you want. That took me ages to work out when a new VM refused to correctly set resolv.conf, until a check on the last modified time showed nothing was updating it on a reboot at all. So a new first step: see if the file is being updated on boot and if not just manually edit it.

Posted in Unix | Comments Off on Setting up a DNS (dnsmasq) server for your home network

I have upgraded a few machines from Alma8.7 to 9.2 – my notes

I saw Alma had 9.2 available this month so decided to upgrade my 8.7 versions.

My old sledgehammer method of “dnf distro-sync --releasever=nn” did not work for this. It may have if I had spent months resolving conflicting packages but there is an easier way.

This post at https://www.layerstack.com/resources/tutorials/How-to-upgrade-from-AlmaLinux8-to-AlmaLinux9 covered all the key steps. It has a few unnecessary steps and omitted a few issues you will hit that I have added here.

Look at the things to watch out for at the end of this post before deciding if you want to actually do this.

Things not mentioned in that post are that you have to clean up any old gpg-keys stored in rpm, I also had to remove one conflicting package (even though pre-upgrade found no issues), and the biggee was finding that the install had inserted a grub command line option to set selinux to permissive even if /etc/selinux/config was set to enforcing. That last one may resolve itself the next time you get a new kernel version, but if you are using enforcing you probably do not want to wait that long.

Warning: do not do the initial step to fully dnf update your installed system. I did that on one of my systems and it upgraded it from Alma8.7 to Alma8.8; the “leapp” (as of 21May2023) only supports upgrades from 8.7 and flags 8.8 as unsupported/inhibitor for upgrade to 9.2. They may have fixed that by the time you read this of course.

Another thing of note which I have repeated in the “Things to watch out for” list is that when it reboots to do the upgrade if you have a console session you will see it drops into emergency recovery mode with the dracut prompt, do not as I first did start issuing commands to find out what has gone wrong… just wait, the upgrade wants to be in that place.

And a critical warning: the legacy network configuration method of using config files in /etc/sysconfig/network-scripts, deprecated in rhel 8, is now fully removed in rhel 9; if you use that method to configure the network your 9.x machine will boot without any networking configured. So you must fully convert your machine to use only NetworkManager before upgrading (and yes, I did ignore that and ended up with a machine with no network configured, but fortunately with a console). HOWEVER on one server I upgraded using network-scripts the network was not configured after the upgrade, while on another el9 the legacy scripts did configure the network successfully; so I don’t know what is going on there.

Then the initial steps to upgrade are as below.

# First "vi /etc/firewalld/firewalld.conf" and set "AllowZoneDrifing" off
setenforce 0
curl https://repo.almalinux.org/elevate/testing/elevate-testing.repo -o /etc/yum.repos.d/elevate-testing.repo
rpm --import https://repo.almalinux.org/elevate/RPM-GPG-KEY-ELevate
yum install -y leapp-upgrade leapp-data-almalinux
leapp preupgrade
# READ THE REPORT AND FIX ANY ISSUES AND REPEAT preupgrade ABOVE UNTIL NO INHIBITORS
leapp upgrade
grub2-install /dev/sda   # << whatever your disk is (see Note1 below)
reboot    # DO NOT CUT/PASTE THIS LINE, you will want to check for errors first

Note1: The report produced in the preupgrade and upgrade steps highlights that it will not upgrade grub on legacy BIOS boot machines (ie: any VM or cloud image) and needs the grub2-install run. My VM boot partition was /dev/sda1 but that was an ext4 filesystem grub2-install refused to write to (or may have written to, as it issued lots of warnings rather than errors) so I used /dev/sda which seems to have worked (now also done for a few /dev/vda so that seems the correct approach).

After rebooting check the version in /etc/os-release is indeed 9.2 (or later if you are reading this long after I typed it in). It is important to note that the upgrade is not completed at the time of that last reboot. If you have a console session you will see that the "leapp" app is still performing post-install activities for quite a while after it has rebooted and allowed ssh logons.
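
A quick way to check, for example:

grep -E '^(NAME|VERSION_ID)=' /etc/os-release    # should now report version 9.2 (or later)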

The upgrade does not upgrade anything from third party repos so you need to do that yourself after the upgrade and do another relabel.

dnf -y update
touch /.autorelabel
reboot

Things to watch out for

  • At 21May2023 if you have upgraded to Alma 8.8 (done if you have done a "dnf update" lately) you cannot upgrade... not until they update the "leapp" script anyway, as the last version it supports for upgrade is 8.7
  • After the upgrade you must run another dnf update as nothing from third party repos is upgraded by "leapp" (probably a wise design decision as they cannot all be tested).
  • Not all third party repositories have packages (or even repos) for 9.x yet. DNF behaviour seems to have changed to abort if a repo cannot be reached rather than skipping it and moving onto the next, so unless you are familiar with the --disablerepo flag life can be painful here
  • Look through the preupgrade report and fix as many issues as you can, do not proceed until you have resolved all inhibitors
  • You must convert any legacy network config in /etc/sysconfig/network-scripts to a NetworkManager config as legacy network configuration is not supported at all in rhel 9. You can try "nmcli connection migrate" to attempt to convert your configurations but no guarantees if it is a complicated one.
  • you will find that the upgrade has inserted an enforcing=0 into your grub command line so even if /etc/selinux/config is set to enforcing it will be ignored and you will be in permissive mode. That may fix itself on the next kernel update that updates grub but you may want to fix it yourself (see the grubby example after this list).
  • As noted above you have to manually run another update to update from 3rd party repositories.
  • If you have any SHA1 keys stored in RPM the upgrade will be inhibited until you remove them.
  • IPTables has been deprecated and while the upgrade still leaves it available for now, some packages that used to use iptables no longer have versions with iptables support (for example fail2ban was removed from my servers during the upgrade as only firewalld versions are now available in the repos).
  • If upgrading a machine (or VM) you have had for a while you may see in the preupgrade report issues like "OLD PV header", in which case you must update that before doing the upgrade. That is simply "vgck --updatemetadata YourPVNameHere"
  • And some absolutely horrible things to ignore: if like me you have a console available when you do the reboot for the upgrade you will see truly horrible messages about the system dropping into emergency recovery mode with journal errors. You have plenty of time to start troubleshooting as it sits in recovery mode and you can happily enter commands, don't. After 2-3 minutes you will see "leapp" messages start to appear on the console as it starts upgrading.
  • On the journal: if you have a journalled filesystem you will see, after the system is stable, a lot more journal rotation messages logged, which is annoying. If like me you have lots of nice automation rules to manage rotating and cleaning journals they will have to be revisited.
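
On the selinux point above, a sketch of checking for and removing the inserted option using grubby (the RHEL-family tool for editing kernel boot arguments):

# see whether enforcing=0 is on the current and configured command lines
cat /proc/cmdline
grubby --info=ALL | grep -i enforcing

# remove it from all boot entries, then reboot when convenient
grubby --update-kernel=ALL --remove-args="enforcing=0"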

Tips: the SHA1 keys

You can display the gpg-keys in RPM with

rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n'

You can use the standard RPM commands to display details of each key, for example

rpm -qi gpg-pubkey-4bd6ec30-4c37bb40

You can use the output to clean up old keys, ie: if the install date was a while ago it may not be needed. If you have upgraded from other OS'es you may have a lot of old keys, for example I had gpg-pubkeys for multiple versions of Fedora and CentOS (I migrated from Fedora to CentOS, then to Alma, and all the old gpg-pubkey entries were still there).
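
As the install date is the useful clue, a variation of the earlier query that includes it may help (these are standard rpm query tags):

rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{INSTALLTIME:date}\t%{SUMMARY}\n'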

And the standard RPM command to remove/erase old keys

rpm -e gpg-pubkey-4bd6ec30-4c37bb40

If you have SHA1 signed packages you absolutely must keep, you can before the preupgrade and upgrade use "update-crypto-policies --set LEGACY"; just remember after you have finished the upgrade to use "update-crypto-policies --set DEFAULT" to set it back to the default. You will have to do that flip-flop every time you want to do anything with packages/repos using SHA1 keys.
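
In sequence that flip-flop looks like the below (a sketch only):

update-crypto-policies --set LEGACY     # before leapp preupgrade / leapp upgrade
# ... run the preupgrade/upgrade steps ...
update-crypto-policies --set DEFAULT    # once the upgrade is complete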

Posted in Unix | Comments Off on I have upgraded a few machines from Alma8.7 to 9.2 – my notes

Been a long while since I posted anything, the real world has been keeping me too busy. But here are a few things that have been irritating me lately.

Will hackers ever go away

There was an article on the register with a link to known Russian hacktivists Killnet open proxy addresses. Reference https://www.theregister.com/2023/02/06/killnet_proxy_ip_list/

I downloaded the list and compared the ip-addrs in it against the 15530 entries my automation rules have added to my blacklist filter since I last rolled it on 17 Aug 2022. There were 16 matches. Mainly just trying “/wp-login.php?action=register” or running “masscan”.

The masscan is on github at the public repo https://github.com/robertdavidgraham/masscan and can be used to scan the entire internet and warns it can cause denial of service if used for ipv6 addresses. Looks like hacker kiddies have taken to playing with it as-is rather than trying anything original.

The list referred to by the register site has around 16,000 entries as it is rather specifically targeted at the open proxies used by the russian killnet hacking team. Are they a worry? No, I have 15514 additional unique ip-addresses that have tried to hack into my server in less than a year and this is a personal web server that gets almost no real traffic, so there are a lot more hackers to worry about than that one group (more in my list now, a few more realtime ones added and a couple I manually added as snort tells me there are still losers out there trying the log4j hack).

And a Docker update broke docker

The below used to work in daemon.json for docker, it stopped working after my last update and prevented docker starting.

{
  "insecure-registries" : [ "docker-local:5000" ]
}
{
  "dns": ["192.168.1.179", "192.168.1.1", "8.8.8.8"]
}

No biggee, apart from the fact I did not notice for a while. The fix is to just edit the file to be as below.

{
  "insecure-registries" : [ "docker-local:5000" ],
  "dns": ["192.168.1.179", "192.168.1.1", "8.8.8.8"]
}
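
Since the failure mode is docker refusing to start on a malformed daemon.json, a quick validation before restarting can save the head scratching (assuming python3 is on the host; jq would do the same job):

python3 -m json.tool /etc/docker/daemon.json && systemctl restart docker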

Yes I do run a local insecure registry for internal use. It is only exposed externally via a https proxy so not an issue.

Bloody Java

Interesting article on the register at https://www.theregister.com/2023/01/27/oracle_java_licensing_change/

Many of you will already have come across discussions on those licensing changes on forums or technology websites.

Oracle bless them are short of money again, and are changing the licensing terms from “per user” to “per employee”, so if you have one Java server and two Java developers you are probably not paying much at the moment; if your company has 100 employees your license costs have just gone through the roof.

To avoid Oracle Java on the server side is probably not too difficult. For a Java EE engine I personally try to stick to alternatives to anything from commercial vendors such as Oracle and IBM and use Jetty as the Java EE server where possible as it has a very tiny footprint. There is also the opensource Glassfish Java EE server or Apache Tomcat Java EE server for those wanting a heavier footprint server [note: with Eclipse and Apache licenses respectively (not GPL)]. It should be noted there is at least one company providing commercial support for Glassfish (according to Wikipedia) but as a general rule you are on your own with opensource although it generally just works.

IDEs and compilers on the other hand may be an issue. There are a lot of IDEs out there that compile Java code (a useful list at https://blog.hubspot.com/website/best-java-compiler) but it is hard to determine what they use in the back-end to actually compile Java code.

It is most likely, but no guarantees, that the Java compiler packages used by most Linux systems are not provided by or directly based on anything by Oracle; that would generally be OpenJDK (jdk.java.net) and OpenJFX (https://gluonhq.com/products/javafx/) [note: OpenJDK seems to at some point feed into JavaSE non-free from Oracle (noted on the download pages on jdk.java.net that it is also available as an Oracle commercial build), not sure or care how that works but it may become an issue at some point].

Windows users will most likely have by default Oracle supplied Java runtimes and backend compilers.

I guess the key thing to note from the licensing changes is that you should avoid Oracle software because it will eventually sting you.

Posted in Home Life | Comments Off on

When puppetserver master CA expires

One issue with puppetserver CE is: the damn CA and key expire

OK. That is obviously a good idea; but it is a real pain to sort out.

Everything here is for puppetserver version 7.8.0 and puppet agents at versions 7.17.0, 6.26.0, 5.5.20 (alma and debian11, rocky and fedora33 respectively).

Some of the issues I had to get around were that Debian and RHEL family servers seem to have the certs (including copies of the expired ones) in different places and I have one server with a different version of the puppet agent that… you guessed it, has them in yet another place. Throw in a couple of servers where the puppet agent is downversion at 5.x and the locations change again.

Of course the major issue for most people is that it is very difficult to find how to rebuild a new CA and puppetserver certificate; lots of pointless google hits before I found the solutions.

Oh and the biggest issue was it took a while to determine what the problem was; the first error was simply the message “Error: Could not run: stack level too deep” from a ‘puppet agent --test’ request; for those reading this post from searching on the error message it probably means your puppetserver CA cert has finally expired, which I did not find at all obvious from that error message.

Anyway, agents cache the expired certificate from puppetserver in different places depending on OS and puppet agent version, likewise the agent keys. I could have made a smarter playbook to use ‘puppet config print | grep -i ssldir’ on all the servers; but to hell with that complexity. If a ‘rm -rf’ is done on a directory that does not exist it does no harm so I just chose to swat every possible directory… because I did not want to do it manually as I had 10 VMs to sort out (and you may have more).

Fortunately I had used puppet earlier to deploy ansible, so all servers with a puppet agent had the userid, ssh keys, and my extremely restricted sudoers.d file for ansible deployed already; so I could use that to sort out all my servers (although I will have to revisit the restrictions as the ‘rm’ paths are not as tightly locked down as I thought).

As I had to clean-up multiple servers it was easiest to do it using ansible (actually it wasn’t; I probably spent longer getting the playbooks working than it would have taken to do it manually on each server, but next time it will just be a few commands).

Basically for the cleanup to work all puppet agents must be stopped, if even one is left running it could post a cert request that would stop a new puppetserver CA from being created.

So I have used three playbooks, one to stop all puppet agents (and puppetserver when that host is in the inventory) and delete all agent certs, the second to stop puppetserver and delete all certs it knows about plus create a new CA and certificate, and the third to restart the agents. If you (correctly) do not have autosign configured you will need to manually sign the cert requests from the agents.

But if you have the issue described here, and need to regenerate the CA and certs, even if you do not use ansible you can pull the commands required from the three playbooks here… just remember that before running the commands in the second playbook ALL agents on all servers that run puppet agents must be stopped.

The shell script I use to run the playbooks showing the correct order

ansible-playbook -i ./hosts --limit always_up ./wipe_puppet_certs_part1.yml
ansible-playbook -i ./hosts --limit puppet ./wipe_puppet_certs_part2.yml
ansible-playbook -i ./hosts --limit always_up ./wipe_puppet_certs_part3.yml

Playbook 1 – stop agents and erase their certs

---
- name: Stop all puppet agents and wipe their certificates
  hosts: all
  vars:
    puppet_master: "puppet"
  tasks:
    - name: Stop puppet master if puppetserver host
      become: "yes"
      command: "systemctl stop puppetserver"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Stop puppet agent
      become: "yes"
      command: "systemctl stop puppet"
      ignore_errors: yes

      # SCREAM the below does not delete the files on all agent servers, no bloody idea why
      # manually stopping puppet, copy/paste the rm command, start puppet; and it's all ok
      # but the entire point is not to do it manually
      # The issue is the below will not work
      #      /bin/rm -rf /etc/puppetlabs/puppet/ssl/*
      # The below will work; but have to rely on puppet to recreate the directory
      #      /bin/rm -rf /etc/puppetlabs/puppet/ssl
      # Ansible must do some nasty expansion that screws it up with the /*.
    - name: Delete puppet agent certs dir 1
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppet/ssl"
      ignore_errors: yes

    - name: Delete puppet agent certs dir 2
      become: "yes"
      command: "/bin/rm -rf /var/lib/puppet/ssl"
      ignore_errors: yes

    - name: Delete puppet agent certs dir 3
      become: "yes"
      command: "/bin/rm -rf /etc/puppet/ssl"
      ignore_errors: yes

    - name: Delete puppet agent certs dir 4
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppetserver/ca"
      ignore_errors: yes

Playbook 2 – on puppetserver host only stop puppetserver, erase existing certs, create new ones, start puppetserver. Use your domain name in the alt-name.

---
- name: Force recreation of puppet master CA
  hosts: all
  vars:
    puppet_master: "puppet"
  tasks:
    - name: Stop puppet master
      become: "yes"
      command: "systemctl stop puppetserver" 
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Erase puppetserver certs on puppet master
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppetserver/ca"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Erase any local agent certs on puppet master
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppet/ssl"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Create new puppet master CA
      become: "yes"
      command: "/opt/puppetlabs/bin/puppetserver ca setup"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Create new puppet master certificate
      become: "yes"
      command: "/opt/puppetlabs/bin/puppetserver ca generate --certname puppet --subject-alt-names puppet.yourdomain.org --ca-client"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Start puppet master
      become: "yes"
      command: "systemctl start puppetserver" 
      when: inventory_hostname == puppet_master
      ignore_errors: yes

Playbook 3 – start agents, they will generate new certs and signing requests

---
- name: Start all puppet agents
  hosts: all
  vars:
  tasks:
    - name: Start puppet agent 
      become: "yes"
      command: "systemctl start puppet"
      ignore_errors: yes

Then if you are not using autosign (which you should not be) use on the puppetserver host ‘puppetserver ca list’ and ‘puppetserver ca sign --certname xxxx’ to sign the cert requests from the agents.
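
For example (the agent name shown is just illustrative):

puppetserver ca list                                  # shows pending certificate requests
puppetserver ca sign --certname agent1.myhomenet.org
puppetserver ca sign --all                            # only if you are happy to sign everything pending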

And some additional notes for V5.x agents

There is an additional step if you have any agents in the 5.x (and possibly 6.x) range. In puppetserver version 7 the certificates are chained and version 5.x agents cannot handle that, they only retrieve the first cert in the chain and cannot authenticate it. Documented at https://puppet.com/docs/puppetserver/5.3/intermediate_ca_configuration.html but in the simplest terms you must copy the entire CA certificate to each 5.x version puppet agent manually. You must also set certificate_revocation to ‘leaf’ or you will still get lots of certificate verification failed errors.

Puppet is setup so that old keys are cached, so old agents were able to update their personal server keys and keep working until now, but we have just recreated all the keys so the new keys have to be copied to the older version servers and they need to be configured not to do a full chain check that they can never complete.

Ideally the cert would be copied from the puppetserver machine, but all V7 agents seem to retrieve the entire certificate so if your ansible host is running with a recent puppet agent version the below playbook will work to get those old V5.x agents working again. It is basically just the steps from the webpage document reference above put into a playbook so I don’t have to do it manually on all servers. Note: you may need to reply ‘y’ to fingerprint prompts for the scp step as ansible likes to use sftp rather than scp (as I ran the first three on all servers with no issue but still got a prompt for one of mine when it ran the scp in this playbook).

---
- name: Copy CA keys to old version 5 agents
  hosts: oldhost1,oldhost2
  vars:
    user: ansible
  tasks:
    - name: Copy new CA key to V5.2 puppet agents
      local_action: "command scp /etc/puppetlabs/puppet/ssl/certs/ca.pem {{user}}@{{inventory_hostname}}:/var/tmp/ca.pem"
      ignore_errors: yes
    - name: Install key on V5.2 puppet agents
      become: "yes"
      command: "/bin/mv /var/tmp/ca.pem /etc/puppet/ssl/certs/ca.pem"
      ignore_errors: yes
    - name: Alter cert revocation handling
      become: "yes"
      command: "puppet config set --section main certificate_revocation leaf"
      ignore_errors: yes
    - name: Restart puppet agent 
      become: "yes"
      command: "systemctl restart puppet"
      ignore_errors: yes

I have a few extra lines in the bash file I use to run the playbooks, just to be absolutely sure I only hit the servers that are 5.x for that last additional playbook.

ansible-playbook -i ./hosts --limit oldhost1 ./wipe_puppet_certs_part4.yml
ansible-playbook -i ./hosts --limit oldhost2 ./wipe_puppet_certs_part4.yml

And that's it. Everything should be working again.

Posted in Automation | Comments Off on When puppetserver master CA expires

Writing Ansible modules using bash instead of python

If any of you have been paying attention I have lately been looking into ansible.

First a disclaimer, I use ‘puppetserver’ and puppet agents since, after they moved away from Ruby to their own scripting language which is pretty much plain english, it is incredibly easy to configure. If / elsif / else syntax means I can have a simple config for say ‘desktop’ that configures a desktop selecting appropriate packages for Ubuntu/Debian/Fedora/CentOS/Rocky and for specific versions (ie: CentOS8 is missing packages that were in CentOS7, Debian uses completely different package names etc.). And puppet has a fantastic templating feature that maybe one day in the future ansible will be able to match.

Ansible, with the playbook parsing json responses from servers, can have the playbook configured to run separate tasks depending on the results returned in previous steps, but yaml files are not plain english and it doesn’t really support targets such as ‘configure desktop regardless of OS’; at the moment you are better off having a separate set of playbooks per OS type… or more simply it is not as readable or manageable yet.

Ansible is also primarily targeted at (has builtin/supplied modules for) Linux servers although it supports windoze as well; the main issue with ansible is that most of the modules are written in python(3). In the Linux world that is not really an issue as it is almost impossible to have a Linux machine without python. On RHEL based systems it is so tightly integrated it is impossible to remove (it’s needed by systemd, firewall, dnf etc.); fortunately even though Debian(11) installs it by default it is possible to remove it on that OS so I was able to test the examples here on a machine without python installed; although of course the tests were against Linux machines.

The advantage of ansible is that there are many operating systems that support some variant of ssh/scp/sftp even if they do not support python (yes people, there are operating systems that do not and probably never will have python ported to them; some I work on) and ansible allows modules to be written in any language so modules can in theory be written in any scripting language a target machine supports.

The basic function of an ansible playbook ‘task’ seems to be to use sftp to copy the module for a task from its local repository or filesystem to the target machine, also sftp a file containing all the data parameters in the playbook to the target machine, and run the script with the datafile of parameters as argument one for the module(script); at the end of the task it deletes the two files it copied to the target.

What I am not sure of is how it triggers the module to run, on *nix it probably uses ‘sh’, windoze powershell; but can it be configured to run other interpreters such as rexx/clist/tacl(or gtacl) etc. If it can then it can poke its gaping wide security hole into and manage every machine that exists in theory.

By security hole I of course just mean that it needs god-like access from ssh keys alone (you do not want 2FA for every step in a playbook) and despite decades of advice on keeping private ssh keys secure, inadvertently published ones still keep popping up on forums that intend you no good; and ‘sudo’ does not care about where you connected from, so anybody with that private key and access to your network is god on your infrastructure; of course you could be paranoid and have a separate key pair per server but if you have hundreds of them that is a maintenance nightmare.

Anyway, my interest in ansible is primarily in if it can easily manage machines that will never have python installed on them. It is also fair to say that if I am to write anything extremely complicated and possibly destructive I would prefer to do it in bash rather than in a language like python I am not really familiar with yet.

As such I have no need of python modules and need to try to avoid letting any python modules be deployed on machines I am testing against. As noted above I managed to remove python from a Debian11 server so am able to test against that.

As all my lab servers are currently *nix servers I don’t really have the opportunity to test against all the different systems I would like to see if ansible works on them (although I might try and get netrexx or oorexx as targets on a few Linux servers).

This post is about how to write ansible modules using the bash shell rather than python.

It is also worth noting where to place your custom modules so they can be located by ansible. If on a Linux machine using ‘ansible-playbook’ from the command line for testing it is easiest to simply create a directory called ‘library’ under the directory your playbook is in, for testing that would simply be your working directory and they will be found in there. You can also use the environment variable ANSIBLE_LIBRARY to specify a list of locations if you want a module available to all your playbooks. Note: there are many other ansible environment variables that can be useful, refer to the ansible docs.
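
For example (the path is just illustrative):

# make modules in a shared directory visible to every playbook run from this shell
export ANSIBLE_LIBRARY=/home/ansible/custom_modules
ansible-playbook -i hosts my_example_module.yml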

You should also note that the ‘ansible-doc’ command only works against python modules, and while it is recommended that you write the documentation in a Python file adjacent to the module file… don’t do that or the ansible-playbook command will try and run the .py file instead of the .sh file. Just ensure it is well documented in your site specific documentation.

While the example module shown in this post may seem a little complicated one thing you must note is that by default an ansible task will run a ‘gather_facts’ step on the target machine to populate variables such as OS family, OS version, hostname, and basically everything about the target machine. That is done by a python module so is not possible on target machines without python, so the example here sets ‘gather_facts: no’ and obtains what it needs in the module itself as well as returning the information to the playbook for use.

It’s also a little more complicated than it needs to be in that I was curious as to how variables from the playbook would be passed if indented in the playbook under another value; they are passed as json if imbedded, for example

Task entries
  dest: /some/dir
  state: present

Results in the file passed as arg1 to the module on the target machine containing
  dest="/some/dir"
  state="present"

Task entries
  parms:
     dest: /some/dir
     state: present

Results in the file passed as arg1 to the module on the target machine containing
  parms="{'dest': '/tmp/hello', 'state': 'present'}"

Knowing in what format the data values are going to be provided to your script is a rather important thing to know :-). In a bash or sh script you can set the variables simply using “source $1” but you do need to know the values will be in different formats depending on indent level. My example script here will handle both of those above examples but not any further levels of indentation. There will be command line json parsers that could help on Linux but remembering I’m curious about non-linux servers I need the scripts to do everything themselves.

For my hosts file I included a ‘rocky8’, ‘debian11’, and ‘fedora32’ server. The bash example returned the expected results from all of them.

It should also be noted that the test against server name in the example playbook step always returns ‘changed’ as the ‘dummy_file.yml’ uses ‘local_action’ which seems to always return changed as true regardless of what we stuff in stdout so the command function must return its own data fields. Where only the bash module is used we control that value.

But enough rambling. ‘cd’ to a working directory, ‘mkdir library’.

Create a file my_example_module.yml and paste in the below.

# Demo playbook to run my bash ansible module passing
# parms in 'dest' and 'state' in both ways permitted.
# 
# Note: the last example also shows how to take action based on the returned result,
# in this case pull in a yml file containing a list of additional tasks for a specific
# hostname (in my test env the ansible 'hosts' file for the test had ip-addrs only
# so the hostname test is against the returned values).
# While ansible has builtins for hostname and OS info tests such as
#    when: ansible_facts['os_family'] == "RedHat" and ansible_facts['lsb']['major_release'] | int >= 6
# that is only useful for target hosts that have python installed and you use the default 'gather_facts: yes',
# using a bash module implies the target does not have python so we use 'gather_facts: no' so we have
# to do our own tests; and it is a useful example anyway :-).
--- 
- hosts: all
  gather_facts: no
  tasks:
  - name: test direct parms
    become: false
    my_example_module:
      dest: /tmp/hello
      state: present
    register: result
  - debug: var=result
  - name: testing imbedded parms 
    become: false
    my_example_module:
      parms:
        dest: /tmp/hello
        state: present
    register: result 
  - debug: var=result
  - set_fact: target_host="{{ result.hostinfo[0].hostname}}" 
  - include: dummy_file.yml
    when: target_host == 'nagios2'
    ignore_errors: true

Change the hostname in the target_host test condition near the end of the above file to one of your hostnames, it is an example of running a tasks file for a single host and will be skipped if no hostnames match.

Create a file dummy_file.yml (used by the hostname test step) containing the below

  - name: A dummy task to test it is triggered
    local_action: "command echo '{\"changed\": false, \"msg\": \"Dummy task run\"'"
    register: result
  - debug: var=result

Create a file library/my_example_module.sh and paste in the below

#!/bin/bash 
# =====================================================================================================
#
# my_example_module.sh     - example bash ansible module 'my_example_module'
#
# Description:
#   Demonstration of writing a ansible module using the bash shell
#   (1) handles two parameters passed to it (dest, state) passed at either the
#       top level or indented under a parms: field
#   (2) returns useful host OS details (from os-release) as these are useful
#       for logic branching (ie: centos7 and centos8 have different packages
#       available so you need to know what targets modules run on).
#       ---obviously, this method is only useful for linux target hosts
#       Ansible functions such as
#            when: ansible_facts['os_family'] == "RedHat" and ansible_facts['lsb']['major_release'] | int >= 6
#       are obviously not useful for targets that do not run python where we must
#       set gather_facts: no
#
#   Items of note...
#     * all values in the playbook are passed as var=value in a file sent to
#       the remote host and passed as an argument to the script as $1 so script
#       variables can be set withs a simple 'source' command from $1
#     * data values placed immediately after the module name can be used 'as-is'
#       (see example 1) 
#           dest=/tmp/hello
#           state=present
#       however if values are imbedded they will be passed as a
#       JSON string which needs to be parsed
#       (see example 2 where values are placed under a 'parms:' tag)
#           parms='{'"'"'dest'"'"': '"'"'/tmp/hello'"'"', '"'"'state'"'"': '"'"'present'"'"'}'
#       which after the 'source' command sets the parms variable to
#           {'dest': '/tmp/hello', 'state': 'present'}
#       which needs to be parsed to extract the variables
#     * so, don't imbed more than needed for playbook readability or your script will be messy
#
#     * the 'failed' variable indicates to the ansible caller if the script has failed
#       or not so should be set by the script (failed value in ansible completion display)
#     * the 'changed' variable indicates to the ansible caller if the script has changed
#       anything on the host so should be set by the script (the changed value in ansible
#       completion display). You should set that if your module changes anything, but
#       this example has it hard coded as false in the response as the script changes nothing
#     * oh yes, the output/response must be a JSON string, if you have trouble with your
#       outputs try locating the error with https://jsonlint.com/
#
# Usage/Testing
#    Under your currect working directory 'mkdir library' and place your script
#    in there as (for this example) 'my_example_module.sh'.
#    Then as long as the ansible-playbook command is run from your working
#    directory the playbook will magically find and run module my_example_module
#    Obviously for 'production' you would have a dedicated playbook directory
#    in one of the normal locations or use an envronment variable to set the
#    ansible library path, but for testing you do want to keep it isolated to
#    your working directory path :-)
#
# Examples of use in a playbook,
#    1st example is vars at top level, 2nd is imbedded under parms:
#    We use 'gather_facts: no' as using bash modules implies that the
#    targets are servers without python installed so that would fail :-)
#
#   --- 
#   - hosts: all
#     gather_facts: no
#     tasks:
#     - name: example 1 test direct parms
#       my_module_example:
#         dest: /tmp/hello
#         state: present
#       register: result
#     - debug: var=result
#     - name: testing imbedded parms 
#       my_module_example: 
#         parms:
#           dest: /tmp/hello
#           state: present
#       register: result 
#     - debug: var=result
#
#
# Example response produced
#  {
#  	"changed": false,
#  	"failed": false,
#  	"msg": "test run, parms were dest=/somedir state=missing",
#  	"hostinfo": [{
#  		"hostname": "hawk",
#  		"osid": "rocky",
#  		"osname": "Rocky Linux",
#  		"osversion": [{
#  			"major": "8",
#  			"minor": "4"
#  		}]
#  	}]
#  }
#
# =====================================================================================================

source $1         # load all the data values passed in the temporary file
failed="false"    # default is that we have not failed

# If data was passed as a "parms:" subgroup it will be in JSON format such as the below
# {'dest': '/tmp/hello', 'state': 'present'}
# So we need to convert it to dest=xx and state=xx to set the variables
# Parsing for variable name as well as value allows them to be passed in any order
if [ "${parms}." != "." ];
then
   isjson=${parms:0:1}             # field name in playbook is parms:
   if [ "${isjson}." == '{.' ]    # If it is in json format will be {'dest': '/tmp/hello', 'state': 'present'}
   then
     f1=`echo "${parms}" | awk -F\' {'print $2'}`
     d1=`echo "${parms}" | awk -F\' {'print $4'}`
     f2=`echo "${parms}" | awk -F\' {'print $6'}`
     d2=`echo "${parms}" | awk -F\' {'print $8'}`
     export ${f1}="${d1}"     # must use 'export' or the value of f1 is treated as a command
     export ${f2}="${d2}"
   else
      failed="true"
      printf '{ "changed": false, "failed": %s, "msg": "*** Invalid parameters ***" }' "${failed}"
      exit 1
   fi
fi
# Else data was passed as direct values so will have been set by the source command, no parsing needed

# You would of course always check all expected data was provided
if [ "${dest}." == "." -o "${state}." == "." ];
then
   failed="true"
   printf '{ "changed": false, "failed": %s, "msg": "*** Missing parameters ***" }' "${failed}"
   exit 1
fi

OSHOST="$(uname -n)"                                          # Get the node name (host name)
if [ -r /etc/os-release ];
then
   # /etc/os-release is expected to have " around the values, we don't check in this
   # example but assume correct and strip them out.
   # In the real world test for all types of quotes or no quotes :-)
   OSID=`grep '^ID=' /etc/os-release | awk -F\= {'print $2'} | sed -e 's/"//g'`     # Get the OS ID (ie: "rocky")
   OSNAME=`grep '^NAME=' /etc/os-release | awk -F\= {'print $2'} | sed -e 's/"//g'` # Get the OS Name (ie: "Rocky Linux")
   osversion=`grep '^VERSION_ID=' /etc/os-release | awk -F\= {'print $2'} | sed -e 's/"//g'` # Get OS Version (ie: "8.4")
   OSVER_MAJOR=`echo "${osversion}" | awk -F. {'print $1'}`
   OSVER_MINOR=`echo "${osversion}" | awk -F. {'print $2'}`
   if [ "${OSVER_MINOR}." == "." ];   # Debian 11 (at least what I run) does't have a minor version
   then
      OSVER_MINOR="0"
   fi
   hostinfo=`printf '{"hostname": "%s", "osid": "%s", "osname": "%s", "osversion": [{"major": "%s","minor": "%s"}]}' \
            "${OSHOST}" "${OSID}" "${OSNAME}" "${OSVER_MAJOR}" "${OSVER_MINOR}"`
else
   hostinfo=`printf '{"hostname": "%s", "osid": "missing", "osname": "missing", "osversion": [{"major": "0","minor": "0"}]}' "${OSHOST}"`
fi

# Return the JSON response string with a bunch of variables we want to pass back
printf '{ "changed": false, "failed": %s, "msg": "test run, parms were dest=%s state=%s", "hostinfo": [%s] }' \
	 "${failed}" "$dest" "${state}" "${hostinfo}"
exit 0

Create a file named hosts and enter a list of the hostnames or ip-addresses you want to test against as below (using your machine ids of course). Note that the python override is required for Debian11 servers, it does no harm using it on the others.

localhost
192.168.1.177
192.168.1.187
#192.168.1.9
#test_host ansible_port=5555 ansible_host=192.168.1.9
[all:vars]
ansible_python_interpreter=/usr/bin/python3

[test_group]
localhost
192.168.1.177

Create a file named TEST.sh, simply because it is easier to run that multiple times than type in the entire command. Place into that file

ansible-playbook -i hosts my_example_module.yml
#ansible-playbook -i hosts my_example_nopython.yml

Yes the last line is commented, you have not created that file yet.

You are ready to go. Simply ‘bash TEST.sh’ and watch it run, you have your first working bash module.

Now, you are probably wondering about the commented example in the TEST.sh file above.

As mentioned I am curious as to how to use ansible to manage servers that do not have python installed, and have been thinking about how to do it.

This last example avoids the default ansible python modules, manually copies across the script/module to be executed, manually creates the data input file to be used, manually runs the ‘/bin/bash’ command to execute it, and then cleans up the files it copied.

While manually using the ‘/bin/bash’ command is overkill for Linux servers where you can just place at the start of the file what script execution program should run the script; it shows how you could in theory use the ‘raw’ function to invoke any script processor on the target machine.

I must point out it’s a badly written example, in that ansible is considered a configuration management tool so must have inbuilt functions to copy files from an ansible server to a managed server, so having a manual ‘scp’ step is probably not necessary; but I am trying to do it with as few inbuilt functions as possible for this example. Also in a managed environment you would probably not scp files from the local server but use curl/wget to pull them from a git repository; but not all operating systems support tools like wget/curl so knowing a manual scp is a way to get files there is useful.

Anyway this example copies exactly the same module as that used in the above example across to the target server, creates a data file of parms, runs the module explicitly specifying /bin/bash as the execution shell, and deletes the two copied files; just as ansible would in the background.

You could take it a lot further, for example not clean up the script file and have a prior step to see if it already existed on the target and skip the copy if it did, useful if you have a huge farm of servers and the files being moved about are large. But all that is beyond the scope of this post.

The playbook to do that is below. You can create the file my_example_nopython.yml, paste the contents below, and uncomment the line in the TEST.sh file to confirm it works. You must of course change the scriptsourcedir value to the working directory you are using, and it must be a full path; and of course change the host ip-addr used to one of your servers.

# Example of using a playbook to simulate what ansible does.
# MUST have 'gather_facts: no' if the target server does not have python3 installed as
# gathering facts is an ansible default module that is of course written in python.
#
# Obviously in the real world you would not copy scripts from a local filesystem but pull them
# from a source repository (but as examples go this you can copy and work on immediately...
# after updating the hosts (and hosts file) and script source location of course
#
# Uses the my demo bash module as the script to run so we must populate a file with the
# two data values it expects, done with a simple echo to a file in this example.
#
# NOTE: if the target server does not have python installed ansible will still happily
#       (if you disabled facts gathering which of course would cause a failure)
#       as part of it's inbuilt processing copy a python module to the target along with
#       a file containing data values and try to run the module (which of course fails);
#       that is the functionality we are duplicating here as an example, as you can
#       easily build on this to make things a lot more complicated :-)
#       (ansible probably uses sftp to copy the files, as the sftp subsystem needs to be enabled)
#       And we of course copy my example module written in bash as we want the demo to work :-)
#
# Why is a playbook like this important ?.
# Many servers that are non-Linux (even non-*nix) support some form of ssh/scp/sftp.
# Using a playbook like this can let you handle the quirks of those systems where supplied
# ansible default modules cannot.
--- 
- hosts: 192.168.1.177
  gather_facts: no
  vars:
    user: ansible
    scriptsourcedir: /home/ansible/testing/library
    scriptname: my_example_module.sh
  tasks:
  - name: copy script to remote host
    local_action: "command scp {{scriptsourcedir}}/{{scriptname}} {{user}}@{{inventory_hostname}}:/var/tmp/{{scriptname}}"
    register: result 
  - debug: var=result
  - name: create remote parm file
    raw: echo 'dest="/some/dir";state="present"' > /var/tmp/{{scriptname}}_data
    become: false
    register: result 
  - name: run remote script
    raw: /bin/bash /var/tmp/{{scriptname}} /var/tmp/{{scriptname}}_data
    become: false
    register: result 
  - debug: var=result
  - name: remove remote script
    raw: /bin/rm /var/tmp/{{scriptname}} /var/tmp/{{scriptname}}_data
    become: false

So, you have seen how to write and test a module in bash, not too complicated after all. There is one important thing you must always remember though. The output of your module must be valid JSON, get a bracket out of place and it will go splat; so two tips

  • if you end up with bad JSON output I find https://jsonlint.com/ is a quick way of finding the problem
  • if you are unsure of what data is being placed in the data value input file by ansible place a ‘sleep 30’ command in the script which gives you time on the target machine to look at the files under ~ansible/.ansible/tmp (replace ~ansible with the userid used on the target machine) and ‘cat’ the files under there to see what values are actually being set, as sketched below
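
A minimal sketch of that debugging trick (the file names under the tmp directory vary per run, so list them rather than relying on exact names):

# temporarily add near the top of the module, remove once done debugging
sleep 30
# ...then during that pause, on the target machine:
#   find ~ansible/.ansible/tmp -type f
#   cat whichever files the find lists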

Enjoy breaking things.

Posted in Automation, Unix | Comments Off on Writing Ansible modules using bash instead of python

Quick install of AWX (ansible-tower upstream) into Docker

AWX is the upstream open source project for ansible-tower. If you are interested in ansible-tower it is obviously a good starting point, as it is the starting point for ansible-tower. It is also free to use, not just free as in open source but in that you do not need a RedHat account to use it.

AWX now likes to be installed into a Kubernetes environment using awx-operator; that makes it damn hard to be workable in a home lab situation unless you want to spend a lot of time trying to work out what the installation configuration files are doing and changing them to make them usable (ingress/egress/dns etc).

It is far simpler to get a full working environment using Docker, even if the Docker method has been deprecated. This post explains how to get it installed and working using Docker. There are customisations needed to make it useful for a home lab which are covered here; and as you step through them you will understand why I chose Docker for my home lab, it would be a lot more work to make it usable under Kubernetes (even under MiniKube).

These instructions are valid as of 16th October 2021.

This post also assumes, or at least hopes, you have been using ansible from the command line quite happily already so you understand the last step on SSH keys :-).

But first, why would you want to install AWX ?. The only possible reason would be that you want to get an idea of how ansible-tower works, there is certainly no benefit for any small organisation or for your own home lab over the command line; and it is actually a real step backward for a home lab as you really need a spare machine with at least 6Gb ram running either docker or kubernetes (even to get it to run a playbook with a simple shell ‘ls’ command), plus it makes it a lot harder to do things as it expects by default all resources (EE containers, playbooks etc.) to be provided across the internet and never locally (not even local network)… although with a lot of effort you can change that and some of the steps are covered in this post.

Yes it does have user access controls, organisations, groups etc. to limit who can do what using AWX or ansible-tower. It does not in any way stop anybody with half a clue from simply bypassing all that and using the command line.

However if you want to learn how to use ansible-tower without going anywhere near a RedHat subscription/licence AWX is the place to start.
And if that is what you are interested in then read on.

1. Follow the install instructions from the AWX github site

First get it up and running, that's simply a case of (almost) following the instructions for installing under Docker on the github site. The difference is do NOT use the “-b version” tag to select the latest version, for 19.4.0 at least that pulls down a git clone with no branch head and simply doesn’t work, so omit the “-b” tag and the instructions are simply…

git clone https://github.com/ansible/awx.git
cd awx
vi tools/docker-compose/inventory   # set pg_password,broadcast_websocket_secret,secret_key etc
dnf -y install ansible              # MUST be installed on the build host, used during the build
make docker-compose-build   # creates docker image quay.io/awx/awx_devel
make docker-compose COMPOSE_UP_OPTS=-d     # starts the awx, postgresql and redis containers detached

Wait until everything is up and running and then do the next steps

docker exec tools_awx_1 make clean-ui ui-devel            # clean and build the UI
docker exec -ti tools_awx_1 awx-manage createsuperuser    # create the initial admin user, default of awx is recommended
                                                          # note: I ran the above twice to create a personal superuser id also
docker exec tools_awx_1 awx-manage create_preload_data    # optional: install the demo data (actually, it may already have been installed)

At the end of the installation you have a running AWX environment, but it has some major limitations we will be fixing below.

At this point you can use a web browser to port 8043 on the Docker host and test the superuser logon you created works, but there is more work to do before going any further.

2. Customisations needed

Do NOT use the “make” command to start it again from this point, or it will destroy any customisations you make to the yaml file by re-creating it.
The install creates persistent docker volumes for the data so after the install we just need a custom script to stop and start the environment rather than using the Makefile.

Simply create a script as below

cat << 'EOF' > control_script.sh
#!/bin/bash
# note: quoting 'EOF' above stops the shell expanding $1 while this file is being written
case "$1" in
	"start") cd awx_default   # the directory you cloned awx into (cd awx if you followed the clone step above)
		docker-compose -f tools/docker-compose/_sources/docker-compose.yml up -d --remove-orphans
		;;
	"stop") cd awx_default
		docker-compose -f tools/docker-compose/_sources/docker-compose.yml down
		;;
	*) echo "Use start or stop"
		;;
esac
exit 0
EOF
chmod 755 control_script.sh

Note we are still using the generated yaml (yml) file, but by using the script rather than the “make” command it will not be overwritten.

As installed you cannot use “manual” playbooks from the filesystem; playbooks can only come from a remote source repository (which has its own problems, discussed in the next bit). You may also want to use your own in-house modules.

To use the manual playbooks you need to have a bind mount to the docker host filesystem, so “vi tools/docker-compose/_sources/docker-compose.yml“ and, in the volumes section, add an additional entry to the existing long list as below

     - "/opt/awx_projects:/var/lib/awx/projects"

Then “sudo mkdir /opt/awx_projects;sudo chmod 777 /opt/awx_projects”. OK, you would probably chown it to whatever userid is assigned to UID 1000 rather than set 777, but whatever works.
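
If you do want the tidier alternative, a sketch assuming the playbooks end up being read as UID 1000 inside the container (check what ownership is actually used on your install before relying on that):

sudo mkdir /opt/awx_projects
sudo chown 1000:1000 /opt/awx_projects    # assumed UID/GID of the awx user in the container
sudo chmod 755 /opt/awx_projects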

After restarting the containers you can now place manual playbooks in the /opt/awx_projects directory on the Docker host and AWX will be able to find and use them from the containers /var/lib/awx/projects directory. But don’t restart the containers yet :-).

Also, under the projects directory you would normally have a directory per project to contain the playbooks for that project. If under each of those project directories you create a directory named library, you can place any customised or user written modules you want to run in those playbooks there (ie: if you have projects/debian and projects/centos you need projects/debian/library and projects/centos/library; the library directory is created within each individual project directory, not at the top level projects directory). That is worth noting, as it is the easiest way of implementing your own custom modules; the layout is sketched below.
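
A sketch of that layout on the Docker host, using the debian and centos project names from the example above:

mkdir -p /opt/awx_projects/debian/library    # playbooks in /opt/awx_projects/debian, custom modules in .../debian/library
mkdir -p /opt/awx_projects/centos/library    # playbooks in /opt/awx_projects/centos, custom modules in .../centos/library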

The second issue is the resolution of host names. It will work fine if you only want playbooks from internet source repositories and only want to manage machines that are resolvable by public DNS servers, but what if you want to use local docker and source management repositories? More importantly, you need to resolve the host names of machines on your internal network if you want AWX to manage them.

On the local docker image repository side, at this point AWX will not use insecure private registries, which is a bit of a pain. However, to resolve your own internal hostnames you need to reconfigure Docker to use a DNS server that can resolve those hostnames. That is simple in Docker: just “vi /etc/docker/daemon.json” and insert something like the below

{
  "insecure-registries" : [ "docker-local:5000" ],
  "dns" : [ "192.168.1.181" , "192.168.1.1" , "8.8.8.8" ]
}

The DNS list above contains one of my DNS servers (I use dnsmasq) that can resolve all my server hostnames, my router, and google DNS to resolve all the external sites. Customise the list for your environment.

Now at this point you will want to restart docker (to pick up the daemon.json changes) and the containers (for the new volume bind mount), so assuming you created the control_script.sh above…

./control_script.sh stop
systemctl stop docker
systemctl start docker
./control_script.sh start

When the containers are running again “docker exec -ti tools_awx_1 /bin/sh” and try pinging one or more of your local servers to ensure name resolution is working.
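
For example, something like the below from the Docker host (getent is used in case ping is not installed in the container image; someinternalserver is a placeholder for one of your own hostnames):

docker exec -ti tools_awx_1 getent hosts someinternalserver    # should return the internal 192.168.x.x address
docker exec -ti tools_awx_1 cat /etc/resolv.conf               # on a compose network this usually shows Docker's embedded DNS (127.0.0.11), which forwards to the servers in daemon.json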

3. SSH keys, credentials

If you have been using ansible normally from the command line before you will be aware that part of the implementation is to create SSH keys, the private key on the ansible ‘master’ server and the public key placed on every server to be managed. The good news is that you can just create a “credential” in AWX for the ansible user and paste that same private key into it.

When setting up command line ansible you had to ssh to every server to be managed from the ‘master’ and reply Y to the ssh prompt; AWX handles that prompt when it first runs a job on any server that hasn’t been connected to yet, which is the only time saver over the command line. Having said that, you still need the public key on every server to be managed, so you need to set up a playbook to do that anyway (personally I use puppet for such things so the keys already existed on all my servers).
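
If you do not already have something to push the key out, a minimal sketch of such a playbook follows; it assumes the ansible.posix collection (which provides the authorized_key module) is installed, and the user name and key path are placeholders for your own:

cat << 'EOF' > push_ssh_key.yml
---
# sketch: authorise the ansible public key on every host in the inventory
- hosts: all
  become: yes
  tasks:
    - name: ensure the ansible public key is present in authorized_keys
      ansible.posix.authorized_key:
        user: ansible
        state: present
        key: "{{ lookup('file', '/home/ansible/.ssh/id_rsa.pub') }}"
EOF
ansible-playbook push_ssh_key.yml --ask-pass --ask-become-pass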

So, for hosts to manage, add the new host to an inventory (maybe create some custom inventories first), then in AWX go to inventories->hosts->run command [the run command option is only available if you go into an inventory and list the hosts, not from the hosts page] and select the ‘shell’ module with a simple command such as ‘ls’ to make sure it works (use the default awx-EE and the credentials you added the SSH key to). It should just work… although see below.

4. Performance compared to simple command line

The latest versions of AWX use “execution environments” to run commands. This involves spinning up a new container to run each command.

Great for isolation, bad for performance. A simple shell module “ls -la” command from the command line is done in seconds; from AWX it takes a few minutes to do the same simple command, as it needs to start an execution environment if one is not running (and even download it if there is a later version of the container image; tip: after the first pull of the container image change the pull policy to never).

Inventories, ansible hosts files

If you have been using ansible from the command line you probably have quite a few, or even one large ansible hosts file with lots of nice carefully worked out groupings.

This may be the time to split them out into separate inventories if you have not already done so, for example a hosts file for debian, one for centos, one for rocky, one for fedora etc., depending on how many entries you want to show up in a GUI display of an inventory.

Or you could just create one inventory and store the entire hosts file into that one inventory. By store of course I mean import.

Copy your current ansible hosts file from /etc/ansible (or as many different hosts files as you have to load into separate inventories) to the tools_awx_1 container. Create an inventory in AWX for each hosts file, and use awx-manage to import the file. The inventory must be pre-created in AWX before trying to import.

For example, I created an inventory called internal_machines, did a quick “docker exec -ti tools_awx_1 /bin/sh”, and in the container did a cd to /var/tmp and “vi xxx”; I did a cat of my existing ansible hosts file with all its hosts and groupings, pasted it into the “vi xxx” file in the container session and saved it. Then simply…

awx-manage inventory_import --inventory-name internal_machines --source xxx

My ansible hosts file was in INI format (lots of [] sections rather than json or yaml); the import command will add all the hosts and groups to the inventory (and the hosts to the host entries, of course).
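
If you have never looked closely at the INI inventory format, a minimal sketch of the sort of file being pasted in (created here with a heredoc rather than vi; the host names, addresses and groups are placeholders):

cat << 'EOF' > /var/tmp/xxx
[debian_servers]
deb1.internal ansible_host=192.168.1.10
deb2.internal ansible_host=192.168.1.11

[centos_servers]
cent1.internal ansible_host=192.168.1.20

[all_internal:children]
debian_servers
centos_servers
EOF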

It is important to note that you should have run a test job (such as the ‘ls -la’ mentioned above) against a server you have manually added first, in order to have at least one execution environment container already started; otherwise the import job will time out waiting for the container to start and fail to import anything.

Issues

The default execution environment awx-ee:latest seems to want to download for every job, even though I have the EE settings set to only pull if missing. A major issue with this is that a job will often fail, unable to pull the image from quay.io with a ‘failed to read blob’ error; repeatedly retrying will eventually work. The real major issue is not being able to use a local insecure registry to store a local copy (and even create custom EE images) for local use, which would immediately alleviate the problem. And EEs don’t seem to share images even when they are identical; for example, currently jobs run on the control-plane EE using awx-ee:latest with no problems, but the AWX-EE EE using awx-ee:latest is trying to download it again.

Basically, unless it becomes really easy to decouple from the internet and use local container image registries, local git sources, local playbooks etc., it is pretty useless unless you are happy with a 2Gb download every time you want to run a simple “ls” command. Expect the same issues with Ansible-Tower, which wants to use RedHat repositories; you need to decouple, or at least make your local sources of everything the first place searched.

Irritations

It is easy to mess up groupings when trying to manually add them, and if a job runs and ends with “no matching hosts found” that job is marked as a successful job, which I personally think should be a failure.
Under both “schedules” and “management jobs” only the supplied maintenance jobs are available and there doesn’t seem to be an easy way to add user schedules; it therefore seems pointless to go to all this work when the command line interface run via cron provides better automation.

AWX and its downstream ansible-tower provide good audit trails of which user defined to the tower application changed what and ran what within the tower, but there is of course no auditing within the tower of who changed what in git playbooks (although git will record that), and of course manual playbooks on the filesystem are by their nature unaudited. Sooo, auditing is not really a good reason to use it. Also it has a large overhead, needing a minimum of 2 cpus and 4Gb of memory (AWX does, ansible-tower needs more) just to run.

And I am still looking into how to import custom modules as far as AWX goes; they are easy to use from the command line. As mentioned earlier in the post you can create a ‘library’ directory for each project; you could also inject into the docker run command the ansible environment variable that specifies the library search path and create another volume bind to that directory on your host, but really you would want your modules in the same git repository as your playbooks… although as this post has shown how to set up project playbooks on the local filesystem I suppose it’s not really relevant here.

Of course AWX/Tower allows creation of individual users/groups/projects/organisations… just like a well laid out filesystem structure and use of the unix users and groups files. Of course, reporting on such a layout seems a bit tricky in AWX, whereas on a filesystem you could just use “tree” to display the projects structure and contents.

It is far easier to run ansible from the command line using filesystem based playbooks (my filesystem playbooks are git managed on the host, so no advantage in using AWX or ansible-tower there), which has absolutely no overhead requirements other than whatever memory a terminal or ssh session takes. Plus there is no requirement for Docker or Kubernetes (no execution environment containers); it is just small, fast and simple, and if you want regular jobs they can be scheduled by cron.

However, if I can one day get insecure registries working for EE images, and figure out how to get playbooks and custom modules deployed from my local gitlab server (rather than having to use the filesystem volume mount), it may be useful; and I might even update this post with how. By useful I mean as a learning experience though; my home lab doesn’t have a spare 2Gb of memory I can dedicate to actually using it when I find puppetserver so much more powerful.

Posted in Automation, Unix | Comments Off on Quick install of AWX (ansible-tower upstream) into Docker

Differences between Docker and Kubernetes from a container viewpoint

This post is not about the differences between Docker and Kubernetes; it is about the differences needed for containers. And why you should not expect any container from dockerhub.io to run under Kubernetes.

The main difference discussed here is that

  • Docker and Docker Swarm run containers from a ‘root’ environment by default
  • Kubernetes containers run under the user that started Kubernetes or as defined by the RunasUser and RunasGroup parameters in the deployment yaml file

For Docker this allows complex applications with components running under multiple users to be run from a container: as the container startup script is run as root, it can ‘su’ down to each of the needed users to start those components. As long as any UID needed for bind mounts is documented, that works well.

The same startup logic cannot be used for Kubernetes containers. The container startup script must be owned and executable by the RunasUser/RunasGroup pair and as a general rule all application components would be started by that one user. This helps in keeping containers designed for one purpose as simple and as small as possible (microservice) but makes running an entire application with more than one component in a container difficult.
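
For reference, the RunasUser/RunasGroup parameters referred to here are the runAsUser/runAsGroup fields of the pod securityContext in the deployment yaml. A minimal sketch, with the names and image being placeholders:

cat << 'EOF' > myapp-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      securityContext:
        runAsUser: 1000       # the "RunasUser" discussed above
        runAsGroup: 1000      # the "RunasGroup" discussed above
      containers:
      - name: myapp
        image: myregistry/myapp:latest
EOF
kubectl apply -f myapp-deployment.yaml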

Many container images designed for docker require a root environment. Should you try to run an image that has its startup script secured to root or user X with a RunasUser of Y under kubernetes, the pod will just go into CrashLoopBackOff status because of a permission denied on the script.

It is certainly possible to design containers to run under both environments. What makes it difficult is that neither docker nor kubernetes sets an environment variable to let the container know where it is running; not a major issue as you can simply provide your own.

From personal experience, my own small containers designed for docker that I wanted to move only used ‘su’ to switch from root to the user I wanted the app to run under; the conversion to support both engines is simply to pass a custom environment variable to the container when running under kubernetes, and if the variable exists assume the RunasUser/RunasGroup were set correctly and just start the app; if it is not present, assume docker and ‘su’ down to the expected user to start the app. Actually, as I was writing this post it occurred to me that an easier way would be to simply check whether the script is running as root and determine the environment that way, which I am going to use instead :-).
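
A sketch of that root check as a container startup script; the user name and start command are placeholders:

#!/bin/bash
# container startup script usable under both Docker and Kubernetes
# under Docker we normally start as root; under Kubernetes runAsUser/runAsGroup mean we do not
if [ "$(id -u)" -eq 0 ]; then
    # running as root, so assume Docker: drop down to the application user
    exec su - appuser -c "/opt/myapp/start_app.sh"
else
    # not root, so assume Kubernetes set runAsUser/runAsGroup correctly: just start the app
    exec /opt/myapp/start_app.sh
fi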

It is certainly possible to start apps under multiple users under Kubernetes; you would simply add ‘sudoers’ entries for the RunasUser to allow it to run the startup commands under the correct users. But a container designed to run under either engine would then contain sudoer entries that are not required by docker, and containers get audited these days :-).

For me personally I only want to build one container per app and have it run under either.

If you are relying on external container images just remember images on dockerhub.io are for Docker, don’t expect them to even start on Kubernetes. Images built explicitly for Kubernetes are not intended to be run as root so should not be run under Docker.

If you are developing your own containers and use both engines, design the container to be able to run under both. All that is needed is the awareness of the fact Docker will run startup scripts as root and Kubernetes prefers not to.

This post is primarily because containers I was moving from docker to minikube were going into CrashLoopBackOff and after sorting that out thought it may be of interest to others as it doesn’t seem to be highlighted anywhere.

Posted in Automation, Unix | Comments Off on Differences between Docker and Kubernetes from a container viewpoint