Gitea – the new kid on the block home lab git server

There have been lots of posts on YouTube recently about using Gitea as a home lab git repository system. I got interested in looking at it further as it supposedly has very low resource requirements. It may not really be a ‘new kid’ as it has probably been around a while; I just had not heard of it until videos about it started trending on YouTube.

I am looking at this from the point of view of trying to find a solution with minimal resource requirements, as in a home lab every Gb of memory saved could be another VM started to test something in.

Most of those videos on Gitea are about running it in a docker container environment, where it will magically install itself and create the mysql database container and docker volumes needed as part of the docker compose setup. It seems relatively simple to use that way, but it also seems a bit resource intensive.

I found a much simpler way: the project also provides standalone binaries which are tiny (compared to docker images) and even easier to set up. While you can use external databases such as mariadb or postgresql if you want, for a home lab it is much simpler to use the embedded sqlite3 database it supports.

So a quick summary would be that for home lab use, or anywhere with very few users, where you do not need an external high performance database, Gitea uses very few resources and works well.

What I was using – gitlab-ce

What I have been using until now was the gitlab-ce docker image, which while working well had the following issues:

  • Needed a VM with 6Gb memory assigned, and if left running for a while it used swap; at one point it had used 2Gb of swap before I restarted the container
  • Takes quite a few minutes for the container to start up and become ‘healthy’, and sometimes needs a bounce to get it there
  • Has to be updated frequently, and as the image is huge, if you miss a few updates and have to apply them incrementally by pulling one image version at a time, running it, getting the next version and so on, it can take a while
  • While most updates work OK, for a few of them I have had to delete everything and recreate from scratch
[root@gitlab gitea]# docker image list
REPOSITORY                  TAG                   IMAGE ID       CREATED       SIZE
gitlab/gitlab-ce            latest                e71188eb2659   3 weeks ago   3.63GB
gitlab/gitlab-ce            old                   3f416ff94400   7 weeks ago   3.71GB

And Gitea on the same VM

Remembering that I am not comparing apples to apples here: I am using the standalone binary rather than a docker container for Gitea. The docker implementation also uses a database in a container and a docker volume, which are not needed for the standalone binary; the binary can use an external database but I chose not to as that is overkill for a small implementation, so I use the inbuilt sqlite3 one.

  • in the same VM with 6Gb memory assigned… top showed 4Gb of available memory, so I have resized the VM down by 2Gb with no issues (leaving 2Gb headroom for other containers I start as needed in that VM)
  • starts up in seconds
  • compared to docker image sizes the binary is tiny
[root@gitlab gitea]# ls -la bin
total 110528
drwxr-xr-x. 2 gitea gitea      4096 Jun 17 17:08 .
drwx------. 7 gitea gitea      4096 Jun 17 18:17 ..
lrwxrwxrwx. 1 gitea gitea        24 Jun 17 17:08 gitea -> gitea-1.24.0-linux-amd64
-rwxr-xr-x. 1 gitea gitea 113168376 Jun 17 17:04 gitea-1.24.0-linux-amd64
[root@gitlab gitea]#

How I installed and configured it

The basic instructions for obtaining and using the Gitea binary are at https://docs.gitea.com/installation/install-from-binary at the time of writing. The major difference between those instructions and what I did was to put everything under the gitea user’s home directory.

  1. The user I created was gitea, with a home directory of /home/gitea
  2. where there are instructions to create the needed filesystem structure I removed the leading / and created everything under the /home/gitea directory (so for me GITEA_WORK_DIR=/var/lib/gitea/ becomes GITEA_WORK_DIR=/home/gitea/var/lib/gitea/ for example)
  3. I also created a /home/gitea/bin directory to put the binary file in. It is recommended that the binary be called “gitea” to save complications when upgrading; as you can see above I just used a link, so upgrades (yet to be attempted) just need the link changed to point at the new binary
  4. and of course all references to /etc/gitea/app.ini I changed to /home/gitea/etc/gitea/app.ini, on the command line for testing and in the service file when I was happy with it (see the sketch after this list)
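
A rough sketch of what that relocation looks like in practice is below; the directory names follow the official instructions and the /home/gitea paths are just my relocation of them, so adjust to suit your own layout.

# create the relocated directory structure, owned by the gitea user
sudo -u gitea mkdir -p /home/gitea/var/lib/gitea/{custom,data,log}
sudo -u gitea mkdir -p /home/gitea/etc/gitea /home/gitea/bin

# test run before creating the service file (listens on port 3000 by default)
sudo -u gitea env GITEA_WORK_DIR=/home/gitea/var/lib/gitea/ \
    /home/gitea/bin/gitea web --config /home/gitea/etc/gitea/app.ini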

Once you are ready to start it, just run the binary (with the updated config file of course) and point a web browser at port 3000 to get the initial config screen. On the config screen…

  • I selected sqlite3 in the dropdown as the database to use (no need for external databases to chew up resources); it creates the sqlite database under the /home/gitea/var… structure I used
  • Helpful tip

    If you do not already use it, investigate the use of a .ssh/config file :-) where for a hostname you can configure the remote port and local private key identity file to use; so in my examples here I can “git push” to origin “gitea-local” and ssh to “gitea-server” (no entry, so it uses default port 22) without needing to remember what keys or ports to use (I would say still document them, but that is effectively what the config file does); just remember to have a default Host * entry with nothing under it at the end.

    Host gitea-local
      IdentityFile ~/.ssh/id_ed25519_local_gitlab
      Port 5522
    Host github.com
      IdentityFile ~/.ssh/id_ed25519_github_20250506
      Port 22
      User git
    Host router
       KexAlgorithms +diffie-hellman-group1-sha1
       Ciphers +aes128-cbc
    Host *
    
    
  • I also set the SSH port to 5522 as of course I need 22 for normal SSH (grin), but see the issue I found with SSH below
  • Note that you have to expand the tab near the bottom of that initial config screen to set up the initial admin user
  • then just click the button and wait a little; for me everything was set up, apart from the issues below

The only real issue I had was that even though I had set SSH to port 5522 on the initial configuration screen, by default the SSH server did not start; you have to manually add “START_SSH_SERVER = true” to the [server] section of the app.ini file for it to start. After that it worked perfectly well.
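
For reference, the relevant part of the [server] section ends up looking something like the below; the SSH_PORT is the 5522 I chose on the initial config screen and the rest of the section is whatever the installer generated, so treat this as a fragment rather than a complete section.

[server]
SSH_PORT         = 5522
START_SSH_SERVER = true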

The other minor issue was just annoying: it was logging a lot of messages about the /metrics URL not being found, which it kept doing even after I added the below to the app.ini file.

[metrics]
ENABLED = false
ENABLED_ISSUE_BY_LABEL = false
ENABLED_ISSUE_BY_REPOSITORY = false

However they were informational, and as I want this to be lightweight and not fill up my filesystem I changed the app.ini log section from Info to Warn, and additionally from console to file. The messages tend to imply the docker images also include prometheus to collect stats from /metrics; I do not need stats so have not looked further into that (for this post anyway).
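
For anyone wanting to do the same, a minimal sketch of the [log] changes I mean is below; the ROOT_PATH is an assumption based on my /home/gitea layout, so check the configuration cheat sheet for your Gitea version.

[log]
MODE      = file
LEVEL     = Warn
ROOT_PATH = /home/gitea/var/lib/gitea/log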

After it was running I just created new repos in Gitea to match what I had been using in gitlab-ce, then in the filesystem of my dev machine did a “git remote remove origin” and used the “git remote add origin…” command shown when the repo was created (almost the same, I removed the port number as, noted above, I use .ssh/config to select the ports and keys to use since I have so many of them) to change where they pointed, and did an initial push; and it just worked.
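
For each existing working copy the sequence was roughly the below; “gitea-local” is the .ssh/config host alias from the tip above and the repository path is just an example:

git remote remove origin
git remote add origin git@gitea-local:mark/myproject.git
git push -u origin main        # or master, whatever the existing branch is called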

What I have not tested

I have not had to upgrade the gitea binary to a new release yet as of course I installed the latest, so that procedure is untested. And of course I have no idea if I will eventually hit a limit on the size of the sqlite database, although as this has been so simple to set up I could just fire up a second copy.

Some of the videos on youtube discuss actions and runners, where in the docker solution another container can be started as a ‘runner’ of a specific OS/distro to do things when the repository changes. I have not looked at any of that yet so do not know if it requires a docker environment; it is not high on my priority list as I never used those functions in gitlab-ce anyway because I have Jenkins, but the functions may be useful as a replacement if available in the binary Gitea (there are tabs/selections to set them up, I just have not looked into what those do yet).

Summary

Gitea, the binary install anyway, is very light on resource needs and starts/restarts in seconds. Any upgrade needed will not take GBs of data to download, which is another big win. If you want to host your own local git repository this is ideal.

Why might you want to host your own repository? You of course all use git for anything important in your local filesystems to manage changes, and I do test my backup/restores often, but being able to push changes to a local git repository (on a separate machine of course) is another good backup for that important stuff.
I do have stuff on github, but having a local repository allows for lots of test branches that I may never want to push there, and a lot of ‘site only’ stuff it would be pointless to put there that still needs to be managed, so for me a local repository is needed. You may never have a need for one… but this was so simple to set up you probably should.

Posted in Home Life | Leave a comment

The latest version of RDO OpenStack is a pain

This is for the caracal release on CentOS Stream 9 as documented at https://www.rdoproject.org/deploy/packstack/ for the packstack install.

First off, I have not been able to get this working yet. So this post is just covering the first problem you are going to hit.

network service not found

For those that may have more patience than me, there is an important step missing from the install instructions; they are for CentOS Stream 9, which does not support the old network scripts out of the box.
Where it says

sudo systemctl disable firewalld;
sudo systemctl stop firewalld;
sudo systemctl disable NetworkManager;
sudo systemctl stop NetworkManager;
sudo systemctl enable network;
sudo systemctl start network

It is important to note that this will not work; CentOS Stream 9 does not ship the legacy network service by default.
In later steps the instructions add a repository; that is too late, you need to do that first. What it needs to say is

sudo dnf config-manager --enable crb
sudo dnf install -y centos-release-openstack-caracal
sudo dnf update -y
sudo dnf -y install openstack-network-scripts  # the legacy network scripts
sudo systemctl disable firewalld;
sudo systemctl stop firewalld;
sudo systemctl disable NetworkManager;
sudo systemctl stop NetworkManager;
sudo systemctl enable network;
sudo systemctl start network

However be aware that this will not magically create the ifcfg-interface files you need (NetworkManager no longer uses those legacy file locations, and has even changed the file syntax from the old legacy format in the files it does create, so you cannot just copy them). So if you do not create them yourself you will have no network configured after a reboot.

It is actually trivial to create them.
Do an “ip a” to get the interface name and MAC address. As mentioned in the doc you do need a static ip-address, so this example is for a static ip-address. The example below, edited for your interface/MAC/ipaddr of course, must be in a file /etc/sysconfig/network-scripts/ifcfg-interface where interface is the name of your interface (enp1s0/eth1 etc., not the MAC).

TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
NAME=enp1s0
UUID=if-you-know-uuid-insert-it-else-trust-name+device
DEVICE=enp1s0
ONBOOT=yes
IPADDR=static-ipaddr-to-use
PREFIX=24
GATEWAY=ipaddr-of-your-gateway-router
DNS1=your-internal-dns-ipaddrs
DNS2=192.168.1.nnn
DOMAIN="your.internal.domain"
IPV6_DISABLED=yes

I would also suggest that before running packstack you may need to “chmod 755 /etc/rc.d/rc.local”, as during my packstack run lots of “is not marked executable, skipping” messages were logged for that file.

Then reboot, make sure networking is as you want, and carry on to run packstack.
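
A quick sanity check after the reboot, before kicking off packstack, could be something like the below (using the same placeholders as the file above, so substitute your own interface name and gateway address):

ip a show enp1s0                         # confirm the static address is assigned
systemctl status network                 # the legacy network service should be active
ping -c 3 ipaddr-of-your-gateway-router  # confirm the gateway is reachable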

Why this post stops at that issue

I do not believe it is possible to get this working easily, as for starters the Horizon dashboard install is missing dependencies that prevent it being used at all, and I do not intend to debug missing django.core dependencies.

The nova network component is also in a loop logging warning messages, which I could probably resolve, but as there is no dashboard why bother; one issue at a time.

Horizon logs show only the below, which is why trying to log on to the dashboard fails; I can only assume this is a packaging error that will be fixed at some point.

   AttributeError: module 'django.core.cache.backends.memcached' has no attribute 'MemcachedCache'

Nova logs are full of, and will eventually fill the filesystem with, the messages below. They are warnings and not errors, but I would prefer my disk space to be used for image and instance volumes, not warning messages. There is probably a config file setting that will fix it.

   2025-05-12 14:49:18.146 2057 WARNING nova.virt.libvirt.driver [None req-5be22791-044d-4bda-af49-b0192f9f0adf - - - - - -] This host appears to have multiple sockets per NUMA node. The `socket` PCI NUMA affinity will not be supported.

I simply do not have time to look into sorting these out at the moment.
But the “network” service not being available is the first issue anyone will hit on this newish caracal release from RDO, so I hope it helps some of you move on.

Posted in Home Life | Comments Off on The latest version of RDO OpenStack is a pain

Jellyfin media server, and using remote disks

Have I mentioned I have discovered Jellyfin ?

Everyone wants their own home media server to avoid having to locate that elusive DVD you want to watch again, which may have degraded to the point it is unwatchable anyway and so should have been backed up to disk in the first place. Of course backing it up to disk may still mean you have to move a disk to the TV you want to watch it on, so you are still moving things around. A media server solves that.

There are quite a few about, Kodi has existed for what seems like forever and is probably the gold standard, Plex has been around for a while, and I have found Jellyfin.

I chose to use remote external hard drives to host my media and I had to work through some issues for that, so there is a section on the issues you will hit and how to work around them at the end of this post; they were not issues with Jellyfin but with permissions.

A bonus for me is that the Jellyfin client application can be installed on Chromecast devices (I have two), Android tablets (I have one) and smart android based TVs, which can access your Jellyfin media server just using your home network internal 192.168.x.x address. (Note: Android tablets can cast to the chromecast using the Jellyfin app, but the Linux Jellyfin app cannot… and web browsers cannot cast to chromecast on internal network addresses either, so presumably the google library was used; lots of internet search results say that is because google requires you to cast to a DNS name secured by https (presumably via its DNS servers), so having a client installable on the chromecast (or smart TV) that can locate the media only by local ip-address was essential to me.)

Note: google searches while investigating the cast issue show PLEX is not affected… because PLEX routes through the PLEX servers; I am not sure exactly how that works, but I wanted something that could use 192.168.1.nnn addresses and not go near the internet, so I chose Jellyfin.

Jellyfin and other similar media library implementations will helpfully try to locate movie covers, cast biographies, and a lot of other stuff by querying internet servers such as IMDB for every movie or show you add. To get any useful information from those queries it relies on you having the directory structure and naming standard each solution expects, and none of us tend to back up in the expected naming standard, so expect a lot of folder renaming and file relocations just to get it to look like you spent some time on it, and expect to have to locate DVD covers manually. Jellyfin will try to work out how to display your media backups as they are if possible and also has a “folder” view that can be enabled, but that is not as pretty as taking the time to sort them into movies and shows (assuming you already back up in categories here).

I would say if you are copying a huge archive of backups into a media server for the first time, let it do the lookups (in my case it populated about 5% of my titles) and then turn external lookups off and just enter everything yourself going forward; a bit of effort in locating media covers and movie info, but easier than trying to rename a decade’s worth of archives and constantly rescanning.

Anyway, why did I choose Jellyfin over the others?

  • It has a damn small footprint. I created a new Debian12 VM with only 1Gb memory assigned, with 4Tb of movies and shows (on a remote external disk) for it to scan/manage, and it is using zero swap… but I did create a new VM and install natively rather than use a container (jellyfin provide a docker container) as that best suited how I wanted to use it
  • Even if naming standards are not followed it can still in many cases work out shows and movies
  • They have implemented an optional “folder view” facility, not brilliant but it can find and handle some edge cases with a lot of directories to traverse; not especially well but it beats having to rename a decade’s worth of collected backups
  • Easy to manually edit images and metadata to put DVD cover pictures and year of release etc into the displays, if you remember to tick the store images locally option for each library
  • There is a jellyfin client that can be installed on ChromeCast devices (and presumably any android TV) that can directly use the Jellyfin server by internal network ipaddress
  • Media can be added via local directories, nfs or samba (you really should read my notes on remote disks below first)
  • It can be installed on Debian12 simply by running the install script (full instructions at https://jellyfin.org/docs/general/installation/linux/), after which updates are pulled as part of a normal apt update

How I decided to use it

I chose a VM rather than a container because a lot of information (apart from what you save by ticking save images in folders, and some metadata) is not stored in the directories on the media library disk themselves, and as it takes a lot of effort to locate and set up images I would not want to lose all that info, which I would if I deleted a container in order to start it on another machine; moving a VM disk to another machine will not lose any info (and I find moving a VM disk easier than snapshotting containers and volumes and trying to move them about). I also store a copy of the VM disk on each media external disk.

While Jellyfin allows the use of remote disks within the application itself (I think it can mount samba/CIFS directories itself), I decided to use only local filesystem mount points such as /mnt/media1, /mnt/media2 etc. and mount remote disks manually, so I have control over what remote disks are mounted and can move the external disks between machines as needed without needing to reconfigure jellyfin, as the application will always refer to the same mountpoints (plus I would prefer to use NFS rather than CIFS); so both the disks and the VM are portable and do not need to be on the same physical machine :-). Be sure to read my notes on Samba/CIFS and NFS issues you will hit below.

Another advantage of using a VM is that I can keep a reasonably up to date copy of the VM disk image on the external media disk itself, so as long as I back up the media disk I can spin it up anywhere (yes, a snapshot of a container could have been taken if I had gone the container way; containers are a lot harder to upgrade without losing data however).

So my key requirements were

  • must not need to go anywhere near the internet, all streaming to be confined to my 192.168.1.x network
  • must have a small footprint (I have so many VMs I am maxing out all my dedicated VM machines)
  • must be usable on ChromeCast [due to casting limitations of the google APIs that means must have a client app]
  • must allow me to move external library disks about; I’m exhausting available USB slots in physical machines and despite whatever anyone tells you USB powered disks (even on a powered USB hub) just won’t work well on a USB hub, so I have to move them about machines as needed
  • must be easy to setup and use

On the last point… while easy to set up and use it requires a specific directory structure which my backup disks do not use. As my backup disks are in a structure that makes sense to me and I chose not to change those, I needed another disk to re-layout the files for Jellyfin (actually another two, backup everything!; a copy of the VM image is also backed up on the disks providing the libraries, which is another advantage of using a VM).

Also on the easy to use side, the web interface can easily update DVD cover images from locally saved images (ie: from imdb if they have media cover shots, or something snapped from a camera) and it is easy to edit the metadata fields.

While I am a strong believer in backing up everything identically, out of pure habit all my initial testing was on a 2TB external disk, LUKS encrypted with an ext4 filesystem (nfs mounted to the Jellyfin VM). When I needed to move to a 4Tb drive I left it as FAT32 so I could test for any issues with CIFS/Samba and see how it compared to NFS (result: CIFS works OK, NFS seems to be faster and have fewer pauses; all my interfaces and switches are 1Gb but the chromecasts attached to the TVs are using wireless, so the only difference was the change from NFS to CIFS, and I only see pausing using CIFS, but the pausing is infrequent so it is certainly usable).

OK, so how to do it

Create a new Debian12 VM (as noted only 1Gb of memory is needed, I allocated 2 vCPUs but it can probably work with 1); then refer to https://jellyfin.org/docs/general/installation/linux/ for the install script, and after it is installed for my use I must “systemctl stop jellyfin;systemctl disable jellyfin”.

You may be curious about the stop and disable of the jellyfin service… remember my media libraries are on external disks that may be on remote machines (always remote for a VM actually); I need to mount them onto /mnt/mediaN for my setup before manually starting jellyfin after a VM reboot. One important note here is that if an “apt upgrade” also upgrades jellyfin the service is set to enabled and started, so I have to remember to disable it again, and make sure the disk is mounted so it is available when the upgrade starts jellyfin.
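
So after a VM reboot my manual sequence is roughly the below; the /mnt/media1 mountpoint matches the noauto fstab entries shown later in this post:

mount /mnt/media1            # uses the noauto entry in /etc/fstab
systemctl start jellyfin     # only after the media disk is mounted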

Then copy some files in using the Movies/Shows filesystem structure; “systemctl start jellyfin”; and use the Web interface to add Libraries (I kept shows and movies in separate libraries as recommended). Jellyfin will scan the libraries as they are added but it will take a while.

To add additional media later you can just copy it into the folders already added to libraries (I had turned off the automatic regular scans for new files but it either does them daily anyway or I missed turning them off in at least one library I copied files into later). You can either wait for the new files to be detected over time or request a library scan be queued from the Web interface dashboard.

Oh yes, you should also create a new user/admin id and remove the default. You can also add additional non-admin users such as kids, in which case you can limit what media libraries they can see… I found that pointless as I hit remember me on the chromecast app to automatically log on as me each time anyway, but if you have kids and R+ media you would probably not do that, and instead log off each time and make them use their own userids.

That’s it. You can now watch media using the web browser interface. Obviously you want a lot more than that, so install the official Jellyfin client application on your tablets, TVs and chromecasts… note that when you install the client apps they expect your jellyfin server to be running and will search for it on your home network at install time, at which point I find just confirming the ipaddr found is correct and manually entering username/password works better than trying to run back to the admin gui to look for some hot-plug OK button that never seems to appear. From personal experience I find the client app really useful on the chromecast/smart-TVs, but on small tablet screens the app is painful and a web browser is better; although having said that I do not use a tablet for viewing and only tried that to see what it looked like.

Then enjoy the benefit of having your media available anywhere… what I really like is being able to pause and back-arrow out of something I am watching in the living room and exit the app, then start the app in the bedroom, go to the “continue watching” section and resume where I left off; I don’t know how I lived without it.

And a separate section on the issues of using remote disks

Disk labels

Lots of external disks have default disk labels with spaces in the name (ie: “One Touch” for seagate one touch drives); that is difficult to manage for entries in fstab, exports and samba. Install gparted, unmount the disk, use gparted on the partition (ie: gparted /dev/sdc2), right click on the partition and change the label; this does it non-destructively. That is a lot easier than trying to figure out where quotes and backslashes are needed in configuration files… and not all config files allow spaces at all.
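
If you prefer the command line, the same non-destructive relabel can be done with the filesystem label utilities (with the partition unmounted); these are examples only, so check you are using the right tool for the filesystem type on your disk:

e2label /dev/sdc2 JELLYFIN         # ext2/3/4 partitions (e2fsprogs)
fatlabel /dev/sdc2 JELLYFIN        # FAT32 partitions (dosfstools)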

Disk mount locations, permissions and issues

What you must remember for all the examples discussed here is that the way I have decided to use Jellyfin is with a remote disk (as my Jellyfin instance runs in a VM the disk will always be remote, even if plugged into the same physical machine as the VM), and to avoid having to reconfigure the VM whenever the disk is moved, Jellyfin within the VM is always configured to look for its libraries at /mnt/media1. All discussions and examples are related to getting the disk available at /mnt/media1 and writeable by the jellyfin user in the VM.

It should also probably be noted that as things like image uploads seem to be stored in locations within the VM rather than on the media disk by default [do check the store images in folders option on every library you create to alter that], the media disk(s) probably do not have to be writeable for Jellyfin use, but as I had issues with permissions with CIFS shares everything here discusses writeable mounts, which will probably be helpful to know for other unrelated projects as well.

On Linux machines manually plugging in an external disk will normally place it under the logged on user’s /media/username directory (Debian) or /run/media/username (rhel based). For FAT32 disks (the default for large external HDs) all the files will be set to the ownership of the logged on user; for EXT4 disks I think ownership and permissions are treated as for a normal ext4 filesystem, but of course only the logged on user can traverse the initial /media/username directory path.

On Linux machines a FAT32 disk/directory must be exported using Samba (NFS cannot export FAT32). Samba can probably also export an EXT4 filesystem so you may think it easier just to go with Samba; just bear in mind EXT4/NFS is faster and in my experience more stable.
On Windows machines even though disks may get a different drive letter they seem to remember when a directory has been set to shared on them, but it always pays to check each time.

If you plug in your “Library” disk to a Windows machine it must be FAT32/NTFS and can only be shared via CIFS. If plugging into a Linux machine you can use EXT4 shared by either NFS or Samba or if a FAT32 disk it can only be shared by Samba.

My preference is for EXT4 filesystems as I like to LUKS encrypt all my external drives. I also dislike the need to install and configure Samba on each machine I might want to plug the external disk into when I already have NFS on all the Linux ones anyway.

The main issue with using EXT4 filesystems is that they must be attached to a Linux machine and user ownership and permissions are correctly maintained; an issue in that the Jellyfin processes are not going to be running under the same userid you used to populate the files on your disk. If mounted using samba that can be bypassed (see Samba notes below), but if NFS mounting you must change ownership of all the directories and files to the jellyfin user, which makes it difficult to add additional files using your own userid.

The issue with FAT32 filesystems is that while they can be plugged into both Windows and Linux machines, and are easy to share from a Windows machine, on Linux you will have to install and configure Samba on each machine you might plug them into. You must also remember to override the ownership of the remote mount on Samba mounts, as discussed below.

On the Jellyfin VM you need to install either the NFS or CIFS client tools (or both) depending on what you will use.
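
On a Debian based VM that is just the packages below (package names are the Debian ones, the rhel family equivalents differ slightly):

apt install nfs-common       # NFS client tools
apt install cifs-utils       # CIFS/Samba client tools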

Using NFS mounts

You must ensure all the directories on the remote disk are traversable/updateable by the Jellyfin user. But actually mounting an EXT4 disk using NFS is simple.

Examples are for an EXT4 filesystem with a disk label of JELLYFIN, so it is mounted under /media/mark/JELLYFIN; my Library folders are all under an Exported directory.

An /etc/exports entry on the Linux server exporting the disk (the Jellyfin VM is named jellyfin), when updating/changing it remember to “systemctl restart nfs-mountd.service”.

/media/mark/JELLYFIN/Exported jellyfin(rw,sync,no_subtree_check,no_root_squash)

Also on the server exporting the disk (only has to be done once) “firewall-cmd --add-service nfs;firewall-cmd --add-service nfs --permanent”.

An /etc/fstab entry on the Jellyfin VM, when updating fstab remember to “systemctl daemon-reload”. Note that I use noauto as I choose to manually mount my remote media.

vmhost3:/media/mark/JELLYFIN/Exported  /mnt/media1   nfs noauto,nofail,noexec,nosuid,noatime,nolock,intr,tcp,actimeo=1800 0 0
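
Then a quick manual test on the Jellyfin VM before pointing the application at it:

mount /mnt/media1        # picks up the noauto entry in fstab
ls -la /mnt/media1       # directories should be owned/traversable by the jellyfin user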

Using CIFS/Samba mounts to a disk on a Windows machine

Examples are for a FAT32 filesystem, my Library folders are all under an Exported directory.

On the Windows machine use file explorer to select the Exported directory, right click on it and select properties, select the sharing tab and share the directory with a name of Exported.

Critical notes: to avoid having to enter Windows user credentials on each mount use a credentials file as shown below; and even more importantly the mounted files will by default only be updateable by root, so use the fstab CIFS mount options uid/gid to set ownership (as far as the VM mounting it is concerned) to the jellyfin UID and GID (the values for your install can be obtained by grepping jellyfin from passwd and group). With those set correctly an “ls -la” of the mounted filesystem will show owner:group as jellyfin, which is required.

An /etc/fstab entry on the Jellyfin VM, when updating fstab remember to “systemctl daemon-reload”.

//192.168.1.178/Exported /mnt/media1 cifs noauto,uid=103,gid=110,credentials=/home/jellyfin/smb_credfile_windows.txt

An example of a credentials file for a Win10 Home machine being used to “share” the directory

username=windozeuser
password=userpassword
domain=

Using CIFS/Samba mounts to a disk on a Linux machine running Samba

Obviously the first step on the machine you will be plugging the disk into would be “apt install samba -y” (or dnf install if on a rhel type OS).

Then “systemctl stop smbd;systemctl disable smbd”. Required as not only do we need to edit the config, but remember that we are using external disks that may not be plugged in, so let’s not allow Samba to start automatically.
At this time you may as well also “firewall-cmd --add-service samba;firewall-cmd --add-service samba --permanent”.

Critical notes: remembering that externally mounted disks are normally mounted under /media/username and only username can traverse the path, you do not want any defaults when the directory is shared by samba to anonymous users (anonymous users will be treated as user nobody, which will not have permissions, which is why normally Samba shares are on world writeable directories or secured to a group/user in smbpasswd, but we do not want to waste time with that here), so you must use the “force user” entry set to the user that owns the files on the server, which in my case is always going to be me (mark) for disks under /media/mark.

Example of the share that needs to be added to /etc/samba/smb.conf (after changes “systemctl restart smbd”).

[Exported]
path = /media/mark/JELLYFIN/Exported
browseable = yes
writable = yes
read only = no
guest ok = yes
force user = mark

Critical notes: to avoid having to enter user credentials on each mount use a credentials file as shown below; and even more importantly the mounted files will by default not be updateable by the jellyfin user, so use the fstab CIFS mount options uid/gid to set ownership (as far as the VM mounting it is concerned) to the jellyfin UID and GID (the values for your install can be obtained by grepping jellyfin from passwd and group). With those set correctly an “ls -la” of the mounted filesystem will show owner:group as jellyfin, which is required.

An /etc/fstab entry on the Jellyfin VM, when updating fstab remember to “systemctl daemon-reload”.

//192.168.1.179/Exported /mnt/media1 cifs noauto,uid=103,gid=110,credentials=/home/jellyfin/smb_credfile_samba.txt

An example of a credentials file for a Linux machine with a default samba setup (default domain is WORKGROUP) and the Exported directory above. (Yes, without a credentials file it does prompt for user root and an empty password, on Debian12 with the default setup anyway; you would probably need to use smbpasswd to set up groups/users properly, but not for this jellyfin post.)

username=some_valid_linux_userid
password=
domain=WORKGROUP

CIFS/Samba troubleshooting notes

On the client server you are going to mount onto you must “apt install cifs-utils” (Debian12; rhel may use a different package name). It installs a lot of stuff so you may want to remove it again after testing.

To list shares available on the remote server using/testing a credential file

smbclient --authentication-file=/home/jellyfin/smb_credfile_samba.txt --list 192.168.1.179

To list shares available on the remote without a credential file (will be prompted for empty ROOT password)

smbclient --list 192.168.1.179

To manually mount/test a mount works before adding it to fstab
The command below prompts for a password for ROOT@ipaddr, which is just an empty enter anyway

mount -t cifs -o uid=103,gid=110 //192.168.1.179/Exported /mnt/media1

Or with a credential file containing root and a blank password, which skips that prompt

mount -t cifs -o uid=103,gid=110,credentials=/home/jellyfin/smb_credfile_samba.txt //192.168.1.179/Exported /mnt/media1

On the Samba server side logs are kept in /var/log/samba, and any mount errors (if the client was able to contact the server) will be in a file named log.servername (so in my case log.jellyfin as jellyfin is the client servername), or if the server name is not resolvable log.ipaddress, assuming you have left the rest of the default samba configuration file untouched.

Update: 09Apr2024
I have discovered that using the default FAT32 file system that comes on standard 4Tb external disks either doesn’t like large files or Linux doesn’t play well with that filesystem type. For some reason (portability) I decided to use the default filesystem on a seagate 4Tb external drive mounted to Linux; a directory containing large files became corrupt [Linux showed “d?????????”; plugging it into a windows machine, windows found no errors, but when trying to delete the damaged directory windows, even though file explorer showed it, said the directory did not exist and may have been moved; none of the hits on solutions a google search found for that windows error (there were many, this issue seems to be common for FAT32) worked].
A disk format fixed it, which is why you need to back up your media disk. So I would recommend using EXT4 over FAT32; however I myself am still using FAT32 and CIFS, because as it does not have all the consistency blocks, inodes, journalling (and LUKS encryption stuff) of EXT4 it means the FAT32 disk can store more media files (which is a noticeable space difference on a 4TB drive). My backup drive is 4TB LUKS+EXT4 and it is a few 100GB short of being able to hold all the data on my “live” FAT32 drive.
Up to you of course.

Update: 06Jul2025
There does seem to be a file limit somewhere. I hit a point where the jellyfin logs showed new media I added was being scanned by ffmpeg, but the media never made it to the point of being visible to the application. The only way I could resolve that was turning off all the external lookup checkboxes in the libraries, after which the new media files were presented to the application correctly. Irritating, but metadata can be added manually anyway.

Posted in Home Life | Comments Off on Jellyfin media server, and using remote disks

Apache Guacamole terminal server – self hosted

Apache Guacamole is a terminal server you can run in-house to provide access to pretty much any server you are running using nothing but a web-browser. It can be installed onto a physical host or VM, or run as docker containers. Installed natively you will need at least 4Gb of memory and a couple of CPUs; installed as containers it will run happily on a host with 2Gb memory (I have run it OK in 1.5Gb with minimal swap usage if the host is dedicated to that) and a couple of virtual CPUs.

Useful for those that do not already have tools to provide the ability to SSH, VNC or RDP into remote machines; it does not yet support the spice protocol, so it is not really suitable for KVM users unless you want to reconfigure the VMs from the default of spice to vnc (not as simple as it sounds, and if you are a Linux user, which you are if you are using KVM, then you already use remote-viewer which supports both spice and vnc for access to all your VM consoles).

For windows clients that may not have all the opensource tools Linux users have or those that simply like the idea of accessing everything via a web browser this is also useful.

There are lots of Videos on YouTube showing how easy Apache Guacamole is to setup and use. Interestingly none of those videos mention any of the pitfalls or issues you can expect from using it; so I will cover a few of those here.

If you have not already implemented remote access from your desktop to everything in your home lab this can be very useful, although in all honesty it is only easy to set up if you have already set up remote access to everything by other means. The reason I say that is covered in the list of available connection types below

    1. Kubernetes, I have not tried to figure this one out yet
    2. Telnet, nobody uses that to logon anymore
    3. RDP access to windows machines, is obviously going to be easier if you have already configured your windows machines to accept RDP connections
    4. VNC access to windows machines, is obviously going to be easier if you have already configured your windows machines to accept VNC connections (I use TigerVNC to remote access my windows machines as RDP is only available in windows professional/enterprise editions not home editions)
    5. SSH access to remote machines, is obviously easier if you already have an existing private key on your desktop for your user that you already use to SSH into your remote machines, so you can simply add the username and paste the existing key when defining SSH sessions for your user (which is not good practice, but I have put my thoughts on that below)
    6. SPICE is not yet implemented, which is bad if you use KVM virtual machines as by default all console sessions for KVM instances default to SPICE [which you have of course configured to fixed ports] (and I have a few notes on that below as well)

    My notes on point 5 (SSH) are that while it is probably best practice to have a private key per machine the user is likely to log on from, with the public key copied to all the servers the user is likely to want to ssh to from each of those machines, apart from it eventually becoming unmanageable… it does not actually work like that, which is why private keys should be secured tightly; you can copy the private key to any machine and it will work.

    For a home lab I will in most cases have one private key per user (not per machine) and use puppet to push the public key to all the servers (and puppet to push the private key to ‘desktop’ roles the user is in) so I can change the keys in a few minutes if needed… at the *nix level; which is where applications start making it messy…

    Example1: AWX (or ansible tower), as I do have a few ansible scripts I ran from the command line (ssh keys deployed by puppet of course) when I started playing with AWX I just imported the same SSH key used from the command line into AWX and it all just worked

    Example2: Guacamole (this post), I just copy/pasted the private key text into each host connection entry for my username, and SSH from Guacamole just worked [so make sure for those cases no definitions are “shared” as user/key should not be shared]

    Why are the examples relevant? Because in both those examples the SSH keys are now stored in databases, not just in the user’s .ssh directory, so using a tool like puppet or ansible to change keys globally is no longer an option. Possibly an ansible script could walk through the Guacamole SQL database searching for records containing the username and updating the key (it is in every user SSH connection definition); and other scripts for every other application that wants to use SSH… becoming unmanageable again. For Guacamole you could configure user/password instead and set all your servers to allow password logons across SSH, but passwords expire more often than keys.

    My thoughts on this: Just something to be aware of… lots of youtube videos on how easy Guacamole is to setup, none on the issues you will encounter as you periodically change keys and have to update every connection entry.

    My notes on point 6 (SPICE): when using VMs under KVM the default console type is SPICE, and of course you edit your virsh definitions to use a fixed port per VM and open firewall ports to the host for the connection, so from your remote desktop when things go wrong and you cannot ssh in you can connect to the VM console from your desktop with a simple “remote-viewer spice://hostname:portnumber --display $DISPLAY” to fix the problem.

    It is possible to use “virsh edit” to play with the “graphics” section to change the KVM console settings from SPICE to VNC, in which case you would simply change spice to vnc for remote-viewer and use “remote-viewer vnc://hostname:portnumber --display $DISPLAY” to access the console from the command line. And when it is configured for VNC, Guacamole can connect to it.

    However changing from SPICE to VNC gets more complicated with every update to KVM; SPICE now has a lot of hooks into USB virtual devices so it is not simply a case of changing the “graphics” section from spice to vnc anymore, and deleting all the USB entries tied to SPICE devices can leave you with no mouse when you VNC into a remote Gnome session, which is pretty fatal for troubleshooting. But it can be done.

    My thoughts on this: VNC has been implemented which is nice, RDP has been implemented which is only available on Windows professional editions (not available on Windows Home editions), but SPICE which is the default for all KVM VMs has not been implemented and is not considered a priority. So while it is nice that Apache Guacamole is free, remember it seems targeted at large commercial users (nobody at home has Windows Professional with RDP or is running kubernetes) while most Linux home users use KVM and spice. For home windows users TigerVNC server installed on your windows machines will work with the VNC connection type [without Guacamole, just using the command line “vncviewer -geometry 1366x768 windows-machine-ipaddr” works to test that]; and there are a couple of developers working on providing SPICE to Guacamole (thanks folks) so it may be available one day, but at the moment if you use KVM, Guacamole is not for you if you want it for console connections to KVM.

    I did mention earlier in the post that it is easier to use Guacamole if you already have an environment for remotely connecting to everything. What made it easy for me to set up was my existing environment, which of course is all run from a terminal session under Gnome to provide the display; my environment is…

    • All my KVM instances do not use port “auto” for consoles, but have explicit port numbers assigned (and firewall ports opened). The consoles, whether spice or vnc, are remotely accessible from my desktops for those occasions when the server stops accepting ssh sessions (those that are VNC can be added to Guacamole easily; those that are SPICE cannot). I just have a shell script “vmmenu” that lets me select any VM console on any of my VM hosts that I want to connect to
    • I have a single private SSH key for my personal userid on my desktop(s), public key deployed to all Linux servers and VMs by puppet-ce; so I can already SSH to any server without needing Guacamole. Configuring Guacamole to SSH to the servers was simply a case of using the same userid and pasting the same private key into the connection entry
    • I only have Windows Home Edition on my dual-boot laptop, and Windows Home Edition does not allow RDP into it. TigerVNC however is installed on it and “vncviewer” as mentioned above can remotely control that just as well as RDP could. Note: TigerVNC allows multiple connections to the same session, noticed only when I was playing with Guacamole as I had both a Guacamole (in a web browser) and a vncviewer connection active in different windows at the same time, and could see a mouse move in one interface move in the other; whether that is good or bad is up to you. You should configure each user session so each user has a different port, but you do not have to apparently :-)

    So with my environment the advantages Guacamole gives me are… none. The disadvantage is that web browsers (whether firefox, chrome, brave…) all chew through unreasonable amounts of real memory (and swap) on any Linux desktop.

    It is also fair to say you would never open up a Guacamole server to be accessible from the internet. Maybe if you had a VPN into your environment? (which you would need to resolve hostnames/ipaddrs anyway).

    Soooo… I am not sure who all those YouTube videos on how wonderful it is are aimed at. Nor where this solution is aimed: it is not at Linux users as SPICE is needed by anyone that uses KVM, not at home users as Windows Home edition does not have RDP, and not at corporate users as having ‘connection’ properties defined per user for every connection is not feasible (not that I would ever want anything to go near windows directory/ldap as that immediately breaks everything).

    My thoughts: I have no idea what target this tool is being aimed at, or what problem it is trying to solve as at the moment it is partially useful for a home lab environment but it works for what has been implemented. Additional protocols and authentication methods may (probably will) be added over time but I am still unsure of the target audience [ to allow access to remote servers using only a web browser… for those clients that do not already have clients for rdp/ssh/vnc/telnet/kubernetes (I cannot think of a single client OS that does not already have a client for the protocol) ].

    But it works, so lets continue on.

    Fast start setup using the docker container implementation

    1. Create a new VM, min 2CPU and 2Gb Memory (I used Debian as the OS)
      This is not covered here as you should already know how to create VMs
    2. Install docker
      This is not covered here as you should already know how to install docker (not podman)
    3. Install mariadb or mysql
      This is not covered here, you should know how to install this from the existing repos and it is different depending on whether you are using a rhel or debian family OS
    4. Use the provided container to generate the SQL required to create the DB, and create the DB
    5. Run it
    6. logon, (optionally change the admin user/password), create a group and user, and start creating connections

    Step 4 – create the mariadb/mysql database

    A container is provided that generates the DB schema; generate that, use mysql to create a database and permissions and then run the generated schema.

    docker run --rm guacamole/guacamole /opt/guacamole/bin/initdb.sh --mysql > initdb.sql
    mysql -u root -p      [reply to password prompt]
        CREATE DATABASE guacamole_db;
        CREATE USER 'guacamole_user'@'%' IDENTIFIED BY 'guacamole_pw';
        GRANT SELECT,INSERT,UPDATE,DELETE ON guacamole_db.* TO 'guacamole_user'@'%';
        FLUSH PRIVILEGES;
        USE guacamole_db;
        \. initdb.sql
        exit
    

    Step 5 – Run it

    The only thing to note is that “guacd” must be started first so the other container can link to it. You must also of course ensure the DB parameters match what you created just above.

    docker run --name mark-guacd -d guacamole/guacd
    docker run --name mark-guacamole \
        --link mark-guacd:guacd        \
        -e REMOTE_IP_VALVE_ENABLED=TRUE \
        -e MYSQL_HOSTNAME="192.168.1.189"  \
        -e MYSQL_PORT=3306 \
        -e MYSQL_DATABASE="guacamole_db" \
        -e MYSQL_USER="guacamole_user" \
        -e MYSQL_PASSWORD="guacamole_pw" \
        -d -p 8080:8080 guacamole/guacamole
    

    That’s it; when a “docker container list” shows they have gone from starting to started it is ready for use. Simple, yes?

    Step 6 – Use it

    Point your web-browser at your host port 8080 (remember to open the firewall port on your host) and you must have the guacamole/ part of the url… in the example above the URL would be http://192.168.1.189:8080/guacamole/ and logon with the defaults of user “guacadmin” with the password “guacadmin”. At that point you should create a group and your own admin user and delete the default one.
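
    If your docker host is running firewalld (rhel family, or if you added it to Debian), opening that port would be something like the below; if you use ufw or plain nftables the commands will differ:

    firewall-cmd --add-port=8080/tcp
    firewall-cmd --add-port=8080/tcp --permanent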

    Logon, under your “name” on the right top of the window is a “settings” option, use that to start creating “connections” to your servers.

    For ssh connections, as I mentioned earlier, I prefer to use a username and private ssh key; user/password may work instead. As I only have Windows Home edition (which does not provide an RDP server) I use TigerVNC server on windows machines and VNC in to port 5900, which works fine. For KVM consoles the graphics section of each KVM listens on explicit ports and ipaddrs (not “auto”), and a few I have converted from spice to VNC, but for the bulk of them I will wait until Guacamole supports the default KVM spice protocol and stick to remote-viewer rather than use Guacamole.

    Leave settings and go back to the home screen to start using the connections.

    One usage note: when you connect, the browser tab is dedicated to that connection; start another tab/window to the guacamole URL to start a connection to an additional machine (repeat as needed). Then you can just switch between the tabs to switch between your active connections. On the main URL page where it shows a list of machines you are connected to… ignore the pretty picture section as it does not show all the active connections, you need to look at the list a little further down the page.

    Guacamole showing two active connections

Posted in Unix, Virtual Machines | Comments Off on Apache Guacamole terminal server – self hosted

Accessing EXT4 drives from Windows10

People are still asking how to do this in forums today; presumably mostly dual boot users that want to look at their Linux partition.

This post is primarily for EXT4 filesystems as those are the ones used when installing supplied (--online) images into WSL2.

So there are two use cases: (1) those that have a dual boot environment on a single machine, and (2) those that want Windows to remote mount filesystems from a remote Linux machine.

Local mounting, everything on the one machine

For users with dual boot systems on the same physical hard drive

Trying to mount an ext4 partition from Windows on a dual boot machine when the linux partition is on the same HDD/SSD as the Windows partition seems to have limited options.

  • Commercial: Paragon software offers ExtFS for Windows. It allows you to read and write ext2 ext3 and ext4 from all Windows OS versions ( ref: http://www.paragon-software.com/home/extfs-windows/ )
  • Free and Commercial: Linux Reader from Diskinternals, which can mount all ext, HFS and ReiserFS too, but read-only in the free version at least ( ref: http://www.diskinternals.com/linux-reader/ ). You can copy files off; writing back may need the PRO commercial version
  • Free (GPL) solutions for “read only” on github, https://github.com/bobranten/Ext4Fsd which is a fork of the earlier https://github.com/matt-wu/Ext3Fsd/releases/tag/Ext3Fsd-0.69

Obviously none of the above or similar options will work if you have wisely LUKS encrypted that Linux partition.

For users with dual boot systems with Linux on a second hard drive

Can be done from WSL which can mount physical disks (entire disks, not partitions).

You can use WSL to mount entire disks formatted with ext4, but not individual partitions, so currently that is not an option when you have a linux partition on the same HDD/SSD as your windows OS; but if your Linux environment is on a separate disk you can mount it into a WSL instance.

There is microsoft documentation on how to do this at https://learn.microsoft.com/en-us/windows/wsl/wsl2-mount-disk.

Also a Youtube tutorial on doing it at https://www.youtube.com/watch?v=aX1vH1j7m7U.

However this has both pros and cons

  • pro: WSL virtual disks (and anything mounted onto them) are visible via a Windows File Explorer “Linux” tab if they exist making them easy to navigate, they are read-write so you can open and edit shell scripts with notepad for example
  • pro: the WSL Linux system is a full linux system so you can install the crypt utilities and use this method to mount LUKS encrypted partitions if they are on a separate physical disk
  • con: if you are going to install a Linux system under WSL anyway, do you really need a dual boot environment? (the answer of course is yes if you intend to use it rather than play with it; ie: if you are going to store data on it, it should be LUKS encrypted, which I do not think WSL supports for its virtual disks)

But… if you just want to play with Linux, just install it into WSL, which takes only a few commands and is much easier than having to partition disks and manually install an OS and edit the bootloader etc., so the issue will not exist. An example is shown below where an installed Ubuntu image is viewed using Windows File Explorer from the “Linux” tab.

Note: I assume Windows has some sort of API into WSL virtual disks to manage the ext4 file systems (a “mount” command within a WSL instance shows they are ext4, so that may be all they support) and will not magically recognize an external disk with an ext4 file system that is just plugged into a USB port. I have not tried that as all my external disks are LUKS encrypted, which it does not handle. I might try with a dying USB stick one day by making that ext4 to see if it does handle it.

Remote mounting, the filesystem is on a remote machine

Again multiple ways of doing this. It depends on your environment. Both options I would consider involve a lot of work.

If you want to avoid using WSL altogether, really your only option is SMB/NMB (SAMBA). Anyone who has been playing with Linux servers for a while (long before Windows10/WSL) and still has some Windows clients will have gone down this path. However while that may make it easy from a Windows user point of view, as the shares will (should, after a lot of effort) show up as available networked devices, setting up Samba on your Linux servers is not trivial; you do not want to do that on a lot of machines. It also tends to break with every upgrade and need rework.

If you have an old USB attached printer on one of those remote servers then Samba and CUPS may be the only option for using the printer; with newer ip printers not so much

That solution suits an environment with very few Linux servers and a lot of Windows clients, where whoever is on the clients has no interest in knowing that the filesystems are served from a unix server. However you need to setup users on the Linux server(s) for each Windows client user, decide on a domain server, and generally do a lot of work on the Linux server side.
For someone that just wants to mount an ext4 filesystem for single use that is overkill.

The second option, if you are a lone developer with a single Windows client, is to use WSL on the Windows10 machine and just NFS mount the remote filesystem you want to work with onto a directory in your WSL instance so it shows up under the Windows File Explorer “Linux” tab as just another directory to walk into.

That also requires some work on the remote servers to set up the NFS exports, plus a few commands within the WSL instance to mount them. And never mount a directory that has symbolic links and expect it to work… for example you may think of creating an exports entry of /mnt/shareme on the remote machine and placing in it symbolic links such as /mnt/shareme/marks_homedir pointing to /home/mark on the remote machine… the actual behaviour when that is mounted is that references to /mnt/shareme/marks_homedir will reference files under /home/mark on the local machine, not the remote one, as the links are resolved on the client side. Yes, a nasty trap; do not use links.

It should be noted that the issue of symbolic links causing problems is not limited to NFS mounts, although that exact trap should not confuse Samba (which should map exact directories and not links anyway; in a reference mapping case Samba would cope). I doubt there is a unix admin alive anywhere that has not done a “du -ks dirname” to check space in a directory and found it used say 100Mb, then done a recursive copy “cp -rp dirname newplace” and found the target disk 100% full after a few hundred Gb of data were copied… because somewhere in the directory structure was a link pointing up a directory level which was faithfully followed up a level, reached again and followed up a level, repeat until out of disk space (and even if mapped by Samba, trying to copy the directory and everything under it would hit the same loop condition).
Basically, try and avoid mounting any remote directory that has symbolic links under it.

I will not discuss Samba, setting that up is a never ending task. To implement NFS mounts to a WSL instance however is simple.

  • start a powershell session
  • “wsl --list” to see what you have installed
  • if nothing, then “wsl --list --online” to see what is available and “wsl --install -d nameofoneofthem” to install one
  • always “wsl --update” to get the latest kernel
  • simply use “wsl” to drop into it
  • then “sudo apt install nfs-common”, and you have everything you need to mount remote exported filesystems which when mounted in WSL are read/write available to Windows via the “Linux” tab in Windows File Explorer
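Pulled together, the PowerShell and WSL side is only a handful of commands (the Ubuntu distribution name below is just an example, use whatever “wsl --list --online” shows you):

# from a PowerShell session
wsl --list --online
wsl --install -d Ubuntu
wsl --update
wsl
# then inside the WSL instance
sudo apt install nfs-common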

On the remote server to make for example /home/mark available, /etc/exports would contain

/home/mark *(rw,sync,no_subtree_check,no_root_squash,insecure)

The * before the ( should be replaced by the ipaddr of the machine running WSL (not the ip assigned to the WSL instance as it is NAT’ed via the host machine so it is the host machine ipaddr that is presented to NFS on the remote machine); but * works if you don’t care who connects.
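For example, locked down to a hypothetical Windows host at 192.168.1.50 the entry would be:

/home/mark 192.168.1.50(rw,sync,no_subtree_check,no_root_squash,insecure)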

If you change or add mountpoints to /etc/exports on the remote machine you must “systemctl restart nfs-mountd.service” on the remote machine to pick up the changes.

Under WSL simply “mount -t nfs xxx.xxx.xxx.xxx:/home/mark /some/local/dir/mountpoint”. From Windows File Explorer under the “Linux” tab, under the WSL instance, it is available under /some/local/dir/mountpoint as fully read-write.

Posted in Unix, windoze | Comments Off on Accessing EXT4 drives from Windows10

Setting up a DNS (dnsmasq) server for your home network

First, what this post is and is not: it covers only using dnsmasq.

Who this post is for

It is primarily for people that have a lot of VMs running in their
home network and are finding that keeping /etc/hosts files up to date
on multiple machines takes more time than it should.

It is for people that want a simple home network dns resolver
as the solution, rather than investigating deployment tools that could push out
hosts files to dozens of VMs/servers.

It is for people who used to have dnsmasq working and then it suddenly all
broke a few years ago
(which is a read-the-docs issue; the way short names were handled by default completely
changed, which is good because now all clients across all OSs can expect the
same responses).

And the biggee; it is for people who have tried to set it up but have issues
with short names not resolving, SERVFAIL responses coming back, or just a
general mess as a result. This post covers off the things that are probably
breaking in the way you have set it up.

And see the very last section on why you would want to; you may not want to bother.

Who this post is not for

It is definitely not for people that want to assign known ip-addresses
to the many devices that may connect to their home network via wireless connections…
as of course those devices get their info from your wireless router, which
is outside the scope of this.

That includes things like laptops that may have both a cabled static address plus a wireless
dhcp address to your network active at the same time; the wireless connection will really
mess things up as the dhcp settings happily take precedence over your static ones.

So on to DNSMASQ itself

As I am sure you are aware dnsmasq builds its entries by reading the /etc/hosts
file on the machine it is running on (by default, you can provide other files if you wish).
That should be simple, should it not? If your hosts file works it should also work in dnsmasq, right?

Of course not; your hosts files are probably populated with a mix of short names and FQDNs,
making it impossible for a remote client to know what format to use.

If you are reading this post because you have issues with short names not
resolving for some machines, long names that sometimes resolve and sometimes do not, or
generally strange and apparently inconsistent behaviour, then read on.

The key thing when using dnsmasq to provide DNS lookup services for
your home network is that everything in your home network should be in the
same domain.

So the issues you experience could be caused by

  • you have not configured your domain in dnsmasq
  • your servers were installed using defaults like localdomain or,
    if built using dhcp before assigning a static address, will be in your
    router's domain (Home, D-Link etc) instead of your home domain
  • you have made life overcomplicated by having both short names and FQDNs
    in the hosts file used by dnsmasq

STEP1 – configure DNSMASQ properly

First step: in any /etc/hosts file on servers running dnsmasq only have the
short names of the servers (ie: do not have entries like myserver1.myhomenet.org;
you would only have an entry for myserver1). If you have a FQDN in there you
are doing it all wrong.
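So the hosts file read by dnsmasq would look something like this (hypothetical names and addresses):

127.0.0.1      localhost
192.168.1.181  myserver1
192.168.1.182  myserver2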

Second step: in /etc/dnsmasq.conf search for the line “#local=/localnet/” and
change it to your domain, for example “local=/myhomenet.org/” (uncommented of course).

The effect of this change is that dnsmasq will append myhomenet.org to all the short
names read from the hosts file it uses (a short name being anything without a trailing
dot and domain data). You may wonder how this is going to help if you want to look up a short
name; read on, as it is the client's fault.
Remember to change the example myhomenet.org to your domain of course.

Third step:
In dnsmasq.conf uncomment “domain-needed” and “bogus-priv”. You do not want
short name queries forwarded to the internet;
and we will be correcting client queries later.

Fourth step:
In dnsmasq.conf uncomment “strict-order”

On your dnsmasq DNS server(s) you should have configured them to first look up
their own ip-address (have your local dnsmasq server ip as the first nameserver, as it
should be the first queried by tools like nslookup if run on your dnsmasq host;
otherwise you will find client queries work but queries on the dnsmasq host itself do not,
so you must also tell this dnsmasq server it can resolve names using itself).

If you do not have strict-order set, your dnsmasq server's nameservers will
be selected from the list in resolv.conf in random (or round robin) order
and you will end up with some queries being sent
to DNS servers that know nothing about your local domain (or, if step 3 was not
done, forwarded to the internet to resolve), and queries will randomly fail.

For those of you who may have played with this in the past and got
frustrated by some queries made on the dnsmasq host itself working and some not,
omitting strict-order
is probably why; it would have been occasionally querying the upstream server
instead of itself. (That applies to queries from things like “nslookup”; tools that do
not use dns but would have used the /etc/hosts file will still have worked,
masking any issue… assuming your nsswitch.conf has files as the first entry,
which I think you should have in a home network, as even if a client's hosts file
has nothing but localhost in it there is always going to be an exception.)

Slightly off topic, “dig” and “nslookup” work in different ways; if “dig”
is working and “nslookup” is not you have a config error in your DNS setup.
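Pulling the four steps together, the lines changed or uncommented in
/etc/dnsmasq.conf end up as below (using the example domain, change it to yours).

# the dnsmasq.conf settings from the steps above
domain-needed
bogus-priv
strict-order
local=/myhomenet.org/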

That is it for the dnsmasq config but I bet it is still not working :-).
We have to move onto the clients.

Random tip: if the DNS search list (/etc/resolv.conf) on the server running
dnsmasq does not contain
the ip of the machine running your dnsmasq instance, and the interface it listens
on is managed by NetworkManager,
the example below may help (eth0 is the
primary interface on my server and .181 is the server ip… I lock it down to
the server ip; if you do not do that 127.0.0.1 may work as the first entry).
The last nameserver (.1) is my router, so with strict-order if it cannot find
a name locally it will hunt the internet dns servers for it via the router
with whatever ISP defaults were set on that.

nmcli conn modify eth0 ipv4.dns 192.168.1.181,192.168.1.1
systemctl restart NetworkManager

STEP2 – configure your clients, that's the real problem

We have dnsmasq set up correctly, so why are queries for short names still failing?
Because the clients are misconfigured of course.

Now I do not know how many servers or VMs you have set up, but if you
have accepted all the default prompts you probably have localdomain as a domain.
If you installed a VM using DHCP and later changed it to a static address it probably
still has the domain name assigned by the DHCP server (or your router).

Guess what, that is not going to work. There are two things you can do:
manually edit /etc/resolv.conf on each server after every reboot, or fix
each client server.

Do it properly, fix each client server and for future installs use the
correct domain name instead of the defaults.

You may see in some of your client /etc/resolv.conf files entries like “search localdomain” or “domain localdomain” (or on some of my VMs built from DHCP before changing to static things like Home or D-Link).

That is obviously a problem that will prevent name lookups working… (well part of it as discussed later).

When a DNS lookup is performed on a short name (a hostname without a dot) the client will
append the domain/search value from
resolv.conf to the query to make a FQDN to be looked up.

Now if the search value in /etc/resolv.conf on the client was “myhomenet.org” as used
in the dnsmasq steps, then the short name query would work, as the client will
append the correct domain part and find a match in dnsmasq, which has also now been
configured to add the domain name to short names.

But if you have left install defaults like localdomain still lying around
on clients that will never work (unless of course all your servers are setup for
localdomain and you set that as the home network domain in dnsmasq).
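In other words, after the client fixes below a client /etc/resolv.conf should
end up looking something like this (using the example domain and the dnsmasq
server address used later in this post):

search myhomenet.org
nameserver 192.168.1.181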

So, first fix the server running dnsmasq, a unique client setup

This is what you want on the server running dnsmasq, not for clients in general.
The main difference is that the server(s) running dnsmasq must only reference themselves and
upstream DNS resolvers, never any other dnsmasq servers you may be running for the same
domain, or you will be in an endless loop as they refer to each other while trying to
resolve a mistyped host name.

The assumption in the examples is that the interface name is eth0 and your dns server
ip address is 192.168.1.181 with the next upstream server an internet (or router) dns server.
The next upstream server is required as in many cases you will want to resolve
hostnames/URLs out in the big wide world also.

RHEL has pretty much dropped /etc/sysconfig/network-scripts and you must use
NetworkManager; NetworkManager is on most Debian servers as well, although you
can still use the older network/interfaces.d files there.

NetworkManager: Assuming eth0 is your interface name and we use the example myhomenet.org.

nmcli conn modify eth0 ipv4.dns-search myhomenet.org
nmcli conn modify eth0 ipv4.dns 192.168.1.181,192.168.1.1
systemctl restart NetworkManager   # applies changes to resolv.conf, no connection drop

For debian static manual interfaces in /etc/network/interfaces.d

auto eth0
iface eth0 inet static
      address 192.168.1.181
      netmask 255.255.0.0
      gateway 192.168.1.1
      dns-nameserver 192.168.1.181
      dns-nameserver 192.168.1.1
      dns-search myhomenet.org
      dns-domain myhomenet.org

When the dnsmasq server is correct, update/correct all your clients

It goes without saying that to avoid doing this often, always use the correct domain name
and dns list when building a new VM or server so it will never need to be done on those.

It is important to note a few things here about clients in general that do not run
any copies of dnsmasq themselves.

  1. they can refer to multiple dnsmasq servers on your home network so they can
    resolve names if one is down
  2. while you could also include an upstream DNS server, that will probably stop
    things working correctly again; you should only list your home network dnsmasq servers

The second point above is an important one, you will remember we configured dnsmasq
to use strict order to always check the home network dnsmasq instance first.

Guess what, clients are also unlikely to use a strict order.
Depending upon what version of operating system you are running, lookup operations
by the client are quite likely to round-robin through the nameserver list rather
than use strict order, so they can
quite easily query the upstream server instead of your dnsmasq server;
this is something else that can cause dns lookups of servers in your home network
to sometimes work and sometimes not.

The solution I use is to have two dnsmasq servers so one is always available,
all clients only use those two for all name resolution (yes, including external)
and no client has any internet dns resolver address configured, relying on the
dnsmasq servers instead.

As the dnsmasq servers are configured with an upstream dns, any external
host name they are unable to resolve themselves (ie: google.com) the dnsmasq servers
will query the external DNS for and return the correct ip-address for the name to the client
requesting it. The client does not need an external dns entry, and not having one
avoids lookup problems if a client round-robins through name servers.

Remember this is just for the dns name resolution, once the client has the ip-address
it will cache it and all traffic is from the client direct to the ip-address (unless you go
through a proxy of course).

Based on the examples for dnsmasq above with a single server providing dnsmasq,
and letting that dnsmasq server query any upstream dns resolvers on behalf of the client(s),
a client configuration would be based on the above examples as below.
If you have a second dnsmasq server just add it to the dns ip list,
as long as the two dnsmasq servers do not reference each other in any way, to avoid
the endless lookup loop situation.

# NetworkManager
nmcli conn modify eth0 ipv4.dns-search myhomenet.org
nmcli conn modify eth0 ipv4.dns 192.168.1.181
systemctl restart NetworkManager   # applies changes to resolv.conf, no connection drop

For debian static manual interfaces in /etc/network/interfaces.d

# Debian /etc/network/interfaces.d files (xxx is the client ipaddr)
auto eth0
iface eth0 inet static
      address 192.168.1.xxx
      netmask 255.255.0.0
      gateway 192.168.1.1
      dns-nameserver 192.168.1.181
      dns-search myhomenet.org
      dns-domain myhomenet.org

Congratulations, it now of course works

You can look up your internal servers from any client by either short or FQDN names
using your home network domain without
any issue; and internet names are still resolvable for all your clients.

Troubleshooting

  • “dig” works but “nslookup” does not. You have misconfigured it
  • to check a nameserver use “nslookup hostnametolookup dnsserveripaddr”;
    if querying the dns server explicitly by dnsserveripaddr returns the
    results expected there is nothing wrong with it, the issue is your nameserver search order

You could of course manually edit /etc/resolv.conf to correct the search and nameserver
entries for testing, but those changes would be lost pretty quickly.

For search this may be an option for testing.
(Ref: https://man7.org/linux/man-pages/man5/resolv.conf.5.html)
The search keyword of a system’s resolv.conf file can be
overridden on a per-process basis by setting the environment
variable LOCALDOMAIN to a space-separated list of search domains.

That depends on the client implementing it of course.
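As an example, something like the below can be used for a quick one-off test,
assuming the tool goes through the glibc resolver (myserver1 being a hypothetical
short host name):

LOCALDOMAIN=myhomenet.org getent hosts myserver1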

The big question of course is why you would want to

So why would you want to run a home network DNS server ?.

The main reason for needing a DNS server for your home network would be
that you have a lot of servers or VMs and trying to keep their hosts files
all synchronised is becoming too much effort; you want to be able to just
edit a single file and have all your servers able to use it.

This would only be an issue if you did not have any sort of deployment
infrastructure like ansible/chef/puppet that could deploy a “template”
hosts file to all your servers from a single source file; and yes I do
mean a template file, not a static hosts file, as each hosts file would have to
be correctly set with the ip and hostname of the server it is being deployed to
as so many things depend on that.

Now suppose you did run two dnsmasq servers; without a deployment tool
to push a centrally edited hosts file to both servers and restart dnsmasq
you are already editing two hosts files for every change. Still a lot
less than the effort of doing so on every server, but they could also get out
of sync if manual edits on each are required.

You should (if you have enough headroom for another small VM on a different
physical server) run two copies; if you followed the post I have made here you
will have all your clients now doing internet name resolution via your dnsmasq
servers' upstream queries, so if you have a single dns server and it stops
you have lost name resolution not only to your servers but to the internet
(which is not a big issue, you just edit /etc/resolv.conf to insert a nameserver
line for your router or maybe one of googles nameservers to get internet access
back for that client so you can at least watch youtube while your dnsmasq server reboots).

Using something like puppet-ce or ansible would let you deploy a ‘source’
hosts file to both dnsmasq servers so you only need to edit the file in one place;
however you could also go wild and use them to deploy the hosts file to all
your servers, negating the need for a home network dns server at all… the drawback with
the latter of course being that anyone that can see your /etc/hosts file would then
know every machine in your network; best to have it on as few machines as possible.

Deployment tools have a learning curve you may not be interested in, so for
a home network dns setup I would say just run two dnsmasq servers, and on only
those two servers have a rsync job that runs occasionally to check for an
updated hosts file on whatever server you want to make edits to it on. Or if you
only want to run one then there is only one file to edit, so no syncing is needed at all.
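A minimal sketch of such a job, run from cron on the second dnsmasq server,
could be as below (dnsmasq1 is a hypothetical name for the server the hosts
file is edited on, and it assumes ssh keys are already in place for rsync):

#!/bin/bash
# pull the hosts file from the primary and only restart dnsmasq if it changed
rsync -a dnsmasq1:/etc/hosts /tmp/hosts.new
if ! cmp -s /tmp/hosts.new /etc/hosts; then
   cp /tmp/hosts.new /etc/hosts
   systemctl restart dnsmasq
fi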

Once you get above 4-5 VMs manually keeping /etc/hosts files up-to-date
on each becomes a nightmare. A home network DNS resolver (or two) becomes
essential.

Hopefully this long winded post has got you past any issues you were having
with setting one up using dnsmasq.

One last note: if the changes above do not result in network or NetworkManager startup correctly setting /etc/resolv.conf then… it could simply be that you do not have NetworkManager/resolved or a similar service installed that updates it; in which case simply vi/nano the resolv.conf file and set the values you want. That took me ages to work out when a new VM refused to correctly set resolv.conf, until a check on the last modified time showed nothing was updating it on a reboot at all. So a new first step: see if the file is being updated on boot and if not just manually edit it.

Posted in Unix | Comments Off on Setting up a DNS (dnsmasq) server for your home network

I have upgraded a few machines from Alma8.7 to 9.2 – my notes

I saw Alma had 9.2 available this month so decided to upgrade my 8.7 versions.

My old sledgehammer method of “dnf distro-sync --releasever=nn” did not work for this. It may have if I had spent months resolving conflicting packages, but there is an easier way.

This post at https://www.layerstack.com/resources/tutorials/How-to-upgrade-from-AlmaLinux8-to-AlmaLinux9 covered all the key steps. It has a few unnecessary steps and omitted a few issues you will hit that I have added here.

Look at the things to watch out for at the end of this post before deciding if you want to actually do this.

Things not mentioned in that post are that you have to clean up any old gpg-keys stored in rpm; I also had to remove one conflicting package (even though pre-upgrade found no issues); and the biggee was finding that the install had inserted a grub command line option to set selinux to permissive even if /etc/selinux/config was set to enforcing. That last biggee may resolve itself the next time you get a new kernel version, but if you are using enforcing you probably do not want to wait that long.

Warning: do not do the initial step to fully dnf update your installed system. I did that on one of my systems and it upgraded it from Alma8.7 to Alma8.8; the “leapp” (as of 21May2023) only supports upgrades from 8.7 and flags 8.8 as unsupported/inhibitor for upgrade to 9.2. They may have fixed that by the time you read this of course.

Another thing of note, which I have repeated in the “Things to watch out for” list, is that when it reboots to do the upgrade, if you have a console session you will see it drops into emergency recovery mode with the dracut prompt; do not, as I first did, start issuing commands to find out what has gone wrong… just wait, the upgrade wants to be in that place.

And a critical warning: the legacy network configuration method of using config files in /etc/sysconfig/network-scripts, deprecated in rhel 8, is now fully removed in rhel 9; if you use that method to configure the network your 9.x machine will boot without any networking configured. So you must fully convert your machine to use only NetworkManager before upgrading (and yes, I did ignore that and ended up with a machine with no network configured, but fortunately with a console). HOWEVER on one server I upgraded using network-scripts the network was not configured after the upgrade, while on another el9 the legacy scripts did configure the network successfully; so I don’t know what is going on there.

Then the initial steps to upgrade are as below.

# First "vi /etc/firewalld/firewalld.conf" and set "AllowZoneDrifing" off
setenforce 0
curl https://repo.almalinux.org/elevate/testing/elevate-testing.repo -o /etc/yum.repos.d/elevate-testing.repo
rpm --import https://repo.almalinux.org/elevate/RPM-GPG-KEY-ELevate
yum install -y leapp-upgrade leapp-data-almalinux
leapp preupgrade
# READ THE REPORT AND FIX ANY ISSUES AND REPEAT preupgrade ABOVE UNTIL NO INHIBITORS
leapp upgrade
grub2-install /dev/sda   # << whatever your disk is (see Note1 below)
reboot    # DO NOT CUT/PASTE THIS LINE, you will want to check for errors first

Note1: On legacy BIOS machines the report produced in the preupgrade and upgrade steps highlights that it will not upgrade grub on legacy boot machines (ie: any VM or cloud image) and needs the grub2-install run. My VM boot partition was /dev/sda1 but that was an ext4 filesystem grub2-install refused to write to (or may have, as it issued lots of warnings rather than errors), so I used /dev/sda which seems to have worked (now also done for a few /dev/vda so that seems the correct approach).

After rebooting check that the version in /etc/os-release is indeed 9.2 (or later if you are reading this long after I typed it in). It is important to note that the upgrade is not completed at the time of that last reboot. If you have a console session you will see that the "leapp" app is still performing post-install activities for quite a while after it has rebooted and allowed ssh logons.

The upgrade does not upgrade anything from third party repos so you need to do that yourself after the upgrade and do another relabel.

dnf -y update
touch /.autorelabel
reboot

Things to watch out for

  • At 21May2023 if you have upgraded to Alma 8.8 (done if you have run a "dnf update" lately) you cannot upgrade... not until they update the "leapp" script anyway, as the last version it supports for upgrade is 8.7
  • After the upgrade you must run another dnf update as nothing from third party repos is upgraded by "leapp" (probably a wise design decision as they cannot all be tested).
  • Not all third party repositories have packages (or even repos) for 9.x yet. DNF behaviour seems to have changed to abort if a repo cannot be reached rather than skipping it and moving on to the next, so unless you are familiar with the --disablerepo flag life can be painful here
  • Look through the preupgrade report and fix as many issues as you can, do not proceed until you have resolved all inhibitors
  • You must convert any legacy network config in /etc/sysconfig/network-scripts to a NetworkManager config as legacy network configuration is not supported at all in rhel 9. You can try "nmcli connection migrate" to attempt to convert your configurations but no guarantees if it is a complicated one.
  • you will find that the upgrade has inserted an enforcing=0 into your grub command line, so even if /etc/selinux/config is set to enforcing it will be ignored and you will be in permissive mode (see the grubby sketch after this list for one way to check and fix that). That may fix itself on the next kernel update that updates grub but you may want to fix it yourself.
  • As noted above you have to manually run another update to update from 3rd party repositories.
  • If you have any SHA1 keys stored in RPM the upgrade will be inhibited until you remove them.
  • IPTables has been deprecated, and while the upgrade still leaves it available for now some packages that used to use iptables no longer have versions with iptables support (for example fail2ban was removed from my servers during the upgrade as only firewalld versions are now available in the repos).
  • If upgrading a machine (or VM) you have had for a while you may see in the preupgrade report issues like "OLD PV header", in which case you must update that before doing the upgrade. That is simply "vgck --updatemetadata YourPVNameHere"
  • And absolutely horrible things to ignore: if like me you have a console available when you do the reboot for the upgrade, you will see truly horrible messages about the system dropping into emergency recovery mode with journal errors. You have plenty of time to try and start troubleshooting as it is in recovery mode and you can happily enter commands; don't. After 2-3 minutes you will see "leapp" messages start to appear on the console as it starts upgrading.
  • On the journal, if you have a journalled filesystem you will see after the system is stable a lot more journal rotation messages logged which is annoying. If like me you have lots of nice automation rules to manage rotating and cleaning journals they will have to be revisited.
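As mentioned in the selinux entry in the list above, one way to check for and remove the inserted enforcing=0 is with grubby; this is only a sketch of the approach, check what your own boot entries contain first.

cat /proc/cmdline                                        # confirm enforcing=0 is on the running kernel command line
grubby --info=ALL | grep args                            # see which boot entries carry it
grubby --update-kernel=ALL --remove-args="enforcing=0"   # remove it from all entries, then reboot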

Tips: the SHA1 keys

You can display the gpg-keys in RPM with

rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n'

You can use the standard RPM commands to display details of each key, for example

rpm -qi gpg-pubkey-4bd6ec30-4c37bb40

You can use the output to clean up old keys (ie: if the install date was a while ago it may not be needed). If you have upgraded from other OS'es you may have a lot of old keys; for example I had gpg-pubkeys for multiple versions of Fedora and CentOS (I migrated from Fedora to CentOS, then to Alma, and all the old gpg-pubkey entries were still there)

And the standard RPM command to remove/erase old keys

rpm -e gpg-pubkey-4bd6ec30-4c37bb40

If you have SHA1 signed packages you absolutely must keep, you can before the preupgrade and upgrade use "update-crypto-policies --set LEGACY"; just remember after you have finished the upgrade to use "update-crypto-policies --set DEFAULT" to set it back to the default. You will have to do that flip-flop every time you want to do anything with packages/repos using SHA1 keys.
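In other words the flip-flop is simply:

update-crypto-policies --set LEGACY     # before the leapp preupgrade/upgrade steps
# ...do the upgrade or SHA1 package work...
update-crypto-policies --set DEFAULT    # once finished, back to the default policy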

Posted in Unix | Comments Off on I have upgraded a few machines from Alma8.7 to 9.2 – my notes

Been a long while since I posted anything, the real world has been keeping me too busy. But here are a few things that have been irritating me lately.

Will hackers ever go away

There was an article on The Register with a link to a list of open proxy addresses used by the known Russian hacktivist group Killnet. Reference https://www.theregister.com/2023/02/06/killnet_proxy_ip_list/

I downloaded the list and compared the ip-addrs in it against the 15530 entries my automation rules have added to my blacklist filter since I last rolled it on 17 Aug 2022. There were 16 matches. Mainly just trying “/wp-login.php?action=register” or running “masscan”.

The masscan tool is on github in the public repo https://github.com/robertdavidgraham/masscan and can be used to scan the entire internet; it warns it can cause denial of service if used for ipv6 addresses. Looks like hacker kiddies have taken to playing with it as-is rather than trying anything original.

The list referred to by The Register site has around 16,000 entries as it is rather specifically targeted at the open proxies used by the russian killnet hacking team. Are they a worry? No, I have 15514 additional unique ip-addresses that have tried to hack into my server in less than a year, and this is a personal web server that gets almost no real traffic, so there are a lot more hackers to worry about than that one group (more in my list now, a few more realtime ones added and a couple I manually added as snort tells me there are still losers out there trying the log4j hack).

And a Docker update broke docker

The below used to work in daemon.json for docker; it stopped working after my last update and prevented docker starting.

{
  "insecure-registries" : [ "docker-local:5000" ]
}
{
  "dns": ["192.168.1.179", "192.168.1.1", "8.8.8.8"]
}

No biggee, apart from I did not notice for a while. The fix is to just edit the file to be as below.


{
  "insecure-registries" : [ "docker-local:5000" ],
  "dns": ["192.168.1.179", "192.168.1.1", "8.8.8.8"]
}
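After merging the file into one JSON object a restart picks up the corrected config (assuming docker is managed by systemd, as it is on my servers):

systemctl restart docker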

Yes I do run a local insecure registry for internal use. It is only exposed externally via a https proxy so not an issue.


Bloody Java

Interesting article on the register at https://www.theregister.com/2023/01/27/oracle_java_licensing_change/

Many of you will already have come across discussions on those licensing changes on forums or technology websites.

Oracle bless them are short of money again, and are changing the licensing terms from “per user” to “per employee”, so if you have one Java server and two Java developers you are probably not paying much at the moment; if your company has 100 employees your license costs have just gone through the roof.

Avoiding Oracle Java on the server side is probably not too difficult. For a Java EE engine I personally try to stick to alternatives to anything from commercial vendors such as Oracle and IBM, and use Jetty as the Java EE server where possible as it has a very tiny footprint. There is also the opensource Glassfish Java EE server or the Apache Tomcat server for those wanting a heavier footprint server [note: with Eclipse and Apache licenses respectively (not GPL)]. It should be noted there is at least one company providing commercial support for Glassfish (according to Wikipedia), but as a general rule you are on your own with opensource, although it generally just works.

IDEs and compilers on the other hand may be an issue. There are a lot of IDEs out there that compile Java code (a useful list at https://blog.hubspot.com/website/best-java-compiler) but it is hard to determine what they use in the back-end to actually compile Java code.

It is most likely, but no guarantees, that the Java compiler packages used by most Linux systems are not provided by or directly based on anything by Oracle; that would generally be OpenJDK (jdk.java.net) and OpenJFX (https://gluonhq.com/products/javafx/) [note: OpenJDK seems to at some point feed into JavaSE non-free from Oracle (noted on the download pages on jdk.java.net that it is also available as an Oracle commercial build), not sure or care how that works but it may become an issue at some point].

Windows users will most likely have by default Oracle supplied Java runtimes and backend compilers.

I guess the key thing to note from the licensing changes is that you should avoid Oracle software because it will eventually sting you.

Posted in Home Life | Comments Off on

When puppetserver master CA expires

One issue with puppetserver CE is that the damn CA and key expire

OK. That is obviously a good idea; but it is a real pain to sort out.

Everything here is for puppetserver version 7.8.0 and puppet agents at versions 7.17.0, 6.26.0, 5.5.20 (alma and debian11, rocky and fedora33 respectively).

Some of the issues I had to get around were that Debian and RHEL family servers seem to have the certs (including copies of the expired ones) in different places, and I have one server with a different version of the puppet agent that… you guessed it, has them in a different place. Throw in a couple of servers where the puppet agent is downversion at 5.x and the locations change again.

Of course the major issue for most people is that it is very difficult to find out how to rebuild a new CA and puppetserver certificate; lots of pointless google hits before I found the solutions.

Oh, and the biggest issue was that it took a while to determine what the problem was; the first error was simply the message “Error: Could not run: stack level too deep” from a ‘puppet agent --test’ request; for those reading this post after searching on that error message, it probably means your puppetserver CA cert has finally expired, which I did not find at all obvious from that error message.

Anyway, agents cache the expired certificate from puppetserver in different places depending on OS and puppet agent version, likewise the agent keys. I could have made a smarter playbook to use ‘puppet config print | grep -i ssldir’ on all the servers; but to hell with that complexity. If a ‘rm -rf’ is done on a directory that does not exist it does no harm, so I just chose to swat every possible directory… because I did not want to do it manually as I had 10 VMs to sort out (and you may have more).

Fortunately I had used puppet earlier to deploy ansible, so all servers with a puppet agent had the userid, ssh keys, and my extremely restricted sudoers.d file for ansible deployed already; so I could use that to sort out all my servers (although I will have to revisit the restrictions as the ‘rm’ paths are not as tightly locked down as I thought).

As I had to clean-up multiple servers it was easiest to do it using ansible (actually it wasn’t; I probably spent longer getting the playbooks working than it would have taken to do it manually on each server, but next time it will just be a few commands).

Basically, for the cleanup to work all puppet agents must be stopped; if even one is left running it could post a cert request that would stop a new puppetserver CA from being created.

So I have used three playbooks: one to stop all puppet agents (and puppetserver when that host is in the inventory) and delete all agent certs, the second to stop puppetserver and delete all certs it knows about plus create a new CA and certificate, and the third to restart the agents. If you (correctly) do not have autosign configured you will need to manually sign the cert requests from the agents.

But if you have the issue described here, and need to regenerate the CA and certs, even if you do not use ansible you can pull the commands required from the three playbooks here… just remember that before running the commands in the second playbook ALL agents on all servers that run puppet agents must be stopped.

The shell script I use to run the playbooks showing the correct order

ansible-playbook -i ./hosts --limit always_up ./wipe_puppet_certs_part1.yml
ansible-playbook -i ./hosts --limit puppet ./wipe_puppet_certs_part2.yml
ansible-playbook -i ./hosts --limit always_up ./wipe_puppet_certs_part3.yml

Playbook 1 – stop agents and erase their certs

---
- name: Stop all puppet agents and wipe their certificates
  hosts: all
  vars:
    puppet_master: "puppet"
  tasks:
    - name: Stop puppet master if puppetserver host
      become: "yes"
      command: "systemctl stop puppetserver"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Stop puppet agent
      become: "yes"
      command: "systemctl stop puppet"
      ignore_errors: yes

      # SCREAM the below does not delete the files on all agent servers, no bloody idea why
      # manually stopping puppet, copy/paste the rm command, start puppet; and it's all ok
      # but the entire point is not to do it manually
      # The issue is the below will not work
      #      /bin/rm -rf /etc/puppetlabs/puppet/ssl/*
      # The below will work; but we have to rely on puppet to recreate the directory
      #      /bin/rm -rf /etc/puppetlabs/puppet/ssl
      # Ansible must do some nasty expansion that screws it up with the /*.
    - name: Delete puppet agent certs dir 1
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppet/ssl"
      ignore_errors: yes

    - name: Delete puppet agent certs dir 2
      become: "yes"
      command: "/bin/rm -rf /var/lib/puppet/ssl"
      ignore_errors: yes

    - name: Delete puppet agent certs dir 3
      become: "yes"
      command: "/bin/rm -rf /etc/puppet/ssl"
      ignore_errors: yes

    - name: Delete puppet agent certs dir 4
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppetserver/ca"
      ignore_errors: yes

Playbook 2 – on puppetserver host only stop puppetserver, erase existing certs, create new ones, start puppetserver. Use your domain name in the alt-name.

---
- name: Force recreation of puppet master CA
  hosts: all
  vars:
    puppet_master: "puppet"
  tasks:
    - name: Stop puppet master
      become: "yes"
      command: "systemctl stop puppetserver" 
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Erase puppetserver certs on puppet master
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppetserver/ca"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Erase any local agent certs on puppet master
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppet/ssl"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Create new puppet master CA
      become: "yes"
      command: "/opt/puppetlabs/bin/puppetserver ca setup"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Create new puppet master certificate
      become: "yes"
      command: "/opt/puppetlabs/bin/puppetserver ca generate --certname puppet --subject-alt-names puppet.yourdomain.org --ca-client"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Start puppet master
      become: "yes"
      command: "systemctl start puppetserver" 
      when: inventory_hostname == puppet_master
      ignore_errors: yes

Playbook 3 – start agents, they will generate new certs and signing requests

---
- name: Start all puppet agents
  hosts: all
  vars:
  tasks:
    - name: Start puppet agent 
      become: "yes"
      command: "systemctl start puppet"
      ignore_errors: yes

Then if you are not using autosign (which you should not be), on the puppetserver host use ‘puppetserver ca list’ and ‘puppetserver ca sign --certname xxxx’ to sign the cert requests from the agents.
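For example (agent1.yourdomain.org being a hypothetical agent host name):

puppetserver ca list                                     # show outstanding cert requests
puppetserver ca sign --certname agent1.yourdomain.org    # sign one of them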

And some additional notes for V5.x agents

There is an additional step if you have any agents in the 5.x (and possibly 6.x) range. In puppetserver version 7 the certificates are chained and version 5.x agents cannot handle that; they only retrieve the first key in the chain and cannot authenticate it. Documented at https://puppet.com/docs/puppetserver/5.3/intermediate_ca_configuration.html but in the simplest terms you must copy the entire CA key to each 5.x version puppet agent manually. You must also set chaining to ‘leaf’ or you will still get lots of certificate verification failed errors.

Puppet is setup so that old keys are cached, so old agents were able to update their personal server keys and keep working until now, but we have just recreated all the keys so the new keys have to be copied to the older version servers and they need to be configured not to do a full chain check they can never complete.

Ideally the cert would be copied from the puppetserver machine, but all V7 agents seem to retrieve the entire certificate so if your ansible host is running with a recent puppet agent version the below playbook will work to get those old V5.x agents working again. It is basically just the steps from the webpage document reference above put into a playbook so I don’t have to do it manually on all servers. Note: you may need to reply ‘y’ to fingerprint prompts for the scp step as ansible likes to use sftp rather than scp (as I ran the first three on all servers with no issue but still got a prompt for one of mine when it ran the scp in this playbook).

---
- name: Copy CA keys to old version 5 agents
  hosts: oldhost1,oldhost2
  vars:
    user: ansible
  tasks:
    - name: Copy new CA key to V5.2 puppet agents
      local_action: "command scp /etc/puppetlabs/puppet/ssl/certs/ca.pem {{user}}@{{inventory_hostname}}:/var/tmp/ca.pem"
      ignore_errors: yes
    - name: Install key on V5.2 puppet agents
      become: "yes"
      command: "/bin/mv /var/tmp/ca.pem /etc/puppet/ssl/certs/ca.pem"
      ignore_errors: yes
    - name: Alter cert revocation handling
      become: "yes"
      command: "puppet config set --section main certificate_revocation leaf"
      ignore_errors: yes
    - name: Restart puppet agent 
      become: "yes"
      command: "systemctl restart puppet"
      ignore_errors: yes

I have a few extra lines in the bash file I use to run the playbooks, just to be absolutely sure I only hit the servers that are 5.x for that last additional playbook.

ansible-playbook -i ./hosts --limit oldhost1 ./wipe_puppet_certs_part4.yml
ansible-playbook -i ./hosts --limit oldhost2 ./wipe_puppet_certs_part4.yml

And that's it. Everything should be working again.

Posted in Automation | Comments Off on When puppetserver master CA expires

Writing Ansible modules using bash instead of python

If any of you have been paying attention I have lately been looking into ansible.

First a disclaimer: I use ‘puppetserver’ and puppet agents, as since they moved away from Ruby to their own scripting language, which is pretty much plain English, it is incredibly easy to configure. if / elsif / else syntax means I can have a simple config for say ‘desktop’ that configures a desktop selecting appropriate packages for Ubuntu/Debian/Fedora/CentOS/Rocky and for specific versions (ie: CentOS8 is missing packages that were in CentOS7, debian uses completely different package names etc.). And puppet has a fantastic templating feature that maybe one day in the future ansible will be able to match.

Ansible, with the playbook parsing json responses from servers, can have the playbook configured to run separate tasks depending on the results returned in previous steps, but yaml files are not plain English and it doesn’t really support targets such as ‘configure desktop regardless of OS’; at the moment you are better off having a separate set of playbooks per OS type… or more simply it is not as readable or manageable yet.

Ansible is also primarily targeted at (has builtin/supplied modules for) Linux servers although it supports windoze as well; the main issue with ansible is that most of the modules are written in python(3). In the Linux world that is not really an issue as it is almost impossible to have a Linux machine without python. On RHEL based systems it is so tightly integrated it is impossible to remove (it’s needed by systemd, firewall, dnf etc.); fortunately even though Debian(11) installs it by default it is possible to remove it on that OS, so I was able to test the examples here on a machine without python installed; although of course the tests were against Linux machines.

The advantage of ansible is that there are many operating systems that support some variant of ssh/scp/sftp even if they do not support python (yes people, there are operating systems that do not and probably never will have python ported to them; some I work on) and ansible allows modules to be written in any language so modules can in theory be written in any scripting language a target machine supports.

The basic function of an ansible playbook ‘task’ seems to be to use sftp to copy the module for a task from its local repository or filesystem to the target machine, also sftp a file containing all the data parameters in the playbook to the target machine, and run the script with the datafile of parameters as argument one for the module (script); at the end of the task it deletes the two files it copied to the target.

What I am not sure of is how it triggers the module to run, on *nix it probably uses ‘sh’, windoze powershell; but can it be configured to run other interpreters such as rexx/clist/tacl(or gtacl) etc. If it can then it can poke its gaping wide security hole into and manage every machine that exists in theory.

By security hole I of course just mean that it needs god-like access from ssh keys alone (you do not want 2FA for every step in a playbook), and despite decades of advice on keeping private ssh keys secure, inadvertently published ones still keep popping up on forums that intend you no good; and ‘sudo’ does not care about where you connected from, so anybody with that private key and access to your network is god on your infrastructure; of course you could be paranoid and have a separate key pair per server, but if you have hundreds of them that is a maintenance nightmare.

Anyway, my interest in ansible is primarily in whether it can easily manage machines that will never have python installed on them. It is also fair to say that if I am to write anything extremely complicated and possibly destructive I would prefer to do it in bash rather than in a language like python I am not really familiar with yet.

As such I have no need of python modules and need to try to avoid letting any python modules be deployed on machines I am testing against. As noted above I managed to remove python from a Debian11 server so am able to test against that.

As all my lab servers are currently *nix servers I don’t really have the opportunity to test against all the different systems I would like to, to see if ansible works on them (although I might try and get netrexx or oorexx as targets on a few Linux servers).

This post is about how to write ansible modules using the bash shell rather than python.

It is also worth noting where to place your custom modules so they can be located by ansible. If on a Linux machine using ‘ansible-playbook’ from the command line for testing, it is easiest to simply create a directory called ‘library’ under the directory your playbook is in; for testing that would simply be your working directory and they will be found in there. You can also use the environment variable ANSIBLE_LIBRARY to specify a list of locations if you want a module available to all your playbooks. Note: there are many other ansible environment variables that can be useful, refer to the ansible docs.

You should also note that the ‘ansible-doc’ command only works against python modules, and while it is recommended that you write the documentation in a Python file adjacent to the module file… don’t do that, or the ansible-playbook command will try and run the .py file instead of the .sh file. Just ensure it is well documented in your site specific documentation.
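For testing, the working directory layout ends up looking like this (the file names are the ones used in the example later in this post):

workingdir/
    my_example_module.yml       # the playbook
    dummy_file.yml              # tasks file included by the playbook
    library/
        my_example_module.sh    # the bash module, found automatically when
                                # ansible-playbook is run from workingdir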

While the example module shown in this post may seem a little complicated, one thing you must note is that by default an ansible task will run a ‘gather_facts’ step on the target machine to populate variables such as OS family, OS version, hostname, and basically everything about the target machine. That is done by a python module so is not possible on target machines without python, so the example here sets ‘gather_facts: no’ and the module obtains what it needs itself, as well as returning the information to the playbook for use.

It’s also a little more complicated than it needs to be in that I was curious as to how variables from the playbook would be passed if indented in the playbook under another value; they are passed as json if embedded, for example

Task entries
  dest: /some/dir
  state: present

Results in the file passed as arg1 to the module on the target machine containing
  dest="/some/dir"
  state="present"

Task entries
  parms:
     dest: /some/dir
     state: present

Results in the file passed as arg1 to the module on the target machine containing
  parms="{'dest': '/tmp/hello', 'state': 'present'}"

Knowing in what format the data values are going to be provided to your script is a rather important thing :-). In a bash or sh script you can set the variables simply using “source $1”, but you do need to know the values will be in different formats depending on indent level. My example script here will handle both of the above examples but not any further levels of indentation. There will be command line json parsers that could help on Linux, but remembering I’m curious about non-linux servers, I need the scripts to do everything themselves.

For my hosts file I included a ‘rocky8’, ‘debian11’, and ‘fedora32’ server. The bash example returned the expected results from all of them.

It should also be noted that the test against server name in the example playbook step always returns ‘changed’ as the ‘dummy_file.yml’ uses ‘local_action’, which seems to always return changed as true regardless of what we stuff in stdout, so the command function must return its own data fields. Where only the bash module is used we control that value.

But enough rambling. ‘cd’ to a working directory, ‘mkdir library’.

Create a file my_example_module.yml and paste in the below.

# Demo playbook to run my bash ansible module passing
# parms in 'dest' and 'state' in both ways permitted.
# 
# Note: the last example also shows how to take action based on the returned result,
# in this case pull in a yml file containing a list of additional tasks for a specific
# hostname (in my test env the ansible 'hosts' file for the test had ip-addrs only
# so the hostname test is against the returned values).
# While ansible has builtins for hostname and OS info tests such as
#    when: ansible_facts['os_family'] == "RedHat" and ansible_facts['lsb']['major_release'] | int >= 6
# that is only useful for target hosts that have python installed and you use the default 'gather_facts: yes',
# using a bash module implies the target does not have python so we use 'gather_facts: no' so we have
# to do our own tests; and it is a useful example anyway :-).
--- 
- hosts: all
  gather_facts: no
  tasks:
  - name: test direct parms
    become: false
    my_example_module:
      dest: /tmp/hello
      state: present
    register: result
  - debug: var=result
  - name: testing imbedded parms 
    become: false
    my_example_module:
      parms:
        dest: /tmp/hello
        state: present
    register: result 
  - debug: var=result
  - set_fact: target_host="{{ result.hostinfo[0].hostname}}" 
  - include: dummy_file.yml
    when: target_host == 'nagios2'
    ignore_errors: true

Change the hostname in the target_host test condition near the end of the above file to one of your hostnames, it is an example of running a tasks file for a single host and will be skipped if no hostnames match.

Create a file dummy_file.yml (used by the hostname test step) containing the below

  - name: A dummy task to test it is triggered
    local_action: "command echo '{\"changed\": false, \"msg\": \"Dummy task run\"'"
    register: result
  - debug: var=result

Create a file library/my_example_module.sh and paste in the below

#!/bin/bash 
# =====================================================================================================
#
# my_example_module.sh     - example bash ansible module 'my_example_module'
#
# Description:
#   Demonstration of writing an ansible module using the bash shell
#   (1) handles two parameters passed to it (dest, state) passed at either the
#       top level or indented under a parms: field
#   (2) returns useful host OS details (from os-release) as these are useful
#       for logic branching (ie: centos7 and centos8 have different packages
#       available so you need to know what targets modules run on).
#       ---obviously, this method is only useful for linux target hosts
#       Ansible functions such as
#            when: ansible_facts['os_family'] == "RedHat" and ansible_facts['lsb']['major_release'] | int >= 6
#       are obviously not useful for targets that do not run python where we must
#       set gather_facts: no
#
#   Items of note...
#     * all values in the playbook are passed as var=value in a file sent to
#       the remote host and passed as an argument to the script as $1 so script
#       variables can be set with a simple 'source' command from $1
#     * data values placed immediately after the module name can be used 'as-is'
#       (see example 1) 
#           dest=/tmp/hello
#           state=present
#       however if values are imbedded they will be passed as a
#       JSON string which needs to be parsed
#       (see example 2 where values are placed under a 'parms:' tag)
#           parms='{'"'"'dest'"'"': '"'"'/tmp/hello'"'"', '"'"'state'"'"': '"'"'present'"'"'}'
#       which after the 'source' command sets the parms variable to
#           {'dest': '/tmp/hello', 'state': 'present'}
#       which needs to be parsed to extract the variables
#     * so, don't imbed more than needed for playbook readability or your script will be messy
#
#     * the 'failed' variable indicates to the ansible caller if the script has failed
#       or not so should be set by the script (failed value in ansible completion display)
#     * the 'changed' variable indicates to the ansible caller if the script has changed
#       anything on the host so should be set by the script (the changed value in ansible
#       completion display). You should set that if your module changes anything, but
#       this example has it hard coded as false in the response as the script changes nothing
#     * oh yes, the output/response must be a JSON string, if you have trouble with your
#       outputs try locating the error with https://jsonlint.com/
#
# Usage/Testing
#    Under your current working directory 'mkdir library' and place your script
#    in there as (for this example) 'my_example_module.sh'.
#    Then as long as the ansible-playbook command is run from your working
#    directory the playbook will magically find and run module my_example_module
#    Obviously for 'production' you would have a dedicated playbook directory
#    in one of the normal locations or use an environment variable to set the
#    ansible library path, but for testing you do want to keep it isolated to
#    your working directory path :-)
#
# Examples of use in a playbook,
#    1st example is vars at top level, 2nd is imbedded under parms:
#    We use 'gather_facts: no' as using bash modules implies that the
#    targets are servers without python installed so that would fail :-)
#
#   --- 
#   - hosts: all
#     gather_facts: no
#     tasks:
#     - name: example 1 test direct parms
#       my_example_module:
#         dest: /tmp/hello
#         state: present
#       register: result
#     - debug: var=result
#     - name: example 2 test embedded parms
#       my_example_module:
#         parms:
#           dest: /tmp/hello
#           state: present
#       register: result 
#     - debug: var=result
#
#
# Example response produced
#  {
#  	"changed": false,
#  	"failed": false,
#  	"msg": "test run, parms were dest=/somedir state=missing",
#  	"hostinfo": [{
#  		"hostname": "hawk",
#  		"osid": "rocky",
#  		"osname": "Rocky Linux",
#  		"osversion": [{
#  			"major": "8",
#  			"minor": "4"
#  		}]
#  	}]
#  }
#
# =====================================================================================================

source "$1"       # load all the data values passed in the temporary file
failed="false"    # default is that we have not failed

# If data was passed as a "parms:" subgroup it will be in JSON format such as the below
# {'dest': '/tmp/hello', 'state': 'present'}
# So we need to convert it to dest=xx and state=xx to set the variables
# Parsing for variable name as well as value allows them to be passed in any order
if [ "${parms}." != "." ];
then
   isjson=${parms:0:1}             # field name in playbook is parms:
   if [ "${isjson}." == '{.' ]    # If it is in json format will be {'dest': '/tmp/hello', 'state': 'present'}
   then
     f1=`echo "${parms}" | awk -F\' {'print $2'}`
     d1=`echo "${parms}" | awk -F\' {'print $4'}`
     f2=`echo "${parms}" | awk -F\' {'print $6'}`
     d2=`echo "${parms}" | awk -F\' {'print $8'}`
     export ${f1}="${d1}"     # must use 'export' or the value of f1 is treated as a command
     export ${f2}="${d2}"
   else
      failed="true"
      printf '{ "changed": false, "failed": %s, "msg": "*** Invalid parameters ***" }' "${failed}"
      exit 1
   fi
fi
# Else data was passed as direct values so will have been set by the source command, no parsing needed

# You would of course always check all expected data was provided
if [ "${dest}." == "." -o "${state}." == "." ];
then
   failed="true"
   printf '{ "changed": false, "failed": %s, "msg": "*** Missing parameters ***" }' "${failed}"
   exit 1
fi

OSHOST="$(uname -n)"                                          # Get the node name (host name)
if [ -r /etc/os-release ];
then
   # /etc/os-release is expected to have " around the values, we don't check in this
   # example but assume correct and strip them out.
   # In the real world test for all types of quotes or no quotes :-)
   OSID=`grep '^ID=' /etc/os-release | awk -F\= {'print $2'} | sed -e 's/"//g'`     # Get the OS ID (ie: "rocky")
   OSNAME=`grep '^NAME=' /etc/os-release | awk -F\= {'print $2'} | sed -e 's/"//g'` # Get the OS Name (ie: "Rocky Linux")
   osversion=`grep '^VERSION_ID=' /etc/os-release | awk -F\= {'print $2'} | sed -e 's/"//g'` # Get OS Version (ie: "8.4")
   OSVER_MAJOR=`echo "${osversion}" | awk -F. {'print $1'}`
   OSVER_MINOR=`echo "${osversion}" | awk -F. {'print $2'}`
   if [ "${OSVER_MINOR}." == "." ];   # Debian 11 (at least what I run) does't have a minor version
   then
      OSVER_MINOR="0"
   fi
   hostinfo=`printf '{"hostname": "%s", "osid": "%s", "osname": "%s", "osversion": [{"major": "%s","minor": "%s"}]}' \
            "${OSHOST}" "${OSID}" "${OSNAME}" "${OSVER_MAJOR}" "${OSVER_MINOR}"`
else
   hostinfo=`printf '{"hostname": "%s", "osid": "missing", "osname": "missing", "osversion": [{"major": "0","minor": "0"}]}' "${OSHOST}"`
fi

# Return the JSON response string with a bunch of variables we want to pass back
printf '{ "changed": false, "failed": %s, "msg": "test run, parms were dest=%s state=%s", "hostinfo": [%s] }' \
	 "${failed}" "$dest" "${state}" "${hostinfo}"
exit 0
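
Before wiring the module into a playbook you can give it a quick standalone test straight from the shell, faking the data file ansible would normally generate. This is only a sketch; the /tmp file name used here is just an example.

# create a throwaway data file with the two values the module expects
cat > /tmp/my_example_module_args <<'EOF'
dest=/tmp/hello
state=present
EOF
# run the module by hand, passing the data file as $1 just as ansible does
bash library/my_example_module.sh /tmp/my_example_module_args
# the output must be one valid JSON string; on the control machine (which has
# python) you can confirm that by piping it through a JSON parser
bash library/my_example_module.sh /tmp/my_example_module_args | python3 -m json.tool
rm -f /tmp/my_example_module_args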

Create a file named hosts and enter a list of the hostnames or ip-addresses you want to test against as below (using your own machine addresses of course). Note that the python interpreter override is required for Debian 11 servers; it does no harm on the others.

localhost
192.168.1.177
192.168.1.187
#192.168.1.9
#test_host ansible_port=5555 ansible_host=192.168.1.9
[all:vars]
ansible_python_interpreter=/usr/bin/python3

[test_group]
localhost
192.168.1.177
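
As a quick sanity check that ansible can actually reach those hosts before running the playbook, an ad-hoc command using the 'raw' module works even on targets without python (the usual 'ping' module needs python on the target). This assumes ssh key access to each host is already in place.

# reachability check that only needs ssh on the targets
ansible all -i hosts -m raw -a "uname -n"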

Create a file named TEST.sh, simply because it is easier to run that multiple times than to type in the entire command each time. Place into that file

ansible-playbook -i hosts my_example_module.yml
#ansible-playbook -i hosts my_example_nopython.yml

Yes, the last line is commented out; you have not created that file yet.

You are ready to go. Simply run ‘bash TEST.sh’ and watch it run; you have your first working bash module.

Now, you are probably wondering about the commented example in the TEST.sh file above.

As mentioned I am curious as to how to use ansible to manage servers that do not have python installed, and have been thinking about how to do it.

This last example avoids the default ansible python modules: it manually copies across the script/module to be executed, manually creates the data input file to be used, manually runs the ‘/bin/bash’ command to execute it, and then cleans up the files it copied.

While explicitly running the ‘/bin/bash’ command is overkill for Linux servers, where a line at the start of the file already tells the system what script execution program should run the script, it shows how you could in theory use the ‘raw’ module to invoke any script processor on the target machine.
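
As a minimal sketch of that idea, the same thing can be done ad-hoc; the interpreter and script path below are made up, the point is only that 'raw' runs whatever command string you give it over ssh.

# invoke an arbitrary interpreter on the target via the raw module
# (ksh and the script path are hypothetical examples)
ansible 192.168.1.177 -i hosts -m raw -a "/usr/bin/ksh /var/tmp/some_report.ksh"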

I must point out it is a badly written example: ansible is a configuration management tool, so it has inbuilt functions to copy files from the ansible server to a managed server, and the manual ‘scp’ step is probably not necessary; but I am trying to use as few inbuilt functions as possible for this example. Also, in a managed environment you would probably not scp files from the local server but use curl/wget to pull them from a git repository; however not all operating systems provide tools like wget/curl, so knowing that a manual scp is one way to get files there is useful.
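
For example, the scp copy task could be swapped for a pull done on the target itself (via a raw task); the repository URL below is entirely made up, substitute your own, and it assumes the target can reach your git server.

# run on the target instead of copying from the control machine
curl -fsSL -o /var/tmp/my_example_module.sh http://gitserver.example.lan/repos/ansible-modules/my_example_module.sh
# or, if the target has wget rather than curl
wget -q -O /var/tmp/my_example_module.sh http://gitserver.example.lan/repos/ansible-modules/my_example_module.sh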

Anyway, this example copies exactly the same module used in the example above across to the target server, creates a data file of parms, runs the module explicitly specifying /bin/bash as the execution shell, and deletes the two copied files; just as ansible would do in the background.

You could take it a lot further, for example not cleaning up the script file and adding a prior step to see if it already exists on the target so the copy can be skipped if it does; useful if you have a huge farm of servers and the files being moved about are large. But all that is beyond the scope of this post.

The playbook to do that is below. Create the file my_example_nopython.yml, paste in the contents below, and uncomment the line in the TEST.sh file to confirm it works. You must of course change the scriptsourcedir value to the full path of the working directory you are using, and change the host ip-addr used to one of your servers.

# Example of using a playbook to simulate what ansible does.
# MUST have 'gather_facts: no' if the target server does not have python3 installed as
# gathering facts is an ansible default module that is of course written in python.
#
# Obviously in the real world you would not copy scripts from a local filesystem but pull them
# from a source repository (but as examples go this is one you can copy and work with immediately...
# after updating the hosts (and hosts file) and script source location of course)
#
# Uses the my demo bash module as the script to run so we must populate a file with the
# two data values it expects, done with a simple echo to a file in this example.
#
# NOTE: if the target server does not have python installed ansible will still happily
#       (assuming you disabled facts gathering, which would otherwise cause a failure)
#       as part of its inbuilt processing copy a python module to the target along with
#       a file containing data values and try to run the module (which of course fails);
#       that is the functionality we are duplicating here as an example, and you can
#       easily build on this to make things a lot more complicated :-)
#       (ansible probably uses sftp to copy the files, as the sftp subsystem needs to be enabled)
#       And we of course copy my example module written in bash as we want the demo to work :-)
#
# Why is a playbook like this important?
# Many servers that are non-Linux (even non-*nix) support some form of ssh/scp/sftp.
# Using a playbook like this can let you handle the quirks of those systems where supplied
# ansible default modules cannot.
--- 
- hosts: 192.168.1.177
  gather_facts: no
  vars:
    user: ansible
    scriptsourcedir: /home/ansible/testing/library
    scriptname: my_example_module.sh
  tasks:
  - name: copy script to remote host
    local_action: "command scp {{scriptsourcedir}}/{{scriptname}} {{user}}@{{inventory_hostname}}:/var/tmp/{{scriptname}}"
    register: result 
  - debug: var=result
  - name: create remote parm file
    raw: echo 'dest="/some/dir";state="present"' > /var/tmp/{{scriptname}}_data
    become: false
    register: result 
  - name: run remote script
    raw: /bin/bash /var/tmp/{{scriptname}} /var/tmp/{{scriptname}}_data
    become: false
    register: result 
  - debug: var=result
  - name: remove remote script
    raw: /bin/rm /var/tmp/{{scriptname}} /var/tmp/{{scriptname}}_data
    become: false
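
If you want to confirm the cleanup task really did remove the copied files, an ad-hoc raw command from the control machine will show it; a FAILED result containing a 'No such file or directory' message is exactly what you want to see here.

# check the copied script is gone from the target after the playbook runs
ansible 192.168.1.177 -i hosts -m raw -a "ls /var/tmp/my_example_module.sh"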

So, you have seen how to write and test a module in bash; not too complicated after all. There is one important thing you must always remember though: the output of your module must be valid JSON, and if you get a bracket out of place it will go splat; so here are two tips

  • if you end up with bad JSON output I find https://jsonlint.com/ is a quick way of finding the problem
  • if you are unsure of what data ansible is placing in the data value input file, put a ‘sleep 30’ command in the script; that gives you time on the target machine to look at the files under ~ansible/.ansible/tmp (replace ~ansible with the userid used on the target machine) and ‘cat’ the files under there to see what values are actually being set (see the sketch below)
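
For the second tip, this is roughly what to look at on the target machine while the module is sleeping (logged on as whatever remote user ansible connects as); the exact ansible-tmp-* directory names will differ on every run.

ls -la ~/.ansible/tmp/                          # one ansible-tmp-* directory per task run
find ~/.ansible/tmp -type f -exec ls -la {} \;  # the copied module and its data file
find ~/.ansible/tmp -type f -exec cat {} \;     # dump them to see the values ansible set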

Enjoy breaking things.
