When puppetserver master CA expires

One issue with puppetserver CE is that the damn CA and key expire.

OK, that is obviously a good idea, but it is a real pain to sort out.

Everything here is for puppetserver version 7.8.0 and puppet agents at versions 7.17.0, 6.26.0, 5.5.20 (alma and debian11, rocky and fedora33 respectively).

Some of the issues I had to get around were that Debian and RHEL family servers seem to keep the certs (including cached copies of the expired ones) in different places, and I have one server with a different version of the puppet agent that, you guessed it, keeps them in yet another place. Throw in a couple of servers where the puppet agent is down-version at 5.x and the locations change again.

Of course, the major issue for most people is that it is very difficult to find out how to rebuild a new CA and puppetserver certificate; I waded through lots of pointless Google hits before I found the solutions.

Oh, and the biggest issue was that it took a while to work out what the problem actually was. The first error was simply the message “Error: Could not run: stack level too deep” from a ‘puppet agent --test’ run. For those reading this post after searching on that error message: it probably means your puppetserver CA cert has finally expired, which I did not find at all obvious from the message.
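
If you want to confirm the diagnosis before wiping anything, something like the playbook below may help. It is a minimal sketch, not something I ran as part of the fix: the playbook name is hypothetical, and it assumes openssl is installed on every host and that the agent keeps its copy of the CA at /etc/puppetlabs/puppet/ssl/certs/ca.pem (5.x agents may keep it elsewhere, and on the puppetserver host the CA itself lives in /etc/puppetlabs/puppetserver/ca/ca_crt.pem). ‘openssl x509 -checkend 0’ exits non-zero when the certificate has already expired, and it only looks at the first certificate in the file.

```yaml
---
# Hypothetical check_puppet_ca.yml: report whether each host's cached puppet
# CA certificate has already expired. Changes nothing on the hosts.
- name: Check the cached puppet CA certificate on every host
  hosts: all
  tasks:
    - name: Test CA certificate expiry (rc 0 means still valid)
      become: "yes"
      command: "openssl x509 -checkend 0 -noout -in /etc/puppetlabs/puppet/ssl/certs/ca.pem"
      register: ca_check
      changed_when: false
      failed_when: false

    - name: Report the result (rc 1 means expired, or the file could not be read)
      debug:
        msg: "rc={{ ca_check.rc }} {{ ca_check.stdout }} {{ ca_check.stderr }}"
```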

Anyway, agents cache the expired certificate from puppetserver in different places depending on OS and puppet agent version, and likewise the agent keys. I could have made a smarter playbook that uses ‘puppet config print | grep -i ssldir’ on each server, but to hell with that complexity. An ‘rm -rf’ on a directory that does not exist does no harm, so I just chose to swat every possible directory, because I did not want to fix 10 VMs manually (and you may have more).
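
For anyone who does want the smarter version, a sketch might look like the playbook below; it would stand in for the agent cert delete tasks in playbook 1, so the agents still need stopping first. I have not used it myself. It assumes the puppet binary is in /opt/puppetlabs/bin (puppetlabs packages) or on root’s PATH (distro 5.x packages) and that your sudoers rules allow the shell module, and the regex guard is a hypothetical safety net so an odd answer can never be deleted. It must run as root, because ‘puppet config print’ run as a normal user reports that user’s own ssldir under ~/.puppetlabs. In puppetserver 7 the CA directory (/etc/puppetlabs/puppetserver/ca) is not under the ssldir, so it still needs its own delete.

```yaml
---
# Hypothetical wipe_puppet_ssldir.yml: delete only the ssldir each agent reports.
- name: Wipe each agent's own ssldir
  hosts: all
  tasks:
    - name: Ask puppet where its ssldir is (as root, or you get ~/.puppetlabs)
      become: "yes"
      shell: "PATH=/opt/puppetlabs/bin:$PATH puppet config print ssldir"
      register: ssldir
      changed_when: false

    - name: Delete the reported ssldir, but only if it looks like /.../ssl
      become: "yes"
      file:
        path: "{{ ssldir.stdout | trim }}"
        state: absent
      when: (ssldir.stdout | trim) is match('^/.+/ssl$')
```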

Fortunately I had used puppet earlier to deploy ansible, so every server with a puppet agent already had the ansible userid, ssh keys and my extremely restricted sudoers.d file for ansible deployed, and I could use that to sort out all my servers (although I will have to revisit the restrictions, as the ‘rm’ paths are not as tightly locked down as I thought).

As I had to clean up multiple servers it was easiest to do it using ansible (actually it wasn’t; I probably spent longer getting the playbooks working than it would have taken to fix each server manually, but next time it will just be a few commands).

Basically, for the cleanup to work all puppet agents must be stopped. If even one is left running, it could post a cert request that would stop a new puppetserver CA from being created.
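
Because a single running agent can spoil the rebuild, it may be worth checking before the CA is recreated. The playbook below is a minimal sketch, not part of my original set (the name is hypothetical): it assumes the agent runs as the systemd unit ‘puppet’, as in the playbooks here, and it will not catch an agent started from cron. If any host fails the assert, ansible-playbook exits non-zero, which a wrapper script can use to stop before going any further.

```yaml
---
# Hypothetical check_agents_stopped.yml: fail any host where the agent is up.
- name: Make sure no puppet agent is still running
  hosts: all
  tasks:
    - name: Ask systemd for the puppet agent state
      command: "systemctl is-active puppet"
      register: agent_state
      changed_when: false
      failed_when: false

    - name: Refuse to continue while the agent is active
      assert:
        that:
          - agent_state.stdout not in ['active', 'activating', 'reloading']
        fail_msg: "puppet agent is still running on {{ inventory_hostname }}"
```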

So I have used three playbooks: the first stops all puppet agents (and puppetserver, when that host is in the inventory) and deletes all agent certs; the second stops puppetserver, deletes all the certs it knows about and creates a new CA and certificate; and the third restarts the agents. If you (correctly) do not have autosign configured, you will need to manually sign the cert requests from the agents.

But if you have the issue described here and need to regenerate the CA and certs, you can pull the commands you need from the three playbooks here even if you do not use ansible. Just remember that ALL puppet agents on all servers must be stopped before you run the commands in the second playbook.

The shell script I use to run the playbooks, showing the correct order:

#!/bin/bash
set -e   # stop at the first playbook that fails, e.g. on an unreachable host
ansible-playbook -i ./hosts --limit always_up ./wipe_puppet_certs_part1.yml
ansible-playbook -i ./hosts --limit puppet ./wipe_puppet_certs_part2.yml
ansible-playbook -i ./hosts --limit always_up ./wipe_puppet_certs_part3.yml

Playbook 1 – stop agents and erase their certs

---
- name: Stop all puppet agents and wipe their certificates
  hosts: all
  vars:
    puppet_master: "puppet"
  tasks:
    - name: Stop puppet master if puppetserver host
      become: "yes"
      command: "systemctl stop puppetserver"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Stop puppet agent
      become: "yes"
      command: "systemctl stop puppet"
      ignore_errors: yes

      # NOTE: the below deletes the whole ssl directory, not just its contents.
      # Emptying it with
      #      /bin/rm -rf /etc/puppetlabs/puppet/ssl/*
      # silently does nothing here: the command module does not run through a
      # shell, so the * is never expanded and rm -f ignores the literal,
      # nonexistent path "ssl/*". Typing the same rm by hand works because the
      # interactive shell expands the glob. Deleting the directory itself works,
      # but we have to rely on puppet to recreate it.
    - name: Delete puppet agent certs dir 1
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppet/ssl"
      ignore_errors: yes

    - name: Delete puppet agent certs dir 2
      become: "yes"
      command: "/bin/rm -rf /var/lib/puppet/ssl"
      ignore_errors: yes

    - name: Delete puppet agent certs dir 3
      become: "yes"
      command: "/bin/rm -rf /etc/puppet/ssl"
      ignore_errors: yes

    - name: Delete puppetserver CA dir (does not exist on plain agents)
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppetserver/ca"
      ignore_errors: yes
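
If you would rather empty the ssl directory and keep it, the glob needs a shell. A sketch of an alternative task, assuming ansible’s shell module (which, unlike the command module, passes the command through /bin/sh) is acceptable under your sudoers restrictions:

```yaml
    # Hypothetical replacement for "Delete puppet agent certs dir 1": the shell
    # module runs this through /bin/sh, so the * is expanded before rm sees it
    # and the (now empty) ssl directory itself is left in place.
    - name: Empty puppet agent certs dir but keep the directory
      become: "yes"
      shell: "/bin/rm -rf /etc/puppetlabs/puppet/ssl/*"
      ignore_errors: yes
```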

Playbook 2 – on the puppetserver host only: stop puppetserver, erase the existing certs, create new ones and start puppetserver. Use your own domain name in the alt-name.

---
- name: Force recreation of puppet master CA
  hosts: all
  vars:
    puppet_master: "puppet"
  tasks:
    - name: Stop puppet master
      become: "yes"
      command: "systemctl stop puppetserver" 
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Erase puppetserver certs on puppet master
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppetserver/ca"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Erase any local agent certs on puppet master
      become: "yes"
      command: "/bin/rm -rf /etc/puppetlabs/puppet/ssl"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Create new puppet master CA
      become: "yes"
      command: "/opt/puppetlabs/bin/puppetserver ca setup"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Create new puppet master certificate
      become: "yes"
      command: "/opt/puppetlabs/bin/puppetserver ca generate --certname puppet --subject-alt-names puppet.yourdomain.org --ca-client"
      when: inventory_hostname == puppet_master
      ignore_errors: yes

    - name: Start puppet master
      become: "yes"
      command: "systemctl start puppetserver" 
      when: inventory_hostname == puppet_master
      ignore_errors: yes
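
To confirm the new CA really was created, and to see when it will expire next time, a couple of extra tasks could be appended to playbook 2. This is a sketch assuming openssl is installed on the puppetserver host; /etc/puppetlabs/puppetserver/ca/ca_crt.pem is where puppetserver 7 keeps the CA certificate, and openssl only reports the first certificate in that file.

```yaml
    # Hypothetical extra tasks for the end of playbook 2.
    - name: Show subject and expiry date of the new CA certificate
      become: "yes"
      command: "openssl x509 -noout -subject -enddate -in /etc/puppetlabs/puppetserver/ca/ca_crt.pem"
      register: new_ca
      changed_when: false
      when: inventory_hostname == puppet_master

    - name: Print the new CA details
      debug:
        var: new_ca.stdout_lines
      when: inventory_hostname == puppet_master
```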

Playbook 3 – start the agents; they will generate new certs and signing requests

---
- name: Start all puppet agents
  hosts: all
  tasks:
    - name: Start puppet agent 
      become: "yes"
      command: "systemctl start puppet"
      ignore_errors: yes

Then, if you are not using autosign (which you should not be), run ‘puppetserver ca list’ on the puppetserver host to see the pending requests and ‘puppetserver ca sign --certname xxxx’ to sign each one.
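
With a lot of agents the signing can go in a playbook too. A minimal sketch, not something from my original set (the name is hypothetical): it assumes each agent’s certname matches its inventory hostname (check against the names ‘puppetserver ca list’ shows), that the puppetserver host is called ‘puppet’ in the inventory and the agents are in the always_up group, as in the script above, and it only signs names that appear in the pending list. ‘puppetserver ca sign --all’ also exists, but it signs whatever is pending, which is exactly what not using autosign is meant to avoid.

```yaml
---
# Hypothetical sign_agents.yml: sign pending requests from a known list of agents.
- name: Sign agent certificate requests on the puppetserver
  hosts: puppet
  vars:
    # Assumes certnames match inventory hostnames; the CA host itself is skipped.
    agents_to_sign: "{{ groups['always_up'] | difference(['puppet']) }}"
  tasks:
    - name: List pending certificate requests
      become: "yes"
      command: "/opt/puppetlabs/bin/puppetserver ca list"
      register: pending
      changed_when: false
      failed_when: false

    - name: Show pending requests
      debug:
        var: pending.stdout_lines

    - name: Sign each expected agent that has a pending request
      become: "yes"
      command: "/opt/puppetlabs/bin/puppetserver ca sign --certname {{ item }}"
      loop: "{{ agents_to_sign }}"
      # Rough substring test: a short name like "web" would also match "web2".
      when: item in pending.stdout
      ignore_errors: yes
```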

And some additional notes for V5.x agents

There is an additional step if you have any agents in the 5.x (and possibly 6.x) range. In puppetserver version 7 the CA certificates are chained, and 5.x agents cannot handle that: they only retrieve the first certificate in the chain and cannot authenticate it. This is documented at https://puppet.com/docs/puppetserver/5.3/intermediate_ca_configuration.html, but in the simplest terms you must manually copy the entire CA certificate bundle (ca.pem) to each 5.x puppet agent. You must also set certificate_revocation to ‘leaf’, or you will still get lots of certificate verification failed errors.

Puppet caches the old CA certificate, so the old agents were able to keep working with their existing certificates until now. But we have just recreated all the certificates, so the new CA bundle has to be copied to the older servers, and they need to be configured not to attempt a full chain check that they can never complete.

Ideally the CA bundle would be copied from the puppetserver machine, but all V7 agents seem to retrieve the entire certificate chain, so if your ansible host runs a recent puppet agent version, the playbook below will get those old V5.x agents working again. It is basically just the steps from the document referenced above put into a playbook, so I don’t have to repeat them manually on every server. Note: you may need to reply ‘y’ to fingerprint prompts for the scp step, as ansible likes to use sftp rather than scp (I ran the first three playbooks on all servers with no issue, but still got a prompt for one of mine when this playbook ran the scp).

---
- name: Copy CA keys to old version 5 agents
  hosts: oldhost1,oldhost2
  vars:
    user: ansible
  tasks:
    - name: Copy new CA certificate bundle to V5.x puppet agents
      local_action: "command scp /etc/puppetlabs/puppet/ssl/certs/ca.pem {{user}}@{{inventory_hostname}}:/var/tmp/ca.pem"
      ignore_errors: yes
    - name: Install CA certificate bundle on V5.x puppet agents
      become: "yes"
      command: "/bin/mv /var/tmp/ca.pem /etc/puppet/ssl/certs/ca.pem"
      ignore_errors: yes
    - name: Alter cert revocation handling
      become: "yes"
      command: "puppet config set --section main certificate_revocation leaf"
      ignore_errors: yes
    - name: Restart puppet agent 
      become: "yes"
      command: "systemctl restart puppet"
      ignore_errors: yes
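
An alternative that sidesteps the scp prompt (and the /var/tmp hop) is ansible’s copy module, which pushes a file from the ansible host over ansible’s own connection. A sketch of a replacement for the first two tasks above, assuming the same source and destination paths and that ca.pem is readable by the user running ansible; owner and group are not set explicitly, so check what your 5.x agents expect.

```yaml
    # Hypothetical replacement for the scp + mv tasks in the playbook above.
    - name: Copy new CA certificate bundle to V5.x puppet agents
      become: "yes"
      copy:
        src: /etc/puppetlabs/puppet/ssl/certs/ca.pem
        dest: /etc/puppet/ssl/certs/ca.pem
        mode: "0644"
      ignore_errors: yes
```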

I have a few extra lines in the bash file I use to run the playbooks, just to be absolutely sure I only hit the 5.x servers with that last additional playbook.

ansible-playbook -i ./hosts --limit oldhost1 ./wipe_puppet_certs_part4.yml
ansible-playbook -i ./hosts --limit oldhost2 ./wipe_puppet_certs_part4.yml

And that's it. Everything should be working again.

About mark

At work, been working on Tandems for around 30yrs (programming + sysadmin), plus AIX and Solaris sysadmin also thrown in during the last 20yrs; also about 5yrs on MVS (mainly operations and automation but also smp/e work). At home I have been using linux for decades. Programming background is commercially in TAL/COBOL/SCOBOL/C(Tandem); 370 assembler(MVS); C, perl and shell scripting in *nix; and Microsoft Macro Assembler(windows).
This entry was posted in Automation.