Recently I was faced with a task: to configure the most reliable PostgreSQL 9.6 cluster possible. The idea was to get a cluster that survives the loss of any server, or even of several servers, and is able to bring servers back into operation automatically after failures. While planning the cluster I studied many articles, both in the main PostgreSQL documentation and in various howtos (including on Habr), tried to set up a standard cluster with RepMgr, and experimented with pgpool. It generally worked, but I periodically had problems with failover, manual intervention was required to recover from disasters, and so on. So I decided to look for more options. Eventually (I can’t remember exactly where) I found a link to the wonderful Zalando project Patroni, and everything took off from there…
Introduction

Patroni is a Python daemon that allows you to automatically maintain PostgreSQL clusters with different types of replication and automatic role switching. Its particular beauty, in my opinion, is that distributed DCS storage (Zookeeper, etcd and Consul are supported) is used to keep the cluster state up to date and to elect the master. So the cluster can easily be integrated into almost any system: you can always find out who the master is at the moment, and the status of all servers, by querying the DCS or Patroni itself over http. Well, it’s just beautiful 🙂
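For example, with Consul as the DCS you can ask the KV store directly who the current master is. This is only a minimal sketch: it assumes the Consul address (consul.services.local:8500) and the cluster scope (cluster-pgsql) that are configured later in this article, as well as the default service/ key prefix Patroni uses with Consul:

# prints the node name of the current master, e.g. cluster_pgsql_01
~# curl -s http://consul.services.local:8500/v1/kv/service/cluster-pgsql/leader?raw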
I tested how Patroni works: I dropped the master and other servers, and tried loading databases of different sizes (a ~25 GB database is brought up from scratch over a 10 Gbit network in a few minutes), and overall I really liked the project. After fully implementing the scheme described below, I ran a test with a simple benchmark that connected to the database at a single address, while I killed every element of the cluster in turn (the master server, haproxy, keepalived). The delay in handing the role over to a new master was a couple of seconds. When a former master returns to the cluster, or a new server is added, no role switch takes place.

To automate deployment of the cluster and addition of new servers, I decided to use the familiar Ansible (I will give links to the resulting roles at the end of the article). Consul, which we already use, acts as the DCS.

The article has two main goals: to show PostgreSQL users that such a wonderful thing as Patroni exists (there are surprisingly few mentions of it on the Internet in general and on GitHub in particular), and at the same time to share a little experience of using Ansible with a simple example, for those who are just starting to work with it. I will try to explain everything while walking through the Ansible roles and playbooks. Those who do not use Ansible can transfer all the actions to their favourite automation tool, or perform them manually. Since most of the yaml scripts are long, I wrapped them in spoilers. The story is divided into two parts: preparing the servers and deploying the cluster itself. Those who are familiar with Ansible will not find the first part interesting, so I recommend going straight to the second.
Part 1
For this example I am using CentOS 7 based virtual machines. The virtual machines are deployed from a template that is periodically updated (kernel, system packages), but that topic is beyond the scope of this article. I will only note that no application or server software is installed on the virtual machines in advance. Any cloud resources, for example from AWS, DO, vScale, etc., are also quite suitable: for them there are dynamic inventory scripts and Ansible integrations, or you can bolt on Terraform, so the whole process of creating and removing servers can be automated from scratch. First you need to create an inventory of the resources used for Ansible. Mine (as by default) is located in /etc/ansible. Create the inventory in the /etc/ansible/hosts file:
[pgsql]
cluster-pgsql-01.local
cluster-pgsql-02.local
cluster-pgsql-03.local
We use the internal domain zone .local, so the servers have such names. Next, you need to prepare each server by installing all the necessary components and working tools. For this purpose we create a playbook in /etc/ansible/tasks:
/etc/ansible/tasks/essentialsoftware.yml
---
- name: Install essential software
  yum: name={{ item }} state=latest
  tags: software
  with_items:
    - ntpdate
    - bzip2
    - zip
    - unzip
    - openssl-devel
    - mc
    - vim
    - atop
    - wget
    - mytop
    - screen
    - net-tools
    - rsync
    - psmisc
    - gdb
    - subversion
    - htop
    - bind-utils
    - sysstat
    - nano
    - iptraf
    - nethogs
    - ngrep
    - tcpdump
    - lm_sensors
    - mtr
    - s3cmd
    - psmisc
    - gcc
    - git
    - python2-pip
    - python-devel

- name: install the 'Development tools' package group
  yum:
    name: "@Development tools"
    state: present
The essential set of packages is there to create a familiar working environment on any server. The Development tools package group, some -devel libraries and the Python headers are needed by pip to build the Python modules required by Patroni. We use virtual machines based on VMware ESXi, and for ease of administration they need to run a VMware agent. For that we use the open-source vmtoolsd agent, and describe its installation in a separate playbook (since not all of our servers are virtual, and for some of them this task may not be needed):
/etc/ansible/tasks/open-vm-tools.yml
---
- name: Install open VM tools for VMWARE
  yum: name={{ item }} state=latest
  tags: open-vm-tools
  with_items:
    - open-vm-tools

- name: VmWare service start and enabling
  service: name=vmtoolsd.service state=started enabled=yes
  tags: open-vm-tools
To complete the preparation of a server for installing the main software, in our case the following steps are required: 1) configure time synchronisation with ntp; 2) install and run the zabbix agent for monitoring; 3) roll out the required ssh keys and authorized_keys. In order not to bloat the article with details unrelated to the cluster itself, I will just briefly quote the Ansible playbooks that perform these tasks:
NTP:
/etc/ansible/tasks/ntpd.yml
---
- name: setting default timezone
  set_fact:
    timezone: Europe/Moscow
  when: timezone is not defined

- name: setting TZ
  timezone: name={{ timezone }}
  when: timezone is defined
  tags:
    - tz
    - tweaks
    - ntp
    - ntpd

- name: Configurating cron for ntpdate
  cron: name="ntpdate" minute="*/5" job="/usr/sbin/ntpdate pool.ntp.org"
  tags:
    - tz
    - tweaks
    - ntp
    - ntpd

- name: ntpd stop and disable
  service: name=ntpd state=stopped enabled=no
  tags:
    - tz
    - tweaks
    - ntp
    - ntpd
  ignore_errors: yes

- name: crond restart and enabled
  service: name=crond state=restarted enabled=yes
  tags:
    - tz
    - tweaks
    - ntp
    - ntpd
First we check whether a specific time zone is set for the server; if not, the Moscow time zone is used (most of our servers have it). We do not use ntpd because of problems with time drift on ESXi virtual machines, after which ntpd refuses to synchronise the time (and tinker panic 0 does not help). So we simply run the ntpdate client from cron every 5 minutes. Zabbix-agent:
/etc/ansible/tasks/zabbix.yml
---
- name: set zabbix ip external
  set_fact:
    zabbix_ip: 132.xx.xx.98
  tags: zabbix

- name: set zabbix ip internal
  set_fact:
    zabbix_ip: 192.168.xx.98
  when: ansible_all_ipv4_addresses | ipaddr('192.168.0.0/16')
  tags: zabbix

- name: Import Zabbix3 repo
  yum: name=http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-release-3.0-1.el7.noarch.rpm state=present
  tags: zabbix

- name: Remove old zabbix
  yum: name=zabbix2* state=absent
  tags: zabbix

- name: Install zabbix-agent software
  yum: name={{ item }} state=latest
  tags: zabbix
  with_items:
    - zabbix-agent
    - zabbix-release

- name: Creates directories
  file: path={{ item }} state=directory
  tags:
    - zabbix
    - zabbix-mysql
  with_items:
    - /etc/zabbix/externalscripts
    - /etc/zabbix/zabbix_agentd.d
    - /var/lib/zabbix

- name: Copy scripts
  copy: src=/etc/ansible/templates/zabbix/{{ item }} dest=/etc/zabbix/externalscripts/{{ item }} owner=zabbix group=zabbix mode=0755
  tags: zabbix
  with_items:
    - netstat.sh
    - iostat.sh
    - iostat2.sh
    - iostat_collect.sh
    - iostat_parse.sh
    - php_workers_discovery.sh

- name: Copy .my.cnf
  copy: src=/etc/ansible/files/mysql/.my.cnf dest=/var/lib/zabbix/.my.cnf owner=zabbix group=zabbix mode=0700
  tags:
    - zabbix
    - zabbix-mysql

- name: remove default configs
  file: path={{ item }} state=absent
  tags: zabbix
  with_items:
    - /etc/zabbix_agentd.conf
    - /etc/zabbix/zabbix_agentd.conf

- name: put zabbix-agentd.conf to default place
  template: src=/etc/ansible/templates/zabbix/zabbix_agentd.tpl dest=/etc/zabbix_agentd.conf owner=zabbix group=zabbix force=yes
  tags: zabbix

- name: link zabbix-agentd.conf to /etc/zabbix
  file: src=/etc/zabbix_agentd.conf dest=/etc/zabbix/zabbix_agentd.conf state=link
  tags: zabbix

- name: zabbix-agent start and enable
  service: name=zabbix-agent state=restarted enabled=yes
  tags: zabbix
When installing Zabbix, the agent config is rendered from a template; only the server address needs to change. Servers located inside our network talk to 192.168.x.98, while servers without access to it use the real (external) address of the same server. Transferring ssh keys and configuring ssh has been moved to a separate role, which can be found, for example, on ansible-galaxy. There are many options, and the changes are quite trivial, so I see no point in quoting its full contents here.

It is time to roll the resulting configuration onto the servers. In general I install all the components and the cluster itself in one pass, with the full config, but for the purposes of this tutorial it seems better to split it into two steps, matching the two parts of this article. Create a playbook for the group of servers:
/etc/ansible/cluster-pgsql.yml
---
- hosts: pgsql
  pre_tasks:
    - name: Setting system hostname
      hostname: name="{{ ansible_host }}"
    - include: tasks/essentialsoftware.yml
    - include: tasks/open-vm-tools.yml
    - include: tasks/ntpd.yml
  post_tasks:
    - include: tasks/zabbix.yml
  roles:
    - ssh.role
    - ansible-role-patroni
We start processing all servers:
~# ansible-playbook cluster-pgsql.yml --skip-tags patroni
The preparation is over.
Part 2
We now proceed to deploy the cluster itself. Since setting up the cluster involves a lot of work (installing PostgreSQL with all its components and rolling out individual configs for them), I separated the whole process into a dedicated role. Roles in Ansible allow you to group sets of related tasks, which makes scripts easier to write and maintain. I took a role template for installing Patroni from here: https://github.com/gitinsky/ansible-role-patroni, for which thanks to its author. For my own purposes I reworked it and added my own haproxy and keepalived playbooks. My roles live in the /etc/ansible/roles directory. Create a directory for the new role and subdirectories for its components:
~# mkdir -p /etc/ansible/roles/ansible-role-patroni/tasks
~# mkdir -p /etc/ansible/roles/ansible-role-patroni/templates
In addition to PostgreSQL, our cluster will consist of the following components: 1) haproxy, to monitor the status of the servers and redirect requests to the master; 2) keepalived, to provide a single entry point to the cluster – a virtual IP. All task files executed by this role are listed in the file that Ansible loads by default:
/etc/ansible/roles/ansible-role-patroni/tasks/main.yml
- include: postgres.yml
- include: haproxy.yml
- include: keepalived.yml
Next we describe the individual tasks. The first playbook installs PostgreSQL 9.6 from the official PGDG repository plus the additional packages required by Patroni, and then checks out Patroni itself from GitHub:
/etc/ansible/roles/ansible-role-patroni/tasks/postgres.yml
---
- name: Import Postgresql96 repo
  yum: name=https://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-centos96-9.6-3.noarch.rpm state=present
  tags: patroni
  when: install is defined

- name: Install PGsql96
  yum: name={{ item }} state=latest
  tags: patroni
  with_items:
    - postgresql96
    - postgresql96-contrib
    - postgresql96-server
    - python-psycopg2
    - repmgr96
  when: install is defined

- name: checkout patroni
  git: repo=https://github.com/zalando/patroni.git dest=/opt/patroni
  tags: patroni
  when: install is defined

- name: create /etc/patroni
  file: state=directory dest=/etc/patroni
  tags: patroni
  when: install is defined

- name: put postgres.yml
  template: src=postgres0.yml dest=/etc/patroni/postgres.yml backup=yes
  tags: patroni
  when: install is defined

- name: install python packages
  pip: name={{ item }}
  tags: patroni
  with_items:
    - python-etcd
    - python-consul
    - dnspython
    - boto
    - mock
    - requests
    - six
    - kazoo
    - click
    - tzlocal
    - prettytable
    - PyYAML
  when: install is defined

- name: put patroni.service systemd unit
  template: src=patroni.service dest=/etc/systemd/system/patroni.service backup=yes
  tags: patroni
  when: install is defined

- name: Reload daemon definitions
  command: /usr/bin/systemctl daemon-reload
  tags: patroni

- name: restart
  service: name=patroni state=restarted enabled=yes
  tags: patroni
Besides installing the software, this playbook also renders the configuration for the current Patroni node and a systemd unit for starting the daemon, and then starts the Patroni daemon. The config templates and the systemd unit template must live in the templates directory inside the role (the patroni.service template itself is not quoted in the article; a rough sketch of it is shown right after the config template below). The Patroni config template:
/etc/ansible/roles/ansible-role-patroni/templates/postgres.yml.j2
name: {{ patroni_node_name }}
scope: &scope {{ patroni_scope }}

consul:
  host: consul.services.local:8500

restapi:
  listen: 0.0.0.0:8008
  connect_address: {{ ansible_default_ipv4.address }}:8008
  auth: 'username:{{ patroni_rest_password }}'

bootstrap:
  dcs:
    ttl: &ttl 30
    loop_wait: &loop_wait 10
    maximum_lag_on_failover: 1048576 # 1 megabyte in bytes
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        archive_mode: "on"
        wal_level: hot_standby
        archive_command: mkdir -p ../wal_archive && cp %p ../wal_archive/%f
        max_wal_senders: 10
        wal_keep_segments: 8
        archive_timeout: 1800s
        max_replication_slots: 5
        hot_standby: "on"
        wal_log_hints: "on"
  pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
    - host replication replicator 192.168.0.0/16 md5
    - host all all 0.0.0.0/0 md5

postgresql:
  listen: 0.0.0.0:5432
  connect_address: {{ ansible_default_ipv4.address }}:5432
  data_dir: /var/lib/pgsql/9.6/data
  pg_rewind:
    username: superuser
    password: {{ patroni_postgres_password }}
  pg_hba:
    - host all all 0.0.0.0/0 md5
    - hostssl all all 0.0.0.0/0 md5
  replication:
    username: replicator
    password: {{ patroni_replicator_password }}
    network: 192.168.0.0/16
  superuser:
    username: superuser
    password: {{ patroni_postgres_password }}
  admin:
    username: admin
    password: {{ patroni_postgres_password }}
  restore: /opt/patroni/patroni/scripts/restore.py
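As promised, here is what the patroni.service template referenced by the playbook might contain. This is only a minimal sketch: the ExecStart path is my assumption based on the /opt/patroni git checkout above (older Patroni versions shipped a top-level patroni.py entry point), so adjust it to your layout:

~# cat > /etc/ansible/roles/ansible-role-patroni/templates/patroni.service << 'EOF'
[Unit]
Description=Patroni PostgreSQL cluster manager
After=network.target

[Service]
Type=simple
User=postgres
Group=postgres
# assumption: Patroni checked out to /opt/patroni, config rendered to /etc/patroni
ExecStart=/usr/bin/python /opt/patroni/patroni.py /etc/patroni/postgres.yml
KillMode=process
Restart=no

[Install]
WantedBy=multi-user.target
EOF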
Returning to the Patroni config: since each server in the cluster needs an individual configuration, the config is kept as a jinja2 template (the postgres0.yml file), and the template task makes Ansible render it, substituting variables whose values are taken from a separate description for each server. Variables common to the whole cluster go directly into the inventory, which now takes the following form:
/etc/ansible/hosts
[pgsql]
cluster-pgsql-01.local
cluster-pgsql-02.local
cluster-pgsql-03.local

[pgsql:vars]
patroni_scope="cluster-pgsql"
patroni_rest_password=flsdjkfasdjhfsd
patroni_postgres_password=flsdjkfasdjhfsd
patroni_replicator_password=flsdjkfasdjhfsd
cluster_virtual_ip=192.xx.xx.125

And a separate file for each server, in the host_vars/<server_name> directory:

/etc/ansible/host_vars/cluster-pgsql-01.local/main.yml

patroni_node_name: cluster_pgsql_01
keepalived_priority: 99
Let me explain what some of these variables are for:
- patroni_scope – the cluster name used when registering with Consul;
- patroni_node_name – the server name used when registering with Consul;
- patroni_rest_password – the password for the Patroni http interface (required to send commands that change the cluster);
- patroni_postgres_password – the password for the postgres user; it is set when Patroni creates a new database;
- patroni_replicator_password – the password for the replicator user, on whose behalf replication to the slaves is performed.

This file also lists some other variables used in other playbooks or roles, in particular ssh settings (keys, users), the server time zone, the priority of the server in the keepalived cluster, and so on. The configuration for the other servers is the same; only the server name and priority change (for example 99-100-101 for three servers). Installing and configuring haproxy:
/etc/ansible/roles/ansible-role-patroni/tasks/haproxy.yml
---
- name: Install haproxy
  yum: name={{ item }} state=latest
  tags:
    - patroni
    - haproxy
  with_items:
    - haproxy
  when: install is defined

- name: put config
  template: src=haproxy.cfg.j2 dest=/etc/haproxy/haproxy.cfg backup=yes
  tags:
    - patroni
    - haproxy

- name: restart and enable
  service: name=haproxy state=restarted enabled=yes
  tags:
    - patroni
    - haproxy
Haproxy is installed on every host; its config lists all the PostgreSQL servers, it checks which of them is currently the master and sends requests there. For this check a wonderful feature of Patroni is used – its REST interface. When you access http://server:8008 (8008 is the default port), Patroni returns a JSON report on the node's state, and the HTTP response code tells you whether this server is the master: if it is, you get a 200 response, if not – a 503. I strongly recommend looking at the Patroni documentation; the http interface is quite interesting, and it also lets you force a role switch and manage the cluster. The same can be done with the patronictl.py console utility from the Patroni distribution.
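To see roughly what haproxy's health check relies on, you can query the REST interface by hand (a sketch; the hostname is one of the cluster nodes from the inventory above):

# a master answers 200, a replica answers 503
~# curl -s -o /dev/null -w '%{http_code}\n' http://cluster-pgsql-01.local:8008
# the JSON body with the node state can be inspected as well
~# curl -s http://cluster-pgsql-01.local:8008 | python -m json.tool

The haproxy configuration itself is pretty simple: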
/etc/ansible/roles/ansible-role-patroni/templates/haproxy.cfg.j2
global
    maxconn 800

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

frontend ft_postgresql
    bind *:5000
    default_backend postgres-patroni

backend postgres-patroni
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2
    server cluster-pgsql-01 cluster-pgsql-01.local:5432 maxconn 300 check port 8008
    server cluster-pgsql-02 cluster-pgsql-02.local:5432 maxconn 300 check port 8008
    server cluster-pgsql-03 cluster-pgsql-03.local:5432 maxconn 300 check port 8008
According to this configuration, haproxy listens on port 5000 and sends traffic from it to the master server. The status check runs every 3 seconds; to mark a server as down, 3 failed checks are required (a non-200 response, i.e. 503), and to bring it back – 2 successful checks (code 200). At any moment you can connect to any of the haproxy instances directly, and it will correctly route the traffic to the master. Patroni also ships a template for the confd daemon and an example of its integration with etcd, which lets you rebuild the haproxy config dynamically when servers are removed or added. For now I am building a fairly static cluster; unnecessary automation in this situation, IMHO, can lead to unforeseen problems.

We did not want clients to need any custom logic, config changes or live tracking of servers, so we provide a single entry point to the cluster with keepalived. The keepalived daemon talks to its neighbours over the VRRP protocol, and as a result one of the daemons is elected as the main one (the priority is specified in the config and templated through the keepalived_priority variable in host_vars for each server) and takes over a virtual IP address. The other daemons wait patiently. If the current main keepalived server dies for some reason, or signals a failure to its neighbours, a re-election takes place and the next server by priority takes the virtual IP. To protect against haproxy crashes, the keepalived daemons run the check "killall -0 haproxy" once a second. It returns code 0 if a haproxy process exists, and 1 if it does not. If haproxy disappears, keepalived signals a VRRP fault and releases the virtual IP, which is immediately picked up by the next server by priority that has a live haproxy.
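You can reproduce this liveness check by hand; a trivial sketch:

# exit code 0 means a haproxy process exists, 1 means it does not
~# killall -0 haproxy; echo $?

Installing and configuring keepalived: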
/etc/ansible/roles/ansible-role-patroni/tasks/keepalived.yml
---
- name: Install keepalived
  yum: name={{ item }} state=latest
  tags:
    - patroni
    - keepalived
  with_items:
    - keepalived
  when: install is defined

- name: put alert script
  template: src=alert.sh.j2 dest=/usr/local/sbin/alert.sh backup=yes mode=755
  tags:
    - patroni
    - keepalived
  when: install is defined

- name: put config
  template: src=keepalived.conf.j2 dest=/etc/keepalived/keepalived.conf backup=yes
  tags:
    - patroni
    - keepalived

- name: restart and enable
  service: name=keepalived state=restarted enabled=yes
  tags:
    - patroni
    - keepalived
Besides installing keepalived, this playbook also copies a simple script for sending alerts to Telegram. The script takes the message as an argument and simply calls the Telegram API with curl; you only need to put your own bot token and Telegram group ID into it.
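The alert.sh.j2 template is not quoted in the article; a minimal sketch of such a script might look like this (the token and chat ID are placeholders you must replace, and the standard Telegram Bot API sendMessage method is assumed):

#!/bin/bash
# /usr/local/sbin/alert.sh -- send the first argument as a Telegram message
TG_TOKEN="123456789:replace-with-your-bot-token"   # placeholder
TG_CHAT_ID="-100123456789"                         # placeholder group ID
MESSAGE="$1"

curl -s "https://api.telegram.org/bot${TG_TOKEN}/sendMessage" \
     --data-urlencode "chat_id=${TG_CHAT_ID}" \
     --data-urlencode "text=${MESSAGE}" > /dev/null

The keepalived configuration is described as a jinja2 template: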
/etc/ansible/roles/ansible-role-patroni/templates/keepalived.conf.j2
global_defs {
    router_id {{ patroni_node_name }}
}

vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 1
    weight -20
    debug
    fall 2
    rise 2
}

vrrp_instance {{ patroni_node_name }} {
    interface ens160
    state BACKUP
    virtual_router_id 150
    priority {{ keepalived_priority }}
    authentication {
        auth_type PASS
        auth_pass secret_for_vrrp_auth
    }
    track_script {
        chk_haproxy weight 20
    }
    virtual_ipaddress {
        {{ cluster_virtual_ip }}/32 dev ens160
    }
    notify_master "/usr/bin/sh /usr/local/sbin/alert.sh '{{ patroni_node_name }} became MASTER'"
    notify_backup "/usr/bin/sh /usr/local/sbin/alert.sh '{{ patroni_node_name }} became BACKUP'"
    notify_fault "/usr/bin/sh /usr/local/sbin/alert.sh '{{ patroni_node_name }} became FAULT'"
}
The variables patroni_node_name, cluster_virtual_ip and keepalived_priority are filled in with the corresponding data from host_vars and the inventory. The keepalived config also calls the script that sends status-change messages to the Telegram channel. We now roll the full cluster configuration onto the servers:
~# ansible-playbook cluster-pgsql.yml
Since Ansible is idempotent, i.e. it performs steps only if they have not been done before, you can run the playbook without any extra parameters. If you do not want to wait for the preparation steps to be re-checked, or you are sure the servers are fully prepared, you can run ansible-playbook with the -t patroni switch; then only the steps from the Patroni role will be executed. Note that I do not assign server roles – master or slave – anywhere. This configuration creates an empty database, and the first server that comes up simply becomes the master. When new servers are added, Patroni sees through the DCS that a master already exists, automatically copies the database from the current master and attaches the slave to it. If a slave is started after lagging behind the master for a while, Patroni automatically brings it up to date using pg_rewind.
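So, applying only the Patroni role to already prepared servers boils down to:

~# ansible-playbook cluster-pgsql.yml -t patroni

We then make sure that all the servers have started and have picked up their roles: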
~# journalctl -f -u patroni
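Once the roles have settled, you can also run a quick smoke test from any client through the single entry point – the keepalived virtual IP and the haproxy frontend port (a sketch; the address is whatever you set in cluster_virtual_ip, and the admin user comes from the Patroni config above):

# pg_is_in_recovery() should return "f" -- haproxy always routes us to the master
~# psql -h 192.xx.xx.125 -p 5000 -U admin -d postgres -c 'select pg_is_in_recovery();'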
Afterword
The only thing this cluster, in my opinion, still lacks for complete happiness is connection pooling and proxying of read queries to all slaves to improve read performance, with inserts and updates going only to the master. In a configuration with asynchronous replication, offloading reads can produce unexpected results if a slave lags behind the master; this has to be taken into account. Streaming (asynchronous) replication does not guarantee that the cluster is consistent at every moment; for that you need synchronous replication. In that mode the master waits for confirmation that transactions have been copied and applied on the slaves, which slows the database down. However, if transaction loss is unacceptable (for example, in some financial applications), synchronous replication is your choice. Patroni supports all the options, and if synchronous replication suits you better, you only need to change a few fields in the Patroni configs. The questions around the different replication modes are covered very well in the Patroni documentation.

Someone will probably suggest pgpool, which by itself, in fact, covers all the functionality of this setup: it can monitor databases, proxy requests and hold a virtual IP, as well as pool client connections. Yes, it can do all of that. But in my opinion the scheme with Patroni is much more transparent (this is, of course, just my opinion), and while experimenting with pgpool I ran into strange behaviour of its watchdog and virtual addresses, which I have not yet debugged deeply, deciding instead to look for another solution. Of course, the problem may well be in my own hands, and I plan to return to testing pgpool later. In any case, though, pgpool cannot fully automatically manage the cluster, introduce new servers or (especially) return failed ones, or work with a DCS; in my opinion that is the most interesting functionality of Patroni.

Thank you for your attention; I will be glad to hear suggestions for further improving this solution, and to answer questions in the comments.