This Ansible playbook installs, configures and deploys Apache NiFi on a (virtual) machine that is assumed to be dedicated for this purpose. It performs the required system tuning steps, network configuration, prerequisite installation and initial flow configuration of Apache NiFi. It also installs the Apache NiFi Registry and connects it to a GitHub repository for flow configuration storage.
Even though the intent is to completely automate the process, there are a few steps that need to be executed in sequence to set up a NiFi instance as each of the steps can be varied according to the needs of the environment being set up. The steps are:
- Set up a (virtual) machine
- Register the IPv4 address in the DNS
- Configure environment
- Deploy NiFi and its dependencies
The results of the preparatory steps are all stored in a YAML inventory file that is then used by Ansible to run the main playbook. Each of the steps is described below.
Included in this repository is the script make-hetzner-vm.sh
which
creates a suitable virtual machine in the Hetzner cloud. It uses the
hcloud command line utility,
which needs to be installed locally and configured with the desired
access token before the script is run as follows:
./make-hetzner-vm.sh <name> <ssh-key>
The <name>
is used to identify the virtual machine on Hetzner's
side. It can be the same as the host name that is being used for the
NiFi instance. <ssh-key>
indicates the name of the ssh key that
should have access to the root
account on the new virtual machine.
The key needs to be registered with Hetzner beforehand, using the web
interface or hcloud
. This key must be present on the machine and to
the user that is used to run Ansible later on.
The script prints the IPv4 address of the new virtual machine on the standard output when it finishes.
It is possible to provision the virtual machine using any other means, as long as ssh access to the root account is provided for the initial Ansible setup.
In order for the NiFi instance to be usable, a DNS domain name must be
registered for it. If AWS Route53 is used as DNS provider for the
zone in which the name resides, the set-dns-name-route53.sh
script
can be used to associate the DNS name with the IPv4 address of the
NiFi instance like so:
./set-dns-name-route53.sh <name> <address>
<name>
is the fully qualified domain name to register, <address>
is the IPv4 address of the NiFi instance as determined previously.
In order to use this script, the AWS CLI needs to be installed locally and configured with an appropriate access token. The AWS CLI is available in most operating system package repositories.
In order to create an environment, the NiFi host needs to be defined
in an inventory file. A sample inventory is provided in the file
sample-inventory.yml
. Once the inventory has been set up, the
playbook can be run like so:
ansible-playbook -i <inventory-file> bootstrap.yml
File names defined in the inventory file are resolved by
ansible-playbook
relative to to the playbook file
(bootstrap.yml
). It might thus be necessary to navigate out of the
playbook repository by using ..
or using absolute pathnames.
Additional configuration parameters can be set in the inventory for the new virtual machine:
By default, the NiFi instance will be configured so that a server TLS certificate is requested from letsencrypt using certbot. If desired, it is possible to use a keystore and/or a truststore from another source. This can be configured by the following parameters in the inventory:
# The NiFi web interface is protected by the certificate and
# key in this keystore:
ssl_keystore: nifi-dev.ks
nifi_security_keystorePasswd: changeme
# The CA certificate that issued the server certificate needs to
# be included in this truststore:
ssl_truststore: ca-certificate.ks
nifi_security_truststorePasswd: changeme
If a CA certificate is present in the truststore, NiFi will authenticate clients that present a valid client certificate issued by that CA. Access rights need to be configured for such users manually or by editing users.xml and authorizations.xml.
The logstash data collection engine is deployed on the Ansible host to centralize and streamline the collection and distribution of logging information. It is configured to collect the system and application logs and send it to a downstream logging system.
In order to log to an Elasticsearch instance, the following parameters need to be set in the inventory:
logstash_elasticsearch_hosts:
- <elasticsearch-url>
logstash_elasticsearch_index: <index-name>
<elasticsearch-url>
is the url of Elasticsearch that logs need to be
written to. If basic authentication needs to be used, the username
and password need to be included in the url
(https://<user>:<password>@<host>:<port>/...
). The <index-name>
determines the index into which logs are written,
e.g. "nifi_dev-%{+YYYY.MM.dd}"
.
Logs can be sent to DataDog by setting the following parameter:
logstash_datadog_log_api_key: <api-key>
<api-key>
needs to be a valid API key for the account. By default,
logs are sent to Datadog's European log ingestion endpoint
(http-intake.logs.datadoghq.eu
). This can be overridden by setting
the configuration parameter logstash_datadog_log_host
to the
appropriate host name in the inventory.
With all parameters present in the Inventory file, Ansible can be run to install and configure NiFi and its dependencies on the virtual machine.
The ansible playbook automatically sets up the NiFi registry so that
flow configurations are stored in a git repository and pushed to
github. The URL of the github repostitory must be stored in the
nifi_registry_github_repository
configuration variable. As flows
can be edited in the NiFi web UI, the NiFi registry must be able to
push changes back into GitHub. In order to achieve that, a SSH key
needs to be created and set up as a deployment key with write access
in the GitHub flow configuration GitHub repository. The private key
must be stored in a file, and the name of that file needs to be
configured as nifi_registry_github_deploy_key
in the inventory.
The NiFi installation lives under /opt/nifi/
. To install extensions
(.nar files), they must be placed in the /opt/nifi/extensions
directory. This can be done manually or automatically. NiFi
automatically reloads extensions when they change.
https://bryanbende.com/development/2017/10/03/apache-nifi-openid-connect
~/nifi-toolkit-1.11.4/bin/tls-toolkit.sh standalone -O -c <CA-HOSTNAME> -n <NIFI-HOSTNAME> -C 'CN=admin,OU=NiFi' -o certs
openssl pkcs12 -in <ADMIN-DN>.p12 -out CN=<ADMIN-DN>.pem -passin file:<ADMIN-DN>.password -nodes