89luca89 / terrible
An Ansible playbook that applies the principle of Infrastructure as Code to a QEMU/KVM environment.
License: GNU General Public License v3.0
The new CI lint shows some errors and warnings; we should fix them.
We should use ansible for this type of task instead of terraform
It would be nice for the state location to be configurable:
someone may want to manage it using git, an S3 bucket, a specific folder synced with rclone, and so on.
Right now, it's:
"{{ ansible_inventory_sources[0] }}-state.tar.gz"
It is a nice default; we could move it to defaults/main.yml and make it configurable.
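A minimal sketch of how that could look (the variable name terrible_state_file is hypothetical, shown only to illustrate the idea):

```yaml
# defaults/main.yml
# Hypothetical variable name; keeps the current behavior as the default,
# while letting users point the state bundle at a git/rclone/s3-synced path.
terrible_state_file: "{{ ansible_inventory_sources[0] }}-state.tar.gz"
```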
It would be nice to add a list of commands to be executed at the end of the setup.
For example:
terrible_custom_provisioners:
  - pip3 install -U ansible ansible-base
  - ansible-pull -U "{{ ansible_pull_custom_url }}"
This could be useful to perform final steps, for example setting up a cluster.
We should be clearer about the advantages of this approach compared to a full-Terraform or full-Ansible approach.
It would be really nice to have support for FreeBSD.
TravisCI.org is no more;
we need to migrate to a viable solution for CI testing.
We will move to a private bare-metal droplet for now.
For interfaces that are not in a libvirt network (e.g. the default NAT, an isolated one, etc.) we want the ability to set up a static IP for that interface.
To do this, we should decide whether it's something we want to do via Terraform or via Ansible.
Example use case:
A 4-node hypervisor cluster, with an internal shared network between the nodes.
We want the VMs to be reachable via internal network on all 4 nodes, all VMs in all nodes should reach all other VMs in other nodes.
We should try to reach a platform-agnostic conclusion if possible (e.g. NetworkManager vs wicked vs cloud-init).
Some variables specified in the inventory file need to be verified through Ansible asserts.
Candidate variables could be: os_family, cpu, memory, and some IP addresses.
E.g. os_family: it must be one of the values specified in the os_family_support variable.
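A minimal sketch of such an assert (assuming an os_family_support list defined in the role variables):

```yaml
- name: Validate os_family
  assert:
    quiet: yes
    that:
      - os_family is defined
      - os_family in os_family_support
    fail_msg: "os_family must be one of {{ os_family_support }}"
```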
Adding cloud-init support for images can be really useful for the disk_source selection.
The terraform-libvirt provider supports cloud-init images by using a specific resource:
data "template_file" "user_data" {
  template = file("./cloud_init.yml")
}

resource "libvirt_cloudinit_disk" "commoninit" {
  name      = "commoninit.iso"
  user_data = data.template_file.user_data.rendered
}
We should add them in the terraform-vm.tf.j2 file, inside an if/else checking whether this is a cloud-init image.
The cloud_init.yml file can be a template; this is an example of user data:
#cloud-config
ssh_pwauth: True
chpasswd:
  list: |
    root:password
  expire: False
This enables the root user via SSH with a password, so terrible can use the image like a normal one.
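A sketch of how the if/else could look in terraform-vm.tf.j2 (the cloud_init inventory variable is an assumption):

```jinja
{% if cloud_init | default(False) %}
{# generate the cloud-init disk only for cloud-init images #}
data "template_file" "user_data" {
  template = file("./cloud_init.yml")
}

resource "libvirt_cloudinit_disk" "commoninit" {
  name      = "commoninit.iso"
  user_data = data.template_file.user_data.rendered
}
{% endif %}
```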
We should have some basic tools to ease the use of the project.
Ideally we can gather them in a ./utils folder.
Some basic tools could be:
- generate_inventory: create a basic inventory from a template, passing arguments to it
- terrible up: easily run the playbook
- down: shut down VMs preserving resources
- purge: purge a playbook
- validate: validate an existing playbook
- check_deps: check if the system meets all terrible requirements
Open to other ideas.
If we declare a remote host for terraform_node, the saved state+inventory bundle will remain there.
We should handle the situation:
when:
  - hostvars['terraform_node']['ansible_connection'] != 'local'
This should use the copy module, if a saved state is present, to send it to the destination, and use the fetch module to retrieve the new saved state file.
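A sketch of the two tasks (the state_file variable name is illustrative):

```yaml
- name: Send saved state to the remote terraform_node
  copy:
    src: "{{ state_file }}"
    dest: "{{ state_file }}"
  delegate_to: terraform_node
  when: hostvars['terraform_node']['ansible_connection'] != 'local'

- name: Retrieve the new saved state from the remote terraform_node
  fetch:
    src: "{{ state_file }}"
    dest: "{{ state_file }}"
    flat: yes
  delegate_to: terraform_node
  when: hostvars['terraform_node']['ansible_connection'] != 'local'
```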
We should support non-root users for both Ansible and Terraform.
Also, Terraform should support the use of SSH keys (these have to already be in the template you're using).
We should add swap support for the disk part.
@alegrey91 what do you think?
We should add Alpine family support
Right now, setting up a VM can require quite a bit of YAML code, so we should improve this by giving better and sane defaults for all aspects of a VM.
Right now we could improve the defaults for:
- default_route (set by default)
- ansible_host
- data disk mount_point (default to /mnt/${ disk name })
- pool_name (should default to default)
- swap (it should default mount_point to none without having to declare it)
Ideally a simple inventory without many variables should be able to declare VMs in a simple manner like:
all:
  vars:
    disk_source: "/mnt/template.qcow2"
    os_family: RedHat
  hosts:
    terraform_node:
      ansible_host: localhost
      ansible_connection: local
    hypervisor_1:
      ansible_host: 10.90.20.12
      ansible_user: root
  children:
    deploy:
      children:
        my_nodes:
          hosts:
            node-0:
              cpu: 2
              memory: 2512
              ansible_host: 192.168.122.200
              data_disks:
                disk-1:
                  size: 5
...
Or even simpler if no external disks are declared
all:
  vars:
    disk_source: "/mnt/template.qcow2"
    os_family: RedHat
  hosts:
    terraform_node:
      ansible_host: localhost
      ansible_connection: local
    hypervisor_1:
      ansible_host: 10.90.20.12
      ansible_user: root
  children:
    deploy:
      children:
        my_nodes:
          hosts:
            node-0:
              cpu: 2
              memory: 2512
              ansible_host: 192.168.122.200
            node-1:
              cpu: 2
              memory: 2512
              ansible_host: 192.168.122.201
....
Some assertions should be improved, for example:
host-vm-1:
  disk_source: "/mnt/iso/debian10-terraform.qcow2"
  os_family: Debian
  cpu: 1
  memory: 512
  hypervisor: hypervisor_1
  ansible_host: 192.168.122.9
  network_interfaces:
    iface_1:
      name: default
      type: nat
      ip: 192.168.122.9
      gw: 192.168.122.1
      dns:
        - 192.168.122.1
  default_route: True    <<<--- note this is unindented
This snippet passes the assertions even though it's not valid.
We should review and evaluate more corner cases to include in the assertions
Also, we could think of an external tool/linter, as @alegrey91 suggests, to validate inventories.
To avoid dependency problems, we could provide a Dockerfile to allow users to build the Terrible container by themselves.
The container should include:
And possibly also other things if necessary. The Travis pipeline could be inspirational for this issue.
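A minimal sketch of such a Dockerfile (the base image and the pinned Terraform version are assumptions):

```dockerfile
FROM python:3.9-slim
# system deps for fetching terraform and for ssh-based connections
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl unzip openssh-client \
    && rm -rf /var/lib/apt/lists/*
# ansible itself (plus netaddr for the ipaddr filter)
RUN pip install --no-cache-dir ansible netaddr
# terraform binary
RUN curl -fsSLo /tmp/terraform.zip \
        https://releases.hashicorp.com/terraform/0.14.11/terraform_0.14.11_linux_amd64.zip \
    && unzip /tmp/terraform.zip -d /usr/local/bin \
    && rm /tmp/terraform.zip
```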
When we define additional disks, it could be useful to have an encryption variable to ask Ansible to encrypt the disk.
The variable should be optional, and it also needs other related variables, like password or password_file, from which we can retrieve the password to encrypt the disk.
The inventory could become as follows:
...
data_disks:
  disk-storage:
    size: 100
    pool: default
    format: xfs
    mount_point: /mnt/data_storage
    encryption: True
    password: password123
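Under the hood, a sketch of what the encryption could use is the existing community.crypto.luks_device module (the variable names here are illustrative):

```yaml
- name: Encrypt and open the data disk
  community.crypto.luks_device:
    device: "{{ disk_device }}"
    state: opened
    name: "{{ disk_name }}_crypt"
    passphrase: "{{ disk_password }}"
  when: disk_encryption | default(False)
```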
This could be useful for nodes that are part of a cluster and need some type of pruning before shutting down
# Verify the correctness of the parameters.
- name: Validate 'terraform_target_hypervisor' parameter
  assert:
    quiet: yes
    that:
      - hostvars['terraform_node']['terraform_target_hypervisor'] | ipaddr
    fail_msg: >
      You are trying to use an unsupported terraform_target_hypervisor value.
  when:
    - hostvars['terraform_node']['terraform_target_hypervisor'] is defined
@alegrey91 this is too strict, a hostname should also be a valid value.
When deploying a new set of VMs on a target hypervisor, we should use a different folder tree for each target, this will preserve terraform states for specific targets.
Example:
~/.ansible-terraform-kvm
|_ target-uuid_1
   |_ hypervisor_1
      |_ vm1
|_ target-uuid_2
   |_ hypervisor_1
      |_ vm1
To generate the UUID for a target hypervisor we could use its URI, for example.
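For instance, a stable UUID can be derived from the URI with a name-based UUID; this is only a sketch of the idea, not the actual implementation:

```python
import uuid

def target_uuid(provider_uri: str) -> str:
    # uuid5 is deterministic: the same URI always yields the same UUID,
    # so the per-target folder name stays stable across runs
    return str(uuid.uuid5(uuid.NAMESPACE_URL, provider_uri))

print(target_uuid("qemu+ssh://root@10.90.20.12/system"))
```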
We want to be able to declare (multiple) secondary disks for a VM:
eg:
- data_disks:
    - disk1:
        size: 10G
        pool: secondary_pool
        format: ext4
        mount_point: /mnt/data_disk1
    - disk2:
        - - -
This part should be done in a mix of terraform and ansible.
Terraform:
Ansible:
Now that we have a project name, the terraform role should be renamed to terrible.
To keep the code generic, the VM's HCL uses DHCP leases and later switches to a static IP via Ansible.
This works the first time, OR if the lease is free and the machine always gets the requested IP.
If this is not the case, Terraform will fail on the 2nd or Nth run, so we should keep the main interface as BOTH DHCP+static.
This ensures Terraform can work properly, with the static IP still guaranteed if DHCP is not available.
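With the terraform-libvirt provider this could look roughly like the following sketch (the exact attribute names depend on the provider version):

```hcl
network_interface {
  network_name   = "default"
  # keep DHCP so terraform can always obtain a lease...
  wait_for_lease = true
  # ...while also pinning the static IP the inventory declares
  addresses      = ["192.168.122.200"]
}
```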
We should add Windows family support.
Some sections need to be updated.
Also, we should include a Table of Contents section.
Other ideas could be placed below.
The next step to improve the code stability is to start to create some kind of basic control.
Following this guide: https://github.com/ansible/ansible-lint-action and other similar ones, we could implement our own CI pipeline to also improve contribution control.
We should support other providers like:
To include multiple providers, we should think about how we want to specify the provider, for example:
hypervisor_1:
  ansible_host: remote_kvm_machine.foo.bar
  ansible_user: root
  provider_type: Proxmox
Or something similar. We should also think about how the code generation will behave, for example:
roles/terrible/templates/Libvirt
└── ......tf.j2
roles/terrible/templates/Proxmox
└── ......tf.j2
roles/terrible/templates/oVirt
└── ......tf.j2
And use the provider_type as a variable to discover the templates.
Discussion open on how to approach the problem
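A sketch of the template discovery (the task and paths are illustrative):

```yaml
- name: Generate HCL for the VM from the provider-specific template
  template:
    src: "{{ provider_type | default('Libvirt') }}/terraform-vm.tf.j2"
    dest: "{{ hcl_deploy_path }}/{{ inventory_hostname }}.tf"
```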
Security updates should be optional or delegated to terrible_custom_provisioners;
this comes in handy in situations where no internet is available, or to keep track of package versions.
It could just be removed, becoming part of terrible_custom_provisioners and left to the user.
Hi, does the terraform-libvirt template file support cloud init configuration?
I was using a cloud-init base image and the deploy is stuck on the task TASK [terraform : Terraform apply VMs]
This should be useful for adding/removing ifaces and passthrough in the future.
Add:
- name: "Terraform renew target VMs"
  terraform:
    project_path: "{{ hcl_deploy_path }}/{{ inventory_hostname }}"
    force_init: true
    state: absent
    targets: "libvirt_domain.domain-terraform"
  tags: deploy, apply, generate_hcl
  when: terraform_status.changed
  delegate_to: terraform_node
Right now, we are taking for granted that Ubuntu uses netplan.
This is not strictly true, for various reasons:
We should improve the netplan detection itself instead of relying on the Ubuntu-vs-Debian distinction.
We should also ignore virtual interfaces inside machines and just stick to the real ones.
We should further streamline the jump host definition between the Ansible and Terraform parts.
Right now we have the Terraform bastion declaration like this:
provider_uri: "qemu+ssh://[email protected]/system"
terraform_bastion_enabled: True
terraform_bastion_host: 10.90.20.12
terraform_bastion_password: password
terraform_bastion_port: 22
terraform_bastion_user: root
This correctly declares the jump host the Terraform remote-exec commands will use to provision the VMs.
This will not work in Ansible, so we need to add:
ansible_jump_hosts:
  - {user: root, host: 10.90.20.12, port: 22}
This setup translates to something like:
Ansible + Terraform Server
--> KVM Server
In a simple setup like this, we should try to have a much simpler configuration:
- declare only provider_uri
- set up ansible_jump_hosts automatically if a terraform_bastion is detected.
This however could be tricky if we have a different Terraform server, separated from the Ansible one:
Ansible Server
--> Terraform + KVM Server
This example will indeed need the ansible_jump_hosts setup, but NOT terraform_bastion.
We should:
- detect that provider_uri does NOT contain ssh
- detect that terraform_node is NOT local
- set up ansible_jump_hosts accordingly (the terraform_node being the jump_host).
The worst situation to auto-detect is when we have a completely disjointed setup:
Ansible Server
--> Terraform Server
--> KVM Server
In this situation we have:
- a provider_uri with ssh, so we need to set terraform_bastion and add a hop to ansible_jump_hosts accordingly
- terraform_node is NOT local, so we add another hop to ansible_jump_hosts.
So the flow should be:
terraform_bastion = false
ansible_jump_hosts = []

if provider_uri does not contain ssh:
    # we have a qemu:///system situation
    if terraform_node is local:
        # ansible, terraform and kvm all on the same machine
        return
    else:
        # local ansible and remote terraform+KVM machine
        ansible_jump_hosts.append(terraform_node)
else:
    # we have a qemu+ssh://user@remote_host/system
    if terraform_node is local:
        # we have a local ansible+terraform and a remote KVM machine
        ansible_jump_hosts.append(user@remote_host)
    else:
        # we have a local ansible, a remote terraform and another remote KVM machine
        ansible_jump_hosts.append(user@remote_host)
        ansible_jump_hosts.append(terraform_node)
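The flow above can be sketched as a small function (names are illustrative, not the actual terrible code):

```python
def detect_jump_hosts(provider_uri: str, terraform_node_is_local: bool,
                      terraform_node: str = "terraform_node") -> list:
    jump_hosts = []
    if "ssh" not in provider_uri:
        # qemu:///system situation: KVM is local to the terraform node
        if not terraform_node_is_local:
            # local ansible, remote terraform+KVM machine: one hop
            jump_hosts.append(terraform_node)
    else:
        # qemu+ssh://user@remote_host/system: the KVM host itself is a hop
        remote = provider_uri.split("://", 1)[1].split("/", 1)[0]
        jump_hosts.append(remote)
        if not terraform_node_is_local:
            # a separate remote terraform node adds another hop
            jump_hosts.append(terraform_node)
    return jump_hosts
```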
As already mentioned in another thread (#40), this issue is dedicated to the evaluation of the implementation of a Github action to automate the deployment of basic infrastructure, using the following actions:
A basic infrastructure means the ability to deploy one VM for each OS family supported (depending on the time it needs to do the job).
Right now, for each VM we generate a separate HCL file in a separate project path;
I think the right way is to create a per-hypervisor project.
Before:
host_1
|_ host_1.tf
host_2
|_ host_2.tf
After:
hypervisor_1
|_ variables.tf
|_ host_1.tf
|_ host_2.tf
hypervisor_2
|_ variables.tf
|_ host_3.tf
|_ host_4.tf
This way we will follow Terraform best practices more closely, create less file flooding, and have a file hierarchy that closely reflects the infrastructure we declare in our inventory.
We should be able to declare per-hypervisor, per-group or per-vm provisioners for terraform, it would be useful to be a simple list:
terraform_custom_provisioners:
  - "pkg install python3"
  - "pw useradd user1 -g group1 -s /usr/local/bin/bash"
  - "pw usermod user1 -G wheel,www-data"
This should be interpreted in the Jinja2 of the Terraform template in a way similar to this:
if terraform_custom_provisioners is declared:
    for provisioner in terraform_custom_provisioners:
        GENERATE_TF_CODE_PROVISIONER
    endfor
endif
This could be really useful to support different OSes (like BSD, Solaris, etc.) and to perform critical actions before using Ansible on the VMs (one example: installing python3 where not present).
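In the Jinja2 template this could be sketched as follows (the surrounding remote-exec block is an assumption about how the generated HCL looks):

```jinja
{% if terraform_custom_provisioners is defined %}
  provisioner "remote-exec" {
    inline = [
{% for provisioner in terraform_custom_provisioners %}
      "{{ provisioner }}",
{% endfor %}
    ]
  }
{% endif %}
```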
We should add an autostart variable that sets the VM autostart to true or false.
If not specified, it defaults to false.
example:
cloud-init-node-0:
  os_family: RedHat
  #disk_source: "~/Desktop/cloudinit/CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2"
  disk_source: "https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2"
  cpu: 1
  memory: 512
  hypervisor: hypervisor_1
  ansible_host: 192.168.122.200
  cloud_init: True
  autostart: True
  network_interfaces:
    if-0:
      ...
  data_disks:
    disk-1:
      ...
    disk-swap:
      ...
Since Terraform is under the hood, we should respect the need to preserve the state of the VM deployments, or it will not work smoothly on multi-client setups.
The idea is to output a tar.gz of the whole ~/.terrible/$playbook_name folder to a file called something like $playbook-name.state.tar.gz, in the same folder as the $playbook file.
We should have a restore tag that takes care of restoring the files in ~/.terrible/$playbook_name, but also carrying the never tag, so this has to be an explicit action (to avoid overriding existing states when not wanted).
So we will have, for example:
./inventory-1.yml
./inventory-1.yml.state.tar.gz
We will launch:
ansible-playbook -i inventory-multihyper.yml -u root main.yml --tags restore
This will populate the folders in ~/.terrible;
then we can proceed with a deployment that has a known state.
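A sketch of the save/restore tasks (the variable names are illustrative):

```yaml
- name: Save the terraform state bundle next to the inventory
  community.general.archive:
    path: "~/.terrible/{{ playbook_name }}"
    dest: "{{ inventory_file }}.state.tar.gz"
    format: gz

- name: Restore the terraform state bundle
  unarchive:
    src: "{{ inventory_file }}.state.tar.gz"
    dest: "~/.terrible/"
  tags:
    - restore
    - never   # only runs when --tags restore is passed explicitly
```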
The purpose of this issue is to completely remove Ansible internal variables from the project variables (API variables).
The variables in question are ansible_host, ansible_ssh_pass, and so on.
These variables should be replaced with our own custom variables, to avoid confusion.
Right now, we should NOT use the zip release to build the Docker images, but the very source that is checked out during the action.
Affected files are:
.github/workflows/docker-build.yml
.github/workflows/docker-tag-release.yml
Right now, the use of groups and the terraform_node variable to handle multiple hypervisors does not work well, because of: ansible/ansible#32247
That issue suggests using unique groups for each parent group (e.g. hypervisor_1, hypervisor_2, etc.).
While going this route is feasible, I think it will break compatibility with existing roles that expect a specific group structure;
I think we should find a solution that is generic and does not impose specific group names/structures to work.
Right now, the terraform node only works on local setups;
we would want to have a separate terraform node that deploys the VMs on the KVM host,
changing terraform_node from a var to a special host in each hypervisor_X group.
The possible use cases are:
This, right now, is what we have to change to support the 4th case:
We should create an action that runs the validate target on the inventory-example file (and maybe all the other examples?).
This should ensure that all our examples and YAML documentation are aligned with the latest changes in the code and assertions.
We should improve task-wise comments to be more verbose and useful for possible new contributors, and help them understand the code.
We should generate TF files in a way that they describe an entire group, or even better, an entire hypervisor.
This way we avoid situations where, to remove a node, we first have to run --tags purge --limit name-of-node and then remove it from the inventory; instead, it will be automatically deleted when removed from the inventory itself.
We should add the following targets to the Travis pipeline:
Also we could add (nice to have):
We should add support for other types of virtual networks like: