kurokobo / awx-on-k3s
An example implementation of AWX on single node K3s using AWX Operator, with easy-to-use simplified configuration with ownership of data and passwords.
License: MIT License
Special characters such as & in the awx password in awx-postgres-configuration cause backups using AWXBackup to fail. The backup container starts, creates the backup directory, and sets permissions, but as soon as it attempts to run pg_dump it errors out and the container is terminated. Removing the & from the password fixed it. The awx-operator documentation says that a password containing special characters should be quoted.
To reproduce: on a new install, set the passwords in the kustomization.yaml file to contain an &, then configure and run the backup operator per https://github.com/kurokobo/awx-on-k3s/tree/main/backup
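The report does not show how the backup builds its pg_dump call, so the following is only a sketch of the general failure mode, assuming the password ends up interpolated into a shell command line. An unquoted & is a shell control operator, while a quoted one is passed through literally. The function name and the command layout are hypothetical and only illustrate the quoting.

```python
import shlex


def pg_dump_command(password: str, database: str = "awx") -> str:
    """Build a shell command line with the password safely quoted.

    The command layout is hypothetical; it is not taken from the AWX Operator.
    """
    # shlex.quote wraps the value in single quotes when it contains
    # characters the shell would otherwise interpret, such as &.
    return f"PGPASSWORD={shlex.quote(password)} pg_dump {shlex.quote(database)}"


if __name__ == "__main__":
    print(pg_dump_command("s3cr&t"))
```

With the quoting in place, the shell sees the password as a single word instead of treating & as a control operator that ends the command.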
Rather than setting up a self-signed certificate, it would be better to set up Let's Encrypt.
Since a domain/sub-domain is already set while configuring, there is no need to create a self-signed cert.
Attempted to spin up AWX 21 on a brand new server using this repository with the latest operator. The install failed with:
May 13 13:47:19 XXX k3s[27942]: I0513 13:47:19.715551 27942 kubelet_pods.go:891] "Unable to retrieve pull secret, the image pull may not succeed." pod="awx/awx-operator-controller-manager-675865446d-9nh27" secret="" err="secret \"redhat-operators-pull-secret\" not found"
Looking in ansible/awx-operator@859384e I can see a reference to redhat-operators-pull-secret, which was made two weeks ago, but I don't recall having to configure this parameter on previous releases.
Checking the version mapping table, I'm pretty sure I was able to use 0.20.1 of the operator to deploy AWX 21.0.0
AWX Operator | AWX |
---|---|
0.21.0 | 21.0.0 |
0.20.2 | 21.0.0 |
0.20.1 | 21.0.0 |
I may rebuild and test again with version 0.20.1 just to rule out a local misconfiguration.
Hi,
Followed your guide and AWX installs without errors according to the install logs. Using a VM in Azure with no ports open to the internet.
Certificate error: CA Root certificate not trusted, issuer is TRAEFIK DEFAULT CERT
The service principal is DNS Zone Contributor in the Azure DNS zone.
Any specific logs one should look at?
I deployed using a self-signed SSL certificate using
AWX_HOST="awx.example.com"
openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -out ./base/tls.crt -keyout ./base/tls.key -subj "/CN=${AWX_HOST}/O=${AWX_HOST}" -addext "subjectAltName = DNS:${AWX_HOST}"
However, I would like to switch it to an enterprise CA signed SSL certificate. Do I just need to generate a new certificate and key using my enterprise CA with the same names, place them in the same spot, then run kubectl apply -k base to apply the new certificate and private key?
Hi,
I've manually created the DNS record in Azure and usually acme can create certificate without the need to define azure resources. Why is the Issuer part needed? The concept is not clear to me. A public IP with access via port 80/http should be enough. Thanks in advance
Newer versions contain SSL via ACME. Can this folder be copied to older versions like 0.14.0 and still work?
On a fresh install the postgres container is in 'ErrImagePull' state.
# kubectl -n awx get pod
NAME READY STATUS RESTARTS AGE
awx-operator-controller-manager-6d959bd7dd-9g786 2/2 Running 0 2m58s
awx-postgres-0 0/1 ErrImagePull 0 46s
Looking further it seems it is unable to pull the image from docker.io
# kubectl -n awx describe pod awx-postgres-0
Warning Failed 20s (x2 over 50s) kubelet Error: ImagePullBackOff
Normal Pulling 6s (x3 over 54s) kubelet Pulling image "postgres:12"
Warning Failed 3s (x3 over 50s) kubelet Failed to pull image "postgres:12": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/postgres:12": failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/library/postgres/manifests/sha256:ed97ef00029e0df606e9d8c9fba68b1ef5d023dbacc84178b441312282178123: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
I think I understand the error - I have a login for docker hub, but I have not specified it anywhere so my requests are being denied.
This is very much identical to Image pull failing on K3S deployment using kurokobo/awx-on-k3s
I see references to image_pull_secret in the AWX Operator guide, but I'm unsure how to use them in conjunction with this guide.
What are the options for specifying credentials to docker hub?
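As background for the question above: Kubernetes image pull secrets of type kubernetes.io/dockerconfigjson carry a small JSON document under the .dockerconfigjson key. The sketch below builds that payload for Docker Hub. The function name and the sample credentials are placeholders, and how the resulting secret is referenced from the AWX spec (the image_pull_secret parameter mentioned above) may differ between operator versions.

```python
import base64
import json


def docker_config_json(username: str, password: str,
                       registry: str = "https://index.docker.io/v1/") -> str:
    """Return the JSON payload stored under the .dockerconfigjson key."""
    # The "auth" field is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    config = {
        "auths": {
            registry: {
                "username": username,
                "password": password,
                "auth": auth,
            }
        }
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    print(docker_config_json("example-user", "example-token"))
```

The output can be placed into a secret of type kubernetes.io/dockerconfigjson in the awx namespace.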
Follow the guide on a fresh installation, but do not provide any docker hub credentials or run 'docker login'.
Hi,
I am having trouble figuring out how to install community module requirements and noticed that your code contains builder/requirements.yml and builder/requirements.txt.
Can these be used to customize which modules are present at runtime?
Can't execute community.vmware modules since it's nowhere described how to do this in AWX.
- name: Make sure requirements are met to run vmware modules
  become: true
  ansible.builtin.pip:
    name: pyVmomi
    state: present
- name: Export virtual machine facts
  community.vmware.vmware_guest_info:
    hostname: "{{ hostname }}"
    username: '{{ domain_user }}'
    password: "{{ domain_password }}"
    datacenter: "{{ datacenter }}"
    validate_certs: no
    schema: vsphere
    name: VM01
  register: virtualmachine_facts
Both commands result in:
fatal: [127.0.0.1]: FAILED! => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/libexec/platform-python"
},
"changed": false,
"module_stderr": "/bin/sh: sudo: command not found\n",
"module_stdout": "",
"msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
"rc": 127
}
Hi,
we have two servers where we deploy our AWX instance. They are called ansible01.example.com and ansible02.example.com. We create a DNS alias and issue the certificates with an additional SAN called ansible.example.com.
In the base/awx.yml the hostname has to match the hostname of the URL that is later used to contact AWX. Otherwise you will get a 404 error.
...
spec:
  ...
  ingress_type: ingress
  ingress_tls_secret: awx-secret-tls
  hostname: awx.example.com
  ...
Is there a way to specify an additional hostname? I know that traefik itself can handle this in the docker environment, but I'm not sure if this is possible in this context.
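The 404 described above is consistent with host-based routing: the ingress only forwards requests whose Host header matches a configured rule. The sketch below is a deliberately simplified model of that lookup, not Traefik's actual implementation; the rule table and function name are illustrative.

```python
from typing import Dict, Optional

# Illustrative rule table: one host mapped to one backend,
# mirroring the single hostname set in the AWX spec.
RULES: Dict[str, str] = {"awx.example.com": "awx-service:80"}


def route(host_header: str, rules: Dict[str, str]) -> Optional[str]:
    """Return the backend for a Host header, or None (served as a 404)."""
    # Drop an optional port and compare case-insensitively.
    host = host_header.split(":")[0].lower()
    return rules.get(host)


if __name__ == "__main__":
    for host in ("awx.example.com", "ansible01.example.com"):
        backend = route(host, RULES)
        print(host, "->", backend if backend else "404 page not found")
```

In this model a second hostname is only routable if it has its own rule, which is why a DNS alias on its own is not enough.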
Thanks and best regards
Jens
Hi again, and thanks for keeping this great project updated!
I am hitting an issue today, so I am opening a report here, as I cannot manage to use task_extra_env to set AWX_CLEANUP_PATHS to False (this is related to https://groups.google.com/g/awx-project/c/XhY-uDSxDIo/m/CnhjRQG5AAAJ).
I have double-checked, but let me know if I overlooked something. I am filing here since I deploy via your repo, but this may be an issue with the upstream operator.
k3s version v1.19.15+k3s1 (e698d6d8)
When task_extra_env is set to the following, the operator cannot deploy via the updated CRD:
task_extra_env: |
  - name: AWX_CLEANUP_PATHS
    value: false
The syntax I used is in line with what is advised at https://github.com/ansible/awx-operator#exporting-environment-variables-to-containers:
spec:
  task_extra_env: |
    - name: MYCUSTOMVAR
      value: foo
  web_extra_env: |
    - name: MYCUSTOMVAR
      value: foo
  ee_extra_env: |
    - name: MYCUSTOMVAR
      value: foo
Yet the deployment fails:
TASK [installer : Apply deployment resources] **********************************\r\ntask path: /opt/ansible/roles/installer/tasks/resources_configuration.yml:35\nfatal: [localhost]: FAILED! => {\"changed\": false, \"error\": 400, \"msg\": \"Failed to apply object: b'{\\\"kind\\\":\\\"Status\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"metadata\\\":{},\\\"status\\\":\\\"Failure\\\",\\\"message\\\":\\\"Deployment in version \\\\\\\\\\\"v1\\\\\\\\\\\" cannot be handled as a Deployment: v1.Deployment.Spec: v1.DeploymentSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.Env: []v1.EnvVar: v1.EnvVar.Value: ReadString: expects \\\\\\\\\\\" or n, but found f, error found in #10 byte of ...|,\\\\\\\\\\\"value\\\\\\\\\\\":false}],\\\\\\\\\\\"im|..., bigger context ...|amespace\\\\\\\\\\\"}}},{\\\\\\\\\\\"name\\\\\\\\\\\":\\\\\\\\\\\"AWX_CLEANUP_PATHS\\\\\\\\\\\",\\\\\\\\\\\"value\\\\\\\\\\\":false}],
I can indeed see the CRD looks fishy. Note the bizarre whitespace placement, indentation was lost. Full output is attached:
$ kubectl describe awx
[...]
Replicas: 1
route_tls_termination_mechanism: Edge
task_extra_env: - name: AWX_CLEANUP_PATHS
value: false
task_privileged: false
Status:
Conditions:
Last Transition Time: 2021-12-07T14:03:53Z
I suspect this is because the templating at https://github.com/ansible/awx-operator/blob/devel/roles/installer/templates/deployment.yaml.j2#L266 might be too simple and does not take the more complex data structure into account, but I might be wrong.
Did you successfully manage to override those environment variables this way?
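The API error quoted above ("v1.EnvVar.Value: ReadString: expects \" or n, but found f") indicates that the value of an environment variable must be a string, whereas an unquoted false in YAML is parsed as a boolean. The sketch below shows that distinction on already-parsed data; the helper name is hypothetical and the lowercase spelling simply mirrors the YAML literal used in the report.

```python
import json
from typing import Any, Dict, List


def stringify_env(entries: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Coerce every env value to a string, as the EnvVar schema expects."""
    fixed = []
    for entry in entries:
        value = entry["value"]
        if isinstance(value, bool):
            # Keep the YAML spelling (true/false) rather than Python's True/False.
            value = "true" if value else "false"
        else:
            value = str(value)
        fixed.append({"name": entry["name"], "value": value})
    return fixed


if __name__ == "__main__":
    parsed = [{"name": "AWX_CLEANUP_PATHS", "value": False}]  # what unquoted false becomes
    print(json.dumps(parsed))                 # boolean value, rejected by the API
    print(json.dumps(stringify_env(parsed)))  # string value
```

In the YAML itself, the equivalent change would be quoting the value so that it is parsed as a string in the first place.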
base/awx.yaml to use the attached one
kubectl apply -k base/ as usual
Logs + full base/ directory: extra_env_error.zip
I have AWX up and running. And have HTTP_X_FORWARDED_FOR set. But callback provisioning isn't working. I get {"msg":"No matching host could be found!"}
Which leads me to believe that somehow the HTTP headers are getting changed too much so AWX doesn't recognize the host. I'm pretty inexperienced with Kubernetes. Is there something that needs to be done with K3s to make the forwarding work right?
Followed this exactly, but getting error. Is there any way to debug this?
VM has 2 cores and 8GB ram. Not sure what's wrong. I have tried 3x now.
PLAY RECAP *********************************************************************\r\nlocalhost : ok=32 changed=0 unreachable=0 failed=1 skipped=25 rescued=0 ignored=0 \r\n\n","job":"5444334632873030104","name":"awx","namespace":"awx","error":"exit status 2"}
{"level":"error","ts":1634329553.7360277,"logger":"controller-runtime.manager.controller.awx-controller","msg":"Reconciler error","name":"awx","namespace":"awx","error":"event runner on failed","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214"}
Here's the status:
[root@ansible awx-on-k3s]# kubectl -n awx get awx,all,ingress,secrets
NAME AGE
awx.awx.ansible.com/awx 57m
NAME READY STATUS RESTARTS AGE
pod/awx-operator-controller-manager-68d787cfbd-tmrnv 2/2 Running 0 62m
pod/awx-84d5c45999-6pc4t 0/4 Pending 0 57m
pod/awx-postgres-0 1/1 Running 0 57m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/awx-operator-controller-manager-metrics-service ClusterIP 10.43.178.126 <none> 8443/TCP 62m
service/awx-postgres ClusterIP None <none> 5432/TCP 57m
service/awx-service ClusterIP 10.43.117.74 <none> 80/TCP 57m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/awx-operator-controller-manager 1/1 1 1 62m
deployment.apps/awx 0/1 1 0 57m
NAME DESIRED CURRENT READY AGE
replicaset.apps/awx-operator-controller-manager-68d787cfbd 1 1 1 62m
replicaset.apps/awx-84d5c45999 1 1 0 57m
NAME READY AGE
statefulset.apps/awx-postgres 1/1 57m
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/awx-ingress <none> awx.mydomain.net 199.XXX.XXX.21 80, 443 57m
NAME TYPE DATA AGE
secret/default-token-t9j72 kubernetes.io/service-account-token 3 62m
secret/awx-operator-controller-manager-token-lgwjk kubernetes.io/service-account-token 3 62m
secret/awx-admin-password Opaque 1 57m
secret/awx-postgres-configuration Opaque 6 57m
secret/awx-secret-tls kubernetes.io/tls 2 57m
secret/awx-secret-key Opaque 1 57m
secret/awx-broadcast-websocket Opaque 1 57m
secret/awx-app-credentials Opaque 3 57m
secret/awx-token-fjbtc kubernetes.io/service-account-token 3 57m
Hi, it would be very useful to pull the execution environments images from the AWS Elastic Container Registry, is it possible to implement such feature?
Thank you for everything, this project is awesome! :)
I want to copy a file from AWX to a remote host, but I don't understand how it works. It shows me the error below.
Could not find or access 'copyfile.txt'\nSearched in:\n\t/runner/project/files/copyfile.txt\n\t/runner/project/copyfile.txt\n\t/runner/project/files/copyfile.txt\n\t/runner/project/copyfile.txt on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option",
I don't know where /runner/project is. I tried going into awx-ee and creating copyfile.txt there, but it did not work.
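The error message lists where the copy action looked for the file: a files/ subdirectory of the project checkout first, then the checkout itself. The sketch below reproduces that search order for a relative src, taking /runner/project (the path shown in the error) as the project directory inside the job's container. The function name is illustrative and this is not Ansible's actual lookup code.

```python
from pathlib import PurePosixPath
from typing import List


def candidate_paths(src: str, project_dir: str = "/runner/project") -> List[str]:
    """List the locations searched for a relative src, in order."""
    base = PurePosixPath(project_dir)
    return [str(base / "files" / src), str(base / src)]


if __name__ == "__main__":
    for path in candidate_paths("copyfile.txt"):
        print(path)
```

Since these paths are relative to the project checkout, the file has to be part of the project's source repository; a file created by hand in a running container would not be present in the next job run.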
ERROR
{
"msg": "Could not find or access 'copyfile.txt'\nSearched in:\n\t/runner/project/files/copyfile.txt\n\t/runner/project/copyfile.txt\n\t/runner/project/files/copyfile.txt\n\t/runner/project/copyfile.txt on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option",
"exception": "Traceback (most recent call last):\n File "/usr/local/lib/python3.8/site-packages/ansible/plugins/action/copy.py", line 466, in run\n source = self._find_needle('files', source)\n File "/usr/local/lib/python3.8/site-packages/ansible/plugins/action/init.py", line 1364, in _find_needle\n return self._loader.path_dwim_relative_stack(path_stack, dirname, needle)\n File "/usr/local/lib/python3.8/site-packages/ansible/parsing/dataloader.py", line 341, in path_dwim_relative_stack\n raise AnsibleFileNotFound(file_name=source, paths=[to_native(p) for p in search])\nansible.errors.AnsibleFileNotFound: Could not find or access 'copyfile.txt'\nSearched in:\n\t/runner/project/files/copyfile.txt\n\t/runner/project/copyfile.txt\n\t/runner/project/files/copyfile.txt\n\t/runner/project/copyfile.txt on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option\n",
"invocation": {
"src": "copyfile.txt",
"dest": "/opt/",
"module_args": {
"src": "copyfile.txt",
"dest": "/opt/"
}
},
"_ansible_no_log": false,
"changed": false
}
First of all, thank you for your amazing job.
Maybe I'm missing something, but I recently messed up my AWX.
I changed my /etc/hostname and I think it broke some things. I had made some changes inside my AWX and didn't want to lose all my work, so I deleted everything except my data, which is in the directory /data/.
Since I wanted to put it in production, I changed the passwords in base/kustomization.yaml, and I think postgres can no longer access the database:
2022-02-03T15:32:58.888085423+01:00 stderr F 2022-02-03 14:32:58.887 UTC [1486] FATAL: password authentication failed for user "awx" 2022-02-03T15:32:58.888112407+01:00 stderr F 2022-02-03 14:32:58.887 UTC [1486] DETAIL: Connection matched pg_hba.conf line 99: "host all all all scram-sha-256"
and I don't know where I can tell it that I've changed the password. Is it possible to do this without recreating the database? I also set a new password for awx-admin-password, but as I understand it that is only for logging in to the web interface.
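A likely explanation is that the official postgres image only applies its configured password when it initializes an empty data directory, so the role stored under /data keeps the old password while the secret now holds the new one. One possible approach, untested against this deployment, is to update the role's password inside the database to match the secret. The sketch below only builds such a statement with the quotes escaped; the helper name is hypothetical, and running the statement (for example via psql inside the postgres pod) is left out.

```python
def alter_password_sql(role: str, new_password: str) -> str:
    """Build an ALTER ROLE statement with identifier and literal quoting.

    Assumes standard_conforming_strings is on (the PostgreSQL default),
    so only quote characters need doubling.
    """
    quoted_role = '"' + role.replace('"', '""') + '"'
    quoted_password = "'" + new_password.replace("'", "''") + "'"
    return f"ALTER ROLE {quoted_role} WITH PASSWORD {quoted_password};"


if __name__ == "__main__":
    # Placeholder password for illustration only.
    print(alter_password_sql("awx", "N3w-Pa$$word"))
```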
Hi there,
On my AWX on K3s deployment, I get a "Bad Gateway" error, the same as described in issue #10. Could you please give me some pointers on how to troubleshoot this?
I'm able to start all of the pods:
awx-nathan:~ # kubectl -n awx get all,ingress,awx
NAME READY STATUS RESTARTS AGE
pod/awx-postgres-0 1/1 Running 0 141m
pod/awx-operator-controller-manager-68d787cfbd-2jlsf 2/2 Running 0 144m
pod/awx-b85cd74b6-l4zn4 4/4 Running 20 (6m49s ago) 140m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/awx-operator-controller-manager-metrics-service ClusterIP 10.43.178.253 <none> 8443/TCP 149m
service/awx-postgres ClusterIP None <none> 5432/TCP 141m
service/awx-service ClusterIP 10.43.137.81 <none> 80/TCP 140m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/awx-operator-controller-manager 1/1 1 1 149m
deployment.apps/awx 1/1 1 1 140m
NAME DESIRED CURRENT READY AGE
replicaset.apps/awx-operator-controller-manager-68d787cfbd 1 1 1 149m
replicaset.apps/awx-b85cd74b6 1 1 1 140m
NAME READY AGE
statefulset.apps/awx-postgres 1/1 141m
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/awx-ingress <none> awx-nathan.com 10.188.28.216 80, 443 140m
NAME AGE
awx.awx.ansible.com/awx 141m
But when I try to access the AWX GUI, I get a "Bad Gateway error":
Any pointer would be really appreciated! Thanks.
After deployment, navigating to https://[ip] serves a blank page with the only text being "404 page Not Found".
Running kubectl describe on ingress
$ kubectl -n awx describe ingress
Name: awx-ingress
Namespace: awx
Address: 192.168.0.105
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
awx-secret-tls terminates [hostname]
Rules:
Host Path Backends
---- ---- --------
[hostname]
/ awx-service:80 (10.42.0.11:8052)
Annotations: <none>
Events: <none>
Output of all AWX related pods, etc.
$ kubectl -n awx get awx,all,ingress,secrets
NAME AGE
awx.awx.ansible.com/awx 92m
NAME READY STATUS RESTARTS AGE
pod/awx-operator-controller-manager-5ddf49cc4f-2nwdx 2/2 Running 0 96m
pod/awx-postgres-0 1/1 Running 0 91m
pod/awx-6c645b554-kgv2d 4/4 Running 0 90m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/awx-operator-controller-manager-metrics-service ClusterIP 10.43.98.31 <none> 8443/TCP 96m
service/awx-postgres ClusterIP None <none> 5432/TCP 91m
service/awx-service ClusterIP 10.43.231.213 <none> 80/TCP 90m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/awx-operator-controller-manager 1/1 1 1 96m
deployment.apps/awx 1/1 1 1 90m
NAME DESIRED CURRENT READY AGE
replicaset.apps/awx-operator-controller-manager-5ddf49cc4f 1 1 1 96m
replicaset.apps/awx-6c645b554 1 1 1 90m
NAME READY AGE
statefulset.apps/awx-postgres 1/1 91m
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/awx-ingress <none> awx.spartandev.local 192.168.0.105 80, 443 90m
NAME TYPE DATA AGE
secret/default-token-fjwqp kubernetes.io/service-account-token 3 96m
secret/awx-operator-controller-manager-token-9jb8t kubernetes.io/service-account-token 3 96m
secret/awx-admin-password Opaque 1 92m
secret/awx-postgres-configuration Opaque 6 92m
secret/awx-secret-tls kubernetes.io/tls 2 92m
secret/awx-app-credentials Opaque 3 90m
secret/awx-token-l2cjc kubernetes.io/service-account-token 3 90m
secret/awx-secret-key Opaque 1 92m
secret/awx-broadcast-websocket Opaque 1 91m
AWX Operator: 20.4
"Something went wrong" error on Job settings when trying to edit.
To reproduce, go to Settings in AWX and try to edit the Job settings.
Hello!
I like your straightforward approach to deploying AWX on Kubernetes. We currently have a setup where we deployed AWX 19 on Docker and therefore have an existing environment. Would it be possible to load a PostgreSQL database dump file instead of starting with a fresh install?
Thanks in advance and best regards
Jens
**## Environment
k3s --version
k3s version v1.22.7+k3s1 (8432d7f2)
go version go1.16.10
OS: CentOS Linux release 8.5.2111
Kubernetes/K3s:
kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.7+k3s1", GitCommit:"8432d7f239676dfe8f748c0c2a3fabf8cf40a826", GitTreeState:"clean", BuildDate:"2022-02-24T23:03:47Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.7+k3s1", GitCommit:"8432d7f239676dfe8f748c0c2a3fabf8cf40a826", GitTreeState:"clean", BuildDate:"2022-02-24T23:03:47Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
AWX Operator: 0.20.0
I can't find files that should be created in /data/projects/_???
The playbook:
The template run:
EXEC /bin/sh -c 'rm -f -r /home/runner/.ansible/tmp/ansible-tmp-1649343859.1696134-25-67367807368605/ > /dev/null 2>&1 && sleep 0'
changed: [localhost] => {
"archived": [
"compress.txt"
],
"arcroot": "",
"changed": true,
"dest": "compress.txt.zip",
"dest_state": "archive",
"expanded_exclude_paths": [],
"expanded_paths": [
"compress.txt"
],
"gid": 0,
"group": "root",
"invocation": {
"module_args": {
"attributes": null,
"dest": null,
"exclude_path": [],
"exclusion_patterns": null,
"force_archive": false,
"format": "zip",
"group": null,
"mode": null,
"owner": null,
"path": [
"compress.txt"
],
[...]
TASK [complete message] ********************************************************
task path: /runner/project/awx_compress.yml:9
This runs and says it is Successful.
I source the project and then run the template.
I just can't find where compress.txt.zip is going if at all.
If I run this using ansible-playbook compress.yml on any system, it works and produces the file.
Template info
Name: compress
Job Type: run
Organization: Default
Inventory: Demo Inventory
Project: compress
Execution Environment: Control Plane Execution Environment
Playbook: awx_compress.yml
Forks: 0
Verbosity: 4 (Connection Debug)
Timeout: 0
Show Changes: Off
Job Slicing: 1
Created: 4/6/2022, 5:32:35 PM by admin
Last Modified: 4/7/2022, 10:51:03 AM by admin
Credentials: SSH: Demo Credential
Variables:
Project info
Last Job Status: Successful
Name: compress
Organization: Default
Source Control Type: Git
Source Control Revision: 800c8ae
Source Control URL: http://192.168.99.201/root/compress_playbook.git
Source Control Credential: Scm: local_git_user
Cache Timeout: 0 Seconds
Default Execution Environment: AWX EE (latest)
Project Base Path: /var/lib/awx/projects
Playbook Directory: _8__compress
Created: 4/6/2022, 5:30:36 PM by admin
Last Modified: 4/7/2022, 11:04:00 AM by admin
**
First, a BIG thank you for this useful and well documented project! This makes deployments of AWX a lot easier when using K3S.
I am opening this issue (feel free to redirect me if this is not the correct channel) because my AWX deployment fails following your instructions:
manager_logs.txt
plicas\": 1, \"updatedReplicas\": 1}}}\n\r\nPLAY RECAP *********************************************************************\r\nlocalhost : ok=32 changed=0 unreachable=0 failed=1 skipped=25 rescued=0 ignor
ed=0 \r\n\n","job":"5713703289679536467","name":"awx","namespace":"awx-vincent","error":"exit status 2"}
{"level":"error","ts":1634033628.8462682,"logger":"controller-runtime.manager.controller.awx-controller","msg":"Reconciler error","name":"awx","namespace":"awx-vincent","error":"event runner on failed","stacktrace":"sigs.k8s.io/controller
-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.
func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214"}
{"level":"info","ts":1634034630.7230046,"logger":"logging_event_handler","msg":"[playbook task start]","name":"awx","namespace":"awx-vincent","gvk":"awx.ansible.com/v1beta1, Kind=AWX","event_type":"playbook_on_task_start","job":"789813676
9315806288","EventData.Name":"installer : Patching labels to AWX kind"}
The AWX pods cannot be started:
((0.14.0))$ k get pods
NAME READY STATUS RESTARTS AGE
awx-operator-controller-manager-5486747db4-9xb69 2/2 Running 0 114m
awx-postgres-0 1/1 Running 0 93m
awx-84d5c45999-qxgp9 0/4 Pending 0 93m
Upon inspection, this is because of a persistent volume claim which is not bound:
$ k describe pods awx-84d5c45999-qxgp9
[...]
awx-projects:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: awx-projects-claim
ReadOnly: false
awx-token-57kdv:
Type: Secret (a volume populated by a Secret)
SecretName: awx-token-57kdv
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 94m default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Indeed:
((0.14.0))$ k get persistentvolumeclaims
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
awx-projects-claim Pending awx-projects-volume 95m
postgres-awx-postgres-0 Bound awx-postgres-volume 2Gi RWO awx-postgres-volume 95m
The failure appears because of:
((0.14.0))$ k describe persistentvolumeclaims awx-projects-claim
Name: awx-projects-claim
Namespace: awx-vincent
StorageClass: awx-projects-volume
Status: Pending
Volume:
Labels: <none>
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: awx-84d5c45999-qxgp9
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 37s (x382 over 95m) persistentvolume-controller storageclass.storage.k8s.io "awx-projects-volume" not found
The thing is, I cannot find any definition for this storage class and the cluster does not provide it otherwise:
((0.14.0))$ k get storageclasses.storage.k8s.io
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local-path (default) rancher.io/local-path Delete WaitForFirstConsumer false 7d23h
Two more important notes I think:
The namespace is not awx, rather awx-vincent. I had to change some values under base/ to match for that (files attached: base.zip).
The awx-projects-volume persistent volume references the awx/ namespace, cf. awx/awx-projects-claim in the following output, although I don't know what could be causing the problem (aren't persistent volumes namespace-agnostic?). The describe command above showed it is in the correct namespace too.
((0.14.0))$ k get persistentvolume
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
awx-projects-volume 2Gi RWO Retain Released awx/awx-projects-claim awx-projects-volume 110m
awx-postgres-volume 2Gi RWO Retain Bound awx-vincent/postgres-awx-postgres-0 awx-postgres-volume 110m
awx-postgres-volume one, which seems to use the same notation/storage class for awx-postgres-volume, which is resolved/bound.
Could you let me know what I could be doing wrong here? Thanks for your help!
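The output above shows awx-projects-volume in Released state with a claim reference to awx/awx-projects-claim, while the pending claim lives in awx-vincent. The sketch below is a simplified model of static binding, not the Kubernetes controller's actual logic, that shows why such a volume would be skipped. The dictionary layout and function name are illustrative.

```python
from typing import Any, Dict


def can_bind(pv: Dict[str, Any], pvc: Dict[str, Any]) -> bool:
    """Simplified check of whether a pre-created PV can satisfy a PVC."""
    # A Released volume still belongs to its previous claim.
    if pv["status"] != "Available":
        return False
    if pv["storage_class"] != pvc["storage_class"]:
        return False
    # A pre-set claim reference must match namespace and name exactly.
    ref = pv.get("claim_ref")
    return ref is None or ref == (pvc["namespace"], pvc["name"])


if __name__ == "__main__":
    pv = {"status": "Released", "storage_class": "awx-projects-volume",
          "claim_ref": ("awx", "awx-projects-claim")}
    pvc = {"namespace": "awx-vincent", "name": "awx-projects-claim",
           "storage_class": "awx-projects-volume"}
    print(can_bind(pv, pvc))
```

Under this model the volume would need to become Available again, with its old claim reference cleared or pointing at the awx-vincent claim, before the pending claim could bind. The "storageclass not found" event is probably a side effect of no matching volume being available.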
Thank-you for this awesome guide. I was able to get AWX working with minimal difficulties.
Would it be possible to get an alternate set of documentation that shows how to use an external (unmanaged) PostgreSQL database?
Getting stuck with status ImagePullBackOff on the awx-xxxxxxxxxxxxxx pod. I have attempted this install twice on a clean CentOS 8 Stream server. Server is 4 CPU, 12 GB Ram, 80 GB storage. I have been following the instructions to the T, so not sure what is going wrong.
Attached is a sample of the output from "kubectl -n awx logs -f deployments/awx-operator-controller-manager -c" if it helps.
There are a few major changes in 0.14.0.
make to deploy AWX Operator
Deployment in endless loop, worked once a few days ago, but not anymore
deploy via README.md (with acme issuer)
...has timed out progressing.", "reason": "ProgressDeadlineExceeded", "status": "False", "type": "Progressing"}], "observedGeneration": 1, "replicas": 1, "unavailableReplicas": 1, "updatedReplicas": 1}}}\n\r\nPLAY RECAP *********************************************************************\r\nlocalhost :
ok=41 changed=0 unreachable=0 failed=1 skipped=30 rescued=0 ignored=0
\r\n\n","job":"3116155310435672272","name":"awx","namespace":"awx","error":"exit status 2"}
{"level":"error","ts":1640068891.7860224,"logger":"controller-runtime.manager.controller.awx-controller","msg":"Reconciler error","name":"awx","namespace":"awx","error":"event runner on failed","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214"}
Hi,
Not a bug, but was wondering if backup/restore procedure would work with Azure Storage? If so, could you add an example some time in the future?
BR
I noticed that new versions of AWX and the AWX Operator were released yesterday:
AWX Operator 0.20.1
AWX 21.0.0
It would be great if you could bump the revisions in your repo to support this. Happy to help test this - a couple of bugs in AWX 20 as you've commented on (ansible/awx#11765) were causing problems for us, so it would be great to upgrade.
Hello,
I wanted to add an extra tips page on adding custom proxy settings for the awx-web, awx-task and awx-ee containers, but I do not have permission to push my branch.
Would you mind adding me to the list of contributors? I'm in the early stages of contributing to projects on GitHub, but I'm willing to learn.
Thanks and best regards
Jens
I've modified the AWX password in awx.yaml and deployed with my own custom certificates.
However, after deployment SSL is not enabled and the modified password does not work.
Decoding the secret from kubectl with base64, I can see the password was updated to the one I specified.
But when I changed the admin password through awx-manage (inside the pod), it worked.
Could you please retest whether both SSL and a custom password work?
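Kubernetes stores Secret values base64-encoded, which is why the password had to be decoded to be checked. A minimal sketch of that decoding step follows. The function name and sample value are illustrative, and the repr output helps to spot a stray trailing newline, a common side effect of encoding with echo without -n.

```python
import base64


def decode_secret_value(encoded: str) -> str:
    """Decode one base64-encoded value from a Secret's data field."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # Sample value for illustration only; repr makes hidden whitespace visible.
    print(repr(decode_secret_value("UGFzc3cwcmQh")))
```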
Can you please create a guide for safely updating to a newer version of AWX Operator?
root@u500-cube-server:~/awx-on-k3s# kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 29m
kube-system kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9153/TCP 29m
kube-system metrics-server ClusterIP 10.43.55.221 <none> 443/TCP 29m
kube-system traefik LoadBalancer 10.43.185.225 192.168.5.104 80:31716/TCP,443:31604/TCP 28m
default awx-operator-metrics ClusterIP 10.43.102.194 <none> 8383/TCP,8686/TCP 28m
awx awx-postgres ClusterIP None <none> 5432/TCP 25m
awx awx-service ClusterIP 10.43.251.54 <none> 80/TCP 25m
root@u500-cube-server:~/awx-on-k3s# kubectl -n awx get awx,all,ingress,secrets
NAME AGE
awx.awx.ansible.com/awx 29m
NAME READY STATUS RESTARTS AGE
pod/awx-postgres-0 1/1 Running 0 29m
pod/awx-59ff55b5b-2czpx 4/4 Running 2 28m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/awx-postgres ClusterIP None <none> 5432/TCP 29m
service/awx-service ClusterIP 10.43.251.54 <none> 80/TCP 28m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/awx 1/1 1 1 28m
NAME DESIRED CURRENT READY AGE
replicaset.apps/awx-59ff55b5b 1 1 1 28m
NAME READY AGE
statefulset.apps/awx-postgres 1/1 29m
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/awx-ingress <none> awx.tunninet.com 192.168.5.104 80, 443 28m
NAME TYPE DATA AGE
secret/awx-admin-password Opaque 1 29m
secret/default-token-fwskh kubernetes.io/service-account-token 3 29m
secret/awx-postgres-configuration Opaque 6 29m
secret/awx-secret-tls kubernetes.io/tls 2 29m
secret/awx-app-credentials Opaque 3 28m
secret/awx-token-mdtqh kubernetes.io/service-account-token 3 28m
secret/awx-secret-key Opaque 1 29m
secret/awx-broadcast-websocket Opaque 1 29m
Hi,
Upgraded 14.0.0 to 21.0.0 and sure enough there is a bug... again. Downgrading resulted in an unknown AWX error, so now I have to restore for the first time.
I've deleted everything and have the backup folder here: /data/restore/tower-openshift-backup-2022-05-05-02:01:07
No backup deployment present. Just the files.
Next step: kubectl apply -k restore
Now I am unsure how restore/awxrestore.yaml should look. Is this correct? Should a base AWX be installed first before restoring the PVC?
---
apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: awxrestore-2022-05-05
  namespace: awx
spec:
  deployment_name: awx
  # Parameters to restore from AWXBackup object
  #backup_pvc_namespace: awx
  #backup_name: awxbackup-2021-06-06
  # Parameters to restore from existing files on PVC (without AWXBackup object)
  backup_pvc_namespace: awx
  backup_pvc: awx-backup-claim
  backup_dir: /backups/tower-openshift-backup-2022-05-05-02:01:07
OS: RHEL 8.5
Kubernetes/K3s: v1.22.6+k3s1 (3228d9cb)
AWX Operator: 0.16.1
It may be helpful for new users to this project to be aware of the storage requirements - perhaps in the "Prepare CentOS 8 host" section - as I ran into a couple of problems when I first started using this repository.
I ran out of room in /var as it is on a dedicated filesystem. I've since created a /var/lib/rancher filesystem, which on my base AWX install consumes around 5.6GB of space. I'm unsure if this is going to grow and by how much. Is it possible to add a reference to this directory/filesystem and perhaps some initial sizing suggestions?
Our build has a restrictive umask, so my /data directories were different from yours.
drwxr-x---. 3 root root 18 Feb 16 09:07 /data/postgres
drwxr-x---. 2 root root 6 Feb 16 09:07 /data/postgres/data
drwxr-x---. 2 1000 root 6 Feb 16 09:07 /data/projects
However, I followed the steps https://github.com/kurokobo/awx-on-k3s/blob/main/tips/troubleshooting.md#the-pod-for-postgresql-is-in-crashloopbackoff-state-and-shows-permission-denied-log which resolved the issue. I now perform the following on a fresh install:
sudo chmod 755 /data/postgres /data/postgres/data
Is it worth adding that to the setup docs as it seems like a firm requirement?
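The permission fix above could be wrapped in a small idempotent helper so it is safe to repeat on every fresh install. This is only a sketch: the function name `ensure_mode` is hypothetical, and it assumes GNU `stat` (as shipped with RHEL/CentOS).

```shell
# ensure_mode MODE DIR... : chmod each DIR to MODE only when its mode differs.
# Hypothetical helper; relies on GNU coreutils `stat -c`.
ensure_mode() {
    mode="$1"
    shift
    for dir in "$@"; do
        current="$(stat -c '%a' "$dir")" || return 1
        if [ "$current" != "$mode" ]; then
            chmod "$mode" "$dir" || return 1
            echo "fixed: $dir ($current -> $mode)"
        fi
    done
}

# Usage, as root, with the paths from the listing above:
#   ensure_mode 755 /data/postgres /data/postgres/data
```

A second run prints nothing, since the directories already have the requested mode.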
I created a /data filesystem for my environment - again, unsure if that should be documented. However, I did see this in the base configuration:
postgres_storage_class: awx-postgres-volume
postgres_storage_requirements:
  requests:
    storage: 2Gi
If the database grows beyond this size, is it simply a case of updating that file and running kubectl apply -k base?
Follow the deployment guide on a freshly installed Operating System.
k3s version: v1.22.5+k3s1 (405bf79d)
The installation and configuration of AWX go smoothly. There are no error messages.
But when I try to access AWX via my IP address (http://10.10.0.9), I get the error message "404 page not found". The same happens over https.
What am I missing?
Hello! Your project has been very useful for us in creating running AWX environments.
I have an environment where we are considering "locking down" kubectl interactions to the root user, so I am contemplating changing the kubeconfig mode to 600/640. Is there any risk in doing so? I do not know the architecture well enough to understand why this kubeconfig file is made readable to non-root users in the guide.
Thank you for your consideration.
Hi,
I am experiencing CrashLoopBackOff on the awx-postgres-0 pod. The GUI is not available either (getting 404). Have you experienced this issue, too?
I am running on Ubuntu 20.04.2 LTS.
Thank you for your help and for your great project!
Petr
testssl reports a few vulnerable ciphers and SSL configuration issues on the AWX instance:
SSLv2 not offered (OK)
SSLv3 not offered (OK)
TLS 1 offered (deprecated)
TLS 1.1 offered (deprecated)
TLS 1.2 offered (OK)
TLS 1.3 offered (OK): final
Triple DES Ciphers / IDEA offered
Obsolete CBC ciphers (AES, ARIA etc.) offered
Has server cipher order? no (NOT ok)
Deploy awx-operator and awx-on-k3s with cert-manager.
How can this be solved? Thank you :)
Attempting to restore awx from another 0.14.0 awx installation. The folder was copied.
Restore fails with message /data/backup/tower-openshift-backup-2021-10-19 does not exist, see the backupDirectory status on your AWXBackup for the correct backup_dir.
---
apiVersion: awx.ansible.com/v1beta1
kind: AWXRestore
metadata:
  name: awxrestore-2021-10-20
  namespace: awx
spec:
  # Parameters to restore from existing files on PVC (without AWXBackup object)
  backup_pvc_namespace: awx
  backup_pvc: awx-backup-claim
  backup_dir: /data/backup/tower-openshift-backup-2021-10-19
The backup exists in /data/backup/tower-openshift-backup-2021-10-19
The first attempt resulted in the message that awx-backup-claim did not exist, so I executed kubectl apply -f backup/awxbackup.yml in order to create awx-backup-claim.
For some reason AWX can't see the folder /data/backup/tower-openshift-backup-2021-10-19, which puzzles me.
BR
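One possible explanation is that backup_dir is resolved inside the pod that mounts awx-backup-claim, not on the host. The other AWXRestore example on this page uses a path under /backups, so the sketch below assumes the claim is mounted at /backups and that the host directory behind it is /data/backup. Both are assumptions to verify against your own PV definition, and `to_backup_dir` is a hypothetical helper.

```shell
# to_backup_dir HOST_PATH [HOST_ROOT] [MOUNT_ROOT]
# Rewrite a host path under HOST_ROOT to the matching path under MOUNT_ROOT.
# Defaults are assumptions: PV hostPath /data/backup, mounted at /backups.
to_backup_dir() {
    host_path="$1"
    host_root="${2:-/data/backup}"
    mount_root="${3:-/backups}"
    case "$host_path" in
        "$host_root"/*)
            rel=${host_path#"$host_root"}
            printf '%s\n' "${mount_root}${rel}"
            ;;
        *)
            echo "not under $host_root: $host_path" >&2
            return 1
            ;;
    esac
}

# Usage: print the value to try for backup_dir in the AWXRestore spec.
#   to_backup_dir /data/backup/tower-openshift-backup-2021-10-19
```

If the assumption holds, the value to put in backup_dir would be the printed /backups/... path rather than the host path.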
Hi, thanks for this great guide!
When running jobs with a large number of hosts, we found that we would trigger ansible/awx#10366, which causes the Job in AWX to exit with the message "error" and no summary, even though the EE pod would still finish the playbook in the background.
As described in the issue linked above, the error occurs when container logs are rotated by the kubelet, which causes the Kubernetes log stream that AWX uses to fail.
I can confirm that increasing the maximum container log size, as described in this comment ansible/awx#10366 (comment), works around the issue until the root cause is fixed.
We used the following command to update/install K3s with the increased maximum:
curl -sfL https://get.k3s.io | K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="--kubelet-arg container-log-max-size=150Mi" sh -
Maybe it's worth increasing the maximum even further than 150Mi (from the 10Mi default) for bigger environments.
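To make the limit easy to adjust, the install command above can be parameterized. The sketch below only builds and prints the value for INSTALL_K3S_EXEC; the function name `k3s_log_args` is hypothetical, and 150Mi is simply the value from this report, not a sizing recommendation.

```shell
# k3s_log_args SIZE : print the K3s arguments that raise the kubelet's
# per-container log size limit to SIZE (for example 150Mi).
k3s_log_args() {
    size="${1:?usage: k3s_log_args SIZE}"
    printf '%s\n' "--kubelet-arg container-log-max-size=${size}"
}

# Usage (this runs the real installer, so it is left commented out):
#   curl -sfL https://get.k3s.io | \
#     K3S_KUBECONFIG_MODE="644" INSTALL_K3S_EXEC="$(k3s_log_args 150Mi)" sh -
```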
Hi all
I've got a little problem when trying to update by following the instructions.
Backup of 19.3 ok.
git clone https://github.com/ansible/awx-operator.git >> OK
cd awx-operator >> OK
git checkout 0.14.0 >> OK
export NAMESPACE=awx >> ok
make deploy >> KO
/bin/sh: line 4: tar: command not found
curl: (23) Failed writing body (1349 != 1378)
make: *** [Makefile:95: kustomize] Error 127
Any idea?
Thank you!
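The `tar: command not found` line indicates that `tar` is not installed on the host, so the download step in the kustomize target could not unpack its archive. A pre-flight check along these lines can catch that before running `make deploy`. The helper name `missing_tools` and the list of tools checked are assumptions for illustration, not something the awx-operator Makefile provides.

```shell
# missing_tools CMD... : print each CMD that cannot be found on PATH.
missing_tools() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Usage: check the tools seen in the transcript above before `make deploy`.
#   missing="$(missing_tools git make curl tar)"
#   [ -z "$missing" ] || echo "install first: $missing"
```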
Create a bastion host for all hosts of my inventories
This is not an issue as such, but is there a way to use a bastion host on K3s for all hosts in the inventories used in AWX?
k3s version v1.21.5+k3s2 (724ef700)
go version go1.16.8
Followed your guide and upgraded from 0.14.0 to 0.17.0. First attempt failed due to resources, but worked after deleting the old deployment.
An error occurs when I execute the backup and the following command:
kubectl -n awx logs deployments/awx-operator-controller-manager -c manager --since=60m --tail=4
error: container manager is not valid for pod awx-operator-controller-manager-775b5cfc56-46htw
NAME                                               READY   STATUS    RESTARTS   AGE
awx-postgres-0                                     1/1     Running   3          121d
awx-operator-controller-manager-775b5cfc56-46htw   2/2     Running   0          22h
awx-7f55c57c85-7d6q8                               4/4     Running   0          22h
[root@awx backup]# kubectl -n awx describe pod awx-operator-controller-manager-775b5cfc56-46htw
Name:         awx-operator-controller-manager-775b5cfc56-46htw
Namespace:    awx
Priority:     0
Node:         awx.domain.local/10.6.104.4
Start Time:   Thu, 17 Feb 2022 12:17:54 +0100
Labels:       control-plane=controller-manager
              pod-template-hash=775b5cfc56
Annotations:  <none>
Status:       Running
IP:           10.42.0.129
IPs:
  IP:           10.42.0.129
Controlled By:  ReplicaSet/awx-operator-controller-manager-775b5cfc56
Containers:
  kube-rbac-proxy:
    Container ID:  containerd://2d90e3940e25a32c2d0da5436ddd817b8b3cc3cb2a7a1c612bf3284d1f85cedd
    Image:         gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
    Image ID:      gcr.io/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --v=10
    State:          Running
      Started:      Thu, 17 Feb 2022 12:17:55 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-frw4h (ro)
  awx-manager:
    Container ID:  containerd://408d16164ed4bb4b63f5f8a5e619b73d569a057f7674a93807cf0918e4151c9f
    Image:         quay.io/ansible/awx-operator:0.17.0
    Image ID:      quay.io/ansible/awx-operator@sha256:2ffa0449b9ee0961df3e4794c5da5bcea2a0f7677df2ddad63e07652fd11ef54
    Port:          <none>
    Host Port:     <none>
    Args:
      --health-probe-bind-address=:6789
      --metrics-bind-address=127.0.0.1:8080
      --leader-elect
      --leader-election-id=awx-operator
    State:          Running
      Started:      Thu, 17 Feb 2022 12:17:56 +0100
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:6789/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:6789/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ANSIBLE_GATHERING:   explicit
      ANSIBLE_DEBUG_LOGS:  false
      WATCH_NAMESPACE:     awx (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-frw4h (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-frw4h:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
Any ideas? Thx in advance...
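The describe output above lists two containers, kube-rbac-proxy and awx-manager, so this pod has no container named manager; passing -c awx-manager to kubectl logs is the likely fix. The sketch below picks the name from the pod spec instead of hard-coding it. The helper `pick_manager` is hypothetical, and it assumes the operator container's name ends in "manager".

```shell
# pick_manager NAME... : print the first container name ending in "manager".
# Returns non-zero when no name matches.
pick_manager() {
    for name in "$@"; do
        case "$name" in
            *manager)
                printf '%s\n' "$name"
                return 0
                ;;
        esac
    done
    return 1
}

# Usage against a live cluster (commented out; needs kubectl access).
# $names is deliberately unquoted so it splits into one argument per name.
#   names="$(kubectl -n awx get pod awx-operator-controller-manager-775b5cfc56-46htw \
#            -o jsonpath='{.spec.containers[*].name}')"
#   kubectl -n awx logs deployments/awx-operator-controller-manager \
#     -c "$(pick_manager $names)" --since=60m --tail=4
```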
Hi,
Did a test backup which was executed as expected. The backup data exists. However, restore was not successful.
Not sure which part to pick out from the log but here is an example.
TASK [Create management pod from templated deployment config] ********************************
fatal: [localhost]: FAILED! => {"changed": false, "error": 422, "msg": "Failed to create object: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"Pod \\\\\"awxrestore-2021-08-30-db-management\\\\\" is invalid: [spec.volumes[0].persistentVolumeClaim.claimName: Required value, spec.containers[0].volumeMounts[0].name: Not found: \\\\\"awxrestore-2021-08-30-backup\\\\\"]\",\"reason\":\"Invalid\",\"details\":{\"name\":\"awxrestore-2021-08-30-db-management\",\"kind\":\"Pod\",\"causes\":[{\"reason\":\"FieldValueRequired\",\"message\":\"Required value\",\"field\":\"spec.volumes[0].persistentVolumeClaim.claimName\"},{\"reason\":\"FieldValueNotFound\",\"message\":\"Not found: \\\\\"awxrestore-2021-08-30-backup\\\\\"\",\"field\":\"spec.containers[0].volumeMounts[0].name\"}]},\"code\":422}\\n'", "reason": "Unprocessable Entity", "status": 422}
Did I miss something?
Thanks
I get a 404 error
I checked, and the static files are not in the projects folder.
After the deployment, the /data/folder is empty.
The postgres folder did contain the database.
To check whether the folder permissions were the problem, I set it to 777.
After redeploying, it went back to 755 but is owned by 0:1000 (instead of 1000:0), so something changed it.
I don't understand why there are no files in projects.
Any idea?
(Sorry for my English)
Originally posted by @Randy29800 in #3 (comment)
Hello,
Thank you for this, I really appreciate it! Where can I change the port number? Say I want to run on port 8443 rather than the default port 80.
Thank you!