Comments (13)
Hi Victor,
Hmm, since the client is defined internally in this operator, we can add a method there to refresh the token without affecting the rest of the operator.
You can find the client at kubernetes_job_operator/client.
Sadly I have little time at the moment to address this, but if you find a solution we can integrate it.
from kubernetesjoboperator.
Client file location: https://github.com/LamaAni/KubernetesJobOperator/blob/master/airflow_kubernetes_job_operator/kube_api/client.py
from kubernetesjoboperator.
Hi Zav,
Thanks for your quick response!
Yup, I already saw that the original client is wrapped in the operator's KubeApiRestClient, but according to some comments I saw, like kubernetes-client/python#741 (comment), just refreshing the token is not an option, so you have to create a new client. I didn't look that far, but we would probably need to recreate the configuration too if authentication happens there. Maybe the fastest solution would be to create both of them on every call, which is why I'm saying this could end up causing performance issues in some use cases.
Anyway, knowing that you're not fully available right now, I will try to write some code here. If any other contributor wants to join me, that would be a great help.
Wish me luck :)
from kubernetesjoboperator.
I've got something that works for us, but it's completely custom for our tricky use case. Apart from that, I was forced to base the fix on your 2.0 version because for some reason newer ones are not working. Maybe my colleague @carlosliarte (I think you already know him) can give you more details on that.
Given that I'm in a hurry, I've decided to fork your repo and evolve our custom version from your 2.0 until we both have more time to look at all these details carefully (I wasn't able to run any tests here to check whether our changes break some other supported usage). The idea is to merge those changes into your latest version, make that version work in every case (including ours), and get back to this source. Probably by that time the main issue in the official k8s client will be solved, and we will just need to upgrade the client version, do some QA and publish a new release.
For the record, these are the (ugly) changes that worked for us:
https://github.com/duferdev/KubernetesJobOperator/pull/1
I will leave them in our preproduction environment for a while to check if they're stable.
Feel free to close this issue if you want
from kubernetesjoboperator.
Hi, thanks,
I'll take a look at that and see whether, and how, I can integrate this idea into the operator. I would prefer it to be an option. The recreation of the token can be done, or the recreation of the client in the case of disconnect. It would take a while though, so apologies for that. I have started a new project recently and the load is high.
from kubernetesjoboperator.
Hi
I see you propagated the config file. Can you share your config? I have not read the Amazon documentation yet.
from kubernetesjoboperator.
Hi Zav,
> I would prefer it to be an option. The recreation of the token can be done, or the recreation of the client in the case of disconnect.
Yep, absolutely. Another improvement would be to catch 401 responses in order to reload the config and retry the request once per Unauthorized error.
> It would take a while though, so apologies for that. I have started a new project recently and the load is high.
Don't worry, it seems that we have it under control right now. Also, this is open source, right? :) We understand your situation. You are doing enough and we thank you for that.
> Can you share your config? I have not read the amazon documentation yet.
Do you mean the kube config file? I already posted it in the issue description. That is the config format. The masked parameters are personal tokens, usernames and so on. I don't see how they could be helpful. Anyway, if I'm wrong or you mean something else, just tell me.
Thanks!
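The 401 handling suggested above (reload the config and retry once per Unauthorized error) could look roughly like the sketch below. Everything in it is hypothetical: `UnauthorizedError`, `ReloadingClient` and `client_factory` are illustration names, not part of the operator or of the official k8s client. With the real client, the `except` clause would have to match whatever exception actually carries the 401 status.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class UnauthorizedError(Exception):
    """Hypothetical stand-in for an HTTP 401 raised by the underlying client."""


class ReloadingClient:
    """Holds a client and rebuilds it from fresh config after a 401."""

    def __init__(self, client_factory: Callable[[], object]):
        # Assumption: client_factory reloads the kube config and returns
        # a brand-new client every time it is called.
        self._client_factory = client_factory
        self._client = client_factory()

    def call(self, request: Callable[[object], T]) -> T:
        """Run request(client); on a 401, rebuild the client and retry once."""
        try:
            return request(self._client)
        except UnauthorizedError:
            # The token has probably expired: recreate config + client.
            self._client = self._client_factory()
            # Single retry. A second 401 propagates to the caller.
            return request(self._client)
```

The retry is limited to one attempt on purpose, so a genuinely invalid credential still fails fast instead of looping.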
from kubernetesjoboperator.
Hi
I think I still need to take a look at the underlying process. The PR you made over there, which showed the changes needed to make the config work, was helpful.
I may have time for this in two or three weeks.
Otherwise, if you find a solution I would love a PR.
Best
from kubernetesjoboperator.
Also,
Could you try a simpler command and verify that the issue repeats? E.g.:
...
kind: Pod
...
spec:
  containers:
    - ...
      command: |
        num=1
        while true; do
          echo sleep 10
          sleep 10
          num=$((num + 1))
          if [ "$num" -eq 100 ]; then break; fi
        done
And just sleep for a very long time in your pod? Will that create the same error?
from kubernetesjoboperator.
Yes, it certainly will if the execution lasts longer than 15 minutes, because of AWS EKS security constraints.
Check: https://aws.github.io/aws-eks-best-practices/security/docs/iam/
The token has a time to live (TTL) of 15 minutes after which a new token will need to be generated. This is handled automatically when you use a client like kubectl, however, if you're using the Kubernetes dashboard, you will need to generate a new token and re-authenticate each time the token expires.
Right now, it seems that the only way we can refresh those credentials is by reloading the whole config and creating a new client with it. The open issue linked in the issue description explains why in detail.
But again, this only happens in some corner cases where you are using the k8s Python client against an AWS EKS cluster. If you want to reproduce this locally, you will need to bring up a k8s cluster that emulates the AWS behaviour of expiring k8s API tokens after 15 minutes.
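To avoid paying the cost of reloading the config and recreating the client on every call, one option is to rebuild them only once the current client is older than some limit below the 15 minute token TTL. This is a minimal sketch under that assumption. `ExpiringClientCache`, `client_factory` and the 10 minute default are made up for illustration and are not taken from the operator or the k8s client.

```python
import time
from typing import Callable, Optional


class ExpiringClientCache:
    """Returns a cached client and rebuilds it once it is too old."""

    def __init__(
        self,
        client_factory: Callable[[], object],
        max_age_seconds: float = 600.0,  # assumption: safety margin under the 15 min TTL
        clock: Callable[[], float] = time.monotonic,
    ):
        # Assumption: client_factory reloads the whole config and returns
        # a new client built from it.
        self._client_factory = client_factory
        self._max_age = max_age_seconds
        self._clock = clock
        self._client: object = None
        self._created_at: Optional[float] = None

    def get(self) -> object:
        """Return the current client, recreating it if it has expired."""
        now = self._clock()
        if self._created_at is None or now - self._created_at >= self._max_age:
            self._client = self._client_factory()
            self._created_at = now
        return self._client
```

Callers would use `cache.get()` right before each request instead of holding on to a client, so a long-running job picks up a fresh token without recreating the client on every call.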
from kubernetesjoboperator.
But that would not matter. We can recreate the client internally in the wrapper and download/update new creds.
I just need to understand how that happens and how to catch it.
If you don't mind trying that with the sleep command I sent, it would produce an example that we can put in the examples until I am available to solve the issue for good.
from kubernetesjoboperator.
Would love a PR on that last one.
from kubernetesjoboperator.
Hi, I still have not had time to fix this. Also, I have no access to the AWS cloud as of now.
from kubernetesjoboperator.