hystax / optscale

FinOps and MLOps platform to run ML/AI and regular cloud workloads with optimal performance and cost.

Home Page: https://hystax.com

License: Apache License 2.0

Dockerfile 0.29% Python 60.29% Mako 0.02% HTML 6.73% JavaScript 0.46% Shell 0.61% Mustache 0.02% Smarty 0.12% HCL 0.01% TypeScript 31.46%
aws azure devops gcp kubernetes cost-optimization finops mlops ml cloud-cost

optscale's Introduction

⭐ Drop a star to support OptScale ⭐

OptScale is an open source FinOps and MLOps platform that provides cloud cost optimization for all types of organizations and MLOps capabilities like experiment tracking, model versioning, ML leaderboards.



OptScale schema



MLOps capabilities
  • ML Leaderboards with candidates and qualifications
  • Dataset and model tracking and versioning
  • Run metrics and experiment tracker
  • Hypertuning integrated with Optuna
  • Training launcher
  • ML Model training profiler

FinOps and cloud cost optimization
  • Optimal utilization of Reserved Instances, Savings Plans, and Spot Instances
  • Unused resource detection
  • R&D resource power management and rightsizing
  • S3 duplicate object finder
  • Resource bottleneck identification
  • Optimal instance type and family selection
  • Databricks support
  • S3 and Redshift instrumentation
  • VM Power Schedules

You can check the OptScale live demo to explore product features in a pre-generated demo organization.

Learn more about the Hystax OptScale platform and its capabilities at our website.

Demos

  • ML Tasks
  • ML Leaderboards
  • Experiment tracking
  • ML model profiling integration
  • Datasets
  • Hypertuning
  • Databricks connection
  • Cost and performance recommendations
  • Cost geo map
  • VM Power Schedules
  • Reserved Instances and Savings Plans
  • Cost breakdown by owner

OptScale components and architecture



Getting started

Minimum hardware requirements for an OptScale cluster: CPU: 8+ cores, RAM: 16 GB, SSD: 150+ GB.

An NVMe SSD is recommended.
Required OS: Ubuntu 20.04.
The current installation process does not work on Ubuntu 22.04.
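
The requirements above can be checked before starting the installation. The following is a minimal sketch, not part of optscale-deploy: the function names, the 5% RAM tolerance, and the use of /proc/meminfo (Linux only) are assumptions made for illustration.

```python
import os
import shutil

# Thresholds taken from the requirements listed above.
MIN_CORES = 8
MIN_RAM_GB = 16
MIN_DISK_GB = 150
# Assumption: tolerate a small shortfall, because the kernel reports
# slightly less memory than the nominal RAM size.
RAM_TOLERANCE = 0.95


def read_mem_total_gb(meminfo_text):
    """Return MemTotal from /proc/meminfo text, converted from kB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 ** 2)
    raise ValueError("MemTotal line not found")


def check_host(cores, ram_gb, disk_gb):
    """Return a list of problems; an empty list means the host qualifies."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"CPU: {cores} cores, need {MIN_CORES}+")
    if ram_gb < MIN_RAM_GB * RAM_TOLERANCE:
        problems.append(f"RAM: {ram_gb:.1f} GB, need {MIN_RAM_GB}")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.0f} GB, need {MIN_DISK_GB}+")
    return problems


def check_this_machine(path="/"):
    """Collect the values from the local Linux host and run the check."""
    with open("/proc/meminfo") as f:
        ram_gb = read_mem_total_gb(f.read())
    disk_gb = shutil.disk_usage(path).total / (1024 ** 3)
    return check_host(os.cpu_count() or 0, ram_gb, disk_gb)
```

Calling check_this_machine() on the target host returns an empty list when all three thresholds are met. The sketch does not check the OS version or whether the disk is an SSD.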

Installing required packages

Run the following commands:

sudo apt update ; sudo apt install git python3-venv python3-dev sshpass

Pulling optscale-deploy scripts

Clone the repository

git clone https://github.com/hystax/optscale.git

Change current directory:

cd optscale/optscale-deploy

Preparing virtual environment

Run the following commands:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Kubernetes installation

Run the following command (the comma after the IP address is required):

ansible-playbook -e "ansible_ssh_user=<user>" -k -K -i "<ip address>," ansible/k8s-master.yaml

where <user> is the actual username and <ip address> is the host IP address. The IP address should be the private address of the machine; you can check it with

ip a

If your deployment server is the service-host server, add "ansible_connection=local" to the ansible command.
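
The trailing comma and the private-address rule are easy to get wrong when the command is typed by hand. The sketch below builds the argument list for the command above; build_k8s_master_command is a hypothetical helper, and passing ansible_connection=local as a second extra variable inside -e is an assumption about how to add it to the command.

```python
import ipaddress


def build_k8s_master_command(user, ip, local=False):
    """Build the ansible-playbook argument list for the k8s-master playbook."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    if not addr.is_private:
        raise ValueError(f"{ip} is not a private address")

    extra_vars = f"ansible_ssh_user={user}"
    if local:
        # Assumption: the connection type is passed as an extra variable.
        extra_vars += " ansible_connection=local"

    return [
        "ansible-playbook",
        "-e", extra_vars,
        "-k", "-K",
        # The trailing comma makes Ansible treat the value as a host list.
        "-i", f"{addr},",
        "ansible/k8s-master.yaml",
    ]
```

The returned list can be passed to subprocess.run from the optscale-deploy directory; the -k and -K flags still prompt for the SSH and sudo passwords.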

Creating user overlay

Edit the overlay file, optscale-deploy/overlay/user_template.yml; see the comments in the overlay file for guidance.

Cluster installation

Run the following command:

./runkube.py --with-elk  -o overlay/user_template.yml -- <deployment name> <version>

or, if you want to use a socket:

./runkube.py --use-socket --with-elk  -o overlay/user_template.yml -- <deployment name> <version>

The deployment name must follow RFC 1123: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/
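
A deployment name can be validated before it is passed to runkube.py. The sketch below checks the RFC 1123 label form described on the linked Kubernetes page (lowercase alphanumerics and '-', starting and ending with an alphanumeric, at most 63 characters). Whether runkube.py expects the label form or the longer subdomain form is an assumption here, and is_rfc1123_label is a hypothetical helper.

```python
import re

# RFC 1123 label: lowercase alphanumerics or '-', alphanumeric at both ends.
_LABEL_RE = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
_MAX_LABEL_LENGTH = 63


def is_rfc1123_label(name):
    """Return True if name is a valid RFC 1123 label."""
    return (
        len(name) <= _MAX_LABEL_LENGTH
        and _LABEL_RE.fullmatch(name) is not None
    )
```

For example, is_rfc1123_label("optscale") passes, while names with uppercase letters, underscores, or a leading or trailing hyphen are rejected.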

version:

  • Use a hystax/optscale git tag (e.g. 2023110701-public) if you use the public OptScale version.
  • Use your own tag if you build your own OptScale images (e.g. latest).

Please note: if you use key authentication, you should have the required key (id_rsa) on the machine.

Cluster update

Run the following command:

./runkube.py --with-elk  --update-only -- <deployment name>  <version>

Get the IP address for HTTP(S) access:

kubectl get services --field-selector metadata.name=ngingress-nginx-ingress-controller

Troubleshooting

In case of the following error:

fatal: [172.22.24.157]: FAILED! => {"changed": true, "cmd": "kubeadm init --config /tmp/kubeadm-init.conf --upload-certs > kube_init.log", "delta": "0:00:00.936514", "end": "2022-11-30 09:42:18.304928", "msg": "non-zero return code", "rc": 1, "start": "2022-11-30 09:42:17.368414", "stderr": "W1130 09:42:17.461362  334184 validation.go:28] Cannot validate kube-proxy config - no validator is available\nW1130 09:42:17.461709  334184 validation.go:28] Cannot validate kubelet config - no validator is available\n\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/\nerror execution phase preflight: [preflight] Some fatal errors occurred:\n\t[ERROR Port-6443]: Port 6443 is in use\n\t[ERROR Port-10259]: Port 10259 is in use\n\t[ERROR Port-10257]: Port 10257 is in use\n\t[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists\n\t[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists\n\t[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists\n\t[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists\n\t[ERROR Port-10250]: Port 10250 is in use\n\t[ERROR Port-2379]: Port 2379 is in use\n\t[ERROR Port-2380]: Port 2380 is in use\n\t[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty\n[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["W1130 09:42:17.461362  334184 validation.go:28] Cannot validate kube-proxy config - no validator is available", "W1130 09:42:17.461709  334184 validation.go:28] Cannot 
validate kubelet config - no validator is available", "\t[WARNING IsDockerSystemdCheck]: detected \"cgroupfs\" as the Docker cgroup driver. The recommended driver is \"systemd\". Please follow the guide at https://kubernetes.io/docs/setup/cri/", "error execution phase preflight: [preflight] Some fatal errors occurred:", "\t[ERROR Port-6443]: Port 6443 is in use", "\t[ERROR Port-10259]: Port 10259 is in use", "\t[ERROR Port-10257]: Port 10257 is in use", "\t[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists", "\t[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists", "\t[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists", "\t[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists", "\t[ERROR Port-10250]: Port 10250 is in use", "\t[ERROR Port-2379]: Port 2379 is in use", "\t[ERROR Port-2380]: Port 2380 is in use", "\t[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty", "[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}

run the following commands to reset k8s and retry the installation:

sudo kubeadm reset -f
ansible-playbook -e "ansible_ssh_user=<user>" -k -K -i "<ip address>," ansible/k8s-master.yaml

In case of the following error during cluster initialization:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='172.22.24.157', port=2376): Max retries exceeded with url: /v1.35/auth (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f73ca7c3340>: Failed to establish a new connection: [Errno 111] Connection refused'))

check that the Docker port is open:

sudo netstat -plnt | grep 2376

and open the port in the Docker service config:

sudo nano /etc/systemd/system/docker.service

add this line (do not forget to close the Docker port after installing OptScale):

ExecStart=/usr/bin/dockerd -H fd:// -H tcp://0.0.0.0:2376

then reload the config and restart Docker:

sudo systemctl daemon-reload
sudo service docker restart
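
The port check from the steps above can also be done without netstat. This is a minimal sketch using only the Python standard library; is_port_open is a hypothetical helper, and port 2376 is the Docker port taken from the error message above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Running is_port_open("<ip address>", 2376) before and after restarting Docker shows whether the daemon is listening. The same call can confirm that the port is closed again after the installation.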

Roadmap

  • Cost plugin for MLflow, WanDB, and neptune.ai
  • Integration with Optuna to optimize Reserved Instance and other hardware parameter usage
  • Model versioning
  • Better hardware selection recommendations based on usage patterns and algorithms

Documentation

Read the full OptScale documentation 📖

Contributing

Please read and accept our Contribution Agreement before submitting pull requests.

Community

Hystax drives FinOps & MLOps methodology and has crafted a community of FinOps-related people. The community discusses FinOps & MLOps best practices, our experts offer users how-tos and technical recommendations, and provide ongoing details and updates regarding the open-source OptScale solution.

You can check it out on the FinOps and MLOps in practice website.

Contacts

Feel free to reach out to us with questions, feedback, or ideas at [email protected]. You can check out the latest news from Hystax at:

optscale's People

Contributors

ab-hystax, artm-hx, ek-hystax, elikkatzgit, hx-nick, maxb-hystax, mirlena777, nexusriot, nk-hystax, sd-hystax, stanfra, tguisep, tm-hystax, v-hx

optscale's Issues

Connecting an AWS account without static IAM user

Hello,

I've deployed OptScale and now want to add a data source, but my company has a security policy to not allow creation of static IAM users. Is there a possibility to add the data source in any other way, like for example using EC2 IAM role?

There is a document describing the manual import of CUR: https://hystax.com/documentation/optscale/e2e_guides/e2e_aws_root_manual_import.html
which is a possible solution, but it also needs a data source to be created with AWS credentials first.

Thanks in advance,
Paweł

GoogleMap API key included

Hello,

By default, the Google Maps API key is included in the Docker image ( AIzaSyAVsTq9KVpPWuIIDzhhvwOFvpnqJeldmoQ ).
You should remove it. It seems to be included so that the Google map loads without errors for development purposes.

Thomas.

Overlay Setup for AWS

The instructions say to modify overlay/user_template.yml.

I want to set this up for AWS only.
What AWS permissions does optscale need?
What would go in here?

    pricing_data:
      dataset_name: pricing_dataset
      table_name: cloud_pricing_export
    project_id: yourprojectid

How many of these items, if any, can be configured after the install?

ELK/grafana password

Hello,

I'm looking for the user/password of the nginx proxy for the ELK stack and Grafana (and possibly the Kibana/Grafana passwords as well).
I couldn't find them (source code, secrets, configmap...).

service/elkservice service/grafana

Thanks,

Thomas.

Currency

If I change the currency, the symbol changes but the value remains the same. Any idea what's happening here?

Where is the money ?

Hello,

I'm using the API to process statistics.
But for one of my algorithms, there is a gap (10%) between my results and what I should get.
Through the API, I try to retrieve the cost of each EC2 instance:

Eg:
Cost of march for EC2 from aws billing us-east-1 : 20k$
Cost of march for EC2 from the Optscale UI us-east-1 : 20k$ (all good)

My algorithm:

===> Get all the EC2 for the region:
          cloud_resources = /restapi/v2/organizations/{self.organization_id}/cloud_resources 
          params:  region: us-east-1 
=====> for each **instance** in **cloud_resources**:
         resource_cost = /restapi/v2/resources/{resource_id}/raw_expenses
         params:  start_date=1677628800,  end_date=1680307199  (timestamp retrieved from UX call to be sure to have same timestamps for march)
         total_cost.append(resource_cost['total_cost'])

=== Finally:
sum(total_cost) ==>  18k$

So, where does the $2k difference come from?
I discovered that some EC2 instances are missing from cloud_resources.
Does /restapi/v2/organizations/{self.organization_id}/cloud_resources return only the current inventory, or also all previous resources? (That might explain the gap.)
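
For anyone trying to reproduce this, the algorithm above can be sketched as follows. The HTTP layer is left out: fetch_raw_expenses is a hypothetical callable that wraps GET /restapi/v2/resources/{resource_id}/raw_expenses and returns the decoded JSON body, and the only response field assumed is total_cost, as used in the post. month_bounds reproduces the two timestamps quoted for March 2023, assuming they are UTC.

```python
import calendar
from datetime import datetime, timezone


def month_bounds(year, month):
    """Return (start_ts, end_ts) for a calendar month in UTC, inclusive."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    days_in_month = calendar.monthrange(year, month)[1]
    start_ts = int(start.timestamp())
    # The last second of the month, matching the end_date used above.
    end_ts = start_ts + days_in_month * 86400 - 1
    return start_ts, end_ts


def total_cost_for_resources(resource_ids, fetch_raw_expenses, start_ts, end_ts):
    """Sum total_cost over all given resources for the date range.

    fetch_raw_expenses(resource_id, start_ts, end_ts) is a hypothetical
    callable supplied by the caller; it must return a dict with a
    "total_cost" key.
    """
    total = 0.0
    for resource_id in resource_ids:
        body = fetch_raw_expenses(resource_id, start_ts, end_ts)
        total += body["total_cost"]
    return total
```

With this structure, the sum can only cover resources that cloud_resources returns, so any instance missing from that listing is silently excluded from the total.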

Thanks.

Where to find Detailed Documentation

Is there any detailed documentation on this system? The installation instructions are very minimal, and I have not yet found any instructions on how to use it after it's installed, or even how to log in.

FEATURE REQUEST: Use Docker socket instead of port

The line in runkube.py that creates a Docker client expects the daemon to be exposed on a local TCP port:

cl = DockerClient(base_url='tcp://{0}:{1}'.format(node, self.dport))

However, the default on most operating systems is a socket, and exposing the TCP port is advanced. This can be addressed by simply referencing the socket address instead:

cl = DockerClient(base_url='unix://var/run/docker.sock')

Ideally, it would check for both and use the first one it finds.
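
A rough sketch of that "check both" idea, assuming the socket lives at /var/run/docker.sock. pick_docker_base_url is a hypothetical helper, not code from runkube.py; it only chooses the base_url string and leaves client creation to the existing code. The socket URL is formatted the same way as in the snippet above.

```python
import os
import stat

DEFAULT_SOCKET = "/var/run/docker.sock"


def pick_docker_base_url(node, dport, socket_path=DEFAULT_SOCKET):
    """Prefer the local Docker socket; fall back to the TCP endpoint."""
    try:
        mode = os.stat(socket_path).st_mode
    except OSError:
        mode = None  # path missing or not accessible

    if mode is not None and stat.S_ISSOCK(mode):
        # Same form as the snippet above: unix://var/run/docker.sock
        return "unix://" + socket_path.lstrip("/")

    return "tcp://{0}:{1}".format(node, dport)
```

The result would replace the hard-coded base_url argument. The sketch only tests that the socket file exists, not that the current user is allowed to connect to it.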

How billing data is handled

@sd-hystax May I ask how the billing data is processed? What is the logic for importing, cleaning, and aggregating it, and how is duplicate data handled? Is there any relevant documentation? Taking AWS as an example: after the first import of bills into the mongo raw_expenses collection, how are updates and inserts handled on the second import? How is the original data cleaned and aggregated into the mongo resources and clickhouse expenses tables?

OVH integration

Hello,

We are using OVH as a secondary cloud provider.
OptScale does not support this cloud provider.

So:

  • Is there any plan or roadmap for this provider?

If not, do you have any advice or a starting point for integrating a new provider?
It would help to have an example commit or branch of a cloud provider integration. Is one available somewhere?

Thanks,

Thomas.

Unittests

I am unable to run the test cases. It gives me an auth_server or cloud_adapter module error, even though they are folders.

Recommendation "eligible for upgrade" fail

Hello,

One of the recommendations is failing:
image

I could not find the cause (IAM looks OK), and the logs are not very informative.
I tried with the API /insider/v2/swagger/spec.html, without result.

cloud_type for amazon aws = aws_cnr ?

Thanks.

Reflect modifications

If I modify the files for R&D purposes, the changes are not reflected in the browser. I also don't know how to restart the application. Can you help me with it?

Google's Resource view is broken?

When I try to view the Resource tab, it looks like this.
I tried on another installation, which shows the same view.

Currency : IDR
Env : AWS Ec2

image

Azure Enterprise Agreement

OptScale can't read billing data from an Azure Enterprise Agreement subscription. I suspect that Azure Enterprise Agreement subscriptions use a different API.

Container

Can you please tell me in which container all the images are saved? When I make changes in my repository, they are not reflected.

Security: remove default account

Hello,

A default account is included:
user_root = User('root', type_root, '[email protected]', 'p@ssw0rd',

I tried it and checked through the API/UX.
This account is active. For security reasons, that should definitely not be the case...

Thomas.

TASK [k8s-configure : Install nginx ingress controller] - connect: no route to host

I'm having issues deploying optscale on a VM in Oracle Cloud. Can you help?

`TASK [k8s-configure : Install nginx ingress controller] *******************************************************************************************************************************
fatal: [10.0.0.7]: FAILED! => {"changed": true, "cmd": "helm upgrade --install ngingress stable/nginx-ingress --set rbac.create=true --set controller.hostNetwork=true --set controller.extraArgs.default-ssl-certificate=default/defaultcert --set controller.kind=DaemonSet", "delta": "0:00:02.113905", "end": "2023-04-17 16:09:18.639070", "msg": "non-zero return code", "rc": 1, "start": "2023-04-17 16:09:16.525165", "stderr": "WARNING: This chart is deprecated\nError: UPGRADE FAILED: Get "https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=NAME%!D(MISSING)ngingress%!C(MISSING)OWNER%!D(MISSING)TILLER%!C(MISSING)STATUS%!D(MISSING)DEPLOYED\": dial tcp 10.96.0.1:443: connect: no route to host", "stderr_lines": ["WARNING: This chart is deprecated", "Error: UPGRADE FAILED: Get "https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=NAME%!D(MISSING)ngingress%!C(MISSING)OWNER%!D(MISSING)TILLER%!C(MISSING)STATUS%!D(MISSING)DEPLOYED": dial tcp 10.96.0.1:443: connect: no route to host"], "stdout": "UPGRADE FAILED\nError: Get "https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=NAME%!D(MISSING)ngingress%!C(MISSING)OWNER%!D(MISSING)TILLER%!C(MISSING)STATUS%!D(MISSING)DEPLOYED\": dial tcp 10.96.0.1:443: connect: no route to host", "stdout_lines": ["UPGRADE FAILED", "Error: Get "https://10.96.0.1:443/api/v1/namespaces/kube-system/configmaps?labelSelector=NAME%!D(MISSING)ngingress%!C(MISSING)OWNER%!D(MISSING)TILLER%!C(MISSING)STATUS%!D(MISSING)DEPLOYED": dial tcp 10.96.0.1:443: connect: no route to host"]}

NO MORE HOSTS LEFT ********************************************************************************************************************************************************************

PLAY RECAP ****************************************************************************************************************************************************************************
10.0.0.7 : ok=68 changed=59 unreachable=0 failed=1 skipped=20 rescued=0 ignored=0`

Update / Migration

Hello,

I'm running a previous version of OptScale.

I'm interested in the latest changes (specifically the BI module!), but the number of changes looks very high.

So, I have two possibilities:

  • Try an upgrade of the current installation (after a backup)
  • Install a new version running alongside the current one and import my parameters (I wrote automation scripts to integrate everything quickly; I'll try to clean them up and share them)

What do you suggest?

Are you planning another massive update in the coming month? (It may be worth waiting a little in that case.)

Anyway, good work !

Thomas.

Different RDS values when comparing AWS Cost Explorer and OptScale Cost Explorer

I compared the costs of services obtained from AWS Cost Explorer and from the OptScale Cost Explorer. The value reported by the OptScale Cost Explorer was $1200 higher than the value from AWS Cost Explorer. I tried to work out what could have happened (values assigned to different services, a bug, a counting error, etc.) but I didn't find an answer.
Attached is an Excel file showing the daily comparison of the values obtained and the total for each platform. The biggest difference is in the RDS service, which is more than $800 higher in the OptScale Cost Explorer than in AWS Cost Explorer.
analyze cost optscale aws cost explorer.xlsx

GCP Data Source is not active

Hi,
I tried to install and configure OptScale in my private GCP project, but I cannot see a GCP section in the "Connect Data Source" screen.
Does GCP support come with another version?

image

AWS - Pricing force update collection

Hello,

AWS prices are stored in mongodb/restapi/aws_prices

Is there a way to force an update of this collection (an API call, for example)?
I have some issues with it, and this would help with diagnosis.

Thanks.

Some files are missing

I am unable to edit the files and build the Docker image due to missing files.
In NGUI, .env and scripts/prune_node_modules.sh are missing.
In rest-api, pulling pip libraries from http://pypi.dts.loc:8080/pip/ does not work, the rest_api_server/recommendation_cleanup_scripts file is missing, and live_demo.json is missing,
and many more...

The solution is currently being loaded. How to check if it is really processing?

"The solution is currently being loaded." This has been the status for the past 5 hours, with nothing shown on the UI. How do I check whether any kind of processing is happening? The pods show as Evicted, Running, Completed, Pending, or ErrImageNeverPull, but is there a way to check the logs or database to see if anything has been dumped or processed?

Helm task fails while running the Ansible command

I am using Ubuntu 20.04 and k8s version 1.17. Is this version of k8s still supported?
TASK [k8s-configure : Install nginx ingress controller] ****************************************************************************************************************fatal: [10.0.0.4]: FAILED! => {"changed": true, "cmd": "helm upgrade --install ngingress stable/nginx-ingress --set rbac.create=true --set controller.hostNetwork=true --set controller.extraArgs.default-ssl-certificate=default/defaultcert --set controller.kind=DaemonSet", "delta": "0:10:02.374654", "end": "2023-04-12 18:00:04.699181", "msg": "non-zero return code", "rc": 1, "start": "2023-04-12 17:50:02.324527", "stderr": "E0412 17:50:04.692378 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:04 socat[138996] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:50:05.699383 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:05 socat[138999] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:50:07.152980 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:07 socat[139035] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:50:09.921417 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:09 socat[139045] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:50:13.758916 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 
58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:13 socat[139068] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:50:21.097142 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:21 socat[139115] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:50:30.276485 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:30 socat[139186] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:50:48.744004 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:48 socat[139290] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:51:10.279392 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:51:10 socat[139402] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:51:44.926914 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:51:44 socat[139577] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:52:57.223321 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:52:57 socat[139981] E connect(5, AF=2 
10.0.0.4:44134, 16): Connection refused\nE0412 17:54:30.179969 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:54:30 socat[140481] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nWARNING: This chart is deprecated\nE0412 17:55:04.693832 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:04 socat[140662] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:55:05.700376 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:05 socat[140664] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:55:07.626284 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:07 socat[140680] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:55:10.145650 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:10 socat[140700] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:55:14.077114 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:14 socat[140719] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:55:21.212406 138988 portforward.go:400] an error occurred 
forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:21 socat[140756] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:55:31.452311 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:31 socat[140818] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:55:50.435016 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:50 socat[140919] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:56:12.910805 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:56:12 socat[141045] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:56:52.309720 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:56:52 socat[141258] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:58:06.678200 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:58:06 socat[141648] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nE0412 17:59:41.379685 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 
1: 2023/04/12 17:59:41 socat[142173] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused\nError: UPGRADE FAILED: context deadline exceeded", "stderr_lines": ["E0412 17:50:04.692378 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:04 socat[138996] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:50:05.699383 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:05 socat[138999] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:50:07.152980 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:07 socat[139035] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:50:09.921417 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:09 socat[139045] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:50:13.758916 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:13 socat[139068] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:50:21.097142 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:21 socat[139115] E connect(5, AF=2 
10.0.0.4:44134, 16): Connection refused", "E0412 17:50:30.276485 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:30 socat[139186] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:50:48.744004 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:50:48 socat[139290] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:51:10.279392 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:51:10 socat[139402] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:51:44.926914 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:51:44 socat[139577] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:52:57.223321 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:52:57 socat[139981] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:54:30.179969 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:54:30 socat[140481] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "WARNING: This chart is deprecated", "E0412 17:55:04.693832 138988 portforward.go:400] an 
error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:04 socat[140662] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:55:05.700376 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:05 socat[140664] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:55:07.626284 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:07 socat[140680] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:55:10.145650 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:10 socat[140700] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:55:14.077114 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:14 socat[140719] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:55:21.212406 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:21 socat[140756] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:55:31.452311 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 
58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:31 socat[140818] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:55:50.435016 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:55:50 socat[140919] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:56:12.910805 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:56:12 socat[141045] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:56:52.309720 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:56:52 socat[141258] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:58:06.678200 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:58:06 socat[141648] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "E0412 17:59:41.379685 138988 portforward.go:400] an error occurred forwarding 42479 -> 44134: error forwarding port 44134 to pod 58b922e7943c6e9494360d5780fa390aa7ea142a4401c0e7af213f3178ebd009, uid : exit status 1: 2023/04/12 17:59:41 socat[142173] E connect(5, AF=2 10.0.0.4:44134, 16): Connection refused", "Error: UPGRADE FAILED: context deadline exceeded"], "stdout": "UPGRADE FAILED\nError: context deadline exceeded", "stdout_lines": ["UPGRADE FAILED", "Error: context deadline exceeded"]}

Unable to Deploy

Hello,

I am trying to deploy an instance of OptScale using the instructions provided here. I noticed the following issues when the ansible-playbook command is run:

TASK [common : Copy tools requirements file] ************************************************************************************************************************************************************************************************
fatal: [10.0.15.214]: FAILED! => {"changed": false, "checksum": "ecae9ee210af06e1df285c88c933f53a583203df", "msg": "Destination /optscale/tools not writable"}

I overcame this by running chmod -R 0777 /optscale
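
The "Destination /optscale/tools not writable" message means the user the task runs as cannot write to that directory. A minimal Python sketch for checking this before running the playbook follows. The `is_writable` helper is hypothetical and not part of OptScale or Ansible; the path is taken from the error message above. It is a narrower diagnostic than opening the whole tree with chmod -R 0777.

```python
import os


def is_writable(path: str) -> bool:
    """Return True if the current user can create files in `path`.

    Hypothetical helper for illustration; not part of OptScale or Ansible.
    """
    # Creating an entry in a directory needs both write and search (execute) permission.
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Path taken from the error message in this report.
    target = "/optscale/tools"
    print(f"{target}: {'writable' if is_writable(target) else 'not writable'}")
```

Run it as the same user that Ansible connects as, since the result depends on the calling user's permissions.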

TASK [k8s-configure : Create .kube] *********************************************************************************************************************************************************************************************************
fatal: [10.0.15.214]: FAILED! => {"changed": false, "gid": 1000, "group": "ubuntu", "mode": "0775", "msg": "chown failed: [Errno 1] Operation not permitted: b'/home/ubuntu/.kube'", "owner": "ubuntu", "path": "/home/ubuntu/.kube", "size": 4096, "state": "directory", "uid": 1000}

I have followed the instructions as per the page. What could be going wrong here? Anything that I have missed (that's not part of the deployment guide)?

Is the code in a shape that can be forked and used as is?

It looks like the code doesn't have its dependencies (versions) set up correctly, so I couldn't get it to run locally as is. However, running the deployment scripts works, since the images are downloaded from the private Docker repository of OptScale. Is that by design, or am I wrong in my assessment?

Unauthorized requests sent to servers

I deployed OptScale on my server from its Git repository. During this process it sent unauthorized requests to other anonymous servers. I tried twice, and both times I received an IT regulation email regarding a security breach.
How can I solve this? Also, can I get a demo of how to attach an AWS root user, as it doesn't accept the root account?

Reflect Modifications

If I modify the files for R&D purposes, the changes are not reflected in the browser. I also don't know how to restart the application. Can you help me with it?

Connect to GCP data source

Hi, I want to connect a GCP data source following your docs: https://hystax.com/documentation/optscale/e2e_guides/e2e_gcp.html
I have some questions:

  • I have one organization with about 100 projects and one billing project. In which project should I create the service account (SA)?
    I saw that the SA has some compute roles. If I create the SA in the billing project, how can it view compute data in the other projects?
  • I think I should deploy the self-hosted VM in the billing project and grant the SA roles at the organization level. Is that right?

Google Cloud billing report attempt failed.

I spun up an Ubuntu 20.04.6 LTS machine in AWS and followed setup documentation found here. The cluster comes up without issue.

However, after following the documentation found here and setting up credentials, I receive:

Object name : GCP Organization
Object type : report_import
Description : Billing data import for cloud account GCP Organization (#####################) failed: 'NoneType' object has no attribute 'lower'
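
The text "'NoneType' object has no attribute 'lower'" is the message of a Python AttributeError raised when `.lower()` is called on `None`. The report does not show which value is `None` during the import, so the sketch below uses hypothetical `normalize` and `normalize_safe` functions only to reproduce the same message and show one way to guard against a missing value. It is not OptScale's code.

```python
from typing import Optional


def normalize(value: Optional[str]) -> str:
    """Lower-case a string; hypothetical stand-in for the failing call."""
    return value.lower()  # raises AttributeError when value is None


def normalize_safe(value: Optional[str], default: str = "") -> str:
    """Same as normalize, but tolerate a missing value."""
    return value.lower() if value is not None else default


if __name__ == "__main__":
    try:
        normalize(None)
    except AttributeError as exc:
        print(exc)
    print(repr(normalize_safe(None)))
```

The full traceback from the failing service's logs would show which field is actually missing.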

docker version not found issue

TASK [common : Install docker-ce] ********************************************************************************************************************************************************************
fatal: [172.31.29.83]: FAILED! => {"cache_update_time": 1677632144, "cache_updated": true, "changed": false, "msg": "'/usr/bin/apt-get -y -o "Dpkg::Options::=--force-confdef" -o "Dpkg::Options::=--force-confold" install 'docker-ce=5:19.03.'' failed: E: Version '5:19.03.' for 'docker-ce' was not found\n", "rc": 100, "stderr": "E: Version '5:19.03.' for 'docker-ce' was not found\n", "stderr_lines": ["E: Version '5:19.03.' for 'docker-ce' was not found"], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nPackage docker-ce is not available, but is referred to by another package.\nThis may mean that the package is missing, has been obsoleted, or\nis only available from another source\nHowever the following packages replace it:\n docker-ce-cli\n\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Package docker-ce is not available, but is referred to by another package.", "This may mean that the package is missing, has been obsoleted, or", "is only available from another source", "However the following packages replace it:", " docker-ce-cli", ""]}

PLAY RECAP *******************************************************************************************************************************************************************************************
172.31.29.83 : ok=10 changed=5 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

Can you tell me about the Alibaba access details and the Slack access details?

# secrets should be permanent for cluster installation
# change them in overlay before cluster installation
secrets:
  cluster: fc83d31-461d-44c5-b4d5-41a32d6c36a1
  agent: 355d14dd-73b1-4834-aa0e-ccf10849c496

# service credentials for service tasks (getting pricings for the recommendations)
# recommendations will not work without this
service_credentials:
  aws:
    access_key_id:
    secret_access_key:
  azure:
    client_id:
    tenant:
    secret:
    subscription_id:
  alibaba:
    access_key_id:
    secret_access_key:

# encryption salt for encode user information
encryption_salt: myencypt10ns@lt

# used for recommendations etc
smtp:
  server:
  email:
  port:
  password:
slacker:
  slack_signing_secret:
  slack_client_id:
  slack_client_secret:

# google calendar service settings
google_calendar_service:
  access_key:
    type: service_account
    project_id: optscale
    private_key_id: eeee000
    private_key: |
      -----BEGIN PRIVATE KEY-----
      -----END PRIVATE KEY-----
    client_email: [email protected]
    client_id: ""
    auth_uri: https://accounts.google.com/o/oauth2/auth
    token_uri: https://oauth2.googleapis.com/token
    auth_provider_x509_cert_url: https://www.googleapis.com/oauth2/v1/certs
    client_x509_cert_url: https://www.googleapis.com/robot/v1/metadata/x509/[email protected]

# encryption key
encryption_key: fffffxdddeadb33f

# This overlay should be used for all non-production environments (?)
# - https://console.developers.google.com/ to see registered origins for the Google OAuth client"
# - https://portal.azure.com/ to see registered origins for the Microsoft OAuth client"
auth:
  google_oauth_client_id: ""
  microsoft_oauth_client_id: ""
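
The overlay above leaves the Alibaba and Slack fields empty. As a rough aid for spotting which settings still need values, here is a minimal Python sketch that lists the unset keys. It assumes the overlay has already been loaded into a nested dict (a YAML loader returns `None` for keys with no value); the `unset_keys` helper is hypothetical and the dict is only a fragment of the overlay.

```python
from typing import Any, Iterator


def unset_keys(node: Any, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths of keys whose value is None or an empty string."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from unset_keys(value, f"{prefix}.{key}" if prefix else key)
    elif node is None or node == "":
        yield prefix


# Fragment mirroring part of the overlay in this report; a YAML loader would
# return None for keys that have no value.
overlay = {
    "service_credentials": {
        "alibaba": {"access_key_id": None, "secret_access_key": None},
    },
    "slacker": {
        "slack_signing_secret": None,
        "slack_client_id": None,
        "slack_client_secret": None,
    },
}

if __name__ == "__main__":
    for path in unset_keys(overlay):
        print(path)
```

The sketch only reports which keys are empty; where to obtain the Alibaba and Slack values is the question this issue asks.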


local test

Is there a way to do local testing? Previously, having to update the image directly for every verification was very inconvenient.

How to install OptScale?

Hello everyone!

I tried to find some documentation on how to install it, but I couldn't find any. I want to test it in my environment (an EC2 instance, for example).

Can someone help me with this?

Thanks

Google SSO

I can't find where to set up SSO with Google. Is there any updated documentation? Does this function work for free accounts?

Microsoft SSO

I can't find where to set up SSO with Microsoft. Is there any updated documentation? Does this function work for free accounts?

Unable to run the ansible command

I have created a single EC2 instance that meets the requirements stated.
I SSHed into the instance and started running the install commands.

This command fails:
ansible-playbook -e "ansible_ssh_user=ubuntu" -k -K -i "10.70.2.60," ansible/k8s-master.yaml

I am using the ubuntu user, which has no password.
I am running this on the instance itself, so why does it need to SSH?
From the documentation, I had assumed all the commands listed were to be run directly on the instance. Do I have that wrong?
Do I need to create a specific user/password combination on the target server and then run this command from a separate server?

Please let me know.
