bf2fc6cc711aee1a0c2a / kas-installer
License: Apache License 2.0
Error creating Kafka instance 'instance': region eu-central-1 is not supported for aws, supported regions are: [us-east-1]
Hello, could we get at least one more US region enabled, e.g. us-east-2? Thanks
kas-fleetshard is currently deployed from a set of static files in this repository. A possible enhancement would be to support deploying fleetshard from a git branch/reference. This would require running a build of fleetshard during the installation process to generate the necessary YAML artifacts and images.
cc: @k-wall , @racheljpg
Mention SUPPORTED_INSTANCE_TYPES, which may break due to new properties.
./kas-installer.sh
(snip)....
configmap/fleetshard-operator-restarter created
Warning: batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
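The warning above is the standard Kubernetes API deprecation notice. For CronJob the v1 schema is the same as v1beta1, so bumping the apiVersion in the manifest is enough. A minimal sketch (the file path below is hypothetical; the real manifest lives somewhere in this repo):

```shell
# Stand-in for the repo's CronJob manifest.
cat > /tmp/fleetshard-operator-restarter.yaml <<'EOF'
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: fleetshard-operator-restarter
spec:
  schedule: "*/5 * * * *"
EOF

# The CronJob spec is unchanged between batch/v1beta1 and batch/v1,
# so rewriting the apiVersion line is the whole migration.
sed 's|^apiVersion: batch/v1beta1$|apiVersion: batch/v1|' \
  /tmp/fleetshard-operator-restarter.yaml > /tmp/fleetshard-operator-restarter.v1.yaml
grep '^apiVersion:' /tmp/fleetshard-operator-restarter.v1.yaml
```

With the apiVersion bumped, the warning disappears and the manifest keeps working on v1.25+, where batch/v1beta1 is removed.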
On the current tip of main (da3132b), I am experiencing an issue when I try to provision a Kafka instance using ./managed_kafka.sh --create gryan. It gets stuck in a provisioning/failed state because the Kafka brokers can't successfully start. Here's an example log: https://gist.github.com/grdryn/cc3605b8a7f92b5145e061defcf161fb#file-kafka-0-log-L726..L727
In the mas-sso Keycloak, I see many occurrences of the following exception, which may be related:
18:24:44,033 ERROR [org.jboss.as.controller.management-operation] (management I/O-2) WFLYCTL0013: Operation ("read-attribute") failed - address: ([
("subsystem" => "infinispan"),
("cache-container" => "keycloak"),
("cache" => "userRevisions")
]): org.jboss.msc.service.ServiceNotFoundException: Service service org.wildfly.clustering.infinispan.cache.keycloak.userRevisions not found
at [email protected]//org.jboss.msc.service.ServiceContainerImpl.getRequiredService(ServiceContainerImpl.java:663)
at [email protected]//org.jboss.as.controller.OperationContextImpl$OperationContextServiceRegistry.getRequiredService(OperationContextImpl.java:2293)
at [email protected]//org.wildfly.clustering.service.ServiceSupplier$1.run(ServiceSupplier.java:54)
at [email protected]//org.wildfly.clustering.service.ServiceSupplier$1.run(ServiceSupplier.java:51)
at [email protected]//org.wildfly.clustering.service.PrivilegedActionSupplier.get(PrivilegedActionSupplier.java:37)
at [email protected]//org.wildfly.clustering.service.ServiceSupplier.get(ServiceSupplier.java:67)
at [email protected]//org.jboss.as.clustering.infinispan.subsystem.CacheMetricExecutor.execute(CacheMetricExecutor.java:53)
at [email protected]//org.jboss.as.clustering.infinispan.subsystem.CacheMetricExecutor.execute(CacheMetricExecutor.java:38)
at [email protected]//org.jboss.as.clustering.controller.MetricHandler.executeRuntimeStep(MetricHandler.java:75)
at [email protected]//org.jboss.as.controller.AbstractRuntimeOnlyHandler$1.execute(AbstractRuntimeOnlyHandler.java:59)
at [email protected]//org.jboss.as.controller.AbstractOperationContext.executeStep(AbstractOperationContext.java:1006)
at [email protected]//org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:743)
at [email protected]//org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:467)
at [email protected]//org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1423)
at [email protected]//org.jboss.as.controller.ModelControllerImpl.internalExecute(ModelControllerImpl.java:446)
at [email protected]//org.jboss.as.controller.ModelControllerImpl.lambda$executeForResponse$0(ModelControllerImpl.java:257)
at [email protected]//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:289)
at [email protected]//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:255)
at [email protected]//org.jboss.as.controller.ModelControllerImpl.executeForResponse(ModelControllerImpl.java:257)
at [email protected]//org.jboss.as.controller.ModelControllerImpl.executeOperation(ModelControllerImpl.java:251)
at [email protected]//org.jboss.as.controller.ModelControllerClientFactoryImpl$LocalClient.executeInModelControllerCl(ModelControllerClientFactoryImpl.java:275)
at [email protected]//org.jboss.as.controller.ModelControllerClientFactoryImpl$LocalClient.access$400(ModelControllerClientFactoryImpl.java:126)
at [email protected]//org.jboss.as.controller.ModelControllerClientFactoryImpl$LocalClient$1.run(ModelControllerClientFactoryImpl.java:168)
at [email protected]//org.jboss.as.controller.ModelControllerClientFactoryImpl$LocalClient$1.run(ModelControllerClientFactoryImpl.java:163)
at [email protected]//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:289)
at [email protected]//org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:255)
at [email protected]//org.jboss.as.controller.AccessAuditContext.doAs(AccessAuditContext.java:198)
at [email protected]//org.jboss.as.controller.AccessAuditContext.doAs(AccessAuditContext.java:175)
at [email protected]//org.jboss.as.controller.ModelControllerClientFactoryImpl$LocalClient.executeOperation(ModelControllerClientFactoryImpl.java:163)
at [email protected]//org.jboss.as.controller.LocalModelControllerClient.execute(LocalModelControllerClient.java:54)
at [email protected]//org.jboss.as.controller.LocalModelControllerClient.execute(LocalModelControllerClient.java:39)
at org.wildfly.extension.microprofile.metrics-smallrye@7.3.8.GA-redhat-00001//org.wildfly.extension.microprofile.metrics.MetricCollector.readAttributeValue(MetricCollector.java:331)
at org.wildfly.extension.microprofile.metrics-smallrye@7.3.8.GA-redhat-00001//org.wildfly.extension.microprofile.metrics.MetricCollector.access$400(MetricCollector.java:74)
at org.wildfly.extension.microprofile.metrics-smallrye@7.3.8.GA-redhat-00001//org.wildfly.extension.microprofile.metrics.MetricCollector$3.getValue(MetricCollector.java:205)
at org.wildfly.extension.microprofile.metrics-smallrye@7.3.8.GA-redhat-00001//org.wildfly.extension.microprofile.metrics.MetricCollector$3.getValue(MetricCollector.java:202)
at io.smallrye.metrics//io.smallrye.metrics.exporters.OpenMetricsExporter.createSimpleValueLine(OpenMetricsExporter.java:492)
at io.smallrye.metrics//io.smallrye.metrics.exporters.OpenMetricsExporter.exposeEntries(OpenMetricsExporter.java:192)
at io.smallrye.metrics//io.smallrye.metrics.exporters.OpenMetricsExporter.getEntriesForScope(OpenMetricsExporter.java:158)
at io.smallrye.metrics//io.smallrye.metrics.exporters.OpenMetricsExporter.exportAllScopes(OpenMetricsExporter.java:109)
at io.smallrye.metrics//io.smallrye.metrics.MetricsRequestHandler.handleRequest(MetricsRequestHandler.java:116)
at io.smallrye.metrics//io.smallrye.metrics.MetricsRequestHandler.handleRequest(MetricsRequestHandler.java:73)
at org.wildfly.extension.microprofile.metrics-smallrye@7.3.8.GA-redhat-00001//org.wildfly.extension.microprofile.metrics.MetricsContextService$1.handleRequest(MetricsContextService.java:81)
at [email protected]//org.jboss.as.domain.http.server.security.RealmReadinessHandler.handleRequest(RealmReadinessHandler.java:51)
at [email protected]//org.jboss.as.domain.http.server.security.ServerErrorReadinessHandler.handleRequest(ServerErrorReadinessHandler.java:35)
at [email protected]//io.undertow.server.handlers.PathHandler.handleRequest(PathHandler.java:91)
at [email protected]//io.undertow.server.handlers.ChannelUpgradeHandler.handleRequest(ChannelUpgradeHandler.java:211)
at [email protected]//io.undertow.server.handlers.cache.CacheHandler.handleRequest(CacheHandler.java:92)
at [email protected]//io.undertow.server.handlers.error.SimpleErrorPageHandler.handleRequest(SimpleErrorPageHandler.java:78)
at [email protected]//io.undertow.server.handlers.CanonicalPathHandler.handleRequest(CanonicalPathHandler.java:49)
at [email protected]//org.jboss.as.domain.http.server.ManagementHttpRequestHandler.handleRequest(ManagementHttpRequestHandler.java:57)
at [email protected]//org.jboss.as.domain.http.server.cors.CorsHttpHandler.handleRequest(CorsHttpHandler.java:75)
at [email protected]//org.jboss.as.domain.http.server.ManagementHttpServer$UpgradeFixHandler.handleRequest(ManagementHttpServer.java:717)
at [email protected]//io.undertow.server.Connectors.executeRootHandler(Connectors.java:390)
at [email protected]//io.undertow.server.protocol.http.HttpReadListener.handleEventWithNoRunningRequest(HttpReadListener.java:255)
at [email protected]//io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:136)
at [email protected]//io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:59)
at [email protected]//org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
at [email protected]//org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66)
at [email protected]//org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:89)
at [email protected]//org.xnio.nio.WorkerThread.run(WorkerThread.java:591)
Attempts to read metrics currently fail within the fleet-manager, because there is no observability component in the environment that kas-installer sets up.
Internally, fleet-manager fails like this:
E0715 10:17:40.368875 1 api.go:262] error from metric Post "/api/metrics/v1/test/api/v1/query": can't request metrics without auth
E0715 10:17:40.368900 1 metrics.go:55] error getting metrics: KAFKAS-MGMT-9: failed to retrieve metrics
and returns 500 back to the caller.
Due to the removal of the apiextensions.k8s.io/v1beta1 API as of Kubernetes v1.22, the version of the Observatorium CRDs currently hosted cannot be applied on clusters based on that version or later (e.g. OpenShift v4.11), effectively causing the installation of the Observatorium Operator to fail with:
error: unable to recognize "STDIN": no matches for kind "CustomResourceDefinition" in version "apiextensions.k8s.io/v1beta1"
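Because migrating a CRD from v1beta1 to v1 changes the schema layout (validation moves under per-version entries), a naive text rewrite is not safe; the CRDs need to be regenerated. Until then, the installer could at least fail fast with a clear message. A minimal sketch, using a stand-in file for the real Observatorium CRD manifests:

```shell
# Stand-in for a downloaded Observatorium CRD manifest.
cat > /tmp/observatorium-crd.yaml <<'EOF'
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: observatoria.core.observatorium.io
EOF

# Detect the removed API before attempting to apply, instead of letting
# the apply fail with "no matches for kind" on Kubernetes >= 1.22.
if grep -q 'apiextensions.k8s.io/v1beta1' /tmp/observatorium-crd.yaml; then
  echo "CRD manifest uses apiextensions.k8s.io/v1beta1; regenerate it as apiextensions.k8s.io/v1 before applying on Kubernetes 1.22+"
fi
```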
The fleetshard installation currently relies on (at least) the creation of several secrets in the fleet manager script. See if there is a way to untangle this so that an option to skip fleet manager is possible, similar to the way mas-sso and fleetshard installations can be skipped. The benefit is reduced wait time when repeatedly installing/uninstalling fleetshard during testing.
Using kas-installer for OSD 4.9.X does not work because of the observability operator version. It should install the latest version (currently 3.0.7) instead of the pinned one (I think it is 3.0.2). The installer gets stuck on this step (see the Jenkins run below).
I really don't know about the backward compatibility, but we would need kas-installer to work with both versions of OpenShift, 4.8.X and 4.9.X (and future ones), as we should always be testing with the latest version while still being able to test on older versions to reproduce any issue.
Jenkins run: https://ci.int.devshift.net/view/managed-services/job/managed-kafka-perf-tests/65/console
Add instructions for generating operator bundle as per the following shell script: https://github.com/bf2fc6cc711aee1a0c2a/kas-installer/blob/main/operators/generate-kas-fleetshard-olm-bundle.sh
When running kas-installer.sh, the installation gets stuck at the step "Waiting until KAS Fleet Manager Deployment is available...", with the kas-fleet-manager pod in CrashLoopBackOff state. These are the error logs from the pod:
I1210 14:17:23.280718 1 environment.go:108] Initializing development environment
E1210 14:17:23.281175 1 environment.go:119] unable to read configuration files: yaml: unmarshal errors:
line 1: cannot unmarshal !!seq into config.InstanceTypeMap
F1210 14:17:23.281191 1 cmd.go:21] Unable to initialize environment: unable to read configuration files: yaml: unmarshal errors:
line 1: cannot unmarshal !!seq into config.InstanceTypeMap
Tested with the latest commit as of December 8th.
kas-installer execution log from Jenkins: https://ci.int.devshift.net/job/managed-kafka-fault-tests-nightly/78/console
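The "cannot unmarshal !!seq into config.InstanceTypeMap" error means the configuration file supplies a YAML sequence where fleet-manager expects a map keyed by instance type. A sketch of the two shapes (the `limit` key and values are illustrative, not the real fleet-manager schema):

```shell
# Broken shape: a YAML sequence ("- item" entries) cannot be
# unmarshalled into a map type like config.InstanceTypeMap.
cat > /tmp/instance-types-broken.yaml <<'EOF'
- standard
- eval
EOF

# Expected shape: a map with the instance type names as keys.
# The nested "limit" field is a hypothetical example value.
cat > /tmp/instance-types-fixed.yaml <<'EOF'
standard:
  limit: 5
eval:
  limit: 1
EOF

# A sequence starts its items with "-"; a map has "key:" at the top level.
head -n1 /tmp/instance-types-broken.yaml
head -n1 /tmp/instance-types-fixed.yaml
```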
The kas-fleet-manager pod failed with CrashLoopBackOff when running ./kas-installer.sh. Here are the logs from the service container:
F0117 08:46:45.271815 1 main.go:47] error running command: unknown flag: --vault-kind
Error: unknown flag: --vault-kind
Usage:
kas-fleet-manager serve [flags]
Allow overriding the https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager/blob/70ccf0061089c6938b8ca4e8cb1c17bd42f6c426/templates/service-template.yml#L278-L286 parameter to configure quota for an org/user.
Without this feature, a kas-installer user is limited to only one eval instance if their org is not part of the default orgs. A kas-installer user should be able to create standard instances (they own the data plane cluster, so they should be able to do anything).
Scaling the kas-fleet-manager deployment down right before applying kas-fleet-manager-service in the installation script will not work when the kube context is not pointing to the kas-fleet-manager-* namespace.
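A robust fix is to pass the namespace explicitly on every kubectl call instead of relying on the current kube context. A minimal sketch; the variable name and a dry-run echo are assumptions for illustration, not the installer's actual code:

```shell
# Hypothetical namespace variable; the installer derives the real one
# from its configuration.
KAS_FLEET_MANAGER_NAMESPACE="kas-fleet-manager-demo"

scale_fleet_manager() {
  local replicas="$1"
  # Printed as a dry run here; the installer would execute this command.
  # The explicit -n flag makes the call independent of the kube context.
  echo kubectl -n "${KAS_FLEET_MANAGER_NAMESPACE}" scale deployment kas-fleet-manager --replicas="${replicas}"
}

scale_fleet_manager 0
```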
REGISTERED_USERS_PER_ORGANISATION: mention that it grants enterprise quota. See https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager/blob/main/config/quota-management-list-configuration.yaml for the state.
ENTERPRISE_CLUSTER_REGISTRATION_ALLOWED_ORGANIZATIONS: parameter to enlist allowed orgs for the enterprise endpoints. This might be replaced by quota checks in the future. See https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager/blob/main/templates/service-template.yml#L180 for more.
KFM can be configured in dynamic scaling mode, i.e. the scaling mode is auto, as opposed to the current manual mode, which requires one to have an already provisioned data plane cluster.
When KFM is running in auto mode, see https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager/blob/main/docs/architecture/data-plane-osd-cluster-dynamic-scaling.md for the different logic.
The scale up/down and various other behaviours can be controlled via the configuration knobs explained in https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager/blob/main/config/dynamic-scaling-configuration.yaml
Since yesterday, when executing ./managed_kafka.sh --create instance, the instance always ends up in failed status, indefinitely. After running ./managed_kafka.sh --list, the output is:
{
"kind": "KafkaRequestList",
"page": 1,
"size": 1,
"total": 1,
"items": [
{
"id": "c86dbc2in864m3gvmfl0",
"kind": "Kafka",
"href": "/api/kafkas_mgmt/v1/kafkas/c86dbc2in864m3gvmfl0",
"status": "failed",
"cloud_provider": "aws",
"multi_az": true,
"region": "us-east-1",
"owner": "fvila_kafka_sre",
"name": "instance",
"created_at": "2022-02-16T10:45:04.398942Z",
"updated_at": "2022-02-16T10:50:31.890442Z",
"failed_reason": "failed to get desired Strimzi version c86dbc2in864m3gvmfl0",
"instance_type": "eval",
"reauthentication_enabled": true,
"kafka_storage_size": "1000Gi"
}
]
}
The intent of MANAGEDKAFKA_ADMINSERVER_EDGE_TLS_ENABLED was that it would cause the admin server to be run over HTTPS with edge-terminated TLS. This feature seems to have become broken; I've not looked into why.
You can work around it by setting the same env var on the fleetshard subscription.
Currently only one region is supported and the cloud provider is hardcoded to aws.
fleet manager allows the configuration of multiple cloud providers and regions via the SUPPORTED_CLOUD_PROVIDERS parameter (https://github.com/bf2fc6cc711aee1a0c2a/kas-fleet-manager/blob/70ccf0061089c6938b8ca4e8cb1c17bd42f6c426/templates/service-template.yml#L155), and kas-installer can leverage that to expose support for multiple cloud providers and regions.
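A sketch of what a multi-region value for that parameter could look like. The field names follow the general shape of the kas-fleet-manager provider configuration, but treat the exact schema as an assumption and verify it against the linked template before use:

```shell
# Hypothetical multi-region provider configuration, held in a shell
# variable the way kas-installer passes template parameters.
SUPPORTED_CLOUD_PROVIDERS=$(cat <<'EOF'
supported_providers:
  - name: aws
    default: true
    regions:
      - name: us-east-1
        default: true
      - name: eu-central-1
EOF
)

# One provider entry plus two region entries.
echo "$SUPPORTED_CLOUD_PROVIDERS" | grep -c 'name: '
```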
kas-fleetshard is currently installed using the yaml files that are copied over from the kas-fleetshard repository; this should be converted to using a bundle. The only bundle currently available is in the managed-tenants repo. We should also consider a public-facing bundle as a separate effort.
There have been recent changes around the quota list management as per bf2fc6cc711aee1a0c2a/kas-fleet-manager#1368
The issue is to investigate any change that might be required, or any new release notes that must go along with it, when the KFM version is bumped. The changes are backward compatible and not supposed to be breaking, so likely nothing will need to be done from the user's perspective.
/cc @ziccardi to provide further insights on the changes
After this change has been applied, we cannot create an instance on the cluster, no matter the region where the cluster is (we have tried us-east-1, eu-west-1 and us-east-2). We always receive the following error:
./managed_kafka.sh --create kafka-instance
Error creating Kafka instance 'kafka-instance': cluster capacity exhausted
Currently, kas-installer configuration only deploys 1 keycloak pod and 1 keycloak-postgresql pod for all AZs in the k8s cluster.
I'd like support for deploying keycloak pods and keycloak-postgresql pods in every AZ, because we need at least one keycloak pod and one keycloak-postgresql pod always available to produce and consume messages when targeting the external bootstrap URL. More info in this thread: https://chat.google.com/room/AAAAHwoNLuU/H_Ulxb4OHi4
This request is needed to reproduce the outage AZ Fault scenario: https://issues.redhat.com/browse/MGDSTRM-7130
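One way to spread keycloak replicas across availability zones is a topology spread constraint on the workload. A minimal sketch of such a patch; the replica count, patch target, and `app: keycloak` label are assumptions about how the keycloak workload is labelled:

```shell
# Hypothetical strategic-merge patch for the keycloak workload:
# three replicas, spread evenly across zones.
cat > /tmp/keycloak-spread-patch.yaml <<'EOF'
spec:
  replicas: 3
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: keycloak
EOF
grep 'topologyKey' /tmp/keycloak-spread-patch.yaml
```

The same constraint would be needed on keycloak-postgresql, which additionally requires its storage to be replicated or zone-local for each replica.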
Since the rhoas CLI (https://github.com/redhat-developer/app-services-guides/tree/main/rhoas-cli#installing-the-rhoas-cli) has some of the same functionality, such as creating service accounts, creating ACLs and creating managed Kafkas, we should replace the custom scripts in favor of using this CLI tool.
When deploying to ROSA (OpenShift 4.11.18), I'm noticing a PodSecurity warning.
serviceaccount/kas-fleet-manager configured
Warning: would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (containers "migration", "service", "envoy-sidecar" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "migration", "service", "envoy-sidecar" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "migration", "service", "envoy-sidecar" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "migration", "service", "envoy-sidecar" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/kas-fleet-manager created
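The warning itself lists exactly the fields the "restricted" Pod Security profile requires. A sketch of the securityContext each listed container (migration, service, envoy-sidecar) would need; whether these containers actually run correctly as non-root is an assumption to verify:

```shell
# The four settings the restricted:v1.24 profile demands per container.
cat > /tmp/restricted-security-context.yaml <<'EOF'
securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
EOF
cat /tmp/restricted-security-context.yaml
```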
KFM now exposes the admin server URL (bf2fc6cc711aee1a0c2a/kas-fleet-manager#1026).
In some places kas-installer computes this itself, e.g. https://github.com/bf2fc6cc711aee1a0c2a/kas-installer/blob/main/smoke_test.sh#L40
This shouldn't be needed anymore: as soon as the kafka is provisioned, KFM will send this information with the proper protocol.
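Instead of deriving the host, the scripts could read the URL from the kafka payload returned by the API. A minimal sketch; the `admin_api_server_url` field name is inferred from the linked KFM change and should be verified against an actual response, and the sample values are made up:

```shell
# Stand-in for the JSON returned by the kafkas_mgmt API for one instance.
cat > /tmp/kafka.json <<'EOF'
{
  "id": "c86dbc2in864m3gvmfl0",
  "admin_api_server_url": "https://admin-server-example.example.com"
}
EOF

# Extract the URL field directly rather than recomputing the host.
sed -n 's/.*"admin_api_server_url": *"\([^"]*\)".*/\1/p' /tmp/kafka.json
```

In practice jq would be cleaner for JSON extraction; plain sed is used here only to keep the sketch dependency-free.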
During installation, kas-installer was reporting 'not yet ready: failed' followed by 'not yet ready: null', despite the Kafka being fully deployed with no issue.
stacktrace_errors_fleetshard_sync_operator_2.pdf
warning_fleetshar_sync_operator_1.pdf
Add an option to the script to run without Docker when generating a custom kas-fleetshard image. This would avoid having to obtain a Docker license.
I'm using kas-installer.sh and it's getting stuck with the below error:
deployment keycloak-postgresql still not created. Waiting 10s...
I'm using OSD version 4.12.0. The keycloak custom resource shows the below message:
message: no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"