app-autoscaler-release's People

Contributors

aadeshmisra, anubhav-gupta1, app-autoscaler-ci-bot, aqan213, asalan316, bonzofenix, boyang9527, cdlliuy, dependabot[bot], donacarr, fraenkel, garethjevans, geigerj0, ghaih, itsouvalas, joergdw, kanekoh, kevinjcross, kongjicdl, mvach, olivermautschke, paltanmoy, peterellisjones, pradyutsarma, qibobo, renovate[bot], rohitsharma04, salzmannsusan, silvestre, zyjiaobj

app-autoscaler-release's Issues

Need better instructions on `go mod vendor` before creating the release

Currently, we set GOPROXY=off in the packaging scripts for the golang packages, which stops go build from trying to access the internet when building each binary. However, this requires go mod download && go mod vendor to be run before creating the BOSH release. This will remain a manual step until we have a pipeline that creates the release automatically.

We need to document this somewhere.
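For illustration, a minimal sketch of the manual steps, assuming the vendored Go module lives in the src/app-autoscaler submodule (the exact directory is an assumption):

cd src/app-autoscaler   # wherever the vendored Go module actually lives
go mod download         # fetch dependencies while network access is still available
go mod vendor           # populate vendor/ so the GOPROXY=off packaging scripts can build
cd -
bosh create-release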

Do the metrics introduced in the new autoscaler work with an old Cloud Foundry version?

@qibobo Hi Qibobo, compared with the old app-autoscaler (https://github.com/cfibmers/open-Autoscaler/tree/486d818e7047123339df45a4b7b0c9d15666fe51), which only supported one metric type (Memory), the latest app-autoscaler supports four scaling metrics: memoryused, memoryutil, responsetime, and throughput. The Cloud Foundry version we are using in our project is quite old (cf-release v251). Do these new metrics require enhancements in a newer Cloud Foundry version in order to scale based on them? Thanks a lot!

example/operation/external-db.yml

Can we use db_scheme: mysql instead of db_scheme: postgres in the external-db.yml file to deploy the autoscaler with an external MySQL database? We are using the MySQL db scheme.
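For reference, a hedged sketch of the kind of ops-file entry the question is asking about; the path below is a placeholder and must be matched against the actual structure of example/operation/external-db.yml:

- type: replace
  path: /placeholder/path/to/db_scheme   # align with the real key in external-db.yml
  value: mysql                           # instead of postgres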

Question on autoscaling

I configured autoscaling in my dev version of Cloud Foundry (cf-release 256).

I have applied the policy below to the application.

Scenario:

My application has only one instance.
At start_date_time my application scales from 1 to 3 successfully.
But at end_date_time it doesn't scale back to the original count, which is 1.

Per the sample catalog definition it should scale down. (I have a feeling that I am unable to understand the specific_date schedule.) Any advice is great.
instance_min_count:
  type: integer
  minimum: 1
  description: The number of instances to scale down to once recurrence period
instance_max_count:
  type: integer
  minimum: 1
  description: Maximum number of instances to scale up during recurrence period
initial_min_instance_count:
  type: integer
  minimum: 1
  description: The number of instances to scale up to as soon as the recurrence period starts

  "instance_min_count": 1,
  "instance_max_count": 4,
  "schedules": {
    "timezone": "America/New_York",
    "specific_date": [{
      "start_date_time": "2018-05-30T17:45",
      "end_date_time": "2018-05-30T17:50",
      "instance_min_count": 2,
      "instance_max_count": 4,
      "initial_min_instance_count": 3
    }]
  }
}```

Question on MemoryUTIL

Hello,

I have deployed the autoscaler using the master branch, with the following policy:
{
  "instance_min_count": 1,
  "instance_max_count": 4,
  "scaling_rules": [{
    "metric_type": "memoryutil",
    "stat_window_secs": 60,
    "breach_duration_secs": 60,
    "threshold": 49,
    "operator": "<",
    "cool_down_secs": 60,
    "adjustment": "-1"
  }, {
    "metric_type": "memoryutil",
    "stat_window_secs": 60,
    "breach_duration_secs": 60,
    "threshold": 50,
    "operator": ">",
    "cool_down_secs": 60,
    "adjustment": "+1"
  }]
}

Below is the app usage.

According to the policy, the application should scale +1 based on memory usage, but it is not.

Am I doing anything incorrectly?

instances: 1/1
usage: 1G x 1 instances
urls: hello-world-new.run.us2.covapp.io
last uploaded: Thu Jul 26 19:27:31 UTC 2018
stack: covs-internal-stack
buildpack: covs-java-III

     state     since                    cpu    memory         disk           details
#0   running   2018-07-27 01:10:30 PM   0.1%   679.2M of 1G   317.7M of 1G
covladmins-MacBook-Pro-6:FakePolicy nsharma$ cf app hello-world
Showing health and status for app hello-world in org paas / space properties as nsharma...
OK

requested state: started
instances: 1/1
usage: 1G x 1 instances
urls: hello-world-new.run.us2.covapp.io
last uploaded: Thu Jul 26 19:27:31 UTC 2018
stack: covs-internal-stack
buildpack: covs-java-III

     state     since                    cpu    memory         disk           details
#0   running   2018-07-27 01:10:30 PM   0.1%   679.2M of 1G   317.7M of 1G
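As a rough sanity check on the numbers above (assuming memoryutil is memory used as a percentage of the app's memory quota): 679.2M of 1G is roughly 66%, which is above the 50 threshold, so a +1 adjustment would be expected once enough consecutive samples above the threshold have accumulated to satisfy breach_duration_secs.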

Need scripts to download all blobs for fissile?

Hello,

There is no src for most of the package dependencies of the jobs. Does this mean I need to create separate scripts to download all blobs for the packages when I want to use fissile to render the release into Docker images?

Thanks and Regards.
HAO

eventgenerator can't start up.

Hi,
The eventgenerator.yml.erb template can't be processed successfully; please kindly check the following logs:
...
/var/vcap/jobs-src/eventgenerator/templates/eventgenerator.yml.erb:21:in `get_binding': undefined method `link' for #<Bosh::Template::EvaluationContext:0x00560eea0c1418> (NoMethodError)
	from /opt/hcf/configgin/lib/ruby/lib/ruby/2.1.0/erb.rb:850:in `eval'
	from /opt/hcf/configgin/lib/ruby/lib/ruby/2.1.0/erb.rb:850:in `result'
	from /opt/hcf/configgin/lib/app/lib/generate.rb:30:in `generate'
	from /opt/hcf/configgin/lib/app/bin/configgin:47:in `block (2 levels) in <main>'
	from /opt/hcf/configgin/lib/app/bin/configgin:36:in `each'
	from /opt/hcf/configgin/lib/app/bin/configgin:36:in `block in <main>'
	from /opt/hcf/configgin/lib/app/bin/configgin:33:in `each'
	from /opt/hcf/configgin/lib/app/bin/configgin:33:in `<main>'
...
Please kindly help, thanks!

Support for sending metrics over Loggregator

As a CF component, it would be nice if the autoscaler published health and metrics over Loggregator. CF operators have all had to integrate Loggregator into their monitoring systems to monitor the CF core components. It would be nice if the autoscaler supported the same metrics system as all the other CF core components. This would allow operators who wish to add the autoscaler to their platform to avoid monitoring it differently from every other CF component.

Today the autoscaler provides a Prometheus endpoint, which is great for those using Prometheus to monitor their Cloud Foundry deployments, but not so great for those who don't.

The rewrite to Go might simplify integration with Loggregator, since Loggregator provides a common Go library for this purpose.
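For illustration, a minimal sketch of emitting a gauge through that common Go library, assuming the code.cloudfoundry.org/go-loggregator client and a local loggregator agent on its default gRPC port; certificate paths, the metric name, and the module version are assumptions, not the project's actual wiring:

package main

import (
	"log"

	loggregator "code.cloudfoundry.org/go-loggregator"
)

func main() {
	// Certificate paths and the agent address are assumptions for illustration;
	// a real BOSH job would template these from its spec.
	tlsConfig, err := loggregator.NewIngressTLSConfig(
		"/var/vcap/jobs/eventgenerator/config/certs/loggregator/ca.crt",
		"/var/vcap/jobs/eventgenerator/config/certs/loggregator/client.crt",
		"/var/vcap/jobs/eventgenerator/config/certs/loggregator/client.key",
	)
	if err != nil {
		log.Fatalf("failed to build loggregator TLS config: %s", err)
	}

	client, err := loggregator.NewIngressClient(
		tlsConfig,
		loggregator.WithAddr("localhost:3458"), // loggregator agent gRPC ingress
	)
	if err != nil {
		log.Fatalf("failed to create loggregator ingress client: %s", err)
	}

	// Emit a health gauge the way other CF components do; envelopes are batched
	// and forwarded to the local agent asynchronously.
	client.EmitGauge(
		loggregator.WithGaugeValue("autoscaler.eventgenerator.heartbeat", 1, "count"),
	)
}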

Updating instance asactors failed during the app-autoscaler deployment with databasechangeloglock relation already exists error

I consistently ran into this error for any new deployment of app-autoscaler (release 3.0.1):

Task 154346 | 22:10:49 | Preparing deployment: Preparing deployment (00:00:02)
Task 154346 | 22:10:53 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 154346 | 22:10:53 | Creating missing vms: postgres_autoscaler/2c1bd12b-b73d-4108-9820-99a30064e0bb (0)
Task 154346 | 22:10:53 | Creating missing vms: asactors/5ce03077-fdf8-4bfb-9d7e-b45507fd7b4c (0)
Task 154346 | 22:10:53 | Creating missing vms: asmetrics/69e58377-3a81-40fc-841a-03892de2f026 (0)
Task 154346 | 22:10:53 | Creating missing vms: asnozzle/3595b7f5-c04d-4add-b3a4-09d460d8ee20 (0)
Task 154346 | 22:10:53 | Creating missing vms: asapi/06ba43c1-b86f-4040-b0d8-d5e28a8c1686 (0) (00:01:08)
Task 154346 | 22:12:02 | Creating missing vms: asactors/5ce03077-fdf8-4bfb-9d7e-b45507fd7b4c (0) (00:01:09)
Task 154346 | 22:12:02 | Creating missing vms: asnozzle/3595b7f5-c04d-4add-b3a4-09d460d8ee20 (0) (00:01:09)
Task 154346 | 22:12:02 | Creating missing vms: postgres_autoscaler/2c1bd12b-b73d-4108-9820-99a30064e0bb (0) (00:01:09)
Task 154346 | 22:12:02 | Creating missing vms: asmetrics/69e58377-3a81-40fc-841a-03892de2f026 (0) (00:01:09)
Task 154346 | 22:12:06 | Updating instance postgres_autoscaler: postgres_autoscaler/2c1bd12b-b73d-4108-9820-99a30064e0bb (0) (canary) (00:00:21)
Task 154346 | 22:12:27 | Updating instance asactors: asactors/5ce03077-fdf8-4bfb-9d7e-b45507fd7b4c (0) (canary) (00:00:52)
L Error: Action Failed get_task: Task 45355f8b-f7af-4267-6967-b90c3bc6d985 result: 2 of 4 pre-start scripts failed. Failed Jobs: scalingengine, operator. Successful Jobs: bosh-dns, scheduler.
Task 154346 | 22:13:19 | Error: Action Failed get_task: Task 45355f8b-f7af-4267-6967-b90c3bc6d985 result: 2 of 4 pre-start scripts failed. Failed Jobs: scalingengine, operator. Successful Jobs: bosh-dns, scheduler.

When checking /var/vcap/sys/log/scalingengine/pre-start.stdout.log, I found this stack trace:

Starting Liquibase at Wed, 16 Sep 2020 22:13:02 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Unexpected error running Liquibase: liquibase.exception.DatabaseException: Error executing SQL SELECT COUNT(*) FROM public.databasechangeloglock: ERROR: current transaction is aborted, commands ignored until end of transaction block
liquibase.exception.LockException: liquibase.exception.UnexpectedLiquibaseException: liquibase.exception.DatabaseException: Error executing SQL SELECT COUNT(*) FROM public.databasechangeloglock: ERROR: current transaction is aborted, commands ignored until end of transaction block
at liquibase.lockservice.StandardLockService.acquireLock(StandardLockService.java:289)
at liquibase.lockservice.StandardLockService.waitForLock(StandardLockService.java:207)
at liquibase.Liquibase.update(Liquibase.java:184)
at liquibase.Liquibase.update(Liquibase.java:179)
at liquibase.integration.commandline.Main.doMigration(Main.java:1220)
at liquibase.integration.commandline.Main.run(Main.java:199)
at liquibase.integration.commandline.Main.main(Main.java:137)
Caused by: liquibase.exception.UnexpectedLiquibaseException: liquibase.exception.DatabaseException: Error executing SQL SELECT COUNT(*) FROM public.databasechangeloglock: ERROR: current transaction is aborted, commands ignored until end of transaction block
at liquibase.lockservice.StandardLockService.isDatabaseChangeLogLockTableInitialized(StandardLockService.java:173)
at liquibase.lockservice.StandardLockService.init(StandardLockService.java:121)
at liquibase.lockservice.StandardLockService.acquireLock(StandardLockService.java:246)
... 6 common frames omitted
Caused by: liquibase.exception.DatabaseException: Error executing SQL SELECT COUNT(*) FROM public.databasechangeloglock: ERROR: current transaction is aborted, commands ignored until end of transaction block
at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:70)
at liquibase.executor.jvm.JdbcExecutor.query(JdbcExecutor.java:138)
at liquibase.executor.jvm.JdbcExecutor.query(JdbcExecutor.java:146)
at liquibase.executor.jvm.JdbcExecutor.queryForObject(JdbcExecutor.java:154)
at liquibase.executor.jvm.JdbcExecutor.queryForObject(JdbcExecutor.java:169)
at liquibase.executor.jvm.JdbcExecutor.queryForInt(JdbcExecutor.java:190)
at liquibase.executor.jvm.JdbcExecutor.queryForInt(JdbcExecutor.java:185)
at liquibase.lockservice.StandardLockService.isDatabaseChangeLogLockTableInitialized(StandardLockService.java:162)
... 8 common frames omitted
Caused by: org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:307)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:293)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:270)
at org.postgresql.jdbc.PgStatement.executeQuery(PgStatement.java:224)
at liquibase.executor.jvm.JdbcExecutor$QueryStatementCallback.doInStatement(JdbcExecutor.java:419)
at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:57)
... 15 common frames omitted
Caused by: org.postgresql.util.PSQLException: ERROR: relation "databasechangeloglock" already exists
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:307)
at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:293)
at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:270)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:266)
at liquibase.executor.jvm.JdbcExecutor$ExecuteStatementCallback.doInStatement(JdbcExecutor.java:352)
at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:57)
at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:125)
at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:109)
at liquibase.lockservice.StandardLockService.init(StandardLockService.java:97)

I noticed that there is another open issue (#207) related to databasechangeloglock, and I'm not sure if it's the same issue. If it is, please mark this issue as a duplicate and close it. If not, what could be causing this error?

Note: As a temporary workaround, I ran bosh deploy again and it seemed to fix the error.

Error: undefined method `link' in /var/vcap/jobs-src/eventgenerator/templates/eventgenerator.yml.erb

@qibobo Hi Qibobo, I tried to integrate the new autoscaler service with our old Cloud Foundry version, the same way you bumped it into SCF, but when I tried to start autoscaler-metrics, the following error was encountered:
/var/vcap/jobs-src/eventgenerator/templates/eventgenerator.yml.erb:21:in `get_binding': undefined method `link' for #<Bosh::Template::EvaluationContext:0x0055a3cd047ce0> (NoMethodError)
	from /opt/hcf/configgin/lib/ruby/lib/ruby/2.1.0/erb.rb:850:in `eval'
	from /opt/hcf/configgin/lib/ruby/lib/ruby/2.1.0/erb.rb:850:in `result'
	from /opt/hcf/configgin/lib/app/lib/generate.rb:30:in `generate'
	from /opt/hcf/configgin/lib/app/bin/configgin:47:in `block (2 levels) in <main>'
	from /opt/hcf/configgin/lib/app/bin/configgin:36:in `each'
	from /opt/hcf/configgin/lib/app/bin/configgin:36:in `block in <main>'
	from /opt/hcf/configgin/lib/app/bin/configgin:33:in `each'
	from /opt/hcf/configgin/lib/app/bin/configgin:33:in `<main>'
Could you help to check?
Thanks and Regards.
HAO

Autoscaler shows incorrect version in stratos

Hi all,

We have found that the autoscaler version is not shown correctly in Stratos.

We have just updated the autoscaler from v3.0.0 to v3.0.1 on Stratos v3.2.1.
Nevertheless, in Stratos -> Cloud Foundry -> Summary, the Autoscaler Version shows 3.0.0 (it should be 3.0.1).

Stratos gets the version from the response to autoscaler.(cf system endpoint)/v1/info.
Looks like this comes from https://github.com/cloudfoundry/app-autoscaler/blob/3c85c748d2e9f315f86b216c6ce340416b515800/api/config/info.json
There must be something on the build/release side that updates it, though, as it's set to 001 in that file.

The version numbers shown in stratos can be configured in https://github.com/cloudfoundry/app-autoscaler-release/blob/master/jobs/golangapiserver/spec#L97 which seems a suboptimal way of doing this sort of thing.

Can we have a fix for that?

Thanks a lot,

Bound app does not work (RuntimeException, InvocationTargetException)

Issue

When I bind an Autoscaler service instance to a sample app and then restage, the app does not work.

$ cf create-service-broker autoscaler username password https://servicebroker.service.cf.internal:6101
$ cf enable-service-access autoscaler
$ cf create-service autoscaler autoscaler-free-plan autoscaler1
$ cf bind-service spring-music autoscaler1 -c '{"instance_min_count":1,"instance_max_count":4,"scaling_rules":[{"metric_type":"memoryused","stat_window_secs":300,"breach_duration_secs":600,"threshold":30,"operator":"<","cool_down_secs":300,"adjustment":"-1"},{"metric_type":"memoryused","stat_window_secs":300,"breach_duration_secs":600,"threshold":90,"operator":">=","cool_down_secs":300,"adjustment":"+1"}]}'
$ cf restage spring-music
...
Successfully destroyed container

0 of 1 instances running, 1 starting
0 of 1 instances running, 1 starting
0 of 1 instances running, 1 crashed
Failed to watch staging of app spring-music in org cloudlab / space dev as admin...
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]OUT 2017-07-06 09:11:47.489  INFO 7 --- [           main] .b.l.ClasspathLoggingApplicationListener : Application failed to start with classpath: [file:/home/vcap/app/, jar:file:/home/vcap/app/lib/tomcat-embed-core-8.0.33.jar!/,
...
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:62)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at java.lang.Thread.run(Thread.java:745)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR Caused by: java.lang.reflect.InvocationTargetException
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at java.lang.reflect.Method.invoke(Method.java:498)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:54)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	... 1 more
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR Caused by: java.lang.NullPointerException
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.cloudfoundry.CloudFoundryServiceInfoCreator.uriKeyMatchesScheme(CloudFoundryServiceInfoCreator.java:65)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.cloudfoundry.CloudFoundryServiceInfoCreator.accept(CloudFoundryServiceInfoCreator.java:26)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.cloudfoundry.RelationalServiceInfoCreator.accept(RelationalServiceInfoCreator.java:23)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.cloudfoundry.RelationalServiceInfoCreator.accept(RelationalServiceInfoCreator.java:15)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.AbstractCloudConnector.getServiceInfo(AbstractCloudConnector.java:60)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.AbstractCloudConnector.getServiceInfos(AbstractCloudConnector.java:40)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.cloud.Cloud.getServiceInfos(Cloud.java:89)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.cloudfoundry.samples.music.config.SpringApplicationContextInitializer.getCloudProfile(SpringApplicationContextInitializer.java:64)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.cloudfoundry.samples.music.config.SpringApplicationContextInitializer.initialize(SpringApplicationContextInitializer.java:44)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.boot.SpringApplication.applyInitializers(SpringApplication.java:640)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.boot.SpringApplication.createAndRefreshContext(SpringApplication.java:343)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.springframework.boot.SpringApplication.run(SpringApplication.java:307)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	at org.cloudfoundry.samples.music.Application.main(Application.java:15)
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]ERR 	... 6 more
2017-07-06T18:11:47.49+0900 [APP/PROC/WEB/1]OUT Exit status 0

Context

  • CF: v247
  • App-autoscaler-release: master
  • Postgres: v17
---
director_uuid: {UUID}

name: app-autoscaler-release

## Release Details ###
releases:
  - name: app-autoscaler
    version: latest
  - name: postgres
    url: https://bosh.io/d/github.com/cloudfoundry/postgres-release
    version: '17'
    sha1: b062e32a5409ccd4e4161337c48705c793a58412
  - name: paasta-controller
    version: '2.0'

## Network Section ##
networks: {NETWORK CONFIG}

## Resource Pool ##
resource_pools:
  - name: small
    network: default
    stemcell:
      name: bosh-openstack-kvm-ubuntu-trusty-go_agent
      version: latest
    cloud_properties:
      name: random
      instance_type: m1.small
      availability_zone: nova

## Disk Pool ##
disk_pools:
  - name: default
    disk_size: 1024

## Canary details ##
update:
  canaries: 1
  canary_watch_time: 1000-300000
  max_in_flight: 3
  update_watch_time: 1000-300000

## Compilation ##
compilation:
  workers: 2
  network: default
  reuse_compilation_vms: true
  cloud_properties:
    name: random
    instance_type: m1.small
    availability_zone: nova

## Jobs ##
jobs:
  - name: postgres
    instances: 1
    update:
      serial: true
    resource_pool: small
    networks:
      - name: default
        static_ips:
          - {POSTGRES_IP}
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: postgres, release: postgres}
    properties:
      databases:
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      consul:
        agent:
          mode: client
          services:
            postgres:
              check:
                tcp: 127.0.0.1:5432
                interval: 30s
                timeout: 10s

  - name: apiserver
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: apiserver, release: app-autoscaler}
    properties:
      api_server:
        db_config:
          idle_timeout: 1000
          max_connections: 10
          min_connections: 0
        port: 6100
        ca_cert: {CA_CERT}
        server_cert: {SERVER_CERT}
        server_key: {SERVER_KEY}
        scheduler:
          ca_cert: {CA_CERT}
          client_cert: {CLIENT_CERT}
          client_key: {CLIENT_KEY}
      policy_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      consul:
        agent:
          mode: client
          services:
            apiserver: {}



  - name: scheduler
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: scheduler, release: app-autoscaler}
    properties:
      scheduler:
        port: 6102
        job_reschedule_interval_millisecond: 10000
        job_reschedule_maxcount: 6
        notification_reschedule_maxcount: 3
        ca_cert: {CA_CERT}
        server_cert: {SERVER_CERT}
        server_key: {SERVER_KEY}
        scaling_engine:
          ca_cert: {CA_CERT}
          client_cert: {CLIENT_CERT}
          client_key: {CLIENT_KEY}
      scheduler_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      consul:
        agent:
          mode: client

  - name: servicebroker
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: servicebroker, release: app-autoscaler}
    properties:
      service_broker:
        db_config:
          idle_timeout: 1000
          max_connections: 10
          min_connections: 0
        port : 6101
        ca_cert: {CA_CERT}
        server_cert: {SERVER_CERT}
        server_key: {SERVER_KEY}
        username: username
        password: password
        http_request_timeout: 5000
        catalog:
          services:
          - id: autoscaler-guid
            name: autoscaler
            description: Automatically increase or decrease the number of application instances based on a policy you define.
            bindable: true
            plans:
            - id: autoscaler-free-plan-id
              name: autoscaler-free-plan
              description: This is the free service plan for the Auto-Scaling service.
        api_server:
          ca_cert: {CA_CERT}
          client_cert: {CLIENT_CERT}
          client_key: {CLIENT_KEY}
      binding_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      consul:
        agent:
          mode: client
          services:
            servicebroker: {}

  - name: pruner
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: pruner, release: app-autoscaler}
    properties:
      appmetrics_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      instancemetrics_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      scalingengine_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      pruner:
        logging:
          level: debug
      consul:
        agent:
          mode: client

  - name: metricscollector
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: metricscollector, release: app-autoscaler}
    properties:
      instancemetrics_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      policy_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      cf: {CF_INFO}
      metricscollector:
        logging:
          level: debug
        server:
          port: 6103
        ca_cert: {CA_CERT}
        server_cert: {SERVER_CERT}
        server_key: {SERVER_KEY}
      consul:
        agent:
          mode: client


  - name: eventgenerator
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: eventgenerator, release: app-autoscaler}
    properties:
      appmetrics_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      policy_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      eventgenerator:
        logging:
          level: debug
        scaling_engine:
          ca_cert: {CA_CERT}
          client_cert: {CLIENT_CERT}
          client_key: {CLIENT_KEY}
        metricscollector:
          ca_cert: {CA_CERT}
          client_cert: {CLIENT_CERT}
          client_key: {CLIENT_KEY}
      consul:
        agent:
          mode: client

  - name: scalingengine
    instances: 1
    networks:
      - name: default
    resource_pool: small
    templates:
      - {name: consul_agent, release: paasta-controller}
      - {name: scalingengine, release: app-autoscaler}
    properties:
      scalingengine_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      scheduler_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      policy_db:
        address: {POSTGRES_IP}
        databases:
          - {name: autoscaler, tag: default}
        db_scheme: postgres
        port: 5432
        roles:
          - {name: postgres, password: postgres, tag: default}
      cf: {CF_INFO}
      scalingengine:
        logging:
          level: debug
        server:
          port: 6104
        ca_cert: {CA_CERT}
        server_cert: {SERVIER_CERT}
        server_key: {SERVER_KEY}
        consul:
          cluster: http://127.0.0.1:8500
      consul:
        agent:
          mode: client

properties:
  consul:
    agent:
      domain: cf.internal
      log_level: warn
      servers:
        lan:
        - {CONSUL_IP}
    agent_cert: {AGENT_CERT}
    agent_key: {AGENT_KEY}
    ca_cert: {CA_CERT}
    dns_config: null
    encrypt_keys:
      - {ENCRYPT_KEY}
    server_cert: {SERVIER_CERT}
    server_key: {SERVER_KEY}

Question

How do I resolve these errors?

Failed: Release SHA1 does not match the expected SHA1

I cloned the repository today and successfully uploaded the release.

minjeong@ubuntu:~/workspace/GitHub/PaaSXpert-AutoScaler/app-autoscaler-release$ bosh releases
Acting as user 'admin' on 'Bosh Lite Director'

+----------------+----------+-------------+
| Name           | Versions | Commit Hash |
+----------------+----------+-------------+
| app-autoscaler | 0+dev.1  | af3ece9f    |
| cf             | 268*     | 4057a140+   |
| cf-mysql       | 32       | 6c0314b     |
| cf-redis       | 428.0.0  | 2d766084+   |
| cflinuxfs2     | 1.138.0* | c88004ab+   |
| diego          | 1.23.0*  | edb126ad    |
| garden-runc    | 1.9.0*   | 3f4312b5    |
| grootfs        | 0.21.0   | f896e94     |
| routing        | 0.142.0  | af830ed7+   |
+----------------+----------+-------------+
(*) Currently deployed
(+) Uncommitted changes

Releases total: 9

When I tried to deploy the yml file, I received the following error.

Release manifest: /home/minjeong/workspace/GitHub/PaaSXpert-AutoScaler/app-autoscaler-release/dev_releases/app-autoscaler/app-autoscaler-0+dev.1.yml
Acting as user 'admin' on 'Bosh Lite Director'

Copying packages
----------------
common
nodejs
servicebroker
golang1.7
scalingengine
scheduler
pruner
metricscollector
java
eventgenerator
apiserver
db


Copying jobs
------------
servicebroker
scalingengine
scheduler
pruner
metricscollector
eventgenerator
apiserver


Copying license
---------------
license

Generated /tmp/d20170719-31102-19jxg35/d20170719-31102-1sn865/release.tgz
Release size: 390.9M

Verifying manifest...
Extract manifest                                             OK
Manifest exists                                              OK
Release name/version                                         OK


Uploading release
release.tgz:    96% |oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo     | 375.3MB  29.2MB/s ETA:  00:00:00
Director task 30
  Started extracting release > Extracting release. Done (00:00:04)

  Started verifying manifest > Verifying manifest. Done (00:00:00)

  Started resolving package dependencies > Resolving package dependencies. Done (00:00:00)

  Started creating new packages
  Started creating new packages > common/306e7eb1a8187457885ce91eb4bc22f5aed734e8. Done (00:00:00)
  Started creating new packages > nodejs/78d4ee5eeb7010fd7d15a1d2986992942940229f. Done (00:00:01)
  Started creating new packages > servicebroker/26d3c21dec7897b57b82d5208e73130b7f4e8ac2. Done (00:00:00)
  Started creating new packages > golang1.7/651d77736c6087be1ca1df72eb8e4d2e701778f9. Done (00:00:02)
  Started creating new packages > scalingengine/32d7cedf61f0db125fcdecdcb84a3032d01a331c. Done (00:00:00)
  Started creating new packages > scheduler/8459c8ef0e82f345720e2fbbab5924a74159da0e. Done (00:00:02)
  Started creating new packages > pruner/0a8952978a0c226f84db1a057df244df2cbd7975. Done (00:00:00)
  Started creating new packages > metricscollector/90223e57b1985adc1667c8bb9abd10764cdaac43. Done (00:00:00)
  Started creating new packages > java/d6f4a8bb4e3bfb6c4f3121f231f1a0c569d7fdaf. Done (00:00:02)
  Started creating new packages > eventgenerator/0ddd32b54bed4d93072336948cdb639691c05ee5. Done (00:00:00)
  Started creating new packages > apiserver/83eccb99910f547942da04b38eeb4681305cfec0. Done (00:00:03)
  Started creating new packages > db/5eb59eeffe739dcf942a45cc0c6e06996cbd8f45. Done (00:00:00)
     Done creating new packages (00:00:10)

  Started creating new jobs
  Started creating new jobs > servicebroker/945e3aa0cfa958275c5e7ed0f8d4fd8cb3fa6cb3. Done (00:00:01)
  Started creating new jobs > scalingengine/4398d6bc3b0d7236d21c9ac25258c192771b54f6. Done (00:00:00)
  Started creating new jobs > scheduler/190ace43d570ee3942012713847e9df492961aa2. Done (00:00:00)
  Started creating new jobs > pruner/480130a24acdd8079c6955f8c61890cc2a8788c7. Done (00:00:00)
  Started creating new jobs > metricscollector/4e4c7a25f668d80ead187d3ab8fb33e85f4497d3. Done (00:00:00)
  Started creating new jobs > eventgenerator/fa282337b2b68dc80c7cdf7fab0fce43a037495a. Done (00:00:00)
  Started creating new jobs > apiserver/db00aaf50a50ee6cbd7d31ca8b29a2f6eff7e938. Done (00:00:00)
     Done creating new jobs (00:00:01)

  Started release has been created > app-autoscaler/0+dev.1. Done (00:00:00)

Task 30 done

Started		2017-07-19 12:07:57 UTC
Finished	2017-07-19 12:08:12 UTC
Duration	00:00:15
release.tgz:    96% |oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo     | 376.3MB  13.0MB/s Time: 00:00:28

Release uploaded
Acting as user 'admin' on 'Bosh Lite Director'
Checking whether release postgres/17 already exists...NO
Using remote release 'https://bosh.io/d/github.com/cloudfoundry/postgres-release'

Director task 31
  Started downloading remote release > Downloading remote release. Done (00:01:54)

  Started verifying remote release > Verifying remote release. Failed: Release SHA1 'ad62d5d7e4b7875316ecd5b972f26ee842c4b605' does not match the expected SHA1 'b062e32a5409ccd4e4161337c48705c793a58412' (00:00:00)

Error 30015: Release SHA1 'ad62d5d7e4b7875316ecd5b972f26ee842c4b605' does not match the expected SHA1 'b062e32a5409ccd4e4161337c48705c793a58412'

Task 31 error

For a more detailed error report, run: bosh task 31 --debug

Dependency gone?

Hi,

I followed the README to deploy the autoscaler with BOSH. After running bosh create-release, the following errors were encountered:
...
Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:07 min
[INFO] Finished at: 2018-09-10T03:23:20-07:00
[INFO] Final Memory: 8M/111M
[INFO] ------------------------------------------------------------------------
[ERROR] Plugin org.apache.maven.plugins:maven-clean-plugin:2.5 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-clean-plugin:jar:2.5: Could not transfer artifact org.apache.maven.plugins:maven-clean-plugin:pom:2.5 from/to central (https://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:443 [repo.maven.apache.org/151.101.48.215] failed: Connection timed out -> [Help 1]
...
[FATAL] Non-resolvable parent POM for org.cloudfoundry.autoscaler:scheduler:1.0-SNAPSHOT: Could not transfer artifact org.springframework.boot:spring-boot-starter-parent:pom:1.5.2.RELEASE from/to spring-snapshots (https://repo.spring.io/libs-snapshot): Connect to repo.spring.io:443 [repo.spring.io/35.241.58.96] failed: Connection timed out and 'parent.relativePath' points at no local POM @ line 5, column 10
...
It seems the dependency is gone; please kindly help.
BR//HAO

Failed to recurse into submodule path 'src/app-autoscaler'

Hi

I'm having issues when ./scripts/update runs:

Submodule path 'src/gopkg.in/yaml.v2': checked out 'a3f3340b5840cee44f372bddb5880fcbc419b46a'
Failed to recurse into submodule path 'src/app-autoscaler'

I'm following main installation steps https://github.com/cloudfoundry-incubator/app-autoscaler-release#bosh-lite-deployment

I have run "git submodule update --init --recursive" on appautoscaler repo but it didn't work

Have someone seen this error before? Am I missing some step?

Thanks
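A hedged sequence that sometimes clears stale submodule state (standard git commands; whether they resolve this particular failure is an assumption):

git submodule sync --recursive
git submodule update --init --recursive --force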

If autoscaler.operator.require_consul: false then autoscaler.operator.lock.consul_cluster_config should be set to null.

When require_consul is false, the operator job still attempts to connect to Consul.

In operator/main.go:

if conf.Lock.ConsulClusterConfig != "" {
		consulClient, err := consuladapter.NewClientFromUrl(conf.Lock.ConsulClusterConfig)
		if err != nil {
			logger.Fatal("new consul client failed", err)
		}
...

Because there is a default value for consul_cluster_config in the operator job spec, the value is never "" unless you explicitly set it.
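A hedged sketch of the kind of manifest override the issue title describes (property names follow the title; the exact nesting should be checked against the operator job spec):

properties:
  autoscaler:
    operator:
      require_consul: false
      lock:
        consul_cluster_config: ''   # or null, so the ConsulClusterConfig != "" branch above is skipped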

Bosh Bionic Stemcell Support and State of the project

Dear App-Autoscaler maintainers and contributors,

We tried rolling out the app-autoscaler BOSH release with the latest Bionic stemcell (v0.28) and did not succeed.
With the old Xenial stemcells and the same configuration, we are able to run the service as intended.

Therefore, some general questions:

  • Are there any plans to make the BOSH release deployable with the Bionic stemcell?
  • Is the project still maintained, or has it reached the end of its life?
  • If it has reached end of life: are there alternatives that we could offer to our platform users to provide autoscaling features for Cloud Foundry?

Kind regards,
Julian

How to use this release with bosh director?

I am having problems with route_registrar and consul. I am not able to figure out the configuration based on our Cloud Foundry deployment.
Any help would be highly appreciated.

Thanks

Error: Bad Certificate

I successfully deployed the app-autoscaler-release on AWS.

Everything works well except for the metricscollector and scalingengine APIs.

screenshot from 2017-06-16 16-17-08

I can use the apiserver APIs.

apiserver/740d157f-8e3f-43fc-bd0b-28d3b43075aa:~$ curl https://apiserver.service.cf.internal:6100/v1/policies/45c39971-41c6-4fb2-b999-a4fc33068329 --insecure
{"instance_max_count":4,"instance_min_count":1,"scaling_rules":[{"adjustment":"-1","breach_duration_secs":600,"cool_down_secs":300,"metric_type":"memoryused","operator":"<","stat_window_secs":300,"threshold":30},{"adjustment":"+1","breach_duration_secs":600,"cool_down_secs":300,"metric_type":"memoryused","operator":">=","stat_window_secs":300,"threshold":90}],"schedules":{"recurring_schedule":[{"days_of_week":[1,2,3],"end_time":"18:00","initial_min_instance_count":5,"instance_max_count":10,"instance_min_count":1,"start_time":"10:00"},{"days_of_month":[5,15,25],"end_date":"2099-07-23","end_time":"19:30","initial_min_instance_count":5,"instance_max_count":10,"instance_min_count":3,"start_date":"2099-06-27","start_time":"11:00"},{"days_of_week":[4,5,6],"end_time":"18:00","instance_max_count":10,"instance_min_count":1,"start_time":"10:00"},{"days_of_month":[10,20,30],"end_time":"19:30","instance_max_count":10,"instance_min_count":1,"start_time":"11:00"}],"specific_date":[{"end_date_time":"2099-06-15T13:59","initial_min_instance_count":2,"instance_max_count":4,"instance_min_count":1,"start_date_time":"2099-06-02T10:00"},{"end_date_time":"2099-02-19T23:15","initial_min_instance_count":3,"instance_max_count":5,"instance_min_count":2,"start_date_time":"2099-01-04T20:00"}],"timezone":"Asia/Shanghai"}}

When I try to access the metricscollector, it says that the certificate is not valid.

apiserver/740d157f-8e3f-43fc-bd0b-28d3b43075aa:~$ curl https://metricscollector.service.cf.internal:6103 --insecure
curl: (35) error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate

curl -v https://metricscollector.service.cf.internal:6103 --cacert ca.crt 
* Rebuilt URL to: https://metricscollector.service.cf.internal:6103/
* Hostname was NOT found in DNS cache
*   Trying 10.244.4.7...
* Connected to metricscollector.service.cf.internal (10.244.4.7) port 6103 (#0)
* successfully set certificate verify locations:
*   CAfile: ca.crt
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Request CERT (13):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS alert, Server hello (2):
* error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate
* Closing connection 0
curl: (35) error:14094412:SSL routines:SSL3_READ_BYTES:sslv3 alert bad certificate
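The handshake trace above shows the server sending "Request CERT (13)", i.e. it asks for a client certificate, so a curl call that does not present one will be rejected with this alert. A hedged retry presenting a client certificate (the file names are assumptions):

curl -v https://metricscollector.service.cf.internal:6103 \
  --cacert ca.crt \
  --cert metricscollector_client.crt \
  --key metricscollector_client.key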

I attached my yml file.

app-autoscaler-release.zip

Any help would be really appreciated!

Please configure GITBOT

Pivotal uses GITBOT to synchronize Github issues and pull requests with Pivotal Tracker.
Please add your new repo to the GITBOT config-production.yml in the Gitbot configuration repo.
If you don't have access you can send an ask ticket to the CF admins. We prefer teams to submit their changes via a pull request.

Steps:

  • Fork this repo: cfgitbot-config
  • Add your project to config-production.yml file
  • Submit a PR

If there are any questions, please reach out to [email protected].

create-release scripts not honoring proxy?

I have a Linux box with no internet access, but it has access via a proxy. I have set http_proxy and https_proxy on the box, and I can curl -x successfully through the proxy. However, when I run bosh create-release it gets a good chunk of the way through but then dies on the apiserver part: Added package 'apiserver/b0ab0e6e8317cd7292c8230d491e13f232f623c7'

It looks like those scripts are not honoring the proxy. I was watching my squid proxy logs throughout the create-release process, and it was getting a lot of traffic until the apiserver portion started; then there wasn't another entry in the proxy log while the apiserver part was running, and it eventually failed. It's not using the http(s)_proxy settings. Here are a couple of snippets of the error:

apache-maven-3.3.9/bin/mvnyjp
apache-maven-3.3.9/conf/
apache-maven-3.3.9/conf/logging/
apache-maven-3.3.9/conf/logging/simplelogger.properties
apache-maven-3.3.9/conf/settings.xml
apache-maven-3.3.9/conf/toolchains.xml
apache-maven-3.3.9/lib/ext/
apache-maven-3.3.9/lib/ext/README.txt
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building db 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-clean-plugin/2.5/maven-clean-plugin-2.5.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:09 min
[INFO] Finished at: 2018-03-28T15:39:32+00:00
[INFO] Final Memory: 9M/111M
[INFO] ------------------------------------------------------------------------
[ERROR] Plugin org.apache.maven.plugins:maven-clean-plugin:2.5 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.apache.maven.plugins:maven-clean-plugin:jar:2.5: Could not transfer artifact org.apache.maven.plugins:maven-clean-plugin:pom:2.5 from/to central (https://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:443 [repo.maven.apache.org/151.101.44.215] failed: Connection timed out -> [Help 1]

Mar 28, 2018 3:41:54 PM org.apache.maven.wagon.providers.http.httpclient.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.NoRouteToHostException) caught when processing request to {s}->https://repo.maven.apache.org:443: No route to host
Mar 28, 2018 3:41:54 PM org.apache.maven.wagon.providers.http.httpclient.impl.execchain.RetryExec execute
INFO: Retrying request to {s}->https://repo.maven.apache.org:443
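Maven itself does not read the http_proxy/https_proxy environment variables; it takes proxy settings from ~/.m2/settings.xml. A hedged sketch of writing such a file (host, port, and id are placeholders):

cat > ~/.m2/settings.xml <<'EOF'
<settings>
  <proxies>
    <proxy>
      <id>corp-proxy</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>proxy.example.com</host>
      <port>3128</port>
    </proxy>
  </proxies>
</settings>
EOF

A second entry with protocol http may be needed, depending on how the repositories are reached.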

App autoscaler fails to start, settings.json invalid.

Attempting to deploy app-autoscaler 3.0.0 results in the apiserver failing to start with a message that settings.json is invalid.

from apiserver.stderr.log

Error: settings.json is invalid
at module.exports (/var/vcap/data/packages/apiserver/54a1afb682ac915030a74cc3c9a91ab2554025e8/app.js:16:11)
at Object. (/var/vcap/data/packages/apiserver/54a1afb682ac915030a74cc3c9a91ab2554025e8/index.js:25:56)
at Module._compile (module.js:652:30)
at Object.Module._extensions..js (module.js:663:10)
at Module.load (module.js:565:32)
at tryModuleLoad (module.js:505:12)
at Function.Module._load (module.js:497:3)
at Function.Module.runMain (module.js:693:10)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:612:3
/var/vcap/data/packages/apiserver/54a1afb682ac915030a74cc3c9a91ab2554025e8/app.js:16
throw new Error('settings.json is invalid');

from apiserver.stdout.log

{"timestamp":"2019-11-26T03:05:17.836Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2019-11-26T03:05:28.689Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2019-11-26T03:05:40.031Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2019-11-26T03:05:51.275Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2019-11-26T03:06:02.027Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2019-11-26T03:06:13.550Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}

from settings.json
"minBreachDurationSecs": 30, "minCoolDownSecs": 30, "httpClientTimeout": 5000

Blobs not found when uploading release?

Followed the directions, cloned the repo, ran the update, and created the release successfully. When I try to upload the release to my BOSH director:

  • Cannot find blob named 'apiserver/b05314ee056c084dc8cc5f1df532877b8468d62d' with SHA1 'e3e20c84fba48918acfd2048b394343c9511edd7'
  • Cannot find blob named 'metricscollector/85f0c39bcb9ef477c708d18802b4fbbfd097a7d5' with SHA1 '624ef5956dd9b9a22878ab750c99dd6ecb9c15df'
  • Cannot find blob named 'pruner/a9b04121b5fab39485f4a681698b452da0118aee' with SHA1 'd40e9d14e5a4902c9555293eea80bff26a633f4f'
  • Cannot find blob named 'scalingengine/4c4cfeeabd9e0c0e53b44a9563e209edf6eb6230' with SHA1 '6a17cd56ff164fe5cd272518b0beb782b611903c'
  • Cannot find blob named 'eventgenerator/11123ed87e47a95b8ae473c87a433bce908a5ceb' with SHA1 'a6b403300d88d06db49c18d06db9a9e65f97e1b0'

DB password exposed in the log when the DB connection fails.

When the connection to the DB fails, the DB password is exposed in the log file; it's a security issue.
Error logs:

2019/12/03 07:37:39 failed-to-connection-to-database, dburl:postgres://xxx:[email protected]:5432/autoscaler?sslmode=verify-full&sslrootcert=/var/vcap/jobs/scalingengine/config/certs/scalingengine_db/ca.crt,  err:pq: password authentication failed for user "xxx"
failed to connect to database:

Checking the code, I found that the line below prints the DB URL including the password:

log.Printf("failed-to-connection-to-database, dburl:%s, err:%s\n", dbUrl, err)

can bosh cli v1 handle the bosh links?

Hi,

Since we are using cf-release, the BOSH CLI v2 can't be used in this case.
There is a link method used in the file app-autoscaler-release/jobs/metricscollector/templates/metricscollector.yml.erb; can the BOSH CLI v1 handle this method?

scheduler vm namespace clash with cf scheduler (0.30.0)

cf-deployment introduced a VM named 'scheduler' in the cf-deployment 0.28.0 release. This is causing a namespace clash when consul_agents register with the consul_server(s), as there are now two clients both attempting to register with the name scheduler-0.

We are seeing repeated problems with the autoscaler being able to pass the autoscaler acceptance test suite when running against cf-deployment 0.30.0, and we believe this is caused by the namespace clash within Consul.

CA certificate rotation causes interruption in executed schedules

If you have more than one CA cert on the scheduler job (for example while doing a CA cert rotation), only one certificate will end up in the Java trust store:

$ cat /var/vcap/jobs/scheduler/config/certs/scalingengine/ca.crt
-----BEGIN CERTIFICATE-----
MIIE7jCCAtagAwIBAgIBATANBgkqhkiG9w0BAQsFADAXMRUwEwYDVQQDEwxhdXRv
...
RRCLIcypYA/ld2RGB9wq/9Fj
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
MIIE7jCCAtagAwIBAgIBATANBgkqhkiG9w0BAQsFADAXMRUwEwYDVQQDEwxhdXRv
...
CsZEnYFcqsE/g5jJj0S/YeNG
-----END CERTIFICATE-----

$ /var/vcap/jobs/scheduler/bin/install_crt_truststore test scalingengine/ca.crt
$ /var/vcap/packages/java/bin/keytool -list -v -keystore /var/vcap/data/certs/test/cacerts 
Enter keystore password:  

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry
...

This means that the scheduler will not trust one of the certificates and there will be some time when schedules cannot be executed:

org.cloudfoundry.autoscaler.scheduler.util.error.SchedulerInternalException: Error connecting to scaling engine, failed with error: I/O error on DELETE request for "https://scalingengine.service.cf.internal:6104/v1/apps/d0910498-eabe-4014-8f42-4d9f77003bd9/active_schedules/7958": sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors; nested exception is javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors for app id: d0910498-eabe-4014-8f42-4d9f77003bd9 and schedule id: 7,958 to delete active schedule.

We probably need to split up the certs according to https://stackoverflow.com/questions/14660767/keytool-importing-multiple-certificates-in-single-file in

manage_truststore () {
  operation=$1
  $JDK_HOME/bin/keytool -$operation -file $CERT_FILE -keystore $TRUST_STORE_FILE -storeType pkcs12 -storepass $PASSWORD -noprompt -alias $CERT_ALIAS >/dev/null 2>&1
}
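A hedged sketch of splitting the PEM bundle and importing each certificate under its own alias, so every CA in the file ends up in the trust store (the alias scheme is an assumption):

csplit -z -f ca_part_ "$CERT_FILE" '/-----BEGIN CERTIFICATE-----/' '{*}'
i=0
for part in ca_part_*; do
  $JDK_HOME/bin/keytool -importcert -file "$part" \
    -keystore "$TRUST_STORE_FILE" -storetype pkcs12 -storepass "$PASSWORD" \
    -noprompt -alias "${CERT_ALIAS}_$i"
  i=$((i+1))
done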

apiserver fails when deploying app autoscaler using bosh DNS

Trying to deploy a greenfield install of app-autoscaler that uses BOSH DNS. However, the apiserver fails when doing so, with the following error:

L Error: Action Failed get_task: Task 916d5245-4549-4acd-47e3-1bee834e78a4 result: 1 of 3 pre-start scripts failed. Failed Jobs: apiserver. Successful Jobs: route_registrar, bosh-dns.

In /var/vcap/sys/log/apiserver/pre-start.stdout.log it shows that a connection attempt to autoscalerpostgres.service.cf.internal failed. I am using the bosh-dns.yml that is in the examples dir of the app-autoscaler release folder. It shows a domain for autoscalerpostgres.service.cf.internal. I have not modified the bosh-dns.yml.

Here is more of the log from the apiserver/pre-start.stderr.log:

Starting Liquibase at Tue, 23 Jul 2019 17:35:18 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Unexpected error running Liquibase: org.postgresql.util.PSQLException: The connection attempt failed.
liquibase.exception.DatabaseException: liquibase.exception.DatabaseException: org.postgresql.util.PSQLException: The connection attempt failed.
at liquibase.integration.commandline.CommandLineUtils.createDatabaseObject(CommandLineUtils.java:132)
at liquibase.integration.commandline.Main.doMigration(Main.java:974)
at liquibase.integration.commandline.Main.run(Main.java:199)
at liquibase.integration.commandline.Main.main(Main.java:137)
Caused by: liquibase.exception.DatabaseException: org.postgresql.util.PSQLException: The connection attempt failed.
at liquibase.database.DatabaseFactory.openConnection(DatabaseFactory.java:254)
at liquibase.database.DatabaseFactory.openDatabase(DatabaseFactory.java:149)
at liquibase.integration.commandline.CommandLineUtils.createDatabaseObject(CommandLineUtils.java:97)
... 3 common frames omitted
Caused by: org.postgresql.util.PSQLException: The connection attempt failed.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:292)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.(PgConnection.java:195)
at org.postgresql.Driver.makeConnection(Driver.java:454)
at org.postgresql.Driver.connect(Driver.java:256)
at liquibase.database.DatabaseFactory.openConnection(DatabaseFactory.java:246)
... 5 common frames omitted
Caused by: java.net.UnknownHostException: autoscalerpostgres.service.cf.internal
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:221)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:402)
at java.base/java.net.Socket.connect(Socket.java:591)
at org.postgresql.core.PGStream.(PGStream.java:70)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:91)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192)
... 10 common frames omitted

Allow encrypted DB connections

All of the database connection strings appear to be hard-coded with "sslmode=disable", which prevents secure connections.
It would be nice to be able to configure this for environments that require secure connections.
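For reference, a hedged example of the kind of connection string a configurable option would need to produce (the same URL form already appears in other logs in this list; host, credentials, and cert path are placeholders):

postgres://autoscaler:PASSWORD@postgres.service.cf.internal:5432/autoscaler?sslmode=verify-full&sslrootcert=/var/vcap/jobs/scalingengine/config/certs/ca.crt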

CheckServiceExists does not correctly check for service offering with cf7 CLI

if strings.Contains(string(version.Out.Contents()), "version 7") {
	serviceExists = cf.Cf("marketplace", "-e", cfg.ServiceName).Wait(cfg.DefaultTimeoutDuration())
} else {
	serviceExists = cf.Cf("marketplace", "-s", cfg.ServiceName).Wait(cfg.DefaultTimeoutDuration())
}
Expect(serviceExists).To(Exit(0), fmt.Sprintf("Service offering, %s, does not exist", cfg.ServiceName))

does not correctly check that the service offering exists with cf7 marketplace, as it exits with exit code 0 even when no service offering was found:

[Update]: When the -e flag is specified, and no service offering with that name is found, the exit code returned is 0. This is in contrast to the cf CLI v6, which returned exit code 1 in this case.
(c.f. cloudfoundry/docs-cf-cli#71)

Instead, the output of the cf7 marketplace command needs to be parsed.
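A hedged fragment, mirroring the snippet above, of what parsing the output could look like; the exact wording printed by cf7 marketplace when no offering is found is an assumption and must be verified:

if strings.Contains(string(version.Out.Contents()), "version 7") {
	serviceExists = cf.Cf("marketplace", "-e", cfg.ServiceName).Wait(cfg.DefaultTimeoutDuration())
	Expect(serviceExists).To(Exit(0))
	Expect(string(serviceExists.Out.Contents())).NotTo(
		ContainSubstring("not found"), // assumed fragment of the cf7 "offering not found" message
		fmt.Sprintf("Service offering, %s, does not exist", cfg.ServiceName))
} else {
	serviceExists = cf.Cf("marketplace", "-s", cfg.ServiceName).Wait(cfg.DefaultTimeoutDuration())
	Expect(serviceExists).To(Exit(0), fmt.Sprintf("Service offering, %s, does not exist", cfg.ServiceName))
}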

wrong pid for servicebroker in the pid file.

@qibobo Hi Qibobo, one more issue found:
In the autoscaler-api pod, the servicebroker and apiserver processes show a status of 'Does not exist', as below:
Process 'crond' Running
File 'cron_bin' Accessible
File 'cron_rc' Accessible
Directory 'cron_spool' Accessible
Process 'rsyslogd' Running
File 'rsyslogd_bin' Accessible
File 'rsyslog_file' Timestamp failed
Process 'servicebroker' Does not exist
Process 'route_registrar' Running
Process 'apiserver' Does not exist
File 'post-start' Does not exist
System 'autoscaler-api-int.hcf.svc' Running

I tried to start them manually with monit validate -v:
'servicebroker' Error testing process id [86445] -- No such process
'servicebroker' process is not running
'servicebroker' trying to restart
'servicebroker' Error testing process id [86445] -- No such process
'servicebroker' Error testing process id [86445] -- No such process
'servicebroker' start: /var/vcap/jobs/servicebroker/bin/servicebroker
'servicebroker' Error testing process id [86445] -- No such process

While checking the corresponding pid file, I found that it contains a different pid:
root@autoscaler-api-int:/var/vcap/monit# more /var/vcap/sys/run/servicebroker/servicebroker.pid
86565

bosh cluster deployment

Currently, only a BOSH Lite deployment is provided. However, that is not enough to satisfy everyone.
Do you have any plan to provide a BOSH cluster deployment with a full Director?

certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name after go upgrade.

After upgrading to Go 1.15, it looks like we are hitting this error when running any of the Go services, e.g.:

2021/07/22 11:20:27 failed-to-connection-to-database, dburl:postgres://postgres:*REDACTED*@autoscalerpostgres.service.cf.internal:5432/autoscaler?sslmode=verify-full&sslrootcert=/var/vcap/jobs/scalingengine/config/certs/scalingengine_db/ca.crt,  err:x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0

Possible workaround is to set the environment variable GODEBUG=x509ignoreCN=0.
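
The proper fix on the certificate side is to issue certificates that carry the host names as Subject Alternative Names. A minimal, self-contained Go sketch of a self-signed certificate that sets DNSNames (the host name is just an example, not a value from the release):

package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"os"
	"time"
)

func main() {
	// Example host name; replace with whatever name the clients dial.
	host := "autoscalerpostgres.service.cf.internal"

	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}

	tmpl := x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: host}, // CN alone is no longer enough for Go >= 1.15
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		DNSNames:     []string{host}, // the important part: every DNS name listed as a SAN
	}

	der, err := x509.CreateCertificate(rand.Reader, &tmpl, &tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	if err := pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE", Bytes: der}); err != nil {
		panic(err)
	}
}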

Think you still need cf_admin_password to deploy

The readme has recently been updated with the following for deploying autoscaler:

bosh -e YOUR_ENV -d app-autoscaler \
  deploy templates/app-autoscaler-deployment-fewer.yml \
  --vars-store=bosh-lite/deployments/vars/autoscaler-deployment-vars.yml \
  -v system_domain=bosh-lite.com \
  -v cf_client_id=autoscaler_client_id \
  -v cf_client_secret=autoscaler_client_secret \
  -v skip_ssl_validation=true

I only got it to work with the following:

bosh -e YOUR_ENV -d app-autoscaler \
  deploy templates/app-autoscaler-deployment-fewer.yml \
  --vars-store=bosh-lite/deployments/vars/autoscaler-deployment-vars.yml \
  -v system_domain=bosh-lite.com \
  -v cf_admin_password= \
  -v cf_client_id=autoscaler_client_id \
  -v cf_client_secret=autoscaler_client_secret \
  -v skip_ssl_validation=true

Without it, you get an error like:

Failed to find variable '//app-autoscaler/cf_admin_password' from config server: HTTP Code '404', Error: 'The request could not be completed because the credential does not exist or you do not have sufficient authorization.'

Need a link for bosh-dns setup in docs.

Bosh-dns is not enabled by default for all deployments up to now. So it is necessary to add a doc link to the BOSH and cf-deployment docs that explains how to enable bosh-dns. Otherwise, the deployment of autoscaler will be problematic.

https://bosh.io/docs/dns/#links
for the entire Director via Director job configuration director.local_dns.use_dns_addresses property that if enabled affects all deployments by default. We are planning to eventually change this configuration to true by default.

https://github.com/cloudfoundry/cf-deployment#bosh-runtime-config

cf-deployment requires that you have uploaded a runtime-config for BOSH DNS prior to deploying your foundation. We recommended that you use the one provided by the bosh-deployment repo:

bosh update-runtime-config bosh-deployment/runtime-configs/dns.yml --name dns

Metricsforwarder logs ssl validation error when forwarding metrics

Using the deployment template I'm seeing this log output for the metrics-forwarder:

{"data":{"metric":{"app_guid":"9aa474dc-7b6d-4cb1-bbf9-2ffb7d23c0d7","instance_index":0,"name":"custom","unit":"test-unit","value":1000},"session":"4"},"log_level":0,"log_time":"2020-06-29T11:56:46Z","message":"metricsforwarder.custom_metrics_server.custom-metric-emit-request-received:","source":"metricsforwarder","timestamp":"1593431806.530002832"}
{"data":{"data":[{"code":14,"message":"all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate is valid for reverselogproxy, not metron\""}],"session":"4.1"},"log_level":0,"log_time":"2020-06-29T11:56:46Z","message":"metricsforwarder.custom_metrics_server.metric_forwarder.Error while flushing: %s","source":"metricsforwarder","timestamp":"1593431806.552621126"}
{"data":{"count":1,"session":"5"},"log_level":0,"log_time":"2020-06-29T11:57:13Z","message":"metricsforwarder.PolicyManager.policycount","source":"metricsforwarder","timestamp":"1593431833.567957401"}

I don't understand what the cause of the error message is:

transport: authentication handshake failed: x509: certificate is valid for reverselogproxy, not metron

I'm assuming that this happens while metricsforwarder is trying to forward to the local loggregator_agent.
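
If that is the case, the error is just Go's hostname verification during the TLS handshake: the client verifies the presented certificate against the name it expects (here "metron"), and the agent's certificate only lists "reverselogproxy" in its SANs. A minimal client-side sketch of the same check -- the CA path and agent address are assumptions, not values from the release:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

func main() {
	// Hypothetical CA file and agent address, for illustration only.
	caPEM, err := os.ReadFile("/var/vcap/jobs/metricsforwarder/config/certs/loggregator_ca.crt")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	cfg := &tls.Config{
		RootCAs:    pool,
		ServerName: "metron", // the name verified against the server cert's SANs
	}
	conn, err := tls.Dial("tcp", "127.0.0.1:3458", cfg)
	if err != nil {
		// This is where "x509: certificate is valid for reverselogproxy, not metron" shows up.
		fmt.Println("handshake failed:", err)
		return
	}
	conn.Close()
}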

Not an issue but another CPU query

Hi, we're currently trying out this release in one of our dev envs and noticed that the policy threshold for the cpu metric can only be between 1 and 100%.

I can find the code that implements this restriction, which is fair enough. In our env, though, CPU can go up to 200% as our Diego Cells are dual-core.

I'm running through the code to see which cpu metric is pulled out, and I'm guessing it's the same cpuPercentage from the loggregator v2 API that the 'cf app [app name]' command uses to show CPU used. Is that right?

It wouldn't be using AbsoluteCPUUsage, would it? And I'm guessing not AbsoluteCPUEntitlement, as that's experimental.

Any info would be good, as I've been going down a rabbit hole following the code :-)
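
A quick worked illustration of the mismatch (the numbers are hypothetical, based on the observation above that a dual-core cell can report up to 200%):

package main

import "fmt"

// Illustrative only: if the reported cpu gauge is the total across cores
// (which is why `cf app` can show values above 100% on multi-core cells),
// then a policy range capped at 100 cannot express a threshold like 150%.
func main() {
	perCore := []float64{98.5, 97.2} // hypothetical per-core usage on a dual-core cell
	total := 0.0
	for _, c := range perCore {
		total += c
	}
	fmt.Printf("reported cpu: %.1f%%, policy upper bound: 100%%\n", total)
}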

Consul replacement?

The current app-autoscaler-release is not deployable because its templates rely on CF Consul, which does not exist anymore.
Any plans to fix that? Add bosh-dns and bosh-dns-aliases to the release/templates?

metricsgateway job failed on asnozzle VM with certificate error

I just deployed the app-autoscaler BOSH release 3.0.1 and it failed on updating the instance asnozzle:

Task 153575 | 21:51:51 | Updating instance asnozzle: asnozzle/d86a1ac1-2ffe-4912-97c8-63eff5dee550 (0) (canary) (00:05:17)
L Error: 'asnozzle/d86a1ac1-2ffe-4912-97c8-63eff5dee550 (0)' is not running after update. Review logs for failed jobs: metricsgateway
Task 153575 | 21:57:08 | Error: 'asnozzle/d86a1ac1-2ffe-4912-97c8-63eff5dee550 (0)' is not running after update. Review logs for failed jobs: metricsgateway

Task 153575 Started Fri Sep 11 21:49:35 UTC 2020
Task 153575 Finished Fri Sep 11 21:57:08 UTC 2020
Task 153575 Duration 00:07:33
Task 153575 error

Updating deployment:
Expected task '153575' to succeed but state is 'error'

Exit code 1

When I checked the /var/vcap/sys/log/metricsgateway/metricsgateway.stdout.log, I found a lot of occurrences of this error:

{"data":{"error":"x509: certificate is valid for metricsserver.service.cf.internal, *.asmetrics.default.app-autoscaler.bosh, not de4b3b4d-de80-40d3-832e-67a7f49c6bf6.asmetrics.vlan200-cfar.app-autoscaler.bosh"},"log_level":2,"log_time":"2020-09-11T23:01:27Z","message":"metricsgateway.failed to start emitter","source":"metricsgateway","timestamp":"1599865287.905046940"}

It looked like the CN/SAN of the metrics server certificate does not match the DNS name used for the metric_server_addrs parameter in metricsgateway.yml:

$ cat /var/vcap/jobs/metricsgateway/config/metricsgateway.yml
logging:
  level: info
envelop_chan_size: 1000
nozzle_count: 3
metric_server_addrs: ['wss://0e8d79dc-253f-4128-985d-d86cf161f902.asmetrics.vlan200-cfar.app-autoscaler.bosh:7103']
...

Any idea what is wrong and how to fix this problem?
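
One way to confirm the mismatch locally is to check the server certificate's SANs against the host name the gateway dials. A small Go diagnostic sketch (the certificate path is a placeholder and the host is taken from the config above; neither is a file created by the release):

package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
)

func main() {
	const certPath = "/tmp/metricsserver.crt" // hypothetical copy of the metricsserver certificate
	const dialHost = "0e8d79dc-253f-4128-985d-d86cf161f902.asmetrics.vlan200-cfar.app-autoscaler.bosh"

	raw, err := os.ReadFile(certPath)
	if err != nil {
		panic(err)
	}
	block, _ := pem.Decode(raw)
	if block == nil {
		panic("no PEM block found in " + certPath)
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		panic(err)
	}
	fmt.Println("certificate SANs:", cert.DNSNames)
	// VerifyHostname returns nil only if dialHost matches one of the SANs,
	// which is exactly the check that fails in the error above.
	fmt.Println("hostname check:", cert.VerifyHostname(dialHost))
}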

App autoscaler fails to start, settings.json invalid.

Attempting to deploy app autoscaler 3.0.0 results in the apiserver failing to start with a message that settings.json is invalid.

from apiserver.stderr.log

Error: settings.json is invalid
at module.exports (/var/vcap/data/packages/apiserver/a8b8486d14bcb95da7869b8f30a4f2bbef6f1e05/app.js:16:11)
at Object.<anonymous> (/var/vcap/data/packages/apiserver/a8b8486d14bcb95da7869b8f30a4f2bbef6f1e05/index.js:25:56)
at Module._compile (module.js:652:30)
at Object.Module._extensions..js (module.js:663:10)
at Module.load (module.js:565:32)
at tryModuleLoad (module.js:505:12)
at Function.Module._load (module.js:497:3)
at Function.Module.runMain (module.js:693:10)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:612:3
/var/vcap/data/packages/apiserver/a8b8486d14bcb95da7869b8f30a4f2bbef6f1e05/app.js:16
throw new Error('settings.json is invalid');

from apiserver.stdout.log

{"timestamp":"2020-03-20T05:41:05.864Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2020-03-20T05:41:45.986Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2020-03-20T05:42:26.115Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2020-03-20T05:43:06.282Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2020-03-20T05:43:46.384Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}
{"timestamp":"2020-03-20T05:44:26.513Z","source":"autoscaler:apiserver","text":"Invalid configuration: minBreachDurationSecs is required","log_level":"error"}

Note: For the deployment we used the app-autoscaler-deployment-v1.yml file.

The autoscaler API connection to the CC API must skip ssl validation if self signed certs on the CC are used.

If the CC API is using a self-signed cert, or one provided by a private PKI, the Autoscaler API must have: autoscaler.cf.skip_ssl_validation: true

There is no option to supply a trusted root cert for Node.js. Ideally the version of Node.js the job is using could be compiled to use the default system CA store, like the Node.js buildpack does. https://www.pivotaltracker.com/n/projects/1042066/stories/152254480

job "metircsgateway" fails to start in instance "asnozzle" because of missing policy_json table

While deploying app-autoscaler using the Readme.md, I ran into the following issue:
The "metricsgateway" job in "asnozzle" kept failing, and the logs showed the following messages:

{"data":{"addr":"0.0.0.0:6503","session":"10"},"log_level":1,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.health-server.new-health-server","source":"metricsgateway","timestamp":"1566995106.178214312"}
{"data":{},"log_level":1,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.starting metricsgateway","source":"metricsgateway","timestamp":"1566995106.178723574"}
{"data":{"interval":5000000000,"session":"4"},"log_level":1,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.AppManager.started","source":"metricsgateway","timestamp":"1566995106.178894281"}
{"data":{"session":"5"},"log_level":1,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.Dispather.dispatcher-started","source":"metricsgateway","timestamp":"1566995106.178979874"}
{"data":{"session":"2"},"log_level":1,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.WSHelper.setup-new-ws-connection","source":"metricsgateway","timestamp":"1566995106.179043293"}
{"data":{"error":"pq: relation \"policy_json\" does not exist","query":"SELECT app_id FROM policy_json","session":"1"},"log_level":2,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.policy-db.get-appids-from-policy-table","source":"metricsgateway","timestamp":"1566995106.179763317"}
{"data":{"error":"pq: relation \"policy_json\" does not exist","session":"4"},"log_level":2,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.AppManager.retrieve-app-ids","source":"metricsgateway","timestamp":"1566995106.179866314"}
{"data":{"error":"pq: relation \"policy_json\" does not exist","query":"SELECT app_id FROM policy_json","session":"1"},"log_level":2,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.policy-db.get-appids-from-policy-table","source":"metricsgateway","timestamp":"1566995106.180225134"}
{"data":{"error":"pq: relation \"policy_json\" does not exist","session":"4"},"log_level":2,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.AppManager.retrieve-app-ids","source":"metricsgateway","timestamp":"1566995106.180329800"}
{"data":{"error":"pq: relation \"policy_json\" does not exist","query":"SELECT app_id FROM policy_json","session":"1"},"log_level":2,"log_time":"2019-08-28T12:25:06Z","message":"metricsgateway.policy-db.get-appids-from-policy-table","source":"metricsgateway","timestamp":"1566995106.180671692"}

I figured this could be because the tables are actually created by the "pre-start" script of the "apiserver" job (part of the "asapi" instance), which is not updated before the "asnozzle" instance.
https://github.com/cloudfoundry/app-autoscaler-release/blob/master/jobs/apiserver/templates/pre-start.erb#L46

My Temp Solution:
Reshuffle the order in which the instances are defined in the template/app-autoscaler-deployment.yml file so that "asnozzle" is defined before "asapi".
At least this resolved the issue for us.
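
An alternative to reordering the instance groups would be to make the consumer tolerate a missing table at startup. A hedged sketch of such a retry loop in Go, assuming the lib/pq driver that the errors above already show in use; the DSN and timings are placeholders:

package main

import (
	"database/sql"
	"fmt"
	"time"

	_ "github.com/lib/pq" // postgres driver ("pq: ..." in the errors above)
)

// waitForPolicyTable keeps retrying until the apiserver pre-start has created
// the policy_json table, or the timeout expires.
func waitForPolicyTable(dsn string, timeout time.Duration) error {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return err
	}
	defer db.Close()

	deadline := time.Now().Add(timeout)
	var lastErr error
	for time.Now().Before(deadline) {
		var count int
		if lastErr = db.QueryRow("SELECT COUNT(*) FROM policy_json").Scan(&count); lastErr == nil {
			return nil // table exists and is queryable
		}
		time.Sleep(5 * time.Second)
	}
	return fmt.Errorf("policy_json still missing after %s: %v", timeout, lastErr)
}

func main() {
	// Placeholder DSN for illustration only.
	err := waitForPolicyTable("postgres://postgres:secret@autoscalerpostgres.service.cf.internal:5432/autoscaler?sslmode=disable", 2*time.Minute)
	fmt.Println(err)
}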

Abnormal Liquibase lock blocks app-autoscaler from starting

App-autoscaler uses Liquibase to maintain the database change sets, and runs the Liquibase process as a pre-start job.
Once a Liquibase process is running, it inserts a DB lock record and then removes it on normal completion. But in some odd situations (e.g. a liquibase update was interrupted), the db lock is not removed, which blocks further autoscaler startup.

See detail in https://www.liquibase.org/documentation/databasechangeloglock_table.html

Currently, Liquibase does not have a DB lock TTL implemented, so we need to work around this with the listLocks and releaseLocks commands.

Detailed steps to reproduce and fix the problem:

Step 1: do a liquibase update manually, and press Ctrl+C to break its execution.

java -cp "$DB_DIR/target/lib/*" liquibase.integration.commandline.Main --url "$DBURI" --username=$USER --password=$PASSWORD --driver=org.postgresql.Driver --changeLogFile=$API_DIR/db/api.db.changelog.yml update
Starting Liquibase at Tue, 20 Aug 2019 05:05:11 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
^Cautoscaler-api/1:/var/vcap/jobs/apiserver/bin# ^C

Step 2: now a db lock is left behind in the ICD database:

autoscaler-api/1:/var/vcap/jobs/apiserver/bin# java -cp "$DB_DIR/target/lib/*" liquibase.integration.commandline.Main --url "$DBURI" --username=$USER --password=$PASSWORD --driver=org.postgresql.Driver listLocks
Starting Liquibase at Tue, 20 Aug 2019 05:05:28 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Database change log locks for autoscaler@jdbc:postgresql://external-db.uaa.svc.cluster.local:31404/autoscaler?sslmode=require
 - autoscaler-api-1.autoscaler-api-set.cf.svc.cluster.local (172.30.51.196) at Aug 20, 2019, 5:05:12 AM
Liquibase command 'listLocks' was executed successfully.

Step 3: run the update cmd from step 1 again. Now it hangs, just as we found in @travagli's cluster.

To fix it, we can execute the releaseLocks cmd as below:

autoscaler-api/1:/var/vcap/jobs/apiserver/bin# java -cp "$DB_DIR/target/lib/*" liquibase.integration.commandline.Main --url "$DBURI" --username=$USER --password=$PASSWORD --driver=org.postgresql.Driver releaselocks
Starting Liquibase at Tue, 20 Aug 2019 05:08:36 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Successfully released all database change log locks for 'autoscaler@jdbc:postgresql://external-db.uaa.svc.cluster.local:31404/autoscaler?sslmode=require'
Liquibase command 'releaselocks' was executed successfully.

autoscaler-api/1:/var/vcap/jobs/apiserver/bin# java -cp "$DB_DIR/target/lib/*" liquibase.integration.commandline.Main --url "$DBURI" --username=$USER --password=$PASSWORD --driver=org.postgresql.Driver listLocks
Starting Liquibase at Tue, 20 Aug 2019 05:08:47 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Database change log locks for autoscaler@jdbc:postgresql://external-db.uaa.svc.cluster.local:31404/autoscaler?sslmode=require
 - No locks
Liquibase command 'listLocks' was executed successfully.

autoscaler-api/1:/var/vcap/jobs/apiserver/bin# java -cp "$DB_DIR/target/lib/*" liquibase.integration.commandline.Main --url "$DBURI" --username=$USER --password=$PASSWORD --driver=org.postgresql.Driver --changeLogFile=$API_DIR/db/api.db.changelog.yml update
Starting Liquibase at Tue, 20 Aug 2019 05:08:52 UTC (version 3.6.3 built at 2019-01-29 11:34:48)
Liquibase: Update has been successful.
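
For completeness, the same listLocks check can also be done with a plain SQL query against Liquibase's databasechangeloglock table. A small Go diagnostic sketch (the connection string is a placeholder; releasing a stale lock should still be done with the liquibase releaseLocks command shown above):

package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // postgres driver
)

func main() {
	// Placeholder DSN for illustration only.
	db, err := sql.Open("postgres", "postgres://autoscaler:secret@external-db.example.com:5432/autoscaler?sslmode=require")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Standard Liquibase lock table: a row with locked=true while no Liquibase
	// process is running indicates the stale lock described above.
	rows, err := db.Query("SELECT id, locked, lockedby FROM databasechangeloglock")
	if err != nil {
		panic(err)
	}
	defer rows.Close()

	for rows.Next() {
		var (
			id       int
			locked   bool
			lockedBy sql.NullString
		)
		if err := rows.Scan(&id, &locked, &lockedBy); err != nil {
			panic(err)
		}
		fmt.Printf("lock %d locked=%v by=%q\n", id, locked, lockedBy.String)
	}
}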
