cloudfoundry / cf-deployment

The canonical open source deployment manifest for Cloud Foundry

License: Apache License 2.0

Shell 21.68% Go 78.32%

cff-wg-app-runtime-deployments

cf-deployment's People

pkdevbox voelzmo cloudxtreme katmutua anksv kizimo cf-container-networking nelsam jvshahid drich10 cppforlife jaresty timani emalm acrmp mboldt tylerpinson anoop2811 wendorf peterellisjones anoumana luan plfx ericpromislow fushewokunze-pivotal windofthesky ramonskie bsekar s-matyukevich flawedmatrix xiujiao drnic engineerbetter jpalermo thausler786 hoagrawal priyata25 jenspinney tcdowney virajago zankich dgodd carlo-colombo joaompinto evanfarrar ishustava grapeup geofffranks bingosummer blumixid edwardecook jsievers rohitsharma04 tusing astrieanna zrob idoru bluebosh niroyb idev4u jingyanyi kellygerritz garethjevans ssurenr kinjelom sig-fisherj mcwumbly dlresende ebeer tinygrasshopper gcapizzi chris-pollard-assurity alibaba-archive ailan-gl dkoper aeijdenberg anexper anwarchk ob1-sc rigoford shasthojoy mcnichol mdelillo ykisialiou pradyutsarma sesmith177 goldyellow34 flavorjones booleancat fghorbel giner jmcarp flangewad vchrisr simonkey007 archgrove andyliuliming selzoc hliilh dermc

cf-deployment's Issues

Inconsistent use of hyphens and underscores in instance-group names

The names of the instance groups in the cf-deployment.yml manifest do not use separators consistently. Some jobs use hyphens (diego-bbs, diego-brain, diego-cell, route-emitter, tcp-router) while others use underscores (cc_bridge, cc_clock, log_controller). My own preference is for the hyphen to be the separator, but the underscore is used more frequently (though not universally) in the job template names and properties.
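As a rough way to see the split, the names quoted above can be grouped by the separator they use. This is a minimal sketch and not part of cf-deployment: `separator_style` and `group_by_style` are hypothetical helpers, and the list holds only the names mentioned in this issue.

```python
def separator_style(name: str) -> str:
    """Classify a name by the separator it uses."""
    has_hyphen = "-" in name
    has_underscore = "_" in name
    if has_hyphen and has_underscore:
        return "mixed"
    if has_hyphen:
        return "hyphen"
    if has_underscore:
        return "underscore"
    return "none"


def group_by_style(names):
    """Map each separator style to the names that use it, keeping input order."""
    groups = {}
    for name in names:
        groups.setdefault(separator_style(name), []).append(name)
    return groups


# Only the names quoted in the issue; the real manifest has more groups.
NAMES = [
    "diego-bbs", "diego-brain", "diego-cell", "route-emitter",
    "tcp-router", "cc_bridge", "cc_clock", "log_controller",
]

if __name__ == "__main__":
    for style, members in group_by_style(NAMES).items():
        print(f"{style}: {', '.join(members)}")
```

Running the same grouping over every `name:` in the manifest would show how far each convention has spread before picking one.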

Thanks,
Eric

Cannot opt into compiled release after uncompiled release is uploaded

Hi,

We ran into an issue where we were deploying cf-deployment onto bosh-lite without using compiled releases (which is presently just mysql). We later switched to using the use-compiled-releases opsfile, but found it wasn't being utilized; bosh skipped uploading the compiled release and compiled the release anyway.

We spoke with @anEXPer who believed it would work if the uncompiled release was deleted prior to deploying with the compiled release, or perhaps stripping the version attribute from the compiled release in the list of releases in the rendered manifest.

We weren't able to test either of these as compilation succeeded the second time around.

Cheers,

KH && @aashah

Too much stuff in bosh-lite ops file?

I'm trying to deploy Spark (modifying this example) on CF, and think I'll need to leverage container-to-container networking. The simplest way for me to do this is to deploy CF on BOSH-Lite and leverage this cf-networking-release ops file. That ops file assumes there's a MySQL database, but the bosh-lite ops file in this repo replaces MySQL with Postgres.

Is the bosh-lite ops file doing too much? I understand wanting to give bosh-lite users a simple experience, but it seems like swapping out MySQL for Postgres should be its own thing, and asking bosh-lite users to compose a small handful of building blocks isn't that bad (and is encouraged).

/cc @cloudfoundry/cf-container-networking @cppforlife @wendorf @drich10

cf-deployment.yml not compliant with YAML spec

Hi,

I'm developing a tool, for my own educational use, to parse the cf-deployment YAML file and visualise the components and dependencies. My tool parsed the YAML file correctly until Dec 2016. Coming back to the project after a few months, I noticed the following error message with the latest clone of cf-deployment:

Exception in thread "main" found undefined alias diego_bbs_client_properties
bbs: *diego_bbs_client_properties
in 'reader', line 727, column 16:
^

With some analysis I found that the YAML alias is referenced before its declaration, which is not compliant with the YAML spec, according to various YAML parsers and lint tools available online.

The alias is on
Line 727: bbs: *diego_bbs_client_properties
whereas the declaration is on
Line 800: bbs: &diego_bbs_client_properties
Line 801: ca_cert: "((diego_bbs_client.ca))"
Line 802: client_cert: "((diego_bbs_client.certificate))"
Line 803: client_key: "((diego_bbs_client.private_key))"
Moving the declaration above Line 727 fixes the problem. I'm looking for alternative YAML parsers (written in Java) that can overlook the declaration order; if there are none, I hope the cf-deployment team can comment on this issue.
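A quick way to find every such occurrence is to scan the file for aliases that appear before their anchors. The sketch below is a line-based heuristic written for illustration, not a YAML parser: it ignores quoting, comments and block scalars, and the function name and sample text are hypothetical.

```python
import re

# An anchor (&name) or alias (*name) that starts a token, i.e. is preceded
# by whitespace or the start of the line.
ANCHOR = re.compile(r"(?<!\S)&([\w-]+)")
ALIAS = re.compile(r"(?<!\S)\*([\w-]+)")


def aliases_before_anchors(text: str):
    """Return (line_number, alias_name) for each alias used before its anchor."""
    defined = set()
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name in ALIAS.findall(line):
            if name not in defined:
                problems.append((number, name))
        # Anchors become visible to the lines that follow.
        defined.update(ANCHOR.findall(line))
    return problems


# Hypothetical excerpt reproducing the ordering described above.
SAMPLE = """\
properties:
  bbs: *diego_bbs_client_properties
other:
  bbs: &diego_bbs_client_properties
    ca_cert: "((diego_bbs_client.ca))"
"""

if __name__ == "__main__":
    for number, name in aliases_before_anchors(SAMPLE):
        print(f"line {number}: alias {name} used before its anchor")
```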

Doesn't use default cloud config

Should cf-deployment use the default cloud-config from cloudfoundry/bosh-deployment and then use ops files to switch what type of VM/Disk to use?

/cc @apoydence

Create Service Broker Error - certificate verify failed

Hi,
Deploying a service broker on cf-deployment, based on this Stark & Wayne blog, throws the following error message:

cf create-service-broker haash-broker warreng natedogg https://${broker_url} --space-scoped

Server error, status code: 500, error code: 10001, message: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed

I've successfully created the service broker on PCFDev, so I must be missing something while following the same procedure on cf-deployment.

`?` in ops files carry over to the right

/instance_groups/name=api/jobs/name=cloud_controller_ng/properties/doppler?/port? [1]

is the same as

/instance_groups/name=api/jobs/name=cloud_controller_ng/properties/doppler?/port

because once something is optionally present everything to the right of it by definition may or may not be present when it's being created.

[1] https://github.com/cloudfoundry/cf-deployment/blob/develop/opsfiles/change-logging-port-for-aws-elb.yml#L3
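The rule described above can be modelled in a few lines. This is only a sketch of the behaviour as stated in this issue, not the BOSH CLI's actual path parser, and `optional_flags` is a hypothetical name.

```python
def optional_flags(path: str):
    """Return (segment, is_optional) pairs for an ops-file path.

    Models the rule described above: once one segment carries a trailing
    '?', every segment to its right is treated as optional as well.
    """
    flags = []
    optional = False
    for segment in path.strip("/").split("/"):
        if segment.endswith("?"):
            optional = True
            segment = segment[:-1]
        flags.append((segment, optional))
    return flags


WITH_EXTRA = "/instance_groups/name=api/jobs/name=cloud_controller_ng/properties/doppler?/port?"
WITHOUT_EXTRA = "/instance_groups/name=api/jobs/name=cloud_controller_ng/properties/doppler?/port"

if __name__ == "__main__":
    print(optional_flags(WITH_EXTRA) == optional_flags(WITHOUT_EXTRA))
```

Under this model the two paths from the issue produce identical results, which is why the second `?` is redundant.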

cf login password for bosh-lite + cf-deployment

Hi,

The default cf login credentials for cf-release on bosh-lite (admin / admin) do not work with cf-deployment. I understand that there are no default credentials and that secrets are loaded from the deployment-vars.yml file.

Which key inside the deployment-vars.yml file holds the password for cf login -a api.bosh-lite.com --skip-ssl-validation command?

Regards,
Amitoj

Configuration for mysql should be multi-zone by default

So it is not a single-point-of-failure anymore.

See

cf-deployment/cf-deployment.yml

Line 175 in 171fede

- z1

`name:` missing after generating manifest

The aws stub as described in the documentation is missing meta.environment which leads to Deployment name not found in the deployment manifest when actually deploying the result. I don't know why this would work for people not using the cf-deployment toolchain, but probably this should be fixed in the documentation then?

cf-release need to be checked out with `git submodule update --init --recursive` when using an absolute filepath

A hint that your cf-release needs to have all recursive submodules would be appreciated. Maybe just referencing https://github.com/cloudfoundry/cf-release/blob/master/scripts/update as this is what your internal scripting calls when using integration-latest would be a good idea?

Bump releases for Windows operations file in CI

We want to keep getting mileage out of the Windows operations file and it would really help if the releases it includes (garden-windows and the hwc buildpack) were bumped automatically in CI. This would enable all of the consumers of cf-deployment that use windows cells (diego, loggregator, infrastructure, potentially capi and others) to deploy our new BOSH releases.

These releases are on bosh.io here:

garden-windows: http://bosh.io/releases/github.com/cloudfoundry-incubator/garden-windows-bosh-release?all=1
hwc-buildpack: http://bosh.io/releases/github.com/cloudfoundry-incubator/hwc-buildpack-release?all=1

metron_agent has an implicit dependency on consul_agent that isn't satisfied in all instance groups

Because of the properties defined here: https://github.com/cloudfoundry/cf-deployment/blob/master/cf-deployment.yml#L19-L26 that use etcd.service.cf.internal the metron_agent job ends up having a dependency on the consul_agent job.

We found that the following instance_groups don't satisfy that dependency:

Expose several CF properties as links?

We are producing a release which is meant to be deployed alongside CF. It's all BOSH-2.0-style work. Right now, to make this work, we need an Operator to copy and paste several values from the CF manifest to a variables file. To do this, we made a template with the following section:

########################
# Cloud Foundry config #
########################

cf_api_url: https://api.bosh-lite.com
cf_uaa_admin_client_secret: admin-secret
cf_admin_username: admin
cf_admin_password: admin
cf_app_domains: [bosh-lite.com]
cf_sys_domain: bosh-lite.com
cf_skip_ssl_validation: true
cf_nats:
  machines: [10.244.0.6]
  user: nats
  password: nats
  port: 4222

As service authors, I don't think we should expect our Operator to copy and paste these values from their cf-manifest.

So this request, then is to please expose these features as links. I can understand the hesitation around some of them, especially UAA's admin password, but until credhub can be the solution to some of these issues, this is the best a service author has. After all, if the Operator already has access to all manifests, allowing these links isn't a significant additional exposure.

certificate error on bosh-lite deployment

Hi,

I tried the following command with bosh v2, as mentioned in the README:
AH-MacBook-Pro:cf-deployment$ bosh -e lite update-cloud-config bosh-lite/cloud-config.yml

I get the following error message:

Updating cloud config:
Performing request POST 'https://lite:25555/cloud_configs':
Performing POST request:
Post https://lite:25555/cloud_configs: x509: certificate is valid for *.sslip.io, not lite

Exit code 1

Somewhere I read that I should be using www-192-168-50-4.sslip.io instead of lite (which is mapped to 192.168.50.4 in my /etc/hosts), but then I get

Updating cloud config:
Performing request POST 'https://www-192-168-50-4.sslip.io:25555/cloud_configs':
Performing POST request:
Post https://www-192-168-50-4.sslip.io:25555/cloud_configs: x509: certificate signed by unknown authority

Exit code 1

Help appreciated.

Best Regards

Links in Readme to cloud foundry docs are wrong

Hyperlinks are incorrect for the following:

vSphere (not currently supported by this tool)
vCloud (not currently supported by this tool)
OpenStack (not currently supported by this tool)

Clicking on the link leads to:

Page not Found
Visit the homepage.

unable to set cf api after updating cf v237 to v241

Hi,

After I migrated cf from version 237 to 241, I'm unable to set the API endpoint. I get Error performing request: Get https://api.training.cf.redacted.com/v2/info: EOF. This is the first time I'm posting an issue, so I have no idea what information to provide.
ubuntu@cf-ams-training:~$ cf -v
cf version 6.22.1+6b7af9c-2016-09-24

ubuntu@cf-ams-training:~$ uname -a
Linux cf-ams-training 3.13.0-108-generic #155-Ubuntu SMP Wed Jan 11 16:58:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@cf-ams-training:~$ bosh releases
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Acting as user 'trainee' on 'my-bosh'

+----------------------------+----------------+-------------+
| Name                       | Versions       | Commit Hash |
+----------------------------+----------------+-------------+
| cf                         | 237            | 87f11091+   |
|                            | 241*           | 638c22f9+   |
|                            | 244            | e2198e12+   |
| cflinuxfs2-rootfs          | 1.15.0*        | d4408672+   |
|                            | 1.33.0         | 15e18f58+   |
| diego                      | 0.1476.0*      | 23caa9d3    |
|                            | 0.1486.0       | e47f7e29    |
| etcd                       | 55             | 45730f57+   |
|                            | 59*            | c2bd33fc+   |
|                            | 70             | d97246d2+   |
| garden-linux               | 0.338.0*       | 38e53b4a    |
|                            | 0.342.0        | b03a9abc    |
| logsearch                  | 203.0.0*       | f85490fb+   |
| logsearch-for-cloudfoundry | 200.0.0+dev.1* | 170bbb31    |
| postgres                   | 1.0.3          | 71dfd61b+   |
+----------------------------+----------------+-------------+

ubuntu@cf-ams-training:~$ bosh deployments
RSA 1024 bit CA certificates are loaded due to old openssl compatibility
Acting as user 'trainee' on 'my-bosh'

+------------------------------+------------------------------------------+----------------------------------------------------+--------------+
| Name                         | Release(s)                               | Stemcell(s)                                        | Cloud Config |
+------------------------------+------------------------------------------+----------------------------------------------------+--------------+
| logsearch                    | logsearch-for-cloudfoundry/200.0.0+dev.1 | bosh-openstack-kvm-ubuntu-trusty-go_agent-raw/3215 | none         |
|                              | logsearch/203.0.0                        |                                                    |              |
+------------------------------+------------------------------------------+----------------------------------------------------+--------------+
| openstack-training-AMS       | cf/241                                   | bosh-openstack-kvm-ubuntu-trusty-go_agent-raw/3215 | none         |
|                              | etcd/59                                  |                                                    |              |
+------------------------------+------------------------------------------+----------------------------------------------------+--------------+
| openstack-training-AMS-diego | cf/241                                   | bosh-openstack-kvm-ubuntu-trusty-go_agent-raw/3215 | none         |
|                              | cflinuxfs2-rootfs/1.15.0                 |                                                    |              |
|                              | diego/0.1476.0                           |                                                    |              |
|                              | etcd/59                                  |                                                    |              |
|                              | garden-linux/0.338.0                     |                                                    |              |
+------------------------------+------------------------------------------+----------------------------------------------------+--------------+

consul cert empty on bosh lite

On master at ebcd1cb (and develop at SHA: 48eeb42). One instance of the consul job gives the following issues:

bosh deploy output (note, we killed it otherwise it would hang forever)

16:43:17 | Creating missing vms: log-api/b5bf17b2-01a0-4d7f-9649-df81d6a3e190 (0) (00:00:50)
16:43:17 | Creating missing vms: diego-cell/af10dfe4-0e16-4fb8-bcdc-77ea207fdf44 (0) (00:00:50)
16:43:17 | Updating instance consul: consul/4e847ae8-f6e8-4e37-b192-2a63fd374e61 (0) (canary)

consul error output:

/:~# tail -f /var/vcap/sys/log/consul_agent/consul_agent.stderr.log
2017/02/21 16:50:28 [ERR] agent.client: Failed to decode response header: EOF
error during start: timeout exceeded: "rpc error: failed to get conn: remote error: tls: bad certificate"
2017/02/21 16:51:20 [ERR] agent.client: Failed to decode response header: EOF
2017/02/21 16:51:20 [ERR] agent.client: Failed to decode response header: EOF

Looking at the directory:

/:~# cd /var/vcap/jobs/consul_agent/config/certs/
/:/var/vcap/jobs/consul_agent/config/certs# ls -al
total 28
drwxr-xr-x 2 vcap vcap 4096 Feb 21 16:47 .
drwxr-xr-x 3 vcap vcap 4096 Feb 21 16:51 ..
-rw-r----- 1 vcap vcap    1 Feb 21 16:47 agent.crt
-rw-r----- 1 vcap vcap    1 Feb 21 16:47 agent.key
-rw-r----- 1 vcap vcap 1144 Feb 21 16:47 ca.crt
-rw-r----- 1 vcap vcap 1189 Feb 21 16:47 server.crt
-rw-r----- 1 vcap vcap 1676 Feb 21 16:47 server.key

DNS for SSH Proxy

I looked through this repo's README, https://docs.cloudfoundry.org, docs and README for capi-release, and docs and README for bosh-bootloader, and couldn't find any instructions on how to set up DNS for the load balancer in front of the SSH Proxy instances.

Digging through CC code I found that it makes a hard-coded assumption that you have ssh.SYSTEM_DOMAIN DNS setup to point to your SSH Proxy LB (or the SSH Proxies directly if you're skipping the LB).

It would be nice if cf-deployment explicitly asked the user for system, app, and SSH domains as the only things the user needs to provide for a basic deployment. Basically, the user needs to be responsible for DNS and ingress, and cf-deployment can do the rest. So by the same token, cf-deployment ideally wouldn't assume how the user has chosen to set up that DNS.

What if:

capi-release made this a required property in the job spec
cf-deployment made it a required input from the user
docs.cloudfoundry.org explicitly explained the DNS requirements for setting up an environment, and
bosh-bootloader and cf-deployment referred user to those official docs

/cc @zrob @evanfarrar @wendorf @drich10

Long MariaDB compilation

I am observing that two packages together take about 1 hour to compile on c3.large machines:

11:54:47 | Compiling packages: mariadb/563c214c66c68a3558312fee44c22c30085a663a (00:25:17)
12:20:04 | Compiling packages: xtrabackup/44b8b474086ddbc45a7797c191449da8806ee9d1 (00:27:36)

It is too long - for other packages it takes only several minutes to compile.

What is the reason for such long timing and can we make it shorter?

Possible ideas:

Make compilation parallel inside single package
Start compilation of these two packages first thing in the deployment - now they are the last ones.
Run compilation of these two packages in parallel - currently they are compiling sequentially though I have 6 compilation vms.

Retrieving your CF admin password is not obvious

After you've set up your shiny new cf-deployment Cloud Foundry, you'll probably want to log in with the cf CLI. Unless you're familiar with the new BOSH CLI and the large variables section in cf-deployment, it is not obvious that you should run bosh interpolate --path /uaa_scim_users_admin_password env-repo/deployment-vars.yml to retrieve your CF admin password. An example in the README would be helpful; renaming uaa_scim_users_admin_password to cf_admin_password or similar might also be nice.
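For scripted logins, the same lookup can be wrapped in a small helper. This is a sketch only: it shells out to the exact command quoted above, assumes the bosh CLI is on PATH, and the function names are hypothetical.

```python
import shutil
import subprocess


def admin_password_command(vars_file: str):
    """Build the argv for the bosh interpolate call quoted above."""
    return ["bosh", "interpolate", "--path",
            "/uaa_scim_users_admin_password", vars_file]


def read_admin_password(vars_file: str) -> str:
    """Run the command and return its trimmed stdout."""
    if shutil.which("bosh") is None:
        raise RuntimeError("bosh CLI not found on PATH")
    result = subprocess.run(admin_password_command(vars_file),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Prints only the command line; call read_admin_password() to run it.
    print(" ".join(admin_password_command("env-repo/deployment-vars.yml")))
```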

Convert to uaa.jwt.policy.keys from uaa.jwt.signing_key

Per https://github.com/cloudfoundry/uaa-release/blob/develop/jobs/uaa/spec#L353-L356, the uaa.jwt.signing_key and uaa.jwt.verification_key properties are deprecated. Should these templates be updated to use the new uaa.jwt.policy.keys and uaa.jwt.policy.active_key_id properties?

cf-deployment ci appears to be inserting new lines every time releases change

Observe: https://github.com/cloudfoundry/cf-deployment/blame/develop/cf-deployment.yml#L1434-L1477

GCP networking issue

We are creating this issue as a place for conversation about GCP networking issues.

We are seeing requests to the cloud controller fail intermittently. The issue primarily shows up while running CATs. The cf cli does not have a timeout value set on its HTTP client, so it just hangs until the test fails. We can reproduce the issue outside of CATs from anywhere on the internet using the cf cli. Here is the test:

cf login # log into the cf deployment
export CF_TRACE=true
while true; do
    echo "--------------------------------------------------"
    date
    time cf create-org foo
    date
    time cf delete-org -f foo
done

Eventually the cf cli will hang. We then track down the vcap-request-id and when we look at the cloud controller's app and nginx logs we see that a 200 response is written out.

Here are the details of our environment that is showing the issue:

CF deployed on GCP using the manifest from this repo.
Static IP assigned to a local forwarding rule that has a target pool for the gorouter instances.

Stuff we've tried to help CATs pass:

We set the MTU for garden containers to 1460 to prevent packet fragmentation coming from the container running CATs. This turned out to be a red herring since running the cf cli from anywhere on the internet also exposes the issue.
We patched the cf cli to have a timeout and a new transport for each request. Since the cf cli has a retry loop this helps CATs still pass when we see these failures. This just masks the underlying issue but helps get us green.
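A timeout can also be imposed from outside the CLI while reproducing the hang. The sketch below is not the patch described above, which changed the cf cli's own HTTP client; it only wraps a child process with a deadline, and `run_with_timeout` is a hypothetical helper.

```python
import subprocess
import sys


def run_with_timeout(argv, seconds: float):
    """Run argv; return its exit code, or None if it exceeded the timeout."""
    try:
        # subprocess.run kills the child before raising TimeoutExpired.
        return subprocess.run(argv, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in for a hung CLI call: a child process that sleeps for 5 seconds.
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(hung, 0.5))
```

In the reproduction loop, a `None` result for `["cf", "create-org", "foo"]` would mark the iteration in which the request stalled, so the timestamp can be matched against the cloud controller logs. Like the patch, this masks the hang rather than explaining it.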

Steps we are taking to fix the problem:

We are verifying this problem exists against a newly deployed GCP environment.
If we see the issue crop up again we will continue to trace the failed requests. The next thing we need to look at is the gorouter logs, to try to find what (if anything) sits between the gorouter and the cf cli that might be causing these failures.

Loggregator team members who have context on this issue:

3 AZ requirement

Is there any specific reason why there is a requirement on 3 AZs?

cf-deployment/cf-deployment.yml

Line 38 in 1a599cc

- z3
cf-deployment/cf-deployment.yml

Line 108 in 1a599cc

- z3

I just bbl up'd an environment in us-west-1 and it only gave me 2 AZs, which caused this deployment to fail.

Inconsistent Job Names

We have noticed that the jobs in CF are either using underscores or hyphens. We should decide to use one or the other consistently.

For example, we have diego-bbs and cc_clock.

/cc @apoydence

Migration from cf-release

Is there a plan for people using cf-release to migrate to cf-deployment?

name change for "cf-internet-required" and "cf-internet-not-required"

We are thinking of adding these two vm_extensions to the cloud config that bbl generates; however, we would like to make them non-CF-specific. We'd like to propose renaming them to "internet-required" and "internet-not-required".

-Christian and @kkallday

Ops File for compiled releases

Would it be possible to have ops files that replace the releases section with compiled releases per IaaS (or at least for GCP)?

Currently multiple teams are waiting for the same releases to be compiled. Is this project a good place for such a file to live?

Multiple instances of mysql

Currently there is only one mysql instance. Is it possible to have 3 instances replicated with Galera?

Is uaa.login.client_secret unnecessary?

The job properties for the UAA in cf-deployment.yml includes a uaa.login.client_secret property, but that is no longer a property listed in the UAA spec, and looks to have been removed in cloudfoundry/uaa-release@a33a1f3

Should this be removed from the manifest, and replaced with a login client defined in uaa.clients? Is the login client no longer required in uaa.clients either?

[improvement] Better documentation for stub generation

Would be highly appreciated. So far there is a pretty generic link to http://docs.cloudfoundry.org/deploying/ in the README which doesn't help much. Here is what I'd like to see

a recommendation of which stubs to provide (instances, credentials, properties, <anything else?>)
- instances_minimal could probably be something like https://github.com/cloudfoundry/cf-release/blob/master/templates/cf-minimal-dev.yml
- credentials and properties could come from http://docs.cloudfoundry.org/deploying/ec2/cf-stub-aws.html
examples of the yaml properties to put in those stubs

default branch "develop" can be painful for consumers

As a consumer of cf-deployment who wants a functioning CF as a black box, I was confused to clone the repo and find the deploy broken because the default branch develop is not guaranteed to be stable.

The README does call out that master is the stable branch, but even knowing this it took me a long time to realize the repo I just cloned was on develop.

I understand the main driver for setting develop to be the default branch is to facilitate pull-requests, which generally should be made against develop not master. I know the frustration that can be caused by having to close PRs because they were made against the wrong branch.

I think that setting the default branch to master will provide a better experience for the consumers who want a CF without caring too much about contributing back, so the question becomes which use-case do we want to optimize for?

There are advantages and disadvantages to both choices. What do you think?

Questions about pipelines/cf-deployment.yml

Could domains like hermione.cf-app.com be parameterized, to make it easier to know that the string only needs to be changed in one place?
Why do some tasks come from https://github.com/cloudfoundry/runtime-ci and some from https://github.com/cloudfoundry/cf-deployment-concourse-tasks, especially things like uploading stemcell?

/cc @fushewokunze-pivotal

CATS errand missing

The cf-release has an acceptance-test errand that we currently use to make sure our team's changes don't break our test CF deployment. It'd be great if cf-deployment could also (optionally?) provide this errand. For now we're having to upload all of cf-release just to be able to add it back to our deployment.

Duplicated variables for loggregator doppler cert?

This commit introduced a new variable, loggregator_tls_doppler_cert, but the manifest already had a variable doppler_tls_server_cert.

I think these ought to be the same thing.

Loggregator's own scripts only generate 3 cert/key pairs, not 4. And when we do a deployment and set loggregator_tls_doppler_cert to be different from doppler_tls_server_cert then metron fails to start. But apparently when hermione is deployed using identical values for these two variables, then the deploy succeeds.

cc: @mcwumbly

Unnecessary duplication of vm_extensions in gcp.yml override

Currently there is duplication in the gcp.yml override file for vm_extensions (one example here

cf-deployment/opsfiles/gcp.yml

Line 20 in 08eb768

- 10GB_ephemeral_disk

). This could instead take the form of an append function. It would make the file smaller and reduce the chance for error if the list of shared extension names were to change.

-Derek

Running out of ephemeral disk on GCP

We're seeing a confusing error message on GCP when trying to update certain jobs:

Response exceeded maximum allowed length

After talking with the BOSH team, the likely culprit is our VM running out of ephemeral disk when trying to untar some stuff. Running df -h on our cc_clock VM shows a tiny 1GB ephemeral partition: /dev/sda3 1.1G 741M 235M 76% /var/vcap/data. On GCP, if you ask for a 5GB root disk, as bbl does for the 5GB_ephemeral_disk vm_extension, bosh will take about 3-4GB of the root disk for the agent and whatnot, and carve an ephemeral partition (/var/vcap/data) out of whatever remains.

Can y'all bump the 5GB_ephemeral_disk root disk size to avoid this issue?
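The arithmetic behind that estimate can be sketched as follows. The 3 GB and 4 GB system shares are the rough range quoted above, not measured values, and `ephemeral_gb` is a hypothetical helper.

```python
def ephemeral_gb(root_disk_gb: float, system_gb: float) -> float:
    """Space left for /var/vcap/data after the system share of the root disk."""
    return max(root_disk_gb - system_gb, 0.0)


if __name__ == "__main__":
    for root in (5.0, 10.0):
        for system in (3.0, 4.0):
            left = ephemeral_gb(root, system)
            print(f"root {root:g} GB, system {system:g} GB: {left:g} GB ephemeral")
```

With a 5 GB root disk only 1 to 2 GB remains, which is in line with the 1.1G partition reported by df -h.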

Update routers in serial

Hi,

Issue: Currently cf-deployment rolls the routers in parallel, which causes downtime for backends.

Possible fix: Update the routers in serial and roll this VM after UAA (to fetch OAuth tokens for the routing API).

Routing team has CI coverage for zero downtime tests and these are intermittently failing since we moved to cf-deployment and it would be great if your team can roll out the fix.

Related PR : #87

Regards
Shash

Anchors, instead of links, make IPs difficult to override.

Hi there!

I'm trying to create an override file which will allow for a minimal, rather than HA, deployment.

I'm including lines like these to slim down the static IPs specified:

- type: remove
  path: /instance_groups/name=consul/networks/name=private/static_ips/2
- type: remove
  path: /instance_groups/name=consul/networks/name=private/static_ips/1
- type: remove
  path: /instance_groups/name=nats/networks/name=private/static_ips/1

... unfortunately, the sets of two and three IPs continue to show up all throughout the manifest. That doesn't seem to affect deployment, but it's messy. Here's an example:

          servers:
            lan:
            - 10.0.31.190
            - 10.0.47.190
            - 10.0.63.190

A certain @cppforlife suggested that it's because you're using YAML anchors, instead of BOSH links.

blobstore job should not run consul_agent in server mode

This looks like a mistake: https://github.com/cloudfoundry/cf-deployment/blob/master/cf-deployment.yml#L638-L674

Add smoke-test errand to cf-deployment

Once cf is deployed, we'd like to be confident that cf is running correctly. Can we add the smoke-test bosh errand into the cf-deployment manifest?

When using a path like '/Users/pivotal/workspace/cf-release' in the config.json, bosh complains about missing 'file://' in the manifest

I tried to follow the documentation for using tools/prepare_deployments and when my config file contained an absolute path to cf-release (like in the README example), this resulted in a manifest entry like url: /Users/pivotal/workspace/cf-release. This doesn't play well with bosh, which expects a schema, in this case file:// would be appropriate.

I couldn't find the right place to patch this in your shellscripts, sorry.
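One possible shape for the fix is to normalise the configured location before it is written to the manifest. The sketch below is in Python for illustration only (the tooling in question is shell), and `release_url` is a hypothetical helper, not something in tools/prepare_deployments.

```python
from urllib.parse import quote


def release_url(location: str) -> str:
    """Return location unchanged if it already has a scheme, else a file:// URL.

    Only absolute POSIX paths are accepted, because a file:// URL built
    from a relative path would be ambiguous.
    """
    if "://" in location:
        return location
    if not location.startswith("/"):
        raise ValueError(f"expected an absolute path, got {location!r}")
    # quote() keeps '/' and escapes characters such as spaces.
    return "file://" + quote(location)


if __name__ == "__main__":
    print(release_url("/Users/pivotal/workspace/cf-release"))
```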

It also deploys diego?

No qualifying bean of type [org.springframework.mail.javamail.JavaMailSender] while deploying service into PCF

Hi, I am trying to deploy a service which uses Spring Mail to send email. The service works in my local IDE and I am trying to deploy it to PCF. I did a Gradle build and have a manifest.yml file on the classpath. When I push it, the app fails to start with the following exception. Am I missing something, is this PCF behaviour, or will the email service not work in the cloud?

I am using spring boot and I have all the required properties which helps to create a bean for mail in my application.properties file.

Caused by: org.springframework.beans.factory.BeanCreationException: Could not autowire field: private org.springframework.mail.javamail.JavaMailSender com.send.SendController.mailSender; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [org.springframework.mail.javamail.JavaMailSender] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}

sha1 for releases referenced by URL missing in generated manifest

I'm getting Expected SHA1 when specifying remote URL for release 'etcd' - what might I be doing wrong?

Here is what my config.json looks like:

{
  "cf": "integration-latest",
  "etcd": "integration-latest",
  "stemcell": "integration-latest",
  "stubs": ["<redacted>"]
}

and the content of blessed_versions.json actually has those sha1 values:

{
  "releases": [
    {
      "name": "cf",
      "commit": "2eb1e78ae64b4454ff8cd392c79bef25574ffb4c"
    },
    {
      "name": "etcd",
      "version": "18",
      "sha1": "222e26f1f38a23f4355ec2517683fdc7c70704aa",
      "url": "https://bosh.io/d/github.com/cloudfoundry-incubator/etcd-release?v=18"
    },
    {
      "name": "consul",
      "version": "6",
      "sha1": "b9774d0f38235336c2ffb07f762de8177ab9a172",
      "url": "https://bosh.io/d/github.com/cloudfoundry-incubator/consul-release?v=6"
    }
  ],
  "stemcells": {
    "aws": {
      "type": "bosh-aws-xen-hvm-ubuntu-trusty-go_agent",
      "version": "3126",
      "url": "https://bosh.io/d/stemcells/bosh-aws-xen-hvm-ubuntu-trusty-go_agent?v=3126",
      "sha1": "c57c5294a33331d75747bf7593ec8fb822fdd497"
    }
  }
}

After manually adding the sha1: 222e26f1f38a23f4355ec2517683fdc7c70704aa for etcd release 18 to the releases section of the manifest, the deployment got past that check.

This probably happens because ./tools/prepare_deployments doesn't have the sha1: field?
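To illustrate what carrying the field through could look like, here is a minimal sketch. It is not the logic of ./tools/prepare_deployments; `manifest_release` is a hypothetical helper and the two entries are copied from the blessed_versions.json shown above.

```python
def manifest_release(entry: dict) -> dict:
    """Copy name, version, url and sha1 from a blessed release entry.

    Raises ValueError when a url is present without a sha1, which is the
    combination the "Expected SHA1" error above complains about.
    """
    release = {"name": entry["name"]}
    for key in ("version", "url", "sha1"):
        if key in entry:
            release[key] = entry[key]
    if "url" in release and "sha1" not in release:
        raise ValueError(f"release {entry['name']!r} has a url but no sha1")
    return release


ETCD = {
    "name": "etcd",
    "version": "18",
    "sha1": "222e26f1f38a23f4355ec2517683fdc7c70704aa",
    "url": "https://bosh.io/d/github.com/cloudfoundry-incubator/etcd-release?v=18",
}
CF = {"name": "cf", "commit": "2eb1e78ae64b4454ff8cd392c79bef25574ffb4c"}

if __name__ == "__main__":
    print(manifest_release(ETCD))
    print(manifest_release(CF))
```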

AZ Error on Deploy

Hi,

I'm getting the following error when attempting to deploy using the bosh-deploy-with-created-release CI task from the cf-deployment-concourse-tasks repo and a dev release of consul:

Task 23
01:14:10 | Preparing deployment: Preparing deployment (00:00:00)
            L Error: Instance group 'consul' must specify availability zone that matches availability zones of network 'private'

01:14:10 | Error: Instance group 'consul' must specify availability zone that matches availability zones of network 'private'

Started  Thu Mar 23 01:07:19 UTC 2017
Finished Thu Mar 23 01:14:10 UTC 2017
Duration 00:06:51

Task 23 error

It looks like it might be caused by the extra z3 availability zones in the consul and etcd instance groups in the cf-deployment.yml manifest. Are they supposed to be there?
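One plausible reading of the error is that every AZ listed on an instance group must also be offered by the network it attaches to. The sketch below checks that condition. It is an assumption about the rule, not BOSH's implementation, and the network AZ list is a hypothetical two-zone cloud config.

```python
def unmatched_azs(instance_group_azs, network_azs):
    """Return the instance-group AZs that the network does not offer, in order."""
    offered = set(network_azs)
    return [az for az in instance_group_azs if az not in offered]


# Hypothetical values: the manifest asks for three zones, while the
# cloud config's 'private' network only covers two.
CONSUL_AZS = ["z1", "z2", "z3"]
PRIVATE_NETWORK_AZS = ["z1", "z2"]

if __name__ == "__main__":
    missing = unmatched_azs(CONSUL_AZS, PRIVATE_NETWORK_AZS)
    if missing:
        print("not offered by network 'private':", ", ".join(missing))
```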

Unclear documentation

In your README, under the Setup and Prerequisites -> Bosh Cloud Config section, it mentions that there is IaaS-specific advice on how to set up a cf-deployment-compatible cloud config on different IaaSes. However, no such documentation appears to exist under Setup and Prerequisites.

Create self-signed certificate for load balancer

I was not able to create self-signed certificates as explained in the documentation.
I first had to create a CA and then create a certificate. Only then was I able to execute bbl create-lbs.

Best regards

cc_uploader on cc_bridge failed to start due to port binding issue

We were deploying CF on bosh-lite following the instructions on README and the deployment failed with this error:

10:31:26 | Updating instance diego-cell: diego-cell/14c93c79-ecd8-4bc1-b6bf-db894ea00207 (0) (canary) (00:03:15)
10:48:57 | Updating instance cc-bridge: cc-bridge/5be970c1-0d63-4d60-bb8c-b62b5ce4726e (0) (canary) (00:20:46)
            L Error: 'cc-bridge/0 (5be970c1-0d63-4d60-bb8c-b62b5ce4726e)' is not running after update. Review logs for failed jobs: cc_uploader

10:48:57 | Error: 'cc-bridge/0 (5be970c1-0d63-4d60-bb8c-b62b5ce4726e)' is not running after update. Review logs for failed jobs: cc_uploader

We ssh'ed into the VM and found this repeated several times in /var/vcap/sys/log/cc_uploader/cc_uploader.stdout.log

{"timestamp":"1497265361.568850994","source":"cc-uploader","message":"cc-uploader.ready","log_level":1,"data":{}}
{"timestamp":"1497265361.568930149","source":"cc-uploader","message":"cc-uploader.exited-with-failure","log_level":2,"data":{"error":"Exit trace for group:\ncc-uploader exited with error: listen tcp 0.0.0.0:9090: bind: address already in use\ndebug-server exited with nil\n"}}

And metron process was bound on ::::9090

We ran the deployment again and it succeeded. We were curious why it had failed, so we investigated further and found that the following config files might be responsible for the failure:

/var/vcap/jobs/metron_agent/config/metron_agent.json
/var/vcap/jobs/cc_uploader/config/cc_uploader_config.json

The cc_uploader specifies port 9090 as its listen address and metron defines port 9090 as its health endpoint port.

It looks like there is a race condition as to which process grabs the port first.

Here are the config files:

metron_agent.json

{
  "Index": "5be970c1-0d63-4d60-bb8c-b62b5ce4726e",
  "Job": "cc-bridge",
  "Zone": "z1",
  "Deployment": "bosh-lite.com",
  "IP": "10.244.0.140",
  "Tags": {
    "deployment": "bosh-lite.com",
    "job": "cc-bridge",
    "index": "5be970c1-0d63-4d60-bb8c-b62b5ce4726e",
    "ip": "10.244.0.140"
  },
  "IncomingUDPPort": 3457,
  "DisableUDP": false,
  "PPROFPort": 0,
  "HealthEndpointPort": 9090,
  "GRPC": {
    "Port": 3458,
    "KeyFile": "/var/vcap/jobs/metron_agent/config/certs/metron_agent.key",
    "CertFile": "/var/vcap/jobs/metron_agent/config/certs/metron_agent.crt",
    "CAFile": "/var/vcap/jobs/metron_agent/config/certs/loggregator_ca.crt"
  },
  "DopplerAddr": "doppler.service.cf.internal:8082",
  "DopplerAddrUDP": "doppler.service.cf.internal:3457"
}

cc_uploader_config.json

{
    "cc_ca_cert": "/var/vcap/jobs/cc_uploader/config/certs/cc/ca.crt",
    "cc_client_cert": "/var/vcap/jobs/cc_uploader/config/certs/cc/client.crt",
    "cc_client_key": "/var/vcap/jobs/cc_uploader/config/certs/cc/client.key",
    "consul_cluster": "http://127.0.0.1:8500",
    "debug_server_config": {
        "debug_address": "127.0.0.1:17018"
    },
    "dropsonde_port": 3457,
    "lager_config": {
        "log_level": "info"
    },
    "listen_addr": "0.0.0.0:9090",
    "log_level": "info",
    "mutual_tls": {
        "ca_cert": "/var/vcap/jobs/cc_uploader/config/certs/cc_uploader/ca.crt",
        "listen_addr": "0.0.0.0:9091",
        "server_cert": "/var/vcap/jobs/cc_uploader/config/certs/cc_uploader/server.crt",
        "server_key": "/var/vcap/jobs/cc_uploader/config/certs/cc_uploader/server.key"
    }
}
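The overlap can be checked mechanically. The sketch below reproduces only the listener-related keys from the two files above and assumes that metron's HealthEndpointPort and GRPC port are TCP listeners; `port_of` and `shared_tcp_ports` are hypothetical helpers.

```python
def port_of(address: str) -> int:
    """Extract the port from a 'host:port' listen address."""
    return int(address.rsplit(":", 1)[1])


def shared_tcp_ports(metron: dict, uploader: dict):
    """Return the TCP ports that both configs ask to listen on, sorted.

    Bind addresses are ignored, so a loopback-only listener is treated
    the same as one bound to 0.0.0.0.
    """
    metron_ports = {metron["HealthEndpointPort"], metron["GRPC"]["Port"]}
    uploader_ports = {
        port_of(uploader["listen_addr"]),
        port_of(uploader["mutual_tls"]["listen_addr"]),
        port_of(uploader["debug_server_config"]["debug_address"]),
    }
    return sorted(metron_ports & uploader_ports)


# Trimmed copies of metron_agent.json and cc_uploader_config.json.
METRON = {"HealthEndpointPort": 9090, "GRPC": {"Port": 3458}}
UPLOADER = {
    "listen_addr": "0.0.0.0:9090",
    "mutual_tls": {"listen_addr": "0.0.0.0:9091"},
    "debug_server_config": {"debug_address": "127.0.0.1:17018"},
}

if __name__ == "__main__":
    print(shared_tcp_ports(METRON, UPLOADER))
```

To run it against the real files on the VM, load each one with `json.load` and pass the resulting dicts to `shared_tcp_ports`.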