aegershman / upgrade-tiles-proof-of-concept

Out of date. Proof-of-concept to demonstrate using selective-deploys on tiles/stemcells in PCF

License: MIT License
In order to keep operators clued into the inner workings of the upgrade-tile tasks, we should include more echo/output statements to display what's happening.
Making the output helpful and informative avoids the need for 'debug' mode (set -x), which would display everything, including credentials, in the output window.
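One way to keep task output informative without `set -x` is a small logging helper plus a redaction filter. A minimal sketch; `log` and `redact` are hypothetical helper names, and the command shown is illustrative:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helpers: `log` prints a progress marker; `redact` masks
# anything that looks like a secret flag value before it hits the logs.
log() { echo "==> $*"; }

redact() {
  sed -E 's/(--client-secret[= ])[^ ]+/\1***/g'
}

log "Staging product tile"
# Show the command being run without leaking the secret:
echo "om --target https://opsman.example.com --client-secret s3cr3t stage-product" | redact
```

Piping every echoed command through a filter like this gives operators the play-by-play without ever printing credentials.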
In order to ensure operational consistency and safety, the pipeline should fail when a product is uploaded with an incorrect stemcell version listed in its product.yml.
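A minimal sketch of that guard, assuming the expected version has already been read from the tile's metadata (e.g. its stemcell criteria) upstream; `check_stemcell` is a hypothetical helper:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical guard: fail the pipeline when the stemcell we are about to
# use does not match the version declared in the product's metadata.
check_stemcell() {
  local expected="$1" actual="$2"
  if [ "$expected" != "$actual" ]; then
    echo "ERROR: product.yml expects stemcell ${expected}, got ${actual}" >&2
    return 1
  fi
  echo "stemcell ${actual} matches product.yml"
}

check_stemcell "3445.11" "3445.11"
```

Because the helper returns non-zero on a mismatch, a `set -e` task script fails immediately, which is exactly the behavior the issue asks for.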
Once it runs "apply changes", the logs don't properly stream
https://opsman-dev-api-docs.cfapps.io/#getting-a-list-of-recent-install-events
If another tile is in the process of being installed, the pipeline needs to wait. But it shouldn't wait whenever there's a staged change; it should only wait while the Opsman is actively "installing".
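A sketch of that wait logic, keying only off a running install rather than any staged change. The JSON shape is assumed from the install-events docs above, and `installing` is a hypothetical helper; a real task would fetch the body with `om curl --path /api/v0/installations` inside a sleep loop:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical check: succeed if the installations listing shows any
# install still in progress. Crude grep; jq would be more robust.
installing() {
  echo "$1" | grep -q '"status": *"running"'
}

# Example responses (shapes assumed, not taken from a live Opsman):
busy='{"installations": [{"id": 1, "status": "running"}]}'
idle='{"installations": [{"id": 1, "status": "succeeded"}]}'

if installing "$busy"; then echo "waiting: an install is in progress"; fi
if ! installing "$idle"; then echo "Opsman is clear; safe to proceed"; fi
```

Staged-but-unapplied changes never show up as a "running" installation, so this check waits only in the case the issue cares about.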
The logic in stage-tile is weirdly complicated. It's especially frustrating because you cannot have multiple versions of a tile uploaded; you must delete unused products to allow stage-tile to work properly. Perhaps there's a better way to update a tile.
Perhaps #7 could be applied in this case?
How can the pipeline know which major stemcell version to go retrieve? Should each individual tile pipeline be in charge of downloading/uploading the latest stemcell? Should some other pipeline automatically listen for available stemcells & upload them automatically, then the tile-pipelines be responsible for applying them? Should I just load the opsman with a pivnet API token & poll the "check for available stemcells" API endpoint?
Inspiration: https://github.com/pivotal-cf/pcf-pipelines/blob/master/tasks/upload-product-and-stemcell/task.sh
https://opsman-dev-api-docs.cfapps.io/#upgrading-a-product
What does this API endpoint do? Does it do a selective deploy? Could it be used in substitution for downloading product/uploading/staging?
In order to demonstrate the other techniques of upgrading tiles (single-tile single-pipeline; multi-tile single-pipeline, etc.), we should include a little documentation on other ways we've done tile upgrades.
If I try to associate a stemcell to a product, but that stemcell version is already applied to the product, will this still register as a "staged change"? If I'm going to be making API calls to associate a stemcell to a product every time I call "apply-selective-deploy", I need to make sure this doesn't cause unnecessary problems.
I'm hoping the Opsman response is something like "stemcell version already applied".
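One way to sidestep the question entirely is to check the current assignment first and skip the association call when it already matches. A sketch; the response shape of a stemcell-assignments listing is assumed, `already_assigned` is a hypothetical helper, and the grep/tr approach assumes each assignment object carries its guid and version in the same `{`-delimited chunk (jq would be cleaner):

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical check: succeed when the product already has the desired
# stemcell version associated, so the assignment call can be skipped.
already_assigned() {
  local json="$1" guid="$2" version="$3"
  echo "$json" | tr '{' '\n' | grep "\"guid\": *\"${guid}\"" | grep -q "\"version\": *\"${version}\""
}

sample='[{"guid": "p-healthwatch-abc", "version": "3445.11"}]'
if already_assigned "$sample" "p-healthwatch-abc" "3445.11"; then
  echo "stemcell already associated; skipping the API call"
fi
```

With a guard like this, apply-selective-deploy never issues a redundant association, so whatever the Opsman's answer is, it stops mattering.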
In order to complete automation and finish experimenting with tiles within our sandbox foundation, we should implement the remaining product tiles in sandbox.
Unfortunately this current model only upgrades the tiles in one foundation. But I would like to have a tile version "flow" through to other foundations; e.g., tile_a-0.2 must successfully pass through 'sandbox' before it can be applied to 'dev'.
In order to control planned downtimes, we should regulate when apply-changes is automatically run on a tile.
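A sketch of such a gate as a task-level check; the 22:00-04:59 window is an assumed example, and `within_window` is a hypothetical helper:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical gate: only allow apply-changes inside a maintenance window.
# Hours are 00-23; the window below (22:00 through 04:59) is an example.
within_window() {
  local h=$((10#$1))   # 10# guards against "08"/"09" being parsed as octal
  [ "$h" -ge 22 ] || [ "$h" -lt 5 ]
}

if within_window "$(date +%H)"; then
  echo "inside maintenance window; apply-changes may run"
else
  echo "outside maintenance window; skipping apply-changes"
fi
```

In Concourse itself, a `time` resource with a `start`/`stop` window is the more idiomatic way to gate a job; the shell check above is the same idea expressed inside a task.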
The pcf-pipelines task upload-product-and-stemcell will upload the stemcell for a product... but I don't need the stemcell to be uploaded in this task anymore. I only want the product. Therefore, we should remove the "-and-stemcell" part from the "upload-product-and-stemcell" task.
In order to prevent multiple "apply changes" jobs from running at once, we should use some kind of mutex on the opsman. In the pcf-pipelines project, they use a task called "wait-opsman-clear". Perhaps we can use something else? Perhaps instead of doing a poll-wait on the opsman, we can use the "pool resource"? https://github.com/concourse/pool-resource
Would supersede #10
E.g., Slack notification, email notification, etc. Something to indicate that things have gone wrong.
I'm not sure if "3445..*" is the best pattern to use. We should investigate the most appropriate regex pattern.
In order to maintain documentation for tile upgrades, we should have the ability to create/close GitHub issues on the upgrade-tiles repo.
We should separate 'upload-product' and 'stage-product' in order to prevent unnecessary double-uploading. If I want to re-stage a product, why should I have to re-upload it? And if I want to upload a product, why should I have to stage it? Consider separating these.
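A sketch of the split, with a guard that checks the Opsman's available-products listing before re-uploading; `needs_upload` is a hypothetical helper, and the om invocations are shown as comments rather than executed:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical guard: skip the upload when the product/version is already
# present in the Opsman's available-products listing.
needs_upload() {
  local listing="$1" name="$2" version="$3"
  ! echo "$listing" | grep -q "${name}.*${version}"
}

listing="p-healthwatch   1.1.7-build.11"
if needs_upload "$listing" "p-healthwatch" "1.1.7-build.11"; then
  echo "uploading"   # om upload-product --product ./product.pivotal
else
  echo "already uploaded; staging only"
fi
# Staging is then its own step:
#   om stage-product --product-name p-healthwatch --product-version 1.1.7-build.11
```

With upload and stage as independent steps, re-staging never forces a re-upload, and uploading never forces a stage.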
There should be a separate pipeline for janitorial tasks in the Opsman, such as-- etc.
In order to keep timely backups of the Opsman, S3 backups should be performed after apply-changes is run.
https://opsman-dev-api-docs.cfapps.io/#triggering-an-install-process
What happens if I call the POST /api/v0/installations endpoint like this?

```shell
./om-darwin \
  --target "$TARGET" \
  --client-id "$USERNAME" \
  --client-secret "$SECRET" \
  --skip-ssl-validation \
  curl \
  --request POST \
  --path /api/v0/installations \
  --data '{
    "deploy_products": "all"
  }'
```
Here's the deal:
In the UI, if you hit "apply changes", it will run smoke tests / errands / etc. for all tiles. We don't want this. We want selective deploys.
When using the API endpoint, presumably, if you don't pass in any --data '...', it is the equivalent of hitting "apply changes" in the UI.
When using the API endpoint, you can pass in a list of product GUIDs to trigger selective deploys only on those products. That works just fine.
But what if we use this API endpoint and explicitly pass in --data '{"deploy_products": "all"}'? I know 'all' is the default value, but is the behavior different? Can we get it to perform selective deploys only on the staged changes? The benefit would be that we don't have to explicitly pass in an array of product GUIDs.
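However the "all" question shakes out, building the explicit-GUID payload for the selective-deploy case is mechanical. A sketch with a hypothetical `deploy_payload` helper (the GUIDs shown are made up):

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: build the --data body for POST /api/v0/installations
# from a list of product GUIDs (the selective-deploy variant).
deploy_payload() {
  local guids="" g
  for g in "$@"; do
    guids="${guids:+${guids}, }\"${g}\""
  done
  printf '{"deploy_products": [%s]}' "$guids"
}

deploy_payload "p-healthwatch-abc123" "cf-def456"
# Would then be passed along as, e.g.:
#   om curl --request POST --path /api/v0/installations --data "$(deploy_payload "$GUID")"
```

Generating the body from a list keeps the pipeline task free of hand-written JSON, whichever answer the endpoint gives for "all".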
Maybe my stage-deploy task simplification was naive, because it falls apart on products that use a *-build.11 semver scheme.
Healthwatch 1.1.7:

```
Archive:  ./pivnet-product/p-healthwatch-1.1.7-build.11.pivotal
   creating: metadata/
 extracting: metadata/p_healthwatch.yml
could not execute "stage-product": failed to stage product: cannot find product p-healthwatch 1.1.7
```

It wants to be configured with the "full" version, e.g., 1.1.7-build.11.
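One fix is to pull the full product_version, build suffix included, out of the extracted metadata and feed that to stage-product. A sketch with a hypothetical `full_version` helper (grep/sed; a YAML parser would be sturdier):

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: extract product_version from a tile's metadata yml,
# keeping any -build.N suffix so stage-product gets the exact version.
full_version() {
  echo "$1" | grep -E '^product_version:' | sed -E 's/^product_version: *"?([^"]+)"?.*/\1/'
}

metadata='name: p-healthwatch
product_version: "1.1.7-build.11"'

full_version "$metadata"
# Then, e.g.:
#   om stage-product --product-name p-healthwatch --product-version "$(full_version "$metadata")"
```

Staging with the exact metadata version avoids the "cannot find product p-healthwatch 1.1.7" failure above.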
For example, instead of downloading a tile and re-uploading it, would it be easier to issue API calls to the Opsman and tell it to go download a product? Would it be easier to outfit the Opsman with a pivnet API token and coordinate all changes as "commands" to the Opsman?
Is there a way to add a pivnet token to the Opsman via API call? Flow: get a pivnet token, put it in credhub, then have Concourse upload the pivnet token to the Opsman. Then Concourse handles upgrades by issuing commands via API, rather than having to supply the actual pivnet product/stemcell/etc.
Incoherent thought, trying to get this out: could the content of the lock (metadata from the lock) be passed to stage-and-apply?
In order to handle "off-release" stemcells (stemcells which come out in between tile version releases), this pipeline should include a resource to apply the most up-to-date stemcell version.
In order to extract the product name (and get the corresponding product GUID from the Opsman), we're downloading the entire product from pivnet and looking at the metadata. Do we have to do this, considering all we need is the product name?
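An alternative: keep the product name in pipeline config and resolve the GUID from the Opsman's deployed-products listing instead of unpacking the .pivotal. A sketch; the response shape of GET /api/v0/deployed/products is assumed, `guid_for` is a hypothetical helper, and the tr/grep/sed extraction is crude (jq would be cleaner):

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical lookup: map a product type/name to its GUID using the JSON
# from `om curl --path /api/v0/deployed/products` (response shape assumed).
guid_for() {
  local json="$1" name="$2"
  echo "$json" | tr '{' '\n' | grep "\"type\": *\"${name}\"" | \
    sed -E 's/.*"guid": *"([^"]+)".*/\1/'
}

sample='[{"type": "cf", "guid": "cf-123"}, {"type": "p-healthwatch", "guid": "p-healthwatch-456"}]'
guid_for "$sample" "p-healthwatch"
```

With this, the pipeline only needs the product name it already knows from its own config; the multi-gigabyte pivnet download can be skipped entirely when only the GUID is needed.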