jquery / infrastructure-puppet
Puppet configuration for jQuery Infrastructure servers.
License: MIT License
The jQuery blog at https://blog.jquery.com/ doesn't support some non-ASCII characters, like the ones in my name: ł, ę. If you enter them anywhere, they get converted to question marks.
The issue looks related to the database, as it's observed regardless of whether you enter such a letter when authoring a blog post or in a plain text field when setting First/Last Name on the profile settings page.
The same issue doesn't exist in the UI (https://blog.jqueryui.com/) & Mobile (https://blog.jquerymobile.com/) blogs despite them being on the same WordPress version (5.8.1).
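This is only an assumption and is not confirmed here, but one plausible mechanism is that this blog's tables or its MySQL connection use a single-byte charset such as latin1 instead of utf8mb4. In that case, characters the charset cannot represent are replaced with "?" on the way in. The Python sketch below models this with the standard codecs. Note that MySQL's latin1 is really cp1252, which lacks ł and ę as well. Comparing the column and connection charsets of this blog against the UI and Mobile blogs would confirm or rule this out.

```python
def store_roundtrip(text: str, charset: str) -> str:
    """Encode text the way a column/connection using `charset` would, then read it back.

    errors="replace" mirrors the observed symptom: characters the charset
    cannot represent come back as "?".
    """
    return text.encode(charset, errors="replace").decode(charset)


def unrepresentable(text: str, charset: str) -> list[str]:
    """Return the characters in `text` that `charset` cannot encode."""
    lost = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


print(store_roundtrip("ł ę ó", "latin-1"))  # ? ? ó
print(store_roundtrip("ł ę ó", "utf-8"))    # ł ę ó
print(unrepresentable("ł ę ó", "latin-1"))  # ['ł', 'ę']
```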
In the team meeting on 31 March 2023, we looked back on the swarm-02.ops server refresh (https://github.com/jquery/infrastructure/issues/444). The main thing that stood out is the lack of backup, which, while intentional, is also a bit of a weakness.
The most prominent issue was the runtoken used by testswarm-browserstack clients, which we've since fixed by adding the capability in TestSwarm for the runtoken to be provisioned through a configuration file (jquery/testswarm@a7e7d6a, c089253).
The remaining issues are:
- *.jquery.com certificate.
- /jQuery-foo.js is able to match /jquery-foo.js.
- -4, -6, --http1.1, --http2, --tls-max 1.2, --tls-max 1.3, http+https URLs (except http2 over HTTP) and confirm HTTP 200 OK (esp. no redirect). Use --connect-to ::SOMETHING.global.fastly.net to test prior to deploying any DNS changes.
Three services overall: code, content, releases.
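The curl part of the checklist can be generated rather than typed by hand. The sketch below assumes curl is on PATH and that the edge hostname gets filled in; SOMETHING.global.fastly.net is the placeholder from the checklist. The example path is the QUnit file mentioned further down. Each command prints only the status code. curl does not follow redirects without -L, so anything other than a column of 200s needs a look.

```python
import itertools
import shlex

# Placeholder copied from the checklist; substitute the real Fastly hostname.
EDGE = "SOMETHING.global.fastly.net"


def curl_matrix(host: str, path: str, edge: str = EDGE) -> list[list[str]]:
    """Build one curl invocation per combination listed in the checklist."""
    commands = []
    for ip, scheme, http in itertools.product(
        ("-4", "-6"), ("http", "https"), ("--http1.1", "--http2")
    ):
        if scheme == "http" and http == "--http2":
            continue  # excluded in the checklist: no HTTP/2 over plain HTTP
        # TLS version caps only apply to https URLs.
        tls_variants = (
            (["--tls-max", "1.2"], ["--tls-max", "1.3"]) if scheme == "https" else ([],)
        )
        for tls in tls_variants:
            commands.append([
                "curl", ip, http, *tls,
                "--silent", "--output", "/dev/null",
                "--write-out", "%{http_code}\n",  # print only the status code
                "--connect-to", f"::{edge}",      # hit the edge before any DNS change
                f"{scheme}://{host}{path}",
            ])
    return commands


if __name__ == "__main__":
    for cmd in curl_matrix("code.jquery.com", "/qunit/qunit-2.20.0.js"):
        print(shlex.join(cmd))
```

The same matrix would then be repeated for content and releases, with a path known to exist on each.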
- codeorigin.jquery.com for functional testing.
- code.jquery.com.
Examples of past issues:
- OpenSSL SSL_read: Connection reset by peer. jquery/codeorigin.jquery.com#82
Main differences:
Debian 11 Bullseye hosts today:
- wp-02.stage.ops.jquery.net
- builder-04.stage.ops.jquery.net
- puppet-03.ops.jquery.net
- search-02.ops.jquery.net (#36)
- codeorigin-02.stage.ops.jquery.net
- codeorigin-02.ops.jquery.net
- wpblogs-01.ops.jquery.net
- gruntjs-02.stage.ops.jquery.net
- gruntjs-02.ops.jquery.net
- miscweb-01.ops.jquery.net
- contentorigin-02.ops.jquery.net
- swarm-02.ops.jquery.net
The following went straight from legacy Debian 7 to Debian 12 Bookworm, via #8, and were never on Debian Bullseye:
- wp-*.ops
- builder-*.ops
- filestash-*.ops
Similar to old hosts, we didn't copy this part over:
d736898?diff=unified#diff-2ff59c9c8afebb232a1b58b7d566ce6f59017f0ae67bbdfc7baa7baaff3c22df
Given that we rewrite robots.txt to WordPress, and WordPress has (afaik) a setting for this, another approach might be to set the appropriate WP option at runtime on staging sites.
@mikewest wrote at jquery/codeorigin.jquery.com#57:
https://docs.google.com/document/d/1zDlfvfTJ_9e8Jdc8ehuV4zMEu9ySMCiTGMS9y0GU92k/edit#bookmark=id.kaco6v4zwnx2 is part of an explainer for the general approach browsers are taking.
[…] Digging through HTTP Archive, I see ~158k sites depending on a script resource of some sort from code.jquery.com.
@mikewest wrote at MaxCDN/bootstrapcdn#1495:
Yes, Cross-Origin-Resource-Policy: cross-origin is what you'd apply to resources that ought to be embeddable across the web.
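A check along these lines could confirm that a CDN response actually carries the header described above. This is a minimal sketch using only the standard library. The function names are made up for illustration, and fetch_headers() sends a HEAD request, so nothing is downloaded.

```python
import urllib.request
from collections.abc import Mapping
from typing import Optional


def corp_value(headers: Mapping[str, str]) -> Optional[str]:
    """Return the Cross-Origin-Resource-Policy value (header name matched case-insensitively)."""
    for name, value in headers.items():
        if name.lower() == "cross-origin-resource-policy":
            return value.strip()
    return None


def embeddable_across_the_web(headers: Mapping[str, str]) -> bool:
    """True if the response opts in with `Cross-Origin-Resource-Policy: cross-origin`."""
    return corp_value(headers) == "cross-origin"


def fetch_headers(url: str) -> dict[str, str]:
    """HEAD the URL and return its response headers (last value wins for repeats)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

For example: embeddable_across_the_web(fetch_headers("https://code.jquery.com/qunit/qunit-2.20.0.js")).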
When a webhook fires and rebuilds a content site like api.jquery.com, and a command like npm install or grunt deploy fails, the output of those commands should be easy to retrieve from the builder host, for example via /var/log/syslog or via sudo journalctl -u notifier-server -f -n100.
The last time I tried this, however, the output was not readily available. Either the appropriate log levels are turned off, or perhaps the (sub?)process output isn't captured at all from the systemd unit's perspective.
Some of the past findings from ~2021 are saved at https://github.com/jquery/infrastructure/wiki/Builder-host.
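Here is a minimal sketch of one pattern that would make the output retrievable. It is in Python, not whatever language the notifier service itself uses. The parent reads the child's merged stdout/stderr and re-emits each line on its own stdout. systemd forwards a service's stdout to the journal by default, so journalctl -u notifier-server would show it. The function name and tag format are illustrative assumptions.

```python
import subprocess
import sys


def run_and_log(cmd: list[str], tag: str) -> int:
    """Run one build step, echoing each output line with a prefix.

    stderr is merged into stdout so npm/grunt error messages are kept.
    Inside a systemd service, stdout goes to the journal by default.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    assert proc.stdout is not None  # guaranteed by stdout=PIPE
    for line in proc.stdout:
        print(f"[{tag}] {line.rstrip()}", flush=True)
    returncode = proc.wait()
    print(f"[{tag}] exited with status {returncode}", flush=True)
    return returncode


if __name__ == "__main__":
    # Stand-in for a real step such as `npm install`; the tag is illustrative.
    run_and_log([sys.executable, "-c", "print('building...')"], "api.jquery.com")
```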
Previous renewal at https://github.com/jquery/infrastructure/issues/551, with previous testing methodology and results at https://github.com/jquery/infrastructure/issues/551.
Timeline:
- .crt and .ca-bundle file.
- .key file, encrypted with GPG against my personal public key.
  For future reference, please note that the turnaround time was quick in part due to escalation by Benjamin Sternthal and in part because Christopher was already familiar with me and my public key from the year before. If someone else requests these in the future, I would recommend pairing the original request with your GPG public key, and confirming that you want to receive it on an email address matching your GPG key.
- .key file, and generated the .pem file as per the README instructions in /modules/jquery/files/cert/, then verified the file using the verify_certs.sh script before uploading it anywhere else.
- #jquery_dev:gitter.im on Matrix to test against https://learn.jquery.com from their various devices and command-line clients.
Follow-up from #6, in which we moved all ~20 doc sites to new infrastructure as simple standalone WordPress sites. There is one site we haven't migrated yet: https://plugins.jquery.com/.
This looks like a fairly stateful site, and it also lacks a staging site. It includes several custom build steps that we haven't ported over yet, and I'm actually not sure that it would re-create the same site even if we do run it from scratch, since the underlying sources may have disappeared or significantly changed.
We could try to copy and upgrade the existing database as-is, but maybe we want to use this chance to turn it into a static site (like #10). Possibly something simpler and slimmed down: only an index listing with a page for each plugin (URL-compatible) showing the plugin metadata (description, author, links to website/docs/bugs), so that it's easy for people to find what this points to and where to find source code, updates, contact persons, or forks going forward.
Regression caused by 864f584.
It seems that, as-is, the two are mutually exclusive: either octodiff works (and real Puppet shows an empty string as the version), or octodiff fails on a git command (and real Puppet shows the commit message).
Even though those can be re-built in case something goes wrong, let's enable Tarsnap backups of the WordPress databases and content so that we have a faster way to recover from a failed WordPress upgrade or similar issue.
The https://cla.js.foundation site is configured in Cloudflare, and has the jsf-cla-assistant droplet (IP 159.203.165.250, created 23 Oct 2016) as its backend. This droplet appears not to be managed by either Puppet or Ansible, and I'm unable to SSH into it myself. @brianwarner Do you have access to this one?
Screenshot of https://cla.js.foundation from before I powered off the droplet just now:
We also have the former cla.jquery.net site, backed by the cla-01.ops.jquery.net droplet in DigitalOcean (IP 104.131.146.50, created 5 Feb 2015), which I do have access to and which is managed by Puppet. This uses the software at https://github.com/jquery/jquery-license, and was until recently also used dynamically by the https://contribute.jquery.org website through URLs like http://contribute.jquery.org/CLA/status/?owner=jquery&repo=jquery&sha=XYZ.
Next steps:
- cla-01.ops.jquery.net needs to be extracted and preserved somewhere. If yes, @Krinkle can help with this.
- cla-01.ops.jquery.net droplet, and remove CLA Puppet manifests.
- jsf-cla-assistant droplet.
Right now, the moment gilded-wordpress is updated on the server side, clients in content repos like api.jquery.com break unless all 12 repos are updated at the same time.
Note that in practice, nothing breaks because:
A while ago, we made the jQuery Core Trac instance (https://bugs.jquery.com/) read-only, leaving the jQuery UI one (https://bugs.jqueryui.com/) as the only fully functional Trac instance. With jQuery UI being maintained in a very limited way nowadays, it doesn't make sense to maintain a full separate bug tracker just for this project; other projects are using GitHub issues.
We want to enable GitHub issues for jQuery UI and make the UI Trac read-only. In the future, we can consider replacing Trac with a static dump of all its pages if that makes maintenance easier.
I'm not 100% sure what we did for Core but I think we mostly blocked account registration and removed all existing accounts as with no accounts the site is essentially read-only. @rjollos can you help with doing the same for UI? I'm not sure if it was you or someone else involved with the changes for the Core Trac.
Various bots and crawlers are producing entries like the following in wp-05:/var/log/php8.2-fpm.log:
[15-Sep-2023 15:02:17] WARNING: [pool www] child 2355747 said into stderr:
PHP Fatal error: Uncaught TypeError: str_contains(): Argument #1 ($haystack) must be of type string, array given
in /srv/wordpress/sites/api_jquery_com/wp-login.php:1365
Stack trace:
#0 /srv/wordpress/sites/api_jquery_com/wp-login.php(1365): str_contains()
#1 {main}
thrown in /srv/wordpress/sites/api_jquery_com/wp-login.php on line 1365
Seems to be an upstream issue where a $_GET or $_REQUEST key is checked for existence but not for type, and is thus prone to misuse when query parameters are crafted in the array form that PHP supports.
https://github.com/WordPress/wordpress-develop/blob/6.3.1/src/wp-login.php#L1267-L1365
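Here is an illustration of the failure mode. PHP turns a query parameter written as key[]=value into an array, so a check that only asks "is this key set?" lets an array through to str_contains(). The Python sketch below is a simplified model of that parsing; nested keys and PHP's renaming of dots and spaces in names are not modelled. It also adds the missing type check, the equivalent of is_string() in PHP. The parameter names in the test are hypothetical.

```python
from typing import Optional, Union
from urllib.parse import parse_qsl

Param = Union[str, list[str]]


def parse_php_style(query: str) -> dict[str, Param]:
    """Roughly how PHP fills $_GET: `k[]=v` builds a list, plain `k=v` is a string.

    Later values win, so `a=1&a[]=2` yields a list and `a[]=1&a=2` a string.
    """
    params: dict[str, Param] = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        if key.endswith("[]"):
            name = key[:-2]
            current = params.get(name)
            if isinstance(current, list):
                current.append(value)
            else:
                params[name] = [value]  # a new array replaces any earlier scalar
        else:
            params[key] = value
    return params


def get_string_param(params: dict[str, Param], key: str) -> Optional[str]:
    """Return the parameter only if it is a string: the type check missing upstream."""
    value = params.get(key)
    return value if isinstance(value, str) else None
```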
Enable WordPress automatic updates so that we don't have to be constantly upgrading it by hand, with an option to disable those and roll back to a specific version via Puppet.
There's a trade-off here between automatic updates and consistency/reproducibility with lock files. I think for the long-term, the latter is more favourable given we're not super fast moving. Also, these aren't long-running or public services, so there isn't an inherent benefit to automatic updates from that angle, either.
Source:
To simplify management, let's consolidate these two file servers, since the paths don't overlap in conflicting ways and a portion of their content is already duplicated/available on both.
This in turn lets us simplify the Fastly config to 1 site.
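Before merging the two servers, the claim that the paths don't conflict can be checked mechanically. Hash every file under each document root, then classify the shared paths as identical or conflicting. This is a sketch only. The issue doesn't name the document roots, so they are left to the caller.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file path (relative to root, POSIX-style) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(a: dict[str, str], b: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths: unique to one server, duplicated byte-for-byte, or conflicting."""
    shared = a.keys() & b.keys()
    return {
        "only_a": sorted(a.keys() - b.keys()),
        "only_b": sorted(b.keys() - a.keys()),
        "identical": sorted(p for p in shared if a[p] == b[p]),
        "conflicting": sorted(p for p in shared if a[p] != b[p]),
    }
```

Consolidation is safe on this criterion when compare_trees(file_digests(root_a), file_digests(root_b))["conflicting"] comes back empty.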
@markelog wrote:
Idea - Keep known fingerprints to easily import to avoid having to trust first ssh connect.
@Krinkle wrote:
Maybe in public infra repo, and then document how to load it into ~/.ssh/known_hosts.d/ and how to enable it in ~/.ssh/config.
@supertassu wrote:
https://puppet-03.ops.jquery.net/known_hosts exists now. However it's not documented and it probably needs a more stable hostname if we want to encourage people to use it.
Or we could just have everyone trust the SSH CA (first line of the above link):
@cert-authority *.ops.jquery.net ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPt01ydjmlHiFKFD3ya6JcQtEPe0WbPj6JnGa/noy4mI jQuery SSH CA v1
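Here is a sketch of the "document how to load it" step: append the CA line quoted above to a known_hosts file, idempotently. It assumes the hosts present host certificates signed by that CA, as the quoted line implies. The function name is illustrative.

```python
import tempfile
from pathlib import Path

# Copied verbatim from the SSH CA line quoted above.
CA_LINE = (
    "@cert-authority *.ops.jquery.net ssh-ed25519 "
    "AAAAC3NzaC1lZDI1NTE5AAAAIPt01ydjmlHiFKFD3ya6JcQtEPe0WbPj6JnGa/noy4mI jQuery SSH CA v1"
)


def trust_ca(known_hosts: Path, line: str = CA_LINE) -> bool:
    """Append the CA line unless it is already present. Returns True if it was added."""
    text = known_hosts.read_text() if known_hosts.exists() else ""
    if line in text.splitlines():
        return False
    known_hosts.parent.mkdir(parents=True, exist_ok=True)
    with known_hosts.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # don't glue our line onto an unterminated last line
        f.write(line + "\n")
    return True


if __name__ == "__main__":
    # Demo against a throwaway file; point it at a real known_hosts file to use it.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "known_hosts"
        print(trust_ca(target), trust_ca(target))  # True False
```

If the line goes into a separate file under ~/.ssh/known_hosts.d/, that file must also be listed in UserKnownHostsFile in ~/.ssh/config. That option accepts several whitespace-separated files. Keep ~/.ssh/known_hosts in the list, since setting the option replaces the default.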
From @mgol:
it looks like https://releases.jquery.com is no longer getting properly updated. The first commit that didn't get deployed was yours, jquery/codeorigin.jquery.com@53da218, adding QUnit 2.20.0.
The JS file is there (https://code.jquery.com/qunit/qunit-2.20.0.js), so the CDN updates properly but the WordPress site does not.
https://github.com/jquery/infrastructure/blob/puppet-stage/manifests/site.pp
- wp-01, jquery.com
- wp-02, most other sites (incl. *.jquery.org, jqueryui.com, etc.)
- wp-03, codeorigin.jquery.com, releases.jquery.com, and recipient of Git assets
- wp-01.stage, WordPress doc sites, staging, all domains (stage.api.jquery.com, etc.)
- builder-01
- builder-03.stage
- jq03.stage.jquery.com (stage.demos.jquerymobile.com, stage.themeroller.jquerymobile.com)
- jenkins-01
- cla-01.ops.jquery.net
- cla-01.stage.jquery.net
- gruntjs.ops.jquery.net
- gruntjs.stage.jquery.net
- origin-01.ops.jquery.net, contentorigin (content.jquery.com, static.jquery.com)
- swarm-01.ops.jquery.net, TestSwarm
- view-01.ops.jquery.net, View, git assets
- trac.ops.jquery.net, Trac (bugs.jquery.com, bugs.jqueryui.com)
Dedicated tickets:
In order to get away from the very outdated Debian versions and such, we need to also get to a newer Puppet version.
We are currently using numerous Puppet 2 features that were deprecated in Puppet 3 and removed in Puppet 4. The main change that I think affects us is the change from "environment configs" to "environment directories".
Some relevant links:
The puppet server runs at puppet.ops.jquery.net (in legacy docs: puppet-master). The config for the server is at /etc/puppet/puppet.conf. There are two Git clones that we care about on this server:
- /etc/puppet: a clone of jquery/infrastructure.git at branch puppet-master. This currently replaces the entire /etc/puppet directory.
- /etc/puppet-stage: a directory we made up, containing another clone of jquery/infrastructure.git at branch puppet-stage.
In /etc/puppet/puppet.conf (the only place the Puppet server actually looks) we have the following:
[main]
# …
templatedir=$confdir/templates
manifest=/etc/puppet/manifests/site.pp
[stage]
manifest=/etc/puppet-stage/manifests/site.pp
modulepath=/etc/puppet-stage/modules
# …
[master]
# …
By default, when one of our droplets running a Puppet agent asks for provisioning, it gets provisioned by the main config, which simply points at the subdirectories within /etc/puppet. On staging hosts, the agent's own /etc/puppet/puppet.conf file may contain environment = stage, which the agent passes on to the Puppet server, so the Puppet server will use that section's manifest and modulepath instead (in addition to compiling the catalog with $::environment = "stage").
Beyond this, the only other thing worth knowing is that we use jquery::postreceive instances (similar to the content sites) to automatically update these Git checkouts after commits to them. Actually applying the changes, however, is passive: Puppet agents check in with the server every 30 minutes (default Puppet agent behaviour).
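To make the override behaviour concrete, here is a simplified model of how these settings resolve. It is written in Python and is not Puppet's actual code. When an agent requests an environment, values in the [<environment>] section take precedence over [main]. Settings absent from the file fall back to Puppet's built-in defaults, which the sketch doesn't model, and the [master] section is ignored.

```python
import configparser

# The relevant parts of /etc/puppet/puppet.conf, as quoted above.
PUPPET_CONF = """
[main]
templatedir=$confdir/templates
manifest=/etc/puppet/manifests/site.pp
[stage]
manifest=/etc/puppet-stage/manifests/site.pp
modulepath=/etc/puppet-stage/modules
[master]
"""


def effective_settings(conf_text: str, environment: str) -> dict[str, str]:
    """Simplified model of config-file environments: [<environment>] overrides [main]."""
    parser = configparser.ConfigParser(interpolation=None)  # keep "$confdir" literal
    parser.read_string(conf_text)
    settings = dict(parser["main"])
    if parser.has_section(environment):
        settings.update(parser[environment])
    return settings


if __name__ == "__main__":
    for env in ("production", "stage"):
        print(env, effective_settings(PUPPET_CONF, env))
```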
Under Puppet 4, things are a little bit different. There is no longer support for the templatedir, manifest, and modulepath parameters, and there is no longer support for per-environment configuration section overrides.
Instead, modules are read from a directory like /etc/puppet/code/environments/:environment/modules and manifests are read from a directory like /etc/puppet/code/environments/:environment/manifests. For example: /etc/puppet/code/environments/production/modules.
I think global templates are no longer supported, or at least not varying by environment. But that's okay, we only have one file in /templates, and that'll either just not support staging or maybe we can even get rid of it (do we still use Zabbix?).
The new directory layout seems feasible, we just create two more clones and keep both for a little while.
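Under the new layout, the existing clones would map naturally onto environment directories. infrastructure.git already has manifests/ and modules/ at its top level, as the manifest and modulepath values above show. The sketch below assumes the puppet-master branch becomes Puppet's default production environment and puppet-stage becomes stage. Those names, and the codedir, are assumptions taken from the paths above. The clone URL is left as a placeholder.

```python
from pathlib import PurePosixPath

# codedir as written in this issue; packaged Puppet may use a different prefix.
CODEDIR = PurePosixPath("/etc/puppet/code")

# Assumed mapping of the existing infrastructure.git branches to environments.
BRANCH_TO_ENVIRONMENT = {
    "puppet-master": "production",  # Puppet's default environment name
    "puppet-stage": "stage",
}


def environment_paths(environment: str, codedir: PurePosixPath = CODEDIR) -> dict[str, PurePosixPath]:
    """Where a directory environment's code lives under the Puppet 4 layout."""
    root = codedir / "environments" / environment
    return {"root": root, "modules": root / "modules", "manifests": root / "manifests"}


if __name__ == "__main__":
    for branch, environment in BRANCH_TO_ENVIRONMENT.items():
        # A clone of the whole repo at `root` yields root/modules and root/manifests.
        root = environment_paths(environment)["root"]
        print(f"git clone -b {branch} <infrastructure.git URL> {root}")
```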
I noticed just now that, apart from a few minor tweaks being needed for deprecated features, more generally it is not supported to connect Puppet 4 clients to a Puppet 3 server. However, the other way around is supported. So, the puppet master will have to go first, and that means a master switch, and setting up a new one of those first as well.
The good news is, a Puppet server is relatively easy to configure and gradually switch to...
Follow-up from https://github.com/jquery/infrastructure/issues/9. We have a backup role in the Puppet manifest (github::backup), and it was used for a while on the jq02 host, but that host has since been decommissioned.
This is a task to review the script, fix it if needed, and then re-enable it in one of the host roles. It's pretty minor and does not need a dedicated host. It runs once a night and needs a bit of space. The builder host seems like a good candidate, as it's the only host with any real space needs.
From @timmywil:
Can we enable 2FA for wp-admin?
In the team meeting on 31 Mar 2023 we decided that "Yes", we do want to enable automatic pruning instead of relying on ad-hoc pruning when the database "gets too big". @mgol okay'ed an initial retention period of 1 year. We can run it daily or weekly.
If we do find we have to run it ad-hoc earlier due to space issues, we can follow-up by tweaking the parameters in the cronjob, but we'll have it in-place at least.
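Here is a sketch of what the pruning job could do on each daily or weekly run. The issue doesn't say which tables are pruned, so runs and updated are hypothetical names, and SQLite stands in for the real database. The 1-year retention is the value agreed above.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION = timedelta(days=365)  # initial retention period agreed in the meeting


def prune_old_rows(conn: sqlite3.Connection, now: datetime, retention: timedelta = RETENTION) -> int:
    """Delete rows older than the retention window and return how many were removed.

    `runs` / `updated` are hypothetical table and column names. Timestamps are
    assumed to be stored as 'YYYY-MM-DD HH:MM:SS' text, which sorts correctly.
    """
    cutoff = (now - retention).strftime("%Y-%m-%d %H:%M:%S")
    with conn:  # commit on success, roll back on error
        cursor = conn.execute("DELETE FROM runs WHERE updated < ?", (cutoff,))
    return cursor.rowcount


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, updated TEXT)")
    demo.executemany(
        "INSERT INTO runs (updated) VALUES (?)",
        [("2022-01-01 00:00:00",), ("2023-03-30 12:00:00",)],
    )
    print(prune_old_rows(demo, now=datetime(2023, 3, 31)))  # 1
```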
blog.jquery.com-theme is effectively a fork of jquery-wp-content.
Proposed:
@supertassu Are you aware of aspects of jquery-wp-content (specifically, the "next" branch) that would make it difficult to use for the blogs? E.g. things that perhaps the blog fork has stripped out that might pose issues?
The main thing I can think of is the multi-site aspects, but we've weeded those out for the new wpdocs servers, where we don't use multi-site WordPress, so I suspect it'd be fine. I'm curious what you think, though.
The wp hosts are currently on PHP 5. More specifically, PHP 5.4, which has been EOL since 2014.
A safer option might be PHP 5.6, which has support for another year. On the other hand, there have been very few breaking changes and WordPress themselves currently recommend PHP 7.2. Seems worth trying to upgrade there in one step.
Quote from https://github.com/jquery/infrastructure/issues/312:
As part of the infra refresh we will (for now) continue to use WordPress for the foreseeable future. (ref #8, jquery/infrastructure#449, and subtask for WordPress: #6)
Notable changes:
- Public infrastructure as code: https://github.com/jquery/infrastructure-puppet
- Latest Debian, PHP, MySQL, and WordPress.
- Few or no plugins. Simple standalone theme.
- No multi-site network. Simple standalone WP installations.
To do:
- jquery-blogs
  - blog.jquery.com
  - blog.jqueryui.com
  - blog.jquerymobile.com
- wp-01
  - learn.jquery.com
  - api.jquery.com
  - jquery.com
  - plugins.jquery.com (being archived at #29)
- wp-02
  - *.jquery.org
  - *.jqueryui.com
  - *.jquerymobile.com
- wp-03
  - releases.jquery.com (ref https://github.com/jquery/infrastructure/issues/554)
Background as documented previously:
As of 2021, we're exploring an open-source solution that we can support within the free software ecosystem. In doing so we will increase security and availability (by reducing client-side dependence on third-party domains), and lower our privacy budget.
We first evaluated Meilisearch and experienced some suboptimal aspects. These included: difficult upgrades (not yet committing to forward compatibility or automatic in-place upgrades), opt-out telemetry instead of opt-in, no official Debian packages, non-trivial interactive setup, missing support for querying multiple indexes (e.g. qunitjs.com and api.qunitjs.com), and a not yet clear future in terms of business model (Meilisearch Cloud was not yet in the picture, and the backend is not GPL licensed).
In mid-2022, the experiment transitioned to focus on Typesense instead.
Since April 2023 we have an instance of Typesense running in the new infra, provisioned through this repository (558de96). I also developed a 2kB minimalistic HTML-first client and user interface for it at https://github.com/jquery/typesense-minibar and integrated it with our Jekyll theme at https://github.com/qunitjs/jekyll-theme-amethyst/. This has been live on https://qunitjs.com/ for the past few months.
Next, we need to migrate the remaining doc sites which are still using the (now stale and deprecated) Algolia DocSearch indexes:
Write some basic documentation on how the WordPress setup works, and especially on how we do WordPress updates and what to do in case one fails.
Refs:
Work:
I noticed in Fastly Observer for releases.jquery.com that there's a handful of "Pass" requests once every few minutes (i.e. not cache-miss requests, but uncacheable requests, despite having a fallback TTL). I set up a temporary log bin to look at what these are, and they turn out to be POST requests, such as:
16:39
/xmlrpc.php POST
/wp-cron.php?doing_wp_cron=*** POST
16:50
/xmlrpc.php POST
/wp-cron.php?doing_wp_cron=*** POST
16:51
/wp-cron.php?doing_wp_cron=*** POST
/wp-cron.php?doing_wp_cron=*** POST
We can drop these at the edge, since the builder communicates directly with the origin anyway, and the default web-based cron can be turned off in WordPress. We can then run cron via a systemd timer instead, once a day or so.
https://developer.wordpress.org/plugins/cron/understanding-wp-cron-scheduling/
https://developer.wordpress.org/plugins/cron/hooking-wp-cron-into-the-system-task-scheduler/
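The edge rule could be as simple as rejecting these two paths outright. Separately, WordPress documents the DISABLE_WP_CRON constant in wp-config.php for turning off the page-load trigger (see the second link above). After that, a systemd timer could run due events, e.g. with WP-CLI's wp cron event run --due-now, if WP-CLI is available on the host. The Python predicate below only models the edge decision; the real rule would live in the Fastly configuration.

```python
from urllib.parse import urlsplit

# Paths seen as uncacheable "Pass" POSTs in the temporary log bin above.
# The builder talks to the origin directly, so the edge never needs them.
EDGE_BLOCKED_PATHS = {"/xmlrpc.php", "/wp-cron.php"}


def should_drop_at_edge(url: str) -> bool:
    """Model of the proposed edge rule: reject these paths, query string ignored."""
    return urlsplit(url).path in EDGE_BLOCKED_PATHS
```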
Looking at wp_options where option_name='cron' shows that sites typically have the following recurring tasks:
hourly wp_privacy_delete_old_export_files # unused in our headless setup
daily wp_scheduled_auto_draft_delete
twicedaily wp_https_detection
twicedaily wp_version_check
twicedaily wp_update_plugins
twicedaily wp_update_themes
daily recovery_mode_clean_expired_keys
weekly wp_delete_temp_updater_backups
weekly wp_site_health_scheduled_check
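With the web cron disabled and a timer running once a day as suggested above, the hourly and twice-daily events would effectively only run daily. WordPress runs an overdue recurring event once and reschedules it for its next future slot; missed runs are not replayed. The small sketch below lists which events are affected for a given timer interval. The recurrence intervals are WordPress's built-in ones, and the event list is copied from above.

```python
# WordPress's built-in recurrence names and their intervals in seconds.
RECURRENCE_SECONDS = {
    "hourly": 60 * 60,
    "twicedaily": 12 * 60 * 60,
    "daily": 24 * 60 * 60,
    "weekly": 7 * 24 * 60 * 60,
}

# The recurring events listed above, as found in wp_options (option_name='cron').
EVENTS = {
    "wp_privacy_delete_old_export_files": "hourly",
    "wp_scheduled_auto_draft_delete": "daily",
    "wp_https_detection": "twicedaily",
    "wp_version_check": "twicedaily",
    "wp_update_plugins": "twicedaily",
    "wp_update_themes": "twicedaily",
    "recovery_mode_clean_expired_keys": "daily",
    "wp_delete_temp_updater_backups": "weekly",
    "wp_site_health_scheduled_check": "weekly",
}


def slowed_down_by_timer(timer_interval: int) -> list[str]:
    """Events whose recurrence is shorter than the system timer interval,
    and which would therefore run only once per timer tick."""
    return sorted(
        name for name, recurrence in EVENTS.items()
        if RECURRENCE_SECONDS[recurrence] < timer_interval
    )


print(slowed_down_by_timer(RECURRENCE_SECONDS["daily"]))
```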
We have received anecdotal reports from jQuery CDN users that StackPath/Highwinds may be blocking Tor clients.
It is not uncommon for CDN providers to offer WAF protection to avoid abuse, e.g. when serving a blog open to comments, or some other kind of dynamic service with an abuse vector. I'm guessing Highwinds has a blocklist of sorts that includes IPs of customers who happen to run Tor relays at home.
I was not able to find any kind of WAF or traffic filtering rules in Highwinds StrikeTracker, nor could I find anything about it in the Highwinds StrikeTracker support pages. However, the general StackPath support pages do mention their WAF service, and indeed that offers a preset for Tor exit nodes. I'm guessing a version of this rule is implicitly turned on for Highwinds, without any ability to turn it off, or at least not in a way we can control ourselves.
https://support.stackpath.com/hc/en-us/articles/360001091666-Review-and-Allowlist-CDN-WAF-IP-Blocks
Depending on the timeline for switching from StackPath Highwinds to Fastly, it might not be worth escalating with StackPath. Instead, after the switch we can check and make sure that Fastly does not block access to jQuery CDN from home IPs that use Tor.
Ref jquery/codeorigin.jquery.com#95, cc @vincejv.
Currently if notifier-service forks a process we have no way to kill it if it does not exit in a timely manner.
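One way to bound a forked build step is sketched below in Python; the real service's language may differ. Start the child in its own session so that it and anything it spawns (npm, grunt, and so on) share a process group. Then signal the whole group when a deadline passes: SIGTERM first, and SIGKILL after a grace period. This is POSIX only, and the function name and timeouts are illustrative.

```python
import os
import signal
import subprocess
import sys


def _signal_group(pgid: int, sig: int) -> None:
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # the whole group already exited


def run_with_deadline(cmd: list[str], timeout: float, grace: float = 5.0) -> tuple[int, str, bool]:
    """Run cmd with a deadline; returns (returncode, combined output, timed_out)."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
        start_new_session=True,  # child leads a new process group: pgid == proc.pid
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out, False
    except subprocess.TimeoutExpired:
        _signal_group(proc.pid, signal.SIGTERM)  # ask the child and its children to stop
        try:
            out, _ = proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            _signal_group(proc.pid, signal.SIGKILL)  # they ignored SIGTERM
            out, _ = proc.communicate()
        return proc.returncode, out, True


if __name__ == "__main__":
    print(run_with_deadline([sys.executable, "-c", "print('ok')"], timeout=30))
```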
krinkle@wp-04:/var/log/nginx$ tail -f error.log
PHP Warning: Attempt to read property "post_parent" on null
in /srv/wordpress/jquery-wp-content/themes/learn.jquery.com/sidebar.php on line 35
PHP Warning: Attempt to read property "ID" on null
in /srv/wordpress/jquery-wp-content/themes/learn.jquery.com/sidebar.php on line 15
PHP Warning: Attempt to read property "post_parent" on null
in /srv/wordpress/jquery-wp-content/themes/learn.jquery.com/sidebar.php on line 35
PHP Warning: Attempt to read property "ID" on null
in /srv/wordpress/jquery-wp-content/themes/learn.jquery.com/sidebar.php on line 35
krinkle@wp-04:/var/log/nginx$ ls -halF
1.1G error.log
This is reproducible via URLs such as https://learn.jquery.com/?s=findnothingnothingatall
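With a 1.1G error.log, a quick summary of distinct warnings by message and location would help confirm that a handful of lines in sidebar.php account for most of it. The sketch below streams lines, so the whole file is never loaded. It re-joins an "in ... on line N" continuation, as displayed above, onto the preceding warning. The fix itself is presumably a null check on the post before reading ->ID or ->post_parent in the theme, which is not shown here.

```python
import re
from collections import Counter
from collections.abc import Iterable, Iterator

WARNING = re.compile(
    r"PHP Warning:\s+(?P<message>.+?)\s+in (?P<file>\S+) on line (?P<line>\d+)"
)


def iter_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield log records, re-joining 'in ... on line N' continuation lines."""
    record = ""
    for line in lines:
        if record and line.lstrip().startswith("in "):
            record += " " + line.strip()
        else:
            if record:
                yield record
            record = line.rstrip("\n")
    if record:
        yield record


def top_warnings(lines: Iterable[str], n: int = 10):
    """Most common (message, file name, line number) triples with their counts."""
    counts = Counter(
        (m["message"], m["file"].rsplit("/", 1)[-1], int(m["line"]))
        for record in iter_records(lines)
        for m in WARNING.finditer(record)
    )
    return counts.most_common(n)
```

For example: with open("/var/log/nginx/error.log", errors="replace") as f: print(top_warnings(f)).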