Git Product home page Git Product logo

wpcom-legacy-redirector's Introduction

WPCOM Legacy Redirector

WordPress plugin for handling legacy redirects in a scalable manner.

Please see our wiki for detailed documentation.

Requirements

  • PHP 7.4+
  • WordPress 5.9+

Change Log

See CHANGELOG.md for the full list of changes.

License

Licensed under GPL-2.0-or-later.

wpcom-legacy-redirector's People

Contributors

bdtech avatar bswatson avatar dependabot[bot] avatar emrikol avatar garyjones avatar jigneshnakrani088 avatar kjbenk avatar mdbitz avatar mikeyarce avatar mjangda avatar ovidiul avatar philipjohn avatar rclations avatar rebeccahum avatar rodruiz avatar seanlanglands avatar spencermorin avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

wpcom-legacy-redirector's Issues

Report on non-whitelisted hosts during the import

As the plugin uses wp_safe_redirect, it would be helpful to report any non-whitelisted host which is being imported.

Example implementation:

$parsed_url = parse_url( $redirect_being_imported );
if ( true === isset( $parsed_url['host'] ) && false === empty( $parsed_url['host'] ) ) {
    $wpp = parse_url(home_url());
    $allowed_hosts = (array) apply_filters( 'allowed_redirect_hosts', array($wpp['host']), isset($parsed_url['host']) ? $parsed_url['host'] : '' );
    if ( !in_array($parsed_url['host'], $allowed_hosts) && $parsed_url['host'] != strtolower($wpp['host'])) ) {
        WP_CLI::line( sprintf( 'Non whitelisted host is being imported: %s', $parsed_url['host'] ) );
    }
}

Make Redirects Editable

Currently to change a URL, the Redirect has to be deleted and then re-added.

The ability to edit a redirect would improve ease of use.

Introduce Redirect_Code class

Add a class that holds the enum list (class constants) of supported redirect HTTP status codes - like 301 and 302. Note that everything other than 301 is not yet used out of the box, but there is a filter around a 301; adding a constant for 302, for example, would allow someone to use this as well to set a 302 redirect.

Something like:

<?php

declare(strict_types=1);

namespace Automattic\LegacyRedirector\Domain\Redirect;

class Redirect_Code {
	public const MULTIPLE_CHOICES   = 300;
	public const MOVED_PERMANENTLY  = 301;
	public const FOUND              = 302;
	public const SEE_OTHER          = 303;
	public const USE_PROXY          = 305;
	public const TEMPORARY_REDIRECT = 307;
	public const PERMANENT_REDIRECT = 308;

	public static function valid_redirection_codes() {
		return [
			self::MULTIPLE_CHOICES,
			self::MOVED_PERMANENTLY,
			self::FOUND,
			self::SEE_OTHER,
			self::USE_PROXY,
			self::TEMPORARY_REDIRECT,
			self::PERMANENT_REDIRECT,
		];
	}
}

Consistency in Redirect To column

For internal links, wether a TO value shows as relative or as an absolute URLs, I believe serve no purpose. Users reading down the TO column might benefit from seeing consistent local relative paths or absolute urls, but probably not both.

screen shot 2018-12-06 at 3 23 20 pm

Bulk importing from a CSV file fails to process any imports.

When preparing to do a bulk import from a CSV, I tested locally and found none of the redirects were being added. I was able to track this down to the check on whether WP_CLI is defined.

There is already a pull request (#8) which includes a fix for it, but it also included an additional filter and is being held up from being merged.

Add Verification CLI method

Use Case:

  • Run a CSV or other bulk import
  • We now want to make sure that the redirects work as expected

It would be interesting to be able to run a cli command (or something similar) that verifies that the redirects were pulled in successfully and work as expected.

Validate Redirects inline

Right now the "Validate" admin_notice doesn't indicate which Redirect was validated.

screen shot 2018-12-06 at 2 48 15 pm

Perhaps the validation can occur inline with ajax instead of page reload? Maybe with a UI similar to the wp-admin > Plugins 'update' bar

screen shot 2018-12-06 at 2 50 45 pm screen shot 2018-12-06 at 2 50 53 pm

Allow inserted ID to be returned

The insert_legacy_redirect() method will return "True if inserted; false if not permitted; otherwise error upon validation issue.".

It would be handy to make the newly inserted post ID to be available.

If we updated the true return, then this may break existing === checks.

We can add a new param to the method and which can dictate whether true or the post ID is returned on success.

For a 2.0 release, we can make the change so that only the inserted post ID is returned by default.

UI: duplicate redirects when using slug and ID

It's possible to get duplicate redirects:

  1. Add FROM: /redirect-to-slug, TO: /about
  2. Add FROM: /redirect-to-slug, TO: 1545 (on my test site, this is the ID of the about page.

The result is duplicate redirects with the same FROM but different TO:

Screenshot 2019-05-23 at 11 12 22

There should be some logic which consolidates these.

UI: Relative paths in network subsites misleading previews and bypassed validation

In develop branch, I noticed when adding a relative path, the redirect_*_preview above the input shows the current sites path and appends the inputs value to it. However once submitted, the subsites path is not included, which may be misleading.

Steps to reproduce:

  1. checkout develop branch
  2. activate and visit the ui of the plugin within a network subsite (say example.com/subsite1/)
  3. type a FROM relative path of /testing
  4. notice how the preview above the input includes the current subsite1 path and suggests the redirect FROM will be: example.com/subsite1/testing/
  5. add a valid TO and hit submit
  6. notice the redirect FROM created is example.com/testing/ instead of the previewed example.com/subsite1/testing/

Video: https://io.davidsword.ca/wpcomlegacyredirect.mov

The TO input has the same issue, of previewing the subsite path in the preview, but not including it in the actual stored redirect value. There's an additional oddity where if the subsite path isn't included (which will result in a broken link), the link validation does not catch it and it shows as a valid link.

Video: https://io.davidsword.ca/wpcomlegacyredirect2.mov

Adding max-age to redirects

301s are cached by browsers (at least Firefox and Chrome) with no expiry date.
This poses a problem when we need to update redirects in some cases.

I suggest adding a header like Cache-Control: max-age= where seconds may be a fairly large number, perhaps even one day.

We may think of setting up a default behaviour and some configurable params.

Stop post type being indexed by VIP Search

Would need something like:

add_filter( 'ep_indexable_post_types', 'legacy_redirector_do_not_index_in_elasticpress' );

function legacy_redirector_do_not_index_in_elasticpress( $types ) {
	$deny_list = array( 'vip-legacy-redirect' );
	// Unset the items with values matching the deny list.
	return array_diff( $types, $deny_list );
}

See https://docs.wpvip.com/enterprise-search/indexing/post-types/#1-excluding-post-types-from-the-allow-list

While this originated from the VIP Search, the ep_indexable_post_types filter applies to any instance of ElasticPress (not just on VIP), so we would not want to put this snippet into a vipgo-helper.php file.

Add redirects individually does not work

From the Wiki...
There are three ways to create redirects:

  1. Adding redirects from WordPress Admin Visit the Redirect Manager โ†’ Add Redirect WP Admin screen and add redirects individually.
    =================================================================================================
    When we follow step 1, there is no option to input a "to" for
    Relative Path: /redirect_from/,/redirect_to
    Post ID: /redirect_from/,123(redirect_to_post_id)
    URL: /redirect_from/,https://site.com(redirect_to_url)

Check line-endings before import

tl;dr Check line-endings in a CSV file are consistent with the host environment before import.


I tried to use this plugin with a .csv file that produced errors on import. What was reported had overlapping URLs, and only showed 7 out of the 88 URLs attempting to be imported.

I was able to spot a pattern on which URLs these were in the CSV file, and found it was about every 2000 bytes. That lead me to consider that the fgetcsv() file appeared to be grabbing 2000 bytes of data as a single row, which implied that multiple rows were being concatenated into one.

From this, I was able to make and prove the conclusion that the file was using Mac OS 9 line-endings (CR). The server I was trying to run this on required Unix line-endings (LF). Converting the file locally, pushing to the server and importing then worked fine.

The plugin should check for line-endings, or, as suggested here, enable auto-detection of line-endings before opening the file.

Enhance code quality through sonarcloud.io or similar

sonarcloud.io is an online service for Code Quality Review, it might be a good idea to set this up along with PHPCS and Wordpress coding standards and it will give way for better code practices and better collaboration.

A current scan revealed a list of potential items that should be fixed.

Quality Gate Status Maintainability Rating Security Rating

Decouple validation logic

The main plugin class has a validate_urls() method, and a separate validate() method.

The UI class has a validate_redirects_notices, the addition of a "Validate" link, the handling for checking that a link is valid, and a highly-coupled call to validate when adding a redirection via the UI.

When importing a CSV, there's a --skip-validation flag to bypass all of this!

I want to see the validation of a From value, and any validation of a To value to be handled separately. These should be built out as individual validators, which are then registered to a UrlValidator class (Strategy Pattern). This UrlValidator (could be a different instance for From and To, depending on if they need different validations) can then be injected as a dependency when it is needed.

Should start with making a list of the different validations that happen for any data.

Utilize Constant for Plugin Slug

Constant should be defined for the plugin's slug.

define( 'WPCOM_LEGACY_REDIRECTOR_PLUGIN_SLUG', 'wpcom-legacy-redirector' );

There are currently 47 references in the code base to the string 'wpcom-legacy-redirector'

i.e. 'not_found' => __( 'No redirects found.', 'wpcom-legacy-redirector' )

Add ability to white list query params that should pass through on redirects

As part of #63905-z we had the interesting request that legacy redirects should ignore utm_* parameters so that they don't get stripped when redirects occur.

Per discussion in #14 it was determined that whitelisting of params would be an ideal solution so that users can allow campaign and other params to be passed through to the redirected page.

non-ASCII characters allowed don't redirect

Emojis and chars like the horizontal ellipsis โ€ฆ can be submitted successfully in the FROM path's, but they don't actually redirect.

This could be a problem if for example this plugin is used as a shortener by marketers, ie: /๐Ÿ”ฅ-deals

Unable to trash redirects to temporarily pause them

The plugin supports trashing the redirect posts, however the logic evaluating the redirect request doesn't account for the post status at all:

$redirect_post = get_post( $redirect_post_id );
if ( ! $redirect_post instanceof \WP_Post ) {
// If redirect post object doesn't exist, reset cache.
wp_cache_set( $url_hash, 0, self::CACHE_GROUP );
return false;
} elseif ( 0 !== $redirect_post->post_parent ) {
// Add preserved params to the destination URL.
return add_query_arg( $preservable_params, get_permalink( $redirect_post->post_parent ) );
} elseif ( ! empty( $redirect_post->post_excerpt ) ) {
// Add preserved params to the destination URL.
return add_query_arg( $preservable_params, esc_url_raw( $redirect_post->post_excerpt ) );
}

If we do want to support the trashing, the logic needs to be extended to check for the post_status of the resolved $redirect_post. Alternatively, we could disable the trash support and allow hard-deletes.

Please let us know if this needs to be addressed and we can submit a pull request.

CLI: insert-redirect fails when using post ID

$ wp wpcom-legacy-redirector insert-redirect /foo 5
Error: Couldn't insert /foo -> 5 (Redirect is pointing to a Post ID that does not exist.)

Th insert-redirect command fails when using a post ID as the To destination value. Removing a check for the $_POST superglobal, as per #115 fixes it, but then this breaks the UI view:

Screenshot 2023-06-28 at 09 02 26

(The different redirect there is using a /bar To value, rather than a post ID)

#115 was reverted in #116, but the problem still likely lies in WPCOM_Legacy_Redirector::vip_legacy_redirect_parent_id().

Can't add redirect when using a self-signed certificate

The check_if_404() contains a wp_remote_get() (and a vip_safe_wp_remote_get()) call. When using a self-signed certificate, this stops the check from passing as a WP_Error is returned:

object(WP_Error)#1040 (2) {
	["errors"]=> array(1) {
		["http_request_failed"]=> array(1) {
			[0]=> string(63) "cURL error 60: SSL certificate problem: self signed certificate"
		}
	}
	["error_data"]=> array(0) { }
}

However, the UI only shows an error of Error: Redirects need to be from URLs that have a 404 status., which is not indicative of the actual error.

Make redirects support different post statuses

As per #125 (comment) when redirect posts are added, they end up with a draft post status.

To enable a concept of active and inactive (and trashed #125) redirects, we need to:

  • Insert new redirects with a publish post status.
  • Have an update mechanism in a release to update existing redirects to have a publish status (if not already trashed).
  • Update front-end redirect logic to only act on redirects with publish post status.
  • Update tests accordingly.

A future separate issue can then look at the UI and implementation for toggling the active status of a redirect.

Switch from custom edit page, to meta boxes on an Add New page

Currently, the add new redirect page is found at edit.php?post_type=vip-legacy-redirect&page=add-redirect.

This is created by registering a new admin page, via add_submenu_page().

However, since the purpose of the screen is just to add a new instance of a custom post type (something that WP already supports), we can go about this in a different way.

What we need is to allow the creation of new posts (vip-legacy-redirect), and then register a callback that amends the metaboxes on the page, to add a custom metabox that includes the fields we need.

Among other benefits, would be that the Publish metabox would be visible - this allows the later supporting of scheduled publishing of redirects to make them live, and different redirect statuses. Until this we could likely remove these from the Publish metabox.

  • remove the 'create_posts' => 'do_not_allow' capability. This will turn on the Add New button at the top of the Redirects list page (see #39), and add a new "Add New" page at post-new.php?post_type=vip-legacy-redirect.
  • add in methods to register metabox, and provide fields in custom metabox.
  • remove existing registration of pages and generated HTML methods.

Add export CLI command

It would be useful to have an export CLI command that would grab the current redirects out as a CSV file.

Redirect is not deleted if still in trash

If we delete the redirect with wp_delete_post( WPCOM_Legacy_Redirector::get_redirect_post_id( '/<redirect>' ) );, it still redirects from the trash, unless you force delete via wp_delete_post( WPCOM_Legacy_Redirector::get_redirect_post_id( '/<redirect>' ), true );.

Error when running CLI command

When calling the import-from-meta CLI command I get this error Redirect is pointing to a Post ID that does not exist.

After doing some research it appears that the vip_legacy_redirect_parent_id function is expecting the $post variable to be a WP_Post object, when in fact it is sometimes a post ID.

The PR below ensures that the $post variable is a WP_Post object before trying to access its post_parent property.

#46

Handling AMP redirects and filtering redirect_uri

A common situation we've had to deal with is handling AMP redirects. For example, if you redirect:

https://domain.com/old-path to https://domain.com/new-path

If you're running AMP, you technically also need to redirect:

https://domain.com/old-path/amp to https://domain.com/new-path/amp

This may not be a use case for everyone so I don't think it needs to be part of the core plugin code. You can filter the request path for determining the URL match with wpcom_legacy_redirector_request_path to drop /amp to find the match, but there's no way to currently filter $redirect_uri after it's returned to add it back on.

What we've done is add a filter after it's returned in maybe_do_redirect() like:

$redirect_uri = apply_filters( 'wpcom_legacy_redirector_redirect_uri', $redirect_uri, $request_path );

I'd suggest adding the same to the core plugin code. Happy to make a PR for this, but unclear if this should be done against version 1.3.0 or 1.4.0.

To be clear, I understand you could just add redirects for both, but this seems inefficient to essentially double the size of the database entries when this is easily handled programmatically.

Add validate-csv command

After the validation has been decoupled (#58), we should add a validate-csv command, which would replace the idea of a --dry-run flag when inserting via CSV files.

Note that this would be checking the data is valid before adding, and is different to the idea at #13 of verifying that stored redirects work.

Store redirects independint of trailing slash

Currently, if I import a redirect for /test and visit example.com/test/ in my browser, I won't get redirected. The workaround is to upload redirects for /test/ and /test to the same destination URL.

Instead, does it make sense to add a check here that basically does this?

if ( $request_path ) {
	$redirect_uri = self::get_redirect_uri( $request_path );

	if ( ! $redirect_uri ) {
		if ( substr( $request_path , -1 )=='/' ) {
			$redirect_uri = self::get_redirect_uri( rtrim( $request_path, '/' );	
		} else {
			$redirect_uri = self::get_redirect_uri( $request_path . '/' );	
		}		
	}

	if ( $redirect_uri ) {
		header( 'X-legacy-redirect: HIT' );
		$redirect_status = apply_filters( 'wpcom_legacy_redirector_redirect_status', 301, $url );
		wp_safe_redirect( $redirect_uri, $redirect_status );
		exit;
	}
}

Here: https://github.com/Automattic/WPCOM-Legacy-Redirector/blob/master/wpcom-legacy-redirector.php#L108

I'm sure there's a more eloquent way to do this, but it seems silly to have to worry about both variations of a URL.

Use X-Redirect-By header

WordPress core includes a X-Redirect-By header when wp_redirect() or wp_safe_redirect() is used. By default, this has a value of WordPress.

This plugin has a header of:

header( 'X-legacy-redirect: HIT' );

HIT is typically thought of in relation to caching, rather than redirecting, so the use of it here may cause confusion.

Better would be to change

wp_safe_redirect( $redirect_uri, $redirect_status );
to add a third arg with a reference to the plugin name.

This would mean that WP 5.1 is the minimum required version.

Remove support for bulk editing

While it would be good to have individual redirects be editable, I can't think of any case where one would want to bulk edit multiple redirects at the same time.

(Maybe if there was an activated (draft) / deactivated (publish) feature, it would be useful?)

The bulk edit feature currently makes no sense:

Screenshot 2019-05-24 at 09 11 52

Let's disable it for now.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.