
link_cleaner's People

Contributors

idlewan, shvchk


link_cleaner's Issues

t.co redirect by Twitter

Twitter adds an unnecessary redirect to all links posted by users, such as https://t.co/0HlJ9BeeJf. Unfortunately the true URL cannot be recovered from t.co URLs, but most <a> tags on twitter.com have a data-expanded-url or title attribute with the true URL.

This would require injecting JavaScript into the page (probably limited to *.twitter.com), but if that is done after page load there shouldn't be any performance issues.
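
A rough content-script sketch of that idea (untested; the selector and function name are mine, only the attribute names come from above):

    // Hypothetical content script, registered for *.twitter.com in the
    // manifest. Replaces each t.co link with the expanded URL that Twitter
    // itself stores on the anchor tag.
    function expandShortLinks() {
        for (const a of document.querySelectorAll('a[href^="https://t.co/"]')) {
            const expanded = a.getAttribute('data-expanded-url') || a.getAttribute('title');
            if (expanded) {
                a.href = expanded;
            }
        }
    }

    // Run after page load to avoid any impact on initial rendering.
    window.addEventListener('load', expandShortLinks);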

Every URL in link_cleaner's test pages shows up in browser history.

When reviewing your repo for updates, I found the test_urls.

Clicking each link lands me at a cleaned URL, but upon reviewing my browser history the uncleaned URLs are all listed.

I further checked using uBlock's logger, which displayed the uncleaned URLs before the cleaned URLs.

I assume this means that the purpose of link_cleaner is not being fulfilled.
Please correct me if I am wrong.
Thank you.

fbclid query parameter from outbound links on Facebook

Facebook has started adding a fbclid query parameter to every link leaving Facebook. This is a unique string of unknown meaning that is most likely used for tracking. We should strip this parameter from all links visited.
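
A minimal sketch of what the stripping could look like with the WebExtension webRequest API (this is illustrative, not the extension's actual code):

    // Redirect any page navigation whose URL carries an fbclid parameter
    // to the same URL without it.
    browser.webRequest.onBeforeRequest.addListener(
        (details) => {
            const url = new URL(details.url);
            if (url.searchParams.has('fbclid')) {
                url.searchParams.delete('fbclid');
                return { redirectUrl: url.href };
            }
        },
        { urls: ['<all_urls>'], types: ['main_frame'] },
        ['blocking']
    );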

Please compare to CleanLinks

I'm guessing this is related to the WebExtension shift, but https://github.com/diegocr/cleanlinks is no longer listed on the official Mozilla add-ons pages, although it's still working for me in FF ESR 52.

It has some nice options: whitelist support, and a yellow highlight to indicate when it cleans a link (which nicely exposes the problem of tracking links by making it obvious to users how common this stuff is). I'm not sure about the details of the comparison, as I only just learned about Link Cleaner.

Since it looks like CleanLinks will be going away as a non-WebExtension, I would love to see (A) a published note about the differences for those migrating to Link Cleaner, and (B) issues written up to add to Link Cleaner any desirable features from CleanLinks.

Thanks!

Chrome extension

Can you please make the extension for Chromium-based Edge? The Firefox extension is perfect. It would be great to have it on Chrome as well. Thank you.

Privacy Policy

Hey @idlewan

One thing I am in the process of doing with the ghacks user.js wiki page for extensions is to add Privacy Policy links (like below). If an extension has one that respects privacy, awesome; if not, no such "badge" (and if it has one that stinks, it will not be recommended).

Would you consider creating a simple AMO Privacy Policy (or GitHub wiki page, or root md file)? Cheers

Feature Request: configurable GET parameters to be cleaned up

It would be cool to have an option where one could enter arbitrary domains and specific GET parameters that should be cleaned whenever that domain is visited.
This way, users could add additional parameters specific to their most visited sites (which would be impractical to manage inside the codebase of link_cleaner itself), increasing the benefit of this add-on significantly.
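
As a sketch of the data shape this could take (the domains and parameter names below are invented examples):

    // Hypothetical user-editable rules: which extra parameters to drop on
    // which domains.
    const USER_RULES = {
        'shop.example.com': ['partner_id', 'campaign'],
        'news.example.org': ['share_source'],
    };

    function cleanWithUserRules(urlString) {
        const url = new URL(urlString);
        for (const param of USER_RULES[url.hostname] || []) {
            url.searchParams.delete(param);
        }
        return url.href;
    }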

Generic URL Scraper

It would be nice if this implemented a generic URL scraper of some kind, so that each individual site didn't have to be coded manually. Case in point, the link to this page from AMO:

https://outgoing.prod.mozaws.net/v1/d6c54b48bd1142d3dee6387e3d3feabc610d77ab48590ae0a43e6c20d93db01e/https%3A//github.com/idlewan/link_cleaner

If there was a way to recognize the actual URL here automatically, that would be wonderful. Similarly for Google:

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwiU4JDYyZvUAhUoxFQKHf5IAEAQFggsMAE&url=https%3A%2F%2Fgithub.com%2Fidlewan%2Flink_cleaner&usg=AFQjCNHLsiLWuJifp8qBynFPaicSw0gLGw&sig2=imkIeC-CN_z-8x5NgFr4TQ

I'm not sure of the best logic to avoid breakage. It's a very complicated issue. As a start: split the URL query parts, then URL-decode them, then compare them against the following regex to see if they match, and if so, navigate to the match instead:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
(From Appendix B in https://www.ietf.org/rfc/rfc3986.txt)

Of course, in the first URL I gave, the target isn't in a query parameter, so if there's still no match after that, perhaps running the regex against the entire URL would also be appropriate.
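
One caveat: the Appendix B regex is a parser and will match almost any string, so a practical check probably has to insist on a non-empty scheme and host. A simplified sketch along those lines (the pattern and function are mine, not a tested implementation):

    // Heuristic: look for a full absolute URL embedded in a decoded query
    // parameter (the Google case) or in the decoded path (the AMO case).
    const EMBEDDED_URL = /https?:\/\/[^\s]+/;

    function extractEmbeddedUrl(urlString) {
        const url = new URL(urlString);
        // searchParams values are already URL-decoded.
        for (const value of url.searchParams.values()) {
            if (EMBEDDED_URL.test(value)) {
                return value;
            }
        }
        // Fall back to scanning the decoded path, as with the AMO link above.
        const decodedPath = decodeURIComponent(url.pathname);
        const match = decodedPath.match(EMBEDDED_URL);
        return match ? match[0] : null;
    }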

Remove other generic tracking parameters from all URLs

As discussed in #7, some additional parameters to remove:

ga_source
ga_medium
ga_term
ga_content
ga_campaign
ga_place
yclid
_openstat
fb_action_ids
fb_action_types
fb_ref
fb_source
action_object_map
action_type_map
action_ref_map
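
A sketch of list-driven stripping using exactly these names (the code around the list is illustrative):

    const TRACKING_PARAMS = [
        'ga_source', 'ga_medium', 'ga_term', 'ga_content', 'ga_campaign',
        'ga_place', 'yclid', '_openstat', 'fb_action_ids', 'fb_action_types',
        'fb_ref', 'fb_source', 'action_object_map', 'action_type_map',
        'action_ref_map',
    ];

    function stripTrackingParams(urlString) {
        const url = new URL(urlString);
        for (const param of TRACKING_PARAMS) {
            url.searchParams.delete(param);
        }
        return url.href;
    }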

Whitelist for parameters on certain sites

Sometimes removing parameters "breaks" sites, e.g. a news site that displays a paywall when a parameter is missing.

For these cases it would be nice to whitelist the parameter in question for the affected site.
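
Reusing the TRACKING_PARAMS list from the sketch above, the whitelist could be a per-host exception map consulted before a parameter is dropped (hosts and parameter names here are invented):

    // Hypothetical exceptions: parameters that must survive on these hosts
    // even though they are on the global strip list.
    const PARAM_WHITELIST = {
        'news.example.com': ['fb_source'],
    };

    function shouldStrip(hostname, param) {
        const kept = PARAM_WHITELIST[hostname] || [];
        return TRACKING_PARAMS.includes(param) && !kept.includes(param);
    }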

Please add configurable settings in some form

As noted in this comment: Pure URL has a pretty basic set of options, but the fact that its set of tags is open for the end user to tweak puts it ahead of the (fairly scarce) competition.
So, the feature request would be like this:

  • Make that list configurable, not totally hardcoded. At the very least, make an entry in about:config.
    An editable string in the add-on settings (like in Pure URL) is fine, too (see the storage sketch below).
  • Bonus: make two fields, one hardcoded and another one for the user to fiddle with, initially empty. Just for the sake of usability/clarity.
  • Bonus: import from Pure URL on first run. Totally unnecessary, but adds some quality of life.

Related to #7 and #16
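
A sketch of how a user-editable list could be persisted with the standard WebExtension storage API (the key name userParams is arbitrary):

    // Save the user's extra parameter list, e.g. from an options page.
    function saveUserParams(params) {
        return browser.storage.sync.set({ userParams: params });
    }

    // Merge the stored list with the built-in one before cleaning.
    async function getAllParams(builtinParams) {
        const { userParams = [] } = await browser.storage.sync.get('userParams');
        return builtinParams.concat(userParams);
    }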

Reddit tracking issue

So, I currently have CleanLinks, Link Cleaner 1.4, and uBlock Origin.

If I click links on Reddit (which look direct when hovered but are actually tracking links), CleanLinks cleans them, end of story. If I turn CleanLinks off, then uBlock Origin stops the redirect and brings up a warning page asking if I am sure I want to visit the full tracking link; if I say yes, it will successfully track me.

Can Link Cleaner take care of this like CleanLinks does (so I can eventually drop CleanLinks)?

Allow opt-out for affiliate links

You should distinguish between tracking parameters and affiliate links. Although affiliate links are of course also a kind of tracking, they usually just record where the user came from, in order to compensate the search engine or referring site.
Users may actually want to keep these referrers, e.g. because there are good companies that use this feature, or because they want to support someone through it.

In order to allow this, please add an option to not strip affiliate links.

Add Disqus link cleaner

All comments in Disqus have their links wrapped in a redirect.

Example:
https://disq.us/url?url=https://support.xbox.com/it-IT/games/game-setup/my-home-xbox%3Ad1TNK5kNdMj5J-g7Dk0ANEUjTPw&cuid=4714828

Please add a link cleaner for Disqus.

Breaks KeePass autotype

I find that this otherwise wonderful Firefox addon seems to break the autotype functionality on KeePass on Windows (Windows 8); Linux (running KeePass under Mono) seems unaffected.

Link Cleaner 1.5

Improving Amazon links + sanitizing copied links

Hello @idlewan

I've spent a bit of time hacking your extension (thanks in the first place for your work 👍 ).

As a matter of fact, I've done an almost complete refactor of the Amazon link sanitizer, also stripping the SEO part from their links.
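
For context, a minimal sketch of what such a sanitizer might look like, assuming the canonical product form is /dp/ASIN (this is my illustration, not the code from my stash):

    // Reduce an Amazon product URL to its canonical /dp/ASIN form, dropping
    // the SEO slug and the query string. ASINs are 10 alphanumeric characters.
    function cleanAmazonLink(urlString) {
        const url = new URL(urlString);
        const match = url.pathname.match(/\/dp\/([A-Z0-9]{10})/i);
        return match ? url.origin + '/dp/' + match[1] : urlString;
    }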

In addition, I've also prepared some code to satisfy issue #26 (adding a new menu item).

Now I have a bunch of messy code in my stash that, with a little more love, could translate into two PRs.

My question is: is there some interest in evaluating and eventually working together on merging my code upstream? I'm asking because I don't see much activity in this project anymore.

I'd be happy to work on this repo - or on another active fork - because I like this extension.

Thank you for your attention.

Please remove l.facebook.com redirect

Great WebExtension! :)

Can you please remove Facebook's useless redirects?

I have this link on Facebook:

https://l.facebook.com/l.php?u=http%3A%2F%2Feconomie.hotnews.ro%2Fstiri-telecom-21702036-adio-taxe-roaming-15-iunie-2017-europenii-nu-vor-mai-avea-costuri-suplimentare-telefonie-internet-mobil-cand-vor-calatori-alt-stat.htm&h=ATP1kf98S0FxqErjoW8VmdSllIp4veuH2_m1jl69sEEeLzUXbkNXrVnzRMp65r5vf21LJGTgJwR2b66m97zYJoXx951n-pr4ruS1osMvT2c9ITsplpPU37RlSqJsSgba&s=1

I want it to redirect to this URL bypassing Facebook's tracking system:

http://economie.hotnews.ro/stiri-telecom-21702036-adio-taxe-roaming-15-iunie-2017-europenii-nu-vor-mai-avea-costuri-suplimentare-telefonie-internet-mobil-cand-vor-calatori-alt-stat.htm

So basically: apply this regex, take the first capturing group, URL-decode it (the u parameter is percent-encoded, not base64), and then redirect to it:

https://l.facebook.com/l.php[?]u=(.*)&h=.*

I think an add-on doing this would also be faster, since l.facebook.com sometimes seems to hang on slow Internet connections.
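
For what it's worth, the same unwrapping can be sketched with the URL API instead of a regex, which handles the percent-decoding automatically (illustrative only):

    // Unwrap an l.facebook.com/l.php link: the real destination is carried
    // in the percent-encoded "u" query parameter.
    function unwrapFacebookLink(urlString) {
        const url = new URL(urlString);
        if (url.hostname === 'l.facebook.com' && url.pathname === '/l.php') {
            return url.searchParams.get('u') || urlString;
        }
        return urlString;
    }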

Thank you!

AliExpress won't work anymore

Hey there, I really like this plugin, but I've had trouble with AliExpress for a while now.
I was able to pin it down to Link Cleaner: if I disable it, AliExpress works again.

If I open a link, recently I only see an "oops, something went wrong" screen where I can slide a verify slider.
After this screen I receive a "Deny from x5" screen.
Could someone look into this?

Example link: https://www.aliexpress.com/item/32948075240.html (shortened)
Long: https://www.aliexpress.com/item/Reusable-Metal-Drinking-Straws-304-Stainless-Steel-Sturdy-Bent-Straight-Drinks-Straw-with-Cleaning-Brush-Bar/32948075240.html?spm=2114.search0104.3.11.2b197e1eZknh4q&ws_ab_test=searchweb0_0,searchweb201602_9_10065_10068_10547_319_317_10548_10696_10084_453_10083_454_10618_10304_10307_10820_10821_537_10302_536_10843_10059_10884_10887_321_322_10103,searchweb201603_53,ppcSwitch_0&algo_expid=5791e96c-b334-4b73-8500-abed1e35c5a6-1&algo_pvid=5791e96c-b334-4b73-8500-abed1e35c5a6&transAbTest=ae803_4

LinkedIn.com links?

Hi,
For selected websites the extension works great. Is it possible to add linkedin.com to the list?
Or better, add an option for users to add a site themselves?

Cheers

Add dealabs.net redirect

Hello,
Could you clean URLs from this website?

For example:
When we click this link:
https://www.dealabs.com/visit/comment/24553867/8628618
it redirects to:
https://dealabs.digidip.net/visit?url=https%3A%2F%2Fwww.fnac.com%2FFreeSync-et-G-Sync-sur-les-ecrans-PC-gaming-c-est-quoi%2Fcp33210%2Fw-4&ppref=https%3A%2F%2Fwww.dealabs.com%2Fbons-plans%2Fecran-35-asus-rog-strix-xg35vq-3440x1440p-led-4-ms-1609701&ref=785619855

It should give:
https://www.fnac.com/FreeSync-et-G-Sync-sur-les-ecrans-PC-gaming-c-est-quoi/cp33210/w-4

De-amp-ify Google AMP pages

Google AMP is a “dialect” of HTML, the computer code that websites are written in. A website that uses AMP has empirical benefits, such as improved loading speed, and Google rewards sites that use AMP by bumping them up in search results.

A website that uses AMP has a few downsides, and the biggest one is that everything is hosted by Google, so you can be sure they'll be data mining every bit of it.

An example of an AMP URL looks like this: https://amp.theguardian.com/technology/2017/jul/24/microsoft-paint-kill-off-after-32-years-graphics-editing-program. A filter to turn amp.example.com subdomains into example.com and strip /amp/ from URLs would probably hit most of the cases. Some AMP sites are hosted on google.com itself, e.g. https://www.google.com/amp/www.latimes.com/local/lanow/la-me-uc-irvine-rescissions-20170728-story,amp.html#ampshare=http://www.latimes.com/local/lanow/la-me-uc-irvine-rescissions-20170728-story.html. Fortunately the actual URL is easily extractable in these cases.
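
A sketch of that filter (simplified; real-world AMP URLs have more variants than these three rules cover):

    // De-AMP a URL: unwrap google.com/amp/ links, drop amp. subdomains,
    // and strip /amp/ path segments.
    function deAmpify(urlString) {
        const url = new URL(urlString);
        // google.com/amp/<real-url>: the wrapped page is the rest of the path.
        if (url.hostname.endsWith('google.com') && url.pathname.startsWith('/amp/')) {
            const inner = url.pathname.slice('/amp/'.length);
            return inner.startsWith('http') ? inner : 'http://' + inner;
        }
        // amp.example.com -> example.com
        if (url.hostname.startsWith('amp.')) {
            url.hostname = url.hostname.slice('amp.'.length);
        }
        // .../amp/... -> .../...
        url.pathname = url.pathname.replace('/amp/', '/');
        return url.href;
    }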

AMP is a classic Embrace-Extend-Extinguish tactic to subvert the open web and lock content into Google's walled garden. We cannot let it go ignored.
