
link_cleaner's People

Contributors

idlewan, shvchk


link_cleaner's Issues

t.co redirect by Twitter

Twitter adds an unnecessary redirect to all links posted by users, such as https://t.co/0HlJ9BeeJf. Unfortunately the true URL cannot be recovered from t.co URLs, but most <a> tags on twitter.com have a data-expanded-url or title attribute with the true URL.

This would require injecting JavaScript into the page (probably limited to *.twitter.com), but if that is done after page load there shouldn't be any performance issues.
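
A rough content-script sketch of that idea (untested; the selector and function name are mine, only the attribute names come from above):

    // Hypothetical content script, registered for *.twitter.com in the
    // manifest. Replaces each t.co link with the expanded URL that Twitter
    // itself stores on the anchor tag.
    function expandShortLinks() {
        for (const a of document.querySelectorAll('a[href^="https://t.co/"]')) {
            const expanded = a.getAttribute('data-expanded-url') || a.getAttribute('title');
            if (expanded) {
                a.href = expanded;
            }
        }
    }

    // Run after page load to avoid any impact on initial rendering.
    window.addEventListener('load', expandShortLinks);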

Every URL in link_cleaner's test pages shows up in browser history.

When reviewing your repo for updates, I found the test_urls.

Clicking each link lands me at a cleaned URL, but upon reviewing my browser history the uncleaned URLs are all listed.

I further checked using uBlock's logger, which displayed the uncleaned URLs before the cleaned URLs.

I assume this means that the purpose of link_cleaner is not being fulfilled.
Please correct me if I am wrong.
Thank you.

fbclid query parameter from outbound links on Facebook

Facebook has started adding a fbclid query parameter to every link leaving Facebook. This is a unique string of unknown meaning that is most likely used for tracking. We should strip this parameter from all links visited.
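
A minimal sketch of what the stripping could look like with the WebExtension webRequest API (this is illustrative, not the extension's actual code):

    // Redirect any page navigation whose URL carries an fbclid parameter
    // to the same URL without it.
    browser.webRequest.onBeforeRequest.addListener(
        (details) => {
            const url = new URL(details.url);
            if (url.searchParams.has('fbclid')) {
                url.searchParams.delete('fbclid');
                return { redirectUrl: url.href };
            }
        },
        { urls: ['<all_urls>'], types: ['main_frame'] },
        ['blocking']
    );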

Please compare to CleanLinks

I'm guessing this is related to the WebExtension shift, but https://github.com/diegocr/cleanlinks is no longer listed on the official Mozilla add-ons pages, although it's still working for me in FF ESR 52.

It has some nice options: whitelist support, and a yellow highlight to indicate when it cleans a link (which nicely exposes the problem of tracking links by making it obvious to users how common this stuff is). I'm not sure about the details of the comparison, as I only just learned about Link Cleaner.

Since it looks like CleanLinks will be going away as a non-WebExtension, I would love to see (A) a published note about the differences for those migrating to Link Cleaner, and (B) issues written up to add to Link Cleaner any desirable features from CleanLinks.

Thanks!

Chrome extension

Can you please make the extension for Chromium-based Edge? The Firefox extension is perfect. It would be great to have it on Chrome as well. Thank you.

Privacy Policy

Hey @idlewan

One thing I am in the process of doing with the ghacks user.js wiki page for extensions is to add Privacy Policy links (like below). If an extension has one that respects privacy, awesome; if not, no such "badge" (and if it has one that stinks, it will not be recommended).

Would you consider creating a simple AMO Privacy Policy (or GitHub wiki page, or root md file)? Cheers

Feature Request: configurable GET parameters to be cleaned up

It would be cool to have an option where one could enter arbitrary domains and specific GET parameters that should be cleaned whenever that domain is visited.
This way, users could add additional parameters specific to their most visited sites (which would be impractical to manage inside the codebase of link_cleaner itself), increasing the benefit of this add-on significantly.
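
As a sketch of the data shape this could take (the domains and parameter names below are invented examples):

    // Hypothetical user-editable rules: which extra parameters to drop on
    // which domains.
    const USER_RULES = {
        'shop.example.com': ['partner_id', 'campaign'],
        'news.example.org': ['share_source'],
    };

    function cleanWithUserRules(urlString) {
        const url = new URL(urlString);
        for (const param of USER_RULES[url.hostname] || []) {
            url.searchParams.delete(param);
        }
        return url.href;
    }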

Generic URL Scraper

It would be nice if this implemented a generic URL scraper of some kind, so that each individual site didn't have to be coded manually. Case in point, the link to this page from AMO:

https://outgoing.prod.mozaws.net/v1/d6c54b48bd1142d3dee6387e3d3feabc610d77ab48590ae0a43e6c20d93db01e/https%3A//github.com/idlewan/link_cleaner

If there was a way to recognize the actual URL here automatically, that would be wonderful. Similarly for Google:

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwiU4JDYyZvUAhUoxFQKHf5IAEAQFggsMAE&url=https%3A%2F%2Fgithub.com%2Fidlewan%2Flink_cleaner&usg=AFQjCNHLsiLWuJifp8qBynFPaicSw0gLGw&sig2=imkIeC-CN_z-8x5NgFr4TQ

I'm not sure of the best logic to avoid breakage. It's a very complicated issue. As a start: split the URL query parts, then URL-decode them, then compare them against the following regex to see if they match, and if so, navigate to the match instead:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
(From Appendix B in https://www.ietf.org/rfc/rfc3986.txt)

Of course, in the first URL I gave, the target isn't in a query parameter, so if there's still no match after that, perhaps running the regex against the entire URL would also be appropriate.
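
One caveat: the Appendix B regex is a parser and will match almost any string, so a practical check probably has to insist on a non-empty scheme and host. A simplified sketch along those lines (the pattern and function are mine, not a tested implementation):

    // Heuristic: look for a full absolute URL embedded in a decoded query
    // parameter (the Google case) or in the decoded path (the AMO case).
    const EMBEDDED_URL = /https?:\/\/[^\s]+/;

    function extractEmbeddedUrl(urlString) {
        const url = new URL(urlString);
        // searchParams values are already URL-decoded.
        for (const value of url.searchParams.values()) {
            if (EMBEDDED_URL.test(value)) {
                return value;
            }
        }
        // Fall back to scanning the decoded path, as with the AMO link above.
        const decodedPath = decodeURIComponent(url.pathname);
        const match = decodedPath.match(EMBEDDED_URL);
        return match ? match[0] : null;
    }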

Remove other generic tracking parameters from all URLs

As discussed in #7, some additional parameters to remove:

ga_source
ga_medium
ga_term
ga_content
ga_campaign
ga_place
yclid
_openstat
fb_action_ids
fb_action_types
fb_ref
fb_source
action_object_map
action_type_map
action_ref_map
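
A sketch of list-driven stripping using exactly these names (the code around the list is illustrative):

    const TRACKING_PARAMS = [
        'ga_source', 'ga_medium', 'ga_term', 'ga_content', 'ga_campaign',
        'ga_place', 'yclid', '_openstat', 'fb_action_ids', 'fb_action_types',
        'fb_ref', 'fb_source', 'action_object_map', 'action_type_map',
        'action_ref_map',
    ];

    function stripTrackingParams(urlString) {
        const url = new URL(urlString);
        for (const param of TRACKING_PARAMS) {
            url.searchParams.delete(param);
        }
        return url.href;
    }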

Whitelist for parameters on certain sites

Sometimes removing parameters "breaks" sites, e.g. a news site that displays a paywall when a parameter is missing.

For these cases it would be nice to whitelist the parameter in question for the affected site.
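
Reusing the TRACKING_PARAMS list from the sketch above, the whitelist could be a per-host exception map consulted before a parameter is dropped (hosts and parameter names here are invented):

    // Hypothetical exceptions: parameters that must survive on these hosts
    // even though they are on the global strip list.
    const PARAM_WHITELIST = {
        'news.example.com': ['fb_source'],
    };

    function shouldStrip(hostname, param) {
        const kept = PARAM_WHITELIST[hostname] || [];
        return TRACKING_PARAMS.includes(param) && !kept.includes(param);
    }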

Please add configurable settings in some form

As noted in this comment: Pure URL has a pretty basic set of options, but the fact that its set of tags is open for the end user to tweak puts it ahead of the (fairly scarce) competition.
So, the feature request would be like this:

  • Make that list configurable, not totally hardcoded. At the very least, make an entry in about:config.
    An editable string in the add-on settings (like in Pure URL) is fine, too (see the storage sketch below).
  • Bonus: make two fields, one hardcoded and another one for the user to fiddle with, initially empty. Just for the sake of usability/clarity.
  • Bonus: import from Pure URL on first run. Totally unnecessary, but adds some quality of life.

Related to #7 and #16
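
A sketch of how a user-editable list could be persisted with the standard WebExtension storage API (the key name userParams is arbitrary):

    // Save the user's extra parameter list, e.g. from an options page.
    function saveUserParams(params) {
        return browser.storage.sync.set({ userParams: params });
    }

    // Merge the stored list with the built-in one before cleaning.
    async function getAllParams(builtinParams) {
        const { userParams = [] } = await browser.storage.sync.get('userParams');
        return builtinParams.concat(userParams);
    }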

Reddit tracking issue

So, I currently have CleanLinks, Link Cleaner 1.4, and uBlock Origin.

If I click links on Reddit (which look direct when hovered but are actually tracking links), CleanLinks cleans them, end of story. If I turn CleanLinks off, then uBlock Origin stops the redirect and brings up a warning page asking if I am sure I want to visit the full tracking link; if I say yes, it will successfully track me.

Can Link Cleaner take care of this like CleanLinks does (so I can eventually drop CleanLinks)?

Allow opt-out for affiliate links

You should distinguish between tracking parameters and affiliate links. Although affiliate links are of course also a kind of tracking, they usually just record where the user came from, in order to compensate the search engine or referring site.
Users may actually want to keep these referrers, e.g. because there are good companies that use this feature, or because they want to support someone through it.

In order to allow this, please add an option to not strip affiliate links.

Add Disqus link cleaner

All comments in Disqus have their links wrapped in a redirect.

Example:
https://disq.us/url?url=https://support.xbox.com/it-IT/games/game-setup/my-home-xbox%3Ad1TNK5kNdMj5J-g7Dk0ANEUjTPw&cuid=4714828

Please add a link cleaner for Disqus.

Breaks KeePass autotype

I find that this otherwise wonderful Firefox addon seems to break the autotype functionality on KeePass on Windows (Windows 8); Linux (running KeePass under Mono) seems unaffected.

Link Cleaner 1.5

Improving Amazon links + sanitizing copied links

Hello @idlewan

I've spent a bit of time hacking your extension (thanks in the first place for your work 👍 ).

As a matter of fact, I've done an almost complete refactor of the Amazon link sanitizer, also stripping the SEO part from their links.
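
For context, a minimal sketch of what such a sanitizer might look like, assuming the canonical product form is /dp/ASIN (this is my illustration, not the code from my stash):

    // Reduce an Amazon product URL to its canonical /dp/ASIN form, dropping
    // the SEO slug and the query string. ASINs are 10 alphanumeric characters.
    function cleanAmazonLink(urlString) {
        const url = new URL(urlString);
        const match = url.pathname.match(/\/dp\/([A-Z0-9]{10})/i);
        return match ? url.origin + '/dp/' + match[1] : urlString;
    }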

In addition, I've also prepared some code to satisfy issue #26 (adding a new menu item).

Now I have a bunch of messy code in my stash that, with a little more love, could translate into two PRs.

My question is: is there some interest in evaluating and eventually working together on merging my code upstream? I'm asking because I don't see much activity in this project anymore.

I'd be happy to work on this repo - or on another active fork - because I like this extension.

Thank you for your attention.

Please remove l.facebook.com redirect

Great WebExtension! :)

Can you please remove Facebook's useless redirects?

I have this link on Facebook:

https://l.facebook.com/l.php?u=http%3A%2F%2Feconomie.hotnews.ro%2Fstiri-telecom-21702036-adio-taxe-roaming-15-iunie-2017-europenii-nu-vor-mai-avea-costuri-suplimentare-telefonie-internet-mobil-cand-vor-calatori-alt-stat.htm&h=ATP1kf98S0FxqErjoW8VmdSllIp4veuH2_m1jl69sEEeLzUXbkNXrVnzRMp65r5vf21LJGTgJwR2b66m97zYJoXx951n-pr4ruS1osMvT2c9ITsplpPU37RlSqJsSgba&s=1

I want it to redirect to this URL bypassing Facebook's tracking system:

http://economie.hotnews.ro/stiri-telecom-21702036-adio-taxe-roaming-15-iunie-2017-europenii-nu-vor-mai-avea-costuri-suplimentare-telefonie-internet-mobil-cand-vor-calatori-alt-stat.htm

So basically: apply this regex, take the first capturing group, URL-decode it (the u parameter is percent-encoded, not base64), and then redirect to it:

https://l.facebook.com/l.php[?]u=(.*)&h=.*

I think an add-on doing this would also be faster, since l.facebook.com sometimes seems to hang on slow Internet connections.
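
For what it's worth, the same unwrapping can be sketched with the URL API instead of a regex, which handles the percent-decoding automatically (illustrative only):

    // Unwrap an l.facebook.com/l.php link: the real destination is carried
    // in the percent-encoded "u" query parameter.
    function unwrapFacebookLink(urlString) {
        const url = new URL(urlString);
        if (url.hostname === 'l.facebook.com' && url.pathname === '/l.php') {
            return url.searchParams.get('u') || urlString;
        }
        return urlString;
    }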

Thank you!

AliExpress won't work anymore

Hey there, I really like this plugin, but I've had trouble with AliExpress for a while now.
I was able to pin it down to Link Cleaner: if I disable it, AliExpress works again.

If I open a link, recently I only see an "oops, something went wrong" screen where I can slide a verify slider.
After this screen I receive a "Deny from x5" screen.
Could someone look into this?

Example link: https://www.aliexpress.com/item/32948075240.html (shortened)
Long: https://www.aliexpress.com/item/Reusable-Metal-Drinking-Straws-304-Stainless-Steel-Sturdy-Bent-Straight-Drinks-Straw-with-Cleaning-Brush-Bar/32948075240.html?spm=2114.search0104.3.11.2b197e1eZknh4q&ws_ab_test=searchweb0_0,searchweb201602_9_10065_10068_10547_319_317_10548_10696_10084_453_10083_454_10618_10304_10307_10820_10821_537_10302_536_10843_10059_10884_10887_321_322_10103,searchweb201603_53,ppcSwitch_0&algo_expid=5791e96c-b334-4b73-8500-abed1e35c5a6-1&algo_pvid=5791e96c-b334-4b73-8500-abed1e35c5a6&transAbTest=ae803_4

LinkedIn.com links?

Hi,
For selected websites the extension works great. Is it possible to add linkedin.com to the list?
Or better, add an option for users to add a site themselves?

Cheers

Add dealabs.net redirect

Hello,
Could you clean URLs from this website?

For example:
When we click this link:
https://www.dealabs.com/visit/comment/24553867/8628618
it redirects to:
https://dealabs.digidip.net/visit?url=https%3A%2F%2Fwww.fnac.com%2FFreeSync-et-G-Sync-sur-les-ecrans-PC-gaming-c-est-quoi%2Fcp33210%2Fw-4&ppref=https%3A%2F%2Fwww.dealabs.com%2Fbons-plans%2Fecran-35-asus-rog-strix-xg35vq-3440x1440p-led-4-ms-1609701&ref=785619855

It should give:
https://www.fnac.com/FreeSync-et-G-Sync-sur-les-ecrans-PC-gaming-c-est-quoi/cp33210/w-4

De-amp-ify Google AMP pages

Google AMP is a “dialect” of HTML, the computer code that websites are written in. A website that uses AMP has empirical benefits, such as improved loading speed, and Google rewards sites that use AMP by bumping them up in search results.

A website that uses AMP has a few downsides, and the biggest one is that everything is hosted by Google, so you can be sure they'll be data mining every bit of it.

An example of an AMP URL looks like this: https://amp.theguardian.com/technology/2017/jul/24/microsoft-paint-kill-off-after-32-years-graphics-editing-program. A filter to turn amp.example.com subdomains into example.com and strip /amp/ from URLs would probably hit most of the cases. Some AMP sites are hosted on google.com itself, e.g. https://www.google.com/amp/www.latimes.com/local/lanow/la-me-uc-irvine-rescissions-20170728-story,amp.html#ampshare=http://www.latimes.com/local/lanow/la-me-uc-irvine-rescissions-20170728-story.html. Fortunately the actual URL is easily extractable in these cases.
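
A sketch of that filter (simplified; real-world AMP URLs have more variants than these three rules cover):

    // De-AMP a URL: unwrap google.com/amp/ links, drop amp. subdomains,
    // and strip /amp/ path segments.
    function deAmpify(urlString) {
        const url = new URL(urlString);
        // google.com/amp/<real-url>: the wrapped page is the rest of the path.
        if (url.hostname.endsWith('google.com') && url.pathname.startsWith('/amp/')) {
            const inner = url.pathname.slice('/amp/'.length);
            return inner.startsWith('http') ? inner : 'http://' + inner;
        }
        // amp.example.com -> example.com
        if (url.hostname.startsWith('amp.')) {
            url.hostname = url.hostname.slice('amp.'.length);
        }
        // .../amp/... -> .../...
        url.pathname = url.pathname.replace('/amp/', '/');
        return url.href;
    }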

AMP is a classic Embrace-Extend-Extinguish tactic to subvert the open web and lock content into Google's walled garden. We cannot let it go ignored.
