ublock-origin-dev-filter's Introduction

uBlock-Origin-dev-filter

Filters to block and remove copycat websites from DuckDuckGo, Google and other search engines. The project used to be specific to dev websites like StackOverflow or GitHub, but it now also covers others, such as Wikipedia.

To use these filters, you need to have uBlock Origin installed.

Import into uBlock Origin

Select the filter flavor you want, depending on your needs and search engine:

💻 dev supports StackOverflow + GitHub + NPM (the original dev-oriented filter)
🌐 global supports StackOverflow + GitHub + NPM + Wikipedia + SEO Spam

                    dev                    global
Google              uBO - add this filter  uBO - add this filter
DuckDuckGo          uBO - add this filter  uBO - add this filter
Google+DDG          uBO - add this filter  uBO - add this filter
Startpage           uBO - add this filter  uBO - add this filter
Brave               uBO - add this filter  uBO - add this filter
Ecosia              uBO - add this filter  uBO - add this filter
All Search Engines  uBO - add this filter  uBO - add this filter

More granular versions (StackOverflow-only, GitHub-only, ...)
                    StackOverflow  GitHub      NPM         Wikipedia   SEO Spam
Google              add in uBO     add in uBO  add in uBO  add in uBO  add in uBO
DuckDuckGo          add in uBO     add in uBO  add in uBO  add in uBO  add in uBO
Google+DDG          add in uBO     add in uBO  add in uBO  add in uBO  add in uBO
Startpage           add in uBO     add in uBO  add in uBO  add in uBO  add in uBO
Brave               add in uBO     add in uBO  add in uBO  add in uBO  add in uBO
Ecosia              add in uBO     add in uBO  add in uBO  add in uBO  add in uBO
All Search Engines  add in uBO     add in uBO  add in uBO  add in uBO  add in uBO

How to import uBlock filters manually

  1. Open uBlock Origin settings
  2. Under the "Filter lists" tab, scroll to the bottom where it says “Custom” and click the “Import” checkbox to reveal the custom URL textbox
  3. Paste the URL https://raw.githubusercontent.com/quenhus/uBlock-Origin-dev-filter/main/dist/google_duckduckgo/all.txt into the textbox
  4. Press Apply Changes in the upper left

Note: In dist/, you can find filters for other search engines (Google, DuckDuckGo, Startpage or Brave). You can use and combine these filters by using the raw URL of dist/ files.

Other filter formats (uBlacklist, hosts filter, ...)

This project also provides filters in other formats:

  • uBlacklist (more efficient than uBO in this case)
  • macOS userscript
  • Domains filter (can be used with a Pi-hole/Firewall)
  • DNS hosts filter (can be used in /etc/hosts)

                  dev   global  StackOverflow  GitHub  NPM   Wikipedia  SEO Spam
uBlacklist        Link  Link    Link           Link    Link  Link       Link
macOS userscript  Link  Link    Link           Link    Link  Link       Link
Domains filter    Link  Link    Link           Link    Link  Link       Link
DNS hosts filter  Link  Link    Link           Link    Link  Link       Link
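
As a rough illustration of how the domains format and the hosts format relate, the sketch below turns a plain list of domains into lines suitable for /etc/hosts. The function name, the '#' comment convention and the 0.0.0.0 sink address are assumptions made for this example; the files published by the project may be produced differently.

```python
def domains_to_hosts(domains, sink="0.0.0.0"):
    """Turn an iterable of domain names into hosts-file lines.

    Blank lines and lines starting with '#' are skipped (assumed convention).
    """
    lines = []
    for raw in domains:
        domain = raw.strip().lower()
        if not domain or domain.startswith("#"):
            continue
        # A hosts entry is "<address> <hostname>"; the sink address is
        # non-routable, so lookups for the domain go nowhere.
        lines.append(f"{sink} {domain}")
    return lines


if __name__ == "__main__":
    # Placeholder domains, not entries from the real list.
    for line in domains_to_hosts(["copycat.example", "# a comment", "mirror.example"]):
        print(line)
```

Note that a hosts entry blocks the whole domain on the machine, whereas the uBO filters can also hide the matching results from search pages.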

Adding URLs

Please create a pull request or open an issue with evidence against the "copycats".

Security

For simplicity and auto-updates, the uBlock Origin filters track the latest commit of the main branch, like every other uBO filter list. For now, this method does not seem to raise security issues. However, you can also import the filters with a reference to a specific commit instead of the main branch. Such filters won't auto-update, but you can audit them yourself.
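
A minimal sketch of the difference between the two kinds of URL, assuming the usual raw.githubusercontent.com layout of owner/repo/ref/path. The helper name is hypothetical, and the commit SHA in the usage example is a placeholder, not a real commit of this repository.

```python
REPO = "quenhus/uBlock-Origin-dev-filter"


def raw_filter_url(path, ref="main"):
    """Build a raw.githubusercontent.com URL for a file in the repository.

    ref can be a branch name (the list auto-updates) or a full commit SHA
    (the list is pinned and can be audited once).
    """
    return f"https://raw.githubusercontent.com/{REPO}/{ref}/{path.lstrip('/')}"


if __name__ == "__main__":
    path = "dist/google_duckduckgo/all.txt"
    print(raw_filter_url(path))  # follows the main branch
    # Placeholder SHA: replace it with a commit you have reviewed.
    print(raw_filter_url(path, ref="0123456789abcdef0123456789abcdef01234567"))
```

Either URL can be pasted into the custom import textbox of uBlock Origin; only the first one will pick up later changes to the list.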

Scope of this filter

To me, a copycat is a website that:

  • mirrors most of GitHub/SO content, automatically and without useful additional work on the content,
  • prevents the user from interacting easily with the resource (upvote, comment or reply),
  • might use SEO techniques to catch users who would have otherwise reached the original resource,
  • overall, offers no benefits for users over the original resource.

To be more precise:

  • I do not consider automatic translation as a benefit;
  • I still consider a mirror to be a copycat even when it gives clear attribution;
  • I do not consider a mirror created for privacy concerns to be a copycat, unless it uses aggressive SEO techniques;
  • This uBlock filter is my own filter, made for my own usage, and it obviously can't satisfy everyone.

Sources

Do your own

  1. List the URLs that you want to block in a .txt file in the data/ folder
  2. Run src/generate.py, which generates files in dist/ that you can use as uBlock filters

Note: You can use letsblock.it to create your own filter.
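
To give an idea of what such a generator has to do, here is a small standalone sketch. It is not the project's src/generate.py: the function names, the '#' comment convention, and the default engine and container selector are assumptions, and the rule shape is borrowed from a user suggestion in the project's issue tracker.

```python
def pattern_to_needle(pattern):
    """Reduce a match pattern such as '*://host/path/*' to a plain substring.

    Everything from the first remaining '*' onward is dropped, so a pattern
    with a wildcard in the middle ends up matching more than intended.
    """
    needle = pattern.strip()
    if needle.startswith("*://"):
        needle = needle[len("*://"):]
    return needle.split("*", 1)[0]


def build_rules(lines, engine="duckduckgo.*", container=".results>div"):
    """Turn pattern lines into cosmetic rules that hide matching results."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        needle = pattern_to_needle(line)
        rules.append(f'{engine}##{container}:has(a[href*="{needle}"])')
    return rules


if __name__ == "__main__":
    for rule in build_rules(["*://www.wikiwand.com/*", "*://worddisk.com/wiki/*"]):
        print(rule)
```

The container selector depends on each search engine's markup and tends to change over time, so it is kept as a parameter rather than hard-coded.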

ublock-origin-dev-filter's Issues

Add Brave Search support

Just posting this here as a feature request. I might eventually get the time to add this myself, but I figured it's good to post this here in case other users are looking for the support.

Effectively, we'd just need to expand the Google + DuckDuckGo support to include search.brave.com as well.

Request: add these sites to the filter

I never thought much about these copycat sites, until this...
https://www.google.com/search?q=baldur+site:fuscin.com

Practically fell backward outta my chair!

{funny enough, this comes up blank, lol: https://www.google.com/search?q=quenhus+site%3Afuscin.com}

Github copycats. (haven't checked if they copy other sites)

Evidence: https://www.fuscin.com/btigi/iiTweak
Original: https://github.com/btigi/iiTweak

Evidence: https://iboxshare.com/quenhus/uBlock-Origin-dev-filter
Original: https://github.com/quenhus/uBlock-Origin-dev-filter

Request: Explain a little what "copycat" websites are

I don't actually know what copycat websites are and why they should be removed from search results.
Are these the websites that mirror github/SO content as their own and therefore catch users who would have otherwise reached github/SO directly?
I looked into the 'data' folder and yet somehow decided NOT to actually try to visit any of those sites, to see what they are ;)

Request: add serveanswer.com and solveforums.msomimaktaba.com to the filter

GitHub Raw isn't meant for serving files

GitHub Raw isn't made for serving files to tons of people, even though it can, so I would recommend setting up Pages on this repo to reduce strain on GitHub's systems.

Request: add 12 URLs to the filter

ask-ubuntu.ru - askubuntu.com translation, has origin link
Evidence: https://ask-ubuntu.ru/questions/2337/ispravit-povrezhdennyij-razdel-ntfs-bez-windows
Original: https://askubuntu.com/questions/47700/fix-corrupt-ntfs-partition-without-windows

askubuntu.ru - askubuntu.com translation, no origin link
Evidence: https://askubuntu.ru/questions/170334/vosstanovit-ntfs-bez-windows-dublikat
Original: https://askubuntu.com/questions/330733/repair-ntfs-without-windows

kompsekret.ru - superuser.com translation, origin link leads to its own page
Evidence: https://kompsekret.ru/q/best-ways-to-fix-outlook-2007-error-your-mailbox-is-over-its-size-limit-288222/
Original: https://superuser.com/questions/293412/best-ways-to-fix-outlook-2007-error-your-mailbox-is-over-its-size-limit

ohandroid.com - stackoverflow.com translation, no origin link, answers merged with the question in one chunk
Evidence: http://www.ohandroid.com/android-widget-switch-toggled-event-listener.html
Original: https://stackoverflow.com/questions/21010924/android-widget-switch-toggled-event-listener/21010941

poweruser.guru - superuser.com translation, has origin link
Evidence: https://poweruser.guru/questions/324399/проводник-windows-держит-дескриптор-открытым-на-исполняемых-файлах
Original: https://superuser.com/questions/324399/windows-explorer-keeps-handle-open-on-executable-files

ruphp.com - stackoverflow.com translation, no origin link, answers merged with the question in one chunk
Evidence: https://ruphp.com/google-x43e-2.html
Original: https://stackoverflow.com/questions/10549049/accessing-google-bookmarks-server-side-with-php

server-fault.ru - serverfault.com translation, origin link leads to its own page
Evidence: https://server-fault.ru/questions/273014/kak-vosstanovit-zapisat-metku-klonirovat-suschestvujuschuju
Original: https://serverfault.com/questions/1085397/how-can-one-recover-write-a-label-clone-existing-one

sprosi.pro - stackoverflow.com translation, has origin link
Evidence: https://sprosi.pro/questions/824506/kak-mne-obnovit-razvetvlennyiy-repozitoriy-github
Original: https://stackoverflow.com/questions/7244321/how-do-i-update-or-sync-a-forked-repository-on-github

stackru.com - stackoverflow.com translation, has origin link
Evidence: https://stackru.com/questions/2261423/eclipse-javalangclassnotfoundexception
Original: https://stackoverflow.com/questions/1052978/eclipse-java-lang-classnotfoundexception

switch-case.ru - redirect to answer-id.com/ru/

ubuntugeeks.com - askubuntu.com translation, has origin link
Evidence: https://ubuntugeeks.com/questions/1/how-to-check-internet-speed-via-terminal
Original: https://askubuntu.com/questions/104755/how-to-check-internet-speed-via-terminal

ubuntuplace.info - askubuntu.com translation, has origin link
Evidence: https://ubuntuplace.info/questions/1/comment-verifier-la-vitesse-internet-le-terminal
Original: https://askubuntu.com/questions/104755/how-to-check-internet-speed-via-terminal

wikiroot.ru - superuser.com translation, no origin link
Evidence: https://wikiroot.ru/question/moy-bios-zavisaet-na-testirovanie-pamyati-chto-mojet-vyzvaty-eto
Original: https://superuser.com/questions/185086/my-bios-hangs-at-testing-memory-what-could-cause-this

russianblogs.com - no direct evidence, but it looks very much like a copycat site

Some dead links that may come back someday:
fliplinux.com
issue.life
javahow.ru
programmerz.ru
qaru.site
ru.craftjs.com
stackoverrun.com
unix.stackovernet.com
vpros.ru

Request: add softbranchdevelopers.com to the filter

Reddit:

Evidence: https://softbranchdevelopers.com/typescript-type-substitution-for-proptypes-type/
Original: https://www.reddit.com/r/reactjs/comments/oyxbfn/typescript_type_substitution_for_proptypes_type/

Github:

Evidence: https://softbranchdevelopers.com/a-simple-vue-js-app-made-for-learning-the-framework-and-how-to-work-with-api/
Original: https://github.com/matteotagliatti/movie-vue-app?ref=vuejsexamples.com

Evidence: https://softbranchdevelopers.com/starter-pack-for-creating-ui-kit-on-vue-js/
Original: https://github.com/NorvikIT/vue-uikit-starter?ref=vuejsexamples.com

Comments

Sometimes there are non-human generic comments like the one on this article:
https://softbranchdevelopers.com/learn-swift-for-c-developers/

Thanks:

  1. Thanks for the awesome filter list. I didn't know the situation was so bad until I started working with more JavaScript technology. I thought it was just a myth that the situation is getting out of control.
  2. Do you plan to also include reddit copycats?

Bug: URLs with a non-wildcard path are not filtered on DDG

For example, the DDG filter for *://www.scholarship.edu.vn/wiki/* is duckduckgo.com##[data-domain$="www.scholarship.edu.vn/wiki"]

However, the data-domain attribute, as its name implies, only contains the domain part of the URL. In this example, it is www.scholarship.edu.vn, so the generated selector never matches.

We have to rewrite the DDG filter to handle this case and generate a more precise uBO filter.
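
One possible direction, sketched under assumptions: keep only the host part of the pattern when building the data-domain selector. The function name is hypothetical, and this is not the fix that was actually merged. It also hides every result from that host, which is broader than the original path pattern.

```python
def ddg_data_domain_rule(pattern):
    """Build a DDG cosmetic rule from a '*://host/path/*' match pattern.

    Only the host is kept, because data-domain never contains a path.
    """
    needle = pattern.strip()
    if needle.startswith("*://"):
        needle = needle[len("*://"):]
    host = needle.split("/", 1)[0]
    return f'duckduckgo.com##[data-domain$="{host}"]'


if __name__ == "__main__":
    print(ddg_data_domain_rule("*://www.scholarship.edu.vn/wiki/*"))
```

A more precise filter would need a second condition on the result link itself, so that only URLs under /wiki/ are hidden.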

Search result removal not working for DDG

I have the DDG and Google bundle for Firefox. Links on the list are blocked in that I cannot access the sites, but results with those links are still showing up in search. Using this rule instead of the current one does remove the result however:

duckduckgo.*##.results>div:has(a[href*="copycatsite.com"])

Request: add Wikipedia clones

Another category of copycat sites that I find maddening are Wikipedia clones. Wikipedia is freely licensed and even allows database dumps, so there are a ton of annoying mirrors out there. These sites can show up in search results for just about any topic. Here are a few:

*://www.wikiwand.com/*
*://wiki2.org/*
*://worddisk.com/wiki/*
*://thereaderwiki.com/*
*://www.absoluteastronomy.com/*
*://encyclopedia.thefreedictionary.com/*
*://peoplepill.com/*
*://www.algebra.com/algebra/about/history/*.wikipedia

You can see that these are clones from the following links:

There are a lot more copycats out there (Wikipedia maintains a giant list, but it includes many sites that have just copied a small amount of content and not the entire site, and I'm sure there are some dead/outdated entries as well), but I figured I'd just post a few to start.

Request: add vuejscode.com to the filter

OK to include in the letsblock.it project?

Hello @quenhus,

I am the maintainer of https://letsblock.it, a uBlock Origin list generator that allows users to pick and customize filter templates to filter out low-value content. The most requested feature so far is to extend the hide websites from search results template with presets to hide Github and Stackoverflow copycats. Instead of duplicating work, I'd love to reuse the data from your project and import it as user-selectable presets.

I am currently working on the implementation PR: letsblockit/letsblockit#64 and will deploy a staging version when the frontend is ready. I'd love to have your input on it, whether that's questions, ideas or concerns.

Add metadata to filter lists

Filter lists using the Adblock syntax can have metadata in special comments in the header. This helps identify the name of the list, its origin, refresh frequency, etc. For example:

! Title: uBlock-Origin-dev-filter
! Version: 12345
! Expires: 1 day
! Description: Filters to block and remove copycat-websites from DuckDuckGo and Google. Specific to dev websites like StackOverflow or GitHub.
! Homepage: https://github.com/quenhus/uBlock-Origin-dev-filter

One side effect of not having this is the lack of a title after importing the uBlock-Origin-dev-filter list:
[Screenshot: Screen Shot 2022-01-22 at 11 27 52]

Thanks!
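
A small sketch of how a generator could emit such a header in front of the rules. The function names and the default value for Expires are assumptions for illustration; only the "! Key: value" comment form comes from the example in this issue.

```python
def metadata_header(title, version, homepage, description, expires="1 day"):
    """Return Adblock-style metadata comments as one newline-joined string."""
    fields = [
        ("Title", title),
        ("Version", version),
        ("Expires", expires),
        ("Description", description),
        ("Homepage", homepage),
    ]
    return "\n".join(f"! {key}: {value}" for key, value in fields)


def with_header(header, rules):
    """Put the metadata header in front of the filter rules."""
    return header + "\n" + "\n".join(rules) + "\n"


if __name__ == "__main__":
    header = metadata_header(
        title="uBlock-Origin-dev-filter",
        version="12345",
        homepage="https://github.com/quenhus/uBlock-Origin-dev-filter",
        description="Filters to block and remove copycat-websites.",
    )
    # Placeholder rule, not an entry from the real list.
    print(with_header(header, ['duckduckgo.com##[data-domain$="copycat.example"]']))
```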
