blackweb's People

Contributors

maravento

blackweb's Issues

openSquat

Hello,

Unfortunately, direct access to the openSquat feeds no longer works due to high bandwidth consumption.

bwupdate.sh not working

Hi,

When I run "bwupdate.sh" on a Linux machine, it never finishes (the terminal output stops at "Downloading Blocklists...").

Maybe some of the URLs are not responding.
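
One way to keep a single unresponsive source from stalling the whole run is to put a hard time limit on each download. The following is only a sketch, not how bwupdate.sh actually works: run_limited is a hypothetical helper, and it assumes the coreutils timeout command is available.

```shell
# Hypothetical helper (not part of bwupdate.sh): run a command under a
# hard time limit so a hung download cannot block the rest of the run.
run_limited() {
    limit="$1"
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # coreutils timeout exits with 124 when the limit is reached
        echo "timed out after ${limit}s: $*" >&2
    elif [ "$status" -ne 0 ]; then
        echo "failed (exit ${status}): $*" >&2
    fi
    return "$status"
}
```

A download loop could then call something like run_limited 60 wget -q -N "$url" and skip to the next source when the call fails. The 60-second limit is an arbitrary choice.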

How often does blackweb.txt get updated?

Hi,

Is there a process, or someone, that updates the blackweb.txt file periodically so that the community can download and use it directly instead of running the scripts? If not, you could add me as a core committer and I can maintain it every few days.

Thanks!

TLDs blocked

Hello, I'm just starting with your project, and it is amazing. I see you just added ".gov.com" to the blocked TLDs. Since it is not a TLD and no one would want to block it, I suppose it was a mistake. The same goes for ".webcamsex" and ".comsex", which don't exist.
".gay" is used by the gay community, not only for porn. I would also remove .download and .cash, since they are used for legitimate purposes too.
Anyway, I know that I can compile the list myself the way I want it, but I think these suggestions contribute. Thanks.

MD5 Checksum file not matching the blackweb.txt

Following your instructions, it seems the MD5 file is not maintained, and the check fails.

[root@dev block]# wget -q -N https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.tar.gz && cat blackweb.tar.gz* | tar xzf -
[root@dev block]# ls | grep blackweb
blackweb.sh
blackweb.tar.gz
blackweb.txt
[root@dev block]# wget -q -N https://raw.githubusercontent.com/maravento/blackweb/master/checksum.md5
[root@dev block]# ls | grep checksum
checksum.md5
[root@dev block]# md5sum blackweb.txt | awk '{print $1}' && cat checksum.md5 | awk '{print $1}'
967503818f84092d132f78027da9c61d
26440258542a93dc377e9e99d64ca392

My scripts are failing because of this. Is there any plan to keep the MD5 file updated?

Thanks in advance and best regards,

CD
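
For scripts that depend on this check, the comparison shown above can be wrapped in a function that returns a non-zero status on a mismatch. This is a minimal sketch: verify_md5 is a hypothetical name, and it assumes the first field on the first line of checksum.md5 is meant to be the digest of blackweb.txt, which the output above suggests may not currently hold.

```shell
# Hypothetical helper: succeed only if the MD5 digest of FILE equals the
# first field on the first line of SUMFILE.
verify_md5() {
    file="$1"
    sumfile="$2"
    actual=$(md5sum "$file" | awk '{print $1}')
    expected=$(awk 'NR == 1 {print $1}' "$sumfile")
    [ "$actual" = "$expected" ]
}
```

A calling script could then run verify_md5 blackweb.txt checksum.md5 and stop with an error message when it fails, instead of comparing the two digests by eye.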

Source of blockurls.txt

Hi Team,
Could you please let us know the source of blockurls.txt, which is used in bwupdate.sh at line 339:

# add blockurls

sed '/^$/d; /#/d' lst/blockurls.txt | sort -u >>hit.txt

Regards

Warning: Potential Redundancy Issue in the Blacklist

Hello~

We are a threat intelligence research group. In our recent research work, we referenced the blacklist you constructed. During our quality assessment of this blacklist, we found that there might be some redundant information.

Specifically, some domains may have expired or been safely resolved but are still present in the blacklist. In the July 2024 update, our evaluation of redundancy is as follows:

Status      Rate
Hold        0.80%
Pending     1.68%
NXDOMAIN    0.17%
Parked      0.20%
Sinkholed   0.03%

For example, the domain grrrff2452.com has been sinkholed in our observation, but it still appears in the most recent blacklist update.

Considering the large size of your blacklist, the absolute scale of this redundant information can impact the deployment and use of the blacklist. We are curious if you check the domains in the blacklist during updates, or if it is possible to develop some basic quality assurance and optimization mechanisms in the future.

Thank you!

Duplicate names

Some names are duplicated, differing only in that one copy is in uppercase (for example, wtennis).
Could all the names be loaded, converted to lowercase, and deduplicated? There are 600+ such cases.
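
The request above amounts to lowercasing every entry before removing repeats. A minimal sketch, assuming one entry per line; normalize_domains is a hypothetical name, not something taken from the project's scripts:

```shell
# Hypothetical filter: lowercase every line read from stdin, then drop
# exact duplicates. LC_ALL=C keeps the sort order byte-wise and stable.
normalize_domains() {
    tr '[:upper:]' '[:lower:]' | LC_ALL=C sort -u
}
```

It reads from standard input, so it could be used as normalize_domains < blackweb.txt > blackweb.clean.txt, where the output file name is only an example.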

Legitimate sites in blacklist

Hi.

I'm trying to implement blackweb in the proxy of one of my clients, but there is something I can't understand.

I activate it in a test environment, and it's blocking almost everything, including:

  • well known sites like apache.org, docker.com, pypi.org, etc.;
  • banks like unicredit.it;
  • almost every Italian provider (tiscali.it, virgilio.it, libero.it, etc.);
  • government institutions (sogei.it, rai.it, provincia.tn.it, etc.).

Why are they included? How can Sogei, owned by a ministry, be in a blacklist?

Legitimate sites are filtered due to inclusion of all domains from shallalist

It looks like shallalist categorizes its domains, and not all of them are meant to be blacklisted. For example, it has domains for governments and educational institutes. However, blackweb doesn't distinguish between them when processing, so valid sites are blacklisted too.

A solution would be to include a category list (which the user could perhaps override) and, after untarring, remove all valid categories.

I haven't checked other packages yet, but they may have similar files in them.
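
The category filter suggested above could look roughly like this. It is a sketch under assumptions: it assumes the archive unpacks into one directory per category, each containing a file named domains, and collect_blocked_domains and the category names in the usage note are hypothetical.

```shell
# Hypothetical helper: print the merged "domains" files of every category
# directory under ROOT, except categories named in the space-separated
# SKIP list.
collect_blocked_domains() {
    root="$1"
    skip="$2"
    for dir in "$root"/*/; do
        category=$(basename "$dir")
        case " $skip " in
            *" $category "*) continue ;;
        esac
        if [ -f "${dir}domains" ]; then
            cat "${dir}domains"
        fi
    done | LC_ALL=C sort -u
}
```

For example, collect_blocked_domains BL "government education" would merge every category except those two. Keeping the skip list in a variable or file would let users override it.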

Git clone of blackweb goes to current directory

blackweb.sh assumes there is a directory ~/blackweb and that the files are installed there. However, git clone https://github.com/maravento/blackweb.git creates the directory in the current directory. This works fine from a crontab job, but if the script is executed from a different location, it checks out and downloads the files to the local directory. Running cd ~ before git clone would allow the script to be run from any location.
Please read the README carefully.

Download and Update script are not working

Hello,

I am very interested in using your domain list, but every time I try to run the download or update script, it fills with errors (mostly "No such file or directory"). Even if I download and try to use just the blackweb.txt file from 2/8, it contains duplicates (subdomains). I am doing this on Ubuntu 16.03. Thank you for any help, much appreciated.
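
If the duplicates mentioned above are subdomains of entries that are already listed (for example, .a.example.com alongside .example.com), they can be dropped by reversing each name so that parents sort directly before their children. This is a sketch, not the project's own method: dedupe_subdomains is a hypothetical name, and it assumes ASCII names, one per line with an optional leading dot, and that the rev utility is installed.

```shell
# Hypothetical filter: remove every name that is a subdomain of another
# name in the input. Names are reversed and dots become spaces so that,
# in byte order, a parent sorts immediately before all of its subdomains.
dedupe_subdomains() {
    sed -e 's/^\.//' -e '/^$/d' |
        rev | tr '.' ' ' | LC_ALL=C sort -u |
        awk '
            # skip a line that extends the last kept name by one more label
            kept != "" && index($0, kept " ") == 1 { next }
            { kept = $0; print }
        ' |
        tr ' ' '.' | rev | sed 's/^/./'
}
```

The output keeps the leading-dot form but is not in alphabetical order, so it may need a final sort.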

The state of the project is ALPHA, what prevents it from becoming beta or stable?

I was wondering: The state of the project is ALPHA, what prevents it from becoming beta or stable?
I would love to have a peek at a check list that can make this project into beta or even stable stage.
I have seen that this project is for Squid-Cache and since I'm maintaining the RPM packages for Squid-Cache it would be fun to try and contribute to this project.
Maybe even package the basic install and update scripts in some way for CentOS and Debian-based Linux systems?

URL Patterns and Inconsistency

I would appreciate it if you could clarify something for me.

I noticed that out of the 4M+ total bad sites, 99% have a numeric prefix (\d+.), and 2.7M are hosted on the .blogspot.* domain. All of these could be valid bad links, but in reality, 95% of the malicious links I have encountered, and of those that PhishTank and Google security detect, are not hosted on a .blogspot.* domain and do not have a numeric prefix.

So, how should these two sets be treated, and is there any plan to cover other active malicious database sets in the future?
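
To check figures like these against a given copy of the list, the entries matching each pattern can be counted directly. A rough sketch, assuming one entry per line with an optional leading dot: count_matching is a hypothetical helper, and the two patterns in the usage note are only one reading of the description above.

```shell
# Hypothetical helper: print how many lines of FILE match the extended
# regular expression PATTERN. grep exits non-zero when nothing matches,
# so "|| true" keeps a zero count from being treated as an error.
count_matching() {
    pattern="$1"
    file="$2"
    grep -cE "$pattern" "$file" || true
}
```

For instance, count_matching '^\.?[0-9]+\.' blackweb.txt would count entries whose first label is purely numeric, and count_matching '\.blogspot\.' blackweb.txt would count the blogspot entries.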

Couple of questions

Hey there. Great work here!

Couple of questions.

  1. What are the sources for the data? Is it from all sites listed under 'Data Sheet'?
  2. How often are you planning to update these? I hope it's automated. :)

Help to convert list

Hello, and thanks for the good work :)

I'm trying to create a blacklist for use with DNSmasq, and your list is amazing.

Can you help me convert this list to a DNSmasq list with sed or another tool?

This is your list:
.sample.com

This is a sample of a DNSmasq list:
address=/sample.com/0.0.0.0
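
A conversion along the lines asked for above can be done with a single sed call. This is a minimal sketch, assuming every entry is one domain per line with an optional leading dot; to_dnsmasq is a hypothetical name.

```shell
# Hypothetical filter: turn ".sample.com" lines from stdin into
# "address=/sample.com/0.0.0.0" lines. Empty lines are dropped.
to_dnsmasq() {
    sed -e '/^$/d' -e 's/^\.//' -e 's|.*|address=/&/0.0.0.0|'
}
```

It could be run as to_dnsmasq < blackweb.txt > blackweb.dnsmasq.conf, where the output file name is only an example.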

No category information in the compiled list

Hi,
thanks a lot for your work!
I am looking at the compiled list, but there is no information about which category a blacklisted domain belongs to. Is it possible to add that information in the next compilation? At least some information about how the sources categorize the domains would be helpful.
Thank you very much!
