janpascal / denyhosts_sync
Sync server for denyhosts
License: GNU Affero General Public License v3.0
I know that peering is currently supported within your own infrastructure. Would it be possible to activate peering with a centralized cloud server as well?
That way, an organization's internal servers could peer with each other, and one of those internal sync servers could be designated as the primary, which also syncs with a cloud server. Hosts would then sync internally but also externally, so other users get our blocked hosts and we can pull down IPs we don't have yet. It would also reduce the number of connections needed to push to the external sync server.
Current DenyHosts setup:
Host 1 -> sync.denyhosts.org
Host 2 -> sync.denyhosts.org
Host 3 -> sync.denyhosts.org
Proposed setup:
Host 1 -> internal.sync.host
Host 2 -> internal.sync.host
Host 3 -> internal.sync.host
Host 4 -> internal2.sync.host
Host 5 -> internal2.sync.host
Host 6 -> internal2.sync.host
internal.sync.host <-> internal2.sync.host (internal peering)
internal.sync.host <-> sync.denyhosts.org (external sync)
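The connection savings in the proposed setup come from the primary internal server batching what its clients report before pushing upstream. A minimal sketch, assuming a hypothetical `push_upstream` callable (for example, a wrapper around whatever upload call the external server accepts); the class and method names are illustrative, not part of denyhosts_sync:

```python
from typing import Callable, Iterable, List, Set


class UpstreamForwarder:
    """Hypothetical relay: collects hosts reported by local clients and
    pushes only hosts not yet sent upstream, in one batch per flush."""

    def __init__(self, push_upstream: Callable[[List[str]], None]) -> None:
        self._push = push_upstream
        self._pending: Set[str] = set()
        self._forwarded: Set[str] = set()

    def report(self, hosts: Iterable[str]) -> None:
        # Queue only hosts that have not already been forwarded.
        for host in hosts:
            if host not in self._forwarded:
                self._pending.add(host)

    def flush(self) -> int:
        # One upstream call per flush instead of one per client report.
        if not self._pending:
            return 0
        batch = sorted(self._pending)
        self._push(batch)
        self._forwarded.update(batch)
        self._pending.clear()
        return len(batch)
```

Calling `flush()` on a timer means the external server sees one connection per interval from each organization, however many internal hosts report.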
e.g. on port 80
At least ipaddr is missing. Also add the sqlite3/mysql dependencies, or document them in the README.
Connect to the legacy sync server at xmlrpc.denyhosts.net to download reported hosts, in order to bootstrap the list of blocked hosts
or block them using an IP check or access key
Probably caused by an update in the Twisted framework: received_headers seems to have been removed in release 16.0 of Twisted.
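Current Twisted exposes request headers through `request.requestHeaders`, a `twisted.web.http_headers.Headers` object whose `getRawHeaders(name, default=None)` returns a list of values. A minimal sketch of a helper that could replace direct `received_headers` lookups; the helper name and the choice to return the last value are assumptions:

```python
def get_request_header(request, name):
    """Return the last value of an HTTP request header, or None.

    Replaces request.received_headers[name], which (per the report above)
    is gone in Twisted 16.0. requestHeaders.getRawHeaders returns a list of
    all values for the header, or the default (None) when it is absent.
    """
    raw = request.requestHeaders.getRawHeaders(name)
    if not raw:
        return None
    return raw[-1]
```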
hint: it needs
Allow a new peer to download the database of reported hosts from one of the other peers.
See Anne Bezemer's algorithm and comments in the source code
For large databases, the controllers.maintenance() function may fail with an out-of-memory error. This is probably caused by Crackers.all(), which builds a list of all crackers in the database: in my test case, more than a million.
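One way to keep memory bounded is to walk the table in fixed-size batches instead of loading it whole. A minimal sketch using keyset pagination with the standard `sqlite3` module, assuming a `crackers` table with an integer `id` primary key and an `ip_address` column; it bypasses the ORM, and the function name is hypothetical:

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_crackers_in_batches(conn: sqlite3.Connection, batch_size: int = 10000
                             ) -> Iterator[List[Tuple[int, str]]]:
    """Yield (id, ip_address) rows in id order, batch_size rows at a time.

    Keyset pagination (WHERE id > last_id) keeps only one batch in memory
    and, unlike OFFSET, does not get slower as it moves through the table.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, ip_address FROM crackers WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]
```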
Hi All,
Awesome project! I'm trying to run my own denyhosts_sync server for my home setup, as I find denyhosts a lot easier to set up than fail2ban.
Some issues I'm having when setting up:
./setup.py minify_js minify_css install
I get this message:
root@ubuntu-xenial:~/denyhosts_sync# sudo ./setup.py minify_js minify_css install
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help
error: invalid command 'minify_js'
So the instruction should be ./setup.py install minify_js minify_css. I'll open a PR for that.
Using /usr/lib/python2.7/dist-packages
Finished processing dependencies for denyhosts-server==2.2.0
running minify_js
static/js/bootstrap.js -> static/js/bootstrap.min.js
error: [Errno 2] No such file or directory
@janpascal @sergey-dryabzhinsky can you help?
Sometimes IP addresses in the database might be considered stale, or test data might no longer be needed. In these cases it would be helpful to have command-line parameters that purge old IP addresses from the database.
My suggestion would be to have three options, for example:
dh_syncserver --purge-legacy
dh_syncserver --purge-addresses
dh_syncserver --purge-ip 123.456.789.012
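The three proposed flags map naturally onto a mutually exclusive `argparse` group. A minimal sketch, assuming a schema with hypothetical `legacy(ip_address)`, `crackers(id, ip_address)` and `reports(cracker_id, ...)` tables; none of these flags exist in dh_syncserver today:

```python
import argparse
import sqlite3


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dh_syncserver")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--purge-legacy", action="store_true",
                       help="delete all hosts imported from the legacy server")
    group.add_argument("--purge-addresses", action="store_true",
                       help="delete all hosts reported by clients")
    group.add_argument("--purge-ip", metavar="IP",
                       help="delete a single IP address everywhere")
    return parser


def run_purge(conn: sqlite3.Connection, args: argparse.Namespace) -> None:
    with conn:  # commits on success, rolls back if any statement fails
        if args.purge_legacy:
            conn.execute("DELETE FROM legacy")
        elif args.purge_addresses:
            conn.execute("DELETE FROM reports")
            conn.execute("DELETE FROM crackers")
        elif args.purge_ip:
            ip = args.purge_ip
            conn.execute("DELETE FROM legacy WHERE ip_address = ?", (ip,))
            # Remove dependent reports before the cracker rows they point to.
            conn.execute("DELETE FROM reports WHERE cracker_id IN "
                         "(SELECT id FROM crackers WHERE ip_address = ?)", (ip,))
            conn.execute("DELETE FROM crackers WHERE ip_address = ?", (ip,))
```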
Alternatively, it might be useful to have the sync server purge old IP addresses when it receives a signal. For example, sending SIGHUP could make it re-read the configuration file and purge old IP addresses from the database, giving the server a fresh start.
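A common pattern for the SIGHUP variant is to have the handler only set a flag and let the main loop do the work, since a Python signal handler can run in the middle of whatever the main thread is doing. A minimal sketch with the standard `signal` and `sqlite3` modules, assuming hypothetical `crackers(id, latest_time)` and `reports(cracker_id)` tables; all function names are illustrative:

```python
import signal
import sqlite3
import time

purge_requested = False


def handle_sighup(signum, frame):
    # Only record the request; the main loop performs the purge.
    global purge_requested
    purge_requested = True


def install_sighup_handler() -> None:
    if hasattr(signal, "SIGHUP"):  # SIGHUP does not exist on Windows
        signal.signal(signal.SIGHUP, handle_sighup)


def purge_stale(conn: sqlite3.Connection, expiry_days: int) -> int:
    """Delete crackers (and their reports) not seen for expiry_days days."""
    cutoff = time.time() - expiry_days * 86400
    with conn:
        conn.execute("DELETE FROM reports WHERE cracker_id IN "
                     "(SELECT id FROM crackers WHERE latest_time < ?)", (cutoff,))
        cur = conn.execute("DELETE FROM crackers WHERE latest_time < ?", (cutoff,))
    return cur.rowcount


def main_loop_step(conn: sqlite3.Connection, expiry_days: int) -> int:
    # Called periodically; re-reading the config file would also go here.
    global purge_requested
    if purge_requested:
        purge_requested = False
        return purge_stale(conn, expiry_days)
    return 0
```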
Currently maintenance.expiry_days controls the expiry of both the legacy addresses and the addresses from client reports.
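Splitting the setting could stay backward compatible by keeping `expiry_days` as the shared default. A minimal sketch with `configparser`, where `legacy_expiry_days`, `reports_expiry_days` and the default of 30 days are hypothetical, not existing options:

```python
import configparser
from typing import Tuple


def read_expiry_days(config: configparser.ConfigParser) -> Tuple[int, int]:
    """Return (legacy_days, reports_days).

    expiry_days remains the shared default; the two hypothetical keys
    override it per data source when present. A missing [maintenance]
    section falls back to the defaults as well.
    """
    default = config.getint("maintenance", "expiry_days", fallback=30)
    legacy = config.getint("maintenance", "legacy_expiry_days", fallback=default)
    reports = config.getint("maintenance", "reports_expiry_days", fallback=default)
    return legacy, reports
```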
When two clients report the same host at the same time, I expect things to go wrong. Use transactions or some kind of locking mechanism.
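With SQLite, one way to make concurrent reports safe is a UNIQUE constraint on the cracker's address plus a write transaction taken up front. A minimal sketch, assuming a hypothetical schema with `crackers(id, ip_address UNIQUE, first_time, latest_time, total_reports)` and `reports(cracker_id, ip_address, first_report_time)`, and a connection opened with `isolation_level=None` so the explicit BEGIN/COMMIT are not mixed with the module's implicit transactions:

```python
import sqlite3
import time


def add_report(conn: sqlite3.Connection, cracker_ip: str, reporter_ip: str,
               now: float = None) -> int:
    now = time.time() if now is None else now
    # BEGIN IMMEDIATE takes the write lock before reading, so two concurrent
    # reports for the same host cannot both conclude the row is missing.
    conn.execute("BEGIN IMMEDIATE")
    try:
        # UNIQUE(ip_address) makes creation idempotent.
        conn.execute(
            "INSERT OR IGNORE INTO crackers "
            "(ip_address, first_time, latest_time, total_reports) VALUES (?, ?, ?, 0)",
            (cracker_ip, now, now))
        conn.execute(
            "UPDATE crackers SET latest_time = ?, total_reports = total_reports + 1 "
            "WHERE ip_address = ?", (now, cracker_ip))
        (cracker_id,) = conn.execute(
            "SELECT id FROM crackers WHERE ip_address = ?", (cracker_ip,)).fetchone()
        conn.execute(
            "INSERT INTO reports (cracker_id, ip_address, first_report_time) "
            "VALUES (?, ?, ?)", (cracker_id, reporter_ip, now))
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return cracker_id
```

On MySQL/MariaDB the same idea would use a unique key with INSERT ... ON DUPLICATE KEY UPDATE, or SELECT ... FOR UPDATE inside a transaction.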
Python 2 is end-of-life, so fix all compatibility issues with Python 3.
Some of the dependencies may make this problematic: twistar may not have complete Python 3 support.
As an alternative, the denyhosts sync server could be rewritten completely using the now-standard Python 3 asyncio, aiohttp, aiohttp-xmlrpc and an asyncio ORM like tortoise-orm, but that would require serious development effort.
@janpascal and others,
Is this project still considered "alive"? Is it Python 3 ready?
I've switched over to the Python 3 code today, but system usage seems similar, so I have spent some time on the database (MariaDB/MySQL) and managed to tweak some of its settings (mostly --innodb-buffer-pool-size=5G), which has resolved the massive HDD usage I was getting (denyhosts/denyhosts#149 (comment)).
But I'm not sure how to debug the remaining CPU usage. As the screenshots below show, the majority of it appears to come from denyhosts_sync.
Also worth noting: I've tried running the sync server with an empty database, which results in near-zero CPU usage, suggesting the problem lies with the large database (~3 GB), though that run only lasted 10 minutes.
I just found #39 too, which shows that SELECT * queries are being used, which I'd imagine doesn't help the CPU load.
Let me know what you need to help debug the issue.
Right now the SQL statements don't work because of the different query parameter handling
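The DB-API drivers differ here: `sqlite3` uses `?` placeholders (paramstyle `qmark`), while MySQLdb (`format`) and PyMySQL (`pyformat`) accept `%s`. PEP 249 exposes this as the module-level `paramstyle` attribute. A minimal sketch that rewrites qmark queries for `%s` drivers, skipping `?` inside quoted literals and doubling literal `%`, assuming the query is always executed with a parameter tuple (which is when those drivers apply `%` formatting):

```python
def qmark_to_format(sql: str) -> str:
    """Rewrite '?' placeholders as '%s', ignoring '?' inside quotes.

    Literal '%' characters are doubled because format-style drivers pass
    the whole query through Python %-formatting when parameters are given.
    """
    out = []
    quote = None  # the quote character of the literal we are inside, if any
    for ch in sql:
        if quote:
            if ch == quote:
                quote = None  # a doubled '' simply closes and reopens
            out.append("%%" if ch == "%" else ch)
        elif ch in ("'", '"'):
            quote = ch
            out.append(ch)
        elif ch == "?":
            out.append("%s")
        elif ch == "%":
            out.append("%%")
        else:
            out.append(ch)
    return "".join(out)


def adapt_query(sql: str, paramstyle: str) -> str:
    """Adapt a qmark query to a driver's DB-API paramstyle."""
    if paramstyle == "qmark":
        return sql
    if paramstyle in ("format", "pyformat"):
        return qmark_to_format(sql)
    raise ValueError("unsupported paramstyle: %s" % paramstyle)
```

Usage would look like `cursor.execute(adapt_query(sql, driver.paramstyle), params)`, with `driver` being the imported DB-API module.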
I'm working on this. There will be one master server and multiple slaves. Communication between master and slave is authenticated. Clients can connect to any server (master or slave). This is important, as it makes it possible to use round-robin DNS to distribute the clients over the servers, with as little client-side configuration as possible.
Slave servers send any updates they get to the master, which distributes them to the other slaves.
See the ideas in the algorithm proposed by Anne Bezemer
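The master's relay step can be illustrated in a few lines. A minimal sketch, assuming slaves authenticate each update with an HMAC-SHA256 over the host list using a per-slave shared key; the post only says communication is authenticated, so the HMAC scheme, the `Master` class and its in-memory `outbox` are all hypothetical stand-ins for the real transport:

```python
import hashlib
import hmac
from typing import Dict, List


class Master:
    """Hypothetical master: verifies updates from slaves and queues them
    for every other slave."""

    def __init__(self, secrets: Dict[str, bytes]) -> None:
        self.secrets = secrets  # slave name -> shared key
        self.outbox: Dict[str, List[List[str]]] = {name: [] for name in secrets}

    @staticmethod
    def sign(key: bytes, hosts: List[str]) -> str:
        return hmac.new(key, "\n".join(hosts).encode(), hashlib.sha256).hexdigest()

    def receive(self, sender: str, hosts: List[str], signature: str) -> None:
        key = self.secrets.get(sender)
        # compare_digest avoids leaking timing information about the MAC.
        if key is None or not hmac.compare_digest(self.sign(key, hosts), signature):
            raise PermissionError("rejected update from %r" % sender)
        # Fan out to every slave except the one that sent the update.
        for name, queue in self.outbox.items():
            if name != sender:
                queue.append(list(hosts))
```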
Within the controllers.py file there's the get_qualifying_crackers method. We've noticed that after the initial query of the crackers table, it uses the returned id and ip_address to query the reports table. Looking at the function, it seems we could eliminate these additional queries with the query below: all of the data needed is in the crackers table, so there's no need to query the reports table.
I think this could be used for lines 103 - 156. Let me know if there's anything I'm missing in the code that's not being handled in the SQL.
SELECT
  DISTINCT ip_address
FROM
  crackers
WHERE
  # check from last sync
  latest_time > 1590029927
  AND (
    # check a and b (reports and resiliency)
    (
      current_reports >= 3
      AND resiliency >= 3600
    )
    OR
    # check c and d
    (
      # this is a resiliency check
      latest_time - first_time >= 3600
    )
  )
ORDER BY latest_time ASC;
This is a portion of the log, so you can see what's happening.
SELECT DISTINCT c.id, c.ip_address
FROM crackers c
WHERE (c.current_reports >= 3)
AND (c.resiliency >= 18000)
AND (c.latest_time >= 1590033382)
ORDER BY c.first_time DESC
28 Query COMMIT
12 Query SELECT * FROM reports WHERE cracker_id = 291909 ORDER BY first_report_time ASC
109 Query COMMIT
12 Query COMMIT
52 Query SELECT * FROM crackers WHERE id = 299601
43 Query SELECT * FROM reports WHERE cracker_id = 287737 ORDER BY first_report_time ASC
52 Query COMMIT
61 Query SELECT * FROM reports WHERE cracker_id = 251808 ORDER BY first_report_time ASC
123 Query SELECT * FROM crackers WHERE id = 311074
123 Query COMMIT
61 Query COMMIT
43 Query COMMIT
127 Query SELECT * FROM crackers WHERE id = 291040
127 Query COMMIT
113 Query SELECT * FROM crackers WHERE id = 289900
21 Query SELECT * FROM reports WHERE cracker_id = 300001 ORDER BY first_report_time ASC
113 Query COMMIT
140 Query SELECT * FROM reports WHERE cracker_id = 268511 ORDER BY first_report_time ASC
21 Query COMMIT
45 Query SELECT * FROM reports WHERE cracker_id = 299803 ORDER BY first_report_time ASC
140 Query COMMIT
45 Query COMMIT
23 Query SELECT * FROM crackers WHERE id = 261267
30 Query SELECT * FROM reports WHERE cracker_id = 296089 ORDER BY first_report_time ASC
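The log shows the N+1 pattern: one `SELECT * FROM crackers WHERE id = ...` and one `SELECT * FROM reports WHERE cracker_id = ...` per candidate. A minimal sketch of the single-query alternative using `sqlite3`, covering only the conditions in the logged query (current_reports, resiliency, latest_time) with the proposal's thresholds as defaults; it does not reimplement the c/d branch of the algorithm, and the function name is hypothetical:

```python
import sqlite3
from typing import List

# GROUP BY + ORDER BY MAX(...) keeps the query valid where DISTINCT with an
# ORDER BY column outside the select list would be rejected (e.g. MySQL 5.7+).
QUALIFYING_SQL = """
SELECT ip_address
FROM crackers
WHERE current_reports >= :min_reports
  AND resiliency >= :min_resiliency
  AND latest_time >= :since
GROUP BY ip_address
ORDER BY MAX(latest_time) ASC
"""


def get_qualifying_ips(conn: sqlite3.Connection, since: int,
                       min_reports: int = 3, min_resiliency: int = 3600) -> List[str]:
    """Return qualifying IPs in one round trip, with no per-cracker queries."""
    params = {"since": since, "min_reports": min_reports,
              "min_resiliency": min_resiliency}
    return [ip for (ip,) in conn.execute(QUALIFYING_SQL, params)]
```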
Analyse database use and add indexes where appropriate
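The logged queries suggest two candidate indexes: `reports(cracker_id, first_report_time)` serves both the WHERE and the ORDER BY of the per-cracker lookup, and `crackers(latest_time)` serves the "since last sync" filter. A minimal sketch that creates them and checks the plan with SQLite's EXPLAIN QUERY PLAN; the index names are hypothetical, and on MySQL/MariaDB the equivalent check is EXPLAIN:

```python
import sqlite3
from typing import List

INDEXES = [
    "CREATE INDEX IF NOT EXISTS idx_reports_cracker_time "
    "ON reports (cracker_id, first_report_time)",
    "CREATE INDEX IF NOT EXISTS idx_crackers_latest_time "
    "ON crackers (latest_time)",
]


def add_indexes(conn: sqlite3.Connection) -> None:
    with conn:
        for stmt in INDEXES:
            conn.execute(stmt)


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> List[str]:
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
```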
Check what happens when there is a lot of traffic and the database is nicely filled