denyhosts_sync's People

Contributors

janpascal, sergey-dryabzhinsky

denyhosts_sync's Issues

Peering To Parent

I know that peering is currently supported within your own infrastructure. Could peering also be enabled to a centralized cloud server?

Internal organization servers could then peer with each other, with one of the internal sync servers designated as the primary that syncs to a cloud server. That way an organization can sync internally and also sync externally, so other users get its blocked hosts and it can pull down IPs it doesn't have yet. It would also reduce the number of connections needed to push to the external sync server.

Current DenyHosts setup:
Host 1 -> sync.denyhosts.org
Host 2 -> sync.denyhosts.org
Host 3 -> sync.denyhosts.org

Proposed setup:
Host 1 -> internal.sync.host
Host 2 -> internal.sync.host
Host 3 -> internal.sync.host
Host 4 -> internal2.sync.host
Host 5 -> internal2.sync.host
Host 6 -> internal2.sync.host
internal.sync.host <-> internal2.sync.host (internal peering)
internal.sync.host <-> sync.denyhosts.org (external sync)

Bootstrap using legacy sync server

Connect to the legacy sync server at xmlrpc.denyhosts.net to download reported hosts, in order to bootstrap the list of blocked hosts.

Out of memory during maintenance

For large databases, the controllers.maintenance() function may give an out of memory error. This is probably caused by Crackers.all(), since it generates a list of all crackers in the database. In my test case, that is more than a million.
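
One way to avoid loading every cracker at once is to walk the table in fixed-size batches. The following is only a minimal sketch of that idea using the standard-library sqlite3 module rather than the project's ORM; the iter_crackers helper, the batch size and the reduced two-column table are assumptions made for illustration.

```python
import sqlite3


def iter_crackers(conn, batch_size=1000):
    """Yield (id, ip_address) rows batch by batch, ordered by id.

    Hypothetical helper: only one batch is held in memory at a time,
    instead of a list of every row in the table.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, ip_address FROM crackers "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        # Continue after the highest id seen so far (keyset pagination).
        last_id = rows[-1][0]


# Small in-memory demo table; the real schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crackers (id INTEGER PRIMARY KEY, ip_address TEXT)")
conn.executemany(
    "INSERT INTO crackers (ip_address) VALUES (?)",
    [(f"192.0.2.{i}",) for i in range(1, 11)],
)

seen = [ip for _id, ip in iter_crackers(conn, batch_size=3)]
```

Paging on id rather than OFFSET keeps each batch query cheap even deep into a large table.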

Getting started issues

Hi All,

Awesome project! I'm trying to run my own denyhosts_sync server for my home setup, as I find denyhosts a lot easier to set up than fail2ban.

Some issues I'm having when setting up:

  1. When trying to run ./setup.py minify_js minify_css install I get a message:
root@ubuntu-xenial:~/denyhosts_sync# sudo ./setup.py minify_js minify_css install
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: invalid command 'minify_js'

So the instruction should be ./setup.py install minify_js minify_css. I'll open a PR for that.

  2. When trying to run those commands, I get the following error:
Using /usr/lib/python2.7/dist-packages
Finished processing dependencies for denyhosts-server==2.2.0
running minify_js
static/js/bootstrap.js -> static/js/bootstrap.min.js
error: [Errno 2] No such file or directory

@janpascal @sergey-dryabzhinsky can you help? 😄

Please add command line flags to purge database entries

Sometimes IP addresses in the database might be considered stale, or test data might no longer be needed. In these cases it would be helpful to have command line parameters which would purge the database of old IP addresses.

My suggestion would be to have three options:

  1. Purge all legacy IP addresses.
  2. Purge all non-legacy IP addresses.
  3. Purge one specific IP address from either the legacy or the non-legacy tables.

So for example:
dh_syncserver --purge-legacy
dh_syncserver --purge-addresses
dh_syncserver --purge-ip 123.456.789.012

Alternatively, it might be useful to have the sync server purge old IP addresses when it receives a signal. For example, sending SIGHUP could re-read the configuration file and purge old IP addresses from the database to give the server a fresh start.
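
The three suggested options could be parsed along these lines. This is a sketch of the proposal only, not the project's actual command line handling; the build_parser helper and the choice to make the flags mutually exclusive are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the purge flags suggested in this issue (hypothetical)."""
    parser = argparse.ArgumentParser(prog="dh_syncserver")
    # Assumption: only one purge action may be requested per invocation.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--purge-legacy", action="store_true",
                       help="purge all legacy IP addresses")
    group.add_argument("--purge-addresses", action="store_true",
                       help="purge all non-legacy IP addresses")
    group.add_argument("--purge-ip", metavar="IP",
                       help="purge one specific IP address")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--purge-ip", "192.0.2.10"])
```

After parsing, args.purge_ip holds the address to remove, while args.purge_legacy and args.purge_addresses are plain booleans.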

Concurrency

When two clients report the same host at the same time, I expect things to go wrong. Use transactions or some kind of locking mechanism.
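
As a rough illustration of the locking idea, the sketch below serializes the read-modify-write for a reported host and wraps it in a database transaction. It uses the standard-library sqlite3 and threading modules, not the server's actual database layer; report_host, db_lock and the total_reports column are hypothetical names.

```python
import sqlite3
import threading

# One shared connection, guarded by a lock (assumption for this demo only).
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute(
    "CREATE TABLE crackers ("
    "ip_address TEXT PRIMARY KEY, total_reports INTEGER NOT NULL)"
)
db_lock = threading.Lock()


def report_host(ip_address):
    """Record one report for ip_address without losing concurrent updates."""
    with db_lock:      # only one thread updates at a time
        with conn:     # commit on success, roll back on exception
            cur = conn.execute(
                "UPDATE crackers SET total_reports = total_reports + 1 "
                "WHERE ip_address = ?",
                (ip_address,),
            )
            if cur.rowcount == 0:
                # First report for this host: create the row.
                conn.execute(
                    "INSERT INTO crackers (ip_address, total_reports) "
                    "VALUES (?, 1)",
                    (ip_address,),
                )


# Twenty "clients" report the same host at the same time.
threads = [threading.Thread(target=report_host, args=("192.0.2.7",))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = conn.execute(
    "SELECT total_reports FROM crackers WHERE ip_address = ?",
    ("192.0.2.7",),
).fetchone()[0]
```

A process-wide lock is the bluntest option; with a multi-connection database, row-level locking or an atomic upsert inside a transaction would be the finer-grained equivalent.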

Port to Python3

Python2 is end-of-life, so fix all compatibility issues with Python3.
Some of the dependencies may make this problematic: twistar may not have complete Python3 support.

As an alternative, the denyhosts sync server could be rewritten completely using the now standardized Python3 asyncio, aiohttp, aiohttp-xmlrpc and an asyncio ORM like tortoise-orm, but that would require some serious development effort.

High CPU usage

I switched over to the Python 3 code today, but system usage seems similar, so I spent some time on the DB (MariaDB/MySQL) and tweaked some of the settings (mostly --innodb-buffer-pool-size=5G), which resolved the massive HDD usage I was getting (denyhosts/denyhosts#149 (comment)).

But I'm not sure how to debug the remaining CPU usage. As the screenshots below show, the majority appears to come from denyhosts_sync.

Also worth noting: I tried running the sync server with an empty DB, which results in near 0 CPU usage. That suggests the issue is related to the large DB (~3GB), though this was only run for 10 minutes.
I just found #39 too, which shows SELECT * being used; I'd imagine that doesn't help the CPU load.

Let me know what you need to help debug the issue.

[screenshots]

Multiple synced servers

I'm working on this. There will be one master server and multiple slaves. Communication between master and slave is authenticated. Clients can connect to any server (master or slave). This is important, as it makes it possible to use round-robin DNS to distribute the clients over the servers, with as little client-side configuration as possible.
Slave servers send any updates they get to the master, which distributes them to the other slaves.

Too Many Queries Running On Getting New Hosts

In controllers.py there is the get_qualifying_crackers method. We've noticed that after the initial query on the crackers table, it uses the returned id and ip_address to query the reports table for each cracker. Looking at the function, it seems the additional queries could be avoided with the query below: all of the data needed appears to be in the crackers table, so there should be no need to query the reports table.

I think this could be used for lines 103 - 156. Let me know if there's anything I'm missing in the code that's not being handled in the SQL.

SELECT
	DISTINCT ip_address
FROM
	crackers
WHERE
	# check from last sync
	latest_time > 1590029927
	AND (
		# check a and b (reports and resiliency)
		(
			current_reports >= 3
			AND resiliency >= 3600
		)
		OR
		# check c and d (the condition for c is not given in this post)
		(
			# this is a resiliency check
			latest_time - first_time >= 3600
		)
	)
ORDER BY latest_time ASC;

This is a portion of the log, so you can see what's happening.

SELECT DISTINCT c.id, c.ip_address
            FROM crackers c
            WHERE (c.current_reports >= 3)
                AND (c.resiliency >= 18000)
                AND (c.latest_time >= 1590033382)
            ORDER BY c.first_time DESC
		    28 Query	COMMIT
		    12 Query	SELECT * FROM reports WHERE cracker_id = 291909 ORDER BY first_report_time ASC
		   109 Query	COMMIT
		    12 Query	COMMIT
		    52 Query	SELECT * FROM crackers WHERE id = 299601
		    43 Query	SELECT * FROM reports WHERE cracker_id = 287737 ORDER BY first_report_time ASC
		    52 Query	COMMIT
		    61 Query	SELECT * FROM reports WHERE cracker_id = 251808 ORDER BY first_report_time ASC
		   123 Query	SELECT * FROM crackers WHERE id = 311074
		   123 Query	COMMIT
		    61 Query	COMMIT
		    43 Query	COMMIT
		   127 Query	SELECT * FROM crackers WHERE id = 291040
		   127 Query	COMMIT
		   113 Query	SELECT * FROM crackers WHERE id = 289900
		    21 Query	SELECT * FROM reports WHERE cracker_id = 300001 ORDER BY first_report_time ASC
		   113 Query	COMMIT
		   140 Query	SELECT * FROM reports WHERE cracker_id = 268511 ORDER BY first_report_time ASC
		    21 Query	COMMIT
		    45 Query	SELECT * FROM reports WHERE cracker_id = 299803 ORDER BY first_report_time ASC
		   140 Query	COMMIT
		    45 Query	COMMIT
		    23 Query	SELECT * FROM crackers WHERE id = 261267
		    30 Query	SELECT * FROM reports WHERE cracker_id = 296089 ORDER BY first_report_time ASC
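
To make the single-query idea concrete, here is a small sketch that runs one consolidated query instead of one extra query per cracker. It uses an in-memory sqlite3 database with only the columns named in this issue; the qualifying_ips helper, the sample rows and the parameter names are assumptions, and whether this matches everything get_qualifying_crackers checks is exactly the open question above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crackers ("
    "id INTEGER PRIMARY KEY, ip_address TEXT, first_time INTEGER, "
    "latest_time INTEGER, current_reports INTEGER, resiliency INTEGER)"
)
conn.executemany(
    "INSERT INTO crackers "
    "(ip_address, first_time, latest_time, current_reports, resiliency) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("192.0.2.1", 1590000000, 1590030000, 5, 7200),   # enough reports and resiliency
        ("192.0.2.2", 1590029990, 1590030000, 1, 10),     # too few reports, too short-lived
        ("192.0.2.3", 1580000000, 1590000000, 9, 99999),  # not updated since last sync
    ],
)


def qualifying_ips(conn, last_sync, min_reports, min_resiliency):
    """Return qualifying IP addresses with a single query (hypothetical helper)."""
    rows = conn.execute(
        "SELECT ip_address FROM crackers "
        "WHERE latest_time > ? "
        "AND ((current_reports >= ? AND resiliency >= ?) "
        "     OR (latest_time - first_time >= ?)) "
        "ORDER BY latest_time ASC",
        (last_sync, min_reports, min_resiliency, min_resiliency),
    ).fetchall()
    return [row[0] for row in rows]


ips = qualifying_ips(conn, last_sync=1590029927, min_reports=3, min_resiliency=3600)
```

With the sample rows above, only 192.0.2.1 is returned, and the database sees one SELECT rather than one per candidate host.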

Performance

Check what happens when there is a lot of traffic and the database is nicely filled.
