This tool analyzes website traffic logs to identify Google bots, bad bot traffic, and human traffic. It uses various signals to differentiate between legitimate and illegitimate traffic.
- Identify Google bot traffic: Detects legitimate Google bots based on user agent strings, IP addresses, and other signals.
- Identify bad bot traffic: Detects known bad bots, fake Google bots, and the use of libraries and net tools based on user agent strings and other signals.
- Identify human traffic: Filters out bot traffic to identify human traffic.
- Clone the repository or download the script.
- Ensure you have Python 3.x installed.
- Install required dependencies using pip:
pip install pandas requests
Run the script from the command line with the appropriate arguments:
python traffic_analyzer.py <file> [options]
- `<file>`: The CSV file containing the logs to analyze.
- `--googlebot`: Identify Google bot traffic.
- `--badbot`: Identify bad bot traffic.
- `--human`: Identify human traffic.
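For readers who want to see how such a command line might be wired up, here is a minimal sketch using the standard `argparse` module. It is not the script's actual code: the function name `build_parser` is hypothetical, and treating each option as an independent on/off flag is an assumption, since the README does not say whether the options can be combined.

```python
import argparse


def build_parser():
    """Build a parser with one positional file and three mode flags.

    Assumption: each flag is a simple boolean switch (store_true).
    """
    parser = argparse.ArgumentParser(
        prog="traffic_analyzer.py",
        description="Analyze traffic logs for Google bots, bad bots, and humans.",
    )
    parser.add_argument("file", help="CSV file containing the logs to analyze")
    parser.add_argument("--googlebot", action="store_true",
                        help="identify Google bot traffic")
    parser.add_argument("--badbot", action="store_true",
                        help="identify bad bot traffic")
    parser.add_argument("--human", action="store_true",
                        help="identify human traffic")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere without command-line input.
args = build_parser().parse_args(["logs_analyst.csv", "--googlebot"])
print(args.file, args.googlebot, args.badbot, args.human)
```

Passing an explicit list to `parse_args` keeps the example independent of the real command line; a script would normally call `parse_args()` with no arguments.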
Identify Google bot traffic:
python traffic_analyzer.py logs_analyst.csv --googlebot
Identify bad bot traffic:
python traffic_analyzer.py logs_analyst.csv --badbot
Identify human traffic:
python traffic_analyzer.py logs_analyst.csv --human
- User agent: `Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)`
- IP address: Starts with `66.249.`
- `apiIpAutonomousSystemOrganization`: Contains `GOOGLE`
- JavaScript WebGL Renderer, WebDriver, and Hardware Concurrency: Null or NaN
- `apiEndpoint`: Must be `http`
- `fingerprintAccept`: Contains `text/html`
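The Google bot signals listed above can be combined into a single row-level check. The sketch below is one possible reading, not the script's actual implementation. Only `apiIpAutonomousSystemOrganization`, `apiEndpoint`, and `fingerprintAccept` are column names taken from this README; `userAgent`, `ip`, and the three JavaScript column names are hypothetical placeholders. Requiring an exact user-agent match, and requiring all signals to hold at once, are also assumptions.

```python
import math

# Hypothetical column names; adjust to match the real log schema.
UA_COL = "userAgent"
IP_COL = "ip"
JS_COLS = ("jsWebglRenderer", "jsWebdriver", "jsHardwareConcurrency")

GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def is_missing(value):
    """Return True when the value is None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_googlebot(row):
    """Check one dict-like log row against every Google bot signal."""
    return (
        row.get(UA_COL) == GOOGLEBOT_UA
        and str(row.get(IP_COL, "")).startswith("66.249.")
        and "GOOGLE" in str(row.get("apiIpAutonomousSystemOrganization", ""))
        and all(is_missing(row.get(col)) for col in JS_COLS)
        and row.get("apiEndpoint") == "http"
        and "text/html" in str(row.get("fingerprintAccept", ""))
    )


sample = {
    "userAgent": GOOGLEBOT_UA,
    "ip": "66.249.66.1",
    "apiIpAutonomousSystemOrganization": "GOOGLE",
    "jsWebglRenderer": None,
    "jsWebdriver": float("nan"),
    "jsHardwareConcurrency": None,
    "apiEndpoint": "http",
    "fingerprintAccept": "text/html,application/xhtml+xml",
}
print(is_googlebot(sample))
```

Because the function only relies on `.get`, it should also work on pandas rows, for example via `df.apply(is_googlebot, axis=1)`.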
- Known bad bots: Identified using an external list of bad user agents.
- Fake Google bots: User agent contains `Googlebot`, but the IP address does not belong to Google.
- Libraries and net tools: User agent strings like `curl`, `python-requests`, `PostmanRuntime`.
- Path traversal attacks: URLs containing `"/../"`
- No bot signals: none of the bot signals described above are present
- User agent strings that do not match known bad bots, libraries, and net tools
- URLs that do not contain path traversal patterns
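The bad bot signals and the human filter described above could be sketched together, since human traffic is defined as the absence of those signals. This is an illustration under stated assumptions rather than the script's actual code: the column names `userAgent`, `ip`, and `url` are hypothetical, the `bad_user_agents` parameter stands in for the external list (whose source is not given here), and "IP belongs to Google" is approximated by the `66.249.` prefix. A fuller version would presumably also exclude rows that match the Google bot signals.

```python
# User-agent fragments for libraries and net tools named in this README.
TOOL_MARKERS = ("curl", "python-requests", "PostmanRuntime")


def bad_bot_reasons(row, bad_user_agents=()):
    """Return the list of bad-bot signals that fire for one dict-like row."""
    ua = str(row.get("userAgent") or "")   # hypothetical column name
    ip = str(row.get("ip") or "")          # hypothetical column name
    url = str(row.get("url") or "")        # hypothetical column name
    ua_lower = ua.lower()

    reasons = []
    # Known bad bots: case-insensitive substring match against the list.
    if any(bad and bad.lower() in ua_lower for bad in bad_user_agents):
        reasons.append("known_bad_bot")
    # Fake Google bot: claims Googlebot, but IP is outside the assumed prefix.
    if "Googlebot" in ua and not ip.startswith("66.249."):
        reasons.append("fake_googlebot")
    # Libraries and net tools.
    if any(marker.lower() in ua_lower for marker in TOOL_MARKERS):
        reasons.append("library_or_net_tool")
    # Path traversal attempts.
    if "/../" in url:
        reasons.append("path_traversal")
    return reasons


def is_human(row, bad_user_agents=()):
    """Treat a row as human when no bad-bot signal fires."""
    return not bad_bot_reasons(row, bad_user_agents)


rows = [
    {"userAgent": "curl/8.0", "ip": "198.51.100.4", "url": "/a/../etc/passwd"},
    {"userAgent": "Mozilla/5.0 (Windows NT 10.0)", "ip": "198.51.100.5", "url": "/"},
]
for row in rows:
    print(bad_bot_reasons(row), is_human(row))
```

Returning the list of reasons, rather than a single boolean, makes it easier to see which signal flagged a given request.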
This project is licensed under the MIT License.