theharvester's Introduction

*********************************
*theHarvester 2.2a              *
*Coded by Christian Martorella  *
*[email protected]  *
*********************************

What is this?
-------------

theHarvester is a tool for gathering e-mail accounts, subdomain names, virtual hosts, open ports/banners, and employee names from different public sources (search engines, PGP key servers).

It is a really simple tool, but very effective in the early stages of a penetration test, or just for checking how visible your company is on the Internet.

The current sources are:

Passive:
--------
-google: google search engine  - www.google.com

-google-profiles: google search engine, specific search for Google profiles

-bing: microsoft search engine  - www.bing.com

-bingapi: microsoft search engine, through the API (you need to add your Key in the discovery/bingsearch.py file)

-pgp: pgp key server - pgp.rediris.es 

-linkedin: google search engine, specific search for Linkedin users

-shodan: Shodan computer search engine, searches for the ports and banners of the discovered hosts (http://www.shodanhq.com/)

-vhost: Bing virtual hosts search

Active:
-------
-DNS brute force: this plugin will run a dictionary brute force enumeration
-DNS reverse lookup: reverse lookup of the IPs discovered, in order to find hostnames
-DNS TLD expansion: TLD dictionary brute force enumeration


Dependencies:
------------
none

Changelog 2.2a:
---------------
-Fixed Linkedin parser (thanks Alton Johnson and Francesco Stillavato)
-New banner with superpowers

Changelog in 2.2:
-----------------
-Added Jigsaw (www.jigsaw.com)
-Added 123People (www.123people.com)
-Added a limit to Google searches, as the maximum number of results we can obtain is 1000
-Removed SET, as service was discontinued by Google
-Fixed parser to remove wrong results like emails starting with @


Changelog in 2.1:
----------------
-DNS Bruteforcer
-DNS Reverse lookups
-DNS TLD Expansion
-SHODAN DB integration
-HTML report
-DNS server selection 


Changelog in 2.0:
----------------
-Complete rewrite, more modular and easier to maintain
-New sources (Exalead, Google-Profiles, Bing-Api)
-Time delay between requests, to prevent search engines from blocking our IPs
-You can start the search from the results page that you want, hence you can *resume* a search
-Export to xml
-All search engines harvesting


TODO:
----
See TODO file.

Comments? Bugs? Requests?
-------------------------
[email protected]

Updates:
--------
http://code.google.com/p/theharvester/

Thanks:
-------
John Matherly -  SHODAN project
Lee Baird for suggestions and bug reports

theharvester's Issues

Bug

What steps will reproduce the problem?
1. run a command using -b linkedin
2. python theHarvester.py -d securitytube.net -l 500 -b linkedin
3. python theHarvester.py -d microsoft.com -l 500 -b linkedin
4. python theHarvester.py -d microsoft  -l 500 -b linkedin

What is the expected output? What do you see instead?
Expected to find info as in the examples, but nothing is output - 0 results

What version of the product are you using? On what operating system?
Backtrack 5 R3 version 2.2 of theHarvester

Please provide any additional information below.
I have tried several sites, including ones I know have LinkedIn links, info, 
and users (my workplace, myself), and they all came up empty. It also does not 
create an XML or HTML file when using linkedin (-f linkedin.html)

Original issue reported on code.google.com by [email protected] on 4 Feb 2013 at 1:23

how to install the api-key

I would like to know the steps for installing the API key needed to 
search in SHODAN.

I am using the latest version of Kali Linux.

Best regards!

Original issue reported on code.google.com by [email protected] on 5 Apr 2014 at 9:59

Case sensitive regex

The regexes in myparser.py are case sensitive. I think it makes sense to make 
them case insensitive to yield more results.

To do this for emails, change line 34 in myparser.py to:

reg_emails = re.compile('[a-zA-Z0-9.-_]*' + '@' + '[a-zA-Z0-9.-]*' + self.word, 
re.I)

^ re.I will also have to be added to the other regexes within myparser.py to 
make them case insensitive.

Original issue reported on code.google.com by [email protected] on 6 Mar 2013 at 11:39
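
The suggestion above can be sketched in isolation. This is a minimal,
standalone example, not the code from myparser.py: the function names and the
sample text are made up for illustration, and it is written for Python 3. It
also places the hyphen last in each character class, because inside a class
".-_" is read as a range running from "." to "_", which lets through
characters such as "<", ">" and "@".

```python
import re


def build_email_regex(domain):
    """Return a case-insensitive pattern for addresses ending in `domain`."""
    # The hyphen sits last in each class so it is a literal, not a range.
    local_part = r"[a-z0-9._-]+"
    # Optional subdomain labels between "@" and the searched domain.
    subdomains = r"(?:[a-z0-9-]+\.)*"
    return re.compile(local_part + "@" + subdomains + re.escape(domain), re.I)


def find_emails(text, domain):
    """Return the unique matches, lower-cased and sorted."""
    pattern = build_email_regex(domain)
    return sorted({match.lower() for match in pattern.findall(text)})


# Hypothetical sample text; neither address comes from a real search.
sample = "Contact John.Doe@Example.COM or sales@mail.example.com today."
emails = find_emails(sample, "example.com")
print(emails)
```

Lower-casing the matches before de-duplicating means that the same address
written with different capitalisation is reported only once.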

Feature Request: Show where results were found

Hi,

First of all thank you for taking the time to create theHarvester, I use it 
very often.

One thing I think is missing is the ability to output exactly where the 
email addresses were found. For example, if emails were found via a Google 
search, it would be cool to show the URLs on which the email addresses were 
found.

Thanks again for such a great tool!

Ryan

Original issue reported on code.google.com by [email protected] on 2 Jan 2013 at 10:02

Exalead - Invalid search engine

What steps will reproduce the problem?
1. theharvester -d cisco.com -b exalead

What is the expected output? What do you see instead?
Invalid search engine, try with: bing, google, linkedin, pgp, exalead, jigsaw, 
bing_api, people123, google-profiles


What version of the product are you using?
Ver. 2.2a 

On what operating system? 
Linux kali 3.14-kali1-486 #1 Debian 3.14.5-1kali1 (2014-06-07) i686 GNU/Linux



Original issue reported on code.google.com by [email protected] on 16 Jun 2014 at 4:24

theharvester

Hello sir,

I am a newcomer and a new student. My questions are:
1. How can I install Ubuntu Linux?
2. How can I use theHarvester on Windows XP?
3. How can I hack hosts, SMTP, RDP, and admin emails?
I would be glad to get a positive response to everything I have written 
above. Thank you very much. I wish to be one of your best students because I 
want to learn and become good as well. I hope to hear from you soon. Thanks 
very much.

Best regards,
oluwaseun

Original issue reported on code.google.com by [email protected] on 28 Aug 2011 at 6:17

html reports not generating completely

What steps will reproduce the problem?
1. Executing a search:

./theHarvester.py -d somedomain.com -l 500 -b all -f temp.html

What is the expected output? What do you see instead?
Output is not being entirely placed into the html output. Emails and hosts 
found are, but the link(s) prior to the email (e.g. google links to pages) are 
not being placed into the html report.

What version of the product are you using? On what operating system?
* TheHarvester Ver. 2.2a 

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 18 Sep 2013 at 1:54

Linkedin search

What steps will reproduce the problem?
1. ./theharvester.py -d microsoft -l 200 -b linkedin


What is the expected output? What do you see instead?
A list of LinkedIn users

What version of the product are you using? On what operating system?
theHarvester2.2

Please provide any additional information below.

The problem resides in the unique function in myparser.py, and the affected 
statement is 

if x[0] != "@":

I think this can be fixed by replacing the previous statement with this one:

if x != "" and x[0] != "@":

Original issue reported on code.google.com by [email protected] on 7 Feb 2013 at 10:13
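
Why the extra check helps can be shown with a small standalone sketch. The
function below is hypothetical and is not the unique function from
myparser.py; it only illustrates that indexing an empty string raises
IndexError, so the emptiness test has to come first.

```python
def unique_valid(items):
    """Drop empty strings, entries starting with "@", and duplicates."""
    seen = set()
    result = []
    for item in items:
        # `item != ""` is tested first: "" has no index 0, so `item[0]`
        # on its own would raise IndexError for an empty entry.
        if item != "" and item[0] != "@" and item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical parser output containing an empty entry and a stray "@" entry.
cleaned = unique_valid(["Alice Smith", "", "@example.com", "Alice Smith", "Bob Lee"])
print(cleaned)
```

Because Python evaluates `and` left to right and stops at the first false
operand, `item[0]` is never reached for an empty string.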

Debug information accidentally left in 2.2a?

$ ./theHarvester.py -d XXX.com -l 500 -b all

*******************************************************************
*                                                                 *
* | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
* | __| '_ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
* | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
*  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
*                                                                 *
* TheHarvester Ver. 2.2a                                          *
* Coded by Christian Martorella                                   *
* Edge-Security Research                                          *
* [email protected]                                   *
*******************************************************************


Full harvest..
[-] Searching in Google..
    Searching 0 results...
    Searching 100 results...
    Searching 200 results...
    Searching 300 results...
    Searching 400 results...
    Searching 500 results...
[-] Searching in PGP Key server..
[-] Searching in Bing..
    Searching 50 results...
    Searching 100 results...
    Searching 150 results...
    Searching 200 results...
    Searching 250 results...
    Searching 300 results...
    Searching 350 results...
    Searching 400 results...
    Searching 450 results...
    Searching 500 results...
[-] Searching in Exalead..




<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">





  <head >
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="description" content="The best search engine out there"/>
    <meta name="viewport" content="initial-scale = 1, user-scalable = no" />
    <meta http-equiv="X-UA-Compatible" content="IE=8" />
    <link rel="icon" href="http://www.3ds.com/favicon.ico"/>
    <link rel="shortcut icon" href="http://www.3ds.com/favicon.ico"/>

    <link type="application/opensearchdescription+xml" rel="search" title="Exalead" href="/go/opensearchdescription/"/>

    <link type="text/css" rel="stylesheet" href="/content/media/css/base.css?1349082334"/>



    <link rel="help" href="faq/"/>
    <link rel="home" href="/search/web/results/"/>




          <link rel="first" href="/search/web/results/?q=%XXX.com&amp;elements_per_page=15&amp;start_index=0"/>



          <link rel="next" href="/search/web/results/?q=%XXX.com&amp;elements_per_page=15&amp;start_index=15"/>




    <title>
  Web Search - "@XXX.com"
 - Exalead</title>


Original issue reported on code.google.com by [email protected] on 18 Feb 2013 at 9:49

Won't run due to dependency issue?

What steps will reproduce the problem?
1. Download version 2.2 (latest) tar from google code.
2. Untar.
3. Run. ($ ./theHarvester.py -f REDACTED -l 500 -b all)

What is the expected output? 

STDOUT of the results

What do you see instead?

$ ./theHarvester.py -f REDACTED -l 500 -b all

*************************************
*TheHarvester Ver. 2.2              *
*Coded by Christian Martorella      *
*Edge-Security Research             *
*[email protected]      *
*************************************


Full harvest..
[-] Searching in Google..
$

What version of the product are you using?

2.2 TAR from Google Code

On what operating system?

Mac OS X Mountain Lion

Please provide any additional information below.

I think it is probably due to a dependency issue. Would be useful if the 
dependency error was shown to know which is needed.

Original issue reported on code.google.com by [email protected] on 24 Jan 2013 at 10:32

Search engines added by me

Hi,

I've been working on some modifications and added support for private search 
engines like ixquick, duckduckgo and disconnect.me (with bing, google, yahoo 
and duckduckgo search capability). How can I collaborate with the project 
and upload that code?

Gilberto Najera

Original issue reported on code.google.com by [email protected] on 9 Nov 2014 at 7:25

Incomplete harvesting

What steps will reproduce the problem?
1.Run a search on a known site. 
2.
3.

What is the expected output? What do you see instead?
I expected to see all of the emails attached to the organization.
Three important email addresses are missing. They may be hidden by some 
type of security that I am unaware of, but I really do not expect all three 
to be hidden. They are openly advertised on the website, so they should be 
harvestable.


What version of the product are you using? On what operating system?
2.2a Kali

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 1 Oct 2013 at 9:09

Proxy support

It seems that httplib does not support the use of the system proxy variable.

Please can you add authenticated proxy support.

Original issue reported on code.google.com by [email protected] on 22 Jan 2013 at 5:06
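
For reference, the Python 3 standard library can route requests through an
authenticated proxy. The sketch below is not a patch for theHarvester (which
uses httplib); the helper name, proxy host, port, and credentials are all
placeholders, and no request is actually sent.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    # Credentials embedded as user:password@host are used by urllib
    # for basic proxy authentication.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder proxy address and credentials, for illustration only.
opener = build_proxy_opener("http://user:secret@proxy.example:3128")

# Proxies configured through environment variables such as http_proxy
# can be read with getproxies().
system_proxies = urllib.request.getproxies()
print(type(system_proxies).__name__)
```

A request would then be made with opener.open(url); calling ProxyHandler()
with no argument picks up the system proxy settings instead.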

Email addresses inside [] brackets will be listed as starting with [ (left bracket)

What steps will reproduce the problem?
1. theharvester.py -d gfong.com -l 50 -b google

What is the expected output?

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

What do you see instead?

[email protected]
[[email protected]
[[email protected]
[email protected]
[email protected]
[email protected]

What version of the product are you using? On what operating system?

2.2a on Windows 7

Please provide any additional information below.

The problem occurs when an email-address is put inside square brackets.
This is common in mail exports, like this:
From: John Doe [[email protected]]
Sent: Fri 13, 1337
To: D. Evil [[email protected]]

Saw an earlier bug fix, removing preceding @ from addresses. Probably the same 
issue? It should anyhow be easy to filter out the square brackets.

Original issue reported on code.google.com by [email protected] on 10 Apr 2013 at 8:15
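
One possible way to filter the brackets out is to strip wrapping characters
from each match before it is reported. This is a hedged sketch with a
hypothetical helper name and made-up addresses; it is not taken from the
tool's parser.

```python
def strip_wrapping(address):
    """Remove brackets, quotes, and spaces around a harvested address."""
    # str.strip() with an argument removes any of these characters
    # from both ends, and leaves the middle of the string untouched.
    return address.strip("[]<>()\"' ")


# Hypothetical raw matches, as they might appear in a mail export.
raw = ["[john@example.com", "jane@example.com]", "<bob@example.com>"]
cleaned = [strip_wrapping(address) for address in raw]
print(cleaned)
```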

dns-brute forcing error and fix

What steps will reproduce the problem?
1. theharvester -d target-domain -b all -v -f target-domain.html -n -c
2.
3.

What is the expected output? What do you see instead?

-Expected output of the command should include a dns brute forcing of said 
target.
-What is seen is an error that the program is having trouble opening 
dns-search.txt


What version of the product are you using? On what operating system?
-Version of theharvester is 2.2a on fully updated Kali Linux

Please provide any additional information below.
-I solved the issue by replacing "f = open(self.file,"r")" in 
dnssearch-threads.py to "f = open(self.file,"dns-search.txt")" and all ran 
great with no errors.

Hopefully this will help others that are having the same issues

Original issue reported on code.google.com by [email protected] on 20 Apr 2014 at 10:07
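
A note on the workaround above: the second argument of open() is the mode
(for example "r"), not a file name, so the file name belongs in the first
argument. Assuming the error comes from a relative wordlist path being
resolved against the current working directory, a more general sketch is to
resolve it against a known base directory. The helper name and the demo file
are hypothetical.

```python
import os
import tempfile


def load_wordlist(path, base_dir=None):
    """Read one entry per line, resolving relative paths against base_dir."""
    if base_dir is not None and not os.path.isabs(path):
        path = os.path.join(base_dir, path)
    # "r" is the mode (read text); the file name is the first argument.
    with open(path, "r") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demo with a temporary directory standing in for the tool's folder.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "dns-names.txt"), "w") as handle:
        handle.write("www\nmail\n\nftp\n")
    names = load_wordlist("dns-names.txt", base_dir=demo_dir)

print(names)
```

In a script, base_dir could be os.path.dirname(os.path.abspath(__file__)),
so the wordlist is found regardless of where the command is started.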

Emails appear incorrect when Google results are truncated

What steps will reproduce the problem?
1. Perform a harvester query, for a known organisation
2. Notice that when you attempt to modify the email regex to:
(' ' + '[a-zA-Z0-9.-_]*' + '.' + '[a-zA-Z0-9.-_]*' + '@' + '[a-zA-Z0-9.-]*' + 
self.word)
You will begin to see some results appearing as "... [email protected]"
3. These results are incorrectly being parsed, due to the fact that you are 
creating the results not from the pages, but including truncated google results.

What is the expected output? What do you see instead?

Expected: "[email protected]" - as viewed on webpage.
Actual: "... [email protected]" - From Truncated google result.

What version of the product are you using? On what operating system?
2.2a - Mac OS X

Please provide any additional information below.

I cannot see a fix for this, unless you provide a future command line switch 
e.g. -IF (Investigate further and attempt to curl/ grep the page for the 
corresponding result.)

Original issue reported on code.google.com by [email protected] on 7 Sep 2014 at 12:22

Permission

What steps will reproduce the problem?

1. Trying to run theHarvester, but I get "Permission denied" even though I 
have full privileges
2. Running it with ./theHarvester.py

3. Version 2.2a

What is the expected output? What do you see instead?
Should run but get Permission denied

What version of the product are you using? On what operating system?
Harvester 2.2a LinuxMint


Please provide any additional information below.

All I can add

Original issue reported on code.google.com by [email protected] on 13 Apr 2015 at 12:24

exalead not in list of search engines

What steps will reproduce the problem?
1. enter any search that is solely based on exalead, e.g. ./theHarvester.py -d 
www.wikipedia.org -b exalead

What is the expected output? What do you see instead?
I should get a valid search, instead I get this:

Invalid search engine, try with: bing, google, linkedin, pgp, exalead, jigsaw, 
bing_api, people123, google-profiles


What version of the product are you using? On what operating system?
Mac OS X 10.6: Version 2.2a

Please provide any additional information below.
I tracked the error at line 89 of theHarvester.py: the list is not complete, 
"exalead" is not listed. When fixing this in my version, it all runs smoothly. 
Also yandex is in the list, but currently unsupported, maybe exclude it for now?


Original issue reported on code.google.com by [email protected] on 6 Oct 2013 at 1:43
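
The mismatch described above, where the error message offers an engine that
the check itself rejects, can be avoided by building both from one list. This
is a minimal sketch, not line 89 of theHarvester.py: the names come from the
error message quoted in the report, "all" is added as an assumption because
other reports use -b all, and the function name is hypothetical.

```python
# Engine names from the quoted error message; "all" is an assumption.
SUPPORTED_ENGINES = (
    "bing", "google", "linkedin", "pgp", "exalead", "jigsaw",
    "bing_api", "people123", "google-profiles", "all",
)


def validate_engine(name):
    """Return name if it is supported, otherwise raise ValueError."""
    if name not in SUPPORTED_ENGINES:
        # The message is generated from the same tuple used for the check,
        # so the two cannot drift apart.
        raise ValueError(
            "Invalid search engine, try with: " + ", ".join(SUPPORTED_ENGINES)
        )
    return name


checked = validate_engine("exalead")
print(checked)
```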

Unable to scrape google for over 1000 results

What steps will reproduce the problem?
1. choose google with the -b argument
2. choose a number over 1000 with the -l argument 
3. run the tool
What is the expected output? What do you see instead?
I expect to see the application keep counting up the number of pages; instead 
it stops at 1000 rather than going up to the specified -l parameter. 

What version of the product are you using? On what operating system?
latest on kali linux 

Please provide any additional information below.
you guys kick ass ! 



Original issue reported on code.google.com by [email protected] on 3 Mar 2014 at 6:47
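
The 2.2 changelog mentions a limit on Google searches because at most 1000
results can be obtained, which would explain the behaviour reported here. How
such a cap interacts with -l can be sketched as follows; the constant,
function name, and step size of 100 are assumptions for illustration, not the
tool's actual code.

```python
GOOGLE_MAX_RESULTS = 1000  # cap taken from the changelog note


def page_offsets(limit, step=100, cap=GOOGLE_MAX_RESULTS):
    """Return the start offsets to request, never reaching past the cap."""
    # min() keeps a large user limit from exceeding what can be fetched.
    return list(range(0, min(limit, cap), step))


offsets = page_offsets(2500)
print(len(offsets), offsets[-1])
```

With a limit of 2500 the offsets stop at 900, so only the first 1000 results
are ever requested.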

Cannot Generate XML report

When I try to generate an XML file as the output, I still get an HTML file.

* Expected outcome: an XML file.


* I am using a recent version of theHarvester on Ubuntu.


Please tell me the option for generating an XML file as output.

Command used: ./theHarvester.py -d <HOSTNAME> -l 500 -b all -f har.xml

Original issue reported on code.google.com by [email protected] on 22 Feb 2012 at 4:26

local variable 'full' referenced before assignment

Hello,

I have encountered a problem with theHarvester 2.2a and 2.2.
In both versions, I am unable to generate an HTML file with the -f option.

The following message is displayed:

Saving file
local variable 'full' referenced before assignment
Error creating the file

I have also tried several versions of Python (python, python2, 2.7).
I am running Mint 16.
Please let me know if you need more information.

Original issue reported on code.google.com by [email protected] on 5 Feb 2014 at 10:37
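
The message in this report is what Python prints when a local variable is
assigned in only some branches and then read afterwards. The sketch below
reproduces that pattern in isolation; the function names and the branch
condition are hypothetical and do not come from theHarvester's report code.

```python
def render_report_buggy(emails, hosts):
    """Assign `full` in one branch only, then read it unconditionally."""
    if emails and hosts:
        full = emails + hosts
    # Raises UnboundLocalError whenever the branch above was skipped.
    return "\n".join(full)


def render_report(emails, hosts):
    """Initialise `full` before any branch so it always exists."""
    full = []
    full.extend(emails)
    full.extend(hosts)
    return "\n".join(full)


try:
    render_report_buggy([], [])
    message = ""
except UnboundLocalError as error:
    message = str(error)

report = render_report(["a@example.com"], [])
print(message)
print(report)
```

The exact wording of the exception differs between Python versions, but it
always names the variable that was read before being assigned.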
