
diegocaraballo / email-extractor


The main functionality is to extract all the email addresses from one or several URLs.

Home Page: https://whitemonkey.io

Python 98.49% Shell 0.58% Dockerfile 0.93%
spyder extraction stractor email-extractor email email-marketing emails scraper scrapy scraping scraping-websites scrapper scrapping scrapy-spider scrapers python

email-extractor's Introduction

Add Feature: 13-07-2022

  • You can save the mailing list in a .csv file

Fix: 13-09-2019

  • Fix - The script hung when searching for phrases on Google.
  • Add Requirements - pip install -r requirements.txt

Email Extractor Functions

English

  • (1) Extract emails from a single URL

  • (2) Extract emails from a URL (Two Levels) - Search on the page and all its URLs

  • (3) Do a Google search, save the URLs found and search them for emails

  • (4) Same as option 3 but with a list of keywords (TODO)

  • (5) You can list the saved emails

  • (6) You can save the mailing list in a .txt file

  • (7) You can save the mailing list in a .csv file

  • (8) Delete emails from the database

  • (9) Exit

  • The emails are stored in a SQLite database ("Emails.db")
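
As a rough illustration of options 1, 5 and 7, the sketch below pulls addresses out of a page's HTML with a regular expression, stores them in SQLite, and writes them out as CSV. It is not the project's code: the function names, the `emails` table and its `address` column are assumptions made for this example, and the pattern is deliberately simple (it will miss some valid addresses and accept some invalid ones).

```python
import csv
import io
import re
import sqlite3

# Simplified pattern, assumed for this sketch; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return the unique addresses found in the text, lowercased and sorted."""
    return sorted({match.lower() for match in EMAIL_RE.findall(html)})


def save_emails(conn, emails):
    """Insert addresses into a hypothetical `emails` table, ignoring duplicates."""
    conn.execute("CREATE TABLE IF NOT EXISTS emails (address TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO emails (address) VALUES (?)",
        [(email,) for email in emails],
    )
    conn.commit()


def export_csv(conn, out):
    """Write all stored addresses to a file-like object as one-column CSV."""
    writer = csv.writer(out)
    writer.writerow(["email"])
    writer.writerows(conn.execute("SELECT address FROM emails ORDER BY address"))


# Stand-in for a downloaded page; a real run would fetch the HTML first.
page = '<a href="mailto:Info@example.com">Info@example.com</a> or sales@example.org'

conn = sqlite3.connect(":memory:")  # use a file path such as "Emails.db" to persist
save_emails(conn, extract_emails(page))

buffer = io.StringIO()
export_csv(conn, buffer)
print(buffer.getvalue())
```

Using the address as the primary key together with `INSERT OR IGNORE` keeps repeated runs from storing the same address twice.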

Version: Python 3.x.

Required modules

pip install -r requirements.txt

Extract emails from web pages with Python

Docker

Docker and docker-compose are required.

To use Docker, follow the instructions below:

Installation

  1. Get an .env file
cp .env.example .env
  2. Start the Docker container
docker-compose up -d --build

Usage

To execute the script and get the options menu:

docker exec -ti email-extractor python EmailExtractor.py

To get the SQLite database with all the emails:

docker cp email-extractor:Emails.db .

To get a saved list file, for instance one saved as "out":

docker cp email-extractor:out.txt .

email-extractor's People

Contributors

dcaraballo, dependabot[bot], diegocaraballo, gregord1a1

email-extractor's Issues

HTTP Error 503: Service Unavailable

I was running it on a VPS on GCP.
After several searches, it had scraped 91 emails and then I got this message:
Searching emails... please wait
This operation may take several minutes
HTTP Error 503: Service Unavailable
Press enter to continue
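
One possible way to soften this kind of failure is to retry with a growing delay when the server answers 503 (or 429), on the assumption that the error is a temporary rate limit rather than a permanent block. The sketch below is not the project's code; `fetch_with_retry` and its parameters are names invented for this example, and it uses only the standard library.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, opener=urllib.request.urlopen, retries=3,
                     base_delay=2.0, sleep=time.sleep):
    """Fetch url, retrying on HTTP 429/503 with exponentially growing delays.

    `opener` and `sleep` are injectable so the function can be tried out
    without touching the network.
    """
    for attempt in range(retries + 1):
        try:
            with opener(url, timeout=10) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a "slow down" style answer,
            # and give up once the retry budget is spent.
            if err.code not in (429, 503) or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ...
```

Retrying does not help if the remote side has blocked the address outright; in that case fewer results per search or longer pauses between searches may be needed.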

ERROR when executing the script

Whenever I run the script, it fails on line 6 (`from googlesearch import search`) with "ImportError: No module named googlesearch".

Add a list of phrases to search

The idea is to have an option like option 3 (do a Google search, save the URLs found and search them for emails), but one that searches a list of phrases.

This list can be in a .txt file.
The option can ask for the number of Google search results to fetch.
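
A minimal sketch of how such an option might be structured, assuming one phrase per line in the .txt file. The names `load_phrases`, `search_all` and `search_fn` are hypothetical; `search_fn` stands in for whatever search call the script already uses for option 3.

```python
from pathlib import Path


def load_phrases(path):
    """Read one search phrase per line, skipping blanks and duplicates."""
    seen = set()
    phrases = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        phrase = line.strip()
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
    return phrases


def search_all(phrases, search_fn, results_per_phrase):
    """Run search_fn(phrase, results_per_phrase) for each phrase.

    Returns the URLs in the order they were first seen, without repeats.
    """
    urls = []
    for phrase in phrases:
        for url in search_fn(phrase, results_per_phrase):
            if url not in urls:
                urls.append(url)
    return urls
```

The collected URLs could then be fed to the same email search that option 3 performs.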

Unknown URL type and sites that hang scraper

Hello,
I'm getting this error that stops the program from extracting emails.

unknown url type: '[email protected]'
Press enter to continue

There is also an issue with some sites that hang the scraper. I'm not sure if it can be overcome, but here are some examples of such sites; maybe you can figure it out from them.

Searching in https://whyy.streamguys1.com/whyy-mp3

Searching in http://www.investor.reuters.com/business/BusCompanyOverview.aspx?ticker=SCI&symbol=SCI&target=%2fbusiness%2fbuscompany%2fbuscompfake%2fbuscompoverview

Searching in http://www.accuweather.com/en/us/jersey-city-nj/07306/weather-forecast/2735_pc

groupon.com


I'm also getting this error:
`[Errno 104] Connection reset by peer`
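
The three symptoms above (a non-HTTP link passed to the fetcher, sites that never finish responding, and connection resets) can be guarded against in roughly the following way. This is a sketch under assumptions, not the project's code: `normalize_link` and `fetch_or_skip` are names made up for this example, and the size cap is an arbitrary value.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlparse


def normalize_link(base_url, href):
    """Resolve href against base_url; return None unless it is an http(s) URL."""
    absolute = urljoin(base_url, href.strip())
    parts = urlparse(absolute)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None  # mailto:, javascript:, tel: and similar links are skipped
    return absolute


def fetch_or_skip(url, timeout=10, max_bytes=2_000_000):
    """Return at most max_bytes of the body, or None if the site misbehaves.

    The timeout bounds each socket operation; the byte cap stops endless
    responses such as audio streams from being read forever.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read(max_bytes)
    except (urllib.error.URLError, ConnectionError, socket.timeout, ValueError):
        # URLError: DNS/HTTP failures; ConnectionError: resets such as Errno 104;
        # ValueError: malformed URLs ("unknown url type").
        return None
```

Returning `None` instead of raising lets a crawl loop move on to the next URL rather than stopping at the first bad site.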

Run the Email Extractor

Hi!

I have downloaded the Email Extractor but I don't know how to run it!

I have Python 3.6 on Windows and also have Anaconda installed.

Could you tell me how to execute the Email Extractor?

When I click on the Python file it opens a black prompt window, but it suddenly closes.

Any help would be welcome.

Thanks

googlesearch module not found

Hi,

I am trying to install the 'googlesearch' module with pip but it can't be found.

sudo pip install googlesearch

Returns this error

Could not find a version that satisfies the requirement googlesearch (from versions: ) No matching distribution found for googlesearch

What can I do?

Thanks
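
When chasing this kind of import problem, it can help to check which interpreter is running and whether it can see the module at all, since the name used in `import` and the name of the package on PyPI may differ, and `pip` may install into a different Python than the one running the script. The snippet below is only a diagnostic sketch; `module_status` is a helper name invented here.

```python
import importlib.util
import sys


def module_status(name):
    """Describe whether `name` is importable by the interpreter running this."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not importable from {sys.executable}"
    return f"{name}: found at {spec.origin}"


print(sys.version)
print(module_status("googlesearch"))
```

Running it with the same command used to start the script (for example `python3`) shows whether the packages from requirements.txt were installed for that interpreter.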

Google Search Error - Ubuntu 18.04

Hi,

When I try to execute your program on Ubuntu 18.04, I get the following error message. Is this normal?

Traceback (most recent call last):
File "./EmailExtractor.py", line 6, in <module>
from googlesearch import search
ImportError: No module named googlesearch

I have installed the following packages below:
sudo apt-get install pip3
pip3 install google
pip3 install google-search
pip3 install beautifulsoup4
pip3 install fake-useragent

ImportError: cannot import name 'search'

~/Email-extractor$ python3 EmailExtractor.py 
Traceback (most recent call last):
  File "EmailExtractor.py", line 6, in <module>
    from googlesearch import search
ImportError: cannot import name 'search'

I'm running Ubuntu 18.04, have installed the requirements, and run the script using python3. Could you please help?
