linkedint's Introduction

Credits

Original Scraper by Danny Chrastil (@DisK0nn3cT): https://github.com/DisK0nn3cT/linkedin-gatherer

Modified by @vysecurity

Requirements

pip install beautifulsoup4
pip install thready

Change Log

[v0.1 BETA 12-07-2017] Additions:

  • UI Updates
  • Constrain to company filters
  • Addition of Hunter for e-mail prediction

To-Do List

  • Allow for horizontal scraping, with mass automated company domain and e-mail format prediction per company
  • Add Natural Language Processing techniques on titles to discover groups of similar titles that can be placed into the same "department". This should then be visualised in a graph.

Usage

  • Put your LinkedIn credentials in LinkedInt.py
  • Put your Hunter.io API key in LinkedInt.py
  • Run LinkedInt.py and follow the instructions

Example

██╗     ██╗███╗   ██╗██╗  ██╗███████╗██████╗ ██╗███╗   ██╗████████╗
██║     ██║████╗  ██║██║ ██╔╝██╔════╝██╔══██╗██║████╗  ██║╚══██╔══╝
██║     ██║██╔██╗ ██║█████╔╝ █████╗  ██║  ██║██║██╔██╗ ██║   ██║
██║     ██║██║╚██╗██║██╔═██╗ ██╔══╝  ██║  ██║██║██║╚██╗██║   ██║
███████╗██║██║ ╚████║██║  ██╗███████╗██████╔╝██║██║ ╚████║   ██║
╚══════╝╚═╝╚═╝  ╚═══╝╚═╝  ╚═╝╚══════╝╚═════╝ ╚═╝╚═╝  ╚═══╝   ╚═╝

Providing you with Linkedin Intelligence
Author: Vincent Yiu (@vysec, @vysecurity)
Original version by @DisK0nn3cT
[*] Enter search Keywords (use quotes for more percise results)
"General Motors"

[*] Enter filename for output (exclude file extension)
generalmotors

[*] Filter by Company? (Y/N):
Y

[*] Specify a Company ID (Provide ID or leave blank to automate):


[*] Enter e-mail domain suffix (eg. contoso.com):
gm.com

[*] Select a prefix for e-mail generation (auto,full,firstlast,firstmlast,flast,first.last,fmlast):
auto

[*] Automaticly using Hunter IO to determine best Prefix
[!] {first}.{last}
[+] Found first.last prefix

linkedint's People

Contributors

binary1985, blackout314, mdsecactivebreach, n0pe-sled, vysec

linkedint's Issues

UnboundLocalError: local variable 'user' referenced before assignment

Running LinkedInt on either macOS or Kali Linux, I encounter this error whenever I try to do a scrape.

Here is a transcript of the session (with company and person info redacted)

[*] Enter search Keywords (use quotes for more percise results)


[*] Enter filename for output (exclude file extension)
LinkedInt-Test

[*] Filter by Company? (Y/N): 
Y

[*] Specify a Company ID (Provide ID or leave blank to automate): 
<REDACTED>

[*] Enter e-mail domain suffix (eg. contoso.com): 
<REDACTED>.com

[*] Select a prefix for e-mail generation (auto,full,firstlast,firstmlast,flast,first.last,fmlast,lastfirst): 
auto 

[*] Automaticly using Hunter IO to determine best Prefix
[!] {first}
[+] Found first prefix

[!] Cannot load main LinkedIn page
<REDACTED>
[*] Obtained new session: <REDACTED>

[*] Using company ID: <REDACTED>
https://www.linkedin.com/voyager/api/search/cluster?count=40&guides=List(v->PEOPLE,facetCurrentCompany-><REDACTED>)&origin=OTHER&q=guided&start=0
[*] 122 Results Found
[*] Fetching 3 Pages

[*] Fetching page 0 with 40 results
[*] No picture found for <REDACTED>, ___
Traceback (most recent call last):
  File "LinkedInt.py", line 477, in <module>
    get_search()
  File "LinkedInt.py", line 273, in get_search
    email = '{}@{}'.format(user, suffix)
UnboundLocalError: local variable 'user' referenced before assignment
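
In the transcript Hunter returned `{first}`, which is not one of the prefixes listed in the prompt. One plausible cause of the error (an assumption, since the relevant part of LinkedInt.py is not shown here) is that no branch matched that prefix, so `user` was never assigned. A minimal sketch of a lookup that fails loudly instead; `build_email` and `PREFIX_FORMATS` are hypothetical names, and the address layout chosen for each prefix is a guess made for illustration:

```python
# Hypothetical helper, not code from LinkedInt.py.
# Only prefixes with an obvious layout are listed; the layouts are assumptions.
PREFIX_FORMATS = {
    "firstlast": "{first}{last}",
    "flast": "{f}{last}",
    "first.last": "{first}.{last}",
    "lastfirst": "{last}{first}",
    "first": "{first}",  # the pattern Hunter returned in the transcript above
}


def build_email(prefix, first, last, suffix):
    """Return an address for a known prefix, or raise ValueError.

    Raising here replaces the silent fall-through that would otherwise
    leave the local part of the address unassigned.
    """
    template = PREFIX_FORMATS.get(prefix)
    if template is None:
        raise ValueError("Unsupported e-mail prefix: %r" % (prefix,))
    first = first.strip().lower()
    last = last.strip().lower()
    if not first or not last:
        raise ValueError("Both first and last name are required")
    # str.format ignores keyword arguments the template does not use.
    user = template.format(first=first, last=last, f=first[:1])
    return "{}@{}".format(user, suffix)
```

Calling `build_email("first", "Jane", "Doe", "example.com")` builds an address from the first name only, while an unknown prefix raises a `ValueError` that names the prefix.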

Results limitation

I currently have a LinkedIn premium account which I have recently downgraded to a basic account; however, it's still a valid premium account for a few more days (technically). That said, I do receive the following error:

[*] 17459 Results Found
[*] LinkedIn only allows 1000 results. Refine keywords to capture all data

The target org does have nearly 20k employees on LI - so that is accurate. Ideally, I would like to extract the entire company list from LI. That said, I'm not sure if this error is due to my account or if this same error would happen anyway. I didn't really want to pay to find out... Does anyone know offhand?
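
To see how far the cap cuts into a result set, the request offsets can be worked out by hand. This is a sketch under assumptions: the page size of 40 comes from `count=40` in the search URL shown in the first issue, the cap of 1000 comes from the message above, and `page_offsets` is a hypothetical helper rather than the tool's own paging logic. It does not answer whether a premium account changes the cap.

```python
def page_offsets(total_results, page_size=40, cap=1000):
    """Return the `start` values needed to walk the reachable results.

    Results beyond `cap` are treated as unreachable for a single search.
    """
    reachable = min(total_results, cap)
    return list(range(0, reachable, page_size))


offsets = page_offsets(17459)
print("%d requests, last start=%d" % (len(offsets), offsets[-1]))
```

With 17459 hits only the first 1000 are reachable through one search, so the remaining profiles would need several narrower searches (for example by keyword), as the message suggests.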

Keyerror 'data'

Getting this at the end

Traceback (most recent call last):
  File "LinkedInt.py", line 447, in <module>
    prefix = content['data']['pattern']
KeyError: 'data'
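
The `KeyError` means the parsed Hunter response had no `data` key, which would happen if Hunter returned an error body instead of a result (for example for a bad or missing API key). A minimal sketch of a more defensive lookup; `hunter_pattern` is a hypothetical helper, and the presence of an `errors` key in failed responses is an assumption about the Hunter API:

```python
def hunter_pattern(content):
    """Return the e-mail pattern from a parsed Hunter response.

    Returns None when Hunter answered but knows no pattern for the domain.
    Raises ValueError, including any error details, when `data` is absent.
    """
    data = content.get("data") if isinstance(content, dict) else None
    if not isinstance(data, dict):
        # Assumption: failed requests carry their details under "errors".
        errors = content.get("errors") if isinstance(content, dict) else None
        raise ValueError("Hunter response has no 'data' key; errors: %r" % (errors,))
    return data.get("pattern")
```

Printing the raised message should show why Hunter rejected the request, which is more useful than the bare `KeyError`.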

ImportError: No module named thready

Hi,

I am getting the following error:

Traceback (most recent call last):
  File "LinkedInt.py", line 27, in <module>
    from thready import threaded
ImportError: No module named thready
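
This error means `thready` is not installed for the interpreter that runs the script, which can happen when `pip` belongs to a different Python version. A small sketch for checking which required modules are importable from a given interpreter; `missing_modules` is a hypothetical helper, and note that the `beautifulsoup4` package is imported as `bs4`:

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Run this with the same interpreter that runs LinkedInt.py.
print(missing_modules(["bs4", "thready"]))
```

If `thready` is reported, installing it through the same interpreter (for example `python -m pip install thready`) avoids the mismatch.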

SyntaxError: Missing parentheses in call to 'print' statements

My environment:
OS: Manjaro 18.0.2 Illyria
Kernel: x86_64 Linux 4.19.13-1-MANJARO
Python 3.7.1 (default, Oct 22 2018, 10:41:28)
[GCC 8.2.1 20180831]
Installed beautifulsoup4 and thready.

I get this error when running.

LinkedInt-master]$ python LinkedInt.py 
  File "LinkedInt.py", line 45
    print "[!] Oops, you did not enter your api_key, username, or password in LinkedInt.py"
                                                                                          ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("[!] Oops, you did not enter your api_key, username, or password in LinkedInt.py")?
LinkedInt-master]$ 
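
The script uses Python 2 `print` statements, so Python 3.7 rejects it before running anything. Either run it with a Python 2 interpreter or convert the statements to function calls. A minimal sketch of the converted form of the line from the traceback; `credentials_missing` is a hypothetical helper, and the three setting names are taken from the error message:

```python
def credentials_missing(api_key, username, password):
    """Return True when any of the three settings is empty."""
    return not all((api_key, username, password))


# print(...) with a single argument is valid on both Python 2 and Python 3.
if credentials_missing("", "user@example.com", "secret"):
    print("[!] Oops, you did not enter your api_key, username, or password in LinkedInt.py")
```

Converting this one line only moves the error to the next `print` statement; every such statement in the file needs the same change.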

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.linkedin.com', port=443): Max retries exceeded with url

Seem to be getting the following error message upon startup of the script

Traceback (most recent call last):
  File "LinkedInt.py", line 465, in <module>
    get_search()
  File "LinkedInt.py", line 189, in get_search
    r = requests.get(url, cookies=cookies, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.linkedin.com', port=443): Max retries exceeded with url: /voyager/api/search/cluster?count=40&guides=List(v-%3EPEOPLE,facetCurrentCompany-%3E403184)&origin=OTHER&q=guided&start=0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f12ae031cd0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
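
`[Errno -2] Name or service not known` is a name resolution failure: the machine could not look up `www.linkedin.com`, so the request never reached LinkedIn. A quick way to confirm this outside the script is a standard-library lookup. This is a sketch; `can_resolve` is a hypothetical helper, and its `resolver` parameter exists only so the function can be exercised without network access:

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if the host name resolves, False on a lookup failure."""
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    print(can_resolve("www.linkedin.com"))
```

If this prints `False`, the problem lies with DNS or network configuration (proxy, VPN, `/etc/resolv.conf`) rather than with LinkedInt.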

No module named thready

Any help with this error will be appreciated:
Running on Python 2.7.13

Traceback (most recent call last):
  File "LinkedInt.py", line 27, in <module>
    from thready import threaded
ImportError: No module named thready

IndexError: list index out of range

I'm trying to run this for a company with 812 employees and getting this error:

Traceback (most recent call last):
  File "LinkedInt.py", line 477, in <module>
    get_search()
  File "LinkedInt.py", line 206, in get_search
    print "[*] Fetching page %i with %i results" % ((p),len(content['elements'][0]['elements']))
IndexError: list index out of range
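
The failing expression indexes `content['elements'][0]`, so the error means the outer `elements` list was empty for that page. Why LinkedIn returned an empty page is not clear from the traceback. A minimal sketch of a guarded lookup that treats such a page as having no hits; `page_hits` is a hypothetical helper:

```python
def page_hits(content):
    """Return the hits for one result page, or [] when the page is empty."""
    clusters = content.get("elements") or []
    if not clusters:
        return []
    return clusters[0].get("elements") or []


# An empty page no longer raises; the caller can skip it or stop paging.
hits = page_hits({"elements": []})
print("[*] Fetching page %i with %i results" % (0, len(hits)))
```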

LinkedIn changed json format - images in html output are broken - fix included

LinkedIn have changed their json format

So the code on line 212 no longer works:
data_picture = "https://media.licdn.com/mpr/mpr/shrinknp_400_400%s" % c['hitInfo']['com.linkedin.voyager.search.SearchProfile']['miniProfile']['picture']['com.linkedin.voyager.common.MediaProcessorImage']['id']

It should be replaced with:
data_picture = c['hitInfo']['com.linkedin.voyager.search.SearchProfile']['miniProfile']['picture']['com.linkedin.common.VectorImage']['rootUrl'] + c['hitInfo']['com.linkedin.voyager.search.SearchProfile']['miniProfile']['picture']['com.linkedin.common.VectorImage']['artifacts'][3]['fileIdentifyingUrlPathSegment']

The [3] is the highest quality 800x800 but could be replaced with [0],[1] or [2] for lower quality images.
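
The replacement line still assumes every profile has a picture with four artifacts. A sketch that wraps the same lookup in a function and returns `None` when any part is missing; `picture_url` is a hypothetical helper, the key names are the ones quoted in this issue, and clamping the index to the available artifacts is an added assumption:

```python
PROFILE_KEY = "com.linkedin.voyager.search.SearchProfile"
VECTOR_KEY = "com.linkedin.common.VectorImage"


def picture_url(hit, quality=3):
    """Build the picture URL for one search hit, or return None.

    `quality` is the artifact index; it is clamped so that a profile
    with fewer artifacts still yields its largest available image.
    """
    mini = hit.get("hitInfo", {}).get(PROFILE_KEY, {}).get("miniProfile", {})
    vector = (mini.get("picture") or {}).get(VECTOR_KEY)
    if not vector:
        return None
    artifacts = vector.get("artifacts") or []
    if not artifacts:
        return None
    index = max(0, min(quality, len(artifacts) - 1))
    segment = artifacts[index].get("fileIdentifyingUrlPathSegment", "")
    return vector.get("rootUrl", "") + segment
```

A caller would then use `picture_url(c)` in place of the long expression and fall back to a placeholder image when it returns `None`.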

Could not authenticate, creds are good.

Got one good run pulling data as expected, subsequent tries produce the error:

Could not authenticate to linkedin. 'NoneType' object has no attribute '__getitem__'

Creds are verified good and I'm able to login to LinkedIn from two different machines.

Any idea?
