
linkedin_scraper's Introduction

Linkedin Scraper

Scrapes Linkedin User Data


Installation

pip3 install --user linkedin_scraper

Versions 2.0.0 and earlier are called linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper

Setup

First, set your chromedriver location:

export CHROMEDRIVER=~/chromedriver
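
If you prefer to set it from Python instead of the shell (for example on Windows or inside a notebook), a minimal sketch, assuming chromedriver lives in your home directory:

import os

# Point the scraper at your chromedriver binary before importing linkedin_scraper.
# Adjust the path to wherever your chromedriver actually lives.
os.environ["CHROMEDRIVER"] = os.path.expanduser("~/chromedriver")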

Sponsor


Scrape public LinkedIn profile data at scale with Proxycurl APIs.

• Scraping public profiles was battle-tested in court in the hiQ vs. LinkedIn case.
• GDPR, CCPA, and SOC2 compliant
• High rate limit: 300 requests/minute
• Fast: APIs respond in ~2s
• Fresh data: 88% of data is scraped in real time; the other 12% is no older than 29 days
• High accuracy
• Tons of data points returned per profile

Built for developers, by developers.

Usage

To use it, just instantiate the relevant class.

Sample Usage

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()

email = "[email protected]"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
person = Person("https://www.linkedin.com/in/joey-sham-aa2a50122", driver=driver)

NOTE: The account used to log in should have its language set to English to make sure everything works as expected.

User Scraping

from linkedin_scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

Company Scraping

from linkedin_scraper import Company
company = Company("https://ca.linkedin.com/company/google")

Job Scraping

from linkedin_scraper import Job, actions
from selenium import webdriver

driver = webdriver.Chrome()
email = "[email protected]"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
input("Press Enter")
job = Job("https://www.linkedin.com/jobs/collections/recommended/?currentJobId=3456898261", driver=driver, close_on_complete=False)

Job Search Scraping

from linkedin_scraper import JobSearch, actions
from selenium import webdriver

driver = webdriver.Chrome()
email = "[email protected]"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
input("Press Enter")
job_search = JobSearch(driver=driver, close_on_complete=False, scrape=False)
# job_search contains jobs from your logged in front page:
# - job_search.recommended_jobs
# - job_search.still_hiring
# - job_search.more_jobs

job_listings = job_search.search("Machine Learning Engineer") # returns the list of `Job` from the first page
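
The exact fields on each Job depend on the library version, so a safe way to inspect the results is to print each one's string form:

for job in job_listings:
    print(job)  # each Job's string representation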

Scraping sites where login is required first

  1. Run ipython or python
  2. In ipython/python, run the following code (you can modify it if you need to specify your driver)
from linkedin_scraper import Person
from selenium import webdriver
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver, scrape=False)
  3. Log in to Linkedin
  4. [OPTIONAL] Log out of Linkedin
  5. In the same ipython/python session, run
person.scrape()

The reason is that LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. Setting scrape=False stops the profile from being scraped automatically, but Chrome will still open the LinkedIn page. You can log in and log out; the cookie will stay in the browser and it won't affect your profile views. When you then run person.scrape(), it scrapes the profile and closes the browser. If you want to keep the browser open so you can scrape other profiles, run it as

person.scrape(close_on_complete=False)

so it doesn't close.

NOTE: For version >= 2.1.0, scraping can also occur while logged in. Beware that users will be able to see that you viewed their profile.
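
Putting the pieces together, here is a minimal sketch for scraping several profiles with one browser session (the URL list is a placeholder):

from linkedin_scraper import Person, actions
from selenium import webdriver

driver = webdriver.Chrome()
actions.login(driver)  # no email/password given, so it prompts in the terminal

urls = [
    "https://www.linkedin.com/in/andre-iguodala-65b48ab5",
    # ... more profile URLs
]
people = []
for url in urls:
    person = Person(url, driver=driver, scrape=False)
    person.scrape(close_on_complete=False)  # keep the browser open for the next profile
    people.append(person)
driver.quit()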

Scraping sites and login automatically

From version 2.4.0 on, actions is part of the library and allows signing into LinkedIn first. The email and password can be passed as variables into the function. If not provided, both will be prompted for in the terminal.

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()
email = "[email protected]"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver)

API

Person

A Person object can be created with the following inputs:

Person(linkedin_url=None, name=None, about=[], experiences=[], educations=[], interests=[], accomplishments=[], company=None, job_title=None, driver=None, scrape=True)

linkedin_url

This is the linkedin url of their profile

name

This is the name of the person

about

This is the small paragraph about the person

experiences

These are their past experiences. A list of linkedin_scraper.scraper.Experience

educations

These are their past educations. A list of linkedin_scraper.scraper.Education

interests

These are their interests. A list of linkedin_scraper.scraper.Interest

accomplishments

These are their accomplishments. A list of linkedin_scraper.scraper.Accomplishment

company

This is the most recent company or institution they have worked at.

job_title

This is their most recent job title.

driver

This is the driver used to scrape the LinkedIn profile. A Chrome driver is created by default; however, if a driver is passed in, that one will be used instead.

For example

driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver)

scrape

When this is True, scraping happens automatically upon construction. To scrape later instead, call the scrape() function on the Person object.

scrape(close_on_complete=True)

This is the meat of the code; executing this function scrapes the profile. If close_on_complete is True (the default), the browser will close upon completion. If you want to scrape other profiles, set it to False so you can keep using the same driver.
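
For example, to defer scraping and keep the driver alive afterwards:

person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver, scrape=False)
# ...log in manually in the opened browser if needed...
person.scrape(close_on_complete=False)  # the browser stays open for further scraping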

Company

Company(linkedin_url=None, name=None, about_us=None, website=None, headquarters=None, founded=None, company_type=None, company_size=None, specialties=None, showcase_pages=[], affiliated_companies=[], driver=None, scrape=True, get_employees=True)

linkedin_url

This is the linkedin url of the company's profile

name

This is the name of the company

about_us

The description of the company

website

The website of the company

headquarters

The headquarters location of the company

founded

When the company was founded

company_type

The type of the company

company_size

How many people are employed at the company

specialties

What the company specializes in

showcase_pages

Pages that the company owns to showcase their products

affiliated_companies

Other companies that are affiliated with this one

driver

This is the driver used to scrape the LinkedIn profile. A Chrome driver is created by default; however, if a driver is passed in, that one will be used instead.

get_employees

Whether to get all the employees of the company

For example

driver = webdriver.Chrome()
company = Company("https://ca.linkedin.com/company/google", driver=driver)

scrape(close_on_complete=True)

This is the meat of the code; executing this function scrapes the company. If close_on_complete is True (the default), the browser will close upon completion. If you want to scrape other companies, set it to False so you can keep using the same driver.
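
For example, a sketch that defers scraping and then reads back the documented attributes:

company = Company("https://ca.linkedin.com/company/google", driver=driver, scrape=False, get_employees=False)
company.scrape(close_on_complete=False)
print(company.name, company.website, company.company_size)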

Contribution

Buy Me A Coffee

linkedin_scraper's People

Contributors

adrian0350, alex-bujorianu, amirsharif, anirudrachoudhury, arafatkatze, austyns, bymi15, cafehaine, danieledatamasters, davidcuentasmar, directroot, ednavivianasegura, hatala91, isaacjoy, joeyism, josephlimtech, jqueguiner, lusifer021, rrohjansrsm, rui-long, sabhay, stanvanrooy, swapnilsoni1999, thanasis-com, turreted, zavenzareyan-da


linkedin_scraper's Issues

no such element: Unable to locate element: {"method":"class name","selector":"name"}

As of today, if I run this:

company = Company("https://ca.linkedin.com/company/google")

This comes back:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"name"}
  (Session info: chrome=64.0.3282.167)
  (Driver info: chromedriver=2.26.436382 (70eb799287ce4c2208441fc057053a5b07ceabac),platform=Linux 4.13.0-37-generic x86_64)

no such element: Unable to locate element

I used the code someone uploaded. I noticed that if the profile name is something like abcfss1 then it works fine, but if the profile name is like abc_fss_1 then it shows the error below. For example, if my profile is https://www.linkedin.com/in/trinanjan_saha_216751116/ it doesn't work.

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".pv-top-card-v3"}
(Session info: chrome=77.0.3865.90)

Scraper

I tried to install linkedin_user_scraper and I'm facing the following error.


facing location problem when scraping person

When I run this code

from linkedin_scraper import Person, actions, Company
from selenium import webdriver
driver = webdriver.Chrome()
email = "my email"
password = "my password"
actions.login(driver, email, password) 
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5",driver=driver)

It comes back

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".pv-top-card-v3"}

Maybe LinkedIn has upgraded their website.

Company scraping blocked by authwall

I'm trying to scrape LinkedIn public Company profile.
But on the first attempt, I get blocked by an authwall page: https://www.linkedin.com/authwall?trk=gf...
How can I bypass this authwall?

In the console I get the following exception:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".name"}

NoSuchElementException

Hello,

I almost got this to work. But in the end I get this error.

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"name"}
  (Session info: chrome=66.0.3359.181)
  (Driver info: chromedriver=2.38.552518 (183d19265345f54ce39cbb94cf81ba5f15905011),platform=Mac OS X 10.13.1 x86_64)

What can I do to fix this?

How do I write into text file or csv file?

Here's what I'm doing...
driver = webdriver.Chrome()
person = Person("http://www.linkedin.com/in/randomperson", driver = driver, scrape=False)
f = open('scrape.txt', 'w')
f.write(person.scrape())

But I'm getting an error..
TypeError: write() argument must be str, not None
How do I convert person object into string and append into text file, or better yet, csv file?
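
For reference: person.scrape() populates the object rather than returning a string, which is why write() receives None. A minimal sketch using the documented Person attributes:

import csv

person.scrape(close_on_complete=True)  # returns None; the data lands on the object

with open("scrape.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "job_title", "company"])
    writer.writerow([person.name, person.job_title, person.company])

Alternatively, str(person) gives the printable summary shown in other issues and can be written to a text file directly.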

Trying to write scraped employees of a company

Apologies if this is a simple question, but is there a way to print the list of names from get_employees? It seems to be scrolling through the pages and gathering them, but I can't get them to display.

Thanks!

Module error

I followed the instructions

pip3 install --user linkedin_scraper

and created a file

from linkedin_scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

python version

Python 2.7.15rc1

but I'm getting the error

Traceback (most recent call last):
  File "./linkedin_scraper_1.py", line 6, in <module>
    from linkedin_scraper import Person
ImportError: cannot import name Person

after I run pip freeze I get linkedin-scraper==2.3.0, so it's installed. Am I missing something?

IndexError: list index out of range Company Scraper

I get the following error using ver. 2.5.2 as well. Now it's not giving the exact line where the error is happening either.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-1fef8da42f7f> in <module>
    ---> 12 company = Company(linkedin_url = "https://www.linkedin.com/company/google",driver=driver,close_on_complete=False,get_employees=False)
     13 set_trace()
     14 print(company)

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in __init__(self, linkedin_url, name, about_us, website, headquarters, founded, company_type, company_size, specialties, showcase_pages, affiliated_companies, driver, scrape, get_employees, close_on_complete)

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in scrape(self, get_employees, close_on_complete)

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in scrape_logged_in(self, get_employees, close_on_complete)

IndexError: list index out of range

Linkedin profile Summary and Country

It would be more than helpful if linkedin_scraper scraped the summary and country, e.g.:

# person.country
person.country = driver.find_element_by_css_selector('.pv-top-card-section__location').text
# Click
driver.find_element_by_xpath('//*[@id="ember1506"]/button/span[1]').click()
# person.summary
person.summary = driver.find_element_by_css_selector('.pv-top-card-section__location').text.strip(" \n\t\r")

NUMB QUESTION AND NOT AN ISSUE, PLEASE HELP

I followed all the procedures and scraping was done successfully, as Chrome closed on its own after scraping, but I have no idea how I could retrieve the scraped data. Please don't descend on me with anger, because I am new to this data scraping and I really want to learn; please help me out.
HOW DO I RETRIEVE SCRAPED DATA, POSSIBLY SAVE AS .CSV FILE?

I have actually gone through the earlier issues before posting, but none relate to my newbie questions.
Thanks and honestly I admire your works.

Data not scraped

Hey,
the script doesn't scrape the website, company size, and all of that information.

I don't know why, but I can get the description. Here is an example (screenshot attached):

Scrape list of connections

Is there any capability to scrape the list of connections for a given person? Or any pointers on how such capability might be extended into this codebase?

I'm hoping to use this to find connections-in-common between two given persons.

Is it possible to include all the professional experiences?

When I try to scrape my own LinkedIn profile, I find that only 5 professional experiences can be scraped. Is it possible to scrape all of them?
In addition, I get the professional experiences twice, since I have profiles in two languages. Can I scrape only the English one?

Scrape list of Skills

In the latest version of the LinkedIn website, there does not seem to be a directory of Skills anymore. Could this be added to this library in any way?

Does LinkedIn allow scraping for keywords in "Licenses & certifications"?

I'm currently studying to obtain a Microsoft MCSA certification.

I would like to scrape LinkedIn in order to understand how many people in the world own an MCSA certification.

From your documentation it looks like there is no way to dig for a keyword in "Licenses & certifications", is that right?
Is that because LinkedIn does not allow us to search for it?
Or is this simply not covered by your library?

Experiences scraped None

Hello
after trying to scrape someone's profile, experiences was None, like this:
[Company Name
Synovus at None from None to None for None based at None, Controller at None from None to None for None based at None, CPA - Audit at None from None to None for None based at None]

Error Running Test Class

How to replicate the error:
1) Go to linkedin_scrapper\test
2) py scrape_person.py
No module named linkedin_user_scraper (I'm using Python version 3)

How to scrape list of urls

Hi, I am trying to scrape LinkedIn profiles; I have the URLs of the profiles:

for l in link:
    try:
        person = Person(l, driver=driver, scrape=False)
        person.scrape(close_on_complete=False)
    except Exception:
        pass

But every time I receive the same profile. What's the issue?

IndexError when scraping certain companies

Forgive me if this is an error on my part, but for most company URLs, when I try to scrape I get an IndexError for various lines under scrape_logged_in (it also reloads each company page 2-3 times). Sample code and error below (the first company works, the second doesn't):

Traceback (most recent call last):
  File "LItest.py", line 16, in <module>
    company = Company("https://www.linkedin.com/company/archimedes-rx/about", driver=driver, get_employees=False, close_on_complete=False)
  File "C:\Users\n****\AppData\Local\Programs\Python\Python37\lib\site-packages\linkedin_scraper\company.py", line 69, in __init__
    self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
  File "C:\Users\n****\AppData\Local\Programs\Python\Python37\lib\site-packages\linkedin_scraper\company.py", line 79, in scrape
    self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
  File "C:\Users\n****\AppData\Local\Programs\Python\Python37\lib\site-packages\linkedin_scraper\company.py", line 156, in scrape_logged_in
    self.specialties = "\n".join(values[-1].text.strip().split(", "))
IndexError: list index out of range

from linkedin_scraper import Company, actions
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
email = "MYEMAIL"
password = "MYPASSWORD"
actions.login(driver, email, password)
company = Company("https://www.linkedin.com/company/archimedes-rx", driver=driver, get_employees=False, close_on_complete=False)
companytwo = Company("https://www.linkedin.com/company/life360", driver=driver, get_employees=False)
print(companytwo)

NameError: name 'Person' is not defined

I already installed it via pip.

When I try to run:

from linkedin_user_scraper import scraper
rick_fox = scraper.Person("https://www.linkedin.com/in/rifox?trk=pub-pbmap")
iggy = scraper.Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

it shows me this error:

NameError: name 'Person' is not defined

Index out of range

I was testing out this script but I couldn't get it going because of

self.about_us = grid.find_elements_by_tag_name("p")[0].text.strip()
IndexError: list index out of range

Any comments on how to overcome this issue?
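
One way around it is a defensive variant of that line that tolerates an empty result list (a sketch against the pre-Selenium-4 find_elements API used here):

paragraphs = grid.find_elements_by_tag_name("p")
# Fall back to None instead of raising IndexError when no <p> element is found.
self.about_us = paragraphs[0].text.strip() if paragraphs else None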

Better Way to Get Person's Job Title and Company

I'm currently using regex to extract information from the person.experiences field

user_company_info = re.findall(
    r"b\'(.+?)\'", str(person.experiences[0])
)

However, this method is unreliable. Since this is a fairly important feature, I was wondering if linkedin_scraper had a builtin method to handle this. If not, I'd be happy to work on its implementation.
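
For reference, the API section above documents person.company and person.job_title directly, which avoids the regex whenever those fields are populated:

# Sketch: prefer the documented attributes over parsing experiences[0]
print(person.job_title, "at", person.company)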

invalid session id

I run into an 'invalid session id' error from selenium each time I try to use this scraper. I use the following code:

from linkedin_scraper import Person, actions
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
driver = webdriver.Chrome(chrome_options=options, executable_path="C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe", )
email = "[email protected]"
password = "mypassword"
actions.login(driver, email, password) # if email and password isnt given, it'll prompt in terminal
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver, scrape=True)

Person(linkedin_url=None, experiences=[], educations=[], driver=driver, scrape=True)
person.scrape()

Where [email protected] is subbed for my real email address and password for my real password

When I run the script, Selenium successfully opens a browser, goes to LinkedIn.com, logs in, and goes to the page of Andre Iguodala. Then I get the following error:

C:\Users\jbennekom\Desktop\linkedin_scraper-master\linkedin_scraper\testje.py:22: DeprecationWarning: use options instead of chrome_options
driver = webdriver.Chrome(chrome_options=options, executable_path="C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe", )
Traceback (most recent call last):
  File "C:\Users\jbennekom\Desktop\linkedin_scraper-master\linkedin_scraper\testje.py", line 28, in <module>
    Person(linkedin_url=None, experiences=[], educations=[], driver=driver, scrape=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\linkedin_scraper\person.py", line 36, in __init__
    driver.get(linkedin_url)
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: invalid session id

I run Chrome 76 with a chromedriver that's compatible with Chrome 76.

[New feature] Collecting info about companies and employed people

Hi,
I would ask you to improve your library to collect information about companies. The list of companies should be read from an external CSV file. On output we should get info about the company; the required sections are marked in the attached images.

If it is possible, it would be great to also get the list of employed people (only names and surnames). Output should be saved in JSON format, one file for each companies list.
Is it possible? What do you think @joeyism?

ERROR: Could not find a version

I am unable to download this package. I tried:

conda install linkedin_scraper
pip install linkedin_scraper
pip3 install --user linkedin_scraper

These are the errors I am seeing:

ERROR: Could not find a version that satisfies the requirement request (from linkedin_scraper) (from versions: none)
ERROR: No matching distribution found for request (from linkedin_scraper)

@joeyism, can't wait to try it out 😁

Only name and surname scraped?

Maybe I'm using it wrong, but I wasn't able to retrieve more info than the person's name and surname.

Code used:

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()

email = "my-login"
password = "my-pass"
actions.login(driver, email, password)
person = Person("https://www.linkedin.com/in/rbranson/", driver=driver, scrape=False)
person.scrape(close_on_complete=True)
print(person)

It prints out just:

Richard Branson

Experience
[b'Founder' at None from None to None for None based at None]

Education
[]

Interest
[]

Accomplishments
[]

Am I accessing the data in the wrong way?

NoSuchElementException

When I run (for some people, but not all):

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome('/Applications/chromedriver')

email = "email"
password = "password"
actions.login(driver, email, password) 

person = Person(linkedin_url='https://www.linkedin.com/in/isabellez/', driver=driver, scrape=False)
person.scrape(close_on_complete=True)

i get:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".pv-entity__secondary-title"}
  (Session info: chrome=80.0.3987.149)

Has LinkedIn changed its code again?

Running on Company page fails

When running on a company page (e.g. the Google example) I'm getting this error. How can this be resolved?

Traceback (most recent call last):

  File "scrape_person.py", line 11, in <module>
    company = Company("https://linkedin.com/company/first-data-corporation", driver=driver)
  File "/Users/username/Library/Python/3.7/lib/python/site-packages/linkedin_scraper/company.py", line 69, in __init__
    self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
  File "/Users/username/Library/Python/3.7/lib/python/site-packages/linkedin_scraper/company.py", line 79, in scrape
    self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
  File "/Users/username/Library/Python/3.7/lib/python/site-packages/linkedin_scraper/company.py", line 140, in scrape_logged_in
    _ = WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.XPATH, '//h1[@dir="ltr"]')))
  File "/Users/username/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 

AttributeError: 'NoneType' object has no attribute 'click'

Here is a screenshot of the code (attached). I am trying to do a company scrape, but it looks like when I search for 'page_member_main_nav_about_tab' after inspecting the HTML, it does not find anything with that tag name. Any thoughts? It was running totally fine about 3 weeks ago, but perhaps something has since changed with LinkedIn's HTML structure.

Keeps refreshing last page of employee scrape

I'm trying to scrape a list of employees, but when it reaches the last page it keeps looping through it and never completes.

It looks like it has to do with the loop continuing while the next element exists; the element becomes unclickable on the last page, however.

unknown error: DevToolsActivePort file doesn't exist

I'm trying to run the "SAMPLE USAGE" on a headless server (Ubuntu 20.04) - however, the code crashes with:

Traceback (most recent call last):
  File "scraper.py", line 5, in <module>
    driver = webdriver.Chrome()
  File "/home/scraper/.local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/home/scraper/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/scraper/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/scraper/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/scraper/.local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: crashed.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /snap/chromium/current/command-chromium.wrapper is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
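
A common workaround on headless servers, not specific to this library, is to pass Chrome its headless flags explicitly:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")            # often required on servers/containers
options.add_argument("--disable-dev-shm-usage") # avoids /dev/shm exhaustion crashes
driver = webdriver.Chrome(options=options)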

Not logging in

Hey there,

Just wondering: I tried the intro code for scraping a profile, but logging in does not work in headless Chrome.

It just adds my email + password and then hangs without submitting.

Appreciate any help.

Isaac

Cannot use get_employees list index out of range

Getting the below error when running this line:
company = Company(linkedin_url='https://www.linkedin.com/company/addmin/', driver=driver, scrape=True)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-25-898c77594b17> in <module>
      3 password = credentials['password']
      4 actions.login(driver, email, password) # if email and password isnt given, it'll prompt in terminal
----> 5 company = Company(linkedin_url='https://www.linkedin.com/company/addmin/', driver=driver, scrape=True, close_on_complete=True)

~/opt/anaconda3/lib/python3.7/site-packages/linkedin_scraper/company.py in __init__(self, linkedin_url, name, about_us, website, headquarters, founded, company_type, company_size, specialties, showcase_pages, affiliated_companies, driver, scrape, get_employees, close_on_complete)
     67 
     68         if scrape:
---> 69             self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
     70 
     71     def __get_text_under_subtitle(self, elem):

~/opt/anaconda3/lib/python3.7/site-packages/linkedin_scraper/company.py in scrape(self, get_employees, close_on_complete)
     77     def scrape(self, get_employees = True, close_on_complete = True):
     78         if self.is_signed_in():
---> 79             self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
     80         else:
     81             self.scrape_not_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)

~/opt/anaconda3/lib/python3.7/site-packages/linkedin_scraper/company.py in scrape_logged_in(self, get_employees, close_on_complete)
    146 
    147         self.name = driver.find_element_by_xpath('//span[@dir="ltr"]').text.strip()
--> 148         navigation.find_elements_by_xpath("//a[@data-control-name='page_member_main_nav_about_tab']")[0].click()
    149 
    150         _ = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.TAG_NAME, 'section')))

IndexError: list index out of range

Failing to run

For the life of me, I cannot get the scraper to run for a Person. May I please get a watered-down version of the instructions, if it's not too much to ask?

IndexError: list index out of range - Company Scraper Issues

#Wrote this basic code to scrape the company profile
from selenium import webdriver
from linkedin_scraper import Company,Person, actions
from IPython.core.debugger import set_trace
import time
driver=webdriver.Chrome(executable_path=r"C:\bin\chromedriver.exe")
#driver.get('http://www.google.com')
email = ""
password = ""
actions.login(driver, email, password) # if email and password isnt given, it'll prompt in terminal
company = Company("https://www.linkedin.com/company/infosys",driver=driver)
print(company)

Getting "IndexError: list index out of range" error in the following line:
navigation.find_elements_by_xpath("//a[@data-control-name='page_member_main_nav_about_tab']")[0].click()
Any help please to fix the same?


IndexError Traceback (most recent call last)
in
10 #person = Person("https://www.linkedin.com/in/mohitagarwal/", driver=driver, scrape=False)
11 #person.scrape(close_on_complete=True)
---> 12 company = Company("https://www.linkedin.com/company/infosys",driver=driver)
13 print(company)
14 #set_trace()

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in __init__(self, linkedin_url, name, about_us, website, headquarters, founded, company_type, company_size, specialties, showcase_pages, affiliated_companies, driver, scrape, get_employees, close_on_complete)
67
68 if scrape:
---> 69 self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
70
71 def __get_text_under_subtitle(self, elem):

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in scrape(self, get_employees, close_on_complete)
77 def scrape(self, get_employees = True, close_on_complete = True):
78 if self.is_signed_in():
---> 79 self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
80 else:
81 self.scrape_not_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in scrape_logged_in(self, get_employees, close_on_complete)
146
147 self.name = driver.find_element_by_xpath('//span[@dir="ltr"]').text.strip()
--> 148 navigation.find_elements_by_xpath("//a[@data-control-name='page_member_main_nav_about_tab']")[0].click()
149
150 _ = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.TAG_NAME, 'section')))

IndexError: list index out of range

Scraper times out and doesn't find HTML it's looking for

I managed to get the script to run and logged in before scraping as pointed out in the missing element issue, but then I get a timeout when trying to actually scrape the page:

File "company.py", line 140, in scrape_logged_in
_ = WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.XPATH, '//h1[@dir="ltr"]')))
File "wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

The page loads and waits a bit before that happens, and I'm guessing what is going on is that it's not finding the HTML element it's looking for. Could it have changed on Linkedin's end?

Chromedriver

Hi joeyism,

Could you kindly explain more about: "export CHROMEDRIVER=~/chromedriver"? It will be really helpful for a beginner like me.

Thank you so much!
Shen

Company Scraping not working.

Traceback (most recent call last):
  File "C:\Users\LAksh\Desktop\test.py", line 14, in <module>
    person = Company("https://www.linkedin.com/company/qubole/", driver=driver)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\linkedin_scraper\company.py", line 69, in __init__
    self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\linkedin_scraper\company.py", line 79, in scrape
    self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\linkedin_scraper\company.py", line 194, in scrape_logged_in
    self.employees = self.get_employees()
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\linkedin_scraper\company.py", line 131, in get_employees
    _ = WebDriverWait(driver, wait_time).until(EC.visibility_of(res))
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\support\wait.py", line 71, in until
    value = method(self._driver)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\support\expected_conditions.py", line 144, in __call__
    return _element_if_visible(self.element)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\support\expected_conditions.py", line 148, in _element_if_visible
    return element if element.is_displayed() == visibility else False
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 488, in is_displayed
    self)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: headless chrome=83.0.4103.116)

Code:

import os
from linkedin_scraper import Person, actions, Company
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome("C:\\Users\\LAksh\\Downloads\\chromedriver_win32\\chromedriver", options=chrome_options)
driver.set_window_size(1920, 1080)


email = ""
password = ""
actions.login(driver, email, password) # if email and password isnt given, it'll prompt in terminal
person = Company("https://www.linkedin.com/company/qubole/", driver=driver)
print(person)

Error: "export CHROMEDRIVER=~/chromedriver"

Hello,

I'm trying to run this in Jupyter Notebooks. When I enter "export CHROMEDRIVER=~/chromedriver" in a cell and run it, I get the following error message:

  File "<ipython-input-1-f9de4c739fa9>", line 1
    export CHROMEDRIVER=~/chromedriver
           ^
SyntaxError: invalid syntax

Could someone help me solve this? Please keep in mind that I am new to this. Thank you in advance!
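
export is shell syntax, so it fails in a Python cell; the os.environ approach sketched in the Setup section above works in a notebook:

import os
# Set the chromedriver location from Python instead of the shell.
os.environ["CHROMEDRIVER"] = os.path.expanduser("~/chromedriver")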
