
linkedin_scraper's Introduction

Linkedin Scraper

Scrapes Linkedin User Data


Installation

pip3 install --user linkedin_scraper

Versions 2.0.0 and earlier are called linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper

Setup

First, set your chromedriver location:

export CHROMEDRIVER=~/chromedriver
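
If you prefer to set it from Python instead of the shell (for example on Windows or inside a notebook), a minimal sketch, assuming chromedriver lives in your home directory:

import os

# Point the scraper at your chromedriver binary before importing linkedin_scraper.
# Adjust the path to wherever your chromedriver actually lives.
os.environ["CHROMEDRIVER"] = os.path.expanduser("~/chromedriver")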

Sponsor


Scrape public LinkedIn profile data at scale with Proxycurl APIs.

• Scraping public profiles was battle-tested in court in the hiQ vs. LinkedIn case.
• GDPR, CCPA, and SOC2 compliant
• High rate limit: 300 requests/minute
• Fast: APIs respond in ~2s
• Fresh data: 88% of data is scraped in real time; the other 12% is no older than 29 days
• High accuracy
• Tons of data points returned per profile

Built for developers, by developers.

Usage

To use it, just instantiate the relevant class.

Sample Usage

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()

email = "[email protected]"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
person = Person("https://www.linkedin.com/in/joey-sham-aa2a50122", driver=driver)

NOTE: The account used to log in should have its language set to English to make sure everything works as expected.

User Scraping

from linkedin_scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

Company Scraping

from linkedin_scraper import Company
company = Company("https://ca.linkedin.com/company/google")

Job Scraping

from linkedin_scraper import Job, actions
from selenium import webdriver

driver = webdriver.Chrome()
email = "[email protected]"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
input("Press Enter")
job = Job("https://www.linkedin.com/jobs/collections/recommended/?currentJobId=3456898261", driver=driver, close_on_complete=False)

Job Search Scraping

from linkedin_scraper import JobSearch, actions
from selenium import webdriver

driver = webdriver.Chrome()
email = "[email protected]"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
input("Press Enter")
job_search = JobSearch(driver=driver, close_on_complete=False, scrape=False)
# job_search contains jobs from your logged in front page:
# - job_search.recommended_jobs
# - job_search.still_hiring
# - job_search.more_jobs

job_listings = job_search.search("Machine Learning Engineer") # returns the list of `Job` from the first page
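
The exact fields on each Job depend on the library version, so a safe way to inspect the results is to print each one's string form:

for job in job_listings:
    print(job)  # each Job's string representation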

Scraping sites where login is required first

  1. Run ipython or python
  2. In ipython/python, run the following code (you can modify it if you need to specify your driver)
from linkedin_scraper import Person
from selenium import webdriver
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver, scrape=False)
  3. Log in to Linkedin
  4. [OPTIONAL] Log out of Linkedin
  5. In the same ipython/python session, run
person.scrape()

The reason is that LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. Setting scrape=False stops the profile from being scraped automatically, but Chrome will still open the LinkedIn page. You can log in and log out; the cookie will stay in the browser and it won't affect your profile views. When you then run person.scrape(), it scrapes the profile and closes the browser. If you want to keep the browser open so you can scrape other profiles, run it as

person.scrape(close_on_complete=False)

so it doesn't close.

NOTE: For version >= 2.1.0, scraping can also occur while logged in. Beware that users will be able to see that you viewed their profile.
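
Putting the pieces together, here is a minimal sketch for scraping several profiles with one browser session (the URL list is a placeholder):

from linkedin_scraper import Person, actions
from selenium import webdriver

driver = webdriver.Chrome()
actions.login(driver)  # no email/password given, so it prompts in the terminal

urls = [
    "https://www.linkedin.com/in/andre-iguodala-65b48ab5",
    # ... more profile URLs
]
people = []
for url in urls:
    person = Person(url, driver=driver, scrape=False)
    person.scrape(close_on_complete=False)  # keep the browser open for the next profile
    people.append(person)
driver.quit()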

Scraping sites and login automatically

From version 2.4.0 on, actions is part of the library and allows signing into LinkedIn first. The email and password can be passed as variables into the function. If not provided, both will be prompted for in the terminal.

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()
email = "[email protected]"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it will prompt in the terminal
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver)

API

Person

A Person object can be created with the following inputs:

Person(linkedin_url=None, name=None, about=[], experiences=[], educations=[], interests=[], accomplishments=[], company=None, job_title=None, driver=None, scrape=True)

linkedin_url

This is the linkedin url of their profile

name

This is the name of the person

about

This is the small paragraph about the person

experiences

These are their past experiences. A list of linkedin_scraper.scraper.Experience

educations

These are their past educations. A list of linkedin_scraper.scraper.Education

interests

These are their interests. A list of linkedin_scraper.scraper.Interest

accomplishments

These are their accomplishments. A list of linkedin_scraper.scraper.Accomplishment

company

This is the most recent company or institution they have worked at.

job_title

This is their most recent job title.

driver

This is the driver used to scrape the LinkedIn profile. A Chrome driver is created by default; however, if a driver is passed in, that one will be used instead.

For example

driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver)

scrape

When this is True, scraping happens automatically upon construction. To scrape later instead, call the scrape() function on the Person object.

scrape(close_on_complete=True)

This is the meat of the code; executing this function scrapes the profile. If close_on_complete is True (the default), the browser will close upon completion. If you want to scrape other profiles, set it to False so you can keep using the same driver.
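
For example, to defer scraping and keep the driver alive afterwards:

person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver, scrape=False)
# ...log in manually in the opened browser if needed...
person.scrape(close_on_complete=False)  # the browser stays open for further scraping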

Company

Company(linkedin_url=None, name=None, about_us=None, website=None, headquarters=None, founded=None, company_type=None, company_size=None, specialties=None, showcase_pages=[], affiliated_companies=[], driver=None, scrape=True, get_employees=True)

linkedin_url

This is the linkedin url of the company's profile

name

This is the name of the company

about_us

The description of the company

website

The website of the company

headquarters

The headquarters location of the company

founded

When the company was founded

company_type

The type of the company

company_size

How many people are employed at the company

specialties

What the company specializes in

showcase_pages

Pages that the company owns to showcase their products

affiliated_companies

Other companies that are affiliated with this one

driver

This is the driver used to scrape the LinkedIn profile. A Chrome driver is created by default; however, if a driver is passed in, that one will be used instead.

get_employees

Whether to get all the employees of the company

For example

driver = webdriver.Chrome()
company = Company("https://ca.linkedin.com/company/google", driver=driver)

scrape(close_on_complete=True)

This is the meat of the code; executing this function scrapes the company. If close_on_complete is True (the default), the browser will close upon completion. If you want to scrape other companies, set it to False so you can keep using the same driver.
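
For example, a sketch that defers scraping and then reads back the documented attributes:

company = Company("https://ca.linkedin.com/company/google", driver=driver, scrape=False, get_employees=False)
company.scrape(close_on_complete=False)
print(company.name, company.website, company.company_size)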

Contribution

Buy Me A Coffee

linkedin_scraper's People

Contributors

adrian0350, alex-bujorianu, amirsharif, anirudrachoudhury, arafatkatze, austyns, bymi15, cafehaine, danieledatamasters, davidcuentasmar, directroot, ednavivianasegura, hatala91, isaacjoy, joeyism, josephlimtech, jqueguiner, lusifer021, rrohjansrsm, rui-long, sabhay, stanvanrooy, swapnilsoni1999, thanasis-com, turreted, zavenzareyan-da


linkedin_scraper's Issues

no such element: Unable to locate element: {"method":"class name","selector":"name"}

As of today, if I run this:

company = Company("https://ca.linkedin.com/company/google")

This comes back:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"name"}
  (Session info: chrome=64.0.3282.167)
  (Driver info: chromedriver=2.26.436382 (70eb799287ce4c2208441fc057053a5b07ceabac),platform=Linux 4.13.0-37-generic x86_64)

no such element: Unable to locate element

I used the code someone uploaded. I noticed that if the profile name is something like abcfss1 then it works fine, but if the profile name is like abc_fss_1 then it shows the error below. For example, if my profile is https://www.linkedin.com/in/trinanjan_saha_216751116/ it doesn't work.

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".pv-top-card-v3"}
(Session info: chrome=77.0.3865.90)

Scraper

I tried to install linkedin_user_scraper and I'm facing the following error.


facing location problem when scraping person

When I run this code

from linkedin_scraper import Person, actions, Company
from selenium import webdriver
driver = webdriver.Chrome()
email = "my email"
password = "my password"
actions.login(driver, email, password) 
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5",driver=driver)

It comes back

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".pv-top-card-v3"}

Maybe LinkedIn has upgraded their website.

Company scraping blocked by authwall

I'm trying to scrape LinkedIn public Company profile.
But on the first attempt, I get blocked by an authwall page: https://www.linkedin.com/authwall?trk=gf...
How can I bypass this authwall?

In the console I get the following exception:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".name"}

NoSuchElementException

Hello,

I almost got this to work. But in the end I get this error.

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"class name","selector":"name"}
  (Session info: chrome=66.0.3359.181)
  (Driver info: chromedriver=2.38.552518 (183d19265345f54ce39cbb94cf81ba5f15905011),platform=Mac OS X 10.13.1 x86_64)

What can I do to fix this?

How do I write into text file or csv file?

Here's what I'm doing...
driver = webdriver.Chrome()
person = Person("http://www.linkedin.com/in/randomperson", driver = driver, scrape=False)
f = open('scrape.txt', 'w')
f.write(person.scrape())

But I'm getting an error..
TypeError: write() argument must be str, not None
How do I convert person object into string and append into text file, or better yet, csv file?
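
For reference: person.scrape() populates the object rather than returning a string, which is why write() receives None. A minimal sketch using the documented Person attributes:

import csv

person.scrape(close_on_complete=True)  # returns None; the data lands on the object

with open("scrape.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "job_title", "company"])
    writer.writerow([person.name, person.job_title, person.company])

Alternatively, str(person) gives the printable summary shown in other issues and can be written to a text file directly.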

Trying to write scraped employees of a company

Apologies if this is a simple question, but is there a way to print the list of names from get_employees? It seems to be scrolling through the pages and gathering them, but I can't get them to display.

Thanks!

Module error

I followed the instructions

pip3 install --user linkedin_scraper

and created a file

from linkedin_scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

python version

Python 2.7.15rc1

but I'm getting the error

Traceback (most recent call last):
  File "./linkedin_scraper_1.py", line 6, in <module>
    from linkedin_scraper import Person
ImportError: cannot import name Person

after I run pip freeze I get linkedin-scraper==2.3.0, so it's installed. Am I missing something?

IndexError: list index out of range Company Scraper

I get the following error using ver. 2.5.2 as well. Now it's not giving the exact line where the error is happening either.

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-1fef8da42f7f> in <module>
    ---> 12 company = Company(linkedin_url = "https://www.linkedin.com/company/google",driver=driver,close_on_complete=False,get_employees=False)
     13 set_trace()
     14 print(company)

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in __init__(self, linkedin_url, name, about_us, website, headquarters, founded, company_type, company_size, specialties, showcase_pages, affiliated_companies, driver, scrape, get_employees, close_on_complete)

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in scrape(self, get_employees, close_on_complete)

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in scrape_logged_in(self, get_employees, close_on_complete)

IndexError: list index out of range

Linkedin profile Summary and Country

It would be more than helpful if linkedin_scraper scraped the summary and country, e.g.:

# person.country
person.country = driver.find_element_by_css_selector('.pv-top-card-section__location').text
# Click
driver.find_element_by_xpath('//*[@id="ember1506"]/button/span[1]').click()
# person.summary
person.summary = driver.find_element_by_css_selector('.pv-top-card-section__location').text.strip(" \n\t\r")

NUMB QUESTION AND NOT AN ISSUE, PLEASE HELP

I followed all the procedures and scraping was done successfully, as Chrome closed on its own after scraping, but I have no idea how I could retrieve the scraped data. Please don't descend on me with anger, because I am new to this data scraping and I really want to learn; please help me out.
HOW DO I RETRIEVE SCRAPED DATA, POSSIBLY SAVE AS .CSV FILE?

I have actually gone through the earlier issues before posting, but none relate to my newbie questions.
Thanks and honestly I admire your works.

Data not scraped

Hey,
the script doesn't scrape the website, company size, and all of that information.

I don't know why, but I can get the description. Here is an example (screenshot attached):

Scrape list of connections

Is there any capability to scrape the list of connections for a given person? Or any pointers on how such capability might be extended into this codebase?

I'm hoping to use this to find connections-in-common between two given persons.

Is it possible to include all the professional experiences?

When I try to scrape my own LinkedIn profile, I find that only 5 professional experiences can be scraped. Is it possible to scrape all of them?
In addition, I get the professional experiences twice, since I have profiles in two languages. Can I scrape only the English one?

Scrape list of Skills

In the latest version of the LinkedIn website, there does not seem to be a directory of Skills anymore. Could this be added to this library in any way?

Does LinkedIn allow scraping for keywords in "Licenses & certifications"?

I'm currently studying to obtain a Microsoft MCSA certification.

I would like to scrape LinkedIn in order to understand how many people in the world own an MCSA certification.

From your documentation it looks like there is no way to dig for a keyword in "Licenses & certifications", is that right?
Is that because LinkedIn does not allow us to search for it?
Or is this simply not covered by your library?

Experiences scraped None

Hello
after trying to scrape someone's profile, experiences was None, like this:
[Company Name
Synovus at None from None to None for None based at None, Controller at None from None to None for None based at None, CPA - Audit at None from None to None for None based at None]

Error Running Test Class

How to replicate the error:
1) Go to linkedin_scrapper\test
2) py scrape_person.py
No module named linkedin_user_scraper (I'm using Python version 3)

How to scrape list of urls

Hi, I am trying to scrape LinkedIn profiles; I have the URLs of the profiles:

for l in link:
    try:
        person = Person(l, driver=driver, scrape=False)
        person.scrape(close_on_complete=False)
    except Exception:
        pass

But every time I receive the same profile. What's the issue?

IndexError when scraping certain companies

Forgive me if this is an error on my part, but for most company URLs, when I try to scrape I get an IndexError for various lines under scrape_logged_in (it also reloads each company page 2-3 times). Sample code and error below (the first company works, the second doesn't):

Traceback (most recent call last):
  File "LItest.py", line 16, in <module>
    company = Company("https://www.linkedin.com/company/archimedes-rx/about", driver=driver, get_employees=False, close_on_complete=False)
  File "C:\Users\n****\AppData\Local\Programs\Python\Python37\lib\site-packages\linkedin_scraper\company.py", line 69, in __init__
    self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
  File "C:\Users\n****\AppData\Local\Programs\Python\Python37\lib\site-packages\linkedin_scraper\company.py", line 79, in scrape
    self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
  File "C:\Users\n****\AppData\Local\Programs\Python\Python37\lib\site-packages\linkedin_scraper\company.py", line 156, in scrape_logged_in
    self.specialties = "\n".join(values[-1].text.strip().split(", "))
IndexError: list index out of range

from linkedin_scraper import Company, actions
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
email = "MYEMAIL"
password = "MYPASSWORD"
actions.login(driver, email, password)
company = Company("https://www.linkedin.com/company/archimedes-rx", driver=driver, get_employees=False, close_on_complete=False)
companytwo = Company("https://www.linkedin.com/company/life360", driver=driver, get_employees=False)
print(companytwo)

NameError: name 'Person' is not defined

I already installed it via pip.

When I try to run:

from linkedin_user_scraper import scraper
rick_fox = scraper.Person("https://www.linkedin.com/in/rifox?trk=pub-pbmap")
iggy = scraper.Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

it shows me this error:

NameError: name 'Person' is not defined

Index out of range

I was testing out this script but I couldn't get it going because of

self.about_us = grid.find_elements_by_tag_name("p")[0].text.strip()
IndexError: list index out of range

Any comments on how to overcome this issue?
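
One way around it is a defensive variant of that line that tolerates an empty result list (a sketch against the pre-Selenium-4 find_elements API used here):

paragraphs = grid.find_elements_by_tag_name("p")
# Fall back to None instead of raising IndexError when no <p> element is found.
self.about_us = paragraphs[0].text.strip() if paragraphs else None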

Better Way to Get Person's Job Title and Company

I'm currently using regex to extract information from the person.experiences field

user_company_info = re.findall(
    r"b\'(.+?)\'", str(person.experiences[0])
)

However, this method is unreliable. Since this is a fairly important feature, I was wondering if linkedin_scraper had a builtin method to handle this. If not, I'd be happy to work on its implementation.
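
For reference, the API section above documents person.company and person.job_title directly, which avoids the regex whenever those fields are populated:

# Sketch: prefer the documented attributes over parsing experiences[0]
print(person.job_title, "at", person.company)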

invalid session id

I run into an 'invalid session id' error from selenium each time I try to use this scraper. I use the following code:

from linkedin_scraper import Person, actions
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
driver = webdriver.Chrome(chrome_options=options, executable_path="C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe", )
email = "[email protected]"
password = "mypassword"
actions.login(driver, email, password) # if email and password isnt given, it'll prompt in terminal
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver, scrape=True)

Person(linkedin_url=None, experiences=[], educations=[], driver=driver, scrape=True)
person.scrape()

Where [email protected] is subbed for my real email address and password for my real password

When I run the script, Selenium successfully opens a browser, goes to LinkedIn.com, logs in, and goes to the page of Andre Iguodala. Then I get the following error:

C:\Users\jbennekom\Desktop\linkedin_scraper-master\linkedin_scraper\testje.py:22: DeprecationWarning: use options instead of chrome_options
driver = webdriver.Chrome(chrome_options=options, executable_path="C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe", )
Traceback (most recent call last):
  File "C:\Users\jbennekom\Desktop\linkedin_scraper-master\linkedin_scraper\testje.py", line 28, in <module>
    Person(linkedin_url=None, experiences=[], educations=[], driver=driver, scrape=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\linkedin_scraper\person.py", line 36, in __init__
    driver.get(linkedin_url)
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: invalid session id

I run Chrome 76 with a chromedriver that's compatible with Chrome 76.

[New feature] Collecting info about companies and employed people

Hi,
I would ask you to improve your library to collect information about companies. The list of companies should be read from an external CSV file. On output we should get info about the company; the required sections are marked in the attached images.

If it is possible, it would be great to also get the list of employed people (only names and surnames). Output should be saved in JSON format, one file for each companies list.
Is it possible? What do you think @joeyism?

ERROR: Could not find a version

I am unable to download this package. I tried:

conda install linkedin_scraper
pip install linkedin_scraper
pip3 install --user linkedin_scraper

These are the errors I am seeing:

ERROR: Could not find a version that satisfies the requirement request (from linkedin_scraper) (from versions: none)
ERROR: No matching distribution found for request (from linkedin_scraper)

@joeyism, can't wait to try it out 😁

Only name and surname scraped?

Maybe I'm using it wrong, but I wasn't able to retrieve more info than the person's name and surname.

Code used:

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()

email = "my-login"
password = "my-pass"
actions.login(driver, email, password)
person = Person("https://www.linkedin.com/in/rbranson/", driver=driver, scrape=False)
person.scrape(close_on_complete=True)
print(person)

It prints out just:

Richard Branson

Experience
[b'Founder' at None from None to None for None based at None]

Education
[]

Interest
[]

Accomplishments
[]

Am I accessing the data in the wrong way?

NoSuchElementException

When I run (for some people, but not all):

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome('/Applications/chromedriver')

email = "email"
password = "password"
actions.login(driver, email, password) 

person = Person(linkedin_url='https://www.linkedin.com/in/isabellez/', driver=driver, scrape=False)
person.scrape(close_on_complete=True)

i get:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".pv-entity__secondary-title"}
  (Session info: chrome=80.0.3987.149)

Has LinkedIn changed its code again?

Running on Company page fails

When running on a company page (e.g. the Google example) I'm getting this error. How can this be resolved?

Traceback (most recent call last):

  File "scrape_person.py", line 11, in <module>
    company = Company("https://linkedin.com/company/first-data-corporation", driver=driver)
  File "/Users/username/Library/Python/3.7/lib/python/site-packages/linkedin_scraper/company.py", line 69, in __init__
    self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
  File "/Users/username/Library/Python/3.7/lib/python/site-packages/linkedin_scraper/company.py", line 79, in scrape
    self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
  File "/Users/username/Library/Python/3.7/lib/python/site-packages/linkedin_scraper/company.py", line 140, in scrape_logged_in
    _ = WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.XPATH, '//h1[@dir="ltr"]')))
  File "/Users/username/Library/Python/3.7/lib/python/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: 

AttributeError: 'NoneType' object has no attribute 'click'

Here is a screenshot of the code (attached). I am trying to do a company scrape, but it looks like when I search for 'page_member_main_nav_about_tab' after inspecting the HTML, it does not find anything with that tag name. Any thoughts? It was running totally fine about 3 weeks ago, but perhaps something has since changed with LinkedIn's HTML structure.

Keeps refreshing last page of employee scrape

I'm trying to scrape a list of employees, but when it reaches the last page it keeps looping through it and never completes.

It looks like it has to do with the loop continuing while the next element exists; the element becomes unclickable on the last page, however.

unknown error: DevToolsActivePort file doesn't exist

I'm trying to run the "SAMPLE USAGE" on a headless server (Ubuntu 20.04) - however, the code crashes with:

Traceback (most recent call last):
  File "scraper.py", line 5, in <module>
    driver = webdriver.Chrome()
  File "/home/scraper/.local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 76, in __init__
    RemoteWebDriver.__init__(
  File "/home/scraper/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/scraper/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/scraper/.local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/scraper/.local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: crashed.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /snap/chromium/current/command-chromium.wrapper is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
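
A common workaround on headless servers, not specific to this library, is to pass Chrome its headless flags explicitly:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")            # often required on servers/containers
options.add_argument("--disable-dev-shm-usage") # avoids /dev/shm exhaustion crashes
driver = webdriver.Chrome(options=options)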

Not logging in

Hey there,

Just wondering: I tried the intro code for scraping a profile, but logging in does not work in headless Chrome.

It just adds my email + password and then hangs without submitting.

Appreciate any help.

Isaac

Cannot use get_employees list index out of range

Getting the below error when running this line:
company = Company(linkedin_url='https://www.linkedin.com/company/addmin/', driver=driver, scrape=True)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-25-898c77594b17> in <module>
      3 password = credentials['password']
      4 actions.login(driver, email, password) # if email and password isnt given, it'll prompt in terminal
----> 5 company = Company(linkedin_url='https://www.linkedin.com/company/addmin/', driver=driver, scrape=True, close_on_complete=True)

~/opt/anaconda3/lib/python3.7/site-packages/linkedin_scraper/company.py in __init__(self, linkedin_url, name, about_us, website, headquarters, founded, company_type, company_size, specialties, showcase_pages, affiliated_companies, driver, scrape, get_employees, close_on_complete)
     67 
     68         if scrape:
---> 69             self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
     70 
     71     def __get_text_under_subtitle(self, elem):

~/opt/anaconda3/lib/python3.7/site-packages/linkedin_scraper/company.py in scrape(self, get_employees, close_on_complete)
     77     def scrape(self, get_employees = True, close_on_complete = True):
     78         if self.is_signed_in():
---> 79             self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
     80         else:
     81             self.scrape_not_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)

~/opt/anaconda3/lib/python3.7/site-packages/linkedin_scraper/company.py in scrape_logged_in(self, get_employees, close_on_complete)
    146 
    147         self.name = driver.find_element_by_xpath('//span[@dir="ltr"]').text.strip()
--> 148         navigation.find_elements_by_xpath("//a[@data-control-name='page_member_main_nav_about_tab']")[0].click()
    149 
    150         _ = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.TAG_NAME, 'section')))

IndexError: list index out of range

Failing to run

For the life of me, I cannot get the scraper to run for a Person. May I please get a watered-down version of the instructions, if it's not too much to ask?

IndexError: list index out of range - Company Scraper Issues

#Wrote this basic code to scrape the company profile
from selenium import webdriver
from linkedin_scraper import Company,Person, actions
from IPython.core.debugger import set_trace
import time
driver=webdriver.Chrome(executable_path=r"C:\bin\chromedriver.exe")
#driver.get('http://www.google.com')
email = ""
password = ""
actions.login(driver, email, password) # if email and password isnt given, it'll prompt in terminal
company = Company("https://www.linkedin.com/company/infosys",driver=driver)
print(company)

Getting "IndexError: list index out of range" error in the following line:
navigation.find_elements_by_xpath("//a[@data-control-name='page_member_main_nav_about_tab']")[0].click()
Any help please to fix the same?


IndexError Traceback (most recent call last)
in
10 #person = Person("https://www.linkedin.com/in/mohitagarwal/", driver=driver, scrape=False)
11 #person.scrape(close_on_complete=True)
---> 12 company = Company("https://www.linkedin.com/company/infosys",driver=driver)
13 print(company)
14 #set_trace()

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in __init__(self, linkedin_url, name, about_us, website, headquarters, founded, company_type, company_size, specialties, showcase_pages, affiliated_companies, driver, scrape, get_employees, close_on_complete)
67
68 if scrape:
---> 69 self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
70
71 def __get_text_under_subtitle(self, elem):

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in scrape(self, get_employees, close_on_complete)
77 def scrape(self, get_employees = True, close_on_complete = True):
78 if self.is_signed_in():
---> 79 self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
80 else:
81 self.scrape_not_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)

~\AppData\Roaming\Python\Python38\site-packages\linkedin_scraper\company.py in scrape_logged_in(self, get_employees, close_on_complete)
146
147 self.name = driver.find_element_by_xpath('//span[@dir="ltr"]').text.strip()
--> 148 navigation.find_elements_by_xpath("//a[@data-control-name='page_member_main_nav_about_tab']")[0].click()
149
150 _ = WebDriverWait(driver, 3).until(EC.presence_of_all_elements_located((By.TAG_NAME, 'section')))

IndexError: list index out of range

Scraper times out and doesn't find HTML it's looking for

I managed to get the script to run and logged in before scraping as pointed out in the missing element issue, but then I get a timeout when trying to actually scrape the page:

File "company.py", line 140, in scrape_logged_in
_ = WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.XPATH, '//h1[@dir="ltr"]')))
File "wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

The page loads and waits a bit before that happens, and I'm guessing what is going on is that it's not finding the HTML element it's looking for. Could it have changed on Linkedin's end?

Chromedriver

Hi joeyism,

Could you kindly explain more about: "export CHROMEDRIVER=~/chromedriver"? It will be really helpful for a beginner like me.

Thank you so much!
Shen

Company Scraping not working.

Traceback (most recent call last):
  File "C:\Users\LAksh\Desktop\test.py", line 14, in <module>
    person = Company("https://www.linkedin.com/company/qubole/", driver=driver)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\linkedin_scraper\company.py", line 69, in __init__
    self.scrape(get_employees=get_employees, close_on_complete=close_on_complete)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\linkedin_scraper\company.py", line 79, in scrape
    self.scrape_logged_in(get_employees = get_employees, close_on_complete = close_on_complete)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\linkedin_scraper\company.py", line 194, in scrape_logged_in
    self.employees = self.get_employees()
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\linkedin_scraper\company.py", line 131, in get_employees
    _ = WebDriverWait(driver, wait_time).until(EC.visibility_of(res))
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\support\wait.py", line 71, in until
    value = method(self._driver)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\support\expected_conditions.py", line 144, in __call__
    return _element_if_visible(self.element)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\support\expected_conditions.py", line 148, in _element_if_visible
    return element if element.is_displayed() == visibility else False
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py", line 488, in is_displayed
    self)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\LAksh\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
  (Session info: headless chrome=83.0.4103.116)

Code:

import os
from linkedin_scraper import Person, actions, Company
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome("C:\\Users\\LAksh\\Downloads\\chromedriver_win32\\chromedriver", options=chrome_options)
driver.set_window_size(1920, 1080)


email = ""
password = ""
actions.login(driver, email, password) # if email and password isnt given, it'll prompt in terminal
person = Company("https://www.linkedin.com/company/qubole/", driver=driver)
print(person)

Error: "export CHROMEDRIVER=~/chromedriver"

Hello,

I'm trying to run this in Jupyter Notebooks. When I enter "export CHROMEDRIVER=~/chromedriver" in a cell and run it, I get the following error message:

  File "<ipython-input-1-f9de4c739fa9>", line 1
    export CHROMEDRIVER=~/chromedriver
           ^
SyntaxError: invalid syntax

Could someone help me solve this? Please keep in mind that I am new to this. Thank you in advance!
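
export is shell syntax, so it fails in a Python cell; the os.environ approach sketched in the Setup section above works in a notebook:

import os
# Set the chromedriver location from Python instead of the shell.
os.environ["CHROMEDRIVER"] = os.path.expanduser("~/chromedriver")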
