
instaloctrack

TL;DR: asciinema, video of the project

A tool to scrape geotagged locations on Instagram profiles. Output in JSON & interactive map.

Requirements

sudo apt install chromium-chromedriver && chmod a+x /usr/bin/chromedriver

🛠️ Installation

git clone https://github.com/bernsteining/instaloctrack
cd instaloctrack
pip3 install .

Or use Docker:

sudo docker build -t instaloctrack -f Dockerfile .

Usage

instaloctrack -h
usage: instaloctrack [-h] [-t TARGET_ACCOUNT] [-l LOGIN] [-p PASSWORD] [-v]

Instagram location data gathering tool. Usage: python3 instaloctrack.py -t <target_account>

optional arguments:
  -h, --help            show this help message and exit
  -t TARGET_ACCOUNT, --target TARGET_ACCOUNT
                        Instagram profile to investigate
  -l LOGIN, --login LOGIN
                        Instagram profile to connect to, in order to access
                        the instagram posts of the target account
  -p PASSWORD, --password PASSWORD
                        Password of the Instagram profile to connect to
  -v, --visual          Spawns Chromium GUI, otherwise Chromium is headless

For example:

instaloctrack -t <target_account>

If the target profile is private and you have an account that follows it, you can scrape the data with a logged-in session:

instaloctrack -t <target_account> -l <your_account> -p <your_password>

or with Docker:

sudo docker run -v /tmp/output:/tmp/output instaloctrack -t <target_account> -o /tmp/output

⚙️ How it works

First, we retrieve the links to all of the account's pictures by scrolling through the whole Instagram profile with Selenium's WebDriver.
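Once the profile has been scrolled, the post links still have to be pulled out of the page source. Below is a minimal sketch of that step, not the tool's actual code: it assumes post URLs follow the `/p/<shortcode>` pattern visible in the JSON dump further down, and `extract_post_links` is a hypothetical helper. In practice the HTML would come from Selenium's `driver.page_source` after each scroll, which is why links are de-duplicated.

```python
import re

# Assumption: post links look like href="/p/<shortcode>/", where the shortcode
# uses letters, digits, "_" and "-" (e.g. "-Q_9EvR9eu").
POST_LINK_RE = re.compile(r'href="(/p/[A-Za-z0-9_-]+)/?"')


def extract_post_links(html: str, base: str = "https://www.instagram.com") -> list[str]:
    """Hypothetical helper: absolute post URLs found in `html`, de-duplicated, in page order."""
    seen: set[str] = set()
    links: list[str] = []
    for path in POST_LINK_RE.findall(html):
        url = base + path
        if url not in seen:  # successive scroll snapshots overlap, so skip repeats
            seen.add(url)
            links.append(url)
    return links


html = '<a href="/p/-Q_9EvR9eu/">a</a><a href="/p/CXeIWI0IpYh/">b</a><a href="/p/-Q_9EvR9eu/">c</a>'
links = extract_post_links(html)
# links == ["https://www.instagram.com/p/-Q_9EvR9eu", "https://www.instagram.com/p/CXeIWI0IpYh"]
```

Regex parsing keeps the sketch dependency-free; as the improvements list below notes, an HTML parser such as BeautifulSoup would be more robust.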

Then, we fetch each picture link asynchronously (with asyncio), check whether the picture's description contains a location, and if it does, retrieve the location's data along with the post's timestamp.
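The asynchronous fan-out over post links can be sketched with `asyncio.gather`. This is an illustration, not the tool's implementation: `fetch_post` stands for whatever coroutine downloads one post and extracts its location and timestamp, and the concurrency cap (`max_concurrency=10`) is an arbitrary assumption meant to avoid flooding Instagram with requests.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def fetch_all(
    links: list[str],
    fetch_post: Callable[[str], Awaitable[T]],
    max_concurrency: int = 10,  # assumed cap, tune as needed
) -> list[T]:
    """Run fetch_post on every link concurrently, at most max_concurrency at a time.

    Results come back in the same order as `links`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(link: str) -> T:
        async with semaphore:  # wait for a free slot before starting the request
            return await fetch_post(link)

    return await asyncio.gather(*(bounded(link) for link in links))


# Usage with a stand-in fetcher; a real one would perform an HTTP request
# and parse the post's location and timestamp.
async def fake_fetch_post(link: str) -> dict:
    await asyncio.sleep(0.01)
    return {"link": link, "place": None}


results = asyncio.run(
    fetch_all(["https://www.instagram.com/p/a", "https://www.instagram.com/p/b"], fake_fetch_post)
)
```

The semaphore bounds how many requests are in flight while `gather` still preserves input order, so results can be zipped back to their links.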

  • NB: Instagram deprecated its location API in 2018, so it is no longer possible to get the GPS coordinates of a picture: all we can retrieve is the name of the location. (If you can prove me wrong about this, please tell me!)

Because Instagram doesn't provide GPS coordinates and we're only given place names, we have to geocode them (i.e. get the GPS coordinates from the place's name).

For this, I used Nominatim's awesome API, which is built on OpenStreetMap data. For our usage, no API key is required, and we respect Nominatim's Usage Policy by sending at most one geocoding request per second.
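A minimal, standard-library sketch of this geocoding step, assuming Nominatim's public `/search` endpoint with the `q`, `format=json` and `limit=1` parameters. It spaces requests at least one second apart and sends an identifying `User-Agent`, which the usage policy also asks for. The class and function names are hypothetical, the User-Agent string is a placeholder you should replace, and the HTTP call is injectable so the logic can be exercised offline.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"
USER_AGENT = "instaloctrack-example/0.1 (replace-with-your-contact)"  # placeholder


class MinIntervalLimiter:
    """Hypothetical helper: blocks so consecutive wait() calls are >= `interval` s apart."""

    def __init__(self, interval: float = 1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def search_url(query: str) -> str:
    params = {"q": query, "format": "json", "limit": 1}
    return NOMINATIM_SEARCH + "?" + urllib.parse.urlencode(params)


def http_get_json(url: str) -> list:
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def geocode(place_name: str, limiter: MinIntervalLimiter,
            get_json: Callable[[str], list] = http_get_json) -> Optional[dict]:
    """Return {"lat": ..., "lon": ...} for the best match, or None if nothing matched."""
    limiter.wait()  # at most one request per second
    results = get_json(search_url(place_name))
    if not results:
        return None
    best = results[0]
    return {"lat": best["lat"], "lon": best["lon"]}  # Nominatim returns these as strings
```

Usage: create one `MinIntervalLimiter(1.0)` and reuse it for every call, e.g. `geocode("Musée du quai Branly - Jacques Chirac, Paris", limiter)`, so the one-second spacing holds across the whole run.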

Finally, once we have all the GPS coordinates, we generate an HTML page (with jinja2 templating) with embedded JavaScript that plots an OpenStreetMap map (with the Leaflet library) with all our locations pinned. Once again, no API key is required for this step.
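The map-generation step can be sketched as follows. The tool uses jinja2; to keep this example dependency-free it swaps in the standard library's `string.Template`, and it loads Leaflet 1.9.4 from the unpkg CDN (a version picked for this sketch). Markers are embedded as JSON, and each popup shows the place name and date. `render_map` and the point fields (`name`, `date`, `lat`, `lon`) are assumptions, not the tool's template.

```python
import json
from string import Template

# Hypothetical page template; $points is replaced with a JSON array of markers.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
  <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
  <style>#map { height: 100vh; }</style>
</head>
<body>
  <div id="map"></div>
  <script>
    const points = $points;
    const map = L.map("map").setView([20, 0], 2);
    L.tileLayer("https://tile.openstreetmap.org/{z}/{x}/{y}.png", {
      attribution: "&copy; OpenStreetMap contributors"
    }).addTo(map);
    for (const p of points) {
      const label = document.createElement("span");
      label.textContent = p.name + " (" + p.date + ")";  // text, not HTML
      L.marker([p.lat, p.lon]).addTo(map).bindPopup(label);
    }
    if (points.length) map.fitBounds(points.map(p => [p.lat, p.lon]));
  </script>
</body>
</html>
""")


def render_map(points: list[dict]) -> str:
    """points: [{"name": str, "date": str, "lat": float, "lon": float}, ...]"""
    # Escape "</" so a place name can never close the inline <script> early.
    data = json.dumps(points, ensure_ascii=False).replace("</", "<\\/")
    return PAGE.substitute(points=data)


html = render_map([{"name": "Musée du quai Branly", "date": "2015-11-19",
                    "lat": 48.8566969, "lon": 2.3514616}])
```

Write the result with `open("map.html", "w", encoding="utf-8")` and open it in a browser; as in the original design, no API key is involved.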

Also, the data collected by the script (location names, timestamps, GPS coordinates, errors) is dumped to a JSON file so that it can be re-used.

Example

As an example, here's the output on the former French President's Instagram profile, @fhollande:

Map of @fhollande's locations on Instagram

The Heatmap:

Heatmap of @fhollande's locations on Instagram

Information available when clicking on a marker:

available data when clicking on a marker

Stats about the location data:

stats about the location data

The JSON data dump (just a part of it to show the format for a given location):

{
  "link": "https://www.instagram.com/p/-Q_9EvR9eu",
  "place": {
    "id": "290297",
    "name": "Musée du quai Branly - Jacques Chirac",
    "slug": "musee-du-quai-branly-jacques-chirac",
    "street_address": " 37 quai Branly",
    " zip_code": " 75007",
    " city_name": " Paris",
    " region_name": " ",
    " country_code": " FR"
  },
  "timestamp": "2015-11-19",
  "gps": {
    "lat": "48.8566969",
    "lon": "2.3514616"
  }
}
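In this excerpt some keys and values carry a stray leading space (`" zip_code"`, `" Paris"`) and the coordinates are strings. When re-using the dump, a small normalization pass helps. This is a sketch under two assumptions: records look like the one above, and the file holds a JSON array of such records. `normalize_record`, `load_locations` and the output field names are hypothetical.

```python
import json


def _clean(value):
    """Recursively strip surrounding whitespace from dict keys and string values."""
    if isinstance(value, dict):
        return {k.strip(): _clean(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_clean(v) for v in value]
    if isinstance(value, str):
        return value.strip()
    return value


def normalize_record(record: dict) -> dict:
    """Flatten one location record (hypothetical output schema)."""
    rec = _clean(record)
    place = rec.get("place") or {}
    gps = rec.get("gps") or {}
    return {
        "link": rec.get("link"),
        "name": place.get("name"),
        "city": place.get("city_name") or None,      # "" after stripping -> None
        "country": place.get("country_code") or None,
        "date": rec.get("timestamp"),
        "lat": float(gps["lat"]) if gps.get("lat") else None,
        "lon": float(gps["lon"]) if gps.get("lon") else None,
    }


def load_locations(path: str) -> list[dict]:
    # Assumption: the dump is a JSON array of records shaped like the excerpt.
    with open(path, encoding="utf-8") as f:
        return [normalize_record(r) for r in json.load(f)]
```

Records without a `gps` entry (for example when geocoding failed) come out with `lat`/`lon` set to `None` instead of raising.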

Possible Improvements

  • Cleaner code :D
  • Factorize the geocoding function, which is waaay too long and cryptic
  • Use BeautifulSoup instead of regex parsing
  • Remove the weird blank space caused by the progress bar
  • Fall back to geocoding tools other than Nominatim (e.g. https://geo.api.gouv.fr/adresse) when it fails? (via an argument?)
    • Use geopy?
    • Use Overpass instead of Nominatim?
  • Add an argument to select only a subset of pictures (by date or rank)
  • Report how long the script took to run
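For the "select only a subset of pictures" idea, one possible shape is sketched below. The `since`, `until` and `first_n` parameters are hypothetical (the tool has no such options today). The sketch assumes records carry an ISO `YYYY-MM-DD` `timestamp` as in the JSON dump, and that they are ordered newest first like the profile page, so "rank" means position in that list.

```python
from datetime import date
from typing import Optional


def select_posts(records: list[dict], since: Optional[str] = None,
                 until: Optional[str] = None, first_n: Optional[int] = None) -> list[dict]:
    """Hypothetical filter: keep records dated within [since, until], then the first `first_n`."""
    lo = date.fromisoformat(since) if since else date.min
    hi = date.fromisoformat(until) if until else date.max
    selected = [
        r for r in records
        # records without a timestamp cannot be placed in time, so they are dropped
        if "timestamp" in r and lo <= date.fromisoformat(r["timestamp"]) <= hi
    ]
    return selected[:first_n] if first_n is not None else selected
```

Filtering on the already-collected records keeps the sketch simple; a real option would ideally stop scrolling early to save requests.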


instaloctrack's Issues

error

Traceback (most recent call last):
  File "/home/user/.local/bin/instaloctrack", line 8, in <module>
    sys.exit(main())
  File "/home/user/.local/lib/python3.9/site-packages/instaloctrack/instaloctrack.py", line 536, in main
    re.search("([0-9]+) publications",
AttributeError: 'NoneType' object has no attribute 'group'
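This traceback means `re.search` found no match, so calling `.group()` on its `None` result fails. The pattern looks for the French UI string `publications`, so it plausibly breaks when Instagram serves another language or changes its markup. Below is a defensive sketch; `parse_post_count` is a hypothetical helper, and accepting the English word `posts` is an assumption about the page text.

```python
import re
from typing import Optional

# Assumption: the count appears as "<digits> publications" (French UI)
# or "<digits> posts" (English UI) somewhere in the page source.
POST_COUNT_RE = re.compile(r"([0-9]+) (?:publications|posts)")


def parse_post_count(page_source: str) -> Optional[int]:
    """Hypothetical helper: the profile's post count, or None when the pattern is absent."""
    match = POST_COUNT_RE.search(page_source)
    if match is None:
        # Fail soft instead of raising AttributeError on None.group()
        return None
    return int(match.group(1))


count = parse_post_count('content="1,234 Followers, 56 posts"')  # -> 56
```

A caller can then print a clear message (wrong language, login wall, changed markup) when the result is `None`, instead of crashing.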

How to use this on Windows 10?

The sudo command used to install the Selenium WebDriver is not available on Windows, so I was wondering if there is a way to use the tool on Windows machines.

Enhancement: GPS locations

Hi, in your blog post you mention that Instagram removed GPS data from photos. This is mostly true, but Instagram does list GPS coordinates for the location that a user selected. It is not a precise location, but it should save you from geocoding the data yourself. This may only work if you are logged in to Instagram (it does for me, but I have heard others can get the data without logging in).

Example: https://www.instagram.com/p/CXeIWI0IpYh/
If I search for "lat" in the page source, it shows the geocoded location (lat/lng) for Rouen.

Screenshot from 2022-02-08 05-41-27

Another alternative, in case the above does not work for you: add ?__a=1 to the post URL and you get JSON that shows the info:

https://www.instagram.com/p/CXeIWI0IpYh/?__a=1
Screenshot from 2022-02-08 05-44-37
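Following this suggestion, the coordinates could be pulled from the post's page source with a regular expression. This sketch assumes the source contains a fragment with a `"lat"` key immediately followed by a `"lng"` key, as the issue describes. That layout is not a documented Instagram format, may change at any time, and may require a logged-in session. `extract_lat_lng` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed fragment: "lat": <number>, "lng": <number> (keys in this order,
# optional whitespace around ":" and ",").
LAT_LNG_RE = re.compile(
    r'"lat"\s*:\s*(-?\d+(?:\.\d+)?)\s*,\s*"lng"\s*:\s*(-?\d+(?:\.\d+)?)'
)


def extract_lat_lng(page_source: str) -> Optional[tuple[float, float]]:
    """Hypothetical helper: (lat, lng) from the first matching fragment, or None."""
    match = LAT_LNG_RE.search(page_source)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```

If this works, it could replace the Nominatim lookup for posts that expose coordinates, with geocoding kept as a fallback.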

Error with the installation

Hi! I have an issue when I try to install the program. Could you please help me understand what is happening? Thanks.

Here is what I have in PowerShell:

PS C:\Users\User\instaloctrack> pip3 install .
Processing c:\users\user\instaloctrack
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\User\instaloctrack\setup.py", line 4, in <module>
          long_description = fh.read()
                             ^^^^^^^^^
        File "C:\Users\User\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 23, in decode
          return codecs.charmap_decode(input,self.errors,decoding_table)[0]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 376: character maps to <undefined>
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
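The traceback shows `setup.py` reading the long description with Windows' default `cp1252` codec, which has no mapping for byte `0x8f`. That byte occurs in UTF-8-encoded text, for instance in the emoji variation selector used in the README's headings (`🛠️`, `⚙️`). A likely fix is to pass an explicit encoding to `open()`. The sketch assumes the file being read is `README.md`; `read_long_description` is a hypothetical helper.

```python
def read_long_description(path: str = "README.md") -> str:
    # Read as UTF-8 explicitly. Without `encoding`, open() uses the platform's
    # locale encoding (cp1252 on many Windows setups), which cannot decode
    # bytes such as 0x8f found in UTF-8-encoded emoji.
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# In setup.py (assumed usage):
# long_description = read_long_description()
```

Setting `PYTHONUTF8=1` before running `pip3 install .` is a user-side workaround, but fixing `setup.py` makes the install work regardless of the locale.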

chromium-chromedriver package

Hi,
First, thanks for this tool, really interesting.
I just found a problem: the chromium-chromedriver package that the requirements ask you to install can't be found through apt install.
Which package could replace it?
Thanks for your work !

DeprecationWarning: use options instead of chrome_options

Hi, I got this error when I run python3 instaloctrack.py -t or instaloctrack -t:

instaloctrack.py:129: DeprecationWarning: use options instead of chrome_options
  return webdriver.Chrome("/usr/bin/chromedriver",
Traceback (most recent call last):
  File "instaloctrack.py", line 562, in <module>
    main()
  File "instaloctrack.py", line 536, in main
    re.search("([0-9]+) publications",
AttributeError: 'NoneType' object has no attribute 'group'
