
degoogle's Introduction

degoogle

search Google and extract result urls directly. skip all the click-through links and other sketchiness

contributions welcome


install with pip: pip install degoogle

or

git clone
cd degoogle
pip install .

command line usage: degoogle "query here"
script usage: make a dg object, execute queries with run()
usage: degoogle [-h] [-o OFFSET] [-p PAGES] [-t TIME_WINDOW] [-j] query

Search and extract google results.

positional arguments:
  query                 search query

optional arguments:
  -h, --help            show this help message and exit
  -o OFFSET, --offset OFFSET
                        page offset to start from
  -p PAGES, --pages PAGES
                        specify multiple pages
  -t TIME_WINDOW, --time-window TIME_WINDOW
                        time window
  -j, --exclude-junk    exclude junk (yt, fb, quora)

note that the time window follows the syntax of the qdr option in google's tbs parameter, e.g. m3 for the past 3 months
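a minimal sketch of how a time window like m3 could end up in a search url. the build_search_url helper and the set of accepted unit letters (h, d, w, m, y) are assumptions for illustration, not degoogle's actual code:

```python
import re
from urllib.parse import urlencode

# Assumption: qdr takes a unit letter (hour, day, week, month, year)
# optionally followed by a count, e.g. "m3" or "d".
QDR_PATTERN = re.compile(r"^[hdwmy]\d*$")


def build_search_url(query, time_window=None):
    """Build a Google search URL, optionally limited by a qdr time window."""
    params = {"q": query}
    if time_window is not None:
        if not QDR_PATTERN.match(time_window):
            raise ValueError(f"unexpected time window: {time_window!r}")
        # The time window travels inside the tbs parameter as "qdr:<window>".
        params["tbs"] = f"qdr:{time_window}"
    return "https://www.google.com/search?" + urlencode(params)


print(build_search_url("site:edu filetype:txt", "m3"))
```

validating the window up front means a typo like 3m fails loudly instead of silently returning unfiltered results.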

examples:

  1. find .txt files on .edu sites within the past 3 months:

degoogle "site:edu filetype:txt" -t m3


  2. with one dg instance, update query in a loop to perform multiple searches:
from degoogle import dg
degoogler = dg()
queries = ["site:edu", "site:gov", "filetype:txt"]
for query in queries:
	print(query)
	degoogler.query = query
	results = degoogler.run()
	for result in results:
		print(result)
	print()


  3. one liner: make dg instance (query set in constructor), search across 5 months, format + print results, end:

[print(result['desc'],": ",result['url'],"\n") for result in (dg("intext:'begin rsa' site:*.edu.*",time_window="m5").run())]



this is an experiment meant to have benefits for both user privacy and broadened osint capabilities

idea: when you search google normally, your results will appear to be direct links to the target site, but what you're really getting is more like this:

https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=3ahUWEwjVn87AHIEHeyMAsIQFjAZegQIWARN&url=https%3A%2F%2Fexample.com%2F&usg=BOvWas1Dcw1x9iNBCBHvWL8rWGgJO4

by using a link like this to access example.com, you are providing more information about yourself than necessary. google doesn't need to know what is done with the results after they are served; their job is done, the click-through is merely a suggestion.
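the real destination is sitting in the url parameter of that click-through link, so it can be pulled out without ever requesting the link. a minimal sketch using only the standard library (extract_target is a hypothetical helper name, not part of degoogle's api):

```python
from urllib.parse import urlparse, parse_qs

link = (
    "https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja"
    "&uact=8&ved=3ahUWEwjVn87AHIEHeyMAsIQFjAZegQIWARN"
    "&url=https%3A%2F%2Fexample.com%2F&usg=BOvWas1Dcw1x9iNBCBHvWL8rWGgJO4"
)


def extract_target(click_through):
    """Return the percent-decoded 'url' parameter of a click-through link, or None."""
    params = parse_qs(urlparse(click_through).query)
    # parse_qs decodes percent-escapes and skips blank values such as "q=".
    targets = params.get("url")
    return targets[0] if targets else None


print(extract_target(link))
```

a link with no url parameter returns None, so the caller can tell a wrapped link from a direct one.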

example.com can identify that google was your referer; the page you're clicking through from includes your search query as part of the URL, so they probably know what you searched for too. whether or not they're looking at it is another question.

if you navigate to example.com directly, without a referer like google, example.com knows that you have visited, but not where you came from, and google knows even less - even if you originally found out about example.com by spotting it in a google search at some point.

google will obviously always know that the search took place, and which results were served, but I prefer to access results directly and not give anyone that click-through confirmation.

there are also utility benefits here for dorking - you might find a super juicy link on page 10 with a bunch of strange parameters cached, but when you follow google's click-through you're redirected to an index page or 404 without even having a chance to copy the link from the result to research more.

there are times where visually inspecting a URL is just as valuable as accessing the link. this tool ganks the good stuff, URL decodes and normalizes, and returns just the scraped URL + description
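what "URL decodes and normalizes" might look like, as a rough sketch. normalize_url is a hypothetical helper, and the choices here (lowercase host, drop the fragment, default path of /) are assumptions, not necessarily what degoogle does:

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def normalize_url(raw):
    """Percent-decode a scraped URL and tidy it up for visual inspection."""
    # Decoding the whole string is meant for reading; it can change the
    # meaning of escaped reserved characters (e.g. %26 inside a query value).
    parts = urlsplit(unquote(raw))
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),   # host names are case-insensitive
        parts.path or "/",      # assumption: an empty path becomes "/"
        parts.query,
        "",                     # assumption: fragments are dropped
    ))


print(normalize_url("https%3A%2F%2FExample.com%2Fpath%3Fid%3D7"))
```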

degoogle's People

Contributors

deepseagirl, yunginnanet


degoogle's Issues

Kindly help me out

Traceback (most recent call last):
  File "C:\Users\Home\Desktop\test.py", line 1, in <module>
    from degoogle import dg
ModuleNotFoundError: No module named 'degoogle'

What's this error? Can you help me out?

usage: degoogle.py [-h] [-o OFFSET] [-p PAGES] [-t TIME_WINDOW] [-j] query
degoogle.py: error: the following arguments are required: query

When I run the code, this error pops up!

Introduce support for non-Latin characters

As of now, search results that contain Unicode characters (i.e. most non-Latin languages, e.g. Hindi, Arabic, Greek, Hebrew) as part of the description, domain name (IDNs), or URL do not render properly.

Example (with Greek characters): [screenshot]

I've introduced this change in PR #7 to address this. Instead of printing the final_string variable directly, it is now passed to the html.unescape() function from within the print call.
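For illustration, a minimal sketch of the behaviour described above. The sample string is made up; only the use of html.unescape() on final_string comes from the description of the change:

```python
import html

# Made-up sample: a result line whose Greek text arrived as HTML entities.
final_string = (
    "&Epsilon;&lambda;&lambda;&eta;&nu;&iota;&kappa;&#940;: https://example.gr/"
)

# Printing final_string directly would show the raw entities;
# html.unescape() converts them back to the characters they stand for.
readable = html.unescape(final_string)
print(readable)
```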

