google-it's People

Contributors

92bondstreet, connorlanglois, dependabot[bot], greenkeeper[bot], hawkeye64, kua-as-exe, mohamed3on, patneedham, t-rekttt, vladimirmikulic

google-it's Issues

Access the 'Top stories' (and `${search term} on Twitter`) section which occasionally appears in the search results

Searching for 'amazon hq2' shows a 'Top stories' section at the top, which should be included in the googleIt results

The '${search term} on Twitter' section also appears underneath, but with a title and link like other search results. This suggests the 'snippet' property of googleIt results may need to hold an array of results (each with its own title, link, and snippet) instead of always being plain text.

Titles are undefined

Details
Version: 1.2.0
NPM version: 6.10.3
Node version: 12.8.0

Bug Info
As the title says, when I search for something using google-it, I sometimes get links but no titles in the response object:

[
  {
    title: undefined,
    link: <some link>
  }
]

Why does this happen?

Our systems have detected unusual traffic from your computer network.

Hi,

I'm using google-it in my project and running a lot of queries in a short time. After a while I got this error:

Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen?

How can I get past it?
Thanks.
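A common client-side mitigation, which is not a google-it feature and does not bypass Google's check, is to slow down: space queries out and back off exponentially after a failure. A minimal sketch, assuming the search call is any function returning a Promise (for example a wrapper around googleIt); the helper name withBackoff and its options are hypothetical.

```javascript
// Hypothetical helper: retries an async task with exponential backoff.
// `task` is any function returning a Promise, e.g. () => googleIt({ query }).
async function withBackoff(task, { retries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of attempts
      // Wait baseDelayMs, then 2x, 4x, ... before trying again.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Backoff only reduces how often the block is hit; sustained high-volume querying is likely to be flagged regardless.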

Allow providing existing HTML code

In my use case, I would like to be able to reuse the results of an existing request to Google to feed into google-it for parsing. (My tool would be more efficient if it could use the HTML response from Google that has already been received for parsing.) Thanks!

Disable console logs?

Is there a way to disable console output when making a request?
For example:

const google = require('google-it');
google({ query: query }); // this prints the title, link, and description of each result
// I was wondering if there's a way to disable that, like...
google({ query: query, 'disable-console': true });
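google-it already accepts a 'no-display' option: the reproduction code in another issue on this page passes 'no-display': true, and a later report notes it is honored when it sits at the top level of the object passed to googleIt. A sketch, assuming googleIt is injected as a parameter so the wrapper can be exercised without network access; the wrapper name quietSearch is hypothetical.

```javascript
// Hypothetical wrapper: runs a google-it search without printing results.
// `googleIt` is passed in (normally require('google-it')) so the function
// can be tested with a fake implementation.
function quietSearch(googleIt, query, extraOptions = {}) {
  // 'no-display' must be a top-level key, next to 'query'.
  return googleIt({ ...extraOptions, query, 'no-display': true });
}

// Usage (requires google-it and network access):
// const googleIt = require('google-it');
// quietSearch(googleIt, 'covfefe irony', { limit: 3 })
//   .then((results) => console.log(results.length));
```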

Requesting to add feature to use proxy for google search

Hi,

Thanks for creating such a useful script. It saves me a lot of time, especially the URLs-only feature.

Google may block an IP or show a robot CAPTCHA if it is used frequently to grab thousands of URLs, so proxy support would be very useful. Something like this:

  • $ google-it --query="keywords" --only-urls --limit=100 --proxy=proxies.txt --proxytype=socks4 (preferred)

  • If possible, support a .txt file of queries, for example:

$ google-it --query=list.txt --only-urls --limit=100 --proxy=proxies.txt --proxytype=https

  • Also, support writing only the URLs to a .txt output file:

$ google-it --query=list.txt --only-urls --limit=1000 --proxy=proxies.txt --proxytype=socks4 -o result.txt

It would be greatly appreciated if you could add these features.

Best Regards
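Until something like this exists, the file handling can live in a small script around the library. A sketch, assuming one query per line in the input file and result objects with a link property (the shape shown in the "Titles are undefined" issue above); the helper names parseQueryList and toUrlList are hypothetical, and proxy handling is left out.

```javascript
// Hypothetical helpers for a list.txt -> result.txt workflow.

// One query per line; blank lines and surrounding whitespace are ignored.
function parseQueryList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Collects the `link` of every result, drops duplicates,
// and returns one URL per line.
function toUrlList(resultsPerQuery) {
  const seen = new Set();
  for (const results of resultsPerQuery) {
    for (const { link } of results) {
      if (link) seen.add(link);
    }
  }
  return [...seen].join('\n');
}

// Example wiring (not run here; requires google-it and network access):
// const fs = require('fs');
// const googleIt = require('google-it');
// (async () => {
//   const queries = parseQueryList(fs.readFileSync('list.txt', 'utf8'));
//   const all = [];
//   for (const query of queries) {
//     all.push(await googleIt({ query, 'no-display': true }));
//   }
//   fs.writeFileSync('result.txt', toUrlList(all));
// })();
```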

JSON is missing actual search results

Version of google-it

Versions of npm and node
yarn 1.22.5
node v12.16.0

Describe the bug
The returned output contains ",," in place of the search terms, for instance "handling assets". Wherever either of these words, or a similar form (like "asset"), is found, ",," appears in its place. This happens because the mapping returns undefined when a child has further children.

Expected behavior
To have the actual data

Screenshots
I am using google-it in a Discord bot that queries our docs site for the convenience of users.
Notice in the image that the text ,, appears where the match for the search criteria would be.

Additional context
Diving into the code, I see an issue in the getSnippet(elem) method:

function getSnippet(elem) {
  return elem.children.map(function (child) {
    if (!child.data) {
      return child.children.map(function (c) {
        return c.data;
      });
    }

    return child.data;
  }).join('');
}

The return c.data; line yields undefined where the matching text should be. However, c has children (c.children), and when expanded, c.children[0].data does contain the matched search term. I suppose Google structures it this way so it can highlight the search terms, but google-it fails to pick them up.
As you can see by this screenshot:
Screenshot from 2020-10-29 11-43-51
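The stray commas come from the final join(''): the inner map returns an array, and when join('') turns that nested array into a string it uses the array's own comma-separated string form, in which undefined entries become empty strings. A sketch of a recursive version, assuming htmlparser2-style nodes where text nodes carry a string data property and element nodes carry a children array (as in the code above); this is not the library's actual fix.

```javascript
// Recursively concatenates all text beneath `elem`, however deeply nested.
// Assumes htmlparser2/domhandler-style nodes: text nodes have a string
// `data`, element nodes have a `children` array.
function getSnippet(elem) {
  if (!elem) return '';
  if (typeof elem.data === 'string') return elem.data; // text node
  // Element node: flatten the text of every child, joined without separators.
  return (elem.children || []).map(getSnippet).join('');
}
```

Comment nodes also carry a string data property in that node model, so their text would be included too; snippet markup is unlikely to contain comments, but a type check could exclude them.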

Cannot use programmatically

Version of google-it
-- [email protected]

Versions of npm and node
6.14.13
v14.17.0

Describe the bug
Cannot use it programmatically: require() throws an error.

Welcome to Node.js v14.17.0.
Type ".help" for more information.
> const g = require('google-it')
Uncaught:
Error: UNKNOWN: unknown error, lstat 'C:\test\node_modules\htmlparser2\lib'
    at Object.realpathSync (fs.js:1796:7)
    at toRealPath (internal/modules/cjs/loader.js:349:13)
    at tryFile (internal/modules/cjs/loader.js:345:10)
    at tryPackage (internal/modules/cjs/loader.js:301:16)
    at Function.Module._findPath (internal/modules/cjs/loader.js:521:18)
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:872:27)
    at Function.Module._load (internal/modules/cjs/loader.js:730:27)
    at Module.require (internal/modules/cjs/loader.js:957:19)
    at require (internal/modules/cjs/helpers.js:88:18) {
  errno: -4094,
  syscall: 'lstat',
  code: 'UNKNOWN',
  path: 'C:\\test\\node_modules\\htmlparser2\\lib'
}
> console.log(g)
Uncaught ReferenceError: g is not defined

Expected behavior
require() should load the module into g without errors.

Not working with ubuntu

Versions of npm and node
npm --version
9.5.0

node -v
v19.7.0

Describe the bug
Not working on Ubuntu (I can access Google from the same IP address).

Additional context
ubuntu@mregyptian ~> google-it --query="Latvian unicorn"
⠙ Loading results

Error: Error in response: statusCode 429. To see the raw response object, please include the 'diagnostics: true' as part of the options object (or -d if using command line)
at /usr/local/lib/node_modules/google-it/lib/googleIt.js:267:16
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
⠸ Loading results⏎ ubuntu@mregyptian ~ [SIGINT]> google-it --query="Latvian unicorn" -d
⠹ Loading results

ubuntu@mregyptian ~> google-it -d --query="Latvian unicorn"
⠼ Loading results

ubuntu@mregyptian ~>
ubuntu@mregyptian ~> google-it -d --query="Latvian unicorn"
⠹ Loading results

ubuntu@mregyptian ~> google-it --query="Latvian unicorn"
⠸ Loading results

Error: Error in response: statusCode 429. To see the raw response object, please include the 'diagnostics: true' as part of the options object (or -d if using command line)
at /usr/local/lib/node_modules/google-it/lib/googleIt.js:267:16
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
⠙ Loading results⏎

approximateResults/resultStatsSelector

Capture

Does anyone have code examples for getting this to work? I got as far as the screenshot above. I noticed NULL values in my stats section, so my instinct is that I haven't set a parameter somewhere before reading the stats response in my code.

Thanks!

No results on searching

Hi,

It looks like google-it stopped working in the last 2 days. I tried a number of devices and internet connections: no results are returned, but no error messages are thrown either.

Steps to reproduce:

  1. sudo npm i -g google-it
  2. google-it --query="Facebook"

Thanks,
C

Does not reject on status codes !== 200

Version of google-it
1.4.2

Versions of npm and node
npm: 6.13.4
node: 11.6.0

Describe the bug
Use: As an imported package (i.e. not command line)

The main function googleIt does not reject when the response from Google has status code !== 200. Instead, it returns an empty array. This is because googleIt uses the function getResponseBody, and getResponseBody does not reject when this occurs. I believe this is due to the way the "request" library works: it does not set the error parameter in the callback for a non-200 status code.

Expected behavior
To be able to catch errors where status code !== 200, it would be better to check the status code and reject accordingly. For example, in the callback of the request call of getResponseBody:

if (response.statusCode !== 200) {
    // Return so the rest of the callback does not go on to resolve.
    return reject(response);
}

This way, with a response code like 429 "Too Many Requests", receiving the response in the catch handler would allow for logging and reporting.

I am not sure what effects this change would have on other code in the library and in dependants. It could be a breaking change.
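A fuller sketch of that idea, assuming a request-style function with the (error, response, body) callback signature used by the request library; the function is injected so it can be tested offline, and while the name getResponseBody mirrors the issue, this is not the library's actual code.

```javascript
// Sketch: wraps a request-style call in a Promise that rejects on network
// errors AND on any HTTP status other than 200.
// `request` must call back with (error, response, body).
function getResponseBody(request, options) {
  return new Promise((resolve, reject) => {
    request(options, (error, response, body) => {
      if (error) return reject(error); // network-level failure
      if (response.statusCode !== 200) {
        // e.g. 429 Too Many Requests: give the caller the response for logging.
        return reject(response);
      }
      return resolve(body);
    });
  });
}
```

Callers that currently treat an empty array as "no results" would instead land in their catch handler, which is the breaking change the issue anticipates.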

Screenshots
N/A

Additional context
N/A

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on Greenkeeper branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet.
We recommend using:

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please click the 'fix repo' button on account.greenkeeper.io.

An in-range update of eslint-config-amex is breaking the build 🚨

The devDependency eslint-config-amex was updated from 11.1.0 to 11.2.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

eslint-config-amex is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details

Release Notes for v11.2.0

11.2.0 (2020-02-07)

Features

  • index: disable jsx-one-expression-per-line (#8) (68e6d48)
  • rules: no-extra-parens (#9) (a3afc72)
Commits

The new version differs by 10 commits.

  • 662aad7 chore(release): 11.2.0 [skip ci]
  • 89f41ec chore(release): enable semantic-release
  • a3afc72 feat(rules): no-extra-parens (#9)
  • 68e6d48 feat(index): disable jsx-one-expression-per-line (#8)
  • 4dbe9cf chore(lockfile-lint): add lockfile-lint script
  • 58322cf chore(npm): add package lock file (#5)
  • 0f45501 Merge pull request #4 from americanexpress/update-readme-image
  • 359ccf0 docs(readme): center image
  • 9cc0d24 docs(readme): update layout and security file (#3)
  • 63d2840 chore(commitlint): add commitlint and husky (#2)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

How to increase limit?

Currently the maximum number of results per query is 10. Is there any way to increase that to 1000+ for the same query?

Multiple pages

Hello, only 6 results get saved to the JSON file. How can I make it save all of them? When I search manually, there are about 3,500 results.
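Google pages its results, so getting more than one page means issuing several requests with increasing offsets (Google's own search URLs use a start parameter for this). A sketch, assuming an injected search(query, start) function that returns one page of result objects with a link property; whether and how google-it exposes such an offset is not established here, so search and collectPages are both hypothetical.

```javascript
// Hypothetical: fetches up to `pages` consecutive result pages and merges them.
// `search(query, start)` is assumed to return a Promise of an array of
// { title, link, snippet } objects for the page beginning at offset `start`.
async function collectPages(search, query, { pages = 3, perPage = 10 } = {}) {
  const byLink = new Map(); // de-duplicates results that appear on two pages
  for (let page = 0; page < pages; page++) {
    const results = await search(query, page * perPage);
    if (results.length === 0) break; // ran out of results
    for (const result of results) {
      if (!byLink.has(result.link)) byLink.set(result.link, result);
    }
  }
  return [...byLink.values()];
}
```

Each page is a separate request, which is the kind of traffic that triggers the 429 and "unusual traffic" responses reported elsewhere on this page, so pausing between pages is advisable.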

Google-it fails when Google returns different HTML formats

Environment

  • node: v16.15.1
  • npm: 8.11.0
  • google-it: 1.6.3

Issue

I have a problem where google-it fails about 50% of the time.

After investigating, I believe the root cause is the difference between the following two HTML responses:

success:
https://github.com/ryudenx/google-it_fail/blob/master/takeda_success.html

fail:
https://github.com/ryudenx/google-it_fail/blob/master/takeda_fail.html

The two pages use different HTML structures.
Despite the same URL, User-Agent, and IP, Google seems to return different HTML.

If possible, could you fix this problem, or support the failing HTML format?

Reproduce

Reproduced code:

const { setTimeout } = require('timers/promises');
const googleIt = require('google-it')

async function main() {
    while (true) {

        googleIt({ 'query': "takeda", 'no-display': true, 'limit': 10 }).then(results => {

            //check
            if (results.length > 1) {
                console.log("success")
            } else {
                console.log("fail")
            }

        }).catch(e => {
            console.log(e)
        })

        await setTimeout(5000);
    }
}

main()

log (example):

$ node ./test.js
success
success
success
fail
success
fail
fail

Thank you for your cooperation.

Title and links are out of sync with this query "spotify music vs apple"

Version of google-it

1.6.2

Versions of npm and node

% npm --version
7.10.0
% node --version
v16.0.0

Describe the bug

The results' titles and links are out of sync with this query:

% ./node_modules/.bin/google-it --query="spotify music vs apple"
⠙ Loading results

Apple Music vs. Spotify: The best music streaming service for you ...
https://www.soundguys.com/apple-music-vs-spotify-36833/?sa=X&ved=2ahUKEwi19PKWrqTwAhXQFIgKHZBFCQUQ9QF6BAgIEAI
While ,Apple Music, may have more content, Spotify's music, catalog is still extensive with over 50 million ,songs,, with around 40,000 being added everyday. Unlike ,Apple Music,, ,Spotify, also offers podcasts, with over 700,000 currently on the platform.Mar. 23, 2021


Apple Music vs Spotify (2021) - YouTube
https://www.cnet.com/news/apple-music-versus-spotify-best-music-podcasts-streaming-service-price-catalog-features-plans-compared/
Aug. 3, 2020, · ,Spotify says it has a catalog of over 50 million songs while Apple Music tops 60 million. Both offer early access to certain albums from time to time ...


Apple Music vs. Spotify | Digital Trends
https://www.youtube.com/watch?v=l4QZej2Ae4A
Mar. 27, 2021, · ,The battle of the kings of music streaming services is here. Today we will be comparing Apple ...Duration: ,14:47,,Posted: ,Mar. 27, 2021

As you can see, the second result's title says YouTube, but its link is not a YouTube link; the YouTube link appears under the third title instead.

Expected behavior

The titles and the links should be in sync.

Result title is undefined always

google-it version: 1.6.2
node version: 14.15.3
npm version: 6.14.9

Describe the bug
Result title is always undefined

Expected behavior
Should return the correct result title

Screenshots
Screenshot covers all the steps from installation and then running the query.
If you need more information, please let me know.

Options array object not being read correctly

Version of google-it
1.6.2

Versions of npm and node
npm 7.6.3
node v12.4.0

Describe the bug
Attempting to use options in code, as follows:

const  googleIt = require('google-it')

const options = {
  'limit': 3
};
googleIt({options, 'query': 'covfefe irony'}).then(results => {
  // access to results object here
}).catch(e => {
  // any possible errors that might have occurred (like no Internet connection)
})

Expected behavior
I expect only the first three results to be returned and displayed in the console. Instead, the default 10 appear.

Additional context
I have also tried other options such as 'no-display': true and those listed in optionsDefinitions.js, to no avail (e.g. 'no-display': true continues to display the results in the console). Oddly, the only option that seems to work is 'proxy:' (as per the readme example), and that is not one of the options listed in optionsDefinitions.js.

I have found, however, that putting the options directly in the object passed to googleIt does work, e.g.:

googleIt({'limit': 3, 'no-display': true, 'query': 'covfefe irony'}).then(results => {
  // access to results object here
}).catch(e => {
  // any possible errors that might have occurred (like no Internet connection)
})

The code behaves as expected when written in this form. My guess is that it has something to do with how the options are unpacked inside the googleIt function.
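This is plausibly a JavaScript detail rather than a problem inside googleIt: {options, 'query': ...} is object shorthand for {options: options, query: ...}, so limit ends up nested under an options key instead of at the top level, where the working form above puts it. Spread syntax copies the keys to the top level instead. A small illustration (plain JavaScript, no google-it call):

```javascript
const options = { limit: 3, 'no-display': true };

// Shorthand property: nests the whole object under the key "options".
const nested = { options, query: 'covfefe irony' };
// -> { options: { limit: 3, 'no-display': true }, query: 'covfefe irony' }

// Spread syntax: copies each key of `options` to the top level.
const flat = { ...options, query: 'covfefe irony' };
// -> { limit: 3, 'no-display': true, query: 'covfefe irony' }

// googleIt(flat) should then behave like the working form above.
```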

Other Language

Hello,
Is it possible to change the language of the results, for example to force a specific language?
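At the level of Google's own search URL, two query parameters are relevant: hl sets the interface language and lr=lang_xx restricts results to pages in a given language. Whether google-it passes these through is not stated in this thread, so the sketch below only builds such a URL with the standard URL API; buildSearchUrl is a hypothetical helper.

```javascript
// Hypothetical helper: builds a Google search URL for a given language.
// hl = interface language, lr = restrict results to that language.
function buildSearchUrl(query, lang) {
  const url = new URL('https://www.google.com/search');
  url.searchParams.set('q', query);
  url.searchParams.set('hl', lang);
  url.searchParams.set('lr', `lang_${lang}`);
  return url.toString();
}

// buildSearchUrl('pain au chocolat', 'fr')
// -> 'https://www.google.com/search?q=pain+au+chocolat&hl=fr&lr=lang_fr'
```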

Issue with retrieving results now

Google made an update that breaks all the selectors by randomizing the div class names. Is there any foolproof way to get results? Different versions of Google, such as Russia's mobile site, also use a different structure for results, so google-it won't parse results consistently.

You can see here that every div with class g is randomized by Google to prevent scraping.

error with title

Version of google-it
@1.6.3

Versions of npm and node
npm: 8.11.0 || node: v17.9.1

Describe the bug
Result titles are always undefined for me.

Allow extracting statistics

I would like to be able to programmatically access search stats via google-it (basically retrieving and interpreting the "Page 4 of about 37,700,000 results (0.67 seconds)" text), especially the result count and search time. The div id is "resultStats". Thanks!
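Once the text of that element is available, pulling out the numbers is a small parsing step. A sketch, assuming English-locale text in the format quoted above (comma thousands separators, "seconds") and assuming the first page shows only "About N results" without a "Page N of" prefix; other locales format this string differently, and parseResultStats is a hypothetical helper, not part of google-it.

```javascript
// Hypothetical parser for Google's result-stats line, e.g.
// "Page 4 of about 37,700,000 results (0.67 seconds)".
// Returns null when the text does not match the expected English format.
function parseResultStats(text) {
  const match = /(?:Page\s+(\d+)\s+of\s+)?about\s+([\d,]+)\s+results?\s*\(([\d.]+)\s+seconds?\)/i.exec(text);
  if (!match) return null;
  return {
    page: match[1] ? Number(match[1]) : 1, // assumed: no prefix on page 1
    approximateResults: Number(match[2].replace(/,/g, '')),
    seconds: Number(match[3]),
  };
}
```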

Stopped working

I had been using this for quite a while, but it suddenly stopped returning an array when I requested results. Please fix.

Undefined error

Version of google-it
[email protected]
Versions of npm and node
npm: 6.11.3
node: 8.11.3
Describe the bug
When I run the command
google-it --query="test"
it only returns undefined.

Expected behavior
Google search results for "test".

Screenshots
errorWithGoogle-it

Error in response: statusCode 429

Version of google-it
1.6.1

Versions of npm and node
npm - 6.14.6
node - v12.18.4

Describe the bug
I was using this module in a NodeJS project (discord.js specifically) and I faced this error:
Error in response: statusCode 429. To see the raw response object, please include the 'diagnostics: true' as part of the options object (or -d if using command line)

Expected behavior
It should have given me results according to the query specified but it didn't.

Additional context
I also enabled "diagnostics" to see the raw data, but it didn't give me anything. I hope this information is enough.

Feature request: ignoring some domain's results

Hi !!

I've been using this library lately and love it ❤️
First of all, thank you so much for creating it 👍

When I use it, I sometimes want to ignore domains whose results aren't useful.
I think this need is similar to the stackoverflow-github-only option.

The open question is where to store the ignore list.
I think ~/.config is a good place, but I'm not sure it's the best.

May I ask what you think about that? 😉
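The filtering itself is independent of where the list lives. A sketch, assuming an ignore list of bare domains (one per line, e.g. in a file under ~/.config as suggested above) and result objects with a link property; a listed domain also hides its subdomains. The name filterIgnored and the file path in the comment are hypothetical.

```javascript
// Hypothetical filter: drops results whose link belongs to an ignored domain
// or any of its subdomains (ignoring "example.com" also hides "www.example.com").
// The list could be loaded from a hypothetical file such as:
//   fs.readFileSync(path.join(os.homedir(), '.config', 'google-it', 'ignore.txt'), 'utf8').split('\n')
function filterIgnored(results, ignoredDomains) {
  const ignored = ignoredDomains.map((d) => d.trim().toLowerCase()).filter(Boolean);
  return results.filter(({ link }) => {
    let host;
    try {
      host = new URL(link).hostname.toLowerCase();
    } catch (err) {
      return true; // keep results whose link cannot be parsed
    }
    return !ignored.some((d) => host === d || host.endsWith(`.${d}`));
  });
}
```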
