patneedham / google-it

command line Google search and save to JSON

JavaScript 100.00%

google-it's Issues

Access the 'Top stories' (and `${search term} on Twitter`) section which occasionally appears in the search results

Searching for 'amazon hq2' shows a 'Top stories' section at the top, which should be included in the googleIt results

${search term} on Twitter also appears underneath, but with a title and link like other search results. That indicates the 'snippet' property of googleIt results might need an array of results (each with its own title, link, and snippet), instead of always being plain text.

Titles are undefined

Details
Version: 1.2.0
NPM version: 6.10.3
Node version: 12.8.0

Bug Info
As the title says, when I search for something using google-it, I sometimes get links but no titles in the response object.

[
  {
    title: undefined,
    link: <some link>
  }
]

Why does this happen?

Our systems have detected unusual traffic from your computer network.

Hi,

I'm using google-it in my project and running a lot of queries in a short time. After a while I got this error:

Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen?

How can I get past it?
Thanks.
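
For anyone hitting the same message, one common mitigation is to space requests out instead of firing them back to back. The following is only a minimal sketch: searchSequentially and sleep are hypothetical helpers (not part of google-it), searchFn stands in for any promise-returning search call such as googleIt, and the 5-second default is an arbitrary assumption. Spacing requests does not guarantee that Google will not block the traffic.

```javascript
// Hypothetical helpers, not part of google-it.
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs one search per query, strictly one at a time, pausing between requests.
// `searchFn` is any function that takes { query } and returns a promise.
async function searchSequentially(queries, searchFn, delayMs = 5000) {
  const results = [];
  for (let i = 0; i < queries.length; i += 1) {
    // Pause between requests, but not before the first one.
    if (i > 0) await sleep(delayMs);
    results.push(await searchFn({ query: queries[i] }));
  }
  return results;
}
```

Assuming googleIt has been required as in the other reports on this page, usage would look like searchSequentially(['coffee', 'tea'], googleIt).then(console.log).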

Google adverts to be displayed in the search result

Hi, thanks for a great tool!
I found that the search results don't include Google ads, which are important for my use case. Would it be possible to add such an option?

Allow providing existing HTML code

In my use case, I would like to be able to reuse the results of an existing request to Google to feed into google-it for parsing. (My tool would be more efficient if it could use the HTML response from Google that has already been received for parsing.) Thanks!

google-it always returns empty array [] no matter what the query is

// This code does not work at: https://npm.runkit.com/google

var googleIt = require("google-it");

googleIt({'query': 'coffee'}).then(results => {
  console.log(results); // results is always []
}).catch(e => {
  console.log(e); // no error is ever caught
});

Is there a way to do pagination with this module?

Access to the 'Videos' section which occasionally appears in the search results.

Searching for 'when is it' results in the video section appearing before any results.

Searching for 'where is it' results in the video section appearing after the first result.

The titles and links contained inside these boxes should be included within the googleIt results as well.

Disable console logs?

Is there a way to disable console output when making a request?
For example:

const google = require('google-it');
google({ query: query }); // this prints the title, the link, and the description
// I was wondering if there's a way to disable that, like...
google({ query: query, 'disable-console': true });

Requesting to add feature to use proxy for google search

Hi,

Thanks for creating such a useful script. The URLs-only feature saves me a lot of time.

Google search may block an IP or show a robot CAPTCHA if it is used frequently to grab thousands of URLs, so proxy support would be very useful. Something like this (this is the preferred form):

$ google-it --query="keywords" --only-urls --limit=100 --proxy=proxies.txt --proxytype=socks4
If possible, also allow a txt file of queries, for example:

$ google-it --query=list.txt --only-urls --limit=100 --proxy=proxies.txt --proxytype=https

Also, support for writing the URLs-only output to a txt file:

$ google-it --query=list.txt --only-urls --limit=1000 --proxy=proxies.txt --proxytype=socks4 -o result.txt

It will be greatly appreciated if you could add these features.

Best Regards

Other language (like Vietnamese)

Can't change the language.

JSON is missing actual search results

Version of google-it

[email protected]

Versions of npm and node
yarn 1.22.5
node v12.16.0

Describe the bug
The returned output has a ,, in place of the matched search terms, for instance "handling assets". If either of these words, or a similar form (like asset), is found, a double ,, appears in its place. This happens because the mapping yields undefined when a node has further children.

Expected behavior
To have the actual data

Screenshots
I am using google-it for a Discord bot to query our docs site for convenience of users

Notice in the image, there is text ,, which is where the match for the search criteria would be.

Additional context
Diving into the code, I see an issue in the getSnippet(elem) function.

function getSnippet(elem) {
  return elem.children.map(function (child) {
    if (!child.data) {
      return child.children.map(function (c) {
        return c.data;
      });
    }

    return child.data;
  }).join('');
}

The value returned by return c.data; is undefined where the matching data should be. However, c itself has children (c.children), and when expanded, c.children[0].data does contain the matched search term. I suppose Google structures it this way so they can highlight the matched terms, but google-it fails to pick it up.
As you can see by this screenshot:
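
One possible direction for the getSnippet problem described above, shown only as a sketch: walk the node tree recursively so that text nested at any depth is collected. In the original code the inner map returns an array, and join('') stringifies that array with commas (undefined entries become empty strings), which is where the ,, comes from. The getText helper and the mock elem tree below are hypothetical; the sketch assumes htmlparser2-style nodes where text nodes carry data and elements carry children, and it is not the library's actual fix.

```javascript
// Recursively collect text from a parsed node and all of its descendants.
// Assumes htmlparser2-style nodes: text nodes have `data`, elements have `children`.
function getText(node) {
  if (typeof node.data === 'string') return node.data;
  return (node.children || []).map(getText).join('');
}

function getSnippet(elem) {
  return getText(elem);
}

// Mock node tree (not real Google markup) with a highlighted term nested two levels deep.
const elem = {
  children: [
    { data: 'Guide to ' },
    { children: [{ children: [{ data: 'handling assets' }] }] },
    { data: ' in your project.' },
  ],
};

console.log(getSnippet(elem)); // Guide to handling assets in your project.
```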

Cannot use programmatically

Version of google-it
-- [email protected]

Versions of npm and node
6.14.13
v14.17.0

Describe the bug
Cannot use it programmatically. Require command throws errors.

Welcome to Node.js v14.17.0.
Type ".help" for more information.
> const g = require('google-it')
Uncaught:
Error: UNKNOWN: unknown error, lstat 'C:\test\node_modules\htmlparser2\lib'
    at Object.realpathSync (fs.js:1796:7)
    at toRealPath (internal/modules/cjs/loader.js:349:13)
    at tryFile (internal/modules/cjs/loader.js:345:10)
    at tryPackage (internal/modules/cjs/loader.js:301:16)
    at Function.Module._findPath (internal/modules/cjs/loader.js:521:18)
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:872:27)
    at Function.Module._load (internal/modules/cjs/loader.js:730:27)
    at Module.require (internal/modules/cjs/loader.js:957:19)
    at require (internal/modules/cjs/helpers.js:88:18) {
  errno: -4094,
  syscall: 'lstat',
  code: 'UNKNOWN',
  path: 'C:\\test\\node_modules\\htmlparser2\\lib'
}
> console.log(g)
Uncaught ReferenceError: g is not defined

Expected behavior
Require should instantiate g without errors.

Not working with ubuntu

Version of google-it
If you have it installed globally, please run npm list google-it -g in your command prompt of choice, and that should result in an output that looks like this:

└── [email protected]

If you don't have it installed globally, simply remove the -g flag from that command.

Versions of npm and node
Please run npm --version and node --version and paste in the results.
npm --version
9.5.0

node -v
v19.7.0

Describe the bug
A clear and concise description of what the bug is. At a minimum, please include how you used the google-it library (either on command line or within a node program).
Not working on Ubuntu (I can access Google from the same IP address).

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.
ubuntu@mregyptian ~> google-it --query="Latvian unicorn"
⠙ Loading results

Error: Error in response: statusCode 429. To see the raw response object, please include the 'diagnostics: true' as part of the options object (or -d if using command line)
at /usr/local/lib/node_modules/google-it/lib/googleIt.js:267:16
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
⠸ Loading results⏎ ubuntu@mregyptian ~ [SIGINT]> google-it --query="Latvian unicorn" -d
⠹ Loading results

ubuntu@mregyptian ~> google-it -d --query="Latvian unicorn"
⠼ Loading results

ubuntu@mregyptian ~>
ubuntu@mregyptian ~> google-it -d --query="Latvian unicorn"
⠹ Loading results

ubuntu@mregyptian ~> google-it --query="Latvian unicorn"
⠸ Loading results

Can I change location and search for country-specific results?

According to this comment, #65 (comment), the author used the "gl" parameter, but when I tried it, it didn't work. Am I doing something wrong?

googleIt({'query': 'covfefe irony', 'gl': 'fr'}).then(results => {
  console.log(results)
}).catch(e => {
  console.log(e)
})

approximateResults/resultStatsSelector

Does anyone have code examples to get this to work? I got as far as the screenshot above. I noticed the NULL values in my stats section, so my instinct is that I'm missing a parameter that has to be set before reading the stats response in my code.

Thanks

Does not return results anymore

Version of google-it

[email protected]

Versions of npm and node
Node: v14.15.3
Yarn: 1.22.4

Describe the bug

I have tested this on Mac and windows.

How to reproduce

google-it --query="Obama"

Screenshots

No results on searching

Hi,

It looks like google-it stopped working in the last 2 days. I tried a number of devices and internet connections; no results are returned, and no error messages are thrown.

Steps to reproduce:

sudo npm i -g google-it
google-it --query="Facebook"

Thanks,
C

Does not reject on status codes !== 200

Version of google-it
1.4.2

Versions of npm and node
npm: 6.13.4
node: 11.6.0

Describe the bug
Use: As an imported package (i.e. not command line)

The main function googleIt does not throw errors when the response from google has status code !== 200. Instead, it returns an empty array. This is because the function googleIt uses the function getResponseBody, but getResponseBody does not throw an error when this occurs. I believe this is due to the way the "request" library works, in that the request library does not set the error parameter in the callback.

Expected behavior
To be able to catch errors where status code !== 200, it would be better to check the status code and reject accordingly. For example, in the callback of the request call of getResponseBody:

if (response.statusCode !== 200) {
    reject(response);
}

This way, with a response code like 429 "Too Many Requests", receiving the response in the catch handler would allow for logging and reporting.

I am not sure what effects this change would have on other code in the library and in dependants. It could be a breaking change.
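
To make the proposal above concrete, here is a hedged sketch of a promise wrapper that rejects on any non-200 status. fetchBody and requestFn are hypothetical names: requestFn stands in for a function with the (error, response, body) callback shape used by the "request" library, and it is injected so the example runs without network access. This is not the library's getResponseBody.

```javascript
// Hypothetical wrapper, not the library's getResponseBody.
// `requestFn(url, callback)` must call back with (error, response, body).
function fetchBody(requestFn, url) {
  return new Promise((resolve, reject) => {
    requestFn(url, (error, response, body) => {
      if (error) {
        reject(error); // transport-level failure (e.g. no connection)
        return;
      }
      if (response.statusCode !== 200) {
        // Surface HTTP-level failures such as 429 instead of resolving.
        reject(new Error(`Error in response: statusCode ${response.statusCode}`));
        return;
      }
      resolve(body);
    });
  });
}

// Demo with a fake request function that always answers 429.
const fakeRequest = (url, callback) => callback(null, { statusCode: 429 }, '');
fetchBody(fakeRequest, 'https://www.google.com/search?q=test')
  .catch((err) => console.log(err.message)); // Error in response: statusCode 429
```

Rejecting with an Error rather than the raw response keeps stack traces useful; callers that need the response could attach it as a property on the error.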

Screenshots
N/A

Additional context
N/A

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on Greenkeeper branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet.
We recommend using:

CircleCI
Travis CI
Buildkite
CodeShip
Azure Pipelines
TeamCity
Buddy
AppVeyor
But Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please click the 'fix repo' button on account.greenkeeper.io.

An in-range update of eslint-config-amex is breaking the build 🚨

The devDependency eslint-config-amex was updated from `11.1.0` to `11.2.0`.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

eslint-config-amex is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details

❌ Travis CI - Branch: The build failed.

Release Notes for v11.2.0

11.2.0 (2020-02-07)

Features

index: disable jsx-one-expression-per-line (#8) (68e6d48)
rules: no-extra-parens (#9) (a3afc72)

Commits

The new version differs by 10 commits.

662aad7 chore(release): 11.2.0 [skip ci]
89f41ec chore(release): enable semantic-release
a3afc72 feat(rules): no-extra-parens (#9)
68e6d48 feat(index): disable jsx-one-expression-per-line (#8)
4dbe9cf chore(lockfile-lint): add lockfile-lint script
58322cf chore(npm): add package lock file (#5)
0f45501 Merge pull request #4 from americanexpress/update-readme-image
359ccf0 docs(readme): center image
9cc0d24 docs(readme): update layout and security file (#3)
63d2840 chore(commitlint): add commitlint and husky (#2)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.

Your Greenkeeper Bot 🌴

How to increase limit?

Currently the maximum number of results per query is 10. Is there any way to increase that to 1000+ for the same query?

Multiple pages

Hello, only 6 results get saved to the JSON. How can I make it save all of them? When I search manually there are about 3,500 results.

Google-it fails when Google returns different HTML formats

Environment

node: v16.15.1
npm: 8.11.0
google-it: 1.6.3

Issue

I am suffering from a problem where using google-it fails about 50% of the time.

After investigating, I believe the following is the root cause.
Compare these two HTML responses:

success:
https://github.com/ryudenx/google-it_fail/blob/master/takeda_success.html

fail:
https://github.com/ryudenx/google-it_fail/blob/master/takeda_fail.html

These have different HTML format/design.
Despite the same URL, same User-Agent, same IP, Google seems to return different HTML.

If possible, could you please fix this problem?
Or could you support the failing HTML format?

Reproduce

Reproduced code:

const { setTimeout } = require('timers/promises');
const googleIt = require('google-it')

async function main() {
    while (true) {

        googleIt({ 'query': "takeda", 'no-display': true, 'limit': 10 }).then(results => {

            //check
            if (results.length > 1) {
                console.log("success")
            } else {
                console.log("fail")
            }

        }).catch(e => {
            console.log(e)
        })

        await setTimeout(5000);
    }
}

main()

log (example):

$ node ./test.js
success
success
success
fail
success
fail
fail

Thank you for your cooperation.

How to use additional search parameters?

Hello,

Is it possible to use additional search parameters, for example &tbs=qdr:d?
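
For reference, this is how such parameters end up in a search URL. The sketch below only builds the URL; buildSearchUrl is a hypothetical helper, and whether google-it forwards extra parameters like tbs or gl to Google is not confirmed anywhere on this page.

```javascript
// Hypothetical helper, not part of google-it: builds a Google search URL
// with extra query-string parameters appended after the query itself.
function buildSearchUrl(query, extraParams = {}) {
  const params = new URLSearchParams({ q: query, ...extraParams });
  return `https://www.google.com/search?${params.toString()}`;
}

// tbs=qdr:d is the parameter from the question above; gl=fr is the country
// parameter mentioned in another report on this page.
console.log(buildSearchUrl('covfefe irony', { tbs: 'qdr:d', gl: 'fr' }));
```

URLSearchParams percent-encodes the values, so the colon in qdr:d appears as %3A in the resulting URL.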

stopped working for me yesterday, empty results, anyone else

Latest version.

-h saves the correct HTML file, but there are no visible results.

Probably a selector issue? US Google.

Title and links are out of sync with this query "spotify music vs apple"

Version of google-it

1.6.2

Versions of npm and node
Please run npm --version and node --version and paste in the results.

% npm --version
7.10.0
% node --version
v16.0.0

Describe the bug

The results' titles and links are out of sync with this query:

% ./node_modules/.bin/google-it --query="spotify music vs apple"
⠙ Loading results

Apple Music vs. Spotify: The best music streaming service for you ...
https://www.soundguys.com/apple-music-vs-spotify-36833/?sa=X&ved=2ahUKEwi19PKWrqTwAhXQFIgKHZBFCQUQ9QF6BAgIEAI
While ,Apple Music, may have more content, Spotify's music, catalog is still extensive with over 50 million ,songs,, with around 40,000 being added everyday. Unlike ,Apple Music,, ,Spotify, also offers podcasts, with over 700,000 currently on the platform.Mar. 23, 2021


Apple Music vs Spotify (2021) - YouTube
https://www.cnet.com/news/apple-music-versus-spotify-best-music-podcasts-streaming-service-price-catalog-features-plans-compared/
Aug. 3, 2020, · ,Spotify says it has a catalog of over 50 million songs while Apple Music tops 60 million. Both offer early access to certain albums from time to time ...


Apple Music vs. Spotify | Digital Trends
https://www.youtube.com/watch?v=l4QZej2Ae4A
Mar. 27, 2021, · ,The battle of the kings of music streaming services is here. Today we will be comparing Apple ...Duration: ,14:47,,Posted: ,Mar. 27, 2021

As you can see, the second result is from YouTube but the link is not.

Expected behavior

The titles and the links should be in sync.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

Result title is undefined always

google-it version: 1.6.2
node version: 14.15.3
npm version: 6.14.9

Describe the bug
Result title is always undefined

Expected behavior
Should return the correct result title

Screenshots

Screenshot covers all the steps from installation and then running the query.
If you need more information, please let me know.

Options array object not being read correctly

Version of google-it
1.6.2

Versions of npm and node
npm 7.6.3
node v12.4.0

Describe the bug
Attempting to use options in code, as follows:

const googleIt = require('google-it')

const options = {
  'limit': 3
};
googleIt({options, 'query': 'covfefe irony'}).then(results => {
  // access to results object here
}).catch(e => {
  // any possible errors that might have occurred (like no Internet connection)
})

Expected behavior
I expect only the first three results to be returned and displayed in the console. Instead the default 10 appears.

Additional context
I have also tried other options such as 'no-display': true and those listed in optionsDefinitions.js, to no avail (e.g. 'no-display': true continues to display the results in the console). Oddly, the only option that seems to work is 'proxy:' (as per the readme example), and that is not one of the options listed in optionsDefinitions.js.

I have found, however, that putting the options directly in the object passed to googleIt does work, e.g.:

googleIt({'limit': 3, 'no-display': true, 'query': 'covfefe irony'}).then(results => {
  // access to results object here
}).catch(e => {
  // any possible errors that might have occurred (like no Internet connection)
})

The code behaves as expected when written in this form. My guess is that it has something to do with how the options are unpacked inside the googleIt function.
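
A likely explanation, offered as a sketch rather than a confirmed diagnosis of the library: in JavaScript, { options, query } is property shorthand, so it creates a single key named options holding the whole object instead of copying limit to the top level. Object spread produces the flat shape that worked in the second example.

```javascript
const options = { limit: 3, 'no-display': true };

// Property shorthand nests the whole object under a key named "options".
const nested = { options, query: 'covfefe irony' };
console.log(Object.keys(nested)); // [ 'options', 'query' ]

// Spread copies each option to the top level, matching the form that worked.
const flat = { ...options, query: 'covfefe irony' };
console.log(Object.keys(flat)); // [ 'limit', 'no-display', 'query' ]
```

With that shape, the call becomes googleIt({ ...options, query: 'covfefe irony' }).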

Other Language

Hello,
Is it possible to change the language of the results, for example to force a specific language?

Issue with retrieving results now

Google made an update that randomizes the div class names, so none of the selectors work anymore. Is there any foolproof solution for getting results? Different versions of Google, such as Russia's mobile site, also use a different structure for results, so it won't parse correct results consistently.

You can see here that every div class g is randomized by Google to prevent scraping.

error with title

Version of google-it
@1.6.3

Versions of npm and node
npm: 8.11.0 || node: v17.9.1

Describe the bug
Result titles always return undefined for me.

Allow extracting statistics

I would like to be able to programmatically access search stats via google-it (basically retrieving and interpreting the "Page 4 of about 37,700,000 results (0.67 seconds)" text), especially the result count and search time. The div id is "resultStats". Thanks!
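
As an illustration of the interpreting step, here is a minimal sketch that parses that stats text once it has been extracted from the page. parseResultStats is a hypothetical helper, not a google-it function, and it assumes English wording with comma thousands separators.

```javascript
// Hypothetical helper: parses text such as
// "Page 4 of about 37,700,000 results (0.67 seconds)".
// Assumes English wording and comma thousands separators; returns null on no match.
function parseResultStats(text) {
  const match = /([\d,]+)\s+results?\s*\(([\d.]+)\s+seconds?\)/i.exec(text);
  if (!match) return null;
  return {
    approximateResults: Number(match[1].replace(/,/g, '')),
    seconds: Number(match[2]),
  };
}

const stats = parseResultStats('Page 4 of about 37,700,000 results (0.67 seconds)');
console.log(stats.approximateResults, stats.seconds);
```

Localized result pages use different wording and number formats, so a real implementation would need to handle more than this one pattern.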

Stopped working

I was using this for quite a while, but it suddenly stopped giving me an array when I request the results. Please fix.

Undefined error

Version of google-it
[email protected]
Versions of npm and node
npm: 6.11.3
node: 8.11.3
Describe the bug
When I run the command
google-it --query="test"
it only returns undefined.

Expected behavior
Google search results for "test"

Screenshots

Error in response: statusCode 429

Version of google-it
1.6.1

Versions of npm and node
npm - 6.14.6
node - v12.18.4

Describe the bug
I was using this module in a NodeJS project (discord.js specifically) and I faced this error:
Error in response: statusCode 429. To see the raw response object, please include the 'diagnostics: true' as part of the options object (or -d if using command line)

Expected behavior
It should have given me results according to the query specified but it didn't.

Additional context
I also enabled "diagnostics" to see the raw data, but it didn't give me anything. I hope this information is sufficient.

Feature request: ignoring some domain's results

Hi !!

I have been using this library lately and love it ❤️
First of all, thank you for creating this library 👍

When I use this library, I sometimes want to ignore domains that don't show useful information.
I think this request is similar in spirit to the stackoverflow-github-only option.

What we have to consider is where to save the ignore list.
I think ~/.config would be a nice place, but I don't know whether it's the best.

May I ask what you think about that? 😉
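
Until something like this exists in the library, results can be filtered after the fact. The sketch below is only an illustration: filterIgnoredDomains is a hypothetical helper, the result shape { title, link, snippet } is assumed from the other reports on this page, and the ignore list is passed in as an array rather than read from ~/.config.

```javascript
// Hypothetical post-filter, not part of google-it.
// Drops results whose link points at an ignored domain or one of its subdomains.
function filterIgnoredDomains(results, ignoredDomains) {
  return results.filter((result) => {
    let hostname;
    try {
      hostname = new URL(result.link).hostname;
    } catch (err) {
      return true; // keep entries whose link cannot be parsed
    }
    return !ignoredDomains.some(
      (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
    );
  });
}

// Made-up sample data for illustration only.
const sample = [
  { title: 'Pins', link: 'https://www.pinterest.com/some-board' },
  { title: 'Docs', link: 'https://example.org/docs' },
];
console.log(filterIgnoredDomains(sample, ['pinterest.com']).map((r) => r.title));
```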
