ispras / web-scraper-chrome-extension
Web data extraction tool implemented as chrome extension
License: GNU Lesser General Public License v3.0
Hi,
Good job with the plugin.
Chrome: Version 87.0.4280.141 (Official Build) (64-bit)
Ubuntu 20.04.1 LTS 64-bit
But I'm trying to use this:
Supported URL patterns:
1. Numeric with optional step and zero padding – [START_END:STEP] – [001_010:10]
my sitemap:
{"_id":"google","startUrls":["http://google.com.br?id=[001_010:10]"],"selectors":[{"id":"body","selector":"body","type":"SelectorHTML","parentSelectors":["_root"]}]}
and the pagination does not work.
I also tried with 3.6, and it still does not work.
I would also like the pagination loop to stop when a condition is met, such as repeated elements or the HTML containing a given string.
Thank you.
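For reference, here is a minimal sketch of how the `[START_END:STEP]` pattern quoted above could be expanded. It is not the extension's actual code: `expandStartUrl` is a hypothetical helper, and it assumes the range is inclusive and that the zero padding width equals the length of START.

```javascript
// Hypothetical helper, not part of the extension: expands one
// "[START_END:STEP]" token in a start URL into a list of URLs.
// Assumptions: the range is inclusive, STEP defaults to 1, and the
// padding width is taken from the number of digits in START.
function expandStartUrl(url) {
  const match = url.match(/\[(\d+)_(\d+)(?::(\d+))?\]/);
  if (!match) return [url]; // no range token, nothing to expand

  const [token, startText, endText, stepText] = match;
  const start = parseInt(startText, 10);
  const end = parseInt(endText, 10);
  const step = stepText ? parseInt(stepText, 10) : 1;
  if (step < 1) throw new Error("STEP must be a positive integer");

  const width = startText.length;
  const urls = [];
  for (let value = start; value <= end; value += step) {
    urls.push(url.replace(token, String(value).padStart(width, "0")));
  }
  return urls;
}
```

Under this reading, `[001_010:10]` would expand to a single URL (id=001), because the next value, 011, already exceeds END. If ten pages are wanted, a pattern such as `[001_010]` or `[001_010:1]` may be closer to the intent.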
Hi there!
I am struggling to create a pagination for a "next"-link (which changes the URL NOT using JavaScript to dynamically load new content).
E.g. have a look at https://www.ader-paris.fr/en/catalog/121109?offset=0.
On the mentioned page I want to cycle through all pages (1-7) using the "Next" link and extract the data shown below.
Is something like this possible with this extension? I found no way to do it.
Would be great if you can give me a hint how this can be achieved!
Thank you for your help!
Best wishes,
koseduhemak
Firstly, it is an awesome extension, thanks.
I tried it and it works to some extent.
What is the purpose of the Select, Element preview, and Data preview buttons on the Selectors and selector properties pages?
Nothing happens when they are clicked.
Is the purpose of the Select button to select an element on the web page by mouse click, or something else?
web-scraper-chrome-extension-v0.3.716.zip
Chrome Version 97.0.4692.71
Not sure if this project is still being maintained. I just tried to install this extension in the latest Chrome version 86.0.x and there are so many bugs right away:
I get a series of "Uncaught SyntaxError: Cannot use import statement outside a module" errors. Not sure if Chrome APIs or settings have changed in the last three months (the last time this repo was updated).
Can anyone else confirm my observations?
Commas can appear in a start URL and break validation.
Fix bug on sitemap in https://linuxsecurity.com/advisories/archlinux.
How does one run the unminified version, or develop, make changes, and run this extension? I know how to use the release version.
I tried to build it with npm but it was not successful. I think it is due to deprecated node modules.
Can anybody list the required node, node-gyp, npm, python and msvc versions to build it?
Thanks.
Test scraping and data preview does not work with image selector
We need to add support for internationalization. For example, we can use https://github.com/wikimedia/jquery.i18n. We should also add Russian translations.
Some tables have colspan and rowspan attributes that merge rows or columns. We need to add support for those cases somehow.
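One possible way to handle merged cells is to expand the table into a rectangular grid in which every spanned position repeats the value of the cell that covers it. The sketch below illustrates that idea only. `expandTable` and its input shape (rows of `{ text, colspan, rowspan }` objects) are hypothetical and are not taken from the extension's table selector.

```javascript
// Hypothetical helper, not part of the extension. Input: an array of rows,
// each row an array of cells shaped like { text, colspan?, rowspan? }.
// Output: a rectangular grid where every spanned position holds the text
// of the cell that covers it.
function expandTable(rows) {
  const grid = rows.map(() => []);
  rows.forEach((cells, r) => {
    let c = 0;
    for (const cell of cells) {
      // Skip columns already filled by a rowspan from an earlier row.
      while (grid[r][c] !== undefined) c++;
      const colspan = cell.colspan || 1;
      const rowspan = cell.rowspan || 1;
      // Copy the cell text into every position it spans, clipping
      // rowspans that run past the last row.
      for (let dr = 0; dr < rowspan && r + dr < grid.length; dr++) {
        for (let dc = 0; dc < colspan; dc++) {
          grid[r + dr][c + dc] = cell.text;
        }
      }
      c += colspan;
    }
  });
  return grid;
}
```

Repeating the value in each spanned position keeps every output row the same length, which suits tabular export. Leaving the repeated positions empty would be an equally reasonable choice.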
Will this project be compatible with Manifest V3?
I am using Chrome 102.0.5005.63. When I add a new selector and go to the page in the browser, I don't get the selector panel in the bottom left. I can't select anything on the page. I am comparing the behavior with the extension from the web store. Does the extension here work differently? I tried different websites, restarted Chrome and I don't see errors in the console (both in top and the extension's console context). I removed the browser_specific_settings setting from the manifest file.
I am not sure what the issue is.
Currently the manifest requires permission for all URLs.
Some users might prefer that the extension request only the permissions it needs.
A solution might be to declare that permission as optional and ask the user to grant access to each new site added to the extension.
Example of declaring site access as an optional permission that the running extension can request later:
manifest.json
...
"optional_permissions": [ "http://*/", "https://*/" ]
...
Example of asking for permission for a new site at runtime:
// protocol, domain and port are assumed to be set by the caller for the new site.
chrome.permissions.request({
    origins: [protocol + "://" + domain + ":" + port + "/"]
}, function(granted) {
    // The callback argument will be true if the user granted the permissions.
    if (granted) {
        alert("Amazing things happened here");
    } else {
        alert("Without permission for the site the app can't work");
    }
});
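The origin string in the request above is assembled by hand. As a rough sketch, it could instead be derived from the page URL. `originPatternFromUrl` and `requestSiteAccess` are hypothetical names. The helper drops the port as a simplifying assumption and assumes the manifest declares optional permissions that cover the requested origin.

```javascript
// Hypothetical helper, not part of the extension: builds a host match
// pattern such as "https://example.com/*" from a full page URL.
// Assumption: the port is dropped; check the match pattern documentation
// for how ports are treated before relying on this.
function originPatternFromUrl(pageUrl) {
  const { protocol, hostname } = new URL(pageUrl);
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error("Only http and https URLs are supported: " + pageUrl);
  }
  return protocol + "//" + hostname + "/*";
}

// Only meaningful inside the extension, where the chrome.* APIs exist.
// onResult receives true if the user granted access, false otherwise.
function requestSiteAccess(pageUrl, onResult) {
  chrome.permissions.request(
    { origins: [originPatternFromUrl(pageUrl)] },
    onResult
  );
}
```

Chrome expects `chrome.permissions.request` to be called from a user gesture, so `requestSiteAccess` would typically be called from a click handler, for example when the user saves a sitemap with a new start URL.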
When I install the extension in Chrome as shown in the installation guide, Chrome reports that the manifest file is missing. Is this extension compatible or not?
I'm trying to set a click on a button in a website where infinite scroll is active, but I've noticed that the scraped data covers only one element on the page, not all the elements that have the same class.
How can I scrape the data correctly? Do I need to set up the selectors in a different way?
Does it work with websites that show a login prompt, like bhadoo index?
I am using Firefox and I am not finding an easy way to debug and put breakpoints. Webpack puts the code I want to put a breakpoint on in a long line of an eval statement. I am using yarn watch:dev.
How does one debug and place breakpoints in such code which is hard to read? I am familiar with debugging extensions before the JS code gets built.
Hi,
First, thank you for the extension. I like it very much.
I'm trying to do a simple pagination with the Element click selector. I have read the documentation.
It might help to add a simple sample sitemap to the Element click selector doc.
My sitemap is below:
{"_id":"bestbuy","startUrls":["https://www.bestbuy.com/site/car-stereos/android-auto-receivers/pcmcat1495052094624.c?cp=3&id=pcmcat1495052094624"],"selectors":[{"id":"pagination","selector":"#sku-list-1","clickElementSelector":"a.sku-list-page-next svg.svg-size-s","clickElementUniquenessType":"uniqueHTML","clickType":"clickMore","type":"SelectorElementClick","parentSelectors":["_root"],"delay":"2000"},{"id":"pagination products list","selector":"body","type":"SelectorHTML","multiple":true,"parentSelectors":["pagination"],"delay":"2000"},{"id":"product list","selector":"#main-results > ol","type":"SelectorHTML","parentSelectors":["_root"],"delay":"2000"}]}
Saving a selector with an empty id field is reported as invalid. If you then enter a name, the validator still rejects the save.
Is sending JSON results to a backend server via HTTP supported?