Comments (12)
I was AFK as well and am now working through what stacked up… I've just looked at the page source: it seems that all "visible" (HTML) references for WA (there are 2) point to FAMILY (aka "Kids"). Only the protobuf data has COMMUNICATION:
[[["Communication",[null,null,null,null,[null,null,"/store/apps/category/COMMUNICATION"]],"COMMUNICATION"]]]
So this will only be noticed when checking manually 😢 Looks like we have to switch the source for the category to protobuf then (and only fall back to the other source if the lookup fails).
Not sure when I'll find time to do that, it might take a while. Thanks for reporting, @andaroid; it might have taken us even longer to spot and thus to fix! I'll do my best to fix it as quickly as possible, but cannot promise anything.
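A minimal sketch of that "protobuf first, HTML only as fallback" order. The function name `pickCategory()` and the array shapes are hypothetical and not part of the library; the protobuf input is the snippet quoted above, and the HTML input stands for whatever the existing regex matched.

```php
<?php
// Sketch only: pickCategory() and its input shapes are assumptions for illustration.
function pickCategory(?array $protoCategory, ?array $htmlMatch): ?array {
    // In the quoted snippet, element [0][0] holds [name, link data, category ID].
    $entry = $protoCategory[0][0] ?? null;
    if ( !empty($entry[0]) && !empty($entry[2]) ) {
        return ['name' => $entry[0], 'id' => $entry[2], 'source' => 'protobuf'];
    }
    // Fall back to the HTML match only if the protobuf lookup failed.
    if ( !empty($htmlMatch['id']) && !empty($htmlMatch['content']) ) {
        return ['name' => $htmlMatch['content'], 'id' => $htmlMatch['id'], 'source' => 'html'];
    }
    return null;
}

$snippet = '[[["Communication",[null,null,null,null,[null,null,"/store/apps/category/COMMUNICATION"]],"COMMUNICATION"]]]';
$html = ['id' => 'FAMILY', 'content' => 'Kids'];

$fromProto = pickCategory(json_decode($snippet, true), $html);
$fromHtml  = pickCategory(null, $html);   // protobuf missing: HTML value is used

echo $fromProto['id'], ' via ', $fromProto['source'], "\n";
echo $fromHtml['id'], ' via ', $fromHtml['source'], "\n";
```

With this order the wrong HTML value only surfaces when the protobuf data is missing altogether.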
from googleplaywebserviceapi.
P.S.: Sorry, I'm traveling and currently on my way from Napoli to Rome.
I will look into it with some delay.
Probably the main official source for this app: https://play.google.com/store/apps/details?id=com.whatsapp&hl=en&gl=US
@IzzySoft Would you please test it, if you find the time?
from googleplaywebserviceapi.
@IzzySoft Hi,
I tried to fix it by using ld+json, which contains the right category.
See #23.
from googleplaywebserviceapi.
That looks more reliable than my protobuf hacks. I had just set up this:
@ google-play.php:216 @ class GooglePlay {
if ( empty($values["featureGraphic"]) ) $values["featureGraphic"] = $proto[1][2][96][0][3][2];
if ( empty($values["video"]) && !empty($proto[1][2][100]) ) $values["video"] = $proto[1][2][100][0][0][3][2];
if ( empty($values["summary"]) && !empty($proto[1][2][73]) ) $values["summary"] = $proto[1][2][73][0][1]; // 1, 2, 73, 0, 1
+ if ( !empty($proto[1][2][79]) ) {
+ $values["category"] = $proto[1][2][79][0][0][0];
+ switch($proto[1][2][79][0][0][2]) { // category from HTML sometimes is wrong, e.g. "Kids" with WhatsApp (com.whatsapp)
+ case "GAME": $values["type"] = "game"; break;
+ case "FAMILY": $values["type"] = "family"; break;
+ default: $values["type"] = "app"; break;
+ }
+ }
// screenshots: 1,2,78,0,0-n; 1=format,2=[wid,hei],3.2=url
// more details see: https://github.com/JoMingyu/google-play-scraper/blob/2caddd098b63736318a7725ff105907f397b9a48/google_play_scraper/constants/element.py
break;
But protobuf sometimes needs more than 5 reloads to show up. Yours seems to hit it on the first try.
@BaseMax are you OK to go with the solution @andaroid is offering in the mentioned PR? Maybe the formatting should match the way all the other code is formatted (which would also compact it a bit), but then I'd say it's the better approach.
@andaroid maybe you have something similar for featureGraphic, summary and video as well, so we can save ourselves the reloads?
from googleplaywebserviceapi.
@IzzySoft
ld+json offers only this data, without featureGraphic and video.
ld+json data:
{
"@context": "https://schema.org",
"@type": "SoftwareApplication",
"name": "Enhance Photo Quality",
"url": "https://play.google.com/store/apps/details/Enhance_Photo_Quality?id=com.smartworld.enhancephotoquality&hl=en&gl=US",
"description": "App for enlarge image without losing quality, enhance color and photo resolution",
"operatingSystem": "ANDROID",
"applicationCategory": "PHOTOGRAPHY",
"image": "https://play-lh.googleusercontent.com/chvvSlAFzWN16LrHPxO2WAg7LjekVsvgP_BQM9I7nqabiIEQe4hrf8Z8oPPsVSj7uw",
"contentRating": "Everyone",
"author": {
"@type": "Person",
"name": "Csmartworld"
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "4.053050518035889",
"ratingCount": "36957"
},
"offers": [
{
"@type": "Offer",
"price": "0",
"priceCurrency": "USD",
"availability": "https://schema.org/InStock"
}
]
}
For the category ID you can use a preg_match regex without parsing the ld+json data,
e.g.:
$values["category"] = $this->getRegVal('/applicationCategory\"\:\"(?<content>[^"]+)\"/iu');
I think that's a better solution for fixing the category ID.
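To compare the two variants side by side, here is a self-contained sketch that runs both against a small sample page. The sample HTML is made up (the JSON values come from the ld+json data quoted above, minified, which is an assumption about how the page delivers it), and plain `preg_match()` stands in for the project's `getRegVal()`. The `\s*` around the colon is an addition so the pattern also tolerates pretty-printed JSON.

```php
<?php
// Made-up page fragment; the JSON values are from the ld+json data quoted above.
$html = '<html><head><script type="application/ld+json">'
      . '{"@context":"https://schema.org","@type":"SoftwareApplication",'
      . '"name":"Enhance Photo Quality","applicationCategory":"PHOTOGRAPHY"}'
      . '</script></head><body></body></html>';

// Variant 1: regex only, without decoding the JSON.
$viaRegex = null;
if ( preg_match('/applicationCategory"\s*:\s*"(?<content>[^"]+)"/iu', $html, $m) ) {
    $viaRegex = $m['content'];
}

// Variant 2: isolate the ld+json script block and decode it.
$viaJson = null;
if ( preg_match('#<script type="application/ld\+json"[^>]*>(?<json>.*?)</script>#is', $html, $m) ) {
    $data = json_decode($m['json'], true);
    if ( is_array($data) && isset($data['applicationCategory']) ) {
        $viaJson = $data['applicationCategory'];
    }
}

echo $viaRegex, "\n", $viaJson, "\n";
```

The regex variant is shorter. The decoding variant also gives access to the other fields (description, author, ratings, price) if they are wanted later.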
from googleplaywebserviceapi.
ld+json offer this data only , without featureGraphic and video
Yes, that was the only thing I found, too. I was just hoping I had missed something…
from googleplaywebserviceapi.
@IzzySoft What are we going to do now?
from googleplaywebserviceapi.
@andaroid if you can adjust the formatting to match the project's code, it seems fine to me. If I understood you correctly that the single line you just posted does the same as the JSON parsing (your lines 145-150), and we don't want to use other values from the JSON, feel free to rewrite it to that. OTOH we could consider taking the other values (author, ratings, price) from the JSON as well. In particular, it would be great to include
if ( empty($values["summary"]) ) $values["summary"] = $data["description"];
(which is currently part of the protobuf fallback).
So maybe the best idea is:
- adjust the formatting
- include the summary (as above)
- add a code comment with a link to your comment here at what is currently line 148 in your diff, so we can decide on the other values later
- move the entire block down to after
$values["price"]
(as the default summary is otherwise overwriting the one from the JSON)
All that pending on approval by @BaseMax 😉
from googleplaywebserviceapi.
Basically, this is what I think:
@ google-play.php:143 @ class GooglePlay {
}
$values["developer"] = strip_tags($this->getRegVal('/href="\/store\/apps\/dev(eloper)*\?id=(?<id>[^\"]+)"([^\>]*|)>(\<span[^\>]*>)*(?<content>[^\<]+)(<\/span>|)<\/a>/i'));
- preg_match('/<a class="WpHeLc VfPpkd-mRLv6 VfPpkd-RLmnJb" href="\/store\/apps\/category\/(?<id>[^\"]+)" aria-label="(?<content>[^\"]+)"/i', $this->input, $category);
- if ( empty($category) ) preg_match('/href="\/store\/apps\/category\/(?<id>[^\"]+)" data-disable-idom="true" data-skip-focus-on-activate="false" jsshadow><span class="VfPpkd-N5Lhkf" jsname="bN97Pc"><span class="VfPpkd-jY41G-V67aGc" jsname="V67aGc">(?<content>[^\<]+)<\/span>/i', $this->input, $category);
- if (isset($category["id"], $category["content"])) {
- $values["category"] = trim(strip_tags($category["content"]));
- $catId = trim(strip_tags($category["id"]));
- if ($catId=='GAME' || substr($catId,0,5)=='GAME_') $values["type"] = "game";
- elseif ($catId=='FAMILY' || substr($catId,0,7)=='FAMILY?') $values["type"] = "family";
- else $values["type"] = "app";
- } else {
- $values["category"] = null;
- $values["type"] = null;
- }
$values["summary"] = strip_tags($this->getRegVal('/property="og:description" content="(?<content>[^\"]+)/i'));
$values["description"] = $this->getRegVal('/itemprop="description"[^\>]*><div class="bARER"[^\>]*>(?<content>.*?)<\/div><div class=/i');
if ( strtolower(substr($lang,0,2)) != 'en' ) { // Google sometimes keeps the EN description additionally, so we need to filter it out **TODO:** check if this still applies (2022-05-27)
@ google-play.php:192 @ class GooglePlay {
$values["votes"] = $this->getRegVal('/<div class="g1rdde">(?<content>[^>]+) reviews<\/div>/i');
$values["price"] = $this->getRegVal('/<meta itemprop="price" content="(?<content>[^"]+)">/i');
+ $d = new DomDocument();
+ @$d->loadHTML($this->input);
+ $xp = new domxpath($d);
+ $jsonScripts = $xp->query( '//script[@type="application/ld+json"]' );
+ $json = trim( @$jsonScripts->item(0)->nodeValue ); //
+ $data = json_decode($json,true);
+ if(isset($data['applicationCategory'])) {
+ $values["category"] = $data['applicationCategory'];
+ if(substr($values["category"],0,5)=='GAME_') $values["type"] = "game";
+ elseif(substr($values["category"],0,7)=='FAMILY?') $values["type"] = "family";
+ else $values["type"] = "app";
+ } else {
+ $values["category"] = null;
+ $values["type"] = null;
+ }
+ if ( empty($values["summary"]) && !empty($data["description"]) ) $values["summary"] = $data["description"];
$limit = 5; $proto = '';
while ( empty($proto) && $limit > 0 ) { // sometimes protobuf is missing, but present again on subsequent call
$proto = json_decode($this->getRegVal("/key: 'ds:4'. hash: '7'. data:(?<content>\[\[\[.+?). sideChannel: .*?\);<\/script/ims")); // ds:8 hash:22 would have reviews
@ google-play.php:221 @ class GooglePlay {
if ( empty($values["video"]) && !empty($proto[1][2][100]) ) $values["video"] = $proto[1][2][100][0][0][3][2];
if ( empty($values["summary"]) && !empty($proto[1][2][73]) ) $values["summary"] = $proto[1][2][73][0][1]; // 1, 2, 73, 0, 1
// screenshots: 1,2,78,0,0-n; 1=format,2=[wid,hei],3.2=url
+ // category: $proto[1][2][79][0][0][0]; catId: $proto[1][2][79][0][0][2]
// more details see: https://github.com/JoMingyu/google-play-scraper/blob/2caddd098b63736318a7725ff105907f397b9a48/google_play_scraper/constants/element.py
break;
}
The only drawback compared to the protobuf approach is that the category is then all-caps, as the JSON has only the category ID. We could work around that by calling parseCategories() and mapping it accordingly, or simply leave that to the "user". Ouch, after fixing that method, that is…
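A rough sketch of what "mapping it accordingly" could look like on the caller's side. Both helper names are hypothetical and not part of the library; the type checks mirror the GAME / FAMILY conditions from the removed lines of the diff above, and the single map entry is the ID/name pair from the protobuf snippet earlier in this thread.

```php
<?php
// Hypothetical helpers for illustration; they are not part of the library.
function typeFromCategoryId(string $catId): string {
    if ( $catId === 'GAME' || substr($catId, 0, 5) === 'GAME_' ) return 'game';
    if ( $catId === 'FAMILY' || substr($catId, 0, 7) === 'FAMILY?' ) return 'family';
    return 'app';
}

// Return the display name for an ID, or the ID itself when it is not in the map.
function categoryName(string $catId, array $names): string {
    return $names[$catId] ?? $catId;
}

$names   = ['COMMUNICATION' => 'Communication'];
$name    = categoryName('COMMUNICATION', $names);
$unknown = categoryName('GAME_ACTION', $names);   // not in the map: ID is kept
$type    = typeFromCategoryId('GAME_ACTION');

echo "$name / $unknown / $type\n";
```

Falling back to the ID keeps the value usable even when the map is incomplete.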
from googleplaywebserviceapi.
OK, I've fixed parseCategories(). It now uses a local list of all categories (categories.jsonl, in JSONL format for easy maintenance), as I couldn't find them listed in the original place anymore. Whoever wants the category names instead of the IDs can now obtain the list and map it as needed. The type is defined there as well:
Array
(
    [success] => 1
    [message] =>
    [data] => Array
        (
            [ANDROID_WEAR] => stdClass Object
                (
                    [id] => ANDROID_WEAR
                    [name] => Wear OS by Google
                    [type] => app
                )
            [ART_AND_DESIGN] => stdClass Object
                (
                    [id] => ART_AND_DESIGN
                    [name] => Art & Design
                    [type] => app
                )
            ...
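For anyone who wants to do the ID-to-name mapping, here is a minimal sketch of reading such a JSONL list into an array keyed by category ID. The exact layout of categories.jsonl is an assumption (one JSON object per line with the id, name and type fields shown in the output above), and the list is inlined as a string so the example runs without the file.

```php
<?php
// Assumed JSONL content: one JSON object per line, fields as in the output above.
$jsonl = <<<'JSONL'
{"id":"ANDROID_WEAR","name":"Wear OS by Google","type":"app"}
{"id":"ART_AND_DESIGN","name":"Art & Design","type":"app"}
JSONL;

$categories = [];
foreach ( preg_split('/\R/', $jsonl, -1, PREG_SPLIT_NO_EMPTY) as $line ) {
    $cat = json_decode($line);                        // decodes to a stdClass object
    if ( isset($cat->id) ) $categories[$cat->id] = $cat;   // skip undecodable lines
}

// Map an all-caps category ID to its display name, keeping the ID if unknown.
$id   = 'ART_AND_DESIGN';
$name = isset($categories[$id]) ? $categories[$id]->name : $id;

echo $name, "\n";
```

Keying the array by ID makes the lookup a single array access per app.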
from googleplaywebserviceapi.
@andaroid so will you perform the above mentioned adjustments?
from googleplaywebserviceapi.
Thanks once again for pointing out the path, @andaroid! As there was no response from @BaseMax and you didn't do the reorg, I've just pushed it myself. Hope you don't mind; attribution given with the commit 😉
@BaseMax I've also increased the version number in the header. As the structure returned by parseCategories() is different from what it returned before (now it's an array of objects with category details instead of just a simple array of category IDs), I've increased "minor" (1.0.1 => 1.1.0). The method also no longer needs network traffic (it uses local definitions, as the other list was no longer available) and thus is much faster 😄 For what it returns, see 2 comments above.
Issue should be solved now, so I'm closing it.
from googleplaywebserviceapi.