Comments (12)

IzzySoft commented on August 11, 2024

I was AFK as well and am now working through what piled up… I've just looked at the page source: it seems that all "visible" (HTML) references to the category for WA (WhatsApp; there are 2) point to FAMILY (aka "Kids"). Only the protobuf data has COMMUNICATION:

[[["Communication",[null,null,null,null,[null,null,"/store/apps/category/COMMUNICATION"]],"COMMUNICATION"]]]

So this will only be noticed when checking manually 😢 Looks like we have to switch the source for the category to protobuf then (and only fall back to the other source if that lookup fails).
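To illustrate where the two values sit in that protobuf snippet, here is a minimal PHP sketch. It assumes the array has exactly the nesting quoted above (on the live page it is buried deeper in the `ds:4` blob); the function name `categoryFromProto()` is hypothetical and not part of the project.

```php
<?php
// Sketch: extract category name and ID from the protobuf-style array quoted above.
// Assumption: $node has exactly the nesting of that snippet.
function categoryFromProto(array $node): ?array
{
    // $node[0][0] = ["Communication", [null, ..., "/store/apps/category/COMMUNICATION"], "COMMUNICATION"]
    $entry = $node[0][0] ?? null;
    if (!is_array($entry) || !isset($entry[0], $entry[2])) {
        return null; // structure changed: caller should fall back to the HTML source
    }
    return ['name' => $entry[0], 'id' => $entry[2]];
}

$snippet = '[[["Communication",[null,null,null,null,[null,null,"/store/apps/category/COMMUNICATION"]],"COMMUNICATION"]]]';
$cat = categoryFromProto(json_decode($snippet, true));
echo $cat['name'], ' / ', $cat['id'], PHP_EOL; // Communication / COMMUNICATION
```

Index 0 holds the human-readable name and index 2 the all-caps category ID, which is the split the later diff relies on with `$proto[1][2][79][0][0][0]` and `[2]`.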

Not sure when I'll find time to do that; it might take a while. Thanks for reporting, @andaroid: it might have taken us even longer to spot and thus to fix! I'll do my best to fix it as quickly as possible, but cannot promise anything.

from googleplaywebserviceapi.

BaseMax commented on August 11, 2024

P.S.: Sorry, I am traveling and currently on my way from Napoli to Rome.
I will check it with some delay.

Probably the main official source for this app: https://play.google.com/store/apps/details?id=com.whatsapp&hl=en&gl=US

@IzzySoft Would you please test it if you find the time?

andaroid commented on August 11, 2024

@IzzySoft Hi,
I tried to fix it by using the ld+json data, which contains the right category.
Check #23.

IzzySoft commented on August 11, 2024

That looks more reliable than my protobuf hacks. I had just set up this:

@ google-play.php:216 @ class GooglePlay {
         if ( empty($values["featureGraphic"]) ) $values["featureGraphic"] = $proto[1][2][96][0][3][2];
         if ( empty($values["video"]) && !empty($proto[1][2][100]) ) $values["video"] = $proto[1][2][100][0][0][3][2];
         if ( empty($values["summary"]) && !empty($proto[1][2][73]) ) $values["summary"] = $proto[1][2][73][0][1]; // 1, 2, 73, 0, 1
+        if ( !empty($proto[1][2][79]) ) {
+          $values["category"] = $proto[1][2][79][0][0][0];
+          switch($proto[1][2][79][0][0][2]) { // category from HTML sometimes is wrong, e.g. "Kids" with WhatsApp (com.whatsapp)
+            case "GAME": $values["type"] = "game"; break;
+            case "FAMILY": $values["type"] = "family"; break;
+            default: $values["type"] = "app"; break;
+          }
+        }
         // screenshots: 1,2,78,0,0-n; 1=format,2=[wid,hei],3.2=url
         // more details see: https://github.com/JoMingyu/google-play-scraper/blob/2caddd098b63736318a7725ff105907f397b9a48/google_play_scraper/constants/element.py
         break;

But protobuf sometimes needs more than 5 reloads to show up. Yours seems to hit it on the first try.
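The "reload until the protobuf shows up" idea (the `while ( empty($proto) && $limit > 0 )` loop in the project) can be expressed as a small bounded-retry helper. This is only a sketch: `retryUntilNonEmpty()` is a hypothetical name, and `$fetch` stands in for whatever reloads the page and decodes the blob.

```php
<?php
// Sketch of a bounded retry: call $fetch until it returns something non-empty
// or $limit attempts are used up. Hypothetical helper, not part of the project.
function retryUntilNonEmpty(callable $fetch, int $limit = 5, int $delayMs = 0)
{
    $result = null;
    for ($attempt = 1; $attempt <= $limit; $attempt++) {
        $result = $fetch($attempt);
        if (!empty($result)) {
            return $result; // got the data, stop reloading
        }
        if ($delayMs > 0) {
            usleep($delayMs * 1000); // optional pause between reloads
        }
    }
    return $result; // still empty after $limit attempts
}

// Usage: simulate a page where the blob only appears on the 3rd load.
$calls = 0;
$proto = retryUntilNonEmpty(function (int $attempt) use (&$calls) {
    $calls++;
    return $attempt >= 3 ? [[1, 2, 3]] : null;
});
echo $calls, PHP_EOL; // 3
```

If the data needs more reloads than `$limit`, the helper simply returns the last empty result, so a source that shows up on the first request (like the ld+json) avoids this cost entirely.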

@BaseMax, are you OK with going with the solution @andaroid offers in the mentioned PR? The formatting should perhaps match the way all the other code is formatted (which would also compact it a bit), but otherwise I'd say it's the better approach.

@andaroid Maybe you have something similar for featureGraphic, summary and video as well, so we can save ourselves the reloads?

andaroid commented on August 11, 2024

@IzzySoft
ld+json offers only this data, without featureGraphic and video:

ld+json data

{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Enhance Photo Quality",
  "url": "https://play.google.com/store/apps/details/Enhance_Photo_Quality?id=com.smartworld.enhancephotoquality&hl=en&gl=US",
  "description": "App for enlarge image without losing quality, enhance color and photo resolution",
  "operatingSystem": "ANDROID",
  "applicationCategory": "PHOTOGRAPHY",
  "image": "https://play-lh.googleusercontent.com/chvvSlAFzWN16LrHPxO2WAg7LjekVsvgP_BQM9I7nqabiIEQe4hrf8Z8oPPsVSj7uw",
  "contentRating": "Everyone",
  "author": {
    "@type": "Person",
    "name": "Csmartworld"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.053050518035889",
    "ratingCount": "36957"
  },
  "offers": [
    {
      "@type": "Offer",
      "price": "0",
      "priceCurrency": "USD",
      "availability": "https://schema.org/InStock"
    }
  ]
}
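As a sketch of how the "other values" in this object could map onto the scraper's fields, the function below decodes the ld+json and picks them out. The keys `category`, `summary`, `developer`, `votes` and `price` exist in the project's `$values`; `rating` and the function name `valuesFromLdJson()` are assumptions for illustration.

```php
<?php
// Sketch: map the schema.org SoftwareApplication object onto $values-style keys.
// "rating" and the function name are hypothetical.
function valuesFromLdJson(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || ($data['@type'] ?? '') !== 'SoftwareApplication') {
        return []; // not the object we expect
    }
    return [
        'category'  => $data['applicationCategory'] ?? null, // an ID, e.g. "PHOTOGRAPHY"
        'summary'   => $data['description'] ?? null,
        'developer' => $data['author']['name'] ?? null,
        'rating'    => isset($data['aggregateRating']['ratingValue'])
                         ? (float) $data['aggregateRating']['ratingValue'] : null,
        'votes'     => isset($data['aggregateRating']['ratingCount'])
                         ? (int) $data['aggregateRating']['ratingCount'] : null,
        'price'     => $data['offers'][0]['price'] ?? null, // kept as string, e.g. "0"
    ];
}
```

Note that ratingValue and ratingCount arrive as strings in the JSON, hence the explicit casts.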

For the category ID you can use a preg_match regex without parsing the ld+json data,
e.g.:
$values["category"] = $this->getRegVal('/applicationCategory\"\:\"(?<content>[^"]+)\"/iu');
I think that's the better solution for fixing the category ID.
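A standalone version of that regex shortcut, as a sketch. It uses plain `preg_match()` because the class's `getRegVal()` helper isn't shown here, and it adds `\s*` around the colon (a deviation from the regex above) in case the JSON isn't minified. `categoryIdFromHtml()` is a hypothetical name.

```php
<?php
// Sketch: grab applicationCategory straight from the raw HTML with a regex,
// without decoding the ld+json block.
function categoryIdFromHtml(string $html): ?string
{
    if (preg_match('/"applicationCategory"\s*:\s*"(?<content>[^"]+)"/iu', $html, $m)) {
        return $m['content'];
    }
    return null; // no ld+json category found
}

$html = '<script type="application/ld+json">{"@context":"https://schema.org","@type":"SoftwareApplication","applicationCategory":"COMMUNICATION"}</script>';
echo categoryIdFromHtml($html), PHP_EOL; // COMMUNICATION
```

The trade-off: the regex is shorter, but decoding the JSON gives access to the other fields (author, ratings, price) in one go.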

IzzySoft commented on August 11, 2024

> ld+json offer this data only, without featureGraphic and video

Yes, that was the only thing I found, too. I was just hoping I had missed something…

andaroid commented on August 11, 2024

@IzzySoft So what are we going to do now?

IzzySoft commented on August 11, 2024

@andaroid If you can adjust the formatting to match the project's code, it seems fine to me. If I understood you correctly that the single line you just posted does the same as the JSON parsing (your lines 145-150), and we don't want to use other values from the JSON, feel free to rewrite it that way. On the other hand, we could consider taking the other values (author, ratings, price) from the JSON as well. In particular, it would be great to include

if ( empty($values["summary"]) ) $values["summary"] = $data["description"];

(which is currently part of the protobuf fallback).

So maybe the best idea is:

  • adjust the formatting
  • include the summary (as above)
  • add a code comment at what is currently line 148 of your diff, linking to your comment here, so we can decide on the other values later
  • move the entire block down to after $values["price"] (otherwise the default summary overwrites the one from the JSON)
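Why the ordering matters can be shown with a small sketch: the JSON only fills the summary when the HTML extraction left it empty, so it has to run after the og:description assignment. The type rule loosely mirrors the diff (GAME/GAME_ prefix gives game, FAMILY prefix gives family), simplified here to a plain prefix check instead of the diff's `FAMILY?` test; `applyLdJson()` is a hypothetical name, and unlike the diff it leaves category/type untouched when the JSON has no category.

```php
<?php
// Sketch: merge decoded ld+json data into $values after the HTML extraction.
function applyLdJson(array $values, array $data): array
{
    if (isset($data['applicationCategory'])) {
        $id = $data['applicationCategory'];
        $values['category'] = $id;
        if ($id === 'GAME' || strncmp($id, 'GAME_', 5) === 0) $values['type'] = 'game';
        elseif (strncmp($id, 'FAMILY', 6) === 0) $values['type'] = 'family';
        else $values['type'] = 'app';
    }
    // Only a fallback: never overwrite a summary the HTML already provided.
    if (empty($values['summary']) && !empty($data['description'])) {
        $values['summary'] = $data['description'];
    }
    return $values;
}
```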

All that pending on approval by @BaseMax 😉

IzzySoft commented on August 11, 2024

Basically, this is what I think:

@ google-play.php:143 @ class GooglePlay {
     }
 
     $values["developer"] = strip_tags($this->getRegVal('/href="\/store\/apps\/dev(eloper)*\?id=(?<id>[^\"]+)"([^\>]*|)>(\<span[^\>]*>)*(?<content>[^\<]+)(<\/span>|)<\/a>/i'));
 
-    preg_match('/<a class="WpHeLc VfPpkd-mRLv6 VfPpkd-RLmnJb" href="\/store\/apps\/category\/(?<id>[^\"]+)" aria-label="(?<content>[^\"]+)"/i', $this->input, $category);
-    if ( empty($category) ) preg_match('/href="\/store\/apps\/category\/(?<id>[^\"]+)" data-disable-idom="true" data-skip-focus-on-activate="false" jsshadow><span class="VfPpkd-N5Lhkf" jsname="bN97Pc"><span class="VfPpkd-jY41G-V67aGc" jsname="V67aGc">(?<content>[^\<]+)<\/span>/i', $this->input, $category);
-    if (isset($category["id"], $category["content"])) {
-      $values["category"] = trim(strip_tags($category["content"]));
-      $catId = trim(strip_tags($category["id"]));
-      if ($catId=='GAME' || substr($catId,0,5)=='GAME_') $values["type"] = "game";
-      elseif ($catId=='FAMILY' || substr($catId,0,7)=='FAMILY?') $values["type"] = "family";
-      else $values["type"] = "app";
-    } else {
-      $values["category"] = null;
-      $values["type"] = null;
-    }
 
     $values["summary"] = strip_tags($this->getRegVal('/property="og:description" content="(?<content>[^\"]+)/i'));
     $values["description"] = $this->getRegVal('/itemprop="description"[^\>]*><div class="bARER"[^\>]*>(?<content>.*?)<\/div><div class=/i');
     if ( strtolower(substr($lang,0,2)) != 'en' ) { // Google sometimes keeps the EN description additionally, so we need to filter it out **TODO:** check if this still applies (2022-05-27)
@ google-play.php:192 @ class GooglePlay {
     $values["votes"] = $this->getRegVal('/<div class="g1rdde">(?<content>[^>]+) reviews<\/div>/i');
     $values["price"] = $this->getRegVal('/<meta itemprop="price" content="(?<content>[^"]+)">/i');
 
+    $d = new DomDocument();
+    @$d->loadHTML($this->input);
+    $xp = new domxpath($d);
+    $jsonScripts = $xp->query( '//script[@type="application/ld+json"]' );
+    $json = trim( @$jsonScripts->item(0)->nodeValue ); //
+    $data = json_decode($json,true);
 
+    if(isset($data['applicationCategory'])) {
+      $values["category"] = $data['applicationCategory'];
+      if(substr($values["category"],0,5)=='GAME_') $values["type"] = "game";
+      elseif(substr($values["category"],0,7)=='FAMILY?') $values["type"] = "family";
+      else $values["type"] = "app";
+    } else {
+      $values["category"] = null;
+      $values["type"] = null;
+    }
+    if ( empty($values["summary"]) && !empty($data["description"]) ) $values["summary"] = $data["description"];
 
     $limit = 5; $proto = '';
     while ( empty($proto) && $limit > 0 ) { // sometimes protobuf is missing, but present again on subsequent call
       $proto = json_decode($this->getRegVal("/key: 'ds:4'. hash: '7'. data:(?<content>\[\[\[.+?). sideChannel: .*?\);<\/script/ims")); // ds:8 hash:22 would have reviews
@ google-play.php:221 @ class GooglePlay {
         if ( empty($values["video"]) && !empty($proto[1][2][100]) ) $values["video"] = $proto[1][2][100][0][0][3][2];
         if ( empty($values["summary"]) && !empty($proto[1][2][73]) ) $values["summary"] = $proto[1][2][73][0][1]; // 1, 2, 73, 0, 1
         // screenshots: 1,2,78,0,0-n; 1=format,2=[wid,hei],3.2=url
+        // category: $proto[1][2][79][0][0][0]; catId: $proto[1][2][79][0][0][2]
         // more details see: https://github.com/JoMingyu/google-play-scraper/blob/2caddd098b63736318a7725ff105907f397b9a48/google_play_scraper/constants/element.py
         break;
       }

The only drawback compared to the protobuf approach is that the category is then all-CAPS, as the JSON holds the category ID rather than its name. We could work around that by calling parseCategories() and mapping it accordingly, or simply leave that to the "user". Ouch, after fixing that method, that is…
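Mapping the all-caps ID back to a display name could look roughly like this. In practice `$names` would be built from parseCategories(); here it is a hard-coded excerpt. The title-case fallback for unknown IDs is an assumption of this sketch, not project behavior, and is only approximate ("ART_AND_DESIGN" would become "Art And Design", not "Art & Design").

```php
<?php
// Sketch: resolve a category ID to a name, with a rough fallback for unknown IDs.
function categoryName(string $id, array $names): string
{
    if (isset($names[$id])) {
        return $names[$id]; // exact name from the category list
    }
    // Approximation: "VIDEO_PLAYERS" -> "Video Players"
    return ucwords(strtolower(str_replace('_', ' ', $id)));
}

$names = ['ANDROID_WEAR' => 'Wear OS by Google', 'ART_AND_DESIGN' => 'Art & Design'];
echo categoryName('ART_AND_DESIGN', $names), PHP_EOL; // Art & Design
echo categoryName('COMMUNICATION', $names), PHP_EOL;  // Communication
```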

IzzySoft commented on August 11, 2024

OK, I've fixed parseCategories(): it now uses a local list of all categories (categories.jsonl, in JSONL format for easy maintenance), as I couldn't find them listed in the original place anymore. Whoever wants the category names instead of the IDs can now obtain the list and map them as needed. The type is defined there as well:

Array
(
    [success] => 1
    [message] => 
    [data] => Array
        (
            [ANDROID_WEAR] => stdClass Object
                (
                    [id] => ANDROID_WEAR
                    [name] => Wear OS by Google
                    [type] => app
                )

            [ART_AND_DESIGN] => stdClass Object
                (
                    [id] => ART_AND_DESIGN
                    [name] => Art & Design
                    [type] => app
                )
 ...
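For readers curious how a JSONL list can be turned into the keyed structure printed above, here is a rough sketch. It assumes each line of categories.jsonl is one JSON object with `id`, `name` and `type`, matching the fields in the output; the actual file and parseCategories() internals may differ, and `loadCategoriesJsonl()` is a hypothetical name. In real use the string would come from `file_get_contents()`.

```php
<?php
// Sketch: parse JSONL (one JSON object per line) into an array keyed by category ID.
function loadCategoriesJsonl(string $jsonl): array
{
    $cats = [];
    foreach (preg_split('/\R/', $jsonl) as $line) {
        $line = trim($line);
        if ($line === '') continue;        // skip blank lines
        $obj = json_decode($line);         // stdClass, as in the print_r output
        if ($obj instanceof stdClass && isset($obj->id)) {
            $cats[$obj->id] = $obj;
        }
    }
    return $cats;
}

$jsonl = <<<'JSONL'
{"id":"ANDROID_WEAR","name":"Wear OS by Google","type":"app"}
{"id":"ART_AND_DESIGN","name":"Art & Design","type":"app"}
JSONL;
$cats = loadCategoriesJsonl($jsonl);
echo $cats['ART_AND_DESIGN']->name, PHP_EOL; // Art & Design
```

One object per line keeps diffs small when a category is added or renamed, which is presumably the "easy maintenance" advantage of JSONL here.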

IzzySoft commented on August 11, 2024

@andaroid So, will you make the adjustments mentioned above?

IzzySoft commented on August 11, 2024

Thanks once again for pointing out the path, @andaroid! As there was no response from @BaseMax and you didn't do the reorg, I've just pushed it myself. Hope you don't mind; attribution given with the commit 😉

@BaseMax I've also bumped the version number in the header. As the structure returned by parseCategories() differs from what it returned before (it's now an array of objects with category details instead of a simple array of category IDs), I've increased the "minor" version (1.0.1 => 1.1.0). The method also no longer needs network traffic (it uses local definitions, as the other list was no longer available) and is thus much faster 😄 For what it returns, see two comments above.

Issue should be solved now, so I'm closing it.