Comments (8)

wchao commented on June 17, 2024

Hi Michael,

Thanks for the praise. I built the code because I get all my food via delivery (no car, and I live in the suburbs), and it was a giant hassle that I was having to stay up late at night to snatch a time slot (mostly unsuccessfully!). I have a toddler, so staying up until 2 AM is not good because the toddler wakes up at 6:30 AM and expects attention....

Yes, I have seen that error before. When I first hit it, some Googling led me to search results implying that the error comes from Cloudflare's DDoS protection. In other words, Cloudflare thinks you are running a bot. Now, technically, I was running a bot, but not for the purpose of DDoS, so I thought that with some tweaking I might be able to overcome the problem. The short story is that yes, I got it working, and it works pretty well for me now; there are a few minor issues I'd like to fix, but it is basically very usable.

What worked for me was to use my real user profile, either by pointing the Selenium Chrome webdriver to my user data directory or by making a copy of the entire user profile directory for Chrome. In my case, I copied "C:\Users\wchao\AppData\Local\Google\Chrome\User Data" to "C:\Users\wchao\Documents\Devel\Google Chrome\User Data" and then pointed my program to the copy by setting user_data_dir to "C:\Users\wchao\Documents\Devel\Google Chrome\User Data".
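To make the "copy the entire user profile directory" option concrete, here is a minimal sketch using only the standard library. The function name copy_user_data_dir is hypothetical, and the ignored file names are my assumption about the lock files Chrome leaves behind while it is running; this is not code from freshdirect_slot_chrome.py.

```python
import shutil
from pathlib import Path


def copy_user_data_dir(source: Path, destination: Path) -> Path:
    """Copy a Chrome "User Data" directory so Selenium can be pointed at the copy."""
    if not source.is_dir():
        raise FileNotFoundError(f"No user data directory at {source}")
    shutil.copytree(
        source,
        destination,
        # Allows refreshing an existing copy on a later run (Python 3.8+).
        dirs_exist_ok=True,
        # Assumed names of Chrome's lock files; skip them so the copy starts clean.
        ignore=shutil.ignore_patterns("Singleton*", "lockfile"),
    )
    return destination
```

Close Chrome before copying, then set user_data_dir to the returned path.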

For you, I recommend the following steps to figure out what the best solution is, since you are on Mac:

  1. First try it just with the default user profile (not a copy). I believe that on a Mac, this is located at Users/<username>/Library/Application Support/Google/Chrome/Default. Can you confirm on your Mac?

  2. Make sure Google Chrome is not configured to run in the background and that you have shut down or terminated all running instances of Google Chrome. I found that if Chrome was running elsewhere, there were warning messages emitted by the Selenium Chrome webdriver. I wasn't sure if those warnings resulted in improper functioning, but since we are trying to isolate the problem, better to eliminate potential problems at least until we get it working for you. It would be nice to be able to interact cleanly with Chrome even if another instance is running, but I need to do further testing and research to figure out how to achieve that with the Selenium Chrome webdriver.

  3. Run the program and see if it works at this point. If not, stop the program, make sure to kill all Chrome processes and also all chromedriver processes (the chromedriver is started up by the freshdirect_slot_chrome.py program when it instantiates a Selenium Chrome webdriver instance).

  4. If the program is still giving you the error from FreshDirect and still not signing in automatically, then the next thing to try is to sign in manually. To do that, add the following right after the line that reads "if not raw_div_list:" (the if statement is right after raw_div_list = soup.find_all('div', {'id': re.compile('ts_d\d+_ts\d+_time'), 'class': 'tsCont'})):

      time.sleep(60)
      continue

Thus it should then read:

    raw_div_list = soup.find_all('div', {'id': re.compile('ts_d\d+_ts\d+_time'), 'class': 'tsCont'})
    if not raw_div_list:
      time.sleep(60)
      continue

Because this is Python, make sure the indentation is correct: the time.sleep(60) and continue lines should each begin with 6 spaces, one level deeper than the if line. Then run the program. It will pause for 60 seconds at a time whenever it needs a sign-in, which gives you a chance to sign in manually. Even if that works, it won't completely solve the problem, because obviously you'd like to be able to step away from your computer rather than be tethered there for the occasional sign-in. I find that the program occasionally needs to sign in again because FreshDirect reboots their web servers and state is lost, or for some other reason.
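For steps 2 and 3 above, killing the leftover Chrome and chromedriver processes can be scripted. This is only a sketch: the helper names are hypothetical, and the process names passed to taskkill and pkill are my assumptions about how Chrome shows up on each platform. Note that it terminates every Chrome window, not just the Selenium-controlled one.

```python
import subprocess
import sys
from typing import List


def kill_commands(platform: str = sys.platform) -> List[List[str]]:
    """Return the commands that would terminate Chrome and chromedriver."""
    if platform.startswith("win"):
        return [
            ["taskkill", "/F", "/T", "/IM", "chrome.exe"],
            ["taskkill", "/F", "/T", "/IM", "chromedriver.exe"],
        ]
    # Assumed process names: "Google Chrome" on macOS, "chrome" on Linux.
    chrome_name = "Google Chrome" if platform == "darwin" else "chrome"
    return [
        ["pkill", "-f", chrome_name],
        ["pkill", "-f", "chromedriver"],
    ]


def kill_browser_processes(platform: str = sys.platform) -> None:
    """Run the kill commands; call this only after stopping the script."""
    for command in kill_commands(platform):
        # check=False: a non-zero exit usually just means nothing was running.
        subprocess.run(command, check=False)
```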

Can you let me know if that works OK for you? If it still does not work, maybe we could schedule a time to do a screen share where I can take a look at what is happening? When the Selenium Chrome webdriver is not pointed at a user data directory, it instantiates an anonymous profile, and I think Cloudflare is able to detect from certain fingerprints that the instance of Chrome is automated. When you use a user data directory that has your real data in it, I think Cloudflare believes the Chrome instance is real and not controlled by a bot.
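As a rough illustration of pointing the Selenium Chrome webdriver at a user data directory, here is a sketch. The helper names are hypothetical and this is not necessarily how freshdirect_slot_chrome.py wires up user_data_dir; it only shows Chrome's --user-data-dir and --profile-directory switches being passed through ChromeOptions.

```python
from typing import List, Optional


def chrome_arguments(user_data_dir: str, profile_directory: Optional[str] = None) -> List[str]:
    """Build the Chrome command-line switches that select a real profile."""
    arguments = [f"--user-data-dir={user_data_dir}"]
    if profile_directory:
        # e.g. "Default"; omit it to let Chrome pick its last-used profile.
        arguments.append(f"--profile-directory={profile_directory}")
    return arguments


def make_driver(user_data_dir: str, profile_directory: Optional[str] = None):
    """Start Chrome through Selenium using the given user data directory."""
    # Imported here so chrome_arguments() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments(user_data_dir, profile_directory):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

Without the --user-data-dir switch, Chrome starts with the fresh temporary profile described above.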

I would at some point like to build a Chrome extension, which ought to eliminate the issues (in addition to making it way easier to install, configure, and run!).

Let me know how things work above, and then I'm happy to help as needed (could do a screen share with TeamViewer or Zoom or Skype, for instance). If you are a Python or Javascript developer, would also love to get any code contributions you might feel inclined to offer. Or, if not a developer, also happy to get feature requests and suggestions. I would love to see this get more use. I think the project will probably evolve. Eventually some of the grocery delivery companies will implement queues on their web sites where you can sign up for first available time slot rather than the less-than-ideal situation now where they have a schedule 7 days out and that's it, and the slots get grabbed minutes after they open up. At that point, the current code will be less useful, but there are probably still other things that people will find useful (e.g. super shopper that scans multiple grocery delivery sites and helps you get exactly the items you ordered).

from grocery-delivery.

wchao commented on June 17, 2024

Also, if you need an SMTP server with authentication, let me know, and I'm happy to create some login credentials. I need to write more code to allow the general public to use my mail server for alerts (to avoid abuse), but on a one off basis for someone who is real (i.e. not a bot or a spammer or someone with malicious intent), happy to provide that if you don't already have an SMTP server.

mhendri commented on June 17, 2024

Thanks for your quick response! Necessity is the mother of invention, so no wonder you spent time building this workflow. 😄

I previously tried the steps you outlined above (1-3) with no success, and tried it again after your comments.

  1. I created a new Chrome profile and re-added my FD credentials
  2. I also tried moving the directory and referencing that new directory in the user_data_dir field.

Sadly, neither of those worked. When I try to sleep the process to manually log in, Chrome never autofills my credentials even though it sees my user profile. When I manually log in, I get the same access denied message, maybe because the request appears "fishy." I'd be down to do a screen share to continue troubleshooting. My email is: [email protected]

I develop in Python and Javascript and would be happy to help contribute to this! Once it's working on my Mac, I can submit a PR with any changes that allow it to work/add error handling for issues like mine. It's great that you want to extend this and make it into a Chrome extension, I think that'd be really neat.

I'll let you know about SMTP credentials once I can get it to run and "see" delivery windows. Thanks very much for the offer!

wchao commented on June 17, 2024

Just to confirm, on the same Mac, if you open up Chrome, not via freshdirect_slot_chrome.py, but rather just by starting the Chrome application, you are able to see FreshDirect and log in?

The symptoms you describe suggest that the user profile isn't being set properly, because I think if it were, the username and password should autofill. I think the access denied message even when you manually log in is caused by your user profile not getting retrieved or linked up properly in the Selenium Chrome instance.

One more thing to check: for user_data_dir, are you specifying "Users/<username>/Library/Application Support/Google/Chrome/Default" or "Users/<username>/Library/Application Support/Google/Chrome"? I'm sorry that in my previous response, I think I gave you the wrong path. "Users/<username>/Library/Application Support/Google/Chrome/Default" is the path to your default user profile, but the program needs the immediate parent directory, i.e. "Users/<username>/Library/Application Support/Google/Chrome". If you change that, does it make a difference in how the program functions?
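The profile-versus-parent mix-up is easy to guard against. Here is a small sketch, with a hypothetical helper name and the assumption that Chrome names its profile folders "Default", "Profile 1", "Profile 2", and so on:

```python
from pathlib import PurePosixPath


def user_data_dir_for(path: str) -> str:
    """Return the user data directory, even if given a profile folder inside it."""
    candidate = PurePosixPath(path)
    # Assumed Chrome profile folder names: "Default" or "Profile <n>".
    if candidate.name == "Default" or candidate.name.startswith("Profile "):
        return str(candidate.parent)
    return str(candidate)
```

Passing either ".../Google/Chrome/Default" or ".../Google/Chrome" then yields the parent directory that user_data_dir expects.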

I am on Eastern Time. Would you be able to get on a screen share around 8:30, 9, 9:30 PM Eastern Time?

mhendri commented on June 17, 2024

That's correct; when I start a "normal" Chrome session I am able to log in to FreshDirect with no trouble.

I'm specifying Users/<username>/Library/Application Support/Google/Chrome for user_data_dir. I agree, I don't think the Selenium session is seeing my Chrome profile correctly, even though my user avatar appears. When I change the user_data_dir to a wrong path my avatar disappears, so it recognizes the difference on some level, but not enough to autofill the log in.

I'm also on eastern time - unfortunately I can't chat tonight, but perhaps some time tomorrow? My work day is flexible if you'd like to chat in the morning, afternoon, or early evening. Please feel free to email at the address above. Thank you again for your time, help, and effort!

mhendri commented on June 17, 2024

Hey @wchao, I had two friends test this out on their Macs and they had similar issues. There is something odd about the way macOS manages Google profiles, perhaps something with the keychain. Even using Selenium to send the login credentials results in an access denied page.

wchao commented on June 17, 2024

Interesting. The possibility of some kind of privilege elevation being needed on Mac did occur to me. I am not really a Mac user, and have only used one maybe a few dozen times over the past couple of decades, so I'm not that familiar with how it operates, but what you say makes some sense to me, and sounds similar to AppArmor or SELinux on Linux. Is there a way to run Chrome (well, I suppose in this case, the Python script that invokes Chrome) with elevated privileges to see if that helps with this particular issue? Obviously that is a risk that would not generally be advisable, but in this case the source code is readily available to see what is being done, and it's primarily to help debug the issue. I'm not familiar enough with the keychain to know what operations are permitted and how the Chrome user data profile interacts with the keychain.

One possibility I looked into, because it also impacts Windows and probably Linux as well, was to figure out how Cloudflare and Akamai detect bots. I think it would be much better to run the code without requiring the user's real profile (more secure), but instead to use the default, anonymized profile created by Selenium and the Chrome WebDriver. The following page was somewhat useful:
https://stackoverflow.com/questions/54432980/how-to-access-a-site-via-a-headless-driver-without-being-denied-permission

However, it seems like a lot of work to handle that. Another approach is to use requests instead of Selenium:
https://stackoverflow.com/questions/44865673/access-denied-while-scraping

That has significant advantages, if it can be made to work, in that it will work on any computer because it doesn't even require a browser. On the other hand, it has certain disadvantages, notably that getting it to work may take longer than a Selenium-based approach because you can't really see what is going on. Of course, I have already done the pattern recognition and workflow bits of the project, so that's work that is already done, and the thing that would need to happen is to deal with Javascript issues that might result from not having a Javascript interpreter in the requests library. I think that the best way to make this widely usable is to write a Chrome extension (I think Javascript is needed for that) because then (a) the installation is super easy, and (b) you probably eliminate any issues with the access denied error because it's the full and real user profile of the Chrome user. I'm hoping to start looking at that in the next week or two, but I'm coming at it completely new since I haven't ever written a Chrome extension, though I've done some Javascript programming -- maybe I can get some pointers or help from you on the Chrome extension effort?
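To give a feel for the browserless approach, here is a sketch. It swaps in the standard library's urllib for the requests package so that it runs without extra installs, and all function names are hypothetical. The marker strings come from the error text quoted in this thread; the header values are illustrative assumptions, and nothing here is known to get past the bot detection.

```python
import html
import urllib.error
import urllib.request

# Phrases from the error page quoted in this thread.
ACCESS_DENIED_MARKERS = ("Access Denied", "You don't have permission to access")


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request carrying browser-like headers."""
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html",
        "Accept-Language": "en-US,en;q=0.9",
    }
    return urllib.request.Request(url, headers=headers)


def fetch(url: str, user_agent: str, timeout: float = 30.0) -> str:
    """Return the page body, including the body of an HTTP error page."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as error:
        # An HTTP error status raises here; its body is still readable.
        body = error.read()
    return body.decode("utf-8", errors="replace")


def looks_access_denied(page_text: str) -> bool:
    """Heuristic: does the page contain both access-denied phrases?"""
    text = html.unescape(page_text)
    return all(marker in text for marker in ACCESS_DENIED_MARKERS)
```

Calling looks_access_denied(fetch(url, user_agent)) at least shows quickly whether a given set of headers is being rejected.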

For the access denied problem overall, this page is moderately useful:
https://www.reddit.com/r/techsupport/comments/ap9wrk/definitive_research_on_the_access_denied_you_dont/

I did a Google search on "Access Denied You don't have permission to access on this server. reference # selenium chrome webdriver" and some of those links helped.

I'm a little tied up today, so I can't review it with you today, but take a look at the links I've included and see if those help, and we can connect up later this week.

One anecdote as to how crazy it continues to be. My script alerted me as follows:

FreshDirect delivery time slots opened up at 2020-04-27 12:23:42:
May 1 6 am - 9 am
May 1 2 pm - 5 pm
May 1 5 pm - 8 pm
May 1 8 pm - 10 pm

I went online as soon as I received the email (probably 15 seconds after getting it), and two of the four time slots had already been taken. I managed to snag a May 1 5 PM - 8 PM slot, but only because I already had my cart loaded and ready to go from yesterday and the day before, and all I had to do was click check out and pay (about 30 seconds). If I had to shop and fill the cart, no way I would have gotten a slot. Competition is so intense to get a slot.
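For anyone curious what producing an alert like the one above might look like, here is a sketch using the standard library's email and smtplib modules. The function names, the subject line, and the use of STARTTLS with a login are assumptions for illustration; this is not the code in freshdirect_slot_chrome.py.

```python
from datetime import datetime
from email.message import EmailMessage
from typing import List


def format_alert(opened_at: datetime, slots: List[str]) -> str:
    """Render the alert body in the same layout as the example above."""
    header = f"FreshDirect delivery time slots opened up at {opened_at:%Y-%m-%d %H:%M:%S}:"
    return "\n".join([header, *slots])


def build_alert_email(sender: str, recipient: str, opened_at: datetime, slots: List[str]) -> EmailMessage:
    """Wrap the alert text in an email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "FreshDirect delivery time slots opened up"  # assumed subject
    message.set_content(format_alert(opened_at, slots))
    return message


def send_alert(message: EmailMessage, host: str, port: int, username: str, password: str) -> None:
    """Send through an SMTP server that requires authentication."""
    import smtplib

    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()  # assumes the server supports STARTTLS
        smtp.login(username, password)
        smtp.send_message(message)
```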

mhendri commented on June 17, 2024

I created a Windows virtual machine and was able to get it running there. Every once in a while, though, I still get the Access Denied page. It's been hard to track down why it happens to begin with, but maybe it's due to some network latency and it tries to access a page that's behind the login. I can get around this; it requires that I clear the cache in the Selenium instance, quit that instance, load FreshDirect and log in through a normal browser session, then restart the process.

I played around with page sequence, login sequence, cache clearing and all that quite a bit, but I haven't found a method that works 100% of the time. For example, I found that if I got an access denied response when going to the home page I could:

  1. clear the cache
  2. load FreshDirect.com
  3. click on 'Hi'
  4. be brought to the login page and log in successfully

However, sometimes that button doesn't work when using Selenium and sometimes it is a pop-up instead of a redirect.

Headless browsing or a requests method would be really good. Before I came across your repo I spent about an hour trying to get requests to work and was mostly foiled.

The short of it though is that I've been able to get it working and was able to get 2 time slots! One happened to be because I was on the site so much and saw one open up, the other was thanks to your tool. So a very sincere thank you!

P.S. One thing I noticed is that if a time slot shows available but cannot be clicked on the website, it can usually be selected in the app.
