automate-save-page-as's Introduction

automate-save-page-as

A quick hack for when wget doesn't cut it.

tl;dr Perform browser's "Save page as" (Ctrl+S) operation from command line without manual intervention

This small bash script simulates a sequence of key presses that opens a given URL in the browser, saves the page (Ctrl+S), and closes the browser tab/window (Ctrl+F4). Chained together, these operations let you use "Save Page As" (Ctrl+S) programmatically. Currently you can use google-chrome, chromium-browser, or firefox, and it's fairly straightforward to add support for your favorite browser.

Examples:

# Save your FB home page
$ ./save_page_as "www.facebook.com" --destination "/tmp/facebook_home_page.html"
# Use Firefox to open a web-page and save it in /tmp (the default name for the file (Page title) is used)
$ ./save_page_as "www.example.com" --browser "firefox" --destination "/tmp"
# Save a url with default name, but provide an additional suffix
$ ./save_page_as "www.example.com" --destination "/tmp" --suffix "-trial_save"
# List all available command line options.
$ ./save_page_as --help

save_page_as: Open the given url in a browser tab/window, perform 'Save As' operation and close the tab/window.

USAGE:
   save_page_as URL [OPTIONS]

URL                      The url of the web page to be saved.

options:
  -d, --destination      Destination path. If a directory, then file is saved with default name inside the directory, else assumed to be full path of target file. Default = '.'
  -s, --suffix           An optional suffix string for the target file name (ignored if --destination arg is a full path)
  -b, --browser          Browser executable to be used (must be one of 'google-chrome' or 'firefox'). Default = 'google-chrome'.
  --load-wait-time       Number of seconds to wait for the page to be loaded (i.e., seconds to sleep before Ctrl+S is 'pressed'). Default = 4
  --save-wait-time       Number of seconds to wait for the page to be saved (i.e., seconds to sleep before Ctrl+F4 is 'pressed'). Default = 8
  -h, --help             Display this help message and exit.

The script needs xdotool installed (http://www.semicomplete.com/projects/xdotool/): sudo apt-get install xdotool (for Ubuntu).
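
To give a rough feel for what the script drives through xdotool, the key-press sequence described above could be sketched as follows. This is an illustration only, not the actual save_page_as source: the function name `save_sequence`, the `XDOTOOL` override variable, and the one-second pause before typing are assumptions, and the sketch assumes the browser window already has keyboard focus.

```shell
#!/usr/bin/env bash
# Illustrative sketch only, not the actual save_page_as source.
# XDOTOOL is a hypothetical override so the sequence can be dry-run
# (for example XDOTOOL=echo) without sending real key presses.
XDOTOOL="${XDOTOOL:-xdotool}"

save_sequence() {
    local target="$1" load_wait="${2:-4}" save_wait="${3:-8}"

    sleep "$load_wait"                    # wait for the page to load
    "$XDOTOOL" key ctrl+s                 # open the "Save as" dialog
    sleep 1                               # assumed pause while the dialog appears
    "$XDOTOOL" type --delay 20 "$target"  # type the destination path
    "$XDOTOOL" key Return                 # confirm the dialog
    sleep "$save_wait"                    # wait for the save to finish
    "$XDOTOOL" key ctrl+F4                # close the tab/window
}
```

Running `XDOTOOL=echo` before calling `save_sequence /tmp/page.html 0 0` prints the xdotool arguments instead of pressing keys, which is a convenient way to inspect the order of operations.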

Sidenote: My particular use case while writing this script was crawling a bunch of web pages that were rendered almost entirely on the client side using lots of JavaScript magic (so saving the output of wget url was useless). Since the browser can render those pages and also save the post-render version to disk (using Ctrl+S), I wrote this script to automate the process. (I tested it on Ubuntu 12.04 and 14.04 myself.)

Suggestions and/or pull requests are always welcome!

automate-save-page-as's People

Contributors

abiyani, jmn

automate-save-page-as's Issues

KDE + Chrome on ubuntu 14.04

When Chrome is used on KDE, the 'save as' popup is a different file-selection widget and, unlike the GNOME/GTK one, doesn't put default focus on the 'filename' field.

Any pointers to make this tool work in a KDE environment are appreciated.

zap Download list left at bottom of screen

In an already running browser, after the newly created window is closed, a Downloads list is still left across the bottom of the screen.
So (optionally) zapping that is also needed, for a perfect script.
The "X" on its right side needs to be clicked (in chromium at least).

The script does not work with Chrome under Ubuntu (Unity)

When running the script under Ubuntu 16.04 with Unity, the Chrome browser opens with the URL and the save dialog box appears. But then the "Type your command" popup appears, as if the ALT key were pressed. So the full path gets typed into the command box instead of the save dialog, and the dialog does not close. This is caused by the fix made for issue #1.
Once the above issue is fixed, after the page is saved and the tab close (Ctrl+F4) is invoked, the system switches to a text console as if Ctrl+Alt+F4 were pressed.

I have fixed both the issues but tested only in Ubuntu 16.04. Will send the PR.

Firefox window does not close automatically in KUbuntu 18

//Close the browser tab/window (Ctrl+w for KDE, Ctrl+F4 otherwise)

This is on the KDE desktop, but Firefox never closes by itself at the end of the operation.

Other small issues.

I noticed that the script does not work for a freshly installed Firefox - only the first time.

I noticed also, that the script did not work today only on the first execution of this test run:
./save_page_as -b firefox -d tmp "https://www.bing.com"

No error if browser isn't installed

Thanks for this! Real handy tool.

It's defaulting to google-chrome, but if google-chrome isn't installed it just runs indefinitely.

Building an error message off the exit status of "which $browser" might be an easy fix.
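
A minimal sketch of that idea is shown below. It uses the shell builtin `command -v` instead of `which` (a deliberate swap, since `command -v` is specified by POSIX), and the function name `require_browser` is hypothetical rather than something taken from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail fast when the browser executable is missing,
# instead of waiting forever for a window that will never appear.
require_browser() {
    local browser="$1"
    if ! command -v "$browser" >/dev/null 2>&1; then
        echo "ERROR: browser '$browser' not found in PATH." >&2
        return 1
    fi
}
```

Calling `require_browser "$browser" || exit 1` before launching anything would turn the indefinite hang into an immediate, readable error.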

issue non-printable ascii character(s)

Hi! Downloaded latest save_as... with xdotool-2.20110530.1 and my charset is UTF-8. Great tool but my first output of it is as follows:

save_page_as "www.facebook.com" --destination "/tmp/facebook_home_page.html"
ERROR: Either -- destination ('/tmp/facebook_home_page.html') or --suffix ('') contains a non ascii or non-printable ascii character(s). 'xdotool' does not mingle well with non-ascii characters (https://code.google.com/p/semicomplete/issues/detail?id=14).

!!!! Will NOT proceed !!!!

is it a bug or something with my .bashrc ? Thanks
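
One way to check independently whether an argument really contains non-ASCII or non-printable characters is sketched below. This is not the script's own check, which may be implemented differently; the function name `is_printable_ascii` is hypothetical. It deletes every printable ASCII byte (space through tilde) under the C locale and counts what is left.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the argument consists solely of
# printable ASCII bytes (space through tilde in the C locale).
is_printable_ascii() {
    local leftover
    leftover="$(printf '%s' "$1" | LC_ALL=C tr -d ' -~' | wc -c)"
    [ "$leftover" -eq 0 ]
}
```

If `is_printable_ascii "/tmp/facebook_home_page.html"` succeeds in the same shell, the arguments themselves are clean, which would point at the check or the environment rather than at the path.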

sync flag did not work

what is that sync option here
browser_wid="$(xdotool search --sync --onlyvisible --class "${browser}" | head -n 1)"?
it just never worked with that option. Is it important? Please explain.

thanks for the nice tool.
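
For context: in xdotool, `search --sync` blocks until at least one matching window exists, which lets a script wait for a freshly launched browser window instead of racing it. If that call never returns on a given setup, a bounded polling loop is one possible substitute. The sketch below is an assumption-laden alternative, not the script's code; `wait_for_window` and the `XDOTOOL` override are hypothetical names.

```shell
#!/usr/bin/env bash
# XDOTOOL is a hypothetical override, handy for testing with a stub.
XDOTOOL="${XDOTOOL:-xdotool}"

# Poll for a visible window of the given class, one second apart, and give
# up after max_tries attempts. Prints the first window id on success.
wait_for_window() {
    local class="$1" max_tries="${2:-10}" wid="" i=0
    while [ "$i" -lt "$max_tries" ]; do
        wid="$("$XDOTOOL" search --onlyvisible --class "$class" 2>/dev/null | head -n 1)" || true
        if [ -n "$wid" ]; then
            echo "$wid"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "ERROR: no visible '$class' window after $max_tries tries." >&2
    return 1
}
```

Unlike `--sync`, this version cannot hang forever: it either prints a window id or fails with an error after the given number of attempts.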

Mention where one got this file from

After downloading the script and finding it works great,
one wants to contact the authors and congratulate them etc.

However unless one remembers where one downloaded it from,
or searches for patterns on Google, all knowledge of where the script came from is lost.

I would put a

  • bug report / repository URL
  • copyright date
  • version #

in the --help message.

Firefox Branch Capability

First and foremost, thank you for writing this program. It saved my bacon when I needed to save about 250 pages, and I can't thank you enough.

Secondly, I use the Firefox developer branch, and the program didn't want to use it. It's an easy fix, as I just linked firefox-developer to firefox and noticed no issues. Personally, I'd like to see this fixed from inside the program, as that is more elegant.

PS: Firefox branches: firefox-developer, firefox-beta, firefox-nightly (and the deprecated firefox-aurora).
PPS: I'm in the middle of my exams right now and don't have the time to fix it, but if there's no one else, I definitely will.
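
As a rough sketch of what an in-program fix might look like, browser validation could accept any `firefox-*` executable name. How the script actually validates `--browser` is not shown here, so `is_supported_browser` is a hypothetical name, and the window class that xdotool searches for may also need adjusting for these builds.

```shell
#!/usr/bin/env bash
# Hypothetical validation that also accepts Firefox release channels such as
# firefox-developer, firefox-beta and firefox-nightly.
is_supported_browser() {
    case "$1" in
        google-chrome|chromium-browser) return 0 ;;
        firefox|firefox-*)              return 0 ;;
        *)                              return 1 ;;
    esac
}
```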

This tool downloads only the top of the page content

Hi, does this program not work with modern pages whose content is loaded gradually while scrolling? I would like to use this tool for a page of this type. The page loads only what is visible, and scrolling down triggers AJAX requests that generate additional parts of the source. So right after loading, the page has only 40 lines, and that is all this tool sees. After scrolling all the way down, the page really has, for example, 10762 lines. How can I get the whole page of 10762 lines?

Thank you all for the tips.
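
One possible approach, sketched under assumptions, is to press End repeatedly after the page loads and before Ctrl+S is sent, so the page has a chance to fetch more content. The script does not provide this; `scroll_to_bottom` and the `XDOTOOL` override are hypothetical names, the press count and pause are guesses that depend on the page, and the function would have to be hooked into the script by hand.

```shell
#!/usr/bin/env bash
# XDOTOOL is a hypothetical override, handy for dry runs (XDOTOOL=echo).
XDOTOOL="${XDOTOOL:-xdotool}"

# Press End a fixed number of times, pausing between presses so that
# lazily loaded content has time to arrive before the page is saved.
scroll_to_bottom() {
    local presses="${1:-20}" pause="${2:-1}" i=0
    while [ "$i" -lt "$presses" ]; do
        "$XDOTOOL" key End
        sleep "$pause"
        i=$((i + 1))
    done
}
```

A fixed number of presses cannot guarantee that an endlessly loading page is complete, so the saved file should still be checked for the expected content.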

Firefox sometimes appears dysfunctional during the first run

KUbuntu 18.

During the first run in a session, it appears as though the Firefox browser is first warming up. It registers the "index.html" as the name of the page to be saved, which is wrong. I click on cancel and close the browser myself. Later on, it works well automatically.

problem on Debian 9 stretch

Hello,
Thanks for the script :)
I use Debian 9 stretch, and I have installed xdotool.
Whether with firefox or chrome, the page opens, but there are no catches.
In logs, I have this:

Aug 8 09:19:21 xxxxxxxxxxxxxxx kernel: [ 1258.634385] xdotool[1922]: segfault at 20 ip 00007face0fcb346 sp 00007ffd3d6ac490 error 4 in libxdo.so.3[7face0fc4000+b000]
Aug 8 09:25:08 xxxxxxxxxxxxxxxxxxxx kernel: [ 1605.317681] xdotool[2194]: segfault at 20 ip 00007f3cc379f346 sp 00007ffdbb266c20 error 4 in libxdo.so.3[7f3cc3798000+b000]
Aug 8 09:26:28 xxxxxxxxxxxxxxxxxx kernel: [ 1684.747747] xdotool[2323]: segfault at 20 ip 00007f1acba36346 sp 00007ffe012508c0 error 4 in libxdo.so.3[7f1acba2f000+b000]

Do you have any idea of the problem?

thank you

Cannot save page when computer is locked

The script is working perfectly fine (Opens FF, Saves, Closes FF) when I run it manually on terminal but I need it to run without human intervention, so I set a cron job to save the page everyday.

Unfortunately, when the computer is locked, the script only opens the page in FF and does not save or close it. I'm not sure if it's an Ubuntu problem or a script problem (perhaps the keyboard shortcuts don't work when the computer is not logged in?), but I just wanted to check whether anybody else has encountered this and has a workaround.

Oh and I'm running Ubuntu 14.04LTS / Firefox. Thanks!

Char % added to destination folder

I'm running the script like this:

./save_page_as -d /tmp -b chromium-browser --load-wait-time 1.5 --save-wait-time 3 "$URL"

Sometimes, while the script is saving the file, it shows &tmp in the save dialog. Sometimes it works correctly. I'm also using another script to call save_page_as with several different URLs.

What could be the problem?

Workaround for file overwrite dialog

There is a problem when the destination file already exists.
To avoid handling a popup dialog, I suggest downloading to a temporary name first, then moving the file to the destination with mv.
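
A minimal wrapper along those lines might look like the sketch below. It is not part of the script: `save_replacing` and the `SAVE_CMD` override are hypothetical names, the destination is assumed to be a full path ending in .html, and any companion "_files" directory the browser creates next to the page is not handled.

```shell
#!/usr/bin/env bash
# SAVE_CMD is a hypothetical override for the path to save_page_as.
SAVE_CMD="${SAVE_CMD:-./save_page_as}"

# Save to a fresh temporary name (so the browser never asks about
# overwriting), then move the result over the real destination.
save_replacing() {
    local url="$1" dest="$2" tmp
    tmp="${dest%.html}.tmp.$$.html"   # unused name next to the destination
    rm -f "$tmp"
    "$SAVE_CMD" "$url" --destination "$tmp" || return 1
    if [ ! -f "$tmp" ]; then
        echo "ERROR: nothing was saved to '$tmp'." >&2
        return 1
    fi
    mv -f "$tmp" "$dest"
}
```

Because the temporary file lives in the same directory as the destination, the final `mv` is a rename on one filesystem rather than a copy.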

Error: Can't open display: (null)

When I start the script, I get these error messages:

Error: Can't open display: (null)
Failed creating new xdo instance

I am running it with Firefox on Ubuntu 14.04.
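
That message generally means the DISPLAY environment variable is unset in the shell that runs xdotool, which is common over SSH, from cron, or on a text console. A small guard, sketched below with the hypothetical name `check_display`, would make the cause explicit before any xdotool call is attempted.

```shell
#!/usr/bin/env bash
# Hypothetical guard: xdotool finds the X server through DISPLAY, so an
# unset or empty value is reported up front.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "ERROR: DISPLAY is not set; xdotool cannot reach an X server." >&2
        return 1
    fi
}
```

When the script is started from outside the desktop session, exporting DISPLAY first (often `:0`, though that value is an assumption about the local setup) may be enough; some systems also need XAUTHORITY to point at the session's authority file.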
