amalrajan / learncpp-download

An advanced web scraper tool that seamlessly fetches and combines over 350 online tutorials into a convenient offline PDF format.

License: MIT License

Python 96.81% Dockerfile 3.19%
learncpp crawler web-crawler offline learncpp-offline wwwlearncppcom learncpp-download learncpp-tutorials learncpp-content learncpp-python

learncpp-download's People

Contributors

amalrajan, cuppajoeman, dependabot[bot], mushi0, p0358, pentabarf, vlada1001

learncpp-download's Issues

Syntax highlighting is removed

On the website, code snippets are displayed with syntax highlighting:
[image]

However, in the generated PDFs, this syntax highlighting is removed:
[image]

Warnings

(process:18596): GLib-GIO-WARNING **: 21:51:57.865: Unexpectedly, UWP app `Microsoft.ScreenSketch_11.2309.16.0_x64__8wekyb3d8bbwe' (AUMId `Microsoft.ScreenSketch_8wekyb3d8bbwe!App') supports 29 extensions but has no verbs

(process:18596): GLib-GIO-WARNING **: 21:51:57.909: Unexpectedly, UWP app `Clipchamp.Clipchamp_2.8.1.0_neutral__yxz26nhyzhsrt' (AUMId `Clipchamp.Clipchamp_yxz26nhyzhsrt!App') supports 41 extensions but has no verbs

The download stops at 99.7%, and the download folder is left empty.

Undocumented dependencies

When following the instructions in README.md in a fresh Python 3 environment, main.py fails to execute. The following modules need to be installed first (using pip):

  • beautifulsoup4
  • pdfkit
  • lxml
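
A minimal sketch of how such missing modules could be detected before the script runs. It is not part of the project. The three package names come from the list above; the import name `bs4` is the one beautifulsoup4 installs under, and the helper name `missing_packages` is made up for this illustration.

```python
import importlib.util

# pip package name -> import name. The package names are taken from the
# issue above; `missing_packages` is a hypothetical helper, not project code.
REQUIRED = {
    "beautifulsoup4": "bs4",
    "pdfkit": "pdfkit",
    "lxml": "lxml",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```

Running a check like this first would report the missing packages by name instead of failing on an import error.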

socket.gaierror: [Errno -3] Temporary failure in name resolution (in Docker)

docker run --rm --name=learncpp-download --mount type=bind,destination=/app/downloads,source=/home/siemens/learncpp/learncpp-download/source/downloads --shm-size=1.14gb amalrajan/learncpp-download
2023-11-21 13:41:04,311 WARNING services.py:1826 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 1224069120 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=8.98gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2023-11-21 13:41:04,434 INFO worker.py:1636 -- Started a local Ray instance.
Traceback (most recent call last):
File "/usr/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/lib/python3.10/http/client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
self.send(msg)
File "/usr/lib/python3.10/http/client.py", line 975, in send
self.connect()
File "/usr/lib/python3.10/http/client.py", line 941, in connect
self.sock = self._create_connection(
File "/usr/lib/python3.10/socket.py", line 824, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/usr/lib/python3.10/socket.py", line 955, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/app/source/main.py", line 27, in <module>
instance = render.WeasyRender()
File "/app/source/helper/render.py", line 99, in __init__
self.urls = self.get_urls(cooldown)
File "/app/source/helper/render.py", line 27, in get_urls
return scraper.get_urls(cooldown)
File "/app/source/helper/scraper.py", line 20, in get_urls
sauce = urllib.request.urlopen(req).read()
File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/usr/lib/python3.10/urllib/request.py", line 1377, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>
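
The traceback above ends in a name-resolution failure, so the container could not resolve the site's hostname. The following is a rough sketch of a pre-flight DNS check that would surface this as a single readable message. The function name `can_resolve` and its `resolver` parameter are assumptions made for illustration and are not part of the project.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False on a name-resolution failure.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so the
    check can be exercised without network access (an assumption of this sketch).
    """
    try:
        resolver(host, port)
    except socket.gaierror:
        # Same exception type as in the traceback above.
        return False
    return True
```

A caller might run `can_resolve("www.learncpp.com")` before scraping and, if it returns False, stop with a hint to check the container's DNS or network settings.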

WeasyPrint encounters a library loading issue related to 'gobject-2.0-0' on Windows

Error message:

WeasyPrint could not import some external libraries. Please carefully follow the installation steps before reporting an issue:
[Installation Steps](https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#installation)
[Troubleshooting Guide](https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#troubleshooting)

Traceback (most recent call last):
  File "C:\Users\WDAGUtilityAccount\Documents\learncpp-download-master\source\main.py", line 5, in <module>
    from helper import render
  File "C:\Users\WDAGUtilityAccount\Documents\learncpp-download-master\source\helper\render.py", line 10, in <module>
    import weasyprint
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python310\lib\site-packages\weasyprint\__init__.py", line 387, in <module>
    from .css import preprocess_stylesheet  # noqa isort:skip
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python310\lib\site-packages\weasyprint\css\__init__.py", line 25, in <module>
    from . import computed_values, counters, media_queries
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python310\lib\site-packages\weasyprint\css\computed_values.py", line 11, in <module>
    from ..text.ffi import ffi, pango, units_to_double
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python310\lib\site-packages\weasyprint\text\ffi.py", line 428, in <module>
    gobject = _dlopen(
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python310\lib\site-packages\weasyprint\text\ffi.py", line 417, in _dlopen
    return ffi.dlopen(names[0])  # pragma: no cover
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python310\lib\site-packages\cffi\api.py", line 150, in dlopen
    lib, function_cache = _make_ffi_library(self, name, flags)
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python310\lib\site-packages\cffi\api.py", line 832, in _make_ffi_library
    backendlib = _load_backend_lib(backend, libname, flags)
  File "C:\Users\WDAGUtilityAccount\AppData\Local\Programs\Python\Python310\lib\site-packages\cffi\api.py", line 827, in _load_backend_lib
    raise OSError(msg)
OSError: cannot load library 'gobject-2.0-0': error 0x7e.  Additionally, ctypes.util.find_library() did not manage to locate a library called 'gobject-2.0-0'

Steps to reproduce

  1. Launch a new Windows Sandbox instance.
  2. Install Python 3.10.
  3. Run the script.
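
The error message above mentions that `ctypes.util.find_library()` could not locate 'gobject-2.0-0'. A small sketch of running that same lookup up front, before importing WeasyPrint, is shown here. The helper name `find_missing_libraries` is hypothetical, and the library name is copied from the error message; this check only reports the problem and does not install anything.

```python
import ctypes.util


def find_missing_libraries(names):
    """Return the library names that ctypes.util.find_library cannot locate."""
    return [name for name in names if ctypes.util.find_library(name) is None]


if __name__ == "__main__":
    # 'gobject-2.0-0' is the name reported in the error message above.
    missing = find_missing_libraries(["gobject-2.0-0"])
    if missing:
        print("Not found on the library search path:", ", ".join(missing))
    else:
        print("gobject-2.0-0 was located.")
```

If the library is reported as missing, the WeasyPrint installation and troubleshooting guides linked in the error message are the place to look next.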

Exits at 90% with no message and nothing downloaded?

$ docker run --rm --name=learncpp-download -v learncpp-download:/app/downloads --shm-size=10.17gb amalrajan/learncpp-download
[======================================================------] 90.0% ...

It reaches 90% almost instantly and then simply exits. The mounted download folder is empty, and the process appears to exit with code 0.
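
Since the run reportedly ends with exit code 0 even though nothing was written, a post-run check of the output folder would make the failure visible. The sketch below assumes the output consists of `.pdf` files placed directly in the download folder. The helper names `count_pdfs` and `exit_code_for` are invented for this example and do not exist in the project.

```python
from pathlib import Path


def count_pdfs(folder):
    """Count the .pdf files directly inside `folder` (0 if it does not exist)."""
    path = Path(folder)
    if not path.is_dir():
        return 0
    return sum(1 for entry in path.iterdir() if entry.suffix.lower() == ".pdf")


def exit_code_for(folder):
    """Return 0 when at least one PDF exists in `folder`, otherwise 1."""
    return 0 if count_pdfs(folder) > 0 else 1
```

Ending the program with `sys.exit(exit_code_for("downloads"))`, where "downloads" is an assumed path, would let `docker run` report a non-zero status when the folder is empty.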

unable to download

The log is as follows:

(download_file pid=70080) ERROR:root:unable to download: http://www.learncpp.com/cpp-tutorial/introduction-to-these-tutorials#FAQ
(download_file pid=70080) ERROR:root:unable to download: https://www.learncpp.com/cpp-tutorial/configuring-your-compiler-compiler-extensions/
(download_file pid=70081) ERROR:root:unable to download: https://www.learncpp.com/cpp-tutorial/introduction-to-cplusplus/
(download_file pid=70081) ERROR:root:unable to download: https://www.learncpp.com/cpp-tutorial/configuring-your-compiler-warning-and-error-levels/
(download_file pid=70081) ERROR:root:unable to download: https://www.learncpp.com/cpp-tutorial/introduction-to-iostream-cout-cin-and-endl/
(download_file pid=70084) ERROR:root:unable to download: https://www.learncpp.com/cpp-tutorial/installing-an-integrated-development-environment-ide/
(download_file pid=70084) ERROR:root:unable to download: https://www.learncpp.com/cpp-tutorial/comments/
(download_file pid=70084) ERROR:root:unable to download: https://www.learncpp.com/cpp-tutorial/introduction-to-expressions/
(download_file pid=70083) ERROR:root:unable to download: http://www.learncpp.com/cpp-tutorial/introduction-to-these-tutorials#FAQ
(download_file pid=70083) ERROR:root:unable to download: https://www.learncpp.com/cpp-tutorial/compiling-your-first-program/
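
The log above contains the same URL twice, and that URL carries a `#FAQ` fragment. Whether this is related to the failures is not established by the log, but a sketch of normalizing the URL list (dropping fragments and duplicates) before downloading could look like the following. The function name `normalize_urls` is an assumption for illustration.

```python
from urllib.parse import urldefrag


def normalize_urls(urls):
    """Drop #fragments and duplicates while preserving first-seen order."""
    seen = set()
    result = []
    for url in urls:
        # urldefrag splits "page#FAQ" into ("page", "FAQ").
        base, _fragment = urldefrag(url)
        if base not in seen:
            seen.add(base)
            result.append(base)
    return result
```

With this, the two `introduction-to-these-tutorials#FAQ` entries would collapse into a single request for the page itself.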

JavaScript functions return void

The web scraper tries to scrape local JavaScript functions, such as the "Show Solution" button in quizzes, which inevitably return void.
[image]
I do not know much about web scraping, but it would be nice if the bot could open these functions and copy the solution instead, or something similar.
[image]
P.S. You may tell me to use the HTML documents, but it would be nice to also have the answers visible in the PDF.
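
As a first step toward handling such buttons, the scraper would need to detect them. The sketch below uses only the standard library and assumes the buttons are `<a>` elements whose `href` starts with `javascript:`; the real page markup is not shown in this issue, so that assumption may not hold. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class JavaScriptLinkFinder(HTMLParser):
    """Collect the text of <a> elements whose href starts with 'javascript:'."""

    def __init__(self):
        super().__init__()
        self.labels = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.strip().lower().startswith("javascript:"):
                self._inside = True
                self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.labels.append("".join(self._buffer).strip())
            self._inside = False


def find_javascript_links(html):
    """Return the visible labels of all javascript: links in `html`."""
    finder = JavaScriptLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.labels
```

Once detected, such links could be removed or replaced with the hidden solution text before rendering; how the solutions are stored on the page would need to be checked against the actual HTML.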

Undocumented dependency

When following the instructions, the "ray" dependency isn't installed automatically; I had to install it manually. Not a big deal though, thank you for your work!

doesn't download

Hi,
Thank you for writing this software, but I have a problem: it doesn't download anything. It only creates the downloads directory and displays this output:

2023-10-15 09:11:30,546 INFO worker.py:1636 -- Started a local Ray instance.

AssignProcessToJobObject() failed

Error message:

(process:1140): GLib-GIO-WARNING **: 11:13:56.138: Unexpectedly, UWP app `Microsoft.ScreenSketch_11.2305.26.0_x64__8wekyb3d8bbwe' (AUMId `Microsoft.ScreenSketch_8wekyb3d8bbwe!App') supports 29 extensions but has no verbs
Traceback (most recent call last):
  File "C:\Users\abhayhm\Desktop\learncpp\learncpp-download\source\main.py", line 23, in <module>
    ray.init(log_to_driver=False)
  File "C:\Users\abhayhm\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\abhayhm\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ray\_private\worker.py", line 1534, in init
    _global_node = ray._private.node.Node(
  File "C:\Users\abhayhm\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ray\_private\node.py", line 287, in __init__
    self.start_head_processes()
  File "C:\Users\abhayhm\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ray\_private\node.py", line 1164, in start_head_processes
    self.start_monitor()
  File "C:\Users\abhayhm\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ray\_private\node.py", line 1067, in start_monitor
    process_info = ray._private.services.start_monitor(
  File "C:\Users\abhayhm\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ray\_private\services.py", line 1957, in start_monitor
    process_info = start_ray_process(
  File "C:\Users\abhayhm\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ray\_private\services.py", line 904, in start_ray_process
    ray._private.utils.set_kill_child_on_death_win32(process)
  File "C:\Users\abhayhm\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ray\_private\utils.py", line 916, in set_kill_child_on_death_win32
    raise OSError(ctypes.get_last_error(), "AssignProcessToJobObject() failed")
OSError: [Errno 0] AssignProcessToJobObject() failed

Showing progress 99.7% but does not download anything

I have followed all the given steps but am still unable to download anything:

python3 main.py
2023-11-21 19:18:56,214 INFO worker.py:1636 -- Started a local Ray instance.
[============================================================] 99.7% ...
