
Comments (7)

goasleep commented on September 14, 2024

I got it. You are running this code in a Jupyter notebook. Jupyter runs its own asyncio event loop, and asyncio.run tries to start a new one, which raises this error. You could switch to a plain Python file to run your script. If you would rather stay in Jupyter, run the following lines before executing your main script:

!pip install nest-asyncio
import nest_asyncio
nest_asyncio.apply()

graph_config = ....
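
The conflict described above can be reproduced with the standard library alone. The following is a minimal sketch, not ScrapeGraphAI's or Jupyter's actual code: `fetch_page` and `notebook_cell` are hypothetical names, and `notebook_cell` only simulates a notebook cell by running inside an event loop that is already active.

```python
import asyncio


async def fetch_page() -> str:
    # Hypothetical stand-in for an async scraping coroutine.
    await asyncio.sleep(0)
    return "<html></html>"


async def notebook_cell() -> str:
    # This coroutine runs inside an active event loop, which is the
    # situation code finds itself in when executed by a Jupyter kernel.
    coro = fetch_page()
    try:
        # Library code that calls asyncio.run() internally ends up here.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


# In a plain script there is no running loop yet, so this outer call works.
message = asyncio.run(notebook_cell())
print(message)
```

The inner call fails because asyncio refuses to start a second loop from within a running one. nest_asyncio.apply() patches asyncio so that such nested calls are allowed, which is why it has to run before the main script.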

@xjtupy

from scrapegraph-ai.

goasleep commented on September 14, 2024

This error seems to be raised by Playwright. You can try the following:

  • Upgrade Playwright.
  • Check that your network connection is working.

If that doesn't work, could you share the full exception stack trace with us?
@xjtupy


xjtupy commented on September 14, 2024

@goasleep I installed the latest version of Playwright==1.46.0 and the network is working fine.

The complete exception information is as follows:

--- Executing Fetch Node ---
--- (Fetching HTML from: https://blog.csdn.net/mopmgerg54mo/article/details/141028116) ---
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_188540/3999856684.py in <module>
     19 )
     20 
---> 21 result = smart_scraper_graph.run()
     22 print(result)

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/smart_scraper_graph.py in run(self)
    112 
    113         inputs = {"user_prompt": self.prompt, self.input_key: self.source}
--> 114         self.final_state, self.execution_info = self.graph.execute(inputs)
    115 
    116         return self.final_state.get("answer", "No answer found.")

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py in execute(self, initial_state)
    261             return (result["_state"], [])
    262         else:
--> 263             return self._execute_standard(initial_state)
    264 
    265     def append_node(self, node):

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py in _execute_standard(self, initial_state)
    183                         exception=str(e)
    184                     )
--> 185                     raise e
    186                 node_exec_time = time.time() - curr_time
    187                 total_exec_time += node_exec_time

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py in _execute_standard(self, initial_state)
    167             with get_openai_callback() as cb:
    168                 try:
--> 169                     result = current_node.execute(state)
    170                 except Exception as e:
    171                     error_node = current_node.node_name

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py in execute(self, state)
    125             return self.handle_local_source(state, source)
    126         else:
--> 127             return self.handle_web_source(state, source)
    128 
    129     def handle_directory(self, state, input_type, source):

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py in handle_web_source(self, state, source)
    277             else:
    278                 loader = ChromiumLoader([source], headless=self.headless, **loader_kwargs)
--> 279                 document = loader.load()
    280 
    281             if not document or not document[0].page_content.strip():

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/langchain_core/document_loaders/base.py in load(self)
     28     def load(self) -> List[Document]:
     29         """Load data into Document objects."""
---> 30         return list(self.lazy_load())
     31 
     32     async def aload(self) -> List[Document]:

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py in lazy_load(self)
    109 
    110         for url in self.urls:
--> 111             html_content = asyncio.run(scraping_fn(url))
    112             metadata = {"source": url}
    113             yield Document(page_content=html_content, metadata=metadata)

~/.local/lib/python3.9/site-packages/nest_asyncio.py in run(future, debug)
     30         loop = asyncio.get_event_loop()
     31         loop.set_debug(debug)
---> 32         return loop.run_until_complete(future)
     33 
     34     if sys.version_info >= (3, 6, 0):

~/.local/lib/python3.9/site-packages/nest_asyncio.py in run_until_complete(self, future)
     68                 raise RuntimeError(
     69                     'Event loop stopped before Future completed.')
---> 70             return f.result()
     71 
     72     def _run_once(self):

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in wrap_api_call(self, cb, is_internal)
    510         self._api_zone.set(parsed_st)
    511         try:
--> 512             return await cb()
    513         except Exception as error:
    514             raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in inner_send(self, method, params, return_as_dict)
     95         if not callback.future.done():
     96             callback.future.cancel()
---> 97         result = next(iter(done)).result()
     98         # Protocol now has named return values, assume result is one level deeper unless
     99         # there is explicit ambiguity.

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/tasks.py in __step(***failed resolving arguments***)
    254                 # We use the `send` method directly, because coroutines
    255                 # don't have `__iter__` and `__next__` methods.
--> 256                 result = coro.send(None)
    257             else:
    258                 result = coro.throw(exc)

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py in ascrape_playwright(self, url)
     78         logger.info("Starting scraping...")
     79         results = ""
---> 80         async with async_playwright() as p:
     81             browser = await p.chromium.launch(
     82                 headless=self.headless, proxy=self.proxy, **self.browser_config

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/async_api/_context_manager.py in __aenter__(self)
     44         if not playwright_future.done():
     45             playwright_future.cancel()
---> 46         playwright = AsyncPlaywright(next(iter(done)).result())
     47         playwright.stop = self.__aexit__  # type: ignore
     48         return playwright

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

Exception: Connection closed while reading from the driver


xjtupy commented on September 14, 2024

@goasleep I added the following code in Jupyter, and the error still occurs:

import nest_asyncio
nest_asyncio.apply()

In addition, I ran the same code from a plain Python file on Linux, and it reported the same error:

Traceback (most recent call last):
  File "/home/odin/ddmpeng/tmp.py", line 23, in <module>
    result = smart_scraper_graph.run()
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 114, in run
    self.final_state, self.execution_info = self.graph.execute(inputs)
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py", line 263, in execute
    return self._execute_standard(initial_state)
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py", line 185, in _execute_standard
    raise e
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py", line 169, in _execute_standard
    result = current_node.execute(state)
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py", line 127, in execute
    return self.handle_web_source(state, source)
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py", line 279, in handle_web_source
    document = loader.load()
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/langchain_core/document_loaders/base.py", line 30, in load
    return list(self.lazy_load())
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py", line 111, in lazy_load
    html_content = asyncio.run(scraping_fn(url))
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 512, in wrap_api_call
    return await cb()
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 97, in inner_send
    result = next(iter(done)).result()
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py", line 80, in ascrape_playwright
    async with async_playwright() as p:
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/async_api/_context_manager.py", line 46, in __aenter__
    playwright = AsyncPlaywright(next(iter(done)).result())
Exception: Connection closed while reading from the driver
Task exception was never retrieved
future: <Task finished name='Task-4' coro=<Connection.run.<locals>.init() done, defined at /home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py:269> exception=Exception('Connection.init: Connection closed while reading from the driver')>
Traceback (most recent call last):
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 270, in init
    self.playwright_future.set_result(await self._root_object.initialize())
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 212, in initialize
    await self._channel.send(
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 59, in send
    return await self._connection.wrap_api_call(
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 514, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
Exception: Connection.init: Connection closed while reading from the driver


goasleep commented on September 14, 2024

I tried it on Linux but could not reproduce the problem. Could you run the code below in Jupyter? If you still get the same error, it may be worth reaching out to the Playwright maintainers for assistance. @xjtupy

import asyncio
import nest_asyncio
nest_asyncio.apply()

from playwright.async_api import async_playwright

url = "https://blog.csdn.net/mopmgerg54mo/article/details/141028116"
async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="domcontentloaded")
        await browser.close()
        print(page)

asyncio.run(main())


xjtupy commented on September 14, 2024

@goasleep Unfortunately, this problem still occurs

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_188540/1335809689.py in <module>
     14         print(page)
     15 
---> 16 asyncio.run(main())

~/.local/lib/python3.9/site-packages/nest_asyncio.py in run(future, debug)
     30         loop = asyncio.get_event_loop()
     31         loop.set_debug(debug)
---> 32         return loop.run_until_complete(future)
     33 
     34     if sys.version_info >= (3, 6, 0):

~/.local/lib/python3.9/site-packages/nest_asyncio.py in run_until_complete(self, future)
     68                 raise RuntimeError(
     69                     'Event loop stopped before Future completed.')
---> 70             return f.result()
     71 
     72     def _run_once(self):

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in wrap_api_call(self, cb, is_internal)
    510         self._api_zone.set(parsed_st)
    511         try:
--> 512             return await cb()
    513         except Exception as error:
    514             raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in inner_send(self, method, params, return_as_dict)
     95         if not callback.future.done():
     96             callback.future.cancel()
---> 97         result = next(iter(done)).result()
     98         # Protocol now has named return values, assume result is one level deeper unless
     99         # there is explicit ambiguity.

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/tasks.py in __step(***failed resolving arguments***)
    254                 # We use the `send` method directly, because coroutines
    255                 # don't have `__iter__` and `__next__` methods.
--> 256                 result = coro.send(None)
    257             else:
    258                 result = coro.throw(exc)

/tmp/ipykernel_188540/1335809689.py in main()
      7 url = "https://blog.csdn.net/mopmgerg54mo/article/details/141028116"
      8 async def main():
----> 9     async with async_playwright() as p:
     10         browser = await p.chromium.launch(headless=True)
     11         page = await browser.new_page()

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/async_api/_context_manager.py in __aenter__(self)
     44         if not playwright_future.done():
     45             playwright_future.cancel()
---> 46         playwright = AsyncPlaywright(next(iter(done)).result())
     47         playwright.stop = self.__aexit__  # type: ignore
     48         return playwright

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

Exception: Connection closed while reading from the driver


goasleep commented on September 14, 2024

Quoting @xjtupy: "Unfortunately, this problem still occurs"

So you get the same error when running the code above? If so, you can ask the Playwright maintainers for help: open a new issue in the Playwright repository and link it in this issue.

My guess is that something in your environment is causing it. I suggest using Docker to isolate the environment and then trying again. @xjtupy
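
When opening an issue with Playwright, or before moving to Docker, it can help to record which interpreter and which package installations are actually in use. The tracebacks above show nest_asyncio loaded from ~/.local while Playwright comes from the conda environment, so the two are not installed in the same place. The following is a rough sketch using only the standard library; `describe_package` and `environment_report` are hypothetical helper names, not part of Playwright or ScrapeGraphAI.

```python
import importlib.util
import platform
import sys
from importlib import metadata


def describe_package(dist_name: str, module_name: str) -> dict:
    """Report the installed version and import location of a package, if any."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec locates a top-level module without importing it.
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    return {"package": dist_name, "version": version, "location": location}


def environment_report() -> dict:
    """Collect interpreter and package details relevant to this thread."""
    packages = [
        ("playwright", "playwright"),
        ("nest-asyncio", "nest_asyncio"),
        ("scrapegraphai", "scrapegraphai"),
    ]
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "platform": platform.platform(),
        "packages": [describe_package(d, m) for d, m in packages],
    }


report = environment_report()
for key in ("python", "executable", "platform"):
    print(f"{key}: {report[key]}")
for entry in report["packages"]:
    print(entry)
```

If the reported locations point into different site-packages directories, that mismatch is worth mentioning in the new issue.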
