Comments (7)
I got it. You are running this code in a Jupyter notebook. Jupyter already has its own running asyncio event loop, and asyncio.run tries to open a new one, which raises this error. You could switch to a plain Python file to run your script. If you want to stick with Jupyter, make sure to run the following lines before executing your main script:
!pip install nest-asyncio
import nest_asyncio
nest_asyncio.apply()
graph_config = ....
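To see why asyncio.run fails inside an already-running loop (which is exactly Jupyter's situation), here is a minimal stdlib-only sketch; the coroutine names are made up for illustration:

```python
import asyncio

async def inner():
    return 42

async def outer():
    # Jupyter's kernel already has a loop running; calling asyncio.run()
    # from inside it raises RuntimeError instead of starting a second loop.
    try:
        asyncio.run(inner())
    except RuntimeError as e:
        return str(e)

msg = asyncio.run(outer())
print(msg)
```

nest_asyncio.apply() patches the loop so that such nested asyncio.run calls succeed instead of raising.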
from scrapegraph-ai.
It seems this error was raised by Playwright. You can try to:
- upgrade Playwright
- make sure your network connection is working
If that doesn't work, could you share the full exception stack with us?
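"Connection closed while reading from the driver" often indicates that the browser binaries are missing or no longer match the installed Python package, so after upgrading it is worth re-running the browser install step. A hedged suggestion, not a confirmed fix:

```shell
pip install --upgrade playwright
# Re-download browser binaries matching the new package version
playwright install chromium
# On Linux, also install the system libraries Chromium needs
playwright install-deps chromium
```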
@xjtupy
@goasleep I installed the latest version of Playwright==1.46.0 and the network is working fine.
The complete exception information is as follows:
--- Executing Fetch Node ---
--- (Fetching HTML from: https://blog.csdn.net/mopmgerg54mo/article/details/141028116) ---
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
/tmp/ipykernel_188540/3999856684.py in <module>
19 )
20
---> 21 result = smart_scraper_graph.run()
22 print(result)
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/smart_scraper_graph.py in run(self)
112
113 inputs = {"user_prompt": self.prompt, self.input_key: self.source}
--> 114 self.final_state, self.execution_info = self.graph.execute(inputs)
115
116 return self.final_state.get("answer", "No answer found.")
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py in execute(self, initial_state)
261 return (result["_state"], [])
262 else:
--> 263 return self._execute_standard(initial_state)
264
265 def append_node(self, node):
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py in _execute_standard(self, initial_state)
183 exception=str(e)
184 )
--> 185 raise e
186 node_exec_time = time.time() - curr_time
187 total_exec_time += node_exec_time
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py in _execute_standard(self, initial_state)
167 with get_openai_callback() as cb:
168 try:
--> 169 result = current_node.execute(state)
170 except Exception as e:
171 error_node = current_node.node_name
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py in execute(self, state)
125 return self.handle_local_source(state, source)
126 else:
--> 127 return self.handle_web_source(state, source)
128
129 def handle_directory(self, state, input_type, source):
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py in handle_web_source(self, state, source)
277 else:
278 loader = ChromiumLoader([source], headless=self.headless, **loader_kwargs)
--> 279 document = loader.load()
280
281 if not document or not document[0].page_content.strip():
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/langchain_core/document_loaders/base.py in load(self)
28 def load(self) -> List[Document]:
29 """Load data into Document objects."""
---> 30 return list(self.lazy_load())
31
32 async def aload(self) -> List[Document]:
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py in lazy_load(self)
109
110 for url in self.urls:
--> 111 html_content = asyncio.run(scraping_fn(url))
112 metadata = {"source": url}
113 yield Document(page_content=html_content, metadata=metadata)
~/.local/lib/python3.9/site-packages/nest_asyncio.py in run(future, debug)
30 loop = asyncio.get_event_loop()
31 loop.set_debug(debug)
---> 32 return loop.run_until_complete(future)
33
34 if sys.version_info >= (3, 6, 0):
~/.local/lib/python3.9/site-packages/nest_asyncio.py in run_until_complete(self, future)
68 raise RuntimeError(
69 'Event loop stopped before Future completed.')
---> 70 return f.result()
71
72 def _run_once(self):
~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception
202 return self._result
203
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in wrap_api_call(self, cb, is_internal)
510 self._api_zone.set(parsed_st)
511 try:
--> 512 return await cb()
513 except Exception as error:
514 raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in inner_send(self, method, params, return_as_dict)
95 if not callback.future.done():
96 callback.future.cancel()
---> 97 result = next(iter(done)).result()
98 # Protocol now has named return values, assume result is one level deeper unless
99 # there is explicit ambiguity.
~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception
202 return self._result
203
~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/tasks.py in __step(***failed resolving arguments***)
254 # We use the `send` method directly, because coroutines
255 # don't have `__iter__` and `__next__` methods.
--> 256 result = coro.send(None)
257 else:
258 result = coro.throw(exc)
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py in ascrape_playwright(self, url)
78 logger.info("Starting scraping...")
79 results = ""
---> 80 async with async_playwright() as p:
81 browser = await p.chromium.launch(
82 headless=self.headless, proxy=self.proxy, **self.browser_config
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/async_api/_context_manager.py in __aenter__(self)
44 if not playwright_future.done():
45 playwright_future.cancel()
---> 46 playwright = AsyncPlaywright(next(iter(done)).result())
47 playwright.stop = self.__aexit__ # type: ignore
48 return playwright
~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception
202 return self._result
203
Exception: Connection closed while reading from the driver
@goasleep I added the following code in Jupyter and the error still occurs
import nest_asyncio
nest_asyncio.apply()
In addition, I wrote a Python file to run the same code on Linux, and it also raised this error:
Traceback (most recent call last):
File "/home/odin/ddmpeng/tmp.py", line 23, in <module>
result = smart_scraper_graph.run()
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 114, in run
self.final_state, self.execution_info = self.graph.execute(inputs)
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py", line 263, in execute
return self._execute_standard(initial_state)
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py", line 185, in _execute_standard
raise e
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py", line 169, in _execute_standard
result = current_node.execute(state)
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py", line 127, in execute
return self.handle_web_source(state, source)
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py", line 279, in handle_web_source
document = loader.load()
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/langchain_core/document_loaders/base.py", line 30, in load
return list(self.lazy_load())
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py", line 111, in lazy_load
html_content = asyncio.run(scraping_fn(url))
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 512, in wrap_api_call
return await cb()
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 97, in inner_send
result = next(iter(done)).result()
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py", line 80, in ascrape_playwright
async with async_playwright() as p:
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/async_api/_context_manager.py", line 46, in __aenter__
playwright = AsyncPlaywright(next(iter(done)).result())
Exception: Connection closed while reading from the driver
Task exception was never retrieved
future: <Task finished name='Task-4' coro=<Connection.run.<locals>.init() done, defined at /home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py:269> exception=Exception('Connection.init: Connection closed while reading from the driver')>
Traceback (most recent call last):
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 270, in init
self.playwright_future.set_result(await self._root_object.initialize())
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 212, in initialize
await self._channel.send(
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 59, in send
return await self._connection.wrap_api_call(
File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 514, in wrap_api_call
raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
Exception: Connection.init: Connection closed while reading from the driver
I tried it on Linux but I cannot reproduce this problem. Could you run the code below in Jupyter? If you still get the same error, it may be worth reaching out to the Playwright maintainers for assistance. @xjtupy
import asyncio
import nest_asyncio
nest_asyncio.apply()
from playwright.async_api import async_playwright

url = "https://blog.csdn.net/mopmgerg54mo/article/details/141028116"

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="domcontentloaded")
        await browser.close()
        print(page)

asyncio.run(main())
@goasleep Unfortunately, this problem still occurs
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
/tmp/ipykernel_188540/1335809689.py in <module>
14 print(page)
15
---> 16 asyncio.run(main())
~/.local/lib/python3.9/site-packages/nest_asyncio.py in run(future, debug)
30 loop = asyncio.get_event_loop()
31 loop.set_debug(debug)
---> 32 return loop.run_until_complete(future)
33
34 if sys.version_info >= (3, 6, 0):
~/.local/lib/python3.9/site-packages/nest_asyncio.py in run_until_complete(self, future)
68 raise RuntimeError(
69 'Event loop stopped before Future completed.')
---> 70 return f.result()
71
72 def _run_once(self):
~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception
202 return self._result
203
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in wrap_api_call(self, cb, is_internal)
510 self._api_zone.set(parsed_st)
511 try:
--> 512 return await cb()
513 except Exception as error:
514 raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in inner_send(self, method, params, return_as_dict)
95 if not callback.future.done():
96 callback.future.cancel()
---> 97 result = next(iter(done)).result()
98 # Protocol now has named return values, assume result is one level deeper unless
99 # there is explicit ambiguity.
~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception
202 return self._result
203
~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/tasks.py in __step(***failed resolving arguments***)
254 # We use the `send` method directly, because coroutines
255 # don't have `__iter__` and `__next__` methods.
--> 256 result = coro.send(None)
257 else:
258 result = coro.throw(exc)
/tmp/ipykernel_188540/1335809689.py in main()
7 url = "https://blog.csdn.net/mopmgerg54mo/article/details/141028116"
8 async def main():
----> 9 async with async_playwright() as p:
10 browser = await p.chromium.launch(headless=True)
11 page = await browser.new_page()
~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/async_api/_context_manager.py in __aenter__(self)
44 if not playwright_future.done():
45 playwright_future.cancel()
---> 46 playwright = AsyncPlaywright(next(iter(done)).result())
47 playwright.stop = self.__aexit__ # type: ignore
48 return playwright
~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
199 self.__log_traceback = False
200 if self._exception is not None:
--> 201 raise self._exception
202 return self._result
203
Exception: Connection closed while reading from the driver
@goasleep Unfortunately, this problem still occurs
Do you get the same error when running the code above? If yes, you can ask Playwright for help: create a new issue in the Playwright repository and link it back to this issue.
I guess an environment problem is causing it. I suggest you use Docker to isolate the environment and then try again. @xjtupy
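For isolating the environment, here is a minimal Dockerfile sketch based on the official Playwright Python image, which ships Chromium plus all its system dependencies preinstalled. The image tag and the script name tmp.py are assumptions; pick a tag matching your Playwright version:

```dockerfile
# Official Playwright image: browsers and system libraries preinstalled
FROM mcr.microsoft.com/playwright/python:v1.46.0-jammy

WORKDIR /app
RUN pip install scrapegraphai

# tmp.py is the script from this thread (hypothetical name)
COPY tmp.py .
CMD ["python", "tmp.py"]
```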
from scrapegraph-ai.