Comments (15)

rushter commented on May 31, 2024

@LemonsoftLtd You need to install the python-devel package.

Something like yum install python-devel or yum install python3-devel should fix the problem.

rushter commented on May 31, 2024

What version of selectolax and Python are you running on?
I tried your example and it works fine for me.

I changed the open command, since Python 3 is very strict about encodings.
open('selectolax_bug.log', encoding='utf-8')


 
  
    
             
       Eintrag hinzufügen
      |  Administration
    
   
   
     Donnerstag, 16. August 2018 02:03Willkomen in unserem Gästebuch. Hier können Sie einen Beitrag hinterlassen.
     




Gästebuch
606911-606920
606901-606910
606891-606900
606881-606890
606871-606880
606861-606870
606851-606860
606841-606850
606831-606840
....
....
....
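
On the encoding point above, a minimal sketch of what "strict" means in practice: Python 3 raises an error when a file is decoded with the wrong codec rather than guessing. The sample text and the temporary file are made up for illustration and are not the reporter's selectolax_bug.log.

```python
import os
import tempfile

# Hypothetical sample: a UTF-8 file containing non-ASCII text.
sample = "Gästebuch"
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    path = tmp.name

try:
    # Decoding with the wrong codec raises instead of guessing.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # Passing the file's real encoding reads the text back unchanged.
    with open(path, encoding="utf-8") as f:
        text = f.read()
finally:
    os.remove(path)

print(strict_failed, text == sample)
```

Without an explicit encoding argument, open() falls back to a platform-dependent default, so the same script can behave differently on different machines.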

mleue commented on May 31, 2024

Thanks a lot for the quick reply.
That is very weird indeed. This is the setup on my system:
Linux, Ubuntu 18.04
python 3.6.5
selectolax 0.1.7

I just tried everything in a fresh virtualenv.

○ → pip list
Package    Version
---------- -------
pip        18.0   
selectolax 0.1.7  
setuptools 40.4.3 
wheel      0.31.1 

Here are the exact steps that produce the error on my system in that virtualenv in the python console:

○ → python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from selectolax.parser import HTMLParser
>>> with open('selectolax_bug.log', encoding='utf-8') as f:
...   test = f.read()
... 
>>> tree = HTMLParser(test)
>>> tree.body.text()
Segmentation fault (core dumped)

rushter commented on May 31, 2024

I'll check it this weekend.

mleue commented on May 31, 2024

Perfect, thanks in advance for your time and effort.

rushter commented on May 31, 2024

I've pushed a fix. It's not very clever and depends on the compiler, but it should work on most systems. I will rewrite my text parsing algorithm in the future. Currently, it uses a recursive approach and fails on your example because of stack size limits.
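
To illustrate why recursion depth matters here, below is a minimal sketch of text extraction that walks a tree with an explicit stack instead of recursive calls. It is not selectolax's actual code (which lives in C/Cython); the Node class and collect_text function are hypothetical stand-ins. With an explicit stack, the traversal depth is bounded by heap memory rather than by the call stack.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parsed HTML node (not selectolax's Node)."""
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def collect_text(root: Node) -> str:
    """Depth-first text extraction using an explicit stack, no recursion."""
    parts = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.text:
            parts.append(node.text)
        # Push children in reverse so they are popped in document order.
        stack.extend(reversed(node.children))
    return "".join(parts)


# Build a chain nested far deeper than Python's default recursion limit.
deep_root = Node()
current = deep_root
for _ in range(10_000):
    child = Node()
    current.children.append(child)
    current = child
current.text = "deepest"

print(collect_text(deep_root))
```

A naive recursive version of collect_text would hit Python's recursion limit on this input; the equivalent C code has no such guard and overflows the native stack instead, which shows up as a segmentation fault.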

You can fix the old version by simply increasing the stack size:

➜  ~ ulimit -s
8192
➜  ~ ulimit -s 16000
➜  ~ ulimit -s
16000

Please try the new version:

pip install --no-cache-dir selectolax==0.1.8

mleue commented on May 31, 2024

Hey, very nice.
The quick fix of setting ulimit -s works.

However, the new version 0.1.8 doesn't seem to fix the problem for me by default. I still have to set the ulimit for that one to work too.

○ → pip list
Package    Version
---------- -------
pip        18.0   
selectolax 0.1.8  
setuptools 40.4.3 
wheel      0.31.1

○ → python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from selectolax.parser import HTMLParser
>>> with open('selectolax_bug.log', encoding='utf-8') as f:
...   test = f.read()
... 
>>> tree = HTMLParser(test)
>>> tree.body.text()
Segmentation fault (core dumped)

but after setting

○ → ulimit -s 16000

it works

○ → python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from selectolax.parser import HTMLParser
>>> with open('selectolax_bug.log', encoding='utf-8') as f:
...   test = f.read()
... 
>>> tree = HTMLParser(test)
>>> tree.body.text()
>>>

Or do I need to recompile any of the C backend? I'm not very familiar with Cython or C extensions.

rushter commented on May 31, 2024

My fix relies on the compiler, which is free to ignore my instruction. I will fix this issue later.

For the time being, you could use something like this:

>>> import resource
>>> soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
>>> resource.setrlimit(resource.RLIMIT_STACK, (soft * 4, hard))
>>>

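The code above quadruples the soft stack limit. getrlimit reports the limit in bytes, so with the default of 8192 KB shown by ulimit -s it goes from 8 MB to 32 MB.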
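
A slightly more defensive variant of the same workaround, as a sketch only: it leaves an unlimited soft limit alone and never asks for more than the hard limit, since setrlimit raises ValueError when the requested soft limit exceeds the hard one. The function name raise_stack_limit and the factor parameter are made up for this example; the resource calls themselves are standard library. It assumes a Unix system and should run before the deep parse starts.

```python
import resource


def raise_stack_limit(factor: int = 4):
    """Multiply the soft RLIMIT_STACK by `factor`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited, nothing to raise.
        return soft, hard
    target = soft * factor
    if hard != resource.RLIM_INFINITY:
        # The soft limit may not exceed the hard limit.
        target = min(target, hard)
    resource.setrlimit(resource.RLIMIT_STACK, (target, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)


new_soft, new_hard = raise_stack_limit()
print("stack limit (soft, hard):", new_soft, new_hard)
```

The change applies only to the current process and its children, so it does not need to be undone afterwards.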

mleue commented on May 31, 2024

That works, perfect.
Again, thanks a lot for your time and effort.

mindscratch commented on May 31, 2024

@rushter was there any work done on this since the ticket was closed?

rushter commented on May 31, 2024

@mindscratch Nope. Do you have the same problem?

mindscratch commented on May 31, 2024

Yes, using selectolax 0.1.10 with Python 3.6.6 (on CentOS 7, kernel 4.18) and HTML content that's ~14 MB.

Using the rlimit hack in the Python code has worked so far.

rushter commented on May 31, 2024

@mindscratch I've fixed the problem. Can you please check?

pip install selectolax==0.1.12

lemonysoft commented on May 31, 2024

Hello, I tried these steps on my CentOS 7 machine, but the install failed.

[root@dhpc09 ehealth]# pip install selectolax
Collecting selectolax
  Using cached https://files.pythonhosted.org/packages/42/7b/07342f02e9857a866dbd1d57ebc0de9c894d46fb4ee5283193b7496b59d0/selectolax-0.1.12.tar.gz
Installing collected packages: selectolax
  Running setup.py install for selectolax ... error
    Complete output from command /usr/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-93v7sd98/selectolax/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-u01qla9m/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.6
    creating build/lib.linux-x86_64-3.6/selectolax
    copying selectolax/__init__.py -> build/lib.linux-x86_64-3.6/selectolax
    running egg_info
    writing selectolax.egg-info/PKG-INFO
    writing dependency_links to selectolax.egg-info/dependency_links.txt
    writing top-level names to selectolax.egg-info/top_level.txt
    reading manifest file 'selectolax.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching 'CONTRIBUTING.rst'
    warning: no files found matching 'HISTORY.rst'
    warning: no previously-included files found matching 'selectolax/*.so'
    warning: no files found matching 'modest/include/*'
    warning: no files found matching 'modest/source/*'
    warning: no previously-included files matching '__pycache__' found under directory '*'
    warning: no previously-included files matching '*.py[co]' found under directory '*'
    warning: no files found matching '*.jpg' under directory 'docs'
    writing manifest file 'selectolax.egg-info/SOURCES.txt'
    copying selectolax/node.pxi -> build/lib.linux-x86_64-3.6/selectolax
    copying selectolax/parser.c -> build/lib.linux-x86_64-3.6/selectolax
    copying selectolax/parser.pxd -> build/lib.linux-x86_64-3.6/selectolax
    copying selectolax/parser.pyx -> build/lib.linux-x86_64-3.6/selectolax
    copying selectolax/selector.pxi -> build/lib.linux-x86_64-3.6/selectolax
    running build_ext
    building 'selectolax.parser' extension
    creating build/temp.linux-x86_64-3.6
    creating build/temp.linux-x86_64-3.6/selectolax
    creating build/temp.linux-x86_64-3.6/modest
    creating build/temp.linux-x86_64-3.6/modest/source
    creating build/temp.linux-x86_64-3.6/modest/source/modest
    creating build/temp.linux-x86_64-3.6/modest/source/modest/finder
    creating build/temp.linux-x86_64-3.6/modest/source/modest/layer
    creating build/temp.linux-x86_64-3.6/modest/source/modest/node
    creating build/temp.linux-x86_64-3.6/modest/source/modest/render
    creating build/temp.linux-x86_64-3.6/modest/source/modest/style
    creating build/temp.linux-x86_64-3.6/modest/source/mycore
    creating build/temp.linux-x86_64-3.6/modest/source/mycore/utils
    creating build/temp.linux-x86_64-3.6/modest/source/mycss
    creating build/temp.linux-x86_64-3.6/modest/source/mycss/declaration
    creating build/temp.linux-x86_64-3.6/modest/source/mycss/media
    creating build/temp.linux-x86_64-3.6/modest/source/mycss/namespace
    creating build/temp.linux-x86_64-3.6/modest/source/mycss/property
    creating build/temp.linux-x86_64-3.6/modest/source/mycss/selectors
    creating build/temp.linux-x86_64-3.6/modest/source/mycss/values
    creating build/temp.linux-x86_64-3.6/modest/source/myencoding
    creating build/temp.linux-x86_64-3.6/modest/source/myfont
    creating build/temp.linux-x86_64-3.6/modest/source/myhtml
    creating build/temp.linux-x86_64-3.6/modest/source/myport
    creating build/temp.linux-x86_64-3.6/modest/source/myport/posix
    creating build/temp.linux-x86_64-3.6/modest/source/myport/posix/mycore
    creating build/temp.linux-x86_64-3.6/modest/source/myport/posix/mycore/utils
    creating build/temp.linux-x86_64-3.6/modest/source/myunicode
    creating build/temp.linux-x86_64-3.6/modest/source/myurl
    gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Imodest/include/ -I/usr/include/python3.6m -c selectolax/parser.c -o build/temp.linux-x86_64-3.6/selectolax/parser.o -DMODEST_BUILD_OS=Linux -DMyCORE_OS_Linux -DMODEST_PORT_NAME=posix -DMyCORE_BUILD_WITHOUT_THREADS=YES -DMyCORE_BUILD_DEBUG=NO -O2 -pedantic -fPIC -Wno-unused-variable -Wno-unused-function -std=c99
    selectolax/parser.c:177:20: fatal error: Python.h: No such file or directory
     #include "Python.h"
                        ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1

    ----------------------------------------
Command "/usr/bin/python3.6 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-93v7sd98/selectolax/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-u01qla9m/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-93v7sd98/selectolax/

lemonysoft commented on May 31, 2024

After running yum install -y python36-devel.x86_64, it installed without problems. Thanks.
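
For anyone hitting the same "Python.h: No such file or directory" error, a quick way to check whether the development headers are present before building from source. This is a sketch; the helper names python_header_path and has_python_headers are made up here, while sysconfig.get_paths is standard library.

```python
import os
import sysconfig


def python_header_path() -> str:
    """Path where this interpreter expects to find Python.h."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


def has_python_headers() -> bool:
    """True if the C headers needed to compile extensions are installed."""
    return os.path.isfile(python_header_path())


if has_python_headers():
    print("Found", python_header_path())
else:
    print("Missing", python_header_path(), "- install your distribution's Python devel package")
```

Run it with the same interpreter that pip uses (here /usr/bin/python3.6), since each Python installation has its own include directory.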
