
soupy's Issues

setup error

Thanks for your package.

After I finished setup, running import soupy fails with the following error:

>>> import soupy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/soupy.py", line 139, in <module>
    @six.python_2_unicode_compatible
AttributeError: 'module' object has no attribute 'python_2_unicode_compatible'

Do you have any ideas on how to fix it?

Thanks for your help.
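
One possible cause, offered as an assumption rather than a confirmed diagnosis: the six installed under /Library/Python/2.7 may predate the python_2_unicode_compatible decorator, in which case upgrading six would be the usual remedy. The sketch below uses a hypothetical helper, check_module_attr (not part of soupy or six), to tell a missing module apart from one that is installed but lacks an attribute.

```python
import importlib


def check_module_attr(module_name, attr):
    """Report whether `module_name` imports and exposes `attr`.

    Hypothetical helper for illustration; not part of soupy or six.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "missing"
    return "ok" if hasattr(module, attr) else "outdated"


# "outdated" would suggest upgrading six, e.g. with: pip install --upgrade six
print(check_module_attr("six", "python_2_unicode_compatible"))
```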

Add arguments to require()

Currently require simply checks whether a Node is null. It should accept a function and assert that the function evaluates to true when mapped over the data:

dom.find('a').require(Q['href'].startswith('https'))
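
A minimal sketch of how the proposed behaviour might look. The Node and NullValueError classes here are simplified stand-ins written for this example, not soupy's actual classes, and a plain lambda stands in for the Q expression.

```python
class NullValueError(ValueError):
    """Raised when a required value is missing or fails its check."""


class Node(object):
    """Stand-in for a soupy wrapper; holds a value, or None when null."""

    def __init__(self, value):
        self._value = value

    def require(self, predicate=None):
        # Existing behaviour: fail only when the node is null.
        if self._value is None:
            raise NullValueError("required node is null")
        # Proposed behaviour: also fail when the predicate is not truthy.
        if predicate is not None and not predicate(self._value):
            raise NullValueError("predicate failed for %r" % (self._value,))
        return self


link = Node({"href": "https://example.com"})
# Passes the check, so require() hands back the same node for chaining.
checked = link.require(lambda attrs: attrs["href"].startswith("https"))
```

Returning self keeps require usable in the middle of a method chain, which matches how the proposal calls it.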

Make wrappers hashable

Right now things like Scalar don't hash like they ought to:

Scalar(2) in {Scalar(2)}  # False
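
One way this could be addressed is to define __hash__ alongside __eq__ so that both delegate to the wrapped value. The Scalar below is a simplified stand-in, and it assumes __eq__ returns a plain bool, which may differ from soupy's own operator overloading.

```python
class Scalar(object):
    """Stand-in wrapper whose equality and hash follow the wrapped value."""

    def __init__(self, value):
        self._value = value

    def __eq__(self, other):
        return isinstance(other, Scalar) and self._value == other._value

    def __ne__(self, other):
        # Needed on Python 2, where != does not fall back to __eq__.
        return not self == other

    def __hash__(self):
        # Include the class name so Scalar(2) and a bare 2 hash differently.
        return hash((type(self).__name__, self._value))


print(Scalar(2) in {Scalar(2)})  # True
```

A wrapper around an unhashable value such as a list would still raise TypeError when hashed, which mirrors how the underlying value behaves.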

Support getattr as a find alias on Node?

BeautifulSoup treats attribute access as an alias for find:

dom.a.b # == dom.find('a').find('b')

Soupy doesn't do this yet. The terseness is nice, but it has a few downsides:

  • It's another syntax for doing the same thing
  • You still need find for its extra args/kwargs support
  • It's possible that the name of a tag collides with another soupy method name, which will trip you up.

A few libraries do this (numpy recarrays, pandas), and I always have mixed feelings about it.
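
To make the trade-off concrete, here is a rough sketch of the alias using __getattr__. The Node class is a toy stand-in (a tag name plus children), not soupy's or BeautifulSoup's implementation.

```python
class Node(object):
    """Stand-in node: a tag name plus child nodes."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Depth-first search for the first descendant with this tag name.
        for child in self.children:
            if child.name == name:
                return child
            found = child.find(name)
            if found is not None:
                return found
        return None

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so real
        # attributes and methods always win over tag names.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


dom = Node("html", [Node("a", [Node("b")])])
print(dom.a.b is dom.find("a").find("b"))  # True

# The collision downside: a tag named "find" is shadowed by the method.
clash = Node("html", [Node("find")])
print(callable(clash.find))  # True
```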

Support multiple inputs to each

each currently takes a single function, which it maps over the collection of items. each could instead take N functions as input, map each one, and pack the results as a Collection of N-tuples. That's more symmetric with what dump() does -- each builds unlabeled tuples, dump builds labeled dicts.
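
A minimal sketch of the idea as a free function over a plain list. The name each and its signature are illustrative only; a real version would live on Collection and return wrapped values.

```python
def each(items, *funcs):
    """Map one or more functions over items.

    One function gives a flat list; several give a list of tuples.
    """
    if not funcs:
        raise TypeError("each() needs at least one function")
    if len(funcs) == 1:
        return [funcs[0](item) for item in items]
    return [tuple(func(item) for func in funcs) for item in items]


words = ["soup", "spoon"]
print(each(words, len))             # [4, 5]
print(each(words, len, str.upper))  # [(4, 'SOUP'), (5, 'SPOON')]
```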

dump should create a tuple if passed args

cc @cryzed

Calling dump on a Node currently applies each kwarg to the node and packs the results into a dict. Alternatively, it could accept positional args, apply each to the node, and return the results as a tuple:

>>> node.dump(href=Q.attrs['href'], cls=Q.attrs['class'])
Scalar({'href': 'https://www.google.com/imghp?hl=en&tab=wi', 'cls': ['gb1']})
>>> node.dump(Q.attrs['href'], Q.attrs['class'])
Scalar(('https://www.google.com/imghp?hl=en&tab=wi', ['gb1']))

To keep things simple, using both args and kwargs is a ValueError:

>>> node.dump(Q.attrs['href'], cls=Q.attrs['class'])
ValueError("Cannot pass both arguments and keywords to dump")
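
The dispatch described above could be sketched roughly as follows. This is a standalone function over a plain dict standing in for a node's attrs, with operator.itemgetter standing in for the Q expressions. It raises ValueError as the prose states, and it returns plain values where the real method would presumably wrap them in a Scalar.

```python
from operator import itemgetter


def dump(node, *args, **kwargs):
    """Apply each function to node; args give a tuple, kwargs give a dict."""
    if args and kwargs:
        raise ValueError("Cannot pass both arguments and keywords to dump")
    if args:
        return tuple(func(node) for func in args)
    return dict((name, func(node)) for name, func in kwargs.items())


attrs = {"href": "https://www.google.com/imghp?hl=en&tab=wi", "class": ["gb1"]}
as_tuple = dump(attrs, itemgetter("href"), itemgetter("class"))
as_dict = dump(attrs, href=itemgetter("href"), cls=itemgetter("class"))
```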

Easier dict(zip(colnames, values))

A common pattern in my old BeautifulSoup scripts is to extract column names from the header of a table, and then repeatedly call dict(zip(names, values)) for each row. A couple of proposals to do that with soupy:

# should work now, cumbersome
cols.each(Q.text).map(lambda vals: dict(zip(names, vals)))

# new method
cols.dictzip(names, Q.text)
cols.each(Q.text).dictzip(names)

# more general zip + mapping
cols.each(Q.text).zip(names).map(reversed).map(dict)

# overload dump -- don't like this
c.each(Q.find_all('td').dump(names, Q.text))

I think I like the idea of adding zip, and then also adding the second version of dictzip, which is implemented using zip.
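
A rough sketch of that combination, with dictzip built on top of zip. The Collection class is a thin stand-in over a list, the sample row is made up for illustration, and dictzip returns a plain dict here where the real method would presumably return a wrapped value.

```python
class Collection(object):
    """Stand-in for a soupy Collection: a thin wrapper over a list."""

    def __init__(self, items):
        self._items = list(items)

    def zip(self, *others):
        # Pair items positionally with one or more other iterables.
        return Collection(zip(self._items, *others))

    def dictzip(self, keys):
        # Built on zip: items become the values, `keys` supply the keys.
        return dict((key, value) for value, key in self.zip(keys)._items)


names = ["name", "price"]            # illustrative column names
row = Collection(["spoon", "2.50"])  # illustrative row values
print(row.dictzip(names) == {"name": "spoon", "price": "2.50"})  # True
```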

Better repr for Q-expressions

If an exception is raised when evaluating a Q expression, the traceback is pretty opaque

<ipython-input-7-fcabd18eb998> in iter_page(gene, dom)
     26     rows = (table.find('tbody')
     27             .find_all('tr', recursive=False)[1:]  # first row is junk
---> 28             .each(Q.find_all('td').each(Q.text.replace('\xa0', ' ')).dictzip(column_names))
     29             )
     30 

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in each(self, func)
    467         """
    468         func = _make_callable(func)
--> 469         return Collection(imap(func, self._items))
    470 
    471     def filter(self, func):

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __init__(self, items)
    433 
    434     def __init__(self, items):
--> 435         super(Collection, self).__init__(list(items))
    436         self._items = self._value
    437         self._assert_items_are_wrappers()

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __eval__(self, val)
   1293     def __eval__(self, val):
   1294         for item in self._items:
-> 1295             val = item.__eval__(val)
   1296         return val
   1297 

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __eval__(self, val)
   1246 
   1247     def __eval__(self, val):
-> 1248         return val.__call__(*self._args, **self._kwargs)
   1249 
   1250 

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in each(self, func)
    467         """
    468         func = _make_callable(func)
--> 469         return Collection(imap(func, self._items))
    470 
    471     def filter(self, func):

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __init__(self, items)
    433 
    434     def __init__(self, items):
--> 435         super(Collection, self).__init__(list(items))
    436         self._items = self._value
    437         self._assert_items_are_wrappers()

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __eval__(self, val)
   1293     def __eval__(self, val):
   1294         for item in self._items:
-> 1295             val = item.__eval__(val)
   1296         return val
   1297 

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __eval__(self, val)
   1246 
   1247     def __eval__(self, val):
-> 1248         return val.__call__(*self._args, **self._kwargs)
   1249 
   1250 

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __call__(self, *args, **kwargs)
    355 
    356     def __call__(self, *args, **kwargs):
--> 357         return self.map(operator.methodcaller('__call__', *args, **kwargs))
    358 
    359     def __eq__(self, other):

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in map(self, func)
    189             Scalar(6)
    190         """
--> 191         return Wrapper.wrap(_make_callable(func)(self._value))
    192 
    193     def apply(self, func):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

It would probably be easier to parse if Q expressions had a better repr and somehow added a hint to the traceback about which step failed.
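
One possible shape for this, sketched with toy classes: each step carries a display label, the chain's repr joins the labels, and evaluation wraps failures in an error that names the failing step. Chain, Step, and then are hypothetical names for this example and do not reflect how soupy's Q is implemented.

```python
class Step(object):
    """One link in a chain: a label for display plus the function to run."""

    def __init__(self, label, func):
        self.label = label
        self.func = func


class Chain(object):
    """Stand-in for a Q expression: an ordered list of steps."""

    def __init__(self, steps=()):
        self._steps = list(steps)

    def then(self, label, func):
        # Return a new chain so existing expressions stay unchanged.
        return Chain(self._steps + [Step(label, func)])

    def __repr__(self):
        return "Q" + "".join(step.label for step in self._steps)

    def __call__(self, value):
        for index, step in enumerate(self._steps):
            try:
                value = step.func(value)
            except Exception as exc:
                # Name the failing step and the part that already succeeded.
                done = "".join(s.label for s in self._steps[:index])
                raise RuntimeError(
                    "%r failed at step %r (after Q%s): %r"
                    % (self, step.label, done, exc)
                )
        return value


expr = Chain().then(".text", lambda d: d["text"]).then(".upper()", lambda s: s.upper())
print(repr(expr))  # Q.text.upper()
```

With something like this, the last line of a long traceback would name the step that raised instead of leaving the reader to decode nested __eval__ frames.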
