
soupy's People

Contributors

chrisbeaumont, cryzed

soupy's Issues

Easier dict(zip(colnames, values))

A common pattern among my old bs scripts is to extract column names from the header of a table, and then repeatedly call dict(zip(names, values)) for each row. A couple of proposals to do that with soupy:

# should work now, cumbersome
cols.each(Q.text).map(lambda vals: dict(zip(names, vals)))

# new method
cols.dictzip(names, Q.text)
cols.each(Q.text).dictzip(names)

# more general zip + mapping
cols.each(Q.text).zip(names).map(reversed).map(dict)

# overload dump -- don't like this
c.each(Q.find_all('td').dump(names, Q.text))

I think I like the idea of adding zip, and then also adding the second version of dictzip, implemented using zip.
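
As a rough illustration in plain Python (no soupy involved), the zip-then-dict idea could look like the sketch below. The helper name dictzip mirrors the proposal, but the function body and the sample data are hypothetical and not soupy's implementation.

```python
def dictzip(names, values):
    """Pair each column name with the value in the same position."""
    return dict(zip(names, values))


# Hypothetical header and rows, given as plain lists of cell text.
names = ["gene", "score"]
rows = [["BRCA1", "0.9"], ["TP53", "0.7"]]

# One dict per row, keyed by column name.
records = [dictzip(names, row) for row in rows]
print(records)
```

Because zip stops at the shorter input, a row with extra cells silently loses them; a real method might want to check lengths first.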

setup error

Thanks for your package.

After finishing the setup, running import soupy raises an error:

>>> import soupy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/soupy.py", line 139, in <module>
    @six.python_2_unicode_compatible
AttributeError: 'module' object has no attribute 'python_2_unicode_compatible'

Do you have any ideas on how to fix it?

Thanks for your help.
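
One way to narrow this down is to check whether the installed six actually exposes the decorator. An older six release that predates it is a plausible cause, though that is an assumption and not something confirmed in this report. The helper name module_has_attribute is made up for this sketch.

```python
import importlib


def module_has_attribute(module_name, attribute):
    """Return True or False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attribute)


# False here would suggest the installed six lacks the decorator.
print(module_has_attribute("six", "python_2_unicode_compatible"))
```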

Support multiple inputs to each

each currently takes a single function, which it maps over the collection of items. It could instead take N functions as input, map each one, and pack the results as a Collection of N-tuples. That would be more symmetric with what dump() does -- each builds unlabeled tuples, dump builds labeled dicts.
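
A minimal sketch of the idea on plain lists, assuming the single-function case should keep its current shape. This free function is hypothetical and only stands in for the proposed method; it is not soupy's each.

```python
def each(items, *funcs):
    """Map one or more functions over items.

    With a single function the results are returned as-is; with several,
    each item yields a tuple holding one result per function.
    """
    if len(funcs) == 1:
        return [funcs[0](item) for item in items]
    return [tuple(func(item) for func in funcs) for item in items]


words = ["soup", "parser"]
print(each(words, len))             # one result per item
print(each(words, str.upper, len))  # one tuple per item
```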

Add arguments to require()

Currently require simply checks if a Node is null. It should accept a function, and assert that the function evaluates to true when mapped on the data.

dom.find('a').require(Q['href'].startswith('https'))
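
A sketch of how an optional predicate might fit into require, using a toy Node class and not soupy's own. Raising ValueError is an assumption made for this example; the issue does not say which exception should be used.

```python
class Node(object):
    """Toy stand-in for a wrapped value; not soupy's Node."""

    def __init__(self, value):
        self.value = value

    def require(self, predicate=None):
        """Return self if the checks pass, otherwise raise ValueError."""
        if self.value is None:
            raise ValueError("required value is missing")
        if predicate is not None and not predicate(self.value):
            raise ValueError("value failed the required condition: %r" % (self.value,))
        return self


link = Node("https://example.com")
# Passes, so require hands back the same node for further chaining.
checked = link.require(lambda href: href.startswith("https"))
```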

Make wrappers hashable

Right now things like Scalar don't hash like they ought to:

Scalar(2) in {Scalar(2)}  # False
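
For set membership to work, equal wrappers need equal hashes. The toy class below sketches one way to get that by defining __eq__ and __hash__ together; it is not soupy's Scalar, and it assumes the wrapped value is itself hashable.

```python
class Scalar(object):
    """Toy wrapper around a single value; not soupy's Scalar."""

    def __init__(self, value):
        self._value = value

    def __eq__(self, other):
        return isinstance(other, Scalar) and self._value == other._value

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Equal wrappers must hash equally, so hash the wrapped value.
        return hash((Scalar, self._value))


print(Scalar(2) in {Scalar(2)})
```

Wrapping an unhashable value such as a list would make hash() raise TypeError here, so a real implementation would have to decide how to treat those.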

Better repr for Q-expressions

If an exception is raised when evaluating a Q expression, the traceback is pretty opaque:

<ipython-input-7-fcabd18eb998> in iter_page(gene, dom)
     26     rows = (table.find('tbody')
     27             .find_all('tr', recursive=False)[1:]  # first row is junk
---> 28             .each(Q.find_all('td').each(Q.text.replace('\xa0', ' ')).dictzip(column_names))
     29             )
     30 

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in each(self, func)
    467         """
    468         func = _make_callable(func)
--> 469         return Collection(imap(func, self._items))
    470 
    471     def filter(self, func):

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __init__(self, items)
    433 
    434     def __init__(self, items):
--> 435         super(Collection, self).__init__(list(items))
    436         self._items = self._value
    437         self._assert_items_are_wrappers()

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __eval__(self, val)
   1293     def __eval__(self, val):
   1294         for item in self._items:
-> 1295             val = item.__eval__(val)
   1296         return val
   1297 

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __eval__(self, val)
   1246 
   1247     def __eval__(self, val):
-> 1248         return val.__call__(*self._args, **self._kwargs)
   1249 
   1250 

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in each(self, func)
    467         """
    468         func = _make_callable(func)
--> 469         return Collection(imap(func, self._items))
    470 
    471     def filter(self, func):

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __init__(self, items)
    433 
    434     def __init__(self, items):
--> 435         super(Collection, self).__init__(list(items))
    436         self._items = self._value
    437         self._assert_items_are_wrappers()

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __eval__(self, val)
   1293     def __eval__(self, val):
   1294         for item in self._items:
-> 1295             val = item.__eval__(val)
   1296         return val
   1297 

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __eval__(self, val)
   1246 
   1247     def __eval__(self, val):
-> 1248         return val.__call__(*self._args, **self._kwargs)
   1249 
   1250 

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in __call__(self, *args, **kwargs)
    355 
    356     def __call__(self, *args, **kwargs):
--> 357         return self.map(operator.methodcaller('__call__', *args, **kwargs))
    358 
    359     def __eq__(self, other):

/Users/cbeaumont/anaconda/lib/python2.7/site-packages/soupy.pyc in map(self, func)
    189             Scalar(6)
    190         """
--> 191         return Wrapper.wrap(_make_callable(func)(self._value))
    192 
    193     def apply(self, func):

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

It would probably be easier to parse if Q expressions could better repr themselves, and somehow add a hint to the traceback about which step failed.
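
To make the idea concrete, here is a toy deferred expression that records its steps, prints itself readably, and names the failing step when evaluation breaks. The Expr class, its evaluate method, and the use of RuntimeError are all assumptions for this sketch and do not reflect how soupy's Q is built.

```python
class Expr(object):
    """Toy deferred expression that records attribute lookups and calls."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def __getattr__(self, name):
        # Only reached for names that are not real attributes of Expr.
        return Expr(self._steps + (("attr", name),))

    def __call__(self, *args):
        return Expr(self._steps + (("call", args),))

    def __repr__(self):
        text = "Q"
        for kind, payload in self._steps:
            if kind == "attr":
                text += "." + payload
            else:
                text += "(" + ", ".join(repr(a) for a in payload) + ")"
        return text

    def evaluate(self, value):
        """Replay the recorded steps on value, naming the step that fails."""
        for index, (kind, payload) in enumerate(self._steps):
            try:
                if kind == "attr":
                    value = getattr(value, payload)
                else:
                    value = value(*payload)
            except Exception as exc:
                partial = Expr(self._steps[: index + 1])
                raise RuntimeError("%r failed at step %r: %r" % (self, partial, exc))
        return value


Q = Expr()
expr = Q.strip().upper()
print(repr(expr))
print(expr.evaluate("  hi  "))
```

With this shape, an error message can point at the prefix of the chain that was being evaluated, without leaving the reader to infer it from nested frames.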

Support getattr as a find alias on Node?

BS turns attribute access into an alias for find:

dom.a.b # == dom.find('a').find('b')

Soupy doesn't do this yet. The terseness is nice, but it has a few downsides:

  • It's another syntax for doing the same thing
  • You still need find for its extra args/kwargs support
  • It's possible that the name of a tag collides with another soupy method name, which will trip you up.

A few libraries do this (numpy recarrays, pandas), and I always have mixed feelings about it.
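
The name-collision downside is easy to reproduce with a toy tree. The Node class below is a hypothetical sketch, not soupy's or BeautifulSoup's code; it routes unknown attributes to find via __getattr__, which Python only calls when normal lookup fails.

```python
class Node(object):
    """Toy tree node with attribute access as a find alias."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        """Return the first descendant with the given tag name, or None."""
        for child in self.children:
            if child.name == name:
                return child
            match = child.find(name)
            if match is not None:
                return match
        return None

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


# A tree containing a tag that happens to be called "find".
dom = Node("html", [Node("a", [Node("b")]), Node("find")])

print(dom.a.b is dom.find("a").find("b"))
# dom.find is still the method, so the <find> tag needs the explicit call.
print(dom.find("find").name)
```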

dump should create a tuple if passed args

cc @cryzed

Calling dump on a Node currently applies each kwarg to the node, and packs the result into a dict. It could alternatively accept args, apply each to the node, and return the result as a tuple:

>>> node.dump(href=Q.attrs['href'], cls=Q.attrs['class'])
Scalar({'href': 'https://www.google.com/imghp?hl=en&tab=wi', 'cls': ['gb1']})
>>> node.dump(Q.attrs['href'], Q.attrs['class'])
Scalar(('https://www.google.com/imghp?hl=en&tab=wi', ['gb1']))

To keep things simple, using both args and kwargs is a ValueError:

>>> node.dump(Q.attrs['href'], cls=Q.attrs['class'])
ValueError("Cannot pass both arguments and keywords to dump")
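
The proposed behavior can be sketched as a plain function over a dict that stands in for a node. The function below is hypothetical and not soupy's dump; it raises ValueError for mixed arguments, following the wording of the proposal.

```python
def dump(node, *args, **kwargs):
    """Apply functions to node: positional ones give a tuple, keyword ones a dict."""
    if args and kwargs:
        raise ValueError("Cannot pass both arguments and keywords to dump")
    if args:
        return tuple(func(node) for func in args)
    return dict((name, func(node)) for name, func in kwargs.items())


# Hypothetical node represented as a plain dict of attributes.
node = {"href": "https://www.google.com/imghp?hl=en&tab=wi", "class": ["gb1"]}


def get_href(n):
    return n["href"]


def get_class(n):
    return n["class"]


print(dump(node, get_href, get_class))          # tuple form
print(dump(node, href=get_href, cls=get_class))  # dict form
```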
