coleifer / micawber
a small library for extracting rich content from urls
Home Page: http://micawber.readthedocs.org/
License: MIT License
The most recent commit is a valuable addition and it would be great to see it rolled into an "official" release. In the meantime, deploying via git is working.
I'm considering migrating to micawber from a custom oEmbed consumer, and wanted to suggest a performance improvement that I am willing to submit a PR for.
I'd like to extend the ProviderRegistry with a secondary internal registry that nests providers under domain names. This would allow users to optionally avoid a regex match against every provider and only test the providers registered for the URL's domain.
Some light tests on a quick mockup showed lookups running in about 30% of the time, including the overhead of parsing the domain name from the URL, and in about 5% of the time if you already have the domain.
We would be using this on a high-volume indexer, so this performance matters to us.
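A minimal sketch of the domain-keyed lookup proposed above. The class name, method signatures, and the string stand-ins for providers are all hypothetical and are not part of micawber; the point is only to show that a lookup tests the patterns filed under one domain instead of every registered pattern.

```python
import re
from urllib.parse import urlparse


class DomainIndexedRegistry:
    """Hypothetical registry that buckets provider patterns by domain."""

    def __init__(self):
        # Maps a domain to a list of (compiled regex, provider) pairs.
        self._by_domain = {}

    def register(self, domain, regex, provider):
        bucket = self._by_domain.setdefault(domain, [])
        bucket.append((re.compile(regex), provider))

    def provider_for_url(self, url, domain=None):
        # Parse the domain only when the caller has not supplied it.
        if domain is None:
            domain = urlparse(url).hostname or ""
        # Only the patterns filed under this domain are tested.
        for pattern, provider in self._by_domain.get(domain, []):
            if pattern.match(url):
                return provider
        return None


registry = DomainIndexedRegistry()
# Plain strings stand in for real provider objects in this sketch.
registry.register("vimeo.com", r"https?://vimeo\.com/\S+", "vimeo-provider")
registry.register("www.youtube.com", r"https?://www\.youtube\.com/watch\S+", "youtube-provider")
```

Passing `domain=` skips the URL parsing step, which corresponds to the "if you already have the domain" case in the timings quoted above.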
Hi!
I mistakenly reported this to nikola first (to add more features), so I'm now reporting it here as a feature request:
It would be great to integrate videos and streams from https://media.ccc.de into this library.
The service is run by the German hacker association Chaos Computer Club (CCC), which hosts annual events itself and lends streaming expertise to many external events via its Video Operation Center (VOC).
The streaming service is a valuable source of information on many different topics and I think it would be an awesome addition!
If you have pointers on where I can add it (I assume somewhere in providers.py), I might be able to do a pull request myself. I wouldn't call myself a Python expert though :-)
Hello,
There is a security concern that oEmbed solutions generally do not address: if you use them to display media from user input, you have to guard against malicious users who fill their input with dozens or hundreds of links in order to clutter other viewers' pages.
So I wonder if there is a simple way with micawber to limit the number of links parsed.
Thanks
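One way this could be handled outside the library, as a rough sketch: cut the input after a fixed number of URLs, hand only the first part to the embedding parser, and render the rest as plain text. The function name, the limit, and the deliberately simple URL pattern are assumptions made for illustration; none of this is micawber API.

```python
import re

# Deliberately simple URL pattern; an assumption for this sketch.
URL_RE = re.compile(r"https?://\S+")


def split_after_n_links(text, max_links=5):
    """Split text into (embeddable, remainder) after max_links URLs."""
    matches = list(URL_RE.finditer(text))
    if len(matches) <= max_links:
        return text, ""
    # Cut right before the first URL that exceeds the limit.
    cut = matches[max_links].start()
    return text[:cut], text[cut:]


sample = "a http://x.com/1 b http://x.com/2 c http://x.com/3"
head, tail = split_after_n_links(sample, max_links=2)
```

Only `head` would then be passed to the oEmbed parser, while `tail` is escaped and shown as ordinary text.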
I am trying to embed that video in my nikola blog post.
The video is embedded over http no matter what configuration I use.
I tried both http and https with the following syntax:
# With http
.. media:: http://www.dailymotion.com/video/x1apjif_une-arbalete-de-poche-fabriquee-manuellement_tv
# With https
.. media:: https://www.dailymotion.com/video/x1apjif_une-arbalete-de-poche-fabriquee-manuellement_tv
The problem is that the video is hidden by Firefox when I use the https version of my blog.
Is this a bug in micawber or, as @RAISINA mentioned here, is it a dailymotion issue?
If it can be of any help, I am currently using:
Here is my original issue
Since 0.3.7 I have had trouble running the tests during packaging.
running test
running egg_info
writing micawber.egg-info/PKG-INFO
writing dependency_links to micawber.egg-info/dependency_links.txt
writing top-level names to micawber.egg-info/top_level.txt
reading manifest file 'micawber.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'micawber.egg-info/SOURCES.txt'
running build_ext
test_extract (micawber.tests.ParserTestCase) ... ok
test_html_entities (micawber.tests.ParserTestCase) ... ok
test_multiline (micawber.tests.ParserTestCase) ... ok
test_multiline_full (micawber.tests.ParserTestCase) ... ok
test_outside_of_markup (micawber.tests.ParserTestCase) ... ok
test_parse_text (micawber.tests.ParserTestCase) ... ok
test_parse_text_full (micawber.tests.ParserTestCase) ... ok
test_urlize (micawber.tests.ParserTestCase) ... ok
test_caching (micawber.tests.ProviderTestCase) ... ok
test_caching_params (micawber.tests.ProviderTestCase) ... ok
test_invalid_json (micawber.tests.ProviderTestCase) ... ok
test_multiple_matches (micawber.tests.ProviderTestCase) ... ok
test_provider (micawber.tests.ProviderTestCase) ... ok
test_provider_matching (micawber.tests.ProviderTestCase) ... ok
test_register_unregister (micawber.tests.ProviderTestCase) ... ok
----------------------------------------------------------------------
Ran 15 tests in 0.082s
OK
Running micawber tests
All micawber tests passed
Running django integration tests
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/django/apps/config.py", line 118, in create
cls = getattr(mod, cls_name)
AttributeError: module 'micawber.contrib.mcdjango' has no attribute 'mcdjango_tests'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "setup.py", line 35, in <module>
test_suite='runtests.runtests',
File "/usr/lib/python3.7/site-packages/setuptools/__init__.py", line 140, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.7/site-packages/setuptools/command/test.py", line 228, in run
self.run_tests()
File "/usr/lib/python3.7/site-packages/setuptools/command/test.py", line 250, in run_tests
exit=False,
File "/usr/lib/python3.7/unittest/main.py", line 100, in __init__
self.parseArgs(argv)
File "/usr/lib/python3.7/unittest/main.py", line 147, in parseArgs
self.createTests()
File "/usr/lib/python3.7/unittest/main.py", line 159, in createTests
self.module)
File "/usr/lib/python3.7/unittest/loader.py", line 220, in loadTestsFromNames
suites = [self.loadTestsFromName(name, module) for name in names]
File "/usr/lib/python3.7/unittest/loader.py", line 220, in <listcomp>
suites = [self.loadTestsFromName(name, module) for name in names]
File "/usr/lib/python3.7/unittest/loader.py", line 205, in loadTestsFromName
test = obj()
File "/build/python-micawber/src/python-micawber-0.3.7/runtests.py", line 80, in runtests
dj_failures = run_django_tests()
File "/build/python-micawber/src/python-micawber-0.3.7/runtests.py", line 60, in run_django_tests
setup()
File "/usr/lib/python3.7/site-packages/django/__init__.py", line 24, in setup
apps.populate(settings.INSTALLED_APPS)
File "/usr/lib/python3.7/site-packages/django/apps/registry.py", line 89, in populate
app_config = AppConfig.create(entry)
File "/usr/lib/python3.7/site-packages/django/apps/config.py", line 123, in create
import_module(entry)
File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'micawber.contrib.mcdjango.mcdjango_tests'
It seems currently only the templates of mcdjango are packaged.
The last version on PyPI is from 2015. Could a more recent version be uploaded?
I have a Markdown RichText field in my Django app, and I use micawber to convert video links into embedded videos. However, I only want micawber to convert links that sit on their own line into embedded media. I don't want it to convert my Markdown links; the Markdown converter will take care of those.
So far the text is first run through an oembed_no_urlize function as described in your documentation:
from micawber.contrib.mcdjango import extension
oembed_no_urlize = extension('oembed', urlize_all=False)
Inline YouTube links are still oEmbed converted though, so a Markdown link like
[5 minutter og ti sekunder](http://www.youtube.com/watch?v=chbOViRudAg&t=5m10s)
is converted into
<a href='a href="http://www.youtube.com/watch?v=chbOViRudAg&t=5m10s" title="Joo Sae Hyuk Vs Chuang Chih Yuan: WTTC 2014: 1/4 Final AMAZING MATCH">Joo Sae Hyuk Vs Chuang Chih Yuan: WTTC 2014: 1/4 Final AMAZING MATCH</a'>5 minutter og ti sekunder</a>
when first converted by micawber and then markdown.
Is it possible to disable all inline conversion?
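A possible workaround, sketched with the standard library only: pre-process the text so that only URLs standing alone on a line are handed to a rendering callback, and inline URLs (such as those inside Markdown links) are left untouched. The function name and the `render` callback are hypothetical; in practice the callback might look up the oEmbed HTML for the URL.

```python
import re

# A line counts as "standalone" when it holds one URL and nothing else.
STANDALONE_URL = re.compile(r"^[ \t]*(https?://\S+)[ \t]*$", re.MULTILINE)


def embed_standalone_urls(text, render):
    """Apply render(url) to URLs alone on a line; leave inline URLs as-is."""
    return STANDALONE_URL.sub(lambda m: render(m.group(1)), text)


source = (
    "[clip](http://www.youtube.com/watch?v=abc)\n"
    "http://www.youtube.com/watch?v=abc\n"
)
# A placeholder renderer stands in for a real oEmbed lookup.
converted = embed_standalone_urls(source, lambda url: "<embed %s>" % url)
```

The Markdown link on the first line starts with `[`, so it never matches, and the Markdown converter can process it afterwards.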
Hello! Which versions of Django does micawber support?
With Django 1.9.x I now get RemovedInDjango110Warning warnings in the log.
.../site-packages/django/template/loader.py:97: RemovedInDjango110Warning: render() must be called with a dict, not a Context.
return template.render(context, request)
This is caused by the render_to_string function. I looked through the Django 1.8-1.10 docs, and it looks like this function really expects a dict.
At present if no provider is found for a URL, and urlize_all is True, the urlize function appears to always be called which renders a simple link. There doesn't appear to be a way to change this.
I'd like to be able to customize this fallback behavior - perhaps by passing in a function as is done with the handlers - for example if I want to render the link with target="_blank", or use the domain instead of the full URL in the title, etc.
Is there a way to do this at present?
Hi Charles, this is a separate ticket to continue this discussion.
We added the description of Iframely's approach to providers here: https://iframely.com/docs/providers.
Though our preference would be to bootstrap for all URLs as Iframely can generate summary cards, handle link shorteners, detect direct image links, etc.
Another issue is the API endpoint address:
Any suggestions?
Using Python 3.4.2 in ubuntu
In [1]: import micawber
In [2]: micawber.bootstrap_embedly()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-2419c7664bfa> in <module>()
----> 1 micawber.bootstrap_embedly()
/home/tin/.virtualenvs/waliki/lib/python3.4/site-packages/micawber/providers.py in bootstrap_embedly(cache, **params)
203 resp.close()
204
--> 205 json_data = json.loads(contents)
206
207 for provider_meta in json_data:
/usr/lib/python3.4/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
310 if not isinstance(s, str):
311 raise TypeError('the JSON object must be str, not {!r}'.format(
--> 312 s.__class__.__name__))
313 if s.startswith(u'\ufeff'):
314 raise ValueError("Unexpected UTF-8 BOM (decode using utf-8-sig)")
TypeError: the JSON object must be str, not 'bytes'
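The traceback shows json.loads receiving bytes, which it rejects on Python 3.4. A minimal sketch of the usual remedy is to decode the response body before parsing. The helper name and the UTF-8 default are assumptions for illustration, not the library's actual fix.

```python
import json


def load_json_response(contents, encoding="utf-8"):
    """Parse a JSON payload that may arrive as bytes or as str."""
    if isinstance(contents, bytes):
        # HTTP response bodies are bytes on Python 3; decode them first.
        contents = contents.decode(encoding)
    return json.loads(contents)


providers_meta = load_json_response(b'[{"name": "example"}]')
```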
Using the bootstrap providers results in mixed protocol content in HTTPS sites. I believe using protocol relative urls for providers would fix this...
In [5]: bootstrap_oembedio().request('http://vimeo.com/111410510')
---------------------------------------------------------------------------
ProviderNotFoundException Traceback (most recent call last)
<ipython-input-5-cdc3e12d26dd> in <module>()
----> 1 bootstrap_oembedio().request('http://vimeo.com/111410510')
/home/adas/.virtualenvs/rownosc-info/local/lib/python2.7/site-packages/micawber/providers.pyc in inner(self, url, **params)
91 self.cache.set(key, data)
92 return data
---> 93 return fn(self, url, **params)
94 return inner
95
/home/adas/.virtualenvs/rownosc-info/local/lib/python2.7/site-packages/micawber/providers.pyc in request(self, url, **params)
132 if provider:
133 return provider.request(url, **params)
--> 134 raise ProviderNotFoundException('Provider not found for "%s"' % url)
135
136
ProviderNotFoundException: Provider not found for "http://vimeo.com/111410510"
In [12]: [(k,v) for k,v in bootstrap_oembedio()._registry.items() if 'vimeo' in k]
Out[12]: [(u'vimeo\\.com', <micawber.providers.Provider at 0xb5ebfd0c>)]
Am I doing something wrong?
The other bootstrap functions work:
In [17]: bootstrap_basic().request('http://vimeo.com/111410510')
Out[17]:
{u'author_name': u'Fundacja Picture Doc',
u'author_url': u'http://vimeo.com/user8938954',
u'description': u'Copyright by Fundacja Picture Doc\nCopyright by Fundacja Dialog-Pheniben',
u'duration': 310,
u'height': 720,
u'html': u'<iframe src="//player.vimeo.com/video/111410510" width="1280" height="720" frameborder="0" title="Romowie w Europie. Zag\u0142ada" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>',
u'is_plus': u'1',
u'provider_name': u'Vimeo',
u'provider_url': u'https://vimeo.com/',
u'thumbnail_height': 720,
u'thumbnail_url': u'http://i.vimeocdn.com/video/496100635_1280.jpg',
u'thumbnail_width': 1280,
u'title': u'Romowie w Europie. Zag\u0142ada',
u'type': u'video',
u'uri': u'/videos/111410510',
'url': 'http://vimeo.com/111410510',
u'version': u'1.0',
u'video_id': 111410510,
u'width': 1280}
In [18]: bootstrap_embedly().request('http://vimeo.com/111410510')
Out[18]:
{u'author_name': u'Fundacja Picture Doc',
u'author_url': u'http://vimeo.com/user8938954',
u'description': u'Copyright by Fundacja Picture Doc Copyright by Fundacja Dialog-Pheniben',
u'height': 720,
u'html': u'<iframe class="embedly-embed" src="//cdn.embedly.com/widgets/media.html?src=http%3A%2F%2Fplayer.vimeo.com%2Fvideo%2F111410510&src_secure=1&url=http%3A%2F%2Fvimeo.com%2F111410510&image=http%3A%2F%2Fi.vimeocdn.com%2Fvideo%2F496100635_1280.jpg&type=text%2Fhtml&schema=vimeo" width="1280" height="720" scrolling="no" frameborder="0" allowfullscreen></iframe>',
u'provider_name': u'Vimeo',
u'provider_url': u'https://vimeo.com/',
u'thumbnail_height': 720,
u'thumbnail_url': u'http://i.vimeocdn.com/video/496100635_1280.jpg',
u'thumbnail_width': 1280,
u'title': u'Romowie w Europie. Zag\u0142ada',
u'type': u'video',
'url': 'http://vimeo.com/111410510',
u'version': u'1.0',
u'width': 1280}
In [19]: bootstrap_noembed().request('http://vimeo.com/111410510')
Out[19]:
{u'author_name': u'Fundacja Picture Doc',
u'author_url': u'http://vimeo.com/user8938954',
u'description': u'Copyright by Fundacja Picture Doc\nCopyright by Fundacja Dialog-Pheniben',
u'duration': 310,
u'height': 720,
u'html': u'\n<div class="noembed-embed ">\n <div class="noembed-wrapper">\n \n<div class="noembed-embed-inner noembed-vimeo">\n <iframe src="//player.vimeo.com/video/111410510" width="1280" height="720" frameborder="0" title="Romowie w Europie. Zag\u0142ada" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>\n</div>\n\n <table class="noembed-meta-info">\n <tr>\n <td class="favicon"><img src="https://noembed.com/favicon/Vimeo.png"></td>\n <td>Vimeo</td>\n <td align="right">\n <a title="http://vimeo.com/111410510" href="http://vimeo.com/111410510">http://vimeo.com/111410510</a>\n </td>\n </tr>\n </table>\n </div>\n</div>\n',
u'is_plus': u'1',
u'provider_name': u'Vimeo',
u'provider_url': u'https://vimeo.com/',
u'thumbnail_height': 720,
u'thumbnail_url': u'http://i.vimeocdn.com/video/496100635_1280.jpg',
u'thumbnail_width': 1280,
u'title': u'Romowie w Europie. Zag\u0142ada',
u'type': u'video',
u'uri': u'/videos/111410510',
u'url': u'http://vimeo.com/111410510',
u'version': u'1.0',
u'video_id': 111410510,
u'width': 1280}
oembed.io supports it too:
In [23]: requests.get('http://oembed.io/api?url=http://vimeo.com/111410510').json()
Out[23]:
{u'author': u'Fundacja Picture Doc',
u'author_url': u'http://vimeo.com/user8938954',
u'canonical': u'http://vimeo.com/111410510',
u'description': u'Copyright by Fundacja Picture Doc\nCopyright by Fundacja Dialog-Pheniben',
u'duration': 310,
u'html': u'<div class="oembed-widget-container" style="left: 0px; width: 100%; height: 0px; position: relative; padding-bottom: 56%;"><iframe class="oembed-widget oembed-iframe" src="//player.vimeo.com/video/111410510" frameborder="0" style="top: 0px; left: 0px; width: 100%; height: 100%; position: absolute;"></iframe></div>',
u'provider_name': u'Vimeo',
u'thumbnail_height': 720,
u'thumbnail_url': u'http://i.vimeocdn.com/video/496100635_1280.jpg',
u'thumbnail_width': 1280,
u'title': u'Romowie w Europie. Zag\u0142ada',
u'type': u'rich',
u'version': u'1.0'}
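A possible explanation for the ProviderNotFoundException from bootstrap_oembedio() in this report, offered as an assumption and not as a confirmed diagnosis: the registry key shown is the bare pattern vimeo\.com, and a provider_for_url traceback elsewhere on this page shows the registry testing patterns with re.match, which only matches at the start of the string. The sketch below shows the difference; the pattern that includes the scheme is hypothetical.

```python
import re

url = "http://vimeo.com/111410510"
bare_pattern = r"vimeo\.com"                # the key seen in the registry dump
full_pattern = r"https?://vimeo\.com/\S+"   # hypothetical pattern with a scheme

# re.match is anchored at position 0, where the URL has "http://".
anchored = re.match(bare_pattern, url)
# re.search scans the whole string and finds "vimeo.com" inside it.
unanchored = re.search(bare_pattern, url)
# A pattern that starts with the scheme matches from position 0.
with_scheme = re.match(full_pattern, url)
```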
In [3]: requests.get('http://api.embed.ly/1/oembed?url=https%3A%2F%2Fiso.500px.com%2Fguest-curator-joel-julius-tjintjelaar-reveals-three-photographers-that-should-have-a-larger-following%2F&maxwidth=500').json()
Out[3]:
{u'author_name': u'DL Cade',
u'author_url': u'https://iso.500px.com/author/dl/',
u'description': u"One of December's talented 500px Guest Curators was photographer Joel (Julius) Tjintjelaar , and he fully embraced the real purpose of the Editors' Choice section: to unveil photos and photographers that might not have made the Popular page for one reason or another... but probably should have.",
u'provider_name': u'500px',
u'provider_url': u'https://iso.500px.com',
u'thumbnail_height': 1000,
u'thumbnail_url': u'https://isocdn.500px.org/wp-content/uploads/2014/12/julius-1500x1000.jpg',
u'thumbnail_width': 1500,
u'title': u'Guest Curator Joel (Julius) Tjintjelaar Reveals Three Photographers that Should Have a Larger Following',
u'type': u'link',
u'url': u'https://iso.500px.com/guest-curator-joel-julius-tjintjelaar-reveals-three-photographers-that-should-have-a-larger-following/',
u'version': u'1.0'}
In [4]: bootstrap_embedly().request('http://iso.500px.com/guest-curator-joel-julius-tjintjelaar-reveals-three-photographers-that-should-have-a-larger-following/')
---------------------------------------------------------------------------
ProviderNotFoundException Traceback (most recent call last)
<ipython-input-4-aca3a4c8cf6f> in <module>()
----> 1 bootstrap_embedly().request('http://iso.500px.com/guest-curator-joel-julius-tjintjelaar-reveals-three-photographers-that-should-have-a-larger-following/')
/tmp/micawber/local/lib/python2.7/site-packages/micawber/providers.pyc in inner(self, url, **params)
91 self.cache.set(key, data)
92 return data
---> 93 return fn(self, url, **params)
94 return inner
95
/tmp/micawber/local/lib/python2.7/site-packages/micawber/providers.pyc in request(self, url, **params)
132 if provider:
133 return provider.request(url, **params)
--> 134 raise ProviderNotFoundException('Provider not found for "%s"' % url)
135
136
ProviderNotFoundException: Provider not found for "http://iso.500px.com/guest-curator-joel-julius-tjintjelaar-reveals-three-photographers-that-should-have-a-larger-following/"
Suppose you've got the following content:
Testing
http://picasaweb.google.com/lh/sredir?uname=test&target=ALBUM&id=123&authkey=abc
(Note: the link itself is not valid due to mangled IDs (it was a private album))
Rendering this content as follows will not work:
{{post.body|linebreaksbr|oembed_html}}
The reason is that the "&" has been escaped and turned into "&amp;". The HTML parser over at https://github.com/coleifer/micawber/blob/master/micawber/parsers.py#L144 does recognize and extract the URL, but it does not unescape "&amp;". Hence, "&amp;" is fed to embed.ly, resulting in a 404 over there.
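A minimal sketch of the missing step, using only the standard library: unescape HTML entities in an extracted URL before it is sent to a provider. The helper name is hypothetical, and where such a call would belong inside micawber's parser is left open.

```python
from html import unescape


def normalize_extracted_url(url):
    """Turn HTML-escaped entities in an extracted URL back into characters."""
    return unescape(url)


escaped = ("http://picasaweb.google.com/lh/sredir"
           "?uname=test&amp;target=ALBUM&amp;id=123&amp;authkey=abc")
clean = normalize_extracted_url(escaped)
```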
Hi,
I was trying to include Facebook into the basic list of providers. E.g.,
pr.register('https://www.facebook.com/\S*?/posts/\S*', Provider('https://www.facebook.com/plugins/post/oembed.json'))
or
pr.register('https://www.facebook.com/\S*/photos/\S*', Provider('https://www.facebook.com/plugins/post/oembed.json'))
work perfectly fine. However, when I try
pr.register('https://www.facebook.com/photo.php?fbid=\S*', Provider('https://www.facebook.com/plugins/post/oembed.json'))
for a URL like
https://www.facebook.com/photo.php?fbid=10204669368414661&set=a.10201344709340262.1073741826.1849311083&type=3&theater
it always comes back with the message "Provider not found for ..."
What am I doing wrong? Is it the regular expression? Or is it an issue with the endpoints?
Many thanks for any feedback.
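The regular expression is a likely culprit. In the third pattern the "?" after "photo.php" is a regex quantifier that makes the preceding "p" optional, so the literal "?" in the URL is never matched. A short sketch with the standard library, where the escaped pattern is a suggestion and not a tested micawber configuration:

```python
import re

url = ("https://www.facebook.com/photo.php?fbid=10204669368414661"
       "&set=a.10201344709340262.1073741826.1849311083&type=3&theater")

unescaped = r"https://www.facebook.com/photo.php?fbid=\S*"
escaped = r"https://www\.facebook\.com/photo\.php\?fbid=\S*"

# "p?" means "an optional p", so the pattern expects "fbid" right after
# "photo.ph" or "photo.php" and fails on the literal "?" in the URL.
miss = re.match(unescaped, url)
# With "\?" the question mark is matched literally.
hit = re.match(escaped, url)
```

Escaping the dots as well keeps "." from matching arbitrary characters.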
Getting this error when building a Wagtail project: AttributeError: module 'html5lib.treebuilders' has no attribute '_base'
Looks like it's being picked up elsewhere, so hopefully a fix will be released soon.
There is a file conflict between flasgger and micawber, because both install files into the overly generic path name examples.
For reference, please see this Arch Linux bug.
As a solution, micawber and flasgger should either not install these examples at all or, if they are required, install them into a unique directory (e.g. micawber-examples) or another system directory (e.g. on Linux: /usr/share/doc/python-micawber/examples, which is usually done by the packagers).
I will remove them for now to resolve the file conflict.
We get this error for some yet to be clarified reason. Is there a hidden dependency on the BeautifulSoup package?
File "/app/.heroku/python/lib/python3.9/site-packages/micawber/contrib/mcflask.py", line 21, in _oembed
2020-10-25T09:55:22.161763+00:00 app[web.1]: return oembed(s, providers, urlize_all, html, **params)
2020-10-25T09:55:22.161763+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/micawber/contrib/mcflask.py", line 10, in oembed
2020-10-25T09:55:22.161763+00:00 app[web.1]: return Markup(fn(s, providers, urlize_all, **params))
2020-10-25T09:55:22.161764+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/micawber/parsers.py", line 137, in parse_html
2020-10-25T09:55:22.161764+00:00 app[web.1]: raise Exception('Unable to parse HTML, please install BeautifulSoup '
2020-10-25T09:55:22.161764+00:00 app[web.1]: Exception: Unable to parse HTML, please install BeautifulSoup or beautifulsoup4, or use the text parser
If you have encoded HTML characters like &lt; and &gt; inside the same HTML tag as an untagged link, parse_html will decode the encoded characters instead of skipping them. This is inconsistent with the behavior when the encoded character is not inside the same tag as the untagged link, or if the link is already tagged.
from micawber import ProviderRegistry
from micawber import parse_html
text = u'<p>http://www.google.com &lt;script&gt; alert("foo"); &lt;/script&gt;</p>'
parse_html(text, ProviderRegistry())
Output:
u'<p><a href="http://www.google.com">http://www.google.com</a> <script> alert("foo"); </script></p>'
Here the encoded characters are decoded.
text = u'<p><a href="http://www.google.com">http://www.google.com</a> &lt;script&gt; alert("foo"); &lt;/script&gt;</p>'
parse_html(text, ProviderRegistry())
Output:
u'<p><a href="http://www.google.com">http://www.google.com</a> &lt;script&gt; alert("foo"); &lt;/script&gt;</p>'
Here the encoded characters are not decoded.
text = u'<p>&lt;script&gt; alert("foo"); &lt;/script&gt;</p>'
parse_html(text, ProviderRegistry())
Output:
u'<p>&lt;script&gt; alert("foo"); &lt;/script&gt;</p>'
Here the encoded characters are not decoded.
python2.7
Package Version
----------------------------- -------
backports.functools-lru-cache 1.5
beautifulsoup4 4.8.1
micawber 0.5.0
pip 19.2.3
pkg-resources 0.0.0
setuptools 41.4.0
soupsieve 1.9.4
wheel 0.33.6
What is the intended behavior for parse_html?
I packaged micawber for NixOS. (NixOS/nixpkgs#34948)
I noticed that runtests.py is not included in the PyPI source, so the install tests don't work.
Could you include it, so we can test the builds properly?
got an error from google:
Support for Python 2.5 has turned off. Please refer to https://goo.gl/aESk5L for more information
I noticed that a lot of the regular expression patterns in bootstrap_basic don't escape dots, so each dot matches any character. This means that a fair number of these patterns will match more than intended.
In addition, most patterns aren't marked as raw strings and therefore contain invalid escape sequences. This isn't directly noticeable, but could cause issues in a future Python version.
For an example of the latter:
python -W always -c '"https://\S*?soundcloud.com/\S+"'
<string>:1: DeprecationWarning: invalid escape sequence \S
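Both points can be illustrated with the soundcloud pattern from the example above. The lookalike URL is made up for the demonstration, and the escaped pattern is a suggested form, not the one shipped in micawber.

```python
import re

# Raw strings keep "\S" from being read as a string escape sequence.
loose = re.compile(r"https://\S*?soundcloud.com/\S+")
strict = re.compile(r"https://\S*?soundcloud\.com/\S+")

lookalike = "https://soundcloudXcom/track"       # made-up URL for the demo
genuine = "https://soundcloud.com/artist/track"

# The unescaped "." accepts any character, including the "X".
loose_hit = loose.match(lookalike)
# The escaped "\." only accepts a literal dot.
strict_hit = strict.match(lookalike)
```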
I'm not quite sure where the fault for this lies, but here seems a good start.
Embedding a youtube playlist using embed.ly directly works okay:
http://embed.ly/code?url=https%3A%2F%2Fwww.youtube.com%2Fplaylist%3Flist%3DPLE2714DC8F2BA092D (literally an example playlist heh)
Running it through micawber doesn't embed anything when using the URL https://www.youtube.com/playlist?list=PLE2714DC8F2BA092D. Using the embed URL https://www.youtube.com/embed/videoseries?list=PLE2714DC8F2BA092D results in the first video in the series being embedded, but with no playlist controls.
I know that requests is a bit of a resource hit (and it's been brought up before), but I wanted to suggest using it as the Provider (or an ancillary option) because it could improve testing.
The responses library (https://github.com/getsentry/responses) lets you easily intercept calls to the requests library to quickly write integrated tests. For example:
expected_payload = {'author_name':
}
as_json = json.dumps(expected_payload)
with responses.RequestsMock() as rsps:
    rsps.add(responses.GET,
             "http://www.youtube.com/oembed",
             body=as_json,
             status=200,
             content_type='text/html',
             )
    result = providers.request('http://www.youtube.com/watch?v=54XHDUOHuzU')
    for (k, v) in expected_payload.items():
        assert result[k] == v
This was a big benefit to us for testing and simulations (and incredibly easy to implement via subclassing), so I wanted to suggest it upstream.
Hi, what's the license for the micawber project?
Would you mind adding a license file for it?
We'd prefer an MIT license if you're open to suggestions.
Thanks
For example:
{{ object.body|oembed:"600" }}
I think there is a problem in the fix_width_height function. If only a width is passed, it sets maxwidth to the first digit only:
...
params['maxwidth'] = int(width_height[0])
...
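If width_height is still the raw filter argument (a string such as "600") at that point, indexing it picks out one character and not the whole number, which would explain the symptom. A sketch of a parser that splits first, assuming arguments of the form "600" or "600x400"; the function name is hypothetical and this is not the library's code.

```python
def parse_dimensions(width_height):
    """Turn "600" or "600x400" into oEmbed maxwidth/maxheight params."""
    params = {}
    # Split first, so indexing yields whole numbers and not characters.
    parts = str(width_height).lower().split("x")
    if parts[0]:
        params["maxwidth"] = int(parts[0])
    if len(parts) > 1 and parts[1]:
        params["maxheight"] = int(parts[1])
    return params


# Indexing the raw string reproduces the reported symptom.
first_char_only = int("600"[0])
```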
Some of the links are dead.
I'd like more granular exceptions so I can distinguish between exception cases in my calling code. Specifically, I'd like to differentiate when a call to ProviderRegistry.request fails due to a provider not being found for a URL versus an error fetching a particular endpoint URL.
Let me know what you think about this. I'm happy to fork and make a pull request if you're willing to go this direction.
Hi!
I'm using Flask, but this would be useful for Django and other frameworks too.
It would be very nice to have a feature that records, in a per-request cache or similar, which services have been used, and then adjusts the Content-Security-Policy header to include those services as accepted origins.
Otherwise the browser blocks the embedded object from loading, and it is not acceptable to allow every origin when only a few are needed.
Thanks a lot!
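A minimal sketch of how the origins could be collected from embed HTML and turned into a policy directive. The function names, the simple src regex, and the choice of the frame-src directive are assumptions for illustration; wiring this into a Flask or Django response is left out.

```python
import re
from urllib.parse import urlparse

# Simple attribute pattern; an assumption that src values are double-quoted.
SRC_RE = re.compile(r'src="([^"]+)"')


def embed_origins(html, default_scheme="https"):
    """Collect the origins referenced by src attributes in embed HTML."""
    origins = set()
    for src in SRC_RE.findall(html):
        if src.startswith("//"):
            # Protocol-relative URL: assume the page is served over https.
            src = default_scheme + ":" + src
        parts = urlparse(src)
        if parts.scheme and parts.netloc:
            origins.add(parts.scheme + "://" + parts.netloc)
    return origins


def frame_src_directive(origins):
    """Build a frame-src directive that allows only the given origins."""
    return "frame-src 'self' " + " ".join(sorted(origins))


snippet = '<iframe src="//player.vimeo.com/video/111410510" width="1280"></iframe>'
found = embed_origins(snippet)
```

The set of origins would be accumulated over one request and written into the Content-Security-Policy header before the response is sent.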
The bootstrap_oembed provider appears to be broken. The following works fine with other providers (Python 3.7, micawber 0.4.0).
from micawber.providers import bootstrap_oembed
r = bootstrap_oembed()
result = r.provider_for_url("https://i.imgur.com/CZX7D64.jpg")
/usr/local/lib/python3.7/site-packages/micawber/providers.py in provider_for_url(self, url)
136 def provider_for_url(self, url):
137 for regex, provider in self:
--> 138 if re.match(regex, url):
139 return provider
140
/usr/local/lib/python3.7/re.py in match(pattern, string, flags)
171 """Try to apply the pattern at the start of the string, returning
172 a Match object, or None if no match was found."""
--> 173 return _compile(pattern, flags).match(string)
174
175 def fullmatch(pattern, string, flags=0):
/usr/local/lib/python3.7/re.py in _compile(pattern, flags)
284 if not sre_compile.isstring(pattern):
285 raise TypeError("first argument must be string or compiled pattern")
--> 286 p = sre_compile.compile(pattern, flags)
287 if not (flags & DEBUG):
288 if len(_cache) >= _MAXCACHE:
/usr/local/lib/python3.7/sre_compile.py in compile(p, flags)
762 if isstring(p):
763 pattern = p
--> 764 p = sre_parse.parse(p, flags)
765 else:
766 pattern = None
/usr/local/lib/python3.7/sre_parse.py in parse(str, flags, pattern)
928
929 try:
--> 930 p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
931 except Verbose:
932 # the VERBOSE flag was switched on inside the pattern. to be
/usr/local/lib/python3.7/sre_parse.py in _parse_sub(source, state, verbose, nested)
424 while True:
425 itemsappend(_parse(source, state, verbose, nested + 1,
--> 426 not nested and not items))
427 if not sourcematch("|"):
428 break
/usr/local/lib/python3.7/sre_parse.py in _parse(source, state, verbose, nested, first)
652 if item[0][0] in _REPEATCODES:
653 raise source.error("multiple repeat",
--> 654 source.tell() - here + len(this))
655 if item[0][0] is SUBPATTERN:
656 group, add_flags, del_flags, p = item[0][1]
error: multiple repeat at position 44
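The report does not show which pattern triggered the error, but "multiple repeat" is what re raises when one quantifier directly follows another, which can happen if wildcard URL schemes are used as regexes without conversion. A sketch of a safer conversion, assuming the provider list uses "*" as a wildcard; the function names and sample schemes are hypothetical.

```python
import re


def scheme_to_regex(scheme):
    """Convert a wildcard URL scheme into a regex string."""
    # Escape everything first so "." and "?" become literal, then turn
    # the escaped wildcard into "any run of non-space characters".
    return re.escape(scheme).replace(r"\*", r"\S*")


def compiles(pattern):
    """Return True if the pattern is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Two quantifiers in a row reproduce the "multiple repeat" error.
broken = r"https://example\.com/**"
converted = scheme_to_regex("https://*.example.com/*")
```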
Would be nice to see this fix added to next release:
/micawber/contrib/mcdjango/__init__.py:4: RemovedInDjango19Warning: django.utils.importlib will be removed in Django 1.9. from django.utils.importlib import import_module
pr.register('http://qik.com/\S*',
Provider('http://qik.com/api/oembed.json'))
pr.register('http://www.polleverywhere.com/\w+/\S+',
Provider('http://www.polleverywhere.com/services/oembed/'))
pr.register('http://www.slideshare.net/\w+/\S+',
Provider('http://www.slideshare.net/api/oembed/2'))
pr.register('http://\w+.wordpress.com/\S+',
Provider('http://public-api.wordpress.com/oembed/'))
pr.register('http://*.revision3.com/\S+',
Provider('http://revision3.com/api/oembed/'))
pr.register('http://www.slideshare.net/\w+/\S+',
Provider('http://api.smugmug.com/services/oembed/'))
pr.register('http://\w+.viddler.com/\S+',
Provider('http://lab.viddler.com/services/oembed/'))
I tried to parse a youtube https url by the steps:
import micawber
providers = micawber.bootstrap_basic()
url = "https://www.youtube.com/watch?v=5BbSe_pI_eo"
micawber.parse_text(url, providers)
output:
<iframe width="480" height="270" src="http://www.youtube.com/embe/5BbSe_pI_eo?feature=oembed" frameborder="0" allowfullscreen></iframe>
The result still uses an http URL instead of https. Is this due to the design of micawber or a limitation of YouTube?
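Whatever the cause turns out to be, one workaround is to post-process the returned HTML. The sketch below rewrites http:// in src attributes to https://, which assumes the embed host also serves the same path over HTTPS; the function name is hypothetical and this is not part of micawber.

```python
import re

# Matches the start of a src attribute that points at an http:// URL.
HTTP_SRC = re.compile(r'(src=["\'])http://')


def force_https_src(html):
    """Rewrite http:// src attributes in embed HTML to https://."""
    return HTTP_SRC.sub(r"\1https://", html)


before = '<iframe src="http://www.youtube.com/embed/x?feature=oembed"></iframe>'
after = force_https_src(before)
```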
Hi, so I am following your example. When I pull the json from
micawber.bootstrap_basic().request('https://www.youtube.com/watch?v=M9taeyvPQzg')
It pulls all the other info, but for some reason it does not pull the duration. Why is that?
It seems that the oembed.io domain is no longer registered. This means that bootstrap_oembedio is dead code for all intents and purposes.
It could possibly be replaced by https://oembed.com/ (https://oembed.com/providers.json)
I get the error shown below when I run the Peewee sample blog app from here: https://github.com/coleifer/peewee/tree/master/examples/blog
Specifically this happens when Micawber tries to display a post with links that need converting to embeds (e.g. a YouTube video link).
I've been able to reproduce this reliably with different links (e.g. Vimeo links instead of YouTube) and different browsers. It doesn't always happen immediately, but if you click around to view the posts with embeds, then return to the index page, then view posts again, the error appears and the page is either unavailable or shows the page with no CSS. Errors in the console show that files failed to load: Failed to load resource: net::ERR_SOCKET_NOT_CONNECTED
This is in a Python 2.7.10 virtualenv on Ubuntu 15.10 running the Flask dev server.
Interestingly, running it in a Python 3.4 virtualenv works without issues. But it would be great to have a fix for Python 2.
Exception happened during processing of request from ('127.0.0.1', 33044)
Traceback (most recent call last):
File "/usr/lib/python2.7/SocketServer.py", line 295, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 321, in process_request
self.finish_request(request, client_address)
File "/usr/lib/python2.7/SocketServer.py", line 334, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python2.7/SocketServer.py", line 655, in __init__
self.handle()
File "/home/tom/.virtualenvs/peewee-blog/local/lib/python2.7/site-packages/werkzeug/serving.py", line 216, in handle
rv = BaseHTTPRequestHandler.handle(self)
File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
self.handle_one_request()
File "/home/tom/.virtualenvs/peewee-blog/local/lib/python2.7/site-packages/werkzeug/serving.py", line 247, in handle_one_request
self.raw_requestline = self.rfile.readline()
IOError: [Errno 11] Resource temporarily unavailable
We need to grab oEmbed data and would prefer to do it ourselves using the providers in micawber.providers.bootstrap_basic.
There are some services that don't provide endpoints (thinking of Facebook and Vine, in particular) or aren't defined in bootstrap_basic. We want to compose a ProviderRegistry instance which tries providers from bootstrap_basic first, falling back to oembedio or Embedly if nothing is found.
Our current (proposed) solution:
from micawber import bootstrap_basic, bootstrap_embedio
# embedio first so that basic providers overwrite embedio providers
# a bit icky since it relies on internal registry implementation
providers = bootstrap_embedio()
for provider in bootstrap_basic():
    providers.register(provider)
That seems a bit ... circuitous. So, here are a couple of ways to provide composited ProviderRegistry instances that I can think of:
1. Allow the bootstrap_* funcs to take an optional registry argument that defaults to None, but is used if passed:
    def bootstrap_basic(pr=None, cache=None):
        pr = pr or ProviderRegistry(cache)
        ...
        return pr
2. Move the provider definitions out of bootstrap_basic so that they're available to use by library users:
    PROVIDERS = {
        'http://blip.tv/\S+': 'http://blip.tv/oembed',
        ...
    }
    def bootstrap_basic(cache=None):
        pr = ProviderRegistry(cache)
        for regex, endpoint in PROVIDERS.items():
            pr.register(regex, Provider(endpoint))
        return pr
Thoughts?
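To make the "try the basic providers first, then fall back" idea concrete, here is a minimal, self-contained sketch. It does not use micawber at all: FallbackRegistry, endpoint_for, and the fallback.example endpoint are hypothetical names invented for illustration, and the real ProviderRegistry may behave differently.

```python
import re


class FallbackRegistry:
    """Hypothetical stand-in that consults several pattern tables in order."""

    def __init__(self, *tables):
        # Each table maps a URL regex to an oEmbed endpoint string.
        self._tables = [
            [(re.compile(pattern), endpoint) for pattern, endpoint in table.items()]
            for table in tables
        ]

    def endpoint_for(self, url):
        # Earlier tables win; later tables are only consulted as fallbacks.
        for table in self._tables:
            for regex, endpoint in table:
                if regex.match(url):
                    return endpoint
        return None


# Assumed example data: one "basic" provider and one catch-all fallback.
BASIC = {r'http://blip\.tv/\S+': 'http://blip.tv/oembed'}
CATCH_ALL = {r'https?://\S+': 'http://fallback.example/oembed'}

registry = FallbackRegistry(BASIC, CATCH_ALL)
print(registry.endpoint_for('http://blip.tv/some-video'))
print(registry.endpoint_for('http://vine.co/v/abc'))
```

Keeping the tables separate makes the precedence explicit, instead of relying on the order in which providers were registered.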
If you want to add my web service http://monitor.eibriel.com as a provider:
providers.register('http://monitor.eibriel.com/\S*', Provider('http://monitor.eibriel.com/api/job/oembed'))
Example: http://monitor.eibriel.com/54f317ef7ff6a915d864496a
Bests!
>>> import micawber
>>> providers = micawber.bootstrap_basic()
>>> micawber.parse_html('<p>http://www.youtube.com/watch?v=54XHDUOHuzU</p>', providers)
u'<html><body><p><html><body><iframe allowfullscreen="" frameborder="0" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?feature=oembed" width="459"></iframe></body></html></p></body></html>'
What are the html and body tags for? I do not need them.
>>> micawber.parse_text('http://www.youtube.com/watch?v=54XHDUOHuzU', providers)
u'<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?feature=oembed" frameborder="0" allowfullscreen></iframe>'
>>> micawber.parse_text('<p>http://www.youtube.com/watch?v=54XHDUOHuzU</p>', providers)
u'<p><a href="http://www.youtube.com/watch?v=54XHDUOHuzU" title="Future Crew - Second Reality demo - HD">Future Crew - Second Reality demo - HD</a></p>'
I don't want a link; I want the iframe, etc., as in the docs, even when there are other tags in the text.
I use bs4, but why is it not listed in the docs as a dependency?
P.S. Python 2.7.3 (default, Mar 13 2014, 11:03:55)
I'm not 100% certain what the issue is, but I'll give you as many details as possible.
my site is behind https (only). When trying to embed a map using an https link, the map does not embed.
When switching the URL to http the map will embed.
poking in the source I found: https://github.com/coleifer/micawber/blob/master/micawber/contrib/providers.py#L34
Which seems to only accept http as a valid url for google maps?
I think the problem might be that google redirects to https if you just go to maps.google.com.
So then when trying to embed a google map with https it fails the regex match?
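A quick way to check this theory is to compare a pattern anchored to http with one that accepts either scheme. The two patterns below are assumptions written for illustration; they are not copied from the linked providers.py line.

```python
import re

# Assumed shape of the existing pattern: anchored to plain http only.
HTTP_ONLY = re.compile(r'http://maps\.google\.com/maps\?\S+')
# Possible fix: make the "s" optional so both schemes match.
EITHER_SCHEME = re.compile(r'https?://maps\.google\.com/maps\?\S+')

http_url = 'http://maps.google.com/maps?q=berlin'
https_url = 'https://maps.google.com/maps?q=berlin'

print(bool(HTTP_ONLY.match(http_url)))       # True
print(bool(HTTP_ONLY.match(https_url)))      # False
print(bool(EITHER_SCHEME.match(http_url)))   # True
print(bool(EITHER_SCHEME.match(https_url)))  # True
```

If the registered pattern really is http-only, an https link would never reach the provider, which would match the behaviour described above.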
Hi, I'm new to micawber, and I'm reading the docs. I have a question about bootstrap_embedly.
If I want to use embed.ly, does bootstrap_embedly have to be called every time? For example, if I call it in a Django web app's initialization code, is it going to cause a delay at startup? Or does it cache results for future use?
From the docs:
>>> import micawber
>>> providers = micawber.bootstrap_embedly() # may take a second
>>> print micawber.parse_text('this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU', providers)
this is a test:
<iframe width="640" height="360" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>
A bit more detail in the docs regarding this issue would be appreciated.
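Whatever the library does internally, one common way to avoid repeating a slow setup call is to build the registry once per process and reuse it. The sketch below assumes only that the bootstrap call is expensive; slow_bootstrap is a hypothetical stand-in for it and get_providers is an invented helper name.

```python
import functools

CALLS = []


def slow_bootstrap():
    # Hypothetical stand-in for an expensive bootstrap call
    # (for example, one that has to fetch a provider list over the network).
    CALLS.append('bootstrap')
    return {'providers': 'loaded'}


@functools.lru_cache(maxsize=None)
def get_providers():
    # The first call pays the cost; later calls return the cached object.
    return slow_bootstrap()


first = get_providers()
second = get_providers()
print(first is second, len(CALLS))
```

With this pattern the cost is paid on first use rather than at import time, so app startup is not delayed.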
Hi Coleifer,
Thank you for your nice project. :)
Sorry for disturbing you
I found the answer to my issue, so I deleted the issue text as I can't delete all the issue record.
Best regards
and thanks again!
Igor
Hello,
I would like to report that a short YouTube URL, e.g. http://youtu.be/tS3FDpAiy3k, raises ProviderNotFoundException.
Greetings,
Adam Dobrawy
Do you plan to add a way to use a custom data retrieval method, by any chance?
It could be nice because then different methods could be used, like the requests library or asyncio in Python 3.4.
oauthlib (https://oauthlib.readthedocs.org/en/latest/oauth1/client.html) takes this approach and it works pretty nicely. Example:
client = oauthlib.oauth1.Client('client_key', client_secret='your_secret')
uri, headers, body = client.sign('http://example.com/request_token')
# Here you do a request
# and next you can grab data from response
So in this case it could be
provider = micawber.bootstrap_basic()
url, headers, body = provider.prepare_request(URL)
# Do a request
provider.parse_response(resp)
Or maybe it would be easier to allow overriding the fetch method with a callback?
provider.request(URL, fetch_callback=my_callback)  # invoked as my_callback(url, headers, body)
I know I can monkeypatch the fetch method, but that doesn't seem to be a good approach in the long run.
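As a rough sketch of the shape being proposed, the class below separates building the request, fetching, and parsing, and takes the fetch step as a plain callable. PluggableProvider and fake_fetch are hypothetical names and none of this is micawber's actual API; the method names prepare_request and parse_response are borrowed from the proposal above.

```python
import json
from urllib.parse import urlencode


class PluggableProvider:
    """Hypothetical provider with an injectable fetch step (not micawber's API)."""

    def __init__(self, endpoint, fetch):
        self.endpoint = endpoint
        self.fetch = fetch  # callable: request URL -> response body (str)

    def prepare_request(self, url):
        # Build the oEmbed request URL without performing any I/O.
        return self.endpoint + '?' + urlencode({'url': url, 'format': 'json'})

    def parse_response(self, body):
        # Turn the raw JSON body into a dict.
        return json.loads(body)

    def request(self, url):
        # Convenience wrapper: prepare, fetch via the callback, then parse.
        return self.parse_response(self.fetch(self.prepare_request(url)))


def fake_fetch(request_url):
    # Canned response so the sketch runs offline; a real callback could
    # use requests, asyncio, or anything else that returns the body.
    return json.dumps({'type': 'video', 'requested': request_url})


provider = PluggableProvider('http://example.com/oembed', fake_fetch)
data = provider.request('http://example.com/watch?v=1')
print(data['type'])
```

Because prepare_request and parse_response do no I/O, callers who want full control can skip request entirely and do the fetch themselves.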
When I try to use the template filters:
{% load micawber_tags %}
{{ "http://www.youtube.com/watch?v=mQEWI1cn7HY"|oembed }}
I get an error:
TemplateDoesNotExist at ...
micawber/video.html
This is due to a missing templates folder in the released 0.2.3 version: http://pypi.python.org/pypi/micawber/0.2.3 . The same tagged version on GitHub seems to be fine: https://github.com/coleifer/micawber/tree/0.2.3/micawber/contrib/mcdjango
Hi!
I'm currently trying to package this module for Arch Linux [community].
However, while doing so, I realized that there is no definition of required dependencies.
When grepping for imports, I see that the tests definitely require beautifulsoup, because they import it (they also failed hard when run without it installed). There seems to be a conditional dependency on redis, django and flask. Can you please add them to a requirements.txt or add a Pipfile (and explain why they are needed), so I can add proper (optional, runtime and test) dependencies for the package and people will have an easier time using micawber?
Thanks for your work!
It would be super helpful if you could pass in a percentage for the iFrame width in the django template for oembed.
Cheers