kurtmckee / feedparser
Parse feeds in Python
Home Page: https://feedparser.readthedocs.io/en/latest/
License: Other
I am really bad with encoding/charset stuff, but here’s what I am getting:
Python 2.7.9 (default, Feb 10 2015, 03:28:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> title = feedparser.parse('http://www.yeni1tarif.com/feed').entries[9]['title']
>>> title
u'Yufkadan K\xc4\xb1ymal\xc4\xb1 Kol B\xc3\xb6re\xc4\u0178i'
>>> print title
Yufkadan Kıymalı Kol Böreği
>>> print title.encode('utf-8')
Yufkadan Kıymalı Kol Böreği
However, running curl 'http://www.yeni1tarif.com/feed' | grep -i yufkadan shows that the title in the feed itself is correct: <title>Yufkadan Kıymalı Kol Böreği</title>
The XML declares UTF-8 (curl 'http://www.yeni1tarif.com/feed' | head):
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
>
and the server (apparently) sends the correct Content-Type:
curl -v 'http://www.yeni1tarif.com/feed'
* Trying 94.101.84.135...
* Connected to www.yeni1tarif.com (94.101.84.135) port 80 (#0)
> GET /feed HTTP/1.1
> Host: www.yeni1tarif.com
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/xml; charset=UTF-8
< Last-Modified: Wed, 01 Apr 2015 10:30:35 GMT
< ETag: "0c98f3f9ec42f8fd4b207adbc69e7d18"
< Server: Microsoft-IIS/7.5
< X-Powered-By: PHP/5.4.14
< X-Pingback: http://www.yeni1tarif.com/xmlrpc.php
< Date: Thu, 24 Sep 2015 22:30:34 GMT
< Content-Length: 75344
<
I am not sure what I am missing or doing wrong. Or maybe there is a bug somewhere?
Thank you.
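The repr shown in this report is consistent with UTF-8 bytes that were decoded as Windows-1252 somewhere along the way (note `\u0178`, which is what byte 0x9F becomes in that code page). A minimal Python 3 sketch of undoing that, assuming this really is the cause; `repair_mojibake` is a hypothetical helper, not part of feedparser:

```python
def repair_mojibake(text, wrong_encoding="cp1252"):
    """Re-encode with the wrongly applied codec, then decode as UTF-8.

    Returns the input unchanged when the round trip is not possible,
    which usually means the text was not mojibake in the first place.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# The title exactly as it appears in the report's repr.
broken = "Yufkadan K\xc4\xb1ymal\xc4\xb1 Kol B\xc3\xb6re\xc4\u0178i"
fixed = repair_mojibake(broken)
print(fixed)
```

This only repairs the symptom after the fact; it does not explain why the wrong codec was chosen during parsing.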
Would it be possible to add the opensearch namespace?
I don't know where this would go now (or I'd have written a patch) but in the monolithic version, the namespace additions might look like this:
'http://a9.com/-/spec/opensearch/1.1/': 'opensearch',
and maybe the older
'http://a9.com/-/spec/opensearchrss/1.0/': 'opensearch',
I created a simple example:
#!/usr/bin/python3
import feedparser
d = feedparser.parse('http://planet.gnu.org/atom.xml')
print(d['feed']['title'])
I get the following:
Traceback (most recent call last):
File "./test.py", line 5, in <module>
d = feedparser.parse('http://planet.gnu.org/atom.xml')
File "/usr/lib64/python3.4/site-packages/feedparser-5.2.1-py3.4.egg/feedparser/api.py", line 235, in parse
File "/usr/lib64/python3.4/site-packages/drv_libxml2.py", line 189, in parse
eltName = (_d(reader.NamespaceUri()),\
File "/usr/lib64/python3.4/site-packages/drv_libxml2.py", line 70, in _d
return _decoder(s)[0]
File "/usr/lib64/python3.4/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
TypeError: 'str' does not support the buffer interface
With Python 2 this problem does not occur.
feedparser 5.2.1
Some users are reporting this error (bozo_exception) on some servers:
urlopen error [Errno 1] _ssl.c:490: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error
Presumably it's related to changes to handle the Poodle vulnerability, as described in https://github.com/calmh/unifi-api/issues/22
This is a design change that I think would be a positive move for feedparser. It is somewhat related to #24 but isn't exactly the same.
Right now, if feedparser is parsing a particular element and that element maps to one of its 'common interface' elements, it is consumed and not accessible individually.
An example of this would be itunes:author. Because this is mapped to the core author field, it is not accessible via feed['itunes_author']. If feedparser's precedence rules make it so that another element also maps to the core author field with a higher precedence, it is impossible to access the itunes:author information.
I think that all tags should be accessible manually and that the mapping to that common interface should be supplementary. It shouldn't throw away any information.
You could do this by making all elements individually accessible like so:
feed['rss:author']
feed['itunes:author']
feed['atom:subtitle']
feed['itunes:subtitle']
You would still be able to access elements via the common interface: feed['author'] or feed['subtitle'], according to the well-documented precedence rules. However, if I, as an application writer, want to ensure, say, that any iTunes element takes precedence over the other items, I can do this myself by specifying the individual elements and bypassing the common interface.
This is similar to the approach https://github.com/danmactough/node-feedparser takes and I think it allows for a lot more flexibility.
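With per-element keys like the ones proposed here, application-side precedence could be a few lines. This is only a sketch of the idea: the keys `itunes:author` and `rss:author` are the hypothetical names from this proposal, not something feedparser exposes today, and `first_present` is an illustrative helper.

```python
def first_present(mapping, keys, default=None):
    """Return the value of the first key in `keys` that has a non-empty value."""
    for key in keys:
        value = mapping.get(key)
        if value:
            return value
    return default


# Stand-in for a parsed feed that keeps every element individually accessible.
feed = {
    "author": "Site Editor",          # common-interface value
    "rss:author": "Site Editor",      # hypothetical per-element keys
    "itunes:author": "Podcast Host",
}

# The application, not the parser, decides that iTunes wins.
author = first_present(feed, ["itunes:author", "rss:author", "author"])
print(author)
```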
With the bozo detection, my feeds seem to be not OK:
>>> data = feedparser.parse('https://foxmask.trigger-happy.eu/feeds/all.rss.xml')
>>> data.bozo
1
>>> data.bozo_exception
CharacterEncodingOverride('document declared as us-ascii, but parsed as utf-8',)
but when I use chardet, the encoding is detected correctly:
>>> data = urlopen('https://foxmask.trigger-happy.eu/feeds/all.rss.xml').read()
>>> chardet.detect(data)
{'confidence': 0.99, 'encoding': 'utf-8'}
What can I do to make the bozo detection work properly?
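One way to live with this is to treat an encoding override as a warning rather than a failure, since the feed was still parsed (here as UTF-8). A hedged sketch: `is_fatal_bozo` is a hypothetical helper, and it matches on the exception's class name so that the example runs without feedparser installed; with the library available, an `isinstance` check against the exception class would be the more direct test.

```python
BENIGN_BOZO_NAMES = {"CharacterEncodingOverride"}


def is_fatal_bozo(result, benign_names=BENIGN_BOZO_NAMES):
    """Return True only when bozo is set for a reason we do not tolerate."""
    if not result.get("bozo"):
        return False
    exc = result.get("bozo_exception")
    return type(exc).__name__ not in benign_names


# Stand-in class so the sketch is self-contained.
class CharacterEncodingOverride(Exception):
    pass


declared_wrong = {
    "bozo": 1,
    "bozo_exception": CharacterEncodingOverride(
        "document declared as us-ascii, but parsed as utf-8"
    ),
}
broken_xml = {"bozo": 1, "bozo_exception": ValueError("not well-formed")}

print(is_fatal_bozo(declared_wrong))
print(is_fatal_bozo(broken_xml))
```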
This feed:
http://eatcodeplay.com/feed.xml
contains errors. However, the Python 2 version finds several entries while the Python 3 version does not find any.
Hi!
I'm receiving a unichr() arg not in range(0x10000) error when parsing some RSS feeds.
My OS/config:
Linux 4.4.0 SMP Mon Jan 11 22:30:29 CST 2016 x86_64,
Slackware 14.2 (current), Python 2.7.11, and feedparser 5.2.1
Below is the output of parse:
diniz@darkstar:~$ python
Python 2.7.11 (default, Dec 6 2015, 14:10:30)
[GCC 5.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> feedparser.parse('http://feeds.feedburner.com/podcast30min')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/feedparser.py", line 3964, in parse
feedparser.feed(data.decode('utf-8', 'replace'))
File "/usr/lib64/python2.7/site-packages/feedparser.py", line 2124, in feed
sgmllib.SGMLParser.feed(self, data)
File "/usr/lib64/python2.7/sgmllib.py", line 104, in feed
self.goahead(0)
File "/usr/lib64/python2.7/sgmllib.py", line 186, in goahead
self.handle_charref(name)
File "/usr/lib64/python2.7/site-packages/feedparser.py", line 734, in handle_charref
text = unichr(c).encode('utf-8')
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
>>> quit()
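The traceback comes from a narrow Python 2 build, where `unichr()` cannot produce code points above 0xFFFF directly and such characters have to be written as a UTF-16 surrogate pair. The arithmetic is sketched below in Python 3 (where `chr()` has no such limit) purely to illustrate it; `to_surrogate_pair` is a hypothetical name, not a feedparser function.

```python
def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16 encoder.
encoded = chr(0x1F600).encode("utf-16-be")
assert encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```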
Simple test case:
import feedparser
import json
info = {}
parser = feedparser.parse('http://feeds.feedburner.com/codinghorror')
#if parser.bozo == 1:
# info['bozo_message'] = parser.bozo_exception
info['title'] = parser.feed['title']
print(json.dumps(info))
With the bozo check commented out, this runs fine and prints the JSON string with the title. Uncomment the bozo check and you get:
SAXParseException('Input is not proper UTF-8, indicate encoding !\nBytes: 0xE2 0x80 0x99 0x73\n',) is not JSON serializable
I would expect this to work the same regardless of checking the bozo exception.
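The error arises because `bozo_exception` is an exception object, which `json.dumps` cannot encode. A small sketch of one workaround, converting it to text first; the `result` dict below is a stand-in for what `feedparser.parse` returns so that the example runs offline, and its values are made up for illustration.

```python
import json

# Stand-in for a parse result whose bozo flag is set.
result = {
    "bozo": 1,
    "bozo_exception": ValueError("Input is not proper UTF-8, indicate encoding !"),
    "feed": {"title": "Example feed"},
}

info = {"title": result["feed"]["title"]}
if result["bozo"]:
    # Store the message, not the exception object itself.
    info["bozo_message"] = str(result["bozo_exception"])

encoded = json.dumps(info)
print(encoded)
```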
First thanks for this great library.
If you go to the bottom of the page:
https://pythonhosted.org/feedparser/html-sanitization.html#advanced-sanitization
It has a dead link to the platypus attack.
Hope that helps.
Chris
I'm specifically looking to access the itunes:owner element of a podcast feed, but I think that access to arbitrary non-standard elements would be useful.
I might be blind, but I don't see anything in the documentation specifying any sort of guarantee on timezone parsing. From the examples it looks like it always returns GMT time...
The struct_time object it spits out for the various datetime fields doesn't have the tm_zone or tm_gmtoff attributes on my system.
Can I assume that the time it provides is always in GMT? What if the feed doesn't provide a timezone...will it just be assumed as GMT?
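For what it's worth, the feedparser documentation describes the `*_parsed` values as time tuples in UTC, so a conversion along the following lines should be safe under that assumption. `calendar.timegm` is used because, unlike `time.mktime`, it interprets the tuple as UTC rather than local time. `to_aware_utc` is a hypothetical helper name.

```python
import calendar
import time
from datetime import datetime, timezone


def to_aware_utc(parsed):
    """Convert a UTC time tuple (e.g. entry.published_parsed) to an aware datetime."""
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)


# Stand-in for entry.published_parsed.
parsed = time.struct_time((2016, 5, 24, 14, 17, 57, 1, 145, 0))
print(to_aware_utc(parsed).isoformat())
```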
I have an issue I can't seem to work out on my own. I'm not sure whether it is a bug or whether I have just failed to find a workaround. Maybe you can point me in the right direction.
In the following XML, the parsed output only includes the last of the torznab:attr elements. The torznab xmlns does not resolve, which I think doesn't matter from looking at the code, since it seems external xmlns resolution is disabled by default.
Is this a bug or is there a way for me to get the value of all 4 torznab:attr somehow?
<?xml version="1.0" encoding="UTF-8"?>
<rss version="1.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:torznab="http://torznab.com/schemas/2015/feed">
<channel>
<atom:link href="http://127.0.0.1:9117/" rel="self" type="application/rss+xml" />
<title>TORZNAB</title>
<description>TORZNAB</description>
<link>https://torznab.org/</link>
<lanuage>en-us</lanuage>
<category>search</category>
<image>
<url>http://127.0.0.1:9117/logos/TORZNAB.png</url>
<title>TORZNAB</title>
<link>https://torznab.org/</link>
<description>TORZNAB</description>
</image>
<item>
<title>Ubuntu.14.10.Desktop.64bit.ISO</title>
<guid>https://torznab.org/B415C913643E5FF49FE37D304BBB5E6E11AD5101/comments</guid>
<comments>https://torznab.org/B415C913643E5FF49FE37D304BBB5E6E11AD5101/comments</comments>
<pubDate>Sat, 06 Jul 2013 03:57:49 -0700</pubDate>
<size>1159641169</size>
<description>Ubuntu.14.10.Desktop.64bit.ISO</description>
<link>magnet:?xt=urn:btih:B415C913643E5FF49FE37D304BBB5E6E11AD5101&dn=ubuntu+14+10+desktop+64bit+iso&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fopen.demonii.com%3A1337</link>
<category>4020</category>
<enclosure url="magnet:?xt=urn:btih:B415C913643E5FF49FE37D304BBB5E6E11AD5101&dn=ubuntu+14+10+desktop+64bit+iso&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fopen.demonii.com%3A1337" length="253217700" type="application/x-bittorrent" />
<torznab:attr name="magneturl" value="magnet:?xt=urn:btih:B415C913643E5FF49FE37D304BBB5E6E11AD5101&dn=ubuntu+14+10+desktop+64bit+iso&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fopen.demonii.com%3A1337" />
<torznab:attr name="seeders" value="115" />
<torznab:attr name="peers" value="8" />
<torznab:attr name="infohash" value="B415C913643E5FF49FE37D304BBB5E6E11AD5101" />
</item>
</channel>
</rss>
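As a stopgap, the repeated elements can be read directly from the XML with the standard library, alongside whatever feedparser returns. A minimal sketch using `xml.etree.ElementTree` on a trimmed-down copy of the item from this report; `torznab_attrs` is a hypothetical helper, and this sidesteps feedparser rather than changing how it handles repeated elements.

```python
import xml.etree.ElementTree as ET

TORZNAB_NS = "http://torznab.com/schemas/2015/feed"

# Trimmed-down version of the item in the report.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="1.0" xmlns:torznab="http://torznab.com/schemas/2015/feed">
<channel>
<item>
<title>Ubuntu.14.10.Desktop.64bit.ISO</title>
<torznab:attr name="seeders" value="115" />
<torznab:attr name="peers" value="8" />
<torznab:attr name="infohash" value="B415C913643E5FF49FE37D304BBB5E6E11AD5101" />
</item>
</channel>
</rss>"""


def torznab_attrs(xml_text):
    """Return one {name: value} dict per <item>, keeping every torznab:attr."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    tag = "{%s}attr" % TORZNAB_NS
    return [
        {attr.get("name"): attr.get("value") for attr in item.iter(tag)}
        for item in root.iter("item")
    ]


attrs = torznab_attrs(SAMPLE)
print(attrs[0])
```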
Hello again and thanks for the great work.
Unless I missed something, I think there is no way to tell Feedparser the feed encoding if we already know it. In my case, I process and convert the feed to utf8 in Node before passing it to Feedparser. The passed feed is in utf8 but the encoding="window-1252" attribute is still present in the feed content and causes Feedparser to fail. I'd be happy not to have to remove that attribute myself.
Thank you.
I understand that sanitization is for my safety, but there are times when it is silly to do and changes the feed enough to be "too much." Would you accept a pull request to disable the sanitization at the user's request? Maybe based on a flag passed to parse?
In [27]: f['entries'][0]["published"]
Out[27]: u'2016/6/29 15:07:41'
In [28]: f['entries'][0]["published_parsed"]
Out[28]: time.struct_time(tm_year=2016, tm_mon=6, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=153, tm_isdst=0)
In [29]: datetime.datetime.fromtimestamp(time.mktime(f['entries'][0]["published_parsed"]))
Out[29]: datetime.datetime(2016, 6, 1, 0, 0)
As we can see, feedparser parsed "2016/6/29 15:07:41" into datetime.datetime(2016, 6, 1, 0, 0).
After roughly reading the related code, I found that _parse_date_iso8601 was used to parse the date. The problem is that 2016/6/29 15:07:41 is not in ISO 8601 format.
The regex pattern used in _parse_date_iso8601 and the result it returns:
In [20]: m = re.match("(?P<year>\d{4})(T?(?P<hour>\d{2}):(?P<minute>\d{2})(:(?P<second>\d{2}))?(\.(?P<fracsecond>\d+))?(?P<tz>[+-](?P<tzhour>\d{2})(:(?P<tzmin>\d{2}))?|Z)?)?", '2016/6/29 13:15:50')
In [22]: params = m.groupdict()
In [23]: params
Out[23]:
{'fracsecond': None,
'hour': None,
'minute': None,
'second': None,
'tz': None,
'tzhour': None,
'tzmin': None,
'year': '2016'}
The regex pattern used here can only extract the year, and the code then makes assumptions about the month and day. Why make assumptions at all? Does this mean published_parsed cannot be trusted?
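For feeds that use this slash-separated format, one option is to parse the raw `published` string strictly instead of relying on `published_parsed`. A minimal sketch, assuming the timestamps are UTC (the feed does not say); `parse_slash_date` is a hypothetical helper. A function of this shape (string in, time tuple or None out) is also what feedparser's `registerDateHandler` hook is documented to accept, if you prefer to plug it into the library.

```python
from datetime import datetime


def parse_slash_date(value):
    """Parse 'YYYY/M/D HH:MM:SS' into a time tuple, or return None."""
    try:
        parsed = datetime.strptime(value.strip(), "%Y/%m/%d %H:%M:%S")
    except ValueError:
        # Not this format: return None instead of guessing month and day.
        return None
    # Assumption: the naive timestamp is already in UTC.
    return parsed.utctimetuple()


result = parse_slash_date("2016/6/29 15:07:41")
print(tuple(result)[:6])
```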
I'm building a thing with feedparser and I'd like to rely on ETags as much as possible. A bunch of webserver versions don't provide ETags with gzipped content.
Currently the way feedparser decides whether to ask for gzip or not is by checking whether python supports gzip:
try:
    import gzip
except ImportError:
    gzip = None
…
if gzip and zlib:
    request.add_header('Accept-encoding', 'gzip, deflate')
elif gzip:
    request.add_header('Accept-encoding', 'gzip')
Would you accept a PR adding the possibility to disable gzip?
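Until such an option exists, one workaround is to fetch the feed yourself and hand the body to `feedparser.parse`, which also accepts feed content rather than a URL. The sketch below only builds the request (so it runs without network access); `build_request` is a hypothetical helper, and whether a given server then returns an ETag is an assumption to verify per server.

```python
import urllib.request


def build_request(url, etag=None):
    """Build a request that asks for an uncompressed body and revalidates by ETag."""
    headers = {"Accept-Encoding": "identity"}   # opt out of gzip/deflate
    if etag:
        headers["If-None-Match"] = etag
    return urllib.request.Request(url, headers=headers)


req = build_request("http://example.com/feed.xml", etag='"abc123"')
print(req.header_items())

# Fetching is left commented out to keep the sketch offline:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     body, new_etag = resp.read(), resp.headers.get("ETag")
```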
$ pip install --upgrade feedparser
Collecting feedparser
Using cached feedparser-5.2.0.post1.tar.bz2
Installing collected packages: feedparser
Found existing installation: feedparser 5.2.0
Uninstalling feedparser-5.2.0:
Successfully uninstalled feedparser-5.2.0
Running setup.py install for feedparser
Successfully installed feedparser-5.2.0
$ pip install --upgrade feedparser
Collecting feedparser
Using cached feedparser-5.2.0.post1.tar.bz2
Installing collected packages: feedparser
Found existing installation: feedparser 5.2.0
Uninstalling feedparser-5.2.0:
Successfully uninstalled feedparser-5.2.0
Running setup.py install for feedparser
Successfully installed feedparser-5.2.0
$ pip install --upgrade feedparser --no-cache-dir
Collecting feedparser
Downloading feedparser-5.2.0.post1.tar.bz2 (192kB)
100% |████████████████████████████████| 192kB 471kB/s
Installing collected packages: feedparser
Found existing installation: feedparser 5.2.0
Uninstalling feedparser-5.2.0:
Successfully uninstalled feedparser-5.2.0
Running setup.py install for feedparser
Successfully installed feedparser-5.2.0
$ pip install --upgrade feedparser --no-cache-dir
Collecting feedparser
Downloading feedparser-5.2.0.post1.tar.bz2 (192kB)
100% |████████████████████████████████| 192kB 2.5MB/s
Installing collected packages: feedparser
Found existing installation: feedparser 5.2.0
Uninstalling feedparser-5.2.0:
Successfully uninstalled feedparser-5.2.0
Running setup.py install for feedparser
Successfully installed feedparser-5.2.0
$
It looks like setup.py says the version is 5.2.0, but the sdist says otherwise.
I am sorry if this is the wrong place for this kind of bug.
Cheers
According to this the default timeout in urllib2 is -1, or None. So... this is a problem for long-running programs, where occasionally a hung connection will hang everything.
The solution is pretty simple: add a timeout to the 'open' here
Line 175 in 39a7157
I'll fork and try to make a fix.
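In the meantime, a process-wide default can be set through the standard library, which affects sockets created afterwards, including those opened by urllib on feedparser's behalf. This is a blunt workaround rather than the per-call parameter proposed here, and the 30-second value is an arbitrary choice for illustration.

```python
import socket

# Applies to sockets created after this call, process-wide.
socket.setdefaulttimeout(30.0)

print(socket.getdefaulttimeout())
```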
data variable does not exist:
feedparser/feedparser/encodings.py
Line 43 in f019d06
The handling of the media:description element in 5.2.1 ends up overwriting the 'content' field of an item. This seems like a particular case of issue #35.
An example feed item and test script are attached. 'description' and 'summary' of the single entry in the feed are set to the full story text (starting "Just like you sync your tablet ..."), which is some 4400-odd bytes. But 'content' is set to the 101-byte caption of the photo (starting "Bonnie Plants’ Homegrown free app keeps you growing in the garden.").
One possible fix is to give _start_media_description()/_end_media_description() their own implementations instead of aliasing _start_description(), doing something like what _start_media_license()/_end_media_license() do. Or maybe _start_description() needs to be more complicated and do something different when in a media:content context.
I'm happy to work on a patch, if given a direction to pursue.
Hi,
Many sites use third-party software to distribute content (feedsportal, feedburner, etc.). In such cases we always get a 301 or 302 from http://mysite.com/feed to http://feeds.feedburner.com/mysite, and in the end a 200 or 304.
However, parse() always returns status set to 301. I think we should return the final response code.
Here's my patch:
replace lines
1969 if hasattr(f, 'status'):
1970 result['status'] = f.status
with
result['status'] = getattr(f, 'code', 0)
Or maybe we can add this as new entry in result (along with status)
See the dates on this feed: https://www.daydeal.ch/rss.xml
<pubDate>Thu, 03 Sep 15 00:01:01 +0200</pubDate>
This is being parsed as:
>>> entry.get('published_parsed')
time.struct_time(tm_year=200, tm_mon=9, tm_mday=2, tm_hour=15, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=245, tm_isdst=0)
In the year 200!
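For comparison, the standard library's RFC 2822 parser expands two-digit years (values below 69 are read as 20xx), so it can serve as a cross-check or fallback for dates like this one. A small sketch using only `email.utils`; it says nothing about how feedparser's own date handlers should be fixed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

raw = "Thu, 03 Sep 15 00:01:01 +0200"

# Returns an aware datetime carrying the +02:00 offset from the string.
parsed = parsedate_to_datetime(raw)
as_utc = parsed.astimezone(timezone.utc)

print(parsed.isoformat())
print(as_utc.isoformat())
```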
According to the RSS 2.0 Specification, an item may include multiple category elements:
You may include as many category elements as you need to, for different domains, and to have an item cross-referenced in different parts of the same domain.
Here is a sample that includes multiple category values:
>>> import feedparser
>>> feedparser.__version__
'5.2.1'
>>> data = feedparser.parse('http://www.validome.org/check/RSS_validator/version/rss_2_0/action/xml/feed/234')
>>> data.feed.get('category')
u'category/subcategory/subcategory2'
Is this a bug?
I am using guv and feedparser to parse multiple feeds simultaneously. The following is my code:
def parse_feed(_feed):
    return feedparser.parse(_feed)

def main():
    urls = ["http://feeds.bbci.co.uk/news/rss.xml"]
    pool = guv.GreenPool()
    results = pool.starmap(parse_feed, zip(urls))
    for resp in results:
        print(str(resp))
However, I get the following output:
{'bozo_exception': TypeError('a float is required',), 'bozo': 1, 'feed': {}, 'entries': []}
I have a similar problem using Eventlet, but not with the native Python 3 threading library.
From the very first code bit on the Introduction page:
>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
All seems to work OK, except:
>>> d["feed"]["title"]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/feedparser.py", line 357, in __getitem__
return dict.__getitem__(self, key)
KeyError: 'title'
The source of atom10.xml is
<!DOCTYPE html>
<body style="padding:0; margin:0;">
<html>
<body>
<iframe src="http://mcc.godaddy.com/park/p3WlpJAhMJMlMF5vMKD=" style="visibility: visible;height: 100%; position:absolute" allowtransparency="true" marginheight="0" marginwidth="0" frameborder="0" width="100%">
</iframe>
</body>
</html>
(I formatted it, as it was all on one line). From what I can gather, the domain is parked but has no content. So, if you own the domain, could you put an acceptable RSS feed file at the indicated URL? If not, could you find another sample RSS feed to use? You could probably just post a file on GitHub or something.
Thanks!
When parsing this feed, the published field is populated with the date twice in a row (u'2016-05-24 14:17:57.02016-05-24 14:17:57.0').
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.unodc.org/misc/feed.xsl"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0"><channel><title>UNODC Publications</title><link>http://www.unodc.org/unodc/en/feed/publications.xml</link><description>UNODC Publications</description><item><title>World wildlife crime report 2016</title><link>http://www.unodc.org/documents/data-and-analysis/wildlife/World_Wildlife_Crime_Report_2016_final.pdf</link><guid>http://www.unodc.org/documents/data-and-analysis/wildlife/World_Wildlife_Crime_Report_2016_final.pdf</guid><description></description><pubDate>Tue, 24 May 2016 2:17:57 PM CEST</pubDate></item><item><title>The Afghan Opiate Trade and Africa - A Baseline Assessment- 2016</title><link>http://www.unodc.org/documents/data-and-analysis/Afghanistan/Afghan_Opiate_trade_Africa_2016_web.pdf</link><guid>http://www.unodc.org/documents/data-and-analysis/Afghanistan/Afghan_Opiate_trade_Africa_2016_web.pdf</guid><description></description><pubDate>Wed, 16 Mar 2016 4:34:12 PM CET</pubDate></item><item><title>Afghanistan Opium Survey 2015 - Socio-economic analysis</title><link>http://www.unodc.org/documents/crop-monitoring/Afghanistan/Afghanistan_opium_survey_2015_socioeconomic.pdf</link><guid>http://www.unodc.org/documents/crop-monitoring/Afghanistan/Afghanistan_opium_survey_2015_socioeconomic.pdf</guid><description> Afghanistan Opium Survey 2015 - Socio-economic analysis </description><pubDate>Wed, 16 Mar 2016 2:19:37 PM CET</pubDate></item><item><title>Afghanistan Opium Survey 2015 - Cultivation and Production</title><link>http://www.unodc.org/documents/crop-monitoring/Afghanistan/_Afghan_opium_survey_2015_web.pdf</link><guid>http://www.unodc.org/documents/crop-monitoring/Afghanistan/_Afghan_opium_survey_2015_web.pdf</guid><description></description><pubDate>Fri, 18 Dec 2015 1:19:09 PM 
CET</pubDate></item><item><title>Southeast Asia Opium Survey 2015 - Lao PDR, Myanmar</title><link>http://www.unodc.org/documents/crop-monitoring/sea/Southeast_Asia_Opium_Survey_2015_web.pdf</link><guid>http://www.unodc.org/documents/crop-monitoring/sea/Southeast_Asia_Opium_Survey_2015_web.pdf</guid><description>Southeast Asia Opium Survey 2015 - Lao PDR, Myanmar</description><pubDate>Tue, 15 Dec 2015 4:30:42 PM CET</pubDate></item><item><title>Drug Money - the illicit proceeds of opiates trafficked on the Balkan route</title><link>http://www.unodc.org/documents/data-and-analysis/Studies/IFF_report_2015_final_web.pdf</link><guid>http://www.unodc.org/documents/data-and-analysis/Studies/IFF_report_2015_final_web.pdf</guid><description></description><pubDate>Thu, 26 Nov 2015 3:15:21 PM CET</pubDate></item><item><title>Strengthening the medico-legal response to sexual violence</title><link>http://www.unodc.org/documents/publications/WHO_RHR_15.24_eng.pdf</link><guid>http://www.unodc.org/documents/publications/WHO_RHR_15.24_eng.pdf</guid><description></description><pubDate>Wed, 25 Nov 2015 11:07:00 AM CET</pubDate></item><item><title>Afghanistan Opium Survey 2015 - Executive Summary</title><link>http://www.unodc.org/documents/crop-monitoring/Afghanistan/Afg_Executive_summary_2015_final.pdf</link><guid>http://www.unodc.org/documents/crop-monitoring/Afghanistan/Afg_Executive_summary_2015_final.pdf</guid><description>Afghanistan Opium Survey 2015 - Executive Summary</description><pubDate>Wed, 14 Oct 2015 7:53:45 AM CEST</pubDate></item><item><title>Estado Plurinacional de Bolivia - Monitoreo de Cultivos de Coca 2014 </title><link>http://www.unodc.org/documents/bolivia/Bolivia_Informe_Monitoreo_Coca_2014.pdf</link><guid>http://www.unodc.org/documents/bolivia/Bolivia_Informe_Monitoreo_Coca_2014.pdf</guid><description></description><pubDate>Tue, 18 Aug 2015 11:04:00 AM CEST</pubDate></item><item><title>Peru - Informe Monitoreo de Cultivos de Coca 2014 (Summary in English 
included)</title><link>http://www.unodc.org/documents/crop-monitoring/Peru/Peru_Informe_monitoreo_coca_2014_web.pdf</link><guid>http://www.unodc.org/documents/crop-monitoring/Peru/Peru_Informe_monitoreo_coca_2014_web.pdf</guid><description></description><pubDate>Wed, 15 Jul 2015 6:51:02 PM CEST</pubDate></item></channel></rss>
published_parsed is correct though:
>>> parser['entries'][0].get('published')
u'2016-05-24 14:17:57.02016-05-24 14:17:57.0'
>>> parser['entries'][0].get('published_parsed')
time.struct_time(tm_year=2016, tm_mon=5, tm_mday=24, tm_hour=14, tm_min=17, tm_sec=57, tm_wday=1, tm_yday=145, tm_isdst=0)
>>>
This is on Python 2.7.11 using version 5.2.1 from PyPI and the following virtualenv:
BeautifulSoup==3.2.1
boto3==1.2.2
botocore==1.3.30
cssselect==0.9.1
docutils==0.12
feedparser==5.1.3
futures==2.2.0
goose-extractor==1.0.25
jieba==0.38
jmespath==0.9.0
lambda-uploader==0.5.1
lxml==3.6.0
nltk==3.2.1
Pillow==3.2.0
piprot==0.9.6
python-dateutil==2.5.3
python-lambda-local==0.1.2
requests==2.3.0
requests-futures==0.9.7
simplejson==3.8.2
six==1.10.0
virtualenv==15.0.1
When I install feedparser from git it works. If I install version 5.2.0 from PyPI, it fails with invalid Python 3 syntax.
If a connection is open to a server that does not send any data, the connection will be kept open indefinitely. It would be good to have a timeout parameter that is passed down to the urllib2 library.
Thank you
The parser's result depends on the order of description and content:encoded.
A test case is below.
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>title</title>
<link>http://www.example.com/</link>
<item>
<title>title2</title>
<description>hoge</description>
<content:encoded><![CDATA[
fuga
]]></content:encoded>
<link>http://example.com/2.html</link>
</item>
<item>
<title>title1</title>
<content:encoded><![CDATA[
fuga
]]></content:encoded>
<description>hoge</description>
<link>http://example.com/1.html</link>
</item>
</channel>
</rss>
The two entries above have identical description and content:encoded elements; only the order differs.
But the results are not the same:
In [4]: a.entries[0].content
Out[4]: [{'base': '', 'language': None, 'type': 'text/html', 'value': 'fuga'}]
In [6]: a.entries[1].content
Out[6]:
[{'base': '', 'language': None, 'type': 'text/html', 'value': 'fuga'},
{'base': '', 'language': None, 'type': 'text/plain', 'value': 'hoge'}]
In [5]: a.entries[0].description
Out[5]: 'hoge'
In [7]: a.entries[1].description
Out[7]: 'fuga'
This seems to happen because
(1) content is copied to summary
https://github.com/kurtmckee/feedparser/blob/develop/feedparser/namespaces/_base.py#L482-L483
(2) summary is set to content
https://github.com/kurtmckee/feedparser/blob/develop/feedparser/namespaces/_base.py#L428-L430
Since this behavior depends on element order, it seems strange to me.
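Until the ordering is made consistent, application code can pick the body defensively instead of trusting `description` alone. A sketch under the assumption that the HTML item in `content` is the one wanted; the two dicts mirror the outputs shown in this report, and `html_body` is a hypothetical helper.

```python
def html_body(entry):
    """Prefer the text/html item in entry['content'], else fall back to description."""
    for item in entry.get("content", []):
        if item.get("type") == "text/html":
            return item.get("value")
    return entry.get("description")


# Stand-ins shaped like the two parsed entries in the report.
entry_0 = {
    "content": [{"type": "text/html", "value": "fuga"}],
    "description": "hoge",
}
entry_1 = {
    "content": [
        {"type": "text/html", "value": "fuga"},
        {"type": "text/plain", "value": "hoge"},
    ],
    "description": "fuga",
}

print(html_body(entry_0), html_body(entry_1))
```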
{'feed': {}, 'entries': [], 'bozo': 1, 'bozo_exception': URLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)'),)}
It worked a few days ago, but now it doesn't.
feedparser.parse('https://habrahabr.ru/rss/feed/posts/5d0c9b4397559e2e7cb380b29ec8151b/')
Hi,
For the feed "http://www.feedbooks.com/books/top.atom?category=FBHIS000000&lang=en&range=week" the feed source is
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:opds="http://opds-spec.org/2010/catalog" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:odl="http://opds-spec.org/odl" xml:lang="en" xmlns:app="http://www.w3.org/2007/app">
<id>http://www.feedbooks.com/books/top.atom?category=FBHIS000000&lang=en&range=week</id>
<title>History</title>
<updated>2015-07-06T15:41:14Z</updated>
<icon>http://assets1.feedbooks.net/images/favicon.ico?t=1436193026</icon>
<author>
<name>Feedbooks</name>
<uri>http://www.feedbooks.com</uri>
<email>[email protected]</email>
</author>
<link type="application/atom+xml; profile=opds-catalog; kind=acquisition" title="Most Popular" href="http://www.feedbooks.com/books/top.atom?category=FBHIS000000&lang=en&range=week" rel="self"/>
<link type="application/atom+xml;profile=opds-catalog;kind=navigation" title="Home" href="http://www.feedbooks.com/catalog.atom" rel="start"/>
<link type="application/opensearchdescription+xml" title="Search on Feedbooks" href="http://www.feedbooks.com/opensearch.xml" rel="search"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="Bookshelf" href="https://www.feedbooks.com/user/bookshelf.atom" rel="http://opds-spec.org/shelf"/>
<opensearch:totalResults>41</opensearch:totalResults>
<opensearch:itemsPerPage>20</opensearch:itemsPerPage>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="Next Page" href="http://www.feedbooks.com/books/top.atom?category=FBHIS000000&lang=en&page=2&protection=false" rel="next"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="Recently Added" href="http://www.feedbooks.com/books/recent.atom?category=FBHIS000000&lang=en&protection=false" rel="http://opds-spec.org/sort/new"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="History by country" href="/books/top.atom?category=FBHIS000000N&lang=en&protection=false" opds:facetGroup="In category" rel="http://opds-spec.org/facet" thr:count="4"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="English" href="/books/top.atom?category=FBHIS000000&lang=en&protection=false" opds:facetGroup="Language" rel="http://opds-spec.org/facet" thr:count="41" opds:activeFacet="true"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="French" href="/books/top.atom?category=FBHIS000000&lang=fr&protection=false" opds:facetGroup="Language" rel="http://opds-spec.org/facet" thr:count="11"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="German" href="/books/top.atom?category=FBHIS000000&lang=de&protection=false" opds:facetGroup="Language" rel="http://opds-spec.org/facet" thr:count="1"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="Spanish" href="/books/top.atom?category=FBHIS000000&lang=es&protection=false" opds:facetGroup="Language" rel="http://opds-spec.org/facet" thr:count="10"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="Italian" href="/books/top.atom?category=FBHIS000000&lang=it&protection=false" opds:facetGroup="Language" rel="http://opds-spec.org/facet" thr:count="6"/>
<entry>
<title>The Prince</title>
<id>http://www.feedbooks.com/book/94</id>
<author>
<name>Niccolò Machiavelli</name>
<uri>http://www.feedbooks.com/author/36</uri>
</author>
<published>2007-01-02T19:44:01Z</published>
<updated>2015-03-06T16:41:55Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1513</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FSHUM000000N" label="Human Science"/>
<category scheme="http://www.feedbooks.com/categories" term="FBPHI000000" label="Philosophy"/>
<category scheme="http://www.feedbooks.com/categories" term="FBSOC000000" label="Social science"/>
<category scheme="http://www.feedbooks.com/categories" term="FBPOL000000" label="Political science"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000N" label="History by country"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036000" label="United States"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036020N" label="Other"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS027000" label="Military"/>
<summary>Il Principe (The Prince) is a political treatise by the Florentine public servant and political theorist Niccolò Machiavelli. Originally called De Principatibus (About Principalities), it was written in 1513, but not published until 1532, five yea...</summary>
<dcterms:extent>32,174 words</dcterms:extent>
<dcterms:source>Wikisource</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/94" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/94.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/94.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/94.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/94.jpg?size=large&t=1425660115" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/94.jpg?t=1425660115" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/94.atom" rel="alternate"/>
</entry>
<entry>
<title>The Code of Hammurabi</title>
<id>http://www.feedbooks.com/book/4239</id>
<author>
<name>Hammurabi</name>
<uri>http://www.feedbooks.com/author/1216</uri>
</author>
<published>2009-09-23T07:29:12Z</published>
<updated>2015-03-06T16:57:11Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>-1790</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBSOC000000" label="Social science"/>
<category scheme="http://www.feedbooks.com/categories" term="FBPOL000000" label="Political science"/>
<summary>The Code of Hammurabi (Codex Hammurabi) is a well-preserved ancient law code, created ca. 1790 BC (middle chronology) in ancient Babylon. It was enacted by the sixth Babylonian king, Hammurabi. One nearly complete example of the Code survives toda...</summary>
<dcterms:extent>6,390 words</dcterms:extent>
<dcterms:source>http://oll.libertyfund.org/index.php?option=com_content&task=view&id=1472&Itemid=264</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/4239" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/4239.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/4239.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/4239.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4239.jpg?size=large&t=1425661031" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4239.jpg?t=1425661031" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/4239.atom" rel="alternate"/>
</entry>
<entry>
<title>Life On The Mississippi</title>
<id>http://www.feedbooks.com/book/4313</id>
<author>
<name>Mark Twain</name>
<uri>http://www.feedbooks.com/author/24</uri>
</author>
<published>2009-10-11T12:21:18Z</published>
<updated>2015-03-06T16:57:26Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1883</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHUM000000" label="Humor"/>
<category scheme="http://www.feedbooks.com/categories" term="FBTRV000000" label="Travel"/>
<summary>Life on the Mississippi is a memoir by Mark Twain detailing his days as a steamboat pilot on the Mississippi River before and after the American Civil War. The book begins with a brief history of the river. It continues with anecdotes of Twain's t...</summary>
<dcterms:extent>143,742 words</dcterms:extent>
<dcterms:source>Project Gutenberg</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/4313" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/4313.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/4313.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/4313.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4313.jpg?size=large&t=1425661046" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4313.jpg?t=1425661046" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/4313.atom" rel="alternate"/>
</entry>
<entry>
<title>The Diary of a U-boat Commander</title>
<id>http://www.feedbooks.com/book/4208</id>
<author>
<name>Sir William Stephen Richard King-Hall</name>
<uri>http://www.feedbooks.com/author/1207</uri>
</author>
<published>2009-09-16T18:08:47Z</published>
<updated>2015-06-30T19:33:05Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1918</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBBIO000000" label="Biography & autobiography"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000N" label="History by country"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036000" label="United States"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036020N" label="Other"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS027000" label="Military"/>
<summary>The diary of a World War One U-Boat commander. As well as being a fascinating glimpse of life on the German U-boats during the intense submarine blockade, this also reminds us there were humans involved - on both sides of the action - as we read t...</summary>
<dcterms:extent>48,813 words</dcterms:extent>
<dcterms:source>http://www.gutenberg.org/etext/7947</dcterms:source>
<rights>This work was published before 1923 and is in the public domain in the USA only.</rights>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/4208" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/4208.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/4208.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/4208.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4208.jpg?size=large&t=1435692785" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4208.jpg?t=1435692785" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/4208.atom" rel="alternate"/>
</entry>
<entry>
<title>The Federalist Papers</title>
<id>http://www.feedbooks.com/book/2674</id>
<author>
<name>Publius</name>
<uri>http://www.feedbooks.com/author/491</uri>
</author>
<published>2008-07-20T12:04:13Z</published>
<updated>2015-03-26T16:55:45Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1787</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBSOC000000" label="Social science"/>
<category scheme="http://www.feedbooks.com/categories" term="FBPOL000000" label="Political science"/>
<summary>The Federalist Papers are a series of 85 articles advocating the ratification of the United States Constitution. Seventy-seven of the essays were published serially in The Independent Journal and The New York Packet between October 1787 and August...</summary>
<dcterms:extent>189,954 words</dcterms:extent>
<dcterms:source>http://www.foundingfathers.info/federalistpapers/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/2674" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/2674.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/2674.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/2674.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2674.jpg?size=large&t=1427388945" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2674.jpg?t=1427388945" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/2674.atom" rel="alternate"/>
</entry>
<entry>
<title>The Borgias</title>
<id>http://www.feedbooks.com/book/1248</id>
<author>
<name>Alexandre Dumas</name>
<uri>http://www.feedbooks.com/author/25</uri>
</author>
<published>2007-06-21T22:26:06Z</published>
<updated>2015-03-06T16:46:17Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1840</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<summary>No Description Available</summary>
<dcterms:extent>83,323 words</dcterms:extent>
<dcterms:source>http://gutenberg.org</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/1248" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/1248.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/1248.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/1248.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/1248.jpg?size=large&t=1425660377" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/1248.jpg?t=1425660377" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/1248.atom" rel="alternate"/>
</entry>
<entry>
<title>Henry V</title>
<id>http://www.feedbooks.com/book/3029</id>
<author>
<name>William Shakespeare</name>
<uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-29T19:15:04Z</published>
<updated>2015-03-06T16:52:59Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1599</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>Henry V is a history play by William Shakespeare, believed to be written in 1599. It is based on the life of King Henry V of England, and focuses on events immediately before and after the Battle of Agincourt during the Hundred Years' War.
The pl...</summary>
<dcterms:extent>27,188 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3029" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3029.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3029.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3029.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3029.jpg?size=large&t=1425660779" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3029.jpg?t=1425660779" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3029.atom" rel="alternate"/>
</entry>
<entry>
<title>King John</title>
<id>http://www.feedbooks.com/book/3038</id>
<author>
<name>William Shakespeare</name>
<uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-30T22:10:55Z</published>
<updated>2015-03-06T16:53:01Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1595</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>The Life and Death of King John, a history play by William Shakespeare, dramatizes the reign of King John of England (ruled 1199–1216), son of Henry II of England and Eleanor of Aquitaine and father of Henry III of England. It is believed to have ...</summary>
<dcterms:extent>21,524 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3038" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3038.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3038.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3038.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3038.jpg?size=large&t=1425660781" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3038.jpg?t=1425660781" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3038.atom" rel="alternate"/>
</entry>
<entry>
<title>Richard III</title>
<id>http://www.feedbooks.com/book/3045</id>
<author>
<name>William Shakespeare</name>
<uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-10-01T10:25:03Z</published>
<updated>2015-03-06T16:53:02Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1591</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>Richard III is a history play by William Shakespeare, believed to have been written in approximately 1591. The play is an unflattering depiction of the short reign of Richard III of England. While generally classified as a history, as grouped in t...</summary>
<dcterms:extent>31,087 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3045" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3045.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3045.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3045.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3045.jpg?size=large&t=1425660782" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3045.jpg?t=1425660782" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3045.atom" rel="alternate"/>
</entry>
<entry>
<title>Henry VIII</title>
<id>http://www.feedbooks.com/book/3040</id>
<author>
<name>William Shakespeare</name>
<uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-10-01T07:31:51Z</published>
<updated>2015-03-06T16:53:01Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1603</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>The Famous History of the Life of King Henry the Eighth is a history play by William Shakespeare, based on the life of Henry VIII of England. An alternative title, All is True, is recorded in contemporary documents, the title Henry VIII not appear...</summary>
<dcterms:extent>25,710 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3040" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3040.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3040.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3040.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3040.jpg?size=large&t=1425660781" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3040.jpg?t=1425660781" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3040.atom" rel="alternate"/>
</entry>
<entry>
<title>Henry VI, Part 1</title>
<id>http://www.feedbooks.com/book/3033</id>
<author>
<name>William Shakespeare</name>
<uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-30T09:51:34Z</published>
<updated>2015-03-06T16:53:00Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1590</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>The First Part of King Henry the Sixth is a history play by William Shakespeare, believed written in approximately 1588–1590. It is the first in the cycle of four plays often referred to as "The First Tetralogy".</summary>
<dcterms:extent>22,578 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3033" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3033.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3033.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3033.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3033.jpg?size=large&t=1425660780" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3033.jpg?t=1425660780" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3033.atom" rel="alternate"/>
</entry>
<entry>
<title>The Capture of a Slaver</title>
<id>http://www.feedbooks.com/book/6723</id>
<author>
<name>John Taylor Wood</name>
<uri>http://www.feedbooks.com/author/2180</uri>
</author>
<published>2013-04-11T08:21:04Z</published>
<updated>2015-03-06T17:06:08Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1900</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000N" label="History by country"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036000" label="United States"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036010N" label="Historical period"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS006010" label="Pre-Confederation (to 1867)"/>
<summary>A true personal account of the capture of a slave-running ship by a United States gunship in the fleet assigned for the suppression of the slave trade. It is told in 1900 by John Taylor Wood, who, 50 years earlier, had been a young midshipman on ...</summary>
<dcterms:extent>8,368 words</dcterms:extent>
<dcterms:source>University of Virginia Library http://etext.lib.virginia.edu/toc/modeng/public/WooCapt.html</dcterms:source>
<rights>Attribution Non-Commercial Share Alike (cc by-nc-sa)</rights>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/6723" rel="alternate"/>
<link type="text/html" title="Creative Commons" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" rel="license"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/6723.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/6723.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/6723.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/6723.jpg?size=large&t=1425661568" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/6723.jpg?t=1425661568" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/6723.atom" rel="alternate"/>
</entry>
<entry>
<title>Glimpses of Unfamiliar Japan, Vol 1</title>
<id>http://www.feedbooks.com/book/2056</id>
<author>
<name>Lafcadio Hearn</name>
<uri>http://www.feedbooks.com/author/286</uri>
</author>
<published>2007-12-15T15:48:19Z</published>
<updated>2015-06-30T18:19:44Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1871</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBTRV000000" label="Travel"/>
<summary>A Japanese magic-lantern show is essentially dramatic. It is a play of which the dialogue is uttered by invisible personages, the actors and the scenery being only luminous shadows. Wherefore it is peculiarly well suited to goblinries and weirdnes...</summary>
<dcterms:extent>95,032 words</dcterms:extent>
<dcterms:source>http://www.gutenberg.org/dirs/etext05/8glm110.txt</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/2056" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/2056.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/2056.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/2056.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2056.jpg?size=large&t=1435688384" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2056.jpg?t=1435688384" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/2056.atom" rel="alternate"/>
</entry>
<entry>
<title>Saint Joan</title>
<id>http://www.feedbooks.com/book/3255</id>
<author>
<name>George Bernard Shaw</name>
<uri>http://www.feedbooks.com/author/749</uri>
</author>
<published>2008-10-25T19:25:19Z</published>
<updated>2015-03-06T16:53:37Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1923</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>Saint Joan is a 1923 play by Irish playwright George Bernard Shaw depicting the life of Joan of Arc.</summary>
<dcterms:extent>36,740 words</dcterms:extent>
<dcterms:source>http://gutenberg.net.au/ebooks02/0200811h.html</dcterms:source>
<rights>This work is available for countries where copyright is Life+50 or in the USA (published before 1923).</rights>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3255" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3255.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3255.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3255.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3255.jpg?size=large&t=1425660817" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3255.jpg?t=1425660817" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3255.atom" rel="alternate"/>
</entry>
<entry>
<title>The Story of the Pony Express</title>
<id>http://www.feedbooks.com/book/6666</id>
<author>
<name>Glenn Danford Bradley</name>
<uri>http://www.feedbooks.com/author/2149</uri>
</author>
<published>2013-03-07T09:35:53Z</published>
<updated>2015-03-06T17:05:53Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1913</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000N" label="History by country"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036000" label="United States"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036010N" label="Historical period"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036050" label="Civil War Period (1850-1877)"/>
<summary>An account of the most remarkable mail service ever in existence, and its place in history.
The Pony Express was the first rapid transit and the first fast mail line across the North American continent from the Missouri River to the Pacific Coa...</summary>
<dcterms:extent>24,819 words</dcterms:extent>
<dcterms:source>Project Gutenberg Australia http://gutenberg.net.au/ebooks/w00112.html</dcterms:source>
<rights>Attribution Non-Commercial Share Alike (cc by-nc-sa)</rights>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/6666" rel="alternate"/>
<link type="text/html" title="Creative Commons" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" rel="license"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/6666.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/6666.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/6666.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/6666.jpg?size=large&t=1425661553" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/6666.jpg?t=1425661553" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/6666.atom" rel="alternate"/>
</entry>
<entry>
<title>Henry IV, Part 1</title>
<id>http://www.feedbooks.com/book/3023</id>
<author>
<name>William Shakespeare</name>
<uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-28T12:27:44Z</published>
<updated>2015-03-06T16:52:59Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1597</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>Henry IV, Part 1 is a history play by William Shakespeare, believed to have been written no later than 1597. It is the second of Shakespeare's tetralogy that deals with the successive reigns of Richard II, Henry IV (2 plays), and Henry V. Henry IV...</summary>
<dcterms:extent>25,762 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3023" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3023.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3023.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3023.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3023.jpg?size=large&t=1425660779" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3023.jpg?t=1425660779" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3023.atom" rel="alternate"/>
</entry>
<entry>
<title>Richard II</title>
<id>http://www.feedbooks.com/book/3024</id>
<author>
<name>William Shakespeare</name>
<uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-28T13:25:10Z</published>
<updated>2015-03-06T16:52:59Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1595</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>King Richard the Second is a history play by William Shakespeare believed to be written in approximately 1595. It is based on the life of King Richard II of England and is the first part of a tetralogy, referred to by scholars as the Henriad, foll...</summary>
<dcterms:extent>23,655 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3024" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3024.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3024.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3024.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3024.jpg?size=large&t=1425660779" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3024.jpg?t=1425660779" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3024.atom" rel="alternate"/>
</entry>
<entry>
<title>Glimpses of Unfamiliar Japan, Vol 2</title>
<id>http://www.feedbooks.com/book/2057</id>
<author>
<name>Lafcadio Hearn</name>
<uri>http://www.feedbooks.com/author/286</uri>
</author>
<published>2007-12-15T18:49:36Z</published>
<updated>2015-03-06T16:49:24Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1894</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBTRV000000" label="Travel"/>
<summary>No Description Available</summary>
<dcterms:extent>98,578 words</dcterms:extent>
<dcterms:source>http://www.gutenberg.org/dirs/etext05/8glm210.txt</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/2057" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/2057.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/2057.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/2057.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2057.jpg?size=large&t=1425660564" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2057.jpg?t=1425660564" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/2057.atom" rel="alternate"/>
</entry>
<entry>
<title>Ali Pacha</title>
<id>http://www.feedbooks.com/book/1247</id>
<author>
<name>Alexandre Dumas</name>
<uri>http://www.feedbooks.com/author/25</uri>
</author>
<published>2007-06-21T22:19:02Z</published>
<updated>2015-03-06T16:46:16Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1840</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<summary>No Description Available</summary>
<dcterms:extent>43,226 words</dcterms:extent>
<dcterms:source>http://gutenberg.org</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/1247" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/1247.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/1247.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/1247.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/1247.jpg?size=large&t=1425660376" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/1247.jpg?t=1425660376" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/1247.atom" rel="alternate"/>
</entry>
<entry>
<title>Henry VI, Part 2</title>
<id>http://www.feedbooks.com/book/3034</id>
<author>
<name>William Shakespeare</name>
<uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-30T10:15:12Z</published>
<updated>2015-03-06T16:53:01Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1591</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>The Second Part of King Henry the Sixth, or Henry VI, Part 2, is a history play by William Shakespeare believed written in approximately 1590-91. It is the second part of the trilogy on Henry VI, and often grouped together with Richard III as a te...</summary>
<dcterms:extent>26,527 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3034" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3034.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3034.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3034.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3034.jpg?size=large&t=1425660781" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3034.jpg?t=1425660781" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3034.atom" rel="alternate"/>
</entry>
</feed>
But when parsed through feedparser, the generated published_parsed is:
>>> import feedparser
>>> feedparser.__version__
'5.2.0'
>>> url = 'http://www.feedbooks.com/books/top.atom?category=FBHIS000000&lang=en&range=week'
>>> feed_data = feedparser.parse(url)
>>> entries = feed_data.entries
>>> entries[0]['published_parsed']
time.struct_time(tm_year=1513, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
#Similar for all the entries:
>>> for i in entries: print i['published_parsed']
...
time.struct_time(tm_year=1513, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=2015, tm_mon=6, tm_mday=28, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=179, tm_isdst=1)
time.struct_time(tm_year=1883, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1918, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1787, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1840, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1599, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1595, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1591, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1603, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1590, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1900, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1871, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1923, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1913, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1597, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1595, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1894, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1840, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1591, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=1, tm_isdst=0)
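The years in the output above line up with each entry's dcterms:issued value (1595 for Richard II, 1894 for Glimpses of Unfamiliar Japan, Vol 2) rather than with its published timestamp. As a cross-check, here is a minimal sketch that reads the two fields separately from the raw XML using only the standard library, not feedparser. It assumes the dcterms prefix is bound to http://purl.org/dc/terms/ (the feed's root element is not shown here) and uses a trimmed copy of the Richard II entry.

```python
import time
import xml.etree.ElementTree as ET

# Assumption: the feed binds "dcterms" to the Dublin Core terms namespace.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dcterms": "http://purl.org/dc/terms/",
}

# Trimmed copy of one entry from the feed quoted above.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:dcterms="http://purl.org/dc/terms/">
  <title>Richard II</title>
  <published>2008-09-28T13:25:10Z</published>
  <dcterms:issued>1595</dcterms:issued>
</entry>"""

entry = ET.fromstring(ENTRY)

# atom:published is the catalogue timestamp.
published = time.strptime(
    entry.findtext("atom:published", namespaces=NS), "%Y-%m-%dT%H:%M:%SZ"
)

# dcterms:issued is the year the work itself was issued; keep it as text.
issued = entry.findtext("dcterms:issued", namespaces=NS)

print(published.tm_year, issued)
```

Keeping the two values in separate variables avoids one overwriting the other, which is what the output above suggests is happening.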
Hello and thanks for the great work.
This date: Wed, 01 Jul 15 00:00:00 +0200, which is a valid RFC822 date, returns this tuple: [200, 7, 1, 15, 0, 0, 1, 182, 0], where the year is obviously wrong.
I noticed it with this feed: http://www.cerveauetpsycho.fr/ewb_pages/f/flux_rss_general_cp.xml
I'm using the latest version of feedparser.
Thank you.
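For comparison, the standard library's email.utils.parsedate_tz expands two-digit years on its own. The short sketch below is offered only as a cross-check of what the date should parse to; it is not feedparser's own code path.

```python
from email.utils import parsedate_tz

# The date string from the report, with a two-digit year.
parsed = parsedate_tz("Wed, 01 Jul 15 00:00:00 +0200")

year, month, day = parsed[:3]
utc_offset_seconds = parsed[9]  # last item of the tuple is the UTC offset

print(year, month, day, utc_offset_seconds)
```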
Traceback (most recent call last):
File "/usr/bin/pip3", line 9, in <module>
load_entry_point('pip==1.5.6', 'console_scripts', 'pip3')()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 558, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2682, in load_entry_point
return ep.load()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2355, in load
return self.resolve()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2361, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/usr/lib/python3/dist-packages/pip/__init__.py", line 74, in <module>
from pip.vcs import git, mercurial, subversion, bazaar # noqa
File "/usr/lib/python3/dist-packages/pip/vcs/mercurial.py", line 9, in <module>
from pip.download import path_to_url
File "/usr/lib/python3/dist-packages/pip/download.py", line 25, in <module>
from requests.compat import IncompleteRead
ImportError: cannot import name 'IncompleteRead'
System ubuntu 15.04
Python 3.5.0
Requests 2.8.1
I want to use "media:thumbnail" like "enclosures".
I think that mixin's _itsAnHrefDamnIt method should be used in this case.
Take the following podcast RSS: http://feeds.serialpodcast.org/serialpodcast
It has both an itunes:subtitle and a description for the feed element. FeedParser only ever returns the itunes:subtitle, even when attempting to access feed.description.
I think that feed.description should have channel/description take precedence over the itunes subtitle. We have a separate feed.subtitle already.
I'm trying to validate the URL. For example, if I pass feedparser.parse('http://rohitkhatri.com'), it should give me false.
Is there any property I can access to get this information?
Hi,
I'm trying to parse an RSS 2.0 feed:
http://www.legrandmix.com/data/rss.xml
It passes the W3C validator:
https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.legrandmix.com%2Fdata%2Frss.xml
But I get an error with feedparser:
SAXParseException('not well-formed (invalid token)'
I'm using the latest version of feedparser (5.2.1) on Python 2.7.10+, on Debian Linux.
Thanks for your help !
% python3
Python 3.5.1 (default, Mar 4 2016, 15:21:15)
[GCC 6.0.0 20160302 (Red Hat 6.0.0-0.14)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> url = 'http://api.videos.ndtv.com/apis/podcast/index/client_key/ndtv-podcast-5d35e3e34a92df17d11d54e0ff241e8b?shows=503&showfull=1&media_type=audio&extra_params=keywords,description'
>>> feedparser.parse(url)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/site-packages/feedparser.py", line 3957, in parse
saxparser.parse(source)
File "/usr/lib64/python3.5/site-packages/drv_libxml2.py", line 190, in parse
_d(reader.LocalName()))
File "/usr/lib64/python3.5/site-packages/drv_libxml2.py", line 70, in _d
return _decoder(s)[0]
File "/usr/lib64/python3.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
TypeError: a bytes-like object is required, not 'str'
>>> feedparser.__version__
'5.2.0'
>>>
According to the documentation, <img src=""> is treated as a URI and its link is resolved.
But I found a case where it doesn't work: the Atom feed at http://jvns.ca/atom.xml
> import feedparser
> feedparser.__version__
'5.2.1'
> feedparser.RESOLVE_RELATIVE_URIS
1
> feed = feedparser.parse('http://jvns.ca/atom.xml')
> # This entry has img with relative URI
> e = feed['entries'][4]
> # Get HTML content
> content = e['content'][0]['value']
> # Find and show <img>
> i = content.find('img src')
> print(content[i-1:(i+50)])
<img src="/images/ml-feelings.jpg" width="300px" />
As you can see, the image source remained relative.
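A minimal sketch of the expected resolution, using urllib.parse.urljoin from the standard library. It assumes the feed URL is the base for resolution; an xml:base attribute or the entry's own link could change the base in a real feed.

```python
from urllib.parse import urljoin

# Assumption: the feed URL acts as the base URI for the entry content.
feed_url = "http://jvns.ca/atom.xml"
relative_src = "/images/ml-feelings.jpg"

resolved = urljoin(feed_url, relative_src)
print(resolved)
```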
feed: https://xueqiu.com/hots/topic/rss
>>> import feedparser
>>> f=feedparser.parse('https://xueqiu.com/hots/topic/rss')
>>> f.entries[0]
{'summary_detail': {'base': u'https://xueqiu.com/hots/topic/rss', 'type': u'text/html', 'value': u'', 'language': None}, 'published_parsed': time.struct_time(tm_year=2016, tm_mon=11, tm_mday=23, tm_hour=3, tm_min=16, tm_sec=41, tm_wday=2, tm_yday=328, tm_isdst=0), 'links': [{'href': u'http://xueqiu.com/4465952737/77946541', 'type': u'text/html', 'rel': u'alternate'}], 'title': u'\u9009\u80a1\u601d\u8def\u5206\u4eab\uff0c\u54ea\u4e9b\u4e8b\u60c5\u8981\u7559\u610f\uff1f', 'authors': [{'name': u'\u5f90\u51e4\u4fca'}], 'updated': u'2016-11-23T03:16:41Z', 'summary': u'', 'content': [{'base': u'https://xueqiu.com/hots/topic/rss', 'type': u'text/html', 'value': u'', 'language': None}], 'guidislink': False, 'title_detail': {'base': u'https://xueqiu.com/hots/topic/rss', 'type': u'text/plain', 'value': u'\u9009\u80a1\u601d\u8def\u5206\u4eab\uff0c\u54ea\u4e9b\u4e8b\u60c5\u8981\u7559\u610f\uff1f', 'language': None}, 'link': u'http://xueqiu.com/4465952737/77946541', 'author': u'\u5f90\u51e4\u4fca', 'published': u'Wed, 23 Nov 2016 03:16:41 GMT', 'author_detail': {'name': u'\u5f90\u51e4\u4fca'}, 'id': u'http://xueqiu.com/4465952737/77946541', 'updated_parsed': time.struct_time(tm_year=2016, tm_mon=11, tm_mday=23, tm_hour=3, tm_min=16, tm_sec=41, tm_wday=2, tm_yday=328, tm_isdst=0)}
Hi,
I'm using rss2email by @wking and I ran some measurements to find the bottleneck because I found rss2email a bit slow.
I used line_profiler to measure the CPU time.
From my investigations, I noticed that
_build_urllib2_request
urllib.request.Request(url)
and 50% by _parse_date(modified)
Clearly, this function takes a lot of time and can probably be optimized. I leave this note as an open question for suggestions from experts.
From my feeds, rfc822 seems to be mostly used.
Thank you.
Date in RSS Feed
Thu, 30 Apr 2015 08:57:00 MEST
Actual published.parsed
time.struct_time(tm_year=2015, tm_mon=4, tm_mday=30, tm_hour=20, tm_min=57, tm_sec=0, tm_wday=3, tm_yday=120, tm_isdst=0)
Expected published.parsed
time.struct_time(tm_year=2015, tm_mon=4, tm_mday=30, tm_hour=06, tm_min=57, tm_sec=0, tm_wday=3, tm_yday=120, tm_isdst=0)
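MEST (Middle European Summer Time) is UTC+2, which is where the expected hour of 6 comes from. Below is a small sketch of that conversion using only the standard library. The EXTRA_ZONES table and the parse_to_utc helper are hypothetical names introduced for illustration and are not part of feedparser.

```python
import time
from email.utils import mktime_tz, parsedate_tz

# Hypothetical lookup table for zone names that RFC 822 does not define.
EXTRA_ZONES = {"MEST": "+0200"}


def parse_to_utc(date_string):
    """Parse an RFC 822 style date into a UTC struct_time."""
    parts = date_string.split()
    # Swap a known zone abbreviation for its numeric offset before parsing.
    if parts and parts[-1] in EXTRA_ZONES:
        parts[-1] = EXTRA_ZONES[parts[-1]]
    parsed = parsedate_tz(" ".join(parts))
    if parsed is None:
        raise ValueError("unparseable date: %r" % date_string)
    # mktime_tz applies the offset and returns a UTC timestamp.
    return time.gmtime(mktime_tz(parsed))


result = parse_to_utc("Thu, 30 Apr 2015 08:57:00 MEST")
print(result.tm_hour, result.tm_min)
```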
In the following feed, each item has an "image" attribute. Any chance you can use feedparser to access this element?
Kind regards,
pieter
When I try to parse the PBS NewsHour feed, the parser doesn't recognize all items as entries, and some of them don't have fields like description even though the XML contains them.
Trying to use this on python3 and I'm getting this?
feedparser.py", line 1353
ur'''(([a-zA-Z0-9_-.+]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([a-zA-Z0-9-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(]?))(?subject=\S+)?''',
^
SyntaxError: invalid syntax
the error is pointing to the last set of ''' before the comma.
I have played with this for a few hours, splitting up the line, turning it into a string, etc., and no matter what, it doesn't like that line.
Thanks
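The ur string prefix exists only in Python 2. Python 3 accepts r'' and u'' but rejects the combination, which is consistent with the caret position in the report. A minimal check with the built-in compile(), using a shortened stand-in pattern rather than the real one from feedparser.py:

```python
def compiles(source):
    """Return True if the Python 3 compiler accepts the expression."""
    try:
        compile(source, "<sketch>", "eval")
    except SyntaxError:
        return False
    return True


# Stand-in literals; only the prefix differs.
ur_prefix_ok = compiles("ur'''abc'''")
r_prefix_ok = compiles("r'''abc'''")

print(ur_prefix_ok, r_prefix_ok)
```

If that is the cause here, the file being imported was written for Python 2 and has not been converted for Python 3.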
The docs here recommend installing a couple of packages from http://cjkpython.i18n.org/
i18n.org has been gone for long enough that Google's last cached copy of it was a godaddy hosting page.
So, I guess we need to figure out what the current recommended guidance here is and then update the docs.
Take the following Atom feeds:
Explicit Text Type:
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">Example <b>Atom</b></title>
</feed>
Implicit Text Type:
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Example <b>Atom</b></title>
</feed>
Feedparser will return the following for Title:
Example <b>Atom</b>
However, the Atom spec states that items with the type="text" attribute should be left as is. Indeed, this is the major difference between type="text" and type="html", where type="html" is explicit about containing escaped markup that should be decoded. If no type attribute exists, the element defaults to type="text".
So, to me, it seems as though feedparser is incorrectly entity-decoding Atom text elements. I think the correct logic would be something like (pseudo code):
if (feedType == "atom")
    if (elem.Type == "" || elem.Type == "text")
        return elem.CharData
    else if (elem.Type == "html")
        return decodeEntities(elem.CharData)
    else if (elem.Type == "xhtml")
        // ... Handle xhtml
I've checked the output of SimplePie, another feed parser, and it indeed returns what I consider the expected output for both the quoted feeds above:
Example <b>Atom</b>
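One way to see the text/html distinction is to render the title for HTML output. The sketch below uses only the standard library and assumes the title markup is entity-escaped in the feed source (written as &lt;b&gt;). The title_as_html helper is a hypothetical name, and this is an illustration of the type attribute's effect, not feedparser's or SimplePie's implementation.

```python
import html
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def title_as_html(feed_xml):
    """Return the feed title as a string that is safe to place in HTML."""
    title = ET.fromstring(feed_xml).find(ATOM + "title")
    kind = title.get("type", "text")  # a missing type defaults to "text"
    value = title.text or ""  # the XML parser has already undone XML escaping
    if kind == "text":
        # Plain text: the characters are literal, so escape them for HTML.
        return html.escape(value, quote=False)
    if kind == "html":
        # Escaped markup: the decoded value is already HTML.
        return value
    raise NotImplementedError("xhtml titles are not handled in this sketch")


# Assumed feed source with entity-escaped markup in the title.
SOURCE = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<title{attr}>Example &lt;b&gt;Atom&lt;/b&gt;</title>"
    "</feed>"
)

implicit_text = title_as_html(SOURCE.format(attr=""))
explicit_text = title_as_html(SOURCE.format(attr=' type="text"'))
as_html = title_as_html(SOURCE.format(attr=' type="html"'))

print(implicit_text)
print(as_html)
```

Under these assumptions the implicit and explicit text titles come out identical, and only the html title yields live markup.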
Trying to parse a simple document that consists only of several collections produces 'collection' as a single-entry dictionary with only one URL: the last one in the document.