kurtmckee / feedparser


1.8K 47.0 333.0 6.91 MB

Parse feeds in Python

Home Page: https://feedparser.readthedocs.io/en/latest/

License: Other

Python 100.00%

atom rss json rdf python

feedparser's People

pingviini ubergrape melinath paulswartz korprulu sgillies bendikro perkville hfeeki patrykw jquade rcarmo kmartino sebest slevytam guojing adityaiitd jametong wolph linearregression edwardt slick666 visity pombredanne edavis marado speedplane joulez willowtreeapps raniglas facundobatista annanymouse kevinmarks cgonzalezsanc mstoyano bbeloqui exlsunshine ichdream vjoe martincollignon the7erm anboqing jikamens cglorioso miaobenjun halfword-dev mhcrnl guogguo baymin1217 felixzhang00 zhouyunan wangyichen1064431086 mkusiciel muguruma83 demoup cryptaxe nikolas amin24e sciunto valentin-at-smarch lol4t0 ericeiffel t2be mvdbeek dmwyatt samueldeng mosasiru ayatoy sourcejedi mihajenko tonyvu2014 alexanderpu dachrisch vhf tjtunnell thecrackofdawn peterashwell shenchao0120 opentopic ambier whiteanthrax jdjimmy rlugojr imo jpfrancoia nagyistge karthick6038 terareach zenhack fpcmotif weijarz factr kcobindev jasonoldwoo olivierh59500 brettcannon alex-halden dhensold jtkostman gpstathis

feedparser's Issues

Some characters are not correctly converted to Unicode

I am really bad with encoding/charset stuff, but here’s what I am getting:

Python 2.7.9 (default, Feb 10 2015, 03:28:08) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> title = feedparser.parse('http://www.yeni1tarif.com/feed').entries[9]['title']
>>> title
u'Yufkadan K\xc4\xb1ymal\xc4\xb1 Kol B\xc3\xb6re\xc4\u0178i'
>>> print title
Yufkadan KÄ±ymalÄ± Kol BÃ¶reÄŸi
>>> print title.encode('utf-8')
Yufkadan KÄ±ymalÄ± Kol BÃ¶reÄŸi

However, checking with curl 'http://www.yeni1tarif.com/feed' | grep -i yufkadan shows the correct title: <title>Yufkadan Kıymalı Kol Böreği</title>.

The XML declaration specifies UTF-8 (curl 'http://www.yeni1tarif.com/feed' | head):

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    >

and the server (apparently) sends the correct Content-Type:

curl -v 'http://www.yeni1tarif.com/feed'
*   Trying 94.101.84.135...
* Connected to www.yeni1tarif.com (94.101.84.135) port 80 (#0)
> GET /feed HTTP/1.1
> Host: www.yeni1tarif.com
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: text/xml; charset=UTF-8
< Last-Modified: Wed, 01 Apr 2015 10:30:35 GMT
< ETag: "0c98f3f9ec42f8fd4b207adbc69e7d18"
< Server: Microsoft-IIS/7.5
< X-Powered-By: PHP/5.4.14
< X-Pingback: http://www.yeni1tarif.com/xmlrpc.php
< Date: Thu, 24 Sep 2015 22:30:34 GMT
< Content-Length: 75344
<

I am not sure what I am missing or doing wrong, or whether there is a bug somewhere.

Thank you.
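The garbled title has the typical shape of mojibake: UTF-8 bytes that were decoded as Windows-1252 somewhere along the way (for example, \u0178 'Ÿ' is what cp1252 shows for byte 0x9F, the second byte of 'ğ'). This does not locate where the double decoding happens, but a minimal Python 3 sketch for repairing an already-garbled string, assuming cp1252 was the wrong codec, looks like this (repair_mojibake is a hypothetical helper, not part of feedparser):

```python
def repair_mojibake(text: str) -> str:
    """Reverse UTF-8 bytes that were mistakenly decoded as Windows-1252."""
    try:
        # Re-encode with the wrong codec to recover the original bytes,
        # then decode those bytes as the UTF-8 they really were.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake (or already correct): leave it unchanged.
        return text


garbled = "Yufkadan K\xc4\xb1ymal\xc4\xb1 Kol B\xc3\xb6re\xc4\u0178i"
print(repair_mojibake(garbled))  # Yufkadan Kıymalı Kol Böreği
```

This is a workaround on the output only; a correctly decoded string such as "Kıymalı" fails the cp1252 round trip and is returned untouched.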

opensearch namespace

Would it be possible to add the opensearch namespace?

I don't know where this would go now (or I'd have written a patch), but in the monolithic version the namespace additions might look like this:

'http://a9.com/-/spec/opensearch/1.1/': 'opensearch',

and maybe the older

'http://a9.com/-/spec/opensearchrss/1.0/': 'opensearch',
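Until those URIs are mapped, one workaround is to read the OpenSearch elements straight from the raw XML with the standard library. A minimal sketch, using the two namespace URIs above; opensearch_values is a hypothetical helper, not a feedparser API:

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the request above (OpenSearch 1.1 and the older RSS variant).
OPENSEARCH_NAMESPACES = (
    "http://a9.com/-/spec/opensearch/1.1/",
    "http://a9.com/-/spec/opensearchrss/1.0/",
)


def opensearch_values(xml_text: str) -> dict:
    """Collect opensearch:* elements found anywhere in the document."""
    root = ET.fromstring(xml_text)
    values = {}
    for elem in root.iter():
        # ElementTree reports namespaced tags as "{uri}localname".
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            uri, _, local = elem.tag[1:].partition("}")
            if uri in OPENSEARCH_NAMESPACES:
                values[local] = (elem.text or "").strip()
    return values
```

For an Atom feed containing <opensearch:totalResults>41</opensearch:totalResults>, the result would include {"totalResults": "41"}.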

'str' does not support the buffer interface

I created a simple example:

#!/usr/bin/python3

import feedparser

d = feedparser.parse('http://planet.gnu.org/atom.xml')
print(d['feed']['title'])

I get the following:

Traceback (most recent call last):
  File "./test.py", line 5, in <module>
    d = feedparser.parse('http://planet.gnu.org/atom.xml')
  File "/usr/lib64/python3.4/site-packages/feedparser-5.2.1-py3.4.egg/feedparser/api.py", line 235, in parse
  File "/usr/lib64/python3.4/site-packages/drv_libxml2.py", line 189, in parse
    eltName = (_d(reader.NamespaceUri()),\
  File "/usr/lib64/python3.4/site-packages/drv_libxml2.py", line 70, in _d
    return _decoder(s)[0]
  File "/usr/lib64/python3.4/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: 'str' does not support the buffer interface

This problem does not occur with Python 2.
feedparser 5.2.1

urlopen error SSL/TLS

Some users are reporting this error (bozo_exception) on some servers:

urlopen error [Errno 1] _ssl.c:490: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error

Presumably it's related to changes to handle the Poodle vulnerability, as described in https://github.com/calmh/unifi-api/issues/22

Retain all tag information, even if mapped to a core attribute

This is a design change that I think would be a positive move for feedparser. It is somewhat related to #24 but isn't exactly the same.

Right now, if feedparser is parsing a particular element and that element maps to one of its 'common interface' elements, it is consumed and not accessible individually.

An example of this would be itunes:author. Because this is mapped to the core author field, it is not accessible via feed['itunes_author']. If feedparser's precedence rules make it so that another element also maps to the core author field and has higher precedence, it is impossible to access the itunes:author information.

I think that all tags should be accessible manually and that the mapping to that common interface should be supplementary. It shouldn't throw away any information.

You could do this by making all elements individually accessible like so:

feed['rss:author']
feed['itunes:author']
feed['atom:subtitle']
feed['itunes:subtitle']

You would still be able to access elements via the common interface, feed['author'] or feed['subtitle'], according to the well-documented precedence rules. However, if I, as an application writer, want to, say, ensure that any iTunes element takes precedence over the other items, I can do this myself by specifying the individual elements and bypassing the common interface.

This is similar to the approach https://github.com/danmactough/node-feedparser takes and I think it allows for a lot more flexibility.
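The proposal separates storage from resolution: keep every element under its qualified name, and compute the common-interface value from a precedence list at read time. A minimal sketch of that design; ElementStore and both precedence orders below are hypothetical illustrations, not feedparser's actual rules:

```python
from typing import Dict, Iterable, Optional


class ElementStore:
    """Keeps every element under its qualified name and derives
    'common interface' values from a caller-chosen precedence list."""

    def __init__(self) -> None:
        self.raw: Dict[str, str] = {}

    def add(self, qualified_name: str, value: str) -> None:
        # Nothing is discarded: each element keeps its own key.
        self.raw[qualified_name] = value

    def common(self, precedence: Iterable[str]) -> Optional[str]:
        # The first element in the precedence list that is present wins.
        for name in precedence:
            if name in self.raw:
                return self.raw[name]
        return None


store = ElementStore()
store.add("rss:author", "editor@example.com (Jane Doe)")
store.add("itunes:author", "Jane Doe")
print(store.common(["rss:author", "itunes:author"]))  # library-style default order
print(store.common(["itunes:author", "rss:author"]))  # application prefers iTunes
print(store.raw["itunes:author"])                     # direct access, never lost
```

Because resolution happens on read, a library default and an application override can coexist without either one throwing information away.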

bozo detection

Bozo detection reports my feeds as malformed:

 >>> data = feedparser.parse('https://foxmask.trigger-happy.eu/feeds/all.rss.xml')
 >>> data.bozo
1
 >>> data.bozo_exception
CharacterEncodingOverride('document declared as us-ascii, but parsed as utf-8',)

but chardet parses the data fine and detects the encoding correctly:

 >>> data = urlopen('https://foxmask.trigger-happy.eu/feeds/all.rss.xml').read()
 >>> chardet.detect(data)
{'confidence': 0.99, 'encoding': 'utf-8'}

What can I do to make bozo detection work correctly?
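The exception says the document was declared as us-ascii, while chardet (and feedparser's fallback) find UTF-8, so the bozo flag may be reporting a real mismatch rather than a misdetection. One plausible cause, which I cannot confirm for this server, is RFC 3023: a text/xml response with no charset parameter is treated as us-ascii regardless of the XML declaration. A minimal sketch for checking that locally; declared_charset and body_fits are hypothetical helpers, and BOM/XML-declaration handling is deliberately left out:

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Charset an RFC 3023-style reader would assume from the HTTP header alone."""
    msg = Message()
    msg["content-type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset:
        return charset
    media_type = content_type.split(";", 1)[0].strip().lower()
    # RFC 3023: text/xml (and text/*+xml) without a charset parameter means us-ascii.
    if media_type.startswith("text/") and (
        media_type.endswith("/xml") or media_type.endswith("+xml")
    ):
        return "us-ascii"
    return None  # defer to the BOM / XML declaration (not handled here)


def body_fits(data: bytes, encoding: str) -> bool:
    """True if the raw bytes decode cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except (UnicodeDecodeError, LookupError):
        return False
```

If the header turns out to be a bare text/xml, sending charset=utf-8 (or an application/rss+xml type) from the server would likely make the warning go away.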

Python 3 and Python 2 versions parse a feed differently

This feed:

http://eatcodeplay.com/feed.xml

contains errors. However, the Python 2 version finds several entries, while the Python 3 version finds none.

parsing error with some feeds - unichr() arg not in range(0x10000)

Hi!
I'm receiving a unichr() arg not in range(0x10000) error when parsing some RSS feeds.

My OS/config:
Linux 4.4.0 SMP Mon Jan 11 22:30:29 CST 2016 x86_64,
Slackware 14.2 (current), Python 2.7.11, and feedparser 5.2.1

Below is the output of parse:

diniz@darkstar:~$ python
Python 2.7.11 (default, Dec  6 2015, 14:10:30) 
[GCC 5.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> feedparser.parse('http://feeds.feedburner.com/podcast30min')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/feedparser.py", line 3964, in parse
    feedparser.feed(data.decode('utf-8', 'replace'))
  File "/usr/lib64/python2.7/site-packages/feedparser.py", line 2124, in feed
    sgmllib.SGMLParser.feed(self, data)
  File "/usr/lib64/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/usr/lib64/python2.7/sgmllib.py", line 186, in goahead
    self.handle_charref(name)
  File "/usr/lib64/python2.7/site-packages/feedparser.py", line 734, in handle_charref
    text = unichr(c).encode('utf-8')
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
>>> quit()
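On a narrow Python 2 build, unicode strings are stored as UTF-16, so unichr() cannot return a single character above U+FFFF; such a character has to be built from two surrogates, e.g. unichr(hi) + unichr(lo). The arithmetic is shown below in Python 3 for illustration; to_surrogate_pair is a hypothetical helper, and wiring it into handle_charref is an assumption about how a fix could look:

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    offset = code_point - 0x10000           # 20 bits remain
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return high, low


print([hex(unit) for unit in to_surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```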

Checking bozo will cause parser to fail (not checking will parse fine)

Simple test case:
import feedparser
import json

info = {}
parser = feedparser.parse('http://feeds.feedburner.com/codinghorror')

#if parser.bozo == 1:
#    info['bozo_message'] = parser.bozo_exception

info['title'] = parser.feed['title']
print(json.dumps(info))

With the bozo check commented out, this runs fine and prints the JSON string with the title; uncomment the bozo check and you get:

SAXParseException('Input is not proper UTF-8, indicate encoding !\nBytes: 0xE2 0x80 0x99 0x73\n',) is not JSON serializable

I would expect this to work the same regardless of checking the bozo exception.
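The error message comes from json.dumps, not from the parser: bozo_exception holds an exception object (here a SAXParseException), and the json module cannot serialize exception objects. Storing the exception's text instead avoids it. A minimal sketch; describe_bozo is a hypothetical helper, and a ValueError stands in for the SAXParseException:

```python
import json
from typing import Optional


def describe_bozo(bozo: int, bozo_exception: Optional[BaseException]) -> dict:
    """Build a JSON-friendly summary of feedparser's bozo fields."""
    info = {}
    if bozo:
        # Exception objects cannot be serialized by json; keep their type and text.
        info["bozo_type"] = type(bozo_exception).__name__
        info["bozo_message"] = str(bozo_exception)
    return info


exc = ValueError("Input is not proper UTF-8")  # stand-in for the SAXParseException
print(json.dumps(describe_bozo(1, exc)))
# {"bozo_type": "ValueError", "bozo_message": "Input is not proper UTF-8"}
```

In the test case above, this would be used as info.update(describe_bozo(parser.bozo, parser.get('bozo_exception'))).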

Dead Link in Documentation

First, thanks for this great library.
If you go to the bottom of the page:
https://pythonhosted.org/feedparser/html-sanitization.html#advanced-sanitization

It has a dead link to the platypus attack.

Hope that helps.
Chris

Access to arbitrary elements.

I'm specifically looking to access the itunes:owner element of a podcast feed, but I think that access to arbitrary non-standard elements would be useful.

Timezone handling

I might be blind, but I don't see anything in the documentation specifying any sort of guarantee on timezone parsing. From the examples it looks like it always returns GMT time...

The struct_time object it spits out for the various datetime fields doesn't have the tm_zone or tm_gmtoff attributes on my system.

Can I assume that the time it provides is always in GMT? What if the feed doesn't provide a timezone; will it just be assumed to be GMT?
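My reading of the feedparser documentation is that the *_parsed fields are normalized to UTC, which would explain why tm_zone and tm_gmtoff are absent; I have not verified how a date without any zone is treated. Under that assumption, a minimal sketch for converting the struct_time without accidentally applying the local zone (both helpers are hypothetical):

```python
import calendar
import time
from datetime import datetime, timezone


def to_aware_utc(parsed: time.struct_time) -> datetime:
    """Turn a UTC struct_time (e.g. entry.published_parsed) into an aware datetime."""
    return datetime(*parsed[:6], tzinfo=timezone.utc)


def to_epoch(parsed: time.struct_time) -> int:
    # calendar.timegm reads the tuple as UTC; time.mktime would read it as local time.
    return calendar.timegm(parsed)
```

The choice of calendar.timegm over time.mktime matters: mktime silently shifts the result by the machine's UTC offset.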

Namespace attrs overwritten by dupekeys?

I have an issue I can't seem to work out on my own. I'm not sure whether it is a bug or whether I have just failed to find a workaround. Maybe you can point me in the right direction.

In the following XML, the parsed output only includes the last of the torznab:attr elements. The torznab xmlns does not resolve, which I think doesn't matter from looking at the code, since external xmlns resolution seems to be disabled by default.

Is this a bug, or is there a way for me to get the values of all four torznab:attr elements?

<?xml version="1.0" encoding="UTF-8"?>
<rss version="1.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:torznab="http://torznab.com/schemas/2015/feed">
  <channel>
    <atom:link href="http://127.0.0.1:9117/" rel="self" type="application/rss+xml" />
    <title>TORZNAB</title>
    <description>TORZNAB</description>
    <link>https://torznab.org/</link>
    <lanuage>en-us</lanuage>
    <category>search</category>
    <image>
      <url>http://127.0.0.1:9117/logos/TORZNAB.png</url>
      <title>TORZNAB</title>
      <link>https://torznab.org/</link>
      <description>TORZNAB</description>
    </image>
    <item>
      <title>Ubuntu.14.10.Desktop.64bit.ISO</title>
      <guid>https://torznab.org/B415C913643E5FF49FE37D304BBB5E6E11AD5101/comments</guid>
      <comments>https://torznab.org/B415C913643E5FF49FE37D304BBB5E6E11AD5101/comments</comments>
      <pubDate>Sat, 06 Jul 2013 03:57:49 -0700</pubDate>
      <size>1159641169</size>
      <description>Ubuntu.14.10.Desktop.64bit.ISO</description>
      <link>magnet:?xt=urn:btih:B415C913643E5FF49FE37D304BBB5E6E11AD5101&amp;dn=ubuntu+14+10+desktop+64bit+iso&amp;tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&amp;tr=udp%3A%2F%2Fopen.demonii.com%3A1337</link>
      <category>4020</category>
      <enclosure url="magnet:?xt=urn:btih:B415C913643E5FF49FE37D304BBB5E6E11AD5101&amp;dn=ubuntu+14+10+desktop+64bit+iso&amp;tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&amp;tr=udp%3A%2F%2Fopen.demonii.com%3A1337" length="253217700" type="application/x-bittorrent" />
      <torznab:attr name="magneturl" value="magnet:?xt=urn:btih:B415C913643E5FF49FE37D304BBB5E6E11AD5101&amp;dn=ubuntu+14+10+desktop+64bit+iso&amp;tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&amp;tr=udp%3A%2F%2Fopen.demonii.com%3A1337" />
      <torznab:attr name="seeders" value="115" />
      <torznab:attr name="peers" value="8" />
      <torznab:attr name="infohash" value="B415C913643E5FF49FE37D304BBB5E6E11AD5101" />
    </item>
  </channel>
</rss>
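As a workaround, rather than a change to feedparser's behaviour, the repeated torznab:attr elements can be collected from the raw XML with the standard library, keyed by their name attribute. A minimal sketch using the namespace URI from the sample; torznab_attrs and items_with_attrs are hypothetical helpers:

```python
import xml.etree.ElementTree as ET

TORZNAB_NS = "http://torznab.com/schemas/2015/feed"


def torznab_attrs(item: ET.Element) -> dict:
    """Collect every <torznab:attr name=... value=...> of one <item> into a dict."""
    return {
        attr.get("name"): attr.get("value")
        for attr in item.findall(f"{{{TORZNAB_NS}}}attr")  # "{uri}attr" form
    }


def items_with_attrs(xml_text) -> list:
    """Return (title, attrs) for each <item>; accepts the raw response bytes or str."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), torznab_attrs(item)) for item in root.iter("item")]
```

For the sample above this should yield all four attributes (magneturl, seeders, peers, infohash), with &amp; in the magnet URL unescaped by the XML parser.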

Feature request: let us specify the encoding

Hello again and thanks for the great work.

Unless I missed something, there is no way to tell Feedparser the feed encoding when we already know it. In my case, I process the feed and convert it to UTF-8 in Node before passing it to Feedparser. The feed I pass is UTF-8, but the encoding="window-1252" attribute is still present in the feed content and causes Feedparser to fail. I'd rather not have to remove that attribute myself.

Thank you.
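Until such an option exists, a workaround is to relabel the XML declaration so it matches the bytes actually being passed in. A minimal sketch, assuming the document has already been converted to a Python str; relabel_as_utf8 is a hypothetical helper, and the resulting bytes would then be handed to feedparser.parse():

```python
import re

# Captures '<?xml ... encoding=' and the quote character used around the value.
_XML_DECL_ENCODING = re.compile(
    r'^(\s*<\?xml[^>]*?\bencoding\s*=\s*)(["\'])[^"\']*\2', re.IGNORECASE
)


def relabel_as_utf8(xml_text: str) -> bytes:
    """Rewrite the XML declaration's encoding to UTF-8 and return UTF-8 bytes."""
    fixed = _XML_DECL_ENCODING.sub(r"\1\2UTF-8\2", xml_text, count=1)
    return fixed.encode("utf-8")
```

Only the first declaration at the start of the document is touched, so an encoding="..." string appearing later in the content is left alone.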

Allow turning off sanitization

I understand that sanitization is for my safety, but there are times when it is unnecessary and changes the feed too much. Would you accept a pull request to disable sanitization at the user's request, perhaps based on a flag passed to parse?

published_parsed is wrong sometimes

In [27]: f['entries'][0]["published"]
Out[27]: u'2016/6/29 15:07:41'

In [28]: f['entries'][0]["published_parsed"]
Out[28]: time.struct_time(tm_year=2016, tm_mon=6, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=153, tm_isdst=0)

In [29]: datetime.datetime.fromtimestamp(time.mktime(f['entries'][0]["published_parsed"]))
Out[29]: datetime.datetime(2016, 6, 1, 0, 0)

As we can see, feedparser parsed "2016/6/29 15:07:41" into datetime.datetime(2016, 6, 1, 0, 0).
After skimming the related code, I found that _parse_date_iso8601 is used to parse the date. The problem is that 2016/6/29 15:07:41 is not in ISO 8601 format.
The regex pattern used in _parse_date_iso8601, and the result it returns:

In [20]: m = re.match("(?P<year>\d{4})(T?(?P<hour>\d{2}):(?P<minute>\d{2})(:(?P<second>\d{2}))?(\.(?P<fracsecond>\d+))?(?P<tz>[+-](?P<tzhour>\d{2})(:(?P<tzmin>\d{2}))?|Z)?)?", '2016/6/29 13:15:50')
In [22]: params = m.groupdict()
In [23]: params
Out[23]:
{'fracsecond': None,
 'hour': None,
 'minute': None,
 'second': None,
 'tz': None,
 'tzhour': None,
 'tzmin': None,
 'year': '2016'}

The regex pattern used here only extracts the year, and the code then makes assumptions about the month and day. Why make assumptions at all? Does this mean published_parsed cannot be trusted?
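One way to handle this format without touching the ISO 8601 code is a custom date handler, which feedparser lets you register with feedparser.registerDateHandler; a handler returns a 9-tuple in UTC, or None to let other handlers try. A minimal sketch; parse_slash_date is hypothetical, and since the string carries no time zone, treating it as UTC is an assumption:

```python
import re
import time
from datetime import datetime
from typing import Optional

# Matches strings like '2016/6/29 15:07:41' (no time zone).
_SLASH_DATE = re.compile(
    r"^\s*(\d{4})/(\d{1,2})/(\d{1,2})\s+(\d{1,2}):(\d{2}):(\d{2})\s*$"
)


def parse_slash_date(date_string: str) -> Optional[time.struct_time]:
    """Return a struct_time for 'YYYY/M/D H:MM:SS', or None if it doesn't match."""
    match = _SLASH_DATE.match(date_string)
    if match is None:
        return None
    try:
        # datetime() rejects out-of-range values; the value is treated as UTC.
        return datetime(*map(int, match.groups())).utctimetuple()
    except ValueError:
        return None
```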

Add option to disable gzip accept-encoding

I'm building a project with feedparser and I'd like to rely on ETags as much as possible. Many web server versions don't provide ETags with gzipped content.

Currently, feedparser decides whether to ask for gzip by checking whether Python supports gzip:

try:
    import gzip
except ImportError:
    gzip = None

…

    if gzip and zlib:
        request.add_header('Accept-encoding', 'gzip, deflate')
    elif gzip:
        request.add_header('Accept-encoding', 'gzip')

Would you accept a PR adding the possibility to disable gzip?
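The requested switch could mirror the header logic quoted above. A minimal standard-library sketch; build_feed_request and its allow_compression and etag parameters are hypothetical names, not feedparser's API. The response body could then be fetched with urllib and passed to feedparser.parse():

```python
import urllib.request
from typing import Optional


def build_feed_request(url: str, allow_compression: bool = True,
                       etag: Optional[str] = None) -> urllib.request.Request:
    """Build a feed request, optionally asking the server not to compress."""
    request = urllib.request.Request(url)
    # "identity" asks for an uncompressed body, for servers that drop ETags on gzip.
    request.add_header("Accept-encoding",
                       "gzip, deflate" if allow_compression else "identity")
    if etag:
        request.add_header("If-None-Match", etag)
    return request
```

Note that urllib stores header names via str.capitalize(), so the ETag header is looked up as "If-none-match" with Request.get_header().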

pip package keeps upgrading all the time

    $ pip install --upgrade feedparser
    Collecting feedparser
      Using cached feedparser-5.2.0.post1.tar.bz2
    Installing collected packages: feedparser
      Found existing installation: feedparser 5.2.0
        Uninstalling feedparser-5.2.0:
          Successfully uninstalled feedparser-5.2.0
      Running setup.py install for feedparser
    Successfully installed feedparser-5.2.0
    $ pip install --upgrade feedparser
    Collecting feedparser
      Using cached feedparser-5.2.0.post1.tar.bz2
    Installing collected packages: feedparser
      Found existing installation: feedparser 5.2.0
        Uninstalling feedparser-5.2.0:
          Successfully uninstalled feedparser-5.2.0
      Running setup.py install for feedparser
    Successfully installed feedparser-5.2.0
    $ pip install --upgrade feedparser --no-cache-dir
    Collecting feedparser
      Downloading feedparser-5.2.0.post1.tar.bz2 (192kB)
        100% |████████████████████████████████| 192kB 471kB/s
    Installing collected packages: feedparser
      Found existing installation: feedparser 5.2.0
        Uninstalling feedparser-5.2.0:
          Successfully uninstalled feedparser-5.2.0
      Running setup.py install for feedparser
    Successfully installed feedparser-5.2.0
    $ pip install --upgrade feedparser --no-cache-dir
    Collecting feedparser
      Downloading feedparser-5.2.0.post1.tar.bz2 (192kB)
        100% |████████████████████████████████| 192kB 2.5MB/s
    Installing collected packages: feedparser
      Found existing installation: feedparser 5.2.0
        Uninstalling feedparser-5.2.0:
          Successfully uninstalled feedparser-5.2.0
      Running setup.py install for feedparser
    Successfully installed feedparser-5.2.0
    $

It looks like setup.py says the version is 5.2.0, while the sdist is named 5.2.0.post1, so pip reinstalls it every time.

Sorry if this is the wrong place for this kind of bug.

cheers

Feedparser seems to occasionally hang and has no timeout

According to this, the default timeout in urllib2 is -1, or None. This is a problem for long-running programs, where occasionally a single hanging connection will block everything.

The solution is pretty simple: add a timeout to the open call here:

feedparser/feedparser/http.py

Line 175 in 39a7157

f = opener.open(request)

I'll fork and try to make a fix.
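Until a timeout parameter exists, a common stopgap is the process-wide default socket timeout: urllib's default timeout means "use socket.getdefaulttimeout()", so sockets it opens for feedparser pick this value up. A minimal sketch; the 15-second value is an arbitrary example, and the setting affects every socket in the process:

```python
import socket

# Example value; choose what suits the crawler.
FEED_TIMEOUT_SECONDS = 15.0

# Sockets created afterwards without an explicit timeout (including those
# urllib opens on feedparser's behalf) give up after this many seconds.
socket.setdefaulttimeout(FEED_TIMEOUT_SECONDS)

# A later feedparser.parse(url) should then fail with a timeout error
# (reported via bozo_exception, as far as I can tell) instead of hanging.
```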

[develop] Bug in lazy_chardet_encoding (encodings.py)

The data variable is not defined in this scope:

feedparser/feedparser/encodings.py

Line 43 in f019d06

chardet_encoding = chardet.detect(data)['encoding']
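One straightforward fix is to make data an explicit parameter. A minimal sketch, not the project's actual patch; the injectable detect argument is hypothetical and exists only so the function can be exercised without chardet installed. Callers that need a zero-argument callable could wrap it with functools.partial(lazy_chardet_encoding, data):

```python
from typing import Callable, Optional


def lazy_chardet_encoding(data: bytes, detect: Optional[Callable] = None) -> str:
    """Guess the encoding of `data`, returning '' when nothing is detected."""
    if detect is None:
        import chardet  # third-party; imported lazily, only when actually needed
        detect = chardet.detect
    encoding = detect(data)["encoding"]
    return encoding or ""
```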

media:description element overwrites 'content' field

The handling of the media:description element in 5.2.1 ends up overwriting the 'content' field of an item. This seems like a particular case of issue #35.

An example feed item and test script are attached. 'description' and 'summary' of the single entry in the feed are set to the full story text (starting "Just like you sync your tablet ..."), which is some 4400-odd bytes. But 'content' is set to the 101-byte caption of the photo (starting "Bonnie Plants’ Homegrown free app keeps you growing in the garden.").

One possible fix is to make _start_media_description()/_end_media_description() their own methods instead of aliases for _start_description(), doing something like what _start_media_license()/_end_media_license() do. Or maybe _start_description() needs to be more complicated and do something different when in a media:content context.

I'm happy to work on a patch, if given a direction to pursue.

media-desc-issue.zip

Handling http 301 in result['status']

Hi,
Many sites use third-party software to distribute content (feedsportal, feedburner, etc.). In such cases we always get a 301 or 302 from http://mysite.com/feed to http://feeds.feedburner.com/mysite, and in the end we get a 200 or 304.

However, parse() always returns status set to 301. I think it should return the final response code.
Here's my patch:
replace lines 1969-1970:

if hasattr(f, 'status'):
    result['status'] = f.status

with

result['status'] = getattr(f, 'code', 0)

Or maybe we could add this as a new entry in result (alongside status).

Incorrect year on some dates

See the dates on this feed: https://www.daydeal.ch/rss.xml

<pubDate>Thu, 03 Sep 15 00:01:01 +0200</pubDate>

This is being parsed as:

>>> entry.get('published_parsed')
time.struct_time(tm_year=200, tm_mon=9, tm_mday=2, tm_hour=15, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=245, tm_isdst=0)

In the year 200!
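The date uses a two-digit year, which RFC 2822 still allows for obsolete dates. Python's email.utils already expands two-digit years to 19xx/20xx, so one way to handle these dates is a date handler built on it (registrable via feedparser.registerDateHandler). A minimal sketch; rfc822_utc is hypothetical, and treating a "-0000" (unknown) zone as UTC is an assumption:

```python
import time
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def rfc822_utc(value: str) -> Optional[time.struct_time]:
    """Parse an RFC 822/2822 date (two-digit years included) into a UTC struct_time."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError):
        return None  # not a date this parser understands
    if parsed.tzinfo is None:
        # "-0000" yields a naive datetime; treating it as UTC is an assumption.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).utctimetuple()
```

For the pubDate above, this should give 2015-09-02 22:01:01 UTC (midnight in +0200).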

feedparser cannot parse multiple "category" values?

According to the RSS 2.0 specification, an item may include multiple category elements.

You may include as many category elements as you need to, for different domains, and to have an item cross-referenced in different parts of the same domain.

Here is a sample that includes multiple category values:

http://www.validome.org/check/RSS_validator/version/rss_2_0/action/xml/feed/234

>>> import feedparser
>>> feedparser.__version__
'5.2.1'
>>> data = feedparser.parse('http://www.validome.org/check/RSS_validator/version/rss_2_0/action/xml/feed/234')
>>> data.feed.get('category')
u'category/subcategory/subcategory2'

Is this a bug?

TypeError: a float is required

I am using guv and feedparser to parse multiple feeds simultaneously. The following is my code:

def parse_feed(_feed):  
    return feedparser.parse(_feed)

def main():
    urls = ["http://feeds.bbci.co.uk/news/rss.xml"]
    pool = guv.GreenPool()
    results = pool.starmap(parse_feed, zip(urls))
    for resp in results:
        print(str(resp))

However, I get the following output:

{'bozo_exception': TypeError('a float is required',), 'bozo': 1, 'feed': {}, 'entries': []}

I have a similar problem with Eventlet, but not with the native Python 3 threading library.

Introduction in feedparser docs needs new URL

From the very first code snippet on the Introduction page:

>>> import feedparser
>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')

Everything seems to work OK, except:

>>> d["feed"]["title"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/feedparser.py", line 357, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'title'

The source of atom10.xml is

<!DOCTYPE html>
<body style="padding:0; margin:0;">
<html>
<body>
    <iframe src="http://mcc.godaddy.com/park/p3WlpJAhMJMlMF5vMKD=" style="visibility: visible;height: 100%; position:absolute" allowtransparency="true" marginheight="0" marginwidth="0" frameborder="0" width="100%">
    </iframe>
</body>
</html>

(I reformatted it, as it was all on one line.) From what I can gather, the domain is parked and has no content. If you own the domain, could you put an acceptable RSS feed file at the indicated URL? If not, could you find another sample RSS feed to use? You could probably just host a file on GitHub.

Thanks!

Published date string duplicated

For this feed, the published field contains the date twice in a row (u'2016-05-24 14:17:57.02016-05-24 14:17:57.0').

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.unodc.org/misc/feed.xsl"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0"><channel><title>UNODC Publications</title><link>http://www.unodc.org/unodc/en/feed/publications.xml</link><description>UNODC Publications</description><item><title>World wildlife crime report 2016</title><link>http://www.unodc.org/documents/data-and-analysis/wildlife/World_Wildlife_Crime_Report_2016_final.pdf</link><guid>http://www.unodc.org/documents/data-and-analysis/wildlife/World_Wildlife_Crime_Report_2016_final.pdf</guid><description></description><pubDate>Tue, 24 May 2016 2:17:57 PM CEST</pubDate></item><item><title>The Afghan Opiate Trade and Africa - A Baseline Assessment- 2016</title><link>http://www.unodc.org/documents/data-and-analysis/Afghanistan/Afghan_Opiate_trade_Africa_2016_web.pdf</link><guid>http://www.unodc.org/documents/data-and-analysis/Afghanistan/Afghan_Opiate_trade_Africa_2016_web.pdf</guid><description></description><pubDate>Wed, 16 Mar 2016 4:34:12 PM CET</pubDate></item><item><title>Afghanistan Opium Survey 2015 - Socio-economic analysis</title><link>http://www.unodc.org/documents/crop-monitoring/Afghanistan/Afghanistan_opium_survey_2015_socioeconomic.pdf</link><guid>http://www.unodc.org/documents/crop-monitoring/Afghanistan/Afghanistan_opium_survey_2015_socioeconomic.pdf</guid><description> Afghanistan Opium Survey 2015 - Socio-economic analysis </description><pubDate>Wed, 16 Mar 2016 2:19:37 PM CET</pubDate></item><item><title>Afghanistan Opium Survey 2015 - Cultivation and Production</title><link>http://www.unodc.org/documents/crop-monitoring/Afghanistan/_Afghan_opium_survey_2015_web.pdf</link><guid>http://www.unodc.org/documents/crop-monitoring/Afghanistan/_Afghan_opium_survey_2015_web.pdf</guid><description></description><pubDate>Fri, 18 Dec 2015 1:19:09 PM 
CET</pubDate></item><item><title>Southeast Asia Opium Survey 2015 - Lao PDR, Myanmar</title><link>http://www.unodc.org/documents/crop-monitoring/sea/Southeast_Asia_Opium_Survey_2015_web.pdf</link><guid>http://www.unodc.org/documents/crop-monitoring/sea/Southeast_Asia_Opium_Survey_2015_web.pdf</guid><description>Southeast Asia Opium Survey 2015 - Lao PDR, Myanmar</description><pubDate>Tue, 15 Dec 2015 4:30:42 PM CET</pubDate></item><item><title>Drug Money - the illicit proceeds of opiates trafficked on the Balkan route</title><link>http://www.unodc.org/documents/data-and-analysis/Studies/IFF_report_2015_final_web.pdf</link><guid>http://www.unodc.org/documents/data-and-analysis/Studies/IFF_report_2015_final_web.pdf</guid><description></description><pubDate>Thu, 26 Nov 2015 3:15:21 PM CET</pubDate></item><item><title>Strengthening the medico-legal response to sexual violence</title><link>http://www.unodc.org/documents/publications/WHO_RHR_15.24_eng.pdf</link><guid>http://www.unodc.org/documents/publications/WHO_RHR_15.24_eng.pdf</guid><description></description><pubDate>Wed, 25 Nov 2015 11:07:00 AM CET</pubDate></item><item><title>Afghanistan Opium Survey 2015 - Executive Summary</title><link>http://www.unodc.org/documents/crop-monitoring/Afghanistan/Afg_Executive_summary_2015_final.pdf</link><guid>http://www.unodc.org/documents/crop-monitoring/Afghanistan/Afg_Executive_summary_2015_final.pdf</guid><description>Afghanistan Opium Survey 2015 - Executive Summary</description><pubDate>Wed, 14 Oct 2015 7:53:45 AM CEST</pubDate></item><item><title>Estado Plurinacional de Bolivia - Monitoreo de Cultivos de Coca 2014 </title><link>http://www.unodc.org/documents/bolivia/Bolivia_Informe_Monitoreo_Coca_2014.pdf</link><guid>http://www.unodc.org/documents/bolivia/Bolivia_Informe_Monitoreo_Coca_2014.pdf</guid><description></description><pubDate>Tue, 18 Aug 2015 11:04:00 AM CEST</pubDate></item><item><title>Peru - Informe Monitoreo de Cultivos de Coca 2014 (Summary in English 
included)</title><link>http://www.unodc.org/documents/crop-monitoring/Peru/Peru_Informe_monitoreo_coca_2014_web.pdf</link><guid>http://www.unodc.org/documents/crop-monitoring/Peru/Peru_Informe_monitoreo_coca_2014_web.pdf</guid><description></description><pubDate>Wed, 15 Jul 2015 6:51:02 PM CEST</pubDate></item></channel></rss>

published_parsed is correct, though:

>>> parser['entries'][0].get('published')
u'2016-05-24 14:17:57.02016-05-24 14:17:57.0'
>>> parser['entries'][0].get('published_parsed')
time.struct_time(tm_year=2016, tm_mon=5, tm_mday=24, tm_hour=14, tm_min=17, tm_sec=57, tm_wday=1, tm_yday=145, tm_isdst=0)
>>>

This is on Python 2.7.11 using version 5.2.1 from PyPI and the following virtualenv:

BeautifulSoup==3.2.1
boto3==1.2.2
botocore==1.3.30
cssselect==0.9.1
docutils==0.12
feedparser==5.1.3
futures==2.2.0
goose-extractor==1.0.25
jieba==0.38
jmespath==0.9.0
lambda-uploader==0.5.1
lxml==3.6.0
nltk==3.2.1
Pillow==3.2.0
piprot==0.9.6
python-dateutil==2.5.3
python-lambda-local==0.1.2
requests==2.3.0
requests-futures==0.9.7
simplejson==3.8.2
six==1.10.0
virtualenv==15.0.1

pypi package broken on python 3.4

When I install feedparser from git, it works. If I install version 5.2.0 from PyPI, it fails with invalid Python 3 syntax.

Missing timeout parameter in opener.open

If a connection is open to a server that does not send any data, the connection is kept open indefinitely. It would be good to have a timeout parameter that is passed down to the urllib2 library.

Thank you

feedparser depends on order of description and content

The parser's result depends on the order of description and content:encoded.
A test case is below.

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
    <title>title</title>
    <link>http://www.example.com/</link>
    <item>
        <title>title2</title>
        <description>hoge</description>
        <content:encoded><![CDATA[
                fuga
        ]]></content:encoded>
        <link>http://example.com/2.html</link>
    </item>
    <item>
        <title>title1</title>
        <content:encoded><![CDATA[
                fuga
        ]]></content:encoded>
        <description>hoge</description>
        <link>http://example.com/1.html</link>
    </item>
</channel>
</rss>

The two entries above have identical description and content:encoded elements, differing only in order.
But the results are not the same:

In [4]: a.entries[0].content
Out[4]: [{'base': '', 'language': None, 'type': 'text/html', 'value': 'fuga'}]
In [6]: a.entries[1].content
Out[6]:
[{'base': '', 'language': None, 'type': 'text/html', 'value': 'fuga'},
 {'base': '', 'language': None, 'type': 'text/plain', 'value': 'hoge'}]

In [5]: a.entries[0].description
Out[5]: 'hoge'
In [7]: a.entries[1].description
Out[7]: 'fuga'

It seems this happens because:
(1) content is copied to summary
https://github.com/kurtmckee/feedparser/blob/develop/feedparser/namespaces/_base.py#L482-L483
(2) summary is set to content
https://github.com/kurtmckee/feedparser/blob/develop/feedparser/namespaces/_base.py#L428-L430
Since this behavior depends on element order, it seems strange to me.

How to fix SSL: CERTIFICATE_VERIFY_FAILED?

{'feed': {}, 'entries': [], 'bozo': 1, 'bozo_exception': URLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)'),)}
It worked a few days ago, but now it doesn't.

feedparser.parse('https://habrahabr.ru/rss/feed/posts/5d0c9b4397559e2e7cb380b29ec8151b/')
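CERTIFICATE_VERIFY_FAILED usually means the local CA store does not trust the chain the server now presents (for example an outdated CA bundle after the site changed certificates); I can't tell which applies here. Rather than disabling verification, one option is to fetch the bytes yourself with a verifying context, optionally pointed at a newer CA bundle, and pass them to feedparser.parse(). A minimal sketch; both helpers are hypothetical, and using certifi.where() as cafile is an assumption that requires the third-party certifi package:

```python
import ssl
import urllib.request
from typing import Optional


def make_verified_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """A context that still verifies certificates, optionally against a given CA bundle."""
    return ssl.create_default_context(cafile=cafile)


def fetch_feed_bytes(url: str, cafile: Optional[str] = None, timeout: float = 30) -> bytes:
    """Download the raw feed; the result can be passed to feedparser.parse()."""
    context = make_verified_context(cafile)
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read()
```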

wrongly generated published_parsed

Hi,
For the feed "http://www.feedbooks.com/books/top.atom?category=FBHIS000000&lang=en&range=week" the feed source is

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:thr="http://purl.org/syndication/thread/1.0" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:opds="http://opds-spec.org/2010/catalog" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:odl="http://opds-spec.org/odl" xml:lang="en" xmlns:app="http://www.w3.org/2007/app">
  <id>http://www.feedbooks.com/books/top.atom?category=FBHIS000000&amp;lang=en&amp;range=week</id>
  <title>History</title>
  <updated>2015-07-06T15:41:14Z</updated>
  <icon>http://assets1.feedbooks.net/images/favicon.ico?t=1436193026</icon>
  <author>
    <name>Feedbooks</name>
    <uri>http://www.feedbooks.com</uri>
    <email>[email protected]</email>
  </author>
  <link type="application/atom+xml; profile=opds-catalog; kind=acquisition" title="Most Popular" href="http://www.feedbooks.com/books/top.atom?category=FBHIS000000&amp;lang=en&amp;range=week" rel="self"/>
  <link type="application/atom+xml;profile=opds-catalog;kind=navigation" title="Home" href="http://www.feedbooks.com/catalog.atom" rel="start"/>
  <link type="application/opensearchdescription+xml" title="Search on Feedbooks" href="http://www.feedbooks.com/opensearch.xml" rel="search"/>
  <link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="Bookshelf" href="https://www.feedbooks.com/user/bookshelf.atom" rel="http://opds-spec.org/shelf"/>
<opensearch:totalResults>41</opensearch:totalResults>
<opensearch:itemsPerPage>20</opensearch:itemsPerPage>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="Next Page" href="http://www.feedbooks.com/books/top.atom?category=FBHIS000000&amp;lang=en&amp;page=2&amp;protection=false" rel="next"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="Recently Added" href="http://www.feedbooks.com/books/recent.atom?category=FBHIS000000&amp;lang=en&amp;protection=false" rel="http://opds-spec.org/sort/new"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="History by country" href="/books/top.atom?category=FBHIS000000N&amp;lang=en&amp;protection=false" opds:facetGroup="In category" rel="http://opds-spec.org/facet" thr:count="4"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="English" href="/books/top.atom?category=FBHIS000000&amp;lang=en&amp;protection=false" opds:facetGroup="Language" rel="http://opds-spec.org/facet" thr:count="41" opds:activeFacet="true"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="French" href="/books/top.atom?category=FBHIS000000&amp;lang=fr&amp;protection=false" opds:facetGroup="Language" rel="http://opds-spec.org/facet" thr:count="11"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="German" href="/books/top.atom?category=FBHIS000000&amp;lang=de&amp;protection=false" opds:facetGroup="Language" rel="http://opds-spec.org/facet" thr:count="1"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="Spanish" href="/books/top.atom?category=FBHIS000000&amp;lang=es&amp;protection=false" opds:facetGroup="Language" rel="http://opds-spec.org/facet" thr:count="10"/>
<link type="application/atom+xml;profile=opds-catalog;kind=acquisition" title="Italian" href="/books/top.atom?category=FBHIS000000&amp;lang=it&amp;protection=false" opds:facetGroup="Language" rel="http://opds-spec.org/facet" thr:count="6"/>
<entry>
<title>The Prince</title>
<id>http://www.feedbooks.com/book/94</id>
<author>
  <name>Niccol&#242; Machiavelli</name>
  <uri>http://www.feedbooks.com/author/36</uri>
</author>
<published>2007-01-02T19:44:01Z</published>
<updated>2015-03-06T16:41:55Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1513</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FSHUM000000N" label="Human Science"/>
<category scheme="http://www.feedbooks.com/categories" term="FBPHI000000" label="Philosophy"/>
<category scheme="http://www.feedbooks.com/categories" term="FBSOC000000" label="Social science"/>
<category scheme="http://www.feedbooks.com/categories" term="FBPOL000000" label="Political science"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000N" label="History by country"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036000" label="United States"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036020N" label="Other"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS027000" label="Military"/>
<summary>Il Principe (The Prince) is a political treatise by the Florentine public servant and political theorist Niccol&#242; Machiavelli. Originally called De Principatibus (About Principalities), it was written in 1513, but not published until 1532, five yea...</summary>
<dcterms:extent>32,174 words</dcterms:extent>
<dcterms:source>Wikisource</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/94" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/94.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/94.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/94.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/94.jpg?size=large&amp;t=1425660115" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/94.jpg?t=1425660115" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/94.atom" rel="alternate"/>
</entry>
<entry>
<title>The Code of Hammurabi</title>
<id>http://www.feedbooks.com/book/4239</id>
<author>
  <name>Hammurabi</name>
  <uri>http://www.feedbooks.com/author/1216</uri>
</author>
<published>2009-09-23T07:29:12Z</published>
<updated>2015-03-06T16:57:11Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>-1790</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBSOC000000" label="Social science"/>
<category scheme="http://www.feedbooks.com/categories" term="FBPOL000000" label="Political science"/>
<summary>The Code of Hammurabi (Codex Hammurabi) is a well-preserved ancient law code, created ca. 1790 BC (middle chronology) in ancient Babylon. It was enacted by the sixth Babylonian king, Hammurabi. One nearly complete example of the Code survives toda...</summary>
<dcterms:extent>6,390 words</dcterms:extent>
<dcterms:source>http://oll.libertyfund.org/index.php?option=com_content&amp;task=view&amp;id=1472&amp;Itemid=264</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/4239" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/4239.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/4239.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/4239.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4239.jpg?size=large&amp;t=1425661031" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4239.jpg?t=1425661031" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/4239.atom" rel="alternate"/>
</entry>
<entry>
<title>Life On The Mississippi</title>
<id>http://www.feedbooks.com/book/4313</id>
<author>
  <name>Mark Twain</name>
  <uri>http://www.feedbooks.com/author/24</uri>
</author>
<published>2009-10-11T12:21:18Z</published>
<updated>2015-03-06T16:57:26Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1883</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHUM000000" label="Humor"/>
<category scheme="http://www.feedbooks.com/categories" term="FBTRV000000" label="Travel"/>
<summary>Life on the Mississippi is a memoir by Mark Twain detailing his days as a steamboat pilot on the Mississippi River before and after the American Civil War. The book begins with a brief history of the river. It continues with anecdotes of Twain's t...</summary>
<dcterms:extent>143,742 words</dcterms:extent>
<dcterms:source>Project Gutenberg</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/4313" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/4313.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/4313.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/4313.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4313.jpg?size=large&amp;t=1425661046" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4313.jpg?t=1425661046" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/4313.atom" rel="alternate"/>
</entry>
<entry>
<title>The Diary of a U-boat Commander</title>
<id>http://www.feedbooks.com/book/4208</id>
<author>
  <name>Sir William Stephen Richard King-Hall</name>
  <uri>http://www.feedbooks.com/author/1207</uri>
</author>
<published>2009-09-16T18:08:47Z</published>
<updated>2015-06-30T19:33:05Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1918</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBBIO000000" label="Biography &amp; autobiography"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000N" label="History by country"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036000" label="United States"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036020N" label="Other"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS027000" label="Military"/>
<summary>The diary of a World War One U-Boat commander. As well as being a fascinating glimpse of life on the German U-boats during the intense submarine blockade, this also reminds us there were humans involved - on both sides of the action - as we read t...</summary>
<dcterms:extent>48,813 words</dcterms:extent>
<dcterms:source>http://www.gutenberg.org/etext/7947</dcterms:source>
<rights>This work was published before 1923 and is in the public domain in the USA only.</rights>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/4208" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/4208.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/4208.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/4208.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4208.jpg?size=large&amp;t=1435692785" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/4208.jpg?t=1435692785" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/4208.atom" rel="alternate"/>
</entry>
<entry>
<title>The Federalist Papers</title>
<id>http://www.feedbooks.com/book/2674</id>
<author>
  <name>Publius</name>
  <uri>http://www.feedbooks.com/author/491</uri>
</author>
<published>2008-07-20T12:04:13Z</published>
<updated>2015-03-26T16:55:45Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1787</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBSOC000000" label="Social science"/>
<category scheme="http://www.feedbooks.com/categories" term="FBPOL000000" label="Political science"/>
<summary>The Federalist Papers are a series of 85 articles advocating the ratification of the United States Constitution. Seventy-seven of the essays were published serially in The Independent Journal and The New York Packet between October 1787 and August...</summary>
<dcterms:extent>189,954 words</dcterms:extent>
<dcterms:source>http://www.foundingfathers.info/federalistpapers/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/2674" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/2674.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/2674.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/2674.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2674.jpg?size=large&amp;t=1427388945" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2674.jpg?t=1427388945" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/2674.atom" rel="alternate"/>
</entry>
<entry>
<title>The Borgias</title>
<id>http://www.feedbooks.com/book/1248</id>
<author>
  <name>Alexandre Dumas</name>
  <uri>http://www.feedbooks.com/author/25</uri>
</author>
<published>2007-06-21T22:26:06Z</published>
<updated>2015-03-06T16:46:17Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1840</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<summary>No Description Available</summary>
<dcterms:extent>83,323 words</dcterms:extent>
<dcterms:source>http://gutenberg.org</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/1248" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/1248.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/1248.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/1248.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/1248.jpg?size=large&amp;t=1425660377" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/1248.jpg?t=1425660377" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/1248.atom" rel="alternate"/>
</entry>
<entry>
<title>Henry V</title>
<id>http://www.feedbooks.com/book/3029</id>
<author>
  <name>William Shakespeare</name>
  <uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-29T19:15:04Z</published>
<updated>2015-03-06T16:52:59Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1599</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>Henry V is a history play by William Shakespeare, believed to be written in 1599. It is based on the life of King Henry V of England, and focuses on events immediately before and after the Battle of Agincourt during the Hundred Years' War.
The pl...</summary>
<dcterms:extent>27,188 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3029" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3029.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3029.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3029.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3029.jpg?size=large&amp;t=1425660779" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3029.jpg?t=1425660779" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3029.atom" rel="alternate"/>
</entry>
<entry>
<title>King John</title>
<id>http://www.feedbooks.com/book/3038</id>
<author>
  <name>William Shakespeare</name>
  <uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-30T22:10:55Z</published>
<updated>2015-03-06T16:53:01Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1595</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>The Life and Death of King John, a history play by William Shakespeare, dramatizes the reign of King John of England (ruled 1199&#8211;1216), son of Henry II of England and Eleanor of Aquitaine and father of Henry III of England. It is believed to have ...</summary>
<dcterms:extent>21,524 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3038" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3038.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3038.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3038.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3038.jpg?size=large&amp;t=1425660781" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3038.jpg?t=1425660781" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3038.atom" rel="alternate"/>
</entry>
<entry>
<title>Richard III</title>
<id>http://www.feedbooks.com/book/3045</id>
<author>
  <name>William Shakespeare</name>
  <uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-10-01T10:25:03Z</published>
<updated>2015-03-06T16:53:02Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1591</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>Richard III is a history play by William Shakespeare, believed to have been written in approximately 1591. The play is an unflattering depiction of the short reign of Richard III of England. While generally classified as a history, as grouped in t...</summary>
<dcterms:extent>31,087 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3045" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3045.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3045.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3045.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3045.jpg?size=large&amp;t=1425660782" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3045.jpg?t=1425660782" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3045.atom" rel="alternate"/>
</entry>
<entry>
<title>Henry VIII</title>
<id>http://www.feedbooks.com/book/3040</id>
<author>
  <name>William Shakespeare</name>
  <uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-10-01T07:31:51Z</published>
<updated>2015-03-06T16:53:01Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1603</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>The Famous History of the Life of King Henry the Eighth is a history play by William Shakespeare, based on the life of Henry VIII of England. An alternative title, All is True, is recorded in contemporary documents, the title Henry VIII not appear...</summary>
<dcterms:extent>25,710 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3040" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3040.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3040.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3040.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3040.jpg?size=large&amp;t=1425660781" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3040.jpg?t=1425660781" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3040.atom" rel="alternate"/>
</entry>
<entry>
<title>Henry VI, Part 1</title>
<id>http://www.feedbooks.com/book/3033</id>
<author>
  <name>William Shakespeare</name>
  <uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-30T09:51:34Z</published>
<updated>2015-03-06T16:53:00Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1590</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>The First Part of King Henry the Sixth is a history play by William Shakespeare, believed to have been written in approximately 1588&#8211;1590. It is the first in the cycle of four plays often referred to as &quot;The First Tetralogy&quot;.</summary>
<dcterms:extent>22,578 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3033" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3033.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3033.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3033.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3033.jpg?size=large&amp;t=1425660780" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3033.jpg?t=1425660780" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3033.atom" rel="alternate"/>
</entry>
<entry>
<title>The Capture of a Slaver</title>
<id>http://www.feedbooks.com/book/6723</id>
<author>
  <name>John Taylor Wood</name>
  <uri>http://www.feedbooks.com/author/2180</uri>
</author>
<published>2013-04-11T08:21:04Z</published>
<updated>2015-03-06T17:06:08Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1900</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000N" label="History by country"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036000" label="United States"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036010N" label="Historical period"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS006010" label="Pre-Confederation (to 1867)"/>
<summary>A true personal account of the capture of a slave-running ship by a United States gunship in the fleet assigned for the suppression of the slave trade. It is told in 1900 by John Taylor Wood, who, 50 years earlier, had been a young midshipman on ...</summary>
<dcterms:extent>8,368 words</dcterms:extent>
<dcterms:source>University of Virginia Library http://etext.lib.virginia.edu/toc/modeng/public/WooCapt.html</dcterms:source>
<rights>Attribution Non-Commercial Share Alike (cc by-nc-sa)</rights>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/6723" rel="alternate"/>
<link type="text/html" title="Creative Commons" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" rel="license"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/6723.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/6723.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/6723.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/6723.jpg?size=large&amp;t=1425661568" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/6723.jpg?t=1425661568" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/6723.atom" rel="alternate"/>
</entry>
<entry>
<title>Glimpses of Unfamiliar Japan, Vol 1</title>
<id>http://www.feedbooks.com/book/2056</id>
<author>
  <name>Lafcadio Hearn</name>
  <uri>http://www.feedbooks.com/author/286</uri>
</author>
<published>2007-12-15T15:48:19Z</published>
<updated>2015-06-30T18:19:44Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1871</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBTRV000000" label="Travel"/>
<summary>A Japanese magic-lantern show is essentially dramatic. It is a play of which the dialogue is uttered by invisible personages, the actors and the scenery being only luminous shadows. Wherefore it is peculiarly well suited to goblinries and weirdnes...</summary>
<dcterms:extent>95,032 words</dcterms:extent>
<dcterms:source>http://www.gutenberg.org/dirs/etext05/8glm110.txt</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/2056" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/2056.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/2056.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/2056.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2056.jpg?size=large&amp;t=1435688384" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2056.jpg?t=1435688384" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/2056.atom" rel="alternate"/>
</entry>
<entry>
<title>Saint Joan</title>
<id>http://www.feedbooks.com/book/3255</id>
<author>
  <name>George Bernard Shaw</name>
  <uri>http://www.feedbooks.com/author/749</uri>
</author>
<published>2008-10-25T19:25:19Z</published>
<updated>2015-03-06T16:53:37Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1923</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>Saint Joan is a 1923 play by Irish playwright George Bernard Shaw depicting the life of Joan of Arc.</summary>
<dcterms:extent>36,740 words</dcterms:extent>
<dcterms:source>http://gutenberg.net.au/ebooks02/0200811h.html</dcterms:source>
<rights>This work is available for countries where copyright is Life+50 or in the USA (published before 1923).</rights>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3255" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3255.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3255.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3255.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3255.jpg?size=large&amp;t=1425660817" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3255.jpg?t=1425660817" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3255.atom" rel="alternate"/>
</entry>
<entry>
<title>The Story of the Pony Express</title>
<id>http://www.feedbooks.com/book/6666</id>
<author>
  <name>Glenn Danford Bradley</name>
  <uri>http://www.feedbooks.com/author/2149</uri>
</author>
<published>2013-03-07T09:35:53Z</published>
<updated>2015-03-06T17:05:53Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1913</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000N" label="History by country"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036000" label="United States"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036010N" label="Historical period"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS036050" label="Civil War Period (1850-1877)"/>
<summary>An account of the most remarkable mail service ever in existence, and its place in history.

The Pony Express was the first rapid transit and the first fast mail line across the North American continent from the Missouri River to the Pacific Coa...</summary>
<dcterms:extent>24,819 words</dcterms:extent>
<dcterms:source>Project Gutenberg Australia http://gutenberg.net.au/ebooks/w00112.html</dcterms:source>
<rights>Attribution Non-Commercial Share Alike (cc by-nc-sa)</rights>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/6666" rel="alternate"/>
<link type="text/html" title="Creative Commons" href="http://creativecommons.org/licenses/by-nc-sa/3.0/" rel="license"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/6666.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/6666.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/6666.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/6666.jpg?size=large&amp;t=1425661553" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/6666.jpg?t=1425661553" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/6666.atom" rel="alternate"/>
</entry>
<entry>
<title>Henry IV, Part 1</title>
<id>http://www.feedbooks.com/book/3023</id>
<author>
  <name>William Shakespeare</name>
  <uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-28T12:27:44Z</published>
<updated>2015-03-06T16:52:59Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1597</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>Henry IV, Part 1 is a history play by William Shakespeare, believed to have been written no later than 1597. It is the second of Shakespeare's tetralogy that deals with the successive reigns of Richard II, Henry IV (2 plays), and Henry V. Henry IV...</summary>
<dcterms:extent>25,762 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3023" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3023.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3023.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3023.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3023.jpg?size=large&amp;t=1425660779" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3023.jpg?t=1425660779" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3023.atom" rel="alternate"/>
</entry>
<entry>
<title>Richard II</title>
<id>http://www.feedbooks.com/book/3024</id>
<author>
  <name>William Shakespeare</name>
  <uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-28T13:25:10Z</published>
<updated>2015-03-06T16:52:59Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1595</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>King Richard the Second is a history play by William Shakespeare believed to be written in approximately 1595. It is based on the life of King Richard II of England and is the first part of a tetralogy, referred to by scholars as the Henriad, foll...</summary>
<dcterms:extent>23,655 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3024" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3024.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3024.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3024.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3024.jpg?size=large&amp;t=1425660779" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3024.jpg?t=1425660779" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3024.atom" rel="alternate"/>
</entry>
<entry>
<title>Glimpses of Unfamiliar Japan, Vol 2</title>
<id>http://www.feedbooks.com/book/2057</id>
<author>
  <name>Lafcadio Hearn</name>
  <uri>http://www.feedbooks.com/author/286</uri>
</author>
<published>2007-12-15T18:49:36Z</published>
<updated>2015-03-06T16:49:24Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1894</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBTRV000000" label="Travel"/>
<summary>No Description Available</summary>
<dcterms:extent>98,578 words</dcterms:extent>
<dcterms:source>http://www.gutenberg.org/dirs/etext05/8glm210.txt</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/2057" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/2057.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/2057.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/2057.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2057.jpg?size=large&amp;t=1425660564" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/2057.jpg?t=1425660564" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/2057.atom" rel="alternate"/>
</entry>
<entry>
<title>Ali Pacha</title>
<id>http://www.feedbooks.com/book/1247</id>
<author>
  <name>Alexandre Dumas</name>
  <uri>http://www.feedbooks.com/author/25</uri>
</author>
<published>2007-06-21T22:19:02Z</published>
<updated>2015-03-06T16:46:16Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1840</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<summary>No Description Available</summary>
<dcterms:extent>43,226 words</dcterms:extent>
<dcterms:source>http://gutenberg.org</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/1247" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/1247.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/1247.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/1247.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/1247.jpg?size=large&amp;t=1425660376" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/1247.jpg?t=1425660376" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/1247.atom" rel="alternate"/>
</entry>
<entry>
<title>Henry VI, Part 2</title>
<id>http://www.feedbooks.com/book/3034</id>
<author>
  <name>William Shakespeare</name>
  <uri>http://www.feedbooks.com/author/494</uri>
</author>
<published>2008-09-30T10:15:12Z</published>
<updated>2015-03-06T16:53:01Z</updated>
<dcterms:language>en</dcterms:language>
<dcterms:issued>1591</dcterms:issued>
<category scheme="http://www.feedbooks.com/categories" term="FBNFC000000" label="Non-Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBHIS000000" label="History"/>
<category scheme="http://www.feedbooks.com/categories" term="FBFIC000000" label="Fiction"/>
<category scheme="http://www.feedbooks.com/categories" term="FBDRA000000" label="Drama"/>
<summary>The Second Part of King Henry the Sixth, or Henry VI, Part 2, is a history play by William Shakespeare believed written in approximately 1590-91. It is the second part of the trilogy on Henry VI, and often grouped together with Richard III as a te...</summary>
<dcterms:extent>26,527 words</dcterms:extent>
<dcterms:source>http://shakespeare.mit.edu/</dcterms:source>
<link type="text/html" title="View on Feedbooks" href="http://www.feedbooks.com/book/3034" rel="alternate"/>
<link type="application/epub+zip" href="http://www.feedbooks.com/book/3034.epub" rel="http://opds-spec.org/acquisition"/>
<link type="application/x-mobipocket-ebook" href="http://www.feedbooks.com/book/3034.mobi" rel="http://opds-spec.org/acquisition"/>
<link type="application/pdf" href="http://www.feedbooks.com/book/3034.pdf" rel="http://opds-spec.org/acquisition"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3034.jpg?size=large&amp;t=1425660781" rel="http://opds-spec.org/image"/>
<link type="image/jpeg" href="http://covers.feedbooks.net/book/3034.jpg?t=1425660781" rel="http://opds-spec.org/image/thumbnail"/>
<link type="application/atom+xml;type=entry;profile=opds-catalog" title="Full entry" href="http://www.feedbooks.com/book/3034.atom" rel="alternate"/>
</entry>
</feed>

But when the feed is parsed with feedparser, the generated published_parsed values are:

>>> import feedparser
>>> feedparser.__version__
'5.2.0'
>>> url = 'http://www.feedbooks.com/books/top.atom?category=FBHIS000000&lang=en&range=week'
>>> feed_data = feedparser.parse(url)
>>> entries = feed_data.entries
>>> entries[0]['published_parsed']
time.struct_time(tm_year=1513, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
#Similar for all the entries:
>>> for i in entries: print i['published_parsed']
...
time.struct_time(tm_year=1513, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=2015, tm_mon=6, tm_mday=28, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=179, tm_isdst=1)
time.struct_time(tm_year=1883, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1918, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1787, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1840, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1599, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1595, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1591, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1603, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1590, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1900, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1871, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1923, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1913, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1597, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1595, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1894, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1840, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
time.struct_time(tm_year=1591, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=1, tm_isdst=0)
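Comparing this output with the XML above, the years appear to come from each entry's <dcterms:issued> value (for example 1913 for The Story of the Pony Express, 1597 for Henry IV, Part 1 and 1591 for Henry VI, Part 2) rather than from <published>. One way to recover the <published> timestamps independently of feedparser is to read them with the standard-library ElementTree parser. This is a minimal sketch; the function name is hypothetical, and it assumes the values use the YYYY-MM-DDTHH:MM:SSZ form seen in the Feedbooks feed.

```python
import time
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def published_dates(atom_xml):
    """Return (title, struct_time) pairs read from each entry's <published> element.

    Assumes the 'YYYY-MM-DDTHH:MM:SSZ' form used by the Feedbooks feed; other
    RFC 3339 variants (offsets, fractional seconds) would need a fuller parser.
    """
    root = ET.fromstring(atom_xml)
    results = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        published = entry.findtext(ATOM + "published")
        if published is None:
            continue  # entry has no <published>; skip it rather than guess
        results.append((title, time.strptime(published, "%Y-%m-%dT%H:%M:%SZ")))
    return results

sample = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dcterms="http://purl.org/dc/terms/">
  <entry>
    <title>The Story of the Pony Express</title>
    <published>2013-03-07T09:35:53Z</published>
    <dcterms:issued>1913</dcterms:issued>
  </entry>
</feed>"""

for title, parsed in published_dates(sample):
    print(title, parsed.tm_year)  # prints: The Story of the Pony Express 2013
```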

Publish a .whl

See http://pythonwheels.com

RFC 822 parse error: support local differential

Hello and thanks for the great work.

This date: Wed, 01 Jul 15 00:00:00 +0200, which is a valid RFC 822 date, returns this tuple:
[200, 7, 1, 15, 0, 0, 1, 182, 0], where the year is obviously wrong.

I noticed it with this feed : http://www.cerveauetpsycho.fr/ewb_pages/f/flux_rss_general_cp.xml

I'm using the latest version of feedparser.

Thank you.
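As a possible workaround, feedparser provides a registerDateHandler() hook: a handler receives the raw date string and returns a 9-tuple in UTC, or None so that other handlers can try. The sketch below (the handler name is hypothetical) delegates to Python 3's email.utils, which maps two-digit years 00-68 to 20xx and 69-99 to 19xx.

```python
import email.utils

def rfc822_two_digit_year_handler(date_string):
    """Hypothetical feedparser date handler for RFC 822 dates with 2-digit years.

    Returns a UTC time.struct_time, or None so feedparser can try other handlers.
    """
    try:
        parsed = email.utils.parsedate_to_datetime(date_string)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on unparseable input
        return None
    # utctimetuple() shifts an aware datetime to UTC (a naive one is used as-is)
    return parsed.utctimetuple()

# Registration, with feedparser installed:
#   import feedparser
#   feedparser.registerDateHandler(rfc822_two_digit_year_handler)

print(rfc822_two_digit_year_handler("Wed, 01 Jul 15 00:00:00 +0200")[:6])
# prints: (2015, 6, 30, 22, 0, 0)
```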

Error installing on Ubuntu 15.04

Traceback (most recent call last):
  File "/usr/bin/pip3", line 9, in <module>
    load_entry_point('pip==1.5.6', 'console_scripts', 'pip3')()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 558, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2682, in load_entry_point
    return ep.load()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2355, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2361, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python3/dist-packages/pip/__init__.py", line 74, in <module>
    from pip.vcs import git, mercurial, subversion, bazaar  # noqa
  File "/usr/lib/python3/dist-packages/pip/vcs/mercurial.py", line 9, in <module>
    from pip.download import path_to_url
  File "/usr/lib/python3/dist-packages/pip/download.py", line 25, in <module>
    from requests.compat import IncompleteRead
ImportError: cannot import name 'IncompleteRead'

System: Ubuntu 15.04
Python 3.5.0
Requests 2.8.1

Why doesn't "media:thumbnail" use _itsAnHrefDamnIt?

I want to use "media:thumbnail" the same way as "enclosures".
I think the mixin's _itsAnHrefDamnIt method should be used in this case.
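Until that changes, a small post-processing step can give thumbnails the same href key that enclosures get. This is a hypothetical helper modelled on the idea behind _itsAnHrefDamnIt (moving a url or uri attribute to href); it assumes feedparser exposes media:thumbnail as entry['media_thumbnail'], a list of attribute dictionaries.

```python
def thumbnails_with_href(entry):
    """Hypothetical helper: return media:thumbnail dicts keyed by 'href'.

    Assumes entry['media_thumbnail'] is a list of attribute dicts, e.g.
    [{'url': 'http://example.com/a.jpg', 'width': '75'}].
    """
    normalized = []
    for attrs in entry.get("media_thumbnail", []):
        attrs = dict(attrs)  # copy so the parsed entry is left untouched
        href = attrs.get("url") or attrs.get("uri") or attrs.get("href")
        if href:
            # Mirror the enclosure convention: drop url/uri, expose href
            attrs.pop("url", None)
            attrs.pop("uri", None)
            attrs["href"] = href
        normalized.append(attrs)
    return normalized

entry = {"media_thumbnail": [{"url": "http://example.com/a.jpg", "width": "75"}]}
print(thumbnails_with_href(entry))
# prints: [{'width': '75', 'href': 'http://example.com/a.jpg'}]
```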

Incorrect description when a subtitle is present in the feed

Take the following podcast RSS feed: http://feeds.serialpodcast.org/serialpodcast

It has both an itunes:subtitle and a description for the feed element. feedparser only ever returns the itunes:subtitle, even when accessing feed.description.

I think feed.description should give channel/description precedence over the iTunes subtitle. We already have a separate feed.subtitle.

How to check whether the passed URL is a valid RSS feed

I'm trying to validate the URL: for example, if I pass feedparser.parse('http://rohitkhatri.com'), it should give me False.

Is there any property which I can access to get this information?
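There is no single "is valid" flag, but the parse result offers hints. feedparser sets version to an empty string when it cannot identify the feed format, and sets bozo when the markup is malformed; bozo alone is not decisive, since many real feeds are slightly malformed yet still usable. A heuristic sketch (the function name is hypothetical) combining the two:

```python
def check_feed(result):
    """Heuristic: return (is_feed, reason) for a dict returned by feedparser.parse().

    Treats an empty 'version' (format not recognised) as "not a feed".
    """
    version = result.get("version", "")
    if not version:
        reason = "format not recognised"
        if result.get("bozo"):
            reason += ": %s" % result.get("bozo_exception")
        return False, reason
    if result.get("bozo"):
        # Recognised format, but feedparser had to cope with malformed markup
        return True, "%s (malformed: %s)" % (version, result.get("bozo_exception"))
    return True, version

# Usage, with feedparser installed:
#   import feedparser
#   is_feed, reason = check_feed(feedparser.parse("http://rohitkhatri.com"))
```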

RSS feed fails in feedparser but passes the W3C validator

Hi,

I'm trying to parse an RSS 2.0 feed:

http://www.legrandmix.com/data/rss.xml

It passes the W3C feed validator:
https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.legrandmix.com%2Fdata%2Frss.xml

But I get an error with feedparser:

SAXParseException('not well-formed (invalid token)'

I'm using the latest version of feedparser (5.2.1) with Python 2.7.10+ on Debian Linux.

Thanks for your help!
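To see which byte the strict XML parser rejects, one option is to download the raw feed and run it through the standard-library SAX parser, which reports the line and column of the first well-formedness error. A minimal sketch (the function name is hypothetical); it only diagnoses the problem and does not change how feedparser behaves.

```python
import xml.sax
from xml.sax.handler import ContentHandler

def first_xml_error(data):
    """Return (line, column, message) for the first XML error in data, or None.

    data must be bytes, e.g. the body from urllib.request.urlopen(url).read().
    """
    try:
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None

# A control character such as \x01 is not allowed in an XML 1.0 document
print(first_xml_error(b"<rss>\n<title>a\x01b</title></rss>"))
```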

"parse" raises an error while parsing podcast URLs

% python3                                                                                                               
Python 3.5.1 (default, Mar  4 2016, 15:21:15) 
[GCC 6.0.0 20160302 (Red Hat 6.0.0-0.14)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> url = 'http://api.videos.ndtv.com/apis/podcast/index/client_key/ndtv-podcast-5d35e3e34a92df17d11d54e0ff241e8b?shows=503&showfull=1&media_type=audio&extra_params=keywords,description'
>>> feedparser.parse(url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/site-packages/feedparser.py", line 3957, in parse
    saxparser.parse(source)
  File "/usr/lib64/python3.5/site-packages/drv_libxml2.py", line 190, in parse
    _d(reader.LocalName()))
  File "/usr/lib64/python3.5/site-packages/drv_libxml2.py", line 70, in _d
    return _decoder(s)[0]
  File "/usr/lib64/python3.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: a bytes-like object is required, not 'str'
>>> feedparser.__version__
'5.2.0'
>>>

Relative link resolution doesn't work for some <img> tags

According to the documentation, <img src=""> is treated as a URI and resolved.

But I found a case where it doesn't work: the Atom feed at http://jvns.ca/atom.xml

> import feedparser
> feedparser.__version__
'5.2.1'
> feedparser.RESOLVE_RELATIVE_URIS
1
> feed = feedparser.parse('http://jvns.ca/atom.xml')
> # This entry has img with relative URI
> e = feed['entries'][4]
> # Get HTML content
> content = e['content'][0]['value']
> # Find and show <img>
> i = content.find('img src')
> print(content[i-1:(i+50)])
<img src="/images/ml-feelings.jpg" width="300px" />

As you can see, the image source remained relative.
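As a stopgap, relative src values can be resolved after parsing. A rough sketch, assuming the base URL is the one feedparser reports in the content's base key (for example e['content'][0]['base']); the helper name is hypothetical, and it uses a regular expression rather than a full HTML parser, so it only handles quoted src attributes.

```python
import re
from urllib.parse import urljoin

# Matches <img ... src="..."> or src='...'; group 3 is the URL (quoted values only)
_IMG_SRC = re.compile(r"""(<img\b[^>]*?\ssrc\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)

def absolutize_img_src(html, base):
    """Resolve relative <img src> values in html against base with urljoin."""
    def _resolve(match):
        prefix, quote, url = match.groups()
        return prefix + quote + urljoin(base, url) + quote
    return _IMG_SRC.sub(_resolve, html)

content = '<img src="/images/ml-feelings.jpg" width="300px" />'
print(absolutize_img_src(content, "http://jvns.ca/atom.xml"))
# prints: <img src="http://jvns.ca/images/ml-feelings.jpg" width="300px" />
```

Absolute URLs pass through unchanged, because urljoin returns them as-is.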

Under Python 2, feedparser returns an empty summary_detail and content for this feed

feed: https://xueqiu.com/hots/topic/rss

>>> import feedparser
>>> f=feedparser.parse('https://xueqiu.com/hots/topic/rss')
>>> f.entries[0]
{'summary_detail': {'base': u'https://xueqiu.com/hots/topic/rss', 'type': u'text/html', 'value': u'', 'language': None}, 'published_parsed': time.struct_time(tm_year=2016, tm_mon=11, tm_mday=23, tm_hour=3, tm_min=16, tm_sec=41, tm_wday=2, tm_yday=328, tm_isdst=0), 'links': [{'href': u'http://xueqiu.com/4465952737/77946541', 'type': u'text/html', 'rel': u'alternate'}], 'title': u'\u9009\u80a1\u601d\u8def\u5206\u4eab\uff0c\u54ea\u4e9b\u4e8b\u60c5\u8981\u7559\u610f\uff1f', 'authors': [{'name': u'\u5f90\u51e4\u4fca'}], 'updated': u'2016-11-23T03:16:41Z', 'summary': u'', 'content': [{'base': u'https://xueqiu.com/hots/topic/rss', 'type': u'text/html', 'value': u'', 'language': None}], 'guidislink': False, 'title_detail': {'base': u'https://xueqiu.com/hots/topic/rss', 'type': u'text/plain', 'value': u'\u9009\u80a1\u601d\u8def\u5206\u4eab\uff0c\u54ea\u4e9b\u4e8b\u60c5\u8981\u7559\u610f\uff1f', 'language': None}, 'link': u'http://xueqiu.com/4465952737/77946541', 'author': u'\u5f90\u51e4\u4fca', 'published': u'Wed, 23 Nov 2016 03:16:41 GMT', 'author_detail': {'name': u'\u5f90\u51e4\u4fca'}, 'id': u'http://xueqiu.com/4465952737/77946541', 'updated_parsed': time.struct_time(tm_year=2016, tm_mon=11, tm_mday=23, tm_hour=3, tm_min=16, tm_sec=41, tm_wday=2, tm_yday=328, tm_isdst=0)}

Bottleneck: significant CPU time spent parsing dates

Hi,

I'm using rss2email by @wking, and I ran some measurements to find the bottleneck because I found rss2email a bit slow.

I used line_profiler to measure the CPU time.

From my investigations, I noticed that:

most of the time (excluding I/O) is spent in feedparser (not a big surprise, but good to check);
more precisely, in the _build_urllib2_request function in http.py;
within that function, about 50% of the time goes to urllib.request.Request(url) and 50% to _parse_date(modified).

Clearly, this function takes a lot of time and can probably be optimized. I leave this note as an open question for suggestions from experts.

In my feeds, RFC 822 dates seem to be the most common.

Thank you.
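Since RFC 822 dates dominate, one idea is a fast path that tries the cheap standard-library RFC 822 parser first and only falls back to the slower multi-format parsing when that fails. This is a sketch of the idea, not feedparser's own _parse_date and not a measured improvement; fallback is a hypothetical stand-in for the general parser.

```python
import email.utils
import time

def parse_date_fast(date_string, fallback):
    """Try the common RFC 822 form cheaply before a slower general parser.

    fallback is a hypothetical stand-in for a full multi-format date parser;
    it is only called when email.utils cannot read the string.
    """
    parsed = email.utils.parsedate_tz(date_string)
    if parsed is not None:
        # mktime_tz applies the zone offset; a missing zone is treated as local time
        return time.gmtime(email.utils.mktime_tz(parsed))
    return fallback(date_string)
```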

Middle European Summer Time (MEST) parsed incorrectly

Date in the RSS feed:
Thu, 30 Apr 2015 08:57:00 MEST

Actual published_parsed:
time.struct_time(tm_year=2015, tm_mon=4, tm_mday=30, tm_hour=20, tm_min=57, tm_sec=0, tm_wday=3, tm_yday=120, tm_isdst=0)

Expected published_parsed:
time.struct_time(tm_year=2015, tm_mon=4, tm_mday=30, tm_hour=6, tm_min=57, tm_sec=0, tm_wday=3, tm_yday=120, tm_isdst=0)
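MEST is UTC+2, so 08:57 MEST corresponds to 06:57 UTC, as expected above. As a sketch of a workaround, a custom handler (registered with feedparser.registerDateHandler()) could replace a trailing zone abbreviation with a numeric offset before parsing. The handler name and the deliberately short mapping table are hypothetical.

```python
import email.utils
import time

# Hypothetical table: extend with whatever abbreviations your feeds use
ZONE_OFFSETS = {"MEST": "+0200", "MESZ": "+0200", "CEST": "+0200", "CET": "+0100"}

def named_zone_handler(date_string):
    """Rewrite a trailing zone abbreviation to a numeric offset, then parse.

    Returns a UTC time.struct_time, or None so other handlers can try.
    """
    parts = date_string.split()
    if not parts or parts[-1].upper() not in ZONE_OFFSETS:
        return None
    parts[-1] = ZONE_OFFSETS[parts[-1].upper()]
    parsed = email.utils.parsedate_tz(" ".join(parts))
    if parsed is None or parsed[9] is None:
        return None
    # mktime_tz subtracts the offset, giving seconds since the epoch in UTC
    return time.gmtime(email.utils.mktime_tz(parsed))

print(named_zone_handler("Thu, 30 Apr 2015 08:57:00 MEST")[:6])
# prints: (2015, 4, 30, 6, 57, 0)
```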

Parse the item image tag from an RSS feed

In the following feed, each item has an "image" element. Is there any way to access this element with feedparser?

https://iso.500px.com/feed/

Kind regards,

pieter
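If feedparser does not expose the element, one option is to read it from the raw XML. A sketch using ElementTree (the function name is hypothetical); the actual markup in that feed may differ, so it guesses at three common shapes: the URL as element text, a <url> child, or a url attribute on an unnamespaced <image> inside each <item>.

```python
import xml.etree.ElementTree as ET

def item_images(rss_xml):
    """Return (title, image URL) pairs for RSS <item> elements with an <image> child.

    Accepts the URL as element text, a <url> child, or a url attribute
    (assumptions about the markup, not a documented format).
    """
    root = ET.fromstring(rss_xml)
    pairs = []
    for item in root.iter("item"):
        image = item.find("image")
        if image is None:
            continue
        url = (image.text or "").strip() or image.findtext("url") or image.get("url")
        if url:
            pairs.append((item.findtext("title"), url.strip()))
    return pairs

sample = ("<rss><channel>"
          "<item><title>A</title><image>http://ex.com/a.jpg</image></item>"
          "<item><title>B</title></item>"
          "</channel></rss>")
print(item_images(sample))  # prints: [('A', 'http://ex.com/a.jpg')]
```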

Not parsing all entries

When I try to parse the PBS NewsHour feed, the parser doesn't recognize all items as entries, and some of them lack fields like description even though the XML contains them.

SyntaxError: invalid syntax

I'm trying to use this on Python 3 and I'm getting this:
feedparser.py", line 1353
ur'''(([a-zA-Z0-9_-.+]+)@(([[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.)|(([a-zA-Z0-9-]+.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(]?))(?subject=\S+)?''',
^
SyntaxError: invalid syntax

The error points to the closing ''' before the comma.

I have played with this for a few hours, splitting up the line, turning it into a string, etc., and no matter what, it doesn't like that line.

Thanks
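The ur'''...''' prefix on that line is valid only in Python 2: Python 3 accepts u'' (again, since 3.3) and r'', but not the combined ur prefix. That suggests the module being imported is Python 2-only source, though this is an inference from the traceback. The check below reproduces the error in isolation using the built-in compile():

```python
def compiles(source):
    """Return True if source compiles under the running Python interpreter."""
    try:
        compile(source, "<check>", "exec")
    except SyntaxError:
        return False
    return True

print(compiles("pattern = r'''\\S+'''"))   # True on Python 3
print(compiles("pattern = ur'''\\S+'''"))  # False on Python 3: 'ur' prefix not allowed
```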

cjkpython gone

The docs here recommend installing a couple of packages from http://cjkpython.i18n.org/

i18n.org has been gone for long enough that Google's last cached copy of it was a GoDaddy hosting page.

So, I guess we need to figure out what the current recommended guidance here is and then update the docs.

Atom feeds incorrectly have their text elements entity-decoded

Take the following Atom feeds:

Explicit Text Type:

<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">Example &lt;b&gt;Atom&lt;/b&gt;</title>
</feed>

Implicit Text Type:

<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example &lt;b&gt;Atom&lt;/b&gt;</title>
</feed>

Feedparser will return the following for Title:

Example <b>Atom</b>

However, the Atom spec states that elements with type="text" should be left as-is. Indeed, this is the major difference between type="text" and type="html": type="html" explicitly contains escaped markup that should be decoded. If no type attribute is present, the element defaults to type="text".

So, to me, it seems that feedparser is incorrectly entity-decoding Atom text elements. I think the correct logic would be something like (pseudocode):

if (feedType == "atom")
    if (elem.Type == "" || elem.Type == "text")
        return elem.CharData
    else if (elem.Type == "html")
        return decodeEntities(elem.CharData)
    else if (elem.Type == "xhtml")
        // ... Handle xhtml

I've checked the output of SimplePie, another feed parser, and it indeed returns what I consider the expected output for both of the feeds quoted above:
Example &lt;b&gt;Atom&lt;/b&gt;

Service document (RFC 5023) not parsed properly

Parsing a simple document that consists only of several collections produces 'collection' as a single-entry dictionary with only one URL: the last one in the document.
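For comparison, a minimal sketch that lists every collection in an RFC 5023 service document with ElementTree. The app namespace URI is the one defined by RFC 5023; the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

APP = "{http://www.w3.org/2007/app}"
ATOM = "{http://www.w3.org/2005/Atom}"

def collections(service_xml):
    """Return (title, href) for every app:collection, in document order."""
    root = ET.fromstring(service_xml)
    return [
        (coll.findtext(ATOM + "title"), coll.get("href"))
        for coll in root.iter(APP + "collection")
    ]

sample = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Main Site</atom:title>
    <collection href="http://example.org/blog/main">
      <atom:title>My Blog Entries</atom:title>
    </collection>
    <collection href="http://example.org/blog/pic">
      <atom:title>Pictures</atom:title>
    </collection>
  </workspace>
</service>"""

print(collections(sample))
# prints: [('My Blog Entries', 'http://example.org/blog/main'), ('Pictures', 'http://example.org/blog/pic')]
```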
