donohue / medium-to-jekyll
Convert Medium exported posts to Jekyll posts
When a post is converted to Markdown, the trailing '*' on emphasized text is shifted one character to the right. So for example, it should read:
"This is *italic.* The following isn't."
Instead, it is rendered as:
"This is *italic. *The following isn't."
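One possible workaround is to post-process the generated Markdown and move the stray whitespace back outside the closing '*'. The snippet below is only a sketch: the function name fix_trailing_emphasis is made up for illustration and is not part of medium_to_jekyll.py, and it only handles single-asterisk emphasis on one line (bold '**' markers are left alone).

```python
import re

# Single-asterisk emphasis whose closing '*' is preceded by spaces or tabs,
# e.g. "*italic. *". Double-asterisk (bold) markers are deliberately skipped.
_MISPLACED_CLOSE = re.compile(r"(?<!\*)\*([^\s*][^*\n]*?)([ \t]+)\*(?!\*)")


def fix_trailing_emphasis(markdown):
    """Move whitespace that sits just before a closing '*' to just after it.

    Hypothetical helper for illustration; not taken from the project.
    """
    return _MISPLACED_CLOSE.sub(r"*\1*\2", markdown)


print(fix_trailing_emphasis("This is *italic. *The following isn't."))
```

Correctly formed emphasis such as "a *b* c" does not match the pattern and is returned unchanged.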
Posts of mine that contain emoji (Unicode) characters make lxml.html.document_fromstring(html) return
<html><body><p>! D O C T Y P E h t m l > </p></body></html>
instead of the proper HTML.
This then causes an error when the HTML is parsed later:
Traceback (most recent call last):
  File "medium_to_jekyll.py", line 110, in <module>
    main()
  File "medium_to_jekyll.py", line 100, in main
    title, date = extract_metadata(doc)
  File "medium_to_jekyll.py", line 34, in extract_metadata
    title = etree.tostring(doc.xpath('//title')[0], method='text', encoding='unicode')
IndexError: list index out of range
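The IndexError comes from indexing the result of doc.xpath('//title') when no <title> element was found. A guarded lookup might look like the sketch below. It assumes doc is an lxml.html tree (whose elements provide text_content()); the name extract_title and the "Untitled" fallback are hypothetical, not the project's code.

```python
def extract_title(doc, default="Untitled"):
    """Return the text of the first <title> element, or `default` if none exists.

    Assumes `doc` is an lxml.html tree; `default` is a hypothetical fallback.
    """
    titles = doc.xpath("//title")
    if not titles:
        # The parse produced no <title>, so indexing [0] would raise IndexError.
        return default
    return titles[0].text_content().strip()
```

This only avoids the crash. It does not repair the mangled parse, so raising a clearer error instead of returning a fallback may be the better choice.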
While looking up document_fromstring() in the lxml documentation, I noticed this:
Really broken pages
The normal HTML parser is capable of handling broken HTML, but for pages that are far enough from HTML to call them 'tag soup', it may still fail to parse the page in a useful way. A way to deal with this is ElementSoup, which deploys the well-known BeautifulSoup parser to build an lxml HTML tree.
However, note that the most common problem with web pages is the lack of (or the existence of incorrect) encoding declarations. It is therefore often sufficient to only use the encoding detection of BeautifulSoup, called UnicodeDammit, and to leave the rest to lxml's own HTML parser, which is several times faster.
I think this is worth investigating.
I'm using Python 3 ;)
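Since the quoted passage points at encoding as the usual culprit, one thing to try is handing lxml UTF-8 bytes together with a parser that has the encoding set explicitly, rather than a Python str. This is a sketch only: to_utf8_bytes and parse_html are hypothetical names, and whether this actually fixes the emoji case in medium_to_jekyll.py is an assumption that has not been verified here.

```python
def to_utf8_bytes(html):
    """Return the document as UTF-8 bytes.

    str input is encoded; bytes input is passed through unchanged, on the
    assumption that it is already UTF-8.
    """
    if isinstance(html, bytes):
        return html
    return html.encode("utf-8")


def parse_html(html):
    """Parse `html` with lxml, telling the parser the encoding explicitly."""
    # Imported here so to_utf8_bytes() stays usable without lxml installed.
    import lxml.html

    parser = lxml.html.HTMLParser(encoding="utf-8")
    return lxml.html.document_fromstring(to_utf8_bytes(html), parser=parser)
```

With bytes and an explicit encoding, the parser does not have to guess how the text is encoded.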