Comments (9)
I can't get a result with that URL; I get a 302. Do you have another URL I can test?
Using your URL, it fails here,
in this code in /readability/lib/readability.ex:
%{status_code: _, body: raw, headers: headers} = HTTPoison.get!(url, [], httpoison_options)
Here HTTPoison is not returning a usable result.
You can check this yourself in iex:
HTTPoison.get!("https://sergiotapia.me/pluralizing-strings-in-javascript-es6-b5d4d651d403", [], [])
The result is a 302, so I can't even reproduce the Floki error.
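One thing worth trying, as a sketch rather than a confirmed fix: HTTPoison does not follow redirects unless asked to, so a 302 is returned as-is. The snippet below repeats the iex call with the `follow_redirect` option turned on. The redirect limit of 5 is an arbitrary value picked for illustration, and whether Readability lets you pass this option through `httpoison_options` is not something this thread confirms.

```elixir
# Run inside `iex -S mix` in a project that already depends on :httpoison.
url = "https://sergiotapia.me/pluralizing-strings-in-javascript-es6-b5d4d651d403"

# :follow_redirect and :max_redirect are HTTPoison request options.
# The limit of 5 is an assumption, not a recommended value.
response = HTTPoison.get!(url, [], follow_redirect: true, max_redirect: 5)

# Check the final status before handing the body to Readability.
IO.inspect(response.status_code, label: "status")
```

If the status is still a 3xx after this, the redirect is probably not the only problem and the raw HTML approach used later in this thread is the easier way to reproduce the Floki error.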
from readability.
Which version of Floki are you using? The internal API of Floki is unstable, and Floki.Selector.match?/2 doesn't exist on the latest release.
You'll need to use Floki < 0.16 (that is, 0.15 or older).
from readability.
@jonzlin95 Do you mean I need to add Floki manually to my dependency list?
from readability.
Just check your mix.lock file and make sure you're not running a version of Floki that's too new.
from readability.
@jonzlin95 Thanks for the help! In my mix.lock file I have this:
"floki": {:hex, :floki, "0.14.0", "91a6be57349e10a63cf52d7890479a19012cef9185fa93c305d4fe42e6a50dee", [:mix], [{:mochiweb, "~> 2.15", [hex: :mochiweb, optional: false]}]},
I'm still getting a strange Floki build error, as if Readability were calling functions that don't exist in the Floki package.
** (FunctionClauseError) no function clause matching in Floki.HTMLTree.build/1
stacktrace:
(floki) lib/floki/html_tree.ex:14: Floki.HTMLTree.build(nil)
(floki) lib/floki/finder.ex:48: Floki.Finder.find_selectors/2
(floki) lib/floki/filter_out.ex:17: Floki.FilterOut.filter_out/2
(floki) lib/floki.ex:210: Floki.text/2
(readability) lib/readability/helper.ex:75: Readability.Helper.text_length/1
(readability) lib/readability/article_builder.ex:36: Readability.ArticleBuilder.build/2
Can you let me know what to put into my mix.exs file and what command to run to update my dependencies? I have a feeling Readability is still calling the newer version of floki for some reason.
I put this in my mix file to make sure I have the 0.14 version.
{:readability, "~> 0.8.0"},
{:floki, "~> 0.14.0"},
Is this the right way to do it?
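For anyone with the same question, here is a minimal sketch of the pin and the commands that refresh the lock file. It only restates the versions quoted above; it is not a confirmed fix for the error in this thread, and the surrounding `deps/0` function is the standard mix.exs layout rather than anything taken from this project.

```elixir
# mix.exs (fragment). Versions are the ones quoted in this thread.
defp deps do
  [
    {:readability, "~> 0.8.0"},
    # Explicit pin so Mix cannot resolve Floki to 0.15 or newer.
    {:floki, "~> 0.14.0"}
  ]
end

# After editing mix.exs, from the project root:
#   mix deps.update floki   # re-resolve Floki within the constraint above
#   mix deps.get            # fetch anything that is missing
#   mix deps                # list dependencies with their locked versions
```

The `mix deps` output shows which Floki version is actually locked, which is the quickest way to check whether a newer release is being pulled in.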
from readability.
Actually, I see now that it's trying to build an HTML tree for nil. I wonder what's causing it to end up with nil. I know I'm passing the readability function a very long HTML string.
from readability.
This is the HTML that I'm trying to parse.
https://gist.github.com/sergiotapia/6ea4f860f9c4759dec1036118ac38872
It looks like Floki's internal function filter_out(html_tree, "script") is returning nil. This may be fixed in newer versions of Floki. :(
from readability.
@sergiotapia Some code changes are necessary to upgrade the Floki dependency. Please feel free to open a PR.
from readability.
The Floki dependency has been updated, and Readability now works fine with the latest version. Also, Floki.Selector.match? is no longer used in the codebase, so that issue is possibly no longer relevant (at least I can't replicate it).
If you still have this issue, please open a separate issue.
from readability.