
myhtml's Issues

postinstall link is broken

Can it be replaced with the latest release version?

wget https://github.com/lexborisov/myhtml/archive/caa7d711847c02db8c4dd24855fb1caa8b728a9a.tar.gz
make: wget: No such file or directory
make: *** [myhtml-c] Error 1

undefined method 'css' for Myhtml::Parser

    page = HTTP::Client.get "http://something.com"
    anu = page.body
    myhtml = Myhtml::Parser.new(anu)
    links = myhtml.css(".mycssselector").map(&.attribute_by("href")).to_a
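
A possible cause, judging from a later report on this page that loads both shards before calling css: the selector API may live in the separate modest shard, so Myhtml::Parser has no css method until that shard is required. A minimal sketch under that assumption (the URL and selector are the placeholders from the snippet above):

```crystal
require "http/client"
require "myhtml"
require "modest" # assumption: this shard is what adds css to Myhtml::Parser

page = HTTP::Client.get "http://something.com"
myhtml = Myhtml::Parser.new(page.body)

# compact drops nil entries, in case attribute_by yields nil
# for a node that has no href attribute
links = myhtml.css(".mycssselector").map(&.attribute_by("href")).to_a.compact
puts links
```

If the installed myhtml version already ships css itself, the extra require would be unnecessary.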

can't cd to src/ext

Hi, I'm new to the Crystal language. I tried to use your package, but when I run crystal deps it returns this error:

crystal deps
Updating https://github.com/kemalcr/kemal.git
Updating https://github.com/luislavena/radix.git
Updating https://github.com/jeromegn/kilt.git
Updating https://github.com/RX14/multipart.cr.git
Updating https://github.com/kostya/myhtml.git
Updating https://github.com/kostya/modest.git
Using kemal (ac8ec0a07b5929dc1656006a135e08a2a2732f5e)
Using radix (0.3.5)
Using kilt (0.3.3)
Using multipart (0.1.1)
Installing myhtml (master)
Postinstall cd src/ext && make package
Failed cd src/ext && make package:
/bin/sh: 1: cd: can't cd to src/ext

Possible to detect HTML Entities?

With this code:

html = Myhtml::Parser.new("&nbsp;")
puts html.body!.children.inspect

# Myhtml::Iterator::Children(@start_node=Myhtml::Node(:body), @current_node=Myhtml::Node(:_text, " "))

Is there a way I can ask if this node is an entity? Or do I just check that it's a _text node with a space?

Parsing incomplete fragments

Example:

html = <<-HTML
  <tr><td>Hello</td></tr>
  <tr><td>123</td><td>other</td></tr>
  <tr><td>foo</td><td>columns</td></tr>
  <tr><td>bar</td><td>are</td></tr>
  <tr><td>xyz</td><td>ignored</td></tr>
HTML

myhtml = Myhtml::Parser.new(html)
puts myhtml.css("tr td").map(&.to_html).to_a
# => []

Is this possible somehow? It only works once I wrap the HTML in <table>.
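
The wrapping workaround mentioned above can be sketched as follows. It uses only calls that already appear in this report; interpolating the rows into a <table> string is the only addition, and the variable names are illustrative.

```crystal
require "myhtml"

rows = <<-HTML
  <tr><td>Hello</td></tr>
  <tr><td>123</td><td>other</td></tr>
  HTML

# Give the bare rows a <table> parent before parsing,
# since the report says the selector only matches in that case.
table = "<table>#{rows}</table>"

myhtml = Myhtml::Parser.new(table)
puts myhtml.css("tr td").map(&.to_html).to_a
```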

Is it possible to get the original doctype?

The original HTML content is

  origin = <<-HTML
    <!doctype html>
    <html lang="en">
      <head>
       <title></title>
      </head>
      <body> </body>
  </html>
  HTML

But, I can't get it to print out the DOCTYPE after parsing:

    puts Myhtml::Parser.new(origin).root!.to_html

Is there a way to put out the full HTML with a doctype?

Less to type?

In the PR and the create example, I find it a little tedious to type the parentheses and node all the time.

div = tree.create_node(:div)
div.attribute_add("class", "red")
body.append_child(div)

Can be

div = tree.create_div
div.attribute_add "class", "red"
body.append_child div

What do you think?

Invalid memory access

Code:

require "myhtml"
require "modest"
require "http/client"
url = "http://academica.ru/vysshee-obrazovanie/negosudarstvennyj-vuz/stranitsa_1/"
response = HTTP::Client.get url
source = Myhtml::Parser.new(response.body)
source.css("li.sectionListItem").each do |node|
    p node
end

Output:

laptop% crystal build parser.cr
laptop% ./parser               
Invalid memory access (signal 11) at address 0x8
[4742053] *CallStack::print_backtrace:Int32 +117
[4710520] __crystal_sigfault_handler +56
[140079244742784] ???
[5443431] modest_finder_by_selectors_list +119
[5264191] *Modest::Finder#find<Myhtml::Node>:Myhtml::CollectionIterator +127
[5255785] *Myhtml::Parser#css<String>:Myhtml::CollectionIterator +233
[4657361] ???
[4710265] main +41
[140079232176785] __libc_start_main +241
[4655226] _start +42
[0] ???
laptop%

Version:

laptop% crystal --version
Crystal 0.20.5 (2017-01-25)

myhtml with HTTP::Client.get

I have tried everything but can't make this lib work with the HTTP::Client.get method. I converted the response to a string, but it still doesn't work.

Looking for an "inner_html" equivalent

While there is inner_text, and to_html, neither of them achieves what I'm looking for: combined HTML for everything inside the node.

<div>
<a href="#">Link</a>
<p>Read this</p>
</div>

I want to get <a href="#">Link</a><p>Read this</p> as the complete inner_html of the node. I couldn't find a straightforward way of doing this.
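
One way to approximate this with calls that already appear elsewhere on this page (css, children, to_html) is to serialize each direct child and join the results. This is a sketch, not a built-in inner_html; it assumes to_html on a child returns that child's full markup including its descendants.

```crystal
require "myhtml"

html = <<-HTML
  <div>
  <a href="#">Link</a>
  <p>Read this</p>
  </div>
  HTML

parser = Myhtml::Parser.new(html)

parser.css("div").each do |div|
  # Serialize every direct child (elements and text nodes) and concatenate.
  inner = div.children.map(&.to_html).join
  puts inner
end
```

Whitespace between the tags is parsed as text nodes, so the joined string may also contain the line breaks from the input.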

Parse Document Fragments?

Hello, first let me say thank you for this shard!

I am trying to convert my document back into html without adding html, head, body tags. I am parsing components and not full documents. Here is a nice example:

require "myhtml"

class HTMLTransformer
  property key_attribute : String
  property state_attribute : String

  def initialize(key_attribute = "key", state_attribute = "state")
    @key_attribute = key_attribute
    @state_attribute = state_attribute
  end

  def add_state_to_html(component, html)
    return if html.blank?

    key, state = ["1234", "5678"]

    transform_root(component, html) do |root|
      root[key_attribute] = key
      root[state_attribute] = state
    end
  end

  private def transform_root(component, html)
    fragment = Myhtml::Parser.new(html)

    root = fragment.root!

    yield root

    puts fragment.to_html
    fragment.to_html
  end
end

class Test
  def initialize
    @hi = "hi"
    @num = 1
  end
end

test = Test.new
transformer = HTMLTransformer.new
html = <<-HTML
<div id="t1" class="red">
  <a href="/#" data-motion="add">Link to site</a>
</div>
HTML
puts transformer.add_state_to_html(test, html) == <<-HTML
<div id="t1" class="red" key="1234" state="5678">
  <a href="/#" data-motion="add">Link to site</a>
</div>
HTML

# Outputs:
#   <html key="1234" state="5678"><head></head><body><div id="t1" class="red">
#     <a href="/#" data-motion="add">Link to site</a>
#   </div></body></html>
#   false

Is there any way to do this? If not, do you have a recommended side step?
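
A possible sidestep, sketched with calls that appear on this page: let the parser add its html, head and body wrapper, set the attributes on the first node inside body instead of on root!, and serialize only the children of body. The child method is taken from the linker output in another report here; that it returns nil for an empty body is an assumption.

```crystal
require "myhtml"

html = <<-HTML
  <div id="t1" class="red">
    <a href="/#" data-motion="add">Link to site</a>
  </div>
  HTML

fragment = Myhtml::Parser.new(html)
body = fragment.body!

# First node the parser placed inside <body>; here that is the component's <div>.
# Assumption: child returns nil when body has no children.
if root = body.child
  root["key"] = "1234"
  root["state"] = "5678"
end

# Serialize only what sits inside <body>, skipping the html/head/body wrapper.
puts body.children.map(&.to_html).join
```

The serialized markup may differ from the input in whitespace or attribute formatting, so a strict string comparison like the one in the example above could still return false.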

Unable to parse <template> tag

When I try to parse the template tag, it just returns nil.

html = Myhtml::Parser.new("<template>test</template>")
body = html.body!

puts body.children.inspect

#=> Myhtml::Iterator::Children(@start_node=Myhtml::Node(:body), @current_node=nil)

The strange thing is that I can make up random tags like <jeremy> and it parses those fine.

html = Myhtml::Parser.new("<jeremy>test</jeremy>")
body = html.body!

puts body.children.inspect
#=> Myhtml::Iterator::Children(@start_node=Myhtml::Node(:body), @current_node=Myhtml::Node(:last_entry))

Clang 12: error: cast to smaller integer type 'mycss_selectors_function_drop_type_t'

Whenever I try to include this as a dependency for a new project the postinstall fails. When I try to make it manually, it fails with the same error. This wasn't the case in previous versions.

Here's the full error:

(base) Fishbowl:myhtml shark$ make
cd src/ext && make package
git clone https://github.com/lexborisov/Modest.git ./modest-c
Cloning into './modest-c'...
remote: Enumerating objects: 4945, done.
remote: Counting objects: 100% (34/34), done.
remote: Compressing objects: 100% (28/28), done.
remote: Total 4945 (delta 11), reused 15 (delta 6), pack-reused 4911
Receiving objects: 100% (4945/4945), 6.44 MiB | 25.38 MiB/s, done.
Resolving deltas: 100% (3556/3556), done.
cd modest-c && git reset --hard 393338d994c921705ff71dfbd1d98ceb31328f14
HEAD is now at 393338d Update includes.
cd modest-c && make static MyHTML_BUILD_SHARED=OFF MyCORE_BUILD_WITHOUT_THREADS=YES PROJECT_OPTIMIZATION_LEVEL=-O3 -j
sed -e 's,@version\@,0.0.6,g' -e 's,@prefix\@,/usr/local,g' -e 's,@exec_prefix\@,/usr/local,g' -e 's,@libdir\@,lib,g' -e 's,@includedir\@,include,g' -e 's,@cflags\@,-I$\{includedir}/modest -I$\{includedir}/mycore -I$\{includedir}/mycss -I$\{includedir}/myencoding -I$\{includedir}/myfont -I$\{includedir}/myhtml -I$\{includedir}/myunicode -I$\{includedir}/myurl,g' -e 's,@libname\@,modest,g' -e 's,@description\@,fast HTML renderer library with no outside dependency,g' modest.pc.in >  modest.pc
mkdir -p bin lib test_suite

cc -Wall -Werror -pipe -pedantic -Isource -DMyCORE_BUILD_WITHOUT_THREADS -fPIC -O3 -Wno-unused-variable -Wno-unused-function -std=c99 -DMODEST_BUILD_OS=Darwin -DMODEST_PORT_NAME=posix -DMyCORE_OS_DARWIN   -c -o source/myhtml/./tokenizer_end.o source/myhtml/./tokenizer_end.c
source/mycss/selectors/serialization.c:183:69: error: cast to smaller integer type 'mycss_selectors_function_drop_type_t' (aka 'enum mycss_selectors_function_drop_type') from 'void *' [-Werror,-Wvoid-pointer-to-enum-cast]
                    mycss_selectors_function_drop_type_t drop_val = mycss_selector_value_drop(selector->value);
                                                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
source/mycss/selectors/value.h:28:41: note: expanded from macro 'mycss_selector_value_drop'
#define mycss_selector_value_drop(obj) ((mycss_selectors_function_drop_type_t)(obj))
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [source/mycss/selectors/serialization.o] Error 1
make[2]: *** Waiting for unfinished jobs....
source/mycss/selectors/function_parser.c:469:57: error: cast to smaller integer type 'mycss_selectors_function_drop_type_t' (aka 'enum mycss_selectors_function_drop_type') from 'void *' [-Werror,-Wvoid-pointer-to-enum-cast]
        mycss_selectors_function_drop_type_t drop_val = mycss_selector_value_drop(selector->value);
                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
source/mycss/selectors/value.h:28:41: note: expanded from macro 'mycss_selector_value_drop'
#define mycss_selector_value_drop(obj) ((mycss_selectors_function_drop_type_t)(obj))
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make[2]: *** [source/mycss/selectors/function_parser.o] Error 1
make[1]: *** [modest-c/lib/libmodest_static.a] Error 2
make: *** [src/ext/myhtml-c/lib/libmodest_static.a] Error 2

Error on building myhtml in docker

I am trying to build this in Docker and get the following error:

 > [6/6] RUN CRYSTAL_ENV=production crystal build --release src/worker.cr:
#10 154.0 _main.o: In function `initialize':
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:14: undefined reference to `myhtml_create'
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:15: undefined reference to `myhtml_init'
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:20: undefined reference to `myhtml_tree_create'
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:21: undefined reference to `myhtml_tree_init'
#10 154.0 _main.o: In function `parse':
#10 154.0 /app/lib/myhtml/src/myhtml/parser.cr:95: undefined reference to `myencoding_detect_and_cut_bom'
#10 154.0 /app/lib/myhtml/src/myhtml/parser.cr:116: undefined reference to `myhtml_parse'
#10 154.0 _main.o: In function `initialize':
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:24: undefined reference to `myhtml_destroy'
#10 154.0 _main.o: In function `free':
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:153: undefined reference to `myhtml_tree_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:154: undefined reference to `myhtml_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:153: undefined reference to `myhtml_tree_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:154: undefined reference to `myhtml_destroy'
#10 154.0 _main.o: In function `document!':
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:58: undefined reference to `myhtml_tree_get_document'
#10 154.0 _main.o: In function `initialize':
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:9: undefined reference to `mycss_create'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:11: undefined reference to `mycss_init'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:17: undefined reference to `mycss_entry_create'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:18: undefined reference to `mycss_entry_init'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:25: undefined reference to `modest_finder_create_simple'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:26: undefined reference to `mycss_entry_selectors'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:27: undefined reference to `mycss_selectors_parse'
#10 154.0 _main.o: In function `search_from':
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:36: undefined reference to `modest_finder_by_selectors_list'
#10 154.0 _main.o: In function `free':
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:43: undefined reference to `mycss_selectors_list_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:44: undefined reference to `modest_finder_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:45: undefined reference to `mycss_entry_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:46: undefined reference to `mycss_destroy'
#10 154.0 _main.o: In function `initialize':
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:13: undefined reference to `mycss_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:20: undefined reference to `mycss_entry_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:21: undefined reference to `mycss_destroy'
#10 154.0 _main.o: In function `free':
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:43: undefined reference to `mycss_selectors_list_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:44: undefined reference to `modest_finder_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:45: undefined reference to `mycss_entry_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:46: undefined reference to `mycss_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:43: undefined reference to `mycss_selectors_list_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:44: undefined reference to `modest_finder_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:45: undefined reference to `mycss_entry_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:46: undefined reference to `mycss_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:43: undefined reference to `mycss_selectors_list_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:44: undefined reference to `modest_finder_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:45: undefined reference to `mycss_entry_destroy'
#10 154.0 /app/lib/myhtml/src/myhtml/css_filter.cr:46: undefined reference to `mycss_destroy'
#10 154.0 _main.o: In function `free':
#10 154.0 /app/lib/myhtml/src/myhtml/iterator/collection.cr:44: undefined reference to `myhtml_collection_destroy'
#10 154.0 _main.o: In function `next':
#10 154.0 /app/lib/myhtml/src/myhtml/node/navigate.cr:2: undefined reference to `myhtml_node_next'
#10 154.0 _main.o: In function `parent':
#10 154.0 /app/lib/myhtml/src/myhtml/node/navigate.cr:2: undefined reference to `myhtml_node_parent'
#10 154.0 _main.o: In function `next':
#10 154.0 /app/lib/myhtml/src/myhtml/node/navigate.cr:2: undefined reference to `myhtml_node_next'
#10 154.0 _main.o: In function `child':
#10 154.0 /app/lib/myhtml/src/myhtml/node/navigate.cr:2: undefined reference to `myhtml_node_child'
#10 154.0 _main.o: In function `next':
#10 154.0 /app/lib/myhtml/src/myhtml/node/navigate.cr:2: undefined reference to `myhtml_node_next'
#10 154.0 _main.o: In function `tag_id':
#10 154.0 /app/lib/myhtml/src/myhtml/node.cr:24: undefined reference to `myhtml_node_tag_id'
#10 154.0 _main.o: In function `parent':
#10 154.0 /app/lib/myhtml/src/myhtml/node/navigate.cr:2: undefined reference to `myhtml_node_parent'
#10 154.0 _main.o: In function `next':
#10 154.0 /app/lib/myhtml/src/myhtml/node/navigate.cr:2: undefined reference to `myhtml_node_next'
#10 154.0 _main.o: In function `tag_text_slice':
#10 154.0 /app/lib/myhtml/src/myhtml/node.cr:65: undefined reference to `myhtml_node_text'
#10 154.0 _main.o: In function `tag_id':
#10 154.0 /app/lib/myhtml/src/myhtml/node.cr:24: undefined reference to `myhtml_node_tag_id'
#10 154.0 _main.o: In function `nodes':
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:78: undefined reference to `myhtml_get_nodes_by_tag_id'
#10 154.0 /app/lib/myhtml/src/myhtml/tree.cr:78: undefined reference to `myhtml_get_nodes_by_tag_id'
#10 154.0 _main.o: In function `attribute_by':
#10 154.0 /app/lib/myhtml/src/myhtml/node/attributes.cr:78: undefined reference to `myhtml_node_attribute_first'
#10 154.0 /app/lib/myhtml/src/myhtml/node/attributes.cr:81: undefined reference to `myhtml_attribute_next'
#10 154.0 _main.o: In function `attribute_name':
#10 154.0 /app/lib/myhtml/src/myhtml/node/attributes.cr:93: undefined reference to `myhtml_attribute_key'
#10 154.0 _main.o: In function `attribute_value':
#10 154.0 /app/lib/myhtml/src/myhtml/node/attributes.cr:99: undefined reference to `myhtml_attribute_value'
#10 154.0 collect2: error: ld returned 1 exit status
#10 154.0 Error: execution of command failed with code: 1: `cc "${@}" -o /app/worker  -rdynamic -L/usr/bin/../lib/crystal/lib -lxml2 -lz `command -v pkg-config > /dev/null && pkg-config --libs --silence-errors libssl || printf %s '-lssl -lcrypto'` `command -v pkg-config > /dev/null && pkg-config --libs --silence-errors libcrypto || printf %s '-lcrypto'` /app/lib/myhtml/src/myhtml/../ext/modest-c/lib/libmodest_static.a -lyaml -lpcre -lm -lgc -lpthread /usr/share/crystal/src/ext/libcrystal.a -levent -lrt -ldl`

When I run on my mac directly, it works fine.

This is my Dockerfile:

FROM crystallang/crystal:1.0.0

RUN mkdir /app
WORKDIR /app

COPY . /app

RUN shards install --production --ignore-crystal-version
RUN CRYSTAL_ENV=production crystal build --release src/worker.cr

Myhtml::Iterator::Collection#empty? returns a different value when called more than once

I found that Myhtml::Iterator::Collection#empty? doesn't return the same value when it is called more than once. I don't know why this happens, but I don't think it is expected behavior.

Here is a short example. Are there any mistakes in my code?

require "myhtml"

html = <<-HTML
<html>
<meta>
<head>
  <title>page title</title>
</head>
<body></body>
</html>
BODY
HTML

myhtml = Myhtml::Parser.new(html)
node = myhtml.css("title")
p node.size # => 1
p node.empty? # => true
p node.size # => 1
p node.empty? # => false
p node.size # => 1
p node.empty? # => false

btw, thank you for this library. myhtml is very helpful.
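
Until the cause is clear, one cautious workaround is to avoid empty? and test emptiness through size, which returned 1 on every call in the output above. A sketch that assumes size keeps behaving that way (the shortened HTML string is illustrative):

```crystal
require "myhtml"

html = "<html><head><title>page title</title></head><body></body></html>"
myhtml = Myhtml::Parser.new(html)
titles = myhtml.css("title")

# size was stable across repeated calls in the report, unlike empty?
if titles.size == 0
  puts "no <title> found"
else
  puts "found #{titles.size} <title> node(s)"
end
```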
