prism's Introduction

Prism

Ruby microformat parser and HTML toolkit

RDoc | Gem | Metrics | Microformats.org

What Prism is:

  • A robust microformat parser
  • A command-line tool for parsing microformats from a URL or a string of markup
  • A DSL for defining semantic markup patterns
  • Export microformats to other standards:
    • hCard => vCard

It is your lowercase semantic web friend.

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns (e.g. XHTML, blogging).

Learn more about Microformats at http://microformats.org.

Usage

The command-line tool takes a SOURCE from standard input or as an argument:

$: curl http://markwunsch.com | prism --hcard > ~/Desktop/me.vcf

OR

$: prism --hcard http://markwunsch.com > ~/Desktop/me.vcf
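The two invocations above differ only in where SOURCE comes from. A minimal sketch of that selection logic follows, assuming a hypothetical `read_source` helper; it is an illustration, not the code in Prism's own `bin/prism`:

```ruby
require 'stringio'

# Hypothetical helper: return the SOURCE given as the first non-option
# argument, or fall back to reading everything from the input stream.
def read_source(argv, input = $stdin)
  source = argv.find { |arg| !arg.start_with?('-') }
  source || input.read
end

# Equivalent of `prism --hcard http://markwunsch.com`
read_source(['--hcard', 'http://markwunsch.com'])

# Equivalent of `curl ... | prism --hcard` (StringIO stands in for the pipe)
read_source(['--hcard'], StringIO.new('<div class="vcard">...</div>'))
```

Passing the input stream as a parameter keeps the helper easy to exercise without a real pipe.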

Installation

With Ruby and Rubygems:

gem install prism

Or clone the repository and run bundle install to get the development dependencies.

Requirements:

Microformats supported (right now, as of this very moment)

More on the way.

Finding Microformats:

# All microformats
Prism.find 'http://foobar.com'

# A specific microformat
Prism.find 'http://foobar.com', :hcard

# Search HTML too
Prism.find big_string_of_html

Parsing Microformats:

twitter_contacts = Prism.find 'http://twitter.com/markwunsch', :hcard
me = twitter_contacts.first
me.fn
#=> "Mark Wunsch"
me.n.family_name
#=> "Wunsch"
me.url
#=> ["http://markwunsch.com/"]
File.open('mark.vcf','w') {|f| f.write me.to_vcard }
## Add me to your address book!	
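To give a feel for what an hCard => vCard export involves, here is a rough sketch that builds a vCard 3.0 string from plain data. It is not Prism's `to_vcard`: the `card` hash, the `given_name` key, and the standalone `to_vcard` function are assumptions for illustration, and the sketch skips the escaping of commas, semicolons, and backslashes that real values need.

```ruby
# Hypothetical stand-in for a parsed hCard: plain data, not a Prism object.
card = {
  fn:  'Mark Wunsch',
  n:   { family_name: 'Wunsch', given_name: 'Mark' },
  url: ['http://markwunsch.com/']
}

# Build a vCard 3.0 string from a hash shaped like the one above.
def to_vcard(card)
  name = card[:n]
  lines = [
    'BEGIN:VCARD',
    'VERSION:3.0',
    # N is Family;Given;Additional;Prefixes;Suffixes
    "N:#{name[:family_name]};#{name[:given_name]};;;",
    "FN:#{card[:fn]}"
  ]
  card[:url].each { |url| lines << "URL:#{url}" }
  lines << 'END:VCARD'
  # vCard lines are terminated by CRLF
  lines.join("\r\n") + "\r\n"
end

puts to_vcard(card)
```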

POSH DSL

The Prism module defines a group of methods to search, validate, and extract nodes out of a Nokogiri document.

All microformats inherit from Prism::POSH, because all microformats begin as POSH formats. If you wanted to create your own POSH format, you'd do something like this:

class Navigation < Prism::POSH
	search {|document| document.css('ul#navigation') }
	# Search a Nokogiri document for nodes of a certain type
	
	validate {|node| node.matches?('ul#navigation') }
	# Validate that a node is the right element we want
	
	has_many :items do
		search {|doc| doc.css('li') }
	end
	# has_many and has_one define properties, which themselves inherit from
	# Prism::POSH::Base, so you can do :has_one, :has_many, :search, :extract, etc.
end

Now you can do:

nav = Navigation.parse_first(document) 
# document is a Nokogiri document. 
# parse_first extracts just the first example of the format out of the document

nav.items
# Returns an array of contents
# This method comes from the has_many call up above that defines the :items property
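For readers curious how a class-level DSL like this can work, the following toy sketch stores the `search` and `validate` blocks on the class and builds each `has_many` property as an anonymous subclass. `TinyPosh` is a made-up name and is not how Prism::POSH is implemented; nested hashes stand in for Nokogiri nodes so the example runs with no dependencies.

```ruby
# A toy base class, NOT Prism::POSH: it only illustrates how class-level
# `search`, `validate`, and `has_many` blocks could be stored and used.
class TinyPosh
  class << self
    # Store the block when one is given; return the stored block otherwise.
    def search(&block)
      @search = block if block
      @search
    end

    def validate(&block)
      @validate = block if block
      @validate
    end

    def properties
      @properties ||= {}
    end

    # Define a property backed by an anonymous subclass of TinyPosh,
    # configured by the block, plus a reader method of the same name.
    def has_many(name, &block)
      property = Class.new(TinyPosh, &block)
      properties[name] = property
      define_method(name) { property.search.call(@node) }
    end

    # Wrap the first node found by `search` that passes `validate`.
    def parse_first(document)
      node = search.call(document).find { |n| validate.nil? || validate.call(n) }
      node && new(node)
    end
  end

  def initialize(node)
    @node = node
  end
end

class Navigation < TinyPosh
  search   { |document| document[:children].select { |n| n[:id] == 'navigation' } }
  validate { |node| node[:tag] == 'ul' }

  has_many :items do
    search { |node| node[:children].select { |n| n[:tag] == 'li' }.map { |n| n[:text] } }
  end
end

document = {
  tag: 'body',
  children: [
    { tag: 'ul', id: 'navigation', children: [
      { tag: 'li', text: 'Home' },
      { tag: 'li', text: 'About' }
    ] }
  ]
}

nav = Navigation.parse_first(document)
nav.items
# => ["Home", "About"]
```

Because each property is an anonymous subclass, it can reuse the same `search` and `validate` methods as the parent format.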

Other Microformat parsers

  • Mofo is a Ruby microformat parser backed by Hpricot.
  • Sumo is a JavaScript microformat parser.
  • Operator is a Firefox extension.
  • hKit is a microformat parser for PHP.
  • Oomph is a microformat toolkit add-in for Internet Explorer.

Feature wishlist:

  • HTML outliner (using HTML5 sectioning)
  • HTML5 article, time, etc POSH support
  • Extensions so you can do something like: String.is_a_valid? :hcard in your tests
  • Extensions to turn Ruby objects into semantic HTML. Hash.to_definition_list, Array.to_ordered_list, etc.

TODO:

  • Code is ugly. Especially XOXO.
  • Better recursive parsing of trees. See above.
  • Tests are all kinds of disorganized. And slow.
  • Broader support for some of the weirder Patterns, like object[data]
  • Man pages (see Ron)

License

Prism is licensed under the MIT License and is Copyright (c) 2010 Mark Wunsch.

prism's People

Contributors

mwunsch, sentientmonkey


prism's Issues

Prism is broken with the latest nokogiri release

/home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/microformat/hcard.rb:60:in `[]': can't convert String into Integer (TypeError)
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/microformat/hcard.rb:60
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism.rb:102:in `call'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism.rb:102:in `extract_from'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism.rb:109:in `parse'
        from /usr/local/lib/site_ruby/1.8/rubygems/custom_require.rb:31:in `collect'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:402:in `call'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:402:in `each'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/nokogiri-1.4.2/lib/nokogiri/xml/node.rb:401:in `each'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism.rb:109:in `collect'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism.rb:109:in `parse'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:144:in `get_properties'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:140:in `each_pair'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:140:in `get_properties'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:175:in `to_h'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:195:in `empty?'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:148:in `get_properties'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:147:in `reject'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:147:in `get_properties'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:175:in `to_h'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:165:in `[]'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/posh/base.rb:92:in `fn'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/lib/prism/microformat/hcard.rb:99:in `to_vcard'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/bin/prism:95
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/bin/prism:80:in `parse_microformats'
        from /home/test_prism/vendor/gems/ruby/1.8/gems/prism-0.1.0/bin/prism:95
        from bin/prism:3:in `load'
        from bin/prism:3

When we instruct Gem bundle to use nokogiri 1.4.1, everything works fine ;) It might be better to declare which version of the dependency you need.
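One way to declare such a constraint is in the gemspec. The fragment below is a hypothetical sketch, not Prism's real gemspec: the `<= 1.4.1` bound is an assumption drawn only from this report (1.4.1 works, 1.4.2 breaks), and the metadata values are filled in from the README for illustration.

```ruby
require 'rubygems'

# Hypothetical gemspec fragment; the nokogiri constraint is an assumption
# based on this issue report, not a statement of what Prism ships.
spec = Gem::Specification.new do |s|
  s.name    = 'prism'
  s.version = '0.1.0'
  s.summary = 'Ruby microformat parser and HTML toolkit'
  s.authors = ['Mark Wunsch']
  s.add_dependency 'nokogiri', '<= 1.4.1'
end

requirement = spec.dependencies.first.requirement
requirement.satisfied_by?(Gem::Version.new('1.4.1')) # => true
requirement.satisfied_by?(Gem::Version.new('1.4.2')) # => false
```

With a bound like this in place, dependency resolution would refuse the breaking release instead of failing at runtime.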

Prism::Microformat::HCard returns anonymous class objects which can't be converted to YAML

From an irb session :-

vcard = Prism.find(open('<some_linkedin_profile>'), :hcard)
vcard.second   # Returns Prism::Microformat::HCard object -> {:title=>"Title_string", :org=>{:organization_name=>"The Organization Name"}}
current = vcard.second[:org] # This object's type is an anonymous class
current.to_yaml
TypeError: can't dump anonymous class Class
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:6:in `to_yaml'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:41:in `node_export'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:41:in `add'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:41:in `to_yaml'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:40:in `each'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:40:in `to_yaml'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:39:in `map'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:39:in `to_yaml'
        from /usr/lib64/ruby/1.8/yaml.rb:391:in `call'
        from /usr/lib64/ruby/1.8/yaml.rb:391:in `emit'
        from /usr/lib64/ruby/1.8/yaml.rb:391:in `quick_emit'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:38:in `to_yaml'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:18:in `node_export'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:18:in `add'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:18:in `to_yaml'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:17:in `each'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:17:in `to_yaml'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:16:in `map'
        from /usr/lib64/ruby/1.8/yaml/rubytypes.rb:16:in `to_yaml'
        from /usr/lib64/ruby/1.8/yaml.rb:391:in `call'
        from /usr/lib64/ruby/1.8/yaml.rb:391:in `emit'
        from /usr/lib64/ruby/1.8/yaml.rb:391:in `quick_emit'

We are working around this issue by doing an inspect on current and then serializing it. We wanted to know whether this issue is something that can be fixed at your end.
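An alternative to serializing the `inspect` output might be to reduce the object to plain hashes and strings before dumping it. The sketch below assumes the nested objects respond to `to_h` (the first stack trace on this page suggests Prism::POSH::Base defines one, but that is not verified here). `org_class` and `plain` are hypothetical names used only for illustration.

```ruby
require 'yaml'

# Hypothetical stand-in for the nested property object from the report:
# an instance of an anonymous class that can describe itself via to_h.
org_class = Class.new do
  def initialize(properties)
    @properties = properties
  end

  def to_h
    @properties
  end
end

# Recursively reduce a value to hashes, arrays, and scalars so that YAML
# never sees a class or an instance of an anonymous class.
def plain(value)
  case value
  when Hash  then value.to_h { |k, v| [k.to_s, plain(v)] }
  when Array then value.map { |v| plain(v) }
  when String, Numeric, true, false, nil then value
  else value.respond_to?(:to_h) ? plain(value.to_h) : value.to_s
  end
end

card = {
  title: 'Title_string',
  org:   org_class.new({ organization_name: 'The Organization Name' })
}

puts plain(card).to_yaml
```

Keys are converted to strings so the result loads back cleanly even with YAML loaders that reject symbols by default.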
