
publicsuffix-elixir's Introduction

PublicSuffix

PublicSuffix is an Elixir library to operate on domain names using the public suffix rules provided by https://publicsuffix.org/:

A "public suffix" is one under which Internet users can (or historically could) directly register names. Some examples of public suffixes are .com, .co.uk and pvt.k12.ma.us. The Public Suffix List is a list of all known public suffixes.

This Elixir library provides a means to get the public suffix and the registrable domain from any domain:

iex(1)> PublicSuffix.registrable_domain("mysite.foo.bar.com")
"bar.com"
iex(2)> PublicSuffix.registrable_domain("mysite.foo.bar.co.uk")
"bar.co.uk"
iex(3)> PublicSuffix.public_suffix("mysite.foo.bar.com")
"com"
iex(4)> PublicSuffix.public_suffix("mysite.foo.bar.co.uk")
"co.uk"

The publicsuffix.org data file contains both official ICANN records and private records:

ICANN domains are those delegated by ICANN or part of the IANA root zone database. The authorized registry may express further policies on how they operate the TLD, such as subdivisions within it. Updates to this section can be submitted by anyone, but if they are not an authorized representative of the registry then they will need to back up their claims of error with documentation from the registry's website.

PRIVATE domains are amendments submitted by the domain holder, as an expression of how they operate their domain security policy. Updates to this section are only accepted from authorized representatives of the domain registrant. This is so we can be certain they know what they are getting into.

By default, PublicSuffix considers private domain records, but you can tell it to ignore them:

iex(1)> PublicSuffix.registrable_domain("foo.github.io")
"foo.github.io"
iex(2)> PublicSuffix.public_suffix("foo.github.io")
"github.io"
iex(3)> PublicSuffix.registrable_domain("foo.github.io", ignore_private: true)
"github.io"
iex(4)> PublicSuffix.public_suffix("foo.github.io", ignore_private: true)
"io"

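The effect of ignore_private can be illustrated with a small, self-contained sketch. It is not the library's implementation: the module name SuffixSketch and its two tiny rule lists are made up for illustration, and wildcard and exception rules are left out entirely. It only shows the idea that the longest matching rule wins and that the registrable domain is the public suffix plus one more label.

```elixir
defmodule SuffixSketch do
  # Hypothetical, hand-picked rules; the real data file has thousands of entries.
  @icann_rules ["com", "io", "uk", "co.uk"]
  @private_rules ["github.io"]

  def public_suffix(domain, opts \\ []) do
    rules =
      if Keyword.get(opts, :ignore_private, false) do
        @icann_rules
      else
        @icann_rules ++ @private_rules
      end

    labels = String.split(domain, ".")

    # Candidates run from the longest trailing run of labels to the shortest,
    # so the first hit is the longest matching rule. When nothing matches,
    # fall back to the last label (mirroring the "*" fallback rule).
    labels
    |> tails()
    |> Enum.map(&Enum.join(&1, "."))
    |> Enum.find(List.last(labels), &(&1 in rules))
  end

  def registrable_domain(domain, opts \\ []) do
    suffix_size = domain |> public_suffix(opts) |> String.split(".") |> length()
    labels = String.split(domain, ".")

    # In this sketch, a domain that is itself a public suffix yields nil.
    if length(labels) > suffix_size do
      labels |> Enum.take(-(suffix_size + 1)) |> Enum.join(".")
    else
      nil
    end
  end

  # All non-empty tails of a list, longest first.
  defp tails([]), do: []
  defp tails([_ | rest] = labels), do: [labels | tails(rest)]
end
```

With these sample rules, SuffixSketch.public_suffix("foo.github.io") picks the private rule github.io, while passing ignore_private: true leaves only the ICANN rule io to match.
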
Working with Rules

You can also gain access to the prevailing rule for a particular domain:

iex(1)> PublicSuffix.prevailing_rule("mysite.foo.bar.com")
"com"
iex(2)> PublicSuffix.prevailing_rule("mysite.example")
"*"

The value returned in the last example ("*") is the fallback rule, used when no explicit rule in the rules file matches. If you just want to know whether a domain matches an explicit rule, we provide a predicate for that:

iex(1)> PublicSuffix.matches_explicit_rule?("mysite.foo.bar.com")
true
iex(2)> PublicSuffix.matches_explicit_rule?("mysite.example")
false

Installation

The package can be installed as:

  1. Add public_suffix to your list of dependencies in mix.exs:

    def deps do
      [{:public_suffix, "~> 0.5.0"}]
    end

  2. If using Elixir < 1.4, then ensure public_suffix is started before your application:

    def application do
      [applications: [:public_suffix]]
    end

Configuration

PublicSuffix is bundled with a cached copy of the public suffix rules from publicsuffix.org, but can be configured to download the rules file on compilation by adding the following line to your project's config.exs:

config :public_suffix, download_data_on_compile: true

There are pros and cons to both approaches; which you choose will depend on the needs of your project:

  • Setting download_data_on_compile to true ensures that the rules are up to date as of your last compilation, but could introduce instability. We have tried to implement the logic in this library according to the publicsuffix.org spec, but a future rule change might not be handled properly by the existing logic and could surface as a new bug.
  • Setting download_data_on_compile to false (or not setting it at all) ensures stable, consistent behavior. In the context of your project, you may want compilation to be deterministic. Compilation is also a bit faster when a new copy of the rules is not downloaded.
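
One way to keep both behaviors available is to drive the setting from an environment variable. This is only a sketch: the variable name PUBLIC_SUFFIX_DOWNLOAD is made up for illustration, and import Config assumes Elixir 1.9 or later (older projects would use Mix.Config instead).

```elixir
# config/config.exs
import Config

# PUBLIC_SUFFIX_DOWNLOAD is a hypothetical variable name chosen for this sketch.
# Leaving it unset keeps the bundled, cached copy of the rules.
config :public_suffix,
  download_data_on_compile: System.get_env("PUBLIC_SUFFIX_DOWNLOAD") == "true"
```

Because the option is read when the dependency is compiled, public_suffix has to be recompiled for a change to take effect.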

Updating the suffix list

Run mix public_suffix.sync_files at a command prompt.

Known Issues

The Public Suffix specification specifically allows wildcards to appear multiple times in a rule and at any position:

Wildcards are not restricted to appear only in the leftmost position, but they must wildcard an entire label. (I.e. *.*.foo is a valid rule: *bar.foo is not.)

However, while supporting a single leading wildcard is easy, supporting multiple wildcards and wildcards at any position is far more difficult. Furthermore, all wildcard rules in the publicsuffix.org data file use a wildcard only at the leftmost position. There is also an ongoing discussion of this issue:

publicsuffix/list#145

According to that issue, most public suffix implementations, including Mozilla's and Chromium's, only support wildcards at the leftmost position. We do not yet support multiple wildcards or wildcards in other positions either, but may in the future, depending on the direction of the GitHub issue.
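
To make the limitation concrete, here is a minimal sketch of leftmost-only wildcard matching. The module WildcardSketch and its function names are hypothetical and are not taken from this library's source; the sketch compares labels from right to left and honors "*" only when it is the leftmost label of the rule.

```elixir
defmodule WildcardSketch do
  # Returns true when `rule` (for example "*.ck") matches the trailing labels
  # of `domain`. Only a wildcard in the leftmost position is supported.
  def matches?(rule, domain) do
    rule_labels = rule |> String.split(".") |> Enum.reverse()
    domain_labels = domain |> String.split(".") |> Enum.reverse()
    match_labels(rule_labels, domain_labels)
  end

  # Every rule label was consumed: the rule matches.
  defp match_labels([], _domain), do: true
  # The domain ran out of labels before the rule did.
  defp match_labels(_rule, []), do: false
  # A wildcard is honored only as the last remaining (leftmost) rule label.
  defp match_labels(["*"], [_any | _]), do: true
  # Identical labels: keep comparing the labels further to the left.
  defp match_labels([label | rule], [label | domain]), do: match_labels(rule, domain)
  defp match_labels(_rule, _domain), do: false
end
```

Under this approach a rule such as *.*.foo never matches, because its inner wildcard is compared literally against the domain label. That is the gap described above.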

publicsuffix-elixir's People

Contributors

andersju, bkirz, myronmarston, rich

publicsuffix-elixir's Issues

Suffix file not bundled with Hex package?

Created a new project (mix new foo), added the necessary things to mix.exs:

...
def application do
  [applications: [:logger, :public_suffix]]
end

defp deps do
  [{:public_suffix, "~> 0.2.0"}]
end
...

Then:

$ mix deps.get
Running dependency resolution
Dependency resolution completed
  idna: 2.0.0
  public_suffix: 0.2.0
* Getting public_suffix (Hex package)
Checking package (https://hexpmrepo.global.ssl.fastly.net/tarballs/public_suffix-0.2.0.tar)
Using locally cached package
* Getting idna (Hex package)
Checking package (https://hexpmrepo.global.ssl.fastly.net/tarballs/idna-2.0.0.tar)
Fetched package
$ mix compile
==> idna (compile)
Compiled src/idna_ucs.erl
Compiled src/idna_unicode.erl
Compiled src/punycode.erl
Compiled src/idna.erl
Compiled src/idna_unicode_data.erl
==> public_suffix
Compiled lib/public_suffix/remote_file_fetcher.ex
Compiled lib/public_suffix/rules_parser.ex

== Compilation error on file lib/public_suffix.ex ==
** (File.Error) could not read file /home/user/elixir/foo/deps/public_suffix/data/public_suffix_list.dat: no such file or directory
    (elixir) lib/file.ex:244: File.read!/1
    lib/public_suffix.ex:122: (module)
    (stdlib) erl_eval.erl:670: :erl_eval.do_apply/6

could not compile dependency :public_suffix, "mix compile" failed. You can recompile this dependency with "mix deps.compile public_suffix", update it with "mix deps.update public_suffix" or clean it with "mix deps.clean public_suffix"

Works fine with download_data_on_compile: true, though.

(Elixir 1.2.5, Hex 0.12.0)

Handling suffixes not in the public list

I think it would make sense for the library to handle domains that don't have a valid public suffix (possibly as an option).

Right now (with 0.2.0):

> PublicSuffix.registrable_domain("foobar.thistlddoesntexist")
"foobar.thistlddoesntexist"

Suggested:

> PublicSuffix.registrable_domain("foobar.thistlddoesntexist")
:undefined

Or something like that. This is how publicsuffix-erl does it, e.g.:

> :tld.domain("foobar.thistlddoesntexist")
:undefined
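
Until something like this exists in the library, a caller could approximate the suggested behavior with a thin wrapper. This is a sketch only: StrictSuffix is a hypothetical module, and it assumes a PublicSuffix version that provides the matches_explicit_rule?/1 predicate shown in the README above. The second argument exists only so the wrapper can be exercised without the dependency.

```elixir
defmodule StrictSuffix do
  # `suffix_mod` defaults to PublicSuffix; any module exposing
  # matches_explicit_rule?/1 and registrable_domain/1 can be substituted.
  def registrable_domain(domain, suffix_mod \\ PublicSuffix) do
    if suffix_mod.matches_explicit_rule?(domain) do
      suffix_mod.registrable_domain(domain)
    else
      # No explicit rule matched, so only the "*" fallback would apply.
      :undefined
    end
  end
end
```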

Are you open to transferring ownership of this project?

Hi there! 👋

This library hasn't seen active maintenance in a while and I'm guessing that you're no longer actively using this project.

I have a semi-active fork of this that I just did some basic maintenance on: https://github.com/axelson/publicsuffix-elixir

Are you open to transferring ownership of this git repo and hex package over to me? I'd like to help maintain it (or help maintain with a few other maintainers as well).

I have fond memories of using this library, and it would be nice if it could stick around in its original form rather than as a hard fork 😄

Codepoint 42 not allowed

I am trying to get this running, using this repo and also a fork https://github.com/axelson/publicsuffix-elixir and I am getting the error below.

I am stumped, presumably a dependency issue?

PublicSuffix: fetched fresh data file for compilation.

== Compilation error in file lib/public_suffix.ex ==
** (exit) {:bad_label, {:alabel, 'The label "*"  is not a valid A-label: ulabel error={bad_label,\n                                                     {context,\n                                                      "Codepoint 42 not allowed (\'DISALLOWED\') at position 0 in \\"*\\""}}'}}
    //deps/idna/src/idna.erl:281: :idna.alabel/1
    //deps/idna/src/idna.erl:149: :idna.encode_1/2
    lib/public_suffix/rules_parser.ex:62: PublicSuffix.RulesParser.punycode_domain/1
    lib/public_suffix/rules_parser.ex:35: anonymous fn/1 in PublicSuffix.RulesParser.parse_rules_section/2
    (elixir 1.10.2) lib/stream.ex:466: anonymous fn/3 in Stream.flat_map/2
    (elixir 1.10.2) lib/stream.ex:902: Stream.do_transform_user/6
    (elixir 1.10.2) lib/enum.ex:3383: Enum.split_with/2
    lib/public_suffix/rules_parser.ex:38: PublicSuffix.RulesParser.parse_rules_section/2
could not compile dependency :public_suffix, "mix compile" failed. You can recompile this dependency with "mix deps.compile public_suffix", update it with "mix deps.update public_suffix" or clean it with "mix deps.clean public_suffix"
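
The trace shows the wildcard label "*" being handed to :idna, which rejects it. One possible workaround, sketched here and not confirmed as the fix adopted by either repository, is to encode only the non-wildcard labels of a rule. WildcardSafeEncoding and the encode_label argument are hypothetical names; encode_label stands in for whatever IDNA encoder the project uses.

```elixir
defmodule WildcardSafeEncoding do
  # Encodes each label of `rule` with `encode_label`, but passes wildcard
  # labels through untouched so they never reach the IDNA encoder.
  def encode_rule(rule, encode_label) when is_function(encode_label, 1) do
    rule
    |> String.split(".")
    |> Enum.map(fn
      "*" -> "*"
      label -> encode_label.(label)
    end)
    |> Enum.join(".")
  end
end
```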

Easier way to check if a domain includes a valid TLD

My requirement is: given an arbitrary string, check whether it contains any valid domains and highlight/linkify them. I already have a basic regex that splits the string into potential domains.

I would like there to be a function in PublicSuffix that could more readily answer this. Right now I am using:

  @doc """
  Check if the given host contains a valid TLD that does not comprise the entire
  host (since we don't want to create a link to a TLD)
  """
  def valid_tld?(%URI{host: nil}), do: false
  def valid_tld?(%URI{host: host}) do
    rule = PublicSuffix.prevailing_rule(host)
    rule != "*" && rule != host
  end

Which I guess seems simple enough but it might be nice to have some sort of helper.

Warning when compiling

warning: Enum.partition/2 is deprecated. Use Enum.split_with/2 instead
Found at 2 locations:
  lib/public_suffix/rules_parser.ex:38: PublicSuffix.RulesParser.parse_rules_section/2
  lib/public_suffix/rules_parser.ex:41: PublicSuffix.RulesParser.parse_rules_section/2

warning: String.lstrip/2 is deprecated. Use String.trim_leading/2 with a binary as second argument instead
  lib/public_suffix/rules_parser.ex:45: PublicSuffix.RulesParser.parse_rules_section/2

warning: String.rstrip/1 is deprecated. Use String.trim_trailing/1 instead
  lib/public_suffix/rules_parser.ex:34: PublicSuffix.RulesParser.parse_rules_section/2

% elixir --version
Erlang/OTP 23 [erts-11.1.8] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1]

Elixir 1.10.3 (compiled with Erlang/OTP 22)
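
Each of the three warnings above names its replacement. As a small sketch on made-up sample strings (not the library's parser code), the replacement calls look like this:

```elixir
# Sample lines chosen for illustration only.
lines = ["// a comment", "com", "co.uk"]

# Enum.partition/2 -> Enum.split_with/2
{comments, rules} = Enum.split_with(lines, &String.starts_with?(&1, "//"))

# String.lstrip/2 -> String.trim_leading/2 with a binary as second argument
without_bang = String.trim_leading("!city.kobe.jp", "!")

# String.rstrip/1 -> String.trim_trailing/1
without_trailing_space = String.trim_trailing("com  \n")

IO.inspect({comments, rules, without_bang, without_trailing_space})
```

Note that String.trim_leading/2 takes a binary ("!") as its second argument, which is the change the second warning points out.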
