Comments (8)

whatyouhide commented on May 17, 2024

I started and pretty much finished this (see https://github.com/elixir-lang/gettext/blob/duplicate-translations/lib/gettext/po/parser.ex#L66-L95) and it works pretty smoothly. I think there's one inconsistency though: when there are duplicate translations, a Gettext.PO.SyntaxError is raised in Gettext.PO.parse_string!/1 and Gettext.PO.parse_file!/1. This happens because the parser returns an {:error, line, reason} tuple when there are duplicate translations, just like it does for any other parsing/tokenizing error.

Personally, I'd really like to raise something like Gettext.PO.DuplicateTranslationError instead of a syntax error since the syntax is fine (one could argue with this, but I think this error is not what most people would call a "syntax error" even if it technically was).

I can think of two solutions: we make the parser return {:error, type, line, reason} instead of {:error, line, reason} (I'm not a fan of this one), or we check for duplicate translations outside of the parser. If we went with the second suggestion, I really have no idea where we should perform the check. Gettext.PO.parse_string/1 and Gettext.PO.parse_file/1 are still parse-related (as the names suggest), so I'm not sure we could check there, even though that's the natural place to do it (right after calling Gettext.PO.Parser.parse/1).

What do you think? /cc @josevalim

from gettext.

josevalim commented on May 17, 2024

I think SyntaxError is fine for now. We can discuss if it is an issue in the future.

Btw, there is an easier/faster way to implement the duplicates check. You can call Enum.reduce using a HashDict as the accumulator. For every entry, you check if there was something in the HashDict already. The downside is that we are not going to accumulate all the duplicates (which I think is fine), but on the upside it is going to be much faster in the common case (where everything works™).

whatyouhide commented on May 17, 2024

@josevalim mmm, but I still need to jump out of Enum.reduce early if I find a duplicate, right?

If I had something like this

Enum.reduce translations, HashDict.new, fn
  %Translation{id: id, po_source: {_, line}}, acc ->
    Dict.update(acc, id, [line], &[line|&1])
  # same thing for PluralTranslation
end

I could still keep track of all the duplicate translations, but to find them I would have to build the entire HashDict and then find duplicates in it, right? I'm sorry, I'm having a little trouble following you on this one :).

(btw, it would still be faster using the HashDict and Enum.reduce the way I used them before, but I don't think you meant to use them like that :))

josevalim commented on May 17, 2024

The code you wrote is literally an implementation of group_by. What I mean is that you would use Enum.reduce with a HashDict and, as soon as you find a duplicate, you would raise. The upside is that we avoid traversing the structures multiple times. The downside is that we raise only on the first duplicate (which is fine imo).
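A minimal sketch of the approach described above, not the code that ended up in the library. It assumes a plain map in place of HashDict (which has since been deprecated), a stand-in `DuplicateSketch.Translation` struct, and `ArgumentError` as a placeholder for whatever exception the library would actually raise. All module and function names here are hypothetical.

```elixir
defmodule DuplicateSketch.Translation do
  # Hypothetical stand-in for the library's translation struct.
  defstruct [:id, :po_source]
end

defmodule DuplicateSketch.Raising do
  alias DuplicateSketch.Translation

  # Walks the list once, remembering the line where each id was first seen,
  # and raises as soon as an id shows up a second time.
  def check_for_duplicates!(translations) do
    Enum.reduce(translations, %{}, fn %Translation{id: id, po_source: {_file, line}}, seen ->
      case Map.fetch(seen, id) do
        {:ok, first_line} ->
          # ArgumentError is only a placeholder exception for this sketch.
          raise ArgumentError,
                "duplicate translation on line #{line} (first seen on line #{first_line})"

        :error ->
          Map.put(seen, id, line)
      end
    end)

    :ok
  end
end
```

Only the first duplicate is reported, which matches the trade-off mentioned above: one traversal, no list of all duplicates.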

whatyouhide commented on May 17, 2024

@josevalim indeed that's group_by, you're right 😥

Ok, if you want to raise at this level I get what you're saying. I was trying to avoid raising in the parser, since the parser always returns {:error, line, reason} when there's an error and it's Gettext.PO.parse_string!/1 and Gettext.PO.parse_file!/1 that do the raising. If we go with your suggestion, we'll have to rescue inside the non-bang versions of these two functions, right?

whatyouhide commented on May 17, 2024

@josevalim what about this:

   defp check_for_duplicates(translations) do
     res = Enum.flat_map_reduce translations, HashDict.new, fn t, acc ->
       if line = Dict.get(acc, translation_id(t)) do
         {:halt, [line, elem(t.po_source, 1)]}
       else
         {[t], Dict.put_new(acc, translation_id(t), elem(t.po_source, 1))}
       end
     end

     case res do
       {_, [_l1, l2]} -> {:error, l2, "found duplicates of this translation"}
       {_, _}        -> :ok
     end
   end

where translation_id/1 just extracts the id for %Translation{}s and {id, id_plural} for %PluralTranslation{}s.

This way we still stop at the first duplicate translation we find, but we return {:error, line, reason} instead of raising (while still traversing the structure only once).
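For comparison, here is a rough sketch of the same single-pass check written with `Enum.reduce_while/3`, which is available in current Elixir. It assumes translations are maps (or structs) with `:id`, an optional `:id_plural`, and a `:po_source` of the form `{file, line}`. The module name and this version of `translation_id/1` are hypothetical, and a plain map replaces HashDict.

```elixir
defmodule ReduceWhileSketch do
  # Returns :ok, or {:error, line, reason} for the first duplicate found.
  def check_for_duplicates(translations) do
    result =
      Enum.reduce_while(translations, %{}, fn %{po_source: {_file, line}} = t, seen ->
        key = translation_id(t)

        if Map.has_key?(seen, key) do
          # Stop the traversal and hand back the error tuple as the result.
          {:halt, {:error, line, "found duplicates of this translation"}}
        else
          {:cont, Map.put(seen, key, line)}
        end
      end)

    case result do
      {:error, _line, _reason} = error -> error
      _seen -> :ok
    end
  end

  # Hypothetical helper: plural translations are keyed by {id, id_plural},
  # singular ones by id alone.
  defp translation_id(%{id: id, id_plural: id_plural}), do: {id, id_plural}
  defp translation_id(%{id: id}), do: id
end
```

Because the accumulator is either the map of seen keys or an error tuple, the final `case` can tell the two outcomes apart without a second pass over the data.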

Btw, raising as you suggested would also let us raise a DuplicateTranslationError instead of a SyntaxError, so that's another advantage. At the same time, I guess people don't expect parse_string/1 and parse_file/1 to raise, given that they have bang variants. Wdyt?

josevalim commented on May 17, 2024

Ah, yes, I am sorry. If you want to exit early from a reduce, in this case, it is fine to use throw/catch.
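A minimal sketch of what the throw/catch variant could look like, assuming translations are maps or structs with `:id` and `:po_source` (`{file, line}`). The module name and the `{:duplicate, line}` thrown value are made up for illustration, and a plain map is used in place of HashDict.

```elixir
defmodule ThrowSketch do
  # Returns :ok, or {:error, line, reason} for the first duplicate found.
  def check_for_duplicates(translations) do
    try do
      Enum.reduce(translations, %{}, fn %{id: id, po_source: {_file, line}}, seen ->
        if Map.has_key?(seen, id) do
          # Leave the reduce immediately; the value is caught below.
          throw({:duplicate, line})
        else
          Map.put(seen, id, line)
        end
      end)

      :ok
    catch
      {:duplicate, line} -> {:error, line, "found duplicates of this translation"}
    end
  end
end
```

The throw never escapes the function, so the non-bang callers keep seeing the usual {:error, line, reason} tuple and nothing needs to be rescued.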

whatyouhide commented on May 17, 2024

Closed by #25!
