Opening this issue so that we don't forget this problem and we can discuss it here. Qu

Duplicate msgids about gettext HOT 8 CLOSED

elixir-gettext commented on May 17, 2024

Duplicate msgids

from gettext.

Comments (8)

whatyouhide commented on May 17, 2024

I started and pretty much finished this (here https://github.com/elixir-lang/gettext/blob/duplicate-translations/lib/gettext/po/parser.ex#L66-L95) and it works pretty smoothly. I think there's one inconsistency though: when there are duplicate translations, a Gettext.PO.SyntaxError is raised in Gettext.PO.parse_string!/1 and Gettext.PO.parse_file!/1. This happens because the parser returns an {:error, line, reason} tuple when there are duplicate translations, just like it does for any other parsing or tokenizing error.

Personally, I'd really like to raise something like Gettext.PO.DuplicateTranslationError instead of a syntax error, since the syntax is fine (one could argue with this, but I don't think this error is what most people would call a "syntax error", even if it technically is one).

I can think of two solutions. The first is to make the parser return {:error, type, line, reason} instead of {:error, line, reason} (I'm not a fan of this one). The second is to check for duplicate translations outside of the parser, but then I really have no idea where we should perform that check: Gettext.PO.parse_string/1 and Gettext.PO.parse_file/1 are still parse-related (as their names suggest), so I'm not sure the check belongs there, even though right after the call to Gettext.PO.Parser.parse/1 seems like the natural place for it.

What do you think? /cc @josevalim
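
For reference, the first option (a tagged error tuple) could be sketched as below. This is only an illustration of the idea, not code from the gettext repository: the module name TaggedErrorSketch, the :syntax and :duplicate tags, the POSyntaxError name, and the unwrap!/1 helper are all hypothetical.

```elixir
defmodule TaggedErrorSketch do
  # Hypothetical exception types; the real library may define them differently.
  defmodule POSyntaxError do
    defexception [:line, :reason]

    @impl true
    def message(%{line: line, reason: reason}), do: "#{line}: #{reason}"
  end

  defmodule DuplicateTranslationError do
    defexception [:line, :reason]

    @impl true
    def message(%{line: line, reason: reason}), do: "#{line}: #{reason}"
  end

  # Stand-in for what a bang function would do with the parser result:
  # the extra tag in the tuple selects which exception gets raised.
  def unwrap!({:ok, translations}), do: translations

  def unwrap!({:error, :syntax, line, reason}) do
    raise POSyntaxError, line: line, reason: reason
  end

  def unwrap!({:error, :duplicate, line, reason}) do
    raise DuplicateTranslationError, line: line, reason: reason
  end
end
```

The cost of this design is that every caller matching on {:error, line, reason} has to change, which is one reason to prefer checking outside the parser.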

josevalim commented on May 17, 2024

I think SyntaxError is fine for now. We can discuss if it is an issue in the future.

Btw, there is an easier/faster way to implement the duplicates check. You can call Enum.reduce using a HashDict as the accumulator. For every entry, you check whether its id is already in the HashDict. The downside is that we are not going to collect all the duplicate translations (which I think is fine), but on the upside it is going to be much faster in the common case (where everything works™).

whatyouhide commented on May 17, 2024

@josevalim mmm, but I still need to jump out of Enum.reduce early if I find a duplicate, right?

If I had something like this

Enum.reduce translations, HashDict.new, fn
  %Translation{id: id, po_source: {_, line}}, acc ->
    Dict.update(acc, id, [line], &[line|&1])
  # same thing for PluralTranslation
end

I could still keep track of all the duplicate translations, but to find them I would have to build the entire HashDict and then look for duplicates in it, right? Sorry, I'm having a little trouble following you on this one :).

(btw, it would still be faster using the HashDict and Enum.reduce the way I used them before, but I don't think you meant to use them like that :))

josevalim commented on May 17, 2024

The code you wrote is literally the implementation of group_by. What I mean is that you would use Enum.reduce with a HashDict and, as soon as you find a duplicate, you would raise. The upside is that we avoid traversing the structures multiple times. The downside is that we raise only on the first duplicate (which is fine imo).
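
A minimal sketch of that suggestion follows. It assumes a plain map in place of HashDict (HashDict and Dict are deprecated in current Elixir), and the module RaiseOnDuplicateSketch, its simplified Translation struct, the exception, and check!/1 are hypothetical stand-ins rather than the library's code.

```elixir
defmodule RaiseOnDuplicateSketch do
  # Simplified stand-in for the real translation struct.
  defmodule Translation do
    defstruct [:id, :po_source]
  end

  # Hypothetical exception type used only for this sketch.
  defmodule DuplicateTranslationError do
    defexception [:message]
  end

  # Walks the list once, remembering the line where each id was first seen,
  # and raises as soon as an id shows up a second time.
  def check!(translations) do
    Enum.reduce(translations, %{}, fn %Translation{id: id, po_source: {_file, line}}, seen ->
      case seen do
        %{^id => first_line} ->
          raise DuplicateTranslationError,
            message: "line #{line}: duplicate of the translation on line #{first_line}"

        %{} ->
          Map.put(seen, id, line)
      end
    end)

    :ok
  end
end
```

Only the first duplicate is reported, which matches the trade-off described above: a single traversal in exchange for less complete error output.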

whatyouhide commented on May 17, 2024

@josevalim indeed that's group_by, you're right 😥

Ok, if you want to raise at this level I get what you're saying. I was trying to avoid raising in the parser, since the parser always returns {:error, line, reason} when there's an error and it's Gettext.PO.parse_string!/1 and Gettext.PO.parse_file!/1 that do the raising. If we go with your suggestion, we'll have to rescue inside the non-bang versions of these two functions, right?

whatyouhide commented on May 17, 2024

@josevalim what about this:

   # Returns :ok, or {:error, line, reason} for the first duplicate found.
   defp check_for_duplicates(translations) do
     res = Enum.flat_map_reduce translations, HashDict.new, fn t, acc ->
       if line = Dict.get(acc, translation_id(t)) do
         # Halt with the lines of the first occurrence and of the duplicate.
         {:halt, [line, elem(t.po_source, 1)]}
       else
         {[t], Dict.put_new(acc, translation_id(t), elem(t.po_source, 1))}
       end
     end

     case res do
       {_, [_first_line, line]} -> {:error, line, "found duplicates of this translation"}
       {_, _}                   -> :ok
     end
   end

where translation_id/1 just extracts id for %Translation{}s and {id, id_plural} for %PluralTranslation{}s.

This way we still stop at the first duplicate translation we find, but we return {:error, line, reason} instead of raising (while still traversing the structure only once).

Btw, raising as you suggested would also let us raise a DuplicateTranslationError instead of a SyntaxError, so that's another advantage. At the same time, I guess people don't expect parse_string/1 and parse_file/1 to raise when they have a bang variant. Wdyt?

josevalim commented on May 17, 2024

Ah, yes, I am sorry. If you want to exit early from a reduce, in this case, it is fine to use throw/catch.
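
The throw/catch approach could be sketched roughly as follows. As before, this assumes a plain map instead of HashDict, and ThrowCatchSketch with its simplified Translation struct is a hypothetical stand-in; only the {:error, line, reason} shape and the error message come from the thread.

```elixir
defmodule ThrowCatchSketch do
  # Simplified stand-in for the real translation struct.
  defmodule Translation do
    defstruct [:id, :po_source]
  end

  # Returns :ok, or {:error, line, reason} for the first duplicate found.
  # The throw leaves Enum.reduce early; the catch clause turns it into
  # the same error tuple the parser uses for other failures.
  def check_for_duplicates(translations) do
    Enum.reduce(translations, %{}, fn %Translation{id: id, po_source: {_file, line}}, seen ->
      if Map.has_key?(seen, id) do
        throw({:duplicate, line})
      else
        Map.put(seen, id, line)
      end
    end)

    :ok
  catch
    {:duplicate, line} -> {:error, line, "found duplicates of this translation"}
  end
end
```

Because the throw is caught inside the same function, callers never see it: the non-bang functions keep returning tuples and only the bang variants raise.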

whatyouhide commented on May 17, 2024

Closed by #25!
