
Comments (7)

rufuspollock commented on July 26, 2024

@ppKrauss - first, welcome, and thanks for proposing an enhancement. Could you briefly summarize what change you'd like to see?

/cc @Yannael - our managing core datasets curator!

Yannael commented on July 26, 2024

Nice work!

At first sight, it does not seem straightforward to merge these data with the existing language-codes-full.csv. So the best option would probably be to include the file language-code-extensions.csv in the package and update the description.

A few more thoughts:

  • Rename language-codes-full.csv to iso-639.csv
  • Rename language-code-extensions.csv to ietf-language-tags.csv
  • Files language-codes-3b2.csv and language-codes.csv are redundant with language-codes-full.csv and could be removed.
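One way to confirm the redundancy claim in the last point is to check that every row of a smaller file matches the corresponding columns of some row in language-codes-full.csv. A minimal Python sketch, assuming column layouts that this thread does not state (alpha3-b, alpha3-t, alpha2, English, French for the full file; alpha3-b, alpha2, English for language-codes-3b2.csv). The inline rows are hypothetical excerpts and `is_projection` is a hypothetical helper:

```python
import csv
import io

# Hypothetical excerpt; column layout of language-codes-full.csv is assumed.
FULL_CSV = """alpha3-b,alpha3-t,alpha2,English,French
por,por,pt,Portuguese,portugais
ger,deu,de,German,allemand
"""

# Hypothetical excerpt; column layout of language-codes-3b2.csv is assumed.
SUBSET_CSV = """alpha3-b,alpha2,English
por,pt,Portuguese
ger,de,German
"""


def is_projection(subset_text, full_text):
    """True if every subset row equals the same columns of some full-table row."""
    sub_reader = csv.DictReader(io.StringIO(subset_text))
    full_reader = csv.DictReader(io.StringIO(full_text))
    columns = sub_reader.fieldnames or []
    # A subset column missing from the full table means it is not redundant.
    if not set(columns) <= set(full_reader.fieldnames or []):
        return False
    # Project the full table onto the subset's columns.
    projected = {tuple(row[c] for c in columns) for row in full_reader}
    return all(tuple(row[c] for c in columns) in projected for row in sub_reader)


print(is_projection(SUBSET_CSV, FULL_CSV))  # True
```

If the check holds on the real files, the smaller files add no information and can be regenerated from the full one when needed.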

What do you think?

@ppKrauss @ewheeler @rgrp

ppKrauss commented on July 26, 2024

@rgrp and @Yannael, thanks (!). I would be glad of the chance to discuss and collaborate.

As a user of locales (standards for i18n and l10n) and of metadata such as xml:lang, I believe that root language codes alone are not enough: we need to combine them with country codes to obtain meaningful codes. Day to day I use pt-BR, and I notice the impact when it is changed to pt-PT, or when plain pt is interpreted as pt-PT, in both contexts of use (locale and metadata).
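To make the language × country distinction concrete, here is a minimal Python sketch that splits a simple tag such as pt-BR into its language and region parts. It is an illustration only: `split_tag` and `LangRegion` are hypothetical names, and the function handles just `language[-REGION]` with a two-letter region, ignoring the script, variant and extension subtags that full BCP 47 tags may carry.

```python
from typing import NamedTuple, Optional


class LangRegion(NamedTuple):
    language: str
    region: Optional[str]


def split_tag(tag: str) -> LangRegion:
    """Split a simple language[-REGION] tag; '-' or '_' is accepted as separator.

    Simplified: a second subtag is treated as a region only if it is two
    letters, so tags like 'zh-Hant-TW' (with a script subtag) get region None.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    region = None
    if len(parts) > 1 and len(parts[1]) == 2 and parts[1].isalpha():
        region = parts[1].upper()
    return LangRegion(language, region)


print(split_tag("pt-BR"))  # LangRegion(language='pt', region='BR')
print(split_tag("pt"))     # LangRegion(language='pt', region=None)
```

A bare `pt` yields no region at all, which is exactly the ambiguity described above: which country variant applies has to be decided elsewhere.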

As a programmer, I see that there are many possible language×country combinations,
~190 × ~250 ≈ 47,500, so it makes sense to show the list of ~700 valid combinations (~1.5%) at datasets/language-codes. The goal of the Datasets Project is not entirely clear to me, but as a datasets user and enthusiast, I would rather list "the officially valid combinations" here than use http://i18ndata.appspot.com/cldr/main
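The arithmetic behind this estimate, plus the kind of lookup that a published list of valid pairs makes possible, as a minimal Python sketch. The counts are the approximate figures from the comment; `VALID` is a hypothetical sample, not the actual list, and `is_valid` is a hypothetical helper:

```python
# Approximate figures quoted in the comment.
n_languages = 190
n_territories = 250
n_valid = 700

possible = n_languages * n_territories
print(possible)                     # 47500
print(f"{n_valid / possible:.1%}")  # 1.5%

# Hypothetical sample of officially valid (language, territory) pairs.
VALID = {("pt", "BR"), ("pt", "PT"), ("en", "GB"), ("en", "US")}


def is_valid(language, territory):
    """Membership test against the list of valid pairs."""
    return (language, territory) in VALID


print(is_valid("pt", "BR"))  # True
print(is_valid("pt", "US"))  # False
```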

Synopsis: the objective of this (CSV) proposal is to generate a summary of the CLDR core data (unicode.org/Public/cldr) for datasets/language-codes.
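The comment does not say which CLDR files the summary was built from. As one plausible sketch, assuming the pairs come from the `territoryInfo` section of CLDR's supplementalData.xml (where each `territory` element lists `languagePopulation` children), the following Python extracts language-territory rows and writes them as CSV. The XML excerpt is hypothetical and trimmed; the output column names langType and territory follow this thread, and officialStatus is an added illustrative column:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed excerpt shaped like CLDR's territoryInfo section.
SAMPLE = """<supplementalData>
  <territoryInfo>
    <territory type="BR">
      <languagePopulation type="pt" officialStatus="official"/>
    </territory>
    <territory type="PT">
      <languagePopulation type="pt" officialStatus="official"/>
    </territory>
  </territoryInfo>
</supplementalData>"""

FIELDS = ["langType", "territory", "officialStatus"]


def language_territory_rows(xml_text):
    """Collect one row per (language, territory) pair found in territoryInfo."""
    root = ET.fromstring(xml_text)
    rows = []
    for territory in root.iterfind("territoryInfo/territory"):
        for lp in territory.findall("languagePopulation"):
            rows.append({
                "langType": lp.get("type"),
                "territory": territory.get("type"),
                # Not every entry has an official status; default to empty.
                "officialStatus": lp.get("officialStatus", ""),
            })
    return rows


def to_csv(rows):
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(to_csv(language_territory_rows(SAMPLE)))
```

For the real file, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.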

PS: the name "language-tag-extensions" for this official list was somewhat arbitrary; I took it from the URL http://www.iana.org/assignments/language-tag-extensions-registry/language-tag-extensions-registry

ppKrauss commented on July 26, 2024

@Yannael, sorry, you were fast :-)

I think your way forward is perfect!

PS: about the "Preparation" section of this project: I am not a Python expert, but I can help translate PHP to Python.

Yannael commented on July 26, 2024

@ppKrauss Thanks :)

I agree with you: regional language codes should also be included in the core packages.

The dataset you provided seems well suited for this (it uses version 27 of http://www.unicode.org/Public/cldr/).

Since we already have the language-codes package, the best option is to merge your data into it. Do you agree?

If you do, the best way to proceed is:

If you think we should do it differently, let me know.

Since the main goal of the datasets project is to make datasets easy to share, we have set out a few guidelines at http://data.okfn.org/doc/publish-faq. Could you have a look?

Looking forward to your feedback!

/cc @rgrp

ppKrauss commented on July 26, 2024

Thanks! I followed your recipe (!)... let's see if it works ;-)

About fields: I described them in datapackage.json. I have not harmonized the names; perhaps langType could be renamed to iso639-1-alpha2 and territory to ISO3166-1-alpha2...
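For reference, fields in datapackage.json are described using the Data Package / Table Schema layout: a `resources` list whose entries carry a `schema` with a `fields` list. A minimal Python sketch that builds such a descriptor and checks it against a CSV header; only the field names langType and territory come from this thread, while the resource path, descriptions and the helper `header_matches_schema` are hypothetical:

```python
import csv
import io
import json

# Hypothetical resource descriptor; path and descriptions are illustrative.
resource = {
    "name": "language-code-extensions",
    "path": "data/language-code-extensions.csv",
    "format": "csv",
    "schema": {
        "fields": [
            {"name": "langType", "type": "string",
             "description": "Language code (ISO 639-1 alpha-2)"},
            {"name": "territory", "type": "string",
             "description": "Territory code (ISO 3166-1 alpha-2)"},
        ]
    },
}

package = {"name": "language-codes", "resources": [resource]}


def header_matches_schema(csv_text, resource):
    """Check that the CSV header lists exactly the schema's field names, in order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return header == [f["name"] for f in resource["schema"]["fields"]]


print(json.dumps(package, indent=2))
print(header_matches_schema("langType,territory\npt,BR\n", resource))  # True
```

Keeping the short column names and putting the ISO references in `description` avoids renaming while still documenting what each field holds.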

About "merge these data with the existing language-codes-full.csv": I can do it, but perhaps users prefer the normalized form. Then again, normalization does not help much here, since I do not know of a join mechanism for CSV (nor do I see one in the tabular-data-model).
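Even without a join mechanism in the CSV format itself, a consumer can keep the files normalized and join them at read time. A minimal Python sketch, assuming langType in language-code-extensions.csv holds the same ISO 639-1 codes as the alpha2 column of language-codes-full.csv (an assumption, as are both column layouts); the inline rows are hypothetical excerpts and `join_on_alpha2` is a hypothetical helper:

```python
import csv
import io

# Hypothetical excerpts; column names are assumptions based on the thread.
EXTENSIONS = """langType,territory
pt,BR
pt,PT
de,AT
"""

FULL = """alpha3-b,alpha3-t,alpha2,English,French
por,por,pt,Portuguese,portugais
ger,deu,de,German,allemand
"""


def join_on_alpha2(ext_text, full_text):
    """Inner join: attach the English name from the full table to each tag row."""
    # Index the full table by alpha2, skipping languages without an alpha-2 code.
    full_by_alpha2 = {
        row["alpha2"]: row
        for row in csv.DictReader(io.StringIO(full_text))
        if row["alpha2"]
    }
    joined = []
    for row in csv.DictReader(io.StringIO(ext_text)):
        match = full_by_alpha2.get(row["langType"])
        if match is not None:  # rows with no matching language are dropped
            joined.append({**row, "English": match["English"]})
    return joined


for r in join_on_alpha2(EXTENSIONS, FULL):
    print(r["langType"], r["territory"], r["English"])
```

With pandas, `ext.merge(full, left_on="langType", right_on="alpha2")` performs the same inner join in one call.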

Yannael commented on July 26, 2024

@ppKrauss @rgrp
Great!
Merged.
Note: I made a few edits to add this additional resource to the title and data sources.
Regarding renaming langType and territory, I think it is not necessary, since the ISO information is in the description.

Thanks!