Comments (7)
@ppKrauss - first, welcome, and thanks for proposing an enhancement. Could you briefly summarize what change you'd like to see?
/cc @Yannael - our managing core datasets curator!
from language-codes.
Nice work!
At first sight, it does not seem straightforward to merge these data with the existing language-codes-full.csv. So probably the best would be to include the file language-code-extensions.csv in the package, and update the description.
A few more thoughts:
- Rename language-codes-full.csv to iso-639.csv
- Rename language-code-extensions to ietf-language-tags.csv
- Files language-codes-3b2.csv and language-codes.csv are redundant with language-codes-full.csv and could be removed.
What do you think?
@rgrp and @Yannael thanks (!), I would be proud to have this chance to discuss and collaborate.
As a user of locales (standards for i18n and l10n) and of metadata like xml:lang, I believe that root language codes alone are not enough. We need to combine them with country codes to obtain meaningful codes. Day to day I use pt-BR, and I notice an impact when I change it to pt-PT, or when a bare pt is treated as pt-PT, in both contexts of use (locale or metadata).
As a programmer I see that there are many possible language×country combinations, ~190 × ~250 ≈ 47500, so it makes sense to show at datasets/language-codes the list of ~700 valid combinations (~2%). The goal of the Datasets project is not very clear to me, but as a datasets user and enthusiast, I would rather list "the officially valid combinations" here than use http://i18ndata.appspot.com/cldr/main
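To make the comment above concrete, here is a minimal Python sketch, not code from the dataset itself. The helper name split_tag is hypothetical, and it only handles simple "language-REGION" tags such as pt-BR (real tags may carry extra subtags such as scripts). The counts are the rough figures quoted in the comment.

```python
def split_tag(tag):
    """Split a simple 'language-REGION' tag (e.g. 'pt-BR') into its parts.

    Hypothetical helper for illustration only: it ignores script,
    variant and extension subtags that full language tags may contain.
    """
    language, _, territory = tag.partition("-")
    # A bare language code such as 'pt' has no territory part.
    return language.lower(), (territory.upper() or None)


# Rough counts quoted in the comment (approximations, not exact figures).
languages, territories, valid = 190, 250, 700

possible = languages * territories  # every language x country pair
share = valid / possible            # fraction that is actually valid

print(split_tag("pt-BR"))
print(split_tag("pt"))
print(possible, f"{share:.1%}")
```

With these approximate inputs the valid share comes out close to 1.5%, which the comment rounds to about 2%.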
Synopsis: the objective of this (CSV) proposal is to generate a summary of unicode.org/Public/cldr core for datasets/language-codes.
PS: the name "language-tag-extensions" for this official list was somewhat arbitrary; I took it from the URL http://www.iana.org/assignments/language-tag-extensions-registry/language-tag-extensions-registry
@Yannael, sorry, you were fast :-)
I think your proposed way forward is perfect!
PS: about the "Preparation" section of this project: I am not a Python expert, but I can help translate the PHP to Python.
@ppKrauss Thanks :)
I agree with you, language regional codes should also be included in the core packages.
The dataset you provided seems very nice for this (i.e. using version 27 of http://www.unicode.org/Public/cldr/)
Since we already have the language-codes package, the best is to merge it with this package. Do you agree?
If you do, the best way to proceed is:
- Fork datasets/language-codes
- Rename language-code-extensions.csv to ietf-language-tags.csv, and add it to the fork
- Update the Readme section so that the information you wrote in your repository https://github.com/ppKrauss/language-tag-extensions/blob/master/README.md is also included there (see the guidelines at http://data.okfn.org/doc/publish-faq#readme)
- And then send back the link here
If you think we should do it differently, let me know.
Since the main goal of the datasets project is to ensure easy sharing of datasets, we have set out a few guidelines at http://data.okfn.org/doc/publish-faq. Can you have a look?
Looking forward to your feedback!
cc/ @rgrp
Thanks! I followed your recipe (!)... let's see if it works ;-)
About fields: I described them in datapackage.json; I have not harmonized the names, which could be done by renaming langType to iso639-1-alpha2 and territory to ISO3166-1-alpha2...
About "merge these data with the existing language-codes-full.csv": I can do it, but perhaps users prefer the normalized form. Then again, normalization does not help here: I do not know of a join mechanism for CSV (nor do I see one in tabular-data-model).
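On the join question, one possible approach is to do the join on the consumer side with Python's standard csv module. The sketch below is only an illustration under stated assumptions: the inline sample rows are made up, and the column names lang, alpha2 and English are assumed for the example (only langType and territory are named in this thread).

```python
import csv
import io

# Made-up sample rows; the real files may use different columns.
codes_csv = "alpha2,English\npt,Portuguese\nen,English\n"
tags_csv = "lang,langType,territory\npt-BR,pt,BR\npt-PT,pt,PT\nen-GB,en,GB\n"

# Index the language table by its two-letter code (assumed column 'alpha2').
names = {
    row["alpha2"]: row["English"]
    for row in csv.DictReader(io.StringIO(codes_csv))
}

# Left join: keep every tag row and attach the language name when known.
joined = []
for row in csv.DictReader(io.StringIO(tags_csv)):
    merged = dict(row)
    merged["language_name"] = names.get(row["langType"], "")
    joined.append(merged)

for row in joined:
    print(row["lang"], row["language_name"], row["territory"])
```

This keeps the two files normalized in the package while still letting a user produce the merged view when needed.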
@ppKrauss @rgrp
Great!
Merged.
Note: I made a few edits to add this additional resource to the title and data sources.
Regarding renaming langType and territory, I think it is not necessary since the ISO info is in the description.
Thanks!