World Atlas of Language Structures Analysis

Author: Cassandra Rudig
Date: 3/18/2022

The purpose of this visualization is to examine the world's language families and see what trends, if any, appear in the typological data for each family. In other words, the goal is to explore the connection between linguistic typology and genetic relationships among languages.

The data comes from the World Atlas of Language Structures (WALS), a large online database that describes the properties of languages in terms of "features", each with a discrete set of possible values. For example, feature "13A: Tone" compares the types of tone systems across languages and has three possible values: "No tones", "Simple tone system", and "Complex tone system". Detailed information on the meaning of each feature and its values can be found in the chapters at https://wals.info/chapter.

The data was not collected from the WALS website directly, but from Kaggle: https://www.kaggle.com/averkij/wals-dataset.

This was created mainly for people with some interest in linguistics, especially comparative linguistics. Most users should be able to discover interesting patterns in the data, but users who are willing to read the corresponding WALS chapters will get a deeper sense of the implications of a given visualization.

In my opinion, the most successful part of this visualization is the "characteristic features" bar chart, which determines and displays the feature values that are most characteristic of a given language family. It takes into account (1) how common the value is within the family, compared to the other possible values of the feature, and (2) how rare the value is for that feature cross-linguistically, so that rarer values are considered more characteristic of the family. The success of this approach can be seen in the pie chart, which shows, for a given feature and value, which families the languages with that value belong to. For example, the interface shows that Niger-Congo languages account for over half of all languages with WALS data that mark plurality on nouns with a prefix.
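The scoring formula behind the bar chart is not spelled out above, so the following is only a minimal sketch of one way to combine the two criteria. It assumes, hypothetically, that each feature's data is a list of (language, family, value) rows, and scores a value as its share within the family divided by its share across all languages: a value that is common in the family but rare elsewhere scores highest. The function names and the toy rows are illustrative, not taken from the project or from WALS.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input layout: one (language, family, value) row per language
# for a single WALS feature.
Row = Tuple[str, str, str]


def value_shares(items: List[str]) -> Dict[str, float]:
    """Fraction of entries equal to each distinct item."""
    counts = Counter(items)
    total = len(items)
    return {item: count / total for item, count in counts.items()}


def characteristic_scores(rows: List[Row], family: str) -> Dict[str, float]:
    """Score each value of one feature for how characteristic it is of `family`.

    Assumed formula (not necessarily the project's): share of the value
    within the family divided by its share across all languages.
    """
    global_share = value_shares([value for _, _, value in rows])
    family_values = [value for _, fam, value in rows if fam == family]
    if not family_values:
        return {}
    family_share = value_shares(family_values)
    return {v: family_share[v] / global_share[v] for v in family_share}


def family_breakdown(rows: List[Row], value: str) -> Dict[str, float]:
    """Pie-chart data: the share of languages with `value` in each family."""
    families = [fam for _, fam, v in rows if v == value]
    return value_shares(families) if families else {}


if __name__ == "__main__":
    # Made-up toy data with hypothetical families and values.
    toy_rows = [
        ("L1", "FamA", "prefix"),
        ("L2", "FamA", "prefix"),
        ("L3", "FamA", "suffix"),
        ("L4", "FamB", "suffix"),
        ("L5", "FamB", "suffix"),
        ("L6", "FamB", "prefix"),
    ]
    print(characteristic_scores(toy_rows, "FamA"))
    print(family_breakdown(toy_rows, "prefix"))
```

On the toy rows, "prefix" scores about 1.33 for FamA and "suffix" about 0.67, because half of all languages use each value but two thirds of FamA uses a prefix. A plain ratio can overweight values attested in only a handful of languages, so a minimum count or smoothing might be worth adding.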

The "likeness" chart was an attempt to see whether one language family could be characterized, on a scatterplot, in terms of its similarity to two other language families. This appears to work to some extent, but language scores vary widely, even on the likeness scale of a language's own family. I would have preferred a metric that placed languages in tight clusters with other members of the same family. I have a few ideas for changing the calculation that might produce this effect, and I would try implementing them if I continued working on this project.
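The likeness calculation is not specified above either, so here is one possible sketch under stated assumptions: a language's likeness to a family is taken to be the average, over the features the language has data for, of the fraction of that family's other members that share its value. The data layout, the feature IDs, and the function names below are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data layout: language -> {feature ID -> value}.
Profiles = Dict[str, Dict[str, str]]


def likeness(lang: str, family_members: List[str], profiles: Profiles) -> Optional[float]:
    """Assumed metric: mean over `lang`'s features of the fraction of family
    members (excluding `lang` itself) that share its value for that feature.

    Returns None when no member has data for any of `lang`'s features.
    """
    own = profiles[lang]
    others = [m for m in family_members if m != lang]
    per_feature = []
    for feature, value in own.items():
        comparable = [profiles[m][feature] for m in others if feature in profiles[m]]
        if comparable:
            matches = sum(v == value for v in comparable)
            per_feature.append(matches / len(comparable))
    if not per_feature:
        return None
    return sum(per_feature) / len(per_feature)


def likeness_points(
    langs: List[str], fam_x: List[str], fam_y: List[str], profiles: Profiles
) -> List[Tuple[str, float, float]]:
    """(language, x, y) scatterplot points: likeness to fam_x and to fam_y."""
    points = []
    for lang in langs:
        x = likeness(lang, fam_x, profiles)
        y = likeness(lang, fam_y, profiles)
        if x is not None and y is not None:
            points.append((lang, x, y))
    return points
```

Excluding a language from its own family's member list keeps it from inflating its own score. Plotting each point's `x` against `y` then gives the two-family scatterplot described above.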

I would also have liked to add more interactivity through clicking on data points. I managed this for the world map, but it would have been interesting to do the same for the likeness chart (to show which language a given point corresponds to) and for the bar chart (to show which languages within the chosen family exemplify the selected value).

Citation: Dryer, Matthew S. & Haspelmath, Martin (eds.) 2013.
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
Available online at http://wals.info, Accessed on 2022-03-10.
