amacinho / name-gender-guesser
Guesses the gender of the names.
License: GNU General Public License v3.0
Copying:

Name Gender Guesser is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Name Gender Guesser is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with Name Gender Guesser. If not, see <http://www.gnu.org/licenses/>.

Introduction:

Name Gender Guesser helps you find out the gender of a given name. You can either use the two provided datasets (or your own, if you have one), which consist of common American names with their frequencies in the male and female populations, or you can use the Yahoo! BOSS API to guess the gender of an unknown name by carrying out some pattern-based searches.

Quick Start:

Check out the code and run example.py.

Less Quick Start:

This project contains two datasets of gender associations for common American names and two scripts: one to handle these datasets, and another to carry out a web-based search to guess the gender of unknown names.

The first dataset, us_census, comes from the US Census Bureau and is constructed as follows: the names are fetched from the Bureau's web site (http://www.census.gov/genealogy/www/data/1990surnames/names_files.html) and put in two files, us_census_males and us_census_females, which contain the frequency of names for the sample male and female populations respectively (according to the 1990 census).

The second dataset, popular_baby_names, comes from the US Social Security Administration's statistics for popular baby names for every year between 1960 and 2010. The dataset is constructed as follows:

1) Fetch the 100 most popular female and male names for every year between 1960 and 2010 from http://www.ssa.gov/cgi-bin/popularnames.cgi
2) For each male and female name, calculate the average probability of usage between 1960 and 2010. Missing years are not used in averaging. This means that if a name was in the top-100 list for only one year in the given period, its final score is its probability for that year.

The class NameGender (contained in name_gender.py) handles these datasets. If you have your own dataset, you can also use it. The format is trivial (really, check the files yourself).

The class WebNameGender does not use any dataset to guess the gender of a name. It simply carries out several web searches via the Yahoo! BOSS API and calculates a gender score according to the hit counts. It provides a fallback mechanism if a given name is not contained in the datasets. It also works fairly well for common names in languages other than English (a proper evaluation is yet to be done). You will need a BOSS Application ID to use this class.

Two example patterns that WebNameGender uses for a given name X are:

* "X himself", "X herself"
* "husband of X", "wife of X"

In the first case, "X himself" provides evidence that X is a he. In the second case, "husband of X" provides evidence that X is a she. By comparing several pattern pairs like these, WebNameGender computes a gender score for X.
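The averaging rule used for popular_baby_names (step 2 above) can be illustrated with a small sketch. This is not the project's own code: the function name average_probabilities, the input layout (a dict mapping each year to that year's name/probability pairs), the lowercasing of names, and the sample numbers are all assumptions made for illustration. It only shows what "missing years are not used in averaging" means in practice.

```python
from collections import defaultdict


def average_probabilities(yearly):
    """Average each name's probability over the years in which it appears.

    `yearly` maps a year to a {name: probability} dict holding that year's
    top-100 list (hypothetical layout). Years in which a name is absent
    are skipped rather than counted as zero.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for names in yearly.values():
        for name, prob in names.items():
            key = name.lower()  # assumption: names are stored lowercased
            totals[key] += prob
            counts[key] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Made-up numbers, for illustration only.
sample = {
    1960: {"Mary": 0.02, "Fawn": 0.001},
    1961: {"Mary": 0.04},
}
averages = average_probabilities(sample)
```

In this sample, "mary" appears in both years and gets the mean of its two values, while "fawn" appears only in 1960 and keeps its 1960 value unchanged, as the README describes for names that made the list in a single year.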
Hi,
I've been trying to recreate the popular baby name files following the procedure outlined in the README file. For this, I first fetched the most popular 100 female and male names for every year between 1960 and 2010 from http://www.ssa.gov/cgi-bin/popularnames.cgi using the following command:
for year in $(seq 1960 2010); do echo $year; wget --quiet --no-check-certificate -O "${year}.html" --post-data="year=${year}&top=100&number=p" https://www.ssa.gov/cgi-bin/popularnames.cgi; done
However, that's where I already run into some questions.
Looking at your file popular_1960_2010_females, the first entry is for the name "fawn". However, I cannot find any mention of that name in the files downloaded above:
grep -i fawn *.html
This command finds no matches.
Could you please elaborate on your method, or point out what I should have done differently?