DaHack - Hackathons Dataset!

Hackathon's Dataset in Hackathon

This dataset was built as part of TAMU Datathon 2021. It contains details of hackathons, including Title, Location, Start Date, End Date, Prize Money, Number of Participants, Host of Hackathon, and Themes in the Hackathon.

Among the TAMU Datathon 2021 challenges, we chose the TD Data Synthesis Challenge. We approached it by synthesizing a dataset of hackathons held around the world.

Construction of Dataset

The data for this dataset was obtained by web scraping with Python and Selenium.

Size

  • Due to time constraints, we generated a dataset of 6,236 hackathons. With more scraping time, this size can be increased considerably.

Uniqueness

  • To our knowledge, this is the first hackathon dataset with 8 features covering the major data points of a hackathon.

Usefulness

  • To observe trends:
    • Number of hackathons per year.
    • Prize money trend over the years.
    • Rise of themes through the years.
    • Participation in hackathons across years/themes.
    • And many more ...
  • To build ML models:
    • to predict prize money based on other features such as participation, themes, and location.
    • to estimate user participation based on location, prize money, themes, etc.
    • to analyze relationships between the above features.
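
As a rough illustration of the trend analysis listed above, the sketch below counts hackathons per year and averages prize money per year using only the standard library. The column names (`start_date`, `prize_money`), the ISO date format, and the sample rows are assumptions made for this example; the actual CSV may use different headers and formats.

```python
import csv
import io
from collections import defaultdict


def yearly_trends(csv_file):
    """Return {year: (hackathon_count, mean_prize)} from an open CSV file."""
    counts = defaultdict(int)
    prize_totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        # Assumed column names and an assumed ISO date format (YYYY-MM-DD).
        year = int(row["start_date"][:4])
        counts[year] += 1
        prize_totals[year] += float(row["prize_money"] or 0)
    return {y: (counts[y], prize_totals[y] / counts[y]) for y in sorted(counts)}


# Made-up sample rows, only to show the expected shape of the input.
sample = io.StringIO(
    "title,start_date,prize_money\n"
    "Hack A,2020-01-10,1000\n"
    "Hack B,2020-06-01,3000\n"
    "Hack C,2021-03-15,500\n"
)
trends = yearly_trends(sample)
for year, (count, mean_prize) in trends.items():
    print(year, count, mean_prize)
```

With the real file, `yearly_trends(open(path, newline=""))` would produce the same structure, which can then be plotted or fed into a model.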

Visualization

Visualization Image


Team's Approach

  • To build a dataset, we needed a rich and authentic source; we chose Devpost.
  • Because the data had to come from a website, we used web scraping tools to collect the raw data.
  • We used Selenium to scrape the website.
  • We used a Chrome WebDriver to automate scrolling and scraping.
  • After scraping, the raw data was processed and cleaned.
  • The clean data is stored as a dataset in a CSV file.
  • The dataset captures unique data points such as Number of Participants, Start Date, End Date, Location (online/offline location), Prize Money, and Themes.
  • After obtaining a decent dataset, we used Google Studio to provide insights into the dataset and the relations between its variables.
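
The cleaning and CSV steps above might look roughly like the following sketch. It is not the team's actual code: the raw field names (`prize`, `participants`, `themes`), the sample record, and the `|` separator for themes are all assumptions chosen for illustration.

```python
import csv
import io
import re


def parse_prize(text):
    """Turn a raw prize string such as '$10,000 in prizes' into an int (0 if no number)."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else 0


def clean_row(raw):
    """Normalize one scraped record (a dict of raw strings with assumed keys)."""
    return {
        "title": raw["title"].strip(),
        "location": raw["location"].strip(),
        "prize_money": parse_prize(raw["prize"]),
        "participants": int(raw["participants"].replace(",", "") or 0),
        # Join themes with '|' so commas inside the CSV stay unambiguous.
        "themes": "|".join(t.strip() for t in raw["themes"].split(",") if t.strip()),
    }


def write_csv(rows, out_file):
    """Write cleaned rows to an open text file with a header line."""
    fieldnames = ["title", "location", "prize_money", "participants", "themes"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Made-up raw record standing in for scraper output.
raw_records = [
    {"title": "  Example Hack ", "location": "Online", "prize": "$10,000 in prizes",
     "participants": "1,234", "themes": "Machine Learning/AI, Education"},
]
cleaned = [clean_row(r) for r in raw_records]
buffer = io.StringIO()
write_csv(cleaned, buffer)
print(buffer.getvalue())
```

Keeping parsing in small functions like `parse_prize` makes it easier to adjust when the source site changes how it formats a field.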

Reflection

  • Problems

    • The website's content is rendered dynamically rather than served as static content.
    • Dynamic content is harder to scrape.
    • The website paginates on scroll: initially x items are loaded, scrolling loads another x items, and so on.
    • Unless we scroll, we can't scrape the entire dataset.
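
One common way to handle the scroll-based pagination described above is to keep scrolling until the page height stops growing. The sketch below assumes a Selenium WebDriver object and relies only on its `execute_script` method; the function name, pause length, and round limit are illustrative choices, not the team's actual settings.

```python
import time


def scroll_to_end(driver, pause=1.0, max_rounds=200):
    """Scroll until the page height stops growing; return the number of scrolls done."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the next batch of items time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        rounds += 1
        if new_height == last_height:
            break  # no new items were appended, so we reached the end
        last_height = new_height
    return rounds


# Hypothetical usage with a real browser (requires selenium and a Chrome driver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get(listing_url)   # listing_url is a placeholder
#   scroll_to_end(driver)
#   html = driver.page_source
```

The `max_rounds` cap is a safeguard so that a page that never stops loading cannot keep the loop running indefinitely. A fixed pause is the simplest option; an explicit wait on the item count would be more robust on slow connections.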
  • Future Scope

    • The dataset can be used to build a machine learning model (to help pick a theme, set prize money, or estimate participation, and to analyze hackathon trends).
    • Scraping can be automated with job schedulers (CI/CD) to collect data periodically.
    • More data can be scraped from various sources.
    • The scraping process can be further optimized by using multithreading.
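
The multithreading idea above could be sketched with the standard library's `ThreadPoolExecutor`. Here `fetch_details` is a hypothetical per-page function supplied by the caller, and `fake_fetch` is a stand-in so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch_details, max_workers=8):
    """Run fetch_details(url) for every URL on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_details, urls))


def fake_fetch(url):
    # Stand-in for a real page fetch and parse; returns a small record.
    return {"url": url, "title": url.rsplit("/", 1)[-1]}


results = scrape_all(["https://example.com/a", "https://example.com/b"], fake_fetch)
print(results)
```

Threads help here because scraping is mostly waiting on the network. If Selenium is used inside `fetch_details`, each worker would likely need its own WebDriver, since a single driver instance is generally not meant to be shared across threads.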

Team DaHack

  • Site: https://github.com/pcarribean/
  • Site: https://github.com/msharsha/
  • Site: https://github.com/revanth-reddy/

  • LinkedIn: manisha-bachu
  • LinkedIn: msharsha
  • LinkedIn: revanth--reddy

Team Image
