made_mlbd's Introduction

Title: Machine Learning in Big Data - Laboratory Works in MADE Repository

Introduction:

Welcome to the GitHub repository for our laboratory works in Machine Learning for Big Data at MADE academy! This repository aims to provide hands-on experience with machine learning techniques and their application in big data environments. It is structured into four branches, one per homework topic: HDFS and MapReduce, Hive, Scala, and Spark ML.

Project Overview:

In this repository, you'll find a collection of laboratory works that delve into key concepts and tools essential for understanding and implementing machine learning algorithms in big data settings. From data storage and processing to advanced machine learning models, each branch focuses on a distinct aspect of big data analytics.

Branches:

  • HDFS and MapReduce: This branch explores the fundamentals of Hadoop Distributed File System (HDFS) and MapReduce programming. It covers how to manage large-scale data storage and leverage MapReduce for parallel processing and distributed computing.

  • Hive: The Hive branch introduces the popular data warehouse infrastructure for Hadoop. It covers how to query and analyze structured data using HiveQL, making data exploration and manipulation efficient and intuitive.

  • Scala: In this branch, I dive into the Scala programming language, a versatile and powerful language for working with big data tools. It was my first attempt at using Scala and exploring its role in big data processing.

  • Spark ML: This branch focuses on Apache Spark's MLlib, a robust library for scalable machine learning. It covers Spark's distributed machine learning algorithms, which make it possible to build and deploy ML models on large datasets.
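
To make the MapReduce idea from the first branch concrete, here is a minimal word-count sketch in plain Python. It is not taken from the homework code and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and only illustrate the map, shuffle, and reduce steps on a single machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (illustrative, single process)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big models"]))
```

In a real Hadoop job the map and reduce steps would run on different nodes and the framework would handle the shuffle; the sketch only shows how the three stages fit together.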

Project Structure:

  • homework_1: Contains laboratory works and code samples related to HDFS and MapReduce.
  • homework_2: Includes queries, data samples, and HiveQL scripts for data analysis using Hive.
  • homework_3: Contains my first Scala code and examples for big data processing.
  • homework_4: Contains notebooks and code snippets demonstrating Spark MLlib's machine learning capabilities.

Getting Started:

I hope this repository serves as a valuable resource for your machine learning journey in the realm of big data analytics. Happy learning and exploring the exciting world of machine learning in big data! 🚀
