cs-4403-assignment-1

Assignment 1

The Pigeon Hole Challenge

Consider a domain of size n = 300.

Generate random numbers in the domain [n] until every value i ∈ [n] has been drawn at least once.

How many random trials did this take?

We will use k to represent this value.

Repeat the experiment m = 400 times, and record k for each repetition.
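A minimal sketch of one possible implementation follows. It assumes the domain [n] is represented as {0, ..., n-1} (equivalent to {1, ..., n} up to relabeling), and the function names `coupon_collector_trial` and `run_experiment` are hypothetical, not part of the assignment.

```python
import random


def coupon_collector_trial(n, rng=random):
    """Draw uniform values from {0, ..., n-1} until every value has appeared; return the number of draws k."""
    seen = [False] * n   # seen[i] is True once value i has been drawn
    remaining = n        # how many values have not been drawn yet
    k = 0
    while remaining > 0:
        x = rng.randrange(n)
        k += 1
        if not seen[x]:
            seen[x] = True
            remaining -= 1
    return k


def run_experiment(n=300, m=400, seed=None):
    """Run m independent trials and return the list of k values."""
    rng = random.Random(seed)  # seeded generator so runs can be reproduced
    return [coupon_collector_trial(n, rng) for _ in range(m)]


ks = run_experiment(n=300, m=400, seed=0)
print(min(ks), sum(ks) / len(ks), max(ks))
```

Keeping a `remaining` counter means the loop never has to rescan the `seen` array to check whether every value has appeared, so each draw costs O(1).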

Make a density plot.

What can you say about k?
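One reference point for your empirical results is the standard coupon-collector analysis (not stated in the assignment itself). Once i - 1 distinct values have been seen, each draw is new with probability (n - i + 1)/n, so the wait for the next new value is geometric with mean n/(n - i + 1). Summing over i gives:

```latex
\mathbb{E}[k] = \sum_{i=1}^{n} \frac{n}{n-i+1} = n H_n,
\qquad H_n = \sum_{j=1}^{n} \frac{1}{j} \approx \ln n + \gamma + \frac{1}{2n}

n = 300:\quad \mathbb{E}[k] \approx 300\,(\ln 300 + 0.5772) + \tfrac{1}{2} \approx 1711.1 + 173.2 + 0.5 \approx 1885
```

Here γ ≈ 0.5772 is the Euler–Mascheroni constant. The average of your 400 recorded k values would be expected to land near this figure.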

Describe how you implemented this experiment and how long it took for n = 300 and m = 400 trials.

Show a plot of the run time as you gradually increase the parameters n and m. Go to https://matplotlib.org/ to learn how to create visualizations in Python.
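A rough timing harness might look like the sketch below. It uses `time.perf_counter` for wall-clock timing; the helper names (`time_experiment`, `timing_grid`, `plot_timings`) and the grid of n and m values are illustrative assumptions. matplotlib is imported only inside `plot_timings`, so the timing part runs without it.

```python
import random
import time


def coupon_collector_trial(n, rng):
    """Number of uniform draws from {0, ..., n-1} until every value has appeared."""
    seen = [False] * n
    remaining, k = n, 0
    while remaining:
        x = rng.randrange(n)
        k += 1
        if not seen[x]:
            seen[x] = True
            remaining -= 1
    return k


def time_experiment(n, m, seed=0):
    """Wall-clock seconds to run m independent trials with domain size n."""
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(m):
        coupon_collector_trial(n, rng)
    return time.perf_counter() - start


def timing_grid(ns, ms):
    """Map each (n, m) combination to its measured run time."""
    return {(n, m): time_experiment(n, m) for n in ns for m in ms}


def plot_timings(timings, ns, ms):
    """One line per m, run time against n; matplotlib is imported only when this is called."""
    import matplotlib.pyplot as plt

    for m in ms:
        plt.plot(ns, [timings[(n, m)] for n in ns], marker="o", label=f"m = {m}")
    plt.xlabel("n (domain size)")
    plt.ylabel("run time (seconds)")
    plt.legend()
    plt.show()


ns, ms = [100, 200, 300], [100, 400]
timings = timing_grid(ns, ms)
for (n, m), seconds in sorted(timings.items()):
    print(f"n={n:4d}  m={m:4d}  {seconds:.3f} s")
```

Calling `plot_timings(timings, ns, ms)` draws the chart once matplotlib is installed; widening `ns` and `ms` gradually shows where the run time starts to become impractical.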

Now have some fun and try some different values for n and m.

What are the highest values for n and m that you can reasonably reach? Which one claims the most resources? Why?
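One way to reason about this, assuming the straightforward implementation that draws one number at a time: each repetition costs about n H_n draws, and the m repetitions are independent, so

```latex
\text{expected total draws} \approx m \cdot n H_n \approx m\, n\, (\ln n + \gamma)

n = 300,\ m = 400:\quad 400 \times 1885 \approx 754{,}000 \text{ draws}
```

Under this estimate the run time grows linearly in m but slightly faster than linearly in n, while memory is O(n) for the `seen` array plus O(m) for the recorded k values.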

The Birthday Paradox

Consider a domain of size n = 5000.

Generate random numbers in the domain [n] until two have the same value.

How many random trials did this take?

We will again use k to represent this value.

Repeat the experiment m = 300 times, and record k each time.
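A minimal sketch of the birthday experiment, again assuming the domain is {0, ..., n-1}; `birthday_trial` and `run_birthday` are hypothetical names. k counts the draw that produces the repeat, so the smallest possible value is 2.

```python
import random


def birthday_trial(n, rng=random):
    """Draw uniform values from {0, ..., n-1} until one repeats; return the number of draws k."""
    seen = set()
    k = 0
    while True:
        x = rng.randrange(n)
        k += 1
        if x in seen:
            return k  # k includes the repeated draw itself
        seen.add(x)


def run_birthday(n=5000, m=300, seed=None):
    """Run m independent trials and return the list of k values."""
    rng = random.Random(seed)
    return [birthday_trial(n, rng) for _ in range(m)]


ks = run_birthday(n=5000, m=300, seed=0)
print(min(ks), sum(ks) / len(ks), max(ks))
```

A set is used here rather than a boolean list of length n because only on the order of √n values are drawn per trial, so allocating an n-element array on every trial would dominate the cost for large n.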

Make a cumulative density plot: for each k on the x-axis, plot the fraction of the m experiments that found a repeat within k trials. The curve should start at a y value of 0, increase as k increases, and eventually reach a y value of 1.
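The cumulative density curve can be built from the recorded k values as an empirical CDF. This is a sketch with hypothetical names (`empirical_cdf`, `plot_cdf`); it uses matplotlib's `plt.step`, and matplotlib is imported only when `plot_cdf` is called.

```python
from collections import Counter


def empirical_cdf(ks):
    """Return (xs, ys) where ys[i] is the fraction of experiments with k <= xs[i]."""
    counts = Counter(ks)
    xs = sorted(counts)
    ys = []
    finished = 0
    for x in xs:
        finished += counts[x]
        ys.append(finished / len(ks))
    return xs, ys


def plot_cdf(ks):
    """Step plot of the empirical CDF; requires matplotlib, imported only when called."""
    import matplotlib.pyplot as plt

    xs, ys = empirical_cdf(ks)
    # Prepend (0, 0) so the curve visibly starts at y = 0.
    plt.step([0] + xs, [0.0] + ys, where="post")
    plt.xlabel("k (number of random trials)")
    plt.ylabel("fraction of experiments with a repeat by trial k")
    plt.ylim(0, 1.05)
    plt.show()


xs, ys = empirical_cdf([3, 1, 2, 2])
print(xs, ys)  # [1, 2, 3] [0.25, 0.75, 1.0]
```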

What can you say about k?
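A useful comparison for the measured curve is the standard birthday-problem calculation (not given in the assignment). The experiment lasts more than t trials exactly when the first t draws are all distinct:

```latex
\Pr[k > t] = \prod_{i=0}^{t-1}\left(1 - \frac{i}{n}\right) \approx \exp\!\left(-\frac{t(t-1)}{2n}\right)

\text{median}: \; t \approx \sqrt{2n \ln 2} \approx 83, \qquad
\mathbb{E}[k] \approx \sqrt{\frac{\pi n}{2}} + \frac{2}{3} \approx 88.6 + 0.7 \approx 89.3 \quad (n = 5000)
```

So 1 - exp(-t(t-1)/(2n)) approximates the cumulative curve, and k typically scales like √n rather than n.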

Describe how you implemented this experiment and how long it took for m = 300 trials. Show a plot of the run time as you gradually increase the parameters n and m.

What are the highest values for n and m that you can reasonably reach?

Finding Similar Sentences

I used GPT-3 to generate a bunch of sentences of various lengths. You can find the results in the following files:

Within each document, construct k-grams for every sentence (convert all text to lower case first), as follows:

  • character based 2-grams
  • character based 3-grams
  • word based 2-grams

Store each k-gram only once; duplicates are ignored.

How many distinct k-grams are there for each document with each type of k-gram?

Compute the Jaccard similarity between all pairs of sentences for each type of k-gram within each document.
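The Jaccard similarity of two k-gram sets A and B is |A ∩ B| / |A ∪ B|. The sketch below computes it for every unordered pair of sentences using `itertools.combinations`; the treatment of two empty sets (similarity 0.0) is an assumed convention, since the ratio is undefined there. Function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets."""
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty sets count as dissimilar
    return len(a & b) / len(union)


def pairwise_jaccard(kgram_sets):
    """Jaccard similarity for every unordered pair (i, j), i < j, of sentence k-gram sets."""
    return {
        (i, j): jaccard(kgram_sets[i], kgram_sets[j])
        for i, j in combinations(range(len(kgram_sets)), 2)
    }


sets = [{"ab", "bc"}, {"ab", "bd"}, {"xy"}]
print(pairwise_jaccard(sets))
```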

Report for each document the 5 pairs of sentences which are the most similar.
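To report the 5 most similar pairs without sorting every pair, one option is `heapq.nlargest`, sketched here with a hypothetical `top_similar_pairs` helper. It takes the sentences and their k-gram sets (one k-gram type at a time) and returns (score, sentence, sentence) triples, highest score first.

```python
import heapq
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity, with 0.0 assumed for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_similar_pairs(sentences, kgram_sets, top=5):
    """Return the `top` most similar sentence pairs as (score, sentence_i, sentence_j)."""
    scored = (
        (jaccard(kgram_sets[i], kgram_sets[j]), i, j)
        for i, j in combinations(range(len(sentences)), 2)
    )
    # nlargest keeps only `top` candidates in a heap instead of sorting all pairs.
    best = heapq.nlargest(top, scored)
    return [(score, sentences[i], sentences[j]) for score, i, j in best]


sentences = ["a b c", "a b d", "x y z"]
sets = [set(s.split()) for s in sentences]
print(top_similar_pairs(sentences, sets, top=1))
```

Ties in score are broken by the sentence indices, so the output is deterministic for a given input order.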

What can you state about the computational cost for this experiment?
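As a starting point, assuming an all-pairs comparison with hash sets: a document with s sentences has s(s-1)/2 pairs, and each Jaccard computation touches at most the k-grams of both sentences. With ℓ denoting the largest number of k-grams in any one sentence:

```latex
\binom{s}{2} = \frac{s(s-1)}{2}, \qquad
\text{cost per document and k-gram type} = O\!\left(\frac{s(s-1)}{2}\,\ell\right) = O(s^2 \ell)

\text{e.g. } s = 1000:\quad \frac{1000 \cdot 999}{2} = 499{,}500 \text{ Jaccard computations}
```

The quadratic number of pairs, not the k-gram construction (which is linear in the total text length), is what dominates as documents grow.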

Contributors

theaidentv
