

Awk: Hack the Planet['s text]!

Slides and source code to go along with Ben Porter's Awk talk at Linux Fest Northwest 2020

Videos from Linux Fest Northwest 2020:

  • Part 1 (Presentation) - The lecture explaining Awk syntax and functions
  • Part 2 (Exercises) - Me walking through the answers to all of the challenges in this repo

If you want to contact me:

The Scenario

The boss has given us a TSV file full of payroll data, and she would like us to run some analysis on it. We recently learned about awk and its amazing processing power, and have decided this is an awesome chance to use our new skillz!

You should primarily use awk, but you can (and should) combine it with other tools (like sort and uniq) when it makes sense. Don’t use grep or sed, tho, since awk can handle the same scenarios (and you are trying to learn awk, after all) :-)
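
As a rough illustration of mixing awk with sort and uniq, here is a hypothetical shell helper (count_values is not one of the repo's solutions). It assumes the first line of the TSV is a header row and that you pass a column name exactly as it appears there. awk only pulls out the column; sort and uniq do the counting.

```shell
# count_values COLUMN FILE (hypothetical helper, not from the repo)
# Prints how often each distinct value of COLUMN occurs, most common first.
count_values() {
  awk -F'\t' -v col="$1" '
    NR == 1 {                       # header row: find the column by name
      for (i = 1; i <= NF; i++) {
        if ($i == col) c = i
      }
      next
    }
    c { print $c }                  # emit that field for every data row
  ' "$2" | sort | uniq -c | sort -rn
}
```

For example, `count_values LastName payroll.tsv` would list last names by frequency. If the column name isn't found in the header, nothing is printed.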

The payroll file is payroll.tsv. If you’d like to randomize it, you can generate a new one with the provided Ruby script.

There are many different solutions, and the ones presented here are just mine. Many of them could be optimized and refactored to be more elegant. To run my solutions (and check my output against yours), use awk -f <file> payroll.tsv, substituting the numbered file for the challenge you are trying to run:

awk -f 01.awk payroll.tsv
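
If you write your own program files for `awk -f`, a minimal sketch might look like the one below (example.awk is a hypothetical name, not a file in the repo). The detail worth copying is setting FS to a tab in the BEGIN block: awk's default field splitting breaks on any run of spaces or tabs, so a TSV value that contains spaces would otherwise be split into several fields.

```shell
# Create a hypothetical program file (not part of the repo).
cat > example.awk <<'EOF'
BEGIN { FS = "\t" }                 # payroll.tsv is tab-separated
NR == 1 { next }                    # skip the header row
{ rows++ }                          # count each data row
END { print rows + 0, "employees" } # "+ 0" prints 0 for an empty file
EOF
```

Then run it the same way as the solutions: `awk -f example.awk payroll.tsv`.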

Some solutions are bash scripts, in which case just run them like normal:

./09-awk.sh

Challenges (questions to answer about our payroll data, using awk to analyze it)

Easy (one-liners)

  1. How much money per hour does the janitor make?
  2. What is the name of the CEO? Format it like "LastName, FirstName".
  3. Which employees were hired on April 16, 1993? (Print the list)
  4. Which employee works in the Springfield office?
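
Several of the one-liners above boil down to "print the rows where one column has a given value". A hedged sketch of that pattern as a reusable shell function (rows_where is a hypothetical name) that looks the column up by its header name instead of hard-coding a position, assuming the file has a header row:

```shell
# rows_where COLUMN VALUE FILE (hypothetical helper, not from the repo)
# Prints every data row whose COLUMN field equals VALUE.
rows_where() {
  awk -F'\t' -v col="$1" -v val="$2" '
    NR == 1 {                       # header row: find the column by name
      for (i = 1; i <= NF; i++) {
        if ($i == col) c = i
      }
      next
    }
    c && $c == val                  # a pattern with no action prints the row
  ' "$3"
}
```

For example, `rows_where LastName Portwood payroll.tsv`. Once you know which field you want, printing something like `$2 ", " $1` instead of the whole row is a one-word change.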

A little harder

  1. How many mechanical engineers work here?
  2. How many people from the Portwood family work here?
  3. Are there any employees with identical first & last names? (IOW, the first name is the same as the last name. E.g. Linus Torvalds is not identical, but Johnson Johnson is.)
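
The "how many" questions share a common awk shape: test each row, bump a counter, and print it in END. A minimal sketch, assuming a header row, using a hypothetical count_matching helper that takes an awk regular expression (so `'^Portwood$'` matches only that exact value). Because `-v` processes backslash escapes, any backslashes in the regex need to be doubled.

```shell
# count_matching COLUMN REGEX FILE (hypothetical helper, not from the repo)
# Prints the number of data rows whose COLUMN field matches REGEX.
count_matching() {
  awk -F'\t' -v col="$1" -v re="$2" '
    NR == 1 {                       # header row: find the column by name
      for (i = 1; i <= NF; i++) {
        if ($i == col) c = i
      }
      next
    }
    c && $c ~ re { n++ }            # dynamic regex match on the chosen field
    END { print n + 0 }             # "+ 0" prints 0 when nothing matched
  ' "$3"
}
```

For example, `count_matching LastName '^Portwood$' payroll.tsv`. Comparing two fields of the same row (like `$1 == $2`) works the same way, just without the regex.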

Gotta think a bit

  1. Print each column header, along with the column number. E.g. the LastName column is the second column, so print "2 - LastName"
  2. How much money per hour does the Seattle office cost to run? (IOW, how much total per hour does it cost to pay all employees who work out of the Seattle office)
  3. How many engineers (of any type) work here?
  4. Who is the highest paid employee?
  5. Who worked the most hours this week?
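
For the "highest" and "most" questions, one approach is to remember the best row seen so far while scanning. A sketch with a hypothetical max_row helper; it assumes a header row and that the chosen column holds plain numbers (a value with a currency sign, like $12.50, would be read as 0), and it keeps the first row on ties.

```shell
# max_row COLUMN FILE (hypothetical helper, not from the repo)
# Prints the data row with the largest numeric value in COLUMN.
max_row() {
  awk -F'\t' -v col="$1" '
    NR == 1 {                       # header row: find the column by name
      for (i = 1; i <= NF; i++) {
        if ($i == col) c = i
      }
      next
    }
    # strict ">" keeps the first row when values tie
    c && (!seen || $c + 0 > max) { max = $c + 0; best = $0; seen = 1 }
    END { if (seen) print best }
  ' "$2"
}
```

Summing instead of tracking a maximum (`{ total += $c } END { print total }`) is the same shape, and combines naturally with a filter on another column.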

Awk proficient

  1. Anonymize the data by removing the first two columns. Print all remaining columns
  2. Our client is complaining about the anonymized data from the previous question, claiming it is too hard to read. They would like you to add line numbers to the beginning of each line in the output.
  3. How many different office locations does the company have?
  4. What is the average (mean) wage of all employees? What about the median (extra credit)?
  5. Are there any duplicate entries? (Same names appearing on payroll more than once)
  6. Who was the first employee hired?
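
Associative arrays make the duplicate check a near one-liner: key an array by name and report a key the moment its count reaches 2. This sketch assumes, as the anonymization challenge implies, that columns 1 and 2 hold the employee's name fields, and that the first line is a header; duplicate_names is a hypothetical helper, not one of the repo's solutions.

```shell
# duplicate_names FILE (hypothetical helper, not from the repo)
# Prints each name (columns 1 and 2) that appears more than once.
duplicate_names() {
  awk -F'\t' '
    # skip the header, count each name pair, print it when it reaches 2
    NR > 1 && ++count[$1 FS $2] == 2 { print $1, $2 }
  ' "$1"
}
```

Each duplicated name is printed once, no matter how many times it repeats. The same array idiom also counts distinct values in a column: add every value as a key, then count the keys in END.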

Solutions

My solutions are in the *.awk files in this repository. Feel free to use them for hints. You can run them with:

awk -f <file>.awk payroll.tsv

They are also detailed in the Slides at the end of the deck.


Contributors

freedomben, antalest
