
nycflights13's Introduction

nycflights13


Overview

This package contains information about all flights that departed from NYC (i.e. EWR, JFK and LGA) to destinations in the United States, Puerto Rico, and the American Virgin Islands in 2013: 336,776 flights in total. To help understand what causes delays, it also includes a number of other useful datasets.

This package provides the following data tables.

  • ?flights: all flights that departed from NYC in 2013
  • ?weather: hourly meteorological data for each airport
  • ?planes: construction information about each plane
  • ?airports: airport names and locations
  • ?airlines: translation between two-letter carrier codes and names
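
As a minimal sketch of how these tables might be inspected, assuming the package has been installed from CRAN (for example with install.packages("nycflights13")). The requireNamespace() guard is only there so the snippet still runs when the package is missing; the variable name n_flights is illustrative.

```r
# Sketch: look at the tables, assuming nycflights13 is installed.
if (requireNamespace("nycflights13", quietly = TRUE)) {
  flights <- nycflights13::flights
  n_flights <- nrow(flights)            # the overview states 336,776 flights
  print(n_flights)
  print(names(nycflights13::airlines))  # column names of the carrier lookup table
} else {
  n_flights <- NA_integer_
  message("nycflights13 is not installed; skipping")
}
```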

If you're interested in other subsets of flight data, see:

  • nycflights for flights departing from NYC in the last year.

  • anyflights for flights departing from any airport in any year.

  • airlines to maintain a local SQL database of all flight departure data.

nycflights13's People

Contributors

balthasars, bbrewington, beanumber, elben10, hadley, hughparsonage, ianmcook, jozefhajnala, krlmlr, rmcd1024, seankross, sjackman


nycflights13's Issues

query about including both incoming vs. outgoing flights

Is it feasible to include both incoming and outgoing flights in the nycflights13 package? I know that large packages are frowned upon by CRAN, but could an exception be made?

Is the problem one of a single large table? Could the "flights" table be split into two parts (and built together at package installation time)?

This might allow a version of the package on GitHub to include more cities and years without running into the "GitHub doesn't like files greater than 50MB" limit.

If you have general advice on providing access to larger datasets via R packages hosted on github, I'd be all ears (and suspect that @beanumber and @rpruim would as well!).

Move `master` branch to `main`

The master branch of this repository will soon be renamed to main, as part of a coordinated change across several GitHub organizations (including, but not limited to: tidyverse, r-lib, tidymodels, and sol-eng). We anticipate this will happen by the end of September 2021.

That will be preceded by a release of the usethis package, which will gain some functionality around detecting and adapting to a renamed default branch. There will also be a blog post at the time of this master --> main change.

The purpose of this issue is to:

  • Help us firm up the list of targeted repositories
  • Make sure all maintainers are aware of what's coming
  • Give us an issue to close when the job is done
  • Give us a place to put advice for collaborators re: how to adapt

message id: euphoric_snowdog

weather.r requires revision

weather.r no longer runs correctly as the mesonet file format appears to have changed slightly. I modified the code and was planning to issue a pull request, but the time_hour value in the file is different. I presume this occurred because no tz was specified but I wanted to ask before

distance is not flown distance

The dataset documentation says

distance Distance flown

But according to BTS's glossary, distance is "Distance between airports (miles)".
This is rarely the flown distance, because flights do not fly great-circle paths.
Also, some flights are

  • diverted to a different airport, so DivDistance will cater for that case.
  • cancelled, e.g. flight 4412 on 2013-01-30

HTH
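
To make the distinction concrete, here is a rough sketch of a great-circle (haversine) distance between two airports, which is the kind of airport-to-airport figure the glossary definition describes. The function name haversine_miles, the mean Earth radius, and the rounded coordinates are assumptions for illustration; they are not taken from the package or from BTS.

```r
# Great-circle (haversine) distance in statute miles between two points.
# Inputs are decimal degrees; radius is an assumed mean Earth radius in miles.
haversine_miles <- function(lat1, lon1, lat2, lon2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Approximate coordinates (illustrative, rounded values).
jfk <- c(lat = 40.64, lon = -73.78)
lax <- c(lat = 33.94, lon = -118.41)

d <- haversine_miles(jfk[["lat"]], jfk[["lon"]], lax[["lat"]], lax[["lon"]])
print(round(d))
```

The actual path flown is normally longer than this figure, since routing, weather, and air traffic control all add distance.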

Upkeep for nycflights13 (2022)

2022

  • Handle and close any still-open master --> main issues
  • usethis:::use_codecov_badge("tidyverse/nycflights13")
  • Update pkgdown site using instructions at https://tidytemplate.tidyverse.org
  • Update lifecycle badges with more accessible SVGs: usethis::use_lifecycle()

2023

  • Update email addresses *@rstudio.com -> *@posit.co
  • Update copyright holder in DESCRIPTION: person("Posit Software, PBC", role = c("cph", "fnd"))
  • Run devtools::document() to re-generate package-level help topic with DESCRIPTION changes
  • use_tidy_logo()
  • usethis::use_tidy_coc()
  • Use pak::pak("org/pkg") in README
  • Consider running use_tidy_dependencies() and/or replace compat files with use_standalone()
  • Use cli errors or file an issue if you don't have time to do it now
  • use_standalone("r-lib/rlang", "types-check") instead of home grown argument checkers;
    or file an issue if you don't have time to do it now
  • Add alt-text to pictures, plots, etc; see https://posit.co/blog/knitr-fig-alt/ for examples

Eternal

  • use_package("R", "Depends", "3.6")
  • usethis::use_tidy_description()
  • usethis::use_tidy_github_actions()
  • devtools::build_readme()
  • Re-publish released site if needed

Created on 2023-10-30 with usethis::use_tidy_upkeep_issue(), using usethis v2.2.2.9000

Query: Package documentation reference to American Airways

Hello,

The description section of the planes data set CRAN documentation includes a note about 'American Airways' (AA) and 'Envoy Air' (MQ). Should this note refer to 'American Airlines' instead?

The airlines data set associates the initials AA with 'American Airlines' (and the letters MQ with 'Envoy Air'). Then, it might be easy to infer that this could be just a typo. However, it took me a while to understand the anti_join() documentation example since the American Airways observations do not appear in any data set even when I match the carrier variable using the rest of the join functions.

Thanks a lot for your time and I hope that everyone is safe and healthy in these times.

Best,

Sicabí

Hourly precipitation calculation in weather.R incorrect

When aggregated, the hourly precipitation numbers in the weather dataframe do not match official NOAA daily totals for the same locations. The calculation of hourly precipitation is somewhat involved. The issue is that hourly cumulative totals reset at 51 minutes (this is not invariably true, but it appears true for the 3 NY airports in 2013). My pull request #26 addresses this.

See this page for an example using ASOS data to match NOAA daily totals.
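
A toy sketch of the idea, not the code from the pull request: it assumes each observation reports precipitation accumulated since the last reset and that the counter resets after the minute-51 observation, so the minute-51 reading is the total for that hour. The data values and the column names time and precip_cum are hypothetical.

```r
# Toy observations (hypothetical values): cumulative precipitation in inches
# since the last reset, with the counter assumed to reset after minute 51.
obs <- data.frame(
  time = as.POSIXct(c(
    "2013-05-08 10:51", "2013-05-08 11:15", "2013-05-08 11:40",
    "2013-05-08 11:51", "2013-05-08 12:20", "2013-05-08 12:51"
  ), tz = "UTC"),
  precip_cum = c(0.10, 0.02, 0.05, 0.07, 0.01, 0.03)
)

# Keep only the minute-51 readings: under the assumption above, each one is
# the full total for the hour ending at that observation.
hourly <- obs[format(obs$time, "%M") == "51", ]

# Summing every reading counts the intermediate observations again.
naive_total <- sum(obs$precip_cum)
daily_total <- sum(hourly$precip_cum)
print(c(naive = naive_total, from_resets = daily_total))
```

Stations that reset at a different minute would need that minute looked up rather than hard-coded.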

Release nycflights13 1.0.0

  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Polish NEWS
  • Bump version (in DESCRIPTION and NEWS)
  • devtools::check_win_devel() (again!)
  • devtools::submit_cran()
  • Approve email
  • Tag release
  • Bump dev version
  • Tweet

Template from r-lib/usethis#338

Download functions in /data-raw have old links

The links used in airlines, flights, and planes are no longer valid

Can be fixed immediately:

I'm working on fixing:

  • flights: bts.gov no longer uses the same URL format. Instead of year and month being specified in the URL, there is now an input form and all parameters are passed through that

Dillant Hopkins Airport lat and long are switched and lat * -1

I was looking at the airports data and noticed that Dillant Hopkins Airport was at 72.27, i.e. the furthest north. I knew there were airports in Alaska, so this seemed incorrect. The lat and lon in this file show a lat of 42.89 and a long of -72.27, which is correct.
It is shown incorrectly in the CSV and also incorrectly when you load it in R.

I didn't have time to look into where/why this is happening but wanted to report, Vermont is not the furthest north airport :-)
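
One way such entries might be caught is a coarse bounding-box check. This is only a sketch: the table airports_check, its placeholder codes, and the box limits are assumptions, and the two rows simply reuse the correct and the swapped coordinates quoted above.

```r
# Toy rows (hypothetical table): one correct entry and one with latitude and
# longitude swapped and the sign flipped, as described in the report above.
airports_check <- data.frame(
  faa = c("AAA", "BBB"),   # placeholder codes, not real airports
  lat = c(42.89, 72.27),
  lon = c(-72.27, 42.89)
)

# Rough bounding box for the contiguous US (assumed limits). Airports in
# Alaska, Hawaii, and the territories fall outside it, so a flagged row is
# only a candidate for manual review, not a confirmed error.
in_box <- with(airports_check,
               lat >= 24 & lat <= 50 & lon >= -125 & lon <= -66)
suspect <- airports_check[!in_box, ]
print(suspect)
```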

add cancellation status?

The dataset is fabulous and works extremely well for teaching purposes. But would it be possible to add cancellation status? It appears as if cancelled flights are not included.

Release nycflights13 1.0.1

Prepare for release:

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

weather uses two timezones - not clear which matches flights

In weather, the time_hour variable is offset by five hours from the time displayed across the year, month, day, and hour variables.

It is not clear which time matches the times in flights (where year, month, day, hour, and time_hour all agree). Given the offset, it is possible that time_hour is in the America/New_York timezone and the other variables are in UTC.
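
A small base R sketch of how one instant can show up five hours apart, which is consistent with the guess above but does not confirm how the weather table was built. The variable names are illustrative.

```r
# One instant in time, created in UTC.
t_utc <- as.POSIXct("2013-01-01 12:00:00", tz = "UTC")

# The same instant printed in two time zones: New York is on EST (UTC-5)
# in January, so the clock hour differs by five.
hour_utc <- as.integer(format(t_utc, "%H", tz = "UTC"))
hour_nyc <- as.integer(format(t_utc, "%H", tz = "America/New_York"))
print(hour_utc - hour_nyc)
```

During daylight saving time the difference between UTC and America/New_York is four hours rather than five, so checking a summer date as well may help tell the two explanations apart.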

Customizing airports and years with the groundcontrol package

I wrote a package groundcontrol that adapted the code in nycflights13 to allow the user to create a package like this one, but specify the airports, year, and whether they want to include flights to or from those airports. Would you have any interest in (a) including those functions inside this package, or (b) sharing a common codebase?

airport code "BFT" is not unique?

> filter(airports, faa == "BFT")
Source: local data frame [2 x 7]

  faa               name      lat       lon alt tz dst
1 BFT           Beaufort 32.47741 -80.72316  37 -5   A
2 BFT BFT County Airport 32.41083 -80.63500 500 -5   A

Obviously, this is an upstream issue, but might we want to filter these out? A hacky temporary solution is:

  filter(name != "Beaufort") %>%

in the creation of the airports table.
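
For reference, a base R sketch of finding the repeated code and applying the same workaround. The data frame airports_bft is rebuilt by hand from the output above (only four of its columns), so it stands in for the real airports table.

```r
# The two rows from the output above, rebuilt as a small data frame.
airports_bft <- data.frame(
  faa  = c("BFT", "BFT"),
  name = c("Beaufort", "BFT County Airport"),
  lat  = c(32.47741, 32.41083),
  lon  = c(-80.72316, -80.63500),
  stringsAsFactors = FALSE
)

# Codes that appear more than once.
dup_codes <- unique(airports_bft$faa[duplicated(airports_bft$faa)])
print(dup_codes)

# The workaround from the issue, in base R: drop the "Beaufort" row.
airports_fixed <- airports_bft[airports_bft$name != "Beaufort", ]
print(airports_fixed)
```

A check like anyDuplicated(airports_fixed$faa) == 0 could then guard the table-building script against similar upstream duplicates.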

why this code not showing any request?

flights_sml <- select(flights, year:day, ends_with("delay"), distance, air_time)
mutate(flights_sml, gain = dep_delay - arr_delay, speed = distance/air_time * 60)

# A tibble: 336,776 × 9
    year month   day dep_delay arr_delay distance air_time  gain speed
 1  2013     1     1         2        11     1400      227    -9  370.
 2  2013     1     1         4        20     1416      227   -16  374.
 3  2013     1     1         2        33     1089      160   -31  408.
 4  2013     1     1        -1       -18     1576      183    17  517.
 5  2013     1     1        -6       -25      762      116    19  394.
 6  2013     1     1        -4        12      719      150   -16  288.
 7  2013     1     1        -5        19     1065      158   -24  404.
 8  2013     1     1        -3       -14      229       53    11  259.
 9  2013     1     1        -3        -8      944      140     5  405.
10  2013     1     1        -2         8      733      138   -10  319.
# ℹ 336,766 more rows
# ℹ Use print(n = ...) to see more rows

query about including both incoming and outgoing flights

I realize that there are serious space limitations on the size of packages on CRAN. But would it be feasible within those constraints to include both incoming and outgoing flights? This would allow a whole series of questions to be answered about differences between departures and arrivals at JFK, for example.
