
r-geospatial-fundamentals-legacy's Introduction

R Geospatial Fundamentals

Datahub Binder

This is the repository for D-Lab's R Geospatial Fundamentals workshop.

Geospatial data are an important component of data visualization and analysis in the social sciences, humanities, and elsewhere. The R programming language is a great platform for exploring these data and integrating them into your research. This workshop focuses on fundamental operations for reading, writing, manipulating and mapping vector data, which encodes location as points, lines and polygons.

---

Content outline

  • Part I: Core concepts, vector data, and plotting
    • Basic geospatial concepts
    • Basic vector data
    • Geospatial data structures (the sf package)
    • Basic plotting (base::plot and the ggplot2 package)
    • Managing coordinate reference systems (CRS)
    • Advanced plotting (the tmap package)
    • Map overlays
  • Part II: Spatial analysis
    • Spatial measurement queries
    • Spatial relationship queries
    • Buffer analysis
    • Spatial and non-spatial joins
    • Aggregation
    • Continued mapping practice
  • Part III: Raster data
    • Raster concepts
    • Raster data structures (the raster package)
    • Mapping with raster and vector data
    • Spatial analysis of raster and vector data
    • Raster reclassification
    • Raster stacks and raster algebra

Getting started

Please follow the notes in participant-instructions.md.

Assumed participant background

We assume that participants have working familiarity with the R language, including the topics covered in our R-Fundamentals workshop materials (though participants without this have still reported useful learning about geospatial concepts).

If you would like a refresher on R, you could review that workshop's materials, or look to other online resources such as the Base R Cheat Sheet or the Quick R website.

Technology requirements

Please bring a laptop with the following:

Resources

r-geospatial-fundamentals-legacy's People

Contributors

aculich, erthward, hikari-murayama, katherinerosewolf, pattyf, tomvannuenen


r-geospatial-fundamentals-legacy's Issues

Broken links

There are broken image links, e.g. in notebook 7, the image in section 7.1 "Attribute join".

Duration

Looking over two previous iterations and going through the current one, I think the projected duration included at the beginning of each lesson may be an underestimate. The lessons seem to take longer than projected in the last three runs.

Build Rproj file to connect code and files?

Loving the new Geospatial-fundamentals-in-sf so much! One suggestion: bundle the .Rmd files and the various data sources into an RStudio project (.Rproj), similar to what our Advanced Data Wrangling in R curriculum does. If we go that way, we could leverage the here() package so users can access shapefiles and datasets automatically, avoiding directory configuration at the local level.

Binder link fails to load `sf` package

The Binder link will eventually resolve, but the sf package will not load when called.
This is with runtime.txt set to r-4.0-2020-10-10.

This warning message while installing the package could be a clue as to why:

Warning message:
R graphics engine version 14 is not supported by this version of RStudio. The Plots tab will be disabled until a newer version of RStudio is installed. 

Comments after being instructor

  • In the Introduction to sf, at line 261 in "Making your own sf objects", it seemed a bit counterintuitive to create points that do not have a CRS.
  • Is there a reason to use base R instead of dplyr? I think it might be more intuitive for some to use dplyr instead of base R, especially when subsetting data from dataframes.
  • At line 408 of More Data More Maps, it did not make a lot of sense to me to write out a .csv with st_write() instead of write.csv(), particularly since we use write.csv() later in the workshop.
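For illustration, here is a minimal sketch of the contrast the second bullet draws; the data frame and column names are hypothetical stand-ins for the workshop's attribute tables:

```r
library(dplyr)

# Toy data frame standing in for the workshop's attribute data
df <- data.frame(city = c("Oakland", "Berkeley", "Albany"),
                 pop  = c(433000, 121000, 20000))

# Base R subsetting: bracket indexing on a logical vector
big_base <- df[df$pop > 100000, ]

# dplyr equivalent, arguably easier for newcomers to read
big_dplyr <- df %>% filter(pop > 100000)
```

Both return the same two rows; the dplyr version avoids repeating the data frame name inside the condition.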

Add knitr option for path

Add
```r
knitr::opts_knit$set(root.dir = normalizePath('../'))
```
to the knitr setup options for the Rmd files for pt2 & 3 (done for pt1) so that the Rmd files can knit in the docs directory but reference the data files from the root directory in any R chunks.

Note: The images directory was moved under docs because it is only used in the Rmarkdown files and the relative path setting only works for chunks (not markdown).

Define reclassification matrix

The whole matrix below is quite challenging to follow; the translation being applied is unclear. The notes attempt to help, but I think they could be more explicit.

Note: by default, the ranges don't include the left value, but do include the right.

```r
reclass_vec <- c(0, 20, NA,  # water will be set to NA (i.e. 'left out' of our analysis)
                 20, 21, 1,  # we'll treat developed open space as greenspace,
                             # based on the NLCD description
                 21, 30, 0,  # developed and hardscape will be set to 0s
                 30, 31, NA,
                 31, Inf, 1) # greenspace will have 1s
```
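One way to make the translation more explicit is to reshape the vector into a labeled three-column matrix before applying it. A sketch, assuming the raster package and an `nlcd` raster loaded as in the lesson:

```r
library(raster)

# Reshape into one rule per row: (from, to, becomes)
reclass_mat <- matrix(reclass_vec, ncol = 3, byrow = TRUE)
colnames(reclass_mat) <- c("from", "to", "becomes")
reclass_mat  # print the rules as a readable table before applying them

# right = TRUE (the default): each interval is (from, to],
# i.e. the left value is excluded and the right value included
nlcd_green <- reclassify(nlcd, rcl = reclass_mat, right = TRUE)
```

Printing `reclass_mat` first lets learners check each (from, to, becomes) rule against the NLCD class codes before any raster work happens.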

Add docs dir

Add a docs directory for the RMD and HTML files.

Update "Insert Code Chunk" text

Instructions for inserting a code chunk (Part 1, lines 56-62) appear out of date.

Option 1 should be Code > Insert Chunk.

Clean up some of the slides with too much content

Split slides with too much content into two: one with the slide text and code, and one with the code output (repeating the code, but not the text if it is short). At least these Part 1 slides: 103, 111, 137, 142, 156, 165, 170, 173, 175, 177, 186.

09.Raster_data.Rmd : fatal error when running certain chunk on DataHub

Around line 380, projectRaster() causes a fatal error, but only when run on DataHub; there is no problem locally. There are subtle differences in the proj4string output that precedes the call for some reason. Not sure how to fix at this time.

```r
st_crs(SFtracts)$proj4string
```

Run locally, the above call returns "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs", but via DataHub it returns "+proj=longlat +datum=NAD83 +no_defs". (Where did the +towgs84=0,0,0,0,0,0,0 go?!)

```r
DEM_WGS = projectRaster(DEM, projectExtent(DEM, st_crs(SFtracts)$proj4string))
```

This second call then causes the fatal error on DataHub.

Rmd Outline

I'm wondering if the subheadings could be more informative.

Several are named 'Explore the Structures'. Maybe 'Explore the Structures - Memory', 'Explore the Structures - Dropping', etc., would be more informative.

maxValue minValue

In the code chunk

```r
# Quick summary stats
summary(DEM)
summary(DEM[,])
freq(DEM)

maxValue(DEM)
minValue(DEM)

res(DEM)
```

I get an NA from maxValue()/minValue(), although they worked in one of the videos. The documentation for these functions says: 'If a Raster* object is created from a file on disk, the min and max values are often not known (depending on the file format).' It's confusing because in the instructor's video (https://berkeley.zoom.us/rec/share/JQ4Xk3TSG5U-8L0pPezDaZoN_dIMFQyS4jJRKUK_JASFXo5G30erHErh8kxcvTjZ.XsO27WKI2ZKFQPEH?startTime=1649797368000) at time point 1:21:33, the max and min values that are output are not the same values we see in the summary output.
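If the NA comes from the statistics not being stored in the file, one possible fix (a sketch, assuming the raster package; the small in-memory raster here stands in for the workshop's DEM) is to compute them explicitly:

```r
library(raster)

# Small in-memory raster standing in for the workshop DEM
DEM <- raster(nrows = 10, ncols = 10, vals = runif(100, 0, 500))

# setMinMax() scans the cell values and stores min/max on the object,
# so maxValue()/minValue() no longer return NA for file-backed rasters
DEM <- setMinMax(DEM)
maxValue(DEM)
minValue(DEM)

# cellStats() computes the statistic directly from the cell values instead
cellStats(DEM, max)
```

For a large file-backed raster, setMinMax() forces a full scan of the values, which is exactly why the raster package skips it by default.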

07 Solution Visible

Solution to student exercise is visible in 07_Joins_and_Aggregation at lines 644-663

Datahub

Is there a possibility of having a DataHub version of this workshop? We ran into some issues where DataHub could have been useful.

Aggregation

It seems additional rows are added if `join = st_within` is not specified (the default join for st_join is st_intersects).

Original code:

```r
tracts_acs_sf_ac2 <- st_join(tracts_acs_sf_ac, tracts_meanAPI['mean_API'])
```

Suggested code:

```r
tracts_acs_sf_ac2 <- st_join(tracts_acs_sf_ac, tracts_meanAPI['mean_API'], left = TRUE, join = st_within)
```

opening slide recommended changes

Slides 17-18: Location example looks like an all-white ranch/4H program (https://asotincountyfairandrodeo.org/4-h/)
Possibly use this location instead? https://commons.wikimedia.org/wiki/File:Coyote_Point_Trail_at_Whitewater_State_Park,_Minnesota_(44136078811).jpg (photo has a CC 2.0 license)

Slide 34: Given the racialization of what counts as “crime” in the United States (and even more overt racism in police presence), I’d replace crime locations with something else, e.g., motor vehicle accident locations or point environmental pollutant emissions locations from the Toxics Release Inventory (e.g., https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-data-files-calendar-years-1987-2019)

Add license file

I think creative commons is what we want:
Creative Commons Attribution-NonCommercial 4.0 International Public License

Source vs. Visual

Hi,

If one chooses to use the Visual version, there are some aspects that don't translate well, e.g.:

  • in section 2.1 "sf = simple features"
  • in 2.4, the text after the image in the sf Geometry Types subsection
  • at the very end
  • at the beginning of lesson 3: "Instructor Notes" and "Exercises: 10 minutes"
  • Section 3.3: "If you look at the coordinate values"
  • Section 3.4: "As a refresher, a CRS"
  • at the very end

So most, if not all, are just visual issues.

LN 84: st_read() typo

The filename in line 84 should be "sftracts_wpop.shp" instead of "sftracts_wpop". Otherwise it will not read in.
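For reference, a sketch of the corrected call (the ./data path follows the workshop's layout but is an assumption here):

```r
library(sf)

# Including the .shp extension lets st_read() resolve the shapefile on disk
sftracts <- st_read("./data/sftracts_wpop.shp")
```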

Raster Legend

I think the Challenge 2: "Read in and check out new data" section needs some editing. nlcd@legend has no data, even when the raster is brought into memory with readAll(). So the predefined legend values are not available, even before transforming and cropping, and a large part of this lesson is lost. The legend data is there somewhere, because the barplot segments into the colors, and if you just open the .tif file directly (on a Mac) the plot follows the predefined colors.

We noted that there is a reference to a data folder: 'You have another raster dataset in your ./data directory. The file is called nlcd2011_sf.tif.'
So we wonder whether a change somewhere affected the data?

Add a readme file

Add a readme file with:

  • links to the html slides/notebook in the docs directory so folks can click and view in the browser.
  • Details on prerequisite knowledge, packages, getting started, etc. - the material from the setup slide.

change name?

I was supposed to make an issue regarding the name of the repo and the name listed on the excel sheet.
Name listed: R Geospatial Data: Parts 1-3

CRS Transformations

I think how projectExtent() is used could be explained a bit more explicitly in the following code:

```r
DEM_NAD83 = projectRaster(DEM, projectExtent(DEM, st_crs(SFtracts_NAD83)))
```

since there are a few nested functions, and it's not immediately clear what is occurring.
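One way to make this explicit is to unnest the call into steps; a sketch using the lesson's object names, which are assumed to be loaded:

```r
library(raster)
library(sf)

# Step 1: pull the target CRS from the vector data
target_crs <- st_crs(SFtracts_NAD83)$proj4string

# Step 2: build an empty template raster -- the extent, resolution,
# and CRS that DEM would have after reprojection (no cell values yet)
template <- projectExtent(DEM, target_crs)

# Step 3: warp DEM's cell values onto that template
DEM_NAD83 <- projectRaster(DEM, template)
```

Seen this way, projectExtent() only answers "what grid would the raster occupy in the new CRS?", and projectRaster() fills that grid with resampled values.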


And then, when the incompatibility between CRS classes across packages is explained, the use of $proj4string doesn't come across too clearly. A good amount of effort goes into explaining the incompatibility; maybe a bit more could go into explaining how $proj4string restores compatibility across the packages. I think "st_crs(DEM_NAD83) == st_crs(SFtracts_NAD83)" gets at this point, but for a brief moment, looking at the output of st_crs(SFtracts_NAD83)$proj4string alone may not clearly show why this workaround works.

Or, if the main point is more about finding workarounds, then all is well. This is a bit nit-picky.

Same CRS Different EPSG

CRS Transformations

```r
st_crs(DEM_NAD83) == st_crs(SFtracts_NAD83)
```

is true because both CRSs are NAD83, but the former has `"EPSG",9122` and the latter `"EPSG",4269`.

So the workaround:

```r
DEM_NAD83 = projectRaster(DEM, projectExtent(DEM, st_crs(SFtracts_NAD83)$proj4string))
```

works, but it's a bit confusing because the alternative way of reprojecting is inputting the EPSG code. So it could be useful to clarify this difference between a CRS and an EPSG code.

st_length

There may be an error in the 'meters' column of the bike boulevard data.

```r
bart_lines$len_mi <- units::set_units(st_length(bart_lines), mi)
bart_lines$len_km <- units::set_units(st_length(bart_lines), km)
bart_lines$len_m  <- units::set_units(st_length(bart_lines), m)

head(bart_lines)
```

When you calculate the lengths after transforming to various CRSs:

```r
bart_lines$len_NAD83   <- units::set_units(st_length(st_transform(bart_lines, 26910)), m)
bart_lines$len_WebMarc <- units::set_units(st_length(st_transform(bart_lines, 3857)), m)
bart_lines$len_WGS84   <- units::set_units(st_length(st_transform(bart_lines, 4326)), m)
```

you see that Web Mercator (the len_WebMarc column) gives the closest length to the 'meters' column. The transformation to WGS84 is redundant and was done just to verify how the st_length function works, so in theory both of those outputs should be the same as the 'meters' column.

Caused some confusion during the workshop.

09_Raster_Data.Rmd, raster::getData()

I wonder if it's relevant to include a piece about the raster::getData() function when we need to import elevation data for the San Francisco bicycling pain analysis map. getData() is an extremely useful way to import geographical data directly into the R computing environment. The imported data can be a little cryptic, but there is at least one blog post that explains exactly what the function imports.
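For what it's worth, a sketch of how getData() might pull elevation data; the dataset names are real arguments to the function, but the coordinates are illustrative and the calls download files from the internet:

```r
library(raster)

# SRTM 90 m elevation for the tile containing San Francisco;
# lon/lat select the tile, and the downloaded file is cached locally
sf_elev <- getData("SRTM", lon = -122.4, lat = 37.8)

# Coarser country-level altitude data as an alternative
usa_elev <- getData("alt", country = "USA")
```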

Workshop Title

Workshop title should be "R-Geospatial-Data:-Parts-1-2"

NAD27

We say "NAD27 is old and inaccurate! Don't use it." and then use a DEM in Section 1 that uses NAD27... and then we later transform the other data into NAD27, which seems to go against the statement. It could be helpful to address this explicitly, use it to show why NAD27 is outdated, or just project into a different CRS overall.
