Git Product home page Git Product logo

sql-ecology-lesson's Introduction

Create a Slack Account with us Slack Status

Data Carpentry SQL Lesson for Ecologists

This repository contains the Data Carpentry SQL material based on ecological data. Please see our contribution guidelines for information on how to contribute updates, bug fixes, or other corrections.

Maintainers

Current maintainers

Past maintainers

sql-ecology-lesson's People

Contributors

almightybeeij avatar amyehodge avatar apawlik avatar ashander avatar bsmith89 avatar c-martinez avatar caseyyoungflesh avatar christinalk avatar emmanuellern avatar erinbecker avatar ethanwhite avatar fmichonneau avatar heereman avatar hlapp avatar jacquim avatar jd-foster avatar jrherr avatar juanfrh avatar kariljordan avatar kcranston avatar ltngo avatar maneesha avatar metajinomics avatar nickynicolson avatar orchid00 avatar pmarchand1 avatar remram44 avatar tracykteal avatar waveform80 avatar zkamvar avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sql-ecology-lesson's Issues

Encourage data download on landing page

It's encouraged, and the files needed are descrilbed in 00-sql-introduction but not on the landing page.

Mentioning the 4 files required on the landing page, rather than just providing the doi, might encourage a few more students to at least download the 16mb zip?

Potentially add a module on 'Using SQL with your own data'

This module would be the last one and would discuss how to use SQL with your own data. It could

  • Emphasize that SQLite is fine for their own work
  • Give an overview of what they would do, i.e. import your own CSV files like we did here
  • List the potential limitations of SQLite
  • List other database types and some of their strengths/weaknesses (not overwhelmingly technical, more just like "handles databases up to 10GB well", "is good for multiple users", "need access to a Linux server to use and install", "command line interface only")
  • Discuss that once a database is created, you can connect to it through other languages like R or Python (and that we'll demonstrate that in the capstone)
  • Have some links to materials or info on good database design

Contributing guidelines need fixes reflecting split-out lessons

  • Repo to submit pull requests to is no longer the datacarpentry repo.

  • Lessons are no longer in subdirectories:

    Every lesson has a sub-directory of its own, while individual topics are files in that directory. For example, the lessons/shell directory holding our introduction to the shell contains the files 00-intro.md, 01-filedir.md and so on.

Improvements to the JOINS section

I think it would be a good idea to start the "Joins and aliases" section with an example of what happens if we do a JOIN but do not use an ON, as this is not a necessity. Basically explaining that this will lead to a joining of all rows of one table to all rows of the other table. Then explaining why we don't want this, e.g. big tables=lots of calculations = lots of time, finally explaining that we want to join tables based on common entries in two or more tables using ON. Here we also might want to mention briefly, that this relates back to database design and table structure, as we can only join tables that have common table/column entries. So we need to think about how to design the DB upfront.

We should mention that
"ON surveys.species_id = species.species_id" and "USING (species_id)"
will not get the same result, as the "species_id" column in the later will only appear once.

Seb

Add info on ability to change data type within SQLite after data is imported

At the beginning we have people import the data from csv as an exercise. There's a step on import where you have to select the data type for the field. Some people select the wrong type or nothing at all and wonder what they can do to change it.

If you right click or Control click on the column name in the Structure tab, then there's an Edit Column option and you can type in a new data type. When you go to save it, it does warn you that this is dangerous behavior, so people are being warned, but it might be nice to show this option and also to say why it's not generally a good idea to change data types.

kramdown rendering

Is it safe to just remove markdown from the _config.yml in anticipation for the May 1st deprecation of redcarpet in profit of kramdown?

Pinging @datacarpentry/coordinators and @datacarpentry/maintainers too, maybe this is being discussed in issues on another repo.

Current online lesson pages are broken

The current lesson page for this repo linked from the main Data Carpentry site are displaying like this:

screenshot from 2016-08-29 09 10 06

When I build the pages locally using Jekyll they work, but are filled with FIXME's.

Looks like this has been down for at least 3 days. Cc'ing @gvwilson since he was involved in the most recent changes.

Question about teaching SQL lite

Hey Ethan, et al. I am working through the install instructions for SQL lite.

The lesson, as written instructs students to install SQL as a plugin for firefox. I went through that and it is fairly straight forward. However the SWC setup instructions suggest that Sql comes installed with OSX natively but there is a windows installer.

http://ouinformatics.github.io/2015-04-15-ncar/

How is SQLlite normally taught? Should we ask students to go the firefox route? Or is there a gui version of the database that runs independently of the firefox browser.

Thank you for any feedback.
Leah

On data import data emphasize types for the fields

Feedback from a workshop. When learners import data, we do have in step 10 the data types that are supposed to be used, but it's easy to miss, so maybe we could restructure that a little.

Also, for the import exercise, we don't tell students what the data types should be.

Format Key Points

These are not currently formatted in any of the episodes. If there are none, delete the box

  • 00-sql-introduction.md
  • 01-sql-basic-queries.md
  • 02-sql-aggregation.md
  • 03-sql-joins-aliases.md

Species not a variable - use species_id

Received via email:

FYI: In the section below, there is a typo. In the CHALLENGE, you are asked to query species. There is no species in the surveys table. You must use species_id.

Sorting
We can also sort the results of our queries by using ORDER BY. For simplicity, let’s go back to the species table and alphabetize it by taxa.

SELECT *
FROM species
ORDER BY taxa ASC;
The keyword ASC tells us to order it in Ascending order. We could alternately use DESC to get descending order.

SELECT *
FROM species
ORDER BY taxa DESC;
ASC is the default.

We can also sort on several fields at once. To truly be alphabetical, we might want to order by genus then species.

SELECT *
FROM species
ORDER BY genus ASC, species ASC;

Challenge
Write a query that returns year, species, and weight in kg from the surveys table, sorted with the largest weights at the top.

SQL aggregation "Null values" section: code blocks not formatted as code

In the "null values" section of the SQL Aggregation episode/page, we see:

But if we try to be extra clever, and find the average ourselves, we might get tripped up:

SELECT SUM(weight), COUNT(), SUM(weight)/COUNT() FROM summer_2000 WHERE species_id == ‘PE’

and the SELECT statement isn't formatted as a code block. This is because that SELECT statement -- like others on the same page -- isn't indented enough in the Markdown file. I'm fixing this with a pull request.

Concept map for "what is a relational database"

This issue springs from discussion of pull request #99 in which @heereman wrote a definition of a relational database, and @orchid00 and I thought it might be a good idea to add a concept map to the instructor's notes for this lesson.

My question:

It seems to me like the concepts and linkages @heereman mentioned in their definition are most of the concepts and linkages that ought to go into the concept map -- here's my list, and I'd love feedback on what else should go into the map for this lesson, or what I could leave out:

  • users/researchers
  • queries
  • SQL
  • data/values
  • storage
  • retrieval
  • tables
  • rows
  • columns
  • records
  • record IDs
  • atomic
  • relationships
  • joining
  • functions
  • nulls
  • aliases

I'd love to know others' feedback on what I could leave out of this! I'm thinking maybe "atomic"?

software we use to teach SQL

We currently have people install the Firefox plugin to use SQL. It's nice because it tends to work under all OSes, but there are some challenges because people can't save commands, they don't quite understand how they'll use it after a workshop, and because you can't zoom, when you teach the font is tiny and hard to read on the screen.

An alternative when we're teaching R already is to instead use the RSQLite package and teach SQL through R. This solves a lot of those problems and also works well with a learners workflow. @lwasser already has a lesson that includes it https://github.com/datacarpentry/R-ecology/blob/gh-pages/05-r-and-sql.Rmd and when she taught it, it went well. @karthik has also taught this way and suggested this approach in conversation.

Lesson Release checklist

Freeze Date: 2017-03-15
Scheduled Release Date: 2017-04-21

Checklist of tasks to complete before release:

  • check that the learning objectives reflect the content of the lessons
  • check that learning objectives are phrased as statements using action words
  • check for typos
  • check that the live coding examples work as expected
  • if example code generates warnings, explain in narrative and instructor notes
  • check that challenges and their solutions work as expected
  • check that the challenges test skills that have been seen
  • check that the setup instructions are up to date (e.g., update version numbers)
  • check that data is available and mentions of the data in the lessons are accurate
  • check that the instructor guide is up to date with the content of the lessons
  • check that all the links within the lessons work (this should be automated)
  • make sure formatting of the code in the lesson looks good (e.g. line breaks)
  • check for clarity and flow of narrative
  • update README as needed
  • fill out “overview” for each module - minutes needed for teaching and exercises, questions and learning objectives
  • check that contributor guidelines are clear and consistent
  • clean up files (e.g. delete deprecated files, insure filenames are consistent)
  • update the release notes (NEWS)
  • tag release on GitHub

Add mention of different JOIN types

In the lesson on joins, a brief mention of the existence of multiple types of join might be useful. If students try to apply JOIN to their own data and aren't aware that non-matching rows will be dropped, it could be frustrating. I would be happy to do a PR adding this caution to the end of "Joins" if desired.

committing directly to gh-pages, oops

In preparation for a workshop at UNC, I made several commits directly to gh-pages (checked with @ethanwhite who said "go ahead!' and then, "oops! I am no longer the maintainer). Note that none of them affect the lesson directly, only adding / editing instructor notes and also the lesson documentation (specifically, updating the maintainers)

Let me know if you are ok with these changes, or if you want me to revert and PR. @tpoisot @orchid00

Markdown headers

Is there a rationale for the markdown headers? It seems to be a mix of # and -, and I'm just a little bit obsessive about these things.

Should we decide on a standard, or is there one in place that is convoluted to the point of eluding me?

Add discussion of HAVING

I always end up discussing this because someone always asked about it. It should be included as part of the notes, probably at the end of 02.

Add more introduction to SQLite manager

Add some introduction to the buttons, tabs, and navigation in SQLite manager early in the lesson. This is something that has proved confusing for students in the past.

Add a list of functions

In 01-..., the "Calculated values" header mentions functions. Maybe a table would be nice.

Switch to using data from Portal Project Teaching Database

We've recently cleaned up the data for this lesson further and posted it on figshare. The lesson should be converted to using the figshare data, which involves:

  • Changing links
  • Changing the column names that have been modified
  • Changing any reference to species code NA to refer to NL since NA as replaced with NL

Which learning objectives should we assess?

Greetings everyone,

I'd appreciate your feedback on my current project. We've successfully revised the learning objectives for all of the Ecology lessons to reflect what we are teaching, and now we are in the process of developing surveys to assess our learners (before and after the workshop) on their skills and self-efficacy for the tools they were taught.

Your feedback is extremely valuable. In this document I've added the learning objectives for all of the Ecology lessons. I would appreciate if you'd open this document and add a +1 to the learning objectives you think are most important to assess our learners.

Additionally, I'm scheduling a virtual meeting to review the objectives, discuss your +1's, and come to a consensus. Please provide your availability here.

Our goal isn't to change what we're teaching, but to better understand what we're teaching, and ensure our learning objectives reflect that, so that we can assess our learners.

Thank you for your feedback and your time.
Kari

P.S. Thank you so much to all the maintainers who helped revise the learning objectives!

@ethanwhite @tpoisot @tracykteal @apawlik @fmichonneau @ChristinaLK @c-martinez @kcranston @heereman @waveform80 @nickynicolson @caseyyoungflesh @ErinBecker @jrherr @metajinomics @hlapp @aurielfournier @cbahlai @karthik

SQLite plugin install

sqlSetup.html should include instructions for installing the SQLite plugin for Firefox.

Do you want gh-pages for this repo?

Hi Ethan / Tracy,
I'm working on the next DC workshop and have created a gh-pages for the python lessons.

http://lwasser.github.io/python-ecology/

I plan to PR against the main DC repo when we are done. Do you want a gh-pages for SQL and shell as well so you can link to lessons like SWC does? i'm going through these now for our workshop and can edit as needed as i go.
Leah

Handling NULL values in SQL

The dataset we use has a lot of empty cells, so the issue of how to handle missing data often comes up. Can we add a section on handling missing data, like with IS NULL and IS NOT NULL?

check that the learning objectives reflect the content of the lessons [Lesson Release checklist]

in: 00-sql-introduction.md

The learning objectives listed at the top of this lesson are not all covered in this .md file and they are not in the same order as encountered in the lesson.

They should either match the 2 objectives given in the header, or be trimmed to the 3 that are covered:

  • Describe why relational databases are useful.
    
  • Define SQLite data types.
    
  • Create and populate a database from a text file.
    

Learning Objectives

Create and populate a database from a text file.
Describe why relational databases are useful.
Define SQLite data types.
Select, group, add to, and analyze subsets of data.
Combine data across multiple tables.

potentially change structure of JOIN statement

Our current JOIN statement is

SELECT *
FROM surveys
JOIN species ON surveys.species_id = species.species_id

When I restructured it as

SELECT *
FROM surveys JOIN species
ON surveys.species_id = species.species_id

participants suggested it made more sense to see on the same line the two tables that were being joined. I don't know if this conforms to standard formatting or if it would help beyond this one case, but just a thought.

From legacy Access to SQLite

In #45, @tpoisot suggests the idea of a "From Excel to SQL" capstone module. A related idea might be best practices for going "From legacy Access to SQLite". Is this beyond the scope of this lesson or does this sort of thing already exist elsewhere?

More detail on section about Views

@ethanwhite mentioned that Views are the standard way to save queries. This, along with how to export csv files of the data from a query are already mentioned in the current lesson.

Could we add some more detail to the Views section though, to explain what a view is, how to create it and how to use it again later? Something like 'the anatomy of a view'? I think both instructors and learners would benefit for some more detail in that section.

A few points I've run across in particular:

  • you can select Create View and it pops up a window. Do you copy and paste a query in there, or is it possible to create a View from a query you just ran?
  • Export View brings up an Export Wizard. You can export the CSV which makes sense that you're exporting the data that's the result of the current query. There's also the option to export the SQL and a couple buttons. Would you ever want to export the SQL? If you do, what buttons do you select?
  • Once you create a View, I found that I could see what query I ran by clicking on the 'Structure' tab, but that wasn't intuitive. I would have expected it to appear in the 'Execute SQL' tab.

Split material into lessons?

Currently all material is on one file. Most modules are split into smaller chunks - each one is a separate lesson. It may be good to do the same with SQL (?). This should also be combined with adding more examples and some more material / revising wording to expand the lessons.

check for clarity and flow of narrative [Lesson release checklist]

in: 00-sql-introduction.md

The definition below could specify that records (rows) and fields (columns). This is not necessarily obvious to a complete [novice.]

Definition: Relational database

A relational database is a digital database where all data is stored in relations (or tables) containing rows and columns. All tables contain records and all records are identified by a field contiaing a unique value. Every table shares at least one field with another table in one-to-one, one-to-many or many-to-many relationships. This allows the user to access the data in many ways {: .callout}

Improve one clause per line code block and discussion

There are currently two versions of the same SQL statement. The idea is to demonstrate that a better way of writing longer queries is one clause per line, but this isn't discussed in the notes. The code block should be split into two separate code blocks and language about the one clause per line approach added.

Numeric `IN` queries

I was working through the Basic queries lesson and I got stuck for a moment when trying to use numeric arguments to the IN keyword. Up until this point all the numeric queries are unquoted. However, I believe they need to be quoted when used with IN. Submitting a PR shortly with an example query.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.