shriram / gradescope-racket
Infrastructure to autograde Racket code on Gradescope
License: MIT License
At present, to test their code partway through an assignment, students have to stub out all the missing functionality. From an instructor's point of view, it would be easier to judge the completeness of a student's program by seeing the output of testing their code than by knowing only about the first missing function.
Some assignments (e.g., cs019 summer placement, or equivalent of cs019 data scripting) have lots of separate problems bundled into one homework. Students are likely to develop these incrementally. Should they get output for each function?
Right now the autograder halts when it can't find a definition of any of the required functions. This means they would get no feedback at all even if they're done with some problems. They can manually work around it with stub functions, but that would create busywork and produce irritating output, for no good use. (And may reveal something of the intended tests before they've even tried anything.)
There seem to be two alternatives:
1. Break down the homework into several individual assignments. This seems quite annoying.
2. Add support to the auto-grader to just skip tests associated with a name. This requires some redesign of the infrastructure, because name-extraction and testing are currently disjoint.
Nevertheless, the second option above seems to be the best way to go?
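A minimal sketch of what skipping by name could look like, assuming the grader can look a name up before running the tests attached to it. The helpers `lookup` and `run-tests-for`, and the stand-in namespace, are hypothetical and not part of the current infrastructure:

```racket
#lang racket/base
;; Sketch only: `lookup`, `run-tests-for`, and the stand-in namespace are
;; hypothetical and are not part of gradescope-racket.

;; Stand-in for a student submission that defines `double` but not `triple`.
(define submission-ns (make-base-namespace))
(eval '(define (double x) (* 2 x)) submission-ns)

;; Unique sentinel, so a student value of #f is not mistaken for "missing".
(define missing (gensym 'missing))

;; Returns the value bound to `name` in the submission, or `missing`.
(define (lookup name)
  (namespace-variable-value name #t (λ () missing) submission-ns))

;; Runs `tests` on the student's value, or reports the gap and moves on.
(define (run-tests-for name tests)
  (define v (lookup name))
  (cond
    [(eq? v missing)
     (printf "~a: not defined yet, skipping its tests\n" name)
     'skipped]
    [else (tests v)]))

(run-tests-for 'double (λ (f) (= (f 4) 8)))   ; runs the test
(run-tests-for 'triple (λ (f) (= (f 4) 12)))  ; skipped, no error
```

The point of the sketch is only that lookup and testing are joined per name, so one missing definition no longer halts the whole run.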
It would be nice to reverse the logic and have students get automated feedback on their tests as opposed to their code. This requires a somewhat different setup.
@shriram @wilbowma : Any objections to an MIT license?
@dbp Just FYI: https://pkgs.racket-lang.org/package/khoury-gradescope, https://github.com/northeastern-khoury/gradescope-racket/.
Originally posted by @jasonhemann in #25 (comment)
Above
The Docker image name `base-image` is too generic and may clash with other images (or at the very least is not memorable six months hence). Give it a better name. Update Dockerfiles, Makefile, and readme.
For whatever reason, I've found it difficult to wrap my head around making autograders with this: I think it's the copy-the-repo, edit-the-files workflow that is confusing me.
Working backwards, I started from wanting a single file to be an autograder for an assignment. They live in the repo for the course, and so should be self-contained, depending on the gradescope code in a library (and depending on shared pre-built docker images seems as bad / worse, given how poorly they seem to be maintained as infrastructure).
This is easy if you move the `lib-grade.rkt` code into a package (which I tentatively named `autogradescope`, though I haven't published it, as I wanted to open this issue first):
#!/usr/bin/env racket
#lang racket
(require autogradescope)
(require rackunit)
...
;; (define-var ...
;; (define-test-suite ...
The next thing that caused trouble for me was the hardcoded file names: students would often submit slight variations (capitalized, etc.), and while one could say this is a learning opportunity, it doesn't strike me as an important one. So, rather than `(define-var foo from "file.rkt")`, I changed it so that it finds whatever file in the submission directory has the right extension (which is configurable, but defaults to `.rkt`). If students are submitting multiple files, obviously this doesn't work (and perhaps that was the original motivation for the current design?).
(set-submission-extension! ".rkt") ; this is the default, so not actually needed.
(define-var my-function) ; these are found in the first file in the submission with the correct extension.
(define-var other-function)
...
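As a rough illustration of the lookup described above (not the package's actual code), the file search might be sketched like this. The helper name `find-submission` is hypothetical, and the case-insensitive comparison is an assumption made to tolerate names such as `Homework.RKT`:

```racket
#lang racket
;; Sketch only: `find-submission` is a hypothetical helper, not the
;; package's actual lookup code.

;; Returns the first file in `dir` (in directory-list order) whose extension
;; matches `ext`, ignoring case, or #f when no file matches.
(define (find-submission dir [ext ".rkt"])
  (define wanted (string-downcase ext))
  (for/first ([p (in-list (directory-list dir #:build? #t))]
              #:when (file-exists? p)
              #:when (let ([e (path-get-extension p)])
                       (and e
                            (string=? (string-downcase (bytes->string/utf-8 e))
                                      wanted))))
    p))
```

On Gradescope this would presumably be called on the submission directory, for example `(find-submission "/autograder/submission")`.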
The last thing that I changed was how testing of the autograders is done (i.e., on our own computers, before we send them to Gradescope). While there is certainly value in running on the same image (or, hopefully the same image) that runs on Gradescope, I suspect most of us rely on the fact that Racket 8.9 is Racket 8.9, across platforms (as, indeed, we need our testing to match what our students are doing), and so running natively should be "good enough", and can be a lot lighter weight: an environment variable can hardcode the path to the file where definitions are loaded from (and, when it is present, the results are printed to stdout):
$ SUBMISSION=./reference-soln.rkt ./run_autograder
{"score":"96.42857142857143","tests":[{"output":"Execution error in test named «blah»"}]}
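The environment-variable switch could be sketched roughly as follows. `SUBMISSION` comes from the example above; the helper names are hypothetical, and the two paths are the locations Gradescope uses inside its autograder container:

```racket
#lang racket/base
;; Sketch only: the helper names are hypothetical. SUBMISSION is the
;; environment variable from the example above.

;; Gradescope's locations inside the autograder container.
(define gradescope-submission-dir "/autograder/submission")
(define gradescope-results-file "/autograder/results/results.json")

;; #t when the grader is being run locally against a specific file.
(define (local-run?)
  (and (getenv "SUBMISSION") #t))

;; Where to load student definitions from.
(define (submission-source)
  (or (getenv "SUBMISSION") gradescope-submission-dir))

;; Sends the JSON results to stdout locally, or to the results file
;; when running on Gradescope.
(define (write-results json-string)
  (if (local-run?)
      (displayln json-string)
      (call-with-output-file gradescope-results-file
        #:exists 'truncate
        (λ (out) (displayln json-string out)))))
```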
Obviously, these are a bunch of changes, and some actively conflict with how this library currently works. At the same time, they aren't mutually exclusive (e.g., there can easily be two forms of `define-var`), and if you were interested in turning this into a library (to support my first goal: having the autograders be single files, not depending on prebuilt docker images), I could certainly do a more careful job merging. On the other hand, I'm also fine with having a fork (which is why I haven't called the library `gradescope-racket`). One thing though: could you stick a license on this code? :)
Because BSL does not export names, we currently use namespace tools to extract the name from inside the module. This works across many languages, but it has a subtle flaw: it does not respect `rename-out`. For instance, in
(provide (rename-out [three four]))
(define three 3)
(define four 4)
the value of `four` obtained from requiring the module is `3`, but when obtained from the module's namespace, it's `4`.
Therefore, either the name extraction should respect `rename-out`, or we should first try to obtain the name through regular means, and resort to namespace inspection only as a last resort. (A language without `provide` has no need for `rename-out`, so this should be safe.)
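A sketch of the "regular means first, namespace last" order, assuming the submission can be loaded as a module from a file. The helper `extract` is hypothetical, the temporary file stands in for a student submission, and `hidden` is an extra unexported name added only for illustration:

```racket
#lang racket/base
;; Sketch only: `extract` is a hypothetical helper, and the temporary file
;; stands in for a student submission.

(define student-file (make-temporary-file "student~a.rkt"))
(with-output-to-file student-file #:exists 'truncate
  (λ ()
    (displayln "#lang racket/base")
    (displayln "(provide (rename-out [three four]))")
    (displayln "(define three 3)")
    (displayln "(define four 4)")
    (displayln "(define hidden 5)")))

;; Unique sentinel returned when the module does not export the name.
(define no-export (gensym 'no-export))

;; Prefer the module's exports; inspect the module's namespace only when
;; the name is not exported at all.
(define (extract mod name)
  (dynamic-require mod #f) ; make sure the module is instantiated
  (define v (dynamic-require mod name (λ () no-export)))
  (if (eq? v no-export)
      (namespace-variable-value name #t #f (module->namespace mod))
      v))

(extract student-file 'four)    ; the exported binding, as with `require`
(extract student-file 'hidden)  ; not exported, so found via the namespace
```

With this order, a language that exports nothing always falls through to the namespace path, which matches the current behavior for BSL.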
The Makefile currently hardcodes `shriramk` in the grading image name. This should be a parameter.
To avoid Gradescope build times, there's a recommendation (also from Joe Politz) to pull the autograder from a git repository.
https://gradescope-autograders.readthedocs.io/en/latest/git_pull/
Then there's only one build, and every update is automatically seen. However, this seems extremely wasteful (a git pull for each student). Furthermore, it's unclear that it's easy to set up for this project, because it's meant to be generic: the people customizing it would have to be able to set up their own public git repositories, etc.
The much better solution is to upload an image to Gradescope instead, and let Docker's layer caching, etc., do its job. In the meanwhile, a user comfortable enough with following Gradescope's instructions above can probably figure out how to set this up for themselves.
When obtaining multiple names, it's a nuisance to write `define-var` over and over again. Instead, provide a `define-vars` that enables extracting multiple names.
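One way such a form could be written is as a thin macro over the single-name form. In this sketch the `define-var` shown is only a stand-in that reads from a hash table (the library's real `define-var` loads names from a submission), and `submission` is a hypothetical placeholder:

```racket
#lang racket/base
;; Sketch only: this `define-var` is a stand-in for the library's form,
;; and `submission` is a hypothetical table of student definitions.

(define submission
  (hash 'my-function add1
        'other-function sub1))

;; Stand-in: binds `name` to the value stored under that name.
(define-syntax-rule (define-var name)
  (define name (hash-ref submission 'name)))

;; Expands into one `define-var` per name.
(define-syntax-rule (define-vars name ...)
  (begin (define-var name) ...))

(define-vars my-function other-function)
```

After the last line, `my-function` and `other-function` are ordinary bindings, exactly as if `define-var` had been written once for each.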
An outstanding bug seems to get in the way of `raco` installing additional packages when using the Racket from the PPA. The .sh installation works around this. Would you consider a PR implementing the suggested work-around?
Create a student test file with the same names as used by the grader and check that these don't override the grader's names and hence behavior.
The documentation doesn't explain that it should be used, and why.
My students will sometimes accidentally code infinite loops. As written, the entire test suite will time out when I test such code, which isn't as helpful as it could be.
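(require racket/engine)
;; *TIMEOUT-MS* is assumed to be defined elsewhere as the time budget in milliseconds.
;; Unique tag marking a timed-out result.
(define timeout-symbol (gensym 'timeout))
;; Evaluates e in an engine; yields its value, or a tagged list on timeout.
(define-syntax-rule (run-w/timeout e)
  (let ([w/timeout (engine (λ (f) e))])
    (if (engine-run *TIMEOUT-MS* w/timeout)
        (engine-result w/timeout)
        (list timeout-symbol "test timed out"))))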
I'm currently needing to wrap this around almost all my tests, which seems gross and wrong. In principle I'll want to timeout on almost any tests, b/c they could accidentally loop in any program they write. It's a shame to have to set one single time-out.
@wilbowma , are you currently using https://gist.github.com/wilbowma/79330280f474ecc456916787028206cc ? Matthias suggests: https://www.mail-archive.com/[email protected]/msg32199.html on
So what's the right API and a good standard implementation to support it?
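One possible shape, offered only as a sketch for discussion: a parameter holds the default budget, so a single test can override it with `parameterize` instead of every test sharing one time-out. The names `current-test-timeout-ms` and `call-with-timeout` are hypothetical and not an existing API of this library:

```racket
#lang racket/base
(require racket/engine)
;; Sketch only: `current-test-timeout-ms` and `call-with-timeout` are
;; hypothetical names, not an existing API of this library.

;; Default budget per test, in milliseconds; override with `parameterize`.
(define current-test-timeout-ms (make-parameter 1000))

;; Runs `thunk` in an engine. Returns its result if it finishes in time;
;; otherwise kills the engine and returns the result of `on-timeout`.
(define (call-with-timeout thunk
                           [on-timeout (λ () (error 'test "timed out"))])
  (define e (engine (λ (_allow-suspend) (thunk))))
  (cond
    [(engine-run (current-test-timeout-ms) e) (engine-result e)]
    [else (engine-kill e) (on-timeout)]))

;; A fast computation finishes normally.
(call-with-timeout (λ () (+ 1 2)))

;; A looping computation is cut off after a 50 ms budget for this test only.
(parameterize ([current-test-timeout-ms 50])
  (call-with-timeout (λ () (let loop () (loop)))
                     (λ () 'timed-out)))
```

If the test forms themselves applied such a wrapper, authors would not need to wrap each test by hand.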