dlab-berkeley / cloud-computing-working-group
License: Other
The wiki attached to cloud-computing-working-group can be modified by me, and maybe by anyone, through the web interface. But if I check the wiki out using
git clone \
https://github.com/dlab-berkeley/cloud-computing-working-group.wiki.git
Then I am not allowed to push changes upstream. Is this a feature? I would
like to use git to upload some notes into the wiki.
We went over SSH in the first meeting, but I forget what was recommended! Also, should we be doing some port/IP filtering in addition to that? I'll admit I don't know much about how the internet works. It's all trucks and tubes to me.
Let's set up this GitHub repo to publish upcoming events and info from our draft meeting schedule, the way that THW does.
Per a 2015-10-29 discussion with Harrison Dekker, it'd be great to have a cookbook for cloud provisioning of what amounts to a "virtual computing lab" - either a multi-user server or a set of virtual machine instances and other facilities that can be spun up on demand for a training, workshop, or short course.
The use case that Dekker came up with was setting up RStudio Server and/or a SQL database for trainings. My own recent use case was an inquiry to Research IT from a student teaching a URAP course, who needed to provide students with Chinese-language web scraping software.
My understanding is that folks like @aculich, @ryanlovett, @paciorek, et al. may have many pieces of this they've developed for Stat courses. If we can put together some generic, vanilla templates, scripts and/or instructions for doing this, that'd be of significant benefit.
For starters, use the form. :)
I'd like to know how much memory I need in an instance to complete a job, but I'm not sure how best to predict that. The job script is up on this repo and involves running a C program to find maximal cliques in a graph. Is there a best practice around predicting memory usage? Is there a shell script I could run that would gather memory usage, maybe on smaller test batches?
The input data is at worst a two-column matrix with 27,797,685 rows containing 352,151 unique integers. I'm not sure what the largest working memory allocation in the maximal cliques routine actually is, but I'm hoping there's a practical test that would prevent me from having to study the code (I'm not a C programmer). No surprise, I get a segfault trying to run this on the free-tier BCE instance on EC2, which has 1 GB of memory.
I'm grateful for any advice about the cheapest instance I can run that would finish this job!
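One practical way to approach the memory question without reading the C code is to measure peak resident memory on progressively larger test batches and extrapolate. The sketch below is not from the repo: `peak_rss_kb` is a made-up helper name. It assumes GNU time (`/usr/bin/time -v`) when that is installed, and otherwise polls `VmHWM` in `/proc`, which exists only on Linux and only covers the top-level process.

```shell
#!/usr/bin/env bash
# Sketch: print the peak resident memory (in kB) used by a command.
# peak_rss_kb is a hypothetical helper name, not part of any tool.
peak_rss_kb() {
    local log pid cur peak=0
    if /usr/bin/time -v true >/dev/null 2>&1; then
        # GNU time is available; it reports
        # "Maximum resident set size (kbytes)" in its verbose output.
        log=$(mktemp)
        /usr/bin/time -v -o "$log" "$@" >&2   # job output goes to stderr
        awk -F': ' '/Maximum resident set size/ {print $2}' "$log"
        rm -f "$log"
    else
        # Fallback: poll VmHWM (the high-water mark of resident memory)
        # while the job runs. Growth after the last poll is missed.
        "$@" >&2 &
        pid=$!
        while kill -0 "$pid" 2>/dev/null; do
            cur=$(awk '/^VmHWM:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
            if [ -n "$cur" ] && [ "$cur" -gt "$peak" ]; then
                peak=$cur
            fi
            sleep 0.2
        done
        echo "$peak"
    fi
}
```

Running something like `peak_rss_kb ./cliques sample_1pct.txt` (both names are placeholders) on 1%, 5%, and 10% samples gives a few points to extrapolate from, though clique enumeration may well not scale linearly with input size. As a floor, the full edge list alone is 27,797,685 x 2 x 4 bytes, roughly 222 MB, if it is stored as 4-byte integers, so 1 GB leaves little headroom.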
If we're penny pinching, how can we add notifications in our bash scripts to ping us when we should shut down the instance?
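A minimal sketch of one way to do this: wrap the job in a function that records the exit status and then calls a notification hook. Both `run_and_notify` and `notify` are hypothetical names, and the hook here only appends to a log file; the `mail` line in the comment assumes a working mail setup on the instance, which may not exist.

```shell
#!/usr/bin/env bash
# notify: hypothetical hook. As written it only appends to a log file.
# Swap the body for whatever actually reaches you, for example
#   mail -s "$1" you@example.com </dev/null
# (assumes a configured mail command; the address is a placeholder).
notify() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" \
        >> "${NOTIFY_LOG:-$HOME/job-notify.log}"
}

# run_and_notify CMD [ARGS...]: run the job, then report how it ended.
run_and_notify() {
    local status=0
    "$@" || status=$?
    notify "job [$*] finished with exit status $status"
    return "$status"
}
```

It would be called as `run_and_notify ./my_job.sh` (a placeholder script name). To stop the meter automatically as well, following the call with `sudo shutdown -h now` should stop an EBS-backed instance, since stop is the default instance-initiated shutdown behavior, but it is worth checking that setting before relying on it.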
If a job is complete or if we know that we need to press pause on an instance, what are our options for stopping the meter? Do we have to shut it down entirely, or is there a pause button?
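On the pause-button question: for EBS-backed instances, stopping (as opposed to terminating) ends the instance charge while keeping the root volume, which is still billed as storage; terminating discards the instance and, by default, its root volume. Here is a minimal sketch with the AWS CLI, assuming it is installed and configured with credentials. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumes the AWS CLI is installed and configured.
# AWS can be overridden (e.g. AWS=echo) to preview commands without running them.
AWS=${AWS:-aws}

# Stop ("pause") an instance: compute billing ends, EBS storage is still billed.
ec2_pause() {
    $AWS ec2 stop-instances --instance-ids "$1"
}

# Start a previously stopped instance again.
ec2_resume() {
    $AWS ec2 start-instances --instance-ids "$1"
}

# Print the current state name (running, stopping, stopped, ...).
ec2_state() {
    $AWS ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].State.Name' --output text
}
```

Usage would look like `ec2_pause i-0123456789abcdef0` (a placeholder ID). One thing to keep in mind: unless the instance has an Elastic IP, its public IP address will usually be different after a stop and start.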
@aculich set me up with RStudio Server running on my instance (we'll create documentation for that elsewhere if you want to do it too). I tried to log in again later and it wasn't working! I forgot that I'd set the security group using the MyIP option, so when my IP changed EC2 wouldn't let me through. I was a little confused because my server said it was listening and all looked good from inside the instance.
I updated the allowed IP in the security group rule using the console menu and it worked like a charm! Will I have to go into the EC2 console every time I log in though? What if I want collaborators to access RStudio Server and I don't know their IPs?
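Two possible ways around the console, sketched under assumptions: update the security group rule from the command line, or skip IP rules for RStudio entirely and reach it through an SSH tunnel, which only needs port 22 open and an SSH key for each collaborator. The function names are hypothetical, and port 8787 is RStudio Server's default, which may not match a customized install.

```shell
#!/usr/bin/env bash
# Assumes the AWS CLI and an ssh client are installed.
# AWS and SSH can be overridden (e.g. AWS=echo) to preview the commands.
AWS=${AWS:-aws}
SSH=${SSH:-ssh}

# allow_ip GROUP_ID PORT IP: open PORT to a single IPv4 address.
allow_ip() {
    $AWS ec2 authorize-security-group-ingress \
        --group-id "$1" --protocol tcp --port "$2" --cidr "$3/32"
}

# rstudio_tunnel USER@HOST: forward local port 8787 to RStudio Server
# on the instance; then browse to http://localhost:8787 locally.
rstudio_tunnel() {
    $SSH -N -L 8787:localhost:8787 "$1"
}
```

For example, `allow_ip sg-0abc 8787 "$(curl -s https://checkip.amazonaws.com)"` would add the current address (the group ID is a placeholder), and `rstudio_tunnel ubuntu@my-instance` opens the tunnel (the user and host are placeholders too). Old rules are not removed by this sketch, so stale addresses would accumulate until they are revoked.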