Comments (8)
Here's what I'm proposing:
Two services:
- odot_crash_data: will contain the ODOT crash data.
- passenger_census: will contain the ridership data; the name passenger_census comes from the CSV file we received.
Container port numbers and their host mappings and postgres
user passwords will be set from a local .env file.
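As a rough sketch of the .env arrangement described above, the shell function below checks that a local .env file defines every variable the containers would need before they are started. The variable names (ODOT_CRASH_DATA_HOST_PORT, PASSENGER_CENSUS_HOST_PORT, POSTGRES_PASSWORD) and the function name check_env are hypothetical; the proposal does not name them.

```shell
#!/bin/sh
# Hypothetical variable names; the proposal only says that ports and
# postgres passwords come from a local .env file.
REQUIRED_VARS="ODOT_CRASH_DATA_HOST_PORT PASSENGER_CENSUS_HOST_PORT POSTGRES_PASSWORD"

# check_env FILE: report required variables that FILE does not define
# with a non-empty value. Returns 0 if all are present, 1 otherwise.
check_env() {
    env_file="$1"
    missing=""
    for name in $REQUIRED_VARS; do
        # A variable counts as set if a line "NAME=<non-empty>" exists.
        if ! grep -Eq "^${name}=.+" "$env_file"; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing in ${env_file}:${missing}" >&2
        return 1
    fi
    return 0
}
```

A call such as `check_env .env && docker-compose up` would then refuse to start the services when a value is missing. Docker Compose itself reads a .env file from the project directory for variable substitution, so the check is only a guard, not a loader.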
We need to define a mechanism for the Dockerfiles to acquire the input database dump files without the user having to download them. In other words, I want to be able to do a wget or curl in the Dockerfile that runs at image build time, rather than doing it with a Dockerfile COPY. This is something we have to get nailed down for DevOps / deployment anyway, so we might as well solve it this week. ;-) See hackoregon/civic-devops#3.
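One possible shape for the build-time download, sketched as a small shell helper that a Dockerfile RUN step could call. The function name fetch_dump is hypothetical, and no real dump URL is assumed here since the thread has not settled where the files will live.

```shell
#!/bin/sh
# fetch_dump URL DEST: download URL to DEST, preferring curl and
# falling back to wget. Returns non-zero if the download fails or
# neither tool is installed.
fetch_dump() {
    url="$1"
    dest="$2"
    if command -v curl >/dev/null 2>&1; then
        # -f: fail on HTTP errors, -sS: quiet but still print errors,
        # -L: follow redirects, -o: write to DEST
        curl -fsSL -o "$dest" "$url"
    elif command -v wget >/dev/null 2>&1; then
        # -q: quiet, -O: write to DEST
        wget -q -O "$dest" "$url"
    else
        echo "neither curl nor wget is installed" >&2
        return 1
    fi
}
```

One caveat with this approach: anything passed into the image at build time, including credentials for a private download, can end up recorded in the image layers, so it fits publicly downloadable dumps better than protected ones.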
from transportation-systems.
I'll put some data in my personal dev S3 account, set up a billing alert, and we can play around a little.
If we can get a proof of concept and a cost estimate, I imagine adoption would be pretty quick. In my mind this should be a priority, because it ensures we are all working from the same data and saves the manual hours spent updating it.
from transportation-systems.
OK ... how does S3 authentication work? Is it like everything else (a PEM key, ssh-stuff?)
from transportation-systems.
Access and secret key.
You will need to add the AWS CLI client to your Dockerfile:
RUN pip install --upgrade --user awscli
We did something similar to pull our secrets last year:
https://github.com/hackoregon/backend-service-pattern/blob/master/bin/getconfig.sh
Which was called in the entrypoint file:
https://github.com/hackoregon/backend-service-pattern/blob/master/bin/docker-entrypoint.sh
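A minimal sketch of what an entrypoint-time pull could look like with the AWS CLI. This is not a copy of getconfig.sh; the function name pull_dump and its arguments are hypothetical. It relies on the AWS CLI reading the access and secret key from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

```shell
#!/bin/sh
# pull_dump BUCKET KEY DEST: copy s3://BUCKET/KEY to the local path DEST.
# Returns 1 without calling the CLI if the credentials are not in the
# environment; otherwise returns the exit status of "aws s3 cp".
pull_dump() {
    bucket="$1"
    key="$2"
    dest="$3"
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
        return 1
    fi
    aws s3 cp "s3://${bucket}/${key}" "$dest"
}
```

Because the keys are supplied when the container starts rather than when the image is built, they stay out of the image itself.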
from transportation-systems.
Yeah - syncing with S3 is built into cookiecutter's data science template
from transportation-systems.
OK, so I went ahead and set up the following access policy (actual bucket name is redacted):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "<ACTUAL ARN>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": [
        "<ACTUAL ARN>"
      ]
    }
  ]
}
I then attached this policy to an IAM group and created a user within it. Will provide creds through Slack.
The creds will work for either a Docker or a cookiecutter setup, as you wish. It looks like cookiecutter is using the sync command from the CLI, so we may need to name the folder within the bucket "data"?
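For reference, a hedged sketch of a sync in that style, assuming the bucket keeps its files under a "data" folder as suggested above. The function name sync_data_from_s3 and the bucket argument are placeholders, and the exact command the cookiecutter template runs may differ.

```shell
#!/bin/sh
# sync_data_from_s3 BUCKET [LOCAL_DIR]: mirror s3://BUCKET/data/ into
# LOCAL_DIR (default: ./data). Only new or changed files are copied.
sync_data_from_s3() {
    bucket="$1"
    local_dir="${2:-data}"
    aws s3 sync "s3://${bucket}/data/" "${local_dir}/"
}
```

With the read-only policy above, syncing from the bucket should work, while a sync in the other direction would be rejected because no write actions are allowed.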
from transportation-systems.
I'm hacking away on this in https://github.com/hackoregon/data-science-pet-containers. It's just about where I want it, so I'm planning a "formal release" later this week.
I'm testing a utility called rclone (https://rclone.org/) for the cloud syncing. It's available in all the Linux distros, including Debian. It seems to be well maintained and will sync just about anywhere, not just S3. But IMHO it is not suitable for deployment, just for desktops. It's interactive and its secrets management scheme would probably rule out its use even in self-managed servers.
from transportation-systems.
I put this on the back burner for the Tech Challenge but I'm back on it. I just have one major documentation task and another example scenario to do.
from transportation-systems.