
Comments (8)

znmeb avatar znmeb commented on September 8, 2024

Here's what I'm proposing:

Two services:

  1. odot_crash_data - will contain the ODOT crash data.
  2. passenger_census - will contain the ridership data; the name passenger_census comes from the CSV file we received.

Container port numbers and their host mappings and postgres user passwords will be set from a local .env file.

We need to define a mechanism for the Dockerfiles to acquire the input database dump files without the user having to download them. In other words, I want to be able to do a wget or curl in the Dockerfile that runs at image build time, rather than doing it with a Dockerfile COPY. This is something we have to get nailed down for DevOps / deployment anyway, so we might as well solve it this week. ;-) See hackoregon/civic-devops#3.
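A minimal sketch of the build-time fetch, assuming a small stdlib-only Python script that a Dockerfile RUN step could call in place of wget or curl. The function name fetch_dump and any URL or file name passed to it are hypothetical, not the project's actual mechanism, and a private bucket would still need credentials on top of this.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def fetch_dump(url: str, dest: Path) -> str:
    """Download url to dest and return the SHA-256 hex digest of the saved file.

    Hypothetical helper: the URL and destination are supplied by the caller.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk so a large dump is never held in memory.
    with urllib.request.urlopen(url) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
    # Hash the saved file in 1 MiB chunks so the build can verify what it got.
    digest = hashlib.sha256()
    with dest.open("rb") as saved:
        for chunk in iter(lambda: saved.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Returning the digest lets the build compare it against a known value and fail early if the dump changed or the download was truncated.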

from transportation-systems.

BrianHGrant avatar BrianHGrant commented on September 8, 2024

I'll put some data in my personal dev S3 account and set up a billing alert, and we can play around a little.

If we can get a proof of concept and a cost estimate, I imagine adoption would be pretty quick. In my mind this should be a priority, because then we ensure we are all working from the same data and save the manual hours spent updating it.

znmeb avatar znmeb commented on September 8, 2024

OK ... how does S3 authentication work? Is it like everything else (a PEM key, ssh-stuff?)

bhgrant8 avatar bhgrant8 commented on September 8, 2024

Access and secret key.

You will need to add the AWS CLI client to your Dockerfile:

RUN pip install --upgrade --user awscli

We did something similar to pull our secrets last year:

https://github.com/hackoregon/backend-service-pattern/blob/master/bin/getconfig.sh

Which was called in the entrypoint file:

https://github.com/hackoregon/backend-service-pattern/blob/master/bin/docker-entrypoint.sh

znmeb avatar znmeb commented on September 8, 2024

Yeah - syncing with S3 is built into cookiecutter's data science template

bhgrant8 avatar bhgrant8 commented on September 8, 2024

OK, so I went ahead and set up the following access policy (the actual bucket name is redacted):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "<ACTUAL ARN>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": [
                "<ACTUAL ARN>"
            ]
        }
    ]
}

I then attached this policy to an IAM group and created a user within it. I will provide the creds through Slack.
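For anyone recreating a policy like this, here is a hedged sketch that generates the same shape of document. The bucket name "example-bucket" and the function read_only_policy are hypothetical, and the redacted ARNs above may differ. The one general point it illustrates is that s3:ListBucket is granted on the bucket ARN, while the object-level actions are granted on object ARNs under it.

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Build a read-only S3 policy document for one bucket (illustrative only)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing keys is a bucket-level action.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading objects is an object-level action, hence the /* suffix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


# "example-bucket" is a placeholder, not the real (redacted) bucket.
print(json.dumps(read_only_policy("example-bucket"), indent=4))
```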

The creds will work for either a Docker or a cookiecutter setup, as you wish. It looks like cookiecutter is using the sync command from the CLI:

https://github.com/drivendata/cookiecutter-data-science/blob/master/%7B%7B%20cookiecutter.repo_name%20%7D%7D/Makefile#L47

It looks like we may need to name the folder within the bucket "data"?
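As a rough illustration of why the folder name matters, the sketch below assembles an aws s3 sync invocation that pulls a "data" prefix from the bucket into a local data directory. The function sync_command, the bucket name, and the defaults are hypothetical; the exact command in the cookiecutter Makefile may differ, so check the linked line before relying on it.

```python
import shlex


def sync_command(bucket, prefix="data", local_dir="data", profile=None) -> list:
    """Return the argv for downloading s3://<bucket>/<prefix>/ into <local_dir>/."""
    cmd = ["aws", "s3", "sync", f"s3://{bucket}/{prefix}/", f"{local_dir}/"]
    if profile:
        # Use a named AWS CLI profile instead of the default credentials.
        cmd += ["--profile", profile]
    return cmd


# "example-bucket" is a placeholder; pass the list to subprocess.run to execute it.
print(shlex.join(sync_command("example-bucket")))
```

If the bucket folder were named something other than "data", only the prefix argument would need to change.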

znmeb avatar znmeb commented on September 8, 2024

I'm hacking away on this in https://github.com/hackoregon/data-science-pet-containers. It's just about where I want it, so I'm planning a "formal release" later this week.

I'm testing a utility called rclone (https://rclone.org/) for the cloud syncing. It's available in all the Linux distros, including Debian. It seems to be well maintained and will sync just about anywhere, not just S3. But IMHO it is not suitable for deployment, just for desktops. It's interactive and its secrets management scheme would probably rule out its use even in self-managed servers.

znmeb avatar znmeb commented on September 8, 2024

I put this on the back burner for the Tech Challenge but I'm back on it. I just have one major documentation task and another example scenario to do.
