ethyca / fides

The Privacy Engineering & Compliance Framework

Home Page: https://ethyca.com/docs

License: Apache License 2.0

Dockerfile 0.04% Python 62.42% Mako 0.01% TypeScript 36.52% JavaScript 0.27% CSS 0.33% HTML 0.27% SCSS 0.09% Jinja 0.04%
python data data-privacy data-privacy-compliance developer-tools gdpr privacy-as-code hacktoberfest

fides's Introduction

Meet Fides: Privacy as Code

⚡ Overview

Fides (pronounced /fee-dhez/, from Latin: Fidēs) is an open-source privacy engineering platform for managing the fulfillment of data privacy requests in your runtime environment, and the enforcement of privacy regulations in your code.

🚀 Quick Start

Getting Started

To help you get started quickly, the Fides CLI bundles a sample project that sets up a server, a Privacy Center, and a sample application for you to experiment with.

Minimum requirements (for all platforms)

  • Docker (version 20.10.11 or later)
  • Python (version 3.8 through 3.10)
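The Python requirement above can be checked from inside the interpreter before installing. This is a minimal sketch that assumes the supported range is exactly 3.8 through 3.10, as listed; `is_supported_python` is a hypothetical helper, not part of Fides.

```python
import sys

# Supported interpreter range from the requirements list above (inclusive).
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 10)


def is_supported_python(version=None):
    """Hypothetical helper: True if the (major, minor) version is within 3.8-3.10."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if is_supported_python():
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is outside the supported range (3.8-3.10).")
```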

Download and install Fides

Tip

We highly recommend setting up a Python virtual environment such as venv to install Fides into. For example:

mkdir ~/fides
cd ~/fides
python3 -m venv venv
source venv/bin/activate

Once your virtual environment is ready, you can easily download and install Fides using pip. Run the following command to get started:

pip install ethyca-fides

Deploy the Fides sample project

By default, Fides ships with a small project belonging to a fictional e-commerce store. Running the fides deploy up command builds a Fides project with everything you need to run your first Data Subject Request (DSR) against real databases.

fides deploy up

Explore the sample project

When your deployment finishes, a welcome screen will explain the key components of Fides and the sample "Cookie House" project.

If your browser does not open automatically, navigate to http://localhost:3000/landing.

The project contains:

  • The Fides Admin UI for managing privacy requests
  • The Fides Privacy Center for submitting requests
  • The sample "Cookie House" eCommerce site for testing
  • A DSR Directory on your computer to view results (./fides_uploads)

Run your first Privacy Access Request

Navigate to the Fides Privacy Center (http://localhost:3001), submit a "Download your data" request, provide the email address for the sample user ([email protected]), and submit the request.

Then, navigate to the Fides Admin UI (http://localhost:8080) to review the pending privacy request.

Use username root_user and password Testpassword1! to log in, approve the request, and review the resulting package in your ./fides_uploads folder!
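If a page does not load, it can help to confirm that the sample project's web services are listening. This is a minimal standard-library sketch; the ports come from the URLs above (3000 for the landing page, 3001 for the Privacy Center, 8080 for the Admin UI), and both helper names are hypothetical, not part of the Fides CLI.

```python
import socket

# Ports taken from the sample-project URLs in this guide.
SAMPLE_SERVICES = {
    "Landing page": 3000,
    "Privacy Center": 3001,
    "Admin UI": 8080,
}


def is_port_open(host, port, timeout=1.0):
    """Hypothetical helper: True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_sample_services(host="localhost"):
    """Hypothetical helper: map each sample service name to whether its port is open."""
    return {name: is_port_open(host, port) for name, port in SAMPLE_SERVICES.items()}
```

After fides deploy up finishes, calling check_sample_services() should report True for all three services once the containers are ready.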

Next Steps

Congratulations! You've just run an entire privacy request in under 5 minutes! Fides offers many more tools to help take control of your data privacy. To find out more, you can run a privacy request on your own infrastructure, discover data mapping, or learn about the Fides Taxonomy.

📖 Learn More

The Fides core team is committed to providing a variety of documentation to help you get started using Fides. All interactions are governed by the Fides Code of Conduct.

Documentation

For more information on getting started with Fides, how to configure and set up Fides, and more about the Fides ecosystem of open source projects:

Support

Join the conversation on:

Contributing

We welcome and encourage all types of contributions and improvements! Please see our contribution guide for how to open issues for bugs, new features, and security or experience enhancements.

Read about the Fides community or dive into the contributor guides for information about contributions, documentation, code style, testing, and more. Ethyca is committed to fostering a safe and collaborative environment, and all interactions are governed by the Fides Code of Conduct.

โš–๏ธ License

The Fides ecosystem of tools is licensed under the Apache Software License Version 2.0. Fides tools are built on fideslang, the Fides language specification, which is licensed under CC BY 4.0.

Fides is created and sponsored by Ethyca: a developer tools company building the trust infrastructure of the internet. If you have questions or need assistance getting started, let us know at [email protected]!

โš ๏ธ Advanced Setup for Microsoft SQL Server (MSSQL) Support

By default, running pip install ethyca-fides locally will not install the optional Python libraries needed for Microsoft SQL Server, since these rely on an additional system dependency (freetds). If you do want to connect to MSSQL, you have two options:

  1. Use our pre-built Docker images which install these optional dependencies automatically: ethyca/fides. See our Deployment Guide for more!
  2. Install the required dependencies on your local development machine and run pip install ethyca-fides[all] to include "all" the optional libraries. Keep reading to learn more about this!

For local development setup on macOS, follow these steps:

  1. Install the required development libraries from Homebrew:
brew install freetds openssl
  2. Add the following to your shell configuration (e.g. .zshrc) to ensure your compiler can access the freetds and openssl libraries, updating the paths and versions to match your local install:
export LDFLAGS="-L/opt/homebrew/Cellar/freetds/1.3.18/lib -L/opt/homebrew/Cellar/[email protected]/1.1.1u/lib"
export CFLAGS="-I/opt/homebrew/Cellar/freetds/1.3.18/include"
  3. Reinstall Fides with MSSQL support by including the all extra requirement (the quotes stop zsh from treating the brackets as a glob pattern):
pip install "ethyca-fides[all]"
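After reinstalling, one way to confirm that the optional MSSQL driver is importable is sketched below. It assumes the driver installed by the all extra is pymssql (a FreeTDS-based driver, which matches the freetds dependency above); check the optional requirements of your Fides version if that assumption does not hold. `has_module` is a hypothetical helper.

```python
import importlib.util


def has_module(name):
    """Hypothetical helper: True if the top-level module can be imported here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: the MSSQL extra installs the pymssql driver.
    driver = "pymssql"
    if has_module(driver):
        print(f"{driver} is installed; MSSQL support should be available.")
    else:
        print(f"{driver} not found; try: pip install \"ethyca-fides[all]\"")
```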

fides's People

Contributors

adamsachs, allisonking, chriscalhoun1974, conceptualshark, daveqnet, dependabot[bot], dougfulton, earmenda, eastandwestwind, galvana, gilluminate, iamkelllly, jpople, kelsey-ethyca, lkcsmith, lucanovera, marcgethyca, nevilles, pattisdr, psalant726, robertkeyser, rsilvery, sanders41, seanpreston, ssangervasi, stevedmurphy, stevenbenjamin, theandrewjackson, thomaslapiana, tmuralikrishnan

fides's Issues

Write `evaluation` and `dry-evaluation` logic in `fidesctl`

  • By leaving the server evaluation logic intact, we have an escape hatch if needed
  • Would need to add the evaluation logic itself to the CLI as well as the logic for creating evaluation objects and sending them to the server
  • CLI needs to retrieve any missing objects from the server
  • This includes the ability to dry-run all resource types

This will be complete when:

  • we can dry-evaluate all resources on command
  • we can edit multiple resource files locally and run an evaluation against the local copy
  • we can run an evaluation without sending the Evaluation object to the server, using the -dry flag

Test cases:

  • all resources exist locally
  • no resources exist locally
  • resources exist in multiple repos
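The "retrieve any missing objects from the server" requirement and the three test cases above can be sketched as a resolver that prefers local resources and fetches only what is missing. Everything here is hypothetical: the helper names, the dict-based resource shape, the `fetch_remote` callable standing in for a server call, and the choice to treat duplicate keys across repos as an error.

```python
from typing import Callable, Dict, Iterable, List


def merge_local_sources(sources: List[Dict[str, dict]]) -> Dict[str, dict]:
    """Hypothetical: merge resources parsed from several repos, keyed by fides_key."""
    merged: Dict[str, dict] = {}
    for source in sources:
        for key, resource in source.items():
            if key in merged:
                # Design choice for this sketch: ambiguous duplicates are an error.
                raise ValueError(f"Duplicate fides_key across repos: {key}")
            merged[key] = resource
    return merged


def resolve_resources(
    required_keys: Iterable[str],
    local_resources: Dict[str, dict],
    fetch_remote: Callable[[str], dict],
) -> Dict[str, dict]:
    """Hypothetical: return every required resource, using the server only for gaps."""
    resolved: Dict[str, dict] = {}
    for key in required_keys:
        if key in local_resources:
            resolved[key] = local_resources[key]
        else:
            resolved[key] = fetch_remote(key)
    return resolved
```

With this shape, the "all resources exist locally" case never calls fetch_remote, the "no resources exist locally" case fetches everything, and the multiple-repo case is handled by merging before resolving.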

Documentation Pass post-rewrite

Need to go through and rewrite/update the following once issues #79, #80, #81, and #82 are complete:

  • Deployment Guide
  • Tutorial Guide
  • Landing Readme
  • Getting started documentation

Not all of these may need changes, but at least a few certainly will.

update the readme.md

Need to clean up and simplify the Readme and point it to the proper docs

also need to update the docs to have a section about how to create/submit PRs

also should add a pronunciation guide for Fides :)

add a section in the docs about how we plan work

Intermittent `make server-test` failures

Seems related to something in the caching/persistence layer. I know we ran into this before; maybe it's related to the recent caching changes? I vaguely remember there being some

Update how sanitization is handled in FidesAPI

Currently, the FidesAPI automatically sanitizes certain inputs (such as the fidesKey and dataset field names) without surfacing that change to the user.

This led to some weird behaviour during testing with fidesctl. After discussing with @stevenbenjamin, the solution is to throw errors instead of silently sanitizing inputs; otherwise, a manifest file can show a different name/fidesKey than the one actually used by the API. This will also make it clearer to the user what is and isn't acceptable for different field types.
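Raising an error instead of silently rewriting the input could look like the sketch below. The allowed character set (letters, digits, _, . and -) is an assumption for illustration, not the API's confirmed rule, and `validate_fides_key` is a hypothetical name.

```python
import re

# Assumed allowed characters for a fides_key; adjust to the API's actual rules.
FIDES_KEY_PATTERN = re.compile(r"[a-zA-Z0-9_.-]+")


def validate_fides_key(value: str) -> str:
    """Hypothetical: return the key unchanged if valid, otherwise raise instead of sanitizing."""
    if not FIDES_KEY_PATTERN.fullmatch(value):
        raise ValueError(
            f"Invalid fides_key {value!r}: only letters, digits, '_', '.' and '-' are allowed"
        )
    return value
```

Because a valid key is returned unchanged and an invalid one is rejected, the manifest file and the API cannot silently disagree about the key.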

Implement a Fides Lang module

This module will sit within the fidesctl folder (next to cli and core as a submodule) and contain the following logic:

  1. parsing the resource files
  2. validating the resource files (their types, constraints, etc.)
  3. the pydantic models for each resource
  4. update how configuration is handled (use pydantic BaseSettings, allow a default directory to be configured)

This issue will be complete when:

  • we can pip install fidesctl separately
  • we can import it into another project (solon)

Make the config more robust/available everywhere

Now that the concept of a config exists in fidesctl, it needs to be leveraged more efficiently.

The server parameter should be converted to a config value, and the config itself should probably be a more concrete object instead of just a dictionary. There are definitely ways to get the code to interact with the config more elegantly.

This can be a pydantic model defined in config.py to make it easier to extend, reuse loading methods, etc.
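A typed config object in place of a plain dictionary might look like this sketch. The issue proposes a pydantic model; to stay dependency-free, this version swaps in a standard-library dataclass and an INI-style file read with configparser. The section name (cli), the field names, and the default URL are hypothetical.

```python
import configparser
from dataclasses import dataclass


@dataclass(frozen=True)
class CLIConfig:
    """Hypothetical typed settings; the server URL becomes a config value, not a parameter."""

    server_url: str = "http://localhost:8080"  # hypothetical default
    manifest_dir: str = "."


def load_config(text: str) -> CLIConfig:
    """Parse INI-formatted text, falling back to the dataclass defaults for missing keys."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["cli"] if parser.has_section("cli") else {}
    defaults = CLIConfig()
    return CLIConfig(
        server_url=section.get("server_url", defaults.server_url),
        manifest_dir=section.get("manifest_dir", defaults.manifest_dir),
    )
```

To load from disk, pass the file's contents, e.g. load_config(Path(path).read_text()) with pathlib.Path; every caller then receives the same typed object instead of a loose dictionary.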

fidesctl load config information from a file

For storing e.g. login and credentials. As a part of this it would be nice to have a "fidesctl login" command which stores these, similar to what applications like AWS cli do.

Add revised versions of data uses, data subjects, and data qualifiers as YAML files

We've worked on many revisions of the language and structure for the "default" taxonomies; the current working versions are all here: https://docs.google.com/spreadsheets/d/1uP-McSq8cGXwpheHk55ZfLpIpizpj7XS8CnaD73wErY/edit. We'd like to get these checked into the repo now (as well as in the separate https://github.com/ethyca/Fides-Privacy-Taxonomy/ repo) so we can continue working with some external parties to both visualize and review the contents.

Within the linked spreadsheet, the first three tabs are marked:

  1. Fides Data Uses
  2. Fides Data Subjects
  3. Fides Data Qualifiers

The sheet itself is a WIP but is reasonably stable now, so it's ready to be pulled into the repo and made part of fideslang and similar. A couple of implementation notes on the "Data Use" tab, though:

  • When converting Data Uses, use the "Proposed" column for the new name (and generate a new fides key to match)
  • Ensure that the Data Uses taxonomy is preserved; in the sheet, the parent is tracked using the prior "name" column, not the new "Proposed" name, so just be careful
  • Leave all the descriptions for the Data Uses blank as these need further editing & review before use
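The three conversion notes can be sketched as a transform that first derives each new key from the Proposed name, then resolves parents through the old names so the hierarchy survives the rename. The row fields (name, proposed, parent), the key format (lowercase with underscores, prefixed by the parent key and a dot), and the helper names are assumptions, not the spreadsheet's real columns or fideslang's confirmed convention.

```python
import re
from typing import Dict, List, Optional


def slugify(name: str) -> str:
    """Hypothetical: turn a display name into a lowercase, underscore-separated key segment."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def convert_data_uses(rows: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Hypothetical: re-key data uses on their Proposed names, preserving the hierarchy."""
    by_old_name = {row["name"]: row for row in rows}
    keys: Dict[str, str] = {}

    def key_for(old_name: str) -> str:
        # Parents are tracked by the *old* name, so look rows up by old name,
        # then build the new key from the Proposed name, prefixed by the parent's new key.
        if old_name not in keys:
            row = by_old_name[old_name]
            segment = slugify(row["proposed"])
            parent = row.get("parent")
            keys[old_name] = f"{key_for(parent)}.{segment}" if parent else segment
        return keys[old_name]

    converted: List[Dict[str, Optional[str]]] = []
    for row in rows:
        parent = row.get("parent")
        converted.append(
            {
                "fides_key": key_for(row["name"]),
                "name": row["proposed"],
                "parent_key": key_for(parent) if parent else None,
                "description": "",  # left blank pending editing and review
            }
        )
    return converted
```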

Tutorial Rework

Tutorial accessibility

  • Tutorial is still quite unapproachable; I really don't think it should start with a policy object (do that last). I'd recommend splitting it into at least three steps: (1) define & apply your first system, (2) define & apply your first dataset, and (3) define & apply your first policy
  • Tutorial might be a lot easier to follow and document if there was a /demo folder that shipped with the source, already structured with the example YAML files, so you could follow along, e.g. /demo/step_1_manifests, /demo/step_2_manifests, etc.
  • Tutorial would be way more approachable with some diagrams and explanation of "why" I want to do any of these things. I think couching it in an example app (like a fictional eCommerce site selling items) would make it more approachable

Update verbiage from `Fides Objects` to Fides Resources

"Resources" is an all-around better term for what we're trying to describe. We can also do some other renaming along with this:

  • Fides objects -> Fides resources
  • Rename the show CLI command to list or ls
  • Update CLI docstrings to be more descriptive and helpful

Host MkDocs on GitHub Pages

With the Beta coming up, we need to make sure that we have a solid docs page up and running. It's much simpler to host it here than to have people spin up the docs from the repo to give feedback.

Our docs tool has a built-in feature that can publish to GitHub Pages; this should be investigated and integrated into CI.

Add a `parse` command to the CLI

We need a way to get people into Fides without needing a server at all. This command would do it!

Not sure if I'm sold on "verify"; maybe "validate"? Either way, the gist is that the command should leverage the Fideslang module's parsing functions to create a local taxonomy without making any calls to the server. This would then lay the foundation for local-only evaluations with zero calls home to a server.

Rewrite FidesAPI in Python

This also includes the following requirements:

  • Remove some server-side logic (i.e., evaluations and dry-evaluations) so that the primary purpose of FidesAPI is to manage CRUD operations
  • Remove the concept of IDs, instead making fides_key the primary key for all resource types
  • Rename fields to be Pythonic (snake_case) instead of camelCase
  • Add an evaluations endpoint that accepts Evaluation objects from fidesctl, and update evaluate in fidesctl to send them
  • Policy objects need a systems field that specifies which policies apply to which systems
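Renaming camelCase fields to snake_case can be done mechanically. This is a minimal regex-based sketch; `to_snake_case` and `rename_fields` are hypothetical helpers, and acronym-heavy names may still need hand-tuning.

```python
import re

# Split before a capitalized word, e.g. "fidesKey" -> "fides_Key", "HTTPServer" -> "HTTP_Server".
_BEFORE_WORD = re.compile(r"(.)([A-Z][a-z]+)")
# Split a lowercase letter or digit from a following capital, e.g. "dataURL" -> "data_URL".
_AFTER_LOWER = re.compile(r"([a-z0-9])([A-Z])")


def to_snake_case(name: str) -> str:
    """Hypothetical: convert a camelCase or PascalCase identifier to snake_case."""
    return _AFTER_LOWER.sub(r"\1_\2", _BEFORE_WORD.sub(r"\1_\2", name)).lower()


def rename_fields(record: dict) -> dict:
    """Hypothetical: return a copy of a flat (non-nested) record with snake_case keys."""
    return {to_snake_case(key): value for key, value in record.items()}
```

For example, fidesKey becomes fides_key and organizationId becomes organization_id, while names that are already snake_case pass through unchanged.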

Improve new developer experience

A few of us have tried heading to the fides repo and getting started lately and I think it's pretty confusing. Holistically, it looks like we really need to step back and trim a lot of the docs to help smooth this out, but here are a few concrete suggestions, organized into a few themes...

Docs Consistency

  • The "Quick guide" in the README is too dense to actually work- it dives immediately into manifests, classifiers, datasets, etc. This should likely just get removed in favour of a clearer tutorial
  • The link to the tutorial docs in the README is a 404
  • The README in the /docs site references an old setup step instead of make docs-serve
  • Once I find the tutorial (https://github.com/ethyca/fides/blob/main/docs/fides/docs/tutorial.md) it does take me to the "getting started" guide for Docker, but I think you should consider just inlining that into the tutorial

Getting started errors

  • The first command in the getting started with docker guide is make cli, but when I ran this from a clean repo it fails with org.flywaydb.core.api.exception.FlywayValidateException: Validate failed: Migrations have failed validation. After I thought about it a bit, I ran make init-db (which succeeded) and then make cli again
  • Somehow, Kelly and I did manage to trigger build errors like "this project requires java 11" but I'm not sure exactly what command triggered that
  • There seems to be no caching of the sbt update command, so on a slow internet connection like mine this means make cli takes a few minutes each time...

Deployment

  • I realize now there is no "Deployment" guide for how someone would use this in their system without building from source- i.e. pull the docker images for CLI and Server, deploy those and start using it in their project. This is the primary use case we actually want, so what would that look like?

Persist Evaluation objects within the DB

the evaluate function needs to get updated so that it sends the Evaluate object it creates to the server

fidesapi.sql_models will also need to get updated with an Evaluation model (this will auto-generate the endpoints)

Remove passwords from env files

MYSQL_ROOT_PASSWORD="wIW97O5m^r0w"
MYSQL_USER="fidesdb"
MYSQL_PASSWORD="fidesdb"
MYSQL_DATABASE="fidesdb"

@ThomasLaPiana stumbled on this on Friday. Is there any way to avoid explicit root passwords in our env files? I'm no expert, but we are about to open this up to friends and family and don't want egg on our face.
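One common way to keep credentials out of committed env files is to read them from the process environment (populated by a secrets manager or an uncommitted local file) and fail loudly when they are missing. This is a minimal sketch; the variable names mirror the snippet above, and `require_env` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical: return a required setting, refusing missing or empty values."""
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value:
        raise RuntimeError(f"{name} must be set in the environment; it is not committed to the repo")
    return value
```

A service would then call require_env("MYSQL_PASSWORD") at startup instead of relying on a password checked into the repository.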

Reworking of approval structure

Rebuild approval output so that it is cleaner and more comprehensible.

  • add names to system declarations so that we can use those in approval output
  • document all types of checks we're doing in the doc site

Add a "diff" option to fidesctl

Need to be able to dry run manifests against what exists in the server to show what would change if those manifests were applied

a lot of this logic already exists in apply.py and could be abstracted out a bit into other functions within apply.py or maybe even an entirely new module

also should add a "--dry" flag to "apply" so that users can load the resource files and confirm they're at least locally valid
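The core of a diff (or a dry-run apply) is comparing the local manifests with what the server already holds, keyed by fides_key. This is a minimal sketch assuming both sides are available as dicts of resources keyed by fides_key; `diff_resources` is hypothetical and is not the logic in apply.py.

```python
from typing import Dict, List


def diff_resources(local: Dict[str, dict], server: Dict[str, dict]) -> Dict[str, List[str]]:
    """Hypothetical: classify fides_keys by what applying the local manifests would do."""
    shared = local.keys() & server.keys()
    return {
        "create": sorted(local.keys() - server.keys()),
        "update": sorted(k for k in shared if local[k] != server[k]),
        "unchanged": sorted(k for k in shared if local[k] == server[k]),
        # Server-only resources are only reported; whether apply deletes them is a separate decision.
        "server_only": sorted(server.keys() - local.keys()),
    }
```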

Remove all of the "mock" tests

Mock tests can lead to situations where tests are passing but aren't representative of reality. Need to replace any instances of mocked tests with full integration tests

Improve `make help` with a command list

The current make help leaves a lot to be desired - it says "Under construction" 😄

I have a common snippet I use in Makefiles to effectively "annotate" the targets with some docs, it works like this:

# This target auto-generates a helpdoc by parsing the Makefile itself for special "##" comments
# See https://gist.github.com/prwhite/8168133#gistcomment-3291344
.PHONY: help
help: ## show this help
	@egrep '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-20s\033[0m %s\n", $$1, $$2}'

All you need to do for that to work is write your targets in the format of:

<target>: ## <docstring>

The egrep + sort + awk magic does the rest. I'm not sure how well this would be supported on non-UNIX systems (i.e., Windows), though, since it relies on those tools being available.
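Since the concern is that egrep, sort and awk may be missing on Windows, the same "##" convention could be parsed with a small Python script instead. This is a minimal sketch that reads a single Makefile (unlike $(MAKEFILE_LIST), which may name several); the function names are hypothetical and not part of the repo.

```python
import re
from pathlib import Path

# Matches "<target>: ## <docstring>" lines, mirroring the egrep pattern above.
HELP_LINE = re.compile(r"^([a-zA-Z_-]+):.*?## (.*)$")


def collect_help(makefile_text: str):
    """Hypothetical: return (target, docstring) pairs sorted by target name."""
    entries = []
    for line in makefile_text.splitlines():
        match = HELP_LINE.match(line)
        if match:
            entries.append((match.group(1), match.group(2).strip()))
    return sorted(entries)


def print_help(path: str = "Makefile") -> None:
    """Print each documented target padded to 20 characters, like the awk printf."""
    for target, doc in collect_help(Path(path).read_text()):
        print(f"{target:<20} {doc}")
```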

Flesh out the docs site

Some points that still need to get covered before we should consider it "ready":

  1. Talk about why Fides matters
  2. Give a concrete example

update `evaluate.py` so that the fideskey that gets passed in is used

Currently the evaluate command accepts a fides_key as a parameter and passes it to the evaluate.py module but it doesn't actually do anything...

The evaluate function needs to be updated so that it will only evaluate the policy that matches that fides_key. This can be found in the fidesctl.core.evaluate module in the evaluate function at the bottom.
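The intended behavior, evaluating only the policy whose key was passed in, amounts to filtering the policies before the evaluation loop. This is a minimal sketch; `select_policies`, the dict-based policy shape, and the fall-back to all policies when no key is given are assumptions rather than the code in fidesctl.core.evaluate.

```python
from typing import List, Optional


def select_policies(policies: List[dict], fides_key: Optional[str] = None) -> List[dict]:
    """Hypothetical: return the policy matching fides_key, or all policies if no key is given."""
    if fides_key is None:
        return list(policies)
    selected = [policy for policy in policies if policy.get("fides_key") == fides_key]
    if not selected:
        raise ValueError(f"No policy found with fides_key '{fides_key}'")
    return selected
```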

Add a dry-run/ci check flag to evaluate the validity of the current manifest files

One of the core components of Fides is the ability for it to run as part of CI and the greater "SDLC". For this to be possible, Fides needs to be aware of when it's running on main/master and when it's running in a branch. My proposed solution is to add a new field called branch (or something similar) to all database objects, have fidesctl inject the name of the current git branch into that field when sending objects to the server, and filter on that field when querying the server for objects.
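The proposal above could be sketched as follows: read the current branch with git rev-parse --abbrev-ref HEAD, then tag outgoing objects with it. The branch field name comes from the proposal; both helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def current_git_branch(cwd: Optional[str] = None) -> Optional[str]:
    """Hypothetical: return the checked-out branch name, or None if git or a repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


def tag_with_branch(resources: List[dict], branch: str) -> List[dict]:
    """Hypothetical: return copies of the resources with the branch field injected."""
    return [{**resource, "branch": branch} for resource in resources]
```

Note that on a detached HEAD (common in CI checkouts) git prints HEAD rather than a branch name, so a real implementation would likely need to read the branch from the CI environment instead.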

Improve docs for first-time readers

Overview

For this issue, I'm going to provide as many notes as possible on the current docs site - specifically as if I was reading up about Fides for the first time. I'll use the structure of the docs themselves to leave comments here, since I didn't do this on the original PR 🙃

Home

Overall, I'd recommend we combine "Home" & "What Is Fides?". The very first paragraph of the docs should explain what the project is, who it's for, and give an example of how it helps you as an engineer. We are doing this twice in the docs right now and neither of them completely work independently. See React and Airflow as two examples of pretty good ones.

Once we combine the two sections, the general layout would be:

  • What is Fides?
  • Diagram
  • Quick Example
  • Core Components (Systems, Datasets, Registries, Policies) - this would be the descriptions from "What Is Fides?", plus the YAML examples from "Home".

Last feedback here: the example used is too complicated; we should focus on a really minimal example of a System, Dataset, and Policy.

License

Looks good

Getting Started

What does fidesctl ping do? The docs don't explain why I should run it.

Minor, but between the two setup methods they don't both start with "clone the Fides repo"

What is Fides?

This is a good section - see the comments in "Home" for how I think we should combine the two and lead with it

Tutorial

This is pretty lightweight, I'd expect to see example manifests / policies / etc. described in here. I'd be hard-pressed to use this tutorial to get started, I want to see example commands and more of a 1-2-3 guide to getting started with my first system+dataset+policy+evaluation, I think.

I wrote an imaginary Getting Started guide a few months ago that could be inspiration to help structure this more.

Fides Objects

To me, I'd expect to see the YAML syntax fully documented here, e.g. for each model all the supported keys and what they do (e.g. system.declarations.dataCategories). If this kind of documentation is better served by the API though (for example) we could link to the API docs.

The "Declarations" description is hard to understand when you read it, because it relies on the definitions from "Privacy Classifiers" section first, which is at the bottom. We'll likely need to reorder this.

The "Privacy Classifiers" section needs a lot more explanation. What's a data category? What's a data use? Why do I care? What values can I use? We should also be citing ISO/IEC 19944 here, because that's what gives these initial definitions a bunch of merit and weight.

NOTE: I'm noticing here an API quirk that's going to bother me... why is it dataUse and not dataUseCategory?

Overall: this section is pretty good and it's explaining the concepts through each object and examples. I think we should try to reorganize them slightly by ensuring the description follows a consistent pattern like:

  • What is the object used for?
  • What are the fields/arguments/options/etc.?
  • 2-3 Example Uses

So for example, take the dataset object:

### Dataset
In Fides, `datasets` are used to declare what kind of data is stored in a given database, by annotating individual database columns with an accurate `data category`. A single dataset can declare one or more tables, each of which should list all the fields in that table and annotations for each. This allows you to easily determine exactly where personal data is stored in your databases, since you've gone through the process of annotating them!

Structure:
* `organizationId` (required): ID of the organization this dataset belongs to
* `fidesKey` (required): a unique...
* ...

(you get the idea)

### Fidesctl
TODO

### Fides Server
TODO

### Deployment
TODO

### Contributing
TODO

Drop YAML support from the server

Everything is standardized as JSON within the CLI (even though manifests are written in YAML, they get converted), so is it worth tearing YAML support out of the server? If it means a significant amount of duplicate code to maintain and update, it's probably not worth keeping.

Additional Evaluation features

After the huge #86, there is still some work to be done here!

  • Consider a dataset's privacy data when evaluating a system that declares a relationship to it
  • the function that finds FidesKeys should be recursive so that it can find nested references
  • Evaluations currently check only for a strict match and don't consider the type hierarchy; e.g., identified data is more dangerous than anonymized data
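The last two bullets could be sketched as below: a recursive walk that collects every fides_key reference in nested data, and a match that treats dotted keys as a hierarchy, so a rule on a parent also covers its descendants. The convention that references live under a field literally named fides_key and the dotted hierarchy (e.g. a.b.c under a.b) are assumptions for illustration; both function names are hypothetical.

```python
from typing import Any, Set


def find_fides_keys(obj: Any) -> Set[str]:
    """Hypothetical: recursively collect every string stored under a 'fides_key' field."""
    if isinstance(obj, dict):
        found: Set[str] = set()
        for key, value in obj.items():
            if key == "fides_key" and isinstance(value, str):
                found.add(value)
            else:
                found |= find_fides_keys(value)
        return found
    if isinstance(obj, list):
        return set().union(*(find_fides_keys(item) for item in obj))
    return set()


def matches_hierarchy(rule_key: str, data_key: str) -> bool:
    """Hypothetical: True if data_key equals rule_key or is its descendant in a dotted hierarchy."""
    return data_key == rule_key or data_key.startswith(rule_key + ".")
```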
