Git Product home page Git Product logo

polo's Introduction

Open Source at IFTTT Build Status

Polo

Polo

Polo travels through your database and creates sample snapshots so you can work with real world data in any environment.

Polo takes an ActiveRecord::Base seed object and traverses every whitelisted ActiveRecord::Association generating SQL INSERTs along the way.

You can then save those SQL INSERTS to .sql file and import the data to your favorite environment.

Motivation

Read our blog post or check out this presentation.

Usage

Given the following data model:

class Chef < ActiveRecord::Base
  has_many :recipes
  has_many :ingredients, through: :recipes
end

class Recipe < ActiveRecord::Base
  has_many :recipes_ingredients
  has_many :ingredients, through: :recipes_ingredients
end

class Ingredient < ActiveRecord::Base
end

class RecipesIngredient < ActiveRecord::Base
  belongs_to :recipe
  belongs_to :ingredient
end

Simple ActiveRecord Objects

inserts = Polo.explore(Chef, 1)
INSERT INTO `chefs` (`id`, `name`) VALUES (1, 'Netto')

Where Chef is the seed object class, and 1 is the seed object id.

Simple Associations

inserts = Polo.explore(Chef, 1, :recipes)
INSERT INTO `chefs` (`id`, `name`) VALUES (1, 'Netto')
INSERT INTO `recipes` (`id`, `title`, `num_steps`, `chef_id`) VALUES (1, 'Turkey Sandwich', NULL, 1)
INSERT INTO `recipes` (`id`, `title`, `num_steps`, `chef_id`) VALUES (2, 'Cheese Burger', NULL, 1)

Complex nested associations

inserts = Polo.explore(Chef, 1, :recipes => :ingredients)
INSERT INTO `chefs` (`id`, `name`) VALUES (1, 'Netto')
INSERT INTO `recipes` (`id`, `title`, `num_steps`, `chef_id`) VALUES (1, 'Turkey Sandwich', NULL, 1)
INSERT INTO `recipes` (`id`, `title`, `num_steps`, `chef_id`) VALUES (2, 'Cheese Burger', NULL, 1)
INSERT INTO `recipes_ingredients` (`id`, `recipe_id`, `ingredient_id`) VALUES (1, 1, 1)
INSERT INTO `recipes_ingredients` (`id`, `recipe_id`, `ingredient_id`) VALUES (2, 1, 2)
INSERT INTO `recipes_ingredients` (`id`, `recipe_id`, `ingredient_id`) VALUES (3, 2, 3)
INSERT INTO `recipes_ingredients` (`id`, `recipe_id`, `ingredient_id`) VALUES (4, 2, 4)
INSERT INTO `ingredients` (`id`, `name`, `quantity`) VALUES (1, 'Turkey', 'a lot')
INSERT INTO `ingredients` (`id`, `name`, `quantity`) VALUES (2, 'Cheese', '1 slice')
INSERT INTO `ingredients` (`id`, `name`, `quantity`) VALUES (3, 'Patty', '1')
INSERT INTO `ingredients` (`id`, `name`, `quantity`) VALUES (4, 'Cheese', '2 slices')

Advanced Usage

Occasionally, you might have a dataset that you want to refresh. A production database that has data that might be useful on your local copy of the database. Polo doesn't have an opinion about your data; if you try to import data with a key that's already in your local database, Polo doesn't necessarily know how you want to handle that conflict.

Advanced users will find the on_duplicate option to be helpful in this context. It gives Polo instructions on how to handle collisions. Note: This feature is currently only supported for MySQL databases. (PRs for other databases are welcome!)

There are two possible values for the on_duplicate key: :ignore and :override. Ignore keeps the old data. Override keeps the new data. If there's a collision and the on_duplicate param is not set, Polo will simpy stop importing the data.

Ignore

A.K.A the Ostrich Approach: stick your head in the sand and pretend nothing happened.

Polo.configure do
  on_duplicate :ignore
end

Polo::Traveler.explore(Chef, 1, :recipes)
INSERT IGNORE INTO `chefs` (`id`, `name`) VALUES (1, 'Netto')
INSERT IGNORE INTO `recipes` (`id`, `title`, `num_steps`, `chef_id`) VALUES (1, 'Turkey Sandwich', NULL, 1)
INSERT IGNORE INTO `recipes` (`id`, `title`, `num_steps`, `chef_id`) VALUES (2, 'Cheese Burger', NULL, 1)

Override

Use the option on_duplicate: :override to override your local data with new data from your Polo script.

Polo.configure do
  on_duplicate :override
end

Polo::Traveler.explore(Chef, 1, :recipes)
INSERT INTO `chefs` (`id`, `name`) VALUES (1, 'Netto')
ON DUPLICATE KEY UPDATE id = VALUES(id), name = VALUES(name)
...

Sensitive Fields

You can use the obfuscate option to obfuscate sensitive fields like emails or user logins.

Polo.configure do
  obfuscate :email, :credit_card
end

Polo::Traveler.explore(AR::Chef, 1)
INSERT INTO `chefs` (`id`, `name`, `email`) VALUES (1, 'Netto', 'eahorctmaagfo.nitm@l')

Warning: This is not a security feature. Fields can still easily be rearranged back to their original format. Polo will simply scramble the order of strings so you don't accidentally end up causing side effects when using production data in development. It is not a good practice to use highly sensitive data in development.

Advanced Obfuscation

For more advanced obfuscation, you can pass in a custom obfuscation strategy. Polo will take in a lambda that can be used to transform sensitive data.

Using a :symbol as an obfuscate key targets all columns of that name. Passing an SQL selector as a String will target columns within the specified table.

Polo.configure do

  email_strategy = lambda do |email|
    first_part = email.split("@")[0]
    "#{first_part}@test.com"
  end

  credit_card_strategy = lambda do |credit_card|
    "4123 4567 8910 1112"
  end

  # If you need the context of the record for its fields, it is accessible
  # in the second argument of the strategy
  social_security_strategy = lambda do |ssn, instance|
    sprintf("%09d", instance.id)
  end

  obfuscate({
    'chefs.email' => email_strategy, # This only applies to the "email" column in the "chefs" table
    :credit_card  => credit_card_strategy, # This applies to any column named "credit_card" across every table
    :ssn_strategy => social_security_strategy
  })
end

Polo::Traveler.explore(AR::Chef, 1)
INSERT INTO `chefs` (`id`, `name`, `email`) VALUES (1, 'Netto', '[email protected]')

Installation

Add this line to your application's Gemfile:

gem 'polo'

And then execute:

$ bundle

Or install it yourself as:

$ gem install polo

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/IFTTT/polo. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Code of Conduct.

To run the specs across all supported version of Rails, check out the repo and follow these steps:

$ bundle install
$ bundle exec appraisal install
$ bundle exec appraisal rake

License

The gem is available as open source under the terms of the MIT License.

polo's People

Contributors

afine avatar bolshakov avatar charliepark avatar compwron avatar craigmcnamara avatar derencius avatar devinfoley avatar dmnelson avatar dv avatar jakeonfire avatar jasdeepsingh avatar jdelstrother avatar joerichsen avatar jonwaghorn avatar nettofarah avatar nickyleach avatar orthographic-pedant avatar rapito avatar rf- avatar silvamerica avatar tiagoamaro avatar volkanunsal avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

polo's Issues

Specify table name in obfuscate calls

As of today, we just blindly try to match a global list of fields we want to be obfuscated against all the attributes of every row Polo traverses:

next if intersection(instance.attributes.keys, fields).empty?

To do so, we pass in a list of fields to Polo.configure

Polo.configure do
  obfuscate :email
end

While this has been working well, it could be the case that someone wants to only obfuscate certain fields in certain tables.

What I'm suggesting is that we change the obfuscate method to also accept something like this:

Polo.configure do
  obfuscate "users.email" 
end

So in this case we're only obfuscating email fields in the users table, and not obfuscating emails on some other tables such as invites or some other table where emails are not sensitive.

SQLite Support

We currently use SQLite in our test cases, but I never actually tried to populate a SQLite database with a data dump from Polo.

We'd need to implement both on_duplicate strategies for SQLite and make sure everything else works.

has_and_belongs_to_many relationships will not store join table

Title perhaps doesn't explain the problem well, code does:

class Account
  has_and_belongs_to_many :roles
end

class Role
  has_and_belongs_to_many :accounts
end

account = Account.create name: "Matt"
role = Role.create name: "admin"

account.roles << role

Polo.explore(Account, [1,2,3], [:roles])

This will result in the INSERT SQL for account and role being created, but not for the join table. The "solution" appears to be to use has_many through: instead, and always have an AR model with an ID column as the join table.

I don't have a solution, so I feel the README should at least clarify that HABTM doesn't work right now, and to use has_many through: instead.

Encountered on Rails 3.2

Issue with PostgreSQL adapter: ActiveRecord::StatementInvalid: PG::ProtocolViolation: ERROR

Hello!

Thank you for open sourcing this gem, think it's a brilliant idea. I've been able to give it a try with sqlite3, but I haven't had much luck making it work with PostgreSQL. (Latest Rails 4.2 and Ruby 2.2.3)

Given a User model, with a attribute of name, if one is present in the database; when I run Polo.explore(User, 1), I get the following stack trace:

irb(main):001:0> Polo.explore(User, 1)
  User Load (0.3ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 1]]
  User Load (0.7ms)  SELECT "users".* FROM "users" WHERE "users"."id" = 1
  User Load (1.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1
ActiveRecord::StatementInvalid: PG::ProtocolViolation: ERROR:  bind message supplies 0 parameters, but prepared statement "" requires 1
: SELECT "users".* FROM "users" WHERE "users"."id" = $1
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `async_exec'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `block in exec_no_cache'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract_adapter.rb:472:in `block in log'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activesupport-4.2.5/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract_adapter.rb:466:in `log'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/postgresql_adapter.rb:592:in `exec_no_cache'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/postgresql_adapter.rb:584:in `execute_and_clear'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/postgresql/database_statements.rb:160:in `exec_query'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/database_statements.rb:355:in `select'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/query_cache.rb:70:in `select_all'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activerecord-4.2.5/lib/active_record/querying.rb:39:in `find_by_sql'
    from /Users/bcb/.gem/ruby/2.2.3/gems/polo-0.1.0/lib/polo/translator.rb:24:in `block in instances'
    from /Users/bcb/.gem/ruby/2.2.3/gems/polo-0.1.0/lib/polo/translator.rb:23:in `each'
    from /Users/bcb/.gem/ruby/2.2.3/gems/polo-0.1.0/lib/polo/translator.rb:23:in `flat_map'
    from /Users/bcb/.gem/ruby/2.2.3/gems/polo-0.1.0/lib/polo/translator.rb:23:in `instances'
... 5 levels...
    from /Users/bcb/.gem/ruby/2.2.3/gems/railties-4.2.5/lib/rails/commands/console.rb:9:in `start'
    from /Users/bcb/.gem/ruby/2.2.3/gems/railties-4.2.5/lib/rails/commands/commands_tasks.rb:68:in `console'
    from /Users/bcb/.gem/ruby/2.2.3/gems/railties-4.2.5/lib/rails/commands/commands_tasks.rb:39:in `run_command!'
    from /Users/bcb/.gem/ruby/2.2.3/gems/railties-4.2.5/lib/rails/commands.rb:17:in `<top (required)>'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activesupport-4.2.5/lib/active_support/dependencies.rb:274:in `require'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activesupport-4.2.5/lib/active_support/dependencies.rb:274:in `block in require'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activesupport-4.2.5/lib/active_support/dependencies.rb:240:in `load_dependency'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activesupport-4.2.5/lib/active_support/dependencies.rb:274:in `require'
    from /Users/bcb/code/polo_rails_test/bin/rails:9:in `<top (required)>'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activesupport-4.2.5/lib/active_support/dependencies.rb:268:in `load'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activesupport-4.2.5/lib/active_support/dependencies.rb:268:in `block in load'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activesupport-4.2.5/lib/active_support/dependencies.rb:240:in `load_dependency'
    from /Users/bcb/.gem/ruby/2.2.3/gems/activesupport-4.2.5/lib/active_support/dependencies.rb:268:in `load'
    from /Users/bcb/.rubies/ruby-2.2.3/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in `require'
    from /Users/bcb/.rubies/ruby-2.2.3/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in `require'

Here's a sample application that exhibits this behavior: https://github.com/bbonamin/polo_rails_test

Let me know if I can be of any help :) To be honest I'm kind of lost on where to go an fix this, but apparently it's related to the fact that PostgreSQL validates prepared statements. rails/rails#20077

Multi Database tests

We currently run all of our tests using SQLite in memory.

While this runs super fast and works great it doesn't really test Polo against some specific scenarios such as how different databases quote fields or how on_duplicate works with different databases.

My assumption is that we could set up some extra databases with appraisal and get some more coverage.

Rails 5.2 support

Seeing the following error. Is this not supported on Rails 5.2?

NoMethodError (undefined method `arel_attributes_with_values_for_create' for #<Reason:0x00007f9b7a9e2740>)

Keep a CHANGELOG

Hey @nettofarah, thank you for this awesome gem :)

This issue is more an idea. I'm finding hard times to check which version has been packaged to Rubygems and which one is the current version on master branch.

What do you think in keeping a Changelog? :)

Project Status

Is anyone at IFTTT still maintaining this? Master branch has some changes that would be nice to have in a release but in #54 @nettofarah mentioned being unable to publish to RubyGems. I'm happy to help out if necessary.

Usage examples

Provide some examples (documentation or ruby files) with some common use cases.

e.g.

  • Rake task to pull some complex data relationships
  • Code snippet of how to export generated inserts to a .sql file
  • Sample rsync command to download .sql files from a remote server
  • Sample scp command

Suggestion: Expose instance to obfuscation strategies

I think it makes sense for some more complex obfuscation strategies to be aware of the context in which they are operating. Even simple ones. Some examples

  1. You have a table of phone numbers which must be unique and valid after obfuscation. One way to achieve this would be a caesar cipher or a simple rotation. Neither of these do a particularly good job of obfuscating the data though. An approach we have used in the past is essentially to replace the first 7 digits of the record with the ID so "415-000-0000" would become "415-123-4567" for user record 1234567
  2. You have a table with some sensitive data and some not. Perhaps your users table. You wish to obfuscate emails only of non-engineers. To do this you need the context of the record.
  3. You store phone numbers both normalized and "pretty". You need to consistently transform the number in both cases so that the normalized and pretty number are the same.

My thought is that one would modify lib/polo/translator.rb#L50 to pass new_field_value(field, strategy, instance), and call strategy with an optional second argument of strategy.call(value, instance). I believe this would be backwards compatible too.

I'm happy to build this myself, just want some assurance maintainers are open to it.

nil exception in conjunction with globlaize 3

Hi,

I'm using globalize 3.1.0
with this gem I get a nil Exception within
cast_attribute. The error starts within
def raw_sql(record) because
record.column_for_attribute(key) for translated columns
returns nil.

Globalize offers a method translated_attributes to determine such attributes.
Is it possible to support globalize or simply skip columns without a database column?

Regards
Dieter

obfuscate does not work with lambda

def intersection(attrs, fields)
  attrs & fields.to_a.flatten.map(&:to_s)
end

fields.to_a.flatten.map(&:to_s) does not yield the array-of-strings you are looking for... instead it yields:

["{:password_digest => #20 < Proc:0x007fccdcb6abb0@RAILS_ROOT/lib/tasks/db_grep_associations.rake:92 (lambda)>}"]

Suggestion: Make the ID param optional

Currently Polo's usage goes by Polo.explore(Product, 1). But, in some cases you might not be wanting to look for an specific ID but for objects that have a specific type of data, for example: Polo.explore(Product.most_popular). Or, in cases where you completely truncate a table and re-populate it often and you can't have static primary key values.

In that case it would be nice to be able to omit the ID parameter. Right now we can workaround this by doing: Polo.explore(Product.most_popular, Product.most_popular.pluck(:id)), but that looks very inefficient.

Another possibility would be to remove completely the ID param and only rely on scopes for that, like Polo.explore(Product.where(id: 1)), but I do see the value of having that param as a way to simplify the usage.

Postgres support

The SQL file Polo generates doesn't work with Postgres on my setup. I noticed there has been some discussion of support for Postgres already, but only partial support has been implemented. Can anyone fill in the the status of the support for Postgres?

Stream queries when traversing database to reduce memory footprint

AFAIK Polo relies on ActiveSupport::Notifications.subscribed to traverse the ActiveRecord tree. This works fine when associated data is small however in production scenario, we have experienced huge memory consumption as collect_sql stores everything in memory. I was wondering if others have similar issues and how they solved this? Also, are there plans to perhaps implement a streaming style methodology (e.g. probably using yield) in future?

Handle id sequences with care

I realized that after restoring the dump file generated by Polo, I still had to reset the id sequences postgres uses to determine the next id for a new record. This entails looking up the highest id number in the imported rows, and running the following command for every table:

ALTER SEQUENCE users_id_seq RESTART WITH 208

It would be great if Polo did this automatically, though.

Rails 6.0 support

Similar to #50, it looks like Rails 6 has broken/removed attributes_with_values_for_create

NoMethodError: undefined method `attributes_with_values_for_create' for #<User:0x00007f9a584e9310>

advanced usage

@charliepark - could you please add a section in the Advanced Usage area that describes the MySQL-specific feature where you can pass in update or ignore and have the importing respect the desired action when there are collisions of keys.

Security Restrictions

Can you add in some ability to require obfuscation for certain fields, across the board. This can be opt-in, but once it's been opted-in to a codebase, it should not be able to be bypassed. Right now, you can obfuscate with:

Polo::Traveler.collect(AR::Chef, 1).translate(obfuscate: [:email])

This makes it very easy for someone to accidentally forget the obfuscate call, potentially pulling sensitive data out of the production environment. Something done either in a configuration (for the entire project and all models) or declaratively per-model would make it impossible to get non-obfuscated data out.

How are you using Polo?

I'm curious to know what cool stuff people are using Polo for.

  • What environments are you generating data for?
  • What version of Rails?
  • How big is your final .sql file?
  • What database are you using?
  • How many database tables is Polo touching?
  • How hard was it to find a good sample size?
  • Are you using any advanced features?
  • How are you running Polo in prod? rake task, rails runner, rails console, etc.
  • Do you have an automated process to generate the files?
  • How do you transfer your data across environments? Publishing artifacts, rsync, scp...

No need to answer everything, but I would love to know how people are using the library so we know what to prioritize.

Thank you for using Polo <3

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.