Comments (8)
We will review the implementation, but I suspect the issue is due to the hash function. I think we should let users choose the hash function and its parameters. We will try to resolve this in the next release.
For now, I can suggest a temporary workaround: you can use a simple shell script to implement any hashing function. For instance:
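#!/bin/bash
# Read each input value as-is (no backslash or whitespace mangling) and print its MD5 digest.
while IFS= read -r line
do
  printf "%s" "$line" | md5sum | awk '{print $1}'
done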
And the config can be like:
- schema: "humanresources"
  name: "employee"
  transformers:
    - name: "Cmd"
      params:
        driver:
          name: "text"
        expected_exit_code: -1
        skip_on_null_input: true
        executable: "/var/lib/playground/test.sh"
        columns:
          - name: "jobtitle"
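Before pointing the Cmd transformer at the script, it can help to check it locally. The sketch below wraps the same loop in a function (the name `hash_lines` is hypothetical, not part of greenmask) and feeds it a few sample values on stdin, assuming the text driver passes one value per line as the script above implies. Identical inputs should produce identical digests.

```shell
#!/bin/bash
# hash_lines: read values line by line from stdin and print the MD5 hex digest of each.
# The function name is illustrative; the loop body mirrors the workaround script.
hash_lines() {
  while IFS= read -r line; do
    printf "%s" "$line" | md5sum | awk '{print $1}'
  done
}

# Sample job titles, one per line; the first and third are identical,
# so the first and third output lines should match.
printf '%s\n' "Engineer" "Manager" "Engineer" | hash_lines
```

Because the digest depends only on the input value, this workaround is deterministic: the same jobtitle always maps to the same masked value.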
The result
Read about Cmd transformer
from greenmask.
Another finding about the Hash transformer: it seems to generate duplicate values even for a small set of values.
I have a table users with 1000 records, and the email column is unique (UNIQUE INDEX). When I use the Hash transformer, I get an error during pg_restore: ERROR: could not create unique index "users_email_ux"
from greenmask.
Never mind, it was my fault: the UNIQUE INDEX has a FILTER condition allowing duplicate emails for deleted users.
Sorry for the false alarm.
from greenmask.
I gave it a try; the new hash transformer is crazy fast now, thank you.
from greenmask.
I see that the current hash implementation uses an encryption algorithm; maybe a hash algorithm would be faster.
I'm going to postpone this requirement for now, until there is a built-in solution for it. Thanks.
from greenmask.
Agreed. We will try to deliver it soon, but any contribution is appreciated. Thank you!
from greenmask.
Yes, a collision occurred. I will rewrite the implementation so that the hash function can be chosen (MD5, SHA-1, SHA-224/256/384/512). The expected release date is 14 February. Thank you so much for reporting.
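Until a selectable hash function is built in, the same idea can be approximated on the shell side of the Cmd workaround. This is only a sketch and not greenmask's implementation: the function name `hash_value` and its arguments are hypothetical, and it assumes the GNU coreutils tools (`md5sum`, `sha1sum`, `sha256sum`, and so on) are installed.

```shell
#!/bin/bash
# hash_value ALGO VALUE: print the hex digest of VALUE using the chosen algorithm.
# ALGO is one of md5, sha1, sha224, sha256, sha384, sha512 and is mapped to the
# matching GNU coreutils tool (md5sum, sha1sum, ...). Hypothetical helper.
hash_value() {
  local algo="$1" value="$2"
  case "$algo" in
    md5|sha1|sha224|sha256|sha384|sha512)
      printf "%s" "$value" | "${algo}sum" | awk '{print $1}'
      ;;
    *)
      echo "unsupported hash function: $algo" >&2
      return 1
      ;;
  esac
}

# Example: SHA-256 digest of a sample value.
hash_value sha256 "abc"
```

A longer digest such as SHA-256 makes accidental collisions far less likely than a short or truncated output, which matters when the masked column carries a unique constraint.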
from greenmask.
Fixed in v0.1.5
from greenmask.