Comments (8)

wwoytenko commented on May 24, 2024

We will review the implementation, but I suspect the problem is the hash function. I think we should let users choose the hash function and its parameters. We will try to resolve this in the next release.

For now, I can suggest a temporary workaround: you can use a simple shell script to implement any hashing function. For instance:

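#!/bin/bash

# Read one value per line and print its MD5 hex digest.
# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r line
do
   printf "%s" "$line" | md5sum | awk '{print $1}'
done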

And the config can be like:

    - schema: "humanresources"
      name: "employee"
      transformers:
        - name: "Cmd"
          params:
            driver:
              name: "text"
            expected_exit_code: -1
            skip_on_null_input: true
            executable: "/var/lib/playground/test.sh"
            columns:
              - name: "jobtitle"

The result: [image]

Read about Cmd transformer
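
Since the workaround above allows any hashing function, here is a minimal sketch of the same line-in, line-out loop using SHA-256 instead of MD5. It assumes the script receives one value per line on stdin, as the script above does. `HASH_CMD` and `hash_lines` are hypothetical names introduced for illustration; they are not greenmask settings.

```bash
#!/bin/bash
# Sketch only: HASH_CMD is a hypothetical variable, not a greenmask parameter.
# It names any coreutils-style digest tool (sha256sum, sha1sum, md5sum, ...).
HASH_CMD="${HASH_CMD:-sha256sum}"

hash_lines() {
    local line
    # IFS= and -r keep leading whitespace and backslashes in the value intact.
    # The "|| [[ -n ... ]]" part also handles a last line without a newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        # The digest tool prints "<hex>  -"; awk keeps only the hex digest.
        printf '%s' "$line" | "$HASH_CMD" | awk '{print $1}'
    done
}

# Run only when input is piped in, as it is when called as an executable.
if [[ ! -t 0 ]]; then
    hash_lines
fi
```

Saved in place of the script above, it should work with the same config. Setting `HASH_CMD=md5sum` in the environment would reproduce the original MD5 behaviour.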

from greenmask.

viniciuschiele commented on May 24, 2024

Another finding about the Hash transformer: it seems to generate duplicate values even for a small set of values.

I have a table users with 1000 records, and the email column is unique (UNIQUE INDEX). When I use the Hash transformer, I get an error during pg_restore: ERROR: could not create unique index "users_email_ux"

viniciuschiele commented on May 24, 2024

Never mind, it was my fault: the UNIQUE INDEX has a FILTER condition that allows duplicate emails for deleted users.
Sorry for the false alarm.

viniciuschiele commented on May 24, 2024

I gave it a try. The new hash transformer is crazy fast now, thank you!

viniciuschiele commented on May 24, 2024

I see that the current hash implementation uses an encryption algorithm; a hash algorithm might be faster.

I'm going to postpone this requirement until there is a built-in solution for it. Thanks.

wwoytenko commented on May 24, 2024

Agreed. We will try to deliver it soon, but any contribution is appreciated. Thank you!

wwoytenko commented on May 24, 2024

Yes, it was caused by a collision. I will rewrite the implementation so that the hash function can be chosen (MD5, SHA-1, SHA-224/256/384/512). The expected release date is 14 February. Thank you so much for reporting.

wwoytenko commented on May 24, 2024

Fixed in v0.1.5
