Comments (1)
I would challenge that best practice, as described, but only where using Multiple Hash for ALL hash generation within your solution. The best practice is there for cases where you are using HashBytes (or it's equivalent in other DBMS').
On face value, as described, it will increase your risk of collisions due to dissimilar records generating the same hash.
In my view what is suggested will result in the following scenario where dissimilar records generate the same hash value:
Col 1 | Col 2 | Col 3 | Hash |
---|---|---|---|
Text | NULL | Text 2 | 0x123456 |
Text | Empty String | Text 2 | 0x123456 |
If you are writing your own hashing from scratch, or using hashbytes, then other than empty string, that best practice will reduce the risk of dissimilar records generating the same hash. You need to include replacement of NULL with something that you can guarantee will never appear in your data to be safe.
Col 1 | Col 2 | Col 3 | Hash |
---|---|---|---|
Text | NULL -> Never In Text | Text 2 | 0xab4f35 |
Text | Empty String | Text 2 | 0x123456 |
See Best practices on developing Data Vault in SQL Server (including SSIS)
Version 1.3 and greater of my component has a "Safe Null Hashing" option that is enabled by default to prevent this type of issue.
My component generates a byte array of the input to hash by repeating the following for each input column in the hashed field:
- Append input column content to a data byte array, ignoring null columns, and formatting all input data into maximum precision
- Append Null indicator to indicator byte array
- If the input column is variable length, append the length of the field to the indicator byte array
This guarantees that there can be no duplicate hash values generated for input data (with the exception of hash collisions), and is a better solution than the "best practice".
That being said, if you really want an option added to support Pipe delimiting with null replacement hashing, which would have to be mutually exclusive of Safe Null Handling, it can be done.
It would require an increment to the version (1.8), as this will need a new property added (or the SafeNullHandling property updated to support On, Off and DataVault.), which needs the upgrade to execute when opening an old package.
The CaclulateHash function in Utility.cs will need updating to support creating the inputByteBuffer in the correct format.
And of course the UI's need updating to support the new option.
The license doesn't prevent changes, (fork or pull), or running your own repository (clone). You just have to use the same license...
from ssismhash.
Related Issues (20)
- Installer v1.7.2 and Visual Studio 2017 (15.5.5) /SSDT 15.1.61710.120 - Does not appear in SSIS Toolbox HOT 1
- Cannot selectively choose hash columns HOT 6
- Possibility for a CustomProperty to specify numOfThreads HOT 3
- Installation to Azure SSIS runtime HOT 7
- Packages fail to run on server HOT 3
- Installation location of Multiple Hash transform non-default SQL Server install location HOT 8
- No Setup.exe file in the download SSISMHash.zip. How do I install? HOT 2
- Anyway to migrate from 2008 to 2017 ? HOT 2
- SQL Server 2019 Support HOT 4
- Installer not compatible error HOT 1
- MultiHash 64 not working on SQL Server 2019 HOT 1
- Incorrect version showing in About menu HOT 4
- v1.7.5.0 is unsigned HOT 1
- Support SQL Server 2022 HOT 3
- Sha2(512) Value different in ADF/Snowflake than in SSIS HOT 1
- SSDT 17.2 compatibility HOT 3
- Proper location for Install for SSDT 2015 HOT 4
- SQL 2016 SP1 Support HOT 2
- Installer issue v1.7.2 - Not showing up in 140>DTS HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from ssismhash.