blairconrad / dicognito
A library and command line tool for anonymizing DICOM files
License: MIT License
anon-anon-anon-anon-anon-anon-myfile.dcm
InstitutionName is currently picked just from the InstitutionAddress with "CLINIC" appended. So there's essentially a 1 in 40 chance of a collision, and since we run a lot of tests, I'm seeing failures.
Either make the name more unique or stop testing it for uniqueness. Maybe just test that it matches the address.
Sometimes I want to anonymize a whole directory, throwing out what was there. This makes it easier to then store everything in this directory without worrying about file names, and it gets rid of possible confidential data from your system.
And export __version__ and __version_info__ package members.
DICOM store tools generally store everything in the current directory and its subdirectories, so we should be able to anonymize everything in the subfolders too. Maybe this should include the 'overwrite' flag by default (issue #21).
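A sketch of how directory arguments could be expanded, optionally descending into subfolders, using only `pathlib`. The function name and the `recursive` switch are hypothetical, not dicognito's actual CLI:

```python
from pathlib import Path
from typing import Iterable, Iterator


def iter_input_files(names: Iterable[str], recursive: bool = False) -> Iterator[Path]:
    """Yield the files to anonymize, expanding any directory arguments.

    A directory contributes only its direct children unless `recursive`
    is true, in which case every file beneath it is included.
    """
    for name in names:
        path = Path(name)
        if path.is_dir():
            # rglob("*") descends into subfolders; glob("*") stays at the top level.
            candidates = path.rglob("*") if recursive else path.glob("*")
            yield from sorted(p for p in candidates if p.is_file())
        else:
            yield path
```

Sorting the expanded paths keeps the processing order stable between runs, which makes any printed summary easier to compare.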
https://dicom.innolitics.com/ciods/segmentation/patient/00120062
Note that a value of "YES" indicates that the Pixel Data is clean too, so we may not be able to set it to "YES" unless BurnedInAnnotation is "NO".
Connects to #42.
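One way to encode that rule, sketched as a pure function over attribute values. The function name is hypothetical. With pydicom, the input would come from Burned In Annotation (0028,0301) and the result would be written to Patient Identity Removed (0012,0062). Writing "NO" rather than omitting the attribute when the annotation status is unknown is an assumption:

```python
from typing import Optional


def patient_identity_removed_value(burned_in_annotation: Optional[str]) -> str:
    """Return the value to write to Patient Identity Removed (0012,0062).

    "YES" asserts that the pixel data is clean too, so only claim it when
    Burned In Annotation (0028,0301) is explicitly "NO"; otherwise be
    conservative and write "NO".
    """
    if burned_in_annotation is not None and burned_in_annotation.strip().upper() == "NO":
        return "YES"
    return "NO"
```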
When anonymizing using a set seed, the date (time) offsets will sometimes vary.
To see this, anonymize an object a few times with the same seed.
This is the reason that test_multivalued_date_and_time_pair_gets_anonymized fails from time to time.
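Offsets can only be reproducible if they depend on nothing but the seed and the input value. The sketch below assumes, as described elsewhere in these issues, that the seed acts as a salt for hashing. The function name, the use of SHA-256, and the backwards-only range are illustrative assumptions, not dicognito's implementation:

```python
import hashlib


def date_offset_days(seed: str, patient_id: str, max_days: int = 365) -> int:
    """Deterministically map (seed, patient_id) to an offset in [-max_days, -1].

    hashlib digests are stable across interpreter runs, unlike the built-in
    hash() of a str, which is randomized per process by default.
    """
    digest = hashlib.sha256((seed + "|" + patient_id).encode("utf-8")).digest()
    as_int = int.from_bytes(digest[:8], "big")
    # as_int % max_days is in [0, max_days - 1], so the result is in [-max_days, -1].
    return -1 - (as_int % max_days)
```

Because the result is a pure function of its inputs, anonymizing the same object repeatedly with the same seed yields the same shift.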
While possibly technically not incorrect behaviour, if the new issuer is unknown to the receiving system, it can cause problems with study validation. For now, do not add an issuer if there wasn't one.
Anonymizing a dataset containing a value for 0031,0020, which would typically be a Private Creator Data Element, results in dicognito erroring out with
Error occurred while converting <_io.BytesIO object at 0x0000022DD4CEA160>. Aborting.
Traceback (most recent call last):
File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\tag.py", line 28, in tag_in_exception
yield
File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\dataset.py", line 2474, in walk
callback(self, data_element) # self = this Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Dev\dicognito\src\dicognito\anonymizer.py", line 151, in _anonymize_element
if handler(dataset, data_element):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Dev\dicognito\src\dicognito\idanonymizer.py", line 67, in __call__
if self._anonymize_mitra_global_patient_id(dataset, data_element):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Dev\dicognito\src\dicognito\idanonymizer.py", line 97, in _anonymize_mitra_global_patient_id
dataset[(mitra_linked_attributes_group << 16) + private_tag_group].value
~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\dataset.py", line 988, in __getitem__
elem = self._dict[tag]
~~~~~~~~~~^^^^^
KeyError: (0031, 0000)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\Dev\dicognito\src\dicognito\__main__.py", line 76, in main
anonymizer.anonymize(dataset)
File "E:\Dev\dicognito\src\dicognito\anonymizer.py", line 134, in anonymize
dataset.walk(self._anonymize_element)
File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\dataset.py", line 2472, in walk
with tag_in_exception(tag):
File "D:\Users\amidu\AppData\Local\Programs\Python\Python311\Lib\contextlib.py", line 155, in __exit__
self.gen.throw(typ, value, traceback)
File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\tag.py", line 32, in tag_in_exception
raise type(exc)(msg) from exc
KeyError: 'With tag (0031, 0020) got exception: (0031, 0000)\nTraceback (most recent call last):\n File "E:\\Dev\\dicognito\\.venv\\dicognito\\Lib\\site-packages\\pydicom\\tag.py", line 28, in tag_in_exception\n yield\n File "E:\\Dev\\dicognito\\.venv\\dicognito\\Lib\\site-packages\\pydicom\\dataset.py", line 2474, in walk\n callback(self, data_element) # self = this Dataset\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "E:\\Dev\\dicognito\\src\\dicognito\\anonymizer.py", line 151, in _anonymize_element\n if handler(dataset, data_element):\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "E:\\Dev\\dicognito\\src\\dicognito\\idanonymizer.py", line 67, in __call__\n if self._anonymize_mitra_global_patient_id(dataset, data_element):\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "E:\\Dev\\dicognito\\src\\dicognito\\idanonymizer.py", line 97, in _anonymize_mitra_global_patient_id\n dataset[(mitra_linked_attributes_group << 16) + private_tag_group].value\n ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "E:\\Dev\\dicognito\\.venv\\dicognito\\Lib\\site-packages\\pydicom\\dataset.py", line 988, in __getitem__\n elem = self._dict[tag]\n ~~~~~~~~~~^^^^^\nKeyError: (0031, 0000)\n'
It shouldn't error out.
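Under the DICOM private data element scheme (PS3.5, Private Data Element Tags), a private creator sits at (gggg,0010) to (gggg,00FF) in an odd group. It reserves the block of data elements (gggg,xx00) to (gggg,xxFF), where xx is the creator's element number. Only elements with an element number of 0x1000 or more therefore have a creator to look up. For (0031,0020) itself, the shifted block number is 0, which is consistent with the (0031,0000) lookup in the traceback. A hedged sketch of a guarded lookup follows. The function name is hypothetical, and tags are treated as plain 32-bit integers (pydicom's tag type is an `int` subclass):

```python
from typing import Optional


def private_creator_tag(tag: int) -> Optional[int]:
    """Return the private creator tag that reserves `tag`, or None.

    Only private data elements (odd group, element 0x1000-0xFFFF) have a
    creator. Standard tags and the creator elements themselves return None,
    so callers never build a lookup such as (0031,0000).
    """
    group, element = tag >> 16, tag & 0xFFFF
    if group % 2 == 0 or element < 0x1000:
        return None
    block = element >> 8  # e.g. element 0x1120 -> creator element 0x0011
    return (group << 16) | block
```

A caller would still check `creator in dataset` before reading `.value`, because the creator element may simply be missing.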
Often we want to preserve original files. One option is to copy them away and then anonymize, but that can be tedious. An alternative is a mode where the dicognito command line tool will write anonymized files to another directory.
Proposal: a -o/--output-directory option to specify an output directory.
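A sketch of the proposed option using `argparse`. The positional `sources` argument and the name-preserving `output_path` helper are assumptions, not the tool's actual parser:

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dicognito")
    parser.add_argument("sources", nargs="+", help="files to anonymize")
    # Proposed option: write results elsewhere instead of next to the inputs.
    parser.add_argument("-o", "--output-directory", default=None,
                        help="directory to write anonymized files to")
    return parser


def output_path(source: str, output_directory: str) -> Path:
    """Keep the original file name, but place it in the output directory."""
    return Path(output_directory) / Path(source).name
```

Keeping only the base name means inputs from different folders with the same name would collide in the output directory. A real implementation would need to mirror the input structure or generate unique names.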
I had a study with multiple DICOM files. In some of the files the patient name was like 'LAST^FIRST^MIDDLE' and in others 'LAST^FIRST^MIDDLE^'. These are obviously the same patient name, but because of the trailing '^' dicognito assumed they were different and anonymized them differently.
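PS3.5 allows trailing empty name components and their delimiters to be omitted, so the two spellings denote the same name. Normalizing the value used as the lookup key (not the stored value) would map them to the same anonymized name. A sketch with a hypothetical function name:

```python
def normalize_person_name(value: str) -> str:
    """Drop insignificant trailing padding and empty components from a PN value.

    'LAST^FIRST^MIDDLE^' and 'LAST^FIRST^MIDDLE' normalize to the same string.
    """
    # Handle each component group (alphabetic=ideographic=phonetic) separately,
    # stripping trailing spaces and '^' delimiters, then drop empty trailing groups.
    groups = [group.rstrip("^ ") for group in value.split("=")]
    return "=".join(groups).rstrip("=")
```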
https://dicom.innolitics.com/ciods/segmentation/patient/00120063
Multivalued, in the case that the object was anonymized more than once.
Required if Patient Identity Removed (0012,0062) is present and has a value of YES and De-identification Method Code Sequence (0012,0064) is not present. May be present otherwise.
Warning of burned in annotations was added in #44.
Some clients may prefer that the potential presence of burned-in annotations actually fail the operation. Support this.
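A sketch of a warn-or-fail policy. The "warn"/"fail" values, the exception class, and the logger name are hypothetical. This sketch takes no action when the attribute is absent, in line with the suggestion elsewhere in these issues:

```python
import logging
from typing import Optional

logger = logging.getLogger("burned_in_example")


class BurnedInAnnotationError(Exception):
    """Raised when the caller asked to fail on possible burned-in annotations."""


def check_burned_in_annotation(value: Optional[str], policy: str = "warn") -> None:
    """Apply a hypothetical 'warn' or 'fail' policy to Burned In Annotation.

    Only an explicit "YES" triggers the policy; an absent value is ignored here.
    """
    if value is None or value.strip().upper() != "YES":
        return
    message = "Burned In Annotation is YES; pixel data may contain identifying text"
    if policy == "fail":
        raise BurnedInAnnotationError(message)
    logger.warning(message)
```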
Many thanks for this great tool!
I need to map the original patient name to the anonymized name in order to include clinical data in my analysis. I am running in recursive mode with one input directory containing exam data from multiple patients. The output table (Accession Number, Patient ID, Patient Name) only includes the anonymized data. Is there an existing way to obtain this mapping when running in recursive mode with multiple patients per session? I'm happy to implement this and open a PR if not; I just wanted to make sure I'm not missing anything obvious. Thanks for your help!
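One approach is to capture the relevant values before and after anonymizing each file and write them out. Below is a sketch with the standard `csv` module. The column names and the helper are hypothetical. The resulting file itself contains identifying data and needs the same protection as the originals:

```python
import csv
from typing import Iterable, TextIO, Tuple

# (original PatientID, original PatientName, anonymized PatientID, anonymized PatientName)
Row = Tuple[str, str, str, str]


def write_mapping(rows: Iterable[Row], stream: TextIO) -> None:
    """Write one CSV line per distinct original/anonymized patient pair."""
    writer = csv.writer(stream)
    writer.writerow(["OriginalPatientID", "OriginalPatientName",
                     "AnonymizedPatientID", "AnonymizedPatientName"])
    seen = set()
    for row in rows:
        # Many files share a patient; emit each pair only once.
        if row not in seen:
            seen.add(row)
            writer.writerow(row)
```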
Anonymizing a dataset with a supplied StationName attribute but no sibling Modality fails with this output:
Traceback (most recent call last):
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 27, in tag_in_exception
yield
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2040, in walk
callback(self, data_element) # self = this Dataset
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\anonymizer.py", line 120, in _anonymize_element
if handler(dataset, data_element):
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 48, in __call__
element_anonymizer(dataset, data_element)
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 67, in anonymize_station_name
data_element.value = dataset.Modality + "01"
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 778, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Dataset' object has no attribute 'Modality'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 27, in tag_in_exception
yield
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2046, in walk
dataset.walk(callback)
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2046, in walk
dataset.walk(callback)
File "c:\program files (x86)\python38-32\lib\contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 34, in tag_in_exception
raise type(ex)(msg)
AttributeError: With tag (0008, 1010) got exception: 'Dataset' object has no attribute 'Modality'
Traceback (most recent call last):
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 27, in tag_in_exception
yield
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2040, in walk
callback(self, data_element) # self = this Dataset
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\anonymizer.py", line 120, in _anonymize_element
if handler(dataset, data_element):
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 48, in __call__
element_anonymizer(dataset, data_element)
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 67, in anonymize_station_name
data_element.value = dataset.Modality + "01"
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 778, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Dataset' object has no attribute 'Modality'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\program files (x86)\python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\program files (x86)\python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Program Files (x86)\Python38-32\Scripts\dicognito.exe\__main__.py", line 7, in <module>
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\__main__.py", line 119, in main
anonymizer.anonymize(dataset)
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\anonymizer.py", line 114, in anonymize
dataset.walk(self._anonymize_element)
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2046, in walk
dataset.walk(callback)
File "c:\program files (x86)\python38-32\lib\contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 34, in tag_in_exception
raise type(ex)(msg)
AttributeError: With tag (0018, 9506) got exception: With tag (0008, 1010) got exception: 'Dataset' object has no attribute 'Modality'
Traceback (most recent call last):
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 27, in tag_in_exception
yield
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2040, in walk
callback(self, data_element) # self = this Dataset
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\anonymizer.py", line 120, in _anonymize_element
if handler(dataset, data_element):
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 48, in __call__
element_anonymizer(dataset, data_element)
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 67, in anonymize_station_name
data_element.value = dataset.Modality + "01"
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 778, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Dataset' object has no attribute 'Modality'
Traceback (most recent call last):
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 27, in tag_in_exception
yield
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2040, in walk
callback(self, data_element) # self = this Dataset
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\anonymizer.py", line 120, in _anonymize_element
if handler(dataset, data_element):
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 48, in __call__
element_anonymizer(dataset, data_element)
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 67, in anonymize_station_name
data_element.value = dataset.Modality + "01"
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 778, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Dataset' object has no attribute 'Modality'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 27, in tag_in_exception
yield
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2046, in walk
dataset.walk(callback)
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2046, in walk
dataset.walk(callback)
File "c:\program files (x86)\python38-32\lib\contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 34, in tag_in_exception
raise type(ex)(msg)
AttributeError: With tag (0008, 1010) got exception: 'Dataset' object has no attribute 'Modality'
Traceback (most recent call last):
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\tag.py", line 27, in tag_in_exception
yield
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 2040, in walk
callback(self, data_element) # self = this Dataset
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\anonymizer.py", line 120, in _anonymize_element
if handler(dataset, data_element):
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 48, in __call__
element_anonymizer(dataset, data_element)
File "c:\program files (x86)\python38-32\lib\site-packages\dicognito\equipmentanonymizer.py", line 67, in anonymize_station_name
data_element.value = dataset.Modality + "01"
File "c:\program files (x86)\python38-32\lib\site-packages\pydicom\dataset.py", line 778, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Dataset' object has no attribute 'Modality'
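The traceback shows the replacement being built as `dataset.Modality + "01"`, which raises when the StationName sits in a sequence item (here inside (0018,9506)) without its own Modality. A sketch of a tolerant version follows. The fallback prefix and function name are hypothetical. With pydicom, the argument would be `dataset.get("Modality")`, which returns None when the keyword is absent:

```python
from typing import Optional

FALLBACK_PREFIX = "STATION"  # hypothetical placeholder used when Modality is absent


def anonymized_station_name(modality: Optional[str]) -> str:
    """Build a StationName replacement without assuming Modality exists.

    Mirrors the `Modality + "01"` pattern from the traceback, but falls back
    to a fixed prefix instead of raising AttributeError.
    """
    prefix = modality.strip() if modality and modality.strip() else FALLBACK_PREFIX
    return prefix + "01"
```

Another option would be to pass the top-level dataset's Modality down to nested items, so that equipment names stay consistent within a file.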
It would be useful for searching on the PACS system if all the Patient IDs and AccessionNumbers, and probably StudyIDs had a prefix or suffix that was supplied when running dicognito.
I'm unsure about a separator character between the prefix/suffix and the rest of the ID. I think I'm tending towards 'leave it off' as it could be passed in as part of the prefix/suffix. Something like:
dicognito -prefix "PD-" some.dcm
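AccessionNumber and StudyID have VR SH (at most 16 characters) and PatientID has VR LO (at most 64), so a prefix or suffix eats into the room left for the generated part. Below is a sketch that keeps the prefix/suffix intact and truncates the generated ID. The function, the length table, and the truncation policy are assumptions. Truncation also reduces uniqueness, so long prefixes on SH attributes deserve a warning:

```python
MAX_LENGTH = {"PatientID": 64, "AccessionNumber": 16, "StudyID": 16}  # LO = 64, SH = 16


def decorate_id(keyword: str, new_id: str, prefix: str = "", suffix: str = "") -> str:
    """Wrap an anonymized ID in a user-supplied prefix/suffix.

    The generated part is truncated so the result still fits the VR's
    maximum length; the prefix and suffix are always kept intact.
    """
    limit = MAX_LENGTH[keyword]
    room = limit - len(prefix) - len(suffix)
    if room <= 0:
        raise ValueError(f"prefix/suffix too long for {keyword} (max {limit} chars)")
    return prefix + new_id[:room] + suffix
```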
The first thing I generally want to know after anonymizing a file is how to find it. Consider adding output like this:
dicognito some*.dcm
Created 'anon-some1.dcm' with PatientID: ABCD1234 and AccessionNumber: EFAB5678
Created 'anon-some2.dcm' with PatientID: ABCD1234 and AccessionNumber: CDEF9012
...
This might be too verbose when running against a large dataset, so you could have a -q flag to suppress it.
The current readme states:
Anonymization causes significant attributes, such as identifiers, names, and addresses, to be replaced by new values
Please document exactly which attributes are modified
because we expect to find "Date" and "Time" at the end of the element name.
This would allow dicognito to essentially act as a fake modality that can help populate a PACS with stuff it considers as new.
Inspired by #30.
We should support at least the latest minor, but as many as are practical, I suppose.
If it's possible to support earlier versions of Python 2 while we're at it, why not?
When using the command line, any files with a deflated transfer syntax, e.g. Deflated Explicit VR Little Endian, will be corrupt when saved.
This is due to pydicom/pydicom#1086.
Describe the bug
When anonymizing a DICOM file that includes encapsulated pixel data as described in A.4 Transfer Syntaxes For Encapsulation of Encoded Pixel Data, if a data fragment contains the 4 consecutive bytes FE FF DD E0 (which is how the terminating Sequence Delimiter Item tag would appear), the fileutil.read_undefined_length_value method considers the fragment to end at those bytes.
After the read, the pixel data element's value is too short: it is cut off where the apparent Sequence Delimiter Item appears, and the remaining bytes are assumed to be additional tags. Then, as the anonymizer accesses the dataset via walk, we see the following error:
Expected 4 zero bytes after undefined length delimiter at pos 0bf4
Traceback (most recent call last):
File "D:\Sandbox\pydicom\pydicom\dataelem.py", line 735, in DataElement_from_raw
value = convert_value(VR, raw, encoding)
File "D:\Sandbox\pydicom\pydicom\values.py", line 623, in convert_value
raise NotImplementedError(message)
NotImplementedError: Unknown Value Representation '0x01 0x00'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Sandbox\pydicom\pydicom\tag.py", line 27, in tag_in_exception
yield
File "D:\Sandbox\pydicom\pydicom\dataset.py", line 2032, in walk
data_element = self[tag]
File "D:\Sandbox\pydicom\pydicom\dataset.py", line 861, in __getitem__
self[tag] = DataElement_from_raw(data_elem, character_set)
File "D:\Sandbox\pydicom\pydicom\dataelem.py", line 737, in DataElement_from_raw
raise NotImplementedError("{0:s} in tag {1!r}".format(str(e), raw.tag))
NotImplementedError: Unknown Value Representation '0x01 0x00' in tag (0000, 0000)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\program files (x86)\python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\program files (x86)\python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Sandbox\dicognito\src\dicognito\__main__.py", line 142, in <module>
main()
File "D:\Sandbox\dicognito\src\dicognito\__main__.py", line 119, in main
anonymizer.anonymize(dataset)
File "D:\Sandbox\dicognito\src\dicognito\anonymizer.py", line 119, in anonymize
dataset.walk(self._anonymize_element)
File "D:\Sandbox\pydicom\pydicom\dataset.py", line 2039, in walk
dataset.walk(callback)
File "c:\program files (x86)\python38-32\lib\contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "D:\Sandbox\pydicom\pydicom\tag.py", line 34, in tag_in_exception
raise type(ex)(msg)
NotImplementedError: With tag (0000, 0000) got exception: Unknown Value Representation '0x01 0x00' in tag (0000, 0000)
Traceback (most recent call last):
File "D:\Sandbox\pydicom\pydicom\dataelem.py", line 735, in DataElement_from_raw
value = convert_value(VR, raw, encoding)
File "D:\Sandbox\pydicom\pydicom\values.py", line 623, in convert_value
raise NotImplementedError(message)
NotImplementedError: Unknown Value Representation '0x01 0x00'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Sandbox\pydicom\pydicom\tag.py", line 27, in tag_in_exception
yield
File "D:\Sandbox\pydicom\pydicom\dataset.py", line 2032, in walk
data_element = self[tag]
File "D:\Sandbox\pydicom\pydicom\dataset.py", line 861, in __getitem__
self[tag] = DataElement_from_raw(data_elem, character_set)
File "D:\Sandbox\pydicom\pydicom\dataelem.py", line 737, in DataElement_from_raw
raise NotImplementedError("{0:s} in tag {1!r}".format(str(e), raw.tag))
NotImplementedError: Unknown Value Representation '0x01 0x00' in tag (0000, 0000)
Expected behavior
The dataset would be anonymized.
Steps To Reproduce
Discovered when opening a patient's Video Endoscopic Image (1.2.840.10008.5.1.4.1.1.77.1.1.1), which I can't share because of PHI concerns, and also it's quite large. See JPEG2000-embedded-sequence-delimiter.zip for a constructed dataset. After extracting, attempt to anonymize it.
Environment
module | version |
---|---|
platform | Windows-10-10.0.18362-SP0 |
Python | 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32 bit (Intel)] |
pydicom | 2.1.0.dev0 (b9fb05c177b685bf683f7f57b2d57374eb7d882d) |
dicognito | any version, including current master (7e9b068) |
Cause
This is caused by pydicom/pydicom#1140.
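The underlying problem is scanning for delimiter bytes instead of following item lengths. In the encapsulated format, every item starts with the (FFFE,E000) item tag and a 32-bit length, so a reader can skip over fragment contents without inspecting them. Below is a sketch of that approach. It is not pydicom's actual code, and it assumes little-endian encoding with `value` starting right after the Pixel Data element header:

```python
import struct
from typing import List

ITEM_TAG = b"\xfe\xff\x00\xe0"            # (FFFE,E000) in little endian
SEQUENCE_DELIMITER = b"\xfe\xff\xdd\xe0"  # (FFFE,E0DD) in little endian


def split_encapsulated_items(value: bytes) -> List[bytes]:
    """Split an encapsulated Pixel Data value into its items.

    Each item's explicit length is trusted, so bytes inside a fragment that
    happen to look like a delimiter are never interpreted as one. The first
    item returned is the Basic Offset Table (possibly empty).
    """
    items = []
    offset = 0
    while offset + 8 <= len(value):
        tag = value[offset:offset + 4]
        (length,) = struct.unpack_from("<I", value, offset + 4)
        offset += 8
        if tag == SEQUENCE_DELIMITER:
            return items
        if tag != ITEM_TAG:
            raise ValueError(f"unexpected tag bytes {tag.hex()} at offset {offset - 8}")
        items.append(value[offset:offset + length])
        offset += length
    raise ValueError("missing sequence delimiter")
```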
They are woefully lacking, and it would be nice to upgrade them to just "insufficient".
If Burned In Annotation is "YES", our anonymization won't be sufficient.
We could emit a WARN level log message to indicate that there are images with burned-in demographics.
If the attribute is absent, our anonymization still might not be sufficient, so we have to decide what to do about that. I suspect it's often absent, so we might want to take no action.
We could add a flag so users can customize the behaviour. For example
… because we look for "Date" at the beginning or end of the element name:
UnboundLocalError: With tag (200b, 102b) got exception: local variable 'time_name' referenced before assignment
Traceback (most recent call last):
File "c:\program files\python37\lib\site-packages\pydicom\tag.py", line 30, in tag_in_exception
yield
File "c:\program files\python37\lib\site-packages\pydicom\dataset.py", line 1773, in walk
callback(self, data_element) # self = this Dataset
File "c:\program files\python37\lib\site-packages\dicognito\anonymizer.py", line 120, in _anonymize_element
if handler(dataset, data_element):
File "c:\program files\python37\lib\site-packages\dicognito\datetimeanonymizer.py", line 43, in __call__
self._anonymize_date_and_time(dataset, data_element)
File "c:\program files\python37\lib\site-packages\dicognito\datetimeanonymizer.py", line 61, in _anonymize_date_and_time
if time_name in dataset:
UnboundLocalError: local variable 'time_name' referenced before assignment
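The error means `time_name` is only assigned on some code paths. A sketch of a helper that always returns a result follows. It handles "Date" as either suffix or prefix (as in DateOfLastCalibration/TimeOfLastCalibration) and returns None otherwise. The name is hypothetical; a caller would then check `if time_keyword and time_keyword in dataset`:

```python
from typing import Optional


def paired_time_keyword(date_keyword: str) -> Optional[str]:
    """Map a date keyword to its time counterpart, e.g. StudyDate -> StudyTime.

    Handles "Date" as either suffix or prefix and returns None otherwise,
    so callers never use an unassigned name.
    """
    if date_keyword.endswith("Date"):
        return date_keyword[: -len("Date")] + "Time"
    if date_keyword.startswith("Date"):
        return "Time" + date_keyword[len("Date"):]
    return None
```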
Probably AppVeyor, I'm used to it.
Test against Python 2.7.
When I run dicognito with the -taco (or -:taco:) option, it should provide me with a delicious free taco.
Acceptable alternative: -sandwich
module | version |
---|---|
platform | Windows-10-10.0.18363-SP0 |
Python | 3.10.0 (tags/v3.10.0:b494f59, Oct 4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)] |
dicognito | 0.12.0 |
pydicom | 2.2.2 |
The error looks like this:
Traceback (most recent call last):
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 525, in _convert_value
val.append
AttributeError: 'str' object has no attribute 'append'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\valuerep.py", line 752, in __new__
newval = super().__new__(cls, val)
ValueError: invalid literal for int() with base 10: 'K5JR4D4YWZN7'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\tag.py", line 28, in tag_in_exception
yield
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataset.py", line 2382, in walk
callback(self, data_element) # self = this Dataset
File "D:\Sandbox\dicognito\src\dicognito\anonymizer.py", line 127, in _anonymize_element
if handler(dataset, data_element):
File "D:\Sandbox\dicognito\src\dicognito\idanonymizer.py", line 63, in __call__
if self._anonymize_mitra_global_patient_id(dataset, data_element):
File "D:\Sandbox\dicognito\src\dicognito\idanonymizer.py", line 78, in _anonymize_mitra_global_patient_id
self._replace_id(data_element)
File "D:\Sandbox\dicognito\src\dicognito\idanonymizer.py", line 87, in _replace_id
data_element.value = self._new_id(data_element.value)
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 463, in value
self._value = self._convert_value(val)
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 527, in _convert_value
return self._convert(val)
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 541, in _convert
return pydicom.valuerep.IS(val)
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\valuerep.py", line 755, in __new__
newval = super().__new__(cls, float(val))
ValueError: could not convert string to float: 'K5JR4D4YWZN7'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\blairyat\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\blairyat\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Sandbox\dicognito\src\dicognito\__main__.py", line 206, in <module>
main()
File "D:\Sandbox\dicognito\src\dicognito\__main__.py", line 182, in main
anonymizer.anonymize(dataset)
File "D:\Sandbox\dicognito\src\dicognito\anonymizer.py", line 121, in anonymize
dataset.walk(self._anonymize_element)
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataset.py", line 2380, in walk
with tag_in_exception(tag):
File "C:\Users\blairyat\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\tag.py", line 32, in tag_in_exception
raise type(exc)(msg) from exc
ValueError: With tag (0031, 1020) got exception: could not convert string to float: 'K5JR4D4YWZN7'
Traceback (most recent call last):
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 525, in _convert_value
val.append
AttributeError: 'str' object has no attribute 'append'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\valuerep.py", line 752, in __new__
newval = super().__new__(cls, val)
ValueError: invalid literal for int() with base 10: 'K5JR4D4YWZN7'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\tag.py", line 28, in tag_in_exception
yield
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataset.py", line 2382, in walk
callback(self, data_element) # self = this Dataset
File "D:\Sandbox\dicognito\src\dicognito\anonymizer.py", line 127, in _anonymize_element
if handler(dataset, data_element):
File "D:\Sandbox\dicognito\src\dicognito\idanonymizer.py", line 63, in __call__
if self._anonymize_mitra_global_patient_id(dataset, data_element):
File "D:\Sandbox\dicognito\src\dicognito\idanonymizer.py", line 78, in _anonymize_mitra_global_patient_id
self._replace_id(data_element)
File "D:\Sandbox\dicognito\src\dicognito\idanonymizer.py", line 87, in _replace_id
data_element.value = self._new_id(data_element.value)
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 463, in value
self._value = self._convert_value(val)
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 527, in _convert_value
return self._convert(val)
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\dataelem.py", line 541, in _convert
return pydicom.valuerep.IS(val)
File "D:\Sandbox\dicognito\.venv\lib\site-packages\pydicom\valuerep.py", line 755, in __new__
newval = super().__new__(cls, float(val))
ValueError: could not convert string to float: 'K5JR4D4YWZN7'
… because we use it to pick a patient name.
Anonymizing a Deflated Explicit VR Little Endian (1.2.840.10008.1.2.1.99) file results in dicognito erroring out with
Error occurred while converting <_io.BytesIO object at 0x0000022DD4CEA160>. Aborting.
Traceback (most recent call last):
File "E:\Dev\dicognito\.venv\dicognito\Lib\site-packages\pydicom\tag.py", line 28, in tag_in_exception
yield
…
The "<_io.BytesIO object at 0x0000022DD4CEA160>" should be the input filename.
Occurs whether using --output-directory or --in-place.
…not unlike in SelfInitializingFakes.
I think it would be more intuitive to point to a certain directory and anonymize everything in it without needing a star.
Hey guys, I am currently working on an anonymizing tool for DICOM files in TypeScript and really appreciate your great work. As I was thinking about what to anonymize and how, I found this table that defines a standardized way to anonymize DICOM files, and I was wondering: how did you decide which tags to alter with dicognito, and in which way? Did you follow the NEMA guidelines in some way?
Example:
(0018, 1200) Date of Last Calibration DA: ['19900101', '19900101']
(0018, 1201) Time of Last Calibration TM: ['010000.000000', '010000.000000']
gives
Traceback (most recent call last):
File "c:\program files\python37\lib\site-packages\pydicom\tag.py", line 30, in tag_in_exception
yield
File "c:\program files\python37\lib\site-packages\pydicom\dataset.py", line 1354, in walk
callback(self, data_element) # self = this Dataset
File "c:\program files\python37\lib\site-packages\dicognito\anonymizer.py", line 114, in _anonymize_element
if handler(dataset, data_element):
File "c:\program files\python37\lib\site-packages\dicognito\datetimeanonymizer.py", line 42, in __call__
self._anonymize_date_and_time(dataset, data_element)
File "c:\program files\python37\lib\site-packages\dicognito\datetimeanonymizer.py", line 51, in _anonymize_date_and_time
old_date = datetime.datetime.strptime(date_value, date_format).date()
TypeError: strptime() argument 1 must be str, not MultiValue
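pydicom exposes a multi-valued element as a MultiValue (a mutable sequence) rather than a str, so each value has to be parsed separately. Below is a sketch that accepts either form. The function name is hypothetical, and empty DA values are passed through unchanged by assumption:

```python
import datetime
from typing import List, Sequence, Union


def shift_dates(value: Union[str, Sequence[str]], offset_days: int) -> List[str]:
    """Shift one or more DA values (YYYYMMDD) by offset_days."""
    values = [value] if isinstance(value, str) else list(value)
    delta = datetime.timedelta(days=offset_days)
    shifted = []
    for text in values:
        if not text:
            shifted.append(text)  # leave empty values alone
            continue
        day = datetime.datetime.strptime(text, "%Y%m%d").date()
        shifted.append((day + delta).strftime("%Y%m%d"))
    return shifted
```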
When data on a given cohort is accumulated over long periods, users may wish to run dicognito in multiple passes in order to perform preliminary analyses on the partial dataset. It would be convenient to be able to checkpoint the Anonymizer state so that patients seen in previous dicognito runs over the same cohort would have matching anonymized IDs.
Two options occur to me:
Anonymizer
and save it to a pickle file. I think this would make starting from a checkpoint 'equivalent' to running in a single pass. It would have the disadvantage of adding another file with sensitive data to manage.
From the outside, it should be billed as a seed for randomness, even though inside Randomizer, it's used as a salt when hashing values.
e.g.
0031 0011 28 | private_creator | LO | 1 | "MITRA LINKED ATTRIBUTES 1.0"
0031 1120 10 | Unknown element | Unkn | ? | "GPIAPCB136"
Hey @blairconrad,
I noticed that in your datetimeanonymizer.py, both the _anonymize_date_and_time and _anonymize_datetime methods load the elements of a MultiValue into a list:
if isinstance(data_element.value, pydicom.multival.MultiValue):
datetimes = list([v for v in data_element.value])
but isn't the value of a MultiValue already a list?
Later you store the altered values back by joining them into a single string:
new_dates_string = "\\".join(new_dates)
data_element.value = new_dates_string
(lines 98-100)
data_element.value = "\\".join(new_datetimes)
(line 121)
but why? Shouldn't they be stored back as a list?
It's lying (at least) about us (only) supporting Python 3
if non-empty
Is there a way to explicitly exclude DICOM tags from anonymization? We use dicognito to anonymize exams for a trial, which works great, but we must retain some tags like StudyDate, SeriesDate or AcquisitionDate. Currently, I reset them programmatically after the anonymization, but I wonder if there is an option to leave those tags untouched by dicognito.
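The workaround described above can be wrapped in a small helper so it lives in one place. In this sketch, `anonymize_preserving` is hypothetical and `anonymize` is any callable that anonymizes in place (for example, one wrapping the `anonymizer.anonymize(dataset)` call seen in the tracebacks). It only restores top-level attributes. Keeping real dates also keeps whatever information they carry, and restored dates may no longer line up with shifted times:

```python
from typing import Any, Callable, Dict, Iterable


def anonymize_preserving(dataset: Any, keywords: Iterable[str],
                         anonymize: Callable[[Any], None]) -> None:
    """Run `anonymize(dataset)` but restore the listed top-level attributes.

    Works on any object with attribute access, such as a pydicom Dataset.
    Attributes that were absent beforehand are left as the anonymizer left them.
    """
    saved: Dict[str, Any] = {
        keyword: getattr(dataset, keyword)
        for keyword in keywords
        if hasattr(dataset, keyword)
    }
    anonymize(dataset)
    for keyword, value in saved.items():
        setattr(dataset, keyword, value)
```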
Try to anonymize (from the command line) a file that doesn't have an accession number. The anonymization works, but printing the summary fails:
λ dicognito .
Traceback (most recent call last):
File "C:\Program Files\Python37\Scripts\dicognito-script.py", line 11, in <module>
load_entry_point('dicognito==0.7.0', 'console_scripts', 'dicognito')()
File "c:\program files\python37\lib\site-packages\dicognito\__main__.py", line 102, in main
ConvertedStudy(dataset.AccessionNumber, dataset.PatientID, str(dataset.PatientName))
File "c:\program files\python37\lib\site-packages\pydicom\dataset.py", line 556, in __getattr__
return super(Dataset, self).__getattribute__(name)
AttributeError: 'FileDataset' object has no attribute 'AccessionNumber'
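The summary reads `dataset.AccessionNumber` directly, which raises when the attribute is missing. Below is a sketch of a tolerant version. This `ConvertedStudy` is a stand-in whose field names are guessed from the positional arguments in the traceback. pydicom's `Dataset.get(keyword, default)` returns the default instead of raising, and a plain dict behaves the same way:

```python
from typing import Any, NamedTuple


class ConvertedStudy(NamedTuple):  # hypothetical stand-in for dicognito's type
    accession_number: str
    patient_id: str
    patient_name: str


def summarize(dataset: Any) -> ConvertedStudy:
    """Collect summary fields, substituting "" for attributes that are absent."""
    return ConvertedStudy(
        str(dataset.get("AccessionNumber", "")),
        str(dataset.get("PatientID", "")),
        str(dataset.get("PatientName", "")),
    )
```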
I failed to install dicognito via pip3.