kitwaremedical / dicom-anonymizer
Tool to anonymize DICOM files according to the DICOM standard
License: BSD 3-Clause "New" or "Revised" License
I used the following JSON file:
{
    "(0x0010, 0x0010)": {
        "action": "regexp",
        "find": ".",
        "replace": "ID001^ID002"
    },
    "(0x0010, 0x0020)": {
        "action": "regexp",
        "find": ".",
        "replace": "ID003"
    }
}
In the anonymized DICOM files, the PatientID tag gets set to ID003ID003 (i.e. it is duplicated).
The PatientName tag is similarly duplicated and set to ID001^ID002ID001^ID002
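One possible explanation, assuming the `find` pattern was originally `.*` (the asterisk may have been lost in formatting) and that the regexp action is applied with Python's `re.sub`: since Python 3.7, `re.sub` also replaces the empty match that follows a non-empty one, so `.*` substitutes twice. A minimal sketch with a made-up value, not the tool's actual code:

```python
import re

value = "12345678"  # made-up PatientID, for illustration only

# ".*" matches the whole value, then matches the empty string at the end,
# so the replacement is inserted twice on Python 3.7 and later.
duplicated = re.sub(".*", "ID003", value)

# Anchoring the pattern (or using ".+") leaves no room for the second, empty match.
anchored = re.sub("^.*$", "ID003", value)
one_or_more = re.sub(".+", "ID003", value)

print(duplicated, anchored, one_or_more)
```

If that assumption holds, changing `find` to `^.*$` or `.+` in the JSON file should avoid the duplication.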
I'm seeing this:
$ dicom-anonymizer E29171854 anons
1%|▌ | 75/13460 [00:10<32:50, 6.79it/s]
Traceback (most recent call last):
File "/cm/shared/anaconda3/bin/dicom-anonymizer", line 33, in <module>
sys.exit(load_entry_point('dicom-anonymizer==1.0.9', 'console_scripts', 'dicom-anonymizer')())
File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/anonymizer.py", line 175, in main
File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/anonymizer.py", line 64, in anonymize
File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/simpledicomanonymizer.py", line 305, in anonymize_dicom_file
File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/simpledicomanonymizer.py", line 413, in anonymize_dataset
File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/simpledicomanonymizer.py", line 236, in delete_or_empty_or_replace_UID
File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/simpledicomanonymizer.py", line 142, in empty_element
File "/cm/shared/anaconda3/lib/python3.9/site-packages/dicom_anonymizer-1.0.9-py3.9.egg/dicomanonymizer/simpledicomanonymizer.py", line 144, in empty_element
NotImplementedError: Not anonymized. VR IS not yet implemented.
1%|▌
What can we do about this?
How can I change only one DICOM tag from the command line?
If I do:
dicom-anonymizer InputPath OutputPath --dictionary pathToMyDict.json
then all the tags are anonymized in addition to the tags I list in my dictionary.
Hello,
Thank you for this great project.
While using it in our codebase, I have found the following issue.
anonymize_dataset fails if the dataset contains a RawDataElement, because the assignment
element.value = new_value
fails. In my case, it happens in replace_element_UID (https://github.com/KitwareMedical/dicom-anonymizer/blob/master/dicomanonymizer/simpledicomanonymizer.py#L96), but I believe that all other rules have the same issue. I would be happy to create a PR to fix this issue.
I see two possibilities:
I'm partial to solution 1.:
- it's simple and easy to understand and review.
- it makes it easier for the user to add custom rules, since they can assume that all elements are DataElement and use the simpler element.value = new_value syntax.
- however, it also means that the input dataset is walked through twice. I feel that this price is worth paying.
Best
Guillaume
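To illustrate why the assignment fails, here is a minimal sketch that assumes the raw element behaves like an immutable named tuple. `RawElement`, `Element` and `to_mutable` are hypothetical stand-ins written for this example; they are not pydicom's classes, and the values are taken from the RawDataElement shown in a later report on this page.

```python
from collections import namedtuple
from dataclasses import dataclass

# Stand-ins for illustration only; these are NOT pydicom's classes.
RawElement = namedtuple("RawElement", ["tag", "VR", "value"])


@dataclass
class Element:
    tag: tuple
    VR: str
    value: object


def to_mutable(element):
    """Return a mutable Element, converting an immutable raw one if needed."""
    if isinstance(element, RawElement):
        return Element(element.tag, element.VR, element.value.decode().strip())
    return element


raw = RawElement((0x0040, 0xA010), "CS", b"HAS OBS CONTEXT ")

try:
    raw.value = "Anonymized"  # named tuples are read-only, so this raises
    assignment_failed = False
except AttributeError:
    assignment_failed = True

element = to_mutable(raw)
element.value = "Anonymized"  # works once the element is mutable
```

Converting every element up front costs an extra pass over the dataset, but each rule can then assign to `element.value` directly.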
Hi,
It would be great if you could also add a conda package in addition to the pip package. It would make constructing complex dependencies (especially those relying on C++ libraries) much easier.
I am happy to work on it if you want. If you could publish the sdist of this package on PyPI, I can take care of the rest.
Cheers,
Sarthak
I noticed that the anonymizer, when used via Python for a newer MR software version with modified DICOM header (modified with respect to the older versions), is significantly slower than for older MR software versions. I have attached an excerpt from the terminal:
100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 47.61it/s]
2023-08-23 09:00:00 dcm_anon INFO File 1.3.12.2.1107.5.2.36.40414.201712181.dcm with software version syngo MR B19.
100%|█████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.83it/s]
2023-08-23 09:01:51 dcm_anon INFO File 1.3.12.2.1107.5.2.50.176395.20230531.dcm with software version syngo MR XA50.
This is the same imaging sequence, just on a different scanner with a newer software version.
Dear KitwareMedical,
I have a DICOMDIR that still contains the PatientName after running it through the default anonymizer!
Using the version from 1690b78 (installed via pip install git+https...) and pydicom version 2.3.1, I encountered a DICOMDIR file that contains a DirectoryRecordSequence with patient data. It looks like this:
Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length UL: 180
(0002, 0001) File Meta Information Version OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID UI: Media Storage Directory Storage
(0002, 0003) Media Storage SOP Instance UID UI: 2.25.297764926861898021974262533209051862847
(0002, 0010) Transfer Syntax UID UI: Explicit VR Little Endian
(0002, 0012) Implementation Class UID UI: 1.2.276.0.45.1.1.0.71.20130122
(0002, 0013) Implementation Version Name SH: 'DicomWeb_71'
-------------------------------------------------
(0004, 1130) File-set ID CS: 'VISAGECS_MEDIA'
(0004, 1200) Offset of the First Directory Recor UL: 406
(0004, 1202) Offset of the Last Directory Record UL: 406
(0004, 1212) File-set Consistency Flag US: 0
(0004, 1220) Directory Record Sequence 1 item(s) ----
(0004, 1400) Offset of the Next Directory Record UL: 0
(0004, 1410) Record In-use Flag US: 65535
(0004, 1420) Offset of Referenced Lower-Level Di UL: 0
(0004, 1430) Directory Record Type CS: 'PATIENT'
(0008, 0005) Specific Character Set CS: 'ISO_IR 100'
(0010, 0010) Patient's Name PN: 'NOT^ANONYM'
(0010, 0020) Patient ID LO: '12345678'
(0010, 0030) Patient's Birth Date DA: '19000101'
---------
Note, I manually changed the patient data in order not to disclose anything here. The DirectoryRecordSequence had over 700 entries, so for debugging, I removed all but one entry to minimally demo the issue.
I ran dicomanonymizer.anonymize_dicom_file('DICOMDIR', 'DICOMDIR-anon') and expected the patient name etc. to be removed or replaced.
Instead, when I then read the result with pydicom.read_file('/DICOMDIR-anon'), I got:
Dataset.file_meta -------------------------------
(0002, 0000) File Meta Information Group Length UL: 180
(0002, 0001) File Meta Information Version OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID UI: Media Storage Directory Storage
(0002, 0003) Media Storage SOP Instance UID UI: 2.25.172643117625232586517094341815358543841
(0002, 0010) Transfer Syntax UID UI: Explicit VR Little Endian
(0002, 0012) Implementation Class UID UI: 1.2.276.0.45.1.1.0.71.20130122
(0002, 0013) Implementation Version Name SH: 'DicomWeb_71'
-------------------------------------------------
(0004, 1130) File-set ID CS: 'VISAGECS_MEDIA'
(0004, 1200) Offset of the First Directory Recor UL: 406
(0004, 1202) Offset of the Last Directory Record UL: 406
(0004, 1212) File-set Consistency Flag US: 0
(0004, 1220) Directory Record Sequence 1 item(s) ----
(0004, 1400) Offset of the Next Directory Record UL: 0
(0004, 1410) Record In-use Flag US: 65535
(0004, 1420) Offset of Referenced Lower-Level Di UL: 0
(0004, 1430) Directory Record Type CS: 'PATIENT'
(0008, 0005) Specific Character Set CS: 'ISO_IR 100'
(0010, 0010) Patient's Name PN: 'NOT^ANONYM'
(0010, 0020) Patient ID LO: '12345678'
(0010, 0030) Patient's Birth Date DA: '19000101'
---------
The Instance UID changed, but not the Patient data within the sequence (of len 1 here). I am not allowed to provide the full original file but I think this smaller one reproduces the issue.
Please feel free to include this file in your test suite.
Let me know if I can be helpful here.
Any quick fix idea would be welcome.
thanks,
Samuel
Currently, actions for tags in the meta information header (0x0002 group) are not applied. This fixes that by applying the action to the file_meta dataset instead.
#18
Hello,
Is it possible to anonymize all DT tags without having to specify each tag?
Thanks
How can I set one tag to a specific value (no regexp) from the command line?
e.g.
dicom-anonymizer InputPath OutputPath -t (0x0010,0x0010) DOE^JOHN
Hi!
I have started looking into the code and have sent one PR. I noticed that the current code base might be formatted with some formatter/linter other than ruff or black, which are common these days.
How about introducing one of them, as pydicom does, and making it this project's default?
Even today, some contributors have changed single quotes to double quotes around strings, which isn't consistent.
element.value = re.sub(options['find'], options['replace'], element.value)
fails when element is "Patient's name (0x0010,0x0010)"
The following works though:
element.value = re.sub(options['find'], options['replace'], str(element.value))
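A small sketch of why the `str()` conversion helps, assuming the element's value is a name object rather than a plain string. The `Name` class below is a hypothetical stand-in written for this example, not pydicom's own type.

```python
import re


class Name:
    """Hypothetical stand-in for a person-name object that is not a plain str."""

    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text


value = Name("DOE^JOHN")

try:
    re.sub("^.+$", "ID001^ID002", value)  # re.sub needs str or bytes
    raised = False
except TypeError:
    raised = True

# Converting to str first gives re.sub something it can search.
replaced = re.sub("^.+$", "ID001^ID002", str(value))
```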
See title. If not, any future plans to support this?
Thanks!
I've tried your example: https://github.com/KitwareMedical/dicom-anonymizer#customoverrides-actions, but I always get the error message:
NameError: name 'allTags' is not defined
Can someone explain this to me?
I would like to use the following function:
def setupSeriesDescription(dataset, tag, value):
    r'''
    Modify the series description by adding a suffix
    '''
    element = dataset.get(tag)
    if element is not None:
        element.value = element.value + '-' + value
and then use them as follows:
def anonymize_dicom(src_path, dst_path):
    # List the files' names that we want to extract data from
    dicom_files = glob.glob(os.path.join(src_path, "**", '*.dcm'), recursive=True)
    # Iterate over each DICOM file in the folder and read it using dcmread()
    for file_path in dicom_files:
        # dictionary which map your functions to a tag
        extraAnonymizationRules = {}
        if True:
            # series description
            extraAnonymizationRules[(0x0008, 0x103E)] = setupSeriesDescription
        # Launch the anonymization and delete all private tags
        dcm = anonymize(file_path, dst_path, extraAnonymizationRules, deletePrivateTags=True)
How can I now use the variable value to be able to append something to the series description that differs from dataset to dataset?
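One way to do this, assuming the anonymizer calls each rule as `action(dataset, tag)` (as the tracebacks on this page suggest), is to pre-fill the extra argument with `functools.partial` when building the rules for each file. In the sketch below, `Element`, `rules_for` and the plain dict used as a dataset are stand-ins for illustration only.

```python
from functools import partial


class Element:
    """Stand-in for a data element; only the 'value' attribute matters here."""

    def __init__(self, value):
        self.value = value


def setup_series_description(dataset, tag, suffix):
    """Append a suffix to the series description, if the tag is present."""
    element = dataset.get(tag)
    if element is not None:
        element.value = element.value + "-" + suffix


SERIES_DESCRIPTION = (0x0008, 0x103E)


def rules_for(suffix):
    # partial() pre-fills 'suffix', leaving a callable that takes (dataset, tag).
    return {SERIES_DESCRIPTION: partial(setup_series_description, suffix=suffix)}


# A plain dict plays the role of the dataset in this sketch.
dataset = {SERIES_DESCRIPTION: Element("T1 axial")}
for tag, action in rules_for("site42").items():
    action(dataset, tag)
```

Inside the loop over files, calling `rules_for(...)` with a different suffix per file would give each dataset its own value.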
A --version flag would be useful.
Dear all,
Here is my DICOM file:
https://sourceforge.net/p/gdcm/gdcmdata/ci/2bddc5695f2482ee3f4d92db7de2348b816fe64c/tree/MR-SIEMENS-DICOM-WithOverlays.dcm
I run:
dicom-anonymizer MR-SIEMENS-DICOM-WithOverlays.dcm output.dcm
I got these error messages:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 763, in get
key = Tag(key)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/tag.py", line 79, in Tag
raise ValueError("Tag must be an int or a 2-tuple")
ValueError: Tag must be an int or a 2-tuple
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/tag.py", line 28, in tag_in_exception
yield
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 2216, in walk
callback(self, data_element) # self = this Dataset
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 387, in range_callback
action(dataset, tag)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 164, in delete
element = dataset.get(tag)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 765, in get
raise TypeError("Dataset.get key must be a string or tag") from exc
TypeError: Dataset.get key must be a string or tag
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/bin/dicom-anonymizer", line 8, in
sys.exit(main())
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/anonymizer.py", line 161, in main
anonymize(input_path, output_path, new_anonymization_actions, not args.keepPrivateTags)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/anonymizer.py", line 50, in anonymize
anonymize_dicom_file(input_files_list[cpt], output_files_list[cpt], anonymization_actions, deletePrivateTags)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 295, in anonymize_dicom_file
anonymize_dataset(dataset, extra_anonymization_rules, delete_private_tags)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 393, in anonymize_dataset
dataset.walk(range_callback)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 2222, in walk
dataset.walk(callback)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in exit
self.gen.throw(type, value, traceback)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/tag.py", line 32, in tag_in_exception
raise type(exc)(msg) from exc
TypeError: With tag (6000, 3000) got exception: Dataset.get key must be a string or tag
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 763, in get
key = Tag(key)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/tag.py", line 79, in Tag
raise ValueError("Tag must be an int or a 2-tuple")
ValueError: Tag must be an int or a 2-tuple
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/tag.py", line 28, in tag_in_exception
yield
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 2216, in walk
callback(self, data_element) # self = this Dataset
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 387, in range_callback
action(dataset, tag)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 164, in delete
element = dataset.get(tag)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydicom/dataset.py", line 765, in get
raise TypeError("Dataset.get key must be a string or tag") from exc
TypeError: Dataset.get key must be a string or tag
Please help, thank you.
Part 16 CID 7050 lists various De-identification Methods:
Apart from the methods that impact pixel data cleaning, the rest of the methods are documented in the Application Level Confidentiality Profile Attributes (Part 15 Table E.1-1):
Feature request: dicom-anonymizer could allow entering a list of the methods/attributes to use as presets and override the basic profile.
The following code in examples/anonymize_extra_rules.py encourages incorrect anonymization by not removing the series description. That tag should be removed! We should create the example using a different tag:
def setup_series_description(dataset, tag):
    element = dataset.get(tag)
    if element is not None:
        element.value = f'{element.value}-{args.suffix}'
I'm seeing:
(useful) ddrucker@mic-dicom-router-mercure:~$ dicom-anonymizer E12034344 anon
0%| | 0/3612 [00:00<?, ?it/s]/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '1.1.16.7.6707.3.3.60.06253.2332294204858050509242727.5.1.9'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
warnings.warn(msg)
/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '7.6.013.026981.5934635445.0220487836.2'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
warnings.warn(msg)
/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '5.1.59.6.4683.0.4.60.82574.0199929790689920395106101.5.8.8'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
warnings.warn(msg)
/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '3.9.59.4.0298.8.4.72.68889.658132895322857682744131'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
warnings.warn(msg)
0%| | 2/3612 [00:00<03:52, 15.55it/s]/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '0.2.08.6.9364.4.6.39.26563.3829691537983094193002813.7.1.1'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
warnings.warn(msg)
0%| | 5/3612 [00:00<02:50, 21.12it/s]/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '5.6.63.5.3208.2.0.67.68311.0344948525748214621517980'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
warnings.warn(msg)
0%| | 8/3612 [00:00<02:32, 23.65it/s]/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '5.3.40.7.9556.8.5.07.50254.3682606057412565359545607'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
warnings.warn(msg)
/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '2.3.42.7.0184.5.4.49.35639.4066135185099479761379104'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
warnings.warn(msg)
/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/valuerep.py:290: UserWarning: Invalid value for VR UI: '8.4.08.0.3756.8.9.48.29829.1828695589432128234084115'. Please see <https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_6.2-1> for allowed values for each VR.
warnings.warn(msg)
0%|▏ | 12/3612 [00:00<02:09, 27.87it/s]Traceback (most recent call last):
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 762, in get
key = Tag(key)
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/tag.py", line 84, in Tag
raise ValueError("Tag must be an int or a 2-tuple")
ValueError: Tag must be an int or a 2-tuple
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/tag.py", line 28, in tag_in_exception
yield
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 2390, in walk
callback(self, data_element) # self = this Dataset
File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 387, in range_callback
action(dataset, tag)
File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 164, in delete
element = dataset.get(tag)
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 764, in get
raise TypeError("Dataset.get key must be a string or tag") from exc
TypeError: Dataset.get key must be a string or tag
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ddrucker/useful/bin/dicom-anonymizer", line 8, in <module>
sys.exit(main())
File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/anonymizer.py", line 161, in main
anonymize(input_path, output_path, new_anonymization_actions, not args.keepPrivateTags)
File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/anonymizer.py", line 50, in anonymize
anonymize_dicom_file(input_files_list[cpt], output_files_list[cpt], anonymization_actions, deletePrivateTags)
File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 295, in anonymize_dicom_file
anonymize_dataset(dataset, extra_anonymization_rules, delete_private_tags)
File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 393, in anonymize_dataset
dataset.walk(range_callback)
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 2396, in walk
dataset.walk(callback)
File "/usr/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/tag.py", line 32, in tag_in_exception
raise type(exc)(msg) from exc
TypeError: With tag (6000, 3000) got exception: Dataset.get key must be a string or tag
Traceback (most recent call last):
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 762, in get
key = Tag(key)
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/tag.py", line 84, in Tag
raise ValueError("Tag must be an int or a 2-tuple")
ValueError: Tag must be an int or a 2-tuple
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/tag.py", line 28, in tag_in_exception
yield
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 2390, in walk
callback(self, data_element) # self = this Dataset
File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 387, in range_callback
action(dataset, tag)
File "/home/ddrucker/useful/lib/python3.8/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 164, in delete
element = dataset.get(tag)
File "/home/ddrucker/useful/lib/python3.8/site-packages/pydicom/dataset.py", line 764, in get
raise TypeError("Dataset.get key must be a string or tag") from exc
TypeError: Dataset.get key must be a string or tag
0%|▏ | 12/3612 [00:00<02:29, 24.13it/s]
(useful) ddrucker@mic-dicom-router-mercure:~$
Found while unit testing with test-SR.dcm from pydicom's test files:
dicomanonymizer\simpledicomanonymizer.py:440: in anonymize_dataset
action(dataset, tag)
dicomanonymizer\simpledicomanonymizer.py:134: in replace
replace_element(element)
dicomanonymizer\simpledicomanonymizer.py:122: in replace_element
replace_element(sub_element)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
element = RawDataElement(tag=(0040, a010), VR='CS', length=16, value=b'HAS OBS CONTEXT ', value_tell=16, is_implicit_VR=False, is_little_endian=True, is_raw=True)
def replace_element(element):
"""
Replace element's value according to it's VR:
- LO, LT, SH, PN, CS, ST, UT: replace with 'Anonymized'
- UI: cf replace_element_UID
- DS and IS: value will be replaced by '0'
- FD, FL, SS, US, SL, UL: value will be replaced by 0
- DA: value will be replaced by '00010101'
- DT: value will be replaced by '00010101010101.000000+0000'
- TM: value will be replaced by '000000.00'
- UN: value will be replaced by b'Anonymized' (binary string)
- SQ: call replace_element for all sub elements
See https://laurelbridge.com/pdf/Dicom-Anonymization-Conformance-Statement.pdf
"""
if element.VR in ('LO', 'LT', 'SH', 'PN', 'CS', 'ST', 'UT'):
> element.value = 'Anonymized'
E AttributeError: can't set attribute
dicomanonymizer\simpledicomanonymizer.py:108: AttributeError
If pointed at a directory which contains any non-dicom files, dicom-anonymizer dies with:
Traceback (most recent call last):
File "/usr/local/bin/dicom-anonymizer", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/site-packages/dicomanonymizer/anonymizer.py", line 161, in main
anonymize(input_path, output_path, new_anonymization_actions, not args.keepPrivateTags)
File "/usr/local/lib/python3.6/site-packages/dicomanonymizer/anonymizer.py", line 50, in anonymize
anonymize_dicom_file(input_files_list[cpt], output_files_list[cpt], anonymization_actions, deletePrivateTags)
File "/usr/local/lib/python3.6/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 298, in anonymize_dicom_file
dataset = pydicom.dcmread(in_file)
File "/usr/local/lib/python3.6/site-packages/pydicom/filereader.py", line 888, in dcmread
force=force, specific_tags=specific_tags)
File "/usr/local/lib/python3.6/site-packages/pydicom/filereader.py", line 670, in read_partial
preamble = read_preamble(fileobj, force)
File "/usr/local/lib/python3.6/site-packages/pydicom/filereader.py", line 623, in read_preamble
raise InvalidDicomError("File is missing DICOM File Meta Information "
pydicom.errors.InvalidDicomError: File is missing DICOM File Meta Information header or the 'DICM' prefix is missing from the header. Use force=True to force reading.
I'd like it to either just throw a warning by default, or if that's considered too dangerous, at least have a flag that lets us do it.
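As a rough sketch of the "skip instead of dying" behaviour, a pre-check could look for the 'DICM' marker that follows the 128-byte preamble in a DICOM Part 10 file. `looks_like_dicom` and `split_dicom_files` are hypothetical helper names, not part of dicom-anonymizer. Note that files lacking the preamble (the ones pydicom reads only with force=True) would also be skipped by this check.

```python
import os
import tempfile


def looks_like_dicom(path):
    """Return True if the file has the 'DICM' marker after a 128-byte preamble."""
    with open(path, "rb") as handle:
        header = handle.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"


def split_dicom_files(paths):
    """Partition paths into (probably DICOM, skipped) without raising."""
    keep, skipped = [], []
    for path in paths:
        (keep if looks_like_dicom(path) else skipped).append(path)
    return keep, skipped


# Tiny demonstration with two throwaway files.
with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "image.dcm")
    bad = os.path.join(folder, "notes.txt")
    with open(good, "wb") as handle:
        handle.write(b"\x00" * 128 + b"DICM")
    with open(bad, "wb") as handle:
        handle.write(b"not a DICOM file")
    keep, skipped = split_dicom_files([good, bad])
    keep_names = [os.path.basename(p) for p in keep]
    skipped_names = [os.path.basename(p) for p in skipped]
```

The skipped list could then be reported as warnings at the end of the run.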
I was trying this project on Python 3.12 and discovered that it has removed the bundled setuptools and with it pkg_resources. This breaks the recent version information change. What is the right way to fix this? I see these paths:
- Add setuptools to the list of required packages to keep supporting old versions of Python.
- Use importlib.metadata to import version. This was probably "provisional" for 3.8 and 3.9 and became stable in 3.10.
I volunteer to make these changes, but need guidance!
The strategy question: what is your recommendation for maintaining compatibility with older versions of Python?
It seems the new version of pydicom will require at least Python 3.10.
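A possible middle ground, sketched under the assumption that the package only needs to look up its own version string: prefer `importlib.metadata` and fall back to `pkg_resources` only on interpreters that lack it. `get_version` is a hypothetical helper name.

```python
def get_version(distribution, default="unknown"):
    """Return the installed version of a distribution, or a default."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python < 3.8: importlib.metadata does not exist
        try:
            import pkg_resources

            return pkg_resources.get_distribution(distribution).version
        except Exception:
            return default
    try:
        return version(distribution)
    except PackageNotFoundError:
        return default


missing = get_version("surely-not-an-installed-distribution-xyz")
```

Calling `get_version("dicom-anonymizer")` would then work on Python 3.12 without requiring setuptools at runtime.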
I found this while looking at pytest failures for the "color3d_jpeg_baseline.dcm" file from pydicom's test files. The element (0020,0244) was not getting deleted. The tag is listed in X_TAGS.
The code for delete_element calls replace_element_date for VR=='DA', even if the element is supposed to be deleted. I cannot figure out why that is all right. Please help!
In addition to 'X', I see 'K' and 'C' listed in that row of PS 3.15 Table E.1-1. What do these mean?
test_cli.py
pydicom/data/data_manager.py:112: in get_external_sources
from pkg_resources import iter_entry_points
E ModuleNotFoundError: No module named 'pkg_resources'
pydicom. The latest version does not use pkg_resources, changed in commit f5eeee.
Hi, I'm just starting to investigate the use of this tool, as we have been relying on https://mircwiki.rsna.org/index.php?title=MIRC_CTP_Articles for years now, but it has some shortcomings that have led me to explore alternatives. In particular, it is not being very actively developed anymore, and it has been extended to support a huge number of use cases over the years. This makes it far more powerful than what we need for most of our data submitters, which also results in a lot more complexity in using it. It would be great if I could set up a meeting with someone from the Kitware team to understand your short- and long-term plans for maintaining this repository and discuss potential collaboration opportunities.
I love that you've approached this by mirroring the different de-id profiles and options defined in the DICOM standard. However, it doesn't appear that you currently support the "Retain Longitudinal With Modified Dates Option" if you only support keeping or deleting dates. Let me know if I've missed something, but this is pretty critical to most de-identification use cases. Dates are PHI (so you can't keep them), but deleting them entirely generates useless DICOM, since you lose all understanding of the various timepoints for your patients.
CTP has a variety of approaches to this which you may want to emulate. The DateInterval and the IncrementDate functions are the ones we use most so I would advocate them as the best candidates to implement in dicom-anonymizer.
In any case, hope we can discuss further sometime soon.
Best,
Justin
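To make the date-shifting idea concrete, here is a minimal sketch of shifting DA values by a fixed offset so that intervals between timepoints are preserved. It is not CTP's implementation and not an existing dicom-anonymizer feature; `shift_da`, the offset and the dates are made up for illustration.

```python
from datetime import datetime, timedelta


def shift_da(value, days):
    """Shift a DICOM DA value ('YYYYMMDD') by a number of days."""
    shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=days)
    return shifted.strftime("%Y%m%d")


offset = -30  # hypothetical per-patient offset, kept constant across studies
baseline = shift_da("20230531", offset)
follow_up = shift_da("20230823", offset)
```

Because the same offset is applied to every date of a patient, the 84 days between the two made-up studies are still 84 days after shifting.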
It looks like 1.0.12 was released a few days ago, but it is still not on PyPI.
In dicomfields, you have three tags with 4 indices:
E.g.
(0x5000, 0x0000, 0xFF00, 0x0000), # Curve Data
but pydicom.dataset.get() has the following interface:
get(key: Union[int, Tuple[int, int], pydicom.tag.BaseTag], default: Optional[object] = 'None')
so it doesn't recognize 4 ints as a tag.
TypeError: Dataset.get key must be a string or tag
Are those three special tags common? :)
Edit:
I see that they are deleted without calling get(). But in anonymize_data() you call .get() on it and then print when it fails. So that happens for every file. So perhaps that part should not be run for tags with len > 2?
Edit 2:
I refined my idea for a solution in PR #18.
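The "skip tags with len > 2" idea can be sketched as a simple filter applied before any lookup. This is only an illustration of the suggestion above, not the code from the PR; `is_plain_tag` is a hypothetical helper name.

```python
def is_plain_tag(tag):
    """True for a (group, element) pair, False for 4-value range entries."""
    return len(tag) == 2


tags = [
    (0x0010, 0x0010),                  # Patient's Name
    (0x5000, 0x0000, 0xFF00, 0x0000),  # Curve Data (range entry)
]

# Only plain tags are safe to pass to a lookup such as dataset.get(tag).
lookup_tags = [tag for tag in tags if is_plain_tag(tag)]
range_tags = [tag for tag in tags if not is_plain_tag(tag)]
```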
Hi there!
I'd like to propose making get_UID pluggable so that users can implement their own version without affecting others.
In certain cases, it is necessary to keep track of UID changes by recording the UIDs before and after modifications over a series of anonymizations done in separate processes. Additionally, some users might want to use a specific prefix to generate their own UIDs. The current implementation doesn't allow this, since get_UID is tightly coupled with the code and its dictionary is volatile.
If this proposal is acceptable, I am willing to raise a PR for review. I would like to hear feedback on whether this change is agreeable.
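To make the proposal more concrete, here is a minimal sketch of what a pluggable generator could look like. `UIDMapper` is a hypothetical class, not an existing dicom-anonymizer API; it assumes new UIDs are derived from a random UUID under the `2.25.` root by default, and that the mapping can be handed from one process to the next.

```python
import uuid


class UIDMapper:
    """Hypothetical pluggable UID generator that remembers every mapping."""

    def __init__(self, prefix="2.25.", mapping=None):
        self.prefix = prefix
        self.mapping = dict(mapping or {})  # old UID -> new UID

    def __call__(self, old_uid):
        if old_uid not in self.mapping:
            # uuid4().int has at most 39 digits; truncate to stay within 64 chars.
            new_uid = (self.prefix + str(uuid.uuid4().int))[:64]
            self.mapping[old_uid] = new_uid
        return self.mapping[old_uid]


get_uid = UIDMapper()
first = get_uid("1.2.3.4")
again = get_uid("1.2.3.4")
other = get_uid("1.2.3.5")

# A second process could be seeded with a previously saved mapping.
restored = UIDMapper(mapping=get_uid.mapping)
```

The mapping is a plain dict, so it could be saved with `json.dump` between runs. A custom prefix would be passed as `UIDMapper(prefix=...)`.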
dicom-anonymizer seems to be deleting some dicom fields, even if asked to keep them:
$ dcmdump input.dcm | grep 0029,0010
(0029,0010) LO [SIEMENS CSA HEADER] # 18, 1 PrivateCreator
$ dicom-anonymizer input.dcm output.dcm -t '(0x0029,0x1010)' keep
100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 18.67it/s]
$ dcmdump output.dcm | grep 0029,0010
$
This is super odd.
When I anonymize an SRe file (SRe.1.3.12.2.1107.5.2.43.66094.30000020021213353802600000201), I get:
Traceback (most recent call last):
File "/home/ddrucker/venvs/test/bin/dicom-anonymizer", line 10, in <module>
sys.exit(main())
File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/anonymizer.py", line 144, in main
anonymize(InputPath, OutputPath, newAnonymizationActions)
File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/anonymizer.py", line 41, in anonymize
anonymizeDICOMFile(inputFilesList[cpt], outputFilesList[cpt], anonymizationActions)
File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 241, in anonymizeDICOMFile
anonymizeDataset(dataset, extraAnonymizationRules)
File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 253, in anonymizeDataset
action(dataset, tag)
File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 126, in delete
deleteElement(dataset, element) # element.tag is not the same type as tag.
File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 110, in deleteElement
deleteElement(subDataset, subElement)
File "/home/ddrucker/venvs/test/lib/python3.7/site-packages/dicomanonymizer/simpledicomanonymizer.py", line 109, in deleteElement
for subElement in subDataset.elements():
AttributeError: 'int' object has no attribute 'elements'
Now, here's the odd part.
Here's the code where it dies:
def deleteElement(dataset, element):
if element.VR == 'DA':
replaceElementDate(element)
elif element.VR == 'SQ':
for subDataset in element.value:
for subElement in subDataset.elements(): ### dying here
deleteElement(subDataset, subElement)
else:
del dataset[element.tag]
So I added a print statement:
def deleteElement(dataset, element):
if element.VR == 'DA':
replaceElementDate(element)
elif element.VR == 'SQ':
for subDataset in element.value:
print(element.value) ### add this line
for subElement in subDataset.elements():
deleteElement(subDataset, subElement)
else:
del dataset[element.tag]
And now it doesn't fail - it works fine.
My best guess is that printing element.value forces an enumeration which actually changes it - like maybe something without a value is forced to have an empty one instead?
Part 15 E.1-1 table differs greatly between the 2013 standard (referenced in the README.md and listed in dicomfields.py) and the current standard:
Notes: