suyashkumar / dicom
⚡High Performance DICOM Medical Image Parser in Go.
License: MIT License
Proto might look something like
// Just a general idea:
message DataSet {
  repeated Element elements = 1;
}
message Element {
  VR vr = 1;
  bytes tag = 2;
  oneof value {
    Sequence sequence = 3;
    int64 integer = 4; // or maybe consider breaking out all the various possible int types
    // Note: protobuf does not allow repeated fields directly inside a oneof,
    // so the list variants below would need wrapper messages in practice.
    repeated int64 integers = 5;
    float flt = 6;
    repeated float flts = 7;
    bytes bytes_value = 8;
    string string_value = 9;
    repeated string strings_list = 10;
  }
}
message Sequence ...
// more to follow, general idea here.
I'm getting this error many times when running over our data collections. It looks like several of our DICOMs trigger it.
Much of the high level parse logic comes from the upstream fork, but there are some ways this should be simplified (recursion instead of looping for sequence elements, more traditional golang error handling, a nicer interface, etc).
Don't have more details right now (just repasting some error messages here from our last run over many Ms of DICOMs), but essentially some of our DICOMs can't be decoded b/c of the "EOF" message. Will dig deeper into the source code later.
It would be nice to deal with a single unified Frame interface that wraps Encapsulated and Native frame data. The main thing it would do is abstract away a GetImage method that just returns a standard image.Image no matter what the underlying frame data.
Something like
type Frame interface {
	// GetDefaultImage returns this frame as a standard golang image.Image
	GetDefaultImage() image.Image
	GetImage(opts frame.ImageOptions) image.Image
	// Below methods needed to access raw frame data?
	GetNativeFrame() (NativeFrame, error)
	GetEncapsulatedFrame() (EncapsulatedFrame, error)
	IsEncapsulated() bool
}
However there are many considerations:
How to handle options for GetImage (more relevant for Native Frames). Should we just have a GetDefaultImage? Users can also manipulate the image directly, but assumptions will need to be made about bit depth and such at that point.
Users should still be able to get at the underlying NativeFrame and EncapsulatedFrame objects if they want to do their own post-processing or image rendering.
The current top level dicom API is pretty crowded with many public functions, interfaces, and struct definitions (and has historically grown more crowded across forks). It is reasonable to group some of these into subpackages.
~/go/src/dicom# make
dep ensure
dep: WARNING: Unknown field in manifest: prune
The dicoms provided by https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection are not standardized to have MetaElementGroupLength come first. The pydicom library, which the dataset was probably intended for use with, accounts for this by reading until the first non-0x0002 tag then rewinding.
Example error output with the RSNA dataset (error is returned from dicom.NewParserFromBytes()):
2020/01/01 21:39:41 error processing stage_2_train/ID_014d9a502.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014ddb831.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014dfa44a.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014e0a593.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
2020/01/01 21:39:41 error processing stage_2_train/ID_014e24cfc.dcm: failed to create dicom parser: MetaElementGroupLength not found; insteadfound (0002,0002) (file offset 166)
In #21, there were some major refactors, moving certain entities into standalone packages, which did introduce some stutter. While there are examples of this in the std lib (context.Context), if there's some way to group or name packages so that this reads better, that would be preferable.
The dicomutil CLI utility does not currently autoscale colors and intensities, but should probably do so.
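One simple form of intensity autoscaling is a min-max stretch to the full 16-bit range. The sketch below is an illustration only: `autoscale` is a hypothetical helper, not part of dicomutil, and it ignores DICOM windowing attributes and color images entirely.

```go
package main

import "fmt"

// autoscale (hypothetical helper) linearly stretches raw pixel intensities
// so that the smallest value maps to 0 and the largest to 65535.
func autoscale(pixels []int) []uint16 {
	out := make([]uint16, len(pixels))
	if len(pixels) == 0 {
		return out
	}
	lo, hi := pixels[0], pixels[0]
	for _, p := range pixels {
		if p < lo {
			lo = p
		}
		if p > hi {
			hi = p
		}
	}
	if hi == lo {
		return out // flat image: nothing to stretch, leave it all zero
	}
	span := float64(hi - lo)
	for i, p := range pixels {
		// The float-to-integer conversion truncates toward zero.
		out[i] = uint16(float64(p-lo) / span * 65535)
	}
	return out
}

func main() {
	fmt.Println(autoscale([]int{100, 150, 200}))
}
```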
Lots of opportunity to continue refactoring this code base!
The cli should be modified to be able to operate over a set of DICOMs that reside in a provided directory tree.
One could imagine using dicomutil to iterate over a set of DICOMs to process them in some way--say, converting them into tf.Examples, dumping their metadata to CSV, or writing out DICOM frames to disk as jpgs.
A deterministic output structure would be required--a possibility is dumping the outputs in a separate output folder that would mimic the same directory structure as the input folder.
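The mirrored output layout described above mostly comes down to a path mapping. A minimal sketch, assuming a hypothetical helper `mirrorPath` (not an existing dicomutil function) that would be called for each file found by something like `filepath.WalkDir`:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// mirrorPath (hypothetical helper) maps a file under inRoot to the same
// relative location under outRoot, replacing its extension with newExt.
func mirrorPath(inRoot, outRoot, inPath, newExt string) (string, error) {
	rel, err := filepath.Rel(inRoot, inPath)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%s is not inside %s", inPath, inRoot)
	}
	rel = strings.TrimSuffix(rel, filepath.Ext(rel)) + newExt
	return filepath.Join(outRoot, rel), nil
}

func main() {
	out, err := mirrorPath("in", "out", filepath.Join("in", "study1", "scan.dcm"), ".jpg")
	fmt.Println(out, err)
}
```

Because the output path depends only on the input path, reruns over the same tree produce the same layout, which is the deterministic property asked for above.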
Need to look into why we are setting global errors on the Decoder instead of following a return error pattern.
Another issue when scanning many DICOMs:
Encountered odd length (vl=13945) when reading explicit VR SQ for tag (6003,1010)[private] (file offset 1752)
Encountered odd length (vl=27975) when reading explicit VR SQ for tag (6003,1010)[private] (file offset 1752)
Encountered odd length (vl=286383071) when reading implicit VR 'UN' for tag (1111,11e3)[private] (file offset 5382)
Encountered odd length (vl=419395) when reading explicit VR OW for tag (6000,3000)[??] (file offset 13496)
I would like to explore the options for building a GUI client as well as annotated tools in golang.
Do you have any more info about the jpg files? Are they in some special format?
https://github.com/suyashkumar/dicom/blob/master/dicomutil/dicomutil.go#L132
I've hit this issue with many DICOM files coming from 2 different providers/countries.
PyDICOM works fine in this case.
Looks like pydicom considers it as a normal encoding. I think we should too.
python_encoding = {
# default character set for DICOM
'': default_encoding,
# alias for latin_1 too (iso_ir_6 exists as an alias to 'ascii')
'ISO_IR 6': default_encoding,
'ISO_IR 13': 'shift_jis',
'ISO_IR 100': 'latin_1',
(edit): Same issue with: Unknown character set 'ISO_IR 192'. Assuming utf-8 (file offset 348)
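A Go equivalent of that lookup table might be sketched as below. The names `charsetNames`, `lookupCharset`, and `defaultEncoding` are hypothetical, the encoding labels are plain strings rather than real decoders, and the default of `latin_1` is an assumption taken from the comment in the pydicom excerpt. The `ISO_IR 192` entry follows the parser's own fallback message, which already assumes utf-8.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultEncoding is an assumed label for the DICOM default character set.
const defaultEncoding = "latin_1"

// charsetNames maps Specific Character Set terms to encoding labels.
// Only a small illustrative subset is listed.
var charsetNames = map[string]string{
	"":           defaultEncoding,
	"ISO_IR 6":   defaultEncoding,
	"ISO_IR 13":  "shift_jis",
	"ISO_IR 100": "latin_1",
	"ISO_IR 192": "utf-8",
}

// lookupCharset (hypothetical helper) returns the encoding label for a
// term and reports whether the term was known. Unknown terms fall back to
// the default so that callers can warn without failing the whole parse.
func lookupCharset(term string) (name string, known bool) {
	name, known = charsetNames[strings.TrimSpace(term)]
	if !known {
		return defaultEncoding, false
	}
	return name, true
}

func main() {
	fmt.Println(lookupCharset("ISO_IR 192"))
	fmt.Println(lookupCharset("ISO_IR 6 "))
	fmt.Println(lookupCharset("something else"))
}
```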
In NewParserFromFile the optional error from NewParser should be checked prior to assigning parser.file in case the returned parser is nil (see here).
or comment here and let us know your use cases and if you're using this for research, and if there are any features you might want!
At some point there seems to have been an inadvertent find and replace that snuck in to change %d to %decoder. Example:
Line 103 in 668a92a
Triage old issues I opened over at https://github.com/gradienthealth/dicom/issues and migrate as appropriate.
Benchmarks.
A potentially useful feature for the included CLI (and library) could be converting DICOMs (frames + metadata) into neat tf.Example protocol buffers for downstream use in TensorFlow.
Memory usage keeps growing when I use this in WebAssembly.
Is this a bug?
Results after too many dicom files (in my case 1024) have attempted to be opened and failed.
The issue seems to result from the function NewParserFromFile() in Parse.go. Each time the function is called, it opens a new file, but that file is not closed in the function or when the function returns; it is only closed indirectly later, by the code that runs after p.(*parser).file = file. As a result, files that incur an error are never closed because the function returns before that line, and once the maximum number of errored files are open, the program spits out this error for every subsequent file.
I'm in the process of submitting a fix. I'll add a pull request soon.
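The general shape of such a fix is to close the file on every error path and to attach it to the parser only after construction succeeds. The sketch below is not the actual patch: `parser`, `newParser`, and `newParserFromFile` are simplified stand-ins for the library's types and functions.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// parser is a simplified stand-in for the library's parser type.
type parser struct {
	file *os.File
}

// newParser is a hypothetical constructor that can fail, standing in for
// the real NewParser. Here it fails if a 4-byte header cannot be read.
func newParser(r io.Reader) (*parser, error) {
	hdr := make([]byte, 4)
	if _, err := io.ReadFull(r, hdr); err != nil {
		return nil, fmt.Errorf("reading header: %w", err)
	}
	return &parser{}, nil
}

// newParserFromFile opens path and builds a parser from it. The file is
// closed on every error path, and the returned parser is only touched
// after the error check, so it is never dereferenced while nil.
func newParserFromFile(path string) (*parser, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	p, err := newParser(f)
	if err != nil {
		f.Close() // avoid leaking the descriptor when parsing fails
		return nil, err
	}
	p.file = f
	return p, nil
}

func main() {
	_, err := newParserFromFile("does-not-exist.dcm")
	fmt.Println("error:", err)
}
```

On success the caller still owns the open file through the parser and is responsible for closing it.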
Line 33 in b201728
I think it should be
i.SetGray16(j%n.Cols, j/n.Cols, color.Gray16{Y: uint16(n.Data[j][0])})
Is it right?
ERROR ReadBytes: requested 2, available 0 (file offset 65274)
I haven't had a chance to look into it yet, but I've attached 3 example files and will try to get more info sometime soon.
Data Citation:
Kinahan, Paul; Muzi, Mark; Bialecki, Brian; Coombs, Laura. (2017). Data from ACRIN-FLT-Breast. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2017.ol20zmxg
When encoding an element, a VR not matching what is specified in the standard triggers an error.
For tags where the standard is ambiguous or allows more than one VR type, this is problematic. For example, (0028,0120) Pixel Padding Value somewhat frequently appears as SS, rather than US, which will trigger an error.
One way to solve this would be to allow encoding of elements without VR validation, which might also be useful in situations where writing non-compliant DICOM files is necessary.
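Both ideas, accepting several VRs for one tag and skipping validation on request, can be sketched together. Everything here is hypothetical: the `tag` type, the `allowedVRs` table (a two-entry excerpt), and `checkVR` are illustrations, not the library's encoder API.

```go
package main

import "fmt"

// tag is a simplified stand-in for a DICOM tag.
type tag struct {
	group, element uint16
}

// allowedVRs is a tiny illustrative excerpt of a tag-to-VR table in which
// a tag may list more than one acceptable VR.
var allowedVRs = map[tag][]string{
	{0x0028, 0x0120}: {"US", "SS"}, // Pixel Padding Value
	{0x0010, 0x0010}: {"PN"},       // Patient's Name
}

// checkVR (hypothetical helper) reports an error when vr is not among the
// VRs listed for t. With skipValidation set, any VR is accepted.
func checkVR(t tag, vr string, skipValidation bool) error {
	if skipValidation {
		return nil
	}
	allowed, ok := allowedVRs[t]
	if !ok {
		return nil // unknown tag: nothing to validate against
	}
	for _, a := range allowed {
		if a == vr {
			return nil
		}
	}
	return fmt.Errorf("VR %q not allowed for tag (%04x,%04x), expected one of %v",
		vr, t.group, t.element, allowed)
}

func main() {
	fmt.Println(checkVR(tag{0x0028, 0x0120}, "SS", false))
	fmt.Println(checkVR(tag{0x0010, 0x0010}, "LO", false))
	fmt.Println(checkVR(tag{0x0010, 0x0010}, "LO", true))
}
```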
When refactoring, consider incorporating a tool like GolangCI-Lint.
This would improve and enforce readability for newcomers. Apart from that, excellent project BTW; it's either this or https://github.com/Enet4/dicom-rs
Across the many forks and contributions of this project, different enum naming conventions were introduced. Standardize and unify.
Provide an optional API that callers can use to work with nested Sequence items more easily. This will probably involve more allocations and copying than just iterating over the Value in the SQ element and using type assertion, but the caller has the option to do either (use the nicer API or go with the optimization).
Maybe I can also provide a callback-based helper iterator to help callers iterate more efficiently using recursion and type assertion (need to look more into how efficient type assertion from interface{} is, but my understanding is that it's pretty efficient in recent versions of Go).
Something like
func IterateOverSequence(root *element.Element, cb func(*element.Element)) {
...
}
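One way the callback-based iterator could be filled in is shown below, as a sketch against a simplified, hypothetical `Element` type rather than the library's `element.Element`. It assumes a sequence stores its children as `[]*Element` in `Value`, which may not match the real representation.

```go
package main

import "fmt"

// Element is a simplified stand-in for the library's element type. In this
// sketch a sequence holds its nested elements as []*Element in Value.
type Element struct {
	Name  string
	Value interface{}
}

// IterateOverSequence calls cb on root and then, depth first, on every
// element nested beneath it. A type assertion decides whether a Value
// holds nested elements.
func IterateOverSequence(root *Element, cb func(*Element)) {
	if root == nil {
		return
	}
	cb(root)
	children, ok := root.Value.([]*Element)
	if !ok {
		return // leaf element: nothing to descend into
	}
	for _, child := range children {
		IterateOverSequence(child, cb)
	}
}

func main() {
	tree := &Element{Name: "seq", Value: []*Element{
		{Name: "a", Value: "leaf"},
		{Name: "inner", Value: []*Element{{Name: "b", Value: 42}}},
	}}
	IterateOverSequence(tree, func(e *Element) { fmt.Println(e.Name) })
}
```

No intermediate slices are built during the walk, so this stays close to the type-assertion approach while hiding the recursion from the caller.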
Hit another issue when scanning DICOMs:
Expect Item in pixeldata but found tag (00fd,f9ff)[private] (file offset 5384)
Expect Item in pixeldata but found tag (1a1a,1a1a)[??] (file offset 5390)
Did you get rid of the implementations in github.com/grailbio/go-netdicom?
I would like to switch from grailbio, but don't want to use both.
Maybe I can implement a separate dcm server depending on your implementation.
http://www.dclunie.com has many interesting imaging datasets to test against. We should also consider possibly generating a synthetic test set of dicoms and sharing them.
I have an XRay DICOM failing to initialize the parser b/c of:
Keyword 'DICM' not found in the header (file offset 132)
I can dump it fine with dcmtk [dcmdump]. Good news is that the competition fails as well:
Traceback (most recent call last):
File "../bench/dicom_bench.py", line 29, in <module>
ds = pydicom.dcmread(fn)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pydicom/filereader.py", line 870, in dcmread
force=force, specific_tags=specific_tags)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pydicom/filereader.py", line 667, in read_partial
preamble = read_preamble(fileobj, force)
File "/home/ubuntu/.local/lib/python3.6/site-packages/pydicom/filereader.py", line 620, in read_preamble
raise InvalidDicomError("File is missing DICOM File Meta Information "
pydicom.errors.InvalidDicomError: File is missing DICOM File Meta Information header or the 'DICM' prefix is missing from the header. Use force=True to force reading.
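The check that fails in both libraries is for the 4-byte `DICM` prefix that follows the 128-byte preamble. A lenient mode comparable to pydicom's `force=True` could fall back to parsing from the start of the file. The sketch below only illustrates that decision: `hasDICMPrefix` and `datasetStart` are hypothetical helpers, not functions of this library.

```go
package main

import (
	"errors"
	"fmt"
)

// hasDICMPrefix reports whether data has a 128-byte preamble followed by
// the "DICM" magic bytes.
func hasDICMPrefix(data []byte) bool {
	return len(data) >= 132 && string(data[128:132]) == "DICM"
}

// datasetStart (hypothetical helper) returns the offset at which element
// parsing should begin. Without the prefix it fails unless force is set,
// in which case parsing is attempted from offset 0.
func datasetStart(data []byte, force bool) (int, error) {
	if hasDICMPrefix(data) {
		return 132, nil
	}
	if force {
		return 0, nil
	}
	return 0, errors.New("keyword 'DICM' not found in the header")
}

func main() {
	withPrefix := append(make([]byte, 128), 'D', 'I', 'C', 'M')
	fmt.Println(datasetStart(withPrefix, false))
	fmt.Println(datasetStart([]byte{0x08, 0x00}, false))
	fmt.Println(datasetStart([]byte{0x08, 0x00}, true))
}
```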
API like:
dicomutil --extract-images https://suyashkumar.com/test.dcm
The user must trust the entity serving the URL (and cert).
Another issue resulting from a scan of several million DICOMs. We need to dig deeper into what is causing it.