pdf2dcm's Introduction

pdf2dcm

License: MIT

PDF to DICOM Converter

A Python package that converts PDF files to Encapsulated PDF DICOM objects or to DICOM RGB Secondary Capture images.

SETUP

Python Package Setup

The Python package is available on PyPI and can be installed with pip:

pip install pdf2dcm

To check the setup, print the version number of the pdf2dcm package:

python -c 'import pdf2dcm; print(pdf2dcm.__version__)'

Poppler Setup

Poppler is a popular PDF rendering project that pdf2dcm uses to create DICOM RGB Secondary Capture images. You can check whether it is already installed by running pdftoppm -h in your terminal/cmd. Some recommended ways to install Poppler are:

Conda

conda install -c conda-forge poppler

Ubuntu

sudo apt-get install poppler-utils

MacOS

brew install poppler
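
The pdftoppm check described above can also be done from Python before starting a conversion. The following is a minimal sketch using only the standard library; the helper name poppler_available is hypothetical and not part of pdf2dcm.

```python
import shutil


def poppler_available(executable: str = "pdftoppm") -> bool:
    """Return True if the given Poppler executable is found on PATH."""
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if poppler_available():
        print("Poppler found")
    else:
        print("Poppler not found; install it with one of the commands above")
```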

PDF to Encapsulated DCM

Usage

from pdf2dcm import Pdf2EncapsDCM

converter = Pdf2EncapsDCM()
converted_dcm = converter.run(path_pdf='tests/test_data/test_file.pdf', path_template_dcm='tests/test_data/CT_small.dcm', suffix=".dcm")
print(converted_dcm)
# [ 'tests/test_data/test_file.dcm' ]

Parameters of converter.run:

  • path_pdf (str): path of the PDF that needs to be encapsulated
  • path_template_dcm (str, optional): path to a template DICOM from which the repersonalisation data is copied.
  • suffix (str, optional): suffix of the DICOM files. Defaults to ".dcm".

Returns:

  • List[Path]: list of paths to the stored encapsulated DICOM files

PDF to RGB Secondary Capture DCM

Usage

from pdf2dcm import Pdf2RgbSC

converter = Pdf2RgbSC()
converted_dcm = converter.run(path_pdf='tests/test_data/test_file.pdf', path_template_dcm='tests/test_data/CT_small.dcm', suffix=".dcm")
print(converted_dcm)
# [ 'tests/test_data/test_file_0.dcm', 'tests/test_data/test_file_1.dcm' ]

Parameters of converter.run:

  • path_pdf (str): path of the PDF that needs to be converted
  • path_template_dcm (str, optional): path to a template DICOM from which the repersonalisation data is copied.
  • suffix (str, optional): suffix of the DICOM files. Defaults to ".dcm".

Returns:

  • List[Path]: list of paths to the stored secondary capture DICOM files

Notes

  • The output DICOM takes its name from the input PDF; for RGB Secondary Capture, the page index is appended (as in test_file_0.dcm above)
  • If no template is provided, no repersonalisation takes place
  • DICOM files without a suffix can be produced by passing suffix="" to converter.run()
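
The naming behaviour in these notes can be illustrated with a small sketch. The pattern below is inferred from the usage examples in this README and is not the package's own code; the helper expected_output_paths and its n_pages parameter are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def expected_output_paths(
    path_pdf: str, suffix: str = ".dcm", n_pages: Optional[int] = None
) -> List[Path]:
    """Guess output paths from the input PDF path.

    n_pages=None models the encapsulated case (one file per PDF);
    an integer models the RGB case (one file per page, index appended).
    """
    pdf = Path(path_pdf)
    if n_pages is None:
        return [pdf.with_name(pdf.stem + suffix)]
    return [pdf.with_name(f"{pdf.stem}_{i}{suffix}") for i in range(n_pages)]


print(expected_output_paths("tests/test_data/test_file.pdf"))
print(expected_output_paths("tests/test_data/test_file.pdf", n_pages=2))
print(expected_output_paths("tests/test_data/test_file.pdf", suffix=""))
```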

Repersonalisation

Repersonalisation is the process of copying data about the identity of the encapsulated PDF from a template DICOM. Currently, the fields that are repersonalised by default are:

  • PatientName
  • PatientID
  • PatientSex
  • StudyInstanceUID
  • SeriesInstanceUID
  • SOPInstanceUID

Note: although they appear in the list above, SeriesInstanceUID and SOPInstanceUID are no longer copied from the template, because reusing them for a new object violates the DICOM standard.

You can set the fields to repersonalise by passing repersonalisation_fields to Pdf2EncapsDCM() or Pdf2RgbSC().

Example:

from pdf2dcm import Pdf2RgbSC

# Fields to copy from the template DICOM
fields = [
    "PatientName",
    "PatientID",
    "PatientSex",
    "StudyInstanceUID",
    "AccessionNumber",
]
converter = Pdf2RgbSC(repersonalisation_fields=fields)

Note: this replaces the default fields rather than adding to them.
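
The copying step behind repersonalisation might look roughly like the sketch below. It is an illustration only, not the package's implementation: copy_identity_fields and DEFAULT_FIELDS are hypothetical names, and the default list is an assumption. Plain SimpleNamespace objects stand in for DICOM datasets so the example runs without extra dependencies; pydicom Dataset objects also support attribute access by keyword, so the same loop should apply to them.

```python
from types import SimpleNamespace
from typing import Iterable

# Illustrative default list (an assumption, not read from pdf2dcm).
DEFAULT_FIELDS = ("PatientName", "PatientID", "PatientSex", "StudyInstanceUID")


def copy_identity_fields(template, target, fields: Iterable[str] = DEFAULT_FIELDS):
    """Copy each named attribute from template to target if the template has it."""
    for name in fields:
        if hasattr(template, name):
            setattr(target, name, getattr(template, name))
    return target


template = SimpleNamespace(
    PatientName="Doe^Jane", PatientID="12345", PatientSex="F", StudyInstanceUID="1.2.3"
)
target = SimpleNamespace()

# A custom list replaces the defaults: PatientSex is not copied here, and
# AccessionNumber is skipped because the template does not define it.
copy_identity_fields(template, target, fields=["PatientName", "PatientID", "AccessionNumber"])
print(target)
```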

pdf2dcm's People

Contributors

a-parida12, actions-user, dependabot[bot], fraser29, trkohler

pdf2dcm's Issues

[DICOM RGB SC] Merge all Pages

Currently, we generate one DCM output per page. We need an option to generate a single DCM containing all the pages.

Test the repersonalisation function

Currently, the code coverage report says that we have 100% coverage, except for the repersonalisation function. We need to add a test in the base test file (test_01_base.py) to cover the method personalize_dcm in the BaseConverter class.

ImageJ not able to open generated DICOM

Hi everyone,
first of all, thanks for the effort you are putting into this library.

I have a (coloured) PDF containing a medical report which I need to store back into PACS as a secondary capture in an automated way. Therefore I was looking for some pdf2dicom converters and I came across this repo.

I can successfully generate a dicom image from a pdf using your python package, but

  • ImageJ won't open the newly generated image ("Unable to decode DICOM header"). Is it an ImageJ "bug", or are there some problems with the DICOM fields? The DICOM fields were taken from an actual CT image that opens correctly in ImageJ
  • Horos will open the image, but it looks really weird (see attachment below). Am I missing something or maybe the conversion from RGB pdf is not implemented yet?

Here is the link to the sample PDF I am trying to convert.
Thanks

Andrea

[Image: view_from_horos]

[RFE] Allow to pass file like object

Hello,

First of all, great work with the package, it is really helpful. I have a suggestion for improvement. When working with remote storage (using, for example, RaRe-Technologies/smart_open), or when you accept DICOM files in a web application through a form, you may already have the file(s) in a variable, and storing them on disk just to pass a path is a bit cumbersome.

pydicom is able to work with file-like objects and does not need a file path, e.g.

import io
import pydicom
# data is an in-memory buffer that already holds the DICOM bytes
source_dcm = pydicom.dcmread(io.BytesIO(data.getvalue()))

It would be nice to have the pdf2dcm API ready for such a case. A first sketch of how it could look:

import io
import pydicom
from pdf2dcm import Pdf2EncapsDCM

# Proposed API: pdf, template_dcm and return_ds are suggested parameters
converter = Pdf2EncapsDCM()
encapsulated_pdf = converter.run(
    pdf=io.BytesIO(b"My pdf"),
    template_dcm=pydicom.dcmread(io.BytesIO(data.getvalue())),
    return_ds=True,
)

return_ds would prevent the file from being saved on the disk and it would return pydicom Dataset because sometimes you want to, for example, save some metadata to the database and you need to read the file repeatedly. Also, template_dcm could accept pydicom Dataset as well.

I would be glad for your feedback on these suggestions and I am willing to open the PR afterward. Of course, I will understand if this functionality is out of the scope of this package, but I think more people could benefit from it.

I am looking forward to your response!

Rado.
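
One possible workaround with the path-based API shown in the README is to write the in-memory bytes to a temporary file and pass that path on. The following is a minimal standard-library sketch; spill_to_temp_file is a hypothetical helper, and the placeholder bytes are not a valid PDF.

```python
import io
import tempfile
from pathlib import Path


def spill_to_temp_file(stream: io.BytesIO, suffix: str, directory: str) -> Path:
    """Write the bytes of an in-memory stream to a new file in directory."""
    # delete=False keeps the file after the handle is closed; the caller
    # cleans up by removing the enclosing temporary directory.
    with tempfile.NamedTemporaryFile(suffix=suffix, dir=directory, delete=False) as tmp:
        tmp.write(stream.getvalue())
    return Path(tmp.name)


with tempfile.TemporaryDirectory() as workdir:
    pdf_path = spill_to_temp_file(io.BytesIO(b"%PDF-1.4 placeholder"), ".pdf", workdir)
    # str(pdf_path) could now be passed as path_pdf to converter.run(...)
    print(pdf_path.suffix, len(pdf_path.read_bytes()))
```

The generated DICOM files would then need to be read back before the temporary directory is removed.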

pdf2rgb row - column references are inverted

DICOMs produced through pdf2rgb are incorrect: the row and column values are swapped.

Steps to reproduce the error (note that valid DICOMs are produced, but their content is incorrect):

from pdf2dcm import Pdf2RgbSC
converter = Pdf2RgbSC()
converter.run('tests/test_data/test_file.pdf', 'tests/test_data/CT_small.dcm')

Inspect test_file_0.dcm in a DICOM viewer to see rows / cols swapped.

Steps to correct:
PIL's img.size is (width, height), so lines 57-58 in pdf2rgb.py

ds.Rows = img.size[0]
ds.Columns = img.size[1]

should read:

ds.Rows = img.size[1]
ds.Columns = img.size[0]

This will produce correct dicoms.
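
The mapping behind this fix can be shown without PIL or pydicom. In the sketch below, rows_and_columns is a hypothetical helper and the size tuple is made up; it relies only on the fact that PIL reports size as (width, height), while DICOM Rows is the image height and Columns is the image width.

```python
from typing import Tuple


def rows_and_columns(pil_size: Tuple[int, int]) -> Tuple[int, int]:
    """Convert a PIL-style (width, height) tuple to DICOM (Rows, Columns)."""
    width, height = pil_size
    # Rows counts horizontal lines (height); Columns counts pixels per line (width).
    return height, width


# Example portrait page, taller than it is wide (values are illustrative).
rows, columns = rows_and_columns((1654, 2339))
print(rows, columns)
```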
