mit-lcp / wfdb-python

Native Python WFDB package

License: MIT License

Jupyter Notebook 82.48% Python 17.52%

physionet ecg wfdb python ekg

wfdb-python's Introduction

The WFDB Python Package

Introduction

A Python-native package for reading, writing, processing, and plotting physiologic signal and annotation data. The core I/O functionality is based on the Waveform Database (WFDB) specifications.

This package is heavily inspired by the original WFDB Software Package, and initially aimed to replicate many of its command-line APIs. However, the projects are independent, and there is no promise of consistency between the two, beyond each package adhering to the core specifications.

Documentation and Usage

See the documentation site for the public APIs.

See the demo.ipynb notebook file for example use cases.

Installation

The distribution is hosted on PyPI at: https://pypi.python.org/pypi/wfdb/. The package can be directly installed from PyPI using either pip or poetry:

pip install wfdb
poetry add wfdb

On Linux systems, accessing compressed WFDB signal files requires libsndfile, which can be installed by running sudo apt-get install libsndfile1 or sudo yum install libsndfile. Support for Apple M1 systems is a work in progress (see bastibe/python-soundfile#310 and bastibe/python-soundfile#325).

The development version is hosted at: https://github.com/MIT-LCP/wfdb-python. This repository also contains demo scripts and example data. To install the development version, clone or download the repository, navigate to the base directory, and run:

# Without dev dependencies
pip install .
poetry install

# With dev dependencies
pip install ".[dev]"
poetry install -E dev

# Install the dependencies only
poetry install -E dev --no-root

See the note about dev dependencies.

Developing

Please see the DEVELOPING.md document for contribution/development instructions.

Citing

When using this resource, please cite the software publication on PhysioNet.

wfdb-python's Issues

Create a release on Github for 0.1.1

@cx1111 now that the first release is on PyPI (https://pypi.python.org/pypi/wfdb/0.1.1), the release should be marked on GitHub too: https://github.com/MIT-LCP/wfdb-python/releases. We could also create a development branch for v0.1.2 now and then merge it into master when it is ready?

How to read the header file?

How to read the header file from physiobank databases?

To do list for rdann

sampfrom and sampto
annotation samples into times. Includes readheader?

Downloads fail on Windows

The main problem is the widespread use of os.path.join() to build URLs. On Windows this joins the arguments with backslashes instead of forward slashes, which leads to errors like:

>>> import wfdb; import os; wfdb.dldatabasefiles('ltstdb', os.getcwd(), ['s20011.hea', 's20011.dat'], overwrite=True)
Downloading files...
multiprocessing.pool.RemoteTraceback:
... < SNIP > ...
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://physionet.org/physiobank/database/ltstdb%5Cs20011.hea

(of course %5C is a backslash).

Probably want to use posixpath instead?
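
A minimal sketch of the posixpath approach suggested above. The base URL is taken from the error message; build_file_url is an illustrative name, not a function from the package. ntpath is the module that os.path resolves to on Windows, so it reproduces the backslash behaviour on any platform.

```python
import ntpath
import posixpath

# Base URL copied from the 404 message above.
PHYSIONET_DB_URL = "http://physionet.org/physiobank/database"


def build_file_url(database, filename):
    """Join URL components with forward slashes on every platform.

    Hypothetical helper for illustration only.
    """
    return posixpath.join(PHYSIONET_DB_URL, database, filename)


# What os.path.join produces on Windows: a backslash separator.
broken = ntpath.join("ltstdb", "s20011.hea")

# What posixpath.join produces everywhere: a forward slash separator.
fixed = build_file_url("ltstdb", "s20011.hea")

print(broken)
print(fixed)
```

Keeping os.path.join for local file paths and posixpath.join for URLs separates the two concerns cleanly.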

urllib.request module is not part of the Python 2 standard library

The urllib.request module is part of the Python 3 standard library, but not the Python 2 standard library. Importing the package in Python 2 fails with the following error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-f69d0cc5fe48> in <module>()
----> 1 import wfdb

/Users/tompollard/projects/wfdb-python/wfdb/__init__.py in <module>()
----> 1 import pbdownload
      2 import plotwfdb
      3 import readannot
      4 import readsignal

/Users/tompollard/projects/wfdb-python/wfdb/pbdownload.py in <module>()
      2 ### Please report bugs and suggestions to https://github.com/MIT-LCP/wfdb-python or [email protected] ###
      3 
----> 4 import urllib.request as ul
      5 import configparser
      6 import os

ImportError: No module named request
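
One common way to support both interpreters is a guarded import. The sketch below assumes only urlopen and configparser are needed from these modules; what pbdownload.py actually uses beyond the imports shown in the traceback is not known from this report.

```python
try:
    # Python 3 location
    from urllib.request import urlopen
except ImportError:
    # Python 2 location
    from urllib2 import urlopen

try:
    # Python 3 module name
    import configparser
except ImportError:
    # Python 2 module name
    import ConfigParser as configparser
```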

Download/process selected chunk of remote files

@cx1111 we discussed setting up rdsamp (or a download function) to process/download a selected chunk of a remote file. It looks like we could use the stream argument of the requests module for this: http://docs.python-requests.org/en/master/user/advanced/
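
As an alternative to the stream argument, an HTTP Range header could request only the bytes covering the wanted samples. The sketch below only builds the header. It assumes a format 16 file (2 bytes per sample, samples interleaved across signals) and a server that honours Range requests; byte_range_header is a hypothetical helper, not part of the package.

```python
def byte_range_header(sampfrom, sampto, nsig, bytes_per_sample=2):
    """Build a Range header covering samples [sampfrom, sampto) of every signal.

    Assumes fixed-width samples interleaved across nsig signals, as in
    WFDB format 16. Hypothetical helper for illustration.
    """
    frame_size = nsig * bytes_per_sample
    start = sampfrom * frame_size
    end = sampto * frame_size - 1  # Range end offsets are inclusive
    return {"Range": "bytes={}-{}".format(start, end)}


headers = byte_range_header(sampfrom=0, sampto=2000, nsig=2)
print(headers)
```

The resulting dictionary could be passed as the headers argument of requests.get.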

Simplify method for configuring package

The config.ini file includes several user configurable settings (e.g. directories). The file is included in the wfdb package and loads to the site packages directory when pip installed. This location is not easy to find for someone wanting to overwrite the configuration settings.

We should consider either (1) implementing a multi-step search for the config file so it can be overwritten by a local version or (2) removing dependence on config file and support the settings in the function call.
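
Option (1) could look something like the sketch below. The search order (current directory, then home directory, then the packaged default) and the names find_config and load_config are assumptions for illustration, not the package's behaviour.

```python
import configparser
import os


def find_config(filename="config.ini", search_dirs=None, default_path=None):
    """Return the first existing config file, or default_path if none is found.

    Hypothetical helper: the search order here is an assumption.
    """
    if search_dirs is None:
        search_dirs = [os.getcwd(), os.path.expanduser("~")]
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return default_path


def load_config(path):
    """Parse the config file at path; return an empty parser if path is None."""
    parser = configparser.ConfigParser()
    if path is not None:
        parser.read(path)
    return parser
```

A user could then override the packaged settings by placing a config.ini in the working directory, without needing to locate site-packages.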

making sense of data format

Hi,

I just downloaded some acceleration data from the physionet database but the values I get are huge (e.g. 1.07841715e+198). Reading the FAQ I guess the signal format is "format 16" (using two's complement) but I'm still confused. I would expect values ranging from -20 to +20 m/s^2. Can you please help?

Below are the first 4 lines of:
https://physionet.org/physiobank/database/ltmm/LabWalks/co001_base.hea

co001_base 6 100 3801
co001_base.dat 16 47916.5958(-53949)/g 0 0 -10008 -15948 0 v-acceleration
co001_base.dat 16 70148.6031(4980)/g 0 0 -6474 -18391 0 ml-acceleration
co001_base.dat 16 54691.0073(34671)/g 0 0 15947 -23295 0 ap-acceleration

Thanks a lot,
David

rdann applies sample range (sampfrom -> sampto) after processing full record

It looks like rdann processes the full data file before snipping the record to the specified sampfrom to sampto arguments. Processing should be limited to the specified range if possible.

Managing wfdb-python branches

I think the plan for this repository was for the master branch to reflect the current release of the package (currently 0.1.2), with developments being merged into a branch with the name of the next release (currently 0.1.3-dev).

The master and dev branches have become out of sync, so I'll merge the master branch into the dev branch. To help avoid conflicts, my preference is for future development work to be done in project specific branches and then merged into the dev branch when ready.

Handle gain=0

The following code does not work:

sig, fields = wfdb.rdsamp('ltdb/14046')
annsamp=wfdb.rdann('ltdb/14046', 'atr')[0]
wfdb.plotwfdb(sig, fields, annsamp, title='ltdb/14046')

It raises the following warnings:

/home/dubrzr/venv/lib/python3.6/site-packages/wfdb/_rdsamp.py:728: RuntimeWarning: divide by zero encountered in true_divide
  sig = np.divide(sig, np.array([fields["gain"]]))
/home/dubrzr/venv/lib/python3.6/site-packages/wfdb/_rdsamp.py:728: RuntimeWarning: invalid value encountered in true_divide
  sig = np.divide(sig, np.array([fields["gain"]]))

This page states the following:

If the gain is zero or missing, this indicates that the signal amplitude is uncalibrated; in such cases, a value of 200 (DEFGAIN, defined in <wfdb/wfdb.h>) ADC units per physical unit may be assumed.
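
A minimal sketch of applying that rule before the division. It uses the standard WFDB conversion, physical = (digital - baseline) / gain; to_physical is an illustrative name and not the function in _rdsamp.py.

```python
import numpy as np

# Default gain quoted above: ADC units per physical unit.
DEFGAIN = 200.0


def to_physical(digital, gain, baseline):
    """Convert digital samples of shape (n_samples, n_signals) to physical units.

    A gain of 0 is treated as uncalibrated and replaced with DEFGAIN.
    Hypothetical helper for illustration.
    """
    digital = np.asarray(digital, dtype=float)
    gain = np.asarray(gain, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    safe_gain = np.where(gain == 0, DEFGAIN, gain)
    return (digital - baseline) / safe_gain


physical = to_physical([[200, 100], [400, 50]], gain=[0, 50], baseline=[0, 0])
print(physical)
```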

Deleting files downloaded by rdsamp with the keepfiles argument

v0.1.2 includes a keepfiles argument (default = 0) that defines whether or not files are deleted after being downloaded by rdsamp. Defaulting to 0 has the potential to be confusing (the file is downloaded and then immediately deleted) and it is difficult to see when the argument would be useful anyway. Agreed to remove...

Response to "Remove file and redownload?" not recognised

In [5]: sig, fields = wfdb.rdsamp('mitdb/100', sampto=2000, pbdl=1)

File /Users/tompollard/sand/100.hea is already present.
Warning - File /Users/tompollard/sand/100.dat is 0 bytes. Likely interrupted download.
Remove file and redownload? [y/n] - y

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-bf902e201277> in <module>()
----> 1 sig, fields = wfdb.rdsamp('mitdb/100', sampto=2000, pbdl=1)

/usr/local/lib/python2.7/site-packages/wfdb/_rdsamp.pyc in rdsamp(recordname, sampfrom, sampto, channels, physical, stacksegments, pbdl, dldir, keepfiles)
   1029         sys.exit("input channels must be non-negative")
   1030 
-> 1031     dirname, baserecordname, filestoremove = checkrecordfiles(recordname, pbdl, dldir, keepfiles)
   1032 
   1033     fields = readheader(os.path.join(dirname, baserecordname))

/usr/local/lib/python2.7/site-packages/wfdb/_rdsamp.pyc in checkrecordfiles(recordname, pbdl, dldir, keepfiles)
    968     # Download physiobank files if specified
    969     if pbdl == 1:
--> 970         dledfiles = dlrecordfiles(recordname, dldir)
    971         if keepfiles==0:
    972             filestoremove = dledfiles

/usr/local/lib/python2.7/site-packages/wfdb/_rdsamp.pyc in dlrecordfiles(pbrecname, targetdir)
     45         for f in set(fields["filename"]):
     46             # Missing dat file
---> 47             dledfiles, displaydlmsg = dlifmissing(physioneturl+pbdir+"/"+f, os.path.join(targetdir, f), dledfiles, displaydlmsg, targetdir)
     48     else:  # Multi segment. Check for all segment headers and their dat files
     49         for segment in fields["filename"]:

/usr/local/lib/python2.7/site-packages/wfdb/_rdsamp.pyc in dlifmissing(url, filename, dledfiles, displaydlmsg, targetdir)
     68         # Likely interrupted download
     69         if os.path.getsize(filename)==0:
---> 70             userresponse=input("Warning - File "+filename+" is 0 bytes. Likely interrupted download.\nRemove file and redownload? [y/n] - ")
     71             while userresponse not in ['y','n']:
     72                 userresponse=input("Remove file and redownload? [y/n] - ")

/usr/local/lib/python2.7/site-packages/wfdb/_rdsamp.pyc in <module>()

NameError: name 'y' is not defined
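
In Python 2, input() evaluates the typed text as an expression, so typing y looks up a variable named y; raw_input() returns the text itself. A sketch of a prompt that behaves the same on both versions (ask_yes_no and its reader parameter are illustrative names):

```python
try:
    read_line = raw_input  # Python 2: returns the typed text as a string
except NameError:
    read_line = input  # Python 3: input() already returns a string


def ask_yes_no(prompt, reader=read_line):
    """Keep asking until the answer is 'y' or 'n'; return True for 'y'.

    The reader parameter exists so the prompt can be exercised without a terminal.
    """
    response = reader(prompt).strip().lower()
    while response not in ("y", "n"):
        response = reader("Remove file and redownload? [y/n] - ").strip().lower()
    return response == "y"
```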

New WFDB record object/class

Object with a field for each parameter. The signal fields will be placed in lists of length nsig. Things to implement:

method to extract all the information of a target signal by creating a signal object. Alternatively or in addition, method to redefine the record object with only the target signal.
method to print all the fields.
method to print all the fields not equal to none.
method to print the important fields for the average user.
This package will have more stringent requirements for writing WFDB records. ie. fs, siglen, gain, baseline, units must all be present (some of these will have to be explicitly input by the user, some can be figured out via input variables or defaults will be set). When certain defaults are set, print a message. When reading preexisting wfdb records that do not satisfy these more stringent requirements, print a warning message when assuming a default value.
method to convert a WFDBmultirecord object into a WFDBrecord object.

Data loaded incorrectly for format 212 file

I think there is some bug in the way negative values are handled... I tried loading in data from v102s - attached here example-data.zip

When I load it directly and plot it I get this:

recName = 'v102s'
base_path = '/home/alistairewj/challenge-2015/test/challenge-2015-training-dat/'
sig, fields=readsignal.rdsamp(base_path + recName)
plotwfdb.plotsigs(sig[0:(0+int(fields['fs'])*10),:], fields, title=recName)

However if I first read it using the shell rdsamp

rdsamp -r v102s -c -pS -H > test.csv

Then load in this csv directly in python (Note I included the test.csv file in the zip above):

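import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt(base_path + 'test.csv', delimiter=',', missing_values='-')
plt.figure()
# Plot the first 10 s (fs = 250 Hz) of column 1
plt.plot(range(250*10), data[range(250*10), 1], '-')
plt.show()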

I get this:

If we directly compare the first lead from python (left) and using the shell (right):

Python (incorrect)	Shell (correct)
28.720	-0.011
28.723	-0.008
0.006	0.006
0.024	0.024
0.046	0.046
0.068	0.068
0.090	0.090
0.111	0.111
0.135	0.135
0.165	0.165

Looks like maybe the python package has underflow for negative values? Here's the header file:

v102s 4 250 75000
v102s.dat 212 2281/mV 0 0 -26 -9286 0 II
v102s.dat 212 1856/mV 0 0 340 2647 0 V
v102s.dat 212 1250/NU 0 0 -46 -11021 0 PLETH
v102s.dat 212 38880/NU 0 0 339 12236 0 RESP
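
The incorrect values are consistent with the negative sample -26 having wrapped to 65510 (65536 - 26): 65510 / 2281 ≈ 28.720, while -26 / 2281 ≈ -0.011. Below is a minimal sketch of sign-extending 12-bit samples. It assumes the format 212 layout from the WFDB specification (two 12-bit two's complement samples packed into three bytes); decode_212 is an illustrative name, not the package's reader.

```python
def decode_212(raw):
    """Decode WFDB format 212 bytes into a list of signed 12-bit samples.

    raw is a bytearray (or Python 3 bytes). Each 3-byte group holds two samples:
    sample 1 = byte 0 plus the low nibble of byte 1 as its high bits,
    sample 2 = byte 2 plus the high nibble of byte 1 as its high bits.
    """
    samples = []
    for i in range(0, len(raw) - 2, 3):
        b0, b1, b2 = raw[i], raw[i + 1], raw[i + 2]
        first = b0 | ((b1 & 0x0F) << 8)
        second = b2 | ((b1 & 0xF0) << 4)
        for value in (first, second):
            # Values of 2048 and above represent negatives in 12-bit two's complement.
            if value >= 2048:
                value -= 4096
            samples.append(value)
    return samples


# -26 is 0xFE6 and 340 is 0x154 in 12 bits, which pack to E6 1F 54.
decoded = decode_212(bytearray([0xE6, 0x1F, 0x54]))
print(decoded)
```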

Move download from rdsamp to a separate module

Currently rdsamp has an option to download a record from PhysioNet to a local folder. Consider separating read and download functionality by moving into separate modules.

Original WFDB C package notes and nuances

WFDB Invalid sample limitation

Currently the wfdb library does not account for >16bit formats and hence uses the value -32768 as nan.

For this python package for rdsamp physical==0, I have currently decided to set empty multi-segment channels (with stacksegments==1) to -2^31. We can revise this later if needed and see what happens with the next version of the wfdb library. In addition, all format samples will be stored (with physical==0) as they are with no mapping.

For rdsamp for option physical==1, 2^(fmt-1) will be interpreted as nans.

Default behaviour for locating and downloading files in rdsamp

The rdsamp locate-and-download-file behavior could be explicitly set with arguments, rather than having a separate config file.

Sensible behaviour might be:

rdsamp requires the path to the file to be specified as an argument.
rdsamp saves the data to the current working directory (os.getcwd()) unless a download directory is specified as an argument.

rdann annotation test intermittently fails

In Python 2, test_rdann sometimes passes:

$ nosetests
----------------------------------------------------------------------
Ran 14 tests in 4.008s

OK

... and sometimes fails. The error appears to be consistently the 3rd item in the comp list, sometimes in test_1 and sometimes in test_2:

$ nosetests
======================================================================
FAIL: test_readannot.test_rdann.test_2
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/tompollard/projects/wfdb-python/tests/test_readannot.py", line 104, in test_2
    assert (comp == [True] * 6)
AssertionError: 
-------------------- >> begin captured stdout << ---------------------
print comp
[True, True, False, True, True, True]
--------------------- >> end captured stdout << ----------------------
----------------------------------------------------------------------
Ran 14 tests in 3.987s

FAILED (failures=1)

Make package pip installable

Guidelines for making the package installable with pip are at:
https://packaging.python.org/en/latest/distributing/

Nuance in original C WFDB package rdsamp for multi-segment variable layout records

In the C WFDB software package, if you call rdsamp -p on a multi-segment record, it will read the digital values from each segment and use the layout header's gain and baseline for each channel to convert the values to physical. It is possible (such as in record sampledata/matched/s25047/s25047-2704-05-04-10-44) that the layout header's gains/baselines do not match those of the individual segment headers, and hence calling rdsamp on the master record may return slightly different values than calling rdsamp on individual segments.

In this python package, rdsamp will convert each segment's values into physical units using the gain and baseline specified in each individual segment header. This makes it so that the values received from calling rdsamp on the master record will give the exact same values as if it were called on the individual records.

I think the latter method is more proper and I discussed this with Benjamin as well. But this does mean the two rdsamps perform slightly differently for multi-segment records, and also that any integrity checks will have to be done via reading the individual segments.

Division of integers returns floor in python 2.x

The default behaviour for division has changed between Python 2 and Python 3. In Python 3, dividing an integer by an integer returns a float. e.g.

In  [1]: 3/2
Out [1]: 1.5

In Python 2, the default behaviour is to return an integer (the floor of the division). e.g.

In  [1]: 3/2
Out [1]: 1

The current code does not account for this difference, so incorrect results are returned in Python 2. For example, https://github.com/MIT-LCP/wfdb-python/blob/master/wfdb/_rdann.py includes the line:

testbytes=filebytes[:(12+math.ceil(auxlen/2)),:].flatten()
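
Two common fixes are adding from __future__ import division at the top of the module, or avoiding true division altogether. The sketch below takes the second route with integer arithmetic, which also returns an int suitable for slicing (math.ceil returns a float in Python 2). half_rounded_up is an illustrative name, and auxlen = 5 is an arbitrary example value.

```python
def half_rounded_up(n):
    """Return ceil(n / 2) for a non-negative integer, using integer arithmetic only.

    Gives the same result in Python 2 and Python 3.
    """
    return (n + 1) // 2


# Example: the upper slice bound from the line above, for an arbitrary auxlen.
auxlen = 5
upper = 12 + half_rounded_up(auxlen)
print(upper)
```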

Adding test data files to repo

@cx1111 Is it possible to populate the repo with sample data files? If this is not possible (e.g. because the files contain sensitive data) then can we provide a script for downloading the sample dataset?

TypeError: The 'annotation' field must be a 'wfdb.Annotation' object in python2, but not python3

Unless I am in a python 3 environment, I cannot get annotations to plot.

import wfdb
import numpy as np
import os
from IPython.display import display

# Following demo.ipynb Example 4
# Can also read the same file hosted on PhysioBank 
annotation2 = wfdb.rdann('100', 'atr', sampfrom = 100000, sampto = 110000, pbdir = 'mitdb')
annotation2.fs = 360
wfdb.plotann(annotation2, timeunits = 'minutes')

This results in a TypeError on python2:

/home/spark/anaconda2/envs/waveforms/lib/python2.7/site-packages/wfdb/plots.pyc in checkannplotitems(annotation, title, timeunits)
    286     # signals
    287     if type(annotation)!= annotations.Annotation:
--> 288         raise TypeError("The 'annotation' field must be a 'wfdb.Annotation' object")
    289 
    290     # fs and timeunits

TypeError: The 'annotation' field must be a 'wfdb.Annotation' object

But if I print annotation2:

print(type(annotation2))
<type 'instance'>

This runs fine in python3 and plots as expected and the object is the same.

print(type(annotation2))
<class 'wfdb.annotations.Annotation'>

I tested this by creating a new/clean conda virtual environment, and switched between python 2 and python 3.

Is this a known issue? Or am I missing something simple? I'd prefer to run using python 2.
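
The `<type 'instance'>` output is what Python 2 reports for instances of old-style classes (classes that do not inherit from object), so a type(...) != comparison fails there. The sketch below shows a check that works on both versions. The Annotation class is a simplified stand-in, and whether the package's class is actually old-style is an assumption based on that output.

```python
class Annotation(object):
    """Stand-in for wfdb.Annotation; inheriting from object makes it new-style in Python 2."""

    def __init__(self, recordname, fs=None):
        self.recordname = recordname
        self.fs = fs


def check_annotation(annotation):
    """Raise TypeError unless annotation is an Annotation (or a subclass) instance."""
    # isinstance works for old-style and new-style classes alike,
    # unlike comparing type(annotation) against the class.
    if not isinstance(annotation, Annotation):
        raise TypeError("The 'annotation' field must be a 'wfdb.Annotation' object")
```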

Apply PEP8 style guidelines

Applying the PEP8 style guidelines would help to improve readability. Perhaps run https://pypi.python.org/pypi/autopep8 over the code.

dldatabase fails to download nested databases correctly

In the MIMIC 2 Waveform Database the database is organised in a nested manner, such that directories contain directories which then contain the signal files. Each of these directories has a RECORDS file.

When attempting to download part of this database as follows:
import os
import wfdb
wfdb.dldatabase('mimic2wdb/30', os.getcwd())

The following error occurs:
>>> wfdb.dldatabase('mimic2wdb/30',os.getcwd())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/homes/ht1013/.local/lib/python3.5/site-packages/wfdb/records.py", line 1176, in dldatabase
    record = rdheader(baserecname, pbdir = posixpath.join(pbdb, dirname))
  File "/homes/ht1013/.local/lib/python3.5/site-packages/wfdb/records.py", line 851, in rdheader
    headerlines, commentlines = _headers.getheaderlines(recordname, pbdir)
  File "/homes/ht1013/.local/lib/python3.5/site-packages/wfdb/_headers.py", line 413, in getheaderlines
    headerlines, commentlines = downloads.streamheader(recordname, pbdir)
  File "/homes/ht1013/.local/lib/python3.5/site-packages/wfdb/downloads.py", line 15, in streamheader
    r.raise_for_status()
  File "/homes/ht1013/.local/lib/python3.5/site-packages/requests/models.py", line 909, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://physionet.org/physiobank/database/mimic2wdb/30/3000003/.hea

I believe this happens because the RECORDS file lists directories, and dldatabase attempts to find header files for these entries instead of entering the directories and looking for records within them.

Similar errors occur when attempting to download other parts of the database, or the entire database:
wfdb.dldatabase('mimic2wdb',os.getcwd())
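
One possible approach is to expand RECORDS entries recursively. The sketch below assumes that directory entries in a RECORDS file end with a slash (suggested by the URL ending in 3000003/.hea) and takes a read_records callable that returns the lines of a directory's RECORDS file. Both expand_records and read_records are hypothetical names, and the listing in the example is made up.

```python
import posixpath


def expand_records(pbdir, read_records):
    """Return all record paths under pbdir, descending into directory entries.

    read_records(pbdir) must return the lines of that directory's RECORDS file.
    Entries ending with '/' are treated as subdirectories (an assumption).
    """
    records = []
    for entry in read_records(pbdir):
        entry = entry.strip()
        if not entry:
            continue
        if entry.endswith("/"):
            subdir = posixpath.join(pbdir, entry.rstrip("/"))
            records.extend(expand_records(subdir, read_records))
        else:
            records.append(posixpath.join(pbdir, entry))
    return records


# Made-up listing standing in for RECORDS files fetched from the server.
listing = {
    "db/30": ["dir_a/", "dir_b/"],
    "db/30/dir_a": ["rec_1", "rec_2"],
    "db/30/dir_b": ["rec_3"],
}
all_records = expand_records("db/30", listing.__getitem__)
print(all_records)
```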

What is ann.chan and how to use it?

Hello,

I don't get what the ann.chan variable is:

t = 'nsrdb/18177'
sig, fields = wfdb.srdsamp(t)
fs = fields['fs']
ann = wfdb.rdann(t, 'atr')
print(fields)
>> {'fs': 128.0, 'signame': ['ECG1', 'ECG2'], 'units': ['mV', 'mV'], 'comments': ['26 F']}
print(numpy.unique(ann.chan))
>> [ 0  1  2 16 17 18 32 33 34]

Here we have 2 signals but it seems that there are 9 different channels; I thought that len(signal)==len(channels), are they different things? How do I associate an annotation channel with a signal?

I've also tested plotrec:

record = wfdb.rdsamp('nsrdb/18177', sampto = 1000)
ann = wfdb.rdann('nsrdb/18177', 'atr', sampto = 1000)
wfdb.plotrec(record, annotation=ann, title='Record nsrdb/17453', timeunits = 'seconds', annch=[0, 1])

Which gives:

Here I specified annch=[0, 1] as I understood that ECG1=channel 0 and ECG2=channel 1.
But from what I understand, each annotation is associated with a specific channel: ann.chan[x] is associated with ann.anntype[x] and ann.annsamp[x], where ann.chan[x] specifies the signal channel on which there is an annotation. Shouldn't plotrec display annotations on the corresponding channel instead?

Or maybe I missed something?

Thanks! :)

Move readheader from rdsamp to separate module

It would be helpful to make the readheader function callable through the API. For consistency, move readheader to a separate module.

Add unit tests to check binary files are read correctly

To test whether binary files are read correctly by the WFDB functions, we can create unit tests.

Casting from unsigned to signed data type for offset formats

Format 80 and 160 are stored in binary offset format. They are read as unsigned integers and then 128 or 32768 is subtracted from each number to get the real digital interpretation. I load the values from the dat files as unsigned integers into numpy arrays.

If I directly try to subtract these numbers from them, their values don't go below 0 because of their storage formats and instead the negative value is subtracted from the maximum positive value of their respective format.

So I need to typecast into signed datatypes, but when I try sig=sig.astype(int8) for example for format 80, the values stored are immediately reinterpreted before I even do anything. The bits are reinterpreted as signed two's complement (not binary offset) and the value stored becomes totally different. Currently I use sig=sig.astype(int) which is int64 which works without reinterpreting the bits for some reason. Firstly it's a waste of space, secondly I'd like to understand why this typecast doesn't reinterpret the bits while the other does and what the solution is.
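
Casting between integer types of the same width keeps the bit pattern, so a uint8 value above 127 wraps when cast to int8. Casting to a wider signed type preserves the value, because every unsigned value fits. A sketch that widens by one step, subtracts the offset, then narrows again (offset_to_signed is an illustrative name):

```python
import numpy as np


def offset_to_signed(unsigned, fmt):
    """Convert offset-binary samples (format 80 or 160) to signed integers.

    The array is widened before subtracting so no value wraps, then narrowed
    to the smallest signed type that holds the result.
    """
    if fmt == 80:
        wide, narrow, offset = np.int16, np.int8, 128
    elif fmt == 160:
        wide, narrow, offset = np.int32, np.int16, 32768
    else:
        raise ValueError("fmt must be 80 or 160")
    return (np.asarray(unsigned).astype(wide) - offset).astype(narrow)


# Same-width cast: the bits of 200 are reinterpreted as a negative number.
wrapped = np.array([200], dtype=np.uint8).astype(np.int8)

signed_80 = offset_to_signed(np.array([0, 128, 255], dtype=np.uint8), 80)
signed_160 = offset_to_signed(np.array([0, 32768, 65535], dtype=np.uint16), 160)
print(wrapped, signed_80, signed_160)
```

This uses twice the storage only temporarily, rather than keeping the whole signal as int64.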

ValueErrors for some ids from mghdb Dataset

Thanks for providing the wfdb python package.

I downloaded the mghdb dataset locally.

When extracting the time series for this data set I get errors for some ids.

sig, fields = wfdb.rdsamp('/mghdb/mgh026')
ValueError: could not broadcast input array from shape (6912000) into shape (7128000)

sig, fields = wfdb.rdsamp('/mghdb/mgh175')
ValueError: could not broadcast input array from shape (5031488) into shape (5555520)

sig, fields = wfdb.rdsamp('/mghdb/mgh196')
ValueError: could not broadcast input array from shape (6739200) into shape (6946560)

sig, fields = wfdb.rdsamp('/mghdb/mgh199')
ValueError: could not broadcast input array from shape (6739200) into shape (6998400)

sig, fields = wfdb.rdsamp('/mghdb/mgh210')
ValueError: operands could not be broadcast together with shapes (7144790,) (7144789,)

The other ids can be loaded without a problem

To do list for rdsamp

error when processing record

Hi,

I am trying to run wfdb with the following record \matched\s00001\s00001-2896-10-10-00-31'

as
tstart = 14428365
tend = 14428375
sig, fields = wfdb.rdsamp(filename,sampfrom=tstart,sampto=tend)

and I am getting the following error
*** TypeError: exit expected at most 1 arguments, got 2

It runs if I do
sig, fields = wfdb.rdsamp(filename,sampfrom=14428365) or
sig, fields = wfdb.rdsamp(filename)

wfdb.rdsamp('mitdb/100', sampto=2000, pbdl=1) returns UnicodeEncodeError

In [1]: import wfdb

In [2]: sig, fields = wfdb.rdsamp('mitdb/100', sampto=2000, pbdl=1)

Downloading missing file(s) into directory: /Users/tompollard/sand
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-2-bf902e201277> in <module>()
----> 1 sig, fields = wfdb.rdsamp('mitdb/100', sampto=2000, pbdl=1)

/usr/local/lib/python2.7/site-packages/wfdb/_rdsamp.pyc in rdsamp(recordname, sampfrom, sampto, channels, physical, stacksegments, pbdl, dldir, keepfiles)
   1029         sys.exit("input channels must be non-negative")
   1030 
-> 1031     dirname, baserecordname, filestoremove = checkrecordfiles(recordname, pbdl, dldir, keepfiles)
   1032 
   1033     fields = readheader(os.path.join(dirname, baserecordname))

/usr/local/lib/python2.7/site-packages/wfdb/_rdsamp.pyc in checkrecordfiles(recordname, pbdl, dldir, keepfiles)
    968     # Download physiobank files if specified
    969     if pbdl == 1:
--> 970         dledfiles = dlrecordfiles(recordname, dldir)
    971         if keepfiles==0:
    972             filestoremove = dledfiles

/usr/local/lib/python2.7/site-packages/wfdb/_rdsamp.pyc in dlrecordfiles(pbrecname, targetdir)
     45         for f in set(fields["filename"]):
     46             # Missing dat file
---> 47             dledfiles, displaydlmsg = dlifmissing(physioneturl+pbdir+"/"+f, os.path.join(targetdir, f), dledfiles, displaydlmsg, targetdir)
     48     else:  # Multi segment. Check for all segment headers and their dat files
     49         for segment in fields["filename"]:

/usr/local/lib/python2.7/site-packages/wfdb/_rdsamp.pyc in dlifmissing(url, filename, dledfiles, displaydlmsg, targetdir)
     81             print("File "+filename+" is already present.")
     82     else:
---> 83         dledfiles.append(dlorexit(url, filename, displaydlmsg, targetdir))
     84         displaydlmsg=0
     85 

/usr/local/lib/python2.7/site-packages/wfdb/_rdsamp.pyc in dlorexit(url, filename, displaydlmsg, targetdir)
     95         r = requests.get(url)
     96         with open(filename, "w") as text_file:
---> 97             text_file.write(r.text)
     98         return filename
     99     except requests.HTTPError:

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
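
The traceback shows a binary .dat file being written through r.text, which is the response decoded as text. With requests, the raw bytes are available as r.content, and they can be written in binary mode. A minimal sketch without the network call (save_bytes is an illustrative name):

```python
def save_bytes(content, filename):
    """Write raw bytes to filename without any text encoding step.

    content would be r.content (bytes) from a requests response, not r.text.
    """
    with open(filename, "wb") as binary_file:
        binary_file.write(content)
    return filename
```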

IndexError when using srdsamp with ltstdb/s20601

When loading 'ltstdb/s20601', I got the following error:

sig, fields = wfdb.srdsamp('data/ltstdb/s20601')

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-a9e3120755d1> in <module>()
      2 t = 'data/ltstdb/s20601'
      3 
----> 4 sig, fields = wfdb.srdsamp(t)
      5 
      6 

C:\Anaconda3-4.2-3.5\lib\site-packages\wfdb\records.py in srdsamp(recordname, sampfrom, sampto, channels, pbdir)
    945     """
    946 
--> 947     record = rdsamp(recordname, sampfrom, sampto, channels, True, pbdir, True)
    948 
    949     signals = record.p_signals

C:\Anaconda3-4.2-3.5\lib\site-packages\wfdb\records.py in rdsamp(recordname, sampfrom, sampto, channels, physical, pbdir, m2s)
    741 
    742     # Read the header fields into the appropriate record object
--> 743     record = rdheader(recordname, pbdir = pbdir)
    744 
    745     # Set defaults for sampto and channels input variables

C:\Anaconda3-4.2-3.5\lib\site-packages\wfdb\records.py in rdheader(recordname, pbdir)
    864         if len(headerlines)>1:
    865             # Read the fields from the signal lines
--> 866             d_sig = _headers.read_sig_lines(headerlines[1:])
    867             # Set the object's signal line fields
    868             for field in _headers.sigfieldspecs:

C:\Anaconda3-4.2-3.5\lib\site-packages\wfdb\_headers.py in read_sig_lines(siglines)
    463             d_sig['checksum'][i],
    464             d_sig['blocksize'][i],
--> 465             d_sig['signame'][i]) = rxSIGNAL.findall(siglines[i])[0]
    466 
    467         for field in sigfieldspecs:

IndexError: list index out of range

wfdb version = 1.0.5

Thanks!

IPython.display.display converts normal interpreter into IPython session

I'm not quite sure why IPython.display.display is used to display certain objects. When I'm using the interpreter, the call to display dumps me into an IPython session.

Would it be possible to remove those? I've not used that function before so I'm not sure what the trade-offs are between display and simple print statements.

>>> # Inside regular python interpreter
>>> wfdb.showanncodes()
           Ann Code Meaning
Ann Symbol
                     NOTQRS
N                    NORMAL
L                      LBBB
R                      RBBB
a                     ABERR
V                       PVC
F                    FUSION
J                       NPC
A                       APC
S                      SVPB
E                      VESC
j                      NESC
/                      PACE
Q                   UNKNOWN
~                     NOISE
[15]
|                     ARFCT
[17]
s                      STCH
T                       TCH
*                   SYSTOLE
D                  DIASTOLE
"                      NOTE
=                   MEASURE
p                     PWAVE
B                       BBB
^                    PACESP
t                     TWAVE
+                    RHYTHM
u                     UWAVE
?                     LEARN
!                     FLWAV
[                      VFON
]                     VFOFF
e                      AESC
n                     SVESC
@                      LINK
x                      NAPC
f                      PFUS
(                      WFON
)                     WFOFF
r                      RONT
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
In : # now I'm inside an IPython shell
...:
In :

Use of sys.exit instead of raising errors

Using sys.exit when things go wrong boots me out of my Python shell. I think the more pythonic way to handle problems is to raise an error, e.g.

# Instead of using
# sys.exit('The database contains no annotators with extension: ', a)
# I suggest using
raise ValueError('The database contains no annotators with extension: ', a)

Does that sound reasonable to others?
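
Note that sys.exit accepts at most one argument, so the two-argument call above would itself fail with a TypeError (compare the "exit expected at most 1 arguments, got 2" report in another issue). A sketch of raising with a single formatted message follows; require_annotator and its arguments are illustrative names.

```python
def require_annotator(available, extension):
    """Return extension if it is in available; otherwise raise ValueError.

    The message is formatted into one string so it prints cleanly.
    """
    if extension not in available:
        raise ValueError(
            "The database contains no annotators with extension: {}".format(extension)
        )
    return extension
```

Callers in an interactive session can then catch the ValueError instead of losing the shell.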

Suggestions for comparing floating point values in nosetests?

Some nosetests are failing because of the floating point comparisons. When I create the target text files using rdsamp, I get 8 decimal places.

What else could I put aside from assert np.array_equal(sig, targetsig)?

Edit: Decided to round the output of my function to 8dp for physical signals. Seems like a fair comparison.
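An alternative to rounding is a tolerance-based comparison. The sketch below uses made-up values (not real rdsamp output) and assumes that an absolute tolerance of 1e-8 matches the 8 decimal places in the target files.

```python
import numpy as np

# Made-up values standing in for the function output and the 8dp target file
sig = np.array([0.1 + 0.2, 1.0 / 3.0])
targetsig = np.array([0.3, 0.33333333])

# Exact comparison fails because of floating point representation
exact_match = np.array_equal(sig, targetsig)

# Tolerance-based comparison passes; this raises AssertionError on mismatch
np.testing.assert_allclose(sig, targetsig, rtol=0, atol=1e-8)
close_match = np.allclose(sig, targetsig, rtol=0, atol=1e-8)
```

`np.testing.assert_allclose` also reports which elements differ, which tends to make test failures easier to read than a bare `assert`.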

Reading and writing skewed signals

rdsamp reads the signals and applies the skew, i.e. the skew-corrected samples are what end up in the d_signals attribute.

But how would/should the package interpret the writing of a record with the d_signals array and at least one skewed channel? The only thing that it can currently do is write those digital values straight to the file as they are, and write the skew values to the header. But then calling rdsamp() followed by wrsamp() will get you a different result.

We need to figure out a clear way to make this distinction when writing signals.
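The round-trip problem can be illustrated with a small sketch. It assumes that skew correction means advancing each channel by its skew value, and `apply_skew` is a hypothetical helper, not the package's implementation.

```python
import numpy as np


def apply_skew(raw, skew):
    """Hypothetical read-side correction: advance each channel by its skew."""
    n_out = raw.shape[0] - max(skew)
    out = np.empty((n_out, raw.shape[1]), dtype=raw.dtype)
    for ch, k in enumerate(skew):
        out[:, ch] = raw[k:k + n_out, ch]
    return out


raw = np.arange(12).reshape(6, 2)  # samples as stored in the file
skew = [0, 2]                      # skew values from the header

first_read = apply_skew(raw, skew)
# If first_read is written back unchanged with the same skew in the header,
# the next read applies the correction a second time.
second_read = apply_skew(first_read, skew)
```

Here `second_read` no longer matches the start of `first_read`, which is the rdsamp() followed by wrsamp() inconsistency described above.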

Plot ECG signal with big box and small box

Would it be possible, when plotting an ECG signal, to show the big-box and small-box grid that marks time in ms, as in this image: http://www.e-projects.ubi.pt/smart-clothing/images/fig11.jpg
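A possible starting point is to compute the grid line positions first. The sketch below assumes standard ECG paper scaling (25 mm/s and 10 mm/mV, so a small box is 0.04 s by 0.1 mV and a big box is 0.2 s by 0.5 mV). The function name `ecg_grid_positions` is hypothetical.

```python
import math


def ecg_grid_positions(duration_s, min_mv, max_mv,
                       small_s=0.04, small_mv=0.1, boxes_per_big=5):
    """Return small-box and big-box grid line positions for an ECG plot."""
    # Work with integer box counts to avoid accumulating float error
    n_small = int(round(duration_s / small_s))
    t_index = range(n_small + 1)
    lo = math.floor(round(min_mv / small_mv, 6))
    hi = math.ceil(round(max_mv / small_mv, 6))
    v_index = range(lo, hi + 1)
    return {
        'minor_t': [i * small_s for i in t_index],
        'major_t': [i * small_s for i in t_index if i % boxes_per_big == 0],
        'minor_v': [i * small_mv for i in v_index],
        'major_v': [i * small_mv for i in v_index if i % boxes_per_big == 0],
    }


grid = ecg_grid_positions(duration_s=1.0, min_mv=-1.0, max_mv=1.0)

# With matplotlib, the positions could be applied to an existing Axes `ax`:
#   ax.set_xticks(grid['major_t'])
#   ax.set_xticks(grid['minor_t'], minor=True)
#   ax.set_yticks(grid['major_v'])
#   ax.set_yticks(grid['minor_v'], minor=True)
#   ax.grid(which='major', linewidth=1.0)
#   ax.grid(which='minor', linewidth=0.4)
```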

Wish list for WFDB functions

A list of functions available in the original WFDB package is at:
https://physionet.org/physiotools/wag/wag.htm

Which of these other functions would be helpful to implement?

ValueError when using srdsamp with sddb/49

When loading 'sddb/49', I got the following error:

sig, fields = wfdb.srdsamp('data/sddb/49')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-b8a3bfe371be> in <module>()
      2 #t = 'data/ltstdb/s20601'
      3 
----> 4 sig, fields = wfdb.srdsamp(t)
      5 
      6 

C:\Anaconda3-4.2-3.5\lib\site-packages\wfdb\records.py in srdsamp(recordname, sampfrom, sampto, channels, pbdir)
    945     """
    946 
--> 947     record = rdsamp(recordname, sampfrom, sampto, channels, True, pbdir, True)
    948 
    949     signals = record.p_signals

C:\Anaconda3-4.2-3.5\lib\site-packages\wfdb\records.py in rdsamp(recordname, sampfrom, sampto, channels, physical, pbdir, m2s)
    756         record.d_signals = _signals.rdsegment(record.filename, dirname, pbdir, record.nsig, record.fmt, record.siglen,
    757             record.byteoffset, record.sampsperframe, record.skew,
--> 758             sampfrom, sampto, channels)
    759         # Arrange/edit the object fields to reflect user channel and/or signal range input
    760         record.arrangefields(channels, sampto - sampfrom)

C:\Anaconda3-4.2-3.5\lib\site-packages\wfdb\_signals.py in rdsegment(filename, dirname, pbdir, nsig, fmt, siglen, byteoffset, sampsperframe, skew, sampfrom, sampto, channels)
    331     for fn in w_filename:
    332         signals[:, out_datchannel[fn]] = rddat(fn, dirname, pbdir, w_fmt[fn], len(datchannel[fn]), 
--> 333             siglen, w_byteoffset[fn], w_sampsperframe[fn], w_skew[fn], sampfrom, sampto)[:, r_w_channel[fn]]
    334 
    335     return signals

C:\Anaconda3-4.2-3.5\lib\site-packages\wfdb\_signals.py in rddat(filename, dirname, pbdir, fmt, nsig, siglen, byteoffset, sampsperframe, skew, sampfrom, sampto)
    365 
    366     # Read the dat file into the correctly shaped np array
--> 367     sig, bytesloaded = processwfdbbytes(filename, dirname, pbdir, fmt, startbyte, sampto - sampfrom, nsig, sampsperframe, floorsamp)
    368 
    369     # Shift the samples in the channels with skew if any

C:\Anaconda3-4.2-3.5\lib\site-packages\wfdb\_signals.py in processwfdbbytes(filename, dirname, pbdir, fmt, startbyte, readlen, nsig, sampsperframe, floorsamp)
    406             # One sample pair is stored in one byte triplet.
    407             sig[0::2] = sigbytes[0::3] + 256 * \
--> 408                 np.bitwise_and(sigbytes[1::3], 0x0f)  # Even numbered samples
    409             if len(sig > 1):
    410                 # Odd numbered samples

ValueError: operands could not be broadcast together with shapes (22380958,) (22380957,)

wfdb version = 1.0.5

Thanks!
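The shapes in the error differ by one, which is what would happen if the byte buffer's length is not a multiple of 3, so that `sigbytes[0::3]` has one more element than `sigbytes[1::3]`. Below is a minimal sketch of a decoder that tolerates a trailing odd sample. It assumes the usual format 212 layout (two 12-bit two's complement samples per 3 bytes), and `decode_fmt212` is a hypothetical name rather than the package's function.

```python
import numpy as np


def decode_fmt212(raw_bytes, n_samples):
    """Decode n_samples 12-bit samples from a format 212 byte buffer."""
    data = np.frombuffer(raw_bytes, dtype=np.uint8).astype(np.int64)
    n_pairs, has_tail = divmod(n_samples, 2)
    needed = 3 * n_pairs + (2 if has_tail else 0)
    if data.size < needed:
        raise ValueError('Need {} bytes, got {}'.format(needed, data.size))

    sig = np.empty(n_samples, dtype=np.int64)
    # Complete sample pairs: one pair per byte triplet
    body = data[:3 * n_pairs].reshape(n_pairs, 3)
    sig[0:2 * n_pairs:2] = body[:, 0] + 256 * (body[:, 1] & 0x0F)
    sig[1:2 * n_pairs:2] = body[:, 2] + 256 * (body[:, 1] >> 4)
    # A final unpaired sample only needs the first two bytes of its triplet
    if has_tail:
        tail = data[3 * n_pairs:3 * n_pairs + 2]
        sig[-1] = tail[0] + 256 * (tail[1] & 0x0F)
    # Convert 12-bit two's complement to signed values
    sig[sig > 2047] -= 4096
    return sig


decoded = decode_fmt212(bytes([0x01, 0xF0, 0xFF, 0x02, 0x00]), 3)
```

Handling the pairs and the trailing sample separately keeps the two slices the same length, so the broadcast error cannot occur.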

wrsamp development

Program flow pseudocode

*Note: this function is for writing single-segment WFDB files. There will be another function that calls wrsamp multiple times to write multi-segment records.

mandatorywritefields = [[recordname, nsig, fs, siglen], [filename, fmt, adcgain, baseline, units]]

This includes all the fields required by the original WFDB specifications, along with a few extra fields that I feel should be written explicitly. I really do not like how some fields are optional or can be 'assumed', such as the default gain of 200, or siglen being optional.
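As a rough sketch of how the mandatory fields could be checked before writing, assuming `fields` is a dictionary and that each signal-level field holds one entry per signal (both assumptions, as is the helper name `missing_write_fields`):

```python
# Field names taken from mandatorywritefields above
RECORD_FIELDS = ['recordname', 'nsig', 'fs', 'siglen']
SIGNAL_FIELDS = ['filename', 'fmt', 'adcgain', 'baseline', 'units']


def missing_write_fields(fields):
    """Return the mandatory fields that are absent or the wrong length."""
    missing = [key for key in RECORD_FIELDS if fields.get(key) is None]
    nsig = fields.get('nsig')
    for key in SIGNAL_FIELDS:
        value = fields.get(key)
        # Signal-level fields need one entry per signal
        if value is None or (nsig is not None and len(value) != nsig):
            missing.append(key)
    return missing


example_fields = {
    'recordname': 'test01', 'nsig': 2, 'fs': 250, 'siglen': 1000,
    'filename': ['test01.dat', 'test01.dat'], 'fmt': ['16', '16'],
    'adcgain': [200, 200], 'baseline': [0, 0],
    'units': ['mV'],  # deliberately too short
}
problems = missing_write_fields(example_fields)
```

A writer could refuse to proceed when the returned list is non-empty, rather than silently falling back to defaults.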

wrsamp(sig, fields, physical=1, ...)

Notes:

Be careful when calculating the checksum with signals with skew

Module structure

How about the following:

headers.py: Functions and definitions for reading and writing WFDB header files. No dependencies.
- rdheader
- wrheader
- The header fields dictionary/list
- Other private helpers
signals.py: Functions and definitions for reading and writing WFDB record (signal+header) files. Uses headers.py.
- rdsamp
- wrsamp
- Other private helpers (namely rddat and wrdat)
annotations.py: Functions and definitions for reading and writing WFDB annotation files. Uses headers.py.
- rdann
- wrann
- Other private helpers.

Suggestions for loading physiobank data

import numpy as np
import urllib.request

targeturl = "http://physionet.org/physiobank/database/macecgdb/test01_00s.dat"

hr = urllib.request.urlopen(targeturl)  # The returned object is of type 'http.client.HTTPResponse'
content = hr.read()  # This is a bytes object
content = np.array(list(content))  # One array element per byte

So right now the solution I have found to read physiobank data is to use the urllib package to return an HTTPResponse object for the file. Then I use the read() method to extract a bytes object from it, which essentially gives me all the data in unsigned 1 byte format. This is ok but requires more processing for all the format types, as opposed to something like numpy.fromfile() where you can specify the format and get a numpy array directly.

I've had a look at this doc page https://docs.python.org/3/library/http.client.html#http.client.HTTPResponse and tried calling help(hr.read), but I don't see a faster method than to read everything in 1 byte blocks, convert to a list, then to a numpy array, and then do the calculations. If anyone knows of a more efficient method please let me know.
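One option that may help is `numpy.frombuffer`, which interprets a bytes object directly with a given dtype and avoids the intermediate list. The sketch below uses a hand-made bytes object in place of the downloaded content, and assumes little-endian 16-bit samples purely for illustration.

```python
import numpy as np

# Stand-in for the bytes returned by hr.read()
content = bytes([0x01, 0x00, 0xFF, 0xFF, 0x10, 0x27])

# Raw unsigned bytes, equivalent to np.array(list(content)) but without the list
as_bytes = np.frombuffer(content, dtype=np.uint8)

# Interpret the same buffer directly as little-endian signed 16-bit samples
as_int16 = np.frombuffer(content, dtype='<i2')
```

The arrays returned by `np.frombuffer` are read-only views of the buffer, so call `.copy()` if the samples need to be modified in place.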

Reading and writing special WFDB formats (212, 310, 311)

Ensure files written can be read by both wfdb c and wfdb-python
Ensure wfdb-python can read at least the files readable by wfdb c
Do not use wfdb c wrsamp to write files for fmt 310 or 311 when the samples don't fill whole byte blocks. It produces flawed output that even rdsamp in the c package cannot read correctly. It can produce valid files when the number of samples does fill whole byte blocks.

Format 212

wfdb c CAN read records with an odd number of samples, even if the file length does not include the dummy samples needed to round up to a whole byte block. i.e. a 2997 sample file with size 4496 bytes is readable. 2997*1.5 = 4495.5. upround(4495.5, 3) = 4497.

Format 310

wfdb c CAN read records with one extra sample after a complete block sample number, even if the file length does not include the dummy samples needed to round up to a whole byte block. i.e. a 1018 sample file with size 1358 bytes is readable. 1018*4/3 = 1357+1/3. upround(1357, 4) = 1360.
Nothing to test for 2 extra samples after a complete block, due to the layout of 310. The next sample would require the whole byte block.

Format 311

wfdb c CAN read records with one extra sample after a complete block sample number, even if the file length does not include the dummy samples needed to round up to a whole byte block. i.e. a 1018 sample file with size 1358 bytes is readable. 1018*4/3 = 1357+1/3. upround(1357, 4) = 1360.
wfdb c CANNOT read records with 2 extra samples after a complete block sample number, if the file length does not include the dummy samples needed to round up to a whole byte block. i.e. a 1019 sample file with size 1359 bytes is not readable. 1019*4/3 = 1358+2/3. upround(1358, 4) = 1360. Weakness: the current library tries to read in byte pairs, which is why the first extra sample is always readable, coincidentally.

Conclusion

Write rdsamp to be able to read these formats using the minimum number of bytes needed, without needing to read in entire blocks.
Write wrsamp to write just enough bytes for mod1 samples, and round up whole blocks for mod2 samples.
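The byte counts discussed above can be expressed as a small sketch. The tail sizes are inferred from the examples in this issue (for instance 1018 samples in 1358 bytes, and 1019 samples of format 311 in 1359 bytes), and the function names `min_bytes` and `bytes_to_write` are hypothetical.

```python
# fmt: (samples per block, bytes per block, bytes needed for 0/1/2 extra samples)
LAYOUT = {
    '212': (2, 3, {0: 0, 1: 2}),
    '310': (3, 4, {0: 0, 1: 2, 2: 4}),
    '311': (3, 4, {0: 0, 1: 2, 2: 3}),
}


def min_bytes(fmt, n_samples):
    """Minimum bytes a reader needs in order to recover n_samples."""
    block_samples, block_bytes, tail = LAYOUT[fmt]
    full, extra = divmod(n_samples, block_samples)
    return full * block_bytes + tail[extra]


def bytes_to_write(fmt, n_samples):
    """Bytes a writer emits: just enough for 1 extra sample, a whole block for 2."""
    block_samples, block_bytes, _ = LAYOUT[fmt]
    full, extra = divmod(n_samples, block_samples)
    if extra == 2:
        return (full + 1) * block_bytes
    return min_bytes(fmt, n_samples)


sizes = {
    ('310', 1018): min_bytes('310', 1018),
    ('311', 1018): min_bytes('311', 1018),
    ('311', 1019): min_bytes('311', 1019),
}
```

Under these assumptions, a reader built around `min_bytes` accepts every file the c package accepts, and a writer built around `bytes_to_write` avoids the 1019 sample, 1359 byte case that the c package cannot read.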

Certain output fields in rdann sometimes become equal to the values of other fields

Sporadically (or perhaps not so sporadically), running rdann on the exact same file with the same parameters several times in succession ends up with 'chan' equal to 'annsamp', along with other fields that match when they should not.

ValueError: cannot convert float NaN to integer

Hello,
When I read ECG data from the physiobank chfdb database using the wfdb.srdsamp function, I get the following error:
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/wfdb/records.py", line 948, in srdsamp
record = rdsamp(recordname, sampfrom, sampto, channels, True, pbdir, True)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/wfdb/records.py", line 764, in rdsamp
record.p_signals = record.dac()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/wfdb/_signals.py", line 178, in dac
p_signal[nanlocs] = np.nan
ValueError: cannot convert float NaN to integer
Where am I going wrong?
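One plausible cause, which the traceback alone does not confirm, is that `p_signal` has an integer dtype when NaN is assigned to it, since NumPy cannot store NaN in an integer array. The sketch below shows a conversion that casts to float first. `dac_sketch` and its parameters are hypothetical and not the package's code.

```python
import numpy as np


def dac_sketch(d_signal, adc_gain, baseline, invalid_value):
    """Convert digital samples to physical units, marking invalid samples as NaN."""
    nanlocs = d_signal == invalid_value
    # Cast to float before the arithmetic so that the result can hold NaN
    p_signal = (d_signal.astype(np.float64) - baseline) / adc_gain
    p_signal[nanlocs] = np.nan
    return p_signal


d_signal = np.array([0, 200, -32768])
p_signal = dac_sketch(d_signal, adc_gain=200, baseline=0, invalid_value=-32768)

# Assigning NaN to an integer array reproduces the reported error
try:
    d_signal.copy()[0:1] = np.nan
    int_assignment_failed = False
except ValueError:
    int_assignment_failed = True
```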

Checklist for adding multiple samples/frame expanded signals option

Record.dac()
Record.adc()
Record.arrangefields()
Special formats
Multi-segment expanded support in certain cases.
Make smoothframes() function to convert e_<>_signals into <>_signals. Instance method which calls the module/class method. This should accept an fs or similar option to upsample channels. Return uniform array.
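As a rough sketch of what the smoothing step could look like, assuming the expanded signals are given as one 1-D sequence per channel and that smoothing means averaging the samples within each frame (the upsampling option mentioned above is not covered here). The name `smooth_frames_sketch` is hypothetical.

```python
import numpy as np


def smooth_frames_sketch(e_signals, samps_per_frame):
    """Average each channel's samples within a frame and return a uniform 2-D array."""
    columns = []
    for sig, spf in zip(e_signals, samps_per_frame):
        sig = np.asarray(sig, dtype=np.float64)
        # One row per frame, one column per sample within the frame
        columns.append(sig.reshape(-1, spf).mean(axis=1))
    return np.column_stack(columns)


# Channel 0 has 2 samples per frame, channel 1 has 1 sample per frame
uniform = smooth_frames_sketch([[1, 3, 5, 7], [10, 20]], [2, 1])
```

The result has one row per frame and one column per channel, matching the shape of an ordinary signal array.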

Move functions from Notebook to module

Rather than including the rdsamp, readheader, and readdat functions in the Jupyter Notebook, it would be cleaner to include them in a module, which can then be imported.