
sdds's Introduction

SDDS


This package provides reading and writing functionality for self-describing data set (SDDS) files. On the Python side, the data is stored in a class structure with attributes corresponding to the SDDS format itself (see sdds-format).

See the API documentation for details.

Installing

Installation is easily done via pip:

python -m pip install sdds

One can also install in a conda environment via the conda-forge channel with:

conda install -c conda-forge sdds

Example Usage

import sdds

sdds_data = sdds.read("path_to_input.sdds")
sdds.write(sdds_data, "path_to_output.sdds")

Read files with different endianness

By default the endianness (byte order) of the file is determined either by a comment !# little-endian or !# big-endian in the header of the file. If this comment is not found, the endianness of the running machine is assumed.

One can force a specific endianness on the reader by supplying it to the read function:

import sdds

sdds_data = sdds.read("path_to_input_with_big_endian.sdds", endianness="big")
sdds_data = sdds.read("path_to_input_with_little_endian.sdds", endianness="little")

Be aware that sdds.write always writes the file in big-endian order and also leaves such a comment in the file, so the reader can determine the endianness and there is no need to supply it when reading a file written by this package.

Known Issues

  • Can't read binary columns
  • No support for &include tag

License

This project is licensed under the MIT License - see the LICENSE file for details.

sdds's People

Contributors

dronakurl, fscarlier, fsoubelet, jaimecp89, joschd, mael-le-garrec, pylhctokens


sdds's Issues

SDDS update for llong

Hi,

The new format for sdds files often uses the llong format. I was able to use your code with the following modifications (sketched below):

  • classes.py, lines 17-19: add the llong format (>i8, 8, int)
  • reader.py, line 139: convert num_dims to int -> int(num_dims)
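
For illustration, here is a hedged sketch of the first modification, assuming classes.py keeps a mapping from SDDS type names to (numpy dtype string, byte size, Python type) tuples, as the (>i8, 8, int) notation above suggests. The dictionary name and the existing long entry are assumptions; only the llong tuple comes from the report.

NUMERIC_TYPES = {
    "long": (">i4", 4, int),    # assumed existing 32-bit entry
    "llong": (">i8", 8, int),   # 64-bit integer used by newer SDDS files
}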

Thank you for the good work! Is there any plan to add ASCII compatibility?

Here is an example of the file I was trying to open

LHC_BLM@08_36_18_005@[email protected]

As for the ASCII, this is the file I want to open (I can do it line by line, but that is it for now):
Ufo_2018-06-09_22-05-42.0_BLMQI.11L8.B2I10_MQ.sdds.gz

Cheers

Read gzipped SDDS files

Feature Description

It would be nice to be able to read gzipped files directly. A minor modification to the read_sdds function would do the trick; a bit of refactoring would allow adding it on top of a common interface for both functions.

Possible Implementation

import gzip
import pathlib
from typing import Union

# Intended to live in sdds/reader.py next to read_sdds, where SddsFile and the
# private _get_endianness, _read_header and _read_data helpers are available.
def read_sdds_gz(file_path: Union[pathlib.Path, str], endianness: str = None) -> SddsFile:
    """
    Reads SDDS file from the specified ``file_path``.

    Args:
        file_path (Union[pathlib.Path, str]): `Path` object to the input SDDS file. Can be a
            `string`, in which case it will be cast to a `Path` object.
        endianness (str): Endianness of the file. Either 'big' or 'little'.
                          If not given, the endianness is either extracted from
                          the comments in the header of the file (if present)
                          or determined by the machine you are running on.
                          Binary files written by this package are all big-endian,
                          and contain a comment in the file.

    Returns:
        An `SddsFile` object containing the loaded data.
    """
    file_path = pathlib.Path(file_path)
    with gzip.open(file_path, "rb") as inbytes:
        # original: with file_path.open("rb") as inbytes:
        if endianness is None:
            endianness = _get_endianness(inbytes)
        version, definition_list, description, data = _read_header(inbytes)
        data_list = _read_data(data, definition_list, inbytes, endianness)

        return SddsFile(version, description, definition_list, data_list)

Fix error in SDDS writing

Release 0.2 deployed all changes made since August 9, 2019. This means all commits from fca0994 to b0ee5b9, inclusive.

Testing in omc3 started breaking after using this new release, as can be seen here.

Reading does not seem to pose an issue on its own.
However, reading back the result of a tbt conversion fails with an invalid start byte, which indicates the writing was wrong.

These tests in omc3 fail starting with sdds's commit 5d2a0d0, which is a change made to the writing behavior.

We should aim to fix this ASAP, add proper testing, and release 0.2.1.

[Question]: Why not use the Python sdds package from ANL?

Documentation

  • Yes

Question

There is this other sdds package here:
www.aps.anl.gov/
Are there reasons not to use this? What are the main differences? It would be helpful to point that out on the README page.
When using ELEGANT, you have to use their package, and someone might confuse the two packages, since the other package is also called sdds.

Refactor for consistency

Last package to refactor for consistency. Includes

  • no bumpversion
  • setup.py
  • __init__.py
  • docs/conf.py
  • no setup_requires
  • direct use of pytest
  • update license to actually be MIT (as said in setup.py)
  • update out-of-date parts of README

[Question]: The field 'format' in class Definition may be wrong?

Documentation

  • Yes

Question

I'm encountering an error while using this library to read SDDS5 files, namely "class Definition does not have a field named format_string". I checked the 1995 documentation (version 1.4, https://doi.org/10.2172/179270) on page 5, where Definition only includes format_string and not format.
I want to know where the field format in the Definition class comes from.

Recently, while using this library for SDDS5, I found that both reading and writing data were not implemented, so I made some additions. However, due to my limited understanding of the version history of SDDS, I cannot guarantee the completeness of my modifications. Can I submit these relevant changes? Alternatively, is there an updated version of this project available?

Update CI config

The CI config should be updated for easier maintenance of dependencies and simpler usage.

[Bug]: IndexError for empty arrays

Documentation

  • Yes

Operating System

Any

Python Version

Any

Package Version

0.3.0

Bug Description

If there is an empty array or an array of empty arrays, the writing function will fail with an index error in:

return [len(value)] + get_dimensions_from_array(value[0])

Steps to Reproduce

No response

Relevant output

No response

Possible Fix Implementation

A possible solution could be:

return [len(value)] + get_dimensions_from_array(value[0]) if len(value) else []

but it needs to be tested whether this would read back okay, or whether empty arrays should be allowed in the first place.
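
As a standalone illustration of the guarded recursion (a local reimplementation to test the idea, not necessarily the package's exact helper):

def get_dimensions_from_array(value) -> list:
    """Return the nested dimensions of a possibly empty list-like value."""
    if not hasattr(value, "__len__") or isinstance(value, str):
        return []  # reached a scalar (or a string): no further dimensions
    return [len(value)] + get_dimensions_from_array(value[0]) if len(value) else []

print(get_dimensions_from_array([[1, 2, 3], [4, 5, 6]]))  # [2, 3]
print(get_dimensions_from_array([]))                      # []
print(get_dimensions_from_array([[], []]))                # [2] -- the inner 0 dimension is dropped

The last case shows why it still needs to be tested whether such files read back correctly.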

[Feature Request]: Columns

Implementation of reading and writing of column data

  • Implement writing
  • Implement reading

@mulingLHY mentioned they were working on something like this, so maybe we can merge this back into this package later on.
@lmalina has also been looking into this a few years ago, I think. Maybe he can comment on the status as well.

An example file would be very useful for us in this regard.
Our sdds files do not use columns, hence this feature has not been necessary.

Cheers,

Josch

encounter errors with sdds.read('file.sdds')

I want to use the sdds module to process the output data of ELEGANT, but when I used sdds.read(), it threw errors. I'm a beginner in Python; could anybody send me an example script showing how to use the sdds module?
Email:[email protected]

++++++++++++++++++++++
import sdds
import os

os.chdir(r'D:\IHEPBox\elegantExamples\PAR\twissCalculation')
sdds_data = sdds.read('parTwiss.twi')
++++++++++++++++++++++++++

I use the commands shown above to load the twiss file (an sdds file), but it throws errors like this:

Traceback (most recent call last):
  File "C:\Users\Saike Tian\Desktop\1111111111111111111111111.py", line 5, in <module>
    sdds_data = sdds.read('parTwiss.twi')
  File "E:\ProgramData\Anaconda3\lib\site-packages\sdds\reader.py", line 25, in read_sdds
    version, definition_list, description, data = _read_header(inbytes)
  File "E:\ProgramData\Anaconda3\lib\site-packages\sdds\reader.py", line 43, in _read_header
    def_dict: Dict[str, str] = _get_def_as_dict(word_gen)
  File "E:\ProgramData\Anaconda3\lib\site-packages\sdds\reader.py", line 181, in _get_def_as_dict
    [assign.split("=") for assign in parts]}
  File "E:\ProgramData\Anaconda3\lib\site-packages\sdds\reader.py", line 180, in <dictcomp>
    return {key.strip(): value.strip() for (key, value) in
ValueError: not enough values to unpack (expected 2, got 1)
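
For what it is worth, the error can be reproduced in isolation: the header parser's dict comprehension expects every token to contain exactly one =, and any token without one (for example the second half of a quoted value containing spaces) cannot be unpacked into a (key, value) pair. A minimal, illustrative reproduction (not the package's code; the tokens are made up):

parts = ["name=betax", "units=m", "broken token"]  # hypothetical header tokens
try:
    parsed = {key.strip(): value.strip()
              for (key, value) in [assign.split("=") for assign in parts]}
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 1)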

[Bug]: Type Hints empty for derived class

Documentation

  • Yes

Operating System

macOS Big Sur 11.6.8

Python Version

Python 3.9.13

Package Version

cloned today

Bug Description

When reading a file, the reader fails to parse the header. The type_hints dictionary in classes.py line 118 appears to only have the type hints from the base class and not this derived class, so it fails with:

Traceback (most recent call last):
  File "/Users/nevay/physics/reps/sdds/sdds/reader.py", line 40, in read_sdds
    version, definition_list, description, data = _read_header(inbytes)
  File "/Users/nevay/physics/reps/sdds/sdds/reader.py", line 61, in _read_header
    definitions.append({
  File "/Users/nevay/physics/reps/sdds/sdds/classes.py", line 118, in __init__
    type_hint = type_hints[argname]
KeyError: 'description'

See the attached screenshot of debugging in PyCharm.

This doesn't seem to be specific to the data file and happens every time for me.

Cheers,
Laurie

Steps to Reproduce

Load a file as per the documentation.

Relevant output

No response

Possible Fix Implementation

No response
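
As a hedged illustration of one common way such an incomplete type_hints dictionary can arise (the classes and fields below are made up for the example, not taken from classes.py): cls.__annotations__ only contains the annotations declared on the class it is accessed on, while typing.get_type_hints(cls) also resolves those inherited from base classes.

from typing import get_type_hints

class Definition:              # hypothetical base class for the example
    name: str
    description: str

class Parameter(Definition):   # hypothetical derived class
    fixed_value: str

print(Parameter.__annotations__)  # only 'fixed_value'; 'description' is missing
print(get_type_hints(Parameter))  # includes 'name', 'description' and 'fixed_value'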

[Feature Request]: Little endian binary file support

Feature Description

The current test covers the big-endian binary file tests/inputs/test_file.sdds; it would be nice if little-endian files were supported too.

The current behavior is to crash while trying to read a string of incorrect length:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "$prefix/lib/python3.9/site-packages/sdds/reader.py", line 29, in read_sdds
    data_list = _read_data(data, definition_list, inbytes)
  File "$prefix/lib/python3.9/site-packages/sdds/reader.py", line 89, in _read_data
    return _read_data_binary(definitions, inbytes)
  File "$prefix/lib/python3.9/site-packages/sdds/reader.py", line 107, in _read_data_binary
    return [functs_dict[definition.__class__](inbytes, definition) for definition in definitions]
  File "$prefix/lib/python3.9/site-packages/sdds/reader.py", line 107, in <listcomp>
    return [functs_dict[definition.__class__](inbytes, definition) for definition in definitions]
  File "$prefix/lib/python3.9/site-packages/sdds/reader.py", line 120, in _read_bin_param
    return inbytes.read(str_len).decode(ENCODING)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xee in position 20: invalid continuation byte
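
The crash above is consistent with a length prefix being decoded with the wrong byte order, after which the reader pulls in garbage bytes as a string. A small standalone illustration (not the package's code) of how the byte-order prefix changes what numpy decodes from the same bytes:

import numpy as np

raw = (1193046).to_bytes(4, "big")         # the same four bytes...
print(np.frombuffer(raw, dtype=">i4")[0])  # 1193046 when read as big-endian
print(np.frombuffer(raw, dtype="<i4")[0])  # a completely different value when read as little-endian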

Possible Implementation

No response

Dimensions not written correctly

The writer currently uses len(value) as the dimension for arrays, which is wrong:

outbytes.write(np.array(len(value), dtype=NUMTYPES["long"]).tobytes())

There can be multiple dimensions, nested lists:

dimensions: 36 and 4452 items
[[-0.26359558 -0.5675639  -0.29767203 ...  0.          0.
   0.        ]
 [-3.2426553  -3.8073838  -3.4528794  ...  0.          0.
   0.        ]
 [ 0.          0.          0.         ...  0.          0.
   0.        ]
 ...
 [ 0.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.          0.          0.         ...  0.          0.
   0.        ]]

In this case, the writer will only write 36.
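
A hedged sketch of what writing the dimensions could look like instead, using the full shape of the (nested) array and assuming the NUMTYPES["long"] mapping from the snippet above is a big-endian 32-bit integer:

import numpy as np

NUMTYPES = {"long": ">i4"}  # assumed to match the writer's big-endian long type

def write_array_dimensions(outbytes, value) -> None:
    """Write one long per dimension, e.g. 36 and 4452 for the example above."""
    dims = np.asarray(value).shape
    outbytes.write(np.array(dims, dtype=NUMTYPES["long"]).tobytes())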

[Feature Request]: Rewrite

Feature Description

Upon inspection of the SDDS code, I think I stumbled upon multiple possible bugs.

a) If you have entries with duplicate names, these will overwrite each other, as they are stored by name in a dict. This should at least raise an error somewhere (e.g. assert len(names) == len(set(names)); see the sketch after this description).

b) Upon reading, the order of the definitions in the header is sorted (parameter, array, column), but the data is not, and the sorting happens before the data is read. So basically, the data already needs to be in that order; if the sorting actually changed anything, the file could not be read anymore.
The same goes for writing: there the data is always written in that order, but the order of the headers is whatever came into the SddsFile init. Sorting the header would make sense there, but it is not done!

We never encountered problems, as we read LHC data, which is already sorted, and we mostly use the turn_by_turn package for writing, which sorts the entries in the same way.
If it didn't, the order in the header would differ from the order in the data, and that order is actually required.

My suggestion is to remove all of that sorting and have the writer simply loop through the lists and write parameters, arrays and columns one at a time, instead of gathering them first.
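
A minimal sketch of the check suggested in (a), assuming the definitions expose a name attribute (as the header definitions do); this is not the package's actual code:

from collections import Counter

def assert_unique_names(definitions) -> None:
    """Raise instead of silently overwriting when two definitions share a name."""
    names = [definition.name for definition in definitions]
    duplicates = [name for name, count in Counter(names).items() if count > 1]
    if duplicates:
        raise ValueError(f"Duplicate definition names: {duplicates}")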

Possible Implementation

We could add a random hash to the definitions, maybe built from the name plus some random characters.
Then definitions and values could be combined into a single dict with the definitions as keys and the corresponding values as values (which we could also accept as an alternative input to the init, so either two lists or this kind of dict).
The only problem is that we would then have to rewrite the getter a bit, so that if a name string is given, we actually return a tuple in case multiple entries are found.

something like that.
