TFS-Pandas

This package provides reading and writing functionality for Table Format System (TFS) files. Files are read into a TfsDataFrame, a class built on top of the famous pandas.DataFrame, which in addition to the normal behavior attaches an OrderedDict of headers to the DataFrame.

See the API documentation for details.

Installing

Installation is easily done via pip:

python -m pip install tfs-pandas

One can also install in a conda/mamba environment via the conda-forge channel with:

conda install -c conda-forge tfs-pandas

Example Usage

The package is imported as tfs, and exports top-level functions for reading and writing:

import tfs

# Loading a TFS file is simple
data_frame = tfs.read("path_to_input.tfs", index="index_column")

# You can access and modify the headers with the .headers attribute
useful_variable = data_frame.headers["SOME_KEY"]
data_frame.headers["NEW_KEY"] = some_variable

# Manipulate data as you do with pandas DataFrames
data_frame["NEWCOL"] = data_frame.COL_A * data_frame.COL_B

# You can check the validity of a TfsDataFrame, and choose the behavior in case of errors
tfs.frame.validate(data_frame, non_unique_behavior="raise")  # or choose "warn"

# Writing out to disk is simple too
tfs.write("path_to_output.tfs", data_frame, save_index="index_column")

Reading and writing compressed files is also supported, and done automatically based on the provided file extension:

import tfs

# Reading a compressed file is simple, compression format is inferred
df = tfs.read("path_to_input.tfs.gz")

# When writing choose the compression format by providing the appropriate file extension
tfs.write("path_to_output.tfs.bz2", df)
tfs.write("path_to_output.tfs.zip", df)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Issues

[Feature Request]: Speedup Dataframe Validation

Feature Description

As reported by some users, the validation step can be very slow on large dataframes (for instance sliced FCC lattice).

In #107 the validation step was made optional. This is a first step but it would be nice for users who wish to validate to be able to do so faster.

Possible Implementation

The main culprit is is_not_finite(x) in the validate function, which is itself slow and is applied via applymap, which is also very slow.

This can be sped-up (see implementation).
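A minimal sketch of such a speedup, replacing the per-cell applymap check with a vectorized one (fast_has_non_finite is a hypothetical name, not the actual implementation):

```python
import numpy as np
import pandas as pd

def fast_has_non_finite(df: pd.DataFrame) -> bool:
    """Vectorized check for NaN/inf in the numeric columns of a frame."""
    numeric = df.select_dtypes(include=np.number)
    if numeric.empty:
        return False
    # np.isfinite on the underlying array avoids the per-cell applymap cost
    return not bool(np.isfinite(numeric.to_numpy()).all())
```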

[Feature Request]: Check for madx-compatibility

Feature Description

Option to check for madx-compatibility on writing

Possible Implementation

Add a flag madx_compatibility to the writer, with the options None (default), warn or error.
If not None, it should be checked that:

a) the header contains TYPE
b) header-names do not contain spaces

... maybe more?
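A rough sketch of what such a check could look like (the function name and its integration point are assumptions):

```python
import warnings

def check_madx_compatibility(headers: dict, mode=None) -> None:
    """Check a headers dict for MAD-X compatibility; warn or raise per mode."""
    if mode is None:
        return
    problems = []
    if "TYPE" not in headers:  # point a) above
        problems.append("header is missing the TYPE entry")
    spaced = [name for name in headers if " " in name]  # point b) above
    if spaced:
        problems.append(f"header names contain spaces: {spaced}")
    if not problems:
        return
    message = "; ".join(problems)
    if mode == "warn":
        warnings.warn(message)
    else:  # mode == "error"
        raise ValueError(message)
```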

significant digits returns str

Currently tfs.tools.significant_digits returns strings; for further use and for consistency, the option to return floats would be advantageous.
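A hedged sketch of what a float-returning variant might look like (the rounding rule below is an assumption for illustration, not the existing implementation):

```python
import math

def significant_digits_as_floats(value: float, error: float):
    """Round value and error to the precision implied by the error, as floats."""
    if error == 0:
        return value, error
    # number of decimal places implied by the leading digit of the error
    digits = -int(math.floor(math.log10(abs(error))))
    return round(value, digits), round(error, digits)
```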

Update README build badge to Github status

Since we moved away from Travis CI, the build badge should be changed to reflect the github build status.

I suggest changing it to the status of the cron tests (which will show no status until the first cron kicks in on Monday).

@lmalina @JoschD

Remove FixedTfs

FixedTfs does not work with pandas 1.0 and newer, as pandas keeps getting more flexible. A fix seems to require quite deep digging and redefinition of basic pandas.DataFrame methods. Even if resolved, the behaviour of FixedTfs and of TfsDataFrame will diverge. Differing behaviour may be misleading, since the use cases are very similar, if not the same.
Now the questions to answer are:

  • Is anyone actually using FixedTfs?
  • If so, is it worth the effort to make it work with newer pandas?
  • Or we just get rid of it?

Writer Malfunction on empty dataframe

Empty dataframes might cause the writer to crash. Observed behavior:

a) only index given (columns empty):
Crashes in handler.py:238, as the check in line 234 should read if len(data_frame.columns) == 0:

b) only columns given (index empty):
Writes out an empty dataframe, containing the columns but also the column names as index
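The two cases above boil down to two emptiness checks; a minimal sketch (hypothetical helpers, not the handler's actual code):

```python
import pandas as pd

def has_no_columns(df: pd.DataFrame) -> bool:
    """Case a: the corrected guard for a frame with only an index."""
    return len(df.columns) == 0

def has_no_index(df: pd.DataFrame) -> bool:
    """Case b: a frame with columns but no rows should not get its
    column names written out as index."""
    return len(df.index) == 0
```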

Auto-set numeric datatypes

Problem:
When writing a dataframe without specifying the dtype beforehand, one often runs into the error unknown format '%s' for object of type 'int' or similar, because dtype=object (the default when pandas does not know the datatype) maps to '%s' in our writer.

I would not try to solve it automatically, but I would add an easy solution. Talked to Lukas: have it as a function in tools, then you can also use it on pandas DataFrames. (Please share your opinion @Mael-Le-Garrec @lmalina @mihofer @awegsche)

from functools import partial
import pandas as pd
(...)
def auto_dtype(df):
    return df.apply(partial(pd.to_numeric, errors='ignore')) 
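Note that errors='ignore' has since been deprecated in recent pandas versions; a per-column sketch avoiding it (auto_dtype_safe is a hypothetical name):

```python
import pandas as pd

def auto_dtype_safe(df: pd.DataFrame) -> pd.DataFrame:
    """Convert each column to a numeric dtype where possible, else leave it."""
    def _convert(col):
        try:
            return pd.to_numeric(col)
        except (ValueError, TypeError):
            # column is genuinely non-numeric, keep it untouched
            return col
    return df.apply(_convert)
```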

Advanced Formatting Suggestions

Comparing our TFS output to MAD-X TFS output shows great disadvantages in readability in our TFS files.

Suggestions

  • Allow for fixed header widths
  • If save_index is used, left-align the first (index) column

Writing crash on non-string column names

Writer crashes on non-string (or non-iterable) column names.

ddf = tfs.TfsDataFrame([[1, 2.], [3, 4.]])
tfs.write_tfs('fdfsdf.tfs', ddf)

ddf
   0    1
0  1  2.0
1  3  4.0

leads to

Traceback (most recent call last):
  File "tfs/handler.py", line 187, in write_tfs
    _validate(data_frame, f"to be written in {tfs_file_path.absolute()}")
  File "tfs/handler.py", line 474, in _validate
    if any(" " in c for c in data_frame.columns):
TypeError: argument of type 'int' is not iterable
  • Fix
  • WRITE TEST
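A minimal sketch of a possible fix, coercing column names to strings before validation (stringify_columns is a hypothetical helper, not the actual patch):

```python
import pandas as pd

def stringify_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose column names are all strings, so the
    space-in-name check in _validate can iterate them safely."""
    out = df.copy()
    out.columns = [str(c) for c in out.columns]
    return out
```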

Merging functions crash when encountering pandas DataFrames

At the moment, if any of TfsDataFrame.append, TfsDataFrame.join, TfsDataFrame.merge or tfs.concat encounter a pandas.DataFrame in their input, they will fail as they try to query the object's headers.

We should make a patch to accept these.
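A sketch of such a patch: treat any input without a headers attribute as having empty headers (merge_headers is a hypothetical helper illustrating the idea):

```python
def merge_headers(left, right) -> dict:
    """Combine the headers of two frames, tolerating plain DataFrames
    (which have no .headers attribute) by treating them as empty."""
    headers = {}
    headers.update(getattr(left, "headers", {}) or {})
    headers.update(getattr(right, "headers", {}) or {})
    return headers
```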

Fix use of deprecated np.str

Various CI builds raise the following DeprecationWarning, which should be fairly simple to fix.

DeprecationWarning: `np.str` is a deprecated alias for the builtin `str`. To silence this warning, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

[Feature Request]: Make validation optional

Feature Description

As reported by some users, the validation step can be very slow on large dataframes (for instance sliced FCC lattice).

A good first step would be to make the validation step optional and leave the choice to users.

Possible Implementation

A new bool argument in the reader and writer functions, and an if statement before calling validate.

eol at eof ?

Should we add an eol (\n) at the end of the tfs file?

VIM is complaining, and if you do cat file.tfs the next prompt starts directly after the values of the last column.

To be tested whether this is then still readable by a) us b) madx (which also writes no EOL at EOF).
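If adopted, appending the final newline could look like this (ensure_trailing_newline is a hypothetical helper):

```python
from pathlib import Path

def ensure_trailing_newline(path) -> None:
    """Append a final newline if the file does not already end with one."""
    p = Path(path)
    text = p.read_text()
    if text and not text.endswith("\n"):
        p.write_text(text + "\n")
```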

Writing crash on dtype problems

ddf = tfs.TfsDataFrame([[1, 2.], [3, 4.]], columns=['int', 'float'])
tfs.write_tfs('fdfsdf.tfs', ddf)

ddf
   int  float
0    1    2.0
1    3    4.0

crashes with

ValueError: Unknown format code 'd' for object of type 'float'
  • WHY????
  • fix
  • TEST

remove setup.cfg

It is not really needed, and the constant coverage reports are annoying.

[Discussion]: Add support for boolean type (%b) in twiss header?

See the last slide of Alexander's presentation here: MAD-NG introduces %b (boolean) in the twiss headers, which Alexander suggested we support.

Should be a simple case of:

  • adding the type to ID_TO_TYPE in the handler
  • giving specific handling in _dtype_to_id_string when pdtypes.is_bool_dtype(type_) is true (currently it gives back %d, as for an integer)
  • adding tests for the behavior
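A sketch of the proposed mapping (the dict and function below only illustrate the idea; the real names and contents in the handler may differ):

```python
import pandas.api.types as pdtypes

# Proposed addition: %b alongside the existing type identifiers
ID_TO_TYPE = {
    "%s": str,
    "%d": int,
    "%le": float,
    "%b": bool,  # new boolean type from MAD-NG
}

def dtype_to_id_string(dtype) -> str:
    """Map a pandas dtype to a TFS type identifier, with %b for booleans."""
    if pdtypes.is_bool_dtype(dtype):
        return "%b"  # currently this case falls through to %d
    if pdtypes.is_integer_dtype(dtype):
        return "%d"
    if pdtypes.is_float_dtype(dtype):
        return "%le"
    return "%s"
```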

Important to check: what does MAD-X do when reading these (unknown to it) types? As @JoschD suggested, we could create a flag (compatibility) for write_tfs that determines how to handle this.

@lmalina @JoschD @mihofer

Support for trackone output (PTC)

Hello, I like this package and I use it a lot. So thanks for writing it.

I am trying to use it to load output from MAD-X PTC for the first time, and the current implementation of the loader is crashing because of the presence of non-standard lines mixed in with the main table data. A snippet of the data is pasted below (leaving out the header). Obviously it is crashing when trying to load the lines starting with #segment. My understanding of these lines is that they mark an observation point, with all particles below belonging to that point. The first number on the line is just the index and counts up. I don't know what the second number is. The third number is the number of lines in that segment. The fourth number is possibly the index of the sampler in the beamline. The name at the end is the name of the observation point.

Maybe this is a "won't fix / not an issue", but if not then it would be nice to consider how to implement this. I don't mind doing it. No doubt handling this by default would drastically slow down the loading, so ideally there would be a flag to enable the parsing.

$                    %le                    %le                    %le                    %le                    %le                    %le                    %le                    %le                   
#segment       1       2      11       0 start                                          
  1.000000000000000e+00  0.000000000000000e+00 -7.485050595000000e-03 -3.486651945000000e-03  1.408320528500000e-02 -3.486651945000000e-03  0.000000000000000e+00 -5.000000000000000e-03  0.000000000000000e
  2.000000000000000e+00  0.000000000000000e+00 -5.988040476000000e-03 -2.789321556000000e-03  1.126656422800000e-02 -2.789321556000000e-03  0.000000000000000e+00 -4.000000000000000e-03  0.000000000000000e
  3.000000000000000e+00  0.000000000000000e+00 -4.491030357000000e-03 -2.091991167000000e-03  8.449923171000000e-03 -2.091991167000000e-03  0.000000000000000e+00 -3.000000000000000e-03  0.000000000000000e
  4.000000000000000e+00  0.000000000000000e+00 -2.994020238000000e-03 -1.394660778000000e-03  5.633282114000000e-03 -1.394660778000000e-03  0.000000000000000e+00 -2.000000000000000e-03  0.000000000000000e
  5.000000000000000e+00  0.000000000000000e+00 -1.497010119000000e-03 -6.973303890000001e-04  2.816641057000000e-03 -6.973303890000001e-04  0.000000000000000e+00 -1.000000000000000e-03  0.000000000000000e
  6.000000000000000e+00  0.000000000000000e+00  0.000000000000000e+00  0.000000000000000e+00  0.000000000000000e+00  0.000000000000000e+00  0.000000000000000e+00  0.000000000000000e+00  0.000000000000000e
  7.000000000000000e+00  0.000000000000000e+00  1.497010119000000e-03  6.973303890000001e-04 -2.816641057000000e-03  6.973303890000001e-04  0.000000000000000e+00  1.000000000000000e-03  0.000000000000000e
  8.000000000000000e+00  0.000000000000000e+00  2.994020238000000e-03  1.394660778000000e-03 -5.633282114000000e-03  1.394660778000000e-03  0.000000000000000e+00  2.000000000000000e-03  0.000000000000000e
  9.000000000000000e+00  0.000000000000000e+00  4.491030357000000e-03  2.091991167000000e-03 -8.449923171000000e-03  2.091991167000000e-03  0.000000000000000e+00  3.000000000000000e-03  0.000000000000000e
  1.000000000000000e+01  0.000000000000000e+00  5.988040476000000e-03  2.789321556000000e-03 -1.126656422800000e-02  2.789321556000000e-03  0.000000000000000e+00  4.000000000000000e-03  0.000000000000000e
  1.100000000000000e+01  0.000000000000000e+00  7.485050595000000e-03  3.486651945000000e-03 -1.408320528500000e-02  3.486651945000000e-03  0.000000000000000e+00  5.000000000000000e-03  0.000000000000000e
#segment       2       2      11     104 LUXE.TARGET                                    
  1.000000000000000e+00  1.000000000000000e+00  5.259466027436782e-03 -2.051903873665704e-02 -2.978780358169780e-02 -2.541279540830233e-02  0.000000000000000e+00 -5.000000000000000e-03  7.730849426828061e
  2.000000000000000e+00  1.000000000000000e+00  4.392648890935142e-03 -1.665085832937077e-02 -2.340368089034140e-02 -2.025864365870777e-02  0.000000000000000e+00 -4.000000000000000e-03  7.730849426828061e
  3.000000000000000e+00  1.000000000000000e+00  3.431943743212929e-03 -1.266012177677902e-02 -1.722476714179673e-02 -1.513417573531897e-02  0.000000000000000e+00 -3.000000000000000e-03  7.730849426828061e
  4.000000000000000e+00  1.000000000000000e+00  2.378645362977142e-03 -8.551614443416757e-03 -1.126011892202533e-02 -1.004604018279110e-02  0.000000000000000e+00 -2.000000000000000e-03  7.730849426828061e
  5.000000000000000e+00  1.000000000000000e+00  1.234161014750900e-03 -4.330038920110554e-03 -5.516836135196690e-03 -4.999821487667771e-03  0.000000000000000e+00 -1.000000000000000e-03  7.730849426828061e
  6.000000000000000e+00  1.000000000000000e+00  1.060641785314563e-30 -1.389211385140205e-30  8.408973718086066e-15  1.106205531972171e-14  0.000000000000000e+00  0.000000000000000e+00  7.730849426828061e
  7.000000000000000e+00  1.000000000000000e+00 -1.322236565765053e-03  4.434010064600291e-03  5.287384342390506e-03  4.950085226447743e-03  0.000000000000000e+00  1.000000000000000e-03  7.730849426828061e
  8.000000000000000e+00  1.000000000000000e+00 -2.730865609241517e-03  8.967625924981525e-03  1.034444121494745e-02  9.848270945371038e-03  0.000000000000000e+00  2.000000000000000e-03  7.730849426828061e
  9.000000000000000e+00  1.000000000000000e+00 -4.224131469513009e-03  1.359662305874779e-02  1.517247445916981e-02  1.469360189857558e-02  0.000000000000000e+00  3.000000000000000e-03  7.730849426828061e
  1.000000000000000e+01  1.000000000000000e+00 -5.800214661983473e-03  1.831693117876930e-02  1.977501489057500e-02  1.948636102584803e-02  0.000000000000000e+00  4.000000000000000e-03  7.730849426828061e
  1.100000000000000e+01  1.000000000000000e+00 -7.457239940043075e-03  2.312464709990392e-02  2.415786594644772e-02  2.422809810282515e-02  0.000000000000000e+00  5.000000000000000e-03  7.730849426828061e

Bring back string input tests

I moved tests to using pathlib.Path inputs in #30, which in hindsight should have been an addition and not a change.

Let's bring string inputs back to tests, in order to test both.
This should be quite a small change: files similar to the actual tests, but with the old fixture inputs using os.path.join shenanigans.

Capitalize plane keys in Tfs class

To improve consistency among our packages capitalize the plane keys for "planed" data frames, i.e. ("X", "Y") instead of ("x", "y").
Requires a change of _define_property_two_planes(args, kwargs).

Fix docstrings

Sphinx parses docstrings a bit strangely. We may also update the example in TfsCollection (it should at least use .tfs files).

[Feature Request]: TFS definition

Feature Description

Provide a tfs definition for reference

Possible Implementation

Write it into the README

  • general definition (how to write header, columns, types, content, what does our reader allow)
  • MADX specifics (e.g. headers need to be single word, TYPE needs to be in header, all entries are converted to lowercase etc.)
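For orientation, a minimal TFS file might look like this (the values are invented for the example; header lines start with @, the column-name line with *, the type line with $):

```
@ TYPE             %s  "TWISS"
@ TITLE            %s  "Example table"
* NAME             S       BETX
$ %s               %le     %le
 "BPM.1"           0.0     12.5
 "BPM.2"           1.5     14.2
```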

Remove checks for unique indices

I'm not sure why we have this check in the first place, apart from the fact that we later (i.e. in omc3) need unique indices. But we might check that there instead...?

Especially if you save a DataFrame without save_index the check doesn't really make sense.
