small-bodies-node / pds4_tools

Python package to read and display NASA PDS4 data.

Python 100.00%

astronomy nasa pds pds4 python

pds4_tools's People

cmillion levn0 michaelaye sbn-psi landingellipse whigg casigg dahlend mkelley ellequelle oli-ver-ui cgobat

pds4_tools's Issues

Support for later/latest mpl versions

I get the following error:

keyword grid_b is not recognized;

when trying to plot from pds4_tools with mpl 3.7.1

It looks like grid_b is now grid_visible.

I understand updating dependencies may not be easy/possible, but wanted to flag this anyway. Thanks!

Label Array axes in Table View

PDS Class Array allows for names of array axes using attribute axis_name.

Although axis_name is present in the label, it gets "ignored" by the PDS viewer when displaying the array by clicking on the Table button. (Current buttons available: Label, Table, Plot.)

It would be good if the axis name/s were recognised by the viewer also when displaying the array (and not just in the label)

Thank you
Daniela

Colons stripped from table field names on read

I have a data product with columns defined like:

                <Field_Character>
                    <name>VFS1102L FREND:FRAM NUMD1</name>
                    <field_number>3</field_number>
                    <field_location unit="byte">47</field_location>
                    <data_type>ASCII_NonNegative_Integer</data_type>
                    <field_length unit="byte">10</field_length>
                    <description>FRAM NUM: Running counter, 0...65535</description>
                </Field_Character>

However when I read this table in pds4_tools, the colon seems to be stripped, e.g.

In [28]: struct = pds4_read('frd_raw_sc_d_20160406T000000-20160406T235959.xml')
In [30]: struct[0].data
<...snip...>
            dtype=(numpy.record, [('PUS_TIME_UTC', '<U27'), ('PUS_TIME', '<U17'), ('VFS1102L FREND_FRAM NUMD1', 'u1'), ('PACKET_COUNTER', 'i1'), ('VFS1105L FREND_DOS DATA1', 'O'), ('VFS1205L FREND_DOS DATA2', 'O'), ('VFS1305L FREND_DOS DATA3', 'O'), ('VFS1405L FREND_DOS DATA4', 'O'), ('VFS1505L FREND_DOS DATA5', 'O'), ('VFS1605L FREND_DOS DATA6', 'O'), ('VFS1705L FREND_DOS DATA7', 'O'), ('VFS1805L FREND_DOS DATA8', 'O'), ('VFS1905L FREND_DOS DATA9', 'O'), ('VFS2005L FREND_DOS DATA10', 'O')]))

and if I try to index by the correct name I get an error, e.g.

In [31]: struct[0].data['VFS1102L FREND:FRAM NUMD1']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [31], in <cell line: 1>()
----> 1 struct[0].data['VFS1102L FREND:FRAM NUMD1']

File ~/miniconda3/envs/bepi/lib/python3.10/site-packages/pds4_tools/reader/data.py:169, in PDS_ndarray.__getitem__(self, idx)
    155 def __getitem__(self, idx):
    156     """
    157     Parameters
    158     ----------
   (...)
    167         then the meta_data will be preserved for those fields or records.
    168     """
--> 169     obj = super(PDS_ndarray, self).__getitem__(idx)
    171     # For structured arrays, retrieve the correct meta_data portion if we are not obtaining all of the
    172     # fields
    173     if isinstance(obj, np.ndarray):

ValueError: no field of name VFS1102L FREND:FRAM NUMD1

but instead I have to access the field with an underscore:

struct[0].data['VFS1102L FREND_FRAM NUMD1']

Is this a bug, or intended? Currently, for various reasons, I'm getting the field names from the table manifest, where the values are correct, and then this fails when using the field name to access the data.
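Until this is clarified, one possible workaround is to translate the manifest names before indexing. The sketch below assumes that the only change made on read is replacing ':' with '_', as the dtype shown above suggests; resolve_field_name is a hypothetical helper and not part of pds4_tools.

```python
def resolve_field_name(label_name, available_names):
    """Return the name under which `label_name` is stored in the data.

    Assumption: the reader only replaces ':' with '_' (as observed in the
    report); other substitutions are not handled.
    """
    if label_name in available_names:
        return label_name
    candidate = label_name.replace(":", "_")
    if candidate in available_names:
        return candidate
    raise KeyError(f"no field matching {label_name!r}")


# A few of the names as they appear in the dtype quoted in the report.
dtype_names = (
    "PUS_TIME_UTC",
    "PUS_TIME",
    "VFS1102L FREND_FRAM NUMD1",
    "PACKET_COUNTER",
)

resolved = resolve_field_name("VFS1102L FREND:FRAM NUMD1", dtype_names)
print(resolved)
```

With real data this would be used as struct[0].data[resolve_field_name(name, struct[0].data.dtype.names)].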

to_dict method | Reference_List dictionary - sub-groups collapse

Dear @LevN0,

I found that the to_dict method apparently has an issue when dealing with a Reference_List: it collapses duplicate sub-groups into a single one.
Consider a Reference_List containing multiple Internal_Reference labels set as below:

Which when read as follows:

Provides the following results:

My understanding is that this happens even though the Reference_List is structured as expected according to the Information Model specs (https://pds.nasa.gov/datastandards/documents/im/v1/index_1K00.html#reference_type).

Could you perhaps shed some light on this?
Looking forward to your reply,
All the best
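For illustration only (this is not the pds4_tools implementation): the sketch below uses xml.etree.ElementTree and made-up reference values to show how a converter that keys a dict by tag name keeps only the last of several sibling elements, and how collecting repeated tags into a list would avoid that. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up label fragment; the LIDs are placeholders, not real products.
XML = """
<Reference_List>
  <Internal_Reference>
    <lid_reference>urn:example:first</lid_reference>
  </Internal_Reference>
  <Internal_Reference>
    <lid_reference>urn:example:second</lid_reference>
  </Internal_Reference>
</Reference_List>
"""


def naive_to_dict(node):
    """Key children by tag; later siblings overwrite earlier ones."""
    children = list(node)
    if not children:
        return (node.text or "").strip()
    return {child.tag: naive_to_dict(child) for child in children}


def grouping_to_dict(node):
    """Key children by tag, collecting repeated tags into a list."""
    children = list(node)
    if not children:
        return (node.text or "").strip()
    result = {}
    for child in children:
        value = grouping_to_dict(child)
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


root = ET.fromstring(XML)
naive = naive_to_dict(root)
grouped = grouping_to_dict(root)
print(naive)
print(grouped)
```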

pds4_tools should not propagate (lower level) LogRecords when instructed to be quiet

Observed behaviour

In [1]: import logging

In [2]: logging.basicConfig(level=logging.DEBUG)

In [3]: logger = logging.getLogger("application")

In [4]: logger.info("hello world")
INFO:application:hello world

In [5]: import pds4_tools

In [6]: dp = pds4_tools.read("/tmp/product.xml", quiet=True)
INFO:PDS4ToolsLogger:Processing label: /tmp/product.xml

INFO:PDS4ToolsLogger:Now processing a Table_Delimited structure: TABLE_ANCIL

Desired behaviour

Setting the quiet keyword argument of pds4_tools.read to True prevents LogRecords (below logging.ERROR) from propagating up the hierarchy.

Details

This is possibly getting into opinion territory, but when using PDS4 Tools as a library I would prefer it to respect its advertised quietness. This is currently not the case because the quiet keyword sets the logging level of the package's stdout handler, not that of the "PDS4ToolsLogger" logger itself. I surmise that this is because doing the latter would break the read-in log functionality, as the PDS4SilentHandler instance would be starved of input.

Workaround

A minimally intrusive fix would be to add some logic in the already subclassed PDS4Logger._log method, to instruct the logger not to propagate records below _quiet (i.e. logging.ERROR) when PDS4Logger.is_quiet() == True. For instance:

def _log(self, level, *args, **kwargs):
    """
    ...
    """
    self.propagate = not (self.is_quiet() and level < _quiet)
    ...
    super(PDS4Logger, self)._log(level, *args, **kwargs)
    ...

...where the level parameter has been extracted from args for readability.
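To make the idea concrete, here is a self-contained sketch using only the standard logging module. QuietableLogger, QUIET_LEVEL and ListHandler are hypothetical stand-ins for PDS4Logger, _quiet and the application's own handlers; the real class has more logic than shown here.

```python
import logging

QUIET_LEVEL = logging.ERROR  # stand-in for the package's `_quiet`


class QuietableLogger(logging.Logger):
    """Hypothetical stand-in for PDS4Logger."""

    def __init__(self, name, level=logging.NOTSET):
        super().__init__(name, level)
        self._quiet_flag = False

    def quiet(self):
        self._quiet_flag = True

    def is_quiet(self):
        return self._quiet_flag

    def _log(self, level, msg, args, **kwargs):
        # While quiet, stop records below QUIET_LEVEL from reaching
        # handlers attached to ancestor loggers (e.g. the root logger).
        self.propagate = not (self.is_quiet() and level < QUIET_LEVEL)
        super()._log(level, msg, args, **kwargs)


class ListHandler(logging.Handler):
    """Collects formatted messages so the effect can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


# Create the demo logger, then restore the previous logger class.
previous_class = logging.getLoggerClass()
logging.setLoggerClass(QuietableLogger)
demo_logger = logging.getLogger("quietable_demo")
logging.setLoggerClass(previous_class)
demo_logger.setLevel(logging.DEBUG)

root_capture = ListHandler()
logging.getLogger().addHandler(root_capture)

demo_logger.quiet()
demo_logger.info("suppressed while quiet")
demo_logger.error("errors still propagate")

logging.getLogger().removeHandler(root_capture)
print(root_capture.messages)
```

Handlers attached directly to the logger itself still receive every record, which is what a silent read-in log handler would need. One caveat of this approach is that propagate is shared state on the logger, so concurrent logging from several threads could interleave.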

Incorrect parsing in `read_tables._extract_fixed_width_field_data()`

The function read_tables._extract_fixed_width_field_data() is constructed to prefer a few optimized special cases and fall back to general-case code. However, that general code does not correctly extract from table_byte_data in my case. Specifically, in my case, the entire table is a single group with a single, repeated field. Since the entirety of table_byte_data should be appended to extracted_data, I can perform a partial verification of correct parsing by inserting code into the general-case section that creates a zeroed array and then increments each value position as it is used:

a = np.zeros(len(table_byte_data), np.int64)  # New
...
    extracted_data.append(table_byte_data[start_byte:stop_byte])  # Original
    a[start_byte:stop_byte] += 1  # New

After extraction, np.all(a == 1) should be True, indicating that each byte was read exactly once, but instead:

a.min() == 0
a.max() == 8
(a == 1).sum() == 256  # Values read exactly once.
a.size == 384_000_000  # Count of all values.
(a.mean() < 1) == True  # Because some slices extend beyond end.

Sufficient for my (very special) use case, I added this code at the start of read_tables._extract_fixed_width_field_data():

    # Most simplified, sped up, case for fields in a single group that 
    # spans entire table
    if ((field_location == 0) and
        (len(repetition_lengths) == 1) and 
        (field_length*repetition_lengths[0] == record_length)):

        # Loop over each field in each record
        for start_byte in range(0, len(table_byte_data), field_length):
            stop_byte = start_byte + field_length
            
            extracted_data.append(table_byte_data[start_byte:stop_byte])
     
        return

Analysis with array a, as further above, results in np.all(a == 1) == True.

That case may be too special to directly support in the production code, but I provide it for reference. Certainly, the general-case code still needs debugging.
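The coverage check described above can be reproduced on a toy buffer without pds4_tools or NumPy. This is a minimal sketch with hypothetical function names; it exercises only the single-group special case, not the general-case code in question.

```python
def extract_single_group(table_byte_data, field_length):
    """Slice the buffer into consecutive fields of `field_length` bytes."""
    extracted, slices = [], []
    for start_byte in range(0, len(table_byte_data), field_length):
        stop_byte = start_byte + field_length
        extracted.append(table_byte_data[start_byte:stop_byte])
        slices.append((start_byte, stop_byte))
    return extracted, slices


def coverage(total_bytes, slices):
    """Count how many times each byte position was read."""
    counts = [0] * total_bytes
    for start_byte, stop_byte in slices:
        # Slices reaching past the end of the buffer are clipped.
        for position in range(start_byte, min(stop_byte, total_bytes)):
            counts[position] += 1
    return counts


# Toy table: 3 records, each holding 4 repetitions of a 2-byte field.
table_byte_data = bytes(range(24))
fields, slices = extract_single_group(table_byte_data, field_length=2)
counts = coverage(len(table_byte_data), slices)

# Every byte should have been read exactly once.
print(all(count == 1 for count in counts))
```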

PDS4 Standards Change for Line Delimiters

PDS4 is planning a standards change that will allow Table-type objects to use record delimiters other than "Carriage-Return Line-Feed" - specifically, just the linefeed character. It looks like the current delimiter requirement is hard-coded into the PDS4 viewer. This needs to be upgraded to check the delimiter attribute in the label and act accordingly.

Contact @acraugh for PDS4 standards details as they are finalized.
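A rough sketch of what "check the delimiter attribute and act accordingly" could look like is given below. The function and table names are hypothetical; "Carriage-Return Line-Feed" is the value named above, while "Line-Feed" is assumed here to be the label value for a bare linefeed and should be checked against the finalized standard.

```python
# Label value -> byte sequence. The "Line-Feed" spelling is an assumption.
RECORD_DELIMITERS = {
    "Carriage-Return Line-Feed": b"\r\n",
    "Line-Feed": b"\n",
}


def split_records(byte_data, record_delimiter):
    """Split raw table bytes into records using the delimiter from the label."""
    if record_delimiter not in RECORD_DELIMITERS:
        raise ValueError(f"unsupported record delimiter: {record_delimiter!r}")
    records = byte_data.split(RECORD_DELIMITERS[record_delimiter])
    # A trailing delimiter leaves one empty chunk at the end; drop it.
    if records and records[-1] == b"":
        records.pop()
    return records


crlf_records = split_records(b"a,1\r\nb,2\r\n", "Carriage-Return Line-Feed")
lf_records = split_records(b"a,1\nb,2\n", "Line-Feed")
print(crlf_records, lf_records)
```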

vendored version of six.py is incompatible with python 3.12

Importing pds4_tools in Python 3.12 fails. This appears to be due to an incompatibility between Python 3.12 and how six.py 1.13.0 constructs the import metapath for six.moves. Updating to version 1.16.0 of six.py fixes this.

Traceback below:

  File "/home/michael/Desktop/pdr/read_scratch.py", line 24, in <module>
    import pds4_tools
  File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/__init__.py", line 3, in <module>
    from .reader import pds4_read
  File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/reader/__init__.py", line 1, in <module>
    from .core import pds4_read
  File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/reader/core.py", line 10, in <module>
    from .label_objects import Label
  File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/reader/label_objects.py", line 11, in <module>
    from .general_objects import Meta_Class
  File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/reader/general_objects.py", line 11, in <module>
    from ..utils.data_access import is_supported_url, download_file
  File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/utils/data_access.py", line 14, in <module>
    from ..extern.six.moves import urllib
ModuleNotFoundError: No module named 'pds4_tools.extern.six.moves'

`ArrayStructure.as_masked` fails when input has no masked constants

User reported issue with visualizing a simple array using PDS4 Viewer. Traced to the following error,

File "read_arrays.py", line 239, in new_array
array_structure.data.set_fill_value(array.fill_value)
AttributeError: 'PDS_ndarray' object has no attribute 'fill_value'

PDS4 Viewer : Inconvenient to view large data in table

Context :
When opening a Table with PDS4 Viewer, long cell contents are displayed truncated.

Mac OS 13.4 (22F66)
PDS4 Viewer version 1.3

Problem :
We want to see the whole information

Workaround:
When selecting the text by hand we can change the display of the value but only one by one

Wanted behaviour:
When opening a Table with PDS4 Viewer, the columns keep the same width by default but can be resized by dragging handles on each column's sides.

Fix crash when using Movie_Display_Settings

User Report:

When using Array_3D_Movie in combination with the Display Dictionary's Movie_Display_Settings, PDS4 Viewer will raise an exception upon attempt to display the image.

(Example omitted.)

`pds4-tools` on conda

We are attempting to make the Planetary Data Reader (pdr) available via conda. pds4-tools is one of our dependencies, and it is not on conda, so we can't put our code on conda without vendoring your code, which is not our preference. Would you consider putting this package onto conda?

Investigate default value of ``Label.to_dict(..., skip_attributes=bool)``

The Label class provides a method to convert the label from Label (which is effectively a subclass of ElementTree) to a dictionary, namely Label.to_dict(...).

The method includes a number of optional arguments.

One of the optional arguments is skip_attributes. The effect of this argument is as follows:

Original XML,

<Table_Character>
    <offset unit="byte">0</offset>
    <records>76</records>
    ...
</Table_Character>

Converted to dict via Label.to_dict(skip_attributes=True),

{
    "Table_Character": {
        "offset": "0",
        "records": "76"
    }
}

Converted to dict via Label.to_dict(skip_attributes=False),

{
    "Table_Character": {
        "offset": {
            "@unit": "byte",
            "_text": "0"
        },
       "records": "76"
    }
}

The help string for this argument states "If True, skips adding attributes from XML. Defaults to False." However, despite this text, the argument is actually set to True by default, i.e. attributes are skipped when converting to dict.

Need to investigate whether the argument should be changed to False by default as the help string says, or whether the help string should be adjusted.
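The two behaviours can be mimicked with a few lines of xml.etree.ElementTree. This is only an illustrative sketch, not the Label.to_dict implementation; element_to_dict is a hypothetical name, and repeated sibling tags are not handled.

```python
import xml.etree.ElementTree as ET

XML = """
<Table_Character>
    <offset unit="byte">0</offset>
    <records>76</records>
</Table_Character>
"""


def element_to_dict(element, skip_attributes=True):
    """Convert an element to nested dicts, optionally keeping attributes."""

    def convert(node):
        children = list(node)
        if children:
            # Note: repeated sibling tags would overwrite each other here.
            return {child.tag: convert(child) for child in children}
        text = (node.text or "").strip()
        if skip_attributes or not node.attrib:
            return text
        converted = {"@" + key: value for key, value in node.attrib.items()}
        converted["_text"] = text
        return converted

    return {element.tag: convert(element)}


root = ET.fromstring(XML)
without_attributes = element_to_dict(root, skip_attributes=True)
with_attributes = element_to_dict(root, skip_attributes=False)
print(without_attributes)
print(with_attributes)
```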

Error converting dates with fractional seconds

User Report:

Came across another small bug in the PDS4 tools, which affects date formats as: 2018-12-12T08:51:16.28972Z (5 decimals) when trying to plot.

The issue is in pds4_tools.reader.data_types.data_type_convert_dates on line 652, where I believe that:

for i in range(0, 6):
    format_lengths[max(format_lengths.keys()) + i] = format

should be

for i in range(0, 6):
    format_lengths[max(format_lengths.keys()) + 1] = format

otherwise the key for the lengths of formats with various numbers of decimals is calculated incorrectly - specifically for the above case I get a key error due to the length (25) not being in the format_lengths dictionary.
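The effect of the two loop variants on the dictionary keys can be checked in isolation. The sketch below assumes that the longest key present before the loop is 21; that starting value is an assumption, since the surrounding code is not quoted here. The reported length of 25 matches the timestamp without its trailing Z.

```python
def keys_after_loop(start_length, use_loop_index):
    """Return the sorted dictionary keys after the six-iteration loop.

    `start_length` is an assumed longest pre-existing key. With
    use_loop_index=True the loop adds `i` (as in the current code);
    otherwise it adds 1 (as in the proposed change).
    """
    format_lengths = {start_length: "fmt"}
    for i in range(0, 6):
        step = i if use_loop_index else 1
        format_lengths[max(format_lengths.keys()) + step] = "fmt"
    return sorted(format_lengths)


value = "2018-12-12T08:51:16.28972"  # timestamp from the report, 'Z' removed
current_keys = keys_after_loop(21, use_loop_index=True)
proposed_keys = keys_after_loop(21, use_loop_index=False)

print(len(value))
print(current_keys)
print(proposed_keys)
```

Under that assumption, adding i makes the gaps between keys grow with each iteration, so length 25 is skipped, whereas adding 1 produces consecutive lengths.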

Feature request: Support image coordinates from EBT dictionary

The Earth Based Telescope dictionary, e.g., https://pds.nasa.gov/datastandards/dictionaries/index-1.20.0.0.shtml#ebt , contains classes to convert pixels to sky coordinates. Can pds4_tools support <ebt:World_Coordinate_System>? Especially useful for data reviews would be the ability to mouse over an image and see the sky coordinate values in real-time (much like the DS9 tool does with FITS files). We can discuss ways to address this, e.g., with a new external library or by adding something to this package specifically. Regardless, we definitely want a solution that this package can use.

pds4_tools.view() does not open anything on mac

As I'm using macOS 10.14.6 which is not supported (and indeed a quite bad bug it is), I'm trying to use the python route of the viewer.
Trying it in an ipython session, I'm calling pds4_tools.view() but nothing happens.
I'm guessing this might need the correct backend setting for the GUI to work. Which backend is required?

Releasing a new version

Hi,

I have done a pip install pds4_tools on a relatively clean installation of Python 3.11, and I am getting a module-not-found error for tkinter.

I was wondering if a new version could be released so that I can take advantage of tkinter being an optional import, which got merged in May; I would rather not have to grab an unversioned copy of master if I can avoid it.

Thanks!

TypeError: 'NoneType' object is not callable when reading a TableDelimited in debug mode with pandas in dept

Context:

I am using PyCharm on macOS,
running a test with pytest in debug mode.

def test_reading_pds4():
    product = pds4_tools.read( "./my_product.xml")

Works fine

Whereas :

import pandas
pandas.DataFrame()
def test_reading_pds4():
    product = pds4_tools.read( "./my_product.xml")

raises :

../venv/lib/python3.10/site-packages/pds4_tools/reader/core.py:198: in pds4_read
    structures = read_structures(label, filename, lazy_load=lazy_load, no_scale=no_scale,
../venv/lib/python3.10/site-packages/pds4_tools/reader/core.py:309: in read_structures
    structure.data
../venv/lib/python3.10/site-packages/pds4_tools/extern/cached_property.py:86: in __get__
    return obj_dict.setdefault(name, self.func(obj))
../venv/lib/python3.10/site-packages/pds4_tools/reader/table_objects.py:363: in data
    read_table_data(self, no_scale=self._no_scale, decode_strings=self._decode_strings, masked=self._masked)
../venv/lib/python3.10/site-packages/pds4_tools/reader/read_tables.py:955: in read_table_data
    table_structure.data = new_table(extracted_fields, no_scale=no_scale, decode_strings=decode_strings,
../venv/lib/python3.10/site-packages/pds4_tools/reader/read_tables.py:730: in new_table
    table_structure.data = np.recarray(num_records, dtype=dtypes).view(array_type)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

subtype = <class 'numpy.recarray'>, shape = 44
dtype = [('INSTRUMENT_ID', dtype('<U4')), ('UNIT_ID', dtype('<U1')), ('PDS4_TARGET_LID', dtype('<U56')), ('PDS4_TARGET_NAME', dtype('<U10')), ('PDS4_TARGET_TYPE', dtype('<U17')), ('VISIBILITY_START', dtype('<U24')), ...]
buf = None, offset = 0, strides = None, formats = None, names = None
titles = None, byteorder = None, aligned = False, order = 'C'

    def __new__(subtype, shape, dtype=None, buf=None, offset=0, strides=None,
                formats=None, names=None, titles=None,
                byteorder=None, aligned=False, order='C'):
    
        if dtype is not None:
>           descr = sb.dtype(dtype)
E           TypeError: 'NoneType' object is not callable

../venv/lib/python3.10/site-packages/numpy/core/records.py:423: TypeError

my_product.zip

pds4_tools should not override the Logger class globally

Observed behaviour

In [1]: import logging

In [2]: type(logging.getLogger("pre"))
Out[2]: logging.Logger

In [3]: import pds4_tools

In [4]: pds4_tools.utils.logging.logger_init().__class__
Out[4]: pds4_tools.utils.logging.PDS4Logger

In [5]: type(logging.getLogger("post"))
Out[5]: pds4_tools.utils.logging.PDS4Logger

Desired behaviour

In [1]: import logging

In [2]: type(logging.getLogger("pre"))
Out[2]: logging.Logger

In [3]: import pds4_tools

In [4]: pds4_tools.utils.logging.logger_init().__class__
Out[4]: pds4_tools.utils.logging.PDS4Logger

In [5]: type(logging.getLogger("post"))
Out[5]: logging.Logger

Details

pds4_tools.utils.logging.logger_init() instructs the Python logging module to use the PDS4Logger class for all subsequent logger instantiations. This is of course not a problem when PDS4 Tools is used as an application, i.e. via the viewer, but when used as a library within other applications the behaviour is not ideal.

Resetting the default logger class to its previous state after instantiation/fetching of the "PDS4ToolsLogger" object solves this issue:

...
def logger_init():
    """ Initializes or obtains the logger and its handlers.

    Returns
    -------
    PDS4Logger
        The global logger for all pds4 tools.
    """

    _Logger = logging.getLoggerClass()  # delta: add
    logging.setLoggerClass(PDS4Logger)
    logger = logging.getLogger('PDS4ToolsLogger')
    logging.setLoggerClass(_Logger)  # delta: add

    logger.setLevel(_loud)
    ...
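The save-and-restore pattern can be verified in isolation with the standard library alone. In this sketch DemoLogger and get_package_logger are hypothetical stand-ins for PDS4Logger and logger_init; wrapping the restore in try/finally is an extra precaution not present in the snippet above.

```python
import logging


class DemoLogger(logging.Logger):
    """Hypothetical stand-in for PDS4Logger."""


def get_package_logger(name="DemoPackageLogger"):
    """Create (or fetch) the package logger without changing the global class."""
    previous_class = logging.getLoggerClass()
    logging.setLoggerClass(DemoLogger)
    try:
        return logging.getLogger(name)
    finally:
        # Restore whatever class was configured before, even on error.
        logging.setLoggerClass(previous_class)


class_before = logging.getLoggerClass()
package_logger = get_package_logger()
later_logger = logging.getLogger("created_after_package_logger")

print(type(package_logger).__name__)
print(isinstance(later_logger, DemoLogger))
```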

Incorrect (?) warning on field_format

I get the following warning opening a data product:

Warning: field_format '%+17.10E' does not conform to PDS4 standards for field 'ACC_X' (full location: 'ACC_X')

which goes away if I replace the upper-case E in the format specifier with a lower case e. However, as far as I can see from the IM both are allowed, and from the standards doc should be identical:

Perhaps I'm missing something, but I cannot obviously see what is invalid in the given specifier.

Support for web-hosted data

@LevN0 what do you think about support for URLs and remote files in pds4_read?

PDS Standards Change: Line Delimiters

PDS4 Standards have changed to allow text delimiters other than CR/LF, effective with the current IM release. We are starting to get reports from users that pds4_viewer is failing when LF delimiters are specified.

pds4 viewer : display a spectral profile from a spectral cube.

To view an Array_3D_Spectrum data, the PDS4 viewer offers Table or Image options.
It would be nice to have a Plot option inside the Image visualisation to view a spectral representation of one pixel.

examples :

1 - a cube of dimension spatial_x/spatial_y/spectral viewed with frames of dimensions (spatial_y or spatial_x)/spectral

-> a button "view spectral horizontal line" would change the cursor to a horizontal arrow, and selecting a pixel would give a graph representation of array_value per spectral axis.

2 - similarly in vertical mode

3 - a cube of dimension spatial_x/spatial_y/spectral viewed with frames of dimensions spatial_x/spatial_y

-> a button "view spectral pixel" would change the cursor to a cross and a selection of pixel would give a graph representation of array_value per spectral axis (frame index)
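The data behind such a plot is a one-dimensional slice through the cube. As a minimal sketch, assuming the cube is indexed as cube[frame][y][x] (so that the frame index is the spectral axis) and using a hypothetical helper name:

```python
def spectral_profile(cube, x, y):
    """Return the values of pixel (x, y) across all frames.

    Assumes `cube` is indexed as cube[frame][y][x], i.e. the spectral
    axis is the frame index.
    """
    return [frame[y][x] for frame in cube]


# Toy cube: 3 spectral frames of 2 x 2 pixels.
cube = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]

profile = spectral_profile(cube, x=1, y=0)
print(profile)
```

The resulting list is what would be plotted against frame index for the selected pixel.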

StructureList get by name fails when LID also present

User Report:

In the case that a data structure has both a local_identifier and a name, it seems that the structure can only be accessed by the local_identifier, and no longer the name.

(Example omitted.)

Array read-in crash if Object_Statistics exists without child elements

User Report:

If an Array structure contains,

<Object_Statistics>
   <!-- exists but has no child elements -->
</Object_Statistics>

This causes array read-in to crash.
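One defensive way to treat such an element is to check for children before reading any values. The sketch below uses xml.etree.ElementTree on namespace-free toy fragments; read_object_statistics is a hypothetical helper and does not reflect how pds4_tools reads statistics.

```python
import xml.etree.ElementTree as ET


def read_object_statistics(array_element):
    """Return the statistics as a dict, or {} if absent or empty."""
    stats = array_element.find("Object_Statistics")
    # len() of an element is its number of child elements; the default
    # parser does not keep comments, so a comment-only element has none.
    if stats is None or len(stats) == 0:
        return {}
    return {child.tag: (child.text or "").strip() for child in stats}


empty_array = ET.fromstring(
    "<Array_2D><Object_Statistics><!-- no children --></Object_Statistics></Array_2D>"
)
filled_array = ET.fromstring(
    "<Array_2D><Object_Statistics>"
    "<minimum>0</minimum><maximum>255</maximum>"
    "</Object_Statistics></Array_2D>"
)

empty_stats = read_object_statistics(empty_array)
filled_stats = read_object_statistics(filled_array)
print(empty_stats, filled_stats)
```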

Table_Delimited with semi-colon delimiter broken

User Report:

PDS4 Viewer does not correctly handle csv files with semicolons as the delimiter. You get the error as shown in the screenshot. If I change the .xml file to look for Comma instead of Semicolon and replace the ; with , throughout the .csv, pds4 viewer works.
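For comparison, a delimiter-aware reader can be sketched with the standard csv module. The mapping below covers only the two label values mentioned in this report, and read_delimited is a hypothetical helper rather than the viewer's actual code path.

```python
import csv
import io

# Label value -> delimiter character (only the values named in the report).
FIELD_DELIMITERS = {"Comma": ",", "Semicolon": ";"}


def read_delimited(text, field_delimiter):
    """Parse delimited text using the delimiter named in the label."""
    delimiter = FIELD_DELIMITERS[field_delimiter]
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


rows = read_delimited("1;2.5;abc\r\n2;3.5;def\r\n", "Semicolon")
print(rows)
```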

Table_Delimited crashes on null values in ASCII_Boolean fields

Issue

When reading Table_Delimited structures, records containing empty values for numeric fields like ASCII_Integer are masked as expected. However, this is not the case for ASCII_Boolean fields, where instead a ValueError is thrown ("invalid literal for int() with base 10: b''").

The only hint I've found on the "legality" of empty values in boolean fields comes from the Standards Reference section 4C.1 (Delimiter Separated Value Format Description), which says that:

A field may be empty. The interpretation of an empty field will be application and data type dependent.

While this does not directly imply that PDS4 Tools must support empty booleans in particular, I think the fact that the tool already interprets empty numeric fields via masking makes it reasonable for end users to assume that this behaviour would extend also to booleans.

Potential solution

Since boolean fields are also represented as ndarrays, I've had success with simply extending the existing masking behaviour, i.e.

pds4_tools/pds4_tools/reader/data_types.py

Lines 472 to 476 in 8ada764

for i, datum in enumerate(data):
    if datum.strip() == b'':
        mask_array[i] = True

data[mask_array] = b'0'

...to the handling of boolean fields. I've submitted #37 as a starting point for a solution.
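The masking idea can be sketched without NumPy as follows. This is not the code in #37; parse_boolean_field is a hypothetical helper, and the accepted spellings (true, false, 1, 0) are an assumption about ASCII_Boolean that should be checked against the standard.

```python
TRUE_VALUES = {b"true", b"1"}    # assumed ASCII_Boolean spellings
FALSE_VALUES = {b"false", b"0"}


def parse_boolean_field(raw_values):
    """Convert raw byte strings to booleans plus a mask for empty entries."""
    values, mask = [], []
    for datum in raw_values:
        stripped = datum.strip()
        if stripped == b"":
            # Empty field: store a placeholder and mark it as masked.
            values.append(False)
            mask.append(True)
        elif stripped in TRUE_VALUES:
            values.append(True)
            mask.append(False)
        elif stripped in FALSE_VALUES:
            values.append(False)
            mask.append(False)
        else:
            raise ValueError(f"invalid boolean value: {datum!r}")
    return values, mask


values, mask = parse_boolean_field([b"true", b"", b"0", b" 1 "])
print(values, mask)
```

With NumPy available, the pair could be wrapped as numpy.ma.array(values, mask=mask) so that empty booleans behave like the already-masked numeric fields.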

I'm looking forward to hearing your thoughts on this, in particular if you can think of any caveats or reasons not to support masked booleans.

Many thanks for your work on PDS4 Tools!

Version of Six causing warnings in python 3.11

I've been having a few issues using python 3.10/3.11 with this package due to the older version of the Six package which appears to be copied into the Extern folder.

See the discussion here: benjaminp/six#341

Specifically, newer versions of six define this: https://github.com/benjaminp/six/blob/master/six.py#L194

Potential solutions:

Update this six.py file
Update the package installation method so that pip will automatically grab the latest package versions instead of having an extern folder.
Remove support for python 2?
