small-bodies-node / pds4_tools
Python package to read and display NASA PDS4 data.
I get the following error:
keyword grid_b is not recognized;
when trying to plot from pds4_tools with mpl 3.7.1.
It looks like grid_b is now grid_visible.
I understand updating dependencies may not be easy/possible, but wanted to flag this anyway. Thanks!
PDS Class Array allows for names of array axes using attribute axis_name.
Although axis_name is present in the label, it gets "ignored" by the PDS viewer when displaying the array by clicking on the Table button (buttons currently available: Label, Table, Plot).
It would be good if the axis names were recognised by the viewer when displaying the array as well (and not just in the label).
Thank you
Daniela
I have a data product with columns defined like:
<Field_Character>
<name>VFS1102L FREND:FRAM NUMD1</name>
<field_number>3</field_number>
<field_location unit="byte">47</field_location>
<data_type>ASCII_NonNegative_Integer</data_type>
<field_length unit="byte">10</field_length>
<description>FRAM NUM: Running counter, 0...65535</description>
</Field_Character>
However when I read this table in pds4_tools, the colon seems to be stripped, e.g.
In [28]: struct = pds4_read('frd_raw_sc_d_20160406T000000-20160406T235959.xml')
In [30]: struct[0].data
<...snip...>
dtype=(numpy.record, [('PUS_TIME_UTC', '<U27'), ('PUS_TIME', '<U17'), ('VFS1102L FREND_FRAM NUMD1', 'u1'), ('PACKET_COUNTER', 'i1'), ('VFS1105L FREND_DOS DATA1', 'O'), ('VFS1205L FREND_DOS DATA2', 'O'), ('VFS1305L FREND_DOS DATA3', 'O'), ('VFS1405L FREND_DOS DATA4', 'O'), ('VFS1505L FREND_DOS DATA5', 'O'), ('VFS1605L FREND_DOS DATA6', 'O'), ('VFS1705L FREND_DOS DATA7', 'O'), ('VFS1805L FREND_DOS DATA8', 'O'), ('VFS1905L FREND_DOS DATA9', 'O'), ('VFS2005L FREND_DOS DATA10', 'O')]))
and if I try to index by the correct name I get an error, e.g.
In [31]: struct[0].data['VFS1102L FREND:FRAM NUMD1']
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [31], in <cell line: 1>()
----> 1 struct[0].data['VFS1102L FREND:FRAM NUMD1']
File ~/miniconda3/envs/bepi/lib/python3.10/site-packages/pds4_tools/reader/data.py:169, in PDS_ndarray.__getitem__(self, idx)
155 def __getitem__(self, idx):
156 """
157 Parameters
158 ----------
(...)
167 then the meta_data will be preserved for those fields or records.
168 """
--> 169 obj = super(PDS_ndarray, self).__getitem__(idx)
171 # For structured arrays, retrieve the correct meta_data portion if we are not obtaining all of the
172 # fields
173 if isinstance(obj, np.ndarray):
ValueError: no field of name VFS1102L FREND:FRAM NUMD1
but instead I have to access the field with an underscore:
struct[0].data['VFS1102L FREND_FRAM NUMD1']
Is this a bug, or intended? Currently, for various reasons, I'm getting the field names from the table manifest, where the values are correct, and then this fails when using the field name to access the data.
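A possible workaround, as a minimal sketch: look the field up by its label name first and fall back to the name with the colon replaced by an underscore. The helper get_field is hypothetical and not part of pds4_tools, and the sketch assumes that ':' to '_' is the only substitution applied, which is inferred solely from the dtype output above. The small array stands in for struct[0].data.

```python
import numpy as np


def get_field(data, name):
    """Return data[name]; if absent, retry with ':' replaced by '_'.

    Hypothetical helper. It assumes the reader only substitutes ':' with '_'.
    """
    names = data.dtype.names or ()
    if name in names:
        return data[name]
    fallback = name.replace(":", "_")
    if fallback in names:
        return data[fallback]
    raise KeyError(name)


# Stand-in for struct[0].data, using one of the sanitized names shown above.
table = np.zeros(3, dtype=[("PUS_TIME", "<U17"),
                           ("VFS1102L FREND_FRAM NUMD1", "u1")])
table["VFS1102L FREND_FRAM NUMD1"] = [1, 2, 3]

values = get_field(table, "VFS1102L FREND:FRAM NUMD1")
```

This keeps the names taken from the table manifest usable without editing them by hand.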
Dear @LevN0,
I found that the to_dict method appears to have an issue when handling a Reference_List: it collapses the duplicate sub-groups into a single one.
Consider a Reference_List containing multiple Internal_Reference labels set as below:
Which when read as follows:
Provides the following results:
My understanding is that this happens even though the Reference_List is structured as expected according to the Information Model specs (https://pds.nasa.gov/datastandards/documents/im/v1/index_1K00.html#reference_type).
Could you shed some light on this?
Looking forward to your reply,
All the best
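For illustration only, here is a minimal sketch of an XML-to-dict conversion that keeps repeated sibling elements by gathering them into a list. It uses only the standard library and is not the pds4_tools implementation; the function children_to_dict and the LID values in the sample are made up for the example.

```python
import xml.etree.ElementTree as ET


def children_to_dict(elem):
    """Convert the children of elem to a dict, keeping repeated tags as lists."""
    result = {}
    for child in elem:
        if len(child):
            value = children_to_dict(child)
        else:
            value = (child.text or "").strip()
        if child.tag in result:
            # Second or later occurrence: promote the entry to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical label fragment; the LIDs are placeholders.
sample = ET.fromstring(
    "<Reference_List>"
    "<Internal_Reference><lid_reference>urn:a</lid_reference></Internal_Reference>"
    "<Internal_Reference><lid_reference>urn:b</lid_reference></Internal_Reference>"
    "</Reference_List>"
)
converted = children_to_dict(sample)
```

With a plain dict assignment per tag, the second Internal_Reference would overwrite the first; the list promotion avoids that.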
In [1]: import logging
In [2]: logging.basicConfig(level=logging.DEBUG)
In [3]: logger = logging.getLogger("application")
In [4]: logger.info("hello world")
INFO:application:hello world
In [5]: import pds4_tools
In [6]: dp = pds4_tools.read("/tmp/product.xml", quiet=True)
INFO:PDS4ToolsLogger:Processing label: /tmp/product.xml
INFO:PDS4ToolsLogger:Now processing a Table_Delimited structure: TABLE_ANCIL
Setting the quiet keyword argument of pds4_tools.read to True should prevent LogRecords (below logging.ERROR) from propagating up the hierarchy.
This is possibly getting into opinion territory, but when using PDS4 Tools as a library I would prefer it to respect its advertised quietness. This is currently not the case because the quiet keyword sets the logging level of the package's stdout handler, not that of the "PDS4ToolsLogger" logger itself. I surmise that this is because doing the latter would break the functionality that reads the log back in, as the PDS4SilentHandler instance would be starved of input.
A minimally intrusive fix would be to add some logic in the already subclassed PDS4Logger._log method, to instruct the logger not to propagate records below _quiet (i.e. logging.ERROR) when PDS4Logger.is_quiet() == True. For instance:
def _log(self, level, *args, **kwargs):
    """
    ...
    """
    self.propagate = not (self.is_quiet() and level < _quiet)
    ...
    super(PDS4Logger, self)._log(level, *args, **kwargs)
    ...
...where the level parameter has been extracted from args for readability.
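The idea can be tried out in isolation. The following is a minimal, self-contained sketch: QuietAwareLogger is a hypothetical stand-in for PDS4Logger, QUIET_LEVEL stands in for _quiet, and the quiet()/loud() toggles are assumptions made only so the example runs on its own.

```python
import logging

QUIET_LEVEL = logging.ERROR  # stand-in for pds4_tools' _quiet (assumption)


class QuietAwareLogger(logging.Logger):
    """Hypothetical stand-in for PDS4Logger."""

    def __init__(self, name, level=logging.NOTSET):
        super().__init__(name, level)
        self._is_quiet = False

    def quiet(self):
        self._is_quiet = True

    def loud(self):
        self._is_quiet = False

    def is_quiet(self):
        return self._is_quiet

    def _log(self, level, msg, args, **kwargs):
        # Stop records below QUIET_LEVEL from reaching ancestor loggers
        # while quiet; the logger's own handlers still receive them.
        self.propagate = not (self.is_quiet() and level < QUIET_LEVEL)
        super()._log(level, msg, args, **kwargs)


class ListHandler(logging.Handler):
    """Collects records so the demo can inspect what propagated."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def demo():
    parent = logging.getLogger("quiet_demo")
    parent.setLevel(logging.DEBUG)
    parent.propagate = False
    captured = ListHandler()
    parent.addHandler(captured)

    child = QuietAwareLogger("quiet_demo.child")
    child.parent = parent  # attach manually; the logger is not registered
    child.setLevel(logging.DEBUG)

    child.quiet()
    child.info("suppressed")
    child.error("passes")
    child.loud()
    child.info("passes too")
    return [record.getMessage() for record in captured.records]
```

Because only propagation is switched off, handlers attached directly to the logger (such as a silent capture handler) would still see every record.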
The function read_tables._extract_fixed_width_field_data() is constructed to prefer a few optimized special cases and fall back to general-case code. However, that general code does not correctly extract from table_byte_data in my case. Specifically, in my case, the entire table is a single group with a single, repeated field. Since the entirety of table_byte_data should be appended to extracted_data, I can perform a partial verification of correct parsing by inserting code into the general-case section that creates a zeroed array and then increments each value position as it is used:
a = np.zeros(len(table_byte_data), np.int64) # New
...
extracted_data.append(table_byte_data[start_byte:stop_byte]) # Original
a[start_byte:stop_byte] += 1 # New
After extraction, np.all(a == 1) should be True, indicating that each byte was read exactly once, but instead:
a.min() == 0
a.max() == 8
(a == 1).sum() == 256 # Values read exactly once.
a.size == 384_000_000 # Count of all values.
(a.mean() < 1) == True # Because some slices extend beyond end.
As a fix sufficient for my (very special) use case, I added this code at the start of read_tables._extract_fixed_width_field_data():
# Most simplified, sped up, case for fields in a single group that
# spans entire table
if ((field_location == 0) and
        (len(repetition_lengths) == 1) and
        (field_length*repetition_lengths[0] == record_length)):
    # Loop over each field in each record
    for start_byte in range(0, len(table_byte_data), field_length):
        stop_byte = start_byte + field_length
        extracted_data.append(table_byte_data[start_byte:stop_byte])
    return
Analysis with array a, as further above, results in np.all(a == 1) == True.
That case may be too special to directly support in the production code, but I provide it for reference. Certainly, the general-case code still needs debugging.
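The coverage check described above can be reproduced outside the reader. This is a minimal sketch with hypothetical helper names (coverage_counts, whole_table_slices) and tiny made-up sizes; it does not call any pds4_tools code.

```python
import numpy as np


def coverage_counts(total_bytes, slices):
    """Count how many times each byte position is covered by the slices."""
    counts = np.zeros(total_bytes, dtype=np.int64)
    for start_byte, stop_byte in slices:
        counts[start_byte:stop_byte] += 1
    return counts


def whole_table_slices(total_bytes, field_length):
    """Slices for a table that is one group of a single repeated field."""
    return [(start, start + field_length)
            for start in range(0, total_bytes, field_length)]


# Correct slicing: every byte is read exactly once.
good = coverage_counts(24, whole_table_slices(24, 4))

# Deliberately wrong slicing: overlapping reads and untouched bytes.
bad = coverage_counts(24, [(0, 4), (2, 6)])
```

A correct extraction gives np.all(good == 1); the deliberately broken slices show the same symptoms as the report, with some positions counted more than once and others never read.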
PDS4 is planning a standards change that will allow Table-type objects to use record delimiters other than "Carriage-Return Line-Feed" - specifically, just the linefeed character. It looks like the current delimiter requirement is hard-coded into the PDS4 viewer. This needs to be upgraded to check the delimiter attribute in the label and act accordingly.
Contact @acraugh for PDS4 standards details as they are finalized.
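As a rough sketch of what "check the delimiter attribute and act accordingly" could look like: the mapping below and the function split_records are hypothetical, and the "Line-Feed" label value is an assumption, since the standards details are still being finalized.

```python
# Assumed mapping from record delimiter label values to byte sequences.
RECORD_DELIMITERS = {
    "Carriage-Return Line-Feed": b"\r\n",
    "Line-Feed": b"\n",  # assumed spelling of the new value
}


def split_records(table_bytes, record_delimiter="Carriage-Return Line-Feed"):
    """Split raw table bytes into records using the label's delimiter."""
    delimiter = RECORD_DELIMITERS[record_delimiter]
    records = table_bytes.split(delimiter)
    # A trailing delimiter leaves an empty final element; drop it.
    if records and records[-1] == b"":
        records.pop()
    return records


crlf_records = split_records(b"1,2\r\n3,4\r\n")
lf_records = split_records(b"1,2\n3,4\n", "Line-Feed")
```

Looking the delimiter up from the label, rather than assuming CR/LF, keeps the splitting logic itself unchanged.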
Importing pds4_tools in Python 3.12 fails. This appears to be due to an incompatibility between Python 3.12 and how six.py 1.13.0 constructs the import metapath for six.moves. Updating to version 1.16.0 of six.py fixes this.
Traceback below:
File "/home/michael/Desktop/pdr/read_scratch.py", line 24, in <module>
import pds4_tools
File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/__init__.py", line 3, in <module>
from .reader import pds4_read
File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/reader/__init__.py", line 1, in <module>
from .core import pds4_read
File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/reader/core.py", line 10, in <module>
from .label_objects import Label
File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/reader/label_objects.py", line 11, in <module>
from .general_objects import Meta_Class
File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/reader/general_objects.py", line 11, in <module>
from ..utils.data_access import is_supported_url, download_file
File "/opt/mambaforge/envs/pdr/lib/python3.12/site-packages/pds4_tools/utils/data_access.py", line 14, in <module>
from ..extern.six.moves import urllib
ModuleNotFoundError: No module named 'pds4_tools.extern.six.moves'
User reported issue with visualizing a simple array using PDS4 Viewer. Traced to the following error,
File "read_arrays.py", line 239, in new_array
array_structure.data.set_fill_value(array.fill_value)
AttributeError: 'PDS_ndarray' object has no attribute 'fill_value'
Context :
When opening a Table with PDS4 Viewer, long cell contents are displayed truncated.
Problem :
We want to see the whole content.
Workaround:
Selecting the text by hand changes how a value is displayed, but only one value at a time.
Wanted behaviour:
When opening a Table with PDS4 Viewer, the columns keep the same width by default but can be resized by dragging handles on each side of a column.
Users Report:
When using Array_3D_Movie in combination with the Display Dictionary's Movie_Display_Settings, PDS4 Viewer will raise an exception upon attempt to display the image.
(Example omitted.)
We are attempting to make the Planetary Data Reader (pdr) available via conda. pds4-tools is one of our dependencies, and it is not on conda, so we can't put our code on conda without vendoring your code, which is not our preference. Would you consider putting this package onto conda?
The Label class provides a method to convert the label from Label (which is effectively a subclass of ElementTree) to a dictionary, namely Label.to_dict(...).
The method includes a number of optional arguments.
One of the optional arguments is skip_attributes. The effect of this argument is as follows:
Original XML,
<Table_Character>
    <offset unit="byte">0</offset>
    <records>76</records>
    ...
</Table_Character>
Converted to dict via Label.to_dict(skip_attributes=True),
{
    "Table_Character": {
        "offset": "0",
        "records": "76"
    }
}
Converted to dict via Label.to_dict(skip_attributes=False),
{
    "Table_Character": {
        "offset": {
            "@unit": "byte",
            "_text": "0"
        },
        "records": "76"
    }
}
The help string for this argument states "If True, skips adding attributes from XML. Defaults to False." However, despite this text, the argument is actually set to True by default, i.e. attributes are skipped when converting to dict.
Need to investigate whether the argument should be changed to False by default as the help string says, or whether the help string should be adjusted.
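To make the two output shapes concrete, here is a small standalone sketch that reproduces them with the standard library. The function element_to_dict is hypothetical and is not the Label.to_dict implementation; it only mirrors the "@attribute" and "_text" conventions shown above and, for brevity, does not handle repeated sibling elements.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem, skip_attributes=True):
    """Convert an element to a str or dict, optionally keeping attributes."""
    children = list(elem)
    if children:
        value = {child.tag: element_to_dict(child, skip_attributes)
                 for child in children}
    else:
        value = (elem.text or "").strip()

    if not skip_attributes and elem.attrib:
        attributes = {"@" + key: val for key, val in elem.attrib.items()}
        if isinstance(value, dict):
            value = {**attributes, **value}
        else:
            value = {**attributes, "_text": value}
    return value


root = ET.fromstring(
    '<Table_Character>'
    '<offset unit="byte">0</offset>'
    '<records>76</records>'
    '</Table_Character>'
)
without_attributes = {root.tag: element_to_dict(root, skip_attributes=True)}
with_attributes = {root.tag: element_to_dict(root, skip_attributes=False)}
```

Until the default is settled, callers that depend on units may prefer to pass skip_attributes explicitly rather than rely on either the docstring or the current default.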
User Report:
Came across another small bug in the PDS4 tools, which affects date formats as: 2018-12-12T08:51:16.28972Z (5 decimals) when trying to plot.
The issue is in pds4_tools.reader.data_types.data_type_convert_dates on line 652, where I believe that:
for i in range(0, 6):
    format_lengths[max(format_lengths.keys()) + i] = format
should be
for i in range(0, 6):
    format_lengths[max(format_lengths.keys()) + 1] = format
otherwise the key for the lengths of formats with various numbers of decimals is calculated incorrectly - specifically for the above case I get a key error due to the length (25) not being in the format_lengths dictionary.
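The difference between the two loops can be seen with a toy dictionary. This sketch does not use the real format_lengths from pds4_tools; the starting key of 20 and the function keys_added are made up for the illustration.

```python
def keys_added(step_uses_index, start_max=20, repeats=6):
    """Return the sorted keys after running one of the two loop variants.

    start_max is a hypothetical starting key, not the real value.
    """
    lengths = {start_max: "base"}
    for i in range(repeats):
        step = i if step_uses_index else 1
        lengths[max(lengths) + step] = "fractional"
    return sorted(lengths)


with_index = keys_added(step_uses_index=True)   # the "+ i" variant
with_one = keys_added(step_uses_index=False)    # the "+ 1" variant
```

Because max() is re-evaluated on every iteration, adding i makes the gaps grow (20, 21, 23, 26, 30, 35), whereas adding 1 yields consecutive lengths (20 through 26).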
The Earth Based Telescope dictionary, e.g., https://pds.nasa.gov/datastandards/dictionaries/index-1.20.0.0.shtml#ebt , contains classes to convert pixels to sky coordinates. Can pds4_tools support <ebt:World_Coordinate_System>? Especially useful for data reviews would be the ability to mouse over an image and see the sky coordinate values in real-time (much like the DS9 tool does with FITS files). We can discuss ways to address this, e.g., with a new external library or by adding something to this package specifically. Regardless, we definitely want a solution that this package can use.
As I'm using macOS 10.14.6, which is not supported (and indeed it is quite a bad bug), I'm trying to use the Python route to the viewer.
In an IPython session, I'm calling pds4_tools.view() but nothing happens.
I'm guessing the GUI might need the correct backend setting to work. Which backend is required?
Hi,
I have done a pip install pds4_tools on a relatively clean installation of Python 3.11 and I am getting a module-not-found error for tkinter.
I was wondering if a new version could be released so that I can take advantage of tkinter being an optional import, which got merged in May. I would rather not have to grab an unversioned copy of master if I can avoid it.
Thanks!
Context:
I have PyCharm on macOS, running a test with pytest in debug mode.
def test_reading_pds4():
    product = pds4_tools.read("./my_product.xml")
Works fine
Whereas :
import pandas
pandas.DataFrame()
def test_reading_pds4():
    product = pds4_tools.read("./my_product.xml")
raises :
../venv/lib/python3.10/site-packages/pds4_tools/reader/core.py:198: in pds4_read
structures = read_structures(label, filename, lazy_load=lazy_load, no_scale=no_scale,
../venv/lib/python3.10/site-packages/pds4_tools/reader/core.py:309: in read_structures
structure.data
../venv/lib/python3.10/site-packages/pds4_tools/extern/cached_property.py:86: in __get__
return obj_dict.setdefault(name, self.func(obj))
../venv/lib/python3.10/site-packages/pds4_tools/reader/table_objects.py:363: in data
read_table_data(self, no_scale=self._no_scale, decode_strings=self._decode_strings, masked=self._masked)
../venv/lib/python3.10/site-packages/pds4_tools/reader/read_tables.py:955: in read_table_data
table_structure.data = new_table(extracted_fields, no_scale=no_scale, decode_strings=decode_strings,
../venv/lib/python3.10/site-packages/pds4_tools/reader/read_tables.py:730: in new_table
table_structure.data = np.recarray(num_records, dtype=dtypes).view(array_type)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
subtype = <class 'numpy.recarray'>, shape = 44
dtype = [('INSTRUMENT_ID', dtype('<U4')), ('UNIT_ID', dtype('<U1')), ('PDS4_TARGET_LID', dtype('<U56')), ('PDS4_TARGET_NAME', dtype('<U10')), ('PDS4_TARGET_TYPE', dtype('<U17')), ('VISIBILITY_START', dtype('<U24')), ...]
buf = None, offset = 0, strides = None, formats = None, names = None
titles = None, byteorder = None, aligned = False, order = 'C'
def __new__(subtype, shape, dtype=None, buf=None, offset=0, strides=None,
formats=None, names=None, titles=None,
byteorder=None, aligned=False, order='C'):
if dtype is not None:
> descr = sb.dtype(dtype)
E TypeError: 'NoneType' object is not callable
../venv/lib/python3.10/site-packages/numpy/core/records.py:423: TypeError
In [1]: import logging
In [2]: type(logging.getLogger("pre"))
Out[2]: logging.Logger
In [3]: import pds4_tools
In [4]: pds4_tools.utils.logging.logger_init().__class__
Out[4]: pds4_tools.utils.logging.PDS4Logger
In [5]: type(logging.getLogger("post"))
Out[5]: pds4_tools.utils.logging.PDS4Logger
In [1]: import logging
In [2]: type(logging.getLogger("pre"))
Out[2]: logging.Logger
In [3]: import pds4_tools
In [4]: pds4_tools.utils.logging.logger_init().__class__
Out[4]: pds4_tools.utils.logging.PDS4Logger
In [5]: type(logging.getLogger("post"))
Out[5]: logging.Logger
pds4_tools.utils.logging.logger_init() instructs the Python logging module to use the PDS4Logger class for all subsequent logger instantiations. This is of course not a problem when PDS4 Tools is used as an application, i.e. via the viewer, but when it is used as a library within other applications the behaviour is not ideal.
Resetting the default logger class to its previous state after instantiation/fetching of the "PDS4ToolsLogger" object solves this issue:
...
def logger_init():
    """ Initializes or obtains the logger and its handlers.

    Returns
    -------
    PDS4Logger
        The global logger for all pds4 tools.
    """

    _Logger = logging.getLoggerClass()  # delta: add
    logging.setLoggerClass(PDS4Logger)
    logger = logging.getLogger('PDS4ToolsLogger')
    logging.setLoggerClass(_Logger)  # delta: add
    logger.setLevel(_loud)
    ...
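The save-and-restore pattern can be checked on its own. In this minimal sketch, CustomLogger is a hypothetical stand-in for PDS4Logger and get_custom_logger is a made-up helper; the try/finally is an addition so the previous class is restored even if getLogger raises.

```python
import logging


class CustomLogger(logging.Logger):
    """Hypothetical stand-in for PDS4Logger."""


def get_custom_logger(name):
    """Create or fetch a CustomLogger without changing the global default."""
    previous_class = logging.getLoggerClass()
    logging.setLoggerClass(CustomLogger)
    try:
        return logging.getLogger(name)
    finally:
        # Restore whatever logger class was configured before the call.
        logging.setLoggerClass(previous_class)


class_before = logging.getLoggerClass()
custom = get_custom_logger("sketch.custom")
other = logging.getLogger("sketch.other")
class_after = logging.getLoggerClass()
```

Loggers created by the host application after this call keep the class that was configured beforehand.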
I get the following warning opening a data product:
Warning: field_format '%+17.10E' does not conform to PDS4 standards for field 'ACC_X' (full location: 'ACC_X')
which goes away if I replace the upper-case E in the format specifier with a lower-case e. However, as far as I can see from the IM, both are allowed, and according to the standards doc they should be identical:
Perhaps I'm missing something, but I cannot obviously see what is invalid in the given specifier.
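For comparison, Python's own printf-style formatting treats the two conversions identically apart from the case of the exponent marker. This only illustrates Python's behaviour and is not a statement about how pds4_tools validates field_format; the sample value 1.5 is arbitrary.

```python
value = 1.5  # arbitrary sample value

upper = "%+17.10E" % value
lower = "%+17.10e" % value

# Both results have the same width and digits; only the exponent
# marker differs in case.
same_apart_from_case = upper.lower() == lower
```

If the validator's pattern accepts only a lower-case e for this conversion, that alone would explain the warning.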
@LevN0 what do you think about support for URLs and remote files in pds4_read?
PDS4 Standards have changed to allow text delimiters other than CR/LF, effective with the current IM release. We are starting to get reports from users that pds4_viewer is failing when LF delimiters are specified.
To view an Array_3D_Spectrum data, the PDS4 viewer offers Table or Image options.
It would be nice to have a Plot option inside the Image visualisation to view a spectral representation of one pixel.
3 - a cube of dimension spatial_x/spatial_y/spectral viewed with frames of dimensions spatial_x/spatial_y
-> a button "view spectral pixel" would change the cursor to a cross, and selecting a pixel would give a graph of array_value along the spectral axis (frame index)
-> a button "view spectral horizontal line" would change the cursor to a horizontal arrow, and selecting a pixel would give a graph of array_value along the spectral axis.
2 - similarly in vertical mode
-> a button "view spectral pixel" would change the cursor to a cross, and selecting a pixel would give a graph of array_value along the spectral axis (frame index)
Users Report:
In the case that a data structure has both a local_identifier and a name, it seems that the structure can only be accessed by the local_identifier, and no longer the name.
(Example omitted.)
User Reports:
If an Array structure contains,
<Object_Statistics>
    <!-- exists but has no child elements -->
</Object_Statistics>
This causes array read-in to crash.
User Report:
PDS4 Viewer does not correctly handle csv files with semicolons as the delimiter. You get the error as shown in the screenshot. If I change the .xml file to look for Comma instead of Semicolon and replace the ; with , throughout the .csv, pds4 viewer works.
When reading Table_Delimited structures, records containing empty values for numeric fields like ASCII_Integer are masked as expected. However, this is not the case for ASCII_Boolean fields, where instead a ValueError is thrown ("invalid literal for int() with base 10: b''").
The only hint I've found on the "legality" of empty values in boolean fields comes from the Standards Reference section 4C.1 (Delimiter Separated Value Format Description), which says that:
A field may be empty. The interpretation of an empty field will be application and data type dependent.
While this does not directly imply that PDS4 Tools must support empty booleans in particular, I think the fact that the tool already interprets empty numeric fields via masking makes it reasonable for end users to assume that this behaviour would extend also to booleans.
Since boolean fields are also represented as ndarrays, I've had success with simply extending the existing masking behaviour, i.e.
pds4_tools/pds4_tools/reader/data_types.py
Lines 472 to 476 in 8ada764
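The code at the referenced lines is not reproduced here. As a rough, standalone illustration of the idea, the sketch below converts raw delimited values to a NumPy masked boolean array, masking the empty entries. The function to_masked_bool and the accepted literals are assumptions for the example; anything other than b"true" or b"1" is treated as False rather than validated.

```python
import numpy as np

TRUE_LITERALS = {b"true", b"1"}  # assumed accepted spellings


def to_masked_bool(raw_values):
    """Convert raw byte strings to a masked bool array; empty values are masked."""
    stripped = [value.strip() for value in raw_values]
    mask = np.array([value == b"" for value in stripped], dtype=bool)
    data = np.array([value in TRUE_LITERALS for value in stripped], dtype=bool)
    return np.ma.MaskedArray(data, mask=mask)


booleans = to_masked_bool([b"true", b"", b"0", b"1"])
```

The result behaves like the masked numeric columns: the empty record is masked instead of raising a ValueError.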
I'm looking forward to hearing your thoughts on this, in particular if you can think of any caveats or reasons not to support masked booleans.
Many thanks for your work on PDS4 Tools!
I've been having a few issues using Python 3.10/3.11 with this package due to the older version of the six package, which appears to be copied into the extern folder.
See the discussion here: benjaminp/six#341
Specifically, newer versions of six define this: https://github.com/benjaminp/six/blob/master/six.py#L194
Potential solutions:
six.py file
extern folder.