stcorp / coda

The Common Data Access toolset

Home Page: http://stcorp.github.io/coda/doc/html/index.html

License: BSD 3-Clause "New" or "Revised" License

CMake 1.38% Shell 0.06% Makefile 0.39% Lex 0.11% C 90.35% Fortran 0.03% Yacc 0.59% HTML 0.02% C++ 0.72% MATLAB 0.32% Python 1.95% Java 2.82% M4 0.46% Pascal 0.21% SWIG 0.58%

coda's Introduction

Copyright (C) 2007-2024 S[&]T, The Netherlands

                CODA 2.25.2 Release Notes


CODA is the Common Data Access framework that allows reading of scientific data
from various data formats, including structured ascii, structured binary, XML,
netCDF, CDF, HDF4, HDF5, GRIB, RINEX and SP3. It provides a single consistent
hierarchical view on data independent of the underlying storage format.

CODA is used as a core component in various ESA software among which the ESA
Atmospheric Toolbox (BEAT) and the Broadview Radar Altimetry Toolbox (BRAT).

The CODA software package comes with interfaces for C, Fortran, IDL, MATLAB,
Python, and Java and several useful command-line tools.

In order to make use of the data reading facilities of CODA, you will need to
have the CODA product format definition files (.codadef files) for the data
products that you want to access. It is important to note that the CODA
software package does not come with any product format definition files itself!
To get access to .codadef files, have a look at the software packages that make
use of CODA.

For files in netCDF, CDF, HDF4, HDF5, GRIB, RINEX, or SP3 format you can use
CODA without any .codadef files, since for these formats CODA either comes with
a built-in definition of the format or CODA determines the format from the file
itself.


Changes
=======

An overview of the changes in this release can be found in the CHANGES file.


Installation
============

Installation instructions can be found in the INSTALL file.


Documentation
=============

Full documentation in HTML is included with the CODA software.

A version matching the latest development status on GitHub can be viewed at:

    http://stcorp.github.io/coda/doc/html/index.html


Download
========

The latest release of CODA can be downloaded from the CODA GitHub website:

    https://github.com/stcorp/coda/releases

If you encounter any issues with CODA or if you would like to see certain
functionality added then create a topic on the Atmospheric Toolbox Forum:

    https://forum.atmospherictoolbox.org/


CODA Developers
S[&]T, The Netherlands

coda's Issues

[MIPAS] coda.time_double_to_parts_utc can't manage the conversion of the date in the right way

coda.time_double_to_parts(0.0)
[2000, 1, 1, 0, 0, 0, 0]

coda.time_double_to_parts_utc(0.0)
[1999, 12, 31, 23, 59, 28, 0]

Since the reference time of the coda output is 2000-01-01, these two functions should return the same time for 0.0 s.
Only about 2 seconds should be needed to account for leap seconds: during the MIPAS mission a leap second was inserted only twice (2005-12-31 and 2008-12-31).
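
The 32-second gap in the output above is what one would get if the _utc variant treats its input as TAI seconds since 2000-01-01 and subtracts all leap seconds accumulated up to that date, not only those inserted during the mission. TAI was 32 s ahead of UTC on 2000-01-01. A minimal sketch of that arithmetic, using only the Python standard library (the TAI interpretation is an assumption about CODA's behaviour, and the helper name and hard-coded constant are for illustration only):

```python
from datetime import datetime, timedelta

# TAI - UTC was 32 s from 1999-01-01 until the leap second at the end of 2005.
TAI_MINUS_UTC_AT_2000 = 32


def tai_seconds_to_utc_parts(seconds):
    """Convert TAI seconds since 2000-01-01 to UTC parts (valid near the epoch only)."""
    epoch = datetime(2000, 1, 1)
    utc = epoch + timedelta(seconds=seconds - TAI_MINUS_UTC_AT_2000)
    return [utc.year, utc.month, utc.day,
            utc.hour, utc.minute, utc.second, utc.microsecond]


print(tai_seconds_to_utc_parts(0.0))  # [1999, 12, 31, 23, 59, 28, 0]
```

If that assumption holds, the two functions are not expected to agree at 0.0, because one counts on a TAI-like scale and the other reports UTC.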

What are the minimum compiler versions for coda?

Documentation request:

This impacts changes that I'd like to send to coda. Does coda really need to support compilers older than 8 years? Knowing the minimum versions (e.g. Visual Studio >= 2012, gcc >= ?, clang >= ?, mingw >= ?) changes what the code has to do.

e.g. for my local build (with bazel and clang/llvm), I have to remove a swath of coda.h and replace it with just #include <stdint.h> (without a surrounding #ifdef HAVE_STDINT_H). I'm happy to share that change if it doesn't break the requirements of coda. e.g. In my local code, I can assume that stdint.h works, and I'm wondering about your assumptions.

And does the code have to be C89 or can C99 features be used? I presume you don't support any C11 features.

No unittest or test automation

Submitting pull requests when there aren't any automatic tests makes me nervous.

rm -rf build-ninja/
mkdir -p build-ninja && cd build-ninja && cmake -GNinja .. && cmake --build . && ctest .

[169/169] Linking C executable codadd
Test project /home/schwehr/src/coda/build-ninja
No tests were found!!!

find . | grep -i test
./java/CodaTest.java

And CodaTest.java is not a unittest.

If you want a starter test... here is a googletest based draft (yes, I work for Google) for ziparchive. Sadly, it's only 50% coverage of ziparchive.c.

// Copyright 2018 Google Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <stddef.h>
#include <string>
#include <vector>

// TODO(schwehr): These includes need to change.
#include "googletest.h"
#include "gunit.h"
#include "logging.h"
#include "path.h"

extern "C" {
#include "third_party/stcorp_coda/libcoda/ziparchive.h"
}

namespace {

const char kTestData[] = "third_party/stcorp_coda/test/testdata/";

void handle_ziparchive_error(const char *message, ...) { LOG(INFO) << message; }

TEST(ZiparchiveTest, DoesNotExist) {
  const char filepath[] = "/does/not/exist.zip";
  za_file *zf = coda_za_open(filepath, handle_ziparchive_error);
  ASSERT_EQ(nullptr, zf);
}

TEST(ZiparchiveTest, NotAZipFile) {
  // Try to open a file containing some text.
  const string filepath =
      file::JoinPath(FLAGS_test_srcdir, kTestData, "not_a_zip.zip");

  za_file *zf = coda_za_open(filepath.c_str(), handle_ziparchive_error);
  ASSERT_EQ(nullptr, zf);
}

// Try a simple zip file containing a single uncompressed file named "1" that
// contains the string "1\n".
class ReadSimpleZipTest : public ::testing::Test {
 protected:
  void SetUp() override {
    filepath_ = file::JoinPath(FLAGS_test_srcdir, kTestData, "1.zip");
    zf_ = coda_za_open(filepath_.c_str(), handle_ziparchive_error);
    ASSERT_NE(nullptr, zf_);
  }
  void TearDown() override { coda_za_close(zf_); }

  string filepath_;
  za_file *zf_ = nullptr;
};

TEST_F(ReadSimpleZipTest, Filename) {
  EXPECT_STREQ(filepath_.c_str(), coda_za_get_filename(zf_));
}

TEST_F(ReadSimpleZipTest, NumEntries) {
  EXPECT_EQ(1, coda_za_get_num_entries(zf_));
}

TEST_F(ReadSimpleZipTest, NonExistingEntry) {
  EXPECT_EQ(nullptr, za_get_entry_by_index(zf_, 1));
}

TEST_F(ReadSimpleZipTest, CheckFileEntry) {
  // entry is owned by zf_.
  za_entry *entry = za_get_entry_by_index(zf_, 0);
  ASSERT_NE(nullptr, entry);

  EXPECT_STREQ("1", za_get_entry_name(entry));

  constexpr size_t kFileSize = 2;
  ASSERT_EQ(kFileSize, za_get_entry_size(entry));

  // The file "1" contains "1\n".
  std::vector<char> buf(kFileSize + 1, '\0');
  EXPECT_EQ(0, za_read_entry(entry, &buf[0]));
  EXPECT_EQ('1', buf[0]);
  EXPECT_EQ('\n', buf[1]);
}

TEST_F(ReadSimpleZipTest, GetEntryByName_DoesNotExist) {
  EXPECT_EQ(nullptr, za_get_entry_by_name(zf_, "does-not-exist"));
}

TEST_F(ReadSimpleZipTest, GetEntryByName_Exists) {
  // entry returned by za_get_entry_by_name owned by zf_
  EXPECT_NE(nullptr, za_get_entry_by_name(zf_, "1"));
}

}  // namespace

filename() expression provides full path instead of basename on Windows

The filename() expression does not properly remove the directory component of the file path on Windows.
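
One plausible cause (an assumption, not checked against the CODA source) is that the directory part is stripped by looking for '/' only, which leaves backslash-separated paths untouched. The difference can be illustrated with Python's two path flavours; the file name below is made up for the example:

```python
import ntpath
import posixpath

windows_path = r"C:\data\example_product.N1"  # hypothetical path

# POSIX rules split on '/' only, so nothing is removed from this path.
print(posixpath.basename(windows_path))  # C:\data\example_product.N1

# Windows rules accept both '\' and '/' as separators.
print(ntpath.basename(windows_path))  # example_product.N1
```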

Coda 2.21 includes some functions of libz which leads to a name clash

We compile coda with the following configure command:
./configure --prefix=$INSTALLPATH --disable-shared --with-hdf4 --with-hdf5
We want to have a static library that we can link to our final binary.

In the final link step we link coda, then hdf4, then hdf5, then libz, and get the following error:

/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /data/upas2-resources/inst-eval/lib64/libz.a(inflate.o): in function inflateValidate:
inflate.c:(.text+0x3468): multiple definition of inflateValidate; /data/upas2-resources/inst-eval/lib64/libcoda.a(libz_internal_la-inflate.o):/data/zimm_wa/projects/dockertest/dockerfile2/upaslibcompile/coda-2.21/libcoda/zlib/inflate.c:109: first defined here
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /data/upas2-resources/inst-eval/lib64/libz.a(inflate.o): in function inflateCodesUsed:
inflate.c:(.text+0x355d): multiple definition of inflateCodesUsed; /data/upas2-resources/inst-eval/lib64/libcoda.a(libz_internal_la-inflate.o):/data/zimm_wa/projects/dockertest/dockerfile2/upaslibcompile/coda-2.21/libcoda/zlib/inflate.c:109: first defined here
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /data/upas2-resources/inst-eval/lib64/libz.a(adler32.o): in function adler32_z:
adler32.c:(.text+0x0): multiple definition of adler32_z; /data/upas2-resources/inst-eval/lib64/libcoda.a(libz_internal_la-adler32.o):/data/zimm_wa/projects/dockertest/dockerfile2/upaslibcompile/coda-2.21/libcoda/zlib/adler32.c:72: first defined here
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /data/upas2-resources/inst-eval/lib64/libz.a(crc32.o): in function crc32_z:
crc32.c:(.text+0xd): multiple definition of crc32_z; /data/upas2-resources/inst-eval/lib64/libcoda.a(libz_internal_la-crc32.o):/data/zimm_wa/projects/dockertest/dockerfile2/upaslibcompile/coda-2.21/libcoda/zlib/crc32.c:207: first defined here

So it seems that some functions in libz.a are defined in libcoda.a with the same name.

On the other hand, if we don't link against libz, there are some (other) functions missing.

So what can we do? This was never a problem, so it might be new in 2.21. Thanks!

Wrong delete call in coda-grib.c

Using

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  FuzzerTemporaryFile temp_file(data, size);

  coda_init();
  const char *product_class = NULL;
  const char *product_type = NULL;
  coda_format format;
  int version;
  coda_recognize_file(temp_file.filename(), NULL, &format, &product_class,
                      &product_type, &version);
  coda_done();
  return 0;
}

I got this crash:

0x0000555556534acf in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=62, 
    function=0x555555928ae0 <__PRETTY_FUNCTION__.coda_grib_type_delete> "void coda_grib_type_delete(coda_dynamic_type *)") at base/logging.cc:106
#8  0x0000555555cf2bfc in coda_grib_type_delete (type=0x604000002050) at third_party/stcorp_coda/libcoda/coda-grib-type.c:62
#9  0x0000555555ce8ef5 in read_grib1_message (product=0x607000000560, message=<optimized out>, file_offset=<optimized out>) at third_party/stcorp_coda/libcoda/coda-grib.c:1727
#10 0x0000555555cdc85e in coda_grib_reopen (product=<optimized out>) at third_party/stcorp_coda/libcoda/coda-grib.c:3103
#11 0x0000555555d10a77 in reopen_with_backend (product_file=0x10418, format=66584) at third_party/stcorp_coda/libcoda/coda-product.c:408
#12 0x0000555555d0e61e in open_file (filename=<optimized out>, product_file=<optimized out>, force_binary=<optimized out>) at third_party/stcorp_coda/libcoda/coda-product.c:550
#13 0x0000555555d0e09d in coda_recognize_file (filename=<optimized out>, file_size=<optimized out>, file_format=<optimized out>, product_class=<optimized out>, 
    product_type=<optimized out>, version=<optimized out>) at third_party/stcorp_coda/libcoda/coda-product.c:594
#14 0x0000555555c68ced in LLVMFuzzerTestOneInput (data=<optimized out>, size=<optimized out>) at third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:16

The assert is here:

void coda_grib_type_delete(coda_dynamic_type *type)
{
    assert(type != NULL);
    assert(type->backend == coda_backend_grib);  // <--- this assert was hit

Switching coda_grib_type_delete to coda_mem_type_delete lets the fuzzer PoC run without crashing.

            /* data representation type is Latitude/Longitude Grid */
            gds = coda_mem_record_new((coda_type_record *)grib_type[grib1_grid], NULL);

            NV = buffer[3];
            gtype = grib_type[grib1_numberOfVerticalCoordinateValues];
            type = (coda_dynamic_type *)coda_mem_uint8_new((coda_type_number *)gtype, NULL, cproduct, NV);
            coda_mem_record_add_field(gds, "numberOfVerticalCoordinateValues", type, 0);

            PVL = buffer[4];

            gtype = grib_type[grib1_dataRepresentationType];
            type = (coda_dynamic_type *)coda_mem_uint8_new((coda_type_number *)gtype, NULL, cproduct, buffer[5]);
            coda_mem_record_add_field(gds, "dataRepresentationType", type, 0);

            if (read_bytes(product->raw_product, file_offset, 26, buffer) < 0)
            {
                coda_mem_type_delete((coda_dynamic_type *)gds);
                // coda_grib_type_delete((coda_dynamic_type *)gds);  <----- wrong type
                return -1;
            }

Does CODA still use the HDF4 netCDF API, and in which case?

Hi,

the INSTALL document states:

If you want to use the HDF4 features of CODA then you will need to have a
recent version of HDF4 installed (for building the source package on
Windows you will need to have version 4.2.11 of HDF).
You will also need the additional required libraries libjpeg, szlib, and
zlib.
On some UNIX systems you can install HDF4 via the package manager on your
system. Make sure that this package also installs the netcdf.h include file
on your system. If this is not the case you will also have to install the
netcdf package on your system.

Is the part about netcdf.h still up to date? I have installed coda using --with-hdf4, linking against an HDF4 library that does not contain netcdf.h (HDF4 configured with --disable-netcdf), in an environment without a netCDF package. Still, it seems to work (I could open an HDF4 file and read the names of some groups in the file).

coda indexing in python interface is confusing and inconsistent

When using the coda interface in python3 I have several issues with indexing.
For my testing I used an Aeolus L2B product file and matching CODADEF file (simply because I am most familiar with that product as I developed it myself). Extracting single values with fetch works as intended:

import coda
coda.version()
fn = 'AE_TEST_ALD_U_N_2B_20181002T001000_20181002T001136_0001.DBL'
fh = coda.open(fn)
coda.fetch(fh, 'meas_map', 0, 'mie_map_of_l1b_meas_used', 0, 'which_l2b_wind_id')

gives the expected output of:
'2.19'
1

Also replacing the data selection by a single string works fine:

coda.fetch(fh, 'meas_map[0]/mie_map_of_l1b_meas_used[0]/which_l2b_wind_id')

However, extracting this data set as 2D array only works for the first form:

data = coda.fetch(fh, 'meas_map', 0, 'mie_map_of_l1b_meas_used', -1, 'which_l2b_wind_id')
data.shape

this gives the expected output of:
(24,)

But this does not work:

data = coda.fetch(fh, 'meas_map[0]/mie_map_of_l1b_meas_used[-1]/which_l2b_wind_id')

It gives me this error:
Traceback (most recent call last):
File "", line 1, in
File "/usr/people/matlab/.local/lib/python3.7/site-packages/coda/codapython.py", line 652, in fetch
(intermediateNode,pathIndex) = _traverse_path(cursor,path)
File "/usr/people/matlab/.local/lib/python3.7/site-packages/coda/codapython.py", line 185, in _traverse_path
cursor_goto(cursor,path[pathIndex])
codac.CodacError: coda_cursor_goto(): array index (-1) exceeds array range [0:24)

Additional remarks / feature requests:

The use of the value -1 to get the full array is very confusing in Python, where -1 usually gives you the last item of an array or list. I would suggest also allowing the special value ':' in the Python interface to extract the full array.
There is no way to do slicing: coda only allows extracting a single element or the full array. Especially for large files, getting the full array can take a very long time, and calling coda fetch inside a nested for loop to extract a partial 2D array can take even longer. I would suggest implementing a mechanism similar to Python's slice, i.e. '[5:10]' would give you the array elements with index values 5 up to, but not including, 10.
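
To make the request concrete, here is a rough sketch of how such a path string could be parsed on the Python side before any CODA call is made. The function name parse_path and the accepted syntax ('[n]', '[a:b]', '[:]') are hypothetical; nothing here is part of the current CODA interface.

```python
import re

# One path component: a field name optionally followed by a single [index].
_STEP = re.compile(r"^(?P<name>[^\[\]]+)(?:\[(?P<index>[^\]]*)\])?$")


def parse_path(path):
    """Split 'a[0]/b[5:10]/c' into field names, integer indices and slices."""
    steps = []
    for part in path.split("/"):
        match = _STEP.match(part)
        if match is None:
            raise ValueError(f"cannot parse path component {part!r}")
        steps.append(match.group("name"))
        index = match.group("index")
        if index is None:
            continue
        if ":" in index:
            start, _, stop = index.partition(":")
            steps.append(slice(int(start) if start else None,
                               int(stop) if stop else None))
        else:
            steps.append(int(index))
    return steps


print(parse_path("meas_map[0]/mie_map_of_l1b_meas_used[5:10]/which_l2b_wind_id"))
```

A slice(None, None) step would then stand for the full array, which avoids overloading -1.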

Include full generated documentation in repository

Including the doxygen generated documentation in the repository prevents developers having to generate this themselves. But it also allows viewing the full documentation online using the link https://htmlpreview.github.io/?https://raw.githubusercontent.com/stcorp/coda/master/doc/html/index.html

Create conda package for CODA with python

Create packages for conda for CODA with HDF4 and HDF5 enabled and with the Python interfaces enabled. This is especially useful for using the CODA python interfaces on Windows.

The approach should probably match this tutorial

Use C99 to reduce where variables are defined and make them const when possible

There are a lot of places where it would make static analysis easier if the code used C99 syntax. Doing this is typically pretty easy. While it doesn't seem like much, it makes debugging easier and makes the output of compilers and static analyzers clearer.

I can do some of these as pull requests if that's okay with the project.

e.g.

uint32_t section_size;
// Lots of code
section_size = (((uint32_t)buffer[0] * 256 + buffer[1]) * 256 + buffer[2]) * 256 + buffer[3];

Could become:

// Lots of code
const uint32_t section_size =
    (((uint32_t)buffer[0] * 256 + buffer[1]) * 256 + buffer[2]) * 256 + buffer[3];

And

int i;
// Lots of code
for (i = 0; i < num_grib_types; i++)
{
    grib_type[i] = NULL;
}

Could be:

// Lots of code

for (int i = 0; i < num_grib_types; i++)
{
    grib_type[i] = NULL;
}

And lots of places where the scope of things can be reduced. e.g.

cppcheck --enable=all --std=c99 --force --inconclusive coda-grib.c
Checking coda-grib.c ...
coda-grib.c:2165:5: style: Assignment of function parameter has no effect outside the function. [uselessAssignmentArg]
    file_offset += 4;
    ^
coda-grib.c:1549:13: style: The scope of the variable 'intvalue' can be reduced. [variableScope]
    int32_t intvalue;
            ^
coda-grib.c:1681:26: style: The scope of the variable 'gds' can be reduced. [variableScope]
        coda_mem_record *gds;
                         ^
coda-grib.c:2316:22: style: The scope of the variable 'raw_data' can be reduced. [variableScope]
            uint8_t *raw_data;
                     ^
[SNIP]

Add support for GRIB messages using quasi-regular grids

This is to support the native grid as used for the CAMS and ERA-Interim ECMWF data.

Split type descriptions in codadef definitions into short and long descriptions

We need to split the description into something that can be used for e.g. plot axis labels or command line output (short description) and something that allows for a more extensive explanation of the data (long description).

The idea is to replace the current single description parameter into separate short_description and long_description parameters.

Add OO variant of CODA Python interface

Currently, the Python interface of CODA is not using the Object Oriented approach in order to make it consistent with the IDL/MATLAB interface (for the higher level CODA API) or the C interface (for the lower level CODA API).

We should, however, add a CODA Python interface that is more 'pythonic' and that treats Products, Cursors, Types, etc. as classes with methods.
For most CODA functions the mapping to class methods is quite straightforward.
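
As a rough illustration of what such a mapping could look like, the sketch below wraps a functional open/fetch/close API in a class that also works as a context manager. The class layout is hypothetical, and FakeApi is only a stand-in so the example runs without CODA installed; with the real module one would pass coda itself, assuming it exposes open, fetch and close.

```python
class Product:
    """Context-manager wrapper around a functional open/fetch/close API."""

    def __init__(self, api, filename):
        self._api = api
        self._handle = api.open(filename)

    def fetch(self, *path):
        # Forward the path arguments unchanged to the functional interface.
        return self._api.fetch(self._handle, *path)

    def close(self):
        if self._handle is not None:
            self._api.close(self._handle)
            self._handle = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


class FakeApi:
    """Stand-in for the coda module; records calls instead of reading files."""

    def __init__(self):
        self.closed = []

    def open(self, filename):
        return {"filename": filename}

    def fetch(self, handle, *path):
        return (handle["filename"],) + path

    def close(self, handle):
        self.closed.append(handle["filename"])


api = FakeApi()
with Product(api, "example.DBL") as product:
    print(product.fetch("meas_map", 0))
print(api.closed)
```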

Allow coda fetch function to take a cursor path as string argument

The high-level CODA functions of Python, MATLAB and IDL (such as the fetch() function or field and size inspection functions) currently take a series of record field names and array index references as arguments to indicate a specific part in a product. String parameters indicate field names. Integer (or integer list) parameters indicate array index references.

The idea is to change this such that string parameters represent a cursor path instead of just a fieldname. This would translate into a call to coda_cursor_goto() in the C library.
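
For reference, the two argument styles are easy to convert into each other. The sketch below joins the current style (field names and integer indices as separate arguments) into a single path string; args_to_path is a hypothetical helper, and integer-list arguments for multi-dimensional indices are deliberately not handled.

```python
def args_to_path(*args):
    """Join field names and integer indices into a path such as 'a[0]/b'."""
    path = ""
    for arg in args:
        if isinstance(arg, int):
            # An array index attaches to the preceding component.
            path += f"[{arg}]"
        else:
            # A field name starts a new component.
            path += f"/{arg}" if path else str(arg)
    return path


print(args_to_path("meas_map", 0, "mie_map_of_l1b_meas_used", 0, "which_l2b_wind_id"))
# meas_map[0]/mie_map_of_l1b_meas_used[0]/which_l2b_wind_id
```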

Add 'raw' output option to codadump

This would allow dumping of a binary subblock of data to stdout or a file.
This would only work for data where we currently track the byte offset/lengths of the data inside the original file. So at least for pure ascii/binary data files.

But we should also try to see if we can't use this to e.g. dump specific submessages of grib files or dump specific subsections of xml files.
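
For the plain ascii/binary case, the core of such an option is just a byte-range copy once the offset and length are known. A minimal sketch, assuming the byte offset and length have already been obtained from CODA (dump_raw and its parameters are hypothetical names, not an existing codadump feature):

```python
import sys


def dump_raw(path, byte_offset, byte_length, out=None, chunk_size=65536):
    """Copy byte_length bytes starting at byte_offset from path to a binary stream."""
    out = out if out is not None else sys.stdout.buffer
    remaining = byte_length
    with open(path, "rb") as src:
        src.seek(byte_offset)
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                raise EOFError("file ends before the requested block")
            out.write(chunk)
            remaining -= len(chunk)
```

Copying in chunks keeps memory use bounded when the selected block is large.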

INSTALL file not complete

The coda sources as provided on GitHub do not contain the configure script mentioned in the INSTALL file. Therefore, if a user wishes to install coda from sources cloned from GitHub, some extra instructions are needed, i.e. the autotools should be run first.
For example, these commands worked on my Fedora Linux system:

ln -s ./config.h.cmake.in config.h.in
libtoolize
aclocal
autoconf
automake -a

./configure --enable-python --prefix=`pwd`/../coda_install --with-hdf5
make
make install

I think it would be good to add some instructions to the INSTALL file to explain this.

coda_recognize_file_fuzzer: Crash in read_bytes

#0 0x7fcee3833548 in __memmove_ssse3_back (/usr/grte/v4/lib64/libc.so.6+0xc2548)
#1 0x55fd54764b8c in read_bytes third_party/stcorp_coda/libcoda/coda-read-bytes.h:79:9
#2 0x55fd54767b46 in read_var_array third_party/stcorp_coda/libcoda/coda-netcdf.c:550:13
#3 0x55fd547645fd in coda_netcdf_reopen third_party/stcorp_coda/libcoda/coda-netcdf.c:901:9
#4 0x55fd5475b081 in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:402:17
#5 0x55fd547568db in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
#6 0x55fd54756163 in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#7 0x55fd54651fdf in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3

and

0x60200000508f is located 1 bytes to the left of 1-byte region [0x602000005090,0x602000005091)
allocated by thread T0 here:
    #0 0x55859606213d in malloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    #1 0x5585961742dc in read_var_array third_party/stcorp_coda/libcoda/coda-netcdf.c:542:16
    #2 0x558596170f08 in coda_netcdf_reopen third_party/stcorp_coda/libcoda/coda-netcdf.c:901:9
    #3 0x558596168403 in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:402:17
    #4 0x5585961650e8 in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
    #5 0x558596164a2a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
    #6 0x55859607b231 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3

Reproduces at 06fa8ab

Treat last dimension of HDF4 character array as string length

For netCDF we already treat the last dimension of a multi-dimensional character array as a string length (if that dimension is not the appendable dimension).
We should add the same behaviour for HDF4.
This would improve handling of HDF4 character SDS data.
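
In terms of shapes, this means an array of characters with dimensions (N, L) would be exposed as N strings of at most L characters. A small sketch of that reinterpretation; stripping trailing NUL and space padding is an assumption made for the example, not a statement about how CODA treats padding:

```python
def rows_to_strings(char_rows):
    """Collapse the last dimension of a 2-D character array into strings."""
    return [bytes(row).rstrip(b"\x00 ").decode("ascii") for row in char_rows]


# A (3, 4) character array becomes a length-3 array of strings.
names = [b"O3\x00\x00", b"H2O\x00", b"HNO3"]
print(rows_to_strings(names))  # ['O3', 'H2O', 'HNO3']
```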

Native HDF5 backend

Instead of using the HDF5 library, create our own implementation to read HDF5 files.
This will allow much faster access to the data (no need to work with dataspaces, vlen APIs, etc.)

The tricky part will be dealing with compressed data, but this should be similar to how we currently handle this with the CDF backend for zipped data.

Support for MIP_NL__2P ORDER_OF_SPECIES

The ordering of datasets in the MIP_NL__2P codadef is currently fixed even though it should be made dependent on the order of the species as found in ORDER_OF_SPECIES.

With the regex coda expression function we are actually already quite close to supporting this ordering dynamically:

each vmr dataset should use a common product variable that maps from species index to vmr index
this mapping can be calculated by using for each species 'regex("(.*)HCN", str(/sph/order_of_species),1)' and then counting the amount of ',' in the resulting string
to allow this, a coda expression function to count occurrences of one string in another should be introduced (possibly together with an 'index' function that gives the index of the k-th occurrence of one string within another string)
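
The comma-counting idea can be tried out in Python first. The sketch below mirrors the regex from the list above; the species string is an invented example, and species_index is a hypothetical helper, not a CODA expression function.

```python
import re


def species_index(order_of_species, species):
    """Return the position of species by counting commas before its name."""
    match = re.search("(.*)" + re.escape(species), order_of_species)
    if match is None:
        return -1
    return match.group(1).count(",")


order = "H2O,O3,HNO3,CH4,N2O,HCN"  # invented example value
print(species_index(order, "HCN"))  # 5
```

One caveat with plain substring matching: a name that is also the tail of another name (here "O3" inside "HNO3") is matched at the wrong position, so the real expression would probably need to anchor on the ',' delimiters.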

Add descriptions to product variables

Add a 'description' attribute to product variables that can be included in the generated documentation. This allows adding an explanation of what the product variables are used for and how they are derived.

coda_recognize_file_fuzzer: Crash in read_bytes

==614332==ERROR: AddressSanitizer: SEGV on unknown address 0x7f5369c00fff (pc 0x7f536ce755a0 bp 0x7ffdca355690 sp 0x7ffdca354e48 T0)
==614332==The signal is caused by a READ memory access.
#0 0x7f536ce755a0 in __memmove_ssse3_back (/usr/grte/v4/lib64/libc.so.6+0xc35a0)
#1 0x5597d854d4cc in read_bytes third_party/stcorp_coda/libcoda/coda-read-bytes.h:79:9
#2 0x5597d854e2d2 in read_GDR third_party/stcorp_coda/libcoda/coda-cdf.c:1107:9
#3 0x5597d854de33 in read_file third_party/stcorp_coda/libcoda/coda-cdf.c:1297:9
#4 0x5597d854d044 in coda_cdf_reopen third_party/stcorp_coda/libcoda/coda-cdf.c:1380:9
#5 0x5597d854ca9a in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:396:17
#6 0x5597d85497a8 in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
#7 0x5597d85490ea in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#8 0x5597d845f8f1 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3

testcase-5685773629652992.zip

Add support for HDF5 null dataspace for attributes

HDF5 uses the null dataspace H5S_NULL to indicate 'empty' attributes. This is the way to represent empty string values for attributes in HDF5.

We should support this specific case in CODA for scalar string attributes to return an empty string.

Problem when installing coda 2.21 with ./configure on MacOS 10.14.6

The installation process with ./configure fails with the error:
"checking the archiver (ar) interface... unknown
configure: error: could not determine ar interface"

can not import coda

I installed codasetup-win64-2.17.2.exe and ran python setup.py install,
but I cannot import the coda package.
The error message is as follows:


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\CODA\python\coda\__init__.py", line 22, in <module>
    from .codapython import *
  File "D:\CODA\python\coda\codapython.py", line 22, in <module>
    from .codac import *
  File "D:\CODA\python\coda\codac.py", line 26, in <module>
    _codac = swig_import_helper()
  File "D:\CODA\python\coda\codac.py", line 22, in swig_import_helper
    _mod = imp.load_module('_codac', fp, pathname, description)
  File "F:\User\iht\Program Files\Anaconda3\lib\imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "F:\User\iht\Program Files\Anaconda3\lib\imp.py", line 342, in load_dynamic
    return _load(spec)

OS: windows 10
python: anaconda python 3.5

Can this package not be used with Python 3?

Install python bindings via pypi

Are there any plans to make the python bindings available on pypi? I know there is already another project with the same name and also some similar project names are assigned.

It would be very convenient to install the bindings using pip (e.g. in a virtual environment) while the C library is installed by the system's package manager. libcoda-dev is in Debian buster.

coda_recognize_file_fuzzer: Direct-leak in coda_mem_record_new

==231736==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #1 0x55f342eba1a5 in coda_mem_record_new third_party/stcorp_coda/libcoda/coda-mem-type.c:438:31
    #2 0x55f342e91d55 in read_grib1_message third_party/stcorp_coda/libcoda/coda-grib.c:1712:19
    #3 0x55f342e84eb2 in coda_grib_reopen third_party/stcorp_coda/libcoda/coda-grib.c:3105:17
    #4 0x55f342ec76db in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:410:17
    #5 0x55f342ec44c8 in open_file third_party/stcorp_coda/libcoda/coda-product.c:552:9
    #6 0x55f342ec3e0a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:596:9
    #7 0x55f342dd9c91 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3

Indirect leak of 256 byte(s) in 1 object(s) allocated from:
    #1 0x55f342f2266c in coda_hashtable_insert_name third_party/stcorp_coda/libcoda/hashtable.c:166:32
    #2 0x55f342f0eec4 in coda_type_record_insert_field third_party/stcorp_coda/libcoda/coda-type.c:1331:9
    #3 0x55f342f0df17 in coda_type_record_add_field third_party/stcorp_coda/libcoda/coda-type.c:1427:12
    #4 0x55f342e88a30 in grib_init third_party/stcorp_coda/libcoda/coda-grib.c:658:5
    #5 0x55f342e84642 in coda_grib_reopen third_party/stcorp_coda/libcoda/coda-grib.c:3010:9
    #6 0x55f342ec76db in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:410:17
    #7 0x55f342ec44c8 in open_file third_party/stcorp_coda/libcoda/coda-product.c:552:9
    #8 0x55f342ec3e0a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:596:9
    #9 0x55f342dd9c91 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3

There were a lot more indirect leaks.

testcase-5060985932480512.zip

Add mechanism to raise warnings

Add a way to raise warnings using coda_report_warning() and coda_set_warning_handler() (similar to the way it is done in HARP).

The idea is to use this when CODA opens a product using a self-describing data format to raise warnings for content that is not supported by CODA. A warning handler will be set for codacheck by default to report on these warnings. For other tools like codadump this could potentially become a command line option.
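
The handler pattern itself is small. The sketch below shows it in Python with function names loosely mirroring the proposed C functions; the signatures and the "ignore by default" behaviour are assumptions for illustration, not the planned API.

```python
def _ignore_warning(message):
    """Default handler: drop the warning."""
    return None


_warning_handler = _ignore_warning


def set_warning_handler(handler):
    """Install handler(message); pass None to restore the default."""
    global _warning_handler
    _warning_handler = handler if handler is not None else _ignore_warning


def report_warning(fmt, *args):
    """Format a warning printf-style and pass it to the current handler."""
    _warning_handler(fmt % args if args else fmt)


# A checking tool could install a handler that collects or prints warnings.
collected = []
set_warning_handler(collected.append)
report_warning("unsupported %s at %s", "compound type", "/group/dataset")
print(collected)  # ['unsupported compound type at /group/dataset']
```

Other tools would leave the default in place and stay silent unless a command line option installs a handler.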

Allow codadd to create .codadef files

We currently have a shell script that uses grep/set/sort/head to determine the last modification date and then uses zip to create the .codadef file (from a directory of .xml files).
The problem is that this script only works on Linux/macOS systems but not on Windows.

It would be more convenient to have this functionality provided by the codadd tool:

it would allow us to build .codadef files on Windows
other software that wants to create .codadef files can then use a tool that already comes installed with CODA (instead of having to bundle their own version of a codadef.sh script).
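
Until codadd offers this, the zip step is straightforward to do portably. A minimal sketch using only the Python standard library: create_codadef is a hypothetical helper, the use of deflate compression is an assumption, and the logic that derives the last modification date for the file name is left out.

```python
import zipfile
from pathlib import Path


def create_codadef(xml_dir, output_path):
    """Zip every .xml file found under xml_dir into output_path; return archived names."""
    xml_dir = Path(xml_dir)
    names = []
    with zipfile.ZipFile(output_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for xml_file in sorted(xml_dir.rglob("*.xml")):
            # Store paths relative to the definition directory, with '/' separators.
            arcname = xml_file.relative_to(xml_dir).as_posix()
            archive.write(xml_file, arcname)
            names.append(arcname)
    return names
```

Because it avoids grep, sort and zip, the same code runs unchanged on Linux, macOS and Windows.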

Overflow of section_size possible in coda-grib.c

section_size = ((buffer[0] * 256 + buffer[1]) * 256 + buffer[2]) * 256 + buffer[3];

Gives:

third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer crash-af51b1fb3ec5c81523e08e7e7c0a567c34650366
Running the target on file crash-af51b1fb3ec5c81523e08e7e7c0a567c34650366 (97 bytes)
third_party/stcorp_coda/libcoda/coda-grib.c:2216:59: runtime error: signed integer overflow: 16777215 * 256 cannot be represented in type 'int'
    #0 0x559fc57e6896 in read_grib2_message third_party/stcorp_coda/libcoda/coda-grib.c:2216:59
    #1 0x559fc57d3e0f in coda_grib_reopen third_party/stcorp_coda/libcoda/coda-grib.c:3138:17
    #2 0x559fc5807eb6 in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:408:17
    #3 0x559fc5805a5d in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
    #4 0x559fc58054dc in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
    #5 0x559fc576024c in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:16:3

Proposed solution:

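#include <limits.h>
#include <stdint.h>

// SNIP

    /* Widen the first operand so the whole expression is evaluated as uint64_t;
     * without the cast it is still computed as int and overflows before the assignment. */
    uint64_t section_size_tmp =
        (((uint64_t)buffer[0] * 256 + buffer[1]) * 256 + buffer[2]) * 256 + buffer[3];
    if (section_size_tmp > UINT_MAX)
    {
        return -1;
    }
    section_size = section_size_tmp;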
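
As an alternative illustration of the same idea, the four bytes can be combined with shifts on an unsigned 32-bit type, which cannot trigger signed overflow. This is only a minimal sketch: the helper name read_be_u32 is hypothetical and not part of CODA, and it assumes buffer holds unsigned 8-bit values.

```c
#include <stdint.h>

/* Hypothetical helper (not part of CODA): combine four big-endian bytes
 * into an unsigned 32-bit value. Each byte is widened to uint32_t before
 * shifting, so no intermediate result is computed as a signed int. */
uint32_t read_be_u32(const uint8_t *buffer)
{
    return ((uint32_t)buffer[0] << 24) | ((uint32_t)buffer[1] << 16) |
           ((uint32_t)buffer[2] << 8) | (uint32_t)buffer[3];
}
```

The caller would still need to check the result against the remaining file size before using it as a section size.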

Add codadef definitions for self-describing formats

Allow codadefs to contain format definitions for self-describing formats: netCDF/CDF/HDF4/HDF5/...

A coda_product_check() should then also be able to check this definition against the actual format of the file.

Overflow in Ni * Nj in coda-grib.c

third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer crash-de1b3e5847b6bf90abb4b39bf5ce15b52c765ce6 

Running the target on file crash-de1b3e5847b6bf90abb4b39bf5ce15b52c765ce6 (91 bytes)
third_party/stcorp_coda/libcoda/coda-grib.c:1743:35: runtime error: signed integer overflow: 65500 * 56540 cannot be represented in type 'int'
    #0 0x5649e8feddf6 in read_grib1_message third_party/stcorp_coda/libcoda/coda-grib.c:1743:35
    #1 0x5649e8fdfc0d in coda_grib_reopen third_party/stcorp_coda/libcoda/coda-grib.c:3115:17
    #2 0x5649e9013ec6 in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:408:17
    #3 0x5649e9011a6d in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
    #4 0x5649e90114ec in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
    #5 0x5649e8f6c22c in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:16:3

This is the trouble:

            if (Ni != 65535 && Nj != 65535)
            {
                num_elements = Ni * Nj;
            }

Ni and Nj are both int values.

A possible fix:

            if (Ni != 65535 && Nj != 65535)
            {
                num_elements = Ni * (long)Nj;
            }
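
A sketch of the widening approach as a standalone function is given below. The function name grid_num_elements is hypothetical. It uses int64_t rather than long because long is only 32 bits wide on 64-bit Windows, and it assumes that int is at most 32 bits so that the product always fits.

```c
#include <stdint.h>

/* Hypothetical helper (not part of CODA): compute Ni * Nj in 64 bits.
 * Returns 0 and stores the product in *num_elements, or returns -1 if
 * either dimension is negative. Assumes int is at most 32 bits wide. */
int grid_num_elements(int Ni, int Nj, int64_t *num_elements)
{
    if (Ni < 0 || Nj < 0)
    {
        return -1;
    }
    /* Widen both operands first so the multiplication itself is 64-bit. */
    *num_elements = (int64_t)Ni * (int64_t)Nj;
    return 0;
}
```

With the values from the crash report (65500 and 56540) this yields 3703370000, which does not fit in a 32-bit int, so the variable receiving the result also has to be a 64-bit type.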

Allow codadefs to reinterpret content of self-describing formats

This issue covers the general solution for #13.

The idea is to allow codadefs to be used to (re)interpret data from HDF4, HDF5, and netCDF files (for CDF, GRIB, RINEX, and SP3 this currently does not seem needed).

This will allow:

single-element arrays to be treated as scalars (or vice versa)
addition of special types, such as time values
addition of conversions (unit conversion and/or missing value mapping)
potentially interpreting text data using a specific structure (similar to XML)

We should combine this with a global CODA option that selects whether products are read using the dynamic format or using the codadef format. If the dynamic format is used, a codacheck will still be possible and can report all issues instead of stopping at the first issue found (which is what will happen when (re)interpreting the product using the codadef as format).

Add special product variables for caching of byte offsets of top-level array elements

To improve the reading performance of ASCII data and in general of 'array of record' style products (e.g. binary Level 0 data and GOME-2 L1 data) we should find a way to have the byte offsets of top-level array elements cached.

The idea is to do this using product variables. The problem with product variables in their current form is that 1) the length of the variable has to be decided before initialization and 2) all values of the variable have to be assigned in a single full initialization, which requires a second pass through the product.

We could improve the performance by allowing these two steps to be performed using lazy initialization:

allow product variables to start with length 0 and automatically resize the product variable array when higher indexed values are read
allow element-wise initialization (setting the index variable i to the array element that needs to be initialized)
use a special 'missing value' (e.g. -1) that will trigger the initialization of that specific product variable array element

Next to this, we then also need to introduce an 'offset expression' for array types (similar to the 'offset expression' for record fields).
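
The lazy initialization described above could be sketched as follows. This is not CODA code: the offset_cache type, the OFFSET_UNSET sentinel, and the callback signature are all assumptions made for illustration, and fixed_size_offset only stands in for the evaluation of a real offset expression.

```c
#include <stdint.h>
#include <stdlib.h>

#define OFFSET_UNSET ((int64_t)-1) /* 'missing value' that triggers initialization */

/* Hypothetical callback that computes the byte offset of one array element. */
typedef int64_t (*offset_fn)(long index, void *context);

typedef struct
{
    int64_t *offset; /* cached byte offsets, OFFSET_UNSET where not yet known */
    long length;     /* current length; the cache starts with length 0 */
} offset_cache;

/* Example stand-in for an offset expression: fixed-size records. */
typedef struct
{
    int64_t record_size;
    long calls; /* counts how often an offset really had to be computed */
} fixed_size_context;

int64_t fixed_size_offset(long index, void *context)
{
    fixed_size_context *ctx = (fixed_size_context *)context;

    ctx->calls++;
    return index * ctx->record_size;
}

/* Return the offset for 'index', growing the cache and initializing only
 * that single element when needed. Returns 0 on success, -1 on error. */
int offset_cache_get(offset_cache *cache, long index, offset_fn compute, void *context, int64_t *result)
{
    if (index < 0)
    {
        return -1;
    }
    if (index >= cache->length)
    {
        long new_length = index + 1;
        long i;
        int64_t *grown = (int64_t *)realloc(cache->offset, (size_t)new_length * sizeof(int64_t));

        if (grown == NULL)
        {
            return -1;
        }
        for (i = cache->length; i < new_length; i++)
        {
            grown[i] = OFFSET_UNSET;
        }
        cache->offset = grown;
        cache->length = new_length;
    }
    if (cache->offset[index] == OFFSET_UNSET)
    {
        cache->offset[index] = compute(index, context);
    }
    *result = cache->offset[index];
    return 0;
}
```

A second request for the same index is served from the cache without calling the callback again, which is the behaviour that would remove the need for a separate initialization pass.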

Allow first argument of coda fetch function to be a file path

For cases where you only want to call coda.fetch (or coda_fetch) once, it would be very useful if coda.fetch performed the open/close of the product itself.

pf = coda.open('filename')
data = coda.fetch(pf)
coda.close(pf)

would then become a single

data = coda.fetch('filename')

This means that the first parameter, which currently has to be a product handle or cursor, should also be allowed to be a string providing a file path.

This change applies to the Python, IDL, and MATLAB interfaces only.

coda_recognize_file_fuzzer: Direct-leak in coda_mem_record_new

Reproduces at 06fa8ab

    #1 0x558891712f75 in coda_mem_record_new /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-mem-type.c:438:31
    #2 0x5588916ddddf in coda_grib_reopen /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-grib.c:3092:25
    #3 0x55889172035b in reopen_with_backend /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-product.c:408:17
    #4 0x55889171d148 in open_file /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-product.c:550:9
    #5 0x55889171ca8a in coda_recognize_file /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-product.c:594:9
    #6 0x558891633291 in LLVMFuzzerTestOneInput /proc/self/cwd/third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3

testcase-5911428917100544-936c5358e3fc46c11d68.zip

coda_recognize_file_fuzzer: Integer-overflow in read_grib2_message

third_party/stcorp_coda/libcoda/coda-grib.c:2305:74: runtime error: signed integer overflow: 16770703 * 256 cannot be represented in type 'int'
    #0 0x55b0fb8c0e80 in read_grib2_message third_party/stcorp_coda/libcoda/coda-grib.c:2305:74
    #1 0x55b0fb8ad540 in coda_grib_reopen third_party/stcorp_coda/libcoda/coda-grib.c:3131:17
    #2 0x55b0fb8efb4b in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:408:17
    #3 0x55b0fb8ec938 in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
    #4 0x55b0fb8ec27a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
    #5 0x55b0fb802151 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3

SUMMARY: UndefinedBehaviorSanitizer: signed-integer-overflow third_party/stcorp_coda/libcoda/coda-grib.c:2305

testcase-6232605866852352.zip

netcdf attributes & codadef (scalar vs. array)

In CODA, the dynamic definitions for netCDF use a scalar for a netCDF attribute if it has only one element, and an array if it has more than one. If we want to create a codadef in which the number of elements can be one or more, we have a problem, since attributes with one element are not read as an array.

The best way forward (which also solves other aspects, such as allowing conversions and introducing 'time' types) is to use codadefs to (re)interpret how netCDF/etc. products are read (just as is done for XML).

We could combine this with a global CODA option that selects whether products are read using the dynamic format or using the codadef format. If the dynamic format is used, a codacheck will still be possible. It can then report all issues instead of stopping at the first issue found (which is what will happen when (re)interpreting the product using the codadef as format).

Add support for GRIB2 data with Data Representation Template 3

This would allow us to read Global Forecast System (GFS) data.

Allow detection expressions for self-describing formats

Allow products with self-describing formats to be recognized using detection expressions in a codadef. Note that this does not require a format definition in the codadef (see #6).

The approach would be to just open the product as normal and then evaluate a sequence of boolean CODA expressions on the product (using the same tree hierarchy of tests as used for matching on the detection block for ascii/binary products).

Compiling CODA fails with Python support

When compiling and making CODA with Python support, I get the following error message when running the make install command:
make install-am
make[1]: Entering directory '/data/hedelt/coda'
w119,451 -python -Ipython -I./python -DPRINTF_ATTR= -o python/codac.c ./python/codac.i
make[1]: w119,451: Command not found
make[1]: [Makefile:3413: python/codac.c] Error 127 (ignored)

I have pulled the latest version of CODA today...

coda_recognize_file_fuzzer: Direct-leak in coda_bin_open

==23560==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 80 byte(s) in 1 object(s) allocated from:
    #0 0x55cc130cab9d in malloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    #1 0x55cc13112de8 in coda_bin_open third_party/stcorp_coda/libcoda/coda-bin.c:237:40
    #2 0x55cc131ce447 in open_file third_party/stcorp_coda/libcoda/coda-product.c:532:9
    #3 0x55cc131cde0a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
    #4 0x55cc130e3c91 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3

Indirect leak of 36 byte(s) in 1 object(s) allocated from:
    #0 0x55cc130b6c41 in strdup third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0x55cc13112eb3 in coda_bin_open third_party/stcorp_coda/libcoda/coda-bin.c:264:30
    #2 0x55cc131ce447 in open_file third_party/stcorp_coda/libcoda/coda-product.c:532:9
    #3 0x55cc131cde0a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
    #4 0x55cc130e3c91 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3

SUMMARY: AddressSanitizer: 116 byte(s) leaked in 2 allocation(s).

testcase-6241787911340032.zip

coda_expression_fuzzer: Direct-leak in integer_constant_new

    #1 0x5577a044e25f in integer_constant_new /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-expr.c:284:12
    #2 0x5577a044d118 in coda_expression_new /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-expr.c:359:20
    #3 0x5577a0490720 in coda_expression_parse /proc/self/coda/libcoda/coda-expr-parser.y:464:28
    #4 0x5577a049376e in coda_expression_from_string /proc/self/coda/libcoda/coda-expr-parser.y:1049:9
    #5 0x5577a03ecfb9 in LLVMFuzzerTestOneInput /proc/self/cwd/third_party/stcorp_coda/fuzz/coda_expression_fuzzer.cc:23:3

from

#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

#include <string>

#include "third_party/absl/cleanup/cleanup.h"
#include "third_party/stcorp_coda/libcoda/coda.h"

int printf_black_hole(const char* fmt, ...) {
  // Discard all output. va_end must be paired with a preceding va_start.
  va_list args;
  va_start(args, fmt);
  va_end(args);
  return 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  coda_init();
  auto done = absl::MakeCleanup([] { coda_done(); });

  const std::string exprstring(reinterpret_cast<const char *>(data), size);

  coda_expression *expr = nullptr;
  coda_expression_from_string(exprstring.c_str(), &expr);

  if (!expr) return 0;

  coda_expression_print(expr, printf_black_hole);
  coda_expression_delete(expr);

  return 0;
}

testcase-5913480749645824.zip

codacheck dumps binary characters on format error in ASCII date/time field

When codacheck encounters a date/time format error, it dumps whatever is in that field as part of the error message. When this is a random binary chunk, strange characters appear and terminal behaviour is affected (control codes printed at the prompt after codacheck is finished).

Example output:

ERROR: date/time argument (???5?
????C??) has an incorrect format at [166]/mph/beg_prod_utc
  ERROR: date/time argument (?) has an incorrect format at [166]/mph/gen_mph_utc
  ERROR: date/time argument (????C??) has an incorrect format at [166]/mph/ref_utc
k???<?R: date/time argument (
?????C?8) has an incorrect format at [166]/mph/asc_utc
  ERROR: date/time argument () has an incorrect format at [167]/mph/beg_prod_utc
  ERROR: date/time argument (?) has an incorrect format at [167]/mph/gen_mph_utc
  ERROR: date/time argument (?6???C??) has an incorrect format at [167]/mph/ref_utc
  ERROR: date/time argument () has an incorrect format at [167]/mph/asc_utc
  ERROR: date/time argument () has an incorrect format at [168]/mph/beg_prod_utc
  ERROR: date/time argument (>?) has an incorrect format at [168]/mph/gen_mph_utc
  ERROR: date/time argument (????????????????????????) has an incorrect format at [168]/mph/ref_utc
  ERROR: date/time argument () has an incorrect format at [168]/mph/asc_utc
  ERROR: incorrect file size (actual size: 235248, calculated: 235079)

~/envisat/products/RA/EMWC $
1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c
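
One way to avoid this would be to escape non-printable bytes before they are placed in the error message. The sketch below is not taken from codacheck; the function name escape_for_message and the \xNN notation are assumptions chosen for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of CODA): copy 'length' bytes from 'in'
 * to 'out', writing every non-printable byte (and the backslash itself)
 * as \xNN. The output is always null-terminated and is truncated at a
 * whole character when it does not fit in 'out_size' bytes.
 * Returns the number of characters written, excluding the terminator. */
size_t escape_for_message(const char *in, size_t length, char *out, size_t out_size)
{
    size_t written = 0;
    size_t i;

    if (out_size == 0)
    {
        return 0;
    }
    for (i = 0; i < length; i++)
    {
        unsigned char c = (unsigned char)in[i];

        if (isprint(c) && c != '\\')
        {
            if (written + 1 >= out_size)
            {
                break; /* no room for the character plus terminator */
            }
            out[written++] = (char)c;
        }
        else
        {
            if (written + 4 >= out_size)
            {
                break; /* no room for \xNN plus terminator */
            }
            snprintf(out + written, out_size - written, "\\x%02X", (unsigned int)c);
            written += 4;
        }
    }
    out[written] = '\0';
    return written;
}
```

With this approach an escape character in the field would show up as the four visible characters \x1B instead of being interpreted by the terminal.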

Add support for netCDF CDF-5

There is a new version 5 flavour of the netCDF classic format called CDF-5.
This adds support for unsigned integers, 64 bit integers, and 64 bit values for counts and lengths.

This was introduced in the netCDF library with version 4.4.0-RC4 via a new mode flag called NC_64BIT_DATA.

It should be rather straightforward to add support for this format in CODA.

Note that the name CDF-5 is somewhat unfortunate, as it invites confusion with the actual CDF format.
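
For detection, the classic netCDF variants can be told apart by the fourth byte of the file: the file starts with the characters 'C', 'D', 'F' followed by a version byte of 1 (classic), 2 (64-bit offset), or 5 (CDF-5). A minimal sketch is shown below; the function name netcdf_classic_version is hypothetical and this is not how CODA currently structures its detection code.

```c
#include <stddef.h>

/* Hypothetical helper (not part of CODA): inspect the first bytes of a
 * file. Returns 1 for netCDF classic, 2 for 64-bit offset, 5 for CDF-5,
 * and -1 if the bytes are not a recognized classic netCDF header. */
int netcdf_classic_version(const unsigned char *magic, size_t size)
{
    if (size < 4 || magic[0] != 'C' || magic[1] != 'D' || magic[2] != 'F')
    {
        return -1;
    }
    switch (magic[3])
    {
        case 1:
        case 2:
        case 5:
            return magic[3];
        default:
            return -1;
    }
}
```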

Have rinex/sp3 backend use the 'raw' product for reading

The rinex and sp3 backends currently read the file contents using a FILE * instead of using the initial coda_product handle that maps the whole file as a raw binary block (and which is also used for the file format detection).

The problem with using the 'raw' product handle is that the current mapping of rinex/sp3 to the memory backend uses a buffered file pointer (FILE *) together with fgets. To migrate this we would need some function that provides an efficient fgets (or rather, a ‘readline’) for the CODA ‘raw’ product.

This is likely linked to the replacement of AsciiLine/AsciiLineSeparator/AsciiWhitespace in codadefs by a more generic special expression in codadef for ASCII data that is terminated by special characters (e.g. comma-separated, EOL-separated, etc.).
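
A readline-style helper over a memory-mapped block, as mentioned above, could look roughly like the sketch below. The function name raw_readline and its signature are assumptions; the sketch works on a plain (pointer, size) pair and does not use any actual coda_product internals. It returns a pointer into the mapped data instead of copying each line, which avoids the buffer management that fgets requires.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of CODA): find the next line in an
 * in-memory block. On success, *line points at the start of the line
 * inside 'data', *length is its length without the end-of-line
 * characters, *offset is advanced past the line, and 1 is returned.
 * Returns 0 when *offset is already at the end of the data. */
int raw_readline(const char *data, size_t size, size_t *offset, const char **line, size_t *length)
{
    const char *start;
    const char *eol;
    size_t remaining;

    if (*offset >= size)
    {
        return 0;
    }
    start = data + *offset;
    remaining = size - *offset;
    eol = (const char *)memchr(start, '\n', remaining);
    if (eol == NULL)
    {
        /* last line without a terminating newline */
        *length = remaining;
        *offset = size;
    }
    else
    {
        *length = (size_t)(eol - start);
        *offset += *length + 1;
    }
    /* strip a trailing carriage return so CRLF line endings also work */
    if (*length > 0 && start[*length - 1] == '\r')
    {
        (*length)--;
    }
    *line = start;
    return 1;
}
```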

Add an R interface

Are there plans or previous requests for an R interface for CODA?

coda_recognize_file_fuzzer: Segv on unknown address in coda_dynamic_type_delete

==1496606==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x55c53036df1e bp 0x7ffe48aa4330 sp 0x7ffe48aa4320 T0)
==1496606==The signal is caused by a READ memory access.
==1496606==Hint: this fault was caused by a dereference of a high value address (see register values below).  Dissassemble the provided pc to learn which register was used.
    #0 0x55c53036df1e in coda_dynamic_type_delete third_party/stcorp_coda/libcoda/coda-cursor.c:160:19
    #1 0x55c5304a9d05 in parser_info_cleanup third_party/stcorp_coda/libcoda/coda-xml-parser.c:280:13
    #2 0x55c5304a944a in coda_xml_parse third_party/stcorp_coda/libcoda/coda-xml-parser.c:851:13
    #3 0x55c5304a8495 in coda_xml_reopen third_party/stcorp_coda/libcoda/coda-xml.c:77:9
    #4 0x55c530453aab in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:368:17
    #5 0x55c5304508c8 in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
    #6 0x55c53045020a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
    #7 0x55c530366a11 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19

Trouble happens here:

void coda_dynamic_type_delete(coda_dynamic_type *type)
{
    if (type == NULL)
    {
        return;
    }

    switch (type->backend)  // <-- backend is not valid
    {

testcase-5067049788768256.zip

Allow GRIB files to contain combination of GRIB1 and GRIB2 messages

It happens in practice that GRIB files contain a mix of GRIB1 and GRIB2 messages.

To support this we will have to:

Combine coda_format_grib1 and coda_format_grib2 into a single coda_format_grib
Introduce support for unions in the memory backend of CODA
Change the CODA type mapping of GRIB files from an array of records (with each record being a GRIB message) into an array of unions, with each union having grib1 and grib2 fields pointing to the GRIB1 or GRIB2 record (and where only one of them will be populated)
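
Deciding per message which union field to populate only requires the first bytes of each message: both editions start with the characters 'GRIB' and store the edition number in octet 8. In GRIB1 the total message length is in octets 5-7, in GRIB2 it is in octets 9-16. The sketch below illustrates this; the function name grib_message_info is hypothetical, and it does not handle non-standard length encodings that some producers use for very large GRIB1 messages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of CODA): inspect the start of a GRIB
 * message. Returns the edition (1 or 2) and stores the total message
 * length in *message_length, or returns -1 if the header is too short
 * or not recognized. */
int grib_message_info(const uint8_t *header, size_t size, uint64_t *message_length)
{
    int i;

    if (size < 8 || header[0] != 'G' || header[1] != 'R' || header[2] != 'I' || header[3] != 'B')
    {
        return -1;
    }
    if (header[7] == 1)
    {
        /* GRIB1: 3-byte big-endian length in octets 5-7 */
        *message_length = ((uint64_t)header[4] << 16) | ((uint64_t)header[5] << 8) | (uint64_t)header[6];
        return 1;
    }
    if (header[7] == 2)
    {
        if (size < 16)
        {
            return -1;
        }
        /* GRIB2: 8-byte big-endian length in octets 9-16 */
        *message_length = 0;
        for (i = 8; i < 16; i++)
        {
            *message_length = (*message_length << 8) | (uint64_t)header[i];
        }
        return 2;
    }
    return -1;
}
```

A reader for mixed files could call this at the start of every message, populate the grib1 or grib2 field accordingly, and use the returned length to find the next message.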