stcorp / coda
The Common Data Access toolset
Home Page: http://stcorp.github.io/coda/doc/html/index.html
License: BSD 3-Clause "New" or "Revised" License
Copyright (C) 2007-2024 S[&]T, The Netherlands

CODA 2.25.2 Release Notes

CODA is the Common Data Access framework that allows reading of scientific data from various data formats, including structured ascii, structured binary, XML, netCDF, CDF, HDF4, HDF5, GRIB, RINEX and SP3. It provides a single consistent hierarchical view on data independent of the underlying storage format.

CODA is used as a core component in various ESA software among which the ESA Atmospheric Toolbox (BEAT) and the Broadview Radar Altimetry Toolbox (BRAT).

The CODA software package comes with interfaces for C, Fortran, IDL, MATLAB, Python, and Java and several useful command-line tools.

In order to make use of the data reading facilities of CODA, you will need to have the CODA product format definition files (.codadef files) for the data products that you want to access. It is important to note that the CODA software package does not come with any product format definition files itself! To get access to .codadef files have a look at the software packages that make use of CODA.

For files in netCDF, CDF, HDF4, HDF5, GRIB, RINEX, or SP3 format you can use CODA without any .codadef files, since for these formats CODA either comes with a built in definition of the format or CODA determines the format from the file itself.

Changes
=======
An overview of the changes in this release can be found in the CHANGES file.

Installation
============
Installation instructions can be found in the INSTALL file.

Documentation
=============
Full documentation in HTML is included with the CODA software.
A version matching the latest development status on GitHub can be viewed at:
http://stcorp.github.io/coda/doc/html/index.html

Download
========
The latest release of CODA can be downloaded from the CODA GitHub website:
https://github.com/stcorp/coda/releases

If you encounter any issues with CODA or if you would like to see certain functionality added then create a topic on the Atmospheric Toolbox Forum:
https://forum.atmospherictoolbox.org/

CODA Developers
S[&]T, The Netherlands
coda.time_double_to_parts(0.0)
[2000, 1, 1, 0, 0, 0, 0]
coda.time_double_to_parts_utc(0.0)
[1999, 12, 31, 23, 59, 28, 0]
Since the reference time of CODA is 2000-01-01, these two functions should return the same time for 0.0 s.
If the difference is meant to account for leap seconds, it should probably be only 2 seconds: during the MIPAS mission a leap second was inserted only twice (2005-12-31 and 2008-12-31).
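For reference, the 32-second difference in the output above matches the TAI-UTC offset that was in effect on 2000-01-01. The sketch below assumes (this is not stated in the report) that the input is a count of TAI seconds since 2000-01-01 00:00:00 and that the _utc variant subtracts the accumulated leap seconds. It uses only the Python standard library, not CODA, and the names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: 0.0 means 2000-01-01 00:00:00 on the TAI time scale.
TAI_EPOCH = datetime(2000, 1, 1)

# TAI - UTC was 32 s from 1999-01-01 until the leap second at the end of 2005.
TAI_MINUS_UTC_AT_EPOCH = timedelta(seconds=32)


def epoch_as_utc_parts():
    """Return the epoch as [year, month, day, hour, minute, second, microsecond] in UTC."""
    utc = TAI_EPOCH - TAI_MINUS_UTC_AT_EPOCH
    return [utc.year, utc.month, utc.day,
            utc.hour, utc.minute, utc.second, utc.microsecond]


parts = epoch_as_utc_parts()
```

Under that assumption the two leap seconds inserted during the MIPAS mission would come on top of the 32 seconds, rather than being the whole difference.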
Documentation request:
This impacts changes that I'd like to send to coda. Does coda really need to support compilers that are more than 8 years old? Knowing the minimum supported versions (e.g. Visual Studio >= 2012, gcc >= ?, clang >= ?, mingw >= ?) changes what the code has to do.
e.g. for my local build (with bazel and clang/llvm), I have to remove a swath of coda.h and replace it with just #include <stdint.h> (without a surrounding #ifdef HAVE_STDINT_H). I'm happy to share that change if it doesn't break the requirements of coda. In my local code I can assume that stdint.h works, and I'm wondering what your assumptions are.
And does the code have to be C89, or can C99 features be used? I presume you don't support any C11 features.
Submitting pull requests when there aren't any automatic tests makes me nervous.
rm -rf build-ninja/
mkdir -p build-ninja && cd build-ninja && cmake -GNinja .. && cmake --build . && ctest .
[169/169] Linking C executable codadd
Test project /home/schwehr/src/coda/build-ninja
No tests were found!!!
find . | grep -i test
./java/CodaTest.java
And CodaTest.java is not a unittest.
If you want a starter test... here is a googletest based draft (yes, I work for Google) for ziparchive. Sadly, it's only 50% coverage of ziparchive.c.
// Copyright 2018 Google Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <stddef.h>
#include <string>
#include <vector>
// TODO(schwehr): These includes need to change.
#include "googletest.h"
#include "gunit.h"
#include "logging.h"
#include "path.h"
extern "C" {
#include "third_party/stcorp_coda/libcoda/ziparchive.h"
}
namespace {
const char kTestData[] = "third_party/stcorp_coda/test/testdata/";
void handle_ziparchive_error(const char *message, ...) { LOG(INFO) << message; }
TEST(ZiparchiveTest, DoesNotExist) {
  const char filepath[] = "/does/not/exist.zip";
  za_file *zf = coda_za_open(filepath, handle_ziparchive_error);
  ASSERT_EQ(nullptr, zf);
}
TEST(ZiparchiveTest, NotAZipFile) {
  // Try to open a file containing some text.
  const string filepath =
      file::JoinPath(FLAGS_test_srcdir, kTestData, "not_a_zip.zip");
  za_file *zf = coda_za_open(filepath.c_str(), handle_ziparchive_error);
  ASSERT_EQ(nullptr, zf);
}
// Try a simple zip file containing a single uncompressed file named "1" that
// contains the string "1\n".
class ReadSimpleZipTest : public ::testing::Test {
 protected:
  void SetUp() override {
    filepath_ = file::JoinPath(FLAGS_test_srcdir, kTestData, "1.zip");
    zf_ = coda_za_open(filepath_.c_str(), handle_ziparchive_error);
    ASSERT_NE(nullptr, zf_);
  }
  void TearDown() override { coda_za_close(zf_); }
  string filepath_;
  za_file *zf_ = nullptr;
};
TEST_F(ReadSimpleZipTest, Filename) {
  EXPECT_STREQ(filepath_.c_str(), coda_za_get_filename(zf_));
}
TEST_F(ReadSimpleZipTest, NumEntries) {
  EXPECT_EQ(1, coda_za_get_num_entries(zf_));
}
TEST_F(ReadSimpleZipTest, NonExistingEntry) {
  EXPECT_EQ(nullptr, za_get_entry_by_index(zf_, 1));
}
TEST_F(ReadSimpleZipTest, CheckFileEntry) {
  // entry is owned by zf_.
  za_entry *entry = za_get_entry_by_index(zf_, 0);
  ASSERT_NE(nullptr, entry);
  EXPECT_STREQ("1", za_get_entry_name(entry));
  constexpr size_t kFileSize = 2;
  ASSERT_EQ(kFileSize, za_get_entry_size(entry));
  // The file "1" contains "1\n".
  std::vector<char> buf(kFileSize + 1, '\0');
  EXPECT_EQ(0, za_read_entry(entry, &buf[0]));
  EXPECT_EQ('1', buf[0]);
  EXPECT_EQ('\n', buf[1]);
}
TEST_F(ReadSimpleZipTest, GetEntryByName_DoesNotExist) {
  EXPECT_EQ(nullptr, za_get_entry_by_name(zf_, "does-not-exist"));
}
TEST_F(ReadSimpleZipTest, GetEntryByName_Exists) {
  // entry returned by za_get_entry_by_name owned by zf_
  EXPECT_NE(nullptr, za_get_entry_by_name(zf_, "1"));
}
} // namespace
The filename() expression does not properly remove the directory component of the file path on Windows.
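One plausible cause (an assumption, not confirmed by the report) is a basename step that only splits on '/'. The Python sketch below is not CODA's implementation; it only illustrates, with the standard posixpath and ntpath modules, how the two separator conventions treat the same path. The sample path is made up.

```python
import ntpath
import posixpath

# Hypothetical Windows path, used only for illustration.
windows_path = r"C:\data\products\example.nc"

# Splitting on '/' only: no separator is found, so the whole path is returned.
posix_result = posixpath.basename(windows_path)

# Splitting on both '\\' and '/' strips the drive and directories.
windows_result = ntpath.basename(windows_path)
```

Here posix_result still contains the full path, while windows_result is just the file name.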
We compile coda with the following configure command:
./configure --prefix=$INSTALLPATH --disable-shared --with-hdf4 --with-hdf5
We want to have a static library that we can link to our final binary.
In the final link step we link coda, then hdf4, then hdf5, then libz, and get the following error:
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /data/upas2-resources/inst-eval/lib64/libz.a(inflate.o): in function inflateValidate:
inflate.c:(.text+0x3468): multiple definition of inflateValidate; /data/upas2-resources/inst-eval/lib64/libcoda.a(libz_internal_la-inflate.o):/data/zimm_wa/projects/dockertest/dockerfile2/upaslibcompile/coda-2.21/libcoda/zlib/inflate.c:109: first defined here
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /data/upas2-resources/inst-eval/lib64/libz.a(inflate.o): in function inflateCodesUsed:
inflate.c:(.text+0x355d): multiple definition of inflateCodesUsed; /data/upas2-resources/inst-eval/lib64/libcoda.a(libz_internal_la-inflate.o):/data/zimm_wa/projects/dockertest/dockerfile2/upaslibcompile/coda-2.21/libcoda/zlib/inflate.c:109: first defined here
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /data/upas2-resources/inst-eval/lib64/libz.a(adler32.o): in function adler32_z:
adler32.c:(.text+0x0): multiple definition of adler32_z; /data/upas2-resources/inst-eval/lib64/libcoda.a(libz_internal_la-adler32.o):/data/zimm_wa/projects/dockertest/dockerfile2/upaslibcompile/coda-2.21/libcoda/zlib/adler32.c:72: first defined here
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /data/upas2-resources/inst-eval/lib64/libz.a(crc32.o): in function crc32_z:
crc32.c:(.text+0xd): multiple definition of crc32_z; /data/upas2-resources/inst-eval/lib64/libcoda.a(libz_internal_la-crc32.o):/data/zimm_wa/projects/dockertest/dockerfile2/upaslibcompile/coda-2.21/libcoda/zlib/crc32.c:207: first defined here
So it seems that some functions in libz.a are also defined in libcoda.a under the same name.
On the other hand, if we don't link against libz, there are some (other) functions missing.
So what can we do? This was never a problem, so it might be new in 2.21. Thanks!
Using
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  FuzzerTemporaryFile temp_file(data, size);
  coda_init();
  const char *product_class = NULL;
  const char *product_type = NULL;
  coda_format format;
  int version;
  coda_recognize_file(temp_file.filename(), NULL, &format, &product_class,
                      &product_type, &version);
  coda_done();
  return 0;
}
I got this crash:
0x0000555556534acf in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=62,
function=0x555555928ae0 <__PRETTY_FUNCTION__.coda_grib_type_delete> "void coda_grib_type_delete(coda_dynamic_type *)") at base/logging.cc:106
#8 0x0000555555cf2bfc in coda_grib_type_delete (type=0x604000002050) at third_party/stcorp_coda/libcoda/coda-grib-type.c:62
#9 0x0000555555ce8ef5 in read_grib1_message (product=0x607000000560, message=<optimized out>, file_offset=<optimized out>) at third_party/stcorp_coda/libcoda/coda-grib.c:1727
#10 0x0000555555cdc85e in coda_grib_reopen (product=<optimized out>) at third_party/stcorp_coda/libcoda/coda-grib.c:3103
#11 0x0000555555d10a77 in reopen_with_backend (product_file=0x10418, format=66584) at third_party/stcorp_coda/libcoda/coda-product.c:408
#12 0x0000555555d0e61e in open_file (filename=<optimized out>, product_file=<optimized out>, force_binary=<optimized out>) at third_party/stcorp_coda/libcoda/coda-product.c:550
#13 0x0000555555d0e09d in coda_recognize_file (filename=<optimized out>, file_size=<optimized out>, file_format=<optimized out>, product_class=<optimized out>,
product_type=<optimized out>, version=<optimized out>) at third_party/stcorp_coda/libcoda/coda-product.c:594
#14 0x0000555555c68ced in LLVMFuzzerTestOneInput (data=<optimized out>, size=<optimized out>) at third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:16
The assert is here:
void coda_grib_type_delete(coda_dynamic_type *type)
{
assert(type != NULL);
assert(type->backend == coda_backend_grib); // <--- this assert was hit
Switching coda_grib_type_delete to coda_mem_type_delete lets the fuzzer PoC run without crashing.
/* data representation type is Latitude/Longitude Grid */
gds = coda_mem_record_new((coda_type_record *)grib_type[grib1_grid], NULL);
NV = buffer[3];
gtype = grib_type[grib1_numberOfVerticalCoordinateValues];
type = (coda_dynamic_type *)coda_mem_uint8_new((coda_type_number *)gtype, NULL, cproduct, NV);
coda_mem_record_add_field(gds, "numberOfVerticalCoordinateValues", type, 0);
PVL = buffer[4];
gtype = grib_type[grib1_dataRepresentationType];
type = (coda_dynamic_type *)coda_mem_uint8_new((coda_type_number *)gtype, NULL, cproduct, buffer[5]);
coda_mem_record_add_field(gds, "dataRepresentationType", type, 0);
if (read_bytes(product->raw_product, file_offset, 26, buffer) < 0)
{
    coda_mem_type_delete((coda_dynamic_type *)gds);
    // coda_grib_type_delete((coda_dynamic_type *)gds); <----- wrong type
    return -1;
}
Hi,
the INSTALL document states:
- If you want to use the HDF4 features of CODA then you will need to have a
recent version of HDF4 installed (for building the source package on
Windows you will need to have version 4.2.11 of HDF).
You will also need the additional required libraries libjpeg, szlib, and
zlib.
On some UNIX systems you can install HDF4 via the package manager on your
system. Make sure that this package also installs the netcdf.h include file
on your system. If this is not the case you will also have to install the
netcdf package on your system.
Is the part about netcdf.h still up to date? I have installed coda using --with-hdf4, but linking against a HDF4 library that does not contain netcdf.h (HDF4 configured with --disable-netcdf), in an environment without netCDF package. Still, it seems to work (I could open an HDF4 file and read the names of some groups in the file).
When using the CODA interface in Python 3 I have several issues with indexing.
For my testing I used an Aeolus L2B product file and matching CODADEF file (simply because I am most familiar with that product as I developed it myself). Extracting single values with fetch works as intended:
import coda
coda.version()
fn = 'AE_TEST_ALD_U_N_2B_20181002T001000_20181002T001136_0001.DBL'
fh = coda.open(fn)
coda.fetch(fh, 'meas_map', 0, 'mie_map_of_l1b_meas_used', 0, 'which_l2b_wind_id')
gives the expected output of:
'2.19'
1
Also replacing the data selection by a single string works fine:
coda.fetch(fh, 'meas_map[0]/mie_map_of_l1b_meas_used[0]/which_l2b_wind_id')
However, extracting this data set as 2D array only works for the first form:
data = coda.fetch(fh, 'meas_map', 0, 'mie_map_of_l1b_meas_used', -1, 'which_l2b_wind_id')
data.shape
this gives the expected output of:
(24,)
But this does not work:
data = coda.fetch(fh, 'meas_map[0]/mie_map_of_l1b_meas_used[-1]/which_l2b_wind_id')
It gives me this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/people/matlab/.local/lib/python3.7/site-packages/coda/codapython.py", line 652, in fetch
(intermediateNode,pathIndex) = _traverse_path(cursor,path)
File "/usr/people/matlab/.local/lib/python3.7/site-packages/coda/codapython.py", line 185, in _traverse_path
cursor_goto(cursor,path[pathIndex])
codac.CodacError: coda_cursor_goto(): array index (-1) exceeds array range [0:24)
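Until the string form accepts -1, one workaround is to split the path string into the separate arguments that the first form of fetch already handles. The helper below is a hypothetical sketch, not part of CODA; it only understands field names and single integer indices such as [0] or [-1].

```python
import re

# A step is an optional field name followed by zero or more [int] indices.
_STEP = re.compile(r"^(?P<name>[^\[\]]*)(?P<indices>(\[-?\d+\])*)$")


def split_path(path):
    """Turn 'a[0]/b[-1]/c' into ['a', 0, 'b', -1, 'c']."""
    args = []
    for step in path.strip("/").split("/"):
        match = _STEP.match(step)
        if match is None:
            raise ValueError("unsupported path step: %r" % step)
        if match.group("name"):
            args.append(match.group("name"))
        # Each [n] becomes a separate integer argument.
        args.extend(int(i) for i in re.findall(r"\[(-?\d+)\]", match.group("indices")))
    return args


args = split_path("meas_map[0]/mie_map_of_l1b_meas_used[-1]/which_l2b_wind_id")
```

With the file handle from the example above, coda.fetch(fh, *args) is then the same call as the multi-argument form that returns the (24,) array.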
Additional remarks / feature requests:
Including the doxygen generated documentation in the repository prevents developers having to generate this themselves. But it also allows viewing the full documentation online using the link https://htmlpreview.github.io/?https://raw.githubusercontent.com/stcorp/coda/master/doc/html/index.html
There are a lot of places where it would make static analysis easier if the code used C99 syntax. Doing this is typically pretty easy. While it doesn't seem like much, it makes debugging easier and makes the output of compilers and static analyzers clearer.
I can do some of these as pull requests if that's okay with the project.
e.g.
uint32_t section_size;
// Lots of code
section_size = (((uint32_t)buffer[0] * 256 + buffer[1]) * 256 + buffer[2]) * 256 + buffer[3];
Could become:
// Lots of code
const uint32_t section_size =
(((uint32_t)buffer[0] * 256 + buffer[1]) * 256 + buffer[2]) * 256 + buffer[3];
And
int i;
// Lots of code
for (i = 0; i < num_grib_types; i++)
{
    grib_type[i] = NULL;
}
Could be:
// Lots of code
for (int i = 0; i < num_grib_types; i++)
{
    grib_type[i] = NULL;
}
And lots of places where the scope of things can be reduced. e.g.
cppcheck --enable=all --std=c99 --force --inconclusive coda-grib.c
Checking coda-grib.c ...
coda-grib.c:2165:5: style: Assignment of function parameter has no effect outside the function. [uselessAssignmentArg]
file_offset += 4;
^
coda-grib.c:1549:13: style: The scope of the variable 'intvalue' can be reduced. [variableScope]
int32_t intvalue;
^
coda-grib.c:1681:26: style: The scope of the variable 'gds' can be reduced. [variableScope]
coda_mem_record *gds;
^
coda-grib.c:2316:22: style: The scope of the variable 'raw_data' can be reduced. [variableScope]
uint8_t *raw_data;
^
[SNIP]
This is to support the native grid as used for the CAMS and ERA-Interim ECMWF data.
We need to split the description into something that can be used for e.g. plot axis labels or command line output (short description) and something that allows for a more extensive explanation of the data (long description).
The idea is to replace the current single description parameter with separate short_description and long_description parameters.
Currently, the Python interface of CODA is not using the Object Oriented approach in order to make it consistent with the IDL/MATLAB interface (for the higher level CODA API) or the C interface (for the lower level CODA API).
We should, however, add a CODA Python interface that is more 'pythonic' and that treats Products, Cursors, Types, etc. as classes with methods.
For most CODA functions the mapping to class methods is quite straightforward.
The high-level CODA functions of Python, MATLAB and IDL (such as the fetch() function or field and size inspection functions) currently take a series of record field names and array index references as arguments to indicate a specific part in a product. String parameters indicate field names. Integer (or integer list) parameters indicate array index references.
The idea is to change this such that string parameters represent a cursor path instead of just a fieldname. This would translate into a call to coda_cursor_goto() in the C library.
This would allow dumping of a binary subblock of data to stdout or a file.
This would only work for data where we currently track the byte offset/lengths of the data inside the original file. So at least for pure ascii/binary data files.
But we should also try to see if we can't use this to e.g. dump specific submessages of grib files or dump specific subsections of xml files.
The coda sources as provided on github do not contain the configure script as mentioned in the INSTALL file. Therefore, if a user wishes to install the coda from sources cloned from github some extra instructions are needed, i.e. the autotools should be applied first.
For example these commands worked in my Fedora linux system:
ln -s ./config.h.cmake.in config.h.in
libtoolize
aclocal
autoconf
automake -a
./configure --enable-python --prefix=`pwd`/../coda_install --with-hdf5
make
make install
I think it would be good to add some instructions to the INSTALL file to explain this.
#0 0x7fcee3833548 in __memmove_ssse3_back (/usr/grte/v4/lib64/libc.so.6+0xc2548)
#1 0x55fd54764b8c in read_bytes third_party/stcorp_coda/libcoda/coda-read-bytes.h:79:9
#2 0x55fd54767b46 in read_var_array third_party/stcorp_coda/libcoda/coda-netcdf.c:550:13
#3 0x55fd547645fd in coda_netcdf_reopen third_party/stcorp_coda/libcoda/coda-netcdf.c:901:9
#4 0x55fd5475b081 in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:402:17
#5 0x55fd547568db in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
#6 0x55fd54756163 in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#7 0x55fd54651fdf in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3
and
0x60200000508f is located 1 bytes to the left of 1-byte region [0x602000005090,0x602000005091)
allocated by thread T0 here:
#0 0x55859606213d in malloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x5585961742dc in read_var_array third_party/stcorp_coda/libcoda/coda-netcdf.c:542:16
#2 0x558596170f08 in coda_netcdf_reopen third_party/stcorp_coda/libcoda/coda-netcdf.c:901:9
#3 0x558596168403 in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:402:17
#4 0x5585961650e8 in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
#5 0x558596164a2a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#6 0x55859607b231 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3
Reproduces at 06fa8ab
For netCDF we already treat the last dimension of a multi-dimensional character array as a string length (if that dimension is not the appendable dimension).
We should add the same behaviour for HDF4.
This would improve handling of HDF4 character SDS data.
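As an illustration of the behaviour described above, the sketch below treats the last dimension of a character array as the string length. It is plain Python over a flat bytes buffer, not the CODA backend code; the function name and the NUL-padding convention are assumptions.

```python
def rows_to_strings(flat, shape):
    """Interpret the last dimension of a character array as string length.

    flat  -- the array contents as one bytes object, last dimension fastest
    shape -- the array dimensions, e.g. (2, 4) for two strings of length 4
    """
    *outer, length = shape
    count = 1
    for n in outer:
        count *= n
    if len(flat) != count * length:
        raise ValueError("data size does not match shape")
    # Assumption: shorter strings are padded with trailing NUL bytes.
    return [flat[i * length:(i + 1) * length].rstrip(b"\x00").decode("ascii")
            for i in range(count)]


names = rows_to_strings(b"abc\x00de\x00\x00", (2, 4))
```

A (2, 4) character array is thus exposed as two strings instead of eight single characters; the remaining dimensions are returned flattened in this sketch.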
Instead of using the HDF5 library, create our own implementation to read HDF5 files.
This will allow much faster access to the data (no need to work with dataspaces, vlen APIs, etc.)
The tricky part will be dealing with compressed data, but this should be similar to how we currently handle this with the CDF backend for zipped data.
The ordering of datasets in the MIP_NL__2P codadef is currently fixed even though it should be made dependent on the order of the species as found in ORDER_OF_SPECIES.
With the regex coda expression function we are actually already quite close to supporting this ordering dynamically:
Add a 'description' attribute to product variables that can be included in the generated documentation. This allows adding an explanation of what the product variables are used for and how they are derived.
==614332==ERROR: AddressSanitizer: SEGV on unknown address 0x7f5369c00fff (pc 0x7f536ce755a0 bp 0x7ffdca355690 sp 0x7ffdca354e48 T0)
==614332==The signal is caused by a READ memory access.
#0 0x7f536ce755a0 in __memmove_ssse3_back (/usr/grte/v4/lib64/libc.so.6+0xc35a0)
#1 0x5597d854d4cc in read_bytes third_party/stcorp_coda/libcoda/coda-read-bytes.h:79:9
#2 0x5597d854e2d2 in read_GDR third_party/stcorp_coda/libcoda/coda-cdf.c:1107:9
#3 0x5597d854de33 in read_file third_party/stcorp_coda/libcoda/coda-cdf.c:1297:9
#4 0x5597d854d044 in coda_cdf_reopen third_party/stcorp_coda/libcoda/coda-cdf.c:1380:9
#5 0x5597d854ca9a in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:396:17
#6 0x5597d85497a8 in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
#7 0x5597d85490ea in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#8 0x5597d845f8f1 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3
HDF5 uses the null dataspace H5S_NULL to indicate 'empty' attributes. This is the way to represent empty string values for attributes in HDF5.
We should support this specific case in CODA for scalar string attributes to return an empty string.
Installation process with ./configure breaks down with error :
"checking the archiver (ar) interface... unknown
configure: error: could not determine ar interface"
I installed codasetup-win64-2.17.2.exe and ran python setup.py install.
But I cannot import the coda package.
The error message is as follows:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\CODA\python\coda\__init__.py", line 22, in <module>
from .codapython import *
File "D:\CODA\python\coda\codapython.py", line 22, in <module>
from .codac import *
File "D:\CODA\python\coda\codac.py", line 26, in <module>
_codac = swig_import_helper()
File "D:\CODA\python\coda\codac.py", line 22, in swig_import_helper
_mod = imp.load_module('_codac', fp, pathname, description)
File "F:\User\iht\Program Files\Anaconda3\lib\imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "F:\User\iht\Program Files\Anaconda3\lib\imp.py", line 342, in load_dynamic
return _load(spec)
OS: windows 10
python: anaconda python 3.5
Can this package not be used with Python 3?
Are there any plans to make the python bindings available on pypi? I know there is already another project with the same name and also some similar project names are assigned.
It would be very convenient to install the bindings using pip (e.g. in a virtual environment) while the C library is installed by the system's package manager. libcoda-dev is in Debian buster.
==231736==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 48 byte(s) in 1 object(s) allocated from:
#1 0x55f342eba1a5 in coda_mem_record_new third_party/stcorp_coda/libcoda/coda-mem-type.c:438:31
#2 0x55f342e91d55 in read_grib1_message third_party/stcorp_coda/libcoda/coda-grib.c:1712:19
#3 0x55f342e84eb2 in coda_grib_reopen third_party/stcorp_coda/libcoda/coda-grib.c:3105:17
#4 0x55f342ec76db in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:410:17
#5 0x55f342ec44c8 in open_file third_party/stcorp_coda/libcoda/coda-product.c:552:9
#6 0x55f342ec3e0a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:596:9
#7 0x55f342dd9c91 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3
Indirect leak of 256 byte(s) in 1 object(s) allocated from:
#1 0x55f342f2266c in coda_hashtable_insert_name third_party/stcorp_coda/libcoda/hashtable.c:166:32
#2 0x55f342f0eec4 in coda_type_record_insert_field third_party/stcorp_coda/libcoda/coda-type.c:1331:9
#3 0x55f342f0df17 in coda_type_record_add_field third_party/stcorp_coda/libcoda/coda-type.c:1427:12
#4 0x55f342e88a30 in grib_init third_party/stcorp_coda/libcoda/coda-grib.c:658:5
#5 0x55f342e84642 in coda_grib_reopen third_party/stcorp_coda/libcoda/coda-grib.c:3010:9
#6 0x55f342ec76db in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:410:17
#7 0x55f342ec44c8 in open_file third_party/stcorp_coda/libcoda/coda-product.c:552:9
#8 0x55f342ec3e0a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:596:9
#9 0x55f342dd9c91 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3
There were a lot more indirect leaks.
Add a way to raise warnings using coda_report_warning() and coda_set_warning_handler() (similar to the way it is done in HARP).
The idea is to use this when CODA opens a product using a self-describing data format to raise warnings for content that is not supported by CODA. A warning handler will be set for codacheck by default to report on these warnings. For other tools like codadump this could potentially become a command line option.
We currently have a shell script that uses grep/sed/sort/head to determine the last modification date and then uses zip to create the .codadef file (from a directory of .xml files).
The problem is that this script only works on Linux/macOS systems but not on Windows.
It would be more convenient to have this functionality provided by the codadd tool:
section_size = ((buffer[0] * 256 + buffer[1]) * 256 + buffer[2]) * 256 + buffer[3];
Gives:
third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer crash-af51b1fb3ec5c81523e08e7e7c0a567c34650366
Running the target on file crash-af51b1fb3ec5c81523e08e7e7c0a567c34650366 (97 bytes)
third_party/stcorp_coda/libcoda/coda-grib.c:2216:59: runtime error: signed integer overflow: 16777215 * 256 cannot be represented in type 'int'
#0 0x559fc57e6896 in read_grib2_message third_party/stcorp_coda/libcoda/coda-grib.c:2216:59
#1 0x559fc57d3e0f in coda_grib_reopen third_party/stcorp_coda/libcoda/coda-grib.c:3138:17
#2 0x559fc5807eb6 in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:408:17
#3 0x559fc5805a5d in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
#4 0x559fc58054dc in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#5 0x559fc576024c in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:16:3
Proposed solution:
#include <limits.h>
// SNIP
/* cast the first operand so the arithmetic is done in 64 bits instead of int */
uint64_t section_size_tmp =
    (((uint64_t)buffer[0] * 256 + buffer[1]) * 256 + buffer[2]) * 256 + buffer[3];
if (section_size_tmp > UINT_MAX)
{
    return -1;
}
section_size = section_size_tmp;
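The arithmetic behind the sanitizer report can be checked outside of C. In C the uint8_t operands are promoted to int, so the last multiplication is done in a signed 32-bit type on common platforms. Python integers do not overflow, which makes the sketch below a check of the numbers only, not a fix; the four header bytes are made up to reproduce the value in the report.

```python
INT_MAX = 2**31 - 1  # largest value of a 32-bit signed int


def section_size(buffer):
    """Combine four bytes, most significant first, into one unsigned value."""
    return ((buffer[0] * 256 + buffer[1]) * 256 + buffer[2]) * 256 + buffer[3]


# Hypothetical header bytes that give the intermediate value from the report.
header = bytes([0xFF, 0xFF, 0xFF, 0x00])

# Value just before the last "* 256" step.
intermediate = (header[0] * 256 + header[1]) * 256 + header[2]

# The next multiplication no longer fits in a signed 32-bit int.
overflows_int = intermediate * 256 > INT_MAX
```

Because four bytes can never encode more than 2**32 - 1, doing the computation in an unsigned 32-bit type is already enough to avoid the undefined behaviour.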
See also:
Allow codadefs to contain format definitions for self-describing formats: netCDF/CDF/HDF4/HDF5/...
A coda_product_check() should then also be able to check this definition against the actual format of the file.
third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer crash-de1b3e5847b6bf90abb4b39bf5ce15b52c765ce6
Running the target on file crash-de1b3e5847b6bf90abb4b39bf5ce15b52c765ce6 (91 bytes)
third_party/stcorp_coda/libcoda/coda-grib.c:1743:35: runtime error: signed integer overflow: 65500 * 56540 cannot be represented in type 'int'
#0 0x5649e8feddf6 in read_grib1_message third_party/stcorp_coda/libcoda/coda-grib.c:1743:35
#1 0x5649e8fdfc0d in coda_grib_reopen third_party/stcorp_coda/libcoda/coda-grib.c:3115:17
#2 0x5649e9013ec6 in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:408:17
#3 0x5649e9011a6d in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
#4 0x5649e90114ec in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#5 0x5649e8f6c22c in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:16:3
This is the trouble:
if (Ni != 65535 && Nj != 65535)
{
    num_elements = Ni * Nj;
}
Ni and Nj are both int values.
A possible fix:
if (Ni != 65535 && Nj != 65535)
{
    num_elements = Ni * (long)Nj;
}
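A quick check of the numbers involved, assuming (as the 65535 sentinel suggests, though the report does not say so) that Ni and Nj come from 16-bit fields. This is plain Python arithmetic, not CODA code.

```python
INT32_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1

# Values from the sanitizer report above.
ni, nj = 65500, 56540
product = ni * nj

# Largest product that still passes the "!= 65535" check for 16-bit inputs.
worst_case = 65534 * 65534

fits_in_int32 = product <= INT32_MAX
worst_case_fits_in_uint32 = worst_case <= UINT32_MAX
```

The product from the report does not fit in a signed 32-bit int, but even the worst case fits in an unsigned 32-bit or a 64-bit type. Note that long is only 32 bits wide on 64-bit Windows, so a fixed-width type such as int64_t may be a safer cast than long.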
This issue covers the general solution for #13.
The idea is to allow codadefs to be used to (re)interpret data from HDF4, HDF5 and netCDF formatted files (for CDF, GRIB, RINEX, and SP3 this currently does not seem to be needed).
This will allow:
We should combine this with a global CODA option that will either read products using the dynamic format or using the codadef format. If the dynamic format is used a codacheck will then still be possible and will allow to present all issues instead of stopping at the first issue found (which is what will happen when (re)interpreting the product using the codadef as format).
To improve the reading performance of ASCII data and in general of 'array of record' style products (e.g. binary Level 0 data and GOME-2 L1 data) we should find a way to have the byte offsets of top-level array elements cached.
The idea is to do this using product variables. The problem with product variables in their current form is that they 1) require the length of the variable to be decided before initialization and 2) require a full initialization of the variable in one operation that assigns all values of the product variable (requiring a second pass through the product).
We could improve the performance by allowing these two steps to be performed using lazy initialization:
i to the array element that needs to be initialized)
Next to this, we then also need to introduce an 'offset expression' for array types (similar to the 'offset expression' for record fields).
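One way to read the lazy-initialization idea above is a cache that is extended one element at a time, only as far as the requested index. The class below is a hypothetical sketch in plain Python; it is not how CODA product variables are implemented, and the length-prefixed record layout in the example is made up.

```python
class LazyOffsets:
    """Cache of byte offsets for the elements of a top-level array."""

    def __init__(self, element_size_at):
        # element_size_at(offset) -> size in bytes of the element starting there
        self._element_size_at = element_size_at
        self._offsets = [0]  # the offset of element 0 is known up front

    def offset(self, index):
        # Extend the cache only as far as needed, one element at a time.
        while len(self._offsets) <= index:
            last = self._offsets[-1]
            self._offsets.append(last + self._element_size_at(last))
        return self._offsets[index]


# Made-up data: each record starts with a byte holding its own total length.
data = bytes([3, 0, 0, 2, 0, 4, 0, 0, 0])
calls = []


def size_at(offset):
    calls.append(offset)  # record which elements had to be inspected
    return data[offset]


offsets = LazyOffsets(size_at)
third = offsets.offset(2)
```

Asking for element 2 inspects only elements 0 and 1; a later request for element 1 is answered from the cache without touching the data again.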
For cases where you only want to call coda.fetch (or coda_fetch) once, it would be very useful if coda.fetch did the open/close of the product itself.
pf = coda.open('filename')
data = coda.fetch(pf)
coda.close(pf)
would then become a single
data = coda.fetch('filename')
This means that the first parameter, instead of being a product handle or cursor, could also be a string providing a file path.
This change applies to the Python, IDL, and MATLAB interfaces only.
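Until such a change exists, the same convenience can be had from a small wrapper. The function below is a hypothetical sketch; it takes the coda module as its first argument only so that it can be exercised without CODA installed, and it relies on nothing beyond the open, fetch and close calls shown above.

```python
def fetch_from_file(coda_module, filename, *path):
    """Open a product, fetch the requested data, and always close the product.

    coda_module -- the imported coda module (or any object with the same
                   open/fetch/close functions)
    filename    -- path of the product file
    path        -- optional field names and array indices, as for coda.fetch
    """
    product = coda_module.open(filename)
    try:
        return coda_module.fetch(product, *path)
    finally:
        # Close the product even if fetch raises.
        coda_module.close(product)
```

With CODA installed, the three-line example above becomes data = fetch_from_file(coda, 'filename') after import coda.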
Reproduces at 06fa8ab
#1 0x558891712f75 in coda_mem_record_new /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-mem-type.c:438:31
#2 0x5588916ddddf in coda_grib_reopen /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-grib.c:3092:25
#3 0x55889172035b in reopen_with_backend /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-product.c:408:17
#4 0x55889171d148 in open_file /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-product.c:550:9
#5 0x55889171ca8a in coda_recognize_file /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-product.c:594:9
#6 0x558891633291 in LLVMFuzzerTestOneInput /proc/self/cwd/third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3
third_party/stcorp_coda/libcoda/coda-grib.c:2305:74: runtime error: signed integer overflow: 16770703 * 256 cannot be represented in type 'int'
#0 0x55b0fb8c0e80 in read_grib2_message third_party/stcorp_coda/libcoda/coda-grib.c:2305:74
#1 0x55b0fb8ad540 in coda_grib_reopen third_party/stcorp_coda/libcoda/coda-grib.c:3131:17
#2 0x55b0fb8efb4b in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:408:17
#3 0x55b0fb8ec938 in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
#4 0x55b0fb8ec27a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#5 0x55b0fb802151 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3
SUMMARY: UndefinedBehaviorSanitizer: signed-integer-overflow third_party/stcorp_coda/libcoda/coda-grib.c:2305
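The arithmetic behind the report can be checked quickly. The sketch below only uses the two operands from the sanitizer message and assumes a 32-bit `int`; it says nothing about what the expression at coda-grib.c:2305 is meant to compute.

```python
# Operands taken from the UBSan message; 32-bit int is an assumption.
INT_MAX = 2**31 - 1
UINT32_MAX = 2**32 - 1

value = 16770703 * 256

# The product does not fit in a signed 32-bit int ...
overflows_int = value > INT_MAX
# ... but it would still fit in an unsigned 32-bit (or any 64-bit) type.
fits_uint32 = value <= UINT32_MAX

print(value, overflows_int, fits_uint32)
```

So performing the multiplication in an unsigned or 64-bit type would avoid the undefined behaviour for this input, although the resulting value would presumably still need a range check.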
In CODA, the dynamic definitions for netCDF use a scalar for a netCDF attribute if it has only one element, and an array if it has more than one element. If we want to create a codadef where the number of elements can range from one upwards, then we have a problem, since attributes with one element are not read as an array.
The best way forward (which also solves other aspects, such as allowing conversions and introducing 'time' types) is to use codadefs to (re)interpret how netCDF/etc. products are read (just as for XML).
We could combine this with a global CODA option that selects whether products are read using the dynamic format or using the codadef format. If the dynamic format is used, a codacheck will still be possible. This allows all issues to be reported instead of stopping at the first issue found (which is what will happen when (re)interpreting the product using the codadef as format).
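As long as attributes with one element come back as scalars, code that consumes them has to normalise the value itself. A minimal sketch of such a normalisation is shown here; `as_sequence` is a hypothetical helper name and is not part of CODA.

```python
def as_sequence(value):
    """Return an attribute value as a list, whether CODA delivered it as a
    scalar (one element) or as an array (more than one element).

    Hypothetical helper, not part of the CODA interface.
    """
    # Strings are iterable but represent a single attribute value.
    if isinstance(value, (str, bytes)):
        return [value]
    try:
        return list(value)
    except TypeError:
        # Not iterable, so it is a scalar.
        return [value]
```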
This would allow us to read Global Forecast System (GFS) data.
Allow products with self describing formats to be recognized using detection expressions in a codadef. Note that this does not require a format definition in the codadef (see #6).
The approach would be to just open the product as normal and then perform a sequence of bool coda expressions on the product (using the same tree hierarchy of tests as used for matching on the detection block for ascii/binary products).
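The matching described above can be sketched as a depth-first walk over a tree of boolean tests. Everything in this sketch is hypothetical (the `DetectionNode` class, the `detect` function, and the use of plain Python callables in place of CODA expressions); it only illustrates the tree-of-tests idea.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class DetectionNode:
    # Stand-in for a boolean CODA expression evaluated on the opened product.
    test: Callable[[Any], bool]
    # Product type selected when this node matches and no child matches.
    product_type: Optional[str] = None
    children: List["DetectionNode"] = field(default_factory=list)


def detect(node: DetectionNode, product: Any) -> Optional[str]:
    """Return the product type of the deepest matching node, or None."""
    if not node.test(product):
        return None
    for child in node.children:
        found = detect(child, product)
        if found is not None:
            return found
    return node.product_type
```

A parent node would typically hold a cheap test shared by a family of products, with the more specific tests in its children.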
When compiling and making CODA with Python support, I get the following error message when running the make install command:
make install-am
make[1]: Entering directory '/data/hedelt/coda'
w119,451 -python -Ipython -I./python -DPRINTF_ATTR= -o python/codac.c ./python/codac.i
make[1]: w119,451: Command not found
make[1]: [Makefile:3413: python/codac.c] Error 127 (ignored)
I have pulled the latest version of CODA today...
==23560==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 80 byte(s) in 1 object(s) allocated from:
#0 0x55cc130cab9d in malloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
#1 0x55cc13112de8 in coda_bin_open third_party/stcorp_coda/libcoda/coda-bin.c:237:40
#2 0x55cc131ce447 in open_file third_party/stcorp_coda/libcoda/coda-product.c:532:9
#3 0x55cc131cde0a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#4 0x55cc130e3c91 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3
Indirect leak of 36 byte(s) in 1 object(s) allocated from:
#0 0x55cc130b6c41 in strdup third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
#1 0x55cc13112eb3 in coda_bin_open third_party/stcorp_coda/libcoda/coda-bin.c:264:30
#2 0x55cc131ce447 in open_file third_party/stcorp_coda/libcoda/coda-product.c:532:9
#3 0x55cc131cde0a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#4 0x55cc130e3c91 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19:3
SUMMARY: AddressSanitizer: 116 byte(s) leaked in 2 allocation(s).
#1 0x5577a044e25f in integer_constant_new /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-expr.c:284:12
#2 0x5577a044d118 in coda_expression_new /proc/self/cwd/third_party/stcorp_coda/libcoda/coda-expr.c:359:20
#3 0x5577a0490720 in coda_expression_parse /proc/self/coda/libcoda/coda-expr-parser.y:464:28
#4 0x5577a049376e in coda_expression_from_string /proc/self/coda/libcoda/coda-expr-parser.y:1049:9
#5 0x5577a03ecfb9 in LLVMFuzzerTestOneInput /proc/self/cwd/third_party/stcorp_coda/fuzz/coda_expression_fuzzer.cc:23:3
from
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

#include <string>

#include "third_party/absl/cleanup/cleanup.h"
#include "third_party/stcorp_coda/libcoda/coda.h"

// Print callback that discards all output from coda_expression_print.
int printf_black_hole(const char* fmt, ...) {
  va_list args;
  va_start(args, fmt);  // va_end must be paired with va_start
  va_end(args);
  return 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  coda_init();
  auto done = absl::MakeCleanup([] { coda_done(); });

  // Copy the input so that it is NUL-terminated before parsing.
  const std::string exprstring(reinterpret_cast<const char *>(data), size);
  coda_expression *expr = nullptr;
  coda_expression_from_string(exprstring.c_str(), &expr);
  if (!expr) return 0;

  coda_expression_print(expr, printf_black_hole);
  coda_expression_delete(expr);
  return 0;
}
When codacheck encounters a date/time format error, it dumps whatever is in that field as part of the error message. When this is a random binary chunk, strange characters appear and terminal behaviour is affected (control codes printed at the prompt after codacheck is finished).
Example output:
ERROR: date/time argument (???5?
????C??) has an incorrect format at [166]/mph/beg_prod_utc
ERROR: date/time argument (?) has an incorrect format at [166]/mph/gen_mph_utc
ERROR: date/time argument (????C??) has an incorrect format at [166]/mph/ref_utc
k???<?R: date/time argument (
?????C?8) has an incorrect format at [166]/mph/asc_utc
ERROR: date/time argument () has an incorrect format at [167]/mph/beg_prod_utc
ERROR: date/time argument (?) has an incorrect format at [167]/mph/gen_mph_utc
ERROR: date/time argument (?6???C??) has an incorrect format at [167]/mph/ref_utc
ERROR: date/time argument () has an incorrect format at [167]/mph/asc_utc
ERROR: date/time argument () has an incorrect format at [168]/mph/beg_prod_utc
ERROR: date/time argument (>?) has an incorrect format at [168]/mph/gen_mph_utc
ERROR: date/time argument (????????????????????????) has an incorrect format at [168]/mph/ref_utc
ERROR: date/time argument () has an incorrect format at [168]/mph/asc_utc
ERROR: incorrect file size (actual size: 235248, calculated: 235079)
~/envisat/products/RA/EMWC $
1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c
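One way to keep such messages terminal-safe is to escape every byte outside the printable ASCII range before it is placed in the error text. The sketch below illustrates the idea in Python; `printable` and its `limit` parameter are hypothetical and do not correspond to an existing CODA function.

```python
def printable(raw: bytes, limit: int = 32) -> str:
    """Return a terminal-safe rendering of a raw field value.

    Hypothetical helper: bytes outside printable ASCII (and the backslash
    itself) are written as \\xNN escapes, and long values are truncated.
    """
    out = []
    for b in raw[:limit]:
        if 0x20 <= b < 0x7F and b != 0x5C:
            out.append(chr(b))
        else:
            out.append("\\x%02x" % b)
    if len(raw) > limit:
        out.append("...")
    return "".join(out)
```

A value such as an ESC-prefixed control sequence would then show up as literal `\x1b` text in the message instead of being interpreted by the terminal.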
There is a new version 5 flavour of the netCDF classic format called CDF-5.
This adds support for unsigned integers, 64-bit integers, and 64-bit values for counts and lengths.
This was introduced in the netCDF library with version 4.4.0-RC4 via a new mode flag called NC_64BIT_DATA.
It should be rather straightforward to add support for this format in CODA.
Note that it is a bit inconvenient that they call this CDF-5, as it creates confusion with the actual CDF format.
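For reference, the classic netCDF variants are distinguished by the fourth byte of the file: the first three bytes are `CDF`, followed by a version byte of 1 (classic), 2 (64-bit offset) or 5 (CDF-5). A small sketch of that check follows; the function and table names are made up for illustration.

```python
# Version byte that follows the 'CDF' magic in netCDF classic-style files.
NETCDF_VARIANTS = {
    1: "classic (CDF-1)",
    2: "64-bit offset (CDF-2)",
    5: "64-bit data (CDF-5)",
}


def netcdf_classic_variant(header: bytes):
    """Return a description of the classic netCDF variant, or None if the
    header does not start with a known 'CDF' + version byte sequence.

    Hypothetical helper, not taken from CODA.
    """
    if len(header) < 4 or header[:3] != b"CDF":
        return None
    return NETCDF_VARIANTS.get(header[3])
```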
The rinex and sp3 backends currently read the file contents using a FILE * instead of using the initial coda_product handle that maps the whole file as a raw binary block (and which is also used for the file format detection).
The problem with using the 'raw' product handle is that the current mapping of rinex/sp3 to the memory backend uses a buffered file pointer (FILE *) together with fgets. To migrate this we would need some function that provides an efficient fgets (or rather, a ‘readline’) for the CODA ‘raw’ product.
This is likely linked to the replacement of AsciiLine/AsciiLineSeparator/AsciiWhitespace in codadefs by a more generic special expression in codadef for ASCII data that is terminated by special characters (e.g. comma-separated, EOL-separated, etc.)
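The kind of 'readline' meant here can be sketched on top of an in-memory block. The Python below only illustrates the logic (find the next newline, return the line and the offset to continue from); the function name and signature are assumptions, and an actual implementation would live in the C library and operate on the raw product buffer.

```python
def read_line(raw: bytes, offset: int):
    """Return (line, next_offset) for the line starting at `offset`.

    The returned line excludes the end-of-line characters ('\\n' or '\\r\\n').
    At the end of the buffer, (None, offset) is returned.
    Hypothetical helper for illustration only.
    """
    if offset >= len(raw):
        return None, offset
    end = raw.find(b"\n", offset)
    if end < 0:
        # Last line without a terminating newline.
        return raw[offset:], len(raw)
    line = raw[offset:end]
    if line.endswith(b"\r"):
        line = line[:-1]
    return line, end + 1
```

Because the caller keeps the offset, no buffered file pointer is needed and the same routine could serve any EOL-separated ASCII data.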
Are there plans or previous requests for an R interface for CODA?
==1496606==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x55c53036df1e bp 0x7ffe48aa4330 sp 0x7ffe48aa4320 T0)
==1496606==The signal is caused by a READ memory access.
==1496606==Hint: this fault was caused by a dereference of a high value address (see register values below). Dissassemble the provided pc to learn which register was used.
#0 0x55c53036df1e in coda_dynamic_type_delete third_party/stcorp_coda/libcoda/coda-cursor.c:160:19
#1 0x55c5304a9d05 in parser_info_cleanup third_party/stcorp_coda/libcoda/coda-xml-parser.c:280:13
#2 0x55c5304a944a in coda_xml_parse third_party/stcorp_coda/libcoda/coda-xml-parser.c:851:13
#3 0x55c5304a8495 in coda_xml_reopen third_party/stcorp_coda/libcoda/coda-xml.c:77:9
#4 0x55c530453aab in reopen_with_backend third_party/stcorp_coda/libcoda/coda-product.c:368:17
#5 0x55c5304508c8 in open_file third_party/stcorp_coda/libcoda/coda-product.c:550:9
#6 0x55c53045020a in coda_recognize_file third_party/stcorp_coda/libcoda/coda-product.c:594:9
#7 0x55c530366a11 in LLVMFuzzerTestOneInput third_party/stcorp_coda/fuzz/coda_recognize_file_fuzzer.cc:19
Trouble happens here:
void coda_dynamic_type_delete(coda_dynamic_type *type)
{
    if (type == NULL)
    {
        return;
    }
    switch (type->backend) // <-- backend is not valid
    {
It happens in practice that GRIB files contain a mix of GRIB1 and GRIB2 messages.
To support this we will have to:
- merge coda_format_grib1 and coda_format_grib2 into a single coda_format_grib
- grib1 and grib2 fields pointing to the GRIB1 or GRIB2 record (and where only one of them will be populated)
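Telling the two editions apart per message is straightforward, since both start with `GRIB` and carry the edition number in octet 8; the total length sits in octets 5-7 for GRIB1 and in octets 9-16 for GRIB2. The sketch below walks a buffer with mixed messages on that basis; the function name is hypothetical and the code is not taken from CODA.

```python
def grib_editions(buf: bytes):
    """Return the edition number (1 or 2) of each GRIB message in `buf`.

    Hypothetical helper: messages are located via the 'GRIB' marker and
    skipped using the total length stored in their indicator section.
    """
    editions = []
    pos = 0
    while True:
        pos = buf.find(b"GRIB", pos)
        if pos < 0 or pos + 8 > len(buf):
            break
        edition = buf[pos + 7]  # octet 8 in both editions
        if edition == 1:
            length = int.from_bytes(buf[pos + 4:pos + 7], "big")
        elif edition == 2 and pos + 16 <= len(buf):
            length = int.from_bytes(buf[pos + 8:pos + 16], "big")
        else:
            length = 0
        if length < 8:
            # Not a usable message header; continue searching after the marker.
            pos += 4
            continue
        editions.append(edition)
        pos += length
    return editions
```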