libfyaml's Issues

String values starting with & must be quoted

When creating a scalar node that starts with &, libfyaml does not quote or escape the string value. When subsequently parsed, the value is mistaken for an anchor. See attached test case (change extension from .txt to .cpp).

The relevant bits of the test case:
...
fy_node* val1 = fy_node_create_scalar(fydoc, "&Hello \"Value1\"", -1);
...
fy_emit_document_to_file(fydoc, flags, yaml_file);
...
fy_document_build_from_file(&fypcfg, yaml_file);
...
const char* nodeval = fy_node_get_scalar(val1, &nodeval_len);
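const char* expected = "&Hello \"Value1\"";
/* Compare contents by length, not pointers. */
if (nodeval && nodeval_len == strlen(expected) && memcmp(nodeval, expected, nodeval_len) == 0) { puts("PASS"); } else { puts("FAIL"); }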

Small changes regarding fy-tool's new --tsv-format

Currently to use fy-tool --tsv-event you also need --testsuite.

When I asked for the tsv output, it had nothing to do with the test suite, even
though that output is also one event per line.

The purpose of tsv was for writing a loader/composer using fyaml by just
reading stdin rather than needing to do a formal C binding.
This could be useful for all sorts of things, and honestly we might do well to
make this into a separate binary called bin/yaml-parse-events at some point.

Also it's a bit wrong to have --testsuite as the option for events formatted
for testing with the suite. In the future we will likely have other tests in
the suite besides parse events.

My suggestions at this point would be:

  • --mode=events-tsv
  • --mode=events-testsuite

You can keep --testsuite as an alias for --mode=events-testsuite.


I'd also like the events-tsv output to add a =ERR output line when a parse
error occurs. It should contain the line position numbers and the error
message.

That way a loader built over it could handle/format the error as it sees fit.
This would be cleaner than the loader having to capture an actual parse error
from fy-tool.

Note: This =ERR should not be added to the events-testsuite output.

hard to read document

At present there is only a single API document, which is difficult to read:
there is no collapsing and no functional navigation.
Could each family of API functions be documented separately, on its own page?

stdin based input does too much buffering

I am using the stdin based input for reading a sequence of documents.
These documents are produced in another program that pipes the result to the next program.

My problem right now is that the output on stdout is flushed without problems, but the receiving program does too good a job of buffering the input: I pass several tiny documents before the first one is even recognized by the parser.

The question is, why is the buffering done at all? The stdio functions already buffer and offer standard ways to tweak that, so in my opinion fyaml doesn't need to add another layer of buffering. For my purposes I removed the buffering and it worked fine.

Missing versioning information during build process

Hello,

I seem to have run into an issue when installing this library from scratch onto a remote Linux box (Ubuntu 18.04). When I build the software, I eventually get to the screen that displays the major/minor/etc. versioning information. I am receiving "UNKNOWN" for all versioning fields, which carries over into the library itself. Inspecting the /usr/local/lib folder, I see the same effect on all files associated with this software, i.e. libfyaml-UNKNOWN.UNKNOWN.la/so/etc.

Do you have any idea why this may be? I installed this same software only one month ago and didn't experience this issue. For reference, I downloaded the source code zip from release 0.5.4 and installed all prerequisite software as per the README.

Do let me know if you need further information, thank you.

Best regards,
Adam

Comment parsing doubts

Hi @pantoniou :

I did some "stress tests" to the latest comment-parsing code, and found some edge cases (perhaps you were already familiar with them):

Parsing this block:

myMap:
  e1: 10.0  # Right comment for e1 value
myMap2:
  # Top comment for e2
  e2: 10.0
myMap3:
  # Top comment for e3
  e3: 10.0 # right comment for e3 value
# Top comment for myMap4
myMap4:
  ~
# Top comment for myMap5
myMap5:
  # top comment for a4
  a4: 1

generates these events:

>> Event: Stream start
>> Event: Doc start
>> Event: MAP START
>> token: 0x55a486fbd8d0 Scalar: implicit=0 tag: 0 anchor: 0 value: myMap
>> Event: MAP START
>> token: 0x55a486fbdde0 Scalar: implicit=0 tag: 0 anchor: 0 value: e1
>> token: 0x55a48705bcd0 Scalar: implicit=0 tag: 0 anchor: 0 value: 10.0
>> token: 0x55a48705bcd0 comment [1]: 'Right comment for e1 value'
>> Event: MAP END
>> token: 0x55a486fbdde0 Scalar: implicit=0 tag: 0 anchor: 0 value: myMap2
>> Event: MAP START
>> token: 0x55a486fbd8d0 Scalar: implicit=0 tag: 0 anchor: 0 value: e2
>> token: 0x55a486fbd8d0 comment [0]: 'Top comment for e2'
>> token: 0x55a48705bcd0 Scalar: implicit=0 tag: 0 anchor: 0 value: 10.0
>> Event: MAP END
>> token: 0x55a486fbd8d0 Scalar: implicit=0 tag: 0 anchor: 0 value: myMap3
>> Event: MAP START
>> token: 0x55a486fbdde0 Scalar: implicit=0 tag: 0 anchor: 0 value: e3
>> token: 0x55a486fbdde0 comment [0]: 'Top comment for e3'
>> token: 0x55a48705bcd0 Scalar: implicit=0 tag: 0 anchor: 0 value: 10.0
>> token: 0x55a48705bcd0 comment [1]: 'right comment for e3 value'
>> Event: MAP END
>> token: 0x55a486fbdde0 Scalar: implicit=0 tag: 0 anchor: 0 value: myMap4
>> token: 0x55a486fbdde0 comment [0]: 'Top comment for myMap4'
>> token: 0x55a486fbd8d0 Scalar: implicit=0 tag: 0 anchor: 0 value: ~
>> token: 0x55a486fbd8d0 comment [1]: 'Top comment for myMap5'
>> token: 0x55a486fbdde0 Scalar: implicit=0 tag: 0 anchor: 0 value: myMap5
>> Event: MAP START
>> token: 0x55a48705bcd0 Scalar: implicit=0 tag: 0 anchor: 0 value: a4
>> token: 0x55a48705bcd0 comment [0]: 'top comment for a4'
>> token: 0x55a486fbd8d0 Scalar: implicit=0 tag: 0 anchor: 0 value: 1
>> Event: MAP END
>> Event: MAP END

Note how "Top comment for myMap5" is reported as a right-hand comment on the scalar ~, instead of as a top comment on the next item, the map key "myMap5".

Hope this test case helps finding the issue...

Double free

I'm getting a double free error. Debugging the code I got to line 198 of fy-doc.c:

	if (data_copy)
		fyi = fy_input_from_malloc_data((void *)text, len, &handle, true);

I suppose that should read:

	if (data_copy)
		fyi = fy_input_from_malloc_data((void *)data_copy, len, &handle, true);

Stream output support

Is there a way to stream output? I attempted to do so by creating an fy_event manually, but there appears to be no way to create tokens.

JSON dump issues

There seem to be a number of issues with the JSON output provided by fy-tool. e.g. given an input file x.yml of:

foo: 
bar: a\s+b

a simple dump command of src/fy-tool -m json x.yml yields

{
  "foo": ,
  "bar": "a\s+b"
}

...which isn't JSON for two reasons:

  1. the value of foo should be a null literal
  2. the backslash in bar's string isn't escaped properly.

Run (or advise to run) `ldconfig` after `make install`

Not running ldconfig after installing yields:

fy-tool: error while loading shared libraries: libfyaml-0.3.so.0: cannot open shared object file: No such file or directory

The install step should do this automatically, or documentation/README should be updated to indicate it as necessary.

unprintable characters in anchor names cause failure

When this file: anchortest.txt
... is given to fy-tool, the expected output should be close to the input. Instead, the output seems to be truncated:

%YAML 1.1
---
- !!tagaaa
  name: somename1
  somekey1:
    - !!tagbbb
      somekey2: !!tagccc
        groups:
          - &group_

If the non-printable characters in the anchor name are replaced with something innocuous like _, (as in this file: anchortest_clean.txt), then the output is as expected:

%YAML 1.1
---
- !!tagaaa
  name: somename1
  somekey1:
    - !!tagbbb
      somekey2: !!tagccc
        groups:
          - &group_____QMCEQMTN_AGMOQDA_B1_rsan`ps._agmoQda/ !!group
            name: "óà\x8A¼QMCEQMTN\x1FAGMOQDA\x1FB1\x1Ersan`ps.\x1FagmoQda/"
- !!tagaaa
  name: end
...

Problem configuring release 0.5.7

I get an odd result when trying to install from libfyaml-v0.5.7.tar.gz
which doesn't happen when I use libfyaml-0.5.5.tar.gz.

The build is on CentOS 7. I believe I'm following identical steps for each.

When I configure version 0.5.5 I get this:

---{ libfyaml 0.5.5 }---

   VERSION:               0.5.5
   MAJOR.MINOR:           0.5
   PATCH:                 5
   EXTRA:                 0.5.5
.
.
.

However when I configure version 0.5.7 I get this:

---{ libfyaml UNKNOWN }---

    VERSION:               UNKNOWN
    MAJOR.MINOR:           UNKNOWN.UNKNOWN
    PATCH:                 UNKNOWN
    EXTRA:                 UNKNOWN

As a result when I install the libraries the file names are incorrect:

for example:

libfyaml-UNKNOWN.UNKNOWN.a

I am not experienced developing with autoconf but I've installed many
packages that use it and I've never seen this before.

Heap-use-after-free and dynamic-stack-buffer-overflow when calling fy_document_build_from_file()

Hi,

I also found a UAF bug and a dynamic-stack-buffer-overflow bug while running experiments for AFLAPI.

Environment: Ubuntu 20.04 + gcc 9.4.0

Harness (attached: file named as "test_fy_document_build_from_file.c"):

#include <libfyaml.h>
#include <stdio.h>

int main(int argc, char** argv) {

	if(argc != 2) return 0;
	
	struct fy_document *fyd = NULL;
	fyd = fy_document_build_from_file(NULL, argv[1]);
	if (!fyd) {
		fprintf(stderr, "failed to build document");
		goto failed;
	}
	
failed:
	fy_document_destroy(fyd);
	return 0;
	
}

Poc:
Poc2.zip

To reproduce:
• Compile the whole project with ASAN:

CFLAGS="-fsanitize=address -g" ./bootstrap.sh
CFLAGS="-fsanitize=address -g" ./configure
make && sudo make install

• Compile the harness with ASAN:

gcc -fsanitize=address -o test_fy_document_build_from_file test_fy_document_build_from_file.c -lfyaml

• Run harness:

./test_fy_document_build_from_file ./UAF.yaml # to reproduce the UAF
./test_fy_document_build_from_file ./dynamic-stack-buffer-overflow.yaml # to reproduce the dynamic stack buffer overflow

About UAF, ASAN says:

UAF.yaml:3:18: error: cannot use tab for indentation of block entry
? a complex key
               :       
                 ^~~~~~~
=================================================================
==1614640==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000000080 at pc 0x7f0adbbea7be bp 0x7ffed9c79870 sp 0x7ffed9c79860
READ of size 8 at 0x606000000080 thread T0
    #0 0x7f0adbbea7bd in list_del lib/fy-list.h:120
    #1 0x7f0adbbea7bd in fy_simple_key_list_del lib/fy-parse.h:79
    #2 0x7f0adbbea7bd in fy_simple_key_list_pop lib/fy-parse.h:79
    #3 0x7f0adbbea7bd in fy_simple_key_vacuum_internal lib/fy-types.c:31
    #4 0x7f0adbba6c75 in fy_parse_cleanup lib/fy-parse.c:842
    #5 0x7f0adbc2fd0e in fy_document_build_internal lib/fy-doc.c:3287
    #6 0x7f0adbc3030c in fy_document_build_from_file lib/fy-doc.c:3320
    #7 0x55bb3c28628b in main (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_build_from_file+0x128b)
    #8 0x7f0adb9b2082 in __libc_start_main ../csu/libc-start.c:308
    #9 0x55bb3c28616d in _start (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_build_from_file+0x116d)

0x606000000080 is located 0 bytes inside of 64-byte region [0x606000000080,0x6060000000c0)
freed by thread T0 here:
    #0 0x7f0adbdcf40f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:122
    #1 0x7f0adbbea782 in fy_simple_key_vacuum_internal lib/fy-types.c:31
    #2 0x60600000007f  (<unknown module>)

previously allocated by thread T0 here:
    #0 0x7f0adbdcf808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
    #1 0x7f0adbbea5f1 in fy_simple_key_alloc_simple_internal lib/fy-types.c:31
    #2 0x7f0adbbea5f1 in fy_simple_key_alloc_simple_internal lib/fy-types.c:31

SUMMARY: AddressSanitizer: heap-use-after-free lib/fy-list.h:120 in list_del
Shadow bytes around the buggy address:
  0x0c0c7fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c0c7fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c0c7fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c0c7fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c0c7fff8000: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa
=>0x0c0c7fff8010:[fd]fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  0x0c0c7fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1614640==ABORTING

About dynamic stack buffer overflow, ASAN says:

dynamic-stack-buffer-overflow.yaml:2:255: error: plain scalar is malformed UTF8
^
~~~~~~~~~~~~~~~~
=================================================================
==1614738==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7ffdf6134cd5 at pc 0x7f73f2f75f3d bp 0x7ffdf6134ad0 sp 0x7ffdf6134278
WRITE of size 1793 at 0x7ffdf6134cd5 thread T0
    #0 0x7f73f2f75f3c in __interceptor_memset ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:762
    #1 0x7f73f2e3a93e in memset /usr/include/x86_64-linux-gnu/bits/string_fortified.h:71
    #2 0x7f73f2e3a93e in fy_diag_error_atom_display lib/fy-diag.c:789
    #3 0x7f73f2e3c236 in fy_diag_error_token_display lib/fy-diag.c:806
    #4 0x7f73f2e3c236 in fy_diag_error_token_display lib/fy-diag.c:801
    #5 0x7f73f2e3c236 in fy_diag_vreport lib/fy-diag.c:854
    #6 0x7f73f2e3ed3f in fy_reader_diag_report lib/fy-diag.c:1243
    #7 0x7f73f2e1ec4f in fy_reader_fetch_plain_scalar_handle lib/fy-parse.c:4261
    #8 0x7f73f2e261c3 in fy_fetch_plain_scalar lib/fy-parse.c:4707
    #9 0x7f73f2e2827f in fy_fetch_tokens lib/fy-parse.c:5022
    #10 0x7f73f2e2a057 in fy_scan_peek lib/fy-parse.c:5093
    #11 0x7f73f2e2a057 in fy_scan_peek lib/fy-parse.c:5038
    #12 0x7f73f2e2f6ac in fy_parse_internal lib/fy-parse.c:5989
    #13 0x7f73f2e84bff in fy_document_builder_load_document lib/fy-docbuilder.c:529
    #14 0x7f73f2e7b6be in fy_parse_load_document_with_builder lib/fy-doc.c:1940
    #15 0x7f73f2e7bacd in fy_document_build_internal lib/fy-doc.c:3242
    #16 0x7f73f2e7c30c in fy_document_build_from_file lib/fy-doc.c:3320
    #17 0x55698ebce28b in main (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_build_from_file+0x128b)
    #18 0x7f73f2bfe082 in __libc_start_main ../csu/libc-start.c:308
    #19 0x55698ebce16d in _start (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_build_from_file+0x116d)

Address 0x7ffdf6134cd5 is located in stack of thread T0
SUMMARY: AddressSanitizer: dynamic-stack-buffer-overflow ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:762 in __interceptor_memset
Shadow bytes around the buggy address:
  0x10003ec1e940: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10003ec1e950: 00 00 00 00 00 00 00 00 00 00 00 00 ca ca ca ca
  0x10003ec1e960: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10003ec1e970: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10003ec1e980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10003ec1e990: 00 00 00 00 00 00 00 00 00 00[05]cb cb cb cb cb
  0x10003ec1e9a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10003ec1e9b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
  0x10003ec1e9c0: f1 f1 04 f2 00 00 00 00 00 00 00 00 00 00 00 f2
  0x10003ec1e9d0: f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
  0x10003ec1e9e0: 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3 f3 f3 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1614738==ABORTING

346 test failures when compiling with UBSan/ASan

$ make check V=1 CFLAGS='-O2 -g -pipe -fsanitize=address,undefined -fsanitize-address-use-after-scope -fno-sanitize-recover=all -fno-omit-frame-pointer -fno-common'
...
# TOTAL: 996
# PASS:  650
# SKIP:  0
# XFAIL: 0
# FAIL:  346
# XPASS: 0
# ERROR: 0

In test/test-suite.log, it seems all the failures arise from the following 2 errors:

xxhash/xxhash.c:227:52: runtime error: member access within misaligned address 0x7f5615565007 for type 'struct U32_S', which requires 4 byte alignment

lib/fy-emit.c:1685:47: runtime error: left shift of 3 by 30 places cannot be represented in type 'int'

build_from/emit_to string round trip: result not the same as the original, possibly memory corruption

build_from_file, emit to string, build_from_string again, then emit to file: the file is not the same as the original.

    fyd = fy_document_build_from_file(NULL, "key.yaml");
    char *out = NULL, *in = NULL;
    out = fy_emit_document_to_string(fyd, FYECF_DEFAULT);
    if (out) {
        printf("out[\n%s\n] line[%d]\n", out, __LINE__);
    }
    fy_document_destroy(fyd); fyd = NULL;
    in = out;
    printf("in[\n%s\n] len[%zu] line[%d]\n", in, strlen(in), __LINE__);
    fyd = fy_document_build_from_string(NULL, in, FY_NT); // BUG: garbled output
    //fyd = fy_document_create(NULL);
    //fyn = fy_node_build_from_string(fyd, in, strlen(in)); // BUG: garbled output
    free(out); out = NULL;
    fy_emit_document_to_file(fyd, FYECF_DEFAULT, "key.yaml.tmp2");

libfyaml does not adhere to the yaml specification

We found a case where libfyaml does not adhere to the YAML specification. We store IP addresses in a sequence, and one of them is "::", which is a valid IPv6 address. A very short YAML file that is valid according to the specification is:

---
- ::
...

libfyaml tries to parse the :: as a mapping. It should only be a mapping if there's a space after the last colon. libfyaml is in good company making this mistake. I've tried libyaml, ruby's yaml parser, and a number of online validators. Only one got it right:

https://yamlchecker.com/

fy_document_scanf() and scalar value

I tried using the fy_document_scanf() function to read a scalar value from an array and it looks like it doesn't work? This test code:

#include <stdlib.h>

#include <libfyaml.h>

int main(int argc, char *argv[])
{
	static const char *yaml = 
		"---\n"
		" - aaa\n"
		" - bbb\n"
		" - ccc\n";
	static const char *json = "{\"bla\":[\"AAA\",\"BBB\",\"CCC\"]}";

	struct fy_document *fyd1, *fyd2;
	char sout[10];

	if (!(fyd1=fy_document_build_from_string(NULL, yaml, (size_t)-1))) return EXIT_FAILURE;
	if (!(fyd2=fy_document_build_from_string(NULL, json, (size_t)-1))) return EXIT_FAILURE;
	
	if (fy_document_scanf(fyd1, "/[1] %10s", sout)!=1) return EXIT_FAILURE;
	printf("sout.yaml='%s'\n", sout);

	if (fy_document_scanf(fyd2, "/bla/[-2] %10s", sout)!=1) return EXIT_FAILURE;
	printf("sout.json='%s'\n", sout);

	return EXIT_SUCCESS;
}

I looked into the library, and after a small change it behaves better; I can also read the value from a document containing only one scalar (path /):

--- src/lib/fy-doc.c.orig	2022-06-11 19:57:44.000000000 +0100
+++ src/lib/fy-doc.c	2022-06-12 07:51:53.000000000 +0100
@@ -4070,9 +4070,10 @@
 			__func__, __LINE__, fy_node_get_path(fyn), (int)(e - s), s); */
 	fyn = fy_node_follow_aliases(fyn, flags, true);
 
-	/* scalar can't match (it has no key) */
+	/* scalar can be only last element in the path (it has no key) */
 	if (fy_node_is_scalar(fyn)) {
-		fyn = NULL;
+		if (*s)
+			fyn = NULL; /* not end of the path - fail */
 		goto out;
 	}
 

Please, is there a documented search path format?

bootstrap.sh fails on not-latest aclocal

I'm getting a bad symbol error on HPE NonStop when running bootstrap.sh:

configure.ac:99: error: possibly undefined macro: AC_LTDL_ENABLE_INSTALL
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.

I am running autoconf 2.69.

support for huge files

I'm using YAML for configuration of scientific simulations and in exceptional cases, the file size can exceed 2GB. When using such a large file, I get [ERR]: fy_parse_load_document() failed.

Could this be related to using int for some length-related operations or is it most likely caused by another limitation?

If it is only an integer overflow: would an MR changing int to size_t where needed be welcome? Or is the higher memory consumption not acceptable (bearing in mind that I'm probably the only person with ridiculously large YAML files)?

API to get parsed comments

I've been trying to add an API to get the parsed comments by adding this commit https://github.com/MRPT/libfyaml/commit/2e67a1528fabb9f8478ebadc66d744a3b4227bea
on top of both your "master" branch and the "comment-wip" branch (both result in the same error, described next).

Tokens with a comment correctly return TRUE from fy_atom_is_set(), but trying to get the comment text from a parser event loop with the new function fy_token_get_comment() only returns a pointer to rubbish...

Please take a look at my use of fy_atom_format_text(); something must be wrong there, but I can't find it.

PS: Thanks for this amazing library!

sequence of documents output style

I'm trying to get the emitter to output an array of documents in full block style. This is what the library outputs in block mode:

key: [
    a: 1,
    b: 2]

and this is what I'm trying to emit as output:

key:
  - a: 1
  - b: 2

Is this possible?

Limit for document_build_from_file

It seems that some resources are exhausted when calling fy_document_build_from_file.

The following code crashes for me after 1021 iterations:

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <stdlib.h>
#include <stdio.h>

#include <libfyaml.h>

int main(int argc, char *argv[])
{
    struct fy_document *fyd = NULL;

    int i=1;
    while(i<10000)
    {
        fyd = fy_document_build_from_file(NULL, "test.yaml");
        if (!fyd) {
        	fprintf(stderr, "failed to build document");
        	return 1;
        }
        fprintf(stdout,"%d\n",i);
        i++;
    }
    return 0;
}

The content of the file (test.yaml) seems to be irrelevant; I use the following small example.

---
test: 1

One test fails

ERROR: testemitter-streaming.test - missing test plan
ERROR: testemitter-streaming.test - exited with status 127 (command not found?)
============================================================================
Testsuite summary for libfyaml 0.7.12
============================================================================
# TOTAL: 1490
# PASS:  1485
# SKIP:  2
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 2
============================================================================
See test/test-suite.log
Please report to [email protected]
============================================================================
FAIL: testerrors
================

1..10
errmsg: :2:1: error: duplicate key
ok 1 0002 - Duplicate key (plain scalar)
PASS: testerrors.test 1 0002 - Duplicate key (plain scalar)
errmsg: :2:2: error: duplicate key
ok 2 0003 - Duplicate key (plain scalar, quoted scalar)
PASS: testerrors.test 2 0003 - Duplicate key (plain scalar, quoted scalar)
errmsg: :3:1: error: duplicate key
ok 3 0004 - Duplicate key (plain scalar, literal scalar)
PASS: testerrors.test 3 0004 - Duplicate key (plain scalar, literal scalar)
errmsg: :2:1: error: duplicate key
ok 4 0005 - Duplicate key (sequence)
PASS: testerrors.test 4 0005 - Duplicate key (sequence)
errmsg: :2:1: error: duplicate key
Segmentation fault
ok 5 0006 - Duplicate key (simple mapping)
PASS: testerrors.test 5 0006 - Duplicate key (simple mapping)
errmsg: 
--- ./test-errors/0007//test.error	2019-12-18 23:13:37.000000000 -0800
+++ /tmp/tmp.TKUbWAy0	2022-11-02 23:53:05.336522000 -0700
@@ -1 +1 @@
-:2:1: error: duplicate key
+
not ok 6 0007 - Duplicate key (complex sorted mapping)
FAIL: testerrors.test 6 0007 - Duplicate key (complex sorted mapping)
errmsg: :1:5: error: invalid alias
ok 7 0008 - Unknown alias
PASS: testerrors.test 7 0008 - Unknown alias
errmsg: :2:9: error: invalid merge key value
ok 8 0009 - Invalid merge key (referencing not a mapping)
PASS: testerrors.test 8 0009 - Invalid merge key (referencing not a mapping)
errmsg: :1:8: error: invalid merge key value
ok 9 0010 - Invalid merge key (not an alias, scalar)
PASS: testerrors.test 9 0010 - Invalid merge key (not an alias, scalar)
errmsg: :2:8: error: invalid merge key value
ok 10 0011 - Invalid merge key (not an alias sequence item)
PASS: testerrors.test 10 0011 - Invalid merge key (not an alias sequence item)

Version: 0.7.12
OS: FreeBSD 13.1

Add the support for Windows

The build fails on Windows using MinGW.

For example:

  • '_SC_PAGESIZE' undeclared in fy-emit.c:1946:22
  • There's no alloca.h on Windows

Buffer-overflow in fy_utf8_get_right() when calling fy_document_insert_at()

Hi,

I am running some experiments for AFLAPI (fuzzing), and it has found a buffer overflow (more precisely, after debugging, an out-of-bounds access) in fy_atom_raw_line_iter_next. The bug seems to be harmless because it is triggered when inserting with an invalid character in an alias (fy_diag_error_atom_display --> fy_atom_raw_line_iter_next --> fy_utf8_get_right).

Environment: Ubuntu 20.04 + gcc 9.4.0

I debugged it a few hours ago but could not find what really causes the bug. I did find that the out-of-bounds access occurs at fy-utf8.h, line 93, so the bug seems to be harmless.

Harness (attached: file named as "test_fy_document_insert_at.c"):

#include <libfyaml.h>
#include <stdio.h>

int main(int argc, char** argv) {
	
	struct fy_document *fyd = NULL;
	fyd = fy_document_build_from_file(NULL, "test1.yaml");
	if (!fyd) {
		fprintf(stderr, "failed to build document");
		goto failed;
	}
	int rc;
	
	char key[12] = {0x26, 0x2b, 0x74, 0x68, 0x65, 0x62, 0x65, 0x86, 0x6e, 0x67, 0x77, 0x00}; // here is the poc (len: 0xc, but access position 0xd?)

	rc = fy_document_insert_at(fyd, key, FY_NT, fy_node_buildf(fyd, "abc"));
	
	if (rc) {
		fprintf(stderr, "failed to insert into document\n");
		goto failed;
	}
	
	rc = fy_emit_document_to_fp(fyd, FYECF_DEFAULT | FYECF_SORT_KEYS, stdout);
	if (rc) {
		fprintf(stderr, "failed to emit document to stdout\n");
		goto failed;
	}

failed:
	fy_document_destroy(fyd);
	return rc;
	
}

The test1.yaml (attached):

base: &base
    name: this-is-a-name

Poc:
Poc.zip

To reproduce:
• Compile the whole project with ASAN:

CFLAGS="-fsanitize=address -g" ./bootstrap.sh
CFLAGS="-fsanitize=address -g" ./configure
make && sudo make install

• Compile the harness with ASAN:

gcc -fsanitize=address -o test_fy_document_insert_at test_fy_document_insert_at.c -lfyaml

• Run harness:

./test_fy_document_insert_at

ASAN says:

<memory-@0x7ffd52519e40-0x7ffd52519e4a>:1:14: error: invalid character in anchor
=================================================================
==1614159==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffd52519e4c at pc 0x7f0b71a150ce bp 0x7ffd52517ba0 sp 0x7ffd52517b90
READ of size 1 at 0x7ffd52519e4c thread T0
    #0 0x7f0b71a150cd in fy_utf8_get_right lib/fy-utf8.h:93
    #1 0x7f0b71a150cd in fy_atom_raw_line_iter_next lib/fy-atom.c:1732
    #2 0x7f0b71a04339 in fy_diag_error_atom_display lib/fy-diag.c:661
    #3 0x7f0b71a06236 in fy_diag_error_token_display lib/fy-diag.c:806
    #4 0x7f0b71a06236 in fy_diag_error_token_display lib/fy-diag.c:801
    #5 0x7f0b71a06236 in fy_diag_vreport lib/fy-diag.c:854
    #6 0x7f0b71a06eee in fy_parser_diag_vreport lib/fy-diag.c:963
    #7 0x7f0b71a0706f in fy_parser_diag_report lib/fy-diag.c:976
    #8 0x7f0b719d7a6d in fy_fetch_anchor_or_alias lib/fy-parse.c:2894
    #9 0x7f0b719f1c77 in fy_fetch_tokens lib/fy-parse.c:4976
    #10 0x7f0b719f4057 in fy_scan_peek lib/fy-parse.c:5093
    #11 0x7f0b719f4057 in fy_scan_peek lib/fy-parse.c:5038
    #12 0x7f0b719f7144 in fy_parse_internal lib/fy-parse.c:5524
    #13 0x7f0b71a4ebff in fy_document_builder_load_document lib/fy-docbuilder.c:529
    #14 0x7f0b71a456be in fy_parse_load_document_with_builder lib/fy-doc.c:1940
    #15 0x7f0b71a45acd in fy_document_build_internal lib/fy-doc.c:3242
    #16 0x7f0b71a45e9f in fy_document_build_from_string lib/fy-doc.c:3299
    #17 0x7f0b71a4603b in fy_node_mapping_lookup_pair_by_string lib/fy-doc.c:3793
    #18 0x7f0b71a4603b in fy_node_mapping_lookup_pair_by_string lib/fy-doc.c:3784
    #19 0x7f0b71a4609c in fy_node_mapping_lookup_by_string lib/fy-doc.c:3810
    #20 0x7f0b71a434a4 in fy_node_by_path_internal lib/fy-doc.c:4202
    #21 0x7f0b71a47171 in fy_node_by_path lib/fy-doc.c:4467
    #22 0x7f0b71a47386 in fy_document_insert_at lib/fy-doc.c:2484
    #23 0x55bc66b654a7 in main (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_insert_at+0x14a7)
    #24 0x7f0b717c8082 in __libc_start_main ../csu/libc-start.c:308
    #25 0x55bc66b6522d in _start (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_insert_at+0x122d)

Address 0x7ffd52519e4c is located in stack of thread T0 at offset 44 in frame
    #0 0x55bc66b652f8 in main (/home/ubuntu/test/libfyaml/fuzz/test_fy_document_insert_at+0x12f8)

  This frame has 1 object(s):
    [32, 44) 'key' (line 14) <== Memory access at offset 44 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow lib/fy-utf8.h:93 in fy_utf8_get_right
Shadow bytes around the buggy address:
  0x10002a49b370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10002a49b380: 00 00 00 00 00 00 f1 f1 f1 f1 f1 f1 04 f2 00 f2
  0x10002a49b390: f2 f2 04 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
  0x10002a49b3a0: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f3 f3 f3
  0x10002a49b3b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10002a49b3c0: 00 00 00 00 f1 f1 f1 f1 00[04]f3 f3 00 00 00 00
  0x10002a49b3d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10002a49b3e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10002a49b3f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10002a49b400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10002a49b410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1614159==ABORTING
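
In the report above, 'key' occupies frame offsets [32, 44) and the faulting one-byte read is at offset 44, the first byte past the array, inside fy_utf8_get_right while the diagnostic printer walks the offending line. As a rough illustration of the kind of check that avoids such a read, here is a minimal sketch of a UTF-8 decoder that never looks past a known end of buffer. The name utf8_decode_bounded is hypothetical; this is not libfyaml's code or its fix, and it does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not libfyaml's fy_utf8_get_right): decode one
 * UTF-8 code point from p without reading beyond p + left.
 * Returns the code point and stores its byte width in *width, or
 * returns -1 for a truncated or malformed sequence.
 */
int32_t utf8_decode_bounded(const uint8_t *p, size_t left, int *width)
{
	int32_t c;
	int w, i;

	assert(p != NULL && width != NULL);
	*width = 0;
	if (left == 0)
		return -1;

	if (p[0] < 0x80) {
		c = p[0];
		w = 1;
	} else if ((p[0] & 0xe0) == 0xc0) {
		c = p[0] & 0x1f;
		w = 2;
	} else if ((p[0] & 0xf0) == 0xe0) {
		c = p[0] & 0x0f;
		w = 3;
	} else if ((p[0] & 0xf8) == 0xf0) {
		c = p[0] & 0x07;
		w = 4;
	} else {
		return -1;	/* stray continuation byte such as 0x86 */
	}

	if ((size_t)w > left)
		return -1;	/* sequence would cross the end of the buffer */

	for (i = 1; i < w; i++) {
		if ((p[i] & 0xc0) != 0x80)
			return -1;	/* not a continuation byte */
		c = (c << 6) | (p[i] & 0x3f);
	}
	*width = w;
	return c;
}
```

With the 0x86 byte from the key in the harness, utf8_decode_bounded returns -1 instead of advancing, so a caller can stop at the invalid byte rather than step beyond the buffer.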

Handling tabs to support DT YAML binding docs

I think something useful to the DeviceTree community would be a fy_parse_cfg_flags option that supports handling tabs. One of the limitations of the DT binding docs in YAML is that, with existing parsers, one cannot leave the usual tabs in the example usage (typically copied from a kernel .dts file with spaces). Something like FYPCF_ENABLE_TABS could enable tab handling for this case.

GPL license on fy-list.h

It seems that it is forbidden to license your library under the MIT license with code licensed under GPL-2.0 in it.

indentation for array elements

Please add the ability to specify an indent specifically for array elements in the emitter. The current output is this:

key:
- a: 1
- b: 2

Many prefer the readability of arrays like this:

key:
  - a: 1
  - b: 2

Does not build on alpine linux (qsort_r missing)

I'm trying to build libfyaml 0.2 on alpine linux 3.10. It seems they are not using glibc, so qsort_r is missing:

lib/fy-doc.c: In function 'fy_node_mapping_perform_sort':
lib/fy-doc.c:3608:2: warning: implicit declaration of function 'qsort_r'; did you mean 'qsort'? [-Wimplicit-function-declaration]
  qsort_r(fynpp, count, sizeof(*fynpp), fy_node_mapping_sort_cmp, &ctx);
  ^~~~~~~
  qsort
  CC    lib/fy-emit.o
  CC    lib/fy-utils.o
  LINK  libfyaml-0.2.la
ar: `u' modifier ignored since `D' is the default (see `U')
  CC    tool/fy_tool-fy-tool.o
  LINK  fy-tool
/usr/lib/gcc/x86_64-alpine-linux-musl/8.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: ./.libs/libfyaml-0.2.so: undefined reference to `qsort_r'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:641: fy-tool] Error 1
make[1]: *** [Makefile:575: all-recursive] Error 1
make: *** [Makefile:474: all] Error 2
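
One possible workaround for a libc without qsort_r is a small wrapper around plain qsort that passes the context through file-scope variables. The sketch below assumes a comparator of the shape used in the failing call (two elements plus a context pointer); the names cmp_ctx_fn, sort_with_ctx and cmp_int_dir are hypothetical and this is not how libfyaml itself resolves the problem. The wrapper is not thread-safe or reentrant, which is the trade-off for using only standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical comparator type: like qsort()'s, plus a caller context. */
typedef int (*cmp_ctx_fn)(const void *a, const void *b, void *ctx);

/* File-scope state read by the trampoline. NOT thread-safe. */
static cmp_ctx_fn active_cmp;
static void *active_ctx;

/* Adapts the three-argument comparator to qsort()'s two-argument one. */
static int trampoline(const void *a, const void *b)
{
	return active_cmp(a, b, active_ctx);
}

/* Hypothetical fallback for platforms that lack qsort_r(). */
void sort_with_ctx(void *base, size_t nmemb, size_t size,
		   cmp_ctx_fn cmp, void *ctx)
{
	assert(cmp != NULL);
	active_cmp = cmp;
	active_ctx = ctx;
	qsort(base, nmemb, size, trampoline);
	active_cmp = NULL;
	active_ctx = NULL;
}

/* Example comparator: ctx points to an int, +1 ascending, -1 descending. */
int cmp_int_dir(const void *a, const void *b, void *ctx)
{
	int x = *(const int *)a;
	int y = *(const int *)b;
	int dir = *(const int *)ctx;

	return dir * ((x > y) - (x < y));
}
```

A build system could select such a fallback only when a configure-time check finds no qsort_r, and keep the native function elsewhere.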

Preferred defaults for `fy-tool -m dejson`

First off dejson is a bad name for this.
How about yamlfmt, yfmt, tidy or something?
JSON has nothing to do with this.

So here are some changes to the default emitter format I'd like to see...

$ echo $'foo:\n- bar' | fy-tool -m yamlfmt -
foo:
- bar
$ echo $'foo:\n- "bar\\nbaz\\nboom\\n"' | fy-tool -m yamlfmt -
---
foo:
- |
  bar
  baz
  boom

Do those for starters.

In general @perlpunk's yamlpp-load-dump --preserve 0 does a good job for this kind of formatting.
You can install with cpanm YAML::PP.

Boolean values for YAML v1.2

The v1.2 spec states that "true", "false", "on", "off" etc. should be interpreted as their boolean value counterparts.
As this library claims to support YAML 1.2, how is that handled, or do I just have to strcmp regardless?

Thanks
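
For reference, the YAML 1.2 core schema resolves only true, True, TRUE, false, False and FALSE as booleans; spellings such as on, off, yes and no come from the YAML 1.1 type set. If the check has to be done by the caller, it could look like the sketch below. It assumes the scalar text and its length have been obtained from something like fy_node_get_scalar(); scalar_to_bool is a hypothetical helper, not a libfyaml function, and it ignores whether the scalar was quoted or tagged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: classify scalar text as a YAML 1.2 core schema
 * boolean. Returns 1 for true, 0 for false, -1 if it is not a boolean.
 * The text does not need to be NUL-terminated.
 */
int scalar_to_bool(const char *text, size_t len)
{
	static const char *const yes[] = { "true", "True", "TRUE" };
	static const char *const no[] = { "false", "False", "FALSE" };
	size_t i;

	assert(text != NULL);
	for (i = 0; i < sizeof(yes) / sizeof(yes[0]); i++)
		if (len == strlen(yes[i]) && memcmp(text, yes[i], len) == 0)
			return 1;
	for (i = 0; i < sizeof(no) / sizeof(no[0]); i++)
		if (len == strlen(no[i]) && memcmp(text, no[i], len) == 0)
			return 0;
	return -1;
}
```

A caller that also wants the YAML 1.1 spellings would extend the two tables rather than change the comparison logic.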

Parallel test execution breaks tests due to unfinished git clone

When running the test suite in parallel, the two tests testsuite.test and jsontestsuite.test may be executed before their dependencies test-suite-data and json-test-suite-data are finished, as the download / cloning of the git repository takes a considerable amount of time. In this case, those two tests fail due to missing test data.

You may be able to reproduce this issue by running make check with the -j flag, e.g. env TESTS="jsontestsuite.test" make -e -j16 check. If the test is executed too fast / before the git clone(s) of test-suite-data and json-test-suite-data are finished, the test log file will include errors like:

[ERR]: failed to open json-test-suite-data/test_parsing/n_*.json
[ERR]: failed to open json-test-suite-data/test_parsing/i_*.json

It may be a solution to clone the two repos during initial build in e.g. bootstrap.sh - or, if it's desired to only download those files when tests are actually executed, inside the actual tests.

Compilation problem with -fvisibility=hidden

If the gcc option -fvisibility=hidden is set as part of CFLAGS when compiling, libfyaml's make all fails when attempting to link fy-tool. A work-around is to add the lines

#if __GNUC__ >= 4
    #pragma GCC visibility push(default)
#endif

to libfyaml.h. This makes sure that the public functions are not hidden by the -fvisibility=hidden flag.

Thanks for developing the library. It is working great!

quoting strings with comma in flow style

I have the following content:

- Aa,
  Bb, C,
  D
- Eee,
  F, Gg,
  E

fy-dump -m flow-oneline test.yaml gives

[Aa, Bb, C, D, Eee, F, Gg, E]

where only the coloring of the commas indicates that the list contains 2 string entries, not 8. When using the library directly to convert to oneline flow style,

void to_flow(char **flow, int *length_flow, const char *mixed)
{
  struct fy_document *fyd = NULL;
  enum fy_emitter_cfg_flags emit_flags = FYECF_MODE_FLOW_ONELINE | FYECF_STRIP_LABELS |
                                         FYECF_STRIP_TAGS | FYECF_STRIP_DOC;

  /* Failure is reported as *flow == NULL and *length_flow == -1. */
  *flow = NULL;
  *length_flow = -1;

  fyd = fy_document_build_from_string(NULL, mixed, -1);
  if (!fyd)
    return;

  /* Resolve anchors and aliases before emitting. */
  if (fy_document_resolve(fyd) == 0) {
    *flow = fy_emit_document_to_string(fyd, emit_flags);  /* allocated string, NULL on error */
    if (*flow)
      *length_flow = (int)strlen(*flow);
  }

  fy_document_destroy(fyd);  /* also runs when resolve or emit fails */
}

the coloring is lost and one gets a list with 8 entries. Is it possible to automatically add quotes to multiline strings?
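
Until the emitter quotes such values itself, a caller can at least detect them. In a flow collection the characters , [ ] { } act as structure, so a plain scalar containing any of them has to be quoted to survive a round trip. The sketch below is a deliberately simplified check under that assumption; has_flow_indicator is a hypothetical helper, not part of libfyaml, and it does not cover the other cases that need quoting (for example ": " or " #" inside the text, or a leading indicator character).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report whether scalar text contains a character
 * that is structural inside a YAML flow collection, meaning the scalar
 * cannot be emitted as a plain (unquoted) scalar in flow style.
 */
bool has_flow_indicator(const char *text, size_t len)
{
	size_t i;

	assert(text != NULL || len == 0);
	for (i = 0; i < len; i++) {
		/* strchr() also matches the terminator, so skip NUL bytes. */
		if (text[i] != '\0' && strchr(",[]{}", text[i]) != NULL)
			return true;
	}
	return false;
}
```

For the first list entry above, whose folded value is "Aa, Bb, C, D", the helper returns true, which tells the caller that the value needs quotes in flow output.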

programmable fy_document_scanf() and fy_node_scanf()

In https://github.com/pantoniou/libfyaml#usage-and-examples, there is
count = fy_document_scanf(fyd, "/invoice %u " "/bill-to/given %256s", &invoice_nr, given);
User has to hardcode the key in the _scanf() call. Is there a way to do something like this
count = fy_document_scanf(fyd, "%s %u " "%s %256s", key1, &invoice_nr, key2, given);

In the doc page of fy_node_scanf(), https://pantoniou.github.io/libfyaml/libfyaml.html#fy-node-scanf
fyn = { foo: 3 } -> fy_node_scanf(fyn, "/foo %d", &var) -> var = 3
Again, user has to hardcode the key, e.g. "/foo", in the _scanf() call. Since fyn has been located already, is there a way to do something like this?
fyn = { foo: 3 } -> fy_node_scanf(fyn, "%d", &var) -> var = 3

There is also the fy_node_get_scalar() API, which returns a char*. Are there APIs to get numeric values?

Thank you very much for the insight.
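
On the numeric question, one option that needs no extra library support is to convert the scalar text on the caller's side. The sketch below assumes the text and length come from fy_node_get_scalar() and that the text may not be NUL-terminated; scalar_to_long is a hypothetical helper, not a libfyaml API, and it handles base-10 integers only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse scalar text (pointer + length, not
 * necessarily NUL-terminated) as a base-10 long.
 * Returns 0 on success, -1 on empty, oversized, malformed or
 * out-of-range input. *out is written only on success.
 */
int scalar_to_long(const char *text, size_t len, long *out)
{
	char buf[32];
	char *end;
	long v;

	assert(text != NULL && out != NULL);
	if (len == 0 || len >= sizeof(buf))
		return -1;

	/* Copy into a local buffer so strtol() sees a terminated string. */
	memcpy(buf, text, len);
	buf[len] = '\0';

	errno = 0;
	v = strtol(buf, &end, 10);
	/* Require that every byte of the scalar was consumed. */
	if (errno == ERANGE || end != buf + len)
		return -1;

	*out = v;
	return 0;
}
```

For the { foo: 3 } example, looking up the node for foo and passing its scalar text and length to this helper would store 3 in *out.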
