Git Product home page Git Product logo

Comments (32)

keygee avatar keygee commented on June 7, 2024 2

Hi Andrea,
I made some tests and I would like to share with you my results:

tests made:

tic_frame_size_at_1fs set to 48:

  • ./run_test.sh S16_LE 48000 2 5
  • ./run_test.sh S24_3LE 48000 2 5
  • ./run_test.sh S32_LE 48000 2 5

tic_frame_size_at_1fs set to 96:

  • ./run_test.sh S16_LE 48000 2 5
  • ./run_test.sh S24_3LE 48000 2 5
  • ./run_test.sh S32_LE 48000 2 5

tic_frame_size_at_1fs set to 192:

  • ./run_test.sh S16_LE 48000 2 5
  • ./run_test.sh S24_3LE 48000 2 5
  • ./run_test.sh S32_LE 48000 2 5

I made these tests on:

Machine 1:

  • Raspberry Pi 4 Model B Rev 1.1
    ARMv7 Processor rev 3 (v7l)
    Kernel: 4.19.108-v7l+ (cross Compiled)
    Distribution: CRUX 3.5

Machine 2

  • Sony VAIO VPCS13V9E
    Intel(R) Core(TM) i5 CPU M 460 @ 2.53GHz
    Kernel: 5.7.6
    Distribution: CRUX 3.6

Machine 3

  • DELL Optiplex 980
    Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz
    Kernel: 5.7.6
    Distribution: CRUX 3.6

All results have been OK in all machines, except if I make "stressing work" during the test (compiling heavy code or transcoding videos...). Repeating the tests in a idle state resulted always OK.

If could be of any interest, I can share my .config ( or part of it ) of my kernels
Another info that could (or not) be of any interest, is that I compiled all without clang, because I haven't in my machines.

If you need more info or tests, please let me know.

Cheers
Guido

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

Hi @jtc-dolby,
the problem doesn't seem to show up anymore if I recompile the Linux kernel with the timer frequency set to 100Hz:
CONFIG_HZ=100
In the kernel menu config you can change the timer frequency in:
Kernel features -> Timer frequency
Can you try this and report if it works for you ?
Thanks

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

Also this a frequency of 250Hz the problem doesn't show up.

from aes67-linux-daemon.

jtc-dolby avatar jtc-dolby commented on June 7, 2024

You said that the problem doesn't show up with 250Hz but this is the default frequency in kernel config file. If this is true then why would expect a difference in behaviour with 100Hz? I will try 100Hz anyway and see if it changes anything.

from aes67-linux-daemon.

jtc-dolby avatar jtc-dolby commented on June 7, 2024

I tested with 100Hz. Not good news. I got drops using aplay both with and without the -M option (Memory mapped mode). (Before I got drops only with the -M option). The drops were much more frequent when using the -M option: 43 vs 5 drops over a 45min period.

from aes67-linux-daemon.

drtechno1971 avatar drtechno1971 commented on June 7, 2024

Try:
CONFIG_HZ=1000
CONFIG_NO_HZ_FULL=y
Problem with this driver it uses a webui instead of plain C console echo, but this might fair better.
Also, It might be better to write a driver extension to Alsa instead of using a 3rd party virtual driver to lower latency.

from aes67-linux-daemon.

drtechno1971 avatar drtechno1971 commented on June 7, 2024

Also you might want to try passing idle=mwait at boot time so the kernal doesn't reschedule the timing due to others using adaptive timing ticks.

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

I have finally found some time to proceed on the debugging of this issue and I have just pushed on the branch test_issue_17 the files required to run a test on the loopback network interface and to verify the integrity of the wav file recorded.
In the test I configure a single 48Khz L24 stereo source and the corresponding sink to playback and record on the loopback network interface a sample wav file created artificially to enable simple detection of corruptions on the recording.
To run the test and create the recording:

./run_test.sh

To verify that the recording is correct:

cd test
c++ check.cc -o check
./check

The expected output is : "ok", otherwise the program prints out the file position where the corruption was detected.
I have run the test successfully many times on both ARM and x86_64 platforms and I never had a single invalid recording.
In my tests I have tried to use both mermory mapped and read/write interleaved modes.

from aes67-linux-daemon.

jtc-dolby avatar jtc-dolby commented on June 7, 2024

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

When I have originally opened the issue I was running the tests using the daemon to playback a 48Khz L24 stereo file and the Audinate Dante USB to perform the recording.
Apparently this device has strict latency constrains and when it receives a late packet (that is a packet exceeding the 2ms delay in this configuration) it replaces the late samples with zeros and this leads to such recordings.
I enclosed a picture showing such late packets as reported by the Dante device latency tab:
Dante latency

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

hi @jtc-dolby , great to hear you again ;-)
Can you just try to run the simple test on the test_issue_17 branch ? I'd like to hear the results. Thanks.
Anyway this is not conclusive analysis and I am going to perform more device to device (not loopback) tests.

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

At present I suspect that the issue is actually related to these late packets.
At the moment I am not sure why I have such late packets when I run tests on the X86_64 platform but I have none when using some ARM boards I tested here.
In my opinion the issue is not related to the daemon or the Merging driver but it might come from the Linux kernel or the network device or driver in use.
I enclosed a picture with the latency measurement of a stream sent by the daemon running on NanoPi2 with Linux nanopineo2 5.8.8-sunxi64 kenrl.
As you can see I have no late packets and the peak latency is 1.4ms
Dante latency NanoPi2
On this board the latency keeps stable also on a longer period of time and I don't have a single late packet, see picture below:
Dante latency NanoPi2 -2

from aes67-linux-daemon.

jtc-dolby avatar jtc-dolby commented on June 7, 2024

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

After additional tests on the X86 platform I can reproduce a corruption of the output file by changing the driver base timer period to 48 ("tic_frame_size_at_1fs": 48) in the daemon configuration file used by the test (test/daemon.conf). This was set to 192 in my commit.
So I repeated the tests changing the "tic_frame_size_at_1fs" parameter to 48 (1 ms), 96 (2 ms) and 192 (4 ms). The result is that while on the ARM platforms (I tested so far) the output file is always OK for all the values, on my X86_64 I can systematically reproduce a corruption with 48 only and it always works with 96 and 192. On this platform my attempts to use a low latency or preemptive kernel didn’t solve the issue.
The root cause of the issue is unknown to me but the test I implemented can be used to verify if a specific platform is able to serve or not at the specified base period. In case of a negative result (corruption of the output file) my suggestion is to switch to another.

from aes67-linux-daemon.

jtc-dolby avatar jtc-dolby commented on June 7, 2024

I have executed the 'run_test.sh' on the issue 17 branch several times without making any changes and the checker returns OK. Should I try again with tic set to 48 instead of 192?

from aes67-linux-daemon.

jtc-dolby avatar jtc-dolby commented on June 7, 2024

I ran the tests again with TIC=48 three times and it was OK everytime.

from aes67-linux-daemon.

gugaguga24678 avatar gugaguga24678 commented on June 7, 2024

When I have originally opened the issue I was running the tests using the daemon to playback a 48Khz L24 stereo file and the Audinate Dante USB to perform the recording.
Apparently this device has strict latency constrains and when it receives a late packet (that is a packet exceeding the 2ms delay in this configuration) it replaces the late samples with zeros and this leads to such recordings.
I enclosed a picture showing such late packets as reported by the Dante device latency tab:
Dante latency

Hi @bondagit ,is it possible to change the delay setting from 2ms to 5ms?
I have the same problem. When setting tic_frame_size_at_1fs to 48, the Dante Controller shows more than 50% of the packets are late. The drops were much more frequent when setting tic_frame_size_at_1fs to 96 or 192.
I ran the run_test.sh and check several times, the program prints out "error at position:" every time.

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

Hi @bondagit ,is it possible to change the delay setting from 2ms to 5ms?

I don't know how to do this, check with Audinate.

I have the same problem. When setting tic_frame_size_at_1fs to 48, the Dante Controller shows more than 50% of the packets are late. The drops were much more frequent when setting tic_frame_size_at_1fs to 96 or 192.
I ran the run_test.sh and check several times, the program prints out "error at position:" every time.

What platform are you using for the tests ? What Linux Kernel version ?
The parameter "tic_frame_size_at_1fs" sets the driver base period. At each tick of this period the driver schedule the outbound RTP packets for sending and processes the inbound packets. So if your Dante is using a delay setting of 2ms you have to set tic_frame_size_at_1fs to 48 (1ms), any larger setting cannot work. Despite this as you can see you still have late packets and also the loopback test fails so your platform doesn't seem to support this setting. My suggestion is to try with another board.

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

Hi @jtc-dolby

I ran the tests again with TIC=48 three times and it was OK everytime.

So your platform looks ok, or at least it's ok with a 4 mins recording. Probably we should test it for a longer period of time.
But, if I remember it correctly, you reported drops every minute in case of playback in memory mapped mode and this is weird.
Do you have a Dante AVIO USB device ? can you monitor for late packets and measure the avg and the peak latency of the output stream ?

from aes67-linux-daemon.

jtc-dolby avatar jtc-dolby commented on June 7, 2024

I did see drops fairly frequently (every minute or so). I don't have a Dante AVIO device but the Lawo devices I do have gather statistics regarding late packets. I can use the same build on the same machine as I just ran the tests and see what happens. Is it possible there are differences between stream type or codec? My focus was the AM824 because this is carrying data and that is covered by a CRC so corruption is very apparent. I can put data inside L16/L24 for testing purposes.

from aes67-linux-daemon.

Aika0 avatar Aika0 commented on June 7, 2024

Testing issue 17 with "tic_frame_size_at_1fs": 48 on various Raspberry Pis

Device A (10.1.14.120)
Raspberry Pi 4 Model B Rev 1.1
Hardware: BCM2711
Revision: a03111
Kernel: 5.4.83-v7l+
Result 1: ok
Result 2: ok
Result 3: ok

Device B (10.1.14.77)
Raspberry Pi 3 Model B Plus Rev 1.3
Hardware: BCM2845
Revision: a020d3
Kernel: 5.4.83-v7+
Result 1: ok
Result 2: ok
Result 3: ok

Device C (10.1.14.55)
Raspberry Pi 3 Model B Rev 1.2
Hardware: BCM2835
Revision: a02082
Kernel: 5.4.83-v7+
Result 1: ok
Result 2: ok
Result 3: ok

Device D (10.1.14.92)
Raspberry Pi 2 Model B Rev 1.1
Hardware: BCM2835
Revision: a21041
Kernel: 5.4.83-v7+
Result 1: ok
Result 2: ok
Result 3: ok

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

Testing issue 17 with "tic_frame_size_at_1fs": 48 on various Raspberry Pis
....

Great job, thank you. The results were as expected as I was able to get it to work correctly even on a board with an ARM SOC older than the ones you tested.
My idea was to create a new page in the documentation with a list of compatible HWs. Anyway at the moment the test validates only the configuration with a single L24 / stereo / 48Khz source and sink which is however one of the most used.

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

I did see drops fairly frequently (every minute or so). I don't have a Dante AVIO device but the Lawo devices I do have gather statistics regarding late packets. I can use the same build on the same machine as I just ran the tests and see what happens.

Thanks, I think this should help to narrow down the issue: If you have late packets it can be that your receiver device is dropping them (replacing the late samples with silence).

Is it possible there are differences between stream type or codec? My focus was the AM824 because this is carrying data and that is covered by a CRC so corruption is very apparent. I can put data inside L16/L24 for testing purposes.

Yes, a different codec can make a difference. For this reason I was also thinking to enhance this test suite to allow for different configurations. In the meantime you could try modifying my test to play an AM824 (L32) file and record an L32 wav file. Afterwards you should be able to use the CRC to detect registration errors instead of relying on the samples sequence as I did.
This way we can test using the loopback interface and we exclude any network related issue.

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

Hi @jtc-dolby ,
first of all please include a couple of new patches I have just released for the driver.
Apart from this I enhanced the test suite to support multiple configurations and I re-executed some tests on the platforms I have here with the results reported below.

On X86_64 with tic_frame_size_at_1fs set to 48 in test/daemon.conf, I have the following results:
./run_test.sh S16_LE 48000 2 5 -> fails
./run_test.sh S24_3LE 48000 2 5 -> fails
./run_test.sh S32_LE 48000 2 5 -> fails
On X86_64 with the tic_frame_size_at_1fs set to 96 and 192 in test/daemon.conf, I have the following results:
./run_test.sh S16_LE 48000 2 5 -> OK
./run_test.sh S24_3LE 48000 2 5 -> OK
./run_test.sh S32_LE 48000 2 5 -> OK
On ARM with tic_frame_size_at_1fs set to 48, 96, 192 in test/daemon.conf, I have the following results:
./run_test.sh S16_LE 48000 2 5 -> OK
./run_test.sh S24_3LE 48000 2 5 -> OK
./run_test.sh S32_LE 48000 2 5 -> OK

So, according to my tests the sample format doesn't affect the test result. One of the possible reasons is that the samples size of driver internal playback and capture buffers is always set to 32 bit.

Can you repeat these 3 tests using your platform ? Thanks.

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

I have just found another issue in the driver. This time it's related to the playback in read-write interleaved mode and it affects the initial samples of the output stream.
The issue is solved by 77cf7bd. This patch has anyway no effect on the ./run_test.sh results reported above. The new problem can be easly reproduced by removing the option (-M) to aplay in run_test.sh script.

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

Hi @jtc-dolby, did you have the chance to run the new tests? Thanks

from aes67-linux-daemon.

jtc-dolby avatar jtc-dolby commented on June 7, 2024

I ran the test. Results below. I didn't reinstall the driver between tests but just edited the conf file and reran the script. Is that correct?

Here are my results from the tests:
On X86_64 - Ubuntu 19.10
with tic_frame_size_at_1fs set to 48 in test/daemon.conf, I have the following results:
./run_test.sh S16_LE 48000 2 5 -> fails
./run_test.sh S24_3LE 48000 2 5 -> fails
./run_test.sh S32_LE 48000 2 5 -> fails
with the tic_frame_size_at_1fs set to 96 in test/daemon.conf, I have the following results:
./run_test.sh S16_LE 48000 2 5 -> fails
./run_test.sh S24_3LE 48000 2 5 -> fails
./run_test.sh S32_LE 48000 2 5 -> OK
with the tic_frame_size_at_1fs set to 192 in test/daemon.conf, I have the following results:
./run_test.sh S16_LE 48000 2 5 -> OK
./run_test.sh S24_3LE 48000 2 5 -> fails
./run_test.sh S32_LE 48000 2 5 -> OK

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

Yes, that's correct because the last test script uninstall and reinstall the driver at every run.
It's hard to draw a conclusion here but what I can certainly say is that you cannot achieve 1ms latency on the platform you are using, or better saying you cannot achieve it with a good level of reliabilty. And this is also consistent with the results you had using your final setup.
Please consider that the test result is very much affected by contention among the drivers & processes running on your platform and it provides only an indication of the latency you can achieve: the whole solution is not thought to offer a reliable communication channel but it's meant to keep the latency under control.
Anyway it can still be that the problem depends on some race condition or bug in the driver code.
At present I can achieve the best results using ARM boards running a minimal Linux distro dedicated to AES67 only.

from aes67-linux-daemon.

jtc-dolby avatar jtc-dolby commented on June 7, 2024

OK. I think that is a reasonable theory. My machine is pretty old and 1ms is quite tight. I'd like to prove it by increasing the latency and making the problems go away. 192 (4ms) certainly looks better. Can I go higher? How about 480 (10ms)?

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

yes, I have succesfully tried to raise it up 480 (10ms) included. If think even more should be possible.
I have just noticed that the last patch I released may create troubles. Please repeat the test leaving out the folloiwing patch for the driver:

ravenna-alsa-fix-playback-rw-mode.patch

This was not included at the time I ran the tests my side. I will revise it as soon as I have time.

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

Hi @jtc-dolby,
I have reworked the patch for the read-write interleaved mode and also simplified and unified the driver PCM interface. See commit 8a56006
The previous patch was not always working properly and causing troubles also in memory mapped mode.
After this new patch I re-tested the configurations above multiple times on both X86_64 and ARM and I consistently have the same results already reported (by me) with memory mapped and read-write interleaved modes.
Could you re-try the tests your side using the same configurations you used previously and the latest build ?
Thanks.

from aes67-linux-daemon.

bondagit avatar bondagit commented on June 7, 2024

Hi Guido ,
thanks a lot for taking the time to run these tests. Your timely contribution is very much appreciated ;-) and your tests also show that there is no open issue on the X86_64 platform.
I have also run more tests my side and the driver situation looks pretty stable now.
Of course as you experienced the tests results may be affected by other loads running on the CPU at the same time and this has been discussed already and the problem could be mitigated by setting high priority for aplay & arecord or just by taking care to stop additional loads during the tests ...
I will move the test suite to the master branch as replacement of the demo, add some documentation and prepare the repo for release 1.0

from aes67-linux-daemon.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.