Recently I discovered your website, which helps auto-generate commands for video fade and audio crossfade. It was a lifesaver for me.
But there's one critical problem: the video offset. I was joining multiple videos, like 4 videos at once, and I used your script to fade and acrossfade them. Joining took a lot of time, since it was an almost week-long video, only to find out that the audio was not in sync with the video.
I used a fade duration of 1 in the settings, and found that the sync can drift by about 3 seconds over 4 videos.
Then I spent a lot of time figuring out why this happens. I tried splitting the video and the audio to find out the output durations, did a lot of digging on the internet and lots of math, only to find out that the offset had to be shifted by -1 second.
For example, I had this script generated from your website:
name: 2024-04-09_11-08-02.mp4, duration: 10.273177 (seconds, float)
name: 2024-04-09_11-08-14.mp4, duration: 35.673177 (seconds, float)
name: 2024-04-09_11-08-51.mp4, duration: 37.639844 (seconds, float)
name: 2024-04-09_11-09-29.mp4, duration: 26.10651 (seconds, float)
name: 2024-04-09_11-09-57.mp4, duration: 30.373177 (seconds, float)
name: 2024-04-09_11-10-29.mp4, duration: 21.839844 (seconds, float)
The total video duration will be 161.905729 with no fade transition.
Since it has 6 videos, I took off 6s, which means it is supposed to be 155.905729 long.
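As a sanity check on that arithmetic, here is a small sketch. It assumes every transition overlaps two neighbouring clips by exactly the fade duration; `expected_length` is just a helper name made up for this post. Under that assumption 6 clips are joined by only 5 transitions, so a 1s fade would take off 5s rather than 6s.

```python
# Clip durations in seconds, copied from the list above.
durations = [10.273177, 35.673177, 37.639844, 26.10651, 30.373177, 21.839844]

def expected_length(durations, fade):
    """Length of the joined output if each transition overlaps two clips by `fade` seconds."""
    transitions = len(durations) - 1  # N clips are joined by N-1 transitions
    return sum(durations) - transitions * fade

print(round(sum(durations), 6))                    # plain concatenation, no fades
print(round(expected_length(durations, 1.0), 6))   # 1s fades
print(round(expected_length(durations, 0.5), 6))   # 0.5s fades
```

With these inputs the three values come out as 161.905729, 156.905729 and 159.405729.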
ffmpeg -vsync 0 -c:v hevc_cuvid -i 2024-04-09_11-08-02.mp4 -c:v hevc_cuvid -i 2024-04-09_11-08-14.mp4 -c:v hevc_cuvid -i 2024-04-09_11-08-51.mp4 -c:v hevc_cuvid -i 2024-04-09_11-09-29.mp4 -c:v hevc_cuvid -i 2024-04-09_11-09-57.mp4 -c:v hevc_cuvid -i 2024-04-09_11-10-29.mp4 -filter_complex "[0]settb=AVTB[0:v];[1]settb=AVTB[1:v];[2]settb=AVTB[2:v];[3]settb=AVTB[3:v];[4]settb=AVTB[4:v];[5]settb=AVTB[5:v];[0]atrim=0:10.273177[0:a];[1]atrim=0:35.673177[1:a];[2]atrim=0:37.639844[2:a];[3]atrim=0:26.10651[3:a];[4]atrim=0:30.373177[4:a];[5]atrim=0:21.839844[5:a];[0:v][1:v]xfade=transition=fade:duration=0.5:offset=9.773[v1];[v1][2:v]xfade=transition=fade:duration=0.5:offset=44.946[v2];[v2][3:v]xfade=transition=fade:duration=0.5:offset=82.086[v3];[v3][4:v]xfade=transition=fade:duration=0.5:offset=107.693[v4];[v4][5:v]xfade=transition=fade:duration=0.5:offset=137.566,format=yuv420p[video];[0:a][1:a]acrossfade=d=0.5:c1=tri:c2=tri[a1];[a1][2:a]acrossfade=d=0.5:c1=tri:c2=tri[a2];[a2][3:a]acrossfade=d=0.5:c1=tri:c2=tri[a3];[a3][4:a]acrossfade=d=0.5:c1=tri:c2=tri[a4];[a4][5:a]acrossfade=d=0.5:c1=tri:c2=tri[audio]" -c:v hevc_nvenc -rc vbr -cq 30 -qmin 30 -qmax 30 -profile:v main -pix_fmt yuv420p -b:v 0K -b_ref_mode 0 -movflags faststart -c:a libopus -ar 48000 -b:a 96k -map "[audio]" -map "[video]" a.mp4
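The xfade offsets in this command are consistent with a simple rule: the k-th offset is the total length of the first k clips minus k fade durations, rounded to 3 decimals. That is my reading of the numbers, not the website's actual code, and `xfade_offsets` is a hypothetical name. A sketch that reproduces them:

```python
from itertools import accumulate

# Clip durations in seconds, copied from the list above.
durations = [10.273177, 35.673177, 37.639844, 26.10651, 30.373177, 21.839844]

def xfade_offsets(durations, fade):
    """k-th offset = total length of the first k clips minus k fades."""
    cumulative = list(accumulate(durations))[:-1]  # the last clip starts no transition
    return [round(total - k * fade, 3)
            for k, total in enumerate(cumulative, start=1)]

print(xfade_offsets(durations, 0.5))
```

For a 0.5s fade this gives 9.773, 44.946, 82.086, 107.693 and 137.566, the same values as in the command.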
The video and audio were out of sync,
so I split the command into separate video and audio commands and found that:
- The video was 00:02:39.43 long
- The audio was 00:02:31.34 long
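To put the mismatch in seconds, a quick sketch (`to_seconds` is only an illustrative helper) that converts the two timestamps and takes the difference:

```python
def to_seconds(timestamp):
    """Convert an HH:MM:SS.ss timestamp, as printed by ffmpeg, to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

video = to_seconds("00:02:39.43")
audio = to_seconds("00:02:31.34")
print(round(video - audio, 2))  # how much longer the video is than the audio
```

So the video output is about 8.09 seconds longer than the audio output.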
Then I adjusted the offsets on the xfade filter based on how many videos were being joined, shifting by -1.1s on the first transition, -2.2s on the second transition, and so on:
ffmpeg -vsync 0 -c:v hevc_cuvid -i 2024-04-09_11-08-02.mp4 -c:v hevc_cuvid -i 2024-04-09_11-08-14.mp4 -c:v hevc_cuvid -i 2024-04-09_11-08-51.mp4 -c:v hevc_cuvid -i 2024-04-09_11-09-29.mp4 -c:v hevc_cuvid -i 2024-04-09_11-09-57.mp4 -c:v hevc_cuvid -i 2024-04-09_11-10-29.mp4 -filter_complex "[0]settb=AVTB[0:v];[1]settb=AVTB[1:v];[2]settb=AVTB[2:v];[3]settb=AVTB[3:v];[4]settb=AVTB[4:v];[5]settb=AVTB[5:v];[0]atrim=0:10.273177[0:a];[1]atrim=0:35.673177[1:a];[2]atrim=0:37.639844[2:a];[3]atrim=0:26.10651[3:a];[4]atrim=0:30.373177[4:a];[5]atrim=0:21.839844[5:a];[0:v][1:v]xfade=transition=fade:duration=1:offset=9.273[v1];[v1][2:v]xfade=transition=fade:duration=1:offset=43.946[v2];[v2][3:v]xfade=transition=fade:duration=1:offset=80.586[v3];[v3][4:v]xfade=transition=fade:duration=1:offset=105.693[v4];[v4][5:v]xfade=transition=fade:duration=1:offset=135.066,format=yuv420p[video];[0:a][1:a]acrossfade=d=1:c1=tri:c2=tri[a1];[a1][2:a]acrossfade=d=1:c1=tri:c2=tri[a2];[a2][3:a]acrossfade=d=1:c1=tri:c2=tri[a3];[a3][4:a]acrossfade=d=1:c1=tri:c2=tri[a4];[a4][5:a]acrossfade=d=1:c1=tri:c2=tri[audio]" -c:v h264_nvenc -rc vbr -cq 30 -qmin 30 -qmax 30 -profile:v main -pix_fmt yuv420p -b:v 0K -b_ref_mode 0 -movflags faststart -c:a libopus -ar 48000 -b:a 96k -map "[audio]" -map "[video]" a.mp4
With this I got nearly the same duration as the audio: 00:02:31.33.
I did the same for the audio on its own:
ffmpeg -vsync 0 -c:v hevc_cuvid -i 2024-04-09_11-08-02.mp4 -c:v hevc_cuvid -i 2024-04-09_11-08-14.mp4 -c:v hevc_cuvid -i 2024-04-09_11-08-51.mp4 -c:v hevc_cuvid -i 2024-04-09_11-09-29.mp4 -c:v hevc_cuvid -i 2024-04-09_11-09-57.mp4 -c:v hevc_cuvid -i 2024-04-09_11-10-29.mp4 -filter_complex "[0]atrim=0:10.273177[0:a];[1]atrim=0:35.673177[1:a];[2]atrim=0:37.639844[2:a];[3]atrim=0:26.10651[3:a];[4]atrim=0:30.373177[4:a];[5]atrim=0:21.839844[5:a];[0:a][1:a]acrossfade=d=1:c1=tri:c2=tri[a1];[a1][2:a]acrossfade=d=1:c1=tri:c2=tri[a2];[a2][3:a]acrossfade=d=1:c1=tri:c2=tri[a3];[a3][4:a]acrossfade=d=1:c1=tri:c2=tri[a4];[a4][5:a]acrossfade=d=1:c1=tri:c2=tri[audio]" -c:v h264_nvenc -b:v 10M -map "[audio]" temp.wav
Of course, I got 00:02:31.34.
BUT the audio still ended up out of sync with the video.
Then I tried the -1s offset on the video again:
ffmpeg -vsync 0 -c:v hevc_cuvid -i 2024-04-09_11-08-02.mp4 -c:v hevc_cuvid -i 2024-04-09_11-08-14.mp4 -c:v hevc_cuvid -i 2024-04-09_11-08-51.mp4 -c:v hevc_cuvid -i 2024-04-09_11-09-29.mp4 -c:v hevc_cuvid -i 2024-04-09_11-09-57.mp4 -c:v hevc_cuvid -i 2024-04-09_11-10-29.mp4 -filter_complex "[0]settb=AVTB[0:v];[1]settb=AVTB[1:v];[2]settb=AVTB[2:v];[3]settb=AVTB[3:v];[4]settb=AVTB[4:v];[5]settb=AVTB[5:v];[0:v][1:v]xfade=transition=fade:duration=1:offset=8.273[v1];[v1][2:v]xfade=transition=fade:duration=1:offset=41.946[v2];[v2][3:v]xfade=transition=fade:duration=1:offset=77.586[v3];[v3][4:v]xfade=transition=fade:duration=1:offset=101.693[v4];[v4][5:v]xfade=transition=fade:duration=1:offset=130.066,format=yuv420p[video]" -c:v h264_nvenc -rc vbr -cq 30 -qmin 30 -qmax 30 -profile:v main -pix_fmt yuv420p -b:v 0K -b_ref_mode 0 -movflags faststart -c:a libopus -ar 48000 -b:a 96k -map "[video]" temp.mp4
and I got 00:02:31.83.
AND somehow this is how it managed to sync with the audio.
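For reference, the offsets in that last command match the usual rule with an extra 1s taken off per transition, and the output length can be estimated from the last offset. This is only a sketch of the workaround, not a confirmed explanation: `shifted_offsets` and `predicted_length` are hypothetical names, and the estimate assumes the last clip starts exactly at the last xfade offset.

```python
from itertools import accumulate

# Clip durations in seconds, copied from the list above.
durations = [10.273177, 35.673177, 37.639844, 26.10651, 30.373177, 21.839844]

def shifted_offsets(durations, fade, extra):
    """k-th offset = length of the first k clips minus k * (fade + extra)."""
    cumulative = list(accumulate(durations))[:-1]
    return [round(total - k * (fade + extra), 3)
            for k, total in enumerate(cumulative, start=1)]

def predicted_length(durations, offsets):
    """Assumes the last clip starts at the last offset and plays to its end."""
    return offsets[-1] + durations[-1]

offsets = shifted_offsets(durations, fade=1.0, extra=1.0)
print(offsets)
print(round(predicted_length(durations, offsets), 2))
```

This gives offsets of 8.273, 41.946, 77.586, 101.693 and 130.066, and an estimated length of about 151.91 seconds, which is close to the 00:02:31.83 I measured.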
Update on this part:
After removing the atrim, I realised that the video and audio durations are not the same, which is why atrim is still needed.
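One way to check this per file is to ask ffprobe for each stream's duration separately. A minimal sketch, assuming ffprobe is on the PATH and that the container reports per-stream durations (some formats print N/A instead, which would raise a ValueError here); `ffprobe_args` and `stream_duration` are illustrative helper names.

```python
import subprocess

def ffprobe_args(path, stream):
    """Build an ffprobe command that prints only the duration of one stream."""
    return ["ffprobe", "-v", "error",
            "-select_streams", stream,            # e.g. "v:0" or "a:0"
            "-show_entries", "stream=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path]

def stream_duration(path, stream):
    """Run ffprobe and return the stream duration in seconds as a float."""
    result = subprocess.run(ffprobe_args(path, stream),
                            capture_output=True, text=True, check=True)
    return float(result.stdout.strip())

# Example usage (needs the real file and ffprobe installed):
# video = stream_duration("2024-04-09_11-08-02.mp4", "v:0")
# audio = stream_duration("2024-04-09_11-08-02.mp4", "a:0")
# print(video - audio)
```

If the audio stream turns out shorter than the video stream in every file, the difference adds up with each clip, which would fit the drift getting worse as more videos are joined.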
Conclusion:
WHAT THE HECK is even going on lol. This has been driving me crazy the whole day.
I still can't confirm whether this is a problem with my videos' durations or a -1s offset bug in the generated script. I remember it did work on the first try, but maybe I just didn't notice whether the audio was in sync; that was only 2 videos joined together, so it wasn't obvious, and it only shows up when there are more videos to join. In the end, I still managed to find a workaround for this issue.
Please confirm this issue. Thanks!
Update 1: I made a lot of edits to this post since I keep experimenting with the values I've got right now. I originally had this issue using 0.5 for the duration setting, but I think a 1s duration explains the issue better. Does that mean a 0.5s transition duration = -0.5s offset? I don't even know anymore lol.