First run face recognition, then Whisper speech detection and speaker diarization, and combine the video and speech diarization results. Then run video captioning, and use LangChain to find the related frame intervals.
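The step that combines the video and speech diarization results could be sketched roughly as below. This is only an illustration under assumptions, not the repository's actual method: it assumes face recognition yields time intervals labelled by face and diarization yields time intervals labelled by speaker, and it links each speaker to the face with the largest total time overlap. The names `Interval`, `overlap`, and `assign_faces_to_speakers` are hypothetical and do not come from this project.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A labelled time span in seconds (hypothetical helper type)."""
    label: str
    start: float
    end: float


def overlap(a: Interval, b: Interval) -> float:
    """Return the length of the time overlap between two intervals (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_faces_to_speakers(speech, faces):
    """Map each speaker label to the face label it overlaps with the longest.

    Assumption: a speaker is most likely the person whose face is on screen
    for the largest share of that speaker's talking time.
    """
    totals = {}
    for seg in speech:
        for face in faces:
            shared = overlap(seg, face)
            if shared > 0:
                by_face = totals.setdefault(seg.label, {})
                by_face[face.label] = by_face.get(face.label, 0.0) + shared
    # Pick the face with the largest accumulated overlap for each speaker.
    return {spk: max(by_face, key=by_face.get) for spk, by_face in totals.items()}


# Made-up example data, for illustration only.
speech = [
    Interval("SPEAKER_00", 0.0, 4.0),
    Interval("SPEAKER_01", 4.0, 9.0),
    Interval("SPEAKER_00", 9.0, 12.0),
]
faces = [
    Interval("face_A", 0.0, 5.0),
    Interval("face_B", 3.5, 9.5),
    Interval("face_A", 9.0, 12.0),
]
print(assign_faces_to_speakers(speech, faces))
```

Speakers with no overlapping face interval are simply left out of the mapping, so a caller would need to handle off-screen voices separately.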
from video import VideoSeparator

vs = VideoSeparator(
    tolerance=[0.66, 0.66, 0.68],
    max_encoding_length=20,
    path="./Web/my-video.mp4",
)
# clip = vs.read_video()
vs.process_video_and_audio()
python blip2_caption.py
The code above handles data processing. Once the data is processed, run the following commands to start the web app:
uvicorn main:app --reload
cd Web/
http-server