wblgers / py_speech_seg


121.0 121.0 39.0 4.46 MB

A toolkit for speech segmentation based on BIC (Bayesian Information Criterion) and neural networks such as BiLSTM

Python 100.00%


py_speech_seg's Issues

Please input the best K value: ?

How is this K value determined?

Why does running the program on my own recorded audio file raise an exception?

I would appreciate it very much if you could help me solve this problem.

Could you share the format of the training audio?

Hello, I have already read the prepare_dataset code, but could you describe the format of the example training audio data? Thanks.

Can music in a recording be cut out?

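If a phone-call recording contains a ringtone, can it be automatically detected and removed?
Also, how accurate is your model at speaker separation? Have you measured any corresponding metrics?
Thanks!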

The output WAV file cannot be read by Python's "wave" module: wave.Error: unknown format: 3

Your output WAV file cannot be read by Python's "wave" module. Error message:
wave.Error: unknown format: 3

I don't know the WAV format well, but your output file is encoded as "F.P. PCM" (floating-point PCM), while the files that work are encoded as "Signed PCM".

I would appreciate it very much if you could help me solve this problem.
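Format code 3 in a WAV header is WAVE_FORMAT_IEEE_FLOAT, meaning the samples are stored as floating-point numbers; Python's standard-library wave module only handles integer PCM, which is why it raises this error. A minimal sketch, assuming the segment is available as a mono NumPy float array in [-1, 1], that rescales it to 16-bit signed PCM and writes it with wave (write_pcm16_wav is a hypothetical helper, not part of py_speech_seg):

```python
import wave

import numpy as np


def write_pcm16_wav(f, samples, sr):
    """Write mono float samples in [-1, 1] to f as a 16-bit signed PCM WAV.

    f may be a filename or a binary file-like object.
    Hypothetical helper, not part of py_speech_seg.
    """
    # values outside [-1, 1] are clipped instead of wrapping around
    clipped = np.clip(np.asarray(samples, dtype=np.float64), -1.0, 1.0)
    # scale to the int16 range; '<i2' is little-endian, as WAV requires
    pcm = np.round(clipped * 32767).astype("<i2")
    with wave.open(f, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes per sample = 16-bit
        w.setframerate(sr)
        w.writeframes(pcm.tobytes())
```

If you already use the soundfile package, soundfile.write(path, data, sr, subtype='PCM_16') should produce the same kind of file.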

Traceback (most recent call last):
  File "c:\Users\jiaqi.Li\.vscode\extensions\ms-python.python-2019.4.12954\pythonFiles\ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "c:\Users\jiaqi.Li\.vscode\extensions\ms-python.python-2019.4.12954\pythonFiles\lib\python\ptvsd\__main__.py", line 410, in main
    run()
  File "c:\Users\jiaqi.Li\.vscode\extensions\ms-python.python-2019.4.12954\pythonFiles\lib\python\ptvsd\__main__.py", line 291, in run_file
    runpy.run_path(target, run_name='__main__')
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "e:\moviesub_proj\py_speech_seg-master\multi_detect.py", line 11, in <module>
    cluster_method='bic')
  File "e:\moviesub_proj\py_speech_seg-master\speech_segmentation.py", line 122, in multi_segmentation
    y, sr = librosa.load(file, sr=sr)
  File "D:\ProgramData\Anaconda3\lib\site-packages\librosa\core\audio.py", line 119, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "D:\ProgramData\Anaconda3\lib\site-packages\audioread\__init__.py", line 111, in audio_open
    return BackendClass(path)
  File "D:\ProgramData\Anaconda3\lib\site-packages\audioread\rawread.py", line 62, in __init__
    self._fh = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'E:\moviesub_proj\duihua_sample.wav'
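The last frame of this traceback shows that librosa.load (through audioread) was given a path that does not exist, 'E:\moviesub_proj\duihua_sample.wav', while the script itself runs from e:\moviesub_proj\py_speech_seg-master, one directory below. A minimal sketch of a pre-check that fails with a more informative message (check_audio_path is a hypothetical helper, not part of py_speech_seg). When a Windows path is written as a Python string literal, a raw string such as r"E:\moviesub_proj\duihua_sample.wav" keeps the backslashes from being read as escape sequences.

```python
from pathlib import Path


def check_audio_path(path):
    """Return the absolute path of an existing audio file, or raise a
    FileNotFoundError that also reports the current working directory.

    Hypothetical helper, not part of py_speech_seg.
    """
    # relative paths are resolved against the current working directory
    p = Path(path).expanduser().resolve()
    if not p.is_file():
        raise FileNotFoundError(
            f"Audio file not found: {p} (current working directory: {Path.cwd()})"
        )
    return p
```

Calling it on the value that is later passed to librosa.load (converting back with str(...) if a plain string is needed) shows exactly where Python looked for the file.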

Error when running under Windows with Python 3.8

Hello, I get an error when running under Windows with Python 3.8. Could you help me figure out what the problem is? Screenshot:

Why is the BIC computed from the trace of the covariance matrix instead of the determinant, as in the paper and your blog?

Why was the BIC formula changed to use the trace of the covariance matrix? Your blog uses the determinant.
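For reference, the determinant form needs log|Σ| for each window. This is not a claim about why the repository uses the trace, but one practical point: np.log(np.linalg.det(cov)) can underflow to log(0) for high-dimensional, low-variance features, whereas numpy.linalg.slogdet returns the log-determinant directly. A minimal sketch, assuming features are laid out as in the issue's code (rows = coefficients, columns = frames) and using the maximum-likelihood covariance (log_det_cov is a hypothetical name):

```python
import numpy as np


def log_det_cov(X):
    """log|Sigma| of the ML covariance of X (rows = feature dims, cols = frames).

    Returns -inf when the covariance is singular or not positive definite.
    Hypothetical helper, not part of py_speech_seg.
    """
    cov = np.cov(X, bias=True)           # rows are variables; bias=True divides by N
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        # e.g. constant features or fewer frames than dimensions
        return -np.inf
    return logdet
```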

Finding the beginning and end

How can I find the beginning and end of the speech in a recording?
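One simple approach, not the repository's VAD, is to split the signal into frames, compute short-time energy, and take the first and last frames whose energy exceeds a fraction of the maximum. A minimal sketch, assuming a mono float signal that is already loaded (for example with librosa.load); the function name, frame sizes, and the 0.1 relative threshold are illustrative assumptions:

```python
import numpy as np


def find_speech_endpoints(x, sr, framelen=256, frameshift=128, rel_threshold=0.1):
    """Return (start_sec, end_sec) of the region whose short-time energy exceeds
    rel_threshold * (maximum frame energy), or None if the signal is silent.

    Hypothetical helper using a plain energy threshold.
    """
    x = np.asarray(x, dtype=np.float64)
    if len(x) < framelen:
        return None
    n_frames = 1 + (len(x) - framelen) // frameshift
    # row k holds samples [k*frameshift, k*frameshift + framelen)
    idx = np.arange(framelen)[None, :] + frameshift * np.arange(n_frames)[:, None]
    energy = np.sum(x[idx] ** 2, axis=1)
    if energy.max() == 0:
        return None
    active = np.flatnonzero(energy > rel_threshold * energy.max())
    start = active[0] * frameshift / sr
    end = (active[-1] * frameshift + framelen) / sr
    return start, end
```

A fixed relative threshold is the simplest choice; noisy recordings usually need a threshold estimated from the background level instead.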

BIC detection window

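A question: I don't quite understand the meaning of the +200 in wStart = wStart + detBIC + 200. Doesn't this skip the 200 points after the change point, implicitly assuming there is no change point among those 200 points?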

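import numpy as np

def speech_segmentation(mfccs):
    # Slide and grow a BIC analysis window over the MFCC frames (columns of mfccs)
    # and record the absolute frame index of every detected change point.
    wStart = 0      # first frame of the current window
    wEnd = 200      # one past the last frame of the current window
    wGrow = 200     # frames added to the window when no change point is found
    delta = 25      # passed through to compute_bic

    m, n = mfccs.shape  # m coefficients x n frames

    store_cp = []
    index = 0
    while wEnd < n:
        featureSeg = mfccs[:, wStart:wEnd]
        # compute_bic (not shown here) appears to return the change-point offset
        # within the window; a value <= 0 means no change point was found
        detBIC = compute_bic(featureSeg, delta)
        index = index + 1
        if detBIC > 0:
            temp = wStart + detBIC      # absolute frame index of the change point
            store_cp.append(temp)
            # restart the window 200 frames past the detected change point
            wStart = wStart + detBIC + 200
            wEnd = wStart + wGrow
        else:
            wEnd = wEnd + wGrow         # no change found: enlarge the window

    return np.array(store_cp)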
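compute_bic itself is not shown in the issue. The loop uses its return value as an offset (temp = wStart + detBIC), so it presumably returns the index of the best split inside the window, or a non-positive value when there is none. A hypothetical sketch with that contract, using the determinant-based ΔBIC; treating delta as the minimum number of frames on each side of a candidate split is our assumption, not something the issue states:

```python
import numpy as np


def _logdet_cov(X):
    # log-determinant of the ML covariance; rows = feature dims, cols = frames
    sign, logdet = np.linalg.slogdet(np.cov(X, bias=True))
    return logdet if sign > 0 else -np.inf


def compute_bic(featureSeg, delta):
    """Hypothetical sketch: return the frame offset of the split with the largest
    positive delta-BIC in featureSeg (shape m x n), or 0 if no split is positive."""
    m, n = featureSeg.shape
    if n <= 2 * delta:
        return 0
    logdet0 = _logdet_cov(featureSeg)
    if not np.isfinite(logdet0):
        return 0
    # BIC penalty for the extra mean vector and full covariance (lambda = 1)
    penalty = 0.5 * (m + 0.5 * m * (m + 1)) * np.log(n)
    best_i, best_bic = 0, 0.0
    for i in range(delta, n - delta):
        ld1 = _logdet_cov(featureSeg[:, :i])
        ld2 = _logdet_cov(featureSeg[:, i:])
        if not (np.isfinite(ld1) and np.isfinite(ld2)):
            continue  # skip splits whose sub-covariance is singular
        bic = 0.5 * (n * logdet0 - i * ld1 - (n - i) * ld2) - penalty
        if bic > best_bic:
            best_i, best_bic = i, bic
    return best_i
```

Scanning every split point keeps the sketch simple; it is O(n) covariance estimates per window, which is fine for windows of a few hundred frames.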

Why is this multiplied by 0.5? The formula does not include that factor.

BIC = 0.5*(n*np.log(det0) - index*np.log(det1) - (n-index)*np.log(det2)) - 0.5*(m + 0.5*m*(m+1))*np.log(n)

I mean the leading 0.5. As far as I can see, the formula you gave does not multiply by 0.5.
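For comparison, when ΔBIC is derived from Gaussian maximum-likelihood log-likelihoods (log L = -(N/2) log|Σ| plus terms that cancel between the one-model and two-model hypotheses), every log-determinant term carries a factor N/2, and the BIC penalty is ½ × (number of extra parameters) × log N. Written out with λ as the penalty weight, and assuming the mapping N = n, N₁ = index, N₂ = n - index, d = m, |Σ|, |Σ₁|, |Σ₂| = det0, det1, det2 onto the code line above:

```latex
\begin{aligned}
\Delta\mathrm{BIC}(i)
  &= \frac{N}{2}\log\lvert\Sigma\rvert
   - \frac{N_1}{2}\log\lvert\Sigma_1\rvert
   - \frac{N_2}{2}\log\lvert\Sigma_2\rvert
   - \lambda\,\frac{1}{2}\left(d + \frac{1}{2}d(d+1)\right)\log N \\
  &= \frac{1}{2}\left(N\log\lvert\Sigma\rvert - N_1\log\lvert\Sigma_1\rvert - N_2\log\lvert\Sigma_2\rvert\right)
   - \frac{1}{2}\left(d + \frac{1}{2}d(d+1)\right)\log N
   \qquad (\lambda = 1)
\end{aligned}
```

Under that mapping, the leading 0.5 in the code is the factor ½ pulled outside the bracket.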

enframe error

Traceback (most recent call last):
  File "multi_detect.py", line 54, in <module>
    main()
  File "multi_detect.py", line 26, in main
    seg_point = seg.multi_segmentation(wavfile,outdir,sr,mono,frame_size,frame_shift,plot_seg=False,save_seg=True,classify_seg=False)
  File "E:\duplicate_data\test1\speech_seg-master\speech_segmentation.py", line 112, in multi_segmentation
    x1, x2 = vad.vad(temp, sr=sr, framelen=frame_size, frameshift=frame_shift)
  File "E:\duplicate_data\test1\speech_seg-master\voice_activity_detect.py", line 27, in vad
    signs = (tmp1* tmp2) < 0
ValueError: operands could not be broadcast together with shapes (1288,256) (1287,256)
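The two shapes differ by exactly one frame (1288 vs 1287), which suggests tmp1 and tmp2 were framed separately from two signals of slightly different length, for example the signal without its last sample and without its first sample. That is an inference from the traceback, not from the source of voice_activity_detect.py. One way to avoid the mismatch is to frame the signal once and compare adjacent samples inside each frame. A minimal sketch; this enframe is a hypothetical stand-in, not the repository's function:

```python
import numpy as np


def enframe(x, framelen, frameshift):
    """Split a 1-D signal into overlapping frames of framelen samples.

    Hypothetical stand-in for the repository's enframe.
    """
    x = np.asarray(x, dtype=np.float64)
    n_frames = max(0, 1 + (len(x) - framelen) // frameshift)
    # row k holds samples [k*frameshift, k*frameshift + framelen)
    idx = np.arange(framelen)[None, :] + frameshift * np.arange(n_frames)[:, None]
    return x[idx]


def zero_crossings_per_frame(x, framelen, frameshift):
    """Count sign changes between adjacent samples inside each frame."""
    frames = enframe(x, framelen, frameshift)
    # both operands come from the same frame matrix, so their shapes always match
    signs = (frames[:, :-1] * frames[:, 1:]) < 0
    return signs.sum(axis=1)
```

Each frame contributes framelen - 1 adjacent pairs. Alternatively, truncating both arrays to the shorter frame count (n = min(len(tmp1), len(tmp2))) before multiplying would also remove the broadcast error.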
