wblgers / py_speech_seg
A toolkit for speech segmentation based on BIC and neural networks such as BiLSTM.
Hello, how did you download the AMI dataset? I would appreciate an answer when you have time.
How is this k value determined?
Hello, I have read the prepare_dataset code, but could you provide an example of the training audio data format? Thanks.
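If a phone recording contains a ringtone, can it be detected and cut out automatically?
Also, how accurate is your model at speaker diarization? Have you measured any corresponding metrics?
Thanks!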
The output WAV files cannot be read by Python's "wave" module. Error message:
wave.Error: unknown format: 3
I don't know the WAV format well, but your output files are encoded as "F.P. PCM" (floating point), while other files that work fine are encoded as "Signed PCM".
I would appreciate it very much if you could help me solve this problem.
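Format code 3 in that error is WAVE_FORMAT_IEEE_FLOAT (the "F.P. PCM" files); Python's wave module reads integer PCM, not float. One workaround, assuming the segments are mono float arrays in [-1.0, 1.0] (what librosa.load returns by default), is to convert them to 16-bit signed PCM before writing. A minimal sketch using only numpy and the standard-library wave module; write_pcm16_wav is a hypothetical helper, not part of the repository.

```python
import wave

import numpy as np


def write_pcm16_wav(path, samples, sr):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit signed PCM WAV file.

    Hypothetical helper: the stdlib `wave` module writes format code 1 (PCM),
    which `wave.open(path, "rb")` can read back.
    """
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim != 1:
        raise ValueError("expected a 1-D (mono) array of samples")
    # Clip out-of-range values, scale to the int16 range and store
    # little-endian (astype truncates toward zero).
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wf.setframerate(int(sr))
        wf.writeframes(pcm.tobytes())
```

For example, write_pcm16_wav("seg_0.wav", y, sr) for a segment y; the resulting file should then open with wave.open("seg_0.wav", "rb") without the "unknown format: 3" error. Scaling by 32767 rather than 32768 keeps +1.0 inside the int16 range.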
Traceback (most recent call last):
  File "c:\Users\jiaqi.Li\.vscode\extensions\ms-python.python-2019.4.12954\pythonFiles\ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "c:\Users\jiaqi.Li\.vscode\extensions\ms-python.python-2019.4.12954\pythonFiles\lib\python\ptvsd\__main__.py", line 410, in main
    run()
  File "c:\Users\jiaqi.Li\.vscode\extensions\ms-python.python-2019.4.12954\pythonFiles\lib\python\ptvsd\__main__.py", line 291, in run_file
    runpy.run_path(target, run_name='__main__')
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "e:\moviesub_proj\py_speech_seg-master\multi_detect.py", line 11, in <module>
    cluster_method='bic')
  File "e:\moviesub_proj\py_speech_seg-master\speech_segmentation.py", line 122, in multi_segmentation
    y, sr = librosa.load(file, sr=sr)
  File "D:\ProgramData\Anaconda3\lib\site-packages\librosa\core\audio.py", line 119, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "D:\ProgramData\Anaconda3\lib\site-packages\audioread\__init__.py", line 111, in audio_open
    return BackendClass(path)
  File "D:\ProgramData\Anaconda3\lib\site-packages\audioread\rawread.py", line 62, in __init__
    self._fh = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'E:\\moviesub_proj\\duihua_sample.wav'
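The missing path is E:\moviesub_proj\duihua_sample.wav, while the script lives in e:\moviesub_proj\py_speech_seg-master, so the filename was most likely relative and got resolved against the current working directory (a debugger often launches from the parent folder) instead of the script's folder. Assuming multi_detect.py passes a relative filename, a minimal sketch that resolves it against the script's own directory and fails early with a clearer message; resolve_audio_path is a hypothetical helper, not part of the repository.

```python
import os


def resolve_audio_path(filename, script_path):
    """Resolve `filename` relative to the directory containing `script_path`
    (not the current working directory) and check that the file exists.

    Hypothetical helper; absolute filenames are only checked, not changed.
    """
    if not os.path.isabs(filename):
        script_dir = os.path.dirname(os.path.abspath(script_path))
        filename = os.path.join(script_dir, filename)
    if not os.path.isfile(filename):
        raise FileNotFoundError(
            f"audio file not found: {filename!r} "
            f"(current working directory: {os.getcwd()!r})"
        )
    return filename
```

Inside multi_detect.py this could be called as resolve_audio_path("duihua_sample.wav", __file__) and the result passed on to multi_segmentation, so the script works regardless of which folder it is launched from.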
Why was the BIC formula changed to use the trace of the covariance matrix? Your blog post uses the determinant.
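For reference, the commonly used determinant-based ΔBIC for a candidate split at frame i of a window of N frames with d-dimensional features is shown below. Σ, Σ₁ and Σ₂ are the covariance matrices of the whole window, the first i frames and the remaining N - i frames, and λ is a penalty weight (λ = 1 in the standard form). This is the general textbook formulation, not a statement about which variant the repository's code implements. Since log|Σ| is the sum of the logs of the covariance eigenvalues while the trace is their plain sum, replacing one with the other yields a different statistic.

```latex
\Delta\mathrm{BIC}(i) = \frac{N}{2}\log|\Sigma| - \frac{i}{2}\log|\Sigma_1| - \frac{N-i}{2}\log|\Sigma_2|
  - \lambda\,\frac{1}{2}\left(d + \frac{d(d+1)}{2}\right)\log N,
\qquad
\hat{\imath} = \arg\max_i \Delta\mathrm{BIC}(i),\ \text{change point if } \Delta\mathrm{BIC}(\hat{\imath}) > 0
```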
How can I find where speech starts and ends in a recording?
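One simple way to locate where speech starts and ends is a short-time energy threshold: frame the signal, compute each frame's energy relative to the loudest frame, and take the first and last frames above a threshold. This is a generic sketch, not the repository's voice_activity_detect.vad; find_speech_endpoints, the 25 ms / 10 ms framing and the -35 dB threshold are assumptions to tune for real recordings.

```python
import numpy as np


def find_speech_endpoints(x, sr, frame_ms=25.0, hop_ms=10.0, rel_threshold_db=-35.0):
    """Return (start_sec, end_sec) of the span whose frame energy is within
    `rel_threshold_db` of the loudest frame, or None for short/silent input.

    Hypothetical energy-threshold endpoint detector (not the repo's VAD).
    """
    x = np.asarray(x, dtype=np.float64)
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    if len(x) < frame:
        return None
    n_frames = 1 + (len(x) - frame) // hop
    # Row k holds sample indices k*hop ... k*hop + frame - 1.
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    energy = np.sum(x[idx] ** 2, axis=1)
    if energy.max() <= 0:
        return None  # all-zero signal: no speech
    # Energy of each frame in dB relative to the loudest frame (<= 0 dB).
    energy_db = 10 * np.log10(energy / energy.max() + 1e-12)
    active = np.flatnonzero(energy_db > rel_threshold_db)
    start = active[0] * hop / sr
    end = (active[-1] * hop + frame) / sr
    return start, end
```

A relative threshold keeps the detector independent of the recording level; in noisy recordings a fixed threshold above an estimated noise floor, or the repository's own VAD, may work better.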
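A question: I don't quite understand what the + 200 means in wStart = wStart + detBIC + 200.
Doesn't this skip the 200 points after the split point, implicitly assuming there is no other split point within those 200 points?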
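import numpy as np

def speech_segmentation(mfccs):
    # Sliding-window BIC change-point search over an (m features x n frames) MFCC matrix.
    # compute_bic is not shown here; it is assumed to return the change-point offset
    # within the window, or a non-positive value when no change is detected.
    wStart = 0
    wEnd = 200
    wGrow = 200
    delta = 25
    m, n = mfccs.shape
    store_cp = []
    index = 0
    while wEnd < n:
        featureSeg = mfccs[:, wStart:wEnd]
        detBIC = compute_bic(featureSeg, delta)
        index = index + 1
        if detBIC > 0:
            # Change point found: record its absolute frame index.
            temp = wStart + detBIC
            store_cp.append(temp)
            # Restart the window 200 frames past the change point.
            wStart = wStart + detBIC + 200
            wEnd = wStart + wGrow
        else:
            # No change point in this window: grow it.
            wEnd = wEnd + wGrow
    return np.array(store_cp)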
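compute_bic itself is not shown in the question. A minimal sketch of what such a function could look like, assuming it scans every split at least delta frames from either window edge, scores each with the log-determinant ΔBIC (penalty weight λ = penalty_weight), and returns the best split offset, or -1 if no ΔBIC is positive. compute_bic_sketch and penalty_weight are hypothetical names, not the repository's code. With any search of this kind, the + 200 in the loop starts the next window 200 frames after a detected change point, so a second change within those 200 frames cannot be found; in effect it imposes a minimum segment length.

```python
import numpy as np


def _log_det_cov(x):
    """log|Sigma| of the sample covariance of x (rows = feature dims, cols = frames)."""
    cov = np.atleast_2d(np.cov(x))  # np.cov treats rows as variables by default
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("singular covariance: use more frames or a larger delta")
    return logdet


def compute_bic_sketch(features, delta, penalty_weight=1.0):
    """Hypothetical stand-in for compute_bic.

    Scans every split i with delta <= i < n - delta, scores it with the
    log-determinant Delta-BIC, and returns the best split offset if its
    Delta-BIC is positive, otherwise -1.
    """
    d, n = features.shape
    logdet_all = _log_det_cov(features)
    # BIC penalty for the extra Gaussian (mean + full covariance).
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    best_split, best_bic = -1, -np.inf
    for i in range(delta, n - delta):
        bic = 0.5 * (n * logdet_all
                     - i * _log_det_cov(features[:, :i])
                     - (n - i) * _log_det_cov(features[:, i:])) - penalty_weight * penalty
        if bic > best_bic:
            best_split, best_bic = i, bic
    return best_split if best_bic > 0 else -1
```

The slogdet call avoids overflow or underflow of the raw determinant for higher-dimensional MFCCs; windows too short to hold any candidate split simply return -1.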
BIC = 0.5*(n*np.log(det0)-index*np.log(det1)-(n-index)*np.log(det2))-0.5*(m+0.5*m*(m+1))*np.log(n)
I mean the leading 0.5: in the formula you gave, I don't see why it needs to be multiplied by this 0.5.
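Working through the maximized Gaussian log-likelihood shows where the leading 1/2 comes from and what dropping it does. Here ℓ(X) is the log-likelihood of N d-dimensional frames under one Gaussian with its maximum-likelihood covariance Σ; X₁ and X₂ are the i frames before and the N - i frames after the split; A and P are shorthand introduced only for this derivation; λ is the penalty weight (the BIC line above corresponds to λ = 1 with m = d and index = i). The constant terms cancel because i + (N - i) = N.

```latex
\begin{aligned}
\ell(X) &= -\frac{N}{2}\left(d\log 2\pi + \log|\Sigma| + d\right) \\
A &= N\log|\Sigma| - i\log|\Sigma_1| - (N-i)\log|\Sigma_2| \\
P &= \frac{1}{2}\left(d + \frac{d(d+1)}{2}\right)\log N \\
\Delta\mathrm{BIC} &= \ell(X_1) + \ell(X_2) - \ell(X) - \lambda P = \frac{1}{2}A - \lambda P \\
\Delta\mathrm{BIC} > 0 &\iff A > 2\lambda P \\
A - \lambda P > 0 &\iff A > \lambda P \quad (\text{the same test with } \lambda \to \lambda/2)
\end{aligned}
```

So the 0.5 is part of the usual form; removing only that factor is equivalent to halving the penalty weight, which makes the detector accept more change points.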
Traceback (most recent call last):
  File "multi_detect.py", line 54, in <module>
    main()
  File "multi_detect.py", line 26, in main
    seg_point = seg.multi_segmentation(wavfile,outdir,sr,mono,frame_size,frame_shift,plot_seg=False,save_seg=True,classify_seg=False)
  File "E:\duplicate_data\test1\speech_seg-master\speech_segmentation.py", line 112, in multi_segmentation
    x1, x2 = vad.vad(temp, sr=sr, framelen=frame_size, frameshift=frame_shift)
  File "E:\duplicate_data\test1\speech_seg-master\voice_activity_detect.py", line 27, in vad
    signs = (tmp1* tmp2) < 0
ValueError: operands could not be broadcast together with shapes (1288,256) (1287,256)
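The shapes differ by one frame (1288 vs 1287), which suggests tmp1 and tmp2 were framed separately and ended up with different frame counts. Without seeing the rest of voice_activity_detect.py, one way to avoid this, sketched below, is to frame the signal once and count sign changes between neighbouring samples inside each frame, so both operands always share the shape (n_frames, framelen - 1). This swaps in a within-frame zero-crossing count; enframe and zero_crossings_per_frame are hypothetical helpers, not the repository's functions, and the defaults 256/128 are assumptions (256 matches the frame length in the error).

```python
import numpy as np


def enframe(x, framelen, frameshift):
    """Hypothetical helper: split a 1-D signal into overlapping frames.

    Returns an array of shape (n_frames, framelen); trailing samples that do
    not fill a whole frame are dropped.
    """
    x = np.asarray(x, dtype=np.float64)
    if len(x) < framelen:
        return np.empty((0, framelen))
    n_frames = 1 + (len(x) - framelen) // frameshift
    # Row k holds sample indices k*frameshift ... k*frameshift + framelen - 1.
    idx = np.arange(framelen)[None, :] + frameshift * np.arange(n_frames)[:, None]
    return x[idx]


def zero_crossings_per_frame(x, framelen=256, frameshift=128):
    """Count sign changes between neighbouring samples inside each frame."""
    frames = enframe(x, framelen, frameshift)
    # Both operands come from the same frame matrix, so they always have
    # shape (n_frames, framelen - 1) and broadcasting cannot fail.
    signs = (frames[:, :-1] * frames[:, 1:]) < 0
    return signs.sum(axis=1)
```

Counting within each frame ignores the one sign change that may occur across a frame boundary, which is usually negligible for a zero-crossing-rate feature.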