
py_speech_seg's People

Contributors

wblgers

py_speech_seg's Issues

Dataset question

Hello, how did you download the AMI dataset? I would appreciate an answer when you have time.

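Can music in a recording be cut out?

If a phone-call recording contains a ringtone, can it be detected and removed automatically?
Also, how accurate is your model at speaker separation? Have you measured any metrics for it?
Thanks!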

An error is raised here; the source files have not been modified.

Traceback (most recent call last):
  File "c:\Users\jiaqi.Li\.vscode\extensions\ms-python.python-2019.4.12954\pythonFiles\ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "c:\Users\jiaqi.Li\.vscode\extensions\ms-python.python-2019.4.12954\pythonFiles\lib\python\ptvsd\__main__.py", line 410, in main
    run()
  File "c:\Users\jiaqi.Li\.vscode\extensions\ms-python.python-2019.4.12954\pythonFiles\lib\python\ptvsd\__main__.py", line 291, in run_file
    runpy.run_path(target, run_name='__main__')
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "e:\moviesub_proj\py_speech_seg-master\multi_detect.py", line 11, in <module>
    cluster_method='bic')
  File "e:\moviesub_proj\py_speech_seg-master\speech_segmentation.py", line 122, in multi_segmentation
    y, sr = librosa.load(file, sr=sr)
  File "D:\ProgramData\Anaconda3\lib\site-packages\librosa\core\audio.py", line 119, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "D:\ProgramData\Anaconda3\lib\site-packages\audioread\__init__.py", line 111, in audio_open
    return BackendClass(path)
  File "D:\ProgramData\Anaconda3\lib\site-packages\audioread\rawread.py", line 62, in __init__
    self._fh = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'E:\moviesub_proj\duihua_sample.wav'
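
The traceback suggests a path problem rather than a bug in the segmentation code: the script lives in e:\moviesub_proj\py_speech_seg-master, but the file is looked up one level higher, in E:\moviesub_proj. This would happen if a relative file name is resolved against the debugger's working directory instead of the script's folder. A minimal sketch of a check that could run before the audio is loaded is shown here; `require_audio_file` is a hypothetical helper, not part of py_speech_seg or librosa.

```python
from pathlib import Path


def require_audio_file(path):
    """Return the absolute path of an existing audio file as a string.

    Hypothetical helper: it fails early with a message that includes the
    current working directory, which is what a relative path is resolved
    against.
    """
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"Audio file not found: {resolved} "
            f"(current working directory: {Path.cwd()})"
        )
    return str(resolved)
```

Passing the returned absolute path to the loading call, or building the path relative to the script's own directory, would remove the dependence on where the debugger is started.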

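BIC detection window

I have a question: I don't quite understand the meaning of the +200 in wStart = wStart + detBIC + 200. Doesn't this skip the 200 frames after the detected change point, assuming by default that none of those 200 frames contains another change point?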

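import numpy as np


def speech_segmentation(mfccs):
    # compute_bic is defined elsewhere in the repository. Judging from its use
    # here, it returns a change-point offset inside the window, or a value <= 0.
    wStart = 0    # first frame of the analysis window
    wEnd = 200    # one past the last frame of the window
    wGrow = 200   # frames added to the window when no change point is found
    delta = 25    # passed through to compute_bic

    m, n = mfccs.shape  # m coefficients per frame, n frames

    store_cp = []  # detected change points as absolute frame indices
    index = 0      # iteration counter (not used for the result)
    while wEnd < n:
        featureSeg = mfccs[:, wStart:wEnd]
        detBIC = compute_bic(featureSeg, delta)
        index = index + 1
        if detBIC > 0:
            # Change point found: record it, then restart the window
            # 200 frames after it (the +200 asked about above).
            temp = wStart + detBIC
            store_cp.append(temp)
            wStart = wStart + detBIC + 200
            wEnd = wStart + wGrow
        else:
            # No change point: grow the window and try again.
            wEnd = wEnd + wGrow

    return np.array(store_cp)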
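
To see what the +200 does in practice, here is a minimal sketch that keeps the same window bookkeeping as the loop above but exposes the hard-coded 200 as a parameter. `segment_with_gap`, `min_gap`, and `make_oracle` are hypothetical names that do not exist in py_speech_seg, and the oracle is only a stand-in for compute_bic: it reports the first known change point that lies strictly inside the window.

```python
import numpy as np


def segment_with_gap(n_frames, detect, w_grow=200, min_gap=200):
    """Window loop from speech_segmentation with the skip made explicit.

    detect(start, end) must return the offset of a change point relative
    to start, or 0 when the window [start, end) contains none.
    """
    w_start, w_end = 0, w_grow
    change_points = []
    while w_end < n_frames:
        offset = detect(w_start, w_end)
        if offset > 0:
            change_points.append(w_start + offset)
            # Restart the window min_gap frames after the change point.
            w_start = w_start + offset + min_gap
            w_end = w_start + w_grow
        else:
            w_end += w_grow
    return np.array(change_points, dtype=int)


def make_oracle(true_points):
    """Hypothetical stand-in for compute_bic that knows the true change points."""
    def detect(start, end):
        for p in sorted(true_points):
            if start < p < end:
                return p - start
        return 0
    return detect
```

With true change points at frames 150, 250, and 700 in a 1000-frame signal, `segment_with_gap(1000, make_oracle([150, 250, 700]))` returns `[150, 700]`, while `min_gap=0` returns `[150, 250, 700]`. So the reading in the question is correct for this loop: a change point that falls within 200 frames after a detected one is never examined. One plausible reason, stated here as an assumption rather than the author's intent, is that the skip enforces a minimum segment length.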

enframe error

Traceback (most recent call last):
  File "multi_detect.py", line 54, in <module>
    main()
  File "multi_detect.py", line 26, in main
    seg_point = seg.multi_segmentation(wavfile,outdir,sr,mono,frame_size,frame_shift,plot_seg=False,save_seg=True,classify_seg=False)
  File "E:\duplicate_data\test1\speech_seg-master\speech_segmentation.py", line 112, in multi_segmentation
    x1, x2 = vad.vad(temp, sr=sr, framelen=frame_size, frameshift=frame_shift)
  File "E:\duplicate_data\test1\speech_seg-master\voice_activity_detect.py", line 27, in vad
    signs = (tmp1* tmp2) < 0
ValueError: operands could not be broadcast together with shapes (1288,256) (1287,256)
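
The shapes in the message differ by exactly one frame, which points to the two framed signals having been cut from inputs whose lengths differ by one sample. The repository's own enframe and vad code is not shown in this issue, so the following is only a sketch of one way to compute the sign changes without that mismatch. `enframe`, `sign_changes`, and `zero_crossing_counts` are written here for illustration and are assumptions about the general technique, not the project's implementation.

```python
import numpy as np


def enframe(x, frame_len, frame_shift):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    starts = frame_shift * np.arange(n_frames)[:, None]
    idx = starts + np.arange(frame_len)[None, :]
    return x[idx]


def sign_changes(tmp1, tmp2):
    """Elementwise sign-change mask, trimmed to the common number of frames."""
    n = min(tmp1.shape[0], tmp2.shape[0])
    return (tmp1[:n] * tmp2[:n]) < 0


def zero_crossing_counts(x, frame_len, frame_shift):
    """Zero crossings per frame from the signal and its one-sample shift."""
    x = np.asarray(x, dtype=float)
    # Both slices have length len(x) - 1, so the frame counts agree.
    tmp1 = enframe(x[:-1], frame_len, frame_shift)
    tmp2 = enframe(x[1:], frame_len, frame_shift)
    return sign_changes(tmp1, tmp2).sum(axis=1)
```

Trimming to the common frame count drops at most the last frame, so it is a workaround for the broadcast error rather than a diagnosis of where the extra frame comes from.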
