saturn-lab / audioplot

Batch-plot audio files as spectrogram and waveform images.

Home Page: http://gitlab.icenter.tsinghua.edu.cn/saturnlab/audioPlot

Python 100.00%

audioplot's Introduction

Step 1. Prepare for processing

Step 1.1 Get FFmpeg

Download FFmpeg from the FFmpeg website; select the static build to get a zip file.
Extract the zip file into the ffmpeg folder so that ffmpeg/bin/ffmpeg.exe exists.

Step 1.2 Get SoX

Download SoX from the SOund eXchange website; you should get a zip file.
Extract the zip file into the sox folder so that sox/sox.exe exists.

Step 1.3 Specify the paths to ffmpeg and sox

On macOS, you can replace Steps 1.1 and 1.2 with brew install ffmpeg sox.
Edit user_config.py and make sure that FFMPEG_PATH and SOX_PATH are correct.
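
Before running the later steps, it can help to confirm that both tools are actually reachable. The following is a minimal sketch, not part of the repository: resolve_tool is a hypothetical helper, and the example paths simply mirror the folder layout described in Steps 1.1 and 1.2.

```python
import os
import shutil


def resolve_tool(configured_path, tool_name):
    """Return a usable path for tool_name, or None if it cannot be found."""
    # Prefer the configured path when it points at an existing file.
    if configured_path and os.path.isfile(configured_path):
        return configured_path
    # Otherwise fall back to whatever is on PATH (e.g. a Homebrew install).
    return shutil.which(tool_name)


if __name__ == "__main__":
    # Hypothetical values; the real ones live in user_config.py.
    FFMPEG_PATH = os.path.join("ffmpeg", "bin", "ffmpeg.exe")
    SOX_PATH = os.path.join("sox", "sox.exe")
    for name, path in (("ffmpeg", FFMPEG_PATH), ("sox", SOX_PATH)):
        found = resolve_tool(path, name)
        print(name, "->", found if found else "NOT FOUND")
```

If either tool prints NOT FOUND, correct the corresponding entry in user_config.py before continuing.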

Step 2. Convert the audio files to a uniform `*.wav` format:

python3 ./convert_file.py ./rawfile/
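
For readers curious what such a conversion might look like, here is a rough sketch of an ffmpeg invocation. It is not taken from convert_file.py: build_ffmpeg_command is a hypothetical helper, and the target format (mono, 44.1 kHz, 32-bit float) is an assumption inferred from the wavReader.py code quoted in the issues further down, which reads float32 samples.

```python
import os


def build_ffmpeg_command(ffmpeg_path, src, dst_dir):
    """Build an ffmpeg command that converts src to a mono 44.1 kHz float WAV."""
    base = os.path.splitext(os.path.basename(src))[0]
    dst = os.path.join(dst_dir, base + ".wav")
    return [
        ffmpeg_path,
        "-y",                 # overwrite an existing output file
        "-i", src,            # input file (any format ffmpeg can decode)
        "-ac", "1",           # assumed: one channel
        "-ar", "44100",       # assumed: 44.1 kHz sample rate
        "-c:a", "pcm_f32le",  # assumed: 32-bit little-endian float samples
        dst,
    ]


if __name__ == "__main__":
    cmd = build_ffmpeg_command("ffmpeg", os.path.join("rawfile", "a.mp3"), "wavfile")
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once the ffmpeg path from Step 1.3 is known to be valid.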

Step 3. Output spectrogram and waveform images

Step 3.1 Convert wavfiles to spectrogram

python3 wavfile2spectrum.py ./wavfile/

The spectrograms are written to images/spectrum/.
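
SoX can render a spectrogram directly through its spectrogram effect. The sketch below shows one way to build that command for a single file; whether wavfile2spectrum.py works this way is an assumption, and build_sox_spectrogram_command is a hypothetical name.

```python
import os


def build_sox_spectrogram_command(sox_path, wav_path, out_dir):
    """Build a sox command that writes a PNG spectrogram of wav_path into out_dir."""
    base = os.path.splitext(os.path.basename(wav_path))[0]
    png_path = os.path.join(out_dir, base + ".png")
    # "-n" tells sox to discard the audio output; only the image is produced.
    return [sox_path, wav_path, "-n", "spectrogram", "-o", png_path]


if __name__ == "__main__":
    cmd = build_sox_spectrogram_command(
        "sox", os.path.join("wavfile", "a.wav"), os.path.join("images", "spectrum")
    )
    print(" ".join(cmd))
```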

Step 3.2 Convert wavfiles to waveform

python3 wavfile2waveform.py ./wavfile/

The waveforms are written to images/waveform/.
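
Plotting every sample of a long recording is slow and adds no visible detail. A common preparation step is to reduce the samples to one (min, max) pair per horizontal pixel. The sketch below illustrates that idea in plain Python; it is not taken from wavfile2waveform.py, and envelope is a hypothetical helper.

```python
def envelope(samples, n_bins):
    """Reduce samples to at most n_bins (min, max) pairs for plotting."""
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    if not samples:
        return []
    # Ceiling division so that every sample lands in exactly one bin.
    step = -(-len(samples) // n_bins)
    return [
        (min(samples[i:i + step]), max(samples[i:i + step]))
        for i in range(0, len(samples), step)
    ]


if __name__ == "__main__":
    print(envelope([0, 1, -1, 2, 3, -4], 3))
```

Each pair can then be drawn as one vertical line, which gives the familiar waveform outline.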

audioplot's People

Contributors

bhfxwangshida ritou11 zhenghe-wang

audioplot's Issues

buffer size must be a multiple of element size

An error is raised when generating the waveform.

Fix bug about "buffer size problem" maybe

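ValueError: buffer size must be a multiple of element size
The cause is that in data = numpy.frombuffer(buf[itr + 8: itr + 8 + size], dtype=numpy.float32), size is not a multiple of 4 bytes, the element size of numpy.float32. Adjusting size to a multiple of 4 fixes the problem of the waveform step producing plots for only some of the files. Either replacing the original wavReader.py with the code below or downloading the wavReader.py I just uploaded should solve it. The waveform may lose a tiny bit of the tail of the audio (though quite possibly it will not), because size is rounded down to a multiple of 4 bytes.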

#!/opt/anaconda3/bin/python

import argparse
import os
import numpy

class Format:
    size=0   # size of this 'fmt ' chunk
    fmtTag=0 
    nChannel=0  # channel number
    sps=0  # samples per second
    bps=0  # bytes per second
    
    def __init__(self, buf):
        self.size = numpy.frombuffer(buf[16:20], dtype=numpy.uint32)[0]
        self.fmtTag = numpy.frombuffer(buf[20:22], dtype=numpy.uint16)[0]
        self.nChannel = numpy.frombuffer(buf[22:24], dtype=numpy.uint16)[0]
        self.sps = numpy.frombuffer(buf[24:28], dtype=numpy.uint32)[0]
        self.bps = numpy.frombuffer(buf[28:32], dtype=numpy.uint32)[0]
    
def _parseData(buf):
    if buf[0:4] != b'RIFF':
        raise ValueError('"RIFF" header is missing')
    if buf[8:12] != b'WAVE':
        raise ValueError('"WAVE" header is missing')
    if buf[12:16] != b'fmt ':
        raise ValueError('"fmt " header is missing')
    fmt = Format(buf)
    itr = 12 + fmt.size + 8
    #print numpy.fromstring(buf[0:50], dtype=numpy.uint8)
    while(itr < len(buf)):
        riffid = buf[itr:itr+4]
        size = numpy.frombuffer(buf[itr+4:itr+8], dtype=numpy.uint32)[0]
        if riffid != b'data':
            itr += 8 + size
        else:
            break
    if riffid != b'data':
        raise ValueError('"data" header is missing')
    # Round size down to a multiple of 4 bytes (the size of one float32 sample)
    more = size % 4
    data = numpy.frombuffer(buf[itr+8: itr + size-more+8], dtype=numpy.float32)
    return fmt.sps, data


def readWav(filename):
    with open(filename, 'rb') as f:
        buf = f.read()
        sps, data = _parseData(buf)
    return sps, data  
    # label ,    samples per second,   samples

def parseName(filename): # ./xx/yy/10000_12.wav
    filename = filename.split(os.sep)
    filename = filename[len(filename) - 1]
    filename = filename.split('.')[0]
    filename = filename.split('_')
    filename = filename[len(filename) - 1]
    try:
        label = int(filename)
    except ValueError:
        label = -1
    if(label < 0 or label > 23):
        raise ValueError(filename)
    return label

def numpyToWav(data, fname):
    # Reuse the header of an existing WAV file as a template.
    with open('./template.wav', 'rb') as f:
        template = f.read()
    # Skip chunks after the 12-byte RIFF/WAVE header until 'data' is found.
    itr = 12
    while True:
        riffid = template[itr:itr+4]
        if riffid != b'data':
            itr += 4
            itr += int(numpy.frombuffer(template[itr:itr+4], dtype=numpy.uint32)[0])
            itr += 4
        else:
            break
    
    # numpy.fromstring/tostring are deprecated for binary data; use frombuffer/tobytes.
    F = numpy.zeros([len(data)*4 + 8 + itr], dtype=numpy.uint8)
    F[0:itr+4] = numpy.frombuffer(template[0:itr+4], dtype=numpy.uint8)
    F[4:8] = numpy.frombuffer(numpy.array([len(data)*4 + itr - 4], dtype=numpy.uint32).tobytes(), dtype=numpy.uint8)
    F[22:24] = numpy.frombuffer(numpy.array([1], dtype=numpy.uint16).tobytes(), dtype=numpy.uint8)
    F[24:28] = numpy.frombuffer(numpy.array([44100], dtype=numpy.uint32).tobytes(), dtype=numpy.uint8)
    F[28:32] = numpy.frombuffer(numpy.array([44100*4], dtype=numpy.uint32).tobytes(), dtype=numpy.uint8)
    F[32:34] = numpy.frombuffer(numpy.array([4], dtype=numpy.uint16).tobytes(), dtype=numpy.uint8)
    F[34:36] = numpy.frombuffer(numpy.array([32], dtype=numpy.uint16).tobytes(), dtype=numpy.uint8)
    itr += 4
    F[itr:itr + 4] = numpy.frombuffer(numpy.array([len(data)*4], dtype=numpy.uint32).tobytes(), dtype=numpy.uint8)
    itr += 4
    # F.shape is a tuple, so index it before subtracting.
    print(F.shape[0] - itr)
    print(data.dtype)
    F[itr:] = numpy.frombuffer(data.tobytes(), dtype=numpy.uint8)
    with open(fname, 'wb') as f:
        f.write(F.tobytes())
    return
if __name__ == '__main__':
    #parse = argparse.ArgumentParser()
    #parse.add_argument('input')
    #args = parse.parse_args()
    
    ######  HERE  #######
    import sys
    sps, data = readWav(sys.argv[1])
    ######  HERE  ####### 
    print(sps, len(data))
    print(max(data))
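
The core of the fix above is the rounding of size down to a multiple of 4. The sketch below isolates that step so it can be tried without a WAV file. It is an illustration only: it uses the standard-library array module in place of numpy.frombuffer (both reject a byte count that is not a multiple of the element size), and floats_from_chunk is a hypothetical name.

```python
import array


def floats_from_chunk(payload):
    """Decode payload as 32-bit floats, ignoring any trailing partial sample."""
    item = array.array("f").itemsize  # 4 bytes on common platforms
    # Drop the 1 to 3 leftover bytes that do not form a whole sample.
    usable = len(payload) - (len(payload) % item)
    samples = array.array("f")
    samples.frombytes(payload[:usable])
    return samples


if __name__ == "__main__":
    # Two whole samples followed by two stray bytes.
    payload = array.array("f", [0.5, -0.25]).tobytes() + b"\x00\x01"
    try:
        array.array("f").frombytes(payload)  # untrimmed: raises ValueError
    except ValueError as err:
        print("untrimmed:", err)
    print("trimmed:", list(floats_from_chunk(payload)))
```

At most three bytes are discarded, which is less than one sample, so the plotted waveform is effectively unchanged.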

ValueError

command: python3 wavfile2waveform.py ./wavefile

Traceback (most recent call last):
  File "wavfile2waveform.py", line 47, in <module>
    ConvertAudioToWaveform(sys.argv[1])
  File "wavfile2waveform.py", line 43, in ConvertAudioToWaveform
    ConvertFile2Waveform(audio, dir_out)
  File "wavfile2waveform.py", line 15, in ConvertFile2Waveform
    rate, data =readWav(audio)
  File "/home/ryan/Documents/audioPlot/wavReader.py", line 47, in readWav
    sps, data = _parseData(buf)
  File "/home/ryan/Documents/audioPlot/wavReader.py", line 40, in _parseData
    data = numpy.frombuffer(buf[itr + 8: itr + 8 + size], dtype=numpy.float32)
ValueError: buffer size must be a multiple of element size

audioPlot Step 3.2

After switching to the modified .py file, a new error occurs:

Original error was: dlopen(/anaconda3/lib/python3.7/sitepackages/numpy/core/_multiarray_umath.cpython-37m-darwin.so, 2): Library not loaded: @rpath/libopenblas.dylib
Referenced from: /anaconda3/lib/python3.7/sitepackages/numpy/core/_multiarray_umath.cpython-37m-darwin.so
Reason: image not found

No module named 'keras'

C:\Users\yuqm\Desktop\大数据和机器智能\BDMI-2019\audioNet-master>python train.py
Traceback (most recent call last):
  File "train.py", line 4, in <module>
    from model import KerasModel
  File "C:\Users\yuqm\Desktop\大数据和机器智能\BDMI-2019\audioNet-master\model.py", line 5, in <module>
    from fourierWeight import fourierLayer, fourierLayerShape, fourierWeight
  File "C:\Users\yuqm\Desktop\大数据和机器智能\BDMI-2019\audioNet-master\fourierWeight.py", line 4, in <module>
    from keras import backend as K
ModuleNotFoundError: No module named 'keras'

Solution: pip install keras
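
To find out up front which packages are missing, rather than hitting one ModuleNotFoundError at a time, a small check like the following can be run first. This is a sketch only: missing_modules is a hypothetical helper, and the list of module names passed to it is an example, not the project's actual dependency list.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Example names only; adjust to the imports the scripts actually use.
    missing = missing_modules(["numpy", "keras"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are available.")
```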

Error during conversion in audioPlot Step 3.2

Step 3.2 Convert wavfiles to waveform
$ python wavfile2waveform.py ./wavfile
put waveforms into images/waveform/
When running this step, the command line shows the following error:
Traceback (most recent call last):
  File "wavfile2waveform.py", line 47, in <module>
    ConvertAudioToWaveform(sys.argv[1])
  File "wavfile2waveform.py", line 43, in ConvertAudioToWaveform
    ConvertFile2Waveform(audio, dir_out)
  File "wavfile2waveform.py", line 15, in ConvertFile2Waveform
    rate, data =readWav(audio)
  File "C:\Users\10344\audioPlot\wavReader.py", line 47, in readWav
    sps, data = _parseData(buf)
  File "C:\Users\10344\audioPlot\wavReader.py", line 40, in _parseData
    data = numpy.frombuffer(buf[itr + 8: itr + 8 + size], dtype=numpy.float32)
ValueError: buffer size must be a multiple of element size