Comments (10)
Hi,
Your work is really interesting. I have been using your code and am trying to contribute to it. I could not find any code that creates the two data files "development_sample_dataset_speaker.hdf5" and "enrollment-evaluation_sample_dataset.hdf5". Any help would be appreciated.
from 3d-convolutional-speaker-recognition.
@Ahmed-Abouzeid Hi, I have a similar problem! I also want to know how to create my own data. I ran create_development.py directly, but it failed with:
train_files_subjects_list.append(file_name.split('/')[7])
IndexError: list index out of range
Could you help me? I don't know how to change the index.
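For readers hitting the same IndexError: a fixed index such as `[7]` only works when the file paths have at least eight `/`-separated parts, which depends on where the data lives on disk. The sketch below is not the repository's code. It assumes each speaker's files sit directly inside a folder named after that speaker, and `speaker_from_path` and the example path are hypothetical names used only for illustration.

```python
import os


def speaker_from_path(file_path):
    """Return the name of the folder that directly contains the file.

    Assumption: the layout is <anything>/<speaker_name>/<file>.npy.
    """
    parent = os.path.dirname(os.path.normpath(file_path))
    return os.path.basename(parent)


# Hypothetical path with only four '/'-separated parts, so index 7 is out of range.
example = '../sample_data/speaker_a/utt1.npy'
parts = example.split('/')
print(len(parts))                  # 4
print(speaker_from_path(example))  # speaker_a
```

Taking the parent folder name instead of a hard-coded position keeps the label independent of how deep the data directory is.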
@Ostnie Are you aware that the training files are not audio samples? They are .npy files that hold the features of one person's voice. So you first need to run the speechpy package on the wav files to create the .npy files, and then use those in this repository. Here is the speechpy code I wrote, for clarification:
```python
import os
import sys

import numpy as np
import scipy.io.wavfile as wav
import speechpy

wav_dir = sys.argv[1]  # folder with the wav files
script_dir = os.path.dirname(os.path.abspath(__file__))
for f in os.listdir(wav_dir):
    if not f.lower().endswith('.wav'):
        continue  # skip .npy files written by an earlier run
    file_name = os.path.join(script_dir, wav_dir, f)
    fs, signal = wav.read(file_name)
    # Example of pre-emphasizing (the result is not used below).
    signal_preemphasized = speechpy.processing.preemphasis(signal, cof=0.98)
    # Example of stacking frames.
    frames = speechpy.processing.stack_frames(
        signal, sampling_frequency=fs, frame_length=0.020, frame_stride=0.01,
        filter=lambda x: np.ones((x,)), zero_padding=True)
    # Example of extracting the power spectrum.
    power_spectrum = speechpy.processing.power_spectrum(frames, fft_points=512)
    # Extract MFCC features.
    mfcc = speechpy.feature.mfcc(
        signal, sampling_frequency=fs, frame_length=0.020, frame_stride=0.01,
        num_filters=40, fft_length=512, low_frequency=0, high_frequency=None)
    # Mean and variance normalization (not used for the cube below).
    mfcc_cmvn = speechpy.processing.cmvnw(mfcc, win_size=301, variance_normalization=True)
    # Stack the MFCCs with their first and second derivatives.
    mfcc_feature_cube = speechpy.feature.extract_derivative_feature(mfcc)
    print('mfcc feature cube shape=', mfcc_feature_cube.shape)
    # np.save appends '.npy' to the name, e.g. 'a.wav' becomes 'a.wav.npy'.
    np.save(os.path.join(wav_dir, f), mfcc_feature_cube)
```
@Ostnie The next step is to run create_development.py after preparing train_subjects_path.txt. In mine I wrote:
../sample_data/nihalEssmat
../sample_data/samarBahaaeldin
because I placed these two folders (nihal_essmat, samarbahaeldin) in sample_data, and they contain the .npy files generated for each speaker by the speechpy code I provided above. The final result should be the HDF5 file to use during training.
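As a rough illustration of preparing such a list file, the sketch below writes one speaker-folder path per line, matching the two-line example above. It is only a sketch under assumptions: `write_subject_list` is a hypothetical helper, not part of the repository, and it assumes every speaker folder sits directly under one data root and holds at least one .npy file.

```python
import os


def write_subject_list(data_root, list_path):
    """Write one speaker-folder path per line and return the paths written.

    Assumption: each sub-folder of data_root that contains .npy files
    is one speaker.
    """
    folders = []
    for name in sorted(os.listdir(data_root)):
        folder = os.path.join(data_root, name)
        if not os.path.isdir(folder):
            continue  # ignore stray files next to the speaker folders
        if any(f.endswith('.npy') for f in os.listdir(folder)):
            folders.append(folder)
    with open(list_path, 'w') as handle:
        for folder in folders:
            handle.write(folder + '\n')
    return folders
```

Calling `write_subject_list('../sample_data', 'train_subjects_path.txt')` would then produce a file in the same shape as the one shown above. The same helper could be pointed at a folder of evaluation speakers.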
Note (1): my output numpy arrays had a specific shape, so I changed the code in this repository to match. For example, in create_development.py I changed num_coefficient = 40 to num_coefficient = 13, since 13 was the corresponding dimension in the shape of my feature .npy files.
Note (2): I created eval_subjects_path.txt with content similar to train_subjects_path.txt, but pointing to the .npy files I will use for evaluation. You will need it later when you run the bash script (run.sh).
I hope this makes sense to you. I will let you know if I manage to fix the original issue I posted earlier.
@Ahmed-Abouzeid Hello, did you ever fix your issue?
@nkcsfight Hi, I am revisiting this problem and will work on it over the next few days. If I get anywhere, I will post it here!
Hi, I believe you are using the wrong method here. You should be using lmfe instead of mfcc.
@Ahmed-Abouzeid I guess your problem is that you decreased the number of filters from 40 to 13. The network is structured so that, unless the input is 40x80, the feature-map size becomes negative during the convolution operations, which is exactly what is happening in your case. Changing it back to 40 and preparing your data with 40 filters would definitely solve the problem.
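To see why a narrower input can fail, here is a small sketch of the size arithmetic for unpadded convolutions, where each layer maps a size n to floor((n - kernel) / stride) + 1. The layer list is hypothetical and chosen only for illustration. It is not the repository's actual architecture, so the exact numbers will differ.

```python
def valid_conv_size(n, kernel, stride):
    """Output size of an unpadded ('valid') convolution along one axis."""
    if n < kernel:
        raise ValueError(f'input size {n} is smaller than kernel {kernel}')
    return (n - kernel) // stride + 1


def trace_sizes(n, layers):
    """Apply each (kernel, stride) pair in turn and collect the sizes."""
    sizes = []
    for kernel, stride in layers:
        n = valid_conv_size(n, kernel, stride)
        sizes.append(n)
    return sizes


# Hypothetical (kernel, stride) pairs along the feature axis.
HYPOTHETICAL_LAYERS = [(5, 1), (4, 2), (4, 1), (3, 1), (3, 1)]

print(trace_sizes(40, HYPOTHETICAL_LAYERS))  # [36, 17, 14, 12, 10]
try:
    trace_sizes(13, HYPOTHETICAL_LAYERS)
except ValueError as err:
    print('13 features:', err)  # the size runs out after two layers
```

With 40 features every layer still has room to slide its kernel, while with 13 the size drops to 3 after the second layer and the next kernel no longer fits.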
@naeemrehmat65 Thanks for your interest. Please refer to this part for creating the development set. Creating the enrollment and evaluation sets is similar. This part only covers how to create an HDF5 file for feeding the network. It must be customized and modified for your specific data.