Comments (6)
The backward pass computes the gradients, so it is expected to be the most expensive step.
Its cost is not tied to the size of the dataset, but to the size of a single input sequence.
You can try breaking the dataset into subsets and calling the fit
function multiple times, as suggested in README.md.
from uis-rnn.
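The subset idea above can be sketched as follows. This is a minimal illustration, not the library's own code: `train_sequence`, `train_cluster_id`, `model`, and `training_args` are stand-ins for your real data and a trained `uisrnn.UISRNN` instance, and the actual `fit` call is left commented out.

```python
import numpy as np

# Stand-in training data (in practice: your d-vectors and string labels).
train_sequence = np.zeros((41000, 256))
train_cluster_id = np.array(['0'] * 41000)

# Split into subsets of at most 10000 observations each.
chunk = 10000
num_chunks = -(-len(train_sequence) // chunk)  # ceiling division
seq_subsets = np.array_split(train_sequence, num_chunks)
id_subsets = np.array_split(train_cluster_id, num_chunks)

for seq, ids in zip(seq_subsets, id_subsets):
    # model.fit(seq, ids, training_args)  # one fit call per subset
    assert len(seq) == len(ids)
```

Ideally the splits should fall on utterance boundaries, so that a speaker's segments are not cut mid-sequence.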
Sir,
I broke the dataset into 2 subsets, each containing 41000 elements, which is smaller than the test train_sequence you provided (47350). But each iteration still takes 8 seconds, while with the data you provided it took less than a second.
from uis-rnn.
Is 41000 the number of time steps, or the feature dimension? If the former, what is your feature dimension?
Also, did you normalize the features before feeding them into uis-rnn? If not, what is the range of your features?
from uis-rnn.
41000 is the first dimension of a 2-dim numpy array of shape (41000, 256), like your (47350, 256). Is the normalization you mentioned "The embedding vector (d-vector) is defined as the L2 normalization of the network output"? I extracted the d-vectors with "PyTorch_Speaker_Verification", so I believe the normalization is already done.
from uis-rnn.
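Rather than relying on the extraction library, it is easy to verify L2 normalization directly on the saved array. A minimal sketch, assuming the embeddings are the rows of an (N, 256) numpy array (the random array here stands in for your loaded data):

```python
import numpy as np

# Stand-in for your loaded (N, 256) embedding array.
emb = np.random.default_rng(0).normal(size=(100, 256))

norms = np.linalg.norm(emb, axis=1)
if not np.allclose(norms, 1.0, atol=1e-3):
    # Rows are not unit-length: apply L2 normalization yourself.
    emb = emb / norms[:, np.newaxis]
```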
You will need to discuss this with the author of PyTorch_Speaker_Verification.
We are not responsible for the correctness of, or any issues with, third-party libraries.
from uis-rnn.
@hcfeng201
you should change the embedding creation demo:
for file in os.listdir(folder):
    if file[-4:] == '.wav':
        # subprocess.call(['ffmpeg', '-i', 'file', file[-4:] + '.wav'])
        print(folder + '/' + file)
        times, segs = VAD_chunk(2, folder + '/' + file)
        if segs == []:
            print('No voice activity detected')
            continue
        concat_seg = concat_segs(times, segs)
        STFT_frames = get_STFTs(concat_seg)
        STFT_frames = np.stack(STFT_frames, axis=2)
        STFT_frames = torch.tensor(np.transpose(STFT_frames, axes=(2, 1, 0)))
        embeddings = embedder_net(STFT_frames)
        aligned_embeddings = align_embeddings(embeddings.detach().numpy())
        train_sequence.append(aligned_embeddings)
        for embedding in aligned_embeddings:
            train_cluster_id.append(str(label))
        label += 1

test_sequence = np.concatenate(train_sequence, axis=0)
test_cluster_id = np.asarray(train_cluster_id)
np.save('test_sequence', test_sequence)
np.save('test_cluster_id', test_cluster_id)
print(test_sequence.shape, type(test_sequence))
and change the uis-rnn test demo:
test_sequence = np.load('./data/test_sequence.npy')
test_cluster_id = np.load('./data/test_cluster_id.npy')
model = uisrnn.UISRNN(model_args)
model.load(SAVED_MODEL_NAME)

# testing
print(test_sequence.shape, type(test_sequence))
print(test_cluster_id, type(test_cluster_id))
predicted_labels = []
test_record = []
# for (test_sequence, test_cluster_id) in zip(test_sequences, test_cluster_ids):
predicted_label = model.predict(test_sequence, inference_args)
predicted_labels.append(predicted_label)
accuracy = uisrnn.compute_sequence_match_accuracy(list(test_cluster_id), predicted_label)
test_record.append((accuracy, len(test_cluster_id)))
print('Ground truth labels:')
print(test_cluster_id)
print('Predicted labels:')
print(predicted_label)
print('-' * 80)
output_string = uisrnn.output_result(model_args, training_args, test_record)
print('Finished diarization experiment')
print(output_string)
from uis-rnn.
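A note on the accuracy computed above: since cluster IDs are arbitrary (predicted speaker "1" may correspond to ground-truth speaker "b"), uis-rnn's compute_sequence_match_accuracy scores predictions under the best one-to-one mapping between predicted and true labels. A brute-force sketch of the same idea (the library itself uses an optimal assignment; this function name is illustrative, and it assumes no more predicted labels than true labels):

```python
import itertools

def best_match_accuracy(truth, pred):
    """Accuracy under the best one-to-one relabeling of predictions.

    Brute force over label assignments; fine for small label sets.
    Assumes len(set(pred)) <= len(set(truth)).
    """
    true_labels = sorted(set(truth))
    pred_labels = sorted(set(pred))
    best = 0
    for perm in itertools.permutations(true_labels, len(pred_labels)):
        mapping = dict(zip(pred_labels, perm))
        correct = sum(mapping[p] == t for p, t in zip(pred, truth))
        best = max(best, correct)
    return best / len(truth)

print(best_match_accuracy(['a', 'a', 'b', 'b'], [1, 1, 1, 2]))  # 0.75
```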