Git Product home page Git Product logo

pyannote-whisper's Introduction

pyannote-whisper

Run ASR and speaker diarization based on whisper and pyannote.audio.

Installation

  1. Install whisper.
  2. Install pyannote.audio.
  3. Downgrade setuptools to 59.5.0

Command-line usage

Same as whisper except a new param diarization:

 python -m pyannote_whisper.cli.transcribe data/afjiv.wav --model tiny --diarization True

Python usage

Transcription can also be performed within Python:

import whisper
from pyannote.audio import Pipeline
from pyannote_whisper.utils import diarize_text
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="your/token")
model = whisper.load_model("tiny.en")
asr_result = model.transcribe("data/afjiv.wav")
diarization_result = pipeline("data/afjiv.wav")
final_result = diarize_text(asr_result, diarization_result)

for seg, spk, sent in final_result:
    line = f'{seg.start:.2f} {seg.end:.2f} {spk} {sent}'
    print(line)
0.00 10.34 SPEAKER_00  I think if you're a leader and you don't understand the terms that you're using, that's probably the first start.
10.34 16.24 SPEAKER_00  It's really important that as a leader in the organisation you understand what digitisation means.
16.24 18.52 SPEAKER_00  You take the time to read widely in the sector.
18.52 26.16 SPEAKER_00  There are a lot of really good books, Kevin Kelly, who started Wired magazine has written a great book on various technologies.
26.16 34.80 SPEAKER_00  I think understanding the technologies, understanding what's out there so that you can separate the hype from the hope is really an important first step.
34.80 41.04 SPEAKER_00  And then making sure you understand the relevance of that for your function and how that fits into your business is the second step.
41.04 44.92 SPEAKER_01  I think two simple suggestions.
44.92 49.68 SPEAKER_01  One is I love the phrase brilliant at the basics.
49.68 52.00 SPEAKER_01  How can you become brilliant at the basics?
52.00 62.48 SPEAKER_01  But beyond that, the fundamental thing I've seen which hasn't changed is so few organisations as a first step have truly taken control of their spend data.
62.48 68.44 SPEAKER_01  As a key first step on a digital transformation, taking ownership of data.
68.44 71.76 SPEAKER_01  That's not a decision to use one vendor over someone else.
71.76 76.40 SPEAKER_01  That says we are going to be completely data driven, we're going to try and be as real time as possible.
76.40 81.04 SPEAKER_01  And we're going to be able to explain that data to anyone the way they want to see it.
81.04 91.04 SPEAKER_03  Understand why you're doing it.
91.04 95.24 SPEAKER_03  Talk to them, collaborate with them, you'll get a much better outcome.
95.24 104.32 SPEAKER_04  Think about what outcome you want at the end instead of thinking about the different processes and their software names.
104.32 108.32 SPEAKER_04  So, e-sourcing being one of 20.
108.32 109.52 SPEAKER_04  Think big and be brave.
109.52 118.56 SPEAKER_04  I think and talk to technology vendors because rather than just sending them forms, we won't bite you.
118.56 130.96 SPEAKER_02  I think we should fundamentally, all of us, rethink how procurement should be done and then start to define the functionality that we need and how we can make this work.
130.96 135.68 SPEAKER_02  What we do today is absolutely wrong.
135.68 172.00 SPEAKER_02  We don't like it, but we don't like it, our colleagues don't like it, nobody wants it and we're spending a huge amount of money for no reason.

Python usage 2

please find more details in this notebook.

import whisper
from pyannote.audio import Pipeline
from pyannote.audio import Audio
from pyannote_whisper.utils import diarize_text
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="your/token")
model = whisper.load_model("tiny.en")
diarization_result = pipeline("data/afjiv.wav")

from pyannote.audio import Audio
audio = Audio(sample_rate=16000, mono=True)
audio_file = "data/afjiv.wav"
for segment, _, speaker in diarization_result.itertracks(yield_label=True):
    waveform, sample_rate = audio.crop(audio_file, segment)
    text = model.transcribe(waveform.squeeze().numpy())["text"]
    print(f"{segment.start:.2f}s {segment.end:.2f}s {speaker}: {text}")

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.