cleansio's People

Contributors

c-bain, levin-noro, patrickduncan, victorcarri

cleansio's Issues

Python 3.4 GCS issues on TravisCI

pip install --upgrade google-cloud-speech

throws error:

FileNotFoundError: [Errno 2] No such file or directory: '/home/travis/virtualenv/python3.4.6/lib/python3.4/site-packages/six-1.10.0.dist-info/METADATA'

Behaviour Diagram

I would like a detailed diagram of the main actions as of c5d310f. It would help greatly when new teammates are added to our project.

Please use https://draw.io and make it open for extension.

Research - Tempo (BPM)

  • Create a wiki page
  • Add the wiki page to the sidebar under "Lyric Accuracy Assessment"
  • Document how the song's beats per minute affect the accuracy of the lyrics. Based on this investigation, we may need to either speed up or slow down snippets.
  • Provide examples

Create PERT chart

PERT Charts or other project management technique, Continuous, with rotating deadlines per group: 10%

Overlapping Chunk

To improve the accuracy of #70, use an overlapping audio chunk.

Right now we're sequentially breaking up the audio file into 5 second chunks. The problem with this is that the lyrics at the border between 2 audio chunks are cut off. By using another audio chunk that is 2.5 seconds ahead, we can avoid this issue.

  • Create another chunk 2.5 seconds after the current chunk
  • Transcribe with GCS
  • Use that transcription to improve the accuracy of the current chunk
  • Discard the overlapping chunk
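
A minimal sketch of how the chunk boundaries above could be computed, assuming times are handled in milliseconds. The function name and parameters are hypothetical, and merging the overlapping transcription back into the current chunk is not shown.

```python
def chunk_windows(total_ms, chunk_ms=5000, offset_ms=2500):
    """Return (primary, overlapping) lists of (start_ms, end_ms) windows."""
    # Primary windows: the sequential 5 second chunks we already produce.
    primary = [(start, min(start + chunk_ms, total_ms))
               for start in range(0, total_ms, chunk_ms)]
    # Overlapping windows start offset_ms into each primary window, so each
    # one straddles the border between two consecutive primary chunks.
    overlapping = [(start + offset_ms,
                    min(start + offset_ms + chunk_ms, total_ms))
                   for start, _ in primary
                   if start + offset_ms < total_ms]
    return primary, overlapping


primary, overlapping = chunk_windows(12000)
```

For a 12 second file this gives three primary windows and two overlapping windows, the first covering 2.5s to 7.5s.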

Windows support

Provide documentation on how to set it up on Windows. Assume they're using Windows 7+ and Bash.

No Lyric Output

We shouldn't output lyrics at all, so please remove the code which does so.

  • Remove the print
  • Return the lyrics

Conversion fails for .m4a

Traceback (most recent call last):
  File "/Users/patrick/anaconda3/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Users/patrick/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 532, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/patrick/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 466, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.INVALID_ARGUMENT
	details = "Must use 16 bit samples for LINEAR_PCM, but WAV header indicates 32 bit per sample."
	debug_error_string = "{"created":"@1545427960.216637000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"Must use 16 bit samples for LINEAR_PCM, but WAV header indicates 32 bit per sample.","grpc_status":3}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cleansio/cleansio.py", line 46, in <module>
    transcribe(AudioFile(sys.argv[1]))
  File "/Users/patrick/dev/cleansio/cleansio/speech/transcribe.py", line 14, in transcribe
    audio_file.sample_rate)
  File "/Users/patrick/dev/cleansio/cleansio/speech/transcribe.py", line 32, in transcribe_each_slice
    response = client.recognize(config, audio)
  File "/Users/patrick/anaconda3/lib/python3.6/site-packages/google/cloud/speech_v1/gapic/speech_client.py", line 227, in recognize
    request, retry=retry, timeout=timeout, metadata=metadata)
  File "/Users/patrick/anaconda3/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 139, in __call__
    return wrapped_func(*args, **kwargs)
  File "/Users/patrick/anaconda3/lib/python3.6/site-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
    on_error=on_error,
  File "/Users/patrick/anaconda3/lib/python3.6/site-packages/google/api_core/retry.py", line 177, in retry_target
    return target()
  File "/Users/patrick/anaconda3/lib/python3.6/site-packages/google/api_core/timeout.py", line 206, in func_with_timeout
    return func(*args, **kwargs)
  File "/Users/patrick/anaconda3/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 61, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 Must use 16 bit samples for LINEAR_PCM, but WAV header indicates 32 bit per sample.
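
The error says GCS wants 16 bit samples for LINEAR_PCM while the converted WAV carries 32 bit samples, so one possible fix is to reduce the bit depth before sending the chunk. The sketch below is standard-library only and assumes the 32 bit data is little-endian signed integer PCM (not float); the function name is hypothetical. If the conversion already goes through pydub, `AudioSegment.set_sample_width(2)` would likely be the simpler route.

```python
import struct


def pcm32_to_pcm16(frames):
    """Convert little-endian signed 32 bit PCM bytes to 16 bit PCM bytes."""
    count = len(frames) // 4
    samples = struct.unpack("<%di" % count, frames[:count * 4])
    # Keep the 16 most significant bits of every sample.
    return struct.pack("<%dh" % count, *(s >> 16 for s in samples))
```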

Maximize Chunk Loudness

Based off #56, we need to increase the volume of the audio file to the threshold before clipping.

According to pydub's docs, it exposes a relative loudness measurement and also supports modifying the loudness of a file.
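
A rough sketch of the idea on raw samples, assuming signed 16 bit PCM held as a list of ints. `peak_normalize` and `ceiling` are hypothetical names. With pydub the equivalent would probably be applying a gain of `-max_dBFS` via `apply_gain`, but that is not verified against our code here.

```python
def peak_normalize(samples, ceiling=32767):
    """Scale samples so the loudest one reaches the ceiling without clipping."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # Pure silence: nothing to amplify.
        return list(samples)
    # Multiply first, divide once, then truncate toward zero.
    return [int(s * ceiling / peak) for s in samples]
```

Because every sample is scaled by the same factor, the loudest sample lands exactly on the ceiling and nothing exceeds it.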

Convert to Mono

  1. Check how many channels it has
  2. Convert to Mono
    • Google Cloud Speech only accepts 1 channel
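
For illustration, a standard-library sketch of the two steps, assuming interleaved little-endian signed 16 bit PCM. The function name is hypothetical; with pydub, checking `channels` and calling `set_channels(1)` would likely do the same job.

```python
import struct


def to_mono(frames, channels):
    """Average interleaved 16 bit PCM channels into a single channel."""
    if channels == 1:
        # Step 1: already mono, nothing to convert.
        return frames
    count = len(frames) // 2
    samples = struct.unpack("<%dh" % count, frames[:count * 2])
    # Step 2: average each group of `channels` samples into one sample.
    usable = count - count % channels
    mono = [sum(samples[i:i + channels]) // channels
            for i in range(0, usable, channels)]
    return struct.pack("<%dh" % len(mono), *mono)
```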

Unit Tests

  • Add to CI
  • Unit test all functions that do not involve GCS.

Mute Explicits

Now that we've located where the explicits are, we need to ensure the user doesn't hear those words. Based on the timestamps, mute sections of the audio chunk created by Cleansio and stored in ~/.cleansio-temp.

  • Mute sections based on timestamps
  • Overwrite the audio chunk
  • MUTE instead of BLEEP
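
A minimal sketch of muting by timestamp on raw PCM bytes, assuming signed 16 bit samples (where silence is all zero bytes). The function and parameter names are hypothetical; overwriting the chunk on disk is left to the caller.

```python
def mute_ranges(frames, ranges_ms, sample_rate, sample_width=2, channels=1):
    """Return a copy of frames with each (start_ms, end_ms) range silenced."""
    frame_size = sample_width * channels
    muted = bytearray(frames)
    for start_ms, end_ms in ranges_ms:
        # Convert milliseconds to byte offsets aligned on whole frames.
        start = (start_ms * sample_rate // 1000) * frame_size
        end = min((end_ms * sample_rate // 1000) * frame_size, len(muted))
        if start < end:
            muted[start:end] = bytes(end - start)  # zero bytes = silence
    return bytes(muted)
```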

Research - Vocal Pitch

  • Create a wiki page
  • Add the wiki page to the sidebar under "Lyric Accuracy Assessment"
  • Document how the artist's pitch affects the accuracy of the lyrics
  • Provide examples

Create Poster - Dec. 5

Due

  • GO/NoGO Poster, December 5th: 10%
  • Poster Tweets, December 31st: 10%

Responsibilities

  • Corie
    • Background
    • Censoring
  • Levin
    • Process (writeup)
    • Retrieving Lyrics
  • Patrick
    • Design
    • Process (diagram)
  • Victor
    • Future Work: Real-time
    • Acknowledgements

Accept Any Audio Format

Convert to FLAC on runtime
Create a temporary WAVE file if they input something other than FLAC/WAVE.

Delete the file after execution. Use system's temp.

Defaulting to WAVE because it yields better GCS results.

Always cleanup files

If you exit the program early, the temporary files will stay in the folder. Implement some type of finally block.
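
One way the finally block could look, as a sketch. `run_with_temp_dir` and the `cleansio-` prefix are hypothetical, and `work` stands in for whatever does the slicing and transcribing.

```python
import shutil
import tempfile


def run_with_temp_dir(work):
    """Call work(temp_dir) and always delete temp_dir afterwards."""
    temp_dir = tempfile.mkdtemp(prefix="cleansio-")
    try:
        return work(temp_dir)
    finally:
        # Runs on normal return, on exceptions and on Ctrl+C.
        shutil.rmtree(temp_dir, ignore_errors=True)
```

Note that a finally block cannot run if the process is killed outright (e.g. SIGKILL), so some leftovers are still possible in that case.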

Word Timestamps

Now that we have a list of explicits, we need to locate where each word is sung in the 5 second audio chunk.

  • Locate the explicits as accurately as possible
  • Use milliseconds
  • Define a start and end timestamp (for each explicit)
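
A sketch of the lookup, assuming we already have per-word timing for the chunk as `(word, start_seconds, end_seconds)` tuples. That tuple shape is an assumption for illustration; GCS can return word offsets when word time offsets are enabled in the request, but the real response objects look different.

```python
def locate_explicits(words, explicits):
    """Return (start_ms, end_ms) for every word found in explicits."""
    wanted = set(e.lower() for e in explicits)
    located = []
    for word, start_s, end_s in words:
        # Ignore case and trailing punctuation when matching.
        if word.lower().strip(".,!?") in wanted:
            located.append((int(round(start_s * 1000)),
                            int(round(end_s * 1000))))
    return located
```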

Use Package Manager

Currently, we're installing each dependency separately. We should use a package manager that lists each library and the version to guarantee deterministic behaviour.

  • Use pipenv, poetry or requirements.txt
    • Chose requirements.txt
  • Update README
  • Update TravisCI
  • Update Dockerfile

Research - Genre

  • Create a wiki page
  • Add the wiki page to the sidebar under "Lyric Accuracy Assessment"
  • Document how the music genre affects the accuracy of the lyrics
  • Provide examples

No Length Cap

Transcribe an entire song by transcribing small ~10 second chunks.

Research - Removing Instrumental

  • Create a wiki page
  • Add the wiki page to the sidebar under "Lyric Accuracy Assessment"
  • Document how removing the instrumental affects the accuracy of the lyrics
  • Provide examples

Improve Lyrics Output

Currently, we're just printing the JSON response. Display it like this:
09/34: I left my girl at home
10/34: I don't love her no more
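
A small sketch of that format, assuming the two numbers are the chunk index and the total number of chunks (the issue doesn't say). `format_lyrics` is a hypothetical helper that returns the lines instead of printing them.

```python
def format_lyrics(chunks):
    """Format one transcribed line per chunk as 'index/total: lyrics'."""
    total = len(chunks)
    width = len(str(total))  # zero-pad the index to the width of the total
    return ["{}/{}: {}".format(str(index).zfill(width), total, text)
            for index, text in enumerate(chunks, start=1)]
```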

Internet Usage

In real-time mode, Google Cloud Speech will be called repeatedly in a very short period of time.

  • Monitor internet traffic while using Cleansio on a long audio file
  • Record download bandwidth
  • Record upload bandwidth
  • Create a wiki page with the results
  • Link to wiki page in this issue

Explicit Word List Options

We need to combine the lists if the user provides one. We should also provide a way for the user to choose how the lists should be combined, or whether the user-provided list should override our internal one.

  • Check user-provided options to see what mode they want
  • Option to use just our list
  • Option to use just their list
  • Option to use a combination of both lists
  • If they want to combine the lists, put both lists into a set to remove duplicates, then create the new internal list of explicit words using that set's contents
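
The options above could be sketched like this. The mode names (`internal`, `user`, `combine`) are placeholders, not decided CLI values, and falling back to our list when no user list is given is an assumption.

```python
def build_explicit_list(internal, user=None, mode="combine"):
    """Pick or merge the explicit word lists according to mode."""
    if mode not in ("internal", "user", "combine"):
        raise ValueError("unknown mode: {}".format(mode))
    if mode == "internal" or not user:
        # No user list to work with: fall back to our own.
        return sorted(set(internal))
    if mode == "user":
        return sorted(set(user))
    # Combine: the set union removes duplicates across both lists.
    return sorted(set(internal) | set(user))
```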

Check Adjacent Words

It's possible for multiple chunks to be individually inoffensive, but be offensive when combined with each other. (For example, the letters in F-*-*-* said individually).

These should also be censored.
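
A sketch of one way to catch the spelled-out case: join runs of adjacent words and check the result against the list. The names and the `max_span` cutoff are hypothetical, and this only covers exact concatenations.

```python
def find_spelled_out(words, explicits, max_span=6):
    """Return (first_index, last_index) of adjacent words that join into an explicit."""
    wanted = set(e.lower() for e in explicits)
    hits = []
    for start in range(len(words)):
        joined = ""
        for end in range(start, min(start + max_span, len(words))):
            joined += words[end].lower()
            # Single words are handled by the normal check; only flag runs.
            if end > start and joined in wanted:
                hits.append((start, end))
    return hits
```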

Logo

We need a logo for our project.

Timestamp Safe Mode

  • Use #84
  • Safe mode that widens the timestamp to decrease the chance that a word slips through the censor
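
A sketch of the widening step, assuming timestamps are `(start_ms, end_ms)` pairs within one chunk. The padding value is a made-up parameter; ranges that end up touching are merged so we don't mute the same section twice.

```python
def widen_ranges(ranges_ms, padding_ms, chunk_length_ms):
    """Pad each range on both sides, clamp to the chunk, merge overlaps."""
    widened = []
    for start, end in sorted(ranges_ms):
        start = max(0, start - padding_ms)
        end = min(chunk_length_ms, end + padding_ms)
        if widened and start <= widened[-1][1]:
            # Overlaps the previous range after padding: extend it instead.
            widened[-1] = (widened[-1][0], max(widened[-1][1], end))
        else:
            widened.append((start, end))
    return widened
```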

Internal List of Explicits

We need to have an internal list of explicit words in case the user doesn't choose to provide their own list. This should be an encrypted YAML file.

  • Reference a reliable source for explicit words
  • Use a YAML file to list the explicits
  • Either encrypt the contents of the entire file or each explicit
    • We're encrypting to avoid committing curse words to our repo

Crypto Suggestion

Research - Dynamics

  • Create a wiki page
  • Add the wiki page to the sidebar under "Lyric Accuracy Assessment"
  • Document how the loudness of a snippet affects the accuracy
  • Provide examples

Sample Rate

Find the sample rate of the file and change the GCS request instead of making it static.

If the sample rate cannot be found, convert the audio to a default sample rate.
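
A minimal sketch for WAVE input using the standard library. The 44100 default is an assumption, not a decided value; the returned flag tells the caller whether the rate was actually read or whether the audio still needs to be resampled to the default.

```python
import wave

DEFAULT_SAMPLE_RATE = 44100  # assumption: placeholder default


def read_sample_rate(path, default=DEFAULT_SAMPLE_RATE):
    """Return (sample_rate, found) for a WAVE file."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getframerate(), True
    except (wave.Error, EOFError, OSError):
        # Unreadable or not a WAVE file: caller should resample to default.
        return default, False
```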

Remove String Interpolation

String interpolation (f-strings) was added in Python 3.6. We support v3.4-v3.6, so remove all f"" literals.

We definitely need unit tests.
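
For reference, the kind of change this means, shown on a made-up helper (not a function from our code base): `str.format` works on every version we support.

```python
def describe_chunk(path, seconds):
    """Build a log line without f-strings so it runs on Python 3.4+."""
    # Python 3.6+ only: f"{path} ({seconds:.1f}s)"
    return "{} ({:.1f}s)".format(path, seconds)
```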

Research - Snippet Length

  • Create a wiki page
  • Add the wiki page to the sidebar under "Lyric Accuracy Assessment"
  • Document how the length of the snippet affects the accuracy of the lyrics
  • Provide examples

Libraries List

We need a wiki page that lists every library we're using and why we're using it. Also add a link to the library.

Clean File Options

  • Provide a cli arg to allow the user to specify where it should be stored
  • Provide a cli arg to allow the user to specify the audio encoding
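
A sketch of how the two args could be declared with argparse. The flag names, the default and the list of encodings are all placeholders to be decided.

```python
import argparse


def build_parser():
    """Command-line parser with hypothetical clean file options."""
    parser = argparse.ArgumentParser(prog="cleansio")
    parser.add_argument("file", help="audio file to clean")
    parser.add_argument("-o", "--output", default=None,
                        help="where to store the clean file")
    parser.add_argument("-e", "--encoding", default="wav",
                        choices=["wav", "flac", "mp3"],
                        help="audio encoding of the clean file")
    return parser
```

Usage would then look like `python cleansio.py song.mp3 -o clean.flac -e flac`.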

User List of Explicits

We shouldn't use a hard-coded list. Please add code which loads in an encrypted YAML list of swear words before the rest of the program runs.

  • The file should either be in Cleansio's directory (gitignore the content) or they should provide the path to the YAML
  • There should be a sample YAML as a reference to the user
  • Pass the file through the command-line
