gwbasic / soft_matrix



Upmixes stereo to surround sound

License: MIT License

Rust 100.00%

soft_matrix's Introduction

soft_matrix

Soft Matrix upmixes two-channel stereo to surround sound.

Goals and Purpose

The goal of Soft Matrix is to provide ideal upmixing of two-channel stereo audio to 5.1 channels. Positioning of sounds is based on their panning between the two channels and the phase difference between the two channels.

Soft Matrix's default matrix works very well with recordings that have significant out-of-phase material. Soft Matrix also has a horseshoe mode for recordings with significant panning but mostly in-phase material.

Currently, Soft Matrix supports the RM and Dolby Stereo matrixes. The goal is to support common phase and panning based matrixes, including SQ. (Current support for SQ is experimental.)

Usage

To use Soft Matrix with default options, merely run:

soft_matrix "input.wav" "output.wav"

More options and examples are described in options.md.

Soft Matrix only supports wav files as inputs. It only outputs 32-bit floating point wav files. (I recommend sox for converting to/from wav.)

How It Works

See How is Stereo Upmixed to Surround Sound

Samples

See Samples

Installation

Soft Matrix is available via cargo, or as source code. It is written in Rust.

Installation via Cargo

Prerequisite: Install Rust

cargo install soft_matrix

Update

To update, merely re-run:

cargo install soft_matrix

Release History

Chocolatey and Homebrew support?

There are currently open "help wanted" issues to support Chocolatey and Homebrew:

Chocolatey (Windows): #81
Homebrew (Mac): #80

Pre-built binaries?

There currently are no plans to provide pre-built binaries.

Building and Running from Source Code

Once you have installed Rust and installed Git:

git clone https://github.com/GWBasic/soft_matrix.git
cd soft_matrix
cargo build --release

The soft_matrix binary will be in the soft_matrix/target/release folder:

cd target/release
./soft_matrix

Supported Platforms

I currently develop on Mac. Soft Matrix successfully runs on both Intel and Apple silicon.

I have not tested on Windows or Linux yet, but I am optimistic that Soft Matrix will build and run on those platforms.

Examples

Examples are in options.md

Examples (for sox)

Convert a flac (or mp3) to a wav file

sox "spiral.flac" "spiral.wav"

Convert a wav to a flac file

Note that soft_matrix's output is a 32-bit floating point wav. This is a very inefficient file format, even compared to a 24-bit flac.

24-bit flac file: (Blu-ray, master quality)

sox "spiral - upmixed.wav" -b 24 "spiral - upmixed.flac" dither -s -p 24

16-bit flac file: (CD quality)

sox "spiral - upmixed.wav" -b 16 "spiral - upmixed.flac" dither -s -p 16

Tips

When upmixing a continuous performance, you will get the best results if all tracks are concatenated into a single file. (For example, if you upmix the second side of Abbey Road, concatenate it into a single wav file.) The upmixer inspects roughly 1/20th of a second of audio at a time. If a continuous performance is broken into separate files, the breaks interfere with windowing and can cause a noticeable click at each track break.

I personally use sox for converting between different audio formats, like wav and flac.

How does it Work?

Soft Matrix attempts to steer audio by:

Breaking each sample up into its frequency components
Steering based on the instantaneous panning and phase relationship between each frequency component in each sample

To do this, Soft Matrix performs a Fourier transform for each sample in the source wav file. It uses a window size large enough to process frequencies down to 20 Hz. To prevent noise, the panning values from adjacent samples are averaged.
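The two steering inputs above, panning and phase, can be measured for each frequency bin. Below is a minimal sketch that assumes each bin is a complex value with `re`/`im` parts. `Bin`, `pan`, and `phase_difference` are hypothetical names. The pan formula is only an illustrative choice, not Soft Matrix's actual math.

```rust
use std::f32::consts::PI;

/// Minimal stand-in for a complex FFT bin (rustfft uses num_complex::Complex,
/// which has the same `re`/`im` fields); avoids external crates here.
#[derive(Clone, Copy, Debug)]
struct Bin {
    re: f32,
    im: f32,
}

impl Bin {
    fn magnitude(self) -> f32 {
        self.re.hypot(self.im)
    }

    /// Phase angle in radians, in the range [-pi, pi].
    fn phase(self) -> f32 {
        self.im.atan2(self.re)
    }
}

/// Hypothetical pan measure for one bin: -1.0 = hard left, 0.0 = center, 1.0 = hard right.
fn pan(left: Bin, right: Bin) -> f32 {
    let (l, r) = (left.magnitude(), right.magnitude());
    if l + r == 0.0 {
        0.0 // silent bin: treat as centered
    } else {
        (r - l) / (l + r)
    }
}

/// Phase difference (right minus left) for one bin, wrapped into [-pi, pi].
fn phase_difference(left: Bin, right: Bin) -> f32 {
    let mut diff = right.phase() - left.phase();
    if diff > PI {
        diff -= 2.0 * PI;
    } else if diff < -PI {
        diff += 2.0 * PI;
    }
    diff
}

fn main() {
    let left = Bin { re: 1.0, im: 0.0 };
    let right = Bin { re: -1.0, im: 0.0 }; // same level, opposite polarity
    println!("pan = {}, phase difference = {}", pan(left, right), phase_difference(left, right));
}
```

Averaging these per-bin values across adjacent samples, as described above, is what smooths out the noise.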

Performance and Speed

Soft Matrix runs slowly. On my M2 MacBook Pro, it can generally upmix at roughly real-time speed.

This is because:

Fourier transforms large enough to go down to 20 Hz take a long time to perform.
Soft Matrix performs a transform for every sample.
Soft Matrix performs significant averaging of adjacent panning calculations.
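To put the list above in numbers, here is a back-of-the-envelope estimate. It makes two hypothetical assumptions. First, each transform costs about n log2(n) operations, the usual FFT order of growth. Second, the window grows to 8192 points at 96 kHz so that it still reaches 20 Hz. Neither figure is measured from Soft Matrix.

```rust
/// Hypothetical cost model: one FFT of `window` points per input sample, per channel,
/// each costing on the order of window * log2(window) basic operations.
fn fft_ops_per_second(sample_rate: u64, window: u64) -> u64 {
    assert!(window.is_power_of_two(), "this rough model assumes a power-of-two window");
    let log2 = window.trailing_zeros() as u64;
    sample_rate * window * log2
}

fn main() {
    let cd = fft_ops_per_second(44_100, 4096);
    let hi_res = fft_ops_per_second(96_000, 8192);
    println!("44.1 kHz, 4096-point window: ~{cd} ops per second per channel");
    println!("96 kHz, 8192-point window:   ~{hi_res} ops per second per channel");
    println!("ratio: {:.1}x", hi_res as f64 / cd as f64);
}
```

Under these assumptions, the cost grows faster than the sampling rate: more samples per second means more transforms, and each transform is also larger.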

To make Soft Matrix run as quickly as possible, it uses all available cores.

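To keep performance "within reason," I suggest avoiding unreasonably high sampling rates. Human hearing, even in rare circumstances, only goes up to 28 kHz. A 56 kHz sampling rate can represent frequencies up to half its rate (the Nyquist frequency), which is 28 kHz. Therefore, if you use high sampling rates, I suggest downsampling to 56 kHz before using Soft Matrix.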

Options.md lists some other tuning options for faster upmixes, but I only recommend them for previews.

Feature Requests

Currently, I only plan on adding features to support additional out-of-phase matrixes, performance, audio quality, and configuration options.

I do not plan on adding any other features.

Specifically, I have no plans to support reading other file formats, outputting to other file formats, or outputting anything other than a 32-bit floating point wav file. There are many excellent tools for audio format conversion that can handle this much better than I can. I personally use sox.

Getting Help

Before asking for help:

Use Google or your favorite search engine before asking for help; especially with issues related to Git or Rust.
This is a hobby project; it may take me some time to respond.
I cannot provide help with other audio processing tools, like sox.

If you need assistance, please visit https://andrewrondeau.com/blog/ and email me directly.

Known Issues

Large Wav File Support

The wav file format has a 2GB limit. There are at least two incompatible proposals to overcome this limitation. wave_stream, also by me, has no plans to support either large file proposal.

If large file support becomes desirable, I will investigate supporting .aiff as an alternative to .wav.

Inputting large wav files

Soft_matrix cannot handle input wav files larger than 2GB. It can handle an entire compact disc ripped as 16-bit, 44.1 kHz audio. When upmixing compact discs, I suggest upmixing each disc as a separate file. Higher bit depths and sampling rates may exceed the 2GB limit, so I suggest breaking those files at moments of complete silence and upmixing each segment separately.
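The arithmetic behind this advice is short. The sketch below assumes that "2GB" means 2^31 bytes and ignores the few bytes of wav header. It compares an 80-minute CD with the same length at 24-bit, 96 kHz.

```rust
/// Assumption: "2GB" means 2^31 bytes; the wav header's few dozen bytes are ignored.
const WAV_LIMIT_BYTES: u64 = 1 << 31;

/// Bytes of PCM sample data for a recording of the given format and length.
fn pcm_bytes(sample_rate: u64, channels: u64, bits_per_sample: u64, seconds: u64) -> u64 {
    sample_rate * channels * (bits_per_sample / 8) * seconds
}

fn main() {
    let eighty_minutes = 80 * 60;
    let cd = pcm_bytes(44_100, 2, 16, eighty_minutes);
    let hi_res = pcm_bytes(96_000, 2, 24, eighty_minutes);
    println!("80-minute CD (16-bit, 44.1 kHz): {cd} bytes, fits: {}", cd < WAV_LIMIT_BYTES);
    println!("80 minutes at 24-bit, 96 kHz:    {hi_res} bytes, fits: {}", hi_res < WAV_LIMIT_BYTES);
}
```

A full CD uses well under half of the limit. The same running time at 24-bit, 96 kHz does not fit.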

Outputting large wav files

Soft_matrix automatically splits its output into multiple files so that no file exceeds 2GB. Sox can be used to concatenate the files together. Example:

soft_matrix "stereo.wav" "surround.wav"
sox "surround - 1 of 2.wav" "surround - 2 of 2.wav" -b 24 "surround.flac" dither -s -p 24
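A stereo input that fits under the limit can still produce a split output, because the output has 6 channels of 32-bit floating point samples. The sketch below estimates how much audio fits in one output file. It assumes a 2^31-byte limit and ignores the header, so Soft Matrix's actual split point may differ slightly.

```rust
/// Assumption: the limit is 2^31 bytes; header bytes are ignored.
const WAV_LIMIT_BYTES: u64 = 1 << 31;

/// Largest number of whole frames (one 32-bit float sample per channel) under the limit.
fn max_frames_per_file(channels: u64) -> u64 {
    let bytes_per_frame = channels * 4; // 32-bit floating point = 4 bytes per sample
    WAV_LIMIT_BYTES / bytes_per_frame
}

fn main() {
    let frames = max_frames_per_file(6); // 5.1 output has 6 channels
    let minutes = frames as f64 / 44_100.0 / 60.0;
    println!("{frames} frames per file, about {minutes:.1} minutes at 44.1 kHz");
}
```

At 44.1 kHz this comes to a bit under 34 minutes per output file, so a full-length CD upmix will usually be split into two files.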

SQ matrix

Current support for the SQ matrix is limited. Positioning within the SQ matrix is approximate; panning levels are incorrect and there may be noise or other distortion. I do not recommend using soft_matrix for professional SQ dematrixing.

The SQ matrix is a very unusual matrix compared to typical phase-based matrixes like Dolby Surround, RM, and the default matrix. These all work by maintaining the same left-to-right pan and using phase to pan front to back: tones that are in phase (0 degrees phase difference) are in the front, and tones that are out of phase (180 degrees phase difference) are in the back. Instead, the SQ matrix uses phase to steer a tone around the perimeter of the room as if it were a circle.
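For the typical phase-based matrixes, the front-to-back rule above can be sketched as a mapping from phase difference to depth. The linear mapping and the name `front_to_back` are assumptions for illustration only. SQ would need a different mapping, because it steers around a circle rather than along a front-to-back line.

```rust
use std::f32::consts::PI;

/// Hypothetical linear mapping from inter-channel phase difference (radians)
/// to depth: 0.0 = front (in phase), 1.0 = back (180 degrees out of phase).
fn front_to_back(phase_difference: f32) -> f32 {
    // Wrap into [-pi, pi) so that, for example, 3*pi/2 is treated as -pi/2.
    let wrapped = (phase_difference + PI).rem_euclid(2.0 * PI) - PI;
    wrapped.abs() / PI
}

fn main() {
    for phase in [0.0, PI / 2.0, PI] {
        println!("phase difference {phase:.3} rad -> depth {:.2}", front_to_back(phase));
    }
}
```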

I personally spent at least 6 months of weekends trying to get SQ "right." Unfortunately, SQ relies on trigonometry that I really struggle with.

Source Separation (Stemming)

Source Separation (Stemming) is the act of separating out individual channels from a fully-mixed recording. It is the technology used to finish The Beatles' "Now and Then."

soft_matrix does not perform any source separation. I am unfamiliar with source separation tools, but if you'd like to use them, I suggest:

Do source separation before using soft_matrix.
Each separated source should be stereo, and preserve the phase of the original recording
Use soft_matrix separately on each source
Mix all upmixes together

Please get in touch with me if you do this. I am interested in hearing if it works!

Contributing

If you would like to contribute, please contact me using the above channels so that we may discuss your goals and motivations.

I would really appreciate help distributing through tools like Homebrew and Chocolatey. If you are motivated to upmix some SQ recordings, and enjoy math, maybe we can figure out SQ.

License

Soft Matrix is distributed under the MIT license (LICENSE.md).


soft_matrix's Issues

Make rear phase shifts configurable

Distribute via Chocolatey (or similar) Windows

Users on Windows should be able to install via a standard package manager such as Chocolatey

Make number of threads configurable

Distribute via Homebrew (Mac)

Users should be able to install via "brew install soft_matrix" (or similar)

Readme.md with build instructions

There should be a readme.md that has build and usage instructions

Remove "loud" matrixes and instead make "loud" an option

In general, when mixing in stereo, items in the center channel need to be at amplitude 0.707 in order to be as loud as items isolated in a speaker at amplitude 1.0.

During upmixing, this creates a complication: If a tone plays at amplitude 1.0 in both speakers, it will be directed to the center speaker at amplitude 1.414213562373095. This will create clipping.

There are two options to handle this:

Scale all amplitudes by 0.707. This is what most matrixes do
Allow clipping. This is what the loud matrixes do

In this ticket:

Get rid of the "loud" matrixes
Each matrix should have, as a property, an "adjustment" value. This will normally be .707. During writing, everything will be adjusted by this value
The loud option should skip this value, although there should be a warning that clipping may occur
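The numbers in this ticket can be checked with a short sketch. It assumes steering preserves power, so a tone at 1.0 in both speakers lands in the center at sqrt(2), about 1.414. The names `steered_center` and `apply_adjustment` are hypothetical and do not come from the code base.

```rust
use std::f32::consts::FRAC_1_SQRT_2; // 0.7071067811865476...

/// Assumption for illustration: steering preserves power, so a tone at `left`
/// and `right` amplitudes lands in the center at sqrt(left^2 + right^2).
fn steered_center(left: f32, right: f32) -> f32 {
    left.hypot(right)
}

/// Hypothetical write-time gain from the proposal: a per-matrix adjustment
/// (normally 0.707), skipped entirely when the loud option is set.
fn apply_adjustment(sample: f32, adjustment: f32, loud: bool) -> f32 {
    if loud { sample } else { sample * adjustment }
}

fn main() {
    let loud = false;
    if loud {
        eprintln!("Warning: loud mode skips the amplitude adjustment; clipping may occur.");
    }
    let center = steered_center(1.0, 1.0);
    println!("center before adjustment: {center}");
    println!("center after adjustment:  {}", apply_adjustment(center, FRAC_1_SQRT_2, loud));
}
```

With the adjustment, the full-scale centered tone comes out at 1.0 instead of clipping. With loud, it stays at about 1.414.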

Make samples per transform configurable

Currently, there is one Fourier transform per sample. See if there is a way to have a single Fourier transform shared among a few samples.

Even if the quality is poor, this might be a good way to preview results quickly.

Automatically adjust as available_parallelism() changes

https://doc.rust-lang.org/stable/std/thread/fn.available_parallelism.html states that available_parallelism changes depending on load.

Automatically adjust:

Create new threads as available_parallelism grows higher
Block threads when available_parallelism is smaller
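A minimal sketch of the re-check logic, which also covers "Make number of threads configurable." It uses the real `std::thread::available_parallelism` API. The helpers `desired_workers` and `worker_adjustment` are hypothetical, and how workers actually get spawned or blocked is left open.

```rust
use std::thread;

/// Hypothetical helper: use an explicitly configured thread count if one was
/// given, otherwise ask the OS how much parallelism is currently available.
fn desired_workers(configured: Option<usize>) -> usize {
    configured.unwrap_or_else(|| {
        thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1) // the query can fail; fall back to a single worker
    })
}

/// Re-checked periodically (for example, between chunks of work): positive means
/// spawn that many more workers, negative means that many should block.
fn worker_adjustment(active_workers: usize, configured: Option<usize>) -> isize {
    desired_workers(configured) as isize - active_workers as isize
}

fn main() {
    println!("workers to run now: {}", desired_workers(None));
    println!("adjustment with 2 active: {}", worker_adjustment(2, None));
}
```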

Music fades in

To reproduce:

Upmix a steady sine wave
Observe that the result "fades in"

I suspect this has to do with the slight amount of silence used to buffer the beginning.

Source sine wav:

Upmixed. Observe fade-in at the beginning and end:

Support SQ

For SQ: https://en.wikipedia.org/wiki/Stereo_Quadraphonic
https://en.wikipedia.org/wiki/Matrix_decoder#SQ_matrix,_%22Stereo_Quadraphonic%22,_CBS_SQ_(4:2:4)

Rear left: Right is (3/4)pi ahead
Rear center: 135 degrees difference between channels, right is 135 degrees forward relative to left
Rear right: Right is (1/4)pi behind

https://www.desmos.com/calculator/zimzev6yla

l: left back in left total
e: right back in right total
k: left back in right total
r: right back in right total

Bottom functions:

Rear center in left total
Rear center in right total

Set up CI checking on pull requests

Support Dolby Stereo

See: https://en.wikipedia.org/wiki/Dolby_Stereo#The_Dolby_Stereo_Matrix

In Dolby Stereo, the center and rear channels are attenuated by a factor of 0.707106781186548 (-3 dB) when encoding.
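Here is a deliberately simplified 4:2 encode that shows where the 0.707 factor goes. The surround handling is an assumption: the surround is added with opposite polarity so that it ends up 180 degrees out of phase between Lt and Rt. The real encoder processes the surround further, for example with +/-90 degree phase shifts. This sketch is not the Dolby Stereo specification.

```rust
use std::f32::consts::FRAC_1_SQRT_2; // 0.707106781186548

/// Simplified 4:2 encode of one sample (an assumption for illustration):
/// center goes into both channels in phase, surround goes in with opposite
/// polarity, and both are attenuated by 0.707. The real Dolby Stereo encoder
/// processes the surround further (for example, with +/-90 degree phase shifts).
fn encode(left: f32, right: f32, center: f32, surround: f32) -> (f32, f32) {
    let lt = left + FRAC_1_SQRT_2 * center + FRAC_1_SQRT_2 * surround;
    let rt = right + FRAC_1_SQRT_2 * center - FRAC_1_SQRT_2 * surround;
    (lt, rt)
}

fn main() {
    println!("center only:   {:?}", encode(0.0, 0.0, 1.0, 0.0));
    println!("surround only: {:?}", encode(0.0, 0.0, 0.0, 1.0));
}
```

A decoder then sees center material as in phase and surround material as out of phase. That is the same front-to-back rule the phase-based matrixes rely on.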

Switch transforms from RefCell to Option

In order to avoid unneeded copying of transform vectors:

They are stored as RefCell and swapped
0-length vectors are used when the transform isn't needed.

Instead, use an Option and None when no transform is needed
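A sketch of the proposed pattern. `Option::take` moves the vector out and leaves `None` behind without copying any bins. `Transform`, `Slot`, and `take_transform` are hypothetical stand-ins for the real types.

```rust
/// Hypothetical stand-in for one transform's bins.
type Transform = Vec<f32>;

/// Hypothetical per-sample slot; `None` means "no transform needed here",
/// replacing the 0-length-vector convention.
struct Slot {
    transform: Option<Transform>,
}

/// Moves the transform out of the slot, leaving `None` behind. Only the Vec's
/// pointer, length, and capacity move; the bins themselves are not copied.
fn take_transform(slot: &mut Slot) -> Option<Transform> {
    slot.transform.take()
}

fn main() {
    let mut slot = Slot { transform: Some(vec![0.0; 4096]) };
    let original_ptr = slot.transform.as_ref().map(|t| t.as_ptr());
    let taken = take_transform(&mut slot);
    // Same heap buffer: the vector was moved, not copied.
    assert_eq!(taken.as_ref().map(|t| t.as_ptr()), original_ptr);
    assert!(slot.transform.is_none());
    println!("moved {} bins without copying", taken.map_or(0, |t| t.len()));
}
```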

Use usize to indicate location in the sound file

A mix of usize and u32 is used to indicate location in the sound file and window size.

This is confusing.

Instead, use usize everywhere.

Currently blocked by: GWBasic/wave_stream#26

Support UHJ (Ambisonics)

https://en.wikipedia.org/wiki/Ambisonic_UHJ_format#Encoding

I must admit that I don't fully understand how Ambisonics works. It will require more detailed study to understand how to decode it.

This summary makes a lot more sense: https://en.wikipedia.org/wiki/Matrix_decoder#Ambisonic_UHJ_kernel_(3:2:4_or_more)

Remove usage of .unwrap()

Some .unwrap() calls were introduced in #3. Remove them.

Optimize window size

Optimize window size following instructions at: https://docs.rs/rustfft/latest/rustfft/#avx-performance-tips

Currently:

Minimum window size: 2205
Window size: 4096
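These two numbers are consistent with a 44.1 kHz input and a 20 Hz lowest frequency. One period of 20 Hz is 44100 / 20 = 2205 samples, and the next power of two is 4096. The issue does not say this is how the values were derived, so the sketch below is an assumption. The same arithmetic at 96 kHz gives 4800, which rounds up to 8192.

```rust
/// Smallest window that holds one full period of `lowest_frequency` Hz, rounded up.
fn minimum_window(sample_rate: u32, lowest_frequency: u32) -> usize {
    ((sample_rate + lowest_frequency - 1) / lowest_frequency) as usize
}

fn main() {
    for (rate, lowest) in [(44_100, 20), (96_000, 20)] {
        let min = minimum_window(rate, lowest);
        // Rounding up to a power of two is one common choice for FFT sizes.
        println!("{rate} Hz, {lowest} Hz lowest: minimum {min}, window {}", min.next_power_of_two());
    }
}
```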

Only end the thread if all samples are written

If the averaging phase takes too long, and there's no more input, threads will exit. This could leave a single thread performing all of the backwards transforms and writing.

To fix this:

Keep a counter of the number of samples written
Only end the thread if all samples are written
Instead of ending the thread, lock on the averaging mutex. This will block the thread until averaging completes

Support SQ, RM, and other "old" matrix formats

Also include a matrix named "carlos" for Wendy Carlos's stuff that she never released in discrete surround.

Move logging to the main loop

Logging currently happens after writing a sample. This creates a large delay between starting and logging.

Instead, logging's percentage should be based on the number of forward and backward transforms performed:

At startup, calculate the total number of transforms to complete (both forwards and backwards)
Keep a count of the number of transforms performed (both forwards and backwards)

The check for logging should happen whenever the counts of transforms are incremented.
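A minimal sketch of that counter. The total is fixed at startup, every transform increments a shared atomic, and the increment reports when the whole-number percentage changes. `TransformProgress` is a hypothetical name, and the transform counts in `main` are example numbers only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical progress tracker for the proposal: the total is fixed at
/// startup, and each forward or backward transform bumps a shared counter.
struct TransformProgress {
    total: usize,
    done: AtomicUsize,
}

impl TransformProgress {
    fn new(forward_transforms: usize, backward_transforms: usize) -> Self {
        let total = forward_transforms + backward_transforms;
        assert!(total > 0, "there must be at least one transform");
        TransformProgress { total, done: AtomicUsize::new(0) }
    }

    /// Records one completed transform. Returns the new whole-number percentage
    /// only when it changes, which is the moment to log.
    fn record(&self) -> Option<usize> {
        let done = self.done.fetch_add(1, Ordering::Relaxed) + 1;
        let before = (done - 1) * 100 / self.total;
        let after = done * 100 / self.total;
        (after != before).then_some(after)
    }
}

fn main() {
    // Example numbers: 1,000 forward and 1,000 backward transforms.
    let progress = TransformProgress::new(1_000, 1_000);
    for _ in 0..2_000 {
        if let Some(percent) = progress.record() {
            if percent % 25 == 0 {
                println!("{percent}% complete");
            }
        }
    }
}
```

Because `record` takes `&self`, the same tracker can be shared between worker threads through an `Arc`.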

Experiment with integer-based processing

One thing to try is using 24-bit waves instead of floating point.

See if fft can run with 32-bit or 64-bit ints
Try shifting 24-bit audio up to 32-bit (or higher) to avoid rounding errors

Distribute via "cargo install"

It appears that distributing via "cargo install" is the easiest way to distribute soft_matrix. (It will require that users install rust, though.)

Instructions at:

When I document how to install soft_matrix, I'll need to link to github issues with "Volunteers needed"

Write proper surround sound wav file

Blocked on GWBasic/wave_stream#28

Use channelMask to write a proper 5.1-channel file, so that the 6 channels are mapped to the correct speakers on playback.
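For reference, the 5.1 channelMask value comes from the speaker-position bits defined for WAVE_FORMAT_EXTENSIBLE. The sketch below only computes the mask. How wave_stream will expose the field is not assumed here.

```rust
// Speaker-position bits for the dwChannelMask field of a WAVE_FORMAT_EXTENSIBLE header.
const SPEAKER_FRONT_LEFT: u32 = 0x1;
const SPEAKER_FRONT_RIGHT: u32 = 0x2;
const SPEAKER_FRONT_CENTER: u32 = 0x4;
const SPEAKER_LOW_FREQUENCY: u32 = 0x8;
const SPEAKER_BACK_LEFT: u32 = 0x10;
const SPEAKER_BACK_RIGHT: u32 = 0x20;

/// 5.1 with back surrounds. Channels must be interleaved in ascending bit
/// order: front left, front right, center, LFE, back left, back right.
const CHANNEL_MASK_5_1: u32 = SPEAKER_FRONT_LEFT
    | SPEAKER_FRONT_RIGHT
    | SPEAKER_FRONT_CENTER
    | SPEAKER_LOW_FREQUENCY
    | SPEAKER_BACK_LEFT
    | SPEAKER_BACK_RIGHT;

fn main() {
    println!("channel mask: {:#x}, channels: {}", CHANNEL_MASK_5_1, CHANNEL_MASK_5_1.count_ones());
}
```

Some software describes 5.1 with side surrounds instead (SPEAKER_SIDE_LEFT and SPEAKER_SIDE_RIGHT, mask 0x60F). Whichever mask is used, the channel order in the file has to match it.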

No separation between front and back

I believe this is because of a poor understanding of phase. I suspect I misunderstand how .re is represented.

Keep computer awake while running

It appears that the computer will still sleep while soft_matrix is running. The computer should stay awake while soft_matrix is running.

To do this:

Use the keepawake crate: https://docs.rs/keepawake/latest/keepawake/
Include an option to disable keeping awake

Make reading stream based

Make reading stream based:

Read from an iter
Automatically pad the buffers for the fourier transforms
Automatically pad (or alternate heuristic) the queue of calculated right/lefts

This will be useful to allow reading the wav from stdin; which will allow using sox or similar to read formats other than wav
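A minimal sketch of the padding part of this ticket. The reader works on any sample iterator, such as samples decoded from stdin. It yields one window per input sample, with that sample centered, and pads with silence at both ends. `PaddedWindows` is hypothetical and handles a single channel only.

```rust
use std::collections::VecDeque;
use std::iter::Fuse;

/// Hypothetical stream reader: yields one window per input sample, with that
/// sample at index `window / 2`, padding with silence before the first sample
/// and after the last one.
struct PaddedWindows<I: Iterator<Item = f32>> {
    samples: Fuse<I>,
    buffer: VecDeque<f32>,
    window: usize,
    pending: usize, // samples read from the input but not yet centered in a window
}

impl<I: Iterator<Item = f32>> PaddedWindows<I> {
    fn new(samples: I, window: usize) -> Self {
        assert!(window > 0, "window must not be empty");
        // Leading silence so that the first sample lands at the window's center.
        let buffer: VecDeque<f32> = std::iter::repeat(0.0).take(window / 2).collect();
        PaddedWindows { samples: samples.fuse(), buffer, window, pending: 0 }
    }
}

impl<I: Iterator<Item = f32>> Iterator for PaddedWindows<I> {
    type Item = Vec<f32>;

    fn next(&mut self) -> Option<Vec<f32>> {
        // Top the buffer up to a full window with input, or with trailing
        // silence once the input runs out.
        while self.buffer.len() < self.window {
            match self.samples.next() {
                Some(sample) => {
                    self.buffer.push_back(sample);
                    self.pending += 1;
                }
                None if self.pending == 0 => return None, // every sample has been centered
                None => self.buffer.push_back(0.0),
            }
        }
        let window: Vec<f32> = self.buffer.iter().copied().collect();
        self.buffer.pop_front();
        self.pending -= 1;
        Some(window)
    }
}

fn main() {
    for w in PaddedWindows::new(vec![1.0, 2.0, 3.0].into_iter(), 4) {
        println!("{w:?}");
    }
}
```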

Averaging should vary with frequency, panning does not align with correct point in time

Averaging for panning front to back only needs to be a single wavelength. Right now all averaging is the length of the entire window.

This will require varying length buffers, or an entirely different means of averaging
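The length of "a single wavelength" per bin follows from how FFT bins are spaced. Bin k of a window-point transform sits at k * sample_rate / window Hz, so one period is window / k samples, independent of the sampling rate. The helper below is hypothetical. How the varying-length averaging buffers would be organized is left open, as in the issue.

```rust
/// Length in samples of one wavelength for FFT bin `bin` of a `window`-point
/// transform (bin k sits at k * sample_rate / window Hz), rounded up.
fn one_wavelength_samples(bin: usize, window: usize) -> usize {
    assert!(bin > 0, "bin 0 is DC and has no wavelength");
    (window + bin - 1) / bin
}

fn main() {
    let window = 4096;
    for bin in [1, 2, 10, 100, 2048] {
        println!("bin {bin}: average over {} samples", one_wavelength_samples(bin, window));
    }
}
```

Only the lowest bins need averaging across the whole window. High bins need just a handful of samples.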

Observe that the point where the phase changes isn't centered in the transition between front and rear:

Open-source wave_stream

At this point, I believe I've used wave_stream enough to open-source it.

It's getting awkward using a private reference to wave_stream.

Support files that are shorter than the window length (or abort)

Ignore phase in LFE

Branch IgnorePhaseInLFE has an attempt to ignore phase in the LFE. It's very staticy when normalized.

Front to back panning doesn't inspect wavelength

Front to back panning, in comments, explains that the phase is based on wavelength, but it doesn't actually use wavelength

Phase shift rear channels

One rear channel needs to be phase-shifted by +(0.5 pi), and the other by -(0.5 pi).

According to https://en.wikipedia.org/wiki/QS_Regular_Matrix:

The left-back needs to be -(0.5 pi). (It's shifted forward on encoding and backwards on playback)
The right-back needs to be +(0.5 pi) (It's shifted backwards on encoding and forward on playback)
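In the frequency domain, a quarter-turn phase shift is a multiplication by j or -j, so it needs no trigonometry. `Bin` and the shift functions below are hypothetical stand-ins. If the transform holds both halves of a real signal's spectrum, this rotation applies to the positive-frequency bins. Their mirrored negative-frequency bins need the opposite rotation so that the inverse transform stays real.

```rust
/// Minimal stand-in for a complex FFT bin (same `re`/`im` fields as num_complex::Complex).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bin {
    re: f32,
    im: f32,
}

/// Phase shift by +(0.5 pi): multiply by j, so (a + bj) * j = -b + aj.
fn shift_plus_quarter(c: Bin) -> Bin {
    Bin { re: -c.im, im: c.re }
}

/// Phase shift by -(0.5 pi): multiply by -j, so (a + bj) * -j = b - aj.
fn shift_minus_quarter(c: Bin) -> Bin {
    Bin { re: c.im, im: -c.re }
}

fn main() {
    let bin = Bin { re: 1.0, im: 0.0 };
    println!("left back (-0.5 pi):  {:?}", shift_minus_quarter(bin));
    println!("right back (+0.5 pi): {:?}", shift_plus_quarter(bin));
}
```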

Note that Dolby Surround tends to have the same phase shift for the back:

Document all options

Typing "soft_matrix" with no options should give an informative set of instructions

Write elapsed time when upmixing is complete

Rename rm to qs

The article on QS states that it's incorrectly referred to as RM: https://en.wikipedia.org/wiki/QS_Regular_Matrix

RM (Regular Matrix) was often used a synonym for the 'Sansui QS', 'Toshiba QM' and 'Nippon Columbia QX' matrix systems that were previously launched before the advent of the RM specification in 1973. Although none of the three previous matrices were compatible with the new RM specification, and with Toshiba and Nippon Columbia withdrawing their 'further RM incompatible' matrix systems from the market, Sansui's QS system was unofficially labelled by some record labels as RM, until the situation was clarified to those responsible for the mislabeling

Files are truncated to about an hour

Upmixing a file that's 2:05:56 (Just under 2 hours, 6 minutes) long, was truncated to 58:18. (Fifty-eight minutes, eighteen seconds.)

Not sure why

Edit: The root cause is that wave_stream doesn't support files longer than RIFF's 32-bit size values: GWBasic/wave_stream#30

To fix this, I'm going to implement some file-splitting logic. Wav files larger than 4GB have inconsistent support, but sox can concatenate the pieces into other formats that support long files.

Current sound quality is highly staticy

The current audio output is highly staticy. Fix this

Currently shifts pitch

When hardcoding the front-to-back to be the front, the pitch is always shifted up.

In this ticket, get a "no-op" with silent rear channels and no pitch shift.

Write progress every second

Every second, write progress:

Percentage complete
Estimated time of completion
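The two values can be computed with a linear extrapolation: assume the remaining work proceeds at the average rate so far. That assumption, and the helper names, are hypothetical. The `Duration` API is from the standard library. The completion time would then be `Instant::now()` plus the estimate.

```rust
use std::time::Duration;

/// Whole-number percentage of work completed.
fn percent_complete(done: u64, total: u64) -> u64 {
    if total == 0 { 100 } else { done * 100 / total }
}

/// Remaining time, assuming the rest of the work proceeds at the average rate
/// so far. Returns None until there is any completed work to extrapolate from.
fn estimate_remaining(elapsed: Duration, done: u64, total: u64) -> Option<Duration> {
    if done == 0 || done > total {
        return None;
    }
    Some(elapsed.mul_f64((total - done) as f64 / done as f64))
}

fn main() {
    // Example values: 10 seconds in, 25 of 100 units of work done.
    let elapsed = Duration::from_secs(10);
    let (done, total) = (25, 100);
    if let Some(remaining) = estimate_remaining(elapsed, done, total) {
        println!("{}% complete, about {}s remaining", percent_complete(done, total), remaining.as_secs());
    }
}
```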

Everything mixed to center in 5.1 mode

When dematrixing to 5.1, everything is mixed into the center channel.

Source:

Result, (dematrixed with default)

Make lowest supported frequency configurable

Currently, the lowest supported frequency is 10 Hz. This should be configurable. A higher lowest supported frequency will allow for more quickly previewing an upmix.

Support 5.1

Output is currently 4 channels; output should be 5.1

To start, introduce an options processor:

Options struct
Static method to parse options from arguments

See if there is a faster way to copy vectors

In the final phase of upmixing, when sound is steered, the vectors for the transforms are copied so that there are transforms for the front and back.

The copy occurs via .to_vec().

See if there is a faster way to copy these vectors.

(An early lesson is that copying vectors to maintain memory integrity had a lot of overhead, and using RefCells that allowed swapping, and empty vectors where the transforms are ignored, is much faster.)

Multithreading

Make the whole thing multithreaded

Refactor to use separate files for the three phases of upmixing

There are three phases in upmixing:

Read from the wave and perform a forwards transform
Average / smooth the panning
Perform actual panning, a backwards transform, and write to the wave file

Each phase should be a separate file. (Keeping them all in upmixer.rs is unwieldy.) Each file should have a struct to maintain its state, and use closures as dependency injection to send along the next phase in processing.

Support Dynaquad

https://en.wikipedia.org/wiki/Dynaquad

In general, this requires:

Disabling the rear phase shift
Widening the rear channels

Support horseshoe surround and super-stereo

Horseshoe surround:

Ignore phase
Right and leftmost sounds are in the rear channels
Mid-panned sounds are in the front

Super-stereo:

Halfway between horseshoe and surround
Extreme left and extreme right are panned halfway to the back
Completely out of phase is panned halfway to the back
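The two modes can be sketched as depth functions. The pan and phase scales and the function names are hypothetical. So is the choice, in super-stereo, to take the larger of the two cues, since the issue only describes the endpoints.

```rust
// Hypothetical steering rules for the two proposed modes.
// `pan`: -1.0 = hard left, 0.0 = center, 1.0 = hard right.
// `out_of_phase`: 0.0 = in phase, 1.0 = completely out of phase.
// Both functions return depth: 0.0 = front, 1.0 = back.

/// Horseshoe: ignore phase; the further a sound is panned to either side,
/// the further back it goes, and mid-panned sounds stay in the front.
fn horseshoe_depth(pan: f32) -> f32 {
    pan.abs().min(1.0)
}

/// Super-stereo: extreme pans and completely out-of-phase material each reach
/// halfway back. Taking the larger of the two cues is an assumption.
fn super_stereo_depth(pan: f32, out_of_phase: f32) -> f32 {
    0.5 * pan.abs().min(1.0).max(out_of_phase.clamp(0.0, 1.0))
}

fn main() {
    for (pan, phase) in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-0.5, 0.2)] {
        println!(
            "pan {pan:+.1}, out of phase {phase:.1}: horseshoe {:.2}, super-stereo {:.2}",
            horseshoe_depth(pan),
            super_stereo_depth(pan, phase)
        );
    }
}
```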

Two samples near the end of the file are missing

It appears that the last two samples aren't being written.

total_samples_written always shows two missing samples.

Looking at the written file shows two zeroed-out samples near the end.

Support Stereo-4

https://en.wikipedia.org/wiki/Stereo-4

In general this requires:

Disabling the rear phase shift
Slight widening in the front
Additional rear widening

Amplitude levels are wrong in SQ

Support was added for SQ in #70

One issue is that amplitude levels are wrong when a tone is panned front-to-back, or along the back channels.

In the following screenshot, all tone amplitudes should be equal:
