google / compare-codecs

License: Apache License 2.0

Python 84.12% Shell 5.61% HTML 4.36% C 1.00% CSS 2.15% JavaScript 2.74%

compare-codecs's Introduction

Compare Codecs

This project exists to fulfill a frequently heard request: to be able to compare the performance of codecs consistently, openly, and usefully.

For a complete description of what it does, how it does it, and why, check out the website, the source for which is in website/_site.

Installing this project requires a compatible system (at the moment, Ubuntu Linux 14.04, "Trusty Tahr").

To start the web server and read the website, run the following steps:

source init.sh
./install_software.sh   # installs all required software; this takes a while
cd website; jekyll serve --watch

The website will then be available at http://localhost:4000.

See the file LICENSE for licensing details, and CONTRIBUTING for how to make contributions.

compare-codecs's People

pombredanne kleopatra999 ruil2 sijchen vbohinc walterfan richardor ewouth mayankjhamtani jesseyx neotim angelo-abel jurisbu xelement marmikreal isabella232 ghas-results

compare-codecs's Issues

Hardware encoders

Is the testing of hardware encoders something that falls under the goals of this project?

For example, I believe that the Intel QuickSync Video, nVidia NVENC and AMD VCE H.264 implementations are all widely available with open-source drivers (given the right hardware).

Centralize command line parameter definitions

The command line parameters "--codec", "--criterion" and so on are defined in many different places.
Their definitions should be collected into a single module for consistency.

(At the moment there are 62 calls to parser.add_argument, but only 21 distinct ones.)
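The shared definitions could live in one module. That module would hold each flag's argparse keyword arguments, plus a helper that each script calls to add the flags it needs. The module layout, the add_flags helper and the help strings below are hypothetical. Only the flag names come from this issue.

```python
import argparse

# Single source of truth for flags shared across scripts.
# Help strings are placeholders, not the project's actual wording.
_FLAG_DEFINITIONS = {
    "codec": dict(help="Name of the codec to use."),
    "criterion": dict(help="Scoring criterion to optimize for."),
}


def add_flags(parser, *names):
    """Adds the named shared flags to an argparse parser and returns it."""
    for name in names:
        parser.add_argument("--" + name, **_FLAG_DEFINITIONS[name])
    return parser
```

A script would then call `add_flags(argparse.ArgumentParser(), "codec", "criterion")` instead of repeating the add_argument calls.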

Support .y4m

The biggest repository of test files, the Derf filestore, uses .y4m files.
In order to use these, we need to:
a) read the .y4m format in scripts
b) compare the decode result with the .y4m file
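Here is a minimal sketch of step a). It handles only 8-bit 4:2:0 content, which is what the plain C420 tags (or a missing C tag) denote. A .y4m file is a single text header line starting with YUV4MPEG2, followed by frames. Each frame starts with a line beginning with FRAME, followed by the raw Y, U and V planes. The function name is hypothetical.

```python
# Colorspace tags that mean 8-bit 4:2:0 (no C tag also means 4:2:0).
_420_TAGS = (b"420", b"420jpeg", b"420paldv", b"420mpeg2")


def read_y4m(data):
    """Splits an in-memory .y4m stream into (width, height, frames).

    Each frame is returned as the raw Y, U and V planes, i.e. the same
    bytes a .yuv file holds for that frame.
    """
    header_end = data.index(b"\n")
    tokens = data[:header_end].split()
    if not tokens or tokens[0] != b"YUV4MPEG2":
        raise ValueError("not a y4m stream")
    # Header parameters are a one-letter tag followed by a value, e.g. W1920.
    params = {tok[:1]: tok[1:] for tok in tokens[1:]}
    if params.get(b"C", b"420") not in _420_TAGS:
        raise ValueError("only 8-bit 4:2:0 is handled in this sketch")
    width, height = int(params[b"W"]), int(params[b"H"])
    chroma = ((width + 1) // 2) * ((height + 1) // 2)
    frame_size = width * height + 2 * chroma
    frames = []
    pos = header_end + 1
    while pos < len(data):
        line_end = data.index(b"\n", pos)
        if not data.startswith(b"FRAME", pos):
            raise ValueError("missing FRAME marker at byte %d" % pos)
        start = line_end + 1
        frame = data[start:start + frame_size]
        if len(frame) != frame_size:
            raise ValueError("truncated frame")
        frames.append(frame)
        pos = start + frame_size
    return width, height, frames
```

For step b), `b"".join(frames)` has the same layout as a raw .yuv file. It can therefore be compared against the decoder output in the same way the existing .yuv inputs are.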

Use aq-mode=3 as VP9 parameter in RT

Recommendation from Marco Panioni:

For RT, VP9 should use --rt --end-usage=cbr --aq-mode=3; this most closely matches the mode used in WebRTC for realtime encoding.
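The following sketches how these flags could be enforced on a vpxenc argument list. It assumes options are written in --flag=value form. The function name is hypothetical; only the three flags come from the recommendation.

```python
# Flags recommended above for VP9 real-time encoding.
VP9_RT_FLAGS = ["--rt", "--end-usage=cbr", "--aq-mode=3"]


def apply_vp9_rt_flags(args):
    """Returns a copy of a vpxenc argument list with the RT flags enforced.

    Any existing --end-usage=/--aq-mode= value is dropped so that the RT
    settings win; the input list is left unmodified.
    """
    overridden = ("--rt", "--end-usage=", "--aq-mode=")
    kept = [a for a in args if not a.startswith(overridden)]
    return kept + VP9_RT_FLAGS
```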

Verify_scores needs to use old DB to find "best"

At the moment, the verify_scores function uses the current scores to find the "best" scores
against which it checks whether anything has changed. This breaks when the "best" score
gets worse: the next run of the tool will then pick a different configuration as "best".

The solution likely involves moving the choice of scoredir into the optimizer.
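The fix can be sketched with scores held as plain {config_id: score} dicts for one file. This representation is an assumption, not the project's storage format. The key point is that the "best" config is chosen from the old store, so a worsened score gets reported instead of another configuration being silently promoted.

```python
def verify_scores(old_scores, new_scores, tolerance=0.0):
    """Checks whether the previous "best" configuration still scores the same.

    old_scores / new_scores: hypothetical {config_id: score} dicts from the
    old and current score stores. Returns (best_config_id, changed).
    """
    # Choose "best" from the OLD store, not the current one.
    best_id = max(old_scores, key=old_scores.get)
    new_score = new_scores.get(best_id)
    changed = (new_score is None
               or abs(new_score - old_scores[best_id]) > tolerance)
    return best_id, changed
```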

x264 settings are bad

Please contact x264 developers to discuss further.

The results are pretty meaningless when settings are chosen without an understanding of what they do.

Need to detect and delete illegal configurations from storage

At the moment, parsing config strings (for instance, when reading from storage) does not check the parameter values. They ought to be checked (numeric values are numeric, bounded-range values are within their ranges, choice values are one of the valid choices), and action should be taken when they are not (most likely raising an exception).

This also needs tools to clean improper configurations out of storage.

This situation arises when the bounds of a parameter have been tightened.
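A minimal sketch of validating while parsing follows. The spec table format, ParameterError and parse_and_check are hypothetical. It assumes configs are "-name value" pairs, as in the "-qmax 597 -qmin 58" strings shown further down. After bounds are tightened, running this over stored strings would raise on the now-illegal ones, which is what a cleanup tool would need.

```python
class ParameterError(ValueError):
    """Raised when a configuration string contains an illegal value."""


def parse_and_check(config, specs):
    """Parses a "-name value -name value" string into a dict and validates it.

    specs maps each parameter name to either ("range", low, high) for
    integers with inclusive bounds, or ("choice", allowed_values).
    Flags without a value are not handled in this sketch.
    """
    tokens = config.split()
    if len(tokens) % 2:
        raise ParameterError("option without a value in %r" % config)
    result = {}
    for flag, value in zip(tokens[::2], tokens[1::2]):
        name = flag.lstrip("-")
        if name not in specs:
            raise ParameterError("unknown parameter %r" % name)
        spec = specs[name]
        if spec[0] == "range":
            try:
                number = int(value)
            except ValueError:
                raise ParameterError("%s is not numeric: %r" % (name, value)) from None
            if not spec[1] <= number <= spec[2]:
                raise ParameterError("%s=%d is outside [%d, %d]"
                                     % (name, number, spec[1], spec[2]))
            result[name] = number
        elif value in spec[1]:
            result[name] = value
        else:
            raise ParameterError("%s=%r is not a valid choice" % (name, value))
    return result
```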

Change "raise encoder.Error" into more specific error classes.

From @pzembrod : The style guide recommendation is to use a module's Error class only as base class for the actual exceptions thrown, so that you can catch either all exceptions from that module (Error) or any specific exception in a targeted manner.
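The recommended layout looks roughly like this. The subclass names are hypothetical examples, not existing classes in encoder.py.

```python
class Error(Exception):
    """Base class for every exception raised by this module."""


class ParseError(Error):
    """A configuration string could not be parsed."""


class EncodeFailedError(Error):
    """The external encoder binary reported a failure."""
```

Callers can then write `except encoder.ParseError:` for the targeted case, or `except encoder.Error:` to catch everything the module raises.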

Investigate performance comparisions between VP9 in 1.3 and 1.6

Some indications are that PSNR rates dropped for many of the MPEG clips with the same settings, but settings' meanings might have shifted.

Not a high priority.

MJPEG codec needs to control its -qmin and -qmax values

The following gave an error:

/usr/local/google/home/hta/code/compare-codecs/tools/ffmpeg -loglevel warning -s 1920x1080 -i video/mpeg_video/Kimono1_1920x1080_24.yuv -codec:v mjpeg -qmin 58 -b:v 2500k -y /usr/local/google/home/hta/code/compare-codecs/workdir/mjpeg/c0a4fd36dd74/2500/Kimono1_1920x1080_24.mjpeg

Error message:

[rawvideo @ 0x2d792e0] Estimating duration from bitrate, this may be inaccurate
[swscaler @ 0x2d650a0] deprecated pixel format used, make sure you did set range correctly
[mjpeg @ 0x2d7bac0] qmin and or qmax are invalid, they must be 0 < min <= max
Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height

Likely, qmin is larger than the default value for qmax.
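One possible fixup, following the error message's rule 0 < qmin <= qmax, is to always emit an explicit -qmax whenever -qmin is set. The options dict and the max_qmax upper bound are hypothetical. This addresses only the qmin-above-default-qmax case; it does not explain the BQMall failure below, which has both values set.

```python
def fix_qmin_qmax(options, max_qmax):
    """Returns options where any -qmin is accompanied by a -qmax >= it.

    options: hypothetical {flag: int} dict such as {"-qmin": 58}.
    max_qmax: qmax to use when none is given (e.g. the top of the
    search range), so the encoder's default qmax is never relied upon.
    """
    fixed = dict(options)
    if "-qmin" in fixed:
        fixed.setdefault("-qmax", max_qmax)
        if fixed["-qmax"] < fixed["-qmin"]:
            fixed["-qmax"] = fixed["-qmin"]
    return fixed
```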

Options should be able to go away

As part of the varying process, options can be added to the command line, but they are never removed. They should be: shorter command lines are easier to understand.
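This sketch treats removing an option as one more kind of single step, so the search can shrink a command line as well as grow it. The {flag: value} representation and the candidate_values table are hypothetical.

```python
def single_step_variants(options, candidate_values):
    """Returns every configuration one step away from `options`.

    A step changes one option's value, adds one new option, or removes
    one option entirely (letting the encoder fall back to its default).
    options: {flag: value}; candidate_values: {flag: [possible values]}.
    """
    variants = []
    # Change or add one option.
    for flag, values in candidate_values.items():
        for value in values:
            if options.get(flag) != value:
                variant = dict(options)
                variant[flag] = value
                variants.append(variant)
    # Remove one option.
    for flag in options:
        variant = dict(options)
        del variant[flag]
        variants.append(variant)
    return variants
```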

MJPEG issue at certain settings

The setting -qmin 50 -qmax 722 seems to generate an error on BQMall for MJPEG at 1200 kbps.

/home/hta/code/compare-codecs/tools/ffmpeg -loglevel warning -s 832x480 -i video/mpeg_video/BQMall_832x480_60.yuv -codec:v mjpeg -qmax 722 -qmin 50 -b:v 1200k -y /home/hta/code/compare-codecs/workdir/mjpeg/fd8f0e24053d/1200/BQMall_832x480_60.mjpeg
[rawvideo @ 0x1b48e20] Estimating duration from bitrate, this may be inaccurate
[swscaler @ 0x1b300c0] deprecated pixel format used, make sure you did set range correctly
Encode took 1.400000 CPU seconds 1.470000 clock seconds
/home/hta/code/compare-codecs/tools/ffmpeg -loglevel warning -codec:v mjpeg -i /home/hta/code/compare-codecs/workdir/mjpeg/fd8f0e24053d/1200/BQMall_832x480_60.mjpeg /home/hta/code/compare-codecs/workdir/mjpeg/fd8f0e24053d/1200/BQMall_832x480_60tempyuvfile.yuv
/home/hta/code/compare-codecs/workdir/mjpeg/fd8f0e24053d/1200/BQMall_832x480_60.mjpeg: Invalid data found when processing input

Settings that completed successfully (config ID, score, parameters):

7a2b7def4a12 -848.682010 -qmax 597
e3fef2f71b05 -530.719020 -qmax 597 -qmin 58
b4895ab262ea -499.714020 -qmax 598 -qmin 69
278ae022d480 -499.714020 -qmax 626 -qmin 69
366f29e2de05 -504.051020 -qmax 722 -qmin 67
456420d31ede -848.682010 -qmax 722
56c8aea054b6 -499.714020 -qmax 722 -qmin 69
f1306ea6b766 -499.714020 -qmax 827 -qmin 69
5499167d9533 -848.682010 -qmax 943
7df43d1fe8ce -499.714020 -qmax 943 -qmin 69
e8a589627343 -511.853020 -qmax 943 -qmin 64
d31668749543 -499.714020 -qmax 985 -qmin 69
1a5518512496 -499.714020 -qmax 1024 -qmin 69

More informative display of "tuned results"

The "tuned results" display is hard to understand.
Suggested alternate graphic:

Use color to differentiate the two codecs
Include the -single-criterion as a line (currently not included)
Include the "tuned results" as bullets, not as a connected line
Shape the "tuned results" that fail to achieve the criterion (negative score) as X, not bullet.

The graph generation is done purely in Javascript. The information needs to be easily accessible.

Upgrade ffmpeg to version n2.2.3

The upgrade needs to verify that encodings remain stable or improve for the ffmpeg-provided encoders.

AllEncoderFilenames returns bare filenames, not paths

While converting to multiple score stores, it turned out that EncodingDiskCache.AllEncoderFilenames returns just the filename, not the path. This needs fixing.
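The fix amounts to joining each name with the directory it was found in. The function below is a hypothetical stand-in for AllEncoderFilenames, assuming encoded files live somewhere under one encoder directory.

```python
import os


def all_encoder_filenames(encoder_dir):
    """Lists every file under encoder_dir as a full path, not a bare name."""
    paths = []
    for dirpath, _, filenames in os.walk(encoder_dir):
        for filename in filenames:
            paths.append(os.path.join(dirpath, filename))
    return sorted(paths)
```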

Need baseline and high profiles as separate x264 test targets

The settings used for x264 were inherited from a test where the constrained baseline H.264 profile was of interest. The default x264 target should use the High profile, and baseline should be a separate target (if it is still of interest).

Clean out or ignore illegal parameters from shared repo

When running with a shared repo, stored parameters may become illegal once parameter combinations are made illegal. This causes an exception when trying to retrieve "the best config".

When retrieving "the best config" from the path, bad configs should simply be ignored rather than causing an exception.
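This sketch shows best-config retrieval that skips stored entries that no longer parse. The parse and score hooks are hypothetical parameters, not the project's API. It assumes the parser raises ValueError (or a subclass) on illegal values.

```python
def best_legal_config(stored, parse, score):
    """Returns the best-scoring stored config, ignoring ones that fail to parse.

    stored: iterable of raw config strings from the shared repo.
    parse: raises ValueError for configs with now-illegal parameters.
    score: maps a parsed config to its score (higher is better).
    """
    best, best_score = None, None
    for raw in stored:
        try:
            config = parse(raw)
        except ValueError:
            continue  # Became illegal after it was stored; skip it.
        s = score(config)
        if best_score is None or s > best_score:
            best, best_score = config, s
    return best
```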

Share results from different sites

There should be a common repository for results, which allows you to generate graphs without running everything yourself.

Check-ins should be guarded so that only improvements are entered by default, and results should be traceable.

URL of graphs should track selection

When choosing alternatives in graphs - especially sweepdata.html - the URL should be updated to follow the selection. This would allow sharing of links to specific graph displays.

Add multiple YUV filesets

There need to be parameters for switching between filesets.

Suggested design: Fileset names = directory names under "video".
Bitrates are taken from a fixed table based on width x height.

Name "local" should be reserved for non-global results (not uploaded or made consistent).
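The suggested design could be sketched as follows. It assumes clip names embed WIDTHxHEIGHT, as in "Kimono1_1920x1080_24.yuv". The bitrate table passed in is hypothetical, and any values in it would need to be chosen by the project.

```python
import os
import re

# Matches the _WIDTHxHEIGHT_ part of names like Kimono1_1920x1080_24.yuv.
_SIZE_RE = re.compile(r"_(\d+)x(\d+)_")


def list_filesets(video_dir="video"):
    """Fileset names are the directory names directly under video_dir."""
    return sorted(name for name in os.listdir(video_dir)
                  if os.path.isdir(os.path.join(video_dir, name)))


def bitrates_for(filename, bitrate_table):
    """Looks up a clip's bitrate list from the WIDTHxHEIGHT in its name.

    bitrate_table: hypothetical {(width, height): [kbps, ...]} table.
    """
    match = _SIZE_RE.search(os.path.basename(filename))
    if not match:
        raise ValueError("no WIDTHxHEIGHT in %s" % filename)
    return bitrate_table[(int(match.group(1)), int(match.group(2)))]
```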

RT mode configurations should disable multi-pass modes.

VP9 forces 1-pass internally, but all other codecs in RT mode should also ensure 1-pass.

ChangeValue should be CreateVariant

Since the parameters.ChangeValue function returns a new parameter block (and the same holds for its counterparts further up the stack), it ought to be called something that implies it doesn't change its argument. CreateVariant is one possible name; ModifiedObject would be another.

39 occurrences so far.
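A frozen dataclass makes the "returns a new block, never changes its argument" contract explicit. The Parameters class and its tuple-of-pairs representation are hypothetical, not the project's existing parameters module.

```python
import dataclasses
from typing import Tuple


@dataclasses.dataclass(frozen=True)
class Parameters:
    """Hypothetical immutable parameter block of (name, value) pairs."""
    items: Tuple[Tuple[str, str], ...] = ()

    def create_variant(self, name, value):
        """Returns a NEW block with `name` set to `value`; self is unchanged."""
        others = tuple((n, v) for n, v in self.items if n != name)
        return dataclasses.replace(self, items=others + ((name, value),))
```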

Encoder version should be stored in results

The encoder version is important if we want to generate statistics comparing one version of an encoder to another version.

This bug tracks picking the encoder version out of the compiled binary (for instance, using "x264 --version").
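Here is a minimal sketch of capturing the version string. It assumes the encoder prints its version on the first non-empty output line, as "x264 --version" does. Other encoders may need a different flag or more specific parsing.

```python
import subprocess


def encoder_version(binary, flag="--version"):
    """Returns the first non-empty line printed by `binary --version`."""
    result = subprocess.run([binary, flag], capture_output=True,
                            text=True, check=True)
    # Some tools print the version on stderr instead of stdout.
    output = result.stdout or result.stderr
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[0] if lines else ""
```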

Display of rate vs psnr should show encoding parameters and score components

A hover or popup for each point should show the command line and the resulting score components used for that particular encoding.

Permit multithread configurations

Both VP9 and x264 claim higher performance numbers when multithreading; this is important for the RT case, where encoding speed is the limiting factor.

But target systems have very different numbers of (free) cores, so results at various threading levels might be important to show, not just "max threads".

Split encoder.py

At almost a thousand lines, encoder.py is too big. Consider splitting it.

Removing the Encoding*Cache objects would take ~250 lines off the footprint.

Support 2-pass mode for codecs that have it.

Codecs that support 2-pass (vp8, vp9, x264, x265?) need to be allowed to use it in "unconstrained" mode. We should also have a separate comparison table that only shows 1-pass results.

test_BlackFrame in various codecs can be centralized

Most of the code for this should be a common utility function VerifyEncodingScoreBetterThan.
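One possible shape for the shared helper follows. The name comes from this issue, but the signature and the encode hook are assumptions. Each codec's test_BlackFrame would pass its own encode function and threshold.

```python
import unittest


def VerifyEncodingScoreBetterThan(testcase, encode, video_file, bitrate, threshold):
    """Shared assertion for test_BlackFrame-style tests in every codec.

    encode(video_file, bitrate) is a hypothetical hook that returns the
    encoding's score; the test fails unless it exceeds threshold.
    """
    score = encode(video_file, bitrate)
    testcase.assertGreater(score, threshold)
    return score
```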

Pages should offer a way to display different encoder versions' results

On a page, we should (somehow) offer the opportunity to pick the encoder version for each side of a comparison.

Remove AST parsing when AST repos are gone

The AST parsing is just for backwards compatibility, and should be removed.

Update pylint invocation in run_all_tests to 1.3 compatible

The default output format of pylint has changed and no longer shows the filename.

Use --msg-template to get it back. Instructions here: http://docs.pylint.org/output.html
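The invocation below is built in Python for illustration; run_all_tests itself may be a shell script, where the same flag applies. {path}, {line}, {msg_id}, {symbol}, {obj} and {msg} are standard pylint message-template fields. The particular template and the helper name are assumptions.

```python
# {path} restores the filename in each message line.
PYLINT_TEMPLATE = "{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}"


def pylint_command(files):
    """Builds a pylint command line whose messages include the file path."""
    return ["pylint", "--msg-template=" + PYLINT_TEMPLATE] + list(files)
```

The list can be passed directly to `subprocess.call`.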

Make Numpy warnings into errors

Numpy gives warnings about poorly conditioned polynomials when given fewer than 3 points (?).

These curves are most likely badly shaped, and need to be inspected to see whether we can generate sensible numbers from them at all.
Investigation needed.
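The standard warnings module can turn every warning into an exception inside a fit. When np.polyfit has too few points for the requested degree, it emits a RankWarning ("may be poorly conditioned"); under this filter that warning is raised instead. The cubic default degree is an assumption about how the curves are fitted.

```python
import warnings

import numpy as np


def fit_curve(rates, psnrs, degree=3):
    """Fits a polynomial to (rate, psnr) points; numpy warnings become errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        return np.polyfit(rates, psnrs, degree)
```

A caller can then catch the exception and flag the clip for inspection, instead of silently using numbers from a badly shaped curve.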

Refactoring: Concentrate env[] access to one module

Having code that accesses os.environ['CODEC_WORKDIR'] and os.environ['WORKDIR'] in many places is not good.
Add a module that centralizes access to these variables and vends the paths when needed.
Note: the module needs setters, so that tests can override the paths.
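Such a module could look like this. The variable names come from this issue; the function names and the override mechanism are hypothetical.

```python
import os

# Test overrides take precedence over the real environment.
_overrides = {}


def _lookup(name):
    if name in _overrides:
        return _overrides[name]
    if name not in os.environ:
        raise RuntimeError("environment variable %s is not set" % name)
    return os.environ[name]


def codec_workdir():
    return _lookup("CODEC_WORKDIR")


def workdir():
    return _lookup("WORKDIR")


def set_override(name, path):
    """Setter for tests, e.g. set_override("WORKDIR", some_temp_dir)."""
    _overrides[name] = path


def clear_overrides():
    _overrides.clear()
```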

The run_all_tests script doesn't pick up all tests

Due to the split between full and limited mode, run_all_tests doesn't pick up all unit tests.
In full mode, it should pick them all up.

Database should store encodes with same parameter, differing versions

There should be a place in the database to store results for the same parameter sets, but differing encoder versions.

Depends on #68

Need tool for ensuring compare_json --single_config will succeed

compare_json drops results with negative scores.
Negative scores are quite common under criterion RT.
If, for some file, every config that has been scored on all files gives that file a negative score, compare_json --single_config will fail.

There needs to be a tool for making sure there exists a config with at least one positive score for each file in the test set. This can be done manually by picking a likely config and running force_run_config, but it should be automatable.
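The detection half of such a tool could look like this. It reports the files that would make compare_json --single_config fail, and those are the files to target with force_run_config. The {config_id: {filename: score}} layout is a hypothetical stand-in for the score store.

```python
def files_needing_config(scores):
    """Lists files for which every fully scored config has a negative score.

    scores: hypothetical {config_id: {filename: score}} mapping. Only
    configs that have scored every file are considered, mirroring the
    compare_json --single_config failure described above.
    """
    all_files = set()
    for per_file in scores.values():
        all_files.update(per_file)
    complete = [per_file for per_file in scores.values()
                if set(per_file) == all_files]
    return sorted(f for f in all_files
                  if not any(per_file[f] >= 0 for per_file in complete))
```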

Video playback

We should support video playback of encoded files - preferably side-by-side.

Most likely design:
Encode the resulting YUV file with lossless x264 and place it on the server.
Keep track of the last two rate points clicked in a graph, and play them back when requested.

Consider encoding speed in comparison

It would be much better if the file-size comparison could be made at similar encoding speeds, or if the results included speed data alongside the file-size/PSNR data.
The current default setting for x264 is '-preset slow', which can give very different file-size/PSNR results from '-preset ultrafast' or '-preset superfast'.

For most codecs, the speed information can be read from the encoder's screen output; for example, x264 outputs "encoded xxx frames, xxx.xx fps, xxx.xx kb/s" at the end of encoding.
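The summary line quoted above can be parsed with a regular expression. The regex below assumes exactly that line format. The captured output (stdout or stderr, whichever the encoder writes to) is passed in as one string.

```python
import re

# Matches x264's end-of-encode summary, e.g. "encoded 60 frames, 12.50 fps, 1200.25 kb/s".
_X264_SUMMARY = re.compile(r"encoded (\d+) frames, ([\d.]+) fps, ([\d.]+) kb/s")


def parse_x264_summary(output):
    """Returns (frames, fps, kbps) from x264's summary line, or None."""
    match = _X264_SUMMARY.search(output)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), float(match.group(3))
```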

Travis build

There should be one.

install_prerequisites.sh not found

I am trying to run the scripts to compare VP9 vs x264. The file website/contributing/index.md instructs the reader to run install_prerequisites.sh. I am guessing this will also fetch the input YUV files, but I couldn't find this script in the directory.

Investigate using libyuv for psnr tools

libyuv (open source) has a PSNR measurement tool as well as an SSIM measurement tool.
Whether to use those tools instead of, or in addition to, our source-included PSNR tool needs to be investigated.
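As a reference point when comparing against libyuv's tool, here is the textbook PSNR computation: PSNR = 10 * log10(MAX^2 / MSE), with MAX = 255 for 8-bit samples. This is a whole-buffer sketch. Whether the project averages per frame or per sequence is not stated here, and that choice affects the comparison.

```python
import math


def psnr(reference, decoded, max_value=255):
    """Peak signal-to-noise ratio between two equal-length 8-bit sample buffers.

    Identical buffers have zero error and return infinity.
    """
    if len(reference) != len(decoded):
        raise ValueError("buffers differ in length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)
```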

Do sensible things for nonoverlapping PSNR ranges

At the moment, if all the PSNR values for one codec are higher than the best PSNR value for another codec, the "size AVG" calculation returns 0, and the "size DRATE" calculation returns a very large number.

These cases should be treated sensibly (either clamped to some arbitrary large/small number, or omitted from the "average improvement" calculation altogether).
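This sketch implements the "omit" option. It detects when two curves share no PSNR interval and leaves such clips out of the average. The function names and the None marker are hypothetical.

```python
def psnr_overlap(psnrs_a, psnrs_b):
    """Returns the (low, high) PSNR interval both curves cover, or None."""
    low = max(min(psnrs_a), min(psnrs_b))
    high = min(max(psnrs_a), max(psnrs_b))
    return (low, high) if low < high else None


def average_improvement(per_clip):
    """Averages per-clip improvements, skipping clips marked None."""
    valid = [v for v in per_clip if v is not None]
    return sum(valid) / len(valid) if valid else None
```

A clip whose overlap is None would contribute None to `per_clip` instead of a 0 or a very large DRATE.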

Result storage should use JSON not AST

AST parsing is an order of magnitude slower than JSON parsing, and parsing has turned out to be the heaviest part of report processing.
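A one-time migration could read each old record with ast.literal_eval and rewrite it with json.dumps, so later reads use json.loads. This assumes records are Python literals with string keys and only dicts, lists, tuples, strings and numbers (tuples come back as JSON lists). The function name is hypothetical.

```python
import ast
import json


def convert_ast_record(text):
    """Converts one stored Python-literal result into JSON text."""
    # literal_eval only accepts literals, so it is safe on stored data.
    return json.dumps(ast.literal_eval(text), sort_keys=True)
```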

Add --overshoot-pct limiter to VP9 encoder

Note (from Paul Wilkins): This vp9 control has no effect in 1-pass mode. So this only makes sense when 2-pass mode is fully supported.

Goal-seeking needs to control certain parameters of the codec.

The RT mode needs to keep the lookahead parameter at zero to meet its requirements.
This means that the goal-seeker needs to know to leave it alone. This may fit best if the ConfigurationFixups function of "Codec" takes a "mode" parameter. That requires keeping the name of the mode around, which argues for encapsulating the scoring function in an object that has both a mode name and a score function.

An alternative design is to have separate codec names for each mode, which control the underlying parameters.
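The first design could be sketched like this: a small object pairs the mode name with its score function, and the fixup consults the name. ScoringMode, configuration_fixups and the "lookahead" key are hypothetical. Each codec would map "lookahead" onto its own flag.

```python
import dataclasses
from typing import Callable, Dict


@dataclasses.dataclass(frozen=True)
class ScoringMode:
    """Bundles a mode name with the function that scores a result."""
    name: str
    score: Callable[[Dict[str, float]], float]


def configuration_fixups(config, mode):
    """Forces settings a mode requires; RT pins lookahead to zero."""
    fixed = dict(config)
    if mode.name == "rt":
        fixed["lookahead"] = 0
    return fixed
```

Because the goal-seeker always passes its configs through the fixup, it cannot drift away from the RT requirement.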

Apply Javascript conventions

The current Javascript uses UpperCamelCase for function names, while conventional Javascript style is lowerCamelCase for functions and UpperCamelCase only for constructors (~ class names).

This should be fixed.