intel / code-base-investigator

An analysis tool providing insight into the portability and maintainability of an application’s source code.

Home Page: https://intel.github.io/code-base-investigator/

License: BSD 3-Clause "New" or "Revised" License

Python 94.65% Fortran 0.59% C++ 3.95% C 0.44% Assembly 0.36%
code-quality p3 static-analysis-tools

code-base-investigator's Introduction

Code Base Investigator

Code Base Investigator (CBI) is an analysis tool that provides insight into the portability and maintainability of an application's source code.

  • Measure code divergence to understand how much code is specialized for different compilers, operating systems, hardware micro-architectures and more.

  • Visualize the distance between the code paths used to support different compilation targets.

  • Identify stale, legacy code paths that are unused by any compilation target.

  • Export metrics and code path information required for P3 analysis using other tools.

Dependencies

  • jsonschema
  • Matplotlib
  • NumPy
  • pathspec
  • Python 3
  • PyYAML
  • SciPy

Installation

The latest release of CBI is version 1.2.0. To download and install this release, run the following:

git clone --branch 1.2.0 https://github.com/intel/code-base-investigator.git
cd code-base-investigator
pip install .

We strongly recommend installing CBI within a virtual environment.

Getting Started

After installation, run codebasin -h to see a complete list of options.

A full tutorial can be found in the online documentation.

Contribute

Contributions to CBI are welcome in the form of issues and pull requests.

See CONTRIBUTING for more information.

License

BSD 3-Clause

Security

See SECURITY for more information.

The main branch of CBI is the development branch, and should not be used in production. Tagged releases are available here.

Code of Conduct

Intel has adopted the Contributor Covenant as the Code of Conduct for all of its open source projects. See CODE OF CONDUCT for more information.

Citations

If your use of CBI results in a research publication, please consider citing the software and/or the papers that inspired its functionality (as appropriate). See CITATION for more information.

code-base-investigator's People

Contributors

al42and, douglasjacobsen, itsjayway, laserkelvin, pennycook

code-base-investigator's Issues

Paths in tests should not be relative to the root

Feature/behavior summary

Multiple tests hard-code paths relative to the root, like so:

def setUp(self):
    self.rootdir = "./tests/define/"

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?

Related issues

N/A

Solution description

Paths in tests should be relative to the test:

from pathlib import Path

path = Path(__file__).parent.joinpath("file-for-testing")

Additional notes

No response

Configuration makes assumptions about compilation database contents

Expected behavior

Compilation database entries like this should be ignored, because they don't define a command:

  {
    "directory": "/opt/nbody/serial",
    "command": "",
    "file": "meson-internal__clean-ctlist",
    "output": "clean-ctlist"
  },

Compilation database entries like this should be ignored, because the input file isn't something we recognize as a source file:

  {
    "directory": "/opt/nbody/serial",
    "command": "c++  -o nbody nbody.p/meson-generated_.._nbody.cpp.o nbody.p/src_nbody_CPU_AOS.cpp.o nbody.p/src_nbody_CPU_AOS_tiled.cpp.o nbody.p/src_nbody_CPU_AVX.cpp.o nbody.p/src_nbody_CPU_AltiVec.cpp.o nbody.p/src_nbody_CPU_NEON.cpp.o nbody.p/src_nbody_CPU_SOA.cpp.o nbody.p/src_nbody_CPU_SOA_tiled.cpp.o nbody.p/src_nbody_CPU_SSE.cpp.o nbody.p/src_nbody_render_gl.cpp.o nbody.p/src_nbody_util.cpp.o -flto -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -fPIC -pthread -lm",
    "file": "nbody.p/meson-generated_.._nbody.cpp.o",
    "output": "nbody"
  },

Actual behavior

Such compilation databases currently pass validation because they match the JSON schema.

We don't currently validate that the command is non-empty, nor that the input file is a source file, so CBI crashes when trying to process these entries.
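
A minimal sketch of the kind of pre-filter this implies, assuming entries are plain dictionaries loaded from the compilation database. The function name `is_compile_entry`, the extension list, and the third sample entry are all hypothetical and are not taken from CBI:

```python
import os

# Assumed set of extensions treated as source files; CBI's real list may differ.
SOURCE_EXTENSIONS = {".c", ".cc", ".cpp", ".cxx", ".h", ".hpp", ".f", ".f90"}


def is_compile_entry(entry: dict) -> bool:
    """Return True if the entry has a command and a recognizable source file."""
    # An entry may carry either a "command" string or an "arguments" list.
    has_command = bool(entry.get("command", "").strip()) or bool(entry.get("arguments"))
    extension = os.path.splitext(entry.get("file", ""))[1].lower()
    return has_command and extension in SOURCE_EXTENSIONS


entries = [
    # Empty command: should be ignored.
    {"directory": "/opt/nbody/serial", "command": "", "file": "meson-internal__clean-ctlist"},
    # Link step whose input is an object file: should be ignored.
    {"directory": "/opt/nbody/serial", "command": "c++ -o nbody nbody.p/src_nbody_util.cpp.o",
     "file": "nbody.p/meson-generated_.._nbody.cpp.o"},
    # Hypothetical ordinary compilation: should be kept.
    {"directory": "/opt/nbody/serial", "command": "c++ -c src/nbody.cpp", "file": "src/nbody.cpp"},
]
kept = [entry for entry in entries if is_compile_entry(entry)]
```

Filtering before processing keeps the JSON schema unchanged; whether skipped entries should also produce a warning is a separate design choice.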

Steps to reproduce the problem

Include the problematic entries in any valid compilation database.

Specifications

Tested with current main.

Deduplication leads to incorrect/unexpected results

Expected behavior

When a codebase contains multiple copies of the same file, that should count as divergence. Any change to one file requires a corresponding change to the other file. If two files are 99% the same with one line changed, we treat this as 100% divergence; it's very surprising that when two files are 100% the same, the divergence disappears.

This deduplication behavior was originally introduced as a quick fix for out-of-source builds, and in hindsight I think it was the wrong fix. Although deduplication allows out-of-source builds to produce meaningful results, it breaks the conceptual model.

Actual behavior

Codebase 1

cpu/foo.cpp

void foo() {}

gpu/foo.cpp

void foo() {}

Output

------------------------
Platform Set LOC  % LOC
------------------------
  {cpu, gpu}   1 100.00
------------------------
Code Divergence: 0.00
Unused Code (%%): 0.00
Total SLOC: 1

Codebase 2

cpu/foo.cpp

// CPU version
void foo() {}

gpu/foo.cpp

// GPU version
void foo() {}

Output

-----------------------
Platform Set LOC % LOC
-----------------------
       {cpu}   1 50.00
       {gpu}   1 50.00
-----------------------
Code Divergence: 1.00
Unused Code (%%): 0.00
Total SLOC: 2

Steps to reproduce the problem

Running codebasin without any options produces the behavior as described above.

Specifications

This bug was introduced way back in 6194e55.

Validation errors should be improved

Feature/behavior summary

Functions like util._load_json and util._load_toml currently throw exceptions that contain information from jsonschema. We intercept these (typically by catching BaseException) in order to throw another exception containing details about which type of file failed validation.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?

Related issues

No response

Solution description

We should adopt a new design of these validation routines that enables loading a JSON or TOML file to directly throw an exception containing meaningful information.
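
One possible shape for such a design is sketched below, using only the standard library. The exception class `FileValidationError`, the function `load_json`, and the `required_keys` check (a stand-in for the jsonschema validation) are hypothetical names introduced for illustration:

```python
import json
from pathlib import Path


class FileValidationError(ValueError):
    """Hypothetical exception that names the kind of file that failed to load."""

    def __init__(self, kind: str, path: Path, reason: str):
        super().__init__(f"{kind} file '{path}' failed validation: {reason}")
        self.kind = kind
        self.path = path


def load_json(path: Path, kind: str, required_keys: tuple = ()) -> dict:
    """Load a JSON file, raising FileValidationError directly on any problem."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as e:
        # Catch only the failure we expect, and keep the original cause chained.
        raise FileValidationError(kind, path, f"invalid JSON ({e.msg})") from e
    # Stand-in for schema validation (jsonschema in the real code).
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise FileValidationError(kind, path, f"missing keys {missing}")
    return data
```

Because the loader knows which kind of file it was asked to read, callers would no longer need to catch BaseException and re-raise.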

Additional notes

No response

Warning for include files is unclear

Feature/behavior summary

The warning message for an include file that cannot be found is:

[WARNING ] /path/to/file:line: 'path' not found

We should change the message to make it clear that:

  • It is related to an #include directive
  • It may affect the correctness of the divergence calculation
  • It is more important to pay attention to for non-system headers

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?

Related issues

No response

Solution description

We should look at how other tools handle multi-line warnings. But something like the below might work.

For non-system headers:

[WARNING ] /path/to/file:line: #include "path"
           Header file 'path' not found. 
           Ignoring this warning is likely to impact calculations.
           Check include paths.

For system headers:

[WARNING ] /path/to/file:line: #include <path>
           System header file 'path' not found.
           Ignoring this warning is likely to be safe.
           Consider specifying additional include paths in the importcfg.

...or we could consider separate "warning" and "severe warning" categories.
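
The two proposed messages could be produced by a single helper. This is only a sketch: the function name `include_warning` and its parameters are hypothetical, and the wording is copied from the proposal above:

```python
def include_warning(filename: str, line: int, path: str, system: bool) -> str:
    """Build a multi-line warning for an #include that could not be resolved."""
    prefix = "[WARNING ] "
    indent = " " * len(prefix)  # continuation lines align under the message
    if system:
        lines = [
            f"{filename}:{line}: #include <{path}>",
            f"System header file '{path}' not found.",
            "Ignoring this warning is likely to be safe.",
            "Consider specifying additional include paths in the importcfg.",
        ]
    else:
        lines = [
            f'{filename}:{line}: #include "{path}"',
            f"Header file '{path}' not found.",
            "Ignoring this warning is likely to impact calculations.",
            "Check include paths.",
        ]
    return prefix + ("\n" + indent).join(lines)


print(include_warning("/path/to/file", 12, "util.h", system=False))
```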

Additional notes

No response

divergent-source example is incorrect

The divergent-source example currently lists the following output:

-----------------------
Platform Set LOC % LOC
-----------------------
       {CPU}  19 26.39
       {GPU}  11 15.28
  {GPU, CPU}  42 58.33
-----------------------
Code Divergence: 0.42
Unused Code (%): 0.00
Total SLOC: 72

Distance Matrix
--------------
     GPU  CPU
--------------
GPU 0.00 0.42
CPU 0.42 0.00
--------------

The correct output should be:

-----------------------
Platform Set LOC % LOC
-----------------------
          {}   1  1.39
       {GPU}  18 25.00
       {CPU}  18 25.00
  {GPU, CPU}  35 48.61
-----------------------
Code Divergence: 0.51
Unused Code (%): 1.39
Total SLOC: 72

Distance Matrix
--------------
     CPU  GPU
--------------
CPU 0.00 0.51
GPU 0.51 0.00
--------------
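
Both tables can be checked by hand. The sketch below assumes that code divergence is the average pairwise Jaccard distance between platforms, and that lines used by no platform are excluded from the calculation; the function name `code_divergence` and the dictionary layout are illustrative only:

```python
from itertools import combinations


def code_divergence(setmap: dict) -> float:
    """Average pairwise Jaccard distance between platforms.

    setmap maps a frozenset of platform names to the number of lines
    used by exactly that set of platforms.
    """
    platforms = sorted(set().union(*setmap.keys()))
    distances = []
    for a, b in combinations(platforms, 2):
        shared = sum(n for s, n in setmap.items() if a in s and b in s)
        either = sum(n for s, n in setmap.items() if a in s or b in s)
        distances.append(1.0 - shared / either if either else 0.0)
    return sum(distances) / len(distances) if distances else 0.0


listed = {frozenset({"CPU"}): 19, frozenset({"GPU"}): 11, frozenset({"GPU", "CPU"}): 42}
corrected = {
    frozenset(): 1,
    frozenset({"GPU"}): 18,
    frozenset({"CPU"}): 18,
    frozenset({"GPU", "CPU"}): 35,
}
print(f"{code_divergence(listed):.2f}")     # 0.42
print(f"{code_divergence(corrected):.2f}")  # 0.51
```

Under that assumption, each table is internally consistent: 1 - 42/72 rounds to 0.42, and 1 - 35/71 rounds to 0.51. The example's error therefore lies in the line counts rather than in the arithmetic.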

CI pipeline is incomplete

Feature/behavior summary

We have pre-commit hooks and unit tests that can be run offline, but we should also run them as part of CI to catch errors that contributors miss locally.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?

Related issues

No response

Solution description

  • Run the same formatting checks as the pre-commit hooks
  • Run the unit tests
  • Run coverage tests (after #36)

Additional notes

No response

Improve unit tests

Feature/behavior summary

While working on #87, it became apparent that several of the unit tests are actually end-to-end tests.

For example, instead of testing whether specific directives are parsed correctly, we currently test whether a file containing those directives results in an expected amount of code divergence.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?

Related issues

No response

Solution description

I think we should:

  • Identify the minimum set of functionality being tested by each unit test.
  • Rewrite the unit tests to use only the minimum set of functionality.

If there are cases where we believe things should remain an end-to-end test, it may make sense to separate those tests out.

Additional notes

No response

CBI doesn't parse command line options correctly

Expected behavior

CBI should accept command-line arguments in both the -X VAL and --X=VAL forms (-h suggests the former, but only the latter works).

$ codebasin --report=all cbi-analysis.toml

works correctly.

Actual behavior

Using the -X VAL form causes an invalid choice error.

$ codebasin -R all cbi-analysis.toml
usage: codebasin [-h] [--version] [-r <dir>] [-c <config-file>] [-v] [-q] [-R <report> [<report> ...]] [-d <file.json>] [--batchmode] [-x <pattern>]
                 [-p <platform>]
                 [<analysis-file>]
codebasin: error: argument -R/--report: invalid choice: 'cbi-analysis.toml' (choose from 'all', 'summary', 'clustering')

Steps to reproduce the problem

Run CBI while requesting a particular report type.

Specifications

MacOS 14.5, Python 3.12
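
The error message is consistent with an option declared with `nargs="+"`, which greedily consumes the analysis file as a second report name. The sketch below reproduces that behavior with a stand-in parser and shows one possible alternative (one value per `-R`, repeatable). Both parsers are assumptions for illustration and are not CBI's actual argument setup:

```python
import argparse

CHOICES = ["all", "summary", "clustering"]


def greedy_parser() -> argparse.ArgumentParser:
    """Mirror the reported usage: -R accepts one or more values."""
    parser = argparse.ArgumentParser(prog="codebasin")
    parser.add_argument("-R", "--report", nargs="+", choices=CHOICES)
    parser.add_argument("analysis_file", nargs="?")
    return parser


def single_value_parser() -> argparse.ArgumentParser:
    """One value per -R; repeat the option to request several reports."""
    parser = argparse.ArgumentParser(prog="codebasin")
    parser.add_argument("-R", "--report", action="append", choices=CHOICES)
    parser.add_argument("analysis_file", nargs="?")
    return parser


argv = ["-R", "all", "cbi-analysis.toml"]
try:
    greedy_parser().parse_args(argv)
    greedy_failed = False
except SystemExit:
    # argparse reports "invalid choice" and exits.
    greedy_failed = True

args = single_value_parser().parse_args(argv)
```

With the single-value parser, `args.report` holds only the requested report and the analysis file is left for the positional argument.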

Cannot #include files without file extensions

Code Base Investigator uses a file's extension to detect whether it should be parsed as C or Fortran, and throws an error if a file has no extension. Many of the headers in the C++ standard library have no extension, and this prevents us from parsing certain files correctly.

We should assume that files included into C/C++ files are also C/C++, and not throw an error if a file extension is missing.
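
A sketch of that fallback, assuming the language of the including file is known at the point where the #include is resolved. The function `detect_language`, its `includer_language` parameter, and the extension table are hypothetical:

```python
import os
from typing import Optional

# Hypothetical mapping; a real table of extensions would be more complete.
LANGUAGE_BY_EXTENSION = {
    ".c": "c-family",
    ".h": "c-family",
    ".cpp": "c-family",
    ".hpp": "c-family",
    ".f90": "fortran",
}


def detect_language(path: str, includer_language: Optional[str] = None) -> str:
    """Pick a language from the extension, else inherit it from the includer."""
    extension = os.path.splitext(path)[1].lower()
    if extension in LANGUAGE_BY_EXTENSION:
        return LANGUAGE_BY_EXTENSION[extension]
    if not extension and includer_language is not None:
        # e.g. <vector> included from a C++ file is treated as C/C++.
        return includer_language
    raise ValueError(f"cannot determine language of '{path}'")
```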

Hooks are stale and do not reflect latest coding standards

pylint complains that several options used by Code Base Investigator's rc file don't exist:

.pylintrc:1:0: E0015: Unrecognized option found: optimize-ast, files-output, module-name-hint, class-attribute-name-hint, class-name-hint, attr-name-hint, const-name-hint, variable-name-hint, inlinevar-name-hint, method-name-hint, function-name-hint, argument-name-hint, no-space-check (unrecognized-option)

...and many others were removed:

.pylintrc:1:0: R0022: Useless option value for '--disable', 'file-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'coerce-method' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'oct-method' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'buffer-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'map-builtin-not-iterating' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'input-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'round-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'cmp-method' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'import-star-module-level' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'standarderror-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'old-raise-syntax' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'zip-builtin-not-iterating' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'raw_input-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'using-cmp-argument' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'cmp-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'intern-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'unicode-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'reload-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'xrange-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'dict-view-method' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'coerce-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'filter-builtin-not-iterating' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'dict-iter-method' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'raising-string' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'reduce-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'delslice-method' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'print-statement' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'nonzero-method' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'hex-method' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'old-division' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'setslice-method' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'parameter-unpacking' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'unichr-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'execfile-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'range-builtin-not-iterating' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'metaclass-assignment' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'getslice-method' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'no-absolute-import' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'backtick' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'indexing-exception' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'next-method-called' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'unpacking-in-except' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'apply-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'long-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: R0022: Useless option value for '--disable', 'basestring-builtin' was removed from pylint, see https://github.com/pylint-dev/pylint/pull/4942. (useless-option-value)
.pylintrc:1:0: W0012: Unknown option value for '--disable', expected a valid pylint message and got 'old-ne-operator' (unknown-option-value)
.pylintrc:1:0: W0012: Unknown option value for '--disable', expected a valid pylint message and got 'long-suffix' (unknown-option-value)
.pylintrc:1:0: W0012: Unknown option value for '--disable', expected a valid pylint message and got 'old-octal-literal' (unknown-option-value)

When we have merged in the features currently in development and CBI is feature-complete for the next major release, we should consider modernizing our approach here. Using black & flake8 instead of autopep8 & pylint, and using Actions to enforce the style, would be more consistent with our other projects (e.g., https://github.com/intel/p3-analysis-library).

Waiting until the next release would give us a clean break between the old and new styles.

Remove deprecated features

Feature/behavior summary

1.2.0 will deprecate several features that we can remove in the next release. To simplify maintenance, we should remove them.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?

Related issues

No response

Solution description

The following features need to be removed:

  • --rootdir option
  • --config option and support for YAML configuration files
  • --dump option and JSON format for specialization trees
  • --batchmode option
  • Passing more than one argument to --report option
  • Scripts in the etc/ directory

Additional notes

Removing a user-facing deprecated feature will allow us to remove related internals, and may require us to rewrite some tests. I recommend we use a separate pull request to remove each feature, to make it easier to track related changes.

Macros defined by tools are not set

Whether all of the macros defined by a compiler appear in a compilation database is compiler-specific.

nvcc, for example, defines macros like __CUDACC__ and __CUDA_ARCH__, but these do not appear on the command line.

We need to either:

  1. Detect usage of certain tools (e.g. nvcc) and automatically define certain macros; or
  2. Provide a way for users to append to the values provided by a compilation database.
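
The two options are not mutually exclusive. The sketch below combines them, assuming the compiler is identified by the basename of the first argument. The table `IMPLICIT_DEFINES`, the function `effective_defines`, and the `extra` parameter are hypothetical, and only the `-DNAME` spelling is handled:

```python
import os

# Hypothetical table of macros that a tool defines implicitly.
# Only __CUDACC__ is listed for nvcc; a real table would need more care
# (e.g. __CUDA_ARCH__ depends on the architecture being compiled for).
IMPLICIT_DEFINES = {"nvcc": ["__CUDACC__"]}


def effective_defines(argv: list, extra: tuple = ()) -> list:
    """Combine -D flags from argv with implicit and user-supplied macros."""
    tool = os.path.basename(argv[0]) if argv else ""
    defines = [arg[2:] for arg in argv[1:] if arg.startswith("-D") and len(arg) > 2]
    defines += IMPLICIT_DEFINES.get(tool, [])  # option 1: detect the tool
    defines += list(extra)                     # option 2: user-appended values
    return defines


print(effective_defines(["/usr/local/cuda/bin/nvcc", "-DUSE_GPU", "-c", "a.cu"]))
```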

Analysis TOML file requires "codebase" section

Expected behavior

The following analysis file should be valid:

[platform.cpu]
commands = "cpu.json"

[platform.gpu]
commands = "gpu.json"

Actual behavior

We currently assume that the "codebase" section is always present:

[ERROR   ] 'codebase'

Steps to reproduce the problem

Running codebasin with the analysis file shown above will produce the error. Adding an empty "codebase" section causes things to work as expected.

Specifications

Tested with 9bfd0fb.

Increase test coverage

Our test coverage is currently only around 89%, which may be hiding some bugs.

A simple way to compute coverage statistics:

python -m coverage run -m unittest
python -m coverage report

UserWarning from scipy in single-source example

Expected behavior

Running the single-source example should not display any warnings or errors.

Actual behavior

scipy gives us a UserWarning, suggesting that we've made a mistake somewhere:

/path/to/scipy/cluster/hierarchy.py:2851: UserWarning: Attempting to set identical low and high xlims makes transformation singular; automatically expanding.
  ax.set_xlim([0, dvw])

I think this happens because in the single-source example all the code is shared.

Steps to reproduce the problem

$ cd /path/to/code-base-investigator/examples/single-source/
$ python /path/to/code-base-investigator/codebasin.py -c single-source.yaml -r .

Specifications

I don't know if this happens with earlier packages, but it affects the latest: scipy==1.12.0

Documentation needs an overhaul

The documentation and examples are out of date and incomplete.

Following what we did with the P3 Analysis Library, it would be much more user-friendly to introduce a webpage with:

  • An explanation of code divergence
  • An explanation of how Code Base Investigator works (and its limitations)
  • A gallery of simple examples, explained step-by-step
  • Links to related tools (Bear, P3 Analysis Library)

Help strings have inconsistent formatting

At the time of writing, the usage string for Code Base Investigator looks like this:

usage: codebasin.py [-h] [-r DIR] [-c FILE] [-v] [-q] [-R REPORT [REPORT ...]] [-d <file.json>] [--batchmode] [-x <pattern>]

Code Base Investigator v1.1.1

optional arguments:
  -h, --help            show this help message and exit
  -r DIR, --rootdir DIR
                        Set working root directory (default .)
  -c FILE, --config FILE
                        configuration file (default: <DIR>/config.yaml)
  -v, --verbose         increase verbosity level
  -q, --quiet           decrease verbosity level
  -R REPORT [REPORT ...], --report REPORT [REPORT ...]
                        desired output reports (default: all)
  -d <file.json>, --dump <file.json>
                        dump out annotated platform/parsing tree to <file.json>
  --batchmode           Set batch mode (additional output for bulk operation.)
  -x <pattern>, --exclude <pattern>
                        Exclude files matching this pattern from the code base. May be specified multiple times.

We should adopt a consistent style and format for these options. I know that argparse is configurable, but I have limited experience with it. Ideally, I think we should aim for something closer to:

usage: codebasin.py [-h] [-r <directory>] [-c <file>] [-v] [-q] [-R <report> [<report> ...]] [-d <file>] [--batchmode] [-x <pattern>]

Code Base Investigator v1.1.1

Options:
  -h, --help
    Show this help message and exit.
    
  -r <directory>, --rootdir <directory>
    Set working root directory.
    Default: current directory.
    
  -c <file>, --config <file>
    Specify a configuration file.
    Default: config.yaml in the root directory.

  -v, --verbose
    Increase verbosity level.
    May be specified multiple times.

  -q, --quiet
    Decrease verbosity level.
    May be specified multiple times.

  -R <report> [<report> ...], --report <report> [<report> ...]
    Enable the specified output report(s).
    Default: all
   
  -d <file>, --dump <file>
    Dump an annotated platform/parsing tree to file in JSON format.
    
  --batchmode
    Enable batch mode. Prints additional output for bulk operation.

  -x <pattern>, --exclude <pattern>
    Exclude files matching this pattern from the code base.
    May be specified multiple times.

If we wanted to be really fancy we could even try and group the options based on what they do.

Codebase specification is too hard for large projects

Accurately describing all the files in a codebase using regular expressions is very difficult for large projects.

Code Base Investigator already supports loading commands from compilation databases, but the user must still specify which files appearing in the compilation database should be considered part of the codebase.

There should be a mode that allows Code Base Investigator to discover the files in a codebase from the platform definitions.

Logging format could be improved

There are several inconsistencies in the way that we report warnings and errors. Some are prefixed with information about where (i.e., in which file) an error occurred, some are quite vague, and others are the result of uncaught exceptions. Adopting a consistent format for anything that we log as a warning or error would likely improve usability.

While we're at it, I would also like us to revisit the formatting used by the logger. The current output looks like this:

[INFO   ] An informational message
[WARNING] A warning.
[ERROR  ] Error: An error occurred.

A format like the below would be more consistent with compilers and other command line tools:

An informational message.
warning: a warning
error: an error

...especially if we could employ bold and colors to match gcc/clang.
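
A sketch of how the proposed format could be produced with the standard `logging` module. The class name `CompilerStyleFormatter` and the logger name are hypothetical, and bold and color are left out:

```python
import logging


class CompilerStyleFormatter(logging.Formatter):
    """Format records as 'warning: ...' or 'error: ...'; leave info bare."""

    def format(self, record: logging.LogRecord) -> str:
        message = record.getMessage()
        if record.levelno >= logging.WARNING:
            return f"{record.levelname.lower()}: {message}"
        return message


handler = logging.StreamHandler()
handler.setFormatter(CompilerStyleFormatter())
log = logging.getLogger("codebasin-example")
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info("An informational message.")
log.warning("a warning")
log.error("an error")
```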

Some dendrograms have no name

Expected behavior

Dendrograms and other outputs should have unique and meaningful names.

Actual behavior

When driven from a TOML analysis file, the dendrogram produced has the name -dendrogram.png. This occurs because:

  • There is no YAML configuration file for use by guess_project_name.
  • There is no explicit list of platforms on the command line.

Steps to reproduce the problem

Use any analysis TOML file.

Specifications

Tested with 9bfd0fb.

Report duplicate files

Feature/behavior summary

As detailed in #72 and #79, the handling of duplicate files is complicated.

Since a duplicate file is likely to be a symptom of a misconfigured codebase, or something else that the user may want to address, we should consider reporting the amount of duplicated code and/or identifying duplicate files in a codebase.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?

Related issues

#72

Solution description

The simplest solution here would just be to issue a warning when we encounter a duplicate file.
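
One way to detect duplicates before issuing such a warning is to group files by a content hash. This is a sketch only: the function name `find_duplicates` is hypothetical, and byte-for-byte equality is an assumption about what counts as a duplicate:

```python
import hashlib
import logging
from collections import defaultdict
from pathlib import Path

log = logging.getLogger("codebasin-example")


def find_duplicates(paths: list) -> list:
    """Group files whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        by_digest[digest].append(Path(path))
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


def warn_about_duplicates(paths: list) -> None:
    """Log one warning per group of identical files."""
    for group in find_duplicates(paths):
        names = ", ".join(str(p) for p in group)
        log.warning("duplicate files detected: %s", names)
```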

Additional notes

No response

Variable name "args" can be confused with "*args"

Several functions in CBI take a parameter called args, intended to mean the "arguments" list that appears in a compilation database entry.

This can easily be confused with *args, and should be changed to something like argv: list[str] across all impacted functions.


I think the association is that args is usually used like (*args, **kwargs) and unpacked, so I mentally assume multiple things will be passed into the function like so:

def func(*args):
    for arg in args: ...

where I can call the function like func(a, b, c, d), whereas with the current style I can only call it via func((a, b, c, d)).

Originally posted by @laserkelvin in #27 (comment)

Explore use of pathlib instead of os.path

Feature/behavior summary

pathlib offers a higher-level abstraction of paths than os.path, including a Path object. Moving to pathlib would make the code cleaner, and would make it simpler for us to expose (via type-hints and docstrings) when certain functions expect paths.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?

Related issues

N/A

Solution description

  • Migrate existing uses of os.path to pathlib
  • Update interfaces to accept Path or os.PathLike instead of str

Additional notes

If applied generally, I think this would break backwards compatibility -- some functions accepting strings should probably accept a Path (or os.PathLike).
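
A small sketch of what a migrated interface might look like. The helper `is_under` and the `PathLike` alias are hypothetical and are not existing CBI functions:

```python
import os
from pathlib import Path
from typing import Union

# Accept plain strings as well as Path objects during a transition period.
PathLike = Union[str, os.PathLike]


def is_under(path: PathLike, root: PathLike) -> bool:
    """Return True if `path` lies inside `root`."""
    path, root = Path(path).resolve(), Path(root).resolve()
    return root == path or root in path.parents


print(is_under("/opt/nbody/src/nbody.cpp", Path("/opt/nbody")))
```

Accepting both `str` and `os.PathLike` at the boundary and converting to `Path` immediately is one way to limit the compatibility break mentioned above.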

Track all source files in the source directory

Feature/behavior summary

With the modern interface, only files that are explicitly listed in a compilation database are known to CBI. Consequently, source files that are completely unused do not show up in the "Unused" category.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?

Related issues

No response

Solution description

The solution is to walk the contents of the source directory and keep track of all files that are recognized as source (based on their extension). It probably makes sense to perform this as a separate step, since we may want to suppress warnings relating to such files -- if they're normally unused, they might not actually be valid.
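
A sketch of that separate step, assuming the set of files named in the compilation database is already available. The function name `find_unlisted_sources` and the extension list are hypothetical:

```python
from pathlib import Path

# Assumed extension list, for illustration only.
SOURCE_EXTENSIONS = {".c", ".cpp", ".h", ".hpp", ".f90"}


def find_unlisted_sources(source_dir: Path, listed: set) -> list:
    """Return source files under source_dir that are absent from `listed`."""
    known = {Path(p).resolve() for p in listed}
    found = (p.resolve() for p in source_dir.rglob("*") if p.is_file())
    return sorted(
        p for p in found
        if p.suffix.lower() in SOURCE_EXTENSIONS and p not in known
    )
```

Files returned by this step could be counted as unused without being parsed, which would avoid warnings from files that may no longer be valid.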

Additional notes

No response
