microsoft / cmd-call-graph

A simple tool to generate a call graph for calls within Windows CMD (batch) files.

License: MIT License

Python 97.78% Shell 2.22%

batch-script batch-file static-code-analysis call-graph call-graph-analysis python

cmd-call-graph's Introduction

cmd-call-graph

The tool is available on PyPI: https://pypi.org/project/cmd-call-graph/

By default, it reads the input script from stdin, writes the resulting DOT graph to stdout, and writes logs and errors to stderr.

Output Examples

Given the following CMD script:

@echo off
call :foo
goto :eof
:bar
    echo "in bar"
    call :baz
    call :baz
:baz
    echo "in baz"
    call powershell.exe Write-Host "Hello World from PowerShell"

:foo
    echo "In foo"
    goto :bar

This script would generate the following graph:

If the --hide-node-stats option is enabled, then the following graph would be generated:

Invocation Examples

Invocation example for Ubuntu Linux and WSL (Windows Subsystem for Linux), assuming Python and pip are installed:

$ pip install cmd-call-graph
$ cmd-call-graph < your-file.cmd > your-file-call-graph.dot 2>log

The resulting DOT file can be rendered with any DOT renderer. Example with Graphviz, where $VIEWER is an image viewer of your choice (under Windows, it could be explorer.exe):

$ sudo apt install graphviz
$ dot -Tpng your-file-call-graph.dot > your-file-call-graph.png
$ $VIEWER your-file-call-graph.png

Example with PowerShell:

PS C:\> choco install graphviz python3 pip
PS C:\> pip install cmd-call-graph
PS C:\> cmd-call-graph.exe -i your-file.cmd -o your-file-call-graph.dot
PS C:\> dot.exe -Tpng your-file-call-graph.dot -O
PS C:\> explorer.exe your-file-call-graph.dot.png

Types of entities represented

The tool analyzes CMD scripts and represents each block of code under a given label as a node in the call graph.
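A minimal Python sketch of this splitting step follows. It assumes a label is any line whose first non-blank character is a single colon (lines starting with :: are treated as comments) and lower-cases label names, since CMD labels are case-insensitive. The function split_into_blocks is hypothetical, not the tool's API.

```python
from typing import Dict, List, Tuple


def split_into_blocks(script: str) -> Dict[str, Tuple[int, List[str]]]:
    """Map each label to (line where the block starts, stripped lines of the block).

    Assumptions: a label is a line whose first non-blank character is ':'
    (but not '::', a common comment idiom); label names are lower-cased
    because CMD labels are case-insensitive. Code before the first label
    is kept under the "__begin__" pseudo-node, which is dropped when the
    script starts directly with a label.
    """
    blocks: Dict[str, Tuple[int, List[str]]] = {}
    current, start, body = "__begin__", 1, []
    for lineno, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        is_label = line.startswith(":") and not line.startswith("::") and bool(line[1:].strip())
        if is_label:
            # Close the previous block, skipping a __begin__ block with no code.
            if current != "__begin__" or any(body):
                blocks[current] = (start, body)
            current, start, body = line[1:].split()[0].lower(), lineno, []
        else:
            body.append(line)
    blocks[current] = (start, body)
    return blocks
```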

Node properties

Each node shows the line number where it starts, unless the node is never defined in the code. This can happen because of programming errors, with dynamic node names (e.g., %command%), and for the eof pseudo-node.

If a node causes the program to exit, it is marked as terminating.

Each node contains the following extra stats, if present:

number of lines of code (LOC);
number of external calls.
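One way these two stats could be computed for a node's lines is sketched below. What counts as a line of code or an external call is an assumption here (blank lines and REM or :: comments are skipped; a call whose target lacks a leading colon is external), and node_stats is a hypothetical name.

```python
from typing import Iterable, Tuple


def node_stats(body: Iterable[str]) -> Tuple[int, int]:
    """Return (lines of code, external calls) for the lines of one node.

    Assumptions: blank lines and REM or '::' comments are not counted as
    code, and an external call is a 'call' whose target does not start
    with ':' (for example 'call powershell.exe ...' or 'call other.cmd').
    """
    loc = 0
    external_calls = 0
    for raw in body:
        line = raw.strip()
        lowered = line.lower()
        if not line or lowered == "rem" or lowered.startswith(("rem ", "::")):
            continue
        loc += 1
        words = lowered.split()
        if words[0] == "call" and len(words) > 1 and not words[1].startswith(":"):
            external_calls += 1
    return loc, external_calls
```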

Special nodes

There are two special nodes:

__begin__ is a pseudo-node inserted at the start of each call graph. It represents the start of the script, which by definition has no label;
eof, which may or may not be a pseudo-node. In CMD, eof is a special label used as the target of goto to indicate that the current "subroutine" should terminate, or that the whole program should terminate if the call stack is empty.

The eof node is automatically removed if it's a pseudo-node and it's not reached via call or nested connections.

The __begin__ pseudo-node is removed if there is another node starting at line 1.

Types of connections

goto: if an edge of type goto goes from A to B, the code under label A contains an instruction of the form goto :B.
call: if an edge of type call goes from A to B, the code under label A contains an instruction of the form call :B.
nested: if an edge of type nested goes from A to B, the code under label A ends directly where B starts, and there is no goto or exit statement at the end of A, so execution falls through into B when A ends.

Example of a nested connection:

:A
  echo "foo"
  echo "bar"
:B
  echo "baz"

The above code would lead to a nested connection between A and B.
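The three connection types can be sketched as a per-node pass over its lines. This is a simplified illustration, not the tool's implementation: edges_for_node is hypothetical, and it only recognizes goto and call at the start of a line, so if ... goto or &-chained commands are missed.

```python
import re
from typing import List, Optional, Tuple

# goto may omit the colon before the label; an internal call needs it.
GOTO = re.compile(r"^goto\s+:?([^\s()&|<>]+)", re.IGNORECASE)
CALL = re.compile(r"^call\s+:([^\s()&|<>]+)", re.IGNORECASE)


def edges_for_node(
    name: str, body: List[str], next_node: Optional[str]
) -> List[Tuple[str, str, str]]:
    """Return (source, target, kind) edges for one node.

    next_node is the label that starts right after this block, or None
    for the last block of the script.
    """
    edges: List[Tuple[str, str, str]] = []
    last_word = ""
    for raw in body:
        line = raw.strip()
        if not line:
            continue
        last_word = line.split()[0].lower()
        goto = GOTO.match(line)
        call = CALL.match(line)
        if goto:
            edges.append((name, goto.group(1).lower(), "goto"))
        elif call:
            edges.append((name, call.group(1).lower(), "call"))
    # Nested: A ends where B starts and nothing stops execution at the end of A.
    if next_node is not None and last_word not in ("goto", "exit"):
        edges.append((name, next_node, "nested"))
    return edges
```

With this rule, a block that ends with goto :foo and is immediately followed by :foo yields only a goto edge, with no nested edge.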

Command-line options

The input file needs to be passed as an argument.

--simplify-calls: create one edge for each type of connection instead of one for each individual call/goto (the default). This leads to a simpler but less accurate graph;
--hide-node-stats: remove the additional information shown in each node (i.e., number of lines of code, number of external calls);
--nodes-to-hide: hide the nodes passed as a space-separated list after this parameter;
-v or --verbose: enable debug output, which will be sent to the log file;
-l or --log-file: name of the log file. If not specified, standard error is used;
-o or --output: name of the output file. If not specified, standard output is used.
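The difference between the default output and --simplify-calls can be sketched as a de-duplication of (source, target, kind) triples. The count of merged edges is an illustrative extra; whether the tool displays it is not stated here, and simplify_edges is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (source, target, kind)


def simplify_edges(edges: Iterable[Edge]) -> List[Tuple[str, str, str, int]]:
    """Keep one edge per (source, target, kind), counting the merged duplicates.

    Counter remembers first-seen order, so the output order stays stable.
    """
    counts = Counter(edges)
    return [(src, dst, kind, n) for (src, dst, kind), n in counts.items()]
```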

Legend for Output Graphs

The graphs are self-explanatory: all information is encoded in descriptive labels, and no information is conveyed only through color or other non-text graphical hints. Colors are used only to make the graph easier to follow.

Here is what each color means in the graph:

Orange: goto connection;
Blue: call connection;
Teal: nested connection;
Light gray: background for terminating nodes.
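A minimal sketch of how such a legend might map onto DOT attributes. The hex values standing in for orange, blue, teal and light gray are assumptions, and to_dot is a hypothetical helper. Each color is paired with a text label, matching the rule that no information is conveyed only with color.

```python
from typing import Iterable, Set, Tuple

# Assumed hex values for the legend colors; the exact shades may differ.
EDGE_COLORS = {"goto": "#ffa500", "call": "#0000ff", "nested": "#008080"}  # orange, blue, teal
TERMINATING_FILL = "#d3d3d3"  # light gray


def to_dot(
    nodes: Iterable[str],
    edges: Iterable[Tuple[str, str, str]],
    terminating: Set[str],
) -> str:
    """Render nodes and (source, target, kind) edges as a DOT digraph.

    Assumes node names contain no double quotes.
    """
    lines = ["digraph g {"]
    for node in nodes:
        if node in terminating:
            # The "(terminating)" text repeats what the gray fill shows.
            attrs = f'label="{node} (terminating)", style=filled, fillcolor="{TERMINATING_FILL}"'
        else:
            attrs = f'label="{node}"'
        lines.append(f'  "{node}" [{attrs}]')
    for src, dst, kind in edges:
        # The edge label carries the same information as the edge color.
        lines.append(f'  "{src}" -> "{dst}" [label={kind}, color="{EDGE_COLORS[kind]}"]')
    lines.append("}")
    return "\n".join(lines)
```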

Why?

Legacy code bases sometimes contain old CMD files. This tool generates a visual representation of the internal calls within such a script.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Unit tests

Run unit tests from the project root either with the built-in unittest module:

python -m unittest discover

Or with pytest, which can produce reports for both test results and code coverage, using the following invocation:

pip install pytest
pip install pytest-cov
pytest tests --doctest-modules --junitxml=junit/test-results.xml --cov=callgraph --cov-report=xml --cov-report=html

cmd-call-graph's Issues

Track the terminating vs. non-terminating code paths for a given node

A node can be both terminating and non-terminating, depending on how it's invoked.

If it's reached via goto or nested from a node that would have been terminating, it's terminating for that code path. But if it's reached via call from a non-terminating node, it's not terminating.

It would be nice to surface this information somehow. The easiest option might be a warning, not sure how to actually represent this case in the call graph.

Publish test and coverage results to Azure DevOps

Might require migrating tests to pytest.

Create a ChangeLog

The Keep a ChangeLog format has worked well for me in the past, let's adopt it.

Add a lint mode

The tool can identify some flaws in batch files, for example goto / call to non-existing labels.

Implement a --lint option that only outputs potential problems with a given script.
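A possible core for such a lint mode, sketched with a naive regular-expression scan that can be fooled, for example, by goto appearing inside echo text. find_missing_labels is a hypothetical name. eof is skipped because CMD defines it implicitly, and names containing % are skipped because they are resolved only at run time.

```python
import re
from typing import List, Set, Tuple

LABEL = re.compile(r"^\s*:([^\s:]\S*)")
# goto may omit the colon; call only targets a label when the colon is present.
JUMP = re.compile(r"\b(?:goto\s+:?|call\s+:)([^\s()&|<>]+)", re.IGNORECASE)


def find_missing_labels(script: str) -> List[Tuple[int, str]]:
    """Return (line number, target) for each goto/call to an undefined label."""
    lines = script.splitlines()
    defined: Set[str] = set()
    for line in lines:
        match = LABEL.match(line)
        if match:
            defined.add(match.group(1).lower())
    problems: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        for match in JUMP.finditer(line):
            target = match.group(1).lower()
            # :eof is built into CMD; %names% are dynamic and cannot be checked statically.
            if target == "eof" or "%" in target:
                continue
            if target not in defined:
                problems.append((lineno, target))
    return problems
```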

Add a PowerShell usage example

Migrate to Python 3

Python 2's EOL is at the end of 2019 (see https://legacy.python.org/dev/peps/pep-0373/). It does not make sense to keep supporting both Python 2 and 3: it would instead make sense to migrate the codebase to Python 3 and fully use the new features that recent versions of Python offer.

Migrate CI to Azure Pipelines

It would be nice to run automated tests on Azure Pipelines instead of Travis.

Once that works, we could even automate publishing the build artifacts to PyPI.

Fix treating the last block as exit node if it's not

The script currently considers the last node as a terminating node, no matter what.

In reality, the last node could contain, as a last statement, a goto which makes it not a terminating node.

Add option to display some statistics for each node

It would be nice to print:

number of lines of code in the node;
number of external calls in a node.

Add unit tests

The script was published "as-is", but for any further change, and to validate existing behavior, we really need to add some basic tests.

The right parenthesis that signals the end of a codeblock is considered a part of the label

Example: when a batch file contains a line like this:

if foo==%var% (foo & goto :foo) ELSE (bar & goto :bar)

cmd-call-graph considers foo) and bar) as labels.
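Splitting on whitespace alone yields foo) here. A sketch of one possible fix is to end a label at the first character that cannot belong to it. The exact set of delimiters excluded below is an assumption, and goto_targets is hypothetical.

```python
import re
from typing import List

# A label ends at whitespace or at a CMD delimiter such as ')', '&', '|', '<' or '>'.
GOTO_TARGET = re.compile(r"\bgoto\s+:?([^\s()&|<>]+)", re.IGNORECASE)


def goto_targets(line: str) -> List[str]:
    """Return every goto target on a line, including inside parenthesized blocks."""
    return GOTO_TARGET.findall(line)
```

For the line above, goto_targets returns ["foo", "bar"], whereas taking the first whitespace-separated word after "goto :" gives "foo)".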

Update description replacing files with file

The description and the README say:

A simple tool to generate a call graph for Windows CMD (batch) files.

IMHO, it should read "for Windows CMD (batch) file."

The word "files" suggests that one can generate a graph for batch scripts CALL-ing other batch scripts, which, AFAICT, is not the case. A call graph can be generated for a single batch file only.

Add option to enable/disable verbose output

My god this is amazing !!!111

It borks when trying to parse https://github.com/nodejs/node/blob/master/vcbuild.bat, but that's not surprising.
I'll try to submit a PR later.

Design a better color scheme for the output

The color scheme today is rather primitive. It would be nice to have a better color scheme, with subtler and more thought-out colors.

Ideally, the output should not rely on colors only to convey information, to be more accessible (see https://usabilla.com/blog/how-to-design-for-color-blindness/).

Add options to choose the input file, output file and log file

Execute a full end-to-end test in Travis

It would be nice to run dot on Travis on the output of the tool, to verify that the dot code that it produces is readable by the dot tool.

Fix handling of nodes with %

If a node has a name with percent signs in it, for example %command%, the dot program doesn't seem to like it, and outputs a garbage node with just a percent sign and a letter or number.

Figure out why that's the case, and fix it.
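One plausible cause: in the DOT language an unquoted ID may contain only letters, digits and underscores (and must not start with a digit), and the keywords node, edge, graph, digraph, subgraph and strict are reserved, so %command% is not a valid bare ID. Quoting such names, as in the hypothetical dot_id helper below, should avoid the problem. That this fully explains the garbled node is an assumption.

```python
import re

# Unquoted DOT IDs: letters, digits and underscores, not starting with a digit.
PLAIN_ID = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# DOT keywords are reserved case-insensitively and must be quoted to be used as IDs.
DOT_KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}


def dot_id(name: str) -> str:
    """Return name as a valid DOT ID, quoting it when needed (e.g. %command%)."""
    if PLAIN_ID.match(name) and name.lower() not in DOT_KEYWORDS:
        return name
    # Inside a quoted DOT string, only the double quote needs escaping.
    return '"' + name.replace('"', '\\"') + '"'
```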

Add version number to --help output

Improve command-line options

All the options around getting a richer output should be enabled by default. That's how I mostly use the tool and there is no reason to disable them unless specifically desired by the user.

Also I think using stdin as input can be confusing (the tool just hangs if no input is given), so I'd change -i to be mandatory and possibly even positional instead of an argument.

Automate pushing releases to PyPI

Documentation: https://docs.microsoft.com/en-us/azure/devops/pipelines/artifacts/pypi?view=vsts&tabs=yaml

define what triggers publishing to PyPI
define how to handle version numbers
define how to handle the changelog
handle authentication (https://docs.microsoft.com/en-us/azure/devops/pipelines/tasks/package/twine-authenticate?view=vsts)
implement pushing to PyPI

Improve detection of exit nodes

Some exit nodes are not detected by the current logic, which is rather primitive and looks at where the script ends, rather than exploiting the call graph.

For example, the script wouldn't recognize the bar node as the exit node in the following code:

goto :foo
:bar
exit /b 1
:foo
goto bar
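A sketch of deciding termination from how each node's own code ends, rather than from its position in the file. It assumes each node has already been summarized by its last statement (the Ending tuples and terminating_nodes are hypothetical), and it treats exit /b and goto :eof as terminating, which only holds when the node is not running as the target of a call.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical summary of how each node's code ends:
#   ("exit", None)    last statement is exit (with or without /b)
#   ("goto", "label") last statement is goto :label
#   ("fall", "label") no jump: execution falls into the next label
#   ("fall", None)    no jump, and this is the last block of the script
Ending = Tuple[str, Optional[str]]


def terminating_nodes(endings: Dict[str, Ending]) -> Set[str]:
    """Mark nodes by how their own code ends, not by their position in the file."""
    return {
        node
        for node, (kind, target) in endings.items()
        if kind == "exit"
        or (kind == "goto" and target == "eof")
        or (kind == "fall" and target is None)
    }
```

For the example above, this marks only bar, even though foo is the last block.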

Follow calls to external batch scripts

We could extend the tool to follow calls to external batch scripts, and produce a larger call graph including a set of scripts.

There is already some primitive logic to identify external calls in CallGraph._AnnotateNode, as it generates Command instances of type external_call, but those commands are not processed in the later loop that goes through all Command instances. Therefore the logic to process external calls can be added to that loop.

What we can do there is simply call CallGraph.Build again to recursively generate another instance of CallGraph, that we would need to add to a container in the original CallGraph instance itself, keeping track of this new type of connection.

There are a few open questions that come to mind:

we need to handle recursion and not fall into the trap of infinite loops (for example, by specifying a depth parameter to limit how deep we go in the chain of calls).
we need to de-duplicate call graphs for external batch files, so that if we call a given file multiple times we point to the same sub-graph.
we need to gracefully handle calls to non-batch files or files we don't want to expand (for example, if we reached the maximum depth), e.g. by adding a single node for the file when we can't or don't want to expand it.
we need to decide how to visually represent the different files. For example, we could have rectangular enclosures around each batch file involved in the graph.
we may want to limit the set of files to expand (for example, only the ones belonging to a specific codebase). This might be achieved by specifying a set of files to expand as a command-line parameter.
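A sketch of the recursion with a depth limit and de-duplication, assuming a helper (here the hypothetical external_calls callback) already returns the batch files a script invokes via call. It builds only the file-level graph, not the per-label graphs that CallGraph.Build would produce for each file, and expand_external_calls is a hypothetical name.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


def expand_external_calls(
    root: str,
    external_calls: Callable[[str], List[str]],
    max_depth: int = 3,
) -> Dict[str, List[str]]:
    """Map each expanded batch file to the batch files it calls.

    Breadth-first order reaches every file at its shallowest depth. Each
    file is expanded at most once, which both de-duplicates sub-graphs and
    stops infinite loops on recursive calls. Files deeper than max_depth
    appear only as callees, i.e. as leaf nodes that are not expanded.
    """
    graph: Dict[str, List[str]] = {}
    queue: Deque[Tuple[str, int]] = deque([(root, 0)])
    while queue:
        path, depth = queue.popleft()
        if path in graph or depth > max_depth:
            continue
        graph[path] = external_calls(path)
        for callee in graph[path]:
            queue.append((callee, depth + 1))
    return graph
```

Filtering out non-batch targets could be left to the external_calls helper.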

Fix the build

Unit tests are failing, but the recent changes only touch README.md.

The error is the following:

2019-01-06T17:35:53.2926467Z =================================== FAILURES ===================================
2019-01-06T17:35:53.2930569Z ______________________ CodeLineTest.test_command_counters ______________________
2019-01-06T17:35:53.2933899Z 
2019-01-06T17:35:53.2940444Z self = <pytest_cov.plugin.CovPlugin object at 0x7f336dc97cd0>
2019-01-06T17:35:53.2950537Z item = <TestCaseFunction test_command_counters>
2019-01-06T17:35:53.2957731Z 
2019-01-06T17:35:53.2958299Z     @compat.hookwrapper
2019-01-06T17:35:53.2973012Z     def pytest_runtest_call(self, item):
2019-01-06T17:35:53.2989594Z >       if (item.get_marker('no_cover')
2019-01-06T17:35:53.2990087Z                 or 'no_cover' in getattr(item, 'fixturenames', ())):
2019-01-06T17:35:53.2990448Z E               AttributeError: 'TestCaseFunction' object has no attribute 'get_marker'

This seems to be related to a very recent change in pytest. From https://docs.pytest.org/en/latest/changelog.html#pytest-4-1-0-2019-01-05:

#4546: Remove Node.get_marker(name) the return value was not usable for more than a existence check. - pytest-dev/pytest#4546

Do not require the value True to be passed in existing options

Instead of --show-node-stats=True, it would be nicer to just type --show-node-stats.
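If the options are parsed with Python's standard argparse module (an assumption), a flag that needs no value is declared with action="store_true". A minimal sketch, not the tool's actual parser (build_parser is hypothetical):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cmd-call-graph")
    # store_true turns the option into a switch: present -> True, absent -> False.
    parser.add_argument(
        "--show-node-stats",
        action="store_true",
        help="show extra statistics in each node",
    )
    return parser
```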

Keep track of how many terminating statements are in a node

It might be useful to know how many terminating statements exist in a given terminating node.

This repo is missing important files

There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the PR is merged, this issue will be closed automatically.

Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

Merge this pull request

    goto :foo
    :foo
    exit

This snippet creates the following dot file:

digraph g {
__begin__ [label=<<b>__begin__</b><br />(line 1)>]
__begin__ -> foo [label=goto,color=red3]
__begin__ -> foo [label=nested,color=blue3]
foo [label=<<b>foo</b><br />(line 2)>,color=red,penwidth=2]
}

The connection of type nested is wrong, and should not be reported.

Make the ordering of the resulting graph deterministic

The rendered graphs often have different shapes. It would be nice to change the resulting dot code to always use the same ordering, if possible.

Also, it would be nice to have __begin__ at the start and the terminating nodes at the end.
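A sketch of a stable ordering, assuming nodes and (source, target, kind) edges are available as plain Python values. ordered_graph is hypothetical. It only fixes the order of the emitted DOT statements, and Graphviz may still lay out graphs differently across versions.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, target, kind)


def ordered_graph(
    nodes: Iterable[str], edges: Iterable[Edge], terminating: Set[str]
) -> Tuple[List[str], List[Edge]]:
    """Sort nodes and edges into a stable order before emitting DOT code."""

    def node_key(name: str) -> Tuple[int, str]:
        # __begin__ first, terminating nodes last, everything else in between;
        # ties are broken alphabetically so the result never depends on input order.
        rank = 0 if name == "__begin__" else 2 if name in terminating else 1
        return rank, name

    ordered_nodes = sorted(set(nodes), key=node_key)
    ordered_edges = sorted(edges, key=lambda e: (node_key(e[0]), node_key(e[1]), e[2]))
    return ordered_nodes, ordered_edges
```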