cmd-call-graph's Introduction

cmd-call-graph

A simple tool to generate a call graph for calls within Windows CMD (batch) files.

The tool is available on PyPI: https://pypi.org/project/cmd-call-graph/

By default, it reads the input file from stdin and writes the result to stdout, sending logs and errors to stderr.

Output Examples

Given the following CMD script:

@echo off
call :foo
goto :eof
:bar
    echo "in bar"
    call :baz
    call :baz
:baz
    echo "in baz"
    call powershell.exe Write-Host "Hello World from PowerShell"

:foo
    echo "In foo"
    goto :bar

This script would generate the following graph:

(Image: call graph of the script above)

If the --hide-node-stats option is enabled, then the following graph would be generated:

(Image: call graph of the script above, generated with --hide-node-stats)

Invocation Examples

Invocation example for Ubuntu Linux and WSL (Windows Subsystem for Linux), assuming Python and pip are installed:

$ pip install cmd-call-graph
$ cmd-call-graph < your-file.cmd > your-file-call-graph.dot 2>log

The resulting dot file can be rendered with any dot renderer. Example with Graphviz (VIEWER could be explorer.exe under Windows):

$ sudo apt install graphviz
$ dot -Tpng your-file-call-graph.dot > your-file-call-graph.png
$ $VIEWER your-file-call-graph.png

Example with PowerShell:

PS C:\> choco install graphviz python3 pip
PS C:\> cmd-call-graph.exe -i your-file.cmd -o your-file-call-graph.dot
PS C:\> dot.exe -Tpng your-file-call-graph.dot -O
PS C:\> explorer.exe your-file-call-graph.dot.png

Types of entities represented

The tool analyzes CMD scripts and represents each block of text under a given label as a node in the call graph.
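A minimal Python sketch of this splitting step is shown below. `Block`, `split_into_blocks` and the label regular expression are hypothetical, not the tool's actual code. It assumes a label is any line whose first non-blank character is a single colon (so `::` comments are skipped), that labels compare case-insensitively as in CMD, and that lines before the first label belong to the `__begin__` pseudo-node. LOC here means non-blank lines after the label line.

```python
import re
from dataclasses import dataclass, field

# A label line such as ":foo" (leading blanks allowed); "::" comment lines do not match.
LABEL_RE = re.compile(r"^\s*:(?!:)(\S+)")


@dataclass
class Block:
    name: str
    start_line: int  # 1-based line number of the label (1 for __begin__)
    lines: list = field(default_factory=list)

    @property
    def loc(self) -> int:
        # Non-blank lines in the block, not counting the label line itself.
        return sum(1 for line in self.lines if line.strip())


def split_into_blocks(script: str) -> list:
    """Split a CMD script into one Block per label (hypothetical helper)."""
    blocks = []
    current = Block("__begin__", 1)
    for number, line in enumerate(script.splitlines(), start=1):
        match = LABEL_RE.match(line)
        if match:
            # Keep __begin__ only if at least one line precedes the first label.
            if current.name != "__begin__" or number > 1:
                blocks.append(current)
            current = Block(match.group(1).lower(), number)
        else:
            current.lines.append(line)
    blocks.append(current)
    return blocks
```

For the example script under Output Examples, this yields `__begin__` (line 1, 3 LOC), `bar` (line 4, 3 LOC), `baz` (line 8, 2 LOC) and `foo` (line 12, 2 LOC).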

Node properties

Each node always shows the line number where it starts, unless the node is never defined in the code. This can happen because of programming errors, dynamic node names (e.g., %command%), and the eof pseudo-node.

If a node causes the program to exit, it is marked as terminating.

Each node contains the following extra stats, if present:

  • number of lines of code (LOC);
  • number of external calls.

Special nodes

There are 2 special nodes:

  • __begin__ is a pseudo-node inserted at the start of each call graph. It represents the start of the script, which by definition has no label;
  • eof, which may or may not be a pseudo-node. In CMD, eof is a special label used as the target of goto to indicate that the current "subroutine" should terminate, or that the whole program should terminate if the call stack is empty.

The eof node is automatically removed if it's a pseudo-node and it's not reached via call or nested connections.

The __begin__ pseudo-node is removed if there is another node starting at line 1.
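The two removal rules could be written as follows. This sketch assumes nodes are kept in a dict mapping each name to its start line (`None` for nodes never defined in the code) and edges are `(source, target, kind)` tuples; `prune_special_nodes` is a hypothetical name, and dropping edges that touch a removed node is an added assumption.

```python
def prune_special_nodes(nodes: dict, edges: list):
    """Apply the eof and __begin__ removal rules (hypothetical helper).

    nodes: name -> start line, or None if the node is never defined in the code.
    edges: (source, target, kind) tuples, kind being "goto", "call" or "nested".
    """
    kept = dict(nodes)
    # eof goes away if it is a pseudo-node and no call/nested edge reaches it;
    # "goto :eof" alone does not keep it.
    eof_reached = any(dst == "eof" and kind in ("call", "nested") for _, dst, kind in edges)
    if "eof" in kept and kept["eof"] is None and not eof_reached:
        del kept["eof"]
    # __begin__ goes away if a real node already starts at line 1.
    if any(name != "__begin__" and line == 1 for name, line in kept.items()):
        kept.pop("__begin__", None)
    # Drop edges that touch a removed node.
    kept_edges = [(s, d, k) for s, d, k in edges if s in kept and d in kept]
    return kept, kept_edges
```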

Types of connections

  • goto: if an edge of type goto goes from A to B, it means that in the code within the label A there is an instruction in the form goto :B.
  • call: if an edge of type call goes from A to B, it means that in the code within the label A there is an instruction in the form call :B.
  • nested: if an edge of type nested goes from A to B, it means that the code within label A ends directly where B starts, and there is no goto or exit statement at the end of A that would prevent execution from continuing into B when A ends.
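Detecting goto and call edges within one block could look like the hypothetical sketch below. It only recognizes statements at the start of a line (so `if ... goto` is missed), treats `call` without a colon as an external call rather than an edge, and keeps one edge per statement, in line with the default behavior described under --simplify-calls.

```python
import re

# Statements at the start of a line; CMD keywords and labels are case-insensitive.
GOTO_RE = re.compile(r"^\s*goto(?:\s+:?|\s*:)(\S+)", re.IGNORECASE)
CALL_RE = re.compile(r"^\s*call\s*:(\S+)", re.IGNORECASE)


def find_edges(block_name: str, lines: list) -> list:
    """Return one (source, target, kind) tuple per goto/call statement in a block."""
    edges = []
    for line in lines:
        if m := GOTO_RE.match(line):
            edges.append((block_name, m.group(1).lower(), "goto"))
        elif m := CALL_RE.match(line):
            edges.append((block_name, m.group(1).lower(), "call"))
        # "call" without a colon (e.g. "call powershell.exe ...") is an external call.
    return edges
```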

Example of a nested connection:

:A
  echo "foo"
  echo "bar"
:B
  echo "baz"

The above code would lead to a nested connection between A and B.
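A minimal sketch of the nested rule, assuming blocks come as ordered `(name, lines)` pairs: an edge from each block to the next is added unless the block's last non-blank line is an unconditional `goto` or `exit`. The helper name and regular expression are illustrative only; a conditional such as `if errorlevel 1 goto :x` is deliberately treated as falling through.

```python
import re

# An unconditional goto or exit at the start of a line stops fall-through.
STOP_RE = re.compile(r"^\s*(goto|exit)\b", re.IGNORECASE)


def nested_edges(blocks: list) -> list:
    """blocks: (name, lines) pairs in file order; returns (A, B, "nested") tuples."""
    edges = []
    for (name, lines), (next_name, _) in zip(blocks, blocks[1:]):
        code = [line for line in lines if line.strip()]
        # An empty block, or one not ending in goto/exit, falls through into the next.
        if not code or not STOP_RE.match(code[-1]):
            edges.append((name, next_name, "nested"))
    return edges
```

Under this rule, a block whose last statement is `goto :foo` produces no nested edge, even when :foo is the very next label.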

Command-line options

The input file needs to be passed as an argument.

  • --simplify-calls: create one edge for each type of connection instead of creating one for each individual call/goto (which is the default). Leads to a simpler but less accurate graph;
  • --hide-node-stats: removes from each node additional information about itself (i.e., number of lines of code, number of external calls);
  • --nodes-to-hide: hides the nodes passed as a space-separated list after this parameter;
  • -v or --verbose: enable debug output, which will be sent to the log file;
  • -l or --log-file: name of the log file. If not specified, the standard error file is used;
  • -o or --output: name of the output file. If not specified, the standard output file is used.

Legend for Output Graphs

The graphs are self-explanatory: all information is encoded in descriptive labels. Colors are used to make the graph easier to follow, but no information is conveyed only through color or any other non-text graphical hint.

Here is what each color means in the graph:

  • Orange: goto connection;
  • Blue: call connection;
  • Teal: nested connection;
  • Light gray: background for terminating nodes.
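The legend could map onto dot attributes roughly as below. This is a sketch only: the hex values are assumed equivalents of the color names above, not values taken from the tool, `to_dot` is a hypothetical name, and node names are assumed to need no escaping. Each edge also carries its kind as a text label, so color is never the only cue.

```python
# Assumed hex equivalents of the legend colors (orange, blue, teal, light gray).
EDGE_COLORS = {"goto": "#FFA500", "call": "#0000FF", "nested": "#008080"}
TERMINATING_FILL = "#D3D3D3"


def to_dot(nodes: dict, edges: list, terminating: set) -> str:
    """nodes: name -> start line (or None); edges: (source, target, kind) tuples."""
    out = ["digraph g {"]
    for name, line in nodes.items():
        label = name if line is None else f"{name} (line {line})"
        attrs = [f'label="{label}"']
        if name in terminating:
            # Terminating nodes get a light gray background.
            attrs += ["style=filled", f'fillcolor="{TERMINATING_FILL}"']
        out.append(f'"{name}" [{",".join(attrs)}]')
    for src, dst, kind in edges:
        # The kind is written as a text label too, so color is never the only cue.
        out.append(f'"{src}" -> "{dst}" [label={kind},color="{EDGE_COLORS[kind]}"]')
    out.append("}")
    return "\n".join(out)
```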

Why?

Legacy code bases sometimes contain old CMD files. This tool generates a visual representation of the internal calls within such a script.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Unit tests

Run unit tests from the project root either with the built-in unittest module:

python -m unittest discover

Or with pytest, which can produce reports for both test results and code coverage, using the following invocation:

pip install pytest
pip install pytest-cov
pytest tests --doctest-modules --junitxml=junit/test-results.xml --cov=callgraph --cov-report=xml --cov-report=html

cmd-call-graph's People

Contributors

ankitsaxena21, dcrusty, dependabot[bot], lupino3, microsoft-github-policy-service[bot], microsoftopensource, msftgits, notheotherben, refack, riccardompesce

cmd-call-graph's Issues

Track the terminating vs. non-terminating code paths for a given node

A node can be both terminating and non-terminating, depending on how it's invoked.

If it's reached via goto or nested from a node that would have been terminating, it's terminating for that code path. But if it's reached via call from a non-terminating node, it's not terminating.

It would be nice to surface this information somehow. The easiest option might be a warning; I'm not sure how to represent this case in the call graph.
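One way to surface this is to track, for each node, the contexts it can run in. In the hypothetical sketch below, "top" means reaching the end of the block exits the script, and "called" means it returns to a caller: goto and nested edges keep the current context, while call edges always start a "called" one. Nodes that end up with both contexts are the mixed case described here and could be reported as warnings. Edges are assumed to be `(source, target, kind)` tuples.

```python
from collections import deque


def run_contexts(edges: list, start: str = "__begin__") -> dict:
    """Map each reachable node to the contexts it can run in.

    "top": reaching the end of the block exits the script (terminating).
    "called": reaching the end returns to a caller (non-terminating).
    """
    contexts = {start: {"top"}}
    queue = deque([(start, "top")])
    while queue:
        node, ctx = queue.popleft()
        for src, dst, kind in edges:
            if src != node:
                continue
            # goto and nested keep the current context; call always starts a "called" one.
            new_ctx = "called" if kind == "call" else ctx
            seen = contexts.setdefault(dst, set())
            if new_ctx not in seen:
                seen.add(new_ctx)
                queue.append((dst, new_ctx))
    return contexts


def mixed_nodes(edges: list) -> list:
    """Nodes that are terminating on some code paths and not on others."""
    return sorted(n for n, ctx in run_contexts(edges).items() if len(ctx) == 2)
```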

Add a lint mode

The tool can identify some flaws in batch files, for example a goto or call to a nonexistent label.

Implement a --lint option that only outputs potential problems with a given script.
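A first version of the check could look like this hypothetical sketch. It collects labels case-insensitively (as CMD matches them), treats eof as always defined, and reports every goto or call :label whose target is missing; call without a colon is skipped because it runs an external program. Words inside echo strings and dynamic targets such as %command% would also be flagged, so real output would need some filtering.

```python
import re

LABEL_RE = re.compile(r"^\s*:(?!:)(\S+)")  # ":label" lines, excluding "::" comments
# goto/call as whole words, then an optional colon and the target name.
JUMP_RE = re.compile(r"\b(goto|call)\b\s*(:?)([^\s:]\S*)", re.IGNORECASE)


def lint_undefined_labels(script: str) -> list:
    """Report goto/call targets that match no label in the script (hypothetical)."""
    lines = script.splitlines()
    labels = {m.group(1).lower() for line in lines if (m := LABEL_RE.match(line))}
    labels.add("eof")  # built-in target, always valid
    problems = []
    for number, line in enumerate(lines, start=1):
        if LABEL_RE.match(line):
            continue
        for kind, colon, target in JUMP_RE.findall(line):
            # "call" without a colon runs an external program, not a label.
            if kind.lower() == "call" and not colon:
                continue
            if target.lower() not in labels:
                problems.append(f"line {number}: {kind.lower()} to undefined label '{target}'")
    return problems
```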

Migrate CI to Azure Pipelines

It would be nice to run automated tests on Azure Pipelines instead of Travis.

Once that works, we could even automate publishing the build artifacts to PyPI.

Add unit tests

The script was published "as-is", but for any further change, and to validate existing behavior, we really need to add some basic tests.

Update description replacing files with file

The description and the README say:

A simple tool to generate a call graph for Windows CMD (batch) files.

IMHO, it should read "for Windows CMD (batch) file."

The plural "files" suggests that one can generate a graph for batch scripts CALL-ing other batch scripts, which, AFAICT, is not the case: a call graph can only be generated for a single batch file.

Fix handling of nodes with %

If a node has a name with percent signs in it, for example %command%, the dot program doesn't seem to like it, and outputs a garbage node with just a percent sign and a letter or number.

Figure out why that's the case, and fix it.
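One plausible cause, judging from the unquoted node IDs in the dot output shown in a later issue, is that names are written to the dot file as bare identifiers. In the DOT language, an unquoted ID may only contain letters, digits and underscores (not starting with a digit) or be a number, and the keywords node, edge, graph, digraph, subgraph and strict are reserved in any letter case; anything else is safest double-quoted, with embedded quotes escaped as \". A hypothetical helper applying that rule:

```python
import re

# Unquoted DOT IDs: letters, digits and underscores, not starting with a digit.
PLAIN_ID_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# DOT keywords are case-independent and cannot be used as bare IDs.
DOT_KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}


def dot_id(name: str) -> str:
    """Return a node name in a form the dot parser accepts (hypothetical helper)."""
    if PLAIN_ID_RE.fullmatch(name) and name.lower() not in DOT_KEYWORDS:
        return name
    # Inside a quoted DOT string, only the double quote needs escaping.
    return '"' + name.replace('"', '\\"') + '"'
```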

Improve command-line options

All the options around getting a richer output should be enabled by default. That's how I mostly use the tool and there is no reason to disable them unless specifically desired by the user.

Also I think using stdin as input can be confusing (the tool just hangs if no input is given), so I'd make -i mandatory, and possibly even a positional argument instead of an option.

Improve detection of exit nodes

Some exit nodes are not detected by the current logic, which is rather primitive and looks at where the script ends, rather than exploiting the call graph.

For example, the script wouldn't recognize the bar node as the exit node in the following code:

goto :foo
:bar
exit /b 1
:foo
goto bar
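As a simple alternative to looking at where the script ends, each block could be scanned for an exit statement wherever it appears in the file. A hypothetical sketch; note that exit /b only ends the whole script when the block is not running as a called subroutine, which this sketch ignores.

```python
import re

LABEL_RE = re.compile(r"^\s*:(?!:)(\S+)")  # ":label" lines, excluding "::" comments
EXIT_RE = re.compile(r"^\s*exit\b", re.IGNORECASE)


def exit_nodes(script: str) -> set:
    """Names of blocks that contain an exit statement, wherever they sit in the file."""
    current = "__begin__"
    found = set()
    for line in script.splitlines():
        if m := LABEL_RE.match(line):
            current = m.group(1).lower()
        elif EXIT_RE.match(line):
            found.add(current)
    return found
```

On the snippet above, this reports bar as the exit node.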

Follow calls to external batch scripts

We could extend the tool to follow calls to external batch scripts, and produce a larger call graph including a set of scripts.

There is already some primitive logic to identify external calls in CallGraph._AnnotateNode, as it generates Command instances of type external_call, but those commands are not processed in the later loop that goes through all Command instances. Therefore the logic to process external calls can be added to that loop.

What we can do there is simply call CallGraph.Build again to recursively generate another CallGraph instance, which we would need to add to a container in the original CallGraph instance, keeping track of this new type of connection.

There are a few open questions that come to mind:

  1. we need to handle recursion and not fall into the trap of infinite loops (for example, by specifying a depth parameter to limit how deep we go in the chain of calls).
  2. we need to de-duplicate call graphs for external batch files, so that if we call a given file multiple times we point to the same sub-graph.
  3. we need to gracefully handle calls to non-batch files or files we don't want to expand (for example, if we reached maximum depth), e.g., by adding a single node for a file we don't want to or can't expand.
  4. we need to decide how to visually represent the different files. For example, we could have rectangular enclosures around each batch file involved in the graph.
  5. we may want to limit the set of files to expand (for example, only the ones belonging to a specific codebase). This might be achieved by specifying a set of files to expand as a command-line parameter.
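Points 1 to 3 could be handled roughly as in the sketch below. It is not based on CallGraph.Build: it only records which batch files call which, assumes an external batch call looks like call followed by a path ending in .cmd or .bat, and takes a hypothetical read_file callback (returning None for unavailable files) so it runs without touching the disk. A file is registered in the cache before it is expanded, which both de-duplicates repeated calls and stops cycles; files that are unreadable or beyond max_depth stay as leaf nodes.

```python
import re

# "call" followed by a path ending in .cmd or .bat (an assumed, simplified pattern).
EXTERNAL_CALL_RE = re.compile(r"^\s*call\s+([^\s:]\S*\.(?:cmd|bat))\b", re.IGNORECASE)


def build_file_graph(path, read_file, max_depth=3, _cache=None, _depth=0):
    """Return {batch file: [batch files it calls]} for every file reached from path.

    read_file(path) is a hypothetical callback returning the script text, or None.
    """
    cache = {} if _cache is None else _cache
    if path in cache:
        return cache  # already expanded (or being expanded): de-duplicates and stops cycles
    cache[path] = []
    text = read_file(path)
    if text is None or _depth >= max_depth:
        return cache  # unreadable or too deep: kept as a single leaf node
    for line in text.splitlines():
        if m := EXTERNAL_CALL_RE.match(line):
            target = m.group(1).lower()
            cache[path].append(target)
            build_file_graph(target, read_file, max_depth, cache, _depth + 1)
    return cache
```

Point 5 could be layered on top by having read_file return None for files outside the allowed set.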

Fix the build

Unit tests are failing, but the recent changes only touch README.md.

The error is the following:

2019-01-06T17:35:53.2926467Z =================================== FAILURES ===================================
2019-01-06T17:35:53.2930569Z ______________________ CodeLineTest.test_command_counters ______________________
2019-01-06T17:35:53.2933899Z 
2019-01-06T17:35:53.2940444Z self = <pytest_cov.plugin.CovPlugin object at 0x7f336dc97cd0>
2019-01-06T17:35:53.2950537Z item = <TestCaseFunction test_command_counters>
2019-01-06T17:35:53.2957731Z 
2019-01-06T17:35:53.2958299Z     @compat.hookwrapper
2019-01-06T17:35:53.2973012Z     def pytest_runtest_call(self, item):
2019-01-06T17:35:53.2989594Z >       if (item.get_marker('no_cover')
2019-01-06T17:35:53.2990087Z                 or 'no_cover' in getattr(item, 'fixturenames', ())):
2019-01-06T17:35:53.2990448Z E               AttributeError: 'TestCaseFunction' object has no attribute 'get_marker'

This seems to be related to a very recent change in pytest. From https://docs.pytest.org/en/latest/changelog.html#pytest-4-1-0-2019-01-05:

#4546: Remove Node.get_marker(name) the return value was not usable for more than a existence check. - pytest-dev/pytest#4546

Visual representation of node size

It would be nice to visualize somehow the relationship between nodes from the point of view of number of lines of code. For example, have node size be proportional to the number of lines of code.
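A possible starting point is the Graphviz width node attribute, which is given in inches and acts as a minimum width unless fixedsize is set. The hypothetical sketch below scales width linearly with each node's LOC between two assumed bounds; it expects at least one node.

```python
def node_sizes(loc: dict, min_width: float = 0.75, max_width: float = 3.0) -> dict:
    """Map node name -> width in inches, scaled linearly with lines of code."""
    lo, hi = min(loc.values()), max(loc.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all nodes have equal LOC
    return {
        name: round(min_width + (count - lo) / span * (max_width - min_width), 2)
        for name, count in loc.items()
    }
```

Each value could then be emitted as a node attribute, for example foo [width=1.5].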

Fix connection type for blocks ending in goto

As an example:

    goto :foo
    :foo
    exit

This snippet creates the following dot file:

digraph g {
__begin__ [label=<<b>__begin__</b><br />(line 1)>]
__begin__ -> foo [label=goto,color=red3]
__begin__ -> foo [label=nested,color=blue3]
foo [label=<<b>foo</b><br />(line 2)>,color=red,penwidth=2]
}

The connection of type nested is wrong, and should not be reported.

Make the ordering of the resulting graph deterministic

The rendered graphs often have different shapes. It would be nice to change the resulting dot code to always use the same ordering, if possible.

Also, it would be nice to have __begin__ at the start and the terminating nodes at the end.
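A deterministic order could come from sorting with a total key, as in this hypothetical sketch: __begin__ first, terminating nodes last, and everything else by start line, then name, with undefined nodes (no start line) after defined ones. Emitting edges with sorted(edges) as well would make the whole dot file identical across runs for the same input.

```python
def ordered_nodes(nodes: dict, terminating: set) -> list:
    """nodes: name -> start line (or None); returns names in a stable order."""
    def key(name):
        line = nodes[name]
        return (
            name != "__begin__",                          # False sorts first
            name in terminating,                          # terminating nodes go last
            line if line is not None else float("inf"),   # undefined nodes after defined
            name,                                         # tie-breaker for a total order
        )
    return sorted(nodes, key=key)
```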
