17451k / clade
Clade is a tool for extracting information about software build process and source code
License: Apache License 2.0
BTW, this can result in #42.
Right now it is quite confusing and difficult to use, and it lacks some features available in the low-level interface. Docstrings must also be added.
At the moment, the only error message shown on screen is "CIF output has unexpected format". It does not show the particular errors, and there are no other sources of additional information.
Passing an unsupported option to CIF usually results in a failure. It is not possible to specify a complete list of unsupported options, so it would be better to pass only those options that are truly needed.
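Passing only the needed options amounts to an allowlist instead of a denylist. A minimal sketch of that idea follows; the particular options kept (macro definitions, include paths, forced includes, language standard) are an assumption chosen for illustration, not the set CIF actually requires, and `filter_opts` is a hypothetical helper.

```python
# Hypothetical allowlist of options worth forwarding to CIF.
PREFIX_OPTS = ("-D", "-U", "-I", "-std=")          # joined forms, e.g. -DDEBUG
OPTS_WITH_SEPARATE_ARG = {"-D", "-U", "-I", "-include", "-isystem"}


def filter_opts(opts):
    """Keep only allowlisted options (with their arguments); drop everything else."""
    kept = []
    i = 0
    while i < len(opts):
        opt = opts[i]
        if opt in OPTS_WITH_SEPARATE_ARG and i + 1 < len(opts):
            # Option and its value are separate arguments: keep both.
            kept.extend(opts[i:i + 2])
            i += 2
            continue
        if opt.startswith(PREFIX_OPTS):
            kept.append(opt)
        # Anything else, including unknown options, is silently dropped.
        i += 1
    return kept
```

Unknown options that take a separate value leave that value behind as a stray argument; since it does not match the allowlist either, it is dropped as well.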
To reliably distinguish between different stored sources, checksums calculated in advance seem to be the best solution. Alternatives such as providing attributes for stored sources, calculating checksums on the fly, or archiving sources and comparing archive sizes are worse for various reasons.
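Precomputing checksums could look roughly like this: each stored file is hashed once when it is stored, and later comparisons only look up the digests. The helper names and the choice of SHA-256 are assumptions for illustration, not Clade's storage format.

```python
import hashlib
import os


def file_checksum(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_checksum_index(storage_dir):
    """Map each stored file (path relative to storage_dir) to its checksum.

    Hypothetical helper: computed once, so two stored sources differ
    exactly when their recorded digests differ.
    """
    index = {}
    for root, _dirs, files in os.walk(storage_dir):
        for name in files:
            full = os.path.join(root, name)
            index[os.path.relpath(full, storage_dir)] = file_checksum(full)
    return index
```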
Without this, it is impossible to work properly with commands that store their arguments in temporary files, as some Microsoft build utilities do:
cl.exe @"C:\Users\shchepetkov\AppData\Local\Temp\tmp0a392aa8bcbf41339a14bcbe325dd7d1.rsp"
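Expanding such "@file" arguments can be sketched as below. This assumes POSIX-style quoting and UTF-8 text inside the response file. Real cl.exe response files follow Windows quoting rules, which `shlex` does not implement, and some tools may write them in another encoding such as UTF-16. `expand_response_files` is a hypothetical helper.

```python
import shlex


def expand_response_files(argv, _depth=0):
    """Replace every "@file" argument with the arguments stored in that file.

    Assumption: the file uses POSIX-style quoting and UTF-8 encoding.
    """
    if _depth > 10:
        raise RuntimeError("response files nested too deeply")
    expanded = []
    for arg in argv:
        if arg.startswith("@") and len(arg) > 1:
            path = arg[1:].strip('"')  # tolerate @"C:\...\file.rsp"
            with open(path, encoding="utf-8") as f:
                nested = shlex.split(f.read())
            # A response file may itself reference other response files.
            expanded.extend(expand_response_files(nested, _depth + 1))
        else:
            expanded.append(arg)
    return expanded
```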
Even when the "Functions" directory exists, Clade still needs to check whether the file "functions.json" exists.
Currently, instead of exiting with an error when given an incorrect command-line argument, Clade can just hang.
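One way to fail fast, assuming the hang happens only after arguments are accepted, is to validate the entire command line before any work starts. With `argparse`, an unknown or missing argument makes `parse_args()` print a usage message and raise `SystemExit` with status 2. The options below are hypothetical and do not reflect Clade's actual CLI.

```python
import argparse


def build_parser():
    # Hypothetical options, for illustration only.
    parser = argparse.ArgumentParser(prog="clade")
    parser.add_argument("-w", "--work-dir", default="clade",
                        help="directory for intercepted data")
    parser.add_argument("command", nargs="+",
                        help="build command to run; put it after -- if it has options")
    return parser

# parse_args() exits with status 2 on invalid input, before anything is started:
#   build_parser().parse_args(["--no-such-option"])
```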
We recently discussed that cross-referencing is a very valuable feature for users. Clade should perform additional source code queries to gather the necessary data and generate indexes for source files.
Some commands are in fact just wrappers for other commands. For example, on macOS calling gcc can result in the following command stack:
- /usr/bin/gcc ...
- /Library/Developer/CommandLineTools/usr/bin/gcc ...
- /usr/bin/xcrun clang ...
- /Library/Developer/CommandLineTools/usr/bin/clang ...
- /Library/Developer/CommandLineTools/usr/bin/clang -cc1 ...
Generally, we are interested only in the first command in such a stack, but currently there is no way to know that these commands are connected, so the corresponding extensions (CC, in this example) process all of them as independent commands. The result is several duplicate commands.
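If each intercepted command recorded the id of the intercepted command that spawned it, duplicates could be filtered by keeping only relevant commands that have no relevant ancestor. The sketch below assumes hypothetical "id" and "pid" fields (pid being the parent command's id, 0 if none) and a caller-supplied predicate deciding which commands an extension cares about.

```python
def top_level_commands(cmds, is_relevant):
    """Keep relevant commands whose ancestors are not relevant themselves.

    Assumes each command is a dict with a unique "id" and a hypothetical
    "pid" field holding the id of the intercepted parent command.
    """
    by_id = {cmd["id"]: cmd for cmd in cmds}
    result = []
    for cmd in cmds:
        if not is_relevant(cmd):
            continue
        parent = by_id.get(cmd.get("pid"))
        nested = False
        while parent is not None:  # walk up the command stack
            if is_relevant(parent):
                nested = True
                break
            parent = by_id.get(parent.get("pid"))
        if not nested:
            result.append(cmd)
    return result
```

For the macOS stack above, only /usr/bin/gcc survives when the predicate matches "gcc" and "clang", because every deeper gcc or clang call has a relevant ancestor.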
The reason is the Linux-specific command-line tools used in the cif.c file.
CIF is an optional dependency of Clade used for getting information about source code.
Example:
{
    "command": "ld",
    "cwd": "/work/git/linux",
    "id": 29309,
    "in": [
        "max-page-size=0x200000",
        "drivers/acpi/.tmp_scan.o"
    ],
    "opts": [
        "-m",
        "elf_x86_64",
        "-z",
        "-r",
        "-T",
        "drivers/acpi/.tmp_scan.ver"
    ],
    "out": "drivers/acpi/scan.o"
}
max-page-size is definitely not an input file.
Unparsed command:
{
    "command": [
        "ld",
        "-m",
        "elf_x86_64",
        "-z",
        "max-page-size=0x200000",
        "-r",
        "-o",
        "drivers/acpi/scan.o",
        "drivers/acpi/.tmp_scan.o",
        "-T",
        "drivers/acpi/.tmp_scan.ver"
    ],
    "cwd": "/work/git/linux",
    "id": 29309,
    "which": "/usr/bin/ld"
}
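The misparse comes from treating "-z" as a flag without a value, so its value "max-page-size=0x200000" lands among the input files. A minimal sketch of the fix, assuming a hand-maintained table of ld options that take their value as the next argument (the table covers only a few GNU ld options and is not Clade's actual parser):

```python
# A few GNU ld options whose value is the next argument (incomplete list).
LD_OPTS_WITH_ARG = {"-m", "-z", "-T", "-L", "-e"}


def parse_ld(argv):
    """Split an ld command line into input files, output file and options."""
    parsed = {"command": argv[0], "in": [], "out": None, "opts": []}
    args = iter(argv[1:])
    for arg in args:
        if arg == "-o":
            parsed["out"] = next(args)
        elif arg in LD_OPTS_WITH_ARG:
            # Consume the option together with its value.
            parsed["opts"].extend([arg, next(args)])
        elif arg.startswith("-"):
            parsed["opts"].append(arg)
        else:
            parsed["in"].append(arg)
    return parsed
```

Applied to the unparsed command above, this yields a single input file, drivers/acpi/.tmp_scan.o, and keeps "-z max-page-size=0x200000" among the options.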
It should be possible to execute the build process without having to change the current directory manually.
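In Python this does not require touching the parent's working directory: `subprocess.run` accepts a `cwd` argument that applies only to the child process. A minimal sketch; `run_build` is a hypothetical wrapper, not part of Clade.

```python
import subprocess


def run_build(command, build_dir):
    # cwd= changes the directory only for the child process;
    # the caller's os.getcwd() stays the same.
    return subprocess.run(command, cwd=build_dir)

# Hypothetical usage: run_build(["make", "-j4"], "/work/git/linux")
```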
For example, a compilation command with the "-c" option and multiple input files also has multiple output files.
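GCC's rule here is simple: with "-c" and no "-o", each source produces an object file in the current directory named after the source's base name with the suffix replaced by ".o"; combining "-o" with "-c" and several inputs is rejected. A sketch of predicting the outputs (`cc_output_files` is a hypothetical helper):

```python
import os


def cc_output_files(inputs, out=None):
    """Predict the object files produced by "gcc -c" for the given inputs."""
    if out is not None:
        if len(inputs) > 1:
            raise ValueError("-o cannot be combined with -c and multiple input files")
        return [out]
    # One object per source, written to the current directory.
    return [os.path.splitext(os.path.basename(src))[0] + ".o" for src in inputs]
```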
Like the ones that start with "-":
clean:
	-rm -f *.o
Currently, clade-intercept exits with an error on such make commands, but it should return 0 instead.
There are already some API functions inside extension classes (for example, the load_cmd_by_id() and load_all_cmds() methods). We need to add more such methods.
There is another problem: to use these interface methods, an extension object must first be created manually. Perhaps interface functions should be independent of the extension classes.
I think that for some long-running Clade operations, e.g. querying source code, progress can be estimated and displayed quite easily.
The issue is not very important, since Clade does not run for very long in the first place and is not intended to be invoked often.
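When the number of work items (e.g. source files to query) is known up front, progress reduces to printing done/total after each item. A minimal sketch with a hypothetical `report_progress` helper:

```python
import sys


def report_progress(done, total, stream=sys.stderr, width=30):
    """Redraw a one-line progress bar such as "[#####.....]  50% (5/10)"."""
    fraction = done / total if total else 1.0
    filled = int(width * fraction)
    bar = "#" * filled + "." * (width - filled)
    stream.write("\r[{}] {:3d}% ({}/{})".format(bar, int(fraction * 100), done, total))
    if done >= total:
        stream.write("\n")  # finish the line once all items are processed
    stream.flush()

# Hypothetical usage:
#   for i, path in enumerate(files, 1):
#       query(path)
#       report_progress(i, len(files))
```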
The Extension class constructor contains the following code:
def __init__(self, work_dir, conf=None, preset="base"):
    ...
    self.temp_dir = tempfile.mkdtemp()
This leads to the creation of a huge number of directories in /tmp, even if a user never calls the parse method.
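A common remedy is to create the directory lazily, on first access. The sketch below mirrors the constructor signature quoted above but is an illustration, not Clade's actual class; the `cleanup` method is a hypothetical addition.

```python
import shutil
import tempfile


class Extension:
    """Sketch: the temporary directory is created only when first needed."""

    def __init__(self, work_dir, conf=None, preset="base"):
        self.work_dir = work_dir
        self.conf = conf or {}
        self.preset = preset
        self._temp_dir = None  # nothing is created in /tmp yet

    @property
    def temp_dir(self):
        if self._temp_dir is None:
            self._temp_dir = tempfile.mkdtemp()
        return self._temp_dir

    def cleanup(self):
        # Hypothetical: remove the directory if it was ever created.
        if self._temp_dir is not None:
            shutil.rmtree(self._temp_dir, ignore_errors=True)
            self._temp_dir = None
```

Code that already reads `self.temp_dir` keeps working unchanged, because the property has the same name as the old attribute.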
Logging implementation should be the same across all Clade scripts.
The load_all_cmds() method of the CC extension can return linker or assembler commands. This is expected and correct, but sometimes users want to receive only proper compilation commands, so an additional method is required.
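Such a method could filter on the "in" and "opts" fields that parsed commands already carry (see the ld example in this document). The criterion used here, that "-c" is present and at least one input is a C/C++ source, is an assumption for illustration; `compilation_cmds` is a hypothetical helper.

```python
import re

# Assumed criterion: "-c" plus at least one C/C++ source among the inputs.
SOURCE_RE = re.compile(r"\.(c|cc|cpp|cxx|C)$")


def compilation_cmds(cmds):
    """Keep only proper compilation commands from a list of parsed commands."""
    return [
        cmd for cmd in cmds
        if "-c" in cmd.get("opts", [])
        and any(SOURCE_RE.search(f) for f in cmd.get("in", []))
    ]
```

Commands with an empty "in" list, assembler inputs such as ".S" files, and link steps without "-c" are all excluded by this criterion.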
Installation log:
$ sudo pip3 install -e .
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Obtaining file:///home/work/tmp/clade
Requirement already satisfied: ujson in /usr/local/lib64/python3.6/site-packages (from clade==1.0)
Requirement already satisfied: graphviz in /usr/local/lib/python3.6/site-packages (from clade==1.0)
Requirement already satisfied: jinja2 in /usr/local/lib64/python3.6/site-packages (from clade==1.0)
Requirement already satisfied: ply in /usr/lib/python3.6/site-packages (from clade==1.0)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib64/python3.6/site-packages (from jinja2->clade==1.0)
Installing collected packages: clade
Found existing installation: clade 1.0
Can't uninstall 'clade'. No files were found to uninstall.
Running setup.py develop for clade
Successfully installed clade
Run:
$ clade
Traceback (most recent call last):
File "/usr/local/bin/clade", line 11, in <module>
load_entry_point('clade', 'console_scripts', 'clade')()
File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 476, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2699, in load_entry_point
raise ImportError("Entry point %r not found" % ((group, name),))
ImportError: Entry point ('console_scripts', 'clade') not found
Can be reproduced on the test-project on Ubuntu 18.04 with any version of GCC.
The problem is in the abstract extension that calls logging.basicConfig.
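The usual Python convention is that library code never calls `logging.basicConfig()`, because it configures the root logger for the whole application. Library modules get a named logger with a `NullHandler`, and only the command-line entry points attach real handlers, which also keeps logging identical across scripts. A sketch; the logger name "clade" and the format string are assumptions.

```python
import logging

# Library code: a named logger; the root logger is left untouched.
logger = logging.getLogger("clade")
logger.addHandler(logging.NullHandler())


def setup_script_logging(level=logging.INFO):
    """Call once from a command-line entry point, never from library classes."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s clade %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
```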
pip3 install clade (Ubuntu):
Failed building wheel for clade
error: can't copy 'clade/libinterceptor/lib': doesn't exist or not a regular file
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;file='/tmp/pip-install-0cttqg1y/clade/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-record-1b0ttqqp/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-0cttqg1y/clade/
When installing Clade with the command
sudo pip3 install -t ../install .
I do not get a bin directory with the required scripts. Without sudo, it does not work at all, because pip refuses to install anything (an Ubuntu bug, as far as I know):
raise DistutilsOptionError("can't combine user with prefix, " distutils.errors.DistutilsOptionError: can't combine user with prefix, exec_prefix/home, or install_(plat)base
As Eugeny explained, os.path.join() discards all preceding components when a later argument is an absolute path. Thus, at info.py:148, cif_out is usually assigned the original "cmd_in" + ".o".
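The behavior is easy to reproduce: `os.path.join("/tmp/cif", "/work/a.c")` returns "/work/a.c" on POSIX. One way around it is to strip the drive and leading separators before joining. `out_dir` and `cif_output_path` are hypothetical names for illustration.

```python
import os


def cif_output_path(out_dir, cmd_in):
    """Place the CIF output for cmd_in inside out_dir, even if cmd_in is absolute.

    os.path.join(out_dir, cmd_in) would return cmd_in itself for an absolute
    cmd_in, so the drive (Windows) and leading separators are removed first.
    """
    _drive, rest = os.path.splitdrive(cmd_in)
    rest = rest.lstrip(os.sep + (os.altsep or ""))
    return os.path.join(out_dir, rest + ".o")
```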
The code is starting to look really complicated without it.
Currently, compatibility is tested only with Python 3.5, 3.6 and 3.7. Adding 3.4 to the list is somewhat tricky due to issues with Travis.
At the moment, command arguments are separated by the "||" characters. These characters can also occur in the arguments themselves, so it would be better to structure this file differently.
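One possible structure is one JSON array per line (JSON Lines): quoting is handled by the encoder, and since `json.dumps` escapes newlines inside strings, every command stays on a single line. This is a suggested format for illustration, not what Clade currently writes; the function names are hypothetical.

```python
import json


def dump_cmds(cmds, path):
    """Write one command per line as a JSON array of its arguments."""
    with open(path, "w", encoding="utf-8") as f:
        for argv in cmds:
            f.write(json.dumps(argv) + "\n")


def load_cmds(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```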
Updated readme should include:
See details at: clade/extensions/callgraph.py:121-122.
BTW, this was mentioned at #45.
I get the following exception:
clade Callgraph: Processing calls
'NoneType' object is not iterable
It is hard to understand what went wrong.
While experimenting with the Linux build bases, I noticed that load_all_cmds_by_type returns more commands than the implied preset should produce. I do not see these commands in the PDF file, but the method returns them.
For instance, there are commands with an empty "in" attribute or with .tmp\w+.s input files.
If Clade does not run CIF at all (e.g. when there are no input files), it still reports that CIF fails on every command. Instead, I expect more specific errors, e.g. that input files are missing.
Currently, this path is relative to the directory in which the compilation command was executed.
The "info: expand(__EXPORT_SYMBOL(sym, sec))" request probably doesn't work properly.
Sometimes CIF outputs very large files: for example, for macro expansions across the whole Linux kernel, the output file is approximately 34GB. It consists almost entirely of duplicate lines, which must be removed. Currently this process takes almost an hour, which is unacceptable.
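A single streaming pass can remove duplicates while keeping first-occurrence order, storing only a fixed-size digest of each distinct line rather than the line itself. This is a sketch under the assumption that the digests of all distinct lines fit in memory; a 16-byte BLAKE2b digest makes collisions negligible in practice. If line order does not matter, an external sort such as `sort -u` is an alternative.

```python
import hashlib


def dedup_lines(src_path, dst_path):
    """Copy src to dst, dropping repeated lines; return the number of lines written."""
    seen = set()
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            key = hashlib.blake2b(line, digest_size=16).digest()
            if key in seen:
                continue  # duplicate of an earlier line
            seen.add(key)
            dst.write(line)
            written += 1
    return written
```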
For some unknown reason, Clade does not print all CIF logs, regardless of whether errors occur.
At least the PDF shows that all source files are built into all object files. This is likely caused by an incorrect internal representation. I am not sure whether this is also the case for CC and other compilers.
It is unbalanced: if the data to be processed is distributed unevenly, then in the end only one worker is left with a large amount of data to process.
Consider using concurrent.futures or multiprocessing.Queue().
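With `concurrent.futures`, balancing comes from submitting each item as its own task rather than splitting the data into one fixed chunk per worker: a worker that finishes early simply takes the next pending item. A sketch with a hypothetical `process_all` helper; with `ProcessPoolExecutor`, the worker function and items must be picklable.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def process_all(items, worker, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Run worker(item) for every item; results are returned in input order."""
    items = list(items)
    results = [None] * len(items)
    with executor_cls(max_workers=max_workers) as executor:
        # One task per item: idle workers pick up whatever is still pending.
        futures = {executor.submit(worker, item): i for i, item in enumerate(items)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

`executor.map(worker, items, chunksize=1)` gives similar balancing with less code; a larger chunksize reduces overhead but brings back some of the imbalance.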
To fix "wrong ELF class: ELFCLASS64 (and ELFCLASS32)" errors.
And add an option to change this behavior.