Comments (3)
Current state
Depending on the kind of CppEntity
being parsed, declarations and instantiations/calls may or may not persist additional (duplicate) entities for the relevant pieces of code into the database.
Here are my observations on the current state of our master branch (without any modifications):
Variables
This code produces 3 CppVariable
entities, even though they are all technically just declarations:
extern int v0;
extern int v0;
extern int v0;
This code also produces 3 CppVariable
entities, even though only one of them is the definition:
extern int v1;
int v1;
extern int v1;
Note that an AST node is also generated for each of them, so navigation on the UI is working fine.
Records
This code produces no CppRecord
entities, as there is no definition:
struct s0;
struct s0;
struct s0;
This code produces 1 CppRecord
entity for the one definition:
struct s1;
struct s1 {};
struct s1;
Note that an AST node is still generated for each of them regardless of definition, so navigation on the UI is working fine.
Functions
This code produces 3 CppFunction
entities, even though they are all declarations:
void f0();
void f0();
void f0();
This code also produces 3 CppFunction
entities, even though only one of them is the definition:
void f1();
void f1() {}
void f1();
Note that an AST node is also generated for each declaration and definition as well, so navigation on the UI is working fine.
This code produces 4 CppFunction
entities for f2
, and 4 CppVariable
entities for p0
, p1
, p2
and the unnamed parameter.
void f2(int p0);
void f2(int p1) {}
void f2(int p2);
void f2(int);
Interesting side note: When I select either of the f2
variants on the UI, the Info Tree displays p0
as the parameter of f2
, regardless of which declaration I select. It seems that parameter info are queried on a "first entity that matches the function hash" basis. This means that getting rid of multiple CppFunction
entities would probably still result in a correct match. If we only generated CppVariable
entities for the definition, that would also always result in that name being returned which will actually be used in the function's body. This would be more logical here, as the parameter whose value would actually be used is p1
in this case.
This code produces 5 CppFunction
entities for f3
, and 5 CppVariable
entities for p
.
void f3(int p); // in a shared main.h
void f3(int p) {} // in main.cpp, including main.h
void f3(int p = 0); // in def1.cpp, including main.h
void f3(int p = 1); // in def2.cpp, including main.h
void f3(int p = 2); // in def3.cpp, including main.h
Again, an interesting side note here is that regardless of which declaration I select, the default values are not displayed on the Info Tree anyway, so it probably wouldn't matter if the multiple definitions of p
were reduced to just one. (Unless the lack of default argument display is already an issue on its own...)
Templates
A similar pattern can be observed with records and functions when making them into templates.
Template records
The following code produces no CppRecord
entities, as there is no definition:
template<typename T> struct ts0;
template<typename T> struct ts0;
template<typename T> struct ts0;
The following code produces 1 CppRecord
entity for the one definition:
template<typename T> struct ts1;
template<typename T> struct ts1 { T value; };
template<typename T> struct ts1;
Furthermore, any distinct instantiations of ts1
result in one additional CppRecord
entity for the specialized types (3 in this case):
static void use_ts1()
{
ts1<bool > lv0a;// 1 CppRecord for ts1<bool>
ts1<bool > lv0b;// ts1<bool> already accounted for; no record
ts1<long > lv1a;// 1 CppRecord for ts1<long>
ts1<long > lv1b;// ts1<long> already accounted for; no record
ts1<float> lv2a;// 1 CppRecord for ts1<float>
ts1<float> lv2b;// ts1<float> already accounted for; no record
}
Similarly, having different names for template parameters has no effect. The code below still produces 1 CppRecord
entity:
template<typename T0> struct ts2;
template<typename T1> struct ts2;
template<typename T2> struct ts2;
template<typename T > struct ts2 { T value; };
And instantiating it produces another 3 CppRecord
entities:
static void use_ts2()
{
ts2<bool > lv0a;// 1 CppRecord for ts2<bool>
ts2<bool > lv0b;// ts2<bool> already accounted for; no record
ts2<long > lv1a;// 1 CppRecord for ts2<long>
ts2<long > lv1b;// ts2<long> already accounted for; no record
ts2<float> lv2a;// 1 CppRecord for ts2<float>
ts2<float> lv2b;// ts2<float> already accounted for; no record
}
An interesting side node is that the Info Tree displays ts2
correctly regardless of which declaration I select. Also, when I inspect the type of its value
member, it says T
. This means that the query (correctly) returns the name corresponding to the definition, and not the alternatives in the declarations (e.g. T0
, T1
, or T2
). This is in contrast with the behavior observed in non-template functions where the first occurrence of the parameter name is returned.
A similar logic applies to having different defaults for the template parameters.
The following code produces 5 CppRecord
entities: 1 for the definition in main.h and 4 for the instantiations in the cpp-s:
// in a shared main.h header:
template<typename T> struct ts3 { T value; }; // 1 CppRecord for ts3
// in main.cpp, including main.h:
template<typename T> struct ts3; // no CppRecord; declaration only
static void use_ts3()
{
ts3<int> ta;// 1 CppRecord for ts3<int>
ts3<int> tb;// ts3<int> already accounted for; no record
}
// in def1.cpp, including main.h:
template<typename T = bool> struct ts3;// no CppRecord; declaration only
static void use_ts3()
{
ts3<> ta;// 1 CppRecord for ts3<bool>
ts3<> tb;// ts3<bool> already accounted for; no record
}
// in def2.cpp, including main.h:
template<typename T = long> struct ts3;// no CppRecord; declaration only
static void use_ts3()
{
ts3<> ta;// 1 CppRecord for ts3<long>
ts3<> tb;// ts3<long> already accounted for; no record
}
// in def3.cpp, including main.h:
template<typename T = float> struct ts3;// no CppRecord; declaration only
static void use_ts3()
{
ts3<> ta;// 1 CppRecord for ts3<float>
ts3<> tb;// ts3<float> already accounted for; no record
}
Template functions
The following code produces 3 CppFunction
entities, even though they are all just declarations:
template<typename T> void tf0();
template<typename T> void tf0();
template<typename T> void tf0();
The following code also produces 3 CppFunction
entities, even though only one of them is the definition:
template<typename T> void tf1();
template<typename T> void tf1() { T value; }
template<typename T> void tf1();
Furthermore, just like with records, any distinct instantiation of tf1
adds an extra CppFunction
(3 in this case):
static void use_tf1()
{
tf1<bool >();// 1 CppFunction for tf1<bool>
tf1<bool >();// tf1<bool> already accounted for; no function
tf1<long >();// 1 CppFunction for tf1<long>
tf1<long >();// tf1<long> already accounted for; no function
tf1<float>();// 1 CppFunction for tf1<float>
tf1<float>();// tf1<float> already accounted for; no function
}
As expected, changing the name of the template parameter does not change the behavior, the code below still produces 4 CppFunction
entities:
template<typename T0> void tf2();
template<typename T1> void tf2();
template<typename T2> void tf2();
template<typename T > void tf2() { T value; }
And any distinct instantiations add further CppFunction
entities (3 in this case):
static void use_tf2()
{
tf2<bool >();// 1 CppFunction for tf2<bool>
tf2<bool >();// tf2<bool> already accounted for; no function
tf2<long >();// 1 CppFunction for tf2<long>
tf2<long >();// tf2<long> already accounted for; no function
tf2<float>();// 1 CppFunction for tf2<float>
tf2<float>();// tf2<float> already accounted for; no function
}
An interesting side note is that whichever declaration of tf2
I select, the Info Tree displays most of the function correctly. However, there is a difference between selecting the definition and any of the declarations:
- Declarations are only displayed in the Declarations section when a declaration is selected.
- No declarations are displayed in the Declarations section if I select the definition.
- The Usage section displays all declarations (including the definition) when a declaration is selected, but none of the instantiations in
use_tf2
. - The same Usage section displays only the definition and some (but not all (!?)) instantiations in
use_tf2
when the definition is selected.
Finally, a rather curious thing happens when we have different defaults for the template parameter.
The following code produces 6 CppFunction
entities: 2 for the definition in main.h (!?) and 1 for the re-declaration in the cpp-s, but no entities for the instantiations/calls (unlike with tf2
):
// in a shared main.h header:
template<typename T> void tf3(T&& t) { T value = t; } // this actually produces 2 CppFunction-s for some reason...
// in main.cpp, including main.h:
template<typename T> void tf3(T&& t);// 1 CppFunction for the declaration
static void use_tf3()
{
tf3<int>(0);// no CppFunction for the instantiation of tf3<int>
tf3<int>(0);// no CppFunction for the instantiation of tf3<int>
}
// in def1.cpp, including main.h:
template<typename T = bool> void tf3(T&& t);// 1 CppFunction for the declaration
static void use_tf3()
{
tf3(0);// no CppFunction for the instantiation of tf3<bool>
tf3(0);// no CppFunction for the instantiation of tf3<bool>
}
// in def2.cpp, including main.h:
template<typename T = long> void tf3(T&& t);// 1 CppFunction for the declaration
static void use_tf3()
{
tf3(0);// no CppFunction for the instantiation of tf3<long>
tf3(0);// no CppFunction for the instantiation of tf3<long>
}
// in def3.cpp, including main.h:
template<typename T = float> void tf3(T&& t);// 1 CppFunction for the declaration
static void use_tf3()
{
tf3(0);// no CppFunction for the instantiation of tf3<float>
tf3(0);// no CppFunction for the instantiation of tf3<float>
}
I do not have a full explanation for why this happens. There might be more to this issue that has to do with the fact that certain re-declarations reside in different compilation units...
from codecompass.
Assumptions on correctness
In VisitRecordDecl
, just before the creation of the CppRecord
structure that will later be persisted, there is the following check:
if (!rd_->isThisDeclarationADefinition())
return true;
This is the piece of code responsible for skipping anything that only needs to happen for the definition of a record. (Which is also why for every struct, only one CppRecord
is present in the database, regardless of the number of declarations.)
Assuming that this is the correct behavior - which is what I would advocate for at first glance anyway - we could add the same early-return check into VisitFunctionDecl
and VisitVarDecl
as well, just before the creation of the respective database entities. Doing so would change the above statistics for variables and functions as follows:
Variables
This code now produces no CppVariable
entities, (as they are all declarations):
extern int v0;
extern int v0;
extern int v0;
For v0
, the Info Tree displays "Variable: int v0", with the Declaration and Usage sections appearing correctly (with correct navigation too). However, the Name, Qualified name and Type fields are not present, as those would apparently need to come from a CppVariable entity, which this variable does not have.
This code now produces only 1 CppVariable
entity for the one definition:
extern int v1;
int v1;
extern int v1;
For v1
, the InfoTree displays all information about the variable as expected, with all fields present. Navigation works correctly regardless of whether a declaration or the definition is selected.
Functions
This code now produces no CppFunction
entities (as they are all declarations):
void f0();
void f0();
void f0();
Similarly to v0
, the Info Tree displays "Function: void f0()", with the Declaration and Usage sections appearing correctly (with correct navigation too). However, the Name, Qualified name and Signature fields are not present, as those would apparently need to come from a CppFunction
entity, which this function does not have.
This code now produces 1 CppFunction
entity for the one definition:
void f1();
void f1() {}
void f1();
Similarly to v1
, the Info Tree displays all information about f1
as expected, with all fields present. Navigation works correctly regardless of whether a declaration or the definition is selected.
This code now also produces 1 CppFunction
entity for f2
, and 4 CppVariable
entities for p0
, p1
, p2
and the unnamed parameter.
void f2(int p0);
void f2(int p1) {}
void f2(int p2);
void f2(int);
Interesting side note: The Parameters section (showing p1
) is only displayed when I select the definition. Apart from this, selecting the definition or a declaration produces the same behavior on the Info Tree.
This code now also produces 1 CppFunction
entity for f3
, and 5 CppVariable
entities for p
.
void f3(int p); // in a shared main.h
void f3(int p) {} // in main.cpp, including main.h
void f3(int p = 0); // in def1.cpp, including main.h
void f3(int p = 1); // in def2.cpp, including main.h
void f3(int p = 2); // in def3.cpp, including main.h
A similar observation can be made here as with f2
: The Parameters section is only displayed when selecting the definition. Apart from that, everything is the same and works just fine. Default values of parameters are still not displayed anywhere, just like in the baseline case (without the modifications in this post).
Template functions
The following code now produces no CppFunction
entities (as they are all declarations):
template<typename T> void tf0();
template<typename T> void tf0();
template<typename T> void tf0();
Info Tree behavior is same as with f0
.
The following code now produces 1 CppFunction
entity for the one definition:
template<typename T> void tf1();
template<typename T> void tf1() { T value; }
template<typename T> void tf1();
Info Tree behavior is same as with f1
.
Furthermore, just like in the baseline case, any distinct instantiation of tf1
adds an extra CppFunction
(3 in this case):
static void use_tf1()
{
tf1<bool >();// 1 CppFunction for tf1<bool>
tf1<bool >();// tf1<bool> already accounted for; no function
tf1<long >();// 1 CppFunction for tf1<long>
tf1<long >();// tf1<long> already accounted for; no function
tf1<float>();// 1 CppFunction for tf1<float>
tf1<float>();// tf1<float> already accounted for; no function
}
As expected, changing the name of the template parameter does not change the behavior, the code below still produces 1 CppFunction
entity:
template<typename T0> void tf2();
template<typename T1> void tf2();
template<typename T2> void tf2();
template<typename T > void tf2() { T value; }
And any distinct instantiations add further CppFunction
entities (3 in this case):
static void use_tf2()
{
tf2<bool >();// 1 CppFunction for tf2<bool>
tf2<bool >();// tf2<bool> already accounted for; no function
tf2<long >();// 1 CppFunction for tf2<long>
tf2<long >();// tf2<long> already accounted for; no function
tf2<float>();// 1 CppFunction for tf2<float>
tf2<float>();// tf2<float> already accounted for; no function
}
Everything that could be said about the behavior of the Info Tree in the baseline case also applies here:
- The Declarations section is only displayed if a declaration is selected.
- The Caller and Local variables section is only displayed if the definition is selected.
- The Usage section does not list all usages; and which ones are displayed depends on whether a declaration or the definition is selected.
Finally, the "curious" case for when we have different defaults for the template parameter:
The following code produces 2 CppFunction
entities or the definition in main.h (!?) and no further entities for the re-declarations in the cpp-s or the instantiations/calls (unlike with tf2
):
// in a shared main.h header:
template<typename T> void tf3(T&& t) { T value = t; } // this actually produces 2 CppFunction-s for some reason...
// in main.cpp, including main.h:
template<typename T> void tf3(T&& t);// no CppFunction; declaration only
static void use_tf3()
{
tf3<int>(0);// no CppFunction for the instantiation of tf3<int>
tf3<int>(0);// no CppFunction for the instantiation of tf3<int>
}
// in def1.cpp, including main.h:
template<typename T = bool> void tf3(T&& t);// no CppFunction; declaration only
static void use_tf3()
{
tf3(0);// no CppFunction for the instantiation of tf3<bool>
tf3(0);// no CppFunction for the instantiation of tf3<bool>
}
// in def2.cpp, including main.h:
template<typename T = long> void tf3(T&& t);// no CppFunction; declaration only
static void use_tf3()
{
tf3(0);// no CppFunction for the instantiation of tf3<long>
tf3(0);// no CppFunction for the instantiation of tf3<long>
}
// in def3.cpp, including main.h:
template<typename T = float> void tf3(T&& t);// no CppFunction; declaration only
static void use_tf3()
{
tf3(0);// no CppFunction for the instantiation of tf3<float>
tf3(0);// no CppFunction for the instantiation of tf3<float>
}
Database size
Apart from the behavioral changes listed above, eliminating duplicate definitions of variables and functions also affects the database size. I did a file size comparison using the SQLite database configuration for TinyXML and Xerces-C before and after applying the changes outlined at the beginning of this post. The results are as follows (sizes are given in bytes):
TinyXML:
Original: 8 785 920
Modified: 8 552 448
Redundancy: 233 472 (2.66%)
Xerces-C:
Original: 217 526 272
Modified: 210 067 456
Redundancy: 7 458 816 (3.43%)
But perhaps more interesting is the comparison between the sizes of the CppVariable and CppFunction tables themselves (e.g. number of records) in these repositories:
TinyXML:
Entity | Original | Modified | Redundancy | Redundancy (%) |
---|---|---|---|---|
Variables | 3159 | 3122 | 37 | 1.17% |
Functions | 1615 | 647 | 968 | 59.94% |
Xerces-C:
Entity | Original | Modified | Redundancy | Redundancy (%) |
---|---|---|---|---|
Variables | 66051 | 64947 | 1104 | 1.67% |
Functions | 34858 | 15717 | 19141 | 54.91% |
Even though the redundancy is not as pronounced in overall database size as I would have expected, it is definitely significant in the size of the CppFunction
table. If applying such simple changes could get rid of over half of the records in this table, then any query iterating over these records would instantly get twice as fast. As the number of metrics added to CodeCompass increases, so does the potential number of queries that might access (iterate or join) the CppFunction
table at some point. And then this difference is definitely going to be significant in the runtime of the metrics plugin.
from codecompass.
The above statistics were exported from my diagnostics branch created specifically for this purpose:
https://github.com/dbukki/CodeCompass/tree/diagnostics
Note: The changes on this branch are meant to serve diagnostic purposes of hypothetical scenarios only (i.e. what would happen to CodeCompass if...?). As expected, for the reasons already outlined in #714 (comment), these changes make certain unit tests fail, thus making the CI for the whole branch fail as well.
The sample codes I used to analyze the CC database can be found at https://github.com/dbukki/cc-tests/tree/master/decl_def
from codecompass.
Related Issues (20)
- Local variables are not shown for methods
- Relational Cohesion at Module Level for C++
- Lack of Cohesion of Methods at Type Level for C++ HOT 3
- Cyclomatic Complexity at Function Level for C++
- Cyclomatic Complexity at Type Level for C++ HOT 13
- Bumpy Road Complexity at Function Level for C++
- Add an option to skip webgui builds HOT 3
- LibODB not added properly to runtime docker images
- Separate parse-time C++ metrics information into distinct tables
- Fix code scanning alert - Token-Permissions
- Newer NodeJS versions are not compatible with Ubuntu 16.04 / SUSE 42.1
- Dynamic parsing job launch based on available memory
- Add C++ linter to the CI workflow
- Macro expansion error in C/C++ for _Pragma
- Parallel parsing in C++ metrics
- Incremental parsing fails with segmentation fault HOT 3
- Long C++ metrics parsing time on Ubuntu 22.04 HOT 1
- Handle duplicated compile commands
- Incorrect ODB installation to webserver Docker images HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from codecompass.