Comments (47)
nearly 4 years past, is there any body work on it? we have lots of legacy code write as non-self-contained style, looking forward this feature, so we can migrate to vscode from VS or CLion as our main IDE. Vscode + Clangd is so good for daily deploping, is three any plan?
from clangd.
My two cents about this:
- You should definitely support non-self contained files. The C++ standard allows it and many big projects (e.g OpenFOAM a.k.a. my usecase) do it. And e.g. the rust language server support non-self contained files in rust.
- As long as you don't have support for non-self contained files, you need to document it. Claiming that clangd can parse C++ is false advertisement, because non-self contained files are part of the C++ standard. You should replace "clangd understands your C++ code" on your website with "clangd understands your C++ code, if you follow these(link here) coding guidlines." Nobody, except for a very small minority, knows this limitation of clangd, but it is important to consider this limitation when choosing your tool. (For example the guys in r/cpp don't know this: https://www.reddit.com/r/cpp/comments/j57se3/tool_for_go_to_definition/ .)
- As long as you don't have support for non-self contained files, you need to faIl LOUDLY. Currently you fail super-silently. Clangd logs
... Updating file .../lib.cpp with command inferred from .../main.cpp
which means that clangd nows that lib.cpp is not self-contained. If you notice that a file is not self contained, you need to write an error message in the logfile with a link to this issue. It would be even better if trying to "Go to Definition" in a non-self contained file would show "This is not a self contained file, see #45" instead of ""No definition found for 'func". Otherwise it will waste a lot of time for everyone trying to figure out why it is not working.
Sincerely,
someone who is pissed off, because he just spent 3 hours creating a minimal example of this bug after noticing that "Go to Definition" does not work for one specific function in a big program.
Slightly OT: Do you know a tool, similar to clangd that even works for non-self-contained files?
from clangd.
I just ran into this problem while trying to look at things like wake_up_var() in today's Linux kernel. This symbol is in a file (wait_bit.c) that is included from within another file (build_utility.c), and so VS Code + clangd cannot browse any of the symbols within wait_bit.c.
This is a serious limitation. If you look at the top of build_utility.c (and also look through "git blame build_utility.c"), you'll notice that this is an intentional, recent effort to reduce build times. And that means that my chances of changing this are basically zero--people want it this way.
So now I'm reduced to maintaining local hacks in order to browse a big chunk of the sched unit in the Linux kernel, with a fair chance that the damage will continue to spread, because this is apparently a popular way to speed up builds.
This is an extremely bad situation that will drive all of us back to horrible older techniques, unless clangd loses the "this is unholy and impure and bad software engineering, so we are not interested" attitude. Please reconsider that, because Real Projects are messy and impure!
from clangd.
Since January I have successfully worked around this with a method I have not seen proposed anywhere else, as outlined in this blog post which I wrote at the time. https://www.frogtoss.com/labs/clangd-with-unity-builds.html
from clangd.
Very good. In that case, I'll tell my compiler engineering colleagues about this, and see if it sparks any interest.
from clangd.
Wouldn't you only want to do the unity build as a final heavily-optimized, slow-building, non-incremental release build, but use a regular build for development?
No, some projects use unity builds for build performance (since it means common includes only have to be parsed once per unified-TU rather than once per individual-TU).
Some projects take this a step further and only support unity builds, since non-unity builds can easily break due to include-what-you-use violations if devs are doing unity builds locally, and the project may not have the CI resources to test both configurations. Such projects don't even have the option of doing a non-unity build specifically for clangd's consumption.
from clangd.
For posterity, ycmd/libclang allows this by allowing users to return something like { 'flags': list_of_flags, 'override_filename': some_file_name }
from the extra confs.
That makes the TU main file be the value of override_filename
, but completion is still requested for the original filename. As for the libclang API, this means:
clang_parseTranslationUnit2FullArgv
gets passed the value ofoverride_filename
clang_codeCompleteAt
gets passed the original filename.
from clangd.
unless clangd loses the "this is unholy and impure and bad software engineering, so we are not interested" attitude. Please reconsider that, because Real Projects are messy and impure!
I believe it's been clarified in later comments (particularly this one) that the project is open to contributions to improve support for non-self-contained files.
It's just that there are non-trivial technical challenges to overcome (as detailed in this comment and this one), and someone invested in improving support for these scenarios will need to put in the legwork to solve them.
from clangd.
another workaround is -include="xxx.h" in compile_commands or clangd file
from clangd.
but use a regular build for development?
one of the main purposes of jumbo TU is to reduce compile time where it's dominated by IO (particularly includes, etc.). At work i benchmarked speedup comparable to the preamble trick on compiling a libarary as a single jumbo vs -j
on a 40 core system. in this case, dominating factor was the SAN
from clangd.
Funny how a request to elevate quality standards, complete with some actionable points (even if not completely on target) is quickly dismissed as "ranting". Says quite a bit about the state of software these days, imo.
Aaanyway, I briefly tried ccls in the past (https://github.com/MaskRay/ccls), and it seemed to at least try in regards to unity builds. The project I did test it on is pretty simple so I cannot really say how thorough that support is though. I had some performance related troubles using it from Vim and so had to look for something else.. might be worth a try in any case.
from clangd.
Self-contained files are good software engineering practice
Maybe, but I can't rewrite all C++ libraries. Jumping to definitions is most useful when exploring unfamiliar external code. I have no influence to force all the authors to implement good practices.
from clangd.
Well yes. This fundamentally applies to any TU which is composed of multiple files (including headers, c, cpp, inc, etc). Clangd models TU=file and as the others have said it's a big refactor to fix it.
from clangd.
Well yes. This fundamentally applies to any TU which is composed of multiple files
You're not wrong, I just wanted to provide another data point to help avoid some kind of unity-build-specific solution (I have no idea how that would arise, but you never know).
from clangd.
Really waiting for this feature to release!
from clangd.
For the record, YCM has just heard from another users who needed override_filename
. This feature can also be supported for users who use compile_commands.json
, if the comp db specification can be extended.
[
{
"directory": "/home/bstaletic/test/",
"command": [ "/usr/bin/c++", "-c", "bar.cpp" ],
"file": "/home/bstaletic/test/bar.cpp",
"override_filename": "/home/bstaletic/test/foo.cpp"
}
]
In the above example:
directory
is the same as it always was.command
is the same - contains the actual command line.file
would contain the file containing the cursor.override_filename
would contain the TU to be parsed
This is just an idea that, frankly, hasn't been given much thought and it doesn't address the problem of wiring this throughout the clangd code.
Considering that the latest cmake has implemented support for unity builds and that the comp db generation is broken when used with unity builds, override_filename
should probably be considered again.
from clangd.
+1000 to this. I think unity builds are gaining more and more traction, and this would massively help a lot of people.
from clangd.
Documentation to llvm 8 said that to achieve what people ask in this bug, one needs to use clangd-indexer. These lines disappeared from llvm 10 documentation, it just says that clangd-indexer is for developers.
Why this feature is so complicated?
from clangd.
Documentation to llvm 8 said that to achieve what people ask in this bug, one needs to use clangd-indexer. These lines disappeared from llvm 10 documentation, it just says that clangd-indexer is for developers.
AFAIK, clangd-indexer has never supported the use cases described in this bug. The "project-wide view" mentioned in the linked doc just refers to e.g. being able to perform Find References and get results in other translation units.
Mention of clangd-indexer has been removed because that functionality is now built into clangd via the background indexing feature.
from clangd.
Yes, what Nathan said.
Why this feature is so complicated?
A few main reasons:
- clangd relies heavily on the preamble optimization: splitting the file in two parts which can be built separately. This is essential for performance (10-1000x speedup on common operations depending on the codebase) and greatly complicates the compilation model. Cross-file preambles will complicate it further.
- preamble building is a part of clang, it's complicated and entangled with other features, none of us understand it well enough that making changes is straightforward.
- clangd relies in many places on clang's notion of the main file, untangling the concepts of "interesting file" and "entrypoint" is a lot of work, and we don't have great ideas for isolating the complexity
- (dons flame-retardant suit) Self-contained files are good software engineering practice (and don't prevent unity builds AFAIK). Most clangd developers are paid by employers who enforce this practice - in part to make tools easier to write. We do spend lots of time on things to make clangd more well-rounded in general even outside what $employer cares about directly, but it affects prioritization for sure, especially for features like this that are probably many months of work.
from clangd.
Wait, Nathan? Eclipse CDT supporter, who suggested to use clangd instead of CDT's cpp-indexer? You actually work on clangd? :)
from clangd.
(and don't prevent unity builds AFAIK)
They kinda do. If your entire codebase is "shoved" into a single TU, your compile_commands.json
contains only that single entry which may or may not work for other directories in that codebase.
I've heard that people work around this problem by running cmake twice (now that cmake actually has built-in support for unity builds). Once with unity off, to get a working compile_commands.json
and once with unity on, to compile the project. This is of no use to those who don't use cmake.
from clangd.
Wait, Nathan? Eclipse CDT supporter, who suggested to use clangd instead of CDT's cpp-indexer? You actually work on clangd? :)
I contribute to it, yeah :) I switched from Eclipse CDT about a year ago.
from clangd.
Self-contained files are good software engineering practice (and don't prevent unity builds AFAIK).
Clangd does not currently work for unity builds (see e.g. #147 (comment)). Perhaps you mean support for them could be added more easily than by solving the challenging issues you described?
Most clangd developers are paid by employers who enforce this practice
To be fair, LLVM itself does use non-self contained files at times. I'm thinking of things like #include "clang/Basic/TokenKinds.def"
.
from clangd.
Clangd does not currently work for unity builds (see e.g. #147 (comment)).
Right, but if you have self-contained files, you don't have to always do a unity build with them. Wouldn't you only want to do the unity build as a final heavily-optimized, slow-building, non-incremental release build, but use a regular build for development?
To be fair, LLVM itself does use non-self contained files at times
LLVM does lots of questionable things :-) Tablegen .def format falls more into the "stupid pp tricks" bucket than sensible C++ code we're just having trouble parsing, though.
from clangd.
What is unity build?
from clangd.
I switched from Eclipse CDT about a year ago
I am switching now. Especially after C++20 CDT became unbearable.
from clangd.
What is unity build?
It's a technique where groups of source files are combined (by e.g. #include
-ing several .cpp files into one file), and the resulting files are compiled, rather than the individual source files. It has several advantages, such as common includes only having to be parsed once per combined source file, more inlining / inter-procedural optimization opportunities, etc. (And also downsides, such as a greater potential to introduce undetected include-what-you-use violations, and incremental builds potentially being slower.)
from clangd.
It's a technique where groups of source files are combined (by e.g.
#include
-ing several .cpp files into one file), and the resulting files are compiled, rather than the individual source files.
Wow. Never've heard of this technique.
from clangd.
@sam-mccall so just to understand this correctly, for the moment this is very hard to implement because of #45 (comment). Does what you said earlier also apply in an unified-build
environment?
from clangd.
(and don't prevent unity builds AFAIK)
They kinda do. If your entire codebase is "shoved" into a single TU, your
compile_commands.json
contains only that single entry which may or may not work for other directories in that codebase.I've heard that people work around this problem by running cmake twice (now that cmake actually has built-in support for unity builds). Once with unity off, to get a working
compile_commands.json
and once with unity on, to compile the project. This is of no use to those who don't use cmake.
Vote for https://gitlab.kitware.com/cmake/cmake/-/issues/19826
from clangd.
which means that clangd nows that lib.cpp is not self-contained.
Ranting aside, no. That only means that lib.cpp
wasn't mentioned in your compile_commands.json
. I don't think there is a way to detect that a file is not sef-contained, other than being told by the user.
Slightly OT: Do you know a tool, similar to clangd that even works for non-self-contained files?
The only thing that I know of that works with non-self-contained files in this regard is ycmd, with the legacy libclang based completion engine (i.e. not ycmd + clangd). (Yes, "shameless plug" and all that.) However, I did call it "legacy" for a reason - there are things that clangd suppports, but ycmd+libclang don't.
from clangd.
Slightly OT: Do you know a tool, similar to clangd that even works for non-self-contained files?
Ironically Microsoft's language server (https://github.com/Microsoft/vscode-cpptools) works flawlessly even for clang and GCC in Linux.
from clangd.
Slightly OT: Do you know a tool, similar to clangd that even works for non-self-contained files?
Ironically Microsoft's language server (https://github.com/Microsoft/vscode-cpptools) works flawlessly even for clang and GCC in Linux.
It also does not support including *.cpp files:
microsoft/vscode-cpptools#6250
from clangd.
Aaanyway, I briefly tried ccls in the past (https://github.com/MaskRay/ccls), and it seemed to at least try in regards to unity builds. The project I did test it on is pretty simple so I cannot really say how thorough that support is though. I had some performance related troubles using it from Vim and so had to look for something else.. might be worth a try in any case.
I'm currently using it. It supports non-self-contained files, but fails in some other cases.
from clangd.
clangd
is better to depend on intermediate result when compiling,
I think it should also be tolerant enough for errors, for it is to assist developers, not for building.
I don't think it's very hard for compiler experts.
from clangd.
clangd
is better to depend on intermediate result when compiling,
That would mean that you have to build it (slow) before you can use clangd.
That being said, that wouldn't be that big of a problem.
from clangd.
clangd
is better to depend on intermediate result when compiling,That would mean that you have to build it (slow) before you can use clangd.
That being said, that wouldn't be that big of a problem.
Optimizations can be done, e.g.
- the compiler works in background threads
- partial data produced by the compiler should be useful to assist developers, and it should support incrementality
- higher priority for recent used files
- cache in memory
- no compiling optimization needed
- ...
It's just possible, and not hard to achieve the results like Visual Stduio.
from clangd.
Without using vscode-cpptools
, is there a solution for the problem on Linux now?
from clangd.
As I've written before, ycmd/libclang supports these use cases, but with libclang we can't do indexing, so no rename, references... ycmd does have a vscode client, but I have no idea how or even if it is maintained. Frankly we still support libclang only because of this missing feature.
from clangd.
A non-update here: this is a hard problem, and we're open to solutions but (AFAIK) none of the active contributors are committed to working on this.
If someone wants to attempt this problem, there's more to do than implementation and it'd be worth discussing design here first.
The design needs to solve some hard constraints:
- There needs to be a clear scope and idea of how it'll solve the problems. For instance:
- does it support files with parse context (e.g. snippets of class bodies like ConfigFragment.inc), or just PP context? If parse context, how is this persisted or recreated? (PCH can't do this today). How does this interact with preambles?
- how is the file-that-provides-context identified? Is the existing index actually good/available enough for this?
- it needs to not add too much complexity, or at least isolate the complexity to a few places. "Priorities and effort" also means this can't become a maintenance timesink.
- it needs to be useful/powerful enough to justify the complexity that adds. (This ~certainly means it needs to be safe to turn on by default eventually)
I'm not certain it's possible to satisfy all these, but it seems like it might be possible.
from clangd.
FWIW my best guess is:
- we'd want to build a preamble for the entrypoint file and then parse everything from there. This involves eliminating from many places the assumptions that a) the preamble is part of the file we care about, b) the file we care about is the "main file" as far as SourceManager is concerned.
- a significant challenge with this solution is the performance where the fragment file has
#include
s which are not part of the entrypoint file's preamble - the include graph in the background index or the includer cache in TUScheduler are a reasonable initial step for identifying entrypoints but we'll eventually want something better built by fast include-scanning.
from clangd.
it needs to not add too much complexity, or at least isolate the complexity to a few places. "Priorities and effort" also means this can't become a maintenance timesink.
Ycmd's implementation was not a timesink at all. In fact, no maintenance was needed after the initial PR. Our implementation does mean that needed a path to the main file now needs two paths - file path to be parsed and a file path of the cursor location.
from clangd.
it needs to not add too much complexity, or at least isolate the complexity to a few places. "Priorities and effort" also means this can't become a maintenance timesink.
Ycmd's implementation was not a timesink at all. In fact, no maintenance was needed after the initial PR. Our implementation does mean that needed a path to the main file now needs two paths - file path to be parsed and a file path of the cursor location.
To confirm: the ycmd implementation to support this took me less than an hour. (apropos nothing; ycmd libclang engine is an invisible fraction of complexity compared with clangd)
from clangd.
Just to add a data point to this that is not about unity builds: I have seen and used this style for DSP and embedded projects, where
- there is a C module exposing a small number of API functions in
module_a/include/module_a.h
module_a/src/module_a.c
contains those function with external linkage, but also extra helper functions with internal linkage (ie.static
functions)- we want to unit test the helper functions, because they're sufficiently complex to warrant it
I mention DSP and embedded as a context because it relates to point #2. It seems like one solution to this would be to have the helper functions have external linkage, and have separate private header files for them (or just put them in a separate module). Unit tests could then use that private module/header's API.
But this means that the functions have to appear in the symbol table, and can't be inlined, or can be inlined but need to be duplicated as non-inlined versions, even though they're not meant to be public. It's bloat and pessimisation we want to avoid for DSP/embedded applications.
Another solution that avoids this is to consider that the unit tests for that module form a kind of single-TU project (or subproject), where only the main function (or top-level test runner function) is exposed, and it has the same set of static "helper" functions as the real project. And so your unit tests include the implementation file (module_a/src/module_a.c
):
// module_a/test/test_module_a.c
#include "unit_test_harness_whatever.h"
#include "test_module_a.h"
// Note include of implementation file!
#include "module_a/src/module_a.c"
static void test_module_a_helper_x_but_not_y(void) { /* ... */ }
static void test_module_a_helper_annoying_corner_case(void) { /* ... */ }
void test_module_a_all(void) {
RUN_TEST(test_module_a_helper_x_but_not_y);
RUN_TEST(test_module_a_helper_annoying_corner_case);
}
Yes it's niche, yes it looks jarring and unidiomatic at first, but it's valid, it makes sense and works in these contexts, and I've seen it in real, professional, high-quality C code bases. (It also tests that the implementation file is appropriately self-contained!)
There are other solutions to this as well, eg. #define
your linkage keyword to be static
for library builds but empty for unit test builds. But that has drawbacks, and the benefits of support from a tool that hadn't been invented yet couldn't really be factored in to the decision (and probably aren't motivating enough to do retroactively).
from clangd.
- clangd relies heavily on the preamble optimization: splitting the file in two parts which can be built separately. This is essential for performance (10-1000x speedup on common operations depending on the codebase) and greatly complicates the compilation model. Cross-file preambles will complicate it further.
Ok, I'm trying to understand the problem a bit more deeply. Is this a fair restatement of what you said above?:
Let's say we have CPython, which is a bit of a mess of non-self-contained headers, but basically the external interface is that anyone using it is supposed to #include <Python.h>
. Probably that header is approximately the only header in the project that can be assumed to be self-contained.
While it would be possible to devise a mechanism to allow the user to say "do not even try to parse this header as a TU, parse Python.h
instead"; but this would necessarily result in abysmal performance at least when someone edits one of those headers, as most of the entire Python.h
needs to be reparsed. (For users, of course, I'd assume that they don't edit the headers often. For CPython developers, they might.)
Or would it even affect performance for people just using <Python.h>
without going deeper into the private headers?
preamble building is a part of clang, it's complicated and entangled with other features, none of us understand it well enough that making changes is straightforward.
clangd relies in many places on clang's notion of the main file, untangling the concepts of "interesting file" and "entrypoint" is a lot of work, and we don't have great ideas for isolating the complexity
Is this more of a problem of "where do we get that information"? I noticed that there's support for IWYU pragmas for specifying things like "this header is private; if you want its symbols, always include header X instead". That seems very much the same kind of configuration data that would be needed by this, but I think it's currently only used for flagging non-direct includes and automatically adding includes.
Or is it something more nuanced?
- (dons flame-retardant suit) Self-contained files are good software engineering practice (and don't prevent unity builds AFAIK). Most clangd developers are paid by employers who enforce this practice - in part to make tools easier to write. We do spend lots of time on things to make clangd more well-rounded in general even outside what $employer cares about directly, but it affects prioritization for sure, especially for features like this that are probably many months of work.
No arguments here. I'm a big enough fan of self-contained headers and even C++ modules that I'd generally be willing to spend a non-trivial amount of time to clean up things. With projects like CPython, I suspect selling a major header file refactoring to CPython upstream might be an uphill battle, even if they see the value in good IDE support...
from clangd.
FWIW my best guess is:
we'd want to build a preamble for the entrypoint file and then parse everything from there. This involves eliminating from many places the assumptions that a) the preamble is part of the file we care about, b) the file we care about is the "main file" as far as SourceManager is concerned.
a significant challenge with this solution is the performance where the fragment file has
#include
s which are not part of the entrypoint file's preamblethe include graph in the background index or the includer cache in TUScheduler are a reasonable initial step for identifying entrypoints but we'll eventually want something better built by fast include-scanning.
I now see that some of my previous comment's questions were addressed in this comment, which I somehow missed on the first skim... Sorry about the noise!
from clangd.
Related Issues (20)
- Unused header warning for placement new
- headers from pch are not recognized inside project HOT 8
- clangd does not detect C++17 or newer features when compiled with CMake 3.28 + Ninja on Windows HOT 8
- omit `self` when clangd compltes member function with explicit self HOT 4
- Include Cleaner fails for symbols forward-declared and aliased
- Include cleaner recommends removing includes needed for casts to related types
- Unsupported argument 'cortex-a76.cortex-a55' to option '-mcpu='clang(drv_unsupported_option_argument) HOT 1
- clangd find references, refactor rename works only with opened files HOT 6
- Clangd sometimes fails to deduce non-dependent type inside template HOT 3
- Clangd throws error on working code with CUDA HOT 6
- Include Cleaner doesn't include headers needed by default template arguments HOT 1
- Include Cleaner doesn't recognize uses of `operator bool`
- Include Cleaner recommends removing header needed by template instantiation
- Allocated memory is leaked HOT 3
- Missing go-to-definition on auto types deduced as pointers
- <cerrno> is suggested instead of <errno.h> for errno values when `-std=c99`
- Preserve white spaces and new lines from the comments in hover request
- board support package includes HOT 3
- Folding is not available for preprocessor blocks HOT 1
- clangd loaded unexpected compilation database when multiple instances were launched.
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from clangd.