parallel101 / course
High-Performance Parallel Programming and Optimization - Course Slides
Home Page: https://space.bilibili.com/263032155
License: Other
In slides/thread/mtqueue.hpp, the std::optional try_pop_until function uses m_cv_empty.wait_for rather than wait_until.
Hello Teacher Peng, I'd like to ask about static versus static inline. From what I found online, the two are almost equivalent when applied to functions. My simple understanding is this: either one avoids multiple-definition errors when the function is defined in a header. With inline, all source files that include the header share one and the same function; with static, every including source file gets its own copy of the function, and that copy's internal static variables are per-file. Much of the open-source code I read on GitHub uses static inline when defining functions in headers. What puzzles me: if my understanding above is correct, why add static at all, so that every source file including the header carries its own copy of the function? Wouldn't plain inline, letting all source files share one function, be better? Or is there some deeper consideration behind static inline?
Thank you, Teacher Peng!
In the file 08/06_thrust/01/main.cu, the line #include <thrust/universal_vector.h>
fails: <thrust/universal_vector.h> cannot be found.
None of the official Thrust headers "universal_allocator.h", "universal_ptr.h", or "universal_vector.h" are present in my CUDA include directory. Is something wrong with my CUDA installation (installed in 2022), or were these headers added later? #include <thrust/device_vector.h> and #include <thrust/host_vector.h>
both work fine.
Hello, I have a periodic GPU task. Watching with nvidia-smi, I see that in the interval between one run and the next, the GPU's power draw does not fall back to idle. What causes this, and how can I fix it? My GPU idles at 29 W; during the task, power rises to 120 W, but after the task finishes it stays around 87 W, then rises to 120 W again when the task runs next.
I tried running this code from lesson 07 on Ubuntu 20.04 and got a strange result. What could be the reason?
#include <cstdio>
#include <mutex>

std::mutex mtx1;

int main() {
    if (mtx1.try_lock())
        printf("succeed\n");
    else
        printf("failed\n");
    if (mtx1.try_lock())
        printf("succeed\n");
    else
        printf("failed\n");
    mtx1.unlock();
    return 0;
}
Both prints output succeed, but I expected succeed first and then failed:
succeed
succeed
# ubuntu20.04
# linux 内核: 5.13.0-48-generic
# gcc 11.0
/sparse_data_struct/00.cpp: In instantiation of ‘static void RootGrid<T, Layout>::_write(Node&, int, int, T) [with Node = const HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >; T = char; Layout = HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >]’:
/sparse_data_struct/00.cpp:158:22: required from ‘void RootGrid<T, Layout>::write(int, int, T) const [with T = char; Layout = HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >]’
/sparse_data_struct/00.cpp:197:17: required from here
/sparse_data_struct/00.cpp:152:37: error: passing ‘const HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >’ as ‘this’ argument discards qualifiers [-fpermissive]
152 | auto *child = node.touch(x >> node.bitShift, y >> node.bitShift);
| ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/sparse_data_struct/00.cpp:87:11: note: in call to ‘Node* HashBlock<Node>::touch(int, int) [with Node = PointerBlock<11, DenseBlock<8, PlaceData<char> > >]’
87 | Node *touch(int x, int y) {
| ^~~~~
/sparse_data_struct/00.cpp: In instantiation of ‘void DenseBlock<Bshift, Node>::foreach(const Func&) [with Func = RootGrid<char, HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > > >::_foreach<DenseBlock<8, PlaceData<char> >, main()::<lambda(int, int, char&)> >(DenseBlock<8, PlaceData<char> >&, int, int, const main()::<lambda(int, int, char&)>&)::<lambda(int, int, auto:1*)>; int Bshift = 8; Node = PlaceData<char>]’:
/sparse_data_struct/00.cpp:170:32: required from ‘static void RootGrid<T, Layout>::_foreach(Node&, int, int, const Func&) [with Node = DenseBlock<8, PlaceData<char> >; Func = main()::<lambda(int, int, char&)>; T = char; Layout = HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >]’
/sparse_data_struct/00.cpp:171:25: required from ‘RootGrid<char, HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > > >::_foreach<PointerBlock<11, DenseBlock<8, PlaceData<char> > >, main()::<lambda(int, int, char&)> >(PointerBlock<11, DenseBlock<8, PlaceData<char> > >&, int, int, const main()::<lambda(int, int, char&)>&)::<lambda(int, int, auto:1*)> [with auto:1 = DenseBlock<8, PlaceData<char> >]’
/sparse_data_struct/00.cpp:60:25: required from ‘void PointerBlock<Bshift, Node>::foreach(const Func&) [with Func = RootGrid<char, HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > > >::_foreach<PointerBlock<11, DenseBlock<8, PlaceData<char> > >, main()::<lambda(int, int, char&)> >(PointerBlock<11, DenseBlock<8, PlaceData<char> > >&, int, int, const main()::<lambda(int, int, char&)>&)::<lambda(int, int, auto:1*)>; int Bshift = 11; Node = DenseBlock<8, PlaceData<char> >]’
/sparse_data_struct/00.cpp:170:32: required from ‘static void RootGrid<T, Layout>::_foreach(Node&, int, int, const Func&) [with Node = PointerBlock<11, DenseBlock<8, PlaceData<char> > >; Func = main()::<lambda(int, int, char&)>; T = char; Layout = HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >]’
/sparse_data_struct/00.cpp:171:25: required from ‘RootGrid<char, HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > > >::_foreach<HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >, main()::<lambda(int, int, char&)> >(HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >&, int, int, const main()::<lambda(int, int, char&)>&)::<lambda(int, int, auto:1*)> [with auto:1 = PointerBlock<11, DenseBlock<8, PlaceData<char> > >]’
/sparse_data_struct/00.cpp:102:17: required from ‘void HashBlock<Node>::foreach(const Func&) [with Func = RootGrid<char, HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > > >::_foreach<HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >, main()::<lambda(int, int, char&)> >(HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >&, int, int, const main()::<lambda(int, int, char&)>&)::<lambda(int, int, auto:1*)>; Node = PointerBlock<11, DenseBlock<8, PlaceData<char> > >]’
/sparse_data_struct/00.cpp:170:32: required from ‘static void RootGrid<T, Layout>::_foreach(Node&, int, int, const Func&) [with Node = HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >; Func = main()::<lambda(int, int, char&)>; T = char; Layout = HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >]’
/sparse_data_struct/00.cpp:178:17: required from ‘void RootGrid<T, Layout>::foreach(const Func&) [with Func = main()::<lambda(int, int, char&)>; T = char; Layout = HashBlock<PointerBlock<11, DenseBlock<8, PlaceData<char> > > >]’
/sparse_data_struct/00.cpp:201:15: required from here
/sparse_data_struct/00.cpp:27:28: error: no match for ‘operator*’ (operand type is ‘PlaceData<char>’)
27 | func(x, y, *m_data[x][y]);
|
After modifying DenseBlock and HashBlock as below, it compiles and runs:
template <int Bshift, class Node>
struct DenseBlock {
    static constexpr bool isPlace = false;
    static constexpr int bitShift = Bshift;  // must be int: declaring this bool truncates Bshift to 1
    static constexpr int B = 1 << Bshift;
    static constexpr int Bmask = B - 1;

    Node m_data[B][B];

    Node *fetch(int x, int y) const {
        return &m_data[x & Bmask][y & Bmask];
    }

    Node *touch(int x, int y) {
        return &m_data[x & Bmask][y & Bmask];
    }

    // change_00: pass &m_data[x][y] (a Node *) instead of *m_data[x][y]
    template <class Func>
    void foreach(Func const &func) {
        for (int x = 0; x < B; x++) {
            for (int y = 0; y < B; y++) {
                func(x, y, &m_data[x][y]);
            }
        }
    }
};
template <class Node>
struct HashBlock {
    static constexpr bool isPlace = false;
    static constexpr int bitShift = 0;  // int, for consistency with DenseBlock

    struct MyHash {
        std::size_t operator()(std::tuple<int, int> const &key) const {
            auto const &[x, y] = key;
            return (x * 2718281828) ^ (y * 3141592653);
        }
    };

    // change_01: store std::unique_ptr<Node> instead of Node
    std::unordered_map<std::tuple<int, int>, std::unique_ptr<Node>, MyHash> m_data;

    // change_02: return it->second.get()
    Node *fetch(int x, int y) const {
        auto it = m_data.find(std::make_tuple(x, y));
        if (it == m_data.end())
            return nullptr;
        return it->second.get();
    }

    Node *touch(int x, int y) {
        auto it = m_data.find(std::make_tuple(x, y));
        if (it == m_data.end()) {
            std::unique_ptr<Node> ptr = std::make_unique<Node>();
            auto rawptr = ptr.get();
            m_data.emplace(std::make_tuple(x, y), std::move(ptr));
            return rawptr;
        }
        return it->second.get();
    }

    // change_03: pass unique_node.get() instead of &block
    template <class Func>
    void foreach(Func const &func) {
        for (auto &[key, unique_node] : m_data) {
            auto &[x, y] = key;
            func(x, y, unique_node.get());
        }
    }
};
At runtime, memory usage keeps growing. With N = (2 * 2), valgrind and massif-visualizer show the constructed structure already occupies a full 96 MB. I may have broken something in my changes; please take a look, Teacher Peng.
To make it run on my own platform, I changed the foreach call in main back to match the earlier version of the code:
int count = 0;
a->foreach([&] (int x, int y, char &value) {
    if (value != 0) {
        count++;
    }
});
printf("count: %d\n", count);
Result: all 64 GB of RAM were exhausted, and the program ended in a core dump.
The visualization below was captured by pressing Ctrl+C once the program had consumed about half of memory.
count: 109
main: 14.5131s
Finally, Teacher Peng: is this sample code correct as given? Why does it keep allocating memory until none is left?
I first ran it on WSL; Vmmem filled all 32 GB of RAM and the machine hung.
I then switched to a small server: CPU and memory were likewise nearly maxed out, and the process was killed.
I'm clearly missing something; I just can't figure out why it won't run.
std::string src = "\r\nabc\r\r\r\r\r\r\r\r123456\nABCDEF\r\n\r\n\r\r\r\r";
std::remove_if(src.begin(), src.end(), [](char c) {
    return c == '\r';
});
auto size = src.size();
// I expected: \nabc123456\nABCDEF\n\n
// Actual: src = "\nabc123456\nABCDEF\n\n\nABCDEF\r\n\r\n\r\r\r\r"
I've been following your course, Teacher Peng, and learned that std::string also has iterators, so I tried the remove_if function on one.
Could you help explain why this happens?
My machine is 64-bit Windows 10, using Visual Studio 2022 Preview.
Environment: Ubuntu 22.04
gcc: 9
When compiling, I get /usr/include/c++/9/bits/stl_vector.h(130): error: no instance of constructor "CudaAllocator::CudaAllocator [with T=float]" matches the argument list
detected during instantiation of "std::_Vector_base<_Tp, _Alloc>::_Vector_impl::_Vector_impl() [with _Tp=float, _Alloc=CudaAllocator]"
(337): here
How should I fix this?
#include "print.h"

template <typename T>
class TypeToID {
public:
    static int const ID = -1;
};

template <>
class TypeToID<void *> {
public:
    static int const ID = 1;
};

int main() {
    print(TypeToID<void *>::ID);
    return 0;
}
Linking fails with an undefined symbol:
.rdata$.refptr._ZN8TypeToIDIPvE2IDE[.refptr._ZN8TypeToIDIPvE2IDE]+0x0): undefined reference to `TypeToID<void*>::ID'
The 3D-array vector should be initialized with size n * n * n:
std::vector a(n * n); → std::vector a(n * n * n);
course/05/03_mutex/04/main.cpp
Line 12 in fe22cd6
OS: Win11
cmake version 3.26.4
CUDA version 12.1
Build output:
CMake Warning at C:/Program Files/CMake/share/cmake-3.26/Modules/FindBoost.cmake:1384 (message):
New Boost version may have incorrect or missing dependencies and imported
targets
Call Stack (most recent call first):
C:/Program Files/CMake/share/cmake-3.26/Modules/FindBoost.cmake:1508 (_Boost_COMPONENT_DEPENDENCIES)
C:/Program Files/CMake/share/cmake-3.26/Modules/FindBoost.cmake:2119 (_Boost_MISSING_DEPENDENCIES)
vcpkg/installed/x64-windows/share/boost/vcpkg-cmake-wrapper.cmake:11 (_find_package)
vcpkg/scripts/buildsystems/vcpkg.cmake:813 (include)
vcpkg/installed/x64-windows/share/openvdb/FindOpenVDB.cmake:504 (find_package)
vcpkg/installed/x64-windows/share/openvdb/vcpkg-cmake-wrapper.cmake:10 (_find_package)
vcpkg/scripts/buildsystems/vcpkg.cmake:813 (include)
CMakeLists.txt:16 (find_package)
-- Found Boost: D:/Projects/parallel101/course/09/01_texture/08/vcpkg/installed/x64-windows/include (found version "1.83.0") found components: iostreams regex
-- Found ZLIB: optimized;D:/Projects/parallel101/course/09/01_texture/08/vcpkg/installed/x64-windows/lib/zlib.lib;debug;D:/Projects/parallel101/course/09/01_texture/08/vcpkg/installed/x64-windows/debug/lib/zlibd.lib (found version "1.3.0")
-- Found OpenVDB 10.0.0 at D:/Projects/parallel101/course/09/01_texture/08/vcpkg/installed/x64-windows/lib/openvdb.lib
-- Configuring done (7.6s)
-- Generating done (0.1s)
CMake Warning:
Manually-specified variables were not used by the project:
CMAKE_TOOLCHAIN_FILE
-- Build files have been written to: D:/Projects/parallel101/course/09/01_texture/08/build
D:\Projects\parallel101\course\09\01_texture\08\build>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsof
t Visual Studio\2022\Professional\VC\Tools\MSVC\14.34.31933\bin\HostX64\x64" -x cu -ID:\Projects\parallel101\course\09\01_texture\08. -ID:\Projects\parallel101\course\09\01_
texture\08....\include -I"D:\Projects\parallel101\course\09\01_texture\08\vcpkg\installed\x64-windows\include" -I"D:\Projects\parallel101\course\09\01_texture\08\vcpkg\instal
led\x64-windows\include\Imath" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart s
tatic -std=c++17 --generate-code=arch=compute_52,code=[compute_52,sm_52] --extended-lambda --expt-relaxed-constexpr /EHsc -Xcompiler="/EHsc -Ob0 -Zi" -g -D_WINDOWS -DOPENVDB_D
LL -D_WIN32 -DNOMINMAX -DOPENVDB_ABI_VERSION_NUMBER=10 -DOPENVDB_USE_DELAYED_LOADING -DIMATH_DLL -DTBB_USE_DEBUG -D"CMAKE_INTDIR="Debug"" -D_MBCS -DWIN32 -D_WINDOWS -DOPENVDB
_DLL -D_WIN32 -DNOMINMAX -DOPENVDB_ABI_VERSION_NUMBER=10 -DOPENVDB_USE_DELAYED_LOADING -DIMATH_DLL -DTBB_USE_DEBUG -D"CMAKE_INTDIR="Debug"" -Xcompiler "/EHsc /W1 /nologo /Od
/FS /Zi /RTC1 /MDd " -Xcompiler "/Fdmain.dir\Debug\vc143.pdb" -o main.dir\Debug\main.obj "D:\Projects\parallel101\course\09\01_texture\08\main.cu"
nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
C:\Program Files\Microsoft Visual Studio\2022\Professional\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.1.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU
Computing Toolkit\CUDA\v12.1\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Professional\VC\Tools\MSVC\14.34.31933\bin\HostX64\x64" -x cu
-ID:\Projects\parallel101\course\09\01_texture\08. -ID:\Projects\parallel101\course\09\01_texture\08....\include -I"D:\Projects\parallel101\course\09\01_texture\08\vcpkg\insta
lled\x64-windows\include" -I"D:\Projects\parallel101\course\09\01_texture\08\vcpkg\installed\x64-windows\include\Imath" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.
1\include" --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -std=c++17 --generate-code=arch=compute_52,code=[compute_52,sm_52] --extended-lambda
--expt-relaxed-constexpr /EHsc -Xcompiler="/EHsc -Ob0 -Zi" -g -D_WINDOWS -DOPENVDB_DLL -D_WIN32 -DNOMINMAX -DOPENVDB_ABI_VERSION_NUMBER=10 -DOPENVDB_USE_DELAYED_LOADING -DIMATH_
DLL -DTBB_USE_DEBUG -D"CMAKE_INTDIR="Debug"" -D_MBCS -DWIN32 -D_WINDOWS -DOPENVDB_DLL -D_WIN32 -DNOMINMAX -DOPENVDB_ABI_VERSION_NUMBER=10 -DOPENVDB_USE_DELAYED_LOADING -DIMATH_
DLL -DTBB_USE_DEBUG -D"CMAKE_INTDIR="Debug"" -Xcompiler "/EHsc /W1 /nologo /Od /FS /Zi /RTC1 /MDd " -Xcompiler "/Fdmain.dir\Debug\vc143.pdb" -o main.dir\Debug\main.obj "D:\Proj
ects\parallel101\course\09\01_texture\08\main.cu"" exited with code 1. [D:\Projects\parallel101\course\09\01_texture\08\build\main.vcxproj]
Thank you!
unsigned long long int i1 = -1;
print(i1);
Expected:
-1
Actual:
18446744073709551615
Environment: VS2022, CUDA 12.2
The code is from lesson 08, section 04
Line 7 in 2d30da6
Error message:
[build] MSBuild version 17.4.0+18d5aef85 for .NET Framework
[build] Compiling CUDA source file ..\src\allocator.cu...
[build]
[build] C:\Dev\mgxpbd\build>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\bin\HostX64\x64" -x cu -IC:\Dev\mgxpbd\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\include" --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static --generate-code=arch=compute_52,code=[compute_52,sm_52] -std=c++17 -Xcompiler="/EHsc -Ob0 -Zi" -g -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W1 /nologo /Od /FS /Zi /RTC1 /MDd " -Xcompiler "/Fdallocator.dir\Debug\vc143.pdb" -o allocator.dir\Debug\allocator.obj "C:\Dev\mgxpbd\src\allocator.cu"
[build] C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vector(2125): error : no suitable user-defined conversion from "CudaAllocator<int>" to "CudaAllocator<std::_Container_proxy>" exists [C:\Dev\mgxpbd\build\allocator.vcxproj]
[build] auto&& _Alproxy = static_cast<_Rebind_alloc_t<_Alty, _Container_proxy>>(_Al);
[build] ^
[build] detected during:
[build] instantiation of "void std::vector<_Ty, _Alloc>::_Construct_n(std::vector<_Ty, _Alloc>::size_type, _Valty &&...) [with _Ty=int, _Alloc=CudaAllocator<int>, _Valty=<>]" at line 683
[build] instantiation of "std::vector<_Ty, _Alloc>::vector(std::vector<_Ty, _Alloc>::size_type, const _Alloc &) [with _Ty=int, _Alloc=CudaAllocator<int>]" at line 30 of C:\Dev\mgxpbd\src\allocator.cu
[build]
[build] C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vector(832): error : no suitable user-defined conversion from "CudaAllocator<int>" to "CudaAllocator<std::_Container_proxy>" exists [C:\Dev\mgxpbd\build\allocator.vcxproj]
[build] auto&& _Alproxy = static_cast<_Rebind_alloc_t<_Alty, _Container_proxy>>(_Getal());
[build] ^
[build] detected during instantiation of "std::vector<_Ty, _Alloc>::~vector() noexcept [with _Ty=int, _Alloc=CudaAllocator<int>]" at line 30 of C:\Dev\mgxpbd\src\allocator.cu
[build]
[build] C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vector(833): error : no instance of function template "std::_Delete_plain_internal" matches the argument list [C:\Dev\mgxpbd\build\allocator.vcxproj]
[build] argument types are: (<error-type>, std::_Container_proxy *)
[build] _Delete_plain_internal(_Alproxy, ::std:: exchange(_Mypair._Myval2._Myproxy, nullptr));
[build] ^
[build] detected during instantiation of "std::vector<_Ty, _Alloc>::~vector() noexcept [with _Ty=int, _Alloc=CudaAllocator<int>]" at line 30 of C:\Dev\mgxpbd\src\allocator.cu
[build]
[build] 3 errors detected in the compilation of "C:/Dev/mgxpbd/src/allocator.cu".
[build] allocator.cu
[build] C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\BuildCustomizations\CUDA 12.2.targets(799,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\bin\HostX64\x64" -x cu -IC:\Dev\mgxpbd\include
How do I get CMake to generate the XXXTargets.cmake file?
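For reference, a hedged sketch of the usual pattern (the target name mylib and the paths are made up for illustration): install(TARGETS ... EXPORT ...) associates a target with an export set, and install(EXPORT ...) is the command that actually generates and installs the Targets.cmake file:

```cmake
# Hypothetical library name "mylib"; adjust names and paths to your project.
add_library(mylib src/mylib.cpp)
target_include_directories(mylib PUBLIC
    $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
    $<INSTALL_INTERFACE:include>)

# Associate the target with the export set "mylibTargets"...
install(TARGETS mylib EXPORT mylibTargets)
install(DIRECTORY include/ DESTINATION include)

# ...and this generates and installs mylibTargets.cmake, which downstream
# projects load (typically via a mylibConfig.cmake that includes it).
install(EXPORT mylibTargets
        NAMESPACE mylib::
        DESTINATION lib/cmake/mylib)
```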
After generating a Visual Studio 2022 project under 04_sugar\01 of lesson 08, the build fails with:
"no suitable user-defined conversion from "CudaAllocator<int>" to "CudaAllocator<std::_Container_proxy>" exists". Teacher Peng, how should this be fixed?
Does Intel's multithreading framework (TBB) run on ARM, i.e. on an Apple M1 Mac?
Lines 32 to 35 in c8787cf
The approach in the slides here is complicated and error-prone; we can use the safer approach below instead:
//C* raw_p = p.get(); // no need
func(std::make_unique<C>(*p)); // deep copy
p->do_something(); // OK, runs normally
Although std::unique_ptr deletes its copy constructor and copy assignment operator, we can still copy a std::unique_ptr in a roundabout way by dereferencing it.
Deep-copy examples:
std::unique_ptr<std::string> up1(std::make_unique<std::string>("Good morning"));
// copy construct!
std::unique_ptr<std::string> up2(std::make_unique<std::string>(*up1));
// safe copy construct!
std::unique_ptr<std::string> up3(up1 ? std::make_unique<std::string>(*up1) : nullptr);
// copy assignment!
up2 = std::make_unique<std::string>(*up1);
// safe copy assignment!
up3 = up1 ? std::make_unique<std::string>(*up1) : nullptr;
Other examples:
The following code does not work in the clang15 + MSVC (VS2022) configuration:
#if defined(_MSC_VER)
size_t pos = s.find(',');
pos += 1;
size_t pos2 = s.find('>', pos);
#else
Before this block executes, s holds: "const char *__cdecl function-name [T = enum-type, N = enum-type::enumerator]"
I changed it to the following, which now works correctly:
#if defined(_MSC_VER) && !defined(__clang__)
size_t pos = s.find(',');
pos += 1;
size_t pos2 = s.find('>', pos);
#elif defined(__clang__)
size_t pos = s.find("N = ");
pos += 1;
size_t pos2 = s.find(']', pos);
#else
size_t pos = s.find("N = ");
pos += 4;
size_t pos2 = s.find_first_of(";]", pos);
#endif
(We can split these up in more detail once more people join.)
p01 = 80min = 5min * 16
p02 = 135min = 5min * 27
p03 = 110min = 5min * 22
p04 = 112min = 5min * 22
I currently estimate that 5 minutes of video corresponds to roughly 30-40 sentences; since I'm very familiar with the material, proofreading that much takes me about half an hour. Times are for reference only.
Are the following two lines written in the wrong order?
Lines 13 to 14 in 3940bba
Shouldn't we first take the raw pointer, and only then hand over ownership with std::move, like this:
child->m_parent = parent.get();
parent->m_child = std::move(child); // transfer ownership of child to parent
In my code, under VS2022, every place that uses CudaAllocator fails to compile with "no instance of constructor "CudaAllocator::CudaAllocator [with T=int]" matches the argument list". Why might that be?
Hi, Teacher Peng. I have some questions about the results of the 07/03_prefetch/06 example; corrections welcome.
My platform is an Intel i5-13500, Ubuntu 24.04, gcc 13.2.0.
When running the 07/03_prefetch/06 example,
I only get results similar to the course's after removing #pragma omp parallel for from the example. Does #pragma omp parallel for do any optimization beyond parallelization?
In the results below, BM_write_stream_then_read and BM_write_streamed take about the same time, so the read seems to have no effect on the stream instructions:
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_read 25228152 ns 18180668 ns 38
BM_write 32696238 ns 25309548 ns 33
BM_write_streamed 19530899 ns 17132181 ns 36
BM_write_stream_then_read 19586335 ns 17525509 ns 43
BM_write_streamed_ps 19550735 ns 14485110 ns 39
BM_write_streamed_ps_skipped 37094026 ns 26238143 ns 26
BM_read_and_write 36829027 ns 33520956 ns 22
In the results below, BM_write_stream_then_read takes significantly longer than BM_write_streamed:
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_read 38213301 ns 38207623 ns 19
BM_write 52209723 ns 52203705 ns 13
BM_write_streamed 34738316 ns 34735390 ns 20
BM_write_stream_then_read 40930259 ns 40927256 ns 17
BM_write_streamed_ps 17725541 ns 17724305 ns 36
BM_write_streamed_ps_skipped 36891533 ns 36889477 ns 19
BM_read_and_write 44972351 ns 44969916 ns 12