
Comments (22)

Ratuchetp commented on June 5, 2024

On my side, the main source of the problem is that MKL automatically spawns too many threads when evaluating NumPy matrix operations. The problem occurred when the tracker updated the Kalman filter, so I limited the thread count in kalman_filter.py by adding:
import mkl
mkl.set_num_threads(2)
After that it works well, even better than before.

The problem may also be caused by OpenBLAS. You can run:
numpy.show_config()
to confirm which BLAS backend NumPy is linked against.

I'm not sure the problem comes entirely from NumPy, and I'm not sure my solution is reasonable either, but at least it works.
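The two steps in this comment, checking the BLAS backend and capping MKL's threads, can be sketched roughly as follows. The helper name cap_mkl_threads is hypothetical, and the mkl module is only importable when the mkl-service package is installed, so the sketch returns False instead of failing when it is missing.

```python
import numpy as np


def cap_mkl_threads(limit=2):
    """Cap MKL's thread pool; return True if MKL was available (hypothetical helper)."""
    try:
        import mkl  # provided by the mkl-service package
    except ImportError:
        # NumPy is probably linked against OpenBLAS or another BLAS library.
        return False
    mkl.set_num_threads(limit)
    return True


np.show_config()  # prints which BLAS/LAPACK libraries NumPy was built against
mkl_capped = cap_mkl_threads(2)
```

If mkl_capped is False, the MKL call had no effect and a different mechanism is needed for the backend that show_config reports.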

from yolo_tracking.

mikel-brostrom commented on June 5, 2024

I limited the process to a single CPU with:

taskset --cpu-list 0 python3 track.py --source 0

and got the same inference speed as when running:

python3 track.py --source 0

With taskset, just one CPU was used, at 100%; without it, all the CPUs reached ~50% usage.
I need to investigate this further.
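When the tracker is started from another Python program rather than from a shell, roughly the same pinning can be done in-process. This is a minimal sketch assuming Linux: os.sched_setaffinity is not available on every platform, and pin_to_single_cpu is a hypothetical helper name. It picks a CPU the process is already allowed to use instead of hard-coding CPU 0.

```python
import os


def pin_to_single_cpu():
    """Pin the current process to one CPU it is already allowed to run on.

    Hypothetical helper. Returns the chosen CPU id, or None on platforms
    where os.sched_setaffinity does not exist.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    cpu = min(os.sched_getaffinity(0))  # pid 0 means "the calling process"
    os.sched_setaffinity(0, {cpu})
    return cpu
```

Like taskset, this restricts where the process may run; it does not remove the extra threads, which then compete for the single core.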


TonyX19 commented on June 5, 2024

In _cosine_distance, the NumPy arrays should declare a dtype. Change:

if not data_is_normalized:
    a = np.asarray(a) / np.linalg.norm(a, axis=1, keepdims=True)
    b = np.asarray(b) / np.linalg.norm(b, axis=1, keepdims=True)

to:

if not data_is_normalized:
    a = np.array(a, dtype=np.float128) / np.linalg.norm(a, axis=1, keepdims=True)
    b = np.array(b, dtype=np.float128) / np.linalg.norm(b, axis=1, keepdims=True)
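For context, a self-contained sketch of such a function with an explicit dtype is shown below. Everything outside the two normalization lines is an assumption, not code copied from the repository. A likely reason the change lowers CPU load is that NumPy hands np.dot to the multithreaded BLAS library only for standard single- and double-precision types, so extended-precision arrays fall back to a slower single-threaded loop. np.longdouble is used here because np.float128 is not defined on every platform.

```python
import numpy as np


def cosine_distance(a, b, data_is_normalized=False, dtype=np.longdouble):
    """Pairwise cosine distance between the rows of a and the rows of b.

    Illustrative sketch: the dtype argument is an addition for this example.
    """
    a = np.asarray(a, dtype=dtype)
    b = np.asarray(b, dtype=dtype)
    if not data_is_normalized:
        # Scale each row to unit length so the dot product is the cosine.
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.dot(a, b.T)


distances = cosine_distance([[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0]])
```

Here distances has shape (2, 1): the first row is parallel to the query vector and the second is orthogonal to it.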


mikel-brostrom commented on June 5, 2024

Could you let me know if this temporary fix works for you, @fareed945?

fareed945 commented on June 5, 2024

Hey, I have a FastAPI wrapper around it, so I'm not running it from the command line.
Is the approach mentioned above an ideal one?
Ideally we shouldn't be manipulating the CPU usage, correct?

fareed945 commented on June 5, 2024

The CPU usage increases rapidly. Any idea where the extra processing comes from? Is it the entire DeepSort process as a whole?
I couldn't find any memory leaks :(

mikel-brostrom commented on June 5, 2024

CPU affinity was the quickest but also the dirtiest fix I could find in the time I had. YOLOv5 doesn't seem to be the problem, as I can run it separately without triggering the CPU overload. So yes, DeepSort seems to be the problem. I tried cv2.setNumThreads(0) just in case, but that didn't make any difference.

fareed945 commented on June 5, 2024

Any loopholes that you can think of?

fareed945 commented on June 5, 2024

Yeah, that's a solution. But wouldn't it lead to loss of frames from a stream?

fqdeng commented on June 5, 2024

@mikel-brostrom So, is there a solution to this problem now? taskset seems to limit the process to one core, but that is still not good enough.

fqdeng commented on June 5, 2024

https://www.pastefile.com/f8t7t7

PFSHI commented on June 5, 2024

I ran it on a graphics card, and it performed perfectly (using the CUDA builds of torch and torchvision).

fqdeng commented on June 5, 2024

@PFSHI A month ago I used my NVIDIA GTX 960 on an Ubuntu 20.04 desktop and it performed perfectly. But I lost my data and reinstalled Ubuntu 20.04 with a random NVIDIA driver, and that causes problems. Could you share your NVIDIA CUDA version and your Python dependencies? You can use pip freeze to list the dependencies.

hdnh2006 commented on June 5, 2024

I applied the solution given by @mikel-brostrom, using the command taskset --cpu-list 0 python3 track.py --source 0 on Ubuntu. These are the results:

Without taskset:
[image]

With taskset:
[image]

I don't know what happens if you stream from several cameras. I cannot understand why this happens, but it is strange.

fqdeng commented on June 5, 2024

@hdnh2006 Same here, the taskset workaround works. I will use it in my production environment.

mikel-brostrom commented on June 5, 2024

Hi @Ratuchetp! Glad that you could help out with this issue!
Following your response I found: this.

Out of all the environment variables, either of these worked for me:

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

These did not:

os.environ["MKL_NUM_THREADS"] = "1"
os.environ["VECLIB_MAXIMUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"

Can anybody else confirm this by placing either of the working lines at the top of their track.py file?
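A sketch of how the top of track.py might look with the two settings that worked. These variables are read when the numerical libraries are first loaded, so they are set before numpy is imported; setting them afterwards generally has no effect. The constant name THREAD_ENV_VARS is illustrative only.

```python
import os

# Must run before numpy (or any module that imports numpy) is loaded.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS")
for name in THREAD_ENV_VARS:
    os.environ[name] = "1"

import numpy as np  # noqa: E402  (deliberately imported after the env setup)

# Any BLAS-backed call made from here on should use a single thread.
gram = np.dot(np.ones((3, 2)), np.ones((2, 3)))
```

Which variable actually takes effect depends on the BLAS library NumPy is linked against, which may explain why only some of them made a difference here.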


bbeomee commented on June 5, 2024

Hi Ratuchetp and @mikel-brostrom!
import mkl
mkl.set_num_threads(1)
works for me!

But it didn't work from the start: I had to build OpenBLAS on my computer and rebuild NumPy and SciPy.

For the OpenBLAS build and the rest, refer to this link: https://leemendelowitz.github.io/blog/installing-numpy-with-openblas.html

Another problem was that my research environment used an Intel CPU (i7-8700K), while my production environment used an AMD CPU (Threadripper 3990X).

On the Intel CPU, the number of threads could be adjusted easily after installing the mkl packages (pip install mkl, pip install mkl-service). On the AMD CPU, the OpenBLAS library had to be loaded and called directly.

If you are using an Intel CPU, you can adjust the number of threads the way @Ratuchetp described; if you are using an AMD CPU, refer to this: https://stackoverflow.com/a/29582987

With the above method, the number of threads can be set to 1 only in the context where DeepSort runs, which improves performance without affecting other running processes.

So anyone can control OpenBLAS or MKL threading without changing environment variables through the export command or os.environ.

But fundamentally, I don't know why it is faster with only one CPU core than with all CPU cores.
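Scoping the thread limit to the tracking step only, as described above, can also be sketched with the third-party threadpoolctl package. This is a swapped-in technique that nobody in this thread used, and limited_blas_threads is a hypothetical helper name. The sketch degrades to a no-op when threadpoolctl is not installed.

```python
from contextlib import contextmanager

import numpy as np


@contextmanager
def limited_blas_threads(limit=1):
    """Limit BLAS threads (MKL or OpenBLAS) inside the with-block only.

    Yields True if the limit was applied, False if threadpoolctl is missing.
    """
    try:
        from threadpoolctl import threadpool_limits
    except ImportError:
        yield False
        return
    with threadpool_limits(limits=limit, user_api="blas"):
        yield True


# Stand-in for the tracker update; only this block runs with the limit.
with limited_blas_threads(1) as active:
    product = np.ones((4, 3)) @ np.ones((3, 2))
```

Outside the with-block the previous thread settings are restored, so other numerical code in the same process keeps its normal parallelism.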


mikel-brostrom commented on June 5, 2024

Did you try either of these environment variables, @bbeomee:

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"

or was

mkl.set_num_threads(1)

the only thing that made it work for you?

mikel-brostrom commented on June 5, 2024

I will set a maximum of 1 thread for each of the high-performance libraries that could be causing the issue. That way we are on the safe side. Thank you all for contributing to the fix for this bug!

ljl86092297 commented on June 5, 2024

Hello, I tried the method you provided, but it did not fix the CPU being fully occupied. Is there another method that works?

mikel-brostrom commented on June 5, 2024

@ljl86092297 If this is the case, please open a new bug issue explaining how to reproduce this behavior so that we can solve it.

tuonisitake commented on June 5, 2024

The dtype change to _cosine_distance suggested by @TonyX19 works for me! Thank you!
