Comments (16)

chenkovsky commented on August 21, 2024

match is thread safe.


OblackatO commented on August 21, 2024

I analyzed the function after posting this issue and arrived at that conclusion.
I am not familiar with Cython, but do you think it would be possible to release the GIL only for the match function?
This has the following advantages:

  • Python threads using the match function can use multiple cores (because the GIL is released)
  • Python threads can share the same AC automaton

From what I have read, nogil requires that no Python code is executed. However, the match function uses several Python objects, such as xstring or the trie. It would be necessary to change the def statement of match to cdef, and the yield keyword could no longer be part of the function.
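To make the constraint concrete, here is a minimal Cython sketch (illustrative only, not cyac code) of the kind of function that can release the GIL:

# A function may run without the GIL only if it is pure C: C argument
# types, C locals, no Python objects, no exceptions and no yield.
cdef int count_byte(const char* buf, int n, char target) nogil:
    cdef int i
    cdef int hits = 0
    for i in range(n):
        if buf[i] == target:
            hits += 1
    return hits

By contrast, match is a def generator that takes Python objects, so self, xstring and yield all pin it to the GIL.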


nppoly commented on August 21, 2024

@OblackatO could you please give a patch? It seems you know the GIL better than I do.


OblackatO commented on August 21, 2024

Thanks for your answer @nppoly. First of all, let me tell you that your library is probably the best Python library I have found in terms of speed and memory efficiency for Aho-Corasick.

... could you please give a patch?

Before posting this issue I had already tried. I do not think I can. As I said, I am not familiar with Cython, but I learned a bit about it to better understand this library's code. It seems that Cython's nogil feature requires that the code contain no Python objects at all; only then can the GIL be released. This makes perfect sense to me. From the Cython docs we can read that:

... Code in the body of the with-statement must not manipulate Python objects in any way, and must not call anything that manipulates Python objects without first re-acquiring the GIL

This library's code has Python objects, the AC and the trie, which makes me think that in order to release the GIL with Cython's nogil feature, the majority of the library's code would need to be rewritten (but I might be wrong, I am no Cython expert).

Even so, here is what I already tried (before posting this issue).

  • Create a cdef wrapper around the AC's actual match function and use with nogil when calling the wrapper
    • Result: an instant error when building with setup.py. It says there are several Python objects in the function, such as xstring, so with nogil cannot be used there. The yield keyword is also a problem; it is a Python keyword and shows up in the long list of errors I got.

I know this is not a lot, but it is what I could do without starting to rewrite all of the library's objects. What we really need is to make the match method usable with the nogil feature, because that way several threads could share the same AC. But since match is part of a Python object, this is not possible.

Just to conclude this, sharing the AC automaton would have the following advantages:

  • Threads bypass the GIL and effectively use multiple cores
  • Threads can share the same AC, which is a huge advantage. Currently, if we want to find matches in parallel using multiple cores, we need the multiprocessing library. However, it is not possible to share the AC between several Python processes (we can use pipes and the like to send the pickled AC back and forth, but that is too slow), so we need a copy of the AC in each process. If an AC takes 34 MB and there are 6 processes, we end up with 34 MB × 6 = 204 MB in total. With threads we would only have 34 MB, because they share the same virtual address space; however, threads do not use multiple cores because of the GIL. Bypassing the GIL would let threads use multiple cores while sharing the AC. I hope this made sense; let me know if it did not. I can also go into details if needed.


OblackatO commented on August 21, 2024

@nppoly I might have an idea to share the AC using the new shared memory features of Python 3.8, without rewriting anything at all in this library, but I need to ask you something first.
In the file ac.pyx, the function __getstate__() contains:

array_to_bytes(<char*>self.output, self.trie.array_size * sizeof(OutNode)),
            array_to_bytes(<char*>self.fails, self.trie.array_size * sizeof(int)),
            array_to_bytes(<char*>self.key_lens, self.trie.leaf_size * sizeof(unsigned int)),

Are all the elements cast with <char*> pointers?

The same question for the function __setstate__():

self.trie, output, fails, key_lens = data
        self.output = <OutNode*> bytes_to_array(output, self.trie.array_size * sizeof(OutNode))
        self.fails = <int*> bytes_to_array(fails, self.trie.array_size * sizeof(int))
        self.key_lens = <unsigned int*> bytes_to_array(key_lens, self.trie.leaf_size * sizeof(unsigned int))

Are <OutNode*>, <int*> and <unsigned int*> pointers?

If so, this means that when we use __getstate__() we are just creating pointers to the original AC and not copying memory, right?


nppoly commented on August 21, 2024

@OblackatO The cedar trie allocates three arrays and the AC automaton allocates three arrays; they don't allocate any other memory, so sharing the memory of these arrays will work. But array_to_bytes and bytes_to_array actually copy memory (if they didn't copy, a segmentation fault would happen during pickling).
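For illustration, a plausible shape for these helpers (not necessarily the exact cyac code) shows why the copy is inherent to the conversion:

from libc.stdlib cimport malloc
from libc.string cimport memcpy

cdef bytes array_to_bytes(char* arr, size_t size):
    # slicing a C pointer allocates a new Python bytes object and
    # copies the data into it; that allocation is the copy
    return arr[:size]

cdef char* bytes_to_array(bytes data, size_t size):
    # copying out is just as unavoidable: the storage behind a bytes
    # object is owned by Python and is freed when the object is collected
    cdef char* buf = <char*> malloc(size)
    memcpy(buf, <char*> data, size)
    return buf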

Can we use mmap? We could add one method that writes the data into a file and one method that initializes the automaton from an array.
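Something like this minimal sketch (raw_bytes below is a placeholder for the automaton's arrays written in our own format):

import mmap

raw_bytes = b"\x00" * 1024  # placeholder: the automaton's raw arrays

# one process writes the data to a file once
with open("ac_trie", "wb") as f:
    f.write(raw_bytes)

# any process can then map the file; the OS backs every mapping with
# the same physical pages, so no per-process copy of the data is made
with open("ac_trie", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)
view = memoryview(mm)  # zero-copy view to initialize the automaton from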


OblackatO commented on August 21, 2024

... but array_to_bytes and bytes_to_array actually copy memory (if they didn't copy, a segmentation fault would happen during pickling).

OK, good to know. Then I guess my idea will not do any good. I wanted to convert the AC to bytes, share it using shared memory or mmap as you suggested (just writing to memory directly instead of into a file), and then access that piece of memory from other processes. However, because bytes_to_array copies, this gains us nothing: each process would get its own copy of the AC instead of sharing memory. Whenever we convert from bytes back to the actual AC, memory is copied.

The only two possibilities I can think of are to release the GIL with Cython, whose implications we all know, OR to use numba with its [nogil](http://numba.pydata.org/numba-doc/dev/user/jit.html) feature.

Numba is basically a very easy-to-use library that lets us release the GIL, but again, Python code cannot be used, which is expected. So the question now is: in the project description, @nppoly, you wrote "It's implemented by cython, and will be compiled to cpp". Is the actual code called in the match function C++ code? If that is the case, I might be able to make it work with numba.

This would allow us to:

  • Make threads bypass the GIL, effectively using multiple cores while sharing the AC, as sketched below.
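To be clear about what numba's nogil can and cannot do, here is a hedged, self-contained sketch (pure numeric code; numba compiles it in nopython mode and releases the GIL, but it cannot call cyac's match directly):

import numpy as np
from numba import jit
from concurrent.futures import ThreadPoolExecutor

# numba releases the GIL only for functions it compiles in nopython
# mode, i.e. code restricted to numeric types and numpy arrays
@jit(nopython=True, nogil=True)
def count_occurrences(haystack, needle):
    hits = 0
    for i in range(len(haystack) - len(needle) + 1):
        found = True
        for j in range(len(needle)):
            if haystack[i + j] != needle[j]:
                found = False
                break
        if found:
            hits += 1
    return hits

data = np.frombuffer(b"asdpythonasdasdruby", dtype=np.uint8)
pattern = np.frombuffer(b"asd", dtype=np.uint8)

# with the GIL released, these calls run on multiple cores concurrently
with ThreadPoolExecutor(max_workers=4) as ex:
    print(list(ex.map(lambda d: count_occurrences(d, pattern), [data] * 4)))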


nppoly commented on August 21, 2024

If we use mmap, there's no need to copy data. We can write the data in our own format and get rid of pickle. I can implement it; it's easy. @OblackatO


OblackatO commented on August 21, 2024

I see, that is great news. Could you just confirm that you can indeed:

  • convert the whole AC to bytes and from bytes back to an AC without copying any AC content? I guess so, if pickle is not used.

If you can do that, I can implement a shareable AC between several Python processes using Python 3.8's shared_memory. This would be amazing; there is not a single Python library that can do that. However, it would require this project to use Python 3.8 or higher. Is that a problem? If so, we could just put that patch in a separate branch.


nppoly commented on August 21, 2024

@OblackatO I have implemented an API which can initialize the trie or AC from a buffer; you can try it. Python's mmap supports the buffer protocol: https://github.com/python/cpython/blob/74b4eda98b650245216292ace8d7e30d3997baa7/Modules/mmapmodule.c.


OblackatO commented on August 21, 2024

Ok, I will give it a try and do some memory analysis. :)
I will get back to you when I am done.


OblackatO commented on August 21, 2024

I am done with the memory analysis. This is going to be a big post, because there are many things I need to state and that you should take into consideration.

First of all, I used memory-profiler for this analysis. When in doubt about whether memory-profiler's results were accurate, I checked the memory with the ps tool, using the following command:
ps -p <process_id> -o pcpu,rss,pid,command, where rss is the Resident Set Size (simply put, the actual physical memory).

Test 1:

I tried constructing several automatons with the argument copy=False to see how the memory scaled. The code is here.
The memory analysis output the following:

Filename: shared_buffer_test2.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    28    143.9 MiB    143.9 MiB           1   @profile
    29                                         def load_and_search():
    30    143.9 MiB      0.0 MiB           1       with open("ac_trie", "r+b") as bf:
    31    143.9 MiB      0.0 MiB           1           mm = mmap.mmap(bf.fileno(), 0)
    32    143.9 MiB      0.0 MiB           1       ac_first = AC.from_buff(mm, copy=False)  # it shares memory
    33    143.9 MiB      0.0 MiB           1       ac_second = AC.from_buff(mm, copy=False)  # it shares memory
    34    143.9 MiB      0.0 MiB           1       ac_third = AC.from_buff(mm, copy=False)  # it shares memory
    35    143.9 MiB      0.0 MiB           1       ac_four = AC.from_buff(mm, copy=False)  # it shares memory
    36    143.9 MiB      0.0 MiB           1       print("Matches after loading shared buffer:")
    37    143.9 MiB      0.0 MiB           3       for id, start, end in ac_first.match(string_to_search):
    38    143.9 MiB      0.0 MiB           2           print(id, string_to_search[start:end])

Test 1 Conclusion:

We can see that on lines 33, 34 and 35 the memory does not increase when creating new automatons. After analyzing from_buff in ac.pyx, this is the expected behavior: the memory is indeed shared between all the ac variables. Note that the 143.9 MiB is not entirely from the AC; the patterns and the other variables also count.

Test 2:

I did the same thing as in Test 1, but set copy=True.
Memory analysis output:

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    28    144.8 MiB    144.8 MiB           1   @profile
    29                                         def load_and_search():
    30    144.8 MiB      0.0 MiB           1       with open("ac_trie", "r+b") as bf:
    31    144.8 MiB      0.0 MiB           1           mm = mmap.mmap(bf.fileno(), 0)
    32    240.9 MiB     96.1 MiB           1       ac_first = AC.from_buff(mm, copy=True)  
    33    288.9 MiB     48.0 MiB           1       ac_second = AC.from_buff(mm, copy=True)  
    34    336.8 MiB     48.0 MiB           1       ac_third = AC.from_buff(mm, copy=True)  
    35    384.8 MiB     48.0 MiB           1       ac_four = AC.from_buff(mm, copy=True)  
    36    384.8 MiB      0.0 MiB           1       print("Matches after loading shared buffer:")
    37    384.8 MiB      0.0 MiB           3       for id, start, end in ac_first.match(string_to_search):
    38    384.8 MiB      0.0 MiB           2           print(id, string_to_search[start:end])

Test 2 Conclusion:

On lines 33, 34 and 35 the memory increased as expected. This is a clear sign that the copy/no-copy argument is functioning properly. We can also see that each AC copy takes around 48 MiB.

Test 3: Shared memory between Python processes.

Code here
I created 6 Python processes and shared the AC automaton between them. I also tried to find some matches within each process, to make sure the AC automaton worked properly. Before diving into this: @nppoly, Python processes cannot share memory by passing an mmap instance around, because mmap instances cannot be pickled. This is one of the reasons behind the shared_memory features of Python 3.8.
So, when you say in the readme.md that "we can use mmap to load file, share data between processes.", this is true but misleading. We can indeed share memory between processes with mmap, but not by handing the mapping to child Python processes. Let me illustrate this with a quick example. If I create a Python process like this:

p = Process(
            target=get_shared_memory_find_matches,
            args=(
                "process_" + str(x),
                shared_memory_tag,
                ac_size
            )
        )

and if I put an mmap instance in args, it will raise a TypeError because the mmap object cannot be pickled, and pickling is needed to pass the data to the child process. We could use the os.fork() API directly, but that is not the best solution.
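To see the failure in isolation (using the ac_trie file from the earlier tests):

import mmap
import pickle

with open("ac_trie", "r+b") as bf:
    mm = mmap.mmap(bf.fileno(), 0)

try:
    pickle.dumps(mm)
except TypeError as err:
    # mmap objects are not picklable, and multiprocessing pickles every
    # argument it sends to a child process
    print(err)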

In the code I am simply putting the contents of the mmap instance into the shared memory buffer. I also set all unnecessary variables to None before creating the child processes; otherwise they will be (or rather, will almost always be) copied into the child processes. For instance, if the variable ac_patterns, which holds all the patterns used to build the AC, is not set to None, the garbage collector will not kick in and we will see an additional 20 MB in the child processes.

The line:
print("Size of AC automaton:Total patterns: {}bytes:{}patterns".format(ac_size, total_patterns))
outputs the following:
Size of AC automaton:Total patterns: 50476096bytes:759000patterns

50476096 bytes is around 50 MB. Given that the AC automaton holds 759k patterns, this makes total sense to me, mostly because of the memory analysis I did comparing pyahocorasick and cyac.

(Image: memory consumption comparison of cyac vs pyahocorasick)

Finally, I analyzed the memory of each process individually.

Executing search in process_0
...
Filename: shared_buffer_processes.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    41     33.5 MiB     33.5 MiB           1   @profile
    42                                         def get_shared_memory_find_matches(processname, shared_memory_tag, ac_size):
    43     33.5 MiB      0.0 MiB           1       shm = shared_memory.SharedMemory(shared_memory_tag)
    44     33.5 MiB      0.0 MiB           1       AC_in_bytes = shm.buf[0:ac_size]
    45     33.6 MiB      0.1 MiB           1       ac_in_process = AC.from_buff(AC_in_bytes, copy=False)
    46     33.6 MiB      0.0 MiB           1       string_to_search = "asdpythonasdasdruby"
    47     33.6 MiB      0.0 MiB           1       print("Executing search in {}".format(processname))
    48     33.6 MiB      0.0 MiB           3       for id, start, end in ac_in_process.match(string_to_search):
    49     33.6 MiB      0.0 MiB           2           print(id, string_to_search[start:end])
    50                                             #time.sleep(100)
    51     33.6 MiB      0.0 MiB           1       ac_in_process = None
    52     33.6 MiB      0.0 MiB           1       AC_in_bytes.release() # MUST release memory before closing the shm instance, otherwise an error is raised.
    53     33.9 MiB      0.3 MiB           1       shm.close()

I had exactly the same results for the remaining 5 processes. Not only does the memory not increase when getting the shared bytes, but the memory used is also less than the 50 MB of the AC automaton, which resides in the virtual address space of the main process.
The 33.6 MiB can be explained by the fact that the child processes launch a brand-new virtual address space with a brand-new Python interpreter. I am fairly sure of this, because if I increase the number of iterations in:

for x in range(0,1000):
    for key in patterns_dict:
        for value in patterns_dict[key]:
            total_patterns += 1
            ac_patterns.append(value+str(random.randint(0,10000)))

from 1k to 10k, the memory in the child processes remains stable at 34-36 MiB, even though the AC is a lot bigger. This is, in my humble opinion, the strongest proof that the processes are indeed using shared memory: the exact same piece of physical memory allocated for the AC automaton.

I was not so sure about this, but after running the ps command described at the beginning, and seeing the child processes in a tree in htop, the results were exactly the same as memory-profiler's. Literally the same, not just similar.

Overall Conclusion

Finally, after this memory analysis, my humble conclusion is that the shared buffer works as expected. This is just amazing. You have no idea how long I have been trying to do something like this efficiently. You can read more about it in other Python libraries that also implement the Aho-Corasick algorithm, here and here. Thank you for your time, your interest, and for this great library!
I highly appreciate it.

Just one question for you, @nppoly: would you like me to write functions so that users can easily use shared memory? One function would do all the shared memory work and return a string that is the shared memory tag; another would take that tag as an argument and build the AC from shared memory, similar to what I did in Test 3. A rough sketch follows.
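Something like this (hypothetical helper names, Python 3.8+; ac_bytes would be the serialized AC, e.g. the contents of the mmap'ed ac_trie file as in Test 3):

from multiprocessing import shared_memory

from cyac import AC

def publish_ac(ac_bytes, tag):
    # copy the serialized AC once into a named shared memory block
    shm = shared_memory.SharedMemory(name=tag, create=True, size=len(ac_bytes))
    shm.buf[:len(ac_bytes)] = ac_bytes
    return shm  # keep a reference; its name is the tag workers attach to

def ac_from_tag(tag, size):
    # attach to the block by its tag and build a zero-copy AC view on it;
    # the SharedMemory object must stay alive as long as the AC is used
    shm = shared_memory.SharedMemory(name=tag)
    return AC.from_buff(shm.buf[:size], copy=False), shm

The publisher would keep its SharedMemory object alive and call unlink() once all workers are done.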

One final remark: it would be a good idea to state in the readme.md that the match function is thread/process safe, and to revise the sentence "we can use mmap to load file, share data between processes.", which IMO can be misleading, as explained earlier. But that is up to you :)

edit: reworded a few sentences.


nppoly commented on August 21, 2024

For mmap, you should open the file and call mmap in every process; the operating system knows that you are sharing this file.
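A minimal sketch of that pattern, using the from_buff API:

import mmap
from multiprocessing import Process

from cyac import AC

def worker(path):
    # each process opens and maps the file itself; the OS backs every
    # mapping with the same physical pages, so nothing is pickled
    with open(path, "r+b") as bf:
        mm = mmap.mmap(bf.fileno(), 0)
        ac = AC.from_buff(mm, copy=False)  # zero-copy view over the mapping
        for id, start, end in ac.match("asdpythonasdasdruby"):
            print(id, start, end)

if __name__ == "__main__":
    procs = [Process(target=worker, args=("ac_trie",)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()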


nppoly commented on August 21, 2024

"""
Would you like me to write a function so that users can easily use shared memory.
"""

it's most welcome.


OblackatO commented on August 21, 2024

for mmap, you should open file, and call mmap in every process. the operating system knows that you are sharing this file.

Yes, you are right. I was just trying to point out that we cannot do it by passing an mmap as an argument to a child Python process.

it's most welcome.

Ok, I will do it.


OblackatO commented on August 21, 2024

@nppoly I reconsidered implementing the shared_memory part, and I no longer think it is a good idea; it could bring more confusion than good. I added a multiprocessing example to the readme.md and referenced this issue. I made pull request #5 as well. If someone needs shared_memory in the future, I can easily implement it in another branch. Besides, an example can already be found in this issue.

