Git Product home page Git Product logo

Comments (17)

janisz avatar janisz commented on May 14, 2024 2

No is you mean proper aliasing.

You can generate hash collison and find a key that collides with another.

from bigcache.

hobbs avatar hobbs commented on May 14, 2024 1

Not my exact scenario, but an example:

  1. Say you're caching HTML from an origin website
  2. A lot of your pages use a template called "article-layout"
  3. When you push a new version of that template, it'd be nice to delete all items in the cache that use that template

So it could look something like:

// Third argument takes in additional tags
cache.Set(url, pageResponse, []string{ 'article-layout', 'article-id-123' })
cache.Set(url2, pageResponse2, []string{ 'article-layout', 'article-id-456' })
...
cache.RemoveByTag('article-layout') 

It's similar to what some CDNs do with surrogate keys, or cache-tags, etc:
Fastly, Part 1: https://www.fastly.com/blog/surrogate-keys-part-1
Fastly, part 2: https://www.fastly.com/blog/surrogate-keys-part-2
Cloudflare: https://support.cloudflare.com/hc/en-us/articles/206596608-How-to-Purge-Cache-Using-Cache-Tags-Enterprise-only-

from bigcache.

siennathesane avatar siennathesane commented on May 14, 2024

Closing due to stale.

from bigcache.

hobbs avatar hobbs commented on May 14, 2024

I'd also be interested in this

from bigcache.

janisz avatar janisz commented on May 14, 2024

@hobbs If you are interested in this feature we need to think about a solution. Can you propose something?

from bigcache.

cristaloleg avatar cristaloleg commented on May 14, 2024

@janisz I did a small investigation today at morning. I was interested if it can be done via hash function. A small impl is here: https://gist.github.com/cristaloleg/fad647add6226659876fc716796f2a3c
but it will be much simpler to add to the each shard a new map to store aliases.
And additional exported method SetAlias(both cache and setAlias for shard).

from bigcache.

janisz avatar janisz commented on May 14, 2024

I'd like to start with defining the API. How will user add an alias? What is a use case for this feature and what are the requirements? Then we can think if this could be done by extending hash function with alias map or do we need to add aliases map as an independent component.

from bigcache.

hobbs avatar hobbs commented on May 14, 2024

I'm realizing that my requirements are more complicated than I originally thought. I'm actually looking for an alias key that points to an array of values, essentially introducing the concept of a many to many relationship. It's more of an index for specific fields in my value set.

Anyways, I think my requirements are out of scope, sorry waking up this ticket :)

from bigcache.

siennathesane avatar siennathesane commented on May 14, 2024

@hobbs would you be able to expand on your requirements either in this issue or another? I'd be curious to see where you see value in many-many and what you think that would add to this project.

from bigcache.

mfzl avatar mfzl commented on May 14, 2024

Use case I have is to allow cache lookup with two different unique identifiers. A common example is to lookup cached user information by both ID and username/email.

from bigcache.

janisz avatar janisz commented on May 14, 2024

@hobbs Do you need to query by tag?
@faxal Is the second ID added after or when the value is inserted?

from bigcache.

mfzl avatar mfzl commented on May 14, 2024

@janisz the latter, keys will be set when cache entry is inserted.

from bigcache.

siennathesane avatar siennathesane commented on May 14, 2024

@hobbs is the use case just removal? How do you see updates being handled? i.e., if you updated an existing record, cache.Set(url, updatedPageResponse, []string{'article-layout', ...}, what happens if there is now an orphaned tag? Would you require the tags to match when updating an existing record? What about changing tags? How would you see updating the tags on an item without updating the item? Is that a use case?

Currently we only block with mutexes when necessary and abstract that away from users. Updating tags and references, in my mind, is a blocking update as we need to traverse a structure to resolve relationships on-the-fly before we can take any action.

My primary concern here is performance. Managing tags isn't terribly hard, and there are several ways to do it - graphs, slices, trees, structs, etc. The problem is traversing the various structures required for sane tag management range from O(1) to O(n log(n)) or even O(n^x) at worst. It's really hard to be performant across millions of items when you have to traverse a graph with O(n^x) - possible, just very difficult.

from bigcache.

siennathesane avatar siennathesane commented on May 14, 2024

Having thought about this all morning (I have OCD), if we were to focus on this use case, which I believe could be important, I think a graph would likely be the easiest way to manage many<->many relationships that you would see with tags. I am also open to better ideas. Realistically, as long as we don't make an attempt to access any cache data when resolving tags, we could optimise certain operations such as loading, storing, and traversing a node. Depending on the current state of compilers (gccgo or something else), we also might be able to heavily optimise at compilation time so long as we are careful about loop implementations and provide this library as a plugin.

With that in mind, the well-known graph search algorithms would be way too slow to traverse 100k objects to resolve tag relationships. Here are some algorithms I've been looking at, which could have a higher grade of performance when traversing graphs. Here are some of my findings.

Algorithm: BC-GA: A Graph Partitioning Algorithm for Parallel Simulation of Internet Applications
Abstract: Static task mapping of parallel simulation is studied as a graph partitioning problem (GPP). However, the existing algorithms are primarily designed for partitioning finite element meshes or random graphs which are essentially different from the Internet-like topologies used in the field of large scale network simulation. In this paper, we present a new genetic algorithm, BC-GA, for effectively partitioning Internet-like topologies based on, boundary crossing, a quite different principle inspired by the analysis of characteristic of the Internet topology and its related solutions. All operations of this algorithm are novel, such as pizza-cutting and autogamy propagation. We test this algorithm on a large extent of graphs, including the snapshots of the real AS-level Internet, the topologies produced by the Internet model generator and many traditional benchmark graphs. The experiment shows that our algorithm can outperform traditional approaches on partitioning Internet-like topologies and it is also better than or comparable to those on traditional GPP. The experiment also shows that a genetic algorithm can converge quickly if the domain knowledge is fitly combined.
Why: This algorithm is focused on concurrent partition searching. If we are trying to search for a partition (tag), being able to search and resolve all partitions (tags) concurrently would be helpful. This algorithm is also designed for scale, so performance shouldn't be too bad.

Algorithm: An Adaptive Parallel Algorithm for Computing Connected Components
Abstract: We present an efficient distributed memory parallel algorithm for computing connected components in undirected graphs based on Shiloach-Vishkin's PRAM approach. We discuss multiple optimization techniques that reduce communication volume as well as load-balance the algorithm. We also note that the efficiency of the parallel graph connectivity algorithm depends on the underlying graph topology. Particularly for short diameter graph components, we observe that parallel Breadth First Search (BFS) method offers better performance. However, running parallel BFS is not efficient for computing large diameter components or large number of small components. To address this challenge, we employ a heuristic that allows the algorithm to quickly predict the type of the network by computing the degree distribution and follow the optimal hybrid route. Using large graphs with diverse topologies from domains including metagenomics, web crawl, social graph and road networks, we show that our hybrid implementation is efficient and scalable for each of the graph types. Our approach achieves a runtime of 215 seconds using 32K cores of Cray XC30 for a metagenomic graph with over 50 billion edges. When compared against the previous state-of-the-art method, we see performance improvements up to 24x.
Why: This is likely a more simplistic representation of a concurrent/PRAM approach at a breadth-first search of a graph. If we focus on betweenness (tag correlation), this could be a good algorithm that would let us traverse all the nodes quickly, using a set as a correlation of visited nodes. The performance on this should be okay.

Maybe this should be a proposal...

from bigcache.

hobbs avatar hobbs commented on May 14, 2024

Sorry for not getting back to you on your questions sooner. I think I would only be using the tags for deletion. And yes, Orphaned tags are definitely a concern. I may try to solve this in a naive way in the meantime, using a separate data structure to let me look up BigCache hashes by tag.

from bigcache.

siennathesane avatar siennathesane commented on May 14, 2024

@hobbs yeah, a naive way is best for now if you need something sooner. Not sure when I'll get around to PRAM stuff.

from bigcache.

siennathesane avatar siennathesane commented on May 14, 2024

This has been on my mind for awhile, I haven't made any progress on it, though. I still think a graph structure is the right way to go, I'm just thinking through the graph structure.

from bigcache.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.