netflix / evcache

A distributed in-memory data store for the cloud

License: Apache License 2.0

Java 99.79% Shell 0.21%

evcache's Introduction

EVCache

EVCache is a memcached- and spymemcached-based caching solution, mainly used on AWS EC2 infrastructure to cache frequently used data.

EVCache is an abbreviation for:

Ephemeral - The data stored is for a short duration as specified by its TTL (Time To Live).
Volatile - The data can disappear any time (Evicted).
Cache - An in-memory key-value store.

evcache's Issues

NPE in EVCacheKetamaNodeLocatorConfiguration when using simple node list provider

Hi. We are hitting an NPE at https://github.com/Netflix/EVCache/blob/master/evcache-client/src/main/java/com/netflix/evcache/pool/EVCacheKetamaNodeLocatorConfiguration.java#L59
We have "use.simple.node.list.provider=true" and don't register our cache nodes in Eureka, so DiscoveryClient.getApplication(appId) obviously returns null.
It looks like a bug. EVCacheKetamaNodeLocatorConfiguration seems to be unaware of SimpleNodeListProvider. Am I right?
Could you think of any possible workaround until it's fixed?

EVCacheNodeImpl memory leak

Hi. We are experiencing a possible memory leak in EVCacheNodeImpl.
A full GC triggered via JMX didn't free a sufficient amount of memory.
Initial analysis showed that most of the memory is held by numerous Object[] arrays. References to those arrays are stored in multiple instances of EVCacheNodeImpl.
The problem happened twice, independently, in different EvCache-powered applications. A couple of screenshots from the heap dumps are attached.
EvCache version is 4.38.0
Does this tell you anything? Do you have any guess about possible causes and workarounds?

How to improve the set/delete performance?

We use evcache with one memcached server node. We see many EVCache.set/delete operations that take more than 500ms.
I'm not sure whether we have reached a performance limit. Here is some monitoring data.
Key total number in memcache: 700k
QPS: 400 to 600

EvCache - Eureka

I would like to ask a question about the EvCache and Eureka setup.
I spent quite a while checking EvCache: testing it and reading about different setups. At this moment, I am not sure what the best implementation with Eureka is.

Basically, the sample deployment on the wiki page explains how to run this on localhost (or on EC2 instances) with 2 replicas (with 2 shards each), but with hard-coded memcached replica/shard endpoints.

If I want to discover the memcached servers dynamically, what do I need to do to initialize the EvCache client?

Right now, for example, in the sample deployment we create an EvCache instance as follows:
deploymentDescriptor = "SERVERGROUP1=localhost:11211;SERVERGROUP2=localhost:11212";
System.setProperty("EVCACHE_APP1-NODES", deploymentDescriptor);
evCache = new EVCache.Builder().setAppName("EVCACHE_APP1").build();

With Eureka, do we need to call the discovery service manually and then recreate the evCache instance, or is this done automatically? Is there a technical wiki document that explains this setup in more detail?

Large amount of PoolRefresher threads

Hello,
We are using v4.76.0 in standalone (non-Eureka) mode with a two-node memcached cluster.
We hit a situation where several thousand PoolRefresher threads remain in the BLOCKED state

at com.netflix.evcache.pool.EVCacheClientPool.refresh(EVCacheClientPool.java:680)
at com.netflix.evcache.pool.EVCacheClientPool.access$200(EVCacheClientPool.java:47)
at com.netflix.evcache.pool.EVCacheClientPool$7.run(EVCacheClientPool.java:859)

waiting for the lock on com.netflix.evcache.pool.EVCacheClientPool, which is owned by a single PoolRefresher.
Comparing samples, we can see that this is not a deadlock: the PoolRefresher that owns the lock changes constantly. All the PoolRefresher threads relate to a single memcached node.
Netstat shows that the connections to this node are in SYN_SENT state.
Is there a workaround for this situation?
From the code I can see that this is the result of EVCacheClientPool#refreshAsync. Would it make sense to limit the number of concurrent refreshAsync operations?
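
One way to picture the limit suggested above is a guard that lets a single refresh run at a time and drops the requests that arrive while it is running. This is only a sketch: SingleFlightRefresher and its methods are hypothetical names and are not part of EVCache, and the real refresh logic is stood in for by a plain Runnable.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical guard, not part of EVCache: coalesces overlapping refresh requests.
final class SingleFlightRefresher {
    private final AtomicBoolean inFlight = new AtomicBoolean(false);
    private final AtomicInteger skipped = new AtomicInteger();
    private final Runnable refresh;

    SingleFlightRefresher(Runnable refresh) {
        this.refresh = refresh;
    }

    /** Runs the refresh unless one is already running; returns true if it ran. */
    boolean tryRefresh() {
        if (!inFlight.compareAndSet(false, true)) {
            // Another refresh holds the flag: skip instead of queueing a blocked thread.
            skipped.incrementAndGet();
            return false;
        }
        try {
            refresh.run();
            return true;
        } finally {
            inFlight.set(false);
        }
    }

    /** Number of refresh requests dropped because one was already in flight. */
    int skippedCount() {
        return skipped.get();
    }
}
```

With a guard like this, a slow refresh (for example one stuck on connections in SYN_SENT) would cause later requests to return immediately rather than pile up as blocked threads. Whether dropping a refresh is acceptable depends on how the pool uses the result.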

Releases on maven central?

There don't appear to be artifacts for EVCache on maven central. Can we get builds up there?

Consistent Hashing Wikipedia link is broken

The links to Consistent Hashing point to https://en.wikipedia.org/wiki/Consistent_hashing/ but that link is broken because of the trailing slash. I updated them to https://en.wikipedia.org/wiki/Consistent_hashing in my "fork" here: https://github.com/Yarles404/EVCache-wiki

tomcat 7/8 crashes when using the evcache with much data

We use evcache as the memcached client (one node). Recently, Tomcat 7 (we have since upgraded to Tomcat 8.5) has gone down several times every day. I have spent a lot of time on it, but I found only a few clues in the Tomcat log (there is no hs_err_pid file; I think the JVM crashed before it could write the log file).

Could you help me on this issue?

java runtime error.

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007f9959e43e10, pid=15465, tid=0x00007f9829cec700

JRE version: Java(TM) SE Runtime Environment (8.0_151-b12) (build 1.8.0_151-b12)

Java VM: Java HotSpot(TM) 64-Bit Server VM (25.151-b12 mixed mode linux-amd64 compressed oops)

Problematic frame:

C [libc.so.6+0x81e10]

2017-11-20 23:10:39.453 INFO com.netflix.evcache.EVCacheTranscoder: Compression increased the size of [B from 136 to 151

[ timer expired, abort... ]

Sometimes Tomcat crashes while evcache is reconnecting.
Here is the log.
2017-11-20 20:58:18.970 INFO net.spy.memcached.EVCacheConnection: Reconnecting due to exception on {QA sa=m-bp175bd9d8160bc4.memcache.rds.aliyuncs.com/10.50.117.204:11211, #Rops=0, #Wops=0, #iq=54, topRop=null, topWop=null, toWrite=0, interested=1}
java.lang.IllegalStateException: No read operation.
at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:842)
at net.spy.memcached.MemcachedConnection.handleReadsAndWrites(MemcachedConnection.java:732)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:697)
at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:450)
at net.spy.memcached.EVCacheConnection.run(EVCacheConnection.java:50)
2017-11-20 20:58:18.972 WARN net.spy.memcached.EVCacheConnection: Closing, and reopening {QA sa=m-bp175bd9d8160bc4.memcache.rds.aliyuncs.com/10.50.117.204:11211, #Rops=0, #Wops=0, #iq=87, topRop=null, topWop=null, toWrite=0, interested=1}, attempt 0.

Is the project abandoned?

The project hasn't seen any official updates for the past 10 months. Many announced features, such as the EVCache server and remote cache invalidation never made it to the open source branch. Any plans on supporting these features soon?

Multi-region EVCache

Hi, in reading about EVCache, I see mention of multiple availability zones, but no mention of multiple regions. Does EVCache work in a multi-region system?

thanks, Mitchell

SASL auth: bucket user/password

Does your library support SASL?

EVCacheClient doesn't provide incr/decr memcache operations

MemcachedClient has efficient operations to increment and decrement cached values.
Is it possible to expose them through EVCacheClient, the way set/get/touch are?

SimpleNodeListProvider obtains node list from System properties instead of ConfigurationManager... again

Hi.
Several months ago I created issue #28 about the same problem. My PR was approved and merged by @smadappa.
Now I want to upgrade to EvCache 4.75.0, but see that you've reverted my fix in ec41e79#commitcomment-21150671 (version https://github.com/Netflix/EVCache/releases/tag/v4.49.0)
What is the reason? Did it break anything?
Thanks in advance!

Netflix

Using Memcached with Prana

I am having difficulty configuring the health URL "prana.host.healthcheck.url" for memcached. I have one memcached instance running on port 11211, and Prana is running on 8078. My understanding is that this URL should be reachable, but memcached doesn't have any direct URL that Prana can poll. Please advise.

Possible bug - Multi-Zone fallback

evcache-client-1.0.5

I am using the ZoneClusteredEVCacheClientPoolImpl with 1 client instance and 6 memcached servers (3 in each zone). If I write a key to both zones and then kill the primary memcached server, the next read for that key goes to the primary first.

It then realizes that the primary is down and throws an exception, which is caught in EVCacheImpl.get(String key, EVCacheTranscoder tc), where it does the following check:

Line 244
catch (Exception ex) {
    // TODO: Add counter for exception?
    if (log.isDebugEnabled()) {
        log.debug("Exception while getting data for key : " + key, ex);
    }
    if (!_throwException.get()) {
        return null;
    }
    throw new EVCacheException("Exception getting data for key : " + key, ex);
}
It then returns null.

The next request for this key goes to the secondary and succeeds. Should it not try the secondary rather than returning null?
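
The behaviour being asked for can be sketched as a read that walks the zones in order and only returns null once every zone has missed or failed. ZoneFallbackReader is a hypothetical helper written for illustration; it does not use or describe EVCache's own classes, and each zone is represented by a plain Function from key to value.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not EVCache's API: tries each zone's reader in order.
final class ZoneFallbackReader {
    private ZoneFallbackReader() {}

    /**
     * Returns the first non-null value from the readers, skipping readers that throw.
     * Returns null if every zone misses or fails.
     */
    static <V> V get(String key, List<Function<String, V>> zoneReaders) {
        for (Function<String, V> reader : zoneReaders) {
            try {
                V value = reader.apply(key);
                if (value != null) {
                    return value;
                }
            } catch (RuntimeException e) {
                // Zone unavailable: fall through to the next zone instead of returning null.
            }
        }
        return null;
    }
}
```

Passing the primary zone's reader first and the secondary's second would make the very first read after a primary failure succeed, at the cost of one extra round trip on that request.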

Question: Zone Based Replication

I am a bit unclear on how this should work.

Test
Send 10,000 objects into memcache with a client in Zone A.

Questions

1. Will keys/values get distributed to 1 node in zone A and 1 node in zone B (at random)?
2. If I kill 2.2.2.2,3.3.3.3 and 4.4.4.4,5.5.5.5,6.6.6.6 should I still be able to get 10,000 objects back? This appears to be the case from my testing, but I don't understand the logic that redistributes the values.
3. If the answer to question 2 is yes, can you explain how the replication works? :)

Config
MultiZone EVCache setup
ZoneFallback=true
zone.a=1.1.1.1,2.2.2.2,3.3.3.3
zone.b=4.4.4.4,5.5.5.5,6.6.6.6

How does EVCache handle race condition caused by ReplicationAgent

Hi,

It looks like EVCache does replication through Kafka and HTTPS. This means the replication messages can arrive out of order.
For example, SET K1=V1, DEL K1, SET K1=V2 can be received by another datacenter as SET K1=V2, DEL K1, SET K1=V1, which results in a stale value.
Does EVCache handle that? And what kind of consistency does it guarantee?
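
To make the race concrete, here is a sketch of one common way a receiver can tolerate reordering: every replicated operation carries a version from the origin, deletes are kept as tombstones, and anything older than what was already applied is ignored (last-write-wins). This is an assumption for illustration only. It does not describe what EVCache or its ReplicationAgent actually does, and VersionedReplica is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical receiver-side store; illustrates last-write-wins, not EVCache internals.
final class VersionedReplica {
    private static final class Entry {
        final long version;
        final String value; // null marks a delete (tombstone)

        Entry(long version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Applies a replicated SET; ignored if a newer operation was already applied. */
    void set(String key, String value, long version) {
        apply(key, value, version);
    }

    /** Applies a replicated DELETE by storing a tombstone with its version. */
    void delete(String key, long version) {
        apply(key, null, version);
    }

    /** Returns the current value, or null if the key is absent or deleted. */
    String get(String key) {
        Entry e = entries.get(key);
        return e == null ? null : e.value;
    }

    private void apply(String key, String value, long version) {
        Entry current = entries.get(key);
        if (current == null || version > current.version) {
            entries.put(key, new Entry(version, value));
        }
    }
}
```

With versions 1, 2 and 3 assigned at the origin to SET K1=V1, DEL K1 and SET K1=V2, the reordered delivery from the example above still ends with K1=V2. The scheme relies on the origin producing comparable versions, which is its own problem when clocks differ between datacenters.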

Two issues on the evcache

1. The evcache.EVCacheClientPool.poolSize=2 setting causes memcached to become unstable.
If we set poolSize to 2, the cache client becomes unstable. After several hours or days of running, with about 1M entries in the cache server, the get/set/delete operations become slower and slower, taking tens of seconds.
2. By default, evcache writes too many logs at info level.
After we upgraded to 4.131.0, evcache writes so much log data that our log database fills up easily, which also slows down our servers.

Does evcache plan to support the memcached namespace?

In our scenario, we need to flush several cache entries that are associated with one value. For example, for a user's profile, score and other user info, when the user updates his/her info we need to update all the cache entries related to that user.
Currently, we use another cache entry to store all of the user's keys. We then read the keys from that entry and flush them. We have run into some problems with this solution.

Sometimes the entry holding the keys has expired or been evicted, so we cannot get the cache keys to flush.
The performance is not high. It takes 3 operations to store one item in the cache (1 query and 1 set for the key-list entry, and 1 more to store the data).
The key-list entry may be dirty because several servers update it.

Do you have the same scenario? If so, how do you handle it?
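
A widely used memcached pattern for this kind of group invalidation is to keep a generation number per namespace and include it in every key, so that bumping the generation makes all old keys unreachable at once. The sketch below uses a ConcurrentHashMap as a stand-in for memcached; NamespacedCache and its key layout are hypothetical and are not an EVCache feature.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper over a plain map standing in for memcached; not an EVCache feature.
final class NamespacedCache {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    private String generationKey(String ns) {
        return "ns:" + ns + ":gen";
    }

    // Reads the namespace's current generation, creating it on first use.
    private String generation(String ns) {
        return store.computeIfAbsent(generationKey(ns), k -> "1");
    }

    // Builds the real cache key: namespace, current generation, then the caller's key.
    private String dataKey(String ns, String key) {
        return ns + ":" + generation(ns) + ":" + key;
    }

    void set(String ns, String key, String value) {
        store.put(dataKey(ns, key), value);
    }

    String get(String ns, String key) {
        return store.get(dataKey(ns, key));
    }

    /** Invalidates every entry in the namespace by bumping its generation. */
    void invalidate(String ns) {
        store.merge(generationKey(ns), "2",
                (old, ignored) -> String.valueOf(Long.parseLong(old) + 1));
    }
}
```

Against a real memcached server the bump would typically be an incr on the generation key, and the stale entries are simply left to expire or be evicted. Each read costs one extra lookup for the generation, and if the generation key itself is evicted it must be re-created with a value that was never used before (for example a timestamp), otherwise old entries could become visible again.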

Asynchronous Gets

Hello! I was looking at the evcache-client to use as a memcached client. When playing around with it, I noticed that there is an asynchronous get method, but that it's deprecated. I see that there are set methods that return futures, but the get does not. I was wondering whether, using this library, it is still possible to perform get operations asynchronously (without using the deprecated method), and if so, what the recommended way to do that is. Thanks!
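
Independently of what the library offers, a blocking get can always be moved off the calling thread with a CompletableFuture. The sketch below is a generic adapter under that assumption: AsyncGets is a hypothetical name, and the blocking lookup is passed in as a Function rather than tied to any EVCache method.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical adapter: wraps any blocking lookup; the EVCache client is not referenced here.
final class AsyncGets {
    private AsyncGets() {}

    /** Runs the blocking lookup on the given executor and returns a future for its result. */
    static <V> CompletableFuture<V> getAsync(String key,
                                             Function<String, V> blockingGet,
                                             Executor executor) {
        return CompletableFuture.supplyAsync(() -> blockingGet.apply(key), executor);
    }
}
```

A caller would pass something like a lambda around the client's synchronous get plus a bounded thread pool. This still occupies a pool thread for the duration of each call, so it trades caller-thread blocking for pool capacity rather than giving true non-blocking I/O.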

Cache calls taking 10 seconds

I have deployed evcache to a highish-load production environment, and we appear to have had some kind of network glitch where cache calls ended up taking 10 seconds per call (or multiples of 10, i.e. 20 or 30 seconds). Even though I had set the max timeout to 100ms, this resulted in threads backing up and the server grinding to a halt.

As it was production we had very limited logging, and after extensive testing in QA I was unable to replicate the issue. Would anyone have any idea what could cause this?

evcache version:1.0.5
spymemcache version:2.7.3

ConfigurationManager.getConfigInstance();
EVCacheClientPoolManager.getInstance().initEVCache(applicationName);

evCache = (new EVCache.Builder()).setAppName(applicationName).enableZoneFallback().build();

Config
EC2_AVAILABILITY_ZONE=zone1
${evcache.pool.provider}=com.netflix.evcache.pool.standalone.ZoneClusteredEVCacheClientPoolImpl
${evcache.application.name}.${evcache.server.zone.a.name}.EVCacheClientPool.hosts">${evcache.server.zone.a.hostnames}=server1,server2,server3
${evcache.application.name}.${evcache.server.zone.b.name}.EVCacheClientPool.hosts">${evcache.server.zone.b.hostnames}=server4,server5,server6
${evcache.application.name}.EVCacheClientPool.poolSize">${evcache.connection.poolsize}=1
${evcache.application.name}.EVCacheClientPool.readTimeout">${evcache.read.timeout}=100

Logs Before problem
We received a number of these warnings from a heartbeat thread we had running
MemcachedConnection] - Could not redistribute to another node, retrying primary node for key11111111

Then about 1 minute later (during which there were a number of successful cache calls) we started having the issue.

Logs around problem time
[ n.s.m.MemcachedConnection] - Closing, and reopening {QA sa=/serverip_1:serverport, #Rops=2, #Wops=162, #iq=0, topRop=Cmd: -1 Opaque: -1 Keys: , topWop=Cmd: 0 Opaque: 1865615 Key: Anotherkey_V11, toWrite=0, interested=5}, attempt 0.
[ n.s.m.p.b.BinaryMemcachedNodeImpl] - Discarding partially completed op: Cmd: -1 Opaque: -1 Keys:
[5ad1c8ab56d9445fa5aa4b0034c8c93d n.s.m.MemcachedConnection] - Could not redistribute to another node, retrying primary node for Anotherkey2_V11.
[ n.s.m.MemcachedConnection] - Closing, and reopening {QA sa=/serverip_2:serverport, #Rops=2, #Wops=220, #iq=0, topRop=Cmd: 1 Opaque: 1865530 Key: cckey_V11 Cas: 0 Exp: 0 Flags: 1 Data Length: 222, topWop=Cmd: 0 Opaque: 1865532 Key: cckey_V11, toWrite=0, interested=8}, attempt 0.
[ n.s.m.p.b.BinaryMemcachedNodeImpl] - Discarding partially completed op: Cmd: 1 Opaque: 1865530 Key: cckey_V11 Cas: 0 Exp: 0 Flags: 1 Data Length: 222
[ n.s.m.MemcachedConnection] - Closing, and reopening {QA sa=/serverip_3:serverport, #Rops=2, #Wops=180, #iq=0, topRop=Cmd: -1 Opaque: -1 Keys: , topWop=Cmd: 0 Opaque: 1865589 Key: cckey2_V11, toWrite=0, interested=5}, attempt 0.
[ n.s.m.p.b.BinaryMemcachedNodeImpl] - Discarding partially completed op: Cmd: -1 Opaque: -1 Keys:

I also noticed the value DEFAULT_OP_QUEUE_MAX_BLOCK_TIME in the spymemcached code, which coincidentally equals 10 seconds. That seems quite a coincidence; does anyone know what this value is for?

DefaultConnectionFactory
public static final long DEFAULT_OP_QUEUE_MAX_BLOCK_TIME =
TimeUnit.SECONDS.toMillis(10);

Any ideas or help would be most welcome.
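
On the question about DEFAULT_OP_QUEUE_MAX_BLOCK_TIME: as far as the spymemcached ConnectionFactory documentation describes it, the op queue max block time is the longest a client is willing to wait to add a new operation to a queue. If that reading is right, a caller can block for that long when a node's queue is full, independently of the read timeout. The sketch below only illustrates that kind of bounded wait with a standard BlockingQueue; BoundedOpQueue is a hypothetical class and is not spymemcached code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: a bounded queue whose producers give up after maxBlockMillis.
final class BoundedOpQueue {
    private final BlockingQueue<String> queue;
    private final long maxBlockMillis;

    BoundedOpQueue(int capacity, long maxBlockMillis) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.maxBlockMillis = maxBlockMillis;
    }

    /** Returns false if the queue stayed full for the whole block time. */
    boolean enqueue(String op) {
        try {
            // Blocks the calling thread while the queue is full, up to maxBlockMillis.
            return queue.offer(op, maxBlockMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a 10-second block time and a queue that never drains because the connection is being closed and reopened, each call would stall for roughly 10 seconds before failing, which would match calls taking 10 seconds or multiples of it if the operation is retried.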

EVCache Touch method throw NoSuchMethodError exception

Here are the jar versions:
EVCache Client: 4.104
spymemcached: 2.12.3

I use EVCache.touch(key, 120000) to refresh the expiry time, but it throws the exception below.
java.lang.NoSuchMethodError: net.spy.memcached.OperationFactory.touch(Ljava/lang/String;ILnet/spy/memcached/ops/OperationCallback;)Lnet/spy/memcached/ops/KeyedOperation;
net.spy.memcached.EVCacheMemcachedClient.touch(EVCacheMemcachedClient.java:322)
com.netflix.evcache.pool.EVCacheClient.touch(EVCacheClient.java:1099)
com.netflix.evcache.EVCacheImpl.touchData(EVCacheImpl.java:755)
com.netflix.evcache.EVCacheImpl.touch(EVCacheImpl.java:712)
com.netflix.evcache.EVCacheImpl.touch(EVCacheImpl.java:663)
...

Am I using the wrong jar files?

How long before EvCache Server is open-sourced?

.travis.yml: The 'sudo' tag is now deprecated in Travis CI

Travis are now recommending removing the sudo tag.

"If you currently specify sudo: false in your .travis.yml, we recommend removing that configuration"

The performance of the cache operations become more and more slow after several days

We face a serious performance problem between EVCache and the memcached server. Every few days (about 4 or 5), operations against the memcached server through EVCache become very slow on some of our servers (not on all servers that use memcached).

We have to run the flush_all command to clear the memcached data to resolve it.
Restarting the web servers does not resolve the performance issue; it is still very slow after a restart.
We have 10 web servers that connect to the memcached server, but only several of them have the performance issue.
The QPS and bandwidth are not high; QPS is only several hundred.
Most of the set/delete operations take more than 1s.

We suspected a problem with the memcached server itself. But after we replaced it with two others (one server with 2 cores/16G replaced by two servers with 1 core/4G), the performance issue remained, and again only some of the servers have the problem.

Any particular reason compare-and-set is not supported?

while "add" is supported

Difference in hashing algorithm for evcache-client v1.0.5 vs v4.38.0

Hello,
After upgrading evcache-client from v1.0.5 to v4.38.0, we found an increase in cache misses during Blue-Green deployment on environments using more than one cache node.
In a test where a number of SETs are performed from the old client and the GETs from the new client, the new client misses the value in ~50% of cases. The reverse test (SETs from the new client and GETs from the old) shows almost the same results.
On environments with a single cache node, these tests pass 100% correctly.
Both clients use the default KETAMA hashing algorithm.

Given this, we think that something in the hashing algorithm implementation has changed.
Is this a known issue, and is there a workaround?

EvCache + Eureka

Hello,
We already integrate many Netflix components into our architecture, and Eureka is, of course, one of them. So I'm particularly interested in the EvCache setup using Eureka. Some questions:

In the relevant setup example, you keep mentioning a single instance of an EvCache Client per zone. What exactly do you mean by this? Is it mandatory for the Client to live inside a Service app that acts as a proxy for all cache calls in our architecture, with only one instance of this Service app allowed per zone? I was thinking of setting up a Hystrix wrapper over an EvCache Client instance and using it from every service that needs to get data from the cache. So, in my setup, if we have 4 Domain Service types and 3 instances of each service type spread across multiple zones, can I embed an EvCache Client in each of them in order to get data from the cache?
Are you planning to release the EvCache Server any time soon?
Because EvCache Server is not yet released, we opted to go with ElastiCache. How do you imagine a Eureka-based EvCache client setup working with ElastiCache? I guess the problem is how to report the ups and downs of the ElastiCache instances to Eureka. Or maybe the Eureka-based setup does not really work with ElastiCache and we need to wait for EvCache Server to be released? Or maybe we can use the sidecar Java app you mention? Can it work for ElastiCache instances somehow, or is it strictly specific to EvCache Server?

Archaius-based properties

Hello and thanks for your replies so far!

I can see in your examples that you use system properties for configuration, which is rather strange to me, as all other Netflix components use Archaius-based configuration and we opted for that solution too. Nevertheless, I can see that some properties are described as dynamically updatable, which got me thinking that Archaius is indeed somewhere behind this.
So, the question is: can I use Archaius for ALL the properties that you mention in your examples? Can we have a list, or better yet a wiki entry, of all the properties we can define and which of them can be dynamically updated?

Private Datacenter Eureka Pooling

I saw that there's a datacenter validation allowing only instances on AWS's datacenter to be discovered by eureka.

Is there any chance of Eureka Pooling being allowed in a private datacenter?

JUnit 4.10 marked as compile dependency

Please mark JUnit 4.10 as a testCompile dependency. This will prevent JUnit 4.10 from being pulled down when using this jar.

Spymemcached client version 2.8 and above is not compatible

I found out through testing that I could start up the EvCache client using spymemcached client version 2.7.3. Everything from 2.8 and up breaks the contract of the HashAlgorithm interface.

It would be nice if this could work with the latest version available in Maven Central. They only uploaded versions 2.9 and later.

MultiZone PoolSize

Can anyone recommend a poolsize to use for an application that would have around 1000 interactions per second with memcache? I had this set to 5, but I was seeing some very unusual results, almost looking like a threading issue. I am still trying to collect more data around that, but a poolsize recommendation would be great.

how does evcache facilitate eventual consistency?

Hi,

I found the post "Caching for a Global Netflix" on EVCache, but it does not explain the "eventual consistency" model in detail.

Are there any posts or sources describing this model now?