
Comments (11)

ebryerwork avatar ebryerwork commented on June 11, 2024

Maybe I misunderstand the output, but it does seem odd. Please let me know if I've (mis)understood the cache miss message.

I managed to reproduce this with a minimal startup command for the container. I'm going to file a bug in the container channel, but I'll leave this open for now in case anyone can (dis)confirm my interpretation of the log.

from graphite-web.

deniszh avatar deniszh commented on June 11, 2024

@ebryerwork : those messages come from the Django cache, not from carbon-cache.
The request cache caches the complete request (all fields); the data cache caches the targets, startTime, endTime, and xFilesFactor params. Only when all of those fields match will you get a request or data cache hit.
See views.py if you're curious and can read Python code.
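To make the "all fields must match" point concrete, here is a hypothetical sketch of how such a cache key could be built (this is illustrative only, not the actual views.py code; the function and field names are assumptions):

```python
import hashlib

def data_cache_key(targets, start_time, end_time, xfiles_factor):
    """Illustrative sketch: hash every query field into one cache key,
    so a cache hit requires every field to match exactly."""
    raw = "|".join([
        ",".join(sorted(targets)),
        str(start_time),
        str(end_time),
        str(xfiles_factor),
    ])
    return hashlib.md5(raw.encode()).hexdigest()

# Two identical queries produce the same key (cache hit)...
k1 = data_cache_key(["pcp.*.cpu"], 1000, 2000, 0.5)
k2 = data_cache_key(["pcp.*.cpu"], 1000, 2000, 0.5)
# ...but changing any single field (here end_time) produces a miss.
k3 = data_cache_key(["pcp.*.cpu"], 1060, 2060, 0.5)
```

This is why a query whose time range shifts with "now" (e.g. from=-2h) rarely hits the cache: the start/end fields differ on every request.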

ebryerwork avatar ebryerwork commented on June 11, 2024

How can I tell if results are coming from carbon-cache.py's in-RAM cache? From what you've said, I can't tell if you mean I have to request a specific data point to get a carbon in-RAM cache hit, or if that's just a Django cache thing (which is separate from carbon-cache). The code is a little beyond my abilities. Thanks.

deniszh avatar deniszh commented on June 11, 2024

Carbon does not support memcached as a cache; only Django does. Carbon returns data only from RAM.

ebryerwork avatar ebryerwork commented on June 11, 2024

Putting aside memcached for the moment, it's not clear to me what the difference is, regarding caching, between Django and carbon-cache.py. Does Django have its own cache apart from carbon that is sometimes used? If so, what's the difference between Django caching and carbon caching?

I thought carbon returns data from its in-RAM cache first and, if the requested data wasn't present in that cache, then it returns the data from the whisper data files. How can I tell which of those two it is doing?

cbowman0 avatar cbowman0 commented on June 11, 2024

The purpose of Carbon Cache is to buffer incoming data prior to writing to disk (normally into whisper files). This saves disk IO by allowing batching of writes. It is not caching reads.

Graphite reads from carbon cache to obtain any data not yet written to disk and then combines it with data directly read from whisper files on disk.

Adding to be more specific:
Graphite-web reads directly from whisper files on disk.
Graphite-web reads unwritten data from carbon-cache memory via a network connection to the carbon-cache process.
Graphite-web combines the two into a result set, performs any transformations (functions) on it, and returns it to the requestor.
Graphite-web can utilize Django caching to cache the result set.
Graphite-web can return a cached result set if one exists in the Django cache.

Carbon-cache accepts incoming data and stores it in memory (the cache).
Carbon-cache, in a dedicated thread, loops over the data stored in memory and writes the data to the appropriate whisper file.
Carbon-cache responds to requests from graphite-web for data in memory (aka. unwritten data)
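The division of labor described above can be sketched in a few lines of Python. This is a toy model, not carbon's actual code: real carbon-cache buffers points per metric, a writer thread drains the buffer into whisper files, and graphite-web merges disk data with still-buffered points at query time.

```python
from collections import defaultdict

class MiniCarbonCache:
    """Toy sketch of carbon-cache's role: buffer incoming points in RAM
    (a write buffer, not a read cache), flush them to storage in batches,
    and answer queries for not-yet-written data."""

    def __init__(self):
        self.buffer = defaultdict(list)  # metric -> [(timestamp, value), ...]
        self.disk = defaultdict(list)    # stand-in for whisper files on disk

    def accept(self, metric, timestamp, value):
        # Incoming datapoint: held in memory until the writer flushes it.
        self.buffer[metric].append((timestamp, value))

    def flush(self):
        # One batched write per metric, mirroring the writer thread's loop.
        for metric, points in self.buffer.items():
            self.disk[metric].extend(points)
        self.buffer.clear()

    def query(self, metric):
        # Graphite-web merges disk data with still-buffered points.
        return self.disk[metric] + self.buffer[metric]
```

Note that `query` always consults both stores: a recent point is visible before it ever reaches disk, which is exactly why graphite-web talks to carbon-cache over the carbonlink protocol.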

ebryerwork avatar ebryerwork commented on June 11, 2024

Thanks very much for that illuminating comment. :)

Please let me know if I've (mis)understood: I'm intaking data every 60s. The time is 12:00. I query for a specific data point written at 10:00. This comes from disk and this result set is cached in Django. If I do that request again, I'll get the result from Django. If I do another query for all data starting from 2 hours ago to the latest data point, it comes from a combination of carbon in-RAM cache (if it hasn't yet all been flushed to disk) and disk (whisper). If I do that query again after 2 minutes, it's not going to come from Django; it'll again come from a combination of carbon cache and disk, because there are new data points that are not in the previous result set.

As far as I can tell the carbon cache retention time is altered by carbon.conf mainly with MAX_UPDATES_PER_SECOND = 500; there is no tweak to tell carbon to retain incoming data for N seconds. If I could tell carbon to cache incoming data for 3 hours, then the results of my query for past 2 hours of data could come exclusively from carbon cache, which would be ideal. I don't think there's a way to do this though.

Let me back up a little. I'm in a situation where queries (using multiple * globs) are taking too long. I could create aggregation rules to effectively pre-do those globs. Or I can try to improve overall query performance, and that's what I'm doing. On my little-used test instance of Graphite that's using a copy of the production whisper files, a certain query takes only 17s. However, the production instance, which intakes ~30k points per minute, requires 90s for the same query. (The times were measured after dropping all kernel buffers.) I thought perhaps production's single carbon-cache.py was too overwhelmed to answer queries effectively; however, adding a relay and 7 carbon daemons didn't help. I thought next of using memcached. If I were to implement it with DEFAULT_CACHE_DURATION = 600 and I did two consecutive queries to 'return all data points starting from 1200 seconds ago to now', I don't know if the webapp could, on the 2nd query, figure out that some of the data are in carbon cache only, some are in memcached, and the rest are on disk. Do you think it does this? It seems like it would need to be able to do so for it to help in the scenario I've given.

cbowman0 avatar cbowman0 commented on June 11, 2024

Your summarization sounds correct to me.

Also to note, the wildcard globbing happens via the underlying file system that the whisper files reside on. This is part of the Finder mechanisms. The querying of carbon cache happens after the whisper files are read and is a serial iteration over all found keys (hashed to the correct carbon cache, if there are multiple).

So, the Graphite architecture does not support only querying carbon cache for data.

Also, my experience on large installations is that the globbing in Graphite is very fast; it heavily utilizes the Linux disk cache for directory entries. The reading of whisper files is also sufficiently fast on its own.

The main factors to identify with query performance are related to disk IO. How is the disk IO utilization on your installation?

Make sure you have fast disks (SSD), enough ram to let the kernel cache a bunch and, finally, make sure that carbon cache is tuned to not use all the IO available (you’ve identified the parameter).

One thing to know about updating whisper files is that an update of a single value is just as expensive as an update of many. Any update is a read, modify, and write of a binary file. The writing of 10 datapoints is the same disk IO as the writing of 1 datapoint to the whisper file. So you really do gain a lot of IO throughput by allowing carbon cache to queue the data a little bit. The downside to letting it queue is that you risk losing data if carbon cache is killed without being allowed to shut down gracefully.
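The batching arithmetic above is worth spelling out. Since one whisper update is a read-modify-write regardless of how many datapoints it carries, IO cost scales with the number of updates, not the number of points (illustrative arithmetic, not whisper code):

```python
import math

def update_ops(num_points, batch_size):
    """Number of read-modify-write operations needed to persist
    num_points when carbon-cache queues them into batches."""
    return math.ceil(num_points / batch_size)

# 600 points flushed one at a time: 600 whisper updates.
# The same 600 points queued into batches of 10: only 60 updates,
# i.e. 10x less disk IO for exactly the same data.
unbatched = update_ops(600, 1)
batched = update_ops(600, 10)
```

This is the trade-off MAX_UPDATES_PER_SECOND tunes: a lower cap means deeper queues, bigger batches, and less IO, at the cost of more unwritten data at risk during an unclean shutdown.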

Also, graphite has some instrumentation via logging that can tell you how long some steps take per query, so enable those for debugging and see if anything in those logs is enlightening.
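For reference, graphite-web's per-query instrumentation is toggled in local_settings.py. A sketch, with option names taken from graphite-web's example settings file (verify against the version you run before relying on them):

```python
# local_settings.py (fragment) -- enable query-timing instrumentation.
# These flags appear in graphite-web's local_settings.py.example;
# confirm against your installed version.

LOG_RENDERING_PERFORMANCE = True  # time spent rendering each request
LOG_CACHE_PERFORMANCE = True      # request/data cache hits and misses
LOG_METRIC_ACCESS = True          # which whisper metrics each query reads
```

With these enabled, the rendering log should show how long fetch vs. render steps take, which helps separate globbing cost from whisper-read cost.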

ebryerwork avatar ebryerwork commented on June 11, 2024

Thanks for confirming that I've understood you. :)

You mentioned: "The querying of carbon cache happens after the whisper files are read...", i.e. increasing carbon cache retention time won't help. I had been thinking about implementing memcached, but it seems like it would run into the same problem. My queries are of the form 'show me the most recent 2 hours (of this 60s resolution data)', so queries would often be requesting data that's not in memcached. It seems that the webapp would need to read data from the whisper files, and memcached wouldn't speed things up. Does that sound correct?

You also mentioned: "the wildcard globbing happens via the underlying file system". I think this means the webapp accesses the files sequentially, because

for f in $(find /storage/icds/data/graphite/storage/whisper/pcp/*/network/all/*/bytes.wsp)
do 
    cat $f >/dev/null 
done

takes 24s and the query accessing a 2 hour time range in those files takes 13s (measured after dropping kernel cache in each case), which I think is similar enough. Accessing the files in parallel as with

for f in $(find /storage/icds/data/graphite/storage/whisper/pcp/*/network/all/*/bytes.wsp)
do 
    cat $f >/dev/null & 
done

takes only 4s (measured after dropping kernel cache). I concluded that the webapp accesses the files sequentially despite the fact that I start multiple webapp threads by passing to the graphite container: --env=GRAPHITE_WSGI_PROCESSES=8 --env=GRAPHITE_WSGI_THREADS=12. I wonder, could running a cluster of webapps on the same server I'm using now help by accessing the queried whisper files in parallel?
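The shell experiment above can be expressed the same way in Python: fan the file reads out over a thread pool instead of iterating serially. To be clear, this is a sketch of the parallel-read idea, not something graphite-web does; per the discussion above, graphite-web reads the matched whisper files sequentially within a single request.

```python
from concurrent.futures import ThreadPoolExecutor

def read_all(paths, workers=8):
    """Read every file in paths concurrently and return their contents
    in the same order. Threads suffice here because the work is IO-bound
    (each thread blocks on disk/NFS, releasing the GIL)."""
    def read_one(path):
        with open(path, "rb") as f:
            return f.read()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, like the serial loop would.
        return list(pool.map(read_one, paths))
```

Against a high-latency backend like NFS, the speedup comes from overlapping the per-file round trips, which matches the 24s-serial vs. 4s-parallel numbers measured above.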

The whisper files are on a high performance NFS volume that is backed by a combination of flash and disk. The query I'm testing with is generated with

time wget 'http://rc-graphite.2e.hpc.psu.edu:4000/render?target=aliasByNode(pcp.*.network.all.*.bytes,%201)&format=raw&from=-2h&until=-1min' -O output

Experimentally setting MAX_UPDATES_PER_SECOND = 1 (vs. 500) did not change that result. Making a copy of the database files onto an attached disk and querying them there lowered the query time by 46% vs. NFS. This is a VM, so the attached disk (actually part of a storage array) may be slower than what we'd see on a bare-metal system with a flash drive.

deniszh avatar deniszh commented on June 11, 2024

@ebryerwork : please note that the whisper format can require reads for write operations, so reading from stale files vs. live files will give you different performance; that's expected.
Also, as you mentioned, increasing WSGI threads and processes will increase parallelism across concurrent requests, but every single request will still be sequential. You can try pypy instead of python (which is unfortunately quite outdated) and gunicorn+gevent (--env=GRAPHITE_WSGI_WORKER_CLASS=gevent --env=GRAPHITE_WSGI_WORKER_CONNECTIONS=1000) instead of WSGI for a generic increase in Python performance, but both are closer to black voodoo IMO.
I would recommend trying go-carbon, or maybe even go-carbon + carbonserver; trying go-carbon should be a good start.
So try either --env=GOCARBON=1 --env=GRAPHITE_CARBONLINK_HOSTS="127.0.0.1:7002" or --env=GOCARBON=1 --env=GRAPHITE_CARBONLINK_HOSTS="127.0.0.1:7002" --env=GRAPHITE_CLUSTER_SERVERS="127.0.0.1:8000" (but go-carbon and carbonserver are separate software and require their own tuning, so YMMV)

cbowman0 avatar cbowman0 commented on June 11, 2024

I would not run this backed with NFS storage. I believe that is the entire performance problem you are seeing.
