Comments (11)
Maybe I misunderstand the output, but it does seem odd. Please let me know if I've misunderstood the cache miss message.
I managed to reproduce this with a minimal startup command for the container. I'm going to file a bug in the container channel, but I'll leave this open for now in case anyone can (dis)confirm my interpretation of the log.
from graphite-web.
@ebryerwork : those messages are coming from the Django cache, not from carbon-cache.
The request cache caches the request completely (all fields); the data cache caches the targets, start time, end time, and xFilesFactor parameters. Only when all fields match will you get a request or data cache hit.
See views.py if you're curious and understand Python code.
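For illustration, here is a minimal sketch (not graphite-web's actual views.py code; the function and field names are hypothetical) of why every field has to match for a cache hit: the key is derived from all the parameters, so changing any one of them yields a different key and therefore a miss.

```python
import hashlib

def data_cache_key(targets, start, end, xfiles_factor):
    # Hypothetical sketch: a data-cache key derived from the fields the
    # comment names (targets, start time, end time, xFilesFactor).
    # Changing any single field changes the key, hence a cache miss.
    raw = "|".join([",".join(sorted(targets)), str(start), str(end), str(xfiles_factor)])
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

k1 = data_cache_key(["app.cpu.load"], 1700000000, 1700007200, 0.5)
k2 = data_cache_key(["app.cpu.load"], 1700000000, 1700007260, 0.5)  # end moved by 60s
assert k1 != k2  # any field difference means a miss
```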
How can I tell if results are coming from carbon-cache.py's in-RAM cache? From what you've said, I can't tell if you mean I have to request a specific data point to get a carbon in-RAM cache hit, or if that's just a Django cache thing (which is separate from carbon-cache). The code is a little beyond my abilities. Thanks.
Carbon does not support memcached as a cache; only Django does. Carbon returns data only from RAM.
Putting aside memcached for the moment, it's not clear to me what the difference is, regarding cache, between Django and carbon-cache.py. Does Django have its own cache apart from carbon, and is it sometimes used? If so, what's the difference between Django caching and carbon caching?
I thought carbon returns data from its in-RAM cache first and, if the requested data wasn't present in that cache, then it returns the data from the whisper data files. How can I tell which of those two it is doing?
The purpose of Carbon Cache is to buffer incoming data prior to writing to disk (normally into whisper files). This saves disk IO by allowing batching of writes. It is not caching reads.
Graphite reads from carbon cache to obtain any data not yet written to disk and then combines it with data directly read from whisper files on disk.
Adding to be more specific:
Graphite-web reads directly from whisper files on disk.
Graphite-web reads unwritten data from carbon-cache memory via a network connection to the carbon-cache process.
Graphite-web combines the two into a result set, performs any transformations (functions) on it, and returns it to the requestor.
Graphite-web can utilize Django caching to cache the result set.
Graphite-web can return a cached result set if one exists in the Django cache.
Carbon-cache accepts incoming data and stores in memory (cache).
Carbon-cache, in a dedicated thread, loops over the data stored in memory and writes the data to the appropriate whisper file.
Carbon-cache responds to requests from graphite-web for data in memory (aka unwritten data).
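The read path in the list above can be sketched roughly like this (a hypothetical helper, not the actual graphite-web code): points from whisper on disk are overlaid with any unwritten points fetched from the carbon-cache process.

```python
def merge_cache_and_disk(disk_points, cache_points):
    # Sketch only (not graphite-web's real merge logic): overlay unwritten
    # carbon-cache datapoints on top of what was read from whisper.
    # Points are (timestamp, value) pairs; cache wins on collisions,
    # since it holds the data not yet flushed to disk.
    merged = dict(disk_points)
    merged.update(cache_points)
    return sorted(merged.items())

disk = [(60, 1.0), (120, 2.0), (180, None)]   # whisper read; latest slot empty
cache = [(180, 3.0), (240, 4.0)]              # not yet written to disk
print(merge_cache_and_disk(disk, cache))
# [(60, 1.0), (120, 2.0), (180, 3.0), (240, 4.0)]
```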
Thanks very much for that illuminating comment. :)
Please let me know if I've (mis)understood: I'm intaking data every 60s. The time is 12:00. I query for a specific data point written at 10:00. This comes from disk and this result set is cached in Django. If I do that request again, I'll get the result from Django. If I do another query for all data starting from 2 hours ago to the latest data point, it comes from a combination of carbon in-RAM cache (if it hasn't yet all been flushed to disk) and disk (whisper). If I do that query again after 2 minutes, it's not going to come from Django; it'll again come from a combination of carbon cache and disk, because there are new data points that are not in the previous result set.
As far as I can tell, the carbon cache retention time is influenced mainly by MAX_UPDATES_PER_SECOND = 500 in carbon.conf; there is no knob to tell carbon to retain incoming data for N seconds. If I could tell carbon to cache incoming data for 3 hours, then the results of my query for the past 2 hours of data could come exclusively from carbon cache, which would be ideal. I don't think there's a way to do this, though.
Let me back up a little. I'm in a situation where queries (using multiple * globs) are taking too long. I could create aggregation rules to effectively pre-compute those globs, or I can try to improve overall query performance, and that's what I'm doing. On my little-used test instance of Graphite, which uses a copy of the production whisper files, a certain query takes only 17s. However, the production instance, which intakes ~30k points per minute, requires 90s for the same query. (Both times were measured after dropping all kernel buffers.) I thought perhaps production's single carbon-cache.py was too overwhelmed to answer queries effectively; however, adding a relay and 7 carbon daemons didn't help. I next thought of using memcached. If I were to implement it with DEFAULT_CACHE_DURATION = 600 and did two consecutive queries for 'all data points starting from 1200 seconds ago to now', I don't know whether the webapp could, on the 2nd query, figure out that some of the data are only in carbon cache, some are in memcached, and the rest are on disk. Do you think it does this? It seems like it would need to for memcached to help in this scenario.
Your summarization sounds correct to me.
Also to note, the wildcard globbing happens via the underlying filesystem that the whisper files reside on; this is part of the Finder mechanisms. The querying of carbon cache happens after the whisper files are read and is a serial iteration over all found keys (hashed to the correct carbon cache, if there are multiple).
So, the Graphite architecture does not support only querying carbon cache for data.
Also, my experience on large installations is that the globbing in Graphite is very fast; it heavily utilizes the Linux disk cache for directory entries. The reading of whisper files is also sufficiently fast on its own.
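To make the filesystem mapping concrete, here is an illustrative sketch (a hypothetical helper; the whisper root path is an assumption based on the common default, and the real Finder code lives in graphite-web): a dotted metric query with wildcards maps directly onto a filesystem glob over the whisper tree, which is why wildcard expansion is bounded by directory-walk speed.

```python
import os.path

def metric_to_glob(metric_query, whisper_root="/opt/graphite/storage/whisper"):
    # Hypothetical sketch: each dotted component of the metric name
    # becomes one directory level; wildcards pass straight through to
    # the filesystem glob, and the leaf gets the .wsp extension.
    return os.path.join(whisper_root, *metric_query.split(".")) + ".wsp"

print(metric_to_glob("pcp.*.network.all.*.bytes"))
# /opt/graphite/storage/whisper/pcp/*/network/all/*/bytes.wsp
```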
The main factors to identify with query performance are related to disk IO. How is the disk IO utilization on your installation?
Make sure you have fast disks (SSD), enough RAM to let the kernel cache a bunch, and, finally, make sure that carbon cache is tuned to not use all the IO available (you’ve identified the parameter).
One thing to know about updating whisper files is that an update of a single value is just as expensive as an update of many: any update is a read, modify, and write of a binary file. Writing 10 datapoints costs the same disk IO as writing 1 datapoint to the whisper file. So you really do gain a lot of IO throughput by allowing carbon cache to queue the data a little bit. The downside to letting it queue a little is that you risk losing data if carbon cache is killed without a graceful shutdown.
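A back-of-envelope sketch of that trade-off (illustrative arithmetic only, not actual carbon code):

```python
import math

def whisper_update_ops(datapoints, batch_size):
    # Each whisper update is one read-modify-write cycle no matter how
    # many datapoints it flushes, so batching divides the number of IO
    # operations by the batch size.
    return math.ceil(datapoints / batch_size)

# One metric receiving a point every second for 10 minutes:
print(whisper_update_ops(600, 1))   # unbatched: 600 read-modify-write cycles
print(whisper_update_ops(600, 10))  # batches of 10: 60 cycles, ~10x less IO
```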
Also, graphite has some instrumentation via logging that can tell you how long some steps take per query, so enable that for debugging and see if anything in those logs is enlightening.
Thanks for confirming that I've understood you. :)
You mentioned: "The querying of carbon cache happens after the whisper files are read...", i.e. increasing carbon cache retention time won't help. I had been thinking about implementing memcached, but it seems like it would run into the same problem. My queries are of the form 'show me the most recent 2 hours (of this 60s resolution data)', so queries would often be requesting data that's not in memcached. It seems that the webapp would need to read data from the whisper files, and memcached wouldn't speed things up. Does that sound correct?
You also mentioned: "the wildcard globbing happens via the underlying file system". I think this means the webapp accesses the files sequentially, because
for f in $(find /storage/icds/data/graphite/storage/whisper/pcp/*/network/all/*/bytes.wsp)
do
cat $f >/dev/null
done
takes 24s and the query accessing a 2 hour time range in those files takes 13s (measured after dropping kernel cache in each case), which I think is similar enough. Accessing the files in parallel as with
for f in $(find /storage/icds/data/graphite/storage/whisper/pcp/*/network/all/*/bytes.wsp)
do
cat $f >/dev/null &
done
takes only 4s (measured after dropping kernel cache). I concluded that the webapp accesses the files sequentially despite the fact that I start multiple webapp threads by passing to the graphite container: --env=GRAPHITE_WSGI_PROCESSES=8 --env=GRAPHITE_WSGI_THREADS=12. I wonder, could running a cluster of webapps on the same server I'm using now help by accessing the queried whisper files in parallel?
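For what it's worth, the parallel variant of that experiment could be sketched in Python like this (a hypothetical helper, not graphite-web code; per the discussion above, graphite-web reads the matched whisper files serially within a single request, so speeding this up would require a pool like this in the fetch path):

```python
from concurrent.futures import ThreadPoolExecutor

def read_all(paths, workers=8):
    # Read every file concurrently, as the parallel shell loop above
    # does with backgrounded cat processes. Returns total bytes read.
    def read_one(path):
        with open(path, "rb") as f:
            return len(f.read())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_one, paths))
```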
The whisper files are on a high performance NFS volume that is backed by a combination of flash and disk. The query I'm testing with is generated with
time wget 'http://rc-graphite.2e.hpc.psu.edu:4000/render?target=aliasByNode(pcp.*.network.all.*.bytes,%201)&format=raw&from=-2h&until=-1min' -O output
Experimentally setting MAX_UPDATES_PER_SECOND = 1 (vs. 500) did not change that result. Making a copy of the database files onto an attached disk and querying them there lowered the query time by 46% vs NFS. This is a VM, so the attached disk (actually part of a storage array) may be slower than what we'd see on a bare metal system with a flash drive.
@ebryerwork : please take note that the whisper format potentially requires reads for write operations, so reading data from stale vs. live files will give you different performance; that's expected.
Also, as you mentioned, increasing WSGI threads and processes will increase parallelism for concurrent requests, but every single request will still be sequential. You can try pypy instead of python (support for which is unfortunately quite outdated) and gunicorn+gevent (--env=GRAPHITE_WSGI_WORKER_CLASS=gevent --env=GRAPHITE_WSGI_WORKER_CONNECTIONS=1000) instead of WSGI for a modest generic increase in python performance, but both are closer to black voodoo IMO.
I would recommend trying go-carbon or maybe even go-carbon + carbonserver, but trying go-carbon should be a good start.
So try either --env=GOCARBON=1 --env=GRAPHITE_CARBONLINK_HOSTS="127.0.0.1:7002"
or --env=GOCARBON=1 --env=GRAPHITE_CARBONLINK_HOSTS="127.0.0.1:7002" --env=GRAPHITE_CLUSTER_SERVERS="127.0.0.1:8000"
(but go-carbon and carbonserver are separate software and require own tuning, so, YMMV)
I would not run this backed with NFS storage. I believe that is the entire performance problem you are seeing.
from graphite-web.