
OpenZFS: As a sysadmin I can tune prefetcher for data/metadata so I can optimize IO to workload

Summary

Add configuration tunable(s) and code to the ZFS prefetcher to allow users to enable/disable prefetching for specific data classes (data, metadata), so as to reduce IO and improve cache utilisation (lower ZFS ARC memory usage for blocks that will not be hit) for workloads where this is suitable.

Background

ZFS has a "prefetch" subsystem[1] which, when enabled, can read additional data off disk, speculating that those blocks will be required soon so that they can then be served from the cache. More information on the prefetch mechanism (note this is the file-level prefetch subsystem, not the vdev cache) can be found here:

ZFS currently only provides a global switch for enabling the prefetcher (vfs.zfs.prefetch.disable).

There are, however, subsystem-specific prefetch toggles (again global) for L2ARC, dedup and scrub:

  • vfs.zfs.l2arc.noprefetch
  • vfs.zfs.dedup.prefetch
  • vfs.zfs.no_scrub_prefetch

These are 'optimizations' already in place for prefetch for particular cases. This feature essentially extends existing capability to data classes (data, metadata).

Motivation

For some workloads, datasets and system configurations (available memory for ZFS, etc.), the prefetch hit rate for data (or metadata: unverified) can be so low that prefetching provides no net benefit over the additional IO it incurs.

In my testing, at least for my workload (heavy metadata, 8 GB memory, vfs.zfs.arc.meta_limit_percent: 100), arcstats.prefetch_data has a hit rate close to zero (and always < 5%). That is lower even than the hit rate of the vdev cache, which has been disabled by default for "not having a benefit in most cases".

Note: While the stats below represent only a short uptime, longer uptime/workload stats are identical.

│                       Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
│     ZFS ARC           3902M   1465M    638M   1612M  18484K       0    152M
│
│                                Rate   Hits Misses | Total Rate   Hits Misses
│     arcstats                  : 96%   2079     83 |        98%  2520k  47136
│     arcstats.demand_data      : 89%    388     44 |        96%   769k  24451
│     arcstats.demand_metadata  : 99%   1645     13 |        99%  1732k  14439
│     arcstats.prefetch_data    :  0%      0      7 |         0%      0   2661
│     arcstats.prefetch_metadata: 70%     46     19 |        77%  19421   5585
│     zfetchstats               : 92%     24      2 |        38%   5348   8576
│     arcstats.l2               :  0%      0      0 |         0%      0      0
│     vdev_cache_stats          : 58%     18     13 |        30%   6120  13714
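The Rate columns above appear to be consistent with hits / (hits + misses), truncated to a whole percent. The following is a minimal sketch of that calculation, not the code of the tool that produced the table; the function name `hit_rate_percent` is an assumption made for illustration, and the inputs are the cumulative ("Total") hits and misses copied from the table.

```python
def hit_rate_percent(hits: int, misses: int) -> int:
    """Return hits / (hits + misses) as a whole percentage, truncated."""
    total = hits + misses
    if total == 0:
        return 0  # no accesses recorded, as in the arcstats.l2 row
    return (100 * hits) // total


# Cumulative ("Total") hits and misses copied from the table above.
rows = {
    "arcstats.prefetch_data": (0, 2661),
    "arcstats.prefetch_metadata": (19421, 5585),
    "zfetchstats": (5348, 8576),
    "arcstats.l2": (0, 0),
    "vdev_cache_stats": (6120, 13714),
}

rates = {name: hit_rate_percent(h, m) for name, (h, m) in rows.items()}

for name, rate in rates.items():
    print(f"{name:<28} {rate:3d}%")
```

Under this reading, prefetch_data is the only class with recorded activity and no hits at all over the sampled uptime.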

It would be valuable to be able to limit or disable prefetching for 'data' blocks, to the extent possible.

Considerations

  • Prefetcher improvements landed very recently in OpenZFS: openzfs#13452. I have not reviewed the changes in depth, but they appear to be limited to better prefetch 'scaling' and 'performance'. They may, however, affect the complexity, understanding or design of this feature request, so they are worth noting here.
  • Q: Separate tunables for both data and metadata? Is having both disabled tantamount to having prefetch disabled entirely? Is any other prefetching happening that isn't data or metadata? If data and metadata are all there is, how do we organize the tunable(s) with respect to naming, values, etc.?

Example: prefetch_{data,metadata}: 0|1 vs prefetch_{data,metadata}_disable: 1|0. OpenZFS might already have a pattern/policy/guidelines/precedents for this.

  • Q: My case seeks to disable prefetching of data blocks (leaving only metadata prefetching). Is the reverse ability, disabling metadata prefetching (leaving only data), likely to be useful/valuable? Gut feel says yes, if for no other reason than to 'have tunables to configure all available data classes', rather than speculating about what may or may not be useful for any possible workload.
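To make the questions above concrete, here is a minimal sketch of how per-class switches could gate a prefetch decision. It is a toy model, not OpenZFS code: `prefetch_data` and `prefetch_metadata` are the hypothetical tunables from the example naming above (1 = enabled), `prefetch_disable` stands in for the existing global switch, and the model assumes data and metadata are the only two classes, which is exactly one of the open questions.

```python
from dataclasses import dataclass
from enum import Enum


class BlockClass(Enum):
    DATA = "data"
    METADATA = "metadata"


@dataclass
class PrefetchTunables:
    # Stand-in for the existing global switch (1 = prefetch off).
    prefetch_disable: int = 0
    # Hypothetical per-class switches proposed in this issue (1 = on).
    prefetch_data: int = 1
    prefetch_metadata: int = 1


def should_prefetch(t: PrefetchTunables, cls: BlockClass) -> bool:
    """Decide whether a speculative read of the given class may be issued."""
    if t.prefetch_disable:
        return False  # the global switch overrides the per-class ones
    if cls is BlockClass.DATA:
        return bool(t.prefetch_data)
    return bool(t.prefetch_metadata)


# The case motivating this request: keep metadata prefetch, drop data prefetch.
metadata_only = PrefetchTunables(prefetch_data=0)
print(should_prefetch(metadata_only, BlockClass.DATA))
print(should_prefetch(metadata_only, BlockClass.METADATA))
```

In this model, turning both per-class switches off behaves the same as setting the global switch, but only because the model has no third class. The `_disable` naming variant would invert the polarity of the two fields without changing the logic.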
