
Comments (22)

tgamblin commented on June 13, 2024

Thanks for the detailed analysis. I have only skimmed this, but I think I like the SQLite idea best, honestly, because I think it has more long-term benefits if we can get it to work.

The thing I see missing here is how the append-only database will work on shared and/or parallel filesystems. Right now, we completely rewrite the DB on every write, and I believe we move the new file into place afterward so that updates are atomic. This means every write gets us a new DB file.

This has obvious drawbacks that you've covered above, but I believe all the parallel filesystems we care about (NFS3/4, Vast, Lustre, GPFS) notice that change across nodes; that is, they know to re-read the file rather than use the cached version. I am not sure an append is noticed the same way, particularly with caching in NFS. We need to test to make sure that behavior is preserved, or the DB is going to get corrupted when used on a shared FS.
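
For reference, that pattern is roughly the following (a minimal sketch with illustrative names, not Spack's actual code):

import json
import os
import tempfile

def write_db_atomically(index, db_path="index.json"):
    # Sketch of the rewrite-and-rename pattern described above: serialize the
    # whole database to a temporary file, then atomically move it into place,
    # so readers never see a partially written file and shared filesystems
    # see a brand-new inode.
    dirname = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(index, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, db_path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise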

haampie commented on June 13, 2024

It’s very robust, and guaranteed to be able to open old versions

I was talking about forward compatibility specifically, which is relevant for sharing build caches across systems. So that's supported as long as you don't use new features. There's no guarantee, though, that all future forward-incompatible changes to sqlite are in fact optional features.

Hm, I'm not following your train of thought.

You're talking about two issues:

  1. aggressive caching, any change to the db requires a new inode to be on the safe side (right?)
  2. file corruption on write failure

Both (1) and (2) are non-problematic with this new setup:

  1. Acquire write lock
  2. Copy a to b
  3. Append serialized objects to b
  4. Atomically move b to a

like they're non-problematic in the current setup:

  1. Acquire write lock
  2. Read a and deserialize
  3. Write new file b with contents a + new stuff serialized
  4. Atomically move b to a

Writing to the db remains a linear-time operation in terms of disk I/O; the only optimization is a constant number of Spec <-> json conversions on write and re-read -- and that's the bottleneck.
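
As a sketch, the copy/append/rename write path (steps 1-4 above) could look like this, assuming the write lock is already held and a newline-delimited record format (names are illustrative):

import json
import os
import shutil

def append_records(new_records, db_path="index.json"):
    # Sketch of the write path above: copy, append only the new serialized
    # records, then atomically swap the copy into place.
    tmp_path = db_path + ".tmp"
    shutil.copyfile(db_path, tmp_path)       # 2. copy a to b
    with open(tmp_path, "a") as f:           # 3. append serialized objects to b
        for record in new_records:
            f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, db_path)            # 4. atomically move b to a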

As for sqlite, it looks like their rollback mode does in fact do in-place updates, with an (out-of-place) journal file. So it still hits problem (1)? It sounds pretty sensible that their rollback feature is in-place w.r.t. the DB itself; otherwise it would be the least I/O-efficient DB imaginable (but that's the constraint we impose on ourselves by requiring a new inode...)

haampie commented on June 13, 2024

Another point about sqlite is that either you normalize the data and put stuff in separate tables (nodes, edges), or you keep a single table with unstructured data in json strings. (Can't rely on new json stuff if we wanna maximize compat)

When we normalize tables we can make queries more efficient (if that is considered relevant for, say, spack find), but we have to deal with database migrations any time we add an attribute to Spec, which is involved.

If we do not normalize, we don't benefit much from sqlite as we still need to deserialize the same amount of json strings.

scheibelp commented on June 13, 2024

Zooming out a bit, I'm interested in whether you see a distinct advantage to the append-file approach: both approaches seem to have similar performance and capabilities. Generally, it's only worth building your own version of something if that approach is clearly better in some way.

Also in the append-only file case, with a new {"add": ...} action

My point wasn't that the append-file couldn't handle it - I trust that any modification can be encoded as an action - but rather that it isn't immutable at the moment.

But again, storing the ref count is useless as it can be computed.

I agree, but I think it's worth checking in with the folks who added it (maybe they had another use case in mind)

(edit: it's not obvious to me why the ref counts were added, the blame points me to d0e22b2, but I don't see any explanation there)

haampie commented on June 13, 2024

A few more data points:

I've exported a database of about 4K specs using

$ jq -c '.database.installs | .[]' < ~/spack/opt/spack/.spack-db/index.json > stream.json
$ jq -sc . < stream.json > array.json
$ sqlite3 db.sqlite "CREATE TABLE specs (data TEXT NOT NULL)";
$ sqlite3 db.sqlite "INSERT INTO specs SELECT json_extract(value, '$') FROM json_each(readfile('array.json'))";

which creates the following 3 files

  • stream.json: {object}\n{object}\n...\n{object}: 7318611 bytes
  • array.json: [{object},{object},...,{object}]: 7318613 bytes
  • db.sqlite: single column table: 9347072 bytes (without the indexed updated_at and boolean deleted columns)

reading all looks like this:

import json
import sqlite3
import sys
import time

def read_stream(path="stream.json"):
    # Stream of JSON objects: one object per line, decoded individually.
    decoder = json.JSONDecoder()
    with open(path) as f:
        # f.seek(offset)
        return [decoder.decode(line) for line in f]

def read_array(path="array.json"):
    # A single JSON array holding every object (one json.load, like the
    # current single-file read).
    with open(path) as f:
        return json.load(f)

def read_sqlite(path="db.sqlite"):
    # One JSON string per row; decode while iterating the cursor (no fetchall()).
    conn = sqlite3.connect(path)
    decoder = json.JSONDecoder()
    c = conn.cursor()
    result = c.execute("SELECT data FROM specs")  # `WHERE updated_at > ?`
    return [decoder.decode(row[0]) for row in result]

cmd = sys.argv[1]
start = time.time()
if cmd == "stream":
    read_stream()
elif cmd == "array":
    read_array()
elif cmd == "sqlite":
    read_sqlite()
elif cmd == "test":
    print(read_stream() == read_array() == read_sqlite())
print(f"{1000 * (time.time() - start):.0f}ms")

best of 5 results:

$ env -i /usr/bin/taskset -c 0 /usr/bin/python3 -E bench.py array
97ms

$ env -i /usr/bin/taskset -c 0 /usr/bin/python3 -E bench.py stream
108ms

$ env -i /usr/bin/taskset -c 0 /usr/bin/python3 -E bench.py sqlite
112ms

So yeah, the implementation doesn't matter much in terms of Spack startup speed; both options are about 10% slower than reading a single JSON file. (As long as it's "carefully" implemented: initially I used .fetchall(), which made sqlite about 50% slower.)
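
(For comparison, the slower variant presumably looked roughly like this; fetchall() materializes every row as a list of tuples before any JSON decoding happens, instead of decoding while iterating the cursor:)

import json
import sqlite3

def read_sqlite_fetchall(path="db.sqlite"):
    # Slower variant of read_sqlite() above: build the full list of rows
    # first, then decode in a second pass.
    conn = sqlite3.connect(path)
    decoder = json.JSONDecoder()
    rows = conn.cursor().execute("SELECT data FROM specs").fetchall()
    return [decoder.decode(row[0]) for row in rows]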

Sqlite files are significantly larger (25%), even without the additional columns and an index on one of them -- I don't think anybody cares about this, though.

psakievich commented on June 13, 2024

Not sure I fully comprehend #2, but at first blush I like the idea of moving toward a tried-and-true database. This description really hits the issue on the nose, though, of needing O(1) operations.

haampie commented on June 13, 2024

Both sqlite and append-to-file would do in-place updates. I don't know how parallel filesystem caches are typically implemented, but I was hoping that reopening the file + a seek would be sufficient. If a new inode is required to invalidate caches, then yeah, a post-hoc copy is required with either storage method. That would be O(n) in terms of the number of bytes read and written, but O(1) in the number of Python objects serialized, so still better than what we have now.

Not sure I fully comprehend no. 2

The idea is to be able to do a single query select * from specs where updated_since > ? where ? is the update time / counter you had at the last read. Then you get a handful of rows back, and you'd have to update the in-memory Spec objects with that (so if a new spec is added, create a new spec object, add edges to already in-memory Spec objects from last read, etc.). (That doesn't cover deletion though, maybe it needs soft-delete with a column uninstalled = false/true)
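
As a sketch, that incremental refresh could look like the following, with hypothetical updated_at (monotonic counter) and deleted (soft-delete flag) columns:

import json

def refresh(conn, specs_by_hash, last_seen):
    # Hypothetical incremental refresh: fetch only rows touched since the last
    # read and patch the in-memory state (re-linking edges etc. would happen
    # where the json is applied).
    rows = conn.execute(
        "SELECT hash, data, deleted, updated_at FROM specs WHERE updated_at > ?",
        (last_seen,),
    )
    for hash_, data, deleted, updated_at in rows:
        if deleted:
            specs_by_hash.pop(hash_, None)        # soft-deleted row: drop it
        else:
            specs_by_hash[hash_] = json.loads(data)
        last_seen = max(last_seen, updated_at)
    return last_seen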


With sqlite my main worry is compatibility. Can a db generated with new sqlite be read by old sqlite?

tgamblin commented on June 13, 2024

With sqlite my main worry is compatibility. Can a db generated with new sqlite be read by old sqlite?

Yes. https://www.sqlite.org/formatchng.html

I consider SQLite to be a pretty much rock solid dependency at this point. It's very robust, and guaranteed to be able to open old versions (just as we do now).

I was re-reading their guidance on using it on shared filesystems — see here:

https://www.sqlite.org/useovernet.html

in particular:

Network filesystems do not support the ability to do simultaneous reads and writes while at the same time keeping the database consistent.

So:

was hoping that reopening the file + seek would be sufficient

I think it’s not.

However, look at the recommendation on rollback mode in the link above. I think this could be what we want. It is similar to the in-place updates Harmen mentions above, but it also implements a way to roll back if those transactions fail (details).

The in-place updates you're suggesting here were worrying me because I think any in-place update risks corruption on failure, but the rollback mechanism above looks like it could solve that. I'd rather rely on something very solid (like SQLite) for that if we can.

tgamblin commented on June 13, 2024

I was talking about forward compatibility specifically, which is relevant for sharing build caches across systems

ok sorry — misunderstood. I thought you meant backward compat. I’m on a beach on my phone 🏝️

So that's supported as long as you don't use new features. There's no guarantee though that all future changes to sqlite that are forward incompatible are in fact optional features.

Ok, so if we choose SQLite we will have to decide how to deal with this, because the SQLite version is out of our control (but probably not changing too fast). It looks like Python 3.10 bumped the minimum SQLite version from 3.0.8 to 3.7.15. Not sure when they plan to do that again, but we'd need to consider it whenever Python bumps the version.

Can't rely on new json stuff if we wanna maximize compat

This is also true. Sigh. This was added in 3.38, which is way too new for us to assume an arbitrary Python is using it.

If we do not normalize, we don't benefit much from sqlite as we still need to deserialize the same amount of json strings.

I think IF we could use json we could benefit on the query side regardless, just by offloading the query. Have you looked at query() speed? I don’t think we need to optimize it now but this is one of the long term benefits I was thinking about, which seems harder to get with all the concerns above. I also agree it’s less of an issue than write performance.

  1. Acquire write lock
  2. Copy a to b
  3. Append serialized objects to b
  4. Atomically move b to a

This sounds to me like what we have to do. I read:

then yeah, a post-hoc copy is required in either storage method

and thought you were putting the copy after the append. Anyway, yes I think this is the simplest way to go.

and re-read -- that's the bottleneck

For this I think we could use the new format to our advantage. If each json record contains not only the operation but also a hash of the DB after that operation completes, you could avoid reconstructing specs except for those recorded after the hash of your current DB. We do something kind of similar right now with uuids that tell you whether you need to re-read an identical DB. With this you could identify that the DB is the same as yours modulo a few modifications, and verify the resulting hash. This would increase the re-read speed, I think.

On the format: I don't know if we can reconcile the new json format with the old reader. Right now the DB starts with a _meta section, then data, but all of that is in one json object. Can the old reader fail gracefully on a stream of json objects, and can the json module read that as an implicit list, as you'd probably want? It's not valid json, right? I think we want to avoid splitting the input ourselves and doing lots of separate json loads.

scheibelp commented on June 13, 2024

If we do not normalize, we don't benefit much from sqlite as we still need to deserialize the same amount of json strings.

The append-file approach stores the specs as json strings, so:

  • Both approaches (append-file, and sqlite-without-normalize) need to initially deserialize all strings once (to read the DB state)
  • Both approaches don't need to fully re-read the DB for new specs (e.g. sqlite can keep a time-ordered table of new specs, and only the new ones since the last read have to be read)

So I think the sqlite db without normalization is viable (I have nothing against the append-file approach though).

haampie commented on June 13, 2024

thought you were putting the copy after the append

yeah, my mistake ;)

Can the old reader fail gracefully on a stream of json objects

json.loads calls JSONDecoder.decode(...), which in turn calls JSONDecoder.raw_decode(...). The difference between decode and raw_decode is exactly that the former errors if not all bytes were consumed, while the latter can be used in streaming mode. We could change Spack to use the latter already, and maybe backport that, so that there are better error messages in case we switch to this "stream of json objects" db.
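
For example, raw_decode can walk a concatenated stream of JSON objects by tracking where the previous object ended:

import json

def decode_json_stream(text):
    # decode() raises if trailing data remains; raw_decode() returns the parsed
    # object plus the index where parsing stopped, so it can be called
    # repeatedly over a "stream of json objects".
    decoder = json.JSONDecoder()
    objects, idx = [], 0
    while idx < len(text):
        obj, idx = decoder.raw_decode(text, idx)
        objects.append(obj)
        while idx < len(text) and text[idx].isspace():  # skip separating newlines
            idx += 1
    return objects

# decode_json_stream('{"add": {"hash": "abc"}}\n{"uninstall": {"hash": "abc"}}\n')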

If your json record contains not only the operation but a hash of the DB after its completion, you could avoid reconstructing specs except for those after the hash of your current DB

My suggestion was to store (offset, db_identifier) in memory after read, where offset is how many bytes of the db stream you've read. On re-read compare db_identifier with the current value in the db header (or separate file like now). If equal, seek to offset and continue the stream of updates. Hashing isn't necessary? Bumping the identifier is only necessary if there's a squashing / history modifying operation: [add x, add y, uninstall y] -> [add x].

The db_identifier thingy can be stored as the first line / entry, and can in principle be variable length, since only when a full overwrite happens do we need to change its value.
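
A sketch of that re-read, with an invented layout where the first line holds the identifier and each following line is one change record (reset and apply_change are placeholder callbacks):

import json

def incremental_reread(path, last_offset, last_identifier, apply_change, reset):
    # Hypothetical (offset, db_identifier) scheme: if the identifier in the
    # header is unchanged, seek past what was already consumed and apply only
    # the new change records; otherwise rebuild the in-memory state from scratch.
    with open(path) as f:
        identifier = f.readline().strip()
        if identifier == last_identifier:
            f.seek(last_offset)   # continue the stream of updates
        else:
            reset()               # history was squashed/rewritten: full re-read
        for line in f:
            apply_change(json.loads(line))
        return f.tell(), identifier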


I feel like append-to-file / stream of json objects is reasonably easy to implement.

To actually benefit from sqlite, I think we'd have to make many more changes: don't construct Spec objects until they're actually needed, translate satisfies to SQL queries (to what extent can we?). And then there's the issue that even if you normalize to separate vertices and edges tables, you likely end up doing O(#edges) queries to construct a DAG (there are some optimizations to make, but sqlite certainly isn't a graph database).

scheibelp commented on June 13, 2024

To actually benefit from sqlite, I think we'd have to make many more changes:

I'm having trouble understanding why you think an sqlite based solution would have any of these issues. Perhaps the table organization you described in #43548 (comment) might have this problem (although I don't see that), but I think the following approach wouldn't:

  • sqlite table where each row is [the-hash, installation-time, json-string-of-spec]
  • A spack process starts with an initial full read of the DB, converting each json-string to a spec (this matches the append-file approach - both are doing essentially the same amount of work)
    • When I say full read, I mean like SELECT * FROM ...
  • That process then stores the max installation time it read
  • The next time it needs to read/write, and it gets the lock, it queries for all the new specs looking for install-time >= last-max-read, so it never has to fully re-read the DB (also like the append-file approach)

overall, that appears to have all the same benefits described for the append-file approach.
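
A compact sketch of that approach (schema and names are illustrative):

import json
import sqlite3

conn = sqlite3.connect("db.sqlite")
conn.execute(
    "CREATE TABLE IF NOT EXISTS specs ("
    " hash TEXT PRIMARY KEY,"
    " installation_time REAL NOT NULL,"
    " data TEXT NOT NULL)"  # data is the json-string-of-spec
)

# Initial full read: the same amount of work as reading the whole append file.
specs, last_read = {}, 0.0
for hash_, t, data in conn.execute("SELECT hash, installation_time, data FROM specs"):
    specs[hash_] = json.loads(data)
    last_read = max(last_read, t)

# Later, under the lock: fetch only what was installed since the last read
# (>= re-reads exact ties, which is harmless since rows are keyed by hash).
query = "SELECT hash, installation_time, data FROM specs WHERE installation_time >= ?"
for hash_, t, data in conn.execute(query, (last_read,)):
    specs[hash_] = json.loads(data)
    last_read = max(last_read, t)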

haampie commented on June 13, 2024

Yeah, that's the same as what I described. And yes I agree it's mostly equivalent.

My comment was more about what further benefits sqlite would give that could speed up Spack. (If none I would prefer the stream of json objects implementation, since it's closer to what we have)

If there's always an initial select * from there is little point in using a database, right?

If instead we could lazily construct Spec objects when they're actually requested, and also be fine not to attach parents, only children, then sqlite may have benefits over an append-only file, since we could use queries to fetch less data.

But my point here is that (a) it's difficult to translate Database.query(...) into a sql query (especially if the data is not normalized), (b) there are tricks to select all nodes of a tree in one query, but for a graph in general you have to iteratively query the db, and (c) probably more often than not we read the entire database anyways (like when adding reusable specs for the concretizer), so constructing Specs lazily may in practice only cause overhead.
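
To make point (b) concrete, this is the kind of iterative querying it refers to, against a hypothetical normalized edges(parent, child) table; each BFS frontier is one round trip to the db:

def collect_subdag(conn, root_hash):
    # Walk the children reachable from root_hash, one query per BFS level;
    # a single plain SELECT cannot return the whole sub-DAG.
    seen = {root_hash}
    frontier = [root_hash]
    while frontier:
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            "SELECT child FROM edges WHERE parent IN (%s)" % placeholders, frontier
        ).fetchall()
        children = {child for (child,) in rows}
        frontier = list(children - seen)
        seen.update(frontier)
    return seen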

scheibelp commented on June 13, 2024

If there's always an initial select * from there is little point in using a database, right?

I think the potential advantages of an sqlite db have more to do with maintenance burden and familiarity-of-interface than with performance:

  • The DB handles things like keeping the data divided up (so you don't have to worry about seeking to the wrong spot for example)
  • When updating reference counts, that's probably easier to maintain as a separate table in an sqlite db, vs. how that would be resolved in the append-file case (probably a separate file?)
  • For an append-file, presumably at some point there would be a desire for a clean-up operation where that isn't necessary for a sqlite-based approach

haampie commented on June 13, 2024

The DB handles things like keeping the data divided up (so you don't have to worry about seeking to the wrong spot for example)

You won't seek to the wrong spot.

When updating reference counts,

There's no good reason to store reference counts.

For an append-file, presumably at some point there would be a desire for a clean-up operation where that isn't necessary for a sqlite-based approach

It is necessary: how would you detect deletion in O(1) otherwise?

scheibelp commented on June 13, 2024

The DB handles things like keeping the data divided up (so you don't have to worry about seeking to the wrong spot for example)

You won't seek to the wrong spot.

I should generalize: with the append-file approach, we take on all data alignment responsibilities ourselves. Maybe that's OK, but it's definitely something to consider.

When updating reference counts,

There's no good reason to store reference counts.

You'll want to explicitly propose removing them then: right now the database file stores reference counts. Any other piece of data we want to update in place like

  • implicit vs. explicit
  • installation time

we either have to convince ourselves we can modify in place, or split off into another file that we coordinate (each case like that is an opportunity to trip over data alignment/organization issues I mention above).

For an append-file, presumably at some point there would be a desire for a clean-up operation where that isn't necessary for a sqlite-based approach

It is, how would you detect deletion in O(1).

Sorry yes: I think you are right that you do need an explicit clean up operation. I think it's easier for the sqlite-based approach though (it might be possible to make cleanup easier for either approach by assuming it never happens during another running Spack process).

haampie commented on June 13, 2024

I should generalize: with the append-file approach, we take on all data alignment responsibilities for ourselves.

It is always at most 1 seek before read/write. That's not very advanced, is it?

installation time

That's immutable

we either have to convince ourselves we can modify in place, or split off into another file that we coordinate (each case like that is an opportunity to trip over data alignment/organization issues I mention above).

It sounds like you're not getting the concept of an append-only file. Please look again at the json example under "Implementation 1". Store a list of changes, don't store state. The state is computed as database = reduce(apply_change, changes, initializer={}).
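
A minimal sketch of that reduction (the action names are illustrative, matching the examples used in this thread):

from functools import reduce

def apply_change(state, change):
    # State is derived, never stored: each record describes one change.
    if "add" in change:
        record = change["add"]
        state[record["hash"]] = record
    elif "uninstall" in change:
        state.pop(change["uninstall"]["hash"], None)
    return state

changes = [
    {"add": {"hash": "x", "name": "zlib"}},
    {"add": {"hash": "y", "name": "cmake"}},
    {"uninstall": {"hash": "y"}},
]
database = reduce(apply_change, changes, {})  # only "x" remains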

scheibelp commented on June 13, 2024

installation time

That's immutable

Ignoring installation time: there is also reference counts, implicit/explicit, deprecation, and anything we'd want to add in the future.

(In the current implementation, install time for records for uninstalled specs can be updated if the spec is installed again: records for uninstalled specs with no references can be retained in some circumstances)

we either have to convince ourselves we can modify in place, or split off into another file that we coordinate (each case like that is an opportunity to trip over data alignment/organization issues I mention above).

It sounds like you're not getting the concept of an append-only file. Please look again at the json example under "Implementation 1". Store a list of changes, don't store state.

In this case the task becomes aggregating the actions into consistent records, vs. that already being done in sqlite. Aside from reference count updates (which would be quite a few updates), all the install record modifications I see happening now seem like they'd be straightforward to handle that way (still not as easy as with an sqlite db, but not so bad).

haampie commented on June 13, 2024

In the current implementation, install time for records for uninstalled specs can be updated if the spec is installed again

Also in the append-only file case, with a new {"add": ...} action (maybe also for overwrite installs? "install" could be a better name then)

Aside from reference count updates (which would be quite a few updates)

Also applies to sqlite if you insist on bumping updated_at on every mutation, technically invalidating more objects than necessary.

But again, storing the ref count is useless, as it can be computed. It only has one use case: selecting dangling nodes in spack gc. And there, (a) the number of bytes saved when reading the db is offset by deleting orders of magnitude more data from the filesystem when uninstalling stuff, and (b) a single query select * from specs where ref_count = 0 is insufficient since you need to do an upward traversal; it's an iterative procedure, so not much is gained. (And for upward traversal we currently would do select * from specs anyway.)
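
For what it's worth, recomputing the ref count from the in-memory records is a single pass; a sketch, assuming each record lists its dependency hashes under a "dependencies" key (an assumption about the record layout):

from collections import Counter

def ref_counts(records_by_hash):
    # Derive reference counts on the fly instead of persisting them.
    counts = Counter()
    for record in records_by_hash.values():
        for dep_hash in record.get("dependencies", []):
            counts[dep_hash] += 1
    return counts

def dangling(records_by_hash):
    # Candidates for `spack gc`: records that nothing depends on.
    counts = ref_counts(records_by_hash)
    return [h for h in records_by_hash if counts[h] == 0]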

haampie commented on June 13, 2024

I named a couple already. Not sure if exhaustive, but what comes to mind:

Advantages of stream of JSON objects:

  • It's a trivial format
  • Can still query it with jq [-s] w/o spack (easier than a json string in a sqlite table imo)
  • Similar enough to the current implementation, so it should be a smaller code change
  • Easy to add functionality like changelog / undo / rollback given a list of changes.
  • Can always project a list of changes to a sqlite db in the future.

Downsides of stream of JSON objects:

  • State is computed, so querying whether /hash is installed is linear time (possibly relevant for build caches)

Advantages of sqlite:

  • Potential future performance benefits for certain Spack commands that deal only with a subset of the database.

Disadvantages of sqlite:

  • sqlite is an optional dependency of Python. There may be minimal systems that disable it, where Spack would be unusable.
  • sqlite does not guarantee that all its forward incompatible changes are opt-in, which can be problematic for build caches
  • Database migrations might be trickier than changing the JSON schema; both the database structure and the JSON contents are versioned
  • Backwards compatibility with read-only databases (upstreams) might be more complicated than with JSON. We can't apply a database migration and then do queries for the latest version -- we can't get rid of old code / queries easily.
  • Graph structures do not map well to tables in relational databases. The advantage of sqlite is that in the future we could avoid a full database read if we only need to extract a sub-dag from the db, but: that requires iterative querying while traversing that DAG, which is additional complexity and not an obvious improvement to me.

scheibelp commented on June 13, 2024

Instead of many small advantages/disadvantages, can you pick what you think is the strongest of them and present that as the clear reason the append-file approach should win?

Advantages of stream of JSON objects:

counterpoints:

  • sqlite is even more trivial
  • (a) who does this (spack find should satisfy users)? and (b) does it work on files which taken together are invalid json but composed of json objects?
  • subjective, both of these will be significant enough that projected line comparisons are not convincing
  • also easy in sqlite
  • Likewise, the sqlite DB could always generate an append file in the future (so I don't consider this an advantage)

Disadvantages of sqlite:

counterpoints:

  • Of all the advantages/disadvantages, this seems the most solid to me, but Spack has a few requirements like make, git, etc.
  • For what we need sqlite for, there will never be issues w/forwards or backwards compatibility
  • Both approaches should read everything into memory and write it out in whatever new format we need; I don't think either approach will be problematic in this sense
  • Similar to the prior point: both approaches read DBs into memory and both need to accommodate older DB versions, so there's no obvious advantage/disadvantage
  • This confuses me: it sounds like you are saying that this possible advantage to sqlite would be hard to implement - at worst that makes it not-an-advantage, not a disadvantage

haampie commented on June 13, 2024

but Spack has a few requirements like make, git, etc.

it requires a database to install anything at all

(Also note that make has not been a dependency since 0.20 or so, which is how it should be)

sqlite is even more trivial

it's non-trivial in the sense that you need sqlite to parse sqlite files. You're probably saying it's trivial to write queries, which is a different thing.

(a) who does this (spack find should satisfy users)? and (b) does it work on files which taken together are invalid json but composed of json objects?

Me, for build caches and on a local db sometimes, even just for pretty printing like jq < index.json | grep .... And yes, otherwise I wouldn't mention it.

This confuses me: it sounds like you are saying that this possible advantage to sqlite would be hard to implement - at worst that makes it not-an-advantage, not a disadvantage

To me, choosing a relational database over a stream of json objects only makes sense if its features are necessary, because it adds a binary dependency to Spack, which I'd rather avoid for the sake of minimizing system requirements. You seem to suggest you don't want to make use of any of the features of a relational database, so why choose it then? I'm suggesting there could be future advantages, like running certain queries more efficiently for large databases. That could be a good reason to pick sqlite now. Except that it's not that clear it will be efficient in general, because relational databases aren't good at graph-related queries (and I don't think it's likely we'll attempt to optimize "lazy traversal" any time soon, given that as it is right now specs fetched from the db contain all connected components, which, half of the time, is half of the db).
