
Comments (69)

eyeonus avatar eyeonus commented on August 16, 2024 1

I've been working on TD for a week straight now, during any free time I had outside of work and sleep. I'm taking a break for a while.

If it ain't a bug, it can wait or someone else can tackle it, is my current mode.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

The problem is cumulative:
During testing, importing listings.csv - a 2.5 GiB file - into an empty database, getting from 10% to 12% took under a minute, while getting from 70% to 72% took just under 10 minutes, within the same import.

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

Can I suggest we expand this concept:

The server side works.
The client side works (apart from one warning, because a Rares station is in lockdown and doesn't appear in market data).

It works.

So I suggest concentrating on optimization - memory use, in any case where that remains an issue, and then speed - rather than adding any new features. Draw a line under it and call it feature complete until we have something which can process imports and output runs in a more reasonable timeframe.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

I saw we recently enabled incremental vacuuming and some other SQLite options to improve performance? It's probably worth turning those off locally and investigating SQLite's features for reporting on what the vacuum will do (keywords: sqlite3, pragma, vacuum, analyze, optimize), which may explain the issue - maybe there's an index that needs tweaking or scrapping.

You also made it more aggressive about flushing the cache (closing the db, even) yesterday, to counter the memory bloat that my simple executemany was causing. So this makes it smell like a transaction issue - either an absence of them at some point or an unexpected nesting of them.

Look for manual transactions (execute "BEGIN"/"COMMIT") and switch those to context-managed cursors so that the sqlite3 lib isn't taken by surprise.
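For example, a minimal sketch of that swap using plain sqlite3 (the table and rows here are illustrative stand-ins, not TD's actual schema):

import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for data/TradeDangerous.db
db.execute("CREATE TABLE Demo (station_id INTEGER, item_id INTEGER, modified TEXT)")

rows = [(1, 10, "2024-04-24"), (1, 11, "2024-04-24")]  # illustrative data

# Instead of db.execute("BEGIN") ... db.execute("COMMIT"), use the connection
# as a context manager: it commits the transaction on success and rolls back
# on an exception, so sqlite3's own transaction handling isn't surprised.
with db:
    db.executemany("INSERT INTO Demo VALUES (?, ?, ?)", rows)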

I also strongly recommend looking at the apsw module; it's designed as a drop-in replacement for sqlite3, but less of it is implemented in Python so it has the potential to be faster - though I haven't benchmarked it with our particular use case.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

https://stackoverflow.com/a/17794805/257645
This is intriguing - it would potentially give us a great way to get the data into the database fast and let SQLite worry about finalization, making it actually earn some of its keep:
https://stackoverflow.com/a/3167052/257645

I strongly recommend using executemany whenever there's a loop, if possible. If you have to cap it, that's fine, but you save a bunch of overhead.
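A rough sketch of what a capped executemany might look like (the batch size and names are made up):

import itertools
import sqlite3

BATCH = 10_000  # arbitrary cap, just to keep the pending list bounded

def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Listing (station_id INTEGER, item_id INTEGER, price INTEGER)")

listings = ((s, i, 100) for s in range(100) for i in range(50))  # illustrative rows

for chunk in batched(listings, BATCH):
    with db:  # one transaction per batch
        db.executemany("INSERT INTO Listing VALUES (?, ?, ?)", chunk)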

Did you try running the imports with pyinstrument?

python -m venv venv
. venv/Scripts/activate.ps1   # or . venv/bin/activate on mac/linux
pip install -r requirements/dev.txt
pip install pyinstrument
pip install -e .
cd tradedangerous
pyinstrument ./trade.py import ...

at the end it will give you a bunch of blurb and you can customize how it does that, but I tend to run it like the above and then grab the session from the last two lines:

To view this report with different options, run:
    pyinstrument --load-prev 2024-04-24T10-19-42 [options]

and run something like:

> pyinstrument --load-prev 2024-04-24T10-19-42 -r html >speed.html  # local output
> start speed.html
# or
> pyinstrument --load-prev 2024-04-24T10-19-42 -r speedscope >speed.scope
# and upload that file here: https://www.speedscope.app/

Sample speed.scope from an eddblink -O solo: speedscope.zip - unzip it, hit https://www.speedscope.app/, and drag the file onto the browse button. Use the buttons in the top left to change the type of view.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

The only transaction being performed when starting from an empty database is:

        listingStmt = """INSERT OR IGNORE INTO StationItem
                        (station_id, item_id, modified,
                         demand_price, demand_units, demand_level,
                         supply_price, supply_units, supply_level, from_live)
                        VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )"""

Removing the "OR IGNORE" part doesn't have any effect.

eddblink used to use the executemany function after it finished gathering the list of all changes to be made: it appended the values for each change to the list listingsList inside the for listing in listings loop, then ran executemany(listingsStmt, listingsList) after exiting the loop, which resulted in about 6 hours (on my machine) of nothing appearing to happen while the database was being updated.

Changing it to what it is now, where each listing is inserted during processing with execute(listingStmt, {values}), still took about 6 hours, but at least there was some indication of progress.

The pragmas that were introduced knocked it down to a little over 2 hours.

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

The pragmas were me. The WAL journal has a massive speed benefit on Linux, though it seems less so on Windows, and the vacuum keeps the db size to a minimum. The server is set up to vacuum every couple of days, and I do want it to be incremental, as a full vacuum takes a chunk of time. That said, the first vacuum and optimise on the server's db reduced it from 6 GB and change to 4 GB and change, so it's definitely worth it.
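For reference, these are roughly the settings in question (a sketch, not necessarily the exact pragmas TD sets; note auto_vacuum only takes effect on an existing database after one full VACUUM):

import sqlite3

db = sqlite3.connect("TradeDangerous.db")  # illustrative path

# Write-ahead logging: writers don't block readers and commits get cheaper.
db.execute("PRAGMA journal_mode = WAL")

# Incremental vacuuming: reclaim free pages a chunk at a time instead of
# paying for a full VACUUM in one go.
db.execute("PRAGMA auto_vacuum = INCREMENTAL")
db.execute("PRAGMA incremental_vacuum(1000)")  # reclaim up to 1000 free pages

# Keep the query planner's statistics fresh.
db.execute("PRAGMA optimize")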

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

I hope I didn't sound like I was handing out assignments :( if you can run the pyinstrument command on the server and snag a .scope file it might be really interesting.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

I hope I didn't sound like I was handing out assignments :( if you can run the pyinstrument command on the server and snag a .scope file it might be really interesting.

I hope I didn't sound like I was feeling put upon. I'm just so tired from working on this for a week solid and need a break for a while.

I'm running pyinstrument; I'll post the results for you if you want to have a look.

It definitely seems to be more of an issue on my *nix box than on Tromador's Win one.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

Jonathon first, trade calculator for online game later, man.

TD is a vague distant memory for me - I honestly thought one of you added the plugin system after I'd ambled off, and I was kinda wondering wtf the saitek crap was until I looked at one of the files and saw the first line comment. cough

Point: I don't know what the priorities are or the full details of what's wear and what's tear. If you think I can productively be of use somewhere specific, you'll need to point me, and while I'm human and will get my knickers in a twist or my butt hurt in the moment when it comes up that I'm responsible for any particular cretinous decision or demonstration of derp, I'm not a diva: my first job was for a UK "national" contract software company as an embedded engineer, and when the company went belly up I got offers from each customer I'd worked for to go finish development, but I either didn't really "get" the companies or feel adequate - except for one, in my home town, on the fishing docks. So I accepted their offer, which came with a caveat: they weren't big enough to pay for a programmer; they were a cold-store/blast-freezer/fish-packing plant. So if it got busy, I worked packing fish, loading/unloading 50lb boxes of frozen shark bellies or halibut. Yeah, I've worked some fancy places since, but I didn't stay.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

In some countries, TD is almost old enough to start putting bread on the table :)

image

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

It definitely seems to be more of an issue on my *nix box than on Tromador's Win one.

We aren't on identical hardware though, so although I agree from our testing that Linux and Windows do seem to behave differently, don't forget I have lots of CPU while you have more RAM than me. It's therefore hard to draw comparisons based purely on OS.
The server is another game entirely, running an archaic version of CentOS, such that I have had to build TD (from source) its own special environment to run in, with its very own sqlite3 and OpenSSL libraries and a super special Python just for it, along with all the environment shenanigans to make it not use the outdated system versions. Not to mention it's kinda short on RAM - but then again it's a free vhost, and server hosting isn't cheap.

I do tend to waffle when I am tired; point being, there are many variables outside of OS, particularly hardware.

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

In some countries, TD is almost old enough to start putting bread on the table :)

I quite deliberately put Est 2015 at the start of the thread on the E:D forums. Other tools may be quicker, flashier, whatever, but nothing else can do what we can and nothing else has our stubborn will to survive.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

I don't know what the priorities are or the full details of what's wear and what's tear. If you think I can productively be of use somewhere specific, you'll need to point me,

This is my list of things needing doing at some point. The Niceness value is how important I consider each item, with lower being more important, just like *nix CPU prioritization.


1) N = -20

Anything you can do to improve import performance is welcome. On my box, with the current TD, both spansh and eddblink take a bit over 3 hours to do an import. Obviously different machines will take different times, but ideally anything but a potato should have sub 1 hour import duration. My box has an 8-core CPU with 64GB RAM, so it's not a potato. (Also I checked the drive, and it has 45MB/s+ write speeds, but neither import ever went above 5MB/s, so I highly doubt that's the bottleneck for me.)

2) N = 5

The spicing up of the visuals is certainly welcome - I like what you've done there, and would be pleased to see the visual improvements you made to spansh applied to eddblink (and the rest of TD) as well.

3) N=-10

TD needs to be able to build the other tables. We have two reliable sources: the daily dumps from Spansh, and the EDCD lists, such as rare_items.csv.

We currently use the Spansh data (and EDDN via the listener) to fill out System, Station, Item, and StationItem, and could potentially use it to fill out Ship, ShipVendor, Upgrade, and UpgradeVendor.

We currently use EDCD to fill out FDevOutfitting and FDevShipyard, and could potentially use it to fill out RareItem, Item, and Category.

There's obviously some overlap, so in the case where we can use either, I think EDCD is the better option.
Spansh (and listener): System, Station, StationItem, ShipVendor, UpgradeVendor, Ship, Upgrade
EDCD: FDevShipyard, FDevOutfitting, RareItem, Item, Category

That leaves Added as the only one that can't be easily updated, but as that's only used in System, to indicate (I think?) which version of Elite Dangerous a system was added to the galaxy in, I feel like it should individually get N=20.

Getting everything populated via Spansh means updating to utilize the information provided by the dump that is currently being ignored, as well as pulling from the EDCD sources when building the DB from scratch or when they've been updated.

Getting everything populated via the TD listener means listening to the other EDDN messages besides just the commodity ones, and processing them accordingly. Since the listener uses Spansh, this only matters for things which change more often than once a day. The eddblink plugin pulls from Trom's server, so getting this into the listener automatically gets this into eddblink as well.

4) N=-18

Rescue Ships are EVIL to TD, because not only do they move, they also change their name, as they are named after whatever station they are currently rescuing. This results in duplicate entries in the prices file whenever two different rescue ships go to the same station.
For example, "Rescue Ship - Arc's Faith" has two entries with different station_ids because it's been visited by two different rescue ships. The FDev IDs for the ships are different, and searching EDSM, I found one of them is listed as being at HIP 17694 \ Hudson Observatory and the other at Didio \ Laumer Orbital.
The next Spansh dump should have those updates, but even with that, both of the rescue ships are now named differently, since neither one is at Arc's Faith anymore. Renaming the station needs to be done so that users don't go looking for the wrong station, and maybe adding the FDev ID to the name to prevent the duplicate-station error is a good idea?

5) N=12

Working GUI, as in feature parity with TDHelper.
If you have TD installed via pip, you can run tradegui and see the awful attempt I've got so far.

6) N=20

Update the wiki to account for changes made to TD since the last time it was updated.


I can probably get all that done myself, but I know it'll take a long time and be coded poorly, because I'm entirely self-taught on this stuff. I'm currently in college for computer science, so I'm better at this than I was when I picked up the torch, but I'm still learning, and most of it I'm learning the hard way.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

Which reminds me, I set this to run before I left for work.
speed.tar.gz

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

I'd venture to add. Import performance is the biggest problem as eyeonus says. No doubt. But...

I reinstalled E:D seeing as we had a working TD. I guess I was feeling nostalgic. So I have been using it in anger, and there are in fact two things which leave me twiddling my thumbs. The first is absolutely importing data. The second is exporting data, by which I mean getting query results from trade run. The size of the dataset has grown in just the same way for making queries, so the tree of possibilities which must be examined has become vast. This can be mitigated by filtering down - excluding by age, for example, or no-Odyssey is a good one - but if you want to do a full check with a huge hold, enough money to fill it with whatever, and a jump range sufficient to cross the bubble? It's not usable. I have taken to lying about my real jump range to constrain the tree size.

So yes. Import speed does need optimization, but so does the output.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

@eyeonus The visual stuff would ultimately be about code reduction - if rich had existed 10 years ago I'd not have written progressbar etc. I was looking at the transfers code, and it feels like half that module is just about trying to get "requests" imported. That's covered now by the requests.txt file, right?

GUI I always figured that would end up being web-based but the web-app concepts at the time weren't really much use and I had reasons not to really want to get too much into html. But perhaps 'gradio' would be an option.

So that brings us to the meaningful work, and I guess I always suspected TD would find itself struggling for trying to be a cold-runner rather than some kind of service.

Just thinking out loud here: we pay an entry fee for storing our data. Every operation we need has to be expressed textually as SQL and parsed by the SQLite code, then SQLite has to plan what to do and execute on it - a lot of CPU cycles just to start reading. The ordinary payoff is a sort of storage JIT: tell the engine what you want in the abstract and it can figure out the fastest way to do it. But we're not doing elaborate queries with sophisticated multi-table joins, groups, sorts, wheres, etc.

I mean, SELECT * FROM System is actually better for us than SELECT * FROM System ORDER BY .... We don't have geospatial indexing turned on, and the SQLite options there would hurt us anyway.

Spitballs, consider each individually not as a sequence:

  • Incremental imports

for system in systems:
    INSERT system INTO SystemIncoming
    for station in stations:
        INSERT station INTO StationIncoming
        executemany(
            INSERT item INTO ItemIncoming FOR item in station.items
        )
if thread: thread.join()
thread = Thread( move *incoming -> * )

Using the temp table means we can avoid either building our own list or the cost of per-item queries or database indexes while we're doing this work.

We can also parallelize the work here; we fully populate the temporary table without index overheads etc, and then have sqlite move things over. Now we're making SQL pay for itself because the query planner/executor can use its own internal data about quantities and stuff to do the transfer.

That's the theory - I don't know if it's true for SQLite. (A rough sketch of this staging-table idea follows at the end of this list.)

  • Persistent database

We intuitively get why a movie ticket costs $25 when renting the film costs $5, but TD is paying $25 to rent. SQLite's benefit is that you don't need a server; its con is that you don't get a server doing all the heavy lifting in another thread/process, or possibly on a different CPU. That's a fundamental part of the hitch here. We've turned on indexes and joins. We've started doing vacuums, and the problem is they can only happen while there's a process running - and we're it.

We could:
a) have TD split itself in two: a TradeAdder that implements TradeDB and soaks up the costs of keeping the database fresh, and a TradeGecko that shuttles stuff back and forth between the user and the backend; not entirely implausible, mostly separating concerns and building facades/bridges, or
b) make it possible, or even the default, to have a persistent database spun up one of several ways - docker, podman, VM, local or remote URL - and use that, with sqlite as a naive fallback.

This is where we might look to something like SQLAlchemy, PeeWee or pony. It's also where we might actually change some of the operations encumbered on TradeDB so that it doesn't have to in-memory everything.

  • Persistent td/REPL

A variant of (a) above: we literally just make TD keep itself alive over a (probably non-SQL) persistent store - either embedded or a second process; key-value or NoSQL: leveldb, rocksdb, memcache - and then have a very simple API for talking to it that it can use on itself; either through JSON-RPC with something like flask/eel, or literally have it read stdin and accept/process trade commands and pass them to the existing command-line arg parser:

> trade.py repl
trade@SOL$ import -P eddb -O solo

via pseudo code:

import shlex, sys

for line in sys.stdin:
    args = ["trade.py"] + shlex.split(line)  # deals with quotes etc
    main(args)  # <-- the old main, which can't tell the args didn't actually come from the command line

  • async or threading

I don't know off the top of my head where we can leverage this right now to any good use; I created an experimental Rust-based python-package providing a rust version of the "count lines in file" method. That's a hard one to optimize because cpu isn't the main bottleneck. I did a version where it would on startup create a number of counting workers, and then you simply gave them byte-regions to count from.

And there was no version where it wasn't slower with concurrency. I wasn't exhaustive by any means, but you're trying to apply a very simple set of logic and rush through memory, so there's not a lot of juggle room.
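Here's a minimal sketch of the staging-table idea from the first spitball above (table names and the merge statement are placeholders, and whether it actually wins on SQLite is exactly the open question):

import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the real TradeDangerous.db
db.executescript("""
    CREATE TABLE StationItem (station_id INTEGER, item_id INTEGER, price INTEGER,
                              PRIMARY KEY (station_id, item_id));
    -- Pristine staging table: no indexes, no constraints, so inserts are cheap.
    CREATE TABLE StationItemIncoming (station_id INTEGER, item_id INTEGER, price INTEGER);
""")

incoming = [(s, i, 100) for s in range(10) for i in range(5)]  # illustrative rows

with db:
    db.executemany("INSERT INTO StationItemIncoming VALUES (?, ?, ?)", incoming)

# Then let SQLite do the merge in one statement, where its planner can use its
# own knowledge of the data instead of us paying per-row overheads in Python.
with db:
    db.execute("INSERT OR REPLACE INTO StationItem "
               "SELECT station_id, item_id, price FROM StationItemIncoming")
    db.execute("DELETE FROM StationItemIncoming")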

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

Which reminds me, I set this to run before I left for work. speed.tar.gz

that is uniquely ... lacking info lol. I've never seen it give that little information. what are the specs on the server?

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

Which reminds me, I set this to run before I left for work. speed.tar.gz

that is uniquely ... lacking info lol. I've never seen it give that little information. what are the specs on the server?

That's not the server, that's my machine.

Tromador would have to provide you with any server runs.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

@eyeonus The visual stuff would ultimately be about code reduction - if rich had existed 10 years ago I'd not have written progressbar etc. I was looking at the transfers code, and it feels like half that module is just about trying to get "requests" imported. That's covered now by the requests.txt file, right?

??? What requests.txt file?

GUI I always figured that would end up being web-based but the web-app concepts at the time weren't really much use and I had reasons not to really want to get too much into html. But perhaps 'gradio' would be an option.

I was considering using pySimpleGUI, I'll have to look at gradio, never heard of it.

So that brings us to the meaningful work, and I guess I always suspected TD would find itself struggling for trying to be a cold-runner rather than some kind of service.

Spitballs, consider each individually not as a sequence:

* Incremental imports

Since the speed of the import goes down as the table gets bigger, this sounds like a great idea. The vast majority of the information goes into StationItem: all the other tables, completely filled with the spansh data, don't amount to even 0.5 GB, but add in the market data and it grows to ~7 GB. So while I doubt we'd see much improvement doing this on any of the other tables, it might be a big help for that particular one.

* Persistent database
* Persistent td/REPL

I"m sorry, that's all over my head. I defer to your expertise.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

| I"m sorry, that's all over my head. I defer to your expertise.

Oh, just to make you regret saying that:

| ??? What requests.txt file?

... was a typo of "requirements.txt". What I meant was: in order for anyone to pip install Trade-Dangerous, it depends on requirements.txt on its own, right?

Duh. Gonna check that real quick, dum-me:

image

so it should be safe to remove all that crap screwing around to make sure requests is installed (in transfers.py), haha

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

Oh. Yes.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

| I was considering using pySimpleGUI, I'll have to look at gradio, never heard of it.

I don't have useful input on GUIs really; 'gradio' is - I believe - what people are using to build web-based-gui-apps for things in python, so it's not sexy but you spend a lot more time on what the gui elements do than on the screen itself. So, for instance, "StableDiffusion" has a gradio-based UI, as does the popular "run your own local GPT" https://github.com/oobabooga/text-generation-webui... https://www.gradio.app/ Yeah, I know it says machine learning, but they're pitching to their new market because of those last two.

I've used flask for building UIs before but then you need to build an entire front-end application in angular or react or cut-my-wrists-for-me-now-please.

If you're going to be old school about it and do a desktopy app; wxWidgets is shit, but you can keep it relatively simple and use 'wxFormBuilder' to design the app - I've done that a few times lately: https://github.com/wxFormBuilder/wxFormBuilder.

So pySimpleGui might be rockstar - check that out before wxWidgets, because I hate it as much as I use it.

You could conceivably do something with Dear ImGui - I found it easy enough to use that I wrote wrappers for it in C++ (https://github.com/kfsone/imguiwrap) when I used it a few years ago. The Python version is OK, but its best use case imho is for in-game debug screens/menus etc.

So I'll look at doing some experiments on the item import and how it is currently working.

How often has Frontier been adding items?

I wonder if it would make sense to actually alter the row format of StationItem so that there are two rows per station: one for selling, one for buying.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

Right now I'm trying a little experiment that reintroduces a per-station cursor; we're not building a list of items for executemany, but we're also not paying the full cost of adding each and every item in its own mini-transaction. Also breaking up the code a little bit where we're not invoking methods in loops and can afford the extra 40-50 ns it takes to call a method in Python, lol.

So for right now I think it would be useful for us to look at:

  • apsw instead of sqlite3; it pulls some tricks like automatic stored procedures that we might benefit from where we're not using executemany. Replacing sqlite3 across the whole program will take a bit of work, but just replacing it in one plugin might be fairly easy for benchmark purposes. (A minimal sketch follows after this list.)

  • incremental import, i.e. write all the new entries into a pristine, unindexed table, and then let sqlite transfer them. Could either be radically worse, since it's doing "more" work, or radically faster because it's the database doing that work, and the overheads we pay are supposed to make exactly that kind of thing faster...

  • alternative storage backend; perhaps literally take eddblink_plug and replace the sqlite3 calls with something that talks to postgres or mysql and see what it's like having it dump the data into that? Not a full transition or anything, just a naive "what if".
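For the apsw bullet, a minimal taste of the swap (illustrative only; apsw doesn't do sqlite3's implicit transaction handling, hence the explicit context manager):

import apsw  # pip install apsw

db = apsw.Connection(":memory:")  # stand-in for data/TradeDangerous.db
cursor = db.cursor()
cursor.execute("CREATE TABLE Demo (station_id INTEGER, item_id INTEGER, modified TEXT)")

rows = [(1, 10, "2024-04-24"), (1, 11, "2024-04-24")]  # illustrative data

# Unlike sqlite3, apsw doesn't open transactions behind your back; using the
# Connection as a context manager wraps the block in one.
with db:
    cursor.executemany("INSERT INTO Demo VALUES (?, ?, ?)", rows)

print(next(cursor.execute("SELECT count(*) FROM Demo")))  # -> (2,)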

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024
Win|PS> python -m apsw .\data\TradeDangerous.db
SQLite version 3.45.3 (APSW 3.45.3.0)
Enter ".help" for instructions
sqlite> select count(*) from item;
┌──────────┐
│ count(*) │
│ 395      │
└──────────┘
sqlite>

forget the row-shape idea then lol

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

Just so you know - I'm not remotely above figuring out how to get an AI to help me, but I'm also not expecting it to be actual help. However... Giving it the ability to read the repository definitely makes it more useful:
https://github.com/kfsone/ggama

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024
$ ls -1sh /dev/sda /dev/sdb
/dev/sda 1PB
/dev/sdb 15MB
$ rm -rf pride  # surplus to requirements
$ mv /sdb/shame /sdb  # storage upgrade required

image

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

Do we care about the added column any more?

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

Do we care about the added column any more?

I gave the Added TABLE an N=20

I never cared about that column. :D

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

Just want to say, the recent PR is definitely an improvement. Even with the additional 'optimization' step, it finishes processing both listings and listings-live faster than the old way could get through just listings.

I'm happy, and extremely grateful.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

Just doing an experiment with the "import to a different table first" strategy. It took 45 minutes to do the import the first time; then I repeated it, and it takes less than 1 minute to write the data to the unindexed table.

Now it comes down to how long it's going to take to merge the tables.

We may need to ditch the partial indexes.

CREATE INDEX si_itm_dmdpr ON StationItem(item_id, demand_price) WHERE demand_price > 0;
CREATE INDEX si_itm_suppr ON StationItem(item_id, supply_price) WHERE supply_price > 0;

and instead, have separate tables for demand and supply.

Or maybe we should drop the indexes before the transfer and re-add them.
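The drop-and-recreate variant is simple enough to sketch (index DDL copied from above, the rest illustrative; whether the one-off rebuild beats maintaining the indexes during the transfer is the open question):

import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for data/TradeDangerous.db
db.execute("CREATE TABLE StationItem (station_id INTEGER, item_id INTEGER, "
           "demand_price INTEGER, supply_price INTEGER)")
db.execute("CREATE INDEX si_itm_dmdpr ON StationItem(item_id, demand_price) WHERE demand_price > 0")
db.execute("CREATE INDEX si_itm_suppr ON StationItem(item_id, supply_price) WHERE supply_price > 0")

# Drop the partial indexes so the bulk transfer isn't paying to maintain them...
with db:
    db.execute("DROP INDEX IF EXISTS si_itm_dmdpr")
    db.execute("DROP INDEX IF EXISTS si_itm_suppr")

# ... the bulk INSERT ... SELECT from the staging table would go here ...

# ...then rebuild them once, at the end.
with db:
    db.execute("CREATE INDEX si_itm_dmdpr ON StationItem(item_id, demand_price) WHERE demand_price > 0")
    db.execute("CREATE INDEX si_itm_suppr ON StationItem(item_id, supply_price) WHERE supply_price > 0")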

https://github.com/kfsone/Trade-Dangerous/tree/kfsone/import-via-table -- check it out;

Expect low presence from me over the weekend, wife points low.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

I remember why I added those indexes - a major difference between a persistent database service and an embedded database process causes a major hitch when you try to get a list of only the items that are actually being sold from stations, because the 'supply_price > 0' ends up being expensive in a where clause, where a persistent database would deal with it for you.

But I don't remember what I added them for - as in where the pain point surfaces.

Option a: Refactor StationItem thus:

CREATE TABLE StationItemSupply (station_id, item_id, credits, units, level, modified);
CREATE TABLE StationItemDemand (station_id, item_id, credits, units, level, modified);
CREATE VIEW StationItem (station_id, item_id, demand_price, demand_units, demand_level, supply_price, supply_units, supply_level, modified) AS
... well, I'm not sure off the top of my head ... (one possible shape is sketched after this list)

Option b: ditto, but change references to StationItem accordingly rather than creating the view,

Option c: add "in_demand" and "in_supply" bools and make the where based on that -- this becomes useful as long as we don't change the field;

Option d: add a separate table with the in_ flags since that might be easier to avoid updating.

Option e: use NULL -- that is, make one or more of the fields NULLABLE as a way to say "not present", the theory being that value and nullness are two different things, and it may be easier for the db to early-out when it detects that nullability did not change. This is an open question.

Option f: Switch to a different storage engine
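For option (a), one possible shape for the view - an illustrative guess only, assuming SQLite >= 3.39 for FULL OUTER JOIN and the column names sketched above:

import sqlite3

db = sqlite3.connect(":memory:")  # needs SQLite >= 3.39 for FULL OUTER JOIN
db.executescript("""
    CREATE TABLE StationItemSupply (station_id, item_id, credits, units, level, modified);
    CREATE TABLE StationItemDemand (station_id, item_id, credits, units, level, modified);

    CREATE VIEW StationItem (station_id, item_id,
                             demand_price, demand_units, demand_level,
                             supply_price, supply_units, supply_level, modified) AS
    SELECT COALESCE(s.station_id, d.station_id),
           COALESCE(s.item_id, d.item_id),
           d.credits, d.units, d.level,
           s.credits, s.units, s.level,
           MAX(COALESCE(s.modified, ''), COALESCE(d.modified, ''))
      FROM StationItemSupply AS s
      FULL OUTER JOIN StationItemDemand AS d
        ON d.station_id = s.station_id AND d.item_id = s.item_id;
""")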

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

Another way to deal with this would be bitmaps: assign every station and item unique ids and then create bitmaps for each access. They'd be kind of big, but if you can mmap them then queries against them become pretty trivial, and the filesystem ought to be able to eliminate empty pages; or we use a hash-bucketing approach to constrain the quantity of pages.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

We do use unique ids for each station and item, we're using the FDev ids, and those are unique.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

Ah, I mean a "local", 0-based unique id - we could probably use row_id except we have no control over when they gets recycled.

If we use the fdev ids - they get quite massive, and we don't want to create a bitmap which has 128,000,000 empty bits at the start to reach the first id :)
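To illustrate the idea (throwaway local numbering; nothing here is TD's actual schema):

# Map big, sparse FDev ids down to dense 0-based local ids; a "stations
# selling this item" set is then just a bitmap indexed by local id.
station_fdev_ids = [128000042, 128000999, 3228342528]  # illustrative ids

local_id = {fdev: n for n, fdev in enumerate(sorted(station_fdev_ids))}

bitmap = bytearray((len(local_id) + 7) // 8)  # one bit per station

def set_bit(bm, n):
    bm[n // 8] |= 1 << (n % 8)

def has_bit(bm, n):
    return bool(bm[n // 8] & (1 << (n % 8)))

# "this item is sold at station 128000999":
set_bit(bitmap, local_id[128000999])

print(has_bit(bitmap, local_id[128000042]))  # False
print(has_bit(bitmap, local_id[128000999]))  # True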

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

Oh. So the filesystem-eliminates-empty-pages thing, and the hash-bucketing thing, wouldn't eliminate the empty bits in that case? Or am I misunderstanding something?

from trade-dangerous.

ultimatespirit avatar ultimatespirit commented on August 16, 2024

Can I suggest we expand this concept:

The server side works. The client side works (apart from one warning, because a Rares station is in lockdown and doesn't appear in market data).

It works.

Just wanted to chime in that between v10.16.14 (2024-04-23) and v11.1.0 (2024-04-27) (it is quite frankly ridiculous that a new version is being created seemingly every single commit by an automated spam bot btw) this statement does not appear to be true anymore.

Since it's about impossible to figure out what a stable release could be I can't really test if this is just my setup or not, but it's likely at some point in the refactoring process (which is great to see by the way, cutting out the weird multiple layers of database formats would definitely make this clearer) the buildcache functionality completely broke. In an incredibly annoying way too.

As I've found out, even doing $ trade.py import -P eddblink (as per the wiki) will not spare you from making the sqlite database, so any command will start trade.py buildcache -f, it looks like. This command then promptly fails from the rescue megaship problems above, causing it to crash.

Okay great, pruning the duplicates (only 5 of them or so) by age of last entry (none of their data is even from this year...) got it to continue. Somewhere between one and two hours of single-threaded Python parsing later, it parses the 4.1 GB prices file eddblink presumably created on initial import into a 6.8 GB data/TradeDangerous.new sqlite database, next to the already existing (I guess also downloaded by eddblink) 6.1 GB data/TradeDangerous.db file. Aaaand then it crashes from apparently closing its own database connection:

$ ./trade.py run --from "any-system-here" --credits 50m --capacity 192 --ly-per 15 --ls-max 1100 --routes 3 --hops 5 --loop --progress --summary

NOTE: Rebuilding cache file: this may take a few moments.
NOTE: Import complete: 44257630 new items over 243,411 stations in 23,048 systems
Traceback (most recent call last):
  File "...Trade-Dangerous/./trade.py", line 44, in <module>
    cli.main(sys.argv)
  File "...Trade-Dangerous/tradedangerous/cli.py", line 66, in main
    trade(argv)
  File "...Trade-Dangerous/tradedangerous/cli.py", line 96, in trade
    tdb = tradedb.TradeDB(cmdenv, load=cmdenv.wantsTradeDB)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...Trade-Dangerous/tradedangerous/tradedb.py", line 618, in __init__
    self.reloadCache()
  File "...Trade-Dangerous/tradedangerous/tradedb.py", line 700, in reloadCache
    cache.buildCache(self, self.tdenv)
  File "...Trade-Dangerous/tradedangerous/cache.py", line 1004, in buildCache
    tempDB.commit()
sqlite3.ProgrammingError: Cannot operate on a closed database.

That's very likely caused by this db.close() line in the processPricesFile() function in cache.py:

7c60def3 tradedangerous/cache.py (Oliver Smith          2024-04-24 11:16:29 -0700  709)     tdenv.DEBUG0('Committing...')
9e7d4834 cache.py                (Oliver Smith          2014-12-25 23:24:00 -0800  710)     db.commit()
7c60def3 tradedangerous/cache.py (Oliver Smith          2024-04-24 11:16:29 -0700  711)     db.close()
d509fda7 tradedangerous/cache.py (eyeonus               2019-02-14 12:51:07 -0700  712)     

As called by cache.buildCache() here; note that tempDB is db in the above function, which now gets closed (and would be closed again had the commit succeeded - a classic use-after-free):

b9be772e cache.py                (Oliver Smith 2014-11-27 03:37:23 -0800  995)     if pricesPath.exists():
7af37fa7 cache.py                (Oliver Smith 2014-12-02 20:41:11 -0800  996)         processPricesFile(tdenv, tempDB, pricesPath)
44b8f74f cache.py                (Oliver Smith 2015-01-01 12:47:52 -0800  997)     else:
44b8f74f cache.py                (Oliver Smith 2015-01-01 12:47:52 -0800  998)         tdenv.NOTE(
44b8f74f cache.py                (Oliver Smith 2015-01-01 12:47:52 -0800  999)                 "Missing \"{}\" file - no price data.",
44b8f74f cache.py                (Oliver Smith 2015-01-01 12:47:52 -0800 1000)                     pricesPath,
a7dba2f6 tradedangerous/cache.py (Oliver Smith 2024-04-21 21:25:52 -0700 1001)                     stderr=True,
44b8f74f cache.py                (Oliver Smith 2015-01-01 12:47:52 -0800 1002)         )
d509fda7 tradedangerous/cache.py (eyeonus      2019-02-14 12:51:07 -0700 1003)     
b6f3a49f cache.py                (Oliver Smith 2014-12-17 23:26:36 -0800 1004)     tempDB.commit()
8fb2c0b4 buildcache.py           (Oliver Smith 2014-11-15 21:33:09 -0800 1005)     tempDB.close()
7c60def3 tradedangerous/cache.py (Oliver Smith 2024-04-24 11:16:29 -0700 1006)     tdb.close()

I'm a bit confused about where the data/TradeDangerous.db file I have is from, for there to be 44,257,630 new items, tbh. Based on the git issues here, it sounds like the EDDBLink server spends a lot of time importing the several-GB JSON dump file to make a database of the latest data that users can consume. Perhaps the wiki is just very out of date and the way to do that is different, but my usage was instead downloading a several-GB set of files from that server to... spend a lot of time making the latest database (which then crashes) and end up with nearly as many new items as the entire length of TradeDangerous.prices.

Oh, and to make this even better, it seems to treat the first crash as having produced a legit database - I'm guessing, since I have a 144 KB data/TradeDangerous.old that doesn't get replaced by the TradeDangerous.new file upon retry; instead it just clobbers TradeDangerous.new, so 2 hours of processing down the drain, oops. My fault for not backing everything up at every step of the way for fear of the program shooting itself in the foot.

TL;DR / Summary section

There's some diagnostic stuff above: you have a use-after-free error introduced in commit 7c60def3, presumably as debugging. Then some "hey, as a prospective user who isn't also an active developer on this" feedback: this current development spike is great, but your safe-release processes are lacking, making this not very usable at the moment.

I installed via git clone ..., but your pypi repo is set up to auto-publish every erroneous tag that spam bot publishes. It goes from version 10.13.10, published on 2023-02-13, to 10.14.2 on 2024-03-23, and then 32 new versions from 2024-04-15 to 2024-04-27. At v11.0.0 the breaking bug from 7c60def3 gets added in. All versions afterwards are guaranteed broken (this bug clearly hasn't been found yet). With git clone ... I can somewhat see a reasonable workflow of saying "our default branch is the dev branch, checkout a tag if you want stability", but pip install... is right now completely broken too.

Because there is no stable release. Releases are autogenerated from the current pinned branch release/v1 by that GitHub bot, and you all have been developing on release instead of a development branch. I'm guessing v10.14.2 is the last "stable" and working release, as that should be when the spansh update info was added in, but I'm not really sure there - that's just a guess from reading the GitHub issue threads.

Please, either disable that tag bot or start developing on a dedicated development branch rather than all in one branch. Or both, both is pretty good too. And then you'll need to publish another breaking change major version v12.0.0 that rolls back to whatever is the last known good stable release, or at least somewhat stable, for pip.

Oh, also, it would be quite nice if the for item in items: ... db update loop in processPricesFile() had at least some progress indicator, something like every len(items)//10 items printing how many items are left would be grand. Being able to restart from where it left off rather than clobbering the DB would be great too, but understandably that's likely more complicated to handle consistently. Aaaand in case y'all weren't aware somehow, the type hints are great, but you have type errors being exposed by them. Either functions were too narrowly typed (since the code somehow works), or there are logic errors going uncaught, though understandable if it's just being treated as a complete debug / dev branch warts and all.

And I apologise if the tone comes off a bit miffed - debugging this mess was quite the chain of "why is it doing that?" frustrations I wasn't expecting to need for a supposedly stable tool that looks really cool and that I'd love to be able to use and support, not helped by it deleting its own database. Y'all are doing great stuff. Just, pretty please, stop developing on and pushing to "release", or stop the bot spamming release tags every commit, or both.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

Thank you for catching that. I had no idea that error was there; obviously I haven't been testing the code well enough. As for the "developing on release" thing, that's also entirely me - Oliver has been developing on his own private branches and issuing PRs.

As for the bot, that's not going away. That said, since I shall now be endeavoring to work on a branch and merge rather than work on main, you'll be seeing a lot fewer releases being made. It doesn't make a new release for every commit, but I won't get into the details of how it decides to do so, as that's probably a bunch of information you don't care about. If you do, look up 'python-semantic-release'; it'll do a better job explaining than I could anyway.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

I'm currently testing my fix to the cannot operate on closed DB issue you found.

from trade-dangerous.

ultimatespirit avatar ultimatespirit commented on August 16, 2024

I just finished confirming the full rebuild finished and trade.py worked again with this change:

$ git diff
diff --git a/trade.py b/trade.py
old mode 100644
new mode 100755
diff --git a/tradedangerous/cache.py b/tradedangerous/cache.py
index 9523438..449c81c 100644
--- a/tradedangerous/cache.py
+++ b/tradedangerous/cache.py
@@ -668,7 +668,11 @@ def processPricesFile(tdenv: TradeEnv, db: sqlite3.Connection, pricesPath: Path,
     removedItems = len(zeros)
     
     if items:
-        for item in items:
+        progressSize = len(items)
+        progressMark = max(1, progressSize // 10)
+        for (nu, item) in enumerate(items):
+            if nu % progressMark == 0:
+                print(f"Processing cache items, {nu}/{progressSize} = {nu * 100 // progressSize}% (10% = {progressMark})")
             try:
                 db.execute("""
                     INSERT OR REPLACE INTO StationItem (
@@ -708,7 +712,6 @@ def processPricesFile(tdenv: TradeEnv, db: sqlite3.Connection, pricesPath: Path,
     
     tdenv.DEBUG0('Committing...')
     db.commit()
-    db.close()
     
     changes = " and ".join("{} {}".format(v, k) for k, v in {
         "new": newItems,

The only necessary change is the removal of the additional db.close() in processPricesFile(); the print statements are just there for that progress estimator I spoke of above. Obviously, if you did want to have more progress marks there, it should use tdenv.INFO() or whatever rather than the prints - they're just included since they were already in my diff output.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

@ultimatespirit appreciate the diligence and the detailed analysis; apologies to both of you for the churn I've caused lately - @eyeonus I'll try to do better about making less work for you man. I wanna make sure you don't think I'm gonna be a diva if you have to put a flea in my ear or point out I'm a nub.

| Oh. So the filesystem-eliminates-empty-pages thing, and the hash-bucketing thing, wouldn't eliminate the empty bits in that case? Or am I misunderstanding something?

The eliminates-empty-pages thing would be "sparse files", but when I say "empty pages" I'm thinking in terms of a handful, as opposed to the file starting with 3906 empty pages (the pages required to represent the 128,000,000 unused IDs at the start of the ID space).

But I'm spitballing there. The hash-bucket approach has pros and cons, primarily that it requires a computation to get from actual fdev-id to bit number in the bitmap, and after doing it the resulting bitmap is only good for a one-way lookup: the "stations selling this item" bitmap on an item object can be used to go from a station id to a quick bit-check of "does this station have this item", but it might not be possible to do the reverse - to say "bit 40401 is set, what does that mean?" - because there isn't guaranteed to be a meaningful way to reverse the algorithm and calculate which station id bit 40401 represents.

I did some experimentation to try and find algorithms that might work but at the end of the day I think it will be easiest to just have some kind of per-instance hidden column or table that collapses fdev_ids into near-sequential smaller numbers. I'll table those for an actual presentation of implementation and thinking when I have opportunity to put it together.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

The reason there's no progress display during the listings.csv download is that @Tromador's server isn't giving a size on the download, and the transfers.py code decides not to try and represent the progress.

The rich progress.py changes I have in the "import-via-table" branch at least have a go at this, and we can improve on it:

image

and the Processing stage is night-and-day. The nice thing about using rich is that the progress stuff happens in a thread so you aren't bleeding away performance in the tight loop.

Recording.2024-04-28.213320.mp4

It also uses these to let you know it's still working when it gets to rebuilding the index at the end.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

(didn't mean to throw you under the bus @Tromador - I should have said: our call to trom's server for the listings.csv isn't getting length information -- likely that there's a header missing in the request or something)

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

image

from trade-dangerous.

ultimatespirit avatar ultimatespirit commented on August 16, 2024

The eliminates-empty-pages thing would be "sparse files", but when I say "empty pages" I'm thinking in terms of a handful, as opposed to the file starting with 3906 empty pages (the pages required to represent the 128,000,000 unused IDs at the start of the ID space).

Not sure what the pattern is for fdev IDs, but you can remove the starting empty spots by just reducing all IDs by whatever the lowest fdev ID is. I.e. if the IDs start at 128,001 just offset all IDs by 128,000 as your zero index. From there though it's still not necessarily efficient to use a bitmap, are IDs always increasing and densely packed? If they aren't, and there are holes in the range, you'd need to non-trivially map them by order, 128,001 -> 0, 128,010 -> 1, etc. and that has no easy conversion back that doesn't involve an array access at least (have an array of IDs matching indices to bitmap positions). Which then runs into the next question, if it's sparsely populated does fdev fill in holes later? Because if they ever fill in a hole well... you have to completely rebuild that mapping, not fun to update an existing database to handle, though perhaps a worthwhile one-time cost if it has enough performance benefits and is considered a rare event.

The reason there's no progress display during the listings.csv download ...

Also, in case that was in response to my talking about progress above, I was actually referring to the DB rebuild steps / prices file processing steps, less so the downloads. But your prototype for more progress messages looks great. I noticed that new UI when spinning up the listener earlier today (though had to turn it off shortly after as it seems to download the spansh dump every hour even if it was listening to the EDDN the entire time in between, which is what spansh uses too?? I'm guessing there's some edge case requiring that...).

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

| I was actually referring to the DB rebuild steps / prices file processing steps,

The 10s video I posted is actually it processing the csv file, not downloading it.

I didn't record the tail part because that's the part I'm working on, and at the moment it just sits there for 25 minutes showing this:

image
and then a few more showing this:
image

(but the spinner is animated the entire time as is the timer)

@eyeonus the cause of this is these two indexes in particular:
image

I can look at splitting the data into two tables so it doesn't need that index, if that's useful. It'll mean my doing some diligence on the various commands that access price information from the StationItem table to get it right.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

| Not sure what the pattern is for fdev IDs, but you can remove the starting empty spots by just reducing all IDs by whatever the lowest fdev ID is. I.e. if the IDs start at 128,001 just offset all IDs by 128,000 as your zero index. From there though it's still not necessarily efficient to use a bitmap, are IDs always increasing and densely packed? If they aren't, and there are holes in the range, you'd need to non-trivially map them by order, 128,001 -> 0, 128,010 -> 1, etc. and that has no easy conversion back that doesn't involve an array access at least (have an array of IDs matching indices to bitmap positions). Which then runs into the next question, if it's sparsely populated does fdev fill in holes later? Because if they ever fill in a hole well... you have to completely rebuild that mapping, not fun to update an existing database to handle, though perhaps a worthwhile one-time cost if it has enough performance benefits and is considered a rare event.

Yeah, that's why I think something akin to row_id is "good enough". Its sole purpose is as a guide for the local bitmap - which is not something that should ever get shared; it's entirely local stateful stuff.

My import just got to the regenerating .prices file part, which doesn't have a progress bar,
image
but the way the rich stuff works lets me get rid of a bunch of code while also adding a context-manager style for increasing the use of progress bars, so you do something like:

with progress.Progress(None, 40, style=pbar.CountingBar) as pbar:
    ... regenerate prices()

and that's enough to add a concurrent progress spinner/time-elapsed bar.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

(didn't mean to throw you under the bus @Tromador - I should have said: our call to trom's server for the listings.csv isn't getting length information -- likely that there's a header missing in the request or something)

That's certainly possible, the header we pass is headers = {'User-Agent': 'Trade-Dangerous'}

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

I'm fine with using a row_id if you think it'll help things. StationItem currently uses (station_id, item_id) as the PK, so adding a row_id to the table shouldn't bork anything, except maybe the two unique ids thing with the export?

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

They would be something totally transient/private to the data set, almost even to the particular generation. I've been running various experiments/investigations but I probably need to actually create a couple of branches in my fork and try some of them out in more detail.

for instance - I took the hash-bucket thing for a spin, trying to find if there are numbers that would actually work for us, and I actually found some cheap algorithms for transforming the range of fdev_ids in my local Station table into a 0-N range of unique numbers that essentially required 3-bits of bitmap per station to be able to create a bitmap for each item saying "these are all my stations" - so the indexes we currently have could be condensed down from 64-bits-per-station-vs-item to ~3-bits-per-station on each item's row.

I ran an import, which added a station ... ran my test again, and the new station collided with another id after running the algo again. son of a...

there's a python version here https://gist.github.com/kfsone/f7f7de6d7c0216cf44cdf97b526a474e and a rust version (cause the python version was gonna take several hours to run, I should have tried it under pypy instead) https://github.com/kfsone/tinker/tree/rust

---------- it's late, I'm blathering ... sorry.

Current big hurt has to be the two indexes that have to present on the StationItem table to make it practical for buy/sell to distinguish only rows important to them. At a smaller scale, it was actually better having one table with sell/buy prices because you avoided a join, but now it forces us to maintain a massive conditional index. power hurt assemble

So I can focus on splitting that into two tables on that import-via-table branch over the next several days, if you like. It currently has the new progress bar stuff in it because I wanted/needed the extra visibility while running the long tests.

Rich is pretty frickin' cool; https://rich.readthedocs.io/en/stable/ it brings a ton of power to the table, down to the ability to generate a recording of an apps output, and almost all the capabilities are packaged behind self-demoing modules:
python -m rich.progress
python -m rich.live
python -m rich.table
python -m rich.logging
python -m rich.tree
python -m rich.spinner
python -m rich.status
python -m rich.layout
use it as a terminal markdown viewer:
python -m rich.markdown README.md

I had written various bits and pieces of this stuff into TD early on, but I'm awful at UI so none of it is great, and switching it over to using rich should improve quality/appearance/speed and reduce the amount of maintainable code at the same time.
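For anyone following along, the core rich.progress pattern is tiny - a generic example, not TD's own progress wrapper:

import time
from rich.progress import Progress

with Progress() as prog:
    task = prog.add_task("Processing listings...", total=1_000)
    for _ in range(1_000):
        time.sleep(0.001)     # stand-in for real work
        prog.advance(task)    # rendering happens on rich's own refresh thread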

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

That all sounds great. Feel free to work on the two table thing, and if it turns out to be much better than the current implementation, I consider that a win.

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

(didn't mean to throw you under the bus @Tromador - I should have said: our call to trom's server for the listings.csv isn't getting length information -- likely that there's a header missing in the request or something)

So long as it's an old school Routemaster, I will happily be under the bus.

The reason the server doesn't send a Content-Length header is that it streams listings.csv as gzipped content, and because it's doing the compression on the fly, it doesn't actually know the size of the data it's ultimately going to send.

I guess the listener could gzip listings.csv, and then Apache would know how much data it's sending?

I have a terrible feeling of deja vu here. See, I have sarcoidosis, and one of the many symptoms is brain fog; I have become much more forgetful than I once was, but I have an itch in the back of my brain which suggests we've been here before. Why did we choose to have the webserver do on-the-fly compression instead of just creating compressed files?

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

I guess if the listener compressed the files and Apache sent listings.csv.gz, then eddblink would have to uncompress it.

As it is, the download is handled by urllib, which reads the headers, and uncompresses on the fly as the file downloads. Ultimately it's about saving a lot of bandwidth (and thus download time) given the innate compressibility of text.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

Interesting: We do an initial probe for the timestamp of the file using urllib.request using POST:
image
so I could capture the "actual length" to give me a guide.

But I should probably also eliminate that separate query, because it means we have two - possibly international - SSL-based queries going on (it takes about 2s to determine if there's an update from here in CA), and worse, the server actually starts sending the file - so it's doubling the bandwidth use (and is why the actual downloads start slow for the end user).

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

I can save a bunch of bandwidth here by switching from urllib.request.request_open to requests.head:

image

and this is actually good stuff for the downloader, because we actually want the uncompressed size, since the downloader never sees the compressed data in the first place.

Win win win.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

I name this branch: make imports lightning fast

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

Interesting: We do an initial probe for the timestamp of the file using urllib.request using POST...

Yeah, that's totally my fault. I didn't know any other way to do it.

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

But I should probably also eliminate that separate query, because it means we have two - possibly international

Server hosting kindly donated by Alpha Omega Computers Ltd who are in the UK. Until I got sick, I was a company director there. The deal is that I get free hosting and they get to call me to ask dumb questions once in a while.

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

I can save a bunch of bandwidth here by switching from urllib.request.request_open to requests.head:

Pretty sure that 'text' is not a valid encoding standard - see the IANA HTTP content coding registry.

If it is allowed (or works anyway), make sure you don't use it for the download, as you will be telling the server you only grok plain text and can't accept compressed content, and it will presumably comply, sending the whole thing in the requested, uncompressed format.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

I can save a bunch of bandwidth here by switching from urllib.request.urlopen to requests.head:

Pretty sure that 'text' is not a valid encoding standard.

I tested it, and it didn't give any errors. Progress bar showed up and everything.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

@Tromador Just wasn't sure how to tell it to not encode. And this is only for an HTTP HEAD request:

requests.head(url, headers={...})

Should probably give transfers.py the ability to open and maintain a "Connection" so that it can turn the queries around more rapidly - at the minute it has to do the https handshake on every one of them, which takes time and costs cpu/bandwidth it doesn't strictly need to, but I don't think it's hurting anyone atm, so nice +19 :)
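
A minimal sketch of that idea, assuming a requests.Session held for the lifetime of the transfer so the probe and the download share one TLS handshake (names and URL are illustrative, not transfers.py's actual API):

    import requests

    url = "https://example.invalid/listings.csv"      # illustrative URL

    # One Session keeps the connection alive, so the HEAD probe and the GET that
    # follows pay for a single TLS handshake instead of one each.
    with requests.Session() as session:
        meta = session.head(url, headers={"Accept-Encoding": "identity"}, timeout=30)
        if meta.ok:
            with session.get(url, stream=True, timeout=90) as download:
                pass  # stream download.iter_content() to disk as in the earlier sketch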

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

@Tromador Just wasn't sure how to tell it to not encode.

I think you want 'identity' then, which says "I want the original with no encoding whatsoever".

@eyeonus It may work, but that doesn't make it right. I have been through the RFC and it refers right back to the list in the link I gave above. Doing things correctly per standards matters for the day such a loophole is plugged in a patch and then suddenly it doesn't work. I had a good trawl on the net looking for "text" in the context of an "accept-encoding" header and can't find it. If you can, I am happy to be educated.

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

No, in this case I'm the one to be educated.

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

No! It were me as wuz educated. Ready for a round of "wen ah were a lad?"

from trade-dangerous.

Tromador avatar Tromador commented on August 16, 2024

No! It were me as wuz educated. Ready for a round of "wen ah were a lad?"

Well, I am a Yorkshireman :)

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

I'm from a place called Grimsby which is neither Up North, Down South, Off East or Out West. It has its own ordinality: Theranaz. People from Grimsby are Up-In Theranaz.

GRIMSBY (n.)

A lump of something gristly and foul-tasting concealed in a mouthful of stew or pie. Grimsbies are sometimes merely the result of careless cookery, but more often they are placed there deliberately by Freemasons. Grimsbies can be purchased in bulk from any respectable Masonic butcher on giving him the secret Masonic handshake. One is then placed in a guest's food to see if he knows the correct Masonic method of dealing with it. If the guest is not a Mason, the host may find it entertaining to watch how he handles the obnoxious object. It may be (a) manfully swallowed, invariably bringing tears to the eyes, (b) chewed with resolution for up to twenty minutes before eventually resorting to method (a), or (c) choked on fatally.

The Masonic handshake is easily recognised by another Mason incidentally, for by it a used grimsby is passed from hand to hand. The secret Masonic method for dealing with a grimsby is as follows: remove it carefully with the silver tongs provided, using the left hand. Cross the room to your host, hopping on one leg, and ram the grimsby firmly up his nose, shouting, 'Take that, you smug Masonic bastard.'

-- Douglas Adams, The Meaning of Liff

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

@kfsone I'm in the process of making the spansh plugin capable of filling the Ship, ShipVendor, Upgrade, and UpgradeVendor tables from the spansh data.

Because of the data that is (not) provided by spansh for the ships (cost) and upgrades (weight, cost), I want to change those tables to include the information that is available.

Current testing indicates that the new code works as expected on a clean run (no pre-existing DB), but with an already existing DB the tables won't match, which borks everything.

Do you know if TD already has code to do this?

If not, where would be the best place to have TD check whether the DB needs updating and, if so, drop the old table and add the new version? I could put it in the plugin itself, but it might be better to put it somewhere in tradedb or tradeenv?

from trade-dangerous.

eyeonus avatar eyeonus commented on August 16, 2024

Ideally, I'd like to see if ./tradedangerous/templates/TradeDangerous.sql matches ./data/TradeDangerous.sql and if not, apply the difference from the template to the DB. (So for example if the Ship table in the template doesn't match, TD will drop the DB Ship table and create the template's Ship table in the DB)
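
A rough sketch of that check, assuming we compare the CREATE TABLE text in the template against what the live DB reports in sqlite_master and rebuild the table on a mismatch (the regex, paths, and function name are illustrative, not TD's actual mechanism, and dropping a table this way loses its rows):

    import re
    import sqlite3
    from pathlib import Path

    def _normalise(sql: str) -> str:
        # Collapse whitespace and case so cosmetic differences don't force a rebuild.
        return re.sub(r"\s+", " ", sql or "").strip().lower().rstrip(";")

    def rebuild_if_changed(db_path: str, template_path: str, table: str) -> None:
        template = Path(template_path).read_text(encoding="utf-8")
        wanted = re.search(rf"CREATE TABLE\s+{table}\b.*?\);", template, re.S | re.I)
        if not wanted:
            return
        with sqlite3.connect(db_path) as db:
            row = db.execute(
                "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
            ).fetchone()
            if row and _normalise(row[0]) == _normalise(wanted.group(0)):
                return                              # schema already matches the template
            db.execute(f"DROP TABLE IF EXISTS {table}")
            db.executescript(wanted.group(0))       # recreate from the template definition

Called on, say, the Ship table, it would leave a matching table alone and rebuild a stale one from the template, which is tolerable for tables the plugin repopulates anyway.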

from trade-dangerous.

kfsone avatar kfsone commented on August 16, 2024

Yes there's a schema-migration system, but it's not particularly smart; basically it operates on the sqlite "pragma user_version". Take a look in cache.py.

In a bit of karmic bloody-nosing while I was working on it (4 weeks ago?), we suddenly needed to change our sqlite schema at work for the first time in 15 years, and I ended up writing a more advanced version there, but I'll see if the boss is OK with me donating it.

From the runbook, here's a "change 3" which recovers from a previous mistake but compounds it, and a "change 4" that finally fixes things.

MIGRATIONS = {
    3: [
        {
            "if": "SELECT COUNT(*) FROM pragma_table_info('History') WHERE name='Preview'",
            "eq": 0,
            "then": "ALTER TABLE History ADD COLUMN Preview BOOL;"
        },
    ],
    4: [
        "DELETE FROM History WHERE HighWaterMark IS NULL OR HighWaterMark < 1;",
        "ALTER TABLE History ADD COLUMN NewWaterMark FLOAT NOT NULL DEFAULT 0;",
        "ALTER TABLE History ADD COLUMN NewPreview BOOL;",
        "UPDATE History SET NewWaterMark = HighWaterMark, NewPreview = Preview;",
        "ALTER TABLE History DROP COLUMN HighWaterMark;",
        "ALTER TABLE History DROP COLUMN Preview;",
        "ALTER TABLE History RENAME COLUMN NewWaterMark to HighWaterMark;",
        "ALTER TABLE History RENAME COLUMN NewPreview to Preview;",
    ]
}
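
For illustration, a runner for a dict like that might look something like this, keyed off PRAGMA user_version (a sketch only, not the code in cache.py or the work project):

    import sqlite3

    def apply_migrations(db: sqlite3.Connection, migrations: dict) -> None:
        current = db.execute("PRAGMA user_version").fetchone()[0]
        for version in sorted(v for v in migrations if v > current):
            for step in migrations[version]:
                if isinstance(step, dict):
                    # Conditional step: run "then" only when the probe returns "eq".
                    if db.execute(step["if"]).fetchone()[0] == step["eq"]:
                        db.executescript(step["then"])
                else:
                    db.executescript(step)
            db.execute(f"PRAGMA user_version = {version}")  # PRAGMA can't take bind parameters
            db.commit()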

from trade-dangerous.
