akhmerov commented on August 16, 2024

Just to reiterate, my suggestion is to make a single object, say Scheduler, responsible for coordinating the different types of activities. It would also keep a complete log of everything that happened, without omissions. It should be possible to change the Scheduler's strategy at any moment (e.g. start/stop a measurement). An actual sweep would then comprise a part of the scheduler strategy (typically a high-priority part) combined with a specification of how to process the raw data so that it's visible to the user.

NOTE ADDED: I think this approach is very flexible, and well suited for a lot of different workflows. However, implementation of the Scheduler as well as design of a clean interface for specifying measurements, their priorities and interactions are hard tasks.
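For concreteness, a minimal sketch of what such an object might look like (all names here are hypothetical, not existing qcodes API): prioritized tasks plus an append-only event log.

```python
import heapq
import time

class Scheduler:
    """Hypothetical sketch: one object coordinates prioritized
    activities and keeps an append-only log of everything."""

    def __init__(self):
        self._tasks = []    # heap of (priority, insertion order, task)
        self._order = 0
        self.log = []       # complete event log, never trimmed

    def add_task(self, priority, task):
        # lower number = higher priority, e.g. 0 for a sweep point,
        # 10 for a background monitor reading
        heapq.heappush(self._tasks, (priority, self._order, task))
        self._order += 1
        self.log.append((time.time(), 'scheduled', task.__name__))

    def run_next(self):
        if not self._tasks:
            return None
        priority, _, task = heapq.heappop(self._tasks)
        self.log.append((time.time(), 'started', task.__name__))
        result = task()
        self.log.append((time.time(), 'finished', task.__name__))
        return result
```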

akhmerov commented on August 16, 2024

Answering @guenp's questions from #2:

That sounds redundant to me; can't the monitor just refer to the dataset instead and just log parameters that aren't included in the measurement?

No, the dataset shouldn't contain e.g. timestamps; it is also quite nice to have only the relevant quantities in the dataset. A dataset may indeed refer to the full log, but that's an extra action for the user to worry about.

Sure, the StorageManager can then decide to omit these files after post-analysis is complete, or when indicated by the user

That would mean that you need to implement post-processing logic in the storage manager, while it really is easier to do it before the data reaches the storage (you also don't want to accumulate this data in RAM either).

What exactly would the 'Full log' look like in your opinion? Should it include information such as when measurements were started, and which processes are given priority at which time?

Let's see. The way I imagine it, you'd store all calls to the instruments together with their time stamps. This means separately storing "set gate voltage X" and "measure current Y" events, and combining these with monitor activities. User input events ("start sweep X, Y, Z", "halt") should probably also be stored. While it may feel excessive, I can think of a number of cases where you'd wish you had this information.
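For illustration, each logged event could be as small as one timestamped JSON record per line; the field names below are made up:

```python
import json
import time

def log_event(logfile, kind, **details):
    """Append one timestamped event as a JSON record per line.
    'kind' might be 'set', 'get', or 'user' (names illustrative)."""
    record = {'t': time.time(), 'kind': kind, **details}
    logfile.write(json.dumps(record) + '\n')

with open('full_log.jsonl', 'a') as f:
    # separately stored set/measure events, plus user input:
    log_event(f, 'set', instrument='gates', parameter='V_gate', value=0.25)
    log_event(f, 'get', instrument='dmm', parameter='current', value=1.3e-9)
    log_event(f, 'user', action='start sweep X, Y, Z')
```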

Btw, there's no reason why any parameter that's being recorded for a measurement shouldn't be updated in the monitor at the same time (e.g. if the measurement probes the temperature, the monitor could also save that as a datapoint simultaneously)

That's kind of what I suggest, only to make it non-negotiable. My main motivation is the cases when the separation between Monitor and Sweep isn't so well defined.

alexcjohnson commented on August 16, 2024

I have two main concerns. You've already mentioned the first:

implementation of the Scheduler as well as design of a clean interface for specifying measurements, their priorities and interactions are hard tasks.

It seems to me like this requires quite a tight integration with whatever is constructing the measurement loops... which in turn makes it difficult to maintain flexibility in said measurement loop. We've already come up with a wide range of extra scenarios in #6 and I think we have workable solutions from a syntax standpoint, but then getting .run() to appropriately pass all of these on to the scheduler sounds awkward. Whereas telling the persistent monitor to pause when the measurement starts, telling it when and for how long it's allowed to run within the measurement, and resuming when the measurement ends - that's easy.
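To make that concrete, the measurement-driven alternative could be as small as three calls on the monitor (a sketch with hypothetical names, not existing API):

```python
import time

class PersistentMonitor:
    """Hypothetical sketch: the measurement drives the monitor
    explicitly instead of a scheduler mediating between them."""

    def __init__(self, readers):
        self.readers = readers   # zero-argument callables that read a parameter
        self.running = True      # free-running between measurements

    def pause(self):
        self.running = False     # called when the measurement starts

    def resume(self):
        self.running = True      # called when the measurement ends

    def run_for(self, seconds):
        # "you have this long to do whatever you want": read whatever
        # parameters fit in the granted window, then hand control back.
        deadline = time.monotonic() + seconds
        for read in self.readers:
            if time.monotonic() >= deadline:
                break
            read()

# inside a sweep:
#   monitor.pause()         # at .run()
#   monitor.run_for(0.05)   # between points, where the loop allows it
#   monitor.resume()        # when the sweep finishes
```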

My second concern is perhaps more philosophical: experimental data should always take maximum priority; monitor data, and anything else really, is a luxury to be allowed purely at the experiment's convenience. That's why I moved storage out of the measurement process, for example, so only the bare minimum happens in the measurement loop. I have a historical motivation for this: it seems like every time I use LabVIEW I encounter headaches with execution flow, because that model is so opaque about scheduling events. Igor was the paragon of clarity by comparison: you specified exactly what to do, in what order, and at what time. I'm not saying this scheduler concept has anywhere near the opacity of LabVIEW, but it raises red flags for me if the measurement is dependent on any other entity for its execution.

guenp commented on August 16, 2024

moving @AdriaanRol's comments here

Although the perpetual monitoring of data seems very nice in theory, I think the usability of such a construct will be decisive in determining the success of QCodes. If it is not easy to extract a subset of the data (e.g. of one experiment run) or separate datasets for easy sending, it will be very hard to run analysis or share data with others. Then there is also the problem of multiple setups producing data, and that data being accessible and browsable in a nice way. Depending on how this is done, I don't know if it is possible to teach an incoming master student how to search in a database on top of all the other things he has to learn.

he => he/she please, although I know your team doesn't have any women (yet!) ;)
As for the data saving - for measurements this should be straightforward. Every measurement is saved on-disk in its own folder. As for shareability - this is something I bet @alexcjohnson will find an excellent solution for, with e.g. Azure + a data browsing interface/API, but for now it's not at the top of the priority list. :)

also moving @alexcjohnson's reply here

Re: Practicality of data saving

I'm not sure if it's clear, but I see a sharp separation between experiment data and monitor data. We haven't talked at all here about the organization of either one, as the monitor isn't written yet, and for the experiment data I've so far provided just one example storage format (MergedCSVStorage) but punted on organization for now - it just asks for a disk location.

But what that means is the experiment data is all going to be stored in some simple format, where the class that saved it can also read it back in for later analysis. I had some other thoughts about how to make the experiment data easier to pull back in later.

Pulling up old monitor data - I expect mostly this is going to be for debugging (why did last night's data go screwy?) or reproduction (what were the gate voltages when I took that data?). You're right, we don't want a database for this - something people would need to learn SQL to query. I plan to just make a nice text format with well-organized file/folder names. Then most of the time people will just open the log file, scroll to the appropriate time, and look at it. But of course there will be times you want to plot data from the monitor - it should be fairly easy to write reader scripts for this, which won't take users long to learn. Sound reasonable?

I think what I originally had in mind for the Monitor was more of a continuous measurement in the background, but now I agree with @alexcjohnson and @akhmerov that it should be a completely separate process. It's there to log anything that changes in the system at any time.
However, that still does mean that in some cases I would like to have a measurement process run in the background to monitor a system parameter that is not part of the measurement, such as the cooling water temperature or the PT2 plate temperature for magnet quench protection. @alexcjohnson do you see this as part of the Monitor as well?

Let's see. The way I imagine it, you'd store all calls to the instruments together with their time stamps. This means separately storing "set gate voltage X" and "measure current Y" events, and combining these with monitor activities. User input events ("start sweep X, Y, Z", "halt") should probably also be stored. While it may feel excessive, I can think of a number of cases where you'd wish you had this information.

@akhmerov Great idea & couldn't agree more.

My second concern is perhaps more philosophical: experimental data should always take maximum priority

@alexcjohnson also couldn't agree more, this should be our main design philosophy. However, that does make it sensible to me to have a Scheduler delegate tasks and priorities, instead of the measurement process. This process should be dumb and just focus on taking data - the Scheduler can then figure out when the gaps are for the Monitor process to do things in parallel or get time to read out an instrument parameter without slowing down the measurement process.

guenp commented on August 16, 2024

It seems to me like this requires quite a tight integration with whatever is constructing the measurement loops... which in turn makes it difficult to maintain flexibility in said measurement loop. We've already come up with a wide range of extra scenarios in #6 and I think we have workable solutions from a syntax standpoint, but then getting .run() to appropriately pass all of these on to the scheduler sounds awkward. Whereas telling the persistent monitor to pause when the measurement starts, telling it when and for how long it's allowed to run within the measurement, and resuming when the measurement ends - that's easy.

@alexcjohnson True, but perhaps we should look at it from an instrument-call perspective.
The Scheduler could be (part of) an InstrumentServer. Any request from a measurement process to get an instrument parameter through GPIB, COM, ethernet, etc. would pass through this server, which then acts as a scheduler that gives priority to certain queries, depending on which process requested them and how long they've been in the queue... Just brainstorming, but would that make sense?
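As a sketch of that brainstorm (made-up names; `read_fridge_temperature` is an assumed placeholder), all requests could enter a single priority queue keyed on (priority, age):

```python
import heapq
import itertools

class InstrumentServer:
    """Hypothetical sketch: every instrument request enters one
    priority queue; the measurement outranks the monitor, and
    insertion order breaks ties so old requests aren't starved."""

    MEASUREMENT, MONITOR = 0, 1   # lower value is served first

    def __init__(self):
        self._queue = []
        self._age = itertools.count()

    def submit(self, priority, request):
        # 'request' is a zero-argument callable doing the actual
        # GPIB/COM/ethernet transaction
        heapq.heappush(self._queue, (priority, next(self._age), request))

    def serve_one(self):
        if not self._queue:
            return None
        _, _, request = heapq.heappop(self._queue)
        return request()

# server.submit(InstrumentServer.MONITOR, lambda: read_fridge_temperature())
```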

alexcjohnson commented on August 16, 2024

@guenp re: InstrumentServer - This could ensure that the measurement is the next task for a given interface, pushing off lower-priority calls until it finishes. But I still don't see how it could ensure that the measurement happens exactly when it's supposed to. You can't cancel a slow call after it's been started, at least not on the instrument side, so you'd have to figure out the call timing beforehand, but that isn't necessarily possible.

alexcjohnson commented on August 16, 2024

@guenp

However, that still does mean that in some cases I would like to have a measurement process run in the background to monitor a system parameter that is not part of the measurement, such as the cooling water temperature or the PT2 plate temperature for magnet quench protection. @alexcjohnson do you see this as part of the Monitor as well?

Oh absolutely - I'm imagining the Monitor measuring basically everything it can measure, at regular intervals when there isn't a sweep running and however it can while there is, and also potentially taking action based on what it measures (such as on a quench - this kind of action would of course be allowed to escalate its priority over a sweep, because a quench invalidates the data).

guenp commented on August 16, 2024

@alexcjohnson Re InstrumentServer:

You can't cancel a slow call after it's been started, at least not on the instrument side, so you'd have to figure out the call timing beforehand

Not sure what you mean here. Can you give an example of how that would be a problem in the scheme I proposed and how yours would solve that problem?

But I still don't see how it could ensure that the measurement happens exactly when it's supposed to.

The Scheduler would make sure the measurement happens exactly how it's supposed to. The Monitor will send periodic calls to the fridge computer requesting temperature sensor information, which the Scheduler will execute only while the measurement is sending a batch of GPIB commands or waiting for an instrument to reply. As far as I know this can be done in parallel, but correct me if I'm wrong. So instead of allocating a few small time slots within the measurement time for the Monitor to do its thing, the Scheduler will fit them in at moments when they can run in parallel, such that they don't take up any extra time from the measurement.

alexcjohnson commented on August 16, 2024

@guenp

You can't cancel a slow call after it's been started, at least not on the instrument side, so you'd have to figure out the call timing beforehand

Not sure what you mean here. Can you give an example of how that would be a problem in the scheme I proposed and how yours would solve that problem?

As I understand the Scheduler idea, any process that wants to talk to an instrument would have to add that request to a queue in the Scheduler. So the Monitor would queue calls every so often, and so would the measurement. Let's say there's a call the Monitor makes that blocks an interface for 100ms. But you're in a measurement loop that uses the same interface to make a call with delays of only 50ms - the Scheduler would somehow need to know this, and avoid making that long call at all during that loop, waiting until an outer loop with a longer delay, or until the measurement finishes entirely. Otherwise there would be semi-random delays introduced between measurement points. This seems complicated to implement, and potentially impossible in certain cases, like when the loop doesn't have fixed delays but is waiting for an event to continue.

But if instead, the measurement controls the Monitor, telling it explicitly "you have 50ms to do whatever you want", then the Monitor could look at the parameters it's tasked with monitoring, see that the long one doesn't fit in that time, and move on to the next one on its list.

This strategy is giving up on some performance, for sure - there may be times that a certain interface is free when another is occupied by the measurement, and we could still be monitoring on the free one. Actually, it occurs to me that it wouldn't be terribly hard to work this into the framework I've proposed: The measurement locks the interfaces it will use, leaving the others unlocked, and the Monitor keeps running its periodic calls on the unlocked interfaces during the measurement. Then the measurement calls the Monitor, telling it "you have 50ms to measure on interfaces X, Y, and Z".

This would still be overly restrictive in certain cases, like if one interface (say, fridge control) is only used in an outer loop (setting temperature) and we're locking it throughout the inner loop. Still, I think even locking all the interfaces except when specifically authorized would be preferable over introducing any avoidable timing noise.

Implementation note: if these locks are RLocks then it's probably not even necessary to explicitly tell the Monitor anything - when the measurement starts, acquiring these locks will ensure that the monitor is done with them before the measurement does anything. Then when the Monitor sees that these locks are already held by another process it can skip the associated parameters. When Monitor is called from within the measurement, it can still acquire these locks (assuming we can operate the Monitor directly in the measurement process), if the most pressing parameters to measure are on those interfaces, but it could choose to measure on the unlocked interfaces too.
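A minimal sketch of that implementation note, assuming per-interface RLocks in an `interface_locks` dict (hypothetical names):

```python
import threading

# one lock per communication interface (names illustrative)
interface_locks = {'gpib0': threading.RLock(), 'ethernet': threading.RLock()}

def monitor_pass(parameters):
    """parameters: list of (interface_name, get_callable) pairs.
    Skip any interface currently locked by the measurement."""
    for interface, get in parameters:
        lock = interface_locks[interface]
        if lock.acquire(blocking=False):   # never wait on the measurement
            try:
                get()
            finally:
                lock.release()
        # else: the measurement holds this interface; skip the parameter

# measurement side: lock the interfaces the sweep will use, so the
# Monitor keeps running only on the unlocked ones
with interface_locks['gpib0']:
    pass  # ... inner loop runs here; 'ethernet' stays available
```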

And finally a note on monitor call timing: I didn't have anything fancy (machine learning) in mind here, just keeping a record of the call times of the last ~10-20 measurements of each parameter. Then when it's called with a time limit, it makes the call only if it's got enough time left for the longest measurement in its history; otherwise it moves on to the next on its list.
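In code, that bookkeeping could be as simple as a fixed-length deque of past durations per parameter (a sketch; the class name is made up):

```python
import time
from collections import deque

class TimedReader:
    """Hypothetical sketch: remember the last ~20 call durations of
    one parameter, and only fire when the remaining time budget
    covers the longest call seen so far."""

    def __init__(self, get, history=20):
        self.get = get
        self.durations = deque(maxlen=history)

    def read_if_time(self, budget):
        if self.durations and max(self.durations) > budget:
            return None            # doesn't fit; caller tries the next one
        start = time.monotonic()
        value = self.get()
        self.durations.append(time.monotonic() - start)
        return value
```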

akhmerov commented on August 16, 2024

As I understand the Scheduler idea, any process that wants to talk to an instrument would have to add that request to a queue in the Scheduler.

No, that would indeed not work; if that were the case, there would be a problem. I thought the Scheduler would only run requests that have enough time to finish before a higher-priority event is expected to occur.

EDIT: even more specifically, the Scheduler would have a specification to run a monitor measurement at a given maximal frequency if and only if no higher-priority measurement is available.

alexcjohnson commented on August 16, 2024

requests that have enough time to finish before a higher-priority event is expected to occur.

So then how do you know either of those? I guess it wouldn't be hard for requests to state an upper confidence bound on their running time (my Monitor would have to track this anyhow), but how do you specify when a higher-priority event might occur?

akhmerov commented on August 16, 2024

The specification of a measurement should explicitly declare when it isn't changing state for a certain time (integrating a signal, waiting for the RC time, etc.).
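A sketch of how a declared dead time might be consumed (hypothetical names; `set_gate`, `measure`, and the worst-case timings are assumed inputs):

```python
import time

class IdleAwareScheduler:
    """Hypothetical sketch: the measurement declares its dead time,
    and the scheduler fills it only with monitor reads known (from
    past timing) to fit inside the window."""

    def __init__(self, monitor_reads):
        # list of (worst_case_duration_s, get_callable) pairs
        self.monitor_reads = monitor_reads

    def idle_window(self, seconds):
        deadline = time.monotonic() + seconds
        for worst_case, read in self.monitor_reads:
            if time.monotonic() + worst_case < deadline:
                read()
        remaining = deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)   # wait out the rest of the RC time

# a sweep step would declare, e.g.:
#   set_gate(v); scheduler.idle_window(0.05); measure()
```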

alexcjohnson commented on August 16, 2024

But then it also has to declare at the beginning that it's running, at the end that it's done, and which interfaces it's going to use (if we want to include that level of complexity), so it doesn't seem to me that this gains any performance over just letting the measurement drive it all... which still seems simpler to me.

akhmerov commented on August 16, 2024

In the simplest setup there are several tasks in priority order, highest to lowest: setup integrity, then measurement, then bonus monitoring. Setup integrity and monitoring are useful even when no measurement is running, and would persist across measurements. That's why an overseer process seemed like a cleaner interface — there you add a measurement, instead of requiring the measurement to take care of unrelated monitoring.

MerlinSmiles commented on August 16, 2024

There is something else to consider for monitoring and measurement.
At least for the Triton setups in Copenhagen, the temperatures are only read every 60 s or so, and this happens continuously. When any measurement asks for the current temperature, it is not measured afresh; the latest value is returned.
Temperature control behaves a bit differently: during that time the Lakeshore controller takes over and, at least in my experience, does not report all measured temperatures to the fridge software.
Now, trying to squeeze in a temperature monitor reading somewhere in between measurement points seems a bit redundant.
Why not push the measurement value to the monitor process instead? One could have Oxford change their software to send out the data, or just monitor the log file that is written, or simply spy on the serial connection that goes to the Lakeshore controller.
It's updated really slowly anyway.

Just a thought.

alexcjohnson commented on August 16, 2024

Why not push the measurement value to the monitor process instead?

Yes, that's something I've been planning to do. I'm not sure it's all that important for performance (though it wouldn't hurt...) - the measurement itself is generally measuring just a few things at a time, whereas the Monitor may be watching more like 50 parameters, so you're not cutting down the list very much. But the parameters you're measuring are presumably the ones that are changing the fastest, so it would be nice if the Monitor got to take advantage of the increased measurement frequency automatically.

Ideally in fact, any get call to a monitored parameter, be it part of a regular measurement, some fancy thing you made up, or just a one-off command-line get, should pass its result on to the Monitor. That could happen in an InstrumentServer or Scheduler, though then this object needs to know a good deal more about the Monitor, like which parameters it cares about and what it calls them...

I'm thinking perhaps the loosest way to couple these together would be this: when you register a parameter to be monitored, the Monitor decorates the parameter's get method to report the value back. That would work regardless of whether we implement a Scheduler, and it keeps all the code for the Monitor to interact with a parameter contained in the Monitor itself.
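A sketch of that decoration, assuming a parameter object with a plain `get` method (none of these names are real qcodes API; `fridge.mc_temperature` is a made-up example):

```python
import functools

class Monitor:
    """Hypothetical sketch: registering a parameter wraps its get
    method, so every successful read (from a sweep, a one-off
    command-line get, anything) is reported back to the Monitor."""

    def __init__(self):
        self.latest = {}   # parameter name -> last observed value

    def register(self, name, parameter):
        original_get = parameter.get

        @functools.wraps(original_get)
        def reporting_get(*args, **kwargs):
            value = original_get(*args, **kwargs)
            self.latest[name] = value
            return value

        parameter.get = reporting_get

# monitor = Monitor(); monitor.register('T_mc', fridge.mc_temperature)
```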

akhmerov commented on August 16, 2024

...this object needs to know a good deal more about the Monitor, like which parameters it cares about and what it calls them...

Why not monitor all the parameters, unless specifically declared to be skipped? It seems like the best default. I believe even a setting per instrument would be OK.

alexcjohnson commented on August 16, 2024

Why not monitor all the parameters, unless specifically declared to be skipped?

Yes - that should be the default when defining a physical instrument.

guenp commented on August 16, 2024

@akhmerov @alexcjohnson Hmm, it would be much easier to talk about this with a whiteboard. :)

Let's see what we would want in the bare minimum case. In principle, the measurement will be primarily using the GPIB interface for setting & reading values. You'll want the Monitor to primarily focus on periodically reading the temperatures from the Triton, which is all through the ethernet interface.

As @MerlinSmiles noted, these values are currently only updated every 60 seconds. I suggest this be handled by a separate fridgeserver process (à la https://github.com/majacassidy/Fridgeserver), which won't require interface blocking and can provide both the Monitor and measurement processes with the latest values (which they should be able to request simultaneously at any time; the fridgeserver just returns whatever latest value is stored in its buffer). This fridgeserver also solves the problem that the Oxford software sometimes requires users to manually reset the network connection (ask any Triton user about this, or talk to @damazter or @majacassidy).
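The essential behavior of such a fridgeserver is small; here is a sketch with made-up names (see the linked repository for a real implementation):

```python
import threading
import time

class FridgeServer:
    """Hypothetical sketch: one process owns the fridge connection,
    polls it every 60 s, and serves buffered values so the Monitor
    and the measurement can both ask at any time."""

    def __init__(self, read_all_sensors, period=60.0):
        self._read = read_all_sensors   # the single real fridge query
        self._buffer = {}
        self._lock = threading.Lock()
        self._period = period

    def poll_forever(self):
        while True:
            values = self._read()        # e.g. {'T_mc': 0.012, ...}
            with self._lock:
                self._buffer = values
            time.sleep(self._period)

    def latest(self):
        # safe for simultaneous requests; just returns the buffer
        with self._lock:
            return dict(self._buffer)

# threading.Thread(target=server.poll_forever, daemon=True).start()
```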

As for the Monitor's GPIB communication - most parameters in the system won't change much anyway, which is why a complete snapshot of the system before a measurement starts should be enough. I can only think of situations where a user manually changes some system setting during the measurement (e.g. the lockin time constant) - which they shouldn't do anyway - but this can still happen by accident, and logging it seems like a very useful thing... I think a background Monitor process that periodically checks all instrument parameters could do this maybe once every 10, 30, or 60 minutes (up to the user), and only log incremental changes w.r.t. the snapshot it made at the beginning. (Of course it shouldn't log nonsense values, e.g. from some floating Keithley that's not connected to anything, but then the user has to remove that from the monitored parameters.)
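Logging only the increments could look like this (a sketch, assuming a snapshot is a flat dict of parameter names to values):

```python
def changed_since(snapshot, current):
    """Hypothetical helper: return only the parameters whose value
    differs from the snapshot taken before the measurement."""
    return {name: value for name, value in current.items()
            if snapshot.get(name) != value}

# e.g. a user accidentally changed the lockin time constant mid-run:
snapshot = {'lockin.time_constant': 0.1, 'keithley.range': 1.0}
current  = {'lockin.time_constant': 0.3, 'keithley.range': 1.0}
assert changed_since(snapshot, current) == {'lockin.time_constant': 0.3}
```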

However useful this may be, for the most basic/vital parts of the system I don't see why the Monitor has to query all instruments periodically, besides the ones that are important for quench protection. So personally I don't think we should give this much priority, and I definitely wouldn't sacrifice time from the measurement process for it. If the parameters were important, I would just include them in the measurement anyway.

@alexcjohnson again, to me it would make most sense if a scheduler or something similar managed this snapshot/periodic monitoring/priority stuff, but basically it's up to you how to implement it, as long as it ends up doing what we need it to do. :)

giulioungaretti commented on August 16, 2024

as pointed out by @MerlinSmiles, one is left to wonder why this was closed :D
Given the age of the discussion, and the fact that it's really hard to summarise for those not involved in the conversation, closing is a request for somebody to open a new issue, or start a chat on Slack, with a summary and continue the discussion.
