Comments (34)
@ajtritt you are absolutely right that the two proposals are the same wrt to the data they represent. in fact, if I were implementing this in a python class, I would hide the implementation in hidden attributes and expose BOTH approaches as properties of the class, use the math you outlined to query either, and have setters and getters that either compute the timestamp_offset
if timestamps_reference_time
is passed or compute timestamps_reference_time
if timestamp_offset
is passed.
so practically, pynwb should be able to both proposals, regardless of what is decided here w/r/t to the schema.
so this discussion is about the
- there are cases in which an experimenter does not want to use
session_start_time
as a the reference to map "experiment time" to "real world time" - sometimes an experimenter wants to define an alternative reference to "real world time"
- sometimes an experimenter does not want a reference to "real world time"
from the perspective of someone who needs to write these files and coerce my data into the spec, I find the explicit approach of Proposal 2 to be far more intuitive. Further, Proposal 1 only meets needs 1 & 2.
again, we've already made a decision to break backward compatibility with the schema, hence the "2.0" designation. if there is ever an appropriate time to break more things in the pursuit of clarity and improvement, this is it.
from nwb-schema.
Just to add my final two cents, I don't feel at all strongly about this. I'm perfectly happy for my lab to effectively ignore session_start_time by setting it to '1970-01-01T00:00:00Z' and using the (implicit) assumption (which probably should be made explicit) that session start time is the reference time for all timestamps. But if adding a field is helpful to others, that's fine with me, and I don't have a preference as to how it is done.
My final comment is that I think that if any of us were presented with code that implemented either solution, we would quickly adjust to it and then not think much more about it. As soon as the Technical Advisory Board for NWB is fully formed they can deal with these issues, but until then my suggestion would be for Andrew and Oliver to make a decision that makes sense to them and to move on so we can keep the development going.
from nwb-schema.
@t-b Yes.
Proposal 1 = "Timestamps_offset"
Proposal 2 = "Timestamps_reference_time"
from nwb-schema.
I can live with both proposals and would also be fine with Proposal 2. ISO8601 strings to the rescue!
from nwb-schema.
Okay, proposal 2 it is... timestamps_reference_time
as an ISO8601 string!
Any feelings on what timestamps_reference_time
defaults to? I'm guessing most people will just want to store relative (to session_start_time
) timestamps, so timestamps_reference_time
can default to session_start_time
.
from nwb-schema.
Okay, proposal 2 it is.
I agree.
timestamps_reference_time
can default tosession_start_time
.
Yes, based on the discussion in #50
from nwb-schema.
doesn't this also necessitate a units field and expanding the allowed dtype?
from nwb-schema.
"since this will explicitly specify the offset to obtain relative timestamps, eliminating the need for APIs to guess based on range." can you clarify this point?
instead of adding a new field, wouldn't it be more simple to deprecate session_start_time
and replace it with timestamps_reference_time
as an optional top-level field? however, if people want to use POSIX timestamps (the use case that motivated this), this still necessitates expanding the dtype to include int64 & adding a units field for timestamps, no?
from nwb-schema.
Fwiw, I think that every dimension should have a required unit spec, including time
from nwb-schema.
@neuromusic timestamps already has a unit field. Right now it is sent to the constant 'Seconds'. That would need to be changed to support this proposal. We would need to allow ints as well.
Getting rid of session_start_time
does not preserve backwards compatibility. By adding this field, we preserve the theme of having timestamps relative to a global start.
Regarding my statement about API's guessing.... storing the offset provides a mechanism for explicitly specifying what stored timestamps mean. I am trying to strike the balance of keeping relative timestamps as default, but allowing users to specifying absolute time. By specifying timestamps_offset
, users can store absolute time, which they may want to do queries or analyses on. However, this breaks the concept of storing timestamps relative to a start time. By storing timestamps_offset
, a user is effectively saying "subtract this number from all timestamps to get timestamps relative to the global start time". By explicitly storing this number, we avoid the situation where a users or API has to inspect timestamps to determine which frame they are in. Under the rules I have laid out, It will always be the case that timestamps - timestamps_offset will always produce timestamps that are relative to session_start_time
.
from nwb-schema.
"Under the rules I have laid out, It will always be the case that timestamps - timestamps_offset will always produce timestamps that are relative to session_start_time
."
Currently, timestamps will always produce timestamps that are relative to session_start_time
. This proposal adds the overhead of an additional field to resolve this data.
We are already in the middle backwards compatibility breaking changes to the schema, no? ElectrodeTable, etc. I'm happy to break backwards compatibility for increased clarity and generalization.
And to clarify, all timestamps are relative, always. The need here is not absolute timestamps per se but timestamps relative to an explicit session-agnostic date.
- not all experiments need to cast session-time into absolute time. indeed, many experiments do not have the precision in their data collection to do so. for these scientists, the only need is that the data in the nwb is in a common timebase & for these experiments,
session_start_time
may be off from timestamp[0] by seconds or minutes or hours
my updated proposal:
- we keep
session_start_time
- we add
timestamps_reference_time
as an optional top-level field, which all timestamps in the file are referenced to
this has the added benefit that a user using POSIX timestamps does not need to compute the delta between Jan 1970 and session_start_time
simply to write a file, as they would need to in your proposal.... they can just drop the POSIX reference time into the relevant field and put an experimentally-relevant value into session_start_time
from nwb-schema.
I am a little unclear about what you are proposing. Can you tell me how one would store posix timestamps using this scheme?
from nwb-schema.
minimal example
session_start_time = None
timestamps_reference_time = None
# then, for a given timeseries...
timestamps = <array of float64>
timestamps.dtype = float64
timestamps.unit = 'second'
POSIX example
session_start_time = None
timestamps_reference_time = '1970-01-01T00:00:00Z'
# then for a given TimeSeries....
timestamps = <an array of int64>
timestamps.datatype = int64
timestamps.unit = 'POSIX64'
non-POSIX, but "absolute" reference which is different from session start
session_start_time = '2017-08-31T10:40:01-07:00'
timestamps_reference_time = '2017-08-07T00:00:00Z'
# then for a given TimeSeries....
timestamps = <an array of float64>
timestamps.datatype = float64
timestamps.unit = 'second'
in this case, even though we designated that the session started on 8/31 using local time, all timestamps are relative to GMT at an earlier date (perhaps the animal's birth date? or first day of training? or some other experimentally-relevant time that the experimenter wants to reference their timestamps to)
from nwb-schema.
In the current format, time encodings that are not relative to the global clock would be stored in the sync
group of TimeSeries
http://nwb-schema.readthedocs.io/en/latest/format.html#id94 . As such, I think it might make sense to define these structures within sync
. In this way TimeSeries
itself would not change (i.e.., always have relative times) and a specific spec would be added to the sync
folder to describe absolute times. To make the sync
group easier to extend it would probably be useful to add a neurodata_type: LabTime
group that people can extend to describe lab-specific time encodings.
from nwb-schema.
@oruebel I don't understand. sync
addresses a totally separate issue of tracking provenance information related to synchronizing timestamps between timeseries acquired from different hardware. AFTER sync'ing you need to put everything in the same time reference, which what this issue is concerned with.
in my proposal, all timestamps are still relative to the global clock, we're just being more explicit about the the global clock's reference.
from nwb-schema.
@neuromusic Thanks for clarifying. So then session_start_time
and timestamps_reference_time
are both optional, and there are three rules for how they get interpreted?
from nwb-schema.
yes, both optional.
three rules?
from nwb-schema.
Yeah, the three scenarios you listed and their meanings
from nwb-schema.
rule 1: session_start_time
is metadata. while machine-readable, the specification offers no guidance on how it should be interpreted beyond what the experimenter describes, however, by convention, this should be the datetime of the experiment.
rule 2: timestamps_reference_time
is interpreted as the ISO8601 time where timestamp=0.
rule 3: if timestamps_refernece_time
is empty, then the experiment has no ISO8601 reference & its timestamps can only be referenced internally to the NWB file
n.b. I keep saying "file" but I know we want to break free from HDF5 eventually... should I say "container"? "session"?
from nwb-schema.
What I like about Andrews proposal is that it does not unnecessarily break backwards compatibility. If a user does not specify timestamps_offset
then all times are relative to session_start_time
as they have always been. It only adds additional functionality without breaking existing functionality.
Refining the specification of session_start_time
in terms of defining exactly how it should be stored is I think in general helpful and has been raised in #50
from nwb-schema.
After thinking about this and doing some math, these two proposals are one and the same in terms of how they get used. Here's my explanation for why I think that...
To explain this, let me define some short hands I will use below.
abs
:= absolute timestamps
rel
:= relative timestamps
ts_off
= timestamps_offset
trt
:= timestamps_reference_time
sst
:= session_start_time
ts
:= timestamps
POSIX(xyz)
:= the posix time representation of an ISO8601 encoded timestamps
Proposal 1: store offset of timestamps in field timestamps_offset
abs = ts + sst - ts_off
rel = ts - ts_off
If a user has saved absolute timestamps i.e. ts == abs
and ts_off == POSIX(sst)
, then
rel = ts - tsoff
If a users has saved relative timestamps i.e. ts == rel
and ts_off == 0
, then
abs = ts + sst
Proposal 2: store reference time for timestamps in field timestamps_reference_time
abs = ts + trt
rel = ts + trt - sst
If a users has saved absolute timestamps i.e. ts == abs
and trt == 1970-01-01T00:00:00Z
, then
rel = ts - POSIX(sst)
If a users has saved relative timestamps i.e. ts == rel
and trt=sst
, then
abs = ts + trt
With that said, the issue becomes one of human interpretability, so, I think the decision is up to those who will be interpreting.
from nwb-schema.
@ajtritt that makes sense.
After looking at it a bit more, I think I like the timestamps_offset
approach better, mainly because the meaning/behavior of session_start_time
does not change compared to the current approach if ``timestamps_offset=0. In contrast, with
timestamps_reference_time``` we change the name of the reference and, hence, require folks that are used to the concept of times relative to ```session_start_time``` to change to use ```timestamps_reference_time``` and set ```timestamps_reference_time=session_start_time```
Either way, I think session_start_time
should remain a required field.
from nwb-schema.
Regarding backwards compatibility, in the current schema, there is only an implicit link between session_start_time
and the reference time for timestamps: "Timestamps here have all been corrected to the common experiment master-clock. Time is stored as seconds and all timestamps are relative to experiment start time." It is not clear that session_start_time
is what is being referenced here. The implicit link between the two is simply a convention that I suspect is largely unused.
So the change I propose does not actually break backwards compatability but instead makes what has been implicit explicit.
The key disadvantage of Proposal 1 is that two values are necessary to represent "absolute" time (both session_start_time
and timestamps_offset
). Further, the offset must be computed. To represent POSIX time, this means that there is now a burden on the person creating the file to compute the time delta (in POSIX timestamps) between Jan 1 1970 and session_start_time
. e.g.
session_start_time = '2017-08-31T10:40:01-07:00'
timestamps_offset = <some computed int64 representing the delta between sst & Jan 1 1970>
# then for a given TimeSeries....
timestamps = <an array of int64>
timestamps.datatype = int64
timestamps.unit = 'POSIX64'
While both proposals represent the data, if I want to save my file with a different session_start_time
, then I need to recompute my timestamps_offset
.
Under both proposals, on reading, I can't assume that my timestamps are in POSIX64... I need to check what they are referenced to. Under proposal 1, I need to subtract (or add?) the timestamps_offset from session_start_time and check it against '1970-01-01T00:00:00Z'. Under Proposal 2, I just have to match the string saved in timestamps_reference_time
against '1970-01-01T00:00:00Z'
from nwb-schema.
in general, NWB needs fewer required fields, but I'm willing to save that argument for another day
from nwb-schema.
Time is stored as seconds and all timestamps are relative to experiment start time." It is not clear that session_start_time is what is being referenced here. The implicit link between the two is simply a convention that I suspect is largely unused.
I understood it like that and also create NWB files following this convention. It was a quite a hassle to base all measurement data on this common time zero of experiment start.
from nwb-schema.
how do I get on this technical advisory board?
from nwb-schema.
Right now the plan is for Lydia to the the Allen representative (I think), so I'd suggest touching base with her or Christoph.
from nwb-schema.
My understanding of the issue at play here is that there are two distinct temporal frames of reference under discussion: 1) RealTime (i.e. our shared experience of the world, and ignoring any effects from special relativity), and ExperimentTime. The “timestamps” field of the nwb file provides a coordinate system for ExperimentTime. The original spec requires that a required point, the beginning of the experiment (session_start_time), provide a common reference point to relate this ExperimentTime coordinate system (timestamps[0]) with RealTime, and there has been discussion on this thread about using either ISO_8601 or POSIX as the coordinate system for RealTime.
More controversy arises from the modeling community that claims that this RealTime/ExperimentTime relationship is not relevant for their data, and therefore want to make this session_start_time attribute optional. I agree with this.
Also, I want to point out that the units on the “timestamps“ field are only necessary if 1) there is no RealTime/ExperimentTime correspondence AND 2) there is not other common reference point between the RealTime coordinate system (either ISO_8601 or POSIX) and “timestamps”. This means that no units are required if both session_start_time and something like session_end_time are provided (and of course the value of timestamps[0] and timestamps[-1] coincide with these two fields).
My preference is to adopt ISO_8601 as the coordinate system for RealTime via session_start_time, but make its inclusion optional, as well as a new optional session_stop_time. Also, I would be fine making the “timestamps” unit optional, however if it is not specified then both session_start_time and session_stop_time must be provided. This is a backwards compatible change that increases flexibility for building the files, while ensuring that “timestamps” is always a unitful quantity.
If we cannot agree on a coordinate system for RealTime, then an additional piece of metadata that specifies either ISO_8601 or POSIX will be necessary, required any time session_start_time is required.
from nwb-schema.
@nicain to clarify, the discussion on whether to use POSIX or ISO8601 for "RealTime" reference is #50.
This issue was prompted by the feature request by @lfrank (now retracted, I guess?) to support POSIX values (instead of seconds relative to experiment start) on the "ExperimentTime" timestamps themselves.
from nwb-schema.
With NWB 2.x spec being finalized, and now that solid ISO8601 support is in pynwb ( via NeurodataWithoutBorders/pynwb#641 ), it seems like a good time to try to reach a decision on this issue!
I would like to throw my support to @neuromusic 's proposal ("proposal 2") for an additional, optional timestamps_reference_time
field that encodes the timestamp zero time as an ISO8601 string. (As opposed to "Proposal 1": an optional field representing an offset (in seconds) that is subtracted from the session_start_time
to obtain the reference time).
I believe we can achieve backwards-compatibility by declaring that: "if timestamps_reference_time
is not specified, then the timestamp reference time is taken to be the session_start_time
".
I (quite strongly) prefer the specification of a reference time (rather than an offset from session_start_time) as being simpler to use and more intuitive during both file write and read. For instance:
- An important use case for us is to confirm that a set of NWB files (possibly recorded from the same animal across multiple sessions) all share the same time base. As @neuromusic notes, under proposal 2 this reduces to a simple inspection of the "timestamps_reference_time" field. Whereas in proposal 1 we need to do timestamp math with 2 different representations of time (ISO8601 and elapsed time in seconds). (Possibly leading to floating point precision issues, given that an identical reference time would be represented by different sums across files?).
- When writing data, we typically want to specify a hard-coded reference time. Under proposal 2 this involves no (sadly often error-prone) timestamp math.
- Using pynwb, all the requisite timestamp information (
session_start_time
,file_create_date
, andtimestamps_reference_time
can be provided by passing in a timezone-aware datetime object, which leads to clear and expressive code: (with my apologies to PEP8 for alignment!).
# For a file that uses POSIX timestamps
from datetime import datetime
from dateutil import tz
session_start_time = datetime(2017, 10, 31, 9, 59, 0, tzinfo=tz.gettz('US/Pacific'))
timestamps_reference_time = datetime(1970, 1, 1, 0, 0, 0, tzinfo=tz.tzutc())
file_create_date = datetime.now(tz.tzlocal())
- Miscellaneous points:
- By "POSIX time", I just mean I want to use seconds since the Unix Epoch (not including leap seconds). Contra some discussion above, we do not want a new 'POSIX64' dtype--we don't want to represent time as anything other than float64's with units of 'Seconds'.
- The schema should make it clear that we follow the POSIX convention of ignoring leap seconds in date/time computations. This is what the libraries everyone is relying on will be doing, but doesn't hurt to state it explicitly.
- I vote -1 on @nicain's suggestion to add a "session_end_time", at least insofar as it is used to map a range of timestamps onto real-world time. I think we should stick to SI units for time. Also, many different objects in a file have associated timestamps, so it's not particularly meaningful to refer to a global timestamps[0] and timestamps[-1].
- This issue also seems to discuss whether
session_start_time
should be made optional (e.g. for simulation data). I don't have any objection to this.
from nwb-schema.
@tjd2002 Proposal 2 from Andrew's post in #49 (comment)?
from nwb-schema.
Great that we seem to have a consensus!
timestamps_reference_time
can default tosession_start_time
.
This behavior at the API level sounds works for us and affords a level of backwards compatibility.
Am I correct that this means that timestamps_reference_time
will become a required field in the spec, with a default value set by the API at write time? For those who wanted to make session_start_time
optional, this might be unwelcome news (though it's not a lot worse for them than the current situation).
from nwb-schema.
@tjd2002 Just a FYI. According to http://neurodatawithoutborders.github.io/development_plan this monday (11/05/2018) is the cut off date for schema changes.
from nwb-schema.
from nwb-schema.
Related Issues (20)
- `CorrectedImageStack` is insufficient for modern motion correction HOT 2
- How does the `hdmf-common` namespace work? HOT 4
- Add attributes to `ImageSeries` HOT 1
- Non-square pixel support for GrayscaleImage, ImageSeries, etc.
- Add readthedocs.yaml HOT 1
- `name` and the `nwb.file` schema HOT 5
- Some broken links in the docs HOT 2
- Request for support of spike raster type HOT 3
- EventDetection not appropriate for modern probes HOT 1
- Cannot extend `OptogeneticSeries` to have 2D data HOT 8
- Add GitHub Action to check external and internal links
- [Feature]: Add num samples to `ImageSeries` when external mode is used HOT 2
- Deprecate `ImagingRetinotopy` neurodata type
- Why is optical channel not a link in ImagingPlane? HOT 3
- Best practice for associating units of measurement with simple metadata in extensions HOT 2
- Make `band_mean` and `band_sd` in `DecompositionSeries` optional
- What is the dimension "num_samples" in ElectricalSeries HOT 2
- Make `stimulus/presentation` and `stimulus/templates` groups (and maybe all root-level groups) optional HOT 1
- Add support for PNG string encoding representation for Image HOT 11
- Add CElegansSubject and species-specific Subject neurodata types
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from nwb-schema.