Propagation and reduction of the metadata in MapDataset.stack (gammapy, open)

bkhelifi commented on June 24, 2024

Propagation and reduction of the metadata in MapDataset.stack

Comments (4)

AtreyeeS commented on June 24, 2024

Thanks @bkhelifi for bringing this up.
What stacking should do to the metadata was discussed at length in #4853, without reaching a conclusion.

The main question was whether we should keep parallel lists, which would lead to code duplication. An approach based on a RootModel and a BaseModel was proposed by @adonath (see #4853 (comment)).
How much information should be kept on a stacked dataset? Currently we take a minimal approach: we throw away all the meta info and keep only the creation info. If required, a meta container can be created from the meta_table. This approach is obviously ill-suited. In #4853 I had initially tried keeping everything, but that was ill-planned and difficult to maintain.

A similar question might arise for the estimators, namely which meta info should be propagated from the individual datasets.

AtreyeeS commented on June 24, 2024

What should be the difference between the Datasets metadata and the metadata of a stacked dataset?

bkhelifi commented on June 24, 2024

For the fixity metadata, there is no stacking (of course).
For the context metadata, it depends a bit on the retained data model. But if, e.g., it contains the datapipe version or the calibration version, only one instance should be kept, as these values will be unique for a given release.
For the reference metadata, I propose that we append the ObsId list...

We have to go through all the individual metadata fields and make a proposal to VODF/CTA (i.e. @kosack, myself, ...). A spreadsheet first, and then we discuss and decide which fields to keep as unique, which to append and which to skip...
I think that this is the hardest part of this 'project'.
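
The "keep as unique / append / skip" decision could be encoded as a per-field policy table. The following is a minimal Python sketch of that idea, not Gammapy code: the Policy enum, the POLICIES table, reduce_meta and all field names (datapipe_version, calibration_version, obs_id, checksum) are hypothetical, and plain dicts stand in for the real metadata containers.

```python
from enum import Enum


class Policy(Enum):
    """How a metadata field is reduced when datasets are stacked (hypothetical)."""
    UNIQUE = "unique"  # must be identical across inputs; keep a single copy
    APPEND = "append"  # collect the values of all inputs into a list
    SKIP = "skip"      # drop the field from the stacked metadata


# Hypothetical field names and policies, for illustration only; the real
# assignment is exactly what the proposed spreadsheet would decide.
POLICIES = {
    "datapipe_version": Policy.UNIQUE,
    "calibration_version": Policy.UNIQUE,
    "obs_id": Policy.APPEND,
    "checksum": Policy.SKIP,  # fixity-like information is not stacked
}


def reduce_meta(metas):
    """Reduce a list of per-dataset metadata dicts into one stacked dict."""
    stacked = {}
    for name, policy in POLICIES.items():
        # Only consider inputs that actually carry this field.
        values = [meta[name] for meta in metas if name in meta]
        if policy is Policy.SKIP or not values:
            continue
        if policy is Policy.UNIQUE:
            # A "unique" field that differs between inputs is an error here.
            if any(value != values[0] for value in values):
                raise ValueError(f"{name!r} differs between datasets: {values}")
            stacked[name] = values[0]
        else:  # Policy.APPEND
            stacked[name] = values
    return stacked
```

For example, two inputs sharing datapipe_version "v1" with obs_id 1 and 2 would reduce to {"datapipe_version": "v1", "obs_id": [1, 2]}, while inputs with different datapipe versions would raise a ValueError. Raising instead of silently keeping one value is a design choice of this sketch; a real implementation might prefer to warn.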

adonath commented on June 24, 2024

@bkhelifi Internally in Gammapy, I think we can almost always just propagate the metadata to the higher level by building hierarchical structures. There is not necessarily a need to reduce the metadata at each step; we would only need to if we run into performance issues with Pydantic. The reduction can then happen at serialization time. The problem with reducing the metadata "on the fly" is that different data formats might require different metadata, and a priori we cannot know which format the user will serialize to.
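
One way to read this proposal: stacking only nests the input metadata under a new parent node, and each output format applies its own reduction when serializing. Below is a minimal Python sketch under that assumption; Meta, stack, leaves, to_format_a and to_format_b are hypothetical names, and plain dataclasses stand in for Pydantic models.

```python
from dataclasses import dataclass, field


@dataclass
class Meta:
    """Hypothetical metadata node: its own fields plus the metas of its inputs."""
    fields: dict
    children: list = field(default_factory=list)


def stack(metas, creation):
    """Stacking only builds a new parent node; nothing is reduced yet."""
    return Meta(fields={"creation": creation}, children=list(metas))


def leaves(meta):
    """Yield the leaf nodes, i.e. the original per-observation metadata."""
    if not meta.children:
        yield meta
    for child in meta.children:
        yield from leaves(child)


# Hypothetical format-specific reducers, applied only at serialization time.
def to_format_a(meta):
    """A format that only wants the list of observation IDs."""
    return {"OBS_IDS": [leaf.fields["obs_id"] for leaf in leaves(meta)]}


def to_format_b(meta):
    """A format that wants the full per-observation metadata."""
    return {"observations": [dict(leaf.fields) for leaf in leaves(meta)]}
```

Because each stack call only adds a level to the tree, stacking an already stacked dataset with a new observation still exposes all original observations to the serializer, and each format decides on its own what to keep.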

> What should be the difference between the Datasets metadata and the metadata of a stacked dataset?

The metadata of the stacked dataset is transposed and homogeneous in dataset type. The Datasets metadata is not.
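
One possible reading of "transposed": a Datasets collection holds one metadata object per dataset, while the stacked dataset could hold one list per field, which only works if every input has the same fields (as datasets of a single type would). A minimal sketch with plain dicts; transpose and the field names obs_id and livetime are hypothetical.

```python
def transpose(metas):
    """Turn a list of per-dataset metadata dicts into a dict of lists.

    Assumes all inputs share the same fields (homogeneous dataset type);
    a heterogeneous collection is rejected.
    """
    if not metas:
        return {}
    keys = list(metas[0])
    if any(set(meta) != set(keys) for meta in metas):
        raise ValueError("all datasets must have the same metadata fields")
    return {key: [meta[key] for meta in metas] for key in keys}
```

For instance, {"obs_id": 1, "livetime": 10.0} and {"obs_id": 2, "livetime": 12.5} become {"obs_id": [1, 2], "livetime": [10.0, 12.5]}, whereas mixing inputs with different fields raises a ValueError.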
