
Comments (4)

lostella commented on May 22, 2024

I think the ideal solution would be to drive what is being used from the data (and therefore expected in the data) using schema-like structures like the following:

{
    'start': {},
    'target': {'shape': ()},
    'feat_dynamic_real': {'shape': (1,)},
    'feat_static_cat': {'shape': (3,), 'cardinality': [4, 5, 6]}
}

This could be used, among other things, to configure the transformation chain: the keys in such a dictionary tell you which fields are expected to be in the data. Using this schema-like dictionary, estimators can do many things:

  • They can assume a minimal schema {'start': {}, 'target': {'shape': ()}} unless a different one is specified; this would pretty much amount to the current behaviour, with the difference that the user would be able to specify everything about the data in one single object (instead of potentially 4 flags and 2 cardinalities)
  • Or, we could decide to infer such a schema from the training data as soon as training is triggered.
  • Given such a schema, one can use it to validate a DataEntry or a whole Dataset.
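As a rough sketch of the validation idea (not actual GluonTS code; the schema format follows the dict proposed above, and only the cardinality semantics are checked for brevity), validating a single data entry could look like:

```python
import numpy as np

# Hypothetical schema in the format proposed above.
SCHEMA = {
    "start": {},
    "target": {"shape": ()},
    "feat_static_cat": {"shape": (3,), "cardinality": [4, 5, 6]},
}

def validate_entry(entry: dict, schema: dict) -> None:
    """Raise ValueError if `entry` is missing a declared field, or if a
    categorical feature falls outside its declared domain."""
    for field, spec in schema.items():
        if field not in entry:
            raise ValueError(f"missing field: {field!r}")
        cardinality = spec.get("cardinality")
        if cardinality is not None:
            values = np.asarray(entry[field], dtype=int)
            if values.shape != (len(cardinality),):
                raise ValueError(
                    f"{field!r}: expected {len(cardinality)} values, "
                    f"got shape {values.shape}"
                )
            for i, (v, c) in enumerate(zip(values, cardinality)):
                if not 0 <= v < c:
                    raise ValueError(
                        f"{field!r}[{i}] = {v} is outside the domain [0, {c})"
                    )
```

Validating a whole Dataset would then just be a loop over its entries, failing fast on the first inconsistent one.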

Constructing such a schema from the training data would require a full pass through the dataset, not only looking at which fields are there, but also looking for the maximum of all categorical features (to get the cardinality of their domain). But this doesn't seem too bad to me.
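The full-pass inference described above could be sketched like this (hypothetical code, assuming the field names from the dict above; cardinality is taken as max observed value + 1):

```python
import numpy as np

def infer_schema(dataset):
    """One pass over the dataset: record which fields occur, and track
    the element-wise maximum of the static categorical features."""
    fields = set()
    max_cat = None
    for entry in dataset:
        fields.update(entry)
        if "feat_static_cat" in entry:
            cat = np.asarray(entry["feat_static_cat"], dtype=int)
            max_cat = cat if max_cat is None else np.maximum(max_cat, cat)
    schema = {field: {} for field in fields}
    if max_cat is not None:
        schema["feat_static_cat"] = {
            "shape": (len(max_cat),),
            # domain of each feature is [0, max + 1)
            "cardinality": (max_cat + 1).tolist(),
        }
    return schema
```

Note this only gives a lower bound on the true cardinalities: test data may still contain category values never seen during training.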

There are some structures in the codebase that aim at something similar, I think (cf. MetaData). I'm working on a POC for this; I'll send it around when I'm satisfied with it :-)


mbohlkeschneider commented on May 22, 2024

Which case is the priority for us right now: running smoothly even if the data is not properly formatted, or running with correct options that succeeds only when the data is consistent?

In the case of GluonTS, we aspire to make a scientific library. Thus, I think the algorithms should fail if there are issues in the data; that informs the user that something is not right. Otherwise, you are left wondering why your results are not as good as you expected, especially if something is silently not used, discarded, or filtered. I think this behaviour should be avoided throughout the code base.


benidis commented on May 22, 2024

I have started looking at this issue over the last two days, and it is a combination of addressing the input format question and defining the correct transformation behaviour. I agree with Michael that we should not do things silently: if something is wrong we should throw an error instead of trying to filter it internally. However, this opens more questions:

  1. Should we check if all the fields are correct in a dataset (probably while creating windows) and throw an error if not? This adds some complexity since it needs to be applied at each created window.
  2. What should we do with custom fields in a dataset, or with fields that are not used by the model, especially the ones that can break the code, e.g. #94 (note that the fix is not global but only for DeepAR; any other estimator can fail with the same issue)?
  3. Setting aside DeepAR and looking at the bigger picture, what should be the behaviour of all estimators regarding the input data? Should they always take into account (or at least have the option to do so) a field that appears in the dataset, or should they use only prespecified fields regardless of the input data, as we were doing up to now?

For the cardinality question, I think inferring it from the data in an efficient way is ideal but probably not possible. I think an informative error message would do the job, something like: "You are using categorical features but you have not set the cardinality hyperparameter correctly". For the flags part of DeepAR I have exactly the same opinion (the default should be to use the feature, since people usually do not know, or do not bother, to change these values).
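The kind of informative check suggested here could be sketched as follows (hypothetical function and parameter names, not an actual GluonTS API):

```python
def check_cardinality(entry: dict, cardinality) -> None:
    """Fail early, with an actionable message, when categorical features
    and the `cardinality` hyperparameter are inconsistent."""
    n = len(entry.get("feat_static_cat", []))
    if n > 0 and cardinality is None:
        raise ValueError(
            "You are using categorical features but you have not set the "
            f"cardinality hyperparameter; expected a list of {n} cardinalities."
        )
    if cardinality is not None and len(cardinality) != n:
        raise ValueError(
            f"cardinality has length {len(cardinality)} but the data "
            f"has {n} categorical features."
        )
```

The point is that the estimator fails at construction/training time with a message naming the hyperparameter, instead of silently ignoring the features or crashing deep inside the network.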


sujayramaiah commented on May 22, 2024

@lostella Can you please confirm whether you were able to complete the POC?
Your solution would make using DeepAR much easier, provided we input the data in the correct format, without having to worry about setting multiple flags. It would also be good if we could log which dynamic features and categorical features are being used by the model.

