
Comments (18)

fjxmlzn commented on September 3, 2024

Sorry for the confusion--there is some discrepancy between the wording in the code and what I said. When I say metadata in the paper or here, I mean attribute in the code; when I say measurement or time series in the paper or here, I mean feature in the code.

To get back to your original question, I meant adding it as another dimension in data_attribute, and as another output in data_attribute_output.pkl. The reason to add it is to ensure that, after generation, we can use it together with the generated x_i-x_{i-1} in the time series part to recover all x_i.
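For example, a minimal sketch (illustrative names; assuming x0 is an array holding each sample's starting coordinate, and that Output, OutputType, Normalization are imported from the DoppelGANger code as in the snippets later in this thread):

    import numpy as np

    # append x_0 as one more continuous attribute dimension per sample
    data_attribute = np.concatenate(
        [data_attribute, x0.reshape(-1, 1).astype(np.float32)], axis=1)

    # describe the extra dimension with one more entry in data_attribute_output.pkl
    # (the normalization choice here is an assumption; pick whatever matches how you scale x_0)
    data_attribute_output.append(
        Output(type_=OutputType.CONTINUOUS, dim=1,
               normalization=Normalization.MINUSONE_ONE, is_gen_flag=False))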

fjxmlzn commented on September 3, 2024

Could you share more details on the hyper-parameters that you are using, and examples of some rows in data_feature, data_attribute, and data_gen_flag in data_train.npz?

dgtriantis commented on September 3, 2024

An additional problem I found is that the resulting data samples contain values greater than 1 and less than -1, and the values do not conform adequately to the input data.

The hyperparameters are:

" "batch_size": 100,
"vis_freq": 200,
"vis_num_sample": 5,
"d_rounds": 1,
"g_rounds": 1,
"num_packing": 1,
"noise": True,
"feed_back": False,
"g_lr": 0.001,
"d_lr": 0.001,
"d_gp_coe": 10.0,
"gen_feature_num_layers": 1,
"gen_feature_num_units": 100,
"gen_attribute_num_layers": 3,
"gen_attribute_num_units": 100,
"disc_num_layers": 5,
"disc_num_units": 200,
"initial_state": "random",
"attr_d_lr": 0.001,
"attr_d_gp_coe": 10.0,
"g_attr_d_coe": 1.0,
"attr_disc_num_layers": 5,
"attr_disc_num_units": 200

"epoch": [600],
"run": [0],
"sample_len": [1, 5, 10, 20],
"extra_checkpoint_freq": [5],
"epoch_checkpoint_freq": [1],
"aux_disc": [True],
"self_norm": [True]

data_feature:

    [[[ 0.40238473,  0.54027295, -0.06659341,  0.13676707],
    [ 0.40435192,  0.5429528 , -0.05841919,  0.1444367 ],
    [ 0.40502715,  0.54378265, -0.05112771,  0.15125336],
    ...,
    [ 0.        ,  0.        ,  0.        ,  0.        ],
    [ 0.        ,  0.        ,  0.        ,  0.        ],
    [ 0.        ,  0.        ,  0.        ,  0.        ]],


   [[ 0.4190874 ,  0.22841692, -0.06659341,  0.13676707],
    [ 0.41735172,  0.2271396 , -0.05560642,  0.14006932],
    [ 0.41562375,  0.22764418, -0.04452052,  0.14354612],
    ...,
    [ 0.        ,  0.        ,  0.        ,  0.        ],
    [ 0.        ,  0.        ,  0.        ,  0.        ],
    [ 0.        ,  0.        ,  0.        ,  0.        ]],

   ...,

   [[ 0.375146  ,  0.56082535, -0.06659341,  0.13676707],
    [ 0.3768556 ,  0.5613334 , -0.05965237,  0.14395174],
    [ 0.3786404 ,  0.5616295 , -0.05256618,  0.15087192],
    ...,
    [ 0.        ,  0.        ,  0.        ,  0.        ],
    [ 0.        ,  0.        ,  0.        ,  0.        ],
    [ 0.        ,  0.        ,  0.        ,  0.        ]],

   [[ 0.16939865,  0.5071855 , -0.06659341,  0.13676707],
    [ 0.17080542,  0.5078957 , -0.05965237,  0.14395174],
    [ 0.17278804,  0.5082092 , -0.05256618,  0.15087192],
    ...,
    [ 0.        ,  0.        ,  0.        ,  0.        ],
    [ 0.        ,  0.        ,  0.        ,  0.        ],
    [ 0.        ,  0.        ,  0.        ,  0.        ]],

   [[ 0.4324977 ,  0.5378058 , -0.06659341,  0.13676707],
    [ 0.4324977 ,  0.5378058 , -0.05965237,  0.14395174],
    [ 0.4324977 ,  0.5378058 , -0.05256618,  0.15087192],
    ...,
    [ 0.        ,  0.        ,  0.        ,  0.        ],
    [ 0.        ,  0.        ,  0.        ,  0.        ],
    [ 0.        ,  0.        ,  0.        ,  0.        ]]],
  dtype=float32)

data_attribute (max value is 63, min value is 0):

   [[ 1.],
   [ 1.],
   [13.],
   ...,
   [17.],
   [17.],
   [17.]], dtype=float32)

data_gen_flag:

   [[1., 1., 1., ..., 0., 0., 0.],
   [1., 1., 1., ..., 0., 0., 0.],
   [1., 1., 1., ..., 0., 0., 0.],
   ...,
   [1., 1., 1., ..., 0., 0., 0.],
   [1., 1., 1., ..., 0., 0., 0.],
   [1., 1., 1., ..., 0., 0., 0.]], dtype=float32)

dgtriantis commented on September 3, 2024

I posted this question from a different account by mistake; I am the same person as @tzimbolis.

fjxmlzn commented on September 3, 2024

Thank you! Could you please also provide the content inside "data_feature_output.pkl" and "data_attribute_output.pkl"?

dgtriantis commented on September 3, 2024

    data_feature_output = [Output(type_=OutputType.CONTINUOUS, dim=4,
                                  normalization=Normalization.MINUSONE_ONE, is_gen_flag=False)]

    data_attribute_output = [Output(type_=OutputType.DISCRETE, dim=1,
                                    normalization=None, is_gen_flag=False)]

If you'd like I can also send you an email with the data, in case you want to check something yourself.
Thanks

fjxmlzn commented on September 3, 2024

Thanks for the information.

One issue is that the data attribute needs to be stored in one-hot encoding, i.e.,
if the value is 0, that row should be [1, 0, 0, ...] (63 zeros after the 1)
if the value is 1, that row should be [0, 1, 0, 0, ...] (62 zeros after the 1)
...
if the value is 63, that row should be [0, 0, ..., 0, 1] (63 zeros before the 1)
The shape of data_attribute should be [number of samples, 64]
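For example, a minimal numpy sketch of this conversion (illustrative variable names; assuming the integer labels are stored as a column, as in the dump above):

    import numpy as np

    labels = data_attribute.astype(np.int64).reshape(-1)     # integer values in 0..63
    data_attribute = np.eye(64, dtype=np.float32)[labels]    # shape: [number of samples, 64]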

In addition, data_attribute_output should be

data_attribute_output = [Output(type_=OutputType.DISCRETE, dim=64, normalization=None, is_gen_flag=False)]

Please let me know if the results are still not as expected after fixing this.

dgtriantis commented on September 3, 2024

After making the proposed changes (data_attribute_output = [Output(type_=OutputType.DISCRETE, dim=max_len, normalization=None, is_gen_flag=False)], where max_len = 64), the generated attributes are not discrete:

    array([[5.0039098e-08, 6.6108881e-03, 2.2625262e-01, ..., 4.9293556e-09,
    2.1883304e-10, 1.4889185e-08],
   [1.7649402e-17, 4.5475384e-04, 1.2486913e-19, ..., 9.4630347e-23,
    4.4634356e-19, 2.8835797e-20],
   [1.9636389e-18, 1.1713755e-03, 8.3317159e-04, ..., 6.3824905e-21,
    8.5015989e-24, 4.3212665e-19],
   ...,
   [4.0655615e-07, 3.0217332e-01, 1.0044121e-05, ..., 2.7380713e-09,
    3.3998028e-08, 4.3662823e-08],
   [1.7798087e-16, 1.7455160e-06, 7.8983231e-20, ..., 1.3778067e-18,
    8.6902809e-19, 1.6815877e-18],
   [1.6348466e-08, 6.6678558e-06, 1.1912930e-09, ..., 1.1338585e-11,
    1.4493015e-10, 2.1054688e-11]], dtype=float32)

Do you have any idea why?

dgtriantis commented on September 3, 2024

Also, the generated sample features don't make sense. What could I do about that?

fjxmlzn commented on September 3, 2024

The code does not discretize the generated attributes yet; they are the raw outputs from softmax. You will need to manually use argmax to get the discrete version of the generated attributes.
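For example (a minimal sketch; generated_attribute is an illustrative name for the softmax output array of shape [number of samples, 64]):

    import numpy as np

    labels = np.argmax(generated_attribute, axis=1)           # discrete labels in 0..63
    one_hot = np.eye(generated_attribute.shape[1],
                     dtype=np.float32)[labels]                # optional: back to one-hot form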

Can you share more details on how "the generated sample features don't make sense"? E.g., in what metrics?

dgtriantis commented on September 3, 2024

So the data that I'm looking to synthesize are pedestrian-autonomous vehicle interactions. As you can understand, the trajectories are approximately linear in almost all cases (see an example in the first picture below).
On the other hand, the generated data consist of convoluted trajectories, which are not realistic for these interactions; in most cases they also intersect, something that does not happen in any of the approx. 6000 interactions that I input to DoppelGANger (you can see an example in the second picture).
[Image: example of a real, approximately linear pedestrian-AV interaction trajectory]
[Image: example of a generated, convoluted trajectory]

It could be said that there are certain limits or regulations that the generated trajectories need to satisfy (e.g., a specific limit on the difference between the coordinates of two consecutive timesteps). Could something like that be integrated into the DoppelGANger code (by myself)? If not, is there something else I could try to "rationalize" the generated samples?

fjxmlzn commented on September 3, 2024

Regarding "there are values over 1 and less than -1". This could happen when 'self_norm' is turned on. Could you please try setting both 'self_norm' and 'aux_disc' to False and see what the results look like?

Regarding 'generated trajectories that need to be in order'. A simpler version of that could be achieved by a simple data preprocessing trick. For example, if we want the x coordinate always to increase, we can preprocess a trajectory to be 'delta x' instead of x. More specifically, assume that the original trajectory is [x_0,x_1,...,x_t], we can add another metadata x_0, and change the time series to [x_1',...,x_t'] where x_i' = x_i-x_{i-1}. The real x_i' will always be >0, and so will be the generated data. We can then transform the generated x_0 and x_i' back to the original x_i, which will always increase.
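A minimal sketch of this preprocessing and its inverse (illustrative variable names):

    import numpy as np

    x = np.asarray(trajectory, dtype=np.float32)   # original trajectory [x_0, x_1, ..., x_t]
    x0_attribute = x[0]                            # keep x_0 as an extra attribute (metadata)
    deltas = np.diff(x)                            # time series becomes [x_1', ..., x_t'], x_i' = x_i - x_{i-1}

    # after generation, recover the absolute (always increasing) coordinates:
    x_recovered = x0_attribute + np.concatenate([[0.0], np.cumsum(deltas)])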

This trick has been used in our follow-up paper for learning strictly increasing timestamps: https://dl.acm.org/doi/pdf/10.1145/3544216.3544251

dgtriantis commented on September 3, 2024

The results shared in my previous comment came from input data that were preprocessed in a similar way (the zero-timestep AV coordinates for each time series were transformed into the origin of the coordinate system for that time series). Regarding "there are values over 1 and less than -1", that is no longer a problem, but the results are still not logical (in the same way as before).

Is there a way to add specific conditions to DoppelGANger by adding some code to the existing one, or is it incompatible with such a transformation?

Thanks

fjxmlzn commented on September 3, 2024

Sorry, I don't fully understand. Would you mind explaining more about "the zero-timestep AV coordinates for each time series were transformed into the origin of the coordinate system for that time series", and what "specific conditions" or "transformation" you need?

dgtriantis commented on September 3, 2024

I did something similar to what you proposed, which is:
I changed the origin of the coordinate system for each sequence so that it is equal to the coordinates of one of the two agents in the first timestep (the agents are one pedestrian and one autonomous vehicle per sequence/interaction).

When it comes to the conditions, as you can see in the examples I posted previously, the real trajectories are approximately linear, whereas the generated ones are mostly convoluted. The conditions I am thinking of would regulate the generation so that the generated samples are not illogical (e.g., no sudden changes in the inclination of the trajectory, and the two agents never crossing each other). In short, I am talking about numerical conditions between the features at each timestep of a sequence. For example, if x[i] is the x coordinate at timestep i, then x[i]-x[i-1] < 0.1.

fjxmlzn commented on September 3, 2024

Got it. Thanks for the explanation. The transformation I mentioned should work.

Assume that the original trajectory is [x_0,x_1,...,x_t], we can add another metadata x_0, and change the time series to [x_1',...,x_t'] where x_i' = x_i-x_{i-1}.

(Note that, here, x_i' = x_i-x_{i-1}. The approach you mentioned is instead x_i' = x_i-x_0.)

After doing the above transformation on the original data, you will still need to normalize the time series to between [0, 1] (or [-1, 1]) before using DoppelGANger. Therefore, if for the real data, x[i]-x[i-1] < 0.1 is always satisfied, the generated data will also satisfy that (when self_norm=False), as [0, 1] (or [-1, 1]) will be mapped back to a value < 0.1.
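For example (a minimal sketch with illustrative names; deltas are the x_i' of the real data, and generated_norm is the generated, normalized series):

    import numpy as np

    d_min, d_max = deltas.min(), deltas.max()                     # from the real data; both < 0.1 here
    deltas_norm = (deltas - d_min) / (d_max - d_min)              # scaled to [0, 1] for training

    # after generation (with self_norm=False the outputs stay in [0, 1]):
    generated_deltas = generated_norm * (d_max - d_min) + d_min   # back in [d_min, d_max], so still < 0.1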

dgtriantis commented on September 3, 2024

When you say "add another metadata x_0", do you mean adding it separately in "data_feature_output.pkl"?

fjxmlzn commented on September 3, 2024

Feel free to reopen the issue if the problem persists.


