Comments (10)

wentaoxandry commented on September 24, 2024

OK, I understand. I will first try segmenting the pretrain set and then use it. Thank you so much.

from avsr-tf1.

georgesterpu commented on September 24, 2024

Hello @wentaoxandry

I remember seeing a similar error whenever my system ran out of RAM.
Could you monitor the RAM usage after launching the experiment?
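
One way to watch memory from inside the training script is to log the process's peak resident set size after each batch. This is only a minimal sketch using the Python standard library: the `peak_rss_mib` helper is hypothetical and not part of Sigmedia-AVSR, and the `resource` module is available on Unix-like systems only.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


# Hypothetical usage: call this once per batch inside the training loop.
print("peak RSS: {:.1f} MiB".format(peak_rss_mib()))
```

If the printed value keeps climbing with every batch until the process is killed, that would be consistent with the system running out of RAM.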

Also, try reducing the buffer size of the function that shuffles the batches in memory:
https://github.com/georgesterpu/Sigmedia-AVSR/blob/master/avsr/io_utils.py#L89
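
To see why the buffer size can matter, a rough estimate of the shuffle buffer footprint may help. The sketch below assumes the buffer holds individual decoded video examples stored as float32 RGB frames at 25 fps. Those three values are assumptions, while the 36x36 resolution, the 7 s average length, and the buffer sizes of 5000 and 100 are the figures reported in this thread. The function names are hypothetical.

```python
def video_example_bytes(seconds, fps, height, width, channels, bytes_per_value):
    """Approximate size in bytes of one decoded video example."""
    frames = int(round(seconds * fps))
    return frames * height * width * channels * bytes_per_value


def shuffle_buffer_bytes(buffer_size, example_bytes):
    """Approximate memory held by a shuffle buffer of `buffer_size` examples."""
    return buffer_size * example_bytes


# Assumed: 25 fps, 3 channels, 4 bytes per value (float32).
example = video_example_bytes(seconds=7, fps=25, height=36, width=36,
                              channels=3, bytes_per_value=4)

for size in (5000, 100):
    gib = shuffle_buffer_bytes(size, example) / 2 ** 30
    print("buffer_size={}: {:.2f} GiB".format(size, gib))
```

Under these assumptions a 5000-element buffer would need roughly 12.7 GiB and a 100-element buffer about 0.25 GiB. The real figure depends on what the pipeline actually buffers.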


wentaoxandry commented on September 24, 2024

Hello @georgesterpu,

Thank you for your help. I tried changing buffer_size from 5000 to 100. It got better, but the process is still killed: before the change it only completed the first 3 batches, and now it is killed after the first 6 batches.

The memory usage always increases after each batch.


georgesterpu commented on September 24, 2024

What batch size are you using?
And what is the average length (in seconds) of a sentence in your dataset?


wentaoxandry commented on September 24, 2024

The batch size is still (48, 64), but I have also tried (24, 32), and the process is still killed.
The average length is 7 s.


georgesterpu commented on September 24, 2024

What video resolution did you use when writing the video tfrecord files?


wentaoxandry commented on September 24, 2024

I used output_resolution = (36,36)


georgesterpu commented on September 24, 2024

I believe that, for large-scale experiments, we may need to redesign and optimise the input pipeline, but this won't happen until April.

Can you please confirm that your script gets killed when the system runs out of memory?


wentaoxandry commented on September 24, 2024

Actually, I also use the LRS2 database, but I combine the pretrain and train sets and use the whole set as the training set. For testing I still use the test set.

I can't find the specific script, but I know it is in avsr.py, in the train function, in this while loop:
try:
    while True:
        out = self._train_session.run([self._train_model.model.train_op,
                                       self._train_model.model.batch_loss,
                                       ], **self.sess_opts)

        if self._hparams.profiling is True:
            self.profiler.add_step(batches, self.run_meta)

            from tensorflow.python.profiler import option_builder

            self.profiler.profile_name_scope(options=(option_builder.ProfileOptionBuilder
                                                      .trainable_variables_parameter()))

            opts = option_builder.ProfileOptionBuilder.time_and_memory()
            self.profiler.profile_operations(options=opts)

            opts = (option_builder.ProfileOptionBuilder(
                option_builder.ProfileOptionBuilder.time_and_memory())
                    .with_step(batches)
                    .with_timeline_output('/tmp/timelines/').build())

            self.profiler.profile_graph(options=opts)

        sum_loss += out[1]
        print('batch: {}'.format(batches))
        batches += 1


georgesterpu commented on September 24, 2024

I see, LRS2 can have some very long sentences. Sigmedia-AVSR doesn't work well with those, as it fills up your memory quite fast.
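
A rough illustration of why long sentences can be costly, assuming each batch is padded to the length of its longest sentence (a common approach, not confirmed here for Sigmedia-AVSR). The 25 fps frame rate, 3 channels, float32 storage, and the 30 s outlier are assumptions for illustration; the batch size of 48, the 7 s average, and the 36x36 resolution come from this thread. The function name is hypothetical.

```python
def padded_batch_bytes(lengths_s, fps=25, height=36, width=36,
                       channels=3, bytes_per_value=4):
    """Approximate size of a video batch padded to its longest sentence."""
    max_frames = max(int(round(s * fps)) for s in lengths_s)
    return len(lengths_s) * max_frames * height * width * channels * bytes_per_value


# 48 sentences of average length versus the same batch with one long outlier.
typical = padded_batch_bytes([7.0] * 48)
with_outlier = padded_batch_bytes([7.0] * 47 + [30.0])

print("typical batch: {:.1f} MiB".format(typical / 2 ** 20))
print("with one 30 s sentence: {:.1f} MiB".format(with_outlier / 2 ** 20))
print("ratio: {:.2f}x".format(with_outlier / typical))
```

Under these assumptions a single 30 s sentence makes the whole padded batch about 4.3 times larger, since every other sentence is padded up to its length.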
