Comments (6)

erip commented on July 17, 2024

I think this can be closed. As of 0.9.0 there's a build_vocab_from_iterator which accepts a List[List[str]].
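
For readers unfamiliar with the input shape mentioned here, the following is a minimal, dependency-free sketch of what building a vocabulary from a List[List[str]] amounts to. It is not torchtext's implementation: `build_vocab`, its `specials` and `min_freq` parameters, and the index ordering are assumptions made for illustration, and the real build_vocab_from_iterator may order or filter tokens differently.

```python
from collections import Counter


def build_vocab(token_lists, specials=("<unk>",), min_freq=1):
    """Hypothetical helper: map each token to an integer index.

    token_lists is an iterable of token lists, e.g. List[List[str]].
    """
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)

    # Sort by descending frequency, then alphabetically, so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    # Special tokens come first, followed by every token that is frequent enough.
    itos = list(specials) + [
        tok for tok, freq in ordered if freq >= min_freq and tok not in specials
    ]
    return {tok: idx for idx, tok in enumerate(itos)}


sentences = [["the", "cat", "sat"], ["the", "dog", "ran"]]
vocab = build_vocab(sentences)
```

With the real library, the same `sentences` list would be passed to build_vocab_from_iterator instead of this stand-in.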

from text.

jekbradbury commented on July 17, 2024

Is the number of sentences in each sample fixed? If not, torchtext doesn't have a good way to do this right now (if yes, use Example.fromlist in a custom Dataset). There might be a way to hack up something that would work for the variable-number-of-sentences case, but it would involve subclassing Field, Example, and Dataset, and that's probably not worth it.

Spider101 commented on July 17, 2024

Yes, the number of sentences in each sequence is fixed but the number of words in each sentence is variable. So once I convert the list into Examples, is there any streamlined way to convert the Examples into, say, a TabularDataset that I can iterate over?

jekbradbury commented on July 17, 2024

OK, if the number of sentences per example is fixed, then use a single Example object per sequence with one column for each sentence. You can create Examples like this:

ex = SequentialExample.fromlist(raw_list_of_sentences, [('sentence' + str(i), text_field) for i in range(len(raw_list_of_sentences))])

Then build an iterator as normal and use the batches like this:

for sentence_index in range(num_sentences):
    do_something(getattr(batch, 'sentence' + str(sentence_index)))
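
To make the one-column-per-sentence layout concrete, here is a small sketch that runs without torchtext. Everything in it is a stand-in: `make_example`, the whitespace tokenizer, and the `SimpleNamespace` objects playing the role of Example and batch are assumptions for illustration, and no padding or numericalization is done. As far as I know, the class in torchtext's legacy data API is Example (with Example.fromlist); SequentialExample is kept above as originally written.

```python
from types import SimpleNamespace


def make_example(sentences, tokenize=str.split):
    """Hypothetical stand-in for an Example: one attribute per sentence slot."""
    ex = SimpleNamespace()
    for i, sentence in enumerate(sentences):
        setattr(ex, "sentence" + str(i), tokenize(sentence))
    return ex


# Two sequences, each with a fixed number of sentences of varying length.
raw = [
    ["a b c", "d e"],
    ["f", "g h i j"],
]
num_sentences = 2
examples = [make_example(seq) for seq in raw]

# Stand-in for a batch: each column gathers the same sentence slot across examples.
batch = SimpleNamespace(**{
    "sentence" + str(i): [getattr(ex, "sentence" + str(i)) for ex in examples]
    for i in range(num_sentences)
})

# Same access pattern as in the loop above.
columns = [getattr(batch, "sentence" + str(i)) for i in range(num_sentences)]
```

Because the number of sentences is fixed, every example exposes the same attribute names, which is what lets the loop index columns by position.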

yyHaker commented on July 17, 2024

OK, if the number of sentences per example is fixed, then use a single Example object per sequence with one column for each sentence. You can create Examples like this:

ex = SequentialExample.fromlist(raw_list_of_sentences, [('sentence' + str(i), text_field) for i in range(len(raw_list_of_sentences))])

then build an iterator as normal and use the batches like this

for sentence_index in range(num_sentences):
    do_something(getattr(batch, 'sentence' + str(sentence_index)))

Hi, is SequentialExample an object in torchtext? Could you please explain in more detail? Thanks in advance.

Nayef211 commented on July 17, 2024

I think this can be closed. As of 0.9.0 there's a build_vocab_from_iterator which accepts a List[List[str]].

Thanks for identifying stale issues @erip. Moving forward, we should make an effort to clean up non-relevant issues and PRs. I will create an issue for this soon 😃
