Comments (6)
Is the number of sentences in each sample fixed? If not, torchtext doesn't have a good way to do this right now (if yes, use Example.fromlist in a custom Dataset). There might be a way to hack up something that would work for the variable-number-of-sentences case, but it would involve subclassing Field, Example, and Dataset, and that's probably not worth it.
Yes, the number of sentences in each sequence is fixed but the number of words in each sentence is variable. So once I convert the list into Examples, is there any streamlined way to convert the Examples into, say, a TabularDataset that I can iterate over?
OK, if the number of sentences per example is fixed, then use a single Example object per sequence with one column for each sentence. You can create Examples like this:
ex = SequentialExample.fromlist(raw_list_of_sentences, [('sentence' + str(i), text_field) for i in range(len(raw_list_of_sentences))])
Then build an iterator as normal and use the batches like this:
for sentence_index in range(num_sentences):
    do_something(getattr(batch, 'sentence' + str(sentence_index)))
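To make the one-column-per-sentence layout concrete, here is a minimal library-free sketch. It does not use torchtext at all: `make_record`, `make_batches`, `NUM_SENTENCES`, and the sample data are hypothetical names made up for illustration, and the batches are plain Python lists rather than padded tensors. It only shows how a fixed number of sentences maps to named columns that are then read back with getattr.

```python
from types import SimpleNamespace

NUM_SENTENCES = 3  # assumption: every sample has exactly this many sentences


def make_record(sentences):
    """Map a fixed-length list of tokenized sentences to named columns."""
    if len(sentences) != NUM_SENTENCES:
        raise ValueError(f"expected {NUM_SENTENCES} sentences, got {len(sentences)}")
    return SimpleNamespace(**{f"sentence{i}": s for i, s in enumerate(sentences)})


def make_batches(records, batch_size):
    """Yield batches whose attribute sentence<i> holds column i of each record."""
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        yield SimpleNamespace(**{
            f"sentence{i}": [getattr(r, f"sentence{i}") for r in chunk]
            for i in range(NUM_SENTENCES)
        })


# Made-up data: the sentence count is fixed, the word count per sentence varies.
samples = [
    [["the", "cat"], ["it", "sat", "down"], ["ok"]],
    [["a", "dog", "barked"], ["loudly"], ["then", "it", "left"]],
]
records = [make_record(s) for s in samples]
batches = list(make_batches(records, batch_size=2))

for batch in batches:
    for sentence_index in range(NUM_SENTENCES):
        column = getattr(batch, "sentence" + str(sentence_index))
        print(sentence_index, column)
```

The explicit length check is a design choice: this layout only works when the sentence count is fixed, so a sample with a different count fails early instead of silently producing a missing column.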
Hi, is SequentialExample an object in TorchText? Could you please explain in more detail? Thanks in advance.
I think this can be closed. As of 0.9.0 there's a build_vocab_from_iterator which accepts a List[List[str]].
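As a rough sketch of how the multi-sentence samples discussed earlier could be turned into the List[List[str]] shape mentioned here: the snippet below flattens each sample's sentences into one list of token lists. The helper `iter_token_lists` and the sample data are hypothetical. The torchtext call is shown only as a comment so the snippet runs without torchtext installed, and the Counter-based index is a plain-Python stand-in, not torchtext's own vocabulary object.

```python
from collections import Counter
from itertools import chain


def iter_token_lists(samples):
    """Flatten samples (each a list of tokenized sentences) into token lists."""
    return chain.from_iterable(samples)


# Made-up data: two samples with three tokenized sentences each.
samples = [
    [["the", "cat"], ["it", "sat", "down"], ["ok"]],
    [["a", "dog", "barked"], ["loudly"], ["then", "it", "left"]],
]
token_lists = list(iter_token_lists(samples))  # List[List[str]]

# With torchtext 0.9.0 or later installed, the list could be passed on directly:
#   from torchtext.vocab import build_vocab_from_iterator
#   vocab = build_vocab_from_iterator(token_lists)

# Stand-in for illustration: count tokens, then index them by descending frequency.
counts = Counter(chain.from_iterable(token_lists))
stoi = {token: index for index, (token, _) in enumerate(counts.most_common())}
print(len(token_lists), len(stoi))
```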
Thanks for identifying stale issues @erip. Moving forward, we should make an effort to clean up non-relevant issues and PRs. I will create an issue for this soon 😃