Comments (5)
Hi @IsThatYou, this is a great point. I tried GPT-3.5, and it indeed performs badly on Game of 24. Note, though, that the IO (36%) and CoT (42%) numbers are pass@100.
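Some context on the metric: pass@k treats a task as solved if at least one of k sampled answers is correct, so pass@100 is far more forgiving than a single-sample success rate. Below is a minimal sketch that uses the unbiased estimator from the Codex paper (Chen et al., 2021). The function names are hypothetical, and it is an assumption that the quoted numbers were computed this way. With n = k = 100, though, the estimator reduces per task to "any of the 100 samples is correct" anyway.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the n samples contains at least one correct answer.
        return 1.0
    # 1 - P(a random size-k subset contains no correct sample)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average pass@k over tasks, given how many of the n samples were correct per task."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

For example, `mean_pass_at_k([0, 3, 1, 0], n=100, k=100)` returns 0.5, while the same counts give only about 0.01 at k=1. This is why a pass@100 figure looks much higher than a single-sample rate computed on the same outputs.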
We also tried ToT with GPT-3.5-turbo instead of GPT-4 on Creative Writing (scoring is still done by GPT-4). All methods perform worse, but ToT is still significantly better than the other methods.
Creative Writing | GPT-4 (in paper) | GPT-3.5-turbo |
---|---|---|
IO | 6.19 | 4.47 |
CoT | 6.93 | 5.16 |
ToT | 7.56 | 6.62 |
In general, I believe proposing and evaluating diverse thoughts is an "emerging capability" that is hard even for GPT-4 and significantly harder for smaller or weaker models. It would be important and interesting to study how to make smaller models better at ToT reasoning!
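To show where "proposing" and "evaluating" thoughts sit in the search, here is a minimal breadth-first ToT sketch on Game of 24. It is not the repo's implementation. `propose`, `can_reach_24`, `evaluate`, and `tot_bfs` are hypothetical names, and both LM calls are replaced by deterministic stand-ins: exhaustive enumeration for proposals, and a brute-force solvability check for evaluation. In the paper, an LM plays both roles (for example, by rating candidates as sure/maybe/impossible). A weaker model therefore degrades the search at exactly these two points.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Sequence, Tuple

# A state holds the numbers still in play and the arithmetic steps taken so far.
State = Tuple[Tuple[Fraction, ...], Tuple[str, ...]]


def propose(state: State) -> List[State]:
    """Stand-in for the LM 'propose' step: every way to combine two remaining numbers."""
    nums, steps = state
    children: List[State] = []
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = tuple(n for k, n in enumerate(nums) if k not in (i, j))
        options = [(a + b, f"{a} + {b}"), (a * b, f"{a} * {b}"),
                   (a - b, f"{a} - {b}"), (b - a, f"{b} - {a}")]
        if b != 0:
            options.append((a / b, f"{a} / {b}"))
        if a != 0:
            options.append((b / a, f"{b} / {a}"))
        for value, expr in options:
            children.append((rest + (value,), steps + (f"{expr} = {value}",)))
    return children


def can_reach_24(nums: Tuple[Fraction, ...]) -> bool:
    """Brute-force solvability check (a hypothetical 'perfect' evaluator)."""
    if len(nums) == 1:
        return nums[0] == 24
    return any(can_reach_24(child_nums) for child_nums, _ in propose((nums, ())))


def evaluate(state: State) -> float:
    """Stand-in for the LM 'value' step: 1.0 if the state can still reach 24, else 0.0."""
    return 1.0 if can_reach_24(state[0]) else 0.0


def tot_bfs(numbers: Sequence[int], breadth: int = 5) -> List[State]:
    """Breadth-first ToT: expand every frontier state, keep the `breadth` best children."""
    frontier: List[State] = [(tuple(Fraction(n) for n in numbers), ())]
    for _ in range(len(numbers) - 1):  # each step merges two numbers into one
        candidates = [child for state in frontier for child in propose(state)]
        candidates.sort(key=evaluate, reverse=True)  # stable sort: ties keep proposal order
        frontier = candidates[:breadth]
    # Keep only the finished states whose single remaining number is 24.
    return [s for s in frontier if s[0] == (Fraction(24),)]


if __name__ == "__main__":
    for solution in tot_bfs([4, 9, 10, 13]):
        print("; ".join(solution[1]))
```

With a perfect evaluator, the search cannot fail on a solvable input. If you substitute a noisy evaluator, such as a weaker LM, it can prune the only viable branches early. That is one plausible reading of the GPT-3.5 results discussed above.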
from tree-of-thought-llm.
Yes, I agree. Perhaps better prompt engineering could help with this issue.
Does this mean that open-source models like Vicuna will perform worse?
Actually, I want to use ToT to improve the reasoning ability of open-source models so that they approach the reasoning level of GPT-3.5, rather than just imitating its superficial dialogue style.
Hi @ysymyth, thank you for the response! I looked closely at some of the generations and compared GPT-3.5 with GPT-4. In general, I found GPT-4 better at understanding the task, and GPT-3.5 degenerates more often. Anyway, this is pretty interesting, and I'd definitely like to see how smaller models can be made better at it. :D