
textbook_quality's Introduction

Textbook Quality

This project generates very long, textbook quality pretraining data. Here's a 70M token example. It can run generations in parallel, against OpenAI, or your own API. It can generate the topics from scratch, or use a set of seeds you provide.

The generator uses retrieval to improve quality. By default, it will use Serply to do the retrieval, but you can also use SerpAPI, or disable retrieval.

The core is extensible, so you can add your own adaptors to connect to new APIs and retrieval backends.

Installing

Prerequisites

  • Python 3.10+ (ideally 3.11)
  • You will need postgres installed. You can install it with brew install postgres on a Mac.

Setup

  • psql postgres -c "create database textbook;"
  • git clone https://github.com/VikParuchuri/textbook_quality.git
  • cd textbook_quality
  • poetry install
  • invoke migrate-dev

Configuration

First, create a local.env file in the root directory of the repo to store your secret keys. Alternatively, you can set any key below as an env var.

You can see all the available configuration values in app/settings.py.

With OpenAI and retrieval (highest quality)

  • Add your OpenAI key, like OPENAI_KEY=sk-xxxxxx
  • Add your Serply key (SERPLY_KEY="...") or SerpAPI key (SERPAPI_KEY="...").
  • Add SEARCH_BACKEND=serply or SEARCH_BACKEND=serpapi to use the appropriate backend.

By default, this will use gpt-3.5. To use gpt-4, set the env vars LLM_TYPE and LLM_INSTRUCT_TYPE to gpt-4. You may also be able to set LLM_EXTENDED_TYPE to gpt-4, but those generations may need more than 8k tokens of context.

With vllm or other openai-compatible API and retrieval

  • Set OPENAI_KEY to the value of your API key, or a dummy value.
  • Set OPENAI_BASE_URL to the url of your API (like https://vllm-api.com/v1)
  • Set the LLM_TYPE, LLM_INSTRUCT_TYPE, and LLM_EXTENDED_TYPE settings to your model name (like llama)
  • Set the model name and max tokens in the LLM_TYPES setting.
  • Follow the instructions above for the retrieval setup.

The generator works best with a context length of up to 16k tokens, though 12k can be enough if necessary. If you've fine-tuned your own model for textbook generation (based on the prompts cached in this repo), you can use the FINETUNED and INCLUDE_EXAMPLES settings to reduce token usage.

Without retrieval

  • Set SEARCH_BACKEND=none
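The rules above can be checked before starting a long run. The sketch below is a hypothetical helper (parse_env_file and config_problems are not part of the repo): it reads KEY=VALUE lines in the style of local.env and reports missing keys. It assumes Serply is the default when SEARCH_BACKEND is unset, as the introduction states, and it does not cover every option in app/settings.py.

```python
# Hypothetical helper, not part of the repo: checks a settings mapping against
# the rules in this README. app/settings.py is the authoritative list of options.

# Search key required by each SEARCH_BACKEND value (None means no key needed).
REQUIRED_SEARCH_KEYS = {"serply": "SERPLY_KEY", "serpapi": "SERPAPI_KEY", "none": None}


def parse_env_file(text: str) -> dict:
    """Minimal KEY=VALUE parser; assumes no multi-line values or `export` prefixes."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def config_problems(env: dict) -> list:
    """Return human-readable problems with the settings (empty list if none)."""
    problems = []
    if not env.get("OPENAI_KEY"):
        # Also required for OpenAI-compatible APIs, where a dummy value works.
        problems.append("OPENAI_KEY is not set")
    backend = env.get("SEARCH_BACKEND", "serply")  # assumed default: Serply
    if backend not in REQUIRED_SEARCH_KEYS:
        problems.append(f"unknown SEARCH_BACKEND: {backend}")
    else:
        key = REQUIRED_SEARCH_KEYS[backend]
        if key and not env.get(key):
            problems.append(f"{key} is required when SEARCH_BACKEND={backend}")
    return problems


sample = "OPENAI_KEY=sk-xxxxxx\nSEARCH_BACKEND=serpapi\n"
print(config_problems(parse_env_file(sample)))
# ['SERPAPI_KEY is required when SEARCH_BACKEND=serpapi']
```

The hand-rolled parser keeps the sketch dependency-free; a dedicated .env library would handle quoting and edge cases more thoroughly.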

Usage

There are three main scripts in the repo, and each one can run on the output of the previous one. By default, all outputs appear in app/data, the DATA_DIR specified in settings.

Generate topics from scratch

Pass in a subject, the file to save the topics to, and the number of iterations. The topics are deduplicated.

Usage example:

python topic_generator.py "computer science with python" python_cs_titles.json --iterations 50
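To inspect or post-process the output, here is a minimal sketch. It assumes the output file is a flat JSON list of strings (the augmentor section below accepts it as a seed file in that format) and that it lands in app/data. load_topics and dedupe_exact are hypothetical names, and the generator's own deduplication may use a different rule.

```python
import json
from pathlib import Path


def load_topics(path: Path) -> list:
    """Load a flat JSON list of topic strings, such as the topic generator's output."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list) or not all(isinstance(t, str) for t in data):
        raise ValueError(f"{path} is not a flat JSON list of strings")
    return data


def dedupe_exact(topics: list) -> list:
    """Drop topics that differ only in case or spacing, keeping first-seen order."""
    seen = set()
    unique = []
    for topic in topics:
        key = " ".join(topic.lower().split())  # normalize case and whitespace
        if key not in seen:
            seen.add(key)
            unique.append(topic)
    return unique


# e.g. topics = load_topics(Path("app/data/python_cs_titles.json"))
print(dedupe_exact(["Intro to Python", "intro to  python", "Recursion"]))
# ['Intro to Python', 'Recursion']
```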

Augment topics from seeds

This takes a file of existing seeds (a flat JSON list) and augments them. You can pass in the topic generator's output file as the seed file, or use your own seeds. --domain is an optional flag that constrains the topics to a domain.

This will also deduplicate the topics semantically.

Usage example:

python topic_augmentor.py python_titles.json python_topics.json --domain python
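Semantic deduplication compares meanings rather than strings. A common approach, sketched here under assumptions, embeds each topic as a vector and greedily drops any topic whose cosine similarity to an already-kept topic exceeds a threshold. The toy 2-D vectors stand in for real sentence embeddings; the threshold of 0.9 and the greedy strategy are illustrative choices, not necessarily what topic_augmentor.py does.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedupe_semantic(topics: list, embeddings: list, threshold: float = 0.9) -> list:
    """Keep a topic only if it is below `threshold` similarity to every kept topic."""
    kept, kept_vecs = [], []
    for topic, vec in zip(topics, embeddings):
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(topic)
            kept_vecs.append(vec)
    return kept


topics = ["Python lists", "Lists in Python", "Graph algorithms"]
vectors = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]]  # toy stand-ins for embeddings
print(dedupe_semantic(topics, vectors))
# ['Python lists', 'Graph algorithms']
```

The greedy pass is order-dependent: the first phrasing of a near-duplicate wins, which is usually acceptable for topic lists.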

Generate textbooks

From titles

This takes a file containing a flat JSON list of topics and generates one textbook per topic. The --workers flag controls the number of parallel generations; lower it if you hit rate limits.

Usage example:

python book_generator.py topics.json books.jsonl --workers 5
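To illustrate what a workers setting controls, here is a sketch of bounded parallelism with a thread pool. generate_book is a hypothetical stand-in for one textbook generation, and this is not necessarily how book_generator.py schedules its work. With max_workers=N, at most N generations are in flight at once, so lowering N lowers the request rate against the API.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_all(topics: list, generate_book, workers: int = 5) -> list:
    """Call generate_book(topic) for every topic, with at most `workers` calls in flight.

    Results keep the input order; a topic that raises yields None instead of
    aborting the whole batch.
    """
    def safe(topic):
        try:
            return generate_book(topic)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(safe, topics))


books = generate_all(["Graph theory", "Number theory"], lambda t: f"<book about {t}>", workers=2)
```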

You can also override settings with environment variables instead of using local.env. This example uses a vllm API instead of OpenAI:

LLM_TYPE=llama LLM_INSTRUCT_TYPE=llama LLM_EXTENDED_TYPE=llama OPENAI_KEY="llama" OPENAI_BASE_URL="https://vllm-api.com/v1" python book_generator.py topics.json books.jsonl --workers 10
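If you drive the generator from another Python program, the same overrides can be passed through subprocess instead of the shell. This is a sketch: vllm_env, book_generator_cmd, and run_book_generator are hypothetical helpers, and it assumes it is run from the repo root in the project's environment (e.g. inside poetry shell).

```python
import os
import subprocess
import sys


def vllm_env(model: str, base_url: str, key: str = "llama") -> dict:
    """Copy the current environment and apply the overrides from the command above."""
    env = dict(os.environ)  # keep PATH and other variables the script still needs
    env.update({
        "LLM_TYPE": model,
        "LLM_INSTRUCT_TYPE": model,
        "LLM_EXTENDED_TYPE": model,
        "OPENAI_KEY": key,  # your API key, or a dummy value
        "OPENAI_BASE_URL": base_url,
    })
    return env


def book_generator_cmd(topics: str, out: str, workers: int = 10) -> list:
    """Argument list for book_generator.py, using the current Python interpreter."""
    return [sys.executable, "book_generator.py", topics, out, "--workers", str(workers)]


def run_book_generator(topics: str, out: str, model: str, base_url: str, workers: int = 10) -> None:
    """Run the generator; raises CalledProcessError if the script exits non-zero."""
    subprocess.run(
        book_generator_cmd(topics, out, workers),
        env=vllm_env(model, base_url),
        check=True,
    )


# run_book_generator("topics.json", "books.jsonl", "llama", "https://vllm-api.com/v1")
```

Passing the arguments as a list avoids shell quoting issues, and check=True surfaces failures instead of silently continuing.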

You can see all options by running python book_generator.py --help.

Note that courses are cached by default, so generating a course with the same name a second time will not call the API again. The cache is specific to each model and topic. To bypass the cache, use the --revision option to set a revision number for the courses.
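The README doesn't say how the cache is stored. A hypothetical key built from the model, topic, and revision shows why changing --revision forces a fresh generation: it changes the key, so the old entry is never looked up. cache_key and its default revision of 1 are assumptions for illustration.

```python
import hashlib


def cache_key(model: str, topic: str, revision: int = 1) -> str:
    """Hypothetical cache key: one entry per (model, topic, revision) triple."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    raw = "\x1f".join([model, topic, str(revision)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


same = cache_key("gpt-3.5", "Linear algebra") == cache_key("gpt-3.5", "Linear algebra")
bumped = cache_key("gpt-3.5", "Linear algebra", revision=2)
print(same, bumped != cache_key("gpt-3.5", "Linear algebra"))  # True True
```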

From outlines

You can also generate a book from an existing outline by creating a jsonl file with the following fields:

  • topic - The topic/title of the book
  • outline - The outline of the book, as a flat JSON list. This needs to be in a specific format; see "Clean tables of contents" below.
  • queries - Up to 2 search queries to use for retrieval. If you don't want to use retrieval, set this to an empty list.
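A sketch of producing such a file, one JSON object per line. The outline entries are placeholders because the exact entry format isn't described here; toc_cleaner.py (described below) produces it. An empty queries list disables retrieval for that book. outline_record, write_jsonl, and outlines.jsonl are hypothetical names.

```python
import json


def outline_record(topic: str, outline: list, queries: list) -> str:
    """Serialize one book spec as a single JSONL line with the three required fields."""
    if not isinstance(outline, list):
        raise TypeError("outline must be a flat JSON list")
    if len(queries) > 2:
        raise ValueError("at most 2 search queries are used for retrieval")
    return json.dumps({"topic": topic, "outline": outline, "queries": queries})


def write_jsonl(path: str, lines: list) -> None:
    """Write pre-serialized records, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


# Placeholder outline entries; real ones must follow the cleaned format.
line = outline_record("Graph Theory", ["<outline item 1>", "<outline item 2>"], [])
# write_jsonl("outlines.jsonl", [line])
```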

Clean tables of contents

This takes a JSONL file containing an existing table of contents and title, and processes it into the format needed for book generation.

Usage example:

python toc_cleaner.py toc.jsonl clean_toc.jsonl

toc.jsonl should have the following fields in each line:

  • title - The title of the book
  • toc - A string containing the table of contents. This can be poorly formatted.
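Because toc is a single string, a multi-line table of contents must be escaped to fit on one JSONL line. A minimal sketch of building toc.jsonl records; toc_line and the sample TOC are illustrative.

```python
import json


def toc_line(title: str, toc: str) -> str:
    """One toc.jsonl record: the book title plus its raw table of contents."""
    # json.dumps escapes the newlines inside `toc` as \n, so the record
    # still fits on one line, as JSONL requires.
    return json.dumps({"title": title, "toc": toc})


raw_toc = "1 Introduction\n  1.1 What is a graph\n2 Traversal"
record = toc_line("Graph Theory", raw_toc)
print("\n" in record)  # False
```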

Extending

You can extend the project with new LLM adaptors, retrieval methods, or tasks. PRs are very welcome.

  • LLM adaptors are in app/llm/adaptors
  • Retrieval methods are in app/services/adaptors. You may also need to adjust settings in services/generators/pdf.py
  • Tasks are in app/llm/generators
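Check the existing modules in app/services/adaptors for the real interface before writing an adaptor. As a rough, hypothetical illustration of the pattern (SearchAdaptor, StaticSearch, BACKENDS, and get_backend are invented names, not the repo's classes), a retrieval backend could implement a single search method and be selected by name:

```python
from abc import ABC, abstractmethod


class SearchAdaptor(ABC):
    """Hypothetical interface for a retrieval backend (not the repo's base class)."""

    @abstractmethod
    def search(self, query: str, max_results: int = 5) -> list:
        """Return up to max_results result strings (e.g. URLs) for a query."""


class StaticSearch(SearchAdaptor):
    """Toy backend with canned results, handy for offline testing."""

    def __init__(self, results: dict):
        self.results = results

    def search(self, query: str, max_results: int = 5) -> list:
        return self.results.get(query, [])[:max_results]


# Registry keyed by backend name, in the spirit of the SEARCH_BACKEND setting.
BACKENDS = {"static": StaticSearch}


def get_backend(name: str, **kwargs) -> SearchAdaptor:
    if name not in BACKENDS:
        raise KeyError(f"unknown search backend: {name}")
    return BACKENDS[name](**kwargs)
```

The abstract base class makes a missing search method fail at construction time rather than mid-generation.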

Debugging

By default, many exceptions are hidden to reduce console noise. Set DEBUG=true to display them, like this:

DEBUG=true python book_generator.py python_topics.json books.jsonl --max 5 --workers 5
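The README doesn't show how DEBUG is read internally. A hypothetical sketch of the general pattern, gating full tracebacks behind an environment flag; debug_enabled and format_error are invented names, and treating "1", "true", and "yes" as true is an assumption.

```python
import os
import traceback


def debug_enabled(env=None) -> bool:
    """True when DEBUG is set to a truthy value such as "true"."""
    env = os.environ if env is None else env
    return env.get("DEBUG", "").strip().lower() in {"1", "true", "yes"}


def format_error(exc: BaseException, env=None) -> str:
    """One-line summary normally; the full traceback when DEBUG is enabled."""
    if debug_enabled(env):
        return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return f"{type(exc).__name__}: {exc}"
```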

textbook_quality's People

Contributors

vikparuchuri

textbook_quality's Issues

password authentication failed when run invoke migrate-dev

I'm running the program in WSL2 Ubuntu on Windows 11. I ran these steps:
psql postgres -c "create database textbook;"
git clone https://github.com/VikParuchuri/textbook_quality.git
cd textbook_quality
poetry install
poetry shell

Then the following command errors out:
invoke migrate-dev
I am not very familiar with PostgreSQL. I created a role tluo with access to the textbook database, and I can't tell why there is a problem. Thanks for your help.

Traceback (most recent call last):
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/bin/alembic", line 8, in <module>
sys.exit(main())
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/alembic/config.py", line 630, in main
CommandLine(prog=prog).main(argv=argv)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/alembic/config.py", line 624, in main
self.run_cmd(cfg, options)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/alembic/config.py", line 601, in run_cmd
fn(
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/alembic/command.py", line 399, in upgrade
script.run_env()
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/alembic/script/base.py", line 578, in run_env
util.load_python_file(self.dir, "env.py")
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/alembic/util/pyfiles.py", line 93, in load_python_file
module = load_module_py(module_id, path)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/alembic/util/pyfiles.py", line 109, in load_module_py
spec.loader.exec_module(module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/home/tluo/llm/textbook_quality/alembic/env.py", line 108, in <module>
run_migrations_online()
File "/home/tluo/llm/textbook_quality/alembic/env.py", line 102, in run_migrations_online
asyncio.run(run_async_migrations())
File "/home/tluo/anaconda3/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/tluo/anaconda3/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/home/tluo/llm/textbook_quality/alembic/env.py", line 65, in run_async_migrations
async with connectable.connect() as connection:
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/ext/asyncio/base.py", line 60, in __aenter__
return await self.start(is_ctxmanager=True)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/ext/asyncio/engine.py", line 157, in start
await (greenlet_spawn(self.sync_engine.connect))
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 126, in greenlet_spawn
result = context.throw(*sys.exc_info())
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/future/engine.py", line 406, in connect
return super(Engine, self).connect()
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3315, in connect
return self._connection_cls(self, close_with_result=close_with_result)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
else engine.raw_connection()
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3394, in raw_connection
return self._wrap_pool_connect(self.pool.connect, _connection)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3361, in _wrap_pool_connect
return fn()
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 320, in connect
return _ConnectionFairy._checkout(self)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 884, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 486, in checkout
rec = pool._do_get()
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 256, in _do_get
return self._create_connection()
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 266, in _create_connection
return _ConnectionRecord(self)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 381, in __init__
self.__connect()
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 677, in __connect
with util.safe_reraise():
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
compat.raise_(
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/util/compat.py", line 208, in raise_
raise exception
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 673, in __connect
self.dbapi_connection = connection = pool._invoke_creator(self)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 578, in connect
return dialect.connect(*cargs, **cparams)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 598, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 780, in connect
await_only(self.asyncpg.connect(*arg, **kw)),
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 68, in await_only
return current.driver.switch(awaitable)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 121, in greenlet_spawn
value = await result
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/asyncpg/connection.py", line 2114, in connect
return await connect_utils._connect(
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 982, in _connect
conn = await _connect_addr(
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 817, in _connect_addr
return await __connect_addr(params_retry, timeout, False, *args)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/asyncpg/connect_utils.py", line 866, in __connect_addr
await compat.wait_for(connected, timeout=timeout)
File "/home/tluo/.cache/pypoetry/virtualenvs/textbook-quality-s5WH1yUv-py3.10/lib/python3.10/site-packages/asyncpg/compat.py", line 60, in wait_for
return await asyncio.wait_for(fut, timeout)
File "/home/tluo/anaconda3/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
asyncpg.exceptions.InvalidPasswordError: password authentication failed for user "tluo"

poetry show
aiohttp 3.8.6 Async http client/server framework (asyncio)
aiosignal 1.3.1 aiosignal: a list of registered asynchronous callbacks
alembic 1.12.0 A database migration tool for SQLAlchemy.
anyio 4.0.0 High level compatibility layer for multiple asynchronous event loop implementations
argon2-cffi 23.1.0 Argon2 for Python
argon2-cffi-bindings 21.2.0 Low-level CFFI bindings for Argon2
arrow 1.3.0 Better dates & times for Python
astroid 2.15.8 An abstract syntax tree for Python with inference support.
asttokens 2.4.0 Annotate AST trees with source code positions
async-lru 2.0.4 Simple LRU cache for asyncio
async-timeout 4.0.3 Timeout context manager for asyncio programs
asyncpg 0.28.0 An asyncio PostgreSQL driver
attrs 23.1.0 Classes Without Boilerplate
autoflake 2.2.1 Removes unused imports and unused variables
babel 2.13.0 Internationalization utilities
backcall 0.2.0 Specifications for callback functions passed in to an API
beautifulsoup4 4.12.2 Screen-scraping library
black 23.9.1 The uncompromising code formatter.
bleach 6.1.0 An easy safelist-based HTML-sanitizing tool.
certifi 2023.7.22 Python package for providing Mozilla's CA Bundle.
cffi 1.16.0 Foreign Function Interface for Python calling C code.
charset-normalizer 3.3.0 The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
click 8.1.7 Composable command line interface toolkit
cmake 3.27.6 CMake is an open-source, cross-platform family of tools designed to build, test and package software
comm 0.1.4 Jupyter Python Comm implementation, for usage in ipykernel, xeus-python etc.
datasets 2.14.5 HuggingFace community-driven open-source library of datasets
debugpy 1.8.0 An implementation of the Debug Adapter Protocol for Python
decorator 5.1.1 Decorators for Humans
defusedxml 0.7.1 XML bomb protection for Python stdlib modules
dill 0.3.7 serialize all of Python
exceptiongroup 1.1.3 Backport of PEP 654 (exception groups)
executing 2.0.0 Get the currently executing AST node of a frame, and other information
fastjsonschema 2.18.1 Fastest Python implementation of JSON schema
filelock 3.12.4 A platform independent file lock.
fqdn 1.5.1 Validates fully-qualified domain names against RFC 1123, so that they are acceptable to modern bowsers
frozenlist 1.4.0 A list-like structure which implements collections.abc.MutableSequence
fsspec 2023.6.0 File-system specification
ftfy 6.1.1 Fixes mojibake and other problems with Unicode, after the fact
greenlet 2.0.2 Lightweight in-process concurrent programming
huggingface-hub 0.17.3 Client library to download and publish models, datasets and other repos on the huggingface.co hub
idna 3.4 Internationalized Domain Names in Applications (IDNA)
invoke 2.2.0 Pythonic task execution
ipykernel 6.25.2 IPython Kernel for Jupyter
ipython 8.16.1 IPython: Productive Interactive Computing
ipython-genutils 0.2.0 Vestigial utilities from IPython
ipywidgets 8.1.1 Jupyter interactive widgets
isoduration 20.11.0 Operations with ISO 8601 durations
isort 5.12.0 A Python utility / library to sort Python imports.
jedi 0.19.1 An autocompletion tool for Python that can be used for text editors.
jinja2 3.1.2 A very fast and expressive template engine.
joblib 1.3.2 Lightweight pipelining with Python functions
json5 0.9.14 A Python implementation of the JSON5 data format.
jsonpointer 2.4 Identify specific nodes in a JSON document (RFC 6901)
jsonschema 4.19.1 An implementation of JSON Schema validation for Python
jsonschema-specifications 2023.7.1 The JSON Schema meta-schemas and vocabularies, exposed as a Registry
jupyter 1.0.0 Jupyter metapackage. Install all the Jupyter components in one go.
jupyter-client 8.3.1 Jupyter protocol implementation and client libraries
jupyter-console 6.6.3 Jupyter terminal console
jupyter-core 5.3.2 Jupyter core package. A base package on which Jupyter projects rely.
jupyter-events 0.7.0 Jupyter Event System library
jupyter-lsp 2.2.0 Multi-Language Server WebSocket proxy for Jupyter Notebook/Lab server
jupyter-server 2.7.3 The backend—i.e. core services, APIs, and REST endpoints—to Jupyter web applications.
jupyter-server-terminals 0.4.4 A Jupyter Server Extension Providing Terminals.
jupyterlab 4.0.6 JupyterLab computational environment
jupyterlab-pygments 0.2.2 Pygments theme using JupyterLab CSS variables
jupyterlab-server 2.25.0 A set of server components for JupyterLab and JupyterLab like applications.
jupyterlab-widgets 3.0.9 Jupyter interactive widgets for JupyterLab
lazy-object-proxy 1.9.0 A fast and thorough lazy object proxy.
lit 17.0.2 A Software Testing Tool
mako 1.2.4 A super-fast templating language that borrows the best ideas from the existing templating languages.
markdown 3.5 Python implementation of John Gruber's Markdown.
markupsafe 2.1.3 Safely add untrusted strings to HTML/XML markup.
matplotlib-inline 0.1.6 Inline Matplotlib backend for Jupyter
mccabe 0.7.0 McCabe checker, plugin for flake8
mistune 3.0.2 A sane and fast Markdown parser with useful plugins and renderers
mpmath 1.3.0 Python library for arbitrary-precision floating-point arithmetic
msgpack 1.0.7 MessagePack serializer
multidict 6.0.4 multidict implementation
multiprocess 0.70.15 better multiprocessing and multithreading in Python
mypy-extensions 1.0.0 Type system extensions for programs checked with the mypy type checker.
nbclient 0.8.0 A client library for executing notebooks. Formerly nbconvert's ExecutePreprocessor.
nbconvert 7.9.2 Converting Jupyter Notebooks
nbformat 5.9.2 The Jupyter Notebook format
nest-asyncio 1.5.8 Patch asyncio to allow nested event loops
networkx 3.1 Python package for creating and manipulating graphs and networks
nltk 3.8.1 Natural Language Toolkit
notebook 7.0.4 Jupyter Notebook - A web-based notebook environment for interactive computing
notebook-shim 0.2.3 A shim layer for notebook traits and config
numpy 1.25.2 Fundamental package for array computing in Python
nvidia-cublas-cu11 11.10.3.66 CUBLAS native runtime libraries
nvidia-cuda-cupti-cu11 11.7.101 CUDA profiling tools runtime libs.
nvidia-cuda-nvrtc-cu11 11.7.99 NVRTC native runtime libraries
nvidia-cuda-runtime-cu11 11.7.99 CUDA Runtime native Libraries
nvidia-cudnn-cu11 8.5.0.96 cuDNN runtime libraries
nvidia-cufft-cu11 10.9.0.58 CUFFT native runtime libraries
nvidia-curand-cu11 10.2.10.91 CURAND native runtime libraries
nvidia-cusolver-cu11 11.4.0.1 CUDA solver native runtime libraries
nvidia-cusparse-cu11 11.7.4.91 CUSPARSE native runtime libraries
nvidia-nccl-cu11 2.14.3 NVIDIA Collective Communication Library (NCCL) Runtime
nvidia-nvtx-cu11 11.7.91 NVIDIA Tools Extension
openai 0.28.1 Python client library for the OpenAI API
overrides 7.4.0 A decorator to automatically detect mismatch when overriding a method.
packaging 23.2 Core utilities for Python packages
pandas 2.1.1 Powerful data structures for data analysis, time series, and statistics
pandocfilters 1.5.0 Utilities for writing pandoc filters in python
parso 0.8.3 A Python Parser
pathspec 0.11.2 Utility library for gitignore style pattern matching of file paths.
pexpect 4.8.0 Pexpect allows easy control of interactive console applications.
pickleshare 0.7.5 Tiny 'shelve'-like database with concurrency support
pillow 10.0.1 Python Imaging Library (Fork)
platformdirs 3.11.0 A small Python package for determining appropriate platform-specific dirs, e.g. a "user data dir".
prometheus-client 0.17.1 Python client for the Prometheus monitoring system.
prompt-toolkit 3.0.39 Library for building powerful interactive command lines in Python
protobuf 4.24.4
psutil 5.9.5 Cross-platform lib for process and system monitoring in Python.
psycopg2 2.9.9 psycopg2 - Python-PostgreSQL Database Adapter
ptyprocess 0.7.0 Run a subprocess in a pseudo terminal
pure-eval 0.2.2 Safely evaluate AST nodes without side effects
pyarrow 13.0.0 Python library for Apache Arrow
pycparser 2.21 C parser in Python
pydantic 1.10.13 Data validation and settings management using python type hints
pyflakes 3.1.0 passive checker of Python programs
pygments 2.16.1 Pygments is a syntax highlighting package written in Python.
pylint 2.17.7 python code static checker
pymdown-extensions 10.3 Extension pack for Python Markdown.
pymupdf 1.23.4 A high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
pymupdfb 1.23.3 MuPDF shared libraries for PyMuPDF.
pyperclip 1.8.2 A cross-platform clipboard module for Python. (Only handles plain text for now.)
python-dateutil 2.8.2 Extensions to the standard Python datetime module
python-dotenv 1.0.0 Read key-value pairs from a .env file and set them as environment variables
python-json-logger 2.0.7 A python library adding a json log formatter
pytz 2023.3.post1 World timezone definitions, modern and historical
pyyaml 6.0.1 YAML parser and emitter for Python
pyzmq 25.1.1 Python bindings for 0MQ
qtconsole 5.4.4 Jupyter Qt console
qtpy 2.4.0 Provides an abstraction layer on top of the various Qt bindings (PyQt5/6 and PySide2/6).
ray 2.7.0 Ray provides a simple, universal API for building distributed applications.
referencing 0.30.2 JSON Referencing + Python
regex 2023.10.3 Alternative regular expression module, to replace re.
requests 2.31.0 Python HTTP for Humans.
rfc3339-validator 0.1.4 A pure python RFC3339 validator
rfc3986-validator 0.1.1 Pure python rfc3986 validator
rpds-py 0.10.4 Python bindings to Rust's persistent data structures (rpds)
safetensors 0.4.0
scikit-learn 1.3.1 A set of python modules for machine learning and data mining
scipy 1.9.3 Fundamental algorithms for scientific computing in Python
send2trash 1.8.2 Send file to trash natively under Mac OS X, Windows and Linux
sentence-transformers 2.2.2 Multilingual text embeddings
sentencepiece 0.1.99 SentencePiece python wrapper
setuptools 68.2.2 Easily download, build, install, upgrade, and uninstall Python packages
six 1.16.0 Python 2 and 3 compatibility utilities
sniffio 1.3.0 Sniff out which async library your code is running under
soupsieve 2.5 A modern CSS selector implementation for Beautiful Soup.
sqlalchemy 1.4.41 Database Abstraction Library
sqlalchemy2-stubs 0.0.2a35 Typing Stubs for SQLAlchemy 1.4
sqlmodel 0.0.8 SQLModel, SQL databases in Python, designed for simplicity, compatibility, and robustness.
stack-data 0.6.3 Extract data from python stack frames and tracebacks for informative displays
stopit 1.1.2 Timeout control decorator and context managers, raise any exception in another thread
sympy 1.12 Computer algebra system (CAS) in Python
tenacity 8.2.3 Retry code until it succeeds
terminado 0.17.1 Tornado websocket backend for the Xterm.js Javascript terminal emulator library.
threadpoolctl 3.2.0 threadpoolctl
tiktoken 0.5.1 tiktoken is a fast BPE tokeniser for use with OpenAI's models
tinycss2 1.2.1 A tiny CSS parser
tokenizers 0.14.1
tomli 2.0.1 A lil' TOML parser
tomlkit 0.12.1 Style preserving TOML library
torch 2.0.0 Tensors and Dynamic neural networks in Python with strong GPU acceleration
torchvision 0.15.1 image and video datasets and models for torch deep learning
tornado 6.3.3 Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
tqdm 4.66.1 Fast, Extensible Progress Meter
traitlets 5.11.2 Traitlets Python configuration system
transformers 4.34.0 State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
triton 2.0.0 A language and compiler for custom Deep Learning operations
types-python-dateutil 2.8.19.14 Typing stubs for python-dateutil
typing-extensions 4.8.0 Backported and Experimental Type Hints for Python 3.8+
tzdata 2023.3 Provider of IANA time zone data
uri-template 1.3.0 RFC 6570 URI Template Processor
urllib3 2.0.6 HTTP library with thread-safe connection pooling, file post, and more.
wcwidth 0.2.8 Measures the displayed width of unicode strings in a terminal
webcolors 1.13 A library for working with the color formats defined by HTML and CSS.
webencodings 0.5.1 Character encoding aliases for legacy web content
websocket-client 1.6.4 WebSocket client for Python with low level API options
wheel 0.41.2 A built-package format for Python
widgetsnbextension 4.0.9 Jupyter interactive widgets for Jupyter Notebook
wrapt 1.15.0 Module for decorators, wrappers and monkey patching.
xxhash 3.4.1 Python binding for xxHash
yarl 1.9.2 Yet another URL library

Error generating titles Extra data: line 1 column 2 (char 1)


-MacBook-Air textbook-gen % python topic_generator.py "automotive repair for mechanics" automotive_repair.json --iterations 100
0%| | 0/100 [00:00<?, ?it/s]Error generating titles Extra data: line 1 column 2 (char 1)
1%| | 1/100 [00:03<06:00, 3.64s/Error generating titles Extra data: line 1 column 2 (char 1)
2%| | 2/100 [00:05<03:59, 2.45s/ccCCCCError generating titles Extra data: line 1 column 2 (char 1)
3%| | 3/100 [00:08<04:54, 3.04s/xxxError generating titles Extra data: line 1 column 2 (char 1)
4%| | 4/100 [00:11<04:20, 2.72s/ 4%| | 4/100 [00:12<04:56, 3.08s/
Traceback (most recent call last):
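The message comes from Python's json module: it parsed a complete JSON value at the start of the text and then found more characters. One input that produces exactly this message is a numbered list such as "1. Brake systems", where "1" parses as a JSON number. Whether that is what the model returned in this case is a guess. The parser below is a hypothetical tolerant fallback, not the project's fix; parse_title_list is an invented name.

```python
import json
import re


def parse_title_list(text: str) -> list:
    """Hypothetical tolerant parser for a list of titles returned by a model."""
    # 1) Happy path: the whole response is a JSON list.
    try:
        data = json.loads(text)
        if isinstance(data, list):
            return [str(item) for item in data]
    except json.JSONDecodeError:
        pass
    # 2) The list is wrapped in prose: try the outermost [...] span.
    start, end = text.find("["), text.rfind("]")
    if 0 <= start < end:
        try:
            data = json.loads(text[start:end + 1])
            if isinstance(data, list):
                return [str(item) for item in data]
        except json.JSONDecodeError:
            pass
    # 3) Last resort: one title per non-empty line, minus "1." / "1)" / "-" / "*" markers.
    return [
        re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        for line in text.splitlines()
        if line.strip()
    ]


try:
    json.loads("1. Brake systems")
except json.JSONDecodeError as err:
    print(err)  # Extra data: line 1 column 2 (char 1)

print(parse_title_list("1. Brake systems\n2. Engine diagnostics"))
# ['Brake systems', 'Engine diagnostics']
```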

Change Prerequisites - Python 3.9+ to Python 3.10+

The match-case statement was introduced in Python 3.10.
When I run the code with Python 3.9.13, I get this error:
File "C:\Users\omkar\Desktop\textbook_quality\app\llm\llm.py", line 74
match model:
^
SyntaxError: invalid syntax

I also hit another error:
created: datetime | None = Field(
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
This is because the datetime | None syntax is only supported in Python 3.10 or later.
One possible fix:
from typing import Optional
name: Optional[datetime] = None
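A version check can replace the confusing SyntaxError with a clear message, but only if it runs before any module containing 3.10-only syntax is imported: Python compiles each module when it is imported, so a guard inside such a module never runs. A sketch, assuming the guard sits at the top of an entry script that itself uses only 3.9-compatible syntax; check_python and MIN_VERSION are hypothetical names.

```python
import sys
from datetime import datetime
from typing import Optional

MIN_VERSION = (3, 10)  # match statements and X | None annotations need 3.10+


def check_python(version=None) -> None:
    """Raise a readable error instead of a SyntaxError deep inside an import."""
    version = tuple(version or sys.version_info)[:2]
    if version < MIN_VERSION:
        raise RuntimeError(
            f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required, "
            f"found {version[0]}.{version[1]}"
        )


# The 3.9-compatible spelling of `created: datetime | None`:
created: Optional[datetime] = None

# In an entry script: call check_python() first, then import the app modules.
```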
