

otvj776 commented on June 29, 2024

"""
Traceback (most recent call last):
  File "D:\code\LoRA_LLM\env\lib\site-packages\multiprocess\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "D:\code\LoRA_LLM\env\lib\site-packages\datasets\utils\py_utils.py", line 1328, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "D:\code\LoRA_LLM\env\lib\site-packages\datasets\arrow_dataset.py", line 3463, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "D:\code\LoRA_LLM\env\lib\site-packages\datasets\arrow_dataset.py", line 3344, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "C:\Users\Administrator\AppData\Local\Temp\ipykernel_25996\2492540495.py", line 4, in preprocess
NameError: name 'cfg' is not defined
"""

The above exception was the direct cause of the following exception:

NameError Traceback (most recent call last)
Cell In[15], line 1
----> 1 ds_train = ds_train_raw.map(
2 preprocess,
3 batched=True,
4 num_proc=4,
5 remove_columns=ds_train_raw.column_names
6 )
8 ds_val = ds_val_raw.map(
9 preprocess,
10 batched=True,
11 num_proc=4,
12 remove_columns=ds_val_raw.column_names
13 )

File D:\code\LoRA_LLM\env\lib\site-packages\datasets\arrow_dataset.py:580, in transmit_tasks.<locals>.wrapper(*args, **kwargs)
578 self: "Dataset" = kwargs.pop("self")
579 # apply actual function
--> 580 out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
581 datasets: List["Dataset"] = list(out.values()) if isinstance(out, dict) else [out]
582 for dataset in datasets:
583 # Remove task templates if a column mapping of the template is no longer valid

File D:\code\LoRA_LLM\env\lib\site-packages\datasets\arrow_dataset.py:545, in transmit_format.<locals>.wrapper(*args, **kwargs)
538 self_format = {
539 "type": self._format_type,
540 "format_kwargs": self._format_kwargs,
541 "columns": self._format_columns,
542 "output_all_columns": self._output_all_columns,
543 }
544 # apply actual function
--> 545 out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
546 datasets: List["Dataset"] = list(out.values()) if isinstance(out, dict) else [out]
547 # re-apply format to the output

File D:\code\LoRA_LLM\env\lib\site-packages\datasets\arrow_dataset.py:3180, in Dataset.map(self, function, with_indices, with_rank, input_columns, batched, batch_size, drop_last_batch, remove_columns, keep_in_memory, load_from_cache_file, cache_file_name, writer_batch_size, features, disable_nullable, fn_kwargs, num_proc, suffix_template, new_fingerprint, desc)
3172 logger.info(f"Spawning {num_proc} processes")
3173 with logging.tqdm(
3174 disable=not logging.is_progress_bar_enabled(),
3175 unit=" examples",
(...)
3178 desc=(desc or "Map") + f" (num_proc={num_proc})",
3179 ) as pbar:
-> 3180 for rank, done, content in iflatmap_unordered(
3181 pool, Dataset._map_single, kwargs_iterable=kwargs_per_job
3182 ):
3183 if done:
3184 shards_done += 1

File D:\code\LoRA_LLM\env\lib\site-packages\datasets\utils\py_utils.py:1354, in iflatmap_unordered(pool, func, kwargs_iterable)
1351 break
1352 finally:
1353 # we get the result in case there's an error to raise
-> 1354 [async_result.get(timeout=0.05) for async_result in async_results]

File D:\code\LoRA_LLM\env\lib\site-packages\datasets\utils\py_utils.py:1354, in <listcomp>(.0)
1351 break
1352 finally:
1353 # we get the result in case there's an error to raise
-> 1354 [async_result.get(timeout=0.05) for async_result in async_results]

File D:\code\LoRA_LLM\env\lib\site-packages\multiprocess\pool.py:774, in ApplyResult.get(self, timeout)
772 return self._value
773 else:
--> 774 raise self._value

NameError: name 'cfg' is not defined

from torchkeras.

otvj776 commented on June 29, 2024

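Pass the global parameters into the function as explicit arguments:

from functools import partial

def preprocess(cfg, tokenizer, examples):
    ...  # same body as before, now using the cfg and tokenizer arguments

new_preprocess = partial(preprocess, cfg, tokenizer)

Then replace preprocess with new_preprocess in the map() calls.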
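A minimal, self-contained sketch of the partial approach, runnable without the datasets library. The cfg dict, its max_len key, the toy whitespace tokenizer, and the "text" column are hypothetical stand-ins for the notebook's real objects; only the preprocess/new_preprocess structure comes from the comment above. The bound callable takes a single batch argument, which is how Dataset.map(..., batched=True) calls its function, and cfg and tokenizer now travel with it as arguments instead of being looked up as notebook globals.

```python
from functools import partial

# Hypothetical stand-ins for the notebook's `cfg` and `tokenizer` objects;
# in the real notebook these come from the training config and a Hugging Face tokenizer.
cfg = {"max_len": 8}


def tokenizer(texts):
    # Toy whitespace "tokenizer" returning a dict of token lists, for illustration only.
    return {"input_ids": [t.split() for t in texts]}


def preprocess(cfg, tokenizer, examples):
    # cfg and tokenizer arrive as arguments instead of being read from globals,
    # so the function no longer depends on names defined in the notebook session.
    encoded = tokenizer(examples["text"])
    encoded["input_ids"] = [ids[: cfg["max_len"]] for ids in encoded["input_ids"]]
    return encoded


# Bind the first two positional arguments; the result takes only `examples`,
# matching the single-batch call made by a batched map.
new_preprocess = partial(preprocess, cfg, tokenizer)

batch = {"text": ["a b c", "one two three four five six seven eight nine ten"]}
print(new_preprocess(batch))
```

In the notebook, the bound function would then be used as ds_train_raw.map(new_preprocess, batched=True, num_proc=4, remove_columns=ds_train_raw.column_names), and likewise for ds_val_raw.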

cgnerds commented on June 29, 2024

Just change this to num_proc=1 in the map() calls.
