Hi, I hope that all the provided information meets your criteria.</p

The thread tried to read from or write to a virtual address for which it does not have the appropriate access. about nuitka HOT 8 OPEN

Gition commented on September 26, 2024

The thread tried to read from or write to a virtual address for which it does not have the appropriate access.

from nuitka.

Comments (8)

kayhayen commented on September 26, 2024

Did you also try without these?

--nofollow-import-to=tensorflow --nofollow-import-to=keras --noinclude-default-mode=warning --noinclude-numba-mode=nofollow

It's not a given that code handling their absence is not your issue.

from nuitka.

kayhayen commented on September 26, 2024

And yes, very much better thank you. I am putting needs_example tag only because I asked a question, and that's how I know that I am waiting for a reply.

from nuitka.

Gition commented on September 26, 2024

Certainly, I attempted it without those and the error was the same.

I made an other test whit this reduced script:

#Version to test Nuitka
import os,sys
from pathlib import Path
import torch
from tqdm import tqdm
import random


"""
import transformers
from torch.utils import tensorboard
from torch.utils.tensorboard import SummaryWriter
"""

def get_config():
    return {   
        "root": os.path.dirname(os.path.realpath(__name__)),  
        "file_name_source": "postgres.txt",        
        "folder_source": "source_ini",      
        "folder_source_tokenizer": "source_tokenizer",   
        "folder_tokenizer": "pretrained_tokenizer", 
        "folder_tensors": "pretrained_tensors",         
        "batch_size": 32,
        "num_epochs": 4,
        "lr": 1e-5,
        "seq_len": 350,
        "max_len": 512,
        "model_folder": "MODAI",
        "model_pretrained": "pretrained_model",
        "model_trained": "trained_model",              
        "logging_dir": "logging_dir",        
        "model_basename": "tmodel_",
        "preload": None,
        "tokenizer_file": "tokenizer_{0}.json",
        "experiment_name": "runs/tmodel"
    }

def get_weights_file_path(config, epoch: str):
    model_folder = config["model_folder"]
    model_basename = config["model_basename"]
    model_filename = f"{model_basename}{epoch}.pt"
    return str(Path('.') / model_folder / model_filename)
    
def get_model_folder_base_path(config):
    model_root = config["root"]
    model_folder = config["model_folder"]    
    return str(Path('.') / model_root / model_folder)    

def get_model_pretrained_tensors_path(config):
    model_root = config["root"]
    model_folder = config["folder_tokenizer"]  
    model_basename = config["folder_tensors"]     
    return str(Path('.') / model_root / model_folder / model_basename ) 
    
#Folder for pretrained model    
def get_model_folder_base_pretrained_path(config):
    model_root = config["root"]
    model_folder = config["model_folder"]  
    model_basename = config["model_pretrained"]   
    folder=str(Path('.') / model_root / model_folder / model_basename ) 
    print(f"Root folder: {model_root}")
    return str(Path('.') / model_root / model_folder / model_basename )  

#Folder for trained model    
def get_model_folder_base_trained_path(config):
    model_root = config["root"]
    model_folder = config["model_folder"]  
    model_basename = config["model_trained"]     
    return str(Path('.') / model_root / model_folder / model_basename )  

#Folder for logging    
def get_model_folder_base_logging_path(config):
    model_root = config["root"]
    model_folder = config["model_folder"]  
    model_basename = config["model_trained"]  
    model_logging = config["logging_dir"]     
    return str(Path('.') / model_root / model_folder / model_basename / model_logging )      

##Folder with text file training source
def get_folder_source_path(config):
    model_root = config["root"]
    model_basename = config["folder_source"]
    return str(Path('.') / model_root / model_basename) 

##Text file training source
def get_file_source_path(config):
    model_root = config["root"]
    model_basename = config["folder_source"]
    file_basename = config["file_name_source"]
    return str(Path('.') / model_root / model_basename / file_basename)
    
##le répertoire source contenant les fichiers source pour faire le training de tokenizer    
def get_folder_source_tokenizer_path(config):
    model_root = config["root"]
    model_basename = config["folder_source_tokenizer"]
    return str(Path('.') / model_root / model_basename) 

##le repertoire source du tokenizer
def get_tokenizer_folder_path(config):
    model_root = config["root"]
    model_basename = config["folder_tokenizer"]
    return str(Path('.') / model_root / model_basename) 
    
##le fichier json du tokenizer (vocabuaire)    
def get_tokenizer_file_path(config,version):
    model_root = config["root"]
    model_basename = config["folder_tokenizer"]
    file_basename = config["tokenizer_file"].format(version)
    return str(Path('.') / model_root / model_basename / f'{file_basename}')


def mlm(tensor,seed):
    # create random array of floats with equal dims to tensor
    torch.manual_seed(seed)
    rand = torch.rand(tensor.shape)
    # mask random 15% where token is not 0 <s>, 1 <unk>,2 <s/> or 4 <pad>
    mask_arr = (rand < .15) * (tensor != 0) * (tensor != 1) * (tensor != 2) * (tensor != 4)
    # loop through each row in tensor (cannot do in parallel)
    for i in range(tensor.shape[0]):
        # get indices of mask positions from mask array
        selection = torch.flatten(mask_arr[i].nonzero()).tolist()
        # mask tensor
        tensor[i, selection] = 3 #mask id
    return tensor


class Dataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        # store encodings internally
        self.encodings = encodings

    def __len__(self):
        # return the number of samples
        return self.encodings['input_ids'].shape[0]

    def __getitem__(self, i):
        # return dictionary of input_ids, attention_mask, and labels for index i
        return {key: tensor[i] for key, tensor in self.encodings.items()}
	
	
def main():
    
	config=get_config()
	SEED=42#By default
	torch.manual_seed(SEED)


	"""
	Pretrain model ...
	"""

	# initialize the tokenizer using the tokenizer we initialized and saved to file
	##Folder with trained tokenizers
	tokenizer_pretrained_folder = get_tokenizer_folder_path(config)
	##Folder with splited source file
	tokenizer_source_folder=get_folder_source_tokenizer_path(config)
	print(f"tokenizer_source_folder: {tokenizer_source_folder}")
	max_len = config["max_len"]



	#set folder base model
	model_base_name=get_model_folder_base_path(config)


	path_pretrained_model=get_model_folder_base_pretrained_path(config)
	#print("path_pretrained_model: ",path_pretrained_model)

	if os.path.exists(path_pretrained_model):
		print(f'The folder model [ {path_pretrained_model} ] already exists!')
		sys.exit()


	#Source of file text for training
	paths = [str(x) for x in Path(tokenizer_source_folder).glob('**/*.txt')]
	# initialize lists of tensors
	input_ids = []
	mask = []
	labels = []
	
	# open all files and show paths
	for path in tqdm(paths[:200]):
		# :50
		# open the file and split into list by newline characters
		with open(path, 'r', encoding='utf-8') as fp:
			lines = fp.read().split('\n')
		print (path)
		
	   
	print ("Done")



if __name__ == "__main__":
    main()

and theses conditions:

py -m nuitka --standalone --noinclude-default-mode=warning --noinclude-numba-mode=nofollow --module-parameter=numba-disable-jit=yes --assume-yes-for-downloads --module-parameter=torch-disable-jit=yes --include-data-dir=../Nuitka_test/pretrained_tokenizer=pretrained_tokenizer --include-data-dir=../Nuitka_test/source_tokenizer=source_tokenizer MyScriptTest.py

and the executable works.

In my view, when utilizing the transformers module, the program attempts to write temporary files to a restricted area of the hard disk, resulting in an error due to the lack of administrative privileges on my end.

from nuitka.

Gition commented on September 26, 2024

Meanwhile, I conducted two additional tests.

In the first test, I copied the distribution generated by the second script (the version labeled #Version to test Nuitka) onto a machine where I have administrator privileges, and the executable worked as expected.

In the second test, I transferred the distribution generated by the main script (with the error discussed in this chat) to the same machine. The executable ran without any error messages, but there were no observable actions—no intermediate messages, no final message indicating "Done," and no regeneration of tensor files.

from nuitka.

kayhayen commented on September 26, 2024

Without a reproducer, there is not a lot to see I am afraid. Probably the model has dependencies of some sorts, but I cannot tell that without seeing it.

from nuitka.

Gition commented on September 26, 2024

I intend to generate the executable on a machine where I have administrator privileges. Thank you for your assistance and dedication to developing this tool. It's a remarkable work!
As a final test, I could provide a sample input text file and attempt to generate tensor files. Here is the transformer module which perhaps utilize some sophisticated dependencies.

from nuitka.

Gition commented on September 26, 2024

This is a configuration of a Python-compatible model aimed at replicating the error encountered during the execution of the compiled script.
Following each run, the "model_results" folder needs to be cleared.
PretrainTestNuitka2.zip

from nuitka.

kayhayen commented on September 26, 2024

Ok, once I got my "untrusted" Azure VM setup, I will give that a shot.

from nuitka.

The thread tried to read from or write to a virtual address for which it does not have the appropriate access. about nuitka HOT 8 OPEN

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent