Comments (11)
And I also confirmed that I've used these models with the correct settings template.
from llmfarm.
Well, if you run LLMFarm from Xcode, you should be able to see where the error occurs. I would be very grateful if you could send me a more detailed description of the errors.
Well, although some crashes don't show errors, I tried to capture some logs while it was crashing:
Example:
tinyllama-1.1b: Metal is on / MLock is off / Mmap is on
AI init
llama_model_loader: loaded meta data with 23 key-value pairs and 201 tensors from /var/mobile/Containers/Data/Application/03B8B998-F4AA-4BF3-9E0E-A82E061A1CC1/Documents/models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = tinyllama_tinyllama-1.1b-chat-v1.0
llama_model_loader: - kv 2: llama.context_length u32 = 2048
llama_model_loader: - kv 3: llama.embedding_length u32 = 2048
llama_model_loader: - kv 4: llama.block_count u32 = 22
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 5632
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 4
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 7
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,61249] = ["▁ t", "e r", "i n", "▁ a", "e n...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.chat_template str = {% for message in messages %}\n{% if m...
llama_model_loader: - kv 22: general.quantization_version u32 = 2
llama_model_loader: - type f32: 45 tensors
llama_model_loader: - type q8_0: 156 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_layer = 22
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_embd_head_k = 64
llm_load_print_meta: n_embd_head_v = 64
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 256
llm_load_print_meta: n_embd_v_gqa = 256
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 5632
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 1B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 1.10 B
llm_load_print_meta: model size = 1.09 GiB (8.50 BPW)
llm_load_print_meta: general.name = tinyllama_tinyllama-1.1b-chat-v1.0
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '</s>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.15 MiB
ggml_backend_metal_buffer_from_ptr: allocated buffer, size = 1114.92 MiB, ( 1115.00 / 5461.34)
llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 23/23 layers to GPU
llm_load_tensors: CPU buffer size = 66.41 MiB
llm_load_tensors: Metal buffer size = 1114.92 MiB
llama_new_context_with_model: n_ctx = 1024
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A17 Pro GPU
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/var/containers/Bundle/Application/13501DF2-09F1-454E-B12B-62CE7A418F5D/LLMFarm.app/llmfarm_core_llmfarm_core_cpp.bundle/metal/ggml-metal.metal'
ggml_metal_init: GPU name: Apple A17 Pro GPU
ggml_metal_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction support = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 5726.63 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 22.00 MiB, ( 1139.88 / 5461.34)
llama_kv_cache_init: Metal KV buffer size = 22.00 MiB
llama_new_context_with_model: KV self size = 22.00 MiB, K (f16): 11.00 MiB, V (f16): 11.00 MiB
llama_new_context_with_model: CPU input buffer size = 6.01 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 82.02 MiB, ( 1221.89 / 5461.34)
llama_new_context_with_model: Metal compute buffer size = 82.00 MiB
llama_new_context_with_model: CPU compute buffer size = 4.00 MiB
llama_new_context_with_model: graph splits (measure): 3
%s: seed = %d
0
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
Logits inited.
ModelSampleParams(n_batch: 512, temp: 0.9, top_k: 40, top_p: 0.95, tfs_z: 1.0, typical_p: 1.0, repeat_penalty: 1.1, repeat_last_n: 64, frequence_penalty: 0.0, presence_penalty: 0.0, mirostat: 0, mirostat_tau: 5.0, mirostat_eta: 5.0, penalize_nl: true)
ModelAndContextParams(model_inference: llmfarm_core.ModelInference.LLama_gguf, context: 1024, parts: -1, seed: 4294967295, n_threads: 6, lora_adapters: [], promptFormat: llmfarm_core.ModelPromptStyle.Custom, custom_prompt_format: "<|user|>{prompt}\n<|assistant|>", system_prompt: "You are a story writing assistant.", f16Kv: true, logitsAll: false, vocabOnly: false, useMlock: false, useMMap: true, embedding: false, processorsConunt: 6, use_metal: true, grammar_path: nil, add_bos_token: false, add_eos_token: false, parse_special_tokens: false, warm_prompt: "\n\n\n", reverse_prompt: [], clip_model: nil)
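As a sanity check, the KV cache size in the log above follows from the other metadata: with n_ctx = 1024, n_layer = 22, n_embd_k_gqa = n_embd_v_gqa = 256, and f16 (2-byte) cache entries, each of K and V needs 1024 × 22 × 256 × 2 bytes = 11 MiB, matching the reported 22 MiB total. A minimal sketch of that arithmetic, using only values taken from the log:

```python
# Recompute the KV-cache size printed by llama_new_context_with_model,
# from values in the load log above (not LLMFarm's actual code).
n_ctx = 1024         # context length set in LLMFarm
n_layer = 22         # llama.block_count
n_head = 32          # llama.attention.head_count
n_head_kv = 4        # llama.attention.head_count_kv (grouped-query attention)
n_embd_head = 64     # per-head embedding size (n_embd_head_k)
bytes_f16 = 2        # f16Kv: true -> 2 bytes per cache element

n_gqa = n_head // n_head_kv                 # 8, matches n_gqa in the log
n_embd_kv = n_embd_head * n_head_kv         # 256, matches n_embd_k_gqa
k_bytes = n_ctx * n_layer * n_embd_kv * bytes_f16
mib = k_bytes / (1024 ** 2)
print(f"K: {mib:.2f} MiB, V: {mib:.2f} MiB, KV self size: {2 * mib:.2f} MiB")
# Log reports: K (f16): 11.00 MiB, V (f16): 11.00 MiB, KV self size = 22.00 MiB
```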
And here is an example of the incorrect output:
Orca-mini-3b on iPhone 15 Pro Max Simulator
Is this incorrect output in version 1.0.1 or earlier?
Output of versions before 1.0.1 may be very different due to changes in llama.cpp.
Metal does not work in the simulator, and since version 1.0.0 it is disabled there.
The incorrect output is in earlier versions (0.9.0 and 1.0.0). I will try it out with 1.0.1. Thanks for the information.
Hi @guinmoon ,
I've updated to version 1.0.1, but I'm still getting the incorrect response. Could you please help me understand why, or suggest anything I need to fine-tune? Thank you.
iPhone 15 Pro Max Simulator:
Phi2
I use this template:
[System](You are a helpful, respectful and honest assistant. Always answer as helpfully as possible.)
Instruct: {prompt}
Output:
Thanks for the template. I will try it out.
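For reference, the custom_prompt_format field visible in the params dump earlier ("<|user|>{prompt}\n<|assistant|>") and the template above both work the same way: the user's text is substituted into a {prompt} placeholder. A minimal sketch, assuming simple string substitution (the apply_template helper is hypothetical, not LLMFarm's actual code; the template strings are the ones from this thread):

```python
# Hypothetical illustration of how a LLMFarm-style prompt template is applied.
def apply_template(template: str, prompt: str) -> str:
    """Substitute the user's text into the {prompt} placeholder."""
    return template.replace("{prompt}", prompt)

# TinyLlama template from the params dump above.
tinyllama_template = "<|user|>{prompt}\n<|assistant|>"

# Phi-2 template suggested in this thread.
phi2_template = (
    "[System](You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible.)\n"
    "Instruct: {prompt}\nOutput:"
)

print(apply_template(phi2_template, "Write a short story."))
```

Using the wrong template for a model (e.g. a TinyLlama chat template with Phi-2) is a common cause of garbled or off-topic responses, which is why the settings template matters here.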