Comments (21)

algal avatar algal commented on May 23, 2024 2

I've got a fix. PR incoming in a few hours.

from gptel.

karthink avatar karthink commented on May 23, 2024 1

d1egoaz avatar d1egoaz commented on May 23, 2024 1

Something interesting I found in my environment is that I have an issue retrieving the key via .authinfo.gpg.

When I do that, I need to pass the bearer token encoded:

("Authorization" . ,(encode-coding-string(format "Bearer %s" api-key) 'utf-8))))

If I hardcode the key, I only need to do:

("Authorization" . ,(format "Bearer %s" api-key))))

I've pushed these changes to my (yes, another ChatGPT package) https://github.com/d1egoaz/c3po.el

karthink avatar karthink commented on May 23, 2024 1

@algal and @d1egoaz Thank you for debugging this issue and for the fix! Figuring this out is nontrivial -- may I add you to an acknowledgments section in the README?

algal avatar algal commented on May 23, 2024 1

@karthink Of course, feel free!

karthink avatar karthink commented on May 23, 2024 1

Updated README with acknowledgments.

hraban avatar hraban commented on May 23, 2024 1

By god this was hard to debug but I think I've found the actual culprit:

(dolist (x (list
            "x"
            (shell-command-to-string "printf x")
            (encode-coding-string (shell-command-to-string "printf x") 'utf-8)))
  (let ((s (concat x (encode-coding-string "é" 'utf-8))))
    (message
     "%S: %s(%s) %s, %s"
     s
     (multibyte-string-p s)
     (multibyte-string-p x)
     (string-bytes s)
     (length s))))

Output:

"x\303\251": nil(nil) 3, 3
"x\303\251": t(t) 5, 3
"x\303\251": nil(nil) 3, 3

Note: url-request will fail if (length request) is not equal to (string-bytes request).

Emacs has two kinds of strings: multibyte and unibyte. If you concat multiple strings, and any single one of them is multibyte, the entire result will be multibyte. The body of url-http-create-request is basically one giant (concat ...) call, including the HTTP headers. So if any header is multibyte, no matter its contents, it spoils the effect of encoding the request body. In my case I was getting an API key from a call to shell-command-to-string, and the same was happening to @d1egoaz further up. Encoding that output, even if it's plain ASCII, ensures the actual string type is unibyte, which in turn ensures the final HTTP request is unibyte. This is why you have to encode the API key.

String literals without multibyte characters are automatically unibyte, which is why you won't trigger this bug if you paste your API key literally into the REPL. This can be quite confusing while debugging! Haha.

d1egoaz avatar d1egoaz commented on May 23, 2024

try using us-ascii instead of utf-8, that fixed the issue for my chatgpt request function. https://github.com/d1egoaz/dotemacs/blob/master/modules/diego-chatgpt.el

         (url-request-data (encode-coding-string (json-encode
                                                  `(:model ,model
                                                           :messages [(:role "system" :content ,sys-content)
                                                                      (:role "user" :content ,input)]
                                                           :temperature 0.7)) 'us-ascii)))

BTW, the answer to your question is: Emacs

d1egoaz avatar d1egoaz commented on May 23, 2024

I'm still not sure why utf-8 is not working. I only use it for English content, so us-ascii might be enough. I've used accents and ñ, and it still works with 'us-ascii.

algal avatar algal commented on May 23, 2024

If I wanted to work on this, what’s the easiest way to reproduce the bug? Just make any request that includes multibyte Unicode characters in the prompt?

I am particularly interested in being able to use Modern Greek characters in a prompt, and I believe but am not sure that those are multibyte.

I’m curious for instance if the answer might be as simple as to set the Content-Type to "application/json;charset=UTF-8" when making the request.

algal avatar algal commented on May 23, 2024

Had a closer look at this. Was easily able to reproduce the problem. Did some basic log debugging and I’m pretty sure your code is preparing the request data payload correctly. At least, the functions length and string-bytes return the same number, and the request data payload string itself does look like what Unicode characters forced into a byte string would look like. This suggests the problem is in the header values: some improperly prepared value there makes the overall request wrong even though the request data payload is fine. If I can figure out the Emacs debugger I’ll see if I can spot where this is happening.

CyberShadow avatar CyberShadow commented on May 23, 2024

is to call encode-coding-string on the JSON request and send utf-8 text, but this does not seem to work

FWIW, it works here with Emacs 28.2. Full self-contained test (code is verbose for illustration):

(require 'json)
(let* ((url-request-method "POST")
       (url-request-extra-headers
        '(("Content-Type" . "application/json")
          ("Authorization" . "Bearer sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")))
       (request-content "qual é o melhor editor, vim ou emacs?")
       (request-object `(:model "gpt-3.5-turbo"
                                :messages [(:role "user"
                                                  :content ,request-content)]))
       (request-string (json-encode request-object))
       (request-bytes (encode-coding-string request-string 'utf-8))
       (url-request-data request-bytes)
       (response-buffer (url-retrieve-synchronously "https://api.openai.com/v1/chat/completions"))
       (response-bytes (with-current-buffer response-buffer
                         (goto-char 1)
                         (search-forward "\n\n")
                         (buffer-substring (point) (point-max))))
       (response-string (decode-coding-string response-bytes 'utf-8))
       (response-object (json-read-from-string response-string))
       (response-choices (cdr (assoc 'choices response-object)))
       (response-first-choice (elt response-choices 0))
       (response-message (cdr (assoc 'message response-first-choice)))
       (response-content (cdr (assoc 'content response-message))))
  (message "%s" response-content))

try using us-ascii instead of utf-8,

That just converts non-ASCII characters to question marks. The model may be able to fill in the gaps with Latin-based languages but will fail completely with other languages.
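
A round-trip check makes the difference between the two coding systems visible. This is only an illustrative sketch: my/lossless-encoding-p is a hypothetical helper name, not part of Emacs or gptel.

```elisp
;; Hypothetical helper: does TEXT survive an encode/decode round trip
;; through CODING unchanged?
(defun my/lossless-encoding-p (text coding)
  "Return non-nil if TEXT is unchanged after encoding and decoding with CODING."
  (string= text
           (decode-coding-string (encode-coding-string text coding)
                                 coding)))

(my/lossless-encoding-p "Ελληνικά" 'utf-8)     ; non-nil: nothing is lost
(my/lossless-encoding-p "Ελληνικά" 'us-ascii)  ; nil: the Greek letters are replaced
```

The same check could be run on a prompt before sending it, to catch a coding system that silently drops characters.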

algal avatar algal commented on May 23, 2024

That self-contained test snippet (updated with a valid API key of course) still does not work for me on Emacs 28.1 or Emacs 28.2 on Linux.

It prints nil, rather than failing with any error associated with multibyte encoding.

I notice it calls url-retrieve-synchronously while the code we are debugging called url-retrieve, which is asynchronous (https://github.com/karthink/gptel/blob/master/gptel.el#L400).

Moreover, from adding some logging to it, I suspect it's not passing the authorization information at all, since I get an error saying none were passed. This may be because the bound value url-request-extra-headers, which is supposed to be passed through dynamic scope, is not being seen for one reason or another. I'm not sure why not.

CyberShadow avatar CyberShadow commented on May 23, 2024

It prints nil, rather than failing with any error associated with multibyte encoding.

Interesting. Just to confirm, is it the same if you run it with emacs -Q --batch -l test.el?

d1egoaz avatar d1egoaz commented on May 23, 2024

I've tested #4 (comment) with emacs -Q --batch -l test.el and it works fine for me.

❯ emacs -Q --batch -l /tmp/test.el
Contacting host: api.openai.com:443
Como uma IA, não tenho opinião e não sou capaz de determinar qual o melhor editor, pois cada um tem seus prós e contras e depende das necessidades e preferências de cada usuário. Uma pesquisa sobre as funcionalidades de cada um e experimentá-los pode ajudar a decidir qual é o melhor para sua rotina de trabalho.

However, when you change the code to use url-retrieve async (as @algal mentioned) it doesn't work:

(require 'json)
(let* ((url-request-method "POST")
       (url-request-extra-headers
        '(("Content-Type" . "application/json")
          ("Authorization" . "Bearer <your-token>")))
       (request-content "qual é o melhor editor, vim ou emacs?")
       (request-object `(:model "gpt-3.5-turbo"
                                :messages [(:role "user"
                                                  :content ,request-content)]))
       (request-string (json-encode request-object))
       (request-bytes (encode-coding-string request-string 'utf-8))
       (url-request-data request-bytes))
  (url-retrieve "https://api.openai.com/v1/chat/completions"
                (lambda (_status callback &rest args)
                  (message (buffer-substring-no-properties (1+ url-http-end-of-headers) (point-max))))))

❯ emacs -Q --batch -l /tmp/test.el
Contacting host: api.openai.com:443

I've changed the code like below, and it works:

(require 'url)
(require 'json)

(defun my-response-handler (_status callback &rest args)
  (message "🤖: %s" (buffer-substring-no-properties (1+ url-http-end-of-headers) (point-max))))

(let* ((api-key "<api-key>")
       (url "https://api.openai.com/v1/chat/completions")
       (model "gpt-3.5-turbo")
       (url-request-method "POST")
       (url-request-extra-headers `(("Content-Type" . "application/json")
                                    ("Authorization" . ,(format "Bearer %s" api-key))))
       (request-content "qual é o melhor editor, vim ou emacs?")
       (url-request-data (encode-coding-string
                          (json-encode `(:model ,model
                                                :messages [(:role "user" :content ,request-content)]))
                          'utf-8)))
  (url-retrieve url #'my-response-handler
                '((lambda (result &rest args)
                    (message result))
                  nil)))
(sleep-for 10)

❯ emacs -Q --batch -l /tmp/test.el
Contacting host: api.openai.com:443
🤖: {"id":"xxxx","object":"chat.completion","created":1680113895,"model":"gpt-3.5-turbo-0301","usage":{"prompt_tokens":18,"completion_tokens":266,"total_tokens":284},"choices":[{"message":{"role":"assistant","content":"Como uma IA, eu não tenho preferência pessoal, mas posso lhe dizer que ambos os editores são populares e poderosos e têm suas próprias vantagens e desvantagens.\n\nVim é um editor de texto baseado em linha de comando e é conhecido por sua velocidade e eficiência. Ele é altamente configurável e pode ser personalizado para atender às suas necessidades, permitindo que você insira, exiba e edite o conteúdo do arquivo sem precisar tocar no mouse.\n\nPor outro lado, Emacs é um editor de texto e ambiente de desenvolvimento integrado (IDE) de código aberto e personalizável. É conhecido por sua flexibilidade e capacidade de extensão, tornando-o ideal para desenvolvedores que desejam personalizar completamente seu ambiente de trabalho. Emacs tem uma curva de aprendizado íngreme, mas um grande potencial para se tornar um ambiente de trabalho produtivo e personalizado.\n\nDito isto, a escolha entre Vim e Emacs é pessoal e dependerá em grande parte das suas necessidades e preferências pessoais. Ambos são excelentes editores de texto e podem ser usados para uma variedade de tarefas e projetos."},"finish_reason":"stop","index":0}]}

I wonder now if the issue is with some specific emacs version, or environment state 🤷

This is my emacs version:

GNU Emacs 30.0.50 (build 1, aarch64-apple-darwin22.3.0, NS appkit-2299.40 Version 13.2.1 (Build 22D68)) of 2023-03-26

d1egoaz avatar d1egoaz commented on May 23, 2024

I wonder if you can try that @algal ^

algal avatar algal commented on May 23, 2024

@d1egoaz This produced some progress! I am using a .authinfo file (not .authinfo.gpg). When I modified the function gptel-api-key-from-auth-source in gptel.el so that, instead of returning (funcall secret), it returns (encode-coding-string (funcall secret) 'utf-8), I became able to make a request successfully without relying on external curl. Hooray!

Unfortunately, the response is now decoded incorrectly. But there are no errors at the HTTP transport layer anymore.

Why is this happening? I suspect it's because the lambda which is returned by auth-source itself returns a multibyte string. (I verified this.) This is odd since that string contains only ASCII characters, and its length equals its string-bytes count.

But that string needs to be processed by encode-coding-string, just like our payload data. Otherwise the request string that is concatenated together by url-http-create-request ends up being multibyte as well, and its length no longer equals its string-bytes count, perhaps because of some subtlety of how concat mixes these types.
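
The effect can be reproduced without auth-source. This is a minimal sketch under stated assumptions, not gptel's actual code: my/unibyte-key is a hypothetical helper, string-to-multibyte stands in for the multibyte string that the secret function returns, and the concat only imitates what url-http-create-request does.

```elisp
;; Hypothetical helper: force an API key into a unibyte string.
(defun my/unibyte-key (key)
  "Return KEY, a string or a function returning one, as a unibyte string."
  (encode-coding-string (if (functionp key) (funcall key) key) 'utf-8))

(let* ((body (encode-coding-string "é" 'utf-8))  ; unibyte payload, 2 bytes
       ;; Stand-in for the auth-source secret: ASCII only, but multibyte.
       (secret (lambda () (string-to-multibyte "sk-test")))
       (bad  (concat "Authorization: Bearer " (funcall secret)
                     "\r\n\r\n" body))
       (good (concat "Authorization: Bearer " (my/unibyte-key secret)
                     "\r\n\r\n" body)))
  (list (= (length bad) (string-bytes bad))      ; nil: request became multibyte
        (= (length good) (string-bytes good))))  ; t: request stays unibyte
```

Accepting either a string or a function is a design choice for the sketch, so the same helper covers a hardcoded key and a secret function.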

algal avatar algal commented on May 23, 2024

@CyberShadow

It prints nil, rather than failing with any error associated with multibyte encoding.

Interesting. Just to confirm, is it the same if you run it with emacs -Q --batch -l test.el?

I tried again and this time your snippet worked, both in ielm and when run in batch mode. It's possible it's an intermittent issue but I'd bet I just made some kind of typo the first time when trying your snippet. Sorry for wasting your time!

d1egoaz avatar d1egoaz commented on May 23, 2024

Unfortunately, the response is now decoded incorrectly. But there are no errors at the HTTP transport layer anymore.

@algal I had the same issue, and I needed to do:
https://github.com/d1egoaz/c3po.el/blob/b81f4c48616932f375104419b3f2d1a74b5db28e/c3po.el#L88

I've been able to use emojis (utf-8 stress test) without issues, I'm pretty sure it should work now for any other case.

algal avatar algal commented on May 23, 2024

Unfortunately, the response is now decoded incorrectly. But there are no errors at the HTTP transport layer anymore.

@algal I had the same issue, and I needed to do: https://github.com/d1egoaz/c3po.el/blob/b81f4c48616932f375104419b3f2d1a74b5db28e/c3po.el#L88

I've been able to use emojis (utf-8 stress test) without issues, I'm pretty sure it should work now for any other case.

@d1egoaz You may be onto something. Your code reads the buffer into a data string, decodes the data string as UTF-8, and then parses it as JSON. The gptel code, which doesn't handle the response properly, goes in the other order: it parses the headers of the buffer, reads the rest as JSON, and only later decodes it as UTF-8.
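
The decode-then-parse order can be sketched as follows. The byte string here is a hand-made stand-in for the body of a response buffer, and its JSON content is an assumption for illustration only.

```elisp
(require 'json)

;; Stand-in for a response body: UTF-8 bytes held in a unibyte string.
(let* ((response-bytes (encode-coding-string "{\"content\":\"olá, é\"}" 'utf-8))
       ;; 1. Decode the raw bytes into characters first ...
       (response-string (decode-coding-string response-bytes 'utf-8))
       ;; 2. ... then parse the decoded text as JSON.
       (response-object (json-read-from-string response-string)))
  (cdr (assq 'content response-object)))
```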

d1egoaz avatar d1egoaz commented on May 23, 2024

hraban

Thank you, @hraban! 🏆 Now I can finally put this behind me and move on. It's always been bothering me to not know the source of the problem.

Thank you so much!
